GWAS Catalog Track Settings
 
NHGRI-EBI Catalog of Published Genome-Wide Association Studies (All Phenotype and Literature tracks)

Assembly: Human Feb. 2009 (GRCh37/hg19)
Data last updated at UCSC: 2024-12-18

Description

This track displays single nucleotide polymorphisms (SNPs) identified by published Genome-Wide Association Studies (GWAS) and collected in the NHGRI-EBI GWAS Catalog, which is published jointly by the National Human Genome Research Institute (NHGRI) and the European Bioinformatics Institute (EMBL-EBI).

From http://www.ebi.ac.uk/gwas/docs/about:

The Catalog is a quality controlled, manually curated, literature-derived collection of all published genome-wide association studies assaying at least 100,000 SNPs and all SNP-trait associations with p-values < 1.0 x 10^-5 (Hindorff et al., 2009). For more details about the Catalog curation process and data extraction procedures, please refer to the Methods page.
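
As a rough illustration of the two thresholds quoted above (at least 100,000 SNPs assayed per study, and association p-values below 1.0 x 10^-5), the check can be sketched in Python. The function and parameter names are hypothetical and are not part of the Catalog's curation software.

```python
# Thresholds taken from the Catalog description quoted above.
MIN_SNPS_ASSAYED = 100_000
MAX_P_VALUE = 1.0e-5


def meets_inclusion_criteria(snps_assayed: int, p_value: float) -> bool:
    """Return True if both quoted thresholds are satisfied.

    Hypothetical helper for illustration only: the study must assay at
    least MIN_SNPS_ASSAYED SNPs, and the association p-value must be
    strictly below MAX_P_VALUE.
    """
    return snps_assayed >= MIN_SNPS_ASSAYED and p_value < MAX_P_VALUE
```

Note that the p-value comparison is strict, so an association reported at exactly 1.0 x 10^-5 would not pass this sketch.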

Methods

From http://www.ebi.ac.uk/gwas/docs/methods:

The GWAS Catalog data is extracted from the literature. Extracted information includes publication information, study cohort information such as cohort size, country of recruitment and subject ethnicity, and SNP-disease association information including SNP identifier (i.e. RSID), p-value, gene and risk allele. Each study is also assigned a trait that best represents the phenotype under investigation. When multiple traits are analysed in the same study either multiple entries are created, or individual SNPs are annotated with their specific traits. Traits are used both to query and visualise the data in the Catalog's web form and diagram-based query interfaces.

Data extraction and curation for the GWAS Catalog is an expert activity; each step is performed by scientists supported by a web-based tracking and data entry system which allows multiple curators to search, annotate, verify and publish the Catalog data. Papers that qualify for inclusion in the Catalog are identified through weekly PubMed searches. They then undergo two levels of curation. First all data, including association information for SNPs, traits and general information about the study, are extracted by one curator. A second curator then performs an additional round of curation to double-check the accuracy and consistency of all the information. Finally, an automated pipeline performs validation of the extracted data (see the Quality control and SNP mapping section of the linked methods page for more details). This information is then used for queries and in the production of the diagram.
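
The kinds of information described above as extracted for each association can be pictured as a simple record type. The following Python sketch is only an illustration: the class, its field names and types, and the helper function are assumptions made here, not the Catalog's actual data model.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class AssociationRecord:
    """One SNP-trait association; all field names are illustrative assumptions."""
    pubmed_id: str            # publication information
    cohort_size: int          # study cohort information
    recruitment_country: str
    subject_ethnicity: str
    rsid: str                 # SNP identifier
    p_value: float
    gene: str
    risk_allele: str
    trait: str                # trait assigned to the study or SNP


def expand_by_trait(base: AssociationRecord, traits: List[str]) -> List[AssociationRecord]:
    """Return one copy of `base` per trait, leaving `base` unchanged.

    Models the "multiple entries are created" option for studies that
    analyse several traits.
    """
    return [replace(base, trait=t) for t in traits]
```

The alternative mentioned in the text, annotating individual SNPs with their specific traits, would instead set the trait field separately on each record.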

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server (gwasCatalog*.txt.gz) or queried on the public MySQL server. For questions, please refer to our mailing list archives; for more information, see our Data Access FAQ.

Previous versions of this track can be found on our archive download server.
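
For automated analysis of a downloaded gwasCatalog*.txt.gz file, a minimal Python sketch is shown here. It assumes the file is a gzip-compressed, tab-separated dump without a header row, and that its leading columns follow the usual UCSC positional-table layout (bin, chrom, chromStart, chromEnd, name); this should be checked against the track's schema description. The function names are hypothetical, and any further columns are collected unnamed under an "extra" key rather than guessed at.

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator, Sequence

# ASSUMPTION: leading columns of the dump; verify against the track schema.
ASSUMED_LEADING_COLUMNS = ("bin", "chrom", "chromStart", "chromEnd", "name")


def read_gwas_rows(source, columns: Sequence[str] = ASSUMED_LEADING_COLUMNS) -> Iterator[Dict]:
    """Yield one dict per row of a gzip-compressed, tab-separated dump.

    `source` may be a file path or a binary file object. Fields beyond
    the named columns are kept as a list under the key "extra".
    """
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as text:
        reader = csv.DictReader(
            text,
            fieldnames=list(columns),
            restkey="extra",
            delimiter="\t",
            quoting=csv.QUOTE_NONE,  # treat quote characters as plain text
        )
        for row in reader:
            yield row


def rows_in_region(rows: Iterable[Dict], chrom: str, start: int, end: int) -> Iterator[Dict]:
    """Keep rows whose 0-based, half-open interval overlaps [start, end)."""
    for row in rows:
        if (row["chrom"] == chrom
                and int(row["chromStart"]) < end
                and int(row["chromEnd"]) > start):
            yield row
```

With a local copy of the file (the path here is hypothetical), `rows_in_region(read_gwas_rows("gwasCatalog.txt.gz"), "chr1", 1000000, 2000000)` would iterate over the catalog SNPs in that interval.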

References

Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A. 2009 Jun 9;106(23):9362-7. PMID: 19474294; PMC: PMC2687147