GenCC Track Settings
 
GenCC: The Gene Curation Coalition Annotations

Assembly: Human Feb. 2009 (GRCh37/hg19)
Data last updated at UCSC: 2024-12-17 16:19:56

Description

This track shows annotations from The Gene Curation Coalition (GenCC). The GenCC provides information pertaining to the validity of gene-disease relationships, with a current focus on Mendelian diseases. Curated gene-disease relationships are submitted by GenCC member organizations that currently provide online resources (e.g. ClinGen, DECIPHER, Orphanet), as well as diagnostic laboratories that have committed to sharing their internal curated gene-level knowledge (e.g. Ambry Genetics, Illumina, Invitae).

The GenCC aims to clarify overlap between gene curation efforts and develop consistent terminology for validity, allelic requirement and mechanism of disease. Each item on this track corresponds to a gene and contains a large amount of information, such as the associated disease, evidence classification, specific submission notes and identifiers from different databases. In cases where multiple annotations exist for the same gene, multiple items are displayed.

Display Conventions and Configuration

Each item displayed represents a submission to the GenCC database. The displayed name combines the gene symbol with the disease's original submission ID, which is an OMIM, MONDO or Orphanet identifier. Clicking on any item displays the complete metadata for that item, including linkouts to the GenCC, NCBI, Ensembl, HGNC, GeneCards, Pombase (MONDO), and Human Phenotype Ontology (HPO). Mousing over any item displays the associated disease title, the classification title, and the mode of inheritance title.

Items are colored according to the GenCC classification, or validation, of the evidence, following the color scheme in the table below. For more information on this process, see the GenCC validity terms FAQ. A track filter is also available to display only the items with selected classifications.

Color Evidence classification
Definitive
Strong
Moderate
Supportive
Limited
Disputed Evidence
Refuted Evidence
No Known Disease Relationship
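
The classification filter described above can be approximated offline once items have been exported. The following is only a minimal sketch: it assumes each item has already been loaded into a dictionary with a `classification` key, which is a hypothetical field name and not necessarily the one used in the track's schema. The example items are made up for illustration.

```python
# Classification labels exactly as listed in the table above.
CLASSIFICATIONS = {
    "Definitive", "Strong", "Moderate", "Supportive", "Limited",
    "Disputed Evidence", "Refuted Evidence", "No Known Disease Relationship",
}


def filter_by_classification(items, wanted):
    """Return the items whose classification is one of the `wanted` labels."""
    wanted = set(wanted)
    unknown = wanted - CLASSIFICATIONS
    if unknown:
        # Catch typos early instead of silently returning nothing.
        raise ValueError(f"unknown classification(s): {sorted(unknown)}")
    return [item for item in items if item["classification"] in wanted]


# Hypothetical items; names and values are placeholders, not real GenCC data.
items = [
    {"name": "GENE_A", "classification": "Definitive"},
    {"name": "GENE_B", "classification": "Limited"},
    {"name": "GENE_C", "classification": "Strong"},
]
selected = filter_by_classification(items, ["Definitive", "Strong"])
```

Validating the requested labels against the known set mirrors the multi-select filter on the track, where only the listed classifications can be chosen.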

Limitations: Most entries include NM_ accessions as well as ENST and ENSG identifiers. The original file contains no coordinates, and two genes, SLCO1B7 and ATXN8, could not be mapped to the hg38 genome. As a result, the hg38 track has 2 fewer items than the GenCC download file. For hg19, one additional gene, KCNJ18, could not be mapped. In addition, the GenCC data in the Genome Browser does not include OMIM data due to licensing restrictions. For more information, see the Methods section below.

Data Access

The source data can be explored in the GenCC database. The source files can also be found on the GenCC downloads page.

The GenCC data on the UCSC Genome Browser can be explored interactively with the Table Browser or the Data Integrator. For automated download and analysis, the genome annotation is stored at UCSC in bigBed files that can be downloaded from our download server. The data can also be queried programmatically using our REST API.
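
As an illustration of REST API access, the sketch below builds a request URL for the `/getData/track` endpoint of the UCSC REST API. The track name `genCC` is an assumption inferred from the bigBed file name (`genCC.bb`), and the helper names are hypothetical. The fetch function is defined but not called here, since it requires network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome, track, chrom, start, end):
    """Build a /getData/track URL using semicolon-separated parameters."""
    params = {
        "genome": genome,
        "track": track,   # "genCC" below is an assumed track name
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params.items())
    return f"{API_BASE}?{query}"


def fetch_track(genome, track, chrom, start, end):
    """Download and decode the JSON response (requires network access)."""
    with urlopen(build_track_url(genome, track, chrom, start, end)) as response:
        return json.load(response)


url = build_track_url("hg19", "genCC", "chr1", 100000, 100500)
```

Calling `fetch_track("hg19", "genCC", "chr1", 100000, 100500)` would return the decoded JSON for that region, provided the assumed track name is correct.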

The file for this track may also be explored locally using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain features confined to a given range, e.g.,

bigBedToBed -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/hg19/bbi/genCC.bb stdout
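
For scripted use, the same command can be wrapped and its output parsed. This is a rough sketch that assumes `bigBedToBed` is installed and on the `PATH`, and that each output line has at least the standard chrom, start, end and name BED columns. Any further columns are kept unparsed in `extra`, because their layout is not described here. The function names are hypothetical.

```python
import subprocess

BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg19/bbi/genCC.bb"


def build_command(chrom, start, end, source=BIGBED_URL):
    """Assemble the bigBedToBed invocation shown above as an argument list."""
    return ["bigBedToBed", f"-chrom={chrom}", f"-start={start}",
            f"-end={end}", source, "stdout"]


def parse_bed(text):
    """Parse tab-separated BED text into dictionaries."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        records.append({
            "chrom": fields[0],
            "start": int(fields[1]),
            "end": int(fields[2]),
            "name": fields[3] if len(fields) > 3 else None,
            "extra": fields[4:],  # remaining columns, left unparsed
        })
    return records


def query_region(chrom, start, end):
    """Run bigBedToBed (must be installed) and return the parsed records."""
    result = subprocess.run(build_command(chrom, start, end),
                            capture_output=True, text=True, check=True)
    return parse_bed(result.stdout)
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` raises an error if the tool exits with a failure status.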

Methods

The data were downloaded from the GenCC downloads page in tsv format. Manual curation was performed on the file to remove newline characters and tab characters present in the submission notes. In total, fewer than 20 manual edits were made.

The track was first built on hg38 by associating the gene symbols with the NCBI MANE 1.0 release transcripts. These coordinates, along with the NM_ accession, ENST ID and ENSG ID, were added to the items. For items with no gene symbol match in MANE (~130), the gene symbols were queried against the GENCODE v40 comprehensive set release. Where multiple transcript matches were found, the earliest transcription start and the latest end site among the transcripts were used, so that the item encompasses the entire gene. Two genes, SLCO1B7 and ATXN8, could not be mapped for hg38, resulting in two missing submissions in the Genome Browser when compared to the raw file. Lastly, the items were colored according to their evidence classification as seen on the GenCC database.
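
The rule for genes with multiple transcript matches (earliest start, latest end) can be sketched as follows. This is not the actual build code, which is documented in the MakeDoc. The tuple layout and the gene names and coordinates are hypothetical placeholders.

```python
def gene_spans(transcripts):
    """Collapse transcripts into one span per gene.

    `transcripts` is an iterable of (gene_symbol, chrom, start, end) tuples.
    For each gene, the span runs from the smallest start to the largest end.
    """
    spans = {}
    for symbol, chrom, start, end in transcripts:
        if symbol not in spans:
            spans[symbol] = (chrom, start, end)
            continue
        prev_chrom, prev_start, prev_end = spans[symbol]
        if chrom != prev_chrom:
            # A single span is only meaningful on one chromosome.
            raise ValueError(f"{symbol} has transcripts on {prev_chrom} and {chrom}")
        spans[symbol] = (chrom, min(prev_start, start), max(prev_end, end))
    return spans


# Hypothetical transcripts; coordinates are invented for illustration.
transcripts = [
    ("GENE_A", "chr1", 1000, 5000),
    ("GENE_A", "chr1", 800, 4200),
    ("GENE_A", "chr1", 1500, 6100),
    ("GENE_B", "chr2", 300, 900),
]
spans = gene_spans(transcripts)
```

Here the three `GENE_A` transcripts collapse to a single span from 800 to 6100, while `GENE_B` keeps its only transcript's coordinates.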

For hg19, the hg38 NM_ accessions were used to convert the item coordinates according to the latest hg19 RefSeq release. For items that failed to convert, the gene symbols were queried using the GENCODE v40 hg19 lift comprehensive set. One additional gene symbol, KCNJ18, failed to map in hg19, leading to 3 fewer items on this track when compared to the raw file.
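
The two-step hg19 lookup (NM_ accession first, gene symbol as fallback) can be sketched as below. The dictionary-based lookup tables, the field names and the example values are all hypothetical; the actual processing is described in the MakeDoc.

```python
def map_items(items, refseq_by_accession, gencode_by_symbol):
    """Assign coordinates by NM_ accession, falling back to the gene symbol.

    Returns (mapped_items, unmapped_symbols).
    """
    mapped, unmapped = [], []
    for item in items:
        # First attempt: look up the transcript accession.
        coords = refseq_by_accession.get(item.get("nm_accession"))
        if coords is None:
            # Fallback: look up the gene symbol instead.
            coords = gencode_by_symbol.get(item["gene_symbol"])
        if coords is None:
            unmapped.append(item["gene_symbol"])
        else:
            mapped.append({**item, "coords": coords})
    return mapped, unmapped


# Hypothetical lookup tables and items, invented for illustration.
refseq = {"NM_000001": ("chr1", 100, 900)}
gencode = {"GENE_B": ("chr2", 50, 700)}
items = [
    {"gene_symbol": "GENE_A", "nm_accession": "NM_000001"},
    {"gene_symbol": "GENE_B", "nm_accession": "NM_000002"},
    {"gene_symbol": "GENE_C", "nm_accession": None},
]
mapped, unmapped = map_items(items, refseq, gencode)
```

Items that miss both lookups end up in `unmapped`, which corresponds to the genes reported above as absent from the track.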

For both assemblies, GenCC OMIM data is excluded due to data restrictions. For complete documentation of the processing of these tracks, read the GenCC MakeDoc.

Credits

Thanks to the entire GenCC committee for creating these annotations and making them available.

References

DiStefano MT, Goehringer S, Babb L, Alkuraya FS, Amberger J, Amin M, Austin-Tse C, Balzotti M, Berg JS, Birney E et al. The Gene Curation Coalition: A global effort to harmonize gene-disease evidence resources. Genet Med. 2022 May 4. PMID: 35507016