Note: Updated Nov. 8, 2024
Description
The aim of the GENCODE
Genes project (Harrow et al., 2006) is to produce a set of
highly accurate annotations of evidence-based gene features on the human reference genome.
This includes the identification of all protein-coding loci with associated
alternative splice variants, non-coding loci with transcript evidence in the public
databases (NCBI/EMBL/DDBJ) and pseudogenes. A high quality set of gene
structures is necessary for many research studies such as comparative or
evolutionary analyses, or for experimental design and interpretation of the
results.
The GENCODE Genes tracks display the high-quality manual annotations merged
with evidence-based automated annotations across the entire
human genome. The GENCODE gene set presents a full merge
between HAVANA manual annotation and Ensembl automatic annotation.
Priority is given to the manually curated HAVANA annotation, with predicted
Ensembl annotations used when there are no corresponding manual annotations. With
each release, there is an increase in the number of annotations that have undergone
manual curation.
This annotation was carried out on the GRCh37 (hg19) genome assembly.
Experimental verification details are given in the description for each
track. Transcript Support Levels were determined for version 10 onwards based
on evidence provided by GenBank mRNA and EST sequences. Versions 7 and 10 are
being used in data analysis by the ENCODE consortium.
NOTE: Due to the UCSC Genome Browser using the NC_001807 mitochondrial
genome sequence
(chrM) and GENCODE annotating the NC_012920 mitochondrial sequence, the
GENCODE mitochondrial sequences are not available in the UCSC Genome Browser.
These annotations are available for download in the
GENCODE GTF files.
For more information on the different gene tracks, see our Genes FAQ.
Display Conventions
These are multi-view composite tracks that contain differing data sets
(views). Instructions for configuring multi-view tracks are
here.
Only some subtracks are shown by default. The user can select which subtracks
are displayed via the display controls on the track details pages.
Further details on display conventions and data interpretation are available in the track descriptions.
Data access
GENCODE Genes and its associated tables can be explored interactively using the
REST API, the
Table Browser or the
Data Integrator.
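As an illustration of REST API access, the following minimal Python sketch builds a getData/track request for one region of hg19. The track name wgEncodeGencodeBasicV19 and the example coordinates are assumptions chosen for illustration; check the track names that actually exist (for example with the API's list/tracks endpoint) before relying on them. The helper names gencode_track_url and fetch_track are hypothetical and not part of any UCSC library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"


def gencode_track_url(track, chrom, start, end, genome="hg19"):
    """Build a getData/track URL for one region.

    The track name is supplied by the caller; this sketch does not
    verify that the track exists on the server.
    """
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    return f"{API_BASE}/getData/track?{urlencode(params)}"


def fetch_track(url):
    """Download and decode the JSON response (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


# Assumed track name and example region, for illustration only.
url = gencode_track_url("wgEncodeGencodeBasicV19", "chr1", 11868, 14409)
# data = fetch_track(url)  # uncomment to query the live server
```

The request is only constructed here, not sent, so the sketch can be inspected offline; calling fetch_track(url) performs the actual query.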
The GENCODE data files for hg19 are available in our
downloads directory as wgEncodeGencode* in genePred format.
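For readers who work with the downloaded files directly, here is a small Python sketch of reading one genePred row. It assumes tab-separated rows with a leading bin column, as is typical of UCSC table dumps, followed by the standard genePred fields (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds); any extended fields after these are ignored. The record shown is made up for illustration and is not a real GENCODE transcript, and the names GenePred and parse_genepred are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GenePred:
    name: str
    chrom: str
    strand: str
    tx_start: int   # 0-based start
    tx_end: int     # exclusive end
    cds_start: int
    cds_end: int
    exons: list     # list of (start, end) pairs


def parse_genepred(line, has_bin=True):
    """Parse one tab-separated genePred row; extra trailing fields are ignored."""
    fields = line.rstrip("\n").split("\t")
    if has_bin:
        fields = fields[1:]  # drop the leading bin column
    name, chrom, strand = fields[0:3]
    tx_start, tx_end, cds_start, cds_end, exon_count = (int(x) for x in fields[3:8])
    # exonStarts and exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError(f"exon count mismatch in record {name}")
    return GenePred(name, chrom, strand, tx_start, tx_end,
                    cds_start, cds_end, list(zip(starts, ends)))


# Synthetic example row (not real data).
example = ("585\tENST00000000001.1\tchr1\t+\t100\t900\t150\t850\t2\t"
           "100,600,\t300,900,\t0\tGENE1\tcmpl\tcmpl\t0,1")
record = parse_genepred(example)
exon_bases = sum(end - start for start, end in record.exons)
```

Pass has_bin=False for files that lack the bin column.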
All the tables can also be queried directly from our public MySQL
servers, with instructions on this method available on our
MySQL help page and on
our blog.
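A query against the public MySQL server might look like the sketch below, which only assembles the SQL and the command line for the standard mysql client rather than connecting. The host genome-mysql.soe.ucsc.edu and user genome are the public server settings described on the MySQL help page; the table name wgEncodeGencodeBasicV19, the selected columns, and the helper names are assumptions for illustration and should be checked against the table schema.

```python
import re


def gencode_region_query(table, chrom, start, end):
    """Build a SELECT for transcripts overlapping [start, end) on chrom."""
    # Restrict identifiers to simple names, since they are interpolated into SQL.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    if not re.fullmatch(r"[A-Za-z0-9_]+", chrom):
        raise ValueError(f"unexpected chromosome name: {chrom!r}")
    return (
        f"SELECT name, name2, chrom, strand, txStart, txEnd FROM {table} "
        f"WHERE chrom = '{chrom}' AND txStart < {int(end)} AND txEnd > {int(start)}"
    )


def mysql_command(query, database="hg19"):
    """Argument list for the mysql client against the public UCSC server."""
    return [
        "mysql",
        "--user=genome",
        "--host=genome-mysql.soe.ucsc.edu",
        "-A",   # skip auto-rehash for faster startup
        "-N",   # omit column names from the output
        "-B",   # tab-separated batch output
        "-e", query,
        database,
    ]


# Assumed table name and example region, for illustration only.
query = gencode_region_query("wgEncodeGencodeBasicV19", "chr1", 11868, 14409)
command = mysql_command(query)
# import subprocess; subprocess.run(command, check=True)  # runs the query
```

The overlap test (txStart < end AND txEnd > start) follows from the 0-based, half-open coordinates used in the genePred tables.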
Release Notes
GENCODE version 47lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 113.
GENCODE version 46lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 112.
GENCODE version 45lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 111.
GENCODE version 43lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 109.
GENCODE version 42lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 108.
GENCODE version 41lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 107.
GENCODE version 40lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 106.
GENCODE version 39lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 105.
GENCODE version 38lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 104.
GENCODE version 37lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 103.
GENCODE version 36lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 102.
GENCODE version 35lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 101.
GENCODE version 34lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 100.
GENCODE version 33lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 99.
GENCODE version 30lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 96.
GENCODE version 29lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 94.
GENCODE version 28lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 92.
GENCODE version 27lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 90.
GENCODE version 24lift37 (mapped from GRCh38 to GRCh37)
corresponds to Ensembl 83.
GENCODE version 19 corresponds to Ensembl 74 and Vega 54.
GENCODE version 17
corresponds to Ensembl 72 and Vega 52.
GENCODE version 14
corresponds to Ensembl 69 and Vega 49.
GENCODE version 7
corresponds to Ensembl 62 and Vega 42
and is used in ENCODE analysis.
See also: The GENCODE Project Release History.
Credits
The GENCODE project is an international collaboration funded by NIH/NHGRI
grant U41HG007234. More information is available
at www.gencodegenes.org.
Participating GENCODE institutions and personnel can be found
here.
References
Frankish A, Diekhans M, Jungreis I, Lagarde J, Loveland JE, Mudge JM, Sisu C, Wright JC, Armstrong
J, Barnes I et al.
GENCODE 2021.
Nucleic Acids Res. 2021 Jan 8;49(D1):D916-D923.
PMID: 33270111;
PMC: PMC7778937;
DOI: 10.1093/nar/gkaa1087
A full list of GENCODE publications is available
at The GENCODE
Project web site.
Data Release Policy
GENCODE data are available for use without restrictions.