Pseudogenes Track Settings
 
Pseudogenes and Parents   (All Genes and Gene Predictions tracks)

Assembly: Human Dec. 2013 (GRCh38/hg38)


Note: Released Mar. 31, 2025

Description

These tracks contain pseudogene predictions and their parent genes as identified by PseudoPipe. PseudoPipe is a homology-based computational pipeline that searches a mammalian genome and identifies pseudogene sequences in a comprehensive and consistent manner.

Pseudogenes are genomic sequences that bear similarity to specific protein-coding genes, but are unable to produce functional proteins due to the existence of frameshifts, premature stop codons, or other deleterious mutations. They arise from gene duplication or retrotransposition events and are important resources in understanding the evolutionary history of genes and genomes.

Display Conventions

This composite track consists of two subtracks: the Pseudogenes track and the Pseudogene Parents track. The Pseudogene Parents track displays parent genes labeled with their HUGO IDs, which were derived from Ensembl gene IDs provided by the Gerstein lab after dataset creation. It includes indicators for pseudogenes, each linked to its corresponding entry in the Pseudogenes track. The Pseudogenes track shows pseudogenes labeled with their parent HUGO ID and colored according to pseudogene type. The authors assigned PGOHUMG IDs to genes and PGOHUMT IDs to transcripts. Note: Not all PseudoPipe IDs could be mapped back to their original Ensembl IDs. In these cases, the gene ID is listed as NA.

Pseudogene types:
  • Unspecified pseudogenes include pseudogenic fragments and protein/chromosome homologies that have high sequence similarity but are too decayed to be reliably classified as processed or duplicated.
  • Processed pseudogenes (retrotransposed pseudogenes) result from the reverse transcription of mRNA into DNA, which is then inserted into the genome. These pseudogenes lack introns, often have small flanking direct repeats, and may retain a 3' polyadenine tail. PseudoPipe distinguishes them from duplicated pseudogenes by a combination of these features, with emphasis on evidence of ancient introns.
  • Unprocessed pseudogenes (duplicated pseudogenes) arise from genomic DNA duplication or unequal crossing-over. They often retain the original exon-intron structures of the functional genes, although sometimes incompletely.

Pseudogene Parents track

Each parent gene is shown with its pseudogenes represented as grey blocks.

  • purple - parent gene
  • grey - pseudogene indicators

If a parent gene has four grey blocks beneath it, this indicates the presence of four pseudogenes elsewhere in the genome. Hovering over a grey block displays the pseudogene type and its PGOHUMT ID, along with a link to its corresponding entry in the Pseudogenes track and its genomic position. Clicking the PGOHUMT ID redirects the genome browser to the pseudogene's locus.

Pseudogenes track

Pseudogenes are colored by type.

  • orange - unspecified pseudogene
  • blue - unprocessed pseudogene
  • olive green - processed pseudogene

Mousing over an item displays the PseudoPipe ID (PGOHUMG), the parent Ensembl gene ID with a link to the corresponding parent gene location in the Pseudogene Parents track, and the pseudogene type.

Methods

The PseudoPipe pipeline identifies pseudogenes through a series of steps. It first uses BLAST to rapidly cross-reference potential parent proteins against the intergenic regions of the genome. The resulting raw hits are then processed by removing redundancies, clustering neighboring sequences, and aligning each cluster with a unique parent gene. Finally, pseudogenes are classified based on a combination of criteria, including homology, intron-exon structure, and the presence of stop codons or frameshifts. This method is designed to detect pseudogenes that are unable to be translated into proteins.
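The clustering step above can be illustrated with a minimal sketch. This is not PseudoPipe's own code: the function name cluster_hits, the tuple layout, and the max_gap threshold are hypothetical, and strand and parent-protein identity are ignored for brevity.

```python
from typing import List, Tuple

# A hit is (chrom, start, end) in 0-based, half-open coordinates.
Hit = Tuple[str, int, int]


def cluster_hits(hits: List[Hit], max_gap: int = 60) -> List[Hit]:
    """Merge hits on the same chromosome that overlap or lie within max_gap bases.

    max_gap is an illustrative value, not a PseudoPipe parameter.
    """
    clusters: List[Hit] = []
    # Sorting by (chrom, start, end) places neighboring hits next to each other.
    for chrom, start, end in sorted(hits):
        if clusters and clusters[-1][0] == chrom and start - clusters[-1][2] <= max_gap:
            prev_chrom, prev_start, prev_end = clusters[-1]
            # Extend the current cluster to cover the new hit.
            clusters[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            clusters.append((chrom, start, end))
    return clusters


example_hits = [
    ("chr1", 100, 200),
    ("chr1", 220, 300),
    ("chr1", 1000, 1100),
    ("chr2", 50, 80),
]
clusters = cluster_hits(example_hits)
```

In this toy input the first two chr1 hits are 20 bases apart and collapse into one cluster, while the distant chr1 hit and the chr2 hit stay separate.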

These tracks were generated using a Bash script that processes a GTF file with pseudogene annotations by removing duplicates, correcting overlapping exons, and converting the data to BED format with pseudoPipeToBed.py. This script extracts gene and transcript IDs, merges overlapping exons, assigns colors based on pseudogene type, and outputs a BED file with gene and parent annotations. PseudoPipeParents.py then links pseudogenes to their functional genes by determining parent gene coordinates, updating pseudogene entries with interactive browser links and generating a parent BED file. The final data are formatted into pseudoPipePgenes.bb and pseudoPipeParents.bb BigBed files. The detailed documentation (makeDoc) and Python scripts are available in our GitHub repository.
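To make the exon-merging and BED conversion concrete, here is a short sketch under stated assumptions. It does not reproduce pseudoPipeToBed.py: the function names, the example ID, and the placeholder color "0,0,0" are hypothetical, and the output uses the standard 12-column BED layout without the extra gene and parent fields.

```python
def merge_exons(exons):
    """Merge overlapping or abutting (start, end) intervals (0-based, half-open)."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def to_bed12(chrom, name, strand, exons_1based, rgb="0,0,0"):
    """Build one BED12 line from GTF-style exons (1-based, inclusive)."""
    # GTF start is 1-based inclusive; BED start is 0-based, so subtract 1.
    blocks = merge_exons([(start - 1, end) for start, end in exons_1based])
    chrom_start = blocks[0][0]
    chrom_end = blocks[-1][1]
    block_sizes = ",".join(str(end - start) for start, end in blocks)
    block_starts = ",".join(str(start - chrom_start) for start, _ in blocks)
    fields = [
        chrom, chrom_start, chrom_end, name, 0, strand,
        chrom_start, chrom_end,  # thickStart, thickEnd
        rgb, len(blocks), block_sizes, block_starts,
    ]
    return "\t".join(str(field) for field in fields)


# Hypothetical transcript with two overlapping exons and one separate exon.
bed_line = to_bed12("chr1", "PGOHUMT_EXAMPLE", "+", [(101, 200), (151, 250), (401, 500)])
```

The two overlapping exons become a single 150-base block, so the resulting line has two blocks rather than three.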

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. The data can also be accessed programmatically through our REST API.
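As an illustration of REST API access, the sketch below builds a request URL for the API's getData/track endpoint. The track name pseudoPipePgenes is an assumption taken from the bigBed file name and may differ from the actual track identifier; the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"


def track_url(genome, track, chrom, start, end):
    """Build a getData/track URL with semicolon-separated parameters."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    query = ";".join(f"{key}={value}" for key, value in params)
    return f"{API_BASE}/getData/track?{query}"


def fetch_track(genome, track, chrom, start, end):
    """Request the region and decode the JSON response (needs network access)."""
    with urlopen(track_url(genome, track, chrom, start, end)) as response:
        return json.load(response)


# "pseudoPipePgenes" is an assumed track name, not confirmed by this page.
url = track_url("hg38", "pseudoPipePgenes", "chr21", 0, 10000000)
```

Calling fetch_track with the same arguments would perform the request; it is left uncalled here so the sketch runs offline.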

For automated download and analysis, the genome annotation is stored at UCSC in bigBed files that can be downloaded from the download server. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system.

Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g.

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/hg38/pseudogenes/pseudoPipePgenes.bb -chrom=chr21 -start=0 -end=10000000 stdout
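For scripted use, the same command can be wrapped in Python. The sketch below assumes bigBedToBed is installed and on the PATH; the helper names are hypothetical, and only the first four standard BED columns are interpreted because the remaining columns of this track are not described here.

```python
import subprocess

# URL copied verbatim from the command above.
PGENES_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/hg38/pseudogenes/pseudoPipePgenes.bb"


def bigbed_command(url, chrom, start, end):
    """Return the bigBedToBed argument list for one genomic range."""
    return ["bigBedToBed", url, f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]


def parse_bed_line(line):
    """Split one tab-separated BED line; columns after the name stay as raw strings."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        "name": fields[3] if len(fields) > 3 else None,
        "extra": fields[4:],
    }


def fetch_region(url, chrom, start, end):
    """Run bigBedToBed and parse its output (needs the binary and network access)."""
    result = subprocess.run(
        bigbed_command(url, chrom, start, end),
        capture_output=True, text=True, check=True,
    )
    return [parse_bed_line(line) for line in result.stdout.splitlines() if line]


command = bigbed_command(PGENES_URL, "chr21", 0, 10000000)
```

fetch_region(PGENES_URL, "chr21", 0, 10000000) would return one dictionary per pseudogene in the range.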

Credits

Thanks to the Gerstein lab at Yale University for making this data available, and to Cristina Sisu for providing data in GTF format with parent annotations.

References

Zhang Z, Carriero N, Zheng D, Karro J, Harrison PM, Gerstein M. PseudoPipe: an automated pseudogene identification pipeline. Bioinformatics. 2006 Jun 15;22(12):1437-9. PMID: 16574694