Description
The SIB Genes track is a transcript-based set of gene predictions derived
from RefSeq and EMBL/GenBank data. Every gene is supported by at least one
GenBank full-length RNA sequence, one RefSeq RNA, or one spliced EST. The
track includes both protein-coding and non-coding transcripts. The coding
regions are predicted using ESTScan.
Display Conventions and Configuration
This track generally follows the display conventions for gene prediction
tracks. Exons of putative non-coding genes and untranslated regions are
drawn as relatively thin blocks, while those of coding open reading frames
are thicker.
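As a rough illustration of the thin/thick convention, the sketch below splits a transcript's exons into thin (untranslated or non-coding) and thick (coding) segments. It assumes half-open coordinates and a single coding range given as cds_start and cds_end, with cds_start equal to cds_end for a non-coding transcript. The function name and the example coordinates are hypothetical and are not taken from the track's own code.

```python
def split_exons(exons, cds_start, cds_end):
    """Split exons into ('thin' | 'thick', start, end) drawing segments.

    exons: list of (start, end) tuples, half-open, sorted by start.
    cds_start, cds_end: coding range; equal values mean "non-coding".
    """
    segments = []
    for start, end in exons:
        no_cds = cds_start >= cds_end
        if no_cds or end <= cds_start or start >= cds_end:
            # Exon lies entirely outside the coding range: all thin.
            segments.append(("thin", start, end))
            continue
        if start < cds_start:
            # 5' part of the exon before the coding range.
            segments.append(("thin", start, cds_start))
        segments.append(("thick", max(start, cds_start), min(end, cds_end)))
        if end > cds_end:
            # 3' part of the exon after the coding range.
            segments.append(("thin", cds_end, end))
    return segments


# Hypothetical three-exon transcript with a coding range of 150..550.
example = split_exons([(100, 200), (300, 400), (500, 600)], 150, 550)
```

Here the first and last exons each yield one thin and one thick segment, while the middle exon is entirely thick.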
This track contains an optional codon coloring
feature that allows users to quickly validate and compare gene predictions.
To display codon colors, select the genomic codons option from the
Color track by codons pull-down menu. Go to the
Coloring Gene Predictions and
Annotations by Codon page for more information about this feature.
Further information on the predicted transcripts can be found on the
Transcriptome Web interface.
Methods
The SIB Genes are built using a multi-step pipeline:
- RefSeq and GenBank RNAs and ESTs are aligned to the genome with SIBsim4,
keeping only the best alignments for each RNA.
- Alignments are broken up at non-intronic gaps, and small isolated
fragments are discarded.
- A splicing graph is created for each set of overlapping alignments. This
graph has an edge for each exon or intron, and a vertex for each splice site,
start, and end. Each RNA that contributes to an edge is kept as evidence for
that edge.
- The graph is traversed to generate all unique transcripts. The traversal is
guided by the initial RNAs to avoid a combinatorial explosion in alternative
splicing.
- Protein predictions are generated.
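The splicing-graph and traversal steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SIB pipeline itself: alignments are given as sorted lists of half-open exon blocks, vertices are plain genomic positions, and the "guiding" rule is assumed to be that two consecutive edges must share at least one supporting RNA. The source does not specify the actual rule, and all function names here are hypothetical.

```python
from collections import defaultdict


def build_splicing_graph(alignments):
    """Map each edge to the set of RNA ids that support it.

    alignments: dict of rna_id -> list of (start, end) exon blocks,
    sorted by start. An edge is (kind, from_pos, to_pos), where kind is
    "exon" or "intron"; the positions act as the graph's vertices.
    """
    evidence = defaultdict(set)
    for rna_id, exons in alignments.items():
        for i, (start, end) in enumerate(exons):
            evidence[("exon", start, end)].add(rna_id)
            if i + 1 < len(exons):
                evidence[("intron", end, exons[i + 1][0])].add(rna_id)
    return dict(evidence)


def enumerate_transcripts(alignments):
    """Return the sorted unique transcripts as tuples of (start, end) exons."""
    evidence = build_splicing_graph(alignments)
    outgoing = defaultdict(list)
    for edge in evidence:
        outgoing[edge[1]].append(edge)  # index edges by their start vertex

    # Paths may only begin and finish where some RNA begins or finishes.
    first_exons = {("exon", *exons[0]) for exons in alignments.values()}
    last_exons = {("exon", *exons[-1]) for exons in alignments.values()}

    transcripts = set()

    def extend(path):
        last = path[-1]
        if last in last_exons:
            transcripts.add(tuple((e[1], e[2]) for e in path if e[0] == "exon"))
        wanted = "intron" if last[0] == "exon" else "exon"
        for edge in outgoing[last[2]]:
            # Assumed guiding rule: consecutive edges share a supporting RNA.
            if edge[0] == wanted and evidence[edge] & evidence[last]:
                extend(path + [edge])

    for edge in first_exons:
        extend([edge])
    return sorted(transcripts)


# Hypothetical input: rna3 duplicates rna1, rna2 skips the middle exon.
rnas = {
    "rna1": [(100, 200), (300, 400), (500, 600)],
    "rna2": [(100, 200), (500, 600)],
    "rna3": [(100, 200), (300, 400), (500, 600)],
}
result = enumerate_transcripts(rnas)
```

With this input, result holds two transcripts: the full three-exon form and the exon-skipping form. The duplicate RNA adds evidence to existing edges without producing a new transcript.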
Credits
The SIB Genes track was produced on the Vital-IT high-performance computing
platform using a computational pipeline developed by Christian Iseli with
help from colleagues at the Ludwig Institute for Cancer Research and the
Swiss Institute of Bioinformatics. It is based on data from NCBI RefSeq and
GenBank/EMBL. Our thanks to the people running these databases and to the
scientists worldwide who have made contributions to them.
References
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL.
GenBank: update.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.
PMID: 14681350; PMC: PMC308779