Description
This track flags sections of the genome that often cause problems in bioinformatics analyses.
The 12 subtracks identify genomic regions known to cause artifacts in common downstream
analyses of sequencing data, such as alignment, variant calling, or peak calling. The underlying data were
imported from the
NCBI GeT-RM, the
Genome-in-a-Bottle,
and Anshul Kundaje's ENCODE Blacklist projects.
The only exception is the UCSC Unusual Regions subtrack, which contains annotations of
a few special gene clusters (IGH, IGL, PAR1/2, TCRA, TCRB, etc.) and fixed sequences,
alternate haplotypes, unplaced contigs, pseudo-autosomal regions, and
mitochondria. These loci can yield alignments with low-quality mapping scores and
discordant read pairs. This data set
was manually curated, based on the Genome Browser's assembly
description, the FAQs about assembly,
and the NCBI RefSeq "other" annotations
track data.
The ENCODE Blacklist subtrack contains a comprehensive set of regions which are troublesome
for high-throughput Next-Generation Sequencing (NGS) aligners. These regions tend to have a very
high ratio of multi-mapping to unique mapping reads and high variance in mappability due to
repetitive elements such as satellite, centromeric and telomeric repeats.
The Genome-in-a-Bottle (GIAB) track set contains defined regions where it is difficult to
make a confident call, due to low coverage, systematic sequencing errors, and local alignment
problems. These regions were identified from sequencing data generated by multiple technologies.
The NCBI GeT-RM (Genetic Testing Reference Materials) track set contains highly homologous
gene- and exon-level regions that are difficult
or impossible to analyze with standard Sanger or short-read NGS approaches and that are relevant to
current clinical testing.
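As an illustration of how regions like these might be used downstream, the following minimal sketch checks whether a position (for example, a variant call) falls inside any problematic region. It is not part of the track itself. The function names `index_regions` and `in_problematic_region` are hypothetical, and the sketch assumes the regions have already been loaded as (chrom, start, end) tuples in BED coordinates (0-based start, exclusive end).

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # (start, end): 0-based start, exclusive end, as in BED


def index_regions(regions: Iterable[Tuple[str, int, int]]) -> Dict[str, List[Interval]]:
    """Group regions by chromosome, sort them, and merge overlapping or adjacent ones."""
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    merged: Dict[str, List[Interval]] = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out: List[Interval] = []
        for start, end in intervals:
            if out and start <= out[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_problematic_region(index: Dict[str, List[Interval]], chrom: str, pos: int) -> bool:
    """Return True if the 0-based position lies inside any indexed region."""
    intervals = index.get(chrom, [])
    # Find the last interval whose start is <= pos (intervals are disjoint and sorted).
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]
```

Merging the intervals first keeps the lookup to a single binary search per query. Positions reported in 1-based coordinates (as in VCF files) would need to be decremented by one before the lookup.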
Display Conventions and Configuration
Each track contains a set of regions of varying length with no special configuration options.
The UCSC Unusual Regions track has a mouse-over description; all other tracks have at most
a name field, which can be shown in pack mode. The tracks are usually kept in dense mode.
The Hide empty subtracks control hides subtracks with no data in the browser window. Changing the browser window by zooming or scrolling may result in the display of a different selection of tracks.
Data access
The raw data can be explored interactively with the Table Browser
or the Data Integrator.
For automated download and analysis, the genome annotation is stored in bigBed files that
can be downloaded from
our download server.
Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed,
which can be compiled from the source code or downloaded as a precompiled
binary for your system. Instructions for downloading source code and binaries can be found
here.
The tool can also be used to obtain only features within a given range, e.g.:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/bbi/problematic/deadZone.bb -chrom=chr21 -start=0 -end=100000000 stdout
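For scripted use, the same range query can be wrapped in a few lines of Python. The sketch below is only an illustration: it assumes that bigBedToBed is installed and on the PATH, that its output is tab-separated BED, and that a fourth column, when present, holds the region name. The names `BedRegion`, `build_command`, `parse_bed`, and `fetch_regions` are hypothetical helpers, not part of the UCSC tools.

```python
import subprocess
from typing import List, NamedTuple, Optional


class BedRegion(NamedTuple):
    chrom: str
    start: int  # 0-based start
    end: int    # exclusive end
    name: str   # empty string when the line has no name column


def build_command(source: str, chrom: Optional[str] = None,
                  start: Optional[int] = None, end: Optional[int] = None) -> List[str]:
    """Assemble the bigBedToBed argument list; range options are added only when given."""
    cmd = ["bigBedToBed", source]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd.append("stdout")  # write BED lines to standard output
    return cmd


def parse_bed(text: str) -> List[BedRegion]:
    """Parse tab-separated BED text, keeping the first three columns plus an optional name."""
    regions = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else ""
        regions.append(BedRegion(fields[0], int(fields[1]), int(fields[2]), name))
    return regions


def fetch_regions(source: str, chrom: Optional[str] = None,
                  start: Optional[int] = None, end: Optional[int] = None) -> List[BedRegion]:
    """Run bigBedToBed (assumed to be on the PATH) and return the parsed regions."""
    result = subprocess.run(build_command(source, chrom, start, end),
                            capture_output=True, text=True, check=True)
    return parse_bed(result.stdout)
```

Calling `fetch_regions` with the URL and range from the command above would run the same query; omitting the range arguments requests the whole file.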
Methods
Files were downloaded from the respective databases and converted to bigBed format.
The procedure is documented in our
hg19 makeDoc file (search for "problematic").
Credits
Thanks to Anna Benet-Pages, Max Haeussler, Angie Hinrichs, and Daniel Schmelter
at the UCSC Genome Browser for planning, building, and testing these tracks. The
underlying data comes from the
ENCODE Blacklist, the
GeT-RM,
and the
Genome-in-a-Bottle
projects.
References
Amemiya HM, Kundaje A, Boyle AP.
The ENCODE Blacklist: Identification of Problematic Regions of the Genome.
Sci Rep. 2019 Jun 27;9(1):9354.
PMID: 31249361; PMC: PMC6597582
Zook JM, Chapman B, Wang J, Mittelman D, Hofmann O, Hide W, Salit M.
Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype
calls.
Nat Biotechnol. 2014 Mar;32(3):246-51.
PMID: 24531798
Mandelker D, Schmidt RJ, Ankala A, McDonald Gibson K, Bowser M, Sharma H, Duffy E, Hegde M,
Santani A, Lebo M et al.
Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical
next-generation sequencing.
Genet Med. 2016 Dec;18(12):1282-1289.
PMID: 27228465