Description
These tracks indicate regions with uniquely mappable reads of particular lengths before and after
bisulfite conversion. Both Umap and Bismap tracks contain single-read mappability and multi-read
mappability tracks for four different read lengths: 24 bp, 36 bp, 50 bp, and 100 bp.
You can use these tracks for many purposes, including filtering unreliable signal from
sequencing assays. The Bismap track can help filter unreliable signal from sequencing assays
involving bisulfite conversion, such as whole-genome bisulfite sequencing or reduced representation
bisulfite sequencing.
Bismap single-read and multi-read mappability
- Bismap single-read mappability
These tracks mark any region of the bisulfite-converted genome that is uniquely mappable by
at least one k-mer on the specified strand. Mappability of the forward strand was
generated by converting all instances of cytosine to thymine. Similarly, mappability of the
reverse strand was generated by converting all instances of guanine to adenine.
To calculate the single-read mappability of a given region, find its overlap with the
region that is uniquely mappable on both strands. Regions that are not uniquely mappable on
both strands, or that have low multi-read mappability, might bias downstream analysis.
- Bismap multi-read mappability
These tracks represent the probability that a randomly selected k-mer that overlaps
a given position is uniquely mappable. The multi-read mappability track is calculated for
k-mers that are uniquely mappable on both strands, so it has no strand
specification.
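The in-silico conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the Bismap implementation: the function name `bisulfite_convert` and the toy sequence are assumptions made for this example.

```python
def bisulfite_convert(seq: str, strand: str = "+") -> str:
    """Toy in-silico bisulfite conversion (hypothetical helper, not Bismap code).

    Forward strand ("+"): every cytosine becomes thymine.
    Reverse strand ("-"): every guanine becomes adenine.
    """
    seq = seq.upper()
    if strand == "+":
        return seq.replace("C", "T")
    if strand == "-":
        return seq.replace("G", "A")
    raise ValueError("strand must be '+' or '-'")


# Made-up sequence, for illustration only.
genome = "ACGTTCGGA"
forward = bisulfite_convert(genome, "+")
reverse = bisulfite_convert(genome, "-")
print(forward)
print(reverse)
```

Each converted sequence uses only three letters, so fewer k-mers remain distinct than in the unconverted genome.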
Umap single-read and multi-read mappability
- Umap single-read mappability
These tracks mark any region of the genome that is uniquely mappable by at least one
k-mer. To calculate the single-read mappability of a given region, find its overlap
with this track.
- Umap multi-read mappability
These tracks represent the probability that a randomly selected k-mer that overlaps
a given position is uniquely mappable.
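The difference between the two definitions can be seen on a toy sequence. The sketch below is not the Umap algorithm. It assumes that a k-mer is "uniquely mappable" if it occurs exactly once in a single short string, and it ignores the reverse complement and any aligner behavior. All function names are hypothetical.

```python
from collections import Counter


def unique_kmer_starts(seq, k):
    """flags[i] is True if the k-mer starting at i occurs exactly once in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [counts[kmer] == 1 for kmer in kmers]


def single_read_mappability(flags, k, length):
    """A position is marked if at least one unique k-mer overlaps it."""
    covered = [False] * length
    for start, is_unique in enumerate(flags):
        if is_unique:
            for pos in range(start, start + k):
                covered[pos] = True
    return covered


def multi_read_mappability(flags, k, length):
    """Fraction of the k-mers overlapping each position that are unique."""
    scores = []
    for pos in range(length):
        first = max(0, pos - k + 1)   # leftmost k-mer start covering pos
        last = min(pos, length - k)   # rightmost k-mer start covering pos
        overlapping = flags[first:last + 1]
        scores.append(sum(overlapping) / len(overlapping))
    return scores


# Made-up sequence: the repeated "AAA" k-mer is not unique.
seq, k = "AAAACGT", 3
flags = unique_kmer_starts(seq, k)
single = single_read_mappability(flags, k, len(seq))
multi = multi_read_mappability(flags, k, len(seq))
print(single)
print(multi)
```

In this toy case, position 2 is covered by one unique k-mer, so it counts as single-read mappable, but only one of the three k-mers overlapping it is unique, so its multi-read score is 1/3.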
For greater detail and explanatory diagrams, see the preprint, the Umap and Bismap project
website, or the Umap and Bismap software documentation.
Data Access
The raw data can be explored interactively with the Table Browser or the Data Integrator. For
automated analysis, the genome annotation is stored in a bigBed or bigWig file that can be
downloaded from the download server. Individual regions or the whole genome annotation can be
obtained using our tools bigBedToBed or bigWigToWig, which can be compiled from the source code or
downloaded as precompiled binaries for your system. Instructions for downloading source code and
binaries can be found here.
The tools can also be used to obtain only features within a given range, for example:
bigBedToBed -chrom=chr6 -start=0 -end=1000000 http://hgdownload.soe.ucsc.edu/gbdb/hg38/hoffmanMappability/k24.Unique.Mappability.bb stdout
bigWigToWig -chrom=chr6 -start=0 -end=1000000 http://hgdownload.soe.ucsc.edu/gbdb/hg38/hoffmanMappability/k24.Umap.MultiTrackMappability.bw stdout
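Once BED intervals have been written out by bigBedToBed, the overlap of a region with the single-read track can be computed with a short script. The following is a rough sketch: it assumes tab-separated BED lines with at least three columns, the function names are hypothetical, and the sample intervals are made up rather than taken from the track.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping header lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def mappable_fraction(intervals, chrom, start, end):
    """Fraction of the half-open region [start, end) covered by the intervals."""
    clipped = sorted(
        (max(s, start), min(e, end))
        for c, s, e in intervals
        if c == chrom and s < end and e > start
    )
    covered = 0
    cur_end = start
    for s, e in clipped:
        s = max(s, cur_end)  # do not count overlapping intervals twice
        if e > s:
            covered += e - s
            cur_end = e
    return covered / (end - start)


# Made-up intervals standing in for bigBedToBed output.
sample = [
    "chr6\t100\t200",
    "chr6\t150\t300",
    "chr6\t900\t1100",
    "chr1\t0\t1000",
]
intervals = parse_bed(sample)
fraction = mappable_fraction(intervals, "chr6", 0, 1000)
print(fraction)
```

In practice the lines would come from a file or pipe holding the bigBedToBed output instead of the `sample` list.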
Please refer to our mailing list archives for questions, or our Data Access FAQ for more
information.
Credits
Anshul Kundaje (Stanford University) created the original Umap software in MATLAB. The original
Umap repository is available here.
Mehran Karimzadeh (Michael Hoffman lab, Princess Margaret Cancer Centre) implemented the Python
version of Umap and added features, including Bismap.
References
Karimzadeh M, Ernst C, Kundaje A, Hoffman MM.
Umap and Bismap: quantifying genome and methylome mappability.
bioRxiv, p. 095463, 2016. doi: https://doi.org/10.1101/095463.