Short Variants Tracks
 
Short Variants tracks (All Human Pangenome - HPRC tracks)

HPRC All Variants: HPRC variants decomposed from hprc-v1.0-mc.grch38.vcfbub.a100k.wave.vcf.gz (Liao et al 2023), no size filtering. Source data version: August 2023
HPRC Variants <= 3bp: HPRC VCF variants filtered for item size <= 3bp. Source data version: August 2023
HPRC Variants > 3bp: HPRC VCF variants filtered for item size > 3bp. Source data version: August 2023
Assembly: Human Dec. 2013 (GRCh38/hg38)

Description

This track shows short nucleotide variants (a few base pairs long) identified by aligning HPRC genomes to the hg38 reference assembly. The alignment was made with the Minigraph-Cactus approach described in the references below.

There are three subtracks in this superTrack:

  1. All short variants up to 50bp, without any length filter
  2. All short variants <= 3 bp long
  3. All short variants > 3 bp long

On VCF decomposition, from the HPRC Pangenome Resources GitHub repository: "The Raw VCF files contain a site for each bubble in the graph. Nested bubbles will result in overlapping sites. The nesting relationships are denoted with the PS (parent snarl), LV (level) and AT (allele traversal) tags and need to be taken into account when interpreting the VCF. Alternatively, you can use the 'Decomposed VCFs' which have been normalized by using vcfbub to 'pop' bubbles with alleles larger than 100k and vcfwave to realign each alt (script). Note that in order to reproduce the PanGenie analyses from the papers, you should instead use the PanGenie HPRC Workflow. This workflow has a CHM13 branch to use when working with that reference.

The exact tools and commands used to produce the VCFs are given here."
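The nesting and "popping" described in the quote can be illustrated with a simplified sketch. It assumes each site has already been reduced to its ID, its LV level (0 for top-level bubbles), its PS tag holding the ID of the parent snarl's site, and the length of its longest allele. The `Site` class and `pop_bubbles` function are hypothetical; this is not vcfbub's actual implementation, only the idea of keeping top-level sites and replacing any site whose alleles exceed the limit with its child sites.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Site:
    site_id: str           # VCF ID, e.g. ">1>10" (hypothetical example values)
    level: int             # LV tag: 0 for top-level bubbles, 1 for their children, ...
    parent: Optional[str]  # PS tag: ID of the enclosing (parent) site; None at top level
    max_allele_len: int    # length in bp of the longest REF/ALT allele at this site


def pop_bubbles(sites: List[Site], max_len: int = 100_000) -> List[Site]:
    """Keep top-level sites, replacing any site with an allele longer than
    max_len by its children (which are then checked the same way)."""
    popped: Set[str] = set()  # eligible sites that were too large and got "popped"
    kept: List[Site] = []
    # Processing by level guarantees a parent is decided before its children.
    for site in sorted(sites, key=lambda s: s.level):
        # A site is a candidate only if it is top-level or its parent was popped.
        eligible = site.level == 0 or site.parent in popped
        if not eligible:
            continue
        if site.max_allele_len > max_len:
            popped.add(site.site_id)  # drop it; its children become candidates
        else:
            kept.append(site)
    return kept
```

In this sketch a child of a kept site is never output on its own, which mirrors the idea that a flattened VCF should not contain overlapping nested sites.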

Display Conventions and Configuration

The name of each item is the pair of node labels that denote the site's location in the graph, with '>' and '<' denoting the forward and reverse orientation of each node. Mousing over items in "squish" and "pack" modes shows the item's name and genotypes; mousing over items in "full" mode shows the alleles.
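To make the naming convention concrete, here is a small sketch that splits an item name into its node labels and orientations. It assumes node labels are numeric IDs and that a name looks like `>1000<1002`; the function `parse_site_name` is hypothetical and not part of any browser or vg tool.

```python
import re
from typing import List, Tuple


def parse_site_name(name: str) -> List[Tuple[int, str]]:
    """Split a site name such as '>1000<1002' into (node_id, orientation) pairs.

    '>' marks the forward orientation of a node and '<' the reverse orientation.
    """
    steps = re.findall(r"([<>])(\d+)", name)
    # Reject names that contain anything other than orientation+number steps.
    if not steps or "".join(o + n for o, n in steps) != name:
        raise ValueError(f"unexpected site name: {name!r}")
    return [(int(n), "forward" if o == ">" else "reverse") for o, n in steps]
```

For example, `parse_site_name(">1000<1002")` returns `[(1000, "forward"), (1002, "reverse")]`.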

Methods

The Minigraph-Cactus HPRC v1.0 graph was converted to VCF using vg deconstruct. This result was further postprocessed with vcfbub to flatten nested sites, then with vcfwave to normalize the variants by realigning alt alleles to the reference. All steps are described in Hickey et al 2023. The postprocessing command lines and data can be found on GitHub. Finally, the resulting VCF was filtered by length and split into two VCFs using a cutoff of 3bp (variants <= 3bp and variants > 3bp).
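The final length split can be sketched as follows. The text above does not say how item size is measured, so this sketch assumes, purely for illustration, that a site's size is the length of its longest allele (REF or any ALT, anchor bases included); the actual track may use a different rule. The functions `allele_size` and `split_vcf_lines` are hypothetical.

```python
from typing import Iterable, List, Tuple


def allele_size(ref: str, alt_field: str) -> int:
    # Assumed size rule: length of the longest allele at the site.
    alleles = [ref, *alt_field.split(",")]
    return max(len(a) for a in alleles)


def split_vcf_lines(lines: Iterable[str], cutoff: int = 3) -> Tuple[List[str], List[str]]:
    """Split VCF text lines into (<= cutoff, > cutoff) sets, copying the header to both."""
    header: List[str] = []
    small: List[str] = []
    large: List[str] = []
    for line in lines:
        if line.startswith("#"):
            header.append(line)  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]  # VCF columns: CHROM POS ID REF ALT ...
        (small if allele_size(ref, alt) <= cutoff else large).append(line)
    return header + small, header + large
```

Copying the header into both outputs keeps each resulting file a valid standalone VCF.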

Credits

Thanks to Glenn Hickey for providing the HAL file from the HPRC project and for making these VCFs from it.

References

Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020 Nov;587(7833):246-251. PMID: 33177663; PMC: PMC7673649; DOI: 10.1038/s41586-020-2871-y

Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y; Human Pangenome Reference Consortium; Marschall T, Li H, Paten B. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol. 2023 May 10. PMID: 37165083; DOI: 10.1038/s41587-023-01793-w

Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D. Cactus: Algorithms for genome multiple sequence alignment. Genome Res. 2011 Sep;21(9):1512-28. PMID: 21665927; PMC: PMC3166836; DOI: 10.1101/gr.123356.111

Liao WW, Asri M, Ebler J, ..., Li H, Paten B. A draft human pangenome reference. Nature. 2023 May;617(7960):312-324. PMID: 37165242; PMC: PMC1017212; DOI: 10.1038/s41586-023-05896-x