Description
This track shows approximately 4.5 million single nucleotide variants (SNVs) and
0.6 million short insertions/deletions (indels) from 7 different parent/child trios as
produced by the International Genome Sample Resource (IGSR), from sequence data generated
by the 1000 Genomes Project in its Phase 3 sequencing of 2,504 genomes from 26 populations
worldwide.
Variants were called on the autosomes (chromosomes 1 through 22) and on the
Pseudo-Autosomal Regions (PARs) of chromosome X.
Therefore, this track has no annotations on alternate haplotype sequences, fix patches,
chromosome Y, or the non-PAR portion (the majority) of chromosome X.
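The coverage rule above can be expressed as a small lookup. This is a minimal sketch, assuming GRCh38 coordinates and UCSC-style chromosome names; the PAR boundaries used (PAR1 chrX:10,001-2,781,479 and PAR2 chrX:155,701,383-156,030,895) are the standard GRCh38 values and are not stated in this description. The function name `has_trio_calls` is hypothetical.

```python
# Standard GRCh38 pseudo-autosomal regions on chrX (1-based, inclusive).
# These coordinates are an assumption taken from the GRCh38 assembly, not this track.
PAR_REGIONS = [(10_001, 2_781_479), (155_701_383, 156_030_895)]
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}


def has_trio_calls(chrom, pos):
    """Hypothetical helper: True if a position lies where this track can have variants.

    Autosomes are fully covered; on chrX only the two PARs are. Everything else
    (chrY, non-PAR chrX, alternate haplotypes, fix patches) has no calls.
    """
    if chrom in AUTOSOMES:
        return True
    if chrom == "chrX":
        return any(start <= pos <= end for start, end in PAR_REGIONS)
    return False


print(has_trio_calls("chr7", 117_559_590))      # True (autosome)
print(has_trio_calls("chrX", 1_000_000))        # True (inside PAR1)
print(has_trio_calls("chrX", 50_000_000))       # False (non-PAR chrX)
print(has_trio_calls("chr6_GL000250v2_alt", 1)) # False (alternate haplotype)
```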
The variant genotypes have been phased (i.e., the two alleles of each diploid genotype
have been assigned to two haplotypes, one inherited from each parent). This information
allows us to illustrate which haplotypes in the child have been inherited from which parent.
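In a VCF, a phased genotype is written with a "|" separator (for example 0|1), so the left and right allele indices at consecutive sites can be read off as two haplotypes. The sketch below shows that idea, assuming diploid GT strings with no missing (".") alleles; the function names are hypothetical, and which side corresponds to which parent is not asserted here.

```python
def split_phased_gt(gt):
    """Split a phased diploid GT value such as "0|1" into two allele indices."""
    if "|" not in gt:
        raise ValueError(f"genotype {gt!r} is not phased")
    left, right = gt.split("|")
    return int(left), int(right)


def haplotypes(gts):
    """Turn phased GTs at consecutive sites into two per-haplotype allele lists."""
    hap_a, hap_b = [], []
    for gt in gts:
        a, b = split_phased_gt(gt)
        hap_a.append(a)  # allele on the left-hand haplotype
        hap_b.append(b)  # allele on the right-hand haplotype
    return hap_a, hap_b


print(haplotypes(["0|1", "1|0", "1|1", "0|1"]))
# ([0, 1, 1, 0], [1, 0, 1, 1])
```

Alleles that end up in the same list are the ones drawn together on one haplotype line.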
Trios from six different populations are available:
- YRI - Yoruba in Ibadan, Nigeria
- KHV - Kinh in Ho Chi Minh City, Vietnam
- PUR - Puerto Ricans from Puerto Rico
- CEU - Utah residents (CEPH) with Northern and Western European ancestry
- CHS - Southern Han Chinese
- MXL - Mexican Ancestry from Los Angeles
Display Conventions and Configuration
This track illustrates the vcfPhasedTrio track type, which draws two lines per sample in the
underlying VCF, one for each copy of the chromosome in the diploid genome. Each variant in the
window is drawn on the line of the haplotype it belongs to, so variants on the same line were
likely inherited together. The sorting routine is the same one used to draw the
haplotype-sorted display in the non-trio 1000 Genomes track, and is described in that track's
documentation.
The child haplotypes are drawn in the center of each group, flanked above and below by the
parent haplotypes, and variants are sorted to show the transmitted alleles:
- parent 1 untransmitted haplotype
- parent 1 transmitted haplotype
- child haplotype inherited from parent 1
- child haplotype inherited from parent 2
- parent 2 transmitted haplotype
- parent 2 untransmitted haplotype
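The six-row layout above can be written down directly once it is known which haplotype (0 or 1) each parent transmitted and which child haplotype came from parent 1. This is a minimal sketch of the default ordering only; how the browser decides those indices is not shown, and `trio_row_order` and its parameters are hypothetical.

```python
def trio_row_order(p1_transmitted, p2_transmitted, child_from_p1):
    """Return (sample, haplotype_index) pairs for the six rows, top to bottom.

    Each argument is a haplotype index (0 or 1); the other index of the same
    sample is its complement.
    """
    child_from_p2 = 1 - child_from_p1
    return [
        ("parent1", 1 - p1_transmitted),  # parent 1 untransmitted
        ("parent1", p1_transmitted),      # parent 1 transmitted
        ("child", child_from_p1),         # child haplotype from parent 1
        ("child", child_from_p2),         # child haplotype from parent 2
        ("parent2", p2_transmitted),      # parent 2 transmitted
        ("parent2", 1 - p2_transmitted),  # parent 2 untransmitted
    ]


for row in trio_row_order(p1_transmitted=1, p2_transmitted=0, child_from_p1=0):
    print(row)
```

Placing the two transmitted parental haplotypes next to the child rows keeps the alleles being compared visually adjacent.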
Track configuration options include:
- Showing the child haplotypes below the parent(s)
- Switching the haplotype labels between mother/father/child and VCF sample IDs
- Hiding the parent samples
Allele coloring options include:
- No shading - the default option
- Shading by functional effect of the variant relative to NCBI RefSeq Curated Transcripts:
  - reference alleles invisible
  - alternate alleles in red for non-synonymous
  - alternate alleles in green for synonymous
  - alternate alleles in blue for UTR/noncoding
  - alternate alleles in black otherwise
- Child de novo alleles in red - all alternate alleles are black except where the child has
  an allele not present in either parent
- Child alleles "inconsistent" with phasing in red - all alternate alleles are black except
  where the "inherited" child allele does not match the "transmitted" parent allele. Because
  all the variants in the window are treated as one haplotype, which haplotypes count as
  "inherited" and "transmitted" depends on the viewing location. As the window moves and the
  alleles used for sorting change, whether an allele is marked as inconsistent can change
  as well.
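The two child-specific coloring modes can be illustrated with a short sketch. The de novo check follows the rule stated above (a child allele absent from both parents). For the inconsistency check, the rule for picking the "transmitted" parental haplotype is a hypothetical stand-in (the parental haplotype sharing the most alleles with the child haplotype in the window, ties going to the first); it is not the browser's actual sorting routine. All function names are hypothetical.

```python
def de_novo_sites(child, mother, father):
    """Indices of sites where the child carries an allele found in neither parent.

    Each argument is a list of per-site genotypes given as (allele_a, allele_b).
    """
    flagged = []
    for i, (c, m, f) in enumerate(zip(child, mother, father)):
        parental = set(m) | set(f)
        if any(allele not in parental for allele in c):
            flagged.append(i)
    return flagged


def transmitted_index(child_hap, parent_haps):
    """Hypothetical rule: the parental haplotype matching the child at the most sites."""
    matches = [sum(c == p for c, p in zip(child_hap, hap)) for hap in parent_haps]
    # max() returns the first maximal index, so ties go to haplotype 0.
    return max(range(len(parent_haps)), key=lambda i: matches[i])


def inconsistent_sites(child_hap, parent_haps):
    """Sites where the inherited child allele differs from the transmitted parental allele."""
    transmitted = parent_haps[transmitted_index(child_hap, parent_haps)]
    return [i for i, (c, p) in enumerate(zip(child_hap, transmitted)) if c != p]


print(de_novo_sites([(0, 1), (0, 1)], [(0, 0), (0, 1)], [(0, 0), (0, 0)]))  # [0]

# Over a four-site window the last site is flagged...
print(inconsistent_sites([1, 0, 1, 1], [[1, 0, 1, 0], [0, 1, 0, 1]]))  # [3]
# ...but a window holding only that site picks the other parental haplotype.
print(inconsistent_sites([1], [[0], [1]]))  # []
```

The last two calls show, under this assumed rule, why an allele's inconsistent status can change as the viewing window changes.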
From the subtrack configure menu, there is the option to manually rearrange
the family order for each trio by dragging haplotypes.
Clicking on a variant opens a details page with the standard VCF details, including
INFO column annotations, the REF and ALT alleles, and the genotypes from all three samples.
Methods
The genomes of 2,504 individuals were sequenced using both whole-genome sequencing
(mean depth = 7.4x) and targeted exome sequencing (mean depth = 65.7x).
Sequence reads were aligned to the reference genome using alt-aware BWA-MEM
(Zheng-Bradley et al.).
Variant discovery and quality control were performed as described in
Lowy-Gallego et al.
UCSC Methods
Trio samples were extracted from both the main 1000 Genomes sample set and the set of
related samples, using the 1000 Genomes pedigree information. Variants that were homozygous
reference in all three members of a trio were removed.
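The extraction and filtering step can be sketched over plain-text VCF lines: keep the header, read the GT value for the three trio columns, and drop a site only if every trio genotype is homozygous reference. This is a minimal sketch, assuming uncompressed tab-separated VCF text and that the trio's 0-based column indices are already known from the pedigree; `filter_trio_sites` and the example records are hypothetical, and missing calls (".") are deliberately not treated as reference.

```python
def is_hom_ref(gt):
    """True if every allele in a GT value (phased or unphased) is the reference allele 0."""
    return all(allele == "0" for allele in gt.replace("|", "/").split("/"))


def filter_trio_sites(lines, trio_cols):
    """Yield header lines and any site where at least one trio member carries a non-0 allele."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        gt_idx = fields[8].split(":").index("GT")  # position of GT within FORMAT
        gts = [fields[col].split(":")[gt_idx] for col in trio_cols]
        if not all(is_hom_ref(gt) for gt in gts):
            yield line


# Hypothetical records: OTHER is an unrelated sample outside the trio.
EXAMPLE_LINES = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tCHILD\tMOTHER\tFATHER\tOTHER\n",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|0\t0|0\t0|0\t0|1\n",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0|1\t0|0\t1|0\t0|0\n",
]

kept = list(filter_trio_sites(EXAMPLE_LINES, trio_cols=[9, 10, 11]))
print(len(kept))  # 2: the header and the chr1:200 site
```

The chr1:100 site is dropped even though OTHER carries the alternate allele, because only the trio columns are considered.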
Data Access
Trio VCFs are available for download from our download server.
Credits
Thanks to the International Genome Sample Resource (IGSR) for making these variant calls
freely available.
References
Zheng-Bradley X, Streeter I, Fairley S, Richardson D, Clarke L, Flicek P, 1000 Genomes Project Consortium.
Alignment of 1000 Genomes Project reads to reference assembly GRCh38.
Gigascience. 2017 Jul 1;6(7):1-8.
PMID: 28531267; PMC: PMC5522380
Fairley S, Lowy-Gallego E, Perry E, Flicek P.
The International Genome Sample Resource (IGSR) collection of open human genomic variation resources.
Nucleic Acids Res. 2019 Oct 4.
PMID: 31584097
Lowy-Gallego E, Fairley S, Zheng-Bradley X, Ruffier M, Clarke L, Flicek P, 1000 Genomes Project Consortium.
Variant calling on the GRCh38 assembly with the data from phase three of the 1000 Genomes Project [version 1; peer review: 2 not approved].
Wellcome Open Res. 2019 Mar 11.
1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA et al.
A global reference for human genetic variation.
Nature. 2015 Oct 1;526(7571):68-74.
PMID: 26432245