FRAMESHIFT SHORT VARIANTS 

INTRODUCTION TO SHORT VARIANTS

Variants are found throughout all of our genomic sequences. In the past, the word "mutations" was typically used in place of "variants," but present-day nomenclature favors the latter to avoid the negative associations of the former. Some variants are unique to certain individuals or family lineages ("private" variants), while others are common and well documented. They can be completely innocuous, or they can result in faulty proteins with pathogenic effects. The Genome Browser can be used to visualize variants and their consequences.

A variant is typically defined as a deviation from the sequence of the reference assembly, which itself reflects the sequence of one arbitrary individual at any given genomic location. The gene BRCA2 is known for having variants that are associated with breast and ovarian cancer, because its main function is as a tumor suppressor gene that helps repair damaged DNA. It also has a plethora of documented variants, which makes it a good example for exploring different types of variants. We can examine some of these variants by turning on two "dbSNP" tracks, both of which mark variation sites in a gene. "SNP" is an acronym for "Single Nucleotide Polymorphism," a term that has been replaced by the more accurate "SNV" (Single Nucleotide Variant) to avoid misrepresenting allele frequency: "polymorphism" implies that a variant is common in the population, which is usually not the case. The database, however, is still called dbSNP.
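To make "deviation from the reference" concrete, here is a minimal Python sketch that compares a reference allele with an alternate allele and names the kind of deviation. It assumes the "REF/ALT" notation used later in this section for rs80359421 ("CT/-", where "-" stands for no bases). The function name `classify_variant` and its category labels are illustrative choices, not dbSNP's official classification scheme.

```python
def classify_variant(alleles: str) -> str:
    """Classify a single 'REF/ALT' allele string such as 'C/T' or 'CT/-'.

    '-' denotes an empty allele (no bases). Only one ALT allele is handled.
    """
    ref, alt = ("" if a == "-" else a for a in alleles.split("/"))
    if not ref and not alt:
        raise ValueError("both alleles are empty")
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"        # single nucleotide variant
    if ref and not alt:
        return "deletion"   # bases present in the reference are missing
    if alt and not ref:
        return "insertion"  # extra bases not present in the reference
    if len(ref) == len(alt):
        return "MNV"        # multi-nucleotide substitution, same length
    return "indel"          # length-changing replacement


print(classify_variant("C/T"))   # SNV
print(classify_variant("CT/-"))  # deletion
print(classify_variant("-/A"))   # insertion
```

An SNV is the case the old "SNP" label described; rs80359421, discussed below, falls into the "deletion" branch.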

In Figures 1 and 2, the "UCSC Genes" track displays exon 11 of the BRCA2 gene. Below the "UCSC Genes" track are two dbSNP tracks, "dbSNP153" (upper) and "dbSNP151" (lower). These tracks come from two different releases of the same database and cover the same rsIDs, with some differences in the data between releases. Clicking on a variant in one track or the other brings up different details. Checking both versions of an rsID ("reference SNP" ID) can provide a wider range of insight into a variant.

Each colored box, labeled with an rsID, marks a documented variation site within the sequence. A green box indicates that the variant is synonymous, meaning it does not change the resulting amino acid. A red box signifies that the nucleotide change does alter the amino acid sequence.
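Whether a single-base change is synonymous comes down to translating the codon before and after the change. The sketch below assumes the standard genetic code and a codon given on the coding strand; `classify_snv` is a hypothetical helper, not a Genome Browser or dbSNP function. By the coloring described above, "synonymous" would correspond to a green box and the other two outcomes to red.

```python
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change at 0-based `position` within `codon`."""
    ref_aa = CODON_TABLE[codon]
    alt_codon = codon[:position] + alt_base + codon[position + 1:]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"  # same amino acid
    if alt_aa == "*":
        return "nonsense"    # change creates a stop codon
    return "missense"        # a different amino acid (also covers stop loss here)


print(classify_snv("ACT", 2, "C"))  # synonymous: ACT and ACC both encode threonine
print(classify_snv("ACT", 0, "G"))  # missense: GCT encodes alanine
print(classify_snv("TGG", 2, "A"))  # nonsense: TGA is a stop codon
```

For simplicity, a change that turns an existing stop codon into an amino acid is lumped in with "missense" here.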

Figure 1. Zoomed-in view of BRCA2, exon 11 (the window is 23 nucleotides across). The DNA sequence is at the top, and the "UCSC Genes" track shows the corresponding amino acids. The lower tracks, labeled "All Short Genetic Variants from dbSNP Release 153" and "Simple Nucleotide Polymorphisms (dbSNP 151)," show identified variant sites.

An active Browser session for this view can be found at https://genome.ucsc.edu/s/education/hg19_BRCA2variants

 

Figure 2. How to enable the "dbSNP153" and "dbSNP151" tracks on human genome assembly hg19. The "UCSC Genes" track shows exon 11 of BRCA2.

 

FRAMESHIFT SHORT VARIANTS 

Figure 3 illustrates an example of a frameshift variant, rs80359421, which is a two-nucleotide deletion (C and T). A frameshift variant is an insertion or deletion of one or more nucleotides that shifts the reading frame of the downstream sequence to the right or left, hence the name "frameshift." The exception is an insertion or deletion of three nucleotides, or a multiple of three, which adds or removes whole codons and leaves the overall reading frame intact. While the other examples (synonymous, missense, and nonsense) are single-nucleotide changes, frameshift variants can involve multiple nucleotides, so this variant is shown as an elongated red box spanning two nucleotides. As with the other variants, clicking on the box will bring you to its details page.
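The "multiple of three" rule above reduces to one arithmetic check on the net change in length. Here is a minimal sketch, reusing the "-" notation for an empty allele; `frame_effect` is a hypothetical name chosen for illustration. Python's `%` returns a non-negative result for a positive divisor, so a two-base deletion (net change -2) gives `-2 % 3 == 1` and is correctly flagged.

```python
def frame_effect(ref: str, alt: str) -> str:
    """Report whether an insertion/deletion shifts the reading frame.

    '-' denotes an empty allele, as in rs80359421 (CT/-).
    """
    ref = "" if ref == "-" else ref
    alt = "" if alt == "-" else alt
    net_change = len(alt) - len(ref)  # negative for deletions
    if net_change % 3 == 0:
        return "in-frame"             # whole codons gained or lost
    return "frameshift"


print(frame_effect("CT", "-"))   # frameshift: two bases deleted
print(frame_effect("CTG", "-"))  # in-frame: one whole codon deleted
print(frame_effect("-", "A"))    # frameshift: one base inserted
```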

Figure 3. (https://genome.ucsc.edu/s/education/hg19_BRCA2frameshift) Highlighted in turquoise is rs80359421 CT/- (CT deletion), seen on both the "dbSNP153" and "dbSNP151" tracks. Follow the highlight up to see the corresponding reference nucleotides (CT). The black box marks the codon that includes the highlighted alleles. Below, the "UCSC Genes" track shows the amino acid (threonine 1346) encoded by the codon boxed in black.

To visualize how this two-nucleotide deletion affects the overall reading frame, you can set the "Base Position" track to "full," which shows three alternate reading frames of the sequence, and use the highlight tool to your advantage, as seen in Figure 4.

Figure 4. Visualizing reading-frame shifts: "Base Position" (set to "full" in this figure, but "dense" in all previous figures) and "UCSC Genes" are the only tracks enabled. Both "dbSNP" tracks are disabled for visual clarity. The C and T alleles, deleted in rs80359421, are highlighted in turquoise; this variant's resulting codon is highlighted in yellow. The stop codon in the resulting reading frame is red and highlighted in orange.

The amino acid shown on the "UCSC Genes" track ("T 1346") lines up with the reference-sequence codon "ACT." This is only one potential reading frame of the BRCA2 sequence; with the "Base Position" track set to "full," two additional alternate reading frames are displayed in the Browser. If an individual carries the dbSNP variant rs80359421, the C and T alleles (highlighted in turquoise in Figure 4) are deleted. The pale yellow highlight in Figure 5 illustrates the resulting codon change: if the C and T nucleotides (now highlighted in pale green) are deleted, the codon instead reads AGT, which encodes serine. The new codon and the pale yellow column line up with the top reading frame on the "Base Position" track (boxed in black). In that frame, the stop codon lies only four codons downstream, after the L, Y, and S amino acids, which are not the correct amino acids for the BRCA2 protein. The rs80359421 variant therefore shifts the reading frame so that a stop codon appears much earlier than in the original reading frame.
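The frame-shift reasoning above can be reproduced in a few lines of Python. This sketch translates a sequence in the three forward reading frames (comparable to what the "full" Base Position track displays), deletes two bases from an ACT codon, and reports where the first stop codon falls in the shifted frame. The sequence is a made-up toy, constructed only so that it begins with ACT and its mutant frame mirrors the pattern described above; it is not the real BRCA2 sequence, and all helper names are hypothetical.

```python
from typing import List, Optional

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate complete codons from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def three_frames(seq: str) -> List[str]:
    """Translations starting at offsets 0, 1 and 2 (forward strand only)."""
    return [translate(seq[offset:]) for offset in range(3)]


def delete_bases(seq: str, start: int, length: int) -> str:
    """Remove `length` bases starting at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


def first_stop(protein: str) -> Optional[int]:
    """Index of the first stop ('*'), or None if the frame stays open."""
    index = protein.find("*")
    return index if index != -1 else None


# Hypothetical toy sequence: starts with an ACT (threonine) codon; the rest is invented.
reference = "ACTGTCTTTATTCTTAA"
mutant = delete_bases(reference, 1, 2)  # drop the "CT" of ACT, leaving AGT...

print(three_frames(reference))        # ['TVFIL', 'LSLFL', 'CLYS*']
print(translate(mutant))              # SLYS*
print(first_stop(translate(mutant)))  # 4
```

In this toy case the mutant translation "SLYS*" matches the reference's third frame ("CLYS*") everywhere except the first codon: after a two-base deletion, the downstream bases are read in what used to be a different reading frame, and the stop codon sits four codons after the new AGT codon.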

Figure 5. (https://genome.ucsc.edu/s/education/hg19_BRCA2frames) The codon produced by the rs80359421 variant is highlighted in pale yellow (imagine that the CT alleles, highlighted in pale green, are deleted). The resulting reading frame is boxed in black on the "Base Position" track; this reading frame's stop codon is highlighted in red.

Written by Zoë Shmidt, UCSC. Major: BA, Biological Anthropology