SYNONYMOUS SHORT VARIANTS 

INTRODUCTION TO SHORT VARIANTS

Variants are found throughout our genomes. In the past, the word "mutation" was typically used in place of "variant," but "variant" is now the preferred term, partly to avoid the negative associations of "mutation." Some variants are unique to certain individuals or families ("private" variants), while others are common and well documented. They can be completely innocuous, or they can result in faulty proteins with pathogenic consequences. The Genome Browser can be used to visualize variants and their consequences.

A variant is typically defined as a deviation from the sequence of the reference assembly, which, at any given genomic location, represents the sequence of one arbitrary individual. The BRCA2 gene is a tumor suppressor that helps repair damaged DNA, and some of its variants are associated with an increased risk of breast and ovarian cancer. Because it also carries a plethora of documented variants, BRCA2 serves as a good example for exploring different types of variants. We can examine some of these by turning on two "dbSNP" tracks, both of which mark known variant sites. "SNP" is an acronym for "Single Nucleotide Polymorphism"; it has largely been replaced by the more accurate "SNV" (Single Nucleotide Variant) to avoid misrepresenting allele frequency: "polymorphism" implies that a variant is common in the population, which is usually not true. The database is still called dbSNP, however.

In Figures 1 and 2, the "UCSC Genes" track displays exon 11 of the BRCA2 gene. Below it are two dbSNP tracks, "dbSNP 153" (upper) and "dbSNP 151" (lower). These are two releases of the same database, so they share most rsIDs and data but differ in some details. Clicking an item in each track opens a details page with different information, so checking both versions of an rsID ("reference SNP" ID) can provide a broader picture of a variant.

Each colored box, labeled with an rsID, marks a documented variant site within the sequence. A green box indicates that the variant is synonymous, meaning it does not change the encoded amino acid. A red box signifies that the nucleotide change does alter the amino acid sequence.
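The green/red distinction comes down to one comparison: translate the codon with the reference allele, translate it again with the alternate allele, and check whether the amino acid is the same. The Python sketch below illustrates that logic under stated assumptions: a single-nucleotide substitution inside one codon, the codon already written 5' to 3' on the coding strand, and the standard genetic code. The function name classify_snv is hypothetical; it is not part of the Genome Browser or dbSNP, which compute their annotations with their own pipelines.

```python
# Standard genetic code: codons enumerated in T, C, A, G order at each position,
# paired with one-letter amino acid codes ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def classify_snv(ref_codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change within one codon (hypothetical helper).

    ref_codon: reference codon, read 5' to 3' on the coding strand.
    position:  0, 1, or 2, the codon position of the changed base.
    alt_base:  the alternate nucleotide.
    Returns "synonymous" (green box) or "amino-acid-changing" (red box).
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:position] + alt_base.upper() + ref_codon[position + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "amino-acid-changing"


print(classify_snv("TGT", 2, "C"))  # TGT -> TGC: cysteine -> cysteine
print(classify_snv("TGT", 1, "A"))  # TGT -> TAT: cysteine -> tyrosine
```

Building the table from the TCAG ordering keeps all 64 codons in one compact string rather than a long hand-typed dictionary.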

Figure 1. Zoomed-in view of BRCA2, exon 11 (the frame is 23 nucleotides across). The DNA sequence is at the top. The "UCSC Genes" track shows the corresponding amino acids. The lower tracks, labeled "All Short Genetic Variants from dbSNP Release 153" and "Simple Nucleotide Polymorphisms (dbSNP 151)", show identified variant sites.

An active Browser session for this view can be found at https://genome.ucsc.edu/s/education/hg19_BRCA2variants

 

Figure 2. How to enable the "dbSNP 153" and "dbSNP 151" tracks on human genome assembly hg19. The "UCSC Genes" track shows exon 11 of BRCA2.

 

SYNONYMOUS SHORT VARIANTS 

Figure 3 shows an example of a synonymous variant. The highlighted turquoise column intersects two green boxes, both labeled rs770199777, which mark a T-to-C nucleotide change. Regardless of which allele is present, the encoded amino acid does not change, which is why the variant is displayed in green. Some genes are transcribed left to right (top strand of the displayed sequence), while others are transcribed right to left (bottom strand). Here, the amino acid numbering increases from left to right (i.e., "V1347", "C1348", "I1349"), indicating that this transcript is on the top strand and its codons can be read directly from the displayed sequence. With this in mind, we can confirm for ourselves that the variant is synonymous: the genomic sequence "TGT" codes for the amino acid cysteine ("C"). If the alternate allele is present, the codon becomes "TGC", which also codes for cysteine. With no amino acid change, an identical protein is produced. Clicking on a dbSNP rsID leads to its details page.
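Strand matters because the Browser always displays the top-strand sequence left to right. For a top-strand gene the displayed bases can be translated directly; for a bottom-strand gene they must first be reverse-complemented. The Python sketch below, assuming the standard genetic code and three displayed bases that form one complete codon, applies this to the rs770199777 bases. The helper names are hypothetical, and the bottom-strand case is purely illustrative: it shows what the same displayed bases would encode if they belonged to a gene read right to left, which is not the case for BRCA2 here.

```python
# Standard genetic code (codons in T, C, A, G order; "*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES),
                       AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order (hypothetical helper)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_displayed_codon(displayed: str, strand: str) -> str:
    """Translate three bases as displayed left to right in the Browser.

    strand "+" means the gene is read left to right (top strand);
    strand "-" means it is read right to left (bottom strand).
    """
    coding = displayed.upper() if strand == "+" else reverse_complement(displayed)
    return CODON_TABLE[coding]


ref, alt = "TGT", "TGC"  # displayed bases; the T-to-C change is the third base
for strand in ("+", "-"):
    ref_aa = translate_displayed_codon(ref, strand)
    alt_aa = translate_displayed_codon(alt, strand)
    effect = "synonymous" if ref_aa == alt_aa else "amino-acid-changing"
    print(strand, ref_aa, alt_aa, effect)
```

On the top strand both alleles give cysteine (synonymous). Read from the bottom strand, the same bases would be ACA (threonine) versus GCA (alanine), a change in amino acid, which is why checking the direction of the amino acid numbering comes first.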

Figure 3. (https://genome.ucsc.edu/s/education/hg19_BRCA2synon) Highlighted in turquoise is rs770199777 T/C (a T-to-C change), seen in both "dbSNP 153" and "dbSNP 151". Follow the highlight up to see the reference nucleotide (T). The black box marks the codon that includes the highlighted T allele. Below, the "UCSC Genes" track shows the amino acid (cysteine 1348) encoded by the codon boxed in black.

Figure 4 shows the details page for rs770199777, accessed from the "dbSNP 153" track. This page is useful when looking for information about a variant, including its coordinates ("Position"); the "Reference allele" and "Alternate allele"; the type of variant ("Functional Effects", synonymous in this case); the "ClinVar" significance (likely benign); and further resources or databases that provide more information on the variant's consequences. There is also an allele frequency chart, which shows that the alternate allele, C, is extremely rare, with an average frequency of 0.000004 in the gnomAD database.
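To put a frequency like 0.000004 in perspective, it can be converted to a "1 in N" figure. The short Python sketch below assumes the value is an allele frequency in the gnomAD sense (alternate allele count divided by the total number of alleles sampled). Because each person carries two copies of an autosomal locus, "1 in N alleles" is not the same as "1 in N people". The helper name one_in_n is hypothetical.

```python
def one_in_n(allele_frequency: float) -> int:
    """Convert an allele frequency to an approximate "1 in N alleles" figure."""
    if not 0 < allele_frequency <= 1:
        raise ValueError("allele frequency must be in (0, 1]")
    # round() absorbs floating-point error in 1 / frequency
    return round(1 / allele_frequency)


# Average gnomAD frequency of the alternate allele C at rs770199777 (Figure 4).
print(f"about 1 in {one_in_n(0.000004):,} alleles")
```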

Figure 4. "dbSNP 153" details page for rs770199777. Note the allele frequency and the ClinVar significance (likely benign).

We can also view the other dbSNP track's details page for this rsID by returning to the Browser and clicking rs770199777 in the "dbSNP 151" track, as seen in Figure 5. This page provides some information that the "dbSNP 153" page does not, including which allele is found in macaque and in two species of great ape. At the bottom of the page, the "Coding annotations by dbSNP" section shows the allele, codon, and amino acid change, and the "UCSC's predicted function relative to selected gene tracks" section provides the same information.

Figure 5. "dbSNP 151" details page for rs770199777.

Written by Zoë Shmidt, UCSC. Major: BA, Biological Anthropology