ENCODE Regulation TF Clusters Track Settings

Transcription Factor ChIP-seq Clusters (340 factors, 129 cell types) from ENCODE 3

Track collection: Integrated Regulation from ENCODE

Description

These tracks contain information relevant to the regulation of transcription from the ENCODE Project.

The TF rPeak Clusters track shows genomic regions bound by DNA-associated proteins involved in transcriptional regulation from ENCODE 4.
The Transcription track shows transcription levels assayed by sequencing of polyadenylated RNA from a variety of cell types.
The Layered H3K4Me1 and Layered H3K27Ac tracks show where modification of histone proteins is suggestive of enhancer and, to a lesser extent, other regulatory activity. These histone modifications, particularly H3K4Me1, are quite broad. The actual enhancers are typically just a small portion of the area marked by these histone modifications.
The Layered H3K4Me3 track shows a histone mark associated with promoters.
The DNase I Hypersensitivity tracks indicate where chromatin is hypersensitive to cutting by the DNase enzyme, which has been assayed in a large number of cell types. Regulatory regions, in general, tend to be DNase-sensitive, and promoters are particularly DNase-sensitive.
The Txn Factor ChIP tracks show DNA regions bound by transcription factors, the proteins responsible for modulating gene transcription. Binding is assayed by chromatin immunoprecipitation with antibodies specific to the transcription factor, followed by sequencing of the precipitated DNA (ChIP-seq).

These tracks complement each other and together can shed much light on regulatory DNA. The histone marks are informative at a high level, but they have a resolution of just ~200 bases and do not provide much in the way of functional detail. The DNase hypersensitivity assay is higher in resolution at the DNA level and can be done on a large number of cell types since it's just a single assay. At the functional level, DNase hypersensitivity suggests that a region is very likely to be regulatory in nature, but provides little information beyond that. The transcription factor ChIP assay has a high resolution at the DNA level and, due to the very specific nature of the transcription factors, is often informative with respect to functional detail. However, since each transcription factor must be assayed separately, the information is only available for a limited number of transcription factors on a limited number of cell lines. Though each assay has its strengths and weaknesses, the fact that all of these assays are relatively independent of each other gives increased confidence when multiple tracks are suggesting a regulatory function for a region.

For additional information, please click on the hyperlinks for the individual tracks above. Also note that additional histone marks and transcription information are available in other ENCODE tracks. This integrative supertrack shows only a selection of the most informative data of general interest.

Source data version: ENCODE 3 Nov 2018
Assembly: Human Dec. 2013 (GRCh38/hg38)
Data last updated at UCSC: 2019-05-16

Description

This track shows regions of transcription factor binding derived from a large collection of ChIP-seq experiments performed by the ENCODE project between February 2011 and November 2018, spanning the first production phase of ENCODE ("ENCODE 2") through the second full production phase ("ENCODE 3").

Transcription factors (TFs) are proteins that bind to DNA and interact with RNA polymerases to regulate gene expression. Some TFs contain a DNA binding domain and can bind directly to specific short DNA sequences ('motifs'); others bind to DNA indirectly through interactions with TFs containing a DNA binding domain. High-throughput antibody capture and sequencing methods (e.g. chromatin immunoprecipitation followed by sequencing, or 'ChIP-seq') can be used to identify regions of TF binding genome-wide. These regions are commonly called ChIP-seq peaks.

ENCODE TF ChIP-seq data were processed using the ENCODE Transcription Factor ChIP-seq Processing Pipeline to generate peaks of TF binding. Peaks from 1264 experiments (1256 in hg38) representing 338 transcription factors (340 in hg38) in 130 cell types (129 in hg38) are combined here into clusters to produce a summary display showing occupancy regions for each factor. The underlying ChIP-seq peak data are available from the ENCODE 3 TF ChIP Peaks tracks (hg19, hg38).

Display Conventions

A gray box encloses each peak cluster of transcription factor occupancy, with the darkness of the box being proportional to the maximum signal strength observed in any cell type contributing to the cluster. The HGNC gene name for the transcription factor is shown to the left of each cluster.

To the right of the cluster a configurable label can optionally display information about the cell types contributing to the cluster and how many cell types were assayed for the factor (count where detected / count where assayed). For brevity in the display, each cell type is abbreviated to a single letter. The darkness of the letter is proportional to the signal strength observed in the cell line. Abbreviations starting with capital letters designate ENCODE cell types initially identified for intensive study, while those starting with lowercase letters designate cell lines added later in the project.
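The label and shading conventions described above can be illustrated with a small sketch. It is not the Genome Browser's rendering code: the function names, the "detected/assayed letters" layout, the linear score-to-gray mapping, and the example abbreviation table are all assumptions made for illustration.

```python
def letter_gray(score, max_score=1000):
    """Map a signal score to an 8-bit gray level (0 = black, 255 = white).

    Assumption: darkness scales linearly with score. The text only says
    darkness is proportional to signal strength.
    """
    clamped = max(0, min(score, max_score))
    return round(255 * (1 - clamped / max_score))


def cluster_label(detected_cells, assayed_count, abbreviations):
    """Build a right-hand label: 'detected/assayed' followed by one letter per cell type.

    `abbreviations` is a hypothetical mapping from cell type name to its
    single-letter abbreviation.
    """
    letters = "".join(abbreviations[cell] for cell in detected_cells)
    return f"{len(detected_cells)}/{assayed_count} {letters}"


# Hypothetical example: a factor assayed in 5 cell types, detected in 2.
example_abbreviations = {"GM12878": "G", "K562": "K"}
print(cluster_label(["GM12878", "K562"], 5, example_abbreviations))
```

In this sketch a stronger signal yields a smaller gray value, that is, a darker letter or box.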

Click on a peak cluster to see more information about the TF/cell assays contributing to the cluster and the cell line abbreviation table.

Methods

Peaks of transcription factor occupancy ("optimal peak set") from ENCODE ChIP-seq datasets were clustered using the UCSC hgBedsToBedExps tool. Scores were assigned to peaks by multiplying the input signal values by a normalization factor calculated as the ratio of the maximum score value (1000) to the signal value at one standard deviation above the mean, with values exceeding 1000 capped at 1000. This has the effect of distributing signal values up to the mean plus one standard deviation across the score range, while assigning the maximum score to all values above that. The cluster score is the highest score for any peak contributing to the cluster.
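The scoring rule can be sketched in a few lines. This is an illustration of the arithmetic described above, not the hgBedsToBedExps implementation. The function names are hypothetical, and three details are assumptions: the mean and standard deviation are taken over one set of signal values at a time, the population standard deviation is used, and scores are rounded to integers.

```python
from statistics import mean, pstdev

MAX_SCORE = 1000  # maximum score value, as stated in the text


def normalization_factor(signals):
    """Ratio of MAX_SCORE to the signal value at mean + 1 standard deviation."""
    cutoff = mean(signals) + pstdev(signals)  # assumption: population SD
    if cutoff <= 0:
        raise ValueError("signal values must have a positive mean + SD")
    return MAX_SCORE / cutoff


def scale_scores(signals):
    """Scale signal values into 0..MAX_SCORE, capping anything above the cutoff."""
    factor = normalization_factor(signals)
    return [min(MAX_SCORE, round(s * factor)) for s in signals]


def cluster_score(peak_scores):
    """A cluster takes the highest score of any peak contributing to it."""
    return max(peak_scores)


signals = [1, 2, 3, 4, 5]          # made-up signal values
scores = scale_scores(signals)     # [227, 453, 680, 906, 1000]
print(scores, cluster_score(scores))
```

Here the mean is 3 and the standard deviation about 1.41, so the cutoff is about 4.41. The value 5 lies above the cutoff and is capped at 1000, while the other values are spread proportionally below it.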

Data Access

The raw data for the ENCODE3 TF Clusters track can be accessed from the Table Browser or combined with other datasets through the Data Integrator. The data are stored internally as a BED5+3 MySQL table with additional metadata tables. For automated analysis and download, the encRegTfbsClusteredWithCells.hg38.bed.gz track data file, which has 5 fields of BED data followed by a comma-separated list of cell types, can be downloaded from our downloads server. The data can also be queried using the JSON API or the Public SQL server.
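As a starting point for automated analysis, the downloaded file could be read along the following lines. This is a minimal sketch that assumes tab-separated columns in the standard BED order (chrom, start, end, name, score) with the cell type list in the sixth column. The TfCluster type and the function names are hypothetical, and the example line uses made-up values.

```python
import gzip
from typing import Iterator, NamedTuple, Tuple


class TfCluster(NamedTuple):
    chrom: str
    start: int               # 0-based start, per BED convention
    end: int
    factor: str              # BED name field: the transcription factor
    score: int               # cluster score, 0 to 1000
    cells: Tuple[str, ...]   # cell types contributing to the cluster


def parse_cluster_line(line: str) -> TfCluster:
    """Parse one line: 5 BED fields plus a comma-separated cell type list."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, factor, score, cells = fields[:6]
    return TfCluster(
        chrom, int(start), int(end), factor, int(score),
        tuple(c for c in cells.split(",") if c),
    )


def read_clusters(path: str) -> Iterator[TfCluster]:
    """Yield clusters from a gzip-compressed BED file, skipping header lines."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip() and not line.startswith(("#", "track", "browser")):
                yield parse_cluster_line(line)


# Made-up example line, for illustration only.
print(parse_cluster_line("chr1\t100\t200\tCTCF\t1000\tGM12878,K562\n"))
```

A typical use would be to iterate over read_clusters("encRegTfbsClusteredWithCells.hg38.bed.gz") and filter by factor or by cell type.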

Credits

Thanks to the ENCODE Consortium, the ENCODE ChIP-seq production laboratories, and the ENCODE Data Coordination Center for generating and processing the TF ChIP-seq datasets used here. The ENCODE accession numbers of the constituent datasets are available from the peak details page. Special thanks to Henry Pratt, Jill Moore, Michael Purcaro, and Zhiping Weng, PI, at the ENCODE Data Analysis Center (ZLab at UMass Medical Center) for providing the peak datasets, metadata, and guidance in developing this track. Please check the ZLab ENCODE Public Hubs for the most up-to-date data.

The integrative view presented here was developed by Jim Kent at UCSC.

References

ENCODE Project Consortium. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 2011 Apr;9(4):e1001046. PMID: 21526222; PMCID: PMC3079585

ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74. PMID: 22955616; PMCID: PMC3439153

Sloan CA, Chan ET, Davidson JM, Malladi VS, Strattan JS, Hitz BC, Gabdank I, Narayanan AK, Ho M, Lee BT et al. ENCODE data at the ENCODE portal. Nucleic Acids Res. 2016 Jan 4;44(D1):D726-32. PMID: 26527727; PMCID: PMC4702836

Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, Alexander R et al. Architecture of the human regulatory network derived from ENCODE data. Nature. 2012 Sep 6;489(7414):91-100. PMID: 22955619

Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, Pierce BG, Dong X, Kundaje A, Cheng Y et al. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 2012 Sep;22(9):1798-812. PMID: 22955990; PMCID: PMC3431495

Wang J, Zhuang J, Iyer S, Lin XY, Greven MC, Kim BH, Moore J, Pierce BG, Dong X, Virgil D et al. Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. Nucleic Acids Res. 2013 Jan;41(Database issue):D171-6. PMID: 23203885; PMCID: PMC3531197

Data Use Policy

Users may freely download, analyze and publish results based on any ENCODE data without restrictions. Researchers using unpublished ENCODE data are encouraged to contact the data producers to discuss possible coordinated publications; however, this is optional.

Users of ENCODE datasets are requested to cite the ENCODE Consortium and ENCODE production laboratory(s) that generated the datasets used, as described in Citing ENCODE.