ENCODE Regulation TF rPeak Clusters Track Settings
 
Transcription Factor Representative Peak (rPeak) Clusters (912 factors in 1152 biosamples) from ENCODE 4

Track collection: Integrated Regulation from ENCODE


Decoration settings

Decorators are optional sub-annotations shown in addition to the rectangular annotations of this track. They can be rectangular blocks, either overlaid transparently on features or shown beneath them, or glyphs (symbols such as triangles or stars) drawn on top of existing annotations. See the decorator documentation page.

Assembly: Human Dec. 2013 (GRCh38/hg38)
Data last updated at UCSC: 2024-07-23 11:17:48

Description

This track displays regulatory regions in the human genome identified using ENCODE data from ENCODE phases 2 through 4. It highlights genomic regions bound by DNA-associated proteins involved in transcriptional regulation, such as RNA polymerase, transcription factors (TFs), and chromatin remodeling proteins. Sequence-specific TFs bind directly to short DNA motifs via their DNA-binding domains, while other DNA-associated proteins interact with DNA indirectly through protein-protein interactions with sequence-specific TFs. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a high-throughput method for mapping genome-wide protein-DNA interactions. Regions of high ChIP signal, commonly referred to as ChIP-seq peaks, indicate protein binding sites. For each DNA-associated protein, all ENCODE ChIP-seq peaks across biosamples were integrated to generate a set of representative peaks (rPeaks). This track displays these rPeaks alongside detected DNA motif sites.

Display Conventions and Configuration

Each rPeak is represented as a gray box, with the shade of gray corresponding to the maximum ChIP-seq signal observed across contributing biosamples. The HGNC gene name of the associated protein is displayed to the left of the box. If the rPeak overlaps a cognate TF motif site from a previously built motif collection (PMID: 37104580; DOI: 10.1126/science.abn7930), the motif site is highlighted in green.

Clicking on an rPeak provides detailed information about the biosamples where the rPeak was detected, including the count of biosamples with contributing ChIP-seq peaks and the total number of biosamples assayed for the protein. Links to relevant ENCODE ChIP-seq experiments and overlapping ENCODE candidate cis-regulatory elements (cCREs) are also provided.

By default, rPeaks for all 912 DNA-associated proteins with ENCODE ChIP-seq data are displayed. Users can customize the display by selecting specific DNA-associated proteins in the track settings.

Methods

A total of 2,509 ENCODE ChIP-seq experiments, covering 912 DNA-associated proteins across 1,152 unique biosamples, were integrated to produce representative peaks (rPeaks) for each protein. The processing steps were as follows:

  1. ChIP-seq peaks for each protein, generated using the ENCODE Transcription Factor ChIP-seq Processing Pipeline, were downloaded from the ENCODE Portal.
  2. ChIP-seq peaks from the protein’s experiments across all biosamples were clustered using bedtools merge.
  3. In each cluster, the peak with the highest ChIP signal (normalized by sequencing depth) was selected as the rPeak.
  4. All ChIP-seq peaks overlapping this rPeak by at least one nucleotide were marked as represented and removed from subsequent clustering rounds.
  5. Steps 2-4 were repeated until a final list of non-overlapping rPeaks was generated, representing all ChIP-seq peaks for the protein.
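
The iterative selection in steps 2-5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline that produced the track: the `Peak` type and the function names are hypothetical, the `signal` field is assumed to be already normalized by sequencing depth, and the `cluster` function is a pure-Python stand-in for bedtools merge (it groups overlapping and book-ended intervals, since the exact bedtools options used are not given here).

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    """Hypothetical peak record; coordinates follow the BED convention."""
    chrom: str
    start: int     # 0-based, inclusive
    end: int       # exclusive
    signal: float  # assumed to be normalized by sequencing depth already


def overlaps(a: Peak, b: Peak) -> bool:
    """True if the two peaks share at least one nucleotide."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def cluster(peaks: List[Peak]) -> List[List[Peak]]:
    """Stand-in for bedtools merge: group overlapping or book-ended peaks."""
    clusters: List[List[Peak]] = []
    current: List[Peak] = []
    current_end = -1
    for p in sorted(peaks, key=lambda p: (p.chrom, p.start, p.end)):
        if current and p.chrom == current[0].chrom and p.start <= current_end:
            current.append(p)
            current_end = max(current_end, p.end)
        else:
            if current:
                clusters.append(current)
            current = [p]
            current_end = p.end
    if current:
        clusters.append(current)
    return clusters


def select_rpeaks(peaks: List[Peak]) -> List[Peak]:
    """Repeat cluster -> pick strongest -> drop represented peaks until done."""
    if any(p.start >= p.end for p in peaks):
        raise ValueError("every peak must span at least one nucleotide")
    remaining = list(peaks)
    rpeaks: List[Peak] = []
    while remaining:
        # Step 3: the highest-signal peak of each cluster becomes an rPeak.
        chosen = [max(c, key=lambda p: p.signal) for c in cluster(remaining)]
        rpeaks.extend(chosen)
        # Step 4: peaks overlapping a chosen rPeak are marked as represented.
        remaining = [p for p in remaining
                     if not any(overlaps(p, r) for r in chosen)]
    return sorted(rpeaks, key=lambda p: (p.chrom, p.start, p.end))


# Made-up example: four chained peaks and one isolated peak.
example = [
    Peak("chr1", 100, 200, 5.0),
    Peak("chr1", 180, 300, 9.0),
    Peak("chr1", 290, 400, 4.0),
    Peak("chr1", 390, 500, 3.0),
    Peak("chr1", 1000, 1100, 2.0),
]
for rpeak in select_rpeaks(example):
    print(rpeak)
```

In the made-up example, the first round picks the 180-300 peak for the chained cluster and removes the three peaks that overlap it; the 390-500 peak survives to a second round and becomes its own rPeak, which is why the procedure has to be repeated.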

Data Access

The raw data for the ENCODE TF rPeak track will soon be available.

The raw data can be explored interactively with the Table Browser, for download, intersection or correlations with other tracks. To join this track with others based on the chromosome positions, use the Data Integrator.

For automated download and analysis, the genome annotation for this track is stored in a bigBed file that can be downloaded from our download server. The file for this track is called TFrPeakClusters.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g.

  bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/ENCODE4/TFrPeakClusters.bb -chrom=chr21 -start=0 -end=100000000 stdout
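
For use inside a pipeline, the same bigBedToBed call can be wrapped in a short Python script. The sketch below assumes that bigBedToBed is installed and on the PATH, and that the fourth BED column holds the protein name shown as the item label; that column layout is an assumption and should be checked against the track's schema page. The `Feature` type and all function names are hypothetical.

```python
import subprocess
from typing import Iterator, NamedTuple, Optional, Set, Tuple

# URL taken from the command shown in the track documentation.
BIGBED_URL = ("http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/ENCODE4/"
              "TFrPeakClusters.bb")


class Feature(NamedTuple):
    """Hypothetical record for one BED line; extra columns stay uninterpreted."""
    chrom: str
    start: int
    end: int
    name: str               # assumed to be the protein (HGNC gene) name
    extra: Tuple[str, ...]  # any remaining columns, kept as raw strings


def build_command(chrom: str, start: int, end: int, url: str = BIGBED_URL):
    """Build the bigBedToBed argument list for one genomic range."""
    return ["bigBedToBed", url, f"-chrom={chrom}", f"-start={start}",
            f"-end={end}", "stdout"]


def parse_bed_line(line: str) -> Feature:
    """Parse one tab-separated BED line into a Feature."""
    fields = line.rstrip("\n").split("\t")
    name = fields[3] if len(fields) > 3 else ""
    return Feature(fields[0], int(fields[1]), int(fields[2]), name,
                   tuple(fields[4:]))


def fetch_features(chrom: str, start: int, end: int,
                   names: Optional[Set[str]] = None) -> Iterator[Feature]:
    """Run bigBedToBed and yield features, optionally filtered by name."""
    result = subprocess.run(build_command(chrom, start, end),
                            capture_output=True, text=True, check=True)
    for line in result.stdout.splitlines():
        if not line:
            continue
        feature = parse_bed_line(line)
        if names is None or feature.name in names:
            yield feature


if __name__ == "__main__":
    # Requires network access and the bigBedToBed binary.
    for feature in fetch_features("chr21", 0, 100000000, names={"CTCF"}):
        print(feature.chrom, feature.start, feature.end, feature.name)
```

Filtering by name after parsing mirrors the per-protein selection offered in the track settings, without relying on any particular layout of the columns beyond the fourth.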

For automated access, this track, like all others, is also available via our API. However, for bulk processing in pipelines, downloading the data and/or using bigBed files as described above is usually faster.
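
As a rough illustration of API access, the sketch below builds a request for the Genome Browser REST API's getData/track endpoint and decodes the JSON response. The track identifier `TFrPeakClusters` is an assumption derived from the bigBed file name; the actual name should be confirmed, for example through the API's list/tracks endpoint for hg38. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"
# Assumption: the track name matches the bigBed file name; verify before use.
TRACK_NAME = "TFrPeakClusters"


def build_track_url(chrom: str, start: int, end: int,
                    genome: str = "hg38", track: str = TRACK_NAME) -> str:
    """Build a getData/track URL for one genomic range."""
    query = urlencode({"genome": genome, "track": track,
                       "chrom": chrom, "start": start, "end": end})
    return f"{API_BASE}/getData/track?{query}"


def fetch_track(chrom: str, start: int, end: int) -> dict:
    """Fetch the range from the API and return the decoded JSON document."""
    with urlopen(build_track_url(chrom, start, end), timeout=60) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires network access; prints the top-level keys of the response.
    data = fetch_track("chr21", 0, 1000000)
    print(sorted(data.keys()))
```

Keeping the requested range small is advisable here, since the API is intended for targeted queries rather than whole-genome downloads.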

Credits

This track was made possible thanks to the efforts of the ENCODE Consortium, ENCODE ChIP-seq production laboratories, and the ENCODE Data Coordination Center for generating and processing the ChIP-seq datasets. The ENCODE accession numbers for the constituent datasets are accessible from the peak details page. Special thanks to Drs. Mingshi Gao, Greg Andrews, Jill Moore, and Zhiping Weng at UMass Chan Medical School, who were members of the ENCODE Data Analysis Center, for developing this track, including providing the rPeak and motif datasets and associated metadata and building the track. We also extend our gratitude to Max Haeussler and Jonathan Casper from the UCSC Genome Browser Project Team for their assistance in developing this track. For updates on the track, please contact the Weng lab.