Description
This track maps genome-wide human transcription factor binding sites using
second-generation massively parallel sequencing. The transcription factors are
expressed as GFP-tagged fusion proteins from transgenes built by bacterial artificial
chromosome (BAC) recombineering (recombination-mediated genetic engineering).
The University of Chicago and Max Planck Institute (Dresden) pipeline generates
recombineered BACs for producing cell lines or animals that express fusion
proteins from epitope-tagged transgenes.
Display Conventions and Configuration
This track is a multi-view composite track that contains multiple data types
(views). Each view contains multiple subtracks that are displayed
individually in the browser.
For each cell type, this track contains the following views:
- Peaks: regions of signal enrichment based on processed data (usually normalized data from pooled replicates).
- Signal: density graph (wiggle) of signal enrichment based on aligned read density.
Peaks and signals displayed in this track are the results of pooled replicate
sequencing data. Alignment files for each replicate are available for
download.
Methods
Cells were grown according to the approved
ENCODE cell culture protocols.
Recombineering Strategy
To facilitate high-throughput production of the transgenic constructs, the
program BACFinder (Crowe et al., 2002) automatically selects the
most suitable BAC clone for any given human gene and generates the sets of
PCR primers required for tagging and verification (Poser et al.,
2008). Recombineering is used to insert tagging cassettes at either the N or C
terminus of the protein. The N-terminal cassette has a dual eukaryotic-prokaryotic
promoter (PGK-gb2) driving a neomycin-kanamycin resistance gene within an
artificial intron inside the tag coding sequence. The selection cassette is
flanked by two loxP sites and can be permanently removed by Cre
recombinase-mediated excision. The C-terminal cassette contains the sequence
encoding the tag followed by an internal ribosome entry site (IRES) in front
of the neomycin resistance gene. In addition, a short bacterial promoter (Gb3)
drives the expression of the neomycin-kanamycin resistance gene in E. coli.
The tagging cassettes, containing 50 nucleotides of PCR-introduced homology arms,
were inserted into the BAC by recombineering, either behind the start codon (for
the N-terminal tag) or in front of the stop codon (for the C-terminal tag) of the
gene. E. coli cells that had successfully recombined the cassette were
selected for kanamycin resistance in liquid culture. Each saturated culture from
a given recombineering reaction was derived from 10-200 independent recombination events.
For each construct, two independent clones were checked by PCR across the tag insertion point, and
97% (85/88) yielded a PCR product of the expected size. Most of the clones that failed
to grow were missing the targeted genomic region. An estimated 10% of the BACs used
were chimeric, rearranged or wrongly mapped. Thus, initial results indicated that the
necessary recombineering steps could be carried out with high fidelity.
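The primer layout implied by the 50-nucleotide homology arms can be sketched in Python. This is an illustration only, assuming the gene lies on the plus strand of the supplied sequence; the cassette-priming sequences `CASSETTE_FWD` and `CASSETTE_REV` and the function `tagging_primers` are hypothetical placeholders, not the primers generated by BACFinder or the pipeline. Each primer carries a 50-nt 5' tail copied from the BAC on one side of the insertion point, followed by a sequence that anneals to the tagging cassette.

```python
from typing import Tuple

ARM_LEN = 50  # homology arm length stated in the track description

# Hypothetical cassette-priming sequences (placeholders, NOT the pipeline's real primers).
CASSETTE_FWD = "GGTGGCGGTAGCGGAGGTGG"
CASSETTE_REV = "CCTTGTAGTCGCCGCTGCCG"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tagging_primers(genomic: str, codon_pos: int, terminus: str) -> Tuple[str, str]:
    """Return (forward, reverse) primers for amplifying a tagging cassette.

    genomic   -- plus-strand BAC sequence containing the gene (gene assumed on plus strand)
    codon_pos -- 0-based index of the first base of the start codon ("N") or stop codon ("C")
    terminus  -- "N": insert directly behind the start codon;
                 "C": insert directly in front of the stop codon
    """
    if terminus == "N":
        insert_at = codon_pos + 3  # keep the ATG upstream of the tag
    elif terminus == "C":
        insert_at = codon_pos      # keep the stop codon downstream of the tag
    else:
        raise ValueError("terminus must be 'N' or 'C'")
    if insert_at < ARM_LEN or insert_at + ARM_LEN > len(genomic):
        raise ValueError("not enough flanking sequence for 50-nt homology arms")

    left_arm = genomic[insert_at - ARM_LEN:insert_at]
    right_arm = genomic[insert_at:insert_at + ARM_LEN]
    # Forward primer: 5' homology tail (plus strand) + cassette-priming sequence.
    forward = left_arm + CASSETTE_FWD
    # Reverse primer: 5' homology tail (minus strand) + cassette-priming sequence.
    reverse = revcomp(right_arm) + CASSETTE_REV
    return forward, reverse
```

With the N-terminal layout the forward homology arm ends in the ATG, so the PCR product places the cassette directly behind the start codon; with the C-terminal layout the right-hand arm begins at the stop codon, so the stop codon stays downstream of the tag.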
The White lab produced all epitope-tagged transcription and chromatin factor BACs,
as well as the genome-wide ChIP data and analysis. Applying this approach
to closely related paralogs (RARa and RARg) yielded tagged transcription
factors, chromatin factors, cell lines, ChIP-chip data and ChIP-seq data (Hua
et al., 2009). Antibodies often cannot distinguish between such paralogous
transcription factors.
Sample Preparation
ChIP DNA from each sample was sheared to approximately 800 bp using a nebulizer. The ends of the DNA
were polished and two unique adapters were ligated to the fragments. Ligated fragments of
150-200 bp were isolated by gel extraction and amplified using limited cycles of PCR.
Sequencing System
Illumina GAIIx and HiSeq sequencers were used to produce all ChIP-seq data.
Processing and Analysis Software
Raw sequencing reads were aligned to the UCSC hg19 assembly using
Bowtie 0.12.5
(Langmead et al., 2009). The "-m 1" parameter was applied to suppress
all alignments for reads that map to more than one location in the genome.
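The effect of "-m 1" can be illustrated with a short Python sketch. It assumes that every valid alignment of each read has already been enumerated; the `Alignment` record and the `unique_alignments` function are hypothetical and not part of Bowtie, which applies the equivalent rule internally during alignment. A read is kept only if it has at most `max_hits` reportable alignments; reads with more are dropped entirely rather than assigned to one of their possible locations.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    read_id: str
    chrom: str
    pos: int      # 0-based leftmost aligned position
    strand: str   # "+" or "-"


def unique_alignments(alignments: Iterable[Alignment], max_hits: int = 1) -> List[Alignment]:
    """Keep alignments only for reads with at most `max_hits` reportable alignments.

    Mirrors the rule behind Bowtie's "-m <int>": reads exceeding the limit are
    dropped entirely instead of being placed at one of their possible locations.
    """
    alignments = list(alignments)
    hits_per_read = Counter(a.read_id for a in alignments)
    return [a for a in alignments if hits_per_read[a.read_id] <= max_hits]
```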
Wiggle format signal files were generated with
SPP 2.7.1
(Kharchenko et al., 2008) for R 2.7.1.
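A read-density signal of this kind can be sketched as follows. This is a simplified stand-in, not SPP: it assumes a fixed fragment length (`frag_len`, a hypothetical default of 150 bp) for shifting each read toward its fragment center, whereas SPP estimates the shift from strand cross-correlation (Kharchenko et al., 2008). The function names are illustrative. Counts are binned and written as a fixedStep wiggle block.

```python
from typing import Iterable, List, Tuple


def read_density(reads: Iterable[Tuple[int, str]], chrom_len: int,
                 bin_size: int = 50, frag_len: int = 150) -> List[int]:
    """Count estimated fragment centers per bin along one chromosome.

    reads    -- (five_prime_pos, strand) pairs, 0-based; for "-" reads the
                5' end is the rightmost aligned base
    frag_len -- assumed fragment length (hypothetical fixed value; SPP estimates it)
    """
    half = frag_len // 2
    n_bins = (chrom_len + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos, strand in reads:
        # Shift each read in its 3' direction to approximate the fragment center.
        center = pos + half if strand == "+" else pos - half
        if 0 <= center < chrom_len:
            counts[center // bin_size] += 1
    return counts


def to_fixed_step_wig(chrom: str, counts: List[int], bin_size: int) -> str:
    """Render binned counts as a fixedStep wiggle block (wiggle coordinates are 1-based)."""
    lines = [f"fixedStep chrom={chrom} start=1 step={bin_size} span={bin_size}"]
    lines.extend(str(c) for c in counts)
    return "\n".join(lines) + "\n"
```

Shifting the two strands toward each other is what makes plus- and minus-strand reads from the same bound site pile up in the same bins.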
MACS 1.3.7
was used to call peaks. The MACS parameters used varied by experiment.
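MACS scores candidate regions with a Poisson model whose rate is derived from the control data. The sketch below illustrates only that core test on pre-binned counts. It is not MACS, which additionally shifts tags, scans sliding windows, takes the local rate as the maximum over several control window sizes, and estimates an empirical FDR. Here `lambda_bg` is an assumed genome-wide background rate per window, the control counts are assumed to be already scaled to the test library, and the `p_cutoff` of 1e-5 is a placeholder, since the actual parameters varied by experiment. Both function names are hypothetical.

```python
import math
from typing import List, Sequence


def poisson_upper_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), summing terms in log space."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    log_lam = math.log(lam)
    total = 0.0
    i = k
    while True:
        term = math.exp(i * log_lam - lam - math.lgamma(i + 1))
        total += term
        # Beyond the mode the terms shrink monotonically; stop once they are negligible.
        if i > lam and term <= total * 1e-16:
            break
        i += 1
    return min(total, 1.0)


def call_enriched_windows(counts: Sequence[int], control: Sequence[float],
                          lambda_bg: float, p_cutoff: float = 1e-5) -> List[int]:
    """Return indices of windows whose count is improbably high under a local Poisson rate.

    The local rate is max(genome-wide background, scaled control count in the window),
    a single-window simplification of MACS's dynamic lambda.
    """
    if lambda_bg <= 0:
        raise ValueError("lambda_bg must be positive")
    return [i for i, (k, c) in enumerate(zip(counts, control))
            if poisson_upper_tail(k, max(lambda_bg, c)) < p_cutoff]
```

Taking the maximum of the background and local control rates makes the test conservative in regions where the control itself is elevated.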
As a background control, the White lab performed ChIP with a goat anti-GFP antibody in untagged K562 cells.
The test IP was performed in the same manner as the background control, and results are expressed as
test values normalized to the background.
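One common way to express test signal relative to a background control is a per-bin ratio after scaling the control to the test library size. The exact normalization used for this track is not described, so the sketch below is an assumption: `normalize_to_background` and its pseudocount of 1 are hypothetical choices, with the pseudocount preventing division by zero in empty control bins.

```python
from typing import List, Sequence


def normalize_to_background(test: Sequence[int], control: Sequence[int],
                            pseudocount: float = 1.0) -> List[float]:
    """Per-bin test/control ratio after scaling the control to the test library size."""
    if len(test) != len(control):
        raise ValueError("test and control must have the same number of bins")
    test_total, control_total = sum(test), sum(control)
    if test_total == 0 or control_total == 0:
        raise ValueError("both libraries must contain reads")
    scale = test_total / control_total
    # The pseudocount (hypothetical choice) avoids division by zero in empty control bins.
    return [(t + pseudocount) / (c * scale + pseudocount) for t, c in zip(test, control)]
```

With test counts `[10, 0, 30]` and control counts `[5, 5, 10]`, the control is scaled by a factor of 2, and the first bin's ratio is exactly 1.0.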
Credits
These data and annotations were created by a collaboration between the University of Chicago and Argonne National Laboratory.
References
Crowe ML, Rana D, Fraser F, Bancroft I, Trick M.
BACFinder: genomic localisation of large insert genomic clones based on restriction fingerprinting.
Nucleic Acids Res. 2002 Nov 1;30(21):e118.
Hua S, Kittler R, White KP.
Genomic antagonism between retinoic acid and estrogen signaling in breast cancer.
Cell. 2009 Jun 26;137(7):1259-71.
Kharchenko PV, Tolstorukov MY, Park PJ.
Design and analysis of ChIP-seq experiments for DNA-binding proteins.
Nat Biotechnol. 2008 Dec;26(12):1351-9.
Langmead B, Trapnell C, Pop M, Salzberg SL.
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.
Genome Biol. 2009;10(3):R25.
Poser I, Sarov M, Hutchins JR, Hériché JK, Toyoda Y, Pozniakovsky A, Weigl D, Nitzsche A, Hegemann B, Bird AW et al.
BAC TransgeneOmics: a high-throughput method for exploration of protein function in mammals.
Nat Methods. 2008 May;5(5):409-15.
Data Release Policy
Data users may freely use ENCODE data, but may not, without prior consent,
submit publications that use an unpublished ENCODE dataset until nine months
following the release of the dataset. This date is listed in the Restricted Until
column, above. The full data release policy for ENCODE is available
here.
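The Restricted Until date in the table is authoritative. As a rough illustration of the nine-month rule only, the sketch below adds nine calendar months to a release date and clamps to the last day of the month when the target month is shorter (for example, May 31 maps to the end of February). The month arithmetic, the clamping rule, and the function names are assumptions, not part of the ENCODE policy text.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of a shorter target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def restricted_until(release: date) -> date:
    """Illustrative end of the nine-month publication embargo (assumed month arithmetic)."""
    return add_months(release, 9)
```

For example, `restricted_until(date(2011, 3, 15))` returns `date(2011, 12, 15)`.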