RNase-mediated protein footprint sequencing reveals protein-binding sites throughout the human transcriptome
Ian Silverman1,2,3,8, Fan Li1,2,4,8, Anissa Alexander1,2,3, Loyal Goff5,6, Cole Trapnell5,6, John L. Rinn5,6,7, and Brian D. Gregory1,2,3,4
1Department of Biology
2Penn Genome Frontiers Institute
3Cell and Molecular Biology Graduate Group
4Genomics and Computational Biology Graduate Group, University of Pennsylvania, Philadelphia, PA 19104, USA
5Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA
6Broad Institute, Cambridge, MA 02142, USA
7Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
8These authors contributed equally to this work.
Although numerous approaches have been developed to map RNA-binding sites of individual RNA-binding proteins (RBPs), few methods exist that allow assessment of global RBP–RNA interactions. Here, we describe PIP-seq, a universal, high-throughput, ribonuclease-mediated protein footprint sequencing approach that reveals RNA-protein interaction sites throughout a transcriptome of interest. We apply PIP-seq to the HeLa transcriptome and compare binding sites found using different cross-linkers and ribonucleases. From this analysis, we identify numerous putative RBP-binding motifs, reveal novel insights into co-binding by RBPs, and uncover a significant enrichment for disease-associated polymorphisms within RBP interaction sites.
All of our sequencing data and protein-protected sites are available for browsing using the JBrowse genome browser:
The following UCSC Genome Browser sessions are also available:
Protein-protected sites
Description: These files contain all protein-protected sites (PPSs) identified. Individual replicates are available from GEO (GSE49309).
File format: Gzipped BED files.
- HeLa.CSAR_PPS_FDR05.bed.gz (10 MB)
- HeLa_Form.CSAR_PPS_FDR05.bed.gz (9.0 MB)
- HeLa_UV.CSAR_PPS_FDR05.bed.gz (929 KB)
- HeLa_NO.CSAR_PPS_FDR05.bed.gz (383 KB)
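As an illustration of how these files can be used, here is a minimal Python sketch for loading a PPS file and checking whether individual positions (for example, SNPs) fall inside a protein-protected site. It assumes only standard BED conventions: tab-separated fields, with chromosome, 0-based start, and exclusive end in the first three columns. The helper names `parse_bed_lines`, `load_bed`, and `SiteIndex` are hypothetical, and this is not the authors' analysis pipeline.

```python
import gzip
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # BED coordinates: 0-based start, exclusive end


def parse_bed_lines(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Collect sorted (start, end) intervals per chromosome from BED lines.

    Only the first three columns are read; extra columns such as name, score,
    and strand are ignored. Blank, comment, track, and browser lines are skipped.
    """
    sites: Dict[str, List[Interval]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        sites[chrom].append((int(start), int(end)))
    return {chrom: sorted(ivs) for chrom, ivs in sites.items()}


def load_bed(path: str) -> Dict[str, List[Interval]]:
    """Read a plain or gzipped BED file (gzip is chosen by the .gz suffix)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_bed_lines(handle)


class SiteIndex:
    """Answer 'is this position inside any site?' in O(log n) per query."""

    def __init__(self, sites: Dict[str, List[Interval]]):
        self._starts: Dict[str, List[int]] = {}
        self._max_end: Dict[str, List[int]] = {}
        for chrom, intervals in sites.items():
            starts, running_max, best = [], [], -1
            for start, end in sorted(intervals):
                best = max(best, end)  # furthest end among intervals so far
                starts.append(start)
                running_max.append(best)
            self._starts[chrom] = starts
            self._max_end[chrom] = running_max

    def covers(self, chrom: str, pos: int) -> bool:
        """True if 0-based position pos lies in [start, end) of some interval."""
        starts = self._starts.get(chrom)
        if not starts:
            return False
        idx = bisect_right(starts, pos)  # number of intervals with start <= pos
        return idx > 0 and self._max_end[chrom][idx - 1] > pos
```

For example, `index = SiteIndex(load_bed("HeLa.CSAR_PPS_FDR05.bed.gz"))` followed by `index.covers("chr1", pos)` tests one position; positions taken from 1-based sources such as VCF files should be converted with `pos - 1` first. Storing a running maximum of interval ends keeps the lookup correct even when sites overlap, without needing to merge them first.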
Sequence motifs and modules
Description: These files contain sequence motifs enriched in protein-protected sites, as identified by MEME. Modules of co-occurring sequence motifs, identified by hierarchical clustering, are also included.
File format: Excel spreadsheet of sequence motifs, and gzipped archives of MEME output directories.
- Additional_data_file_9.xlsx (699 KB)
- MEME_output.HeLa_intron.tar.gz (3.8 MB)
- MEME_output.HeLa_UTR5.tar.gz (1.1 MB)
- MEME_output.HeLa_UTR3.tar.gz (6.1 MB)
- MEME_output.HeLa_internal_CDS_exon.tar.gz (1.9 MB)
Genomic motif loci
Description: These files contain the genomic loci of all the above sequence motifs, detected using FIMO.
File format: Gzipped archives of FIMO output directories.
- FIMO_output.HeLa_intron.tar.gz (55.9 MB)
- FIMO_output.HeLa_UTR5.tar.gz (1.1 MB)
- FIMO_output.HeLa_UTR3.tar.gz (23.8 MB)
- FIMO_output.HeLa_internal_CDS_exon.tar.gz (3.1 MB)
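The FIMO archives can be summarized without unpacking them to disk. The Python sketch below is hypothetical and rests on a few assumptions: each archive contains a tab-delimited FIMO results table named `fimo.txt` or `fimo.tsv`; the header is the first non-empty line (older FIMO releases write it as `#pattern name ...`, newer ones as `motif_id ...`); and the 0.05 q-value cutoff is an illustrative default, not a threshold taken from the study. The helpers `parse_fimo`, `count_hits`, and `iter_fimo_tables` are illustrative names, not part of the MEME Suite.

```python
import io
import tarfile
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]


def _normalize(column: str) -> str:
    """Map header spellings from different FIMO versions onto one form."""
    return column.lstrip("#").strip().replace(" ", "_")


def parse_fimo(lines: Iterable[str]) -> List[Row]:
    """Parse a FIMO results table into one dict per motif occurrence."""
    header = None
    rows: List[Row] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if header is None:
            header = [_normalize(col) for col in line.split("\t")]
        elif not line.startswith("#"):  # skip trailing comment lines
            rows.append(dict(zip(header, line.split("\t"))))
    return rows


def motif_id(row: Row) -> str:
    # Older FIMO names this column "pattern name"; newer releases use "motif_id".
    return row.get("motif_id") or row.get("pattern_name", "")


def count_hits(rows: Iterable[Row], max_q: float = 0.05) -> Counter:
    """Count occurrences per motif with q-value <= max_q (assumed cutoff)."""
    counts: Counter = Counter()
    for row in rows:
        try:
            q = float(row.get("q-value", ""))
        except ValueError:
            continue  # q-value missing or not computed for this row
        if q <= max_q:
            counts[motif_id(row)] += 1
    return counts


def iter_fimo_tables(archive_path: str) -> Iterator[Tuple[str, List[Row]]]:
    """Yield (member name, parsed rows) for each FIMO table in a .tar.gz."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            if not member.name.endswith(("fimo.txt", "fimo.tsv")):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            with io.TextIOWrapper(handle, encoding="utf-8") as text:
                yield member.name, parse_fimo(text)
```

A typical call would be `for name, rows in iter_fimo_tables("FIMO_output.HeLa_UTR3.tar.gz"): print(name, count_hits(rows).most_common(5))`. Reading columns by header name rather than by position lets the same code handle both FIMO output layouts.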
The raw sequencing data are available from GEO under accession GSE49309.