RNase-mediated protein footprint sequencing reveals protein-binding sites throughout the human transcriptome

Ian Silverman1,2,3,8, Fan Li1,2,4,8, Anissa Alexander1,2,3, Loyal Goff5,6, Cole Trapnell5,6, John L. Rinn5,6,7, and Brian D. Gregory1,2,3,4

1Department of Biology
2Penn Genome Frontiers Institute
3Cell and Molecular Biology Graduate Group
4Genomics and Computational Biology Graduate Group, University of Pennsylvania, Philadelphia, PA 19104, USA
5Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA
6Broad Institute, Cambridge, MA 02142, USA
7Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
8These authors contributed equally to this work.


Abstract

Although numerous approaches have been developed to map RNA-binding sites of individual RNA-binding proteins (RBPs), few methods exist that allow assessment of global RBP–RNA interactions. Here, we describe PIP-seq, a universal, high-throughput, ribonuclease-mediated protein footprint sequencing approach that reveals RNA-protein interaction sites throughout a transcriptome of interest. We apply PIP-seq to the HeLa transcriptome and compare binding sites found using different cross-linkers and ribonucleases. From this analysis, we identify numerous putative RBP-binding motifs, reveal novel insights into co-binding by RBPs, and uncover a significant enrichment for disease-associated polymorphisms within RBP interaction sites.




Genome browser

All of our sequencing data and protein-protected sites are available for browsing using the JBrowse genome browser:

The following UCSC Genome Browser sessions are also available:

Supplemental data


Protein-protected sites

Description: These files contain all protein-protected sites (PPSs) identified. Individual replicates are available from GEO (GSE49309).

File format: (Gzipped) BED format file.
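As a rough illustration of how these files might be consumed, the sketch below reads a gzipped BED file in Python and yields one interval per PPS. It assumes only the three mandatory BED columns (chromosome, 0-based start, exclusive end); any additional columns are ignored. The names `BedInterval`, `parse_bed`, `read_bed_gz` and `total_covered` are hypothetical helpers introduced for this example and are not part of the PIP-seq pipeline.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple


class BedInterval(NamedTuple):
    """One BED record, reduced to its three mandatory columns."""
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)


def parse_bed(lines: Iterable[str]) -> Iterator[BedInterval]:
    """Yield intervals from BED-formatted lines, skipping header lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield BedInterval(fields[0], int(fields[1]), int(fields[2]))


def read_bed_gz(path: str) -> Iterator[BedInterval]:
    """Stream intervals from a gzipped BED file without unpacking it on disk."""
    with gzip.open(path, "rt") as handle:
        yield from parse_bed(handle)


def total_covered(intervals: Iterable[BedInterval]) -> int:
    """Sum interval lengths (overlapping intervals are counted twice)."""
    return sum(iv.end - iv.start for iv in intervals)
```

A call such as `total_covered(read_bed_gz("sites.bed.gz"))`, where the file name is a placeholder, would give the total number of bases in the listed sites.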

Files:


Sequence motifs and modules

Description: These files contain enriched sequence motifs in protein-protected sites as identified by MEME. Modules of co-occurring sequence motifs identified by hierarchical clustering are also included.

File format: Excel spreadsheets of sequence motifs, and gzipped archives of MEME output directories.

Files:


Genomic motif loci

Description: These files contain all genomic loci of the above sequence motifs (detected using FIMO).

File format: (Gzipped) archives of FIMO output directories.
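For readers who want to filter the motif loci after unpacking an archive, the following is a minimal sketch of parsing a FIMO tab-separated results table in Python. It assumes the table has a header row containing a `q-value` column, as FIMO text output typically does; the exact file name and column set inside these archives depend on the FIMO version used and are not specified here. The function name `parse_fimo` and the default threshold of 0.05 are hypothetical choices for illustration only.

```python
from typing import Dict, Iterable, Iterator


def parse_fimo(lines: Iterable[str], max_qvalue: float = 0.05) -> Iterator[Dict[str, str]]:
    """Yield FIMO hits (as dicts keyed by header name) with q-value <= max_qvalue."""
    header = None
    for line in lines:
        text = line.rstrip("\n")
        if not text.strip():
            continue  # skip blank lines
        if header is None:
            # The first non-blank line is taken as the header; some FIMO
            # versions prefix it with '#', which is stripped here.
            header = text.lstrip("#").split("\t")
            continue
        if text.startswith("#"):
            continue  # skip trailing comment lines
        record = dict(zip(header, text.split("\t")))
        qvalue = record.get("q-value", "")
        # Rows without a q-value (column absent or empty) are dropped.
        if qvalue and float(qvalue) <= max_qvalue:
            yield record
```

Because records are keyed by whatever header names the file provides, the same function should work whether the motif column is called `motif_id` or `pattern name`; only the `q-value` column name is assumed.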

Files:


Raw data


FASTQ files

The raw sequencing data are available from GEO under accession GSE49309.


Contact

If you have any questions or comments, please contact:

Mailing address:
204G Carolyn Lynch Laboratory
433 S. University Ave.
Department of Biology
University of Pennsylvania
Philadelphia, PA 19104 USA
Gregory Lab homepage