Global analysis of the RNA-protein interaction and RNA secondary structure landscapes of the Arabidopsis nucleus

Sager J. Gosai1,5, Shawn W. Foley1,2,5, Dongxue Wang3, Ian M. Silverman1,2, Nur Selamoglu1, Andrew D.L. Nelson4, Mark A. Beilstein4, Fevzi Daldal1, Roger B. Deal3, and Brian D. Gregory1,2

1Department of Biology
2Cell and Molecular Biology Graduate Group, University of Pennsylvania, Philadelphia, PA USA
3Department of Biology, Emory University, Atlanta, GA USA
4School of Plant Sciences, University of Arizona, Tucson, AZ USA
5These authors contributed equally to this work.


Post-transcriptional regulation in eukaryotes requires cis- and trans-acting features and factors, including RNA secondary structure and RNA-binding proteins (RBPs). However, a comprehensive view of the structural and RBP interaction landscape of RNAs in the nucleus has yet to be compiled for any organism. Here, we use our ribonuclease-mediated structure and RBP binding site mapping approach on Arabidopsis seedling nuclei in vivo to globally profile these features within the nuclear compartment. We reveal opposing patterns of secondary structure and RBP binding levels throughout native messenger RNAs that demarcate alternative splicing and polyadenylation. We also uncover a collection of protein-bound sequence motifs, and identify their structural contexts, their co-occurrences in transcripts encoding functionally related proteins, and their interactions with putative RBPs. Finally, we identify a nuclear role for the chloroplast RBP CP29A. In total, we provide the first simultaneous view of the RNA secondary structure and RBP interaction landscapes in a eukaryotic nucleus.


Genome browser

All of our sequencing data and protein-protected sites are available for browsing using the JBrowse genome browser:

Supplemental data

Protein-protected sites

Description: These files contain all protein-protected sites (PPSs) identified. Individual replicates are available from GEO (GSE#####).

File format: (Gzipped) BED format file.
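As a minimal sketch of how these files might be loaded, the Python snippet below reads a gzipped BED file using only the standard library. It assumes standard BED conventions (tab-separated columns, 0-based half-open coordinates, optional name and strand in columns 4 and 6); the column layout of the released files should be checked before relying on it, and the helper names here are hypothetical.

```python
import gzip
from typing import Iterator, NamedTuple, Optional


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: Optional[str]
    strand: Optional[str]


def read_bed_gz(path: str) -> Iterator[BedRecord]:
    """Yield records from a gzipped BED file, skipping header and blank lines.

    Hypothetical helper: assumes standard BED column order, with name in
    column 4 and strand in column 6 when those columns are present.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip blank lines, comments, and UCSC track/browser header lines
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            name = fields[3] if len(fields) > 3 else None
            strand = fields[5] if len(fields) > 5 else None
            yield BedRecord(fields[0], int(fields[1]), int(fields[2]), name, strand)
```

For example, `sum(r.end - r.start for r in read_bed_gz("pps.bed.gz"))` would give the total number of protein-protected bases, where `pps.bed.gz` is a placeholder for the downloaded file. Because BED intervals are half-open, `end - start` is the interval length with no off-by-one correction.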


Sequence motifs and modules

Description: These files contain enriched sequence motifs in protein-protected sites as identified by MEME and HOMER. Modules of co-occurring sequence motifs identified by hierarchical clustering are also included.

File format: Excel spreadsheets of sequence motifs, and gzipped archives of MEME and HOMER output directories.


Genomic motif loci

Description: These files contain all genomic loci of the above sequence motifs (detected using HOMER).

File format: Gzipped BED file of all HOMER motif loci.
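One way to relate the motif loci to the protein-protected sites is to count how many motif loci overlap at least one PPS. The sketch below is a hypothetical illustration, not the analysis used in the study: it assumes both sets have already been parsed from BED into `(chrom, start, end, strand)` tuples with 0-based half-open coordinates, and it assumes a strand-matched comparison. PPSs are merged per chromosome and strand, then each motif is tested with a binary search.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (chrom, start, end, strand) with 0-based, half-open BED coordinates
Interval = Tuple[str, int, int, str]


def merge_by_strand(intervals: Iterable[Interval]) -> Dict[Tuple[str, str], Tuple[List[int], List[int]]]:
    """Merge overlapping intervals per (chrom, strand) into sorted, disjoint start/end lists."""
    grouped: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end, strand in intervals:
        grouped[(chrom, strand)].append((start, end))
    merged = {}
    for key, spans in grouped.items():
        spans.sort()
        starts: List[int] = []
        ends: List[int] = []
        for start, end in spans:
            if ends and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)  # extend the current merged block
            else:
                starts.append(start)
                ends.append(end)
        merged[key] = (starts, ends)
    return merged


def count_overlapping(motifs: Iterable[Interval], pps: Iterable[Interval]) -> int:
    """Count motif loci overlapping at least one PPS on the same chromosome and strand."""
    index = merge_by_strand(pps)
    hits = 0
    for chrom, start, end, strand in motifs:
        if (chrom, strand) not in index:
            continue
        starts, ends = index[(chrom, strand)]
        # Last merged block that starts before the motif ends; since blocks are
        # disjoint and sorted, it has the largest end among all candidates.
        i = bisect_left(starts, end) - 1
        if i >= 0 and ends[i] > start:
            hits += 1
    return hits
```

Merging first keeps the query correct even when PPSs overlap one another, and it reduces each lookup to a single binary search. A tool such as bedtools would be the usual choice for genome-scale work.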


RNA affinity probes and qPCR primer sequences

Description: This file contains sequences for RNA probes used in UV crosslinking and RNA affinity purification, as well as primers for RT-qPCR.

File format: Excel sheet with sequences listed.


Peptides identified by mass spectrometry

Description: This file contains all peptides identified by mass spectrometry.

File format: Excel sheet of the identified peptides.


Raw data

FASTQ files

The raw sequencing data are available from GEO under accession number GSE58974.

Mass spectral data

Description: The raw spectral data for the candidate RNA-binding proteins.

File format: PDF of raw spectra output.



If you have any questions or comments, please contact:

Mailing address:
103D Carolyn Lynch Laboratory
433 S. University Ave.
Department of Biology
University of Pennsylvania
Philadelphia, PA 19104 USA
Gregory Lab homepage