Regulatory impact of RNA secondary structure across the Arabidopsis transcriptome

Fan Li1,2,3,*, Qi Zheng1,2,*, Lee E. Vandivier1,2,4, Matthew R. Willmann1,2, Ying Chen1,2,3, and Brian D. Gregory1,2,3,4

1Department of Biology
2Penn Genome Frontiers Institute
3Genomics and Computational Biology Graduate Group
4Cell and Molecular Biology Graduate Group, University of Pennsylvania, Philadelphia, PA 19104, USA
*These authors contributed equally to this work.


The secondary structure of an RNA molecule plays an integral role in its maturation, regulation, and functionality. However, the global influence of this feature on plant gene expression is still largely unclear. Here, we use a high-throughput, sequencing-based, structure-mapping approach in conjunction with transcriptome-wide sequencing of ribosomal RNA-depleted (RNA-seq), small (smRNA-seq), and ribosome-bound (ribo-seq) RNA populations to investigate the impact of RNA secondary structure on gene expression regulation in Arabidopsis. From this analysis, we find that highly unpaired and paired RNAs are strongly correlated with euchromatic and heterochromatic epigenetic histone modifications, respectively, providing further evidence that secondary structure is necessary for RNA-mediated posttranscriptional regulatory pathways. Additionally, we uncover key structural patterns across protein-coding transcripts that indicate RNA folding demarcates regions of protein translation and likely affects microRNA-mediated regulation of mRNAs in this model plant. We further reveal that RNA folding is significantly anti-correlated with overall transcript abundance, which is likely due to the increased propensity of highly structured mRNAs to be degraded and/or processed into smRNAs. Finally, we find that secondary structure affects mRNA translation, suggesting that this feature regulates plant gene expression at multiple levels. Overall, our findings provide the first global assessment of RNA folding and its significant regulatory effects in a plant transcriptome.

Genome browser

Update (5/23/13): We are now migrating to the JBrowse genome browser. You can view our sequencing data using JBrowse, or using the existing AnnoJ link below.

All of our sequencing data are available for browsing using the AnnoJ genome browser:

For users unfamiliar with the AnnoJ browser, this demo video will be helpful.

Supplemental data


Description: These files contain all dsRNA and ssRNA 'hotspots' identified in A. thaliana.

File format: (Gzipped) BED format file.
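As a minimal sketch of how a gzipped BED file such as the hotspot files could be read, the Python below parses the three required BED columns (chromosome, 0-based start, exclusive end) and keeps any further columns as-is. The function names, the `BedRecord` type, and the file name in the usage note are hypothetical; the exact number of columns in the hotspot files is not stated here, so nothing beyond the first three is assumed.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple, Tuple


class BedRecord(NamedTuple):
    chrom: str
    start: int               # 0-based, inclusive (BED convention)
    end: int                 # 0-based, exclusive (BED convention)
    extra: Tuple[str, ...]   # any optional columns, left unparsed


def parse_bed(lines: Iterable[str]) -> Iterator[BedRecord]:
    """Yield one BedRecord per data line, skipping blank and header lines."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        parts = line.split()
        yield BedRecord(parts[0], int(parts[1]), int(parts[2]), tuple(parts[3:]))


def read_gzipped_bed(path: str) -> Iterator[BedRecord]:
    """Open a gzip-compressed BED file in text mode and parse it lazily."""
    with gzip.open(path, "rt") as handle:
        yield from parse_bed(handle)


def total_length(records: Iterable[BedRecord]) -> int:
    """Sum interval lengths (overlapping intervals are counted twice)."""
    return sum(r.end - r.start for r in records)
```

For example, `total_length(read_gzipped_bed("hotspots.bed.gz"))` would report the total number of nucleotides covered by the intervals in a file with that (hypothetical) name.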


RPKM values

Description: These files contain RPKM data for various sequencing libraries.

File format: Tab-delimited plain text file.
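For readers unfamiliar with the unit, RPKM stands for reads per kilobase of transcript per million mapped reads. The sketch below shows the standard definition only; it is not taken from the authors' pipeline, and the function and parameter names are illustrative.

```python
def rpkm(read_count: float, transcript_length_nt: int, total_mapped_reads: int) -> float:
    """Standard RPKM: reads per kilobase of transcript per million mapped reads.

    Equivalent to read_count / (length_in_kb * mapped_reads_in_millions).
    """
    if transcript_length_nt <= 0:
        raise ValueError("transcript_length_nt must be positive")
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return read_count * 1e9 / (transcript_length_nt * total_mapped_reads)
```

With 100 reads on a 2,000 nt transcript in a library of 10 million mapped reads, this gives 100 / (2 kb × 10 M) = 5 RPKM.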


Structure models

Description: These files contain models of RNA secondary structures for protein coding transcripts, either folded using default RNAfold settings ("native") or with additional constraints determined by our sequencing data ("constrained").

File format: Gzip files. These will extract into a directory that contains files for each transcript.


Structure scores

Description: These files contain either raw or standardized log-ratio structure scores for protein coding transcripts.

File format: Gzip files. These will extract into a directory that contains files for each transcript.


Raw data

GEO links

Description: Raw sequencing data are available from GEO.


Mapped reads

Description: Mapped reads in BED format are also available for download.

File format: (Gzipped) BED format files. Note that the 'score' column of these BED files represents the number of times each sequence was cloned.
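Because the 'score' column holds the number of times each sequence was cloned, counting lines in these files gives distinct sequences rather than total reads. The sketch below, with a hypothetical function name and made-up example data, sums the score column (the fifth column in standard BED order) per chromosome. It assumes the mapped-read files follow the standard column order and that scores are integers.

```python
from collections import Counter
from typing import Iterable


def clone_counts_by_chrom(lines: Iterable[str]) -> Counter:
    """Sum the BED score column (column 5) per chromosome.

    Each data line contributes its clone count, not 1, to the total.
    """
    totals: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        parts = line.split()
        if len(parts) < 5:
            continue  # no score column on this line; skip it
        totals[parts[0]] += int(parts[4])
    return totals


# Made-up example lines: chrom, start, end, name, score, strand
example = [
    "Chr1\t0\t20\tseqA\t3\t+",
    "Chr1\t5\t25\tseqB\t1\t-",
    "Chr2\t10\t30\tseqC\t7\t+",
]
counts = clone_counts_by_chrom(example)
```

Here `counts["Chr1"]` is 4 and `counts["Chr2"]` is 7, even though the example has only three lines.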


7-way flowering plant phastCons scores

Description: These files contain per-nucleotide phastCons scores for Arabidopsis thaliana (TAIR9 assembly). The other plants used in the multiple alignment are:

  • Glycine max (Glyma1 assembly)
  • Populus trichocarpa (v1.0 assembly)
  • Sorghum bicolor (v1.0 assembly)
  • Medicago truncatula (Mt3 assembly)
  • Oryza sativa (release 6.1)
  • Vitis vinifera (Genoscope 8.4× release)

If you use these scores in your work, please cite:

  • Q. Zheng, P. Ryvkin, F. Li, I. Dragomir, O. Valladares, J. Yang, K. Cao, L.S. Wang, and B.D. Gregory. "Genome-wide double-stranded RNA sequencing reveals the functional significance of base-paired RNAs in Arabidopsis". PLoS Genet. 2010 Sep 30;6(9):e1001141. doi: 10.1371/journal.pgen.1001141.
  • F. Li, Q. Zheng, L.E. Vandivier, M.R. Willmann, Y. Chen, and B.D. Gregory. "Regulatory impact of RNA secondary structure across the Arabidopsis thaliana transcriptome". Plant Cell. 2012 Nov;24(11):4346-59. doi: 10.1105/tpc.112.104232.

File format: (Gzipped) WIG files. Click here for more details.
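A minimal sketch of reading per-nucleotide scores from WIG text is shown below. It handles the two standard WIG declarations, `fixedStep` and `variableStep`, and ignores the optional `span` parameter (that is, it assumes one value per nucleotide, as the description above suggests). Which declaration these files use is not stated here, and the function names are hypothetical. For a gzipped file, the lines can be supplied by `gzip.open(path, "rt")`.

```python
from typing import Iterable, Iterator, Tuple


def parse_wig(lines: Iterable[str]) -> Iterator[Tuple[str, int, float]]:
    """Yield (chrom, position, score); positions are 1-based, as in WIG."""
    chrom, mode, pos, step = None, None, 0, 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith(("fixedStep", "variableStep")):
            tokens = line.split()
            mode = tokens[0]
            params = dict(tok.split("=", 1) for tok in tokens[1:])
            chrom = params["chrom"]
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params.get("step", 1))
            continue
        if mode == "fixedStep":
            yield chrom, pos, float(line)
            pos += step
        elif mode == "variableStep":
            position, value = line.split()
            yield chrom, int(position), float(value)


def mean_score(records, chrom: str, start: int, end: int) -> float:
    """Mean score over a 1-based, inclusive region; raises if no data there."""
    values = [s for c, p, s in records if c == chrom and start <= p <= end]
    if not values:
        raise ValueError("no scores in the requested region")
    return sum(values) / len(values)
```

Positions with no entry in the file are simply absent from the output, so `mean_score` averages only over nucleotides that have a score.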



If you have any questions or comments, please contact:

Mailing address:
204G Carolyn Lynch Laboratory
433 S. University Ave.
Department of Biology
University of Pennsylvania
Philadelphia, PA 19104 USA
Gregory Lab homepage