The 3′ untranslated regions (3′UTRs) of eukaryotic genes regulate mRNA stability, localization and translation. Here, we present evidence that large numbers of 3′UTRs in human, mouse and fly are also expressed separately from the associated protein-coding sequences to which they are normally linked, likely by post-transcriptional cleavage. Analysis of CAGE (capped analysis of gene expression), SAGE (serial analysis of gene expression) and cDNA libraries, as well as microarray expression profiles, demonstrates that the independent expression of 3′UTRs is a regulated and conserved genome-wide phenomenon. We characterize the expression of several 3′UTR-derived RNAs (uaRNAs) in detail in mouse embryos, showing by in situ hybridization that these transcripts are expressed in a cell- and subcellular-specific manner. Our results suggest that 3′UTR sequences can function not only in cis to regulate protein expression, but also intrinsically and independently in trans, likely as noncoding RNAs, a conclusion supported by a number of previous genetic studies. Our findings suggest novel functions for 3′UTRs and warrant caution in the use of 3′UTR sequence probes to analyze gene expression.
INTRODUCTION
The 3′ untranslated regions (3′UTRs) of messenger RNAs (mRNAs) affect the expression of eukaryotic genes by regulating mRNA translation, stability and subcellular localization (1). 3′UTRs are typically defined by cDNA cloning, which shows that they are contiguous with the upstream protein-coding region in the mRNA. The length of 3′UTRs has undergone a massive expansion during metazoan evolution, with annotated 3′UTRs in human and mouse rivaling the average size of protein-coding sequences and in some cases exceeding 10 kb (2,3). Furthermore, 3′UTRs are highly conserved and contain some of the most conserved elements within the mammalian genome (4). Together, these observations suggest that 3′UTRs have assumed an increasingly important role in the evolution of the eukaryotic genome.
*To whom correspondence should be addressed. Tel: +61 7 3346 2079; Fax: +61 7 3346 2101; Email: email@example.com
The authors wish it to be known that, in their opinion, the first four authors should be regarded as joint First Authors.
Published online 12 November 2010. Nucleic Acids Research, 2011, Vol. 39, No. 6, 2393–2403. doi:10.1093/nar/gkq1158
© The Author(s) 2010. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The control of mRNA expression by 3′UTRs is mediated by trans-acting factors, including RNA-binding proteins and microRNAs (miRNAs), which interact with cis-regulatory elements within the 3′UTR (1). The post-transcriptional regulation mediated by 3′UTRs is crucial for the correct spatial and temporal expression of the protein encoded by the mRNA. Indeed, the importance of regulation by 3′UTRs was recently highlighted by the finding that 3′UTRs are reduced in length in proliferating cells, which in some cases was shown to mediate increased expression of the associated mRNA (5). Interestingly, the analysis of transcription start sites has suggested that transcription may also be initiated from within 3′UTR sequences, which may therefore act as a source of independent transcripts (6) that exhibit expression patterns different from those of their upstream protein-coding sequences. Here, we show that the 5′ termini of many RNAs map within 3′UTRs of genes in human, mouse and fly, and verify the separate and developmentally regulated expression of 3′UTR-associated RNAs (which we have termed uaRNAs) by a range of in silico and molecular biology approaches, including in situ hybridization (ISH).
Furthermore, we present evidence that a portion of these distinctively expressed 3′UTRs arises by post-transcriptional processing rather than new transcription initiation. Our results, supported by previous genetic studies of several individual genes, suggest that there is trans-acting embedded genetic information in 3′UTRs with potential biological function.
MATERIALS AND METHODS
Capped analysis of gene expression/serial analysis of gene expression analysis
Analyses were performed using RefSeq (7) gene annotations and the hg18, mm8 and dm3 genome assemblies provided within the UCSC Genome Browser (8). Human and mouse capped analysis of gene expression (CAGE) tags retrieved from RIKEN (http://fantom3.gsc.riken.jp/) and fruit fly serial analysis of gene expression (SAGE) tags retrieved from MachiBase (9) were mapped to the genome with ZOOM, requiring exact and unique matches (10). Syntenic locations of mouse 3′UTR CAGE tags in the human genome were identified using the LiftOver utility (8). Mouse CAGE tags that mapped to the same site as human CAGE tags were defined as conserved.
Full-length cDNA analysis
Full-length human and mouse cDNA sequences were retrieved from RIKEN (http://fantom3.gsc.riken.jp/). Putative uaRNAs were identified by intersecting 5′ cDNA coordinates with RefSeq-annotated 3′UTRs. The CRITICA algorithm (11) was used to identify non-protein-coding sequences from the RIKEN FANTOM3 full-length mouse cDNA library as described previously (12).
UaRNA transcription initiation analysis
Deep sequencing tags derived from H3K4me1, H3K4me2, H3K4me3, H3K27ac and RNAPII immunoprecipitation for resting CD4+ cells (13) were obtained from the NCBI Short Read Archive (accession IDs SRA000234 and SRA000287) and mapped to the human genome (hg18) with ZOOM, requiring exact and unique matches (10).
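The intersection step of the full-length cDNA analysis above can be illustrated with a minimal Python sketch. It is not the code used in the study. The function names (`five_prime_end`, `index_utrs`, `utr_hits`, `putative_uarnas`), the 0-based half-open coordinate convention, the requirement that the cDNA and 3′UTR lie on the same strand, and the toy coordinates are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def five_prime_end(start, end, strand):
    """Return the 5' terminal position of a transcript.

    Assumes 0-based, half-open [start, end) coordinates (a convention
    chosen for this sketch, not stated in the text).
    """
    return start if strand == "+" else end - 1


def index_utrs(utrs):
    """Group 3'UTR intervals by (chrom, strand), sorted by start."""
    index = defaultdict(list)
    for chrom, strand, start, end, gene in utrs:
        index[(chrom, strand)].append((start, end, gene))
    for intervals in index.values():
        intervals.sort()
    return index


def utr_hits(index, chrom, strand, pos):
    """Return genes whose 3'UTR on the same strand contains pos."""
    intervals = index.get((chrom, strand), [])
    # Only intervals that start at or before pos can contain it.
    n_candidates = bisect_right(intervals, (pos, float("inf"), ""))
    return [gene for start, end, gene in intervals[:n_candidates]
            if start <= pos < end]


def putative_uarnas(cdnas, utrs):
    """Pair each cDNA with every 3'UTR that contains its 5' end."""
    index = index_utrs(utrs)
    hits = []
    for cdna_id, chrom, strand, start, end in cdnas:
        pos = five_prime_end(start, end, strand)
        for gene in utr_hits(index, chrom, strand, pos):
            hits.append((cdna_id, gene))
    return hits


# Toy, made-up coordinates for illustration only.
utrs = [
    ("chr1", "+", 100, 200, "GeneA"),
    ("chr1", "-", 150, 300, "GeneB"),
]
cdnas = [
    ("cdna1", "chr1", "+", 120, 190),  # 5' end at 120, inside GeneA 3'UTR
    ("cdna2", "chr1", "-", 100, 250),  # 5' end at 249, inside GeneB 3'UTR
    ("cdna3", "chr1", "+", 50, 180),   # 5' end at 50, upstream of GeneA 3'UTR
]
print(putative_uarnas(cdnas, utrs))
```

The candidate scan after the binary search is linear, which is adequate for a sketch; a genome-scale analysis would more plausibly use an interval tree or an existing interval-intersection tool.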
To determine enrichment of chromatin marks at uaRNA or mRNA initiation sites, the mapping position of each sequencing tag relative to the nucleotide with the highest CAGE tag frequency within the 3′UTR or promoter was plotted over a ±50-nt window. CAGE tags spanning exon–exon junctions (EEJs) were identified by mapping tags that lacked a perfect match to the genome to EEJ sequences located within RefSeq-annotated 3′UTRs; each EEJ sequence comprised 20 nt on either side of the splice site.
CAGE expression analysis
To determine the dynamic expression of 3′UTR CAGE tags across eight mouse tissues (embryo, lung, liver, visual cortex, somatosensory cortex, cerebellum and hippocampus) (14) and six time points during the differentiation of the human THP1 myelomonocytic leukemia cell line (15), we summed the total normalized CAGE tag frequency for each 3′UTR. The 500 genes that contained the highest frequency of 3′UTR CAGE tags were clustered using the Cluster utility (16). For human genes, 3′UTR CAGE tag frequency was normalized to the median across the time series. CAGE tag frequencies were log transformed and visualized as a heat map. For this gene subset, CAGE tag frequencies in 3′UTRs were compared with the CAGE tag frequency in the promoter. Promoter expression levels were defined as the sum of CAGE tags within the promoter region (±50-nt window around the RefSeq-annotated transcription start site). The ratio of promoter to 3′UTR expression levels was calculated and visualized as a heat map alongside the expression clusters.
In situ hybridization
Section in situ hybridization (ISH) on paraffin-embedded whole-mouse embryos, sectioned at 7 µm, was performed as described previously (17). The genomic coordinates and lengths of the ISH probes used are shown in the Supplementary Data.
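The arithmetic of the CAGE expression analysis described above (window sums, selection of the top genes, median normalization, log transformation and promoter-to-3′UTR ratios) can be sketched in Python as follows. This is a hedged illustration rather than the pipeline used in the study. All function names are hypothetical, and the log base (2), the pseudocount of 1 and the direction of the ratio (promoter divided by 3′UTR) are assumptions, because the text does not specify them. The clustering step performed with the Cluster utility is not reproduced.

```python
import math
from statistics import median


def window_sum(tag_freq, center, half_width=50):
    """Sum tag frequencies at positions within +/- half_width of center.

    tag_freq maps a genomic position to a normalized tag frequency.
    """
    return sum(freq for pos, freq in tag_freq.items()
               if abs(pos - center) <= half_width)


def top_genes(utr_levels, n=500):
    """Return the n genes with the highest total 3'UTR tag frequency.

    utr_levels maps a gene name to its per-sample 3'UTR tag frequencies.
    """
    ranked = sorted(utr_levels, key=lambda g: sum(utr_levels[g]), reverse=True)
    return ranked[:n]


def median_normalize(series):
    """Divide each value by the median of the series."""
    m = median(series)
    if m == 0:
        raise ValueError("median is zero; cannot normalize")
    return [value / m for value in series]


def log_transform(series, pseudocount=1.0):
    """Return log2(value + pseudocount); base and pseudocount are assumptions."""
    return [math.log2(value + pseudocount) for value in series]


def promoter_to_utr_ratio(promoter, utr, pseudocount=1.0):
    """Per-sample promoter / 3'UTR ratio, with a pseudocount to avoid
    division by zero (an assumption of this sketch)."""
    return [(p + pseudocount) / (u + pseudocount)
            for p, u in zip(promoter, utr)]


# Toy, made-up values for illustration only.
utr_levels = {"GeneA": [2.0, 4.0, 8.0], "GeneB": [1.0, 1.0, 1.0]}
for gene in top_genes(utr_levels, n=1):
    heat_map_row = log_transform(median_normalize(utr_levels[gene]))
    print(gene, heat_map_row)
```

Keeping each transformation in its own function makes it straightforward to swap the assumed pseudocount or log base for whatever a given dataset requires.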