SURVEY AND SUMMARY Design and bioinformatics analysis of genome-wide CLIP experiments

Abstract

The past decades have witnessed a surge of discoveries revealing RNA regulation as a central player in cellular processes. RNAs are regulated by RNA-binding proteins (RBPs) at all post-transcriptional stages, including splicing, transportation, stabilization and translation. Defects in the functions of these RBPs underlie a broad spectrum of human pathologies. Systematic identification of RBP functional targets is among the key biomedical research questions and provides a new direction for drug discovery. The advent of cross-linking immunoprecipitation coupled with high-throughput sequencing (genome-wide CLIP) technology has recently enabled the investigation of genome-wide RBP–RNA binding at single base-pair resolution. This technology has evolved through the development of three distinct versions: HITS-CLIP, PAR-CLIP and iCLIP. Meanwhile, numerous bioinformatics pipelines for handling the genome-wide CLIP data have also been developed. In this review, we discuss the genome-wide CLIP technology and focus on bioinformatics analysis. Specifically, we compare the strengths and weaknesses, as well as the scopes, of various bioinformatics tools. To assist readers in choosing optimal procedures for their analysis, we also review experimental design and procedures that affect bioinformatics analyses.

*To whom correspondence should be addressed. Tel: +1 214 648 5178; Fax: +1 214 648 5120; Email: yang.xie@utsouthwestern.edu

© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Nucleic Acids Research Advance Access published May 9, 2015.

INTRODUCTION

The diversity of RNA in sequence and structure underpins much of cell heterogeneity and complexity. RNA-binding proteins (RBPs) are proteins that bind to double- or single-stranded RNAs in cells and form ribonucleoprotein complexes with the bound RNAs. Located in either the nucleus or cytoplasm, or both, they engage in every step of the post-transcriptional modification process, including alternative splicing, regulation of mRNA levels, transport between cellular compartments, alternative polyadenylation, transcript stability, etc. (1,2). For example, the TIAR protein has been shown to be transported from the nucleus to the cytoplasm during Fas-mediated apoptotic cell death (3). One example of an intra-nuclear RBP is Yra1p, which has been found to be involved in mRNA export (4). Cytoplasmic RBPs, on the other hand, include Unr, which has been shown to be required for internally initiating the translation of human rhinovirus RNA (5).

RBPs bind target RNAs by recognizing their sequences and/or RNA secondary structures through RNA-binding motifs. For example, the AUF1 protein recognizes RNAs through a signature motif composed of 29–39 nt with high A and U contents and a secondary structure specific to the RNAs (6). Binding of RBPs with RNA targets can also be regulated through competition with other RBPs and noncoding RNAs (7,8). RBPs may influence the global coordination of gene expression by organizing nascent groups of RNAs into downstream chains of the post-transcriptional modification process, through what is known as the ‘RNA operon’ theory (9).

RBPs have been implicated in various types of human diseases (1,10–13). For instance, the RBP Musashi1 was found to be related to many cancer types, including those of the breast, colon, medulloblastoma and glioblastoma, as well as to neurogenesis and neurodegenerative diseases (13).
In addition, lack of Fragile X mental retardation protein (FMRP) results in a deficiency in human cognition and premature ovarian insufficiency (14), and the FUS, EWSR1 and TAF15 (FET) protein family is responsible for RNA editing and plays important roles in many diseases (15,16). In summary, studying RNA–protein interactions is necessary to achieve a systematic understanding of transcription, translation and other biological processes.

CLIP (cross-linking immunoprecipitation) is a molecular biology technology that employs ultraviolet (UV) cross-linking and immunoprecipitation to identify RBP–RNA interactions (17,18). The advantage of CLIP is that it identifies interactions formed within cells, where the cross-linking occurs, rather than interactions that might form after the cells are lysed. CLIP therefore increases the confidence that observed interactions are physiologically relevant and can better justify the selection of candidates for experimental validation. In early reports, CLIPed cDNAs were sequenced in a low-throughput manner that yielded a few hundred sequence reads. Recently, next-generation sequencing (NGS) techniques have been applied to the global analysis of transcriptional and post-transcriptional regulation, including mRNA sequencing (19), alternative splicing (20) and miRNA profiling (21). The combination of CLIP with NGS technology has greatly improved our ability to study RBP–RNA interactions on the genome scale (22). While earlier genome-wide CLIP studies focused more on the binding of RBPs to mRNAs, recent studies have implicated a wide range of regulatory functions of RBP binding sites in long noncoding RNA (lncRNA) (23), circular RNA (24) and mitochondrial RNA (25). In this review, we first describe the general procedure and then compare current genome-wide CLIP technologies. Next, we discuss the major experimental design and bioinformatics analysis considerations.
Finally, we provide an overview of the current analysis software and databases for genome-wide CLIP data.

Current genome-wide CLIP technologies

There are three major technologies for genome-wide CLIP experiments: (i) HITS-CLIP (high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation) (22,26), which is the first version of genome-wide CLIP-Seq technology; (ii) Photoactivatable-Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation (PAR-CLIP) (27), which improved the signal-to-noise ratio of the characteristic mutations observed in sequencing data through the use of nucleoside analogs; and (iii) Individual-nucleotide resolution CLIP (iCLIP) (28), which achieved a much higher efficiency in reverse transcription compared with HITS-CLIP and PAR-CLIP. Throughout this text, we use genome-wide CLIP as a generic name for HITS-CLIP, PAR-CLIP and iCLIP. The field of RNA regulation has seen rapid growth for all versions of genome-wide CLIP technology (Figure 1). In general, genome-wide CLIP technology involves cross-linking, partial RNA digestion,
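
The characteristic mutations that PAR-CLIP exploits can be illustrated with a small sketch. It assumes background knowledge not stated in the text above: when 4-thiouridine is the nucleoside analog, cross-linked positions tend to appear as T-to-C mismatches in the sequenced reads. It further assumes that reads have already been aligned without gaps, on the same strand as the reference segment. The function names `count_t_to_c` and `conversion_rates` and the `min_coverage` parameter are hypothetical and do not come from any published pipeline.

```python
from collections import Counter


def count_t_to_c(aligned_pairs):
    """Tally T->C mismatches and read coverage at reference T positions.

    aligned_pairs: iterable of (ref_start, ref_seq, read_seq) tuples, where
    ref_seq and read_seq are gap-free strings of equal length (assumption).
    Returns two Counters keyed by reference coordinate.
    """
    conversions = Counter()  # reads showing C where the reference has T
    coverage = Counter()     # reads covering each reference T
    for ref_start, ref_seq, read_seq in aligned_pairs:
        if len(ref_seq) != len(read_seq):
            raise ValueError("reference and read segments must be the same length")
        for offset, (ref_base, read_base) in enumerate(
            zip(ref_seq.upper(), read_seq.upper())
        ):
            if ref_base != "T":
                continue  # only reference T positions can show a T->C conversion
            position = ref_start + offset
            coverage[position] += 1
            if read_base == "C":
                conversions[position] += 1
    return conversions, coverage


def conversion_rates(conversions, coverage, min_coverage=2):
    """Fraction of covering reads with a T->C mismatch, per position.

    Positions covered by fewer than min_coverage reads are skipped, because
    a rate based on a single read is not informative.
    """
    return {
        position: conversions[position] / depth
        for position, depth in coverage.items()
        if depth >= min_coverage
    }
```

A real analysis would read alignments from BAM files and would need to separate genuine conversions from sequencing errors and SNPs; this sketch only shows the counting step.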

Cite this paper

@inproceedings{Wang2015SURVEYAS, title={SURVEY AND SUMMARY Design and bioinformatics analysis of genome-wide CLIP experiments}, author={Tao Wang and Guanghua Xiao and Yongjun Chu and Michael Q. Zhang and David R. Corey and Yang Xie}, year={2015} }