Genome-Wide Scans for Candidate Genes Involved in the Aquatic Adaptation of Dolphins

Abstract

Since their divergence from the terrestrial artiodactyls, cetaceans have fully adapted to an aquatic lifestyle, which represents one of the most dramatic transformations in mammalian evolutionary history. Numerous morphological and physiological characters of cetaceans have been acquired in response to this drastic habitat transition, such as thickened blubber, echolocation, and the ability to hold their breath for long periods. However, knowledge about the molecular basis underlying these adaptations is still limited. The genome sequence of Tursiops truncatus provides an opportunity for comparative genomic analyses to examine the molecular adaptation of this species. Here, we constructed 11,838 high-quality orthologous gene alignments culled from the dolphin and four terrestrial mammalian genomes and screened for positive selection occurring in the dolphin lineage. A total of 368 genes (3.1%) were identified as having undergone positive selection by the branch-site model. Functional characterization of these genes showed that they are significantly enriched in the categories of lipid transport and localization, ATPase activity, sensory perception of sound, and muscle contraction, areas that are potentially related to cetacean adaptations. In contrast, we did not find a similar pattern in the cow, a closely related species. We re-sequenced some of the positively selected sites (PSSs) within the positively selected genes (PSGs) and showed that most of our identified PSSs (50/52) could be replicated. The results from this study should have important implications for our understanding of cetacean evolution and their adaptations to the aquatic environment.

Introduction

Cetaceans diverged from artiodactyls approximately 50 million years ago (Meredith et al. 2011), and their habitat transition from land to an aquatic environment represents one of the most dramatic transformations in mammalian evolutionary history.
This transition inevitably posed challenges for the ancient cetaceans, which had originally been adapted for terrestrial life, with locomotion (navigation) and detection of prey being the major ones. For locomotion, they needed to confront the considerable resistance of water, whose density is much higher than that of air. To overcome drag, cetaceans have evolved some extreme changes in morphology and physiology, including a streamlined form and a modified skeletal system (Fish, Beneski, Ketten 2007; Reidenberg 2007). In addition, most cetaceans possess a thick layer of blubber, which increases their buoyancy (Struntz et al. 2004). For foraging in water, these mammals often need to hunt at night or in deep water; therefore, it is vital for them to possess superior capabilities for prolonged diving and for locating prey. It is striking that some cetacean species have acquired an ability to echolocate that enables them to use sound to locate prey or avoid obstacles when navigating (Cranford, Amundin, Norris 1996). Moreover, cetaceans have elevated levels of myoglobin in their skeletal muscles (Noren et al. 2001; Wright, Davis 2006), which vastly increases their capacity to store oxygen, allowing for a longer time between breaths. They also utilize glycolytic metabolism to compensate for insufficient levels of oxygen (Butler, Jones 1997), which potentially supports the energy supply for their long dives.

Considering the significant phenotypic modifications in cetaceans, it should be expected that these modifications were shaped by natural selection and conferred a selective advantage as cetaceans adapted to the new aquatic environment. What are the underlying molecular mechanisms for these innovations? Positive Darwinian selection is one of the major driving forces for adaptive evolution and species diversification, and it has been widely investigated in many species (Kosiol et al. 2008; Lefebure, Stanhope 2009; Shen et al. 2010; Oliver et al. 2011; Wissler et al. 2011; McGowen, Grossman, Wildman 2012). A few studies have focused on the adaptive evolution of marine mammals (McClellan et al. 2005; Wang et al. 2009); however, as the complete genomes of marine mammals were not available at that time, the datasets analyzed in these studies were limited to only a few genes (e.g., cytB and HoxD). Genome-wide identification of positively selected genes (PSGs) along the marine mammal lineage should greatly help us understand the genetic bases underlying adaptive evolution in marine mammals. The genome sequence of the bottlenose dolphin (Tursiops truncatus) provides an opportunity to conduct this analysis.

A series of evolutionary models for testing positive selection have been developed in the past decade, including the branch model, the site model, and the branch-site model (Yang 1998; Yang et al. 2000; Yang, Nielsen 2002; Yang, Wong, Nielsen 2005; Zhang, Nielsen, Yang 2005). In the first two models, positive selection is inferred only if the dN/dS average over all sites or all branches is significantly greater than 1. Positive selection, however, often operates episodically on only a small number of sites on a few lineages (Yang, Nielsen 2002), which limits the power of the branch and site models to detect it. The more powerful branch-site model was developed to address this issue (Yang, Nielsen 2002; Zhang, Nielsen, Yang 2005) and has been widely used in screens for positive selection (cf. Bakewell, Shi, Zhang 2007; Kosiol et al. 2008; Studer et al. 2008; Shen et al. 2010).
Here, we constructed whole-genome ortholog gene sets among five mammalian species, including dolphin (Tursiops truncatus), cow (Bos taurus), dog (Canis familiaris), panda (Ailuropoda melanoleuca), and human (Homo sapiens), and identified positively selected genes (PSGs) along the dolphin lineage with the improved branch-site model to build a database of genes that might be correlated with aquatic adaptation in the dolphin. As the current release of the dolphin genome has only 2.59× coverage, there are limitations for comparative genomic analyses, especially the detection of positive selection. Sequencing errors, problems with annotation, alternative splicing, amino acid repeats, and frameshift mutations could generate a higher rate of false positives with the branch-site model (Mallick et al. 2009; Schneider et al. 2009; Markova-Raina, Petrov 2011); therefore, generating accurate alignments is an essential step in the inference of positive selection. The Prank software (Loytynoja, Goldman 2005; Loytynoja, Goldman 2008) was recently reported to generate much more accurate alignments than other traditional aligners (Fletcher, Yang 2010; Markova-Raina, Petrov 2011), so we used this algorithm to align all the genes used in this study. Moreover, we re-sequenced some of the candidate PSS regions to confirm their reliability. We show that most (50/52) of our identified PSSs are reliable. Through a functional clustering analysis of the dolphin PSGs, we found that they are enriched for categories such as lipid transport and localization, ATPase activity, perception of sound, and muscle contraction.

Material and Methods

Coding region sequences of individual genes from the genomes of the dolphin and other species were downloaded from Ensembl (version 66, March 2012) using the BioMart tool (Vilella et al. 2009).
The species used here for comparison with dolphin include cow (Bos taurus, UMD3.1), dog (Canis lupus familiaris, CanFam_2.0), panda (Ailuropoda melanoleuca, ailMel1), and human (Homo sapiens, GRCh37.p6). A phylogenetic tree of these species, derived from Murphy et al. (2007), is shown in Figure 1. To predict homologs among the five genomes, we used the Ensembl inferences (Vilella et al. 2009). For each pair of these genomes, only those genes that Ensembl annotated as one2one orthologous genes were retrieved and analyzed in the following step. If a gene had multiple transcripts, then the longest one was chosen. After these treatments, we obtained 12,057 gene sets.

The Prank program (Loytynoja, Goldman 2005; Loytynoja, Goldman 2008) was used to align all of the gene sets. Since Prank performs much better at the codon level than at the amino-acid level for protein-coding sequences (Fletcher, Yang 2010), all of the genes were directly aligned at the codon level with the option "-codon". After the alignments were generated, we performed a trimming treatment to remove potentially unreliable regions using the Gblocks program (Castresana 2000). The parameters used were the default settings with the sequence type set to codon ("-t=c"). In addition, to reduce the effect of uncertain bases on the inference of positive selection, we deleted all positions that had gaps ("-") or "N" from the alignments. After the trimming process, if the remaining alignment was shorter than 120 bp (40 codons), then the entire alignment was discarded.

In addition to alignment uncertainty, saturation at silent sites (dS) may also bias the inference of positive selection (Smith, Smith 1996). To identify saturation, for each gene the third codon positions were extracted and branch lengths on the species tree were estimated using the GTR model with PAML (Yang 2007).
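The post-alignment filtering steps described above (removing positions with gaps or "N", discarding alignments shorter than 120 bp, and extracting third codon positions for the saturation check) can be sketched roughly as follows. This is a minimal Python illustration, not the authors' pipeline: the function names are hypothetical, the alignment is assumed to be an in-frame codon alignment held in a dictionary, and removing whole codons (rather than single columns) is an assumption made here to preserve the reading frame.

```python
from typing import Dict, Optional

MIN_LENGTH_BP = 120  # 40 codons, the cutoff stated in the text
UNCERTAIN = {"-", "N"}  # gap and ambiguous-base symbols


def drop_gap_and_n_codons(alignment: Dict[str, str]) -> Dict[str, str]:
    """Remove every codon column in which any sequence has a gap or N.

    Assumption: whole codons are removed so the reading frame is kept.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    if length % 3 != 0:
        raise ValueError("codon alignment length must be a multiple of 3")

    kept = {name: [] for name in alignment}
    for start in range(0, length, 3):
        codons = {n: s[start:start + 3].upper() for n, s in alignment.items()}
        if any(base in UNCERTAIN for c in codons.values() for base in c):
            continue  # skip codon columns containing uncertain bases
        for name, codon in codons.items():
            kept[name].append(codon)
    return {name: "".join(parts) for name, parts in kept.items()}


def filter_alignment(alignment: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the cleaned alignment, or None if it is shorter than 120 bp."""
    cleaned = drop_gap_and_n_codons(alignment)
    remaining = len(next(iter(cleaned.values())))
    return cleaned if remaining >= MIN_LENGTH_BP else None


def third_codon_positions(alignment: Dict[str, str]) -> Dict[str, str]:
    """Extract the third position of every codon from each sequence."""
    return {name: seq[2::3] for name, seq in alignment.items()}
```

In such a sketch, an alignment for which `filter_alignment` returns `None` would be dropped, and the output of `third_codon_positions` would be the input to the branch-length estimation.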
Branch lengths were used as a proxy for saturation, and genes were removed from the analysis if one or more branches had a length ≥ 1. Our final dataset contained 11,838 genes.

For each of the remaining genes, a branch-site evolutionary analysis for positive selection was conducted using codeml from the PAML package (Yang 2007). In this study, the improved branch-site model (Yang, Nielsen 2002; Zhang, Nielsen, Yang 2005) was used. This model requires that the branches of the tree be divided a priori into foreground and background lineages. A likelihood ratio test (LRT) compares a model with positive selection on the foreground branch to a null model in which no positive selection occurred on the foreground branch, and calculates the statistic $2\Delta\ln L = 2(\ln L_1 - \ln L_0)$, where $L_1$ and $L_0$ are the maximized likelihoods of the alternative and null models, to obtain a P-value. In this study, genes were inferred to be PSGs only if the P-value was less than 0.01. This model can also infer positively selected sites (PSSs) based on an empirical Bayes analysis (Yang, Wong, Nielsen 2005). In this study, PSSs were inferred only if their posterior probability was greater than 95%.

After PSGs were detected, we used the DAVID Functional Annotation tool (Huang da, Sherman, Lempicki 2009) to investigate their enrichment of Gene Ontology (GO) terms. During this analysis, the human ortholog of each PSG was used as the input against a background of human genes. Within each annotation cluster, DAVID lists the GO terms that are significantly enriched. In this study we used the approach of McGowen, Grossman, Wildman (2012), where only terms with an enrichment score > 1.3 were considered meaningful.

To confirm our identified positively selected sites, we randomly selected 48 PSGs in which PSSs had been detected, designed PCR primers using Primer3 (Rozen, Skaletsky 2000), and directly amplified and sequenced these regions using PCR and an Applied Biosystems 3730 DNA Analyzer, respectively.
DNA for this study was extracted from the same species of dolphin (Tursiops truncatus). Information on these primers is available in Supplementary Table 1, and all the segments sequenced in this study were deposited in GenBank with accession numbers JX856347 to JX856394.
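The decision rules of the branch-site likelihood ratio test described in the Methods (a PSG requires P < 0.01; a PSS requires posterior probability > 95%) can be illustrated with a short Python sketch. The text does not state which null distribution was used for the test statistic; the sketch assumes a chi-square distribution with one degree of freedom, which is a common choice for this test. The function names are hypothetical, and the log-likelihood values in the usage note are invented for illustration rather than taken from the study.

```python
import math
from typing import Dict, List


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Twice the log-likelihood difference between alternative and null.

    Small negative values (which can arise from optimizer noise) are
    truncated to zero; this is a convenience of the sketch.
    """
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 d.f.

    For 1 d.f., P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def is_psg(lnl_null: float, lnl_alt: float, alpha: float = 0.01) -> bool:
    """Call a gene positively selected if the LRT P-value is below alpha.

    Assumption: the P-value is taken from a chi-square (1 d.f.) null.
    """
    return chi2_sf_df1(lrt_statistic(lnl_null, lnl_alt)) < alpha


def selected_sites(posteriors: Dict[int, float],
                   cutoff: float = 0.95) -> List[int]:
    """Return codon positions whose posterior probability exceeds cutoff."""
    return [site for site, p in sorted(posteriors.items()) if p > cutoff]
```

For example, with hypothetical log-likelihoods of -1000.0 (null) and -995.0 (alternative), the statistic is 10 and `is_psg` returns `True`, whereas a difference of 2 log-likelihood units gives a statistic of 4 and is not significant at the 0.01 level.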

Cite this paper

@inproceedings{Sun2012GenomeWideSF, title={Genome-Wide Scans for Candidate Genes Involved in the Aquatic Adaptation of Dolphins}, author={Yan-bo Sun and Weiping Zhou and He-Qun Liu and David M. Irwin and Yongyi Shen and Yaping Zhang}, year={2012} }