Modeling considerations for using expression data from multiple species.


Although genome-wide expression data sets from multiple species are now more commonly generated, there have been few studies on how to best integrate this type of correlated data into models. Starting with a single-species, linear regression model that predicts transcription factor binding sites as a case study, we investigated how best to take into account the correlated expression data when extending this model to multiple species. Using a multivariate regression model, we accounted for the phylogenetic relationships among the species in two ways: (i) a repeated-measures model, where the error term is constrained; and (ii) a Bayesian hierarchical model, where the prior distributions of the regression coefficients are constrained. We show that both multiple-species models improve predictive performance over the single-species model. When compared with each other, the repeated-measures model outperformed the Bayesian model. We suggest a possible explanation for the better performance of the model with the constrained error term.
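One way to picture the "constrained error term" idea is generalized least squares (GLS), in which observations from different species for the same gene share a fixed correlation structure. The sketch below is only an illustration under stated assumptions, not the authors' model. The function names `gls_fit` and `stacked_covariance`, the three-species correlation matrix, the single generic predictor, and the simulated data are all hypothetical and do not come from the paper.

```python
import numpy as np

def gls_fit(X, y, sigma):
    """Generalized least squares: minimize (y - Xb)' sigma^{-1} (y - Xb)."""
    L = np.linalg.cholesky(sigma)      # sigma = L L'
    Xw = np.linalg.solve(L, X)         # whiten the design matrix
    yw = np.linalg.solve(L, y)         # whiten the response
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta

def stacked_covariance(species_corr, n_genes):
    """Block-diagonal error covariance: genes are treated as independent,
    species are correlated within each gene (rows ordered gene by gene)."""
    return np.kron(np.eye(n_genes), species_corr)

# Hypothetical correlation among three species (an assumption, not from the
# paper): species 1 and 2 are taken to be more closely related than species 3.
species_corr = np.array([[1.0, 0.6, 0.3],
                         [0.6, 1.0, 0.3],
                         [0.3, 0.3, 1.0]])

n_genes = 50
n_obs = n_genes * species_corr.shape[0]
rng = np.random.default_rng(0)

# Simulated design: an intercept plus one generic predictor.
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
true_beta = np.array([0.5, 2.0])

# Simulated response with errors that follow the assumed correlation structure.
sigma = stacked_covariance(species_corr, n_genes)
noise = np.linalg.cholesky(sigma) @ rng.normal(size=n_obs)
y = X @ true_beta + 0.1 * noise

beta_hat = gls_fit(X, y, sigma)
print(beta_hat)
```

With an identity matrix in place of `sigma`, `gls_fit` reduces to ordinary least squares, which corresponds to ignoring the relationships among species.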

DOI: 10.1002/sim.5850

Cite this paper

@article{Siewert2013ModelingCF, title={Modeling considerations for using expression data from multiple species.}, author={Elizabeth A Siewert and Katerina Kechris}, journal={Statistics in Medicine}, year={2013}, volume={32}, number={23}, pages={4057--4070} }