Promoting measured genes and measured environments: on the importance of careful statistical analyses and biological relevance.


Research testing the interaction between measured genes and measured environments in psychiatric disorders was promoted in a recent review by Moffitt et al in the ARCHIVES. In presenting the emerging gene-environment interaction findings, Caspi et al cite their finding of an interaction between the genetic variants of monoamine oxidase A that confer low enzymatic activity and childhood maltreatment, which together increase the risk of violent behavior. Moffitt et al cite one replication of this finding, by Foley et al, published in the ARCHIVES. A careful perusal of this latter study, which has now been cited 19 times, shows that it is flawed in both its analyses and its interpretation. First, Foley et al argued that results obtained with logistic regression are likely to be more robust than those obtained within a linear regression framework. However, logistic regression has the disadvantage of collapsing ordinal or count response variables into a dichotomous variable, which may result in a loss of information. Ordinal regression, negative binomial regression, or Poisson regression models are robust and more appropriate techniques for analyzing disorder symptom counts. Second, although Foley et al favored logistic over linear regression to avoid false-positive interactions due to scaling artifacts (heteroscedasticity), they did not assess or report the fit of their model. Zero or small cell counts in the interaction terms (as evident from Table 2 in the Foley et al article) may cause numerical problems at the modeling stage of the analysis. Using the raw data provided in Table 2 of Foley et al, we found that the logistic regression model presented fits poorly (Hosmer-Lemeshow test, $\chi^2_4 = 8.9$; P=.06). However, model fit improved when we grouped categories 2, 3, and 4 of environmental adversity and used 3 categories (0, 1, and 2-4) instead of 5 ($\chi^2_3 = 5.5$; P=.14).
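As a minimal arithmetic check on the two P values above (our illustration, not part of the original analysis), the sketch below computes the upper tail of the chi-square distribution for integer degrees of freedom using only Python's standard library. It also includes a hypothetical helper, `hosmer_lemeshow`, that forms the Hosmer-Lemeshow statistic from grouped observed and expected event counts; its P value would then be `chi2_sf(stat, df)`. Both function names and the grouped-input layout are assumptions made for illustration.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail is erfc(sqrt(x/2)); each extra 2 df adds one term.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def hosmer_lemeshow(observed, expected, totals):
    """Hosmer-Lemeshow statistic from grouped counts (hypothetical helper).

    observed[g]: observed events in risk group g
    expected[g]: sum of model-predicted probabilities in group g
    totals[g]:   number of subjects in group g
    Returns (statistic, degrees of freedom), with df = number of groups - 2.
    """
    stat = 0.0
    for o, e, n in zip(observed, expected, totals):
        # Combines the event and non-event terms: (o-e)^2/e + (o-e)^2/(n-e).
        stat += (o - e) ** 2 / (e * (1 - e / n))
    return stat, len(totals) - 2


# Reproduce the P values from the statistics reported above.
print(round(chi2_sf(8.9, 4), 3))  # 0.064, reported as P=.06
print(round(chi2_sf(5.5, 3), 3))  # 0.139, reported as P=.14
```

Both reported P values are consistent with their statistics and degrees of freedom once rounded to 2 decimals.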
However, in this better-fitting model, the interaction between monoamine oxidase A and environmental adversity was nonsignificant (P=.36, 2-sided test). This is not surprising and could have been suspected simply by noticing that too many cells (7 [35%] of 20) in Table 2 contained between 0 and 4 observations. It is also surprising to see the misleading Figure published in the highly reputed ARCHIVES, where the strong visual impression of an interaction is in fact due to a single observation in a cell of size n=1 (1/1=100%!). Finally, in this study, the monoamine oxidase A genotypes conferring low enzymatic activity are associated with a decreased risk of antisocial behavior, whereas the same genotypes in combination with environmental adversity are associated with the opposite effect. Contrary to the principle of parsimony, Foley et al interpreted this observation as an important finding indicating the complicated nature of psychiatric genetics. It may simply reflect a lack of rigor in applying statistical methods to complex psychiatric disorders. In a more recently published study, Thapar and colleagues reported that the catechol-O-methyltransferase (COMT) Val/Val genotype is associated with increased symptoms of conduct disorder, particularly in children with lower birth weight. Unlike Foley et al, they used multiple regression as their primary analysis. However, the Figure in the Thapar et al article suggests that the distribution of the outcome variable is highly skewed (the majority of children do not have conduct disorder). Further, birth weight was not corrected for gestational age, and it is not known whether it was checked for outliers. Unfortunately, we could not assess these critical issues given the lack of basic information, such as the demographic characteristics (with their standard deviations) of the 3 genotype groups by birth weight and the dispersion parameters of conduct disorder symptom scores for each of the groups.
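The sparse-cell argument can be made concrete with a short sketch (our illustration; the function names are hypothetical). The first function counts the cells of a contingency table holding fewer than 5 observations, so 7 such cells out of 20 gives the 35% quoted above. The second computes a Wilson score interval, one standard binomial confidence interval that we have chosen for illustration, to show how weakly an observed proportion of 1/1 constrains the underlying rate. The small table in the usage lines is invented and is not the Foley et al Table 2.

```python
import math


def sparse_cells(table, min_count=5):
    """Return (number of sparse cells, total cells, fraction sparse).

    A cell is "sparse" if it holds fewer than min_count observations,
    i.e. 0 to 4 observations with the default threshold.
    """
    cells = [count for row in table for count in row]
    n_sparse = sum(1 for count in cells if count < min_count)
    return n_sparse, len(cells), n_sparse / len(cells)


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Usage with an invented table (NOT the Foley et al data):
print(sparse_cells([[10, 3, 0], [8, 5, 1]]))  # (3, 6, 0.5)
low, high = wilson_interval(1, 1)
print(f"1/1 observed: 95% CI {low:.2f} to {high:.2f}")  # 0.21 to 1.00
```

An observed 100% from a single subject is therefore compatible with true rates anywhere from about 21% to 100%, which is why a Figure built on such a cell carries almost no information.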
All these arguments seriously call into question the validity of the linear regression model and the results of this study. Admittedly, Thapar et al also applied logistic regression, which, although robust to heteroscedasticity, does not address the other concerns raised herein. Remarkably, all the significant results became only marginal when logistic regression was used. Finally, we question the hypothesis advanced in the Thapar et al study, which is based on the “links between COMT and prefrontal cortical functioning,” when a previous study by the same group on the same population concluded that the “Val158Met COMT genotype is not associated with neurocognitive performance (neurocognitive tests of prefrontal cognition).” In conclusion, while it is important to investigate gene-environment interactions in psychiatric disorders, we underline the importance of applying statistical methods rigorously, avoiding potential biases (including review and publication biases), and requiring biological relevance. Failure to do so may yield statistically significant results that are biologically irrelevant and serve only to raise unwarranted expectations.
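For a skewed symptom-count outcome, the count models recommended earlier in this letter model the counts directly instead of collapsing them to "any symptom versus none." The sketch below is a minimal Poisson regression with a log link and a single predictor, fitted by Newton-Raphson in pure Python, followed by the Pearson dispersion ratio commonly used to judge whether a negative binomial model would be preferable. It is our illustration under stated assumptions: the function names are hypothetical, there are no standard errors or extra covariates, and the toy data in the usage lines are invented.

```python
import math


def poisson_regression(x, y, iterations=50, tol=1e-10):
    """Fit log E[y] = b0 + b1*x by Newton-Raphson (one predictor, log link).

    Assumes sum(y) > 0 and that a finite maximum-likelihood estimate exists.
    """
    # Start from the intercept-only solution: b0 = log(mean(y)), b1 = 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information, equal to the negative Hessian for the log link.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual df (n minus 2 parameters)."""
    chi2 = 0.0
    for xi, yi in zip(x, y):
        mu = math.exp(b0 + b1 * xi)
        chi2 += (yi - mu) ** 2 / mu
    return chi2 / (len(y) - 2)


# Invented symptom counts for two groups of 8 (x = 0 or 1).
x = [0] * 8 + [1] * 8
y = [0, 0, 1, 0, 2, 0, 1, 0, 0, 1, 3, 0, 2, 4, 1, 1]
b0, b1 = poisson_regression(x, y)
print(f"rate ratio {math.exp(b1):.2f}")  # 3.00 (group means 1.5 / 0.5)
print(f"dispersion {pearson_dispersion(x, y, b0, b1):.2f}")  # 1.24
```

With a binary predictor, the fitted rate ratio simply reproduces the ratio of group means, while a dichotomized outcome would record only 3/8 versus 6/8 children with any symptom and treat 1 and 4 symptoms alike. A dispersion ratio well above 1 would point toward a negative binomial model instead.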


@article{Joober2007PromotingMG, title={Promoting measured genes and measured environments: on the importance of careful statistical analyses and biological relevance.}, author={Ridha Joober and Sarojini M. Sengupta and Norbert Schmitz}, journal={Archives of general psychiatry}, year={2007}, volume={64 3}, pages={377-8; author reply 378-9} }