• Corpus ID: 240354729

Principal Component Pursuit for Pattern Identification in Environmental Mixtures

Authors: Elizabeth A. Gibson, Junhui Zhang, Jingkai Yan, Lawrence G. Chillrud, Jaime Benavides, Yanelli Nunez, Julie Beth Herbstman, Jeff Goldsmith, John N. Wright, Marianthi-Anna Kioumourtzoglou
Affiliations: Department of Environmental Health Sciences; Columbia University Mailman School of Public Health; Department of Electrical Engineering; Columbia University Data Science Institute; Department of Biostatistics
Background and Aims: Environmental health researchers often aim to identify sources or behaviors that give rise to potentially harmful environmental exposures. We have adapted principal component pursuit (PCP)—a robust and well-established technique for dimensionality reduction in computer vision and signal processing—to identify patterns in environmental mixtures. PCP decomposes the exposure mixture into a low-rank matrix containing consistent patterns of exposure across pollutants and a… 
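To make the decomposition concrete, here is a minimal sketch of the classic convex PCP of Candès et al.: minimize ||L||_* + λ||S||_1 subject to L + S = M, solved by the alternating direction method of multipliers (ADMM). It assumes M is an exposure matrix with observations (e.g. days) as rows and pollutants as columns. The function names `pcp`, `svt`, and `shrink` are hypothetical. The defaults λ = 1/sqrt(max(n, p)) and μ = np/(4||M||_1) follow Candès et al. The authors' adaptation for environmental mixtures may differ from this textbook version and is not reproduced here.

```python
import numpy as np


def shrink(X, tau):
    """Entrywise soft-thresholding: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * ||X||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale each left singular vector by its shrunken singular value, then recombine.
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into a low-rank L and a sparse S via ADMM on
    min ||L||_* + lam * ||S||_1  subject to  L + S = M  (hypothetical helper)."""
    M = np.asarray(M, dtype=float)
    n, p = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n, p))         # default weight from Candes et al.
    if mu is None:
        mu = n * p / (4.0 * np.abs(M).sum())   # penalty-parameter heuristic from Candes et al.
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                       # Lagrange multiplier for L + S = M
    norm_M = np.linalg.norm(M)                 # Frobenius norm, used for the stopping rule
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)      # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)   # sparse update
        residual = M - L - S
        Y += mu * residual                     # dual ascent on the equality constraint
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

On such an exposure matrix, L collects the shared, consistent co-exposure structure, while S absorbs large deviations confined to a few entries, which ordinary PCA would otherwise spread across its components.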


Powering Research through Innovative Methods for Mixtures in Epidemiology (PRIME) Program: Novel and Expanded Statistical Methods
37 new methods from PRIME projects are reviewed and summarized to enable more informed analyses of environmental mixtures; the authors stress training for early-career scientists and continued innovation in statistical methodology as ongoing needs.
State-of-the-Art Methods for Exposure-Health Studies: results from the Exposome Data Challenge Event
The exposome data challenge presented a unique opportunity for researchers from different disciplines to create and share state-of-the-art analytical methods, setting a new standard for open science in the exposome and environmental health fields.


An overview of methods to address distinct research questions on environmental mixtures: an application to persistent organic pollutants and leukocyte telomere length
This work applied methods geared toward distinct research questions, treating persistent organic pollutants (POPs) as a mixture and leukocyte telomere length (LTL) as the outcome, to identify patterns of POP exposure and potentially toxic agents, to test for interaction (none was found), and to estimate the overall mixture effect.
Complex Mixtures, Complex Analyses: an Emphasis on Interpretable Results
The importance of robust methods and interpretable results over predictive accuracy is emphasized, and collaboration with computer scientists, data scientists, and biostatisticians in future mixture method development is encouraged.
Cross-Validation in Principal Component Analysis
This paper describes a form of cross-validation for principal component analysis that has several useful properties for inspecting and describing multivariate data.
Robust principal component analysis?
It is proved that, under suitable assumptions, both the low-rank and the sparse components can be recovered exactly by solving a very convenient convex program called Principal Component Pursuit, which, among all feasible decompositions, minimizes a weighted combination of the nuclear norm and the ℓ1 norm; this suggests the possibility of a principled approach to robust principal component analysis.
The Urban Exposome during Pregnancy and Its Socioeconomic Determinants
Pregnant women of low socioeconomic position (SEP) were exposed to higher levels of environmental hazards in some cities, but not in others, which may contribute to inequities in child health and development.
Background levels of polychlorinated biphenyls in the U.S. population.
Principal component analysis: a review and recent developments
  • I. Jolliffe, J. Cadima
  • Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
  • 2016
The basic ideas of PCA are introduced, with a discussion of what it can and cannot do, and variants of the technique tailored to various data types and structures are described.
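As a point of comparison with PCP, here is a minimal sketch of classical PCA computed from the singular value decomposition of the column-centered data matrix. The function name `pca` and its return values are hypothetical illustrations, not taken from the review.

```python
import numpy as np


def pca(X, k):
    """Classical PCA via SVD of the column-centered data (hypothetical helper).

    Returns the n x k scores, the p x k loadings (orthonormal directions of
    maximal variance), and the share of total variance each component explains.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:k].T                      # leading right singular vectors
    scores = Xc @ loadings                   # projections of observations onto them
    explained = s[:k] ** 2 / np.sum(s ** 2)  # variance share per component
    return scores, loadings, explained
```

Because PCA minimizes squared reconstruction error, a single gross outlier can pull the leading loading toward it. That sensitivity is the limitation robust variants such as PCP are designed to address.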
Square Root Principal Component Pursuit: Tuning-Free Noisy Robust Matrix Recovery
The authors' simulations corroborate the claim that a universal choice of the regularization parameter yields near-optimal performance across a range of noise levels, indicating that the method performs better than the (somewhat loose) bound proved in the paper would suggest.
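The following is a hedged sketch of the objective involved. It assumes the square-root formulation follows the square-root lasso idea of penalizing the unsquared Frobenius norm of the residual. Here M is the observed matrix, L the low-rank part, S the sparse part, and λ, μ are nonnegative weights. It is written from that general idea rather than copied from the paper, so the exact form and the paper's universal constants should be checked against the source.

```latex
% Noise-aware PCP with an explicit noise bound \delta (requires knowing the noise level):
\min_{L,\,S}\; \|L\|_* + \lambda \|S\|_1
\quad \text{subject to} \quad \|M - L - S\|_F \le \delta

% Square-root variant (assumed form): the residual enters as an unsquared penalty,
% so no noise level \delta has to be supplied:
\min_{L,\,S}\; \|L\|_* + \lambda \|S\|_1 + \mu \,\|L + S - M\|_F
```

Away from zero residual, the gradient of μ‖R‖_F is μR/‖R‖_F, whose norm is μ regardless of how large the noise is. This is the usual intuition for why one choice of μ can work across noise levels.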
The impact of source contribution uncertainty on the effects of source-specific PM2.5 on hospital admissions: A case study in Boston, MA
This study assessed the effects of PM2.5 sources, identified by positive matrix factorization (PMF) and absolute principal component analysis (APCA), on emergency cardiovascular disease (CVD) hospital admissions among Medicare enrollees in Boston, MA, during 2003–2010. Agreement between PMF and APCA results was stronger when source contribution uncertainty was considered in the health models.
Nondetects and Data Analysis: Statistics for Censored Environmental Data
…provides an excellent precursor to the final chapter on neural networks (NN), which the authors highlight as being especially useful for dealing with longer data gaps that can be accurately dealt with…