Parallel-and-stream accelerator for computationally fast supervised learning

  • Emily C. Hector, Lan Luo, Peter Xuekun Song
  • Comput. Stat. Data Anal.

Doubly Distributed Supervised Learning and Inference with High-Dimensional Correlated Outcomes

This paper presents a unified framework for supervised learning and inference procedures using the divide-and-conquer approach for high-dimensional correlated outcomes. We propose a general class of

Information-Based Optimal Subdata Selection for Big Data Linear Regression

Theoretical results and extensive simulations demonstrate that the IBOSS approach is superior to subsampling-based methods, sometimes by orders of magnitude, and the advantages of the new approach are also illustrated through analysis of real data.

Score-matching representative approach for big data analysis with generalized linear models

Two representative approaches, mean representative (MR) and score-matching representative (SMR), are recommended for big data analysis with generalized linear models, along with theoretical justifications, and both are shown to be as good as the full data estimate when available.

A Distributed and Integrated Method of Moments for High-Dimensional Correlated Data Analysis

  • E. Hector, P. Song
  • Computer Science
    Journal of the American Statistical Association
  • 2021
A divide-and-conquer procedure, implemented in a fully distributed and parallelized computational scheme, is developed for statistical estimation and inference of regression parameters; iron deficiency is found to be significantly associated with two auditory recognition memory-related potentials in the left parietal-occipital region of the brain.

Optimal Subsampling for Large Sample Logistic Regression

Optimal subsampling probabilities that minimize the asymptotic mean squared error of the resultant estimator are derived, and a two-step algorithm is developed to efficiently approximate the maximum likelihood estimate in logistic regression.

Scalable estimation strategies based on stochastic approximations: classical results and new insights

Stochastic gradient methods, especially those in the family of stable proximal methods such as implicit stochastic gradient descent, are argued to be poised to become benchmark principled estimation procedures for large datasets.

Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration

This paper proposes a regularized fusion method that identifies and merges inter-study homogeneous parameter clusters in regression analysis without the use of a hypothesis testing approach, and establishes a computationally efficient procedure to deal with large-scale integrated data.

Renewable estimation and incremental inference in generalized linear models with streaming data sets

  • Lan Luo, P. X. Song
  • Computer Science
    Journal of the Royal Statistical Society: Series B (Statistical Methodology)
  • 2019
An incremental updating algorithm is presented to analyse streaming data sets using generalized linear models within a new framework of renewable estimation and incremental inference, in which the maximum likelihood estimator is renewed with current data and summary statistics of historical data.

Combining information from independent sources through confidence distributions

This paper develops new methodology, together with related theories, for combining information from independent studies through confidence distributions. A formal definition of a confidence