Linear Multiple Low-Rank Kernel Based Stationary Gaussian Processes Regression for Time Series

  title={Linear Multiple Low-Rank Kernel Based Stationary Gaussian Processes Regression for Time Series},
  author={F. Yin and Lishuo Pan and Xinwei He and Tianshi Chen and Sergios Theodoridis and Zhi-Quan Tom Luo},
  journal={IEEE Transactions on Signal Processing},
  • Published 21 April 2019
  • Computer Science
Gaussian processes (GPs) for machine learning have been studied systematically over the past two decades. However, kernel design for GPs and the associated hyper-parameter optimization remain difficult and, to a large extent, open problems. We consider GP regression for time series modeling and analysis. The underlying stationary kernel is closely approximated by a new grid spectral mixture (GSM) kernel, which is a linear combination of low-rank sub-kernels. In the case where a large number… 
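The abstract's central idea — approximating a stationary kernel by a linear combination of fixed low-rank sub-kernels, so that only the combination weights are learned — can be sketched as follows. This is a minimal one-dimensional illustration: the spectral-mixture sub-kernel form and the grid values are assumptions for exposition, not the paper's exact parameterization.

```python
import numpy as np

def sm_subkernel(tau, mu, sigma):
    """One spectral-mixture component for a 1-D stationary kernel:
    a Gaussian spectral density centered at frequency mu with width sigma."""
    return np.exp(-2.0 * np.pi**2 * sigma**2 * tau**2) * np.cos(2.0 * np.pi * mu * tau)

def gsm_kernel(tau, weights, grid_mu, grid_sigma):
    """Grid spectral mixture sketch: the frequency/width grid is fixed in
    advance; only the non-negative linear weights would be optimized."""
    return sum(w * sm_subkernel(tau, m, s)
               for w, m, s in zip(weights, grid_mu, grid_sigma))

# Fixed grid (illustrative values), learned weights would replace these:
grid_mu = [0.5, 1.0, 2.0]
grid_sigma = [0.1, 0.1, 0.2]
weights = [0.2, 0.5, 0.3]
k0 = gsm_kernel(0.0, weights, grid_mu, grid_sigma)  # at lag 0: sum of weights
```

Because each sub-kernel equals 1 at lag zero, the kernel variance at `tau = 0` is simply the sum of the weights, which is what makes the weight-only optimization a linear (and hence much easier) problem than fitting frequencies and widths jointly.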


Novel Compressible Adaptive Spectral Mixture Kernels for Gaussian Processes with Sparse Time and Phase Delay Structures

This paper proposes a new SM kernel variant with a time and phase delay dependency structure (SMD), provides a structure adaptation (SA) algorithm for the SMD, corroborates the efficacy of the dependency structure and SA in the SMD, and performs a thorough comparative experimental analysis of the SMD on both synthetic and real-life datasets.

Hyperparameter-Free Transmit-Nonlinearity Mitigation Using a Kernel-Width Sampling Technique

A methodology for assigning kernel bandwidths is proposed for the widely used Gaussian kernel, capitalizing on stochastic sampling of kernel widths using an ensemble drawn from a pre-designed probability density function.

Rethinking Bayesian Learning for Data Analysis: The Art of Prior and Inference in Sparsity-Aware Modeling

A comeback of Bayesian methods is taking place that sheds new light on the design of deep neural networks, which also establish firm links with Bayesian models. (Lei Cheng and Feng Yin contributed equally.)

Data-Driven Wireless Communication Using Gaussian Processes

This paper presents a promising family of nonparametric Bayesian machine learning methods, namely Gaussian processes (GPs), and their applications in wireless communication, motivated by their interpretable learning ability with uncertainty quantification, and reviews the distributed GP with promising scalability.

Recent advances in data-driven wireless communication using Gaussian processes: A comprehensive survey

This paper first envisions three levels of motivation for data-driven wireless communication using GP models, then presents the background of GPs in terms of covariance structure and model inference, and lists representative solutions and promising techniques that adopt GP models in various wireless communication applications.

FedLoc: Federated Learning Framework for Data-Driven Cooperative Localization and Location Data Processing

Experimental results show that near-centralized data-fitting and prediction performance can be achieved by a set of collaborative mobile users running distributed algorithms.

Sparse Structure Enabled Grid Spectral Mixture Kernel for Temporal Gaussian Process Regression

Experimental results based on various classic time series data sets corroborate that the proposed GPR with GSM kernel significantly outperforms the GPR with SM kernel in terms of both the mean-squared error (MSE) and the stability of the optimization algorithm.

System Identification Via Sparse Multiple Kernel-Based Regularization Using Sequential Convex Optimization Techniques

A multiple kernel-based regularization method is proposed to handle model estimation and structure detection with short data records and it is shown that the locally optimal solutions lead to good performance for randomly generated starting points.

Function-Space Distributions over Kernels

Gaussian processes are flexible function approximators, with inductive biases controlled by a covariance kernel. Learning the kernel is the key to representation learning and strong predictive performance.

Convex vs non-convex estimators for regression and sparse estimation: the mean squared error properties of ARD and GLasso

The relation between ARD (and a penalized version, which the authors call PARD) and GLasso is discussed, and their asymptotic properties, in terms of the mean squared error in estimating the unknown parameter, are studied.

Monte Carlo Implementation of Gaussian Process Models for Bayesian Regression and Classification

Software is now available that implements Gaussian process methods using covariance functions with hierarchical parameterizations, which can discover high-level properties of the data, such as which inputs are relevant to predicting the response.

Gaussian Process Kernels for Pattern Discovery and Extrapolation

This work introduces simple closed form kernels that can be used with Gaussian processes to discover patterns and enable extrapolation, and shows that it is possible to reconstruct several popular standard covariances within this framework.
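The closed-form kernel referred to here is, in one dimension, a weighted sum of Gaussian-windowed cosines. Setting a component's mean frequency to zero recovers a squared-exponential covariance, which illustrates the claim that popular standard covariances can be reconstructed within the framework. A minimal sketch (function name and parameter layout are my own):

```python
import numpy as np

def sm_kernel(tau, w, mu, sigma):
    """Spectral mixture kernel, 1-D stationary form:
    sum_q w_q * exp(-2 pi^2 sigma_q^2 tau^2) * cos(2 pi mu_q tau)."""
    return sum(wq * np.exp(-2.0 * np.pi**2 * sq**2 * tau**2) * np.cos(2.0 * np.pi * mq * tau)
               for wq, mq, sq in zip(w, mu, sigma))

# With mu = 0, a single component reduces to a squared-exponential kernel:
tau = 0.3
se = np.exp(-2.0 * np.pi**2 * 0.5**2 * tau**2)
assert np.isclose(sm_kernel(tau, [1.0], [0.0], [0.5]), se)
```

Nonzero mean frequencies add the oscillatory (quasi-periodic) structure that enables the pattern-discovery and extrapolation behavior the summary describes.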

Deep Neural Networks as Gaussian Processes

The exact equivalence between infinitely wide deep networks and GPs is derived and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.

MCMC methods for Gaussian process models using fast approximations for the likelihood

This work introduces MCMC methods based on the "temporary mapping and caching" framework, using a fast approximation, $\pi^*$, as the distribution needed to construct the temporary space, and proposes two implementations under this scheme: "mapping to a discretizing chain" and "mapping with tempered transitions", both of which are exactly correct MCMC methods for sampling $\pi$, even though their transitions are constructed using an approximation.

Sparse Spectrum Gaussian Process Regression

The achievable trade-offs between predictive accuracy and computational requirements are compared, and it is shown that these are typically superior to existing state-of-the-art sparse approximations.
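The sparse-spectrum idea summarized above replaces a stationary kernel with a finite trigonometric feature expansion at sampled spectral points, so the GP reduces to linear regression in that feature space. A rough sketch, assuming a squared-exponential target kernel (the frequency distribution and feature scaling follow the standard random-features recipe, not necessarily the paper's exact construction):

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_spectrum_features(x, freqs):
    """Trigonometric basis at sampled spectral frequencies.
    The implied kernel approximation is k(x, x') ~ phi(x) @ phi(x')."""
    z = np.outer(x, freqs)                            # (n, m) phase matrix
    return np.hstack([np.cos(z), np.sin(z)]) / np.sqrt(len(freqs))

# m spectral points drawn from the SE kernel's Gaussian spectral density
m, lengthscale = 50, 1.0
freqs = rng.normal(0.0, 1.0 / lengthscale, size=m)

x = np.linspace(0.0, 1.0, 20)
Phi = sparse_spectrum_features(x, freqs)              # (20, 100)
K_approx = Phi @ Phi.T                                # approximate Gram matrix
```

Since cos^2 + sin^2 = 1 for every sampled frequency, the diagonal of `K_approx` is exactly 1, matching the unit variance of the target kernel; off-diagonal entries converge to the true covariance as `m` grows.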

Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers

It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
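ADMM, as characterized above, alternates a smooth subproblem solve, a proximal (shrinkage) step, and a dual update. A minimal sketch on the lasso problem, the canonical example in the ADMM literature (parameter choices here are illustrative, not prescriptive):

```python
import numpy as np

def soft_threshold(v, k):
    """Proximal operator of k * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    """ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x = z.
    Textbook splitting with scaled dual variable u."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    AtA = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(AtA, Atb + rho * (z - u))   # x-update: ridge solve
        z = soft_threshold(x + u, lam / rho)            # z-update: shrinkage
        u = u + x - z                                   # dual ascent
    return z

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true
x_hat = admm_lasso(A, b, lam=0.1)                       # sparse recovery
```

The x-update is a fixed linear solve (its factorization can be cached), while the z-update is elementwise, which is why the method distributes well across large-scale problems.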