# Parsimonious Bayesian deep networks

@article{Zhou2018ParsimoniousBD, title={Parsimonious Bayesian deep networks}, author={Mingyuan Zhou}, journal={ArXiv}, year={2018}, volume={abs/1805.08719} }

Combining Bayesian nonparametrics and a forward model selection strategy, we construct parsimonious Bayesian deep networks (PBDNs) that infer capacity-regularized network architectures from the data and require neither cross-validation nor fine-tuning when training the model. … The other one is the construction of a greedy layer-wise learning algorithm that uses a forward model selection criterion to determine when to stop adding another hidden layer. We develop both Gibbs sampling and stochastic…
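The greedy layer-wise idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the names `greedy_layerwise`, `toy_fit_layer`, and `toy_criterion` are hypothetical, the AIC-style score is an assumption chosen only for illustration, and the loss values are made-up toy numbers rather than results.

```python
from typing import Callable, List, Tuple

Layer = dict  # hypothetical placeholder for one trained layer's summary


def greedy_layerwise(
    fit_layer: Callable[[List[Layer]], Layer],
    criterion: Callable[[List[Layer]], float],
    max_layers: int = 10,
) -> Tuple[List[Layer], float]:
    """Add layers one at a time; stop once the criterion no longer improves."""
    layers: List[Layer] = []
    best = float("inf")
    for _ in range(max_layers):
        candidate = fit_layer(layers)  # train one new layer on top of the frozen ones
        score = criterion(layers + [candidate])
        if score >= best:  # no improvement: reject the candidate and stop
            break
        layers.append(candidate)
        best = score
    return layers, best


# Toy stand-ins (assumptions for illustration, not values from the paper).
TOY_NLL = [100.0, 60.0, 55.0, 54.0]  # made-up negative log-likelihood by depth
PARAMS_PER_LAYER = 10                # made-up parameter count per layer


def toy_fit_layer(frozen: List[Layer]) -> Layer:
    depth = len(frozen)
    nll = TOY_NLL[min(depth, len(TOY_NLL) - 1)]
    return {"nll": nll, "n_params": PARAMS_PER_LAYER}


def toy_criterion(layers: List[Layer]) -> float:
    # AIC-style score (assumed): 2 * NLL of the deepest model + 2 * total parameters.
    total_params = sum(layer["n_params"] for layer in layers)
    return 2.0 * layers[-1]["nll"] + 2.0 * total_params


layers, best = greedy_layerwise(toy_fit_layer, toy_criterion)
print(len(layers), best)
```

In this toy run the third layer lowers the loss only slightly while adding parameters, so the penalized score gets worse and the loop stops at two layers. Any criterion that trades fit against capacity could be substituted for `toy_criterion`.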

## 7 Citations

### Deep Autoencoding Topic Model With Scalable Hybrid Bayesian Inference

- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2021

A topic-layer-adaptive stochastic gradient Riemannian MCMC, which jointly learns simplex-constrained global parameters across all layers and topics with topic- and layer-specific learning rates, and a supervised DATM, which enhances the discriminative power of its latent representations, are proposed.

### Convex Polytope Trees

- Computer Science
- ArXiv
- 2020

This paper proposes convex polytope trees (CPT) to expand the family of decision trees by an interpretable generalization of their decision boundary, and develops a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.

### A Variational Edge Partition Model for Supervised Graph Representation Learning

- Computer Science
- NeurIPS
- 2022

A graph generative process is introduced that models how the observed edges are generated by aggregating node interactions over a set of overlapping node communities, each of which contributes to the edges via a logical OR mechanism.

### The Hawkes Edge Partition Model for Continuous-time Event-based Temporal Networks

- Computer Science
- UAI
- 2020

A novel probabilistic framework for modeling continuous-time interaction event data is proposed that not only achieves competitive performance for temporal link prediction compared with state-of-the-art methods, but also discovers interpretable latent structure behind the observed temporal interactions.

### Structured Bayesian Latent Factor Models with Meta-data

- Computer Science
- 2019

This research develops structured Bayesian latent factor models with meta-data for analysing discrete data, achieving not only better modelling performance and efficiency but also better interpretability for intuitively understanding those data.

### Random Function Priors for Correlation Modeling

- Computer Science, Mathematics
- ICML
- 2019

This paper introduces random function priors for $Z_n$ for modeling correlations among its dimensions, and derives the Bayesian nonparametric method by applying a representation theorem on separately exchangeable discrete random measures.

### Composing Deep Learning and Bayesian Nonparametric Methods

- Computer Science
- 2019

Composing Deep Learning and Bayesian nonparametric methods for Bayesian Nonparametric Methods is NP-complete and straightforward.

## 51 References

### Augmentable Gamma Belief Networks

- Computer Science
- J. Mach. Learn. Res.
- 2016

An augmentable gamma belief network (GBN) is proposed that factorizes each of its hidden layers into the product of a sparse connection weight matrix and the nonnegative real hidden units of the next layer, in order to infer multilayer deep representations of high-dimensional discrete and nonnegative real vectors.

### A Fast Learning Algorithm for Deep Belief Nets

- Computer Science
- Neural Computation
- 2006

A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.

### Bayesian Inference for Logistic Models Using Pólya–Gamma Latent Variables

- Mathematics
- 2012

We propose a new data-augmentation strategy for fully Bayesian inference in models with binomial likelihoods. The approach appeals to a new class of Pólya–Gamma distributions, which are constructed…

### Provable learning of noisy-OR networks

- Computer Science
- STOC
- 2017

Tensor decomposition is applied to learning the single-layer noisy-OR network, a textbook example of a Bayes net that is used, for example, in the classic QMR-DT software for diagnosing which disease(s) a patient may have from the symptoms he/she exhibits.

### The Poisson Gamma Belief Network

- Computer Science
- NIPS
- 2015

It is demonstrated that the Poisson gamma belief network (PGBN), whose hidden units are imposed with correlated gamma priors, can add more layers to increase its performance gains over Poisson factor analysis, given the same limit on the width of the first layer.

### Default Bayesian analysis for multi-way tables: a data-augmentation approach

- Computer Science, Mathematics
- 2011

A strategy for regularized estimation in multi-way contingency tables, which are common in meta-analyses and multi-center clinical trials, is proposed; it is based on data augmentation and appeals heavily to a novel class of Pólya–Gamma distributions.

### Distilling the Knowledge in a Neural Network

- Computer Science
- ArXiv
- 2015

This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.

### Lognormal and Gamma Mixed Negative Binomial Regression

- Mathematics
- ICML
- 2012

A lognormal and gamma mixed negative binomial regression model for counts is proposed, together with efficient closed-form Bayesian inference; the model has two free parameters to include two different kinds of random effects and allows the incorporation of prior information, such as sparsity in the regression coefficients.

### Softplus Regressions and Convex Polytopes

- Mathematics
- 2016

To construct flexible nonlinear predictive distributions, the paper introduces a family of softplus function based regression models that convolve, stack, or combine both operations by convolving…

### An Introduction to Variational Methods for Graphical Models

- Computer Science
- Machine Learning
- 2004

This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields), and describes a general framework for generating variational transformations based on convex duality.