Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes
@inproceedings{Saad2016DetectingDI,
  title={Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes},
  author={Feras A. Saad and Vikash K. Mansinghka},
  booktitle={International Conference on Artificial Intelligence and Statistics},
  year={2016}
}
Datasets with hundreds of variables and many missing values are commonplace. In this setting, it is both statistically and computationally challenging to detect true predictive relationships between variables and also to suppress false positives. This paper proposes an approach that combines probabilistic programming, information theory, and non-parametric Bayes. It shows how to use Bayesian non-parametric modeling to (i) build an ensemble of joint probability models for all the variables; (ii…
12 Citations
Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes
- Computer Science, ArXiv
- 2017
Human evaluators often prefer the results of probabilistic search to those of a standard baseline. The result is a flexible search technique that applies to a broad class of information retrieval problems and is integrated into BayesDB.
Bayesian synthesis of probabilistic programs for automatic data modeling
- Computer Science, Proc. ACM Program. Lang.
- 2019
Experimental results show that the techniques presented can accurately infer qualitative structure in multiple real-world data sets and outperform standard data analysis methods in forecasting and predicting new data.
Bayesian Kernelised Test of (In)dependence with Mixed-type Variables
- Computer Science, ArXiv
- 2021
A Bayesian kernelised correlation test of (in)dependence using a Dirichlet process model is proposed; properties of the approach are shown theoretically, and algorithms for fast computation with it are given.
Artificial intelligence-assisted data analysis with BayesDB
- Computer Science
- 2017
Experiments show that CrossCat, the default model discovery mechanism used by BayesDB, can address all three problems in data analysis effectively, including modeling patterns of missing data, imputing missing values in datasets, and characterizing the error behavior of predictive models.
Hierarchical Infinite Relational Model
- Computer Science, UAI
- 2021
The HIRM generalizes the standard infinite relational model and can be used for a variety of data analysis tasks, including dependence detection, clustering, and density estimation. It is used to discover relational structure in real-world datasets from politics and genomics.
SPPL: probabilistic programming with fast exact symbolic inference
- Computer Science, PLDI
- 2021
SPPL translates probabilistic programs into sum-product expressions, a new symbolic representation and associated semantic domain that extends standard sum-product networks to support mixed-type distributions, numeric transformations, logical formulas, and pointwise and set-valued constraints.
A Bayesian nonparametric test for conditional independence
- Mathematics, Foundations of Data Science
- 2020
A Bayesian nonparametric method for quantifying the relative evidence in a dataset in favour of the dependence or independence of two variables conditional on a third using Polya tree priors.
Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series
- Computer Science, AISTATS
- 2018
A Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series data is proposed, demonstrating superior forecasting accuracy and competitive imputation accuracy as compared to multiple widely used baselines.
Human Factors in Model Interpretability: Industry Practices, Challenges, and Needs
- Computer Science, Proc. ACM Hum. Comput. Interact.
- 2020
The characterization of interpretability work that emerges from the analysis suggests that model interpretability frequently involves cooperation and mental model comparison between people in different roles, often aimed at building trust not only between people and models but also between people within the organization.
Improving Usability, Safety and Patient Outcomes with Health Information Technology - From Research to Practice, Information Technology and Communications in Health Conference, ITCH 2019, Victoria, BC, Canada, 14-17 February 2019
- Medicine, ITCH
- 2019
Currently available opioid apps for the major operating systems are reviewed, examining the number of released apps, service providers, operating systems, target user groups, purpose of app, range of features, location, use of evidence, interface, languages, cost and licensing model, and user ratings.
References
Probabilistic Data Analysis with Probabilistic Programming
- Computer Science, ArXiv
- 2016
Composable generative population models (CGPMs), a computational abstraction that extends directed graphical models and can be used to describe and compose a broad class of probabilistic data analysis techniques, are introduced.
Context-Specific Independence in Bayesian Networks
- Computer Science, UAI
- 1996
This paper proposes a formal notion of context-specific independence (CSI), based on regularities in the conditional probability tables (CPTs) at a node, and proposes a technique, analogous to (and based on) d-separation, for determining when such independence holds in a given network.
A Bayesian nonparametric approach to testing for dependence between random variables
- Computer Science
- 2015
A Bayesian nonparametric procedure that leads to a tractable, explicit and analytic quantification of the relative evidence for dependence vs independence, using Polya tree priors on the space of probability measures that are embedded within a decision theoretic test for dependence.
Nonparametric Bayes inference on conditional independence
- Computer Science
- 2014
An encompassing nonparametric Bayes model is relied on for the joint distribution of Y, X and Z, with conditional mutual information used as a summary of the strength of conditional dependence, and an asymptotic theory supporting the approach is provided.
Bayes-Ball: The Rational Pastime (for Determining Irrelevance and Requisite Information in Belief Networks and Influence Diagrams)
- Computer Science, UAI
- 1998
A new, simple, and efficient "Bayes-ball" algorithm is presented which determines irrelevant sets and requisite information more efficiently than existing methods, and is linear in the size of the graph for belief networks and influence diagrams.
Estimating mutual information.
- Computer Science, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics
- 2004
Two classes of improved estimators for mutual information M(X,Y), from samples of random points distributed according to some joint probability density μ(x,y), based on entropy estimates from k-nearest neighbor distances are presented.
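As a rough illustration of the k-nearest-neighbor idea summarized above, the sketch below implements the form usually given for the first of the two estimators, M(X,Y) ≈ ψ(k) + ψ(N) − ⟨ψ(n_x + 1) + ψ(n_y + 1)⟩, for scalar X and Y. The function names, the brute-force neighbor search, and the default k = 3 are assumptions made for this example and are not taken from the listing; the result is in nats.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_mutual_information(xs, ys, k=3):
    """Hypothetical helper: k-NN mutual information estimate (in nats)
    for paired scalar samples, using brute-force O(N^2 log N) search."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n <= k:
        raise ValueError("need more samples than neighbors")
    total = 0.0
    for i in range(n):
        # Max-norm distances from point i to every other point in joint space.
        dists = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n) if j != i
        )
        eps = dists[k - 1]  # distance to the k-th nearest neighbor
        # Count points strictly closer than eps in each marginal space.
        n_x = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - total / n


# Usage: a strongly dependent pair versus an independent pair.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y_dependent = [xi + 0.3 * rng.gauss(0.0, 1.0) for xi in x]
y_independent = [rng.gauss(0.0, 1.0) for _ in range(200)]
print(knn_mutual_information(x, y_dependent))
print(knn_mutual_information(x, y_independent))
```

The brute-force search keeps the example dependency-free; for larger samples a spatial index such as a k-d tree would be the usual choice.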
Dirichlet Process Gaussian Mixture Models: Choice of the Base Distribution
- Computer Science, Journal of Computer Science and Technology
- 2010
The primary goal of this paper is to compare the choice of conjugate and non-conjugate base distributions on a particular class of DPM models which is widely used in applications, the Dirichlet process Gaussian mixture model (DPGMM).
Scaling Nonparametric Bayesian Inference via Subsample-Annealing
- Computer Science, AISTATS
- 2014
Improved inference on million-row subsamples of US Census data and network log data and a 307-row hospital rating dataset is demonstrated, using a Pitman-Yor generalization of the Cross Categorization model.
Kernel-based Conditional Independence Test and Application in Causal Discovery
- Computer Science, UAI
- 2011
A Kernel-based Conditional Independence test (KCI-test) is proposed, by constructing an appropriate test statistic and deriving its asymptotic distribution under the null hypothesis of conditional independence.