Corpus ID: 216077658

Private Topic Modeling

Mijung Park, James R. Foulds, Kamalika Chaudhuri, Max Welling
We develop a privatised stochastic variational inference method for Latent Dirichlet Allocation (LDA). The iterative nature of stochastic variational inference presents challenges: multiple iterations are required to obtain accurate posterior distributions, yet each iteration increases the amount of noise that must be added to achieve a reasonable degree of privacy. We propose a practical algorithm that overcomes this challenge by combining: (1) an improved composition method for differential… 
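The composition challenge the abstract describes can be made concrete: under basic composition the total privacy loss of an iterative algorithm grows linearly in the number of iterations, while stronger composition theorems give roughly square-root growth. A minimal sketch comparing the two bounds (function names are ours, not the paper's, and the paper's own improved composition method is only alluded to in the truncated abstract):

```python
import numpy as np

def naive_composition(eps_step, T):
    """Basic composition: T iterations, each eps_step-DP, compose to
    T * eps_step total privacy loss."""
    return T * eps_step

def advanced_composition(eps_step, T, delta):
    """Advanced composition (Dwork-Rothblum-Vadhan): total loss grows
    roughly like sqrt(T) * eps_step, at the cost of a small delta."""
    return (np.sqrt(2.0 * T * np.log(1.0 / delta)) * eps_step
            + T * eps_step * (np.exp(eps_step) - 1.0))

T, eps_step, delta = 200, 0.05, 1e-5
naive = naive_composition(eps_step, T)              # 10.0
advanced = advanced_composition(eps_step, T, delta)  # noticeably smaller
```

For many iterations the advanced bound is substantially tighter, which is why improved composition methods are what make iterative private inference practical at all.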


Locally Private Bayesian Inference for Count Models
A general and modular method for privatizing Bayesian inference for Poisson factorization, a broad class of models that includes some of the most widely used models in the social sciences. The method satisfies local differential privacy, which ensures that no single centralized server ever needs to store the non-privatized data.
An end-to-end Differentially Private Latent Dirichlet Allocation Using a Spectral Algorithm
This work is the first to achieve utility guarantees for learning in LDA under the required level of differential privacy, and the proposed spectral algorithm systematically outperforms differentially private variational inference.
Combatting The Challenges of Local Privacy for Distributional Semantics with Compression
It is argued that limited-precision local privacy is a more appropriate framework for bag-of-words features; applying LDA and LSA under this formulation to synthetic and real data yields distributional models closer to those learned from the original data.
Federated Latent Dirichlet Allocation: A Local Differential Privacy Based Framework
FedLDA, a local differential privacy (LDP) based framework for federated learning of LDA models, contains a novel LDP mechanism called Random Response with Priori (RRP), which provides theoretical guarantees on both data privacy and model accuracy.
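The Random Response with Priori (RRP) mechanism is not spelled out in this summary, but it builds on classic randomized response, the standard local-DP primitive in which each user perturbs their own bit before sending it. A sketch of the classic mechanism only (the prior-weighted FedLDA variant is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(7)

def randomized_response(bit, eps):
    """Classic randomized response: report the true bit with probability
    e^eps / (e^eps + 1), otherwise flip it. Satisfies eps-local DP."""
    p = np.exp(eps) / (np.exp(eps) + 1.0)
    return bit if rng.random() < p else 1 - bit

def debias_mean(reports, eps):
    """Unbiased estimate of the true mean of the bits from noisy reports:
    E[report] = (2p - 1) * mu + (1 - p), so invert that affine map."""
    p = np.exp(eps) / (np.exp(eps) + 1.0)
    return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)

true_bits = rng.integers(0, 2, size=20000)
reports = [randomized_response(b, eps=1.0) for b in true_bits]
estimate = debias_mean(reports, eps=1.0)
```

No individual report reveals much about its sender, yet the debiased aggregate recovers the population mean, which is the property federated LDA frameworks exploit.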
DP-LSSGD: A Stochastic Optimization Method to Lift the Utility in Privacy-Preserving ERM
A DP Laplacian smoothing SGD (DP-LSSGD) to train ML models with differential privacy (DP) guarantees that makes training both convex and nonconvex ML models more stable and enables the trained models to generalize better.
Industrial Federated Topic Modeling
This article proposes Industrial Federated Topic Modeling (iFTM), a framework in which multiple parties collaboratively train a high-quality topic model while simultaneously alleviating data scarcity and maintaining immunity to privacy adversaries; experimental results verify iFTM's superiority over conventional topic modeling.
Federated Topic Modeling
A novel framework named Federated Topic Modeling (FTM) is proposed, in which multiple parties collaboratively train a high-quality topic model while simultaneously alleviating data scarcity and maintaining immunity to privacy adversaries; experiments verified FTM's superiority over conventional topic modeling.
Private Cross-Silo Federated Learning for Extracting Vaccine Adverse Event Mentions
This paper describes the experience of applying an FL-based solution to the Named Entity Recognition (NER) task for adverse-event detection in the context of mass-scale vaccination programs, and presents a comprehensive empirical analysis of the benefits gained with FL-based training.


Practical Privacy For Expectation Maximization
This work proposes a practical algorithm that overcomes the challenge of iterative expectation maximization and outputs EM parameter estimates that are both accurate and private, and uses a relaxed notion of the differential privacy gold standard, called concentrated differential privacy (CDP).
The Differential Privacy of Bayesian Inference
It is found that while differential privacy is ostensibly achievable for most of the method variants, the conditions needed for it to do so are often not realistic for practical usage.
Privacy for Free: Posterior Sampling and Stochastic Gradient Monte Carlo
It is shown that, under standard assumptions, drawing one sample from a posterior distribution is differentially private "for free"; that this sample, used as a statistical estimator, is often consistent, near optimal, and computationally tractable; and that these observations lead to an "anytime" algorithm for Bayesian learning under privacy constraints.
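The "one posterior sample" idea can be illustrated on the simplest conjugate model: instead of releasing the exact posterior mean of a coin's bias, release a single draw from the Beta posterior. This is only a toy sketch; the paper's boundedness conditions on the log-likelihood, under which the draw carries a differential privacy guarantee, are not checked here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Observe 1000 coin flips; under a uniform Beta(1, 1) prior, the
# posterior over the coin's bias is Beta(1 + heads, 1 + tails).
data = rng.integers(0, 2, size=1000)
heads = int(data.sum())
tails = len(data) - heads

# One-posterior-sample estimator: a single posterior draw, released as-is.
ops_estimate = rng.beta(1.0 + heads, 1.0 + tails)
```

As the sample size grows the posterior concentrates, so the single released draw is close to the true bias with high probability, which is the sense in which the privacy comes "for free".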
Online Learning for Latent Dirichlet Allocation
An online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA), based on online stochastic optimization with a natural gradient step, is developed and shown to converge to a local optimum of the VB objective function.
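The heart of the online algorithm is a stochastic update that blends the global variational parameter with a per-minibatch estimate, using a decaying step size rho_t = (tau0 + t)^(-kappa). A minimal sketch with toy numbers (the per-minibatch E-step that would produce `lam_hat` is omitted):

```python
import numpy as np

def online_vb_update(lam, lam_hat, t, tau0=10.0, kappa=0.7):
    """One online VB step: move lambda toward the minibatch estimate
    lam_hat with step size rho_t = (tau0 + t) ** -kappa. Choosing
    kappa in (0.5, 1] satisfies the Robbins-Monro conditions."""
    rho = (tau0 + t) ** (-kappa)
    return (1.0 - rho) * lam + rho * lam_hat

lam = np.full((2, 4), 1.0)      # toy: 2 topics x 4 vocabulary words
lam_hat = np.full((2, 4), 5.0)  # pretend every minibatch suggests 5.0
for t in range(100):
    lam = online_vb_update(lam, lam_hat, t)
```

With a constant target, lambda converges toward it; with real minibatches the same decaying average tracks a local optimum of the VB objective.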
Latent Dirichlet Allocation
Variational algorithms for approximate Bayesian inference
A unified variational Bayesian (VB) framework which approximates computations in models with latent variables using a lower bound on the marginal likelihood and is compared to other methods including sampling, Cheeseman-Stutz, and asymptotic approximations such as BIC.
Robust and Private Bayesian Inference
Bounds on the robustness of the posterior are proved, a posterior sampling mechanism is introduced and shown to be differentially private, and finite-sample bounds for distinguishability-based privacy under a strong adversarial model are provided.
Calibrating Noise to Sensitivity in Private Data Analysis
The study is extended to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f, which is the amount that any single argument to f can change its output.
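The calibration rule amounts to a few lines: add Laplace noise whose scale is the sensitivity of f divided by the privacy parameter epsilon. A sketch for a counting query, whose sensitivity is 1 because adding or removing one person changes the count by at most 1:

```python
import numpy as np

rng = np.random.default_rng(42)

def laplace_mechanism(value, sensitivity, eps):
    """Release value + Laplace noise with scale sensitivity / eps,
    which satisfies eps-differential privacy."""
    return value + rng.laplace(0.0, sensitivity / eps)

# Counting query: sensitivity 1, so noise scale is 1 / eps = 2 here.
noisy_count = laplace_mechanism(1234.0, sensitivity=1.0, eps=0.5)
```

Smaller epsilon (stronger privacy) or larger sensitivity both mean more noise, which is exactly the trade-off the calibration captures.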
Differentially Private Bayesian Programming
This work presents PrivInfer, an expressive framework for writing and verifying differentially private Bayesian machine learning algorithms that leverages recent developments in Bayesian inference, probabilistic programming languages, and in relational refinement types.
Differentially Private Stochastic Gradient Descent for in-RDBMS Analytics
This work considers a specific algorithm --- stochastic gradient descent (SGD) for differentially private machine learning --- and explores how to integrate it into an RDBMS system and provides a novel analysis of the privacy properties of this algorithm.
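The core of a DP-SGD iteration, inside an RDBMS or not, is per-example gradient clipping followed by Gaussian noise on the averaged gradient. A minimal NumPy sketch of one step (the in-database integration itself is not modeled, and the noise scale shown is illustrative rather than calibrated to a specific privacy budget):

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr):
    """One DP-SGD step: clip each example's gradient to L2 norm clip_norm,
    average, add Gaussian noise scaled to the clipping bound, descend."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(
        0.0,
        noise_multiplier * clip_norm / len(per_example_grads),
        size=mean_grad.shape,
    )
    return params - lr * (mean_grad + noise)

params = np.zeros(3)
grads = [np.array([3.0, 4.0, 0.0]), np.array([0.0, 0.3, 0.4])]
new_params = dp_sgd_step(params, grads, clip_norm=1.0,
                         noise_multiplier=1.0, lr=0.1)
```

Clipping bounds each example's influence on the update (its sensitivity), which is what lets the added Gaussian noise translate into a differential privacy guarantee via composition over the training run.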