James R. Foulds

Learn More
There has been an explosion in the amount of digital text information available in recent years, leading to challenges of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference algorithms for latent Dirichlet allocation (LDA) have made it feasible to learn topic models on very large-scale corpora,(More)
Online debate forums present a valuable opportunity for the understanding and modeling of dialogue. To understand these debates, a key challenge is inferring the stances of the participants, all of which are interrelated and dependent. While collectively modeling users’ stances has been shown to be effective (Walker et al., 2012c; Hasan and Ng, 2013), there(More)
Multi-instance (MI) learning is a variant of inductive machine learning where each learning example contains a bag of instances instead of a single feature vector. The term commonly refers to the supervised setting, where each bag is associated with a label. This type of representation is a natural fit for a number of real-world learning scenarios,(More)
Real-world relational data sets, such as social networks, often involve measurements over time. We propose a Bayesian nonparametric latent feature model for such data, where the latent features for each actor in the network evolve according to a Markov process, extending recent work on similar models for static networks. We show how the number of features(More)
New diagnostics for tuberculosis are urgently needed to replace or facilitate acid-fast bacilli (AFB) microscopy for the identification of smear-positive cases, and to improve the diagnosis of AFB smear-negative cases. These need to be appropriate for use in low income countries. Tests to replace or facilitate AFB microscopy must offer improvements to this(More)
Understanding the diffusion of information in social networks and social media requires modeling the text diffusion process. In this work, we develop the HawkesTopic model (HTM) for analyzing text-based cascades, such as “retweeting a post” or “publishing a follow-up blog post.” HTM combines Hawkes processes and topic modeling to simultaneously reason about(More)
Multiple-Instance Learning via Embedded Instance Selection (MILES) is a recently proposed multiple-instance (MI) classification algorithm that applies a single-instance base learner to a propositionalized version of MI data. However, the original authors consider only one single-instance base learner for the algorithm — the 1-norm SVM. We present an(More)
Bayesian inference has great promise for the privacy-preserving analysis of sensitive data, as posterior sampling automatically preserves differential privacy, an algorithmic notion of data privacy, under certain conditions (Dimitrakakis et al., 2014; Wang et al., 2015b). While this one posterior sample (OPS) approach elegantly provides privacy “for free,”(More)
Relational phrases (e.g., “got married to”) and their hypernyms (e.g., “is a relative of”) are central for many tasks including question answering, open information extraction, paraphrasing, and entailment detection. This has motivated the development of several linguistic resources (e.g. DIRT, PATTY, and WiseNet) which systematically collect and organize(More)
In multi-instance learning, each example is described by a bag of instances instead of a single feature vector. In this paper, we revisit the idea of performing multi-instance classification based on a point-and-scaling concept by searching for the point in instance space with the highest diverse density. This is a computationally expensive process, and we(More)