Christopher DuBois

There has been an explosion in the amount of digital text information available in recent years, leading to challenges of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference algorithms for latent Dirichlet allocation (LDA) have made it feasible to learn topic models on very large-scale corpora, …
Most real-world recommender services measure their performance based on the top-N results shown to the end users. Thus, advances in top-N recommendation have far-ranging consequences in practical applications. In this paper, we present a novel method, called Collaborative Denoising Auto-Encoder (CDAE), for top-N recommendation that utilizes the idea of …
Real-world relational data sets, such as social networks, often involve measurements over time. We propose a Bayesian nonparametric latent feature model for such data, where the latent features for each actor in the network evolve according to a Markov process, extending recent work on similar models for static networks. We show how the number of features …
Several approaches have recently been proposed for modeling continuous-time network data via dyadic event rates conditioned on the observed history of events and nodal or dyadic covariates. In many cases, however, interaction propensities, and even the underlying mechanisms of interaction, vary systematically across subgroups whose identities are …
We consider the problem of analyzing social network data sets in which the edges of the network have timestamps, and we wish to analyze the subgraphs formed from edges in contiguous subintervals of these timestamps. We provide data structures for these problems that use near-linear preprocessing time, linear space, and sublogarithmic query time to handle …
BACKGROUND: The obesity-related hormones leptin and adiponectin are independently and oppositely associated with insulin resistance, which is an important risk factor for coronary artery disease (CAD) and restenosis after coronary intervention. In this report, we set out to determine the role of the leptin-adiponectin ratio (LAR) in non-diabetic patients …
In this paper, we advance the theory of large scale Bayesian posterior inference by introducing a new approximate slice sampler that uses only small mini-batches of data in every iteration. While this introduces a bias in the stationary distribution, the computational savings allow us to draw more samples in a given amount of time and reduce sampling …
Social media are increasingly used to disseminate emergency warnings, alerts, and other hazard-related information. In this context, the timing of information propagation is of immediate interest. Time-sensitive information must reach members of the general public before the pertinence of the information expires. In this research we build a preliminary …
Two-mode networks are a natural representation for many kinds of relational data. These networks are bipartite graphs consisting of two distinct sets (“modes”) of entities. For example, one can model multiple-recipient email data as a two-mode network of (a) individuals and (b) the emails that they send or receive. In this work we present a statistical model …