• Publications
Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach
This represents the largest study, by an order of magnitude, of language and personality, and found striking variations in language with personality, gender, and age.
Efficient clustering of high-dimensional data sets with application to reference matching
This work presents a new technique for clustering large datasets, using a cheap, approximate distance measure to efficiently divide the data into overlapping subsets the authors call canopies, and presents experimental results on grouping bibliographic citations from the reference sections of research papers.
Methods and metrics for cold-start recommendations
A method for recommending items that combines content and collaborative data under a single probabilistic framework is developed, and it is demonstrated empirically that the various components of the testing strategy combine to obtain deeper understanding of the performance characteristics of recommender systems.
Time Series
This paper presents a meta-modelling framework that automates the labor-intensive, and therefore slow and expensive, process of manually cataloging time series.
Automatic personality assessment through social media language.
Results indicated that language-based assessments can constitute valid personality measures: they agreed with self-reports and informant reports of personality, added incremental validity over informant reports, adequately discriminated between traits, and were stable over 6-month intervals.
Iterative combinatorial auctions: achieving economic and computational efficiency
iBundle is proposed, an iterative combinatorial auction in which agents can bid for combinations of items and adjust their bids in response to bids from other agents, and which also computes Vickrey payments at the end of the auction.
A Generalized Linear Model for Principal Component Analysis of Binary Data
An alternating least squares method is derived to estimate the basis vectors and generalized linear coefficients of the logistic PCA model, a generalized linear model for dimensionality reduction of binary data that is related to principal component analysis (PCA) and is much better suited to modeling binary data than conventional PCA.
Iterative Combinatorial Auctions: Theory and Practice
iBundle is introduced, the first iterative combinatorial auction that is optimal for a reasonable agent bidding strategy, in this case myopic best-response bidding, and its optimality is proved with a novel connection to primal-dual optimization theory.
Integrated Annotation for Biomedical Information Extraction
We describe an approach to two areas of biomedical information extraction, drug development and cancer genomics. We have developed a framework which includes corpus annotation integrated at multiple
A hybrid neural network‐first principles approach to process modeling
A hybrid neural network-first principles modeling scheme is developed and used to model a fedbatch bioreactor. The hybrid model combines a partial first principles model, which incorporates the