Analysis of Stopping Active Learning based on Stabilizing Predictions
TLDR
This paper presents the first theoretical analysis of stopping active learning based on stabilizing predictions (SP).
Statistical inference on attributed random graphs: Fusion of graph features and content: An experiment on time series of Enron graphs
TLDR
Fusion of information from graph features and content can provide superior inference for an anomaly detection task, compared to the corresponding content-only or graph feature-only statistics.
Statistical inference on attributed random graphs: Fusion of graph features and content
TLDR
We prove that tests based on a fusion of graph-derived and content-derived metadata can be more powerful than those based on graph or content features alone.
Social correlates of turn-taking behavior
TLDR
We train statistical models of turn-taking behavior using automatic labels of speech activity and measure the association of these models with socially correlated traits.
Social correlates of turn-taking style
TLDR
We derive simple turn-taking models from speaker activity detection output on the Switchboard-1 corpus that can be used to cluster speakers into turn-taking 'styles'.
Tracking changes in language
  • John Grothendieck
  • Computer Science
  • IEEE Transactions on Speech and Audio Processing
  • 15 August 2005
TLDR
This work presents a methodology for understanding how a language model has altered based on utterance clustering and statistical tests on individual features.
Towards Link Characterization From Content: Recovering Distributions From Classifier Output
TLDR
In processing large volumes of speech and language data, we are often interested in the distribution of languages, speakers, topics, etc.
Towards link characterization from content
TLDR
The Metropolis-Hastings algorithm allows us to construct a Bayes estimator for the true class proportions.
CoCITe—Coordinating Changes in Text
TLDR
This paper describes a procedure for efficiently finding step changes, trends, bursts, and cyclic changes affecting frequencies of words, or more general lexical items, within streams of documents which may be optionally labeled with metadata.
Random attributed graphs for statistical inference from content and context
TLDR
Random attributed graphs provide an effective means to characterize and draw inferences from large volumes of language content plus associated meta-data, which humans deal with as a gestalt.