Hanna Wallach

This thesis explores a number of parameter estimation techniques for conditional random fields, a recently introduced [31] probabilistic model for labelling and segmenting sequential data. Theoretical and practical disadvantages of the training techniques reported in current literature on CRFs are discussed. We hypothesise that general numerical…
Statistical models of text have become increasingly popular in statistics and computer science as a method of exploring large document collections. Social scientists often want to move beyond exploration, to measurement and experimentation, and make inference about social and political processes that drive discourse and content. In this paper, we develop a…
Social scientists who do not have specialized natural language processing training often use a unigram bag-of-words (BOW) representation when analyzing text corpora. We offer a new phrase-based method, NPFST, for enriching a unigram BOW. NPFST uses a part-of-speech tagger and a finite state transducer to extract multiword phrases to be added to a unigram…
Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman–Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some tasks,…
Two patients with locally advanced carcinoma of the breast had radiation therapy as primary treatment. Within one year, a lupus-like syndrome developed characterized by pneumonitis, pleural effusion, and positive fluorescent antinuclear antibody (FANA) reaction and lupus erythematosus (LE) preparation. Pericarditis developed in one patient and leukopenia in…
What can text corpora tell us about society? How can automatic text analysis algorithms efficiently and reliably analyze the social processes revealed in language production? This work develops statistical text analyses of dynamic social and news media datasets to extract indicators of underlying social phenomena, and to reveal how social factors guide…
Scaling MCMC Inference and Belief Propagation to Large, Dense Graphical Models. May 2014. Sameer Singh (B.E., University of Delhi; M.Sc., Vanderbilt University; Ph.D., University of Massachusetts Amherst). Directed by: Professor Andrew McCallum. With the physical constraints of semiconductor-based electronics becoming increasingly limiting in the past decade,…