Abram Handler

Learn More
Social scientists who do not have specialized natural language processing training often use a unigram bag-of-words (BOW) representation when analyzing text corpora. We offer a new phrase-based method, NPFST, for enriching a unigram BOW. NPFST uses a partof-speech tagger and a finite state transducer to extract multiword phrases to be added to a unigram(More)
We propose a new, socially-impactful task for natural language processing: from a news corpus, extract names of persons who have been killed by police. We present a newly collected police fatality corpus, which we release publicly, and present a model to solve this problem that uses EM-based distant supervision with logistic regression and convolutional(More)
We explore two techniques which use color to make sense of statistical text models. One method uses in-text annotations to illustrate a model’s view of particular tokens in particular documents. Another uses a high-level, “wordsas-pixels” graphic to display an entire corpus. Together, these methods offer both zoomed-in and zoomed-out perspectives into a(More)
This thesis performs an empirical analysis of Word2Vec by comparing its output to WordNet, a well-known, human-curated lexical database. It finds that Word2Vec tends to uncover more of certain types of semantic relations than others – with Word2Vec returning more hypernyms, synonomyns and hyponyms than hyponyms or holonyms. It also shows the probability(More)
  • 1