Supervised Text-based Geolocation Using Language Models on an Adaptive Grid
The adaptive grid achieves results competitive with a uniform grid on small training sets and outperforms it on the large Twitter corpus; the two grid constructions can also be combined to produce consistently strong results across all training sets.
Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns
GAP, a gender-balanced labeled corpus of 8,908 ambiguous pronoun–name pairs sampled to provide diverse coverage of challenges posed by real-world text, is presented and released; experiments on it show that syntactic structure and continuous neural models provide promising, complementary cues for approaching the challenge.
Twitter Polarity Classification with Label Propagation over Lexical Links and the Follower Graph
Results on polarity classification for several datasets show that the label propagation approach rivals a model supervised with in-domain annotated tweets, and it outperforms the noisily supervised classifier it exploits as well as a lexicon-based polarity ratio classifier.
PAWS: Paraphrase Adversaries from Word Scrambling
PAWS (Paraphrase Adversaries from Word Scrambling), a new dataset with 108,463 well-formed paraphrase and non-paraphrase pairs with high lexical overlap, is introduced, providing an effective instrument for driving further progress on models that better exploit structure, context, and pairwise comparisons.
Simple supervised document geolocation with geodesic grids
This work investigates automatic geolocation (i.e., identification of the location, expressed as latitude/longitude coordinates) of documents and describes several simple supervised methods for document geolocation using only the document's raw text as evidence.
PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification
PAWS-X, a new dataset of 23,659 human-translated PAWS evaluation pairs in six typologically distinct languages, shows the effectiveness of deep, multilingual pre-training while also leaving considerable headroom as a new challenge to drive multilingual research that better captures structure and contextual information.
Lexically specified derivational control in combinatory categorial grammar
This dissertation elaborates several refinements to the Combinatory Categorial Grammar (CCG) framework and shows how the multi-modal perspective on grammatical composition provided by the logical tradition of categorial grammar can be incorporated into CCG’s rule-based approach.
Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation
This work highlights shortcomings of current metrics for the Room-to-Room dataset, proposes a new metric, Coverage weighted by Length Score (CLS), and shows that agents that receive rewards for instruction fidelity outperform agents that focus on goal completion.
Categorial Grammar (CG, Ajdukiewicz 1935; Bar-Hillel 1953) is one of the oldest lexicalized grammar formalisms, in which all grammatical constituents are distinguished by a syntactic type identifying
Hierarchical Discriminative Classification for Text-Based Geolocation
The effectiveness of using logistic regression models on a hierarchy of nodes in the grid is demonstrated; the approach improves upon state-of-the-art accuracy by several percent and reduces mean error distances by hundreds of kilometers on data from Twitter, Wikipedia, and Flickr.