Learn More
We investigate unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources. Two techniques are employed: (1) simple string edit distance, and (2) a heuristic strategy that pairs initial (presumably summary) sentences from(More)
Sequence-to-sequence neural network models for generation of conversational responses tend to generate safe, commonplace responses (e.g., I don't know) regardless of the input. We suggest that the traditional objective function, i.e., the likelihood of output (response) given input (message) is unsuited to response generation tasks. Instead we propose using(More)
We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations. A neural network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allowing the system to take into account previous dialog utterances. Our(More)
We present persona-based models for handling the issue of speaker consistency in neural response generation. A speaker model encodes personas in distributed embeddings that capture individual characteristics such as background information and speaking style. A dyadic speaker-addressee model captures properties of interactions between two interlocutors. Our(More)
We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is measured to gauge the quality of the resulting corpus. A(More)
We present a modular system for detection and correction of errors made by non-native (English as a Second Language = ESL) writers. We focus on two error types: the incorrect use of determiners and the choice of prepositions. We use a decision-tree approach inspired by contextual spelling systems for detection and correction suggestions, and a large(More)
This paper presents a pilot study of the use of phrasal Statistical Machine Translation (SMT) techniques to identify and correct writing errors made by learners of English as a Second Language (ESL). Using examples of mass noun errors found in the Chinese Learner Error Corpus (CLEC) to guide creation of an engineered training set, we show that application(More)
We present a machine learning approach to evaluating the well-formedness of output of a machine translation system, using classifiers that learn to distinguish human reference translations from machine translations. This approach can be used to evaluate an MT system, tracking improvements over time; to aid in the kind of failure analysis that can help guide(More)
This paper describes a method of extracting katakana words and phrases, along with their English counterparts from non-aligned monolingual web search engine query logs. The method employs a trainable edit distance function to find <katakana, English> pairs that have a high probability of being equivalent. These pairs can then be used to further bootstrap(More)