Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Oriented Dialog Systems
TLDR
This paper empirically shows how Mem2Seq controls each generation step, and how its multi-hop attention mechanism helps in learning correlations between memories.
One-step and Two-step Classification for Abusive Language Detection on Twitter
TLDR
This research explores a two-step approach that first classifies language as abusive and then classifies it into specific types, and compares it with a one-step approach that performs a single multi-class classification for detecting sexist and racist language.
A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-parallel Corpora
TLDR
Since nonparallel corpora contain a lot more polysemous words, many-to-many translations, and different lexical items in the two languages, the output from Convec is reasonable and useful.
Overview for the First Shared Task on Language Identification in Code-Switched Data
TLDR
The evaluation showed that language identification at the token level is more difficult when the languages present are closely related, as in the case of MSA-DA, where the prediction performance was the lowest among all language pairs.
HKUST/MTS: A Very Large Scale Mandarin Telephone Speech Corpus
TLDR
The paper describes the design, collection, transcription and analysis of the 200-hour HKUST Mandarin Telephone Speech Corpus (HKUST/MTS), the largest and first of its kind for Mandarin conversational telephone speech, providing abundant and diversified samples for Mandarin speech recognition and other application-dependent tasks.
An IR Approach for Translating New Words from Nonparallel, Comparable Texts
TLDR
A new method that combines IR and NLP techniques to extract new word translations from automatically downloaded English-Chinese nonparallel newspaper texts is described.
Compiling Bilingual Lexicon Entries From a Non-Parallel English-Chinese Corpus
TLDR
A novel context heterogeneity similarity measure between words and their translations is proposed to help compile bilingual lexicon entries from a non-parallel English-Chinese corpus, and it is suggested that words with productive context in one language translate to words with productive context in another language.
Reducing Gender Bias in Abusive Language Detection
TLDR
This work measures gender biases on models trained with different abusive language datasets, while analyzing the effect of different pre-trained word embeddings and model architectures, and experiments with three bias mitigation methods that effectively reduce gender bias by 90-98% and can be extended to correct model bias in other scenarios.
Mining Very-Non-Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and E
TLDR
An iterative bootstrapping framework based on the principle of “find-one-get-more”, which claims that documents found to contain one pair of parallel sentences must contain others even if the documents are judged to be of low similarity, is presented.
K-vec: A New Approach for Aligning Parallel Texts
TLDR
An alternative alignment strategy called K-vec is presented that starts by estimating the lexicon; for example, it discovers that the English word fisheries is similar to the French pêches by noting that the distribution of fisheries in the English text is similar to the distribution of pêches in the French.