Maryam Siahbani

Out-of-vocabulary (OOV) words or phrases remain a challenge in statistical machine translation, especially when only a limited amount of parallel text is available for training or when there is a domain shift from training data to test data. In this paper, we propose a novel approach to finding translations for OOV words. We induce a lexicon by constructing(More)
PROBLEM The complexity of natural language makes text visualization challenging. Typical approaches ignore the rich grammatical structure of language. We present a visual browser for thousands of historical events from Wikipedia which uses natural language processing (NLP) tools. CONTRIBUTION A novel visualization of high-level textual information(More)
Left-to-right (LR) decoding (Watanabe et al., 2006b) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero). It generates the target sentence by extending the hypotheses only on the right edge. LR decoding has complexity O(n²b) for input of n words and beam size b, compared to O(n³) for the CKY algorithm. It requires a single(More)
Left-to-right (LR) decoding (Watanabe et al., 2006) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero) that visits input spans in arbitrary order, producing the output translation in left-to-right order. This leads to far fewer language model calls. However, while LR decoding is more efficient than CKY decoding, it is unable to(More)
This paper extracts facts using "micro-reading" of text in contrast to approaches that extract common-sense knowledge using "macro-reading" methods. Our goal is to extract detailed facts about events from natural language using a predicate-centered view of events (who did what to whom, when and how). We exploit semantic role labels in order to create a(More)
Hierarchical phrase-based machine translation [1] (Hiero) is a prominent approach to statistical machine translation, usually comparable to or better than conventional phrase-based systems. However, Hiero typically uses the CKY decoding algorithm, which requires the entire input sentence before decoding begins, as it produces the translation in a bottom-up(More)
The multilingual Paraphrase Database (PPDB) is a freely available automatically created resource of paraphrases in multiple languages. In statistical machine translation, paraphrases can be used to provide translation for out-of-vocabulary (OOV) phrases. In this paper, we show that a graph propagation approach that uses PPDB paraphrases can be used to(More)