Melvin Jose Johnson Premkumar

A central challenge in relation extraction is the lack of supervised training data. Pattern-based relation extractors suffer from low recall, whereas distant supervision yields noisy data which hurts precision. We propose bootstrapped self-training to capture the benefits of both systems: the precision of patterns and the generalizability of trained models.
Relation triples produced by open domain information extraction (open IE) systems are useful for question answering, inference, and other IE tasks. Traditionally these are extracted using a large set of patterns; however, this approach is brittle on out-of-domain text and long-range dependencies, and gives no insight into the sub-structure of the …
We describe Stanford's entry in the TAC-KBP 2014 Slot Filling challenge. We submitted two broad approaches to Slot Filling, both strongly based on the ideas of distant supervision: one built on the Deep-Dive framework (Niu et al., 2012), and another based on the multi-instance multi-label relation extractor of Surdeanu et al. (2012). In addition, we …
This paper summarizes our latest efforts in the development of a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Tamil at different levels: pronunciation dictionary, language modeling (LM), and front-end. For Tamil, few word-pronunciation pairs are usually available for training data-driven grapheme-to-phoneme (G2P) converters. Therefore, we …
Research in Large Vocabulary Continuous Speech Recognition (LVCSR) for Indian languages has not advanced to the level seen for English, because large-scale speech and language corpora are still scarce. Tamil is one of the four major Dravidian languages spoken in southern India. One of the characteristics of Tamil is that it …