In this paper, we present the evaluation of our CLIR system, performed as part of our participation in FIRE 2008. We participated in the Hindi-to-English, Marathi-to-English, and English-to-Hindi bilingual tasks and in the English, Hindi, and Marathi monolingual tasks. We take a query-translation-based approach using bilingual dictionaries. Query words not found in the…
Generic rule-based systems for Information Extraction (IE) have been shown to work reasonably well out of the box and to achieve state-of-the-art accuracy with further domain customization. However, it is generally recognized that manually building and customizing rules is a complex and labor-intensive process. In this paper, we discuss an approach that…
Distant supervision, a paradigm of relation extraction where training data is created by aligning facts in a database with a large unannotated corpus, is an attractive approach for training relation extractors. Various models have been proposed in the recent literature to align the facts in the database with their mentions in the corpus. In this paper, we discuss and…
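The alignment step described in this abstract can be illustrated with a minimal sketch. It is not the model from the paper: it assumes the simplest possible heuristic (a sentence is labeled with a fact's relation whenever it mentions both entities of that fact), and the function name `align_facts` and the toy facts and sentences are made up for illustration.

```python
from typing import List, Tuple

# A fact is (entity1, relation, entity2); the data below is toy data.
Fact = Tuple[str, str, str]
LabeledMention = Tuple[str, str, str, str]  # (sentence, entity1, entity2, relation)


def align_facts(facts: List[Fact], sentences: List[str]) -> List[LabeledMention]:
    """Label every sentence that mentions both entities of a fact
    with that fact's relation (naive string matching, an assumption)."""
    labeled = []
    for sentence in sentences:
        lowered = sentence.lower()
        for e1, relation, e2 in facts:
            if e1.lower() in lowered and e2.lower() in lowered:
                labeled.append((sentence, e1, e2, relation))
    return labeled


facts = [("Barack Obama", "born_in", "Honolulu")]
sentences = [
    "Barack Obama was born in Honolulu.",        # correct label
    "Barack Obama visited Honolulu last year.",  # noisy label: no birth stated
    "Honolulu is the capital of Hawaii.",        # only one entity, not aligned
]
training_data = align_facts(facts, sentences)
for sentence, e1, e2, relation in training_data:
    print(f"{relation}({e1}, {e2}) <- {sentence}")
```

The second sentence receives the `born_in` label even though it does not express that relation. Noise of this kind is what more careful alignment models try to handle.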
Discovering relational structure between input features in sequence labeling models has been shown to improve their accuracy in several problem settings. The problem of learning relational structure for sequence labeling can be posed as learning Markov Logic Networks (MLN) for sequence labeling, which we abbreviate as Markov Logic Chains (MLC). This objective…
We describe a novel max-margin learning approach to optimize non-linear performance measures for distantly-supervised relation extraction models. Our approach can be used generally to learn latent variable models under multivariate non-linear performance measures, such as the Fβ-score. Our approach interleaves the Concave-Convex Procedure (CCCP) for populating…
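For reference, the F-beta score mentioned in this abstract has a standard definition in terms of precision and recall. The sketch below computes it from counts of true positives, false positives, and false negatives. It shows only the measure itself, not the paper's learning procedure, and the function name `f_beta` is an illustrative choice.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Standard F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R).

    Returns 0.0 when there are no true positives, which avoids
    division by zero in precision and recall.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# beta > 1 weights recall more heavily; beta < 1 weights precision more.
print(f_beta(tp=8, fp=2, fn=2))            # balanced F1
print(f_beta(tp=8, fp=0, fn=8, beta=2.0))  # recall-weighted F2
```

The score is computed from counts over the whole prediction set and does not decompose into a sum of per-example losses, which is what makes it a multivariate, non-linear measure.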
Building relational models for the structured output classification problem of sequence labeling has been recently explored in a few research works. The models built in such a manner are interpretable and capture much more information about the domain (than models built directly from basic attributes), resulting in accurate predictions. On the other hand,…
Information Extraction (IE) has become an indispensable tool in our quest to handle the data deluge of the information age. IE can broadly be classified into Named-entity Recognition (NER) and Relation Extraction (RE). In this thesis, we view the task of IE as finding patterns in unstructured data, which can either take the form of features and/or be…
Automatic short answer grading (ASAG) techniques are designed to automatically assess short answers written in natural language, ranging in length from a few words to a few sentences. In this paper, we report an intriguing finding that the set of short answers to a question, collectively, share significant lexical commonalities. Based on this finding, we propose…
The objective of language identification (LID) is to quickly and accurately identify the language being spoken. An LID system requires a segmented and labelled speech corpus for identification. In this paper, we discuss the development of LID systems that rely on features derived from speech signals and do…