Karthik Gali

This paper is an attempt to show that an intermediary level of analysis is an effective way to carry out various NLP tasks for linguistically similar languages. We describe a process for developing a simple parser for such tasks. This parser uses a grammar-driven approach to annotate dependency relations (both inter- and intra-chunk) at an…
In this paper we explore various parameter settings of a state-of-the-art Statistical Machine Translation system to improve the quality of translation for a 'distant' language pair like English-Hindi. We propose new techniques for efficient reordering. A slight improvement over the baseline is reported using these techniques. We also show that a simple…
This paper, submitted as an entry for the NERSSEAL-2008 shared task, describes a system built for Named Entity Recognition for South and South East Asian Languages. Our paper combines machine learning techniques with language-specific heuristics to model the problem of NER for Indian languages. The system has been tested on five languages: Telugu, Hindi,…
In this paper we use popular phrase-based SMT techniques for the task of machine transliteration for the English-Hindi language pair. Minimum error rate training has been used to learn the model weights. We have achieved an accuracy of 46.3% on the test set. Our results show that these techniques can be successfully used for the task of machine transliteration.
Named Entity Recognition (NER) is the task of identifying and classifying tokens in a text document into a predefined set of classes. In this paper we show our experiments with various feature combinations for Telugu NER. We also observed that prefix and suffix information helps considerably in finding the class of a token. We also show the effect of the…
This paper describes a machine learning algorithm for Gujarati Part of Speech Tagging. The machine learning part is performed using a CRF model. The features given to the CRF are chosen with the linguistic aspects of Gujarati in mind. As Gujarati is currently a less privileged language in the sense of being resource-poor, manually tagged data is only…
In this paper, we present five models for sentence realisation from a bag-of-words containing minimal syntactic information. Sentence realisation has a wide variety of applications, ranging from Machine Translation to Dialogue systems. Our models employ simple and efficient techniques based on n-gram language modeling. We evaluated the models by comparing the synthesized…
In this paper, we propose a dependency-based statistical system that uses discriminative techniques to train its parameters. We conducted experiments on an English-Hindi parallel corpus. The use of syntax (dependency trees) allows us to address the large word reorderings between English and Hindi, and discriminative training allows us…
The structure of a sentence can be seen as a spanning tree in a linguistically augmented graph of syntactic nodes. This paper presents an approach for unlabeled dependency parsing based on this view. The first step involves marking the chunks and the chunk heads of a given sentence and then identifying the intra-chunk dependency relations. The second step…