As participants in the TIDES Surprise language exercise, researchers at the University of Massachusetts helped collect Hindi-English resources and developed a cross-language information retrieval system. Components included normalization, stop-word removal, transliteration, structured query translation, and language modeling using a probabilistic ...
For the TREC 2004 Novelty track, UMass participated in all four tasks. Although finding relevant sentences was harder this year than last, we continue to show marked improvements over the baseline of calling all sentences relevant, with a variant of tfidf being the most successful approach. We achieve 5–9% improvements over the baseline in locating novel ...
Out-of-vocabulary (OOV) words are problematic for cross-language information retrieval. One way to deal with OOV words when the two languages have different alphabets is to transliterate the unknown words, that is, to render them in the orthography of the second language. In the present study, we present a simple statistical technique to ...
Machine transliteration has received significant research attention in recent years. In most cases, the source language has been English and the target language an Asian language. This paper focuses on Hindi-to-English machine transliteration of Indian named entities such as proper nouns, place names, and organization names using conditional random fields ...
• In the HARD track, we developed document metadata to correspond to query metadata requirements; implemented clarification forms based on query expansion, passage retrieval, and clustering; and retrieved variable-length passages deemed most likely to be relevant. This work is discussed at length in Section 1. • In the QA track, we focused on retrieving ...
Almost all transactions, in domains such as travel, shopping, insurance, entertainment, hotels, and appointments, are available through Internet-based applications. Needless to say, all these applications require knowledge of English. As the number of Internet users grows day by day, it is logical to say that there is a great demand to develop ...
ACKNOWLEDGMENTS A large number of people have played an important role in my quest for a PhD. My heartfelt gratitude goes out to all of them; this list of acknowledgments is by no means complete, and I apologize to anyone I may have inadvertently left out. I would first like to thank my adviser, Prof. Prashant Shenoy, whose expert guidance helped me navigate ...
The user inputs a query in Hindi. This query is transliterated into English and is then searched in an English corpus using any popular search engine. The results handed back by this search engine are all in English (which include English dictionary words as well as transliterated words). The English dictionary words are left as such while the ...