P. C. Reghu Raj

A question answering (QA) system aims to retrieve precise information from a large collection of documents in response to a query. This paper describes the architecture of a Natural Language Question Answering (NLQA) system for a specific domain, based on ontological information, as a step towards semantic web question answering. The proposed architecture defines ...
Malayalam, a classical language of India, is spoken by over 40 million people. This paper proposes an effective information retrieval system for Malayalam, which retrieves Malayalam documents relevant to the user's information need. The proposed system improves effectiveness by considering synonyms and negations of the terms specified in the query. Though ...
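The idea of matching on synonyms and honouring negated query terms can be illustrated with a minimal sketch. It is not the system described in the paper: the synonym table, the `-term` negation syntax, and the names `SYNONYMS`, `expand`, `parse_query`, and `search` are all assumptions made for illustration, and English placeholder words stand in for Malayalam terms.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical synonym table (English placeholders, for illustration only).
SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile"},
    "film": {"movie"},
}

# Assumed query syntax: a leading "-" marks a negated (excluded) term.
NEGATION_PREFIX = "-"


def expand(term: str) -> Set[str]:
    """Return the term together with its known synonyms."""
    return {term} | SYNONYMS.get(term, set())


def parse_query(query: str) -> Tuple[List[Set[str]], Set[str]]:
    """Split a query into required synonym groups and excluded words."""
    required: List[Set[str]] = []
    excluded: Set[str] = set()
    for token in query.lower().split():
        if token.startswith(NEGATION_PREFIX) and len(token) > 1:
            excluded |= expand(token[1:])
        else:
            required.append(expand(token))
    return required, excluded


def search(query: str, docs: Dict[str, str]) -> List[str]:
    """Return ids of documents that contain every required term (or a
    synonym of it) and none of the negated terms (or their synonyms)."""
    required, excluded = parse_query(query)
    hits: List[str] = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if words & excluded:
            continue  # a negated term or one of its synonyms is present
        if all(words & group for group in required):
            hits.append(doc_id)
    return sorted(hits)
```

In this sketch negation is treated as a hard filter and synonyms as exact alternatives; a ranked retrieval model would weight them instead.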
This paper studies the contribution of different phones in speech data towards improving the performance of text- and language-independent speaker recognition systems. This work is motivated by the fact that removing silence segments from the speech data improves system performance significantly, since such segments contain no speaker-specific information. ...
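Silence removal of the kind mentioned above is often done with a simple frame-energy threshold. The sketch below shows that generic technique only; the abstract does not say how silence was detected in this work, and the function name `remove_silence`, the frame length, and the threshold value are assumptions.

```python
from typing import List, Sequence


def remove_silence(
    samples: Sequence[float],
    frame_len: int = 160,      # assumed: 20 ms frames at 8 kHz
    threshold: float = 1e-3,   # assumed mean-energy cutoff, tune per corpus
) -> List[float]:
    """Drop frames whose mean squared amplitude is below the threshold."""
    kept: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= threshold:
            kept.extend(frame)  # keep frames that look like speech
    return kept
```

A fixed threshold is fragile under varying recording levels; practical systems usually normalise the signal first or use a trained voice activity detector.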
Language Identification (LI) is the process of determining the natural language in which given content is written. It is an important preprocessing step in many Natural Language Processing (NLP) tasks. In a multilingual society like India, automatic language identification has a wider scope, since it would be a vital step in bridging the digital ...
In this project we introduce an efficient method for string matching. We use parallel processing for fast matching and organize the lexicon in a space-efficient manner (here we assume that words of all lengths are equally frequent in the input text). The idea is not limited to English; text in any language can use it, as matching finally boils down ...
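One way to picture a lexicon combined with parallel matching is to bucket the lexicon by word length and look up chunks of the input concurrently. This is only an illustrative sketch: the abstract does not describe the actual data structure or parallelisation scheme, so the length-bucketed layout and the names `build_lexicon`, `match_chunk`, and `match_parallel` are assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Set


def build_lexicon(words: Sequence[str]) -> Dict[int, Set[str]]:
    """Group lexicon words into buckets keyed by word length."""
    buckets: Dict[int, Set[str]] = defaultdict(set)
    for word in words:
        buckets[len(word)].add(word)
    return dict(buckets)


def match_chunk(chunk: Sequence[str], lexicon: Dict[int, Set[str]]) -> List[bool]:
    """Look up each token only in the bucket for its own length."""
    return [token in lexicon.get(len(token), ()) for token in chunk]


def match_parallel(
    tokens: Sequence[str],
    lexicon: Dict[int, Set[str]],
    workers: int = 4,
) -> List[bool]:
    """Split the tokens into chunks and match the chunks concurrently."""
    size = max(1, -(-len(tokens) // workers))  # ceiling division
    chunks = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so results line up with the input.
        results = pool.map(lambda c: match_chunk(c, lexicon), chunks)
    return [flag for part in results for flag in part]
```

Threads in CPython do not run CPU-bound lookups truly in parallel; the thread pool is used here only to show the chunk-and-merge structure, and a process pool or native threads would be needed for real speedups.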
Speech/non-speech detection (SND) distinguishes between speech and non-speech segments in recorded audio and video documents. SND systems can help reduce storage requirements when only the speech segments of the audio documents are needed, for example in content analysis or spoken language identification. In this work, we experimented with the use of ...
In this paper, we present the details of a phonotactic language identification (LID) system developed for five Indian languages: English (Indian), Hindi, Malayalam, Tamil and Kannada. Since there are no publicly available speech databases for English, Malayalam and Kannada, we developed the database for each of the target languages by downloading the audio ...