This paper is based on morphological analyzer using machine learning approach for complex agglutinative natural languages. Morphological analysis is concerned with retrieving the structure, the syntactic and morphological properties or the meaning of a morphologically complex word. The morphology structure of agglutinative language is unique and capturing… (More)
This paper presents the Part Of Speech tagger and Chunker for Tamil using Machine learning techniques. Part Of Speech tagging and chunking are the fundamental processing steps for any language processing task. Part of speech (POS) tagging is the process of labeling automatic annotation of syntactic categories for each word in a corpus. Chunking is the task… (More)
Clause boundary identification is a very important task in natural language processing. Identifying the clauses in the sentence becomes a tough task if the clauses are embedded inside other clauses in the sentence. In our approach, we use the dependency parser to identify the boundary for the clause. The dependency tag set, contains 11 tags, and is useful… (More)
Transliteration is the process of replacing the characters in one language with the corresponding phonetically equivalent characters of the other language. India is a language diversified country where people speak and understand many languages but does not know the script of some of these languages. Transliteration plays a major role in such cases.… (More)
This paper presents the chunker for Tamil using Machine learning techniques. Chunking is the task of identifying and segmenting the text into syntactically correlated word groups. The chunking is done by the machine learning techniques, where the linguistical knowledge is automatically extracted from the annotated corpus. We have developed our own tagset… (More)
languages are belongs to different language family so it is difficult for system to automate the morpho-syntactic mapping between them using statistical methods. This paper investigates about how English side pre-processing is used to improve the accuracy of English-Tamil SMT system.