This paper presents the Part Of Speech tagger and Chunker for Tamil using Machine learning techniques. Part Of Speech tagging and chunking are the fundamental processing steps for any language processing task. Part of speech (POS) tagging is the process of labeling automatic annotation of syntactic categories for each word in a corpus. Chunking is the task… (More)
Clause boundary identification is a very important task in natural language processing. Identifying the clauses in the sentence becomes a tough task if the clauses are embedded inside other clauses in the sentence. In our approach, we use the dependency parser to identify the boundary for the clause. The dependency tag set, contains 11 tags, and is useful… (More)
languages are belongs to different language family so it is difficult for system to automate the morpho-syntactic mapping between them using statistical methods. This paper investigates about how English side pre-processing is used to improve the accuracy of English-Tamil SMT system.
Transliteration is the process of replacing the characters in one language with the corresponding phonetically equivalent characters of the other language. India is a language diversified country where people speak and understand many languages but does not know the script of some of these languages. Transliteration plays a major role in such cases.… (More)