• Corpus ID: 9886543

Word Formation Approach to Noun Phrase Analysis for Thai

Nattakan Pengphon, Asanee Kawtrakul, Mukda Suktarachan
Noun phrase analysis is one of the most important components in Natural Language Processing (NLP) applications such as information retrieval, extraction, and categorization. For Thai, noun phrase analysis poses distinctive problems, namely noun phrase boundary identification, noun phrase decomposition and relation extraction, and core noun detection. Statistical and rule-based word formation is therefore proposed as a means of efficient noun phrase analysis by reducing the possible variants of…

Thai Keyword Extraction using TextRank Algorithm
Word formation improves keyword extraction: compound noun patterns are used to group noun phrases, which are then selected as candidates for scoring by the TextRank algorithm.
A Supervised Learning based Chunking in Thai using Categorial Grammar
One of the challenging problems in Thai NLP is the syntactic analysis of long sentences. This paper applies conditional random fields and categorial grammar to develop a
Thai Elementary Discourse Unit Segmentation by Discourse Segmentation Cues and Syntactic Information
Elementary discourse unit (EDU) segmentation is an important process, since it separates full text into minimal discourse units that are used as an input of many applications such as text
A State of the Art of Thai Language Resources and Thai Language Behavior Analysis and Modeling
This paper aims to build a bridge between languages and to share and make maximal use of existing lexica, corpora, and tools.
Mining Causality Knowledge From Thai Textual Data
Mining causality knowledge will induce knowledge of reasoning beneficial for our daily use in diagnosis. This framework is for discovering causality existing between causative antecedent and
A statistical approach for semantic relation extraction
  • Aurawan Imsombut
  • Computer Science
    2009 Eighth International Symposium on Natural Language Processing
  • 2009
This paper presents a statistical approach for learning the semantic relations between concepts of an ontology in the agricultural domain by using the extracted patterns of concept pairs of the seed verb's component.
Know-Why Extraction from Textual Data for Supporting What Questions
This research aims to automatically extract Know-Why from documents on the website to contribute knowledge sources to support the question-answering system, especially What-Question, for disease
Thai speech processing technology: A review
This paper reviews the progress of Thai speech technology in five areas of research: fundamental analyses and tools, text-to-speech synthesis (TTS), automatic speech recognition (ASR), speech applications, and language resources.
22nd International Conference on Computational Linguistics Proceedings of the workshop on Knowledge and Reasoning for Answering Questions
This paper presents a CRF (Conditional Random Field) model for Semantic Chunk Annotation in a Chinese Question and Answering System (SCACQA). The model was derived from a corpus of real world
Building an Annotated Corpus for Text Summarization and Question Answering
Annotation schemas are presented for identifying the discourse relations that hold between parts of a text, as well as the particular spans of text that are related via each discourse relation.


Noun-Phrase Analysis in Unrestricted Text for Information Retrieval
This paper describes a hybrid approach to the extraction of meaningful subcompounds from complex noun phrases using both corpus statistics and linguistic heuristics, and shows that indexing based on such extracted subcompounds improves both recall and precision in an information retrieval system.
Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation
A probabilistic chunker is applied to decide the implicit boundaries of constituents, and linguistic knowledge is used to extract the noun phrases by a finite-state mechanism.
NPtool, a Detector of English Noun Phrases
NPtool is a fast and accurate system for extracting noun phrases from English texts for the purposes of e.g. information retrieval, translation unit discovery, and corpus studies. After a general
Surface Grammatical Analysis for the Extraction of Terminological Noun Phrases
The type of analysis used (surface grammatical analysis) is highlighted, as is the methodological approach adopted to adapt the rules (an experimental approach).
The Role of Lexicalization and Pruning for Base Noun Phrase Grammars
This paper modifies the original framework to extract lexicalized treebank grammars that assign a score to each potential noun phrase based upon both the part-of-speech tag sequence and the word sequence of the phrase. It finds that lexicalization dramatically improves the performance of the unpruned treebank grammars; however, for the simple base noun phrase data set, the lexicalized grammar performs below the corresponding unlexicalized but pruned grammar.
A Statistical Approach to Thai Word Filtering
Three nontrivial problems of Thai morphological processing are word boundary ambiguity, tagging ambiguity and implicit spelling errors. These problems cause the alternative or erroneous chain of
A stochastic parts program and noun phrase parser for unrestricted text
  • Kenneth Ward Church
  • Computer Science
    International Conference on Acoustics, Speech, and Signal Processing,
  • 1989
A program that tags each word in an input sentence with the most likely part of speech has been written and performance is encouraging; a 400-word sample is presented and is judged to be 99.5% correct.
Automatic indexing using selective NLP and first-order thesauri
In an evaluation comparing CLARIT automatic indexing of ten full-text articles in the domain of artificial intelligence to the indexing of two human subjects, it was found that CLARIT performed as well as, and in some respects better than, the humans.
Experiments in Automatic Phrase Indexing For Document Retrieval: A Comparison of Syntactic and Non-Syntactic Methods
The prospects for improving the syntax-based approach to document indexing are better than for the non-syntactic approach, and more detailed analysis of individual queries indicates that the performance of both methods is highly variable.
Automatic syntactic analysis of free text
The COPSY (context operator syntax) system, which uses natural language processing techniques for fully automatic syntactic analysis of free-text documents, is described; it is being tested by the U.S. Department of Commerce for patent search and indexing.