Nguyên Thi Minh Huyên

A treebank is an important resource for both research and applications in natural language processing. For Vietnamese, such corpora are still lacking. This paper presents up-to-date results of a project for Vietnamese treebank construction. Since Vietnamese is an isolating language and has no word delimiter, there are many ambiguities in sentence analysis.
In this paper we present a comparison of three segmentation systems for the Vietnamese language. The majority of Vietnamese words are built by semantic composition from about 7,000 syllables, which also carry meaning as isolated words. Identifying word boundaries in a text is therefore not a simple task, and ambiguities often appear. Beyond…
The problem of Vietnamese syntactic parsing, especially constituency parsing, has recently been tackled by several research groups. A common effort of the Vietnamese language processing community has allowed the creation of VietTreebank, a reference parsed corpus containing about 10,000 sentences for the constituency parsing task. In this paper, we present…
Only very recently have Vietnamese researchers become involved in the domain of Natural Language Processing (NLP). Because there is neither published work in formal linguistics nor any recognized standard for Vietnamese word definition and word categories, the fundamental tasks for automatic Vietnamese language processing, such as part-of-speech…
This paper presents the construction and evaluation of a deep syntactic parser based on Lexicalized Tree-Adjoining Grammars for the Vietnamese language. It is a complete system that integrates the tools needed to process Vietnamese text, taking raw text as input and producing syntactic structures. A dependency annotation scheme for Vietnamese and…
Seventeen toxic congeners of polychlorinated dibenzo-p-dioxins (PCDDs) and polychlorinated dibenzofurans (PCDFs) were determined in breast milk using high-resolution gas chromatography/high-resolution mass spectrometry (HRGC/HRMS). Twenty-seven breast milk samples were collected from primiparae who had lived for over 5 years in wards, namely Chinh…
In this article we present a hybrid approach to automatically tokenizing Vietnamese text. The approach combines finite-state automata techniques, regular expression parsing, and a maximal-matching strategy augmented by statistical methods to resolve segmentation ambiguities. The Vietnamese lexicon in use is compactly represented by a minimal…
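The maximal-matching step named in the abstract above can be illustrated with a minimal sketch. This is not the authors' tokenizer: it covers only greedy longest-match lookup over space-separated syllables, with no automaton-based lexicon, regular expressions, or statistical disambiguation. The function name `max_match`, the `max_len` parameter, and the four-entry `LEXICON` are hypothetical, chosen only for illustration.

```python
def max_match(syllables, lexicon, max_len=4):
    """Greedy left-to-right longest-match segmentation over syllables."""
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, shrinking until the lexicon matches.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            # A single syllable is always accepted as a fallback.
            if n == 1 or candidate.lower() in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Toy lexicon (an assumption for this example, not the lexicon from the paper).
LEXICON = {"học sinh", "sinh học", "học", "sinh"}

result = max_match("học sinh học sinh học".split(), LEXICON)
print(result)  # ['học sinh', 'học sinh', 'học']
```

On this input the greedy strategy never considers the alternative segmentation "học sinh | học | sinh học", which is the kind of overlap ambiguity that the abstract says is handled by additional statistical methods.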
Vietnamese is spoken by about 80 million people around the world, yet very little concrete work on this language has appeared in Natural Language Processing (NLP) until now. The fundamental problems in automatic analysis of Vietnamese, such as part-of-speech (POS) tagging and parsing, are extremely difficult due to the lack of formal linguistic…