In this paper, we present a three-step multilingual dependency parser based on a deterministic shift-reduce parsing algorithm. Unlike last year, we treat root parsing as a separate sequential labeling task and try to link dependencies between neighboring words with a near-neighbor parser. The outputs of the root and neighbor parsers were encoded as …
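To make the idea of deterministic shift-reduce dependency parsing concrete, the following is a minimal sketch of a generic arc-standard transition system. It is not the parser described in the abstract: the function name `apply_transitions`, the action labels, and the example sentence are all assumptions made for illustration, and the action sequence is supplied by hand where a real parser would use a trained classifier to pick each action.

```python
def apply_transitions(words, actions):
    """Apply arc-standard transitions and return {dependent_index: head_index}.

    Hypothetical illustration: in a real parser a classifier would choose
    each action from the current stack/buffer state.
    """
    stack = []
    buffer = list(range(len(words)))  # word indices still to be read
    heads = {}
    for action in actions:
        if action == "SHIFT":
            # Move the next input word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT":
            # Second-from-top becomes a dependent of the top of the stack.
            dependent = stack.pop(-2)
            heads[dependent] = stack[-1]
        elif action == "RIGHT":
            # Top of the stack becomes a dependent of the word below it.
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


words = ["She", "reads", "books"]
actions = ["SHIFT", "SHIFT", "LEFT", "SHIFT", "RIGHT"]
heads = apply_transitions(words, actions)
print(heads)  # {0: 1, 2: 1}
```

Here "She" and "books" both attach to "reads", which is the only word left on the stack and so acts as the root. Each word is shifted once and attached at most once, which is why parsers of this family run in time linear in sentence length.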
Phrase pattern recognition (phrase chunking) refers to automatic approaches for identifying predefined phrase structures in a stream of text. Support vector machine (SVM)-based methods have shown excellent performance in many sequential text pattern recognition tasks, such as protein name finding and noun phrase (NP) chunking. Even though they yield very …
In this paper, we present our statistics-based opinion analysis system for this year's NTCIR-MOAT track. Our method involves two different approaches: (1) a machine learning-based prototype system built on support vector machines (SVMs) and (2) stochastic estimation at the character level of words. The former were the real applications of …
Data-driven learning based on shift-reduce parsing algorithms has emerged in dependency parsing and shown excellent performance on many treebanks. In this paper, we investigate extensions of those methods while considerably improving runtime and training-time efficiency via L2-SVMs. We also present several properties and constraints to enhance the …
In Chinese, most language processing starts with word segmentation and part-of-speech (POS) tagging. These two steps tokenize words from a sequence of characters and predict a syntactic label for each segmented word. In this paper, we present two distinct sequential tagging models for these two tasks. The first word segmentation model was …
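Treating word segmentation as sequential tagging usually means assigning each character a position tag and then reading the words off the tag sequence. The sketch below shows only that decoding step, assuming a common B/I/E/S scheme (begin, inside, end, single-character word). The tag set, the function name `tags_to_words`, and the example sentence are assumptions for illustration; the abstract does not state which tagging scheme the authors used.

```python
def tags_to_words(chars, tags):
    """Rebuild words from per-character B/I/E/S tags (assumed tag scheme)."""
    words = []
    current = ""
    for ch, tag in zip(chars, tags):
        # A new word starts on B or S; flush anything left unfinished.
        if tag in ("B", "S") and current:
            words.append(current)
            current = ""
        current += ch
        # A word is complete on E or S.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        # Tolerate a tag sequence that ends without a closing E.
        words.append(current)
    return words


chars = list("我喜欢北京")
tags = ["S", "B", "E", "B", "E"]
segmented = tags_to_words(chars, tags)
print(segmented)  # ['我', '喜欢', '北京']
```

In a full system the tags would come from a trained sequence labeler; a POS tagger could then label each recovered word in a second pass.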
Several phrase chunkers have been proposed over the past few years. Some state-of-the-art chunkers achieved better performance by integrating external resources, e.g., parsers and additional training data, or by combining multiple learners. However, in many languages and domains, such external materials are not easily available and the combination of multiple …
Asian languages, especially Chinese, differ from most Western languages in that their text is written as a sequence of words without separators. The preliminary step in processing such languages is to find the word boundaries. In this paper, we present a general-purpose model for both Chinese word segmentation and named entity recognition. This model was built on the word …