Learn More
This paper describes our robust, easy-to-use and language independent toolkit namely RDRPOSTagger which employs an error-driven approach to automatically construct a Single Classification Ripple Down Rules tree of transformation rules for POS tagging task. During the demonstration session, we will run the tagger on data sets in 15 different languages.
This paper presents a new approach to learn a rule based system for the task of part of speech tagging. Our approach is based on an incremental knowledge acquisition methodology where rules are stored in an exception-structure and new rules are only added to correct errors of existing rules; thus allowing systematic control of interaction between rules.(More)
Word segmentation is one of the most important tasks in NLP. This task, within Vietnamese language and its own features, faces some challenges, especially in words boundary determination. To tackle the task of Vietnamese word segmentation, in this paper, we propose the WS4VN system that uses a new approach based on Maximum matching algorithm combining with(More)
Wikipedia is a free multilingual online encyclopedia covering a wide range of general and specific knowledge. Its content is continuously maintained up-to-date and extended by a supporting community. In many cases, real-world events influence the collaborative editing of Wikipedia articles of the involved or affected entities. In this paper, we present(More)
Crowdsourcing has become ubiquitous in machine learning as a cost effective method to gather training labels. In this paper we examine the challenges that appear when employing crowdsourcing for active learning, in an integrated environment where an automatic method and human labelers work together towards improving their performance at a certain task. By(More)
This paper presents the first work in the task of author profiling for Vietnamese blogs. This task is important in threat identification and marketing intelligence. We have developed a Vietnamese Blog Profiling framework to automatically predict age, gender, geographic origin and occupation of weblogs’ authors purely based on language use. The(More)
Detecting duplicate entities, usually by examining metadata, has been the focus of much recent work. Several methods try to identify duplicate entities, while focusing either on accuracy or on efficiency and speed - with still no perfect solution. We propose a combined layered approach for duplicate detection with the main advantage of using Crowdsourcing(More)
In this paper, we present our method of using Information Extraction techniques to tackle the task of automatically translating English weather bulletins to Vietnamese. It is simple yet effective in satisfying the constraints of low processing power and storage space for the deployment on an embedded system. Experimental results are very promising with the(More)
In this demo we present WikipEvent, an exploratory system that captures and visualises continuously evolving complex event structures , along with the involved entities. The framework facilitates entity-centric and event-centric search, presented via a user-friendly interface and supported by temporal snippets from corresponding Wikipedia page versions. The(More)
  • 1