This paper proposes an approach to sentence-level paraphrase identification by text canonicalization. The source sentence pairs are first converted into surface text that approximates canonical forms. A decision tree learning module that employs simple lexical matching features then takes the canonicalized texts as its input for a supervised learning…
This paper uses Systemic Functional Linguistic (SFL) theory as a basis for extracting semantic features of documents. We focus on the pronominal and determination system and the role it plays in constructing interpersonal distance. By using a hierarchical system model that represents the author’s language choices, it is possible to construct a rich and…
We propose a machine learning approach that uses a Maximum Entropy (ME) model to construct a Named Entity Recognition (NER) classifier for retrieving biomedical names from texts. In experiments, we incorporate a blend of linguistic features into the ME model to assign class labels and location within an entity sequence, and a post-processing…
The CABER project aims to develop a system generator for creating a total environment for the Capture and Analysis of Behavioural Events in Real-time. The total environment includes the ability to describe behavioural activities in natural language terminology and then analyse those activities for behaviour patterns. In addition, it should…
The 11th revision of the International Classification of Diseases and Related Health Problems (ICD) will be developed as a collaborative effort supported by Web-based software. A key to this effort is the content model designed to support detailed description of the clinical characteristics of each category, clear relationships to other terminologies and…