ISO-TimeML: An International Standard for Semantic Annotation
In this paper, we present ISO-TimeML, a revised and interoperable version of the temporal markup language, TimeML. We describe the changes and enrichments made, while framing the effort in a more…
Standards going concrete: from LMF to Morphalou
The ongoing activity within ISO/TC 37/SC 4 on LMF (Lexical Markup Framework) is described and it is shown how it can be concretely implemented for the design of an on-line morphological resource for French in the Morphalou project.
XCES: An XML-based Encoding Standard for Linguistic Corpora
This paper instantiates the CES as an XML application called XCES, based on the same data architecture, which comprises a primary encoded text and "standoff" annotation in separate documents, and demonstrates how XML mechanisms can be used to select from and manipulate annotated corpora encoded according to XCES specifications.
HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID
- Patrice Lopez, Laurent Romary
- Computer Science, International Workshop on Semantic Evaluation
- 15 July 2010
SemEval Task 5 was an opportunity to experiment with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. Bagged decision trees appeared to be the most efficient machine learning algorithm for generating a list of ranked key term candidates.
Experiments with Citation Mining and Key-Term Extraction for Prior Art Search
- Patrice Lopez, Laurent Romary
- Computer Science, Conference and Labs of the Evaluation Forum
- 20 September 2010
This technical note presents the system built for the IP track of CLEF 2010 based on PATATRAS, the modular search infrastructure initially realized for CLEF IP 2009, and considers that an instance-based KNN algorithm is not competitive with standard classifiers based on preliminary large scale training.
Representing Linguistic Corpora and Their Annotations
Some of the more technical aspects of the LAF design that have been addressed in the process of finalizing the specifications for the standard are described.
Veins Theory: A Model of Global Discourse Cohesion and Coherence
A generalization of Centering Theory (CT) (Grosz, Joshi, and Weinstein, 1995) called Veins Theory (VT) is proposed, which extends the applicability of centering rules from local to global discourse.
A model oriented approach to the mapping of annotation formats using standards
SALT, a framework for mapping heterogeneous linguistic formats to one another using a model-based approach, is presented, and its capacity to integrate a wide range of possible linguistic annotation models is shown.
Standards for Language Resources
An abstract data model for linguistic annotations and its implementation using XML, RDF, and related standards are presented, and the work of a newly formed committee of the International Standards Organization (ISO), ISO/TC 37/SC 4 Language Resource Management, is outlined.
Towards International Standards for Language Resources
The use of the LAF to represent the American National Corpus and its linguistic annotations is described; the LAF is to serve as a basis for harmonizing existing language resources as well as developing new ones.