Caroline Barrière

Within large corpora of texts, Knowledge-Rich Contexts (KRCs) are a subset of sentences containing information that would be valuable to a human for the construction of a knowledge base. The entry point to the discovery of KRCs is the automatic identification of Knowledge Patterns (KPs) which are indicative of semantic relations. Machine readable dictionary
Knowledge structures called Concept Clustering Knowledge Graphs (CCKGs) are introduced along with a process for their construction from a machine-readable dictionary. CCKGs contain multiple concepts interrelated through multiple semantic relations, together forming a semantic cluster represented by a conceptual graph. The knowledge acquisition is performed
We propose a system for retrieving similar sentences from a corpus which treats sentences as pure strings. The advantage of such an approach compared to more linguistically motivated approaches is that the system can quickly retrieve similar sentences from a large corpus (over one million sentences), work well with ill-structured sentences, and work
This paper proposes some modest improvements to Extractor, a state-of-the-art keyphrase extraction system, by using a terabyte-sized corpus to estimate the informativeness and semantic similarity of keyphrases. We present two techniques to improve the organization of lists of keyphrases and remove outliers from them. The first is a simple ordering according to their
Machine translation of prepositions is a difficult task; little work has been done, to date, in this area. This article suggests addressing the problem using a semantic framework for the interpretation of the surrounding elements of a preposition in the source language. This framework, called Use Types, will reduce the set of possible prepositions in the
This paper focuses on a reading task consisting of the identification of letters in mixed-script handwritten words. This task is performed by humans using extended or limited linguistic context. Their performance rate is intended to give an upper bound on the recognition rates of computer programs designed to recognize handwritten letters in mixed-script writing. Many