Anne N. De Roeck

We present a clustering algorithm for Arabic words sharing the same root. Root-based clusters can substitute for dictionaries in indexing for IR. Modifying Adamson and Boreham (1974), our two-stage algorithm applies light stemming before calculating word-pair similarity coefficients using techniques sensitive to Arabic morphology. Tests show a successful...
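As a rough illustration of the word-pair similarity coefficient that this abstract builds on, the sketch below computes a Dice coefficient over the unique character bigrams of two words, which is the measure usually attributed to Adamson and Boreham (1974). It is a minimal sketch only: the function names are hypothetical, and the light stemming step and the Arabic-specific modifications mentioned in the abstract are not reproduced here.

```python
def bigrams(word):
    """Return the set of unique adjacent character pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_similarity(a, b):
    """Dice coefficient over unique bigrams: 2 * |shared| / (|A| + |B|).

    Assumption: words too short to yield any bigram score 0.0.
    """
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


if __name__ == "__main__":
    # 6 shared bigrams, 7 and 8 unique bigrams respectively: 12 / 15
    print(dice_similarity("statistics", "statistical"))
```

Word pairs whose coefficient exceeds a chosen threshold would then be placed in the same cluster; the threshold itself is a tuning choice that the abstract does not state.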
We present a novel technique that automatically alerts authors of requirements to the presence of potentially dangerous ambiguities. We first establish the notion of nocuous ambiguities, which are those that are likely to lead to misunderstandings. We test our approach on coordination ambiguities, which occur when words such as "and" and "or" are used. Our...
The Ypa project (De Roeck et al., 1998) is building a system to make the information in classified directories more accessible. BT's Yellow Pages provides an example of a classified database with which this work would be useful. Accessibility in this context means allowing users (or call center operators) to query the Yellow Pages system using Natural...
In this paper we present some heuristics for resolving coordination ambiguities. This type of ambiguity is one of the most pervasive and challenging. We test the hypothesis that the most likely reading of a coordination can be predicted using word distribution information from a generic corpus. The measures that we use are: the relative frequency of the...
Term dependence is a natural consequence of language use. Its successful representation has been a long-standing goal of Information Retrieval research. We present a methodology for the construction of a concept hierarchy that takes into account the three basic dimensions of term dependence. We also introduce a document evaluation function that allows the...
Schema heterogeneity issues often represent an obstacle to discovering coreference links between individuals in semantic data repositories. In this paper we present an approach that performs ontology schema matching in order to improve instance coreference resolution performance. A novel feature of the approach is its use of existing instance-level...
This paper proposes a model for term re-occurrence in a text collection based on the gaps between successive occurrences of a term. These gaps are modeled using a mixture of exponential distributions. Parameter estimation is based on a Bayesian framework that allows us to fit a flexible model. The model provides measures of a term’s re-occurrence rate and...
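For orientation, a mixture of exponential distributions over a gap length x has the general form sketched below. The number of components K, the mixing weights p_k, and the rates \lambda_k are notation assumed here for illustration; the abstract does not state how many components the model uses or how they are parameterized.

```latex
f(x) = \sum_{k=1}^{K} p_k \, \lambda_k \, e^{-\lambda_k x},
\qquad x \ge 0, \quad \lambda_k > 0, \quad p_k \ge 0, \quad \sum_{k=1}^{K} p_k = 1
```

Under this form each component has mean gap 1/\lambda_k, so a larger rate corresponds to more closely spaced occurrences of the term.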
Many requirements documents are written in natural language (NL). However, with the flexibility of NL comes the risk of introducing unwanted ambiguities in the requirements and misunderstandings between stakeholders. In this paper, we describe an automated approach to identify potentially nocuous ambiguity, which occurs when text is interpreted differently...
Natural language is prevalent in requirements documents. However, ambiguity is an intrinsic phenomenon of natural language, and is therefore present in all such documents. Ambiguity occurs when a sentence can be interpreted differently by different readers. In this paper, we describe an automated approach for characterizing and detecting so-called...