David Sánchez

2010 Elsevier B.V. doi:10.1016/j.knosys.2010.10.001. The information content (IC) of a concept provides an estimation of its degree of generality/concreteness, a dimension which…
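The snippet above is cut off before it says how IC is computed. As a hedged illustration only, the sketch below uses the classic corpus-based formulation, IC(c) = -log p(c), where p(c) is the probability of encountering concept c. This is not necessarily the method the paper proposes. The concept names and counts are hypothetical.

```python
import math


def information_content(concept_count, total_count):
    """Corpus-based IC: IC(c) = -log p(c), with p(c) a relative frequency.

    Assumption: p(c) is estimated as concept_count / total_count, where the
    count of a concept already includes the counts of all its specialisations.
    """
    if concept_count <= 0 or total_count <= 0 or concept_count > total_count:
        raise ValueError("counts must satisfy 0 < concept_count <= total_count")
    return -math.log(concept_count / total_count)


# Hypothetical occurrence counts, for illustration only.
counts = {"entity": 1000, "animal": 120, "dog": 15}
total = counts["entity"]  # the root concept subsumes every occurrence

ic = {concept: information_content(n, total) for concept, n in counts.items()}
```

Under this formulation the root concept has an IC of zero, and more concrete concepts (here `dog`) score higher than more general ones (here `animal`).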
Estimation of the semantic likeness between words is of great importance in many applications dealing with textual data such as natural language processing, knowledge acquisition and information retrieval. Semantic similarity measures exploit knowledge sources as the base to perform the estimations. In recent years, ontologies have grown in interest thanks…
In recent years, much effort has been put into ontology learning. However, the knowledge acquisition process is typically focused on the taxonomic aspect. The discovery of non-taxonomic relationships is often neglected, even though it is a fundamental point in structuring domain knowledge. This paper presents an automatic and unsupervised methodology that…
Proper understanding of textual data requires the exploitation and integration of unstructured and heterogeneous clinical sources, healthcare records or scientific literature, which are fundamental aspects in clinical and translational research. The determination of semantic similarity between word pairs is an important component of text understanding that…
Semantic similarity estimation is an important component of analysing natural language resources like clinical records. Proper understanding of concept semantics allows for improved use and integration of heterogeneous clinical sources as well as higher information retrieval accuracy. Semantic similarity has been the focus of much research, which has led to…
The Web is a valuable repository of information. However, its size and its lack of structure hinder the search for and extraction of knowledge. In this paper, we propose an automatic and autonomous methodology to retrieve and represent information from the Web in a standard way for a desired domain. It is based on the intensive use of a publicly available…
PURPOSE: The agent-oriented paradigm has emerged as a viable approach for the development of autonomic systems in the healthcare domain. This paper reviews representative works in this area in order to identify the main research lines and study their level of applicability. Moreover, from the analysis of those works and the authors' own experiences, some…
In the context of Statistical Disclosure Control, microaggregation is a privacy-preserving method aimed at masking sensitive microdata prior to publication. It iteratively creates clusters of at least k elements and replaces them by their prototype so that they become k-indistinguishable (anonymous). This data transformation produces a loss of information…
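The clustering-and-replacement idea described above can be sketched for the simplest case. This is a minimal illustration, assuming a single numerical attribute, fixed-size grouping of sorted values, and the mean as the prototype. It is not the specific algorithm of the paper, and the function names `microaggregate` and `sum_squared_error` are hypothetical.

```python
from statistics import mean


def microaggregate(values, k):
    """Mask numeric values so that each one is shared by at least k records.

    Assumption: values are sorted and cut into consecutive groups of k;
    the last group absorbs the remainder, so every group has >= k members.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(values) < k:
        raise ValueError("need at least k values to form one cluster")

    # Indices of the records, ordered by their value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    masked = [None] * len(values)
    n_groups = len(values) // k

    for g in range(n_groups):
        start = g * k
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        members = order[start:end]
        prototype = mean([values[i] for i in members])
        for i in members:
            masked[i] = prototype  # every member is replaced by the prototype
    return masked


def sum_squared_error(original, masked):
    """One simple way to quantify the information loss of the masking."""
    return sum((o - m) ** 2 for o, m in zip(original, masked))


ages = [30, 10, 20, 50, 40, 60, 70]  # hypothetical microdata
masked_ages = microaggregate(ages, k=3)
loss = sum_squared_error(ages, masked_ages)
```

With `k=3`, the seven values fall into one group of three and one group of four, so each published value is shared by at least three records. Larger values of `k` give stronger anonymity at the cost of a higher `loss`.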
Centroids are key components in many data analysis algorithms such as clustering or microaggregation. They are understood as the central value that minimises the distance to all the objects in a dataset or cluster. Methods for centroid construction are mainly devoted to datasets with numerical and categorical attributes, focusing on the numerical and…
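As a rough sketch of centroid construction for numerical and categorical attributes, the example below uses the arithmetic mean for numerical attributes (which minimises the sum of squared differences) and the mode for categorical ones (which minimises the number of mismatches). These are common textbook choices assumed here for illustration, not necessarily the approach of the paper. The record fields and the `centroid` function are hypothetical.

```python
from collections import Counter
from statistics import mean


def centroid(records, numeric_keys, categorical_keys):
    """Build a centroid record for a cluster of mixed-type records.

    Assumptions: the mean is the prototype of a numerical attribute and the
    most frequent value (mode) is the prototype of a categorical attribute.
    """
    if not records:
        raise ValueError("cannot compute the centroid of an empty cluster")

    result = {}
    for key in numeric_keys:
        result[key] = mean([r[key] for r in records])
    for key in categorical_keys:
        # most_common(1) returns [(value, count)] for the most frequent value.
        result[key] = Counter(r[key] for r in records).most_common(1)[0][0]
    return result


# Hypothetical cluster of records, for illustration only.
cluster = [
    {"age": 30, "city": "Tarragona"},
    {"age": 40, "city": "Tarragona"},
    {"age": 50, "city": "Reus"},
]
prototype = centroid(cluster, numeric_keys=["age"], categorical_keys=["city"])
```

Note that the mode treats categorical values as unrelated labels, so it ignores any similarity in meaning between two different values.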