BACKGROUND: We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications, such as collection management and navigation, summarization, and analysis. The few comparisons of clustering results from different…
The enormous increase in digital scholarly data and computing power, combined with recent advances in text mining, linguistics, network science, and scientometrics, makes it possible to scientifically study the structure and evolution of science on a large scale. This paper discusses the challenges of this ‘BIG science of science’, also called ‘computational…
Cross-lingual event tracking from a very large number of information sources (thousands of Web sites, for example) is an open challenge. In this paper we investigate effective and scalable solutions for this problem, focusing on the use of cross-lingual information retrieval techniques to translate a small subset of the training documents, as an alternative…
Addressing the research opportunities we've identified could substantially broaden the spectrum of multilingual text mining and its practicality for supporting global S&T knowledge management. These opportunities also share a common set of challenges that deserve further attention. For example, competitive intelligence surveillance, which allows…
We participated in the Cross-Language Information Retrieval evaluation at NTCIR-3 for the English-Chinese and English-Japanese tasks. We examined several approaches to query translation, including the use of a commercial machine translation system, a thesaurus automatically extracted from a parallel corpus, and a general-purpose online dictionary.