Indexing and ranking are two key factors for efficient and effective XML information retrieval. Inappropriate indexing may result in false negatives and false positives, and improper ranking may lead to low precision. In this paper, we propose a configurable XML information retrieval system, in which users can configure appropriate index types for XML tags…
Many XML applications over the Internet favor high-performance, single-pass streaming XPath evaluation. Finite automata-based algorithms suffer from a potentially combinatorial explosion of dynamic states when matching descendant axes. We present QuickXScan for streaming evaluation of XPath queries containing child and descendant axes with complex predicates…
Extracting key concepts from clinical texts for indexing is an important task in implementing a medical digital library. Several methods have been proposed for mapping free text into standard terms defined by the Unified Medical Language System (UMLS). For example, natural language processing techniques are used to map identified noun phrases into concepts. They…
Maximal frequent itemsets (MFI) are crucial to many tasks in data mining. Since the MaxMiner algorithm first introduced enumeration trees for mining MFI in 1998, several methods have been proposed that use depth-first search to improve performance. To further improve the performance of mining MFI, we propose a technique to gather and pass tail (of a…
Much research suggests separating read-only transactions from update transactions to improve the performance of database systems. However, how to separate them remains an open problem. Some modern database applications, such as data mining and web publication, require a DBMS to handle huge read-only transactions, i.e., queries. We propose a dual copy method to separate…
Efficient algorithms to mine frequent patterns are crucial to many tasks in data mining. Since the Apriori algorithm was proposed in 1994, there have been several methods proposed to improve its performance. However, most still adopt its candidate set generation-and-test approach. We propose a pattern decomposition (PD) algorithm that can significantly…
Efficient algorithms to mine frequent patterns are crucial to many tasks in data mining. Since the Apriori algorithm was proposed in 1994, there have been several methods proposed to improve its performance. However, most still adopt its candidate set generation-and-test approach. In addition, many methods do not generate all frequent patterns, making them…
Reviewing a brain tumor patient's complete medical record is a daunting task for any clinician. In current practice, the radiologist examines the most recent documents and then dictates an assessment of the patient's condition based on a review of the most current imaging study and a comparison with the most recent previous imaging study. Occasionally, the…