Publications
The genome of the mesopolyploid crop species Brassica rapa
We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has…
Mining concept-drifting data streams using ensemble classifiers
We train an ensemble of classification models, such as C4.5, RIPPER, and naive Bayesian classifiers, on sequential chunks of the data stream and combine them as a weighted ensemble.
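A minimal sketch of the chunk-based weighted ensemble summarized above. It assumes each model's weight is MSE_r − MSE_i (the error of a random predictor minus the model's error on the newest chunk), keeps the top `k` models, and uses a hypothetical `TableClassifier` (a smoothed lookup table over one discrete feature) in place of C4.5, RIPPER, or naive Bayes. For brevity, the new model is scored on its own training chunk rather than by cross-validation.

```python
from collections import Counter, defaultdict


class TableClassifier:
    """Hypothetical stand-in base learner: estimates P(class | x) for one
    discrete feature x by counting, with add-one (Laplace) smoothing."""

    def __init__(self, chunk, classes):
        self.classes = classes
        self.counts = defaultdict(Counter)
        for x, y in chunk:
            self.counts[x][y] += 1

    def proba(self, x, c):
        cnt = self.counts.get(x, Counter())
        return (cnt[c] + 1) / (sum(cnt.values()) + len(self.classes))


def mse(clf, chunk):
    # mean squared error of the probability assigned to the true class
    return sum((1 - clf.proba(x, y)) ** 2 for x, y in chunk) / len(chunk)


def mse_random(chunk, classes):
    # error of a predictor that always assigns probability p(c) to class c,
    # where p is the chunk's class distribution
    n = len(chunk)
    freq = Counter(y for _, y in chunk)
    return sum((freq[c] / n) * (1 - freq[c] / n) ** 2 for c in classes)


class WeightedStreamEnsemble:
    def __init__(self, classes, k=3):
        self.classes, self.k = classes, k
        self.members = []  # list of (weight, classifier)

    def update(self, chunk):
        """Train on the newest chunk, then re-weight all models on that chunk."""
        candidates = [clf for _, clf in self.members]
        candidates.append(TableClassifier(chunk, self.classes))
        baseline = mse_random(chunk, self.classes)
        scored = [(baseline - mse(clf, chunk), clf) for clf in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # keep the k best models, dropping any no better than random guessing
        self.members = [(w, clf) for w, clf in scored[: self.k] if w > 0]

    def predict(self, x):
        score = {c: sum(w * clf.proba(x, c) for w, clf in self.members)
                 for c in self.classes}
        return max(score, key=score.get)
```

When the concept flips between chunks, the model trained on the old concept scores worse than random on the new chunk and is discarded, so predictions follow the drift.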
AdaCost: Misclassification Cost-Sensitive Boosting
AdaCost, a variant of AdaBoost, is a misclassification cost-sensitive boosting method.
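A rough sketch of a boosting loop in the AdaCost style. Assumptions: labels are ±1, each example carries a cost in [0, 1], the initial weight distribution is proportional to cost, the cost-adjustment function is β₊ = 0.5 − 0.5c for correctly classified examples and β₋ = 0.5 + 0.5c for misclassified ones, and the weak learner is a 1-D decision stump chosen purely for illustration.

```python
import math


def beta(correct, cost):
    # assumed cost adjustment: costly examples lose less weight when
    # classified correctly and gain more weight when misclassified
    return 0.5 - 0.5 * cost if correct else 0.5 + 0.5 * cost


def stump(thr, pol, x):
    return pol if x >= thr else -pol


def fit_stump(xs, ys, d):
    """Pick the 1-D threshold/polarity maximizing sum_i d_i * y_i * h(x_i)."""
    best = None
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            score = sum(di * yi * stump(thr, pol, xi)
                        for xi, yi, di in zip(xs, ys, d))
            if best is None or score > best[0]:
                best = (score, thr, pol)
    return best[1], best[2]


def adacost(xs, ys, costs, rounds=10):
    total = sum(costs)
    d = [c / total for c in costs]  # initial weights proportional to cost
    model = []
    for _ in range(rounds):
        thr, pol = fit_stump(xs, ys, d)
        u = []
        for x, y, c in zip(xs, ys, costs):
            margin = y * stump(thr, pol, x)
            u.append(margin * beta(margin > 0, c))
        r = sum(di * ui for di, ui in zip(d, u))  # always < 1 since u_i <= 0.5
        if r <= 0:  # the weak learner no longer helps
            break
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        model.append((alpha, thr, pol))
        # cost-adjusted reweighting, then renormalize to a distribution
        d = [di * math.exp(-alpha * ui) for di, ui in zip(d, u)]
        z = sum(d)
        d = [di / z for di in d]
    return model


def predict(model, x):
    s = sum(alpha * stump(thr, pol, x) for alpha, thr, pol in model)
    return 1 if s >= 0 else -1
```

Compared with plain AdaBoost, the only changes in this sketch are the cost-proportional starting weights and the β factor inside the exponent.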
Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation
In many applications, one can obtain descriptions of the same objects or events from a variety of sources.
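A minimal sketch of iterative truth discovery for categorical claims, alternating between a weighted vote (truth step) and a source-weight update w_s = log(Σ loss / loss_s) (reliability step). The log-ratio update and the 0-1 loss are assumptions for illustration; `eps` is our own guard against a source with zero loss, and continuous data types and other loss functions are not covered.

```python
import math
from collections import defaultdict


def truth_discovery(claims, iters=10, eps=1e-9):
    """claims: dict mapping source -> dict mapping object -> claimed value."""
    sources = list(claims)
    objects = {o for per_source in claims.values() for o in per_source}
    weights = {s: 1.0 for s in sources}  # start with equal trust
    truths = {}
    for _ in range(iters):
        # truth step: weighted vote among the sources that cover each object
        for o in objects:
            votes = defaultdict(float)
            for s in sources:
                if o in claims[s]:
                    votes[claims[s][o]] += weights[s]
            truths[o] = max(votes, key=votes.get)
        # reliability step: sources with smaller 0-1 loss get larger weights
        loss = {s: sum(claims[s][o] != truths[o] for o in claims[s]) + eps
                for s in sources}
        total = sum(loss.values())
        weights = {s: math.log(total / loss[s]) for s in sources}
    return truths, weights
```

Usage: `truths, weights = truth_discovery({"s1": {"o1": "A"}, "s2": {"o1": "A"}, "s3": {"o1": "X"}})`.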
ViST: a dynamic index method for querying XML data by tree structures
We propose ViST, a novel index structure for searching XML documents that uses tree structures as the basic unit of query to avoid expensive join operations.
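To illustrate querying by whole tree structure rather than by joining individual paths, the sketch below encodes a tree as a preorder sequence of (symbol, prefix) pairs, where the prefix is the tuple of labels on the path from the root, and answers a query by checking whether its encoding is a subsequence of the document's encoding. This is a sketch in the spirit of structure-encoded sequences under our own assumptions: it scans sequences linearly instead of using an index, and ignores wildcards and the sibling-related edge cases a full method must handle.

```python
def encode(tree, prefix=()):
    """tree is (label, [children]); returns preorder (symbol, prefix) pairs,
    where prefix is the tuple of labels on the path from the root."""
    label, children = tree
    seq = [(label, prefix)]
    for child in children:
        seq.extend(encode(child, prefix + (label,)))
    return seq


def matches(query_tree, doc_tree):
    """True if the query's encoding is a (not necessarily contiguous)
    subsequence of the document's encoding."""
    remaining = iter(encode(doc_tree))
    # each `any` consumes the shared iterator, so matches must appear in order
    return all(any(q == d for d in remaining) for q in encode(query_tree))
```

Because a whole query tree is matched in a single pass over one sequence, no per-node join is needed in this toy version.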
Mining big data: current status, and forecast to the future
Big Data is a new term used to identify datasets that cannot be managed with current methodologies or data mining software tools because of their large size and complexity.
Cost-based modeling for fraud and intrusion detection: results from the JAM project
We describe the results achieved using the JAM distributed data mining system for the real world problem of fraud detection in financial information systems.
Toward Cost-Sensitive Modeling for Intrusion Detection and Response
We define cost models to formulate the total expected cost of an IDS, and present cost-sensitive machine learning techniques that can produce detection models that are optimized for user-defined cost metrics.
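A simplified illustration of a cost-based response rule and a total-cost tally for an intrusion detector. The cost names (damage cost, response cost, per-event operational cost) and the rule "respond only when expected damage exceeds the response cost" are assumptions made for this sketch, not a reproduction of the paper's cost model; in particular, a response is assumed to prevent all damage.

```python
def respond(p_intrusion, damage_cost, response_cost):
    """Respond only if the expected damage avoided exceeds the response cost."""
    return p_intrusion * damage_cost > response_cost


def total_cost(events, op_cost_per_event):
    """events: list of (is_intrusion, p_intrusion, damage_cost, response_cost)."""
    total = 0.0
    for is_intrusion, p, dcost, rcost in events:
        if respond(p, dcost, rcost):
            total += rcost  # pay for the response (true or false alarm)
        elif is_intrusion:
            total += dcost  # missed attack: the damage is incurred
        total += op_cost_per_event  # cost of running the detector itself
    return total
```

Under this rule a low-confidence alert on a cheap-to-damage target is deliberately ignored, which is the kind of trade-off a user-defined cost metric makes explicit.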
A Confidence-Aware Approach for Truth Discovery on Long-Tail Data
We propose a confidence-aware truth discovery (CATD) method to automatically detect truths from conflicting data that exhibits a long-tail phenomenon.
Knowledge transfer via multiple model local structure mapping
We propose a locally weighted ensemble framework to combine multiple models for transfer learning, where the weights are dynamically assigned according to a model's predictive power on each test example.
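A simplified sketch of per-example model weighting for transfer learning. Assumptions: the unlabeled test set has already been clustered (`cluster_ids`), and a model's weight at a test point x is the Jaccard overlap between the points it labels the same as x and the points in x's cluster, normalized across models. This overlap score is our stand-in for the paper's similarity measure, not its exact formula.

```python
def local_weights(x_idx, model_preds, cluster_ids):
    """model_preds: one list of predicted labels (over all test points) per model.
    cluster_ids: cluster assignment of every test point (assumed given)."""
    in_cluster = {j for j, c in enumerate(cluster_ids)
                  if c == cluster_ids[x_idx]}
    weights = []
    for preds in model_preds:
        same_pred = {j for j, p in enumerate(preds) if p == preds[x_idx]}
        # both sets contain x_idx, so the overlap is always positive
        weights.append(len(in_cluster & same_pred) / len(in_cluster | same_pred))
    total = sum(weights)
    return [w / total for w in weights]


def ensemble_predict(x_idx, model_preds, cluster_ids):
    """Weighted vote where each model's weight depends on the test point."""
    w = local_weights(x_idx, model_preds, cluster_ids)
    votes = {}
    for wi, preds in zip(w, model_preds):
        label = preds[x_idx]
        votes[label] = votes.get(label, 0.0) + wi
    return max(votes, key=votes.get)
```

With two models, an unweighted vote would tie whenever they disagree; here the model whose predictions agree with the local cluster structure around the test point wins.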