• Publications
Efficiently mining long patterns from databases
We present a pattern-mining algorithm that scales roughly linearly in the number of maximal patterns embedded in a database irrespective of the length of the longest pattern. In comparison, previous…
  • 1,512
  • 122
Scaling up all pairs similarity search
Given a large collection of sparse vector data in a high dimensional space, we investigate the problem of finding all pairs of vectors whose similarity score (as determined by a function such as…
  • 684
  • 112
Data privacy through optimal k-anonymization
  • R. Bayardo, R. Agrawal
  • Computer Science
  • 21st International Conference on Data Engineering…
  • 5 April 2005
Data de-identification reconciles the demand for release of data for research purposes and the demand for privacy from individuals. This paper proposes and evaluates an optimization algorithm for the…
  • 1,199
  • 98
Using CSP Look-Back Techniques to Solve Real-World SAT Instances
We report on the performance of an enhanced version of the "Davis-Putnam" (DP) proof procedure for propositional satisfiability (SAT) on large instances derived from real-world problems in planning,…
  • 681
  • 59
Mining the most interesting rules
Several algorithms have been proposed for finding the “best,” “optimal,” or “most interesting” rule(s) in a database according to a variety of metrics including confidence, support, gain, chi-squared…
  • 694
  • 35
Constraint-based rule mining in large, dense databases
Constraint-based rule miners find all rules in a given dataset meeting user-specified constraints such as minimum support and confidence. We describe a new algorithm that directly exploits all…
  • 334
  • 30
InfoSleuth: agent-based semantic integration of information in open and dynamic environments
The goal of the InfoSleuth project at MCC is to exploit and synthesize new technologies into a unified system that retrieves and processes information in an ever-changing network of information…
  • 481
  • 21
Counting Models Using Connected Components
Recent work by Birnbaum & Lozinskii [1999] demonstrated that a clever yet simple extension of the well-known Davis-Putnam procedure for solving instances of propositional satisfiability yields an…
  • 184
  • 19
A Complexity Analysis of Space-Bounded Learning Algorithms for the Constraint Satisfaction Problem
Learning during backtrack search is a space-intensive process that records information (such as additional constraints) in order to avoid redundant work. In this paper, we analyze the effects of…
  • 133
  • 15
PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce
Classification and regression tree learning on massive datasets is a common data mining task at Google, yet many state of the art tree learning algorithms require training data to reside in memory on…
  • 278
  • 14