• Publications
Fast and memory efficient mining of frequent closed itemsets
TLDR
This paper presents a new scalable algorithm for discovering closed frequent itemsets, a lossless and condensed representation of all the frequent itemsets that can be mined from a transactional database.
  • 230
  • 28
Identifying task-based sessions in search engine query logs
TLDR
The research challenge addressed in this paper is to devise effective techniques for identifying task-based sessions, i.e., sets of possibly non-contiguous queries issued by the user of a Web Search Engine to carry out a given task.
  • 134
  • 24
CoPhIR: a Test Collection for Content-Based Image Retrieval
TLDR
The scalability, as well as the effectiveness, of the different Content-based Image Retrieval (CBIR) approaches proposed in the literature is today an important research issue.
  • 166
  • 17
Direct local pattern sampling by efficient two-step random procedures
TLDR
We present several exact and highly scalable local pattern sampling algorithms.
  • 68
  • 17
Mining Top-K Patterns from Binary Datasets in Presence of Noise
TLDR
We propose a greedy algorithm for the discovery of Patterns in Noisy Datasets, named PaNDa, and show that it outperforms related techniques on both synthetic and real-world data.
  • 57
  • 14
Document Similarity Self-Join with MapReduce
TLDR
We present SSJ-2R, a MapReduce-based algorithm for the Sim-SJ problem that is 4.5x faster than the state of the art.
  • 75
  • 12
Extending the state-of-the-art of constraint-based pattern discovery
TLDR
In recent years, in the context of the constraint-based pattern discovery paradigm, properties of constraints have been studied comprehensively, and on the basis of these properties, efficient constraint-pushing techniques have been defined.
  • 88
  • 11
A Unifying Framework for Mining Approximate Top-$k$ Binary Patterns
TLDR
We review several state-of-the-art algorithms for approximate top-k pattern mining from binary data, and discuss PANDA+, an algorithmic framework able to optimize different cost functions generalized into a unifying formulation.
  • 45
  • 10
Learning relatedness measures for entity linking
TLDR
We formalize the problem of learning entity relatedness as a learning-to-rank problem.
  • 79
  • 9
On closed constrained frequent pattern mining
  • F. Bonchi, C. Lucchese
  • Computer Science
  • Fourth IEEE International Conference on Data…
  • 1 November 2004
TLDR
Constrained frequent patterns and closed frequent patterns are two paradigms aimed at reducing the set of extracted patterns to a smaller, more interesting subset.
  • 103
  • 8