Information-theoretic software clustering
- P. Andritsos, Vassilios Tzerpos
- Computer Science · IEEE Transactions on Software Engineering
- 1 February 2005
This work introduces LIMBO, a scalable hierarchical clustering algorithm that clusters a software system by minimizing information loss, and presents a method for assessing the usefulness of any nonstructural attribute in a software clustering context.
LIMBO: Scalable Clustering of Categorical Data
This work introduces LIMBO, a scalable hierarchical categorical clustering algorithm that builds on the Information Bottleneck (IB) framework for quantifying the relevant information preserved when clustering, and shows how the LIMBO algorithm can be used to cluster both tuples and values.
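The information loss that the Information Bottleneck framework quantifies can be illustrated with the standard agglomerative IB merge cost. Merging clusters c_i and c_j loses δI = (p(c_i) + p(c_j)) · D_JS[p(A|c_i), p(A|c_j)] bits, where D_JS is the Jensen-Shannon divergence weighted by p(c_i) and p(c_j). The following is a minimal sketch under stated assumptions, not the authors' implementation. Each tuple is treated as a uniform distribution over its (attribute, value) pairs and given weight 1/n. The names `merge_cost`, `tuple_distribution` and `cheapest_merge` are hypothetical. The summarization step that lets LIMBO scale to large data is omitted.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge_cost(w1, dist1, w2, dist2):
    """Information loss of merging two clusters (agglomerative IB).

    loss = (w1 + w2) * JS_pi(dist1, dist2), with pi = (w1, w2) / (w1 + w2).
    Returns (loss, merged_weight, merged_distribution).
    """
    w = w1 + w2
    pi1, pi2 = w1 / w, w2 / w
    merged = [pi1 * a + pi2 * b for a, b in zip(dist1, dist2)]
    js = pi1 * kl(dist1, merged) + pi2 * kl(dist2, merged)
    return w * js, w, merged


def tuple_distribution(tup, vocab):
    """Hypothetical encoding: a tuple is uniform over its (attribute index, value) pairs."""
    items = set(enumerate(tup))
    return [1 / len(tup) if v in items else 0.0 for v in vocab]


def cheapest_merge(tuples):
    """Return (loss, i, j) for the pair of tuples whose merge loses the least information."""
    vocab = sorted({(i, v) for t in tuples for i, v in enumerate(t)})
    n = len(tuples)
    dists = [tuple_distribution(t, vocab) for t in tuples]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            loss, _, _ = merge_cost(1 / n, dists[i], 1 / n, dists[j])
            if best is None or loss < best[0]:
                best = (loss, i, j)
    return best


# Identical tuples merge at zero cost, so they are the first pair to be merged.
print(cheapest_merge([("a", "x"), ("a", "x"), ("b", "y")]))
```

Qualifying each value by its attribute index keeps equal strings in different columns distinct. A full agglomerative pass would repeatedly merge the cheapest pair and replace it with the merged weight and distribution returned by `merge_cost`.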
Clean Answers over Dirty Databases: A Probabilistic Approach
- P. Andritsos, A. Fuxman, Renée J. Miller
- Computer Science · 22nd International Conference on Data Engineering…
- 3 April 2006
This work rewrites queries over a database containing duplicates so that each answer is returned with the probability that it appears in the clean database, and experimentally studies the performance of the rewritten queries.
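The idea of attaching a probability to each answer can be sketched for a single selection query. This sketch rests on stated assumptions that are not taken from the listing. Duplicates are grouped into clusters whose tuple probabilities sum to 1. Exactly one tuple per cluster is clean. Clusters are independent, so a clean world's probability is the product of the chosen tuples' probabilities. Under those assumptions, the probability that a cluster is returned equals the sum of the probabilities of its tuples that satisfy the predicate, which is what a `SUM(prob) ... GROUP BY cluster_id` style rewrite computes. The relation, data and function names below are hypothetical. A brute-force enumeration of clean worlds is included as a cross-check.

```python
from collections import defaultdict
from itertools import product

# Hypothetical dirty relation: (cluster_id, tuple_values, probability).
# Assumption: probabilities within a cluster sum to 1 and one tuple per cluster is clean.
dirty_customers = [
    ("c1", {"name": "Ann", "city": "Toronto"}, 0.7),
    ("c1", {"name": "Anne", "city": "Ottawa"}, 0.3),
    ("c2", {"name": "Bob", "city": "Toronto"}, 0.5),
    ("c2", {"name": "Rob", "city": "Toronto"}, 0.5),
]


def clean_answer_probs(rows, predicate):
    """Selection query over the clean database: for each cluster, sum the
    probabilities of its tuples that satisfy the predicate (SUM ... GROUP BY)."""
    probs = defaultdict(float)
    for cid, values, p in rows:
        if predicate(values):
            probs[cid] += p
    return dict(probs)


def brute_force_probs(rows, predicate):
    """Cross-check: enumerate every clean world (one tuple per cluster),
    assuming clusters are independent."""
    clusters = defaultdict(list)
    for cid, values, p in rows:
        clusters[cid].append((values, p))
    ids = sorted(clusters)
    probs = defaultdict(float)
    for choice in product(*(clusters[c] for c in ids)):
        world_p = 1.0
        for _, p in choice:
            world_p *= p
        for cid, (values, _) in zip(ids, choice):
            if predicate(values):
                probs[cid] += world_p
    return dict(probs)


def in_toronto(t):
    return t["city"] == "Toronto"


print(clean_answer_probs(dirty_customers, in_toronto))  # {'c1': 0.7, 'c2': 1.0}
```

The rewrite needs one pass over the dirty relation, while the enumeration grows exponentially with the number of clusters. That difference is why answering queries by rewriting, rather than by materializing possible worlds, matters. Joins and other query shapes need more care than this single-relation case.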
Software clustering based on information loss minimization
- P. Andritsos, Vassilios Tzerpos
- Computer Science · 10th Working Conference on Reverse Engineering…
- 13 November 2003
LIMBO is a scalable hierarchical clustering algorithm that clusters a software system by minimizing information loss, and it can be used to evaluate how useful various types of non-structural information are to the software clustering process.
Overview and semantic issues of text mining
This survey discusses semantic issues arising from natural-language particularities, syntactic matters, and tokenization, and focuses on the different text representation techniques, categorisation tasks, and similarity measures that have been proposed.
A Process Mining Based Model for Customer Journey Mapping
The proposed customer journey mapping (CJM) model brings data scientists and customer journey planners closer together, a first step toward a better understanding of customer behavior, and highlights the prospective value of process mining for CJM analysis.
Limbo: A scalable algorithm to cluster categorical data
This work introduces LIMBO, a scalable hierarchical categorical clustering algorithm that builds on the Information Bottleneck (IB) framework for quantifying the relevant information preserved when clustering. It uses the IB framework to define a distance measure for categorical tuples and presents a novel distance measure for categorical attribute values.
Scalable clustering of categorical data and applications
- P. Andritsos
- Computer Science
This thesis introduces LIMBO, a scalable hierarchical categorical clustering algorithm based on the Information Bottleneck (IB) framework for quantifying the relevant information preserved when clustering, and proposes a set of tools based on LIMBO for finding structural summaries that are useful in characterizing the information content of the data.
Efficient itinerary planning with category constraints
- P. Bolzoni, S. Helmer, Kevin Wellenzohn, J. Gamper, P. Andritsos
- Computer Science · SIGSPATIAL/GIS
- 4 November 2014
This work develops a group of efficient clustering-based algorithms with guaranteed theoretical bounds and shows that in practice their results are better than the theoretical guarantees and very close to the optimal solution.
Making Open Data Transparent: Data Discovery on Open Data
- Renée J. Miller, F. Nargesian, Erkang Zhu, Christina Christodoulakis, K. Pu, P. Andritsos
- Computer Science · IEEE Data Eng. Bull.
Open Data poses interesting new challenges for data integration research. One of them is data discovery: how can one find new data sets within this ever-expanding sea of Open Data?