Towards certain fixes with editing rules and master data

@article{Fan2010TowardsCF,
  title={Towards certain fixes with editing rules and master data},
  author={Wenfei Fan and Jianzhong Li and Shuai Ma and Nan Tang and Wenyuan Yu},
  journal={Proceedings of the VLDB Endowment},
  year={2010},
  volume={3},
  pages={173--184}
}
A variety of integrity constraints have been studied for data cleaning. While these constraints can detect the presence of errors, they fall short of guiding us to correct the errors. Indeed, data repairing based on these constraints may not find certain fixes that are absolutely correct, and worse, may introduce new errors when repairing the data. We propose a method for finding certain fixes, based on master data, a notion of certain regions, and a class of editing rules. A certain region is… 
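The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the names `EditingRule` and `apply_rule` and the sample data are hypothetical, and the sketch assumes a simplified reading in which a rule copies a value from master data only when the attributes it matches on have already been assured correct (for example, by a user) and the master data yields exactly one candidate value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditingRule:
    """Hypothetical, simplified editing rule: if record[match_input] equals
    master[match_master], copy master[target_master] into record[target_input]."""
    match_input: tuple
    match_master: tuple
    target_input: str
    target_master: str


def apply_rule(rule, record, master, assured):
    """Apply one rule; return the (possibly fixed) record and assured attributes.

    The rule fires only when every attribute it matches on is already assured
    correct and the master data gives exactly one candidate value.
    """
    if not set(rule.match_input) <= assured:
        return record, assured  # matched attributes are not trusted yet
    key = tuple(record[a] for a in rule.match_input)
    candidates = {
        m[rule.target_master]
        for m in master
        if tuple(m[a] for a in rule.match_master) == key
    }
    if len(candidates) != 1:
        return record, assured  # no match or ambiguous master data: do nothing
    fixed = dict(record)
    fixed[rule.target_input] = next(iter(candidates))
    return fixed, assured | {rule.target_input}


# Illustrative data only (assumed, not taken from the paper).
master = [
    {"zip": "EH8 9AB", "city": "Edinburgh"},
    {"zip": "NW1 6XE", "city": "London"},
]
record = {"zip": "EH8 9AB", "city": "Ldn"}
rule = EditingRule(("zip",), ("zip",), "city", "city")

fixed, assured = apply_rule(rule, record, master, frozenset({"zip"}))
```

In this sketch, the rule changes `city` only because `zip` was assured correct beforehand; with an empty assured set the record is returned unchanged, which mirrors the abstract's concern about repairs that introduce new errors.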

Towards certain fixes with editing rules and master data
TLDR
A framework and an algorithm are presented for finding certain fixes, based on master data, a notion of certain regions, and a class of editing rules, and interacting with users to ensure that one of the certain regions is correct.
Mapping and Cleaning: the LLUNATIC Way
TLDR
A new semantics for these scenarios that represents a conservative extension of previous semantics for schema mappings and data repairing is developed, and a chase-based algorithm to compute solutions is introduced.
Automatic weighted matching rectifying rule discovery for data repairing
TLDR
A novel algorithm to discover effective weighted matching rectifying rules (WMRRs) automatically from the dirty data at hand and to perform dependable, fully automatic repairing based on the discovered WMRRs, with higher accuracy than the existing dependable methods.
Sampling from repairs of conditional functional dependency violations
TLDR
This paper proposes a novel data cleaning approach that is not limited to finding a single repair, namely sampling from the space of possible repairs, and presents an algorithm that randomly samples from this space in an efficient way.
Cleaning data with Llunatic
TLDR
This paper develops a general chase-based repairing framework, referred to as Llunatic, in which repairs can be obtained for a large class of constraints by using different strategies to select preferred values, and shows that various instantiations of the framework result in repairs of good quality.
Finding Interesting Cleaning Rules from Dirty Data
TLDR
This paper develops a system that finds interesting rules from dirty data with an improved TANE algorithm, and designs two kinds of ranking methods: support-based ranking and comprehensive score ordering.
AutoRepair: an automatic repairing approach over multi-source data
TLDR
This paper proposes AutoRepair, a novel automatic multi-source data repairing approach that enriches the evidence by taking advantage of both truth discovery and data repairing, and which outperforms both recent truth discovery and rule-based data repairing methods.
Discovery of Paradigm Dependencies
TLDR
A framework is proposed in which strings with similar coding rules and different lengths are clustered together and aligned vertically, from which PDs can be discovered directly; the alignment problem is the key component of this framework and is proved to be NP-complete.

References

Improving Data Quality: Consistency and Accuracy
TLDR
This paper proposes two algorithms: one for automatically computing a repair D' that satisfies a given set of CFDs, and the other for incrementally finding a repair in response to updates to a clean database.
Discovering data quality rules
TLDR
This work proposes a new data-driven tool that can be used within an organization's data quality management process to suggest possible rules, and to identify conformant and non-conformant records.
A cost-based model and effective heuristic for repairing constraints by value modification
TLDR
It is proved that finding minimal-cost repairs in this model is NP-complete in the size of the database, and an approach to heuristic repair-construction based on equivalence classes of attribute values is introduced.
Database repairing using updates
TLDR
This work proposes a theoretical framework that also covers updates as a repair primitive, and presents the construct of a nucleus: a single database that yields consistent answers to a class of queries, without the need for query rewriting.
On approximating optimum repairs for functional dependency violations
TLDR
An approximation algorithm is presented that for a fixed set of functional dependencies and an arbitrary input inconsistent database, produces a repair whose distance to the database is within a constant factor of the optimum repair distance.
Conditional functional dependencies for capturing data inconsistencies
TLDR
This work proposes a class of integrity constraints for relational databases, referred to as conditional functional dependencies (CFDs), and provides an inference system analogous to Armstrong's axioms for FDs, and shows that the implication problem is coNP-complete for CFDs in contrast to the linear-time complexity for their traditional counterpart.
Reasoning about Record Matching Rules
TLDR
A class of matching dependencies (MDs) for specifying the semantics of data in unreliable relations is introduced, defined in terms of similarity metrics and a dynamic semantics, and a mechanism for inferring MDs is proposed, a departure from traditional implication analysis.
Dynamic constraints for record matching
TLDR
It is experimentally verified that the algorithms help matching tools efficiently identify keys at compile time for matching, blocking, or windowing, and that the MD-based techniques effectively improve the quality and efficiency of various record matching methods.
Potter's Wheel: An Interactive Data Cleaning System
TLDR
Potter’s Wheel is presented, an interactive data cleaning system that tightly integrates transformation and discrepancy detection, and users can gradually build a transformation as discrepancies are found, and clean the data without writing complex programs or enduring long delays.
Minimal-change integrity maintenance using tuple deletions