Data clone detection and visualization in spreadsheets

@inproceedings{Hermans2013DataCD,
  title={Data clone detection and visualization in spreadsheets},
  author={Felienne F. J. Hermans and B. M. W. Sedee and Martin Pinzger and Arie van Deursen},
  booktitle={2013 35th International Conference on Software Engineering (ICSE)},
  year={2013},
  pages={292--301}
}
  • F. Hermans, B. Sedee, M. Pinzger, A. van Deursen
  • Published 18 May 2013
  • Computer Science
  • 2013 35th International Conference on Software Engineering (ICSE)
Spreadsheets are widely used in industry: it is estimated that end-user programmers outnumber programmers by a factor of 5. However, spreadsheets are error-prone, and numerous companies have lost money because of spreadsheet errors. One cause of spreadsheet problems is the prevalence of copy-pasting. In this paper, we study this cloning in spreadsheets. Based on existing text-based clone detection algorithms, we have developed an algorithm to detect data clones in spreadsheets: formulas whose…
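The idea of a data clone (a formula result that reappears elsewhere as a pasted constant) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `find_candidate_pairs` and the dictionary-based cell representation are assumptions made for illustration, and the matching is a plain exact-value lookup.

```python
from collections import defaultdict


def find_candidate_pairs(cells):
    """Return (constant_pos, formula_pos) pairs whose values are equal.

    `cells` is a hypothetical representation: a dict mapping (row, col)
    to (kind, value), where kind is "formula" or "constant" and value is
    the cell's evaluated value.
    """
    # Index every formula cell by the value it evaluates to.
    formula_cells = defaultdict(list)
    for pos, (kind, value) in cells.items():
        if kind == "formula":
            formula_cells[value].append(pos)

    # A constant whose value equals a formula result is a clone candidate.
    pairs = []
    for pos, (kind, value) in cells.items():
        if kind == "constant":
            for source in formula_cells.get(value, []):
                pairs.append((pos, source))
    return sorted(pairs)


if __name__ == "__main__":
    sheet = {
        (0, 0): ("formula", 42.5),
        (1, 0): ("formula", 17.0),
        (0, 3): ("constant", 42.5),  # pasted value of the formula at (0, 0)
        (1, 3): ("constant", 17.0),  # pasted value of the formula at (1, 0)
        (2, 3): ("constant", 99.0),  # unrelated constant
    }
    for constant_pos, formula_pos in find_candidate_pairs(sheet):
        print(constant_pos, "matches formula at", formula_pos)
```

A single matching value can easily be a coincidence, so a practical detector would likely also require several neighbouring cells to match before reporting a clone; that step is omitted here.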

Copy-Paste Detection in Spreadsheets
TLDR
An algorithm is presented to detect data clones within spreadsheets, that is, formulas whose values are copied to a different location; it is shown that this algorithm detects these data clones with precision rates similar to those achieved by state-of-the-art code clone detection algorithms.
Detecting table clones and smells in spreadsheets
TLDR
Inspired by existing fingerprint-based code clone detection techniques, a detection algorithm was developed to detect table clones and related smells due to inconsistency among them in spreadsheets, and was applied to real-world spreadsheets from the EUSES corpus.
On the empirical evaluation of similarity coefficients for spreadsheets fault localization
TLDR
This paper studies the impact of different similarity coefficients on the accuracy of spectrum-based fault localization applied to the spreadsheet domain and shows that three of the 42 studied coefficients require less effort by the user while inspecting the diagnostic report, and can be used interchangeably without a loss of accuracy.
How effectively can spreadsheet anomalies be detected: An empirical study
WARDER: Towards Effective Spreadsheet Defect Detection by Validity-based Cell Cluster Refinements
TLDR
WARDER is presented to discuss and improve one state-of-the-art technique, CUSTODES, which exploits spreadsheet cell clustering for defect detection, extending its scope and making its detection patterns adaptive to varying spreadsheet styles.
Why Does my Spreadsheet Compute Wrong Values?
  • Birgit Hofer, F. Wotawa
  • Computer Science
    2014 IEEE 25th International Symposium on Software Reliability Engineering
  • 2014
TLDR
This paper introduces a novel dependency-based approach for model-based fault localization in spreadsheets that improves diagnostic accuracy while keeping computation times short, thus making the automated fault localization more appropriate for practical applications.
Analyzing and Visualizing Spreadsheets
TLDR
This dissertation aims at developing methods to support spreadsheet users in understanding, updating and improving spreadsheets, and finds that methods from software engineering can be applied to spreadsheets very well and that these methods support end-users in working with spreadsheets.
Using constraints to diagnose faulty spreadsheets
TLDR
This work presents a constraint-based approach, ConBug, for debugging spreadsheets, which helps end users to pinpoint faulty cells in a spreadsheet and demonstrates that the approach is light-weight and efficient.
Smelling Faults in Spreadsheets
TLDR
A technique to automatically pinpoint potential faults in spreadsheets is proposed, which combines a catalog of spreadsheet smells that provide a first indication of a potential fault, with a generic spectrum-based fault localization strategy in order to improve on these initial results.

References

Exact and Near-miss Clone Detection in Spreadsheets
  • F. Hermans
  • Computer Science
    Tiny Trans. Comput. Sci.
  • 2012
TLDR
Clone detection in spreadsheets is useful both to reveal opportunities for improving the spreadsheet and to detect actual errors, and this work shows that this is a promising avenue.
Detecting code smells in spreadsheet formulas
TLDR
A list of metrics by which to detect smelly formulas and a visualization technique to highlight these formulas in spreadsheets are presented and indicate that formula smells are common and that they can reveal real errors and weaknesses in spreadsheet formulas.
Detection and analysis of near-miss software clones
  • C. Roy
  • Computer Science
    2009 IEEE International Conference on Software Maintenance
  • 2009
TLDR
A hybrid clone detection method is developed, and a mutation-based framework is developed that automatically and efficiently measures (and compares) the recall and precision of clone detection tools.
Detecting and visualizing inter-worksheet smells in spreadsheets
TLDR
The results of the evaluation indicate that smells can indeed reveal weaknesses in a spreadsheet's design, and that data flow diagrams are an appropriate way to show those weaknesses.
Using Slicing to Identify Duplication in Source Code
TLDR
The design and initial implementation of a tool that finds clones and displays them to the programmer is described; the tool uses program dependence graphs (PDGs) and program slicing to find isomorphic PDG subgraphs that represent clones.
Tracking the Evolution of Code Clones
TLDR
An approach is presented for mapping code duplications from one version of the software to another, based on a similarity distance function, and the term "clone smells" is introduced, which gives a clue about why the reported code fragments might be dangerous.
Supporting professional spreadsheet users by generating leveled dataflow diagrams
TLDR
This paper first studies the problems and information needs of professional spreadsheet users by means of a survey conducted at a large financial company, and then presents an approach that extracts this information from spreadsheets and presents it in a compact and easy-to-understand way, with leveled dataflow diagrams.
Automatically Extracting Class Diagrams from Spreadsheets
TLDR
This work creates a library of common spreadsheet usage patterns that are localized in the spreadsheet using a two-dimensional parsing algorithm and transformed and enriched with information from the library to automatically extract information and transform it into class diagrams.
Near-miss function clones in open source software: an empirical study
  • C. Roy, J. Cordy
  • Computer Science
    J. Softw. Maintenance Res. Pract.
  • 2010
TLDR
This paper examines more than twenty open source C, Java and C# systems, including the entire Linux Kernel, Apache httpd, J2SDK-Swing and db4o, and compares their use of cloned code in several different dimensions, including language, clone size, clone similarity, clone location and clone density.