# A file comparison program

@article{Miller1985AFC, title={A file comparison program}, author={Webb Miller and Eugene Wimberly Myers}, journal={Software: Practice and Experience}, year={1985}, volume={15} }

This paper presents a simple method for computing a shortest sequence of insertion and deletion commands that converts one given file to another. The method is particularly efficient when the difference between the two files is small compared to the files' lengths. In experiments performed on typical files, the program often ran four times faster than the UNIX diff command.
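The idea of a shortest insert/delete script can be sketched as follows. This is a minimal illustration, assuming a greedy "furthest-reaching diagonal" search; it is not claimed to be the paper's exact procedure. The function name `shortest_edit_script` and the `keep`/`insert`/`delete` tags are hypothetical choices made for this example.

```python
def shortest_edit_script(a, b):
    """Return a list of ("keep" | "delete" | "insert", item) steps turning a into b.

    The number of "delete" and "insert" steps is minimal. Work grows with the
    size of the difference, so similar inputs are handled quickly.
    """
    n, m = len(a), len(b)
    v = {1: 0}      # v[k] = furthest x reached on diagonal k = x - y
    trace = []      # snapshot of v taken before each round d
    for d in range(n + m + 1):
        trace.append(dict(v))
        done = False
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]          # step down: take an item from b (insert)
            else:
                x = v[k - 1] + 1      # step right: drop an item of a (delete)
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1   # follow matching items for free
            v[k] = x
            if x >= n and y >= m:
                done = True
                break
        if done:
            break

    # Walk the snapshots backwards to recover the actual steps.
    ops = []
    x, y = n, m
    for d in range(len(trace) - 1, -1, -1):
        v = trace[d]
        k = x - y
        if k == -d or (k != d and v[k - 1] < v[k + 1]):
            prev_k = k + 1
        else:
            prev_k = k - 1
        prev_x = v[prev_k]
        prev_y = prev_x - prev_k
        while x > prev_x and y > prev_y:
            x, y = x - 1, y - 1
            ops.append(("keep", a[x]))
        if d > 0:
            if x == prev_x:
                ops.append(("insert", b[prev_y]))
            else:
                ops.append(("delete", a[prev_x]))
        x, y = prev_x, prev_y
    ops.reverse()
    return ops
```

For example, `shortest_edit_script(old_lines, new_lines)` on two lists of lines yields the steps in order. Filtering out the `keep` entries leaves only the insertion and deletion commands.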

## 212 Citations

### Identifying syntactic differences between two programs

- Computer Science
- Softw. Pract. Exp.
- 1991

A comparison algorithm is developed that can point out the differences between two programs more accurately than previous text comparison tools and is based on a dynamic programming scheme.

### An approximation to the greedy algorithm for differential compression of very large files

- Computer Science
- Data Compression Conference, 2004. Proceedings. DCC 2004
- 2004

A new differential compression algorithm is presented that combines the hash value and suffix array techniques. It relies on three new data structures (the block hash table, the quick index array, and the pointer array), which improve the run time of the algorithm and allow it to compress very large files.

### Dynamic edit distance table under a general weighted cost function

- Computer Science
- J. Discrete Algorithms
- 2015

### An analysis on computation of longest common subsequence algorithm

- Computer Science
- 2017 International Conference on Intelligent Sustainable Systems (ICISS)
- 2017

This paper compares various algorithms that work on two or more strings and puts forward proposals for developing new algorithms that handle more strings.

### A Semantic Difference Algorithm for Structured Visual Dataflow Programs

- Computer Science
- 2011

This paper presents an algorithm for semantic comparison of programs in controlled visual dataflow languages, that is, languages in which dataflow diagrams are embedded in control structures. The algorithm performs a depth-first search of call structures to determine whether two programs are semantically equivalent and, if they are not, discovers the differences.

### Implementation of Java Program Similarity Measurement Tool Using Token Structure and Execution Control Structure

- Computer Science
- 2003

This paper proposes a similarity measurement method for Java programs. It uses software metrics calculated from the token structure and execution control structure of the target source program, and compares the resulting metric values without resorting to expensive string comparison.

### Syntactic Software Merging

- Computer Science
- SCM
- 1995

The fundamentals of merging are described, the known methods of software merging are surveyed, including a method based on programming-language syntax, and a set of tools that perform syntactic merging is discussed.

### Measuring the accuracy of page-reading systems

- Computer Science, Mathematics
- 1996

It is shown that the universe of cost functions is divided into equivalence classes, and the cost functions related to the longest common subsequence (LCS) are identified.

### New Refinement Techniques for Longest Common Subsequence Algorithms

- Computer Science
- SPIRE
- 2003

It has turned out to be difficult to develop an lcs algorithm which would be superior for all problem instances, and implementing the most evolved lcs algorithms presented recently is laborious.

### Semantic comparison of structured visual dataflow programs

- Computer Science
- VINCI '10
- 2010

This algorithm performs a depth-first search of call structures, comparing embedded diagrams using subgraph isomorphism, to determine whether two programs are semantically equivalent and, if they are not, to discover the differences.

## References

### Optimal Code Generation for Expression Trees

- Computer Science
- J. ACM
- 1976

A dynamic programming algorithm is presented which produces optimal code for any machine in this class of machines and runs in time linearly proportional to the size of the input.

### The string-to-string correction problem with block moves

- Computer Science
- TOCS
- 1984

An algorithm that produces the shortest edit sequence transforming one string into another is presented and is optimal in the sense that it generates a minimal covering set of common substrings of one string with respect to another.

### A linear space algorithm for computing maximal common subsequences

- Mathematics
- Commun. ACM
- 1975

The problem of finding a longest common subsequence of two strings has been solved in quadratic time and space. An algorithm is presented which will solve this problem in quadratic time and in linear…

### The String-to-String Correction Problem

- Mathematics, Education
- JACM
- 1974

An algorithm is presented which solves the string-to-string correction problem in time proportional to the product of the lengths of the two strings.
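A time bound proportional to the product of the two lengths is what a full dynamic programming table gives. The sketch below is a minimal illustration, assuming unit costs for insertion, deletion, and change; the function name `edit_distance` is a hypothetical choice, and the cited paper allows more general costs.

```python
def edit_distance(s, t):
    """Minimum number of unit-cost insertions, deletions, and changes turning s into t."""
    rows, cols = len(s) + 1, len(t) + 1
    # dist[i][j] = cost of turning the first i items of s into the first j items of t
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i            # delete all i items
    for j in range(cols):
        dist[0][j] = j            # insert all j items
    for i in range(1, rows):
        for j in range(1, cols):
            change = 0 if s[i - 1] == t[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,            # delete s[i-1]
                dist[i][j - 1] + 1,            # insert t[j-1]
                dist[i - 1][j - 1] + change,   # keep or change
            )
    return dist[-1][-1]
```

Each of the `len(s) * len(t)` inner cells is filled once in constant time, which is where the product bound comes from.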

### Rcs — a system for version control

- Computer Science
- Softw. Pract. Exp.
- 1985

Basic version control concepts are introduced and the practice of version control using RCS is discussed, and usage statistics show that RCS's delta method is space and time efficient.

### Approximate String Matching

- Computer Science
- CSUR
- 1980

Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword. The…

### A redisplay algorithm

- Computer Science
- 1981

The algorithm is interesting because it applies results from the theoretical string-to-string correction problem (a generalization of the problem of finding a longest common subsequence) to a problem that is usually approached with crude ad-hoc techniques.

### A fast algorithm for computing longest common subsequences

- Computer Science
- CACM
- 1977

An algorithm for finding the longest common subsequence of two sequences of length n which has a running time of O((r + n) log n), where r is the total number of ordered pairs of positions at which the two sequences match.
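A bound that depends on the number of matching position pairs r can be illustrated with a match-list approach. The sketch below is written in the spirit of that idea and is not claimed to reproduce the cited algorithm exactly. It visits only matching pairs and maintains a sorted threshold list with binary search. The function name `lcs_length` is a hypothetical choice, and only the length is returned, not the subsequence itself.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_length(a, b):
    """Length of a longest common subsequence of a and b."""
    # For each symbol, the positions where it occurs in b (in increasing order).
    positions = defaultdict(list)
    for j, sym in enumerate(b):
        positions[sym].append(j)

    # thresh[i] = smallest end position in b of a common subsequence of length i + 1
    thresh = []
    for sym in a:
        # Visit matches in decreasing order of j so that one item of a
        # is never matched against two items of b.
        for j in reversed(positions.get(sym, ())):
            i = bisect_left(thresh, j)
            if i == len(thresh):
                thresh.append(j)
            else:
                thresh[i] = j
    return len(thresh)
```

Each matching pair costs one binary search, so the work is driven by how many matches exist rather than by the full product of the lengths.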

### The source code control system

- Computer Science
- IEEE Transactions on Software Engineering
- 1975

The SCCS approach to source code control is discussed, including how it is used and how the system is implemented.

### Bounds on the Complexity of the Longest Common Subsequence Problem

- Computer Science
- J. ACM
- 1976

It is shown that unless a bound on the total number of distinct symbols is assumed, every solution to the problem can consume an amount of time that is proportional to the product of the lengths of the two strings.