# A file comparison program

@article{Miller1985AFC, title={A file comparison program}, author={Webb Miller and Eugene W. Myers}, journal={Software: Practice and Experience}, year={1985}, volume={15} }

This paper presents a simple method for computing a shortest sequence of insertion and deletion commands that converts one given file to another. The method is particularly efficient when the difference between the two files is small compared to the files' lengths. In experiments performed on typical files, the program often ran four times faster than the UNIX diff command.
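
The problem stated in the abstract, finding a shortest script of insertions and deletions that turns one file into another, can be illustrated with a short sketch. The code below is not the paper's program: it is a minimal greedy, diagonal-by-diagonal search whose work grows with the size of the difference rather than with the product of the file lengths, in the spirit of the abstract's efficiency claim. The function names `shortest_edit_script` and `_backtrack`, and the tuple format of the commands, are assumptions made for illustration.

```python
def shortest_edit_script(a, b):
    """Return a shortest list of commands turning sequence a into sequence b.

    Commands are ("delete", i, a[i]) and ("insert", j, b[j]), where i indexes
    the old file and j the new file. Illustrative sketch only.
    """
    n, m = len(a), len(b)
    # v[k] holds the furthest x reached so far on diagonal k, where k = x - y.
    v = {1: 0}
    trace = []  # snapshot of v at the start of each round, used for backtracking
    for d in range(n + m + 1):
        trace.append(dict(v))
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # step down: insert a line from b
            else:
                x = v[k - 1] + 1    # step right: delete a line from a
            y = x - k
            # Follow matching lines for free (no edit needed).
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return _backtrack(trace, a, b)
    return []  # not reached: d = n + m always suffices


def _backtrack(trace, a, b):
    """Walk the saved rounds backwards to recover the edit commands."""
    x, y = len(a), len(b)
    script = []
    for d in range(len(trace) - 1, 0, -1):
        v = trace[d]
        k = x - y
        if k == -d or (k != d and v[k - 1] < v[k + 1]):
            prev_k = k + 1
        else:
            prev_k = k - 1
        prev_x = v[prev_k]
        prev_y = prev_x - prev_k
        if prev_k == k + 1:
            script.append(("insert", prev_y, b[prev_y]))
        else:
            script.append(("delete", prev_x, a[prev_x]))
        x, y = prev_x, prev_y
    script.reverse()
    return script


old = ["a", "b", "c"]
new = ["a", "c", "d"]
for command in shortest_edit_script(old, new):
    print(command)
```

When the two inputs are nearly identical, the outer loop stops after only a few rounds, which is why this style of search suits files whose difference is small compared to their lengths.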

#### 204 Citations

Identifying syntactic differences between two programs

- Computer Science
- Softw. Pract. Exp.
- 1991

A comparison algorithm based on a dynamic programming scheme is developed that can point out the differences between two programs more accurately than previous text comparison tools.

An approximation to the greedy algorithm for differential compression of very large files

- Computer Science
- Data Compression Conference, 2004. Proceedings. DCC 2004
- 2004

A new differential compression algorithm that combines the hash value and suffix array techniques. It depends on three new data structures, the block hash table, the quick index array, and the pointer array, which improve the run time of the algorithm and allow it to compress very large files.

Measuring Similarity of Large Software Systems Based on Source Code Correspondence

- Computer Science
- PROFES
- 2005

A similarity metric between two sets of source code files based on the correspondence of overall source code lines is proposed, which reveals the evolutionary history characteristics of the BSD UNIX Operating System.

An analysis on computation of longest common subsequence algorithm

- Computer Science
- 2017 International Conference on Intelligent Sustainable Systems (ICISS)
- 2017

This paper compares various algorithms that work on two or more strings and puts forward new proposals for developing algorithms that handle more strings.

A Semantic Difference Algorithm for Structured Visual Dataflow Programs

- Computer Science
- 2011

This paper presents an algorithm for semantic comparison of programs in controlled visual dataflow languages, that is, languages in which dataflow diagrams are embedded in control structures. The algorithm performs depth-first search of call structures to determine whether two programs are semantically equivalent and, if they are not, discovers the differences.

Implementation of Java Program Similarity Measurement Tool Using Token Structure and Execution Control Structure

- 2003

In the program development process, engineers often reuse components produced in past development, either by copying them directly or with minor modification. In these cases, information on similar…

Syntactic Software Merging

- Computer Science
- SCM
- 1995

The fundamentals of merging are described, the known methods of software merging are surveyed, including a method based on programming-language syntax, and a set of tools that perform syntactic merging is discussed.

Measuring the accuracy of page-reading systems

- Mathematics
- 1996

Given a bitmapped image of a page from any document, a page-reading system identifies the characters on the page and stores them in a text file. This "OCR-generated" text is represented by a string…

New Refinement Techniques for Longest Common Subsequence Algorithms

- Computer Science, Mathematics
- SPIRE
- 2003

It has turned out to be difficult to develop an LCS algorithm which would be superior for all problem instances, and implementing the most evolved LCS algorithms presented recently is laborious.

Semantic comparison of structured visual dataflow programs

- Computer Science
- VINCI '10
- 2010

This algorithm performs depth-first search of call structures comparing embedded diagrams using subgraph isomorphism, to determine if two programs are semantically equivalent, and if they are not, discovers the differences.

#### References

Optimal Code Generation for Expression Trees

- Computer Science
- J. ACM
- 1976

A dynamic programming algorithm is presented which produces optimal code for any machine in this class and runs in time linearly proportional to the size of the input.

The string-to-string correction problem with block moves

- Computer Science
- TOCS
- 1984

An algorithm is presented that produces the shortest edit sequence transforming one string into another; it is optimal in the sense that it generates a minimal covering set of common substrings of one string with respect to another.

A linear space algorithm for computing maximal common subsequences

- Mathematics, Computer Science
- Commun. ACM
- 1975

The problem of finding a longest common subsequence of two strings has been solved in quadratic time and space. An algorithm is presented which will solve this problem in quadratic time and in linear…

The String-to-String Correction Problem

- Mathematics, Computer Science
- J. ACM
- 1974

An algorithm is presented which solves the string-to-string correction problem in time proportional to the product of the lengths of the two strings.

Rcs — a system for version control

- Computer Science
- Softw. Pract. Exp.
- 1985

Basic version control concepts are introduced and the practice of version control using RCS is discussed. Usage statistics show that RCS's delta method is space and time efficient.

Approximate String Matching

- Computer Science
- CSUR
- 1980

Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword. The…

A redisplay algorithm

- 1981

This paper presents an algorithm for updating the image displayed on a conventional video terminal. It assumes that the terminal is capable of doing the usual insert/delete line and insert/delete…

A redisplay algorithm

- Computer Science
- ACM SIGPLAN Notices
- 1981

The algorithm is interesting because it applies results from the theoretical string-to-string correction problem (a generalization of the problem of finding a longest common subsequence) to a problem that is usually approached with crude ad-hoc techniques.

A fast algorithm for computing longest common subsequences

- Mathematics, Computer Science
- Commun. ACM
- 1977

An algorithm is presented for finding the longest common subsequence of two sequences of length n, with a running time of O((r + n) log n), where r is the total number of ordered pairs of positions at which the two sequences match.

The source code control system

- Computer Science
- IEEE Transactions on Software Engineering
- 1975

The SCCS approach to source code control is discussed, along with how the system is used and how it is implemented.