Diff

In computing, the utility diff is a data comparison tool that computes and displays the differences between the contents of files. Unlike edit distance notions used for other purposes, diff is line-oriented rather than character-oriented, but it is like Levenshtein distance in that it tries to determine the smallest set of deletions and insertions to create one file from the other. The utility displays the changes in one of several standard formats, such that both humans and computers can parse the changes and use them for patching.

Typically, diff is used to show the changes between two versions of the same file. Modern implementations also support binary files. The output is called a "diff", or a patch, since the output can be applied with the Unix program patch. The output of similar file comparison utilities is also called a "diff"; like the use of the word "grep" for describing the act of searching, the word diff became a generic term for calculating data difference and the results thereof. The POSIX standard specifies the behavior of the "diff" and "patch" utilities and their file formats.


History

diff was developed in the early 1970s on the Unix operating system, which was emerging from Bell Labs in Murray Hill, New Jersey. The first released version shipped with the 5th Edition of Unix in 1974 and was written by Douglas McIlroy. This research was published in a 1976 paper co-written with James W. Hunt, who developed an initial prototype of diff. The algorithm this paper described became known as the Hunt–Szymanski algorithm.

McIlroy's work was preceded and influenced by Steve Johnson's comparison program on GECOS and Mike Lesk's program. Lesk's program also originated on Unix and, like diff, produced line-by-line changes and even used angle brackets (">" and "<") for presenting line insertions and deletions in the program's output. The heuristics used in these early applications were, however, deemed unreliable. The potential usefulness of a diff tool provoked McIlroy into researching and designing a more robust tool that could be used in a variety of tasks but still perform well within the processing and size limitations of the PDP-11's hardware. His approach to the problem resulted from collaboration with individuals at Bell Labs including Alfred Aho, Elliot Pinson, Jeffrey Ullman, and Harold S. Stone.

In the context of Unix, the use of the ed line editor provided diff with the natural ability to create machine-usable "edit scripts". These edit scripts, when saved to a file, can, along with the original file, be reconstituted by ed into the modified file in its entirety. This greatly reduced the secondary storage necessary to maintain multiple versions of a file. McIlroy considered writing a post-processor for diff where a variety of output formats could be designed and implemented, but he found it more frugal and simpler to have diff be responsible for generating the syntax and reverse-order input accepted by the ed command.

Late in 1984 Larry Wall created a separate utility, patch, releasing its source code on the mod.sources and net.sources newsgroups. This program generalized and extended the ability to modify files with output from diff. Modes in Emacs also allow for converting the format of patches and even editing patches interactively.

In diff's early years, common uses included comparing changes in the source of software code and markup for technical documents, verifying program debugging output, comparing filesystem listings and analyzing computer assembly code. The output targeted at ed was intended to provide compression for a sequence of modifications made to a file. The Source Code Control System (SCCS) and its ability to archive revisions emerged in the late 1970s as a consequence of storing edit scripts from diff.


Algorithm

The operation of diff is based on solving the longest common subsequence problem.

In this problem, given two sequences of items:

    a b c d f g h j q z

    a b c d e f g i j k r x y z

we want to find a longest sequence of items that is present in both original sequences in the same order. That is, we want to find a new sequence which can be obtained from the first original sequence by deleting some items, and from the second original sequence by deleting other items. We also want this sequence to be as long as possible. In this case it is

    a b c d f g j z

From a longest common subsequence it is only a small step to get diff-like output: if an item is absent in the subsequence but present in the first original sequence, it must have been deleted (as indicated by the '-' marks, below). If it is absent in the subsequence but present in the second original sequence, it must have been inserted (as indicated by the '+' marks).

    e h i q k r x y
    + - + - + + + +
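The step from a longest common subsequence to '+' and '-' marks can be sketched in Python. This is a minimal illustration that assumes a textbook dynamic-programming LCS rather than the Hunt–McIlroy algorithm diff actually used; the function names lcs and edits are hypothetical. Run on the two example sequences, it reproduces the marks shown above.

```python
from typing import List, Sequence, Tuple


def lcs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk forward through the table to recover one LCS.
    result: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


def edits(a: Sequence[str], b: Sequence[str]) -> List[Tuple[str, str]]:
    """Mark items missing from the LCS: '-' if only in a, '+' if only in b."""
    out: List[Tuple[str, str]] = []
    i = j = 0
    for item in lcs(a, b):
        while a[i] != item:  # skipped items of a were deleted
            out.append(("-", a[i]))
            i += 1
        while b[j] != item:  # skipped items of b were inserted
            out.append(("+", b[j]))
            j += 1
        i += 1
        j += 1
    out.extend(("-", x) for x in a[i:])
    out.extend(("+", x) for x in b[j:])
    return out


first = "a b c d f g h j q z".split()
second = "a b c d e f g i j k r x y z".split()
marks = edits(first, second)
print(" ".join(item for _, item in marks))  # e h i q k r x y
print(" ".join(mark for mark, _ in marks))  # + - + - + + + +
```

The table costs time and memory proportional to the product of the two lengths, which is why practical diff implementations rely on more economical algorithms such as those of Hunt, McIlroy and Myers.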


Usage

The diff command is invoked from the command line, passing it the names of two files: diff original new. The output of the command represents the changes required to transform the original file into the new file.

If original and new are directories, then diff will be run on each file that exists in both directories. An option, -r, will recursively descend any matching subdirectories to compare files between directories.

All of the examples in this article use the following two files, original and new.

original:

    This part of the
    document has stayed the
    same from version to
    version. It shouldn't
    be shown if it doesn't
    change. Otherwise, that
    would not be helping to
    compress the size of the
    changes.

    This paragraph contains
    text that is outdated.
    It will be deleted in the
    near future.

    It is important to spell
    check this dokument. On
    the other hand, a
    misspelled word isn't
    the end of the world.
    Nothing in the rest of
    this paragraph needs to
    be changed. Things can
    be added after it.

new:

    This is an important
    notice! It should
    therefore be located at
    the beginning of this
    document!

    This part of the
    document has stayed the
    same from version to
    version. It shouldn't
    be shown if it doesn't
    change. Otherwise, that
    would not be helping to
    compress the size of the
    changes.

    It is important to spell
    check this document. On
    the other hand, a
    misspelled word isn't
    the end of the world.
    Nothing in the rest of
    this paragraph needs to
    be changed. Things can
    be added after it.

    This paragraph contains
    important new additions
    to this document.

The command diff original new produces the following normal diff output:

    0a1,6
    > This is an important
    > notice! It should
    > therefore be located at
    > the beginning of this
    > document!
    >
    11,15d16
    < This paragraph contains
    < text that is outdated.
    < It will be deleted in the
    < near future.
    <
    17c18
    < check this dokument. On
    ---
    > check this document. On
    24a26,29
    >
    > This paragraph contains
    > important new additions
    > to this document.

The output of diff has traditionally been plain text; many tools can display it in color by using syntax highlighting.

In this traditional output format, a stands for added, d for deleted and c for changed. Line numbers of the original file appear before a/d/c and those of the new file appear after. The less-than and greater-than signs (at the beginning of lines that are added, deleted or changed) indicate which file the lines appear in. Addition lines are added to the original file to appear in the new file. Deletion lines are deleted from the original file to be missing in the new file.

By default, lines common to both files are not shown. Lines that have moved are shown as added at their new location and as deleted from their old location. However, some diff tools highlight moved lines.
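The a/d/c notation can be produced from any list of matching regions. The sketch below assumes Python's standard difflib.SequenceMatcher as the matcher; it is based on Ratcliff and Obershelp's gestalt pattern matching rather than a longest common subsequence, so on larger inputs it may group changes differently from a real diff. The helper names normal_diff and _range are hypothetical.

```python
import difflib
from typing import List


def _range(start: int, end: int) -> str:
    """Format a 1-based inclusive line range as normal diff does: "n" or "n,m"."""
    return str(start) if start == end else f"{start},{end}"


def normal_diff(original: List[str], new: List[str]) -> List[str]:
    """Render the differences in the traditional a/d/c ("normal") format."""
    out: List[str] = []
    matcher = difflib.SequenceMatcher(a=original, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # common lines are not shown
        if tag == "insert":
            # "LaR": after original line L, add new lines R.
            out.append(f"{i1}a{_range(j1 + 1, j2)}")
        elif tag == "delete":
            # "RdL": delete original lines R; they would follow new line L.
            out.append(f"{_range(i1 + 1, i2)}d{j1}")
        else:  # "replace"
            out.append(f"{_range(i1 + 1, i2)}c{_range(j1 + 1, j2)}")
        out.extend("< " + line for line in original[i1:i2])
        if tag == "replace":
            out.append("---")
        out.extend("> " + line for line in new[j1:j2])
    return out


print("\n".join(normal_diff(["a", "b", "c"], ["a", "x", "c", "d"])))
```

For these small inputs the sketch prints 2c2, the "<" and ">" lines for the changed item separated by "---", then 3a4 with the added line.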


Output variations


Edit script

An ed script can still be generated by modern versions of diff with the -e option. The resulting edit script for this example is as follows:

    24a

    This paragraph contains
    important new additions
    to this document.
    .
    17c
    check this document. On
    .
    11,15d
    0a
    This is an important
    notice! It should
    therefore be located at
    the beginning of this
    document!

    .

In order to transform the content of file original into the content of file new using ed, we should append two lines to this diff file, one line containing a w (write) command, and one containing a q (quit) command. Here we gave the diff file the name mydiff; the transformation then happens when ed is run on original with mydiff as its standard input.
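To show why diff -e lists its commands from the end of the file backwards, here is a minimal Python interpreter for the three ed commands that appear in the script above (a, c and d). It is a hypothetical helper named apply_ed_script, not ed itself: it accepts only numeric line addresses and does not implement w or q, since it returns the edited lines instead of writing a file.

```python
import re
from typing import List

# An address ("17" or "11,15") followed by one of the commands diff -e emits.
COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_script(lines: List[str], script: List[str]) -> List[str]:
    """Apply a, c and d commands to a copy of lines, in the order given."""
    result = list(lines)
    pos = 0
    while pos < len(script):
        match = COMMAND.match(script[pos])
        if match is None:
            raise ValueError(f"unsupported command: {script[pos]!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        op = match.group(3)
        pos += 1
        text: List[str] = []
        if op in ("a", "c"):
            # Input text runs until a line containing only ".".
            while script[pos] != ".":
                text.append(script[pos])
                pos += 1
            pos += 1  # skip the "." terminator
        if op == "a":
            result[start:start] = text  # insert after line `start` (0 = top)
        else:
            result[start - 1:end] = text  # "c" replaces the range, "d" empties it
    return result


# Commands for turning ["a", "b", "c"] into ["a", "x", "c", "d"],
# listed bottom-up as diff -e orders them.
script = ["3a", "d", ".", "2c", "x", "."]
print(apply_ed_script(["a", "b", "c"], script))  # ['a', 'x', 'c', 'd']
```

If a command near the top of the file added or removed lines, every later address would shift; processing from the bottom up leaves the addresses of the remaining commands valid.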


Context format

The Berkeley distribution of Unix made a point of adding the context format (-c) and the ability to recurse on filesystem directory structures (-r), adding those features in 2.8 BSD, released in July 1981. The context format of diff introduced at Berkeley helped with distributing patches for source code that may have been changed minimally.

In the context format, any changed lines are shown alongside unchanged lines before and after. The inclusion of any number of unchanged lines provides a context to the patch. The context consists of lines that have not changed between the two files and serves as a reference to locate the lines' place in a modified file and find the intended location for a change to be applied, regardless of whether the line numbers still correspond. The context format introduces greater readability for humans and reliability when applying the patch, and an output which is accepted as input to the patch program. This intelligent behavior isn't possible with the traditional diff output.

The number of unchanged lines shown above and below a change hunk can be defined by the user, even zero, but three lines is typically the default. If the context of unchanged lines in a hunk overlaps with an adjacent hunk, then diff will avoid duplicating the unchanged lines and merge the hunks into a single hunk.

A "!" represents a change between lines that correspond in the two files, whereas a "+" represents the addition of a line, and a "-" the removal of a line. A blank space represents an unchanged line. At the beginning of the patch is the file information, including the full path and a time stamp delimited by a tab character. At the beginning of each hunk are the line numbers that apply for the corresponding change in the files. A number range appearing between sets of three asterisks applies to the original file, while sets of three dashes apply to the new file. The hunk ranges specify the starting and ending line numbers in the respective file.

The command diff -c original new produces the following output:

    *** /path/to/original	timestamp
    --- /path/to/new	timestamp
    ***************
    *** 1,3 ****
    --- 1,9 ----
    + This is an important
    + notice! It should
    + therefore be located at
    + the beginning of this
    + document!
    +
      This part of the
      document has stayed the
      same from version to
    ***************
    *** 8,20 ****
      compress the size of the
      changes.

    - This paragraph contains
    - text that is outdated.
    - It will be deleted in the
    - near future.
    -
      It is important to spell
    ! check this dokument. On
      the other hand, a
      misspelled word isn't
      the end of the world.
    --- 14,21 ----
      compress the size of the
      changes.

      It is important to spell
    ! check this document. On
      the other hand, a
      misspelled word isn't
      the end of the world.
    ***************
    *** 22,24 ****
    --- 23,29 ----
      this paragraph needs to
      be changed. Things can
      be added after it.
    +
    + This paragraph contains
    + important new additions
    + to this document.
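For experimentation, Python's standard difflib module provides difflib.context_diff, which emits this layout. The sketch below is illustrative only: difflib uses its own matching heuristic, so on real files its hunks may differ from those of GNU or BSD diff, and the file names and the literal string "timestamp" are placeholders mirroring the example output.

```python
import difflib

original = ["a", "b", "c"]
new = ["a", "x", "c"]

# n is the number of context lines; lineterm="" because the inputs have no newlines.
lines = list(difflib.context_diff(
    original, new,
    fromfile="original", tofile="new",
    fromfiledate="timestamp", tofiledate="timestamp",  # joined to the name by a tab
    n=1, lineterm="",
))
print("\n".join(lines))
```

The changed line appears with "!" on both sides, surrounded by one unchanged line of context each way, and the header lines carry the tab-delimited time stamp.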


Unified format

The unified format (or unidiff) inherits the technical improvements made by the context format, but produces a smaller diff with old and new text presented immediately adjacent. Unified format is usually invoked using the "-u" command line option. This output is often used as input to the patch program. Many projects specifically request that "diffs" be submitted in the unified format, making unified diff format the most common format for exchange between software developers.

Unified context diffs were originally developed by Wayne Davison in August 1990 (in unidiff, which appeared in Volume 14 of comp.sources.misc). Richard Stallman added unified diff support to the GNU Project's diff utility one month later, and the feature debuted in GNU diff 1.15, released in January 1991. GNU diff has since generalized the context format to allow arbitrary formatting of diffs.

The format starts with the same two-line header as the context format, except that the original file is preceded by "---" and the new file is preceded by "+++". Following this are one or more change hunks that contain the line differences in the file. The unchanged, contextual lines are preceded by a space character, addition lines are preceded by a plus sign, and deletion lines are preceded by a minus sign.

A hunk begins with range information and is immediately followed by the line additions, line deletions, and any number of the contextual lines. The range information is surrounded by double at signs, and combines onto a single line what appears on two lines in the context format. The format of the range information line is as follows:

    @@ -l,s +l,s @@ optional section heading

The hunk range information contains two hunk ranges. The range for the hunk of the original file is preceded by a minus symbol, and the range for the new file is preceded by a plus symbol. Each hunk range is of the format l,s where l is the starting line number and s is the number of lines the change hunk applies to for each respective file. In many versions of GNU diff, each range can omit the comma and trailing value s, in which case s defaults to 1. Note that the only really interesting value is the l line number of the first range; all the other values can be computed from the diff.

The size s of the original file's range should equal the number of contextual and deletion (including changed) lines in the hunk. The size s of the new file's range should equal the number of contextual and addition (including changed) lines in the hunk. If hunk size information does not correspond with the number of lines in the hunk, then the diff could be considered invalid and be rejected.

Optionally, the hunk range can be followed by the heading of the section or function that the hunk is part of. This is mainly useful to make the diff easier to read. When creating a diff with GNU diff, the heading is identified by regular expression matching.

If a line is modified, it is represented as a deletion and addition. Since the original and new versions of a hunk appear together, such changes appear adjacent to one another. An occurrence of this in the example below is:

    -check this dokument. On
    +check this document. On
The command diff -u original new produces the following output:

    --- /path/to/original	timestamp
    +++ /path/to/new	timestamp
    @@ -1,3 +1,9 @@
    +This is an important
    +notice! It should
    +therefore be located at
    +the beginning of this
    +document!
    +
     This part of the
     document has stayed the
     same from version to
    @@ -8,13 +14,8 @@
     compress the size of the
     changes.

    -This paragraph contains
    -text that is outdated.
    -It will be deleted in the
    -near future.
    -
     It is important to spell
    -check this dokument. On
    +check this document. On
     the other hand, a
     misspelled word isn't
     the end of the world.
    @@ -22,3 +23,7 @@
     this paragraph needs to
     be changed. Things can
     be added after it.
    +
    +This paragraph contains
    +important new additions
    +to this document.

Note that to successfully separate the file names from the timestamps, the delimiter between them is a tab character. This is invisible on screen and can be lost when diffs are copied and pasted from console or terminal screens.

There are some modifications and extensions to the diff formats that are used and understood by certain programs and in certain contexts. For example, some revision control systems, such as Subversion, specify a version number, "working copy", or any other comment instead of or in addition to a timestamp in the diff's header section.

Some tools allow diffs for several different files to be merged into one, using a header for each modified file that may look something like this:

    Index: path/to/file.cpp

The special case of files that do not end in a newline is not handled. Neither the unidiff utility nor the POSIX diff standard defines a way to handle this type of file. (Indeed, such files are not "text" files by strict POSIX definitions.) The patch program is not aware even of implementation-specific diff output for this case.
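The range rules of the unified format can be checked mechanically. This is a minimal Python sketch; the regular expression, the HunkHeader type and both function names are hypothetical, and the check follows only the counting rule described for unified hunks (context plus deletions for the original file, context plus additions for the new file).

```python
import re
from typing import List, NamedTuple

# "@@ -l,s +l,s @@ optional section heading"; a missing ",s" means s = 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")


class HunkHeader(NamedTuple):
    old_start: int
    old_len: int
    new_start: int
    new_len: int
    heading: str


def parse_hunk_header(line: str) -> HunkHeader:
    """Split a unified-diff range line into its numbers and optional heading."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk range line: {line!r}")
    old_start, old_len, new_start, new_len, heading = match.groups()
    return HunkHeader(
        old_start=int(old_start),
        old_len=int(old_len) if old_len is not None else 1,
        new_start=int(new_start),
        new_len=int(new_len) if new_len is not None else 1,
        heading=heading,
    )


def hunk_is_consistent(header: HunkHeader, body: List[str]) -> bool:
    """Check both range sizes against the hunk's lines.

    Original size = context + deletion lines; new size = context + addition
    lines. An empty line is counted as context whose leading space was lost.
    """
    old = sum(1 for line in body if line[:1] in (" ", "-", ""))
    new = sum(1 for line in body if line[:1] in (" ", "+", ""))
    return old == header.old_len and new == header.new_len


print(parse_hunk_header("@@ -5 +5,2 @@ main()"))
```

Applied to the second hunk of the example output, the header @@ -8,13 +14,8 @@ passes the check: seven context lines plus six deletions give 13, and seven context lines plus one addition give 8.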


Implementations and related programs

Changes since 1975 include improvements to the core algorithm, the addition of useful features to the command, and the design of new output formats. The basic algorithm is described in the papers "An O(ND) Difference Algorithm and its Variations" by Eugene W. Myers and "A File Comparison Program" by Webb Miller and Myers. The algorithm was independently discovered and described in "Algorithms for Approximate String Matching" by Esko Ukkonen. The first editions of the diff program were designed for line comparisons of text files expecting the newline character to delimit lines. By the 1980s, support for binary files resulted in a shift in the application's design and implementation.

GNU diff and diff3 are included in the diffutils package with other diff and patch related utilities. Nowadays there is also a patchutils package that can combine, rearrange, compare and fix context diffs and unified diffs.
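Myers' greedy algorithm finds the size D of a shortest edit script (the minimum number of insertions plus deletions) in O(ND) time by extending the furthest-reaching path on each diagonal of the edit graph. The following Python sketch covers only that greedy loop: it returns D but does not record the path, so it omits the backtracking and the linear-space refinement that production implementations add. The function name myers_distance is hypothetical.

```python
from typing import Sequence


def myers_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Size D of a shortest edit script (insertions plus deletions)."""
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d + 1  # shifts diagonal k = x - y into a valid list index
    v = [0] * (2 * max_d + 3)  # v[offset + k]: furthest x reached on diagonal k
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[offset + k - 1] < v[offset + k + 1]):
                x = v[offset + k + 1]  # move down: insert an item of b
            else:
                x = v[offset + k - 1] + 1  # move right: delete an item of a
            y = x - k
            # Follow the "snake" of matching items along the diagonal for free.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[offset + k] = x
            if x >= n and y >= m:
                return d
    return max_d  # unreachable: d = n + m always reaches the end


print(myers_distance("abcabba", "cbabac"))  # 5
```

D is tied to the longest common subsequence by D = N + M - 2L. For the two sequences in the Algorithm section, N = 10, M = 14 and L = 8, so D = 8, matching the eight '+' and '-' marks.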


Formatters and front-ends

Postprocessors sdiff and diffmk render side-by-side diff listings and applied change marks to printed documents, respectively. Both were developed elsewhere in Bell Labs in or before 1981.

diff3 compares one file against two other files by reconciling two diffs. It was originally conceived by Paul Jensen to reconcile changes made by two people editing a common source. It is also used by revision control systems, e.g. RCS, for merging.

Emacs has Ediff for showing the changes a patch would provide in a user interface that combines interactive editing and merging capabilities for patch files.

Vim provides vimdiff to compare from two to eight files, with differences highlighted in color. While historically invoking the diff program, modern Vim uses git's fork of the xdiff library (LibXDiff) code, providing improved speed and functionality.

GNU Wdiff is a front end to diff that shows the words or phrases that changed in a text document of written language, even in the presence of word-wrapping or different column widths.

colordiff is a Perl wrapper for diff that produces the same output but with syntax highlighting.
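The word-level idea behind GNU Wdiff can be sketched by splitting text on whitespace before comparing, so that line wrapping no longer matters. This Python sketch assumes difflib.SequenceMatcher as the matcher and borrows the [-deleted-] and {+inserted+} markers that wdiff prints by default; the function name word_diff is hypothetical, and unlike wdiff it collapses all whitespace to single spaces in its output.

```python
import difflib
from typing import List


def word_diff(old: str, new: str) -> str:
    """Mark changed words with wdiff-style [-deleted-] and {+inserted+} brackets."""
    a: List[str] = old.split()  # split() ignores line breaks and column widths
    b: List[str] = new.split()
    parts: List[str] = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(" ".join(a[i1:i2]))
            continue
        if tag in ("delete", "replace"):
            parts.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            parts.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(parts)


# The new text wraps at a different point, but only the respelled word is marked.
print(word_diff("check this dokument. On", "check this\ndocument. On"))
```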


Algorithmic derivatives

Utilities that compare source files by their syntactic structure have been built mostly as research tools for some programming languages; some are available as commercial tools. In addition, free tools that perform syntax-aware diff include:

* C++: zograscope, AST-based.
* HTML: Daisydiff, html-differ.
* XML: xmldiffpatch by Microsoft and xmldiffmerge for IBM.
* JavaScript: astii (AST-based).
* Multi-language: Pretty Diff (format code and then diff).

spiff is a variant of diff that ignores differences in floating point calculations with roundoff errors and whitespace, both of which are generally irrelevant to source code comparison. Bellcore wrote the original version. An HPUX port is the most current public release. spiff does not support binary files. spiff outputs to the standard output in standard diff format and accepts inputs in the C, Bourne shell, Fortran, Modula-2 and Lisp programming languages.

LibXDiff is an LGPL library that provides an interface to many algorithms from 1998. An improved Myers algorithm with Rabin fingerprint was originally implemented (as of the final release of 2008), but git's and libgit2's forks have since expanded the repository with many algorithms of their own. One algorithm called "histogram" is generally regarded as much better than the original Myers algorithm, both in speed and quality. This fork is the modern version of LibXDiff used by Vim.
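spiff's actual comparison rules are not described here, but the core idea, treating two lines as equal when they differ only in whitespace or in numbers that agree within a tolerance, can be sketched as a line-equality predicate that a diff algorithm would call instead of plain string equality. Everything below is a hypothetical Python illustration: the tokenizer regular expression, the relative tolerance of 1e-9 and the function name lines_match are assumptions, not spiff's behavior.

```python
import math
import re

# Hypothetical tokenizer: unsigned decimal numbers (optional fraction and
# exponent) or runs of characters that are neither whitespace nor digits.
TOKEN = re.compile(r"\d+\.?\d*(?:[eE][-+]?\d+)?|\.\d+(?:[eE][-+]?\d+)?|[^\s\d]+")


def lines_match(x: str, y: str, rel_tol: float = 1e-9) -> bool:
    """True if x and y differ only in the amount of whitespace between
    tokens or in numbers that agree within rel_tol."""
    tx, ty = TOKEN.findall(x), TOKEN.findall(y)
    if len(tx) != len(ty):
        return False
    for s, t in zip(tx, ty):
        if s == t:
            continue
        try:
            if not math.isclose(float(s), float(t), rel_tol=rel_tol):
                return False
        except ValueError:  # at least one differing token is not a number
            return False
    return True


print(lines_match("total = 0.30000000000000004", "total   =   0.3"))  # True
```

Because whitespace only separates tokens, removing it between punctuation can merge two tokens into one, so a real tool would need a language-aware tokenizer, which is presumably why spiff restricts itself to a fixed set of input languages.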


See also

* Comparison of file comparison tools
* Delta encoding
* Difference operator
* Edit distance
** Levenshtein distance
* History of software configuration management
* Longest common subsequence problem
* Microsoft File Compare
* Microsoft WinDiff
* Revision control
* Software configuration management


Other free file comparison tools

* cmp
* comm
* tkdiff
* WinMerge (Microsoft Windows)
* meld
* Pretty Diff



Further reading


* A technique for isolating differences between files
* A generic implementation of the Myers SES/LCS algorithm with the Hirschberg linear space refinement (C source code)


External links

JavaScript Implementation