PHYLogeny Inference Package (PHYLIP) is a free computational phylogenetics package of programs for inferring evolutionary trees (phylogenies). It consists of 65 portable programs, i.e., the source code is written in the programming language C. As of version 3.696, it is licensed as open-source software; versions 3.695 and older were proprietary freeware. Releases occur as source code, and as precompiled executables for many operating systems including Windows (95, 98, ME, NT, 2000, XP, Vista), Mac OS 8, Mac OS 9, OS X, Linux (Debian, Red Hat), and FreeBSD from FreeBSD.org.
Full documentation is written for all the programs in the package and is included therein. The programs in the phylip package were written by Professor Joseph Felsenstein, of the Department of Genome Sciences and the Department of Biology, University of Washington, Seattle.
Methods (implemented by each program) that are available in the package include parsimony, distance matrix, and likelihood methods, including bootstrapping and consensus trees. Data types that can be handled include molecular sequences, gene frequencies, restriction sites and fragments, distance matrices, and discrete characters.
Each program is controlled through a menu, which asks users which options they want to set and allows them to start the computation. The data is read into the program from a text file, which the user can prepare using any word processor or text editor. This text file cannot be in the special format of the word processor; it must instead be in "flat ASCII" or "text only" format. Some sequence analysis programs, such as the ClustalW alignment program, can write data files in the PHYLIP format. Most of the programs look for the data in a file called infile. If the phylip programs do not find this file, they ask the user to type in the file name of the data file.
File format
The component programs of phylip use several different formats, all of which are relatively simple. Programs for the analysis of DNA sequence alignments, protein sequence alignments, or discrete characters (e.g., morphological data) can accept those data in sequential or interleaved format, as shown below.
Sequential format:
5 42
Turkey    AAGCTNGGGC ATTTCAGGGT GAGCCCGGGC AATACAGGGT AT
Salmo schiAAGCCTTGGC AGTGCAGGGT GAGCCGTGGC CGGGCACGGT AT
H. sapiensACCGGTTGGC CGTTCAGGGT ACAGGTTGGC CGTTCAGGGT AA
Chimp     AAACCCTTGC CGTTACGCTT AAACCGAGGC CGGGACACTC AT
Gorilla   AAACCCTTGC CGGTACGCTT AAACCATTGC CGGTACGCTT AA
Interleaved format:
5 42
Turkey    AAGCTNGGGC ATTTCAGGGT
Salmo schiAAGCCTTGGC AGTGCAGGGT
H. sapiensACCGGTTGGC CGTTCAGGGT
Chimp     AAACCCTTGC CGTTACGCTT
Gorilla   AAACCCTTGC CGGTACGCTT
GAGCCCGGGC AATACAGGGT AT
GAGCCGTGGC CGGGCACGGT AT
ACAGGTTGGC CGTTCAGGGT AA
AAACCGAGGC CGGGACACTC AT
AAACCATTGC CGGTACGCTT AA
The numbers are the number of taxa (different species in the example shown above) followed by the number of characters (aligned nucleotides or amino acids in the case of molecular sequences). Restriction site data must include the number of enzymes as well.
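The two layouts above can be read with a few lines of code. The following is a minimal Python sketch, not part of PHYLIP itself; the function names are hypothetical, and it assumes the default layout in which each name occupies the first 10 columns of its line and is blank-filled to that width.

```python
NAME_LEN = 10  # default name width; corresponds to nmlngth in phylip.h


def _nonblank_lines(text):
    return [ln for ln in text.splitlines() if ln.strip()]


def _header(line):
    """Return (number of taxa, number of characters) from the first line."""
    fields = line.split()
    return int(fields[0]), int(fields[1])


def parse_sequential(text):
    """Parse strict sequential PHYLIP text into a {name: sequence} dict."""
    lines = _nonblank_lines(text)
    ntaxa, nchar = _header(lines[0])
    records, i = {}, 1
    for _ in range(ntaxa):
        if i >= len(lines):
            raise ValueError("fewer taxa than the header declares")
        name = lines[i][:NAME_LEN].strip()
        seq = "".join(lines[i][NAME_LEN:].split())  # drop embedded spaces
        i += 1
        # A sequence may continue on later lines until nchar is reached.
        while len(seq) < nchar and i < len(lines):
            seq += "".join(lines[i].split())
            i += 1
        if len(seq) != nchar:
            raise ValueError(f"{name!r}: expected {nchar} characters, got {len(seq)}")
        records[name] = seq
    return records


def parse_interleaved(text):
    """Parse strict interleaved PHYLIP text into a {name: sequence} dict."""
    lines = _nonblank_lines(text)
    ntaxa, nchar = _header(lines[0])
    body = lines[1:]
    if len(body) < ntaxa:
        raise ValueError("fewer taxa than the header declares")
    # Only the first block carries names; later blocks are data only,
    # listed in the same taxon order.
    names = [ln[:NAME_LEN].strip() for ln in body[:ntaxa]]
    seqs = ["".join(ln[NAME_LEN:].split()) for ln in body[:ntaxa]]
    for k, ln in enumerate(body[ntaxa:]):
        seqs[k % ntaxa] += "".join(ln.split())
    for name, seq in zip(names, seqs):
        if len(seq) != nchar:
            raise ValueError(f"{name!r}: expected {nchar} characters, got {len(seq)}")
    return dict(zip(names, seqs))


# Two taxa taken from the examples above.
sequential = (
    "2 42\n"
    "Turkey    AAGCTNGGGC ATTTCAGGGT GAGCCCGGGC AATACAGGGT AT\n"
    "Salmo schiAAGCCTTGGC AGTGCAGGGT GAGCCGTGGC CGGGCACGGT AT\n"
)
interleaved = (
    "2 42\n"
    "Turkey    AAGCTNGGGC ATTTCAGGGT\n"
    "Salmo schiAAGCCTTGGC AGTGCAGGGT\n"
    "GAGCCCGGGC AATACAGGGT AT\n"
    "GAGCCGTGGC CGGGCACGGT AT\n"
)
assert parse_sequential(sequential) == parse_interleaved(interleaved)
```

Slicing the name by column rather than splitting on whitespace is what lets a name such as "Salmo schi" contain a space and run straight into the sequence data.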
Names are limited to 10 characters by default and must be blank-filled to that length, followed immediately by the character data in one-letter codes. The 10-character limit can be changed by a minor modification of the code (by changing nmlngth in phylip.h and recompiling). All printable ASCII/ISO characters are allowed in names, except for parentheses ("(" and ")"), square brackets ("[" and "]"), colon (":"), semicolon (";") and comma (","). Spaces embedded in the alignment are ignored.
Many programs for phylogenetic analyses, including the commonly used RAxML and IQ-TREE programs, use the phylip format or a minor modification of that format called the relaxed phylip format.
Relaxed phylip format (sequential):
5 42
Turkey AAGCTNGGGCATTTCAGGGTGAGCCCGGGCAATACAGGGTAT
Salmo_schiefermuelleri AAGCCTTGGCAGTGCAGGGTGAGCCGTGGCCGGGCACGGTAT
H_sapiens ACCGGTTGGCCGTTCAGGGTACAGGTTGGCCGTTCAGGGTAA
Chimp AAACCCTTGCCGTTACGCTTAAACCGAGGCCGGGACACTCAT
Gorilla AAACCCTTGCCGGTACGCTTAAACCATTGCCGGTACGCTTAA
The primary difference in relaxed phylip format is the absence of the 10-character limit and of the need to blank-fill names to that length (although padding names so that the character matrix starts at the same position can improve readability for users). This example of relaxed phylip format uses underscores rather than spaces in the names and uses spaces between the names and the aligned character data; it is often good practice to avoid white space within taxon names and to separate the character data from the name when generating files. Like strict phylip format files, relaxed phylip format files can be in interleaved format and can include spaces and line breaks within the sequence data.
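As an illustration of these conventions, the sketch below writes a set of aligned sequences in relaxed sequential format. It is a hypothetical helper, not code from PHYLIP, RAxML, or IQ-TREE. It assumes that the characters forbidden in strict phylip names should also be rejected here, and it pads names only for readability.

```python
FORBIDDEN = set("()[]:;,")  # assumed to apply to relaxed names as well


def to_relaxed_phylip(records):
    """Format a {name: sequence} dict as relaxed sequential PHYLIP text."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    # Replace white space inside names so each name is a single token.
    names = ["_".join(name.split()) for name in records]
    if len(set(names)) != len(names):
        raise ValueError("names are not unique after replacing white space")
    for name in names:
        if FORBIDDEN & set(name):
            raise ValueError(f"forbidden character in name {name!r}")
    width = max(len(name) for name in names)
    lines = [f"{len(names)} {lengths.pop()}"]
    for name, seq in zip(names, records.values()):
        # Pad names so the character matrix starts in the same column.
        lines.append(f"{name.ljust(width)} {seq}")
    return "\n".join(lines) + "\n"


print(to_relaxed_phylip({"H. sapiens": "ACCGGTTGGC", "Chimp": "AAACCCTTGC"}), end="")
```

The single space between name and data is the only separator a reader of the file needs; the padding is cosmetic.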
The programs that use distance data, like the neighbor program that implements the neighbor-joining method, also use a simple distance matrix format that includes only the number of taxa, their names, and numerical values for the distances:
Phylip distance matrix:
7
Bovine    0.0000 1.6866 1.7198 1.6606 1.5243 1.6043 1.5905
Mouse     1.6866 0.0000 1.5232 1.4841 1.4465 1.4389 1.4629
Gibbon    1.7198 1.5232 0.0000 0.7115 0.5958 0.6179 0.5583
Orang     1.6606 1.4841 0.7115 0.0000 0.4631 0.5061 0.4710
Gorilla   1.5243 1.4465 0.5958 0.4631 0.0000 0.3484 0.3083
Chimp     1.6043 1.4389 0.6179 0.5061 0.3484 0.0000 0.2692
Human     1.5905 1.4629 0.5583 0.4710 0.3083 0.2692 0.0000
The number on the first line is the number of taxa, and the same limitations on taxon names apply. Note that this matrix is symmetric and the diagonal has values of 0 (since the distance between a taxon and itself is zero by definition).
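The symmetry and zero-diagonal properties make a useful sanity check when reading such a file. The following Python sketch is an illustration under stated assumptions, not PHYLIP code: the function name and the tolerance parameter are hypothetical, names are assumed to be blank-filled to 10 characters, and only the square layout with one matrix row per line is handled.

```python
import math


def parse_distance_matrix(text, name_len=10, tol=1e-9):
    """Parse a square PHYLIP distance matrix into (names, matrix)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntaxa = int(lines[0].split()[0])
    if len(lines) - 1 < ntaxa:
        raise ValueError("fewer rows than the header declares")
    names, matrix = [], []
    for ln in lines[1:ntaxa + 1]:
        names.append(ln[:name_len].strip())
        row = [float(x) for x in ln[name_len:].split()]
        if len(row) != ntaxa:
            raise ValueError(f"{names[-1]!r}: expected {ntaxa} distances, got {len(row)}")
        matrix.append(row)
    for i in range(ntaxa):
        # The distance between a taxon and itself must be zero.
        if not math.isclose(matrix[i][i], 0.0, abs_tol=tol):
            raise ValueError(f"non-zero diagonal entry for {names[i]!r}")
        # d(i, j) must equal d(j, i).
        for j in range(i):
            if not math.isclose(matrix[i][j], matrix[j][i], abs_tol=tol):
                raise ValueError(f"asymmetric entry for {names[i]!r} and {names[j]!r}")
    return names, matrix


sample = (
    "3\n"
    "Bovine    0.0000 1.6866 1.7198\n"
    "Mouse     1.6866 0.0000 1.5232\n"
    "Gibbon    1.7198 1.5232 0.0000\n"
)
names, matrix = parse_distance_matrix(sample)
print(names, matrix[0][2])
```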
Programs that use trees as input accept the trees in Newick format, an informal standard agreed to in 1986 by authors of seven major phylogeny packages. Output is written to files with names like outfile and outtree. Trees written to outtree are in the Newick format.
Component programs
References
External links
* Phylogeny Programs List: a large list of phylogeny packages with details on each one. Current count at 392 as of 11 January 2025.