BioPerl is a collection of

Perl Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was developed ...

modules that facilitate the development of Perl scripts for

bioinformatics Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, ...

applications. It has played an integral role in the

Human Genome Project The Human Genome Project (HGP) was an international scientific research project with the goal of determining the base pairs that make up human DNA, and of identifying, mapping and sequencing all of the genes of the human genome from both a ...

Background

BioPerl is an active

open source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use and view the source code, design documents, or content of the product. The open source model is a decentrali ...

software project supported by the

Open Bioinformatics Foundation The Open Bioinformatics Foundation is a non-profit, volunteer-run organization focused on supporting open source programming in bioinformatics. The mission of the foundation is to support the development of open source toolkits for bioinformatics, ...

. The first set of Perl codes of BioPerl was created by Tim Hubbard and Jong Bhak at MRC Centre Cambridge, where the first genome sequencing was carried out by

Fred Sanger Frederick Sanger (; 13 August 1918 – 19 November 2013) was a British biochemist who received the Nobel Prize in Chemistry twice. He won the 1958 Chemistry Prize for determining the amino acid sequence of insulin and numerous other prot ...

. MRC Centre was one of the hubs and birthplaces of modern bioinformatics as it had a large quantity of DNA sequences and 3D protein structures. Hubbard was using the th_lib.pl Perl library, which contained many useful Perl subroutines for bioinformatics. Bhak, Hubbard's first PhD student, created jong_lib.pl. Bhak merged the two Perl subroutine libraries into Bio.pl. The name BioPerl was coined jointly by Bhak and Steven Brenner at the Centre for Protein Engineering (CPE). In 1995, Brenner organized a BioPerl session at the

Intelligent Systems for Molecular Biology Intelligent Systems for Molecular Biology (ISMB) is an annual academic conference on the subjects of bioinformatics and computational biology organised by the International Society for Computational Biology (ISCB). The principal focus of the con ...

conference, held in Cambridge. BioPerl had some users in coming months including Georg Fuellen who organized a training course in Germany. Fuellen's colleagues and students greatly extended BioPerl; this was further expanded by others, including Steve Chervitz who was actively developing Perl codes for his yeast genome database. The major expansion came when Cambridge student Ewan Birney joined the development team. The first stable release was on 11 June 2002; the most recent stable (in terms of API) release is 1.7.2 from 7 September 2017. There are also developer releases produced periodically. Version series 1.7.x is considered to be the most stable (in terms of bugs) version of BioPerl and is recommended for everyday use. In order to take advantage of BioPerl, the user needs a basic understanding of the Perl programming language including an understanding of how to use Perl references, modules, objects, and methods.

Features and examples

BioPerl provides software modules for many of the typical tasks of bioinformatics programming. These include: * Accessing

nucleotide Nucleotides are Organic compound, organic molecules composed of a nitrogenous base, a pentose sugar and a phosphate. They serve as monomeric units of the nucleic acid polymers – deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both o ...

and

peptide Peptides are short chains of amino acids linked by peptide bonds. A polypeptide is a longer, continuous, unbranched peptide chain. Polypeptides that have a molecular mass of 10,000 Da or more are called proteins. Chains of fewer than twenty am ...

sequence data from local and remote

databases In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and ana ...

Example of accessing GenBank to retrieve a sequence: * Transforming formats of database/ file records Example code for transforming formats * Manipulating individual sequences Example of gathering statistics for a given sequence * Searching for similar sequences * Creating and manipulating

sequence alignment In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural biology, structural, or evolutionary relationships between ...

s * Searching for

gene In biology, the word gene has two meanings. The Mendelian gene is a basic unit of heredity. The molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA. There are two types of molecular genes: protei ...

s and other structures on

genomic Genomics is an interdisciplinary field of molecular biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, ...

DNA * Developing machine-readable sequence

annotations An annotation is extra information associated with a particular point in a document or other piece of information. It can be a note that includes a comment or explanation. Annotations are sometimes presented in the margin of book pages. For anno ...

Usage

In addition to being used directly by end-users, BioPerl has also provided the base for a wide variety of bioinformatic tools, includin
amongst others
* SynBrowse * GeneComber * TFBS * MIMOX * BioParser * Degenerate primer design * Querying the public databases * Current Comparative Table New tools and algorithms from external developers are often integrated directly into BioPerl itself: * Dealing with phylogenetic trees and nested taxa * FPC Web tools

Advantages

BioPerl was one of the first biological module repositories that increased its usability. It has very easy to install modules, along with a flexible global repository. BioPerl uses good test modules for a large variety of processes.

Disadvantages

There are many ways to use BioPerl, from simple scripting to very complex object programming. This makes the language not clear and sometimes hard to understand. For as many modules that BioPerl has, some do not always work the way they are intended.

Related libraries in other programming languages

Several related bioinformatics libraries implemented in other programming languages exist as part of the

, including: *

Biopython Biopython is an Open-source software, open-source collection of non-commercial Python (programming language), Python tools for computational biology and bioinformatics.Refer to the Biopython website for othepapers describing Biopython and a list ...

BioJava BioJava is an open-source software project dedicated to providing Java tools for processing biological data.VS Matha and P Kangueane, 2009, ''Bioinformatics: a concept-based introduction'', 2009. p26 BioJava is a set of library functions written i ...

* BioRuby * BioPHP * BioJS * Bioconductor

References

{{Perl Perl software Free bioinformatics software Bioinformatics software