GenoCAD is one of the earliest computer-assisted design tools for synthetic biology.
The software is a bioinformatics tool developed and maintained by GenoFAB, Inc. GenoCAD facilitates the design of protein expression vectors, artificial gene networks and other genetic constructs for genetic engineering and is based on the theory of formal languages.
History
GenoCAD originated as an offshoot of an attempt to formalize functional constraints of genetic constructs using the theory of formal languages. In 2007, the website genocad.org (now retired) was set up as a proof of concept by researchers at the Virginia Bioinformatics Institute, Virginia Tech. Using the website, users could design genes by repeatedly replacing high-level genetic constructs with lower-level genetic constructs, and eventually with actual DNA sequences.
On August 31, 2009, the National Science Foundation awarded a three-year grant of $1,421,725 to Dr. Jean Peccoud, an associate professor at the Virginia Bioinformatics Institute at Virginia Tech, for the development of GenoCAD. GenoCAD was and continues to be developed by GenoFAB, Inc., a company founded by Peccoud (currently CSO and acting CEO), who was also one of the authors of the originating study.
Source code for GenoCAD was originally released on SourceForge in December 2009.
GenoCAD version 2.0 was released in November 2011 and included the ability to simulate the behavior of the designed genetic code. This feature was a result of a collaboration with the team behind COPASI.
In April 2015, Peccoud and colleagues published a library of biological parts, called GenoLIB, that can be incorporated into the GenoCAD platform.
Goals
The four aims of the project are to develop a:
#computer language to represent the structure of synthetic DNA molecules used in ''E. coli'', yeast, mice, and ''Arabidopsis thaliana'' cells
#compiler capable of translating DNA sequences into mathematical models in order to predict the encoded phenotype
#collaborative workflow environment that allows users to share parts, designs, and fabrication resources
#means to forward the results to the user community through an external advisory board, an annual user conference, and outreach to industry
Features
The features of GenoCAD can be organized into three main categories.
* Management of genetic sequences: The purpose of this group of features is to help users identify, within large collections of genetic parts, the parts needed for a project and to organize them in project-specific libraries.
** ''Genetic parts'': Parts have a unique identifier, a name and a more general description. They also have a DNA sequence. Parts are associated with a grammar and assigned to a parts category such as promoter, gene, etc.
** ''Parts libraries'': Collections of parts are organized in libraries. In some cases, parts libraries correspond to parts imported from a single source such as another sequence database. In other cases, libraries correspond to the parts used for a particular design project. Parts can be moved from one library to another through a temporary storage area called the cart (analogous to e-commerce shopping carts).
** ''Searching parts'': Users can search the parts database using the Lucene search engine. Basic and advanced search modes are available. Users can develop complex queries and save them for future reuse.
** ''Importing/Exporting parts'': Parts can be imported and exported individually or as entire libraries using standard file formats (e.g., GenBank, tab-delimited, FASTA, SBML).
* Combining sequences into genetic constructs: The purpose of this group of features is to streamline the process of combining genetic parts into designs compliant with a specific design strategy.
** ''Point-and-click design tool'': This wizard guides the user through a series of design decisions that determine the design structure and the selection of parts included in the design.
** ''Design management'': Designs can be saved in the user workspace. Design statuses are regularly updated to warn users of the consequences of editing parts on previously saved designs.
** ''Exporting designs'': Designs can be exported using standard file formats (e.g., GenBank, tab-delimited, FASTA).
** ''Design safety'': Designs are protected from some types of errors by forcing the user to follow the appropriate design strategy.
** ''Simulation'': Sequences designed in GenoCAD can be simulated to display chemical production in the resulting cell.
* User workspace: Users can personalize their workspace by adding parts to the GenoCAD database, creating specialized libraries corresponding to specific design projects, and saving designs at different stages of development.
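The part and library features listed above can be pictured with a small data model. The following Python sketch is illustrative only: the class names `Part` and `Library`, the helper `move_via_cart`, and the example identifiers and sequences are assumptions made for this example and do not reflect GenoCAD's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Part:
    """A genetic part with the attributes listed above (hypothetical model)."""
    part_id: str      # unique identifier
    name: str
    description: str
    sequence: str     # DNA sequence
    grammar: str      # grammar the part is associated with
    category: str     # parts category, e.g. "promoter" or "gene"


@dataclass
class Library:
    """A named collection of parts, keyed by part identifier."""
    name: str
    parts: dict = field(default_factory=dict)

    def add(self, part: Part) -> None:
        self.parts[part.part_id] = part

    def to_fasta(self) -> str:
        """Export every part as a FASTA record (header line, then sequence)."""
        return "".join(
            f">{p.part_id} {p.name}\n{p.sequence}\n" for p in self.parts.values()
        )


def move_via_cart(source: Library, target: Library, part_ids: list) -> None:
    """Move parts between libraries through a temporary cart."""
    cart = [source.parts.pop(pid) for pid in part_ids]  # take parts out of the source
    for part in cart:                                   # place them in the target
        target.add(part)


# Made-up example data; the sequences are placeholders, not real parts.
source = Library("imported")
source.add(Part("P001", "example promoter", "placeholder", "TTGACA", "demo", "promoter"))
source.add(Part("G001", "example gene", "placeholder", "ATGAAATAA", "demo", "gene"))
project = Library("my project")
move_via_cart(source, project, ["P001"])
print(project.to_fasta())
```

Keying each library by part identifier keeps the uniqueness of identifiers explicit; a real system would also need persistence and validation of sequences against the grammar, both of which are omitted here.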
Theoretical foundation
GenoCAD is rooted in the theory of formal languages; in particular, the design rules describing how to combine different kinds of parts form context-free grammars.
A context-free grammar can be defined by its terminals, variables, start variable and substitution rules.
In GenoCAD, the terminals of the grammar are sequences of DNA that perform a particular biological purpose (e.g. a promoter). The variables are less homogeneous: they can represent longer sequences that have multiple functions, or a section of DNA that can contain any one of several different sequences that perform the same function (e.g. a variable that represents the set of promoters). GenoCAD includes built-in substitution rules to ensure that the DNA sequence is biologically viable. Users can also define their own sets of rules for other purposes.
Designing a sequence of DNA in GenoCAD is much like creating a derivation in a context-free grammar. The user starts with the start variable and repeatedly selects a variable and a substitution for it until only terminals are left.
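The derivation process described above can be sketched in a few lines of Python. This is a toy example under stated assumptions: the grammar, the variable and terminal names, and the placeholder sequences are all invented for illustration and are not GenoCAD's built-in rules. For simplicity the sketch always rewrites the leftmost variable, whereas the text allows the user to pick any variable.

```python
# Substitution rules: each variable maps to a list of alternative expansions.
# All names here are hypothetical.
RULES = {
    "CASSETTE": [["PROMOTER", "RBS", "GENE", "TERMINATOR"]],
    "PROMOTER": [["p_strong"], ["p_weak"]],
    "RBS": [["rbs_1"]],
    "GENE": [["gene_a"], ["gene_b"]],
    "TERMINATOR": [["term_1"]],
}

# Terminals stand for DNA sequences (placeholder strings, not real parts).
SEQUENCES = {
    "p_strong": "TTGACA",
    "p_weak": "TTGCCA",
    "rbs_1": "AGGAGG",
    "gene_a": "ATGAAATAA",
    "gene_b": "ATGCCCTAA",
    "term_1": "TTTTTT",
}


def derive(start, choices):
    """Rewrite the leftmost variable with the chosen rule until only terminals remain."""
    design = [start]
    picks = iter(choices)
    while any(symbol in RULES for symbol in design):
        # Position of the leftmost symbol that is still a variable.
        i = next(idx for idx, symbol in enumerate(design) if symbol in RULES)
        alternatives = RULES[design[i]]
        design[i:i + 1] = alternatives[next(picks)]  # apply the selected substitution
    return design


# One design decision per substitution step, as a wizard might record them.
design = derive("CASSETTE", [0, 1, 0, 0, 0])
dna = "".join(SEQUENCES[terminal] for terminal in design)
print(design)
print(dna)
```

Because every step applies a rule from the grammar, any design produced this way is a member of the language the grammar defines, which is how rule-based design can rule out some structural errors by construction.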
Alternatives
The most common alternatives to GenoCAD are Proto, GEC and EuGene.
External links
* GenoCAD.com
* Project page on SourceForge
* Tutorials and FAQs
* Peccoud Lab