Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology. Bioconductor is based primarily on the statistical R programming language, but it also contains contributions in other programming languages. It has two releases each year, coordinated with the release schedule of R. At any one time there is a release version, which corresponds to the released version of R, and a development version, which corresponds to the development version of R. Most users will find the release version appropriate for their needs. In addition, many genome annotation packages are available; they are mainly, but not solely, oriented towards different types of microarrays. The project was started in the fall of 2001 and is overseen by the Bioconductor core team, based primarily at the Fred Hutchinson Cancer Research Center, with other members coming from international institutions.


Packages

Most Bioconductor components are distributed as R packages, which are add-on modules for R. Initially, most Bioconductor software packages focused on the analysis of single-channel Affymetrix and two-or-more-channel cDNA/oligonucleotide microarrays. As the project matured, the functional scope of the software packages broadened to include the analysis of a wide range of genomic data, such as SAGE, sequence, and SNP data.


Goals

The broad goals of the project are to:
* Provide widespread access to a broad range of powerful statistical and graphical methods for the analysis of genomic data.
* Facilitate the inclusion of biological metadata in the analysis of genomic data, e.g. literature data from PubMed and annotation data from LocusLink/Entrez.
* Provide a common software platform that enables the rapid development and deployment of pluggable, scalable, and interoperable software.
* Further scientific understanding by producing high-quality documentation and reproducible research.
* Train researchers in computational and statistical methods for the analysis of genomic data.


Main features

* Documentation and reproducible research. Each Bioconductor package contains at least one vignette, a document that provides a textual, task-oriented description of the package's functionality. These vignettes come in several forms. Many are simple "how-to" guides designed to demonstrate how a particular task can be accomplished with that package's software. Others provide a more thorough overview of the package or even discuss general issues related to it. In the future, the Bioconductor project is looking towards providing vignettes that are not tied to a specific package but instead demonstrate more complex concepts. As with all aspects of the Bioconductor project, users are encouraged to participate in this effort.
* Statistical and graphical methods. The Bioconductor project aims to provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data. Analysis packages are available for pre-processing Affymetrix, Illumina, and cDNA array data; identifying differentially expressed genes; graph-theoretical analyses; and plotting genomic data. In addition, the R package system itself provides implementations of a broad range of state-of-the-art statistical and graphical techniques, including linear and non-linear modeling, cluster analysis, prediction, resampling, survival analysis, and time series analysis.
* Genome annotation. The Bioconductor project provides software for associating microarray and other genomic data in real time with biological metadata from web databases such as GenBank, LocusLink and PubMed (annotate package). Functions are also provided for incorporating the results of statistical analysis into HTML reports with links to annotation WWW resources. The AnnotationDbi package provides tools for assembling and processing genomic annotation data from databases such as GenBank, the Gene Ontology Consortium, LocusLink, UniGene, the UCSC Human Genome Project, and others. Data packages are distributed to provide mappings between different probe identifiers (e.g. Affy IDs, LocusLink, PubMed). Customized annotation libraries can also be assembled. The project also includes packages for genomic and phylogenetic analysis, such as ggtree.
* Open source. The Bioconductor project is committed to full open-source discipline, with distribution via a SourceForge.net-like platform. All contributions are expected to be released under an open-source license such as Artistic 2.0, GPL2, or BSD. There are many reasons why open-source software benefits the analysis of microarray data and computational biology in general, including:
** To provide full access to algorithms and their implementation
** To facilitate software improvements through bug fixing and plug-ins
** To encourage good scientific computing and statistical practice by providing appropriate tools and instruction
** To provide a workbench of tools that allow researchers to explore and expand the methods used to analyze biological data
** To ensure that the international scientific community owns the software tools needed to carry out research
** To encourage commercial support and development of the tools that prove successful
** To promote reproducible research by providing open and accessible tools with which to carry out that research (reproducible research is distinct from independent verification)
* Open development. Users are encouraged to become developers, either by contributing Bioconductor-compliant packages or documentation. Additionally, Bioconductor provides a mechanism for linking together groups with common goals to foster collaboration on software, possibly at the level of shared development.


Milestones

Each Bioconductor release is developed to work best with a chosen version of R. In addition to bug fixes and updates, a new release typically adds packages. The table below maps each Bioconductor release to an R version and shows the number of Bioconductor software packages available for that release.



See also

* Computational biology
* Bioinformatics
* List of open source bioinformatics software
* List of sequence alignment software
* R (programming language)
* DNA microarray
* Affymetrix, a microarray technology platform



External links

* The R Project: GNU R is a programming language for statistical computing.
* Bioconductor Releases
* The community of the Debian GNU/Linux distribution strives towards an automated build of Bioconductor packages for their distribution (archived copy, 2007-08-11: https://web.archive.org/web/20070811135011/http://wiki.debian.org/AliothPkgBioc).
* BioKnoppix and related projects extending Knoppix have contributed bootable Debian GNU/Linux CDs providing Bioconductor installations.