A taxonomic database is a database created to hold information on biological taxa (for example, groups of organisms organized by species name or other taxonomic identifier) for efficient data management and information retrieval. Taxonomic databases are routinely used for the automated construction of biological checklists such as floras and faunas, both for print publication and online; to underpin the operation of web-based species information systems; as a part of biological collection management (for example in museums and herbaria); as well as providing, in some cases, the taxon management component of broader science or biology information systems. They are also a fundamental contribution to the discipline of biodiversity informatics.
Goals
Taxonomic databases digitize scientific biodiversity data and provide access to taxonomic data for research.
Taxonomic databases vary in breadth of the groups of taxa and geographical space they seek to include, for example: beetles in a defined region, mammals globally, or all described taxa in the tree of life.
A taxonomic database may incorporate organism identifiers (scientific name, author, and – for zoological taxa – year of original publication), synonyms, taxonomic opinions, literature sources or citations, illustrations or photographs, and biological attributes for each taxon (such as geographic distribution, ecology, descriptive information, threatened or vulnerable status, etc.).
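As a rough illustration of the kinds of fields listed above, a single taxon entry could be sketched as a small record type. The class name `TaxonRecord` and all of its field names are hypothetical and chosen only for this example; real databases define their own schemas.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaxonRecord:
    """One taxon entry. All field names are illustrative, not a standard."""
    scientific_name: str
    author: str
    year: Optional[int] = None  # year of original publication (zoological names)
    synonyms: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)
    # Free-form biological attributes, e.g. distribution or conservation status.
    attributes: dict[str, str] = field(default_factory=dict)

    def citation(self) -> str:
        """Name plus authorship, appending the year when one is recorded."""
        if self.year is not None:
            return f"{self.scientific_name} {self.author}, {self.year}"
        return f"{self.scientific_name} {self.author}"


human = TaxonRecord("Homo sapiens", "Linnaeus", year=1758)
daisy = TaxonRecord("Bellis perennis", "L.", attributes={"distribution": "Europe"})

print(human.citation())  # Homo sapiens Linnaeus, 1758
print(daisy.citation())  # Bellis perennis L.
```

The optional `year` field mirrors the point that the year of original publication is recorded for zoological taxa, while the botanical example omits it.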
Some databases, such as the Global Biodiversity Information Facility (GBIF) database and the Barcode of Life Data System, store the DNA barcode of a taxon if one exists, along with an identifier such as the Barcode Index Number (BIN), which may be assigned, for example, by the International Barcode of Life project (iBOL) or UNITE, a database for fungal DNA barcoding.
A taxonomic database aims to accurately model the characteristics of interest for the organisms that fall within its intended coverage and use.
For example, databases of fungi, algae, bryophytes and vascular plants ("higher plants") encode conventions from the International Code of Botanical Nomenclature (now the International Code of Nomenclature for algae, fungi, and plants), while their counterparts for animals and most protists encode equivalent rules from the International Code of Zoological Nomenclature. Modelling the relevant taxonomic hierarchy for any taxon is a natural fit with the relational model employed in almost all database systems. Scientific consensus has not been reached for all taxon groups, and new species continue to be described; another goal of taxonomic databases is therefore to help resolve conflicts of scientific opinion and unify taxonomy.
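One common way to hold a taxonomic hierarchy in a relational database is an adjacency list, in which each taxon row points to its parent. The following is a minimal sketch using Python's built-in `sqlite3` module; the table name `taxon`, its columns, and the helper `lineage` are assumptions made for this example and are not taken from any particular taxonomic database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE taxon (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        taxon_rank TEXT NOT NULL,
        parent_id  INTEGER REFERENCES taxon(id)  -- NULL for the root taxon
    )
    """
)
conn.executemany(
    "INSERT INTO taxon (id, name, taxon_rank, parent_id) VALUES (?, ?, ?, ?)",
    [
        (1, "Animalia", "kingdom", None),
        (2, "Chordata", "phylum", 1),
        (3, "Mammalia", "class", 2),
        (4, "Carnivora", "order", 3),
        (5, "Felidae", "family", 4),
        (6, "Panthera", "genus", 5),
        (7, "Panthera leo", "species", 6),
    ],
)


def lineage(conn, taxon_id):
    """Return (rank, name) pairs from the root down to the given taxon."""
    return conn.execute(
        """
        WITH RECURSIVE ancestors(id, name, taxon_rank, parent_id, depth) AS (
            SELECT id, name, taxon_rank, parent_id, 0
            FROM taxon WHERE id = ?
            UNION ALL
            SELECT t.id, t.name, t.taxon_rank, t.parent_id, a.depth + 1
            FROM taxon AS t JOIN ancestors AS a ON t.id = a.parent_id
        )
        SELECT taxon_rank, name FROM ancestors ORDER BY depth DESC
        """,
        (taxon_id,),
    ).fetchall()


for rank, name in lineage(conn, 7):
    print(f"{rank}: {name}")
```

The recursive query walks the `parent_id` links upward from the requested taxon, so the full classification can be recovered without storing it redundantly on every row. Other designs (such as storing a materialized path) trade simpler reads for more work when a taxon is reclassified.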
History
Possibly the earliest documented management of taxonomic information in computerised form comprised the taxonomic coding system developed by Richard Swartz et al. at the Virginia Institute of Marine Science for the Biota of Chesapeake Bay and described in a published report in 1972.
This work led directly or indirectly to other higher-profile projects, including the NODC Taxonomic Code system, which went through 8 versions before being discontinued in 1996, to be subsumed and transformed into the still-current Integrated Taxonomic Information System (ITIS). A number of other taxonomic databases specializing in particular groups of organisms, which have appeared from the 1970s to the present, jointly contribute to the Species 2000 project, which since 2001 has partnered with ITIS to produce a combined product, the Catalogue of Life. While the Catalogue of Life currently concentrates on assembling basic name information as a global species checklist, numerous other taxonomic database projects, such as Fauna Europaea and the Australian Faunal Directory, supply rich ancillary information including descriptions, illustrations, and maps. Many taxonomic database projects are currently listed at the TDWG "Biodiversity Information Projects of the World" site.
Issues
The representation of taxonomic information in machine-encodable form raises a number of issues not encountered in other domains, such as variant ways to cite the same species or other taxon name, the same name used for multiple taxa (homonyms), multiple non-current names for the same taxon (synonyms), changes in name and taxon concept definition through time, and more.
Non-standardized categories and metadata in taxonomic databases hamper researchers' ability to analyze the data.
One forum that has promoted discussion and possible solutions to these and related problems since 1985 is Biodiversity Information Standards (TDWG), originally called the Taxonomic Database Working Group.
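The synonym and homonym issues described above can be illustrated with a small lookup sketch. The row layout, the `NAMES` table and the functions `resolve` and `homonyms` are hypothetical, and the data are a handful of illustrative names rather than an extract from any real database.

```python
from collections import defaultdict

# Illustrative rows: (name_id, name, nomenclatural group, accepted_id).
# accepted_id is None when the name is itself the accepted one.
NAMES = [
    (1, "Panthera leo", "animals", None),
    (2, "Felis leo", "animals", 1),  # older combination, now a synonym
    (3, "Ficus", "plants", None),    # figs
    (4, "Ficus", "animals", None),   # a genus of sea snails
]

BY_ID = {row[0]: row for row in NAMES}


def resolve(name):
    """Return (accepted name, group) for every taxon the string may refer to."""
    matches = []
    for name_id, text, _group, accepted_id in NAMES:
        if text == name:
            target = BY_ID[accepted_id if accepted_id is not None else name_id]
            matches.append((target[1], target[2]))
    return matches


def homonyms():
    """Name strings that are attached to more than one accepted taxon."""
    taxa_by_name = defaultdict(set)
    for name_id, text, _group, accepted_id in NAMES:
        taxa_by_name[text].add(accepted_id if accepted_id is not None else name_id)
    return sorted(text for text, ids in taxa_by_name.items() if len(ids) > 1)


print(resolve("Felis leo"))  # the synonym resolves to its accepted name
print(resolve("Ficus"))      # an ambiguous name returns every candidate
print(homonyms())
```

Returning every candidate for an ambiguous name, rather than picking one, leaves the final choice to the caller, who may know which nomenclatural group is intended.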
While online databases have great benefits (for example, increased access to taxonomic information), they also have issues such as data integrity risks due to on- and off-line versions and continuous updates, technical access issues due to server or internet outage, and differing capacities for complex queries to extract taxonomic data into lists.
As the quantity of information in online taxonomic databases rapidly expands, data aggregation and the integration and alignment of non-standardized data across databases have become major challenges in taxonomy and biodiversity informatics.
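A simple form of alignment is to map each source's own column names onto a shared vocabulary before records are combined. The sketch below uses three Darwin Core term names (`scientificName`, `scientificNameAuthorship`, `taxonRank`) as the target; the source names `source_a` and `source_b`, their column names, and the function `align` are invented for this example.

```python
# Hypothetical per-source column names mapped to Darwin Core term names.
FIELD_MAPS = {
    "source_a": {
        "sci_name": "scientificName",
        "auth": "scientificNameAuthorship",
        "rank": "taxonRank",
    },
    "source_b": {
        "TaxonName": "scientificName",
        "Author": "scientificNameAuthorship",
        "Level": "taxonRank",
    },
}


def align(record, source):
    """Rename a record's fields to shared terms and normalize the values."""
    mapping = FIELD_MAPS[source]
    aligned = {}
    for key, value in record.items():
        term = mapping.get(key)
        if term is None:
            continue  # unmapped fields are simply dropped in this sketch
        aligned[term] = " ".join(str(value).split())  # collapse stray whitespace
    if "taxonRank" in aligned:
        aligned["taxonRank"] = aligned["taxonRank"].lower()
    return aligned


a = align({"sci_name": "Bellis perennis", "auth": "L.", "rank": "Species"}, "source_a")
b = align(
    {"TaxonName": "Bellis  perennis", "Author": "L.", "Level": "SPECIES", "Notes": "x"},
    "source_b",
)
print(a == b)  # True
```

Real aggregation also has to reconcile conflicting classifications and name usages, which a field-level mapping like this does not address.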
See also
* List of biodiversity databases
* Biological classification
* Darwin Core, a body of standards for sharing machine-readable taxonomic data on biodiversity
* Pan-European Species directories Infrastructure