Berkeley DB (BDB) is an
unmaintained embedded database
An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application. It is a broad technology category that includes:
* database systems with differing a ...
software
Software is a set of computer programs and associated software documentation, documentation and data (computing), data. This is in contrast to Computer hardware, hardware, from which the system is built and which actually performs the work.
...
library
A library is a collection of materials, books or media that are accessible for use and not just for display purposes. A library provides physical (hard copies) or digital access (soft copies) materials, and may be a physical location or a vi ...
for
key/value data, historically significant in
open source software
Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose. Op ...
. Berkeley DB is written in
C with API bindings for many other
programming language
A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language.
The description of a programming l ...
s. BDB stores arbitrary key/data pairs as byte arrays, and supports multiple data items for a single key. Berkeley DB is not a
relational database
A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relatio ...
, although it has advanced database features including
database transaction
A database transaction symbolizes a unit of work, performed within a database management system (or similar system) against a database, that is treated in a coherent and reliable way independent of other transactions. A transaction generally repr ...
s,
multiversion concurrency control
Multiversion concurrency control (MCC or MVCC), is a concurrency control method commonly used by database management systems to provide concurrent access to the database and in programming languages to implement transactional memory.
Descriptio ...
and
write-ahead logging
In computer science, write-ahead logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems. A write ahead log is an append-only auxiliary disk-resident structure used for crash ...
. BDB runs on a wide variety of
operating system
An operating system (OS) is system software that manages computer hardware, software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ef ...
s including most
Unix-like
A Unix-like (sometimes referred to as UN*X or *nix) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Unix-li ...
and
Windows
Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for ...
systems, and
real-time operating system
A real-time operating system (RTOS) is an operating system (OS) for real-time applications that processes data and events that have critically defined time constraints. An RTOS is distinct from a time-sharing operating system, such as Unix, which ...
s.
BDB was commercially supported and developed by
Sleepycat Software from 1996 to 2006. Sleepycat Software was acquired by
Oracle Corporation
Oracle Corporation is an American multinational computer technology corporation headquartered in Austin, Texas. In 2020, Oracle was the third-largest software company in the world by revenue and market capitalization. The company sells da ...
in February 2006, who continued to develop and sell the C Berkeley DB library. In 2013 Oracle re-licensed BDB under the
AGPL license. As of 2022 Oracle has ceased to develop BDB.
Bloomberg LP
Bloomberg L.P. is a privately held financial, software, data, and media company headquartered in Midtown Manhattan, New York City. It was co-founded by Michael Bloomberg in 1981, with Thomas Secunda, Duncan MacMillan, Charles Zegar, and a ...
continues to develop a
fork
In cutlery or kitchenware, a fork (from la, furca ' pitchfork') is a utensil, now usually made of metal, whose long handle terminates in a head that branches into several narrow and often slightly curved tines with which one can spear foods ...
of the 2013 version of BDB within their
Comdb2
Comdb2 is an open source, highly available clustered RDBMS developed by Bloomberg LP, built on optimistic concurrency control techniques. It provides multiple isolation levels, including Snapshot and Serializable Isolation. Read/Write transactio ...
database, under the original
Sleepycat permissive license.
Origin
Berkeley DB originated at the
University of California, Berkeley
The University of California, Berkeley (UC Berkeley, Berkeley, Cal, or California) is a public land-grant research university in Berkeley, California. Established in 1868 as the University of California, it is the state's first land-grant u ...
as part of
BSD
The Berkeley Software Distribution or Berkeley Standard Distribution (BSD) is a discontinued operating system based on Research Unix, developed and distributed by the Computer Systems Research Group (CSRG) at the University of California, Be ...
, Berkeley's version of the
Unix
Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
operating system. After 4.3BSD (1986), the BSD developers attempted to remove or replace all code originating in the original
AT&T
AT&T Inc. is an American multinational telecommunications holding company headquartered at Whitacre Tower in Downtown Dallas, Texas. It is the world's largest telecommunications company by revenue and the third largest provider of mobile tel ...
Unix from which BSD was derived. In doing so, they needed to rewrite the Unix database package.
Seltzer and Yigit created a new database, unencumbered by any AT&T patents: an on-disk
hash table
In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', ...
that outperformed the existing
dbm libraries. Berkeley DB itself was first released in 1991 and later included with 4.4BSD. In 1996
Netscape
Netscape Communications Corporation (originally Mosaic Communications Corporation) was an American independent computer services company with headquarters in Mountain View, California and then Dulles, Virginia. Its Netscape web browser was on ...
requested that the authors of Berkeley DB improve and extend the library, then at version 1.86, to suit Netscape's requirements for an
LDAP
The Lightweight Directory Access Protocol (LDAP ) is an open, vendor-neutral, industry standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network. Directory serv ...
server and for use in the
Netscape browser. That request led to the creation of
Sleepycat Software. This company was acquired by
Oracle Corporation
Oracle Corporation is an American multinational computer technology corporation headquartered in Austin, Texas. In 2020, Oracle was the third-largest software company in the world by revenue and market capitalization. The company sells da ...
in February 2006.
Since its initial release, Berkeley DB has gone through various versions. Each major release cycle has introduced a single new major feature generally layering on top of the earlier features to add functionality to the product. The 1.x releases focused on managing key/value data storage and are referred to as "Data Store" (DS). The 2.x releases added a locking system enabling concurrent access to data. This is what is known as "Concurrent Data Store" (CDS). The 3.x releases added a logging system for transactions and recovery, called "Transactional Data Store" (TDS). The 4.x releases added the ability to replicate log records and create a distributed highly available single-master multi-replica database. This is called the "High Availability" (HA) feature set. Berkeley DB's evolution has sometimes led to minor API changes or log format changes, but very rarely have database formats changed. Berkeley DB HA supports online upgrades from one version to the next by maintaining the ability to read and apply the prior release's log records.
The
FreeBSD
FreeBSD is a free and open-source Unix-like operating system descended from the Berkeley Software Distribution (BSD), which was based on Research Unix. The first version of FreeBSD was released in 1993. In 2005, FreeBSD was the most popular ...
and
OpenBSD
OpenBSD is a security-focused operating system, security-focused, free and open-source, Unix-like operating system based on the Berkeley Software Distribution (BSD). Theo de Raadt created OpenBSD in 1995 by fork (software development), forking N ...
operating systems continue to use Berkeley DB 1.8x for compatibility reasons; Linux-based operating systems commonly include several versions to accommodate for applications still using older interfaces/files.
Starting with the 6.0.21 (Oracle 12c) release, all Berkeley DB products are licensed under the
GNU AGPL
The GNU Affero General Public License (GNU AGPL) is a free, copyleft license published by the Free Software Foundation in November 2007, and based on the GNU General Public License, version 3 and the Affero General Public License.
The Free So ...
. Previously, Berkeley DB was redistributed under the 4-clause
BSD license
BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. This is in contrast to copyleft licenses, which have share-alike requirements. The original BSD li ...
(before version 2.0), and the Sleepycat Public License, which is an
OSI-approved
open-source license
An open-source license is a type of license for computer software and other products that allows the source code, blueprint or design to be used, modified and/or shared under defined terms and conditions. This allows end users and commercial compan ...
as well as an
FSF-approved
free software license
A free-software license is a notice that grants the recipient of a piece of software extensive rights to modify and software distribution, redistribute that software. These actions are usually prohibited by copyright law, but the rights-holde ...
. The product ships with complete source code, build script, test suite, and documentation. The comprehensive feature along with the licensing terms have led to its use in a multitude of
free and open-source software
Free and open-source software (FOSS) is a term used to refer to groups of software consisting of both free software and open-source software where anyone is freely licensed to use, copy, study, and change the software in any way, and the source ...
. Those who do not wish to abide by the terms of the GNU AGPL, or use an older version with the Sleepycat Public License, have the option of purchasing another
proprietary license
Proprietary software is software that is deemed within the free and open-source software to be non-free because its creator, publisher, or other rightsholder or rightsholder partner exercises a legal monopoly afforded by modern copyright and inte ...
for redistribution from
Oracle Corporation
Oracle Corporation is an American multinational computer technology corporation headquartered in Austin, Texas. In 2020, Oracle was the third-largest software company in the world by revenue and market capitalization. The company sells da ...
. This technique is called
dual licensing
Multi-licensing is the practice of distributing software under two or more different sets of terms and conditions. This may mean multiple different software licenses or sets of licenses. Prefixes may be used to indicate the number of licenses ...
.
Berkeley DB includes compatibility interfaces for some historic Unix database libraries:
dbm, ndbm and hsearch (a
System V
Unix System V (pronounced: "System Five") is one of the first commercial versions of the Unix operating system. It was originally developed by AT&T and first released in 1983. Four major versions of System V were released, numbered 1, 2, 3, an ...
and
POSIX
The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system- and user-level application programming inte ...
library for creating in-memory
hash table
In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', ...
s).
Architecture
Berkeley DB has an architecture notably simpler than that of other database systems like
relational database management system
A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relati ...
s. For example, like
SQLite
SQLite (, ) is a database engine written in the C programming language. It is not a standalone app; rather, it is a library that software developers embed in their apps. As such, it belongs to the family of embedded databases. It is the mo ...
and
LMDB
Lightning Memory-Mapped Database (LMDB) is a software library that provides an embedded transactional database in the form of a key-value store. LMDB is written in C with API bindings for several programming languages. LMDB stores arbitrar ...
, it is not based on a
server/client model, and does not provide support for network access programs access the database using in-process
API calls. Oracle added support for SQL in 11g R2 release based on the popular SQLite API by including a version of SQLite in Berkeley DB (it uses Berkeley DB for storage).
A program accessing the database is free to decide how the data is to be stored in a record. Berkeley DB puts no constraints on the record's data. The record and its key can both be up to four gigabytes long.
Despite having a simple architecture, Berkeley DB supports many advanced database features such as
ACID transactions, fine-grained
locking, hot
backups and
replication
Replication may refer to:
Science
* Replication (scientific method), one of the main principles of the scientific method, a.k.a. reproducibility
** Replication (statistics), the repetition of a test or complete experiment
** Replication crisi ...
.
Oracle Corporation use of name "Berkeley DB"
The name "Berkeley DB" is used by Oracle Corporation for three different products, only one of which is BDB:
# Berkeley DB, the C database library that is the subject of this article
# Berkeley DB Java Edition, a pure Java library whose design is modelled after the C library but is otherwise unrelated
# Berkeley DB XML, a C++ program that supports
XQuery
XQuery (XML Query) is a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML, text and with vendor-specific extensions for other data formats (JSON, bi ...
, and which includes a legacy version of the C database library
Open Source Programs still using Berkeley DB
BDB was once very widespread, but usage dropped steeply from 2013 (see
licensing section). Notable software that still uses Berkeley DB for data storage include:
*
Bogofilter
Bogofilter is a mail filter that classifies e-mail as spam or ham (non-spam) by a statistical analysis of the message's header and content (body). The program is able to learn from the user's classifications and corrections. It was originally writ ...
– A free/open source
spam filter that saves its wordlists using Berkeley DB by default
*
Citadel
A citadel is the core fortified area of a town or city. It may be a castle, fortress, or fortified center. The term is a diminutive of "city", meaning "little city", because it is a smaller part of the city of which it is the defensive core.
In ...
– A free/open source
groupware platform that keeps all of its data stores, including the message base, in Berkeley DB. Citadel is licensed under the
GPLv3
The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the four freedoms to run, study, share, and modify the software. The license was the first copyleft for general u ...
which is compatible with Oracle BDB licensing
*
Sendmail
Sendmail is a general purpose internetwork email routing facility that supports many kinds of mail-transfer and delivery methods, including the Simple Mail Transfer Protocol (SMTP) used for email transport over the Internet.
A descendant of the ...
– A free/open source
MTA first release in 1983 for Linux/Unix systems and no longer widely used
*
Spamassassin
Apache SpamAssassin is a computer program used for e-mail spam filtering. It uses a variety of spam-detection techniques, including DNS and fuzzy checksum techniques, Bayesian filtering, external programs, blacklists and online databases. It is ...
– A free/open source anti-spam application
Licensing
Berkeley DB V2.0 and higher is available under a
dual license
Multi-licensing is the practice of distributing software under two or more different sets of terms and conditions. This may mean multiple different software licenses or sets of licenses. Prefixes may be used to indicate the number of licenses ...
:
# Oracle commercial license
# The
GNU AGPL v3.
Switching the open source license in 2013 from th
Sleepycat licenseto the AGPL had a major effect on open source software. Since BDB is a library, any application linking to it must be under an AGPL-compatible license. Many open source applications and all closed source applications would need to be relicensed to become AGPL-compatible, which was not acceptable to many developers and open source operating systems. By 2013 there were many alternatives to BDB, and
Debian Linux was typical in their decision to completely phase out Berkeley DB, with a preference for the
Lightning Memory-Mapped Database
Lightning Memory-Mapped Database (LMDB) is a software library that provides an embedded transactional database in the form of a key-value store. LMDB is written in C with API bindings for several programming languages. LMDB stores arbitrary ...
(LMDB).
References
External links
Oracle Berkeley DBOracle Berkeley DB DocumentationLicensing pitfalls for Oracle Technology ProductsOracle Licensing Knowledge Net''The Berkeley DB Book'' by Himanshu Yadava
{{DEFAULTSORT:Berkeley Db
Database engines
Database-related software for Linux
Embedded databases
Free database management systems
Free software programmed in C
Key-value databases
NoSQL
Oracle software
Structured storage
Software using the GNU AGPL license