Gdbm
   HOME

TheInfoList



OR:

In computing, a DBM is a
library A library is a collection of Book, books, and possibly other Document, materials and Media (communication), media, that is accessible for use by its members and members of allied institutions. Libraries provide physical (hard copies) or electron ...
and
file format A file format is a Computer standard, standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary format, pr ...
providing fast, single-keyed access to data. A key-value database from the original
Unix Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
, ''dbm'' is an early example of a
NoSQL NoSQL (originally meaning "Not only SQL" or "non-relational") refers to a type of database design that stores and retrieves data differently from the traditional table-based structure of relational databases. Unlike relational databases, which ...
system.


History

The original ''dbm'' library and file format was a simple
database engine A database engine (or storage engine) is the underlying software component that a database management system (DBMS) uses to create, read, update and delete (CRUD) data from a database. Most database management systems include their own application ...
, originally written by
Ken Thompson Kenneth Lane Thompson (born February 4, 1943) is an American pioneer of computer science. Thompson worked at Bell Labs for most of his career where he designed and implemented the original Unix operating system. He also invented the B (programmi ...
and released by
AT&T AT&T Inc., an abbreviation for its predecessor's former name, the American Telephone and Telegraph Company, is an American multinational telecommunications holding company headquartered at Whitacre Tower in Downtown Dallas, Texas. It is the w ...
in 1979. The name is a
three-letter acronym A three-letter acronym (TLA), or three-letter abbreviation is, as the phrase suggests, an abbreviation consisting of three letters. The term has a special status among abbreviations and to some is considered humorous since the term ''TLA'' is its ...
for ''DataBase Manager'', and can also refer to the family of database engines with APIs and features derived from the original ''dbm''. The ''dbm'' library stores arbitrary data by use of a single key (a
primary key In the relational model of databases, a primary key is a designated attribute (column) that can reliably identify and distinguish between each individual record in a table. The database creator can choose an existing unique attribute or combinati ...
) in fixed-size buckets and uses hashing techniques to enable fast retrieval of the data by key. The hashing scheme used is a form of
extendible hashing Extendible hashing is a type of hash system which treats a hash as a bit string and uses a trie for bucket lookup. Because of the hierarchical nature of the system, re-hashing is an incremental operation (done one bucket at a time, as needed). T ...
, so that the hashing scheme expands as new buckets are added to the database, meaning that, when nearly empty, the database starts with one bucket, which is then split when it becomes full. The two resulting child buckets will themselves split when they become full, so the database grows as keys are added. The ''dbm'' library and its derivatives are pre-
relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
s they manage
associative array In computer science, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In math ...
s, implemented as on-disk
hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
s. In practice, they can offer a more practical solution for high-speed storage accessed by key, as they do not require the overhead of connecting and preparing queries. This is balanced by the fact that they can generally only be opened for writing by a single process at a time. An agent
daemon A demon is a malevolent supernatural being, evil spirit or fiend in religion, occultism, literature, fiction, mythology and folklore. Demon, daemon or dæmon may also refer to: Entertainment Fictional entities * Daemon (G.I. Joe), a character ...
can handle requests from multiple processes, but introduces
IPC IPC may refer to: Businesses and organizations Arts and media * Intellectual Property Committee, a coalition of US corporations with intellectual property interests * International Panorama Council, an international network of specialists in ...
overhead.


Implementations

The original AT&T ''dbm'' library has been replaced by its many successor implementations. Notable examples include: * ''ndbm'' ("new dbm"), based on the original dbm with some new features.
GDBM
("GNU dbm"),
GNU GNU ( ) is an extensive collection of free software (394 packages ), which can be used as an operating system or can be used in parts with other operating systems. The use of the completed GNU tools led to the family of operating systems popu ...
rewrite of the library implementing ''ndbm'' features and its own interface. Also provides new features like crash tolerance for guaranteeing data consistency. * ''sdbm'' ("small dbm"), a
public domain The public domain (PD) consists of all the creative work to which no Exclusive exclusive intellectual property rights apply. Those rights may have expired, been forfeited, expressly Waiver, waived, or may be inapplicable. Because no one holds ...
rewrite of ''dbm''. It is a part of the standard distribution for
Perl Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was developed ...
and is available as an external library for
Ruby Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...
. * ''qdbm'' ("Quick Database Manager"), a higher-performance ''dbm'' employing many of the same techniques as Tokyo/Kyoto Cabinet. Written by the same author before they moved on to the cabinets. * ''tdb'' ("Trivial Database"), a simple database used by
Samba Samba () is a broad term for many of the rhythms that compose the better known Brazilian music genres that originated in the Afro-Brazilians, Afro Brazilian communities of Bahia in the late 19th century and early 20th century, It is a name or ...
that supports multiple writers. Has a gdbm-based API. *
Berkeley DB Berkeley DB (BDB) is an embedded database software library for key/value data, historically significant in open-source software. Berkeley DB is written in C with API bindings for many other programming languages. BDB stores arbitrary key/data ...
, 1991 replacement of ndbm by Sleepycat Software (now
Oracle An oracle is a person or thing considered to provide insight, wise counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. If done through occultic means, it is a form of divination. Descript ...
) created to get around the AT&T Unix copyright on
BSD The Berkeley Software Distribution (BSD), also known as Berkeley Unix or BSD Unix, is a discontinued Unix operating system developed and distributed by the Computer Systems Research Group (CSRG) at the University of California, Berkeley, beginni ...
. It features many extensions like parallelism, transactional control, hashing, and B-tree storage. * LMDB:
copy-on-write Copy-on-write (COW), also called implicit sharing or shadowing, is a resource-management technique used in programming to manage shared data efficiently. Instead of copying data right away when multiple programs use it, the same data is shared ...
memory-mapped
B+ tree A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children. A B+ tree can be viewed as a B ...
implementation in C with a Berkeley-style API. The following databases are dbm-inspired, but they do not directly provide a dbm interface, even though it would be trivial to wrap one: * cdb ("constant database"), database by Daniel J. Bernstein, database files can only be created and read, but never be modified * Tkrzw, an Apache 2.0 licensed successor to Kyoto Cabinet and Tokyo Cabinet *
WiredTiger WiredTiger is a NoSQL, open source extensible platform for data management. It is released under version 2 or 3 of the GNU General Public License. WiredTiger uses multiversion concurrency control (MVCC) architecture. Details MongoDB acquired Wired ...
: database with traditional row-oriented and column-oriented storage.


Availability

As of 2001, the ''ndbm'' implementation of DBM was standard on Solaris and IRIX, whereas ''gdbm'' is ubiquitous on
Linux Linux ( ) is a family of open source Unix-like operating systems based on the Linux kernel, an kernel (operating system), operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically package manager, pac ...
. The Berkeley DB implementations were standard on some free operating systems. After a change of licensing of the Berkeley DB to
GNU AGPL The GNU Affero General Public License (GNU AGPL) is a free, copyleft license published by the Free Software Foundation in November 2007, and based on the GNU GPL version 3 and the ''Affero General Public License'' (non-GNU). It is intended for ...
in 2013, projects like
Debian Debian () is a free and open-source software, free and open source Linux distribution, developed by the Debian Project, which was established by Ian Murdock in August 1993. Debian is one of the oldest operating systems based on the Linux kerne ...
have moved to LMDB.


Reliability

A 2018 AFL
fuzzing In programming and software development, fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptio ...
test against many DBM-family databases exposed many problems in implementations when it comes to corrupt or invalid database files. Only ''freecdb'' by Daniel J. Bernstein showed no crashes. The authors of gdbm, tdb, and lmdb were prompt to respond. Berkeley DB fell behind due to the sheer amount of other issues; the fixes would be irrelevant to open-source software users due to the licensing change locking them back on an old version.


See also

*
Embedded database An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application (instead of coming as a standalone application). It is a broad technology category that ...
*
Flat file database A flat-file database is a database stored in a file called a flat file. Records follow a uniform format, and there are no structures for indexing or recognizing relationships between records. The file is simple. A flat file can be a plain t ...
*
ISAM Indexed Sequential Access Method (ISAM) is a method for creating, maintaining, and manipulating computer files of data so that records can be retrieved sequentially or randomly by one or more keys. Indexes of key fields are maintained to achieve ...
*
Key–value database A key–value database, or key–value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a ''dictionary'' or ''hash table''. Dictionaries contain a ...
*
Mobile database Mobile computing devices (e.g., smartphones and Personal digital assistant, PDAs) store and share data over a mobile network, or access a database which is actually stored by the mobile device. This could be a list of contacts, price information, di ...
*
NoSQL NoSQL (originally meaning "Not only SQL" or "non-relational") refers to a type of database design that stores and retrieves data differently from the traditional table-based structure of relational databases. Unlike relational databases, which ...
*
Semaphore (programming) In computer science, a semaphore is a variable or abstract data type used to control access to a common resource by multiple threads and avoid critical section problems in a concurrent system such as a multitasking operating system. Semaphor ...


References


Bibliography

* * *
SDBM library
@
Apache The Apache ( ) are several Southern Athabaskan language-speaking peoples of the Southwestern United States, Southwest, the Southern Plains and Northern Mexico. They are linguistically related to the Navajo. They migrated from the Athabascan ho ...
* *{{cite web , last1=Olson , first1=Michael A. , last2=Bostic , first2=Keith , last3=Seltzer , first3=Margo , year=1999 , title=Berkeley DB , work=Proceedings of the FREENIX Track:1999 USENIX Annual Technical Conference , url=http://www.usenix.org/events/usenix99/full_papers/olson/olson.pdf Database engines Free database management systems Structured storage Embedded databases