A flat-file database is a database stored in a file called a flat file. The file is simple: records follow a uniform format, and there are no structures for indexing or recognizing relationships between records. A flat file can be a plain text file (e.g. csv, txt or tsv) or a binary file. Relationships can be inferred from the data in the database, but the database format itself does not make those relationships explicit. The term has generally implied a small database, but very large databases can also be flat.


Overview

Plain text files usually contain one record per line. There are different conventions for depicting data. In comma-separated values and delimiter-separated values files, fields can be separated by delimiters such as comma or tab characters. In other cases, each field may have a fixed length; short values may be padded with space characters. Extra formatting may be needed to avoid delimiter collision.

Using delimiters incurs some overhead in locating them every time they are processed (unlike fixed-width formatting), which may have performance implications. However, use of character delimiters (especially commas) is also a crude form of data compression, which may assist overall performance by reducing data volumes, especially for data transmission purposes. Use of character delimiters which include a length component (declarative notation) is comparatively rare but can greatly reduce the overhead associated with locating the extent of each field.

Examples of flat files include /etc/passwd and /etc/group on Unix-like operating systems. Another example of a flat file is a name-and-address list with the fields Name, Address, and Phone Number. A list of names, addresses, and phone numbers written by hand on a sheet of paper is a flat-file database. This can also be done with any typewriter or word processor. A spreadsheet or text editor program may be used to implement a flat-file database, which may then be printed or used online for improved search capabilities.
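The three field conventions described in this section (delimited, fixed-width, and length-prefixed) can be compared in a short Python sketch. The sample record, the column widths, and the `<length>:<value>` encoding are assumptions chosen for illustration, not a standard format; the quoted field shows one common way to avoid delimiter collision.

```python
import csv
import io

# Hypothetical sample data; names, widths, and values are illustrative only.
DELIMITED = 'name,address,phone\n"Doe, Jane",12 Elm St,555-0100\n'
FIXED_WIDTHS = (10, 12, 8)  # assumed column widths for the fixed-width variant
FIXED = "Doe, Jane".ljust(10) + "12 Elm St".ljust(12) + "555-0100".ljust(8) + "\n"


def parse_delimited(text):
    """Parse comma-separated records; the csv module handles quoted commas."""
    return list(csv.reader(io.StringIO(text)))


def parse_fixed(text, widths):
    """Slice each line at fixed offsets and strip the space padding."""
    rows = []
    for line in text.splitlines():
        fields, start = [], 0
        for width in widths:
            fields.append(line[start:start + width].strip())
            start += width
        rows.append(fields)
    return rows


def encode_length_prefixed(fields):
    """Encode each field as '<length>:<value>' (an assumed, netstring-like form)."""
    return "".join(f"{len(field)}:{field}" for field in fields)


def decode_length_prefixed(record):
    """Read the declared length, then skip over the value without scanning it."""
    fields, pos = [], 0
    while pos < len(record):
        colon = record.index(":", pos)
        length = int(record[pos:colon])
        start = colon + 1
        fields.append(record[start:start + length])
        pos = start + length
    return fields
```

In the delimited form the comma inside "Doe, Jane" has to be protected by quoting, while the fixed-width and length-prefixed forms never inspect the field contents for delimiters.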


History

Herman Hollerith's work for the US Census Bureau, first exercised in the 1890 United States Census and involving data tabulated via hole punches in paper cards, is sometimes considered the first computerized flat-file database, as it included no cards indexing other cards, or otherwise relating the individual cards to one another, save by their group membership.

In the 1980s, configurable flat-file database computer applications were popular on the IBM PC and the Macintosh. These programs were designed to make it easy for individuals to design and use their own databases, and were almost on par with word processors and spreadsheets in popularity. Examples of flat-file database software include early versions of FileMaker, the shareware PC-File, and the popular dBase. Flat-file databases are ubiquitous because they are easy to write and edit, and suit myriad purposes in an uncomplicated way.


Modern implementations

Linear stores of NoSQL data, JSON-formatted data, primitive spreadsheets (perhaps comma-separated or tab-delimited), and text files can all be seen as flat-file databases, because they lack integrated indexes, built-in references between data elements, or complex data types. Programs to manage collections of books or appointments and address books may use essentially single-purpose flat-file databases, storing and retrieving information from flat files unadorned with indexes or pointing systems.

While a user can write a table of contents into a text file, the text file format itself does not include a concept of a table of contents. While a user may write "friends with Kathy" in the "Notes" section for John's contact information, this is interpreted by the user rather than a built-in feature of the database. When a database system begins to recognize and codify relationships between records, it begins to drift away from being "flat," and when it has a detailed system for describing types and hierarchical relationships, it is now too structured to be considered "flat."
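The "friends with Kathy" case can be sketched in Python to show where the relationship actually lives. The tab-delimited layout, the field names, and the `friends with` convention below are all hypothetical; the point is only that the lookup is a linear scan and that the link between records is a rule in the application code, not something the file format records.

```python
import csv
import io

# Hypothetical address-book flat file: tab-delimited, one record per line.
CONTACTS = (
    "name\tphone\tnotes\n"
    "John\t555-0101\tfriends with Kathy\n"
    "Kathy\t555-0102\t\n"
)


def load(text):
    """Read every record into a dict keyed by the header row's field names."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def find_by_name(records, name):
    """Linear scan: with no index, a lookup may have to visit every record."""
    for record in records:
        if record["name"] == name:
            return record
    return None


def inferred_friends(records, name):
    """Application-level convention: read 'friends with X' in notes as a link.

    The file format knows nothing about this; the rule lives in this function.
    """
    record = find_by_name(records, name)
    prefix = "friends with "
    if record is None or not record["notes"].startswith(prefix):
        return []
    friend = record["notes"][len(prefix):]
    return [r for r in records if r["name"] == friend]
```

If the note were worded differently (for example "Kathy's friend"), this function would find nothing, which illustrates why such relationships are considered inferred rather than stored.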


Example database

The following example illustrates typical elements of a flat-file database. The data arrangement consists of a series of columns and rows organized into a tabular format. This specific example uses only one table. The columns include: a numeric unique ID (used to uniquely identify records, first column); name (a person's name, second column); and team (the name of an athletic team supported by the person, third column).

Here is an example textual representation of the described data:

    id    name       team
    1     Amy        Blues
    2     Bob        Reds
    3     Chuck      Blues
    4     Richard    Blues
    5     Ethel      Reds
    6     Fred       Blues
    7     Gilly      Blues
    8     Hank       Reds
    9     Hank       Blues

This type of data representation is quite standard for a flat-file database, although there are some additional considerations that are not readily apparent from the text:

* Data types: each column in a database table such as the one above is ordinarily restricted to a specific data type. Such restrictions are usually established by convention, but not formally indicated unless the data is transferred to a relational database system.
* Separated columns: In the above example, individual columns are separated using whitespace characters. This is also called indentation or "fixed-width" data formatting. Another common convention is to separate columns using one or more delimiter characters, such as a tab or comma.
* Relational algebra: Each row or record in the above table meets the standard definition of a tuple under relational algebra (the above example depicts a series of 3-tuples). Additionally, the first row specifies the field names that are associated with the values of each row.
* Database management system: Since the formal operations possible with a text file are usually more limited than desired, the text in the above example would ordinarily represent an intermediary state of the data prior to being transferred into a database management system.
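The considerations listed above can be illustrated with a minimal Python sketch that reads the example table as 3-tuples, applies the column types by convention, and then transfers the rows into a database management system. SQLite (via the standard `sqlite3` module) and the table name `fans` are choices made here for illustration; the source does not name a particular system.

```python
import sqlite3

# The example table from the text, whitespace-separated with a header row.
TABLE = """\
id name team
1 Amy Blues
2 Bob Reds
3 Chuck Blues
4 Richard Blues
5 Ethel Reds
6 Fred Blues
7 Gilly Blues
8 Hank Reds
9 Hank Blues
"""


def parse(text):
    """Split on whitespace; return the header and a list of typed 3-tuples."""
    lines = text.splitlines()
    header = tuple(lines[0].split())
    rows = []
    for line in lines[1:]:
        raw_id, name, team = line.split()
        # The flat file does not declare types; id is an integer by convention.
        rows.append((int(raw_id), name, team))
    return header, rows


def load_into_sqlite(rows):
    """Copy the tuples into an in-memory SQLite table with declared types."""
    conn = sqlite3.connect(":memory:")
    # "fans" is a hypothetical table name chosen for this sketch.
    conn.execute("CREATE TABLE fans (id INTEGER PRIMARY KEY, name TEXT, team TEXT)")
    conn.executemany("INSERT INTO fans VALUES (?, ?, ?)", rows)
    return conn


header, rows = parse(TABLE)
conn = load_into_sqlite(rows)
blues = conn.execute(
    "SELECT COUNT(*) FROM fans WHERE team = ?", ("Blues",)
).fetchone()[0]
print(blues)  # 6
```

Note that splitting on whitespace only works here because no value contains a space; a name such as "Mary Ann" would need either true fixed-width slicing or a delimiter with quoting.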


See also

* /etc/passwd, a commonly used flat file, used to detail users in Unix
* CSV (standard Comma-Separated Values)
* Berkeley DB (typical flat-file database)
* Awk (classical flat-file processor)
* Recfiles (plain text database file format)

