Database design is the organization of data according to a database model. The designer determines what data must be stored and how the data elements interrelate. With this information, they can begin to fit the data to the database model. [Teorey, T.J., Lightstone, S.S., et al. (2009). Database Design: Know It All. 1st ed. Burlington, MA: Morgan Kaufmann Publishers] A database management system manages the data accordingly.
Database design is a process that consists of several steps.
Conceptual data modeling
The first step of database design involves classifying data and identifying interrelationships. The theoretical representation of data is called an ''ontology'' or a ''conceptual data model''.
Determining data to be stored
In a majority of cases, the person designing a database has expertise in database design rather than in the domain from which the data to be stored is drawn (e.g. financial information, biological information, etc.). Therefore, the data to be stored in a particular database must be determined in cooperation with a person who does have expertise in that domain and who is aware of the meaning of the data to be stored within the system.
This process is generally considered part of requirements analysis, and requires skill on the part of the database designer to elicit the needed information from those with the domain knowledge. This is because those with the necessary domain knowledge often cannot clearly express the system requirements for the database, as they are unaccustomed to thinking in terms of the discrete data elements which must be stored. The data to be stored can be determined from the requirements specification.
Determining data relationships
Once a database designer is aware of the data which is to be stored within the database, they must then determine where dependencies exist within the data. Sometimes changing one piece of data implicitly changes other data that is not visible. For example, in a list of names and addresses, assuming a situation where multiple people can have the same address, but one person cannot have more than one address, the address is dependent upon the name. When provided a name and the list, the address can be uniquely determined; however, the inverse does not hold: when given an address and the list, a name cannot be uniquely determined because multiple people can reside at an address. Because an address is determined by a name, an address is considered dependent on a name.
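The name-and-address dependency described above can be checked mechanically. The following is a minimal sketch, assuming a small hypothetical list of (name, address) pairs in which names are unique identifiers; the helper `determines` and the sample data are illustrative and not part of any library.

```python
from collections import defaultdict

# Hypothetical sample data: (name, address) pairs. Two people share an
# address, but each person has exactly one address.
people = [
    ("Alice", "12 Oak St"),
    ("Bob", "12 Oak St"),
    ("Carol", "7 Elm Ave"),
]

def determines(rows, lhs, rhs):
    """Return True if column index `lhs` uniquely determines column index `rhs`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    # The dependency holds when every lhs value maps to exactly one rhs value.
    return all(len(values) == 1 for values in seen.values())

name_determines_address = determines(people, 0, 1)  # name -> address
address_determines_name = determines(people, 1, 0)  # address -> name
print(name_determines_address, address_determines_name)
```

With this sample data the first check succeeds and the second fails, which mirrors the one-way dependency in the example: an address depends on a name, but not the reverse.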
(NOTE: A common misconception is that the relational model is so called because of the stating of relationships between data elements therein. This is not true. The relational model is so named because it is based upon the mathematical structures known as relations.)
Conceptual schema
The information obtained can be formalized in a diagram or schema. At this stage, it is a conceptual schema.
ER diagram (entity–relationship model)
One of the most common types of conceptual schema is the ER (entity–relationship model) diagram.
Attributes in ER diagrams are usually modeled as an oval with the name of the attribute, linked to the entity or relationship that contains the attribute.
ER models are commonly used in information system design; for example, they are used to describe information requirements and/or the types of information to be stored in the database during the conceptual structure design phase.
Logical data modeling
Once the relationships and dependencies amongst the various pieces of information have been determined, it is possible to arrange the data into a logical structure which can then be mapped into the storage objects supported by the database management system. In the case of relational databases the storage objects are tables which store data in rows and columns. In an object database the storage objects correspond directly to the objects used by the object-oriented programming language used to write the applications that will manage and access the data. The relationships may be defined as attributes of the object classes involved or as methods that operate on the object classes.
This mapping is generally performed so that each set of related data which depends upon a single object, whether real or abstract, is placed in a table. Relationships between these dependent objects are then stored as links between the various objects.
Each table may represent an implementation of either a logical object or a relationship joining one or more instances of one or more logical objects. Relationships between tables may then be stored as links connecting child tables with parents. Since complex logical relationships are themselves tables they will probably have links to more than one parent.
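As an illustration of this mapping, the sketch below uses Python's built-in `sqlite3` module with a hypothetical student/course schema (the table and column names are assumptions chosen for the example). Each logical object gets its own table, and the relationship between them is itself a table with links to two parents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- One table per logical object.
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- A relationship stored as its own table, linked to two parent tables.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
""")

conn.execute("INSERT INTO student VALUES (1, 'Alice')")
conn.execute("INSERT INTO course VALUES (10, 'Databases')")
conn.execute("INSERT INTO enrollment VALUES (1, 10)")

# Following the links from the relationship table back to both parents.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM enrollment e
    JOIN student s ON s.student_id = e.student_id
    JOIN course  c ON c.course_id  = e.course_id
""").fetchall()
print(rows)

# A link to a parent row that does not exist is rejected.
try:
    conn.execute("INSERT INTO enrollment VALUES (2, 10)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

Other database management systems express the same links with similar foreign key declarations, though the exact syntax and enforcement defaults differ.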
Normalization
In the field of relational database design, ''normalization'' is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics (insertion, update, and deletion anomalies) that could lead to loss of data integrity.
A standard piece of database design guidance is that the designer should create a fully normalized design; selective denormalization can subsequently be performed, but only for performance reasons. The trade-off is storage space versus performance. The more normalized the design is, the less data redundancy there is (and therefore the less space it takes to store); however, common data retrieval patterns may then need complex joins, merges, and sorts, which cost more data reads and compute cycles. Some modeling disciplines, such as the dimensional modeling approach to data warehouse design, explicitly recommend non-normalized designs, i.e. designs that in large part do not adhere to 3NF. The normal forms are 1NF, 2NF, 3NF, Boyce-Codd NF (3.5NF), 4NF, 5NF and 6NF.
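The trade-off can be made concrete with a small sketch using Python's built-in `sqlite3` module. The order and customer tables below are hypothetical and exist only for illustration: the flat table repeats the customer's city on every row and is exposed to an update anomaly, while the normalized pair stores the city once but needs a join to read it back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Unnormalized: the customer's city is repeated on every order row.
    CREATE TABLE orders_flat (
        order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT, item TEXT);
    INSERT INTO orders_flat VALUES
        (1, 'Alice', 'Paris', 'pen'), (2, 'Alice', 'Paris', 'ink');

    -- Normalized: the city is stored once and orders reference the customer.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        item TEXT);
    INSERT INTO customer VALUES (1, 'Alice', 'Paris');
    INSERT INTO orders VALUES (1, 1, 'pen'), (2, 1, 'ink');
""")

# Update anomaly: changing only one redundant copy leaves the data inconsistent.
conn.execute("UPDATE orders_flat SET city = 'Lyon' WHERE order_id = 1")
flat_cities = {row[0] for row in conn.execute(
    "SELECT city FROM orders_flat WHERE customer = 'Alice'")}

# Normalized design: one update is enough, at the cost of a join on read.
conn.execute("UPDATE customer SET city = 'Lyon' WHERE customer_id = 1")
normalized_cities = {row[0] for row in conn.execute("""
    SELECT c.city FROM orders o
    JOIN customer c ON c.customer_id = o.customer_id
""")}
print(flat_cities, normalized_cities)
```

After the updates, the flat table reports two different cities for the same customer, whereas the normalized tables report one.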
Document databases take a different approach. A document stored in such a database typically contains more than one normalized data unit, and often the relationships between the units as well. If all the data units and the relationships in question are often retrieved together, then this approach minimizes the number of retrievals. It also simplifies how data gets replicated, because there is now a clearly identifiable unit of data whose consistency is self-contained. Another consideration is that reading or writing a single document in such databases requires only a single transaction, which can be important in a microservices architecture. In such situations, portions of the document are often retrieved from other services via an API and stored locally for efficiency reasons. If the data units were split out across the services, then a read (or write) to support a service consumer might require more than one service call, and this could result in the management of multiple transactions, which may not be preferred.
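A minimal sketch of the document approach follows, assuming a plain Python dict as a stand-in for a document store. The `put` and `get` helpers and the shape of the order document are hypothetical and do not correspond to any particular database's API.

```python
import json

# Hypothetical order document: the order, its line items, and a local copy of
# the customer data are stored together as one unit.
order_document = {
    "order_id": 1,
    "customer": {"customer_id": 7, "name": "Alice", "city": "Paris"},
    "items": [
        {"sku": "pen", "quantity": 2},
        {"sku": "ink", "quantity": 1},
    ],
}

# A plain dict stands in for a document store keyed by document id.
store = {}

def put(doc_id, document):
    """Write the whole document in one operation."""
    store[doc_id] = json.dumps(document)

def get(doc_id):
    """Read the whole document, including its nested units, in one operation."""
    return json.loads(store[doc_id])

put("order:1", order_document)
loaded = get("order:1")
total_quantity = sum(item["quantity"] for item in loaded["items"])
print(loaded["customer"]["name"], total_quantity)
```

One read returns the order together with its items and customer details, where a normalized relational design would typically join several tables. The embedded customer copy is redundant data that has to be refreshed if the source changes.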
Physical design
Physical data modeling
The physical design of the database specifies the physical configuration of the database on the storage media. This includes detailed specification of data elements and data types.
Other physical design
This step involves specifying the indexing options and other parameters residing in the DBMS data dictionary. It is the detailed design of a system, including its modules and the hardware and software specifications of the database. Some aspects that are addressed at the physical layer:
* Performance – mainly addressed via indexing for read/update/delete queries, and data type choice for insert queries.
* Replication – which pieces of data get copied over into another database, and how often. Are there multiple masters, or a single one?
* High availability – whether the configuration is active-passive or active-active, along with the topology, coordination scheme, reliability targets, etc., all have to be defined.
* Partitioning – if the database is distributed, then for a single entity, how the data is distributed amongst all the partitions of the database, and how partition failure is taken into account.
* Backup and restore schemes.
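The indexing point in the list above can be illustrated with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `person` table, the index name `idx_person_city`, and the generated rows are hypothetical, and the wording of the query plan text varies between SQLite versions, so no exact output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO person (name, city) VALUES (?, ?)",
    [(f"name{i}", f"city{i % 10}") for i in range(1000)],
)

def plan(sql):
    """Return the textual query plan SQLite reports for a statement."""
    # The last column of each EXPLAIN QUERY PLAN row holds the description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT name FROM person WHERE city = 'city3'"
plan_before = plan(query)   # without an index on city: a full table scan
conn.execute("CREATE INDEX idx_person_city ON person (city)")
plan_after = plan(query)    # with the index: a search using idx_person_city
print(plan_before)
print(plan_after)
```

The index speeds up reads that filter on `city`, but it must be maintained on every insert, update, and delete, which is part of the physical design trade-off.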
At the application level, other aspects of the physical design can include the need to define stored procedures, materialized query views, OLAP cubes, etc.
Further reading
*S. Lightstone, T. Teorey, T. Nadeau, "Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more", Morgan Kaufmann Press, 2007.
*M. Hernandez, "Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design", 3rd Edition, Addison-Wesley Professional, 2013.
External links
*Database Normalization Basics by Mike Chapple (About.com)
*Database Normalization Intro, Part 2