HOME

TheInfoList



OR:

''In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, data (treated as singular, plural, or as a mass noun) is any sequence of one or more
symbol A symbol is a mark, Sign (semiotics), sign, or word that indicates, signifies, or is understood as representing an idea, physical object, object, or wikt:relationship, relationship. Symbols allow people to go beyond what is known or seen by cr ...
s; datum is a single symbol of data. Data requires interpretation to become
information Information is an Abstraction, abstract concept that refers to something which has the power Communication, to inform. At the most fundamental level, it pertains to the Interpretation (philosophy), interpretation (perhaps Interpretation (log ...
.
Digital data Digital data, in information theory and information systems, is information represented as a string of Discrete mathematics, discrete symbols, each of which can take on one of only a finite number of values from some alphabet (formal languages ...
is data that is represented using the
binary number A binary number is a number expressed in the Radix, base-2 numeral system or binary numeral system, a method for representing numbers that uses only two symbols for the natural numbers: typically "0" (zero) and "1" (one). A ''binary number'' may ...
system of ones (1) and zeros (0), instead of analog representation. In modern (post-1960) computer systems, all data is digital.'' Data exists in three states: data at rest, data in transit and data in use. Data within a computer, in most cases, moves as parallel data. Data moving to or from a computer, in most cases, moves as serial data. Data sourced from an analog device, such as a temperature sensor, may be converted to digital using an analog-to-digital converter. Data representing quantities, characters, or symbols on which operations are performed by a
computer A computer is a machine that can be Computer programming, programmed to automatically Execution (computing), carry out sequences of arithmetic or logical operations (''computation''). Modern digital electronic computers can perform generic set ...
are stored and recorded on magnetic, optical, electronic, or mechanical recording media, and transmitted in the form of digital electrical or optical signals. Data pass in and out of computers via peripheral devices. Physical
computer memory Computer memory stores information, such as data and programs, for immediate use in the computer. The term ''memory'' is often synonymous with the terms ''RAM,'' ''main memory,'' or ''primary storage.'' Archaic synonyms for main memory include ...
elements consist of an address and a byte/word of data storage. Digital data are often stored in relational databases, like tables or SQL databases, and can generally be represented as abstract key/value pairs. Data can be organized in many different types of
data structure In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
s, including arrays, graphs, and objects. Data structures can store data of many different
types Type may refer to: Science and technology Computing * Typing, producing text via a keyboard, typewriter, etc. * Data type, collection of values used for computations. * File type * TYPE (DOS command), a command to display contents of a file. * Ty ...
, including
numbers A number is a mathematical object used to count, measure, and label. The most basic examples are the natural numbers 1, 2, 3, 4, and so forth. Numbers can be represented in language with number words. More universally, individual numbers can ...
, strings and even other
data structures In computer science, a data structure is a data organization and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, and the functi ...
.


Characteristics

Metadata Metadata (or metainformation) is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive ...
helps translate data to information. Metadata is data about the data. Metadata may be implied, specified or given. Data relating to physical events or processes will have a temporal component. This temporal component may be implied. This is the case when a device such as a temperature logger receives data from a temperature sensor. When the temperature is received it is assumed that the data has a temporal reference of ''now''. So the device records the date, time and temperature together. When the data logger communicates temperatures, it must also report the date and time as metadata for each temperature reading. Fundamentally, computers follow a sequence of instructions they are given in the form of data. A set of instructions to perform a given task (or tasks) is called a '' program''. A program is data in the form of coded instructions to control the operation of a computer or other machine. In the nominal case, the program, as executed by the computer, will consist of machine code. The elements of storage manipulated by the program, but not actually executed by the
central processing unit A central processing unit (CPU), also called a central processor, main processor, or just processor, is the primary Processor (computing), processor in a given computer. Its electronic circuitry executes Instruction (computing), instructions ...
(CPU), are also data. At its most essential, a single datum is a value stored at a specific location. Therefore, it is possible for computer programs to operate on other computer programs, by manipulating their programmatic data. To store data
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable un ...
s in a file, they have to be serialized in a
file format A file format is a Computer standard, standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary format, pr ...
. Typically, programs are stored in special file types, different from those used for other data. Executable files contain programs; all other files are also data files. However, executable files may also contain data used by the program which is built into the program. In particular, some executable files have a data segment, which nominally contains constants and initial values for variables, both of which can be considered data. The line between program and data can become blurry. An interpreter, for example, is a program. The input data to an interpreter is itself a program, just not one expressed in native machine language. In many cases, the interpreted program will be a human-readable
text file A text file (sometimes spelled textfile; an old alternative name is flat file) is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In ope ...
, which is manipulated with a text editor program. Metaprogramming similarly involves programs manipulating other programs as data. Programs like
compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
s,
linker Linker or linkers may refer to: Computing * Linker (computing), a computer program that takes one or more object files generated by a compiler or generated by an assembler and links them with libraries, generating an executable program or shar ...
s,
debugger A debugger is a computer program used to test and debug other programs (the "target" programs). Common features of debuggers include the ability to run or halt the target program using breakpoints, step through code line by line, and display ...
s, program updaters, virus scanners and such use other programs as their data. For example, a user might first instruct the
operating system An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ...
to load a word processor program from one file, and then use the running program to open and edit a
document A document is a writing, written, drawing, drawn, presented, or memorialized representation of thought, often the manifestation of nonfiction, non-fictional, as well as fictional, content. The word originates from the Latin ', which denotes ...
stored in another file. In this example, the document would be considered data. If the word processor also features a spell checker, then the dictionary (word list) for the spell checker would also be considered data. The
algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
s used by the spell checker to suggest corrections would be either machine code data or text in some interpretable
programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
. In an alternate usage, binary files (which are not human-readable) are sometimes called ''data'' as distinguished from human-readable ''
text Text may refer to: Written word * Text (literary theory) In literary theory, a text is any object that can be "read", whether this object is a work of literature, a street sign, an arrangement of buildings on a city block, or styles of clothi ...
''. The total amount of digital data in 2007 was estimated to be 281 billion
gigabyte The gigabyte () is a multiple of the unit byte for digital information. The SI prefix, prefix ''giga-, giga'' means 109 in the International System of Units (SI). Therefore, one gigabyte is one billion bytes. The unit symbol for the gigabyte i ...
s (281 exabytes).


Data keys and values, structures and persistence

Keys in data provide the context for values. Regardless of the structure of data, there is always a key component present. Keys in data and data-structures are essential for giving meaning to data values. Without a key that is directly or indirectly associated with a value, or collection of values in a structure, the values become meaningless and cease to be data. That is to say, there has to be a key component linked to a value component in order for it to be considered data. Data can be represented in computers in multiple ways, as per the following examples:


RAM

* Random access memory (RAM) holds data that the CPU has direct access to. A CPU may only manipulate data within its processor registers or memory. This is as opposed to data storage, where the CPU must direct the transfer of data between the storage device (disk, tape...) and memory. RAM is an array of linear contiguous locations that a processor may read or write by providing an address for the read or write operation. The processor may operate on any location in memory at any time in any order. In RAM the smallest element of data is the binary bit. The capabilities and limitations of accessing RAM are processor specific. In general main memory is arranged as an array of locations beginning at address 0 (
hexadecimal Hexadecimal (also known as base-16 or simply hex) is a Numeral system#Positional systems in detail, positional numeral system that represents numbers using a radix (base) of sixteen. Unlike the decimal system representing numbers using ten symbo ...
0). Each location can store usually 8 or 32 bits depending on the computer architecture.


Keys

* Data keys need not be a direct hardware address in memory. Indirect, abstract and logical keys codes can be stored in association with values to form a
data structure In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
. Data structures have predetermined offsets (or links or paths) from the start of the structure, in which data values are stored. Therefore, the data key consists of the key to the structure plus the offset (or links or paths) into the structure. When such a structure is repeated, storing variations of the data values and the data keys within the same repeating structure, the result can be considered to resemble a table, in which each element of the repeating structure is considered to be a column and each repetition of the structure is considered as a row of the table. In such an organization of data, the data key is usually a value in one (or a composite of the values in several) of the columns.


Organised recurring data structures

* The tabular view of repeating data structures is only one of many possibilities. Repeating data structures can be organised hierarchically, such that nodes are linked to each other in a cascade of parent-child relationships. Values and potentially more complex data-structures are linked to the nodes. Thus the nodal hierarchy provides the key for addressing the data structures associated with the nodes. This representation can be thought of as an
inverted tree Inverse or invert may refer to: Science and mathematics * Inverse (logic), a type of conditional sentence which is an immediate inference made from another conditional sentence * Additive inverse, the inverse of a number that, when added to the ...
. Modern computer operating system file systems are a common example; and
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing data. It defines a set of rules for encoding electronic document, documents in a format that is both human-readable and Machine-r ...
is another.


Sorted or ordered data

* Data has some inherent features when it is sorted on a key. All the values for subsets of the key appear together. When passing sequentially through groups of the data with the same key, or a subset of the key changes, this is referred to in data processing circles as a break, or a control break. It particularly facilitates the aggregation of data values on subsets of a key.


Peripheral storage

* Until the advent of bulk non-volatile memory like flash, persistent data storage was traditionally achieved by writing the data to external block devices like magnetic tape and disk drives. These devices typically seek to a location on the magnetic media and then read or write blocks of data of a predetermined size. In this case, the seek location on the media, is the data key and the blocks are the data values. Early used ''raw disk'' data file-systems or disc operating systems reserved contiguous blocks on the disc drive for data files. In those systems, the files could be filled up, running out of data space before all the data had been written to them. Thus much unused data space was reserved unproductively to ensure adequate free space for each file. Later file-systems introduced partitions. They reserved blocks of disc data space for partitions and used the allocated blocks more economically, by dynamically assigning blocks of a partition to a file as needed. To achieve this, the file system had to keep track of which blocks were used or unused by data files in a catalog or file allocation table. Though this made better use of the disc data space, it resulted in fragmentation of files across the disc, and a concomitant performance overhead due additional seek time to read the data. Modern file systems reorganize fragmented files dynamically to optimize file access times. Further developments in file systems resulted in virtualization of disc drives i.e. where a logical drive can be defined as partitions from a number of physical drives.


Indexed data

* Retrieving a small subset of data from a much larger set may imply inefficiently searching through the data sequentially.
Index Index (: indexes or indices) may refer to: Arts, entertainment, and media Fictional entities * Index (''A Certain Magical Index''), a character in the light novel series ''A Certain Magical Index'' * The Index, an item on the Halo Array in the ...
es are a way to copy out keys and location addresses from data structures in files, tables and data sets, then organize them using inverted tree structures to reduce the time taken to retrieve a subset of the original data. In order to do this, the key of the subset of data to be retrieved must be known before retrieval begins. The most popular indexes are the B-tree and the dynamic hash key indexing methods. Indexing is overhead for filing and retrieving data. There are other ways of organizing indexes, e.g. sorting the keys and using a binary search algorithm.


Abstraction and indirection

*
Object-oriented programming Object-oriented programming (OOP) is a programming paradigm based on the concept of '' objects''. Objects can contain data (called fields, attributes or properties) and have actions they can perform (called procedures or methods and impl ...
uses two basic concepts for understanding data and software: # The taxonomic rank-structure of '' classes'', which is an example of a hierarchical data structure; and # at run time, the creation of references to in-memory data-structures of objects that have been instantiated from a class library. It is only after instantiation that an object of a specified class exists. After an object's reference is cleared, the object also ceases to exist. The memory locations where the object's data was stored are
garbage Garbage, trash (American English), rubbish (British English), or refuse is waste material that is discarded by humans, usually due to a perceived lack of utility. The term generally does not encompass bodily waste products, purely liquid or ...
and are reclassified as unused memory available for reuse.


Database data

* The advent of
database In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
s introduced a further layer of abstraction for persistent data storage. Databases use
metadata Metadata (or metainformation) is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive ...
, and a structured query language protocol between client and server systems, communicating over a
computer network A computer network is a collection of communicating computers and other devices, such as printers and smart phones. In order to communicate, the computers and devices must be connected by wired media like copper cables, optical fibers, or b ...
, using a two phase commit logging system to ensure transactional completeness, when saving data.


Parallel distributed data processing

* Modern scalable and high-performance data persistence technologies, such as Apache Hadoop, rely on massively parallel distributed data processing across many commodity computers on a high bandwidth network. In such systems, the data is distributed across multiple computers and therefore any particular computer in the system must be represented in the key of the data, either directly, or indirectly. This enables the differentiation between two identical sets of data, each being processed on a different computer at the same time.


See also

*
Big data Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data processing, data-processing application software, software. Data with many entries (rows) offer greater statistical power, while data with ...
*
Data Data ( , ) are a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted for ...
*
Data dictionary A data dictionary, or metadata repository, as defined in the ''IBM Dictionary of Computing'', is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format". ''Oracle Corporation, ...
* Data modeling * Data stream *
Data set A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more table (database), database tables, where every column (database), column of a table represents a particular Variable (computer sci ...
*
Database index A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data withou ...
*
State (computer science) In information technology and computer science, a system is described as stateful if it is designed to remember preceding events or user interactions; the remembered information is called the state of the system. The set of states a system can oc ...
*
Tuple In mathematics, a tuple is a finite sequence or ''ordered list'' of numbers or, more generally, mathematical objects, which are called the ''elements'' of the tuple. An -tuple is a tuple of elements, where is a non-negative integer. There is o ...


References

{{Authority control