Comma-separated values (CSV) is a text file format that uses commas to separate values, and newlines to separate records. A CSV file stores tabular data (numbers and text) in plain text, where each line of the file typically represents one data record. Each record consists of the same number of fields, and these are separated by commas in the CSV file. If the field delimiter itself may appear within a field, fields can be surrounded with quotation marks.
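To illustrate the quoting rule, the following minimal sketch uses Python's standard csv module (an illustrative choice; nothing above prescribes a particular tool) to write a record whose third field contains commas, then reads it back. The helper name to_csv_line is hypothetical.

```python
import csv
import io

def to_csv_line(fields):
    """Serialize one record to a CSV line (hypothetical helper)."""
    buf = io.StringIO()
    # The default writer quotes only the fields that need it,
    # e.g. a field that contains the comma delimiter.
    csv.writer(buf).writerow(fields)
    return buf.getvalue()

line = to_csv_line(["1997", "Ford", "ac, abs, moon"])
print(repr(line))  # '1997,Ford,"ac, abs, moon"\r\n'

# Reading the line back recovers the original three fields.
print(next(csv.reader(io.StringIO(line))))  # ['1997', 'Ford', 'ac, abs, moon']
```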
The CSV file format is one type of delimiter-separated file format. Delimiters frequently used include the comma, tab, space, and semicolon. Delimiter-separated files are often given a ".csv" extension even when the field separator is not a comma. Many applications or libraries that consume or produce CSV files have options to specify an alternative delimiter.
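As an example of such an option, this sketch assumes Python's standard csv module, whose reader accepts a delimiter parameter; the tab-separated sample data is made up for illustration.

```python
import csv
import io

# Tab-separated sample data (the file might still carry a ".csv" extension).
tsv_text = "Year\tMake\tModel\n1997\tFord\tE350\n"

# The delimiter parameter tells the reader which field separator to expect.
rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
print(rows)  # [['Year', 'Make', 'Model'], ['1997', 'Ford', 'E350']]

# With the default comma delimiter, each whole line is read as a single field.
wrong = list(csv.reader(io.StringIO(tsv_text)))
print(len(wrong[0]))  # 1
```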
Because many implementations do not adhere to the CSV standard RFC 4180, data input software must support a variety of CSV formats. Despite this drawback, CSV remains widespread in data applications and is supported by a wide range of software, including common spreadsheet applications such as Microsoft Excel. Benefits cited in favor of CSV include human readability and the simplicity of the format.
Applications
CSV is a common data exchange format that is widely supported by consumer, business, and scientific applications. Among its most common uses is moving tabular data between programs that natively operate on incompatible (often proprietary) formats.
The 2005 technical standard RFC 4180 formalizes the CSV file format and defines the MIME type "text/csv" for the handling of text-based fields. However, the interpretation of the text of each field is still application-specific. Files that follow the RFC 4180 standard can simplify CSV exchange and should be widely portable. Among its requirements:
* MS-DOS-style lines that end with carriage return and line feed (CR/LF) characters (optional for the last line).
* An optional header record (there is no sure way to detect whether it is present, so care is required when importing).
* Each record ''should'' contain the same number of comma-separated fields.
* Any field ''may'' be quoted (with double quotes).
* Fields containing a line-break, double-quote or commas ''should'' be quoted. (If they are not, the file will likely be impossible to process correctly.)
* ''If'' double-quotes are used to enclose fields, then a double-quote in a field ''must'' be represented by two double-quote characters.
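The quoting requirements above can be sketched as a small hand-written serializer. The function names quote_field and format_record are hypothetical, and the sketch covers writing only, not parsing: it quotes a field only when it contains a comma, double quote, CR, or LF, doubles any embedded double quotes, and ends each record with CR/LF.

```python
def quote_field(value: str) -> str:
    """Quote one field following the RFC 4180 rules summarized above."""
    # Fields containing a comma, double quote, CR or LF are enclosed in quotes.
    if any(ch in value for ch in ',"\r\n'):
        # Inside a quoted field, each double quote is written as two.
        return '"' + value.replace('"', '""') + '"'
    return value

def format_record(fields: list[str]) -> str:
    # Fields are comma-separated and each record ends with CR/LF.
    return ",".join(quote_field(f) for f in fields) + "\r\n"

def format_csv(records: list[list[str]]) -> str:
    return "".join(format_record(r) for r in records)

print(repr(format_record(["1999", "Chevy", 'Venture "Extended Edition"', "", "4900.00"])))
# '1999,Chevy,"Venture ""Extended Edition""",,4900.00\r\n'
```

With this approach an empty field is written as nothing between two commas, whereas the example file below writes it as ""; RFC 4180 permits both an empty unquoted field and an empty quoted one.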
Most programs that claim to read CSV files can process this format. The exceptions are that ''(a)'' some programs may not support line breaks within quoted fields, ''(b)'' some programs may confuse the optional header with data or interpret the first data line as a header, and ''(c)'' double quotes within a field may not be parsed correctly.
OKF frictionless tabular data package
In 2011 the Open Knowledge Foundation (OKF) and various partners created a data protocols working group, which later evolved into the Frictionless Data initiative. One of the main formats they released was the Tabular Data Package. The Tabular Data Package was heavily based on CSV, using it as the main data transport format and adding basic type and schema metadata (CSV lacks any type information to distinguish the string "1" from the number 1).
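The missing type information can be seen in a short Python sketch: every value comes back from the standard csv module as a string, so types must be attached separately. The schema here is a hypothetical plain dict mapping column names to converter functions; it is a stand-in for the idea, not the Tabular Data Package schema format.

```python
import csv
import io

data = "Year,Make,Price\n1997,Ford,3000.00\n"
rows = list(csv.DictReader(io.StringIO(data)))

# CSV carries no types: the year is read back as the string "1997".
print(repr(rows[0]["Year"]))  # '1997'

# Hypothetical schema: column name -> converter function.
schema = {"Year": int, "Make": str, "Price": float}

def apply_schema(row: dict, schema: dict) -> dict:
    # Convert each value with the converter registered for its column.
    return {name: schema[name](value) for name, value in row.items()}

typed = [apply_schema(r, schema) for r in rows]
print(typed)  # [{'Year': 1997, 'Make': 'Ford', 'Price': 3000.0}]
```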
The Frictionless Data Initiative has also provided a standard CSV Dialect Description Format for describing different dialects of CSV, for example specifying the field separator or quoting rules.
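A dialect description can drive a parser directly. The following Python sketch maps a dict of dialect properties onto the parameters of the standard csv reader. It assumes property names such as delimiter, quoteChar, doubleQuote, skipInitialSpace and header and handles only that subset; the fallback defaults are chosen for the sketch and may differ from those in the specification. The function name read_with_dialect is hypothetical.

```python
import csv
import io

def read_with_dialect(text: str, dialect: dict):
    """Parse CSV text according to a dialect description held in a dict.

    Assumed property names (a subset only): delimiter, quoteChar,
    doubleQuote, skipInitialSpace, header. Missing properties fall back
    to the defaults chosen here, not necessarily the spec's defaults.
    """
    rows = list(csv.reader(
        io.StringIO(text),
        delimiter=dialect.get("delimiter", ","),
        quotechar=dialect.get("quoteChar", '"'),
        doublequote=dialect.get("doubleQuote", True),
        skipinitialspace=dialect.get("skipInitialSpace", False),
    ))
    # Split off the first row as a header when the dialect says there is one.
    if dialect.get("header", True) and rows:
        return rows[0], rows[1:]
    return None, rows

header, body = read_with_dialect(
    "Year;Make;Length\n1997;Ford;2,35\n",
    {"delimiter": ";", "header": True},
)
print(header)  # ['Year', 'Make', 'Length']
print(body)    # [['1997', 'Ford', '2,35']]
```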
W3C tabular data standard
In 2013 the W3C "CSV on the Web" working group began to specify technologies providing higher interoperability for web applications using CSV or similar formats. The working group completed its work in February 2016 and was officially closed in March 2016 with the release of a set of documents and W3C recommendations for modeling "Tabular Data" and enhancing CSV with metadata.
Many informal documents exist that describe "CSV" formats.
IETF RFC 4180 (summarized above) defines the format for the "text/csv" MIME type registered with the IANA.
Rules typical of these and other "CSV" specifications and implementations are as follows:
Example
The above table of data may be represented in CSV format as follows:
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
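Parsing the example above with Python's standard csv module (an illustrative choice) shows how a CSV-aware reader recovers the quoted commas, the doubled double quotes, the empty quoted fields and the line break embedded in the last record:

```python
import csv
import io

example = '''Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
'''

# The text has six physical lines, but the quoted line break keeps
# the Jeep record together, so the reader yields five rows.
records = list(csv.reader(io.StringIO(example)))

print(len(records))          # 5 (header plus four records)
print(records[2][2])         # Venture "Extended Edition"
print(records[2][3] == "")   # True: the quoted empty field "" is an empty string
print(repr(records[4][3]))   # 'MUST SELL!\nair, moon roof, loaded'
```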
Example of a USA/UK CSV file (where the decimal separator is a period/full stop and the value separator is a comma):
Year,Make,Model,Length
1997,Ford,E350,2.35
2000,Mercury,Cougar,2.38
Example of an analogous European CSV/DSV file (where the decimal separator is a comma and the value separator is a semicolon):
Year;Make;Model;Length
1997;Ford;E350;2,35
2000;Mercury;Cougar;2,38
The latter format is not RFC 4180 compliant. Compliance could be achieved by the use of a comma instead of a semicolon as a separator and by quoting all numbers that have a decimal mark.
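The compliance route described above can be sketched in Python with the standard csv module (an illustrative choice): read the semicolon-separated file, then write it back with the default comma delimiter. The writer's default minimal quoting wraps any field that contains a comma, so the lengths come out quoted. Converting the decimal comma to a period, shown at the end, is an alternative not mentioned in the text.

```python
import csv
import io

european = "Year;Make;Model;Length\n1997;Ford;E350;2,35\n2000;Mercury;Cougar;2,38\n"

# Parse using the semicolon as the value separator.
parsed = list(csv.reader(io.StringIO(european), delimiter=";"))
header, body = parsed[0], parsed[1:]

# Re-emit with commas. QUOTE_MINIMAL (the default) quotes fields that
# contain the delimiter, so the value 2,35 is written with quotes around it.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(body)
compliant = out.getvalue()
print(compliant)

# Alternative: turn the decimal comma into a period to obtain plain numbers.
lengths = [float(row[3].replace(",", ".")) for row in body]
print(lengths)  # [2.35, 2.38]
```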
Application support
Some applications use CSV as a data interchange format to enhance their interoperability, exporting and importing CSV. Others use CSV as an ''internal format''.
As a data interchange format, CSV is supported by almost all spreadsheets and database management systems:
* Spreadsheets including Apple Numbers, LibreOffice Calc, and Apache OpenOffice Calc. Microsoft Excel also supports a dialect of CSV, with restrictions compared to other spreadsheet software (e.g., older versions of Excel could not export CSV files in the commonly used UTF-8 character encoding, and the separator follows the system's regional settings rather than always being the comma). The LibreOffice Calc CSV importer is actually a more generic delimited-text importer, supporting multiple separators at the same time as well as field trimming.
* Various relational databases support saving query results to a CSV file. PostgreSQL provides the COPY command, which allows both saving data to a file and loading data from one; for example, it can save the content of a table articles to a file called /home/wikipedia/file.csv.
* Many utility programs on Unix-style systems (such as cut, paste, join, sort, uniq, awk) can split files on a comma delimiter, and can therefore process simple CSV files. However, this method does not correctly handle commas or new lines within quoted strings.
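The failure mode of splitting on every comma can be reproduced in a few lines of Python. The sketch below imitates a comma-splitting tool with str.split and compares it with the standard csv module; the sample line is taken from the example file above.

```python
import csv
import io

line = '1997,Ford,E350,"ac, abs, moon",3000.00\n'

# Splitting on every comma (as cut -d, or awk -F, would) breaks the quoted field apart.
naive = line.rstrip("\n").split(",")
print(len(naive), naive)  # 7 ['1997', 'Ford', 'E350', '"ac', ' abs', ' moon"', '3000.00']

# A CSV-aware parser honors the quotes and returns the intended five fields.
proper = next(csv.reader(io.StringIO(line)))
print(len(proper), proper)  # 5 ['1997', 'Ford', 'E350', 'ac, abs, moon', '3000.00']
```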
As a (main or optional) internal representation, CSV can be native or foreign; it differs from an interchange format ("export/import only") because no copy in another format needs to be created:
* Some spreadsheets, including LibreOffice Calc, offer this option without forcing the user to adopt another format.
* Some relational databases, when using standard SQL, offer a ''foreign-data wrapper'' (FDW). For example, PostgreSQL offers commands to configure any variant of CSV.
* Databases like Apache Hive offer the option to express CSV or .csv.gz as an internal table format.
* The Emacs editor can operate on CSV files using csv-nav mode.
CSV format is supported by libraries available for many programming languages. Most provide some way to specify the field delimiter, decimal separator, character encoding, quoting conventions, date format, etc.
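In Python's standard csv module, for instance, the delimiter and quoting convention are reader/writer parameters, while the character encoding is chosen when the file is opened; the module has no decimal-separator or date-format option, so those conversions have to be done separately. A minimal sketch, where the file name cars.csv and its temporary location are hypothetical:

```python
import csv
import os
import tempfile

rows = [["Year", "Make", "Note"], ["1997", "Ford", "Größe: 2,35 m"]]

# Hypothetical output location for this sketch.
path = os.path.join(tempfile.mkdtemp(), "cars.csv")

# newline="" lets the csv module control line endings; the encoding is explicit.
with open(path, "w", encoding="utf-8", newline="") as f:
    # Semicolon separator and quoting of every field, both configurable.
    writer = csv.writer(f, delimiter=";", quoting=csv.QUOTE_ALL)
    writer.writerows(rows)

with open(path, encoding="utf-8", newline="") as f:
    raw = f.read()
print(raw)
# "Year";"Make";"Note"
# "1997";"Ford";"Größe: 2,35 m"
```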
Software and row limits
Programs that work with CSV may have limits on the maximum number of rows CSV files can have.
Below is a list of common software and its limitations:
* Microsoft Excel: 1,048,576 row limit;
* Apple Numbers: 1,000,000 row limit;
* Google Sheets: 10,000,000 cell limit (the product of columns and rows);
* OpenOffice and LibreOffice: 1,048,576 row limit;
* Text editors (such as WordPad, Vim, etc.): no row or cell limit;
* Databases (COPY command and FDW): no row or cell limit.
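Because quoted fields may contain line breaks, a file's line count can overstate its number of rows. The following Python sketch counts records with the standard csv module and compares the result with the Excel row limit listed above; the function names are hypothetical, and the count includes a header row if one is present.

```python
import csv
import io

# Microsoft Excel row limit taken from the list above.
EXCEL_MAX_ROWS = 1_048_576

def count_records(lines) -> int:
    """Count CSV records rather than physical lines.

    `lines` is any iterable of text lines, e.g. io.StringIO or a file
    opened with newline="". Quoted line breaks stay inside one record.
    """
    return sum(1 for _ in csv.reader(lines))

def fits_in_excel(lines) -> bool:
    return count_records(lines) <= EXCEL_MAX_ROWS

sample = 'id,comment\n1,"first line\nsecond line"\n2,ok\n'
print(sample.count("\n"))                  # 4 physical lines
print(count_records(io.StringIO(sample)))  # 3 records
print(fits_in_excel(io.StringIO(sample)))  # True
```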