COCOA (an acronym derived from COunt and COncordance Generation on Atlas) was an early
text file
utility and associated file format for
digital humanities
, then known as humanities computing. The program consisted of approximately 4000
punched cards of
FORTRAN and was created in the late 1960s and early 1970s at
University College London
and the
Atlas Computer Laboratory in
Harwell, Oxfordshire
. Functionality included word-counting and
concordance building.
Oxford Concordance Program
The
Oxford Concordance Program (OCP) format was a direct descendant of COCOA developed at
Oxford University Computing Services. The
Oxford Text Archive
holds items in this format.
Later developments
The COCOA file format bears at least a passing similarity to the later
markup language
s such as
SGML
and
XML
. A notable difference from these successors is that COCOA tags are flat rather than tree-structured. Each tag sets a value for one category of information, and that value remains in effect until a later tag of the same category changes it. Members of the
Text Encoding Initiative
community maintain legacy support for COCOA, although most in-demand texts and corpora have already been migrated to more widely understood formats such as
TEI XML.
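The flat, state-based behaviour of COCOA tags can be illustrated with a minimal Python sketch. It assumes the angle-bracket form `<CATEGORY value>` for tags. The category letters `A` (author), `T` (title) and `S` (speaker), and the function name `annotate`, are hypothetical and chosen only for illustration. Each tag overwrites the current value for its category, and every line of text inherits whatever values are in effect at that point. Unlike SGML or XML, nothing is ever closed or nested.

```python
import re
from typing import Dict, Iterator, List, Tuple

# Assumed tag shape: <CATEGORY value>, e.g. <A SHAKESPEARE> or <S BERNARDO>.
TAG = re.compile(r"<(\w+)\s+([^>]*)>")


def annotate(lines: List[str]) -> Iterator[Tuple[Dict[str, str], str]]:
    """Yield (state, text) for each line that contains text.

    `state` maps each category to the most recent value set for it.
    """
    state: Dict[str, str] = {}
    for line in lines:
        # Every tag on the line replaces the current value for its category.
        for category, value in TAG.findall(line):
            state[category] = value.strip()
        # Whatever remains once the tags are removed is ordinary text.
        text = TAG.sub("", line).strip()
        if text:
            # Copy the state so later tags do not alter earlier results.
            yield dict(state), text


# Hypothetical sample; the category letters are illustrative only.
sample = [
    "<A SHAKESPEARE>",
    "<T HAMLET>",
    "<S BERNARDO>",
    "Who's there?",
    "<S FRANCISCO>",
    "Nay, answer me.",
]

result = list(annotate(sample))
for state, text in result:
    print(state, text)
```

In this sketch the second line of text still carries the author and title set at the top of the sample, while only the speaker has changed. A tree-structured format would instead express the same information through enclosing elements.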