PRONOM (Public Record Office and Nôm 喃) is a web-based technical registry to support digital preservation services, developed by The National Archives of the United Kingdom. PRONOM was the first and remains, to date, the only operational public file format registry in the world, although the "Magic File" repository of the file command has served this role in a less formal capacity for two decades. Other projects to develop technical registries have included the UK Digital Curation Centre's Representation Information Registry and the Global Digital Format Registry project at Harvard University.
PRONOM's origins lie in a requirement to have access to reliable technical information about the electronic records held by The National Archives. By definition, electronic records are not inherently human-readable: file formats encode information into a form which can only be processed and rendered comprehensible by very specific technological environments. The accessibility of that information is therefore highly vulnerable to technological obsolescence. Technical information about the structure of those file formats, and the software and hardware environments required to support them, is therefore a prerequisite for any digital preservation regime. PRONOM was developed to provide this function, initially as an internal resource for National Archives staff, and subsequently as a public, web-based resource.
Development
The first version of PRONOM was developed in March 2002 by The National Archives' digital preservation department, led by David Ryan. PRONOM 2 was released in December 2002, and provided support for the development of multi-lingual versions of the registry. The web-enabling of PRONOM (PRONOM 3) in February 2004 represented the starting point for the development of PRONOM as a major online resource for the international digital preservation community.
PRONOM 4, released in October 2005, included a significant reworking of the underlying data model to allow the capture of detailed technical information on file formats and to support future interoperability with other planned registry systems. It was accompanied by the release of the DROID software for automatic file format identification.
PRONOM 5, released in 2006, was a relatively minor update to support improvements to DROID. A much more substantial update was planned for 2007, to include the exposure of core PRONOM functions through web service interfaces. This work formed part of the Seamless Flow programme to position The National Archives to receive and manage future government records in electronic formats.
The National Archives won the 2007 Digital Preservation Award, sponsored by the Digital Preservation Coalition, for its work on PRONOM and DROID.
The Global Digital Format Registry project, which began at Harvard in 2005, was eventually rolled, along with PRONOM, into the joint Unified Digital Format Registry (UDFR) effort. In 2012, however, the UDFR was mothballed, and the California Digital Library eventually removed access to its node in 2016, recommending the use of PRONOM instead.
Services
The core technical registry supports a number of specific services:
The PRONOM registry provides a searchable web database of technical information about file formats and about the software tools and technical environments required to access them. Users can search for formats and software using a variety of criteria, such as format or software name and file extension. PRONOM also holds information about support periods for software products, and can be queried on this basis. In addition to on-screen viewing, registry information can be exported in XML, CSV and printer-friendly formats. The PRONOM website allows users to submit new information for inclusion in PRONOM.
The PRONOM Persistent Unique Identifier (PUID) scheme
The PRONOM Persistent Unique Identifier (PUID) is an extensible scheme of persistent, unique and unambiguous identifiers for records in the PRONOM registry. Such identifiers are fundamental to the exchange and management of digital objects, because they allow human or automated user agents to unambiguously identify, and share that identification of, the representation information required to support access to an object. This follows both from the inherent uniqueness of the identifier and from its binding to a definitive description of the representation information in a registry such as PRONOM.
At present, the PUID scheme is limited to one particular class of representation information: the format in which a digital object is encoded. Formats were considered a particular priority for such a scheme, as no existing, universally applicable system provides for this. Unix magic numbers and Macintosh data forks do provide some of this functionality, but the same is not true within DOS or Microsoft Windows environments. The three-character file extension is neither standardised nor unique, and is interpreted differently by different environments. Equally, the IANA MIME-type scheme does not provide sufficient granularity or coverage to satisfy the requirements for unique identifiers. The PUID scheme has been developed for the single purpose of providing such identifiers.
The scheme has been adopted as the recommended encoding scheme for describing file formats in the latest version of the ''UK e-Government Metadata Standard''. The scheme is designed to be extensible, and may be expanded in future to include other classes of representation information in PRONOM, such as compression methods, character encoding schemes, and operating systems.
PUIDs can be expressed as Uniform Resource Identifiers using the info:pronom/ namespace, details of which are available from the info URI registry. Neither the PUID scheme, nor its expression as an info URI, supports any inherent dereferencing mechanism, i.e. a PUID does not resolve to a Uniform Resource Locator. However, The National Archives is planning to develop a range of services to expose PRONOM registry content, including a resolution service for PUIDs.
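The mapping between a PUID and its info URI form can be illustrated with a minimal Python sketch. The PUID shape assumed here (a namespace of fmt or x-fmt, a slash, then a number, for example fmt/18) is an assumption of this example and is not specified in the text above; the function names are hypothetical and do not belong to any PRONOM or DROID API.

```python
import re

# Assumption for illustration: a format PUID looks like "fmt/18" or "x-fmt/111".
_PUID_PATTERN = re.compile(r"^(?:x-fmt|fmt)/[0-9]+$")

# Namespace prefix named in the text above.
INFO_PREFIX = "info:pronom/"


def puid_to_info_uri(puid: str) -> str:
    """Express a PUID as an info URI by prepending the info:pronom/ namespace."""
    if not _PUID_PATTERN.match(puid):
        raise ValueError(f"not a recognised PUID: {puid!r}")
    return INFO_PREFIX + puid


def info_uri_to_puid(uri: str) -> str:
    """Recover the PUID from an info URI in the info:pronom/ namespace."""
    if not uri.startswith(INFO_PREFIX):
        raise ValueError(f"not an info:pronom/ URI: {uri!r}")
    puid = uri[len(INFO_PREFIX):]
    if not _PUID_PATTERN.match(puid):
        raise ValueError(f"not a recognised PUID: {puid!r}")
    return puid


if __name__ == "__main__":
    print(puid_to_info_uri("fmt/18"))
    print(info_uri_to_puid("info:pronom/x-fmt/111"))
```

The conversion is purely textual: consistent with the lack of a dereferencing mechanism, nothing in the sketch contacts the registry or resolves the identifier to a location.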
DROID
DROID (Digital Record Object Identification) is a software tool developed by The National Archives to perform automated batch identification of file formats. It is one of a planned series of tools utilising PRONOM to provide specific digital preservation services. DROID uses internal (byte sequence) and external (file extension) signatures to identify and report the specific file format versions of digital files. These signatures are stored in an XML signature file, generated from information recorded in the PRONOM technical registry. New and updated signatures are regularly added to PRONOM, and DROID can be configured to automatically download updated signature files from the PRONOM website via web services.
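The distinction between internal and external signatures can be sketched in a few lines of Python. This is not DROID's implementation: the signature tables below are small, hand-written stand-ins for the XML signature file, the format labels are informal names rather than PUIDs, and the choice to try byte sequences first and fall back to the extension is an assumption of this sketch.

```python
from typing import Optional, Tuple

# Hypothetical internal signatures: leading byte sequences ("magic numbers").
INTERNAL_SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF 87a"),
    (b"GIF89a", "GIF 89a"),
]

# Hypothetical external signatures: file extensions (lower case, no dot).
EXTERNAL_SIGNATURES = {
    "pdf": "PDF",
    "png": "PNG",
    "gif": "GIF (version unknown)",
}


def identify(name: str, header: bytes) -> Tuple[Optional[str], str]:
    """Return (format label, method) for a file name and its first bytes."""
    # Internal signature: compare the start of the content with known sequences.
    for magic, label in INTERNAL_SIGNATURES:
        if header.startswith(magic):
            return label, "byte sequence"
    # External signature: fall back to the file extension, if there is one.
    extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    if extension in EXTERNAL_SIGNATURES:
        return EXTERNAL_SIGNATURES[extension], "extension"
    return None, "none"


if __name__ == "__main__":
    print(identify("report.dat", b"%PDF-1.4\n"))
    print(identify("image.gif", b""))
    print(identify("unknown.bin", b"\x00\x00"))
```

In this sketch a byte sequence match can distinguish GIF 87a from GIF 89a, while the extension alone cannot, which illustrates why an extension is a weaker identifier than an internal signature.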
DROID allows files and folders to be selected from a file system for identification. After the identification process has been run, the results can be output in XML, CSV or printer-friendly formats.
DROID is a platform-independent Java tool. It includes a documented, public API, and can be invoked from both GUI and command line interfaces.
Future services
Proposed future services include format risk assessments and preservation planning, and the automated generation of migration pathways for converting between formats.
See also
* Digital curation
* Digital preservation
* File format
* File (command)
External links
* PRONOM technical registry
* Namespace registration
* Global Digital Format Registry project