The LOCKSS ("Lots of Copies Keep Stuff Safe") project, under the auspices of Stanford University, is a peer-to-peer network that develops and supports an open-source system allowing libraries to collect, preserve and provide their readers with access to material published on the Web. Its main goal is digital preservation.
The system attempts to replicate the way libraries do this for material published on paper. It was originally designed for scholarly journals, but is now also used for a range of other materials. Examples include the SOLINET project to preserve theses and dissertations at eight universities, US government documents, and the MetaArchive Cooperative program preserving at-risk digital archival collections, including Electronic Theses and Dissertations (ETDs), newspapers, photograph collections, and audio-visual collections.
A similar project called CLOCKSS (Controlled LOCKSS) "is a tax-exempt, 501(c)3, not-for-profit organization, governed by a Board of Directors made up of librarians and publishers." CLOCKSS runs on LOCKSS technology.
Problem
Traditionally, academic libraries have retained issues of scholarly journals, either individually or collaboratively, giving their readers access to the content received even after the publisher has ceased publication or the subscription has been canceled. In the digital age, libraries often subscribe to journals that are available only digitally over the Internet. Although convenient for patron access, the digital subscription model does not allow libraries to retain a copy of the journal. If the publisher ceases to publish, the library cancels the subscription, or the publisher's website is down for the day, the content that has been paid for is no longer available.
Methods
The LOCKSS system allows a library, with permission from the publisher, to collect, preserve and disseminate to its patrons a copy of the materials to which it has subscribed as well as open access material (perhaps published under a Creative Commons license). Each library's system collects a copy using a specialized web crawler that verifies that the publisher has granted suitable permission. The system is format-agnostic, collecting whatever formats the publisher delivers via HTTP. Libraries which have collected the same material cooperate in a peer-to-peer network to ensure its preservation. Peers in the network vote on cryptographic hashes computed over the preserved content and a nonce; a peer that is outvoted regards its copy as damaged and repairs it from the publisher or other peers.
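The voting step described above can be illustrated with a minimal sketch. It is not the LOCKSS protocol itself: the function names, the use of SHA-256, and the simple-majority rule are assumptions made for illustration. The point it shows is that a fresh nonce forces each peer to hash the content it actually holds, since a hash computed for an earlier poll cannot be reused.

```python
from __future__ import annotations

import hashlib
import os
from collections import Counter


def vote(content: bytes, nonce: bytes) -> str:
    """Hash the poll nonce followed by this peer's copy of the content."""
    return hashlib.sha256(nonce + content).hexdigest()


def run_poll(copies: dict[str, bytes], nonce: bytes) -> list[str]:
    """Return the peers whose votes disagree with the majority vote.

    Assumption: a strict majority is required; otherwise the poll is
    treated as inconclusive and no peer is declared damaged.
    """
    votes = {peer: vote(content, nonce) for peer, content in copies.items()}
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count * 2 <= len(votes):
        raise ValueError("no majority; poll is inconclusive")
    return sorted(peer for peer, v in votes.items() if v != winner)


def repair(copies: dict[str, bytes], damaged: list[str]) -> None:
    """Overwrite each outvoted copy with a copy from a majority peer."""
    good_peer = next(peer for peer in copies if peer not in damaged)
    for peer in damaged:
        copies[peer] = copies[good_peer]


# Hypothetical peers; library_c holds a corrupted copy.
copies = {
    "library_a": b"article text",
    "library_b": b"article text",
    "library_c": b"article tex\x00",
}
damaged = run_poll(copies, os.urandom(16))
repair(copies, damaged)
```

In this sketch the outvoted peer repairs from another peer; the text above notes that a repair may also come from the publisher.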
The LOCKSS license used by most publishers allows a library's readers access to its own copy, but does not allow similar access to other libraries or unaffiliated readers; the system does not support file sharing. On request, a library may supply another library with content to effect a repair, but only if the requesting library has proved in the past that it had a good copy by voting with the majority. If the reader's browser no longer supports the format in which the copy was collected, a format migration process can convert it to a current format. These limits on the use that may be made of preserved copies of copyrighted material have been effective in persuading copyright owners to grant the necessary permission.
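The repair restriction above amounts to a simple bookkeeping rule, sketched below. The class and method names are hypothetical and the in-memory set is an assumption; the sketch only illustrates the idea that a peer is served repair content for a unit of material if it has previously voted with the majority on that same unit.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RepairPolicy:
    """Tracks which peers have shown they once held a good copy."""

    # (peer, unit) pairs for which the peer voted with the majority
    agreed: set[tuple[str, str]] = field(default_factory=set)

    def record_poll(self, unit: str, majority_voters: list[str]) -> None:
        """Remember every peer that agreed with the majority on this unit."""
        for peer in majority_voters:
            self.agreed.add((peer, unit))

    def may_repair(self, requester: str, unit: str) -> bool:
        """Serve a repair only to a peer with a past agreeing vote on the unit."""
        return (requester, unit) in self.agreed


policy = RepairPolicy()
policy.record_poll("journal-vol-1", ["library_a", "library_b"])
```

Because eligibility is tied to a past agreeing vote, a peer that never collected the material cannot use the repair mechanism to obtain it, which is consistent with the statement that the system does not support file sharing.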
The LOCKSS approach of selective collection with permission from the publisher, distributed storage, and restricted dissemination contrasts with, for example, the Internet Archive's approach of omnivorous collection without permission from the publisher, centralized storage, and unrestricted dissemination. The LOCKSS system is far smaller, but it can preserve subscription materials to which the Internet Archive has no access.
Since each library administers its own LOCKSS peer and maintains its own copy of preserved material, and since there are libraries doing so worldwide (see the list of participating libraries below), the system provides a much higher degree of replication than is usual in a fault-tolerant system. The voting process makes use of this high degree of replication to eliminate the need for backups to off-line media, and to provide robust defenses against attacks aimed at corrupting preserved content.
Importance
In addition to preserving access, libraries have traditionally made it difficult to rewrite or suppress printed material. The existence of an indeterminate but large number of identical copies on a somewhat tamper-resistant medium under many independent administrations meant that attempts to alter or remove all copies of a published work would likely both fail and be detected. Web publishing, based on a single copy under a single administration, provides none of these safeguards against subversion. Web publishing is, therefore, an amenable tool for rewriting history. By preserving many copies under diverse administration, by automatically auditing the copies at intervals against each other (and, in the future, against the publisher's copy), and by alerting libraries when changes are detected, the LOCKSS system attempts to restore many of these safeguards in the now digital world of publication.
Implementation
Prior to implementing a LOCKSS system, some questions need to be considered carefully in order to make sure the content is verified, evaluated, and auditable by users. The user must ask questions such as, "What are your procedures?", "What are your methods?", "How is this system evaluated?", and "What is your disaster preparedness program?". These questions will enable the user to evaluate the system, create a successful maintenance plan for their materials, and enable the system to be reinforced by a carefully evaluated support structure.
The source code for the entire LOCKSS system carries BSD-style open-source licenses and is available from GitHub.
LOCKSS is a trademark of Stanford University.
See also
* Clock of the Long Now
* Digital library
* Digital preservation
* Portico (service)
External links
* Official website: http://www.lockss.org/
* LOCKSS (YouTube video, June 2007)
* LOCKSS Part I: Why Libraries Should Care About LOCKSS (YouTube video, December 2007)
* CLOCKSS "Controlled LOCKSS", a federated global (vice local) LOCKSS archive