The LOCKSS ("Lots of Copies Keep Stuff Safe") project, under the auspices of Stanford University, is a peer-to-peer network that develops and supports an open source system allowing libraries to collect, preserve and provide their readers with access to material published on the Web. Its main goal is digital preservation.
The system attempts to replicate the way libraries preserve material published on paper. It was originally designed for scholarly journals, but is now also used for a range of other materials. Examples include the SOLINET project to preserve theses and dissertations at eight universities, US government documents, and the MetaArchive Cooperative program preserving at-risk digital archival collections, including Electronic Theses and Dissertations (ETDs), newspapers, photograph collections, and audio-visual collections.
A similar project called CLOCKSS (Controlled LOCKSS) "is a tax-exempt, 501(c)(3), not-for-profit organization, governed by a Board of Directors made up of librarians and publishers." CLOCKSS runs on LOCKSS technology.
Problem
Traditionally, academic libraries have retained issues of scholarly journals, either individually or collaboratively, providing their readers access to the content received even after the publisher has ceased or the subscription has been canceled. In the digital age, libraries often subscribe to journals that are only available digitally over the Internet. Although convenient for patron access, the model for digital subscriptions does not allow the libraries to retain a copy of the journal. If the publisher ceases to publish, or the library cancels the subscription, or if the publisher's website is down for the day, the content that has been paid for is no longer available.
Methods
The LOCKSS system allows a library, with permission from the publisher, to collect, preserve and disseminate to its patrons a copy of the materials to which it has subscribed, as well as open access material (perhaps published under a Creative Commons license). Each library's system collects a copy using a specialized web crawler that verifies that the publisher has granted suitable permission. The system is format-agnostic, collecting whatever formats the publisher delivers via HTTP. Libraries that have collected the same material cooperate in a peer-to-peer network to ensure its preservation. Peers in the network vote on cryptographic hashes computed over the preserved content and a nonce; a peer that is outvoted regards its copy as damaged and repairs it from the publisher or other peers.
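The voting step can be illustrated with a deliberately simplified sketch. It assumes that each voter hashes a fresh, poller-supplied nonce followed by its copy of the content with SHA-256, and that the poller compares its own hash against a strict majority of the votes. The names `vote` and `run_poll` are hypothetical, and this is not the LOCKSS software's code: the real polling protocol is considerably more elaborate. The sketch only shows why the nonce matters and how an outvoted peer concludes that its copy is damaged.

```python
from __future__ import annotations

import hashlib
import secrets
from collections import Counter


def vote(content: bytes, nonce: bytes) -> str:
    """One voter's ballot: SHA-256 over the poll's nonce followed by its copy.

    Because the nonce is fresh for every poll, a voter cannot reuse a hash
    computed earlier; it must actually hold the content when the poll runs.
    """
    return hashlib.sha256(nonce + content).hexdigest()


def run_poll(my_copy: bytes, peer_copies: dict[str, bytes]) -> tuple[str, str | None]:
    """Simplified poll run by one peer against the copies held by its peers.

    Returns ("agree", None), ("inconclusive", None), or ("damaged", peer),
    where peer is a voter in the majority that could supply a repair.
    """
    nonce = secrets.token_bytes(20)  # fresh random challenge for this poll
    votes = {peer: vote(copy, nonce) for peer, copy in peer_copies.items()}
    if not votes:
        return "inconclusive", None  # nobody voted
    top_hash, top_count = Counter(votes.values()).most_common(1)[0]
    if 2 * top_count <= len(votes):
        return "inconclusive", None  # no strict majority among the voters
    if vote(my_copy, nonce) == top_hash:
        return "agree", None
    # Outvoted: treat the local copy as damaged and pick a majority voter to repair from.
    repair_source = next(peer for peer, h in votes.items() if h == top_hash)
    return "damaged", repair_source


peers = {"lib-a": b"article v1", "lib-b": b"article v1", "lib-c": b"article v1"}
print(run_poll(b"article v1", peers))            # ('agree', None)
print(run_poll(b"article v1 [bit rot]", peers))  # ('damaged', 'lib-a')
```

In this sketch the poller's own hash is not counted among the votes, and the outcome does not depend on the random nonce value, only on whether the copies match.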
The LOCKSS license used by most publishers allows a library's readers access to its own copy, but does not allow similar access to other libraries or unaffiliated readers; the system does not support file sharing. On request, a library may supply another library with content to effect a repair, but only if the requesting library has previously proved, by voting with the majority, that it had a good copy. If the reader's browser no longer supports the format in which the copy was collected, a format migration process can convert it to a current format. These limits on the use that may be made of preserved copies of copyright material have been effective in persuading copyright owners to grant the necessary permission.
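The repair rule described above can be sketched as a small piece of bookkeeping. The class and method names, and the use of a string identifier for each preserved unit of content, are assumptions for illustration rather than the LOCKSS software's actual interface. A peer remembers which other peers agreed with the majority on a given unit and refuses repair requests for that unit from anyone else.

```python
from __future__ import annotations

from collections import defaultdict


class RepairPolicy:
    """Hypothetical bookkeeping for a LOCKSS-style repair rule.

    A peer supplies repair content only to peers that earlier proved they
    held a good copy of the same unit by voting with the majority.
    """

    def __init__(self) -> None:
        # unit id -> peers seen voting with the majority on that unit
        self._agreed: dict[str, set[str]] = defaultdict(set)

    def record_poll(self, unit: str, votes: dict[str, str], majority_hash: str) -> None:
        """Remember every voter whose hash matched the majority in a poll on unit."""
        for peer, vote_hash in votes.items():
            if vote_hash == majority_hash:
                self._agreed[unit].add(peer)

    def may_supply_repair(self, unit: str, requester: str) -> bool:
        """Allow a repair only if requester previously agreed with the majority on unit."""
        return requester in self._agreed.get(unit, set())


policy = RepairPolicy()
policy.record_poll("journal-2007", {"lib-a": "h1", "lib-b": "h1", "lib-c": "h2"}, "h1")
print(policy.may_supply_repair("journal-2007", "lib-b"))  # True
print(policy.may_supply_repair("journal-2007", "lib-c"))  # False: was outvoted
print(policy.may_supply_repair("journal-2008", "lib-b"))  # False: never polled on this unit
```

Keeping the record per unit, rather than per peer overall, means a peer that proved it held one journal's content gains no right to receive a different journal's content, which keeps repair from turning into file sharing.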
The LOCKSS approach of selective collection with permission from the publisher, distributed storage, and restricted dissemination contrasts with, for example, the Internet Archive's approach of omnivorous collection without permission from the publisher, centralized storage, and unrestricted dissemination. The LOCKSS system is far smaller, but it can preserve subscription materials to which the Internet Archive has no access.
Since each library administers its own LOCKSS peer and maintains its own copy of preserved material, and since there are libraries doing so worldwide (see the list of participating libraries below), the system provides a much higher degree of replication than is usual in a fault-tolerant system. The voting process makes use of this high degree of replication to eliminate the need for backups to off-line media, and to provide robust defenses against attacks aimed at corrupting preserved content.
Importance
In addition to preserving access, libraries have traditionally made it difficult to rewrite or suppress printed material. The existence of an indeterminate but large number of identical copies on a somewhat tamper-resistant medium, under many independent administrations, meant that attempts to alter or remove all copies of a published work would likely both fail and be detected. Web publishing, based on a single copy under a single administration, provides none of these safeguards against subversion. Web publishing is, therefore, an amenable tool for rewriting history. By preserving many copies under diverse administrations, by automatically auditing the copies at intervals against each other (and, in the future, against the publisher's copy), and by alerting libraries when changes are detected, the LOCKSS system attempts to restore many of these safeguards to digital publishing.
Implementation
Before implementing a LOCKSS system, an institution should consider several questions carefully to make sure the content is verified, evaluated, and auditable by users. Prospective users should ask questions such as "What are your procedures?", "What are your methods?", "How is this system evaluated?", and "What is your disaster preparedness program?". The answers help the user evaluate the system, create a successful maintenance plan for their materials, and ensure that the system is backed by a carefully evaluated support structure.
The source code for the entire LOCKSS system carries BSD-style open-source licenses and is available from GitHub. LOCKSS is a trademark of Stanford University.
See also
* Clock of the Long Now
* Digital library
* Digital preservation
* Portico (service)
External links
* Official website: http://www.lockss.org/
* LOCKSS (YouTube video, June 2007)
* LOCKSS Part I: Why Libraries Should Care About LOCKSS (YouTube video, December 2007)
* CLOCKSS: "Controlled LOCKSS", a federated global (rather than local) LOCKSS archive