Linked Open Data

In computing, linked data is structured data which is interlinked with other data so that it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the Internet to become a global database. Tim Berners-Lee, director of the World Wide Web Consortium (W3C), coined the term in a 2006 design note about the Semantic Web project. Linked data may also be open data, in which case it is usually described as Linked Open Data.


Principles

In his 2006 "Linked Data" note, Tim Berners-Lee outlined four principles of linked data, paraphrased along the following lines:

1. Uniform Resource Identifiers (URIs) should be used to name and identify individual things.
2. HTTP URIs should be used to allow these things to be looked up, interpreted, and subsequently "dereferenced".
3. Useful information about what a name identifies should be provided through open standards such as RDF, SPARQL, etc.
4. When publishing data on the Web, other things should be referred to using their HTTP URI-based names.

Tim Berners-Lee later restated these principles at a 2009 TED conference, again paraphrased along the following lines:

1. All conceptual things should have a name starting with HTTP.
2. Looking up an HTTP name should return useful data about the thing in question in a standard format.
3. Anything else that the same thing has a relationship with through its data should also be given a name beginning with HTTP.
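
The principles above can be illustrated with a minimal Python sketch. It is not taken from Berners-Lee's note: the `example.org` URIs, the in-memory `WEB` dictionary that stands in for real HTTP lookups, and the helper names `dereference` and `follow_links` are all assumptions made for illustration. Only the FOAF property URIs are real vocabulary terms.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://example.org/people/alice"  # hypothetical URI
BOB = "http://example.org/people/bob"      # hypothetical URI

# Hypothetical "web": each HTTP URI maps to the data published about it.
# A real client would issue an HTTP GET instead of a dictionary lookup.
WEB: Dict[str, List[Triple]] = {
    ALICE: [
        (ALICE, FOAF + "name", "Alice"),
        (ALICE, FOAF + "knows", BOB),  # link to another HTTP-named thing
    ],
    BOB: [
        (BOB, FOAF + "name", "Bob"),
    ],
}


def is_http_uri(value: str) -> bool:
    """Things are named with HTTP(S) URIs so that they can be looked up."""
    return value.startswith(("http://", "https://"))


def dereference(uri: str) -> List[Triple]:
    """Simulated lookup: return the data published for an HTTP URI."""
    if not is_http_uri(uri):
        raise ValueError(f"not an HTTP URI: {uri}")
    return WEB.get(uri, [])


def follow_links(start: str) -> List[str]:
    """Discover further things by following HTTP URIs found in the data."""
    seen = [start]
    queue = [start]
    while queue:
        current = queue.pop(0)
        for _subject, _predicate, obj in dereference(current):
            if is_http_uri(obj) and obj not in seen:
                seen.append(obj)
                queue.append(obj)
    return seen
```

Starting from the hypothetical `ALICE` URI, `follow_links` also reaches `BOB`, because the data returned for Alice names Bob by his HTTP URI rather than by a plain string.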


Components

Thus, we can identify the following components as essential to a global Linked Data system as envisioned, and to any actual Linked Data subset within it:

* URIs
* HTTP
* Structured data using controlled vocabulary terms and dataset definitions expressed in Resource Description Framework serialization formats such as RDFa, RDF/XML, N3, Turtle, or JSON-LD
* Linked Data Platform
* CSV-W
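
To make the idea of serialization formats concrete, the following Python sketch writes the same two statements in two of the formats named above. It is only an illustration under stated assumptions: the `example.org` URIs and the function names `to_turtle` and `to_jsonld` are hypothetical, the Turtle output uses only the simplest one-statement-per-line form, and any string starting with `http://` or `https://` is treated as a URI rather than a literal.

```python
import json
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://example.org/people/alice"  # hypothetical URI
BOB = "http://example.org/people/bob"      # hypothetical URI

TRIPLES: List[Triple] = [
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "knows", BOB),
]


def is_http_uri(value: str) -> bool:
    # Simplification: anything that looks like an HTTP(S) URI is a URI.
    return value.startswith(("http://", "https://"))


def to_turtle(triples: List[Triple]) -> str:
    """Write each triple as one '<s> <p> o .' statement."""
    lines = []
    for s, p, o in triples:
        if is_http_uri(o):
            obj = f"<{o}>"
        else:
            # Only backslashes and double quotes are escaped in this sketch.
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def to_jsonld(subject: str, name: str, knows: str) -> str:
    """Express the same two statements as a JSON-LD document."""
    doc = {
        "@context": {
            "name": FOAF + "name",
            # "@type": "@id" marks the value as a URI, not a string.
            "knows": {"@id": FOAF + "knows", "@type": "@id"},
        },
        "@id": subject,
        "name": name,
        "knows": knows,
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(to_turtle(TRIPLES))
    print(to_jsonld(ALICE, "Alice", BOB))
```

Both outputs carry the same statements; the formats differ only in syntax, which is why a publisher can offer several of them for the same URI.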


Linked open data

Linked open data are linked data that are open data. Tim Berners-Lee gives the clearest definition of linked open data as differentiated from linked data. Large linked open data sets include DBpedia, Wikibase and Wikidata.


5-star linked open data

In 2010, Tim Berners-Lee suggested a 5-star scheme for grading the quality of open data on the web, for which the highest ranking is Linked Open Data:

* 1 star: data is openly available in some format.
* 2 stars: data is available in a structured format, such as Microsoft Excel file format (.xls).
* 3 stars: data is available in a non-proprietary structured format, such as comma-separated values (.csv).
* 4 stars: data follows W3C standards, like using RDF and employing URIs.
* 5 stars: all of the others, plus links to other Linked Open Data sources.
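
Because each star level includes all the levels below it, the scheme can be read as an ordered checklist. The Python sketch below encodes that reading; the `Dataset` class, its field names, and `star_rating` are hypothetical names chosen for this example, and deciding whether a real dataset meets a criterion is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    openly_available: bool        # 1 star
    structured: bool              # 2 stars, e.g. .xls
    non_proprietary_format: bool  # 3 stars, e.g. .csv
    uses_rdf_and_uris: bool       # 4 stars
    links_to_other_lod: bool      # 5 stars


def star_rating(dataset: Dataset) -> int:
    """Count criteria met in order, stopping at the first one that fails."""
    criteria = [
        dataset.openly_available,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_rdf_and_uris,
        dataset.links_to_other_lod,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An open spreadsheet in a proprietary format stops at two stars.
excel_file = Dataset(True, True, False, False, False)
# The same table published as CSV earns a third star.
csv_file = Dataset(True, True, True, False, False)
# RDF with URIs and links to other Linked Open Data sources earns all five.
linked_open_data = Dataset(True, True, True, True, True)
```

Stopping at the first unmet criterion reflects the cumulative wording of the scheme: a dataset that uses RDF but is not openly available would still score zero in this sketch.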


History

The term "linked open data" has been in use since at least February 2007, when the "Linking Open Data" mailing list was created. The mailing list was initially hosted by the SIMILE project at the Massachusetts Institute of Technology.


Linking Open Data community project

The goal of the W3C Semantic Web Education and Outreach group's Linking Open Data community project is to extend the Web with a data commons by publishing various open datasets as RDF on the Web and by setting RDF links between data items from different data sources. In October 2007, the datasets consisted of over two billion RDF triples, which were interlinked by over two million RDF links. By September 2011 this had grown to 31 billion RDF triples, interlinked by around 504 million RDF links. A detailed statistical breakdown was published in 2014.
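
The distinction between triples and RDF links in these figures can be sketched in a few lines of Python. This is only an illustration, not the method used to produce the statistics: it assumes that the host part of a URI identifies the dataset it belongs to, and the `a.example` and `b.example` hosts, the `vocab/` property URIs, and the function name `count_triples_and_links` are hypothetical. The `owl:sameAs` property URI is a real vocabulary term.

```python
from typing import Iterable, Tuple
from urllib.parse import urlparse

Triple = Tuple[str, str, str]  # (subject, predicate, object)

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
A = "http://a.example/"  # hypothetical dataset A
B = "http://b.example/"  # hypothetical dataset B

SAMPLE = [
    (A + "city/1", A + "vocab/name", "Berlin"),          # literal value
    (A + "city/1", A + "vocab/country", A + "country/1"),  # stays inside A
    (A + "city/1", OWL_SAME_AS, B + "place/42"),         # links A to B
]


def is_http_uri(value: str) -> bool:
    return value.startswith(("http://", "https://"))


def count_triples_and_links(triples: Iterable[Triple]) -> Tuple[int, int]:
    """Return (number of triples, number of cross-dataset RDF links).

    Assumption: a triple counts as an RDF link when its object is an
    HTTP URI whose host differs from the host of its subject.
    """
    total = 0
    links = 0
    for subject, _predicate, obj in triples:
        total += 1
        if is_http_uri(obj) and urlparse(subject).netloc != urlparse(obj).netloc:
            links += 1
    return total, links
```

In the sample, all three statements are triples, but only the `owl:sameAs` statement connects two datasets, so it is the only one counted as a link.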


European Union projects

Several European Union projects involve linked data. These include the linked open data around the clock (LATC) project, the AKN4EU project for machine-readable legislative data, the PlanetData project, the DaPaaS (Data-and-Platform-as-a-Service) project, and the Linked Open Data 2 (LOD2) project. Data linking is one of the main goals of the EU Open Data Portal, which makes available thousands of datasets for anyone to reuse and link.


Ontologies

Ontologies are formal descriptions of data structures. Some of the better known ontologies are:

* FOAF – an ontology describing persons, their properties and relationships
* UMBEL – a lightweight reference structure of subject concept classes and their relationships derived from OpenCyc, which can act as binding classes to external data; also has links to 1.5 million named entities from DBpedia and YAGO


Datasets

* DBpedia – a dataset containing extracted data from Wikipedia; it contains about 3.4 million concepts described by 1 billion triples, including abstracts in 11 different languages
* GeoNames – provides RDF descriptions of geographical features worldwide
* Wikidata – a collaboratively created linked dataset that acts as central storage for the structured data of its Wikimedia Foundation sibling projects
* Global Research Identifier Database (GRID) – an international database of institutions engaged in academic research, with relationships. GRID models two types of relationships: a parent-child relationship that defines a subordinate association, and a related relationship that describes other associations
* KnowWhereGraph – an integrated knowledge graph of 12 billion triples and 30 data layers at the intersection between humans and their environment, built using Semantic Web and Linked Data technologies
* a multilingual open catalogue containing product datasheets, related digital assets and usage statistics


Dataset instance and class relationships

Clickable diagrams that show the individual datasets and their relationships within the DBpedia-spawned LOD cloud (as shown by the figures to the right) are available.


See also

* American Art Collaborative – consortium of US art museums committed to establishing a critical mass of linked open data on American art
* Authority control – about controlled headings in library catalogs
* Citation analysis – for citations between scholarly articles
* data.gov.uk
* Hyperdata
* Network model – an older type of database management system
* Schema.org
* VoID – Vocabulary of Interlinked Datasets
* Web Ontology Language
* List of datasets for machine-learning research


Further reading

* Ahmet Soylu, Felix Mödritscher, and Patrick De Causmaecker. 2012. "Ubiquitous Web Navigation through Harvesting Embedded Semantic Data: A Mobile Scenario." Integrated Computer-Aided Engineering 19 (1): 93–109.
* Linked Data: Evolving the Web into a Global Data Space (2011) by Tom Heath and Christian Bizer, Synthesis Lectures on the Semantic Web: Theory and Technology, Morgan & Claypool
* How to Publish Linked Data on the Web, by Chris Bizer, Richard Cyganiak and Tom Heath, Linked Data Tutorial at Freie Universität Berlin, Germany, 27 July 2007.
* The Web Turns 20: Linked Data Gives People Power, part 1 of 4, by Mark Fischetti, Scientific American, 2010 October 23
* Linked Data Is Merely More Data – Prateek Jain, Pascal Hitzler, Peter Z. Yeh, Kunal Verma, and Amit P. Sheth. In: Dan Brickley, Vinay K. Chaudhri, Harry Halpin, and Deborah McGuinness: Linked Data Meets Artificial Intelligence. Technical Report SS-10-07, AAAI Press, Menlo Park, California, 2010, pp. 82–86.
* Moving beyond sameAs with PLATO: Partonomy detection for Linked Data – Prateek Jain, Pascal Hitzler, Kunal Verma, Peter Z. Yeh, Amit Sheth. In: Proceedings of the 23rd ACM Hypertext and Social Media conference (HT 2012), Milwaukee, WI, USA, June 25–28, 2012.
* Freitas, André, Edward Curry, João Gabriel Oliveira, and Sean O’Riain. 2012. "Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches, and Trends." IEEE Internet Computing 16 (1): 24–33.
* Interlinking Open Data on the Web – Chris Bizer, Tom Heath, Danny Ayers, Yves Raimond. In Proceedings Poster Track, ESWC2007, Innsbruck, Austria
* Ontology Alignment for Linked Open Data – Prateek Jain, Pascal Hitzler, Amit Sheth, Kunal Verma, Peter Z. Yeh. In proceedings of the 9th International Semantic Web Conference, ISWC 2010, Shanghai, China
* Linked open drug data for pharmaceutical research and development – J Cheminform. 2011; 3: 19. Samwald, Jentzsch, Bouton, Kallesøe, Willighagen, Hajagos, Marshall, Prud'hommeaux, Hassenzadeh, Pichler, and Stephens (May 2011)
* Interview with Sören Auer, head of the LOD2 project about the continuation of LOD2 in 2011, June 2011
* Linked Open Data: The Essentials – Florian Bauer and Martin Kaltenböck (January 2012)
* The Flap of a Butterfly Wing – semanticweb.com, Richard Wallis (February 2012)


External links


* LinkedData at the W3C Wiki
* LinkedData.org
* OpenLink Software white papers