
Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. It is a common source of open data that Wikimedia projects such as Wikipedia, and anyone else, are able to use under the CC0 public domain license. Wikidata is a wiki powered by the software MediaWiki, including its extension for semi-structured data, Wikibase. As of early 2025, Wikidata had 1.65 billion item statements (semantic triples).


Concept

Wikidata is a document-oriented database, focusing on ''items'', which represent any kind of topic, concept, or object. Each item is allocated a unique persistent identifier called its ''QID'', a positive integer prefixed with the upper-case letter "Q". This makes it possible to provide translations of the basic information describing the topic each item covers without favouring any particular language. Item ''labels'' do not need to be unique. For example, there are two items named "Elvis Presley": one represents the American singer and actor, and the other represents his self-titled album. However, the combination of a label and its ''description'' must be unique. To avoid ambiguity, an item's QID is hence linked to this combination.
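
The two rules above (the shape of a QID, and the uniqueness of each label and description pair) can be illustrated with a minimal Python sketch. The function names, the regular expression, and the QIDs `Q1001` to `Q1003` are hypothetical placeholders chosen for this example; they are not real Wikidata identifiers, and this is not how Wikidata itself enforces the rule.

```python
import re
from typing import Dict, List, Tuple

# Assumed shape of a QID: "Q" followed by a positive integer (no leading zero).
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def is_valid_qid(text: str) -> bool:
    """Return True if the text looks like a QID as described above."""
    return QID_PATTERN.fullmatch(text) is not None


def find_conflicts(items: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return pairs of QIDs that share the same (label, description)."""
    seen: Dict[Tuple[str, str], str] = {}
    conflicts = []
    for qid, label_and_description in items.items():
        if label_and_description in seen:
            conflicts.append((seen[label_and_description], qid))
        else:
            seen[label_and_description] = qid
    return conflicts


# Placeholder QIDs: the same label is fine as long as the descriptions differ.
items = {
    "Q1001": ("Elvis Presley", "American singer and actor"),
    "Q1002": ("Elvis Presley", "self-titled album"),
}
print(find_conflicts(items))  # []
```

Keying the lookup on the label and description together, rather than on the label alone, is what lets both "Elvis Presley" entries coexist.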


Main parts

Fundamentally, an item consists of:
* An identifier (the QID), related to a label and a description.
* Optionally, multiple aliases and some number of statements (and their properties and values).


Statements

''Statements'' are how any information known about an item is recorded in Wikidata. Formally, they consist of key–value pairs, which match a ''property'' (such as "author" or "publication date") with one or more entity ''values'' (such as "Sir Arthur Conan Doyle" or "1902"). For example, the informal English statement "milk is white" would be encoded by a statement pairing the property "color" with the value "white" under the item "milk". Statements may map a property to more than one value. For example, the "occupation" property for Marie Curie could be linked with the values "physicist" and "chemist", to reflect the fact that she engaged in both occupations. Values may take on many types, including other Wikidata items, strings, numbers, or media files. Properties prescribe what types of values they may be paired with. For example, some properties may only be paired with values of type "URL". Optionally, ''qualifiers'' can be used to refine the meaning of a statement by providing additional information. For example, a "population" statement could be modified with a qualifier such as "point in time (P585): 2011" (as its own key–value pair). Values in the statements may also be annotated with ''references'', pointing to a source backing up the statement's content. As with statements, all qualifiers and references are property–value pairs.
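
The structure described above can be sketched as a small data model in Python. This is an illustration only: the class names `PropertyValue`, `Statement`, and `Item`, the method `values_of`, the population figure, and the reference URL are all assumptions made for this example and do not reflect Wikidata's actual storage format.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class PropertyValue:
    """A single property-value pair, e.g. ("occupation", "physicist")."""
    prop: str
    value: Any


@dataclass
class Statement:
    """A main property-value pair plus optional qualifiers and references."""
    main: PropertyValue
    qualifiers: List[PropertyValue] = field(default_factory=list)
    references: List[PropertyValue] = field(default_factory=list)


@dataclass
class Item:
    """An item with a label, a description, and any number of statements."""
    label: str
    description: str
    statements: List[Statement] = field(default_factory=list)

    def values_of(self, prop: str) -> List[Any]:
        """Collect every value recorded for one property."""
        return [s.main.value for s in self.statements if s.main.prop == prop]


# One property mapped to two values, as in the Marie Curie example.
curie = Item(
    label="Marie Curie",
    description="physicist and chemist",
    statements=[
        Statement(PropertyValue("occupation", "physicist")),
        Statement(PropertyValue("occupation", "chemist")),
    ],
)

# The population figure and the reference URL are made-up placeholders.
city_population = Statement(
    main=PropertyValue("population", 100_000),
    qualifiers=[PropertyValue("point in time", 2011)],
    references=[PropertyValue("reference URL", "https://example.org/census")],
)

print(curie.values_of("occupation"))  # ['physicist', 'chemist']
```

Qualifiers and references reuse the same `PropertyValue` type as the main pair, mirroring the point that they are all property–value pairs.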


Properties

Each property has a numeric identifier prefixed with a capital P and a page on Wikidata with optional label, description, aliases, and statements. As such, there are properties with the sole purpose of describing other properties. Properties may also define more complex rules about their intended usage, termed ''constraints''. For example, the "capital" property includes a "single value constraint", reflecting the reality that (typically) territories have only one capital city. Constraints are treated as testing alerts and hints, rather than inviolable rules. Before a new property is created, it needs to undergo a discussion process.
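
A single value constraint that behaves as a hint rather than a hard rule might look like the following Python sketch. The function name `single_value_warnings` and the sample data are hypothetical; the point is only that the check reports a warning and leaves the statements untouched.

```python
from collections import Counter
from typing import Any, Iterable, List, Tuple


def single_value_warnings(
    statements: List[Tuple[str, Any]],
    single_valued: Iterable[str],
) -> List[str]:
    """Return warnings for properties that should have one value but have more.

    Nothing is rejected or modified: the result is a list of hints.
    """
    counts = Counter(prop for prop, _ in statements)
    return [
        f"'{prop}' has {counts[prop]} values; a single value is expected"
        for prop in sorted(single_valued)
        if counts[prop] > 1
    ]


# Placeholder data: a territory recorded with two capitals.
statements = [
    ("capital", "City A"),
    ("capital", "City B"),
    ("population", 100_000),
]
for warning in single_value_warnings(statements, {"capital"}):
    print(warning)  # 'capital' has 2 values; a single value is expected
```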


Lexemes

In linguistics, a lexeme is a unit of lexical meaning representing a group of words that share the same core meaning and grammatical characteristics. Similarly, Wikidata's ''lexemes'' are items with a structure that makes them more suitable for storing lexicographical data. Since 2016, Wikidata has supported lexicographical entries in the form of lexemes. In Wikidata, lexicographical entries have a different identifier from regular item entries. These entries are prefixed with the letter L, such as in the example entries for book and cow. Lexicographical entries in Wikidata can contain statements, senses, and forms. The use of lexicographical entries in Wikidata allows for the documentation of word usage, the connection between words and items on Wikidata, and word translations, and it enables machine-readable lexicographical data. In 2020, lexicographical entries on Wikidata exceeded 250,000. The language with the most lexicographical entries was Russian, with a total of 101,137 lexemes, followed by English with 38,122 lexemes. There are over 668 languages with lexicographical entries on Wikidata.


Entity schemas

In Wikidata, a schema is a data model that outlines the necessary attributes for a data item. For instance, a data item that uses the attribute "instance of" with the value "human" would typically include attributes such as "place of birth", "date of birth", "date of death", and "place of death". The entity schema in Wikidata uses Shape Expressions (ShEx) to describe the data in Wikidata items in the form of Resource Description Framework (RDF) data. The use of entity schemas in Wikidata helps address data inconsistencies and unchecked vandalism. In January 2019, development started on a new extension for MediaWiki to enable storing ShEx in a separate namespace. This extension has since been installed on Wikidata and enables contributors to use ShEx for validating and describing RDF data in items and lexemes. Entity schemas are stored with different identifiers than those used for items, properties, and lexemes: they use an "E" identifier, such as E10 for the entity schema of human data instances and E270 for the entity schema of building data instances. Any item or lexeme on Wikidata can be validated against an entity schema, which makes it an important tool for quality assurance.
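
The idea of checking an item against a schema can be sketched in Python. This is not ShEx and not how Wikidata's extension validates RDF data; it is a simplified stand-in that only checks whether the attributes named in the example above are present. The names `EXPECTED` and `missing_properties` and the sample values are hypothetical.

```python
from typing import Dict, List

# Expected attributes per "instance of" value, taken from the example in the text.
EXPECTED: Dict[str, List[str]] = {
    "human": ["place of birth", "date of birth", "date of death", "place of death"],
}


def missing_properties(item: Dict[str, list]) -> List[str]:
    """List expected attributes that the item does not provide."""
    missing: List[str] = []
    for kind in item.get("instance of", []):
        for prop in EXPECTED.get(kind, []):
            if not item.get(prop) and prop not in missing:
                missing.append(prop)
    return missing


# Placeholder item with only two of the four typical attributes filled in.
person = {
    "instance of": ["human"],
    "place of birth": ["placeholder city"],
    "date of birth": ["1867-11-07"],
}
print(missing_properties(person))  # ['date of death', 'place of death']
```

A real entity schema expresses far more than presence (value types, cardinalities, nested shapes), so the result here is best read as a quality hint rather than a verdict.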


Content

Wikidata's content collections include data for biographies, medicine, digital humanities, and scholarly metadata through the WikiCite project. It also includes data collections from other open projects, including Freebase.


Development

The creation of the project was funded by donations from the Allen Institute for AI, the Gordon and Betty Moore Foundation, and Google, Inc., totaling €1.3 million. The development of the project is mainly driven by Wikimedia Deutschland under the management of Lydia Pintscher, and was originally split into three phases:
1. Centralising interlanguage links, that is, links between Wikipedia articles about the same topic in different languages.
2. Providing a central place for infobox data for all Wikipedias.
3. Creating and updating list articles based on data in Wikidata and linking to other Wikimedia sister projects, including Meta-Wiki and Wikidata itself (interwiki links).


Initial rollout

Wikidata was launched on 29 October 2012 and was the first new project of the Wikimedia Foundation since 2006. At this time, only the centralization of language links was available. This enabled items to be created and filled with basic information: a label (a name or title), aliases (alternative terms for the label), a description, and links to articles about the topic in all the various language editions of Wikipedia (interwikipedia links). Historically, a Wikipedia article would include a list of interlanguage links (links to articles on the same topic in other editions of Wikipedia, if they existed). Wikidata was originally a self-contained repository of interlanguage links. Wikipedia language editions were still not able to access Wikidata, so they needed to continue to maintain their own lists of interlanguage links. On 14 January 2013, the Hungarian Wikipedia became the first to enable the provision of interlanguage links via Wikidata. This functionality was extended to the Hebrew and Italian Wikipedias on 30 January, to the English Wikipedia on 13 February, and to all other Wikipedias on 6 March. After no consensus was reached over a proposal to restrict the removal of language links from the English Wikipedia, they were automatically removed by bots. On 23 September 2013, interlanguage links went live on Wikimedia Commons.


Statements and data access

On 4 February 2013, statements were introduced to Wikidata entries. The possible values for properties were initially limited to two data types (items and images on Wikimedia Commons), with more data types (such as coordinates and dates) to follow later. The first new type, string, was deployed on 6 March. The ability for the various language editions of Wikipedia to access data from Wikidata was rolled out progressively between 27 March and 25 April 2013. On 16 September 2015, Wikidata began allowing so-called ''arbitrary access'', or access from a given article of a Wikipedia to the statements on Wikidata items not directly connected to it. For example, it became possible to read data about Germany from the Berlin article, which was not feasible before. On 27 April 2016, arbitrary access was activated on Wikimedia Commons. According to a 2020 study, a large proportion of the data on Wikidata consists of entries imported en masse from other databases by Internet bots, which helps to "break down the walls" of data silos.


Query service and other improvements

On 7 September 2015, the Wikimedia Foundation announced the release of the Wikidata Query Service, which lets users run queries on the data contained in Wikidata. The service uses SPARQL as the query language and Blazegraph as its triplestore and graph database. As of November 2018, there were at least 26 different tools that allowed querying the data in different ways. In 2021, Wikimedia Deutschland released the Query Builder, "a form-based query builder to allow people who don't know how to use SPARQL" to write a query.
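
As a rough illustration of what a SPARQL query against the service looks like, the Python sketch below builds a request URL without sending it. Several details are assumptions not stated in the text above: the endpoint `https://query.wikidata.org/sparql`, the `query` and `format` URL parameters, the `wd:` and `wdt:` prefixes, and the identifiers `P31` ("instance of") and `Q146` ("house cat"). The helper name `build_request_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed public endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed example: up to five items whose "instance of" (P31) is "house cat" (Q146).
QUERY = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q146 .
}
LIMIT 5
"""


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode a SPARQL query into a GET URL that asks for JSON results."""
    return endpoint + "?" + urlencode({"query": query.strip(), "format": "json"})


url = build_request_url(QUERY)
print(url)
```

The resulting URL could then be fetched with any HTTP client. The sketch stops short of the network call so that it runs offline.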


Logo

The bars on the logo contain the word "WIKI" encoded in Morse code. It was created by Arun Ganesh and selected through community decision.
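
The encoding can be reproduced with a few lines of Python. The table below covers only the three letters needed and uses the standard International Morse code patterns; the function name `to_morse` is chosen for this example.

```python
# International Morse code for the letters that appear in "WIKI".
MORSE = {"W": ".--", "I": "..", "K": "-.-"}


def to_morse(word: str) -> str:
    """Encode a word letter by letter, separating letters with spaces."""
    return " ".join(MORSE[letter] for letter in word.upper())


print(to_morse("WIKI"))  # .-- .. -.- ..
```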


Reception

In November 2014, Wikidata received the Open Data Publisher Award from the Open Data Institute "for sheer scale, and built-in openness". In December 2014, Google announced that it would shut down Freebase in favor of Wikidata. Wikidata information was used in 58.4% of all English Wikipedia articles, mostly for external identifiers or coordinate locations. In aggregate, data from Wikidata is shown in 64% of all Wikipedias' pages, 93% of all Wikivoyage articles, 34% of all Wikiquotes' pages, 32% of all Wikisources' pages, and 27% of Wikimedia Commons. Wikidata's data was visualized by at least 20 other external tools, and over 300 papers had been published about Wikidata.


Applications

* Wikidata's structured dataset has been used by virtual assistants such as Apple's Siri and Amazon Alexa.
* The Mwnci extension can import data from Wikidata into LibreOffice Calc spreadsheets.
* KDE Itinerary, a privacy-conscious open source travel assistant, uses data from Wikidata.
* Google originally started a frame semantic parser project that aims to parse the information on Wikipedia and transfer it into Wikidata by coming up with relevant statements using artificial intelligence.
* MathQA, a mathematical question answering system.

A systematic literature review of the uses of Wikidata in research was carried out in 2019.


See also

* Abstract Wikipedia
* BabelNet
* DBpedia
* Semantic MediaWiki
* Wikibase
* Wikimedia Enterprise


Further reading

* Claudia Müller-Birn, Benjamin Karran, Janette Lehmann, Markus Luczak-Rösch: "Peer-production system or collaborative ontology development effort: What is Wikidata?" In: OpenSym 2015 – Conference on Open Collaboration, San Francisco, US, 19–21 Aug 2015 (preprint).


External links

* WikidataCon videos on ''media.ccc.de''
* Wikidata Query Builder