Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. It is a common source of open data that Wikimedia projects such as Wikipedia, as well as anyone else, can use under the CC0 public domain license. Wikidata is a wiki powered by the software MediaWiki, including its extension for semi-structured data, the Wikibase. As of early 2025, Wikidata had 1.65 billion item statements (semantic triples).
Concept

Wikidata is a document-oriented database, focusing on ''items'', which represent any kind of topic, concept, or object. Each item is allocated a unique persistent identifier called its ''QID'', a positive integer prefixed with the upper-case letter "Q". This makes it possible to provide translations of the basic information describing the topic each item covers without favouring any particular language.
Some examples of items and their QIDs are , , , , and .
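Because a QID is just the letter "Q" followed by a positive integer, a tool can check whether a string is shaped like a QID before looking it up. The sketch below is a minimal illustration under one assumption of our own: that leading zeros (as in "Q007") are not valid. The function names are hypothetical.

```python
import re

# A QID is the upper-case letter "Q" followed by a positive integer.
# Assumption: no leading zeros, so "Q0" and "Q007" are rejected.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def is_valid_qid(identifier: str) -> bool:
    """Return True if `identifier` is shaped like a QID."""
    return QID_PATTERN.fullmatch(identifier) is not None


def qid_number(identifier: str) -> int:
    """Return the integer part of a QID, e.g. 42 for "Q42"."""
    if not is_valid_qid(identifier):
        raise ValueError(f"not a QID: {identifier!r}")
    return int(identifier[1:])


print(is_valid_qid("Q42"))  # True
print(is_valid_qid("q42"))  # False (lower-case prefix)
print(qid_number("Q42"))    # 42
```

This only checks the format; whether an item with that number actually exists can only be answered by Wikidata itself.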
Item ''labels'' do not need to be unique. For example, there are two items named "Elvis Presley": , which represents the American singer and actor, and , which represents his self-titled album. However, the combination of a label and its ''description'' must be unique, so each QID is tied to a distinct label–description pair and ambiguity is avoided.
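The uniqueness rule can be checked by grouping items on the (label, description) pair rather than on the label alone. This is a minimal sketch with hypothetical QIDs; it also simplifies by ignoring languages, whereas on Wikidata, as we understand it, the rule is applied per language.

```python
from collections import defaultdict


def find_conflicts(items):
    """Return label-description pairs shared by more than one QID.

    `items` is an iterable of (qid, label, description) tuples. A repeated
    label on its own is allowed; only a repeated (label, description) pair
    is reported. Simplification: everything is in a single language.
    """
    by_pair = defaultdict(list)
    for qid, label, description in items:
        by_pair[(label, description)].append(qid)
    return {pair: qids for pair, qids in by_pair.items() if len(qids) > 1}


# Hypothetical QIDs: the first two share a label but differ in description.
items = [
    ("Q1001", "Elvis Presley", "American singer and actor"),
    ("Q1002", "Elvis Presley", "self-titled album"),
    ("Q1003", "Elvis Presley", "American singer and actor"),
]
print(find_conflicts(items))
# {('Elvis Presley', 'American singer and actor'): ['Q1001', 'Q1003']}
```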
Main parts

Fundamentally, an item consists of:
* An identifier (the QID), related to a label and a description.
* Optionally, multiple aliases and some number of statements (and their properties and values).
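The parts listed above map naturally onto a small record type. The sketch below is a simplified, single-language model of our own (the `Item` class, its field names, and the QID are hypothetical), not Wikibase's actual data format.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Simplified model of a Wikidata item (hypothetical, single language)."""

    qid: str          # persistent identifier such as "Q1001"
    label: str        # main name; need not be unique
    description: str  # distinguishes items that share a label
    aliases: list[str] = field(default_factory=list)
    # property -> list of values, one entry per statement
    statements: dict[str, list[object]] = field(default_factory=dict)

    def names(self) -> list[str]:
        """All names the item can be found under: label first, then aliases."""
        return [self.label, *self.aliases]


item = Item("Q1001", "Elvis Presley", "American singer and actor",
            aliases=["Elvis", "Elvis Aaron Presley"])
print(item.names())  # ['Elvis Presley', 'Elvis', 'Elvis Aaron Presley']
```

Aliases and statements default to empty, matching the "optionally" in the list.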
Statements
''Statements'' are how any information known about an item is recorded in Wikidata. Formally, they consist of key–value pairs, which match a ''property'' (such as "author" or "publication date") with one or more entity ''values'' (such as "Sir Arthur Conan Doyle" or "1902"). For example, the informal English statement "milk is white" would be encoded by a statement pairing the property with the value under the item .
Statements may map a property to more than one value. For example, the "occupation" property for Marie Curie could be linked with the values "physicist" and "chemist", reflecting the fact that she engaged in both occupations.
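A property that can carry several values is easy to model as a mapping from property to a list of values. This sketch uses plain English strings for properties and values instead of P- and Q-identifiers, which is a simplification; `add_statement` is a hypothetical helper.

```python
def add_statement(statements: dict[str, list[str]], prop: str, value: str) -> None:
    """Record `value` under `prop`, keeping any values already stored there."""
    statements.setdefault(prop, []).append(value)


curie: dict[str, list[str]] = {}
add_statement(curie, "occupation", "physicist")
add_statement(curie, "occupation", "chemist")
print(curie)  # {'occupation': ['physicist', 'chemist']}
```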
Values may take on many types including other Wikidata items, strings, numbers, or media files. Properties prescribe what types of values they may be paired with. For example, the property may only be paired with values of type "URL".
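Since each property prescribes the type of value it accepts, an editing tool can reject a mismatched value before saving it. The sketch below is an illustration only: the property names (including "homepage") and the datatype table are hypothetical, and the type checks are deliberately rough.

```python
from urllib.parse import urlparse

# Hypothetical property -> datatype table; on Wikidata each property
# declares its own datatype on its property page.
PROPERTY_DATATYPES = {
    "occupation": "item",
    "homepage": "url",
    "population": "quantity",
}


def _is_url(value: object) -> bool:
    """Rough URL check: an http(s) scheme and a host name."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


CHECKS = {
    "item": lambda v: isinstance(v, str) and v.startswith("Q") and v[1:].isdigit(),
    "url": _is_url,
    "quantity": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def value_allowed(prop: str, value: object) -> bool:
    """True if `value` matches the datatype prescribed for `prop`."""
    return CHECKS[PROPERTY_DATATYPES[prop]](value)


print(value_allowed("homepage", "https://www.wikidata.org"))  # True
print(value_allowed("homepage", 42))                          # False
```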
Optionally, ''qualifiers'' can be used to refine the meaning of a statement by providing additional information. For example, a "population" statement could be modified with a qualifier such as "point in time (P585): 2011" (as its own key-value pair). Values in the statements may also be annotated with ''references'', pointing to a source backing up the statement's content. As with statements, all qualifiers and references are property–value pairs.
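Qualifiers and references both hang off a single statement and are themselves property–value pairs. The sketch below models that shape with a hypothetical `Statement` class; the population figure and the reference text are placeholders, not real data.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One property-value claim plus its qualifiers and references (simplified)."""

    prop: str
    value: object
    # Qualifiers refine the claim; references point to sources backing it.
    qualifiers: dict[str, object] = field(default_factory=dict)
    references: list[dict[str, object]] = field(default_factory=list)


population = Statement(
    prop="population",
    value=100000,                               # placeholder, not real data
    qualifiers={"point in time (P585)": 2011},
    references=[{"stated in": "hypothetical census report"}],
)
print(population.qualifiers["point in time (P585)"])  # 2011
```

Keeping references as a list allows a statement to cite several sources, each with its own property–value pairs.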
Properties

Each property has a numeric identifier prefixed with a capital P and a page on Wikidata with optional label, description, aliases, and statements. As such, there are properties with the sole purpose of describing other properties, such as .
Properties may also define more complex rules about their intended usage, termed ''constraints''. For example, the property includes a "single value constraint", reflecting the reality that (typically) territories have only one capital city. Constraints are treated as testing alerts and hints, rather than inviolable rules.
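Because constraints act as alerts rather than hard rules, a checker only needs to report violations, not block edits. The sketch below illustrates a single-value check; the "capital" property name, the QIDs, and the city names are hypothetical, and the function is our own illustration rather than Wikidata's constraint system.

```python
def single_value_violations(statements_by_item: dict[str, dict[str, list[str]]],
                            prop: str) -> list[str]:
    """Return a warning for every item that holds more than one value for `prop`.

    Nothing is rejected or modified: like Wikidata constraints, the result is
    just a list of hints for editors to review.
    """
    warnings = []
    for qid, statements in statements_by_item.items():
        values = statements.get(prop, [])
        if len(values) > 1:
            warnings.append(f"{qid}: {prop} has {len(values)} values: {', '.join(values)}")
    return warnings


territories = {
    "Q2001": {"capital": ["City A"]},
    "Q2002": {"capital": ["City B", "City C"]},
}
print(single_value_violations(territories, "capital"))
# ['Q2002: capital has 2 values: City B, City C']
```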
Before a new property is created, it needs to undergo a discussion process.
The most used property is , which is used on more than item pages.
Lexemes
In linguistics, a lexeme is a unit of lexical meaning representing a group of words that share the same core meaning and grammatical characteristics. Similarly, Wikidata's ''lexemes'' are items with a structure that makes them more suitable to store lexicographical data. Since 2016, Wikidata has supported lexicographical entries in the form of lexemes.
In Wikidata, lexicographical entries have a different identifier from regular item entries. These entries are prefixed with the letter L, such as in the example entries for book and cow. Lexicographical entries in Wikidata can contain statements, senses, and forms. The use of lexicographical entries in Wikidata allows for the documentation of word usage, the connection between words and items on Wikidata, word translations, and enables machine-readable lexicographical data.
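A lexeme groups its written forms and its senses under one L-prefixed identifier. The sketch below is a simplified model of that structure; the class and field names and the identifier "L9999" are hypothetical, and the real Wikibase lexeme format carries more detail.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    representation: str  # the written form, e.g. "books"
    grammatical_features: list[str] = field(default_factory=list)


@dataclass
class Sense:
    gloss: str  # short description of one meaning


@dataclass
class Lexeme:
    lid: str               # "L" followed by a number
    lemma: str
    language: str
    lexical_category: str
    forms: list[Form] = field(default_factory=list)
    senses: list[Sense] = field(default_factory=list)
    statements: dict[str, list[str]] = field(default_factory=dict)


book = Lexeme(
    lid="L9999",           # hypothetical identifier
    lemma="book",
    language="English",
    lexical_category="noun",
    forms=[Form("book", ["singular"]), Form("books", ["plural"])],
    senses=[Sense("written or printed work")],
)
print([f.representation for f in book.forms])  # ['book', 'books']
```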
In 2020, lexicographical entries on Wikidata exceeded 250,000. The language with the most lexicographical entries was Russian, with a total of 101,137 lexemes, followed by English with 38,122 lexemes. There are over 668 languages with lexicographical entries on Wikidata.
Entity schemas

In Wikidata, a schema is a data model that outlines the necessary attributes for a data item. For instance, a data item that uses the attribute "instance of" with the value "human" would typically include attributes such as "place of birth", "date of birth", "date of death", and "place of death". The entity schema in Wikidata uses Shape Expressions (ShEx) to describe the data in Wikidata items in the form of Resource Description Framework (RDF). The use of entity schemas in Wikidata helps address data inconsistencies and unchecked vandalism.
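The core idea of the example above, that items which are an "instance of" "human" are expected to carry certain properties, can be approximated in a few lines. This is explicitly not ShEx: it is a plain Python sketch that only checks whether expected properties are present, and the schema contents and helper names are our own illustration.

```python
# Hypothetical, simplified stand-in for an entity schema for humans.
HUMAN_SCHEMA = {"place of birth", "date of birth", "date of death", "place of death"}


def missing_properties(statements: dict[str, list[str]], schema: set[str]) -> set[str]:
    """Return the properties the schema expects that are absent or empty."""
    return {prop for prop in schema if not statements.get(prop)}


def check_against_schema(statements: dict[str, list[str]], schema: set[str]):
    """Apply the schema only to items whose "instance of" includes "human"."""
    if "human" not in statements.get("instance of", []):
        return None  # schema does not apply to this item
    return missing_properties(statements, schema)


curie = {
    "instance of": ["human"],
    "date of birth": ["1867-11-07"],
    "place of birth": ["Warsaw"],
}
print(sorted(check_against_schema(curie, HUMAN_SCHEMA)))
# ['date of death', 'place of death']
```

Real ShEx shapes can also express value types and cardinalities (for example, that "date of death" is optional for living people), which this presence check ignores.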
In January 2019, development started on a new MediaWiki extension to enable storing ShEx in a separate namespace. Entity schemas are stored with different identifiers from those used for items, properties, and lexemes: they carry an "E" prefix, such as E10 for the entity schema of human data instances and E270 for the entity schema of building data instances. This extension has since been installed on Wikidata and enables contributors to use ShEx for validating and describing Resource Description Framework data in items and lexemes. Any item or lexeme on Wikidata can be validated against an entity schema, which makes entity schemas an important tool for quality assurance.
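Since each kind of entity has its own identifier prefix (Q for items, P for properties, L for lexemes, E for entity schemas), the kind of entity can be read directly off the identifier. A minimal sketch; it ignores sub-identifiers such as those of lexeme forms and senses, and the function name is hypothetical.

```python
import re

# Prefix -> entity type: Q items, P properties, L lexemes, E entity schemas.
ENTITY_TYPES = {"Q": "item", "P": "property", "L": "lexeme", "E": "entity schema"}
ID_PATTERN = re.compile(r"([QPLE])([1-9][0-9]*)")


def entity_type(identifier: str) -> str:
    """Classify a Wikidata-style identifier by its one-letter prefix."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    return ENTITY_TYPES[match.group(1)]


for ident in ["Q42", "P585", "E10", "E270"]:
    print(ident, entity_type(ident))
# Q42 item
# P585 property
# E10 entity schema
# E270 entity schema
```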
Content

Wikidata's content collections include data for biographies, medicine, and digital humanities, as well as scholarly metadata through the WikiCite project.
It includes data collections from other open projects, including Freebase.
Development
The creation of the project was funded by donations from the Allen Institute for AI, the Gordon and Betty Moore Foundation, and Google, Inc., totaling €1.3 million. The development of the project is mainly driven by Wikimedia Deutschland under the management of Lydia Pintscher, and was originally split into three phases:
# Centralising interlanguage links – links between Wikipedia articles about the same topic in different languages.
# Providing a central place for infobox data for all Wikipedias.
# Creating and updating list articles based on data in Wikidata and linking to other Wikimedia sister projects, including Meta-Wiki and Wikidata itself (interwikilinks).
Initial rollout
Wikidata was launched on 29 October 2012 and was the first new project of the Wikimedia Foundation since 2006.
At this time, only the centralization of language links was available. This enabled items to be created and filled with basic information: a label – a name or title, aliases – alternative terms for the label, a description, and links to articles about the topic in all the various language editions of Wikipedia (interwikipedia links).
Historically, a Wikipedia article would include a list of interlanguage links (links to articles on the same topic in other editions of Wikipedia, if they existed). Wikidata was originally a self-contained repository of interlanguage links. At first, Wikipedia language editions were not yet able to access Wikidata, so they had to continue maintaining their own lists of interlanguage links.
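Centralising interlanguage links means storing one record per topic, mapping each wiki to its article title, and deriving each article's link list from it instead of maintaining a copy in every language edition. The sketch below illustrates that derivation; the site codes follow Wikidata's "enwiki"-style naming, while the titles and helper name are illustrative.

```python
# One central record per topic: wiki -> article title (illustrative titles).
sitelinks = {
    "enwiki": "Marie Curie",
    "plwiki": "Maria Skłodowska-Curie",
    "frwiki": "Marie Curie",
}


def interlanguage_links(sitelinks: dict[str, str], current_wiki: str) -> dict[str, str]:
    """Links an article on `current_wiki` should show: every other edition's article."""
    return {wiki: title for wiki, title in sitelinks.items() if wiki != current_wiki}


print(interlanguage_links(sitelinks, "enwiki"))
# {'plwiki': 'Maria Skłodowska-Curie', 'frwiki': 'Marie Curie'}
```

Adding a new language edition's article now means editing one record, and every other edition's link list picks it up.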
On 14 January 2013, the Hungarian Wikipedia became the first to enable the provision of interlanguage links via Wikidata. This functionality was extended to the Hebrew and Italian Wikipedias on 30 January, to the English Wikipedia on 13 February and to all other Wikipedias on 6 March.
After no consensus was reached over a proposal to restrict the removal of language links from the English Wikipedia, they were automatically removed by bots. On 23 September 2013, interlanguage links went live on Wikimedia Commons.
Statements and data access
On 4 February 2013, statements were introduced to Wikidata entries. The possible values for properties were initially limited to two data types (items and images on Wikimedia Commons), with more data types (such as coordinates and dates) to follow later. The first new type, string, was deployed on 6 March.
The ability for the various language editions of Wikipedia to access data from Wikidata was rolled out progressively between 27 March and 25 April 2013. On 16 September 2015, Wikidata began allowing so-called ''arbitrary access'', or access from a given article of a Wikipedia to the statements on Wikidata items not directly connected to it. For example, it became possible to read data about Germany from the Berlin article, which was not feasible before. On 27 April 2016, arbitrary access was activated on Wikimedia Commons.
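The difference between the two access modes can be illustrated with an in-memory stand-in: before arbitrary access, an article could only read the item connected to it through its sitelink; afterwards, it can follow a QID to any item. Everything below (the QIDs, the store layout, and the function names) is hypothetical and says nothing about how MediaWiki implements this.

```python
# Hypothetical store: QID -> item data, and (wiki, title) -> connected QID.
ITEMS = {
    "Q1001": {"label": "Germany", "capital": ["Berlin"]},
    "Q1002": {"label": "Berlin", "country": ["Q1001"]},
}
SITELINKS = {("enwiki", "Berlin"): "Q1002", ("enwiki", "Germany"): "Q1001"}


def connected_item(wiki: str, title: str) -> dict:
    """Before arbitrary access: an article reads only its own connected item."""
    return ITEMS[SITELINKS[(wiki, title)]]


def arbitrary_item(qid: str) -> dict:
    """With arbitrary access: an article can read any item by its QID."""
    return ITEMS[qid]


berlin = connected_item("enwiki", "Berlin")
germany = arbitrary_item(berlin["country"][0])  # follow the link to another item
print(germany["label"])  # Germany
```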
According to a 2020 study, a large proportion of the data on Wikidata consists of entries imported en masse from other databases by Internet bots, which helps to "break down the walls" of data silos.
Query service and other improvements
On 7 September 2015, the Wikimedia Foundation announced the release of the Wikidata Query Service, which lets users run queries on the data contained in Wikidata. The service uses SPARQL as its query language and Blazegraph as its triplestore and graph database. As of November 2018, there were at least 26 different tools that allow querying the data in different ways.
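As a sketch of what a query looks like, the code below asks for five items that are an instance of (P31) human (Q5) and sends the query to the public endpoint with Python's standard library. The endpoint URL and the SPARQL JSON result layout are standard; the helper names and the User-Agent string are placeholders of our own, and the endpoint's usage policies may change.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Five items that are "instance of" (P31) "human" (Q5).
QUERY = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q5 .
}
LIMIT 5
"""


def build_request(query: str,
                  user_agent: str = "example-script/0.1 (contact: you@example.org)"
                  ) -> urllib.request.Request:
    """Build a GET request for the query service (placeholder User-Agent)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,
    })


def run_query(query: str) -> list[dict]:
    """Send the query (requires network access) and return the result rows."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Calling `run_query(QUERY)` returns a list of rows, each mapping the variable name `item` to a dictionary whose `"value"` field holds the item's entity URI.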
In 2021, Wikimedia Deutschland released the Query Builder, "a form-based query builder to allow people who don't know how to use SPARQL" to write a query.
Logo
The bars on the logo contain the word "WIKI" encoded in Morse code. The logo was created by Arun Ganesh and selected through community decision.
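For reference, this is the Morse encoding of "WIKI" itself. The sketch covers only the three letters needed; how the logo maps dots and dashes onto bar widths is a design detail not described here.

```python
# International Morse code for the letters in "WIKI" (a small subset of the table).
MORSE = {"W": ".--", "I": "..", "K": "-.-"}


def to_morse(word: str) -> str:
    """Encode `word` letter by letter, separating letters with spaces."""
    return " ".join(MORSE[letter] for letter in word.upper())


print(to_morse("WIKI"))  # .-- .. -.- ..
```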
Reception
In November 2014, Wikidata received the Open Data Publisher Award from the Open Data Institute "for sheer scale, and built-in openness".
In December 2014, Google announced that it would shut down Freebase in favor of Wikidata.
, Wikidata information was used in 58.4% of all English Wikipedia articles, mostly for external identifiers or coordinate locations. In aggregate, data from Wikidata is shown in 64% of all Wikipedias' pages, 93% of all Wikivoyage articles, 34% of all Wikiquotes' pages, 32% of all Wikisources' pages, and 27% of Wikimedia Commons pages.
, Wikidata's data was visualized by at least 20 other external tools and over 300 papers have been published about Wikidata.
Applications
* Wikidata's structured dataset has been used by virtual assistants such as Apple's Siri and Amazon Alexa.
* The Mwnci extension can import data from Wikidata into LibreOffice Calc spreadsheets.
* KDE Itinerary – a privacy-conscious, open-source travel assistant that uses data from Wikidata
* Google started a frame semantic parser project that aimed to parse the information on Wikipedia and transfer it into Wikidata by generating relevant statements using artificial intelligence.
* MathQA – a mathematical question answering system
A systematic literature review of the uses of Wikidata in research was carried out in 2019.
See also
* Abstract Wikipedia
* BabelNet
* DBpedia
* Semantic MediaWiki
* Wikibase
* Wikimedia Enterprise
Notes
References
Further reading
*
* Claudia Müller-Birn, Benjamin Karran, Janette Lehmann, Markus Luczak-Rösch: 'Peer-production system or collaborative ontology development effort: What is Wikidata?' In: OpenSym 2015 – Conference on Open Collaboration, San Francisco, US, 19–21 Aug 2015 (preprint).
External links
*
* Videos: WikidataCon on ''media.ccc.de''
* Wikidata Query Builder