SPARQL (pronounced "sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language, that is, a semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium (W3C), and is recognized as one of the key technologies of the semantic web. On 15 January 2008, SPARQL 1.0 was acknowledged by W3C as an official recommendation, and SPARQL 1.1 in March 2013.

Figure: screenshot of the Wikidata Query Service, October 2021.

SPARQL allows for a query to consist of triple patterns, conjunctions, disjunctions, and optional patterns. Implementations for multiple programming languages exist. There are tools that allow one to connect to a SPARQL endpoint and semi-automatically construct a SPARQL query for it, for example ViziQuer. In addition, tools exist to translate SPARQL queries to other query languages, for example to SQL and to XQuery.

Features

SPARQL allows users to write queries that follow the RDF specification of the W3C. Thus, the entire dataset is "subject-predicate-object" triples. Subjects and predicates are always URI identifiers, but objects can be URIs or literal values. This single physical schema of 3 "columns" is hypernormalized in that what would be 1 relational record with (for example) 4 columns is now 4 triples, with the subject being repeated over and over, the predicate essentially being the column name, and the object being the column value. Although this seems unwieldy, the SPARQL syntax offers these features:

1. Subjects and objects can be used to find the other, including transitively

Below is a set of triples. It should be clear that ex:sw001 and ex:sw002 link to ex:sw003, which itself has links:

    ex:sw001 ex:linksWith ex:sw003 .
    ex:sw002 ex:linksWith ex:sw003 .
    ex:sw003 ex:linksWith ex:sw004 , ex:sw006 .
    ex:sw004 ex:linksWith ex:sw005 .

In SPARQL, the first time a variable is encountered in the expression pipeline, it is populated with a result. The second and subsequent times it is seen, it is used as an input. If we assign ("bind") the URI ex:sw003 to the ?targets variable, then it drives a result into ?src; this tells us all the things that link to ex:sw003 (upstream dependency):

    SELECT * WHERE

But with a simple switch of the binding variable, the behavior is reversed. This will produce all the things upon which ex:sw003 depends (downstream dependency):

    SELECT * WHERE

Even more attractive is that we can easily instruct SPARQL to transitively follow the path:

    SELECT * WHERE

Bound variables can therefore also be lists and will be operated upon without complicated syntax. The effect of this is similar to the following pseudocode:

    If ?S is bound to (ex:A, ex:B) and ?O is unbound, then
        ?S ex:linksWith ?O
    behaves like a forward chain:
        for each s in ?S:
            for each fetch (s, ex:linksWith):
                capture o
                append o to ?O

    If ?O is bound to (ex:A, ex:B) and ?S is unbound, then
        ?S ex:linksWith ?O
    behaves like a backward chain:
        for each o in ?O:
            for each fetch (ex:linksWith, o):
                capture s
                append s to ?S

2. SPARQL expressions are a pipeline

Unlike SQL, which has subqueries and CTEs, SPARQL is much more like MongoDB or Spark. Expressions are evaluated exactly in the order they are declared, including filtering and joining of data. The programming model resembles what a SQL statement would be like with multiple WHERE clauses. The combination of list-aware subjects and objects plus a pipeline approach can yield extremely expressive queries spanning many different domains of data. JOIN as used in an RDBMS, and understanding the dynamics of the JOIN (e.g. what column in what table is suitable to join to another, inner vs. outer, etc.), is not relevant in SPARQL (which is in some ways simpler) because objects, if a URI and not a literal, can implicitly be used only to find a subject. Here is a more comprehensive example that illustrates the pipeline using some syntax shortcuts.

    # SELECT only the terminal values we need. If we did SELECT * (which
    # is not necessarily bad), then "intermediate" variables ?vendor and ?owner
    # would be part of the output.
    SELECT ?slbl ?vlbl ?lei ?lname WHERE

Unlike relational databases, the object column is heterogeneous: the object data type, if not a URI, is usually implied (or specified in the ontology) by the predicate value. Literal nodes carry type information consistent with the underlying XSD namespace, including signed and unsigned short and long integers, single and double precision floats, datetime, penny-precise decimal, Boolean, and string. Triple store implementations on traditional relational databases will typically store the value as a string, with a fourth column identifying the real type. Polymorphic databases such as MongoDB and SQLite can store the native value directly in the object field.

Thus, SPARQL provides a full set of analytic query operations such as JOIN, SORT, AGGREGATE for data whose schema is intrinsically part of the data rather than requiring a separate schema definition. However, schema information (the ontology) is often provided externally, to allow joining of different datasets unambiguously. In addition, SPARQL provides specific graph traversal syntax for data that can be thought of as a graph.

The example below demonstrates a simple query that leverages the ontology definition foaf ("friend of a friend"). Specifically, the following query returns names and emails of every person in the dataset:

    PREFIX foaf:
    SELECT ?name ?email
    WHERE

This query joins all of the triples with a matching subject, where the type predicate, "a", is a person (foaf:Person), and the person has one or more names (foaf:name) and mailboxes (foaf:mbox).

For the sake of readability, the author of this query chose to reference the subject using the variable name "?person". Since the first element of the triple is always the subject, the author could have just as easily used any variable name, such as "?subj" or "?x". Whatever name is chosen, it must be the same on each line of the query to signify that the query engine is to join triples with the same subject.

The result of the join is a set of rows: ?person, ?name, ?email. This query returns the ?name and ?email because ?person is often a complex URI rather than a human-friendly string. Note that any ?person may have multiple mailboxes, so in the returned set, a ?name row may appear multiple times, once for each mailbox, duplicating the ?name.

An important consideration in SPARQL is that when lookup conditions are not met in the pipeline for terminal entities like ?email, then the whole row is excluded, unlike SQL where typically a null column is returned. The query above will return only those ?person where both at least one ?name and at least one ?email can be found. If a ?person had no email, they would be excluded. To align the output with that expected from an equivalent SQL query, the OPTIONAL keyword is required:

    PREFIX foaf:
    SELECT ?name ?email
    WHERE

This query can be distributed to multiple SPARQL endpoints (services that accept SPARQL queries and return results), computed, and results gathered, a procedure known as federated query.

Whether in a federated manner or locally, additional triple definitions in the query could allow joins to different subject types, such as automobiles, to allow simple queries, for example, to return a list of names and emails for people who drive automobiles with a high fuel efficiency.
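The difference between a required mailbox pattern and an OPTIONAL one can be sketched outside SPARQL. The following Python is only an illustration of the row semantics described above, not how any SPARQL engine is implemented; the triples, the `ex:` subjects, and the function names are assumptions made up for the example.

```python
# Toy data: (subject, predicate, object) triples. All names are illustrative.
TRIPLES = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:alice", "foaf:mbox", "mailto:a.smith@example.org"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
]


def objects(triples, subject, predicate):
    """All objects for a given subject and predicate, in data order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def names_and_emails(triples, email_optional=False):
    """Rows of (name, email) for every foaf:Person.

    With email_optional=False, a person without a mailbox yields no row.
    With email_optional=True, the mailbox pattern behaves like an OPTIONAL
    block: the row is kept and the email is left unbound (None).
    """
    rows = []
    people = [s for s, p, o in triples if p == "rdf:type" and o == "foaf:Person"]
    for person in people:
        for name in objects(triples, person, "foaf:name"):
            emails = objects(triples, person, "foaf:mbox")
            if not emails and email_optional:
                emails = [None]  # keep the row, leave ?email unbound
            for email in emails:
                rows.append((name, email))
    return rows


if __name__ == "__main__":
    for row in names_and_emails(TRIPLES, email_optional=True):
        print(row)
```

In this toy data Alice appears once per mailbox, and Bob appears only when the mailbox pattern is treated as optional.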

Query forms

In the case of queries that read data from the database, the SPARQL language specifies four different query variations for different purposes.

* SELECT query: Used to extract raw values from a SPARQL endpoint; the results are returned in a table format.
* CONSTRUCT query: Used to extract information from the SPARQL endpoint and transform the results into valid RDF.
* ASK query: Used to provide a simple True/False result for a query on a SPARQL endpoint.
* DESCRIBE query: Used to extract an RDF graph from the SPARQL endpoint, the content of which is left to the endpoint to decide, based on what the maintainer deems as useful information.

Each of these query forms takes a WHERE block to restrict the query, although, in the case of the DESCRIBE query, the WHERE is optional.

SPARQL 1.1 specifies a language for updating the database with several new query forms.
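As a rough illustration of how three of these forms differ in the shape of their result, the Python sketch below evaluates one pattern (`?src ex:linksWith <target>`) over the ex:linksWith triples from the Features section and returns a table, a Boolean, or new triples. The helper names and the `ex:linkedFrom` predicate are hypothetical; DESCRIBE is left out because its output is decided by the endpoint.

```python
# The ex:linksWith triples from the Features section, as Python tuples.
TRIPLES = [
    ("ex:sw001", "ex:linksWith", "ex:sw003"),
    ("ex:sw002", "ex:linksWith", "ex:sw003"),
    ("ex:sw003", "ex:linksWith", "ex:sw004"),
    ("ex:sw003", "ex:linksWith", "ex:sw006"),
    ("ex:sw004", "ex:linksWith", "ex:sw005"),
]


def bindings(triples, target):
    """Solutions of the pattern `?src ex:linksWith <target>`."""
    return [
        {"src": s}
        for s, p, o in triples
        if p == "ex:linksWith" and o == target
    ]


def select(triples, target):
    """SELECT-like result: a table with one row per solution."""
    return [(b["src"],) for b in bindings(triples, target)]


def ask(triples, target):
    """ASK-like result: True if at least one solution exists."""
    return len(bindings(triples, target)) > 0


def construct(triples, target, new_predicate="ex:linkedFrom"):
    """CONSTRUCT-like result: new triples built from each solution.

    `ex:linkedFrom` is a made-up predicate used only for this sketch.
    """
    return [(target, new_predicate, b["src"]) for b in bindings(triples, target)]


if __name__ == "__main__":
    print(select(TRIPLES, "ex:sw003"))
    print(ask(TRIPLES, "ex:sw001"))
    print(construct(TRIPLES, "ex:sw003"))
```

All three functions share the same pattern-matching step and differ only in how the solutions are packaged, which mirrors the way the query forms share a WHERE block.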

Example

Another SPARQL query example that models the question "What are all the country capitals in Africa?":

    PREFIX ex: <http://example.com/exampleOntology#>
    SELECT ?capital ?country
    WHERE

Variables are indicated by a ? or $ prefix. Bindings for ?capital and ?country will be returned. When a triple ends with a semicolon, the subject from this triple will implicitly complete the following pair to an entire triple. So, for example, ex:isCapitalOf ?y is short for ?x ex:isCapitalOf ?y.

The SPARQL query processor will search for sets of triples that match these four triple patterns, binding the variables in the query to the corresponding parts of each triple. Important to note here is the "property orientation" (class matches can be conducted solely through class-attributes or properties; see Duck typing).

To make queries concise, SPARQL allows the definition of prefixes and base URIs in a fashion similar to Turtle. In this query, the prefix "ex" stands for "http://example.com/exampleOntology#".

SPARQL has native dateTime operations as well. Here is a query that will return all pieces of software where the EOL date is greater than or equal to 1000 days from the release date and the release year is 2020 or greater:

    SELECT ?lbl ?version ?released ?eol ?duration
    WHERE
    ORDER BY DESC(?duration)
    LIMIT 5
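The filtering, ordering, and limiting that the dateTime query describes can be mirrored in Python as a rough sketch. The software records and the function name below are invented for illustration; only the conditions (at least 1000 days between release and EOL, release year 2020 or later, longest duration first, at most 5 rows) come from the description above.

```python
from datetime import datetime, timedelta

# Hypothetical records: (label, version, released, eol).
SOFTWARE = [
    ("Alpha", "1.0", datetime(2020, 1, 1), datetime(2023, 1, 1)),
    ("Beta", "2.1", datetime(2021, 6, 1), datetime(2022, 6, 1)),
    ("Gamma", "3.0", datetime(2019, 3, 1), datetime(2024, 3, 1)),
    ("Delta", "0.9", datetime(2022, 2, 1), datetime(2026, 2, 1)),
]


def long_lived(records, min_days=1000, min_year=2020, limit=5):
    """Rows of (label, version, released, eol, duration), longest first."""
    rows = []
    for label, version, released, eol in records:
        duration = eol - released  # a timedelta, like ?duration in the query
        if duration >= timedelta(days=min_days) and released.year >= min_year:
            rows.append((label, version, released, eol, duration))
    # Equivalent of ORDER BY DESC(?duration) followed by LIMIT.
    rows.sort(key=lambda row: row[4], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for label, version, released, eol, duration in long_lived(SOFTWARE):
        print(label, version, released.date(), eol.date(), duration.days)
```

Beta is dropped because its support window is shorter than 1000 days, and Gamma is dropped because it was released before 2020.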

Extensions

GeoSPARQL defines filter functions for geographic information system (GIS) queries using well-understood OGC standards (GML, WKT, etc.).

SPARUL is another extension to SPARQL. It enables the RDF store to be updated with this declarative query language, by adding INSERT and DELETE methods.

XSPARQL is an integrated query language combining XQuery with SPARQL to query both XML and RDF data sources at once.

Implementations

Open source, reference SPARQL implementations:

* Eclipse RDF4J, formerly OpenRDF Sesame
* Apache Jena
* OpenLink Virtuoso

See List of SPARQL implementations for more comprehensive coverage, including triplestores, APIs, and other storage systems that have implemented the SPARQL standard.

