Versions
There are several versions of XPath in use. XPath 1.0 was published in 1999, XPath 2.0 in 2007 (with a second edition in 2010), XPath 3.0 in 2014, and XPath 3.1 in 2017. However, XPath 1.0 is still the most widely available version.
* XPath 1.0 became a Recommendation on 16 November 1999 and is widely implemented and used, either on its own (called via an API from a host programming language) or embedded in another language such as XSLT.
* XPath 2.0 and later versions add, among other features, a for expression that is a cut-down version of the FLWOR expression in XQuery.
Syntax and semantics (XPath 1.0)
The most important kind of expression in XPath is a ''location path''. A location path consists of a sequence of ''location steps''. Each location step has three components:
* an ''axis''
* a ''node test''
* zero or more ''predicates''.
Abbreviated syntax
The compact notation allows many defaults and abbreviations for common cases. Given source XML containing at least
 <A>
   <B>
     <C/>
   </B>
 </A>
the simplest XPath takes a form such as /A/B/C that selects C elements that are children of B elements that are children of the A element that forms the outermost element of the XML document. The XPath syntax is designed to mimic URI (Uniform Resource Identifier) and Unix-style file path syntax.
More complex expressions add a predicate, written in square brackets after a step. For example, the expression A//B/*[1] selects the first child ('*[1]'), whatever its name, of every B element that itself is a child or other, deeper descendant ('//') of an A element that is a child of the current context node (the expression does not begin with a '/'). The predicate [1] binds more tightly than the / operator. To select the first node selected by the expression A//B/*, write (A//B/*)[1]. Note also that index values in XPath predicates (technically, 'proximity positions' of XPath node sets) start from 1, not 0 as is common in languages like C and Java.
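The abbreviated syntax can be tried out with a minimal sketch in Python, using the standard library's xml.etree.ElementTree module. This module supports only a subset of XPath and evaluates paths relative to an element rather than the document root, so /A/B/C is written as B/C from the root element A. The sample document and its id attributes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the id attributes exist only to label results.
root = ET.fromstring(
    "<A><B><C id='c1'/><C id='c2'/></B><B><C id='c3'/></B></A>"
)

# /A/B/C, written relative to the root element A.
all_c = [c.get("id") for c in root.findall("B/C")]

# B/C[1]: the first C child of each B. XPath positions start at 1, not 0.
first_c_per_b = [c.get("id") for c in root.findall("B/C[1]")]

# Python analogue of (B/C)[1]: index the combined result list (0-based in Python).
first_c_overall = root.findall("B/C")[0].get("id")

print(all_c)            # ['c1', 'c2', 'c3']
print(first_c_per_b)    # ['c1', 'c3']
print(first_c_overall)  # c1
```

The difference between the last two results mirrors the binding rule: a predicate attached to a step filters per parent, while a predicate on the parenthesised path filters the whole result.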
Expanded syntax
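In the full, unabbreviated syntax, the two examples above would be written
* /child::A/child::B/child::C
* child::A/descendant-or-self::node()/child::B/child::*[position()=1]
Here, in each step of the XPath, the axis (e.g. child or descendant-or-self) is explicitly specified, followed by :: and then the node test, such as A or node() in the examples above.
The same expressions in the shorter, abbreviated form are /A/B/C and A//B/*[1].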
Axis specifiers
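Axis specifiers indicate navigation direction within the tree representation of the XML document. The axes available are:
* ancestor
* ancestor-or-self
* attribute (abbreviated @)
* child (the default axis, so child::xyz can be written xyz)
* descendant
* descendant-or-self (// is short for /descendant-or-self::node()/)
* following
* following-sibling
* namespace
* parent (.. is short for parent::node())
* preceding
* preceding-sibling
* self (. is short for self::node())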
As an example of using the attribute axis in abbreviated syntax, //a/@href selects the attribute called href in a elements anywhere in the document tree.
The expression . (an abbreviation for self::node()) is most commonly used within a predicate to refer to the currently selected node.
For example, h3[.='See also'] selects an element called h3 in the current context, whose text content is See also.
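As a rough illustration of the attribute axis and of . inside a predicate, the sketch below uses Python's xml.etree.ElementTree. It assumes Python 3.7 or later for the [.='text'] predicate. ElementTree cannot return attribute nodes, so //a/@href is emulated by selecting the a elements that carry the attribute and reading its value. The sample markup is invented for this example.

```python
import xml.etree.ElementTree as ET

# Invented sample markup for illustration.
root = ET.fromstring(
    "<body>"
    "<h3>See also</h3>"
    "<h3>Notes</h3>"
    "<p><a href='help.php'>Help</a> <a>no link</a></p>"
    "</body>"
)

# Emulation of //a/@href: find a elements that have an href attribute,
# then read the attribute value from each element.
hrefs = [a.get("href") for a in root.findall(".//a[@href]")]

# h3[.='See also']: h3 children whose complete text content equals the string.
see_also = root.findall("h3[.='See also']")

print(hrefs)                       # ['help.php']
print([h.text for h in see_also])  # ['See also']
```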
Node tests
Node tests may consist of specific node names or more general expressions. In the case of an XML document in which the namespace prefix gs has been defined, //gs:enquiry will find all the enquiry elements in that namespace, and //gs:* will find all elements, regardless of local name, in that namespace.
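Namespace-qualified node tests can be sketched with Python's xml.etree.ElementTree, which takes a prefix-to-URI mapping as a second argument to findall. The namespace URI http://example.com/gs and the element contents are assumptions made for this example. The namespace wildcard is written here in ElementTree's own {uri}* form, which requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

GS_URI = "http://example.com/gs"  # hypothetical namespace URI

root = ET.fromstring(
    f"<root xmlns:gs='{GS_URI}'>"
    "<gs:enquiry>first</gs:enquiry>"
    "<gs:reply>answer</gs:reply>"
    "<enquiry>not in the namespace</enquiry>"
    "</root>"
)

# //gs:enquiry: the prefix is resolved through the mapping passed to findall.
enquiries = [e.text for e in root.findall(".//gs:enquiry", {"gs": GS_URI})]

# //gs:*: any element in the namespace, regardless of local name.
in_namespace = root.findall(".//{" + GS_URI + "}*")
local_names = [e.tag.split("}")[1] for e in in_namespace]

print(enquiries)    # ['first']
print(local_names)  # ['enquiry', 'reply']
```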
Other node test formats are:
;comment() :finds an XML comment node, e.g. <!-- Comment -->
;text() :finds a node of type text excluding any children, e.g. the hello in <k>hello<m> world</m></k>
;processing-instruction() :finds XML processing instructions such as <?php echo $a; ?>. In this case, processing-instruction('php') would match.
;node() :finds any node at all.
Predicates
Predicates, written as expressions in square brackets, can be used to filter a node-set according to some condition. For example, a returns a node-set (all the a elements which are children of the context node), and a[@href='help.php'] keeps only those elements having an href attribute with the value help.php.
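A minimal sketch of this filtering, using Python's xml.etree.ElementTree on an invented fragment in which the div element plays the role of the context node:

```python
import xml.etree.ElementTree as ET

# Invented fragment; div acts as the context node.
context = ET.fromstring(
    "<div>"
    "<a href='index.php'>Home</a>"
    "<a href='help.php'>Help</a>"
    "<a>No link</a>"
    "</div>"
)

# a: all a children of the context node.
all_a = context.findall("a")

# a[@href='help.php']: only those whose href attribute equals help.php.
help_links = context.findall("a[@href='help.php']")

print(len(all_a))                    # 3
print([a.text for a in help_links])  # ['Help']
```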
There is no limit to the number of predicates in a step, and they need not be confined to the last step in an XPath. They can also be nested to any depth. Paths specified in predicates begin at the context of the current step (i.e. that of the immediately preceding node test) and do not alter that context. All predicates must be satisfied for a match to occur.
When the value of the predicate is numeric, it is syntactic sugar for comparing against the node's position in the node-set (as given by the function position()). So p[1] is shorthand for p[position()=1] and selects the first p element child, while p[last()] is shorthand for p[position()=last()] and selects the last p child of the context node.
In other cases, the value of the predicate is automatically converted to a Boolean. When the predicate evaluates to a node-set, the result is true when the node-set is non-empty. Thus p[@x] selects those p elements that have an attribute named x.
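The numeric and Boolean forms of a predicate can be tried with Python's xml.etree.ElementTree, which supports these particular forms for named elements. The fragment is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented fragment: three p children, one of them with an attribute x.
context = ET.fromstring(
    "<div><p>one</p><p x='1'>two</p><span/><p>three</p></div>"
)

first_p = [p.text for p in context.findall("p[1]")]      # position() = 1
last_p = [p.text for p in context.findall("p[last()]")]  # position() = last()
with_x = [p.text for p in context.findall("p[@x]")]      # attribute x exists

print(first_p)  # ['one']
print(last_p)   # ['three']
print(with_x)   # ['two']
```

Note that the span element does not affect the positions, because positions are counted among the nodes selected by the step (the p children), not among all children.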
A more complex example: the expression a[/html/@lang='en'][@href='help.php'][1]/@target selects the value of the target attribute of the first a element among the children of the context node that has its href attribute set to help.php, provided the document's html top-level element also has a lang attribute set to en. The reference to an attribute of the top-level element in the first predicate affects neither the context of other predicates nor that of the location step itself.
Predicate order is significant if predicates test the position of a node. Each predicate takes a node-set and returns a (potentially) smaller node-set. So a[1][@href='help.php'] will find a match only if the first a child of the context node satisfies the condition @href='help.php', while a[@href='help.php'][1] will find the first a child that satisfies this condition.
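The effect of predicate order can be modelled in a few lines of plain Python. This is a simplified model rather than an XPath engine: nodes are dictionaries of attributes, each predicate is a function of a position and a node, and the helper apply_predicates is a hypothetical name introduced here. The key point is that positions are renumbered after every predicate.

```python
def apply_predicates(nodes, predicates):
    """Apply predicates left to right; positions restart at 1 after each one."""
    for predicate in predicates:
        nodes = [
            node
            for position, node in enumerate(nodes, start=1)
            if predicate(position, node)
        ]
    return nodes


def is_first(position, node):
    return position == 1


def has_help_href(position, node):
    return node.get("href") == "help.php"


# Stand-ins for the a children of the context node, in document order.
a_children = [
    {"href": "index.php"},
    {"href": "help.php", "id": "first-help"},
    {"href": "help.php", "id": "second-help"},
]

# a[1][@href='help.php']: the first a child does not link to help.php.
first_then_help = apply_predicates(a_children, [is_first, has_help_href])

# a[@href='help.php'][1]: the first of the a children that link to help.php.
help_then_first = apply_predicates(a_children, [has_help_href, is_first])

print(first_then_help)  # []
print(help_then_first)  # [{'href': 'help.php', 'id': 'first-help'}]
```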
Functions and operators
XPath 1.0 defines four data types: node-sets (sets of nodes with no intrinsic order), strings, numbers and Booleans.
The available operators are:
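* The /, // and [...] operators, used in path expressions, as described above.
* A union operator, |, which forms the union of two node-sets.
* Boolean operators and and or, and a function not()
* Arithmetic operators +, -, *, div (divide), and mod
* Comparison operators =, !=, <, >, <=, >=
The function library includes:
* Functions to manipulate strings: concat(), substring(), contains(), substring-before(), substring-after(), translate(), normalize-space(), string-length()
* Functions to manipulate numbers: sum(), round(), floor(), ceiling()
* Functions to get properties of nodes: name(), local-name(), namespace-uri()
* Functions to get information about the processing context: position(), last()
* Type conversion functions: string(), number(), boolean()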
Some of the more commonly useful functions are detailed below.
Node set functions
;position() :returns a number representing the position of this node in the sequence of nodes currently being processed (for example, the nodes selected by an xsl:for-each instruction in XSLT).
;count(node-set) :returns the number of nodes in the node-set supplied as its argument.
String functions
;string(object?) :converts any of the four XPath data types into a string according to built-in rules. If the value of the argument is a node-set, the function returns the string-value of the first node in document order, ignoring any further nodes.
;concat(string, string, string*) :concatenates two or more strings
;starts-with(s1, s2) :returns true if s1 starts with s2
;contains(s1, s2) :returns true if s1 contains s2
;substring(string, start, length?) :example: substring("ABCDEF",2,3) returns "BCD".
;substring-before(s1, s2) :example: substring-before("1999/04/01","/") returns 1999
;substring-after(s1, s2) :example: substring-after("1999/04/01","/") returns 04/01
;string-length(string?) :returns number of characters in string
;normalize-space(string?) :all leading and trailing whitespace is removed and any sequences of whitespace characters are replaced by a single space. This is very useful when the original XML may have been prettyprint formatted, which could make further string processing unreliable.
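The behaviour of several of these string functions can be approximated in plain Python. The function names below are hypothetical Python stand-ins, not part of any XPath library, and substring is simplified to integer arguments (the XPath rules for rounding fractional arguments are not modelled).

```python
import re


def substring(s, start, length=None):
    """Characters at 1-based positions p with start <= p < start + length."""
    first = max(start, 1)
    end = len(s) + 1 if length is None else start + length
    return s[first - 1:max(end, first) - 1]


def substring_before(s, sub):
    """Part of s before the first occurrence of sub, or '' if sub is absent."""
    if not sub:
        return ""
    head, sep, _ = s.partition(sub)
    return head if sep else ""


def substring_after(s, sub):
    """Part of s after the first occurrence of sub, or '' if sub is absent."""
    if not sub:
        return s
    _, sep, tail = s.partition(sub)
    return tail if sep else ""


def normalize_space(s):
    """Strip and collapse XML whitespace (space, tab, CR, LF) to single spaces."""
    return re.sub(r"[ \t\r\n]+", " ", s.strip(" \t\r\n"))


print(substring("ABCDEF", 2, 3))                     # BCD
print(substring_before("1999/04/01", "/"))           # 1999
print(substring_after("1999/04/01", "/"))            # 04/01
print(normalize_space("  pretty \n  printed\ttext "))  # pretty printed text
```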
Boolean functions
;not(boolean) :negates any Boolean expression.
;true() :evaluates to ''true''.
;false() :evaluates to ''false''.
Number functions
;sum(node-set) :converts the string values of all the nodes found by the XPath argument into numbers, according to the built-in casting rules, then returns the sum of these numbers.
Usage examples
Expressions can be created inside predicates using the operators: =, !=, <=, <, >= and >. Boolean expressions may be combined with brackets () and the Boolean operators and and or as well as the not() function described above. Numeric calculations can use *, +, -, div and mod. Strings can consist of any Unicode characters.
//item[@price > 2*@discount] selects items whose price attribute is greater than twice the numeric value of their discount attribute.
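Python's xml.etree.ElementTree does not support numeric comparisons inside predicates, so the sketch below emulates the price and discount test with an ordinary list comprehension. The item data is invented, and the code assumes every item carries both attributes with numeric values.

```python
import xml.etree.ElementTree as ET

# Invented data; every item is assumed to have numeric price and discount.
root = ET.fromstring(
    "<items>"
    "<item name='a' price='10' discount='2'/>"
    "<item name='b' price='10' discount='6'/>"
    "<item name='c' price='30' discount='15'/>"
    "</items>"
)

# Emulation of //item[@price > 2*@discount].
selected = [
    item.get("name")
    for item in root.iter("item")
    if float(item.get("price")) > 2 * float(item.get("discount"))
]

print(selected)  # ['a']
```

Item c is not selected because the comparison is strict: 30 is not greater than 2 × 15.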
Entire node-sets can be combined ('unioned') using the vertical bar character |. Node sets that meet one or more of several conditions can be found by combining the conditions inside a predicate with 'or'.
v[x or y] | w[z] will return a single node-set consisting of all the v elements that have x or y child-elements, as well as all the w elements that have z child-elements, that were found in the current context.
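Python's xml.etree.ElementTree supports neither the or operator nor the | operator, so the union can only be emulated. The sketch below runs three simpler queries, merges the results in a set (which removes duplicates, as a node-set union does), and then restores document order. The sample elements and id attributes are invented.

```python
import xml.etree.ElementTree as ET

# Invented sample; ids label the elements in the output.
context = ET.fromstring(
    "<r>"
    "<v id='v1'><x/></v>"
    "<w id='w1'><z/></w>"
    "<v id='v2'><y/></v>"
    "<v id='v3'><x/><y/></v>"
    "<v id='v4'/>"
    "<w id='w2'/>"
    "</r>"
)

# Emulation of v[x or y] | w[z] as a union of three element sets.
matches = (
    set(context.findall("v[x]"))
    | set(context.findall("v[y]"))
    | set(context.findall("w[z]"))
)

# Restore document order by walking the children of the context node.
in_document_order = [e.get("id") for e in context if e in matches]

print(in_document_order)  # ['v1', 'w1', 'v2', 'v3']
```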
Syntax and semantics (XPath 2.0)
Syntax and semantics (XPath 3)
Examples
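Given a sample XML document
 <Wikimedia>
   <projects>
     <project name="Wikipedia">
       <editions>
         <edition language="English">en.wikipedia.org</edition>
         <edition language="German">de.wikipedia.org</edition>
         <edition language="French">fr.wikipedia.org</edition>
         <edition language="Polish">pl.wikipedia.org</edition>
         <edition language="Spanish">es.wikipedia.org</edition>
       </editions>
     </project>
     <project name="Wiktionary">
       <editions>
         <edition language="English">en.wiktionary.org</edition>
         <edition language="French">fr.wiktionary.org</edition>
         <edition language="Vietnamese">vi.wiktionary.org</edition>
         <edition language="Turkish">tr.wiktionary.org</edition>
         <edition language="Spanish">es.wiktionary.org</edition>
       </editions>
     </project>
   </projects>
 </Wikimedia>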
The XPath expression
/Wikimedia/projects/project/@name
selects name attributes for all projects, and
/Wikimedia//editions
selects all editions of all projects, and
/Wikimedia/projects/project/editions/edition[@language='English']/text()
selects addresses of all English Wikimedia projects (text of all edition elements where language attribute is equal to ''English''). And the following
/Wikimedia/projects/project[@name='Wikipedia']/editions/edition/text()
selects addresses of all Wikipedias (text of all edition elements that exist under project element with a name attribute of ''Wikipedia'').
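These selections can be approximated with Python's xml.etree.ElementTree. The sketch uses a shortened version of the sample document, with the element and attribute names taken from the expressions above and the language values assumed from the host names. Because ElementTree evaluates paths relative to the root element and cannot return attribute or text nodes, the leading /Wikimedia is dropped, and @name and text() are read from the selected elements instead.

```python
import xml.etree.ElementTree as ET

# Shortened sample; language values are assumptions based on the host names.
SAMPLE = """
<Wikimedia>
  <projects>
    <project name="Wikipedia">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition language="French">fr.wiktionary.org</edition>
      </editions>
    </project>
  </projects>
</Wikimedia>
"""

root = ET.fromstring(SAMPLE)  # root is the Wikimedia element

# /Wikimedia/projects/project/@name
names = [p.get("name") for p in root.findall("projects/project")]

# /Wikimedia//editions
editions = root.findall(".//editions")

# /Wikimedia/projects/project/editions/edition[@language='English']/text()
english = [
    e.text
    for e in root.findall("projects/project/editions/edition[@language='English']")
]

# /Wikimedia/projects/project[@name='Wikipedia']/editions/edition/text()
wikipedias = [
    e.text
    for e in root.findall("projects/project[@name='Wikipedia']/editions/edition")
]

print(names)          # ['Wikipedia', 'Wiktionary']
print(len(editions))  # 2
print(english)        # ['en.wikipedia.org', 'en.wiktionary.org']
print(wikipedias)     # ['en.wikipedia.org', 'de.wikipedia.org']
```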
Implementations
Command-line tools
* XMLStarlet
* xmllint (libxml2)
* RaptorXML Server from Altova supports XPath 1.0, 2.0, and 3.0
* Xidel
C/C++
* libxml2
* Pathan
* pugixml
* Sedna XML Database
* VTD-XML
* Xalan
* XQilla
Free Pascal
* The unit XPath is included in the default libraries
Implementations for database engines
* OpenLink Virtuoso
Java
* Saxon XSLT supports XPath 1.0, XPath 2.0 and XPath 3.0 (as well as XSLT 2.0 and XQuery 3.0)
* BaseX (also supports XPath 2.0 and XQuery)
* VTD-XML
* Sedna XML Database Both XML:DB and proprietary.
* QuiXPath, a streaming open source implementation by Innovimax
* Xalan
* Dom4j
The Java package javax.xml.xpath has been part of Java standard edition since Java 5 via the Java API for XML Processing. Technically this is an XPath API rather than an XPath implementation, and it allows the programmer to select a specific implementation that conforms to the interface.
JavaScript
* jQuery XPath plugin based on Open-source XPath 2.0 implementation in JavaScript
* FontoXPath Open source XPath 3.1 implementation in JavaScript. Currently under development.
.NET Framework
* In the System.Xml and System.Xml.XPath namespaces
* Sedna XML Database
Perl
* XML::LibXML (libxml2)
PHP
* Sedna XML Database
* DOMXPath via libxml extension
Python
* The ElementTree XML API in the Python Standard Library includes limited support for XPath expressions
* libxml2
* Amara
* Sedna XML Database
* lxml
* Scrapy
Ruby
* libxml2
* Nokogiri
Scheme
* Sedna XML Database
SQL
* MySQL supports a subset of XPath from version 5.1.5 onwards
* PostgreSQL supports XPath and XSLT from version 8.4 onwards
Tcl
* The tDOM package provides a complete, compliant, and fast XPath implementation in C
Use in schema languages
XPath is increasingly used to express constraints in schema languages for XML.
* The (now ISO standard) schema language Schematron pioneered the approach.
* A streaming subset of XPath is used in W3C XML Schema 1.0 for expressing uniqueness and key constraints. In XSD 1.1, the use of XPath is extended to support conditional type assignment based on attribute values, and to allow arbitrary Boolean assertions to be evaluated against the content of elements.
* XForms uses XPath to bind types to values.
* The approach has even found use in non-XML applications, such as the source code analyzer for Java called PMD: the Java is converted to a DOM-like parse tree, then XPath rules are defined over the tree.
See also
* XPath 3
* Navigational database
* XLink
* XML database
* XSL
* XSL-FO
External links
XPath 1.0 specification
XPath 2.0 specification
XPath 3.0 specification
XPath 3.1 specification
XPath Reference (MSDN)
XPath - MDC Docs by Mozilla Developer Network
XPath introduction/tutorial
XSLT and XPath function reference