
Semantic HTML is the use of HTML markup to reinforce the semantics, or meaning, of the information in web pages and web applications rather than merely to define its presentation or look. Semantic HTML is processed by traditional web browsers as well as by many other user agents. CSS is used to suggest how it is presented to human users.
History
HTML has included semantic markup since its inception. In an HTML document, the author may, among other things, "start with a title; add headings and paragraphs; add emphasis to the text; add images; add links to other pages; and use various kinds of lists".
Various versions of the HTML standard have included presentational markup such as <font> (added in HTML 3.2; removed in HTML 4.0 Strict), <i> (all versions) and <center> (added in HTML 3.2). There are also the semantically neutral span and div elements. Since the late 1990s, when Cascading Style Sheets began to work in most browsers, web authors have been encouraged to avoid presentational HTML markup with a view to the separation of content and presentation.
In 2001, Tim Berners-Lee participated in a discussion of the Semantic Web, in which it was proposed that intelligent software 'agents' might one day automatically crawl the Web and find, filter and correlate previously unrelated, published facts for the benefit of end users. Such agents are not commonplace even now, but some of the ideas of Web 2.0, mashups and price comparison websites may be coming close. The main difference between these web application hybrids and Berners-Lee's semantic agents is that the current aggregation and hybridisation of information is usually designed in by web developers, who already know the web locations and the API semantics of the specific data they wish to mash, compare and combine.
An important type of web agent that does crawl and read web pages automatically, without prior knowledge of what it might find, is the web crawler or search-engine spider. These software agents depend on the semantic clarity of the web pages they find, as they use various techniques and algorithms to read and index millions of web pages a day and provide web users with search facilities.
In order for search-engine spiders to be able to rate the significance of pieces of text they find in HTML documents, and also for those creating mashups and other hybrids, as well as for more automated agents as they are developed, the semantic structures that exist in HTML need to be widely and uniformly applied to bring out the meaning of published information.
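As a rough illustration of how an automated agent might rate the significance of text by the element that contains it, here is a minimal sketch using Python's standard `html.parser`. The weights, the class name `WeightedTextParser`, and the sample page are all assumptions made up for this example; real search engines do not publish their ranking rules, and this is not meant to reflect any of them.

```python
from html.parser import HTMLParser

# Hypothetical weights: how much a word counts inside each element.
# Text outside these elements counts 1.
WEIGHTS = {"title": 5, "h1": 4, "h2": 3, "h3": 2}


class WeightedTextParser(HTMLParser):
    """Scores each word by the semantic element it appears in."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements that carry a weight
        self.scores = {}  # word -> accumulated score

    def handle_starttag(self, tag, attrs):
        if tag in WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        weight = WEIGHTS[self.stack[-1]] if self.stack else 1
        for word in data.lower().split():
            self.scores[word] = self.scores.get(word, 0) + weight


def score_terms(html):
    """Return a word -> score mapping for an HTML string."""
    parser = WeightedTextParser()
    parser.feed(html)
    parser.close()
    return parser.scores


# Made-up sample page.
page = """<html><head><title>Semantic HTML</title></head>
<body><h1>Semantic markup</h1>
<p>Headings describe the markup structure.</p></body></html>"""

scores = score_terms(page)
print(scores["semantic"])  # 9: 5 from <title> plus 4 from <h1>
print(scores["headings"])  # 1: appears only in a paragraph
```

If the same page marked its title up as a bold paragraph instead of a heading, this sketch would score every word equally, which is the practical cost of presentational markup for such agents.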
While the true semantic web may depend on complex RDF ontologies and metadata, every HTML document makes its contribution to the meaningfulness of the Web by the correct use of headings, lists, titles and other semantic markup wherever possible. This "plain" use of HTML has been called "Plain Old Semantic HTML" or POSH. The correct use of Web 2.0 'tagging' creates folksonomies that may be equally or even more meaningful to many.
HTML 5 introduced new semantic elements such as <section>, <article>, <footer>, <progress>, <nav>, <aside>, <mark>, and <time>. Overall, the goal of the W3C is to slowly introduce more ways for browsers, developers, and crawlers to better distinguish between different types of data, allowing for benefits such as better display on browsers on different devices.
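The following sketch suggests how a program could use some of these elements to recover a document's structure and a machine-readable date. It relies only on Python's standard `html.parser`; the `OutlineParser` class, the choice of which elements to track, and the sample markup (including its date) are assumptions for illustration.

```python
from html.parser import HTMLParser

# Elements treated as structural for this sketch (an illustrative subset).
SECTIONING = {"section", "article", "nav", "aside", "footer"}


class OutlineParser(HTMLParser):
    """Records structural elements with their nesting depth, and <time> values."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # (depth, tag) pairs in document order
        self.dates = []    # values of <time datetime="...">

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.outline.append((self.depth, tag))
            self.depth += 1
        elif tag == "time":
            value = dict(attrs).get("datetime")
            if value:
                self.dates.append(value)

    def handle_endtag(self, tag):
        if tag in SECTIONING:
            self.depth -= 1


# Made-up sample document.
doc = """<nav><a href="/">Home</a></nav>
<article>
  <section><p>Published <time datetime="2024-01-15">15 January 2024</time>.</p></section>
  <aside>Related <mark>links</mark></aside>
  <footer>Contact</footer>
</article>"""

parser = OutlineParser()
parser.feed(doc)
parser.close()
for depth, tag in parser.outline:
    print("  " * depth + tag)  # indented outline of the document
print(parser.dates)
```

With div elements in place of the semantic ones, the same program would have no basis for telling navigation from content, or a date from any other text.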
Presentational elements were not formally deprecated in HTML 4.01 and XHTML recommendations, but were recommended against. In HTML 5, some of those elements, such as <i> and <b>, are still specified, as their meaning has been clearly defined "as to be stylistically offset from the normal prose without conveying any extra importance".
Considerations
In cases where a document requires more precise semantics than those expressed in HTML alone, fragments of the document may be enclosed within span or div elements with meaningful class names such as <span class="author"> and <div class="invoice">. Where these class names are also a fragment identifier within a schema or ontology, they may link to a more defined meaning. Microformats formalise this approach to semantics in HTML.
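To make the class-name approach concrete, the sketch below pulls out the text of elements that carry a given class, using Python's standard `html.parser`. The class names `author` and `invoice` come from the text above; the `ClassTextParser` class, the `extract_by_class` helper, the short list of void elements, and the sample markup are assumptions for illustration, and the sketch does not handle a match nested inside another match.

```python
from html.parser import HTMLParser

# Elements that never have an end tag (a partial list, enough for this sketch).
VOID = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextParser(HTMLParser):
    """Collects the text content of elements carrying a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0     # greater than 0 while inside a matching element
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1  # nested element inside the match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # the matching element just closed
            self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_by_class(html, class_name):
    """Return the text of every element whose class list includes class_name."""
    parser = ClassTextParser(class_name)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up sample fragment.
fragment = """<div class="invoice">
  <p>Prepared by <span class="author">Jane <em>Q.</em> Doe</span></p>
</div>"""

print(extract_by_class(fragment, "author"))   # ['Jane Q. Doe']
print(extract_by_class(fragment, "invoice"))  # ['Prepared by Jane Q. Doe']
```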
One important restriction of this approach is that such markup based on element inclusion must meet the well-formedness conditions. As these documents are broadly tree-structured, this means that only balanced fragments from a sub-tree can be marked up in this way. A means of marking up any arbitrary section of HTML would require a mechanism independent of the markup structure itself, such as XPointer.
Good semantic HTML also improves the accessibility of web documents (see also Web Content Accessibility Guidelines). For example, when a document has been marked up correctly, a screen reader or audio browser can ascertain its structure and will not waste the visually impaired user's time by reading out repeated or irrelevant information.
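A deliberately crude sketch of the idea, again with Python's standard `html.parser`: text inside elements that usually hold repeated material is skipped, and the rest is kept. Real assistive technology works differently (it typically lets the user jump between regions rather than discarding them), so the `SKIPPED` set, the `MainTextParser` class, the `main_text` helper, and the sample page are assumptions for illustration only.

```python
from html.parser import HTMLParser

# Elements assumed, for this sketch, to hold repeated or secondary material.
SKIPPED = {"nav", "aside", "footer"}


class MainTextParser(HTMLParser):
    """Keeps text that lies outside the SKIPPED elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def main_text(html):
    """Return the text chunks a reader would get after skipping SKIPPED regions."""
    parser = MainTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up sample page.
page_with_nav = """<nav><a href="/">Home</a> <a href="/news">News</a></nav>
<h1>Opening hours</h1>
<p>We open at nine.</p>
<footer>Site map</footer>"""

print(main_text(page_with_nav))  # ['Opening hours', 'We open at nine.']
```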
Google "rich snippets"
In 2010, Google specified three forms of structured metadata that their systems will use to find structured semantic content within webpages. Such information, when related to reviews, people profiles, business listings, and events, will be used by Google to enhance the "snippet", or short piece of quoted text that is shown when the page appears in search listings. Google specifies that this data may be given using microdata, microformats or RDFa. Microdata is specified inside itemtype and itemprop attributes added to existing HTML elements; microformat keywords are added inside class attributes as discussed above; and RDFa relies on rel, typeof and property attributes added to existing elements.
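As a minimal sketch of the microdata variant, the code below reads the itemtype and itemprop attributes from a small fragment with Python's standard `html.parser`. The sample uses the schema.org Person vocabulary as an assumption; the `MicrodataParser` class and `parse_item` helper are hypothetical names. The sketch handles only a single flat item whose values are element text, whereas the actual microdata rules also take values from attributes such as href or content.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Collects itemprop name/value pairs from one flat microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.current_prop = None  # property awaiting its text value
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item_type = attrs["itemtype"]
        if "itemprop" in attrs:
            self.current_prop = attrs["itemprop"]

    def handle_data(self, data):
        text = data.strip()
        if self.current_prop and text:
            self.properties[self.current_prop] = text
            self.current_prop = None


def parse_item(html):
    """Return (item type, properties) for a fragment holding one item."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.properties


# Made-up sample fragment using the schema.org Person type.
person = """<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Jane Doe</span>,
  <span itemprop="jobTitle">Editor</span>
</div>"""

item_type, properties = parse_item(person)
print(item_type)   # https://schema.org/Person
print(properties)  # {'name': 'Jane Doe', 'jobTitle': 'Editor'}
```

The microformat and RDFa variants carry similar information in class attributes and in typeof and property attributes respectively, so a comparable parser would inspect those attributes instead.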
See also
* CP/LD (Content Profile/Linked Document)
* HTML elements (complete list)
* HTML landmarks
* Microdata (HTML)
* Microformat
* RDFa
* schema.org, an initiative launched on 2 June 2011 by Bing, Google and Yahoo!
* Semantic Web
* Semantics (computer science)
* XML