A headless browser is a web browser without a graphical user interface.
Headless browsers provide automated control of a web page in an environment similar to popular web browsers, but they are executed via a command-line interface or using network communication. They are particularly useful for testing web pages as they are able to render and understand HTML the same way a browser would, including styling elements such as page layout, color, font selection and execution of JavaScript and Ajax, which are usually not available when using other testing methods.
Since version 59 of Google Chrome and version 56 of Firefox, there is native support for remote control of the browser. This made earlier efforts obsolete, notably PhantomJS.
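As an illustration of command-line execution, the following minimal Python sketch builds and runs a headless Chrome invocation. It assumes a Chrome or Chromium binary is installed; the executable names in `CHROME_CANDIDATES` and the helper names are hypothetical choices for this example, while `--headless`, `--dump-dom` and `--screenshot` are Chrome command-line switches.

```python
import shutil
import subprocess
from typing import List, Optional

# Assumption: executable names vary by platform and installation method.
CHROME_CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "chrome")


def find_chrome() -> Optional[str]:
    """Return the path of the first Chrome-like binary found on PATH, or None."""
    for name in CHROME_CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def headless_command(binary: str, url: str, screenshot: Optional[str] = None) -> List[str]:
    """Build an argv list that loads `url` without opening a window.

    With `screenshot` set, Chrome is asked to write a PNG to that path;
    otherwise it is asked to print the serialized DOM to standard output.
    """
    argv = [binary, "--headless"]
    if screenshot is not None:
        argv.append(f"--screenshot={screenshot}")
    else:
        argv.append("--dump-dom")
    argv.append(url)
    return argv


def dump_dom(url: str, timeout: float = 30.0) -> Optional[str]:
    """Run headless Chrome and return the DOM after scripts have run.

    Returns None when no suitable binary is found on PATH.
    """
    binary = find_chrome()
    if binary is None:
        return None
    result = subprocess.run(
        headless_command(binary, url),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Calling `dump_dom("https://example.com")` returns the page markup as the browser sees it after script execution, which is what distinguishes this approach from fetching the raw HTML over HTTP.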
Use cases
The main use cases for headless browsers are:
* Test automation in modern web applications (web testing).
* Taking screenshots of web pages.
* Running automated tests for JavaScript libraries.
* Automating interaction with web pages.
Other uses
Headless browsers are also useful for web scraping. Google stated in 2009 that using a headless browser could help their search engine index content from websites that use Ajax.
Headless browsers have also been misused in various ways:
* Performing DDoS attacks on websites.
* Increasing advertisement impressions.
* Automating websites in unintended ways, e.g. for credential stuffing.
However, a study of browser traffic in 2018 found no preference by malicious actors for headless browsers.
There is no indication that headless browsers are used more frequently than non-headless browsers for malicious purposes, like DDoS attacks, SQL injections or cross-site scripting attacks.
Usage
As several major browsers natively support headless mode through APIs, some software exists to perform browser automation through a unified interface. These include:
* Selenium WebDriver - a W3C-compliant implementation of WebDriver
* Playwright - a Node.js library to automate Chromium, Firefox and WebKit
* Puppeteer - a Node.js library to automate Chrome
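To show what such a unified interface can look like at the protocol level, here is a minimal Python sketch of a W3C WebDriver "New Session" request that asks for a headless browser. It only builds the request and does not send it. The helper names are hypothetical, and the server address in the usage note is an assumption about where a driver such as chromedriver or geckodriver might be listening; `goog:chromeOptions` and `moz:firefoxOptions` are the vendor capability keys used by those drivers.

```python
import json
import urllib.request
from typing import Any, Dict

# Vendor capability key and headless flag per browser.
HEADLESS_OPTIONS = {
    "chrome": ("goog:chromeOptions", "--headless"),
    "firefox": ("moz:firefoxOptions", "-headless"),
}


def new_session_payload(browser: str) -> Dict[str, Any]:
    """Build the JSON body of a WebDriver New Session command."""
    key, flag = HEADLESS_OPTIONS[browser]
    return {
        "capabilities": {
            "alwaysMatch": {
                "browserName": browser,
                key: {"args": [flag]},
            }
        }
    }


def new_session_request(server: str, browser: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST /session request for `server`."""
    body = json.dumps(new_session_payload(browser)).encode("utf-8")
    return urllib.request.Request(
        url=server.rstrip("/") + "/session",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
```

With a driver running, passing `new_session_request("http://localhost:4444", "firefox")` to `urllib.request.urlopen` would start a session. The same payload shape serves both browsers, which is the point of the shared protocol; libraries such as Selenium wrap these requests behind a higher-level API.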
Test automation
Some test automation software and frameworks include headless browsers as part of their testing apparatus.
* Capybara uses headless browsing, via either WebKit or Headless Chrome, to mimic user behavior in its testing protocols.
* Jasmine uses Selenium by default, but can use WebKit or Headless Chrome to run browser tests.
* Cypress, a frontend testing framework.
* QF-Test, a software tool for automated testing of programs via the graphical user interface, where a headless browser can also be used for testing.
Alternatives
Another approach is to use software that provides browser APIs. For example, Deno provides browser APIs as part of its design. For Node.js, jsdom is the most complete provider. While most are able to support common browser features (HTML parsing, cookies, XHR, some JavaScript, etc.), they do not render the DOM and have limited support for DOM events. They usually perform faster than full browsers, but are unable to correctly interpret many popular websites.
Another alternative is HtmlUnit, a headless browser written in Java. HtmlUnit uses the Rhino engine to provide JavaScript and Ajax support as well as partial rendering capability.
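The limitation of tools that parse HTML without running a full browser can be seen in a small Python sketch. It uses the standard-library `html.parser` module, which is an assumption made for illustration: it is far simpler than jsdom or Deno and executes no JavaScript at all. The sample page and the names `PAGE`, `LinkCollector` and `static_links` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List

# A page whose second link only exists after its script has run.
PAGE = """
<html><body>
  <a href="/static">Static link</a>
  <div id="app"></div>
  <script>
    var a = document.createElement("a");
    a.href = "/dynamic";
    document.getElementById("app").appendChild(a);
  </script>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags present in the markup itself."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def static_links(html: str) -> List[str]:
    """Return the links a parse-only tool can see, without script execution."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

Here `static_links(PAGE)` finds only `/static`, because the `/dynamic` link is created by the script and never appears in the markup. A headless browser loading the same page would execute the script and expose both links.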
List of headless browsers
The following software provides headless browser APIs.
* Splash is a headless web browser written in Python using the WebKit layout engine via Qt. It has an HTTP API, Lua scripting support and a built-in IPython (Jupyter)-based IDE. Development started at ScrapingHub in 2013; it is partially funded by DARPA.
* Zombie.js is a simulated browser environment for Node.js.
* SimpleBrowser is a headless web browser written in C# supporting .NET Standard 2.0.
* DotNetBrowser is a proprietary .NET Chromium-based library that provides an off-screen rendering mode and can be used without embedding or displaying windows.
Another noted earlier effort was envjs in 2008 from John Resig, which was a simulated browser environment written in JavaScript for the Rhino engine.
See also
* Headless computer