jsoup is an open-source Java library designed to parse, extract, and manipulate data stored in HTML documents.
History
jsoup was created in 2009 by Jonathan Hedley. It is distributed under the MIT License, a permissive free software license similar to the Creative Commons attribution license.
Hedley's avowed intention in writing jsoup was "to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup."
Projects powered by jsoup
jsoup is used in a number of current projects, including the OpenRefine data-wrangling tool (formerly Google Refine).
See also
* Comparison of HTML parsers
* Web scraping
* Data wrangling
* MIT License