HOME

TheInfoList



OR:

jsoup is an open-source Java library designed to parse, extract, and manipulate data stored in HTML documents.


History

jsoup was created in 2009 by Jonathan Hedley. It is distributed it under the
MIT License The MIT License is a permissive free software license originating at the Massachusetts Institute of Technology (MIT) in the late 1980s. As a permissive license, it puts only very limited restriction on reuse and has, therefore, high license co ...
, a
permissive free software license A permissive software license, sometimes also called BSD-like or BSD-style license, is a free-software license which instead of copyleft protections, carries only minimal restrictions on how the software can be used, modified, and redistributed, ...
similar to the
Creative Commons Creative Commons (CC) is an American non-profit organization and international network devoted to educational access and expanding the range of creative works available for others to build upon legally and to share. The organization has releas ...
attribution license. Hedley's avowed intention in writing jsoup was "to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup."


Projects powered by jsoup

jsoup is used in a number of current projects, including Google's
OpenRefine OpenRefine is an open-source desktop application for data cleanup and transformation to other formats, an activity commonly known as data wrangling. It is similar to spreadsheet applications, and can handle spreadsheet file formats such as CSV, bu ...
data-wrangling tool.


See also

*
Comparison of HTML parsers HTML parsers are software for automated Hypertext Markup Language (HTML) parsing. They have two main purposes: * HTML traversal: offer an interface for programmers to easily access and modify the "HTML string code". Canonical example: DOM par ...
*
Web scraping Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may directly access the World Wide Web using the Hypertext Transfer Protocol or a web browser. While web scrapin ...
* Data wrangling *
MIT License The MIT License is a permissive free software license originating at the Massachusetts Institute of Technology (MIT) in the late 1980s. As a permissive license, it puts only very limited restriction on reuse and has, therefore, high license co ...


References


External links

* {{DEFAULTSORT:jsoup Java (programming language) libraries Free software programmed in Java (programming language) XML parsers HTML parsers Web scraping