Data Toolbar is a web scraping software add-on for the Internet Explorer, Mozilla Firefox, and Google Chrome web browsers that collects and converts structured data from Web pages into a tabular format that can be loaded into a spreadsheet or database management program.
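
The conversion to a tabular format can be pictured with a minimal sketch. It assumes the extracted items are already available as Python dictionaries; the function name `records_to_csv` and the sample field names are hypothetical and are not taken from Data Toolbar itself.

```python
import csv
import io


def records_to_csv(records):
    """Turn a list of dicts (one per extracted item) into CSV text."""
    # Collect column names in first-seen order so that irregular
    # records (items with missing or extra fields) still line up.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing fields are written as empty cells
    return buffer.getvalue()


# Hypothetical sample data: the second item lacks a price but has an image.
sample = [
    {"name": "Widget", "price": "9.99"},
    {"name": "Gadget", "image": "g.png"},
]
csv_text = records_to_csv(sample)
```

The resulting text can be saved with a `.csv` extension and opened in a spreadsheet or imported into a database.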


Algorithm

The program implements a variation of the generalized tree matching algorithm that takes nested lists into account. That is, inside a given web page, the program recursively traverses the branches of the page's DOM tree, aiming to detect nested lists of data items that match the format of the specified content. According to Jindal and Liu, this approach has several advantages over a simple string-matching algorithm (Nitin Jindal and Bing Liu, "A Generalized Tree Matching Algorithm Considering Nested Lists for Web Data Extraction", Proceedings of the Tenth SIAM International Conference on Data Mining, 2010).
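
The general idea of traversing a tree and looking for repeated, structurally similar items can be illustrated with a short sketch. It uses the classic simple tree matching dynamic program as a stand-in; it is not the generalized algorithm from the cited paper and not Data Toolbar's actual code. The `Node` class, the function names, and the similarity threshold of 0.7 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A minimal stand-in for a DOM element: a tag plus child nodes."""
    tag: str
    children: list = field(default_factory=list)


def size(node):
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(child) for child in node.children)


def simple_tree_match(a, b):
    """Size of the largest order-preserving matching between two trees."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # table[i][j]: best matching between the first i children of a
    # and the first j children of b.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = table[i - 1][j - 1] + simple_tree_match(
                a.children[i - 1], b.children[j - 1]
            )
            table[i][j] = max(table[i - 1][j], table[i][j - 1], paired)
    return table[m][n] + 1  # +1 for the matching roots


def similarity(a, b):
    """Normalized match score between 0.0 and 1.0."""
    return 2 * simple_tree_match(a, b) / (size(a) + size(b))


def find_lists(node, threshold=0.7, path="", found=None):
    """Recursively report nodes whose adjacent children all look alike."""
    if found is None:
        found = []
    here = f"{path}/{node.tag}"
    kids = node.children
    if len(kids) >= 2 and all(
        similarity(kids[i], kids[i + 1]) >= threshold
        for i in range(len(kids) - 1)
    ):
        found.append(here)
    for child in kids:
        find_lists(child, threshold, here, found)
    return found


# Hypothetical page: a heading followed by a three-item catalog,
# where the last item is missing one field.
def item(*tags):
    return Node("li", [Node(t) for t in tags])


page = Node("html", [Node("body", [
    Node("h1"),
    Node("ul", [item("a", "span"), item("a", "span"), item("a")]),
])])
lists = find_lists(page)
```

In this sample, only the `ul` element is reported, because its children are similar to one another even though the third item is incomplete. Tolerance for such irregular records is the kind of behavior that exact string matching does not provide.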


Features

* Collection of data and images directly from Internet Explorer
* Collection of information from details pages linked to the catalog
* Automatic processing of multi-page catalogs
* Support for irregular multi-row catalogs mixed with advertisements


Similar tools

* Automation Anywhere - The Web Extractor is a part of the larger automation system
* Easy Web Extract - Standalone application, Windows
* Mozenda - Web-based service
* Newprosoft - Standalone application, includes an Agent, Windows
* OutWit - Standalone application and Firefox extension
* Data Scraping Studio - Standalone application for Windows and Chrome extension
* Diggernaut - Web platform with standalone application for Windows, Linux, and macOS, and a Google Chrome extension


Sources


External links

* http://datatoolbar.com/