HOME

TheInfoList



OR:

HTTrack is a free and
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use and view the source code, design documents, or content of the product. The open source model is a decentrali ...
Web crawler Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (''web spider ...
and offline browser, developed by Xavier Roche and licensed under the GNU General Public License Version 3. HTTrack allows users to download
World Wide Web The World Wide Web (WWW or simply the Web) is an information system that enables Content (media), content sharing over the Internet through user-friendly ways meant to appeal to users beyond Information technology, IT specialists and hobbyis ...
sites from the
Internet The Internet (or internet) is the Global network, global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP) to communicate between networks and devices. It is a internetworking, network of networks ...
to a local computer. By default, HTTrack arranges the downloaded site by the original site's relative link-structure. The downloaded (or " mirrored") website can be browsed by opening a page of the site in a browser. HTTrack can also update an existing mirrored site and resume interrupted downloads. HTTrack is configurable by options and by filters (include/exclude), and has an integrated help system. There is a basic command line version and two GUI versions (WinHTTrack and WebHTTrack); the former can be part of scripts and cron jobs. HTTrack uses a
Web crawler Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (''web spider ...
to download a website. Some parts of the website may not be downloaded by default due to the robots exclusion protocol unless disabled during the program. HTTrack can follow links that are generated with basic
JavaScript JavaScript (), often abbreviated as JS, is a programming language and core technology of the World Wide Web, alongside HTML and CSS. Ninety-nine percent of websites use JavaScript on the client side for webpage behavior. Web browsers have ...
and inside Applets or Flash, but not complex links (generated using functions or expressions) or server-side image maps.


See also

* * Robots Exclusion Standard * Website mirroring software


References


External links

* Free software programmed in C Free web crawlers Portable software Web archiving Web scraping {{Web-software-stub