Nokogiri (project)

Nokogiri is an open source software library for parsing HTML and XML in Ruby. It depends on libxml2 and libxslt to provide its functionality.


Overview

Nokogiri describes itself as providing a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is available for Ruby as well as for Java through JRuby. It provides fast, standards-compliant parsing by relying on native parsers such as libxml2 (CRuby) and Xerces (JRuby). It is one of the most downloaded Ruby gems, having been downloaded over 700 million times from the rubygems.org repository.


Features

* DOM Parser for XML, HTML4, and HTML5
* SAX Parser for XML and HTML4
* Push Parser for XML and HTML4
* Document search via XPath 1.0
* Document search via CSS3 selectors
* XSD Schema validation
* XSLT transformation
* XML and HTML Builder

Enterprise support is available through Tidelift, a paid subscription service that offers commercial support for open source software.


External links

* Nokogiri on GitHub (sparklemotion/nokogiri)