Nokogiri is an open source software library to parse HTML and XML in Ruby. It depends on libxml2 and libxslt to provide its functionality.
Overview
It markets itself as providing a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is available for Ruby as well as Java through JRuby. It provides fast and standards-compliant parsers by relying on native parsers such as libxml2 (CRuby) and Xerces (JRuby).
It is one of the most downloaded Ruby gems, having been downloaded over 550 million times from the rubygems.org repository.
Features
* DOM Parser for XML, HTML4, and HTML5
* SAX Parser for XML and HTML4
* Push Parser for XML and HTML4
* Document search via XPath 1.0
* Document search via CSS3 selectors
* XSD Schema validation
* XSLT transformation
* XML and HTML Builder
Enterprise support is available through Tidelift, a paid subscription service that offers commercial support for open source applications.
External links
* Nokogiri on GitHub (sparklemotion/nokogiri)