Snowball is a small string processing
programming language
A programming language is a system of notation for writing computer programs.
Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
designed for creating
stemming algorithms for use in
information retrieval
Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an Information needs, information need. The information need can be specified in the form ...
.
["Snowball"]
Martin Porter, web page. Retrieved 2 September 2014.
The name Snowball was chosen as a tribute to the
SNOBOL programming language, "with which it shares the concept of string patterns delivering signals that are used to control the flow of the program."
[ The creator of Snowball, Dr. Martin Porter, "toyed with the idea of calling it 'strippergram,'" because it "effectively provides a 'suffix STRIPPER GRAMmar.'"][
The Snowball compiler translates a Snowball script (an .sbl file) into program in thread-safe ANSI C, ]Java
Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
, Ada, C#, Go, Javascript, Object Pascal, Python or Rust. For ANSI C, each Snowball script produces a program file and corresponding header file (with .c and .h extensions).[ The Snowball compiler checks the consistency of its script, and this check was used to discover a typo in a seminal academic paper by Lovins which had remained undetected for 30 years.
The basic ]datatype
In computer science and computer programming, a data type (or simply type) is a collection or grouping of data values, usually specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these ...
s handled by Snowball are strings of characters, signed integers, and boolean truth value
In logic and mathematics, a truth value, sometimes called a logical value, is a value indicating the relation of a proposition to truth, which in classical logic has only two possible values ('' true'' or '' false''). Truth values are used in ...
s, or more simply strings, integers and booleans. Snowball's characters are either 8-bit wide, or 16-bit, depending on the mode of use. In particular, both ASCII
ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control c ...
and 16-bit Unicode are supported.[ Like the SNOBOL programming language, the flow of control in Snowball is arranged by the implicit use of signals (each statement returns a true or false value), rather than the explicit use of constructs such as if, then, and break found in C and many other programming languages.]["Snowball Manual"](_blank)
Martin Porter, web page. Retrieved 2 September 2014.
Though the origina
Snowball website
maintained by Dr. Martin Porter and colleague Richard Boulton has been closed since 2014 following Dr. Porter’s retirement,[ the site itself is still accessible, and the language continues to be developed a]
a community project on GitHub
[ Additionally, large projects like the Natural Language Toolkit (NLTK) for Python employ Snowball along with stemming algorithms designed by Dr. Porter and other contributors to the Snowball language.]
References
External links
*
Snowball Stemming language and algorithms project
on GitHub
Porter Stemmer in Snowball
Experimental programming languages
Text-oriented programming languages
SNOBOL programming language family
{{compu-lang-stub