Snowball is a small string-processing programming language designed for creating stemming algorithms for use in information retrieval ("Snowball", Martin Porter, web page. Retrieved 2 September 2014).
The Snowball compiler translates a Snowball script (a .sbl file) into a program in thread-safe ANSI C, Java, Ada, C#, Go, JavaScript, Object Pascal, Python or Rust. For ANSI C, each Snowball script produces a program file and a corresponding header file (with .c and .h extensions). The Snowball compiler checks the consistency of its script, and this check was used to discover a typo
in a seminal academic paper by Lovins, an error that had remained undetected for 30 years.

The basic datatypes handled by Snowball are strings of characters, signed integers, and boolean truth values, or more simply strings, integers and booleans. Snowball's characters are either 8 or 16 bits wide, depending on the mode of use. In particular, both ASCII and 16-bit Unicode are supported.

Like the SNOBOL programming language, Snowball arranges its flow of control through the implicit use of signals (each statement returns a true or false value), rather than through explicit constructs such as if, then, and break found in C and many other programming languages ("Snowball Manual", Martin Porter, web page. Retrieved 2 September 2014).

The name Snowball was chosen as a tribute to the SNOBOL programming language, with which it shares the concept of string patterns delivering signals that are used to control the flow of the program. The creator of Snowball, Dr. Martin Porter, "toyed with the idea of calling it 'strippergram'", because it "effectively provides a 'suffix STRIPPER GRAMmar'".
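
The idea of statements that deliver true or false signals, with those signals steering the flow of the program, can be illustrated outside Snowball itself. The following is a minimal Python sketch, not Snowball code and not a description of how the Snowball compiler works. The names `Word`, `strip_suffix`, `seq`, `first_of`, `longer_than` and `stem`, as well as the three toy suffix rules, are hypothetical and chosen only for illustration. Restoring the word when a sequence fails is a simplification made for this sketch.

```python
from typing import Callable


class Word:
    """Mutable holder for the string being stemmed (hypothetical helper)."""

    def __init__(self, text: str) -> None:
        self.text = text


# A command acts on a Word and returns a signal: True (success) or False (failure).
Command = Callable[[Word], bool]


def strip_suffix(suffix: str) -> Command:
    """Command that removes `suffix` if the word ends with it."""
    def run(w: Word) -> bool:
        if suffix and w.text.endswith(suffix):
            w.text = w.text[: -len(suffix)]
            return True
        return False
    return run


def longer_than(n: int) -> Command:
    """Command that only tests the word length and never changes the word."""
    def run(w: Word) -> bool:
        return len(w.text) > n
    return run


def seq(*commands: Command) -> Command:
    """Run commands in order; on the first False signal, undo changes and fail."""
    def run(w: Word) -> bool:
        saved = w.text
        for command in commands:
            if not command(w):
                w.text = saved  # simplification: roll back the whole sequence
                return False
        return True
    return run


def first_of(*commands: Command) -> Command:
    """Try commands in order; succeed as soon as one signals True."""
    def run(w: Word) -> bool:
        for command in commands:
            if command(w):
                return True
        return False
    return run


def stem(text: str) -> str:
    """Toy suffix stripper: no if/else in the rule itself, only signals."""
    rule = first_of(
        seq(strip_suffix("ing"), longer_than(2)),
        seq(strip_suffix("ed"), longer_than(2)),
        strip_suffix("s"),
    )
    w = Word(text)
    rule(w)  # the final signal is ignored; the word is returned either way
    return w.text
```

With these toy rules, `stem("jumping")` returns `"jump"`, while `stem("sing")` returns `"sing"` unchanged, because the length test after stripping "ing" signals false and the sequence is rolled back. Real stemmers written in Snowball use far richer rules than this sketch.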


References

P. Willett. "The Porter Stemming Algorithm: Then and Now". Program, Volume 40, Issue 3 (July 2006), pages 219 et seq.

