GNU bison
   HOME

TheInfoList



OR:

GNU Bison, commonly known as Bison, is a parser generator that is part of the
GNU Project The GNU Project () is a free software, mass collaboration project announced by Richard Stallman on September 27, 1983. Its goal is to give computer users freedom and control in their use of their computers and computing devices by collabor ...
. Bison reads a specification in the BNF notation (a context-free language), warns about any
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from ...
ambiguities, and generates a parser that reads sequences of
token Token may refer to: Arts, entertainment, and media * Token, a game piece or counter, used in some games * The Tokens, a vocal music group * Tolkien Black, a recurring character on the animated television series ''South Park,'' formerly known a ...
s and decides whether the sequence conforms to the syntax specified by the grammar. The generated parsers are portable: they do not require any specific compilers. Bison by default generates LALR(1) parsers but it can also generate canonical LR, IELR(1) and GLR parsers. In
POSIX The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system- and user-level application programming in ...
mode, Bison is compatible with
Yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
, but also has several extensions over this earlier program, including * Generation of counterexamples for conflicts * Location tracking (e.g., file, line, column) * Rich and internationalizable syntax error messages in the generated parsers * Customizable syntax error generation, * Reentrant parsers * Push parsers, with autocompletion * Support for named references * Several types of reports (graphical, XML) on the generated parser * Support for several programming languages ( C, C++, D, or
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
)
Flex Flex or FLEX may refer to: Computing * Flex (language), developed by Alan Kay * FLEX (operating system), a single-tasking operating system for the Motorola 6800 * FlexOS, an operating system developed by Digital Research * FLEX (protocol), a com ...
, an automatic lexical analyser, is often used with Bison, to tokenise input data and provide Bison with tokens. Bison was originally written by Robert Corbett in 1985. Later, in 1989, Robert Corbett released another parser generator named
Berkeley Yacc Berkeley Yacc (byacc) is a Unix parser generator designed to be compatible with Yacc. It was originally written by Robert Corbett and released in 1989. Due to its liberal license and because it was faster than the AT&T Yacc, it quickly became the ...
. Bison was made Yacc-compatible by
Richard Stallman Richard Matthew Stallman (; born March 16, 1953), also known by his initials, rms, is an American free software movement activist and programmer. He campaigns for software to be distributed in such a manner that its users have the freedom to ...
. Bison is
free software Free software or libre software is computer software distributed under terms that allow users to run the software for any purpose as well as to study, change, and distribute it and any adapted versions. Free software is a matter of liberty, n ...
and is available under the
GNU General Public License The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the four freedoms to run, study, share, and modify the software. The license was the first copyleft for general ...
, with an exception (discussed below) allowing its generated code to be used without triggering the
copyleft Copyleft is the legal technique of granting certain freedoms over copies of copyrighted works with the requirement that the same rights be preserved in derivative works. In this sense, ''freedoms'' refers to the use of the work for any purpose ...
requirements of the licence.


Features


Counterexample generation

One delicate issue with LR parser generators is the resolution of conflicts (shift/reduce and reduce/reduce conflicts). Resolving conflicts usually requires the analysis of the parser automaton as described in the reports, and some expertise from the user. Counterexamples help understanding quickly some conflicts, and can even actually prove that the problem is that the grammar is actually ambiguous. For instance on a grammar suffering from the infamous dangling else problem, Bison reports


Reentrancy

Reentrancy is a feature which has been added to Bison and does not exist in Yacc. Normally, Bison generates a parser which is not
reentrant Reentrant or re-entrant can refer to: *Re-entrant (landform), the low ground formed between two hill spurs. *Reentrancy (computing) in computer programming * Reentrant mutex in computer science *Reentry (neural circuitry) in neuroscience * Salien ...
. In order to achieve reentrancy the declaration %define api.pure must be used. More details on Bison reentrancy can be found in the Bison manual.


Output languages

Bison can generate code for C, C++, D and
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
. For using the Bison-generated parser from other languages a language binding tool such as SWIG can be used.


License and distribution of generated code

Because Bison generates source code that in turn gets added to the source code of other software projects, it raises some simple but interesting copyright questions.


A GPL-compatible license is not required

The code generated by Bison includes significant amounts of code from the Bison project itself. The Bison package is distributed under the terms of the
GNU General Public License The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the four freedoms to run, study, share, and modify the software. The license was the first copyleft for general ...
(GPL) but an exception has been added so that the GPL does not apply to output. Earlier releases of Bison stipulated that parts of its output were also licensed under the GPL, due to the inclusion of the yyparse() function from the original source code in the output.


Distribution of packages using Bison

Free software projects that use Bison may have a choice of whether to distribute the source code which their project feeds into Bison, or the resulting C code made output by Bison. Both are sufficient for a recipient to be able to compile the project source code. However, distributing only the input carries the minor inconvenience that the recipients must have a compatible copy of Bison installed so that they can generate the necessary C code when compiling the project. And distributing only the C code in output, creates the problem of making it very difficult for the recipients to modify the parser since this code was written neither ''by'' a human nor ''for'' humans - its purpose is to be fed directly into a C compiler. These problems can be avoided by distributing both the input files and the generated code. Most people will compile using the generated code, no different from any other software package, but anyone who wants to modify the parser component can modify the input files first and re-generate the generated files before compiling. Projects distributing both usually do not have the generated files in their revision control systems. The files are only generated when making a release. Some licenses, such as the GPL, require that the source code be in "''the preferred form of the work for making modifications to it''". GPL'd projects using Bison must thus distribute the files which are the input for Bison. Of course, they can also include the generated files.


Use

Because Bison was written as a replacement for Yacc, and is largely compatible, the code from a lot of projects using Bison could equally be fed into Yacc. This makes it difficult to determine if a project "uses" Bison-specific source code or not. In many cases, the "use" of Bison could be trivially replaced by the equivalent use of Yacc or one of its other derivatives. Bison has features not found in Yacc, so some projects can be truly said to "use" Bison, since Yacc would not suffice. The following list is of projects which are known to "use" Bison in the looser sense, that they use free software development tools and distribute code which is intended to be fed into Bison or a Bison-compatible package. *
Bash Bash or BASH may refer to: Arts and entertainment * ''Bash!'' (Rockapella album), 1992 * ''Bash!'' (Dave Bailey album), 1961 * '' Bash: Latter-Day Plays'', a dramatic triptych * ''BASH!'' (role-playing game), a 2005 superhero game * "Bash" ('' ...
shell uses a yacc grammar for parsing the command input. * Bison's own grammar parser is generated by Bison. *
CMake In software development, CMake is cross-platform free and open-source software for build automation, testing, packaging and installation of software by using a compiler-independent method. CMake is not a build system itself; it generates a ...
uses several Bison grammars. * GCC started out using Bison, but switched to a hand-written recursive-descent parser for C++ in 2004 (version 3.4), and for C and Objective-C in 2006 (version 4.1) * The Go programming language (GC) used Bison, but switched to a hand-written scanner and parser in version 1.5. *
LilyPond LilyPond is a computer program and file format for music engraving. One of LilyPond's major goals is to produce scores that are engraved with traditional layout rules, reflecting the era when scores were engraved by hand. LilyPond is cross-pla ...
requires Bison to generate its parser. *
MySQL MySQL () is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A relational database ...
*
GNU Octave GNU Octave is a high-level programming language primarily intended for scientific computing and numerical computation. Octave helps in solving linear and nonlinear problems numerically, and for performing other numerical experiments using a lan ...
uses a Bison-generated parser. * Perl 5 uses a Bison-generated parser starting in 5.10. * The PHP programming language (Zend Parser). *
PostgreSQL PostgreSQL (, ), also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the ...
* Ruby MRI, the reference implementation of the
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called ...
programming language, relies on a Bison grammar. *
syslog-ng syslog-ng is a free and open-source implementation of the syslog protocol for Unix and Unix-like systems. It extends the original syslogd model with content-based filtering, rich filtering capabilities, flexible configuration options and adds ...
uses several Bison grammars assembled together.


A complete reentrant parser example

The following example shows how to use Bison and flex to write a simple calculator program (only addition and multiplication) and a program for creating an abstract syntax tree. The next two files provide definition and implementation of the syntax tree functions. /* * Expression.h * Definition of the structure used to build the syntax tree. */ #ifndef __EXPRESSION_H__ #define __EXPRESSION_H__ /** * @brief The operation type */ typedef enum tagEOperationType EOperationType; /** * @brief The expression structure */ typedef struct tagSExpression SExpression; /** * @brief It creates an identifier * @param value The number value * @return The expression or NULL in case of no memory */ SExpression *createNumber(int value); /** * @brief It creates an operation * @param type The operation type * @param left The left operand * @param right The right operand * @return The expression or NULL in case of no memory */ SExpression *createOperation(EOperationType type, SExpression *left, SExpression *right); /** * @brief Deletes a expression * @param b The expression */ void deleteExpression(SExpression *b); #endif /* __EXPRESSION_H__ */ /* * Expression.c * Implementation of functions used to build the syntax tree. */ #include "Expression.h" #include /** * @brief Allocates space for expression * @return The expression or NULL if not enough memory */ static SExpression *allocateExpression() SExpression *createNumber(int value) SExpression *createOperation(EOperationType type, SExpression *left, SExpression *right) void deleteExpression(SExpression *b) The tokens needed by the Bison parser will be generated using flex. % %option outfile="Lexer.c" header-file="Lexer.h" %option warn nodefault %option reentrant noyywrap never-interactive nounistd %option bison-bridge %% \r\n\t -9 "*" "+" "(" ")" . %% int yyerror(SExpression **expression, yyscan_t scanner, const char *msg) The names of the tokens are typically neutral: "TOKEN_PLUS" and "TOKEN_STAR", not "TOKEN_ADD" and "TOKEN_MULTIPLY". For instance if we were to support the unary "+" (as in "+1"), it would be wrong to name this "+" "TOKEN_ADD". In a language such as C, "int *ptr" denotes the definition of a pointer, not a product: it would be wrong to name this "*" "TOKEN_MULTIPLY". Since the tokens are provided by flex we must provide the means to communicate between the parser and the lexer.Flex Manual: C Scanners with Bison Parsers
The data type used for communication, ''YYSTYPE'', is set using Bison ''%union'' declaration. Since in this sample we use the reentrant version of both flex and yacc we are forced to provide parameters for the ''yylex'' function, when called from ''yyparse''. This is done through Bison ''%lex-param'' and ''%parse-param'' declarations.
/ref> % %code requires %output "Parser.c" %defines "Parser.h" %define api.pure %lex-param %parse-param %parse-param %union %token TOKEN_LPAREN "(" %token TOKEN_RPAREN ")" %token TOKEN_PLUS "+" %token TOKEN_STAR "*" %token TOKEN_NUMBER "number" %type expr /* Precedence (increasing) and associativity: a+b+c is (a+b)+c: left associativity a+b*c is a+(b*c): the precedence of "*" is higher than that of "+". */ %left "+" %left "*" %% input : expr ; expr : expr "+" expr , expr "*" expr , "(" expr ")" , "number" ; %% The code needed to obtain the syntax tree using the parser generated by Bison and the scanner generated by flex is the following. /* * main.c file */ #include "Expression.h" #include "Parser.h" #include "Lexer.h" #include int yyparse(SExpression **expression, yyscan_t scanner); SExpression *getAST(const char *expr) int evaluate(SExpression *e) int main(void) A simple makefile to build the project is the following. # Makefile FILES = Lexer.c Parser.c Expression.c main.c CC = g++ CFLAGS = -g -ansi test: $(FILES) $(CC) $(CFLAGS) $(FILES) -o test Lexer.c: Lexer.l flex Lexer.l Parser.c: Parser.y Lexer.c bison Parser.y clean: rm -f *.o *~ Lexer.c Lexer.h Parser.c Parser.h test


See also

*
Berkeley Yacc Berkeley Yacc (byacc) is a Unix parser generator designed to be compatible with Yacc. It was originally written by Robert Corbett and released in 1989. Due to its liberal license and because it was faster than the AT&T Yacc, it quickly became the ...
(byacc) another free software Yacc replacement sharing the same author as GNU Bison * ANTLR ANother Tool for Language Recognition, another open-source parser generator


References


Further reading

*


External links


Website in the GNU Project
*
Manual

Bison
project at GNU Savannah
Entry in the Free Software Directory



How to download and install Bison (GNU Parser Generator) on Linux


(version 2.4.1) {{DEFAULTSORT:Gnu Bison Compiling tools Cross-platform software
Bison Bison are large bovines in the genus ''Bison'' (Greek: "wild ox" (bison)) within the tribe Bovini. Two extant and numerous extinct species are recognised. Of the two surviving species, the American bison, ''B. bison'', found only in North A ...
Bison Bison are large bovines in the genus ''Bison'' (Greek: "wild ox" (bison)) within the tribe Bovini. Two extant and numerous extinct species are recognised. Of the two surviving species, the American bison, ''B. bison'', found only in North A ...
Parsing