Lex programming tool
   HOME

TheInfoList



OR:

Lex is a
computer program A computer program is a sequence or set of instructions in a programming language for a computer to Execution (computing), execute. Computer programs are one component of software, which also includes software documentation, documentation and oth ...
that generates
lexical analyzer In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of ''lexical tokens'' ( strings with an assigned and thus identified ...
s ("scanners" or "lexers"). Lex is commonly used with the
yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
parser generator. Lex, originally written by
Mike Lesk Michael E. Lesk (born 1945) is an American computer scientist. Biography In the 1960s, Michael Lesk worked for the SMART Information Retrieval System project, wrote much of its retrieval code and did many of the retrieval experiments, as well as ...
and
Eric Schmidt Eric Emerson Schmidt (born April 27, 1955) is an American businessman and software engineer known for being the CEO of Google from 2001 to 2011, executive chairman of Google from 2011 to 2015, executive chairman of Alphabet Inc. from 2015 to 2 ...
and described in 1975, is the standard lexical analyzer generator on many
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, ...
systems, and an equivalent tool is specified as part of the
POSIX The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system- and user-level application programming in ...
standard. Lex reads an input
stream A stream is a continuous body of surface water flowing within the bed and banks of a channel. Depending on its location or certain characteristics, a stream may be referred to by a variety of local or regional names. Long large streams ...
specifying the lexical analyzer and writes
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the ...
which implements the lexical analyzer in the
C programming language ''The C Programming Language'' (sometimes termed ''K&R'', after its authors' initials) is a computer programming book written by Brian Kernighan and Dennis Ritchie, the latter of whom originally designed and implemented the language, as well a ...
. In addition to C, some old versions of Lex could generate a lexer in Ratfor.


Open source

Although originally distributed as proprietary software, some versions of Lex are now
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized so ...
. Open-source versions of Lex, based on the original proprietary code, are now distributed with open-source operating systems such as
OpenSolaris OpenSolaris () is a discontinued open-source computer operating system based on Solaris and created by Sun Microsystems. It was also, perhaps confusingly, the name of a project initiated by Sun to build a developer and user community around t ...
and
Plan 9 from Bell Labs Plan 9 from Bell Labs is a distributed operating system which originated from the Computing Science Research Center (CSRC) at Bell Labs in the mid-1980s and built on UNIX concepts first developed there in the late 1960s. Since 2000, Plan 9 has be ...
. One popular open-source version of Lex, called
flex Flex or FLEX may refer to: Computing * Flex (language), developed by Alan Kay * FLEX (operating system), a single-tasking operating system for the Motorola 6800 * FlexOS, an operating system developed by Digital Research * FLEX (protocol), a com ...
, or the "fast lexical analyzer", is not derived from proprietary coding.


Structure of a Lex file

The structure of a Lex file is intentionally similar to that of a yacc file: files are divided into three sections, separated by lines that contain only two percent signs, as follows: *The definitions section defines macros and imports
header file Many programming languages and other computer files have a directive, often called include (sometimes copy or import), that causes the contents of the specified file to be inserted into the original file. These included files are called copybooks ...
s written in C. It is also possible to write any C code here, which will be copied verbatim into the generated source file. *The rules section associates
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
patterns with C statements. When the lexer sees text in the input matching a given pattern, it will execute the associated C code. *The C code section contains C statements and functions that are copied verbatim to the generated source file. These statements presumably contain code called by the rules in the rules section. In large programs it is more convenient to place this code in a separate file linked in at
compile In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs th ...
time.


Example of a Lex file

The following is an example Lex file for the
flex Flex or FLEX may refer to: Computing * Flex (language), developed by Alan Kay * FLEX (operating system), a single-tasking operating system for the Motorola 6800 * FlexOS, an operating system developed by Digital Research * FLEX (protocol), a com ...
version of Lex. It recognizes strings of numbers (positive integers) in the input, and simply prints them out. /*** Definition section ***/ % %% /*** Rules section ***/ /* -9 matches a string of one or more digits */ -9 ., \n %% /*** C Code section ***/ int main(void) If this input is given to flex, it will be converted into a C file, . This can be compiled into an executable which matches and outputs strings of integers. For example, given the input: abc123z.!&*2gj6 the program will print: Saw an integer: 123 Saw an integer: 2 Saw an integer: 6


Using Lex with other programming tools


Using Lex with parser generators

Lex and parser generators, such as
Yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
or
Bison Bison are large bovines in the genus ''Bison'' (Greek: "wild ox" (bison)) within the tribe Bovini. Two extant and numerous extinct species are recognised. Of the two surviving species, the American bison, ''B. bison'', found only in North A ...
, are commonly used together. Parser generators use a
formal grammar In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
to parse an input stream, something which Lex cannot do using simple
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
s, as Lex is limited to simple finite state automata. It is typically preferable to have a parser, one generated by Yacc for instance, accept a stream of tokens (a "token-stream") as input, rather than having to process a stream of characters (a "character-stream") directly. Lex is often used to produce such a token-stream.
Scannerless parsing In computer science, scannerless parsing (also called lexerless parsing) performs tokenization (breaking a stream of characters into words) and parsing (arranging the words into phrases) in a single step, rather than breaking it up into a pipeli ...
refers to parsing the input character-stream directly, without a distinct lexer.


Lex and make

make is a utility that can be used to maintain programs involving Lex. Make assumes that a file that has an extension of .l is a Lex source file. The make internal macro LFLAGS can be used to specify Lex options to be invoked automatically by make.


See also

* Flex lexical analyser *
Yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
* Ragel * PLY (Python Lex-Yacc) * Comparison of parser generators


References


External links


Using Flex and Bison at Macworld.com
* * {{Plan 9 commands Compiling tools Unix programming tools Unix SUS2008 utilities Plan 9 commands Finite automata Lexical analysis