HOME

TheInfoList



OR:

Lex is a
computer program A computer program is a sequence or set of instructions in a programming language for a computer to Execution (computing), execute. It is one component of software, which also includes software documentation, documentation and other intangibl ...
that generates lexical analyzers ("scanners" or "lexers"). It is commonly used with the
yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a lookahead left-to-right rightmost derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
and is the standard lexical analyzer generator on many
Unix Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
and
Unix-like A Unix-like (sometimes referred to as UN*X, *nix or *NIX) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Uni ...
systems. An equivalent tool is specified as part of the
POSIX The Portable Operating System Interface (POSIX; ) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines application programming interfaces (APIs), along with comm ...
standard. Lex reads an input
stream A stream is a continuous body of water, body of surface water Current (stream), flowing within the stream bed, bed and bank (geography), banks of a channel (geography), channel. Depending on its location or certain characteristics, a strea ...
specifying the lexical analyzer and writes
source code In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer. Since a computer, at base, only ...
which implements the lexical analyzer in the
C programming language C (''pronounced'' '' – like the letter c'') is a general-purpose programming language. It was created in the 1970s by Dennis Ritchie and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of ...
. In addition to C, some old versions of Lex could generate a lexer in
Ratfor Ratfor (short for ''Rational Fortran'') is a programming language implemented as a preprocessor for Fortran#FORTRAN 66, Fortran 66. It provides Structured programming, modern control structures, unavailable in Fortran 66, to replace GOTOs and sta ...
.


History

Lex was originally written by
Mike Lesk Michael E. Lesk (born 1945) is an American computer scientist. Biography In the 1960s, Michael Lesk worked for the SMART Information Retrieval System project, wrote much of its retrieval code and did many of the retrieval experiments, as well as ...
and
Eric Schmidt Eric Emerson Schmidt (born April 27, 1955) is an American businessman and former computer engineer who was the chief executive officer of Google from 2001 to 2011 and the company's chairman, executive chairman from 2011 to 2015. He also was the ...
and described in 1975. In the following years, Lex became the standard lexical analyzer generator on many
Unix Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
and
Unix-like A Unix-like (sometimes referred to as UN*X, *nix or *NIX) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Uni ...
systems. In 1983, Lex was one of several UNIX tools available for Charles River Data Systems' UNOS operating system under
Bell Laboratories Nokia Bell Labs, commonly referred to as ''Bell Labs'', is an American industrial research and development company owned by Finnish technology company Nokia. With headquarters located in Murray Hill, New Jersey, the company operates several lab ...
license. Although originally distributed as proprietary software, some versions of Lex are now
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use and view the source code, design documents, or content of the product. The open source model is a decentrali ...
. Open-source versions of Lex, based on the original proprietary code, are now distributed with open-source operating systems such as
OpenSolaris OpenSolaris () is a discontinued open-source computer operating system for SPARC and x86 based systems, created by Sun Microsystems and based on Solaris. Its development began in the mid 2000s and ended in 2010. OpenSolaris was developed as ...
and
Plan 9 from Bell Labs Plan 9 from Bell Labs is a distributed operating system which originated from the Computing Science Research Center (CSRC) at Bell Labs in the mid-1980s and built on UNIX concepts first developed there in the late 1960s. Since 2000, Plan 9 has ...
. One popular open-source version of Lex, called flex, or the "fast lexical analyzer", is not derived from proprietary coding.


Structure of a Lex file

The structure of a Lex file is intentionally similar to that of a yacc file: files are divided into three sections, separated by lines that contain only two percent signs, as follows: *The definitions section defines macros and imports
header file An include directive instructs a text file processor to replace the directive text with the content of a specified file. The act of including may be logical in nature. The processor may simply process the include file content at the location of ...
s written in C. It is also possible to write any C code here, which will be copied verbatim into the generated source file. *The rules section associates
regular expression A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
patterns with C
statement Statement or statements may refer to: Common uses *Statement (computer science), the smallest standalone element of an imperative programming language *Statement (logic and semantics), declarative sentence that is either true or false *Statement, ...
s. When the lexer sees text in the input matching a given pattern, it will execute the associated C code. *The C code section contains C statements and functions that are copied verbatim to the generated source file. These statements presumably contain code called by the rules in the rules section. In large programs it is more convenient to place this code in a separate file linked in at compile time.


Example of a Lex file

The following is an example Lex file for the flex version of Lex. It recognizes strings of numbers (positive integers) in the input, and simply prints them out. /*** Definition section ***/ % %% /*** Rules section ***/ /* -9 matches a string of one or more digits */ -9 ., \n %% /*** C Code section ***/ int main(void) If this input is given to flex, it will be converted into a C file, . This can be compiled into an executable which matches and outputs strings of integers. For example, given the input: abc123z.!&*2gj6 the program will print: Saw an integer: 123 Saw an integer: 2 Saw an integer: 6


Using Lex with other programming tools


Using Lex with parser generators

Lex, as with other lexical analyzers, limits rules to those which can be described by
regular expressions A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of character (computing), characters that specifies a pattern matching, match pattern in string (computer science), text. Usually ...
. Due to this, Lex can be implemented by a finite-state automata as shown by the
Chomsky hierarchy The Chomsky hierarchy in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars. A formal grammar describes how to form strings from a formal language's alphabet that are v ...
of languages. To recognize more complex languages, Lex is often used with parser generators such as
Yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a lookahead left-to-right rightmost derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
or
Bison A bison (: bison) is a large bovine in the genus ''Bison'' (from Greek, meaning 'wild ox') within the tribe Bovini. Two extant taxon, extant and numerous extinction, extinct species are recognised. Of the two surviving species, the American ...
. Parser generators use a
formal grammar A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
to parse an input stream. It is typically preferable to have a parser, one generated by Yacc for instance, accept a stream of tokens (a "token-stream") as input, rather than having to process a stream of characters (a "character-stream") directly. Lex is often used to produce such a token-stream. Scannerless parsing refers to parsing the input character-stream directly, without a distinct lexer.


Lex and make

make is a utility that can be used to maintain programs involving Lex. Make assumes that a file that has an extension of .l is a Lex source file. The make internal macro LFLAGS can be used to specify Lex options to be invoked automatically by make.


See also

* Flex lexical analyser *
Yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a lookahead left-to-right rightmost derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
* Ragel * re2c * PLY (Python Lex-Yacc) * Comparison of parser generators


References


External links


Using Flex and Bison at Macworld.com
* * {{Plan 9 commands Compiling tools Unix programming tools Unix SUS2008 utilities Plan 9 commands Finite-state machines Lexical analysis