ANTLR4
   HOME

TheInfoList



OR:

In computer-based language recognition, ANTLR (pronounced ''
antler Antlers are extensions of an animal's skull found in members of the Cervidae (deer) Family (biology), family. Antlers are a single structure composed of bone, cartilage, fibrous tissue, skin, nerves, and blood vessels. They are generally fo ...
''), or ANother Tool for Language Recognition, is a
parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
that uses a LL(*) algorithm for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor
Terence Parr Terence John Parr (born 1964 in Los Angeles) is a professor of computer science at the University of San Francisco. He is best known for his ANTLR parser generator and contributions to parsing theory. He also developed the StringTemplate engine ...
of the
University of San Francisco The University of San Francisco (USF) is a Private university, private Society of Jesus, Jesuit university in San Francisco, California, United States. Founded in 1855, it has nearly 9,000 students pursuing undergraduate and graduate degrees ...
. PCCTS 1.00 was announced April 10, 1992.


Usage

ANTLR takes as input a
grammar In linguistics, grammar is the set of rules for how a natural language is structured, as demonstrated by its speakers or writers. Grammar rules may concern the use of clauses, phrases, and words. The term may also refer to the study of such rul ...
that specifies a language and generates as output
source code In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer. Since a computer, at base, only ...
for a
recognizer A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
of that language. While Version 3 supported generating code in the
programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
s Ada95,
ActionScript ActionScript is an object-oriented programming language originally developed by Macromedia Inc. (later acquired by Adobe). It is influenced by HyperTalk, the scripting language for HyperCard. It is now an implementation of ECMAScript (mean ...
, C, C#,
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
,
JavaScript JavaScript (), often abbreviated as JS, is a programming language and core technology of the World Wide Web, alongside HTML and CSS. Ninety-nine percent of websites use JavaScript on the client side for webpage behavior. Web browsers have ...
,
Objective-C Objective-C is a high-level general-purpose, object-oriented programming language that adds Smalltalk-style message passing (messaging) to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was ...
,
Perl Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was developed ...
,
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (prog ...
,
Ruby Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...
, and
Standard ML Standard ML (SML) is a General-purpose programming language, general-purpose, High-level programming language, high-level, Modular programming, modular, Functional programming, functional programming language with compile-time type checking and t ...
, Version 4 at present targets C#, C++, Dart, Java, JavaScript, Go,
PHP PHP is a general-purpose scripting language geared towards web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by the PHP Group. ...
, Python (2 and 3), and
Swift Swift or SWIFT most commonly refers to: * SWIFT, an international organization facilitating transactions between banks ** SWIFT code * Swift (programming language) * Swift (bird), a family of birds It may also refer to: Organizations * SWIF ...
. A language is specified using a
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the fo ...
expressed using
Extended Backus–Naur Form Extension, extend or extended may refer to: Mathematics Logic or set theory * Axiom of extensionality * Extensible cardinal * Extension (model theory) * Extension (proof theory) * Extension (predicate logic), the set of tuples of values ...
(EBNF). ANTLR can generate lexers,
parser Parsing, syntax analysis, or syntactic analysis is a process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar by breaking it into parts. The term '' ...
s, tree parsers, and combined lexer-parsers. Parsers can automatically generate
parse tree A parse tree or parsing tree (also known as a derivation tree or concrete syntax tree) is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is use ...
s or
abstract syntax tree An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text (often source code) written in a formal ...
s, which can be further processed with tree parsers. ANTLR provides a single consistent notation for specifying lexers, parsers, and tree parsers. By default, ANTLR reads a grammar and generates a recognizer for the language defined by the grammar (i.e., a program that reads an input stream and generates an error if the input stream does not conform to the syntax specified by the grammar). If there are no syntax errors, the default action is to simply exit without printing any message. In order to do something useful with the language, actions can be attached to grammar elements in the grammar. These actions are written in the programming language in which the recognizer is being generated. When the recognizer is being generated, the actions are embedded in the source code of the recognizer at the appropriate points. Actions can be used to build and check symbol tables and to emit instructions in a target language, in the case of a compiler. Other than lexers and parsers, ANTLR can be used to generate tree parsers. These are recognizers that process abstract syntax trees, which can be automatically generated by parsers. These tree parsers are unique to ANTLR and help processing abstract syntax trees.


Licensing

and ANTLR 4 are
free software Free software, libre software, libreware sometimes known as freedom-respecting software is computer software distributed open-source license, under terms that allow users to run the software for any purpose as well as to study, change, distribut ...
, published under a three-clause
BSD License BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. This is in contrast to copyleft licenses, which have share-alike requirements. The original BSD lic ...
. Prior versions were released as
public domain software Public-domain software is software that has been placed in the public domain, in other words, software for which there is absolutely no ownership such as copyright, trademark, or patent. Software in the public domain can be modified, distributed, ...
. Documentation, derived from Parr's book ''The Definitive ANTLR 4 Reference'', is included with the BSD-licensed ANTLR 4 source. Various plugins have been developed for the Eclipse development environment to support the ANTLR grammar, including ANTLR Studio, a proprietary product, as well as the "ANTLR 2" and "ANTLR 3" plugins for Eclipse hosted on
SourceForge SourceForge is a web service founded by Geoffrey B. Jeffery, Tim Perdue, and Drew Streib in November 1999. SourceForge provides a centralized software discovery platform, including an online platform for managing and hosting open-source soft ...
.


ANTLR 4

ANTLR 4 deals with direct
left recursion In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on ...
correctly, but not with left recursion in general, i.e., grammar rules ''x'' that refer to ''y'' that refer to ''x''.


Development

As reported on the tools page of the ANTLR project, plug-ins that enable features like syntax highlighting, syntax error checking and code completion are freely available for the most common IDEs (
Intellij IntelliJ IDEA () is an integrated development environment (IDE) written in Java for developing computer software written in Java, Kotlin, Groovy, and other JVM-based languages. It is developed by JetBrains (formerly known as IntelliJ) and is a ...
IDEA,
NetBeans NetBeans is an integrated development environment (IDE) for Java (programming language), Java. NetBeans allows applications to be developed from a set of modular software components called ''modules''. NetBeans runs on Microsoft Windows, Windows, ...
,
Eclipse An eclipse is an astronomical event which occurs when an astronomical object or spacecraft is temporarily obscured, by passing into the shadow of another body or by having another body pass between it and the viewer. This alignment of three ...
,
Visual Studio Visual Studio is an integrated development environment (IDE) developed by Microsoft. It is used to develop computer programs including web site, websites, web apps, web services and mobile apps. Visual Studio uses Microsoft software development ...
and
Visual Studio Code Visual Studio Code, commonly referred to as VS Code, is an integrated development environment developed by Microsoft for Windows, Linux, macOS and web browsers. Features include support for debugging, syntax highlighting, intelligent code comp ...
).


Projects

Software built using ANTLR includes: *
Groovy ''Groovy'' (or, less commonly, ''groovie'' or ''groovey'') is a slang colloquialism popular during the 1960s and 1970s. It is roughly synonymous with words such as "excellent", "fashionable", or "amazing", depending on context. History The word ...
*
Jython Jython is an implementation of the Python (programming language), Python programming language designed to run on the Java (programming language), Java platform. It was known as JPython until 1999. Overview Jython programs can import and use any ...
*
Hibernate Hibernation is a state of minimal activity and metabolic reduction entered by some animal species. Hibernation is a seasonal heterothermy characterized by low body-temperature, slow breathing and heart-rate, and low metabolic rate. It is most ...
* OpenJDK Compiler Grammar project experimental version of the
javac javac (pronounced "java-see") is the primary Java compiler included in the Java Development Kit (JDK) from Oracle Corporation. Martin Odersky implemented the GJ compiler, and his implementation became the basis for javac. The compiler accepts ...
compiler based upon a grammar written in ANTLR *
Apex The apex is the highest point of something. The word may also refer to: Arts and media Fictional entities * Apex (comics) A-Bomb Abomination Absorbing Man Abraxas Abyss Abyss is the name of two characters appearing in Ameri ...
,
Salesforce.com Salesforce, Inc. is an American cloud-based software company headquartered in San Francisco, California. It provides applications focused on sales, customer service, marketing automation, e-commerce, analytics, artificial intelligence, and appl ...
's programming language * The expression evaluator in
Numbers A number is a mathematical object used to count, measure, and label. The most basic examples are the natural numbers 1, 2, 3, 4, and so forth. Numbers can be represented in language with number words. More universally, individual numbers can ...
, Apple's spreadsheet *
Twitter Twitter, officially known as X since 2023, is an American microblogging and social networking service. It is one of the world's largest social media platforms and one of the most-visited websites. Users can share short text messages, image ...
's search query language * Weblogic server *
Apache Cassandra Apache Cassandra is a free and open-source software, free and open-source database management system designed to handle large volumes of data across multiple Commodity computing, commodity servers. The system prioritizes availability and scalab ...
* Processing *
JabRef JabRef is an open-source, cross-platform citation and reference management software. It is used to collect, organize and search bibliographic information. JabRef has a target audience of academics and many university libraries have written guid ...
*
Trino (SQL query engine) Trino is an Open-source software, open-source distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources. Trino can query data lakes that contain a variety of file formats such as simple ...
*
Presto (SQL query engine) Presto (including PrestoDB, and PrestoSQL which was re-branded to Trino) is a distributed query engine for big data using the SQL query language. Its architecture allows users to query data sources such as Hadoop, Cassandra, Kafka, AWS S3, All ...
*
MySQL Workbench MySQL Workbench is a visual database design tool that integrates SQL development, administration, database design, creation and maintenance into a single integrated development environment for the MySQL database system. It is the successor to D ...
Over 200 grammars implemented in ANTLR 4 are available on
GitHub GitHub () is a Proprietary software, proprietary developer platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control and GitHub itself provides access control, bug trackin ...
. They range from grammars for a
URL A uniform resource locator (URL), colloquially known as an address on the Web, is a reference to a resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identi ...
to grammars for entire languages like C, Java and Go.


Example

In the following example, a parser in ANTLR describes the sum of expressions can be seen in the form of "1 + 2 + 3": // Common options, for example, the target language options // Followed by the parser class SumParser extends Parser; options // Definition of an expression statement: INTEGER (PLUS^ INTEGER)*; // Here is the Lexer class SumLexer extends Lexer; options PLUS: '+'; DIGIT: ('0'..'9'); INTEGER: (DIGIT)+; The following listing demonstrates the call of the parser in a program: TextReader reader; // (...) Fill TextReader with character SumLexer lexer = new SumLexer(reader); SumParser parser = new SumParser(lexer); parser.statement();


See also

*
Coco/R Coco/R is a compiler generator that takes wirth syntax notationIn the manual, however, it is referred as L-attributed Extended Backus–Naur Form syntax (EBNF). grammars of a source language and generates a scanner and a parser for that lan ...
*
DMS Software Reengineering Toolkit The DMS Software Reengineering Toolkit is a proprietary set of program transformation tools available for automating custom source program analysis, modification, translation or generation of software systems for arbitrary mixtures of source langu ...
* JavaCC * Modular Syntax Definition Formalism *
Parboiled (Java) parboiled is an open-source Java (programming language), Java library released under an Apache License. It provides support for defining Parsing expression grammar, PEG parsers directly in Java source code. parboiled is commonly used as an altern ...
*
Parsing expression grammar In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 20 ...
*
SableCC SableCC is an open-source compiler generator (or interpreter generator) in Java. Stable version is licensed under the GNU Lesser General Public License (LGPL). Rewritten version 4 is licensed under Apache License 2.0. SableCC includes the followin ...


References


Bibliography

* * *


Further reading

*


External links

* {{official website, www.antlr.org
ANTLR (mega) Tutorial



ANTLR Studio

Building a Real Estate Search System with ANTLR
1992 software Parser generators Software using the BSD license Public-domain software