In computer-based language recognition, ANTLR (pronounced ''antler''), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor Terence Parr of the University of San Francisco.
Usage
ANTLR takes as input a grammar that specifies a language and generates as output source code for a recognizer of that language.
Version 3 supported generating code in the programming languages Ada95, ActionScript, C, C#, Java, JavaScript, Objective-C, Perl, Python, Ruby, and Standard ML, whereas Version 4 at present targets C#, C++, Dart, Java, JavaScript, Go, PHP, Python (2 and 3), and Swift.
A language is specified using a context-free grammar expressed using Extended Backus–Naur Form (EBNF).
ANTLR can generate lexers, parsers, tree parsers, and combined lexer-parsers. Parsers can automatically generate parse trees or abstract syntax trees, which can be further processed with tree parsers. ANTLR provides a single consistent notation for specifying lexers, parsers, and tree parsers.
By default, ANTLR reads a grammar and generates a recognizer for the language defined by that grammar (i.e., a program that reads an input stream and reports an error if the input does not conform to the syntax the grammar specifies). If there are no syntax errors, the default behavior is simply to exit without printing any message. To do something useful with the language, actions can be attached to elements of the grammar. These actions are written in the programming language in which the recognizer is generated, and ANTLR embeds them in the recognizer's source code at the corresponding points. In a compiler, for example, actions can build and check symbol tables and emit instructions in a target language.
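The idea of a recognizer with embedded actions can be sketched by hand. The following Python is not ANTLR output and does not use the ANTLR runtime; it is a minimal hand-written illustration, assuming a tiny language of integer sums. The names `tokenize`, `statement`, and `on_integer` are hypothetical. The callback `on_integer` stands in for an action attached to a grammar element.

```python
import re

# Hypothetical token pattern: an integer or a '+', with leading whitespace skipped.
TOKEN = re.compile(r"\s*(?:(\d+)|(\+))")


def tokenize(text):
    """Split text into ('INTEGER', value) and ('PLUS', '+') tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        if m.group(1) is not None:
            tokens.append(("INTEGER", int(m.group(1))))
        else:
            tokens.append(("PLUS", "+"))
        pos = m.end()
    return tokens


def statement(tokens, on_integer=None):
    """Recognize INTEGER (PLUS INTEGER)*; raise SyntaxError otherwise.

    on_integer plays the role of an embedded action: it is called at the
    point in the rule where an INTEGER has just been matched.
    """
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        value = tokens[pos][1]
        pos += 1
        return value

    value = expect("INTEGER")
    if on_integer:
        on_integer(value)
    while pos < len(tokens) and tokens[pos][0] == "PLUS":
        expect("PLUS")
        value = expect("INTEGER")
        if on_integer:
            on_integer(value)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")


# Without an action the recognizer is silent on valid input.
statement(tokenize("1 + 2 + 3"))

# With an action attached, the same rule does useful work.
seen = []
statement(tokenize("1 + 2 + 3"), on_integer=seen.append)
print(sum(seen))  # prints 6
```

Without the callback, valid input produces no output at all, which mirrors the default behavior described above. The action is what turns recognition into computation.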
Besides lexers and parsers, ANTLR can generate tree parsers. These are recognizers that process abstract syntax trees, which parsers can build automatically. Tree parsers are unique to ANTLR and help with processing abstract syntax trees.
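As a rough illustration of what processing an abstract syntax tree involves, the sketch below walks a tree for a sum such as 1 + 2 + 3. The nested-tuple representation and the function name `evaluate` are assumptions made for this example; they are not ANTLR's tree classes or a generated tree parser.

```python
def evaluate(node):
    """Walk an AST given as nested tuples ('+', left, right) or int leaves."""
    if isinstance(node, int):
        return node
    op, left, right = node
    if op != "+":
        raise ValueError(f"unknown operator {op!r}")
    return evaluate(left) + evaluate(right)


# Hypothetical AST for "1 + 2 + 3", with each '+' as the root of its subtree.
ast = ("+", ("+", 1, 2), 3)
print(evaluate(ast))  # prints 6
```

The walker only ever sees tree structure, never the original characters or tokens, which is the division of labor a tree parser relies on.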
Licensing
ANTLR 3 and ANTLR 4 are free software, published under a three-clause BSD License. Prior versions were released as public domain software. Documentation, derived from Parr's book ''The Definitive ANTLR 4 Reference'', is included with the BSD-licensed ANTLR 4 source.
Various plugins have been developed for the Eclipse development environment to support the ANTLR grammar, including ANTLR Studio, a proprietary product, as well as the "ANTLR 2" and "ANTLR 3" plugins for Eclipse hosted on SourceForge.
ANTLR 4
ANTLR 4 handles direct left recursion correctly, but not left recursion in general, i.e., indirect left recursion, in which a grammar rule ''x'' refers to a rule ''y'' that in turn refers back to ''x''.
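The difference between the two kinds of left recursion can be made concrete with a small check over a grammar. The sketch below is not ANTLR's algorithm. It assumes a grammar given as a Python dict that maps each rule name to a list of alternatives, and it inspects only the first symbol of each alternative, so rules that can match the empty string are not handled. The function name `left_recursion` and the sample grammars are hypothetical.

```python
def left_recursion(grammar):
    """Return {rule: 'direct' | 'indirect'} for the left-recursive rules.

    grammar maps a rule name to a list of alternatives; each alternative is
    a list of symbols. Simplification: only the first symbol of each
    alternative is considered.
    """
    # first[r] is the set of rule names that can appear leftmost in r.
    first = {
        rule: {alt[0] for alt in alts if alt and alt[0] in grammar}
        for rule, alts in grammar.items()
    }
    result = {}
    for rule in grammar:
        if rule in first[rule]:
            result[rule] = "direct"
            continue
        # Depth-first search for a longer leftmost path back to the rule.
        stack, seen = list(first[rule]), set()
        while stack:
            current = stack.pop()
            if current == rule:
                result[rule] = "indirect"
                break
            if current not in seen:
                seen.add(current)
                stack.extend(first[current])
    return result


# expr refers to itself in leftmost position: direct left recursion.
direct = {"expr": [["expr", "PLUS", "INT"], ["INT"]]}

# x refers to y, and y refers back to x: indirect left recursion.
indirect = {"x": [["y", "A"]], "y": [["x", "B"], ["C"]]}

print(left_recursion(direct))    # prints {'expr': 'direct'}
print(left_recursion(indirect))  # prints {'x': 'indirect', 'y': 'indirect'}
```

A grammar of the second shape would need to be rewritten by hand, for example by inlining ''y'' into ''x'' so that the recursion becomes direct.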
Development
As reported on the tools page of the ANTLR project, plug-ins that enable features like syntax highlighting, syntax error checking and code completion are freely available for the most common IDEs (IntelliJ IDEA, NetBeans, Eclipse, Visual Studio and Visual Studio Code).
Projects
Software built using ANTLR includes:
* Groovy.
* Jython.
* Hibernate.
* OpenJDK Compiler Grammar project, an experimental version of the javac compiler based on a grammar written in ANTLR.
* Apex, Salesforce.com's programming language.
* The expression evaluator in Numbers, Apple's spreadsheet.
* Twitter's search query language.
* WebLogic Server.
* Apache Cassandra.
* Processing.
* JabRef.
* Trino (SQL query engine).
* Presto (SQL query engine).
* MySQL Workbench.
Over 200 grammars implemented in ANTLR 4 are available on GitHub. They range from grammars for a URL to grammars for entire languages like C, Java and Go.
Example
In the following example, an ANTLR grammar describes a parser for sums of integers of the form "1 + 2 + 3". The listing uses the older ANTLR 2 grammar syntax, and the contents of its options blocks are not shown:
// Common options, for example, the target language
options
// Followed by the parser
class SumParser extends Parser;
options
// Definition of an expression
statement: INTEGER (PLUS^ INTEGER)*;
// Here is the Lexer
class SumLexer extends Lexer;
options
PLUS: '+';
DIGIT: ('0'..'9');
INTEGER: (DIGIT)+;
The following listing shows how a program invokes the generated parser:
TextReader reader;
// (...) Fill TextReader with characters
SumLexer lexer = new SumLexer(reader);
SumParser parser = new SumParser(lexer);
parser.statement();
See also
* Coco/R
* DMS Software Reengineering Toolkit
* JavaCC
* Modular Syntax Definition Formalism
* Parboiled (Java)
* Parsing expression grammar
* SableCC
External links
* Official website: www.antlr.org
* ANTLR (mega) Tutorial
* ANTLR Studio