HOME

TheInfoList



OR:

In computer-based language recognition, ANTLR (pronounced ''
antler Antlers are extensions of an animal's skull found in members of the Cervidae (deer) family. Antlers are a single structure composed of bone, cartilage, fibrous tissue, skin, nerves, and blood vessels. They are generally found only on ...
''), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor Terence Parr of the University of San Francisco.


Usage

ANTLR takes as input a
grammar In linguistics, the grammar of a natural language is its set of structure, structural constraints on speakers' or writers' composition of clause (linguistics), clauses, phrases, and words. The term can also refer to the study of such constraint ...
that specifies a language and generates as output
source code In computing, source code, or simply code, is any collection of code, with or without comment (computer programming), comments, written using a human-readable programming language, usually as plain text. The source code of a Computer program, p ...
for a recognizer of that language. While Version 3 supported generating code in the
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming l ...
s Ada95,
ActionScript ActionScript is an object-oriented programming language originally developed by Macromedia Inc. (later acquired by Adobe). It is influenced by HyperTalk, the scripting language for HyperCard. It is now an implementation of ECMAScript (meani ...
, C, C#,
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
,
JavaScript JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of Website, websites use JavaScript on the Client (computing), client side ...
,
Objective-C Objective-C is a general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its N ...
,
Perl Perl is a family of two High-level programming language, high-level, General-purpose programming language, general-purpose, Interpreter (computing), interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it ...
, Python,
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum (aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapp ...
, and Standard ML, Version 4 at present targets C#, C++,
Dart Dart or DART may refer to: * Dart, the equipment in the game of darts Arts, entertainment and media * Dart (comics), an Image Comics superhero * Dart, a character from ''G.I. Joe'' * Dart, a ''Thomas & Friends'' railway engine character * D ...
, Java, JavaScript, Go,
PHP PHP is a General-purpose programming language, general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementati ...
, Python (2 and 3), and Swift. A language is specified using a
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be ...
expressed using Extended Backus–Naur Form (EBNF). ANTLR can generate
lexer In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of ''lexical tokens'' (strings with an assigned and thus identified m ...
s, parsers, tree parsers, and combined lexer-parsers. Parsers can automatically generate parse trees or abstract syntax trees, which can be further processed with tree parsers. ANTLR provides a single consistent notation for specifying lexers, parsers, and tree parsers. By default, ANTLR reads a grammar and generates a recognizer for the language defined by the grammar (i.e., a program that reads an input stream and generates an error if the input stream does not conform to the syntax specified by the grammar). If there are no syntax errors, the default action is to simply exit without printing any message. In order to do something useful with the language, actions can be attached to grammar elements in the grammar. These actions are written in the programming language in which the recognizer is being generated. When the recognizer is being generated, the actions are embedded in the source code of the recognizer at the appropriate points. Actions can be used to build and check symbol tables and to emit instructions in a target language, in the case of a compiler. Other than lexers and parsers, ANTLR can be used to generate tree parsers. These are recognizers that process abstract syntax trees, which can be automatically generated by parsers. These tree parsers are unique to ANTLR and help processing abstract syntax trees.


Licensing

and ANTLR 4 are
free software Free software or libre software is computer software distributed under terms that allow users to run the software for any purpose as well as to study, change, and distribute it and any adapted versions. Free software is a matter of liberty, ...
, published under a three-clause
BSD License BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. This is in contrast to copyleft licenses, which have share-alike requirements. The original BSD li ...
. Prior versions were released as public domain software. Documentation, derived from Parr's book ''The Definitive ANTLR 4 Reference'', is included with the BSD-licensed ANTLR 4 source. Various plugins have been developed for the Eclipse development environment to support the ANTLR grammar, including ANTLR Studio, a proprietary product, as well as the "ANTLR 2" and "ANTLR 3" plugins for Eclipse hosted on SourceForge.


ANTLR 4

ANTLR 4 deals with direct
left recursion In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on ...
correctly, but not with left recursion in general, i.e., grammar rules ''x'' that refer to ''y'' that refer to ''x''.


Development

As reported on the tools page of the ANTLR project, plug-ins that enable features like syntax highlighting, syntax error checking and code completion are freely available for the most common IDEs ( Intellij IDEA, NetBeans, Eclipse, Visual Studio and Visual Studio Code).


Projects

Software built using ANTLR includes: * Groovy. * Jython. *
Hibernate Hibernation is a state of minimal activity and metabolic depression undergone by some animal species. Hibernation is a seasonal heterothermy characterized by low body-temperature, slow breathing and heart-rate, and low metabolic rate. It most ...
* OpenJDK Compiler Grammar project experimental version of the javac compiler based upon a grammar written in ANTLR. * Apex, Salesforce.com's programming language. * The expression evaluator in Numbers, Apple's spreadsheet. *
Twitter Twitter is an online social media and social networking service owned and operated by American company Twitter, Inc., on which users post and interact with 280-character-long messages known as "tweets". Registered users can post, like, and ...
's search query language. * Weblogic server. * Apache Cassandra. * Processing. *
JabRef JabRef is an open-sourced, cross-platform citation and reference management software. It uses BibTeX and BibLaTeX as its native formats and is therefore typically used for LaTeX. The name JabRef stands for Java, Alver, Batada, Reference. The orig ...
. * Trino (SQL query engine) *
Presto (SQL query engine) Presto (including PrestoDB, and PrestoSQL which was re-branded to Trino) is a distributed query engine for big data using the SQL query language. Its architecture allows users to query data sources such as Hadoop, Cassandra, Kafka, AWS S3, Allu ...
* MySQL Workbench Over 200 grammars implemented in ANTLR 4 are available on
GitHub GitHub, Inc. () is an Internet hosting service for software development and version control using Git. It provides the distributed version control of Git plus access control, bug tracking, software feature requests, task management, co ...
. They range from grammars for a
URL A Uniform Resource Locator (URL), colloquially termed as a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifie ...
to grammars for entire languages like C, Java and Go.


Example

In the following example, a parser in ANTLR describes the sum of expressions can be seen in the form of "1 + 2 + 3": // Common options, for example, the target language options // Followed by the parser class SumParser extends Parser; options // Definition of an expression statement: INTEGER (PLUS^ INTEGER)*; // Here is the Lexer class SumLexer extends Lexer; options PLUS: '+'; DIGIT: ('0'..'9'); INTEGER: (DIGIT)+; The following listing demonstrates the call of the parser in a program: TextReader reader; // (...) Fill TextReader with character SumLexer lexer = new SumLexer(reader); SumParser parser = new SumParser(lexer); parser.statement();


See also

* Coco/R * DMS Software Reengineering Toolkit * JavaCC * Modular Syntax Definition Formalism * Parboiled (Java) *
Parsing expression grammar In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in ...
* SableCC


References


Bibliography

* * *


Further reading

*


External links

* {{official website, www.antlr.org
ANTLR (mega) Tutorial

ANTLR Studio
1992 software Free compilers and interpreters Parser generators Software using the BSD license Public-domain software