HOME

TheInfoList



OR:

AWK (''awk'') is a
domain-specific language A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging ...
designed for text processing and typically used as a
data extraction Data extraction is the act or process of retrieving data out of (usually unstructured or poorly structured) data sources for further data processing or data storage ( data migration). The import into the intermediate extracting system is thus usua ...
and reporting tool. Like
sed sed ("stream editor") is a Unix utility that parses and transforms text, using a simple, compact programming language. It was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs, and is available today for most operating systems. sed w ...
and
grep grep is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the ed command ''g/re/p'' (''globally search for a regular expression and print matching lines''), which has the sa ...
, it is a
filter Filter, filtering or filters may refer to: Science and technology Computing * Filter (higher-order function), in functional programming * Filter (software), a computer program to process a data stream * Filter (video), a software component tha ...
, and is a standard feature of most Unix-like operating systems. The AWK language is a data-driven
scripting language A scripting language or script language is a programming language that is used to manipulate, customize, and automate the facilities of an existing system. Scripting languages are usually interpreted at runtime rather than compiled. A scripting ...
consisting of a set of actions to be taken against
streams A stream is a continuous body of surface water flowing within the bed and banks of a channel. Depending on its location or certain characteristics, a stream may be referred to by a variety of local or regional names. Long large streams a ...
of textual data – either run directly on files or used as part of a pipeline – for purposes of extracting or transforming text, such as producing formatted reports. The language extensively uses the string
datatype In computer science and computer programming, a data type (or simply type) is a set of possible values and a set of allowed operations on it. A data type tells the compiler or interpreter how the programmer intends to use the data. Most progra ...
,
associative array In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In mathematical terms an ...
s (that is, arrays indexed by key strings), and
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
s. While AWK has a limited intended
application domain An application domain is a mechanism (similar to a process in an operating system) used within the Common Language Infrastructure (CLI) to isolate executed software applications from one another so that they do not affect each other. Each applic ...
and was especially designed to support one-liner programs, the language is
Turing-complete In computability theory, a system of data-manipulation rules (such as a computer's instruction set, a programming language, or a cellular automaton) is said to be Turing-complete or computationally universal if it can be used to simulate any ...
, and even the early Bell Labs users of AWK often wrote well-structured large AWK programs. AWK was created at
Bell Labs Nokia Bell Labs, originally named Bell Telephone Laboratories (1925–1984), then AT&T Bell Laboratories (1984–1996) and Bell Labs Innovations (1996–2007), is an American industrial research and scientific development company owned by mul ...
in the 1970s, and its name is derived from the surnames of its authors:
Alfred Aho Alfred Vaino Aho (born August 9, 1941) is a Canadian computer scientist best known for his work on programming languages, compilers, and related algorithms, and his textbooks on the art and science of computer programming. Aho was elected into ...
, Peter Weinberger, and
Brian Kernighan Brian Wilson Kernighan (; born 1942) is a Canadian computer scientist. He worked at Bell Labs and contributed to the development of Unix alongside Unix creators Ken Thompson and Dennis Ritchie. Kernighan's name became widely known through co- ...
. The acronym is pronounced the same as the bird
auk An auk or alcid is a bird of the family Alcidae in the order Charadriiformes. The alcid family includes the murres, guillemots, auklets, puffins, and murrelets. The word "auk" is derived from Icelandic ''álka'', from Old Norse ''alka'' (a ...
, which is on the cover of '' The AWK Programming Language''. When written in all lowercase letters, as awk, it refers to the
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, ...
or Plan 9 program that runs scripts written in the AWK programming language.


History

AWK was initially developed in 1977 by
Alfred Aho Alfred Vaino Aho (born August 9, 1941) is a Canadian computer scientist best known for his work on programming languages, compilers, and related algorithms, and his textbooks on the art and science of computer programming. Aho was elected into ...
(author of
egrep grep is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the ed command ''g/re/p'' (''globally search for a regular expression and print matching lines''), which has the sam ...
),
Peter J. Weinberger Peter Jay Weinberger (born August 6, 1942) is a computer scientist best known for his early work at Bell Labs. He now works at Google. Weinberger was an undergraduate at Swarthmore College, graduating in 1964. He received his PhD in mathemati ...
(who worked on tiny relational databases), and
Brian Kernighan Brian Wilson Kernighan (; born 1942) is a Canadian computer scientist. He worked at Bell Labs and contributed to the development of Unix alongside Unix creators Ken Thompson and Dennis Ritchie. Kernighan's name became widely known through co- ...
. AWK takes its name from their respective initials. According to Kernighan, one of the goals of AWK was to have a tool that would easily manipulate both numbers and strings. AWK was also inspired by Marc Rochkind's programming language that was used to search for patterns in input data, and was implemented using
yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
. As one of the early tools to appear in
Version 7 Unix Seventh Edition Unix, also called Version 7 Unix, Version 7 or just V7, was an important early release of the Unix operating system. V7, released in 1979, was the last Bell Laboratories release to see widespread distribution before the commercial ...
, AWK added computational features to a Unix pipeline besides the
Bourne shell The Bourne shell (sh) is a shell command-line interpreter for computer operating systems. The Bourne shell was the default shell for Version 7 Unix. Unix-like systems continue to have /bin/sh—which will be the Bourne shell, or a symbolic l ...
, the only scripting language available in a standard Unix environment. It is one of the mandatory utilities of the Single UNIX Specification, and is required by the
Linux Standard Base The Linux Standard Base (LSB) was a joint project by several Linux distributions under the organizational structure of the Linux Foundation to standardize the software system structure, including the Filesystem Hierarchy Standard used in the Li ...
specification. AWK was significantly revised and expanded in 1985–88, resulting in the GNU AWK implementation written by Paul Rubin,
Jay Fenlason A jay is a member of a number of species of medium-sized, usually colorful and noisy, passerine birds in the Crow family, Corvidae. The evolutionary relationships between the jays and the magpies are rather complex. For example, the Eurasian ...
, and
Richard Stallman Richard Matthew Stallman (; born March 16, 1953), also known by his initials, rms, is an American free software movement activist and programmer. He campaigns for software to be distributed in such a manner that its users have the freedom to ...
, released in 1988. GNU AWK may be the most widely deployed version because it is included with GNU-based Linux packages. GNU AWK has been maintained solely by Arnold Robbins since 1994.
Brian Kernighan Brian Wilson Kernighan (; born 1942) is a Canadian computer scientist. He worked at Bell Labs and contributed to the development of Unix alongside Unix creators Ken Thompson and Dennis Ritchie. Kernighan's name became widely known through co- ...
's nawk (New AWK) source was first released in 1993 unpublicized, and publicly since the late 1990s; many BSD systems use it to avoid the GPL license. AWK was preceded by
sed sed ("stream editor") is a Unix utility that parses and transforms text, using a simple, compact programming language. It was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs, and is available today for most operating systems. sed w ...
(1974). Both were designed for text processing. They share the line-oriented, data-driven paradigm, and are particularly suited to writing one-liner programs, due to the implicit
main loop In computer science, the event loop is a programming construct or design pattern that waits for and dispatches events or messages in a program. The event loop works by making a request to some internal or external "event provider" (that generally ...
and current line variables. The power and terseness of early AWK programs – notably the powerful regular expression handling and conciseness due to implicit variables, which facilitate one-liners – together with the limitations of AWK at the time, were important inspirations for the
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offic ...
language (1987). In the 1990s, Perl became very popular, competing with AWK in the niche of Unix text-processing languages.


Structure of AWK programs

An AWK program is a series of pattern action pairs, written as: condition condition ... where ''condition'' is typically an expression and ''action'' is a series of commands. The input is split into records, where by default records are separated by newline characters so that the input is split into lines. The program tests each record against each of the conditions in turn, and executes the ''action'' for each expression that is true. Either the condition or the action may be omitted. The condition defaults to matching every record. The default action is to print the record. This is the same pattern-action structure as sed. In addition to a simple AWK expression, such as foo

1
or /^foo/, the condition can be BEGIN or END causing the action to be executed before or after all records have been read, or ''pattern1, pattern2'' which matches the range of records starting with a record that matches ''pattern1'' up to and including the record that matches ''pattern2'' before again trying to match against ''pattern1'' on subsequent lines. In addition to normal arithmetic and logical operators, AWK expressions include the tilde operator, ~, which matches a
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
against a string. As handy
syntactic sugar In computer science, syntactic sugar is syntax within a programming language that is designed to make things easier to read or to express. It makes the language "sweeter" for human use: things can be expressed more clearly, more concisely, or in an ...
, ''/regexp/'' without using the tilde operator matches against the current record; this syntax derives from
sed sed ("stream editor") is a Unix utility that parses and transforms text, using a simple, compact programming language. It was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs, and is available today for most operating systems. sed w ...
, which in turn inherited it from the ed editor, where / is used for searching. This syntax of using slashes as
delimiter A delimiter is a sequence of one or more characters for specifying the boundary between separate, independent regions in plain text, mathematical expressions or other data streams. An example of a delimiter is the comma character, which acts a ...
s for regular expressions was subsequently adopted by
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offic ...
and
ECMAScript ECMAScript (; ES) is a JavaScript standard intended to ensure the interoperability of web pages across different browsers. It is standardized by Ecma International in the documenECMA-262 ECMAScript is commonly used for client-side scripti ...
, and is now common. The tilde operator was also adopted by Perl.


Commands

AWK commands are the statements that are substituted for ''action'' in the examples above. AWK commands can include function calls, variable assignments, calculations, or any combination thereof. AWK contains built-in support for many functions; many more are provided by the various flavors of AWK. Also, some flavors support the inclusion of dynamically linked libraries, which can also provide more functions.


The ''print'' command

The ''print'' command is used to output text. The output text is always terminated with a predefined string called the output record separator (ORS) whose default value is a newline. The simplest form of this command is: ; print :This displays the contents of the current record. In AWK, records are broken down into ''fields'', and these can be displayed separately: ; print $1 : Displays the first field of the current record ; print $1, $3 : Displays the first and third fields of the current record, separated by a predefined string called the output field separator (OFS) whose default value is a single space character Although these fields (''$X'') may bear resemblance to variables (the $ symbol indicates variables in
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offic ...
), they actually refer to the fields of the current record. A special case, ''$0'', refers to the entire record. In fact, the commands "print" and "print $0" are identical in functionality. The ''print'' command can also display the results of calculations and/or function calls: /regex_pattern/ Output may be sent to a file: /regex_pattern/ or through a
pipe Pipe(s), PIPE(S) or piping may refer to: Objects * Pipe (fluid conveyance), a hollow cylinder following certain dimension rules ** Piping, the use of pipes in industry * Smoking pipe ** Tobacco pipe * Half-pipe and quarter pipe, semi-circular ...
: /regex_pattern/


Built-in variables

Awk's built-in variables include the field variables: $1, $2, $3, and so on ($0 represents the entire record). They hold the text or values in the individual text-fields in a record. Other variables include: * NR: Number of Records. Keeps a current count of the number of input records read so far from all data files. It starts at zero, but is never automatically reset to zero. * FNR: File Number of Records. Keeps a current count of the number of input records read so far ''in the current file.'' This variable is automatically reset to zero each time a new file is started. * NF: Number of Fields. Contains the number of fields in the current input record. The last field in the input record can be designated by $NF, the 2nd-to-last field by $(NF-1), the 3rd-to-last field by $(NF-2), etc. * FILENAME: Contains the name of the current input-file. * FS: Field Separator. Contains the "field separator" used to divide fields in the input record. The default, "white space", allows any sequence of space and tab characters. FS can be reassigned with another character or character sequence to change the field separator. * RS: Record Separator. Stores the current "record separator" character. Since, by default, an input line is the input record, the default record separator character is a "newline". * OFS: Output Field Separator. Stores the "output field separator", which separates the fields when Awk prints them. The default is a "space" character. * ORS: Output Record Separator. Stores the "output record separator", which separates the output records when Awk prints them. The default is a "newline" character. * OFMT: Output Format. Stores the format for numeric output. The default format is "%.6g".


Variables and syntax

Variable names can use any of the characters -Za-z0-9_ with the exception of language keywords. The operators ''+ - * /'' represent addition, subtraction, multiplication, and division, respectively. For string
concatenation In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalisations of concatenat ...
, simply place two variables (or string constants) next to each other. It is optional to use a space in between if string constants are involved, but two variable names placed adjacent to each other require a space in between. Double quotes delimit string constants. Statements need not end with semicolons. Finally, comments can be added to programs by using ''#'' as the first character on a line.


User-defined functions

In a format similar to C, function definitions consist of the keyword function, the function name, argument names and the function body. Here is an example of a function. function add_three (number) This statement can be invoked as follows: (pattern) Functions can have variables that are in the local scope. The names of these are added to the end of the argument list, though values for these should be omitted when calling the function. It is convention to add some whitespace in the argument list before the local variables, to indicate where the parameters end and the local variables begin.


Examples


Hello World

Here is the customary " Hello, world" program written in AWK: BEGIN


Print lines longer than 80 characters

Print all lines longer than 80 characters. Note that the default action is to print the current line. length($0) > 80


Count words

Count words in the input and print the number of lines, words, and characters (like wc): END As there is no pattern for the first line of the program, every line of input matches by default, so the increment actions are executed for every line. Note that words += NF is shorthand for words = words + NF.


Sum last word

END ''s'' is incremented by the numeric value of ''$NF'', which is the last word on the line as defined by AWK's field separator (by default, white-space). ''NF'' is the number of fields in the current line, e.g. 4. Since ''$4'' is the value of the fourth field, ''$NF'' is the value of the last field in the line regardless of how many fields this line has, or whether it has more or fewer fields than surrounding lines. $ is actually a unary operator with the highest
operator precedence In mathematics and computer programming, the order of operations (or operator precedence) is a collection of rules that reflect conventions about which procedures to perform first in order to evaluate a given mathematical expression. For exampl ...
. (If the line has no fields, then ''NF'' is 0, ''$0'' is the whole line, which in this case is empty apart from possible white-space, and so has the numeric value 0.) At the end of the input the ''END'' pattern matches, so ''s'' is printed. However, since there may have been no lines of input at all, in which case no value has ever been assigned to ''s'', it will by default be an empty string. Adding zero to a variable is an AWK idiom for coercing it from a string to a numeric value. (Concatenating an empty string is to coerce from a number to a string, e.g. ''s ""''. Note, there's no operator to concatenate strings, they're just placed adjacently.) With the coercion the program prints "0" on an empty input, without it, an empty line is printed.


Match a range of input lines

NR % 4

1, NR % 4

3
The action statement prints each line numbered. The printf function emulates the standard C
printf The printf format string is a control parameter used by a class of functions in the input/output libraries of C and many other programming languages. The string is written in a simple template language: characters are usually copied literal ...
and works similarly to the print command described above. The pattern to match, however, works as follows: ''NR'' is the number of records, typically lines of input, AWK has so far read, i.e. the current line number, starting at 1 for the first line of input. ''%'' is the
modulo In computing, the modulo operation returns the remainder or signed remainder of a division, after one number is divided by another (called the '' modulus'' of the operation). Given two positive numbers and , modulo (often abbreviated as ) is ...
operator. ''NR % 4

1'' is true for the 1st, 5th, 9th, etc., lines of input. Likewise, ''NR % 4

3'' is true for the 3rd, 7th, 11th, etc., lines of input. The range pattern is false until the first part matches, on line 1, and then remains true up to and including when the second part matches, on line 3. It then stays false until the first part matches again on line 5. Thus, the program prints lines 1,2,3, skips line 4, and then 5,6,7, and so on. For each line, it prints the line number (on a 6 character-wide field) and then the line contents. For example, when executed on this input: Rome Florence Milan Naples Turin Venice The previous program prints: 1 Rome 2 Florence 3 Milan 5 Turin 6 Venice


Printing the initial or the final part of a file

As a special case, when the first part of a range pattern is constantly true, e.g. ''1'', the range will start at the beginning of the input. Similarly, if the second part is constantly false, e.g. ''0'', the range will continue until the end of input. For example, /^--cut here--$/, 0 prints lines of input from the first line matching the regular expression ''^--cut here--$'', that is, a line containing only the phrase "--cut here--", to the end.


Calculate word frequencies

Word frequency A word list (or ''lexicon'') is a list of a language's lexicon (generally sorted by frequency of occurrence either by levels or as a ranked list) within some given text corpus, serving the purpose of vocabulary acquisition. A lexicon sorted by ...
using
associative array In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In mathematical terms an ...
s: BEGIN END The BEGIN block sets the field separator to any sequence of non-alphabetic characters. Note that separators can be regular expressions. After that, we get to a bare action, which performs the action on every input line. In this case, for every field on the line, we add one to the number of times that word, first converted to lowercase, appears. Finally, in the END block, we print the words with their frequencies. The line for (i in words) creates a loop that goes through the array ''words'', setting ''i'' to each ''subscript'' of the array. This is different from most languages, where such a loop goes through each ''value'' in the array. The loop thus prints out each word followed by its frequency count. tolower was an addition to the One True awk (see below) made after the book was published.


Match pattern from command line

This program can be represented in several ways. The first one uses the
Bourne shell The Bourne shell (sh) is a shell command-line interpreter for computer operating systems. The Bourne shell was the default shell for Version 7 Unix. Unix-like systems continue to have /bin/sh—which will be the Bourne shell, or a symbolic l ...
to make a shell script that does everything. It is the shortest of these methods: #!/bin/sh pattern="$1" shift awk '/'"$pattern"'/ ' "$@" The $pattern in the awk command is not protected by single quotes so that the shell does expand the variable but it needs to be put in double quotes to properly handle patterns containing spaces. A pattern by itself in the usual way checks to see if the whole line ($0) matches. FILENAME contains the current filename. awk has no explicit concatenation operator; two adjacent strings concatenate them. $0 expands to the original unchanged input line. There are alternate ways of writing this. This shell script accesses the environment directly from within awk: #!/bin/sh export pattern="$1" shift awk '$0 ~ ENVIRON pattern"' "$@" This is a shell script that uses ENVIRON, an array introduced in a newer version of the One True awk after the book was published. The subscript of ENVIRON is the name of an environment variable; its result is the variable's value. This is like the getenv function in various standard libraries and
POSIX The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system- and user-level application programming in ...
. The shell script makes an environment variable pattern containing the first argument, then drops that argument and has awk look for the pattern in each file. ~ checks to see if its left operand matches its right operand; !~ is its inverse. Note that a regular expression is just a string and can be stored in variables. The next way uses command-line variable assignment, in which an argument to awk can be seen as an assignment to a variable: #!/bin/sh pattern="$1" shift awk '$0 ~ pattern ' "pattern=$pattern" "$@" Or You can use the ''-v var=value'' command line option (e.g. ''awk -v pattern="$pattern" ...''). Finally, this is written in pure awk, without help from a shell or without the need to know too much about the implementation of the awk script (as the variable assignment on command line one does), but is a bit lengthy: BEGIN $0 ~ pattern The BEGIN is necessary not only to extract the first argument, but also to prevent it from being interpreted as a filename after the BEGIN block ends. ARGC, the number of arguments, is always guaranteed to be ≥1, as ARGV /code> is the name of the command that executed the script, most often the string "awk". Also note that ARGV RGC/code> is the empty string, "". # initiates a comment that expands to the end of the line. Note the if block. awk only checks to see if it should read from standard input before it runs the command. This means that awk 'prog' only works because the fact that there are no filenames is only checked before prog is run! If you explicitly set ARGC to 1 so that there are no arguments, awk will simply quit because it feels there are no more input files. Therefore, you need to explicitly say to read from standard input with the special filename -.


Self-contained AWK scripts

On Unix-like operating systems self-contained AWK scripts can be constructed using the shebang syntax. For example, a script that prints the content of a given file may be built by creating a file named print.awk with the following content: #!/usr/bin/awk -f It can be invoked with: ./print.awk The -f tells AWK that the argument that follows is the file to read the AWK program from, which is the same flag that is used in sed. Since they are often used for one-liners, both these programs default to executing a program given as a command-line argument, rather than a separate file.


Versions and implementations

AWK was originally written in 1977 and distributed with
Version 7 Unix Seventh Edition Unix, also called Version 7 Unix, Version 7 or just V7, was an important early release of the Unix operating system. V7, released in 1979, was the last Bell Laboratories release to see widespread distribution before the commercial ...
. In 1985 its authors started expanding the language, most significantly by adding user-defined functions. The language is described in the book '' The AWK Programming Language'', published 1988, and its implementation was made available in releases of
UNIX System V Unix System V (pronounced: "System Five") is one of the first commercial versions of the Unix operating system. It was originally developed by AT&T and first released in 1983. Four major versions of System V were released, numbered 1, 2, 3, an ...
. To avoid confusion with the incompatible older version, this version was sometimes called "new awk" or ''nawk''. This implementation was released under a
free software license A free-software license is a notice that grants the recipient of a piece of software extensive rights to modify and redistribute that software. These actions are usually prohibited by copyright law, but the rights-holder (usually the author) ...
in 1996 and is still maintained by Brian Kernighan (see external links below). Old versions of Unix, such as
UNIX/32V UNIX/32V is an early version of the Unix operating system from Bell Laboratories, released in June 1979. 32V was a direct port of the Seventh Edition Unix to the DEC VAX architecture. Overview Before 32V, Unix had primarily run on DEC PD ...
, included awkcc, which converted AWK to C. Kernighan wrote a program to turn awk into C++; its state is not known. * BWK awk, also known as nawk, refers to the version by
Brian Kernighan Brian Wilson Kernighan (; born 1942) is a Canadian computer scientist. He worked at Bell Labs and contributed to the development of Unix alongside Unix creators Ken Thompson and Dennis Ritchie. Kernighan's name became widely known through co- ...
. It has been dubbed the "One True AWK" because of the use of the term in association with the book that originally described the language and the fact that Kernighan was one of the original authors of AWK. FreeBSD refers to this version as ''one-true-awk''. This version also has features not in the book, such as tolower and ENVIRON that are explained above; see the FIXES file in the source archive for details. This version is used by, for example, Android,
FreeBSD FreeBSD is a free and open-source Unix-like operating system descended from the Berkeley Software Distribution (BSD), which was based on Research Unix. The first version of FreeBSD was released in 1993. In 2005, FreeBSD was the most popular ...
,
NetBSD NetBSD is a free and open-source Unix operating system based on the Berkeley Software Distribution (BSD). It was the first open-source BSD descendant officially released after 386BSD was forked. It continues to be actively developed and is ava ...
,
OpenBSD OpenBSD is a security-focused, free and open-source, Unix-like operating system based on the Berkeley Software Distribution (BSD). Theo de Raadt created OpenBSD in 1995 by forking NetBSD 1.0. According to the website, the OpenBSD project e ...
,
macOS macOS (; previously OS X and originally Mac OS X) is a Unix operating system developed and marketed by Apple Inc. since 2001. It is the primary operating system for Apple's Mac computers. Within the market of desktop and la ...
, and
illumos Illumos (stylized as illumos) is a partly free and open-source Unix operating system. It is based on OpenSolaris, which was based on System V Release 4 (SVR4) and the Berkeley Software Distribution (BSD). Illumos comprises a kernel, device d ...
. Brian Kernighan and Arnold Robbins are the main contributors to a source repository for ''nawk'': . * gawk (
GNU GNU () is an extensive collection of free software (383 packages as of January 2022), which can be used as an operating system or can be used in parts with other operating systems. The use of the completed GNU tools led to the family of operat ...
awk) is another free-software implementation and the only implementation that makes serious progress implementing
internationalization and localization In computing, internationalization and localization (American) or internationalisation and localisation (British English), often abbreviated i18n and L10n, are means of adapting computer software to different languages, regional peculiarities and ...
and TCP/IP networking. It was written before the original implementation became freely available. It includes its own debugger, and its profiler enables the user to make measured performance enhancements to a script. It also enables the user to extend functionality with shared libraries. Some
Linux distribution A Linux distribution (often abbreviated as distro) is an operating system made from a software collection that includes the Linux kernel and, often, a package management system. Linux users usually obtain their operating system by downloading one ...
s include ''gawk'' as their default AWK implementation. ** gawk-csv. The CSV extension of ''gawk ''provides facilities for handling input and output CSV formatted data. * mawk is a very fast AWK implementation by Mike Brennan based on a
bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
interpreter. * libmawk is a fork of mawk, allowing applications to embed multiple parallel instances of awk interpreters. * awka (whose front end is written atop the ''mawk'' program) is another translator of AWK scripts into C code. When compiled, statically including the author's libawka.a, the resulting executables are considerably sped up and, according to the author's tests, compare very well with other versions of AWK,
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offic ...
, or Tcl. Small scripts will turn into programs of 160–170 kB. * tawk (Thompson AWK) is an AWK
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
for Solaris,
DOS DOS is shorthand for the MS-DOS and IBM PC DOS family of operating systems. DOS may also refer to: Computing * Data over signalling (DoS), multiplexing data onto a signalling channel * Denial-of-service attack (DoS), an attack on a communicat ...
,
OS/2 OS/2 (Operating System/2) is a series of computer operating systems, initially created by Microsoft and IBM under the leadership of IBM software designer Ed Iacobucci. As a result of a feud between the two companies over how to position OS/2 r ...
, and
Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for se ...
, previously sold by Thompson Automation Software (which has ceased its activities). * Jawk is a project to implement AWK in
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
, hosted on SourceForge. Extensions to the language are added to provide access to Java features within AWK scripts (i.e., Java threads, sockets, collections, etc.). * xgawk is a fork of ''gawk'' that extends ''gawk'' with dynamically loadable libraries. The XMLgawk extension was integrated into the official GNU Awk release 4.1.0. * QSEAWK is an embedded AWK interpreter implementation included in the QSE library that provides embedding
application programming interface An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
(API) for C and
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
. * libfawk is a very small, function-only, reentrant, embeddable interpreter written in C *
BusyBox BusyBox is a software suite that provides several Unix utilities in a single executable file. It runs in a variety of POSIX environments such as Linux, Android, and FreeBSD, although many of the tools it provides are designed to work with inte ...
includes an AWK implementation written by Dmitry Zakharov. This is a very small implementation suitable for embedded systems. * CLAWK by Michael Parker provides an AWK implementation in
Common Lisp Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ''ANSI INCITS 226-1994 (S20018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived fr ...
, based upon the regular expression library of the same author.


Books

* * * *


See also

*
Data transformation In computing, data transformation is the process of converting data from one format or structure into another format or structure. It is a fundamental aspect of most data integrationCIO.com. Agile Comes to Data Integration. Retrieved from: htt ...
*
Event-driven programming In computer programming, event-driven programming is a programming paradigm in which the flow of the program is determined by events such as user actions (mouse clicks, key presses), sensor outputs, or message passing from other programs or thr ...
*
List of Unix commands This is a list of Unix commands as specified by IEEE Std 1003.1-2008, which is part of the Single UNIX Specification (SUS). These commands can be found on Unix operating systems and most Unix-like operating systems. List See also * List of G ...
*
sed sed ("stream editor") is a Unix utility that parses and transforms text, using a simple, compact programming language. It was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs, and is available today for most operating systems. sed w ...


References


Further reading

* *  – Interview with Alfred V. Aho on AWK * * *
AWK  – Become an expert in 60 minutes
* *


External links


The Amazing Awk Assembler
by
Henry Spencer Henry Spencer (born 1955) is a Canadian computer programmer and space enthusiast. He wrote "regex", a widely used software library for regular expressions, and co-wrote C News, a Usenet server program. He also wrote ''The Ten Commandments for C P ...
. *
awklang.org
The site for things related to the awk language {{DEFAULTSORT:Awk 1977 software Cross-platform software Free compilers and interpreters Pattern matching programming languages Scripting languages Domain-specific programming languages Standard Unix programs Text-oriented programming languages Unix SUS2008 utilities Unix text processing utilities Plan 9 commands Programming languages created in 1977