awk
History
According to Brian Kernighan, one of the goals of AWK was to have a tool that would easily manipulate both numbers and strings. AWK was also inspired by Marc Rochkind's programming language that was used to search for patterns in input data.
Structure of AWK programs
An AWK program is a series of pattern action pairs, written as ''condition'' { ''action'' }. In addition to a simple AWK expression, such as foo == 1 or /^foo/, the condition can be BEGIN or END, causing the action to be executed before or after all records have been read, or ''pattern1, pattern2'', which matches the range of records starting with a record that matches ''pattern1'' up to and including the record that matches ''pattern2'' before again trying to match against ''pattern1'' on subsequent lines.
In addition to normal arithmetic and logical operators, AWK expressions include the tilde operator, ~, which matches a regular expression against a string. The slash, /, is used for searching: a pattern such as /^foo/ encloses a regular expression in slashes.
Commands
AWK commands are the statements that are substituted for ''action'' in the examples above. AWK commands can include function calls, variable assignments, calculations, or any combination thereof. AWK contains built-in support for many functions; many more are provided by the various flavors of AWK. Also, some flavors support the inclusion of dynamically linked libraries, which can also provide more functions.
The ''print'' command
The ''print'' command is used to output text. The output text is always terminated with a predefined string called the output record separator (ORS), whose default value is a newline. The simplest form of this command is:
; print
: This displays the contents of the current record. In AWK, records are broken down into ''fields'', and these can be displayed separately:
; print $1
: Displays the first field of the current record
; print $1, $3
: Displays the first and third fields of the current record, separated by a predefined string called the output field separator (OFS), whose default value is a single space character
Although these fields (''$X'') may bear resemblance to variables (the $ symbol indicates variables in the usual Unix shells), they refer to the fields of the current record, and $0 refers to the entire record. The commands "print" and "print $0" are therefore identical in functionality.
The ''print'' command can also display the results of calculations and function calls.
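As a minimal sketch of printing computed values, the following prints a sum and the results of two built-in string functions. The one-line input is made up for illustration.

```shell
# Hypothetical input: one record with two numbers and a word.
out=$(printf '3 4 hello\n' | awk '{ print $1 + $2, length($3), toupper($3) }')
echo "$out"
```

The three values are separated by OFS, which is a single space by default.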
Built-in variables
AWK's built-in variables include the field variables: $1, $2, $3, and so on ($0 represents the entire record). They hold the text or values in the individual text fields in a record. Other variables include:
* NR: Number of Records. Keeps a current count of the number of input records read so far from all data files. It starts at zero, but is never automatically reset to zero.
* FNR: File Number of Records. Keeps a current count of the number of input records read so far ''in the current file.'' This variable is automatically reset to zero each time a new file is started.
* NF: Number of Fields. Contains the number of fields in the current input record. The last field in the input record can be designated by $NF, the 2nd-to-last field by $(NF-1), the 3rd-to-last field by $(NF-2), etc.
* FILENAME: Contains the name of the current input file.
* FS: Field Separator. Contains the "field separator" used to divide fields in the input record. The default, "white space", allows any sequence of space and tab characters. FS can be reassigned with another character or character sequence to change the field separator.
* RS: Record Separator. Stores the current "record separator" character. Since, by default, an input line is the input record, the default record separator character is a "newline".
* OFS: Output Field Separator. Stores the "output field separator", which separates the fields when awk prints them. The default is a "space" character.
* ORS: Output Record Separator. Stores the "output record separator", which separates the output records when awk prints them. The default is a "newline" character.
* OFMT: Output Format. Stores the format for numeric output. The default format is "%.6g".
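Several of these variables can be seen working together in a short sketch. The colon-separated input and the choice of " | " as output separator are assumptions made for the example.

```shell
# Hypothetical colon-separated records; FS splits them, OFS joins the output.
out=$(printf 'alice:30:admin\nbob:25:staff\n' |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1, $NF }')
echo "$out"
```

Each output line shows the record number, the field count, the first field, and the last field.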
Variables and syntax
Variable names can use any of the characters [A-Za-z0-9_], with the exception of language keywords, and cannot begin with a numeric digit. The operators ''+ - * /'' represent addition, subtraction, multiplication, and division, respectively. For string concatenation, two values are simply placed next to each other.
User-defined functions
In a format similar to C, function definitions consist of the keyword function, the function name, argument names and the function body.
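A function definition might look like the following sketch. The name add_three and its argument are hypothetical, chosen only to illustrate the syntax.

```shell
out=$(awk '
# add_three is a hypothetical helper, not a built-in function.
function add_three(number) {
    return number + 3
}
BEGIN { print add_three(36) }
')
echo "$out"
```

Because the program consists only of a BEGIN block, it runs without reading any input.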
Examples
Hello, World!
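The customary program can be sketched in AWK with a single BEGIN block, which runs before any input is read:

```shell
out=$(awk 'BEGIN { print "Hello, world!" }')
echo "$out"
```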
Print lines longer than 80 characters
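A pattern with no action is enough to filter lines by length. In this sketch the test input is generated with printf and is not taken from any real file.

```shell
# Build one 81-character line and one short line as sample input.
long=$(printf '%081d' 0)
out=$(printf 'short\n%s\n' "$long" | awk 'length($0) > 80')
echo "$out"
```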
Print all lines longer than 80 characters. The default action is to print the current line.
Count words
Count words in the input and print the number of lines, words, and characters (like wc).
words += NF is shorthand for words = words + NF.
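One way to write such a counter is sketched below. The variable names words and chars are illustrative, and the sketch assumes every line ends in a newline, which is why 1 is added to each line's length.

```shell
out=$(printf 'one two\nthree four five\n' |
  awk '{ chars += length($0) + 1; words += NF } END { print NR, words, chars }')
echo "$out"
```

In the END block, NR still holds the number of records read, so it serves as the line count.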
Sum last word
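A minimal sketch of a program that sums the last field of every line follows. The sample input is made up, and the second command runs the same program on empty input.

```shell
# Sum the last field of each line of a hypothetical input.
sum=$(printf 'a 1\nb c 2\nd 3.5\n' | awk '{ s += $NF } END { print s + 0 }')
# Same program on empty input: s was never assigned.
empty=$(printf '' | awk '{ s += $NF } END { print s + 0 }')
echo "$sum"
echo "$empty"
```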
s is incremented by the numeric value of $NF, which is the last word on the line as defined by AWK's field separator (by default, white-space). NF is the number of fields in the current line, e.g. 4. Since $4 is the value of the fourth field, $NF is the value of the last field in the line regardless of how many fields this line has, or whether it has more or fewer fields than surrounding lines. $ is actually a unary operator with the highest operator precedence. (If the line has no fields, then NF is 0, $0 is the whole line, which in this case is empty apart from possible white-space, and so has the numeric value 0.)
At the end of the input, the END pattern matches, so s is printed. However, since there may have been no lines of input at all, in which case no value has ever been assigned to s, s will be an empty string by default. Adding zero to a variable is an AWK idiom for coercing it from a string to a numeric value. This results from AWK's arithmetic operators, like addition, implicitly casting their operands to numbers before computation as required. (Similarly, concatenating a variable with an empty string coerces from a number to a string, e.g., s "". Note that there is no operator to concatenate strings; they are just placed adjacently.) On an empty input, the coercion in print s + 0 causes the program to print 0, whereas with just the action print s, an empty line would be printed.
Match a range of input lines
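A range pattern of the form ''pattern1, pattern2'' selects a block of lines. In this sketch the START and END marker lines are assumptions made for the example.

```shell
# Print from the line matching START through the line matching END, inclusive.
out=$(printf 'a\nSTART\nb\nc\nEND\nd\n' | awk '/^START$/, /^END$/')
echo "$out"
```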
Printing the initial or the final part of a file
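The final part of a file can be printed with a range whose second pattern never matches. This sketch assumes a made-up marker line, "--cut here--", that separates a header from the body.

```shell
# The second pattern, 0, is never true, so the range runs to the end of input.
out=$(printf 'header\n--cut here--\nbody 1\nbody 2\n' | awk '/^--cut here--$/, 0')
echo "$out"
```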
As a special case, when the first part of a range pattern is constantly true, e.g. ''1'', the range will start at the beginning of the input. Similarly, if the second part is constantly false, e.g. ''0'', the range will continue until the end of input.
Calculate word frequencies
The tolower function was an addition to the One True awk (see below) made after the book was published.
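Word frequencies can be counted with an array indexed by the lower-cased word. The following is a sketch under stated assumptions: a word is taken to be any run of ASCII letters, the input text is made up, and the output is piped through sort only to make the order predictable.

```shell
out=$(printf 'The cat and the hat.\nThe end\n' | awk '
BEGIN { FS = "[^a-zA-Z]+" }    # fields are runs of letters
{
    for (i = 1; i <= NF; i++)
        if ($i != "")          # skip empty fields at the edges of a line
            words[tolower($i)]++
}
END {
    for (w in words)
        print w, words[w]
}' | LC_ALL=C sort)
echo "$out"
```

The for (w in words) loop visits the array in an unspecified order, which is why the result is sorted afterwards.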
Match pattern from command line
This program can be represented in several ways. The first one uses the Bourne shell to make a shell script that does everything. It is the shortest of these methods.
The $pattern in the awk command is not protected by single quotes, so the shell expands the variable, but it needs to be put in double quotes to properly handle patterns containing spaces. A pattern by itself checks, in the usual way, whether the whole line ($0) matches. FILENAME contains the current filename. awk has no explicit concatenation operator; two adjacent strings are concatenated. $0 expands to the original unchanged input line.
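The quoting described above can be sketched as follows. The wrapper function search_lines and the temporary sample file are hypothetical, introduced only so the example can be run as is.

```shell
# search_lines is a hypothetical wrapper: the first argument is the pattern,
# the remaining arguments are the files to search.
search_lines() {
    pattern="$1"
    shift
    # The single quotes are closed around "$pattern" so the shell expands it.
    awk '/'"$pattern"'/ { print FILENAME ":" $0 }' "$@"
}

tmp=$(mktemp)
printf 'alpha\nbeta gamma\n' > "$tmp"
out=$(search_lines 'beta g' "$tmp")
echo "$out"
rm -f "$tmp"
```

Because the pattern is pasted into the program text, a pattern containing a slash would end the regular expression early. The variants that pass the pattern as data avoid this.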
There are alternate ways of writing this. Another version is a shell script that accesses the environment directly from within awk, using ENVIRON, an array introduced in a newer version of the One True awk after the book was published. The subscript of ENVIRON is the name of an environment variable; its result is the variable's value. This is like the getenv function in various standard libraries. The shell script makes an environment variable pattern containing the first argument, then drops that argument and has awk look for the pattern in each file.
~ checks to see if its left operand matches its right operand; !~ is its inverse. A regular expression is just a string and can be stored in variables.
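The ENVIRON approach might be sketched like this. The function name search_env and the sample file are assumptions made for the example.

```shell
# search_env is a hypothetical wrapper that passes the pattern through the environment.
search_env() {
    pattern="$1"
    export pattern
    shift
    awk '$0 ~ ENVIRON["pattern"] { print FILENAME ":" $0 }' "$@"
}

tmp=$(mktemp)
printf 'alpha\nbeta gamma\n' > "$tmp"
out=$(search_env 'beta g' "$tmp")
echo "$out"
rm -f "$tmp"
```

Here the awk program text is fully single-quoted, so the shell never rewrites it.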
The next way uses command-line variable assignment, in which an argument to awk can be seen as an assignment to a variable.
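A sketch of this form is shown below, assuming a made-up sample file. The operand pattern='beta g' is treated as an assignment rather than as a filename, and takes effect before the following file is read.

```shell
tmp=$(mktemp)
printf 'alpha\nbeta gamma\n' > "$tmp"
# The assignment operand sets the awk variable named pattern.
out=$(awk '$0 ~ pattern { print FILENAME ":" $0 }' pattern='beta g' "$tmp")
echo "$out"
rm -f "$tmp"
```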
A further variant handles the arguments inside awk itself. In it, BEGIN is necessary not only to extract the first argument, but also to prevent it from being interpreted as a filename after the BEGIN block ends. ARGC, the number of arguments, is always guaranteed to be ≥1, as ARGV[0] is the name of the command that executed the script, most often the string "awk". ARGV[ARGC] is the empty string, "". # initiates a comment that extends to the end of the line.
Note the if block. awk only checks to see if it should read from standard input before it runs the command. This means that awk 'prog' only works because the fact that there are no filenames is only checked before prog is run. If you explicitly set ARGC to 1 so that there are no arguments, awk will simply quit because it concludes there are no more input files. Therefore, you need to explicitly say to read from standard input with the special filename -.
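The points above about BEGIN, ARGC, ARGV, and the special filename - can be put together in a sketch such as the following. It is an illustration built from that description rather than a definitive version, and the sample file is made up.

```shell
tmp=$(mktemp)
printf 'alpha\nbeta gamma\n' > "$tmp"
out=$(awk '
BEGIN {
    pattern = ARGV[1]
    for (i = 1; i < ARGC; i++)   # shift the remaining arguments down by one
        ARGV[i] = ARGV[i + 1]
    ARGC--
    if (ARGC == 1) {             # only the pattern was given: read standard input
        ARGC = 2
        ARGV[1] = "-"
    }
}
$0 ~ pattern { print FILENAME ":" $0 }
' 'beta g' "$tmp")
echo "$out"
rm -f "$tmp"
```

The first operand is consumed as the pattern in BEGIN, so only the remaining operands are opened as input files.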
Self-contained AWK scripts
On Unix-like operating systems self-contained AWK scripts can be constructed using the shebang syntax.
For example, a script that sends the content of a given file to standard output may be built by creating a file named print.awk with the following content:
#!/usr/bin/awk -f
{ print $0 }
It can be invoked with ./print.awk followed by the name of the file to print.
The -f tells awk that the argument that follows is the file to read the AWK program from, which is the same flag that is used in sed. Since they are often used for one-liners, both these programs default to executing a program given as a command-line argument, rather than a separate file.
Versions and implementations
AWK was originally written in 1977 and distributed with Version 7 Unix.
In 1985 its authors started expanding the language, most significantly by adding user-defined functions. The language is described in the book ''The AWK Programming Language'', published in 1988, and its implementation was made available in releases of UNIX System V. To avoid confusion with the incompatible older version, this version was sometimes called "new awk" or ''nawk''. This implementation was released under a free software license in 1996 and is still maintained by Brian Kernighan (see external links below).
Old versions of Unix, such as UNIX/32V, included awkcc, which converted AWK to C. Kernighan wrote a program to turn awk into another language; its state is not known.
* BWK awk, also known as nawk, refers to the version by Brian Kernighan. It has been dubbed the "One True AWK" because of the use of the term in association with the book that originally described the language and the fact that Kernighan was one of the original authors of AWK. FreeBSD refers to this version as ''one-true-awk''. This version also has features not in the book, such as tolower and ENVIRON, which are explained above; see the FIXES file in the source archive for details. This version is used by, for example, Android, FreeBSD, NetBSD, OpenBSD, macOS, and illumos. Brian Kernighan and Arnold Robbins are the main contributors to a source repository for ''nawk''.
* gawk (GNU awk) is another free-software implementation and the only implementation that makes serious progress implementing internationalization and localization and TCP/IP networking. It was written before the original implementation became freely available. It includes its own debugger, and its profiler enables the user to make measured performance enhancements to a script. It also enables the user to extend functionality with shared libraries. Some Linux distributions include ''gawk'' as their default AWK implementation. As of version 5.2 (September 2022) ''gawk'' includes a persistent memory feature that can remember script-defined variables and functions from one invocation of a script to the next and pass data between unrelated scripts, as described in the Persistent-Memory ''gawk'' User Manual.
** gawk-csv. The CSV extension of ''gawk'' provides facilities for inputting and outputting CSV formatted data.
* mawk is a very fast AWK implementation by Mike Brennan based on a bytecode interpreter.
* libmawk is a fork of mawk, allowing applications to embed multiple parallel instances of awk interpreters.
* awka (whose front end is written atop the ''mawk'' program) is another translator of AWK scripts into C code. When compiled, statically including the author's libawka.a, the resulting executables are considerably sped up and, according to the author's tests, compare very well with other versions of AWK, Perl, or Tcl. Small scripts will turn into programs of 160–170 kB.
* tawk (Thompson AWK) is an AWK compiler for Solaris, DOS, OS/2, and Windows, previously sold by Thompson Automation Software (which has ceased its activities).
* Jawk is a project to implement AWK in Java, hosted on SourceForge. Extensions to the language are added to provide access to Java features within AWK scripts (i.e., Java threads, sockets, collections, etc.).
* xgawk is a fork of ''gawk'' that extends ''gawk'' with dynamically loadable libraries. The XMLgawk extension was integrated into the official GNU Awk release 4.1.0.
* QSEAWK is an embedded AWK interpreter implementation included in the QSE library that provides an embedding application programming interface (API) for C and C++.
* libfawk is a very small, function-only, reentrant, embeddable interpreter written in C
* BusyBox includes an AWK implementation written by Dmitry Zakharov. This is a very small implementation suitable for embedded systems.
* CLAWK by Michael Parker provides an AWK implementation in Common Lisp, based upon the regular expression library of the same author.
* goawk is an AWK implementation in Go with a few convenience extensions by Ben Hoyt, hosted on GitHub.
The gawk manual has a list of more AWK implementations.
See also
* Data transformation
* Event-driven programming
* List of Unix commands
* sed
Further reading
* Interview with Alfred V. Aho on AWK
* AWK – Become an expert in 60 minutes
External links
* The Amazing Awk Assembler by Henry Spencer.
* awklang.org, the site for things related to the awk language