awk -- data transformation, report generation language

SYNOPSIS

awk [-F ere] [-f prog] [-v var=value ...] [program] [var=value ...] [file ...]

awk is a file-processing language which is well suited to data manipulation and retrieval of information from text files. This reference page provides a full technical description of awk. If you are unfamiliar with the language, you may find it helpful to read the online AWK Tutorial before reading the following material.

An awk program consists of any number of user-defined functions and rules in the form:

pattern {action}

There are two ways to specify the awk program:

Directly on the command line. In this case, the program is a single command line argument, usually enclosed in apostrophes (') to prevent the shell from attempting to expand it.

By using the -f prog option.

You can only specify program directly on the command line if you do not use any -f prog arguments.

When you specify files on the command line, those files provide the input data for awk to manipulate. If you specify no such files or you specify - as a file, awk reads data from the standard input.

You can initialize variables on the command line using

var=value

You can intersperse such initializations with the names of input files on the command line. awk processes initializations and input files in the order they appear on the command line. For example, the command

awk -f progfile a=1 f1 f2 a=2 f3

sets a to 1 before reading input from f1 and sets a to 2 before reading input from f3.

Variable initializations that appear before the first file on the command line are performed immediately after the BEGIN action. Initializations appearing after the last file are performed immediately before the END action. For more information on BEGIN and END, see Patterns.
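
The timing of these assignments can be seen in a small experiment. This is only a sketch: the file names f1 and f2, the variable a, and the scratch directory are made up for the demonstration.

```shell
# Create two one-line input files in a scratch directory (names are illustrative).
dir=$(mktemp -d)
printf 'x\n' > "$dir/f1"
printf 'y\n' > "$dir/f2"

# a is still unset in BEGIN, is 1 while f1 is read, 2 while f2 is read,
# and 3 by the time the END action runs.
out=$(awk 'BEGIN { print "BEGIN:" a }
           { print a ":" $0 }
           END { print "END:" a }' a=1 "$dir/f1" a=2 "$dir/f2" a=3)
echo "$out"
rm -rf "$dir"
```

The BEGIN line shows an empty value, because a=1 is not applied until the BEGIN action has finished.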

The -v option lets you assign a value to a variable before the awk program begins running (that is, before the BEGIN action). For example, in

awk -v v1=10 -f prog datafile

awk assigns the variable v1 its value before the BEGIN action of the program (but after default assignments made to built-in variables like FS and OFMT; these built-in variables have special meaning to awk, as described in later sections).
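
As a minimal illustration (the message text is arbitrary), a variable set with -v is already visible inside the BEGIN action, so no input file is needed to see it:

```shell
# v1 is assigned before BEGIN runs.
out=$(awk -v v1=10 'BEGIN { print "v1 is " v1 }')
echo "$out"
```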

awk divides input into records. By default, newline characters separate records; however, you may specify a different record separator if you want.

One at a time, and in order, awk compares each input record with the pattern of every rule in the program. When a pattern matches, awk performs the action part of the rule on that input record. Patterns and actions often refer to separate fields within a record. By default, white space (usually blanks, newlines, or horizontal tab characters) separates fields; however, you can specify a different field separator string using the -F ere option (see Input).

You can omit the pattern or action part of an awk rule (but not both). If you omit pattern, awk performs the action on every input record (that is, every record matches). If you omit action, awk writes every record matching the pattern to the standard output.
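
The two shortened rule forms might be tried as follows; the three input lines are invented sample data.

```shell
# Pattern only: records matching /b/ are written unchanged.
matched=$(printf 'abc\nxyz\nbcd\n' | awk '/b/')
# Action only: the action runs for every record.
lengths=$(printf 'abc\nxyz\nbcd\n' | awk '{ print length($0) }')
echo "$matched"
echo "$lengths"
```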

awk considers everything after a # in a program line to be a comment. For example:

# This is a comment

To continue program lines on the next line, add a backslash (\) to the end of the line. Statement lines ending with a comma (,), double or-bars (||), or double ampersands (&&) continue automatically on the next line.

Options

-F ere: specifies an extended regular expression to use as the field separator.
-f prog: runs the awk program contained in the file prog. When more than one -f option appears on the command line, the resulting program is a concatenation of all programs you specify.
-v var=value: assigns value to var before running the program. You can specify this option a number of times.

Variables and Expressions

There are three types of variables in awk: identifiers, fields and array elements.

An identifier is a sequence of letters, digits and underscores beginning with a letter or an underscore.

For a description of fields, see the Input subsection.

Arrays are associative collections of values called the elements of the array. Constructs of the form,

identifier[subscript]

where subscript has the form expr or expr,expr,... refer to array elements. Each such expr can have any string value. For multiple expr subscripts, awk concatenates the string values of all exprs with the separator character SUBSEP between each. The initial value of SUBSEP is set to \034 (ASCII field separator).
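
The following sketch shows how a multiple subscript is stored as one string. The array name sales and its subscripts are hypothetical.

```shell
out=$(awk 'BEGIN {
    sales["widget", "jan"] = 5          # stored under "widget" SUBSEP "jan"
    for (key in sales) {
        split(key, part, SUBSEP)        # recover the individual subscripts
        print part[1], part[2], sales[key]
    }
    print (("widget" SUBSEP "jan") in sales)   # same element, key built by hand
}')
echo "$out"
```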

Fields and identifiers are sometimes referred to as scalar variables to distinguish them from arrays.

You do not declare awk variables and you do not need to initialize them. The value of an uninitialized variable is the empty string in a string context and the number 0 in a numeric context.

Expressions consist of constants, variables, functions, regular expressions and subscript in array conditions (described later) combined with operators. Each variable and expression has a string value and a corresponding numeric value; awk uses the value appropriate to the context.

When converting a numeric value to its corresponding string value, awk performs the equivalent of a call to the sprintf function (see Built-In String Functions). The one and only expr argument is the numeric value and the fmt argument is either %d (if the numeric value is an integer) or the value of the variable CONVFMT (if the numeric value is not an integer). The default value of CONVFMT is %.6g. If you use a string in a numeric context, and awk cannot interpret the contents of the string as a number, it treats the value of the string as zero.
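
A short experiment along these lines can make the conversion rules concrete. The variable names and values are arbitrary, and the sketch forces a string conversion by concatenating with the empty string.

```shell
out=$(awk 'BEGIN {
    x = 3.14159265
    n = 17
    s = x ""; print s        # non-integer: converted with CONVFMT (%.6g)
    CONVFMT = "%.2f"
    s = x ""; print s        # converted with the new CONVFMT
    s = n ""; print s        # integer value: CONVFMT is not used
    print ("abc" + 1)        # "abc" is not a number, so it counts as zero
}')
echo "$out"
```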

Numeric constants are sequences of decimal digits.

String constants are quoted, as in "a literal string". Literal strings can contain the escape sequences shown in Table 1, Escape Sequences in awk Literal Strings.

awk supports extended regular expressions (see regexp). When awk reads a program, it compiles characters enclosed in slash characters (/) as regular expressions. In addition, when literal strings and variables appear on the right side of a ~ or !~ operator, or as certain arguments to built-in matching and substitution functions, awk interprets them as dynamic regular expressions.

Escape	Character

`\a`	audible bell
`\b`	backspace
`\f`	formfeed
`\n`	newline
`\r`	carriage return
`\t`	horizontal tab
`\v`	vertical tab
`\`ooo	octal value ooo
`\x`dd	hexadecimal value dd
`\/`	slash
`\"`	quote

Table 1: Escape Sequences in awk Literal Strings

Note:

When you use literal strings as regular expressions, you need extra backslashes to escape regular expression metacharacters, since the backslash is also the literal string escape character. For example, the regular expression

/e\.g\./

when written as a string is:

"e\\.g\\."

awk defines the subscript in array condition as:

index in array

where index looks like expr or (expr,...,expr). This condition evaluates to 1 if the string value of index is a subscript of array, and to 0 otherwise. This is a way to determine if an array element exists. When the element does not exist, this condition does not create it.
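
The difference between testing for an element and referring to it can be sketched as follows. The array seen and the helper function count are hypothetical names; the sketch relies on the usual awk behavior that a plain reference to a missing element creates it.

```shell
out=$(awk 'function count(arr,    k, n) { n = 0; for (k in arr) n++; return n }
BEGIN {
    seen["a"] = 1
    print ("a" in seen), ("b" in seen)   # membership tests
    print count(seen)                    # still one element
    tmp = seen["b"]                      # a plain reference creates seen["b"]
    print count(seen)                    # now two elements
}')
echo "$out"
```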

Symbol Table

You can access the symbol table through the built-in array SYMTAB.

SYMTAB[expr]

is equivalent to the variable named by the evaluation of expr. For example,

SYMTAB["var"]

is a synonym for the variable var.

Environment

An awk program can determine its initial environment by examining the ENVIRON array. If the environment consists of entries of the form:

name=value

then

ENVIRON[name]

has string value

"value"

For example, the following program is equivalent to the default output of env:

BEGIN {
	for (i in ENVIRON)
		printf("%s=%s\n", i, ENVIRON[i])
	exit
}

Operators

awk follows the usual precedence order of arithmetic operations, unless overridden with parentheses; a table giving the order of operations appears later in this section.

The unary operators are +, -, ++ and --, where you can use the ++ and -- operators as either postfix or prefix operators, as in C. The binary arithmetic operators are +, -, *, /, % and ^.

The conditional operator

expr ? expr1 : expr2

evaluates to expr1 if the value of expr is non-zero, and to expr2 otherwise.

If two expressions are not separated by an operator, awk concatenates their string values.

The operator ~ yields 1 (true) if the regular expression on the right side matches the string on the left side. The operator !~ yields 1 when the right side has no match on the left. To illustrate:

$2 ~ /[0-9]/

selects any line where the second field contains at least one digit. awk interprets any string or variable on the right side of ~ or !~ as a dynamic regular expression.
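
Because a dynamic regular expression is an ordinary string first, its backslashes pass through string escape processing before the pattern is used. A minimal sketch, with made-up test strings:

```shell
out=$(awk 'BEGIN {
    re = "e\\.g\\."                 # string form of /e\.g\./
    print ("e.g." ~ /e\.g\./)       # regular expression constant
    print ("e.g." ~ re)             # dynamic regular expression, same meaning
    print ("eXgX" ~ re)             # the dots are literal, so no match
    print ("eXgX" ~ "e.g.")         # unescaped dots match any character
}')
echo "$out"
```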

The relational operators are <, <=, >, >=, == and !=. When both operands in a comparison are numeric, awk compares their values numerically; otherwise it compares them as strings. An operand is numeric if it is an integer or floating point number, if it is a field or ARGV element that looks like a number, or if it is a variable created by a command line assignment that looks like a number.
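
The numeric versus string rule can be observed with a one-line input; the values 10 and 9 are chosen because they order differently under the two kinds of comparison.

```shell
out=$(echo "10 9" | awk '{
    print ($1 < $2)          # fields that look numeric: 10 < 9 is false
    print ($1 "" < $2 "")    # forced to strings: "10" sorts before "9"
    print ("10" < "9")       # string constants compare as strings
}')
echo "$out"
```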

The Boolean operators are || (or), && (and) and ! (not). awk uses short-circuit evaluation when evaluating these expressions. With an && expression, if the first operand is false, the entire expression is false and awk does not evaluate the second operand. With an || expression, the same applies if the first operand is true: the entire expression is true and awk does not evaluate the second operand.

You can assign values to a variable with

var = expr

If op is a binary arithmetic operator,

var op= expr

is equivalent to

var = var op expr

except that var is evaluated only once.

See Table 2, awk Order of Operations for the precedence rules of the operators.

Command Line Arguments

awk sets the built-in variable ARGC to the number of command line arguments. The built-in array ARGV has elements subscripted with digits from zero to ARGC-1, giving command line arguments in the order they appeared on the command line.

The ARGC count and the ARGV vector do not include command line options (beginning with -) or the program file (following -f). They do include the name of the command itself, initialization statements of the form

var=value

and the names of input data files.

awk actually creates ARGC and ARGV before doing anything else. It then walks through ARGV processing the arguments. If an element of ARGV is an empty string, awk skips it. If it contains an equals sign (=), awk interprets it as a variable assignment. If it is a minus sign (-), awk immediately reads input from the standard input until it encounters the end-of-file; otherwise, awk treats the argument as a file name and reads input from that file until it reaches end-of-file.

Note: awk runs the program by walking through ARGV in this way; thus if the program changes ARGV, awk can read different files and make different assignments.
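
As a sketch of this behavior, the BEGIN action below blanks out the first file operand so that awk skips it. The file names and contents are invented for the demonstration.

```shell
dir=$(mktemp -d)
printf 'from f1\n' > "$dir/f1"
printf 'from f2\n' > "$dir/f2"

# ARGV[0] is the command name, so ARGV[1] is the first file operand.
out=$(awk 'BEGIN { ARGV[1] = "" }
           { print }' "$dir/f1" "$dir/f2")
echo "$out"
rm -rf "$dir"
```

Only the contents of f2 are printed, since empty ARGV elements are skipped.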

Input

awk divides input into records. A record separator character separates each record from the next. The value of the built-in variable RS gives the current record separator character; by default, it begins as the newline (\n). If you assign a different character to RS, awk uses that as the record separator character from that point on.

Order of Operations

`(A)`	grouping

`$`i `V[a]`	field, array element

`V++ V--`	increment, decrement
`++V --V`

`A^B`	exponentiation

`+A -A !A`	unary plus, unary minus, logical NOT

`A*B A/B A%B`	multiplication, division, remainder

`A+B A-B`	addition, subtraction

`A B`	string concatenation

`A<B A>B A<=B`	comparisons
`A>=B A!=B A==B`

`A~B A!~B`	regular expression matching

`A in V`	array membership

`A && B`	logical AND

`A || B`	logical OR

`A ? B : C`	conditional expression

`V=B V+=B V-=B`	assignment
`V*=B V/=B V%=B`
`V^=B`

`A`, `B` and `C` are any expression.
i is any expression yielding an integer.
`V` is any variable.

Table 2: awk Order of Operations

awk divides records into fields. A field separator string, given by the value of the built-in variable FS, separates each field from the next. You can set a specific separator string by assigning a value to FS, or by specifying the -F ere option on the command line. You can assign a regular expression to FS. For example,

FS = "[,:$]"

says that commas, colons or dollar signs can separate fields. As a special case, assigning FS a string containing only a blank character sets the field separator to white space. In this case, awk considers any sequence of contiguous space and/or tab characters a single field separator. This is the default for FS; however, if you assign FS a string containing any other character, that character designates the start of a new field. For example, if you set FS="\t" (the tab character),

texta \t textb \t  \t  \t textc

contains five fields, two of which contain only blanks. With the default setting, this record contains only three fields, since awk considers the sequence of multiple blanks and tabs a single separator.
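
The field counts for this record can be checked directly. In this sketch the shell's printf expands each \t in the sample record to a real tab character.

```shell
rec='texta \t textb \t  \t  \t textc\n'
# Tab as the separator: every tab starts a new field.
tabs=$(printf "$rec" | awk 'BEGIN { FS = "\t" } { print NF }')
# Default FS: runs of blanks and tabs count as one separator.
blanks=$(printf "$rec" | awk '{ print NF }')
echo "$tabs $blanks"
```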

The following list of built-in variables provides various pieces of information about input.

NF        number of fields in the current record
NR        number of records read so far
FILENAME  name of file containing current record
FNR       number of records read from current file

Field specifiers have the form $n where n runs from 1 through NF. Such a field specifier refers to the nth field of the current input record. $0 (zero) refers to the entire current input record.

The getline function can read a value for a variable or $0 from the current input, from a file, or from a pipe. The result of getline is an integer indicating whether the read operation was successful. A value of 1 indicates success; 0 indicates end-of-file encountered; and -1 indicates that an error occurred. Possible forms for getline are:

getline: reads next input record into $0 and splits the record into fields. NF, NR and FNR are set appropriately.
getline var: reads next input record into the variable var. awk does not split the record into fields (which means that the current $n values do not change), but sets NR and FNR appropriately.
getline <expr: interprets the string value of expr to be a file name. awk reads the next record from that file into $0, splits it into fields and sets NF appropriately. If the file is not open, awk opens it. The file remains open until you close it with a close() function.
getline var <expr: interprets the string value of expr to be a file name and reads the next record from that file into the variable var, but does not split it into fields.
expr | getline: interprets the string value of expr as a command line to be run. awk pipes output from this command into getline and reads it into $0 in a manner similar to getline <expr. See the System Function section for additional details.
expr | getline var: runs the string value of expr as a command and pipes the output of the command into getline. The result is similar to getline var <expr.

You can only have a limited number of files and pipes open at one time. You can close files and pipes during execution using the

close(expr)

function. The expr must be one that came before | or after < in getline, or after > or >> in print or printf. For a description of print and printf, see the Output section. If the function successfully closes the pipe, it returns zero. By closing files and pipes that you no longer need, you can use any number of files and pipes in the course of running an awk program.
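
The following sketch reads the output of a command one record at a time and then closes the pipe. The command string and the path /nonexistent/file are assumptions made for the demonstration; the path is presumed not to exist.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)    # 1 = record read, 0 = EOF, -1 = error
        print "read: " line
    print "close returned " close(cmd)
    print (getline line < "/nonexistent/file")   # cannot be opened
}')
echo "$out"
```

Comparing the result of getline with > 0 ends the loop on both end-of-file and error; a bare while (cmd | getline line) would loop forever on -1.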

Built-In Arithmetic Functions

atan2(expr1, expr2): returns the arctangent of expr1/expr2 in the range of -pi through pi.
exp(expr), log(expr), sqrt(expr): returns the exponential, natural logarithm, and square root of the numeric value of expr. If you omit (expr), these functions use $0 instead.
int(expr): returns the integer part of the numeric value of expr. If you omit (expr), the function returns the integer part of $0.
rand(): returns a random floating-point number n in the range 0 <= n < 1.
sin(expr), cos(expr): returns the sine and cosine of the numeric value of expr (interpreted as an angle in radians).
srand(expr): sets the seed of the rand() function to the integer value of expr. If you omit (expr), awk uses the time of day as a default seed.

Built-In String Functions

n = gsub(regexp, repl, string)

works the same way as sub(), except that gsub() replaces all matching substrings (global substitution). The return value is the number of substitutions performed.

pos = index(string, str)

returns the position of the first occurrence of str in string. If index() does not find str in string, it returns zero.

len = length(expr)

returns the number of characters in the string value of expr. If you omit (expr), the function uses $0 instead. The parentheses around expr are optional.

pos = match(string, regexp)

searches string for the first substring matching the regular expression regexp and returns an integer giving the position of this substring counting characters; the count starts at one. If it finds no such substring, match() returns zero. This function also sets the built-in variable RSTART to pos and the built-in variable RLENGTH to the length of the matched string. If it does not find a match, match() sets RSTART to zero and RLENGTH to -1. You can enclose regexp in slashes or specify it as a string.
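
A small sketch of match() together with RSTART and RLENGTH, using an arbitrary sample string:

```shell
out=$(awk 'BEGIN {
    s = "foo123bar"
    if (match(s, /[0-9]+/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    pos = match(s, /z/)                 # no match
    print pos, RSTART, RLENGTH
}')
echo "$out"
```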

n = ord(expr)

returns the integer value of the first character in the string value of expr. This is useful in conjunction with %c in sprintf().

n = split(string, array, regexp)

splits string into fields. regexp is a regular expression giving the field separator string for the purposes of this operation. This function assigns the separate fields, in order, to the elements of array; subscripts for array begin at 1. awk discards all other elements of array. split() returns the number of fields into which it divided string (which is also the maximum subscript for array). regexp divides the record in the same way that the FS field separator string does. If you omit regexp in the call to split(), it uses the current value of FS.
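
For instance, split() might be used as in the following sketch; the array names part and word are arbitrary, and the second call assumes FS still has its default value.

```shell
out=$(awk 'BEGIN {
    n = split("a:b,c", part, /[:,]/)    # colon or comma separates fields
    print n, part[1], part[2], part[3]
    n = split("x y  z", word)           # no third argument: uses FS
    print n, word[3]
}')
echo "$out"
```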

str = sprintf(fmt, expr, expr...)

formats the expression list expr, expr, ... using specifications from the string fmt, then returns the formatted string. The fmt string consists of conversion specifications which convert and add the next expr to the string, and ordinary characters which sprintf() simply adds to the string. These conversion specifications are similar to those used by the ANSI C standard.

Conversion specifications have the form

%[flag][x][.y]c

where

x	is the minimum field width
y	is the precision
c	is the conversion character
flag	is a flag character

In a string, the precision is the maximum number of characters to be printed from the string; in a number, the precision is the number of digits to be printed to the right of the decimal point in a floating point value. If x or y is * (asterisk), the minimum field width or precision is the value of the next expr in the call to sprintf().

The conversion character c is one of the following:

d	decimal integer
i	decimal integer
o	unsigned octal integer
x,X	unsigned hexadecimal integer
u	unsigned decimal integer
f,F	floating point
e,E	floating point (scientific notation)
g,G	the shorter of e and f (suppresses non-significant zeros)
c	single character of an integer value; first character of string
s	string

The lowercase x prints alphabetic hex digits in lowercase while the uppercase X prints alphabetic hex digits in uppercase. The other uppercase and lowercase pairs work similarly.
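
A few of these conversions are sketched below with arbitrary values; the square brackets only make the field widths visible.

```shell
out=$(awk 'BEGIN {
    # minimum width 5, precision 2 for a number, precision 3 for a string
    printf "[%5d] [%.2f] [%.3s]\n", 42, 3.14159, "abcdef"
    # hexadecimal in both cases, octal, and first character of a string
    printf "[%x] [%X] [%o] [%c]\n", 255, 255, 8, "hello"
}')
echo "$out"
```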

flag is a string consisting of characters from the following list that provides additional formatting information:

-	left justifies the field; default is right justification
0	pads numbers with leading zeros
'	displays thousands separator when TK_USE_CURRENT_LOCALE is set.  
	(Only with decimal integer and floating point conversions)
"	same as '

Note: A single or double quote included in the flag string may require appropriate quoting for it to be interpreted correctly.

When flag contains a ' or " character and the TK_USE_CURRENT_LOCALE environment variable is set, a thousands separator is displayed. The digital grouping character (for example, a comma in the United States) as set by the Regional and Language Options control panel applet is used as the thousands separator. When flag contains ' or " and TK_USE_CURRENT_LOCALE is unset, no thousands separator is displayed.

For example, the following MKS KornShell commands:

export TK_USE_CURRENT_LOCALE=1
awk 'BEGIN { printf("%'\''10d\n",123456)}'

display:

123,456

while the MKS KornShell commands:

unset TK_USE_CURRENT_LOCALE
awk 'BEGIN { printf("%'\''10d\n",123456)}'

display:

123456

n = sub(regexp, repl, string)

searches string for the first substring matching the extended regular expression regexp, and replaces the substring with the string repl. awk replaces any ampersand (&) in repl with the substring of string which matches regexp. You can suppress this special behavior by preceding the ampersand with a backslash. If you omit string, sub() uses the current record instead. sub() returns the number of substrings replaced (which is one if it found a match, or zero otherwise).
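
The role of the ampersand in repl, and the difference between sub() and gsub(), can be sketched as follows with an arbitrary sample string:

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    n = sub(/o/, "[&]", s);   print n, s     # & stands for the matched text
    t = "hello world"
    n = gsub(/o/, "0", t);    print n, t     # gsub replaces every match
    u = "hello world"
    n = sub(/o/, "\\&", u);   print n, u     # \& yields a literal ampersand
}')
echo "$out"
```

The backslash is doubled in the last call because repl is a literal string, so one backslash is consumed by string escape processing.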

str = substr(string, offset, len)

returns the substring of string that begins in position offset and is at most len characters long. The first character of the string has an offset equal to one. If you omit len, substr() returns the rest of string.

str = tolower(expr)

converts all letters in the string value of expr into lowercase and returns the result. If you omit expr, tolower() uses $0 instead.

str = toupper(expr)

converts all letters in the string value of expr into uppercase and returns the result. If you omit expr, toupper() uses $0 instead.

System Function

status = system(expr)

runs the string value of expr as a command. For example,

system("tail " $1)

calls the tail command, using the string value of $1 as the file that tail examines. The standard command interpreter runs the command as discussed in the PORTABILITY section, and the exit status returned depends on that command interpreter.

User-Defined Functions

You can define your own functions using the form

function name(parameter-list) {
	statements
}

A function definition can appear in the place of a pattern {action} rule. The parameter-list contains any number of normal (scalar) and array variables separated by commas. When you call a function, awk passes scalar arguments by value, and array arguments by reference. The names specified in the parameter-list are local to the function; all other names used in the function are global. You can define local variables by adding them to the end of the parameter list as long as no call to the function uses these extra parameters.

A function returns to its caller either when it performs the final statement in the function, or when it reaches an explicit return statement. The return value, if any, is specified in the return statement (see the Actions section).
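
The parameter passing rules might be demonstrated as in this sketch. The function name fill and all variable names are hypothetical; the extra spaces in the parameter list are only a common convention for marking local variables.

```shell
out=$(awk 'function fill(n, arr,    i) {   # i is local: no caller passes it
    n = n * 2                            # changes only the local copy of n
    for (i = 1; i <= 2; i++)
        arr[i] = i * 10                  # changes the array in the caller
    return n
}
BEGIN {
    i = "global"
    x = 1
    y = fill(x, a)
    print x, y, a[1], a[2], i
}')
echo "$out"
```

The global i keeps its value because the loop inside fill() uses the local i.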

Patterns

A pattern is a regular expression, a special pattern, a pattern range, or any arithmetic expression.

BEGIN is a special pattern used to label actions that awk performs before reading any input records. END is a special pattern used to label actions that awk performs after reading all input records.

You can give a pattern range as

pattern1,pattern2

This range matches all lines from the line that matches pattern1 to the line that matches pattern2, inclusive.

If you omit a pattern, or if the numeric value of the pattern is non-zero (true), awk performs the resulting action for the line.
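
A pattern range and an arithmetic pattern are sketched below on five invented input lines:

```shell
input='1\nstart\n2\nend\n3\n'
# Range: from the line matching /start/ through the line matching /end/.
range=$(printf "$input" | awk '/start/,/end/')
# Arithmetic pattern: true (non-zero) for odd-numbered records.
odd=$(printf "$input" | awk 'NR % 2')
echo "$range"
echo "$odd"
```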

Actions

An action is a series of statements terminated by semicolons, newlines, or closing braces. A condition is any expression; awk considers a non-zero value true and a zero value false. A statement is one of the following or any series of statements enclosed in braces.

# expression statement, for example, assignment
expression

# if statement
if (condition)
	statement
[else
	statement]

# while loop
while (condition)
	statement

# do-while loop
do
	statement
while (condition)

# for loop
for (expression1; condition; expression2)
	statement

The for statement is equivalent to:

expression1
while (condition) {
	statement
	expression2
}

The for statement can also have the form

for (i in array)
	statement

awk performs the statement once for each element in array; on each repetition, the variable i contains the name of a subscript of array, running through all the subscripts in an arbitrary order. If array is multi-dimensional (has multiple subscripts), i is expressed as a single string with the SUBSEP character separating the subscripts.

The statement

break

exits a for or a while loop immediately.

continue

stops the current iteration of a for or while loop and begins the next iteration (if there is one).

next

terminates any processing for the current input record and immediately starts processing the next input record. Processing for the next record begins with the first appropriate rule.

exit[(expr)]

immediately goes to the END action if it exists; if there is no END action, or if awk is already performing the END action, the awk program terminates. awk sets the exit status of the program to the numeric value of expr. If you omit (expr), the exit status is 0.
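
The following sketch, with made-up input, shows exit stopping input processing, the END action still running, and the exit status reaching the shell:

```shell
status=0
out=$(printf 'a\nb\nc\n' | awk 'NR == 2 { exit 3 }
                                 { print }
                                 END { print "END ran" }') || status=$?
echo "$out"
echo "exit status: $status"
```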

return [expr]

returns from the execution of a function. If you specify an expr, the function returns the value of the expression as its result; otherwise, the function result is undefined.

delete array[i]

deletes element i from the given array.

print expr, expr, ...

is described in the Output subsection.

printf fmt, expr, expr, ...

is also described in the Output subsection.

Output

The print statement prints its arguments with only simple formatting. If it has no arguments, it prints the current input record in its entirety. awk adds the output record separator ORS to the end of the output that each print statement produces; when commas separate arguments in the print statement, the output field separator OFS separates the corresponding output values. ORS and OFS are built-in variables, the values of which you can change by assigning them strings. The default output record separator is a newline and the default output field separator is a space.

The variable OFMT gives the format of floating point numbers output by print. By default, the value is %.6g; you can change this value by assigning OFMT a different string value. OFMT applies only to floating point numbers (that is, ones with fractional parts).
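
The effect of OFS, ORS and OFMT on print can be sketched as follows; the separator strings are arbitrary choices.

```shell
out=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"
    print "a", "b"           # comma: values joined with OFS
    print "a" "b"            # no comma: plain concatenation
    OFMT = "%.2f"
    print 3.14159, 17        # OFMT affects only the non-integer value
}')
echo "$out"
```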

The printf statement formats its arguments using the fmt argument. Formatting is the same as for the built-in function sprintf(). Unlike print, printf does not add output separators automatically, giving the program more precise control of the output.

The print and printf statements write to the standard output. You can redirect output to a file or pipe as described later.

If you add >expr to a print or printf statement, awk treats the string value of expr as a file name, and writes output to that file. Similarly, if you add >>expr, awk appends output to the current contents of the file. The distinction between > and >> is only important for the first print to the file expr. Subsequent outputs to an already open file append to what is there already.
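
As a sketch of this behavior (the file name out.txt and the scratch directory are assumptions), the second print below appends even though it uses >, because the file is already open:

```shell
dir=$(mktemp -d)
awk -v f="$dir/out.txt" 'BEGIN {
    print "one" > f          # first write creates or truncates the file
    print "two" > f          # file is already open, so this appends
    close(f)
    print "three" >> f       # reopened in append mode
    close(f)
}'
out=$(cat "$dir/out.txt")
echo "$out"
rm -rf "$dir"
```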

To eliminate ambiguities, statements such as

print a > b c

are syntactically illegal. Use parentheses to resolve the ambiguity.

If you add |expr to a print or printf statement, awk treats the string value of expr as an executable command and runs it with the output from the statement piped as input into the command.

As mentioned earlier, only a limited number of files and pipes can be open at any time. To avoid going over the limit, use the close() function to close files and pipes when you no longer need them.
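
A minimal sketch of output piped to a command, assuming the sort utility is available on PATH:

```shell
out=$(awk 'BEGIN {
    cmd = "sort"
    print "pear"  | cmd
    print "apple" | cmd
    close(cmd)               # sort sees end of input and writes its output
    print "done"
}')
echo "$out"
```

Calling close() before the final print makes the ordering predictable, since the sorted lines are written before awk continues.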

print and printf are also available as functions with the same calling sequence, but no redirection.

EXAMPLES

awk '{print NR ":" $0}' input1

outputs the contents of the file input1 with line numbers prepended to each line.

The following is an example using var=value on the command line.

awk '{print NR SEP $0}' SEP=":" input1

awk can also read the program script from a file as in the command line:

awk -f addline.awk input1

which produces the same output when the file addline.awk contains

{print NR ":" $0}

The following program appends all input lines starting with January to the file jan (which may or may not exist already), and all lines starting with February or March to the file febmar:

/^January/ {print >> "jan"}
/^February|^March/ {print >> "febmar"}

This program prints the total and average for the last column of each input line:

	{s += $NF}
END	{print "sum is", s, "average is", s/NR}

The next program interchanges the first and second fields of input lines:

{
	tmp = $1
	$1 = $2
	$2 = tmp
	print
}

The following example inserts line numbers so that output lines are left-aligned:

{printf "%-6d: %s\n", NR, $0}

The following example prints input records in reverse order (assuming sufficient memory):

{
	a[NR] = $0 # index using record number
}
END {
	for (i = NR; i>0; --i)
		print a[i]
}

The next program determines the number of lines starting with the same first field:

{
	++a[$1] # array indexed using the first field
}
END {	# note output is in undefined order
	for (i in a)
		print a[i], "lines start with", i
}

The following program can be used to determine the number of lines in each input file:

{
	++a[FILENAME]
}
END {
	for (file in a)
		if (a[file] == 1)
			print file, "has 1 line"
		else
			print file, "has", a[file], "lines"
}

The following program illustrates how you can use a two dimensional array in awk. Assume the first field of each input record contains a product number, the second field contains a month number, and the third field contains a quantity (bought, sold, or whatever). The program generates a table of products versus month.

BEGIN	{NUMPROD = 5}
{
	array[$1,$2] += $3
}
END	{
	print "\t Jan\t Feb\tMarch\tApril\t May\t" \
	    "June\tJuly\t Aug\tSept\t Oct\t Nov\t Dec"
	for (prod = 1; prod <= NUMPROD; prod++) {
		printf "%-7s", "prod#" prod
		for (month = 1; month <= 12; month++){
			printf "\t%5d", array[prod,month]
		}
		printf "\n"
	}
}

As the following program reads in each line of input, it reports whether the line matches a pre-determined value:

function randint() {
	return (int((rand()+1)*10))
}
BEGIN	{
	prize[randint(),randint()] = "$100";
	prize[randint(),randint()] = "$10";
	prize[1,1] = "the booby prize"
	}
{
	if (($1,$2) in prize)
		printf "You have won %s!\n", prize[$1,$2]
}

The following example prints lines, the first and last fields of which are the same, reversing the order of the fields:

$1 == $NF {
	for (i = NF; i > 0; --i)
		printf "%s", $i (i>1 ? OFS : ORS)
}

The following program prints the input files from the command line. The infiles() function first empties the passed array, and then fills the array. Notice that the extra parameter i of infiles() is a local variable.

function infiles(f,i) {
	for (i in f)
		delete f[i]
	for (i = 1; i < ARGC; i++)
		if (index(ARGV[i],"=") == 0)
			f[i] = ARGV[i]
}
BEGIN	{
	infiles(a)
	for (i in a)
		print a[i]
	exit
}

Here is the standard recursive factorial function:

function fact(num) {
	if (num <= 1)
		return 1
	else
		return num * fact(num - 1)
}
{ print $0 " factorial is " fact($0) }

The following program illustrates the use of getline with a pipe. Here, getline sets the current record from the output of the wc command. The program prints the number of words in each input file.

function words(file,   string) {
	string = "wc " file
	string | getline
	close(string)
	return ($2)
}
BEGIN	{
	for (i=1; i<ARGC; i++) {
		fn = ARGV[i]
		printf "There are %d words in %s.\n",
		    words(fn), fn
	}
}

ENVIRONMENT VARIABLES

PATH

contains a list of directories that awk searches when looking for commands run by system(expr), or input and output pipes.

TK_USE_CURRENT_LOCALE

specifies whether or not to use relevant information from the current locale. When set, current locale information is used; when unset, the default locale information is used.

Note: Current locale information is set using the Regional and Language Options control panel applet.

For awk, this environment variable determines the characters displayed for the decimal point and the thousands separator.

Any other environment variable may be accessed by the awk program itself.
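
As a minimal sketch of that last point, an awk program can read an environment variable through the ENVIRON array (see PORTABILITY). The variable name GREETING is hypothetical and is set only for this one command:

```shell
# GREETING is a hypothetical variable, set only for this invocation.
# ENVIRON is indexed by the environment variable's name.
out=$(GREETING=hello awk 'BEGIN { print "GREETING is " ENVIRON["GREETING"] }')
echo "$out"
```

A name that is not present in the environment has no element in ENVIRON, so referencing it yields an empty string.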

DIAGNOSTICS

Possible exit status values are:

0

Successful completion.

1

Any of the following errors:

— parser internal stack overflow

— syntax error

— function redefined

— internal execution tree error

— insufficient memory for string storage

— unbalanced parenthesis or brace

— missing script file

— missing field separator

— missing variable assignment

— unknown option

— invalid character in input

— newline in regular expression

— newline in string

— EOF in regular expression

— EOF in string

— cannot open script file

— inadmissible use of reserved keyword

— attempt to redefine built-in function

— cannot open input file

— error on print

— error on printf

— getline in END action was not redirected

— too many open input/output streams

— error on input/output stream

— insufficient arguments to printf or sprintf()

— array cannot be used as a scalar

— variable cannot be used as a function

— too many fields

— record too long

— division (/ or %) by zero

— cannot assign to a function

— value required in assignment

— return outside of a function

— may delete only array element or array

— scalar cannot be used as array

— SYMTAB must have exactly one index

— impossible function call

— function call nesting level exceeded

— wrong number of arguments to function

— regular expression error

— second parameter to "split" must be an array

— sprintf() string longer than allowed number of characters

— no open file name

— function requires an array

— is not a function

— failed to match

— invalid collation element

— trailing \ in pattern

— newline found before end of pattern

— more than 9 \( \) pairs

— number in [0-9] invalid

— [ ] imbalance or syntax error

— ( ) or \( \) imbalance

— { } or \{ \} imbalance

— invalid endpoint in range

— out of memory

— invalid repetition

— invalid character class type

— internal error

— unknown regex error

When an awk program terminates because of a call to exit(), the exit status is the value passed to exit().
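
A short sketch of this behavior, assuming a POSIX-style shell in which $? holds the status of the last command; the value 3 is arbitrary:

```shell
# The expression given to exit becomes the exit status of awk.
status=0
awk 'BEGIN { exit 3 }' || status=$?
echo "awk exited with status $status"
```

Here the shell reports a status of 3, which a calling script can test to distinguish this outcome from the error status 1 listed above.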

LIMITS

Most constructions in this implementation of awk are dynamic, limited only by memory restrictions of the target machine. On Windows systems, awk limits dynamic data to the amount of available conventional memory, the longest input record to 20000 bytes, and the number of fields to 4000. The parser stack depth is limited to 600 levels. Attempting to process extremely complicated programs may result in an overflow of this stack, causing an error.

The maximum record size is guaranteed to be at least LINE_MAX, as returned by getconf. The maximum field size is guaranteed to be LINE_MAX. With MKS Toolkit, LINE_MAX is set to 8192 bytes.

Input must be text files.

PORTABILITY

POSIX.2. X/Open Portability Guide 4.0. All UNIX systems. Windows 10. Windows Server 2016. Windows Server 2019. Windows 11. Windows Server 2022. Windows Server 2025.

The ord() function is an extension to traditional implementations of awk. The toupper() and tolower() functions and the ENVIRON array are in POSIX and the UNIX System V Release 4 version of awk. This version is a superset of New AWK as described in The AWK Programming Language by Aho, Weinberger, and Kernighan.
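
As a brief illustration of the two case-conversion functions mentioned above, the following command prints each input line in upper case and then in lower case. The input text is arbitrary sample data:

```shell
# toupper() and tolower() return a converted copy; $0 itself is unchanged.
out=$(echo 'Hello World' | awk '{ print toupper($0); print tolower($0) }')
echo "$out"
```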

The standard command interpreter that the system() function uses and that awk uses to run pipelines for getline, print, and printf is system dependent. On UNIX and POSIX-compliant systems, this interpreter is always ROOTDIR/bin/sh. On Windows systems, the SHELL environment variable (or COMSPEC if SHELL is not set) names the standard command interpreter. The shell in use affects the error status that the system() function returns: sh returns the exit status of the run command, while cmd.exe does not return error statuses correctly.
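
The following sketch shows the status that system() returns when the command interpreter is sh. It assumes a UNIX or POSIX-compliant system (or SHELL set to sh on Windows); under cmd.exe the value may differ, as noted above. The command string "exit 2" is only an example:

```shell
# system() runs the command through the standard command interpreter
# and returns that command's exit status to the awk program.
out=$(awk 'BEGIN { rc = system("exit 2"); print "system() returned " rc }')
echo "$out"
```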

AVAILABILITY

PTC MKS Toolkit for Power Users
PTC MKS Toolkit for System Administrators
PTC MKS Toolkit for Developers
PTC MKS Toolkit for Interoperability
PTC MKS Toolkit for Professional Developers
PTC MKS Toolkit for Professional Developers 64-Bit Edition
PTC MKS Toolkit for Enterprise Developers
PTC MKS Toolkit for Enterprise Developers 64-Bit Edition
