sort

sort/merge utility 

Command


SYNOPSIS

sort [-cmu] [-aN] [-o outfile] [-t char] [-y[n]] [-zn] [-T tempdir] [-bdfiMnr] [-k startpos[,endpos]] ... [file ...]

sort [-cmu] [-aN] [-o outfile] [-t char] [-y[n]] [-zn] [-T tempdir] [-bdfiMnr] [+startposition [-endposition]] ... [file ...]


DESCRIPTION

The sort command implements a full sort and merge facility. sort operates on input files containing records that are separated by the newline character.

If you do not specify either the -c or the -m option, sort sorts the concatenation of all input files and produces the output on standard output.

Options

-a 

allows files containing null characters to be treated as ASCII files rather than binary files.

-b 

skips, for comparison purposes, any leading white space (blanks or tabs) in any field (or key specification).

-c 

checks input files to ensure that they are correctly ordered according to the key position and sort ordering options specified, but does not modify or output the files. This option only affects the exit code.
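As a minimal sketch of how -c might be used in a script, the following tests two small files and acts only on the exit status. The file names and their contents are hypothetical and exist only for this illustration.

```shell
# Hypothetical sample files, created only for this illustration.
printf 'apple\nbanana\ncherry\n' > ordered.txt
printf 'banana\napple\ncherry\n' > unordered.txt

# sort -c writes no records; only the exit status reports the result.
if sort -c ordered.txt; then
    ordered_status="in order"
else
    ordered_status="not in order"
fi

# Discard the diagnostic message so that only the exit status is used.
if sort -c unordered.txt 2>/dev/null; then
    unordered_status="in order"
else
    unordered_status="not in order"
fi

echo "ordered.txt: $ordered_status"
echo "unordered.txt: $unordered_status"
```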

-d 

uses dictionary ordering. sort examines only blanks, upper and lowercase letters, and numbers when making comparisons.

-f 

converts lowercase letters to uppercase for comparison purposes.

-i 

ignores, for comparison purposes, non-printable characters.

-k startpos[,endpos] 

specifies a sorting key. See the Sorting Keys section of this reference page for more information.

-M 

assumes that the field contains a month name for comparison purposes. Any leading white space is ignored. If the field starts with the first three letters of a month name in uppercase or lowercase, the comparisons are in month-in-year order. Anything that is not a recognizable month name compares less than JAN.

-m 

merges files into one sorted output stream. This option assumes that each input file is correctly ordered according to the other options specified on the command line; you can check this with the -c option.
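A short sketch of a merge, assuming two hypothetical files (part1.txt and part2.txt) that are each already in sorted order:

```shell
# Hypothetical, already sorted input files.
printf 'apple\ncherry\n' > part1.txt
printf 'banana\ndate\n' > part2.txt

# -m interleaves the presorted inputs instead of sorting from scratch.
merged=$(sort -m part1.txt part2.txt)
printf '%s\n' "$merged"
```

If an input file is not actually in order, the merged output is not guaranteed to be in order either.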

-N 

specifies that the file being sorted does not have field delimiters.

-n 

assumes that the field contains an initial numeric value. sort sorts first by numeric value, then by the remaining text in the field, according to options. This option treats a field that contains no digits as if it had a value of zero. If more than one line contains no digits, the lines are sorted alphanumerically.
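To illustrate the difference that -n makes, this sketch sorts the same three hypothetical records with and without the option:

```shell
# Numeric comparison: 9 sorts before 10, and 10 before 100.
numeric=$(printf '10\n9\n100\n' | sort -n)

# Default character comparison: "10" and "100" sort before "9".
plain=$(printf '10\n9\n100\n' | sort)

printf 'with -n:\n%s\n' "$numeric"
printf 'without -n:\n%s\n' "$plain"
```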

-o outfile 

writes output to the file outfile. By default, sort writes output onto the standard output. The output file can be one of the input files. In this case, sort makes a copy of the data to allow the (potential) overwriting of the input file.
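As a sketch of sorting a file in place, the following names the same hypothetical file (fruits.txt) as both the input and the -o output:

```shell
# Hypothetical unsorted file.
printf 'pear\nfig\nkiwi\n' > fruits.txt

# The input file is also the output file; sort copies the data first.
sort -o fruits.txt fruits.txt

result=$(cat fruits.txt)
printf '%s\n' "$result"
```

Redirecting standard output to the input file (sort fruits.txt >fruits.txt) is not a substitute, because the shell truncates the file before sort reads it.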

-r 

reverses the order of all comparisons so that sort writes output from largest to smallest rather than smallest to largest.

-T tempdir 

specifies tempdir as the directory to use for sort's temporary files. When you do not specify this option, sort stores its temporary files in the directory specified by the TMPDIR environment variable.

-t char 

indicates that the character char separates input fields. When you do not specify the -t option, sort assumes that any number of white space characters (blank or tab) separate fields.
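For example, the following sketch sorts hypothetical colon-separated records (name:age) numerically by the second field:

```shell
# Records are name:age; the colon is the field separator.
by_age=$(printf 'carol:30\nalice:25\nbob:35\n' | sort -t : -k 2,2n)
printf '%s\n' "$by_age"
```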

-u 

ensures that output records are unique. If two or more input records have equal sort keys, sort writes only the first record to the output. When you use -u with -c, sort prints a diagnostic message if the input records have any duplicates.
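A brief sketch of -u combined with a sorting key, using hypothetical records in which two lines share the same first field:

```shell
# Two records have the key "a" in the first field; only one of them is kept.
unique=$(printf 'a 1\nb 2\na 3\n' | sort -u -k 1,1)
printf '%s\n' "$unique"

# List the surviving keys; each key appears exactly once.
keys=$(printf '%s\n' "$unique" | cut -d ' ' -f 1)
```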

-y[n] 

restricts the amount of memory available for sorting to n K of memory (where a K of memory is 1024 bytes). If n is missing, sort chooses a reasonable maximum amount of memory for sorting, dependent upon system configuration. n must be at least 2; it has a maximum value of 1024 and a default value of 250. sort needs at least enough memory to hold five records simultaneously. If you request less, sort automatically takes enough. When the input files overflow available memory, sort automatically uses a polyphase merge (external sorting) algorithm, which is necessarily much slower than internal sorting.

-zn 

indicates that the longest input record (including the newline character) is n characters in length. By default, record length is limited to 400 characters.

+startposition [-endposition] 

is an obsolete method of specifying a sorting key. See the Sorting Keys section of this reference page for more information.

The -b, -d, -f, -i, -M, -n, -r, and -t options control how sort compares records to determine the order in which the records are written to the output. These ordering options apply globally to all sorting keys except those keys for which you individually specify ordering options. For more on sorting keys, see the next section.

Sorting Keys

By default, sort examines entire input records to determine ordering. By specifying sorting keys on the command line, you can tell sort to restrict its attention to one or more parts of each record.

You can indicate the start of a sorting key with

-k m[.n][options]

where m and the optional n are positive integers. You can choose options from the set bdfiMnr (described previously) to specify how sort does comparisons for that sorting key. (The b option behaves differently from the other options; see the next paragraph.) When you set one or more ordering options for a key, sort uses those options instead of the global ordering options for that key. If you do not specify any options for the key, the global ordering options are used.

The number m specifies which field in the input record contains the start of the sorting key. The character given with the -t option separates input fields; if this option is not given, spaces or tabs separate the fields. The number n specifies which character in the mth field marks the start of the sorting key; if you do not specify n, the sorting key starts at the first character of the mth field.

Note:

When you do not specify the -t option, a field is considered to begin with the white space that separates it from the preceding field. When -t is specified, a field begins with the character following the separator.

You can also specify an ending position for a key, with

-k m[.n][options],p[.q][options]

where p is a positive integer and q is a non-negative integer, indicating that the sort key ends with the qth character of the pth field. If you do not specify q or you specify a value of 0 for q, the sorting key ends at the last character of the pth field. For example,

-k 2.3,4.6

defines a sorting key that extends from the third character of the second field to the sixth character of the fourth field. The b option applies only to the key start or key end that it is specified for.
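As a smaller sketch of character offsets within a field, the following uses hypothetical colon-separated records and a key that consists only of the third character of the second field:

```shell
# The key is the single character at position 3 of field 2 ("2" and "1").
by_char=$(printf 'x:ab2:z\ny:cd1:z\n' | sort -t : -k 2.3,2.3)
printf '%s\n' "$by_char"
```

Sorting the same records without a key would place the x record first; with the key, the record whose key character is 1 comes first.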

sort also supports a historical method of defining the sorting key. Using this method, you indicate the start of the sorting key with

+m[.n][options]

which is equivalent to

-k m+1[.n+1][options]

You can also indicate the end of a sorting key with

-p[.q][options]

which when preceded with +m[.n] is equivalent to

-k m+1[.n+1],p.0[options]

if q is specified and is zero. Otherwise, it is equivalent to

-k m+1[.n+1],p+1[.q][options]

For example,

+1.2 -3.5

defines a sorting key whose starting position sort finds by skipping the first field and then the first two characters of the next field. Its end position is found by skipping the first three fields and then the first five characters of the next field; the key ends immediately before that position. In other words, the sorting key extends from the third character of the second field to the fifth character of the fourth field. By the equivalence given above, this is the key -k 2.3,4.5, which is one character shorter than the -k 2.3,4.6 key described earlier.

With either syntax, if the end of a sorting key is not a valid position or no end was specified, the sorting key extends to the end of the input record.

You can specify multiple sort key positions by using several -k options, or several + and - options. In this case, sort uses the second sorting key only for records where the first sorting keys are equal, the third sorting key only when the first two are equal, and so on. If all key positions compare equal, sort determines ordering by using the entire record.
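A minimal sketch of two sorting keys, using hypothetical records that consist of a name and a number. The second key matters only for the two records whose first key is equal:

```shell
# First key: field 1 (name). Second key: field 2, compared numerically.
by_name_then_number=$(printf 'smith 30\njones 25\nsmith 22\n' | sort -k 1,1 -k 2,2n)
printf '%s\n' "$by_name_then_number"
```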

When you specify the -u option to determine the uniqueness of output records, sort looks only at the sorting keys, not the whole record. (Of course, if you specify no sorting keys, sort considers the whole record to be the sorting key.)


EXAMPLES

To sort an input file having lines consisting of the day of the month, white space and the month, as in:

30 December
23    MAY
25 June
10     June

use the command:

sort -k 2M -k 1n
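As a sketch, the command can be tried on the sample lines above by supplying them on standard input. The first key orders the lines by month, and the second key orders lines within the same month by day:

```shell
# The sample records from above, supplied on standard input.
by_date=$(printf '30 December\n23    MAY\n25 June\n10     June\n' | sort -k 2M -k 1n)
printf '%s\n' "$by_date"
```

The May line should come first, followed by the two June lines in day order, and then the December line.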

To merge two dictionaries, with one word per line:

sort -m -dfi dict1 dict2 >newdict

Often it is useful to explicitly specify both the starting and ending field with the -k option even when the starting and ending field are the same. For example, the following sorts file1 alphabetically by the first field and within that, in reverse alphabetical order by the second field:

sort -k 1,1 -k 2,2r file1

ENVIRONMENT VARIABLES

TMPDIR 

contains the path name of the directory to be used for temporary files.


FILES

/tmp/stm* 

temporary files used for merging and for the -o option. You can specify a different directory for temporary files using the TMPDIR environment variable. For further information, see envvar.


DIAGNOSTICS

Possible exit status values are:

0 

Successful completion. Also returned if -c is specified and the file is already in correctly sorted order.

1 

Returned if you specified -c and the file is not correctly sorted. Also returned to indicate a non-unique record if you specified -cu.

2 

Failure due to any of the following:

— missing key description after -k
— more than one -o option
— missing file name after -o
— missing character after -t
— more than one character after -t
— missing number with -y or -z
— endposition given before a startposition
— badly formed sort key
— invalid command line option
— too many key field positions specified
— insufficient memory
— inability to open the output file
— inability to open the input file
— error writing to the output file
— inability to create a temporary file or temporary file name

Badly formed sort key position x 

The key position was not specified correctly. Check the format and try again.

file filename is binary 

sort has determined that filename is binary because it found a NULL ('\0') character in a line.

Insufficient memory for ... 

This error normally occurs when you specify very large numbers for -y or -z and there is not enough memory available for sort to satisfy the request.

Line too long: limit nn - truncated 

Any input lines that are longer than nn, which is either the default number of characters (400) or the number specified with the -z option, are truncated.

Missing key definition after -k 

You specified -k, but did not specify a key definition after the -k.

Non-unique key in record: ... 

With the -c and -u options, a non-unique record was found.

Not ordered properly at: ... 

With the -c option, an incorrect ordering was discovered.

No newline at end of file 

Any file not ending in a newline character has one added.

Tempfile error on ... 

The named temporary (intermediate) file could not be created. Make sure that you have a directory named /tmp and that this directory has space to create files. The directory for temporary files can be changed using the TMPDIR environment variable; see envvar.

Tempnam() error 

sort could not generate a name for a temporary working file. This should almost never happen.

Temporary file error (no space) for ... 

Insufficient space was available for a temporary file. Make sure that you have a directory named /tmp and that this directory has space to create files. The directory for temporary files can be changed using the ROOTDIR and TMPDIR environment variables; see envvar.

Too many key field positions specified 

This implementation of sort has a limit of 64 key field positions.

Write error (no space) on output 

An error occurred while writing to the standard output. This normally occurs when there is insufficient disk space to hold all of the intermediate data, or when a diskette is write protected.


PORTABILITY

POSIX.2. x/OPEN Portability Guide 4.0. Windows 8.1. Windows Server 2012 R2. Windows 10. Windows Server 2016. Windows Server 2019. Windows 11. Windows Server 2022.

Available on all UNIX systems, with only UNIX System V.2 or later having the full functionality described here.

The -M and -y options are extensions to the POSIX and XPG standards. The -z option is an XPG extension to the POSIX standard. The POSIX.2 standard regards the historical syntax for defining sorting keys as obsolete. Therefore, you should use only the -k option in the future.


LIMITS

The maximum number of key field positions is 64.


NOTES

The sortgen AWK script is a useful way to handle complex sorting tasks. It is described in the AWK Tutorial in the User's Guide. It originally appeared in The AWK Programming Language, by Aho, Weinberger, and Kernighan.

The sort command provided with PTC MKS Toolkit should not be confused with the Windows sort command.


AVAILABILITY

PTC MKS Toolkit for Power Users
PTC MKS Toolkit for System Administrators
PTC MKS Toolkit for Developers
PTC MKS Toolkit for Interoperability
PTC MKS Toolkit for Professional Developers
PTC MKS Toolkit for Professional Developers 64-Bit Edition
PTC MKS Toolkit for Enterprise Developers
PTC MKS Toolkit for Enterprise Developers 64-Bit Edition


SEE ALSO

Commands:
awk, comm, cut, join, uniq

Miscellaneous:
ascii, envvar


PTC MKS Toolkit 10.4 Documentation Build 39.