wc -- count of newlines, words, bytes, and characters

SYNOPSIS

wc [-c|-m] [-lw] [-U[[[c][lb8oa]][p[lb8oa]]]] [file ...]

DESCRIPTION

wc counts the number of newlines, words, characters and bytes in text files. If you specify multiple files, wc produces counts for each file, plus totals for all files.

Besides normal ASCII text files, wc also works on UTF-8 files and 16-bit wide Unicode files. Such files normally begin with a multiple-byte marker indicating whether the file's contents are Unicode big-endian, Unicode little-endian, or UTF-8, and wc detects them automatically. When the multiple-byte marker is missing, you can use the -U option or the TK_STDIO_DEFAULT_INPUT_FORMAT/TK_STDIO_DEFAULT_OUTPUT_FORMAT environment variables to treat any file as a Unicode or UTF-8 file.

Normally, wc's output format defaults to the format of the first file it displays unless the -U option or the TK_STDIO_DEFAULT_OUTPUT_FORMAT environment variable is used to override the output format. For more details on this and other Unicode-related file handling issues see the unicode reference page.

If you do not specify any options, wc produces the following output:

newline_count word_count byte_count filename

When you specify options, wc displays only the selected counts, in the same order as the default output. If you specify -m, the character count replaces the byte count. For example, -cw displays the word count followed by the byte count and the file name, and -ml displays the newline count followed by the character count and the file name.
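
The ordering rule above can be checked with a short sketch. It assumes a POSIX-style wc reading standard input (so no file name is printed) and uses a made-up two-line sample; column padding varies between implementations, so the counts are split on white space before use.

```shell
# Sample input (hypothetical): 2 newlines, 3 words, 14 bytes.
all=$(printf 'one two\nthree\n' | wc)     # default: newline, word, byte counts
lw=$(printf 'one two\nthree\n' | wc -lw)  # newline count, then word count
wl=$(printf 'one two\nthree\n' | wc -wl)  # same order, regardless of option order

# Split the default output on white space to get the individual counts.
set -- $all
newlines=$1 words=$2 bytes=$3
echo "newlines=$newlines words=$words bytes=$bytes"
```

Because the order of the counts is fixed, -lw and -wl are expected to produce identical output.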

A word is considered to be a character or characters delimited by white space.

Note: The -c option of wc counts bytes, not characters. This is a change from previous versions of wc, dictated by the POSIX.2 standard, which provides the -m option to count characters. If you have a file containing multibyte characters, the byte count is higher than the character count. On Windows systems, a line of a text file is often delimited by the sequence carriage return/linefeed. Since wc views its input as text, this sequence is counted as a single newline byte.
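
As a rough illustration of the difference between -c and -m, the following sketch feeds wc a single line containing one two-byte UTF-8 character. It assumes a POSIX-style wc; the -m result depends on the input being treated as UTF-8 (through the current locale on most UNIX systems, or through the Unicode handling described above), so only the byte count is certain.

```shell
# "hello" with an accented e, written as its two UTF-8 bytes
# (octal 303 251), followed by a newline: 7 bytes in total.
bytes=$(printf 'h\303\251llo\n' | wc -c)
chars=$(printf 'h\303\251llo\n' | wc -m)

# Strip any column padding before printing.
bytes=$(echo $bytes)
chars=$(echo $chars)
echo "bytes: $bytes  characters: $chars"
```

When the input is decoded as UTF-8, the character count is one less than the byte count; otherwise the two counts may be equal.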

Options

-c

displays a byte count. You cannot specify this option with -m.

-l

displays a newline count.

-m

displays a character count. You cannot specify this option with -c.

-U[[[c][lb8oa]][p[lb8oa]]]

specifies the input format of any file missing the initial multiple-byte marker, the output format produced, or both.

When c is specified, the specifiers that follow it apply to the input consumed.

When p is specified, the specifiers that follow it apply to the output produced.

When neither c nor p is specified, the remaining -U specifiers apply to the input consumed.

When both c and p are specified, the remaining -U arguments apply to both input and output.

The remaining specifiers indicate the format of the characters read from input or written to output (as determined by c and p):

l     little-endian 16-bit wide characters
b     big-endian 16-bit wide characters
8     UTF-8 characters
a     ASCII characters from the ANSI code page
o     ASCII characters from the OEM code page

When multiple format specifiers can be associated with either c or p, the last appropriate one given on the command for each of c and p is used. For example:

-Ucoapl8

is the same as:

-Ucap8

When a p specifier is given without a c specifier and format specifiers are given before the p specifier, those format specifiers apply to the input. For example:

-Uopl

is the same as:

-Ucopl

When c or p is specified with no format specifiers, little-endian 16-bit wide characters are used by default for either input or output, as appropriate.

As an alternative to specifying formats for both input and output with the same -U option, you can specify the -U option multiple times. For example, the following are identical:

-Uca -Upb
-Ucapb

Note:

The -U specifiers are actually case-insensitive. For example, the following are all identical in their behavior:

-Ucl
-UcL
-UCl
-UCL
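
The rules above for reading a -U specifier string can be modelled with a small shell function. This is only a sketch of one reading of those rules, not the parser used by wc: parse_u_spec is a hypothetical helper, it does not call wc, and the word "default" in its output simply means that no format was selected for that direction.

```shell
# Hypothetical helper: report which input and output formats a -U
# specifier string (given without the leading -U) would select.
parse_u_spec() {
    spec=$(printf '%s' "$1" | tr 'A-Z' 'a-z')  # specifiers are case-insensitive
    target=input   # formats given before any c or p apply to the input
    in_fmt='' out_fmt='' saw_c='' saw_p=''
    while [ -n "$spec" ]; do
        rest=${spec#?}          # everything after the first character
        ch=${spec%"$rest"}      # the first character itself
        spec=$rest
        case $ch in
            c) target=input;  saw_c=1 ;;
            p) target=output; saw_p=1 ;;
            [lb8oa])
                # The last format given for each direction wins.
                if [ "$target" = input ]; then in_fmt=$ch; else out_fmt=$ch; fi ;;
            *) echo "invalid specifier: $ch" >&2; return 2 ;;
        esac
    done
    # c or p with no format defaults to little-endian 16-bit (l).
    if [ -n "$saw_c" ] && [ -z "$in_fmt" ]; then in_fmt=l; fi
    if [ -n "$saw_p" ] && [ -z "$out_fmt" ]; then out_fmt=l; fi
    echo "input=${in_fmt:-default} output=${out_fmt:-default}"
}

parse_u_spec coapl8   # same result as: parse_u_spec cap8
parse_u_spec opl      # same result as: parse_u_spec copl
parse_u_spec CL       # same result as: parse_u_spec cl
```

Under this model the equivalent pairs listed above produce the same result, for example input=a output=8 for both coapl8 and cap8.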

-w

displays a word count.

ENVIRONMENT VARIABLES

TK_STDIO_DEFAULT_INPUT_FORMAT: Sets the default input format for files that don't have the initial multibyte marker. The value must be one of those listed in the File Character Formats section of the unicode reference page.
TK_STDIO_DEFAULT_OUTPUT_FORMAT: Sets the default output format. Normally the format of the first file read is used as the default output format. The value must be one of those listed in the File Character Formats section of the unicode reference page.

DIAGNOSTICS

Possible exit status values are:

0: Successful completion.
1: Failure because of an inability to open the input file.
2: Failure because of an invalid command line option.
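
A script can act on these values by capturing the exit status of wc. The sketch below assumes that /no/such/file (a hypothetical path) does not exist; it tests only for a nonzero status, because other wc implementations may not use the same values as those listed above.

```shell
# Capture the exit status without letting the failure abort the script.
status=0
wc -l /no/such/file >/dev/null 2>&1 || status=$?

if [ "$status" -ne 0 ]; then
    echo "wc failed with exit status $status" >&2
fi
```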

PORTABILITY

POSIX.2. X/Open Portability Guide 4.0. All UNIX systems. Windows 8.1. Windows Server 2012 R2. Windows 10. Windows Server 2016. Windows Server 2019. Windows 11. Windows Server 2022.

AVAILABILITY

PTC MKS Toolkit for Power Users
PTC MKS Toolkit for System Administrators
PTC MKS Toolkit for Developers
PTC MKS Toolkit for Interoperability
PTC MKS Toolkit for Professional Developers
PTC MKS Toolkit for Professional Developers 64-Bit Edition
PTC MKS Toolkit for Enterprise Developers
PTC MKS Toolkit for Enterprise Developers 64-Bit Edition
