Besides normal ASCII text files, MKS Toolkit utilities also support UTF-8 files and 16-bit wide Unicode files (that is, files using UTF-8 characters and 16-bit wide Unicode characters, respectively). You can also include such characters on MKS Toolkit command lines and in path names. However, MKS Toolkit utilities cannot handle non-OEM characters in file names unless the locale supports double-byte characters (such as the Japanese locale). Consequently, even though the utilities support UTF-8 and Unicode characters in files on all platforms, for maximum portability across all Windows platforms, all file names used in scripts for utilities like awk, sh, csh, and others should contain only ASCII characters from the OEM code page.
Normally, when a file is read by an MKS Toolkit utility, the utility
determines its format and the type of characters it contains and will
use that same format for any output it produces. The key to determining
the file format is the multiple-byte marker usually found at the beginning
of UTF-8 and Unicode files. This marker indicates whether the file's
contents are Unicode big-endian, Unicode little-endian, or UTF-8.
However, when the multiple-byte marker is not present, you can set the
TK_STDIO_DEFAULT_INPUT_FORMAT and TK_STDIO_DEFAULT_OUTPUT_FORMAT
environment variables (see ENVIRONMENT VARIABLES)
to force any input or output to be treated as Unicode or UTF-8.
Additionally, some utilities (such as cat,
more, or tail) feature a -U option that serves the same purpose.
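The marker-based detection described above can be sketched in Python. This is a minimal illustration, not the MKS Toolkit implementation: it assumes the multiple-byte marker is the standard Unicode byte order mark, and the function name `detect_format` and its `default` parameter are hypothetical. The returned names are the format values used in this document.

```python
import codecs

# Standard byte order marks, mapped to the format names used in this document.
# Assumption: the "multiple-byte marker" is the standard Unicode BOM.
BOM_FORMATS = [
    (codecs.BOM_UTF8, "UTF-8"),                      # EF BB BF
    (codecs.BOM_UTF16_BE, "UNICODE_BIG_ENDIAN"),     # FE FF
    (codecs.BOM_UTF16_LE, "UNICODE_LITTLE_ENDIAN"),  # FF FE
]


def detect_format(data: bytes, default: str = "ASCII") -> str:
    """Return the format implied by a leading marker, else the default."""
    for bom, name in BOM_FORMATS:
        if data.startswith(bom):
            return name
    # No marker: fall back to a default, such as the value of
    # TK_STDIO_DEFAULT_INPUT_FORMAT.
    return default


print(detect_format(b"\xef\xbb\xbfhello"))   # UTF-8
print(detect_format(b"plain text"))          # ASCII
```

Passing `default="UTF-8"` models the effect of forcing unmarked input to be treated as UTF-8.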
When multiple input files are specified for a utility, the format of the output generated is normally the same as the first input file specified. There are, however, two exceptions to this.
When the utility first reads all input files, processes them, and then generates output (for example, diff), the output is normally in the format of the first specified input file unless the input files are a mix of ASCII and Unicode/UTF-8 formats. In that case, the output format is the format of the first non-ASCII input file specified.
When multiple input files are being read and multiple output files are being generated (as can often be the case with awk and perl scripts), the format of a given output file (or standard output) depends upon which input files have been read at the time of the output file's creation. If only ASCII format input files have been read at that time, the output file is created using the format of the first ASCII file read. If, however, only non-ASCII (Unicode or UTF-8) files or a mix of ASCII and non-ASCII files have been read, the output file is created using the format of the first non-ASCII input file read.
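Both exceptions reduce to the same selection rule, applied to whichever input files have been read when the output is created. The following Python sketch models that rule only; the function name `output_format` is hypothetical, and the point at which a given utility evaluates the rule is utility-specific.

```python
ASCII_FORMATS = {"ASCII", "ASCII_ANSI", "ASCII_OEM"}


def output_format(formats_read):
    """Pick the output format from the formats of the input files read so far."""
    if not formats_read:
        raise ValueError("no input files have been read")
    for fmt in formats_read:
        if fmt not in ASCII_FORMATS:
            return fmt          # the first non-ASCII input file wins
    return formats_read[0]      # only ASCII inputs: use the first one


# A utility that reads every input before writing (as diff does):
print(output_format(["ASCII_OEM", "UTF-8", "UNICODE_BIG_ENDIAN"]))  # UTF-8

# A script that creates an output file after reading only the first input:
print(output_format(["ASCII_OEM"]))                                 # ASCII_OEM
```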
When deciding how files should be treated, the input and output code pages are also taken into account. MKS Toolkit supports the system OEM and ANSI code pages and normally sets the output code page to match the input code page.
For the TK_STDIO_DEFAULT_INPUT_FORMAT and TK_STDIO_DEFAULT_OUTPUT_FORMAT environment variables, the true meaning of the value ASCII depends upon the appropriate input or output code page. If the code page is the system ANSI code page, ASCII is equivalent to ASCII_ANSI; otherwise, it is equivalent to ASCII_OEM.
For ASCII input, when TK_STDIO_DEFAULT_INPUT_FORMAT is not set and the input code page is the system ANSI code page, the input is assumed to be in ASCII_ANSI format. Otherwise, it is assumed to be in ASCII_OEM format.
For ASCII output, when TK_STDIO_DEFAULT_OUTPUT_FORMAT is not set and the output code page is the system ANSI code page, the output is written in ASCII_ANSI format. Otherwise, it is written in ASCII_OEM format.
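The three rules above share one decision: an unset variable or the generic value ASCII resolves to ASCII_ANSI when the relevant code page is the system ANSI code page, and to ASCII_OEM otherwise. A minimal Python sketch follows; the function name `resolve_ascii` is hypothetical, and code pages 1252 and 437 are only example values for the ANSI and OEM code pages.

```python
def resolve_ascii(fmt, code_page, ansi_code_page):
    """Resolve an unset or generic ASCII format against a code page.

    fmt is the value of TK_STDIO_DEFAULT_INPUT_FORMAT or
    TK_STDIO_DEFAULT_OUTPUT_FORMAT (None when unset); code_page is the
    matching input or output code page.
    """
    if fmt is None or fmt == "ASCII":
        return "ASCII_ANSI" if code_page == ansi_code_page else "ASCII_OEM"
    return fmt  # explicit formats are used unchanged


# Example system (assumed): ANSI code page 1252, OEM code page 437.
print(resolve_ascii("ASCII", 1252, 1252))  # ASCII_ANSI
print(resolve_ascii(None, 437, 1252))      # ASCII_OEM
```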
To change the input and output code page, use the stty utility's cp command (for example, stty cp 437). To display a list of available code pages on your system or display the current code pages, use the sysinf utility's codepages command (for example, sysinf codepages -c).
By default, Windows systems use the OEM code page for all console input and output. Most MKS Toolkit non-graphical utilities use console input/output as does viw (for compatibility with vi). As a result, files that have been saved in ANSI format (for example, by Notepad) may not be correctly read or displayed by these utilities. To read and display these files correctly, you can do one of the following:
use the stty cp command to set the code page to the ANSI code page
specify the utility's -U option (if available)
assign the appropriate value to the TK_STDIO_DEFAULT_INPUT_FORMAT environment variable
use the -f option of the appropriate vi Ex command to force the file to be read as ASCII_ANSI. For example, you might enter the following on the Ex command line of vi to save the current file in ASCII_ANSI format with the name file1:
    :w -f ASCII_ANSI file1
Finally, the font selected for a console window may also affect how characters are displayed. For example, the Lucida Console font allows a greater number of characters to be displayed. When a character cannot be displayed in the console window using the current font, the system default character is displayed in its place.
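The ANSI/OEM mismatch described above is easy to reproduce outside MKS Toolkit. The Python sketch below is illustrative only: it assumes a Western system where the ANSI code page is 1252 and the OEM code page is 437, and it uses Python's built-in `cp1252` and `cp437` codecs rather than any MKS Toolkit facility.

```python
# Text as Notepad would save it in ANSI format (assumed code page 1252).
ansi_bytes = "café".encode("cp1252")
print(ansi_bytes)                      # b'caf\xe9'

# A console utility that interprets the same bytes in the OEM code page
# (assumed 437) sees a different character for byte 0xE9.
print(ansi_bytes.decode("cp437"))      # cafΘ

# Reading the bytes with the ANSI code page restores the original text.
print(ansi_bytes.decode("cp1252"))     # café
```

Each of the remedies listed above amounts to making the reader use the ANSI interpretation in the last step.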
MKS Toolkit supports text files with characters stored in a variety of formats.
MKS Toolkit utilities support specifying the precise input and output format
of files handled through the use of environment variables (see
ENVIRONMENT VARIABLES).
The following values indicate ASCII files:
ASCII_ANSI    ASCII characters from the ANSI code page
ASCII_OEM     ASCII characters from the OEM code page
ASCII         same as ASCII_OEM or ASCII_ANSI depending on input code page
ANSI          same as ASCII_ANSI
OEM           same as ASCII_OEM
A             same as ASCII_ANSI
O             same as ASCII_OEM
The following values indicate Unicode/UTF-8 (non-ASCII) files:
UNICODE_BIG_ENDIAN       Big endian 16-bit wide characters
UNICODE_LITTLE_ENDIAN    Little endian 16-bit wide characters
UTF-8                    UTF-8 characters
UNICODE                  same as UNICODE_LITTLE_ENDIAN
L                        same as UNICODE_LITTLE_ENDIAN
B                        same as UNICODE_BIG_ENDIAN
UTF8                     same as UTF-8
8                        same as UTF-8
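The equivalences in the two lists above can be captured as a lookup table that reduces any accepted value to its full name. This Python sketch is an illustration only; the names `ALIASES`, `CANONICAL`, and `canonical_format` are hypothetical, and the exact-case matching is an assumption, since this document does not say whether values are case-sensitive.

```python
# Short forms and the full names they stand for, as listed above.
ALIASES = {
    "ANSI": "ASCII_ANSI", "A": "ASCII_ANSI",
    "OEM": "ASCII_OEM", "O": "ASCII_OEM",
    "UNICODE": "UNICODE_LITTLE_ENDIAN", "L": "UNICODE_LITTLE_ENDIAN",
    "B": "UNICODE_BIG_ENDIAN",
    "UTF8": "UTF-8", "8": "UTF-8",
}

# Full names; ASCII stays generic until resolved against a code page.
CANONICAL = {
    "ASCII", "ASCII_ANSI", "ASCII_OEM",
    "UNICODE_BIG_ENDIAN", "UNICODE_LITTLE_ENDIAN", "UTF-8",
}


def canonical_format(value: str) -> str:
    """Map an accepted format value to its full name."""
    name = ALIASES.get(value, value)
    if name not in CANONICAL:
        raise ValueError(f"unknown format value: {value!r}")
    return name


print(canonical_format("8"))        # UTF-8
print(canonical_format("UNICODE"))  # UNICODE_LITTLE_ENDIAN
```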
Contains the format to be used by cpio, tar, pax, vpax, zip, or unzip when reading and writing file names to an archive. The value must be one of ASCII_ANSI, ASCII_OEM, or UTF-8 (or their equivalents) as described in the File Character Formats section above.
When this variable is unset or it is set to a value other than those listed earlier, the default OEM character set is used.
TK_CMDSUB_FORMAT contains the format to be used for the output from command substitution in MKS KornShell (the `command_line` and $(command_line) structures) and MKS C Shell (the `command_line` structure). The value must be one of those listed in the File Character Formats section above.
When TK_CMDSUB_FORMAT is not set, the value of the TK_STDIO_DEFAULT_INPUT_FORMAT environment variable is used as the default format.
When TK_CMDSUB_FORMAT and TK_STDIO_DEFAULT_INPUT_FORMAT are both unset, either ASCII_OEM or ASCII_ANSI is used as the default format, as dictated by the current code page. This provides compatibility with older versions of MKS Toolkit.
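The fallback order for command substitution can be sketched as follows. This is a model of the rules stated above, not shell code: the function name `cmdsub_format` is hypothetical, an empty value is treated as unset (an assumption), and the code-page rule for the final fallback is assumed to match the ANSI/OEM rule given earlier for the value ASCII.

```python
def cmdsub_format(env, code_page, ansi_code_page):
    """Return the format used for command substitution output."""
    # TK_CMDSUB_FORMAT wins; otherwise the default input format is used.
    for name in ("TK_CMDSUB_FORMAT", "TK_STDIO_DEFAULT_INPUT_FORMAT"):
        value = env.get(name)
        if value:
            return value
    # Both unset: the current code page decides (assumed ANSI/OEM rule).
    return "ASCII_ANSI" if code_page == ansi_code_page else "ASCII_OEM"


# Example values (assumed): ANSI code page 1252, OEM code page 437.
print(cmdsub_format({"TK_STDIO_DEFAULT_INPUT_FORMAT": "UTF-8"}, 437, 1252))  # UTF-8
print(cmdsub_format({}, 437, 1252))                                          # ASCII_OEM
```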
TK_HEREDOC_FORMAT contains the format to be used for here documents in MKS KornShell and MKS C Shell. The value must be one of those listed in the File Character Formats section above.
When TK_HEREDOC_FORMAT and TK_STDIO_DEFAULT_OUTPUT_FORMAT are both unset, here documents are assumed to use ASCII_OEM characters. This provides compatibility with older versions of MKS Toolkit.
When TK_STDIO_DEFAULT_OUTPUT_FORMAT is set to a Unicode/UTF-8 format and you are feeding a here document to a non-MKS Toolkit utility that won't understand its format, you should set TK_HEREDOC_FORMAT to ASCII_OEM and export it.
When TK_HEREDOC_FORMAT is unset or TK_STDIO_DEFAULT_OUTPUT_FORMAT is set to an ASCII format and your here document contains non-ASCII (OEM) characters, you should set TK_HEREDOC_FORMAT to UTF-8 and export it.
This variable takes precedence over TK_STDIO_DEFAULT_OUTPUT_FORMAT.
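The precedence for here documents can be modeled in a few lines of Python. The function name `heredoc_format` is hypothetical, and treating an empty value as unset is an assumption.

```python
def heredoc_format(env):
    """Return the format used for here documents."""
    return (env.get("TK_HEREDOC_FORMAT")
            or env.get("TK_STDIO_DEFAULT_OUTPUT_FORMAT")
            or "ASCII_OEM")  # both unset: compatibility default


# Output defaults to UTF-8, so here documents follow unless overridden.
env = {"TK_STDIO_DEFAULT_OUTPUT_FORMAT": "UTF-8"}
print(heredoc_format(env))             # UTF-8

# Override for a non-MKS Toolkit utility that cannot read UTF-8.
env["TK_HEREDOC_FORMAT"] = "ASCII_OEM"
print(heredoc_format(env))             # ASCII_OEM
```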
TK_STDIO_DEFAULT_INPUT_FORMAT sets the default input format for files that don't have the initial multiple-byte marker. The value must be one of those listed in the File Character Formats section above.
TK_STDIO_DEFAULT_OUTPUT_FORMAT sets the default output format. Normally the format of the first file read is used as the default output format. The value must be one of those listed in the File Character Formats section above.
MKS Toolkit for Power Users
MKS Toolkit for System Administrators
MKS Toolkit for Developers
MKS Toolkit for Interoperability
MKS Toolkit for Professional Developers
MKS Toolkit for Enterprise Developers
MKS Toolkit 9.5 Documentation Build 3.