csplit

split a text file according to criteria 

Command


SYNOPSIS

csplit [-Aaks] [-f prefix] [-n number] file arg arg ...


DESCRIPTION

csplit takes a text file as input and breaks up its contents into pieces, based on criteria given by the arg value(s) on the command line. For example, you can use csplit to break up a text file into chunks of ten lines each, then save each of those chunks in a separate file. See the subsection Splitting Criteria for more details. If you specify - as the file argument, csplit uses the standard input.
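As a minimal sketch of the ten-line example above, the following commands pipe 30 generated lines into csplit through the - file argument. The sketch assumes a POSIX-style shell and that mktemp is available; the scratch directory and the generated input are illustrative only.

```shell
# Work in a scratch directory so the output files are easy to find.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Generate 30 lines and pipe them to csplit; "-" selects the standard input.
# The criterion 11 ends the first chunk just before line 11, and 21 ends the
# second chunk just before line 21. The remaining lines form the last chunk.
# The chunks are written to files with the default names xx00, xx01, and xx02.
i=1
while [ "$i" -le 30 ]; do
    echo "line $i"
    i=$((i + 1))
done | csplit - 11 21

# Show how many lines ended up in each output file.
wc -l xx*
```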

The files created by csplit normally have names of the form

xxnumber

where number is a two-digit decimal number that begins at zero and increments by one for each new file that csplit creates.

csplit also displays the size, in bytes, of each file that it creates.
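The sketch below shows the default names and the size report on a small input. The file name sample.txt is hypothetical, and the byte counts in the comments assume that each line ends with a single newline character.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Build a 20-line input file containing the numbers 1 to 20, one per line.
i=1
while [ "$i" -le 20 ]; do
    echo "$i"
    i=$((i + 1))
done > sample.txt

# Split just before line 11. Lines 1-10 go to xx00 and lines 11-20 to xx01.
# csplit writes the size of each new file to the standard output:
# 21 bytes for xx00 and 30 bytes for xx01.
sizes=$(csplit sample.txt 11)
echo "$sizes"

ls xx*
```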

Options

-A 

uses uppercase letters in place of numbers in the number portion of created file names. This generates names of the form xxAA, xxAB, and so on.

-a 

uses lowercase letters in place of numbers in the number portion of created file names. This generates names of the form xxaa, xxab, and so on.

-f prefix 

specifies a prefix to use in place of the default xx when naming files. If prefix would produce a file name longer than NAME_MAX bytes, an error occurs and csplit exits without creating any files.

-k 

leaves all created files intact. Normally, when an error occurs, csplit removes files that it has created.

-n number 

specifies the number of digits in the number portion of created file names. The default is two digits.

-s 

suppresses the display of file sizes.
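The following sketch combines -s, -f, and -n. The prefix part and the file name sample.txt are hypothetical. The -A and -a options are left out because they are extensions that other csplit implementations may not accept.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Build a 20-line input file containing the numbers 1 to 20, one per line.
i=1
while [ "$i" -le 20 ]; do
    echo "$i"
    i=$((i + 1))
done > sample.txt

# -s       suppress the size report
# -f part  name the files part... instead of xx...
# -n 3     use three digits in the numeric portion
# The criteria 6, 11, and 16 produce four five-line chunks:
# part000, part001, part002, and part003.
csplit -s -f part -n 3 sample.txt 6 11 16

ls part*
```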

Splitting Criteria

csplit processes the args on the command line sequentially. The first argument breaks off the first chunk of the file, the second argument breaks off the next chunk (beginning at the first line remaining in the file), and so on. Thus each chunk of the file begins with the first line remaining in the file and extends up to, but does not include, the line selected by the next arg. Any lines that remain after the last arg form the final chunk.

arg values may take any of the following forms:

/regexp/

takes the chunk as all the lines from the current line up to but not including the next line that contains a string matching the regular expression regexp. regexp is a basic regular expression (see regexp). After csplit has obtained the chunk and written it to an output file, it sets the current line to the line that matched regexp.

/regexp/offset 

is the same as the previous criterion, except that the chunk goes up to but not including the line that is the given offset from the first line containing a string that matches regexp. The offset may be a positive or negative integer. After csplit has obtained the chunk and written it to an output file, it sets the current line to the line at that offset from the matching line, so that the next chunk begins there.

%regexp%

is the same as /regexp/ except that csplit does not write the chunk to an output file. It simply skips over the chunk.

%regexp%offset 

is the same as /regexp/offset except csplit does not write the chunk to an output file.

linenumber 

obtains a chunk beginning at the current line and going up to but not including line number linenumber. After csplit writes the chunk to an output file, it sets the current line to linenumber.

{number}

repeats the previous criterion number times. If it follows a regular expression criterion, it repeats the regular expression process number more times. If it follows a linenumber criterion, csplit splits the file every linenumber lines, number times, beginning at the current line. For example,

csplit file 10 {10}

obtains a chunk from line 1 to line 9, then every 10 lines after that, up to line 109.
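The regular expression criteria can be sketched in the same way. The input file sections.txt, its contents, and the prefixes sec, body, and off are hypothetical. The arguments are quoted so that the shell passes them to csplit unchanged.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical input: a preamble followed by three sections marked with "== ".
cat > sections.txt <<'EOF'
preamble
== one
alpha
== two
beta
== three
gamma
EOF

# /regexp/ with a repeat count: split before each of the three "== " lines.
# sec00 holds the preamble; sec01, sec02, and sec03 each begin with a heading.
csplit -s -f sec sections.txt '/^== /' '{2}'

# %regexp%: skip everything before "== two" without writing it,
# then split what remains before "== three".
csplit -s -f body sections.txt '%^== two%' '/^== three/'

# /regexp/offset: end the first chunk one line after the "== two" line.
csplit -s -f off sections.txt '/^== two/+1'

# Print the first line of every output file.
head -n 1 sec0* body0* off0*
```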

Errors occur if any criterion tries to grab lines beyond the end of the file, if a regular expression does not match any line between the current line and the end of the file, or if an offset refers to a position before the current line or past the end of the file.
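To see how such an error interacts with -k, the sketch below asks for more chunks than a 20-line file can supply, first without -k and then with it. The file name sample.txt is hypothetical. The exact exit status and the number of files kept may differ between csplit implementations, so the sketch only reports them.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Build a 20-line input file containing the numbers 1 to 20, one per line.
i=1
while [ "$i" -le 20 ]; do
    echo "$i"
    i=$((i + 1))
done > sample.txt

# The criterion 10 {10} needs at least 109 lines, so this command fails.
# Without -k, csplit removes the files it created before the error.
if csplit -s sample.txt 10 '{10}' 2>/dev/null; then
    status_default=0
else
    status_default=$?
fi
left_default=$(ls xx* 2>/dev/null | wc -l)

# With -k, the same failing command leaves the created files in place.
if csplit -s -k sample.txt 10 '{10}' 2>/dev/null; then
    status_keep=0
else
    status_keep=$?
fi
left_keep=$(ls xx* 2>/dev/null | wc -l)

echo "without -k: exit status $status_default, files left: $left_default"
echo "with -k:    exit status $status_keep, files left: $left_keep"
```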


DIAGNOSTICS

Possible exit status values are:

0 

Successful completion.

1 

Failure due to any of the following:

— csplit was unable to open the input or output files
— a write error on the output file

2 

Failure due to any of the following:

— unknown command line option
— the prefix name was missing after -f
— the number of digits was missing after -n
— the input file was not specified
— no arg values were specified
— the command ran out of memory
— an arg was invalid
— the command found end-of-file prematurely
— a regular expression in an arg was badly formed
— a line offset/number in an arg was badly formed
— a {number} repetition count was misplaced or badly formed
— too many file names were generated when using -n
— generated file names would be too long


PORTABILITY

POSIX.2. X/Open Portability Guide 4.0. All UNIX systems. Windows 8.1. Windows Server 2012 R2. Windows 10. Windows Server 2016. Windows Server 2019. Windows 11. Windows Server 2022.

The -A and -a options are extensions to the POSIX and XPG standards.


AVAILABILITY

PTC MKS Toolkit for Power Users
PTC MKS Toolkit for System Administrators
PTC MKS Toolkit for Developers
PTC MKS Toolkit for Interoperability
PTC MKS Toolkit for Professional Developers
PTC MKS Toolkit for Professional Developers 64-Bit Edition
PTC MKS Toolkit for Enterprise Developers
PTC MKS Toolkit for Enterprise Developers 64-Bit Edition


SEE ALSO

Commands:
awk, sed

Miscellaneous:
regexp


PTC MKS Toolkit 10.4 Documentation Build 39.