split

The split command divides a file into smaller pieces. It is a standard file-processing utility on Linux and Unix-like operating systems.

Quick Reference

Command Name: split
Category: file processing
Platform: Linux/Unix
Basic Usage: split [options] [input [prefix]]

    Syntax

    split [options] [input [prefix]]

    Options

    Option                        Description
    -a, --suffix-length=N         Generate suffixes of length N (default 2)
    -b, --bytes=SIZE              Put SIZE bytes per output file
    -C, --line-bytes=SIZE         Put at most SIZE bytes of complete lines per output file
    -d                            Use numeric suffixes starting at 0 instead of alphabetic
    --numeric-suffixes[=FROM]     Same as -d, but allows setting the start value FROM
    -x                            Use hexadecimal suffixes starting at 0 instead of alphabetic
    --hex-suffixes[=FROM]         Same as -x, but allows setting the start value FROM
    -l, --lines=NUMBER            Put NUMBER lines per output file (default 1000)
    -n, --number=CHUNKS           Generate CHUNKS output files (cannot be combined with -b, -C, or -l)
    --filter=COMMAND              Write to shell COMMAND; file name is $FILE
    -u, --unbuffered              Immediately copy input to output with '-n r/...'
    --additional-suffix=SUFFIX    Append an additional SUFFIX to file names
    --help                        Display help and exit
    --version                     Output version information and exit

    SIZE Unit    Description
    K            1 Kilobyte (1024 bytes)
    M            1 Megabyte (1024 K)
    G            1 Gigabyte (1024 M)
    T            1 Terabyte (1024 G)
    P            1 Petabyte (1024 T)

    In GNU split, the two-letter forms KB, MB, GB, and so on denote powers of 1000 instead of 1024.
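To make the SIZE units concrete, here is a minimal sketch that assumes GNU coreutils `split`. The file name `sample.bin` and the prefixes `bin_` and `dec_` are made up for illustration. It splits the same 3000-byte file once with `K` (1024 bytes) and once with `KB` (1000 bytes).

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a 3000-byte sample file (hypothetical name).
head -c 3000 /dev/zero > sample.bin

# K means 1024 bytes: pieces of 1024, 1024 and 952 bytes.
split -b 1K sample.bin bin_

# KB means 1000 bytes: three pieces of exactly 1000 bytes each.
split -b 1KB sample.bin dec_

# Show the size of every piece.
wc -c bin_* dec_*
```

Either way three files are written, but their sizes differ, which matters when the pieces have to fit a strict size limit.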

    Examples

    How to Use These Examples

    The examples below show common ways to use the split command. Try them in your terminal to see the results.

    # Basic Examples
    split large_file.txt
    Split large_file.txt into 1000-line chunks with prefix 'x'.
    split -l 100 data.csv data_part_
    Split data.csv into 100-line chunks with prefix 'data_part_'.
    split -b 10M large_file.bin chunk_
    Split large_file.bin into 10 MB (10 x 1024 x 1024 bytes) chunks with prefix 'chunk_'.

    # Advanced Examples

    # Split a file and add numeric suffixes
    split -d -l 500 large_file.txt segment_
    # Creates files like segment_00, segment_01, etc.

    # Split with specified suffix length
    split -l 100 -a 4 data.txt output_
    # Creates output_aaaa, output_aaab, etc. with 4-char suffixes

    # Split a file and compress each part
    split -b 50M large_file.iso iso_part_ --filter='gzip > $FILE.gz'

    # Split and process each part with a custom command
    # (grep exits non-zero when a chunk has no match, which split reports as a
    # filter failure, hence the "|| true")
    split -l 1000 access.log log_ --filter='grep "ERROR" > $FILE.errors || true'

    # Split with hexadecimal suffixes
    split -l 100 -x data.txt hex_
    # Creates hex_00, hex_01, ..., up to hex_ff

    # Split a file into a specific number of chunks
    total_lines=$(wc -l < huge_file.txt)
    # Round up so that no more than 5 files are produced
    lines_per_chunk=$(( (total_lines + 4) / 5 ))
    split -l "$lines_per_chunk" huge_file.txt chunk_
    # Splits into approximately 5 equal parts

    # Split a CSV file preserving the header in each part
    head -n 1 data.csv > header.tmp
    tail -n +2 data.csv | split -l 1000 - chunk_
    for f in chunk_*; do
        cat header.tmp "$f" > "with_header_$f"
        rm "$f"
    done
    rm header.tmp

    # Use with pipes for on-the-fly splitting
    cat large_file.txt | split -l 500 - output_

    # Split and number output lines (numbering restarts at 1 in each part)
    split -l 100 --additional-suffix=.txt data.txt part_ --filter='nl > $FILE'
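The header-preserving CSV example can also be done in one pass with `--filter`. This avoids the temporary header file and the rename loop. It is a sketch only, assuming GNU coreutils `split`. The sample data, the 10-row chunk size, and the `chunk_` prefix are made up for illustration.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Sample CSV: one header line plus 25 data rows (hypothetical data).
{ echo "id,value"; seq 1 25 | sed 's/$/,x/'; } > data.csv

# Feed only the data rows to split. For every chunk, the filter first writes
# the header line and then copies the chunk from standard input into $FILE.
tail -n +2 data.csv |
  split -l 10 --additional-suffix=.csv \
        --filter='{ head -n 1 data.csv; cat; } > "$FILE"' - chunk_

# Each part now holds the header plus up to 10 data rows.
wc -l chunk_*.csv
```

The filter command runs once per output file, with `$FILE` set to the name `split` would otherwise have written, including the additional suffix.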

    Try It Yourself

    Practice makes perfect! The best way to learn is by trying these examples on your own system with real files.

    Notes

    The `split` command is a versatile utility in Unix-like operating systems that divides a file into multiple smaller pieces. It is particularly useful for large files that need to be transferred across networks, stored on media with size limitations, or processed in smaller, more manageable chunks.

    At its most basic level, `split` divides files based on size or line count. By default, it splits a file into 1000-line segments and names them with the prefix 'x' followed by alphabetic suffixes (xaa, xab, xac, etc.). Its options make this behavior highly configurable.

    Key features of the `split` command include:

    1. Multiple Splitting Criteria: Files can be split by line count (`-l`), by byte size (`-b`), or by a maximum number of bytes per file while keeping lines whole (`-C`).
    2. Customizable Output Naming: Users can specify a custom prefix for output files and control the suffix format, which can be alphabetic (default), numeric (`-d`), or hexadecimal (`-x`), as well as the suffix length (`-a`).
    3. Fixed Number of Chunks: With the `-n` option, a file can be divided into a specific number of approximately equal-sized pieces, rather than pieces of a fixed size.
    4. Processing Capability: The `--filter` option pipes each output chunk to a specified command instead of writing it directly to a file, enabling on-the-fly transformations or analyses.
    5. Suffix Customization: Beyond controlling the format and length of suffixes, additional text can be appended to output filenames using `--additional-suffix`.
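As an illustration of the fixed-number-of-chunks feature, the sketch below assumes GNU coreutils `split` and compares two forms of `-n`. The file name `numbers.txt` and the prefixes `bytes_` and `lines_` are made up for illustration.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Sample input: 1003 lines, one number per line.
seq 1 1003 > numbers.txt

# -n 4: four pieces of nearly equal byte size; a line may be cut in two
# at a piece boundary.
split -n 4 numbers.txt bytes_

# -n l/4: four pieces as well, but lines are never split across files.
split -n l/4 numbers.txt lines_

# Compare the line counts of the resulting pieces.
wc -l bytes_* lines_*
```

The `l/N` form is usually the safer choice for line-oriented data such as logs or CSV files, because every piece can then be processed on its own.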
    Common use cases for the `split` command include:

    - Breaking large log files into smaller, more manageable chunks for analysis
    - Splitting large datasets for parallel processing
    - Dividing large files for transmission over networks with size limitations
    - Creating backup archives that fit on specific media
    - Distributing workloads among multiple processors or systems
    - Preparing large files for cloud upload where there might be size restrictions

    Note that `split` only divides the original file and does not modify it. To reassemble the pieces into the original, one would typically use the `cat` command with the generated files in the correct order.

    The `split` command is particularly powerful when combined with other Unix utilities in pipelines, allowing data processing workflows that handle large volumes of data efficiently.
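The reassembly step described above can be sketched as follows. The file names `original.bin` and `restored.bin` and the prefix `part_` are made up for illustration. The example relies on the shell expanding `part_*` in sorted order, which matches the order in which `split` wrote the pieces.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a 100000-byte file of random data to act as the original.
head -c 100000 /dev/urandom > original.bin

# Split into 30000-byte pieces: part_aa, part_ab, part_ac, part_ad.
split -b 30000 original.bin part_

# Concatenate the pieces in glob (sorted) order to rebuild the file.
cat part_* > restored.bin

# cmp exits with status 0 only when both files are byte-for-byte identical.
cmp original.bin restored.bin && echo "files are identical"
```

After a transfer, comparing a checksum of the rebuilt file with one taken from the original (for example with `sha256sum`) serves the same purpose when the original is not available locally.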

