Let’s not be shy! Who has tried to open a large file, by accident or even intentionally, with an application, only to have it not quite go as planned? I certainly have, and I have certainly seen the limitations, such as the maximum number of rows loaded in Excel or OpenOffice Calc. In these cases, we use a handy tool that can split files at arbitrary points, such as the following:
- Before X number of lines
- Before Z number of bytes/chars
In this article, you will create two complementary scripts: one that takes an input file and splits it into multiple files, and a second that joins files back together using a combining method. There are a few caveats when passing around string variables, as they:
- Can sometimes lose special characters such as newlines (see the short demonstration below)
- Should, in the case of binary data, be handled by different tools than the usual command-line utilities
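For instance, command substitution strips all trailing newlines when output is captured into a variable; the following can be run directly in a terminal:
$ VAR=$(printf 'one\ntwo\n\n\n')
$ printf '%s' "${VAR}" | wc -c
7
Only seven characters survive (one, a newline, and two); the trailing newlines are gone.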
This recipe also reuses getopts parameter parsing, but it introduces the mktemp command and the getconf command with the PAGESIZE variable. Mktemp is a useful command because it can produce unique temporary files that reside in the /tmp directory, and it can even produce unique files that follow a template (notice the XXXX; this will be replaced with random values, but the uniquefile. prefix will remain):
$ mktemp uniquefile.XXXX
Another useful command is the getconf programming utility, which is a standards-compliant tool designed to fetch useful system variables. One in particular, called PAGESIZE, reports the size of a single page (block) of memory. Obviously, this is putting it in very simplistic terms, but choosing an appropriate size when writing data can be very beneficial performance-wise.
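As a quick illustration (4096 bytes is a typical value on an x86-64 Linux system; yours may report something different):
$ getconf PAGESIZE
4096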
Prerequisites
Besides having a terminal open, a single text file called input-lines needs to be created with the following content (one character on each line):
input-lines
1
2
3
4
5
6
7
8
9
0
a
b
c
d
e
f
g
h
i
j
k
Next, create a second file called merge-lines with the following content:
merge-lines
It's -17 outside
Write the scripts:
Open a terminal and create a script named file-splitter.sh. The following is the code snippet:
file-splitter.sh
#!/bin/bash
FNAME=""
LEN=10
TYPE="line"
OPT_ERROR=0
set -f # disable globbing so that nothing in the input is expanded

# Check whether the input is ASCII text and warn if it looks binary
function determine_type_of_file() {
    local FILE="$1"
    file -b "${FILE}" | grep "ASCII text" > /dev/null
    RES=$?
    if [ $RES -eq 0 ]; then
        echo "ASCII file - continuing"
    else
        echo "Not an ASCII file, perhaps it is Binary?"
    fi
}
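The snippet above shows only the setup and the file-type check; the full listing is not reproduced here. Based on the behavior described later (getopts parsing of -i, -t, and -l, a concatenated ${BUFFER}, and a final flush at EOF), a minimal sketch of the remainder might look like the following. Treat it as an approximation rather than the exact original; the COUNT and NFILE helper variables in particular are illustrative names:

while getopts "i:t:l:" OPT; do
    case ${OPT} in
        i) FNAME="${OPTARG}";;
        t) TYPE="${OPTARG}";;
        l) LEN="${OPTARG}";;
        *) OPT_ERROR=1;;
    esac
done

if [ -z "${FNAME}" ] || [ ${OPT_ERROR} -ne 0 ]; then
    echo "Usage: $0 -i inputFile -t line|size -l length"
    exit 1
fi

determine_type_of_file "${FNAME}"

BUFFER=""
COUNT=0
NFILE=1
if [ "${TYPE}" = "line" ]; then
    # Accumulate whole lines until the -l threshold is reached
    while IFS= read -r LINE; do
        BUFFER="${BUFFER}${LINE}"$'\n'
        COUNT=$((COUNT + 1))
        if [ "${COUNT}" -ge "${LEN}" ]; then
            printf '%s' "${BUFFER}" > "${FNAME}.${NFILE}"
            echo "Wrote buffer to file: ${FNAME}.${NFILE}"
            BUFFER=""; COUNT=0; NFILE=$((NFILE + 1))
        fi
    done < "${FNAME}"
else
    # Accumulate single characters; read -n 1 silently drops newlines
    while IFS= read -r -n 1 CHAR; do
        BUFFER="${BUFFER}${CHAR}"
        COUNT=$((COUNT + 1))
        if [ "${COUNT}" -ge "${LEN}" ]; then
            printf '%s' "${BUFFER}" > "${FNAME}.${NFILE}"
            echo "Wrote buffer to file: ${FNAME}.${NFILE}"
            BUFFER=""; COUNT=0; NFILE=$((NFILE + 1))
        fi
    done < "${FNAME}"
fi

# Flush whatever remains after EOF so the last, short chunk is not lost
if [ -n "${BUFFER}" ]; then
    printf '%s' "${BUFFER}" > "${FNAME}.${NFILE}"
    echo "Wrote buffer to file: ${FNAME}.${NFILE}"
fi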
Next, run file-splitter.sh with this command and flags (-i, -t, -l):
$ bash file-splitter.sh -i input-lines -t line -l 10
Review the output and see what the difference is when -t size is used instead of -t line. What about when -l 1 or -l 100 is used? Remember to remove the split files between runs using $ rm input-lines.*:
$ rm input-lines.*
$ bash file-splitter.sh -i input-lines -t line -l 10
$ rm input-lines.*
$ bash file-splitter.sh -i input-lines -t line -l 1
$ rm input-lines.*
$ bash file-splitter.sh -i input-lines -t line -l 100
$ rm input-lines.*
$ bash file-splitter.sh -i input-lines -t size -l 10
In the next step, create another script called file-joiner.sh. The following is the code snippet:
file-joiner.sh
#!/bin/bash
INAME=""
ONAME=""
FNAME=""
WHERE=""
OPT_ERROR=0
TMPFILE1=$(mktemp) # unique temporary file used as a storage buffer

# Check whether the input is ASCII text and warn if it looks binary
function determine_type_of_file() {
    local FILE="$1"
    file -b "${FILE}" | grep "ASCII text" > /dev/null
    RES=$?
    if [ $RES -eq 0 ]; then
        echo "ASCII file - continuing"
    else
        echo "Not an ASCII file, perhaps it is Binary?"
    fi
}
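As before, only the setup is shown above. Assuming the behavior described below (getopts parsing of -i, -o, -f, and -w, assembling the result in the mktemp buffer, then moving it into place with mv), a minimal sketch of the remainder follows. The head/tail assembly is one plausible implementation, not necessarily the original’s:

while getopts "i:o:f:w:" OPT; do
    case ${OPT} in
        i) INAME="${OPTARG}";;
        o) ONAME="${OPTARG}";;
        f) FNAME="${OPTARG}";;
        w) WHERE="${OPTARG}";;
        *) OPT_ERROR=1;;
    esac
done

if [ -z "${INAME}" ] || [ -z "${ONAME}" ] || [ -z "${FNAME}" ] \
   || [ -z "${WHERE}" ] || [ ${OPT_ERROR} -ne 0 ]; then
    echo "Usage: $0 -i original -o toMerge -f finalFile -w lineNumber"
    exit 1
fi

determine_type_of_file "${INAME}"
determine_type_of_file "${ONAME}"

# Assemble the merged result in the temporary buffer so that neither
# original file is modified
head -n "${WHERE}" "${INAME}" > "${TMPFILE1}"
cat "${ONAME}" >> "${TMPFILE1}"
tail -n +"$((WHERE + 1))" "${INAME}" >> "${TMPFILE1}"

# Move the buffer out of /tmp into the current directory (.) under its
# final name; mv renames rather than copies when both sit on one filesystem
mv "${TMPFILE1}" "./${FNAME}"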
Next, run the script using this command:
$ bash file-joiner.sh -i input-lines -o merge-lines -f final-join.txt -w 2
How the scripts work:
Before proceeding, notice that the size type option (-t size) in file-splitter.sh ignores \n newlines when reading in characters one at a time. Read suffices for the purposes of this recipe, but the reader should be aware that read/cat are not the best tools for this type of work.
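A quick one-liner demonstrates the loss; every newline that read -n 1 encounters comes back as an empty string, so only the visible characters survive:
$ printf 'a\nb\n' | while IFS= read -r -n 1 C; do printf '%s' "$C"; done
ab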
Creating the script was trivial and for the most part shouldn’t look like it came from the planet Mars.
Running the $ bash file-splitter.sh -i input-lines -t line -l 10 command should produce three files: input-lines.1 through input-lines.3. The reason there are three files is that the same input is 21 lines long, so splitting at every 10 lines yields three files (10+10+1). Using read and echo with a concatenated buffer (${BUFFER}), we can write to a file based on a specific criterion (provided by -l). When EOF (end of file) is reached and the loop is done, we need to write the buffer to the file one last time, because it may still be under the threshold of the write criterion; skipping this step would result in lost/missing bytes in the last file created by the splitter script:
$ bash file-splitter.sh -i input-lines -t line -l 10
ASCII file - continuing
Wrote buffer to file: input-lines.1
Wrote buffer to file: input-lines.2
Wrote buffer to file: input-lines.3
Depending on the usage of the -l flag, a value of 1 will produce a file for every line, and a value of 100 will produce a single file because everything fits under the threshold. Using the side feature -t size, which splits based on bytes, read has an unfortunate side effect: when we pass the buffer, it is altered and the newlines are missing. This sort of activity would be better served by a tool such as dd, which is better suited to copying, writing, and creating raw data in files or on devices.
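For comparison, this is one way dd could carve fixed-size byte chunks out of the same input while preserving the newline bytes (a hypothetical usage, not part of the recipe’s scripts; note that skip counts blocks of bs, not bytes):
$ dd if=input-lines of=input-lines.1 bs=10 count=1 skip=0
$ dd if=input-lines of=input-lines.2 bs=10 count=1 skip=1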
Next, we created the script called file-joiner.sh. Again, it used getopts and requires four input parameters: -i originalFile, -o otherFileToMerge, -f finalMergedFile, and -w whereToInjectTheOtherFile. The script is simpler overall, but it uses the mktemp command to create a temporary file, which we can use as a storage buffer without modifying the originals. When we are finished, we can use the mv command to move the file from /tmp to the terminal’s current directory (.). The mv command can also be used to rename files and is usually faster than cp (not so much in this case) because a copy does not occur; rather, just a renaming operation happens at the filesystem level.
Catting final-join.txt should produce the following output:
Output:
$ cat final-join.txt
1
2
It's -17 outside
3
4
5
6
7
8
9
0
a
b
c
d
e
f
g
h
i
j
k