Write a bash script to generate datasets and random files of various sizes

Data that mimics real-world data is usually best, but sometimes we need an assortment of files of various content and size for validation testing, and we need them right away. Imagine a web server running an application that accepts files for storage and enforces a size limit on them. Wouldn’t it be great to whip up a batch of test files in an instant?


To do this, we can use a few file system features such as /dev/random and a useful program called dd. The dd command is a utility that converts and copies files (including devices, thanks to the Linux concept that everything is a file, more or less). It can be used in a later recipe to back up data on an SD card (remember your favorite Raspberry Pi project?) or to “chomp” through files byte by byte without losses. A typical minimal dd invocation is $ dd if="inputFile" of="outputFile" bs=1M count=10. In this command:

if=: Stands for input file
of=: Stands for output file
bs=: Stands for block size
count=: Stands for the number of blocks to be copied

Both bs= and count= can be omitted when you want a 1:1 (pure duplicate) copy of a file: dd then copies until the end of the input using its default block size of 512 bytes, which works but is slower on large files than a bigger block size. The dd command also has a number of other options such as seek=, which will be explored later when performing low-level backups in another recipe. The count option is typically not needed for backups, as it’s far more common to copy an entire file than a section of it.
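As a minimal sketch of the options above, the following creates a 10 MB file of zeros and checks its size. It assumes GNU dd, where bs=1M means 1,048,576 bytes; /dev/zero as the input and the temporary directory are illustrative choices, not part of the recipe.

```shell
#!/usr/bin/env bash
# Illustrative only: work in a throwaway directory instead of qa-data/.
work_dir=$(mktemp -d)

# Copy 10 blocks of 1 MiB each from /dev/zero into a new file.
# dd reports its statistics on stderr, so they are silenced here.
dd if=/dev/zero of="$work_dir/zeros.bin" bs=1M count=10 2>/dev/null

# 10 blocks * 1048576 bytes = 10485760 bytes
size=$(wc -c < "$work_dir/zeros.bin" | tr -d '[:space:]')
echo "created $size bytes"

rm -rf "$work_dir"
```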

Note:

/dev/random is a device in Linux (hence the /dev path) which can be used to produce random numbers for use in your scripts or applications. There are also other /dev paths such as the console and various adaptors (for example, USB sticks or mice), all of which may be accessible, and gaining knowledge of them is recommended.
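For example, a script can turn a couple of bytes from the kernel’s random device into a number. This sketch reads from /dev/urandom, the non-blocking sibling of /dev/random; that substitution is an assumption made here so the example never stalls. The od command is used only to print the bytes as an unsigned integer.

```shell
#!/usr/bin/env bash
# Read exactly 2 bytes (-N2) and print them as one unsigned 16-bit
# integer (-tu2) without the offset column (-An).
# tr strips the padding whitespace that od adds.
rand_u16=$(od -An -N2 -tu2 /dev/urandom | tr -d '[:space:]')
echo "random value between 0 and 65535: $rand_u16"

# Scale it into a smaller range, here 1..10.
echo "random value between 1 and 10: $(( rand_u16 % 10 + 1 ))"
```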

Prerequisites

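To get ready for this recipe, make sure the dd and hexdump commands are available and make a new directory called qa-data/. There is no separate dd package: dd ships with coreutils, which is normally installed already, while hexdump is provided by bsdmainutils on Debian-based systems:

$ sudo apt-get install coreutils bsdmainutils
$ mkdir qa-data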

This script uses the dmesg command, which prints kernel messages such as interface status changes and boot-time events. It is nearly always present on a system and is therefore a good source of system-level “lorem ipsum”. If you wish to use another type of random text, or a dictionary of words, dmesg can easily be replaced! Two other commands are used: seq and hexdump. The seq command generates a sequence of numbers from a starting point to an end point using a specified increment, and hexdump produces a human-readable hexadecimal representation of a binary file (such as an executable).
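To get a feel for these two helpers before they appear in the script, here is a small sketch. The numbers passed to seq and the string fed to hexdump are arbitrary examples; the -C flag (hex plus ASCII display) is optional, and the hexdump call is skipped if the command is not installed.

```shell
#!/usr/bin/env bash
# seq FIRST INCREMENT LAST: count from 1 to 10 in steps of 3.
# paste joins the lines into one space-separated line.
steps=$(seq 1 3 10 | paste -sd ' ' -)
echo "$steps"    # 1 4 7 10

# hexdump -C shows each byte in hex next to its ASCII form.
if command -v hexdump >/dev/null 2>&1; then
    printf 'ABC' | hexdump -C
fi
```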

Write the Script:

Open a terminal and create a new script called data-maker.sh.

The following is the code snippet of the script:

data-maker.sh

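#!/bin/bash
# Defaults for a run. The rest of the script (not shown in this excerpt)
# handles the -t, -n, -l and -u flags used in the commands below.
N_FILES=3
TYPE=binary
DIRECTORY="qa-data"
NAME="garbage"
EXT=".bin"
UNIT="M"
RANDOM=$$   # initial seed: the PID of this shell
TMP_FILE="/tmp/tmp.datamaker.sh"

# Usage: get_random_number STEP LOWER UPPER
# Prints a pseudo-random value picked from LOWER, LOWER+STEP, ... up to UPPER.
function get_random_number() {
	# Reseed from the clock (seconds plus nanoseconds; %N needs GNU date).
	local SEED=$(($(date +%s%N)/100000))
	RANDOM=$SEED
	# Pause so that back-to-back calls are seeded from clearly different times.
	sleep 3
	local STEP=$1
	local VARIANCE=$2
	local UPPER=$3
	local LOWER=$VARIANCE
	local ARR=()
	local INC=0
	local N

	# Build the list of candidate values.
	for N in $(seq "${LOWER}" "${STEP}" "${UPPER}"); do
		ARR[INC]=$N
		INC=$((INC+1))
	done
	# Pick a random index and print the value stored there.
	local RAND=$((RANDOM % ${#ARR[@]}))
	echo "${ARR[RAND]}"
}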
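The excerpt above does not show how the command-line flags are read. One possible way, sketched here with the getopts builtin, maps the -t, -n, -l and -u flags used in the commands below onto variables. The function name parse_args, the LOWER and UPPER variables, and the switch of EXT to ".txt" for text output are assumptions made for illustration, not the original author’s code.

```shell
#!/usr/bin/env bash
# Defaults, as in the excerpt (LOWER and UPPER are hypothetical additions).
N_FILES=3
TYPE=binary
EXT=".bin"
LOWER=1
UPPER=10

# parse_args: hypothetical helper that maps flags onto the variables above.
parse_args() {
    local opt
    OPTIND=1   # restart option parsing in case getopts was used before
    while getopts "t:n:l:u:" opt "$@"; do
        case "$opt" in
            t) TYPE=$OPTARG ;;
            n) N_FILES=$OPTARG ;;
            l) LOWER=$OPTARG ;;
            u) UPPER=$OPTARG ;;
            *) echo "usage: data-maker.sh [-t text|binary] [-n files] [-l lower] [-u upper]" >&2
               return 1 ;;
        esac
    done
    # Assumed convention: text files get a .txt extension.
    if [ "$TYPE" = "text" ]; then
        EXT=".txt"
    fi
}

parse_args -t text -n 5 -l 1 -u 1000
echo "type=$TYPE files=$N_FILES lower=$LOWER upper=$UPPER ext=$EXT"
```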

Let’s run the script with the following command. The -t flag sets the type (text here), -n sets the number of files (5), -l sets the lower size bound (1 character), and -u sets the upper size bound (1000 characters):

$ bash data-maker.sh -t text -n 5 -l 1 -u 1000

To check the output, use the following commands:

$ ls -la qa-data/*.txt 
$ tail qa-data/garbage4.txt

Let’s run the data-maker.sh script again, this time for binary files. Instead of size limits of 1 character (1 byte) and 1000 characters (1000 bytes, just under one kilobyte), the sizes are now in megabytes (MB), producing files of 1–10 MB:

$ bash data-maker.sh -t binary -n 5 -l 1 -u 10

To check the output, use the following commands. We use hexdump here because we cannot “dump” or “cat” a binary file the same way as a “regular” ASCII text file:

$ ls -la qa-data/*.bin 
$ hexdump qa-data/garbage0.bin 
0000000 0000 0000 0000 0000 0000 0000 0000 0000 
*

How the Script Works:

Let’s walk through what is happening:

First, we create the data-maker.sh script. This script introduces several new concepts, including the ever-fascinating concept of randomization. A computer running a deterministic algorithm cannot produce truly random numbers on its own; it produces pseudo-random sequences, and anything closer to true randomness depends on an external source of entropy. While the details are beyond the scope of this cookbook, know that whenever you use a pseudo-random generator such as $RANDOM, whether for the first time or when reusing it, you should give it a unique seed (initialization value). Using a for loop, we build an array of numbers from the output of the seq command. Once the array is built, we choose a “random” value from it. For each type of output file (binary or text), we set approximate minimum (-l or lower) and maximum (-u or upper) sizes to control the output data.
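The effect of the seed can be seen in a short sketch: assigning the same value to RANDOM restarts the same pseudo-random sequence, which is why the script reseeds from the clock. The seed value 42 and the variable names are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Seeding RANDOM with the same value restarts the same sequence.
RANDOM=42
first_run="$RANDOM $RANDOM $RANDOM"
RANDOM=42
second_run="$RANDOM $RANDOM $RANDOM"
echo "run 1: $first_run"
echo "run 2: $second_run"   # identical to run 1

# Build the candidate sizes with seq, then pick one by random index.
candidates=($(seq 1 1 10))
pick=${candidates[RANDOM % ${#candidates[@]}]}
echo "picked size: $pick"
```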

In step 2, we build five text files using the output of dmesg and our pseudo-randomization process. The script iterates until five text files have been created, each cut with the dd command at a different starting point and with a different size.
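The part of the script that writes the text files is not shown in the excerpt. A rough sketch of the idea, assuming the text source is first saved to a file and then sliced with dd: skip= sets the starting point and count= sets the size, both in bytes because bs=1. The helper name make_text_file and the seq-generated stand-in text are hypothetical; in the recipe the source would be the output of dmesg.

```shell
#!/usr/bin/env bash
# make_text_file SRC OUT START SIZE (hypothetical helper):
# copy SIZE bytes of SRC, beginning at byte offset START, into OUT.
make_text_file() {
    local src=$1 out=$2 start=$3 size=$4
    dd if="$src" of="$out" bs=1 skip="$start" count="$size" 2>/dev/null
}

work_dir=$(mktemp -d)

# Stand-in for "dmesg > source.txt" so the sketch runs anywhere.
seq 1 2000 > "$work_dir/source.txt"

# Cut a 250-byte slice that starts 100 bytes into the source.
make_text_file "$work_dir/source.txt" "$work_dir/garbage0.txt" 100 250
slice_size=$(wc -c < "$work_dir/garbage0.txt" | tr -d '[:space:]')
echo "garbage0.txt holds $slice_size bytes"
```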

In step 3, we verify that five files were indeed created, and we view the tail of the fifth one, garbage4.txt.
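The same check can be automated instead of read off the ls output. The following sketch assumes a hypothetical helper, check_sizes, that compares each file’s byte count against the lower and upper bounds; the demo files are created on the spot so the example is self-contained.

```shell
#!/usr/bin/env bash
# check_sizes DIR GLOB LOWER UPPER (hypothetical helper): succeed only if
# every matching file has a byte size between LOWER and UPPER, inclusive.
check_sizes() {
    local dir=$1 glob=$2 lower=$3 upper=$4 file size status=0
    for file in "$dir"/$glob; do
        [ -e "$file" ] || { echo "no files match $glob" >&2; return 1; }
        size=$(wc -c < "$file" | tr -d '[:space:]')
        if [ "$size" -lt "$lower" ] || [ "$size" -gt "$upper" ]; then
            echo "out of range: $file ($size bytes)" >&2
            status=1
        fi
    done
    return "$status"
}

demo_dir=$(mktemp -d)
printf 'hello' > "$demo_dir/garbage0.txt"         # 5 bytes
printf 'hello world' > "$demo_dir/garbage1.txt"   # 11 bytes
check_sizes "$demo_dir" '*.txt' 1 1000 && echo "all files within bounds"
```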

In step 4, we create five binary files (full of zeros) using the dd command. Instead of a number of characters, the sizes are given in megabytes (MB).
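Since the file-creation loop is not part of the excerpt, here is one way it might look, reusing the variable names from the top of the script. The temporary directory (instead of qa-data/), the three-file count and the 1 to 3 MB size range are assumptions chosen to keep the sketch quick to run.

```shell
#!/usr/bin/env bash
# A temp dir stands in for qa-data/ so the sketch is self-contained.
DIRECTORY=$(mktemp -d)
NAME="garbage"
EXT=".bin"
UNIT="M"

for i in 0 1 2; do
    # Pick a size between 1 and 3 blocks.
    size=$(( RANDOM % 3 + 1 ))
    # bs=1M with count=N writes N * 1048576 zero bytes (GNU dd).
    dd if=/dev/zero of="${DIRECTORY}/${NAME}${i}${EXT}" \
        bs="1${UNIT}" count="$size" 2>/dev/null
done

ls -l "$DIRECTORY"
```

Replacing /dev/zero with /dev/urandom would fill the files with pseudo-random bytes instead of zeros.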

In step 5, we verify that five binary files were indeed created, and we view the contents of the first one, garbage0.bin, using the hexdump command. The hexdump command produces a simplified “dump” of all the bytes inside the file; the lone * in the output stands for repeated lines identical to the one above it, which is why a file full of zeros prints so compactly.
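The squeezing of repeated lines can be observed on a tiny file. This sketch assumes the util-linux/BSD hexdump, whose -v flag prints every line; the 64-byte file is an arbitrary example, and the whole demonstration is skipped if hexdump is not installed.

```shell
#!/usr/bin/env bash
zero_file=$(mktemp)
# 64 zero bytes fill four 16-byte lines of hexdump output.
dd if=/dev/zero of="$zero_file" bs=64 count=1 2>/dev/null

if command -v hexdump >/dev/null 2>&1; then
    hexdump "$zero_file"       # first line, "*", then the final offset
    hexdump -v "$zero_file"    # -v prints all four lines, no "*"
    squeezed_lines=$(hexdump "$zero_file" | wc -l | tr -d '[:space:]')
    full_lines=$(hexdump -v "$zero_file" | wc -l | tr -d '[:space:]')
    echo "squeezed: $squeezed_lines lines, full: $full_lines lines"
else
    echo "hexdump is not installed; skipping" >&2
fi
```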



