Data that mimics real-world data is usually best, but sometimes we need an assortment of files of various content and sizes for validation testing without delay. Imagine that you have a web server running an application that accepts files for storage, but enforces a size limit on them. Wouldn’t it be great to just whip up a batch of files in an instant?
To do this, we can use a few file system features, such as /dev/random, and a useful program called dd. The dd command is a utility that can be used to convert and copy files (including devices, thanks to Linux’s concept of everything being a file, more or less). It will be used in a later recipe to back up data on an SD card (remember your favorite Raspberry Pi project?) or to “chomp” through files byte by byte without losses. Typical minimal dd usage looks like this:
$ dd if="inputFile" of="outputFile" bs=1M count=10
From this command, we can see:
if=: Stands for input file
of=: Stands for output file
bs=: Stands for block size
count=: Stands for the number of blocks to be copied
Options bs= and count= are optional if you want to perform a 1:1 (pure duplicate) copy of a file, because dd will attempt to use reasonably efficient parameters to provide adequate performance. The dd command also has a number of other options, such as seek=, which will be explored later when performing low-level backups in another recipe. The count= option is typically not needed, as it is far more common to copy an entire file than a section of it (when performing backups).
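For instance, a pure 1:1 copy needs nothing more than the input and output files (the file names here are only placeholders):
$ dd if="someInput.iso" of="someCopy.iso"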
Note:
/dev/random is a device in Linux (hence the /dev path) which can be used to produce random numbers for use in your scripts or applications. There are also other /dev paths, such as the console and various adaptors (for example, USB sticks or mice), all of which may be accessible, and gaining knowledge of them is recommended.
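As a quick aside, you can read a few bytes from these devices directly; /dev/urandom is the non-blocking counterpart to /dev/random:
$ head -c 8 /dev/urandom | hexdump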
Prerequisites
To get ready for this recipe, make sure the dd and hexdump commands are available (dd ships with the coreutils package, and hexdump is provided by bsdmainutils on Debian/Ubuntu) and make a new directory called qa-data/:
$ sudo apt-get install coreutils bsdmainutils
$ mkdir qa-data
This script uses the dmesg command, which returns system information such as interface status and messages from the system boot process. It is nearly always present on a system, which makes it a good source of reasonable system-level “lorem ipsum”. If you wish to use another type of random text, or a dictionary of words, dmesg can easily be replaced! Two other commands used are seq and hexdump. The seq command generates a sequence of numbers from a starting point to an upper bound using a specified increment, and hexdump produces a human-readable representation of a binary (or executable) file in hexadecimal format.
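If these two are unfamiliar, a quick experiment in the terminal shows what each does (any binary would work in place of /bin/ls):
$ seq 1 2 10             # prints 1 3 5 7 9 (start 1, step 2, stop at 10)
$ hexdump /bin/ls | head # hexadecimal view of the first lines of a binary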
Write the Script:
Open a terminal and create a new script called data-maker.sh.
The following is the code snippet of the script:
data-maker.sh
#!/bin/bash
N_FILES=3                          # default number of files to create
TYPE=binary                        # default output type (binary or text)
DIRECTORY="qa-data"                # output directory
NAME="garbage"                     # base name for generated files
EXT=".bin"                         # default extension for binary output
UNIT="M"                           # size unit for binary files (megabytes)
RANDOM=$$                          # seed bash's RANDOM with the script's PID
TMP_FILE="/tmp/tmp.datamaker.sh"   # temporary working file
function get_random_number() {
    # Re-seed RANDOM from the current time in nanoseconds
    SEED=$(($(date +%s%N)/100000))
    RANDOM=$SEED
    # Sleep so that the next call gets a different time-based seed
    sleep 3
    local STEP=$1
    local VARIANCE=$2
    local UPPER=$3
    local LOWER=$VARIANCE
    local ARR
    INC=0
    # Build an array of candidate values from LOWER to UPPER in steps of STEP
    for N in $( seq ${LOWER} ${STEP} ${UPPER} );
    do
        ARR[$INC]=$N
        INC=$(($INC+1))
    done
    # Pick a random index and echo the corresponding value from the array
    RAND=$(($RANDOM % ${#ARR[@]}))
    echo ${ARR[$RAND]}
}
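Only the random-number helper is shown above; the rest of data-maker.sh parses the command-line flags and loops over dd to create the files. A minimal sketch of what that part might look like follows (the getopts handling and exact dd parameters are assumptions for illustration, not a copy of the original script):
# Parse command-line options (assumed getopts-based handling)
while getopts "t:n:l:u:" OPT; do
    case $OPT in
        t) TYPE=$OPTARG ;;      # text or binary
        n) N_FILES=$OPTARG ;;   # number of files to create
        l) LOWER=$OPTARG ;;     # lower size bound
        u) UPPER=$OPTARG ;;     # upper size bound
    esac
done

mkdir -p ${DIRECTORY}

for (( I=0; I<${N_FILES}; I++ )); do
    SIZE=$(get_random_number 1 ${LOWER} ${UPPER})
    if [ "${TYPE}" = "text" ]; then
        # Text files: cut SIZE bytes out of captured dmesg output
        dmesg > ${TMP_FILE}
        dd if=${TMP_FILE} of=${DIRECTORY}/${NAME}${I}.txt bs=1 count=${SIZE} 2> /dev/null
    else
        # Binary files: SIZE megabytes of zeros
        dd if=/dev/zero of=${DIRECTORY}/${NAME}${I}${EXT} bs=1${UNIT} count=${SIZE} 2> /dev/null
    fi
done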
Let’s begin the execution of the script using the following command. It uses the -t flag for type, which is set to text; -n for the number of files, which is 5; -l for the lower bound, which is 1 character; and -u for the upper bound, which is 1000 characters:
$ bash data-maker.sh -t text -n 5 -l 1 -u 1000
To check the output, use the following commands:
$ ls -la qa-data/*.txt
$ tail qa-data/garbage4.txt
Again, let’s run the data-maker.sh script, but this time for binary files. Instead of size limits of 1 char (1 byte) and 1000 chars (1000 bytes, or just less than one kilobyte), the sizes are in MB, producing files of 1–10 MB:
$ bash data-maker.sh -t binary -n 5 -l 1 -u 10
To check the output, use the following commands. We use a new command called hexdump because we cannot “dump” or “cat” a binary file the same way we can a “regular” ASCII text file. In hexdump’s output, a line containing only an asterisk (*) indicates that the preceding line of output repeats:
$ ls -la qa-data/*.bin
$ hexdump qa-data/garbage0.bin
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
How the Script Works:
Let’s understand how things are happening:
First, we create the data-maker.sh script. This script introduces several new concepts, including the ever-fascinating concept of randomization. Computers cannot produce truly random events or numbers on their own; pseudo-random number generation relies on mathematical principles such as entropy. While this is beyond the scope of this cookbook, know that whenever you use a pseudo-random generator, you should give it a unique initialization vector or seed. Using a for loop, we build an array of numbers with the seq command. Once the array is built, we choose a “random” value from it. For each type of file output (binary or text), we determine approximate minimum (-l or lower) and maximum (-u or upper) sizes to control the output data.
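You can see why seeding matters with a quick experiment: giving bash’s RANDOM the same seed replays the same sequence of “random” numbers (42 here is an arbitrary value):
$ RANDOM=42; echo $RANDOM $RANDOM
$ RANDOM=42; echo $RANDOM $RANDOM   # prints the same two numbers again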
In step 2, we build five text files using the output of dmesg and our pseudo-randomization process. We iterate until we have five text files, each created with a different size and starting point passed to the dd command.
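Conceptually, each text file is produced along these lines, with dmesg captured to the temporary file and dd cutting a chunk of it; the count and skip values below are placeholders for the pseudo-random size and starting point:
$ dmesg > /tmp/tmp.datamaker.sh
$ dd if=/tmp/tmp.datamaker.sh of=qa-data/garbage0.txt bs=1 count=750 skip=100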
In step 3, we verify that the five text files were indeed created, and we view the tail of the fifth one, the garbage4.txt file.
In step 4, we create five binary files (full of zeros) using the dd command. Instead of a number of characters, the sizes are specified in megabytes (MB).
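Under the hood, each binary file boils down to a dd copy from /dev/zero; a 5 MB file, for example, would be produced roughly like this (the count value is just an illustration, since the script picks it pseudo-randomly):
$ dd if=/dev/zero of=qa-data/garbage0.bin bs=1M count=5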
In step 5, we verify that the five binary files were indeed created, and we view the contents of one of them (garbage0.bin) using the hexdump command. The hexdump command produces a simplified “dump” of all of the bytes inside the garbage0.bin file.
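If you prefer a dump that also shows printable characters alongside the hex values, hexdump’s canonical format works too:
$ hexdump -C qa-data/garbage0.bin | head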