
Write a Bash script to generate datasets and random files of various sizes

Updated on: Oct 3, 2021

Data that mimics real-world data is usually the best, but sometimes we need an assortment of files of various content and size for validation testing without delay. Imagine that you have a web server running an application that accepts files for storage, and that the application enforces a size limit on those files. Wouldn’t it be great to just whip up a batch of files in an instant?

To do this, we can use a few file system features such as /dev/random and a useful program called dd. The dd command is a utility that converts and copies files (including devices, thanks to Linux’s concept that everything is a file, more or less). It is used in a later recipe to back up data on an SD card (remember your favorite Raspberry Pi project?), and it can “chomp” through files byte by byte without losses. A typical minimal dd invocation is $ dd if="inputFile" of="outputFile" bs=1M count=10. From this command, we can see:

  • if=: Stands for input file
  • of=: Stands for output file
  • bs=: Stands for block size
  • count=: Stands for the number of blocks to be copied

The bs= and count= options can be omitted if you want to perform a 1:1 (pure duplicate) copy of a file, because dd then falls back to its default block size (512 bytes) and copies until it reaches the end of the input. The dd command also has a number of other options such as seek=, which will be explored later when performing low-level backups in another recipe. The count option is typically not needed, as it’s far more common to copy an entire file than a section of one (when performing backups).
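As a minimal sketch of the options above, the following creates a small file with bs= and count=, then duplicates it without either option. The temporary directory and the file names source.bin and copy.bin are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: dd with and without bs=/count=. All paths are illustrative.

workdir=$(mktemp -d)

# 10 blocks of 1024 bytes read from /dev/zero give a 10240-byte file.
dd if=/dev/zero of="$workdir/source.bin" bs=1024 count=10 2>/dev/null

# No bs= or count=: dd copies the whole input until end of file.
dd if="$workdir/source.bin" of="$workdir/copy.bin" 2>/dev/null

# $(( ... )) strips any padding that wc may add around the number.
source_size=$(( $(wc -c < "$workdir/source.bin") ))
copy_size=$(( $(wc -c < "$workdir/copy.bin") ))
echo "source: $source_size bytes, copy: $copy_size bytes"
```

The 2>/dev/null redirection only hides the transfer statistics that dd prints on standard error; drop it to see them.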


/dev/random is a device in Linux (hence the /dev path) which can be used to produce random numbers for use in your scripts or applications. There are also other /dev paths such as the console and various adaptors (for example, USB sticks or mice), all of which may be accessible, and gaining knowledge of them is recommended.
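A short sketch of reading random data from such a device follows. Note the substitution: it reads from /dev/urandom, the non-blocking sibling of /dev/random, so the example never waits for entropy. The file name random.bin is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pull random bytes from a device file (assumes /dev/urandom exists).

workdir=$(mktemp -d)

# Save 16 random bytes to a file.
head -c 16 /dev/urandom > "$workdir/random.bin"
random_size=$(( $(wc -c < "$workdir/random.bin") ))

# Interpret 4 random bytes as one unsigned decimal integer.
# od flags: -An hides the offset column, -N4 reads 4 bytes, -tu4 prints them as one number.
random_number=$(od -An -N4 -tu4 /dev/urandom | tr -d '[:space:]')

echo "file: $random_size bytes, number: $random_number"
```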


To get ready for this recipe, make sure dd and hexdump are available and make a new directory called qa-data/. The dd command is part of coreutils and is normally preinstalled, so there is no separate dd package to install; on Debian-based systems, the command below installs hexdump through the bsdmainutils package:

    $ sudo apt-get install coreutils bsdmainutils
    $ mkdir qa-data

This script uses the dmesg command, which prints the kernel’s message buffer, including details such as interface status and the system boot process. It is nearly always present on a system and is therefore a reasonable source of system-level “lorem ipsum”. If you wish to use another type of random text, or a dictionary of words, dmesg can easily be replaced! Two other commands used are seq and hexdump. The seq command generates a sequence of numbers from a starting point to an end point using a specified increment, and hexdump produces a human-readable hexadecimal representation of a binary file (or executable).
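To make seq and hexdump more concrete, here is a small sketch. The values 1, 3 and 10 and the sample string are arbitrary, and the hexdump call is skipped if the command is not installed.

```shell
#!/usr/bin/env bash
# Sketch: seq and hexdump in isolation.

# seq FIRST INCREMENT LAST prints 1, 4, 7 and 10, one per line.
seq 1 3 10

# Collect the sequence into a bash array. Word splitting is safe here
# because seq only prints numbers.
sizes=( $(seq 1 3 10) )
echo "${#sizes[@]} values, last is ${sizes[${#sizes[@]} - 1]}"

# hexdump -C shows each byte in hex next to its printable form.
if command -v hexdump >/dev/null 2>&1; then
  printf 'ABC\n' | hexdump -C
else
  echo "hexdump is not installed" >&2
fi
```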

Write Script:

Open a terminal and create a new script called data-maker.sh.

The following is the code snippet of the script:
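The listing below is a minimal sketch of what such a script could look like, not a definitive version. It assumes the -t, -n, -l and -u flags used in the commands that follow, plus the qa-data/ directory and garbageN file names seen in the output. The function names, the default values and the fallback to seq output when dmesg cannot be read are all assumptions.

```shell
#!/usr/bin/env bash
# data-maker.sh (sketch): generate text or binary test files of random size.
# Function names, defaults and the dmesg fallback are assumptions.

DIRECTORY="qa-data"
NAME="garbage"

# Print a pseudo-random integer between $1 and $2 (inclusive),
# picked from an array built with seq.
random_between() {
  local lower=$1 upper=$2
  local values=( $(seq "$lower" "$upper") )
  echo "${values[RANDOM % ${#values[@]}]}"
}

# Create $1 zero-filled files of between $2 and $3 MB each.
create_binary_files() {
  local count=$1 lower=$2 upper=$3 i size
  for (( i = 0; i < count; i++ )); do
    size=$(random_between "$lower" "$upper")
    dd if=/dev/zero of="${DIRECTORY}/${NAME}${i}.bin" bs=1M count="$size" 2>/dev/null
  done
}

# Create $1 text files of between $2 and $3 characters each.
create_text_files() {
  local count=$1 lower=$2 upper=$3 i size start source_file source_size
  source_file=$(mktemp)
  # dmesg may be unreadable for unprivileged users, so tolerate failure.
  dmesg > "$source_file" 2>/dev/null || true
  source_size=$(( $(wc -c < "$source_file") ))
  if (( source_size < upper )); then
    # Fallback text: seq 1 N always yields at least N bytes.
    seq 1 "$upper" > "$source_file"
    source_size=$(( $(wc -c < "$source_file") ))
  fi
  for (( i = 0; i < count; i++ )); do
    size=$(random_between "$lower" "$upper")
    # Pick a starting byte that leaves room for $size bytes.
    start=$(( RANDOM % (source_size - size + 1) ))
    dd if="$source_file" of="${DIRECTORY}/${NAME}${i}.txt" \
       bs=1 skip="$start" count="$size" 2>/dev/null
  done
  rm -f "$source_file"
}

main() {
  local type="binary" count=5 lower=1 upper=10 opt
  local OPTIND=1
  while getopts "t:n:l:u:" opt; do
    case "$opt" in
      t) type=$OPTARG ;;
      n) count=$OPTARG ;;
      l) lower=$OPTARG ;;
      u) upper=$OPTARG ;;
      *) echo "Usage: $0 -t text|binary -n FILES -l LOWER -u UPPER" >&2
         return 1 ;;
    esac
  done
  if (( lower > upper )); then
    echo "Lower bound must not exceed upper bound" >&2
    return 1
  fi
  mkdir -p "$DIRECTORY"
  case "$type" in
    text)   create_text_files "$count" "$lower" "$upper" ;;
    binary) create_binary_files "$count" "$lower" "$upper" ;;
    *)      echo "Unknown type: $type" >&2; return 1 ;;
  esac
}

# Run only when flags are supplied on the command line.
if [ "$#" -gt 0 ]; then
  main "$@"
fi
```

In this sketch, text sizes are counted in bytes and binary sizes in MB, matching the two runs shown in this article.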


Let’s begin the execution of the script using the following command. The -t flag sets the type (text here), -n sets the number of files (5), -l sets the lower bound (1 character), and -u sets the upper bound (1000 characters):

    $ bash data-maker.sh -t text -n 5 -l 1 -u 1000

To check the output, use the following commands:

    $ ls -la qa-data/*.txt
    $ tail qa-data/garbage4.txt

Again, let’s run the data-maker.sh script, but this time for binary files. Instead of the size limits being 1 char (1 byte) and 1000 chars (1000 bytes, or just under one kilobyte), the sizes are in MB, giving files of between 1 and 10 MB:

    $ bash data-maker.sh -t binary -n 5 -l 1 -u 10

To check the output, use the following commands. We use the hexdump command because we cannot “dump” or “cat” a binary file the same way as we can a “regular” ASCII text file:

    $ ls -la qa-data/*.bin
    $ hexdump qa-data/garbage0.bin
    0000000 0000 0000 0000 0000 0000 0000 0000 0000
    *

How the Script Works:

Let’s go through what is happening:

First, we create the data-maker.sh script. This script introduces several new concepts, including the ever-fascinating concept of randomization. Computers are deterministic, so the numbers they generate in software are pseudo-random rather than truly random, and producing good ones relies on mathematical principles and sources of entropy. While this is beyond the scope of this cookbook, know that whenever you initialize or reuse a generator, you should give it a unique initialization vector or seed. Using a for loop, we can build an array of numbers using the seq command. Once the array is built, we choose a “random” value from the array. In each type of file output operation (binary or text), we set approximate minimum (-l, or lower) and maximum (-u, or upper) sizes to control the output data.
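The seeding and selection steps described above can be sketched as follows. The variable names, the 1 to 1000 range and the seed value 42 are arbitrary choices for illustration; the sketch relies on the documented bash behavior that assigning a value to RANDOM seeds its generator.

```shell
#!/usr/bin/env bash
# Sketch: build an array with seq in a for loop, then pick a "random" entry.

candidates=()
for n in $(seq 1 1000); do
  candidates+=("$n")
done

# Assigning to RANDOM seeds bash's generator, so the same seed
# reproduces the same pick within the same shell.
RANDOM=42
first_pick=${candidates[RANDOM % ${#candidates[@]}]}

RANDOM=42
second_pick=${candidates[RANDOM % ${#candidates[@]}]}

echo "picked $first_pick, then $second_pick after reseeding"
```

A fixed seed is handy for repeatable test data; for files that differ on every run, seed from something that changes, such as the process ID ($$) or the current time.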

In step 2, we build five text files using the output of dmesg and our pseudo-randomization process. The script iterates until five text files have been created, using the dd command with a different size and starting point for each.
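The “size and starting point” idea maps onto dd’s skip= and count= options. Here is a minimal sketch, using seq output as a stand-in for dmesg text and fixed numbers (offset 50, length 20) in place of the random picks:

```shell
#!/usr/bin/env bash
# Sketch: carve a slice out of a text file with dd. Paths are illustrative.

workdir=$(mktemp -d)
seq 1 200 > "$workdir/source.txt"   # stand-in for dmesg output

# With bs=1, skip= and count= are both measured in bytes:
# skip the first 50 bytes, then copy the next 20.
dd if="$workdir/source.txt" of="$workdir/slice.txt" bs=1 skip=50 count=20 2>/dev/null

slice_size=$(( $(wc -c < "$workdir/slice.txt") ))
echo "slice: $slice_size bytes"
```

A block size of one byte is slow for large files, but it keeps the arithmetic simple for small text samples like these.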

In step 3, we verify that five files were indeed created, and we view the tail of the fifth one, garbage4.txt.

In step 4, we create five binary files (full of zeros) using the dd command. Instead of a number of characters, the sizes are given in megabytes (MB).

In step 5, we verify that five binary files were indeed created, and we view the contents of the first one, garbage0.bin, using the hexdump command. The hexdump command produces a simplified “dump” of all of the bytes inside the garbage0.bin file. Because every byte is zero, it prints a single line of zeros followed by *, which stands for the identical lines that follow.
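The same check can be tried on a much smaller file. In this sketch, the 64-byte zeros.bin file is made up for illustration, hexdump is only called if it is installed, and a second check with tr and wc confirms that no non-zero bytes are present.

```shell
#!/usr/bin/env bash
# Sketch: inspect a zero-filled file. Paths are illustrative.

workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/zeros.bin" bs=16 count=4 2>/dev/null

# hexdump prints identical consecutive lines once and marks the repeats with "*".
if command -v hexdump >/dev/null 2>&1; then
  hexdump "$workdir/zeros.bin"
fi

# Independent check: delete every NUL byte and count what is left.
zeros_size=$(( $(wc -c < "$workdir/zeros.bin") ))
nonzero_bytes=$(( $(tr -d '\0' < "$workdir/zeros.bin" | wc -c) ))
echo "size: $zeros_size bytes, non-zero bytes: $nonzero_bytes"
```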
