Writing a bash script that sanitizes user input for repeatable results

One of the best practices for scripts (or programs, for that matter) is controlling user input, not only for security, but also so that input produces predictable results. For example, imagine a user who enters a number instead of a string. Did you check for it? Will it cause your script to exit prematurely? Or will something unforeseen occur, such as the user entering rm -rf /* instead of a valid username?

Limiting user input is also useful to you as the author because it limits the paths users can take and reduces undefined behavior and bugs. If quality assurance is important, this also reduces the number of test cases and the amount of input/output validation you need.


This recipe might introduce some readers to a concept they would like to avoid: software engineering. It’s true that you are probably writing scripts to get a task completed quickly, but if your script is to be used by other people (or for a long time), it’s great to catch errors early, when they occur, and prevent the program from misbehaving.

Let’s look at a step-by-step example using a program that should echo the username of the user who executed the script via a prompt:

The script expects input to be read into a variable using the read command (for example).

The variable is assumed to be a string, but it could be the user’s name, a number, a post address in a foreign country, an email, or even a malicious command.

The script reads the variable and runs the echo command.

The results returned could be garbage, but could also be executed by another script—what could go wrong?

Either way, even if security is not a concern, the robustness of the application should be!
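
The steps above can be sketched with an allow-list check placed between read and echo. This is only a minimal sketch, not the recipe's own script: the function names is_valid_username and prompt_username, the accepted character set, and the 32-character limit are all assumptions made for illustration.

```shell
#!/bin/bash

# Hypothetical helper: succeed only for names made of lowercase letters,
# digits, underscores, or hyphens, starting with a letter or underscore.
# The 32-character limit is an assumption, not a rule from the recipe.
is_valid_username() {
	local re='^[a-z_][a-z0-9_-]{0,31}$'
	[[ "$1" =~ $re ]]
}

# Read one line from standard input and echo it back only if it passes.
prompt_username() {
	local name
	read -r -p "Enter your username: " name
	if is_valid_username "$name"; then
		echo "Hello, ${name}"
	else
		echo "Rejected input" >&2
		return 1
	fi
}
```

Calling prompt_username and typing something like rm -rf /* would take the rejection branch, because spaces, slashes, and asterisks are outside the allowed set.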

How to do it…

Let’s start our activity as follows:

Begin by opening a terminal and creating a new shell script called bad_input.sh with the following contents:

#!/bin/bash
FILE_NAME=$1
echo $FILE_NAME
ls $FILE_NAME

Now, run the following commands:

$ touch TEST.txt 
$ mkdir new_dir/ 
$ bash bad_input.sh "." 
$ bash bad_input.sh "../"

Create a second script called better_input.sh:


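#!/bin/bash
FILE_NAME=$1
# first, strip underscores
FILE_NAME_CLEAN=${FILE_NAME//_/}
# then, remove any sets of double spaces
FILE_NAME_CLEAN=$(sed 's/  //g' <<< "${FILE_NAME_CLEAN}")
# next, replace spaces with underscores
FILE_NAME_CLEAN=${FILE_NAME_CLEAN// /_}
# now, clean out anything that's not alphanumeric or an underscore
FILE_NAME_CLEAN=${FILE_NAME_CLEAN//[^a-zA-Z0-9_]/}
# here you should check to see if the file exists before running the command
ls ${FILE_NAME_CLEAN} # unquoted on purpose: an empty result lists the current directory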

Next, run the script using these commands and note the output:

$ bash better_input.sh "." 
$ bash better_input.sh "../" 
$ bash better_input.sh "anyfile"

Next, create a new script called validate_email.sh to validate email addresses (similarly to how one would validate DNS names):


#!/bin/bash
EMAIL=$1
# -E enables +, so every part needs at least one character; -q silences grep
echo "${EMAIL}" | grep -Eq '^[a-zA-Z0-9._]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+$'
RES=$?
if [ $RES -eq 0 ]; then
	echo "${EMAIL} is valid"
else
	echo "${EMAIL} is NOT valid"
fi

Again, we can test the output:

$ bash validate_email.sh ron.brash@somedomain.com
ron.brash@somedomain.com is valid
$ bash validate_email.sh ron.brashsomedomain.com
ron.brashsomedomain.com is NOT valid

Another common task would be to validate IP addresses. Create another script called validate_ip.sh with the following contents:


#!/bin/bash
IP_ADDR=$1
# split the argument on periods; anything after the fourth octet lands in "extra"
if echo "$IP_ADDR" | { IFS=. read -r octet1 octet2 octet3 octet4 extra;
	[[ "$octet1" =~ ^[0-9]+$ ]] &&
	test "$octet1" -ge 0 && test "$octet1" -le 255 &&
	[[ "$octet2" =~ ^[0-9]+$ ]] &&
	test "$octet2" -ge 0 && test "$octet2" -le 255 &&
	[[ "$octet3" =~ ^[0-9]+$ ]] &&
	test "$octet3" -ge 0 && test "$octet3" -le 255 &&
	[[ "$octet4" =~ ^[0-9]+$ ]] &&
	test "$octet4" -ge 0 && test "$octet4" -le 255 &&
	test -z "$extra"; } 2> /dev/null; then
	echo "${IP_ADDR} is valid"
else
	echo "${IP_ADDR} is NOT valid"
fi

Try running the following commands:

$ bash validate_ip.sh "a.a.a.a" 
$ bash validate_ip.sh "0.a.a.a" 
$ bash validate_ip.sh "" 
$ bash validate_ip.sh "" 
$ bash validate_ip.sh ""

How it works…

Let’s understand our script in detail:

First, we begin by creating the bad_input.sh script—it takes $1 (or argument 1) and runs the list or ls command.

Running the following commands, we can list everything in the current directory or a subdirectory, or even traverse directories backwards! This is clearly not good: similar directory traversal flaws have allowed attackers to read files on web servers outside the intended directory. The idea is to contain the input for predictable results, and to control input instead of allowing everything:

$ touch TEST.txt 
$ mkdir new_dir/ 
$ bash bad_input.sh "." ... 
$ bash bad_input.sh "../" 
../all the files backwards
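
One way to contain the input is to reject it outright rather than pass it to ls. The sketch below assumes the script should only ever list entries directly inside the current directory; the function name is_plain_name is hypothetical and the set of rejected patterns is an assumption, not part of the original recipe.

```shell
#!/bin/bash

# Hypothetical helper: succeed only when the argument is a plain entry name,
# that is, not empty, not "." or "..", and free of "/" and wildcard characters.
is_plain_name() {
	case "$1" in
		""|.|..)            return 1 ;;   # empty, current dir, parent dir
		*/*)                return 1 ;;   # any path separator
		*"*"*|*"?"*|*"["*)  return 1 ;;   # wildcard characters
		*)                  return 0 ;;
	esac
}
```

With this check in front of ls, a call such as is_plain_name "../" fails, so the script can print an error and exit instead of listing the parent directory.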

In the second script, better_input.sh, the input is sanitized by the following steps (one could additionally check that the file being listed actually exists):

  • Remove any underscores (necessary).
  • Remove any sets of double spaces.
  • Replace spaces with underscores.
  • Remove any non-alphanumeric values or anything else that is not an underscore.
  • Then, run the ls command.

Next, running better_input.sh will allow us to view the current working directory or any file within it whose name consists only of letters, digits, and underscores. Wildcards and path separators have been removed, so we can no longer traverse directories.
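
The sanitizing steps and the suggested existence check can be combined into a pair of functions. This is a sketch under stated assumptions rather than the recipe's exact script: the names sanitize_name and list_if_exists are hypothetical, and parameter expansion is used in place of sed for the double-space step.

```shell
#!/bin/bash

# Hypothetical helper: apply the sanitizing steps listed above and print the result.
sanitize_name() {
	local name=$1
	name=${name//_/}                 # remove underscores
	name=${name//  /}                # remove sets of double spaces
	name=${name// /_}                # replace spaces with underscores
	name=${name//[^a-zA-Z0-9_]/}     # drop anything not alphanumeric or underscore
	printf '%s\n' "$name"
}

# Hypothetical helper: list the entry only if it exists in the current directory.
list_if_exists() {
	local clean
	clean=$(sanitize_name "$1")
	if [ -n "$clean" ] && [ -e "./$clean" ]; then
		ls -- "./$clean"
	else
		echo "No such entry: '${clean}'" >&2
		return 1
	fi
}
```

Note that these steps strip periods as well, so an argument such as TEST.txt is reduced to TESTtxt; the allowed character set would need to be widened if file extensions matter.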

To validate the form of an email, we use the grep command combined with a regex. We are merely looking for the form of an email account name, an @ symbol, and a domain name in the form of acme.x. It is important to note that we are not looking to see whether an email is truly valid or can make its way to the intended destination, but merely whether it fits what an email should look like. Additional tests such as testing the domain’s MX or DNS mail records could extend this functionality to improve the likelihood of a user entering a valid email.

In the next step, we test two email addresses: one with the @ symbol (valid) and one without it (invalid). Feel free to try several combinations.
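
The same shape check can also be written with bash's built-in regex operator instead of grep. This is a minimal sketch that assumes each part of the address needs at least one character; the function name looks_like_email is hypothetical, and like the recipe it only checks the form of the address, not whether mail can be delivered to it.

```shell
#!/bin/bash

# Hypothetical helper: succeed when the argument has the form name@domain.tld.
looks_like_email() {
	local re='^[a-zA-Z0-9._]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+$'
	[[ "$1" =~ $re ]]
}

# Try the two addresses used in the recipe.
for candidate in "ron.brash@somedomain.com" "ron.brashsomedomain.com"; do
	if looks_like_email "$candidate"; then
		echo "${candidate} is valid"
	else
		echo "${candidate} is NOT valid"
	fi
done
```

Keeping the pattern in a variable avoids quoting surprises with the =~ operator, and no external process is started for each check.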

Validating an IP address could always be done with a regex, but for the sake of easy-to-use tools that get the job done, read and simple checks using test (and [[ ]] evaluations) work just fine. In its basic form, an IPv4 address consists of four octets (in layman’s terms, four values separated by periods). Without exploring what makes an address truly valid, each octet must be between 0 and 255 (never more and never less). IP addresses are also grouped into classes and divided into subnets.

In our examples, we know that an IP address containing alphabetic characters is not a valid IP address (excluding the periods), and that the values range between 0 and 255 per octet. 192.168.0.x (or 192.168.1.x) is an IP subnet many people see on their home routers.
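
For comparison, the regex approach mentioned above could look like the following. It is a sketch, not the recipe's method: the function name is_valid_ipv4 is hypothetical, and it assumes dotted-decimal IPv4 input with one to three digits per octet.

```shell
#!/bin/bash

# Hypothetical helper: succeed when the argument is four dot-separated
# decimal octets, each between 0 and 255.
is_valid_ipv4() {
	local re='^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$'
	local i
	[[ "$1" =~ $re ]] || return 1
	for i in 1 2 3 4; do
		# 10# forces base 10 so a leading zero is not read as octal
		(( 10#${BASH_REMATCH[i]} <= 255 )) || return 1
	done
	return 0
}
```

The regex handles the shape (four numeric groups and nothing else), while the arithmetic test handles the range, mirroring the two kinds of checks in validate_ip.sh.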

