Text filter in Linux

March 18, 2021

Text filtering tools

Normally, shell scripting involves report generation, which will include processing various text files and filtering their output to finally produce the desired results. Let’s start discussing the two Linux commands, namely more and less:

more: Sometimes we get a very large output on the screen for certain commands, which cannot be viewed completely in one screen. In such cases, we can use the more command to view the output text one page at a time. Add | more after the command, as follows:

$ ll /dev | more

The | is called a pipe. You will learn more about it in the next chapters. Note that ll is a common shell alias for ls -l; if it is not defined on your system, use ls -l instead. While more is running, pressing the spacebar moves the output forward one page at a time, and pressing Enter moves it forward one line at a time.

less: If you use less instead of more, the output is also shown one screen at a time, but you can move backward as well as forward through it. This is a very useful text-filtering tool.

The syntax usage is as follows:

$ command | less

For example:

$ ll /proc | less

This command will show a long directory listing of the /proc directory. Let’s say that we want to see whether the cpuinfo file is present in the directory. Just press the up or down arrow key to scroll through the display. The more command is designed mainly for moving forward, and traditional versions cannot scroll backward at all. In less, the Page Up and Page Down keys move backward or forward one page at a time, which is very fast. In addition to scrolling, you can search for a pattern using / for a forward search and ? for a backward search. Press n to repeat the search in the same direction, or N to repeat it in the opposite direction.

Head and tail

For testing the next few commands, we will need a file with a sequence of numbers from 1 to 100. For this, use the following command:

$ seq 100 > numbers.txt

The preceding command creates a file with the numbers 1 to 100 on separate lines. The following example shows the usage of the head command:

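$ head numbers.txt        # displays the first 10 lines
$ head -n 3 numbers.txt   # displays the first 3 lines (head -3 is the older short form)

The head +5 numbers.txt form, intended to show the file from line 5 onward, does not work with many versions of head (GNU head treats +5 as a file name). Use tail -n +5 numbers.txt for that purpose.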

The following example shows the usage of the tail command:

$ tail numbers.txt          # displays the last 10 lines
$ tail -n 5 numbers.txt     # displays the last 5 lines (tail -5 is the older short form)
$ tail -n +15 numbers.txt   # displays from line 15 onward (the older tail +15 form is not accepted by every version of tail)

To print lines 61 to 65 from numbers.txt into file log.txt, type the following:

$ head -65 numbers.txt | tail -5 > log.txt
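The line-range technique above can be tried safely in a scratch directory. The following is a minimal sketch; the workdir variable and the log_sed.txt file name are assumptions made for illustration, and the sed variant is an alternative that the original text does not cover.

```shell
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 100 > numbers.txt

# Keep the first 65 lines, then the last 5 of those: lines 61 to 65.
head -n 65 numbers.txt | tail -n 5 > log.txt

# Alternative: sed prints lines 61 to 65 and quits after line 65.
sed -n '61,65p;65q' numbers.txt > log_sed.txt

cat log.txt
```

Both approaches should produce the same five lines; the head and tail pipeline is easier to read, while the sed form names the range directly.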

The diff command

The diff command is used to find differences between two files. Let’s see a few examples to find out its usage. The content of file1 is as follows:

file1

I go for shopping on Saturday 
I rest completely on Sunday 
I use Facebook & Twitter for social networking 

The content of file2 is as follows:

file2

Today is Monday. 
I go for shopping on Saturday 
I rest completely on Sunday 
I use Facebook & Twitter for social networking

Type the diff command:

$ diff file1 file2

The output will be this:

0a1
> Today is Monday.

In the output, 0a1 tells us that line 1 of file2 was added after line 0 of file1, that is, at the very top. Let’s see another example with line deletion. The content of file1 is as follows:

file1

Today is Monday 
I go for shopping on Saturday 
I rest completely on Sunday 
I use Facebook & Twitter for social networking

The content of file2 is as follows:

file2

Today is Monday 
I go for shopping on Saturday 
I rest completely on Sunday 

Type the diff command:

$ diff file1 file2

The output is as follows:

4d3
< I use Facebook & Twitter for social networking

In the output, 4d3 tells us that line 4 of file1 is deleted in file2. Similarly, a c in the output (for example, 2c2) marks lines that were changed between the two files.
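A changed line can be demonstrated in the same way as the added and deleted lines. This is a minimal sketch; the file names old.txt and new.txt and their contents are hypothetical and chosen only for illustration.

```shell
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'Today is Monday' 'I rest completely on Sunday' > old.txt
printf '%s\n' 'Today is Tuesday' 'I rest completely on Sunday' > new.txt

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
diff_output=$(diff old.txt new.txt || true)
printf '%s\n' "$diff_output"
```

Here diff should report 1c1, followed by the old line prefixed with < and the new line prefixed with >, separated by a line of three dashes.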

The cut command

The cut command is used to extract specified columns (fields) or characters from each line of text. Its main options are as follows:

  • -c: Specifies the filtering of characters
  • -d: Specifies the delimiter for fields
  • -f: Specifies the field number

The following are a few examples that show the usage of the cut command:

  • Using the next command, from the /etc/passwd file, fields 1 and 3 will be displayed. The display will contain the login name and user ID. We use the -d: option to specify that the field or columns are separated by a colon (:):
$ cut -d: -f1,3 /etc/passwd

Output:

satish@linuxconcept:/home/satish$ cut -d: -f1,3 /etc/passwd
root:0
daemon:1
bin:2
sys:3
sync:4
games:5
man:6
lp:7
mail:8
news:9
uucp:10
proxy:13
www-data:33
backup:34
list:38
irc:39
gnats:41
nobody:65534
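  • Using this command, from the /etc/passwd file, fields 1 to 5 will be displayed. The display will contain the login name, the password placeholder (x, because the hashed password is normally stored in /etc/shadow), the user ID, the group ID, and the comment field, which usually holds the user's full name: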
$ cut -d: -f1-5 /etc/passwd

Output:

satish@backup:/home/satish$ cut -d: -f1-5 /etc/passwd
root:x:0:0:root
daemon:x:1:1:daemon
bin:x:2:2:bin
sys:x:3:3:sys
sync:x:4:65534:sync
games:x:5:60:games
man:x:6:12:man
lp:x:7:7:lp
mail:x:8:8:mail
news:x:9:9:news
uucp:x:10:10:uucp
proxy:x:13:13:proxy
www-data:x:33:33:www-data
backup:x:34:34:backup
list:x:38:38:Mailing List Manager
irc:x:39:39:ircd
gnats:x:41:41:Gnats Bug-Reporting System (admin)
nobody:x:65534:65534:nobody
  • This command will show characters 1 to 3 and 8 to 12 from the emp.lst file:
$ cut -c1-3,8-12 /home/student/emp.lst
  • The output of the date command is sent as an input to the cut command and only the first three characters are printed on screen, which is shown as follows:
$ date | cut -c1-3
Mon
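The field and character options can also be tried without reading /etc/passwd. The sketch below uses made-up records in the same colon-separated layout; the sample variable and the account names are hypothetical.

```shell
# Hypothetical sample records in /etc/passwd layout (not real accounts).
sample='alice:x:1001:1001:Alice Example:/home/alice:/bin/bash
bob:x:1002:1002:Bob Example:/home/bob:/bin/sh'

# Fields 1 and 3 (login name and user ID), with a colon as the delimiter.
names_and_ids=$(printf '%s\n' "$sample" | cut -d: -f1,3)
printf '%s\n' "$names_and_ids"

# Characters 1 to 3 of every line.
prefixes=$(printf '%s\n' "$sample" | cut -c1-3)
printf '%s\n' "$prefixes"
```

Note that cut keeps the delimiter between the selected fields, so the first command prints values such as alice:1001.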

The paste command

Using this utility, we can paste two files horizontally; for example, file_1 will become the first column and file_2 will become the second column:

$ paste file_1 file_2
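As a small illustration, the sketch below pastes two short files side by side. The file contents are made up for the example, and the -d option (which sets the separator) is not discussed in the text above.

```shell
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' India UK Canada > file_1
printf '%s\n' 'New Delhi' London Toronto > file_2

# By default, paste separates the columns with a tab character.
paste file_1 file_2

# The -d option sets a different separator, here a comma.
pasted=$(paste -d, file_1 file_2)
printf '%s\n' "$pasted"
```

Line 1 of file_1 is paired with line 1 of file_2, line 2 with line 2, and so on; no matching on content takes place.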

The join command

Consider two files, namely one.txt and two.txt:

The content of one.txt is as follows:

one.txt

1 India 
2 UK 
3 Canada 
4 US 
5 Ireland 

The content of two.txt is as follows:

two.txt

1 New Delhi 
2 London 
3 Toronto 
4 Washington 
5 Dublin 

In this case, the common field in both files is the first one, the serial number. We can combine the lines that share a serial number using the following command:

$ join one.txt two.txt

The output will be as follows:

1 India New Delhi
2 UK London
3 Canada Toronto
4 US Washington
5 Ireland Dublin
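It may help to see what join does when the keys do not all match. The following sketch uses shortened, hypothetical versions of the two files in which key 3 appears only in the first file and key 4 only in the second; the -a option is an addition not covered in the text above.

```shell
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Both files must be sorted on the join field (the first field here).
printf '%s\n' '1 India' '2 UK' '3 Canada' > one.txt
printf '%s\n' '1 New Delhi' '2 London' '4 Washington' > two.txt

# Default join: only keys present in both files (1 and 2) are printed.
joined=$(join one.txt two.txt)
printf '%s\n' "$joined"

# -a 1 also prints lines from the first file that have no match.
joined_all=$(join -a 1 one.txt two.txt)
printf '%s\n' "$joined_all"
```

By default, unmatched lines are dropped silently, which is worth remembering when the two inputs come from different sources.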

The uniq command

The following are a few examples showing the usage of the uniq command:

  • This command removes duplicate adjacent lines from the file:
$ cat test
aa
aa
cc
cc
bb
bb
yy
zz
$ uniq test
  • The output contains the lines of the test file with the duplicate adjacent lines removed, shown as follows:

Output:

aa
cc
bb
yy
zz
  • The next command only prints duplicate lines:
$ uniq -d test

Output:

aa
cc
bb
  • The following command prefixes each distinct line with the number of times it occurs:
$ uniq -c test

Output:

2 aa
2 cc
2 bb
1 yy
1 zz
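Because uniq compares only adjacent lines, duplicates that are separated by other lines are left in place. The sketch below shows this with made-up input and the common remedy of sorting first; the variable names are hypothetical.

```shell
# The two "aa" lines are NOT adjacent, so uniq alone keeps all three lines.
unsorted_result=$(printf '%s\n' aa bb aa | uniq)

# Sorting first makes the duplicates adjacent, so uniq can collapse them.
sorted_result=$(printf '%s\n' aa bb aa | sort | uniq)

printf '%s\n' "$unsorted_result"
printf '%s\n' "$sorted_result"
```

This is why uniq is so often seen at the end of a sort pipeline.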

The comm command

The comm command shows the lines unique to file_1 and file_2 along with the lines common to both. It expects both files to be sorted; file_2 in this example is not, so the --nocheck-order option is used to suppress the order check:

$ cat file_1
Barack Obama
David Cameron
Narendra Modi
$ cat file_2
Barack Obama
Angela Merkel
Vladimir Putin
$ comm --nocheck-order file_1 file_2
                Barack Obama
        Angela Merkel
David Cameron
Narendra Modi
        Vladimir Putin

In the preceding example, we can see the following:

  • The first column shows the unique lines in file_1
  • The second column shows the unique lines in file_2
  • The last column shows the content common to both files

The output shows that the unique lines in file_1 are David Cameron and Narendra Modi. The unique lines in the second file are Angela Merkel and Vladimir Putin. The common name in both the files is Barack Obama, which is displayed in the third column.
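In scripts, it is often more convenient to print just one of the three columns. The following sketch assumes the -1, -2, and -3 options, which suppress the corresponding column; these options are not described in the text above, and the files are sorted first so that no order check needs to be disabled.

```shell
# Use the C locale so that sort and comm agree on the ordering.
export LC_ALL=C

# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'Barack Obama' 'David Cameron' 'Narendra Modi' | sort > file_1
printf '%s\n' 'Barack Obama' 'Angela Merkel' 'Vladimir Putin' | sort > file_2

# -12 suppresses columns 1 and 2, leaving only the lines common to both files.
common=$(comm -12 file_1 file_2)
printf '%s\n' "$common"

# -23 suppresses columns 2 and 3, leaving the lines found only in file_1.
only_first=$(comm -23 file_1 file_2)
printf '%s\n' "$only_first"
```

With a single column left, the output has no leading tabs and can be passed straight to another command.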

The tr command

The tr command is a Linux utility for text processing, such as translating, deleting, or squeezing repeated characters. The following command translates lowercase characters to uppercase:

$ tr '[a-z]' '[A-Z]' < filename

The next command replaces every | character in emp.lst with ~:

$ tr '|' '~' < emp.lst

The following command squeezes multiple spaces into a single space:

$ ls -l | tr -s " "

In this example, the -s option squeezes multiple contiguous occurrences of the character into a single character. Additionally, the -d option deletes the specified characters.
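The three modes of tr can be compared on short strings. This is a minimal sketch with made-up input; the POSIX character classes [:lower:] and [:upper:] are used here as a portable alternative to the a-z ranges and do not appear in the text above.

```shell
# -d deletes every character in the set; here, all digits.
no_digits=$(printf '%s\n' 'room 101, floor 3' | tr -d '0-9')
printf '%s\n' "$no_digits"

# Character classes are a portable way to change case.
upper=$(printf '%s\n' 'hello world' | tr '[:lower:]' '[:upper:]')
printf '%s\n' "$upper"

# -s squeezes runs of the listed character into a single occurrence.
squeezed=$(printf '%s\n' 'a    b   c' | tr -s ' ')
printf '%s\n' "$squeezed"
```

Note that tr reads only from standard input, which is why the examples use a pipe or input redirection rather than a file name argument.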

The sort command

This command sorts the contents of a text file, line by line. The options are as follows:

  • -n: Sorts as per the numeric value
  • -d: Sorts in dictionary order (only blanks and alphanumeric characters are considered)
  • -h: Compares as per the human-readable numbers (for example, 1K 2G)
  • -r: Sorts in the reverse order
  • -t: Option to specify a delimiter for fields
  • +num: Obsolete way to specify the sort field; current versions of sort may not accept it, so prefer -k
  • -knum: Specifies the field to sort on
  • $ sort -k4 sample.txt: This will sort according to the fourth field

Sr   Example of command usage    Explanation
1    sort sample.txt             Sorts the lines alphabetically
2    sort -u sample.txt          Sorts the lines and removes duplicate entries
3    sort -r sample.txt          Sorts in reverse order
4    sort -n -k3 sample.txt      Numerical sort on the third field
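The difference between alphabetical and numerical sorting is easiest to see on a field that contains numbers of different lengths. The sketch below uses hypothetical colon-separated records (name:department:salary); the records variable and its values are made up for illustration.

```shell
# Hypothetical colon-separated records: name:department:salary
records='carol:sales:900
alice:ops:12000
bob:sales:3500'

# -t: sets the field delimiter, -k3,3 limits the key to field 3,
# and -n compares that key as a number.
by_salary=$(printf '%s\n' "$records" | sort -t: -k3,3 -n)
printf '%s\n' "$by_salary"

# Without -n the key is compared as text, so 12000 sorts before 3500 and 900.
as_text=$(printf '%s\n' "$records" | sort -t: -k3,3)
printf '%s\n' "$as_text"

# -u sorts the lines and drops duplicate entries.
deduped=$(printf '%s\n' b a b | sort -u)
printf '%s\n' "$deduped"
```

Writing the key as -k3,3 rather than -k3 restricts the comparison to the third field alone, instead of running from the third field to the end of the line.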
