How to create a diff of two files and apply a patch

We often need to identify the differences between two files and patch one of them, especially when updating configuration files or applications.

In day-to-day work on a Linux system, this is something we do regularly.

Two common situations call for diff and patch operations:

  • When determining whether a particular script or configuration file has been modified
  • When comparing versions, or migrating data from an old script to a new one

So, what is a diff or differential?

Diff is an output that describes the differences between two files (file A and file B). File A is the source, and file B is assumed to be a modified file.

If diff produces no output, there are no differences between file A and file B (which is also the case when both are empty). Diffs in a unified format look similar to this:

$ diff -urN fileA.txt fileB.txt 
--- fileA.txt 2017-12-11 15:06:49.972849620 -0500
+++ fileB.txt 2017-12-11 15:08:09.201177398 -0500
@@ -1,3 +1,4 @@ 
12345
-abcdef
+abcZZZ
+789aaa

There are several diff formats, but the unified format is the most popular (and the one used by the FOSS crowd).

It identifies both files (A and B), gives the starting line number and line count of each changed region, and shows the content added or removed.

In the sample above, the string abcdef is removed (-) from the original and re-added (+) as abcZZZ. A new line containing 789aaa is also added, which is why the line count in the hunk header grows from 3 to 4 (@@ -1,3 +1,4 @@).
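To see how the hunk header relates to the file contents, the following sketch builds two small files in a scratch directory and compares them. The file contents are made up for illustration and are not necessarily the ones behind the sample above. It also relies on the exit status of diff: 0 when the files are identical, 1 when they differ, and 2 on error.

```shell
#!/bin/sh
# Sketch: compare two throwaway files and inspect diff's exit status.
# The scratch directory and file contents are illustrative only.
workdir=$(mktemp -d)

printf '12345\nabcdef\n' > "$workdir/fileA.txt"
printf '12345\nabcZZZ\n789aaa\n' > "$workdir/fileB.txt"

# diff exits 0 if the files match, 1 if they differ, 2 on trouble.
diff -u "$workdir/fileA.txt" "$workdir/fileB.txt" > "$workdir/ab.diff"
status=$?

case $status in
  0) echo "files are identical" ;;
  1) echo "files differ:"; cat "$workdir/ab.diff" ;;
  *) echo "diff reported an error" >&2 ;;
esac

# Keep only the hunk header line, which starts with "@@".
hunk_header=$(grep '^@@' "$workdir/ab.diff")
echo "hunk header: $hunk_header"
```

With these two files the header reads @@ -1,2 +1,3 @@: the hunk covers 2 lines starting at line 1 of the old file and 3 lines starting at line 1 of the new file.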

A patch is a unified diff that contains changes to one or more files, to be applied in a specific order or method. Patching is the process of applying a patch (which contains the diff information).

A patch can consist of several diffs concatenated together as well.
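As a rough illustration of concatenated diffs, the sketch below creates two separate diffs, joins them with cat, and applies the result in one patch invocation. The directory and file names (old/, new/, a.txt, b.txt) are hypothetical, and the -p1 option is used here only to strip the leading directory from the paths recorded in the diff headers.

```shell
#!/bin/sh
# Sketch: build one patch from two separate diffs and apply it in one go.
# Directory and file names (old/, new/, a.txt, b.txt) are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir old new

printf 'alpha\n' > old/a.txt
printf 'beta\n'  > old/b.txt
printf 'ALPHA\n' > new/a.txt
printf 'BETA\n'  > new/b.txt

# diff exits 1 when files differ, so that status is ignored here.
diff -u old/a.txt new/a.txt > a.diff || true
diff -u old/b.txt new/b.txt > b.diff || true

# A multi-file patch is simply the diffs one after another.
cat a.diff b.diff > combined.patch

# -p1 strips the leading "old/" or "new/" component from the paths
# in the patch headers, so the names resolve inside old/.
(cd old && patch -p1 < ../combined.patch)

cat old/a.txt old/b.txt
```

After the run, both files under old/ should carry the contents of their counterparts under new/.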

Getting ready

Before starting this section, make sure these two utilities are installed (on Debian and Ubuntu, the diff command is provided by the diffutils package):

$ sudo apt-get install patch diffutils

Now, let’s create a configuration file that’s copied from a real one:

$ cp /etc/updatedb.conf ~/updatedb-v2.conf

Open updatedb-v2.conf and change the contents to look like this:

updatedb-v2.conf

PRUNE_BIND_MOUNTS="yes"
# PRUNENAMES=".git .bzr .hg .svn"
PRUNEPATHS="/tmp /var/spool /media /home/.ecryptfs /var/lib/schroot /media /mount"
PRUNEFS="NFS nfs nfs4 rpc_pipefs afs binfmt_misc proc smbfs autofs iso9660 ncpfs coda devpts ftpfs devfs mfs shfs sysfs cifs lustre tmpfs usbfs udf fuse.glusterfs fuse.sshfs curlftpfs ecryptfs fusesmb devtmpfs"

If your updatedb-v2.conf looks drastically different, just add /media /mount to the end of the PRUNEPATHS variable. Note that the entries are separated by spaces.
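If you prefer to make that change from the command line rather than in an editor, something like the following sed sketch could work. It operates on a scratch file named sample.conf, a hypothetical stand-in for ~/updatedb-v2.conf with shortened contents, and it assumes the PRUNEPATHS line ends with a closing double quote.

```shell
#!/bin/sh
# Sketch: append " /media /mount" to PRUNEPATHS in a scratch copy.
# sample.conf stands in for ~/updatedb-v2.conf; its contents are a
# shortened, illustrative version of the real configuration.
workdir=$(mktemp -d)
conf="$workdir/sample.conf"

cat > "$conf" <<'EOF'
PRUNE_BIND_MOUNTS="yes"
PRUNEPATHS="/tmp /var/spool /media /home/.ecryptfs /var/lib/schroot"
EOF

# Insert the new entries just before the closing quote of PRUNEPATHS.
# Writing to a second file and renaming avoids relying on sed -i.
sed 's|^\(PRUNEPATHS=".*\)"$|\1 /media /mount"|' "$conf" > "$conf.new"
mv "$conf.new" "$conf"

grep '^PRUNEPATHS=' "$conf"
```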

How to do it…

Open a terminal, and run the following commands in order to understand the diff command:

$ diff /etc/updatedb.conf ~/updatedb-v2.conf
$ diff -urN /etc/updatedb.conf ~/updatedb-v2.conf

At this point, the diff information has only been written to the console’s standard output; no patch file has been created yet. To create the actual patch file, execute the following command:

$ diff -urN /etc/updatedb.conf ~/updatedb-v2.conf > 001-myfirst-patch-for-updatedb.patch

Note:

Patches come in many forms, but they usually have the .patch extension and a name made of a number followed by a human-readable description.

Before applying a patch, you can test it with --dry-run to ensure that the results are as expected. Try the following commands:

$ echo "NEW LINE" > ~/updatedb-v3.conf
$ cat ~/updatedb-v2.conf >> ~/updatedb-v3.conf
$ patch --verbose --dry-run /etc/updatedb.conf < 001-myfirst-patch-for-updatedb.patch
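In a script, the dry run can act as a gate in front of the real run. The following is a minimal sketch of that pattern on scratch copies, so nothing under /etc is touched; all file names in it are hypothetical.

```shell
#!/bin/sh
# Sketch: only apply a patch if a dry run succeeds first.
# Works on scratch copies; all file names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'one\ntwo\nthree\n' > original.conf
printf 'one\n2\nthree\n'   > modified.conf
diff -u original.conf modified.conf > change.patch || true

cp original.conf target.conf

# --dry-run reports what would happen without touching target.conf.
if patch --dry-run target.conf < change.patch; then
  patch target.conf < change.patch
  result="applied"
else
  result="skipped"
  echo "patch would not apply cleanly; target.conf left untouched" >&2
fi
echo "$result"
```

Because patch signals failure through a non-zero exit status, the if statement needs no parsing of the verbose output.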

Let’s see what happens when patches fail to apply using the following commands:

$ patch --verbose --dry-run ~/updatedb-v1.conf < 001-myfirst-patch-for-updatedb.patch 
$ patch --verbose ~/fileA.txt < 001-myfirst-patch-for-updatedb.patch

How it works…

The first diff command outputs the changes in the default diff format. The second uses the -urN flags: -u selects the unified format, -r compares directories recursively, and -N treats a file that is missing on one side as empty, so that new files show up in the diff. When comparing two existing single files, as here, only -u affects the output:

$ diff /etc/updatedb.conf ~/updatedb-v2.conf
3c3
< PRUNEPATHS="/tmp /var/spool /media /home/.ecryptfs /var/lib/schroot"
---
> PRUNEPATHS="/tmp /var/spool /media /home/.ecryptfs /var/lib/schroot /media /mount"  

$ diff -urN /etc/updatedb.conf ~/updatedb-v2.conf
--- /etc/updatedb.conf 2014-11-18 02:54:29.000000000 -0500
+++ /home/rbrash/updatedb-v2.conf 2017-12-11 15:26:33.172955754 -0500
@@ -1,4 +1,4 @@ 
PRUNE_BIND_MOUNTS="yes" 
# PRUNENAMES=".git .bzr .hg .svn"
-PRUNEPATHS="/tmp /var/spool /media /home/.ecryptfs /var/lib/schroot"
+PRUNEPATHS="/tmp /var/spool /media /home/.ecryptfs /var/lib/schroot /media /mount" 
PRUNEFS="NFS nfs nfs4 rpc_pipefs afs binfmt_misc proc smbfs autofs iso9660 ncpfs coda devpts ftpfs devfs mfs shfs sysfs cifs lustre tmpfs usbfs udf fuse.glusterfs fuse.sshfs curlftpfs ecryptfs fusesmb devtmpfs"
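The -r and -N flags only show their effect when diff is pointed at directories. The sketch below, using a made-up tree (v1/, v2/, sub/), compares the output with and without -N for a file that exists on one side only.

```shell
#!/bin/sh
# Sketch: -r and -N only matter when diff compares directories.
# The tree layout (v1/, v2/, sub/) is made up for this example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p v1/sub v2/sub

printf 'same\n'      > v1/sub/kept.txt
printf 'same\n'      > v2/sub/kept.txt
printf 'brand new\n' > v2/sub/added.txt

# Without -N, a file present on one side only is merely mentioned.
diff -ur v1 v2 > without_N.txt || true

# With -N, the missing file is treated as empty, so its whole
# content shows up as added lines and a patch can recreate it.
diff -urN v1 v2 > with_N.txt || true

cat without_N.txt
cat with_N.txt
```

This is why -N is commonly included when a patch is meant to add new files as well as change existing ones.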

Next, we created a patch by redirecting standard output to the 001-myfirst-patch-for-updatedb.patch file:

$ diff -urN /etc/updatedb.conf ~/updatedb-v2.conf > 001-myfirst-patch-for-updatedb.patch

We then created ~/updatedb-v3.conf, a further modified copy of the configuration file, and ran patch with --dry-run. Did you notice anything in the dry-run output?

Ignoring the warning that /etc/updatedb.conf is read-only, we can see that hunk #1 applies successfully.

A hunk is one contiguous section of a diff. A patch can contain several hunks for one file, or hunks for many files.

Note that the line numbers in the target file do not have to match those in the patch precisely.

patch can still apply a hunk when it finds enough matching context nearby, adjusting the position to fit.

Be aware of this behavior when dealing with large files that contain several similar-looking sections, because a hunk may match in more than one place:

$ patch --verbose --dry-run /etc/updatedb.conf < 001-myfirst-patch-for-updatedb.patch 
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- /etc/updatedb.conf 2014-11-18 02:54:29.000000000 -0500
|+++ /home/rbrash/updatedb-v2.conf 2017-12-11 15:26:33.172955754 -0500
--------------------------
File /etc/updatedb.conf is read-only; trying to patch anyway
checking file /etc/updatedb.conf
Using Plan A...
Hunk #1 succeeded at 1.
done
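The position adjustment can be reproduced on a small scale. In this sketch (file names and contents are illustrative), the patch is created against a ten-line file and then applied to a copy that has one extra line at the top, so every line number in the patch is off by one.

```shell
#!/bin/sh
# Sketch: patch can still apply a hunk when the target has shifted lines.
# All file names and contents are illustrative.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > original.txt
sed 's/^5$/five/' original.txt > modified.txt
diff -u original.txt modified.txt > five.patch || true

# The target is the original with one extra line on top, so every
# line number recorded in the patch is now off by one.
{ echo "NEW LINE"; cat original.txt; } > shifted.txt

# patch looks for the context lines near the recorded position and
# applies the hunk where they match; --verbose shows what it did.
patch --verbose shifted.txt < five.patch
status=$?

# Show the extra first line and the changed line, now on line 6.
sed -n '1p;6p' shifted.txt
```

The verbose output should mention that the hunk was applied at a different position than the one recorded in the patch.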

If we attempt to apply the patch to a file that does not match, it fails, as in the following output. With --dry-run, the failure is only reported. Without --dry-run, the rejected hunks are also saved in a reject file, as noted in this line: 1 out of 1 hunk FAILED -- saving rejects to file /home/rbrash/fileA.txt.rej:

$ patch --verbose --dry-run /etc/updatedb.conf1 < 001-myfirst-patch-for-updatedb.patch 
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- /etc/updatedb.conf 2014-11-18 02:54:29.000000000 -0500
|+++ /home/rbrash/updatedb-v2.conf 2017-12-11 15:26:33.172955754 -0500
--------------------------
checking file /etc/updatedb.conf1
Using Plan A...
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED
done
$
$ patch --verbose ~/fileA.txt < 001-myfirst-patch-for-updatedb.patch 
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- /etc/updatedb.conf 2014-11-18 02:54:29.000000000 -0500
|+++ /home/rbrash/updatedb-v2.conf 2017-12-11 15:26:33.172955754 -0500
--------------------------
patching file /home/rbrash/fileA.txt
Using Plan A...
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file /home/rbrash/fileA.txt.rej
done
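A script can detect this kind of failure without reading the messages. The sketch below, with hypothetical file names, applies a patch to an unrelated file and then checks the exit status of patch and the presence of the reject file.

```shell
#!/bin/sh
# Sketch: detect a failed patch through its exit status and reject file.
# File names and contents are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > original.txt
sed 's/^5$/five/' original.txt > modified.txt
diff -u original.txt modified.txt > five.patch || true

# A target whose content has nothing in common with original.txt.
printf 'completely\nunrelated\ncontent\n' > unrelated.txt

# patch exits 0 when every hunk applied, 1 when some hunks failed,
# and 2 on more serious trouble.
patch unrelated.txt < five.patch
status=$?

if [ "$status" -ne 0 ] && [ -f unrelated.txt.rej ]; then
  echo "patch failed; rejected hunks saved in unrelated.txt.rej"
fi
```

The .rej file holds the hunks that could not be applied, so they can be reviewed and merged by hand.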
