By apankrat at 2009-11-09 08:59 Edited 2010-09-13 23:04
One of the more interesting and unique features of Bvckup is its support for delta file copying. That's what gives Bvckup its impressive backup speed.
Read on to learn what it is and how it is easily the best thing since sliced bread :-)
When it comes to copying a file, there are generally two ways of doing it.
The most obvious option is to read the source file and then write every single byte into a destination file. This is how a standard copy command works, and this is what Windows Explorer does when it copies a file.
The smarter approach, however, is to make use of a previous copy of the file if one exists. The source file is read the same way, but a data block is written only if it differs between the source and the destination. In other words, only the changes accumulated since the last copy are copied.
These differences are customarily referred to as deltas, hence the name of the method: delta copying.
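To make the idea concrete, here is a minimal Python sketch of the simplest form of delta copying described above. The source and an existing destination are read side by side in fixed-size blocks, and only the blocks that differ are written. The 32 KB block size and the function name `delta_copy_compare` are assumptions for illustration, not Bvckup's code. Note that this naive form has to read the destination back, which itself costs traffic when the destination is remote.

```python
import os

BLOCK_SIZE = 32 * 1024  # assumed block size for this sketch


def delta_copy_compare(src_path: str, dst_path: str, block_size: int = BLOCK_SIZE) -> int:
    """Update dst in place so it matches src, writing only differing blocks.

    Returns the number of bytes actually written to the destination.
    """
    written = 0
    # Update an existing destination in place; create it if it is missing.
    mode = "r+b" if os.path.exists(dst_path) else "w+b"
    with open(src_path, "rb") as src, open(dst_path, mode) as dst:
        offset = 0
        while True:
            src_block = src.read(block_size)
            if not src_block:
                break
            # Read the destination block at the same offset and compare.
            dst.seek(offset)
            dst_block = dst.read(len(src_block))
            if dst_block != src_block:
                dst.seek(offset)
                dst.write(src_block)
                written += len(src_block)
            offset += len(src_block)
        # If the source has shrunk, drop the destination's trailing bytes.
        dst.truncate(offset)
    return written
```

Running it twice in a row on an unchanged source writes nothing the second time, since every block compares equal.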
Delta copying attempts to minimize the amount of data that needs to be written out at the destination. This means less disk access and faster copying. If the destination is on a network, it also reduces the resulting network traffic, which translates into much faster copying.
A couple of concrete examples.
The Firefox 3.x user profile directory includes a places.sqlite file, which is typically about 30 MB in size. The file is updated with every visit to a new web page, and each update touches only a small part of the file. With conventional copying, the entire 30 megabytes have to be written to the destination. Delta copying cuts this number down to 384 KB, roughly an 80-fold difference:
2009.11.08 21:58:39 Starting [Firefox] job ..
2009.11.08 21:58:40 Total: 99 MB in 213 files
2009.11.08 21:58:40 Job [Firefox] completed, copied 384 KB out of 32 MB in 1 file
Even greater savings can typically be observed when backing up instant messenger chat logs:
2009.11.08 22:12:27 Starting [Trillian] job ..
2009.11.08 22:12:28 Total: 259 MB in 2632 files
2009.11.08 22:12:29 Job [Trillian] completed, copied 65 KB out of 103 MB in 3 files
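For reference, the reductions in the two logs above work out as follows, assuming the logged KB and MB are binary units (1 MB = 1024 KB):

```latex
\text{Firefox: } \frac{32\ \text{MB}}{384\ \text{KB}} = \frac{32 \times 1024}{384} = \frac{32768}{384} \approx 85.3
\qquad
\text{Trillian: } \frac{103\ \text{MB}}{65\ \text{KB}} = \frac{103 \times 1024}{65} = \frac{105472}{65} \approx 1622.6
```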
The drawback of delta copying is that the destination file is updated directly, in place. This differs from the full copying method, which writes to a temporary file during the copy and then renames it to the destination once the copying is complete.
The direct file update means that if the copying operation is interrupted, e.g. due to a network outage, the destination file may end up in a corrupted state. The problem is easily resolved by making a full copy the next time the backup runs, but there is still a small window during which the backup holds an invalid copy of the file.
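The temporary-file approach used by full copying can be sketched in Python as follows. This is a minimal illustration, assuming a destination where `os.replace` can rename a file over an existing one; the function name `full_copy_atomic` is hypothetical. Until the final rename, the previous destination file stays intact, which is exactly the guarantee that in-place delta updates give up.

```python
import os
import shutil
import tempfile


def full_copy_atomic(src_path: str, dst_path: str) -> None:
    """Copy src to dst via a temporary file, then rename it over dst."""
    dst_dir = os.path.dirname(os.path.abspath(dst_path))
    # Create the temporary file next to the destination so the final
    # rename stays within one directory (and therefore one volume).
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        # Swap the new copy in; if anything failed before this line,
        # the old destination file has not been touched.
        os.replace(tmp_path, dst_path)
    except BaseException:
        # Clean up the partial temporary file and re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```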
A second, smaller problem with delta copying is that, depending on the algorithm being used, it may miss changes in the file. This generally applies to algorithms that favor speed over accuracy, which is not the case with the method used by Bvckup.
However, for the sake of attention to detail, there is a configuration option that tells the program to make a full copy after a specified number of delta copies.
Lastly, and this is not a drawback per se, depending on the algorithm used, the destination file may be required to stay unchanged between subsequent copies. In other words, the algorithm may assume that the copying process is the only one that writes to the destination file.
Fortunately, this is a reasonable assumption in the context of file backup. Backup copies are typically read-only replicas, and they are copied elsewhere before being worked on and modified.
How it works
The variant of delta copying implemented by Bvckup works as follows.
When the source file is read, it is processed in 32 KB blocks. For each block the program computes a checksum using the MD5 digest algorithm. This yields a 16-byte value, which is then compared with the same block's checksum from the previous run. If the checksums do not match, the block is copied and the stored checksum is updated to its new value. If the source file grows or shrinks between backups, this is accommodated accordingly.
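Below is a minimal Python sketch of the block-checksum scheme just described. It assumes the previous run's digests are passed in as a list of 16-byte MD5 values and that nothing else has modified the destination since the last run; how that list is persisted, and the name `delta_copy_md5`, are assumptions for illustration only, not Bvckup's actual code or file format.

```python
import hashlib
import os

BLOCK_SIZE = 32 * 1024  # 32 KB blocks, as described above


def delta_copy_md5(src_path: str, dst_path: str, old_sums: list[bytes],
                   block_size: int = BLOCK_SIZE) -> tuple[list[bytes], int]:
    """Write only the blocks whose MD5 differs from the previous run.

    Returns (new_sums, bytes_written). The destination is never read;
    it is assumed to still match the source as of the previous run.
    """
    new_sums: list[bytes] = []
    written = 0
    mode = "r+b" if os.path.exists(dst_path) else "w+b"
    with open(src_path, "rb") as src, open(dst_path, mode) as dst:
        index = 0
        offset = 0
        while True:
            block = src.read(block_size)
            if not block:
                break
            digest = hashlib.md5(block).digest()  # 16-byte checksum
            new_sums.append(digest)
            # New block (file grew) or changed block: copy it over.
            if index >= len(old_sums) or old_sums[index] != digest:
                dst.seek(offset)
                dst.write(block)
                written += len(block)
            index += 1
            offset += len(block)
        # File shrank: cut the destination to the new length.
        dst.truncate(offset)
    return new_sums, written
```

On the first run, pass an empty list so every block is written; on later runs, feed back the digests returned by the previous call.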
The file's checksums are stored in a separate file kept in the Bvckup configuration folder. For every 32 KB of the source file, the checksum file grows by 20 bytes. For example, this translates into 640 KB of checksums for 1 GB of data, which is a reasonable overhead given the gains in copying speed.
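The 640 KB figure follows directly from the block size, assuming binary units (1 GB = 2^30 bytes, 32 KB = 2^15 bytes):

```latex
\frac{2^{30}\ \text{B}}{2^{15}\ \text{B/block}} = 2^{15} = 32768\ \text{blocks},
\qquad
32768 \times 20\ \text{B} = 655360\ \text{B} = 640\ \text{KB},
\qquad
\frac{20}{32768} \approx 0.061\%
```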
If Bvckup is configured to scan the destination and detects that a destination file has been modified, the program simply switches to full copying and recalculates the entire checksum set for the file from scratch.
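The choice between delta and full copying, including the configuration option that forces a full copy after a set number of delta copies, might look something like the Python sketch below. It is hypothetical throughout: detecting destination changes by comparing size and modification time against values recorded after the last copy is an assumption, as are the names `CopyState`, `choose_mode`, `record_copy` and `full_every`. A "full" result would also mean recomputing the checksum set from scratch.

```python
import os
from dataclasses import dataclass


@dataclass
class CopyState:
    """Hypothetical per-file state kept between backup runs."""
    dst_size: int = -1          # destination size recorded after the last copy
    dst_mtime_ns: int = -1      # destination mtime recorded after the last copy
    deltas_since_full: int = 0  # delta copies made since the last full copy


def choose_mode(dst_path: str, state: CopyState, full_every: int,
                scan_destination: bool = True) -> str:
    """Return "full" or "delta" for the next copy of one file."""
    if not os.path.exists(dst_path):
        return "full"
    if scan_destination:
        st = os.stat(dst_path)
        # Destination differs from what we left behind: don't trust it.
        if st.st_size != state.dst_size or st.st_mtime_ns != state.dst_mtime_ns:
            return "full"
    # Periodic full copy after `full_every` delta copies (0 disables it).
    if full_every > 0 and state.deltas_since_full >= full_every:
        return "full"
    return "delta"


def record_copy(dst_path: str, state: CopyState, mode: str) -> None:
    """Remember the destination's size/mtime and update the delta counter."""
    st = os.stat(dst_path)
    state.dst_size = st.st_size
    state.dst_mtime_ns = st.st_mtime_ns
    state.deltas_since_full = 0 if mode == "full" else state.deltas_since_full + 1
```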
It is a simple idea and an uncomplicated implementation that dramatically speeds up backups in very many cases.
Surprisingly enough, none of the backup software I looked at supported it. Some supported a variation of rsync synchronization, but that required running a copy of the program at the destination, which was hardly an option when backing up to a NAS device.
And since I couldn't find it, I wrote it.