The rsync utility works across platforms (Linux, Mac OS X and Windows, the last with Cygwin, of course) and, in combination with cron and SSH, it can easily be scripted. This makes it one of the essential utilities in one’s toolkit, even if you are not planning to use it for backups. Another advantage is that it is bundled with almost all major Linux distributions today.
The killer feature, really, is differential backups: rsync’s delta-transfer algorithm allows you to transfer only the changes made in a file/directory tree, instead of re-transferring all the data. This is very beneficial when synchronising large files or directory trees with gigabytes of data. rsync sends only the changed portions and applies them to the copy of the file/directory tree on the other system, somewhat like the patch utility. It can even be used to synchronise files locally (on the same system), if you want to make backups on the local machine itself (say, to a different drive, like a USB drive). Overall, it is a simple and efficient solution that does not require installing any complicated backup software.
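To see the differential transfer at work, here is a minimal sketch that syncs a file, changes a small part of it, and syncs again. The scratch directory and the file name big.bin are made up for illustration. One assumption worth stating: for local copies rsync normally sends whole files, so the sketch passes --no-whole-file to force the delta algorithm and --stats to report how much data was matched against the existing copy.

```shell
#!/usr/bin/env bash
set -euo pipefail
command -v rsync >/dev/null || { echo "rsync not installed; skipping demo"; exit 0; }

work=$(mktemp -d)                 # scratch area, hypothetical layout
trap 'rm -rf "$work"' EXIT
mkdir "$work/src" "$work/dest"

# 1 MiB of random data as the "large file"
head -c 1048576 /dev/urandom > "$work/src/big.bin"

# First run: the destination is empty, so the whole file is sent
rsync --times "$work/src/big.bin" "$work/dest/"

# Change a small part of the file: append 1 KiB
head -c 1024 /dev/urandom >> "$work/src/big.bin"

# Second run: force the delta algorithm and collect the statistics
stats=$(rsync --times --no-whole-file --stats "$work/src/big.bin" "$work/dest/")

# "Literal data" is what had to be sent, "Matched data" was reused from the old copy
printf '%s\n' "$stats" | grep -E '^(Literal|Matched) data'
```

On the second run the matched figure should account for most of the file, while the literal figure stays close to the size of the change.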
Setting up and configuring
Normally, rsync can be used directly by specifying source and destination directories, but we usually set it up in daemon mode (an “rsync server”) at one end, so that it can receive requests for synchronisation. It can be set up for one-way or two-way synchronisation, either as a standalone daemon or under inetd. Which of the two to use depends on the amount of traffic that the daemon is going to receive.
For significant traffic throughout the day, it is better to have a standalone daemon; otherwise, the inetd configuration will do. For two-way synchronisation, rsync has to run in daemon mode at both ends. To configure rsync in daemon mode, modify the /etc/rsyncd.conf file as follows:
motd file = /etc/rsyncd.motd
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock

[path_1]
    path = /directories/here
    comment = something to comment
    uid = nobody
    gid = nobody
    auth users = username
    secrets file = /etc/rsyncd.scrt
As you can see, this file has two sections: global parameters and path-specific settings. File specifications such as the motd (message of the day) file, log file, pid file and lock file are global parameters, while the parameters that follow the [path_1] tag are path-specific settings ([path_1] is the name of that particular path, which rsync calls a module). More than one such path can be set up in a configuration file. Though I have only specified the relevant and important settings here, you can take a look at the man page if you want a complete list of path-specific and global configuration parameters.
However, here is a brief explanation of some of them:
| path | The physical directory in the filesystem to be made available |
| uid | The user that should execute the actual transfer process |
| gid | The group name or ID that accompanies the uid parameter |
| auth users | The names of users, as specified in the secrets file, that are allowed to connect to this path. These do not refer to the actual users of the system. |
| secrets file | The name of the secrets file, which lists the user names and passwords that clients are asked for on connection |
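The secrets file itself is plain text, with one name:password pair per line. The sketch below creates one in a scratch location; the path and the password change-me are placeholders, and on a real server the file would live where the secrets file parameter points (/etc/rsyncd.scrt in the configuration above). By default the daemon applies strict modes and rejects a secrets file that other users can read, hence the chmod.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch location for a trial run; override SECRETS_FILE for a real setup
secrets_file="${SECRETS_FILE:-$(mktemp -d)/rsyncd.scrt}"

# One "name:password" pair per line; the name must match an entry in "auth users"
printf '%s:%s\n' "username" "change-me" > "$secrets_file"

# Make the file readable and writable by its owner only
chmod 600 "$secrets_file"

ls -l "$secrets_file"
```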
Now that rsync has been configured on the server, you can start the stand-alone daemon with the command:
rsync --daemon
The rsync server can now be used from the client with the following command:
rsync -avz host::path_1 /directory/at/destination
This command will copy all files from the remote module named path_1 to the local destination folder on the client. This type of command is known as a “pull” command, while one with the remote path as the destination is known as a “push” command. rsync prompts for the password configured for the respective module; in scripts, the prompt can be bypassed by using the --password-file option.
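As a rough sketch of how pull and push could be wrapped for use in scripts, the two functions below differ only in which side is the source. The host name backup.example.com, the password file location and the function names are hypothetical; the user and module names are the ones from the sample configuration. The password file should contain just the password on its first line and must not be readable by other users. A push will also only succeed if the module allows writing (modules are read-only unless read only = no is set).

```shell
#!/usr/bin/env bash
# Hypothetical connection details; adjust to your own setup
RSYNC_HOST="${RSYNC_HOST:-backup.example.com}"
RSYNC_USER="${RSYNC_USER:-username}"
RSYNC_MODULE="${RSYNC_MODULE:-path_1}"
RSYNC_PASSWORD_FILE="${RSYNC_PASSWORD_FILE:-$HOME/.rsync.pass}"

# Pull: the remote module is the source, a local directory is the destination
pull_module() {    # usage: pull_module <local-destination>
    rsync -avz --password-file="$RSYNC_PASSWORD_FILE" \
        "${RSYNC_USER}@${RSYNC_HOST}::${RSYNC_MODULE}" "$1"
}

# Push: a local directory is the source, the remote module is the destination
push_module() {    # usage: push_module <local-source>
    rsync -avz --password-file="$RSYNC_PASSWORD_FILE" \
        "$1" "${RSYNC_USER}@${RSYNC_HOST}::${RSYNC_MODULE}"
}
```

Defining the functions does not contact the server; a call such as pull_module /directory/at/destination does.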
Now, let’s look at another example, of a local copy of files, with some of the optional switches:
rsync --verbose --progress --stats --compress --recursive --times --perms \
    --links --delete --exclude '*.pdf' /home/user/Folder1/ /home/user/Folder2
Some of the options, like --progress and --stats, are self-explanatory. Compression (--compress) is one of the features rsync provides to speed up data transfers; it is mainly useful when synchronising with remote hosts, and gains little for a local copy like this one. The next few options specify the following:
--recursive: to recursively copy all sub-folders
--times: to maintain previous timestamps
--perms: to keep the permissions intact
--links: to copy symbolic links as links (not to de-reference them)
There are a number of other choices that you can use, like --copy-links and --safe-links: the first copies the linked directories/files instead of the links themselves, while the second altogether ignores links pointing outside the tree.
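The --delete option specifies that files in the destination folder should be deleted if they no longer exist in the source directory, for instance because they have been removed there since the previous synchronisation. Note that this applies to anything in the destination that is missing from the source, not only to files rsync copied earlier. Finally, --exclude '*.pdf' prevents all PDF files from being copied; the quotes keep the shell from expanding the pattern before rsync sees it.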
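The effect of --delete and --exclude is easy to try out safely on throwaway data. The following sketch uses a scratch directory in place of /home/user, and the file names are invented for illustration. It runs the same set of options twice, with a change to the source in between.

```shell
#!/usr/bin/env bash
set -euo pipefail
command -v rsync >/dev/null || { echo "rsync not installed; skipping demo"; exit 0; }

work=$(mktemp -d)                 # scratch area standing in for /home/user
trap 'rm -rf "$work"' EXIT
src="$work/Folder1"
dst="$work/Folder2"
mkdir -p "$src/sub" "$dst"
echo "notes"  > "$src/notes.txt"
echo "report" > "$src/sub/report.pdf"

sync_folders() {
    # Quoting '*.pdf' hands the pattern to rsync instead of the shell
    rsync --recursive --times --perms --links --delete \
        --exclude '*.pdf' "$src/" "$dst"
}

sync_folders                      # first run: copies everything except PDFs

rm "$src/notes.txt"               # remove one file from the source...
echo "new" > "$src/new.txt"       # ...and add another

sync_folders                      # second run: picks up both changes

# Show what ended up in the destination
(cd "$dst" && find . | sort)
```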
Note the trailing slash on the source: /home/user/Folder1/ means that the contents (files and sub-folders) inside Folder1 are to be copied to the destination folder. Without the trailing slash, rsync first creates the Folder1 folder at the destination, and then copies Folder1’s contents into that new folder. Always keep this in mind, else you can end up with quite a mess to sort out if you’re backing up two or three project folders, like projA, projB and projC (in /home/ankit/Projects). Suppose, for example, that your rsync commands are as follows:
rsync --recursive --delete /home/ankit/Projects/projA/ /media/BACKUP/currentprojects
rsync --recursive --delete /home/ankit/Projects/projB/ /media/BACKUP/currentprojects
rsync --recursive --delete /home/ankit/Projects/projC /media/BACKUP/currentprojects
Now, in the currentprojects folder on the backup drive, you will have a folder projC, containing that project’s files. The files and sub-folders from projA and projB, however, are copied directly into the currentprojects folder, not into individual projA and projB folders, and this is because you wrongly used the trailing slash for those folders. Without --delete, the two projects would simply be mixed together; with it, each of the first two commands also removes whatever the other one left in currentprojects. If you have to recover files from this backup, imagine the mess and confusion, and the effort required to sort things out!
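If you would rather see this for yourself before trusting a real backup to it, the sketch below replays the three commands on throwaway folders. The scratch directory stands in for /home/ankit/Projects and /media/BACKUP, and the file names are invented. It then repeats the backup without any trailing slashes into a second, hypothetical folder named fixed.

```shell
#!/usr/bin/env bash
set -euo pipefail
command -v rsync >/dev/null || { echo "rsync not installed; skipping demo"; exit 0; }

work=$(mktemp -d)                 # scratch area standing in for the real paths
trap 'rm -rf "$work"' EXIT
projects="$work/Projects"
backup="$work/BACKUP/currentprojects"
mkdir -p "$projects/projA" "$projects/projB" "$projects/projC" "$backup"
echo a > "$projects/projA/a.txt"
echo b > "$projects/projB/b.txt"
echo c > "$projects/projC/c.txt"

# The three commands from the article: trailing slash on projA and projB only
rsync --recursive --delete "$projects/projA/" "$backup"
rsync --recursive --delete "$projects/projB/" "$backup"
rsync --recursive --delete "$projects/projC"  "$backup"

echo "With trailing slashes on projA and projB:"
(cd "$backup" && find . -type f | sort)

# The same sources without trailing slashes: each project gets its own folder
fixed="$work/BACKUP/fixed"
mkdir -p "$fixed"
for p in projA projB projC; do
    rsync --recursive --delete "$projects/$p" "$fixed"
done

echo "Without trailing slashes:"
(cd "$fixed" && find . -type f | sort)
```

Comparing the two listings shows which projects kept their own folder, and which files survived the --delete passes.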
There are a number of GUI tools available for rsync, most of which obviously cannot match the flexibility provided by the command line itself. Still, some of the tools, like Grsync and Gadmin-Rsync, do a good job of implementing the basic functionality. Another really good application based on the rsync and diff commands is Back in Time, which resembles the Time Machine feature of Apple’s Mac OS. It provides more sophisticated features, and an interface understandable even by non-geeks.
In conclusion, we can say that rsync is a simple, yet really powerful application, which when used with tools like cron can provide flexibility and performance comparable with complex and even enterprise back-up tools.
The author is a geek with a crush on Java, and also loves flirting with almost all other stuff related to Web technologies. Feel free to poke fun at his articles and direct your feedback to the comments section below.