Backups and More with rsync

Time to rsync

Learn how to use this powerful utility, which many experienced administrators rely on in their day-to-day work, for backups and much more.

The rsync utility works across platforms: Linux, Mac OS X and Windows (with Cygwin). In combination with cron and SSH, it can easily be scripted. This makes it an essential utility to have in your toolkit, even if you do not plan to use it for backups. Another advantage is that it is bundled with almost all major Linux distributions today.

The killer feature, really, is differential backups. rsync's delta-transfer algorithm lets you send only the changes made in a file or directory tree, instead of re-transferring all the data. This is very beneficial when synchronising large files or directory trees holding gigabytes of data. rsync transfers only the changed portions and applies them to the copy on the other system, somewhat like the patch utility. It can also synchronise files locally (on the same system), if you want to make backups on the local machine itself (say, to a different drive, like a USB drive). Overall, it is a simple and efficient solution that does not require installing any complicated backup software.

Setting up and configuring

Normally, rsync can be used directly by specifying source and destination directories, but it is often set up in daemon mode (an “rsync server”) at one end, so that it can receive synchronisation requests. Synchronisation can be one-way or two-way, and the daemon can run standalone or be launched through inetd. Which configuration to use depends on the amount of traffic the daemon is going to receive.

For significant traffic throughout the day, a standalone daemon is better; otherwise, the inetd configuration will do. For two-way synchronisation, one approach is to run rsync in daemon mode at both ends. To configure rsync in daemon mode, edit the /etc/rsyncd.conf file as follows:

# Global parameters
motd file = /etc/rsyncd.motd
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock

# Path-specific settings; clients refer to this path by the name path_1
	[path_1]
	path = /directories/here
	comment = something to comment
	uid = nobody
	gid = nobody
	auth users = username
	secrets file = /etc/rsyncd.scrt

As you can see, this file has two sections: global parameters and path-specific settings. File specifications such as the motd (message of the day) file, log file, pid file and lock file are global parameters, while the parameters that follow the [path_1] tag are path-specific settings ([path_1] is the name of that particular path, which rsync calls a module). More than one such module can be set up in a configuration file. I have only specified the relevant and important settings here; take a look at the rsyncd.conf man page if you want a complete list of path-specific and global configuration parameters.

Here is a brief explanation of some of them:

Parameter      Explanation
path           The directory in the filesystem to be made available
uid            The user name or ID that the transfer process runs as
gid            The group name or ID that accompanies the uid parameter
auth users     The user names, as listed in the secrets file, that are allowed to connect to this path. These need not be actual users of the system.
secrets file   The file containing the user names and passwords against which connecting clients are authenticated
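The secrets file itself is not shown above. It holds one user:password pair per line, and with rsync's default strict modes setting the daemon refuses to use it if other users can read it. A minimal sketch of creating one follows; the SECRETS_FILE variable, the scratch path and the placeholder password are assumptions for illustration.

```shell
#!/bin/sh
# SECRETS_FILE defaults to a scratch path so the sketch can be tried safely;
# the configuration above expects /etc/rsyncd.scrt (written as root).
SECRETS_FILE="${SECRETS_FILE:-./rsyncd.scrt}"

# Files created from here on are readable and writable by the owner only.
umask 077

# One "user:password" pair per line; the password here is a placeholder.
printf '%s:%s\n' "username" "change-me" > "$SECRETS_FILE"

# Tighten the mode explicitly as well, in case the file already existed.
chmod 600 "$SECRETS_FILE"
```

The user name must match an entry in auth users for the path in question.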

Now that rsync has been configured on the server, you can start the standalone daemon with the command: rsync --daemon
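The inetd alternative mentioned earlier is not spelled out in this article. As a sketch following the example given in the rsyncd.conf man page, it amounts to one line in /etc/services and one in /etc/inetd.conf. The binary location /usr/bin/rsync is an assumption, so check where rsync is installed on your system.

```
# /etc/services
rsync           873/tcp

# /etc/inetd.conf (a single line)
rsync   stream  tcp     nowait  root   /usr/bin/rsync rsyncd --daemon
```

After editing these files, inetd has to re-read its configuration, typically by being sent a HUP signal.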

rsync client

The rsync server can now be used from the client with the following command:

rsync -avz host::path_1 /directory/at/destination

This command copies all files from the remote module named path_1 to the local destination folder on the client. A command of this type is known as a “pull”, while one with the remote path as the destination is known as a “push”. Because the module lists auth users, rsync prompts for the password configured for that module; scripts can avoid the prompt by using the --password-file option.
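For scripted use, the pull and push variants can be wrapped as below. This is only a sketch: the host name, user name, password-file location and function names are hypothetical. The password file contains just the password and must not be readable by other users. Pushing also assumes that the module has been made writable with read only = no, which the configuration shown earlier does not do.

```shell
#!/bin/sh
# Hypothetical values; adjust them to your own server and module.
RSYNC_HOST="backup.example.com"
RSYNC_USER="username"
PASSWORD_FILE="$HOME/.rsync_pass"   # holds only the password, mode 600

# Pull: copy the remote module path_1 into a local directory ($1).
pull_from_module() {
    rsync -avz --password-file="$PASSWORD_FILE" \
        "$RSYNC_USER@$RSYNC_HOST::path_1" "$1"
}

# Push: copy a local directory ($1) into the remote module path_1.
# Works only if the module is configured with "read only = no".
push_to_module() {
    rsync -avz --password-file="$PASSWORD_FILE" \
        "$1" "$RSYNC_USER@$RSYNC_HOST::path_1"
}
```

A call such as pull_from_module /directory/at/destination then runs without prompting, which is what makes it suitable for a script run from cron.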

Now, let’s look at another example, of a local copy of files, with some of the optional switches:

rsync --verbose --progress --stats --compress --recursive --times --perms \
--links --delete --exclude '*.pdf' /home/user/Folder1/ /home/user/Folder2

Some of the options, like --verbose, --progress and --stats, are self-explanatory. Compression (--compress) is one of the features rsync provides to speed up data transfers; it is mainly useful when synchronising with remote hosts and brings little benefit to a local copy like this one. The next few options specify the following:

  • --recursive: to recursively copy all sub-folders
  • --times: to preserve modification timestamps
  • --perms: to keep the permissions intact
  • --links: to copy symbolic links as links (not to de-reference them)

There are a number of other choices that you can use, like --copy-links and --safe-links — the first copies the linked directories/files instead of the links themselves, while the second altogether avoids links pointing outside the tree.

The --delete option deletes (previously copied) files in the destination folder if they have been deleted from the source directory since the previous synchronisation. Finally, the --exclude option with the pattern *.pdf prevents all PDF files from being copied. The pattern should be quoted on the command line ('*.pdf') so that the shell passes it to rsync unexpanded.

Note: The presence or absence of the trailing slash in the source path is very important; it makes quite a difference to how files are copied. In the above command, the trailing slash in the source, /home/user/Folder1/, means that the contents (files and sub-folders) inside Folder1 are to be copied to the destination folder. Without the trailing slash, rsync first creates the Folder1 folder at the destination, and then copies Folder1’s contents into that new folder. Always keep this in mind, or you can end up with quite a mess when backing up two or three project folders, such as projA, projB and projC in /home/ankit/Projects. Suppose, for example, that your rsync commands are as follows:

rsync --recursive --delete /home/ankit/Projects/projA/ /media/BACKUP/currentprojects
rsync --recursive --delete /home/ankit/Projects/projB/ /media/BACKUP/currentprojects
rsync --recursive --delete /home/ankit/Projects/projC /media/BACKUP/currentprojects

Now, in the currentprojects folder on the backup drive, you will have a folder projC containing that project’s files, but the contents of projA and projB will have been written directly into currentprojects rather than into individual projA and projB folders, because of the trailing slash on those source paths. Worse, since --delete removes anything at the destination that is absent from the source, the second command deletes the projA files that the first one copied. If you have to recover files from this backup, imagine the mess and confusion, and the effort required to sort things out!

GUI tools

There are a number of GUI tools available for rsync, most of which obviously cannot match the flexibility provided by the command-line itself. Still, some of the tools, like Grsync and Gadmin-Rsync, do a good job of implementing the basic functionality. Another really good application based on the rsync, cron and diff commands is Back in Time, which resembles the Time Machine feature of Apple’s Mac OS. It provides more sophisticated features, and an interface understandable even by non-geeks.

The Interface for Grsync

The GNOME view for Back In Time Application

In conclusion, rsync is a simple yet really powerful application which, when used with tools like cron, can provide flexibility and performance comparable to those of complex, even enterprise-grade, backup tools.

1 COMMENT

  1. Nice run-down. If I may add: if syncing between Linux and Windows, one could either run the Cygwin setup and install rsync (under +Net), optionally with the “cygrunsrv” Windows service initiator (under +Admin), or use the freeware DeltaCopy program, which is basically just a wrapper around rsync built with Cygwin. To date, though, DeltaCopy still seems to have issues with Unicode characters, probably because it uses old Cygwin DLLs; my fix was to replace six DLLs in the program’s directory with much more recent versions from the bin directory of a minimal Cygwin install.