Hints about Backup Strategies Under Mac OS X

Some time ago I started making backups from the command line, with a command similar to:

sudo time /usr/local/bin/rsync -aEv --progress --delete-excluded \
/Users/olaf /Volumes/Chimera/olaf/Backup/ --exclude "olaf/Movies/**.avi" \
--exclude "olaf/Library/Caches" --exclude "olaf/Music" \
--exclude "olaf/Movies" --exclude "olaf/Downloads"

The result was that the same backup folder was updated in place at each run.

After the discussion in the MacityNet forum about Apple's WWDC 2006, I thought about using Subversion to make my backups, executing a "commit" every time I wanted an updated backup. This way I would be able to go back in time and retrieve every version of every file, similar to what "Time Machine" does in Leopard.
This would be great, also because Subversion stores only the differences between revisions rather than whole files, thus saving space on the hard disk. It even works with binary files.

Well, this can't be done. The problem, this time, is not metadata (an issue you always have to check when using traditional UNIX tools under Mac OS X), but the space taken on the hard disk! It has been SIX years since the Subversion developers were asked to add an option to "consolidate" (or "obliterate") versions older than a specified date to save disk space, but still nothing; without it, the repository can only grow. You can read the discussion here. Only recently has something started moving.

Anyway, Subversion is useless for this purpose for the time being. Another option would be CVS, but it doesn't work well with binary files and has no metadata support.

A tool that can make good backups is rsync, when used as shown here: Backups using rsync and Easy Automated Snapshot-Style Backups with Rsync. The idea is interesting: you get something between a multiple backup system, where you have an independent directory for each backup, and file-level incremental backups (similar to what Subversion and Time Machine can do).

I then wrote the following script to automate the work and to create, for each backup, a new folder named after the current date and time:

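#!/usr/bin/env sh

# Timestamp used as the name of this backup's folder, e.g. 2007.05.30-14.05
NOW=$(date +"%Y.%m.%d-%H.%M")

# Copy the home folder into a new dated folder. Files unchanged since the
# previous backup ("Last") are hard-linked instead of being copied again.
# Only if rsync succeeds is the "Last" symlink moved to the new backup.
/usr/local/bin/rsync -aEi --progress --ea-checksum \
--link-dest="../Last" /Users/olaf "/Volumes/Ercole/olaf/Backup/$NOW" \
--exclude="/olaf/.Trash/" --exclude="/olaf/Library/Caches/" \
--exclude="/olaf/Downloads/" --exclude="/olaf/Movies/" --exclude="/olaf/Music/" \
&& ln -fhs "/Volumes/Ercole/olaf/Backup/$NOW" /Volumes/Ercole/olaf/Backup/Last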

The folder "Last" is a symbolic link to the latest backup. Under Mac OS X Tiger, you may want to check the following rsync patch. The option --ea-checksum is taken from it.

Using hard links to store unchanged files is a good idea because they use almost no space but still allow the user to browse the old folders, while Subversion and Time Machine require specific tools to browse previous backups. The downside of this approach is that a file is either hard-linked unchanged or copied entirely, so a single changed character is enough to require the whole file to be stored again: rsync cannot store changes, only whole files. A way to save some more space (useful only when working with big files that change often) is rdiff-backup, which works similarly to rsync but stores only the changes, as Subversion does. It does not, however, allow the user to freely browse old backups as the script above does.
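
The hard-link behaviour described above can be checked by hand. The following is only a small sketch, not part of my backup script: the temporary folder and the file name notes.txt are made up for the demonstration, and plain ln stands in for what rsync --link-dest does with unchanged files.

```shell
#!/bin/sh
# Demonstration only: all paths and file names here are made up.
demo=$(mktemp -d /tmp/hardlink-demo.XXXXXX)
mkdir "$demo/backup1" "$demo/backup2"

# The first "backup" holds the real file.
echo "some content" > "$demo/backup1/notes.txt"

# The second "backup" gets a hard link instead of a copy, which is
# what rsync --link-dest does for files that have not changed.
ln "$demo/backup1/notes.txt" "$demo/backup2/notes.txt"

# Both names refer to the same inode, so the data is stored only once.
inode1=$(ls -i "$demo/backup1/notes.txt" | awk '{print $1}')
inode2=$(ls -i "$demo/backup2/notes.txt" | awk '{print $1}')
echo "backup1 inode: $inode1, backup2 inode: $inode2"

# Deleting the older backup leaves the newer one intact.
rm -r "$demo/backup1"
cat "$demo/backup2/notes.txt"
```

The two inode numbers should be identical, and the file should still be readable after the first folder is gone. This is why each dated backup folder can be browsed, or deleted, independently of the others.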

Update August 2007

I have seen a Leopard screenshot with the preferences of Time Machine.

It seems that Time Machine will make hourly backups for up to a day, then daily backups, then weekly backups. In other words, it will be just like an automated rsync script, as in my example, with the addition of an automatic removal of old backups.
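
The automatic removal of old backups could be approximated for the rsync script too. What follows is only a rough sketch under my own assumptions, not what Time Machine actually does: the function name list_old_backups is hypothetical, it relies on folder names in the 2007.05.30-14.05 format (which sort by date), and it simply keeps the newest N snapshots instead of thinning them into hourly, daily and weekly ones.

```shell
#!/bin/sh
# Hypothetical helper: print the dated snapshot folders in directory $1
# that are NOT among the newest $2 ones. It only lists, it never deletes.
list_old_backups() {
    backup_dir=$1
    keep=$2
    # Match dated folders only (the "Last" symlink does not fit the pattern),
    # sort newest first, then skip the first $keep lines.
    ls -1d "$backup_dir"/[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]-[0-9][0-9].[0-9][0-9] 2>/dev/null \
        | sort -r \
        | tail -n +"$(($keep + 1))"
}
```

For example, list_old_backups /Volumes/Ercole/olaf/Backup 24 would print every snapshot except the 24 newest; feeding that list to a loop such as while read -r dir; do rm -rf "$dir"; done would remove them. Since unchanged files are hard links, deleting an old snapshot does not damage the newer ones.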

This will make it fast and very useful.

Update January 2008

I had to replace my hard drive, so I used the backup made with rsync. Well, all the data is there, but the custom icons of many folders are lost, so I think rsync did not do a very good job with extended attributes (metadata). Not a big problem this time, of course, but it could have been worse if old applications had been involved.

Now I have switched to Time Machine, which is integrated in Leopard: automatic, faster, and with automatic management of old backups.

First version: 30th May 2007.

Author: Olaf Marzocchi