JRDN
Jason Roysdon dot Net

Preserving long-term backups with minimal storage

January 22nd 2010 in Linux

There are many tools and many ways to back up systems. I find that the simpler something is, the less likely it is to fail, and if it does fail, the more easily I can sort out what is broken.

There is no backup simpler at its core than a simple copy (cp in *nix or xcopy in Windows) with the right option flags. However, for many things this is too simplistic and isn't the most efficient.
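As a minimal sketch of the "simple copy" approach, assuming GNU cp and touch: the -a flag is the kind of option that matters here, since it preserves timestamps, permissions, and symlinks. The directories are throwaway temp dirs, not real backup locations.

```shell
#!/bin/bash
# Sketch: a plain archival copy with GNU cp. Paths are throwaway temp dirs.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "hello" > "$src/file.txt"
touch -d '2009-12-31 12:00:00' "$src/file.txt"

# -a (archive) copies recursively and preserves mode, timestamps, symlinks,
# and ownership where permitted. "$src/." copies the contents of src.
cp -a "$src/." "$dst/"

# Both lines should show the same modification time.
stat -c '%Y %n' "$src/file.txt" "$dst/file.txt"
```

A plain cp without -a would stamp the copy with the current time, which makes later comparisons between source and backup harder.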

Another very simple method is rsync. Using rsync to compare and make exact copies works great. The only downside is that if something breaks (operator error, update problems, etc.) and you don't know it is broken, continuing with plain rsync backups pushes that problem right on top of your only backup. By the time you realize there is a problem, it's too late.

Building on that simplicity, the best-functioning tool I have found is rsnapshot. It is really just a Perl program that uses rsync and other standard commands (mv, rm, etc.) to create multiple copies without consuming much disk space: files that have not changed between backups are hard links to the same data on disk. Each backup is complete by itself, but the total size taken by all of the backups is only the first backup plus the changes in each additional backup.
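The space saving comes from hard links. The following is a rough sketch of that idea using only GNU coreutils; it imitates the effect, not rsnapshot's actual code, and the directory names merely mimic rsnapshot's inside a temp dir.

```shell
#!/bin/bash
# Sketch of the hard-link idea behind rsnapshot-style snapshots.
root=$(mktemp -d)
mkdir "$root/daily.0"
echo "unchanged data" > "$root/daily.0/a.txt"
echo "version 1"      > "$root/daily.0/b.txt"

# "Rotate": make a hard-link copy of the newest snapshot.
# Only directory entries are created; no file data is duplicated.
cp -al "$root/daily.0" "$root/daily.1"

# New backup run: a changed file is replaced with a new file (new inode)
# rather than edited in place, so the older snapshot keeps its contents.
echo "version 2" > "$root/daily.0/b.txt.tmp"
mv "$root/daily.0/b.txt.tmp" "$root/daily.0/b.txt"

# a.txt is a single inode shared by both snapshots (link count 2).
stat -c '%h links  %n' "$root"/daily.*/a.txt
# b.txt differs between the two snapshots.
cat "$root/daily.1/b.txt" "$root/daily.0/b.txt"
```

The important detail is that changed files are replaced, never modified in place; editing a hard-linked file directly would change every snapshot that shares it.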

Using rsnapshot provides the ability to go back to a given hour, day, week, month, year, and restore something from one of those backups.

The downside to rsnapshot is that, by design, it is meant to run as an automated cron job with the storage always connected. However, my backup destinations are not always available, my connection is not always the fastest, and the network I'm on is not always convenient to back up from. I use multiple external hard drives that I rotate through and keep offline and totally disconnected when not backing up. I often connect via my cellphone at relatively slow speeds, and I'm very often onsite at customers or at public Wi-Fi hotspots. Sometimes I just don't want to bog down my connection with a remote backup because I'm providing remote support at that moment. There are many reasons why I don't want the backups running automatically. Plus, there are times when I want to force a backup right before a major change and then run another right after it. All of these deviate from the way rsnapshot is intended to be run.

But I still want the ability to go back up to 7 days, 4 weeks, 12 months, and multiple years in time. I like the way that rsnapshot automates all of this and just works, and I don't want to reinvent that wheel.

The great thing about Open Source Software (OSS) is that you can take what you want and use it the way you like. I came up with a method that lets me start rsnapshot manually but still have 7 days + 4 weeks + 12 months + 2 years of backups at all times.
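That retention scheme maps onto four lines of an rsnapshot config. The sketch below writes only that fragment to a temp file; the snapshot_root path is a made-up example, and a working config also needs command paths and backup points. Current rsnapshot versions call the directive retain, while older configs use interval for the same thing.

```shell
#!/bin/bash
# Sketch: write the retention part of a hypothetical rsnapshot config.
# rsnapshot requires TAB characters between fields, hence printf '\t'.
conf=$(mktemp)
{
  printf 'config_version\t1.2\n'
  printf 'snapshot_root\t/mnt/external/.snapshots/\n'   # hypothetical path
  printf 'no_create_root\t1\n'   # do not create the root if the drive is absent
  # Levels are listed from most frequent to least frequent.
  printf 'retain\tdaily\t7\n'
  printf 'retain\tweekly\t4\n'
  printf 'retain\tmonthly\t12\n'
  printf 'retain\tyearly\t2\n'
} > "$conf"
cat "$conf"
```

Setting no_create_root is a sensible companion for removable drives, since it keeps rsnapshot from writing a fresh snapshot tree onto the bare mount point when the drive is not connected.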

At first I was simply running rsnapshot by hand. The problem is that when you run a weekly backup, rsnapshot expects that you have already run 7 daily backups (daily.0 - daily.6). The same goes for a monthly backup, which expects 4 weekly backups (weekly.0 - weekly.3), and so on.

While I could have just run 1 "weekly" backup after every 7 "daily" backups, and so on, this was becoming a pain to keep track of. Further, if I wanted to go back 3 weeks or 5 months or whatever, it didn't really work out, since I usually had fewer backups than that. It would really just have resulted in 25 (7+4+12+2) backups with no real time/date preservation tied to them.

So the first thing I did was create a method to check that daily.6, weekly.3, monthly.11, and yearly.0 all exist. If they don't, it runs through the iterations to create all of them. All 7+4+12+1 backups are then identical (but 2 or more identical backups take up virtually no more space than 1 does).

Having this in place allows me to do some extra hacking of rsnapshot. Next, I check whether weekly.0 is 7 days old or more. If it is, I force daily.6 over to weekly.0. If monthly.0 is 30 days old or more, I force weekly.3 over to it, and so on.
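The age test can be sketched as a small helper. This assumes GNU stat, date, and touch; is_older_than is a hypothetical name, and the demo directory is a temp dir whose timestamp is pushed into the past to stand in for weekly.0.

```shell
#!/bin/bash
# Sketch: is a snapshot directory older than a given number of seconds?
is_older_than() {
  local path=$1 max_age=$2 now mtime
  now=$(date +%s)
  mtime=$(stat -c %Y "$path") || return 2   # missing path: report an error
  (( now - mtime > max_age ))               # exit status 0 means "older"
}

# Demo: a directory last modified 8 days ago is due for a weekly rotation.
demo=$(mktemp -d)
touch -d '8 days ago' "$demo"
if is_older_than "$demo" $((7 * 86400)); then
  echo "weekly rotation is due"
fi
```

Because rsnapshot rotates snapshot directories by renaming them, a directory's modification time reflects when that backup was taken, not when it was rotated, which is what makes a check like this meaningful.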

What this does is force weekly, monthly, and yearly backups that are updated on those intervals, but no sooner. At the same time, it allows me to run any number of daily backups while keeping just 7, never spilling over into weekly until the last weekly is in fact 1 week old.

To put it another way, weekly backups pull from the daily once per week, monthly backups pull from weekly once per month, etc., but the daily never pushes into weekly, weekly never pushes into monthly, etc.

As an aside, I use a two-step method: a stand-alone rsync first, then rsnapshot run against the location that rsync wrote to. This allows me to rsync the data from one location to a backup host when both have the bandwidth available, and to run rsnapshot later on when I have the physical external drive available (or immediately afterwards, if I have both).

I use rsync over ssh to encrypt my backups while in transit and push them to my backup hosts. One of my backup hosts is just my laptop, which may be anywhere in the world, but it uses an OpenVPN SSL/TLS-based VPN to obtain a publicly routable IPv6 address. Others are co-located servers. I can then point any IPv6-connected host to back up to any of my backup hosts, no matter where they may be. I actually have 3-4 backups of each of my systems on different hosts/physical storage media in varying physical locations (some of them portable that I keep with me, like 1TB external USB drives).
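A push of that kind might look roughly like the sketch below. The user, the IPv6 address (taken from the documentation prefix 2001:db8::/32), and both paths are placeholders, and the flag selection is one reasonable choice rather than the exact command used here. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/bash
# Sketch: step one of the two-step method, pushing data to a backup host
# with rsync over ssh. Host, user, and paths are hypothetical placeholders.
src=/home/                                              # trailing slash: copy contents
dest='backup@[2001:db8::10]:/srv/backups/laptop/home/'  # IPv6 literal in brackets

# -a archive mode, -z compress in transit, --delete mirror removals,
# --partial keep partly transferred files so a slow link can resume.
cmd=(rsync -az --delete --partial -e ssh "$src" "$dest")

# Print the command by default; set RUN=1 to actually execute it.
if [[ ${RUN:-0} == 1 ]]; then
  "${cmd[@]}"
else
  printf '%q ' "${cmd[@]}"
  echo
fi
```

Note that --delete makes the destination an exact mirror, so this step alone has the "push the problem onto your only backup" risk described earlier; the history comes from the rsnapshot step that follows.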

Once the rsync over ssh to a backup host has completed and I have the external drive connected, I just kick off an rsnapshot run using my custom script, which handles the daily/weekly/monthly/yearly rotations for each individual backup (one for each server/computer that I back up).
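With one config per machine, the runs can be driven by a small loop. This is only a sketch: run_all_snapshots is a hypothetical helper, the paths in the example call are made up, and "drive connected" is approximated by checking that the snapshot root directory exists.

```shell
#!/bin/bash
# Sketch: run the rotation script once per backed-up machine, but only
# when the snapshot root on the external drive is present.
run_all_snapshots() {
  local snapshot_root=$1 conf_dir=$2 script=$3 conf
  if [[ ! -d $snapshot_root ]]; then
    echo "snapshot root $snapshot_root not found; is the drive connected?" >&2
    return 1
  fi
  for conf in "$conf_dir"/*.conf; do
    [[ -e $conf ]] || continue        # no configs: the glob stays literal
    "$script" "$conf" || echo "snapshot failed for $conf" >&2
  done
}

# Example call (not executed here; all paths are hypothetical):
#   run_all_snapshots /mnt/external/.snapshots /etc/rsnapshot.d ./jjr-snapshot.sh
```

A failure for one machine is reported but does not stop the loop, so one bad config does not block the backups for the others.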

#!/bin/bash

# jjr-snapshot.sh 2010-01-22 1340 jjr
#
# The purpose of this script is to run rsnapshot at non-automated times,
# while preserving the features of a "daily, weekly, monthly, yearly"
# rotation/preservation.
#
# This is done in two steps:
# 1. Do all the required backups exist (daily.6, weekly.3, monthly.11, yearly) and if not, create them.
# 2. Is INTERVAL older than it should be, requiring a rotation?
#
# Step 1.
# weekly requires daily.6, if it doesn't exist, run daily enough times to produce it
# monthly requires weekly.3, if it doesn't exist, run weekly enough times to produce it
# yearly requires monthly.11, if it doesn't exist, run monthly enough times to produce it
#
# Step 2.
# Check to see if it has run this interval (year, month, week, day)
#

if [[ -z "$1" ]]; then
    echo
    echo "You must specify a valid rsnapshot config file path:"
    echo
    echo " jjr-snapshot.sh /path/to/rsnapshot.conf"
    echo
else

    # set rconfig, rpath
    rconfig="$1"
    #
    # find the snapshot_root statement in the config:
    #   grep snapshot_root "$rconfig"
    # ignore commented-out lines:
    #   | grep -v '#'
    # print this path:
    #   | awk '{ print $2 }'
    # remove the trailing "/" in the path (assuming the default .snapshots path):
    #   | sed 's/\.snapshots\//.snapshots/'
    rpath="$(grep snapshot_root "$rconfig" | grep -v '#' | awk '{ print $2 }' | sed 's/\.snapshots\//.snapshots/')"

    # Without a snapshot_root the checks below would test paths like /yearly.0.
    if [[ -z "$rpath" ]]; then
        echo "No snapshot_root found in $rconfig" >&2
        exit 1
    fi

    echo "rconfig = $rconfig"
    echo "rpath = $rpath"

    # Make sure there is a backup for yearly:
    while [ ! -e "$rpath/yearly.0" ]; do
        while [ ! -e "$rpath/monthly.11" ]; do
            while [ ! -e "$rpath/weekly.3" ]; do
                while [ ! -e "$rpath/daily.6" ]; do
                    rsnapshot -V -c "$rconfig" daily
                done
                rsnapshot -V -c "$rconfig" weekly
            done
            rsnapshot -V -c "$rconfig" monthly
        done
        rsnapshot -V -c "$rconfig" yearly
    done

    # Make sure there are backups for monthly:
    while [ ! -e "$rpath/monthly.11" ]; do
        while [ ! -e "$rpath/weekly.3" ]; do
            while [ ! -e "$rpath/daily.6" ]; do
                rsnapshot -V -c "$rconfig" daily
            done
            rsnapshot -V -c "$rconfig" weekly
        done
        rsnapshot -V -c "$rconfig" monthly
    done

    # Make sure there are backups for weekly:
    while [ ! -e "$rpath/weekly.3" ]; do
        while [ ! -e "$rpath/daily.6" ]; do
            rsnapshot -V -c "$rconfig" daily
        done
        rsnapshot -V -c "$rconfig" weekly
    done

    # Make sure there are backups for daily:
    while [ ! -e "$rpath/daily.6" ]; do
        rsnapshot -V -c "$rconfig" daily
    done

    # day: 86400
    # week: 604800
    # 30-day month: 2592000
    # 365-day year: 31536000

    # Check to see if latest yearly is older than 1 year, and if so, cause a yearly rotation:
    if [ "$(( $(date +"%s") - $(stat -c "%Y" "$rpath/yearly.0") ))" -gt "31536000" ]; then
        rsnapshot -V -c "$rconfig" yearly
    fi

    # Check to see if latest monthly is older than 1 month, and if so, cause a monthly rotation:
    if [ "$(( $(date +"%s") - $(stat -c "%Y" "$rpath/monthly.0") ))" -gt "2592000" ]; then
        rsnapshot -V -c "$rconfig" monthly
    fi

    # Check to see if latest weekly is older than 1 week, and if so, cause a weekly rotation:
    if [ "$(( $(date +"%s") - $(stat -c "%Y" "$rpath/weekly.0") ))" -gt "604800" ]; then
        rsnapshot -V -c "$rconfig" weekly
    fi

    # Daily is always run (even if it just ran), since the script has been called:
    rsnapshot -V -c "$rconfig" daily

# closing out the if statement for $1 at the top:
fi
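The four "make sure there are backups" blocks in the script repeat the same pattern, so they could also be written as one recursive function. The sketch below is an alternative under stated assumptions, not the script above: ensure_snapshot and take_snapshot are hypothetical names, and take_snapshot is a stand-in that only mimics rsnapshot's directory rotation in a temp dir so the example runs without rsnapshot installed.

```shell
#!/bin/bash
# Sketch: the nested bootstrap loops expressed as one recursive function.
declare -A need=( [daily]=6 [weekly]=3 [monthly]=11 [yearly]=0 )  # index that must exist
declare -A keep=( [daily]=7 [weekly]=4 [monthly]=12 [yearly]=2 )  # retention per level
declare -A below=( [weekly]=daily [monthly]=weekly [yearly]=monthly )

# Stand-in for rsnapshot: rotates directories only, copies no data.
# In real use the body would be: rsnapshot -V -c "$rconfig" "$1"
take_snapshot() {
  local level=$1 lower=${below[$1]:-} i
  rm -rf "$rpath/$level.$(( ${keep[$level]} - 1 ))"       # drop the oldest
  for (( i = ${keep[$level]} - 2; i >= 0; i-- )); do      # shift the rest up
    if [[ -e $rpath/$level.$i ]]; then
      mv "$rpath/$level.$i" "$rpath/$level.$(( i + 1 ))"
    fi
  done
  if [[ -z $lower ]]; then
    mkdir "$rpath/$level.0"                               # lowest level: new backup
  elif [[ -e $rpath/$lower.$(( ${keep[$lower]} - 1 )) ]]; then
    mv "$rpath/$lower.$(( ${keep[$lower]} - 1 ))" "$rpath/$level.0"   # pull oldest
  fi
}

# Create snapshots until "$level.${need[$level]}" exists, filling the level
# below first whenever it has nothing left to hand over.
ensure_snapshot() {
  local level=$1 lower=${below[$1]:-} tries=0
  while [[ ! -e $rpath/$level.${need[$level]} ]]; do
    # Guard against spinning forever if snapshots never appear.
    (( ++tries <= 50 )) || { echo "giving up on $level" >&2; return 1; }
    if [[ -n $lower ]]; then
      ensure_snapshot "$lower" || return 1
    fi
    take_snapshot "$level" || return 1
  done
}

rpath=$(mktemp -d)
for level in yearly monthly weekly daily; do
  ensure_snapshot "$level" || exit 1
done
ls "$rpath" | wc -l    # prints 24 (7 + 4 + 12 + 1)
```

The retry guard is the one behavioral difference worth noting: if rsnapshot fails without creating a snapshot, the plain while loops keep calling it indefinitely, whereas this version gives up after a fixed number of attempts.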


