Thibaut Wdowiak's Blog

My own backup script in bash

23 Sep 2008 // projects

Remember, a while ago I was writing about backup solutions for Linux and concluded that a bash script was what I needed. Well, here it is, at last.

Invocation

The main functionalities and the invocation scheme are described in the -h option:

Usage: kipit.sh [actions]
  Performs actions sequentially as specified.
Actions:
  full         start a full backup
  incremental  start incremental backup since last full backup
  clean        remove all backups but the last full backup
  send         send backups to sftp server
  shutdown     shutdown
Example:
  kipit.sh clean incremental send shutdown
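The post does not show how the arguments are turned into function calls, but given the usage above, a minimal dispatch loop at the end of the script could look like this (a sketch, not necessarily the author's exact code; it assumes the functions shown below are already defined):

# Run each action given on the command line, in order
for action in "$@"; do
  case $action in
    full|incremental|clean|send|shutdown) $action ;;
    *) echo "Unknown action: $action"; exit 1 ;;
  esac
done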

Script format

The script is called kipit.sh. It loads a config file, which is itself a bash script that just sets variables. It also uses an exclusions file listing the files and directories not to back up; this file is referenced in the config file and is passed to tar via its --exclude-from option. Each functionality of the script lives in its own function.
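The config file itself is not listed in the post; judging from the variables the functions use, it could look something like this (paths, prefixes and the remote host are purely illustrative):

# kipit.conf -- sourced by kipit.sh, plain bash variable assignments
DIRECTORIES=/home/thibaut                   # what to back up
TARBALL_DIR=/home/thibaut/backups           # where tarballs are written
TARBALL_BASENAME_FULL=full                  # prefix of full backup tarballs
TARBALL_BASENAME_INCR=incr                  # prefix of incremental tarballs
TIMESTAMP=`date +%Y%m%d-%H%M`               # appended to tarball names
EXCLUDE_FROM=/home/thibaut/.kipit-exclude   # one tar exclude pattern per line
RSH=user@example.org                        # remote host for rsync over ssh
REMOTE_DIR=/backups/laptop                  # destination directory on that host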

Now let's get to the actual functions:

Perform a full backup I just use tar to back up the directory $DIRECTORIES, which I compress with gzip into a .tar.gz file. Of course I exclude the directory that contains the backups, as well as the directories listed in the $EXCLUDE_FROM file.

# Full backup: archive $DIRECTORIES into a timestamped, gzip-compressed tarball
function full {
  echo -n "Starting full backup..."
  tar --exclude=$TARBALL_DIR --exclude-from=$EXCLUDE_FROM \
      -czf $TARBALL_DIR/${TARBALL_BASENAME_FULL}_${TIMESTAMP}.tar.gz $DIRECTORIES
  echo "OK"
}
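With a config like the illustrative one above, running kipit.sh full would leave something like /home/thibaut/backups/full_20080923-1432.tar.gz behind, the timestamp being whatever date returned when the script started.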

Perform incremental backup, since last full backup First we get the name of the last full backup tarball, which takes a small *nix pipeline. Then we print to stdout the list of files that are newer than the creation time of that tarball, and pipe this stream into tar. The tar command is the same as for the full backup, except for -T -, which tells tar to take the list of files to archive from stdin.

# Incremental backup: archive only what changed since the last full backup
function incremental {
  echo -n "Starting incremental backup..."
  # newest full backup tarball (ls -t sorts by modification time, newest first)
  last_full_tarball=`ls -t ${TARBALL_DIR}/*${TARBALL_BASENAME_FULL}*.tar.gz | head -n 1`
  new_tarball=$TARBALL_DIR/${TARBALL_BASENAME_INCR}_${TIMESTAMP}.tar.gz
  # archive only files newer than the ctime of the last full backup tarball
  find $DIRECTORIES -cnewer $last_full_tarball -type f -print | \
    tar --exclude-from=$EXCLUDE_FROM --exclude=$TARBALL_DIR -cz -T - -f $new_tarball
  echo "OK"
}
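One caveat with piping find into tar this way: file names containing spaces or newlines get split. With GNU find and GNU tar, a null-separated variant avoids that (a sketch, not part of the original script):

find $DIRECTORIES -cnewer $last_full_tarball -type f -print0 | \
  tar --exclude-from=$EXCLUDE_FROM --exclude=$TARBALL_DIR --null -T - -czf $new_tarball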

Tidy up the house!! The goal is to remove useless backups and keep only the last full backup plus the incremental backups that follow it. It saves space, and you don't need to keep those old backups locally; we are not building a time machine here, we are just trying to protect against crashes. We find the tarballs that are not newer (! -newer) than the last full backup, print their names on stdout, and pass each line on to rm -f via xargs.

# Clean
function clean {
  echo -n "Cleaning..."
  # newest full backup tarball (ls -t sorts by modification time, newest first)
  last_full_tarball=`ls -t ${TARBALL_DIR}/*${TARBALL_BASENAME_FULL}*.tar.gz | head -n 1`
  # remove every tarball not newer than it, except that tarball itself
  find $TARBALL_DIR ! -newer $last_full_tarball -type f -name "*.tar.gz" \
    ! -name "`basename $last_full_tarball`" -print | xargs rm -f
  echo "OK"
}
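To illustrate with hypothetical file names: if the backup directory contains full_20080901.tar.gz, incr_20080905.tar.gz, full_20080910.tar.gz and incr_20080915.tar.gz, then clean deletes the first two and keeps the last full backup plus the incremental that follows it.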

Send this far away What we want to do is send all the backups we gathered locally to a remote location, such as an sftp server. I chose rsync because it can transmit only the files that are not already present remotely, and it can restart a transfer where it stopped. This is cool because backups are large files and transfer errors are likely to happen. Say you are in a train station, using a public wifi access point. You start to send your backup home and suddenly you have to hurry, pull the plug and jump on the train, losing the connection. Aha! You need to recover from the partial transfer! So: --progress prints information on the progress of the transfer, --partial keeps partially transferred files so they can be resumed, -a is archive mode, like in cp -a, -v means verbose, -h means human readable, and -e specifies the remote shell to use.

# Send tarballs to the remote host
function send {
  echo -n "Sending tarballs..."
  rsync --progress --partial -avhe ssh $TARBALL_DIR/ $RSH:$REMOTE_DIR/
  echo "OK"
}
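For this to run without prompting for a password every time, the usual approach (not covered in the post) is key-based ssh authentication towards the remote host, for instance:

ssh-keygen -t rsa
ssh-copy-id user@example.org    # same illustrative host as in the config sketch above

Otherwise rsync will stop and ask for the password on every send.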

Shutdown Nothing fancy here: once the backups have been sent, the machine can power itself off (see the example invocation above).

# Shutdown
function shutdown {
  echo "Shutting down..."
  # use 'command' so the real shutdown binary is called, not this function recursively
  command shutdown -P 0
}

Future of this code

Wish list:

  • Back up multiple directories
  • Have the credentials for the remote server stored in some way
  • Insert some controls on loading the config file, because for now anything is executable and we are root...ahum...
  • Use a config file specified on the command line
  • ...

What this script will never be I do not need an automatic backup solution. I do not need a server deciding for me when I should do my backups: I want to decide when it is time to back up. It's like that. Because it is a much simpler solution (the remote server only needs to accept sftp transfers, which is just a basic ssh setup that should already be there), because I get to decide when I want to waste my bandwidth on backup transfers, and last but not least because my laptop is seldom up. To sum up, KISS is the principle that prevailed.