Storing Data On Archive

From Storrs HPC Wiki
Revision as of 13:10, 8 August 2017 by Lwm14001 (talk | contribs) (Generating Backups to Store on /archive/)

There are several ways to generate and maintain backups, which should be stored on /archive/. Which option you prefer depends on what you want to use the backups for. Three of the main factors to weigh are the time it takes to generate an archive, to transfer that archive to permanent storage, and to restore from it. You must also decide whether you only need the most recent backup of the current state, which is enough to make sure data on /scratch survives a file system failure, or whether you want older versions to exist as well, so that if you obliterate some of your results you can restore them from an earlier backup. We keep versioned backups like this for the system configuration directories and your home directories.

The backup process can be automated. Please contact us if you need help automating any of the following options.

Disk Image Style Backups

While you will not be creating an actual disk image, the idea here is to generate an exact copy of the state of all files within a directory at a certain time.

tar -czf backup_name.tar.gz directory_to_make_backup_of/

The main issue with generating backups this way is that you have to regenerate the .tar.gz file every time, which involves re-indexing and copying every file. Other methods also re-index, but they usually do not have to copy all of the files again. Although this technique moves redundant copies to /archive/, it can be useful when cluster usage is low, since you can generate the tarball in an sbatch script. And because the tarball is compressed, it will have an easier time travelling through the low-bandwidth pipe to /archive/.
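As a rough sketch of the sbatch approach mentioned above, the job script below builds a dated tarball and moves it to a destination directory. The job name, time limit, the `make_backup` helper, and the example paths are all placeholders chosen for illustration, not site defaults; check which partition and limits apply to your account before submitting.

```shell
#!/bin/bash
#SBATCH --job-name=make_backup   # placeholder job name
#SBATCH --ntasks=1               # tar runs as a single task
#SBATCH --time=02:00:00          # placeholder time limit

# make_backup SRC_DIR DEST_DIR   (hypothetical helper, not a site tool)
# Builds a compressed, dated tarball of SRC_DIR in the current directory,
# then moves it into DEST_DIR.
make_backup() {
    local src_dir="$1"
    local dest_dir="$2"
    local name
    name="$(basename "$src_dir")_$(date +%Y-%m-%d).tar.gz"

    # -C switches to the parent directory first, so the archive stores
    # paths relative to it rather than absolute paths.
    tar -czf "$name" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
    mv "$name" "$dest_dir/"
}

# Hypothetical paths; uncomment and edit before submitting with sbatch:
# make_backup "/scratch/$USER/results" "/archive/$USER"
```

Putting the date in the file name keeps older tarballs around instead of overwriting them, which suits the case where you want earlier versions to restore from.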

You should move the archive to /archive/ for storage, and copy it back onto /scratch when you'd like to unpack it. You can extract the archive with:

tar -xvf backup_name.tar.gz

This will extract the archive into the current working directory, overwriting any files there with their versions from the backup. Check the tar man page for more options.
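Because extraction overwrites files in place, it can help to look inside the archive first and to unpack it somewhere separate. The sketch below wraps two standard tar options for this: `-t` lists the contents without extracting, and `-C` extracts into a chosen directory. The function names and `restored_copy` are hypothetical names used only for illustration.

```shell
# preview_backup ARCHIVE
# Lists the contents of a gzip-compressed archive without extracting it.
preview_backup() {
    tar -tzf "$1"
}

# extract_backup ARCHIVE TARGET_DIR
# Unpacks into TARGET_DIR (created if missing) instead of the current
# directory, so files in the current directory are not overwritten.
extract_backup() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Example (restored_copy is a hypothetical directory name):
# preview_backup backup_name.tar.gz
# extract_backup backup_name.tar.gz restored_copy
```

Once the files in the separate directory look right, you can copy back only the ones you need.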

Rsync Backups

The goal of an rsync-based backup is to get the appearance of a mirrored file system, i.e., the source and destination directories look the same.

rsync -avhu source_dir/ destination_dir/

This will only update files at the destination that are older than their source counterparts. So even though your first run of rsync will be as slow as, or slower than, the above tar command, future updates with rsync will be faster.
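To see which files an update would touch before committing to it, rsync's `-n` (`--dry-run`) option can be added to the same flags. The helper below is a minimal sketch; `preview_sync` is a hypothetical name. It also normalizes the trailing slash on both arguments, since a source written with a trailing slash copies the directory's contents, while one without it copies the directory itself into the destination.

```shell
# preview_sync SOURCE_DIR DEST_DIR   (hypothetical helper)
# Reports what "rsync -avhu" would transfer, without changing any files.
preview_sync() {
    # ${1%/}/ strips one trailing slash if present, then adds exactly one.
    rsync -avhun "${1%/}/" "${2%/}/"
}

# Example, using the directory names from the command above:
# preview_sync source_dir destination_dir
```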

Restoring from an rsync backup is simple: you just swap the source and destination arguments.
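A restore with the arguments swapped might look like the sketch below. `restore_from_backup` is a hypothetical helper name, and the example paths are placeholders. One caveat worth knowing: `-u` tells rsync to skip files that are newer at the receiving end, so a file you damaged recently in the working directory will not be replaced by its older backup copy unless you drop `-u` for the restore.

```shell
# restore_from_backup BACKUP_DIR WORK_DIR   (hypothetical helper)
# Copies files from the backup mirror back into the working directory.
# With -u, files in WORK_DIR that are newer than the backup are left alone;
# remove the "u" if the backup copies should overwrite them.
restore_from_backup() {
    rsync -avhu "${1%/}/" "${2%/}/"
}

# Example: the backup directory is now the source.
# restore_from_backup destination_dir source_dir
```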