Backing Up Your Data

From Storrs HPC Wiki
Revision as of 12:38, 10 May 2019 by Jar02014 (talk | contribs) (Transferring your data)

The HPC cluster provides the /archive file system for backing up your data.

Transferring your data

You can transfer data two ways.

  1. The slow way uses the standard Unix utilities (such as cp, tar, etc.) run from the HPC nodes. This is suitable only for small transfers.
  2. The fast way uses the Globus service. Globus is about two to five times faster, depending on system traffic, and it can reach a transfer speed of about 50 MB per second. It should be used for large transfers.
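
To get a feel for what these speeds mean in practice, the following sketch estimates a transfer time from a data size and a transfer rate. The function name estimate_transfer_seconds is hypothetical, and the default rate of 50 MB per second is the approximate Globus peak quoted above, so treat the result as a best case rather than a prediction.

```shell
# Hypothetical helper: estimate transfer time in whole seconds (rounded up).
# Usage: estimate_transfer_seconds SIZE_MB [RATE_MB_PER_S]
estimate_transfer_seconds() {
    size_mb=$1
    rate=${2:-50}   # assumed rate: the approximate Globus peak of 50 MB/s
    # Integer ceiling division: (size + rate - 1) / rate
    echo $(( (size_mb + rate - 1) / rate ))
}

# 1 TB (1,000,000 MB) at 50 MB/s: prints 20000 seconds, about 5.6 hours
estimate_transfer_seconds 1000000
```

At a rate two to five times slower, the same transfer would take roughly 11 to 28 hours, which is why the standard utilities are suggested only for small transfers.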

A NOTE ABOUT GLOBUS: Globus does more than transfer data between the Storrs HPC cluster and /archive storage; it can transfer data within a network of facilities. Globus comprises a large network of endpoints that spans the US, and it transfers data rapidly between any two endpoints. The Storrs HPC cluster and /archive are actually two locations connected to a single Globus endpoint serving UConn.

Preparing Your Data For Transfer

If your data contains many small files (where small means half a megabyte or less), then you should tar your files up into one or more tarballs and store the tarballs. Although it involves an extra step, this will make it faster and easier to transfer your files to /archive and to retrieve them later, because the system can handle the transfer of large tarballs much more easily than the transfer of many small files. It also makes more efficient use of the /archive file system, owing to the design of the underlying hardware.
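
If you are not sure whether your data counts as "many small files", a quick count can help you decide. The sketch below is only an illustration: count_small_files is a hypothetical helper, and the 524288-byte default is one interpretation of "half a megabyte".

```shell
# Hypothetical helper: count regular files at or below a size threshold.
# Usage: count_small_files [DIR] [MAX_BYTES]
count_small_files() {
    dir=${1:-.}
    max_bytes=${2:-524288}   # assumed threshold: 512 KiB, "half a megabyte"
    # "-size -Nc" matches files strictly smaller than N bytes,
    # so add 1 to include files of exactly max_bytes.
    find "$dir" -type f -size "-$((max_bytes + 1))c" | wc -l
}

# Count the small files under the current directory
count_small_files .
```

If the count is large compared with the total number of files, packing the directory into a tarball is probably worthwhile.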

Here's an example of using tar.

Suppose your data is in three directories. You may find it convenient to create a tarball for each directory, as shown in this example:

# List directory
% ls -l
drwxr-xr-x  5 aaa0000 Domain_Users       4096 Jun 22  2018 data1
drwxr-xr-x  5 aaa0000 Domain_Users       4096 Jun 22  2018 data2
drwxr-xr-x  5 aaa0000 Domain_Users       4096 Jun 22  2018 data3
# Make tarballs
% tar cf data1.tar  data1
% tar cf data2.tar  data2
% tar cf data3.tar  data3

You can then transfer data1.tar, data2.tar and data3.tar to /archive. To recover your original directories:

# Unpack directories
% tar xf data1.tar
% tar xf data2.tar
% tar xf data3.tar
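
Before transferring a tarball and removing the originals, it can be worth checking that the tarball holds everything you expect. The sketch below creates the tarball and compares the number of regular files in the directory with the number of non-directory entries listed by tar tf. The name make_checked_tarball is hypothetical, and the check assumes the directory contains only regular files and subdirectories (no symbolic links or other special files).

```shell
# Hypothetical helper: create DIR.tar and check its file count against DIR.
# Usage: make_checked_tarball DIR
make_checked_tarball() {
    dir=${1%/}                      # drop a trailing slash, if any
    tar cf "$dir.tar" "$dir" || return 1
    # Regular files in the source directory
    in_dir=$(find "$dir" -type f | wc -l)
    # tar tf lists directories with a trailing "/"; count everything else
    in_tar=$(tar tf "$dir.tar" | grep -vc '/$')
    [ "$((in_dir))" -eq "$((in_tar))" ]
}

# Example (run from the directory that contains data1):
#   make_checked_tarball data1 && echo "data1.tar looks complete"
```

A matching count does not prove the contents are identical, but it catches the common case of a tarball that was cut short, for example by a full disk.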

Moving large files

If your data is mostly in large files (larger than half a megabyte), then you may want to copy your data directly, and not as tarballs.

Transferring Files Using Globus

See the Globus Connect page for instructions on how to use Globus.

Transferring Files Using the Command Line

Once you've obtained a folder on /archive (see the last section of the Data Storage Guide, on Long Term Data Storage), you can copy your tarballs or your large files using one of the standard Unix commands, such as cp or rsync.
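
As a rough illustration, the sketch below copies files into a destination folder with cp and then compares each copy with its original using cmp. The function name copy_to_archive and the destination path in the usage comment are hypothetical; substitute the /archive folder you were actually given.

```shell
# Hypothetical helper: copy files into DEST and verify each copy byte for byte.
# Usage: copy_to_archive DEST FILE...
copy_to_archive() {
    dest=$1
    shift
    if [ ! -d "$dest" ]; then
        echo "destination does not exist: $dest" >&2
        return 1
    fi
    for f in "$@"; do
        # -p preserves modification times and permissions
        cp -p "$f" "$dest"/ || return 1
        # cmp -s is silent and returns non-zero if the files differ
        if ! cmp -s "$f" "$dest/$(basename "$f")"; then
            echo "copy differs from original: $f" >&2
            return 1
        fi
    done
}

# Example, with a placeholder folder name:
#   copy_to_archive /archive/your_folder data1.tar data2.tar data3.tar
# A comparable rsync invocation (archive mode, verbose) would be:
#   rsync -av data1.tar data2.tar data3.tar /archive/your_folder/
```

One reason to prefer rsync for repeated backups is that it skips files that are already up to date at the destination, so an interrupted copy can simply be rerun.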