Difference between revisions of "Backing Up Your Data"

From Storrs HPC Wiki

Revision as of 12:55, 10 May 2019

Backing Up Your Data

The HPC cluster provides the /archive file system for backing up your data. You must request /archive storage before you use it. To do so, send an email to hpc@uconn.edu requesting an /archive folder for yourself. If you need an /archive folder for your group, include the group name in your request.

Transferring your data

You can transfer data to /archive in two ways.

  1. The slow way uses standard Unix utilities (such as cp and tar) run from the HPC nodes. This is suitable only for small transfers.
  2. The fast way uses the Globus service. Globus is about two to five times faster, depending on system traffic, and it can reach a transfer speed of about 50 MB per second. It should be used for large transfers.

A NOTE ABOUT GLOBUS: Globus does more than transfer data between the Storrs HPC cluster and /archive storage; it can transfer data within a network of facilities. Globus comprises a large network of endpoints that span the US, and it transfers data rapidly between any two endpoints. The Storrs HPC cluster and /archive are actually two locations connected to a single Globus endpoint serving UConn.

You can read about Globus here.

Preparing Your Data For Transfer

If your data contains many small files (that is, half a megabyte or less), then you should tar your files up into one or more tarballs and store the tarballs. Although it involves an extra step, this will make it faster and easier to transfer your files to and from /archive, because the system can handle the transfer of large tarballs much more easily than the transfer of many small files. It will also make more efficient use of space on the /archive file system, owing to the design of the underlying software.

Here's an example of using tar.

Suppose your data is in three directories. You may find it convenient to create a tarball for each directory, as shown in this example:

# List directory
% ls -l
drwxr-xr-x  5 aaa0000 Domain_Users       4096 Jun 22  2018 data1
drwxr-xr-x  5 aaa0000 Domain_Users       4096 Jun 22  2018 data2
drwxr-xr-x  5 aaa0000 Domain_Users       4096 Jun 22  2018 data3
# Make tarballs
% tar cf data1.tar  data1
% tar cf data2.tar  data2
% tar cf data3.tar  data3

You can then transfer data1.tar, data2.tar and data3.tar to /archive. To recover your original directories:

# Unpack directories
% tar xf data1.tar
% tar xf data2.tar
% tar xf data3.tar
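
It can be worth checking that each tarball is readable before you transfer it and remove anything locally. The following is a minimal sketch of that idea, not a prescribed procedure: it builds one tarball per directory in a loop and lists each tarball's contents with tar tf. The directory names follow the example above. The scratch directory and sample files are assumptions, created only so the sketch runs on its own.

```shell
# Create a scratch area with three sample directories (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for d in data1 data2 data3; do
    mkdir -p "$d"
    echo "sample" > "$d/file.txt"
done

# Make one tarball per directory, then list its contents as a sanity check.
# tar tf exits with a non-zero status if the tarball cannot be read.
for d in data1 data2 data3; do
    tar cf "$d.tar" "$d"
    tar tf "$d.tar" > /dev/null && echo "$d.tar OK"
done
```

On your own data, drop the scratch-directory setup and run the second loop from the directory that holds your data directories.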

Moving large files

If your data is mostly in large files (larger than half a megabyte), then you may want to copy your data directly, and not as tarballs.
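
If you are not sure whether your data is mostly small or mostly large files, you can count the small ones. The sketch below defines a hypothetical helper, count_small_files, which is not part of the cluster software. It assumes that "half a megabyte" means 512 KiB (524288 bytes); adjust the number if you prefer a different cutoff.

```shell
# count_small_files DIR
# Print the number of regular files under DIR that are 524288 bytes or smaller.
# "-size -524289c" matches files of fewer than 524289 bytes.
count_small_files() {
    find "$1" -type f -size -524289c | wc -l | tr -d ' '
}

# Example call (data1 is the directory name used earlier on this page):
#   count_small_files data1
```

If the count is a large share of your files, tarring them up first is probably the better choice.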

Transferring Files Using Globus

See the page Globus Connect for instructions on how to use Globus.

Transferring Files Using the Command Line

Once you've obtained a folder on /archive (see the last section of the Data Storage Guide, on Long Term Data Storage), you can copy your tarballs or your large files using one of the standard Unix commands, such as cp or rsync.
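
As a rough sketch of a command-line copy, the hypothetical helper below copies one file with cp and then compares the copy with the original using cmp. The function name copy_and_verify and the placeholder destination /archive/your_folder are assumptions for illustration; substitute the folder you were actually given.

```shell
# copy_and_verify SRC DEST_DIR
# Copy SRC into DEST_DIR, preserving timestamps and permissions (-p),
# then compare the copy with the original byte for byte.
# Returns non-zero if the copy fails or the two files differ.
copy_and_verify() {
    src=$1
    dest_dir=$2
    cp -p "$src" "$dest_dir"/ || return 1
    cmp -s "$src" "$dest_dir/$(basename "$src")"
}

# Example call; /archive/your_folder is a placeholder, not a real path:
#   copy_and_verify data1.tar /archive/your_folder && echo "data1.tar copied"
#
# An rsync equivalent of the copy step would be:
#   rsync -av data1.tar /archive/your_folder/
```

The comparison step re-reads both files, so for very large files you may prefer to skip it or to rely on rsync.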