Difference between revisions of "Backing Up Your Data"

From Storrs HPC Wiki
Revision as of 17:29, 19 April 2019

Backing Up Your Data

UNDER CONSTRUCTION

The HPC cluster provides the /archive file system for backing up your data.

Transferring your data

You can transfer data in two ways. The slow way uses the standard Unix utilities (such as cp, tar, etc.) run on the HPC nodes, and is suitable only for small transfers. The fast way uses the Globus service. Globus is about two to five times faster, depending on system traffic, and can reach a transfer speed of about 50 MB per second. Use it for large transfers.
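To get a feel for what the quoted rate means, here is a rough back-of-the-envelope estimate. It is only a sketch: it assumes the best-case rate of about 50 MB per second mentioned above and treats 1 TB as 1,000,000 MB. Actual throughput will usually be lower when the system is busy. The function name `estimate_transfer_seconds` is hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Best-case Globus transfer time, assuming the ~50 MB/s rate quoted above.
# Real throughput depends on system traffic, so treat this as a lower bound.
# Usage: estimate_transfer_seconds SIZE_IN_MB   (whole megabytes)
estimate_transfer_seconds() {
    rate=50                              # assumed MB per second
    echo $(( ($1 + rate - 1) / rate ))   # integer division, rounded up
}

# Example: 1 TB (taken as 1,000,000 MB) at 50 MB/s
secs=$(estimate_transfer_seconds 1000000)
echo "about $secs seconds"   # prints: about 20000 seconds
```

20000 seconds is roughly five and a half hours, and that is a best case.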

A NOTE ABOUT GLOBUS: We discuss using Globus here specifically to transfer data between the Storrs HPC cluster and /archive storage, but Globus does more. Globus comprises a large network of endpoints that span the US, and it rapidly transfers data between any two endpoints. The Storrs HPC Cluster and /archive actually belong to a single such endpoint, which connects the UConn campus to the Globus network.

Preparing Your Data For Transfer

If your data contains many small files (where small means smaller than half a megabyte), then you should tar your files up into one or more tarballs and store the tarballs. Although it involves an extra step, this makes it faster and easier to transfer your files to /archive and retrieve them from it, because the system handles the transfer of a few large tarballs much more easily than the transfer of many small files. It also makes more efficient use of the /archive file system, owing to the design of the underlying hardware.
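Before deciding whether to tar, it can help to check how many of your files fall under the half-megabyte threshold. The sketch below uses only standard find, wc and tr. It takes "half a megabyte" to mean 512 KiB (524288 bytes), which is an assumption, and the function name `count_small_files` is hypothetical.

```shell
#!/bin/sh
# Report how many regular files under a directory are "small"
# (assumed here: fewer than 524288 bytes, i.e. under 512 KiB).
# Usage: count_small_files DIR
count_small_files() {
    dir=$1
    # wc -l counts one line per path; tr strips the padding some wc versions add
    total=$(find "$dir" -type f | wc -l | tr -d ' ')
    # -size -524288c: strictly fewer than 524288 bytes ("c" means bytes)
    small=$(find "$dir" -type f -size -524288c | wc -l | tr -d ' ')
    echo "$small of $total files are smaller than 512 KiB"
}
```

For example, `count_small_files data1` prints a one-line summary for the data1 directory. If most files turn out to be small, tarring is likely worthwhile.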

Here's an example of using tar.

Suppose your data is in 3 directories. You may find it convenient to create a tarball for each directory, as shown in this example:

# List directory
% ls -l
drwxr-xr-x  5 aaa0000 Domain_Users       4096 Jun 22  2018 data1
drwxr-xr-x  5 aaa0000 Domain_Users       4096 Jun 22  2018 data2
drwxr-xr-x  5 aaa0000 Domain_Users       4096 Jun 22  2018 data3
# Make tarballs
% tar cf data1.tar  data1
% tar cf data2.tar  data2
% tar cf data3.tar  data3

You can then transfer data1.tar, data2.tar and data3.tar to /archive. To recover your original directories:

# Unpack directories
% tar xf data1.tar
% tar xf data2.tar
% tar xf data3.tar
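When you have more than a handful of directories, the same tar commands can be written as a loop. This is a sketch using only the tar options already shown (c, x, f) plus t, which lists an archive's contents. Listing each new tarball is a quick check that it is readable before you rely on it. The function name `make_tarballs` is hypothetical.

```shell
#!/bin/sh
# Create one tarball per directory, then confirm each one can be read back.
# Usage: make_tarballs DIR...
make_tarballs() {
    for d in "$@"; do
        d=${d%/}                         # drop a trailing slash, e.g. data1/
        if [ ! -d "$d" ]; then
            echo "skipping $d: not a directory" >&2
            continue
        fi
        tar cf "$d.tar" "$d" || return 1
        # tar tf lists the contents; a failure means the tarball is unusable
        tar tf "$d.tar" > /dev/null || return 1
        echo "created $d.tar"
    done
}
```

Run from the directory shown in the listing above, `make_tarballs data1 data2 data3` produces the same three tarballs as the individual tar commands.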

Moving large files

If your data is mostly in large files (larger than half a megabyte), then you may want to copy your data directly, and not as tarballs.
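To see which files are candidates for copying directly, you can list the large ones. This sketch again assumes a threshold of 512 KiB (524288 bytes). It uses du to show each file's approximate size on disk in KiB, largest first. The function name `list_large_files` is hypothetical.

```shell
#!/bin/sh
# List regular files of at least 524288 bytes (assumed threshold),
# with approximate size on disk in KiB, largest first.
# Usage: list_large_files DIR
list_large_files() {
    # +524287c: more than 524287 bytes, i.e. 524288 bytes or larger
    find "$1" -type f -size +524287c -exec du -k {} + | sort -rn
}
```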

Transferring Files Using Globus

See the page Globus Connect for instructions on how to use Globus.

Transferring Files Using the Command Line

Once you've obtained a folder on /archive (see Data Storage Guide, the last section on Long Term Data Storage), you can copy your tarballs or your large files using standard Unix commands such as cp or rsync.
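Here is a minimal sketch of the command-line copy. The destination `/archive/your_folder` in the example call is a placeholder, not a real path, so substitute the folder you were given. The function, named `copy_to_archive` here for illustration, uses `rsync -a` when rsync is available and falls back to `cp -p` otherwise. Both preserve timestamps and permissions. It then compares each copy with its original using cmp.

```shell
#!/bin/sh
# Copy tarballs or large files into an /archive folder and verify each copy.
# Usage: copy_to_archive DEST_DIR FILE...
copy_to_archive() {
    dest=$1
    shift
    [ -d "$dest" ] || { echo "no such directory: $dest" >&2; return 1; }
    for f in "$@"; do
        if command -v rsync > /dev/null 2>&1; then
            rsync -a "$f" "$dest/" || return 1   # -a keeps times/permissions
        else
            cp -p "$f" "$dest/" || return 1      # -p keeps times/permissions
        fi
        # cmp -s exits non-zero if the copy differs from the original
        cmp -s "$f" "$dest/$(basename "$f")" || { echo "mismatch: $f" >&2; return 1; }
    done
}

# Example (placeholder path; use your own /archive folder):
# copy_to_archive /archive/your_folder data1.tar data2.tar data3.tar
```

With rsync, re-running the same command skips files that are already up to date. Reading each file back with cmp roughly doubles the I/O, so you may prefer to drop that check for very large copies.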