Difference between revisions of "Backing Up Your Data"
Revision as of 17:29, 19 April 2019
Backing Up Your Data
UNDER CONSTRUCTION
The HPC cluster provides the /archive file system for backing up your data.
Transferring your data
You can transfer data in two ways. The slow way uses the standard Unix utilities (such as cp and tar) run on the HPC nodes, and is suitable only for small transfers. The fast way uses the Globus service. Globus is about two to five times faster, depending on system traffic, and can reach transfer speeds of about 50 MB per second. Use it for large transfers.
A NOTE ABOUT GLOBUS: We discuss using Globus here specifically to transfer data between the Storrs HPC cluster and /archive storage, but Globus does more. Globus comprises a large network of endpoints that span the US, and it rapidly transfers data between any two endpoints. The Storrs HPC cluster and /archive actually belong to a single such endpoint, which connects the UConn campus to the Globus network.
Preparing Your Data For Transfer
If your data contains many small files (where small means about half a megabyte or less), then you should tar your files up into one or more tarballs and store the tarballs. Although it involves an extra step, this makes it faster and easier to transfer your files to and retrieve them from /archive, because the system can handle the transfer of large tarballs much more easily than the transfer of many small files. It also makes more efficient use of the /archive file system, owing to the design of the underlying hardware.
Here's an example of using tar. Suppose your data is in 3 directories. You may find it convenient to create a tarball for each directory, as shown in this example:
 # List directory
 % ls -l
 drwxr-xr-x 5 aaa0000 Domain_Users 4096 Jun 22 2018 data1
 drwxr-xr-x 5 aaa0000 Domain_Users 4096 Jun 22 2018 data2
 drwxr-xr-x 5 aaa0000 Domain_Users 4096 Jun 22 2018 data3
 # Make tarballs
 % tar cf data1.tar data1
 % tar cf data2.tar data2
 % tar cf data3.tar data3
You can then transfer data1.tar, data2.tar and data3.tar to /archive. To recover your original directories:
 # Unpack directories
 % tar xf data1.tar
 % tar xf data2.tar
 % tar xf data3.tar
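Before removing the original directories, it can be worth confirming that each tarball is readable. The following is a minimal sketch, not an official procedure of the cluster: the function name make_verified_tarball is made up for illustration, and it only checks that tar can list the archive contents.

```shell
# make_verified_tarball DIR
# Hypothetical helper: creates DIR.tar from DIR, then lists the archive
# to confirm that tar can read it back. Returns non-zero on any failure.
make_verified_tarball() {
    dir=$1
    # Create the tarball, as in the example above.
    tar cf "$dir.tar" "$dir" || return 1
    # "tar tf" lists the members of the archive without extracting them;
    # if the listing fails, the tarball is not usable.
    tar tf "$dir.tar" > /dev/null
}

# Example call (uncomment and adjust the directory name):
# make_verified_tarball data1 && echo "data1.tar looks readable"
```

A successful listing does not prove that every file is intact, so keep the originals until the transfer to /archive has completed.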
Moving large files
If your data is mostly in large files (larger than half a megabyte), then you may want to copy your data directly, and not as tarballs.
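To decide between tarballs and a direct copy, you can count how many files fall under the half-megabyte mark. This is a rough sketch under stated assumptions: the function name count_small_files is invented here, and half a megabyte is taken to mean 524288 bytes.

```shell
# count_small_files DIR
# Hypothetical helper: prints the number of regular files under DIR
# that are smaller than 524288 bytes (half a megabyte, by assumption).
count_small_files() {
    # "-size -524288c" matches files of fewer than 524288 bytes.
    # tr strips the padding that some versions of wc add to the count.
    find "$1" -type f -size -524288c | wc -l | tr -d ' '
}

# Example call (uncomment and adjust the directory name):
# count_small_files data1
```

If the count is large compared with the total number of files, tarballs are probably the better choice. File names containing newlines would be miscounted by this sketch.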
Transferring Files Using Globus
See the page Globus Connect for instructions on how to use Globus.
Transferring Files Using the Command Line
Once you've obtained a folder on /archive (see the last section of the Data Storage Guide, on Long Term Data Storage), you can copy your tarballs, or your large files, using one of the standard Unix commands: cp or rsync.
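As an illustration of a command-line copy, the sketch below copies one file with cp and then compares the copy with the original. The function name copy_to_archive and the destination path in the comments are placeholders, not real locations on the cluster; substitute the /archive folder you were given.

```shell
# copy_to_archive FILE DEST_DIR
# Hypothetical helper: copies FILE into DEST_DIR and checks that the
# copy is byte-for-byte identical to the original.
copy_to_archive() {
    src=$1
    dest_dir=$2
    # Create the destination directory if it does not exist yet.
    mkdir -p "$dest_dir" || return 1
    # -p preserves the modification time and permissions of the file.
    cp -p "$src" "$dest_dir/" || return 1
    # cmp -s is silent and returns non-zero if the two files differ.
    cmp -s "$src" "$dest_dir/$(basename "$src")"
}

# Example calls (placeholder destination, adjust before use):
# copy_to_archive data1.tar /archive/your_folder/backups
# An rsync equivalent for the copy step would be:
# rsync -av data1.tar /archive/your_folder/backups/
```

For more than a handful of large files, the Globus route described above is likely to be faster.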