Data Storage Guide
The Storrs HPC cluster has a number of data storage options to meet various needs. There is a high-speed scratch file system, which allows parallel file writing from all compute nodes. All users also get a persistent home directory, and groups of users can request private shared folders. Once data is no longer needed for computation, it should be transferred off the cluster to a permanent data storage location. To meet this need, the university offers a data archival service with over three petabytes of capacity. Data transfer to permanent locations should be done via the web-based Globus service.
HPC Storage (short term)
The Storrs HPC cluster has a number of local high-performance data storage options for use during job execution and for short-term storage of job results. None of the cluster storage options listed below should be considered permanent, and none should be used for long-term archival of data. See the next section for permanent data storage options that offer greater resiliency.
| Name | Path | Size | Relative Performance | Persistence | Backed up? | Purpose |
|---|---|---|---|---|---|---|
| Scratch | /scratch | 1 PB (shared) | Fastest | None; deleted after 60 days | No | Fast parallel storage for use during computation |
| Local node storage | | 40 GB | Fast | None; deleted after 5 days | No | Fast storage local to each compute node, globally accessible from |
| Home | /home | 50 GB | Slow | Yes | Twice per week | Personal storage, available on every node |
| Shared | /shared | By request | Slow | Yes | Twice per week | Short-term group storage for collaborative work |
- Data deletion inside the scratch folder is based on directory modification time. You will get 3 warnings by email before deletion.
- Certain directories are mounted only on demand by autofs. These directories are: /misc/cnXX. If you try to use shell commands like ls on these directories, they may fail. They are mounted only when an attempt is made to access a file under the directory, or when you use cd to enter the directory structure.
- Checking your home directory quota is currently unavailable.
- Read-only datasets are available at /scratch/shareddata.
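Because scratch deletion is based on directory modification time, it can help to see which of your directories are approaching the limit. The following Python sketch walks a directory tree and reports directories whose modification time is older than the 60-day scratch retention period. It is only an approximation of the cluster's cleanup policy, whose exact rules are not described here; the helper name stale_directories and the /scratch/$USER location are assumptions.

```python
import os
import time

# Retention period for /scratch, taken from the storage table in this guide.
SCRATCH_RETENTION_DAYS = 60


def stale_directories(root, days=SCRATCH_RETENTION_DAYS, now=None):
    """Return (sorted) directories under root whose mtime is older than `days`.

    Hypothetical helper: approximates a modification-time-based cleanup rule.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        if os.stat(dirpath).st_mtime < cutoff:
            stale.append(dirpath)
    return sorted(stale)


if __name__ == "__main__":
    # Assumed layout: your scratch space is /scratch/<username>.
    # os.walk yields nothing if the path does not exist.
    for path in stale_directories(os.path.expandvars("/scratch/$USER")):
        print(path)
```

Note that a directory's modification time changes only when entries inside it are created, removed, or renamed; editing an existing file does not update its parent directory's timestamp.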
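For scripts that need to read from an on-demand directory such as /misc/cnXX, one approach that mirrors the cd behavior described above is to change into the directory before listing it. This is a minimal Python sketch: list_automounted_dir is a hypothetical helper, and whether a particular access triggers the mount depends on how autofs is configured on the cluster.

```python
import os


def list_automounted_dir(path):
    """List an autofs-managed directory by entering it first (hypothetical helper).

    Changing into the directory is the Python equivalent of `cd`, which this
    guide says causes autofs to mount it.
    """
    previous = os.getcwd()
    os.chdir(path)  # like `cd path`; expected to trigger the on-demand mount
    try:
        return sorted(os.listdir("."))
    finally:
        os.chdir(previous)  # always restore the caller's working directory
```

For example, list_automounted_dir("/misc/cn01") would list the contents for a node named cn01, where cn01 is only a placeholder for an actual node name.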
Long Term Data Storage
Once data is no longer needed for computation, it should be transferred off the cluster to a permanent data storage location. Do not use the scratch file system (/scratch) for long-term storage; it is optimized for fast parallel access from multiple computers, and it is too scarce and too expensive for long-term storage.
If you need more storage than is provided by your /home directory (or /shared directory, for groups that use them), use the /archive file system. This is a relatively slow but reliable file system. It is protected by being geographically distributed across three locations (one in Storrs and two in Farmington), so your data can survive the loss of any one location.
You must request /archive storage before you use it. To do so, send an email to email@example.com requesting an /archive folder for yourself; if the folder is for your group, include the group name.
Once you have access to the archive system, you can transfer data in two ways. The slow way uses standard Unix utilities (such as cp and tar) run on the HPC nodes, and is suitable only for small transfers. The fast way uses the Globus service. Globus is about two to five times faster (depending on system traffic), and it should be used for large transfers.
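For the slow method, a small transfer might bundle a results directory into a single compressed tar file, copy it into /archive, and check that the copy matches. The Python sketch below does the tar-and-copy steps with only the standard library (tarfile, shutil, hashlib); the helper names and all paths are hypothetical, and large transfers should still go through Globus, which is not shown here.

```python
import hashlib
import os
import shutil
import tarfile


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_directory(src_dir, staging_dir, dest_dir):
    """Tar+gzip src_dir into staging_dir, copy it to dest_dir, and verify it.

    Hypothetical helper: dest_dir would be your /archive folder.
    Returns the path of the copied archive.
    """
    base = os.path.basename(os.path.normpath(src_dir))
    tar_path = os.path.join(staging_dir, base + ".tar.gz")
    # Equivalent of `tar czf base.tar.gz src_dir`, stored under a top-level `base/`.
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(src_dir, arcname=base)
    # Equivalent of `cp -p`; copy2 returns the path of the new file.
    dest_path = shutil.copy2(tar_path, dest_dir)
    if sha256sum(dest_path) != sha256sum(tar_path):
        raise OSError(f"checksum mismatch copying {tar_path} to {dest_path}")
    return dest_path
```

For example, archive_directory("/scratch/<username>/run42", "/scratch/<username>", "/archive/<group>") would produce /archive/<group>/run42.tar.gz; every path here is a placeholder. Bundling many small files into one archive also tends to suit a slower file system better than copying them individually.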
For information on how to best organize your backups, see our page on Backing Up Your Data.