Data Storage Guide


HPC Storage (short term)

The Storrs HPC cluster offers several local, high-performance storage options for use during job execution and for short-term storage of job results. None of the cluster storage options listed below is permanent, and none should be used for long-term archival of data. Please see the next section for permanent data storage options that offer greater resiliency.

{| class="wikitable"
! Name !! Path !! Size !! Relative Performance !! Persistence !! Backed up? !! Purpose
|-
| Scratch || /scratch || 1 PB shared || Fastest || None, deleted after 30 days || No || Fast parallel storage for use during computation
|-
| Node-local || /work || 40 GB || Fast || None, deleted after 5 days || No || Fast storage local to each compute node, globally accessible from /misc/cnXX
|-
| Home || ~ || 50 GB || Slow || Yes || Twice per week || Personal storage, available on every node
|-
| Group || /shared || By request || Slow || Yes || Twice per week || Short-term group storage for collaborative work
|}
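
As a rough sketch of how these areas fit together, a typical pattern is to stage data onto scratch, compute there, and copy results back before the purge window. The /scratch/$USER/myjob layout below is only a placeholder; the exact directory structure, and whether you run these commands interactively or inside a batch script, depends on your workflow:

  $ mkdir -p /scratch/$USER/myjob            # create a working area on the fast parallel scratch filesystem
  $ cp -r ~/input_data /scratch/$USER/myjob  # stage inputs from your home directory onto scratch
  $ cd /scratch/$USER/myjob                  # run the computation from scratch for the best I/O performance
  $ cp -r results ~/myjob_results            # copy results back to home (or on to permanent storage) before the 30-day purge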

Notes

  • Data deletion inside the scratch folder is based on directory modification time. You will receive three warnings by email before deletion.
  • Certain directories are mounted only on demand by autofs: /home, /shared, and /misc/cnXX. Shell commands such as ls may fail on these directories because they are mounted only when a file underneath them is accessed, or when you cd into the directory structure (see the example after this list).
  • You can recover files from the backed-up directories on your own using snapshots, within 2 weeks.
  • You can check your home directory quota (see the example after this list).
  • Read-only datasets are available at /scratch/shareddata. More information is available on this page.
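
The following commands illustrate the autofs and quota notes above. The node name cnXX is a placeholder for an actual compute node number, and the quota check assumes the standard Linux quota and df utilities are available on the login node:

  $ ls /misc             # node directories that are not yet mounted may be missing from this listing
  $ cd /misc/cnXX        # accessing the path directly causes autofs to mount it
  $ ls                   # the node-local /work contents are now visible
  $ quota -s             # report your home directory usage and limits in human-readable units
  $ df -h ~              # alternatively, show usage of the filesystem that holds your home directory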

Permanent Data Storage (long term)

The university has multiple options for long-term permanent data storage. Once data is no longer needed for computation, it should be transferred to one of these locations. Data transfer to permanent locations should be done via the Globus Connect service (see the section below).

{| class="wikitable"
! Name !! Path !! Size !! Relative Performance !! Resiliency !! Purpose
|-
| Archival cloud storage || /archive || 3 PB shared || Low || Data is distributed across three datacenters between the Storrs and Farmington campuses || Best for permanent archival of data without frequent access. NOTE: Users must request access to this resource by either creating a ticket or emailing hpc@uconn.edu.
|-
| UITS Research Storage || Use smbclient to transfer files || By request to UITS || Moderate || Data is replicated between two datacenters on the Storrs campus || Best for long-term data storage requiring good performance, such as data that will be accessed frequently for post-analysis.
|-
| Departmental/individual storage || Use smbclient or SCP utilities to transfer files || - || - || - || Some departments and individual researchers have their own local network storage options, which can be accessed using smbclient or SCP utilities.
|}
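
For the SMB-based options above, smbclient and scp are standard tools. The SERVER, SHARE, and directory names in this sketch are placeholders; substitute the details provided by UITS or your department:

  $ smbclient //SERVER/SHARE -U netid     # open an SMB session (placeholder server and share names)
  smb: \> recurse ON                      # enable recursive transfers
  smb: \> prompt OFF                      # do not prompt for each individual file
  smb: \> cd my_project                   # change to the destination folder on the share
  smb: \> mput results                    # upload the local results directory
  smb: \> exit
  $ scp -r results netid@SERVER:/PATH/    # or copy with scp, if the storage host accepts SSH connections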

Data Transfers using Globus Connect

You can make large data transfers with a service called Globus Connect. This allows you to transfer large data sets between the Storrs HPC and your workstation, or any computer set up as a Globus endpoint. The Globus system is optimized for long-distance data transfers and is particularly useful for sharing data with your collaborators at other institutions.
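
Many users drive Globus transfers from the web interface, but if the Globus CLI (globus-cli) is installed on your workstation, a transfer can also be scripted. The endpoint UUIDs and destination path below are placeholders, not the cluster's actual endpoint identifiers:

  $ globus login                      # authenticate your workstation with Globus
  $ globus endpoint search "UConn"    # look up the endpoint UUIDs for the cluster and your machine
  $ globus transfer SRC_UUID:/scratch/myjob/results DST_UUID:/local/results --recursive --label "Storrs HPC results"
                                      # submit an asynchronous, recursive transfer between the two endpoints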