= Data Storage =

Please familiarize yourself with the data storage guidelines described on the [[HPC_Getting_Started#HPC_Storage_.28short_term.29|Getting Started]] page. All data stored on the cluster is subject to the restrictions described on that page, and data that is not in compliance may be removed.
The Storrs HPC cluster has a number of local, high-performance data storage options available for use during job execution and for short-term storage of job results. None of the cluster storage options listed below should be considered permanent, and they should not be used for long-term archival of data. Please see the next section for permanent data storage options that offer greater resiliency. A typical scratch workflow is sketched after the notes below.
 
 
 
{| class="wikitable sortable"
! Name !! Path !! Size !! Performance !! Persistence !! Backed up? !! Purpose
|-
| Scratch || <code>/scratch/scratch2</code> || 438GB shared || Fastest || No, '''2 weeks''' || No || Fast parallel storage for use during computation
|-
| Node-local || <code>/work</code> || 100GB || Fast || No, '''5 days''' || No || Fast storage local to each compute node, globally accessible from <code>/misc/cnXX</code>
|-
| Home || <code>~</code> || 2GB || Slow || Yes || Yes || Personal storage, available on every node
|-
| Group || <code>/shared</code> || [[:Category:Help|By request]] || Slow || Yes || Yes || Short term group storage for collaborative work
|}
 
 
 
* Deletion of directories inside the '''scratch2''' folder is based on modification time. You will receive three warning emails before a directory is deleted.
* If you run <code>ls</code> on the <code>/home</code>, <code>/shared</code>, or <code>/misc/cnXX</code> directories, you might not see them. They are mounted on demand by <code>autofs</code> and only appear once you access a file underneath them or <code>cd</code> into the directory structure.
* You can [[recover deleted files|recover files on your own from our backed up directories]] using snapshots within 2 weeks. Beyond 2 weeks we may be able to help if you [[:Category:Help|contact us]].
* You can check your [[Cannot write to home directory|home directory quota]].
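
For example, a minimal short-term storage workflow might look like the sketch below. The per-user directory layout under <code>/scratch/scratch2</code> and the node name <code>cn01</code> are illustrative assumptions, not required conventions.

 # Stage input data onto shared scratch (fast, but purged 2 weeks after last modification)
 $ mkdir -p /scratch/scratch2/$USER/myjob     # per-user subdirectory layout is an assumption
 $ cp ~/input.dat /scratch/scratch2/$USER/myjob/
 $ cd /scratch/scratch2/$USER/myjob           # run your computation here, normally inside a scheduled job
 # Copy results you want to keep back to backed-up storage before the purge window
 $ cp results.out ~/
 # Node-local /work on a given compute node can be reached from other nodes via the autofs mount
 $ ls /misc/cn01                              # cn01 is an example node name; accessing the path triggers the mount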
 
 
 
= Permanent Data Storage (long term) =
 
 
 
The university has multiple options for long-term, permanent data storage. Once data is no longer needed for computation, it should be transferred to one of these locations. Data transfers to permanent locations should be done from the <code>login.storrs.hpc.uconn.edu</code> login node. Please review the [[File transfer between hosts| file transfer guide]] for helpful information on moving data in and out of the cluster; a sketch of a typical transfer session follows the table below.
 
 
 
{| class="wikitable sortable"
! Name !! Path !! Size !! Performance !! Resiliency !! Purpose
|-
| UITS Research Storage || [[File_transfer_via_SMB|Use smbclient to transfer files]] || [http://uits.uconn.edu/disk-storage-info By request] || Moderate || Data is replicated between two datacenters on the Storrs campus || Best for long-term data storage requiring good performance, such as data that will be accessed frequently for post-analysis.
|-
| Archival cloud storage || <code>/archive</code> || 1.5PB shared || Low || Data is distributed across three datacenters spanning the Storrs and Farmington campuses || Best for permanent archival of data that is accessed infrequently.
|-
| Departmental/individual storage || [[File_transfer_via_SMB|Use smbclient to transfer files]] || - || - || - || Some departments and individual researchers have their own local network storage options. These can be accessed using <code>smbclient</code>.
|}
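
As a rough illustration, a transfer session from the login node might look like the following. The server, share, and directory names (<code>smbserver.uconn.edu</code>, <code>myshare</code>, <code>/archive/mygroup</code>) are placeholders only; use the connection details provided for your allocation and see the [[File_transfer_via_SMB|SMB transfer guide]] for specifics.

 # Copy a results archive to UITS Research Storage over SMB (placeholder server and share names)
 $ smbclient //smbserver.uconn.edu/myshare -U mynetid
 smb: \> put results.tar.gz
 smb: \> exit
 # Copy data into archival cloud storage; the directory layout under /archive is an assumption
 $ cp -r ~/results /archive/mygroup/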
 
  
 

To be fair to all users of the cluster, please be aware of the following resource limits and usage expectations.

= Scheduled Jobs =

All computational jobs need to be submitted to the cluster using the job scheduler. Please read the SLURM Guide for helpful information on using the scheduler. Listed below are the runtime and resource limits for scheduled jobs.

{| class="wikitable"
! Job property !! Standard QoS Limit !! Longrun QoS Limit !! Haswell384 QoS Limit
|-
| Run time (hours) || 36 || 72 || 18
|-
| Cores / CPUs || 48 || || 384
|-
| Jobs || 8 || ||
|}
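
For example, a job script that stays within the Standard QoS limits might look like the sketch below. The QoS name <code>standard</code> and the executable <code>my_program</code> are assumptions; check the exact QoS names on the cluster (for example with <code>sacctmgr show qos</code>) before submitting.

 #!/bin/bash
 #SBATCH --job-name=example
 #SBATCH --qos=standard        # QoS name is an assumption; verify with `sacctmgr show qos`
 #SBATCH --ntasks=48           # at most 48 cores under the Standard QoS
 #SBATCH --time=36:00:00       # at most 36 hours under the Standard QoS
 srun ./my_program             # placeholder for your executable

Submit the script with <code>sbatch job.sh</code> and check its status with <code>squeue -u $USER</code>.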

= Unscheduled programs =

Programs that run on the login node (<code>login.storrs.hpc.uconn.edu</code>) without using the job scheduler are subject to the following restrictions. Any program that violates these restrictions may be throttled or terminated without notice.

{| class="wikitable"
! Run time (minutes) !! CPU limit !! Memory limit
|-
| 20 || 5% || 5%
|}
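
To check how much CPU and memory your own processes are using on the login node, you can run something like:

 # List your processes with elapsed time, CPU, and memory usage, sorted by CPU
 $ ps -u $USER -o pid,etime,pcpu,pmem,comm --sort=-pcpu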

Below is a list of programs that are allowed on the login node without restrictions:

* bzip
* cp
* du
* emacs
* fort
* gcc
* gfortran
* gunzip
* gzip
* icc
* mv
* sftp
* smbclient
* ssh
* tar
* vim
* wget
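
For example, packaging results on the login node with the allowed tools before transferring them off the cluster:

 # tar and gzip are allowed on the login node without restriction
 $ tar -czf results.tar.gz results/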


= Shared Read-Only Datasets =

Users who need a read-only dataset can contact our administrators (hpc@uconn.edu) to request that it be hosted on the cluster. For example, researchers in bioinformatics often need reference datasets for different organisms. These reference datasets are usually very large, so users can only keep them in <code>/scratch</code>, and it is inconvenient to touch the files periodically just to prevent deletion. If you have such a dataset, we can store it for you. The dataset must meet the following requirements:

* The dataset is read-only; it cannot be writable or executable.
* The dataset is either public (usable by other users) or restricted to a defined group of users.

Shared datasets are stored under <code>/scratch/scratch2/shareddata/</code>, and data under this directory is kept permanently. We currently provide four reference datasets in the <code>genome</code> directory: hg19, hg38, mm9, and mm10.

To shorten the path, you can create a symbolic link to a dataset under your home directory. For example:

 $ cd
 $ ln -s /scratch/scratch2/shareddata/genome ./genome
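
The shared reference genomes should then be visible through the link, for example:

 $ ls ~/genome
 hg19  hg38  mm9  mm10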