Difference between revisions of "HPC Getting Started"

From Storrs HPC Wiki

Revision as of 14:37, 15 November 2016


Connecting to the cluster

If you don't already have an account, please fill out the cluster application form.

The cluster is accessed over SSH (secure shell), the industry standard for remote access and command execution.

SSH access

On Mac and GNU/Linux, simply run the following from a terminal:

ssh <Your Net ID>@login.storrs.hpc.uconn.edu

Windows users can log in using PuTTY.

This gives you access to a login node, and you should see a terminal prompt like:

[<Your Net ID>@cn01 ~]$

Off-campus Access

SSH connections are limited to on-campus addresses from both the wired network and the "UCONN-SECURE" wireless network.

There are three ways to connect to HPC from off campus:

  1. VPN: The UConn VPN is the recommended way to access the Storrs HPC cluster from off campus. Windows and Mac users should follow the instructions on that page for installing the VPN client. Linux users can install OpenConnect version 7 or later and connect to the VPN with: openconnect --juniper sslvpn.uconn.edu
  2. UConn Skybox: Log in to a virtual desktop and then access the cluster via PuTTY.
  3. Engineering SSH: If you have a School of Engineering account, you can log in to their SSH relay, icarus.engr.uconn.edu, and then SSH to the cluster. The process is as follows:
   [<Your User>@<Your Hostname>]$ ssh <Your Net ID>@icarus.engr.uconn.edu
   [<Your Net ID>@icarus.engr.uconn.edu]$ ssh <Your Net ID>@login.storrs.hpc.uconn.edu
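The two-hop login through the Engineering relay can usually be collapsed into a single command with the OpenSSH jump-host option. The sketch below assumes a local OpenSSH client of version 7.3 or later (which provides -J) and assumes the relay permits forwarded connections; neither is confirmed by this page. The function name relay_ssh_command and the NetID abc12345 are hypothetical.

```shell
# Print a one-line SSH command that hops through the Engineering relay.
# Assumption: the same NetID is used on the relay and on the cluster,
# as in the two-step example above.
relay_ssh_command() {
    rs_netid="$1"
    echo "ssh -J ${rs_netid}@icarus.engr.uconn.edu ${rs_netid}@login.storrs.hpc.uconn.edu"
}

# Example with a hypothetical NetID; copy the printed command into a terminal.
relay_ssh_command abc12345
```

If the relay does not allow jumping, the two-step login shown above still applies.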

Overview of cluster nodes

There are four classes of nodes available on the HPC cluster, each named after its Intel CPU architecture. They are listed in the table below.

Configuration of each type of CPU compute node
Name | CPU name | Nodes | Cores per Node | Cores Total | RAM (GB) | CPU Frequency (GHz) | Host Names
Haswell | Xeon E5-2690 | 175 | 24 | 4,200 | 128 | 2.60 | cn137 - cn325
Ivy Bridge | Xeon E5-2680 | 32 | 20 | 640 | 128 | 2.80 | cn105 - cn136
Sandy Bridge | Xeon E5-2650 | 40 | 16 | 640 | 64 | 2.00 | cn65 - cn104
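The host name ranges in the table can be used to tell which class of node a job landed on. The following is a minimal sketch that hard-codes the ranges listed above. The function name node_class is hypothetical, and the ranges would need updating if the cluster layout changes.

```shell
# Map a compute node host name (cnNN) to its CPU class using the
# host name ranges from the table above. Defaults to the current host.
node_class() {
    nc_host="${1:-$(hostname -s)}"
    nc_num="${nc_host#cn}"
    # Names that do not start with "cn" (for example gpu01) are not in the table.
    if [ "$nc_num" = "$nc_host" ]; then echo "unknown"; return 0; fi
    case "$nc_num" in
        ''|*[!0-9]*) echo "unknown"; return 0 ;;
    esac
    # Strip leading zeros so names such as cn01 compare as plain decimals.
    nc_num=$(printf '%s' "$nc_num" | sed 's/^0*//')
    nc_num="${nc_num:-0}"
    if   [ "$nc_num" -ge 137 ] && [ "$nc_num" -le 325 ]; then echo "Haswell"
    elif [ "$nc_num" -ge 105 ] && [ "$nc_num" -le 136 ]; then echo "Ivy Bridge"
    elif [ "$nc_num" -ge 65 ]  && [ "$nc_num" -le 104 ]; then echo "Sandy Bridge"
    else echo "unknown"
    fi
}
```

Calling node_class with no argument inside a job script reports the class of the node the job is running on; host names outside the table, such as the login node, are reported as unknown.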

GPUs are installed in four Westmere nodes. How to submit jobs to these nodes is described in the GPU Guide.

Configuration of each type of GPU compute node
GPU name | Nodes | Cards per Node | Cores per Card | Cores Total | RAM on Card (GB) | Host Names
NVIDIA Tesla K40m | 2 | 2 | 2,880 | 11,520 | 12 | gpu01, gpu02

Data Storage

HPC Storage (short term)

The Storrs HPC cluster has a number of local high performance data storage options available for use during job execution and for the short term storage of job results. None of the cluster storage options listed below should be considered permanent, and none should be used for long term archival of data. See the next section for permanent data storage options that offer greater resiliency.

Name | Path | Size | Relative Performance | Persistence | Backed up? | Purpose
Scratch | /scratch/scratch2 | 343TB shared | Fastest | None, deleted after 2 weeks | No | Fast parallel storage for use during computation
Node-local | /work | 40GB | Fast | None, deleted after 5 days | No | Fast storage local to each compute node, globally accessible from /misc/cnXX
Home | ~ | 50GB | Slow | Yes | Twice per week | Personal storage, available on every node
Group | /shared | By request | Slow | Yes | Twice per week | Short term group storage for collaborative work

Notes

  • Data deletion inside the scratch folder is based on directory modification time. You will get 3 warnings by email before deletion.
  • Certain directories are mounted on demand by autofs: /home, /shared, and /misc/cnXX. Until one of these directories is mounted, shell commands such as ls may fail on it. The mount is triggered when you access a file under the directory or use cd to enter it.
  • You can recover files yourself from our backed-up directories using snapshots, as long as you do so within 2 weeks.
  • You can check on your home directory quota.
  • There are read-only datasets available at /scratch/scratch2/shareddata. More information is available on this page.
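Because scratch deletion is based on directory modification time, it can help to list directories that are approaching the 2 week limit. The sketch below assumes a find implementation that supports -mindepth and -maxdepth (GNU and BSD find do). The function name stale_dirs and the per-user path in the usage note are hypothetical.

```shell
# List the top-level directories under a root that have not been
# modified for more than N days (default 10).
stale_dirs() {
    sd_root="$1"
    sd_days="${2:-10}"
    # -mtime +N matches entries whose modification time is more than N days old.
    find "$sd_root" -mindepth 1 -maxdepth 1 -type d -mtime +"$sd_days"
}
```

For example, stale_dirs /scratch/scratch2/$USER 10 would list directories older than 10 days, assuming your scratch data lives under a directory named after your NetID; adjust the path to wherever you actually keep your files.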

Permanent Data Storage (long term)

The university has multiple options for long term permanent data storage. Once data is no longer needed for computation, it should be transferred to one of these locations. Data transfer to permanent locations should be done from the login.storrs.hpc.uconn.edu login node. Please review the file transfer guide for helpful information on moving data in and out of the cluster.

Name | Path | Size | Relative Performance | Resiliency | Purpose
Archival cloud storage | /archive | 3PB shared | Low | Data is distributed across three datacenters between the Storrs and Farmington campuses | This storage is best for permanent archival of data without frequent access.
UITS Research Storage | Use smbclient to transfer files | By request to UITS | Moderate | Data is replicated between two datacenters on the Storrs campus | This storage is best used for long term data storage requiring good performance, such as data that will be accessed frequently for post-analysis.
Departmental/individual storage | Use smbclient to transfer files or SCP utilities | - | - | - | Some departments and/or individual researchers have their own local network storage options. These can be accessed using SMB Client or SCP utilities.
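One possible way to move finished results into long term storage is to bundle a results directory into a single compressed archive and verify it before deleting the original. The sketch below uses only tar. The function name archive_results and the destination layout in the usage note are hypothetical, and the file transfer guide remains the reference for recommended tools.

```shell
# Bundle a results directory into <dest>/<name>.tar.gz and print the
# path of the archive once it has been verified as readable.
archive_results() {
    ar_src="$1"     # directory holding finished results
    ar_dest="$2"    # destination directory, for example somewhere under /archive
    ar_name="$(basename "$ar_src").tar.gz"
    mkdir -p "$ar_dest" || return 1
    # -C keeps the paths inside the archive relative to the parent directory.
    tar -czf "$ar_dest/$ar_name" -C "$(dirname "$ar_src")" "$(basename "$ar_src")" || return 1
    # Listing the archive confirms it can be read back before the source is removed.
    tar -tzf "$ar_dest/$ar_name" >/dev/null || return 1
    echo "$ar_dest/$ar_name"
}
```

Run from the login node, a call such as archive_results /scratch/scratch2/$USER/run1 /archive/$USER would write run1.tar.gz to the destination; both paths are placeholders for illustration.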

Submitting Jobs

All job submission, management, and scheduling is done using the job scheduler software SLURM. To learn more about job submission and management, please read our SLURM Guide.

Always run jobs via SLURM. If you do not, your process may be throttled or terminated.
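As a minimal sketch of what running a job via SLURM looks like, the snippet below writes a small batch script and submits it only if sbatch is available. The file name hello_job.sh, the job name, and the resource values are illustrative assumptions. Partition names and site-specific limits are not covered here; see the SLURM Guide for those.

```shell
# Write a small batch script. Lines beginning with #SBATCH are read by sbatch.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=hello_%j.out
# %j in the output file name is replaced by the job ID.
echo "Running on $(hostname)"
EOF

# Submit only where SLURM is installed, such as on a cluster login node.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh
else
    echo "sbatch not found; run this on a cluster login node"
fi
```

The command inside the script then runs on a compute node chosen by the scheduler rather than on the login node.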

Please read our usage policy for more details.

HPC applications

We have created helpful software guides to demonstrate how to effectively use popular scientific applications on the HPC cluster.

Troubleshooting

If you encounter errors, please read the FAQ first. For further assistance, visit the Help page for additional resources and contact information for technical support.