HPC Getting Started

From Storrs HPC Wiki
Revision as of 16:47, 11 June 2012 by Stc07008

Connecting to the cluster

If you don't have an account, learn how to get one here.

SSH access

Cluster resources are normally accessed using OpenSSH. SSH stands for Secure Shell; it is the industry standard for remote login and command execution.

For Windows users, the recommended SSH client is PuTTY. Please visit the PuTTY website to download it.

The SSH destination is: hornet.engr.uconn.edu

Please note that, for security reasons, we limit SSH connections to on-campus or VPN access only. Access is only available from the UCONN wired network or the UCONN-SECURE wireless network.

If you need off-campus access, you can use the UCONN VPN.

Alternatively, if you have a School of Engineering account, you can first SSH to icarus.engr.uconn.edu, then to the cluster.
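If you use this two-hop route often, a recent OpenSSH client (7.3 or later) can automate it with a ProxyJump entry in ~/.ssh/config. A minimal sketch; the Host alias "hornet" is arbitrary and YOUR_USERNAME is a placeholder for your own account name:

```
# Sketch of an ~/.ssh/config entry; "hornet" is an arbitrary alias and
# YOUR_USERNAME is a placeholder for your account name.
Host hornet
    HostName hornet.engr.uconn.edu
    User YOUR_USERNAME
    ProxyJump YOUR_USERNAME@icarus.engr.uconn.edu  # requires OpenSSH 7.3+
```

With this entry in place, ssh hornet connects through icarus automatically.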

Web Interface

Alternatively, jobs can also be submitted and controlled through an easy-to-use web interface, the Platform Application Center. It is likewise limited to on-campus and UCONN VPN access. The URL for the Platform Application Center GUI is https://hornet-pac.engr.uconn.edu.

Submitting Jobs

All job submission, management and scheduling is done using Platform LSF. Below are instructions for how to use LSF when connected to the cluster via SSH.

Always run jobs via LSF with the bsub command. If you do not, your process will be killed.

Job Submission - bsub (docs)

Commands are submitted to LSF using the bsub command. This is known as submitting a job.

  • To submit a job:
bsub {COMMAND}
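For anything beyond a one-line command, bsub can also read a job script from standard input, with submission options given on #BSUB lines. A minimal sketch; the job name, file names, and payload below are placeholders, not site-specific settings:

```shell
#!/bin/bash
#BSUB -J my_job          # job name (placeholder)
#BSUB -o my_job.%J.out   # write stdout here; %J expands to the job ID
#BSUB -e my_job.%J.err   # write stderr here

# Everything below runs on a compute node once LSF dispatches the job.
msg="payload running"    # stand-in for your real program
echo "$msg"
```

Save this as, say, job.sh and submit it with bsub < job.sh. The #BSUB lines are comments to the shell but are read by LSF as submission options.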

Interactive

  • If you require an interactive job, use -I, -Ip (for a pseudo-terminal), or -Is (for an interactive shell).
bsub -I {COMMAND}

For example,

bsub -Is R

X11 Forwarding

We suggest running your programs from the command line whenever possible. For example, when running the application FLUENT, run it in batch mode.

If you need to interact with a graphical interface (i.e. a window) you can use X11 tunnelling over SSH. This requires changing the way you connect to the cluster.

X11 Forwarding from Linux or Mac OS X

Most Linux desktop distributions have X11 installed, and Mac OS X also includes it. To connect to the cluster, you only need to add the -X flag:

ssh -X {USERNAME}@hornet.engr.uconn.edu

X11 Forwarding from Windows

Windows does not include an X11 server by default, so this is a little trickier. Use Xming: the installation is straightforward and works on all versions of Windows. Make sure that Xming is running before you connect to Hornet!

  • Note to users of PuTTY - you must enable X11 forwarding before connecting, or it will not work.
    • In the left pane, go to Connection -> SSH -> X11 and make sure "Enable X11 forwarding" is checked.

Once everything works, you can save the session in PuTTY's main window so that, again, you are only a double-click away from the cluster.

Submitting GUI jobs via bsub

Assuming X11 is forwarded properly, here is how to submit a job with a graphical user interface:

bsub -XF {COMMAND}

Job status - bjobs (docs)

To view the jobs you've submitted:

bjobs -u {USERNAME}

This will show the JOBID, its status (STAT), which QUEUE it is in, the host you submitted the job from (FROM_HOST), the host it is executing on (EXEC_HOST), the JOB_NAME and the SUBMIT_TIME.

For detailed information about a specific job:

bjobs -l {JOBID}
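Because bjobs prints plain columnar text, it combines well with standard tools such as awk. The sketch below filters for running jobs using a canned sample of bjobs output (the job data is hypothetical); on the cluster you would pipe bjobs -u {USERNAME} in directly instead of the echo:

```shell
# Sample bjobs output (hypothetical data); on the cluster, replace the
# echo below with:  bjobs -u $USER
sample='JOBID   USER    STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
1234    jdoe    RUN     normal hornet01   hornet05   sim_a     Jun 11 16:40
1235    jdoe    PEND    normal hornet01              sim_b     Jun 11 16:41'

# Print the JOBIDs of jobs whose STAT column is RUN.
running=$(echo "$sample" | awk 'NR > 1 && $3 == "RUN" {print $1}')
echo "$running"
```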


Other useful LSF tricks

To kill your job,

bkill {JOBID}

To kill your job uncleanly, but authoritatively (i.e. send it signal 9)

bkill -s 9 {JOBID}
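The same column-filtering idea extends bkill to many jobs at once. The sketch below collects PENDing JOBIDs from a canned sample of bjobs output (hypothetical data); on the cluster you would feed real bjobs output through the awk filter and finish with xargs bkill:

```shell
# Sample bjobs output (hypothetical data); on the cluster, replace the
# echo below with:  bjobs -u $USER
sample='JOBID   USER    STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
2001    jdoe    PEND    normal hornet01              job_a     Jun 11 16:45
2002    jdoe    RUN     normal hornet01   hornet03   job_b     Jun 11 16:46'

# Collect the JOBIDs of pending jobs; on the cluster you would then run:
#   echo "$pending" | xargs bkill
pending=$(echo "$sample" | awk 'NR > 1 && $3 == "PEND" {print $1}')
echo "$pending"
```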
  • It is suggested that your program write its output to a file. If it does not, LSF can capture stdout to a file using the -o flag:
bsub -o {OUTPUT_FILE} {COMMAND}
  • To interrupt an interactive job, use Ctrl-C.
  • Remember that the admin team is not responsible if your process is disrupted. Failing to run jobs through LSF can result in loss of data, removal from the cluster, and/or a temporary ban.

For more information, check the command reference: http://becat.engr.uconn.edu/oldsite/hpc/pdf/lsf/lsf_command_ref/index.htm

The Platform LSF Knowledge Center is also available http://becat.engr.uconn.edu/oldsite/hpc/pdf/lsf/