Computing cluster

IRIC has a dedicated computing cluster that uses the Torque/PBS software to manage jobs and allocate its resources. To access it, you must have a Linux account on our servers and use an SSH client to connect to the cluster's master node at the following address:

cluster.iric.ca
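
For example, assuming your account name is username, the connection is made as follows:

ssh username@cluster.iric.ca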

All user directories are exported to the cluster over NFS, so your data is accessible from any node under the same path (e.g. /u/username).

Submitting a job

Jobs are submitted with the qsub command, specifying the resources the job requires. For example, you can connect to a node interactively with the following commands:

module load torque
qsub -I -l nodes=1:ppn=2,mem=8gb,walltime=4:00:00

This command connects you to the next available node, reserving 2 CPU cores and 8 GB of memory for a maximum of 4 hours, after which you will be disconnected. During this time, you can run your programs interactively on the node, which is useful for testing analyses. Once you are confident in your analysis, you will want to submit jobs in batches by creating a script that defines the tasks to execute, such as:

#!/bin/bash
#PBS -V -l nodes=1:ppn=2,mem=8gb,walltime=4:00:00
tophat …
samtools …

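The #PBS line embeds qsub options directly in the script: -V exports your current environment variables to the job, and -l requests the same resources as in the interactive example above.
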
The actual submission of this job, named test.pbs in this example, is then done with the following command:

qsub -d $HOME/work_folder test.pbs

The stdout and stderr outputs of your programs are written to your working directory. For more details on the qsub parameters, consult the official Torque documentation.
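
By default these output files are named after the script and job ID; they can also be renamed with the -N, -o and -e options (the job and file names below are only placeholders):

qsub -N myjob -o myjob.out -e myjob.err -d $HOME/work_folder test.pbs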

Local scratch

For each job launched, a temporary folder is created on the compute node; its path is available in the $TMPDIR environment variable. The amount of local disk space varies between nodes, so an additional node property can be used to restrict the job to nodes with enough scratch space for your needs (a usage sketch follows the commands below).

qsub -I -l nodes=1:ppn=2:scratch      # nodes with at least 180 GB of scratch space
qsub -I -l nodes=1:ppn=2:bigscratch   # nodes with at least 350 GB of scratch space
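
As a minimal sketch (the file names and the samtools command are only placeholders), a batch script would typically copy its input to the local scratch, run the analysis there, and copy the results back to your home directory before the job ends:

#!/bin/bash
#PBS -V -l nodes=1:ppn=2:scratch,mem=8gb,walltime=4:00:00
# Copy the input to the node-local scratch space
cp $HOME/work_folder/input.bam $TMPDIR/
cd $TMPDIR
# Run the analysis on the local copy
samtools sort -o sorted.bam input.bam
# Copy the results back to the NFS home directory
cp sorted.bam $HOME/work_folder/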

Monitoring a job

You can display the status of your jobs with the qstat command and delete a job with the qdel command, which takes the job ID as an argument:

qstat
qdel <job_id>
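
If many jobs are running on the cluster, you can restrict the listing to your own jobs or inspect a specific one (the job ID below is only a placeholder):

qstat -u $USER
qstat -f 12345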

The general status of the computing cluster can be displayed using:

pbs_free

and through the Ganglia monitoring tool: http://intranet.iric.ca/ganglia.

Advanced documentation

Complete documentation on the resource management system can be found in the official Torque documentation:

http://docs.adaptivecomputing.com/torque/6-0-1/help.htm