Compilers & Programs


Compilers

In addition to the standard GNU compilers, the system has the Intel Cluster Toolkit Compiler Edition installed. It is located in /apps/intel.

This toolkit includes:

  • Intel® C++ Compiler 11.1 Update 6
  • Intel® Fortran Compiler 11.1 Update 6
  • Intel® MPI Library 4.0
  • Intel® Trace Analyzer and Collector 8.0
  • Intel® Math Kernel Library 10.2 Update 5
  • Intel® MPI Benchmarks 3.2.1
  • Intel® Debugger 11.1 Update 6
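
As a sketch of typical usage (the exact environment-script paths under /apps/intel may differ on this system), you would source the Intel environment scripts and then compile serial or MPI code:

$ source /apps/intel/Compiler/11.1/bin/iccvars.sh intel64    # C/C++ environment
$ source /apps/intel/Compiler/11.1/bin/ifortvars.sh intel64  # Fortran environment
$ source /apps/intel/impi/4.0/bin64/mpivars.sh               # Intel MPI environment
$ icc -O2 hello.c -o hello               # serial C program
$ ifort -O2 hello.f90 -o hello_f         # serial Fortran program
$ mpiicc -O2 hello_mpi.c -o hello_mpi    # MPI C program with the Intel MPI wrappers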

 

Load Sharing Facility (LSF)

LSF (Load Sharing Facility) is a job scheduling and monitoring software system developed and maintained by Platform Computing. LSF should be used to submit jobs to the cluster. A job is submitted from the login node and will be executed when resources are available on the compute nodes. Jobs which request 2 or fewer processors and 15 minutes or less time are given higher priority and typically run very quickly. Quick turnaround of such small jobs enables users to quickly test potential jobs.

The queues are listed below, along with the maximum resources each allows. These queues exist to give slightly higher priority to shorter jobs: shorter jobs clear the queue faster, so they are scheduled ahead of longer ones.

To specify a particular queue, add a line such as #BSUB -q short to your bsub script (a minimal example follows the queue list below).

  • Debug Queue: The debug queue is for quickly verifying that your code and settings are correct. It has a maximum of 15 minutes of run time and 2 cores. This queue has the highest priority and should be used only for fast checks of longer jobs.
  • Short Queue: The short queue is for small jobs that run relatively quickly. It has a maximum of 6 hours of run time and 8 cores. This queue has the second-highest priority, and jobs are limited to 4 hosts at once.
  • Normal Queue: The normal queue is the default and will likely be used for the majority of jobs. It has a maximum of 18 hours of run time and 16 cores. This queue is third in priority, and jobs are limited to 4 hosts at once.
  • Large Queue: The large queue is for longer jobs that have first been tested at a smaller scale. It has a maximum of 48 hours of run time and 24 cores. This queue has the lowest priority, and jobs may span up to 10 hosts.
  • Exception Queue: The exception queue is for special jobs that cannot run under the rules of the other queues. It has no limits on run time or number of cores. Jobs submitted to this queue remain pending until an administrator approves them. Since this queue can potentially tie up the entire cluster, use is granted only after discussion to ensure it is the best use of computing time. E-mail all exception queue requests to hpc@mint.ua.edu.
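
As a sketch, a minimal bsub script targeting the short queue might look like the following (the job name myjob and the program my_program are placeholders):

#!/bin/bash
### minimal short-queue example
#BSUB -q short
#BSUB -J myjob
#BSUB -n 2
#BSUB -o myjob.%J.out
#BSUB -e myjob.%J.err
./my_program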

Common LSF Commands

  • bsub < scriptfile.bsub – Submits job with the options in the .bsub file.
  • bjobs – Shows running and pending jobs
  • bjobs –p – Shows details for any jobs that are pending
  • bhosts – Shows all available hosts and their current job load
  • bqueues – Shows all queues and their statuses
  • bpeek xxxx – Shows current output from the job, where xxxx is the job number (run bjobs to find it)
  • bkill xxxx – Kills the specified job, where xxxx is the job number (run bjobs to find it)
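
A typical session combining these commands might look like this (the script name myjob.bsub and job number 1234 are hypothetical):

$ bsub < myjob.bsub     # submit; LSF echoes the assigned job number
Job <1234> is submitted to queue <normal>.
$ bjobs                 # confirm the job is running or pending
$ bpeek 1234            # view the job's current output
$ bkill 1234            # cancel the job if something is wrong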

 

VASP

To run VASP with the LSF scheduler, you must use the bsub command and call a script with the job requirements specified. Please note that you must submit the job to the scheduler from the directory in which you wish to run VASP. Before you run VASP for the first time, check that you are a member of the vasp group by running the command groups (i.e., $ groups) and ensuring that "vasp" appears in the output, e.g., [your username] vasp. If you do not see vasp in the output and you believe this is an error, please contact us at hpc@mint.ua.edu for assistance.

Please note that VASP is specifically licensed to named individuals in research groups. If you have a VASP license and would like to use it on the MINT HPC, you must submit a copy of your license agreement prior to use. VASP files are located in /apps/vasp/5.2 for the standard version and /apps/vasp/noncol_5.2 for the non-collinear version. To run VASP in parallel, edit a script like the following and save it as vasprun.8.bsub. This script assumes that you are running the job on 8 CPUs (#BSUB -n 8) and that you have also set NPAR = 8 in the INCAR file. You must make sure the core count in the INCAR file and in the BSUB file match.

Also, please note that we advise running your VASP jobs on 8 CPUs or fewer. This allows a job to run on a single machine, greatly increasing its speed. You may want to benchmark your jobs at 8 and 16 CPUs at first to see whether the speedup is noticeable enough to warrant the extra CPUs.

#!/bin/bash
### My VASP Job
#BSUB -J vasprun
#BSUB -a intelmpi
#BSUB -n 8
### set output files; %J = job number
#BSUB -o vasprun.%J.out
#BSUB -e vasprun.%J.err
echo "working directory is " $LS_SUBCWD
echo -n "Executed on " $LSB_HOSTS
#
# run application
time mpirun.lsf /apps/vasp/5.2.8/vasp
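
Before submitting, it is worth confirming that NPAR in the INCAR file matches the core count in the script. A quick check, assuming both files are in the current directory:

$ grep NPAR INCAR
NPAR = 8
$ grep "#BSUB -n" vasprun.8.bsub
#BSUB -n 8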

Now submit your script to LSF with:

$ bsub < vasprun.8.bsub

To run VASP in serial (single-CPU mode), save a script like the following as vasprun.bsub:

#!/bin/bash
### job name
#BSUB -J vasprun
#BSUB -n 1
### set output files; %J = job number
#BSUB -o vasprun.%J.out
#BSUB -e vasprun.%J.err
echo "working directory is " $LS_SUBCWD
echo -n "Executed on " $LSB_HOSTS
#
# run application
time /apps/vasp/5.2.8/vasp

Submit your revised script to LSF with:

$ bsub < vasprun.bsub

After the job runs, your output files will be located in the directory in which you began the job. The screen output file will be named vasprun.xxxx.out and the error log will be vasprun.xxxx.err. The job number will be xxxx.

 

Matlab (Non-Graphical Batch Matlab for Single-CPU Execution)

To run MATLAB with the LSF scheduler, you must use the bsub command and call a script with the job requirements specified. To prevent LSF from running more MATLAB jobs than there are licenses available (5 at present), a MATLAB license resource usage and monitoring scheme has been implemented in LSF. This is indicated by the argument to the -R flag in the matlab.bsub script below (job name matlabtest1), which calls a MATLAB script named matlabfile.m. If you submit a Matlab job that exceeds the license count, it will remain in the queue until sufficient licenses are available. Example BSUB script matlab.bsub for standalone Matlab:


#!/bin/bash
### Matlab Example
#BSUB -J matlabtest1
#BSUB -n 8
#BSUB -R "span[hosts=1]rusage[matlab=1]"
### set output files; %J = job number
#BSUB -o matlab.%J.out
#BSUB -e matlab.%J.err
echo "working directory is " $LS_SUBCWD
echo -n "Executed on " $LSB_HOSTS
#
# run application
time /apps/mathworks/bin/matlab -nodisplay < matlabfile.m

Now submit your script to the batch system with:

$ bsub < matlab.bsub

After the job runs, your output files will be located in the directory in which you began the job. The screen output file will be named matlab.xxxx.out and the error log will be matlab.xxxx.err. The job number will be xxxx.
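
If your job remains pending because all Matlab licenses are in use, you can check license availability and the pending reason. This sketch assumes the shared license resource is named matlab, as in the -R string above:

$ bhosts -s matlab   # shows total and reserved matlab license tokens
$ bjobs -p           # shows why your job is pending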

 

Other Programs

If you are running any programs not specifically covered in this document, you may e-mail hpc@mint.ua.edu and we will gladly create a bsub file optimized for the program you are running.

 

Additional Help

For assistance with items that are not addressed in this document, please contact hpc@mint.ua.edu.