How to Run Your Case on OCTOPUS500
(768 cores and 2 TB of RAM)
Available Parallel Environments
The new parallel environment is mpiX, where X is the number of CPUs (1 to 10) that you want to use on each compute node.
For example, if you select
-pe mpi5 25
it will launch a run on 5 compute nodes with 5 CPUs on each node. This means that the number of slots you request must be a multiple of X (the number in mpiX):
-pe mpi5 22 -> will not work (22 is not a multiple of 5)
-pe mpi5 10 -> will work
-pe mpi2 4 -> will work
-pe mpi2 5 -> will not work (5 is not a multiple of 2)
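You can also check a request before submitting it by writing the slot rule as a small bash function. This is a minimal sketch and not a tool installed on OCTOPUS500. The function name check_pe_request is hypothetical. It encodes only what is stated above: X must be between 1 and 10, and the number of slots must be a positive multiple of X.

```shell
#!/bin/bash
# Hypothetical helper (not installed on the cluster): check an "-pe mpiX N"
# request against the mpiX rules and print how the slots will be laid out.
check_pe_request() {
    local per_node=$1   # X in mpiX: CPUs used on each compute node
    local slots=$2      # N: total number of slots requested
    if (( per_node < 1 || per_node > 10 )); then
        echo "invalid: mpi${per_node} is outside mpi1..mpi10"
        return 1
    fi
    if (( slots < 1 || slots % per_node != 0 )); then
        echo "invalid: ${slots} is not a positive multiple of ${per_node}"
        return 1
    fi
    # Each node gets exactly X slots, so the node count is N / X.
    echo "ok: $(( slots / per_node )) node(s) with ${per_node} CPUs each"
}
```

For instance, check_pe_request 5 25 prints "ok: 5 node(s) with 5 CPUs each". check_pe_request 5 22 prints "invalid: 22 is not a positive multiple of 5" and returns a non-zero status.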
THE RULE: The maximum number of CPUs allowed per node is 10, so take care when submitting your job. It is always a good idea to check Ganglia (http://192.168.92.140/ganglia) first. Be considerate and efficient: if you want to submit a case with, say, 5 CPUs and there is a node that already has 5 CPUs occupied, choose that node with the option -l h=compute-0-x, where x is the number of that node.
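The advice about filling a partly used node can be sketched the same way. The helper below is hypothetical. You give it the node number x, the number of CPUs already busy on compute-0-x (read off Ganglia by hand) and the number of CPUs you want. It prints the -pe and -l h= options for a single-node job, or refuses if the total would exceed the 10-CPU limit above. Setting X in mpiX equal to the slot count is what keeps the whole job on one node.

```shell
#!/bin/bash
# Hypothetical helper: print qsub options that place a single-node job of
# <want> CPUs on compute-0-<node>, checking busy + want against the 10-CPU limit.
pin_to_node() {
    local node=$1 busy=$2 want=$3
    local max_per_node=10
    if (( want < 1 || busy + want > max_per_node )); then
        echo "cannot place ${want} CPUs on compute-0-${node} (${busy} busy, limit ${max_per_node})" >&2
        return 1
    fi
    # mpiX with X equal to the slot count puts all slots on one node
    printf '%s\n' "-pe mpi${want} ${want} -l h=compute-0-${node}"
}
```

For example, pin_to_node 3 5 5 prints -pe mpi5 5 -l h=compute-0-3.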
I have created a job script that should work for all your cases. Copy it into your case folder and submit it from there with qsub:
#$ -S /bin/bash
# Set the Parallel Environment and number of procs.
#$ -pe mpi5 10
# The job will run in the current directory (submit from the case folder)
#$ -cwd
# Define the name of the job (name that will be displayed)
#$ -N putWhatEverYouWant
# Set your job output file
#$ -o log
# Set your job error file
#$ -e error.err
# Set the priority, default value is 0 (no priority)
#$ -p 0
# Set the email to receive news from the job
#$ -M firstname.lastname@example.org
#$ -m bea
# Put your Job commands here.
. $HOME/OpenFOAM/OpenFOAM-2.0.x/etc/bashrc #Put the correct one!
# Open MPI parameters (don't change these, they should be fine):
# exclude the InfiniBand (openib) transport and use TCP over eth0 only
ARGS="--mca btl ^openib --mca btl_tcp_if_include eth0"
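# Solver (the solver that you will run; simpleFoam is only an example)
SOLVER=simpleFoam
# NSLOTS is set by SGE to the number of slots requested with -pe
mpirun -np $NSLOTS $ARGS $SOLVER -parallel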
Note: you must submit this script from inside the case folder itself.
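To avoid submitting from the wrong folder, you can check for a file that every OpenFOAM case has: system/controlDict. The function below is a hypothetical sketch of that check. It looks only for that one file and does not validate anything else about the case.

```shell
#!/bin/bash
# Hypothetical pre-submission check: an OpenFOAM case folder contains
# system/controlDict, so fail if the given (or current) directory has none.
in_case_folder() {
    local dir=${1:-$PWD}
    if [[ -f "$dir/system/controlDict" ]]; then
        return 0
    fi
    echo "$dir does not look like an OpenFOAM case (no system/controlDict)" >&2
    return 1
}
```

A typical use would be in_case_folder && qsub run.sh, where run.sh is a placeholder name for the job script above.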
And remember: kill all your processes when you leave the cluster, and always check Ganglia to see the cluster load:
http://192.168.92.140/ganglia (accessible only inside the UBI network)
Commonly Used SGE Commands
This page lists some of the more frequently used Sun Grid Engine commands. It does not list all of the options for each command. The man command can be used to see the detailed description of any of these commands. For example, to see a detailed description of the qsub command, enter:
man qsub
List of Commands:
qsub: Submit a job
qstat: Determine the status of a job
qhost: Display node information
qdel: Cancel a job
qhold: Place a hold on a queued job to prevent it from running
qrls: Release a job held with qhold
The qsub command is used to submit jobs to SGE. The syntax of the qsub command is:
qsub [-cwd] [-v SOME_VAR] [-o path] [-e path] [-M mail_address] [-m mail_options] [-l resources] script
-cwd
Directs SGE to run the job in the same directory from which you submitted it. Alternatively, you can specify this flag in the SGE command file for the job.
-v SOME_VAR
Passes the environment variable SOME_VAR to the job. Alternatively, you can specify this flag in the SGE command file for the job.
-o path
Redirects stdout from the SGE script. The default is your home directory. Specify /dev/null to discard SGE messages. Alternatively, you can specify this flag in the SGE command file for the job.
-e path
Redirects stderr from the SGE script. The default is your home directory. Specify /dev/null to discard SGE error messages. Alternatively, you can specify this flag in the SGE command file for the job.
-M mail_address
Sets mail_address, the user's email address, as the recipient of job mail. It is always login_id@mail on Hoffman2.
-m mail_options
Specifies the circumstances under which mail is sent to the job owner defined by the -M option. For example, the options "bea" mean that mail is sent at the beginning and end of the job, and at abort time (if that happens). Option "n" means no mail will be sent.
-l resources
Specifies a list of resources required for your job, for example memory per core and run time:
-l h_data=1024M,h_rt=24:00:00
script
Either the SGE command file or the script that starts up your job.
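Each of these switches can also be written inside the job file as a "#$" line, as the next paragraph explains. The hypothetical helper below converts a qsub command line into those lines. It is a sketch that assumes every option takes at most one value and that no value starts with "-". That holds for the options listed here, but it is not a general qsub parser.

```shell
#!/bin/bash
# Hypothetical helper: turn qsub command-line options into "#$" directive
# lines for the top of an SGE command file.
# Assumption: an argument starting with "-" is an option, and the next
# argument is its value unless it also starts with "-".
to_directives() {
    while (( $# > 0 )); do
        local opt=$1
        shift
        if (( $# > 0 )) && [[ $1 != -* ]]; then
            printf '#$ %s %s\n' "$opt" "$1"   # option with a value
            shift
        else
            printf '#$ %s\n' "$opt"           # flag without a value, e.g. -cwd
        fi
    done
}
```

For example, to_directives -cwd -o path -m bea prints three lines: #$ -cwd, #$ -o path and #$ -m bea.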
The qsub command line switches and options can also be used as active comments or embedded directives in an SGE command file that you submit with the qsub command. Advantages of this approach are: you have a record of what options were used to run your job; you can easily resubmit jobs; and you can use one command file as the basis for creating other similar command files. For example, if the file myjob.cmd contains:
/path/to/executable
and the qsub command used to submit it is:
qsub -cwd -o path -M login_id@mail -m bea -l h_data=1024M,h_rt=24:00:00 myjob.cmd
then the same result could be achieved by adding the following lines to the myjob.cmd file before the /path/to/executable line:
#$ -cwd
#$ -o path
#$ -M login_id@mail
#$ -m bea
#$ -l h_data=1024M,h_rt=24:00:00
and submitting the myjob.cmd script with:
qsub myjob.cmd
After submitting a job with qsub, SGE will respond with something like:
Your job 624556 ("myjob.cmd") has been submitted
where 624556 is the job number assigned by SGE to your job.
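If a script needs the job number later (for example, to pass it to qstat -j or qdel), it can extract it from the confirmation message shown above. This is a sketch that assumes the message has exactly that form. The name job_id_from_qsub is hypothetical, and any other message form makes the function return a non-zero status.

```shell
#!/bin/bash
# Hypothetical helper: extract the job number from qsub's confirmation line,
# e.g.  Your job 624556 ("myjob.cmd") has been submitted
job_id_from_qsub() {
    local line=$1
    if [[ $line =~ ^Your\ job\ ([0-9]+) ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1   # not the expected message form
    fi
}
```

Usage: jobid=$(job_id_from_qsub "$(qsub myjob.cmd)") && qstat -j "$jobid".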
The qstat command displays information about the jobs in the SGE queues, both running and waiting to run. The syntax of the qstat command is:
qstat [-f] [-j job_number] [-U login_id] [-u login_id]
(qstat alone with no arguments) Displays a list of all running and waiting jobs.
-f Displays summary information on each queue as well as the job list.
-j job_number
Displays the status of the job whose job number is job_number.
-U login_id
Displays a list of running and waiting jobs for those queues which login_id can access. Or use the groupjobs script for this information; enter groupjobs -help for usage information.
-u login_id
Displays a list of login_id's running and waiting jobs. Or use the myjobs script for this information for your own login_id.
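For a quick overview of your own jobs, you can tally the state column of qstat. The sketch below assumes the default qstat layout: two header lines (column titles and a dashed separator), with the job state (r, qw, dr, ...) in the fifth column. If your qstat prints a different layout, adjust the column number. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read qstat output on stdin, skip the two header lines,
# and print how many jobs are in each state (column 5), sorted by state.
qstat_state_counts() {
    awk 'NR > 2 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}
```

Usage: qstat -u $USER | qstat_state_counts prints one line per state, such as "r 2".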
The qhost command displays information about compute nodes: their architectures, number of processors, load, etc. The syntax of the qhost command is:
qhost [-j] [-q]
(qhost alone with no arguments)
Displays a table of information about the compute nodes.
-j Adds information about the specific jobs that are running on each compute node.
-q Shows the queues each compute node accepts.
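The qhost table can also help when looking for a node with room for a job. The sketch below is only a rough heuristic, and it makes several assumptions. It assumes the default qhost layout (HOSTNAME, ARCH, NCPU, LOAD, ... after two header lines), treats the LOAD column as an approximation of busy CPUs, and applies this cluster's 10-CPU-per-node limit. Load average is not the same as allocated slots, so confirm with Ganglia or qstat -f before picking a node with -l h=. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical heuristic: read qhost output on stdin and print the nodes whose
# load leaves room for <want> more CPUs under the 10-CPUs-per-node rule.
nodes_with_room() {
    local want=$1
    awk -v want="$want" '
        # Skip the title line, the dashed line, and rows whose LOAD is "-"
        # (the "global" pseudo-host and hosts that are down).
        NR <= 2 || $4 == "-" { next }
        {
            # Never count more than 10 usable CPUs on a node.
            limit = ($3 + 0 < 10) ? $3 + 0 : 10
            if (limit - $4 >= want) print $1
        }
    '
}
```

Usage: qhost | nodes_with_room 5.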
The qdel command is used to cancel a job either while it is waiting to execute or while it is running. The syntax of the qdel command is:
qdel job_number
If a running job does not get cancelled right away, enter:
qdel -f job_number
to force it to be cancelled. Jobs in the "dr" state (running jobs that have been marked for deletion but have not ended) cannot be cancelled by the job owner; they must be cancelled by a system administrator. Jobs stuck in the "dr" state usually indicate a system hardware problem.