Advanced OAR
This tutorial presents how to use OAR in detail, as well as some tips and tricks. It assumes you are familiar with OAR and the basics of the DI-Cluster usage. If not, please first read the Getting Started page.
This OAR tutorial focuses on command line usage. It assumes you are using the bash shell (but should be easy to adapt to another shell). It can be read linearly, but you may also pick sections at random. At least begin with the useful tips below.
Useful tips
Use screen or tmux so that your work is not lost if you lose the connection to the cluster. Moreover, having a screen session opened with one or more shell sessions allows you to leave your work session whenever you want, then get back to it later and recover it exactly as you left it.
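For example, a minimal tmux workflow could look like this (the session name oar-work is arbitrary):
$ tmux new -s oar-work      # start a named session on the frontend
... work, then detach with Ctrl-b d ...
$ tmux attach -t oar-work   # later, reattach and find your shells as you left them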
Most OAR commands (oarsub
, oarstat
, oarnodes
) can provide output in various formats:
- textual (this is the default mode)
- Perl dumper (-D)
- XML (-X)
- YAML (-Y)
- JSON (-J)
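For example, to list your current jobs in JSON format (handy for scripting):
$ oarstat -u -J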
Regarding the oarsub command line, you will mostly see the word host, but oarsub accepts the words host and network_address interchangeably. Besides, the word nodes can also be used in place of host or network_address in the -l arguments, but only there.
Connection to a Job
Being connected to a job means that your environment is set up (OAR_JOB_ID and OAR_JOB_KEY_FILE) so that OAR commands can work. You are automatically connected to a job if you submitted it in interactive mode. Otherwise, you must connect to it manually:
$ JOBID=$(oarsub 'sleep 300' | sed -n 's/OAR_JOB_ID=\(.*\)/\1/p')
$ oarsub -C $JOBID
$ pkill -f 'sleep 300'
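In the connected shell, the OAR environment is set up, which you can check with, for example:
$ echo $OAR_JOB_ID
$ env | grep '^OAR_'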
Connection to the job’s nodes
To connect to the nodes of a job, you need to use the oarsh command instead of ssh, and oarcp instead of scp to copy files to/from the nodes.
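For example, from a shell that is connected to the job (node2 and the file name below are placeholders):
$ oarsh node2
$ oarcp ./results.tar.gz node2:/tmp/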
oarsh and job keys
By default, OAR generates an ssh key pair for each job, and oarsh is used to connect to the job’s nodes. oarsh looks at the OAR_JOB_ID or OAR_JOB_KEY_FILE environment variables to know which key to use, so oarsh works directly if you are connected to the job. You can also connect to the nodes without being connected to the job:
$ oarsub -I
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
OAR_JOB_ID=<JOBID>
...
then, in another terminal:
$ OAR_JOB_ID=<JOBID> oarsh <NODE_NAME>
If needed, OAR also allows exporting the key of a job (see the sketch below).
Note that in this case (single-node job), the command is equivalent to:
$ oarsub -C <JOBID>
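As an illustration of the job key export mentioned above, here is a hedged sketch, assuming your OAR installation provides oarsub's --export-job-key-to-file option (check oarsub --help); the key file path is arbitrary:
$ oarsub --export-job-key-to-file ~/my_job_key -I
then, from another terminal, point oarsh to that key:
$ OAR_JOB_KEY_FILE=~/my_job_key oarsh <NODE_NAME>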
Passive and interactive modes
In interactive mode, a shell is opened on the first node of the reservation. The job will be killed as soon as this shell is closed, and it is limited by the job's walltime. It can also be killed by an explicit oardel.
You can experiment with 3 shells. In the first shell, regularly run the following to see the list of your running jobs:
$ oarstat -u
In the second shell, run an interactive job:
$ oarsub -I
Wait for the job to start and run oarstat, then leave the job and run oarstat again.
Submit another interactive job, and in the third shell, kill it:
$ oardel <JOBID>
In passive mode, an executable is run by OAR on the first node of the reservation. If you run:
$ oarsub "hostname" -l /nodes=13
this command reserves 13 nodes in the cluster, but only executes the command hostname on the first node of the reservation.
The output of your passive jobs will be located where you executed the oarsub command, and can be accessed as:
$ cat OAR.<JOBID>.stdout
$ cat OAR.<JOBID>.stderr
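For example, to follow a passive job's output while it runs:
$ tail -f OAR.<JOBID>.stdout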
You may not want a job to be interactive or to run a script when the job starts, for example because you will use the reserved resources from a program whose lifecycle is longer than the job's (and which will use the resources by connecting to the job). One trick to achieve this is to run the job in passive mode with a long sleep command. One drawback of this method is that the job may terminate with an error status if the sleep is killed. This can be a problem in some situations, e.g. when using job dependencies.
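A minimal sketch of this trick, assuming you want to hold 2 nodes for up to 10 hours (adjust resources, walltime and sleep duration to your needs):
$ JOBID=$(oarsub -l nodes=2,walltime=10:00:00 "sleep 36000" | sed -n 's/OAR_JOB_ID=\(.*\)/\1/p')
$ OAR_JOB_ID=$JOBID oarsh <NODE_NAME>    # use the resources from your external program
$ oardel $JOBID                          # release the job when you are done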
Submission and Reservation
If you don’t specify the job’s start date (oarsub option -r), then your job is a submission and OAR will choose the best schedule. If you specify the job’s start date, this is a reservation: OAR cannot decide the best schedule anymore, it is fixed.
There are some consequences:
in submission mode, you’re almost guaranteed to get the resources you want, because OAR can decide what resources to allocate at the last moment. You cannot get the list of resources until the job starts.
in reservation mode, you’re not guaranteed to get the resources you want, because OAR has to plan the allocation of resources at reservation time. If some of those resources later become unavailable, you lose them for your job. You can get the list of resources as soon as the reservation starts.
in submission mode, you cannot know when your job will start until it actually starts, but OAR can give you an estimation of that date.
Example: a reservation starting in one week:
$ oarsub -r "$(date '+%Y-%m-%d %H:%M:%S' --date='+1 week')"
For reservations, there is no interactive mode. You can give OAR a command to execute, or nothing. If you give no command, you’ll have to connect to the job once the reservation starts.
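For example, to reserve 2 nodes starting tomorrow at the same time, with no command, and connect to the job once the reservation has started:
$ oarsub -r "$(date '+%Y-%m-%d %H:%M:%S' --date='+1 day')" -l nodes=2
...
$ oarsub -C <JOBID>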
Getting information about a job
The oarstat command gives information about jobs. By default it lists the current jobs of all users. You can restrict it to your own jobs or someone else’s jobs with the -u option:
$ oarstat -u
You can get full details of a job:
$ oarstat -fj <JOBID>
If you script OAR and regularly poll job states with oarstat, you can cause a high load on the OAR server (because the default oarstat invocation triggers costly SQL requests in the OAR database). In this case, you should use the -s option, which is optimized and only queries the current state of a given job:
$ oarstat -s -j <JOBID>
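For instance, a small sketch of a polling loop that waits until a job is running, using only the lightweight -s query (it assumes the oarstat -s output contains the state name, e.g. Running, and does not handle jobs ending in error):
$ JOBID=<JOBID>
$ until oarstat -s -j $JOBID | grep -q Running; do sleep 30; done
$ echo "job $JOBID is running"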
Complex resource selection
The complete selector format syntax (oarsub -l
option) is:
-l {sql1}/name1=n1/name2=n2+{sql2}/name3=n3/name4=n4/name5=n5+...,walltime=hh:mm:ss
where
- sqlN are optional SQL predicates on the resource properties (e.g. cluster, host)
- nameN=n gives the wanted number of resources of type nameN (e.g. host, cpu, core)
- slashes (/) between resources express resource subtree selection
- + allows aggregating different resource specifications
- walltime=hh:mm:ss (separated by a comma) sets the job walltime (expected duration), which defaults to 2 hours
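For instance, the following illustrative request combines an SQL predicate, the resource hierarchy, aggregation with + and a walltime (the cluster names are placeholders):
$ oarsub -I -l "{cluster='1'}/nodes=2/core=4+{cluster='2'}/nodes=1,walltime=03:00:00"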
Using the resource hierarchy
ask for 1 core on 15 nodes on a same cluster (total = 15 cores)
$ oarsub -I -l /cluster=1/nodes=15/core=1
ask for 1 core on 15 nodes on 2 clusters (total = 30 cores)
$ oarsub -I -l /cluster=2/nodes=15/core=1
ask for 1 core on 2 cpus on 15 nodes on a same cluster (total = 30 cores)
$ oarsub -I -l /cluster=1/nodes=15/cpu=2/core=1
ask for 4 cpus on 2 clusters (total = 8 cpus, the number of nodes and cores depends on the topology of the machines)
$ oarsub -I -l /cluster=2/cpu=4
Please mind that the nodes keyword (plural!) is an (historical) alias for host (singular!). A node or host is one server (computer). For instance, -l /cluster=X/nodes=Y/core=Z is exactly the same as -l /cluster=X/host=Y/core=Z.
Selecting nodes from a specific cluster
For example, to reserve 2 machines on cluster 1 (which contains nodes node1 to node5):
$ oarsub -I -l {"cluster='1'"}/nodes=2
Or, alternative syntax:
$ oarsub -I -p "cluster='1'" -l /nodes=2
Selecting specific nodes
For example, to reserve 1 machine out of node1, node2, or node3:
$ oarsub -I -l {"host in ('node1', 'node2', 'node3')"}/nodes=1
or, alternative syntax:
$ oarsub -I -p "host in ('node1', 'node2', 'node3')" -l /nodes=1
By negating the SQL clause, you can also exclude some nodes.
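For example, to exclude node1 and node2 from the selection:
$ oarsub -I -p "host not in ('node1', 'node2')" -l nodes=1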
Other examples using properties
- ask for 10 cores on node1:
$ oarsub -I -l core=10 -p "host='node1'"
- ask for any 4 nodes except node12:
$ oarsub -I -p "not host like 'node12'" -l nodes=4
Warning
- walltime must always be the last argument of -l <...>
- if no resource matches your request, oarsub will exit with the following message:
Generate a job key...
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
There are not enough resources for your request
OAR_JOB_ID=-5
Oarsub failed: please verify your request syntax or ask for support to your admin.
Retrieving the resources allocated to my job
You can use oarprint, which nicely prints the resources of a job.
Retrieving resources from within the job
We first submit a job
$ oarsub -I -l nodes=2
...
OAR_JOB_ID=178361
..
Connect to OAR job 178361 via the node node1
..
Retrieve the host list
We want the list of the nodes we got, identified by unique hostnames
$ oarprint host
node1
node10
(We get 1 line per host, not per core!)
Warning
nodes is a pseudo property: you must use host instead.
Retrieve the core list
$ oarprint core
28
26
1
12
4
6
23
245
247
25
3
32
29
27
24
8
248
5
21
20
243
241
244
7
13
22
9
2
18
246
16
19
15
11
242
14
30
17
31
10
Obviously, retrieving OAR-internal core IDs might not help much. Hence the use of a customized output format.
Retrieve core list with host and cpuset id as identifier
We want to identify our cores by their associated host names and cpuset IDs:
$ oarprint core -P host,cpuset
node1 21
node1 23
node1 24
node10 7
node10 4
node1 26
node10 6
node10 2
node1 9
node1 10
node1 1
node1 17
node1 29
node1 30
node1 15
node1 12
node1 28
node1 19
node1 27
node10 1
node1 8
node1 0
node1 20
node1 14
node1 7
node1 13
node1 11
node1 6
node1 3
node1 16
node10 3
node1 5
node1 25
node1 31
node10 0
node10 5
node1 18
node1 22
node1 4
node1 2
Retrieving resources from the submission frontend
If you are not within a job ($OAR_RESOURCE_PROPERTIES_FILE
is not defined), running oarprint
will give:
$ oarprint
/usr/bin/oarprint: no input data available
In that case, you can however pipe the output of the oarstat command into oarprint, e.g.:
$ oarstat -j <JOB_ID> -p | oarprint core -P host,cpuset -F "%[%]" -f -
node1[1]
node10[0]
node1[15]
node1[14]
node1[2]
node1[6]
node1[17]
node1[9]
node1[26]
node1[13]
node1[31]
node10[3]
node1[10]
node1[21]
node10[2]
node1[28]
node10[6]
node1[22]
node1[25]
node10[1]
node1[24]
node1[27]
node1[19]
node1[16]
node1[3]
node10[5]
node1[29]
node10[4]
node1[0]
node1[4]
node1[8]
node1[30]
node10[7]
node1[11]
node1[18]
node1[5]
node1[12]
node1[7]
node1[23]
node1[20]
Using best effort mode jobs
Best effort job campaign
OAR 2 provides a way to specify that jobs are best effort, which means that the server can delete them if room is needed to fit other jobs. One can submit such jobs using the besteffort type of job.
For instance you can run a job campaign as follows:
for param in $(< ./paramlist); do
oarsub -t besteffort -l core=1 "./my_script.sh $param"
done
In this example, the file ./paramlist
contains a list of parameters for a parametric application.
The following demonstrates the mechanism.
Best effort job mechanism
Running a besteffort job in a first shell
$ oarsub -I -l nodes=10 -t besteffort
[ADMISSION RULE] Automatically redirect in the besteffort queue
[ADMISSION RULE] Automatically add the besteffort constraint on the resources
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
OAR_JOB_ID=988535
Interactive mode : waiting...
Starting...
Connect to OAR job 988535 via the node node11
node11:~$ uniq $OAR_FILE_NODES
node11
node12
node13
node2
node3
node4
node5
node6
node7
node8
Running a non-besteffort job on the same set of resources in a second shell
$ oarsub -I -l {"host in ('node11')"}/nodes=1
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
OAR_JOB_ID=988546
Interactive mode : waiting...
[2018-01-15 13:28:24] Start prediction: 2018-01-15 13:28:24 (FIFO scheduling OK)
Starting...
Connect to OAR job 988546 via the node node11
As expected, meanwhile the best effort job was stopped (watch the first shell):
node11:~$ Connection to node11 closed by remote host.
Connection to node11 closed.
[ERROR] Job was terminated
Disconnected from OAR job 988535
Testing the mechanism of dependency on an anterior job termination
First Job
We run a first interactive job in a first Shell
$ oarsub -I
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
OAR_JOB_ID=988571
Interactive mode : waiting...
Starting...
Connect to OAR job 988571 via the node node1
node1:~$
And leave that job running.
Second Job
Then we run a second job in another shell, with a dependency on the first one:
$ oarsub -I -a 988571
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
OAR_JOB_ID=988572
Interactive mode : waiting...
[2018-01-15 14:27:08] Start prediction: 2018-01-15 15:30:23 (FIFO scheduling OK)
Job dependency in action
We do a logout on the first interactive job…
node1:~$ logout
Connection to node1 closed.
Disconnected from OAR job 988571
… then watch the second Shell and see the second job starting
[2018-01-15 14:27:08] Start prediction: 2018-01-15 15:30:23 (FIFO scheduling OK)
Starting...
Connect to OAR job 988572 via the node node1
Changing the walltime of a running job (oarwalltime)
Starting with OAR version 2.5.8, users can request a change to the walltime (duration of the resource reservation) of a running job. This can be achieved using the oarwalltime
command.
This change can be an increase or a decrease, specified by giving either a new walltime value, an increase value (beginning with +), or a decrease value (beginning with -).
Please note that a request may stay partially or completely unsatisfied if a later job already occupies the resources.
The job must be running for a walltime change. For a waiting job, delete and resubmit it.
Command line interface
Querying the walltime change status:
frontend$ oarwalltime 1743185
Walltime change status for job 1743185 (job is running):
Current walltime: 1:0:0
Possible increase: UNLIMITED
Already granted: 0:0:0
Pending/unsatisfied: 0:0:0
Requesting the walltime change:
frontend$ oarwalltime 1743185 +1:30
Accepted: walltime change request updated for job 1743185, it will be handled shortly.
Querying right afterward:
frontend$ oarwalltime 1743185
Walltime change status for job 1743185 (job is running):
Current walltime: 1:0:0
Possible increase: UNLIMITED
Already granted: 0:0:0
Pending/unsatisfied: +1:30:0
The request is still to be handled by OAR’s scheduler.
Querying again a bit later:
frontend$ oarwalltime 1743185
Walltime change status for job 1743185 (job is running):
Current walltime: 2:30:0
Possible increase: UNLIMITED
Already granted: +1:30:0
Pending/unsatisfied: 0:0:0
If another job exists on the resources and partially prevents the walltime increase, the query output would be:
frontend$ oarwalltime 1743185
Walltime change status for job 1743185 (job is running):
Current walltime: 2:30:0
Possible increase: UNLIMITED
Already granted: +1:10:0
Pending/unsatisfied: +0:20:0
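A decrease can be requested in the same way, with a value prefixed by - (a sketch, using the same job as above; see man oarwalltime for the exact syntax):
frontend$ oarwalltime 1743185 -0:30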
Change events are also reported in oarstat.
See man oarwalltime
for more information.