Getting Started

This page will help a first-time user to start using the cluster.

Note: Information on this page is based on the Getting Started page of Grid’5000.

First connection

Connection to the cluster is done using SSH.

As shown in the figure, the steps to use the cluster are as follows:

  1. Connect to the frontend machine.
  2. From this machine, reserve resources, and connect to those resources.

Frontend

The frontend accepts SSH connections on port 12034. To connect using this port, use the following command:

ssh -p 12034 login@cluster.di.fct.unl.pt

Additionally, you can create an alias in your ~/.ssh/config file (replace LOGIN with your username):

Host dicluster
  User LOGIN
  HostName cluster.di.fct.unl.pt
  Port 12034

This simplifies the command (and avoids having to remember the custom port number) to:

ssh dicluster
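The alias setup above can be scripted. The following is a minimal sketch, not an official tool: the function name add_cluster_alias and its arguments are hypothetical, and it assumes the alias block is exactly the one shown earlier.

```shell
# Hypothetical helper: append the dicluster alias to an SSH config file.
# $1 = your cluster login, $2 = config file (defaults to ~/.ssh/config).
add_cluster_alias() {
    login=$1
    config=${2:-$HOME/.ssh/config}
    # Do nothing if an entry for this alias already exists.
    if [ -f "$config" ] && grep -q '^Host dicluster$' "$config"; then
        return 0
    fi
    mkdir -p "$(dirname "$config")"
    cat >> "$config" <<EOF

Host dicluster
  User $login
  HostName cluster.di.fct.unl.pt
  Port 12034
EOF
    # ssh rejects a config file that is writable by other users.
    chmod 600 "$config"
}
```

Calling add_cluster_alias mylogin once should be enough; calling it again leaves the file unchanged.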

Home Directory

Your home directory is shared across all machines in the cluster (including the frontend) through NFS mounts, so changes made on any machine are visible on every other machine.

You can use scp (or rsync for better performance) to move data between your computer and your home folder in the cluster. For instance, to copy a file to the cluster, use the following command (do not forget the non-default port):

scp -P 12034 mylocalfile.c login@cluster.di.fct.unl.pt:targetdirectory/targetfile.c

Or, if using the previously suggested alias:

scp mylocalfile.c dicluster:targetdirectory/targetfile.c
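Since rsync is mentioned above without an example, here is a minimal sketch of the equivalent transfer. The function name sync_to_cluster and the CLUSTER_LOGIN variable are hypothetical; the rsync options used (-a, -z, -e) are standard, and -e is what passes the non-default port to ssh.

```shell
# Hypothetical helper: copy a local file or directory to the cluster home.
# $1 = local source, $2 = destination path relative to your home directory.
# CLUSTER_LOGIN is an assumed variable holding your username.
sync_to_cluster() {
    src=$1
    dest=$2
    # -a keeps permissions and timestamps, -z compresses data in transit,
    # -e selects the remote shell so that port 12034 is used.
    rsync -az -e "ssh -p 12034" "$src" \
        "${CLUSTER_LOGIN:-login}@cluster.di.fct.unl.pt:$dest"
}
```

With the dicluster alias in place, the explicit port is not needed, because rsync runs ssh underneath and ssh reads ~/.ssh/config: rsync -az mydirectory/ dicluster:targetdirectory/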

Using resources

Reserving and manipulating reservations is done on the frontend node.

All resource management is handled by OAR, including exclusive access to machines, machine reservations, etc.

Visualizing Resources

In order to discover existing resources and view their status, three pages are available:

  • The Technical Description page has information about the hardware and network capabilities of all existing resources.
  • The Gantt chart page provides an overview of all current and planned resource reservations.
  • Monika also provides information about the state of all nodes, including some more detailed information about running (and planned) reservations.

Reserving with OAR

While OAR supports reserving resources at the core level, by default entire hosts (physical machines) are reserved.

To start a reservation, the command oarsub is used.

Basic interactive reservations

To reserve a single node, in interactive mode, do:

oarsub -I

To reserve three nodes, in interactive mode, do:

oarsub -l nodes=3 -I

or equivalently:

oarsub -l hosts=3 -I

To reserve a single core in a single host, run:

oarsub -l core=1 -I

After submitting the reservation, and as soon as resources are ready, the job will start, and you will be directly connected to the reserved resource (as indicated by the shell prompt).

To terminate your reservation and return to the frontend, simply exit this shell by typing exit or CTRL+d.
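Besides looking at the prompt, a script can check whether it is running inside a reservation. The sketch below assumes that the OAR_NODEFILE variable (used later on this page) is only set inside a reservation; the function name in_reservation is hypothetical.

```shell
# Hypothetical check: succeed only when the shell appears to be inside
# an OAR reservation, judged by OAR_NODEFILE pointing to a readable file.
in_reservation() {
    [ -n "${OAR_NODEFILE:-}" ] && [ -r "$OAR_NODEFILE" ]
}

# Example use inside a script:
#   if in_reservation; then echo "on a reserved node"; else echo "on the frontend"; fi
```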

Passive reservations

Alternatively, instead of creating an interactive reservation, you can create a reservation and pass it a script using:

oarsub -l nodes=3 "myscript.sh"

In this case, the script will be run on the first node of the reservation, and the reservation will end either when the script finishes, or when the reservation time expires.

The command:

oarsub -C reservation_id

allows you to access the resources of a reservation that was not started as an interactive job.

Additionally, if you want an interactive reservation, but want to avoid accidental termination by closing the shell, you can use:

oarsub "sleep 10d"

You can then connect to the job using the oarsub -C command shown above.
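To chain a passive submission with oarsub -C, the reservation id has to be captured at submission time. The following sketch assumes that oarsub prints a line of the form OAR_JOB_ID=<number> on submission; check the output on this cluster before relying on it. The function name submit_and_get_id is hypothetical.

```shell
# Hypothetical helper: submit a job and print only its numeric id.
# Assumes oarsub writes a line such as "OAR_JOB_ID=1234" to standard output.
submit_and_get_id() {
    oarsub "$@" | sed -n 's/^OAR_JOB_ID=\([0-9][0-9]*\)$/\1/p'
}

# Example use on the frontend:
#   job_id=$(submit_and_get_id -l nodes=3 "sleep 10d")
#   oarsub -C "$job_id"
```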

Reservation duration

If you wish to create a reservation with a different duration than the default one, you can use the walltime parameter. For instance, the command:

oarsub -I -l nodes=2,walltime=2:30

creates a reservation with a duration of 2 hours and 30 minutes. The format of this parameter is [hour:min:sec|hour:min|hour]. For example, walltime=5 means 5 hours, walltime=1:22 means 1 hour and 22 minutes, and walltime=0:03:30 means 3 minutes and 30 seconds.
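To make the walltime format concrete, the sketch below converts a walltime value to seconds, following the three accepted forms described above. The function name walltime_to_seconds is hypothetical and is not part of OAR.

```shell
# Hypothetical helper: convert hour[:min[:sec]] to a number of seconds.
walltime_to_seconds() {
    # awk splits on ':' and treats missing fields as 0.
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

walltime_to_seconds 2:30      # 2 hours 30 minutes    -> 9000
walltime_to_seconds 5         # 5 hours               -> 18000
walltime_to_seconds 1:22      # 1 hour 22 minutes     -> 4920
walltime_to_seconds 0:03:30   # 3 minutes 30 seconds  -> 210
```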

You can only connect to nodes in your reservation, and only using the oarsh connector to go from one node to the other. The connector supports the same options as the classical ssh command, so it can be used as a replacement for software expecting ssh.

The command (inside a reservation):

uniq $OAR_NODEFILE

gives you a list of all nodes in your reservation.
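Combining the node list with oarsh, a command can be run on every node of the reservation. This is a minimal sketch: the function name run_on_all_nodes is hypothetical, and sort -u is used in place of uniq so that repeated hostnames are collapsed even when they are not adjacent in the file.

```shell
# Hypothetical helper: run the given command on each node of the reservation.
# Must be called from inside a reservation, where OAR_NODEFILE is set.
run_on_all_nodes() {
    # The node file may list the same host several times; keep one entry each.
    for node in $(sort -u "$OAR_NODEFILE"); do
        oarsh "$node" "$@"
    done
}

# Example use inside a reservation:
#   run_on_all_nodes hostname
```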

Advanced usage

For advanced usage of the OAR reservation system, such as creating reservations in advance, changing the expiration date of reservations, or fine-grained selection of resources, refer to Advanced OAR.


Last modified 27.09.2023