Getting Started
Note: Information on this page is based on the Getting Started page of Grid’5000.
First connection
Connection to the cluster is done using SSH.
As shown in the figure, the steps to use the cluster are as follows:
- Connect to the frontend machine.
- From this machine, reserve resources, and connect to those resources.
Frontend
Connection to the frontend uses port 12034. To connect using this port, use the following command:
ssh -p 12034 login@cluster.di.fct.unl.pt
Additionally, you can create an alias in your ~/.ssh/config file:
Host dicluster
User LOGIN
HostName cluster.di.fct.unl.pt
Port 12034
This simplifies the command (and avoids having to remember the custom port number) to:
ssh dicluster
Home Directory
Your home directory is shared across all machines in the cluster (including the frontend) through NFS mounts, meaning that changes made on any machine are visible to every other.
You can use scp (or rsync for better performance) to move data between your computer and your home folder in the cluster. For instance, to copy a file to the cluster, use the following command (do not forget the non-default port):
scp -P 12034 mylocalfile.c login@cluster.di.fct.unl.pt:targetdirectory/targetfile.c
Or, if using the previously suggested alias:
scp mylocalfile.c dicluster:targetdirectory/targetfile.c
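If you prefer rsync, an equivalent transfer using the same alias could look as follows (the -avP options, for archive mode with progress output, are one common choice rather than a requirement):
rsync -avP mylocalfile.c dicluster:targetdirectory/targetfile.c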
Note
If you are running experiments that make heavy use of the file system and where performance is critical, we recommend that you avoid reading from or writing to your home directory, as multiple users may be accessing the remote file system simultaneously, which can affect your experiments.
Instead, create a folder in /tmp, use that folder for your experiments and, when finished, copy any important data back to your home folder (or to your computer).
Keep in mind that any data left in /tmp will eventually disappear.
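As a rough sketch of this workflow (the directory and program names below are placeholders for your own experiment):
mkdir -p /tmp/$USER/experiment      # scratch space local to the node
cd /tmp/$USER/experiment
~/myexperiment/run > results.txt    # run the I/O-heavy work here
cp results.txt ~/                   # copy important results back to your home folder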
Data Backup
While your home directory is stored on a NAS with a redundant RAID configuration, we do not have backups of your data. It is your responsibility to save important data outside the cluster.
Using resources
Creating and manipulating reservations is done on the frontend node.
All resource management is handled by OAR, including exclusive access to machines, machine reservations, etc.
Visualizing Resources
In order to discover existing resources and view their status, three pages are available:
- The Technical Description page has information about the hardware and network capabilities of all existing resources.
- The Gantt chart page provides an overview of all current and planned resource reservations.
- Monika also provides information about the state of all nodes, including some more detailed information about running (and planned) reservations.
Reserving with OAR
Note
For a more detailed explanation and use-cases, see Advanced OAR.
While OAR supports reserving resources at the core level, by default entire hosts (physical machines) are reserved.
To start a reservation, the command oarsub is used.
Basic interactive reservations
To reserve a single node, in interactive mode, do:
oarsub -I
To reserve three nodes, in interactive mode, do:
oarsub -l nodes=3 -I
or equivalently:
oarsub -l hosts=3 -I
To reserve a single core in a single host, run:
oarsub -l core=1 -I
After submitting the reservation, and as soon as resources are ready, the job will start, and you will be directly connected to the reserved resource (as indicated by the shell prompt).
To terminate your reservation and return to the frontend, simply exit this shell by typing exit or CTRL+d.
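From the frontend, you can list your reservations (and their job identifiers) with OAR's oarstat command, for example:
oarstat -u $USER
The job identifier shown there is the reservation_id used by the commands in the following sections.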
Passive reservations
Instead of creating an interactive reservation, you can create a reservation and pass it a script using:
oarsub -l nodes=3 "myscript.sh"
In this case, the script will be run on the first node of the reservation, and the reservation will end either when the script finishes, or when the reservation time expires.
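As an illustration, myscript.sh could be as simple as the following sketch, which only records where it ran and which nodes were reserved (remember to make the script executable, for instance with chmod +x myscript.sh):
#!/bin/bash
# Runs on the first node of the reservation
hostname > ~/job-output.txt
uniq $OAR_NODEFILE >> ~/job-output.txt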
The command:
oarsub -C reservation_id
Allows you to access the reservation resources when not using an interactive job.
Additionally, if you want an interactive reservation, but want to avoid accidental termination by closing the shell, you can use:
oarsub "sleep 10d"
And then connect to the job using the previous command.
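Putting these two commands together, a typical sequence could look like this (the job identifier below is illustrative; look for the OAR_JOB_ID value printed by oarsub when the job is submitted):
oarsub "sleep 10d"
oarsub -C 4242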
Reservation duration
If you wish to create a reservation with a different duration than the default one, you can use the walltime parameter. For instance, the command:
oarsub -I -l nodes=2,walltime=2:30
Creates a reservation with the duration of 2 hours and 30 minutes. The format of this parameter is: [hour:min:sec|hour:min|hour]
(walltime=5 => 5 hours, walltime=1:22 => 1 hour 22 minutes, walltime=0:03:30 => 3 minutes, 30 seconds).
Navigating reservation nodes
You can only connect to nodes in your reservation, and only using the oarsh connector to go from one node to another. The connector supports the same options as the classical ssh command, so it can be used as a replacement for software expecting ssh.
The command (inside a reservation):
uniq $OAR_NODEFILE
Will give you a list of all nodes in your reservation.
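For instance, to open a shell on the last node in that list from within your reservation, you could combine it with oarsh as follows:
oarsh $(uniq $OAR_NODEFILE | tail -n 1)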
Advanced usage
For advanced usage of the OAR reservation system, such as creating reservations in advance, changing the expiration date of reservations, fine-grained selection of resources, etc., refer to Advanced OAR.