At work I manage a few computers owned by some colleagues. They are not tech-savvy, and I like to tinker around, so everyone is happy. Over the last few days, one of my colleagues was a bit upset because his Gaussian calculations always died on his computer, so he had to send them to some other computer clusters that our group has access to, and then wait a long time for the jobs to actually start and carry out the calculations.

After we looked a bit into the issue, the cause turned out to be that he was launching the calculations as background tasks, in the belief that they would not be interrupted if he closed his ssh session. Of course, as soon as he closed the session (or lost network connectivity to the server, which happens way too often here), the session and its background tasks were killed without any kind of error message, which freaked him out.

Once we diagnosed the problem, I told him that I could install a batch queue manager for him, which would take care of executing his calculations without requiring him to keep a session open. It also brings other obvious benefits, such as being able to queue up multiple tasks and have them execute automatically as soon as the required resources become available.

I hesitated a little bit about which batch queue manager to install. Of course, it had to be free and open source. But… What choices did I have?

During my PhD I had installed, configured and managed a couple of Sun Grid Engine instances for our research group, but although there are still some open source forks around, like Son of Grid Engine or Open Grid Engine, as far as I know no further development is being done on this old scheduler. For some time I have also been a user of clusters featuring Torque + MOAB/MAUI, but I never really liked it. And finally, there is Slurm, which is the scheduler used in the majority of the clusters I have access to right now.

As far as I know, Slurm is still actively maintained, it is open source, and it is highly configurable. It looked like the best choice, and although I had never installed it before, I decided to give it a try.

This is, more or less, how it went:


1. Installation

Well, Slurm has been around for some time, and as I expected, there is a package available for Ubuntu 14.04.5, which is the OS that my colleague’s server is running on. This meant that the software installation was as simple as executing:

$ sudo apt-get install slurm-llnl

As I was only going to install the queue manager on a single machine, I only had to do this once. Slurm is meant to run on multiple machines, in order to have a backup master node, several execution nodes, and so on, but we just have this machine available for calculations, so no more nodes :)

2. Configuration

This step was also quite easy: the slurm-llnl package includes a couple of HTML forms to assist in the generation of a slurm.conf file. These two forms are located at

/usr/share/doc/slurm-llnl/slurm-llnl-configurator.easy.html
/usr/share/doc/slurm-llnl/slurm-llnl-configurator.html

In my case the “easy” configurator was enough, as I only intended to set up a very basic system.

EDIT: I have heard that other Linux distributions do not ship these configurator files in their Slurm packages. A similar version can be found online at https://slurm.schedmd.com/configurator.html.

Most of the default settings in the form are ok for a basic setup, but I will still walk you through some of the settings that need to be changed or that are nice to customize.

ControlMachine field

Set it to the name of the machine where you are installing Slurm.
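In case it is not obvious, the value has to match the machine’s short hostname, i.e. what hostname -s returns. With a made-up hostname of calcbox, the corresponding line in the generated slurm.conf would simply be:

ControlMachine=calcbox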

Compute Machines section

This section will result in a configuration line for a single node or an array of similar compute nodes. Its drawback is that the form only allows a single node name or pattern, but more nodes can be added to the configuration file very easily; I will explain how once we have generated our initial configuration file.

Besides that limitation, all of the fields in this section are quite self-explanatory, and, as mentioned in the form’s text, you can get the information required to fill in the “more technical” fields by executing slurmd -C on the compute node you are configuring.
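To give an idea of what to expect, the exact output of slurmd -C varies a bit between Slurm versions, but it should look roughly like this (all values below are made up for illustration, calcbox being my imaginary node name):

NodeName=calcbox CPUs=8 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=16000 TmpDisk=100000 UpTime=5-08:12:34

Everything up to (but not including) the UpTime field is what ends up in the node definition of the configuration file.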

SLURM User field

This is the user under whose privileges the batch queue manager will run. The default slurm user should have been created when we installed the package, so it is fine to leave it as is.

In case you need to change this field (which I do not recommend), please make sure that the user exists on the system, and has enough permissions for all the Slurm files and directories.

State Preservation section

The default values for these are fine, but in case you want to change the paths, make sure that they exist and your Slurm user has the proper permissions over them.
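If you do decide to use custom paths, preparing them boils down to something like the following (the path below is only an illustration; use whatever you put in the form):

$ sudo mkdir -p /var/spool/slurm-llnl
$ sudo chown -R slurm:slurm /var/spool/slurm-llnl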

Scheduling and Interconnect section

You should also be ok with the default values for these sections.

Default MPI Type section

In my case, the default value of “None” is the right one (I will configure just one node, so there will be no communication between nodes), but if you want to configure multiple nodes and need to use any kind of Message Passing Interface to communicate between them, you will need to select the proper option depending on your specific setup.

Process Tracking section

The default in this section is to use the Pgid process ID tracking, but as the text mentions, this might lose track of some processes.

In my case, I preferred to use the cgroup option. If I am not wrong, all the prerequisites to use this option should be installed by default in Ubuntu 14.04, and the only thing to do (root required!) would be to add the text

cgroup_enable=memory swapaccount=1

to the GRUB_CMDLINE_LINUX line in /etc/default/grub, and run update-grub, as recommended in the documentation, to allow Slurm to track memory usage in the submitted jobs.
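To make it concrete, after the edit the line in /etc/default/grub looks something like this (if your line already contains other options, just append the new ones), and the boot configuration is then regenerated and the machine rebooted for the new kernel parameters to take effect:

GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"

$ sudo update-grub
$ sudo reboot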

Also, you should create a /etc/slurm-llnl/cgroup.conf file and give your Slurm user read permission. In my case, the only contents of the file is the line

CgroupAutomount=yes

which allows Slurm to mount the cgroup subsystem if it is not already available. I decided not to include any additional constraints in the configuration, but if I change my mind later on, I can still add them to the cgroup configuration file.
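For reference, creating the file and making sure it is readable can be done in one go like this:

$ echo "CgroupAutomount=yes" | sudo tee /etc/slurm-llnl/cgroup.conf
$ sudo chmod 644 /etc/slurm-llnl/cgroup.conf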

Resource Selection section

This configuration option defines which resources will be used to determine how to fit your jobs into the available compute nodes. In our case, I set it to the CR_CPU_Memory option, to use the available processing cores and memory to limit the number of jobs that can run at the same time in our single-node installation.
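For reference, in the generated slurm.conf this choice translates into these two lines:

SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory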

Task Launch section

To be consistent with the Process Tracking section, I selected the cgroup option to constrain the resources allowed to the jobs.
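Together with the Process Tracking choice above, the relevant lines of the generated slurm.conf end up being:

ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup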

Event Logging section

As with other paths before, the default values should be fine, but if you decide to change them, make sure that the paths exist and that your Slurm user has the proper rights on them.

Job Accounting Gather & Job Accounting Storage

As the only user of the system is also its owner, I saw no point in accounting for resource usage, and left both sections with the default value of None.

Process ID Logging section

Once again, the default values are ok, but if you change them, make sure that the paths exist and that the proper permissions are set.

Once the configuration is complete, we click the “Submit” button at the end of the form, and copy and paste the resulting configuration into /etc/slurm-llnl/slurm.conf to finish the basic configuration. Do not forget to give your Slurm user read permission on it!
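To give an idea of the end result, a stripped-down version of the kind of file the configurator produces is shown below. The hostname, CPU count and memory are made up for illustration, the paths are just examples, and the real file will contain many more (mostly commented-out) options:

# slurm.conf - minimal single-node example (illustrative values only)
ControlMachine=calcbox
SlurmUser=slurm
AuthType=auth/munge
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
StateSaveLocation=/var/spool/slurm-llnl
SlurmdSpoolDir=/var/spool/slurm-llnl/slurmd
# COMPUTE NODES
NodeName=calcbox CPUs=8 RealMemory=16000 State=UNKNOWN
PartitionName=debug Nodes=calcbox Default=YES MaxTime=INFINITE State=UP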

Managing other resources

My colleague’s machine has a GPU built in, and it would be nice if Slurm could also manage this resource and schedule the jobs that require it. To do so, we need to manually edit our configuration file to include the GPU as a Generic Resource:

  1. Edit your /etc/slurm-llnl/slurm.conf, and just before the COMPUTE NODES section, add a new RESOURCES section. Under it, add a line with the following text:
    GresTypes=gpu
    
  2. Add the following text to the configuration line of each compute node that has one or more GPUs, substituting # with the number of GPUs available on that node:
     Gres=gpu:#
    
  3. In each of the nodes whose configuration you altered in the previous step, create a gres.conf file under /etc/slurm-llnl/. Inside the file, and for each GPU that the node has available add a line like this:
    Name=gpu File=/dev/nvidia#
    

    where /dev/nvidia# is the device file associated with the corresponding GPU card. This is needed to keep track of the usage and allocation of the GPU resources by means of cgroup.

  4. Copy the modified /etc/slurm-llnl/slurm.conf to all the nodes in the cluster.
  5. Restart the nodes.
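As an example, for a node with a single NVIDIA card the gres.conf file would contain just this line:

Name=gpu File=/dev/nvidia0

and a job can then request the card through the --gres option of sbatch (the job script name below is, of course, made up):

$ sbatch --gres=gpu:1 my_gpu_job.sh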

Adding more nodes

In case you are installing more than one calculation node, you should:

  1. Install Slurm on the new compute node(s) by issuing sudo apt-get install slurm-llnl
  2. On the compute node(s), execute slurmd -C and copy the first line of the output (everything except the “UpTime” field).
  3. Edit the /etc/slurm-llnl/slurm.conf file on your master node, and paste the line(s) you just copied under the COMPUTE NODES section.
  4. Copy /etc/slurm-llnl/slurm.conf from your master node to the new compute node(s).
  5. Restart Slurm on all machines.
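If I remember correctly, the Ubuntu package ships a single init script named slurm-llnl that starts slurmctld and/or slurmd depending on the role of the machine, so restarting and checking that the nodes are visible should amount to something like this (other distributions may ship separate slurmctld and slurmd services):

$ sudo service slurm-llnl restart
$ sinfo

sinfo should list your partition(s) and report the nodes as idle.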

And that is all, we are ready to enjoy our batch queue manager!
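As a quick sanity check, a trivial job script like the one below can be submitted with sbatch and monitored with squeue. The script and its resource requests are made up; a real Gaussian job would of course request whatever it actually needs:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=1000

# Placeholder workload; replace with the real calculation command
echo "Hello from $(hostname)"
sleep 60

Saving it as, say, test.sh, submitting it with sbatch test.sh and running squeue should show the job queued and then running as soon as the requested resources are free.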

This was the first time I installed a Slurm system, and although it turned out to be quite easy, I may have made some mistakes, either in the process or while writing this post. If you spot any mistake, or can suggest any improvement, please leave me a comment!
