Slurm Installation on Debian

From Supercomputación y Cálculo Científico UIS
Revision as of 16:11, 5 May 2015 by Ltorres



Slurm Installation

This section describes the administration tasks for the Slurm Workload Manager on the frontend node (server) and on the compute nodes (clients).


Slurm Installation on Debian (from source)

CONTROLLER CONFIGURATION

Prerequisites

apt-get install -y build-essential

1. Install MUNGE

apt-get install -y libmunge-dev libmunge2 munge

2. Generate MUNGE key. There are various ways to do this, depending on the desired level of key quality. Refer to the MUNGE installation guide for complete details.

Wait around for some random data (recommended for the paranoid):

dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key

Grab some pseudorandom data (recommended for the impatient):

dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key

Then restrict access to the key:

chown munge:munge /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
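
The key-creation commands above can be combined into one short script. This is a minimal sketch, not part of the original procedure: MUNGE_KEY is a hypothetical variable that defaults to a scratch file so the script can be tried without root. On a real controller it would point at /etc/munge/munge.key.

```shell
#!/bin/sh
# MUNGE_KEY is an illustrative variable, not something MUNGE itself reads.
MUNGE_KEY="${MUNGE_KEY:-$(mktemp -d)/munge.key}"

# Create the file with restrictive permissions from the start.
umask 077

# 1024 bytes of pseudorandom data (the "impatient" variant).
dd if=/dev/urandom of="$MUNGE_KEY" bs=1 count=1024 2>/dev/null

# Hand the key to the munge account only when running as root
# and when that account exists on this machine.
if [ "$(id -u)" -eq 0 ] && id munge >/dev/null 2>&1; then
    chown munge:munge "$MUNGE_KEY"
fi
chmod 400 "$MUNGE_KEY"

echo "key written to $MUNGE_KEY"
```

Setting the umask before writing avoids a short window in which the key would be readable by other users.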

3. Start MUNGE.

/etc/init.d/munge start

4. Test MUNGE.

The following steps can be performed to verify that the software has been properly installed and configured:

   Generate a credential on stdout: 
   $ munge -n 
   Check if a credential can be locally decoded: 
   $ munge -n | unmunge 
   Check if a credential can be remotely decoded: 
   $ munge -n | ssh somehost unmunge 
   Run a quick benchmark: 
   $ remunge 
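
On a cluster with many nodes, the remote decode check above can be repeated in a loop. The following is a sketch under assumptions: check_munge is a hypothetical helper name, and the host names passed to it are placeholders for the real compute nodes.

```shell
#!/bin/sh
# Hypothetical helper: encode a credential locally and try to decode it
# on every host given as an argument. Returns non-zero if any host fails.
check_munge() {
    failed=0
    for host in "$@"; do
        if munge -n | ssh "$host" unmunge >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

It could be called as check_munge node01 node02. A failure usually means that the key differs between the machines or that the clocks are too far apart.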

5. Install MySQL server (for SLURM accounting) and development tools (to build SLURM). We'll also install the BLCR tools so that SLURM can take advantage of that checkpoint-and-restart functionality.

apt-get install mysql-server libmysqlclient-dev libmysqld-dev libmysqld-pic
apt-get install gcc bison make flex libncurses5-dev tcsh pkg-config
apt-get install blcr-util blcr-testsuite libcr-dbg libcr-dev libcr0

6. Download the latest version (http://www.schedmd.com/#repos).

wget http://www.schedmd.com/download/latest/slurm-14.11.6.tar.bz2

7. Unpack and build SLURM.

tar xvf slurm-14.11.6.tar.bz2
cd slurm-14.11.6
./configure --enable-multiple-slurmd
make
make install
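
The build can also be wrapped in a script that stops at the first failing step. This is only a sketch: SLURM_VERSION, DRY_RUN, run and build_slurm are hypothetical names, and the --sysconfdir=/etc/slurm option is an assumption added here, not part of the original command. A build configured without it normally looks for slurm.conf under the installation prefix rather than in /etc/slurm.

```shell
#!/bin/sh
# Illustrative settings; DRY_RUN=1 (the default) only prints the commands.
SLURM_VERSION="${SLURM_VERSION:-14.11.6}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Each step runs only if the previous one succeeded.
build_slurm() {
    run tar xvf "slurm-$SLURM_VERSION.tar.bz2" &&
    run cd "slurm-$SLURM_VERSION" &&
    run ./configure --enable-multiple-slurmd --sysconfdir=/etc/slurm &&
    run make &&
    run make install
}

build_slurm
```

Running it with DRY_RUN=0 in the directory that holds the tarball performs the actual build.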

8. There are several ways to generate the slurm.conf file. Slurm ships a web-based configuration tool that builds a simple configuration file, which can then be edited by hand for more complex configurations. The tool is located at doc/html/configurator.html in the source tree; it can be opened in a browser once it has been copied to /usr/share/doc/slurm/ (sftp://ip-server/usr/share/doc/slurm/configurator.html).

mkdir /usr/share/doc/slurm/
cd [slurm-src]/doc/html/
cp configurator.* /usr/share/doc/slurm/

The other way is to copy the example configuration files to /etc/slurm.

mkdir /etc/slurm
cd [slurm-src]
cp etc/slurm.conf.example /etc/slurm/slurm.conf
cp etc/slurmdbd.conf.example /etc/slurm/slurmdbd.conf

Executing the command slurmd -C on each compute node will print its physical configuration (sockets, cores, real memory size, etc.), which can be used in constructing the slurm.conf file.
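
The per-node output can be gathered from the controller instead of logging in to each node. The sketch below assumes that slurmd -C emits a line starting with NodeName= and that passwordless ssh to the nodes is available; node_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the NodeName=... line reported by
# "slurmd -C" on every host given as an argument.
node_lines() {
    for host in "$@"; do
        # -n keeps ssh from reading the caller's standard input.
        ssh -n "$host" slurmd -C | grep '^NodeName='
    done
}
```

For example, node_lines node01 node02 prints one line per node, which can be reviewed and then pasted into the node section of slurm.conf.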

9. Set things up for slurmdbd (the SLURM accounting daemon) in MySQL.

mysql -u root -p
create database slurm_db;
create user 'slurm'@'localhost';
set password for 'slurm'@'localhost' = password('MyPassword');
grant usage on *.* to 'slurm'@'localhost';
grant all privileges on slurm_db.* to 'slurm'@'localhost';
flush privileges;
quit
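
The database, user and password created above have to match the storage settings in slurmdbd.conf. The following sketch writes a matching fragment; SLURMDBD_FRAGMENT is a hypothetical variable that defaults to a scratch file, and the fragment is meant to be merged by hand into /etc/slurm/slurmdbd.conf rather than to replace it.

```shell
#!/bin/sh
# SLURMDBD_FRAGMENT is an illustrative name; it defaults to a scratch file.
SLURMDBD_FRAGMENT="${SLURMDBD_FRAGMENT:-$(mktemp)}"

# Storage settings that mirror the MySQL commands of step 9.
cat > "$SLURMDBD_FRAGMENT" <<'EOF'
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=MyPassword
StorageLoc=slurm_db
EOF

# The file contains a password, so keep it private.
chmod 600 "$SLURMDBD_FRAGMENT"
echo "fragment written to $SLURMDBD_FRAGMENT"
```

MyPassword is the placeholder used in the SQL statements and should be replaced in both places.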



Slurm Installation on Debian (from repositories)

  1. Make sure the clocks, users and groups (UIDs and GIDs) are synchronized across the cluster.

  2. Install Slurm from the repositories: apt-get install slurm-llnl or apt-get install slurm-wlm
  3. Open the local file sftp://ip-node/usr/share/doc/slurmctld/slurm-wlm-configurator.html in a browser, where ip-node is the IP address of the slurmctld machine, and fill out the form. Copy the generated slurm.conf file to /etc/slurm.conf on each machine.
  4. There are two ways to generate munge.key: run /usr/sbin/create-munge-key, or use dd if=/dev/random bs=1 count=1024 >/etc/munge/munge.key (recommended for the paranoid) or dd if=/dev/urandom bs=1 count=1024 >/etc/munge/munge.key (recommended for the impatient)
  5. Copy munge.key to the nodes: scp /etc/munge/munge.key root@nodes:/etc/munge/munge.key
  6. Edit the file /etc/passwd on each machine and set the munge entry to munge:x:501:501::/var/run/munge:/sbin/nologin
  7. Test MUNGE on the local machine: munge -n | unmunge
  8. Test MUNGE against the nodes: munge -n | ssh host1 unmunge
  9. Start Slurm: /etc/init.d/slurmctld start on the server machine and /etc/init.d/slurmd start on the node machines
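
Steps 1 and 6 require the munge account to have the same UID and GID on every machine. The sketch below is one way to check this from the server. collect_ids and check_ids are hypothetical helper names, and passwordless ssh to the nodes is assumed.

```shell
#!/bin/sh
# Hypothetical helper: print "host uid:gid" for the munge account on each host.
collect_ids() {
    for host in "$@"; do
        printf '%s %s\n' "$host" \
            "$(ssh -n "$host" 'echo "$(id -u munge):$(id -g munge)"')"
    done
}

# Hypothetical helper: read "host uid:gid" lines and report every host
# whose value differs from the first line. Exit status 1 on any mismatch.
check_ids() {
    awk 'NR == 1 { ref = $2; next }
         $2 != ref { print $1 " has " $2 ", expected " ref; bad = 1 }
         END { exit bad + 0 }'
}
```

A typical call would be collect_ids server node01 node02 | check_ids, where the first host listed serves as the reference.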