Slurm Installation

From Supercomputación y Cálculo Científico UIS

In this section we describe the administration tasks for the Slurm Workload Manager on the frontend node (server) and on the compute nodes (clients).

Slurm Installation on Debian

  1. Make sure the clocks, users and groups (UIDs and GIDs) are synchronized across the cluster (a way to check this is sketched after this list).
  2. Install MUNGE

    apt-get install -y libmunge-dev libmunge2 munge

  3. Generate a MUNGE key. There are various ways to do this, depending on the desired level of key quality (on Debian, /usr/sbin/create-munge-key is one of them); refer to the MUNGE installation guide for complete details.

    Wait around for some random data (recommended for the paranoid):

    dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key

    Grab some pseudorandom data (recommended for the impatient):

    dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key

    Set the key's ownership and permissions:

    chown munge:munge /etc/munge/munge.key
    chmod 400 /etc/munge/munge.key

  4. Edit file /etc/passwd

    vi /etc/passwd

    Modify the munge user's entry on each machine:

    File: /etc/passwd
    munge:x:501:501::/var/run/munge:/sbin/nologin

  5. Start MUNGE

    /etc/init.d/munge start

  6. Test MUNGE

    The following steps can be performed to verify that the software has been properly installed and configured:

    Generate a credential on stdout:

    munge -n

    Check if a credential can be locally decoded:

    munge -n | unmunge

    Check if a credential can be remotely decoded:

    munge -n | ssh somehost unmunge

    Run a quick benchmark:

    remunge

  7. Install SLURM from repositories

    apt-get install -y slurm-wlm slurm-wlm-doc

  8. Create and copy slurm.conf

    There are several ways to generate the slurm.conf file. Slurm provides a web-based configuration tool which can be used to build a simple configuration file, which can then be manually edited for more complex configurations. On Debian the tool is installed at /usr/share/doc/slurmctld/slurm-wlm-configurator.html

    Open this file in your browser (ip-server is the address of the machine where slurmctld will run):

    sftp://ip-server/usr/share/doc/slurmctld/slurm-wlm-configurator.html

    NOTE: Executing the command slurmd -C on each compute node will print its physical configuration (sockets, cores, real memory size, etc.), which can be used in constructing the slurm.conf file.

    Copy the output of the web-based configuration tool to /etc/slurm/slurm.conf on every machine in the cluster and adjust it so that it looks like the following (this is only an example - build a configuration file customized for your environment):

    File: /etc/slurm/slurm.conf
    #
    # slurm.conf file generated by configurator.html.
    #
    # See the slurm.conf man page for more information.
    #
    ClusterName=GUANE
    ControlMachine=guane
    #
    SlurmUser=slurm
    SlurmctldPort=6817
    SlurmdPort=6818
    AuthType=auth/munge
    StateSaveLocation=/tmp
    SlurmdSpoolDir=/var/spool/slurm/slurmd
    SwitchType=switch/none
    MpiDefault=none
    SlurmctldPidFile=/var/run/slurmctld.pid
    SlurmdPidFile=/var/run/slurmd.pid
    ProctrackType=proctrack/pgid
    CacheGroups=0
    ReturnToService=1
    #
    # TIMERS
    SlurmctldTimeout=300
    SlurmdTimeout=300
    InactiveLimit=0
    MinJobAge=300
    KillWait=30
    Waittime=0
    #
    # SCHEDULING
    SchedulerType=sched/backfill
    SelectType=select/linear
    FastSchedule=1
    #
    # LOGGING
    SlurmctldDebug=3
    SlurmdDebug=3
    JobCompType=jobcomp/none
    JobCompLoc=/tmp/slurm_job_completion.txt
    #
    # ACCOUNTING
    JobAcctGatherType=jobacct_gather/linux
    JobAcctGatherFrequency=30
    #
    #AccountingStorageType=accounting_storage/slurmdbd
    #AccountingStorageHost=slurm
    #AccountingStorageLoc=/tmp/slurm_job_accounting.txt
    #AccountingStoragePass=
    #AccountingStorageUser=
    #
    # COMPUTE NODES
    # control node
    NodeName=guane NodeAddr=192.168.1.70 Port=17000 State=UNKNOWN
    
    # each logical node is on the same physical node, so we need different ports for them
    # name guane-[*] is arbitrary
    NodeName=guane-1 NodeAddr=192.168.1.71 Port=17002 State=UNKNOWN
    NodeName=guane-2 NodeAddr=192.168.1.72 Port=17003 State=UNKNOWN
    
    # PARTITIONS
    # partition name is arbitrary
    PartitionName=guane Nodes=guane-[1-2] Default=YES MaxTime=8-00:00:00 State=UP
    

  9. Copy munge.key to each compute node - scp /etc/munge/munge.key root@node:/etc/munge/munge.key (a loop over all nodes is sketched after this list)
  10. Start Slurm - /etc/init.d/slurmctld start on the server machine and /etc/init.d/slurmd start on the compute nodes (a quick verification is sketched after this list)
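
The next two sketches expand on steps 1, 9 and 10. First, a minimal sketch of how the checks in step 1 and the key distribution in step 9 might be scripted from the controller; it assumes root SSH access and borrows the node names guane-1 and guane-2 from the example slurm.conf above, so adapt it to your own host names.

    # Assumed node names, taken from the example slurm.conf; adjust for your cluster.
    NODES="guane-1 guane-2"

    # Step 1: install NTP (here and on every node) and confirm that the munge and
    # slurm UIDs/GIDs match everywhere.
    apt-get install -y ntp
    for n in $NODES; do
        ssh root@$n "date; getent passwd munge slurm"
    done

    # Step 9: distribute the MUNGE key, fix its permissions and restart munge on each node.
    for n in $NODES; do
        scp /etc/munge/munge.key root@$n:/etc/munge/munge.key
        ssh root@$n "chown munge:munge /etc/munge/munge.key && chmod 400 /etc/munge/munge.key && /etc/init.d/munge restart"
    done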
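
Second, a sketch of the verification for step 10: once MUNGE is running everywhere, start the daemons and check that the controller can see and use the nodes. sinfo and srun are standard Slurm client commands; the partition name guane comes from the example configuration above.

    # On the controller:
    /etc/init.d/slurmctld start

    # On each compute node:
    /etc/init.d/slurmd start

    # Back on the controller: list node states, then run a trivial job on both nodes.
    sinfo
    srun -N2 -p guane hostname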


Slurm Installation on Debian (from source)

Reference: http://wildflower.diablonet.net/~scaron/slurmsetup.html

CONTROLLER CONFIGURATION

Prerequisites

apt-get install -y build-essential

1. Install MUNGE

apt-get install -y libmunge-dev libmunge2 munge

2. Generate a MUNGE key. There are various ways to do this, depending on the desired level of key quality. Refer to the MUNGE installation guide for complete details.

Wait around for some random data (recommended for the paranoid):

dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key

Grab some pseudorandom data (recommended for the impatient):

dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key

Set the key's ownership and permissions:

chown munge:munge /etc/munge/munge.key
chmod 400 /etc/munge/munge.key

3. Start MUNGE.

/etc/init.d/munge start

4. Test MUNGE

The following steps can be performed to verify that the software has been properly installed and configured:

   Generate a credential on stdout: 
   $ munge -n 
   Check if a credential can be locally decoded: 
   $ munge -n | unmunge 
   Check if a credential can be remotely decoded: 
   $ munge -n | ssh somehost unmunge 
   Run a quick benchmark: 
   $ remunge 
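
For reference, a successful decode is reported by unmunge with a Success status. The sketch below shows only a few of the fields, with values borrowed from the example cluster above; host names, IDs and the remaining fields will differ on your system.

   $ munge -n | unmunge
   STATUS:           Success (0)
   ENCODE_HOST:      guane (192.168.1.70)
   UID:              root (0)
   GID:              root (0)
   LENGTH:           0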

5. Install MySQL server (for SLURM accounting) and development tools (to build SLURM). We'll also install the BLCR tools so that SLURM can take advantage of that checkpoint-and-restart functionality.

apt-get install mysql-server libmysqlclient-dev libmysqld-dev libmysqld-pic
apt-get install gcc bison make flex libncurses5-dev tcsh pkg-config
apt-get install blcr-util blcr-testsuite libcr-dbg libcr-dev libcr0

6. Download the latest version (http://www.schedmd.com/#repos).

wget http://www.schedmd.com/download/latest/slurm-14.11.6.tar.bz2

7. Unpack and build SLURM.

tar xvf slurm-14.11.6.tar.bz2
cd slurm-14.11.6
./configure --enable-multiple-slurmd
make
make install

8. There are several ways to generate the slurm.conf file. Slurm ships a web-based configuration tool which can be used to build a simple configuration file, which can then be manually edited for more complex configurations. In the source tree the tool is located in doc/html/configurator.html; it can be opened in a browser after being copied to /usr/share/ (sftp://ip-server/usr/share/doc/slurm/configurator.html).

mkdir /usr/share/doc/slurm/
cd [slurm-src]/doc/html/
cp configurator.* /usr/share/doc/slurm/

Another way is to copy the example configuration files to /etc/slurm:

mkdir /etc/slurm
cd [slurm-src]
cp etc/slurm.conf.example /etc/slurm/slurm.conf
cp etc/slurmdbd.conf.example /etc/slurm/slurmdbd.conf

Executing the command slurmd -C on each compute node will print its physical configuration (sockets, cores, real memory size, etc.), which can be used in constructing the slurm.conf file.
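
For illustration only, on a node like the ones in the earlier example (24 cores across 2 sockets) the output of slurmd -C might look roughly like the line below; the exact fields and values depend on the hardware and the Slurm version, so treat it as a placeholder rather than real output. The printed node definition can then be pasted into slurm.conf.

NodeName=guane-1 CPUs=24 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=96000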

9. Set things up for slurmdbd (the SLURM accounting daemon) in MySQL.

mysql -u root -p

create database slurm_db;
create user 'slurm'@'localhost';
set password for 'slurm'@'localhost' = password('MyPassword');
grant usage on *.* to 'slurm'@'localhost';
grant all privileges on slurm_db.* to 'slurm'@'localhost';
flush privileges;
quit
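
To make slurmdbd use this database, the slurmdbd.conf copied in step 8 needs matching storage settings. The following is a minimal sketch consistent with the database name, user and password created above; the host names, log and PID file locations are assumptions to adapt for your site.

File: /etc/slurm/slurmdbd.conf
AuthType=auth/munge
DbdHost=localhost
SlurmUser=slurm
DebugLevel=3
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=MyPassword
StorageLoc=slurm_db

With accounting stored through slurmdbd, also uncomment AccountingStorageType=accounting_storage/slurmdbd (and set AccountingStorageHost) in slurm.conf, as hinted in the commented ACCOUNTING block of the example configuration, and start slurmdbd before slurmctld.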