Slurm Installation on Debian

From Supercomputación y Cálculo Científico UIS

Revision as of 18:54, 6 May 2015



Slurm Installation

This section describes the administration tasks for the Slurm Workload Manager on the frontend node (server) and on the compute nodes (clients).

Slurm Installation on Debian from repositories

  1. Make sure the clocks, users and groups (UIDs and GIDs) are synchronized across the cluster.
  2. Install MUNGE

    apt-get install -y libmunge-dev libmunge2 munge

  3. Generate the MUNGE key. There are various ways to do this, depending on the desired level of key quality. Refer to the MUNGE installation guide for complete details.

    Wait around for some random data (recommended for the paranoid):

    dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key

    Grab some pseudorandom data (recommended for the impatient):

    dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key

    Set ownership and permissions:

    chown munge:munge /etc/munge/munge.key

    chmod 400 /etc/munge/munge.key
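    The key generation and permission commands can be combined into one small bash function. This is a sketch assuming bash and GNU coreutils; the function name make_munge_key is hypothetical, and chown is only attempted when running as root and a munge user exists.

```shell
# Sketch (hypothetical helper): create a MUNGE key with dd and apply
# the ownership/permissions from this step.
# Arguments: key path (default /etc/munge/munge.key) and random source
# (/dev/random for the "paranoid" variant, /dev/urandom by default).
make_munge_key() {
    local key="${1:-/etc/munge/munge.key}"
    local rand="${2:-/dev/urandom}"
    # umask 077 keeps the file private from the moment it is created
    ( umask 077 && dd if="$rand" of="$key" bs=1 count=1024 2>/dev/null ) || return 1
    chmod 400 "$key" || return 1
    # chown needs root and an existing munge user; skip it otherwise
    if [ "$(id -u)" -eq 0 ] && id munge >/dev/null 2>&1; then
        chown munge:munge "$key"
    fi
}
```

    For example, make_munge_key /etc/munge/munge.key /dev/random uses the slower /dev/random variant.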

  4. Edit the file /etc/passwd

    vi /etc/passwd

    Give the munge user the same UID and GID on every machine (the munge group in /etc/group must use the same GID), for example:

    File: /etc/passwd
    munge:x:501:501::/var/run/munge:/sbin/nologin
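    An /etc/passwd entry has seven colon-separated fields (name, password, UID, GID, comment, home directory, shell), so a typo such as a semicolon in place of a colon silently breaks it. The hypothetical function below is a sketch for checking the munge entry on each machine; the expected UID and GID (501 in the example) are passed as arguments.

```shell
# Sketch (hypothetical helper): check one passwd(5) line for the
# expected UID/GID, seven fields and an absolute home directory.
# Usage: check_passwd_entry "$(getent passwd munge)" 501 501
check_passwd_entry() {
    local line="$1" want_uid="$2" want_gid="$3"
    printf '%s\n' "$line" | awk -F: -v u="$want_uid" -v g="$want_gid" '
        NF != 7     { print "expected 7 fields, got " NF; exit 1 }
        $3 != u     { print "UID is " $3 ", expected " u; exit 1 }
        $4 != g     { print "GID is " $4 ", expected " g; exit 1 }
        $6 !~ /^\// { print "home directory is not absolute: " $6; exit 1 }
        { print "ok: " $1 " uid=" $3 " gid=" $4 " home=" $6 " shell=" $7 }'
}
```

    Running it on every machine with the same expected values confirms that the UIDs and GIDs are synchronized, as required in step 1.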

  5. Start MUNGE

    /etc/init.d/munge start

  6. Testing Munge

    The following steps can be performed to verify that the software has been properly installed and configured:

    Generate a credential on stdout:

    munge -n

    Check if a credential can be locally decoded:

    munge -n | unmunge

    Check if a credential can be remotely decoded:

    munge -n | ssh somehost unmunge

    Run a quick benchmark:

    remunge
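    To run the remote decode check against several nodes at once, a loop like the following can be used. It is a sketch that assumes root can reach each host over ssh and that unmunge reports success on a line starting with STATUS:; the function names and host names are placeholders.

```shell
# Sketch (hypothetical helpers): decode a fresh credential on each host.
credential_ok() {
    # Reads unmunge output on stdin; succeeds only on a Success status.
    grep -Eq '^STATUS:[[:space:]]+Success'
}

check_munge_hosts() {
    local host
    for host in "$@"; do
        # If ssh or unmunge fails, grep sees no Success line and we report FAILED
        if munge -n | ssh "$host" unmunge | credential_ok; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

    For example: check_munge_hosts guane-1 guane-2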

  7. Install SLURM from repositories

    apt-get install -y slurm-wlm slurm-wlm-doc

  8. Create and copy slurm.conf

    There are several ways to generate the slurm.conf file. SLURM provides a web-based configuration tool that builds a simple configuration file, which can then be edited by hand for more complex configurations. The tool is located at /usr/share/doc/slurmctld/slurm-wlm-configurator.html.

    Open this file in your browser:

    sftp://ip-server/usr/share/doc/slurmctld/slurm-wlm-configurator.html

    NOTE: Executing the command slurmd -C on each compute node will print its physical configuration (sockets, cores, real memory size, etc.), which can be used in constructing the slurm.conf file.

    Copy the output of the web-based configuration tool to /etc/slurm/slurm.conf and edit it so that it resembles the following. This is only an example; build a configuration file customized for your environment (see http://slurm.schedmd.com/slurm.conf.html).

    File: /etc/slurm/slurm.conf
    #
    # slurm.conf file generated by configurator.html.
    #
    # See the slurm.conf man page for more information.
    #
    ClusterName=GUANE
    ControlMachine=guane
    #
    SlurmUser=slurm
    SlurmctldPort=6817
    SlurmdPort=6818
    AuthType=auth/munge
    StateSaveLocation=/tmp
    SlurmdSpoolDir=/var/spool/slurm/slurmd
    SwitchType=switch/none
    MpiDefault=none
    SlurmctldPidFile=/var/run/slurmctld.pid
    SlurmdPidFile=/var/run/slurmd.pid
    ProctrackType=proctrack/pgid
    CacheGroups=0
    ReturnToService=1
    #
    # TIMERS
    SlurmctldTimeout=300
    SlurmdTimeout=300
    InactiveLimit=0
    MinJobAge=300
    KillWait=30
    Waittime=0
    #
    # SCHEDULING
    SchedulerType=sched/backfill
    SelectType=select/linear
    FastSchedule=1
    #
    # LOGGING
    SlurmctldDebug=3
    SlurmdDebug=3
    JobCompType=jobcomp/none
    JobCompLoc=/tmp/slurm_job_completion.txt
    #
    # ACCOUNTING
    JobAcctGatherType=jobacct_gather/linux
    JobAcctGatherFrequency=30
    #
    #AccountingStorageType=accounting_storage/slurmdbd
    #AccountingStorageHost=slurm
    #AccountingStorageLoc=/tmp/slurm_job_accounting.txt
    #AccountingStoragePass=
    #AccountingStorageUser=
    #
    # COMPUTE NODES
    # control node
    NodeName=guane NodeAddr=192.168.1.70 Port=17000 State=UNKNOWN
    
    # compute nodes; a distinct Port per NodeName is only required when several slurmd daemons share one host
    # name guane-[*] is arbitrary
    NodeName=guane-1 NodeAddr=192.168.1.71 Port=17002 State=UNKNOWN
    NodeName=guane-2 NodeAddr=192.168.1.72 Port=17003 State=UNKNOWN
    
    # PARTITIONS
    # partition name is arbitrary
    PartitionName=guane Nodes=guane-[1-2] Default=YES MaxTime=8-00:00:00 State=UP
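    Rather than typing node definitions by hand, the output of slurmd -C mentioned in the note above can be reused. The sketch below assumes that output contains a line starting with NodeName= and appends the NodeAddr and State fields used in the example; the function name is hypothetical and the address must be supplied by you, since slurmd -C does not print it.

```shell
# Sketch (hypothetical helper): turn the NodeName= line from `slurmd -C`
# into a slurm.conf node definition. Exits non-zero if no such line.
node_line_from_slurmd_c() {
    local addr="$1"
    if [ -z "$addr" ]; then
        echo "usage: slurmd -C | node_line_from_slurmd_c NODE_ADDR" >&2
        return 2
    fi
    awk -v addr="$addr" '
        /^NodeName=/ { print $0 " NodeAddr=" addr " State=UNKNOWN"; found = 1; exit }
        END          { exit (found ? 0 : 1) }'
}
```

    For example, ssh guane-1 slurmd -C | node_line_from_slurmd_c 192.168.1.71 prints a line to review before adding it to /etc/slurm/slurm.conf.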
    

  9. Install MUNGE on each node of the cluster

    apt-get install -y libmunge-dev libmunge2 munge

  10. Copy the munge.key file from the server to each node of the cluster

    scp /etc/munge/munge.key root@node:/etc/munge/munge.key
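    With many nodes the copy can be looped. Copying as root may leave the key with the wrong owner or mode on the node, so this sketch also resets them (as in step 3) and restarts MUNGE there. The function name, the KEY and DRY_RUN variables and the node names are hypothetical; DRY_RUN=1 only prints the commands.

```shell
# Sketch (hypothetical helper): copy the MUNGE key to each node, then
# fix ownership/mode and restart munge remotely.
distribute_munge_key() {
    local key="${KEY:-/etc/munge/munge.key}" run="" node
    if [ -n "$DRY_RUN" ]; then
        run="echo"   # print the commands instead of running them
    fi
    for node in "$@"; do
        $run scp -p "$key" "root@$node:$key" || return 1
        $run ssh "root@$node" "chown munge:munge $key && chmod 400 $key && /etc/init.d/munge restart" || return 1
    done
}
```

    For example, DRY_RUN=1 distribute_munge_key guane-1 guane-2 shows what would run; drop DRY_RUN to execute.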

  11. Install SLURM compute node daemon

    apt-get install -y slurmd

  12. Start SLURM on the compute nodes (slurmd) and on the server (slurmctld)

    /etc/init.d/slurmd start

    /etc/init.d/slurmctld start
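    Once both daemons are running, a quick health check is to list the node states and run a trivial job. The sketch below reads the output of sinfo -h -N -o '%N %t' (one node and its compact state per line) and flags nodes that are not idle, allocated or mixed; treating only those three states as healthy is an assumption, and the function name is hypothetical.

```shell
# Sketch (hypothetical helper): report nodes whose state looks unusable.
# Input: "NODE STATE" lines, e.g. from: sinfo -h -N -o '%N %t'
report_node_states() {
    awk '$2 !~ /^(idle|alloc|mix)$/ { print "check node " $1 ": state " $2; bad = 1 }
         END { exit (bad ? 1 : 0) }'
}
```

    For example: sinfo -h -N -o '%N %t' | report_node_states && srun -N2 hostname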


Slurm Installation on Debian from source


CONTROLLER CONFIGURATION

http://wildflower.diablonet.net/~scaron/slurmsetup.html

  1. Prerequisites

    apt-get install -y build-essential

  2. Install MUNGE

    apt-get install -y libmunge-dev libmunge2 munge

  3. Generate the MUNGE key. There are various ways to do this, depending on the desired level of key quality. Refer to the MUNGE installation guide for complete details.

    Wait around for some random data (recommended for the paranoid):

    dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key

    Grab some pseudorandom data (recommended for the impatient):

    dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key

    Set ownership and permissions:

    chown munge:munge /etc/munge/munge.key

    chmod 400 /etc/munge/munge.key

  4. Edit the file /etc/passwd

    vi /etc/passwd

    Give the munge user the same UID and GID on every machine (the munge group in /etc/group must use the same GID), for example:

    File: /etc/passwd
    munge:x:501:501::/var/run/munge:/sbin/nologin

  5. Start MUNGE

    /etc/init.d/munge start

  6. Testing Munge

    The following steps can be performed to verify that the software has been properly installed and configured:

    Generate a credential on stdout:

    munge -n

    Check if a credential can be locally decoded:

    munge -n | unmunge

    Check if a credential can be remotely decoded:

    munge -n | ssh somehost unmunge

    Run a quick benchmark:

    remunge

  7. Install MySQL server (for SLURM accounting) and development tools (to build SLURM). We'll also install the BLCR tools so that SLURM can use BLCR's checkpoint-and-restart functionality.

    apt-get install mysql-server libmysqlclient-dev libmysqld-dev libmysqld-pic

    apt-get install gcc bison make flex libncurses5-dev tcsh pkg-config

    apt-get install blcr-util blcr-testsuite libcr-dbg libcr-dev libcr0

  8. Download the latest version (http://www.schedmd.com/#repos).

    wget http://www.schedmd.com/download/latest/slurm-14.11.6.tar.bz2

  9. Unpack and build SLURM

    tar xvf slurm-14.11.6.tar.bz2

    cd slurm-14.11.6

    ./configure --enable-multiple-slurmd

    make

    make install

  10. There are several ways to generate the slurm.conf file. SLURM includes a web-based configuration tool that builds a simple configuration file, which can then be edited by hand for more complex configurations. The tool is located in doc/html/configurator.html in the source tree; to open it in a browser, copy it to /usr/share/doc/slurm/ (sftp://ip-server/usr/share/doc/slurm/configurator.html).

    mkdir /usr/share/doc/slurm/

    cd [slurm-src]/doc/html/

    cp configurator.* /usr/share/doc/slurm/

  11. Alternatively, copy the example configuration files to /etc/slurm.

    mkdir /etc/slurm

    cd [slurm-src]

    cp etc/slurm.conf.example /etc/slurm/slurm.conf

    cp etc/slurmdbd.conf.example /etc/slurm/slurmdbd.conf
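    Copying the examples unconditionally overwrites any configuration already in /etc/slurm. A hypothetical helper that skips files which already exist could look like this; it is a sketch in which [slurm-src], the unpacked source directory, is passed as the first argument.

```shell
# Sketch (hypothetical helper): copy etc/*.conf.example from the source
# tree into DEST (default /etc/slurm), keeping files that already exist.
install_example_configs() {
    local src="$1" dest="${2:-/etc/slurm}" f
    if [ -z "$src" ]; then
        echo "usage: install_example_configs SLURM_SRC [DEST]" >&2
        return 2
    fi
    mkdir -p "$dest" || return 1
    for f in slurm.conf slurmdbd.conf; do
        if [ -e "$dest/$f" ]; then
            echo "keeping existing $dest/$f"
        else
            cp "$src/etc/$f.example" "$dest/$f" || return 1
        fi
    done
}
```

    For example: install_example_configs ~/slurm-14.11.6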

    Executing the command slurmd -C on each compute node will print its physical configuration (sockets, cores, real memory size, etc.), which can be used in constructing the slurm.conf file.

  12. Set things up for slurmdbd (the SLURM accounting daemon) in MySQL

    mysql -u root -p

    create database slurm_db;
    create user 'slurm'@'localhost';
    set password for 'slurm'@'localhost' = password('MyPassword');
    grant usage on *.* to 'slurm'@'localhost';
    grant all privileges on slurm_db.* to 'slurm'@'localhost';
    flush privileges;
    quit
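    slurmdbd then needs a slurmdbd.conf that points at this database and user. The following is a minimal sketch assuming slurmdbd runs on the same host as MySQL. AuthType, DbdHost, SlurmUser, StorageType, StorageHost, StorageUser, StoragePass and StorageLoc are slurmdbd.conf parameters, but the values shown and the function name are illustrative. Because the file holds the database password, it is written with mode 600 and should also be owned by the SlurmUser.

```shell
# Sketch (hypothetical helper): write a minimal slurmdbd.conf for the
# slurm_db database and 'slurm'@'localhost' MySQL user created above.
write_slurmdbd_conf() {
    local dest="$1" pass="$2"
    if [ -z "$dest" ] || [ -z "$pass" ]; then
        echo "usage: write_slurmdbd_conf DEST PASSWORD" >&2
        return 2
    fi
    ( umask 077 && printf '%s\n' \
        "AuthType=auth/munge" \
        "DbdHost=localhost" \
        "SlurmUser=slurm" \
        "StorageType=accounting_storage/mysql" \
        "StorageHost=localhost" \
        "StorageUser=slurm" \
        "StoragePass=$pass" \
        "StorageLoc=slurm_db" > "$dest" ) || return 1
    # An existing file keeps its old mode, so enforce 600 explicitly
    chmod 600 "$dest"
}
```

    For example, write_slurmdbd_conf /etc/slurm/slurmdbd.conf 'MyPassword' followed by chown slurm /etc/slurm/slurmdbd.conf. For slurmctld to use it, slurm.conf would set AccountingStorageType=accounting_storage/slurmdbd.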



COMPUTE NODE CONFIGURATION


  1. Install MUNGE on each node of the cluster

    apt-get install -y libmunge-dev libmunge2 munge

  2. Copy the munge.key file from the server to each node of the cluster

    scp /etc/munge/munge.key root@node:/etc/munge/munge.key