Difference between revisions of "Slurm Installation on Debian"
Revision as of 18:50, 6 May 2015
Slurm Installation
This section describes the administration tasks for the Slurm Workload Manager on the frontend node (server) and on the compute nodes (clients).
Slurm Installation on Debian from repositories
- Make sure the clocks, users and groups (UIDs and GIDs) are synchronized across the cluster.
- Install MUNGE
    apt-get install -y libmunge-dev libmunge2 munge
- Generate the MUNGE key. There are various ways to do this, depending on the desired level of key quality. Refer to the MUNGE installation guide for complete details.
  Wait around for some random data (recommended for the paranoid):
    dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key
  Grab some pseudorandom data (recommended for the impatient):
    dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
  Set the owner and permissions:
    chown munge:munge /etc/munge/munge.key
    chmod 400 /etc/munge/munge.key
- Edit the file /etc/passwd
    vi /etc/passwd
  Modify the munge user entry on each machine so that it is the same everywhere:
  File: /etc/passwd
    munge:x:501:501::/var/run/munge:/sbin/nologin
- Start MUNGE
    /etc/init.d/munge start
- Testing MUNGE
  The following steps can be performed to verify that the software has been properly installed and configured:
  Generate a credential on stdout:
    munge -n
  Check if a credential can be locally decoded:
    munge -n | unmunge
  Check if a credential can be remotely decoded:
    munge -n | ssh somehost unmunge
  Run a quick benchmark:
    remunge
- Install SLURM from repositories
    apt-get install -y slurm-wlm slurm-wlm-doc
- Create and copy slurm.conf
  There are several ways to generate the slurm.conf file. Slurm includes a web-based configuration tool that builds a simple configuration file, which can then be edited by hand for more complex configurations. The tool is located at /usr/share/doc/slurmctld/slurm-wlm-configurator.html
  Open this file in your browser:
    sftp://ip-server/usr/share/doc/slurmctld/slurm-wlm-configurator.html
  NOTE: Executing the command slurmd -C on each compute node will print its physical configuration (sockets, cores, real memory size, etc.), which can be used in constructing the slurm.conf file.
  Copy the result from the web-based configuration tool into /etc/slurm/slurm.conf and adjust it so that it looks like the following. This is only an example; build a configuration file customized for your environment (see http://slurm.schedmd.com/slurm.conf.html).
  File: /etc/slurm/slurm.conf
    #
    # slurm.conf file generated by configurator.html.
    #
    # See the slurm.conf man page for more information.
    #
    ClusterName=GUANE
    ControlMachine=guane
    #
    SlurmUser=slurm
    SlurmctldPort=6817
    SlurmdPort=6818
    AuthType=auth/munge
    StateSaveLocation=/tmp
    SlurmdSpoolDir=/var/spool/slurm/slurmd
    SwitchType=switch/none
    MpiDefault=none
    SlurmctldPidFile=/var/run/slurmctld.pid
    SlurmdPidFile=/var/run/slurmd.pid
    ProctrackType=proctrack/pgid
    CacheGroups=0
    ReturnToService=1
    #
    # TIMERS
    SlurmctldTimeout=300
    SlurmdTimeout=300
    InactiveLimit=0
    MinJobAge=300
    KillWait=30
    Waittime=0
    #
    # SCHEDULING
    SchedulerType=sched/backfill
    SelectType=select/linear
    FastSchedule=1
    #
    # LOGGING
    SlurmctldDebug=3
    SlurmdDebug=3
    JobCompType=jobcomp/none
    JobCompLoc=/tmp/slurm_job_completion.txt
    #
    # ACCOUNTING
    JobAcctGatherType=jobacct_gather/linux
    JobAcctGatherFrequency=30
    #
    #AccountingStorageType=accounting_storage/slurmdbd
    #AccountingStorageHost=slurm
    #AccountingStorageLoc=/tmp/slurm_job_accounting.txt
    #AccountingStoragePass=
    #AccountingStorageUser=
    #
    # COMPUTE NODES
    # control node
    NodeName=guane NodeAddr=192.168.1.70 Port=17000 State=UNKNOWN
    # each logical node is on the same physical node, so we need different ports for them
    # name guane-[*] is arbitrary
    NodeName=guane-1 NodeAddr=192.168.1.71 Port=17002 State=UNKNOWN
    NodeName=guane-2 NodeAddr=192.168.1.72 Port=17003 State=UNKNOWN
    # PARTITIONS
    # partition name is arbitrary
    PartitionName=guane Nodes=guane-[1-2] Default=YES MaxTime=8-00:00:00 State=UP
- Install MUNGE on each node of the cluster
    apt-get install -y libmunge-dev libmunge2 munge
- Copy the munge.key file from the server to each node of the cluster
    scp /etc/munge/munge.key root@node:/etc/munge/munge.key
- Install the SLURM compute node daemon
    apt-get install -y slurmd
- Start Slurm on the nodes and on the server
    /etc/init.d/slurmd start
    /etc/init.d/slurmctld start
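Before starting the daemons, it can help to check the node lines in slurm.conf. The example configuration gives every logical node its own port because they share one physical machine, so a repeated port is worth catching early. The following is a minimal sketch, not part of Slurm: the helper names list_nodes and dup_ports are hypothetical, and the demo runs on a scratch copy of the node lines from the example rather than on /etc/slurm/slurm.conf.

```shell
#!/bin/sh
# Sketch only: list_nodes and dup_ports are hypothetical helpers, not Slurm tools.

# Print "name port" for every NodeName line in a slurm.conf-style file.
list_nodes() {
    awk '/^NodeName=/ {
        name = ""; port = ""
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == "NodeName") name = kv[2]
            if (kv[1] == "Port")     port = kv[2]
        }
        print name, port
    }' "$1"
}

# Print every port number that is assigned to more than one node.
dup_ports() {
    list_nodes "$1" | awk '{ print $2 }' | sort | uniq -d
}

# Demo on a scratch file holding the node lines from the example configuration.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# control node
NodeName=guane NodeAddr=192.168.1.70 Port=17000 State=UNKNOWN
NodeName=guane-1 NodeAddr=192.168.1.71 Port=17002 State=UNKNOWN
NodeName=guane-2 NodeAddr=192.168.1.72 Port=17003 State=UNKNOWN
EOF

list_nodes "$conf"
dup_ports "$conf"
```

For the example lines, dup_ports prints nothing because all three ports differ. To check a real file, pass its path instead of the scratch file, for instance list_nodes /etc/slurm/slurm.conf.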
Slurm Installation on Debian from source
CONTROLLER CONFIGURATION
http://wildflower.diablonet.net/~scaron/slurmsetup.html
- Prerequisites
    apt-get install -y build-essential
- Install MUNGE
    apt-get install -y libmunge-dev libmunge2 munge
- Generate the MUNGE key. There are various ways to do this, depending on the desired level of key quality. Refer to the MUNGE installation guide for complete details.
  Wait around for some random data (recommended for the paranoid):
    dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key
  Grab some pseudorandom data (recommended for the impatient):
    dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
  Set the owner and permissions:
    chown munge:munge /etc/munge/munge.key
    chmod 400 /etc/munge/munge.key
- Edit the file /etc/passwd
    vi /etc/passwd
  Modify the munge user entry on each machine so that it is the same everywhere:
  File: /etc/passwd
    munge:x:501:501::/var/run/munge:/sbin/nologin
- Start MUNGE
    /etc/init.d/munge start
- Testing MUNGE
  The following steps can be performed to verify that the software has been properly installed and configured:
  Generate a credential on stdout:
    munge -n
  Check if a credential can be locally decoded:
    munge -n | unmunge
  Check if a credential can be remotely decoded:
    munge -n | ssh somehost unmunge
  Run a quick benchmark:
    remunge
- Install MySQL server (for SLURM accounting) and development tools (to build SLURM). We'll also install the BLCR tools so that SLURM can take advantage of their checkpoint-and-restart functionality.
    apt-get install mysql-server libmysqlclient-dev libmysqld-dev libmysqld-pic
    apt-get install gcc bison make flex libncurses5-dev tcsh pkg-config
    apt-get install blcr-util blcr-testsuite libcr-dbg libcr-dev libcr0
- Download the latest version (http://www.schedmd.com/#repos)
    wget http://www.schedmd.com/download/latest/slurm-14.11.6.tar.bz2
- Unpack and build SLURM
    tar xvf slurm-14.11.6.tar.bz2
    cd slurm-14.11.6
    ./configure --enable-multiple-slurmd
    make
    make install
- Generate the slurm.conf file. There are several ways to do this. Slurm includes a web-based configuration tool that builds a simple configuration file, which can then be edited by hand for more complex configurations. The tool is located in doc/html/configurator.html in the source tree; it can be opened in a browser once it has been copied to /usr/share/doc/slurm/ (sftp://ip-server/usr/share/doc/slurm/configurator.html).
    mkdir /usr/share/doc/slurm/
    cd [slurm-src]/doc/html/
    cp configurator.* /usr/share/doc/slurm/
  The other way is to copy the example configuration files to /etc/slurm.
    mkdir /etc/slurm
    cd [slurm-src]
    cp etc/slurm.conf.example /etc/slurm/slurm.conf
    cp etc/slurmdbd.conf.example /etc/slurm/slurmdbd.conf
  Executing the command slurmd -C on each compute node will print its physical configuration (sockets, cores, real memory size, etc.), which can be used in constructing the slurm.conf file.
- Set things up for slurmdbd (the SLURM accounting daemon) in MySQL
    mysql -u root -p
    create database slurm_db;
    create user 'slurm'@'localhost';
    set password for 'slurm'@'localhost' = password('MyPassword');
    grant usage on *.* to 'slurm'@'localhost';
    grant all privileges on slurm_db.* to 'slurm'@'localhost';
    flush privileges;
    quit
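The MySQL statements for slurmdbd can also be collected in a file and passed to the client in one go, which makes the setup easier to repeat. The following is a minimal sketch under stated assumptions: the variable names DB_PASS and sql_file are illustrative, the password is the placeholder from the step above, and the script only writes and prints the file; the final mysql call is left as a comment.

```shell
#!/bin/sh
# Sketch only: DB_PASS and sql_file are illustrative names, not Slurm settings.
DB_PASS='MyPassword'    # placeholder from the step above; replace it
sql_file=$(mktemp)

# Same statements as the interactive session, written to one file.
cat > "$sql_file" <<EOF
create database slurm_db;
create user 'slurm'@'localhost';
set password for 'slurm'@'localhost' = password('$DB_PASS');
grant usage on *.* to 'slurm'@'localhost';
grant all privileges on slurm_db.* to 'slurm'@'localhost';
flush privileges;
EOF

# Show what would be sent to the server.
cat "$sql_file"

# To apply it (prompts for the MySQL root password):
#   mysql -u root -p < "$sql_file"
```

The statements are copied unchanged from the step above. Newer MySQL releases may reject the password() form, so check the documentation for the installed version before applying the file.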
COMPUTE NODE CONFIGURATION
- Install MUNGE on each node of the cluster
    apt-get install -y libmunge-dev libmunge2 munge
- Copy the munge.key file from the server to each node of the cluster
    scp /etc/munge/munge.key root@node:/etc/munge/munge.key
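With more than a couple of nodes, the copy step is usually repeated in a loop. The following is a minimal sketch: push_key_cmds is a hypothetical helper, and the node names guane-1 and guane-2 are taken from the example slurm.conf earlier on this page. It only prints the scp commands so they can be reviewed before anything is copied.

```shell
#!/bin/sh
# Sketch only: push_key_cmds is a hypothetical helper; node names are examples.

# Print one scp command per node name given as an argument.
push_key_cmds() {
    for node in "$@"; do
        printf 'scp /etc/munge/munge.key root@%s:/etc/munge/munge.key\n' "$node"
    done
}

push_key_cmds guane-1 guane-2
```

Once the printed commands look right, they can be run by piping the output to sh. The key on each node needs the same owner and mode (munge:munge, 400) that were set on the server.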