Slurm Installation
In this section we describe the administration tasks for the Slurm Workload Manager on the frontend node (server) and on the compute nodes (clients).
Slurm Installation on Debian from repositories
- Make sure the clocks, users and groups (UIDs and GIDs) are synchronized across the cluster (a quick way to check this is sketched after this list).
- Install MUNGE:
apt-get install -y libmunge-dev libmunge2 munge
- Generate the MUNGE key. There are various ways to do this, depending on the desired level of key quality. Refer to the MUNGE installation guide for complete details.
Wait around for some random data (recommended for the paranoid):
dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key
Grab some pseudorandom data (recommended for the impatient):
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
Set ownership and permissions:
chown munge:munge /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
- Edit the file /etc/passwd:
vi /etc/passwd
Modify the munge user on each machine:
File: /etc/passwd
munge:x:501:501::/var/run/munge:/sbin/nologin
- Start MUNGE:
/etc/init.d/munge start
- Test MUNGE. The following steps can be performed to verify that the software has been properly installed and configured:
Generate a credential on stdout:
munge -n
Check if a credential can be locally decoded:
munge -n | unmunge
Check if a credential can be remotely decoded:
munge -n | ssh somehost unmunge
Run a quick benchmark:
remunge
- Install SLURM from the repositories:
apt-get install -y slurm-wlm slurm-wlm-doc
- Create and copy slurm.conf.
There are several ways to generate the slurm.conf file. Slurm ships a web-based configuration tool which can be used to build a simple configuration file that can then be manually edited for more complex configurations. The tool is located at /usr/share/doc/slurmctld/slurm-wlm-configurator.html
Open this file in your browser:
sftp://ip-server/usr/share/doc/slurmctld/slurm-wlm-configurator.html
NOTE: Executing the command slurmd -C on each compute node will print its physical configuration (sockets, cores, real memory size, etc.), which can be used in constructing the slurm.conf file.
Copy the result from the web-based configuration tool into /etc/slurm/slurm.conf and adjust it so that it looks like the following (this is an example; build a configuration file customized for your environment, see http://slurm.schedmd.com/slurm.conf.html):
File: /etc/slurm/slurm.conf
#
# slurm.conf file generated by configurator.html.
#
# See the slurm.conf man page for more information.
#
ClusterName=GUANE
ControlMachine=guane
#
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/tmp
SlurmdSpoolDir=/var/spool/slurm/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/pgid
CacheGroups=0
ReturnToService=1
#
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/linear
FastSchedule=1
#
# LOGGING
SlurmctldDebug=3
SlurmdDebug=3
JobCompType=jobcomp/none
JobCompLoc=/tmp/slurm_job_completion.txt
#
# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
#
#AccountingStorageType=accounting_storage/slurmdbd
#AccountingStorageHost=slurm
#AccountingStorageLoc=/tmp/slurm_job_accounting.txt
#AccountingStoragePass=
#AccountingStorageUser=
#
# COMPUTE NODES
# control node
NodeName=guane NodeAddr=192.168.1.70 Port=17000 State=UNKNOWN
# each logical node is on the same physical node, so we need different ports for them
# name guane-[*] is arbitrary
NodeName=guane-1 NodeAddr=192.168.1.71 Port=17002 State=UNKNOWN
NodeName=guane-2 NodeAddr=192.168.1.72 Port=17003 State=UNKNOWN
# PARTITIONS
# partition name is arbitrary
PartitionName=guane Nodes=guane-[1-2] Default=YES MaxTime=8-00:00:00 State=UP
- Install MUNGE on each node of the cluster:
apt-get install -y libmunge-dev libmunge2 munge
- Copy the munge.key file from the server to each node of the cluster:
scp /etc/munge/munge.key root@node:/etc/munge/munge.key
- Install the SLURM compute node daemon:
apt-get install -y slurmd
- Start Slurm on the nodes and on the server (a minimal sanity check is sketched after this list):
/etc/init.d/slurmd start
/etc/init.d/slurmctld start
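A quick way to perform the check from the first step, assuming the nodes are reachable over SSH (the hostnames node1 and node2 are placeholders for your own machines, not part of the original setup):

for host in node1 node2; do ssh $host date; done
for host in node1 node2; do ssh $host 'getent passwd | sort | md5sum'; done

The clocks printed by date should agree to within a few seconds (running NTP across the cluster takes care of this), and identical checksums of the sorted user database indicate that UIDs and GIDs match on every machine.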
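Once slurmd and slurmctld are running, a minimal sanity check for the last step (the node and partition names come from the example slurm.conf above; adjust the node count in -N2 to your cluster):

sinfo
scontrol show nodes
srun -N2 hostname

sinfo should report the guane partition with its nodes in the idle state, and srun should print the hostname of each allocated node.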
Slurm Installation on Debian from source
http://wildflower.diablonet.net/~scaron/slurmsetup.html
CONTROLLER CONFIGURATION
Prerequisites
apt-get install -y build-essential
1. Install MUNGE
apt-get install -y libmunge-dev libmunge2 munge
2. Generate the MUNGE key. There are various ways to do this, depending on the desired level of key quality. Refer to the MUNGE installation guide for complete details.
Wait around for some random data (recommended for the paranoid):
dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key
Grab some pseudorandom data (recommended for the impatient):
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
Set ownership and permissions:
chown munge:munge /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
3. Start MUNGE.
/etc/init.d/munge start
4. Test MUNGE
The following steps can be performed to verify that the software has been properly installed and configured:
Generate a credential on stdout:
$ munge -n
Check if a credential can be locally decoded:
$ munge -n | unmunge
Check if a credential can be remotely decoded:
$ munge -n | ssh somehost unmunge
Run a quick benchmark:
$ remunge
5. Install the MySQL server (for SLURM accounting) and development tools (to build SLURM). We'll also install the BLCR tools so that SLURM can take advantage of its checkpoint-and-restart functionality.
apt-get install mysql-server libmysqlclient-dev libmysqld-dev libmysqld-pic
apt-get install gcc bison make flex libncurses5-dev tcsh pkg-config
apt-get install blcr-util blcr-testsuite libcr-dbg libcr-dev libcr0
6. Download the latest version (http://www.schedmd.com/#repos).
wget http://www.schedmd.com/download/latest/slurm-14.11.6.tar.bz2
7. Unpack and build SLURM.
tar xvf slurm-14.11.6.tar.bz2
cd slurm-14.11.6
./configure --enable-multiple-slurmd
make
make install
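The Debian packages create a slurm system account automatically, but a source build does not, and the example slurm.conf above runs the daemons as SlurmUser=slurm. A minimal sketch for creating the account and the spool directory that config expects (UID/GID 992 is an arbitrary unused value chosen here for illustration; keep it identical on every machine):

groupadd -g 992 slurm
useradd -u 992 -g slurm -d /var/spool/slurm -s /sbin/nologin slurm
mkdir -p /var/spool/slurm/slurmd

You may also need to run ldconfig after make install so the Slurm libraries installed under /usr/local/lib are found.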
8. There are several ways to generate the slurm.conf file. Slurm ships a web-based configuration tool which can be used to build a simple configuration file that can then be manually edited for more complex configurations. The tool is located at doc/html/configurator.html in the source tree; it can be opened in a browser once copied under /usr/share/ (sftp://ip-server/usr/share/doc/slurm/configurator.html).
mkdir /usr/share/doc/slurm/
cd [slurm-src]/doc/html/
cp configurator.* /usr/share/doc/slurm/
Another way is to copy the example configuration files to /etc/slurm:
mkdir /etc/slurm
cd [slurm-src]
cp etc/slurm.conf.example /etc/slurm/slurm.conf
cp etc/slurmdbd.conf.example /etc/slurm/slurmdbd.conf
Executing the command slurmd -C on each compute node will print its physical configuration (sockets, cores, real memory size, etc.), which can be used in constructing the slurm.conf file.
9. Set things up for slurmdbd (the SLURM accounting daemon) in MySQL.
mysql -u root -p
create database slurm_db;
create user 'slurm'@'localhost';
set password for 'slurm'@'localhost' = password('MyPassword');
grant usage on *.* to 'slurm'@'localhost';
grant all privileges on slurm_db.* to 'slurm'@'localhost';
flush privileges;
quit
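With the database in place, slurmdbd needs matching connection settings in /etc/slurm/slurmdbd.conf. A minimal sketch, assuming slurmdbd runs on the same host as MySQL; the storage values mirror the SQL above:

File: /etc/slurm/slurmdbd.conf
# Authentication used between the Slurm daemons
AuthType=auth/munge
# Host running slurmdbd and the account it runs as
DbdHost=localhost
SlurmUser=slurm
# MySQL connection, matching the database and user created above
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=MyPassword
StorageLoc=slurm_db

To have the controller actually use it, set AccountingStorageType=accounting_storage/slurmdbd in slurm.conf (the example above ships that line commented out).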