Slurm Installation from sources
This section shows how to install and set up the Simple Linux Utility for Resource Management (SLURM) from sources.
SLURM Installation from Sources
Install Requirements
Install MUNGE on every node (compute and frontend). For example, on Debian:
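A likely package set (names may vary slightly between Debian releases):
apt-get install munge libmunge2 libmunge-dev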
Configure Munge
Create the munge key
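One option, assuming the Debian munge package provides the create-munge-key helper (a 1024-byte key read from /dev/urandom also works):
/usr/sbin/create-munge-key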
Copy the munge key from the frontend to the compute nodes
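For example, with scp and the guane host naming used later in this guide (adjust the node names and count to your cluster):
for i in $(seq -w 1 16); do scp /etc/munge/munge.key guane$i:/etc/munge/munge.key; done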
Set the permissions, owner, and group of the key on every node
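The key must belong to the munge user and be readable only by it, for example:
chown munge:munge /etc/munge/munge.key
chmod 400 /etc/munge/munge.key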
Start the munge service on all nodes. From the frontend node, execute the following command:
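A sketch assuming pdsh is available on the frontend (otherwise run the init script on each node by hand):
/etc/init.d/munge start
pdsh -w guane[01-16] /etc/init.d/munge start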
Test The Munge Service
From the frontend console execute the following commands:
Frontend
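A minimal local check, assuming munge and unmunge are on the PATH:
munge -n | unmunge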
Nodes
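For example, a loop over the compute nodes via ssh (node count and naming are assumptions based on this cluster):
for i in $(seq -w 1 16); do munge -n | ssh guane$i unmunge | grep STATUS; done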
Where guane is the base name of the compute nodes.
The output should be something like:
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
Download the software
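For example, fetching the 14.11.7 release used later in this guide (the download URL for such an old release may have moved; check the SchedMD site):
cd /usr/local/src
wget https://download.schedmd.com/slurm/slurm-14.11.7.tar.bz2
tar -xjf slurm-14.11.7.tar.bz2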
Compile the software
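A typical build, installing under the /usr/local/slurm prefix and the /usr/local/slurm/etc configuration directory used below:
cd /usr/local/src/slurm-14.11.7
./configure --prefix=/usr/local/slurm --sysconfdir=/usr/local/slurm/etc
make
make install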
Configure SLURM
Create the Database
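A sketch of the MySQL side, using the database name and user that appear in slurmdbd.conf below (replace PASSWORD with your own; on recent MySQL/MariaDB versions the user must be created with CREATE USER before the grant):
mysql -u root -p -e "create database slurmDB;"
mysql -u root -p -e "grant all on slurmDB.* to 'slurm'@'localhost' identified by 'PASSWORD'; flush privileges;"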
Edit the Configuration Files
Create the configuration files in the following directory:
/usr/local/slurm/etc
Please visit https://computing.llnl.gov/linux/slurm/slurm.conf.html for details about the slurm.conf parameters.
slurm.conf:
MpiParams=ports=12000-12999
AuthType=auth/munge
CacheGroups=0
MpiDefault=none
ProctrackType=proctrack/cgroup
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
SlurmdLogFile=/var/log/slurmd.log
SlurmdDebug=7
SlurmctldLogFile=/var/log/slurmctld.log
SlurmctldDebug=7
StateSaveLocation=/var/spool
SwitchType=switch/none
TaskPlugin=task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=10
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_Core,CR_Core_Default_Dist_Block
AccountingStorageHost=localhost
AccountingStorageLoc=slurmDB
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageUser=slurm
AccountingStoreJobComment=YES
AccountingStorageEnforce=associations,limits
ClusterName=guane
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
GresTypes=gpu
NodeName=guane[01-02,04,07,09-16] Procs=24 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=102000 Gres=gpu:8 State=UNKNOWN
NodeName=guane[03,05,06,08] Procs=16 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=102000 Gres=gpu:8 State=UNKNOWN
PartitionName=all Nodes=guane[01-16] MaxTime=INFINITE State=UP Default=YES
PartitionName=manycores16 Nodes=guane[03,05,06,08] MaxTime=INFINITE State=UP
PartitionName=manycores24 Nodes=guane[01-02,04,07,09-16] MaxTime=INFINITE State=UP
gres.conf:
Name=gpu Type=Tesla File=/dev/nvidia0
Name=gpu Type=Tesla File=/dev/nvidia1
Name=gpu Type=Tesla File=/dev/nvidia2
Name=gpu Type=Tesla File=/dev/nvidia3
Name=gpu Type=Tesla File=/dev/nvidia4
Name=gpu Type=Tesla File=/dev/nvidia5
Name=gpu Type=Tesla File=/dev/nvidia6
Name=gpu Type=Tesla File=/dev/nvidia7
slurmdbd.conf:
AuthType=auth/munge
DbdAddr=localhost
DbdHost=localhost
SlurmUser=slurm
DebugLevel=4
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StorageLoc=slurmDB
cgroup.conf:
CgroupAutomount=yes
CgroupReleaseAgentDir="/usr/local/slurm/etc/cgroup"
ConstrainCores=yes
TaskAffinity=yes
ConstrainDevices=yes
AllowedDevicesFile="/usr/local/slurm/etc/allowed_devices.conf"
ConstrainRAMSpace=no
allowed_devices.conf:
/dev/null
/dev/urandom
/dev/zero
/dev/cpu/*/*
/dev/pts/*
slurmdbd.conf:
AuthType=auth/munge
DbdAddr=localhost
DbdHost=localhost
DebugLevel=4
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePass=griduis2o14
StorageUser=slurmacct
StorageLoc=slurmDB
Configure the scripts that manage the resources
cp /usr/local/src/slurm-14.11.7/etc/cgroup.release_common.example /usr/local/slurm/etc/cgroup/cgroup.release_common
ln -s /usr/local/slurm/etc/cgroup/cgroup.release_common /usr/local/slurm/etc/cgroup/release_devices
ln -s /usr/local/slurm/etc/cgroup/cgroup.release_common /usr/local/slurm/etc/cgroup/release_cpuset
ln -s /usr/local/slurm/etc/cgroup/cgroup.release_common /usr/local/slurm/etc/cgroup/release_freezer
Initialize the Services
slurmctld and slurmdbd run on the frontend, and slurmd runs on the compute nodes.
On the frontend execute the following commands:
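For example, assuming the binaries were installed under /usr/local/slurm and pdsh is available to reach the compute nodes:
/usr/local/slurm/sbin/slurmdbd
/usr/local/slurm/sbin/slurmctld
pdsh -w guane[01-16] /usr/local/slurm/sbin/slurmd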
Test SLURM Services
Using the following commands you can check that everything is OK:
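For instance (paths assume the /usr/local/slurm prefix):
/usr/local/slurm/bin/sinfo
/usr/local/slurm/bin/scontrol show nodes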
You can also issue the following command to test SLURM:
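A simple job that should print the hostname of two compute nodes:
/usr/local/slurm/bin/srun -N 2 hostname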
Activate the Accounting
Add the cluster to the database
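Using sacctmgr with the ClusterName defined in slurm.conf:
/usr/local/slurm/bin/sacctmgr add cluster guane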
Add an accounting category
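For example, creating an account (the name research is only an illustration):
/usr/local/slurm/bin/sacctmgr add account research Cluster=guane Description="Research projects" Organization="research"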
Add the users. For example, gilberto. This must be a valid system user (Linux, LDAP, etc.).
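Associating the example user with the account created above (the account name is the illustrative one from the previous step):
/usr/local/slurm/bin/sacctmgr add user gilberto Account=research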

