Previous: Job Scheduler Slurm

This section shows how to install and set up the Simple Linux Utility for Resource Management (SLURM) from sources.

SLURM Installation from Sources

Install Requirements

Install munge on every node (compute and frontend). For example, on Debian:

apt-get -y install munge libfreeipmi-dev libhwloc-dev freeipmi libmunge-dev
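
On RPM-based nodes a rough equivalent would be the following (package names assumed from the EPEL repository, adjust to your distribution):

yum -y install munge munge-devel munge-libs freeipmi freeipmi-devel hwloc-devel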


Configure Munge

Create the munge key

/usr/sbin/create-munge-key


Copy the munge key from the frontend to the compute nodes

cpush /etc/munge/munge.key /etc/munge/


Set the owner, group and permissions of the key

cexec chown munge:munge /etc/munge/munge.key
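
munged refuses to start if the key is readable by other users, so it is worth restricting its mode as well (an extra hardening step, sketched here for the frontend and the nodes):

chmod 400 /etc/munge/munge.key
cexec chmod 400 /etc/munge/munge.key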


Start the munge service on all nodes. From the frontend node, execute the following commands:

/etc/init.d/munge start
cexec "/etc/init.d/munge start"


Test The Munge Service

From the frontend console execute the following commands:

Frontend

munge -n


Nodes

for i in 06 07 08 09 10 11 12 13 14 15 16; do munge -n | ssh guane$i unmunge | grep STATUS; done


Where guane is the base name of the compute nodes.

The output should be something like:

STATUS:           Success (0)
STATUS:           Success (0)
STATUS:           Success (0)
STATUS:           Success (0)
STATUS:           Success (0)
STATUS:           Success (0)
STATUS:           Success (0)
STATUS:           Success (0)
STATUS:           Success (0)
STATUS:           Success (0)
STATUS:           Success (0)

Download the software
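
A typical way to fetch the sources, assuming SchedMD's public download archive as the mirror and the /usr/local/src working directory used in the next step:

cd /usr/local/src
wget https://download.schedmd.com/slurm/slurm-16.05.4.tar.bz2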


Compile the software

tar xvjf slurm-16.05.4.tar.bz2
cd /usr/local/src/slurm-16.05.4
./configure --prefix=/usr/local/slurm
make -j25
make install
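
Because SLURM is installed under a non-default prefix, it is convenient to put its binaries on the PATH of the frontend (a bash environment is assumed; bare commands such as slurmctld, scontrol and sinfo below rely on this):

export PATH=/usr/local/slurm/bin:/usr/local/slurm/sbin:$PATH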


Configure SLURM

Create the Database

mysql -u root -p
mysql> GRANT ALL ON slurmDB.* to 'slurm'@'localhost';
mysql> exit
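
If the slurm MySQL user does not exist yet, create it before issuing the grant; a minimal sketch (the password is only a placeholder and must match the StorageUser/StoragePass pair configured in slurmdbd.conf):

mysql> CREATE USER 'slurm'@'localhost' IDENTIFIED BY 'changeme';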


Edit the Configuration Files

In the following directory create the configuration files:

/usr/local/slurm/etc

Please visit https://computing.llnl.gov/linux/slurm/slurm.conf.html for details.
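
If make install did not create that directory, create it first:

mkdir -p /usr/local/slurm/etc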

File: slurm.conf
        MpiParams=ports=12000-12999
        AuthType=auth/munge
        CacheGroups=0
        MpiDefault=none
        ProctrackType=proctrack/cgroup
        ReturnToService=2
        SlurmctldPidFile=/var/run/slurmctld.pid
        SlurmctldPort=6817
        SlurmdPidFile=/var/run/slurmd.pid
        SlurmdPort=6818
        SlurmdSpoolDir=/var/spool/slurmd
        SlurmUser=root
        Slurmdlogfile=/var/log/slurmd.log
        SlurmdDebug=7
        Slurmctldlogfile=/var/log/slurmctld.log
        SlurmctldDebug=7
        StateSaveLocation=/var/spool
        SwitchType=switch/none
        TaskPlugin=task/cgroup
        InactiveLimit=0
        KillWait=30
        MinJobAge=300
        SlurmctldTimeout=120
        SlurmdTimeout=10
        Waittime=0
        FastSchedule=1
        SchedulerType=sched/backfill
        SchedulerPort=7321
        SelectType=select/cons_res
        SelectTypeParameters=CR_Core,CR_Core_Default_Dist_Block
        AccountingStorageHost=localhost
        AccountingStorageLoc=slurmDB
        AccountingStorageType=accounting_storage/slurmdbd
        AccountingStorageUser=slurm
        AccountingStoreJobComment=YES
        AccountingStorageEnforce=associations,limits
        ClusterName=guane
        JobCompType=jobcomp/none
        JobAcctGatherFrequency=30
        JobAcctGatherType=jobacct_gather/none
        GresTypes=gpu
        NodeName=guane[01-02,04,07,09-16] Procs=24 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=102000 Gres=gpu:8 State=UNKNOWN
        NodeName=guane[03,05,06,08] Procs=16 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=102000 Gres=gpu:8 State=UNKNOWN
        PartitionName=all Nodes=guane[01-16]  MaxTime=INFINITE State=UP Default=YES
        PartitionName=manycores16 Nodes=guane[03,05,06,08]  MaxTime=INFINITE State=UP 
        PartitionName=manycores24 Nodes=guane[01-02,04,07,09-16]  MaxTime=INFINITE State=UP
File: gres.conf
        Name=gpu Type=Tesla File=/dev/nvidia0
        Name=gpu Type=Tesla File=/dev/nvidia1
        Name=gpu Type=Tesla File=/dev/nvidia2
        Name=gpu Type=Tesla File=/dev/nvidia3
        Name=gpu Type=Tesla File=/dev/nvidia4
        Name=gpu Type=Tesla File=/dev/nvidia5
        Name=gpu Type=Tesla File=/dev/nvidia6
        Name=gpu Type=Tesla File=/dev/nvidia7
File: slurmdbd.conf
        AuthType=auth/munge
        DbdAddr=localhost
        DbdHost=localhost
        SlurmUser=slurm
        DebugLevel=4
        LogFile=/var/log/slurm/slurmdbd.log
        PidFile=/var/run/slurmdbd.pid
        StorageType=accounting_storage/mysql
        StorageHost=localhost
        StorageUser=slurm
        StorageLoc=slurmDB
File: cgroup.conf
        CgroupAutomount=yes
        CgroupReleaseAgentDir="/usr/local/slurm/etc/cgroup"
        ConstrainCores=yes
        TaskAffinity=yes
        ConstrainDevices=yes
        AllowedDevicesFile="/usr/local/slurm/etc/allowed_devices.conf"
        ConstrainRAMSpace=no
File: allowed_devices.conf
        /dev/null
        /dev/urandom
        /dev/zero
        /dev/cpu/*/*
        /dev/pts/*
File: slurmdbd.conf
        AuthType=auth/munge
        DbdAddr=localhost
        DbdHost=localhost
        DebugLevel=4
        LogFile=/var/log/slurm/slurmdbd.log
        PidFile=/var/run/slurmdbd.pid
        StorageType=accounting_storage/mysql
        StorageHost=localhost
        StoragePass=griduis2o14
        StorageUser=slurmacct
        StorageLoc=slurmDB
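
Unless /usr/local/slurm is already shared with the compute nodes, slurm.conf, gres.conf, cgroup.conf and allowed_devices.conf must also be copied to them; a sketch using the same cpush tool as before:

cpush /usr/local/slurm/etc/slurm.conf /usr/local/slurm/etc/
cpush /usr/local/slurm/etc/gres.conf /usr/local/slurm/etc/
cpush /usr/local/slurm/etc/cgroup.conf /usr/local/slurm/etc/
cpush /usr/local/slurm/etc/allowed_devices.conf /usr/local/slurm/etc/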

Configure the scripts that manage the resources

mkdir /usr/local/slurm/etc/cgroup

cp /usr/local/src/slurm-16.05.4/etc/cgroup.release_common.example /usr/local/slurm/etc/cgroup/cgroup.release_common
ln -s /usr/local/slurm/etc/cgroup/cgroup.release_common /usr/local/slurm/etc/cgroup/release_devices
ln -s /usr/local/slurm/etc/cgroup/cgroup.release_common /usr/local/slurm/etc/cgroup/release_cpuset
ln -s /usr/local/slurm/etc/cgroup/cgroup.release_common /usr/local/slurm/etc/cgroup/release_freezer


Initialize the Services

slurmctld and slurmdbd run on the frontend; slurmd runs on the compute nodes.

On the frontend execute the following commands:

cexec /usr/local/slurm/sbin/slurmd
slurmctld
scontrol update NodeName=guane[1-10] State=RESUME
/usr/local/slurm/sbin/slurmdbd &


Test SLURM Services

Using the following commands you can check that everything is OK:

scontrol show node
sinfo


You can also issue the following command to test SLURM.

srun -N2 /bin/hostname
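
Since the nodes advertise GPUs through gres.conf, a GPU allocation can be tested in the same way (assuming nvidia-smi is installed on the compute nodes):

srun --gres=gpu:1 nvidia-smi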


Activate the Accounting

Add the cluster to the database

sacctmgr add cluster guane


Add an accounting category

sacctmgr add account general Description="General Accounting" Organization=SC3


Add the users, for example gilberto. This must be a valid system user (Linux, LDAP, etc.).

sacctmgr add user gilberto DefaultAccount=general
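
To confirm that the cluster, account and user associations were created, list them with sacctmgr:

sacctmgr show associations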