Previous: Job Scheduler Slurm
This section shows how to install and set up the Simple Linux Utility for Resource Management (SLURM) from sources.
== SLURM Installation from Sources ==
=== Install Requirements ===
Install munge on every node (compute and frontend).
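For example, on Debian (a sketch assuming the stock package names):
{{Command|<nowiki>apt-get install munge libmunge-dev</nowiki>}}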
=== Configure Munge ===
Create the munge key
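For example, with the create-munge-key helper shipped with the munge package (an assumption; writing bytes from /dev/urandom to /etc/munge/munge.key works as well):
{{Command|<nowiki>/usr/sbin/create-munge-key</nowiki>}}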
Copy the munge key from the frontend to the compute nodes.
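A sketch assuming the C3 cpush tool, which accompanies the cexec used elsewhere in this guide; plain scp to each node works as well:
{{Command|<nowiki>cpush /etc/munge/munge.key /etc/munge/munge.key</nowiki>}}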
Set the permissions, user, and group of the key.
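The key must be owned by the munge user and readable only by it. For example, locally and on the nodes (the cexec usage is an assumption based on the rest of this guide):
{{Command|<nowiki>chown munge:munge /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
cexec chown munge:munge /etc/munge/munge.key
cexec chmod 400 /etc/munge/munge.key</nowiki>}}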
Start the munge service on all nodes. From the frontend node, execute the following commands:
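A sketch assuming SysV init scripts (as on Debian at the time) and cexec for the compute nodes:
{{Command|<nowiki>/etc/init.d/munge start
cexec /etc/init.d/munge start</nowiki>}}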
=== Test The Munge Service ===
From the frontend console execute the following commands:
Frontend
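The standard munge self-test encodes a credential and decodes it locally (a sketch; the grep keeps only the status line):
{{Command|<nowiki>munge -n | unmunge | grep STATUS</nowiki>}}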
Nodes
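A credential generated on the frontend should decode on every node. A sketch assuming ssh access to the nodes:
{{Command|<nowiki>for i in 01 02 03 04 05 06 07 08 09 10; do munge -n | ssh guane$i unmunge | grep STATUS; done</nowiki>}}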
Where guane is the base name of the compute nodes.
The output should be something like:
<pre>
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
STATUS: Success (0)
</pre>
=== Download the software ===
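For example (a sketch; the exact download URL on schedmd.com may have changed, and the version matches the source tree used later in this guide):
{{Command|<nowiki>cd /usr/local/src
wget http://www.schedmd.com/download/latest/slurm-14.11.7.tar.bz2</nowiki>}}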
=== Compile the software ===
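A minimal build sketch, using the /usr/local/slurm prefix assumed throughout this guide:
{{Command|<nowiki>cd /usr/local/src
tar xjf slurm-14.11.7.tar.bz2
cd slurm-14.11.7
./configure --prefix=/usr/local/slurm
make
make install</nowiki>}}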
=== Configure SLURM ===
==== Create the Database ====
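A sketch of the MySQL setup implied by the slurmdbd.conf files below (database slurmDB and storage users slurm and slurmacct; the password is the one that appears there):
{{Command|<nowiki>mysql -u root -p -e "create database slurmDB; grant all on slurmDB.* to 'slurm'@'localhost'; grant all on slurmDB.* to 'slurmacct'@'localhost' identified by 'griduis2o14';"</nowiki>}}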
==== Edit the Configuration Files ====
In the following directory create the configuration files:
/usr/local/slurm/etc
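Create the directory first if it does not exist:
{{Command|<nowiki>mkdir -p /usr/local/slurm/etc</nowiki>}}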
Please visit https://computing.llnl.gov/linux/slurm/slurm.conf.html for details.
'''slurm.conf:'''
<pre>
MpiParams=ports=12000-12999
AuthType=auth/munge
CacheGroups=0
MpiDefault=none
ProctrackType=proctrack/cgroup
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
SlurmdLogFile=/var/log/slurmd.log
SlurmdDebug=7
SlurmctldLogFile=/var/log/slurmctld.log
SlurmctldDebug=7
StateSaveLocation=/var/spool
SwitchType=switch/none
TaskPlugin=task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=10
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_Core,CR_Core_Default_Dist_Block
AccountingStorageHost=localhost
AccountingStorageLoc=slurmDB
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageUser=slurm
AccountingStoreJobComment=YES
AccountingStorageEnforce=associations,limits
ClusterName=guane
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
GresTypes=gpu
NodeName=guane[01-02,04,07,09-16] Procs=24 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=102000 Gres=gpu:8 State=UNKNOWN
NodeName=guane[03,05,06,08] Procs=16 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=102000 Gres=gpu:8 State=UNKNOWN
PartitionName=all Nodes=guane[01-16] MaxTime=INFINITE State=UP Default=YES
PartitionName=manycores16 Nodes=guane[03,05,06,08] MaxTime=INFINITE State=UP
PartitionName=manycores24 Nodes=guane[01-02,04,07,09-16] MaxTime=INFINITE State=UP
</pre>
'''gres.conf:'''
<pre>
Name=gpu Type=Tesla File=/dev/nvidia0
Name=gpu Type=Tesla File=/dev/nvidia1
Name=gpu Type=Tesla File=/dev/nvidia2
Name=gpu Type=Tesla File=/dev/nvidia3
Name=gpu Type=Tesla File=/dev/nvidia4
Name=gpu Type=Tesla File=/dev/nvidia5
Name=gpu Type=Tesla File=/dev/nvidia6
Name=gpu Type=Tesla File=/dev/nvidia7
</pre>
'''slurmdbd.conf:'''
<pre>
AuthType=auth/munge
DbdAddr=localhost
DbdHost=localhost
SlurmUser=slurm
DebugLevel=4
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StorageLoc=slurmDB
</pre>
'''cgroup.conf:'''
<pre>
CgroupAutomount=yes
CgroupReleaseAgentDir="/usr/local/slurm/etc/cgroup"
ConstrainCores=yes
TaskAffinity=yes
ConstrainDevices=yes
AllowedDevicesFile="/usr/local/slurm/etc/allowed_devices.conf"
ConstrainRAMSpace=no
</pre>
'''allowed_devices.conf:'''
<pre>
/dev/null
/dev/urandom
/dev/zero
/dev/cpu/*/*
/dev/pts/*
</pre>
'''slurmdbd.conf (variant with explicit storage credentials):'''
<pre>
AuthType=auth/munge
DbdAddr=localhost
DbdHost=localhost
DebugLevel=4
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePass=griduis2o14
StorageUser=slurmacct
StorageLoc=slurmDB
</pre>
==== Configure the scripts that manage the resources ====
{{Command|<nowiki>cp /usr/local/src/slurm-14.11.7/etc/cgroup.release_common.example /usr/local/slurm/etc/cgroup/cgroup.release_common
ln -s /usr/local/slurm/etc/cgroup/cgroup.release_common /usr/local/slurm/etc/cgroup/release_devices
ln -s /usr/local/slurm/etc/cgroup/cgroup.release_common /usr/local/slurm/etc/cgroup/release_cpuset
ln -s /usr/local/slurm/etc/cgroup/cgroup.release_common /usr/local/slurm/etc/cgroup/release_freezer</nowiki>}}
=== Initialize the Services ===
slurmctld and slurmdbd run on the frontend, and slurmd runs on the compute nodes.
On the frontend execute the following commands:
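{{Command|<nowiki>cexec /usr/local/slurm/sbin/slurmd
slurmctld
scontrol update NodeName=guane[1-10] State=RESUME
/usr/local/slurm/sbin/slurmdbd &</nowiki>}}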
=== Test SLURM Services ===
Using the following commands, you can check that everything is OK:
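{{Command|scontrol show node
sinfo}}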
You can also issue the following command to test SLURM:
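{{Command|srun -N2 /bin/hostname}}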
=== Activate the Accounting ===
Add the cluster to the database
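{{Command|sacctmgr add cluster guane}}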
Add an accounting category
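{{Command|<nowiki>sacctmgr add account general Description="General Accounting" Organization=SC3</nowiki>}}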
Add the users. For example, gilberto. This must be a valid (Linux, LDAP, etc.) user.
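{{Command|<nowiki>sacctmgr add user gilberto DefaultAccount=general</nowiki>}}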