
Phoenix Job priorities


Contents


  • 1 Introduction
  • 2 How the FairShare algorithm works – a test example
  • 3 Diagnostic tools – rcquota, rcshare and rcstat
  • 4 My job seems to take forever to run. What is happening?
  • 5 I used to wait a few hours before my job started running, but now it takes a few days!

Introduction

The SLURM job scheduler dynamically organises the queue in order to maximise the computational workload and thus the efficiency of the supercomputer. When you submit a job, it is placed in the queue and its priority mainly depends on the FairShare factor. The FairShare algorithm ensures that all project groups/schools can use their quarterly quotas. The following section works through an example to help understand how the FairShare algorithm operates.

 

How the FairShare algorithm works – a test example

In this example, the supercomputer resources are shared among four groups: ACDC, Metallica, KISS and Credence Clearwater Revival (CCR). Each group is allocated a given percentage of the total resources (see Table 1). The FairShare algorithm must ensure that the two pie charts shown in Figure 1 stay very similar, that is, the fraction of the spent allocation consumed by each group equals its allocated share.

Table 1: Example of the resources shared among four schools/groups and the current resources spent by each

School/Group | Granted [SU] | Share of total grant [%] | Quota used [%] | Spent [SU] | Share of total usage [%]
ACDC | 100,000 | 10 % | 50 % | 50,000 | 7.9 %
Metallica | 300,000 | 30 % | 75 % | 225,000 | 35.7 %
KISS | 250,000 | 25 % | 100 % | 250,000 | 39.7 %
CCR | 350,000 | 35 % | 30 % | 105,000 | 16.7 %
Total | 1,000,000 | 100 % | | 630,000 | 100 %

 

As shown in Figure 1, the KISS group has clearly run most of the jobs (~40 % of the resources spent so far, compared to its expected 25 % share). As a result, the priority of its jobs will be considerably reduced. We also note from Table 1 that the group has reached its quarterly quota. Although jobs can still be submitted, they will receive low priorities.

Figure 1: Pie charts illustrating the expected share of the allocated resources (left panel) versus the actual fraction of the resources used so far by each of the four schools (right panel, see Table 1)

ACDC and Metallica have used ~8 % (below their 10 % share) and ~36 % (above their 30 % share) of the resources spent so far. The FairShare algorithm will consequently slightly raise the priority of ACDC jobs and lower the priority of Metallica jobs.

Finally, the CCR group has not submitted enough jobs, so the FairShare algorithm will boost the priority of its jobs. However, the group still has to submit those jobs before the end of the quarter, otherwise the remaining service units (SU) will be wasted.

To summarise, the FairShare algorithm increases or decreases a group's job priority depending on whether the group's fraction of the resources used so far is lower or greater than its share of the system.
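This comparison can be reproduced by hand from Table 1. The short awk sketch below is illustrative only (it is not a Phoenix tool); it recomputes the share and usage fractions and prints whether each group's priority would be lowered or boosted:

  # Illustrative only: recompute the Table 1 fractions and the resulting FairShare direction.
  # Each entry is group:granted_SU:used_SU, with the numbers taken from Table 1.
  awk 'BEGIN {
      n = split("ACDC:100000:50000 Metallica:300000:225000 KISS:250000:250000 CCR:350000:105000", rows, " ")
      for (i = 1; i <= n; i++) { split(rows[i], f, ":"); total_grant += f[2]; total_used += f[3] }
      printf "%-10s %8s %8s  %s\n", "Group", "Share", "Use", "Priority"
      for (i = 1; i <= n; i++) {
          split(rows[i], f, ":")
          share = f[2] / total_grant   # fraction of the machine granted to the group (system share)
          use   = f[3] / total_used    # fraction of all spent SUs consumed by the group (system use)
          printf "%-10s %7.1f%% %7.1f%%  %s\n", f[1], 100 * share, 100 * use, (use > share ? "lowered" : "boosted")
      }
  }'

Running it reproduces the percentages of Table 1 and flags Metallica and KISS as lowered, ACDC and CCR as boosted.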

 

Diagnostic tools – rcquota, rcshare and rcstat

rcquota and rcshare are useful tools for diagnosing whether your jobs will be given a high priority or not.

rcshare: shows a summary of allocation and usage statistics by project for the current accounting period. As shown in Figure 2, the last three columns help you understand how the FairShare algorithm will assess your job. If the grant use is greater than 100 %, the school/project group you belong to will have its job priority lowered.

On the one hand, the system share represents the fraction of the total quarterly allocation granted to a school/research group:

SYSTEM SHARE = GRANT SU / TOTAL GRANT SU

On the other hand, the system use represents the fraction of the service units spent so far that were consumed by a given school/research group:

SYSTEM USE = USAGE SU / TOTAL USAGE SU

If SYSTEM SHARE < SYSTEM USE, then the FairShare algorithm will decrease the priority of the jobs from your school/research project. If SYSTEM SHARE > SYSTEM USE, then the FairShare algorithm will increase the priority of the jobs from your school/research project.

Figure 2: Output from rcshare. The columns indicate, in order, the project group/school, the quarterly service unit (SU) grant, the quarterly service units used, the fraction of the grant used, the system share and the system use.
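Assuming the column order just described (the exact rcshare output format may differ on your system, and myproject is a placeholder for your project/school name), a small pipeline can flag the FairShare direction for your own group:

  rcshare | awk -v grp="myproject" '$1 == grp {
      share = $5 + 0; use = $6 + 0    # "+ 0" forces a numeric value even if a "%" sign is printed
      if (use > share) print grp ": system use > system share, job priority will be lowered"
      else             print grp ": system use <= system share, job priority will be boosted"
  }'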

rcquota: this command outputs the allocation usage of your account as well as that of the school/project group you belong to.

Figure 3: Output from rcquota. The first column displays the project group/school you belong to. The second column indicates the number of quarterly service units (SU) granted to your research project/school. The third and fourth columns show the number of service units used and the fraction of the granted service units consumed.

sprio | sort -rn -k2 | less: orders the submitted jobs according to their priority number (mostly determined by the FairShare factor). You can find your job IDs by using squeue -u <a1234567>.
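For example (a1234567 and <job_id> are placeholders, and the column holding the priority value can vary between SLURM versions, so adjust the -k option of sort if needed):

  squeue -u a1234567             # list your jobs together with their job IDs
  sprio | sort -rn -k2 | less    # all pending jobs, sorted by the priority column
  sprio -j <job_id>              # priority of a single job and the factors contributing to it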

rcstat <job_id> (see wiki-page): displays the statistics of a job. This is particularly useful to optimise the resource allocation of a given SLURM job.

Figure 4: Output of rcstat.

The first few rows display the allocation requested when the job was submitted. The output then shows when the job was submitted, when it started and when it ended.

Finally, it displays the fraction of the requested wall-time used, the fraction of CPU used and the amount of memory used. If you intend to submit similar jobs, this information helps you optimise the allocation you request at submission. As a rule of thumb, a SLURM job is well optimised when the wall-time, CPU and memory actually used are around 80 % of the reserved allocation.

In Figure 4, we notice that the job used 96.7 % of the CPU allocated. This indicates that the code makes good use of all the available cores (none are idle). However, the requested wall-time and the requested memory per node can be considerably reduced. By optimising your job request, your submitted job is expected to start sooner.
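For instance, if a job reported figures of this kind (high CPU efficiency but only a small fraction of the requested wall-time and memory actually used), the next submission could request less. The script below is a hypothetical sketch; the resource values are illustrative and not taken from Figure 4:

  #!/bin/bash
  # Hypothetical SLURM script: all resource values below are illustrative only.
  #SBATCH --ntasks=16          # keep the core count, since the CPUs were already well used
  #SBATCH --time=02:00:00      # reduced wall-time, aiming for ~80 % of it actually being used
  #SBATCH --mem=8G             # reduced memory per node, again targeting ~80 % utilisation

  srun ./my_program            # my_program is a placeholder for your executable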

 

 

My job seems to take forever to run. What is happening?

As a rule, the larger the requested allocation, the longer the waiting time. The job scheduler ensures that the supercomputer makes efficient use of the nodes at all times. Whenever you submit a large job, the scheduler has to find a time when all the requested resources are available. To reduce the waiting time, make sure you optimise your SLURM job using rcstat (see the section above).

 

I used to wait a few hours before my job started running, but now it takes a few days!

The queue is updated every minute. Recently submitted high-priority jobs will be pushed ahead in the queue. By using rcshare, rcquota and sprio (see the section above), you can diagnose whether your job is given a lower or higher priority. If your project group/school has exceeded its quota, the FairShare algorithm will give your jobs a low priority. If your school's system share is smaller than its system use (see rcshare), FairShare will also lower your priority. This will result in a longer waiting time.
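A quick diagnostic sequence, using the commands introduced above (a1234567 and <job_id> are placeholders):

  rcquota                 # has my project group/school exceeded its quarterly quota?
  rcshare                 # is our system use larger than our system share?
  squeue -u a1234567      # find the IDs of my pending jobs
  sprio -j <job_id>       # inspect the priority of one of them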