Meaningful measurement

What is a meaningful measurement of a scheduler's performance? I am embarking on the next paper, which is due in two weeks, and I am unsure about what I am actually going to write. Well, I know what I want to write, but I am not sure how to present it.

This is the status of where I am right now…

Alea + GridSim have now successfully been morphed into a new framework called dSim. Why dSim? Not really sure, but it had a good ring to it, and it is potentially a commercially viable name. Anyhow, dSim is capable of creating a dynamic load, creating a dynamic set of resources, scheduling appropriately, and measuring the results. All of this can be done via configuration files and parameters.

One of the results I am collecting is the makespan of each task, and by extension of the entire job. I don't really care about the job itself, though; what I care about is the makespan of a given task.

I am collecting data that is like the following:

Client Name | Task ID | Submission Time | Finished Time | Makespan
Client 1    | 1034    | 3000            | 11000         | 8000

For a given run, I have thousands of these rows: one row per task, which makes sense. The question then is what to do with this data?

Statistical analysis of the data would give me the mean, median, and standard deviation. But is that enough? For example, I may have the following:

Client 1:
mean: 33000
STD: 18500

Client 2:
mean: 44000
STD: 9000

Obviously, that’s good info, but is it enough?
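
To make the discussion concrete, here is a minimal Java sketch of the per-client summary I am describing. The client names and makespan values below are made up for illustration; in dSim the rows come out of the result files rather than being hard-coded, and this is not dSim code.

    import java.util.*;

    public class MakespanStats {
        public static void main(String[] args) {
            // Hypothetical per-task makespans keyed by client, in the same time
            // units as the simulator output. A real run would parse the result rows.
            Map<String, List<Double>> makespans = new HashMap<>();
            makespans.put("Client 1", Arrays.asList(8000.0, 31000.0, 60000.0));
            makespans.put("Client 2", Arrays.asList(35000.0, 44000.0, 53000.0));

            for (Map.Entry<String, List<Double>> e : makespans.entrySet()) {
                List<Double> v = new ArrayList<>(e.getValue());
                Collections.sort(v);
                double mean = v.stream().mapToDouble(Double::doubleValue).average().orElse(0);
                double median = (v.size() % 2 == 1)
                        ? v.get(v.size() / 2)
                        : (v.get(v.size() / 2 - 1) + v.get(v.size() / 2)) / 2.0;
                double variance = v.stream().mapToDouble(x -> (x - mean) * (x - mean)).sum() / v.size();
                System.out.printf("%s -> mean=%.0f median=%.0f std=%.0f%n",
                        e.getKey(), mean, median, Math.sqrt(variance));
            }
        }
    }

That gets me the summary numbers per client, but it still flattens away the timing of when each task was starved or served, which is exactly the part I suspect matters.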

The work can potentially be extended to a cloud environment, but for now, the focus is HPC environments.

Scheduling Simulator

My next task is to create a scheduling simulator – and a scheduler to further test my hypothesis.

I have been playing with Alea and GridSim. I will attach the papers to this post, but you can also Google both of these research projects.

1. GridSim is an established project that is used in the community. Its creator is Rajkumar Buyya, a very well-known researcher and author in this field (http://www.buyya.com/gridsim).

2. Alea, though not a well-known or established project, presents a great approach to simulating a scheduler and large grid environments (http://www.fi.muni.cz/~xklusac/alea/).

Alea, however, was not complete, and I ended up rewriting the scheduler module, a fair-share scheduler, and a job loader. I am not sure whether I would like to open-source the project, or my portions of it, as they present a great tool for simulating large grid environments and may be commercialized.

The greatest challenge was to create a job loader that can create a task profile similar to max{0, a*sin(bx)}. This pattern of task submission is very common in HPC environments. In a typical scenario, a client submits a certain number of tasks, waits for some responses to come back, creates another set of tasks, and repeats. This start-stop fashion of task submission is where the greatest ability to game the system comes from. For example, for one set of submissions a = 1000 (i.e., 1,000 tasks are submitted), but for a subsequent set, a = 5000. This fluctuation, although not modeled in this version of the scheduling simulator, can present a great challenge to a scheduler aiming to fairly distribute resources.
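
As a rough sketch of what the job loader does with that profile, the snippet below computes how many tasks a client submits at a given time step using max{0, a*sin(bx)}. The values of a, b, and the time step are placeholders; the real values come from dSim's configuration files, and this is not the dSim loader itself.

    public class SubmissionProfile {
        // Number of tasks a client submits at time x, following max{0, a*sin(bx)}.
        static long tasksAt(double a, double b, double x) {
            return Math.round(Math.max(0.0, a * Math.sin(b * x)));
        }

        public static void main(String[] args) {
            double a = 1000;  // burst amplitude: up to 1000 tasks per peak (placeholder)
            double b = 0.1;   // frequency: how quickly the client cycles between submitting and waiting (placeholder)
            for (int x = 0; x <= 100; x += 10) {
                System.out.printf("t=%d -> submit %d tasks%n", x, tasksAt(a, b, x));
            }
        }
    }

The negative half of the sine wave is clipped to zero, which is the "wait for responses" phase of the start-stop pattern.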

 

Art Sedighi

 

Published paper in SCPE

Hi everyone,

As promised, I am attaching my first published paper:

FAIRNESS OF TASK SCHEDULING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS

You can download it here: Fairness-sedighi

 

Abstract:

We claim that the current scheduling systems for high performance computing environments are unable to fairly distribute resources among the users, and as such, are unable to maximize the overall user satisfaction. We demonstrate that a user can game the system to cause a temporal starvation to the other users of the system, even though all users will eventually finish their job in the shared-computing environment. Undesired and unfair delays in the current fair-shared schedulers impede these schedulers from wide deployments to users with diverse usage profiles.

 

Source:

http://www.scpe.org/index.php/scpe/article/view/1020

 

Art Sedighi

Policy as it pertains to high-performance systems

I was recently asked to think about how high-performance systems deal with policies.

Two clarifications are required here:

  1. What are high-performance systems in the context of Grid, HPC, and scheduling?
  2. What are the policies that a typical high-performance system deals with, or in other words, sets?

In the context of high-performance schedulers, a high-performance system is a scenario where we are dealing with a large number of tasks (potentially millions) that are fairly short in duration, and the total job is only complete once all the tasks have been completed.

What is “short” in our context? I can easily say that short is on the order of milliseconds or even seconds, but more quantitatively, I will assume that a task is short in duration iff:

  1. Scheduling overhead directly impacts the speedup factor (i.e., the time that it takes to schedule the task cannot be neglected)
  2. The runtime of a given task is significantly shorter (two orders of magnitude) than the overall runtime of the given job (see the sketch below)
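
Stated as code, the test might look like the sketch below. The two conditions come straight from the definition above; the 1% overhead cutoff, the names, and the sample numbers are my own illustrative assumptions.

    public class ShortTaskTest {
        // A task is "short" iff (1) the scheduling overhead cannot be neglected
        // relative to the task runtime and (2) the task runtime is at least two
        // orders of magnitude shorter than the overall job runtime.
        static boolean isShort(double taskRuntime, double schedulingOverhead, double jobRuntime) {
            boolean overheadMatters = schedulingOverhead / taskRuntime > 0.01; // assumed 1% cutoff
            boolean muchShorterThanJob = taskRuntime * 100 <= jobRuntime;
            return overheadMatters && muchShorterThanJob;
        }

        public static void main(String[] args) {
            // A 50 ms task with 5 ms of scheduling overhead inside a 10-minute job: short.
            System.out.println(isShort(50, 5, 600_000));
        }
    }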

The bigger question becomes what these policies actually are, and why they would be of importance.

The following is a subset of policies that we could be referring to:

  1. Sharing policy
  2. Fair-share policy pertaining to scheduling
  3. (others – TBD)

In a sharing policy, a client can allow some or all of its resources to be shared (given out) to other clients that may need them. This obviously carries the risk that the resources are not immediately available when the original owner needs them back. At the same time, if one waits before lending out resources, there could be a high degree of unutilized resources.

 

The fair-share policy in scheduling is probably the most implicitly set policy in shared environments. The users get their “fair share” of the available resources based on some preset fair-share policy. A user may assist or hint the policy with priorities, for example, but generally speaking, the policy is set and agreed to by all the users.

My research focuses on the fair-share policy and how it affects users, and to an extent resources. Users agree to the fair-share policy with the assumption that what the scheduler does is “fair”.

Furthermore, users interact with the system without knowing how the fair-share scheduling policy is affecting their runtime. The side effect of a fair-share scheduler is that timing severely affects the outcome. Since no historical perspective is kept to help the scheduler enforce such a policy, some users end up temporarily starved.

 

Art Sedighi

Utility of master-worker systems

For a small — very small — master-worker system, there are two players: one master and one worker.  The master is the user or the client that requires the worker to get something done, and the worker is the actual processor that does the work.

Let’s break each of these into their various states:

Master:

1. Work pending

2. Work submitted for completion

3. Work completed and results returned

Worker:

1. Working

2. Work completed

3. Idle and results returned

Based on our earlier discussion around utility, we can further assign an arbitrary utility to each of these states.  We can then model the utility as a function of workload for the two participants.

Each of the states has been assigned a utility. When the processor is working, it has the lowest utility of -1, and when it is idle, it has the highest utility of +1. For a client, when there is work pending, the client is at its lowest utility, and when all the work has been completed, it is at its highest utility.

There is an intrinsic notion of time built into this model. Imagine, if you will, that a given client goes from its lowest utility to its highest utility as work is submitted and completed. This is obviously time-dependent.

The same is true for the worker; a given CPU is at its lowest when it is working, but as it completes the work given to it, it moves towards a higher utility.
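
To make the model concrete, here is a small Java sketch that assigns the utilities described above (-1 for a busy worker, +1 for an idle one, and likewise for the client) and interpolates linearly between the two extremes as work completes. The linear interpolation is my simplifying assumption for illustration, not part of the model itself.

    public class UtilityModel {
        // Worker utility: -1 while working, +1 when idle; interpolate linearly on
        // the fraction of its assigned work that is still outstanding.
        static double workerUtility(double fractionOutstanding) {
            return 1.0 - 2.0 * fractionOutstanding;
        }

        // Client (master) utility: -1 with all work pending, +1 once all results are back.
        static double clientUtility(double fractionCompleted) {
            return 2.0 * fractionCompleted - 1.0;
        }

        public static void main(String[] args) {
            for (double f = 0.0; f <= 1.0; f += 0.25) {
                System.out.printf("completed=%.2f client=%.2f worker=%.2f%n",
                        f, clientUtility(f), workerUtility(1.0 - f));
            }
        }
    }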

 

Art Sedighi

From the worker (CPU) perspective

In a distributed/Grid system, the worker is the “end” processor that does the actual work. It is the CPU that calculates the average of two numbers; it is the CPU that executes the business logic, and so on.

I call this processor the “end” processor, because there could be many intermediate nodes/processors that route the work to the end node. In a graph-based architecture, the leaf node is the node that does the actual work.  All the other nodes route the work to where it belongs.  In a Grid/HPC environment, the scheduler sits in the middle and routes the jobs to the appropriate end node.  We will ignore this middle portion for the time being and focus on the end nodes.

Anyhow, this end node is the node that does the work. Its tendency, however, is to sit idle and not do anything. In other words, a processor wants to be idle. From an entropy perspective, “order” is when a processor is executing proper code, and “disorder” is when the processor is idle.

Do not focus on the fact that “we” as users want the processor to be busy all the time. The tendency of the processor is to sit idle. The processor aims to finish the work as fast as possible and sit idle. Another way of looking at this is that a processor, upon receiving a job, is in an ordered state, and its tendency is towards disorder. When a processor is idle, it is in that disordered state.

We as users, however, want the processor to be utilized 100% of the time.  That’s what we want.  We will get to this conflict of interest in later postings.

From a macro level, it all makes perfect sense now… Faster clock speeds, newer technology, and so on allow the processor to reach its preferred state faster:

Crossing the finish line is the only goal that both the master and the worker have in common

Art Sedighi

Game theory vs. graph theory in scheduling

Most scheduling systems, resource managers, HPC schedulers, etc., use some sort of fair-share algorithm where the portion of the shared resources being scheduled (divided up) depends on a very simple ratio analysis of “the person who ‘deserves’ more, gets more”. The amount that one deserves usually depends on the number of tasks pending to be completed. Essentially, the scheduler takes a snapshot of the queue sizes and makes a determination that, in order to drain the queues faster, the larger queue must get a larger portion of the available resources.

Without loss of generality, assume that we have two users (A and B). ‘A’ has a queue size of 10 and ‘B’ has a queue size of 90. ‘A’ ends up with 10% of the resources and ‘B’ ends up with 90% of the resources. If you have only 8 machines available at a time, there is a good chance that ‘A’ is starved.
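
The arithmetic behind that example, as a quick sketch. The truncation at the end is just one rounding rule a scheduler might use, and it is exactly what leaves ‘A’ with nothing:

    public class QueueRatioShare {
        public static void main(String[] args) {
            int queueA = 10, queueB = 90, machines = 8;
            double total = queueA + queueB;
            // Share is proportional to pending work at the moment of the snapshot.
            int slotsA = (int) Math.floor(machines * (queueA / total)); // 8 * 0.1 = 0.8 -> 0
            int slotsB = machines - slotsA;                             // 8
            System.out.println("A gets " + slotsA + " machines, B gets " + slotsB);
        }
    }

Running it prints “A gets 0 machines, B gets 8”: the snapshot-based ratio shuts ‘A’ out entirely until B's queue drains.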

Scheduling systems follow ‘a’ graph algorithm to make a determination. Whether implicit or explicit, a graph algorithm is somehow used. Graphs are essentially decision trees, and based on the number of levels and/or the fan-out, they can be either complicated or a simple binary decision tree of “if not this way, then it must be the other way”. That is why a two-node scheduling system is deterministic, and anything above a two-node system falls under the nondeterministic realm (with the exception of a three-node system).

Graph algorithms do not consider history — how did I get here? That is the main reason why most fair-share algorithms are not fair.

Games depend solely on the history of events or strategies. There is an implicit history built into each strategy that allows one to determine all the previous steps. As such, one’s current state is not the only state required to make a decision about the next state.

As opposed to using a fair-share algorithm, if schedulers treated each transaction as a game, and each event as a strategy of that game, schedulers would actually be fair.
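
I do not have such a scheduler yet, but as a thought experiment, a history-aware allocator might look roughly like the sketch below: each round it remembers how many slots every user has already received and hands the next slot to whoever has been served the least so far. The class and method names are mine, and “least served first” is only one of many possible strategies.

    import java.util.*;

    public class HistoryAwareAllocator {
        // Cumulative slots handed out to each user across all previous rounds.
        private final Map<String, Long> served = new HashMap<>();

        // Allocate 'slots' machines among the users, one slot at a time,
        // always to the least-served user whose queue is not yet drained.
        Map<String, Integer> allocate(Map<String, Integer> queueSizes, int slots) {
            Map<String, Integer> allocation = new HashMap<>();
            for (int i = 0; i < slots; i++) {
                String pick = null;
                for (String user : queueSizes.keySet()) {
                    if (allocation.getOrDefault(user, 0) >= queueSizes.get(user)) continue;
                    if (pick == null || served.getOrDefault(user, 0L) < served.getOrDefault(pick, 0L)) {
                        pick = user;
                    }
                }
                if (pick == null) break; // nothing left to schedule
                allocation.merge(pick, 1, Integer::sum);
                served.merge(pick, 1L, Long::sum);
            }
            return allocation;
        }

        public static void main(String[] args) {
            HistoryAwareAllocator scheduler = new HistoryAwareAllocator();
            // Queues are treated as a fresh snapshot each round for simplicity.
            Map<String, Integer> queues = Map.of("A", 10, "B", 90);
            for (int round = 1; round <= 3; round++) {
                System.out.println("round " + round + ": " + scheduler.allocate(queues, 8));
            }
        }
    }

Unlike the snapshot-based ratio, ‘A’ is never shut out here, because the allocator's memory of past rounds stands in for the history that a game-theoretic treatment carries in its strategies.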

Art Sedighi