Meaningful measurement

What is a meaningful measurement of performance of a scheduler?  I am embarking on the next paper which is due in two weeks, and unsure about what I am actually going to write.  Well, I know what I want to write, but not sure about its presentation mechanism.

This is the status of where I am right now…

Alea + GridSim have now successfully been morphed into a new framework called dSim.  Why dSim?  not really sure, but it had a good ring to it, and potentially a commercially viable name.  Anyhow, dSim is capable of creating a dynamic load, creating dynamic set of resources, schedule appropriately, and measure results.  All of this can be done via configuration files and parameters.

One of the results that I am collecting is the makespan for the tasks, and as a result the entire job.  I dont really care about the job itself, but the makespan of a given task.

I am collecting data that is like the following:

Client Name task ID Submission time Finished Time Makespan
Client 1 1034 3000 11000 8000

For a given run, I have 1000’s of these rows.  I have one row entry per task; which makes sense.  The question is then what to do with this data?

Statistical analysis of the data would give me the mean, median, and the STD.  But is that enough?  For example, i may have following:

Client 1:
mean: 33000
STD: 18500

Client 2:
mean: 44000
STD: 9000

Obviously, that’s good info, but is it enough?

The work can potentially be extended to a cloud environment, but for now, the focus is HPC environments.

Scheduling Architecture

As I am working my way thru the Alea framework to continue my research.  I realized the Alea does not actually implement a scheduling algo, at least not a fairshare one.

Schedulers are difficult to write, but I have had the fortune to develop a few over the years.  Wanted to document the internal architecture of a “good” scheduler.  Frankly, it does not matter what type of scheduling algorithm you implement.  As long at you follow the following approach, you will be ok.

Schedulers need to have 3 distinct stages:
– Pre-processing of tasks

– Scheduling of tasks

– Dispatching of tasks

most people lump the first two together.  That’s a mistake!

– Step 1: preprocessing:

This is where you apply the desired policy to the incoming tasks.  If you are interested in a FCFS policy, well, then you make sure that you queue up your tasks in a FCFS manner, but putting all the tasks in the order of arrival in a queue.  If you are interested in SJF (shortest job first), well, then you sort the incoming tasks in a manner that the shortest jobs are first in the queue.  Maybe you have more than one queue; it does not matter.  The point here is that in the preprocessing, you are not actually scheduling anything, but rather enforcing a policy or set of desired policies to the incoming tasks.

– Step 2: scheduling:

After your tasks are sorted based on some policy, you are ready to schedule these tasks.  Scheduling is supply vs. demand.
The most common scheduling policy is fairshare.  I have many posts about fairshare, but the gist of it is that you are given a set of resources based on your “fair share”, which is determined by “you deserve more because you need more” mentality.  If A has 10x as many tasks as B, it will get 10x more resources.
There are other policies, like highest priority first, etc.

– Step 3: dispatching:

After you have decided which tasks should get executed first, you now need to dispatch the tasks.  Most scheduling systems use gang-bang scheduling in that a number of tasks are scheduled.  The reason here is efficiency and practicality.  You dont make a scheduling decision on one task, but rather a set of tasks.  I am classifying gang-bang in the dispatching of the tasks, as it does not make a scheduling decision.  Many papers (ref needed) simply claim that gang-bang is a scheduling policy of its own.  The dispatcher essentially goes thru the set of tasks assigned to be executed, and sends those tasks to the nodes.  Once the set is dispatched, it goes to the next set.

There are optimization that can be done at every step, but that’s where the innovation comes in.  Also, the steps are not discrete as they depend on each other.  You *can* and should make them as decoupled as possible, but that’s just a general software development rule of thumb!

Art Sedighi

Policy as it pertains to high-performance systems

I was recently asked to think about how high performance systems deal with policies?

Two clarifications are required here:

  1. What are high-performance systems in the context of Grid, HPC and scheduling?
  2. What are the policies that a typical high-performance system deals with or in other words, sets?

In the context of high-performance schedulers, a high-performance system is the scenario where we are dealing with a large number of tasks (potentially millions of tasks) that are fairy short in duration, and the total job is only complete once all the tasks have been completed.

What is “short” in our context?  I can easily say that short is in the order of milliseconds or even seconds, but more quantitatively, I will assume that a task is short in duration iff:

  1. Scheduling overhead directly impacts the speedup factor (i.e. the time that it takes to schedule that task cannot be neglected)
  2. The runtime of a give task is significantly shorter (two-orders of magnitude) than the overall runtime of given job.

The bigger question becomes what these policies actually are and why would they be of importance?

The following is a subset of policies that we could be referring to:

  1. Sharing policy
  2. Fair-share policy pertaining to scheduling
  3. (others – TBD)

In a sharing policy, a client can allow some or all if its resources to be shared (given out) to other client[s] that may need them.  This obviously has a risk that the resources are not immediately available when the original owner needs them back.  At the same time, if one waits before lending out resources, there could a high degree of unutilized resources.


The fair-share policy is scheduling is probably the most implicitly set policy in shared environments.  The users get their “fair-share” of the available resources based on some preset fair-share policy.  A user may assist or hint the policy with priorities, for example, but generally speaking, the policy is set and agreed to by all the users.

My research focuses on Fair-share policy and how it affects users – and to an extend resources.  Users agree to the fair-share policy with the assumption that what the scheduler does is “fair”.

Furthermore, users interact with the system unbeknownst to how the fair-share scheduling policy is affecting their runtime.  The side effect of a fair-share scheduler is that timing is severely affects the outcome.  Since there is no historical perspective kept to aid the scheduler to better aid the enforcement of such policy, and some users end up keep temporarily starved.


Art Sedighi

The Matching Pennies Problem

So what is the Matching Pennies Problem anyway (MPP)?

Matching Pennies is a simple game but has very interesting properties that can be applied to many other types of complex games.

There are two players, and each select either Head or Tail.  Here are the rules of the game:

– if the choices differ (one player chooses Head while the other Tail), Player 1 pays Player 2 $1

– if the choices are the same (both players select Head or both select Tail), Player 2 pays Player 1 $1


Head Tail
Head 1,-1 -1,1
Tail -1,1 1,-1

This is a Zero-sum game in that one player’s win would mean a loss for the second player.  In another words, the choices are diametrically opposed to one another.  This game is a strictly competitive game and no amount of collaboration will help the situation.

Recap: To be in Nash’s Equilibrium, no player has an action yielding an outcome that he prefers to that of the current action, given that every other player chooses his equilibrium action.  In other words, no player can profitably deviate, given the actions of the other players.  In the MPP, if Player 2 and Player 1 have chosen Head, Player 2 end up paying Player 1.  If now Player 2 chooses to change his choice to Tail, so can Player 1 and the results are the still the same.  You can see what happens from Player 1’s perspective if the users’ selections were not the same to start.

But there is not a single solution that “stands” out in that there is no solution (or strategy) that has a better outcome for one player over the current action.


Art Sedighi


Fairness in scheduling

What does it mean to be fair when scheduling tasks across a Grid?

Depending on the perspective of the affected entity, fairness could mean different things.  For  a heavy user of the system, “fair” could mean:

“I need (should read “neeeed”) more, so it is fair for me to get more”

From a casual user’s perspective:

“As long as I get to do my work, it is fair”

From a light-user:

“I don’t use the system that often, so when I do, I should have higher priority”

There can be cases made for each of these scenarios.  The first scenario, the heavy user, is the one which more schedulers tend to please.  It is an implied favouritism in that in order to drain the pending queues the fastest, the scheduler schedules more tasks from the heavy user as it had a higher percentage of pending tasks.

What is fairshare?  how can schedulers pony up resources in a shared manner?


Art Sedighi




Been thinking a lot about fair-share and if something is fair or how one can make it fair?

Fair-share, specially when it comes to computing, is figured out based on the load on the decision maker.  If I (a router for example) want to give fair access to a shared network, I look at my load of incoming and try to give out the most to the person with the largest queue.  Relatively speaking, all the queues get the same (fair-share) access, but not if you are the “little guy” that submits packets at a lower rate than others.

There are other mechanisms that try to take care of the little guy first and then go to the big users. In job scheduling, there is an algo for this called Shortest Job First.  You get the little guys and empty those queues first; then you are off to the longer running jobs.  The concept makes sense, right? You don’t want a short job to be stuck behind a job that runs for days.

So, what is fair?  How does one get its fair-share of a system?

Art Sedighi