Research Interest:
Scheduling Theory, High Performance Computing, hyper-scale/exascale systems, shared systems
Hypothesis
Current scheduling systems lack the ability to deal with a diverse and at times conflicting workloads. Current task scheduling systems use the same method to schedule tasks as job scheduling systems did in the 80’s. Jobs are longer running, and a few seconds does impact the final QoS. Tasks, however, are much shorter in duration and much higher in quantity. The fair-share methodology that has been used in scheduling jobs does not provide an optimal solution for high-performance systems with many clients each with variant levels of workload profiles.
General Idea
Fair-share scheduling is not fair. FS normalizes the workload from from clients irrespective of their task requirement, and schedules tasks based on that normalized view. Schedulers are forced to introduce a new priority constraint which only masks the problem, but do not actually solve it.
We need to treat a shared environment like a Game with clients being the rational players of that game. Players seek selfish behaviour, and as such are interested in gaming the system and starving the other Players. A scheduler acts as a mediator of this Game, and must seek a Social Optimum point amongst the Players. One Concept Solution that we will apply to this Game is Nash’s Equilibrium as it has been shown to seek and a social optimal solution.
Implications
Applying Game Theory to Shared Environments with many clients and many tasks is a new concept. NE has been applied to job level scheduler, and has been shown to demonstrate fair-ness in such environments.
One potential outcome of this research will be our ability to cope in new exascale shared environments in a manner that is fair to all participants. Current scheduling algorithms are all based on some heuristics that assumes that current scheduling decision will lead to an optimal solution for all participants. This is inaccurate as it cannot be tested or verified.
We aim to prove that by seeking a NE point, the system will be in an Socially Optimal and thus a fair state. Furthermore, as Games depend on their movement history, we can verify the correctness of our approach.
Further but less likely implication of this research is that the new Scheduling methodology will have a lower complexity than the current algorithms.