Meaningful measurement

What is a meaningful measurement of a scheduler's performance?  I am embarking on the next paper, which is due in two weeks, and I am unsure about what I am actually going to write.  Well, I know what I want to write, but I am not sure how to present it.

Here is where I am right now…

Alea + GridSim have now successfully been morphed into a new framework called dSim.  Why dSim?  Not really sure, but it has a good ring to it, and it is potentially a commercially viable name.  Anyhow, dSim is capable of creating a dynamic load, creating a dynamic set of resources, scheduling appropriately, and measuring the results.  All of this can be done via configuration files and parameters.
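
Just to give a flavour of what "configuration files and parameters" means here, below is a minimal sketch of a configuration-driven launcher.  The file name, property keys, and launcher class are all illustrative assumptions of mine, not dSim's actual API or options.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class SimLauncher {
    public static void main(String[] args) throws IOException {
        // Load the run configuration; the file name and keys are hypothetical.
        Properties config = new Properties();
        try (FileInputStream in = new FileInputStream("dsim.properties")) {
            config.load(in);
        }

        // Illustrative parameters a run like this might expose.
        int clients = Integer.parseInt(config.getProperty("clients", "10"));
        int tasksPerClient = Integer.parseInt(config.getProperty("tasksPerClient", "1000"));
        String scheduler = config.getProperty("scheduler", "fifo");

        System.out.printf("Simulation: %d clients, %d tasks each, scheduler=%s%n",
                clients, tasksPerClient, scheduler);
        // The actual call into the simulation framework would go here.
    }
}
```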

One of the results I am collecting is the makespan for each task, and consequently for the entire job.  I don't really care about the job itself, only about the makespan of a given task.

I am collecting data that looks like the following:

Client Name   Task ID   Submission Time   Finished Time   Makespan
Client 1      1034      3000              11000           8000

For a given run, I have thousands of these rows, one row per task, which makes sense.  The question, then, is what to do with this data?
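
As a sanity check on the data itself, here is a minimal sketch of the per-task record, assuming makespan is simply finished time minus submission time.  The class and field names are mine, not dSim's.

```java
// A per-task result row; names are illustrative, not dSim's actual types.
public class TaskResult {
    final String clientName;
    final int taskId;
    final long submissionTime;
    final long finishedTime;

    TaskResult(String clientName, int taskId, long submissionTime, long finishedTime) {
        this.clientName = clientName;
        this.taskId = taskId;
        this.submissionTime = submissionTime;
        this.finishedTime = finishedTime;
    }

    // Makespan of a single task: time from submission to completion.
    long makespan() {
        return finishedTime - submissionTime;
    }
}
```

For the example row above, that gives 11000 − 3000 = 8000.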

Statistical analysis of the data would give me the mean, median, and standard deviation.  But is that enough?  For example, I may have the following:

Client 1:
mean: 33000
STD: 18500

Client 2:
mean: 44000
STD: 9000

Obviously, that’s good info, but is it enough?
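
For reference, here is a minimal sketch of how such per-client statistics could be computed from the collected makespans, reusing the TaskResult shape sketched above (again, an assumption of mine rather than dSim's reporting code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MakespanStats {
    // Group makespans by client and print mean, median, and standard deviation.
    static void report(List<TaskResult> results) {
        Map<String, List<Long>> byClient = results.stream()
                .collect(Collectors.groupingBy(r -> r.clientName,
                        Collectors.mapping(TaskResult::makespan, Collectors.toList())));

        byClient.forEach((client, makespans) -> {
            double[] values = makespans.stream().mapToDouble(Long::doubleValue).toArray();
            Arrays.sort(values);

            double mean = Arrays.stream(values).average().orElse(0.0);
            double median = values.length % 2 == 1
                    ? values[values.length / 2]
                    : (values[values.length / 2 - 1] + values[values.length / 2]) / 2.0;
            double variance = Arrays.stream(values)
                    .map(v -> (v - mean) * (v - mean))
                    .average().orElse(0.0);
            double std = Math.sqrt(variance);

            System.out.printf("%s: mean=%.0f, median=%.0f, std=%.0f%n",
                    client, mean, median, std);
        });
    }
}
```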

The work can potentially be extended to a cloud environment, but for now the focus is on HPC environments.