KWOK Benchmark Experiment

This document describes the first benchmark harness implementation under:

Motivation

The benchmark focuses on repeatable comparisons of scheduler+control behavior across baselines:

The setup keeps real scheduling semantics while using fake KWOK nodes and simulator-driven telemetry/control effects.

Kubernetes scheduler and API are real.
Fake nodes are tainted and selected by workload pods.
Simulator injects and advances batch work from trace input.
Experiment scripts orchestrate install/run/collect/plot.
Workload fairness across baselines A/B/C is preserved by generating one canonical trace per seed and deriving baseline A from it by stripping only the power-profile affinity.
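The trace derivation described in the last item can be sketched roughly as follows. This is only an illustration: the job schema (`id`, `submit_s`, `duration_s`, `power_profile_affinity`), the profile names, and the function names are hypothetical and are not taken from the harness.

```python
import copy
import hashlib
import json
import random


def generate_canonical_trace(seed, n_jobs=5):
    """Build a deterministic trace for a seed (hypothetical job schema)."""
    rng = random.Random(seed)
    jobs = []
    for i in range(n_jobs):
        jobs.append({
            "id": f"job-{i}",
            "submit_s": round(rng.uniform(0, 60), 3),
            "duration_s": round(rng.uniform(10, 120), 3),
            # Assumed field name for the power-profile affinity.
            "power_profile_affinity": rng.choice(["eco", "performance"]),
        })
    return jobs


def derive_baseline_a(trace):
    """Copy the canonical trace and strip only the power-profile affinity."""
    stripped = copy.deepcopy(trace)
    for job in stripped:
        job.pop("power_profile_affinity", None)
    return stripped


def trace_hash(trace):
    """Stable SHA-256 over the JSON form, usable as run metadata."""
    payload = json.dumps(trace, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


canonical = generate_canonical_trace(seed=42)
baseline_a = derive_baseline_a(canonical)
```

Because baseline A is derived from the same canonical trace rather than generated separately, job arrival times and durations stay identical across baselines for a given seed.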

Current harness outputs:

per-run wall runtime proxy (wall_seconds),
time-scale aware simulated runtime (sim_seconds),
per-run workload size (jobs_total),
throughput metrics (jobs/wall-sec, jobs/sim-sec, jobs/sim-hour),
estimated simulated-time energy from simulator telemetry events (energy_sim_joules_est, energy_sim_kwh_est),
simulator-integrated energy export over all managed nodes (sim_debug_energy.json), used as the primary energy source,
estimated average cluster power (avg_cluster_power_w_est),
run metadata (baseline, seed, commit, trace hash),
cluster snapshots/logs for debugging.
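As a rough illustration of how the derived fields relate to each other, the sketch below computes them from a run's raw values. It assumes that sim_seconds equals wall_seconds multiplied by the time scale and that average power is energy divided by simulated time; the function name and the throughput key names are hypothetical.

```python
def derive_metrics(wall_seconds, jobs_total, time_scale, energy_sim_joules_est=None):
    """Derive throughput and power fields for one run (illustrative only)."""
    # Assumption: simulated runtime is wall runtime scaled by time_scale.
    sim_seconds = wall_seconds * time_scale
    metrics = {
        "wall_seconds": wall_seconds,
        "sim_seconds": sim_seconds,
        "jobs_total": jobs_total,
        "jobs_per_wall_sec": None,
        "jobs_per_sim_sec": None,
        "jobs_per_sim_hour": None,
        "energy_sim_joules_est": energy_sim_joules_est,
        "energy_sim_kwh_est": None,
        "avg_cluster_power_w_est": None,
    }
    if wall_seconds > 0:
        metrics["jobs_per_wall_sec"] = jobs_total / wall_seconds
    if sim_seconds > 0:
        metrics["jobs_per_sim_sec"] = jobs_total / sim_seconds
        metrics["jobs_per_sim_hour"] = jobs_total * 3600.0 / sim_seconds
    if energy_sim_joules_est is not None:
        # 1 kWh = 3.6e6 J
        metrics["energy_sim_kwh_est"] = energy_sim_joules_est / 3.6e6
        if sim_seconds > 0:
            metrics["avg_cluster_power_w_est"] = energy_sim_joules_est / sim_seconds
    return metrics


example = derive_metrics(wall_seconds=100.0, jobs_total=50, time_scale=60.0,
                         energy_sim_joules_est=3.6e6)
```

Fields that cannot be computed are left as `None` rather than zero, so a run with missing energy data is not mistaken for a run that consumed no energy.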

Main entrypoints:

Key plots produced now:

runtime_distribution.png (box+points by baseline, replacing index-based scatter),
throughput_vs_energy.png (tradeoff + Pareto frontier),
energy_vs_makespan.png (tradeoff with baseline means),
baseline_means.png (mean energy / throughput / makespan by baseline).
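For readers unfamiliar with the Pareto frontier shown in throughput_vs_energy.png, a minimal sketch of one way to compute it is given below. It assumes each run is reduced to an (energy, throughput) pair, with lower energy and higher throughput both preferred; the function name is hypothetical and the plotting scripts may compute the frontier differently.

```python
def pareto_frontier(points):
    """Return the non-dominated (energy, throughput) points, sorted by energy.

    A point is dominated if another point has energy that is no higher and
    throughput that is no lower, with at least one of the two strictly better.
    """
    frontier = []
    best_throughput = float("-inf")
    # Sort by ascending energy; on ties, higher throughput comes first.
    for energy, throughput in sorted(points, key=lambda p: (p[0], -p[1])):
        if throughput > best_throughput:
            frontier.append((energy, throughput))
            best_throughput = throughput
    return frontier


runs = [(10.0, 1.0), (12.0, 0.9), (15.0, 2.0), (15.0, 1.5), (20.0, 2.0)]
frontier = pareto_frontier(runs)
```

With the sample runs above, only the points that are not beaten on both axes remain on the frontier; the run at energy 12.0 is dropped because a cheaper run already achieves higher throughput.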

Energy is computed by integrating the packagePowerWatts value of per-node telemetry events over time.
Integration is first done in wall time using the event timestamps, and the result is then scaled by timeScale to approximate simulated-time energy.
If telemetry debug events are missing or sparse for a run, its energy fields may be empty or less reliable.
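The integration described above can be sketched as follows. This is a minimal illustration under stated assumptions: events are dictionaries with hypothetical `node` and `timestamp` keys (wall-clock seconds) next to packagePowerWatts, and the trapezoidal rule is used between consecutive samples, which may differ from what the harness does.

```python
from collections import defaultdict


def integrate_energy(events, time_scale):
    """Estimate simulated-time energy in joules from telemetry events.

    Returns None when no node has at least two samples, mirroring the
    case where energy fields are left empty.
    """
    by_node = defaultdict(list)
    for ev in events:
        by_node[ev["node"]].append((ev["timestamp"], ev["packagePowerWatts"]))

    wall_joules = 0.0
    have_interval = False
    for samples in by_node.values():
        samples.sort()  # order each node's samples by timestamp
        for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
            # Trapezoidal rule over one wall-time interval (assumed method).
            wall_joules += 0.5 * (p0 + p1) * (t1 - t0)
            have_interval = True

    if not have_interval:
        return None
    # Scale wall-time energy to approximate simulated-time energy.
    return wall_joules * time_scale


events = [
    {"node": "kwok-a", "timestamp": 0.0, "packagePowerWatts": 100.0},
    {"node": "kwok-a", "timestamp": 10.0, "packagePowerWatts": 100.0},
    {"node": "kwok-b", "timestamp": 0.0, "packagePowerWatts": 50.0},
    {"node": "kwok-b", "timestamp": 10.0, "packagePowerWatts": 150.0},
]
energy_sim_joules_est = integrate_energy(events, time_scale=60.0)
```

Sparse telemetry matters here because the estimate between two samples is only as good as the assumption that power varies smoothly across the gap; long gaps make the per-interval area less trustworthy.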