KWOK Benchmark Experiment
This document describes the first benchmark harness implementation under:
experiments/01-kwok-benchmark/
Motivation
The benchmark focuses on repeatable comparisons of scheduler and control behavior across three baselines:
- baseline A: no Joulie control path (simulator only),
- baseline B: static partition style,
- baseline C: queue-aware style.
The setup keeps real scheduling semantics while using fake KWOK nodes and simulator-driven telemetry/control effects.
Assumptions
- Kubernetes scheduler and API are real.
- Fake nodes are tainted and selected by workload pods.
- Simulator injects and advances batch work from trace input.
- Experiment scripts orchestrate install/run/collect/plot.
- A/B/C workload fairness is preserved by generating a canonical per-seed trace and deriving baseline A by stripping only power-profile affinity.
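The fairness assumption above can be sketched as follows. This is a minimal illustration, not the harness code: the trace schema (a list of dicts with an `affinity` mapping) and the key name `power-profile` are hypothetical placeholders for whatever the real trace format uses. The point is that baseline A is derived from the same canonical jobs, with only the power-profile affinity removed.

```python
import copy

# Hypothetical affinity key; the real trace schema may name this differently.
POWER_PROFILE_KEY = "power-profile"


def derive_baseline_a(canonical_trace):
    """Return a copy of the trace with only power-profile affinity removed.

    Every other field (arrival time, resources, other affinities) is left
    unchanged so that all baselines see the same workload.
    """
    derived = []
    for job in canonical_trace:
        job_a = copy.deepcopy(job)  # never mutate the canonical trace
        job_a.get("affinity", {}).pop(POWER_PROFILE_KEY, None)
        derived.append(job_a)
    return derived


# Illustrative canonical trace for one seed (made-up values).
canonical = [
    {"id": "job-0", "submit_s": 0.0, "cpu": 4,
     "affinity": {"power-profile": "eco", "zone": "a"}},
    {"id": "job-1", "submit_s": 5.0, "cpu": 8, "affinity": {}},
]
baseline_a = derive_baseline_a(canonical)
```

Deep-copying each job keeps the canonical trace intact, so baselines B and C can still be generated from it afterwards.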
What is measured now
Current harness outputs:
- per-run wall runtime proxy (wall_seconds),
- time-scale aware simulated runtime (sim_seconds),
- per-run workload size (jobs_total),
- throughput metrics (jobs/wall-sec, jobs/sim-sec, jobs/sim-hour),
- estimated simulated-time energy from simulator telemetry events (energy_sim_joules_est, energy_sim_kwh_est),
- robust simulator-integrated energy export over all managed nodes (sim_debug_energy.json), used as the primary energy source,
- estimated average cluster power (avg_cluster_power_w_est),
- run metadata (baseline, seed, commit, trace hash),
- cluster snapshots/logs for debugging.
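As a rough illustration of how the throughput fields relate to each other, the sketch below derives them from jobs_total, wall_seconds, and a time-scale factor. It assumes that sim_seconds equals wall_seconds multiplied by the time scale; the harness may compute it differently, and the function name `throughput_metrics` is hypothetical.

```python
def throughput_metrics(jobs_total, wall_seconds, time_scale):
    """Derive throughput figures for one run.

    Assumption: simulated time advances `time_scale` times faster than
    wall time, so sim_seconds = wall_seconds * time_scale.
    """
    if wall_seconds <= 0 or time_scale <= 0:
        raise ValueError("wall_seconds and time_scale must be positive")
    sim_seconds = wall_seconds * time_scale
    return {
        "wall_seconds": wall_seconds,
        "sim_seconds": sim_seconds,
        "jobs_total": jobs_total,
        "jobs/wall-sec": jobs_total / wall_seconds,
        "jobs/sim-sec": jobs_total / sim_seconds,
        "jobs/sim-hour": jobs_total * 3600 / sim_seconds,
    }


# Made-up example: 120 jobs finished in 60 wall seconds at time scale 10.
metrics = throughput_metrics(jobs_total=120, wall_seconds=60.0, time_scale=10.0)
```

Under that assumption, the simulated-time throughput figures do not depend on how fast the simulation was replayed, which is what makes runs with different time scales comparable.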
Scripts
Main entrypoints:
- scripts/00_prereqs_check.sh
- scripts/01_create_cluster_kwokctl.sh
- scripts/02_apply_nodes.sh
- scripts/03_install_components.sh
- scripts/04_run_one.py
- scripts/05_sweep.py
- scripts/06_collect.py
- scripts/07_plot.py
- scripts/99_cleanup.sh
Outputs
- experiments/01-kwok-benchmark/results/<run_id>/...
- experiments/01-kwok-benchmark/results/summary.csv
- experiments/01-kwok-benchmark/results/plots/*.png
Key plots produced now:
- runtime_distribution.png (box+points by baseline, replacing index-based scatter),
- throughput_vs_energy.png (tradeoff + Pareto frontier),
- energy_vs_makespan.png (tradeoff with baseline means),
- baseline_means.png (mean energy / throughput / makespan by baseline).
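For readers unfamiliar with the Pareto frontier in the throughput-vs-energy plot, here is a minimal sketch of one common way to compute it. It assumes lower energy and higher throughput are both better and treats each run as an (energy, throughput) pair; the plotting script may define or compute the frontier differently.

```python
def pareto_frontier(points):
    """Return the non-dominated (energy, throughput) points.

    A point is kept if no other point has lower-or-equal energy and
    strictly higher throughput. Sorting by energy first lets a single
    sweep decide this.
    """
    # Ascending energy; for equal energy, highest throughput first.
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))
    frontier = []
    best_throughput = float("-inf")
    for energy, throughput in ordered:
        if throughput > best_throughput:
            frontier.append((energy, throughput))
            best_throughput = throughput
    return frontier


# Made-up (energy_kwh, jobs_per_sim_hour) pairs for illustration only.
runs = [(1.0, 10.0), (2.0, 9.0), (2.0, 12.0), (3.0, 11.0), (4.0, 15.0)]
frontier = pareto_frontier(runs)
```

Runs on the frontier are the ones where throughput cannot be improved without spending more energy; the remaining runs are dominated by at least one frontier point.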
Notes on energy interpretation
- Energy is computed by integrating per-node telemetry event packagePowerWatts over time.
- Integration is first done in wall time from event timestamps, then scaled by timeScale to approximate simulated-time energy.
- If telemetry debug events are missing or sparse, energy fields can be empty or less reliable for that run.
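The integration described above can be sketched as follows. This is an illustration under stated assumptions, not the harness implementation: the event field names `node` and `ts` are hypothetical (only packagePowerWatts and timeScale appear in this document), and the trapezoidal rule is one reasonable choice of integration scheme that the harness may not share.

```python
from collections import defaultdict


def estimate_sim_energy(events, time_scale):
    """Integrate per-node power over wall time, then scale to simulated time.

    `events` is a list of dicts with hypothetical keys:
      node (str), ts (wall-clock seconds), packagePowerWatts (float).
    Returns None values when no node has at least two samples.
    """
    by_node = defaultdict(list)
    for ev in events:
        by_node[ev["node"]].append((ev["ts"], ev["packagePowerWatts"]))

    wall_joules = 0.0
    intervals = 0
    for samples in by_node.values():
        samples.sort()  # order by timestamp
        for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
            wall_joules += 0.5 * (p0 + p1) * (t1 - t0)  # trapezoid area
            intervals += 1

    if intervals == 0:
        # Missing or too-sparse telemetry: leave the energy fields empty.
        return {"energy_sim_joules_est": None, "energy_sim_kwh_est": None}

    sim_joules = wall_joules * time_scale
    return {
        "energy_sim_joules_est": sim_joules,
        "energy_sim_kwh_est": sim_joules / 3.6e6,  # 1 kWh = 3.6e6 J
    }


# Made-up telemetry: two nodes sampled at 0 s and 10 s of wall time.
events = [
    {"node": "kwok-a", "ts": 0.0, "packagePowerWatts": 100.0},
    {"node": "kwok-a", "ts": 10.0, "packagePowerWatts": 100.0},
    {"node": "kwok-b", "ts": 0.0, "packagePowerWatts": 50.0},
    {"node": "kwok-b", "ts": 10.0, "packagePowerWatts": 150.0},
]
energy = estimate_sim_energy(events, time_scale=60.0)
```

With sparse events, each trapezoid spans a long interval and can miss power changes in between, which is one reason the estimate becomes less reliable for such runs.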