Simulator Algorithms

This page documents the core simulator math and control/workload update loop implemented in simulator/cmd/simulator/main.go.

Per-Node State

Each simulated node tracks:

At each telemetry/control update, the simulator computes node power:

P = P_idle + (P_max - P_idle) * util^alpha * freqScale^beta

Where:

It then applies bounds and cap handling, in this order:

- Floor P at 20 W.
- Clamp the requested cap to [RaplCapMinW, RaplCapMaxW].
- If P > cap, solve for a cap-feasible frequency scale:
  - targetFreq = ((cap - P_idle) / ((P_max - P_idle) * util^alpha))^(1/beta)
  - targetFreq is lower-bounded by minFreqScale = FMinMHz/FMaxMHz.
- Recompute P using the updated FreqScale.
- Final clip: P <= cap + RaplHeadW.

CapSaturated is set to true when the power at minFreqScale, the lowest feasible frequency scale, still exceeds the cap.
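The power model and cap handling above can be sketched in Go as follows. This is a minimal illustration, not the code in main.go: the NodeParams struct and the function names modelPower and applyCap are hypothetical, the guard that never raises the scale while throttling is added only for this sketch, and the target scale is applied directly here without any ramping.

```go
package main

import "math"

// NodeParams is a hypothetical container for the constants named in the text.
type NodeParams struct {
	PIdle, PMax  float64 // P_idle and P_max in watts
	Alpha, Beta  float64 // exponents on util and freqScale
	MinFreqScale float64 // FMinMHz / FMaxMHz
	CapMinW      float64 // RaplCapMinW
	CapMaxW      float64 // RaplCapMaxW
	HeadW        float64 // RaplHeadW
}

// modelPower evaluates P = P_idle + (P_max - P_idle) * util^alpha * freqScale^beta
// and floors the result at 20 W.
func modelPower(p NodeParams, util, freqScale float64) float64 {
	dyn := (p.PMax - p.PIdle) * math.Pow(util, p.Alpha) * math.Pow(freqScale, p.Beta)
	return math.Max(p.PIdle+dyn, 20)
}

// applyCap clamps the requested cap, lowers the frequency scale if the modeled
// power exceeds the cap, and returns the clipped power, the resulting scale,
// and whether the cap is saturated.
func applyCap(p NodeParams, util, freqScale, requestedCapW float64) (power, newScale float64, capSaturated bool) {
	capW := math.Min(math.Max(requestedCapW, p.CapMinW), p.CapMaxW)
	power, newScale = modelPower(p, util, freqScale), freqScale
	if power > capW {
		target := p.MinFreqScale
		dynFull := (p.PMax - p.PIdle) * math.Pow(util, p.Alpha)
		if dynFull > 0 && capW > p.PIdle {
			// Invert the power model for the scale that meets the cap.
			target = math.Pow((capW-p.PIdle)/dynFull, 1/p.Beta)
		}
		// Sketch-only guard: never raise the scale while throttling.
		target = math.Min(target, freqScale)
		newScale = math.Max(target, p.MinFreqScale)
		power = modelPower(p, util, newScale)
		// Saturated when even the minimum scale cannot meet the cap.
		capSaturated = modelPower(p, util, p.MinFreqScale) > capW
	}
	// Final clip: P <= cap + RaplHeadW.
	power = math.Min(power, capW+p.HeadW)
	return power, newScale, capSaturated
}
```

With illustrative values P_idle = 100, P_max = 300, alpha = 1, beta = 2, minFreqScale = 0.5 and util = 1, a 200 W cap yields a scale of about 0.707 and P of about 200 W. A 120 W cap cannot be met even at minFreqScale, so the sketch reports saturation and clips P to cap + RaplHeadW.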

The DVFS throttle is not applied as an instantaneous jump; FreqScale ramps toward its target over time.

Given:

Update:

FreqScale = FreqScale + (targetScale - FreqScale) * min(1, dt/rampSec)

Then clamp to [minFreqScale, 1] and set:

ThrottlePct = round((1 - FreqScale) * 100)
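The ramp update can be sketched as a small Go function. The function name rampFreqScale is hypothetical, dt and rampSec are assumed to share the same time unit, and the fallback for a non-positive rampSec (jump straight to the target) is an assumption of this sketch rather than documented behavior.

```go
package main

import "math"

// rampFreqScale moves freqScale toward targetScale by the fraction
// min(1, dt/rampSec), clamps the result to [minFreqScale, 1], and returns
// the new scale together with ThrottlePct.
func rampFreqScale(freqScale, targetScale, dt, rampSec, minFreqScale float64) (float64, int) {
	frac := 1.0 // assumption: a non-positive rampSec means no ramping
	if rampSec > 0 {
		frac = math.Min(1, dt/rampSec)
	}
	next := freqScale + (targetScale-freqScale)*frac
	next = math.Min(math.Max(next, minFreqScale), 1)
	throttlePct := int(math.Round((1 - next) * 100))
	return next, throttlePct
}
```

For example, starting at FreqScale = 1.0 with a target of 0.5, dt = 1 and rampSec = 4, one step covers a quarter of the gap and gives FreqScale = 0.875. Once dt reaches rampSec, the scale lands on the target in a single step.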

The control endpoint supports:

The resulting node dynamics/power are reflected immediately via the same model above.

Each job has:

The node's effective speed factor is FreqScale.

Per-job instantaneous speed on a node:

speed = C_req * BaseSpeedPerCore * (1 - (1 - FreqScale) * S_cpu)

If k active jobs run on the same node, fair sharing is approximated by:

progress = speed * dt / max(1, k)

Update:

CPUUnitsRemaining -= progress

A job completes when CPUUnitsRemaining <= 0; its pod is then deleted.
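One update step for the jobs on a single node can be sketched as below. The Job struct and the function names jobSpeed and stepJobs are hypothetical. The comments on C_req and S_cpu reflect how the formula uses them and are an interpretation, not definitions taken from the source. Pod deletion is represented only by dropping finished jobs from the returned slice.

```go
package main

// Job is a hypothetical container for the per-job fields used in the formulas.
type Job struct {
	CReq              float64 // C_req (assumed: requested CPU cores)
	SCPU              float64 // S_cpu (assumed: sensitivity to frequency scaling)
	CPUUnitsRemaining float64 // work left before the job completes
}

// jobSpeed evaluates speed = C_req * BaseSpeedPerCore * (1 - (1 - FreqScale) * S_cpu).
func jobSpeed(j Job, baseSpeedPerCore, freqScale float64) float64 {
	return j.CReq * baseSpeedPerCore * (1 - (1-freqScale)*j.SCPU)
}

// stepJobs advances every active job on one node by dt and returns the jobs
// that still have work remaining. Jobs at or below zero are dropped.
func stepJobs(jobs []Job, baseSpeedPerCore, freqScale, dt float64) []Job {
	k := float64(len(jobs))
	if k < 1 {
		k = 1 // max(1, k) from the fair-share formula
	}
	running := make([]Job, 0, len(jobs))
	for _, j := range jobs {
		j.CPUUnitsRemaining -= jobSpeed(j, baseSpeedPerCore, freqScale) * dt / k
		if j.CPUUnitsRemaining > 0 {
			running = append(running, j)
		}
	}
	return running
}
```

With two jobs on a node at FreqScale = 0.8, a job with C_req = 2 and S_cpu = 0.5 runs at speed 1.8 and makes 0.9 units of progress per unit of dt, because the node is shared two ways.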

At any instant, the remaining lifetime of a job can be estimated as:

T_remaining ~= CPUUnitsRemaining / (speed / max(1, k))

Tighter caps, higher throttle, and higher sensitivity therefore increase completion time by reducing effective speed.
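The effect on completion time can be checked with a short sketch of the estimate. The function name estimateRemaining and its parameter names are hypothetical, the result is in the same time unit as dt, and returning +Inf for a non-positive speed is a choice made for this sketch.

```go
package main

import "math"

// estimateRemaining evaluates T_remaining ~= CPUUnitsRemaining / (speed / max(1, k)),
// where speed = C_req * BaseSpeedPerCore * (1 - (1 - FreqScale) * S_cpu).
func estimateRemaining(cpuUnitsRemaining, cReq, baseSpeedPerCore, freqScale, sCPU float64, k int) float64 {
	speed := cReq * baseSpeedPerCore * (1 - (1-freqScale)*sCPU)
	if speed <= 0 {
		return math.Inf(1) // sketch-only choice: a stalled job never finishes
	}
	share := math.Max(1, float64(k))
	return cpuUnitsRemaining / (speed / share)
}
```

For a job with 10 units remaining, C_req = 2, and BaseSpeedPerCore = 1 running alone on a node, the estimate is 5 at FreqScale = 1. Dropping FreqScale to 0.5 with S_cpu = 1 doubles it to 10, while S_cpu = 0 leaves it at 5.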

The simulator also derives a class from pod scheduling constraints:

This is used for debug counters and completion metrics labeling.