Power Simulator

This page describes the runtime mechanics of the power simulator: its control, state, and energy paths.

The canonical physical model, provenance, and hardware assumptions are documented in:

Hardware Modeling

For workload progression semantics:

Workload Simulator

Scope

The power simulator runtime is responsible for:

keeping per-node control state (CPU cap, DVFS throttle, GPU cap),
applying control actions from /control/{node},
updating dynamics with settling/ramp behavior,
exposing power telemetry on /telemetry/{node},
integrating energy over time (/debug/energy).
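
As a rough illustration of the settling and energy-integration responsibilities listed above, the following sketch steps one node forward in time. It is not the simulator's implementation: the `NodeSim` class, the first-order settling rule, the `tau_s` time constant, and the rectangle-rule integration are all assumptions made for this example. The canonical dynamics are described in Hardware Modeling.

```python
from dataclasses import dataclass


@dataclass
class NodeSim:
    """Hypothetical per-node state: a power cap that settles toward its target."""

    target_cap_w: float
    applied_cap_w: float
    energy_j: float = 0.0

    def step(self, dt_s: float, demand_w: float, tau_s: float = 0.5) -> float:
        # Assumed first-order settling: move the applied cap a fraction of
        # the way toward the target on every step.
        alpha = min(1.0, dt_s / tau_s)
        self.applied_cap_w += alpha * (self.target_cap_w - self.applied_cap_w)
        # Power drawn is the workload demand clipped by the applied cap.
        power_w = min(demand_w, self.applied_cap_w)
        # Rectangle-rule integration: joules = watts * seconds.
        self.energy_j += power_w * dt_s
        return power_w


node = NodeSim(target_cap_w=100.0, applied_cap_w=200.0)
first = node.step(dt_s=0.25, demand_w=300.0)
second = node.step(dt_s=0.25, demand_w=300.0)
```

In this sketch the applied cap does not jump to the new target; it approaches it over several steps, and the accumulated energy reflects the power actually drawn during the ramp.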

Runtime state and controls

Main node state includes:

CPU:
- target/applied cap
- utilization
- effective frequency scale
- throttle target/current
- saturation flag
- instantaneous and averaged power
- temperature and thermal-throttle fraction
GPU:
- per-device cap/target
- per-device instantaneous and averaged power
- per-device temperature and thermal-throttle fraction
- aggregate utilization
- effective performance multiplier
workload-model inputs aggregated from running jobs:
- memoryIntensity
- ioIntensity
- cpuFeedIntensity

Supported control actions:

rapl.set_power_cap_watts
dvfs.set_throttle_pct
gpu.set_power_cap_watts
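
The request schema for /control/{node} is not specified on this page. The sketch below therefore assumes a hypothetical JSON body with `action` and `value` keys, and only shows how a client might reject unsupported action names before sending anything. The helper name `build_control_request` and the non-negative check are also assumptions.

```python
import json

# Action names taken from the list above.
SUPPORTED_ACTIONS = {
    "rapl.set_power_cap_watts",
    "dvfs.set_throttle_pct",
    "gpu.set_power_cap_watts",
}


def build_control_request(node: str, action: str, value: float) -> tuple[str, str]:
    """Return (path, JSON body) for a control action on one node."""
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported control action: {action}")
    if value < 0:
        # Assumption: caps and throttle percentages are never negative.
        raise ValueError("value must be non-negative")
    path = f"/control/{node}"
    # Hypothetical body schema; the real field names may differ.
    body = json.dumps({"action": action, "value": value})
    return path, body


path, body = build_control_request("node-a", "rapl.set_power_cap_watts", 180.0)
```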

Telemetry contract exposed by the simulator

The simulator returns both a compact node-level view and richer subsystem views on:

GET /telemetry/{node}

Important top-level fields:

packagePowerWatts
- averaged node power, as exported to the controller
instantPackagePowerWatts
- instantaneous node power used internally by the model

Important CPU fields:

cpu.packagePowerWatts
cpu.instantPowerWatts
cpu.utilization
cpu.memoryIntensity
cpu.ioIntensity
cpu.freqScale
cpu.temperatureC
cpu.thermalThrottlePct

Important GPU fields:

gpu.powerWattsTotal
gpu.avgPowerWattsTotal
gpu.utilization
gpu.memoryIntensity
gpu.cpuFeedIntensity
gpu.capWattsPerGpuApplied
gpu.capWattsPerGpuTarget
gpu.devices[]
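
A consumer of the fields listed above might read them by dotted path. The sketch below assumes the dotted names map to nested JSON objects, which this page does not confirm. The `get_path` helper and all values in `sample` are invented for illustration, and `gpu.devices` is left empty because its per-device field names are not listed here.

```python
from typing import Any


def get_path(doc: dict, dotted: str, default: Any = None) -> Any:
    """Look up a dotted field name such as 'cpu.freqScale' in nested dicts."""
    node: Any = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical telemetry document; values are made up for the example.
sample = {
    "packagePowerWatts": 412.0,
    "instantPackagePowerWatts": 455.5,
    "cpu": {"packagePowerWatts": 140.0, "freqScale": 0.85},
    "gpu": {"powerWattsTotal": 300.0, "capWattsPerGpuApplied": 250.0, "devices": []},
}

freq_scale = get_path(sample, "cpu.freqScale")
missing = get_path(sample, "gpu.avgPowerWattsTotal", default=-1.0)
```

Returning a default for absent fields keeps a client tolerant of telemetry documents that omit a subsystem.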

The split between averaged and instantaneous power fields matters because the simulator intentionally models the difference between:

internal fast-changing device power
exported averaged telemetry seen by the controller
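
To make the gap between the two views concrete, the sketch below averages instantaneous samples over a short sliding window. The averaging method the simulator actually uses is not stated on this page, so the sliding-window mean, the `WindowedPower` class, and the window size are assumptions.

```python
from collections import deque


class WindowedPower:
    """Hypothetical averaged-power view over the last `window` samples."""

    def __init__(self, window: int = 4) -> None:
        # deque with maxlen drops the oldest sample automatically.
        self._samples = deque(maxlen=window)

    def update(self, instant_w: float) -> float:
        """Record one instantaneous sample and return the windowed mean."""
        self._samples.append(instant_w)
        return sum(self._samples) / len(self._samples)


avg = WindowedPower(window=2)
readings = [avg.update(w) for w in (100.0, 300.0, 300.0)]
```

Here the instantaneous power jumps from 100 W to 300 W in one sample, while the averaged value takes an extra sample to reach 300 W. A controller that only sees the averaged value reacts to that delayed signal.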

Hardware identity and overrides

The simulator runtime no longer depends primarily on pre-enumerated node classes.

The intended resolution order, with later steps taking precedence, is:

node labels define simulated hardware identity
shared inventory resolves that identity into a CPU/GPU model
optional SIM_NODE_CLASS_CONFIG entries refine or replace profile parameters

So SIM_NODE_CLASS_CONFIG is still useful, but it is now an override layer rather than the main source of hardware truth.
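
The layering described above can be sketched as a lookup followed by a shallow merge. Everything concrete here is hypothetical: the label key `sim/cpu-model`, the inventory entry, the parameter names, and the assumption that overrides arrive as an already-parsed dictionary. The real format of SIM_NODE_CLASS_CONFIG is not described on this page.

```python
from typing import Dict, Optional

# Hypothetical shared inventory keyed by a hardware-identity label value.
INVENTORY: Dict[str, Dict[str, float]] = {
    "example-cpu-a": {"cpu_tdp_w": 200.0, "cpu_min_cap_w": 80.0},
}


def resolve_profile(
    labels: Dict[str, str],
    inventory: Dict[str, Dict[str, float]],
    overrides: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Resolve node labels to a profile, then apply optional overrides."""
    identity = labels.get("sim/cpu-model")  # hypothetical label key
    if identity not in inventory:
        raise KeyError(f"no inventory entry for hardware identity: {identity!r}")
    # Copy so the shared inventory is never mutated by per-node overrides.
    profile = dict(inventory[identity])
    if overrides:
        profile.update(overrides)
    return profile


profile = resolve_profile(
    {"sim/cpu-model": "example-cpu-a"},
    INVENTORY,
    overrides={"cpu_min_cap_w": 100.0},
)
```

The override only changes the parameter it names; everything else still comes from the inventory entry selected by the node labels.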

Model source of truth

This page intentionally avoids duplicating formulas and hardware assumptions. Use Hardware Modeling as the source of truth for:

measured vs proxy curves,
CPU/GPU workload-class behavior,
heterogeneous-node normalization semantics,
vendor/API-specific constraints and references.

Operational note

If this page diverges from the modeled formulas, simulator behavior and hardware-modeling.md are authoritative; update this page only for runtime flow and interfaces.