Joulie Operator
The operator is Joulie’s cluster-level decision engine.
It does not write host power interfaces directly. Instead, it decides desired node states and publishes them through Kubernetes objects and labels.
In practice, the operator answers one question over and over:
which nodes should currently supply performance capacity, and which can safely supply eco capacity?
Responsibilities
At each reconcile tick, the operator:
- selects eligible managed nodes,
- reads `NodeHardware` when available and falls back to node labels when it is not,
- resolves hardware identity against the shared inventory,
- classifies workload demand from pod scheduling constraints,
- runs a policy algorithm (`pkg/operator/policy/`) to compute a plan,
- applies transition guards for safe downgrades,
- writes desired node targets (`NodeTwin.spec`) and the `joulie.io/power-profile` node label.
The agent then enforces those targets node-by-node.
In addition to the reconcile loop, the operator runs a background controller:
Facility metrics
The facility metrics poller (`cmd/operator/facility.go`) queries Prometheus for data-center-level signals: ambient temperature, total IT power, and cooling power. These feed into the twin computation for PUE estimation and cooling stress refinement.
Disabled by default (`ENABLE_FACILITY_METRICS=false`). When enabled, the poller runs every `FACILITY_POLL_INTERVAL` (default 30s) and computes PUE as (IT power + cooling power) / IT power. The ambient temperature is passed to the twin's `LinearCoolingModel` for temperature-aware stress scoring. The scheduler extender then weights marginal power costs by PUE.
See Configuration Reference for the full list of facility env vars.
Control boundary with the agent
- operator decides what each node should be
- agent decides how to apply the corresponding controls on that node
This separation keeps policy logic portable while actuator details stay node-local.
Reconcile flow
- Read nodes matching `NODE_SELECTOR` (chart default: `joulie.io/managed=true`).
- Ignore reserved/unschedulable nodes.
- Build a normalized hardware view:
  - prefer `NodeHardware`
  - otherwise derive hardware identity from node labels / allocatable resources
  - resolve CPU/GPU models against the inventory
  - compute per-node CPU/GPU density signals
- Build demand view from active pods:
- performance-constrained
- eco-constrained
- unconstrained
- Sort eligible nodes by normalized compute density (CPU + GPU), highest first.
- Run policy (`static_partition`, `queue_aware_v1`, or debug `rule_swap_v1`).
- For planned `performance -> eco` transitions, run the downgrade guard:
  - publish `profile=eco` as desired state
  - set `NodeTwin.status.schedulableClass` to `draining` while performance pods are still present
- Persist desired state through `NodeTwin.spec` and update the `joulie.io/power-profile` node label.
The important distinction is:
- `NodeTwin.spec` expresses the desired target state for enforcement,
- the `joulie.io/power-profile` node label expresses the current power profile,
- `NodeTwin.status` holds twin output, including `schedulableClass`, which expresses transition state (including `draining`) for the scheduler extender.
Power intent configuration knobs
Operator intent emission is controlled by env vars:
- CPU:
  - `CPU_WRITE_ABSOLUTE_CAPS` (true|false)
  - `CPU_PERFORMANCE_CAP_PCT_OF_MAX`
  - `CPU_ECO_CAP_PCT_OF_MAX`
  - `PERFORMANCE_CAP_WATTS`
  - `ECO_CAP_WATTS`
- GPU:
  - `GPU_PERFORMANCE_CAP_PCT_OF_MAX`
  - `GPU_ECO_CAP_PCT_OF_MAX`
  - `GPU_WRITE_ABSOLUTE_CAPS` (true|false)
  - `GPU_MODEL_CAPS_JSON`
  - `GPU_PRODUCT_LABEL_KEYS`
High-level behavior:
- CPU:
  - when `CPU_WRITE_ABSOLUTE_CAPS=false`, the operator writes normalized percentage intent,
  - when `CPU_WRITE_ABSOLUTE_CAPS=true`, the operator writes absolute watts intent.
- GPU:
  - when `GPU_WRITE_ABSOLUTE_CAPS=false`, the operator writes percentage intent,
  - when `GPU_WRITE_ABSOLUTE_CAPS=true`, the operator may write resolved `capWattsPerGpu` in addition to `capPctOfMax`, when model-based mapping is available.
This is why GPU `NodeTwin.spec` objects may contain both normalized intent and resolved absolute caps at the same time.
Heterogeneous planning
The operator is now inventory-aware.
Its first heterogeneous-planning input is a normalized compute-density score built from:
- recognized CPU model + socket/core shape
- recognized GPU model + GPU count
This score is used to order eligible nodes before policy assignment.
As a result, for the same policy parameters, denser nodes are preferred first for performance supply.
If `NodeHardware` is not available yet:
- the operator derives a best-effort hardware view from labels such as `joulie.io/hw.cpu-model`, `joulie.io/hw.gpu-model`, and `joulie.io/hw.gpu-count`,
- and from allocatable extended resources (`nvidia.com/gpu`, `amd.com/gpu`).
That keeps simulator-first and bootstrap scenarios working without making NodeHardware a hand-authored prerequisite.
Node state model
Joulie models two scheduler-facing supply states:
- `performance`
- `eco`
`DrainingPerformance` is an internal operator FSM state tracked via `NodeTwin.status.schedulableClass = "draining"`.
That state means:
- the operator wants the node to end up in eco,
- the transition is still guarded because performance pods are still present,
- the scheduler extender sees `schedulableClass: draining` and applies a score penalty to avoid placing new workloads on the node.
Why this model
- scheduler gets clear supply signal from node labels,
- policy can evolve independently of host control implementation,
- transitions are auditable and safer than instant downgrade.