Joulie

Core Concepts

Mon, 01 Jan 0001 00:00:00 +0000

Before installing Joulie, understand the control model.

What Joulie is

Joulie is a Kubernetes-native energy management system that uses per-node digital twins to optimize data center power consumption.

It continuously ingests telemetry from every node (CPU/GPU power draw via RAPL and NVML/DCGM, per-pod resource utilization via cAdvisor, and optional energy counters from Kepler) to maintain an up-to-date model of each node’s thermal and power state.

These per-node digital twins drive two outcomes:

Installation

This page covers how to install the Joulie simulator in a Kubernetes cluster.

Prerequisites

A running Kubernetes cluster (real or kind for local development)
kubectl configured for the target cluster
helm v3+ (for Helm installation)

Install via Helm (recommended)

The simulator is published as an OCI Helm chart. Install it with:

helm install joulie-sim oci://registry.cern.ch/mbunino/joulie/joulie-sim \
 -n joulie-system --create-namespace

To customize values, download the default values first:

helm show values oci://registry.cern.ch/mbunino/joulie/joulie-sim > values.yaml

Then install with overrides:

Quickstart

This page is the fastest path to run Joulie. For conceptual context first, read Core Concepts.

Prerequisites

Kubernetes cluster with worker nodes
Node Feature Discovery (NFD) deployed
Optional for real enforcement: nodes exposing writable power interfaces
- RAPL power limit files, or
- cpufreq sysfs interfaces

Install from release (recommended)

Install directly from the OCI chart release:

helm upgrade --install joulie oci://registry.cern.ch/mbunino/joulie/joulie \
 --version <version> \
 -n joulie-system \
 --create-namespace \
 -f values/joulie.yaml

Label nodes managed by the operator

Important: Joulie targets only nodes that carry a specific label and ignores all others. By default, the installation does not select any nodes automatically. The default expected selector value is:

Pod Compatibility for Joulie

Joulie uses a single pod annotation to express workload placement intent:

joulie.io/workload-class: performance | standard

The scheduler extender reads this annotation and steers pods accordingly. No node affinity rules are needed.

Workload classes

Class	Behavior
`performance`	Must run on full-power nodes. The extender hard-rejects eco nodes.
`standard`	Default. Can run on any node. Adaptive scoring steers toward eco when performance nodes are congested.

If no annotation is present, the extender treats it as standard.
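
The defaulting rule above can be sketched as a small helper. This is an illustrative sketch, not the extender's actual code: the function name is hypothetical, and treating unrecognized annotation values as standard is an assumption the source does not state.

```python
ANNOTATION_KEY = "joulie.io/workload-class"
VALID_CLASSES = {"performance", "standard"}


def workload_class(annotations):
    """Return the workload class for a pod's annotation map (hypothetical helper)."""
    value = (annotations or {}).get(ANNOTATION_KEY, "standard")
    # Assumption: unknown values fall back to standard rather than raising.
    return value if value in VALID_CLASSES else "standard"
```

For example, a pod without annotations resolves to standard, while a pod carrying joulie.io/workload-class: performance resolves to performance.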

Agent Runtime Modes

The agent supports two runtime modes:

daemonset: real-hardware mode, one pod per real node.
pool: simulation mode, one pod hosts many logical per-node controllers.

Chart templates:

charts/joulie/templates/agent-daemonset.yaml
charts/joulie/templates/agent-statefulset.yaml

DaemonSet mode (real hardware)

Required runtime settings

securityContext.privileged: true
Host mount:
- host path /sys -> container path /host-sys
Env:
- NODE_NAME from spec.nodeName
- AGENT_MODE=daemonset (default)
- optional RECONCILE_INTERVAL (default 20s)
- optional SIMULATE_ONLY=true (skip host writes, log requested actions)
- optional METRICS_ADDR (default :8080)

Pool mode (KWOK / simulation)

Pool mode preserves per-node semantics but shards logical node controllers across replicas.
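
One way to picture sharding is a stable hash of the node name taken modulo the shard count. The sketch below is only an illustration under that assumption: the source does not say which partitioning function the agent uses, and the function names and the choice of CRC32 are hypothetical.

```python
import zlib


def owns_node(node_name, shard_id, total_shards):
    """Return True if this shard is responsible for the node (assumed hash scheme)."""
    # CRC32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(node_name.encode("utf-8")) % total_shards == shard_id


def nodes_for_shard(node_names, shard_id, total_shards):
    """Filter the managed node list down to the nodes owned by one shard."""
    return [n for n in node_names if owns_node(n, shard_id, total_shards)]
```

With any such scheme, every node maps to exactly one shard, so each logical per-node controller runs in exactly one replica.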

CPU Support and Power Capping

Joulie supports node-level CPU power capping through NodeTwin intents enforced by the agent.

Contract model

CPU intent is defined in NodeTwin.spec.cpu:

packagePowerCapWatts (optional absolute cap)
packagePowerCapPctOfMax (optional normalized profile intent)

Precedence:

packagePowerCapWatts if present
otherwise packagePowerCapPctOfMax

Policy behavior

The operator still assigns each node one of two profiles, performance or eco. CPU cap values are generated per profile and written into NodeTwin.spec:

performance profile typically maps to a higher cap (often 100%)
eco profile maps to a lower cap

For heterogeneous nodes, percentage-based intent remains useful because each node resolves the normalized intent against its own capabilities. If a percentage intent cannot be converted to watts (for example, because the RAPL range is missing), the agent falls back to a DVFS percentage path when possible.
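
The precedence and fallback rules can be summarized in a short sketch. The field names come from NodeTwin.spec.cpu as described above; the function name, the returned mode strings, and the rapl_max_watts parameter are hypothetical and only serve the illustration.

```python
def resolve_cpu_cap(cpu_spec, rapl_max_watts=None):
    """Return a (mode, value) pair for a NodeTwin.spec.cpu-like dict.

    rapl_max_watts stands in for the node-local RAPL range; None means
    the range could not be discovered.
    """
    watts = cpu_spec.get("packagePowerCapWatts")
    if watts is not None:
        # The absolute cap takes precedence when present.
        return ("rapl_watts", float(watts))
    pct = cpu_spec.get("packagePowerCapPctOfMax")
    if pct is None:
        return ("none", None)
    if rapl_max_watts is not None:
        # Resolve the normalized intent against node-local capability.
        return ("rapl_watts", rapl_max_watts * pct / 100.0)
    # No RAPL range available: fall back to a DVFS percentage.
    return ("dvfs_pct", float(pct))
```

Under these assumptions, a 60% intent on a node with a 200 W range resolves to a 120 W cap, and the same intent on a node without a discoverable range resolves to a 60% DVFS target.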

CRD and Policy Model

This page defines Joulie’s core contract:

demand comes from pod scheduling constraints,
supply is exposed by node power-profile labels,
discovered hardware is published through NodeHardware,
desired state is published through NodeTwin.

APIs

Group/version:

joulie.io/v1alpha1

CRDs:

NodeHardware (nodehardwares, cluster-scoped)
NodeTwin (nodetwins, cluster-scoped)

CRD definitions live in:

config/crd/bases/joulie.io_nodehardwares.yaml
config/crd/bases/joulie.io_nodetwins.yaml

Demand model (workloads)

Workload class is determined from the joulie.io/workload-class pod annotation:

performance demand: pod carries joulie.io/workload-class: performance.
standard demand (default): no annotation, or joulie.io/workload-class: standard. Can run on any node; adaptive scoring steers toward eco when performance nodes are congested.

Supply model (nodes)

Node supply is represented by:

GPU Support (NVIDIA + AMD)

Joulie supports node-level GPU power-cap intents for NVIDIA and AMD.

Validation status

GPU support has been validated in simulator mode only (no bare-metal GPU access yet). The host code paths are designed to work on bare metal (NVIDIA + AMD) when GPU nodes are available.

Contract model

NodeTwin.spec.gpu.powerCap defines a per-GPU cap intent:

scope: perGpu
capWattsPerGpu (absolute, optional)
capPctOfMax (percentage, optional)

Precedence:

capWattsPerGpu if present
otherwise capPctOfMax

The same cap is applied uniformly to all GPUs on the node.

Workload and Power Simulator

The Joulie simulator lets you run full control-loop experiments on virtual clusters without real hardware. It keeps Kubernetes scheduling real while simulating hardware telemetry, power dynamics, and thermal behavior per node.

This page covers the simulator’s architecture, HTTP API, and integration points. Detailed subsystems are documented on dedicated pages linked throughout.

Architecture at a glance

The simulator extends the same control path used on real nodes:

Node labels define simulated hardware identity.
Operator resolves hardware from NodeHardware when available, otherwise from labels/inventory fallback.
Operator writes desired node profile (NodeTwin.spec).
Agent reads desired state and sends control intents.
Simulator emulates telemetry/control behavior per node and exposes HTTP endpoints.
Next reconcile loop reacts to updated simulated state.

The diagram shows the end-to-end loop:

Workload Generation

This page documents how Joulie generates realistic AI workload traces for the simulator.

It is separate from Workload Simulator:

this page explains how traces are generated,
the workload-simulator page explains how those traces are consumed at runtime.

Scope

The current generator is designed to be realistic for:

AI-oriented Kubernetes clusters,
CPU + GPU workloads,
memory-pressure-sensitive jobs,
multi-pod logical workloads such as distributed training and HPO-style experiments.

The current generator does not explicitly model:

Workload Distributions

This page documents the statistical distributions and priors behind the current workload generator.

Use it together with:

What this page is for

The generator is no longer just a flat random-job emitter. It now uses explicit priors for:

arrival timing,
GPU-count skew,
duration shape,
utilization,
memory pressure,
multi-pod workload structure.

This page makes those priors visible and explains why they are reasonable.

1. Arrival model

The current implementation uses a lightweight NHPP-like process:

Kubernetes AI Workloads

This page explains how the logical workload structures used by Joulie map onto common Kubernetes-native AI workload patterns.

It is mainly a documentation page today. The current simulator generator emits the structure metadata and pod-expanded jobs, but it does not yet render PyTorchJob, MPIJob, or Katib Experiment manifests directly.

Why this page exists

The workload-generation report makes an important point:

realistic AI workloads are often not single pods,
and a single logical workload may map to:
- a launcher + workers,
- parameter servers + workers,
- or a controller + many HPO trial pods.

That distinction matters even in a simulator, because power and slowdown should often be understood at the logical workload level, not only at the pod level.

Joulie Operator

The operator is Joulie’s cluster-level decision engine.

It does not write host power interfaces directly. Instead, it decides desired node states and publishes them through Kubernetes objects and labels.

In practice, the operator answers one question over and over: which nodes should currently supply performance capacity, and which can safely supply eco capacity?

Responsibilities

At each reconcile tick, the operator:

selects eligible managed nodes,
reads NodeHardware when available and falls back to node labels when it is not,
resolves hardware identity against the shared inventory,
classifies workload demand from pod scheduling constraints,
runs a policy algorithm (pkg/operator/policy/) to compute a plan,
applies transition guards for safe downgrades,
writes desired node targets (NodeTwin.spec) and the joulie.io/power-profile node label.

The agent then enforces those targets node-by-node.

Workload Simulator

This page documents the workload-side simulation model.

Trace generation methodology, statistical priors, multi-pod workload structure, and workload-generation references are documented in Workload Generation.

Scope

The workload simulator handles:

trace/job ingestion,
pod creation and placement via the real scheduler,
per-job progress updates,
completion and pod deletion,
class inference from scheduling constraints.

Power/control dynamics are documented separately in:

Power Simulator

Trace-driven workload model

Enable with:

SIM_WORKLOAD_TRACE_PATH=/path/to/trace.jsonl

The simulator loads type=job records and schedules pods over time according to submit offsets.
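
As a rough illustration of that loading step, the sketch below reads JSONL lines, keeps only type=job records, and orders them by submit offset. The type field is taken from the text above; the submitOffsetSec field name and the function name are hypothetical, since the trace schema is not shown here.

```python
import json


def load_jobs(lines, offset_key="submitOffsetSec"):
    """Parse JSONL trace lines and return job records ordered by submit offset.

    offset_key is a hypothetical field name; substitute the real trace schema.
    """
    jobs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the trace file
        record = json.loads(line)
        if record.get("type") == "job":
            jobs.append(record)
    # Records without an offset sort first in this sketch.
    return sorted(jobs, key=lambda r: r.get(offset_key, 0.0))
```

Passing an open file handle works as well, because iterating over a file yields its lines.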

Hardware Modeling and Physical Power Model

This page documents how Joulie models CPUs and GPUs across the project using a mix of:

official vendor specifications and management APIs,
public measured power curves, and
explicit proxy models where public exact curves are not yet available.

It serves two closely related purposes:

for the agent, it describes the hardware assumptions used to resolve caps, interpret device limits, and reason about how throttling affects attainable performance
for the simulator, it describes the physical model used to turn utilization and control actions into simulated power and slowdown

Quick summary

If you want the short version before the details:

Joulie Agent

The agent is Joulie’s node-side enforcement component.

It consumes desired state and applies node-local controls through configured backends.

If the operator decides “this node should now behave like eco” or “this node should stay performance”, the agent is the component that turns that intent into concrete control actions.

Responsibilities

At each reconcile tick, the agent:

identifies its node scope (single node in daemonset mode, sharded set in pool mode),
discovers local CPU/GPU hardware and runtime control capability,
publishes NodeHardware for each owned node,
reads desired target (NodeTwin.spec) for each owned node,
resolves telemetry/control backend from environment variables (default: host),
applies controls (host or HTTP),
exports metrics and status.

Inputs and outputs

Inputs:

Power Simulator

This page describes the simulator runtime mechanics (control/state/energy paths).

The canonical physical model, provenance, and hardware assumptions are documented in:

Hardware Modeling

For workload progression semantics:

Workload Simulator

Scope

The power simulator runtime is responsible for:

keeping per-node control state (CPU cap, DVFS throttle, GPU cap),
applying control actions from /control/{node},
updating dynamics with settling/ramp behavior,
exposing power telemetry on /telemetry/{node},
integrating energy over time (/debug/energy).
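
Energy integration can be pictured as summing power over time. The sketch below uses the trapezoidal rule over timestamped power samples; the integration scheme and the function name are assumptions made for illustration, not a description of what the simulator does internally.

```python
def integrate_energy_joules(samples):
    """Integrate (timestamp_seconds, power_watts) samples into joules.

    Samples must be sorted by timestamp. Uses the trapezoidal rule,
    which is an assumption of this sketch.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total
```

A node that draws a constant 100 W for 10 seconds accumulates 1000 J under this scheme.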

Runtime state and controls

Main node state includes:

Digital Twin

The digital twin is Joulie’s core predictive engine. It is a lightweight O(1) parametric model that predicts the impact of scheduling and power-cap decisions on node thermal and power state, without running a full simulation for each scheduling decision.

What the digital twin computes

For each managed node, the twin produces three scores stored in NodeTwin.status:

Signal	Range	Meaning
Power headroom	0-100	Remaining power budget before hitting thermal or PSU limits. Higher is better for new workload placement.
CoolingStress	0-100	Predicted percentage of cooling capacity in use. High values indicate the node is near its thermal limit.
PSUStress	0-100	Predicted percentage of PDU/rack power capacity in use. High values indicate the rack is near its power supply limit.

The twin also computes:

Hardware Modeling

This simulator section treats hardware modeling as a concept shared across the project rather than a simulator-only detail.

The canonical page is:

Hardware Modeling and Physical Power Model

Use that page for:

CPU and GPU model provenance
physical assumptions behind caps and slowdown
heterogeneous-node semantics
current limitations and calibration status

From the simulator point of view, the important relationship is simple:

the simulator implements the modeling assumptions documented there
the agent relies on the same hardware assumptions when interpreting caps and backend limits
simulator runtime pages describe how those models are exercised in experiments

For simulator-specific flow, continue with:

Policy Algorithms

This page documents the controller policy algorithms implemented in pkg/operator/policy/.

Use this page after:

Classification Input

Policy demand classification is derived from the joulie.io/workload-class pod annotation:

performance: pod carries joulie.io/workload-class: performance.
standard (default): no annotation or joulie.io/workload-class: standard.

Shared Reconcile Flow

Each reconcile tick:

Select eligible nodes from NODE_SELECTOR, excluding reserved and unschedulable nodes.
Build a hardware view from NodeHardware when available, otherwise from node labels/inventory fallback.
Sort eligible nodes by normalized compute density (highest first).
Preserve at least one performance-capable node per discovered hardware family whenever the requested HP count allows it.
Build a desired plan with the selected policy.
Apply downgrade guard (sets NodeTwin.status.schedulableClass to draining while blocking pods still run).
Write NodeTwin.spec and update the joulie.io/power-profile node label.

In other words, policies still decide how many high-performance nodes are needed, but the density-aware ordering influences which nodes get those assignments.
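
The interplay between the policy's count and the density ordering can be sketched for the static partition case. This is a simplified illustration: the function name is hypothetical, rounding the fraction up is an assumption, and the per-family preservation rule and the downgrade guard are deliberately left out.

```python
import math


def static_partition_plan(nodes, hp_frac=0.50):
    """Map node name to profile for a list of (name, density) pairs.

    The policy decides how many nodes are performance; the density
    ordering decides which ones. Rounding up is an assumption.
    """
    ordered = sorted(nodes, key=lambda n: n[1], reverse=True)
    # round() guards against float noise such as 0.1 * 30 = 3.0000000000000004.
    hp_count = math.ceil(round(hp_frac * len(ordered), 9))
    return {
        name: ("performance" if i < hp_count else "eco")
        for i, (name, _density) in enumerate(ordered)
    }
```

With four nodes and a fraction of 0.50, the two densest nodes receive the performance profile and the remaining two receive eco.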

Scheduler Extender

Joulie ships a scheduler extender that steers workloads toward appropriate nodes based on power profile, thermal stress, and hardware capabilities.

How a pod gets scheduled (end-to-end)

When a new pod is created in the cluster, the following sequence occurs:

1. Pod created (e.g., kubectl apply, Job controller, Deployment rollout)
 |
2. kube-scheduler picks up the unscheduled pod
 |
3. kube-scheduler runs its default filters (resource fits, taints, affinity)
 |
4. kube-scheduler calls Joulie's /filter endpoint
 | - Sends: pod spec + candidate node list
 | - Joulie reads pod annotation joulie.io/workload-class
 | - Performance pods: reject nodes with schedulableClass = eco or draining
 | - Standard pods: pass all nodes
 | - Returns: filtered node list + rejection reasons
 |
5. kube-scheduler calls Joulie's /prioritize endpoint
 | - Sends: pod spec + surviving node list
 | - Joulie reads NodeTwin CRs (cached, 30s TTL) for power state
 | - Joulie reads NodeHardware CRs (cached, 30s TTL) for hardware specs
 | - Joulie extracts pod CPU/GPU requests for marginal power estimation
 | - Joulie scores each node 0-100 using the scoring formula
 | - Returns: list of (node, score) pairs
 |
6. kube-scheduler combines Joulie scores with its own plugin scores
 |
7. Pod is bound to the highest-scoring node

The extender participates in steps 4 and 5 only. It does not replace the Kubernetes scheduler — it extends it with energy-aware filter and scoring logic.
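
The filter rule from step 4 can be sketched as follows. The function name, the input shape (a map from node name to schedulableClass), and the wording of the rejection reason are hypothetical; only the accept/reject rule itself comes from the sequence above.

```python
def filter_nodes(pod_class, schedulable_class_by_node):
    """Return (passed_nodes, rejection_reasons) for one pod.

    Performance pods reject nodes whose schedulableClass is eco or
    draining; standard pods pass every node.
    """
    passed, rejected = [], {}
    for node, sched_class in schedulable_class_by_node.items():
        if pod_class == "performance" and sched_class in ("eco", "draining"):
            rejected[node] = f"schedulableClass is {sched_class}"
        else:
            passed.append(node)
    return passed, rejected
```

A standard pod therefore always sees the full candidate list, and steering it toward or away from eco nodes is left to the scoring step.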

Simulator Metrics

This page documents Prometheus metrics exposed by the simulator (simulator/cmd/simulator/main.go).

Endpoint:

path: /metrics
address: simulator HTTP listen address (SIM_ADDR, default :18080)

Related debug endpoints (non-Prometheus):

/debug/nodes
/debug/events
/debug/energy

HTTP/request metrics

joulie_sim_requests_total{route,method,status} (counter)
- total HTTP requests by route/method/status
joulie_sim_request_duration_seconds{route,method} (histogram)
- request latency

Control-path metrics

joulie_sim_controls_total{node,action} (counter)
- received control actions by node/action
joulie_sim_control_actions_total{node,action,result} (counter)
- control action outcomes
- result: applied|blocked|error

Per-node simulated state metrics

joulie_sim_node_cap_watts{node} (gauge)
- current simulated effective cap
joulie_sim_node_rapl_cap_watts{node} (gauge)
- simulated RAPL cap value
joulie_sim_node_throttle_pct{node} (gauge)
- simulated DVFS throttle percent
joulie_sim_node_power_watts{node} (gauge)
- simulated exported node power
joulie_sim_node_cpu_util{node} (gauge)
- simulated CPU utilization
joulie_sim_node_freq_scale{node} (gauge)
- simulated frequency scale
joulie_sim_node_running_pods{node} (gauge)
- running pods observed on the node
joulie_sim_node_class_info{node,class} (gauge)
- class assignment marker (1 on active class)

Workload/job metrics

joulie_sim_job_submitted_total{class} (counter)
- jobs submitted by class
joulie_sim_job_completed_total{class,node} (counter)
- jobs completed by class and node
joulie_sim_job_completion_seconds (histogram)
- job completion latency distribution

Notes

Prometheus metrics capture online simulator state and request/control behavior.
Integrated node/cluster energy totals are exposed through /debug/energy (JSON), not as Prometheus time series in the current implementation.
Richer thermal and averaged-vs-instantaneous details are currently exposed through the HTTP telemetry/debug endpoints rather than as separate Prometheus gauges.
In particular, fields such as instantPackagePowerWatts, cpu.temperatureC, cpu.thermalThrottlePct, and per-device GPU averaged power live in /telemetry/{node} and /debug/nodes.

Energy-Aware Scheduling

Joulie’s scheduler extender makes placement decisions informed by real-time energy telemetry, workload characteristics, and facility-level power conditions. This page describes the full pipeline from metrics collection through scoring and optional rescheduling.

End-to-end pipeline

The energy-aware scheduling pipeline has five stages:

Kepler + RAPL/NVML telemetry
 -> Prometheus (scrape & store)
 -> Digital twin (NodeTwin.status)
 -> Scheduler extender (filter + score)
 -> Placement decision

Each stage runs independently and communicates through Kubernetes CRDs or Prometheus queries. There is no monolithic scheduling engine; each component does one thing and feeds the next.

Configuration Reference

Complete reference for all Joulie environment variables. These are set via Helm values or directly in the Deployment/DaemonSet manifests.

Defaults listed below are the code defaults. The Helm chart (charts/joulie/values.yaml) overrides some of them — notably, the operator NODE_SELECTOR defaults to joulie.io/managed=true in the chart even though the code default is node-role.kubernetes.io/worker.

Agent

Variable	Default	Description
`AGENT_MODE`	`daemonset`	`daemonset` (one agent per node) or `pool` (shared agents with sharding)
`NODE_NAME`	(required in daemonset mode)	Name of the node this agent manages
`RECONCILE_INTERVAL`	`20s`	How often the agent reconciles desired state
`METRICS_ADDR`	`:8080`	Address for the Prometheus metrics endpoint
`SIMULATE_ONLY`	`false`	If `true`, agent discovers hardware but does not apply power caps
`HARDWARE_CATALOG_PATH`	`simulator/catalog/hardware.yaml`	Path to the hardware inventory catalog YAML

Agent pool mode

Variable	Default	Description
`POOL_NODE_SELECTOR`	`joulie.io/managed=true`	Label selector for nodes managed by pool agents
`POOL_SHARDS`	`1`	Total number of shards for pool mode partitioning
`POOL_SHARD_ID`	(from pod ordinal)	Shard ID for this agent instance

Agent DVFS control

Variable	Default	Description
`DVFS_EMA_ALPHA`	`0.3`	Exponential moving average smoothing factor for power tracking
`DVFS_HIGH_MARGIN_W`	`10.0`	Power above cap (watts) to trigger frequency reduction
`DVFS_LOW_MARGIN_W`	`15.0`	Power below cap (watts) to trigger frequency increase
`DVFS_STEP_PCT`	`10`	Frequency throttle step size (%)
`DVFS_COOLDOWN`	`20s`	Minimum duration between DVFS adjustments
`DVFS_TRIP_COUNT`	`2`	Consecutive samples outside margin before acting
`DVFS_MIN_FREQ_KHZ`	`1500000`	Floor frequency for DVFS throttling (kHz)
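
To show how these settings might interact, here is a minimal hysteresis controller sketch. The parameter names mirror the variables in the table, but the control logic itself is an assumption based on their descriptions, not the agent's actual implementation; the cooldown timer and the minimum-frequency floor are omitted for brevity.

```python
class DvfsController:
    """Hysteresis sketch; defaults mirror the DVFS_* code defaults."""

    def __init__(self, cap_watts, alpha=0.3, high_margin_w=10.0,
                 low_margin_w=15.0, step_pct=10, trip_count=2):
        self.cap_watts = cap_watts
        self.alpha = alpha
        self.high_margin_w = high_margin_w
        self.low_margin_w = low_margin_w
        self.step_pct = step_pct
        self.trip_count = trip_count
        self.ema = None        # smoothed power, seeded by the first sample
        self.throttle_pct = 0  # 0 means no frequency reduction
        self.above = 0         # consecutive samples above cap + high margin
        self.below = 0         # consecutive samples below cap - low margin

    def observe(self, power_watts):
        """Feed one power sample and return the resulting throttle percent."""
        if self.ema is None:
            self.ema = power_watts
        else:
            self.ema = self.alpha * power_watts + (1 - self.alpha) * self.ema

        if self.ema > self.cap_watts + self.high_margin_w:
            self.above, self.below = self.above + 1, 0
        elif self.ema < self.cap_watts - self.low_margin_w:
            self.above, self.below = 0, self.below + 1
        else:
            self.above = self.below = 0  # inside the dead band

        if self.above >= self.trip_count:
            self.throttle_pct = min(100, self.throttle_pct + self.step_pct)
            self.above = 0
        elif self.below >= self.trip_count:
            self.throttle_pct = max(0, self.throttle_pct - self.step_pct)
            self.below = 0
        return self.throttle_pct
```

The trip counters and the dead band between the two margins keep the controller from oscillating on short power spikes, which is the usual reason for pairing an EMA with separate high and low thresholds.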

Agent telemetry and control backends

Variable	Default	Description
`TELEMETRY_CPU_SOURCE`	`host`	CPU telemetry source: `host`, `http`, `prometheus`, `none`
`TELEMETRY_CPU_CONTROL`	`host`	CPU control backend: `host`, `http`, `none`
`TELEMETRY_GPU_CONTROL`	`host`	GPU control backend: `host`, `http`, `none`
`TELEMETRY_CPU_HTTP_ENDPOINT`	(empty)	HTTP endpoint for CPU telemetry (e.g., `http://sim:18080/telemetry/{node}`)
`TELEMETRY_CPU_CONTROL_HTTP_ENDPOINT`	(empty)	HTTP endpoint for CPU control (e.g., `http://sim:18080/control/{node}`)
`TELEMETRY_CPU_CONTROL_MODE`	(empty)	CPU control mode override
`TELEMETRY_GPU_CONTROL_HTTP_ENDPOINT`	(empty)	HTTP endpoint for GPU control (e.g., `http://sim:18080/control/{node}`)
`TELEMETRY_GPU_CONTROL_MODE`	(empty)	GPU control mode override
`TELEMETRY_HTTP_TIMEOUT_SECONDS`	`5`	HTTP client timeout for telemetry/control requests

Operator

Variable	Default	Description
`RECONCILE_INTERVAL`	`1m`	How often the operator reconciles cluster state
`METRICS_ADDR`	`:8081`	Address for the Prometheus metrics endpoint
`NODE_SELECTOR`	`node-role.kubernetes.io/worker`	Label selector for managed nodes
`RESERVED_LABEL_KEY`	`joulie.io/reserved`	Label key for nodes excluded from policy decisions
`POWER_PROFILE_LABEL`	`joulie.io/power-profile`	Node label key for the active power profile
`OPERATOR_NODE_POWER_SOURCE`	`static`	Node power data source: `static`, `http`, `prometheus`
`OPERATOR_NODE_POWER_HTTP_ENDPOINT`	(empty)	HTTP endpoint for per-node power readings
`OPERATOR_NODE_POWER_PROMETHEUS_ADDRESS`	(empty)	Prometheus address for per-node power queries
`OPERATOR_NODE_POWER_PROMETHEUS_QUERY`	(empty)	PromQL query for per-node power readings

Power cap configuration

Variable	Default	Description
`PERFORMANCE_CAP_WATTS`	`5000`	Absolute CPU power cap for performance nodes (watts)
`ECO_CAP_WATTS`	`120`	Absolute CPU power cap for eco nodes (watts)
`CPU_PERFORMANCE_CAP_PCT_OF_MAX`	`100`	CPU cap as percentage of max for performance nodes
`CPU_ECO_CAP_PCT_OF_MAX`	`60`	CPU cap as percentage of max for eco nodes
`CPU_WRITE_ABSOLUTE_CAPS`	`false`	If `true`, write absolute watts instead of percentage
`GPU_PERFORMANCE_CAP_PCT_OF_MAX`	`100`	GPU cap as percentage of max for performance nodes
`GPU_ECO_CAP_PCT_OF_MAX`	`60`	GPU cap as percentage of max for eco nodes
`GPU_WRITE_ABSOLUTE_CAPS`	`false`	If `true`, write absolute GPU watts instead of percentage
`GPU_MODEL_CAPS_JSON`	`{}`	JSON map of GPU model name to `{"minCapWatts": N, "maxCapWatts": M}`
`GPU_PRODUCT_LABEL_KEYS`	`joulie.io/gpu.product,...`	Comma-separated node label keys to read GPU product name

Policy configuration

Variable	Default	Description
`POLICY_TYPE`	`static_partition`	Policy algorithm: `static_partition`, `queue_aware_v1`, or `rule_swap_v1`
`STATIC_HP_FRAC`	`0.50`	Fraction of nodes allocated to performance in `static_partition`
`QUEUE_HP_BASE_FRAC`	`0.60`	Base fraction of performance nodes in `queue_aware_v1`
`QUEUE_HP_MIN`	`1`	Minimum performance nodes in `queue_aware_v1`
`QUEUE_HP_MAX`	`1000000`	Maximum performance nodes in `queue_aware_v1`
`QUEUE_PERF_PER_HP_NODE`	`10`	Performance pods per performance node ratio in `queue_aware_v1`
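
The queue-aware settings suggest a count that grows with performance demand and is clamped between a minimum and a maximum. The formula below is a guess built from the variable descriptions alone, so treat it as a hypothetical reading rather than the queue_aware_v1 algorithm; the function name is also an assumption.

```python
import math


def queue_aware_hp_count(total_nodes, perf_pods, base_frac=0.60,
                         hp_min=1, hp_max=1_000_000, perf_per_hp_node=10):
    """Estimate how many nodes should run the performance profile.

    Assumed formula: take the larger of the base share and the
    demand-driven count, then clamp to [hp_min, hp_max] and to the
    number of available nodes.
    """
    base = math.ceil(round(base_frac * total_nodes, 9))
    demand = math.ceil(perf_pods / perf_per_hp_node)
    wanted = max(base, demand)
    clamped = max(hp_min, min(hp_max, wanted))
    return max(0, min(total_nodes, clamped))
```

Under this reading, the base fraction sets a floor on performance capacity, and a growing backlog of performance pods raises the count until the cluster size or QUEUE_HP_MAX caps it.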

Facility metrics

Variable	Default	Description
`ENABLE_FACILITY_METRICS`	`false`	Enable polling data-center-level metrics from Prometheus
`FACILITY_PROMETHEUS_ADDRESS`	`http://prometheus-operated.monitoring:9090`	Prometheus endpoint for facility metric queries
`FACILITY_POLL_INTERVAL`	`30s`	How often facility metrics are polled
`FACILITY_AMBIENT_TEMP_METRIC`	`datacenter_ambient_temperature_celsius`	PromQL metric name for ambient temperature
`FACILITY_IT_POWER_METRIC`	`datacenter_total_it_power_watts`	PromQL metric name for total IT power draw
`FACILITY_COOLING_POWER_METRIC`	`datacenter_cooling_power_watts`	PromQL metric name for cooling infrastructure power
`FACILITY_ZONE_AMBIENT_METRIC_TEMPLATE`	(empty)	PromQL template for per-zone ambient temperature, e.g. `datacenter_ambient_temperature_celsius{zone="%s"}`. Use `%s` as the zone name placeholder. Empty = disabled. (planned — not yet wired to env vars)
`FACILITY_RACK_POWER_METRIC_TEMPLATE`	(empty)	PromQL template for per-rack power draw, e.g. `datacenter_rack_power_watts{rack="%s"}`. Use `%s` as the rack name placeholder. Empty = disabled. (planned — not yet wired to env vars)

Node topology

Joulie supports optional per-rack PSU stress and per-zone cooling stress. This is activated by adding standard node labels:

Input Telemetry and Actuation Interfaces

This page describes runtime IO contracts:

how Joulie reads telemetry inputs,
how Joulie sends control intents.

If you want the CRD-level summary first, read CRD and Policy Model. This page is the detailed runtime reference for the telemetry and control contract.

It is not the /metrics exposition contract. For exported metrics, see Metrics Reference.

Why this abstraction exists

Joulie must run in two worlds with the same control logic:

real hardware clusters,
simulator/KWOK clusters.

So agent/operator logic depends on provider interfaces, not directly on sysfs or simulator HTTP shape.

Metrics Reference

Joulie exposes Prometheus metrics from multiple components.

This page covers operator, agent, and scheduler extender metrics. Simulator metrics are documented separately in:

Simulator Metrics

For telemetry/control input interfaces (host/http routing), see:

Input Telemetry and Actuation Interfaces

Endpoints by component

Agent:
- path: /metrics
- default address: :8080
- env override: METRICS_ADDR
Operator:
- path: /metrics
- default address: :8081
- env override: METRICS_ADDR
Scheduler extender:
- path: /metrics
- default address: :9877
- env override: METRICS_ADDR

Agent metrics

Backend and selected cap

joulie_backend_mode{node,mode} (gauge)
- mode: none|rapl|dvfs
- active mode is 1, others 0
joulie_policy_cap_watts{node,policy} (gauge)
- current selected policy cap in watts

RAPL power/energy

joulie_rapl_energy_uj{node,zone} (gauge)
- latest raw RAPL energy counter in microjoules
joulie_rapl_estimated_power_watts{node,zone} (gauge)
- per-zone estimated power from energy deltas
joulie_rapl_package_total_power_watts{node} (gauge)
- sum of package-level estimated power
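
Estimating power from energy deltas amounts to dividing the counter difference by the elapsed time. The sketch below also handles counter wraparound using the zone's maximum range (exposed by the Linux powercap interface as max_energy_range_uj); the function name and the exact wraparound handling are assumptions, not a copy of the agent's code.

```python
def estimate_power_watts(prev_uj, curr_uj, dt_seconds, max_range_uj):
    """Estimate average power from two RAPL energy readings in microjoules."""
    if dt_seconds <= 0:
        raise ValueError("dt_seconds must be positive")
    delta_uj = curr_uj - prev_uj
    if delta_uj < 0:
        # The counter wrapped around between the two samples.
        delta_uj += max_range_uj
    return delta_uj / 1e6 / dt_seconds
```

Summing the per-zone results for package-level zones gives a value comparable to the package total metric.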

DVFS controller

joulie_dvfs_observed_power_watts{node} (gauge)
- observed package power used by DVFS controller
joulie_dvfs_ema_power_watts{node} (gauge)
- EMA-smoothed power used for decisions
joulie_dvfs_throttle_pct{node} (gauge)
- current throttle percentage
joulie_dvfs_above_trip_count{node} (gauge)
- consecutive above-threshold samples
joulie_dvfs_below_trip_count{node} (gauge)
- consecutive below-threshold samples
joulie_dvfs_actions_total{node,action} (counter)
- action: throttle_up|throttle_down

CPU frequency observability

joulie_dvfs_cpu_cur_freq_khz{node,cpu} (gauge)
- current CPU/policy frequency in kHz
joulie_dvfs_cpu_max_freq_khz{node,cpu} (gauge)
- enforced max frequency cap in kHz

Reliability

joulie_reconcile_errors_total{node} (counter)
- reconcile-loop errors

Operator metrics

FSM state and profile label

joulie_operator_node_state{node,state} (gauge)
- state: ActivePerformance|DrainingPerformance|ActiveEco
- active state is 1, others 0
joulie_operator_node_profile_label{node,profile} (gauge)
- operator-applied node label view
- profile: performance|eco
- active profile is 1, others 0

Transition accounting

joulie_operator_state_transitions_total{node,from_state,to_state,result} (counter)
- transition events emitted by operator
- result:
  - applied: transition committed
  - deferred: transition blocked/deferred by safeguards

Heterogeneous planning

joulie_operator_node_compute_density{node,component} (gauge)
- normalized per-node density signal used for heterogeneous planning
- component: cpu|gpu
- higher values mean the operator considers that node relatively denser for that subsystem

Scheduler extender metrics

Request counters

joulie_scheduler_filter_requests_total{workload_class} (counter)
- total filter requests by workload class
- workload_class: standard|performance
joulie_scheduler_prioritize_requests_total{workload_class} (counter)
- total prioritize (scoring) requests by workload class

Request latency

joulie_scheduler_filter_duration_seconds{workload_class} (histogram)
- time to process a filter request
joulie_scheduler_prioritize_duration_seconds{workload_class} (histogram)
- time to process a prioritize request

Scoring signals

joulie_scheduler_final_node_score{node,workload_class} (gauge)
- final scheduling score (0-100) for each node and workload class
- updated on every prioritize call; reflects the combined headroom + cooling + trend + bonus formula
joulie_scheduler_node_headroom_score{node} (gauge)
- power headroom score per node
- can go negative when projected power (measured + pod marginal) exceeds the capped budget

Data freshness

joulie_scheduler_stale_twin_data{node} (gauge)
- 1 if the NodeTwin status is older than the staleness threshold (default 5m), 0 otherwise
- a node with stale data receives a neutral score (50) instead of its computed value
- useful for alerting when the operator has stopped updating twin status
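
The staleness rule can be sketched as a small wrapper around the computed score. The 5 minute threshold and the neutral score of 50 come from the description above; the function name and its signature are hypothetical.

```python
from datetime import datetime, timedelta, timezone

NEUTRAL_SCORE = 50.0


def effective_score(computed_score, last_update, now,
                    threshold=timedelta(minutes=5)):
    """Return (score, stale_flag) for one node.

    A node whose twin status is older than the threshold gets the
    neutral score instead of its computed value.
    """
    stale = (now - last_update) > threshold
    return (NEUTRAL_SCORE if stale else computed_score), int(stale)
```

The second element of the result corresponds to the 0/1 value that the gauge reports for the node.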

Notes

Metrics are pull-based; values depend on scrape interval.
The highest-cardinality series are usually the per-CPU frequency gauges.

CPU-Only Benchmark

This page reports results from the CPU-only cluster benchmark experiment (KWOK, 40 nodes):

experiments/01-cpu-only-benchmark/

Scope

The benchmark compares three configurations on a CPU-only cluster with 40 KWOK nodes across 3 hardware families, running on a real Kind+KWOK Kubernetes cluster:

A: Simulator only (no power management)
B: Joulie with static partition policy
C: Joulie with queue-aware dynamic policy

The experiment demonstrates energy savings achievable through CPU RAPL capping alone, without GPU complexity.

Heterogeneous GPU Cluster Benchmark

This page reports results from the heterogeneous GPU cluster benchmark experiment (KWOK, 41 nodes):

experiments/02-heterogeneous-benchmark/

Scope

The benchmark compares three configurations on a heterogeneous GPU cluster mixing 5 distinct GPU hardware families plus CPU-only nodes, running on a real Kind+KWOK Kubernetes cluster:

A: Simulator only (no power management)
B: Joulie with static partition policy
C: Joulie with queue-aware dynamic policy

The experiment demonstrates energy savings achievable through combined CPU (RAPL) and GPU power capping on a mixed-vendor GPU fleet.

Homogeneous H100 NVL Benchmark

This page reports results from the homogeneous H100 NVL cluster benchmark experiment (KWOK, 41 nodes):

experiments/03-homogeneous-h100-benchmark/

Scope

The benchmark compares three configurations on a homogeneous GPU cluster with 33 identical NVIDIA H100 NVL nodes plus 8 CPU-only nodes, running on a real Kind+KWOK Kubernetes cluster:

A: Simulator only (no power management)
B: Joulie with static partition policy
C: Joulie with queue-aware dynamic policy

Hypothesis

Joulie performs better on a homogeneous cluster because every GPU node can accept any GPU job, eliminating the vendor/product-specific placement constraints that restrict policy flexibility in the heterogeneous case.

Scoring Formula Validation

This page reports results from the energy-aware scheduling formula validation experiment:

experiments/04-scoring-formula-validation/

Objective

Validate Joulie’s energy-aware scheduling formula by demonstrating that power-aware scheduling improves energy efficiency compared to standard Kubernetes bin-packing (MostAllocated). Cooling and PUE are computed with a Modelica FMU (DXCooled Airside Economizer) so that the results are physically accurate.

Two scales were tested:

Small cluster (28 nodes) — formula tuning and component selection
Large cluster (2,500 nodes) — production-scale validation with H100 GPUs

The experiment also validated the evolution from a legacy multi-component formula to the current streamlined Joulie scoring formula.