Architecture on Joulie

CRD and Policy Model

Mon, 01 Jan 0001 00:00:00 +0000

CRD

The implemented APIs are:

Group: joulie.io
Version: v1alpha1
NodePowerProfile (nodepowerprofiles, cluster-scoped) for operator-assigned per-node desired state
TelemetryProfile (telemetryprofiles, cluster-scoped) for telemetry source configuration (node + cluster scope)

CRD files:

config/crd/bases/joulie.io_nodepowerprofiles.yaml
config/crd/bases/joulie.io_telemetryprofiles.yaml
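
As an illustration of how the names above relate, the following minimal sketch derives the CRD object name, the cluster-scoped list path, and the CRD file path from group, version, and plural. The `resource` type and its methods are hypothetical helpers for this example, not part of the Joulie codebase. The `<plural>.<group>` name and the `/apis/<group>/<version>/<plural>` path follow standard Kubernetes conventions.

```go
package main

import "fmt"

// resource is a hypothetical helper type describing one API resource.
type resource struct {
	Group, Version, Kind, Plural string
}

// crdName returns the CustomResourceDefinition object name, which Kubernetes
// requires to have the form "<plural>.<group>".
func (r resource) crdName() string { return r.Plural + "." + r.Group }

// listPath returns the REST path for listing a cluster-scoped resource.
func (r resource) listPath() string {
	return "/apis/" + r.Group + "/" + r.Version + "/" + r.Plural
}

// crdFile mirrors the file naming listed under config/crd/bases.
func (r resource) crdFile() string {
	return "config/crd/bases/" + r.Group + "_" + r.Plural + ".yaml"
}

// joulieResources lists the two APIs named in this section.
var joulieResources = []resource{
	{Group: "joulie.io", Version: "v1alpha1", Kind: "NodePowerProfile", Plural: "nodepowerprofiles"},
	{Group: "joulie.io", Version: "v1alpha1", Kind: "TelemetryProfile", Plural: "telemetryprofiles"},
}

func main() {
	for _, r := range joulieResources {
		fmt.Println(r.Kind, r.crdName(), r.listPath(), r.crdFile())
	}
}
```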

Conceptual model (next step)

Policy should be modeled as a cluster-wide mapping:

input: cluster context at time t
output: node -> power state

Minimal states:

ActivePerformance (mapped to profile performance)
ActiveEco (mapped to profile eco)

Initial implementation should remain rule-based and deterministic. Future implementations can be telemetry-driven or model-driven.
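
The mapping described above can be sketched as a function type from cluster context to per-node power state. This is a minimal sketch under stated assumptions: `ClusterContext`, its fields, `Policy`, and `fixedCountPolicy` are hypothetical names chosen for illustration, and the rule shown (keep a fixed number of nodes in performance) is only one example of a deterministic rule.

```go
package main

import (
	"fmt"
	"sort"
)

// PowerState is one of the minimal states named in the text.
type PowerState string

const (
	ActivePerformance PowerState = "ActivePerformance"
	ActiveEco         PowerState = "ActiveEco"
)

// profileFor maps each state to the profile name given in the text.
var profileFor = map[PowerState]string{
	ActivePerformance: "performance",
	ActiveEco:         "eco",
}

// ClusterContext is a hypothetical snapshot of the cluster at time t.
type ClusterContext struct {
	Nodes            []string
	PerformanceNodes int // hypothetical rule input: nodes kept in performance
}

// Policy is the cluster-wide mapping: context in, node -> power state out.
type Policy func(ctx ClusterContext) map[string]PowerState

// fixedCountPolicy is an illustrative rule-based policy. Sorting node names
// first makes the result independent of input order, hence deterministic.
func fixedCountPolicy(ctx ClusterContext) map[string]PowerState {
	nodes := append([]string(nil), ctx.Nodes...)
	sort.Strings(nodes)
	out := make(map[string]PowerState, len(nodes))
	for i, n := range nodes {
		if i < ctx.PerformanceNodes {
			out[n] = ActivePerformance
		} else {
			out[n] = ActiveEco
		}
	}
	return out
}

func main() {
	var policy Policy = fixedCountPolicy
	ctx := ClusterContext{Nodes: []string{"node-c", "node-a", "node-b"}, PerformanceNodes: 1}
	states := policy(ctx)
	for _, n := range []string{"node-a", "node-b", "node-c"} {
		fmt.Println(n, states[n], profileFor[states[n]])
	}
}
```

Keeping the policy behind a single function type means a telemetry-driven or model-driven implementation could later replace the rule without changing the callers.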

Input Telemetry and Actuation Interfaces

This document defines Joulie’s internal input interfaces for telemetry and control.

Important distinction:

this is not about Prometheus metrics exposed by Joulie,
this is about how Joulie components consume input data and apply controls.

Goals

Run against real hardware in bare-metal clusters.
Run in virtual/simulated clusters (kind/kwok) with the same control logic.
Keep APIs generic enough for CPU + GPU + future signals.
Avoid policy/API churn when moving from rule-based to data-driven policies.

Core model (simple)

Joulie uses:

Metrics Reference

The Joulie agent exposes Prometheus metrics on /metrics (default :8080).

This document covers only exported observability metrics. For input telemetry and control interfaces (real hardware vs simulated HTTP), see:

Input Telemetry and Actuation Interfaces

Endpoint

Path: /metrics
Address: METRICS_ADDR env var (default :8080)
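
A minimal sketch of resolving the listen address from the environment is shown below. The `metricsAddr` helper is hypothetical, and treating an empty `METRICS_ADDR` the same as an unset one is an assumption of this example rather than documented agent behavior.

```go
package main

import (
	"fmt"
	"os"
)

// defaultMetricsAddr is the documented default listen address.
const defaultMetricsAddr = ":8080"

// metricsAddr returns METRICS_ADDR if set, otherwise the default.
// Assumption: an empty value is treated as unset.
// The getenv parameter makes the lookup easy to test without touching
// the real process environment.
func metricsAddr(getenv func(string) string) string {
	if v := getenv("METRICS_ADDR"); v != "" {
		return v
	}
	return defaultMetricsAddr
}

func main() {
	fmt.Printf("serving /metrics on %s\n", metricsAddr(os.Getenv))
}
```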

Backend/Policy

joulie_backend_mode{node,mode} (gauge)
- mode values: none, rapl, dvfs
- Active mode has value 1, others 0
joulie_policy_cap_watts{node,policy} (gauge)
- Current selected policy cap in watts
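
To make the one-hot encoding of joulie_backend_mode concrete, the sketch below renders the series for one node in the Prometheus text exposition format. It uses only the standard library for illustration; `backendModeLines` is a hypothetical helper and says nothing about how the agent actually registers its metrics.

```go
package main

import (
	"fmt"
	"strings"
)

// backendModes lists the documented values of the mode label.
var backendModes = []string{"none", "rapl", "dvfs"}

// backendModeLines renders one sample per mode: 1 for the active mode,
// 0 for the others. %q quoting matches Prometheus label syntax for plain
// ASCII names; arbitrary label values would need proper escaping.
func backendModeLines(node, active string) string {
	var b strings.Builder
	for _, m := range backendModes {
		v := 0
		if m == active {
			v = 1
		}
		fmt.Fprintf(&b, "joulie_backend_mode{node=%q,mode=%q} %d\n", node, m, v)
	}
	return b.String()
}

func main() {
	// "n1" is a hypothetical node name.
	fmt.Print(backendModeLines("n1", "rapl"))
}
```

Because exactly one series per node has value 1, a query that filters on value 1 yields the active mode as a label.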

The joulie_policy_cap_watts metric can also be used to derive policy states in Grafana:

Operator Notes

Target concept

Joulie should evolve into a centralized operator that owns the global optimization loop.

At each control step (for example every minute), the operator:

Reads cluster-wide context.
Decides node-to-power-profile assignments.
Writes desired per-node state.
Monitors outcomes and re-plans.
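
The four steps above can be sketched as a single control step with injected functions. All names here (`Operator`, `Assignment`, `Step`, and the function fields) are hypothetical, and the cluster context is reduced to a list of node names to keep the example small.

```go
package main

import "fmt"

// Assignment maps node name -> power profile.
type Assignment map[string]string

// Operator bundles the four steps as function fields (all hypothetical),
// so each one can be swapped or faked independently.
type Operator struct {
	Read    func() []string                 // read cluster-wide context
	Decide  func(nodes []string) Assignment // decide node-to-profile assignments
	Write   func(a Assignment) error        // write desired per-node state
	Monitor func(a Assignment)              // observe outcomes for the next step
}

// Step runs one control iteration: read, decide, write, monitor.
func (o Operator) Step() error {
	ctx := o.Read()
	plan := o.Decide(ctx)
	if err := o.Write(plan); err != nil {
		return fmt.Errorf("write desired state: %w", err)
	}
	o.Monitor(plan)
	return nil
}

func main() {
	op := Operator{
		Read: func() []string { return []string{"node-a", "node-b"} },
		Decide: func(nodes []string) Assignment {
			a := Assignment{}
			for _, n := range nodes {
				a[n] = "eco"
			}
			return a
		},
		Write:   func(a Assignment) error { return nil },
		Monitor: func(a Assignment) { fmt.Println("applied", len(a), "assignments") },
	}
	// A long-running operator would call Step on a timer, for example
	// once per minute; a single call is enough to show the flow.
	if err := op.Step(); err != nil {
		fmt.Println("step failed:", err)
	}
}
```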

States start simple:

ActivePerformance (mapped to profile performance): unconstrained / HPC-oriented.
ActiveEco (mapped to profile eco): constrained / energy-saving.

Control responsibility boundary

The operator is the control-plane brain. The agent is an actuator/telemetry component.

Policy Algorithms

This page documents the controller policy algorithms implemented in cmd/operator/main.go.

Classification Input

Policy demand classification is derived from pod scheduling constraints on joulie.io/power-profile:

performance-only: pod can run only on performance/draining-performance.
eco-only: pod can run only on eco.
general (implicit unconstrained): no explicit power-profile constraint, or both profiles allowed.
unknown: unsupported/ambiguous constraint shape.

For safety, unknown is treated as performance-sensitive in downgrade guards.
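
The classification rules above can be sketched as follows. This is not the code in cmd/operator/main.go: the input is simplified to "the set of joulie.io/power-profile values the pod's constraints allow", and `classify`, `DemandClass`, and `performanceSensitive` are hypothetical names. Treating any unrecognized value as unknown is an assumption of this sketch.

```go
package main

import "fmt"

// DemandClass is the policy demand class of a pod.
type DemandClass string

const (
	PerformanceOnly DemandClass = "performance-only"
	EcoOnly         DemandClass = "eco-only"
	General         DemandClass = "general"
	Unknown         DemandClass = "unknown"
)

// classify derives the class from the allowed power-profile values.
// constrained is false when the pod has no explicit power-profile constraint.
func classify(constrained bool, allowed []string) DemandClass {
	if !constrained {
		return General
	}
	var perf, eco bool
	for _, v := range allowed {
		switch v {
		case "performance", "draining-performance":
			perf = true
		case "eco":
			eco = true
		default:
			return Unknown // assumption: unrecognized values are ambiguous
		}
	}
	switch {
	case perf && eco:
		return General
	case perf:
		return PerformanceOnly
	case eco:
		return EcoOnly
	default:
		return Unknown // constrained, but no profile is allowed
	}
}

// performanceSensitive reports whether downgrade guards should protect
// the pod; unknown is treated as performance-sensitive for safety.
func performanceSensitive(c DemandClass) bool {
	return c == PerformanceOnly || c == Unknown
}

func main() {
	c := classify(true, []string{"performance"})
	fmt.Println(c, performanceSensitive(c))
}
```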

Shared Reconcile Flow

Each reconcile tick:

Select eligible nodes from NODE_SELECTOR, excluding reserved and unschedulable nodes.
Build a desired plan with the selected policy.
Apply downgrade guard (can convert planned eco to draining-performance/performance).
Write NodePowerProfile and update node label joulie.io/power-profile.
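
The reconcile steps above can be sketched end to end as shown below. This is a simplified illustration, not the operator's implementation: nodes are assumed to be already matched by NODE_SELECTOR, the `Sensitive` field is a hypothetical stand-in for "the node runs performance-sensitive pods", and the guard shown only converts eco to draining-performance. The `write` callback stands in for updating NodePowerProfile and the node label.

```go
package main

import "fmt"

// Node is a hypothetical, reduced view of a cluster node.
type Node struct {
	Name          string
	Reserved      bool
	Unschedulable bool
	Sensitive     bool // assumption: runs performance-only or unknown pods
}

// eligible drops reserved and unschedulable nodes.
func eligible(nodes []Node) []Node {
	var out []Node
	for _, n := range nodes {
		if !n.Reserved && !n.Unschedulable {
			out = append(out, n)
		}
	}
	return out
}

// downgradeGuard converts a planned eco into draining-performance when the
// node still hosts performance-sensitive work (assumed guard condition).
func downgradeGuard(n Node, planned string) string {
	if planned == "eco" && n.Sensitive {
		return "draining-performance"
	}
	return planned
}

// reconcile runs one tick: select, plan, guard, write.
func reconcile(nodes []Node, plan func([]Node) map[string]string, write func(node, profile string)) {
	selected := eligible(nodes)
	desired := plan(selected)
	for _, n := range selected {
		profile, ok := desired[n.Name]
		if !ok {
			continue // the policy made no decision for this node
		}
		write(n.Name, downgradeGuard(n, profile))
	}
}

func main() {
	nodes := []Node{{Name: "a", Sensitive: true}, {Name: "b", Reserved: true}, {Name: "c"}}
	allEco := func(ns []Node) map[string]string {
		m := map[string]string{}
		for _, n := range ns {
			m[n.Name] = "eco"
		}
		return m
	}
	reconcile(nodes, allEco, func(node, profile string) {
		fmt.Printf("%s -> %s\n", node, profile)
	})
}
```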

`static_partition`

Goal: deterministic fixed HP/LP split.