<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Joulie</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/</link><description>Recent content on Joulie</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/index.xml" rel="self" type="application/rss+xml"/><item><title>Core Concepts</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/getting-started/core-concepts/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/getting-started/core-concepts/</guid><description>&lt;p&gt;Before installing Joulie, understand the control model.&lt;/p&gt;
&lt;h2 id="what-joulie-is"&gt;What Joulie is&lt;/h2&gt;
&lt;p&gt;Joulie is a Kubernetes-native energy management system that uses &lt;strong&gt;per-node digital twins&lt;/strong&gt; to optimize data center power consumption.&lt;/p&gt;
&lt;p&gt;It continuously ingests telemetry from every node (CPU/GPU power draw via RAPL and NVML/DCGM, per-pod resource utilization via cAdvisor, and optional energy counters from &lt;a href="https://github.com/sustainable-computing-io/kepler"&gt;Kepler&lt;/a&gt;) to maintain an up-to-date model of each node&amp;rsquo;s thermal and power state.&lt;/p&gt;
&lt;p&gt;These per-node digital twins drive two outcomes:&lt;/p&gt;</description></item><item><title>Installation</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/installation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/installation/</guid><description>&lt;p&gt;This page covers how to install the Joulie simulator in a Kubernetes cluster.&lt;/p&gt;
&lt;h2 id="prerequisites"&gt;Prerequisites&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;A running Kubernetes cluster (real or &lt;a href="https://kind.sigs.k8s.io/"&gt;kind&lt;/a&gt; for local development)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubectl&lt;/code&gt; configured for the target cluster&lt;/li&gt;
&lt;li&gt;&lt;code&gt;helm&lt;/code&gt; v3+ (for Helm installation)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="install-via-helm-recommended"&gt;Install via Helm (recommended)&lt;/h2&gt;
&lt;p&gt;The simulator is published as an OCI Helm chart. Install it with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;helm install joulie-sim oci://registry.cern.ch/mbunino/joulie/joulie-sim &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -n joulie-system --create-namespace
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To customize values, download the default values first:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;helm show values oci://registry.cern.ch/mbunino/joulie/joulie-sim &amp;gt; values.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then install with overrides:&lt;/p&gt;</description></item><item><title>Quickstart</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/getting-started/quickstart/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/getting-started/quickstart/</guid><description>&lt;p&gt;This page is the fastest path to run Joulie.
For conceptual context first, read &lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/getting-started/core-concepts/"&gt;Core Concepts&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="prerequisites"&gt;Prerequisites&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Kubernetes cluster with worker nodes&lt;/li&gt;
&lt;li&gt;Node Feature Discovery (NFD) deployed&lt;/li&gt;
&lt;li&gt;Optional for real enforcement: nodes exposing writable power interfaces
&lt;ul&gt;
&lt;li&gt;RAPL power limit files, or&lt;/li&gt;
&lt;li&gt;cpufreq sysfs interfaces&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="install-from-release-recommended"&gt;Install from release (recommended)&lt;/h2&gt;
&lt;p&gt;Install directly from the OCI chart release:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;helm upgrade --install joulie oci://registry.cern.ch/mbunino/joulie/joulie &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --version &amp;lt;version&amp;gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -n joulie-system &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --create-namespace &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -f values/joulie.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="label-nodes-managed-by-the-operator"&gt;Label nodes managed by the operator&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: Joulie targets only nodes that carry a specific label and ignores
all others. By default, the installation does not auto-select nodes.
Default expected selector value is:&lt;/p&gt;</description></item><item><title>Pod Compatibility for Joulie</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/getting-started/workload-compatibility/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/getting-started/workload-compatibility/</guid><description>&lt;p&gt;Joulie uses a single pod annotation to express workload placement intent:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;joulie.io/workload-class: performance | standard
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The scheduler extender reads this annotation and steers pods accordingly. No node affinity rules are needed.&lt;/p&gt;
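&lt;p&gt;As a minimal sketch, the annotation sits under &lt;code&gt;metadata.annotations&lt;/code&gt; in the pod manifest. The pod name, container name, and image below are hypothetical placeholders; only the annotation key and its two values come from this page.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: v1
kind: Pod
metadata:
  name: example-training-pod                      # hypothetical pod name
  annotations:
    joulie.io/workload-class: performance         # or: standard
spec:
  containers:
    - name: main                                  # hypothetical container name
      image: registry.example.com/trainer:latest  # hypothetical image
&lt;/code&gt;&lt;/pre&gt;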
&lt;h2 id="workload-classes"&gt;Workload classes&lt;/h2&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Class&lt;/th&gt;
 &lt;th&gt;Behavior&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;performance&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Must run on full-power nodes. The extender hard-rejects eco nodes.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;standard&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Default. Can run on any node. Adaptive scoring steers these pods toward eco nodes when performance nodes are congested.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If no annotation is present, the extender treats it as &lt;code&gt;standard&lt;/code&gt;.&lt;/p&gt;</description></item><item><title>Agent Runtime Modes</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/getting-started/daemonset/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/getting-started/daemonset/</guid><description>&lt;p&gt;The agent supports two runtime modes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;daemonset&lt;/code&gt;: real-hardware mode, one pod per real node.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pool&lt;/code&gt;: simulation mode, one pod hosts many logical per-node controllers.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Chart templates:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;charts/joulie/templates/agent-daemonset.yaml&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;charts/joulie/templates/agent-statefulset.yaml&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="daemonset-mode-real-hardware"&gt;DaemonSet mode (real hardware)&lt;/h2&gt;
&lt;h3 id="required-runtime-settings"&gt;Required runtime settings&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;securityContext.privileged: true&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Host mount:
&lt;ul&gt;
&lt;li&gt;host path &lt;code&gt;/sys&lt;/code&gt; -&amp;gt; container path &lt;code&gt;/host-sys&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Env:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;NODE_NAME&lt;/code&gt; from &lt;code&gt;spec.nodeName&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AGENT_MODE=daemonset&lt;/code&gt; (default)&lt;/li&gt;
&lt;li&gt;optional &lt;code&gt;RECONCILE_INTERVAL&lt;/code&gt; (default &lt;code&gt;20s&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;optional &lt;code&gt;SIMULATE_ONLY=true&lt;/code&gt; (skip host writes, log requested actions)&lt;/li&gt;
&lt;li&gt;optional &lt;code&gt;METRICS_ADDR&lt;/code&gt; (default &lt;code&gt;:8080&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
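&lt;p&gt;The settings above might translate into a pod spec fragment along these lines. This is a sketch, not the chart&amp;rsquo;s actual template: the container name, volume name, and image reference are hypothetical, and only the required settings and the optional variable shown are taken from the list above.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;containers:
  - name: joulie-agent                  # hypothetical container name
    image: registry.example.com/joulie-agent:TAG   # hypothetical image reference
    securityContext:
      privileged: true
    env:
      - name: NODE_NAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName    # Downward API: name of the node running this pod
      - name: AGENT_MODE
        value: daemonset
      - name: RECONCILE_INTERVAL        # optional; 20s is the documented default
        value: "20s"
    volumeMounts:
      - name: host-sys                  # hypothetical volume name
        mountPath: /host-sys
volumes:
  - name: host-sys
    hostPath:
      path: /sys
&lt;/code&gt;&lt;/pre&gt;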
&lt;h2 id="pool-mode-kwok--simulation"&gt;Pool mode (KWOK / simulation)&lt;/h2&gt;
&lt;p&gt;Pool mode preserves per-node semantics but shards logical node controllers across replicas.&lt;/p&gt;</description></item><item><title>CPU Support and Power Capping</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/hardware/cpus/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/hardware/cpus/</guid><description>&lt;p&gt;Joulie supports node-level CPU power capping through &lt;code&gt;NodeTwin&lt;/code&gt; intents enforced by the agent.&lt;/p&gt;
&lt;h2 id="contract-model"&gt;Contract model&lt;/h2&gt;
&lt;p&gt;CPU intent is defined in &lt;code&gt;NodeTwin.spec.cpu&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;packagePowerCapWatts&lt;/code&gt; (optional absolute cap)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;packagePowerCapPctOfMax&lt;/code&gt; (optional normalized profile intent)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Precedence:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;packagePowerCapWatts&lt;/code&gt; if present&lt;/li&gt;
&lt;li&gt;otherwise &lt;code&gt;packagePowerCapPctOfMax&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
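&lt;p&gt;A minimal sketch of a &lt;code&gt;NodeTwin&lt;/code&gt; carrying CPU intent might look like the following. The object name and numeric values are illustrative assumptions, the API group/version is the one listed on the CRD and Policy Model page, and the exact schema (for example, the scale used for the percentage) should be checked against the CRD definition. Both fields are set here only to illustrate precedence.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: joulie.io/v1alpha1
kind: NodeTwin
metadata:
  name: worker-01                   # hypothetical name; the resource is cluster-scoped
spec:
  cpu:
    packagePowerCapWatts: 180       # illustrative value; takes precedence when present
    packagePowerCapPctOfMax: 60     # illustrative value; used only when the watts field is absent
&lt;/code&gt;&lt;/pre&gt;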
&lt;h2 id="policy-behavior"&gt;Policy behavior&lt;/h2&gt;
&lt;p&gt;Operator profile assignment remains &lt;code&gt;performance&lt;/code&gt; vs &lt;code&gt;eco&lt;/code&gt;.
CPU cap values are generated per profile and written into &lt;code&gt;NodeTwin.spec&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;performance profile typically maps to a higher cap (often 100%)&lt;/li&gt;
&lt;li&gt;eco profile maps to a lower cap&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For heterogeneous nodes, percentage-based intent remains useful because each node resolves normalized intent using node-local capabilities.
If percentage intent cannot be converted to watts (for example, when the RAPL range is missing), the agent falls back to a DVFS percentage path when possible.&lt;/p&gt;</description></item><item><title>CRD and Policy Model</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/policy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/policy/</guid><description>&lt;p&gt;This page defines Joulie&amp;rsquo;s core contract:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;demand&lt;/strong&gt; comes from pod scheduling constraints,&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;supply&lt;/strong&gt; is exposed by node power-profile labels,&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;discovered hardware&lt;/strong&gt; is published through &lt;code&gt;NodeHardware&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;desired state&lt;/strong&gt; is published through &lt;code&gt;NodeTwin&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="apis"&gt;APIs&lt;/h2&gt;
&lt;p&gt;Group/version:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie.io/v1alpha1&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;CRDs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;NodeHardware&lt;/code&gt; (&lt;code&gt;nodehardwares&lt;/code&gt;, cluster-scoped)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;NodeTwin&lt;/code&gt; (&lt;code&gt;nodetwins&lt;/code&gt;, cluster-scoped)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;CRD definitions live in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;config/crd/bases/joulie.io_nodehardwares.yaml&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;config/crd/bases/joulie.io_nodetwins.yaml&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="demand-model-workloads"&gt;Demand model (workloads)&lt;/h2&gt;
&lt;p&gt;Workload class is determined from the &lt;code&gt;joulie.io/workload-class&lt;/code&gt; pod annotation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;performance&lt;/code&gt; demand: pod carries &lt;code&gt;joulie.io/workload-class: performance&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;standard&lt;/code&gt; demand (default): no annotation, or &lt;code&gt;joulie.io/workload-class: standard&lt;/code&gt;. Such pods can run on any node; adaptive scoring steers them toward eco nodes when performance nodes are congested.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="supply-model-nodes"&gt;Supply model (nodes)&lt;/h2&gt;
&lt;p&gt;Node supply is represented by:&lt;/p&gt;</description></item><item><title>GPU Support (NVIDIA + AMD)</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/hardware/gpus/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/hardware/gpus/</guid><description>&lt;p&gt;Joulie supports node-level GPU power-cap intents for NVIDIA and AMD.&lt;/p&gt;
&lt;h2 id="validation-status"&gt;Validation status&lt;/h2&gt;
&lt;p&gt;GPU support has been validated in simulator mode only (no bare-metal GPU access yet).
The host code paths are designed to work on bare metal (NVIDIA + AMD) when GPU nodes are available.&lt;/p&gt;
&lt;h2 id="contract-model"&gt;Contract model&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;NodeTwin.spec.gpu.powerCap&lt;/code&gt; defines a per-GPU cap intent:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;scope: perGpu&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;capWattsPerGpu&lt;/code&gt; (absolute, optional)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;capPctOfMax&lt;/code&gt; (percentage, optional)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Precedence:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;capWattsPerGpu&lt;/code&gt; if present&lt;/li&gt;
&lt;li&gt;otherwise &lt;code&gt;capPctOfMax&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
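&lt;p&gt;In manifest form, the intent could be sketched as the following &lt;code&gt;NodeTwin.spec&lt;/code&gt; fragment. The numeric value and its scale are illustrative assumptions; only the field names come from the list above.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;spec:
  gpu:
    powerCap:
      scope: perGpu
      capPctOfMax: 70               # illustrative value; applies only when capWattsPerGpu is absent
&lt;/code&gt;&lt;/pre&gt;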
&lt;p&gt;The same cap is applied uniformly to all GPUs on the node.&lt;/p&gt;</description></item><item><title>Workload and Power Simulator</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/simulator/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/simulator/</guid><description>&lt;p&gt;The Joulie simulator lets you run full control-loop experiments on virtual clusters without real hardware. It keeps Kubernetes scheduling real while simulating hardware telemetry, power dynamics, and thermal behavior per node.&lt;/p&gt;
&lt;p&gt;This page covers the simulator&amp;rsquo;s architecture, HTTP API, and integration points. Detailed subsystems are documented on dedicated pages linked throughout.&lt;/p&gt;
&lt;h2 id="architecture-at-a-glance"&gt;Architecture at a glance&lt;/h2&gt;
&lt;p&gt;The simulator extends the same control path used on real nodes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Node labels define simulated hardware identity.&lt;/li&gt;
&lt;li&gt;Operator resolves hardware from &lt;code&gt;NodeHardware&lt;/code&gt; when available, otherwise from labels/inventory fallback.&lt;/li&gt;
&lt;li&gt;Operator writes desired node profile (&lt;code&gt;NodeTwin.spec&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Agent reads desired state and sends control intents.&lt;/li&gt;
&lt;li&gt;Simulator emulates telemetry/control behavior per node and exposes HTTP endpoints.&lt;/li&gt;
&lt;li&gt;Next reconcile loop reacts to updated simulated state.&lt;/li&gt;
&lt;/ol&gt;
&lt;img src="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/images/joulie-arch-simulator.png" alt="Joulie simulator architecture overview"&gt;
&lt;p&gt;The diagram shows the end-to-end loop:&lt;/p&gt;</description></item><item><title>Workload Generation</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/workload-generation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/workload-generation/</guid><description>&lt;p&gt;This page documents how Joulie generates &lt;strong&gt;realistic AI workload traces&lt;/strong&gt; for the simulator.&lt;/p&gt;
&lt;p&gt;It is separate from &lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/workload-simulator/"&gt;Workload Simulator&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;this page explains how traces are &lt;strong&gt;generated&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;the workload-simulator page explains how those traces are &lt;strong&gt;consumed at runtime&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scope"&gt;Scope&lt;/h2&gt;
&lt;p&gt;The current generator is designed to be realistic for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI-oriented Kubernetes clusters,&lt;/li&gt;
&lt;li&gt;CPU + GPU workloads,&lt;/li&gt;
&lt;li&gt;memory-pressure-sensitive jobs,&lt;/li&gt;
&lt;li&gt;multi-pod logical workloads such as distributed training and HPO-style experiments.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The current generator &lt;strong&gt;does not&lt;/strong&gt; explicitly model:&lt;/p&gt;</description></item><item><title>Workload Distributions</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/workload-distributions/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/workload-distributions/</guid><description>&lt;p&gt;This page documents the &lt;strong&gt;statistical distributions and priors&lt;/strong&gt; behind the current workload generator.&lt;/p&gt;
&lt;p&gt;Use it together with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/workload-generation/"&gt;Workload Generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/workload-simulator/"&gt;Workload Simulator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/hardware/hardware-modeling/"&gt;Hardware Modeling&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-this-page-is-for"&gt;What this page is for&lt;/h2&gt;
&lt;p&gt;The generator is no longer just a flat random-job emitter.
It now uses explicit priors for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;arrival timing,&lt;/li&gt;
&lt;li&gt;GPU-count skew,&lt;/li&gt;
&lt;li&gt;duration shape,&lt;/li&gt;
&lt;li&gt;utilization,&lt;/li&gt;
&lt;li&gt;memory pressure,&lt;/li&gt;
&lt;li&gt;multi-pod workload structure.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This page makes those priors visible and explains why they are reasonable.&lt;/p&gt;
&lt;h2 id="1-arrival-model"&gt;1. Arrival model&lt;/h2&gt;
&lt;p&gt;The current implementation uses a lightweight &lt;strong&gt;NHPP-like&lt;/strong&gt; process:&lt;/p&gt;</description></item><item><title>Kubernetes AI Workloads</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/kubernetes-ai-workloads/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/kubernetes-ai-workloads/</guid><description>&lt;p&gt;This page explains how the logical workload structures used by Joulie map onto common Kubernetes-native AI workload patterns.&lt;/p&gt;
&lt;p&gt;It is mainly a documentation page today.
The current simulator generator emits the &lt;strong&gt;structure metadata and pod-expanded jobs&lt;/strong&gt;, but it does &lt;strong&gt;not yet&lt;/strong&gt; render &lt;code&gt;PyTorchJob&lt;/code&gt;, &lt;code&gt;MPIJob&lt;/code&gt;, or &lt;code&gt;Katib Experiment&lt;/code&gt; manifests directly.&lt;/p&gt;
&lt;h2 id="why-this-page-exists"&gt;Why this page exists&lt;/h2&gt;
&lt;p&gt;The workload-generation report makes an important point:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;realistic AI workloads are often &lt;strong&gt;not single pods&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;and a single logical workload may map to:
&lt;ul&gt;
&lt;li&gt;a launcher + workers,&lt;/li&gt;
&lt;li&gt;parameter servers + workers,&lt;/li&gt;
&lt;li&gt;or a controller + many HPO trial pods.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That distinction matters even in a simulator, because power and slowdown should often be understood at the &lt;strong&gt;logical workload&lt;/strong&gt; level, not only at the pod level.&lt;/p&gt;</description></item><item><title>Joulie Operator</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/operator/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/operator/</guid><description>&lt;p&gt;The operator is Joulie&amp;rsquo;s cluster-level decision engine.&lt;/p&gt;
&lt;p&gt;It does not write host power interfaces directly.
Instead, it decides desired node states and publishes them through Kubernetes objects and labels.&lt;/p&gt;
&lt;p&gt;In practice, the operator answers one question over and over:
which nodes should currently supply &lt;code&gt;performance&lt;/code&gt; capacity, and which can safely supply &lt;code&gt;eco&lt;/code&gt; capacity?&lt;/p&gt;
&lt;h2 id="responsibilities"&gt;Responsibilities&lt;/h2&gt;
&lt;p&gt;At each reconcile tick, the operator:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;selects eligible managed nodes,&lt;/li&gt;
&lt;li&gt;reads &lt;code&gt;NodeHardware&lt;/code&gt; when available and falls back to node labels when it is not,&lt;/li&gt;
&lt;li&gt;resolves hardware identity against the shared inventory,&lt;/li&gt;
&lt;li&gt;classifies workload demand from pod scheduling constraints,&lt;/li&gt;
&lt;li&gt;runs a policy algorithm (&lt;code&gt;pkg/operator/policy/&lt;/code&gt;) to compute a plan,&lt;/li&gt;
&lt;li&gt;applies transition guards for safe downgrades,&lt;/li&gt;
&lt;li&gt;writes desired node targets (&lt;code&gt;NodeTwin.spec&lt;/code&gt;) and the &lt;code&gt;joulie.io/power-profile&lt;/code&gt; node label.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The agent then enforces those targets node-by-node.&lt;/p&gt;</description></item><item><title>Workload Simulator</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/workload-simulator/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/workload-simulator/</guid><description>&lt;p&gt;This page documents the workload-side simulation model.&lt;/p&gt;
&lt;p&gt;Trace generation methodology, statistical priors, multi-pod workload structure, and workload-generation references are documented in &lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/workload-generation/"&gt;Workload Generation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="scope"&gt;Scope&lt;/h2&gt;
&lt;p&gt;The workload simulator handles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;trace/job ingestion,&lt;/li&gt;
&lt;li&gt;pod creation and placement via the real scheduler,&lt;/li&gt;
&lt;li&gt;per-job progress updates,&lt;/li&gt;
&lt;li&gt;completion and pod deletion,&lt;/li&gt;
&lt;li&gt;class inference from scheduling constraints.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Power/control dynamics are documented separately in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/power-simulator/"&gt;Power Simulator&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="trace-driven-workload-model"&gt;Trace-driven workload model&lt;/h2&gt;
&lt;p&gt;Enable with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;SIM_WORKLOAD_TRACE_PATH=/path/to/trace.jsonl&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The simulator loads &lt;code&gt;type=job&lt;/code&gt; records and schedules pods over time according to submit offsets.&lt;/p&gt;</description></item><item><title>Hardware Modeling and Physical Power Model</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/hardware/hardware-modeling/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/hardware/hardware-modeling/</guid><description>&lt;p&gt;This page documents how Joulie models CPUs and GPUs across the project using a mix of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;official vendor specifications and management APIs&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;public measured power curves&lt;/strong&gt;, and&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;explicit proxy models&lt;/strong&gt; where public exact curves are not yet available.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It serves two closely related purposes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;for the &lt;strong&gt;agent&lt;/strong&gt;, it describes the hardware assumptions used to resolve caps, interpret device limits, and reason about how throttling affects attainable performance&lt;/li&gt;
&lt;li&gt;for the &lt;strong&gt;simulator&lt;/strong&gt;, it describes the physical model used to turn utilization and control actions into simulated power and slowdown&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="quick-summary"&gt;Quick summary&lt;/h2&gt;
&lt;p&gt;If you want the short version before the details:&lt;/p&gt;</description></item><item><title>Joulie Agent</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/agent/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/agent/</guid><description>&lt;p&gt;The agent is Joulie&amp;rsquo;s node-side enforcement component.&lt;/p&gt;
&lt;p&gt;It consumes desired state and applies node-local controls through configured backends.&lt;/p&gt;
&lt;p&gt;If the operator decides &amp;ldquo;this node should now behave like eco&amp;rdquo; or &amp;ldquo;this node should stay performance&amp;rdquo;,
the agent is the component that turns that intent into concrete control actions.&lt;/p&gt;
&lt;h2 id="responsibilities"&gt;Responsibilities&lt;/h2&gt;
&lt;p&gt;At each reconcile tick, the agent:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;identifies its node scope (single node in daemonset mode, sharded set in pool mode),&lt;/li&gt;
&lt;li&gt;discovers local CPU/GPU hardware and runtime control capability,&lt;/li&gt;
&lt;li&gt;publishes &lt;code&gt;NodeHardware&lt;/code&gt; for each owned node,&lt;/li&gt;
&lt;li&gt;reads desired target (&lt;code&gt;NodeTwin.spec&lt;/code&gt;) for each owned node,&lt;/li&gt;
&lt;li&gt;resolves telemetry/control backend from environment variables (default: host),&lt;/li&gt;
&lt;li&gt;applies controls (host or HTTP),&lt;/li&gt;
&lt;li&gt;exports metrics and status.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="inputs-and-outputs"&gt;Inputs and outputs&lt;/h2&gt;
&lt;p&gt;Inputs:&lt;/p&gt;</description></item><item><title>Power Simulator</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/power-simulator/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/power-simulator/</guid><description>&lt;p&gt;This page describes the simulator runtime mechanics (control/state/energy paths).&lt;/p&gt;
&lt;p&gt;The canonical physical model, provenance, and hardware assumptions are documented in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/hardware/hardware-modeling/"&gt;Hardware Modeling&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For workload progression semantics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/workload-simulator/"&gt;Workload Simulator&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scope"&gt;Scope&lt;/h2&gt;
&lt;p&gt;The power simulator runtime is responsible for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;keeping per-node control state (CPU cap, DVFS throttle, GPU cap),&lt;/li&gt;
&lt;li&gt;applying control actions from &lt;code&gt;/control/{node}&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;updating dynamics with settling/ramp behavior,&lt;/li&gt;
&lt;li&gt;exposing power telemetry on &lt;code&gt;/telemetry/{node}&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;integrating energy over time (&lt;code&gt;/debug/energy&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="runtime-state-and-controls"&gt;Runtime state and controls&lt;/h2&gt;
&lt;p&gt;Main node state includes:&lt;/p&gt;</description></item><item><title>Digital Twin</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/digital-twin/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/digital-twin/</guid><description>&lt;p&gt;The digital twin is Joulie&amp;rsquo;s core predictive engine. It is a lightweight O(1) parametric model that predicts the impact of scheduling and power-cap decisions on node thermal and power state, without running a full simulation for each scheduling decision.&lt;/p&gt;
&lt;h2 id="what-the-digital-twin-computes"&gt;What the digital twin computes&lt;/h2&gt;
&lt;p&gt;For each managed node, the twin produces three scores stored in &lt;code&gt;NodeTwin.status&lt;/code&gt;:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Signal&lt;/th&gt;
 &lt;th&gt;Range&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Power headroom&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;0-100&lt;/td&gt;
 &lt;td&gt;Remaining power budget before hitting thermal or PSU limits. Higher is better for new workload placement.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;CoolingStress&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;0-100&lt;/td&gt;
 &lt;td&gt;Predicted percentage of cooling capacity in use. High values indicate the node is near its thermal limit.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;PSUStress&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;0-100&lt;/td&gt;
 &lt;td&gt;Predicted percentage of PDU/rack power capacity in use. High values indicate the rack is near its power supply limit.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The twin also computes:&lt;/p&gt;</description></item><item><title>Hardware Modeling</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/hardware-modeling/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/hardware-modeling/</guid><description>&lt;p&gt;This simulator section now treats hardware modeling as a shared hardware concept rather than a simulator-only detail.&lt;/p&gt;
&lt;p&gt;The canonical page is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/hardware/hardware-modeling/"&gt;Hardware Modeling and Physical Power Model&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use that page for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU and GPU model provenance&lt;/li&gt;
&lt;li&gt;physical assumptions behind caps and slowdown&lt;/li&gt;
&lt;li&gt;heterogeneous-node semantics&lt;/li&gt;
&lt;li&gt;current limitations and calibration status&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From the simulator point of view, the important relationship is simple:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the simulator implements the modeling assumptions documented there&lt;/li&gt;
&lt;li&gt;the agent relies on the same hardware assumptions when interpreting caps and backend limits&lt;/li&gt;
&lt;li&gt;simulator runtime pages describe how those models are exercised in experiments&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For simulator-specific flow, continue with:&lt;/p&gt;</description></item><item><title>Policy Algorithms</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/policies/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/policies/</guid><description>&lt;p&gt;This page documents the controller policy algorithms implemented in &lt;code&gt;pkg/operator/policy/&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Use this page after:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/policy/"&gt;CRD and Policy Model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/operator/"&gt;Joulie Operator&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="classification-input"&gt;Classification Input&lt;/h2&gt;
&lt;p&gt;Policy demand classification is derived from the &lt;code&gt;joulie.io/workload-class&lt;/code&gt; pod annotation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;performance&lt;/code&gt;: pod carries &lt;code&gt;joulie.io/workload-class: performance&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;standard&lt;/code&gt; (default): no annotation or &lt;code&gt;joulie.io/workload-class: standard&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
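&lt;p&gt;The classification rule above can be sketched as a small function. This is an illustration only, not the operator code: the name &lt;code&gt;classify_pod&lt;/code&gt; is hypothetical, and treating unrecognized annotation values as &lt;code&gt;standard&lt;/code&gt; is an assumption.&lt;/p&gt;

```python
WORKLOAD_CLASS_ANNOTATION = "joulie.io/workload-class"


def classify_pod(annotations):
    """Return the demand class for a pod given its annotations mapping."""
    # A missing annotation falls back to the documented default, "standard".
    value = (annotations or {}).get(WORKLOAD_CLASS_ANNOTATION, "standard")
    # Assumption: any value other than "performance" is treated as "standard".
    return "performance" if value == "performance" else "standard"
```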
&lt;h2 id="shared-reconcile-flow"&gt;Shared Reconcile Flow&lt;/h2&gt;
&lt;p&gt;Each reconcile tick:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Select eligible nodes from &lt;code&gt;NODE_SELECTOR&lt;/code&gt;, excluding reserved and unschedulable nodes.&lt;/li&gt;
&lt;li&gt;Build a hardware view from &lt;code&gt;NodeHardware&lt;/code&gt; when available, otherwise from node labels/inventory fallback.&lt;/li&gt;
&lt;li&gt;Sort eligible nodes by normalized compute density (highest first).&lt;/li&gt;
&lt;li&gt;Preserve at least one performance-capable node per discovered hardware family whenever the requested high-performance (HP) node count allows it.&lt;/li&gt;
&lt;li&gt;Build a desired plan with the selected policy.&lt;/li&gt;
&lt;li&gt;Apply the downgrade guard (sets &lt;code&gt;NodeTwin.status.schedulableClass&lt;/code&gt; to &lt;code&gt;draining&lt;/code&gt; while blocking pods are still running).&lt;/li&gt;
&lt;li&gt;Write &lt;code&gt;NodeTwin.spec&lt;/code&gt; and update the &lt;code&gt;joulie.io/power-profile&lt;/code&gt; node label.&lt;/li&gt;
&lt;/ol&gt;
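&lt;p&gt;Steps 3 and 4 can be sketched as follows. This is one possible reading, not the code in &lt;code&gt;pkg/operator/policy/&lt;/code&gt;: the function name, the dictionary keys &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;family&lt;/code&gt;, and &lt;code&gt;density&lt;/code&gt;, and the two-pass selection are all assumptions.&lt;/p&gt;

```python
def pick_hp_nodes(nodes, hp_count):
    """Pick hp_count node names from dicts with name, family, and density keys."""
    # Step 3: highest normalized compute density first.
    ordered = sorted(nodes, key=lambda n: n["density"], reverse=True)
    chosen = []
    seen_families = set()
    # Step 4: the first pass keeps one node per hardware family while slots remain.
    for node in ordered:
        if len(chosen) == hp_count:
            break
        if node["family"] not in seen_families:
            seen_families.add(node["family"])
            chosen.append(node)
    # Remaining slots go to the densest nodes that were not chosen yet.
    for node in ordered:
        if len(chosen) == hp_count:
            break
        if node not in chosen:
            chosen.append(node)
    return [n["name"] for n in chosen]
```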
&lt;p&gt;In other words, policies still decide &lt;em&gt;how many&lt;/em&gt; high-performance nodes are needed, but the density-aware ordering influences &lt;em&gt;which&lt;/em&gt; nodes get those assignments.&lt;/p&gt;</description></item><item><title>Scheduler Extender</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/scheduler/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/scheduler/</guid><description>&lt;p&gt;Joulie ships a scheduler extender that steers workloads toward appropriate nodes based on power profile, thermal stress, and hardware capabilities.&lt;/p&gt;
&lt;h2 id="how-a-pod-gets-scheduled-end-to-end"&gt;How a pod gets scheduled (end-to-end)&lt;/h2&gt;
&lt;p&gt;When a new pod is created in the cluster, the following sequence occurs:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;1. Pod created (e.g., kubectl apply, Job controller, Deployment rollout)
 |
2. kube-scheduler picks up the unscheduled pod
 |
3. kube-scheduler runs its default filters (resource fits, taints, affinity)
 |
4. kube-scheduler calls Joulie&amp;#39;s /filter endpoint
 | - Sends: pod spec + candidate node list
 | - Joulie reads pod annotation joulie.io/workload-class
 | - Performance pods: reject nodes with schedulableClass = eco or draining
 | - Standard pods: pass all nodes
 | - Returns: filtered node list + rejection reasons
 |
5. kube-scheduler calls Joulie&amp;#39;s /prioritize endpoint
 | - Sends: pod spec + surviving node list
 | - Joulie reads NodeTwin CRs (cached, 30s TTL) for power state
 | - Joulie reads NodeHardware CRs (cached, 30s TTL) for hardware specs
 | - Joulie extracts pod CPU/GPU requests for marginal power estimation
 | - Joulie scores each node 0-100 using the scoring formula
 | - Returns: list of (node, score) pairs
 |
6. kube-scheduler combines Joulie scores with its own plugin scores
 |
7. Pod is bound to the highest-scoring node
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The extender participates in steps 4 and 5 only. It does not replace the Kubernetes scheduler — it extends it with energy-aware filter and scoring logic.&lt;/p&gt;</description></item><item><title>Simulator Metrics</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/metrics/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/metrics/</guid><description>&lt;p&gt;This page documents Prometheus metrics exposed by the simulator (&lt;code&gt;simulator/cmd/simulator/main.go&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Endpoint:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;path: &lt;code&gt;/metrics&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;address: simulator HTTP listen address (&lt;code&gt;SIM_ADDR&lt;/code&gt;, default &lt;code&gt;:18080&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Related debug endpoints (non-Prometheus):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/debug/nodes&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/debug/events&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/debug/energy&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="httprequest-metrics"&gt;HTTP/request metrics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_requests_total{route,method,status}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;total HTTP requests by route/method/status&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_request_duration_seconds{route,method}&lt;/code&gt; (histogram)
&lt;ul&gt;
&lt;li&gt;request latency&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="control-path-metrics"&gt;Control-path metrics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_controls_total{node,action}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;received control actions by node/action&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_control_actions_total{node,action,result}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;control action outcomes&lt;/li&gt;
&lt;li&gt;&lt;code&gt;result&lt;/code&gt;: &lt;code&gt;applied|blocked|error&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="per-node-simulated-state-metrics"&gt;Per-node simulated state metrics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_cap_watts{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;current simulated effective cap&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_rapl_cap_watts{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;simulated RAPL cap value&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_throttle_pct{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;simulated DVFS throttle percent&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_power_watts{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;simulated exported node power&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_cpu_util{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;simulated CPU utilization&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_freq_scale{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;simulated frequency scale&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_running_pods{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;running pods observed on the node&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_class_info{node,class}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;class assignment marker (&lt;code&gt;1&lt;/code&gt; on active class)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
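&lt;p&gt;Per-node gauges such as &lt;code&gt;joulie_sim_node_power_watts{node}&lt;/code&gt; can be read from the Prometheus text exposition with a few lines of code. The sketch below is a simplified parser that assumes a single &lt;code&gt;node&lt;/code&gt; label and no timestamps; the node names and values in &lt;code&gt;SAMPLE&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
def parse_gauge(text, metric):
    """Map the node label to the sample value for one metric name."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip comments and samples that belong to other metrics.
        if not line.startswith(metric + "{"):
            continue
        labels, _, value = line.rpartition("} ")
        node = labels.split('node="', 1)[1].split('"', 1)[0]
        values[node] = float(value)
    return values


# Illustrative exposition text; real values come from the /metrics endpoint.
SAMPLE = """
# TYPE joulie_sim_node_power_watts gauge
joulie_sim_node_power_watts{node="kwok-node-0"} 212.5
joulie_sim_node_power_watts{node="kwok-node-1"} 148.0
"""

power_by_node = parse_gauge(SAMPLE, "joulie_sim_node_power_watts")
```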
&lt;h2 id="workloadjob-metrics"&gt;Workload/job metrics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_job_submitted_total{class}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;jobs submitted by class&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_job_completed_total{class,node}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;jobs completed by class and node&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_job_completion_seconds&lt;/code&gt; (histogram)
&lt;ul&gt;
&lt;li&gt;job completion latency distribution&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="notes"&gt;Notes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Prometheus metrics capture online simulator state and request/control behavior.&lt;/li&gt;
&lt;li&gt;Integrated node/cluster energy totals are exposed through &lt;code&gt;/debug/energy&lt;/code&gt; (JSON), not as Prometheus time series in the current implementation.&lt;/li&gt;
&lt;li&gt;Richer thermal and averaged-vs-instantaneous details are currently exposed through the HTTP telemetry/debug endpoints rather than as separate Prometheus gauges.&lt;/li&gt;
&lt;li&gt;In particular, fields such as &lt;code&gt;instantPackagePowerWatts&lt;/code&gt;, &lt;code&gt;cpu.temperatureC&lt;/code&gt;, &lt;code&gt;cpu.thermalThrottlePct&lt;/code&gt;, and per-device GPU averaged power live in &lt;code&gt;/telemetry/{node}&lt;/code&gt; and &lt;code&gt;/debug/nodes&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Energy-Aware Scheduling</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/energy-aware-scheduling/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/energy-aware-scheduling/</guid><description>&lt;p&gt;Joulie&amp;rsquo;s scheduler extender makes placement decisions informed by real-time energy telemetry, workload characteristics, and facility-level power conditions. This page describes the full pipeline from metrics collection through scoring and optional rescheduling.&lt;/p&gt;
&lt;h2 id="end-to-end-pipeline"&gt;End-to-end pipeline&lt;/h2&gt;
&lt;p&gt;The energy-aware scheduling pipeline has five stages:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;Kepler + RAPL/NVML telemetry
 -&amp;gt; Prometheus (scrape &amp;amp; store)
 -&amp;gt; Digital twin (NodeTwin.status)
 -&amp;gt; Scheduler extender (filter + score)
 -&amp;gt; Placement decision
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Each stage runs independently and communicates through Kubernetes CRDs or Prometheus queries. There is no monolithic scheduling engine; each component does one thing and feeds the next.&lt;/p&gt;</description></item><item><title>Configuration Reference</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/getting-started/05-configuration-reference/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/getting-started/05-configuration-reference/</guid><description>&lt;p&gt;Complete reference for all Joulie environment variables. These are set via Helm values or directly in the Deployment/DaemonSet manifests.&lt;/p&gt;
&lt;p&gt;Defaults listed below are the &lt;strong&gt;code defaults&lt;/strong&gt;. The Helm chart (&lt;code&gt;charts/joulie/values.yaml&lt;/code&gt;) overrides some of them. Notably, the operator &lt;code&gt;NODE_SELECTOR&lt;/code&gt; defaults to &lt;code&gt;joulie.io/managed=true&lt;/code&gt; in the chart, even though the code default is &lt;code&gt;node-role.kubernetes.io/worker&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id="agent"&gt;Agent&lt;/h2&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Variable&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;AGENT_MODE&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;daemonset&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;daemonset&lt;/code&gt; (one agent per node) or &lt;code&gt;pool&lt;/code&gt; (shared agents with sharding)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;NODE_NAME&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;(required in daemonset mode)&lt;/td&gt;
 &lt;td&gt;Name of the node this agent manages&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;RECONCILE_INTERVAL&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;20s&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;How often the agent reconciles desired state&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;METRICS_ADDR&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;:8080&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Address for the Prometheus metrics endpoint&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;SIMULATE_ONLY&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;If &lt;code&gt;true&lt;/code&gt;, the agent discovers hardware but does not apply power caps&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;HARDWARE_CATALOG_PATH&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;simulator/catalog/hardware.yaml&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Path to the hardware inventory catalog YAML&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="agent-pool-mode"&gt;Agent pool mode&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Variable&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;POOL_NODE_SELECTOR&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;joulie.io/managed=true&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Label selector for nodes managed by pool agents&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;POOL_SHARDS&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Total number of shards for pool mode partitioning&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;POOL_SHARD_ID&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;(from pod ordinal)&lt;/td&gt;
 &lt;td&gt;Shard ID for this agent instance&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="agent-dvfs-control"&gt;Agent DVFS control&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Variable&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;DVFS_EMA_ALPHA&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;0.3&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Exponential moving average smoothing factor for power tracking&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;DVFS_HIGH_MARGIN_W&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;10.0&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Margin above the cap (watts) that triggers a frequency reduction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;DVFS_LOW_MARGIN_W&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;15.0&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Margin below the cap (watts) that triggers a frequency increase&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;DVFS_STEP_PCT&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;10&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Frequency throttle step size (%)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;DVFS_COOLDOWN&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;20s&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Minimum duration between DVFS adjustments&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;DVFS_TRIP_COUNT&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;2&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Consecutive samples outside margin before acting&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;DVFS_MIN_FREQ_KHZ&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;1500000&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Floor frequency for DVFS throttling (kHz)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
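&lt;p&gt;To show how these settings interact, here is a minimal hysteresis controller sketch. It is not the agent implementation: the class and method names are hypothetical, the defaults mirror the table above, and &lt;code&gt;DVFS_COOLDOWN&lt;/code&gt; and &lt;code&gt;DVFS_MIN_FREQ_KHZ&lt;/code&gt; are left out for brevity.&lt;/p&gt;

```python
class DvfsController:
    """Hysteresis controller sketch driven by the DVFS_* settings."""

    def __init__(self, cap_watts, alpha=0.3, high_margin_w=10.0,
                 low_margin_w=15.0, step_pct=10, trip_count=2):
        self.cap_watts = cap_watts
        self.alpha = alpha
        self.high_margin_w = high_margin_w
        self.low_margin_w = low_margin_w
        self.step_pct = step_pct
        self.trip_count = trip_count
        self.ema = None
        self.above = 0
        self.below = 0
        self.throttle_pct = 0

    def observe(self, power_watts):
        """Feed one power sample and return the resulting throttle percent."""
        # Exponential moving average; the first sample seeds the average.
        if self.ema is None:
            self.ema = power_watts
        else:
            self.ema = self.alpha * power_watts + (1 - self.alpha) * self.ema
        # Count consecutive samples outside each margin.
        if self.ema > self.cap_watts + self.high_margin_w:
            self.above, self.below = self.above + 1, 0
        elif self.cap_watts - self.low_margin_w > self.ema:
            self.above, self.below = 0, self.below + 1
        else:
            self.above = self.below = 0
        # Act only after trip_count consecutive out-of-band samples.
        if self.above >= self.trip_count:
            self.throttle_pct = min(100, self.throttle_pct + self.step_pct)
            self.above = 0
        elif self.below >= self.trip_count:
            self.throttle_pct = max(0, self.throttle_pct - self.step_pct)
            self.below = 0
        return self.throttle_pct
```

&lt;p&gt;The asymmetric margins form a dead band around the cap, and the trip count keeps a single noisy sample from changing the throttle level.&lt;/p&gt;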
&lt;h3 id="agent-telemetry-and-control-backends"&gt;Agent telemetry and control backends&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Variable&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;TELEMETRY_CPU_SOURCE&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;host&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;CPU telemetry source: &lt;code&gt;host&lt;/code&gt;, &lt;code&gt;http&lt;/code&gt;, &lt;code&gt;prometheus&lt;/code&gt;, &lt;code&gt;none&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;TELEMETRY_CPU_CONTROL&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;host&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;CPU control backend: &lt;code&gt;host&lt;/code&gt;, &lt;code&gt;http&lt;/code&gt;, &lt;code&gt;none&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;TELEMETRY_GPU_CONTROL&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;host&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;GPU control backend: &lt;code&gt;host&lt;/code&gt;, &lt;code&gt;http&lt;/code&gt;, &lt;code&gt;none&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;TELEMETRY_CPU_HTTP_ENDPOINT&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;(empty)&lt;/td&gt;
 &lt;td&gt;HTTP endpoint for CPU telemetry (e.g., &lt;code&gt;http://sim:18080/telemetry/{node}&lt;/code&gt;)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;TELEMETRY_CPU_CONTROL_HTTP_ENDPOINT&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;(empty)&lt;/td&gt;
 &lt;td&gt;HTTP endpoint for CPU control (e.g., &lt;code&gt;http://sim:18080/control/{node}&lt;/code&gt;)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;TELEMETRY_CPU_CONTROL_MODE&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;(empty)&lt;/td&gt;
 &lt;td&gt;CPU control mode override&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;TELEMETRY_GPU_CONTROL_HTTP_ENDPOINT&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;(empty)&lt;/td&gt;
 &lt;td&gt;HTTP endpoint for GPU control (e.g., &lt;code&gt;http://sim:18080/control/{node}&lt;/code&gt;)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;TELEMETRY_GPU_CONTROL_MODE&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;(empty)&lt;/td&gt;
 &lt;td&gt;GPU control mode override&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;TELEMETRY_HTTP_TIMEOUT_SECONDS&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;5&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;HTTP client timeout for telemetry/control requests (seconds)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="operator"&gt;Operator&lt;/h2&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Variable&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;RECONCILE_INTERVAL&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;1m&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;How often the operator reconciles cluster state&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;METRICS_ADDR&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;:8081&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Address for the Prometheus metrics endpoint&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;NODE_SELECTOR&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;node-role.kubernetes.io/worker&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Label selector for managed nodes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;RESERVED_LABEL_KEY&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;joulie.io/reserved&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Label key for nodes excluded from policy decisions&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;POWER_PROFILE_LABEL&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;joulie.io/power-profile&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Node label key for the active power profile&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;OPERATOR_NODE_POWER_SOURCE&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;static&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Node power data source: &lt;code&gt;static&lt;/code&gt;, &lt;code&gt;http&lt;/code&gt;, &lt;code&gt;prometheus&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;OPERATOR_NODE_POWER_HTTP_ENDPOINT&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;(empty)&lt;/td&gt;
 &lt;td&gt;HTTP endpoint for per-node power readings&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;OPERATOR_NODE_POWER_PROMETHEUS_ADDRESS&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;(empty)&lt;/td&gt;
 &lt;td&gt;Prometheus address for per-node power queries&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;OPERATOR_NODE_POWER_PROMETHEUS_QUERY&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;(empty)&lt;/td&gt;
 &lt;td&gt;PromQL query for per-node power readings&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="power-cap-configuration"&gt;Power cap configuration&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Variable&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;PERFORMANCE_CAP_WATTS&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;5000&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Absolute CPU power cap for performance nodes (watts)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;ECO_CAP_WATTS&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;120&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Absolute CPU power cap for eco nodes (watts)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;CPU_PERFORMANCE_CAP_PCT_OF_MAX&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;100&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;CPU cap as percentage of max for performance nodes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;CPU_ECO_CAP_PCT_OF_MAX&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;60&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;CPU cap as percentage of max for eco nodes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;CPU_WRITE_ABSOLUTE_CAPS&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;If &lt;code&gt;true&lt;/code&gt;, write absolute CPU watts instead of a percentage&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;GPU_PERFORMANCE_CAP_PCT_OF_MAX&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;100&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;GPU cap as percentage of max for performance nodes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;GPU_ECO_CAP_PCT_OF_MAX&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;60&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;GPU cap as percentage of max for eco nodes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;GPU_WRITE_ABSOLUTE_CAPS&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;If &lt;code&gt;true&lt;/code&gt;, write absolute GPU watts instead of a percentage&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;GPU_MODEL_CAPS_JSON&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;{}&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;JSON map of GPU model name to &lt;code&gt;{&amp;quot;minCapWatts&amp;quot;: N, &amp;quot;maxCapWatts&amp;quot;: M}&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;GPU_PRODUCT_LABEL_KEYS&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;joulie.io/gpu.product,...&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Comma-separated node label keys from which the GPU product name is read&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
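&lt;p&gt;The percentage settings translate to watts once the device maximum is known. The sketch below assumes a simple linear conversion and assumes that the &lt;code&gt;minCapWatts&lt;/code&gt;/&lt;code&gt;maxCapWatts&lt;/code&gt; bounds from &lt;code&gt;GPU_MODEL_CAPS_JSON&lt;/code&gt; act as a clamp; the function name and the model key &lt;code&gt;example-gpu&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import json


def cap_watts_from_pct(max_watts, pct_of_max, min_cap_watts=None, max_cap_watts=None):
    """Convert a percentage-of-max cap into watts, then clamp to optional bounds."""
    watts = max_watts * pct_of_max / 100.0
    if min_cap_watts is not None:
        watts = max(watts, min_cap_watts)
    if max_cap_watts is not None:
        watts = min(watts, max_cap_watts)
    return watts


# Hypothetical GPU_MODEL_CAPS_JSON value with a made-up model name.
model_caps = json.loads('{"example-gpu": {"minCapWatts": 250, "maxCapWatts": 400}}')
bounds = model_caps["example-gpu"]

# Eco cap at the default 60 percent of a 400 W maximum, clamped to the model bounds.
eco_cap = cap_watts_from_pct(400, 60, bounds["minCapWatts"], bounds["maxCapWatts"])
```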
&lt;h3 id="policy-configuration"&gt;Policy configuration&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Variable&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;POLICY_TYPE&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;static_partition&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Policy algorithm: &lt;code&gt;static_partition&lt;/code&gt;, &lt;code&gt;queue_aware_v1&lt;/code&gt;, or &lt;code&gt;rule_swap_v1&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;STATIC_HP_FRAC&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;0.50&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Fraction of nodes allocated to performance in &lt;code&gt;static_partition&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;QUEUE_HP_BASE_FRAC&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;0.60&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Base fraction of performance nodes in &lt;code&gt;queue_aware_v1&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;QUEUE_HP_MIN&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Minimum performance nodes in &lt;code&gt;queue_aware_v1&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;QUEUE_HP_MAX&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;1000000&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Maximum performance nodes in &lt;code&gt;queue_aware_v1&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;QUEUE_PERF_PER_HP_NODE&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;10&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Ratio of performance pods to performance nodes used in &lt;code&gt;queue_aware_v1&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="facility-metrics"&gt;Facility metrics&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Variable&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;ENABLE_FACILITY_METRICS&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Enable polling of data-center-level metrics from Prometheus&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;FACILITY_PROMETHEUS_ADDRESS&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;http://prometheus-operated.monitoring:9090&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Prometheus endpoint for facility metric queries&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;FACILITY_POLL_INTERVAL&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;30s&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;How often facility metrics are polled&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;FACILITY_AMBIENT_TEMP_METRIC&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;datacenter_ambient_temperature_celsius&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;PromQL metric name for ambient temperature&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;FACILITY_IT_POWER_METRIC&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;datacenter_total_it_power_watts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;PromQL metric name for total IT power draw&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;FACILITY_COOLING_POWER_METRIC&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;datacenter_cooling_power_watts&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;PromQL metric name for cooling infrastructure power&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;FACILITY_ZONE_AMBIENT_METRIC_TEMPLATE&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;(empty)&lt;/td&gt;
 &lt;td&gt;PromQL template for per-zone ambient temperature, e.g. &lt;code&gt;datacenter_ambient_temperature_celsius{zone=&amp;quot;%s&amp;quot;}&lt;/code&gt;. Use &lt;code&gt;%s&lt;/code&gt; as the zone name placeholder. Empty = disabled. Planned; not yet wired to env vars.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;FACILITY_RACK_POWER_METRIC_TEMPLATE&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;(empty)&lt;/td&gt;
 &lt;td&gt;PromQL template for per-rack power draw, e.g. &lt;code&gt;datacenter_rack_power_watts{rack=&amp;quot;%s&amp;quot;}&lt;/code&gt;. Use &lt;code&gt;%s&lt;/code&gt; as the rack name placeholder. Empty = disabled. Planned; not yet wired to env vars.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
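&lt;p&gt;The two template variables use a single &lt;code&gt;%s&lt;/code&gt; placeholder. A minimal sketch of the substitution, assuming one placeholder per template and a hypothetical rack name &lt;code&gt;rack-7&lt;/code&gt;:&lt;/p&gt;

```python
def render_metric_template(template, name):
    """Fill the single %s placeholder; an empty template means disabled."""
    if not template:
        return None
    return template % name


query = render_metric_template('datacenter_rack_power_watts{rack="%s"}', "rack-7")
```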
&lt;h3 id="node-topology"&gt;Node topology&lt;/h3&gt;
&lt;p&gt;Joulie supports optional per-rack PSU stress and per-zone cooling stress. This is activated by adding standard node labels:&lt;/p&gt;</description></item><item><title>Input Telemetry and Actuation Interfaces</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/telemetry/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/telemetry/</guid><description>&lt;p&gt;This page describes runtime IO contracts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;how Joulie reads telemetry inputs,&lt;/li&gt;
&lt;li&gt;how Joulie sends control intents.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you want the CRD-level summary first, read &lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/policy/"&gt;CRD and Policy Model&lt;/a&gt;.
This page is the detailed runtime reference for the telemetry and control contract.&lt;/p&gt;
&lt;p&gt;It is not the &lt;code&gt;/metrics&lt;/code&gt; exposition contract.
For exported metrics, see &lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/metrics/"&gt;Metrics Reference&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="why-this-abstraction-exists"&gt;Why this abstraction exists&lt;/h2&gt;
&lt;p&gt;Joulie must run in two worlds with the same control logic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;real hardware clusters,&lt;/li&gt;
&lt;li&gt;simulator/KWOK clusters.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So agent/operator logic depends on provider interfaces, not directly on sysfs or simulator HTTP shape.&lt;/p&gt;</description></item><item><title>Metrics Reference</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/metrics/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/metrics/</guid><description>&lt;p&gt;Joulie exposes Prometheus metrics from multiple components.&lt;/p&gt;
&lt;p&gt;This page covers &lt;strong&gt;operator + agent + scheduler extender&lt;/strong&gt; metrics.
Simulator metrics are documented separately in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/simulator/metrics/"&gt;Simulator Metrics&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For telemetry/control input interfaces (host/http routing), see:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/architecture/telemetry/"&gt;Input Telemetry and Actuation Interfaces&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="endpoints-by-component"&gt;Endpoints by component&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Agent:
&lt;ul&gt;
&lt;li&gt;path: &lt;code&gt;/metrics&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;default address: &lt;code&gt;:8080&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;env override: &lt;code&gt;METRICS_ADDR&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Operator:
&lt;ul&gt;
&lt;li&gt;path: &lt;code&gt;/metrics&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;default address: &lt;code&gt;:8081&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;env override: &lt;code&gt;METRICS_ADDR&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Scheduler extender:
&lt;ul&gt;
&lt;li&gt;path: &lt;code&gt;/metrics&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;default address: &lt;code&gt;:9877&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;env override: &lt;code&gt;METRICS_ADDR&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="agent-metrics"&gt;Agent metrics&lt;/h2&gt;
&lt;h3 id="backend-and-selected-cap"&gt;Backend and selected cap&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_backend_mode{node,mode}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;mode&lt;/code&gt;: &lt;code&gt;none|rapl|dvfs&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;active mode is &lt;code&gt;1&lt;/code&gt;, others &lt;code&gt;0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_policy_cap_watts{node,policy}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;current selected policy cap in watts&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="rapl-powerenergy"&gt;RAPL power/energy&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_rapl_energy_uj{node,zone}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;latest raw RAPL energy counter in microjoules&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_rapl_estimated_power_watts{node,zone}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;per-zone estimated power from energy deltas&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_rapl_package_total_power_watts{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;sum of package-level estimated power&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
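&lt;p&gt;To illustrate how a power estimate can be derived from energy deltas, the sketch below divides the difference between two microjoule readings by the elapsed time. It is not the agent's actual code: the function names, the wraparound handling via a &lt;code&gt;max_range_uj&lt;/code&gt; argument, and the &lt;code&gt;package-&lt;/code&gt; zone naming are assumptions made for the example.&lt;/p&gt;

```python
def estimated_power_watts(prev_uj, cur_uj, dt_seconds, max_range_uj=None):
    """Average power over one interval from two energy readings in microjoules."""
    if not dt_seconds > 0:
        raise ValueError("dt_seconds must be positive")
    delta_uj = cur_uj - prev_uj
    if 0 > delta_uj:
        # Assumption: a negative delta means the counter wrapped around once.
        if max_range_uj is None:
            return None  # cannot correct without knowing the counter range
        delta_uj += max_range_uj
    return delta_uj / 1e6 / dt_seconds  # microjoules to joules, then to watts


def package_total_watts(per_zone_watts):
    """Sum zones treated as package-level (hypothetical 'package-' prefix)."""
    return sum(w for zone, w in per_zone_watts.items() if zone.startswith("package-"))


zones = {
    "package-0": estimated_power_watts(1_000_000, 6_000_000, 2.0),
    "package-1": estimated_power_watts(9_000_000, 1_000_000, 1.0, max_range_uj=10_000_000),
    "dram": 1.5,
}
print(zones, package_total_watts(zones))
```

&lt;p&gt;Since the estimate is an average over the sampling interval, short power spikes between two readings are smoothed out.&lt;/p&gt;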
&lt;h3 id="dvfs-controller"&gt;DVFS controller&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_dvfs_observed_power_watts{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;observed package power used by DVFS controller&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_dvfs_ema_power_watts{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;EMA-smoothed power used for decisions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_dvfs_throttle_pct{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;current throttle percentage&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_dvfs_above_trip_count{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;consecutive above-threshold samples&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_dvfs_below_trip_count{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;consecutive below-threshold samples&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_dvfs_actions_total{node,action}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;action&lt;/code&gt;: &lt;code&gt;throttle_up|throttle_down&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
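&lt;p&gt;The DVFS gauges above fit a common control pattern: smooth the observed power with an exponential moving average (EMA), count consecutive samples above or below a threshold, and act only once a count reaches a trip value. The sketch below shows that pattern under stated assumptions. The single threshold &lt;code&gt;cap_watts&lt;/code&gt;, the parameters &lt;code&gt;alpha&lt;/code&gt;, &lt;code&gt;trip&lt;/code&gt;, and &lt;code&gt;step_pct&lt;/code&gt;, and the reset behaviour are all hypothetical; the real controller may differ.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class DvfsState:
    ema_watts: float = 0.0
    above_trip_count: int = 0
    below_trip_count: int = 0
    throttle_pct: int = 0
    initialized: bool = False


def step(state, observed_watts, cap_watts, alpha=0.3, trip=3, step_pct=10):
    """Advance the sketch by one sample; return the action taken or None."""
    # EMA update; the first sample seeds the average.
    if state.initialized:
        state.ema_watts = alpha * observed_watts + (1 - alpha) * state.ema_watts
    else:
        state.ema_watts = observed_watts
        state.initialized = True

    # Track consecutive samples on each side of the threshold.
    if state.ema_watts > cap_watts:
        state.above_trip_count += 1
        state.below_trip_count = 0
    else:
        state.below_trip_count += 1
        state.above_trip_count = 0

    # Act only after `trip` consecutive samples, then restart the count.
    if state.above_trip_count >= trip and 100 > state.throttle_pct:
        state.throttle_pct = min(100, state.throttle_pct + step_pct)
        state.above_trip_count = 0
        return "throttle_up"
    if state.below_trip_count >= trip and state.throttle_pct > 0:
        state.throttle_pct = max(0, state.throttle_pct - step_pct)
        state.below_trip_count = 0
        return "throttle_down"
    return None


state = DvfsState()
print([step(state, 200.0, 150.0) for _ in range(3)], state.throttle_pct)
```

&lt;p&gt;Requiring several consecutive samples before acting trades reaction time for stability, which is why the trip counters are useful to watch alongside &lt;code&gt;joulie_dvfs_actions_total&lt;/code&gt;.&lt;/p&gt;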
&lt;h3 id="cpu-frequency-observability"&gt;CPU frequency observability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_dvfs_cpu_cur_freq_khz{node,cpu}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;current CPU/policy frequency in kHz&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_dvfs_cpu_max_freq_khz{node,cpu}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;enforced max frequency cap in kHz&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="reliability"&gt;Reliability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_reconcile_errors_total{node}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;reconcile-loop errors&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="operator-metrics"&gt;Operator metrics&lt;/h2&gt;
&lt;h3 id="fsm-state-and-profile-label"&gt;FSM state and profile label&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_operator_node_state{node,state}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;state&lt;/code&gt;: &lt;code&gt;ActivePerformance|DrainingPerformance|ActiveEco&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;active state is &lt;code&gt;1&lt;/code&gt;, others &lt;code&gt;0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_operator_node_profile_label{node,profile}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;operator-applied node label view&lt;/li&gt;
&lt;li&gt;&lt;code&gt;profile&lt;/code&gt;: &lt;code&gt;performance|eco&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;active profile is &lt;code&gt;1&lt;/code&gt;, others &lt;code&gt;0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="transition-accounting"&gt;Transition accounting&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_operator_state_transitions_total{node,from_state,to_state,result}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;transition events emitted by operator&lt;/li&gt;
&lt;li&gt;&lt;code&gt;result&lt;/code&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;applied&lt;/code&gt;: transition committed&lt;/li&gt;
&lt;li&gt;&lt;code&gt;deferred&lt;/code&gt;: transition blocked/deferred by safeguards&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="heterogeneous-planning"&gt;Heterogeneous planning&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_operator_node_compute_density{node,component}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;normalized per-node density signal used for heterogeneous planning&lt;/li&gt;
&lt;li&gt;&lt;code&gt;component&lt;/code&gt;: &lt;code&gt;cpu|gpu&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;higher values mean the operator considers that node relatively denser for that subsystem&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scheduler-extender-metrics"&gt;Scheduler extender metrics&lt;/h2&gt;
&lt;h3 id="request-counters"&gt;Request counters&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_scheduler_filter_requests_total{workload_class}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;total filter requests by workload class&lt;/li&gt;
&lt;li&gt;&lt;code&gt;workload_class&lt;/code&gt;: &lt;code&gt;standard|performance&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_scheduler_prioritize_requests_total{workload_class}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;total prioritize (scoring) requests by workload class&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="request-latency"&gt;Request latency&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_scheduler_filter_duration_seconds{workload_class}&lt;/code&gt; (histogram)
&lt;ul&gt;
&lt;li&gt;time to process a filter request&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_scheduler_prioritize_duration_seconds{workload_class}&lt;/code&gt; (histogram)
&lt;ul&gt;
&lt;li&gt;time to process a prioritize request&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
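&lt;p&gt;Prometheus histograms are exposed as cumulative buckets, so latency percentiles have to be estimated from bucket counts. The sketch below mirrors the linear interpolation that PromQL's &lt;code&gt;histogram_quantile&lt;/code&gt; performs, as a way to reason about what a reported percentile means. The bucket boundaries and counts are made up for illustration; the extender's actual bucket layout is not documented here.&lt;/p&gt;

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the last finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative buckets: (upper bound in seconds, count)
buckets = [(0.005, 10), (0.01, 40), (0.025, 90), (math.inf, 100)]
print(histogram_quantile(0.5, buckets), histogram_quantile(0.95, buckets))
```

&lt;p&gt;The estimate is only as fine-grained as the bucket boundaries, so a percentile that lands in a wide bucket carries a correspondingly wide error.&lt;/p&gt;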
&lt;h3 id="scoring-signals"&gt;Scoring signals&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_scheduler_final_node_score{node,workload_class}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;final scheduling score (0-100) for each node and workload class&lt;/li&gt;
&lt;li&gt;updated on every prioritize call; reflects the combined headroom + cooling + trend + bonus formula&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_scheduler_node_headroom_score{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;power headroom score per node&lt;/li&gt;
&lt;li&gt;can go negative when projected power (measured + pod marginal) exceeds the capped budget&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="data-freshness"&gt;Data freshness&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_scheduler_stale_twin_data{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;1&lt;/code&gt; if the NodeTwin status is older than the staleness threshold (default 5m), &lt;code&gt;0&lt;/code&gt; otherwise&lt;/li&gt;
&lt;li&gt;a node with stale data receives a neutral score (50) instead of its computed value&lt;/li&gt;
&lt;li&gt;useful for alerting when the operator has stopped updating twin status&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
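&lt;p&gt;The staleness rule can be expressed in a few lines. The sketch below applies the 5-minute default and the neutral score of 50 stated above; the function name &lt;code&gt;effective_score&lt;/code&gt; and its signature are hypothetical and do not reflect the extender's code.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

NEUTRAL_SCORE = 50
STALENESS_THRESHOLD = timedelta(minutes=5)  # default stated in the docs


def effective_score(computed_score, last_update, now, threshold=STALENESS_THRESHOLD):
    """Return (score, stale_flag): stale twins get the neutral score."""
    stale = now - last_update > threshold
    return (NEUTRAL_SCORE if stale else computed_score, 1 if stale else 0)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(effective_score(80, now - timedelta(minutes=6), now))
print(effective_score(80, now - timedelta(minutes=1), now))
```

&lt;p&gt;The second element of the tuple corresponds to the value that &lt;code&gt;joulie_scheduler_stale_twin_data&lt;/code&gt; would report for that node.&lt;/p&gt;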
&lt;h2 id="notes"&gt;Notes&lt;/h2&gt;
&lt;ul&gt;
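&lt;li&gt;Metrics are pull-based: each value is sampled at scrape time, so changes between scrapes are not visible and time resolution is bounded by the scrape interval.&lt;/li&gt;
&lt;li&gt;The highest-cardinality series are usually the per-CPU frequency gauges, which carry one series per &lt;code&gt;node&lt;/code&gt; and &lt;code&gt;cpu&lt;/code&gt; label pair.&lt;/li&gt;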
&lt;/ul&gt;</description></item><item><title>CPU-Only Benchmark</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/experiments/cpu-only-benchmark/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/experiments/cpu-only-benchmark/</guid><description>&lt;p&gt;This page reports results from the CPU-only cluster benchmark experiment (KWOK, 40 nodes):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/joulie-k8s/Joulie/tree/main/experiments/01-cpu-only-benchmark"&gt;&lt;code&gt;experiments/01-cpu-only-benchmark/&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scope"&gt;Scope&lt;/h2&gt;
&lt;p&gt;The benchmark compares three baselines on a &lt;strong&gt;CPU-only cluster&lt;/strong&gt; with 40 KWOK nodes across 3 hardware families, running on a real Kind+KWOK Kubernetes cluster:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;A&lt;/code&gt;: Simulator only (no power management)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;B&lt;/code&gt;: Joulie with static partition policy&lt;/li&gt;
&lt;li&gt;&lt;code&gt;C&lt;/code&gt;: Joulie with queue-aware dynamic policy&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The experiment demonstrates energy savings achievable through CPU RAPL capping alone, without GPU complexity.&lt;/p&gt;</description></item><item><title>Heterogeneous GPU Cluster Benchmark</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/experiments/heterogeneous-benchmark/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/experiments/heterogeneous-benchmark/</guid><description>&lt;p&gt;This page reports results from the heterogeneous GPU cluster benchmark experiment (KWOK, 41 nodes):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/joulie-k8s/Joulie/tree/main/experiments/02-heterogeneous-benchmark"&gt;&lt;code&gt;experiments/02-heterogeneous-benchmark/&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scope"&gt;Scope&lt;/h2&gt;
&lt;p&gt;The benchmark compares three baselines on a &lt;strong&gt;heterogeneous GPU cluster&lt;/strong&gt; mixing 5 distinct GPU hardware families plus CPU-only nodes, running on a real Kind+KWOK Kubernetes cluster:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;A&lt;/code&gt;: Simulator only (no power management)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;B&lt;/code&gt;: Joulie with static partition policy&lt;/li&gt;
&lt;li&gt;&lt;code&gt;C&lt;/code&gt;: Joulie with queue-aware dynamic policy&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The experiment demonstrates energy savings achievable through combined CPU and GPU power capping on a mixed-vendor GPU fleet.&lt;/p&gt;</description></item><item><title>Homogeneous H100 NVL Benchmark</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/experiments/homogeneous-h100-benchmark/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/experiments/homogeneous-h100-benchmark/</guid><description>&lt;p&gt;This page reports results from the homogeneous H100 NVL cluster benchmark experiment (KWOK, 41 nodes):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/joulie-k8s/Joulie/tree/main/experiments/03-homogeneous-h100-benchmark"&gt;&lt;code&gt;experiments/03-homogeneous-h100-benchmark/&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scope"&gt;Scope&lt;/h2&gt;
&lt;p&gt;The benchmark compares three baselines on a &lt;strong&gt;homogeneous GPU cluster&lt;/strong&gt; with 33 identical NVIDIA H100 NVL nodes plus 8 CPU-only nodes, running on a real Kind+KWOK Kubernetes cluster:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;A&lt;/code&gt;: Simulator only (no power management)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;B&lt;/code&gt;: Joulie with static partition policy&lt;/li&gt;
&lt;li&gt;&lt;code&gt;C&lt;/code&gt;: Joulie with queue-aware dynamic policy&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="hypothesis"&gt;Hypothesis&lt;/h3&gt;
&lt;p&gt;Joulie performs better on a homogeneous cluster because every GPU node can accept any GPU job, eliminating the vendor/product-specific placement constraints that restrict policy flexibility in the &lt;a href="https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/experiments/heterogeneous-benchmark/"&gt;heterogeneous case&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Scoring Formula Validation</title><link>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/experiments/scoring-formula-validation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/versions/v0.1.1/docs/experiments/scoring-formula-validation/</guid><description>&lt;p&gt;This page reports results from the energy-aware scheduling formula validation experiment:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/joulie-k8s/Joulie/tree/main/experiments/04-scoring-formula-validation"&gt;&lt;code&gt;experiments/04-scoring-formula-validation/&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="objective"&gt;Objective&lt;/h2&gt;
&lt;p&gt;Validate Joulie&amp;rsquo;s energy-aware scheduling formula by demonstrating that power-aware scheduling improves energy efficiency compared to standard Kubernetes bin-packing (MostAllocated), using a Modelica FMU (DXCooled Airside Economizer) for physically-accurate cooling/PUE computation.&lt;/p&gt;
&lt;p&gt;Two scales were tested:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Small cluster&lt;/strong&gt; (28 nodes) — formula tuning and component selection&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Large cluster&lt;/strong&gt; (2,500 nodes) — production-scale validation with H100 GPUs&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The experiment also validated the evolution from a legacy multi-component formula to the current streamlined Joulie scoring formula.&lt;/p&gt;</description></item></channel></rss>