<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Simulator on Joulie</title><link>https://joulie-k8s.github.io/Joulie/main/docs/simulator/</link><description>Recent content in Simulator on Joulie</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://joulie-k8s.github.io/Joulie/main/docs/simulator/index.xml" rel="self" type="application/rss+xml"/><item><title>Installation</title><link>https://joulie-k8s.github.io/Joulie/main/docs/simulator/installation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/main/docs/simulator/installation/</guid><description>&lt;p&gt;This page covers how to install the Joulie simulator in a Kubernetes cluster.&lt;/p&gt;
&lt;h2 id="prerequisites"&gt;Prerequisites&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;A running Kubernetes cluster (real or &lt;a href="https://kind.sigs.k8s.io/"&gt;kind&lt;/a&gt; for local development)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubectl&lt;/code&gt; configured for the target cluster&lt;/li&gt;
&lt;li&gt;&lt;code&gt;helm&lt;/code&gt; v3+ (for Helm installation)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="install-via-helm-recommended"&gt;Install via Helm (recommended)&lt;/h2&gt;
&lt;p&gt;The simulator is published as an OCI Helm chart. Install it with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;helm install joulie-sim oci://registry.cern.ch/mbunino/joulie/joulie-sim &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -n joulie-system --create-namespace
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To customize values, download the default values first:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;helm show values oci://registry.cern.ch/mbunino/joulie/joulie-sim &amp;gt; values.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then install with overrides:&lt;/p&gt;</description></item><item><title>Workload and Power Simulator</title><link>https://joulie-k8s.github.io/Joulie/main/docs/simulator/simulator/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/main/docs/simulator/simulator/</guid><description>&lt;p&gt;The Joulie simulator lets you run full control-loop experiments on virtual clusters without real hardware. It keeps Kubernetes scheduling real while simulating hardware telemetry, power dynamics, and thermal behavior per node.&lt;/p&gt;
&lt;p&gt;This page covers the simulator&amp;rsquo;s architecture, HTTP API, and integration points. Detailed subsystems are documented on dedicated pages linked throughout.&lt;/p&gt;
&lt;h2 id="architecture-at-a-glance"&gt;Architecture at a glance&lt;/h2&gt;
&lt;p&gt;The simulator extends the same control path used on real nodes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Node labels define simulated hardware identity.&lt;/li&gt;
&lt;li&gt;The operator resolves hardware from &lt;code&gt;NodeHardware&lt;/code&gt; when available, and otherwise falls back to labels/inventory.&lt;/li&gt;
&lt;li&gt;The operator writes the desired node profile (&lt;code&gt;NodeTwin.spec&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The agent reads the desired state and sends control intents.&lt;/li&gt;
&lt;li&gt;The simulator emulates telemetry/control behavior per node and exposes HTTP endpoints.&lt;/li&gt;
&lt;li&gt;The next reconcile loop reacts to the updated simulated state.&lt;/li&gt;
&lt;/ol&gt;
&lt;img src='https://joulie-k8s.github.io/Joulie/main/images/joulie-arch-simulator.png
' alt="Joulie simulator architecture overview"&gt;
&lt;p&gt;The diagram shows the end-to-end loop:&lt;/p&gt;</description></item><item><title>Workload Generation</title><link>https://joulie-k8s.github.io/Joulie/main/docs/simulator/workload-generation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/main/docs/simulator/workload-generation/</guid><description>&lt;p&gt;This page documents how Joulie generates &lt;strong&gt;realistic AI workload traces&lt;/strong&gt; for the simulator.&lt;/p&gt;
&lt;p&gt;It is separate from &lt;a href="https://joulie-k8s.github.io/Joulie/main/docs/simulator/workload-simulator/"&gt;Workload Simulator&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;this page explains how traces are &lt;strong&gt;generated&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;the workload-simulator page explains how those traces are &lt;strong&gt;consumed at runtime&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scope"&gt;Scope&lt;/h2&gt;
&lt;p&gt;The current generator is designed to be realistic for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI-oriented Kubernetes clusters,&lt;/li&gt;
&lt;li&gt;CPU + GPU workloads,&lt;/li&gt;
&lt;li&gt;memory-pressure-sensitive jobs,&lt;/li&gt;
&lt;li&gt;multi-pod logical workloads such as distributed training and HPO-style experiments.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The current generator &lt;strong&gt;does not&lt;/strong&gt; explicitly model:&lt;/p&gt;</description></item><item><title>Workload Distributions</title><link>https://joulie-k8s.github.io/Joulie/main/docs/simulator/workload-distributions/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/main/docs/simulator/workload-distributions/</guid><description>&lt;p&gt;This page documents the &lt;strong&gt;statistical distributions and priors&lt;/strong&gt; behind the current workload generator.&lt;/p&gt;
&lt;p&gt;Use it together with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/main/docs/simulator/workload-generation/"&gt;Workload Generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/main/docs/simulator/workload-simulator/"&gt;Workload Simulator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/main/docs/hardware/hardware-modeling/"&gt;Hardware Modeling&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-this-page-is-for"&gt;What this page is for&lt;/h2&gt;
&lt;p&gt;The generator is no longer just a flat random-job emitter.
It now uses explicit priors for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;arrival timing,&lt;/li&gt;
&lt;li&gt;GPU-count skew,&lt;/li&gt;
&lt;li&gt;duration shape,&lt;/li&gt;
&lt;li&gt;utilization,&lt;/li&gt;
&lt;li&gt;memory pressure,&lt;/li&gt;
&lt;li&gt;multi-pod workload structure.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This page makes those priors visible and explains why they are reasonable.&lt;/p&gt;
&lt;h2 id="1-arrival-model"&gt;1. Arrival model&lt;/h2&gt;
&lt;p&gt;The current implementation uses a lightweight &lt;strong&gt;NHPP-like&lt;/strong&gt; process:&lt;/p&gt;</description></item><item><title>Kubernetes AI Workloads</title><link>https://joulie-k8s.github.io/Joulie/main/docs/simulator/kubernetes-ai-workloads/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/main/docs/simulator/kubernetes-ai-workloads/</guid><description>&lt;p&gt;This page explains how the logical workload structures used by Joulie map onto common Kubernetes-native AI workload patterns.&lt;/p&gt;
&lt;p&gt;It is mainly a documentation page today.
The current simulator generator emits the &lt;strong&gt;structure metadata and pod-expanded jobs&lt;/strong&gt;, but it does &lt;strong&gt;not yet&lt;/strong&gt; render &lt;code&gt;PyTorchJob&lt;/code&gt;, &lt;code&gt;MPIJob&lt;/code&gt;, or &lt;code&gt;Katib Experiment&lt;/code&gt; manifests directly.&lt;/p&gt;
&lt;h2 id="why-this-page-exists"&gt;Why this page exists&lt;/h2&gt;
&lt;p&gt;The workload-generation report makes an important point:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;realistic AI workloads are often &lt;strong&gt;not single pods&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;and a single logical workload may map to:
&lt;ul&gt;
&lt;li&gt;a launcher + workers,&lt;/li&gt;
&lt;li&gt;parameter servers + workers,&lt;/li&gt;
&lt;li&gt;or a controller + many HPO trial pods.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That distinction matters even in a simulator, because power and slowdown should often be understood at the &lt;strong&gt;logical workload&lt;/strong&gt; level, not only at the pod level.&lt;/p&gt;</description></item><item><title>Workload Simulator</title><link>https://joulie-k8s.github.io/Joulie/main/docs/simulator/workload-simulator/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/main/docs/simulator/workload-simulator/</guid><description>&lt;p&gt;This page documents the workload-side simulation model.&lt;/p&gt;
&lt;p&gt;Trace generation methodology, statistical priors, multi-pod workload structure, and workload-generation references are documented in &lt;a href="https://joulie-k8s.github.io/Joulie/main/docs/simulator/workload-generation/"&gt;Workload Generation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="scope"&gt;Scope&lt;/h2&gt;
&lt;p&gt;The workload simulator handles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;trace/job ingestion,&lt;/li&gt;
&lt;li&gt;pod creation and placement via the real Kubernetes scheduler,&lt;/li&gt;
&lt;li&gt;per-job progress updates,&lt;/li&gt;
&lt;li&gt;completion and pod deletion,&lt;/li&gt;
&lt;li&gt;class inference from scheduling constraints.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Power/control dynamics are documented separately in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/main/docs/simulator/power-simulator/"&gt;Power Simulator&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="trace-driven-workload-model"&gt;Trace-driven workload model&lt;/h2&gt;
&lt;p&gt;Enable with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;SIM_WORKLOAD_TRACE_PATH=/path/to/trace.jsonl&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The simulator loads &lt;code&gt;type=job&lt;/code&gt; records and schedules pods over time according to submit offsets.&lt;/p&gt;</description></item><item><title>Power Simulator</title><link>https://joulie-k8s.github.io/Joulie/main/docs/simulator/power-simulator/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/main/docs/simulator/power-simulator/</guid><description>&lt;p&gt;This page describes the simulator runtime mechanics (control/state/energy paths).&lt;/p&gt;
&lt;p&gt;The canonical physical model, provenance, and hardware assumptions are documented in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/main/docs/hardware/hardware-modeling/"&gt;Hardware Modeling&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For workload progression semantics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/main/docs/simulator/workload-simulator/"&gt;Workload Simulator&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scope"&gt;Scope&lt;/h2&gt;
&lt;p&gt;The power simulator runtime is responsible for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;keeping per-node control state (CPU cap, DVFS throttle, GPU cap),&lt;/li&gt;
&lt;li&gt;applying control actions from &lt;code&gt;/control/{node}&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;updating dynamics with settling/ramp behavior,&lt;/li&gt;
&lt;li&gt;exposing power telemetry on &lt;code&gt;/telemetry/{node}&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;integrating energy over time (&lt;code&gt;/debug/energy&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
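&lt;p&gt;To make the ramp and energy-integration items concrete, here is a minimal sketch. It assumes a simple linear rate limit and a fixed timestep purely for illustration; the function names, parameters, and numbers are hypothetical, and the dynamics the simulator actually implements are the ones described in Hardware Modeling.&lt;/p&gt;

```python
def ramp_toward(current_w, target_w, max_ramp_w_per_s, dt_s):
    """Move power toward the target, limited to a maximum ramp rate (assumed model)."""
    limit = max_ramp_w_per_s * dt_s
    delta = max(-limit, min(limit, target_w - current_w))
    return current_w + delta


def simulate(start_w, target_w, max_ramp_w_per_s, dt_s, steps):
    """Return the power value after each timestep."""
    power = start_w
    trace = []
    for _ in range(steps):
        power = ramp_toward(power, target_w, max_ramp_w_per_s, dt_s)
        trace.append(power)
    return trace


def integrate_energy_j(power_samples_w, dt_s):
    """Rectangle-rule integral of power over time, in joules."""
    return sum(p * dt_s for p in power_samples_w)


# Hypothetical scenario: a cap lowers the target from 200 W to 150 W.
trace = simulate(200.0, 150.0, max_ramp_w_per_s=20.0, dt_s=1.0, steps=4)
energy_j = integrate_energy_j(trace, dt_s=1.0)
```

&lt;p&gt;With these made-up inputs the power settles on the new target after three steps, and the integrated energy is the sum of the per-step power values times the timestep.&lt;/p&gt;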
&lt;h2 id="runtime-state-and-controls"&gt;Runtime state and controls&lt;/h2&gt;
&lt;p&gt;Main node state includes:&lt;/p&gt;</description></item><item><title>Hardware Modeling</title><link>https://joulie-k8s.github.io/Joulie/main/docs/simulator/hardware-modeling/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/main/docs/simulator/hardware-modeling/</guid><description>&lt;p&gt;This simulator section now treats hardware modeling as a shared hardware concept rather than a simulator-only detail.&lt;/p&gt;
&lt;p&gt;The canonical page is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://joulie-k8s.github.io/Joulie/main/docs/hardware/hardware-modeling/"&gt;Hardware Modeling and Physical Power Model&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use that page for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU and GPU model provenance&lt;/li&gt;
&lt;li&gt;physical assumptions behind caps and slowdown&lt;/li&gt;
&lt;li&gt;heterogeneous-node semantics&lt;/li&gt;
&lt;li&gt;current limitations and calibration status&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From the simulator point of view, the important relationship is simple:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the simulator implements the modeling assumptions documented there&lt;/li&gt;
&lt;li&gt;the agent relies on the same hardware assumptions when interpreting caps and backend limits&lt;/li&gt;
&lt;li&gt;simulator runtime pages describe how those models are exercised in experiments&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For simulator-specific flow, continue with:&lt;/p&gt;</description></item><item><title>Simulator Metrics</title><link>https://joulie-k8s.github.io/Joulie/main/docs/simulator/metrics/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://joulie-k8s.github.io/Joulie/main/docs/simulator/metrics/</guid><description>&lt;p&gt;This page documents Prometheus metrics exposed by the simulator (&lt;code&gt;simulator/cmd/simulator/main.go&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Endpoint:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;path: &lt;code&gt;/metrics&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;address: simulator HTTP listen address (&lt;code&gt;SIM_ADDR&lt;/code&gt;, default &lt;code&gt;:18080&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Related debug endpoints (non-Prometheus):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/debug/nodes&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/debug/events&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/debug/energy&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="httprequest-metrics"&gt;HTTP/request metrics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_requests_total{route,method,status}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;total HTTP requests by route/method/status&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_request_duration_seconds{route,method}&lt;/code&gt; (histogram)
&lt;ul&gt;
&lt;li&gt;request latency&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="control-path-metrics"&gt;Control-path metrics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_controls_total{node,action}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;received control actions by node/action&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_control_actions_total{node,action,result}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;control action outcomes&lt;/li&gt;
&lt;li&gt;&lt;code&gt;result&lt;/code&gt;: &lt;code&gt;applied|blocked|error&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
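&lt;p&gt;A minimal sketch of how the &lt;code&gt;result&lt;/code&gt; label could be used to summarize control outcomes from a scrape of &lt;code&gt;/metrics&lt;/code&gt;. The helper names, the sample text, and the &lt;code&gt;action&lt;/code&gt; value &lt;code&gt;set_cap&lt;/code&gt; are illustrative assumptions, not simulator output, and the parser handles only simple labeled samples (no commas or escaped quotes inside label values).&lt;/p&gt;

```python
from collections import Counter

# Made-up sample in Prometheus text exposition format (not real simulator output).
SAMPLE = """\
# TYPE joulie_sim_control_actions_total counter
joulie_sim_control_actions_total{node="n1",action="set_cap",result="applied"} 4
joulie_sim_control_actions_total{node="n2",action="set_cap",result="applied"} 1
joulie_sim_control_actions_total{node="n2",action="set_cap",result="blocked"} 1
"""


def parse_samples(text, metric):
    """Return [(labels, value)] for labeled samples of one metric."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        name, _, rest = line.partition("{")
        if name != metric:
            continue
        label_part, _, value_part = rest.rpartition("}")
        labels = {}
        for pair in label_part.split(","):
            if pair:
                key, _, val = pair.partition("=")
                labels[key.strip()] = val.strip().strip('"')
        samples.append((labels, float(value_part.split()[0])))
    return samples


def outcome_totals(text):
    """Sum joulie_sim_control_actions_total across nodes and actions, per result."""
    totals = Counter()
    for labels, value in parse_samples(text, "joulie_sim_control_actions_total"):
        totals[labels.get("result", "unknown")] += value
    return dict(totals)


totals = outcome_totals(SAMPLE)
```

&lt;p&gt;In practice the text would come from an HTTP GET on the &lt;code&gt;/metrics&lt;/code&gt; path; a maintained Prometheus client parser is the safer choice outside of quick experiments.&lt;/p&gt;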
&lt;h2 id="per-node-simulated-state-metrics"&gt;Per-node simulated state metrics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_cap_watts{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;current simulated effective cap&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_rapl_cap_watts{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;simulated RAPL cap value&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_throttle_pct{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;simulated DVFS throttle percent&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_power_watts{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;simulated exported node power&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_cpu_util{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;simulated CPU utilization&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_freq_scale{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;simulated frequency scale&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_running_pods{node}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;running pods observed on the node&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_node_class_info{node,class}&lt;/code&gt; (gauge)
&lt;ul&gt;
&lt;li&gt;class assignment marker (value &lt;code&gt;1&lt;/code&gt; for the active class)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
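&lt;p&gt;As a sketch of how these gauges might be combined, the snippet below derives per-node headroom and total cluster power from values that are assumed to have been scraped already. It takes &lt;code&gt;joulie_sim_node_cap_watts&lt;/code&gt; and &lt;code&gt;joulie_sim_node_power_watts&lt;/code&gt; to refer to the same power domain, which this page does not state; the function names and numbers are hypothetical.&lt;/p&gt;

```python
def node_headroom(cap_watts, power_watts):
    """Return {node: cap - power} for nodes present in both mappings."""
    shared = sorted(set(cap_watts).intersection(power_watts))
    return {node: cap_watts[node] - power_watts[node] for node in shared}


def cluster_power(power_watts):
    """Sum the per-node power gauge across all nodes."""
    return sum(power_watts.values())


# Made-up gauge values keyed by the `node` label.
caps = {"n1": 300.0, "n2": 250.0}
power = {"n1": 210.5, "n2": 250.0, "n3": 90.0}

headroom = node_headroom(caps, power)
total_w = cluster_power(power)
```

&lt;p&gt;Nodes that report power but no cap (here the hypothetical &lt;code&gt;n3&lt;/code&gt;) are left out of the headroom result rather than being treated as uncapped.&lt;/p&gt;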
&lt;h2 id="workloadjob-metrics"&gt;Workload/job metrics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_job_submitted_total{class}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;jobs submitted by class&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_job_completed_total{class,node}&lt;/code&gt; (counter)
&lt;ul&gt;
&lt;li&gt;jobs completed by class and node&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;joulie_sim_job_completion_seconds&lt;/code&gt; (histogram)
&lt;ul&gt;
&lt;li&gt;job completion latency distribution&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
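&lt;p&gt;Because &lt;code&gt;joulie_sim_job_completion_seconds&lt;/code&gt; is a histogram, percentiles have to be estimated from its cumulative buckets. The sketch below uses linear interpolation within a bucket, the same general approach as the PromQL &lt;code&gt;histogram_quantile&lt;/code&gt; function; the bucket bounds and counts are made up, since this page does not list the boundaries the simulator uses.&lt;/p&gt;

```python
import bisect


def estimate_quantile(q, buckets):
    """Estimate a quantile from sorted (upper_bound_seconds, cumulative_count) pairs."""
    bounds = [bound for bound, _ in buckets]
    counts = [count for _, count in buckets]
    total = counts[-1]
    if total == 0:
        return None  # no observations yet
    rank = q * total
    i = bisect.bisect_left(counts, rank)  # first bucket whose count reaches the rank
    if bounds[i] == float("inf"):
        return bounds[i - 1]  # fall back to the highest finite bound
    lower_bound = bounds[i - 1] if i else 0.0
    lower_count = counts[i - 1] if i else 0
    in_bucket = counts[i] - lower_count
    if in_bucket == 0:
        return lower_bound
    return lower_bound + (bounds[i] - lower_bound) * (rank - lower_count) / in_bucket


# Hypothetical cumulative buckets: 2 jobs within 1 s, 6 within 5 s, 10 within 10 s.
BUCKETS = [(1.0, 2), (5.0, 6), (10.0, 10), (float("inf"), 10)]

median_s = estimate_quantile(0.5, BUCKETS)
p90_s = estimate_quantile(0.9, BUCKETS)
```

&lt;p&gt;The estimate is only as fine as the bucket layout, so a quantile that lands in a wide bucket should be read as approximate.&lt;/p&gt;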
&lt;h2 id="notes"&gt;Notes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Prometheus metrics capture live simulator state and request/control behavior.&lt;/li&gt;
&lt;li&gt;Integrated node/cluster energy totals are exposed through &lt;code&gt;/debug/energy&lt;/code&gt; (JSON), not as Prometheus time series in the current implementation.&lt;/li&gt;
&lt;li&gt;Richer thermal and averaged-vs-instantaneous details are currently exposed through the HTTP telemetry/debug endpoints rather than as separate Prometheus gauges.&lt;/li&gt;
&lt;li&gt;In particular, fields such as &lt;code&gt;instantPackagePowerWatts&lt;/code&gt;, &lt;code&gt;cpu.temperatureC&lt;/code&gt;, &lt;code&gt;cpu.thermalThrottlePct&lt;/code&gt;, and per-device GPU averaged power live in &lt;code&gt;/telemetry/{node}&lt;/code&gt; and &lt;code&gt;/debug/nodes&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>