Documentation

Joulie is a Kubernetes-native energy management system that uses per-node digital twins to optimize data center power consumption. It ingests real-time telemetry from every node (CPU/GPU power draw, thermal state, per-pod utilization) to maintain a continuously updated model of the cluster’s energy state. That model drives two things: power cap enforcement (via RAPL and NVML) and scheduling decisions that steer workloads toward the most energy-efficient nodes.

If you are new to Joulie, the smoothest path is to read the sections in this order:

  1. Getting Started
  2. Architecture
  3. Hardware
  4. Simulator
  5. Experiments

Core mental model:

  • telemetry feeds the digital twin,
  • the twin drives operator decisions (power caps, migration triggers),
  • the scheduler extender reads twin state to steer new pod placement,
  • feedback from new placements updates telemetry, closing the loop.
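The loop above can be sketched as a toy model. This is an illustration only, not Joulie's implementation: the TwinState class, the fixed per-node budget, and the headroom-based node choice are all hypothetical stand-ins for the real NodeTwin state, operator policies, and scheduler extender scoring.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TwinState:
    """Hypothetical per-node twin; NOT Joulie's NodeTwin schema."""
    node: str
    power_watts: float = 0.0
    cap_watts: Optional[float] = None  # None means no cap is applied


def update_twin(twin: TwinState, power_watts: float) -> None:
    """Telemetry feeds the twin: record the latest power sample."""
    twin.power_watts = power_watts


def decide_cap(twin: TwinState, budget_watts: float) -> None:
    """Operator step (assumed policy): cap a node that exceeds its budget."""
    twin.cap_watts = budget_watts if twin.power_watts > budget_watts else None


def headroom(twin: TwinState, budget_watts: float) -> float:
    """Watts left before the node reaches its cap, or its budget if uncapped."""
    limit = twin.cap_watts if twin.cap_watts is not None else budget_watts
    return limit - twin.power_watts


def pick_node(twins: List[TwinState], budget_watts: float) -> str:
    """Scheduler step (assumed policy): prefer the node with the most headroom."""
    return max(twins, key=lambda t: headroom(t, budget_watts)).node


def place_pod(twin: TwinState, pod_watts: float) -> None:
    """Feedback: a new placement raises the node's next telemetry sample."""
    update_twin(twin, twin.power_watts + pod_watts)
```

In this sketch, a node drawing more than its budget is capped and loses headroom, so new pods drift toward cooler nodes; placing a pod then changes the next sample, which is the feedback edge that closes the loop.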

Section guide

  • Getting Started
    • concepts, install, workload compatibility, WorkloadProfile classification, configuration reference
  • Architecture
    • operator/agent/twin/scheduler roles, CRDs, policy algorithms, telemetry/control interfaces, kubectl plugin
  • Hardware
    • CPU and GPU support model, heterogeneity strategy, runtime caveats
  • Simulator
    • trace-driven workload simulation, power modeling, facility stress
  • Experiments
    • benchmark design and measured outcomes

What to expect

  • Per-node digital twins: telemetry → twin state → cap decisions and scheduling.
  • Kubernetes-native contracts: two user-facing CRDs (NodeHardware, NodeTwin), plus scheduling constraints that act as the intent/supply language.
  • Workload classification: automatic profiling via Kepler/cAdvisor metrics with transparent classification reasons.
  • Observability tooling: kubectl joulie plugin, Grafana dashboard, Prometheus metrics.
  • Practical path to adoption: quickstart first, then progressive deep dives.