Documentation

Joulie is a Kubernetes-native energy management system that uses per-node digital twins to optimize data center power consumption. It ingests real-time telemetry from every node (CPU/GPU power draw, thermal state, per-pod utilization) to maintain a continuously updated model of the cluster’s energy state. That model drives two things: power cap enforcement (via RAPL and NVML) and scheduling decisions that steer workloads toward the most energy-efficient nodes.

What can Joulie do?

Results from simulated benchmark experiments (Kind + KWOK clusters with physics-based power and cooling models):

  • 20-29% cluster power savings in heterogeneous GPU/CPU workloads through combined capping and scheduling.
  • 6.4% savings from scheduling alone – energy-aware pod placement reduces consumption without any power cap enforcement (validated on a simulated 2,500-node cluster).
  • Zero application impact – workload-class annotations let performance-critical pods bypass power constraints while background jobs absorb savings.
  • Full observability – kubectl joulie status, Grafana dashboards, and Prometheus metrics give immediate visibility into per-node energy state.
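
To make the workload-class idea above concrete, here is a minimal sketch of a class-aware cap choice: a node stays uncapped while it hosts any performance-critical pod, and is capped otherwise. The class names, the `Pod` type, the `choose_cap_watts` helper, and the 70% figure are all hypothetical and chosen for illustration; they are not Joulie's actual annotation values or policy algorithm.

```python
from dataclasses import dataclass

# Hypothetical class labels; Joulie's real annotation values may differ.
PERFORMANCE = "performance"
BACKGROUND = "background"


@dataclass
class Pod:
    name: str
    workload_class: str  # assumed to be read from a pod annotation


def choose_cap_watts(pods, max_watts, min_watts, eco_percent=70):
    """Return a node power cap in whole watts.

    If any pod is performance-critical, leave the node at max_watts.
    Otherwise cap at eco_percent of max_watts, never below min_watts.
    """
    if any(p.workload_class == PERFORMANCE for p in pods):
        return max_watts
    return max(min_watts, max_watts * eco_percent // 100)


mixed = [Pod("api", PERFORMANCE), Pod("batch", BACKGROUND)]
batch_only = [Pod("batch", BACKGROUND)]
cap_mixed = choose_cap_watts(mixed, max_watts=400, min_watts=100)
cap_batch = choose_cap_watts(batch_only, max_watts=400, min_watts=100)
```

In this toy policy the mixed node keeps its full 400 W budget, while the batch-only node is the one that absorbs the reduction.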

Where to start

If you are completely new, the smoothest path is:

  1. Getting Started
  2. Architecture
  3. Hardware
  4. Simulator
  5. Experiments

Core mental model:

  • telemetry feeds the digital twin,
  • the twin drives operator decisions (power caps, node profiles),
  • the scheduler extender reads twin state to steer new pod placement,
  • feedback from new placements updates telemetry, closing the loop.
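
The loop above can be sketched in a few lines. This is a toy model under stated assumptions, not Joulie's implementation: the `NodeTwin` class here is a stand-in that only tracks a smoothed power estimate, `decide_cap` and `pick_node` are hypothetical helpers, power headroom stands in for "energy efficiency", and all readings are made up.

```python
from dataclasses import dataclass


@dataclass
class NodeTwin:
    """Toy per-node twin: a smoothed estimate of node power draw."""
    name: str
    power_watts: float = 0.0
    alpha: float = 0.5  # smoothing factor (assumption)

    def ingest(self, measured_watts: float) -> None:
        # Telemetry feeds the twin: exponential moving average of power.
        self.power_watts = (
            self.alpha * measured_watts + (1 - self.alpha) * self.power_watts
        )


def decide_cap(twin: NodeTwin, budget_watts: float, margin_watts: float = 50.0) -> float:
    # Operator step: cap slightly above the estimated draw, never above budget.
    return min(budget_watts, twin.power_watts + margin_watts)


def pick_node(twins, pod_watts: float, budget_watts: float):
    # Scheduler step: among nodes that fit the pod, prefer the lowest draw.
    fits = [t for t in twins if t.power_watts + pod_watts <= budget_watts]
    return min(fits, key=lambda t: t.power_watts) if fits else None


twins = [NodeTwin("node-a"), NodeTwin("node-b")]
twins[0].ingest(300.0)  # twin estimate becomes 150.0
twins[1].ingest(100.0)  # twin estimate becomes 50.0

caps = {t.name: decide_cap(t, budget_watts=250.0) for t in twins}
target = pick_node(twins, pod_watts=80.0, budget_watts=250.0)

# Feedback: the new placement raises the next reading on the chosen node.
target.ingest(target.power_watts + 80.0)
```

Each call to `ingest` after a placement is what closes the loop: the next cap and placement decisions see the updated estimate rather than the stale one.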

Section guide

  • Getting Started
    • core concepts, Helm-based install, workload class annotations, agent runtime modes, full configuration reference
  • Architecture
    • operator, agent, digital twin, and scheduler extender roles; CRD definitions; policy algorithms; telemetry and actuation interfaces; kubectl plugin
  • Hardware
    • CPU (RAPL) and GPU (NVML) support, heterogeneous node strategies, cap range discovery, hardware modeling for simulation
  • Simulator
    • trace-driven workload simulation, power and cooling models, facility stress testing, workload distribution profiles
  • Experiments
    • benchmark design, baseline comparisons, and measured power savings across heterogeneous clusters

What to expect

  • Per-node digital twins: telemetry → twin state → cap decisions and scheduling.
  • Kubernetes-native contracts: two user-facing CRDs (NodeHardware, NodeTwin), plus scheduling constraints that serve as an intent/supply language.
  • Observability tooling: kubectl joulie plugin, Grafana dashboard, Prometheus metrics.
  • Practical path to adoption: quickstart first, then progressive deep dives.