
Overview

stepscale-autoscaler is a self-hosted Kubernetes operator that analyzes your HorizontalPodAutoscalers (HPAs) and KEDA ScaledObjects. It recommends better autoscaling configuration and, once a recommendation is approved, applies it. It runs entirely in your cluster. This manual covers installation, configuration, and day-2 operation. It is written for a competent platform or SRE reader, and every command is copy-pasteable.

Applies to: chart and operator 0.1.x (CRD stepscale.io/v1alpha1).

The operator continuously analyzes how your autoscaled workloads actually behave and proposes better autoscaling configuration. It follows a deliberate, human-in-the-loop workflow:

recommend → approve → apply → verify → (auto-rollback on degradation)
  1. Recommend. On each reconcile tick the operator reads your HorizontalPodAutoscalers and the workloads behind them, builds a metric history, and runs a rule engine to detect optimization opportunities: idle windows, over-provisioning, scale lag, thrashing, and recurring peaks. An optional LLM pass, using your own API key, judges the candidates, assigns a risk level, and writes a human-readable summary. The result is persisted as a ScalingRecommendation custom resource.

  2. Approve. Nothing is changed until a human approves it. You review the recommendation (kubectl get scalingrecommendations) and set spec.approved: true.

  3. Apply. The operator applies an approved recommendation to the live HPA or KEDA ScaledObject - adjusting minReplicas / maxReplicas, the CPU target, and the scale-down cooldown.

  4. Verify and auto-rollback. An applied change is held on probation. After the probation window the operator re-reads the workload’s health; if the change degraded it (for example, CPU pushed past target), the operator automatically reverts to the previous configuration. This is the safety net that makes the apply step low-risk.
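Steps 2 to 4 form a small state machine. The sketch below is a hypothetical model of that flow, not the operator's source. The AutoscalingConfig and Recommendation types, the apply and verify methods, and the rule "degraded means observed CPU above the target" are illustrative assumptions. The manual gives "CPU pushed past target" only as one example of degradation.

```rust
// Hypothetical model of approve -> apply -> verify -> rollback.
// Type, field, and method names are illustrative assumptions.

#[derive(Clone, Copy, Debug, PartialEq)]
struct AutoscalingConfig {
    min_replicas: u32,
    max_replicas: u32,
    cpu_target_pct: u32,
    scale_down_cooldown_secs: u32,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Phase {
    Pending,
    Applied,
    Verified,
    RolledBack,
}

struct Recommendation {
    approved: bool, // mirrors spec.approved
    phase: Phase,   // mirrors status.phase (subset)
    proposed: AutoscalingConfig,
    previous: Option<AutoscalingConfig>,
}

impl Recommendation {
    /// Apply only if a human approved it; remember the live config for rollback.
    fn apply(&mut self, live: &mut AutoscalingConfig) -> bool {
        if !self.approved || self.phase != Phase::Pending {
            return false;
        }
        self.previous = Some(*live);
        *live = self.proposed;
        self.phase = Phase::Applied;
        true
    }

    /// After probation: keep the change if CPU stayed at or under target,
    /// otherwise restore the saved config (assumed degradation rule).
    fn verify(&mut self, live: &mut AutoscalingConfig, observed_cpu_pct: u32) {
        if self.phase != Phase::Applied {
            return;
        }
        if observed_cpu_pct > live.cpu_target_pct {
            if let Some(prev) = self.previous {
                *live = prev;
            }
            self.phase = Phase::RolledBack;
        } else {
            self.phase = Phase::Verified;
        }
    }
}

fn main() {
    let mut live = AutoscalingConfig {
        min_replicas: 4,
        max_replicas: 20,
        cpu_target_pct: 70,
        scale_down_cooldown_secs: 300,
    };
    // Proposal: lower the replica floor, keep everything else.
    let proposed = AutoscalingConfig { min_replicas: 2, ..live };
    let mut rec = Recommendation { approved: false, phase: Phase::Pending, proposed, previous: None };

    assert!(!rec.apply(&mut live)); // nothing changes without approval
    rec.approved = true;
    assert!(rec.apply(&mut live));
    rec.verify(&mut live, 85); // CPU pushed past the 70% target during probation
    println!("{:?} min_replicas={}", rec.phase, live.min_replicas);
}
```

The key design point is that the previous configuration is captured at apply time. This makes the rollback a plain restore and keeps it independent of how the recommendation was computed.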

For workloads with a strong recurring (daily/weekly) pattern, the operator can additionally emit a predictive schedule that pre-raises the replica floor ahead of each forecasted peak and restores the baseline afterward.
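To make "pre-raise the floor, then restore the baseline" concrete, here is a minimal sketch for a single daily peak. The PeakWindow type, the replica_floor and in_window functions, and the one-hour lead time are hypothetical. Weekly patterns, forecasting itself, and the translation into KEDA cron triggers or HPA floor changes are not modeled.

```rust
// Hypothetical sketch: replica floor per hour of day for one forecast peak.

struct PeakWindow {
    start_hour: u32, // forecast peak start, 0-23
    end_hour: u32,   // peak end (exclusive), 0-23; may wrap past midnight
    floor: u32,      // replica floor to hold during the raised window
}

/// True if `hour` is in the half-open window [start, end) on a 24-hour
/// clock. Windows may wrap midnight (e.g. 22..2); start == end is empty.
fn in_window(hour: u32, start: u32, end: u32) -> bool {
    if start <= end {
        hour >= start && hour < end
    } else {
        hour >= start || hour < end
    }
}

/// Raise the floor `lead_hours` before the peak so pods are ready when
/// traffic arrives; fall back to the baseline once the peak ends.
fn replica_floor(hour: u32, baseline: u32, peak: &PeakWindow, lead_hours: u32) -> u32 {
    let raise_at = (peak.start_hour + 24 - lead_hours % 24) % 24;
    if in_window(hour % 24, raise_at, peak.end_hour) {
        baseline.max(peak.floor) // never lower the floor below baseline
    } else {
        baseline
    }
}

fn main() {
    // Example: peak forecast for 09:00-12:00, floor raised one hour early.
    let peak = PeakWindow { start_hour: 9, end_hour: 12, floor: 10 };
    let schedule: Vec<u32> = (0..24).map(|h| replica_floor(h, 3, &peak, 1)).collect();
    println!("{:?}", &schedule[7..13]);
}
```

Raising the floor ahead of the peak, rather than at it, covers pod startup and readiness time. The HPA still handles any demand above the raised floor in real time.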

  • It does not replace the HPA or KEDA. Your existing autoscaler keeps making every real-time scaling decision. The operator only tunes its configuration; it never sits in the request path and is not a runtime dependency of your workloads.
  • It is advisory-first. No change is applied without explicit approval. If you never approve a recommendation, the operator is a pure read-only advisor.
  • It is not a SaaS. There is no central data plane, no cross-account access, and no dashboard hosted by stepscale.
  • Runs entirely in your cluster. Metric collection, analysis, and apply all happen in-cluster. Apart from the optional LLM call described below, no workload metrics or cluster data leave your environment.
  • Offline license. The license is a signed file verified locally against a baked-in public key; there is no phone-home. The operator works fully air-gapped.
  • The only optional outbound call is to your chosen LLM provider’s API, and only when you configure one. With llm.provider=none the operator makes no external calls at all.
```mermaid
flowchart LR
    subgraph cluster["Your Kubernetes cluster"]
        direction LR
        OP["stepscale-autoscaler<br/>operator pod"]
        HPA["HPAs / KEDA<br/>ScaledObjects"]
        PROM[("Prometheus<br/>(optional)")]
        MS["metrics-server"]
        CRD["ScalingRecommendation<br/>CRs"]
        LIC["License Secret<br/>(offline, signed)"]

        HPA -- "read config + status" --> OP
        PROM -- "metric history" --> OP
        MS -- "live metrics" --> OP
        OP -- "emit recommendations" --> CRD
        CRD -- "approved: true" --> OP
        OP -- "apply / rollback patch" --> HPA
        LIC -- "verify offline" --> OP
    end

    OP -. "optional: BYO LLM API key" .-> LLM["LLM provider<br/>(OpenAI / Anthropic)"]

    USER["Platform / SRE"] -- "review + approve" --> CRD
```

The operator is a single small Rust binary shipped as a distroless, cosign-signed image. It holds no persistent state of its own beyond the ScalingRecommendation resources and a leader-election Lease; metric history is rebuilt from Prometheus (or accumulated from HPA status) on each run.
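The rule engine in step 1 runs over this rebuilt history. As a rough illustration only, this sketch checks a sample history for two of the listed opportunities: over-provisioning and thrashing. The Sample and Finding types, the analyze and percentile functions, and the thresholds are all hypothetical. The thresholds are "p95 CPU below half of target" and "four or more replica-direction reversals". The operator's real rules and defaults are not documented here, and idle windows, scale lag, and recurring peaks are not modeled.

```rust
// Hypothetical rule checks over a metric history. Thresholds are
// illustrative assumptions, not the operator's defaults.

#[derive(Debug, PartialEq)]
enum Finding {
    OverProvisioned,
    Thrashing,
}

/// One sample per scrape interval: average CPU (% of request) and replicas.
struct Sample {
    cpu_pct: f64,
    replicas: u32,
}

/// Simple rounded-index percentile; sorts the slice in place. Non-empty input.
fn percentile(values: &mut [f64], p: f64) -> f64 {
    values.sort_by(|a, b| a.total_cmp(b));
    let idx = ((p / 100.0) * (values.len() - 1) as f64).round() as usize;
    values[idx]
}

fn analyze(history: &[Sample], cpu_target_pct: f64) -> Vec<Finding> {
    let mut findings = Vec::new();
    if history.is_empty() {
        return findings;
    }
    // Over-provisioning: even p95 CPU stays under half of the target.
    let mut cpu: Vec<f64> = history.iter().map(|s| s.cpu_pct).collect();
    if percentile(&mut cpu, 95.0) < 0.5 * cpu_target_pct {
        findings.push(Finding::OverProvisioned);
    }
    // Thrashing: count how often the replica count reverses direction.
    let mut reversals = 0;
    let mut last_dir = 0i64;
    for w in history.windows(2) {
        let dir = (w[1].replicas as i64 - w[0].replicas as i64).signum();
        if dir != 0 {
            if last_dir != 0 && dir != last_dir {
                reversals += 1;
            }
            last_dir = dir;
        }
    }
    if reversals >= 4 {
        findings.push(Finding::Thrashing);
    }
    findings
}

fn main() {
    // CPU stays in the 20s against a 70% target while replicas flap 4 <-> 5.
    let replicas = [4, 5, 4, 5, 4, 5, 4, 5];
    let history: Vec<Sample> = replicas
        .iter()
        .enumerate()
        .map(|(i, &r)| Sample { cpu_pct: 20.0 + i as f64, replicas: r })
        .collect();
    println!("{:?}", analyze(&history, 70.0));
}
```

Because each check takes only a slice of samples, it does not matter whether the history came from Prometheus or was accumulated from HPA status. This matches the stateless design described above.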

| Concept | Summary |
|---|---|
| ScalingRecommendation | Namespaced CR (stepscale.io/v1alpha1, short name scalerec) holding a target reference, a config diff, a risk level, a summary, optional projected savings, and an optional predictive schedule. |
| Approval | Human gate. The operator applies a recommendation only when spec.approved is true. |
| Phase | The lifecycle state in status.phase: pending → applied → verified, or rolledBack / degraded / blocked / failed. See Usage. |
| Probation | The window after an apply during which the operator watches health before declaring a change verified (or reverting it). |
| Predictive schedule | Time-based replica-floor raises emitted for workloads with a recurring peak, applied as KEDA cron triggers or operator-managed HPA floor changes. |
| License state | Licensed, Grace, or Unlicensed. Applying changes requires a valid (or in-grace) license; without one the operator runs analysis-only. |
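The License state row implies a second gate alongside human approval. The following is a minimal sketch. Only the three state names come from this manual; the LicenseState and Mode enums and the operating_mode and may_apply functions are hypothetical.

```rust
// Hypothetical sketch of the license gate combined with human approval.

#[derive(Clone, Copy, Debug, PartialEq)]
enum LicenseState {
    Licensed,
    Grace,
    Unlicensed,
}

#[derive(Debug, PartialEq)]
enum Mode {
    AnalyzeAndApply,
    AnalysisOnly,
}

/// Licensed and in-grace clusters may apply; otherwise analysis only.
fn operating_mode(state: LicenseState) -> Mode {
    match state {
        LicenseState::Licensed | LicenseState::Grace => Mode::AnalyzeAndApply,
        LicenseState::Unlicensed => Mode::AnalysisOnly,
    }
}

/// Both gates must pass: spec.approved and a valid (or in-grace) license.
fn may_apply(approved: bool, state: LicenseState) -> bool {
    approved && operating_mode(state) == Mode::AnalyzeAndApply
}

fn main() {
    for state in [LicenseState::Licensed, LicenseState::Grace, LicenseState::Unlicensed] {
        println!(
            "{:?}: {:?}, may_apply(approved) = {}",
            state,
            operating_mode(state),
            may_apply(true, state)
        );
    }
}
```

An unlicensed operator therefore still produces recommendations. It simply never reaches the apply step, even for approved ones.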

Next: Requirements and Installation, or jump to Usage and workflow.