Overview
stepscale-autoscaler is a self-hosted Kubernetes operator that analyzes your HorizontalPodAutoscalers and KEDA ScaledObjects and recommends - then, once approved, applies - better autoscaling configuration. It runs entirely in your cluster. This manual covers installation, configuration, and day-2 operation; it is written for a competent platform or SRE reader, and every command is copy-pasteable.
Applies to: chart and operator 0.1.x (CRD stepscale.io/v1alpha1).
1.1 What it does
The operator continuously analyzes how your autoscaled workloads actually behave and proposes better autoscaling configuration. It follows a deliberate, human-in-the-loop workflow:
recommend → approve → apply → verify → (auto-rollback on degradation)
- Recommend. On each reconcile tick the operator reads your HorizontalPodAutoscalers and the workloads behind them, builds a metric history, and runs a rule engine to detect optimization opportunities (idle windows, over-provisioning, scale lag, thrashing, and recurring peaks). An optional LLM pass - using your API key - judges the candidates, assigns a risk level, and writes a human-readable summary. The result is persisted as a ScalingRecommendation custom resource.
- Approve. Nothing is changed until a human approves it. You review the recommendation (kubectl get scalingrecommendations) and set spec.approved: true.
- Apply. The operator applies an approved recommendation to the live HPA or KEDA ScaledObject - adjusting minReplicas/maxReplicas, the CPU target, and the scale-down cooldown.
- Verify and auto-rollback. An applied change is held on probation. After the probation window the operator re-reads the workload’s health; if the change degraded it (for example, CPU pushed past target), the operator automatically reverts to the previous configuration. This is the safety net that makes the apply step low-risk.
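The verify step can be pictured as a single decision made once the probation window closes. The sketch below is illustrative only and is not the operator's actual code: HpaConfig, verify_after_probation, and the tolerance_percent margin are hypothetical names and values chosen for the example, and the real health check may look at more than CPU. Only the phase names (verified, rolledBack) come from this manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HpaConfig:
    """Hypothetical stand-in for the settings the operator tunes."""
    min_replicas: int
    max_replicas: int
    cpu_target_percent: int


def verify_after_probation(previous, applied, observed_cpu_percent,
                           tolerance_percent=10):
    """Decide the outcome of a probation window.

    Returns (phase, config_to_keep). The change counts as degraded when
    observed CPU exceeds the applied target by more than the tolerance
    (an assumed margin, not a documented default).
    """
    limit = applied.cpu_target_percent + tolerance_percent
    if observed_cpu_percent > limit:
        # Degraded: revert to the configuration that was live before the apply.
        return "rolledBack", previous
    return "verified", applied


previous = HpaConfig(min_replicas=4, max_replicas=20, cpu_target_percent=60)
applied = HpaConfig(min_replicas=2, max_replicas=12, cpu_target_percent=70)

healthy = verify_after_probation(previous, applied, observed_cpu_percent=72.0)
degraded = verify_after_probation(previous, applied, observed_cpu_percent=95.0)
```

Here healthy keeps the applied configuration, while degraded returns the previous one, which mirrors the automatic revert described above.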
For workloads with a strong recurring (daily/weekly) pattern, the operator can additionally emit a predictive schedule that pre-raises the replica floor ahead of each forecasted peak and restores the baseline afterward.
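One way to read a predictive schedule is as a function from time to replica floor. The following is a minimal sketch under stated assumptions: PeakWindow, replica_floor_at, and the 15-minute lead time are hypothetical and are not taken from the operator, which may represent schedules differently (for example as KEDA cron triggers).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PeakWindow:
    """Hypothetical forecasted peak with the replica floor it needs."""
    start: datetime
    end: datetime
    floor: int


def replica_floor_at(now, baseline, peaks, lead=timedelta(minutes=15)):
    """Return the replica floor that should be in force at `now`.

    The floor is raised `lead` before each peak starts and drops back to
    the baseline as soon as the peak ends.
    """
    active = [p.floor for p in peaks if p.start - lead <= now < p.end]
    # Overlapping peaks: the highest requested floor wins.
    return max([baseline, *active])


morning_peak = PeakWindow(
    start=datetime(2024, 1, 8, 9, 0),
    end=datetime(2024, 1, 8, 11, 0),
    floor=10,
)

before = replica_floor_at(datetime(2024, 1, 8, 8, 30), 3, [morning_peak])
pre_raised = replica_floor_at(datetime(2024, 1, 8, 8, 50), 3, [morning_peak])
after = replica_floor_at(datetime(2024, 1, 8, 11, 0), 3, [morning_peak])
```

With a baseline of 3, the floor is still 3 at 08:30, is pre-raised to 10 at 08:50 (inside the assumed lead window), and is back at 3 once the peak ends at 11:00.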
1.2 What it is not
- It does not replace the HPA or KEDA. Your existing autoscaler keeps making every real-time scaling decision. The operator only tunes its configuration; it never sits in the request path and is not a runtime dependency of your workloads.
- It is advisory-first. No change is applied without explicit approval. If you never approve a recommendation, the operator is a pure read-only advisor.
- It is not a SaaS. There is no central data plane, no cross-account access, and no dashboard hosted by stepscale.
1.3 Trust and data residency
- Runs entirely in your cluster. Metric collection, analysis, and apply all happen in-cluster. No workload metrics or cluster data leave your account.
- Offline license. The license is a signed file verified locally against a baked-in public key - no phone-home. The operator works fully air-gapped.
- The only optional outbound call is to your chosen LLM provider’s API, and only when you configure one. With llm.provider=none the operator makes no external calls at all.
1.4 Architecture
flowchart LR
subgraph cluster["Your Kubernetes cluster"]
direction LR
OP["stepscale-autoscaler<br/>operator pod"]
HPA["HPAs / KEDA<br/>ScaledObjects"]
PROM[("Prometheus<br/>(optional)")]
MS["metrics-server"]
CRD["ScalingRecommendation<br/>CRs"]
LIC["License Secret<br/>(offline, signed)"]
HPA -- "read config + status" --> OP
PROM -- "metric history" --> OP
MS -- "live metrics" --> OP
OP -- "emit recommendations" --> CRD
CRD -- "approved: true" --> OP
OP -- "apply / rollback patch" --> HPA
LIC -- "verify offline" --> OP
end
OP -. "optional: BYO LLM API key" .-> LLM["LLM provider<br/>(OpenAI / Anthropic)"]
USER["Platform / SRE"] -- "review + approve" --> CRD
The operator is a single small Rust binary shipped as a distroless, cosign-signed image. It
holds no persistent state of its own beyond the ScalingRecommendation resources and a
leader-election Lease; metric history is rebuilt from Prometheus (or accumulated from HPA
status) on each run.
1.5 Core concepts
| Concept | Summary |
|---|---|
| ScalingRecommendation | Namespaced CR (stepscale.io/v1alpha1, short name scalerec) holding a target reference, a config diff, a risk level, a summary, optional projected savings, and an optional predictive schedule. |
| Approval | Human gate. The operator applies a recommendation only when spec.approved is true. |
| Phase | The lifecycle state in status.phase: pending → applied → verified, or rolledBack / degraded / blocked / failed. See Usage. |
| Probation | The window after an apply during which the operator watches health before declaring a change verified (or reverting it). |
| Predictive schedule | Time-based replica-floor raises emitted for workloads with a recurring peak, applied as KEDA cron triggers or operator-managed HPA floor changes. |
| License state | Licensed, Grace, or Unlicensed. Applying changes requires a valid (or in-grace) license; without one the operator runs analysis-only. |
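The approval and license rows combine into a simple gate on the apply step. The sketch below only restates those two rules; may_apply is a hypothetical helper, not part of the operator's API, and the real operator may enforce further conditions before patching a target.

```python
# License states under which applying changes is allowed, per the table above.
APPLY_CAPABLE_LICENSE_STATES = {"Licensed", "Grace"}


def may_apply(approved: bool, license_state: str) -> bool:
    """Return True only when both documented gates are open.

    - approved: the value of spec.approved on the ScalingRecommendation
    - license_state: one of "Licensed", "Grace", or "Unlicensed"
    """
    return approved and license_state in APPLY_CAPABLE_LICENSE_STATES


approved_and_licensed = may_apply(True, "Licensed")
approved_in_grace = may_apply(True, "Grace")
approved_unlicensed = may_apply(True, "Unlicensed")
unapproved = may_apply(False, "Licensed")
```

Only the first two cases allow an apply. An unlicensed operator stays analysis-only even for approved recommendations, and an unapproved recommendation is never applied regardless of license state.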
Next: Requirements and Installation, or jump to Usage and workflow.