Kubernetes Cluster Autoscaler: Development guide for your cloud, how to and what to keep in mind

Oct 30, 2025

-Shafin Hasnat

What is CA

The Kubernetes Cluster Autoscaler (CA) automatically scales the nodes of a Kubernetes cluster based on workload. If pods are stuck unschedulable because the cluster is out of resources, CA steps in and horizontally scales the cluster by adding nodes. When the workload is gone and nodes sit underutilized, quietly running up the cloud bill, CA removes them automatically. If you are building your own Kubernetes as a service, this is a feature you must offer. In this blog, I will walk through developing a Cluster Autoscaler integration for your own cloud.

Getting started

Kubernetes releases the cluster autoscaler on its official release schedule, and autoscaler implementations for the major cloud providers are included in each release. Kubernetes offers an official repository (github) where cloud providers can implement their own CA and open a PR to get it released on the predefined release cycle. But if you are a cloud provider planning to develop your own CA, your Kubernetes service has to implement a few features to make it compliant with the Kubernetes CA cloud provider interface.

Development

To get started with CA development for your cloud provider, first fork the autoscaler repository. Then check out the following path - autoscaler/cluster-autoscaler/cloudprovider/cloud_provider.go

type CloudProvider interface {
 // Name returns name of the cloud provider.
 Name() string

 // NodeGroups returns all node groups configured for this cloud provider.
 NodeGroups() []NodeGroup

 // NodeGroupForNode returns the node group for the given node, nil if the node
 // should not be processed by cluster autoscaler, or non-nil error if such
 // occurred. Must be implemented.
 NodeGroupForNode(*apiv1.Node) (NodeGroup, error)

 // HasInstance returns whether the node has corresponding instance in cloud provider,
 // true if the node has an instance, false if it no longer exists
 HasInstance(*apiv1.Node) (bool, error)

 // Pricing returns pricing model for this cloud provider or error if not available.
 // Implementation optional.
 Pricing() (PricingModel, errors.AutoscalerError)

 // GetAvailableMachineTypes get all machine types that can be requested from the cloud provider.
 // Implementation optional.
 GetAvailableMachineTypes() ([]string, error)

 // NewNodeGroup builds a theoretical node group based on the node definition provided. The node group is not automatically
 // created on the cloud provider side. The node group is not returned by NodeGroups() until it is created.
 // Implementation optional.
 NewNodeGroup(machineType string, labels map[string]string, systemLabels map[string]string,
  taints []apiv1.Taint, extraResources map[string]resource.Quantity) (NodeGroup, error)

 // GetResourceLimiter returns struct containing limits (max, min) for resources (cores, memory etc.).
 GetResourceLimiter() (*ResourceLimiter, error)

 // GPULabel returns the label added to nodes with GPU resource.
 GPULabel() string

 // GetAvailableGPUTypes return all available GPU types cloud provider supports.
 GetAvailableGPUTypes() map[string]struct{}

 // GetNodeGpuConfig returns the label, type and resource name for the GPU added to node. If node doesn't have
 // any GPUs, it returns nil.
 GetNodeGpuConfig(*apiv1.Node) *GpuConfig

 // Cleanup cleans up open resources before the cloud provider is destroyed, i.e. go routines etc.
 Cleanup() error

 // Refresh is called before every main loop and can be used to dynamically update cloud provider state.
 // In particular the list of node groups returned by NodeGroups can change as a result of CloudProvider.Refresh().
 Refresh() error
}

type NodeGroup interface {
 // MaxSize returns maximum size of the node group.
 MaxSize() int

 // MinSize returns minimum size of the node group.
 MinSize() int

 // TargetSize returns the current target size of the node group. It is possible that the
 // number of nodes in Kubernetes is different at the moment but should be equal
 // to Size() once everything stabilizes (new nodes finish startup and registration or
 // removed nodes are deleted completely). Implementation required.
 TargetSize() (int, error)

 // IncreaseSize increases the size of the node group. To delete a node you need
 // to explicitly name it and use DeleteNode. This function should wait until
 // node group size is updated. Implementation required.
 IncreaseSize(delta int) error

 // AtomicIncreaseSize tries to increase the size of the node group atomically.
 // It returns error if requesting the entire delta fails. The method doesn't wait until the new instances appear.
 // Implementation is optional. Implementation of this method generally requires external cloud provider support
 // for atomically requesting multiple instances. If implemented, CA will take advantage of the method while scaling up
 // BestEffortAtomicScaleUp ProvisioningClass, guaranteeing that all instances required for such a
 // ProvisioningRequest are provisioned atomically.
 AtomicIncreaseSize(delta int) error

 // DeleteNodes deletes nodes from this node group. Error is returned either on
 // failure or if the given node doesn't belong to this node group. This function
 // should wait until node group size is updated. Implementation required.
 DeleteNodes([]*apiv1.Node) error

 // ForceDeleteNodes deletes nodes from this node group, without checking for
 // constraints like minimal size validation etc. Error is returned either on
 // failure or if the given node doesn't belong to this node group. This function
 // should wait until node group size is updated.
 ForceDeleteNodes([]*apiv1.Node) error

 // DecreaseTargetSize decreases the target size of the node group. This function
 // doesn't permit to delete any existing node and can be used only to reduce the
 // request for new nodes that have not been yet fulfilled. Delta should be negative.
 // It is assumed that cloud provider will not delete the existing nodes when there
 // is an option to just decrease the target. Implementation required.
 DecreaseTargetSize(delta int) error

 // Id returns an unique identifier of the node group.
 Id() string

 // Debug returns a string containing all information regarding this node group.
 Debug() string

 // Nodes returns a list of all nodes that belong to this node group.
 // It is required that Instance objects returned by this method have Id field set.
 // Other fields are optional.
 // This list should include also instances that might have not become a kubernetes node yet.
 Nodes() ([]Instance, error)

 // TemplateNodeInfo returns a framework.NodeInfo structure of an empty
 // (as if just started) node. This will be used in scale-up simulations to
 // predict what would a new node look like if a node group was expanded. The returned
 // NodeInfo is expected to have a fully populated Node object, with all of the labels,
 // capacity and allocatable information as well as all pods that are started on
 // the node by default, using manifest (most likely only kube-proxy). Implementation optional.
 TemplateNodeInfo() (*framework.NodeInfo, error)

 // Exist checks if the node group really exists on the cloud provider side. Allows to tell the
 // theoretical node group from the real one. Implementation required.
 Exist() bool

 // Create creates the node group on the cloud provider side. Implementation optional.
 Create() (NodeGroup, error)

 // Delete deletes the node group on the cloud provider side.
 // This will be executed only for autoprovisioned node groups, once their size drops to 0.
 // Implementation optional.
 Delete() error

 // Autoprovisioned returns true if the node group is autoprovisioned. An autoprovisioned group
 // was created by CA and can be deleted when scaled to 0.
 Autoprovisioned() bool

 // GetOptions returns NodeGroupAutoscalingOptions that should be used for this particular
 // NodeGroup. Returning a nil will result in using default options.
 // Implementation optional. Callers MUST handle `cloudprovider.ErrNotImplemented`.
 GetOptions(defaults config.NodeGroupAutoscalingOptions) (*config.NodeGroupAutoscalingOptions, error)
}

The methods are well commented, and the comments explain most of what each implementation has to do. In this blog, we will only cover the 'must implement' methods that are required for a working CA for your cloud.

CA lifecycle

There is not much official documentation about the CA lifecycle. Before every main loop, which runs every 10 seconds by default (configurable with --scan-interval), CA calls the Refresh method implemented for your cloud provider. Refresh pulls the current node group state from the cloud provider API, and all scaling decisions in that loop are made against this refreshed data.

If the cluster nodes are at capacity and there are unschedulable pods, CA calls the IncreaseSize() method with a delta, which scales up the node group. On the other hand, if there are underutilized nodes in the cluster, CA taints and drains them and then calls the DeleteNodes() method with the list of nodes to remove from the cloud provider.
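
The loop cadence and scale-down behaviour are tunable with CA flags; the values shown below are the defaults:

# flags that govern the main loop and scale-down decisions (defaults shown)
./cluster-autoscaler \
  --cloud-provider=<your_cloud_provider> \
  --scan-interval=10s \
  --scale-down-utilization-threshold=0.5 \
  --scale-down-unneeded-time=10m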

Implementing CA

All of the CA implementation for your cloud lives under the autoscaler/cluster-autoscaler/cloudprovider/ path of the autoscaler codebase, alongside the other cloud providers. Your provider name also gets listed in the autoscaler/cluster-autoscaler/cloudprovider/cloud_provider.go file.

autoscaler/
└── cluster-autoscaler/
    └── cloudprovider/
        └── <your_cloud_provider>/
            ├── <your_cloud_provider>_sdk/
            │   └── <your_cloud_provider>_sdk.go
            ├── <your_cloud_provider>_manager.go
            ├── <your_cloud_provider>_cloud_provider.go
            └── <your_cloud_provider>_node_group.go

Before we proceed, let’s shed some light on the structs that the methods will be bound to -

// Manager struct 
type Manager struct {
 ...
 nodeGroups []*NodeGroup
}

// NodeGroup struct
type NodeGroup struct {
 ...
 id        int
 count     int
 minSize   int
 maxSize   int
 nodeGroup ClusterNodeGroupFields
 nodes     []ClusterNodeFields
}

// yourCloudProvider struct
type CloudProvider struct {
 ...
 manager         *Manager
}

The <your_cloud_provider>_manager.go file contains the abstraction over the cloud provider API from the SDK, along with authentication. The Manager orchestrates node group state and API client interactions. It also contains a Refresh method that drives the CA reconciliation: it repopulates the cached node group state that the CloudProvider implementation then serves back through its own Refresh method required by the CloudProvider interface.
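
To make this concrete, here is a minimal sketch of what such a manager could look like. The SDK client, its ListNodeGroups/ListNodes calls and the fields on the Cluster*Fields structs are hypothetical placeholders for your own cloud API, and I assume the elided NodeGroup fields include a back-reference to the Manager:

// <your_cloud_provider>_manager.go (sketch)
package yourcloud

import "fmt"

// Manager wraps the SDK client and caches node group state between loops.
type Manager struct {
	client     *Client // hypothetical SDK client from <your_cloud_provider>_sdk
	clusterID  string
	nodeGroups []*NodeGroup
}

func newManager(client *Client, clusterID string) *Manager {
	return &Manager{client: client, clusterID: clusterID}
}

// Refresh repopulates the cached node groups from the cloud API. It is called
// by the CloudProvider implementation before every autoscaler loop.
func (m *Manager) Refresh() error {
	groups, err := m.client.ListNodeGroups(m.clusterID) // hypothetical API call
	if err != nil {
		return fmt.Errorf("listing node groups: %w", err)
	}
	nodeGroups := make([]*NodeGroup, 0, len(groups))
	for _, g := range groups {
		nodes, err := m.client.ListNodes(m.clusterID, g.ID) // hypothetical API call
		if err != nil {
			return fmt.Errorf("listing nodes for node group %d: %w", g.ID, err)
		}
		nodeGroups = append(nodeGroups, &NodeGroup{
			id:        g.ID, // hypothetical fields on ClusterNodeGroupFields
			count:     g.Count,
			minSize:   g.MinSize,
			maxSize:   g.MaxSize,
			nodeGroup: g,
			nodes:     nodes,
			manager:   m, // assumed back-reference (one of the elided fields above)
		})
	}
	m.nodeGroups = nodeGroups
	return nil
}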

<your_cloud_provider>_node_group.go implements the methods of the NodeGroup interface. It is mainly responsible for scaling the target node group up and down.
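
Here is a hedged sketch of a few of the required NodeGroup methods. ResizeNodeGroup and DeleteNode are hypothetical SDK calls standing in for whatever your cloud API actually exposes, and again a manager back-reference is assumed on the NodeGroup struct:

// <your_cloud_provider>_node_group.go (sketch)
package yourcloud

import (
	"fmt"
	"strconv"

	apiv1 "k8s.io/api/core/v1"
)

func (n *NodeGroup) MaxSize() int { return n.maxSize }
func (n *NodeGroup) MinSize() int { return n.minSize }

// Id returns a unique identifier of the node group.
func (n *NodeGroup) Id() string { return strconv.Itoa(n.id) }

// TargetSize is the size the cloud provider is currently converging to.
func (n *NodeGroup) TargetSize() (int, error) { return n.count, nil }

// IncreaseSize asks the cloud API for delta additional nodes.
func (n *NodeGroup) IncreaseSize(delta int) error {
	if delta <= 0 {
		return fmt.Errorf("delta must be positive, got %d", delta)
	}
	target := n.count + delta
	if target > n.maxSize {
		return fmt.Errorf("target size %d is above maximum %d", target, n.maxSize)
	}
	// ResizeNodeGroup is a hypothetical SDK call; it should not return until the
	// cloud provider has accepted the new target size.
	if err := n.manager.client.ResizeNodeGroup(n.id, target); err != nil {
		return err
	}
	n.count = target
	return nil
}

// DeleteNodes removes the given Kubernetes nodes from the cloud provider.
func (n *NodeGroup) DeleteNodes(nodes []*apiv1.Node) error {
	for _, node := range nodes {
		// DeleteNode is a hypothetical SDK call keyed by the node's provider ID.
		if err := n.manager.client.DeleteNode(n.id, node.Spec.ProviderID); err != nil {
			return err
		}
		n.count--
	}
	return nil
}

// DecreaseTargetSize lowers the target size without deleting running nodes,
// e.g. when requested instances never came up.
func (n *NodeGroup) DecreaseTargetSize(delta int) error {
	if delta >= 0 {
		return fmt.Errorf("delta must be negative, got %d", delta)
	}
	n.count += delta
	return n.manager.client.ResizeNodeGroup(n.id, n.count) // hypothetical API call
}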

<your_cloud_provider>_cloud_provider.go implements the methods of the CloudProvider interface, bound to your cloud provider's struct. Based on the node group data aggregated in the Refresh loop, the CloudProvider methods inform CA about the nodes and node groups of the cluster. This file also implements the build function that constructs your cloud provider instance for the cluster autoscaler, and that function has to be wired up under the builder/ path shown below.

autoscaler/
└── cluster-autoscaler/
    └── cloudprovider/
        └── builder/
            ├── builder_all.go
            └── builder_<your_cloud_provider>.go
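
A rough sketch of the core CloudProvider methods and the builder hook could look like the following. BuildYourCloud, newClientFromEnvOrConfig and the ProviderID field are hypothetical names; only the interface signatures come from cloud_provider.go:

// <your_cloud_provider>_cloud_provider.go (sketch)
package yourcloud

import (
	apiv1 "k8s.io/api/core/v1"

	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
	"k8s.io/autoscaler/cluster-autoscaler/config"
)

// Name returns the name used with --cloud-provider.
func (c *CloudProvider) Name() string { return "<your_cloud_provider>" }

// NodeGroups exposes the node groups cached by the last Refresh.
func (c *CloudProvider) NodeGroups() []cloudprovider.NodeGroup {
	groups := make([]cloudprovider.NodeGroup, 0, len(c.manager.nodeGroups))
	for _, ng := range c.manager.nodeGroups {
		groups = append(groups, ng)
	}
	return groups
}

// NodeGroupForNode maps a Kubernetes node back to its node group.
func (c *CloudProvider) NodeGroupForNode(node *apiv1.Node) (cloudprovider.NodeGroup, error) {
	for _, ng := range c.manager.nodeGroups {
		for _, instance := range ng.nodes {
			if instance.ProviderID == node.Spec.ProviderID { // hypothetical field on ClusterNodeFields
				return ng, nil
			}
		}
	}
	return nil, nil // nil means CA should not manage this node
}

// Refresh delegates to the manager before every main loop.
func (c *CloudProvider) Refresh() error { return c.manager.Refresh() }

// BuildYourCloud (hypothetical name) is the constructor that the builder/
// package calls; follow the pattern of the existing providers.
func BuildYourCloud(opts config.AutoscalingOptions, do cloudprovider.NodeGroupDiscoveryOptions,
	rl *cloudprovider.ResourceLimiter) cloudprovider.CloudProvider {
	// newClientFromEnvOrConfig is a hypothetical auth helper reading env vars
	// or the file passed with --cloud-config (available as opts.CloudConfig).
	client := newClientFromEnvOrConfig(opts.CloudConfig)
	return &CloudProvider{manager: newManager(client, opts.ClusterName)}
}

The builder_<your_cloud_provider>.go file then calls this constructor behind a build tag, following the pattern of the existing providers in the builder/ directory.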

Authentication

CA supports the --cloud-config arg out of the box to pass a cloud config file to your provider. Environment variables can also be used for API keys or other authentication modes.
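
For example, an API token can be injected into the CA container from a Secret, or a config file can be mounted and passed with --cloud-config; the secret, key and file names below are placeholders:

# snippet for the cluster-autoscaler container spec (names are placeholders)
env:
  - name: YOUR_CLOUD_API_TOKEN
    valueFrom:
      secretKeyRef:
        name: cluster-autoscaler-credentials
        key: api-token
# ...or mount a config file and start the binary with
# --cloud-config=/etc/cluster-autoscaler/cloud-config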

Build and install

Refer to the autoscaler/cluster-autoscaler/Makefile to compile, build and publish the CA; the Dockerfile lives under the same path. To install CA into the cluster, the helm chart offered in the same repo under the autoscaler/cluster-autoscaler/charts path can be used (an example install command is shown after the list below). Make sure to set the values for your image with image.repository and image.tag, along with the other required values. The helm chart includes the roles CA needs to run smoothly; custom manifests can also be written instead. Here is the list of required roles-

rules:
  - apiGroups: [""]
    resources: ["events", "endpoints", "configmaps"]
    verbs: ["create", "patch", "get", "list", "watch", "update"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["pods/status"]
    verbs: ["update"]
  - apiGroups: [""]
    resources: ["endpoints"]
    resourceNames: ["cluster-autoscaler"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["watch", "list", "get", "update", "delete"]
  - apiGroups: [""]
    resources: ["pods", "services", "replicationcontrollers", "persistentvolumeclaims", "persistentvolumes"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["apps"]
    resources: ["statefulsets", "daemonsets", "replicasets", "deployments"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "csinodes"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["batch", "extensions"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["create", "get", "list", "watch", "update", "delete"]
  - apiGroups: ["policy"]
    resources: ["poddisruptionbudgets"]
    verbs: ["watch", "list"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["csidrivers", "volumeattachments", "csistoragecapacities"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list", "watch"]

Since CA operates on the worker nodes, it is recommended to run it on a control plane (master) node so that it is not scheduled on a node it may later remove. Here is a sample deployment manifest for CA-

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          image: ghcr.io/nexgencloud/autoscaler/cluster-autoscaler:latest
          imagePullPolicy: Always
          command:
            - ./cluster-autoscaler
            - --cloud-provider=<your_cloud_provider>
            - --namespace=kube-system
            - --v=4
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: node-role.kubernetes.io/control-plane
                  operator: Exists
              - matchExpressions:
                - key: node-role.kubernetes.io/master
                  operator: Exists
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule

Since the cluster autoscaler scales worker nodes, it is highly recommended to annotate the master nodes with cluster-autoscaler.kubernetes.io/scale-down-disabled=true so they are never considered for scale-down during reconciliation.
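
For example, with kubectl:

kubectl annotate node <master_node_name> cluster-autoscaler.kubernetes.io/scale-down-disabled=true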

Test

It is highly recommended to write unit tests for your CA implementation to get your PR accepted upstream. You can also test CA end-to-end with this test suite (github).

To test CA on a cluster, first verify it is running. Then create a deployment with high CPU and memory requests and scale it up to a large number of replicas. Once pods become unschedulable, after a while you will see the cluster scale up, capped at the node group's max size. When the extra load is removed, you will see underutilized nodes soft-tainted as scale-down candidates with a PreferNoSchedule taint; eventually those nodes are drained, deleted from the cloud, and the group shrinks back to its min count, at which point the taints are removed.
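
For example, one quick way to trigger a scale-up and watch CA react (image, requests and replica count are arbitrary):

# create load that cannot be scheduled on the current nodes
kubectl create deployment scale-test --image=registry.k8s.io/pause:3.9
kubectl set resources deployment scale-test --requests=cpu=1,memory=2Gi
kubectl scale deployment scale-test --replicas=50

# watch the autoscaler and the nodes
kubectl -n kube-system logs -f deployment/cluster-autoscaler
kubectl get nodes -w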

Release

It is important to pass all the e2e tests to get the PR approved. Proper comments and license headers are also needed to pass the CI pipelines.

CA FAQ: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md


Before we end

This blog has covered most of what it takes to implement an end-to-end cluster autoscaler for your cloud provider. CA is a must-have feature for your Kubernetes as a service. Feel free to comment or reach out to me for more. Email: [email protected] Website: https://www.shafinhasnat.me/ LinkedIn: https://www.linkedin.com/in/shafinhasnat/


