
Overcoming EKS limitations with Kamaji on AWS

Amazon EKS is one of the most used Kubernetes distributions and managed Kubernetes services worldwide, yet it brings several limitations: in this blog post, we discover how Kamaji can overcome these while maintaining the same ease of use of a managed service, thanks to its Auto-Pilot feature.

Thursday, January 18, 2024 Dario Tranchitella

In this blog post, we’re going to show how Kamaji can be used to create Control Planes on AWS, the popular Infrastructure Provider.

Kamaji, the hosted control plane manager

Kamaji is an Open Source project, backed by CLASTIX, which allows running Kubernetes control planes as Pods rather than virtual machines: this approach offers increased flexibility and resource efficiency.

By containerizing the control plane components and deploying them as pods, organisations can take advantage of container orchestration features, such as auto-scaling and easier resource management, simplifying Day-2 Operations thanks to automation.

Furthermore, Kamaji’s approach, commonly known as Hosted Control Planes, allows for more dynamic scaling of the control plane components based on demand, optimizing resource utilisation and providing a more streamlined infrastructure. Additionally, containerized control plane components enable faster deployment, updates, and rollbacks, contributing to improved agility and responsiveness in managing the Kubernetes cluster.

Why not use EKS?

Elastic Kubernetes Service (EKS) is the managed Kubernetes service offered by AWS. The Control Plane is managed by Amazon on your behalf: it is kept highly available and receives automatic updates and scaling, making it easier for developers to focus on building and deploying applications rather than managing the Kubernetes infrastructure.

Even though EKS is one of the most used managed Kubernetes services, it brings some pitfalls:

  • Feature lag: as a managed service, EKS has seen delays in supporting the latest Kubernetes alpha features, pushing some users to prefer managing their own clusters to have more control over updates and feature adoption.

  • Hard setup: although a managed service, EKS only seems trivial at first sight thanks to its seamless integration with AWS services. In practice, it requires an opinionated approach with several integrations: even seasoned Kubernetes experts can face AWS fatigue while wiring it up with the rest of the AWS ecosystem.

  • Extended support: AWS recently extended the support window for each deployed Kubernetes minor version to a total of 26 months. After the standard support period, your organization is forced to update the Kubernetes version, unless it incurs an additional charge for Extended Support, whose pricing is not yet publicly available.

  • Third-party CNIs break AWS support: AWS offers top-tier support for its services, and EKS is one of them. However, if you’re planning to drop in a third-party CNI, such as Calico or Cilium, the cluster can no longer be validated by AWS Support, leaving you without the professional, dedicated, and highly-priced technical support you’re paying for.

  • No sleep mode: although pretty cheap, EKS doesn’t support a sleep mode to turn off the Control Plane when it’s not needed, unlike other managed services such as Azure Kubernetes Service (AKS).

  • Single VPC: at the time of writing, EKS supports joining compute nodes from a single Virtual Private Cloud (VPC), which can lead to IP exhaustion issues in certain circumstances.

EKS offers a very low-friction day-to-day experience, and it’s a perfect solution for companies that don’t have dedicated staff for managing Kubernetes. However, if you’re going to run Kubernetes at scale and your business is entirely based on Kubernetes, you could stumble upon limitations and frictions that common users never run into.

Installing Kamaji on AWS

Given the potential limitations of EKS, organizations can take full advantage of Kamaji to provide an EKS-like service which is:

  • Based on upstream Kubernetes

  • Fully under your control, down to the Control Plane itself, e.g. by customising the API Server flags (see the sketch right after this list)

  • Kubernetes-native, by leveraging Custom Resource Definitions

  • Fast to onboard, with Control Planes offered by Kamaji able to spin up in a matter of seconds (your mileage may vary)
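
As a hint of what this control looks like, the fragment below shows how additional API Server flags could be passed to a Tenant Control Plane; the extraArgs stanza is based on our reading of the TenantControlPlane API and may differ across Kamaji versions, so treat it as an illustrative fragment to merge into a full manifest like the ones shown later in this post.

# Fragment of a TenantControlPlane spec (field names assumed, double-check
# against your Kamaji version): extra flags for the tenant kube-apiserver pods.
spec:
  controlPlane:
    deployment:
      extraArgs:
        apiServer:
        - --enable-admission-plugins=NodeRestriction,PodNodeSelector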

For our purpose, you can still create the Management Cluster required by Kamaji on AWS: either a Kubernetes cluster provisioned with the Cluster API AWS Infrastructure Provider, which creates EC2 instances, or an EKS cluster, according to your needs.

Once the Management Cluster is provisioned and ready, Kamaji can be installed by using the Helm procedure: we take for granted that a valid CSI is already available, since it’s required by the Datastores where the Tenant Control Planes’ state is stored. Otherwise, you can install the Amazon EBS CSI driver by following the official documentation.
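
As a reference, a minimal Helm-based setup could look like the following sketch; the chart names come from the upstream repositories, but release names, namespaces, and any additional values (e.g. the IAM permissions required by the EBS CSI driver) are assumptions to adapt to your environment.

# Install the Amazon EBS CSI driver, if no CSI is available yet
# (it also requires the proper IAM permissions on the nodes).
helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system

# Install Kamaji from the CLASTIX Helm repository.
helm repo add clastix https://clastix.github.io/charts
helm upgrade --install kamaji clastix/kamaji \
  --namespace kamaji-system --create-namespace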

$: kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
etcd-0                    1/1     Running   0          68d
etcd-1                    1/1     Running   0          68d
etcd-2                    1/1     Running   0          68d
kamaji-55bf4f7d49-jt6wq   1/1     Running   0          25d

$: kubectl get crds
NAME                                    CREATED AT
datastores.kamaji.clastix.io            2023-11-11T17:13:00Z
tenantcontrolplanes.kamaji.clastix.io   2023-11-11T17:13:00Z

$: kubectl get datastores.kamaji.clastix.io
NAME      DRIVER   AGE
default   etcd     68d

AWS Load Balancer CNAME issue

The Kubernetes API Server announces itself using a set of IP addresses. If you’re using a kube-proxy-based networking solution, these IPs are programmed into the iptables chains (or IPVS) to resolve the kubernetes.default.svc endpoint.

As stated in several blog posts and the official documentation, Kamaji implements the Hosted Control Plane architecture, where Tenant Control Planes run in the Management Cluster as pods: although Kamaji supports exposing the Tenant Control Plane API Server through an Ingress Controller, worker nodes still need an IP address to reach it. This can be achieved by using a LoadBalancer Service which, thanks to a Cloud Controller, can provision an IP address that worker nodes and users can use to interact with the cluster.

Kubernetes clusters deployed on AWS offer the integration of the AWS Cloud Provider, allowing the creation of ALB and NLB instances which are exposed through a Fully Qualified Domain Name (FQDN) reported back in the Service status subresource. A valid IP can be resolved for the given domain; however, these IPs are not fixed and could change under certain circumstances, such as during Disaster Recovery, migrations, or re-creation after an accidental deletion.
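
You can see the issue by inspecting the status of any LoadBalancer Service managed by the AWS controllers and resolving the reported hostname; the Service name and the placeholders below are purely illustrative.

# The load balancer is reported as a hostname in the Service status, not as an IP.
kubectl get service <your-loadbalancer-service> \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'

# Resolving the hostname returns one or more IPs, but they are not guaranteed
# to stay the same for the whole life of the load balancer.
dig +short <reported-hostname>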

To overcome this limitation, a LoadBalancer Service can use an Elastic IP, a static and public IPv4 address designed for dynamic cloud computing: thanks to it, we can determine in advance the IP the Tenant Control Plane will use to announce itself, and that the worker nodes will then use to reach it.

To let Tenant Control Planes work on AWS, the management cluster must run the AWS Load Balancer Controller, available in the AWS EKS Helm Chart repository: it can be installed by following the official documentation and by setting up the proper IAM authentication.
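
For reference, the Helm-based installation could look like the sketch below; the cluster name and the pre-created IAM service account are assumptions you need to adapt by following the official documentation.

helm repo add eks https://aws.github.io/eks-charts
helm upgrade --install aws-load-balancer-controller eks/aws-load-balancer-controller \
  --namespace kube-system \
  --set clusterName=<your-management-cluster> \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller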

Creating the required AWS resources

Before creating a Tenant Control Plane, we must pre-create some resources and know a few identifiers in advance; the example commands after the list show how to retrieve them with the AWS CLI.

  • Elastic IP: this is going to be our public and fixed IPv4 address that will be used to interact with the Kubernetes cluster.

  • Elastic IP allocation ID: the unique identifier of our IPv4 address, associated with the subnet where we’d like to deploy our Tenant Control Plane pods.

  • Subnet: the AWS Subnet where the Tenant Control Plane pods will be deployed.
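
# Allocate a new Elastic IP: note the AllocationId (eipalloc-...) and the PublicIp.
aws ec2 allocate-address --domain vpc --region eu-central-1

# Look up the subnet of the Availability Zone where the control plane pods will run.
aws ec2 describe-subnets \
  --region eu-central-1 \
  --filters "Name=availability-zone,Values=eu-central-1a" \
  --query "Subnets[].{SubnetId:SubnetId,AZ:AvailabilityZone,CIDR:CidrBlock}" \
  --output table

The region and Availability Zone above are illustrative and must match your environment.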

An example manifest of the control plane could be something like this:

apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  name: my-aws-control-plane
  namespace: default
spec:
  controlPlane:
    deployment:
      replicas: 2
      nodeSelector:
        topology.kubernetes.io/zone: eu-central-1a
    service:
      additionalMetadata:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
          service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-10c14e4290f602421
          service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
          service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-284a0230e20ffd77e
          service.beta.kubernetes.io/aws-load-balancer-type: nlb
      serviceType: LoadBalancer
    version: v1.29.0
  networkProfile:
    address: 18.194.235.159

Your mileage may vary according to your environment, especially in the following fields:

  • The key spec.controlPlane.deployment.nodeSelector must refer to the Availability Zone of the subnet where you’d like to deploy your control plane pods

  • All the mentioned Tenant Control Plane Service annotations are required, and the following values depend on your environment:

    • aws-load-balancer-eip-allocations: the Elastic IP allocation ID you can retrieve from the Elastic IP addresses page in your EC2 Network & Security section

    • aws-load-balancer-subnets: the subnet ID; it must live in the Availability Zone set in the node selector mentioned earlier

  • Finally, you can set the Elastic IP value as spec.networkProfile.address: unfortunately, the AWS Load Balancer Controller is not able to report back the Elastic IP assigned to the Service, so this value must be user-provided.

We’re taking for granted that the Tenant Control Plane will be reachable externally, both because the worker nodes may not live in the same subnets and to keep the guide straightforward: this is the reason for the internet-facing Load Balancer scheme value.

Furthermore, a Network Load Balancer is used since the TLS termination is handled by the Tenant Control Plane pods: Kamaji also supports exposing Tenant Control Planes through an Ingress Controller but, again, for the sake of simplicity, we’re illustrating the simpler approach.
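
Once the manifest is saved, it can be applied to the management cluster, and both the Tenant Control Plane and the Service created for it can be checked; the resource names follow the TenantControlPlane name, while the output columns may vary with the Kamaji version.

# Create the Tenant Control Plane in the management cluster.
kubectl apply -f my-aws-control-plane.yaml

# Check the Tenant Control Plane and the LoadBalancer Service created for it.
kubectl get tenantcontrolplanes.kamaji.clastix.io my-aws-control-plane
kubectl get service my-aws-control-plane -o wide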

Creating and joining tenant worker nodes

Finally, once our Tenant Control Plane is ready, the admin kubeconfig can be retrieved from the management cluster: it’s available in the Secret named my-aws-control-plane-admin-kubeconfig.

The following command can be used to simplify the retrieval, although CLASTIX developed a comfy dashboard to manage your Tenant Control Plane instances: kubectl get secret my-aws-control-plane-admin-kubeconfig -o jsonpath='{.data.admin\.conf}' | base64 -d.

The retrieved kubeconfig must then be used to generate a kubeadm token that our worker nodes will use to perform the TLS bootstrap and join the cluster: the join command can be generated by issuing KUBECONFIG=/path/to/my-aws-control-plane-kubeconfig kubeadm token create --print-join-command.

The resulting output should look similar to this: kubeadm join 18.194.235.159 --token 7yh9t9.avjqzjekax6qcyy1 --discovery-token-ca-cert-hash sha256:38843ea0461b28f45228ccbfb0674b3593fe459dcd14df73548afe72dec1c09. If you want to know more about how nodes join thanks to kubeadm, the official Kubernetes documentation is a good place to start.
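
Putting the two steps together, the kubeconfig retrieval and the token generation could look like this:

# Save the admin kubeconfig of the Tenant Control Plane to a local file.
kubectl get secret my-aws-control-plane-admin-kubeconfig \
  -o jsonpath='{.data.admin\.conf}' | base64 -d > my-aws-control-plane-kubeconfig

# Generate the join command the worker nodes will use for the TLS bootstrap.
KUBECONFIG=my-aws-control-plane-kubeconfig kubeadm token create --print-join-command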

It’s time to create our EC2 instances, which require some preconfigured packages and dependencies: we can easily rely on the Cluster API AWS AMIs, the golden images used by the official Cluster API AWS Infrastructure Provider. You can retrieve the available AMIs with the following command, replacing the placeholders: clusterawsadm ami list --kubernetes-version v1.29.0 --region eu-central-1 --os ubuntu-20.04.

Once we have retrieved our AMI, it’s finally time to create the EC2 instances: you can pick your preferred strategy, such as relying on an AWS Auto Scaling Group, CloudFormation, or bare EC2 instances. Once the instances are provisioned and ready, the generated kubeadm join command must be issued on all the nodes that will form the cluster. If you want some fancy automation, the command can be put in the user data, making it run at first boot.
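
A minimal sketch of the bare EC2 approach could look like the following; the AMI comes from the clusterawsadm output, while the instance type, count, and region are illustrative values to adapt.

# Write the join command generated earlier as user data, so it runs at first boot.
cat > join.sh <<'EOF'
#!/bin/bash
<paste here the kubeadm join command printed by kubeadm token create>
EOF

# Launch the worker nodes from the Cluster API AWS AMI.
aws ec2 run-instances \
  --region eu-central-1 \
  --image-id <ami-id-from-clusterawsadm> \
  --instance-type t3.medium \
  --count 3 \
  --subnet-id subnet-284a0230e20ffd77e \
  --user-data file://join.sh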

Now it’s just a matter of time, and by using the admin kubeconfig, you’ll see your nodes have joined the Tenant Control Plane!

NAME            STATUS   ROLES    AGE   VERSION
ip-10-0-100-1   Ready    <none>   8h    v1.29.0
ip-10-0-100-2   Ready    <none>   8h    v1.29.0
ip-10-0-100-3   Ready    <none>   8h    v1.29.0

Finally, it’s time for your own integrations, such as using Karpenter to dynamically allocate nodes and optimize your AWS bill, as well as taking full advantage of the sleep mode offered by Kamaji, which turns off the Control Plane and brings it back in a matter of seconds!
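
Since sleep mode boils down to scaling the Tenant Control Plane pods to zero, one way to toggle it, assuming the replicas field can be patched directly, is the following:

# Put the Tenant Control Plane to sleep by scaling its pods down to zero...
kubectl patch tenantcontrolplanes.kamaji.clastix.io my-aws-control-plane \
  --type merge -p '{"spec":{"controlPlane":{"deployment":{"replicas":0}}}}'

# ...and wake it up again when needed.
kubectl patch tenantcontrolplanes.kamaji.clastix.io my-aws-control-plane \
  --type merge -p '{"spec":{"controlPlane":{"deployment":{"replicas":2}}}}'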

High Availability with Multiple Subnets

The guide above showed how to provision Tenant Control Planes deployed in a single subnet: this is not a suggested approach, since a subnet spans a single Availability Zone and doesn’t protect you from zone outages.

Although incurring higher costs, a Tenant Control Plane can be spread across multiple Availability Zones by leveraging Pod Topology Spread Constraints: the resulting pods can be customized by modifying the Tenant Control Plane specification, especially the spec.controlPlane.deployment.topologySpreadConstraints key.

At the same time, an Elastic IP can be associated with one and only one subnet, requiring the same number of allocations in the Tenant Control Plane Service annotation key (service.beta.kubernetes.io/aws-load-balancer-eip-allocations) as the list of subnets (service.beta.kubernetes.io/aws-load-balancer-subnets). The additional IPs must also be added to the certificate Subject Alternative Name (SAN) list, available in the Tenant Control Plane specification key spec.networkProfile.certSANs.

As a result, your manifest could look like the following:

apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  name: my-aws-control-plane
  namespace: default
spec:
  controlPlane:
    deployment:
      replicas: 2
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: "topology.kubernetes.io/zone"
        whenUnsatisfiable: ScheduleAnyway
      - maxSkew: 1
        topologyKey: "kubernetes.io/hostname"
        whenUnsatisfiable: ScheduleAnyway
    service:
      additionalMetadata:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
          service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-10c14e4290f602421,eipalloc-1400292124ec6f410
          service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
          service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-284a0230e20ffd77e,subnet-7d22fee80a742030f
          service.beta.kubernetes.io/aws-load-balancer-type: nlb
      serviceType: LoadBalancer
    version: v1.29.0
  networkProfile:
    certSANs:
    - 81.91.32.119
    address: 18.194.235.159

These are just simple and ready-to-use guidelines: there are more advanced configurations to optimise traffic and decrease cost exposure, especially regarding multi-AZ setups, as well as to protect the cluster endpoint with a private cluster. CLASTIX can help you with professional services, according to your business requirements.

Cluster API support

Especially for the worker node join process, the approach can be cumbersome and error-prone given the lack of proper automation. As part of CLASTIX's commitment to Open Source, we’re already contributing upstream to provide the ability to provision Kamaji-backed Control Planes and Worker Nodes on AWS by using the Cluster API AWS Infrastructure Provider, which offers several interesting features concerning multi-tenancy.

This feature will provide seamless integration between Cluster API AWS-based resources and Kamaji Tenant Control Planes, and it will be available by declaring the Control Plane as a KamajiControlPlane, as you can already do with the other supported Infrastructure Providers.
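
While the integration is still in progress, declaring a Kamaji-backed control plane through Cluster API could look roughly like the fragment below; the API groups, versions, and fields are assumptions based on the ongoing upstream work and may change before the release.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-aws-cluster
spec:
  controlPlaneRef:
    # The Control Plane is declared as a KamajiControlPlane instead of a
    # KubeadmControlPlane (kind and apiVersion are assumptions, see above).
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
    kind: KamajiControlPlane
    name: my-aws-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: my-aws-cluster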

Conclusion

In this article, you learned that Kamaji allows you to overcome AWS EKS limitations by creating a Managed Kubernetes Service which is totally under your control. After provisioning the required resources, such as Elastic IPs, you can create any number of Tenant Control Planes, which can be consumed by your worker nodes that can be placed in other AWS VPCs or even AWS accounts.

Tenant Control Planes have a number of benefits, including a declarative and Kubernetes-native approach thanks to the Custom Resource Definition model, an auto-pilot mode since they run as Pods in a management cluster, scale-to-zero to achieve sleep mode, and upstream bootstrap support via kubeadm. Check out the documentation to learn more!