CLASTIX is evangelizing the benefits of addressing multi-tenancy on Kubernetes the right way, and we're drafting the first-ever State of the Multi-Tenancy report.
So far, we've had the chance to talk to many companies running up to thousands of containers on Kubernetes and looking to solve their governance issues with multi-tenancy. Before going further, though, let's clarify what multi-tenancy actually is.
In its most straightforward definition, multi-tenancy is a way to use shared infrastructure to run isolated workloads owned by tenants. In the Kubernetes landscape, you might think of the Namespace resource, which groups workloads under a logical partition of your cluster, but the reality is way more complex.
Each Tenant could need more than a single Namespace for various reasons, such as sub-departments, sub-tenants, or other application or environment constraints: tl;dr, your mileage may vary.
When your multi-tenancy model needs more than one Namespace, Kubernetes has no native answer, because its multi-tenancy primitives are scoped to a single Namespace: Resource Quota cannot set a hard cap on CPU, memory, or storage for a whole Tenant; the Kubernetes API Server has no ACL support, so a tenant owner cannot even list just their own Namespaces; and there is no policy enforcement to stop one Tenant from consuming resources allocated to another.
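To make the first limitation concrete, here is a plain ResourceQuota: it is a namespaced object, so the hard caps below apply only to the single Namespace it lives in (the Namespace name is just an example), and upstream Kubernetes offers no way to aggregate such a quota across all the Namespaces owned by the same Tenant.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: us-east-apps   # the quota is enforced only inside this single Namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi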
The legacy approach to these limitations has been to shift multi-tenancy to the infrastructure level, spinning up a new cluster for each tenant: and this is where cluster sprawl begins. Your organization ends up managing multiple clusters with all the consequences: higher costs, a governance model tied to the underlying infrastructure, the operational burden of each cluster's life-cycle, and the struggle to keep a single pane of glass across all of them.
Last but not least, who's going to manage this fleet of clusters? You have to choose between staffing an SRE team for each tenant, despite the shortage of highly skilled professionals, or having a single SRE team that has to cope with dozens, hundreds, or thousands of clusters.
This is a common situation we've observed in many large organizations, especially in the finance sector, where various departments act as smaller organizations of their own and need a way to run their workloads without worrying about the underlying infrastructure.
According to the latest reports, AWS EKS seems to be the most used service to run Kubernetes in the cloud, so this reference architecture focuses on that managed service to keep things simple.
The question is: how can Kubernetes play a role here despite the problems stated above? The answer is Capsule, our flagship open-source project aimed at making multi-tenancy easy.
Installing Capsule on AWS EKS is straightforward, as Dario Tranchitella (Technical Advisor) shows in this YouTube video: give it a try and reach out to him directly with any questions or issues!
In Capsule jargon, a Cluster Administrator can enable multi-tenancy on any Kubernetes cluster by defining a set of tenants, as in the following manifest.
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: us-east-branch
spec:
  owners:
  - kind: User
    name: alice@bank.us
  - kind: Group
    name: us-east@branch.bank.us
As a simple starting point, Alice, and all the employees of the US east branch, will be allowed to create a set of Namespaces in the Kubernetes cluster bound to the tenant, creating a sort of hierarchical ownership.
What happens if Bob, or any other member of another branch, tries to create a Namespace without being entitled by a Tenant definition? Simply, the creation is rejected: no tenant, no party.
The self-service provisioning of Namespaces, within the tenant's boundary, is the starting point for your multi-tenant Kubernetes platform: let's see the next steps to provide further isolation, security, and resource management to your tenants.
The self-service capability is the first milestone for your internal developer platform based on Kubernetes, but it's not enough. Granting unlimited permissions, without proper enforcement, leads to potential issues and privilege escalations, as well as missing constraints on cost allocation: this is where governance enters the multi-tenancy picture.
As the financial sector operates on sensitive data, you have to adopt a security-first approach even though tenants are part of the same organization. Kubernetes distributes the cluster workload across a set of worker nodes, so a malicious workload could be placed on the same node where a sensitive one is processing information.
The first step is to enforce the Pod Security Standards on all Namespace resources: the Pod Security Admission controller in Kubernetes reads a set of labels on each Namespace and blocks workloads that don't match the selected security profile. Capsule can label all the Namespace objects belonging to a Tenant with these values, constantly reconciling them and blocking any update or removal.
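As a minimal sketch of what this looks like, the Tenant below asks Capsule to stamp, and keep reconciling, the Pod Security Admission labels on every Namespace belonging to the tenant; the namespaceOptions.additionalMetadata field reflects our reading of the v1beta2 Tenant CRD, so double-check it against the Capsule reference for your version.

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: us-east-branch
spec:
  owners:
  - kind: User
    name: alice@bank.us
  namespaceOptions:
    additionalMetadata:
      labels:
        # Pod Security Admission: reject any workload not matching the "restricted" profile
        pod-security.kubernetes.io/enforce: restricted
        pod-security.kubernetes.io/enforce-version: latest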
If running workloads on a shared node pool is not good enough and makes your CISO nervous, the PodNodeSelector admission controller can force the scheduler to place Pods (thus, workloads) on a dedicated node pool: as with the Pod Security labels, Capsule enforces this behaviour by preventing changes to the required fields in the Namespace definition.
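The node isolation can be sketched the same way, assuming the PodNodeSelector admission controller is enabled on the API server and that the dedicated node pool carries a label such as the hypothetical pool=us-east-branch below: Capsule propagates the well-known scheduler.alpha.kubernetes.io/node-selector annotation to every tenant Namespace (the fragment extends the spec of the Tenant shown above), and tenant owners cannot remove it.

spec:
  namespaceOptions:
    additionalMetadata:
      annotations:
        # PodNodeSelector: every Pod in the tenant Namespaces is forced onto the dedicated node pool
        scheduler.alpha.kubernetes.io/node-selector: pool=us-east-branch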
In a software-defined world such as Kubernetes, workloads can communicate with each other, a nightmare in regulated environments where every service must be surveyed and approved by the networking department. Luckily, Kubernetes offers the NetworkPolicy API, which governs the egress and ingress connectivity of the workloads deployed in a Namespace. However, unless you rely on third-party vendor solutions or hack the Kubernetes RBAC, these policies can be deleted by the Tenant owners, who could then exploit the missing network isolation to attack other workloads, or the Kubernetes API server itself, leading to a denial of service. Capsule lets you define a set of NetworkPolicies that are replicated across all the tenant Namespaces and treated as tenant-scoped, blocking any update or delete operation by unauthorized users. Tenant owners can still create their own application-scoped NetworkPolicy resources, without compromising the expected developer experience.
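As an illustrative sketch, assuming the networkPolicies.items field of the v1beta2 Tenant spec, the fragment below replicates a same-Namespace-only ingress policy into every tenant Namespace: workloads can talk to their neighbours inside the Namespace, everything else is denied, and only the Cluster Administrator can change or remove it.

spec:
  networkPolicies:
    items:
    - policyTypes:
      - Ingress
      podSelector: {}          # selects every Pod in each tenant Namespace
      ingress:
      - from:
        - podSelector: {}      # only traffic originating from the same Namespace is allowed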
Workloads in Kubernetes can be exposed in different ways: on EKS you can ask for a LoadBalancer, or bind to a worker node port. How can you control which services may request a new Load Balancer (which has a billing cost, by the way) or bind to a specific port? Your organization needs to tame the networking exposure of tenant services by blocking NodePort and LoadBalancer services and only allowing Services to be exposed through a specific Ingress Controller: Capsule covers all of these requirements with fine-grained, per-tenant control over the Service types and the Ingress Controller class a tenant can use. Also, to prevent traffic sniffing, Capsule can enforce the uniqueness of the hostnames your services accept traffic for, as well as define the set of valid hostnames a tenant can use. If your workload is not HTTP but bare TCP/UDP, Kubernetes allows you to expose it using external IPs; great, but you cannot define which IPs a tenant can use, unless you're using Capsule!
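Putting the exposure rules together on the same Tenant could look like the sketch below; the serviceOptions and ingressOptions field names, the nginx IngressClass, the hostname pattern, and the external IP range are all assumptions for illustration, so validate them against the Capsule Tenant reference for your version.

spec:
  serviceOptions:
    allowedServices:
      nodePort: false          # no direct binding to worker node ports
      loadBalancer: false      # no per-service AWS load balancers billed to the tenant
      externalName: true
    externalIPs:
      allowed:
      - 10.20.0.0/16           # example range a tenant may use for bare TCP/UDP exposure
  ingressOptions:
    allowedClasses:
      allowed:
      - nginx                  # the only IngressClass the tenant may reference
    allowedHostnames:
      allowedRegex: ".*\\.us-east\\.bank\\.us$"   # hostnames must stay in the branch's domain
    hostnameCollisionScope: Cluster               # reject hostnames already claimed elsewhere in the cluster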
Workloads play an important role, that's true, but how can you trust them? You should trust only a specific set of registries, avoiding public images that could bring crypto-mining or backdoored container images able to compromise your cluster: remember, security first! Capsule offers multi-tenancy with a security-first approach, letting you define a set of trusted container registries on a per-Tenant basis and preventing untrusted images from running.
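A sketch of the registry allow-list, again using the field names as we read them in the v1beta2 CRD and a hypothetical internal registry name:

spec:
  containerRegistries:
    allowed:
    - registry.internal.bank.us    # hypothetical private registry trusted by the organization
    # any Pod referencing an image outside the allowed registries is rejected at admission time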
A final word must be said about quota enforcement: as mentioned earlier, defining cross-namespace quotas, such as the maximum number of Pods or amount of CPU, memory, or storage, is not possible in vanilla Kubernetes. This is what pushed organizations towards a dedicated cluster for each tenant, the cluster sprawl phenomenon, and the burnout of many SREs maintaining those installations. Capsule lets you define resource quotas at the tenant scope, without introducing any additional plugin and leveraging the upstream Kubernetes API types: your SREs will love it, as well as your CFO.
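A tenant-scoped quota could be sketched as follows; with scope: Tenant the hard limits are shared across all the tenant's Namespaces and Capsule keeps the per-Namespace ResourceQuota objects in sync. The scope and items field names come from our reading of the v1beta2 CRD, and the figures are purely illustrative.

spec:
  resourceQuotas:
    scope: Tenant              # the caps below apply to the sum of all the tenant's Namespaces
    items:
    - hard:
        pods: "200"
        requests.cpu: "32"
        requests.memory: 64Gi
    - hard:
        requests.storage: 500Gi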
At CLASTIX, we've been developers first and foremost, and we love the CLI as much as our mechanical keyboards. Using a shared cluster allows for optimising resources and costs, and tames governance by enforcing policies from a single entry point. However, short of rewriting Kubernetes itself, the missing ACL support for retrieving just your tenant resources, such as Namespaces, PersistentVolumes, or the cluster-scoped resources you're entitled to use like Nodes, PriorityClasses, RuntimeClasses, IngressClasses, or StorageClasses, remains unresolved.
Well, it isn't. Please, let me introduce Capsule Proxy, which automatically filters these resources and gives CLI maniacs the same Kubernetes experience they expect.
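To give a flavour of it, a tenant owner's kubeconfig simply points at the capsule-proxy endpoint instead of the raw EKS API server; the Service name, namespace, and port below are the defaults we are assuming from a standard in-cluster installation, so adjust them (or use your externally exposed endpoint) to match your setup.

apiVersion: v1
kind: Config
clusters:
- name: eks-via-capsule-proxy
  cluster:
    # assumed default capsule-proxy Service endpoint, not the EKS API server URL
    server: https://capsule-proxy.capsule-system.svc:9001
    certificate-authority-data: <base64-encoded-CA>
contexts:
- name: alice@us-east-branch
  context:
    cluster: eks-via-capsule-proxy
    user: alice
current-context: alice@us-east-branch
users:
- name: alice
  user:
    token: <alice-bearer-token>

With this in place, a plain kubectl get namespaces returns only the Namespaces belonging to Alice's tenant.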
Multi-tenancy is getting more and more interest from large organizations that need to optimize costs, share a homogeneous infrastructure across multiple actors, and ensure security compliance at different levels: banks, pension funds, insurance agencies, and trading platforms, just to name the biggest beneficiaries.
A typical organization could have up to a hundred tenants, although your mileage may vary, and the hardware savings can be terrific: with a Capsule cluster your organization manages a single AWS EKS cluster rather than 100, with roughly 90% cost savings.
Besides the governance benefits, resource optimization, and bare financial savings, I want to shine a light on the real operational saving: people. More clusters require more people, and unfortunately we're not living in an era of abundant, highly skilled technicians. A multi-tenant Kubernetes cluster allows you to serve up to a hundred, or even a thousand, Tenants, since enforcement, governance, and security are offloaded to Capsule and its policy engine, giving your SRE people some chill time.
Well, it depends. Despite the enormous benefits of multi-tenancy, some questions still have to be solved, such as logging, metrics, monitoring, and CI/CD. Tenants want to take full advantage of Kubernetes itself, and your organization must be proactive in shaping a proper Internal Developer Platform to empower what developers do best: building and operating your organization's software.
Dear readers, we hope you found this article useful and that it shed some light on an area that would benefit greatly from the adoption of Kubernetes and the wise use of multi-tenancy; to finish, we also want to add some pragmatism and an action item for you, if you're interested in following this path.
If you want to know more, interact with us by following our LinkedIn, Twitter and YouTube profiles. Join our open community on MeetUp and chat with us on Slack.
CLASTIX is the commercial company behind Capsule, and one of the leaders in multi-tenancy Kubernetes solutions. We’d be happy to get in touch with your organization for an assessment and to present our commercial solutions to address all your needs, as well as offer technical support for your production usage.