Customer Spotlight: How Dinova Built a Scalable and Resilient Managed Kubernetes Service
Cloud Service Providers (CSPs) and Managed Service Providers (MSPs) face considerable complexity in delivering secure, scalable Kubernetes to their customers. Managing dedicated tenant clusters while ensuring resource consistency and operational transparency is a significant challenge.
Clastix.io, the company behind Kamaji, an innovative Control Plane Manager for Kubernetes, partnered with Dinova, an Italian company specialised in digital innovation through AI, data, cloud, and cybersecurity. Together they built a robust solution that significantly improves how Dinova deploys and manages its Kubernetes clusters. In this post, we’ll explore how Sveltos and Kamaji enabled Dinova to efficiently deploy dedicated Kubernetes clusters, transparently provision nodes and resources, and maintain infrastructure integrity at scale.
Cloud providers such as Dinova serve a diverse range of customers, each requiring isolated Kubernetes clusters. These clusters need to be:
Isolated: Each customer (tenant) must have a dedicated cluster with no shared resources.
Transparent: Provisioning and management of nodes and resources should be seamless for the customer.
Resilient: Critical infrastructure resources must remain intact, even when customers make changes to their clusters.
Scalable: Deploying and managing numerous clusters efficiently is crucial.
Most traditional Kubernetes management solutions are not suited to cloud providers. They lack the features these requirements demand: keeping resources consistent without disrupting customers.
To address these challenges, Dinova adopted Kamaji for control plane management and Sveltos for resource provisioning and drift detection.
Kamaji is a Kubernetes Control Plane Manager that lets service providers run tenant cluster control planes as pods within a central management cluster, rather than on dedicated machines for each tenant. This approach is often called a "Hosted Control Plane". It is designed specifically for CSPs and MSPs and is based on the same principles the large hyperscalers use. Compared with more traditional solutions, it offers several benefits:
Cost Efficiency: By running control planes as pods instead of virtual machines, it significantly reduces the infrastructure footprint. Kamaji eliminates the need for separate control plane nodes for each tenant cluster, saving on compute resources.
Scalability: Kamaji makes it easy to deploy hundreds or thousands of tenant clusters. For example, benchmarks show that Kamaji can provision a control plane in under 16 seconds and reconcile 100 control planes in less than 150 seconds.
Isolation: Each tenant’s control plane runs in its own isolated namespace within the management cluster, ensuring hard multi-tenancy. Tenants consume their control plane as a service, without directly accessing the running pods. As a result, no tenant has visibility into, or access to, another tenant’s cluster.
Dinova installed Kamaji via Helm, following the official documentation, and configured it to use a dedicated datastore for the state of each tenant cluster. For each customer, they create a TenantControlPlane resource, which Kamaji uses to spin up one or more dedicated control plane pods. The worker nodes for each tenant cluster run on separate virtual machines or bare-metal instances in Dinova’s infrastructure and are then joined to this control plane.
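To make the per-customer step concrete, here is a minimal Python sketch that builds a TenantControlPlane manifest as a dictionary and prints it as JSON. kubectl accepts JSON just like YAML. The field names reflect our understanding of Kamaji’s kamaji.clastix.io/v1alpha1 examples and should be checked against the CRD of the installed release. The tenant-<customer> naming scheme, the replica count, the datastore name, and the Kubernetes version are all hypothetical.

```python
import json


def tenant_control_plane(customer: str, k8s_version: str = "v1.30.0",
                         replicas: int = 2, datastore: str = "default") -> dict:
    """Build a TenantControlPlane manifest for one customer.

    Assumptions: field names follow the kamaji.clastix.io/v1alpha1 examples and
    should be verified against the installed CRD; the "tenant-<customer>"
    naming scheme and the default values are hypothetical.
    """
    name = f"tenant-{customer}"
    return {
        "apiVersion": "kamaji.clastix.io/v1alpha1",
        "kind": "TenantControlPlane",
        "metadata": {
            "name": name,
            # One namespace per tenant inside the management cluster.
            "namespace": name,
            "labels": {"customer": customer},
        },
        "spec": {
            # Name of the Kamaji DataStore that holds this tenant's state.
            "dataStore": datastore,
            "controlPlane": {
                # Number of control plane pods Kamaji runs for this tenant.
                "deployment": {"replicas": replicas},
                # How the tenant API server is exposed to worker nodes and users.
                "service": {"serviceType": "LoadBalancer"},
            },
            "kubernetes": {"version": k8s_version},
            "networkProfile": {"port": 6443},
        },
    }


if __name__ == "__main__":
    # Usage: python tcp.py | kubectl apply -f -
    print(json.dumps(tenant_control_plane("acme"), indent=2))
```

Generating manifests from code keeps the per-customer values in one place. This makes it easy to stamp out many tenants consistently.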
This setup allows them to provide each customer with a fully conformant Kubernetes cluster that feels like a dedicated instance, while they manage the control planes efficiently behind the scenes.
While Kamaji handles the control planes, Sveltos manages the provisioning of infrastructure resources across the tenant clusters. Sveltos is an open source Kubernetes add-on controller that runs in the management cluster and deploys resources to managed clusters (in their case, the tenant clusters created by Kamaji). Here’s how they use it:
Transparent Provisioning: Sveltos lets them programmatically deploy Kubernetes resources (e.g., add-ons, applications, or infrastructure components) to each tenant cluster without exposing the process to the customer. They define resources as Helm charts, raw YAML, or Kustomize manifests in the management cluster, and Sveltos ensures they are deployed to the appropriate tenant clusters.
Cluster Selection: Sveltos uses ClusterSelector labels to target specific tenant clusters. For example, they label each tenant cluster with metadata like customer=acme or env=production, and Sveltos deploys resources only to clusters matching those labels. This ensures that each customer gets the right set of resources tailored to their needs.
Integration with Cluster API: Kamaji integrates with Cluster API as a control plane provider. Dinova therefore pairs Cluster API, which provisions the worker nodes, with Sveltos, which configures the resulting clusters. For instance, they define MachineDeployment resources in Cluster API to create worker nodes on their infrastructure (e.g., vSphere). Sveltos then ensures that additional resources, such as networking add-ons (e.g., Calico) and storage solutions (e.g., Longhorn), are deployed to those clusters.
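Cluster selection boils down to label matching. The sketch below is not Sveltos’ code. It models the equality-based matchLabels semantics of a Kubernetes label selector over a hypothetical set of tenant clusters. Every key in the selector must be present on the cluster with the same value, and extra cluster labels are ignored.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches(match_labels: Labels, cluster_labels: Labels) -> bool:
    """Equality-based matchLabels: every selector key must be present on the
    cluster with the same value; labels not named in the selector are ignored."""
    return all(cluster_labels.get(k) == v for k, v in match_labels.items())


def select_clusters(match_labels: Labels, clusters: Dict[str, Labels]) -> List[str]:
    """Return (sorted) names of the clusters a selector would target."""
    return sorted(name for name, labels in clusters.items()
                  if matches(match_labels, labels))


# Hypothetical tenant clusters and their labels.
clusters = {
    "acme-prod": {"customer": "acme", "env": "production"},
    "acme-dev": {"customer": "acme", "env": "dev"},
    "globex-prod": {"customer": "globex", "env": "production"},
}

print(select_clusters({"customer": "acme"}, clusters))   # ['acme-dev', 'acme-prod']
print(select_clusters({"env": "production"}, clusters))  # ['acme-prod', 'globex-prod']
print(select_clusters({"customer": "acme", "env": "production"}, clusters))  # ['acme-prod']
```

Combining keys narrows the target set. A per-customer label plus an environment label is therefore enough to address a single tenant cluster.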
By combining Kamaji and Sveltos, Dinova can fully automate the creation and configuration of tenant clusters. The customer simply requests a cluster, and Dinova handles the rest. Kamaji spins up the control plane, Cluster API provisions the worker nodes, and Sveltos deploys the necessary resources, all without the customer needing to interact with the underlying infrastructure.
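The Cluster API side of this flow can be sketched as a function that emits two objects. The first is a Cluster whose control plane is delegated to Kamaji; the second is a MachineDeployment for the worker nodes. The sketch assumes Cluster API v1beta1, the Kamaji control plane provider (KamajiControlPlane), and the vSphere provider. The referenced KamajiControlPlane, VSphereCluster, KubeadmConfigTemplate, and VSphereMachineTemplate objects are not shown, and every name is hypothetical. The customer and env labels on the Cluster are what a Sveltos cluster selector would match on.

```python
import json
from typing import Dict, List

CAPI = "cluster.x-k8s.io/v1beta1"
INFRA = "infrastructure.cluster.x-k8s.io/v1beta1"


def tenant_cluster_manifests(customer: str, env: str, workers: int,
                             k8s_version: str = "v1.30.0") -> List[Dict]:
    """Return a Cluster and a MachineDeployment for one tenant.

    Assumptions: Cluster API v1beta1, the Kamaji control plane provider and
    the vSphere provider; referenced templates are not shown and all names
    are hypothetical.
    """
    name = f"{customer}-{env}"
    cluster = {
        "apiVersion": CAPI,
        "kind": "Cluster",
        "metadata": {
            "name": name,
            # Sveltos cluster selectors match on these labels.
            "labels": {"customer": customer, "env": env},
        },
        "spec": {
            # Kamaji hosts this cluster's control plane as pods.
            "controlPlaneRef": {
                "apiVersion": "controlplane.cluster.x-k8s.io/v1alpha1",
                "kind": "KamajiControlPlane",
                "name": name,
            },
            "infrastructureRef": {"apiVersion": INFRA, "kind": "VSphereCluster", "name": name},
        },
    }
    machine_deployment = {
        "apiVersion": CAPI,
        "kind": "MachineDeployment",
        "metadata": {"name": f"{name}-workers"},
        "spec": {
            "clusterName": name,
            "replicas": workers,  # worker VMs Cluster API creates on vSphere
            # Left empty; Cluster API's defaulting is expected to add the
            # cluster-name label here and on the machine template.
            "selector": {"matchLabels": {}},
            "template": {
                "spec": {
                    "clusterName": name,
                    "version": k8s_version,
                    "bootstrap": {"configRef": {
                        "apiVersion": "bootstrap.cluster.x-k8s.io/v1beta1",
                        "kind": "KubeadmConfigTemplate",
                        "name": f"{name}-workers",
                    }},
                    "infrastructureRef": {
                        "apiVersion": INFRA,
                        "kind": "VSphereMachineTemplate",
                        "name": f"{name}-workers",
                    },
                },
            },
        },
    }
    return [cluster, machine_deployment]


if __name__ == "__main__":
    for manifest in tenant_cluster_manifests("acme", "production", workers=3):
        print(json.dumps(manifest, indent=2))
```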
One of the biggest risks in a managed Kubernetes service is that customers might accidentally delete or modify critical infrastructure resources that the cluster needs to function properly, such as networking components, storage classes, or RBAC policies. To mitigate this risk, Dinova relies heavily on Sveltos’ drift detection, which is enabled by setting syncMode: ContinuousWithDriftDetection.
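A ClusterProfile that turns on this mode might look like the following Python sketch, which emits the manifest as JSON. The field names follow Sveltos’ config.projectsveltos.io/v1beta1 API as we understand it. The following are illustrative assumptions:

- the profile name;
- the tenant-baseline ConfigMap, assumed to hold the StorageClass and RBAC YAML;
- the Calico chart version passed in the usage line.

Pin whatever version you have validated.

```python
import json
from typing import Dict


def baseline_cluster_profile(match_labels: Dict[str, str], calico_version: str) -> dict:
    """Build a Sveltos ClusterProfile for infrastructure every tenant must keep.

    Assumptions: config.projectsveltos.io/v1beta1 field names; the profile
    name and the "tenant-baseline" ConfigMap are hypothetical.
    """
    return {
        "apiVersion": "config.projectsveltos.io/v1beta1",
        "kind": "ClusterProfile",
        "metadata": {"name": "tenant-baseline"},
        "spec": {
            # Target every tenant cluster whose labels match.
            "clusterSelector": {"matchLabels": match_labels},
            # Keep watching deployed resources and re-apply them when they drift.
            "syncMode": "ContinuousWithDriftDetection",
            "helmCharts": [{
                "repositoryURL": "https://docs.tigera.io/calico/charts",
                "repositoryName": "projectcalico",
                "chartName": "projectcalico/tigera-operator",
                "chartVersion": calico_version,
                "releaseName": "calico",
                "releaseNamespace": "tigera-operator",
                "helmChartAction": "Install",
            }],
            # Raw YAML (default StorageClass, admin RBAC) kept in a ConfigMap
            # in the management cluster.
            "policyRefs": [{
                "kind": "ConfigMap",
                "name": "tenant-baseline",
                "namespace": "default",
            }],
        },
    }


if __name__ == "__main__":
    profile = baseline_cluster_profile({"env": "production"}, calico_version="v3.28.0")
    print(json.dumps(profile, indent=2))
```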
Here’s how it works in their setup:
Defining Critical Resources: They use Sveltos’ ClusterProfile resources to define the infrastructure components that must always exist in each tenant cluster. For example, they might specify a ClusterProfile that deploys the Calico CNI, a default StorageClass, and a set of RBAC roles for the customer’s admin user.
Continuous Monitoring: With syncMode: ContinuousWithDriftDetection, Sveltos continuously monitors the state of these resources in each tenant cluster. It compares the actual state of the cluster (what’s currently deployed) with the desired state (what’s defined in the ClusterProfile in the management cluster).
Automatic Correction: If a customer deletes or modifies a critical resource (e.g., they accidentally remove the Calico CNI), Sveltos detects the drift and automatically re-deploys the resource to restore the desired state. This keeps the cluster functional and compliant with Dinova’s standards, even if the customer makes a mistake.
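The comparison at the heart of drift detection can be modelled in a few lines. This is a conceptual sketch, not how Sveltos is implemented; Sveltos relies on its own agent to watch the resources it deployed. Here, resources are keyed by hypothetical (kind, namespace, name) tuples. Anything missing or different from the desired state is re-applied, and resources the profile does not manage, such as the tenant’s own workloads, are left alone.

```python
from typing import Dict, List, Tuple

# A resource is identified by (kind, namespace, name); its spec is a plain dict.
Key = Tuple[str, str, str]
State = Dict[Key, dict]


def detect_drift(desired: State, actual: State) -> Dict[str, List[Key]]:
    """Compare the desired state (from the profile) with a tenant cluster."""
    missing = sorted(k for k in desired if k not in actual)
    modified = sorted(k for k in desired if k in actual and actual[k] != desired[k])
    return {"missing": missing, "modified": modified}


def reconcile(desired: State, actual: State) -> List[Key]:
    """Re-apply every managed resource that drifted and return what was restored.

    Resources outside the desired state (the tenant's own workloads) are
    never touched.
    """
    drift = detect_drift(desired, actual)
    restored = drift["missing"] + drift["modified"]
    for key in restored:
        actual[key] = dict(desired[key])  # overwrite with the desired spec
    return restored


desired = {
    ("DaemonSet", "calico-system", "calico-node"): {"image": "calico/node"},
    ("StorageClass", "", "longhorn"): {"provisioner": "driver.longhorn.io", "default": True},
}
actual = {
    # The customer deleted calico-node and un-defaulted the StorageClass.
    ("StorageClass", "", "longhorn"): {"provisioner": "driver.longhorn.io", "default": False},
    ("Deployment", "shop", "web"): {"replicas": 3},  # tenant workload, unmanaged
}
print(reconcile(desired, actual))
# [('DaemonSet', 'calico-system', 'calico-node'), ('StorageClass', '', 'longhorn')]
```

Limiting the comparison to managed resources is what lets customers keep full control of their own workloads while the infrastructure layer self-heals.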
This feature has been a game-changer for Dinova. It allows them to confidently delegate cluster access to customers without worrying about accidental misconfigurations breaking their clusters. If a customer deletes a critical resource, Sveltos silently restores it, ensuring uninterrupted service.
By combining Kamaji and Sveltos, Dinova achieved several key benefits as a cloud provider:
Operational Efficiency: Kamaji’s hosted control plane approach reduces the cost and complexity of managing tenant clusters at scale; they no longer need dedicated control plane nodes for each customer, which saves on infrastructure costs and simplifies operations.
Transparency: Sveltos provisions resources behind the scenes, giving customers a seamless experience. They get a fully configured Kubernetes cluster without needing to understand or interact with the provisioning process.
Resilience: The ContinuousWithDriftDetection feature ensures that critical infrastructure resources are always present, even if a customer accidentally deletes them. This reduces support tickets and improves reliability.
Scalability: Kamaji’s ability to manage hundreds of control planes and Sveltos’ ability to deploy resources across a fleet of clusters make it easy for Dinova to scale its managed Kubernetes service as its customer base grows.
Security and Isolation: Kamaji’s hard multi-tenancy ensures that each customer’s cluster is fully isolated, with no shared resources. Sveltos’ multi-tenancy features (e.g., ClusterProfile and Profile resources) further support this by allowing them to manage resources for each tenant independently.
While this setup works very well, there are a few challenges to keep in mind:
Management Cluster Reliability: Since both Kamaji and Sveltos run in the management cluster, it’s critical to ensure that the management cluster is highly available and resilient. Use a robust setup with multiple control plane nodes and regular backups to mitigate this risk.
Resource Monitoring: With ContinuousWithDriftDetection, Sveltos can generate additional load on the management cluster as it monitors and reconciles resources across many tenant clusters. Carefully tune the management cluster’s resources to handle this at scale.
Customer Education: Even with a transparent service, customers still need to understand the boundaries of their clusters. For example, they need to know that certain resources (like the CNI) are managed by Sveltos and should not be modified.
At Sveltos we are excited to see how this solution, in collaboration with Clastix.io, supports cloud providers like Dinova in delivering top-tier Managed Kubernetes Services. We encourage you to explore Sveltos and Kamaji to enhance your Kubernetes offerings.