
Bringing the Cloud Experience to the Edge with Kamaji

In this article, we explore how Kamaji, combined with Cluster API and kMetal, enables a powerful alternative for delivering a true managed Kubernetes experience across cloud and edge, with full control, higher resilience, and no vendor lock-in.

Monday, April 6, 2026, by Dario Tranchitella

Cloud providers have spent years refining the managed Kubernetes experience into something that feels almost effortless: users request a cluster, and everything else is handled behind the scenes. Companies such as OVHcloud, IONOS, and Rackspace have successfully delivered this model at scale, largely thanks to architectures built around hosted control planes.

At the centre of many of these implementations is Kamaji, an open-source project developed by CLASTIX that enables running multiple Kubernetes control planes as lightweight workloads inside a management cluster.

However, while this model works extremely well in centralised cloud environments, it becomes significantly more complex when organisations attempt to extend the same experience to the edge.

This raises an important question: how can we bring the same level of simplicity, automation, and reliability to environments that are inherently less predictable than the cloud?

Understanding the Edge

The term edge is often overused, but in practical terms, it refers to computing environments that operate outside traditional cloud data centres, typically closer to where data is generated and consumed. These environments can include manufacturing plants, retail locations, telecommunications infrastructure, hospitals, or transportation systems.

What makes the edge fundamentally different is not just its location, but the constraints under which it operates. Unlike cloud environments, which are designed for homogeneity and abstraction, edge infrastructure is often heterogeneous. Hardware may come from different vendors, be deployed over many years, and vary significantly in capabilities and configuration.

Operational access is another key limitation. In many cases, engineers cannot simply log into a system or replace a component remotely. Physical intervention may be required, which introduces delays and operational overhead. This reality stands in stark contrast to the elasticity and immediacy that characterise cloud environments.

Network conditions further complicate the picture. Edge locations frequently experience intermittent connectivity, reduced bandwidth, and higher latency. In some scenarios, temporary disconnections are not exceptions but expected behaviour.

Finally, physical constraints such as limited space, power, or cooling capacity impose additional design considerations. These constraints force organisations to rethink how infrastructure and control planes are deployed and managed.

The Traditional Approach: Centralised Control Planes

A natural first attempt at extending Kubernetes to the edge is to maintain control planes in the cloud while placing worker nodes at edge locations. On paper, this approach seems attractive because it preserves centralised management and avoids deploying complex components on-site.

Cloud
└── Control Plane (API Server, Scheduler, etcd)

Edge
└── Worker Nodes

In practice, however, this model introduces a strong dependency on network connectivity between the edge and the cloud. While Kubernetes is designed to tolerate temporary disconnections at the data plane level, many modern workloads rely heavily on continuous interaction with the Kubernetes API. Components such as Operators, controllers, autoscalers, and GitOps agents are all built around reconciliation loops that assume consistent access to the control plane.

When connectivity is disrupted, these components may stop functioning correctly. Even though running workloads might continue temporarily, the system as a whole loses its ability to adapt, heal, and evolve. Over time, this can lead to degraded performance or even service disruption.

Beyond workload behaviour, operational visibility also suffers. Nodes may appear unhealthy or unreachable, and control-plane-driven actions such as scheduling or scaling may fail. This creates a fragile system where a network issue can quickly escalate into an operational incident.

Most importantly, this architecture removes autonomy from edge environments. In scenarios such as manufacturing, where continuous operation is critical, relying entirely on a remote control plane becomes a significant risk.

The Extreme Opposite: Fully Independent Clusters

At the other end of the spectrum, organisations may choose to deploy fully independent Kubernetes clusters at each edge location. This approach ensures that every site can operate autonomously, regardless of connectivity to the cloud.

Factory A
└── Full Kubernetes Cluster
    ├── Control Plane (API Server, Scheduler, etcd)
    └── Worker Nodes

Factory B
└── Full Kubernetes Cluster
    ├── Control Plane (API Server, Scheduler, etcd)
    └── Worker Nodes

Factory C
└── Full Kubernetes Cluster
    ├── Control Plane (API Server, Scheduler, etcd)
    └── Worker Nodes

While this solves the problem of network dependency, it introduces a different kind of complexity. Each location effectively becomes its own isolated environment, requiring dedicated management of control planes, upgrades, security policies, and monitoring systems.

Over time, this leads to what is commonly referred to as cluster sprawl. Instead of a unified platform, organisations find themselves managing a large number of independent clusters, each with its own lifecycle and configuration. Maintaining consistency across these clusters becomes increasingly difficult, especially as the number of locations grows.

Operational fragmentation is a natural consequence. Different clusters may run different versions of Kubernetes, apply different policies, or adopt different operational practices. This divergence increases the risk of misconfigurations and complicates troubleshooting.

Equally problematic is the loss of centralised visibility. Without a unified control plane, it becomes challenging to enforce global policies, deploy shared services, or gain a comprehensive view of the system. The simplicity and efficiency of the cloud model are effectively lost.

The Goal: Cloud-Like Experience at the Edge

What organisations ultimately need is not a compromise between these two extremes, but a model that combines their strengths. They want the centralised governance and ease of use of the cloud, while preserving the autonomy and resilience required at the edge.

This means enabling self-service cluster provisioning without exposing infrastructure complexity, maintaining consistent policies across all environments, and ensuring that edge locations can continue operating independently when needed.

Achieving this balance requires a shift in how we think about Kubernetes architecture.

The Foundation: Cluster API as the Single Source of Truth

At the core of this new approach is Cluster API, which introduces a declarative model for managing Kubernetes clusters. Instead of treating clusters as opaque systems, Cluster API represents them as Kubernetes resources that can be created, modified, and reconciled like any other object.

This abstraction allows organisations to define cluster configurations consistently, regardless of the underlying infrastructure. Whether a cluster runs in the cloud, on bare metal, or at the edge becomes an implementation detail rather than a fundamental difference.

By adopting Cluster API as the single source of truth, organisations can centralise lifecycle management while still supporting diverse environments. Changes can be applied declaratively, automation can be standardised, and infrastructure providers can be swapped or combined without altering the overall model.
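As a minimal sketch of what this looks like in practice, a cluster becomes a declarative resource in the management cluster. The names, namespace, and infrastructure kind below are illustrative; the KamajiControlPlane kind comes from the Cluster API Control Plane Provider for Kamaji:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: factory-a
  namespace: default
spec:
  controlPlaneRef:
    # Hosted control plane reconciled by the Kamaji Cluster API provider
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
    kind: KamajiControlPlane
    name: factory-a
  infrastructureRef:
    # Any infrastructure provider can back the cluster: bare metal,
    # virtualised, or cloud. The kind shown here is a placeholder.
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: Metal3Cluster
    name: factory-a

Swapping the infrastructure provider changes only the infrastructureRef, not the overall model, which is precisely what makes the underlying environment an implementation detail.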

Introducing Kamaji Hosted Control Planes

Kamaji builds on this foundation by implementing a hosted control plane architecture. Instead of provisioning dedicated control plane nodes for each cluster, Kamaji runs control plane components as isolated workloads within a management cluster.

This approach significantly improves efficiency, as multiple tenant clusters can share the same underlying infrastructure while maintaining strong isolation. It also simplifies operations, since upgrades and maintenance can be performed centrally.
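Concretely, each tenant control plane is just another resource inside the management cluster. A minimal sketch of Kamaji's TenantControlPlane, with illustrative values, looks like this:

apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  name: tenant-a
  namespace: tenants
spec:
  controlPlane:
    deployment:
      replicas: 2            # control plane pods, not dedicated nodes
    service:
      serviceType: LoadBalancer
  kubernetes:
    version: v1.31.0
    kubelet:
      cgroupfs: systemd
  addons:
    coreDNS: {}
    kubeProxy: {}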

The success of this model in cloud environments, as demonstrated by providers like OVHcloud, IONOS, and Rackspace, shows that hosted control planes are a viable foundation for managed Kubernetes services.

The key question is how to extend this model to edge environments without inheriting the limitations of centralised architectures.

Solving the Edge Problem with External Control Plane Deployment

One of the most powerful capabilities of Kamaji is the ability to decouple the orchestration of control planes from their physical location. Through a feature known as ExternalClusterReference, Kamaji can deploy tenant control planes in clusters that are different from the one managing them.

This seemingly small capability enables a fundamental shift in architecture. Instead of choosing between centralisation and fragmentation, organisations can distribute control planes while maintaining centralised governance.
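To illustrate the intent, the sketch below shows a control plane scheduled onto a different cluster from the one managing it. This is a hedged example: the exact shape of the externalClusterReference fields in the Kamaji Cluster API provider may differ from what is shown here.

apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
kind: KamajiControlPlane
metadata:
  name: factory-a
  namespace: default
spec:
  dataStoreName: default
  network:
    serviceType: LoadBalancer
  replicas: 2
  version: v1.31.0
  # Illustrative field names: the reference points the central Kamaji at
  # a kubeconfig Secret for a remote cluster, so the tenant control plane
  # pods are scheduled there rather than in the local management cluster.
  deployment:
    externalClusterReference:
      kubeconfigSecretName: island-factory-a-kubeconfig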

The Island Architecture

This leads to what can be described as the Island architecture, where each edge location hosts its own lightweight management cluster, while still being orchestrated from a central control plane.

In this model, a central cluster running Cluster API acts as the global orchestrator. It defines the desired state of all clusters, including those running at the edge. This ensures consistency and enables centralised lifecycle management.

Cloud
└── Management Cluster
    ├── Cluster API
    └── Kamaji

Factory (Edge)
└── Island Management Cluster (managed via Cluster API)
    └── Kamaji

At each edge location, a local management cluster (referred to as an Island) runs a nested Kamaji. This cluster is responsible for hosting tenant control planes locally, within the same physical environment as the workloads that depend on them.

Factory
└── Island Cluster
    ├── Kamaji
    │   ├── Tenant Control Plane A
    │   ├── Tenant Control Plane B
    │   └── Tenant Control Plane C
    └── Worker Nodes
        └── Connect locally to tenant control planes

Worker nodes within the edge environment connect to these local control planes rather than to a remote cloud endpoint. As a result, control loops remain functional even in the presence of network disruptions.
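The worker pools themselves can still be declared centrally with standard Cluster API resources. A sketch follows, in which the bootstrap and infrastructure template kinds are placeholders:

apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: factory-a-workers
  namespace: default
spec:
  clusterName: factory-a
  replicas: 3
  selector:
    matchLabels: null
  template:
    spec:
      clusterName: factory-a
      version: v1.31.0
      bootstrap:
        configRef:
          # Placeholder: a kubeadm bootstrap template that joins nodes
          # to the local tenant API server endpoint at the edge.
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: factory-a-workers
      infrastructureRef:
        # Placeholder infrastructure template for the edge machines.
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: Metal3MachineTemplate
        name: factory-a-workers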

Why This Architecture Works

The strength of the Island architecture lies in its ability to reconcile competing requirements. By placing control planes close to the workloads they manage, it ensures that critical Kubernetes components continue to function even when connectivity to the cloud is lost. This dramatically improves resilience and allows edge environments to operate autonomously.

The main "challenge" regarding the Hosted Control Plane architecture has always been the need for a Kubernetes cluster for inception: with the Island architecture, the management cluster is yet another Kubernetes cluster managed by a centralised Kamaji instance, reporting to the central Cluster API inventory. Resulting clusters at the edge will be declared in the central management platform, not at the edge, which is just responsible for spawning Control Planes, and letting worker nodes join the local Kubernetes API Servers.

The use of Cluster API as a central control mechanism ensures that governance remains consistent across all locations. Even though control planes are distributed, they are still managed declaratively from a single source of truth.

This architecture also reduces operational risk by avoiding excessive centralisation. Instead of relying on a single control plane for the entire system, responsibility is distributed across multiple locations. This not only improves fault tolerance but also aligns better with the physical realities of edge deployments.

The Role of CLASTIX kMetal

Building on these principles, CLASTIX kMetal provides a comprehensive platform for managing Kubernetes across cloud and edge environments, as well as on bare metal, where it provides Kubernetes-specific virtualisation.

A key capability is the concept of a Single Pane of Glass, where all clusters (whether they are running in the cloud, in edge locations, or as tenant clusters) are represented within a unified interface. This eliminates the fragmentation typically associated with distributed environments and restores the centralised visibility that operators expect.

In addition, the platform enables a multi-tenant self-service model. Development teams can request and consume Kubernetes clusters without needing to understand the underlying infrastructure. This abstraction significantly improves developer productivity while reducing the operational burden on platform teams.

Another important aspect is the ability to manage addons and platform services consistently across clusters. Observability, security, and networking components can be deployed and maintained in a uniform way, ensuring that all environments adhere to the same standards.

Crucially, this is all built on top of Cluster API, which means that infrastructure differences are abstracted away. Whether a factory runs on bare metal or virtualised hardware does not affect how clusters are managed.

Operational ROI

From an operational perspective, this architecture delivers tangible benefits. By leveraging hosted control planes, organisations can significantly reduce the infrastructure footprint required to run Kubernetes at scale. Control planes become lightweight and densely packed, lowering both cost and complexity.

Automation through Cluster API simplifies day-to-day operations. Tasks that would traditionally require manual intervention, such as upgrades or scaling, can be handled declaratively, improving efficiency and reducing the likelihood of human error.
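For example, a Kubernetes upgrade becomes a one-field change on the control plane resource, which the provider then reconciles as a rolling upgrade. A sketch, with illustrative version values:

apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
kind: KamajiControlPlane
metadata:
  name: factory-a
  namespace: default
spec:
  replicas: 2
  version: v1.32.0   # bumped from v1.31.0; the rollout is reconciled automatically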

Resilience is another major advantage. Because control planes are deployed locally at the edge, workloads can continue to operate even during network disruptions. This is particularly important in environments where downtime translates directly into financial loss.

Finally, distributing control planes across multiple locations reduces systemic risk. Instead of concentrating all functionality in a single centralised system, the architecture spreads responsibility in a way that improves overall robustness and aligns with the principle of avoiding single points of failure.

What's Next for the Future of Edge

As organisations continue to expand their edge footprint, the limitations of traditional Kubernetes architectures will become increasingly apparent. Delivering a true managed experience outside the cloud requires a combination of centralised governance and local autonomy.

By combining Cluster API, Kamaji, and kMetal, it is possible to achieve this balance. Still, add-on and application delivery must be sorted out: in an upcoming blog post, we'll share how Project Sveltos (already part of kMetal) can solve this last challenge, providing a managed, hybrid, and multi-cloud application platform.