
GPU-Based CaaS, Platform Engineering, and the Future of Multi-Tenancy

In one of the most insightful technical posts of the year, Landon Clipp delivers a brilliant deep-dive into what it actually takes to build a GPU-based Containers-as-a-Service (CaaS) platform on modern NVIDIA HGX systems.

Monday, November 17, 2025 Dario Tranchitella

Landon's breakdown covers the whole spectrum: GPU isolation, NVLink fabric partitioning, custom CDI device plugins, VM-based container runtimes, and the orchestration glue that brings the system together. It's a masterclass in modern platform design, and I couldn't pass up the chance to feature it in The Platform Brief newsletter.

What struck me the most is how all these moving parts perfectly intersect with the evolving practice of Platform Engineering, and how this blueprint opens the door for next-generation multi-tenant GPU platforms built on Kubernetes. I've already helped Seeweb, along with CLASTIX, to offer their public Serverless GPU offering on top of Kamaji and Project Capsule: reading Landon's article felt like an epiphany, so let's expand on this through a Platform Engineering lens.

A GPU CaaS isn't just infrastructure: it's a Product

I've repeated this countless times, and I should probably change the line, but here we're discussing a concrete example: GPUs. In the context of a CaaS offering, the platform can't be just a collection of nodes with GPUs attached. It is a platform product that enforces:

  • consistent APIs

  • predictable tenancy boundaries

  • standardized workloads

  • controlled access to expensive hardware

  • strong defaults

  • and automated policy propagation

I've mentioned Seeweb's k8sgpu offering, where an elastic pool of GPU instances can be consumed externally by leveraging the concept of Virtual Kubelet. In such a use case there's a proper API, since it relies on the Kubernetes contract: you launch a Pod, and it gets deployed in a multi-tenant environment. If you want to learn more, there's a video showing how to launch a GPU workload in less than 3 minutes. But here we're aiming to build a Platform, and therefore a Product.
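
To give a feel for that contract, here's a minimal sketch of the kind of Pod a tenant would submit (the image tag is purely illustrative; nvidia.com/gpu is the resource name advertised by the NVIDIA device plugin):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference
spec:
  containers:
    - name: inference
      # Illustrative image: any CUDA-enabled workload would do.
      image: nvcr.io/nvidia/pytorch:24.05-py3
      resources:
        limits:
          # Request a single GPU from the NVIDIA device plugin.
          nvidia.com/gpu: 1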

And Landon’s design expresses this beautifully: every component — from CDI plugins to fabric partitioning to VM-based runtimes — becomes a product capability of the platform.

This is Platform Engineering at its finest: turning infrastructural complexity into self-service experiences that users can consume safely.

Nowhere is this more important than in multi-tenancy, another buzzword I'm heavily using on LinkedIn, and IRL for those unlucky enough to have met me in person.

Expanding tenancy: moving from Namespaces to a real Multi-Tenant foundation

In the original article, Landon defines a clean, Kubernetes-native approach to tenant onboarding:

  • a custom Tenancy CRD

  • a dedicated Namespace

  • NetworkPolicies to isolate traffic

  • a controller that marks the tenant "ready"

  • integration with runtime classes for GPU workloads

This is exactly what we implemented for an open-source project, now donated to the Cloud Native Computing Foundation (CNCF).

Project Capsule: the tenancy layer built for Platform Teams

Project Capsule (a CNCF Sandbox project) provides exactly the abstractions Landon describes, but with a richer, production-grade multi-tenant model. Capsule extends Kubernetes with:

  • Tenant CRDs that group multiple namespaces: organise workloads for a customer or business unit across N namespaces, not just one.

  • Automatic propagation of NetworkPolicies: restrict inter-tenant traffic with consistent, inherited policies — no manual admin overhead.

  • Default RuntimeClasses, Pod Security defaults, and ResourceQuotas: perfect for GPU workloads where tenants must use a specific virtualisation runtime (e.g., Kata/QEMU with VFIO passthrough); see the sketch right after this list.

  • Quota and fair-use enforcement per tenant: critical when GPUs are scarce, expensive, and require strict scheduling fairness.
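
As a concrete example of that runtime enforcement, here's a minimal sketch of the RuntimeClass such a platform could expose; the handler name is an assumption and must match a runtime configured in containerd (or CRI-O) on the GPU nodes:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-qemu-nvidia-gpu
# The handler must exist in the node's container runtime configuration.
handler: kata-qemu-nvidia-gpu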

Capsule becomes a natural complement to the platform Landon describes: it codifies tenancy as a first-class API, makes policies automatic, and reduces the operational tax on platform teams.

This is the ideal model for a soft multi-tenant GPU CaaS, and there's no better way to explain it than showing the code.

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: landon
spec:
  owners:
    - name: alice
      kind: User
    - name: system:serviceaccounts:tenants:landon
      kind: ServiceAccount
  runtimeClasses:
    default: kata-qemu-nvidia-gpu
    matchLabels:
      kubernetes.io/metadata.name: kata-qemu-nvidia-gpu

This manifest allows one or more owners to create Namespaces in a Kubernetes cluster: an owner could be a generic user such as alice, or a Service Account. Every Namespace created through a regular API call (with kubectl, or directly against the API) will be bound to the Capsule Tenant named landon.

Every Namespace is grouped under its Tenant and, thanks to the Runtime Class enforcement, workloads automatically run with the provided Runtime Class.
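
For illustration, here's roughly what a Namespace created by alice looks like once Capsule has bound it to the Tenant (the Namespace name is hypothetical; capsule.clastix.io/tenant is the label Capsule applies, and the same one the NetworkPolicy shown later relies on):

apiVersion: v1
kind: Namespace
metadata:
  name: landon-inference              # hypothetical name chosen by the tenant owner
  labels:
    kubernetes.io/metadata.name: landon-inference
    capsule.clastix.io/tenant: landon # applied by Capsule, usable for policy selection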

Project Capsule provides further features, such as limiting the number of Namespaces and an advanced Resource Quota mechanism that spans Namespace boundaries. For the sake of simplicity, I'm sticking to Landon's blog post to highlight how Capsule perfectly fits the design.
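
Here's a hedged sketch of those two knobs, assuming Capsule's namespaceOptions and Tenant-scoped resourceQuotas fields (the values are illustrative):

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: landon
spec:
  owners:
    - name: alice
      kind: User
  namespaceOptions:
    quota: 5                          # at most five Namespaces for this Tenant
  resourceQuotas:
    scope: Tenant                     # enforced across all of the Tenant's Namespaces
    items:
      - hard:
          requests.nvidia.com/gpu: "8"
          limits.cpu: "64"
          limits.memory: 256Gi

With grouping, defaults, and quotas covered, we're still missing an important aspect: the network.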

Quoting Landon's article:

Another component that must be isolated is the network itself.
By default, k8s pods can talk to each other if you have a proper Container Network Interface (CNI) plugin installed.
The CNI I chose to go with is Cilium. (...)
Cilium will determine whether or not the packet is allowed to continue based on however you've configured the CiliumNetworkPolicy

We couldn't agree more, and this is a no-brainer for Capsule, since we can automate the creation of such NetworkPolicies, reducing the operational burden of running such a platform.

apiVersion: capsule.clastix.io/v1beta2
kind: GlobalTenantResource
metadata:
  name: caas-networkpolicies
spec:
  resyncPeriod: 60s
  resources:
    - rawItems:
        - apiVersion: networking.k8s.io/v1
          kind: NetworkPolicy
          metadata:
            name: default-policy
          spec:
            # Apply to all pods in their Namespace
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress
            egress:
              - to:
                  - namespaceSelector:
                      matchLabels:
                        capsule.clastix.io/tenant: "{{tenant.name}}"
            ingress:
              - from:
                  - namespaceSelector:
                      matchLabels:
                        capsule.clastix.io/tenant: "{{tenant.name}}"

The GlobalTenantResource is an enhancement we introduced in Capsule, allowing the automatic propagation of resources across Tenants: imagine the onboarding of a new Tenant, and the need to replicate required objects such as Secrets, Service Accounts, or NetworkPolicies.
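
As a hedged sketch (the registry credential is hypothetical), propagating an image pull Secret to every Tenant's Namespaces would follow the same pattern as the NetworkPolicy above:

apiVersion: capsule.clastix.io/v1beta2
kind: GlobalTenantResource
metadata:
  name: caas-pull-secret
spec:
  resyncPeriod: 60s
  resources:
    - rawItems:
        - apiVersion: v1
          kind: Secret
          metadata:
            name: registry-credentials     # hypothetical pull secret for the platform registry
          type: kubernetes.io/dockerconfigjson
          data:
            .dockerconfigjson: e30=        # placeholder payload, replace with real credentials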

Now imagine those objects drifting from their desired state: the Platform Administrator's time would be devoured by this toil. Project Capsule has been designed to offload platform administrators from these tasks and let them focus on more productive activities, such as building a platform.

Capsule doesn't solve only the onboarding, but the offboarding too: once a customer has completed their work, deleting the Tenant object triggers an automatic clean-up, freeing up resources and avoiding stale leftovers.

This is the ideal model for a soft multi-tenant GPU CaaS.

But what if you need more?

Can soft Multi-Tenancy be enough?

There are clear scenarios where namespace-based tenancy is insufficient:

  • strict regulatory segmentation

  • noisy-neighbor isolation

  • differentiated SLAs

  • multi-organization boundaries

  • complex upgrade strategies

  • or simply the need for API server isolation

Imagine the need to replicate the same Platform on a customer's own infrastructure: they want everything under their own control, leveraging their own notion of multi-tenancy, which is a nested one.

In such a scenario, you would need multiple Kubernetes clusters, and we're already aware of the inglorious Control Plane tax: a Kubernetes Control Plane is typically made of 3 instances just serving the API Server, the Controller Manager, and the Scheduler, only so that Worker Nodes can run your workloads. Now imagine scaling to dozens or hundreds of clusters across different infrastructures: the "multiple namespaces in one cluster" design no longer holds.

This is where the next piece comes in: Kamaji provides a modern Hosted Control Plane architecture by turning Kubernetes control planes into a software appliance. Forget about VMs, quorum, Availability Zones, and operations: Kamaji takes care of that; you just need to bring your worker nodes, as you would with a cloud provider such as Azure, Google Cloud, or AWS, or pick your preferred one.

Kamaji as a foundation for NVIDIA DGX clusters

With Kamaji, tenants simply bring their own nodes, without worrying about the control plane layer.

This is a game-changer for GPU-based CaaS, and takes tenancy to a whole new level of isolation:

Each tenant (or tenant group) gets its own Kubernetes control plane

Each Tenant just brings their own hardware, leaving the headache of Control Plane management behind: users can focus on delivering applications, not on operating Kubernetes.

Control planes become lightweight, fast to provision, and easy to manage

Ideal for environments where new GPU tenants are onboarded frequently: the Control Plane can be created in less than 30 seconds, allowing unprecedented speed of onboarding and offboarding.

GPU nodes join tenant-specific clusters

Especially in regulated environments, a shared-nothing compliance model is required: it relies on hardware segmentation.

Set virtualisation aside

Especially for Bare Metal and GPUs, Control Planes would otherwise require an additional layer of virtualisation: Kamaji runs Control Planes as Pods in a management Kubernetes cluster, allowing a higher bin-packing of workloads, without wasting time, resources, and energy on unnecessary virtualisation.
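
Concretely, onboarding a tenant in this model boils down to applying a single custom resource against the management cluster. Here's a minimal sketch, with field names based on Kamaji's TenantControlPlane API as I know it and purely illustrative values:

apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  name: gpu-tenant-a                  # hypothetical tenant name
spec:
  controlPlane:
    deployment:
      replicas: 2                     # control plane Pods, not VMs
    service:
      serviceType: LoadBalancer       # how worker nodes and users reach the API server
  kubernetes:
    version: v1.30.0
    kubelet:
      cgroupfs: systemd
  networkProfile:
    port: 6443
  addons:
    coreDNS: {}
    kubeProxy: {}

Once the control plane Pods are up, the GPU worker nodes join it with a standard kubeadm-style bootstrap.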


A Blueprint for the Next Generation of GPU CaaS

Again, so many kudos to Landon for providing the foundational architecture:

  • GPU passthrough using VFIO

  • NVLink/NVSwitch isolation

  • CDI device plugins

  • Kata/QEMU for VM-based containers

  • Tenant-aware scheduling

  • Network isolation

  • Resource allocation and reuse safety (GPU leases)

By layering Platform Engineering principles on top of this, we get a complete platform model:

  • Base Layer for Hardware Virtualisation: fabric partitioning, firmware safeguards, GPU enumeration.

  • Runtime Layer for Secure Execution: VM-based containers, runtime classes, device mapping.

  • Platform Layer for Tenancy & Policy: Capsule for namespace grouping, policy inheritance, defaults.

  • Control Plane Layer for Stronger Isolation: Kamaji Hosted Control Planes when hard isolation is needed.

  • Self-Service Layer for Developer Experience: a simple CaaS API: "run this container with N GPUs"

This is how GPU platforms move from single-cluster experiments to planet-scale, production-ready services.

Final Thoughts

Landon's article is more than a technical guide: it's a vision of what GPU platforms will look like as the AI compute market matures. The depth of the work, the clarity in the reasoning, and the honesty about the complexity involved all deserve serious praise.

I just added my opinionated view to bring it to a product-offering level by plugging in the open source projects I maintain. tl;dr: huge kudos to Landon for pushing the industry forward.


Are you looking to build something similar?

CLASTIX is the commercial company that donated Project Capsule to the CNCF and maintains Kamaji, the Hosted Control Plane manager for Kubernetes trusted by the biggest GPU players in the world (NVIDIA, Rackspace, Mistral Compute): get in touch with our team to learn more about how we can help your organisation deliver real value from your GPU pool.