Multi-Tenant AI Agents on Kubernetes: From Sidecar DBs to Centralized Services
When you’re running AI agents in production, the first challenge isn’t the model — it’s the infrastructure. How do you spin up an isolated agent per user, on the fly, without leaking credentials or state between tenants?
This is the story of how we solved that problem at scale on GCP, and what we learned along the way.
The Problem
We needed to provision a new AI agent for every user that activated a specific workflow. Each agent required:
- Its own credentials: API keys for language models, scoped per tenant.
- Its own state: Conversation history, memory, and context that must never leak between users.
- Dynamic provisioning: Agents had to spin up and down on demand, not be pre-allocated.
This immediately raised the fundamental multi-tenancy question: How do you isolate tenant data?
Option A: Shared Database with Row-Level Security
The classic approach. One Postgres instance, one schema, and RLS policies that filter rows by tenant_id. It’s cost-efficient and operationally simple — one database to back up, monitor, and scale.
But for AI agents handling sensitive workflows, the blast radius of a misconfigured RLS policy is enormous. One bad query and Tenant A sees Tenant B’s conversation history.
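To make the blast-radius point concrete, here is a minimal sketch of what an RLS setup looks like in Postgres. The `conversations` table, `tenant_id` column, and `app.tenant_id` session setting are hypothetical names, not our actual schema:

```sql
-- Hypothetical schema: enable and enforce RLS on the conversations table
ALTER TABLE conversations ENABLE ROW LEVEL SECURITY;
ALTER TABLE conversations FORCE ROW LEVEL SECURITY;  -- applies even to the table owner

-- Every query is filtered by the tenant id set on the current session
CREATE POLICY tenant_isolation ON conversations
  USING (tenant_id = current_setting('app.tenant_id')::int);
```

The application must run `SET app.tenant_id = '42'` on every connection before querying. Forget that step, or connect as a role that bypasses RLS, and you have exactly the cross-tenant leak described above.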
Option B: Sidecar Database per Tenant (The MVP)
Our initial architecture used Kubernetes’ sidecar pattern: each Pod ran two containers side by side.
- Main Container: The agentic loop — receiving messages, calling the LLM, executing tools, returning responses.
- Sidecar Container: A lightweight database (SQLite) providing completely isolated storage for that tenant’s state.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: agent-tenant-42
spec:
  containers:
    - name: agent-loop
      image: our-agent:v1
      envFrom:
        - secretRef:
            name: tenant-42-credentials
    - name: tenant-db
      image: sqlite-sidecar:latest
      volumeMounts:
        - name: tenant-data
          mountPath: /data
  volumes:
    - name: tenant-data
      persistentVolumeClaim:
        claimName: pvc-tenant-42
```

The appeal was obvious: zero-trust isolation by default. Tenants couldn’t see each other’s data because there was literally no shared database. Each agent lived in its own apartment with its own filing cabinet.
The PersistentVolume Lesson
Here’s what you learn quickly in production: Kubernetes Pods are ephemeral. If the Pod is deleted, evicted, or rescheduled to another node, any data written to Pod-local storage (an emptyDir volume) vanishes with it.
For a stateless web server, that’s fine. For a database holding a user’s conversation history? That’s a disaster.
The fix is PersistentVolumeClaims (PVCs) — storage that survives Pod restarts. But this introduces its own complexity:
- Each tenant now needs a provisioned PVC (we used GCP Persistent Disks).
- Zonal Persistent Disks are locked to a single GCP zone, meaning your Pod can only be scheduled in the zone where its disk lives, reducing Kubernetes’ scheduling flexibility.
- Cleanup becomes critical: orphaned PVCs from deleted tenants accumulate cost silently.
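One mitigation worth knowing: the zone a disk lands in is decided at volume binding time, so a StorageClass with `volumeBindingMode: WaitForFirstConsumer` lets the scheduler pick a node first and provision the disk in that node’s zone. A sketch, assuming the GKE Persistent Disk CSI driver (the class name and size are illustrative):

```yaml
# Hypothetical StorageClass: delay disk creation until the Pod is scheduled,
# so the disk is provisioned in whatever zone the scheduler chose.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-tenant
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete          # deletes the disk with the PVC, limiting orphan cost
parameters:
  type: pd-balanced
---
# Per-tenant claim referencing the class above
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-tenant-42
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: pd-tenant
  resources:
    requests:
      storage: 10Gi
```

This only helps at first scheduling; once the disk exists, the Pod is still pinned to its zone. And `reclaimPolicy: Delete` trades durability for the cleanup guarantee: `Retain` is safer for premium tenants, but it brings back the orphaned-disk cost problem.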
The Pivot: Centralized Service
We quickly realized the sidecar-per-tenant model, while elegant for isolation, didn’t scale economically. The math was simple:
- 1,000 users = 1,000 PVCs = 1,000 Persistent Disks.
- Most users were on free or basic tiers — they didn’t need (or want to pay for) dedicated infrastructure.
- The operational overhead of managing thousands of individual database instances was significant.
The sidecar approach made sense as a premium feature for enterprise tenants who demanded strict data isolation. But for the majority of users, a centralized database with properly implemented RLS was the right trade-off: lower cost, simpler operations, and “good enough” isolation for non-sensitive workloads.
The hybrid architecture became:
- Standard tier: Shared Postgres with RLS, managed via Cloud SQL.
- Premium tier: Dedicated sidecar DB with PVC, for tenants with compliance requirements.
Infrastructure as Code: Helm + Secret Manager
Provisioning agents dynamically required two pieces working together:
Helm Charts for Templated Deployments
Every new agent was deployed via a Helm chart with tenant-specific values injected at install time:
```shell
helm install agent-tenant-42 ./charts/agent \
  --set tenant.id=42 \
  --set tenant.tier=standard \
  --set tenant.model=claude-opus-4
```

The chart templated everything: the Pod spec, the service account, network policies, and resource limits. Upgrading all agents to a new version was a single `helm upgrade` across the fleet.
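Inside the chart, those values flow into the templates. A hypothetical excerpt (the file path and values schema are assumptions inferred from the install command):

```yaml
# charts/agent/templates/pod.yaml (hypothetical excerpt)
apiVersion: v1
kind: Pod
metadata:
  name: agent-tenant-{{ .Values.tenant.id }}
  labels:
    tenant: "{{ .Values.tenant.id }}"   # lets you select the whole fleet by label
spec:
  containers:
    - name: agent-loop
      image: our-agent:{{ .Values.image.tag | default "v1" }}
      envFrom:
        - secretRef:
            name: tenant-{{ .Values.tenant.id }}-credentials
```

A tenant label like this is what makes fleet-wide operations (upgrades, audits, cleanup) a one-liner with `kubectl` or `helm`.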
GCP Secret Manager for Credentials
API keys and sensitive configuration never touched Helm values or environment variables directly. Instead:
- Secrets were stored in GCP Secret Manager, scoped per tenant.
- A Kubernetes `ExternalSecret` (via External Secrets Operator) synced them into the cluster as native Kubernetes Secrets.
- The Pod’s `envFrom` referenced the synced Secret, keeping credentials out of version control and Helm history.
This meant rotating an API key was a Secret Manager update — no redeployment needed.
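Wired together, the sync looks roughly like this. The store name and secret keys are illustrative, and this assumes a `ClusterSecretStore` already configured against GCP Secret Manager:

```yaml
# Hypothetical ExternalSecret for tenant 42
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: tenant-42-credentials
spec:
  refreshInterval: 1h            # periodic re-sync, so rotations propagate
  secretStoreRef:
    name: gcp-secret-manager     # assumed ClusterSecretStore for the project
    kind: ClusterSecretStore
  target:
    name: tenant-42-credentials  # the native Secret the Pod's envFrom points at
  data:
    - secretKey: LLM_API_KEY
      remoteRef:
        key: tenant-42-llm-api-key   # secret name in GCP Secret Manager
```

Because `remoteRef` doesn’t pin a version, the operator pulls the latest one on each refresh, so a rotation is just `gcloud secrets versions add tenant-42-llm-api-key --data-file=-`.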
What I’d Do Differently Today
- Start with the centralized model. The sidecar MVP taught us a lot, but in retrospect, we could have shipped faster with RLS from day one and added the sidecar tier later for premium users.
- Use SQLite only if you embrace its constraints. SQLite is brilliant for single-writer, local-first workloads. But in a Kubernetes context, the PVC dependency negates much of its simplicity. For the centralized path, Postgres (managed via Cloud SQL) is the obvious choice.
- Invest in observability early. When you have hundreds of agents running independently, you need centralized logging and tracing from day one — not after the first “why did Tenant 42’s agent stop responding?” incident.
Takeaway
Multi-tenant AI agents are fundamentally an infrastructure problem, not an AI problem. The model doesn’t care if it’s running in a sidecar or a monolith. But your users care about privacy, your finance team cares about cost, and your ops team cares about not getting paged at 3 AM because a PersistentVolume filled up.
Start simple. Isolate where it matters. Scale what pays for itself.
Diego Jiménez Vergara — AI Infrastructure & DevOps Engineer. Building at the intersection of FinTech, AI, and Cloud Native systems.