The Fleet Visibility Gap: Why Teams With 5+ Clusters Hit a Wall
Every tool that worked at 2 clusters breaks at 8. kubectl, Lens, k9s, and Headlamp are all single-cluster-at-a-time tools. Here's where the wall is and what it looks like.

It's the third time this week someone asked me which cluster our staging API actually runs in.
Not "is staging healthy." Not "what's the error rate." Just: where does it live. The answer turned out to be stg-eu-2, a cluster spun up four months ago for a GDPR workload that quietly became the default for anything EU-adjacent. Nobody wrote that down. The person who knew left the company in September.

If you're running one cluster, skip this post. If you're running two, bookmark it. If you're somewhere between five and fifty, you already know where this is going.
How you end up with twelve clusters without planning to
Nobody sits down and says "let's run twelve Kubernetes clusters." You start with one. Then you need a staging environment, because prod-shaped accidents are expensive. Two.
Then someone points out that running staging and prod on the same control plane defeats the point, so you split the cloud accounts. Three, if you count dev. Then a customer in Frankfurt needs data residency. Four. Then your biggest customer wants a dedicated tenant for compliance. Five. Then the platform team decides per-team clusters are cleaner than shared namespaces for the new ML workloads. Eight. Then you acquire a company and inherit their GKE footprint. Eleven.
That's a real trajectory. I spoke to a platform lead at a 60-person company last month who has 14 clusters across 3 regions - EKS in us-east-1 and eu-west-1, GKE in asia-southeast1 for latency reasons, plus four "temporary" clusters that have existed for over a year. They have two engineers on the platform team.
The clusters aren't the problem. Kubernetes handles that part fine. The problem is that the tools you used to operate one cluster don't compose.
The single-cluster assumption baked into every tool
Walk through the standard debugging kit. Every single one of these assumes you're looking at one cluster at a time.
kubectl has contexts. That's the official answer. In practice you end up with a ~/.kube/config that's 400 lines long, a KUBECONFIG variable stitching together a handful of extra files, and a shell alias wall that looks something like this:
```bash
# ~/.zshrc, three months into the job
alias kprod-us="kubectl --context=arn:aws:eks:us-east-1:1234:cluster/prod-us"
alias kprod-eu="kubectl --context=arn:aws:eks:eu-west-1:1234:cluster/prod-eu"
alias kstg-us="kubectl --context=arn:aws:eks:us-east-1:1234:cluster/stg-us"
alias kstg-eu="kubectl --context=arn:aws:eks:eu-west-1:1234:cluster/stg-eu-2"
alias kdev="kubectl --context=kind-dev"
alias kacme="kubectl --context=gke_acme-prod_us-central1_acme"
# ... and six more

# the function everyone writes eventually
# (assumes the long contexts above were renamed to something typeable
# with `kubectl config rename-context`)
kall() {
  for ctx in prod-us prod-eu stg-us stg-eu-2; do
    echo "=== $ctx ==="
    kubectl --context="$ctx" "$@"
  done
}
```

That kall function is a tell. It's what you write the first time you need to check "are any of our clusters seeing the same CrashLoopBackOff?" It works for trivial commands. It falls apart the moment you need to correlate anything, or the moment one context hangs for 30 seconds because the VPN dropped.
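The hang, at least, is patchable. Here's a sketch of the sturdier version people converge on, with a bounded per-request timeout and parallel fan-out; the context names are the same hypothetical short names as above:

```bash
# kall v2: don't let one dead VPN route stall the whole sweep
kall() {
  local ctx
  for ctx in prod-us prod-eu stg-us stg-eu-2; do
    {
      # capture per-context output so parallel runs don't interleave,
      # and give up on any context that doesn't answer within 5s
      local out
      out="$(kubectl --context="$ctx" --request-timeout=5s "$@" 2>&1)"
      printf '=== %s ===\n%s\n\n' "$ctx" "$out"
    } &
  done
  wait
}
```

That fixes the 30-second hang and nothing else: `kall get pods -A | grep CrashLoopBackOff` now returns promptly, but it still hands you four disjoint lists to correlate by eye.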

k9s is excellent. I use it daily. But it shows one cluster at a time. You switch contexts with :ctx and the whole UI reloads - events, pods, the lot. There's no "show me every failing pod across my fleet." Not k9s's job, and it's honest about that.
Lens and its forks (OpenLens, Freelens) technically let you add multiple kubeconfigs to the sidebar. Each one opens in a separate workspace pane. You can't see them at once in any useful way, and switching between them triggers a full reload of the cluster state. On a machine with six clusters loaded, memory usage gets unpleasant. Lens itself has the Mirantis-cloud-login baggage we wrote about when we introduced Radar.
Headlamp supports multiple clusters better than most - it'll show them in the sidebar and you can click between them. But the views are per-cluster. There's no aggregated event feed, no cross-cluster search, no fleet dashboard. It's a competent single-cluster UI that tolerates being pointed at several kubeconfigs.
Per-cluster dashboards (Grafana, the Datadog cluster view, the EKS console) work well for what they show, but they're separate dashboards. You end up with a bookmark folder of twelve URLs and a habit of opening them in sequence every morning.
The honest comparison
| Tool | Multiple clusters supported | Aggregate view | Persistent cross-cluster history | Who it's for |
|---|---|---|---|---|
| kubectl | Via context switching | No | No | Everyone, always |
| k9s | One at a time (fast switch) | No | No | Terminal natives debugging one cluster |
| Lens / OpenLens | Multiple kubeconfigs loaded | No | No | GUI users on a single laptop |
| Headlamp | Sidebar-style multi-cluster | Partial | No | Teams who want a browser-based UI |
| Per-cluster dashboards | Yes, separately | No | Yes, per cluster | Ops teams with dedicated dashboards per env |
None of these are bad. They're solving the single-cluster problem well. The fleet problem is a different problem.
The hidden cost of not having fleet visibility
You don't notice the gap all at once. It accumulates.
Longer incidents. The on-call engineer gets paged, opens their laptop, and spends the first four minutes figuring out which cluster the alert came from. Was it prod-us or prod-eu-2? The alert says api-gateway but there are three of those across the fleet. By the time they've switched context, opened logs, and cross-referenced the deploy history, the customer impact is already on Twitter.
Missed signal. A ConfigMap change goes out to all production clusters via Argo CD. One of them rejects it because a CRD version is pinned to an older release. Nobody notices for six hours because the failure is buried in one cluster's events feed and nobody was looking at that tab.
Onboarding tax. The new engineer joins. They need to know how to connect to every cluster, which ones are safe to click around in, which ones have stricter RBAC, which ones have the weird Istio setup from 2023. The only documentation is a Notion page that's been out of date since February. They spend their first two weeks building a mental map that the rest of the team carries in their heads.
Context reconstruction. "What did we deploy to prod-eu on Tuesday?" is a question that should take five seconds. Without persistent cross-cluster history, it takes a Slack thread, a Git log, a CI run search, and someone eventually guessing.
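For concreteness, here's the by-hand version of that five-second question, assuming a prod-eu context and an api-gateway Deployment in an api namespace (all names hypothetical):

```bash
# reconstructing "what did we deploy to prod-eu on Tuesday?" by hand
kubectl --context=prod-eu -n api rollout history deployment/api-gateway
# revisions, but no timestamps and no author -- so cross-reference the
# ReplicaSets, whose creation times survive until they're pruned
kubectl --context=prod-eu -n api get replicasets \
  --sort-by=.metadata.creationTimestamp \
  -o custom-columns='NAME:.metadata.name,CREATED:.metadata.creationTimestamp,IMAGE:.spec.template.spec.containers[0].image'
# cluster events from Tuesday expired within an hour, so from here it's
# the Slack thread, the Git log, and the CI run search
```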
These costs don't show up on a dashboard. They show up as fatigue, as longer MTTR, as "we should rebuild our platform" conversations that don't lead anywhere.
What a fleet tool needs to do
If you were designing for the multi-cluster case from the start, a few properties fall out naturally:
Aggregate view first, drill-down second. The default landing page should show every cluster you operate, with health, recent events, and workload counts. Clicking into a cluster gets you the familiar single-cluster view. Right now most tools flip that - you pick a cluster, then see its state.
Persistent timeline across restarts. Kubernetes events expire after an hour by default. If your tool only shows current events, you've lost the ability to answer "what happened overnight." Cross-cluster history needs to survive process restarts and be queryable by time range.
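The ceiling is easy to verify: the apiserver's --event-ttl flag defaults to one hour, so even sorted by timestamp, the feed never reaches further back than that.

```bash
# newest events last, across all namespaces -- on a default apiserver
# (--event-ttl=1h) nothing in this list is more than an hour old
kubectl get events -A --sort-by=.lastTimestamp
```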
A shared context that isn't a kubeconfig. Kubeconfigs were designed for one user on one machine. For a team, you want shared enrollment - a cluster gets connected once, everyone authorized can see it, access is managed centrally. No more Slack DMs with kubeconfig.yaml attached.
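Today's workaround is exactly that anti-pattern: flatten one context into a self-contained file and send it around. Something like this, with a hypothetical context name:

```bash
# extract a single context, inlining any certs or tokens it references,
# into a file small enough to DM -- which is precisely the problem
kubectl config view --minify --flatten --context=prod-eu > prod-eu.kubeconfig
```

Every copy of that file is a standing credential that nothing rotates and nothing audits.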
Access control beyond kubeconfigs. RBAC at the cluster level is fine for services. Humans need something else: per-cluster and per-namespace scoping, roles that distinguish "can view" from "can exec into pods," and an audit trail of who did what.
Notifications that know about more than one cluster. A Slack message that says "pod X crashlooped in prod-eu-2" is useful. A Slack message that says "the same image hash just started crashlooping in three clusters, here's the correlation" is actually actionable.
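That correlation is exactly what the alias wall can't express. A rough sketch of the hand-rolled version, assuming the hypothetical context names from earlier and jq on the path:

```bash
# list every crashlooping container across the fleet, keyed by image,
# so the same image failing in several clusters sorts together
for ctx in prod-us prod-eu stg-us stg-eu-2; do
  kubectl --context="$ctx" get pods -A -o json |
    jq -r --arg ctx "$ctx" '
      .items[] | . as $pod
      | .status.containerStatuses[]?
      | select(.state.waiting.reason? == "CrashLoopBackOff")
      | [.image, $ctx, $pod.metadata.namespace, $pod.metadata.name]
      | @tsv'
done | sort
```

Nobody runs this on a schedule, which is the point: the correlation should come to you.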
Outbound-only connectivity. Nobody is opening an inbound port on their production cluster for a vendor dashboard. Any sane fleet tool has the agent dial out, not the reverse.
What we're working on
We've been using Radar, a local single-binary Kubernetes visibility tool, internally at Skyhook for a while. It has solved the single-cluster UX problem well enough that we're releasing it as open source in January 2026. You point it at your kubeconfig, it opens a dashboard in your browser, done. No account, no cloud.

A hosted version is next. Same UI, but the agent runs in each cluster and ships state to a shared backend, so the fleet view works the way it should - one page, every cluster, persistent history, shared access for the whole team. We're not ready to talk about that in detail yet. More when it's built.
In the meantime, if this post describes your week, you're not alone. The tooling gap is real. Recognizing it is the first step toward not papering over it with another wall of shell aliases.