Introducing Radar Cloud: Multi-Cluster Kubernetes Visibility for Teams
Radar Cloud is the hosted multi-cluster extension of Radar OSS. Fleet view, 30-day timeline, SSO, scoped RBAC, alerts. Credentials never leave your cluster.

A platform engineer we talk to runs staging, two prod regions, and a handful of ephemeral preview clusters. She loves Radar OSS - she has it aliased to r on her laptop and uses it every day. But when her on-call teammate pings her in Slack at 1am asking "what does the topology look like right now in prod-us?", she has to either screenshot her own view or walk him through kubectl for the next twenty minutes.
The OSS tool works exactly as designed. One engineer, one cluster, one laptop. That's the product. But teams need something else: a shared view that doesn't vanish when someone closes their terminal, access controls that aren't "whoever has the kubeconfig wins", and a timeline that survives past the last restart. That's what Radar is.

TL;DR
Radar Cloud is the hosted extension of Radar OSS. A small Go agent runs in each cluster, connects outbound-only over mTLS to agents.radarhq.io, and feeds a multi-cluster dashboard with persistent state. You get a fleet view across every cluster, a 30-day event timeline, SSO, scoped RBAC, alerts to Slack/PagerDuty/Teams, and shareable deep links. Credentials stay in the cluster. Logs and exec sessions stream on demand and are never stored at rest. Free tier for up to 3 clusters, $99 per cluster per month on Team, custom pricing on Enterprise.
What OSS Can't Do For a Team
Radar OSS is a single Go binary that runs locally against your kubeconfig. That architecture is a feature: nothing to install in the cluster, nothing leaves your machine, instant startup. It's the right shape for individual debugging, regulated environments, and air-gapped clusters.
It's the wrong shape for a team of eight engineers running twelve clusters.
Six things break once you scale past one engineer and one cluster:
- No fleet view. OSS talks to whatever kubeconfig context is active. Switching contexts is manual. There's no single screen that says "across all my clusters, what's unhealthy right now?"
- Timeline dies on restart. Events live in memory (or SQLite if you flip a flag). Close the process, lose the history. Good enough for a debugging session, not good enough for a postmortem.
- No shared links. If you spot a broken pod and want to point a teammate at it, the best you can do is a screenshot. There's no URL that opens the same view on their machine.
- Whoever has the kubeconfig can do anything. OSS respects your cluster RBAC, but it doesn't layer anything on top. If your kubeconfig can delete pods, so can anyone who borrows your laptop.
- No alerts. OSS shows you what's happening when you're looking. It doesn't tell you when you're not.
- No SSO. Access is a kubeconfig file passed around however your team passes around kubeconfig files.
Every one of those gaps is something we kept hearing about from OSS users. Radar closes them without changing the OSS architecture or pulling existing features behind a paywall.
A Concrete Before and After
Last month one of our early-access teams hit a cascading DNS failure that took out two services across their prod-eu cluster. Three engineers ended up debugging together across a two-hour window.
With OSS alone, that session leaves no trace. Each engineer sees their own local view, events older than the informer's memory are gone, and the only persistent artifact is whatever someone remembered to paste into Slack.
With Radar, they all opened the same fleet view, filtered to prod-eu, and scrolled the timeline back to when the first CoreDNS pod started crashlooping. The correlation engine had already grouped the CoreDNS restarts, the downstream pod readiness failures, and the Service endpoint flaps into a single incident. They sent a deep link to the incident view to the SRE who joined forty minutes late. She landed on the exact state of the cluster at minute twelve, not "whatever is happening now."
No heroics. Just: the history was there, the URL was shareable, and access was scoped to people who were supposed to have access.
What Radar Adds
Fleet view
One dashboard across every connected cluster. Health, workload counts, recent warnings, Helm releases, node capacity, grouped by environment or label or region. Drill into any cluster and you're inside the same resource/topology/timeline/Helm views you already know from OSS.

Persistent event timeline
Events, resource changes, Helm operations, and agent-observed state transitions streamed to the Radar backend and retained. 24 hours on Free, 30 days on Team, 1 year+ on Enterprise. Filter by cluster, namespace, resource kind, severity. Scroll back to the minute something broke instead of hoping your terminal buffer caught it.

SSO
Team accounts get Google and GitHub OAuth plus full SAML / OIDC SSO (Okta, Entra ID, OneLogin, and any spec-compliant provider) at a published price. SCIM 2.0 user provisioning ships on the Enterprise plan.
Scoped RBAC
Per-cluster and per-namespace access. Four roles: Admin, Operator, Viewer, and Custom, which lets you define your own. An Operator on prod-us is not automatically an Operator on prod-eu. A Viewer can read but can't restart pods. A Custom role can, say, read everything but only exec into pods in the debug namespace.
This is the thing you cannot build on top of OSS no matter how hard you try, because OSS inherits whatever the local kubeconfig grants.
Alerts
Slack, PagerDuty, Opsgenie, Microsoft Teams, and a generic webhook. Alerts are built on top of the timeline, so the same correlation that groups events into incidents in the UI also drives notifications - no separate rule language, no duplicate routing to tune.
Suppression windows, dedup by resource, and grouped delivery are standard. If fifty pods go NotReady because a node drained, you get one notification about the node, not fifty about the pods.
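To make "grouped delivery" concrete, here is a minimal sketch of the idea - many per-pod events collapsing into one node-level notification. It illustrates the behavior described above, not Radar's correlation engine; the event shape and field names are made up for the example.

```go
// Illustrative sketch only: collapse per-pod NotReady events onto the node
// they share, so one notification fires per node instead of one per pod.
package main

import "fmt"

// PodEvent is a hypothetical, simplified event record for this example.
type PodEvent struct {
	Pod    string
	Node   string
	Reason string // e.g. "NotReady"
}

// groupByNode buckets events by the node they belong to.
func groupByNode(events []PodEvent) map[string][]PodEvent {
	grouped := make(map[string][]PodEvent)
	for _, e := range events {
		grouped[e.Node] = append(grouped[e.Node], e)
	}
	return grouped
}

func main() {
	events := []PodEvent{
		{Pod: "api-7f9c", Node: "node-3", Reason: "NotReady"},
		{Pod: "api-b21d", Node: "node-3", Reason: "NotReady"},
		{Pod: "worker-55a0", Node: "node-3", Reason: "NotReady"},
	}
	for node, evs := range groupByNode(events) {
		// One notification per node instead of one per pod.
		fmt.Printf("ALERT node %s: %d pods NotReady\n", node, len(evs))
	}
}
```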
Shareable deep links
Every view has a URL. Resource detail, filtered timeline slice, topology at a specific time, incident view. Paste it into Slack, link it from an incident doc, bookmark it. The URL preserves filters, time range, and the cluster scope.
Architecture
Radar is split in two: a small per-cluster agent and a multi-tenant backend.

The agent is a single Go binary, ~32MB RSS at steady state, deployed as a Deployment via Helm. It uses the same client-go SharedInformer pattern as Radar OSS - list once per resource type, then watch deltas. No polling, no periodic full scans.
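For readers who haven't looked at the OSS internals, this is roughly what the list-then-watch pattern looks like with client-go SharedInformers. It's a simplified sketch, not the agent's source: real coverage spans many resource types, and the deltas are forwarded over the agent's connection rather than printed.

```go
// Sketch of the list-once-then-watch-deltas pattern via client-go SharedInformers.
// Simplified assumptions: kubeconfig auth, pods only, deltas printed to stdout.
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig location (simplified for the example).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// One shared factory: a single LIST per resource type, then a WATCH for deltas.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)

	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			pod := newObj.(*corev1.Pod)
			// The real agent would forward this delta over its outbound connection.
			fmt.Printf("pod %s/%s phase=%s\n", pod.Namespace, pod.Name, pod.Status.Phase)
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // keep watching; no polling, no periodic full scans
}
```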
The connection is outbound-only. The agent dials agents.radarhq.io:443 and holds a long-lived connection over mutual TLS. The cluster doesn't need a LoadBalancer, Ingress, NodePort, or inbound firewall rule. If your cluster can make an outbound HTTPS call to any public endpoint, the agent works. This is the property that makes Radar tolerable to security teams.
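As a sketch of that connection shape, here is what an outbound-only mutual-TLS dial looks like in Go. The certificate file names and TLS settings are assumptions for illustration; the endpoint is the one named above.

```go
// Sketch of an outbound-only mTLS dial. File names and TLS details are
// assumptions, not the agent's actual configuration.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"os"
)

func main() {
	// Client certificate identifies this agent; the backend verifies it (the "mutual" in mTLS).
	cert, err := tls.LoadX509KeyPair("agent.crt", "agent.key") // hypothetical paths
	if err != nil {
		log.Fatal(err)
	}

	// Pin the backend's CA rather than trusting the system pool (assumption).
	caPEM, err := os.ReadFile("radar-ca.pem") // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	// Outbound dial only: the cluster exposes nothing, the agent calls out on 443.
	conn, err := tls.Dial("tcp", "agents.radarhq.io:443", &tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      pool,
		MinVersion:   tls.VersionTLS12,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	log.Println("connected:", conn.RemoteAddr())
	// A real agent holds this connection open and streams deltas over it.
}
```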
The agent runs with a scoped Kubernetes ServiceAccount. Read-only by default. You opt into write permissions (scale, delete pod, apply) per-namespace if you want to drive operations from the Radar UI. No way for the Radar backend to escalate past what the ServiceAccount grants in that cluster.
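To illustrate what a per-namespace write opt-in can look like at the Kubernetes RBAC level, here is a sketch using the standard rbac.authorization.k8s.io/v1 types. The role name, namespace, and verb set are assumptions; the chart's actual RBAC templates may differ.

```go
// Sketch of a namespace-scoped write grant for the agent's ServiceAccount.
// The default read-only rules (get/list/watch) are omitted for brevity.
package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	writeRole := rbacv1.Role{
		TypeMeta:   metav1.TypeMeta{APIVersion: "rbac.authorization.k8s.io/v1", Kind: "Role"},
		ObjectMeta: metav1.ObjectMeta{Name: "radar-agent-write", Namespace: "staging"}, // hypothetical
		Rules: []rbacv1.PolicyRule{
			{
				// Allow restarting pods (delete + recreate by the controller).
				APIGroups: []string{""},
				Resources: []string{"pods"},
				Verbs:     []string{"delete"},
			},
			{
				// Allow scaling Deployments in this namespace only.
				APIGroups: []string{"apps"},
				Resources: []string{"deployments/scale"},
				Verbs:     []string{"update", "patch"},
			},
		},
	}
	out, err := yaml.Marshal(writeRole)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Leaving a Role like this out (or removing it later) keeps the UI read-only for that namespace, which matches the default posture described above.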
What the agent ships to the backend: resource state changes, Kubernetes events, Helm release metadata, and coarse pod metrics. That's it. Cluster secrets are never read or shipped. Config data stays in the cluster.
What the agent does not store: logs, exec sessions, and port-forwards stream through the agent on demand, only when a user opens them in the UI, and are never stored at rest. If you close the log tab, the stream ends. We're a visibility plane, not a log backend.
Backend: Go microservices, PostgreSQL for metadata, ClickHouse for timeline events at scale. US (us-east-1) and EU (eu-west-1) data residency. Pick one per tenant, enforced end-to-end. SOC 2 Type II.
Each cluster is onboarded with a bearer token issued from the Radar dashboard. The raw token is shown exactly once at creation and only ever stored as a SHA-256 hash, bound to your tenant, your cluster, and the ServiceAccount's identity. An Admin can rotate it at any time, which forces the agent to reconnect with the new token.
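The token scheme is straightforward to sketch: hash at creation, store only the hash, compare in constant time on reconnect. This is an illustration of the described scheme, not Radar's server code, and it omits the tenant and cluster binding.

```go
// Sketch of onboarding-token handling: only a SHA-256 hash is ever persisted.
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// hashToken is what gets stored at creation; the raw token is shown once and discarded.
func hashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// verifyToken checks a presented token against the stored hash in constant time.
func verifyToken(presented, storedHash string) bool {
	return subtle.ConstantTimeCompare([]byte(hashToken(presented)), []byte(storedHash)) == 1
}

func main() {
	stored := hashToken("example-token") // hypothetical token value
	fmt.Println(verifyToken("example-token", stored)) // true
	fmt.Println(verifyToken("wrong-token", stored))   // false
}
```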
Install

```bash
helm repo add skyhook https://skyhook-io.github.io/helm-charts
helm repo update
helm install radar-cloud-agent skyhook/radar-cloud-agent \
  --namespace radar-cloud --create-namespace \
  --set token=$RADAR_HUB_TOKEN
```

One minute from helm install to a cluster tile lighting up green in the fleet view.
A Tour in Five Views
Fleet
The landing page. Cards for every cluster you've enrolled, grouped however you want, with health, recent warnings, and drill-in. The view OSS users have been asking for since week one.
Topology

The same structured-DAG topology from OSS, running at cluster scope. Ownership chains, Service routing, Ingress paths, ConfigMap/Secret references, HPA targets. Problem resources light up yellow or red.
One honest note: topology in v1 is single-cluster. You pick a cluster, you get its graph. Cross-cluster topology edges (service meshes spanning clusters, federated DNS, cross-cluster ExternalName services) are on the roadmap but not in GA. We'd rather ship single-cluster topology that's correct than cross-cluster topology that fakes the edges.
Timeline
The persistent version. Scroll back days or weeks, filter by cluster or namespace, click into incidents that the correlation engine grouped together. The backend is ClickHouse, which is what lets us run fast range queries over millions of events without the UI slowing down.
Helm

All Helm releases across every cluster. Status, chart version, app version, owned-resource health. Install, upgrade, rollback, uninstall - if the agent's ServiceAccount has write permissions in that namespace. Release history and values comparisons across revisions.
Alerts

Route by cluster, namespace, severity, resource kind, or incident type. Per-channel routing (prod to PagerDuty, staging to a Slack channel, everything else to a webhook). Suppression windows during planned maintenance. Mute a noisy namespace without muting its neighbor.
What It Costs
| Plan | Clusters | Timeline retention | Audit log | Users | Auth | Alerts | SLA | Price |
|---|---|---|---|---|---|---|---|---|
| Free | 3 (up to 10 nodes each) | 24 hours | 7 days | Unlimited | Google, GitHub | Community Slack | Best-effort | $0 |
| Team | Unlimited | 30 days | 30 days | Unlimited | Google, GitHub, SAML/OIDC | Slack, PagerDuty, MS Teams | 99.5% | $99 / cluster / month |
| Enterprise | Unlimited | 1 year+ | 1 year | Unlimited | + SCIM 2.0 provisioning | All channels + webhooks | 99.9% | Contact us |
Enterprise adds SCIM 2.0 provisioning, advanced namespace-scoped RBAC, BYOC / on-prem deployment, 1-year audit retention, US or EU data residency, and a dedicated CSM. Annual contracts get 20% off the Team list rate.
Billing is per connected cluster, not per node or per pod. A 3-node dev cluster and a 300-node prod cluster cost the same on Team (Free has a 10-node sanity cap per cluster). That's deliberate - we don't want to penalize you for running bigger workloads, and the agent's cost to us doesn't scale linearly with cluster size.
OSS vs Radar Cloud
| | Radar OSS | Radar Cloud |
|---|---|---|
| Install location | Your laptop (or in-cluster Helm) | Small agent per cluster, hosted backend |
| Clusters | One at a time | Unlimited, aggregated fleet view |
| Timeline retention | In-memory (or SQLite, single-process) | 24 hours Free / 30 days Team / 1 year+ Enterprise |
| Auth | Local kubeconfig | SSO (Okta, Google, Entra ID, SAML/OIDC) + SCIM |
| RBAC | Inherits kubeconfig | Scoped per-cluster and per-namespace |
| Alerts | None | Slack, PagerDuty, Opsgenie, Teams, webhook |
| Shareable views | No | Deep links per resource / time range |
| Price | Free forever | Free for 3 clusters, $99 / cluster / month Team, Enterprise custom |
| SLA | None | 99.9% on Enterprise |
What We Are Not Shipping in v1
This is the honest list. Some of these will land in the next two quarters, some we're still debating.
- Cross-cluster topology edges. Mentioned above. Single-cluster topology only in v1.
- Custom dashboards. The views are the views. You can filter and deep-link, but you can't build your own tiles or drag widgets around. This is in the design queue for Q3.
- Retention beyond what Enterprise offers. We can keep cluster history indefinitely for Enterprise customers, but if you need a seven-year regulatory archive with legal-hold guarantees, Radar is not that system. Ship the data to your SIEM.
- Self-hosted Radar below the Enterprise tier. The BYOC / on-prem deployment path is an Enterprise feature, not something on Team. If you need the Radar experience inside your own infra without an Enterprise contract, the path today is Radar OSS plus the in-cluster Helm deployment. Teams talking to us about regulated / air-gapped environments usually end up on Enterprise BYOC.
Radar is also not a metrics platform, not an APM, and not a log aggregator. It's a Kubernetes visibility and incident plane. Prometheus, Grafana, Datadog, and your existing log backend keep their jobs.
Why Hosted When OSS Exists
The obvious question: if Radar OSS is free and good, why build a paid hosted thing?
Because some things can't live on one laptop. Persistence needs a database. Notifications need a system that's awake when you're not. SSO and RBAC need identity and an authorization layer. Cross-cluster aggregation needs a fan-in point. None of those belong in a local Go binary, and bolting them on would ruin the thing that makes OSS good.
So we drew the line cleanly.
Radar OSS stays free, stays fully-featured for single-cluster local use, and keeps all the views it launched with in January. We're not pulling topology behind a paywall. We're not crippling the Helm view. We're not adding a "cloud login required" nag screen. The January 2026 architecture is the architecture, and the code is Apache 2.0 on GitHub.
Radar is the thing you reach for when one laptop isn't enough. The agent is different code, the backend is different code, the value is different. If that value is worth $99 per cluster per month to your team, great. If it isn't, OSS is still there.
That's the split. No bait-and-switch, no "community edition" with missing features, no mandatory sign-up to make OSS work. The two products serve different jobs.
Getting Started
Sign up at radarhq.io and you get the free tier immediately - up to three clusters of 10 nodes each, 24-hour event timeline, unlimited teammates in your workspace. Enroll your first cluster with the Helm install above.
Docs live in the Radar repository README while we finish a dedicated Radar docs site. The OSS source, which shares most of the UI and the informer backbone, is at github.com/skyhook-io/radar.
If you're already running Radar OSS and want to try Radar on a single non-critical cluster, the agent is drop-in. You don't have to choose.
Keep reading
Multi-Cluster Topology: Cross-Cluster Service Maps That Don't Hairball
Cross-cluster service topology is hard because Kubernetes itself has no multi-cluster graph. Here's how Radar builds one without turning it into a hairball.
Persistent Event Timeline: Debugging What Happened Last Tuesday
Kubernetes events vanish after an hour. Radar keeps them for 30 days. Here is how a persistent event timeline changes incident response.
The Fleet Visibility Gap: Why Teams With 5+ Clusters Hit a Wall
Every tool that worked at 2 clusters breaks at 8. kubectl, Lens, k9s, Headlamp are all single-cluster-at-a-time. Here's where the wall is and what it looks like.