Engineering · February 1, 2026 · 10 min read

The Radar Agent: Outbound-Only, Scoped, and 32MB

How the Radar agent works: outbound-only TLS, read-only RBAC by default, ~32MB RSS. The five questions every platform team asks, answered.

Eyal Dulberg
CTO, Skyhook

Every platform team we've talked to about Radar asks the same five questions before they'll install anything in their cluster. Here they are, in the order they usually come up:

  1. Does it need inbound network access?
  2. What Kubernetes permissions does it need?
  3. What data leaves the cluster?
  4. How are credentials handled?
  5. What's the blast radius if the agent gets compromised?

These are the right questions. If you're responsible for a cluster that handles real workloads, "just install our Helm chart and give it cluster-admin" is not an acceptable answer. The rest of this post is how we answer each one, with the actual design choices we made and the tradeoffs that came with them.

Radar agent architecture

1. No inbound access. Ever.

The simplest question first. The agent never opens a port. Not a NodePort, not a LoadBalancer, not an Ingress. It doesn't sit behind a Service that your cluster exposes to the internet, and it doesn't need you to punch holes in your firewall for a vendor CIDR range.

We rejected the inbound model early. The typical pitch goes: "just expose our controller on an Ingress and we'll call it when we need to." For most platform teams we've worked with, that's a non-starter. Every inbound path is a new attack surface, a new TLS cert to manage, a new firewall rule to audit, and a new thing your security team has to sign off on. Doing that for a vendor tool is a lot of ceremony for visibility.

So the agent dials out. On startup, it opens a single TCP connection to agents.radarhq.io:443, negotiates TLS, upgrades to a WebSocket, and authenticates with a cluster-scoped bearer token it got at install time. On top of that one WebSocket we run yamux for multiplexing, so every backend-initiated request (topology fetch, log tail, exec session, MCP tool call) opens a fresh stream on the same connection instead of a new TCP dial. That one connection stays open for the lifetime of the pod.

What this looks like from your side:

  • One egress rule: allow agents.radarhq.io:443 (or the EU endpoint, depending on region).
  • No inbound rules. No public IPs. No load balancers.
  • TLS on the wire. The agent verifies the server cert against public CAs; the server authenticates the agent by SHA-256-hashing the presented bearer token and comparing it against the hash it stored when the cluster was onboarded.

The honest concession: if you run a cluster with zero egress to the public internet - no image pulls from Docker Hub, no calls to external APIs, nothing - then Radar requires you to allowlist one FQDN. That's the minimum. We can't do visibility-as-a-service without a network path. For air-gapped environments, the open-source Radar is the right answer.

2. Read-only by default. Read-write is opt-in per feature.

The Helm chart installs one Deployment running one container, with one ServiceAccount bound to one ClusterRole. The default ClusterRole is read-only. Here's the actual YAML, trimmed for readability:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: radar-cloud-agent
rules:
  # Core resources
  - apiGroups: [""]
    resources:
      - pods
      - services
      - endpoints
      - configmaps
      - namespaces
      - nodes
      - persistentvolumes
      - persistentvolumeclaims
      - events
      - serviceaccounts
    verbs: [get, list, watch]
 
  # Workload controllers
  - apiGroups: ["apps"]
    resources: [deployments, statefulsets, daemonsets, replicasets]
    verbs: [get, list, watch]
 
  # Batch
  - apiGroups: ["batch"]
    resources: [jobs, cronjobs]
    verbs: [get, list, watch]
 
  # Networking
  - apiGroups: ["networking.k8s.io"]
    resources: [ingresses, networkpolicies]
    verbs: [get, list, watch]
 
  # Autoscaling
  - apiGroups: ["autoscaling"]
    resources: [horizontalpodautoscalers]
    verbs: [get, list, watch]
 
  # RBAC (metadata only - we read nothing beyond the names and subjects of roles and bindings)
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources: [roles, rolebindings, clusterroles, clusterrolebindings]
    verbs: [get, list, watch]
 
  # Metrics (if metrics-server is installed)
  - apiGroups: ["metrics.k8s.io"]
    resources: [pods, nodes]
    verbs: [get, list]
 
  # CRDs - discovery only, so the agent can find Argo/Flux/Istio/etc.
  - apiGroups: ["apiextensions.k8s.io"]
    resources: [customresourcedefinitions]
    verbs: [get, list, watch]
 
  # Secrets - metadata only (names, types, timestamps), NOT values.
  # See the data contract section for how we handle this.
  - apiGroups: [""]
    resources: [secrets]
    verbs: [get, list, watch]
 
  # Pod logs and exec are scoped to the "pods/log" and "pods/exec" subresources
  # and are disabled by default. Enable per feature.
  # - apiGroups: [""]
  #   resources: [pods/log]
  #   verbs: [get]

That's the baseline. It lets the agent watch resources, build the topology, correlate events, and show you Helm releases. It does not let it mutate anything, exec into pods, or read log content.

If you want specific features that need more, you opt in. Each feature maps to a named set of rules:

  • features.logs=true adds pods/log get.
  • features.exec=true adds pods/exec create. This is also gated per-user by Radar's own RBAC, so Viewers never get a terminal.
  • features.helm=true adds get/list/watch/create/update/patch/delete on Helm-managed resources (scoped via labels).
  • features.scale=true adds update on deployments/scale, statefulsets/scale.
  • features.restart=true adds patch on workloads.

You enable them in values.yaml. Nothing else flips on automatically.
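To make the mapping concrete, here's an illustrative Go sketch of a feature-flag-to-rules table (our own stand-in, not the chart's template code - the real rules are rendered by Helm from values.yaml, and the exact rule contents are assumptions beyond what's listed above):

```go
package main

import "fmt"

// rule mirrors the shape of a ClusterRole rule.
type rule struct {
	apiGroups []string
	resources []string
	verbs     []string
}

// featureRules maps each opt-in flag to the extra RBAC it adds, per the
// list above (features.helm omitted for brevity - it's label-scoped).
var featureRules = map[string][]rule{
	"logs":    {{[]string{""}, []string{"pods/log"}, []string{"get"}}},
	"exec":    {{[]string{""}, []string{"pods/exec"}, []string{"create"}}},
	"scale":   {{[]string{"apps"}, []string{"deployments/scale", "statefulsets/scale"}, []string{"update"}}},
	"restart": {{[]string{"apps"}, []string{"deployments", "statefulsets", "daemonsets"}, []string{"patch"}}},
}

// extraRules collects the rules for whichever features are enabled;
// nothing enabled means no additions to the read-only baseline.
func extraRules(enabled map[string]bool) []rule {
	var out []rule
	for name, on := range enabled {
		if on {
			out = append(out, featureRules[name]...)
		}
	}
	return out
}

func main() {
	rules := extraRules(map[string]bool{"logs": true})
	fmt.Println(len(rules), rules[0].resources[0]) // 1 pods/log
}
```

The point of the table shape: each flag is additive and independent, so auditing the delta of any one feature is a one-line diff.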

3. The data contract

The question I always want asked and rarely get: what actually leaves the cluster?

Data that leaves:

  • Resource state (kind, name, namespace, labels, status, spec fields)
  • Kubernetes events (Warning, Normal)
  • Helm release metadata (chart name, version, values, status)
  • Coarse pod metrics (CPU, memory from metrics-server)
  • CRD schemas and instances you've allowlisted
  • Node status, capacity, allocatable
  • Ownership chains (RS → Deploy, Pods → RS, etc.)

Data that does not:

  • Secret values. Ever.
  • ConfigMap values marked sensitive via annotation
  • Log content at rest (streamed on demand only)
  • Exec session content (streamed, not stored)
  • Port-forward traffic (streamed, not stored)
  • Raw kube-apiserver audit logs
  • managedFields (stripped before send)

A few things worth being explicit about.

Secret values never leave. The agent lists Secrets so it can show you that a Secret named db-credentials exists and is mounted into a pod, but it does not read the data field. At the informer level, we strip the data before it enters the in-memory cache. If you try to "view" a Secret in the Radar UI, we show you the keys and type. We do not show the values.

Logs, exec, and port-forward are on-demand only. When a user in the UI asks to tail a pod's logs, the Radar server signals the agent over the uplink, the agent opens a streaming read against pods/log, and the bytes flow back through the uplink to the user's browser. Nothing is written to disk on our side. When the user closes the tab, the stream closes. Same pattern for kubectl exec and kubectl port-forward equivalents.

managedFields gets stripped. If you've ever looked at the raw JSON of a Kubernetes object, you know managedFields can be larger than the rest of the object combined. We strip it at informer callback time, before the object enters the cache. Saves memory on the agent and bytes on the wire.

ConfigMaps can opt out via annotation. If you have ConfigMaps with sensitive content (which you shouldn't, but people do), annotate them with radarhq.io/redact: "true" and the agent treats them like Secrets - metadata only, values redacted before shipping.
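A hedged sketch of that sanitize-before-cache step, with plain maps standing in for client-go's unstructured objects (the function name and exact shape are ours, not the agent's):

```go
package main

import "fmt"

// sanitize mimics what an informer transform could do before an object
// enters the cache: always drop managedFields, and blank the data of
// Secrets and of ConfigMaps annotated radarhq.io/redact: "true",
// keeping only the key names.
func sanitize(obj map[string]any) map[string]any {
	meta, _ := obj["metadata"].(map[string]any)
	if meta != nil {
		delete(meta, "managedFields")
	}
	redact := obj["kind"] == "Secret"
	if !redact && obj["kind"] == "ConfigMap" && meta != nil {
		if anns, _ := meta["annotations"].(map[string]any); anns != nil {
			redact = anns["radarhq.io/redact"] == "true"
		}
	}
	if redact {
		if data, _ := obj["data"].(map[string]any); data != nil {
			for k := range data {
				data[k] = "" // the key survives, the value does not
			}
		}
	}
	return obj
}

func main() {
	secret := map[string]any{
		"kind":     "Secret",
		"metadata": map[string]any{"name": "db-credentials", "managedFields": []any{}},
		"data":     map[string]any{"password": "aHVudGVyMg=="},
	}
	fmt.Println(sanitize(secret)["data"]) // map[password:]
}
```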

4. Credentials: cluster bearer tokens over TLS

Each cluster connection is bootstrapped with a bearer token issued from the Radar dashboard by an Admin. The raw token is shown in the dashboard exactly once at creation and only ever stored as a SHA-256 hash server-side, bound to one cluster record.

On every connection, the agent presents the token over TLS; the Radar server hashes the presented value and compares it against the stored hash before opening the tunnel. There's no separate enrollment handshake and no client-cert material to manage in-cluster: the token itself is the long-lived credential. An Admin can rotate the token at any time from the dashboard, which invalidates the previous hash and forces the agent to reconnect with the new value.
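The hash-and-compare step is simple enough to show in full. This is an illustrative Go sketch of the scheme as described, not Radar's server code:

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// hashToken is what gets stored at onboarding: the SHA-256 digest of the
// bearer token, never the raw value.
func hashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// verifyToken hashes the token presented on each connection and compares
// it against the stored digest in constant time.
func verifyToken(presented, storedHash string) bool {
	return subtle.ConstantTimeCompare([]byte(hashToken(presented)), []byte(storedHash)) == 1
}

func main() {
	stored := hashToken("example-cluster-token") // persisted at onboarding
	fmt.Println(verifyToken("example-cluster-token", stored)) // true
	fmt.Println(verifyToken("rotated-away-token", stored))    // false
}
```

Rotation falls out of this for free: replacing the stored digest instantly invalidates every connection attempt made with the old token.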

We don't handle your kubeconfig, your cloud credentials, or your cluster CA. The agent authenticates to the Kubernetes API using the projected ServiceAccount token that the kubelet gives it, same as any in-cluster controller.

5. Blast radius: what happens if the agent is compromised

The worst case: someone gets code execution inside the agent pod. What can they do?

With the default read-only ClusterRole, they can read cluster state. That's a disclosure problem, not a destruction problem. They cannot scale workloads to zero, they cannot delete your PVCs, they cannot exec into pods and pivot. The ClusterRole is the ceiling.

With read-write features enabled, the ceiling rises to exactly what those features allow. If you enabled features.scale=true, an attacker can scale deployments. If you enabled features.exec=true, they can open a shell in any pod - the same pod an Admin user could exec into through the UI. This is why we make every read-write feature opt-in and surface the RBAC impact in the Helm chart's NOTES.txt when you install it.

The other containment: the agent runs in its own namespace (radar-cloud by default), with a PodSecurityContext that drops all capabilities, runs as non-root, has a read-only root filesystem, and disables privilege escalation. A compromised agent container can't write to disk or escalate to root without breaking out of the kernel's sandbox.

Why SharedInformer

The agent uses the same client-go SharedInformer pattern as Radar OSS. A SharedInformer does one list call against the API server for a given resource kind, switches to a Watch stream for deltas, and maintains an in-memory cache that multiple consumers can read from.

Why this matters:

  • Low API server load. One list per kind at startup, one watch per kind after that - not N polls per second. On a cluster with 10,000 resources, the startup lists cover all of them in a handful of paginated calls, then we're on watch streams. The kube-apiserver barely notices us.
  • Deltas, not snapshots. When a pod transitions from Pending to Running, we ship one delta. Not the entire pod object repeatedly.
  • Memory efficiency. On a cluster with a few hundred resources, the agent sits at ~32MB RSS. Memory grows roughly linearly with object count - a cluster with tens of thousands of pods will use more, but the constant factor is small because we strip managedFields and denormalize aggressively.
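To make the delta point concrete, here's a toy version of the idea (ours, not the agent's code): a keyed cache whose update path emits only what changed, never the whole object.

```go
package main

import "fmt"

// pod is a stand-in for a cached Kubernetes object, reduced to the
// fields this example needs.
type pod struct{ name, phase string }

// cache holds the last-seen copy of each object and records the deltas
// that would go on the wire.
type cache struct {
	byName map[string]pod
	deltas []string
}

func newCache() *cache { return &cache{byName: map[string]pod{}} }

// upsert stores the new copy and emits a delta only when a tracked field
// actually changed - one small event, not a repeated snapshot.
func (c *cache) upsert(p pod) {
	if old, ok := c.byName[p.name]; ok && old.phase != p.phase {
		c.deltas = append(c.deltas, fmt.Sprintf("%s: %s -> %s", p.name, old.phase, p.phase))
	}
	c.byName[p.name] = p
}

func main() {
	c := newCache()
	c.upsert(pod{"web-0", "Pending"})
	c.upsert(pod{"web-0", "Running"})
	c.upsert(pod{"web-0", "Running"}) // no change, no delta
	fmt.Println(c.deltas) // [web-0: Pending -> Running]
}
```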

Radar resources view powered by the same informer cache

The wire protocol

The uplink is a WebSocket over TLS, with yamux as the multiplexer on top. A few properties worth calling out:

  • One connection, many streams. Every backend-initiated request (a browser loading topology, an MCP tool fetching resources, a log tail, a pod-exec session) is a fresh yamux stream on the same WebSocket. Streams are cheap (roughly 1KB of bookkeeping each); TCP dials are not.
  • Plain HTTP inside. Each stream carries a normal HTTP request to the in-cluster Radar binary's chi router, including SSE for live updates and WebSocket upgrades for kubectl exec. There's no custom binary protocol to reverse-engineer; helm template the chart and read the manifests if you want to audit the surface area.
  • Keepalives. yamux sends lightweight pings on idle connections so NATs and load balancers don't silently reap the tunnel, and so dead ends are detected quickly on both sides.
  • Back-pressure. yamux's per-stream flow control keeps one slow consumer (say, a user tailing noisy logs) from starving every other stream on the same connection.
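The back-pressure mechanism boils down to a credit window per stream. A toy stdlib sketch of the concept (not yamux's implementation, which tracks a byte window rather than chunks):

```go
package main

import "fmt"

// window is a toy per-stream credit window: the sender spends one credit
// per chunk and stalls once the receiver stops granting more, so one
// slow consumer can't buffer unboundedly or starve its siblings.
type window struct{ credits chan struct{} }

func newWindow(n int) *window {
	w := &window{credits: make(chan struct{}, n)}
	for i := 0; i < n; i++ {
		w.credits <- struct{}{}
	}
	return w
}

// trySend consumes a credit; false means the window is exhausted and the
// sender must wait for the consumer to drain.
func (w *window) trySend() bool {
	select {
	case <-w.credits:
		return true
	default:
		return false
	}
}

// grant hands a credit back as the consumer reads data off the stream.
func (w *window) grant() { w.credits <- struct{}{} }

func main() {
	w := newWindow(2)
	fmt.Println(w.trySend(), w.trySend(), w.trySend()) // true true false
	w.grant()
	fmt.Println(w.trySend()) // true
}
```

Because each stream carries its own window, a stalled log tail exhausts only its own credits; topology fetches on sibling streams keep flowing.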

Failure modes

Networks fail. Our backend sometimes has outages. Here's what the agent does when things go wrong.

Uplink drops. The agent reconnects with exponential backoff (1s, 2s, 4s, ..., capped at 60s). While disconnected, it keeps its informer caches warm and buffers deltas in a bounded in-memory queue (default 10,000 events). If the queue fills, we drop oldest. When the uplink comes back, the agent does a lightweight resync - the server tells it the last event it saw, and the agent sends anything newer.

Radar has an outage. Your cluster keeps running. The agent is not in the data path for your workloads. It doesn't mutate anything, it doesn't proxy traffic, it doesn't gate admission. If we're down, you lose visibility in the Radar UI for the duration. That's the worst thing that happens.

Agent crashes. Kubernetes restarts it. On startup, it redials the tunnel with the bearer token from its Helm values, rebuilds its informer caches, and is back in the fleet view. Cold start to steady state is usually under 10 seconds on a small cluster, maybe 30 seconds on a big one.

What we don't do

This is the part I care about most, because a lot of "agents" in this space have grown tentacles.

  • No sidecar injection. We don't mutate your pods.
  • No mutating admission webhooks. We don't sit in the admission path. If the agent is down, your deploys still work.
  • No validating webhooks. Same reason.
  • No CRDs of our own. We read your CRDs (Argo, Flux, Istio, your own). We don't install ours.
  • No init containers. Nothing piggybacks on your workloads.
  • No cert-manager dependency. The agent authenticates with a static bearer token you supply at install time. Nothing in the chart asks your cluster to issue certs for us.

One Deployment. One ServiceAccount. One ClusterRole. One Secret. One namespace. That's it.

Install

helm repo add skyhook https://skyhook-io.github.io/helm-charts
helm repo update
 
helm install radar-cloud-agent skyhook/radar-cloud-agent \
  --namespace radar-cloud --create-namespace \
  --set token=$RADAR_HUB_TOKEN

The token comes from the Radar dashboard when you click "Add cluster." By the time helm install exits, the agent is connected and your cluster shows up in the fleet view.

If you want to audit exactly what lands in your cluster before installing, helm template the chart and read the manifests. There are no surprises. That's the whole point.

The short version

The agent is a small Go binary that dials out to one FQDN over TLS, authenticates with a cluster-scoped bearer token, runs with a read-only ClusterRole unless you opt into more, serves on-demand requests from the Radar backend over a yamux-multiplexed WebSocket, and never stores logs or exec sessions at rest. It's ~32MB RSS at steady state on a typical cluster. If it gets compromised, the ClusterRole is the ceiling on what an attacker can do.

None of this is revolutionary. It's just the default every hosted Kubernetes tool should have been shipping for the last five years. Some do. Many don't. If you're evaluating one, these are the five questions to ask - ours or anyone else's.

radar-cloud · kubernetes · architecture · security · agent

Bring your first cluster online in 60 seconds.

Install the Helm chart, paste a token, see your cluster. No credit card required.

Apache 2.0 OSS · Unlimited clusters self-hosted · Hosted free tier for up to 3 clusters