Skip to main content

First: what to grab before opening a support ticket

Three pieces of context cover 80% of cases:
# 1. hub container logs (last 200 lines).
kubectl -n radar-hub logs deploy/radar-hub-hub -c hub --tail=200

# 2. migrate initContainer logs (initContainer of the same hub Pod).
kubectl -n radar-hub logs deploy/radar-hub-hub -c migrate

# 3. Pod state.
kubectl -n radar-hub get pods,deploy,svc,ingress
Attach those to support@radarhq.io along with your license id (visible in the log line radar-hub: license verified license_id=lic_...).

”Can’t reach the web app” / first-install issues

Pod stuck pending

kubectl -n radar-hub describe pod <pod-name>
Most common causes:
  • Image pull error - your registry mirror isn’t reachable, or the imagePullSecrets ref is missing. Confirm kubectl -n radar-hub get sa <sa-name> -o yaml lists the secret.
  • No node fits - nodeSelector / tolerations / affinity settings exclude every node. Drop them while debugging.
  • PVC pending - the chart doesn’t claim PVCs, but if you’re running CNPG in the same namespace its claims may compete for storage.

Hub Pod stuck in Init: migrate failed

kubectl -n radar-hub logs deploy/radar-hub-hub -c migrate
Failure modes:
  • relation "X" does not exist - extremely rare since the chart’s 000_baseline.sql migration bootstraps the schema automatically. If you see this, your Postgres has a partial schema applied by hand. Easiest recovery: drop the database, recreate it empty, restart the hub Pod (kubectl -n radar-hub rollout restart deploy/radar-hub-hub). The migrate initContainer will re-apply everything from 000_baseline.sql onward.
  • password authentication failed - the DSN is wrong. kubectl get secret <pg-secret> -o yaml | grep dsn to confirm.
  • connection refused - the DB isn’t reachable from this namespace. Test with a one-off kubectl run -ti --rm pgcheck --image=postgres:17-alpine -- psql <dsn> -c 'SELECT 1'.

Hub Pod CrashLoopBackOff at boot

kubectl -n radar-hub logs deploy/radar-hub-hub -c hub --previous
(Use --previous because the main container dies on misconfig; the current pod may not have logs yet.)
  • no auth configured: set WORKOS_API_KEY ... or RADAR_HUB_OIDC_* ... or RADAR_HUB_BOOTSTRAP_ADMIN_* ... (log.Fatal at boot) - neither WorkOS nor OIDC nor break-glass is wired. At least one is required. Fix in chart values: auth.breakGlass.email + password or OIDC issuer + secret.
  • HUB_COOKIE_PASSWORD must be at least 32 bytes - the value is too short. Regenerate with openssl rand -base64 48.

Control plane starts but /api/config 503s

kubectl -n radar-hub logs deploy/radar-hub-hub -c hub --tail=50
  • db connection: ... - container is up, DB is unreachable mid-flight. Same checks as the migrate failures above.

Web app loads but every request 401s

The web app fetches /api/account on mount. If that 401s in a fresh browser:
  • Cookie not set - confirm you actually completed the break-glass login or OIDC roundtrip. The login response should Set-Cookie: radar-hub-session=....
  • CORS / SameSite mismatch - if your web app host differs from the control plane host (split origins), the cookie won’t flow. Single-host install is the supported MVP shape.
  • HUB_ALLOWED_ORIGINS doesn’t include your origin - the control plane strips disallowed origins server-side.

Auth issues

”Invalid credentials” on break-glass login

  • Email is case-insensitive, but trim whitespace.
  • Bcrypt hash format must start with $2y$, $2a$, or $2b$. If it starts with anything else, the chart will reject it at boot - check kubectl -n radar-hub logs deploy/radar-hub-hub | grep break-glass.
  • Plaintext password with a ' or $ not properly escaped via --set will produce a hash for a different string than you intended. Use --set-string or move secrets to a Secret.

OIDC redirect loop

  • auth.oidc.callbackURL must EXACTLY match what’s registered with the IdP. Case, scheme, port, trailing slash all matter.
  • Discover-doc errors at boot: kubectl -n radar-hub logs | grep oidc. Common cause: corporate proxy intercepting outbound HTTPS - the control plane needs direct egress to the IdP’s .well-known/openid-configuration.

”OIDC silent refresh not supported”

That’s not an error - the control plane doesn’t silently refresh OIDC sessions. When a user’s cookie expires (default 7 days), they re-auth through the IdP.

Group-based admin role isn’t sticking

The control plane computes role at first sign-in only. If a user signed in with the wrong group, then got added to radar-admins, they won’t auto-promote - an existing owner has to promote via the web app. By design (avoids accidentally yanking admin status when groups get reorganized).

License issues

”license verification failed”

  • The build was cut against a different public key than the one signed your JWT. Confirm the radar-hub image tag matches what your account team issued the license for.
  • The JWT was truncated when pasted. Whole token is one base64 line - watch for \n mid-string.

”license is expired”

This is warn-only; the control plane still serves. To clear the banner:
  1. Get a renewed JWT from your account team.
  2. Update the Helm value (license.key) or rotate the existing Secret.
  3. Roll the Pod: kubectl -n radar-hub rollout restart deploy/radar-hub-hub.

”embedded license public key is the all-zero placeholder”

The radar-hub binary was built without the production public key embedded. This shouldn’t happen in any image you pulled from ghcr.io/skyhook-dev/radar-hub. If you see it, you’re probably running a self-built image - the chart’s image references should point at the official release.

Cluster-side issues

”Cluster shows disconnected” but the in-cluster Radar is running

In the customer cluster:
kubectl logs -n radar deploy/radar --tail=50
Look for:
  • cloud: dial wss://radar.acme.example/agent: ... - DNS / network from the customer cluster to the control plane.
  • cloud: token rejected - the cluster token was rotated or revoked. Re-mint via the web app’s “Add cluster” wizard.
  • cloud: tls handshake failed - the control plane’s TLS cert doesn’t validate from the customer cluster. Confirm cert chain.

Cluster connects but tunnel calls hang

  • The control plane’s proxy_read_timeout is 600s by default. Long Helm catalog scans or large audit pulls can exceed that. Bump via ingress.annotations for nginx-ingress.
  • Some on-prem load balancers strip Connection: upgrade at L7. The Ingress class needs to support WebSocket. nginx-ingress does by default; some HA-Proxy / F5 deployments need explicit config.

Performance issues

/api/fleet/* is slow

Fleet calls fan out to every connected cluster in parallel with a 7s per-cluster timeout + 12s overall cap. If you have hundreds of clusters and the slowest one dominates, the call returns partial results with the slow ones flagged. The web app’s coverage footer shows offline / errored clusters - that’s the signal, not a bug.

radar-hub Pod CPU at 100%

  • Audit retention sweep isn’t actively run today (query-time floor only). Watch radarhub_audit_events_total - if it’s into the millions and queries are slow, the retention filter is doing too much work; reduce RADAR_HUB_AUDIT_RETENTION_DAYS or add a manual sweep.
  • Tunnel-session count: hundreds of clusters connected to a single replica is fine; thousands isn’t tested.

Postgres slow on audit reads

Index on audit_events(org_id, created_at DESC) is shipped by default. If a customer’s audit volume is enormous, ensure that index is healthy:
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'audit_events';

Last resort

  • helm get values radar-hub to confirm what’s actually configured.
  • kubectl -n radar-hub describe deploy/radar-hub-hub for the full env-var list as Kubernetes sees it.
  • Toggle break-glass admin if OIDC is wedged - it’s always available even when OIDC is misconfigured. That’s its whole purpose.
If none of the above gets you unblocked, email support@radarhq.io with the three artifacts at the top of this page.