First: what to grab before opening a support ticket
Three pieces of context cover 80% of cases:support@radarhq.io along with your license id (visible in the log line radar-hub: license verified license_id=lic_...).
”Can’t reach the web app” / first-install issues
Pod stuck pending
- Image pull error - your registry mirror isn’t reachable, or the
imagePullSecretsref is missing. Confirmkubectl -n radar-hub get sa <sa-name> -o yamllists the secret. - No node fits -
nodeSelector/tolerations/affinitysettings exclude every node. Drop them while debugging. - PVC pending - the chart doesn’t claim PVCs, but if you’re running CNPG in the same namespace its claims may compete for storage.
Hub Pod stuck in Init: migrate failed
relation "X" does not exist- extremely rare since the chart’s000_baseline.sqlmigration bootstraps the schema automatically. If you see this, your Postgres has a partial schema applied by hand. Easiest recovery: drop the database, recreate it empty, restart the hub Pod (kubectl -n radar-hub rollout restart deploy/radar-hub-hub). ThemigrateinitContainer will re-apply everything from000_baseline.sqlonward.password authentication failed- the DSN is wrong.kubectl get secret <pg-secret> -o yaml | grep dsnto confirm.connection refused- the DB isn’t reachable from this namespace. Test with a one-offkubectl run -ti --rm pgcheck --image=postgres:17-alpine -- psql <dsn> -c 'SELECT 1'.
Hub Pod CrashLoopBackOff at boot
--previous because the main container dies on misconfig; the current pod may not have logs yet.)
no auth configured: set WORKOS_API_KEY ... or RADAR_HUB_OIDC_* ... or RADAR_HUB_BOOTSTRAP_ADMIN_* ...(log.Fatalat boot) - neither WorkOS nor OIDC nor break-glass is wired. At least one is required. Fix in chart values:auth.breakGlass.email+ password or OIDCissuer+ secret.HUB_COOKIE_PASSWORD must be at least 32 bytes- the value is too short. Regenerate withopenssl rand -base64 48.
Control plane starts but /api/config 503s
db connection: ...- container is up, DB is unreachable mid-flight. Same checks as the migrate failures above.
Web app loads but every request 401s
The web app fetches/api/account on mount. If that 401s in a fresh browser:
- Cookie not set - confirm you actually completed the break-glass login or OIDC roundtrip. The login response should
Set-Cookie: radar-hub-session=.... - CORS / SameSite mismatch - if your web app host differs from the control plane host (split origins), the cookie won’t flow. Single-host install is the supported MVP shape.
HUB_ALLOWED_ORIGINSdoesn’t include your origin - the control plane strips disallowed origins server-side.
Auth issues
”Invalid credentials” on break-glass login
- Email is case-insensitive, but trim whitespace.
- Bcrypt hash format must start with
$2y$,$2a$, or$2b$. If it starts with anything else, the chart will reject it at boot - checkkubectl -n radar-hub logs deploy/radar-hub-hub | grep break-glass. - Plaintext password with a
'or$not properly escaped via--setwill produce a hash for a different string than you intended. Use--set-stringor move secrets to a Secret.
OIDC redirect loop
auth.oidc.callbackURLmust EXACTLY match what’s registered with the IdP. Case, scheme, port, trailing slash all matter.- Discover-doc errors at boot:
kubectl -n radar-hub logs | grep oidc. Common cause: corporate proxy intercepting outbound HTTPS - the control plane needs direct egress to the IdP’s.well-known/openid-configuration.
”OIDC silent refresh not supported”
That’s not an error - the control plane doesn’t silently refresh OIDC sessions. When a user’s cookie expires (default 7 days), they re-auth through the IdP.Group-based admin role isn’t sticking
The control plane computes role at first sign-in only. If a user signed in with the wrong group, then got added toradar-admins, they won’t auto-promote - an existing owner has to promote via the web app. By design (avoids accidentally yanking admin status when groups get reorganized).
License issues
”license verification failed”
- The build was cut against a different public key than the one signed your JWT. Confirm the
radar-hubimage tag matches what your account team issued the license for. - The JWT was truncated when pasted. Whole token is one base64 line - watch for
\nmid-string.
”license is expired”
This is warn-only; the control plane still serves. To clear the banner:- Get a renewed JWT from your account team.
- Update the Helm value (
license.key) or rotate the existing Secret. - Roll the Pod:
kubectl -n radar-hub rollout restart deploy/radar-hub-hub.
”embedded license public key is the all-zero placeholder”
Theradar-hub binary was built without the production public key embedded. This shouldn’t happen in any image you pulled from ghcr.io/skyhook-dev/radar-hub. If you see it, you’re probably running a self-built image - the chart’s image references should point at the official release.
Cluster-side issues
”Cluster shows disconnected” but the in-cluster Radar is running
In the customer cluster:cloud: dial wss://radar.acme.example/agent: ...- DNS / network from the customer cluster to the control plane.cloud: token rejected- the cluster token was rotated or revoked. Re-mint via the web app’s “Add cluster” wizard.cloud: tls handshake failed- the control plane’s TLS cert doesn’t validate from the customer cluster. Confirm cert chain.
Cluster connects but tunnel calls hang
- The control plane’s
proxy_read_timeoutis 600s by default. Long Helm catalog scans or large audit pulls can exceed that. Bump viaingress.annotationsfor nginx-ingress. - Some on-prem load balancers strip
Connection: upgradeat L7. The Ingress class needs to support WebSocket. nginx-ingress does by default; some HA-Proxy / F5 deployments need explicit config.
Performance issues
/api/fleet/* is slow
Fleet calls fan out to every connected cluster in parallel with a 7s per-cluster timeout + 12s overall cap. If you have hundreds of clusters and the slowest one dominates, the call returns partial results with the slow ones flagged. The web app’s coverage footer shows offline / errored clusters - that’s the signal, not a bug.
radar-hub Pod CPU at 100%
- Audit retention sweep isn’t actively run today (query-time floor only). Watch
radarhub_audit_events_total- if it’s into the millions and queries are slow, the retention filter is doing too much work; reduceRADAR_HUB_AUDIT_RETENTION_DAYSor add a manual sweep. - Tunnel-session count: hundreds of clusters connected to a single replica is fine; thousands isn’t tested.
Postgres slow on audit reads
Index onaudit_events(org_id, created_at DESC) is shipped by default. If a customer’s audit volume is enormous, ensure that index is healthy:
Last resort
helm get values radar-hubto confirm what’s actually configured.kubectl -n radar-hub describe deploy/radar-hub-hubfor the full env-var list as Kubernetes sees it.- Toggle break-glass admin if OIDC is wedged - it’s always available even when OIDC is misconfigured. That’s its whole purpose.
support@radarhq.io with the three artifacts at the top of this page.