For Site Reliability Engineers

Cut incident MTTR without adding another dashboard to your on-call rotation.

Built for the 2am page. Multi-cluster search, event replay, shareable incident links - none of which require you to find a kubeconfig first.

Try Radar Cloud free Let's Talk Install OSS

The on-call reality

An incident starts. Your first five minutes decide everything.

You get paged. You're groggy. You've got a terminal, a laptop balanced on your knees, and a sinking feeling that you don't remember which cluster the payments service runs in.

Your first kubectl command takes 40 seconds because you're on the wrong context. By the time you get the right one, the pod has restarted. The events are rolling off. The logs have rotated. Somebody in Slack is asking for an update.

The real MTTR killer isn't the fix. It's the five minutes of kubectl roulette before you can even see what's wrong. That's the time Radar gives back.

A typical incident

Your runbook, before and after.

Same incident. Same human. Different tool.

Time

With Radar

Without

0:00

PagerDuty fires

Open the Slack incident channel. Click the Radar link in the alert.

Find your VPN. SSH to jump host. Switch kubeconfig. Remember which cluster.

0:30

Triage

See the affected pods, recent events, and upstream dependencies in one view.

kubectl get pods --all-namespaces | grep ... (in 4 different clusters).

2:00

Root cause

Rewind the timeline 10 minutes. See the ArgoCD sync that broke it. Click to the commit.

Check ArgoCD UI. Check GitHub. Check kubectl events (which are already gone).

3:00

Rollback

One-click Helm rollback from Radar. Confirm the topology goes green.

Find the right values file. Run helm rollback. Watch kubectl and hope.

5:00

Communicate

Share the timeline link with the incident channel.

Screenshot. Paste. Explain. Answer the same 4 questions from 3 people.

Post-incident

Retro

The timeline is still there. Export it as CSV. Attach to the post-mortem doc.

Events are gone. Reconstruct from logs. Hope someone took screenshots.

Composite walkthrough based on incident retros from early design partners. Your mileage will vary. The kubectl roulette will not.

What SREs actually use

Six features. All optimized for the way incidents really happen.

Cross-cluster search

One search bar across every cluster. Paste a pod name. Jump to the state, events, and logs in seconds - no kubeconfig hunting.

Event timeline replay

Rewind any cluster past Kubernetes' 1-hour TTL - events are persisted locally per cluster, however long you configure storage for. Correlate OOMKills and image pulls with what users were seeing.

Shareable incident links

Drop a link in #incident-response. Everyone sees the same topology, the same filtered timeline, the same resource state. Handoffs in seconds.

Less alert noise

Radar's goal with alerting is to surface what matters and bundle the rest. Slack/PagerDuty/MS Teams routing today; correlation and noise-reduction templates on the roadmap.

Traffic + mTLS visibility

Is the payments service actually talking to the fraud service? See east-west traffic live, with error rates, latency, and TLS cert health.

GitOps incident correlation

ArgoCD synced 4 minutes before the alert. Radar shows you the commit, the diff, and the live state - in the same timeline.

Integrates with your on-call stack

We page you in the tools you already answer in.

Radar doesn't replace your alerting. It adds the Kubernetes-layer truth your existing alerts are missing.

PagerDuty

Opsgenie

Slack

MS Teams

Datadog

New Relic

Grafana

Webhook

What good looks like

The numbers SRE teams track when they roll out Radar.

These are the metrics early design partners have reported. Not promises - baselines to watch.

4×

Faster time-to-root-cause in incident retros

60%

Reduction in Slack questions during incidents

Days

New on-call engineers productive in days, not months

FAQ

Questions SREs ask.

Does Radar replace my observability stack?

No. Radar is the Kubernetes-layer truth - what's running, what's connected, what happened. It doesn't replace Datadog, New Relic, or Grafana for application metrics and tracing. It sits next to them, and most teams deep-link between the two.

What about my existing alerting?

Keep it. Radar's alerts are for Kubernetes-state correlations (e.g., "pod crashed 3 minutes after an ArgoCD sync") - not a replacement for your application SLO alerts. Alerts can be routed to PagerDuty, Opsgenie, Slack, MS Teams, or a generic webhook.

Can I trigger rollbacks from Radar?

Yes, for Helm releases. One-click rollback to any previous revision, with a diff preview. RBAC-scoped - only engineers with deploy permissions in that namespace can execute.

Does it work with our existing RBAC?

Yes. Radar inherits from Kubernetes RBAC. If you can't list pods in the payments namespace with kubectl, you can't see them in Radar. Map your Okta or Google groups to namespaces once; access stays consistent.

What happens if Radar is down?

Your cluster keeps running. The in-cluster Radar buffers events for up to an hour and syncs on reconnect. When Radar itself is the incident, you still have kubectl - we just want it to not be your first reach.

Your next incident is going to happen. Be ready for it.

Apache 2.0 OSS or hosted free for 3 clusters. Install in 60 seconds. Have a timeline before your next page fires.

Install with Helm Try hosted Let's Talk

Apache 2.0 OSS · Unlimited clusters self-hosted · Hosted free tier for up to 3 clusters