Radar ships basic resource support for the GPU scheduling, batch, and inference-serving ecosystem: status badges, smart table columns, working column filters, and sidebar grouping for every kind below. Detail views use Radar’s standard spec/status renderer; typed detail views, topology participation, and diagnosis land as each tool gets its deeper integration. Every status mapping is derived from the tool’s upstream API types, so condition semantics, phase enums, and fields that move between API versions (Kueue v1beta1/v1beta2, MPIJob v1/v2beta1, InferencePool’s dual API group) are all accounted for.

Queueing & scheduling

| Tool | Kinds |
| --- | --- |
| Kueue | ClusterQueue, LocalQueue, Workload, ResourceFlavor, AdmissionCheck |
| Cluster Autoscaler | ProvisioningRequest (created by Kueue’s provisioning admission check) |
| Volcano | Job, Queue, PodGroup, JobFlow, JobTemplate |
| KAI Scheduler | Queue, PodGroup |
Workload status follows the Kueue admission lifecycle: Pending, QuotaReserved, Admitted, Evicted/Preempted, Finished. Queue badges reflect Active (Kueue) or Open/Closed (Volcano) state.
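As a rough illustration of how a lifecycle badge can be derived from Workload conditions, here is a minimal sketch. It assumes the condition types `QuotaReserved`, `Admitted`, `Evicted`, `Preempted`, and `Finished` appear under `status.conditions`; the function names and the precedence order are hypothetical and are not Radar’s actual implementation.

```python
from typing import Mapping, Sequence


def condition_true(conditions: Sequence[Mapping], cond_type: str) -> bool:
    """Return True if a condition of the given type has status "True"."""
    return any(
        c.get("type") == cond_type and c.get("status") == "True"
        for c in conditions
    )


def workload_badge(workload: Mapping) -> str:
    """Map a Workload object (as a dict) to a lifecycle badge.

    The precedence below (terminal states first) is an assumption made
    for this sketch.
    """
    conditions = workload.get("status", {}).get("conditions", [])
    if condition_true(conditions, "Finished"):
        return "Finished"
    if condition_true(conditions, "Evicted"):
        # Preemption is treated here as a more specific kind of eviction.
        if condition_true(conditions, "Preempted"):
            return "Preempted"
        return "Evicted"
    if condition_true(conditions, "Admitted"):
        return "Admitted"
    if condition_true(conditions, "QuotaReserved"):
        return "QuotaReserved"
    # No lifecycle condition is true yet.
    return "Pending"
```

Checking terminal and eviction states before admission keeps a stale `Admitted` condition from hiding a later eviction.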
Volcano’s Job shares its kind name with the built-in batch/v1 Job, and Volcano and KAI both ship Queue and PodGroup kinds. Radar disambiguates by API group everywhere, so tables, filters, and status badges always route to the right tool.
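Disambiguating by API group amounts to keying lookups on the (group, kind) pair rather than on the kind alone. The sketch below illustrates the idea; the registry, the function names, and the specific group strings for Volcano and KAI Scheduler are assumptions for illustration and should be checked against each tool’s CRDs.

```python
from typing import Mapping

# Hypothetical registry keyed by (API group, kind). The group strings are
# assumptions for this sketch, not values taken from Radar.
TOOL_BY_GROUP_KIND = {
    ("batch", "Job"): "Kubernetes",
    ("batch.volcano.sh", "Job"): "Volcano",
    ("scheduling.volcano.sh", "Queue"): "Volcano",
    ("scheduling.volcano.sh", "PodGroup"): "Volcano",
    ("scheduling.run.ai", "Queue"): "KAI Scheduler",
    ("scheduling.run.ai", "PodGroup"): "KAI Scheduler",
}


def api_group(api_version: str) -> str:
    """Extract the group from an apiVersion string.

    "batch/v1" yields "batch"; the core group ("v1") yields "".
    """
    group, _sep, _version = api_version.rpartition("/")
    return group


def tool_for(obj: Mapping) -> str:
    """Route an object to its owning tool using group and kind together."""
    key = (api_group(obj["apiVersion"]), obj["kind"])
    return TOOL_BY_GROUP_KIND.get(key, "Unknown")
```

Because the version is dropped from the key, the same entry serves every served version of a kind.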

Distributed training & batch

| Tool | Kinds |
| --- | --- |
| Kubeflow training | PyTorchJob, TFJob, MPIJob, TrainJob |
| JobSet | JobSet |
| LeaderWorkerSet | LeaderWorkerSet |
| KubeRay | RayCluster, RayJob, RayService, RayCronJob |
Training jobs show per-replica-type readiness (Master 1/1, Worker 3/4) and elapsed time; Ray kinds surface job status, deployment status, and worker counts.
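A per-replica-type readiness string such as `Master 1/1, Worker 3/4` can be assembled from the desired replica counts in the spec and the counts reported in `status.replicaStatuses`. The sketch below assumes a PyTorchJob-style layout (`spec.pytorchReplicaSpecs`) and treats the `active` count as "ready"; both the helper name and that interpretation are assumptions, not Radar’s confirmed logic.

```python
from typing import Mapping


def replica_summary(job: Mapping, specs_field: str = "pytorchReplicaSpecs") -> str:
    """Format per-replica-type readiness as "Type ready/desired" pairs.

    specs_field names the spec field holding replica specs; it differs per
    job kind (for example "tfReplicaSpecs" for TFJob).
    """
    replica_specs = job.get("spec", {}).get(specs_field, {})
    statuses = job.get("status", {}).get("replicaStatuses", {})
    parts = []
    for replica_type, spec in replica_specs.items():
        desired = spec.get("replicas", 1)  # assume one replica when unset
        active = statuses.get(replica_type, {}).get("active", 0)
        parts.append(f"{replica_type} {active}/{desired}")
    return ", ".join(parts)


job = {
    "spec": {
        "pytorchReplicaSpecs": {
            "Master": {"replicas": 1},
            "Worker": {"replicas": 4},
        }
    },
    "status": {
        "replicaStatuses": {
            "Master": {"active": 1},
            "Worker": {"active": 3},
        }
    },
}
print(replica_summary(job))  # prints "Master 1/1, Worker 3/4"
```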

Inference serving

| Tool | Kinds |
| --- | --- |
| KServe | InferenceService, ServingRuntime, ClusterServingRuntime, InferenceGraph, TrainedModel, LLMInferenceService |
| Gateway API Inference Extension | InferencePool (v1 + alpha groups), InferenceObjective |
| KAITO | Workspace, RAGEngine |
| NVIDIA NIM Operator | NIMService, NIMCache, NIMPipeline |
InferenceService badges distinguish model-load failures (BlockedByFailedLoad) from plain not-ready; InferencePool acceptance reflects per-gateway Accepted + ResolvedRefs conditions across both API groups.
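The per-gateway acceptance check can be pictured as requiring both `Accepted` and `ResolvedRefs` to be true for each parent entry in the pool’s status. The sketch below is hypothetical: it assumes the per-parent list lives under `status.parents`, falling back to `status.parent` for the older API group, and the badge strings are invented for illustration.

```python
from typing import Mapping


def parent_accepted(parent: Mapping) -> bool:
    """A parent counts as accepted only if both conditions are "True"."""
    by_type = {c.get("type"): c.get("status") for c in parent.get("conditions", [])}
    return by_type.get("Accepted") == "True" and by_type.get("ResolvedRefs") == "True"


def pool_acceptance(pool: Mapping) -> str:
    """Summarize InferencePool acceptance across all parent gateways.

    The field names "parents" and "parent" are assumptions about how the
    two API groups lay out status; verify against the CRD in use.
    """
    status = pool.get("status", {})
    parents = status.get("parents") or status.get("parent") or []
    if not parents:
        return "No parents"  # hypothetical badge text
    accepted = sum(1 for p in parents if parent_accepted(p))
    return f"{accepted}/{len(parents)} accepted"
```

Reading either field through one code path lets the same badge logic serve both API groups.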

GPU operators

| Tool | Kinds |
| --- | --- |
| NVIDIA GPU Operator | ClusterPolicy, NVIDIADriver (full detail views) |
| AMD GPU Operator | DeviceConfig |
| Dynamic Resource Allocation | ResourceClaim, DeviceClass, ResourceSlice, ResourceClaimTemplate (full detail views and relationships) |
The NVIDIA GPU Operator and DRA get full typed detail views; see their dedicated pages. Classic extended-resource GPU visibility (node capacity, pod requests, GPU table columns) works on any cluster, with no operator installed at all.
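Extended-resource visibility needs nothing beyond the standard Node and Pod objects: GPUs show up as named resources in `status.capacity` on nodes and in container `resources` on pods. The following sketch sums them, assuming the common resource names `nvidia.com/gpu` and `amd.com/gpu`; the helper names are hypothetical, and init containers are ignored for brevity.

```python
from typing import Mapping

# Extended resource names assumed for this sketch.
GPU_RESOURCES = ("nvidia.com/gpu", "amd.com/gpu")


def node_gpu_capacity(node: Mapping) -> int:
    """Total GPUs a node advertises in status.capacity."""
    capacity = node.get("status", {}).get("capacity", {})
    return sum(int(capacity.get(name, 0)) for name in GPU_RESOURCES)


def pod_gpu_requests(pod: Mapping) -> int:
    """Total GPUs requested by a pod's regular containers.

    For extended resources, a container that sets only limits is treated
    as requesting the same amount, so limits are used as a fallback.
    """
    total = 0
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for name in GPU_RESOURCES:
            total += int(requests.get(name, limits.get(name, 0)))
    return total
```

Quantities arrive as strings such as `"2"` in the Kubernetes API, which is why each value is converted with `int` before summing.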