# GPU & AI/ML workload visibility - Kueue, KubeRay, KServe & more

> Radar's GPU and AI/ML coverage: Kueue, Volcano, KubeRay, KServe, KAI Scheduler, NVIDIA NIM, and Kubeflow training. 37 kinds with status badges, smart columns, and filters.

Radar ships basic resource support for the GPU scheduling, batch, and inference-serving ecosystem: **status badges, smart table columns, working column filters, and sidebar grouping** for every kind below. Detail views use Radar's standard spec/status renderer unless noted otherwise; typed detail views, topology participation, and diagnosis are added as each tool gets deeper integration.

Every status mapping is derived from the tool's upstream API types, including condition semantics and phase enums. Fields that move between API versions (Kueue v1beta1/v1beta2, MPIJob v1/v2beta1) and InferencePool's dual API group are both handled.

## Queueing & scheduling

| Tool                                                            | Kinds                                                                 |
| --------------------------------------------------------------- | --------------------------------------------------------------------- |
| [Kueue](https://kueue.sigs.k8s.io/)                             | ClusterQueue, LocalQueue, Workload, ResourceFlavor, AdmissionCheck    |
| Cluster Autoscaler                                              | ProvisioningRequest (created by Kueue's provisioning admission check) |
| [Volcano](https://volcano.sh/)                                  | Job, Queue, PodGroup, JobFlow, JobTemplate                            |
| [KAI Scheduler](https://github.com/kai-scheduler/KAI-Scheduler) | Queue, PodGroup                                                       |

Workload status follows the Kueue admission lifecycle: Pending, QuotaReserved, Admitted, Evicted/Preempted, Finished. Queue badges reflect the `Active` condition (Kueue) or the Open/Closed state (Volcano).
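
As an illustration of the lifecycle above, the following sketch derives a badge from a Workload's `status.conditions`. The condition type names come from Kueue's API; the precedence order and the `workload_badge` helper are assumptions made for this example, not Radar's actual mapping.

```python
from typing import Iterable, Mapping


def workload_badge(conditions: Iterable[Mapping[str, str]]) -> str:
    """Pick one badge from Kueue Workload conditions.

    The precedence below (terminal states first) is an assumption
    for illustration only.
    """
    # Collect the condition types that are currently true.
    true_types = {c["type"] for c in conditions if c.get("status") == "True"}

    for badge in ("Finished", "Preempted", "Evicted", "Admitted", "QuotaReserved"):
        if badge in true_types:
            return badge
    # No lifecycle condition is true yet: the workload is still queued.
    return "Pending"
```

A Workload with `QuotaReserved=True` and `Admitted=False` would show as QuotaReserved under this ordering, which matches a workload still waiting on its admission checks.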

<Note>
  Volcano's Job shares its kind name with the built-in `batch/v1` Job, and Volcano and KAI both ship Queue and PodGroup kinds. Radar disambiguates by API group everywhere, so tables, filters, and status badges always route to the right tool.
</Note>
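
Disambiguating by API group amounts to keying every lookup on the (group, kind) pair rather than on the kind alone. The sketch below shows the idea; the table and the `tool_for` helper are hypothetical, and the group names are stated to the best of our knowledge and should be checked against the CRDs installed in your cluster.

```python
# Hypothetical routing table keyed by (API group, kind).
# Group names are assumptions; verify them against your installed CRDs.
TOOL_BY_GROUP_KIND = {
    ("batch", "Job"): "Kubernetes",
    ("batch.volcano.sh", "Job"): "Volcano",
    ("scheduling.volcano.sh", "Queue"): "Volcano",
    ("scheduling.volcano.sh", "PodGroup"): "Volcano",
    ("scheduling.run.ai", "Queue"): "KAI Scheduler",
    ("scheduling.run.ai", "PodGroup"): "KAI Scheduler",
}


def api_group(api_version: str) -> str:
    """Return the group part of an apiVersion ("" for the core group)."""
    group, _, _ = api_version.rpartition("/")
    return group


def tool_for(obj: dict) -> str:
    """Route an object to a tool using its API group and kind together."""
    key = (api_group(obj["apiVersion"]), obj["kind"])
    return TOOL_BY_GROUP_KIND.get(key, "unknown")
```

Two objects that are both `kind: Job` then resolve to different tools because their `apiVersion` groups differ.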

## Distributed training & batch

| Tool                                                                   | Kinds                                      |
| ---------------------------------------------------------------------- | ------------------------------------------ |
| [Kubeflow training](https://www.kubeflow.org/docs/components/trainer/) | PyTorchJob, TFJob, MPIJob, TrainJob        |
| [JobSet](https://jobset.sigs.k8s.io/)                                  | JobSet                                     |
| [LeaderWorkerSet](https://lws.sigs.k8s.io/)                            | LeaderWorkerSet                            |
| [KubeRay](https://docs.ray.io/en/latest/cluster/kubernetes/index.html) | RayCluster, RayJob, RayService, RayCronJob |

Training jobs show per-replica-type readiness (Master 1/1, Worker 3/4) and elapsed time; Ray kinds surface job status, deployment status, and worker counts.
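
A per-replica-type readiness string such as "Master 1/1, Worker 3/4" can be sketched from a job's replica specs and replica statuses. This is a minimal example that assumes the `active` counter stands in for "ready" and that a missing `replicas` field defaults to 1; the `replica_readiness` helper is hypothetical, and Radar's own computation may differ.

```python
def replica_readiness(replica_specs: dict, replica_statuses: dict) -> str:
    """Format "<type> <active>/<desired>" for each replica type.

    Assumptions: `active` is treated as the ready count, and a spec
    without `replicas` is treated as wanting one replica.
    """
    parts = []
    for replica_type, spec in replica_specs.items():
        desired = spec.get("replicas", 1)
        active = replica_statuses.get(replica_type, {}).get("active", 0)
        parts.append(f"{replica_type} {active}/{desired}")
    return ", ".join(parts)


# Example input shaped like a PyTorchJob's replica specs and statuses.
specs = {"Master": {"replicas": 1}, "Worker": {"replicas": 4}}
statuses = {"Master": {"active": 1}, "Worker": {"active": 3}}
print(replica_readiness(specs, statuses))  # Master 1/1, Worker 3/4
```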

## Inference serving

| Tool                                                                                    | Kinds                                                                                                      |
| --------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| [KServe](https://kserve.github.io/website/)                                             | InferenceService, ServingRuntime, ClusterServingRuntime, InferenceGraph, TrainedModel, LLMInferenceService |
| [Gateway API Inference Extension](https://gateway-api-inference-extension.sigs.k8s.io/) | InferencePool (v1 + alpha groups), InferenceObjective                                                      |
| [KAITO](https://kaito-project.github.io/kaito/)                                         | Workspace, RAGEngine                                                                                       |
| [NVIDIA NIM Operator](https://docs.nvidia.com/nim-operator/)                            | NIMService, NIMCache, NIMPipeline                                                                          |

InferenceService badges distinguish model-load failures (`BlockedByFailedLoad`) from plain not-ready states; InferencePool acceptance reflects the per-gateway `Accepted` and `ResolvedRefs` conditions across both API groups.
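
The two rules above can be sketched as small predicates over status data. This is an illustrative example only: the badge labels, the check order, and the helper names are assumptions, and the caller is expected to extract the per-gateway status entries from whichever InferencePool API group it is reading.

```python
def condition_true(conditions: list, condition_type: str) -> bool:
    """True if a condition of the given type has status "True"."""
    return any(
        c.get("type") == condition_type and c.get("status") == "True"
        for c in conditions
    )


def inference_service_badge(status: dict) -> str:
    """Separate model-load failures from plain not-ready (labels are hypothetical)."""
    if condition_true(status.get("conditions", []), "Ready"):
        return "Ready"
    transition = status.get("modelStatus", {}).get("transitionStatus")
    if transition == "BlockedByFailedLoad":
        return "ModelLoadFailed"
    return "NotReady"


def pool_accepted(per_gateway_statuses: list) -> bool:
    """A pool counts as accepted only if every gateway entry reports
    both Accepted and ResolvedRefs as true (an assumed aggregation rule)."""
    return bool(per_gateway_statuses) and all(
        condition_true(entry.get("conditions", []), "Accepted")
        and condition_true(entry.get("conditions", []), "ResolvedRefs")
        for entry in per_gateway_statuses
    )
```

Requiring both conditions on every gateway is the strictest reading; a UI could equally show a per-gateway breakdown instead of a single boolean.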

## GPU operators

| Tool                                                                               | Kinds                                                                                                  |
| ---------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| [NVIDIA GPU Operator](/features/integrations/nvidia-gpu-operator)                  | ClusterPolicy, NVIDIADriver (full detail views)                                                        |
| [AMD GPU Operator](https://instinct.docs.amd.com/projects/gpu-operator/en/latest/) | DeviceConfig                                                                                           |
| [Dynamic Resource Allocation](/features/integrations/dra)                          | ResourceClaim, DeviceClass, ResourceSlice, ResourceClaimTemplate (full detail views and relationships) |

The NVIDIA GPU Operator and DRA get full typed detail views; see their dedicated pages. Classic extended-resource GPU visibility (node capacity, pod requests, GPU table columns) requires no operator and works on any cluster.
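
Extended-resource visibility needs only the standard Node and Pod objects. As a rough sketch, GPU capacity and requests can be read as follows; the helper names are hypothetical, the resource names `nvidia.com/gpu` and `amd.com/gpu` are the ones commonly advertised by the vendor device plugins, and init containers are ignored for brevity.

```python
# Extended resource names commonly advertised by GPU device plugins.
GPU_RESOURCES = ("nvidia.com/gpu", "amd.com/gpu")


def node_gpu_capacity(node: dict) -> int:
    """Sum GPU extended resources from a Node's status.capacity."""
    capacity = node.get("status", {}).get("capacity", {})
    return sum(int(capacity.get(name, 0)) for name in GPU_RESOURCES)


def pod_gpu_requests(pod: dict) -> int:
    """Sum GPU requests across a Pod's regular containers."""
    total = 0
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources", {})
        # For extended resources, requests default to limits when omitted,
        # so fall back to limits and let explicit requests win.
        merged = {**resources.get("limits", {}), **resources.get("requests", {})}
        total += sum(int(merged.get(name, 0)) for name in GPU_RESOURCES)
    return total
```

Quantities arrive from the API as strings such as `"4"`, which is why each value is passed through `int`.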