Observe infrastructure alerts such as GPU failures, thermal violations, and more during machine learning experiments you log to W&B. When you run on a supported CoreWeave Kubernetes Service (CKS) cluster, enable this integration, and satisfy the prerequisites on this page, CoreWeave Mission Control can monitor your compute infrastructure during a W&B run.
This feature is in Preview. Contact your W&B representative for access.
Prerequisites
The following must be true for this integration to work end-to-end.

| Prerequisite | Details |
|---|---|
| CoreWeave platform | Available only on CoreWeave Kubernetes Service (CKS) clusters. Not available on CoreWeave bare metal clusters or CoreWeave Classic. Training jobs running through SUNK on CKS also satisfy this requirement. |
| W&B Python SDK | For training jobs, use the `wandb` package version 0.20.1 or later when you log a run. |
| W&B Server (Dedicated Cloud or Self-Managed) | If using a W&B Dedicated Cloud or W&B Self-Managed deployment, use W&B Server version 0.73.0 or later. Set the `SERVER_FLAG_ENABLE_CORE_WEAVE_OBSERVABILITY` environment variable on the W&B app pod so the server can accept CoreWeave observability data. |
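
The version requirements in the table can be checked before a training job starts. The following is a minimal sketch, not an official W&B utility: the helper names (`parse_version`, `meets_minimum`, `installed_wandb_version`) are hypothetical, and the comparison looks only at the numeric release components, so a pre-release such as `0.20.1rc1` is treated the same as `0.20.1`. It reads the installed `wandb` version through the standard library's `importlib.metadata`.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Minimum versions taken from the prerequisites table.
MIN_WANDB_SDK = "0.20.1"
MIN_WANDB_SERVER = "0.73.0"


def parse_version(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release components of a version string.

    Hypothetical helper: suffixes such as 'rc1' or '.dev1' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    return parse_version(installed) >= parse_version(minimum)


def installed_wandb_version() -> Optional[str]:
    """Return the installed wandb package version, or None if it is not installed."""
    try:
        return metadata.version("wandb")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    sdk_version = installed_wandb_version()
    if sdk_version is None:
        print("wandb is not installed in this environment.")
    elif meets_minimum(sdk_version, MIN_WANDB_SDK):
        print(f"wandb {sdk_version} satisfies the {MIN_WANDB_SDK} minimum.")
    else:
        print(f"wandb {sdk_version} is older than {MIN_WANDB_SDK}; upgrade the package.")
```

The same `meets_minimum` helper can be applied to a W&B Server version string against `MIN_WANDB_SERVER` if you already know your deployment's version. This sketch does not query the server for it.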