Weights And Biases
W&B: log ML experiments, sweeps, model registry, dashboards.
Skill metadata
| Field | Value |
| --- | --- |
| Source | Bundled (installed by default) |
| Path | skills/mlops/evaluation/weights-and-biases |
| Version | 1.0.0 |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | wandb |
| Tags | MLOps, Weights And Biases, WandB, Experiment Tracking, Hyperparameter Tuning, Model Registry, Collaboration, Real-Time Visualization, PyTorch, TensorFlow, HuggingFace |
Reference: full SKILL.md
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
Weights & Biases: ML Experiment Tracking & MLOps
When to Use This Skill
Use Weights & Biases (W&B) when you need to:
- Track ML experiments with automatic metric logging
- Visualize training in real-time dashboards
- Compare runs across hyperparameters and configurations
- Optimize hyperparameters with automated sweeps
- Manage model registry with versioning and lineage
- Collaborate on ML projects with team workspaces
- Track artifacts (datasets, models, code) with lineage
Users: 200,000+ ML practitioners | GitHub Stars: 10.5k+ | Integrations: 100+
Installation
```bash
# Install W&B
pip install wandb

# Log in (prompts for your API key)
wandb login

# Or supply the API key through an environment variable
export WANDB_API_KEY=your_api_key_here
```
Quick Start
Basic Experiment Tracking
```python
import wandb

# Initialize a run
run = wandb.init(
    project="my-project",
    config={
        "learning_rate": 0.001,
        "epochs": 10,
        "batch_size": 32,
        "architecture": "ResNet50"
    }
)

# Training loop
for epoch in range(run.config.epochs):
    # Your training code: train_epoch() and validate() are placeholders
    # that each return a (loss, accuracy) pair
    train_loss, train_acc = train_epoch()
    val_loss, val_acc = validate()

    # Log metrics
    wandb.log({
        "epoch": epoch,
        "train/loss": train_loss,
        "val/loss": val_loss,
        "train/accuracy": train_acc,
        "val/accuracy": val_acc
    })

# Finish the run
wandb.finish()
```
With PyTorch
```python
import torch
import wandb

# Initialize
wandb.init(project="pytorch-demo", config={
    "lr": 0.001,
    "epochs": 10
})

# Access config
config = wandb.config

# Training loop (model, criterion, optimizer, and train_loader are defined elsewhere)
for epoch in range(config.epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        # Forward pass
        output = model(data)
        loss = criterion(output, target)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Log every 100 batches
        if batch_idx % 100 == 0:
            wandb.log({
                "loss": loss.item(),
                "epoch": epoch,
                "batch": batch_idx
            })

# Save model
torch.save(model.state_dict(), "model.pth")
wandb.save("model.pth")  # Upload to W&B

wandb.finish()
```
Core Concepts
1. Projects and Runs
Project: Collection of related experiments
Run: Single execution of your training script
```python
# Create/use project
run = wandb.init(
    project="image-classification",
    name="resnet50-experiment-1",  # Optional run name
    tags=["baseline", "resnet"],   # Organize with tags
    notes="First baseline run"     # Add notes
)

# Each run has a unique ID
print(f"Run ID: {run.id}")
print(f"Run URL: {run.url}")
```
2. Configuration Tracking
Track hyperparameters automatically:
```python
config = {
    # Model architecture
    "model": "ResNet50",
    "pretrained": True,

    # Training params
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 50,
    "optimizer": "Adam",

    # Data params
    "dataset": "ImageNet",
    "augmentation": "standard"
}

wandb.init(project="my-project", config=config)

# Access config during training
lr = wandb.config.learning_rate
batch_size = wandb.config.batch_size
```
3. Metric Logging
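The metric keys in this guide carry a `train/` or `val/` prefix, which W&B uses to group charts into sections. A small helper can apply that prefix consistently. This is only a sketch: `with_prefix` is a hypothetical name rather than part of the W&B API, the values are made up, and the resulting dict would be passed to `wandb.log`.

```python
def with_prefix(prefix, metrics):
    """Return a copy of `metrics` with every key namespaced as 'prefix/key'."""
    return {f"{prefix}/{key}": value for key, value in metrics.items()}


# Illustrative values only
train_metrics = with_prefix("train", {"loss": 0.42, "accuracy": 0.88})
val_metrics = with_prefix("val", {"loss": 0.51, "accuracy": 0.85})

# Merge both splits into one payload for a single logging call
payload = {**train_metrics, **val_metrics}
print(sorted(payload))
```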
```python
# Log scalars
wandb.log({"loss": 0.5, "accuracy": 0.92})

# Log multiple metrics
wandb.log({
    "train/loss": train_loss,
    "train/accuracy": train_acc,
    "val/loss": val_loss,
    "val/accuracy": val_acc,
    "learning_rate": current_lr,
    "epoch": epoch
})

# Log with an explicit step (used as the x-axis)
wandb.log({"loss": loss}, step=global_step)

# Log media (images, audio, video)
wandb.log({"examples": [wandb.Image(img) for img in images]})

# Log histograms
wandb.log({"gradients": wandb.Histogram(gradients)})

# Log tables (add rows with table.add_data(...) before logging)
table = wandb.Table(columns=["id", "prediction", "ground_truth"])
wandb.log({"predictions": table})
```
4. Model Checkpointing
```python
import torch
import wandb

# Save model checkpoint
checkpoint = {
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
}

torch.save(checkpoint, 'checkpoint.pth')

# Upload to W&B
wandb.save('checkpoint.pth')

# Or use Artifacts (recommended)
artifact = wandb.Artifact('model', type='model')
artifact.add_file('checkpoint.pth')
wandb.log_artifact(artifact)
```
Hyperparameter Sweeps
Automatically search for optimal hyperparameters.
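To make the idea of a search concrete, the sketch below enumerates a small grid in plain Python. It only illustrates what an exhaustive grid search iterates over, not how W&B schedules trials. The helper name `expand_grid` and the example values are hypothetical.

```python
from itertools import product


def expand_grid(parameters):
    """Return every combination of the listed parameter values.

    `parameters` maps a parameter name to a list of candidate values.
    """
    names = list(parameters)
    combos = product(*(parameters[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Hypothetical search space: 3 learning rates x 2 batch sizes
grid = expand_grid({"lr": [0.001, 0.01, 0.1], "batch_size": [16, 32]})
print(len(grid))  # 6 combinations
print(grid[0])    # {'lr': 0.001, 'batch_size': 16}
```

The number of combinations grows multiplicatively with each added parameter, which is why random or Bayesian search is usually preferred for larger spaces.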
Define Sweep Configuration
```python
sweep_config = {
    'method': 'bayes',  # or 'grid', 'random'
    'metric': {
        'name': 'val/accuracy',
        'goal': 'maximize'
    },
    'parameters': {
        'learning_rate': {
            # log_uniform_values takes min/max as actual values;
            # plain log_uniform expects them as exponents
            'distribution': 'log_uniform_values',
            'min': 1e-5,
            'max': 1e-1
        },
        'batch_size': {
            'values': [16, 32, 64, 128]
        },
        'optimizer': {
            'values': ['adam', 'sgd', 'rmsprop']
        },
        'dropout': {
            'distribution': 'uniform',
            'min': 0.1,
            'max': 0.5
        }
    }
}

# Initialize sweep
sweep_id = wandb.sweep(sweep_config, project="my-project")
```
Define Training Function
```python
# build_model, get_optimizer, train_epoch, validate, and NUM_EPOCHS
# are placeholders for your own code
def train():
    # Initialize run
    run = wandb.init()

    # Access sweep parameters
    lr = wandb.config.learning_rate
    batch_size = wandb.config.batch_size
    optimizer_name = wandb.config.optimizer

    # Build model with sweep config
    model = build_model(wandb.config)
    optimizer = get_optimizer(optimizer_name, lr)

    # Training loop
    for epoch in range(NUM_EPOCHS):
        train_loss = train_epoch(model, optimizer, batch_size)
        val_acc = validate(model)

        # Log metrics
        wandb.log({
            "train/loss": train_loss,
            "val/accuracy": val_acc
        })

# Run sweep
wandb.agent(sweep_id, function=train, count=50)  # Run 50 trials
```
Sweep Strategies
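Learning rates are usually searched on a log scale. W&B's sweep configuration distinguishes `log_uniform`, whose bounds are exponents, from `log_uniform_values`, whose bounds are the actual values. The sketch below shows what sampling log-uniformly between two value bounds means, using only the standard library. It is an illustration under that assumption, not W&B's sampler, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw x between low and high such that log(x) is uniformly distributed."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_log_uniform(1e-5, 1e-1, rng) for _ in range(1000)]

# About half of the draws should fall below the geometric midpoint (1e-3)
below_mid = sum(s < 1e-3 for s in samples)
print(min(samples), max(samples), below_mid)
```

A plain uniform draw over the same range would place almost every sample above 1e-3, so the small learning rates would rarely be tried.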
```python
# Grid search - exhaustive
sweep_config = {
    'method': 'grid',
    'parameters': {
        'lr': {'values': [0.001, 0.01, 0.1]},
        'batch_size': {'values': [16, 32, 64]}
    }
}

# Random search
sweep_config = {
    'method': 'random',
    'parameters': {
        'lr': {'distribution': 'uniform', 'min': 0.0001, 'max': 0.1},
        'dropout': {'distribution': 'uniform', 'min': 0.1, 'max': 0.5}
    }
}

# Bayesian optimization (recommended)
sweep_config = {
    'method': 'bayes',
    'metric': {'name': 'val/loss', 'goal': 'minimize'},
    'parameters': {
        # log_uniform_values takes min/max as actual values
        'lr': {'distribution': 'log_uniform_values', 'min': 1e-5, 'max': 1e-1}
    }
}
```
Artifacts
Track datasets, models, and other files with lineage.
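One idea behind versioned artifacts is that a version can be identified by its contents, so unchanged files do not need a new version. The sketch below illustrates that idea with a SHA-256 digest over a set of files. It is a conceptual example only, not W&B's actual implementation, and `content_digest` is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def content_digest(paths):
    """Hash file names and contents, in sorted order, into one hex digest."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    train = Path(tmp) / "train.csv"
    train.write_text("id,label\n1,cat\n")
    before = content_digest([train])
    unchanged = content_digest([train])  # same bytes, same digest

    train.write_text("id,label\n1,cat\n2,dog\n")
    after = content_digest([train])      # edited file, different digest

print(before == unchanged, before == after)
```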
Log Artifacts
```python
# Create artifact
artifact = wandb.Artifact(
    name='training-dataset',
    type='dataset',
    description='ImageNet training split',
    metadata={'size': '1.2M images', 'split': 'train'}
)

# Add files
artifact.add_file('data/train.csv')
artifact.add_dir('data/images/')

# Log artifact
wandb.log_artifact(artifact)
```
Use Artifacts
```python
# Download and use artifact
run = wandb.init(project="my-project")

# Download artifact
artifact = run.use_artifact('training-dataset:latest')
artifact_dir = artifact.download()

# Use the data (load_data is a placeholder for your own loader)
data = load_data(f"{artifact_dir}/train.csv")
```
Model Registry
```python
# Start (or reuse) a run so the artifact can be logged and linked
run = wandb.init(project="my-project")

# Log model as artifact
model_artifact = wandb.Artifact(
    name='resnet50-model',
    type='model',
    metadata={'architecture': 'ResNet50', 'accuracy': 0.95}
)

model_artifact.add_file('model.pth')
run.log_artifact(model_artifact, aliases=['best', 'production'])

# Link to model registry
run.link_artifact(model_artifact, 'model-registry/production-models')
```
Integration Examples
HuggingFace Transformers
```python
from transformers import Trainer, TrainingArguments
import wandb

# Initialize W&B
wandb.init(project="hf-transformers")

# Training arguments with W&B
training_args = TrainingArguments(
    output_dir="./results",
    report_to="wandb",  # Enable W&B logging
    run_name="bert-finetuning",
    logging_steps=100,
    save_steps=500
)

# Trainer automatically logs to W&B
# (model, train_dataset, and eval_dataset are defined elsewhere)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset
)

trainer.train()
```
PyTorch Lightning
```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger

# Create W&B logger
wandb_logger = WandbLogger(
    project="lightning-demo",
    log_model=True  # Log model checkpoints
)

# Use with Trainer
trainer = Trainer(
    logger=wandb_logger,
    max_epochs=10
)

# model and dm (a LightningDataModule) are defined elsewhere
trainer.fit(model, datamodule=dm)
```
Keras/TensorFlow
```python
import wandb
from wandb.integration.keras import WandbMetricsLogger

# Initialize
wandb.init(project="keras-demo")

# Add callback
model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=10,
    callbacks=[WandbMetricsLogger()]  # Auto-logs metrics
)
```
Visualization & Analysis
Custom Charts
```python
# Log custom visualizations
import matplotlib.pyplot as plt
import wandb

fig, ax = plt.subplots()
ax.plot(x, y)
wandb.log({"custom_plot": wandb.Image(fig)})

# Log confusion matrix
wandb.log({"conf_mat": wandb.plot.confusion_matrix(
    probs=None,
    y_true=ground_truth,
    preds=predictions,
    class_names=class_names
)})
```
Reports
Create shareable reports in the W&B UI:
- Combine runs, charts, and text
- Markdown support
- Embeddable visualizations
- Team collaboration
Best Practices
1. Organize with Tags and Groups
```python
wandb.init(
    project="my-project",
    tags=["baseline", "resnet50", "imagenet"],
    group="resnet-experiments",  # Group related runs
    job_type="train"             # Type of job
)
```
2. Log Everything Relevant
```python
# Log additional system metrics
wandb.log({
    "gpu/util": gpu_utilization,
    "gpu/memory": gpu_memory_used,
    "cpu/util": cpu_utilization
})

# Record code version (a string belongs in config, not in the metric history)
wandb.config.update({"git_commit": git_commit_hash})

# Log data splits
wandb.log({
    "data/train_size": len(train_dataset),
    "data/val_size": len(val_dataset)
})
```
3. Use Descriptive Names
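A descriptive name can be derived from the run's hyperparameters so that it never drifts from the actual configuration. The helper below is a hypothetical sketch: `build_run_name` and its argument names are assumptions, not part of the W&B API, and its result would be passed as `name=` to `wandb.init`.

```python
def build_run_name(model, lr, batch_size, epochs):
    """Compose a run name such as 'bert-base-lr0.001-bs32-epoch10'."""
    return f"{model}-lr{lr:g}-bs{batch_size}-epoch{epochs}"


name = build_run_name("bert-base", lr=0.001, batch_size=32, epochs=10)
print(name)
```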
```python
# ✅ Good: Descriptive run names
wandb.init(
    project="nlp-classification",
    name="bert-base-lr0.001-bs32-epoch10"
)

# ❌ Bad: Generic names
wandb.init(project="nlp", name="run1")
```
4. Save Important Artifacts
```python
# Save final model
artifact = wandb.Artifact('final-model', type='model')
artifact.add_file('model.pth')
wandb.log_artifact(artifact)

# Save predictions for analysis
predictions_table = wandb.Table(
    columns=["id", "input", "prediction", "ground_truth"],
    data=predictions_data
)
wandb.log({"predictions": predictions_table})
```
5. Use Offline Mode for Unstable Connections
```python
import os
import wandb

# Enable offline mode (set before wandb.init)
os.environ["WANDB_MODE"] = "offline"

wandb.init(project="my-project")
# ... your code ...

# Sync later from the command line:
# wandb sync <run_directory>
```
Team Collaboration
Share Runs
```python
# Every run has a URL you can share; who can open it depends on project visibility
run = wandb.init(project="team-project")
print(f"Share this URL: {run.url}")
```
Team Projects
- Create a team account at wandb.ai
- Add team members
- Set project visibility (private/public)
- Use team-level artifacts and model registry
Pricing
- Free: Unlimited public projects, 100GB storage
- Academic: Free for students/researchers
- Teams: $50/seat/month, private projects, unlimited storage
- Enterprise: Custom pricing, on-prem options
Resources
- Documentation: https://docs.wandb.ai
- GitHub: https://github.com/wandb/wandb (10.5k+ stars)
- Examples: https://github.com/wandb/examples
- Community: https://wandb.ai/community
- Discord: https://wandb.me/discord
See Also
- references/sweeps.md - Comprehensive hyperparameter optimization guide
- references/artifacts.md - Data and model versioning patterns
- references/integrations.md - Framework-specific examples