PyTorch Lightning
High-level PyTorch framework with a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), a callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with the same code. Use it when you want clean training loops with built-in best practices.
Skill metadata
| Field | Value |
| --- | --- |
| Source | Optional — install with `hermes skills install official/mlops/pytorch-lightning` |
| Path | `optional-skills/mlops/pytorch-lightning` |
| Version | 1.0.0 |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | lightning, torch, transformers |
| Tags | PyTorch Lightning, Training Framework, Distributed Training, DDP, FSDP, DeepSpeed, High-Level API, Callbacks, Best Practices, Scalable |
Reference: full SKILL.md
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
PyTorch Lightning - High-Level Training Framework
Quick start
PyTorch Lightning organizes PyTorch code to eliminate boilerplate while maintaining flexibility.
Installation:
```bash
pip install lightning
```

Convert PyTorch to Lightning (3 steps):
```python
import lightning as L
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


# Step 1: Define LightningModule (organize your PyTorch code)
class LitModel(L.LightningModule):
    def __init__(self, hidden_size=128):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(28 * 28, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 10)
        )

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        loss = nn.functional.cross_entropy(y_hat, y)
        self.log('train_loss', loss)  # Auto-logged to TensorBoard
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# Step 2: Create data (train_dataset is your torch Dataset)
train_loader = DataLoader(train_dataset, batch_size=32)

# Step 3: Train with Trainer (handles everything else!)
trainer = L.Trainer(max_epochs=10, accelerator='gpu', devices=2)
model = LitModel()
trainer.fit(model, train_loader)
```

That's it! The Trainer handles the following (a flag-by-flag sketch appears after the list):
- GPU/TPU/CPU switching
- Distributed training (DDP, FSDP, DeepSpeed)
- Mixed precision (FP16, BF16)
- Gradient accumulation
- Checkpointing
- Logging
- Progress bars
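The bullets above map directly to Trainer arguments. A minimal sketch collecting those flags in one place; the specific values (accumulation factor, clip norm, directory) are illustrative, not recommendations:

```python
import lightning as L

trainer = L.Trainer(
    max_epochs=10,
    accelerator="auto",               # GPU/TPU/CPU switching
    devices="auto",
    strategy="ddp",                   # distributed training (also "fsdp", "deepspeed")
    precision="bf16-mixed",           # mixed precision (or "16-mixed")
    accumulate_grad_batches=4,        # gradient accumulation
    gradient_clip_val=1.0,            # optional gradient clipping
    default_root_dir="checkpoints/",  # where checkpoints and logs are written
    log_every_n_steps=50,             # logging frequency
    enable_progress_bar=True,         # progress bars
)
```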
Common workflows
Workflow 1: From PyTorch to Lightning
Original PyTorch code:
```python
model = MyModel()
optimizer = torch.optim.Adam(model.parameters())
model.to('cuda')

for epoch in range(max_epochs):
    for batch in train_loader:
        batch = batch.to('cuda')
        optimizer.zero_grad()
        loss = model(batch)
        loss.backward()
        optimizer.step()
```

Lightning version:
```python
class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = MyModel()

    def training_step(self, batch, batch_idx):
        loss = self.model(batch)  # No .to('cuda') needed!
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


# Train
trainer = L.Trainer(max_epochs=10, accelerator='gpu')
trainer.fit(LitModel(), train_loader)
```

Benefits: 40+ lines become roughly 15, with no device management and distributed training handled automatically.
Workflow 2: Validation and testing
```python
class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = MyModel()

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        loss = nn.functional.cross_entropy(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        val_loss = nn.functional.cross_entropy(y_hat, y)
        acc = (y_hat.argmax(dim=1) == y).float().mean()
        self.log('val_loss', val_loss)
        self.log('val_acc', acc)

    def test_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        test_loss = nn.functional.cross_entropy(y_hat, y)
        self.log('test_loss', test_loss)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# Train with validation
model = LitModel()
trainer = L.Trainer(max_epochs=10)
trainer.fit(model, train_loader, val_loader)

# Test
trainer.test(model, test_loader)
```

Automatic features:
- Validation runs every epoch by default (the sketch after this list shows how to change the frequency)
- Metrics logged to TensorBoard
- Checkpointing every epoch; add `ModelCheckpoint(monitor='val_loss')` (Workflow 4) to keep the best model by validation loss
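A small sketch of the Trainer arguments that control validation frequency, assuming the `model` and loaders from the code above:

```python
import lightning as L

# Validate every epoch (the default)
trainer = L.Trainer(max_epochs=10)

# Validate only every 5 epochs
trainer = L.Trainer(max_epochs=50, check_val_every_n_epoch=5)

# Validate four times per training epoch (useful for very long epochs)
trainer = L.Trainer(max_epochs=10, val_check_interval=0.25)

trainer.fit(model, train_loader, val_loader)
```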
Workflow 3: Distributed training (DDP)
```python
# Same code as single GPU!
model = LitModel()

# 8 GPUs with DDP (automatic!)
trainer = L.Trainer(
    accelerator='gpu',
    devices=8,
    strategy='ddp'  # Or 'fsdp', 'deepspeed'
)

trainer.fit(model, train_loader)
```

Launch:

```bash
# Single command, Lightning handles the rest
python train.py
```

No changes needed:
- Automatic data distribution
- Gradient synchronization
- Multi-node support (just set `num_nodes=2`; see the sketch after this list)
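A minimal multi-node sketch, assuming 2 nodes with 8 GPUs each. Under a cluster scheduler such as SLURM (or a torchrun launch), Lightning typically reads the node and rank information from the environment, so the training script itself is unchanged:

```python
import lightning as L

# Same LightningModule and DataLoader as above; only the Trainer changes
trainer = L.Trainer(
    accelerator="gpu",
    devices=8,      # GPUs per node
    num_nodes=2,    # total nodes; world size = devices × num_nodes
    strategy="ddp",
)
trainer.fit(model, train_loader)
```

On SLURM this is usually submitted with one task per GPU on each node (e.g. via srun), and Lightning picks the ranks up from the scheduler environment.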
Workflow 4: Callbacks for monitoring
```python
from lightning.pytorch.callbacks import ModelCheckpoint, EarlyStopping, LearningRateMonitor

# Create callbacks
checkpoint = ModelCheckpoint(
    monitor='val_loss',
    mode='min',
    save_top_k=3,
    filename='model-{epoch:02d}-{val_loss:.2f}'
)

early_stop = EarlyStopping(
    monitor='val_loss',
    patience=5,
    mode='min'
)

lr_monitor = LearningRateMonitor(logging_interval='epoch')

# Add to Trainer
trainer = L.Trainer(
    max_epochs=100,
    callbacks=[checkpoint, early_stop, lr_monitor]
)

trainer.fit(model, train_loader, val_loader)
```

Result:
- Auto-saves best 3 models
- Stops early if no improvement for 5 epochs
- Logs learning rate to TensorBoard
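Custom callbacks follow the same pattern: subclass `Callback` and override only the hooks you need. A minimal sketch; the hook and attribute names are standard Lightning, while the callback itself (`PrintValLoss`) is just an illustration:

```python
import lightning as L
from lightning.pytorch.callbacks import Callback


class PrintValLoss(Callback):
    """Illustrative callback: print the validation loss after each validation run."""

    def on_validation_end(self, trainer, pl_module):
        val_loss = trainer.callback_metrics.get("val_loss")
        if val_loss is not None:
            print(f"epoch {trainer.current_epoch}: val_loss={float(val_loss):.4f}")


trainer = L.Trainer(max_epochs=10, callbacks=[PrintValLoss()])
```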
Workflow 5: Learning rate scheduling
```python
class LitModel(L.LightningModule):
    # ... (training_step, etc.)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)

        # Cosine annealing
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=100, eta_min=1e-5
        )

        return {
            'optimizer': optimizer,
            'lr_scheduler': {
                'scheduler': scheduler,
                'interval': 'epoch',  # Update per epoch
                'frequency': 1
            }
        }


# The learning rate is logged when a LearningRateMonitor callback is attached (Workflow 4)
trainer = L.Trainer(max_epochs=100)
trainer.fit(model, train_loader)
```
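For schedulers that should advance every optimizer step rather than every epoch (e.g. linear warmup), set `'interval': 'step'`. A sketch to drop into your LightningModule; `warmup_steps` is a hypothetical hyperparameter:

```python
# Inside your LightningModule
def configure_optimizers(self):
    optimizer = torch.optim.AdamW(self.parameters(), lr=3e-4)

    warmup_steps = 500  # hypothetical warmup length
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer,
        lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps),  # linear warmup to full LR
    )

    return {
        'optimizer': optimizer,
        'lr_scheduler': {
            'scheduler': scheduler,
            'interval': 'step',  # step the scheduler after every optimizer step
        },
    }
```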
When to use vs alternatives
Use PyTorch Lightning when:
- Want clean, organized code
- Need production-ready training loops
- Switching between single GPU, multi-GPU, TPU
- Want built-in callbacks and logging
- Team collaboration (standardized structure)
Key advantages:
- Organized: Separates research code from engineering
- Automatic: DDP, FSDP, DeepSpeed with 1 line
- Callbacks: Modular training extensions
- Reproducible: Less boilerplate = fewer bugs
- Battle-tested: 1M+ downloads per month
Use alternatives instead:
- Accelerate: Minimal changes to existing code, more flexibility
- Ray Train: Multi-node orchestration, hyperparameter tuning
- Raw PyTorch: Maximum control, learning purposes
- Keras: TensorFlow ecosystem
Common issues
Issue: Loss not decreasing
Check data and model setup:
```python
# Add to training_step to sanity-check what the model actually receives
def training_step(self, batch, batch_idx):
    if batch_idx == 0:
        print(f"Batch shape: {batch[0].shape}")
        print(f"Labels: {batch[1]}")
    loss = ...  # your loss computation
    return loss
```

Issue: Out of memory
Reduce batch size or use gradient accumulation:
```python
trainer = L.Trainer(
    accumulate_grad_batches=4,  # Effective batch = batch_size × 4
    precision='bf16-mixed'      # Or '16-mixed'; roughly halves activation memory
)
```

Issue: Validation not running
Ensure you pass val_loader:
```python
# WRONG
trainer.fit(model, train_loader)

# CORRECT
trainer.fit(model, train_loader, val_loader)
```

Issue: DDP spawns multiple processes unexpectedly
Lightning auto-detects GPUs. Explicitly set devices:
```python
# Test on CPU first
trainer = L.Trainer(accelerator='cpu', devices=1)

# Then GPU
trainer = L.Trainer(accelerator='gpu', devices=1)
```

Advanced topics
Callbacks: See references/callbacks.md for EarlyStopping, ModelCheckpoint, custom callbacks, and callback hooks.
Distributed strategies: See references/distributed.md for DDP, FSDP, DeepSpeed ZeRO integration, and multi-node setup (a brief strategy-selection sketch follows this list).
Hyperparameter tuning: See references/hyperparameter-tuning.md for integration with Optuna, Ray Tune, and WandB sweeps.
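The reference files are not reproduced here. As a rough illustration of how the strategies above are selected, each is a one-line change on the Trainer; the strategy strings are standard Lightning 2.x aliases, and the comments are rules of thumb rather than hard rules:

```python
import lightning as L

# DDP: model fits on a single GPU; you only need data parallelism
trainer = L.Trainer(accelerator="gpu", devices=8, strategy="ddp")

# FSDP: model too large for one GPU; shards parameters, gradients, and optimizer state
trainer = L.Trainer(accelerator="gpu", devices=8, strategy="fsdp")

# DeepSpeed ZeRO stage 2: shards gradients and optimizer state, keeps full parameters per GPU
trainer = L.Trainer(accelerator="gpu", devices=8, strategy="deepspeed_stage_2")
```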
Hardware requirements
- CPU: Works (good for debugging)
- Single GPU: Works
- Multi-GPU: DDP (default), FSDP, or DeepSpeed
- Multi-node: DDP, FSDP, DeepSpeed
- TPU: Supported (8 cores)
- Apple MPS: Supported
Precision options (set through the Trainer's precision flag; a usage example follows the list):
- FP32 (default)
- FP16 (V100, older GPUs)
- BF16 (A100/H100, recommended)
- FP8 (H100)
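A sketch of the matching precision strings in Lightning 2.x. FP8 additionally depends on NVIDIA Transformer Engine, so it is left as a pointer rather than a recipe:

```python
import lightning as L

trainer = L.Trainer(precision="32-true")     # FP32 (default)
trainer = L.Trainer(precision="16-mixed")    # FP16 mixed precision (V100, older GPUs)
trainer = L.Trainer(precision="bf16-mixed")  # BF16 mixed precision (A100/H100, recommended)
# FP8 on H100 goes through the Transformer Engine integration; see the Lightning precision docs
```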
Resources
- Docs: https://lightning.ai/docs/pytorch/stable/
- GitHub: https://github.com/Lightning-AI/pytorch-lightning ⭐ 29,000+
- Version: 2.5.5+
- Examples: https://github.com/Lightning-AI/pytorch-lightning/tree/master/examples
- Discord: https://discord.gg/lightning-ai
- Used by: Kaggle winners, research labs, production teams