Guidance
Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance, Microsoft Research's constrained generation framework.
Skill metadata
| Field | Value |
|---|---|
| Source | Optional; install with hermes skills install official/mlops/guidance |
| Path | optional-skills/mlops/guidance |
| Version | 1.0.0 |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | guidance, transformers |
| Tags | Prompt Engineering, Guidance, Constrained Generation, Structured Output, JSON Validation, Grammar, Microsoft Research, Format Enforcement, Multi-Step Workflows |
Reference: full SKILL.md
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
Guidance: Constrained LLM Generation
When to Use This Skill
Use Guidance when you need to:
- Control LLM output syntax with regex or grammars
- Guarantee valid JSON/XML/code generation
- Reduce latency vs traditional prompting approaches
- Enforce structured formats (dates, emails, IDs, etc.)
- Build multi-step workflows with Pythonic control flow
- Prevent invalid outputs through grammatical constraints
GitHub Stars: 18,000+ | From: Microsoft Research
Installation

    # Base installation
    pip install guidance

    # With specific backends (quoted so shells like zsh do not expand the brackets)
    pip install "guidance[transformers]"  # Hugging Face models
    pip install "guidance[llama_cpp]"     # llama.cpp models

Quick Start
Basic Example: Structured Generation

    from guidance import models, gen

    # Load model (supports OpenAI, Transformers, llama.cpp)
    lm = models.OpenAI("gpt-4")

    # Generate with constraints
    result = lm + "The capital of France is " + gen("capital", max_tokens=5)

    print(result["capital"])  # e.g. "Paris"

With Anthropic Claude

    from guidance import models, gen, system, user, assistant

    # Configure Claude
    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    # Use context managers for chat format
    with system():
        lm += "You are a helpful assistant."

    with user():
        lm += "What is the capital of France?"

    with assistant():
        lm += gen(max_tokens=20)

Core Concepts
1. Context Managers
Guidance uses Pythonic context managers for chat-style interactions.

    from guidance import models, system, user, assistant, gen

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    # System message
    with system():
        lm += "You are a JSON generation expert."

    # User message
    with user():
        lm += "Generate a person object with name and age."

    # Assistant response
    with assistant():
        lm += gen("response", max_tokens=100)

    print(lm["response"])

Benefits:
- Natural chat flow
- Clear role separation
- Easy to read and maintain
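
To make the role-block idea concrete, here is a minimal sketch in plain Python of how `with` blocks can tag appended text with a chat role. `Transcript`, `role`, and `add` are hypothetical names chosen for illustration; this shows the general pattern only and is not Guidance's implementation.

```python
from contextlib import contextmanager


class Transcript:
    """Toy stand-in for a chat model state (hypothetical, not a Guidance class)."""

    def __init__(self):
        self.messages = []          # list of (role, text) pairs
        self._current_role = None   # role of the block we are inside, if any

    @contextmanager
    def role(self, name):
        # Open a new message for this role; text added inside the block lands in it.
        self._current_role = name
        self.messages.append((name, ""))
        try:
            yield self
        finally:
            self._current_role = None

    def add(self, text):
        if self._current_role is None:
            raise RuntimeError("text must be added inside a role block")
        role, existing = self.messages[-1]
        self.messages[-1] = (role, existing + text)


chat = Transcript()
with chat.role("system"):
    chat.add("You are a JSON generation expert.")
with chat.role("user"):
    chat.add("Generate a person object ")
    chat.add("with name and age.")

print(chat.messages)
```

Each block opens exactly one message, so role boundaries follow the indentation of the code rather than manual bookkeeping.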
2. Constrained Generation
Guidance ensures outputs match specified patterns using regex or grammars. Token-level enforcement needs a backend that exposes token probabilities (such as Transformers or llama.cpp); remote API backends may not be able to enforce every constraint.
Regex Constraints

    from guidance import models, gen

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    # Constrain to valid email format
    lm += "Email: " + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

    # Constrain to date format (YYYY-MM-DD)
    lm += "Date: " + gen("date", regex=r"\d{4}-\d{2}-\d{2}")

    # Constrain to phone number
    lm += "Phone: " + gen("phone", regex=r"\d{3}-\d{3}-\d{4}")

    print(lm["email"])  # Matches the email pattern
    print(lm["date"])   # Matches YYYY-MM-DD

How it works:
- The regex is compiled into a grammar that is applied token by token
- Tokens that would break the pattern are filtered out during generation
- The model can only produce output that matches the pattern
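
The filtering step above can be sketched in plain Python. This is a toy model of the idea, assuming a fixed-length date pattern and a tiny made-up vocabulary; `is_valid_prefix`, `allowed_tokens`, and `constrained_decode` are hypothetical helpers, not Guidance functions, and a real engine compiles arbitrary regexes rather than using a position table.

```python
import string

# Per-character classes for the pattern \d{4}-\d{2}-\d{2}. The pattern has a
# fixed length, so a simple position table is enough for this sketch.
DATE_SHAPE = [string.digits] * 4 + ["-"] + [string.digits] * 2 + ["-"] + [string.digits] * 2


def is_valid_prefix(text, shape=DATE_SHAPE):
    """True if text could still be extended into a full match."""
    return len(text) <= len(shape) and all(ch in ok for ch, ok in zip(text, shape))


def allowed_tokens(generated, vocab):
    """Mask step: keep only tokens that leave the output a valid prefix."""
    return [tok for tok in vocab if is_valid_prefix(generated + tok)]


def constrained_decode(ranked_vocab):
    """Greedy decode: at each step take the top-ranked token that is allowed."""
    out = ""
    while len(out) < len(DATE_SHAPE):
        candidates = allowed_tokens(out, ranked_vocab)
        if not candidates:
            raise ValueError("vocabulary cannot complete the pattern")
        out += candidates[0]
    return out


# Hypothetical vocabulary, ordered by a pretend model's preference.
vocab = ["Sept", " 15", "20", "24", "-", "09", "-1", "5", "th"]
print(allowed_tokens("", vocab))   # → ['20', '24', '09', '5']
print(constrained_decode(vocab))   # → 2020-20-20
```

The decoded string has the right shape but is not a real calendar date: a regex constraint guarantees format, not meaning.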
Selection Constraints

    from guidance import models, gen, select

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    # Constrain to specific choices
    lm += "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")

    # Multiple-choice selection
    lm += "Best answer: " + select(
        ["A) Paris", "B) London", "C) Berlin", "D) Madrid"],
        name="answer",
    )

    print(lm["sentiment"])  # One of: positive, negative, neutral
    print(lm["answer"])     # One of the four option strings, e.g. "A) Paris"

3. Token Healing
Guidance automatically “heals” token boundaries between prompt and generation.
Problem: Tokenization creates unnatural boundaries.

    # Without token healing
    prompt = "The capital of France is "
    # The prompt ends in a standalone space token
    # First generated token might be " Par" (with leading space)
    # Result: "The capital of France is  Paris" (double space!)

Solution: Guidance backs up one token and regenerates it, allowing only tokens that begin with the removed text.

    from guidance import models, gen

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    # Token healing enabled by default
    lm += "The capital of France is " + gen("capital", max_tokens=5)
    # Result: "The capital of France is Paris" (correct spacing)

Benefits:
- Natural text boundaries
- No awkward spacing issues
- Better model performance (sees natural token sequences)
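
The back-up-and-regenerate step can be illustrated with a toy tokenizer. This is a minimal sketch under stated assumptions: the vocabulary, the greedy longest-match `tokenize`, and the pretend ranking `ranked_next` are all hypothetical, and real token healing works on model probabilities rather than a fixed list.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenizer over a toy vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        # Raises ValueError if no vocabulary entry matches at position i.
        match = max((t for t in vocab if text.startswith(t, i)), key=len)
        tokens.append(match)
        i += len(match)
    return tokens


def heal(prompt, vocab, ranked_next):
    """Back up one token, then pick the best next token that re-covers it."""
    tokens = tokenize(prompt, vocab)
    removed = tokens.pop()  # here: the trailing " "
    # Only tokens that start with the removed text keep the prompt intact.
    choice = next(t for t in ranked_next if t.startswith(removed))
    return "".join(tokens) + choice


# Hypothetical vocabulary; real tokenizers have tens of thousands of entries.
vocab = ["The", " capital", " of", " France", " is", " ", " Paris", " Lyon", "Paris"]
# Pretend model ranking: it learned city names with a leading space.
ranked_next = [" Paris", " Lyon", "Paris"]

prompt = "The capital of France is "
naive = prompt + ranked_next[0]            # no healing: prompt space + token space
healed = heal(prompt, vocab, ranked_next)

print(repr(naive))    # → 'The capital of France is  Paris'
print(repr(healed))   # → 'The capital of France is Paris'
```

The healed result still begins with the full prompt text, because the replacement token is required to start with whatever was removed.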
4. Grammar-Based Generation
Define complex structures using context-free grammars. One way to build a grammar is to compose constrained pieces inside a stateless guidance function.

    from guidance import guidance, models, gen

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    # JSON grammar (simplified): fixed skeleton with constrained value slots
    @guidance(stateless=True)
    def person_json(lm):
        lm += '{"name": "' + gen("name", regex=r"[A-Za-z ]+", max_tokens=20) + '", '
        lm += '"age": ' + gen("age", regex=r"[0-9]+", max_tokens=3) + ', '
        lm += '"email": "' + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", max_tokens=50) + '"}'
        return lm

    # Generate valid JSON
    lm += person_json()

    print(lm["name"], lm["age"], lm["email"])  # Values that fit the JSON skeleton

Use cases:
- Complex structured outputs
- Nested data structures
- Programming language syntax
- Domain-specific languages
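
Nested structures are where a regex stops being enough: balanced brackets need a context-free grammar. As a rough illustration, here is a small recursive-descent recognizer for a toy grammar of nested integer lists. The grammar and the names `parse_value` and `matches` are assumptions made for this sketch; it checks complete strings, whereas a constrained decoder would check prefixes token by token.

```python
def parse_value(s, i=0):
    """Return the index just past one value starting at s[i], or -1 on failure.

    Grammar: value := DIGITS | "[" [ value ("," value)* ] "]"
    """
    if i < len(s) and s[i].isdigit():
        while i < len(s) and s[i].isdigit():
            i += 1
        return i
    if i < len(s) and s[i] == "[":
        i += 1
        if i < len(s) and s[i] == "]":   # empty list
            return i + 1
        while True:
            i = parse_value(s, i)        # recursion handles arbitrary nesting
            if i == -1 or i >= len(s):
                return -1
            if s[i] == "]":
                return i + 1
            if s[i] != ",":
                return -1
            i += 1
    return -1


def matches(s):
    """True if the whole string is exactly one value of the grammar."""
    return parse_value(s) == len(s)


print(matches("[1,[2,[3,4]],5]"))   # → True
print(matches("[1,[2,3]"))          # → False (unbalanced)
```

The recursive call is what lets the depth of nesting be unbounded, which no fixed regular expression can express.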
5. Guidance Functions
Create reusable generation patterns with the @guidance decorator.

    from guidance import guidance, gen, models

    @guidance
    def generate_person(lm):
        """Generate a person with name and age."""
        lm += "Name: " + gen("name", max_tokens=20, stop="\n")
        lm += "\nAge: " + gen("age", regex=r"[0-9]+", max_tokens=3)
        return lm

    # Use the function: the decorator supplies lm when the call is added to the model
    lm = models.Anthropic("claude-sonnet-4-5-20250929")
    lm += generate_person()

    print(lm["name"])
    print(lm["age"])

Stateful Functions:

    from guidance import guidance, gen, select

    @guidance(stateless=False)
    def react_agent(lm, question, tools, max_rounds=5):
        """ReAct agent with tool use."""
        lm += f"Question: {question}\n\n"

        for i in range(max_rounds):
            # Thought
            lm += f"Thought {i+1}: " + gen("thought", stop="\n")

            # Action
            lm += "\nAction: " + select(list(tools.keys()), name="action")

            # Execute tool (each tool here is a zero-argument callable)
            tool_result = tools[lm["action"]]()
            lm += f"\nObservation: {tool_result}\n\n"

            # Check if done
            lm += "Done? " + select(["Yes", "No"], name="done") + "\n"
            if lm["done"] == "Yes":
                break

        # Final answer
        lm += "\nFinal Answer: " + gen("answer", max_tokens=100)
        return lm

Backend Configuration
Anthropic Claude

    from guidance import models

    lm = models.Anthropic(
        model="claude-sonnet-4-5-20250929",
        api_key="your-api-key",  # Or set ANTHROPIC_API_KEY env var
    )

OpenAI

    from guidance import models

    lm = models.OpenAI(
        model="gpt-4o-mini",
        api_key="your-api-key",  # Or set OPENAI_API_KEY env var
    )

Local Models (Transformers)

    from guidance.models import Transformers

    lm = Transformers(
        "microsoft/Phi-4-mini-instruct",
        device="cuda",  # Or "cpu"
    )

Local Models (llama.cpp)

    from guidance.models import LlamaCpp

    lm = LlamaCpp(
        model_path="/path/to/model.gguf",
        n_ctx=4096,
        n_gpu_layers=35,
    )

Common Patterns
Pattern 1: JSON Generation

    from guidance import models, gen, system, user, assistant

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    with system():
        lm += "You generate valid JSON."

    with user():
        lm += "Generate a user profile with name, age, and email."

    with assistant():
        lm += """{
        "name": """ + gen("name", regex=r'"[A-Za-z ]+"', max_tokens=30) + """,
        "age": """ + gen("age", regex=r"[0-9]+", max_tokens=3) + """,
        "email": """ + gen("email", regex=r'"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"', max_tokens=50) + """
    }"""

    print(lm)  # Full transcript; the assistant turn follows the JSON skeleton above

Pattern 2: Classification

    from guidance import models, gen, select

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    text = "This product is amazing! I love it."

    lm += f"Text: {text}\n"
    lm += "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")
    lm += "\nConfidence: " + gen("confidence", regex=r"[0-9]+", max_tokens=3) + "%"

    print(f"Sentiment: {lm['sentiment']}")
    print(f"Confidence: {lm['confidence']}%")

Pattern 3: Multi-Step Reasoning

    from guidance import models, gen, guidance

    @guidance
    def chain_of_thought(lm, question):
        """Generate answer with step-by-step reasoning."""
        lm += f"Question: {question}\n\n"

        # Generate multiple reasoning steps
        for i in range(3):
            lm += f"Step {i+1}: " + gen(f"step_{i+1}", stop="\n", max_tokens=100) + "\n"

        # Final answer
        lm += "\nTherefore, the answer is: " + gen("answer", max_tokens=50)

        return lm

    lm = models.Anthropic("claude-sonnet-4-5-20250929")
    lm += chain_of_thought("What is 15% of 200?")

    print(lm["answer"])

Pattern 4: ReAct Agent

    from guidance import models, gen, select, guidance

    @guidance(stateless=False)
    def react_agent(lm, question):
        """ReAct agent with tool use."""
        tools = {
            # eval() is unsafe on untrusted input; demo only
            "calculator": lambda expr: eval(expr),
            "search": lambda query: f"Search results for: {query}",
        }

        lm += f"Question: {question}\n\n"

        for _ in range(5):
            # Thought
            lm += "Thought: " + gen("thought", stop="\n") + "\n"

            # Action selection
            lm += "Action: " + select(["calculator", "search", "answer"], name="action")

            if lm["action"] == "answer":
                lm += "\nFinal Answer: " + gen("answer", max_tokens=100)
                break

            # Action input
            lm += "\nAction Input: " + gen("action_input", stop="\n") + "\n"

            # Execute tool
            if lm["action"] in tools:
                result = tools[lm["action"]](lm["action_input"])
                lm += f"Observation: {result}\n\n"

        return lm

    lm = models.Anthropic("claude-sonnet-4-5-20250929")
    lm += react_agent("What is 25 * 4 + 10?")
    print(lm["answer"])  # Set only if the agent chose "answer" within 5 rounds

Pattern 5: Data Extraction

    from guidance import models, gen, guidance

    @guidance
    def extract_entities(lm, text):
        """Extract structured entities from text."""
        lm += f"Text: {text}\n\n"

        # Extract person
        lm += "Person: " + gen("person", stop="\n", max_tokens=30) + "\n"

        # Extract organization
        lm += "Organization: " + gen("organization", stop="\n", max_tokens=30) + "\n"

        # Extract date
        lm += "Date: " + gen("date", regex=r"\d{4}-\d{2}-\d{2}", max_tokens=10) + "\n"

        # Extract location
        lm += "Location: " + gen("location", stop="\n", max_tokens=30) + "\n"

        return lm

    text = "Tim Cook announced at Apple Park on 2024-09-15 in Cupertino."

    lm = models.Anthropic("claude-sonnet-4-5-20250929")
    lm += extract_entities(text)

    print(f"Person: {lm['person']}")
    print(f"Organization: {lm['organization']}")
    print(f"Date: {lm['date']}")
    print(f"Location: {lm['location']}")

Best Practices
1. Use Regex for Format Validation

    # ✅ Good: Regex ensures valid format
    lm += "Email: " + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

    # ❌ Bad: Free generation may produce invalid emails
    lm += "Email: " + gen("email", max_tokens=50)

2. Use select() for Fixed Categories

    # ✅ Good: Guaranteed valid category
    lm += "Status: " + select(["pending", "approved", "rejected"], name="status")

    # ❌ Bad: May generate typos or invalid values
    lm += "Status: " + gen("status", max_tokens=20)

3. Leverage Token Healing

    # Token healing is enabled by default
    # No special action needed - just concatenate naturally
    lm += "The capital is " + gen("capital")  # Automatic healing

4. Use stop Sequences

    # ✅ Good: Stop at newline for single-line outputs
    lm += "Name: " + gen("name", stop="\n")

    # ❌ Bad: May generate multiple lines
    lm += "Name: " + gen("name", max_tokens=50)

5. Create Reusable Functions

    # ✅ Good: Reusable pattern
    @guidance
    def generate_person(lm):
        lm += "Name: " + gen("name", stop="\n")
        lm += "\nAge: " + gen("age", regex=r"[0-9]+")
        return lm

    # Use multiple times
    lm += generate_person()
    lm += "\n\n"
    lm += generate_person()

6. Balance Constraints

    # ✅ Good: Reasonable constraints
    lm += gen("name", regex=r"[A-Za-z ]+", max_tokens=30)

    # ❌ Too strict: may fail or be very slow; for a closed set of values use select()
    lm += gen("name", regex=r"^(John|Jane)$", max_tokens=10)

Comparison to Alternatives
| Feature | Guidance | Instructor | Outlines | LMQL |
|---|---|---|---|---|
| Regex Constraints | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| Grammar Support | ✅ CFG | ❌ No | ✅ CFG | ✅ CFG |
| Pydantic Validation | ❌ No | ✅ Yes | ✅ Yes | ❌ No |
| Token Healing | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
| Local Models | ✅ Yes | ⚠️ Limited | ✅ Yes | ✅ Yes |
| API Models | ✅ Yes | ✅ Yes | ⚠️ Limited | ✅ Yes |
| Pythonic Syntax | ✅ Yes | ✅ Yes | ✅ Yes | ❌ SQL-like |
| Learning Curve | Low | Low | Medium | High |
When to choose Guidance:
- Need regex/grammar constraints
- Want token healing
- Building complex workflows with control flow
- Using local models (Transformers, llama.cpp)
- Prefer Pythonic syntax
When to choose alternatives:
- Instructor: Need Pydantic validation with automatic retrying
- Outlines: Need JSON schema validation
- LMQL: Prefer declarative query syntax
Performance Characteristics
Latency Reduction:
- Can be 30-50% faster than traditional prompting for constrained outputs; the gain depends on the model, the backend, and how much of the output is fixed text
- Token healing reduces unnecessary regeneration
- Grammar constraints prevent invalid token generation
Memory Usage:
- Minimal overhead vs unconstrained generation
- Grammar compilation cached after first use
- Efficient token filtering at inference time
Token Efficiency:
- Prevents wasted tokens on invalid outputs
- No need for retry loops
- Direct path to valid outputs
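
For contrast, the retry loop that constrained generation avoids can be sketched as follows. The scripted sampler and the whitespace-based token count are stand-ins invented for this example; no model is called, and `generate_with_retries` is a hypothetical helper.

```python
import re

DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def generate_with_retries(sample, max_attempts=5):
    """Unconstrained baseline: sample, validate, retry on failure."""
    wasted = 0
    for attempt in range(1, max_attempts + 1):
        text = sample()
        if DATE.fullmatch(text):
            return text, attempt, wasted
        wasted += len(text.split())  # crude token count: whitespace-separated words
    raise RuntimeError("no valid output within the attempt budget")


# Hypothetical scripted "model": two malformed answers, then a valid one.
scripted = iter(["September 15, 2024", "15/09/2024", "2024-09-15"])
text, attempts, wasted = generate_with_retries(lambda: next(scripted))

print(text, attempts, wasted)   # → 2024-09-15 3 4
```

With a regex constraint applied during decoding, the first sample already matches the pattern, so the equivalent run would take one attempt and discard nothing.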
Resources
- Documentation: https://guidance.readthedocs.io
- GitHub: https://github.com/guidance-ai/guidance (18k+ stars)
- Notebooks: https://github.com/guidance-ai/guidance/tree/main/notebooks
- Discord: Community support available
See Also
- references/constraints.md - Comprehensive regex and grammar patterns
- references/backends.md - Backend-specific configuration
- references/examples.md - Production-ready examples