Outlines
Outlines: structured JSON/regex/Pydantic LLM generation.
Skill metadata
| Source | Bundled (installed by default) |
| Path | skills/mlops/inference/outlines |
| Version | 1.0.0 |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | outlines, transformers, vllm, pydantic |
| Tags | Prompt Engineering, Outlines, Structured Generation, JSON Schema, Pydantic, Local Models, Grammar-Based Generation, vLLM, Transformers, Type Safety |
Reference: full SKILL.md
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
Outlines: Structured Text Generation
When to Use This Skill
Use Outlines when you need to:
- Guarantee valid JSON/XML/code structure during generation
- Use Pydantic models for type-safe outputs
- Support local models (Transformers, llama.cpp, vLLM)
- Maximize inference speed with zero-overhead structured generation
- Generate against JSON schemas automatically
- Control token sampling at the grammar level
GitHub Stars: 8,000+ | From: dottxt.ai (formerly .txt)
Installation
```bash
# Base installation
pip install outlines

# With specific backends
pip install outlines transformers       # Hugging Face models
pip install outlines llama-cpp-python   # llama.cpp
pip install outlines vllm               # vLLM for high throughput
```
Quick Start
Basic Example: Classification
```python
import outlines

# Load model
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate with type constraint
prompt = "Sentiment of 'This product is amazing!': "
generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])
sentiment = generator(prompt)

print(sentiment)  # "positive" (guaranteed to be one of the three choices)
```
With Pydantic Models
```python
from pydantic import BaseModel
import outlines

class User(BaseModel):
    name: str
    age: int
    email: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate structured output
prompt = "Extract user: John Doe, 30 years old, john@example.com"
generator = outlines.generate.json(model, User)
user = generator(prompt)

print(user.name)   # "John Doe"
print(user.age)    # 30
print(user.email)  # "john@example.com"
```
Core Concepts
1. Constrained Token Sampling
Outlines uses Finite State Machines (FSM) to constrain token generation at the logit level.
How it works:
- Convert schema (JSON/Pydantic/regex) to context-free grammar (CFG)
- Transform CFG into Finite State Machine (FSM)
- Filter invalid tokens at each step during generation
- Fast-forward when only one valid token exists
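Step 1 can be inspected directly: Pydantic emits the JSON schema that Outlines then compiles into a grammar. A quick sketch, assuming Pydantic v2 (for `model_json_schema()`):

```python
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

# The JSON schema that Outlines turns into a CFG and then an FSM
schema = Person.model_json_schema()
print(schema["properties"]["age"])  # {'title': 'Age', 'type': 'integer'}
```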
Benefits:
- Zero overhead: Filtering happens at token level
- Speed improvement: Fast-forward through deterministic paths
- Guaranteed validity: Invalid outputs impossible
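The filtering idea can be sketched in plain Python. This toy hand-codes the state machine for the phone-number regex used later in this skill; Outlines derives an equivalent machine automatically from any regex or schema and applies the mask to the model's logits:

```python
import random
import re

PATTERN = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")
DIGITS = sorted("0123456789")

def allowed_next(prefix: str) -> list[str]:
    """Hand-built FSM for this one pattern: legal characters after `prefix`."""
    if len(prefix) >= 12:
        return []      # accepting state reached: stop
    if len(prefix) in (3, 7):
        return ["-"]   # the dashes are forced by the grammar
    return DIGITS      # digit positions

def constrained_decode(sample=random.choice) -> str:
    """Sample one character at a time, masked to the FSM's allowed set."""
    out = ""
    while allowed := allowed_next(out):
        out += sample(allowed)  # an invalid character can never be chosen
    return out

print(constrained_decode())  # always matches PATTERN, e.g. "482-917-3056"
```

Because sampling only ever sees the allowed set, an invalid output is impossible by construction, with no retry loop and no post-hoc validation.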
```python
import outlines
from pydantic import BaseModel

# Pydantic model -> JSON schema -> CFG -> FSM
class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Behind the scenes:
# 1. Person -> JSON schema
# 2. JSON schema -> CFG
# 3. CFG -> FSM
# 4. FSM filters tokens during generation
generator = outlines.generate.json(model, Person)
result = generator("Generate person: Alice, 25")
```
2. Structured Generators
Outlines provides specialized generators for different output types.
Choice Generator
```python
# Multiple-choice selection
generator = outlines.generate.choice(
    model, ["positive", "negative", "neutral"]
)

sentiment = generator("Review: This is great!")
# Result: one of the three choices
```
JSON Generator
```python
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

# Generate valid JSON matching the schema
generator = outlines.generate.json(model, Product)
product = generator("Extract: iPhone 15, $999, available")

# Guaranteed valid Product instance
print(type(product))  # <class '__main__.Product'>
```
Regex Generator
```python
# Generate text matching a regex
generator = outlines.generate.regex(
    model,
    r"[0-9]{3}-[0-9]{3}-[0-9]{4}"  # Phone number pattern
)

phone = generator("Generate phone number:")
# Result: "555-123-4567" (guaranteed to match the pattern)
```
Integer/Float Generators
```python
# Generate specific numeric types
int_generator = outlines.generate.integer(model)
age = int_generator("Person's age:")  # Guaranteed integer

float_generator = outlines.generate.float(model)
price = float_generator("Product price:")  # Guaranteed float
```
3. Model Backends
Outlines supports multiple local and API-based backends.
Transformers (Hugging Face)
```python
import outlines

# Load from Hugging Face
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda"  # Or "cpu"
)

# Use with any generator
generator = outlines.generate.json(model, YourModel)
```
llama.cpp
```python
# Load a GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b-instruct.Q4_K_M.gguf",
    n_gpu_layers=35
)

generator = outlines.generate.json(model, YourModel)
```
vLLM (High Throughput)
```python
# For production deployments
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2  # Multi-GPU
)

generator = outlines.generate.json(model, YourModel)
```
OpenAI (Limited Support)
```python
# Basic OpenAI support
model = outlines.models.openai(
    "gpt-4o-mini",
    api_key="your-api-key"
)

# Note: some features are limited with API models
generator = outlines.generate.json(model, YourModel)
```
4. Pydantic Integration
Outlines has first-class Pydantic support with automatic schema translation.
Basic Models
```python
from pydantic import BaseModel, Field
import outlines

class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of tags")

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Article)

article = generator("Generate article about AI")
print(article.title)
print(article.word_count)  # Guaranteed > 0
```
Nested Models
```python
class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

generator = outlines.generate.json(model, Person)
person = generator("Generate person in New York")

print(person.address.city)  # "New York"
```
Enums and Literals
```python
from enum import Enum
from typing import Literal

class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    applicant: str
    status: Status  # Must be one of the enum values
    priority: Literal["low", "medium", "high"]  # Must be one of the literals

generator = outlines.generate.json(model, Application)
app = generator("Generate application")

print(app.status)  # Status.PENDING (or APPROVED/REJECTED)
```
Common Patterns
Pattern 1: Data Extraction
```python
from pydantic import BaseModel
import outlines

class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, CompanyInfo)

text = """Apple Inc. was founded in 1976 in the technology industry.
The company employs approximately 164,000 people worldwide."""

prompt = f"Extract company information:\n{text}\n\nCompany:"
company = generator(prompt)

print(f"Name: {company.name}")
print(f"Founded: {company.founded_year}")
print(f"Industry: {company.industry}")
print(f"Employees: {company.employees}")
```
Pattern 2: Classification
```python
from typing import Literal
from pydantic import BaseModel
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Binary classification
generator = outlines.generate.choice(model, ["spam", "not_spam"])
result = generator("Email: Buy now! 50% off!")

# Multi-class classification
categories = ["technology", "business", "sports", "entertainment"]
category_gen = outlines.generate.choice(model, categories)
category = category_gen("Article: Apple announces new iPhone...")

# With confidence
class Classification(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float

classifier = outlines.generate.json(model, Classification)
result = classifier("Review: This product is okay, nothing special")
```
Pattern 3: Structured Forms
```python
class UserProfile(BaseModel):
    full_name: str
    age: int
    email: str
    phone: str
    country: str
    interests: list[str]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, UserProfile)

prompt = """Extract user profile from:
Name: Alice Johnson
Age: 28
Email: alice@example.com
Phone: 555-0123
Country: USA
Interests: hiking, photography, cooking"""

profile = generator(prompt)
print(profile.full_name)
print(profile.interests)  # ["hiking", "photography", "cooking"]
```
Pattern 4: Multi-Entity Extraction
```python
class Entity(BaseModel):
    name: str
    type: Literal["PERSON", "ORGANIZATION", "LOCATION"]

class DocumentEntities(BaseModel):
    entities: list[Entity]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, DocumentEntities)

text = "Tim Cook met with Satya Nadella at Microsoft headquarters in Redmond."
prompt = f"Extract entities from: {text}"

result = generator(prompt)
for entity in result.entities:
    print(f"{entity.name} ({entity.type})")
```
Pattern 5: Code Generation
```python
class PythonFunction(BaseModel):
    function_name: str
    parameters: list[str]
    docstring: str
    body: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, PythonFunction)

prompt = "Generate a Python function to calculate factorial"
func = generator(prompt)

print(f"def {func.function_name}({', '.join(func.parameters)}):")
print(f'    """{func.docstring}"""')
print(f"    {func.body}")
```
Pattern 6: Batch Processing
```python
def batch_extract(texts: list[str], schema: type[BaseModel]):
    """Extract structured data from multiple texts."""
    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, schema)

    results = []
    for text in texts:
        result = generator(f"Extract from: {text}")
        results.append(result)

    return results

class Person(BaseModel):
    name: str
    age: int

texts = [
    "John is 30 years old",
    "Alice is 25 years old",
    "Bob is 40 years old",
]

people = batch_extract(texts, Person)
for person in people:
    print(f"{person.name}: {person.age}")
```
Backend Configuration
Transformers
```python
import outlines

# Basic usage
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# GPU configuration
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda",
    model_kwargs={"torch_dtype": "float16"}
)

# Popular models
model = outlines.models.transformers("meta-llama/Llama-3.1-8B-Instruct")
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")
```
llama.cpp
```python
# Load a GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b.Q4_K_M.gguf",
    n_ctx=4096,       # Context window
    n_gpu_layers=35,  # GPU layers
    n_threads=8       # CPU threads
)

# Full GPU offload
model = outlines.models.llamacpp(
    "./models/model.gguf",
    n_gpu_layers=-1  # All layers on GPU
)
```
vLLM (Production)
```python
# Single GPU
model = outlines.models.vllm("meta-llama/Llama-3.1-8B-Instruct")

# Multi-GPU
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4  # 4 GPUs
)

# With quantization
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization="awq"  # Or "gptq"
)
```
Best Practices
1. Use Specific Types
```python
# ✅ Good: specific types
class Product(BaseModel):
    name: str
    price: float    # Not str
    quantity: int   # Not str
    in_stock: bool  # Not str

# ❌ Bad: everything as a string
class Product(BaseModel):
    name: str
    price: str     # Should be float
    quantity: str  # Should be int
```
2. Add Constraints
```python
from pydantic import Field

# ✅ Good: with constraints
class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=120)
    email: str = Field(pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")

# ❌ Bad: no constraints
class User(BaseModel):
    name: str
    age: int
    email: str
```
3. Use Enums for Categories
```python
# ✅ Good: enum for a fixed set
class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Task(BaseModel):
    title: str
    priority: Priority

# ❌ Bad: free-form string
class Task(BaseModel):
    title: str
    priority: str  # Can be anything
```
4. Provide Context in Prompts
```python
# ✅ Good: clear context
prompt = """Extract product information from the following text.
Text: iPhone 15 Pro costs $999 and is currently in stock.
Product:"""

# ❌ Bad: minimal context
prompt = "iPhone 15 Pro costs $999 and is currently in stock."
```
5. Handle Optional Fields
```python
from typing import Optional

# ✅ Good: optional fields for incomplete data
class Article(BaseModel):
    title: str                    # Required
    author: Optional[str] = None  # Optional
    date: Optional[str] = None    # Optional
    tags: list[str] = []          # Default empty list

# Can succeed even if author/date are missing
```
Comparison to Alternatives
| Feature | Outlines | Instructor | Guidance | LMQL |
|---|---|---|---|---|
| Pydantic Support | ✅ Native | ✅ Native | ❌ No | ❌ No |
| JSON Schema | ✅ Yes | ✅ Yes | ⚠️ Limited | ✅ Yes |
| Regex Constraints | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| Local Models | ✅ Full | ⚠️ Limited | ✅ Full | ✅ Full |
| API Models | ⚠️ Limited | ✅ Full | ✅ Full | ✅ Full |
| Zero Overhead | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
| Automatic Retrying | ❌ No | ✅ Yes | ❌ No | ❌ No |
| Learning Curve | Low | Low | Low | High |
When to choose Outlines:
- Using local models (Transformers, llama.cpp, vLLM)
- Need maximum inference speed
- Want Pydantic model support
- Require zero-overhead structured generation
- Control token sampling process
When to choose alternatives:
- Instructor: Need API models with automatic retrying
- Guidance: Need token healing and complex workflows
- LMQL: Prefer declarative query syntax
Performance Characteristics
Speed:
- Zero overhead: Structured generation as fast as unconstrained
- Fast-forward optimization: Skips deterministic tokens
- 1.2-2x faster than post-generation validation approaches
Memory:
- FSM compiled once per schema (cached)
- Minimal runtime overhead
- Efficient with vLLM for high throughput
Accuracy:
- 100% valid outputs (guaranteed by FSM)
- No retry loops needed
- Deterministic token filtering
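The fast-forward effect can be illustrated at the character level (a rough sketch: real tokenizers work on subword tokens, so the exact savings differ). In a JSON object for a fixed two-field schema, everything except the field values is dictated by the grammar and never needs a model call:

```python
example = '{"name": "Alice", "age": 25}'

# Only the two values vary; the keys, quotes, and punctuation are grammar-forced.
model_chosen = len("Alice") + len("25")
grammar_forced = len(example) - model_chosen

print(f"{grammar_forced}/{len(example)} characters are grammar-determined")
```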
Resources
- Documentation: https://outlines-dev.github.io/outlines
- GitHub: https://github.com/outlines-dev/outlines (8k+ stars)
- Discord: https://discord.gg/R9DSu34mGd
- Blog: https://blog.dottxt.co
See Also
- references/json_generation.md: comprehensive JSON and Pydantic patterns
- references/backends.md: backend-specific configuration
- references/examples.md: production-ready examples