Context Engine Plugins
Building a Context Engine Plugin
Context engine plugins replace the built-in ContextCompressor with an alternative strategy for managing conversation context. For example, a Lossless Context Management (LCM) engine could build a knowledge DAG instead of relying on lossy summarization.
How it works
The agent’s context management is built on the ContextEngine ABC (agent/context_engine.py). The built-in ContextCompressor is the default implementation. Plugin engines must implement the same interface.
Only one context engine can be active at a time. Selection is config-driven:
    context:
      engine: "compressor"    # default: the built-in engine
      # engine: "lcm"         # alternative: activates a plugin engine named "lcm"

Plugin engines are never auto-activated: the user must explicitly set context.engine to the plugin’s name.
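The selection rule can be sketched as a small lookup. This is an illustration only: `resolve_engine_name` and `available_plugins` are hypothetical names, not part of the agent's API, and falling back to the built-in engine for an unknown name is an assumption about reasonable behaviour.

```python
DEFAULT_ENGINE = "compressor"  # name of the built-in engine, per the config above


def resolve_engine_name(config: dict, available_plugins: set) -> str:
    """Pick the engine name to activate (hypothetical helper)."""
    context_cfg = config.get("context") or {}
    requested = context_cfg.get("engine", DEFAULT_ENGINE)
    if requested == DEFAULT_ENGINE:
        return DEFAULT_ENGINE
    if requested in available_plugins:
        return requested
    # Assumption: an unknown name falls back to the built-in engine.
    return DEFAULT_ENGINE


# An installed plugin stays inactive until the user names it explicitly.
print(resolve_engine_name({}, {"lcm"}))                               # compressor
print(resolve_engine_name({"context": {"engine": "lcm"}}, {"lcm"}))  # lcm
```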
Directory structure
Each context engine lives in plugins/context_engine/<name>/:

    plugins/context_engine/lcm/
    ├── __init__.py      # exports the ContextEngine subclass
    ├── plugin.yaml      # metadata (name, description, version)
    └── ...              # any other modules your engine needs

The ContextEngine ABC
Your engine must implement these required methods:
    from agent.context_engine import ContextEngine


    class LCMEngine(ContextEngine):

        @property
        def name(self) -> str:
            """Short identifier, e.g. 'lcm'. Must match config.yaml value."""
            return "lcm"

        def update_from_response(self, usage: dict) -> None:
            """Called after every LLM call with the usage dict.

            Update self.last_prompt_tokens, self.last_completion_tokens,
            self.last_total_tokens from the response.
            """

        def should_compress(self, prompt_tokens: int = None) -> bool:
            """Return True if compaction should fire this turn."""

        def compress(self, messages: list, current_tokens: int = None,
                     focus_topic: str = None) -> list:
            """Compact the message list and return a new (possibly shorter) list.

            The returned list must be a valid OpenAI-format message sequence.

            ``focus_topic`` is an optional topic string from manual
            ``/compress <focus>``; engines that support guided compression
            should prioritise preserving information related to it, others
            may ignore it.
            """

Class attributes your engine must maintain
The agent reads these directly for display and logging:

    last_prompt_tokens: int = 0
    last_completion_tokens: int = 0
    last_total_tokens: int = 0
    threshold_tokens: int = 0        # when compression triggers
    context_length: int = 0          # model's full context window
    compression_count: int = 0       # how many times compress() has run

Optional methods
These have sensible defaults in the ABC. Override as needed:
| Method | Default | Override when |
|---|---|---|
| on_session_start(session_id, **kwargs) | No-op | You need to load persisted state (DAG, DB) |
| on_session_end(session_id, messages) | No-op | You need to flush state, close connections |
| on_session_reset() | Resets token counters | You have per-session state to clear |
| update_model(model, context_length, ...) | Updates context_length + threshold | You need to recalculate budgets on model switch |
| get_tool_schemas() | Returns [] | Your engine provides agent-callable tools (e.g., lcm_grep) |
| handle_tool_call(name, args, **kwargs) | Returns error JSON | You implement tool handlers |
| should_compress_preflight(messages) | Returns False | You can do a cheap pre-API-call estimate |
| get_status() | Standard token/threshold dict | You have custom metrics to expose |
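To show how the required methods, the maintained attributes, and one optional hook fit together, here is a minimal sketch of a complete engine. It makes several assumptions. The `ContextEngine` class below is a stand-in so the example runs on its own (the real ABC lives in agent/context_engine.py and has more defaults). `TailEngine`, `keep_last`, the 0.8 threshold ratio, and the characters-per-token heuristic are all hypothetical. The usage dict is assumed to carry OpenAI-style keys.

```python
from abc import ABC, abstractmethod


class ContextEngine(ABC):
    """Stand-in for agent.context_engine.ContextEngine (assumption)."""

    last_prompt_tokens: int = 0
    last_completion_tokens: int = 0
    last_total_tokens: int = 0
    threshold_tokens: int = 0
    context_length: int = 0
    compression_count: int = 0

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def update_from_response(self, usage: dict) -> None: ...

    @abstractmethod
    def should_compress(self, prompt_tokens: int = None) -> bool: ...

    @abstractmethod
    def compress(self, messages: list, current_tokens: int = None,
                 focus_topic: str = None) -> list: ...


class TailEngine(ContextEngine):
    """Hypothetical engine: keeps system messages plus the last few messages."""

    def __init__(self, context_length: int, keep_last: int = 4):
        self.context_length = context_length
        self.threshold_tokens = int(context_length * 0.8)  # assumed ratio
        self.keep_last = max(1, keep_last)

    @property
    def name(self) -> str:
        return "tail"

    def update_from_response(self, usage: dict) -> None:
        # Assumption: the usage dict uses OpenAI-style key names.
        self.last_prompt_tokens = usage.get("prompt_tokens", 0)
        self.last_completion_tokens = usage.get("completion_tokens", 0)
        self.last_total_tokens = usage.get("total_tokens", 0)

    def should_compress(self, prompt_tokens: int = None) -> bool:
        tokens = self.last_prompt_tokens if prompt_tokens is None else prompt_tokens
        return tokens >= self.threshold_tokens

    def should_compress_preflight(self, messages: list) -> bool:
        # Rough estimate of about 4 characters per token (a heuristic, not a tokenizer).
        estimate = sum(len(str(m.get("content") or "")) for m in messages) // 4
        return estimate >= self.threshold_tokens

    def compress(self, messages, current_tokens=None, focus_topic=None):
        system = [m for m in messages if m["role"] == "system"]
        rest = [m for m in messages if m["role"] != "system"]
        tail = rest[-self.keep_last:]
        # Drop orphaned tool results so the sequence stays valid.
        while tail and tail[0]["role"] == "tool":
            tail = tail[1:]
        self.compression_count += 1
        return system + tail


engine = TailEngine(context_length=1000, keep_last=2)
engine.update_from_response(
    {"prompt_tokens": 900, "completion_tokens": 50, "total_tokens": 950}
)
print(engine.should_compress())  # True: 900 >= 800
```

Dropping old messages is lossy and only serves to keep the sketch short; a real engine would persist or summarize what it removes.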
Engine tools
Context engines can expose tools the agent calls directly. Return schemas from get_tool_schemas() and handle calls in handle_tool_call():
    import json

    # Both functions below are methods of your ContextEngine subclass.

    def get_tool_schemas(self):
        return [{
            "name": "lcm_grep",
            "description": "Search the context knowledge graph",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"],
            },
        }]

    def handle_tool_call(self, name, args, **kwargs):
        if name == "lcm_grep":
            results = self._search_dag(args["query"])
            return json.dumps({"results": results})
        return json.dumps({"error": f"Unknown tool: {name}"})

Engine tools are injected into the agent’s tool list at startup and dispatched automatically; no registry registration is needed.
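The dispatch step can be pictured as a name lookup against the schemas the engine declares. The sketch below is not the agent's dispatcher: `GrepEngine`, `dispatch_tool`, and the `fallback` callable are hypothetical, and the search runs over an in-memory list rather than a real knowledge graph.

```python
import json


class GrepEngine:
    """Hypothetical engine exposing one tool over an in-memory note list."""

    def __init__(self, notes):
        self._notes = list(notes)

    def get_tool_schemas(self):
        return [{
            "name": "lcm_grep",
            "description": "Search stored notes",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        }]

    def handle_tool_call(self, name, args, **kwargs):
        if name == "lcm_grep":
            query = args["query"].lower()
            hits = [note for note in self._notes if query in note.lower()]
            return json.dumps({"results": hits})
        return json.dumps({"error": f"Unknown tool: {name}"})


def dispatch_tool(engine, name, args, fallback):
    """Route a call to the engine if it declared the tool, else to fallback."""
    engine_tools = {schema["name"] for schema in engine.get_tool_schemas()}
    if name in engine_tools:
        return engine.handle_tool_call(name, args)
    return fallback(name, args)


grep_engine = GrepEngine(["Deploy on Friday", "Rotate API keys"])
print(dispatch_tool(grep_engine, "lcm_grep", {"query": "deploy"}, lambda n, a: "{}"))
```

Returning a JSON string from every branch, including the error branch, keeps the handler's return type uniform for whatever consumes the result.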
Registration
Via directory (recommended)
Place your engine in plugins/context_engine/<name>/. The __init__.py must export a ContextEngine subclass. The discovery system finds and instantiates it automatically.
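One way such discovery could work is to scan the package namespace for a concrete subclass of the ABC. This is a guess at the mechanism, not the agent's actual loader: `find_engine_class` is a hypothetical helper, `ContextEngine` is a local stand-in for the real ABC, and a `types.ModuleType` object stands in for an imported plugin package.

```python
import inspect
import types
from abc import ABC


class ContextEngine(ABC):
    """Stand-in for agent.context_engine.ContextEngine (assumption)."""


def find_engine_class(module, base=ContextEngine):
    """Return the first concrete subclass of base exported by module, or None."""
    for _, obj in inspect.getmembers(module, inspect.isclass):
        if issubclass(obj, base) and obj is not base and not inspect.isabstract(obj):
            return obj
    return None


class LCMEngine(ContextEngine):
    """Hypothetical engine class exported by the plugin package."""


# Simulate plugins/context_engine/lcm/__init__.py without touching the disk.
fake_package = types.ModuleType("plugins.context_engine.lcm")
fake_package.ContextEngine = ContextEngine  # re-exported base is skipped
fake_package.LCMEngine = LCMEngine

print(find_engine_class(fake_package).__name__)  # LCMEngine
```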
Via general plugin system
A general plugin can also register a context engine:
    def register(ctx):
        engine = LCMEngine(context_length=200000)
        ctx.register_context_engine(engine)

Only one engine can be registered. If a second plugin attempts to register one, the attempt is rejected with a warning.
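The first-wins rule can be sketched as follows. `PluginContext` here is a hypothetical stand-in for the `ctx` object, and the boolean return value is an assumption added to make the outcome observable; the source only states that the second registration is rejected with a warning.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("plugins")


class PluginContext:
    """Hypothetical stand-in for the ctx object passed to register()."""

    def __init__(self):
        self.context_engine = None

    def register_context_engine(self, engine) -> bool:
        if self.context_engine is not None:
            logger.warning(
                "Context engine %r already registered; ignoring %r",
                self.context_engine.name, engine.name,
            )
            return False  # assumption: rejection is signalled to the caller
        self.context_engine = engine
        return True


ctx = PluginContext()
ctx.register_context_engine(SimpleNamespace(name="lcm"))    # accepted
ctx.register_context_engine(SimpleNamespace(name="other"))  # rejected with a warning
print(ctx.context_engine.name)  # lcm
```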
Lifecycle
1. Engine instantiated (plugin load or directory discovery)
2. on_session_start(): conversation begins
3. update_from_response(): after each API call
4. should_compress(): checked each turn
5. compress(): called when should_compress() returns True
6. on_session_end(): session boundary (CLI exit, /reset, gateway expiry)

on_session_reset() is called on /new or /reset to clear per-session state without a full shutdown.
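The ordering above can be traced with a small driver. Both `RecordingEngine` and `run_session` are hypothetical: the engine only logs which hooks fire, and the loop is a simplified stand-in for the agent's turn loop, not a copy of it.

```python
class RecordingEngine:
    """Hypothetical engine that logs each hook call (not a real ContextEngine)."""

    def __init__(self, threshold_tokens):
        self.threshold_tokens = threshold_tokens
        self.last_prompt_tokens = 0
        self.calls = []

    def on_session_start(self, session_id, **kwargs):
        self.calls.append("on_session_start")

    def update_from_response(self, usage):
        self.last_prompt_tokens = usage["prompt_tokens"]
        self.calls.append("update_from_response")

    def should_compress(self, prompt_tokens=None):
        self.calls.append("should_compress")
        return self.last_prompt_tokens >= self.threshold_tokens

    def compress(self, messages, current_tokens=None, focus_topic=None):
        self.calls.append("compress")
        return messages[-2:]  # keep only the latest exchange

    def on_session_end(self, session_id, messages):
        self.calls.append("on_session_end")


def run_session(engine, session_id, usages):
    """Drive the hooks in the documented order, one usage dict per turn."""
    messages = []
    engine.on_session_start(session_id)
    for turn, usage in enumerate(usages):
        messages.append({"role": "user", "content": f"turn {turn}"})
        messages.append({"role": "assistant", "content": f"reply {turn}"})
        engine.update_from_response(usage)
        if engine.should_compress():
            messages = engine.compress(messages)
    engine.on_session_end(session_id, messages)
    return messages


recorder = RecordingEngine(threshold_tokens=100)
run_session(recorder, "s1", [{"prompt_tokens": 10}, {"prompt_tokens": 150}])
print(recorder.calls)
```

In this run compress() fires only on the second turn, once the reported prompt tokens reach the threshold.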
Configuration
Users select your engine via hermes plugins → Provider Plugins → Context Engine, or by editing config.yaml:

    context:
      engine: "lcm"   # must match your engine's name property

The compression config block (compression.threshold, compression.protect_last_n, etc.) is specific to the built-in ContextCompressor. Your engine should define its own config format if needed, reading from config.yaml during initialization.
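A sketch of an engine-specific config reader follows. Everything about the format is an assumption made for illustration: the `context.lcm` section, the `threshold_ratio` and `db_path` keys, and the `LCMConfig` class are hypothetical, and a plain dict stands in for the parsed config.yaml.

```python
from dataclasses import dataclass


@dataclass
class LCMConfig:
    """Hypothetical settings for an 'lcm' engine; the agent defines none of these keys."""

    threshold_ratio: float = 0.75
    db_path: str = "lcm.sqlite"

    @classmethod
    def from_config(cls, config: dict) -> "LCMConfig":
        # Assumption: engine settings live under context.lcm in config.yaml.
        section = (config.get("context") or {}).get("lcm") or {}
        # Ignore unknown keys so a typo does not crash engine start-up.
        known = {k: v for k, v in section.items() if k in cls.__dataclass_fields__}
        return cls(**known)


parsed = {"context": {"engine": "lcm", "lcm": {"threshold_ratio": 0.5}}}
print(LCMConfig.from_config(parsed))
```

Falling back to defaults for missing keys lets the engine start with only context.engine set.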
Testing
    from agent.context_engine import ContextEngine

    # YourEngine is a placeholder: import your own ContextEngine subclass here.


    def test_engine_satisfies_abc():
        engine = YourEngine(context_length=200000)
        assert isinstance(engine, ContextEngine)
        assert engine.name == "your-name"


    def test_compress_returns_valid_messages():
        engine = YourEngine(context_length=200000)
        msgs = [{"role": "user", "content": "hello"}]
        result = engine.compress(msgs)
        assert isinstance(result, list)
        assert all("role" in m for m in result)

See tests/agent/test_context_engine.py for the full ABC contract test suite.
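Because compress() must return a valid OpenAI-format sequence, a test can go further than checking for a role key. The helper below is a hypothetical sketch covering only two checks: each role is one of a small assumed set, and every tool message answers a tool call from the assistant message before it. It is not a full validator.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}  # assumed set for this sketch


def check_message_sequence(messages):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    open_call_ids = set()
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
            continue
        if role == "assistant":
            # Remember which tool calls still need a result.
            open_call_ids = {call["id"] for call in msg.get("tool_calls") or []}
        elif role == "tool":
            call_id = msg.get("tool_call_id")
            if call_id not in open_call_ids:
                problems.append(f"message {i}: tool result without matching call")
            else:
                open_call_ids.discard(call_id)
        else:
            open_call_ids = set()
    return problems


orphaned = [{"role": "tool", "tool_call_id": "c1", "content": "{}"}]
print(check_message_sequence(orphaned))
```

An engine that trims history is most likely to fail the second check, by cutting between an assistant tool call and its result.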
See also
- Context Compression and Caching: how the built-in compressor works
- Memory Provider Plugins: analogous single-select plugin system for memory
- Plugins: general plugin system overview