Qmd

Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking. Supports CLI and MCP integration.

Source: Optional (install with hermes skills install official/research/qmd)
Path: optional-skills/research/qmd
Version: 1.0.0
Author: Hermes Agent + Teknium
License: MIT
Platforms: macos, linux
Tags: Search, Knowledge-Base, RAG, Notes, MCP, Local-AI
Related skills: obsidian, native-mcp, arxiv

The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.

Local, on-device search engine for personal knowledge bases. Indexes markdown notes, meeting transcripts, documentation, and any text-based files, then provides hybrid search combining keyword matching, semantic understanding, and LLM-powered reranking — all running locally with no cloud dependencies.

Created by Tobi Lütke. MIT licensed.

Use this skill when:

  • User asks to search their notes, docs, knowledge base, or meeting transcripts
  • User wants to find something across a large collection of markdown/text files
  • User wants semantic search (“find notes about X concept”) not just keyword grep
  • User has already set up qmd collections and wants to query them
  • User asks to set up a local knowledge base or document search system
  • Keywords: “search my notes”, “find in my docs”, “knowledge base”, “qmd”

qmd requires Node.js 22 or newer:

# Check version
node --version # must be >= 22
# macOS — install or upgrade via Homebrew
brew install node@22
# Linux — use NodeSource or nvm
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt-get install -y nodejs
# or with nvm:
nvm install 22 && nvm use 22
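
The version requirement can also be checked from a script. This is a minimal Python sketch, not part of qmd. The helper names node_major and node_is_supported are hypothetical, and it assumes node --version prints a string of the form v22.3.0.

```python
import re


def node_major(version_output):
    """Extract the major version from `node --version` output such as 'v22.3.0'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version string: {version_output!r}")
    return int(match.group(1))


def node_is_supported(version_output, minimum=22):
    """Return True when the reported major version meets the minimum (22 per the text above)."""
    return node_major(version_output) >= minimum
```

Feed it the captured output of node --version, for example node_is_supported("v18.19.1") reports that an upgrade is needed.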

SQLite with Extension Support (macOS only)

macOS system SQLite lacks extension loading. Install via Homebrew:

brew install sqlite

Install qmd globally with npm:

npm install -g @tobilu/qmd
# or with Bun:
bun install -g @tobilu/qmd

First run auto-downloads 3 local GGUF models (~2GB total):

| Model | Purpose | Size |
|---|---|---|
| embeddinggemma-300M-Q8_0 | Vector embeddings | ~300MB |
| qwen3-reranker-0.6b-q8_0 | Result reranking | ~640MB |
| qmd-query-expansion-1.7B | Query expansion | ~1.1GB |

Verify the installation:

qmd --version
qmd status

Quick reference:

| Command | What It Does | Speed |
|---|---|---|
| qmd search "query" | BM25 keyword search (no models) | ~0.2s |
| qmd vsearch "query" | Semantic vector search (1 model) | ~3s |
| qmd query "query" | Hybrid + reranking (all 3 models) | ~2-3s warm, ~19s cold |
| qmd get <docid> | Retrieve full document content | instant |
| qmd multi-get "glob" | Retrieve multiple files | instant |
| qmd collection add <path> --name <n> | Add a directory as a collection | instant |
| qmd context add <path> "description" | Add context metadata to improve retrieval | instant |
| qmd embed | Generate/update vector embeddings | varies |
| qmd status | Show index health and collection info | instant |
| qmd mcp | Start MCP server (stdio) | persistent |
| qmd mcp --http --daemon | Start MCP server (HTTP, warm models) | persistent |

Point qmd at directories containing your documents:

# Add a notes directory
qmd collection add ~/notes --name notes
# Add project docs
qmd collection add ~/projects/myproject/docs --name project-docs
# Add meeting transcripts
qmd collection add ~/meetings --name meetings
# List all collections
qmd collection list

Context metadata helps the search engine understand what each collection contains. This significantly improves retrieval quality:

qmd context add qmd://notes "Personal notes, ideas, and journal entries"
qmd context add qmd://project-docs "Technical documentation for the main project"
qmd context add qmd://meetings "Meeting transcripts and action items from team syncs"

Generate embeddings:

qmd embed

This processes all documents in all collections and generates vector embeddings. Re-run after adding new documents or collections.
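
The setup steps above (add collections, attach context, embed) can be scripted. The following Python sketch only builds the command lines using the subcommands and flags shown above. The function name setup_commands and the tuple layout are hypothetical choices for illustration.

```python
import os


def setup_commands(collections):
    """Build qmd setup commands for a list of (path, name, description) tuples.

    Returns argv lists suitable for subprocess.run(argv, check=True).
    """
    commands = []
    for path, name, description in collections:
        # Expand "~" here because no shell is involved when argv lists are executed.
        full_path = os.path.expanduser(path)
        commands.append(["qmd", "collection", "add", full_path, "--name", name])
        commands.append(["qmd", "context", "add", f"qmd://{name}", description])
    # One embedding pass at the end covers every collection added above.
    commands.append(["qmd", "embed"])
    return commands
```

Running each returned list in order reproduces the manual workflow. Keeping qmd embed last avoids re-embedding after every collection.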

qmd status # shows index health, collection stats, model info

Keyword search (qmd search) is best for: exact terms, code identifiers, names, known phrases. No models are loaded, so results are near-instant.

qmd search "authentication middleware"
qmd search "handleError async"

Semantic search (qmd vsearch) is best for: natural language questions, conceptual queries. Loads the embedding model (~3s on the first query).

qmd vsearch "how does the rate limiter handle burst traffic"
qmd vsearch "ideas for improving onboarding flow"

Hybrid Search with Reranking (Best Quality)

Best for: important queries where quality matters most. Uses all 3 models — query expansion, parallel BM25+vector, reranking.

qmd query "what decisions were made about the database migration"

Combine different search types in a single query for precision:

# BM25 for exact term + vector for concept
qmd query $'lex: rate limiter\nvec: how does throttling work under load'
# With query expansion
qmd query $'expand: database migration plan\nlex: "schema change"'

Keyword term syntax:

| Syntax | Effect | Example |
|---|---|---|
| term | Prefix match | perf matches “performance” |
| "phrase" | Exact phrase | "rate limiter" |
| -term | Exclude term | performance -sports |

For complex topics, use the hyde: prefix and write what you expect the answer to look like:

qmd query $'hyde: The migration plan involves three phases. First, we add the new columns without dropping the old ones. Then we backfill data. Finally we cut over and remove legacy columns.'
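
When qmd is driven from a script rather than an interactive shell, the multi-line structured query can be assembled as a plain string and passed as a single argument, which sidesteps the $'...' quoting. This is a small Python sketch. The function names are hypothetical, and the prefix list contains only the four prefixes that appear in the examples above.

```python
ALLOWED_PREFIXES = ("lex", "vec", "expand", "hyde")  # prefixes shown in the examples above


def structured_query(parts):
    """Join (prefix, text) pairs into a newline-separated structured query string."""
    lines = []
    for prefix, text in parts:
        if prefix not in ALLOWED_PREFIXES:
            raise ValueError(f"unknown prefix: {prefix!r}")
        if "\n" in text:
            raise ValueError("each sub-query must fit on a single line")
        lines.append(f"{prefix}: {text}")
    return "\n".join(lines)


def query_argv(parts):
    """Build an argv list for `qmd query`; the whole query is one argument."""
    return ["qmd", "query", structured_query(parts)]
```

For example, query_argv([("lex", "rate limiter"), ("vec", "how does throttling work under load")]) yields the same query as the first shell example, ready for subprocess.run.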

Restrict a search to one collection:

qmd search "query" --collection notes
qmd query "query" --collection project-docs

Output and retrieval options:

qmd search "query" --json # JSON output (best for parsing)
qmd search "query" --limit 5 # Limit results
qmd get "#abc123" # Get by document ID
qmd get "path/to/file.md" # Get by file path
qmd get "file.md:50" -l 100 # Start at line 50, return up to 100 lines
qmd multi-get "journals/*.md" --json # Batch retrieve by glob
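
For programmatic use, the --json output can be captured and decoded. The sketch below is illustrative Python, not an official client. The names search_argv and run_search are hypothetical, and it assumes --json, --collection, and --limit are accepted by all three search commands (the examples above show them on search and query only). The shape of the decoded JSON is deliberately not assumed.

```python
import json
import subprocess

SEARCH_MODES = ("search", "vsearch", "query")


def search_argv(mode, query, collection=None, limit=None):
    """Build an argv list for a qmd search command with JSON output."""
    if mode not in SEARCH_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    argv = ["qmd", mode, query, "--json"]
    if collection is not None:
        argv += ["--collection", collection]
    if limit is not None:
        argv += ["--limit", str(limit)]
    return argv


def run_search(mode, query, **options):
    """Run qmd and return the decoded JSON; inspect the result before relying on its fields."""
    done = subprocess.run(
        search_argv(mode, query, **options),
        capture_output=True, text=True, check=True, timeout=60,
    )
    return json.loads(done.stdout)
```

The 60-second timeout leaves room for the ~19s cold start of hybrid search.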

qmd exposes an MCP server that provides search tools directly to Hermes Agent via the native MCP client. This is the preferred integration — once configured, the agent gets qmd tools automatically without needing to load this skill.

Option A: stdio mode. Add to ~/.hermes/config.yaml:

mcp_servers:
  qmd:
    command: "qmd"
    args: ["mcp"]
    timeout: 30
    connect_timeout: 45

This registers tools: mcp_qmd_search, mcp_qmd_vsearch, mcp_qmd_deep_search, mcp_qmd_get, mcp_qmd_status.

Tradeoff: Models load on first search call (~19s cold start), then stay warm for the session. Acceptable for occasional use.

Option B: HTTP Daemon Mode (Fast, Recommended for Heavy Use)

Start the qmd daemon separately — it keeps models warm in memory:

# Start daemon (persists across agent restarts)
qmd mcp --http --daemon
# Runs on http://localhost:8181 by default

Then configure Hermes Agent to connect via HTTP:

mcp_servers:
  qmd:
    url: "http://localhost:8181/mcp"
    timeout: 30

Tradeoff: Uses ~2GB RAM while running, but every query is fast (~2-3s). Best for users who search frequently.
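
Before pointing the agent at the HTTP endpoint, it can help to confirm that something is listening on the daemon port. The following Python sketch performs only a TCP connect. It does not assume any health-check route on the qmd server, and the function name daemon_is_listening is hypothetical. The default port 8181 comes from the text above.

```python
import socket


def daemon_is_listening(host="localhost", port=8181, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

A True result only shows that the port is open, not that the process behind it is qmd, so treat it as a quick sanity check.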

To keep the daemon running on macOS, register a launchd agent:

cat > ~/Library/LaunchAgents/com.qmd.daemon.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.qmd.daemon</string>
  <key>ProgramArguments</key>
  <array>
    <string>qmd</string>
    <string>mcp</string>
    <string>--http</string>
    <string>--daemon</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/qmd-daemon.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/qmd-daemon.log</string>
</dict>
</plist>
EOF
launchctl load ~/Library/LaunchAgents/com.qmd.daemon.plist

On Linux, use a systemd user service:

mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/qmd-daemon.service << 'EOF'
[Unit]
Description=QMD MCP Daemon
After=network.target
[Service]
ExecStart=qmd mcp --http --daemon
Restart=on-failure
RestartSec=10
Environment=PATH=/usr/local/bin:/usr/bin:/bin
[Install]
WantedBy=default.target
EOF
systemctl --user daemon-reload
systemctl --user enable --now qmd-daemon
systemctl --user status qmd-daemon

Once connected, these tools are available as mcp_qmd_*:

| MCP Tool | Maps To | Description |
|---|---|---|
| mcp_qmd_search | qmd search | BM25 keyword search |
| mcp_qmd_vsearch | qmd vsearch | Semantic vector search |
| mcp_qmd_deep_search | qmd query | Hybrid search + reranking |
| mcp_qmd_get | qmd get | Retrieve document by ID or path |
| mcp_qmd_status | qmd status | Index health and stats |

The MCP tools accept structured JSON queries for multi-mode search:

{
  "searches": [
    {"type": "lex", "query": "authentication middleware"},
    {"type": "vec", "query": "how user login is verified"}
  ],
  "collections": ["project-docs"],
  "limit": 10
}
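
A payload of this shape can be built programmatically before it is handed to an MCP tool call. This Python sketch mirrors the field names in the JSON above (searches, type, query, collections, limit). The function name multi_mode_payload is hypothetical, and which tool accepts the payload is left to the MCP client.

```python
import json


def multi_mode_payload(searches, collections=None, limit=10):
    """Build a multi-mode search payload from (type, query) pairs."""
    payload = {"searches": [{"type": kind, "query": text} for kind, text in searches]}
    if collections:
        payload["collections"] = list(collections)
    payload["limit"] = limit
    return payload


example = multi_mode_payload(
    [("lex", "authentication middleware"), ("vec", "how user login is verified")],
    collections=["project-docs"],
)
print(json.dumps(example, indent=2))
```

Listing the most certain sub-query first follows the weighting note in the best practices.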

When MCP is not configured, use qmd directly via terminal:

terminal(command="qmd query 'what was decided about the API redesign' --json", timeout=30)

For setup and management tasks, always use terminal:

terminal(command="qmd collection add ~/Documents/notes --name notes")
terminal(command="qmd context add qmd://notes 'Personal research notes and ideas'")
terminal(command="qmd embed")
terminal(command="qmd status")

Understanding the internals helps choose the right search mode:

  1. Query Expansion — A fine-tuned 1.7B model generates 2 alternative queries. The original gets 2x weight in fusion.
  2. Parallel Retrieval — BM25 (SQLite FTS5) and vector search run simultaneously across all query variants.
  3. RRF Fusion — Reciprocal Rank Fusion (k=60) merges results. Top-rank bonus: #1 gets +0.05, #2-3 get +0.02.
  4. LLM Reranking — qwen3-reranker scores top 30 candidates (0.0-1.0).
  5. Position-Aware Blending — Ranks 1-3: 75% retrieval / 25% reranker. Ranks 4-10: 60/40. Ranks 11+: 40/60 (trusts reranker more for long tail).
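
The fusion and blending arithmetic in steps 3 and 5 can be illustrated with a short Python sketch. It is a reading of the numbers quoted above, not qmd's actual code. Two points are assumptions: the top-rank bonus is keyed on a document's best rank in any single list, and the blend takes a retrieval score already normalized to 0.0-1.0. The names rrf_fuse and blend are hypothetical.

```python
from collections import defaultdict

RRF_K = 60  # fusion constant quoted in step 3


def rrf_fuse(ranked_lists, weights):
    """Weighted Reciprocal Rank Fusion over lists of document ids (best first)."""
    scores = defaultdict(float)
    best_rank = {}
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            scores[doc] += weight / (RRF_K + rank)
            best_rank[doc] = min(rank, best_rank.get(doc, rank))
    # Assumption: bonus depends on the best rank a document reached in any list.
    for doc, rank in best_rank.items():
        if rank == 1:
            scores[doc] += 0.05
        elif rank <= 3:
            scores[doc] += 0.02
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def blend(position, retrieval_score, rerank_score):
    """Mix retrieval and reranker scores using the position bands from step 5."""
    if position <= 3:
        retrieval_weight = 0.75
    elif position <= 10:
        retrieval_weight = 0.60
    else:
        retrieval_weight = 0.40
    return retrieval_weight * retrieval_score + (1 - retrieval_weight) * rerank_score


# The original query's list gets weight 2.0; an expanded variant gets 1.0.
fused = rrf_fuse([["a", "b", "c"], ["b", "a", "d"]], weights=[2.0, 1.0])
```

In this example "a" edges out "b" because it ranks first in the double-weighted original list, which is the practical effect of the 2x weight.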

Smart Chunking: Documents are split at natural break points (headings, code blocks, blank lines) targeting ~900 tokens with 15% overlap. Code blocks are never split mid-block.
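
The chunking idea can be sketched in Python as well. This is a simplified illustration, not qmd's implementation: it breaks only at blank lines (not headings), counts whitespace-separated words as a stand-in for tokens, and implements overlap by carrying whole trailing blocks into the next chunk. All names are hypothetical.

```python
FENCE = "`" * 3  # code-fence marker, built this way so the example stays fence-safe


def count_tokens(text):
    # Assumption: whitespace-separated words approximate model tokens.
    return len(text.split())


def split_blocks(text):
    """Split at blank lines, keeping fenced code blocks in one piece."""
    blocks, current, in_fence = [], [], False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
        if not line.strip() and not in_fence:
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk(text, target_tokens=900, overlap=0.15):
    """Pack blocks into chunks of roughly target_tokens with trailing overlap."""
    chunks, current = [], []
    for block in split_blocks(text):
        used = sum(count_tokens(b) for b in current)
        if current and used + count_tokens(block) > target_tokens:
            chunks.append("\n\n".join(current))
            # Carry trailing blocks worth up to `overlap` of the target forward.
            carried, budget = [], target_tokens * overlap
            for previous in reversed(current):
                if count_tokens(previous) > budget:
                    break
                carried.insert(0, previous)
                budget -= count_tokens(previous)
            current = carried
        current.append(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A block larger than the target becomes its own oversized chunk here, which is how this sketch honors the rule that code blocks are never split.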

  1. Always add context descriptions: qmd context add dramatically improves retrieval accuracy. Describe what each collection contains.
  2. Re-embed after adding documents: qmd embed must be re-run when new files are added to collections.
  3. Use qmd search for speed — when you need fast keyword lookup (code identifiers, exact names), BM25 is instant and needs no models.
  4. Use qmd query for quality — when the question is conceptual or the user needs the best possible results, use hybrid search.
  5. Prefer MCP integration — once configured, the agent gets native tools without needing to load this skill each time.
  6. Daemon mode for frequent users — if the user searches their knowledge base regularly, recommend the HTTP daemon setup.
  7. First query in structured search gets 2x weight — put the most important/certain query first when combining lex and vec.
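
The mode guidance in items 3 and 4 can be turned into a rough routing heuristic. The rules in this Python sketch (camelCase or underscore detection, question words, a word-count threshold) are illustrative assumptions layered on the advice above, not behavior defined by qmd. The name pick_mode is hypothetical.

```python
import re

QUESTION_WORDS = ("how", "what", "why", "when", "where", "which", "who")


def pick_mode(query, need_best_quality=False):
    """Return 'search', 'vsearch', or 'query' for a user request."""
    if need_best_quality:
        return "query"  # hybrid search with reranking
    text = query.strip()
    quoted = text.startswith('"') and text.endswith('"')
    if quoted or re.search(r"[a-z][A-Z]|_", text):
        return "search"  # exact phrases and code identifiers suit BM25
    words = text.lower().split()
    if text.endswith("?") or (words and words[0] in QUESTION_WORDS) or len(words) > 4:
        return "vsearch"  # natural-language or conceptual phrasing
    return "search"
```

Defaulting to search keeps the common case fast. Callers can pass need_best_quality=True whenever the answer matters more than latency.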

A slow first run is normal: qmd auto-downloads ~2GB of GGUF models on first use. This is a one-time operation.

Slow cold-start queries (~19s) happen when models aren’t loaded in memory. Solutions:

  • Use HTTP daemon mode (qmd mcp --http --daemon) to keep warm
  • Use qmd search (BM25 only) when models aren’t needed
  • MCP stdio mode loads models on first search, stays warm for session

If SQLite extension loading fails on macOS, install Homebrew SQLite with brew install sqlite, then ensure it’s on PATH before system SQLite.

If no collections are configured, run qmd collection add <path> --name <name> to add directories, then qmd embed to index them.

Embedding model override (CJK/multilingual)

Set QMD_EMBED_MODEL environment variable for non-English content:

export QMD_EMBED_MODEL="your-multilingual-model"
  • Index & vectors: ~/.cache/qmd/index.sqlite
  • Models: Auto-downloaded to local cache on first run
  • No cloud dependencies — everything runs locally