Qdrant Vector Search
High-performance vector similarity search engine for RAG and semantic search. Use when building production RAG systems requiring fast nearest neighbor search, hybrid search with filtering, or scalable vector storage with Rust-powered performance.
Skill metadata
| Field | Value |
|---|---|
| Source | Optional — install with `hermes skills install official/mlops/qdrant` |
| Path | `optional-skills/mlops/qdrant` |
| Version | 1.0.0 |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | `qdrant-client>=1.12.0` |
| Tags | RAG, Vector Search, Qdrant, Semantic Search, Embeddings, Similarity Search, HNSW, Production, Distributed |
Reference: full SKILL.md
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
Qdrant - Vector Similarity Search Engine
High-performance vector database written in Rust for production RAG and semantic search.
When to use Qdrant
Use Qdrant when:
- Building production RAG systems requiring low latency
- Need hybrid search (vectors + metadata filtering)
- Require horizontal scaling with sharding/replication
- Want on-premise deployment with full data control
- Need multi-vector storage per record (dense + sparse)
- Building real-time recommendation systems
Key features:
- Rust-powered: Memory-safe, high performance
- Rich filtering: Filter by any payload field during search
- Multiple vectors: Dense, sparse, multi-dense per point
- Quantization: Scalar, product, binary for memory efficiency
- Distributed: Raft consensus, sharding, replication
- REST + gRPC: Both APIs with full feature parity
Use alternatives instead:
- Chroma: Simpler setup, embedded use cases
- FAISS: Maximum raw speed, research/batch processing
- Pinecone: Fully managed, zero ops preferred
- Weaviate: GraphQL preference, built-in vectorizers
Quick start
Installation
```bash
# Python client
pip install qdrant-client

# Docker (recommended for development)
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

# Docker with persistent storage
docker run -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
```

Basic usage
```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Connect to Qdrant
client = QdrantClient(host="localhost", port=6333)

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# Insert vectors with payload
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=[0.1, 0.2, ...],  # 384-dim vector
            payload={"title": "Doc 1", "category": "tech"}
        ),
        PointStruct(
            id=2,
            vector=[0.3, 0.4, ...],
            payload={"title": "Doc 2", "category": "science"}
        )
    ]
)

# Search with filtering
results = client.search(
    collection_name="documents",
    query_vector=[0.15, 0.25, ...],
    query_filter={
        "must": [{"key": "category", "match": {"value": "tech"}}]
    },
    limit=10
)

for point in results:
    print(f"ID: {point.id}, Score: {point.score}, Payload: {point.payload}")
```

Core concepts
Points - Basic data unit
```python
from qdrant_client.models import PointStruct

# Point = ID + Vector(s) + Payload
point = PointStruct(
    id=123,                       # Integer or UUID string
    vector=[0.1, 0.2, 0.3, ...],  # Dense vector
    payload={                     # Arbitrary JSON metadata
        "title": "Document title",
        "category": "tech",
        "timestamp": 1699900000,
        "tags": ["python", "ml"]
    }
)

# Batch upsert (recommended)
client.upsert(
    collection_name="documents",
    points=[point1, point2, point3],
    wait=True  # Wait for indexing
)
```

Collections - Vector containers
```python
from qdrant_client.models import VectorParams, Distance, HnswConfigDiff

# Create with HNSW configuration
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=384,                  # Vector dimensions
        distance=Distance.COSINE   # COSINE, EUCLID, DOT, MANHATTAN
    ),
    hnsw_config=HnswConfigDiff(
        m=16,                      # Connections per node (default 16)
        ef_construct=100,          # Build-time accuracy (default 100)
        full_scan_threshold=10000  # Switch to brute force below this
    ),
    on_disk_payload=True           # Store payload on disk
)

# Collection info
info = client.get_collection("documents")
print(f"Points: {info.points_count}, Vectors: {info.vectors_count}")
```

Distance metrics
| Metric | Use Case | Range |
|---|---|---|
| COSINE | Text embeddings, normalized vectors | 0 to 2 |
| EUCLID | Spatial data, image features | 0 to ∞ |
| DOT | Recommendations, unnormalized | -∞ to ∞ |
| MANHATTAN | Sparse features, discrete data | 0 to ∞ |
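The metric is fixed when the collection is created, so choose it to match how your embeddings were produced. A minimal sketch of the DOT case (the `recs` collection name and the NumPy normalization helper are assumptions for illustration, not part of the skill): if you L2-normalize vectors yourself, dot-product scoring ranks results like cosine similarity while keeping raw inner products available for recommendation-style scores.

```python
import numpy as np
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(host="localhost", port=6333)

# Hypothetical collection scored by raw inner product (e.g., recommendations)
client.create_collection(
    collection_name="recs",
    vectors_config=VectorParams(size=384, distance=Distance.DOT),
)

def l2_normalize(vec: np.ndarray) -> list[float]:
    # With DOT, normalizing vectors yourself reproduces cosine-style ranking
    return (vec / np.linalg.norm(vec)).tolist()
```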
Search operations
Basic search
```python
# Simple nearest neighbor search
results = client.search(
    collection_name="documents",
    query_vector=[0.1, 0.2, ...],
    limit=10,
    with_payload=True,
    with_vectors=False  # Don't return vectors (faster)
)
```

Filtered search
```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

# Complex filtering
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="tech")),
            FieldCondition(key="timestamp", range=Range(gte=1699000000))
        ],
        must_not=[
            FieldCondition(key="status", match=MatchValue(value="archived"))
        ]
    ),
    limit=10
)

# Shorthand filter syntax
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter={
        "must": [
            {"key": "category", "match": {"value": "tech"}},
            {"key": "price", "range": {"gte": 10, "lte": 100}}
        ]
    },
    limit=10
)
```

Batch search
```python
from qdrant_client.models import SearchRequest

# Multiple queries in one request
results = client.search_batch(
    collection_name="documents",
    requests=[
        SearchRequest(vector=[0.1, ...], limit=5),
        SearchRequest(vector=[0.2, ...], limit=5, filter={"must": [...]}),
        SearchRequest(vector=[0.3, ...], limit=10)
    ]
)
```

RAG integration
With sentence-transformers
```python
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct

# Initialize
encoder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)

# Create collection
client.create_collection(
    collection_name="knowledge_base",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# Index documents
documents = [
    {"id": 1, "text": "Python is a programming language", "source": "wiki"},
    {"id": 2, "text": "Machine learning uses algorithms", "source": "textbook"},
]

points = [
    PointStruct(
        id=doc["id"],
        vector=encoder.encode(doc["text"]).tolist(),
        payload={"text": doc["text"], "source": doc["source"]}
    )
    for doc in documents
]
client.upsert(collection_name="knowledge_base", points=points)

# RAG retrieval
def retrieve(query: str, top_k: int = 5) -> list[dict]:
    query_vector = encoder.encode(query).tolist()
    results = client.search(
        collection_name="knowledge_base",
        query_vector=query_vector,
        limit=top_k
    )
    return [{"text": r.payload["text"], "score": r.score} for r in results]

# Use in RAG pipeline
context = retrieve("What is Python?")
prompt = f"Context: {context}\n\nQuestion: What is Python?"
```

With LangChain
```python
from langchain_community.vectorstores import Qdrant
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    url="http://localhost:6333",
    collection_name="docs"
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```

With LlamaIndex
```python
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import VectorStoreIndex, StorageContext

vector_store = QdrantVectorStore(client=client, collection_name="llama_docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine()
```

Multi-vector support
Named vectors (different embedding models)
```python
from qdrant_client.models import VectorParams, Distance

# Collection with multiple vector types
client.create_collection(
    collection_name="hybrid_search",
    vectors_config={
        "dense": VectorParams(size=384, distance=Distance.COSINE),
        "sparse": VectorParams(size=30000, distance=Distance.DOT)
    }
)

# Insert with named vectors
client.upsert(
    collection_name="hybrid_search",
    points=[
        PointStruct(
            id=1,
            vector={
                "dense": dense_embedding,
                "sparse": sparse_embedding
            },
            payload={"text": "document text"}
        )
    ]
)

# Search a specific vector
results = client.search(
    collection_name="hybrid_search",
    query_vector=("dense", query_dense),  # Specify which named vector to search
    limit=10
)
```
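To combine both named vectors in a single request, the Query API supports server-side fusion (assuming Qdrant server 1.10+ and the pinned `qdrant-client>=1.12.0`). A minimal sketch using reciprocal rank fusion; `query_sparse` is an assumed second query embedding alongside the `query_dense` used above, and the prefetch limits are illustrative:

```python
from qdrant_client.models import Prefetch, FusionQuery, Fusion

# Retrieve candidates from each named vector, then fuse rankings with RRF
results = client.query_points(
    collection_name="hybrid_search",
    prefetch=[
        Prefetch(query=query_dense, using="dense", limit=20),
        Prefetch(query=query_sparse, using="sparse", limit=20),
    ],
    query=FusionQuery(fusion=Fusion.RRF),
    limit=10,
)

for point in results.points:
    print(point.id, point.score)
```

Sparse vectors (BM25, SPLADE)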
```python
from qdrant_client.models import SparseVectorParams, SparseIndexParams, SparseVector

# Collection with sparse vectors
client.create_collection(
    collection_name="sparse_search",
    vectors_config={},
    sparse_vectors_config={
        "text": SparseVectorParams(index=SparseIndexParams(on_disk=False))
    }
)

# Insert sparse vector
client.upsert(
    collection_name="sparse_search",
    points=[
        PointStruct(
            id=1,
            vector={"text": SparseVector(indices=[1, 5, 100], values=[0.5, 0.8, 0.2])},
            payload={"text": "document"}
        )
    ]
)
```
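The block above only indexes sparse vectors; querying them goes through a named sparse vector. A minimal sketch (the indices and values are illustrative BM25/SPLADE-style weights):

```python
from qdrant_client.models import NamedSparseVector, SparseVector

# Search the "text" sparse vector defined above
results = client.search(
    collection_name="sparse_search",
    query_vector=NamedSparseVector(
        name="text",
        vector=SparseVector(indices=[1, 100], values=[0.4, 0.9]),
    ),
    limit=10,
)
```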
Quantization (memory optimization)

```python
from qdrant_client.models import ScalarQuantization, ScalarQuantizationConfig, ScalarType

# Scalar quantization (4x memory reduction)
client.create_collection(
    collection_name="quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            quantile=0.99,    # Clip outliers
            always_ram=True   # Keep quantized in RAM
        )
    )
)

# Search with rescoring
results = client.search(
    collection_name="quantized",
    query_vector=query,
    search_params={"quantization": {"rescore": True}},  # Rescore top results
    limit=10
)
```

Payload indexing
```python
from qdrant_client.models import PayloadSchemaType

# Create payload index for faster filtering
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)

client.create_payload_index(
    collection_name="documents",
    field_name="timestamp",
    field_schema=PayloadSchemaType.INTEGER
)

# Index types: KEYWORD, INTEGER, FLOAT, GEO, TEXT (full-text), BOOL
```
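For the TEXT (full-text) index type, the schema is a parameter object rather than an enum value. A minimal sketch, assuming a payload field named `text` (the tokenizer settings shown are illustrative):

```python
from qdrant_client.models import TextIndexParams, TokenizerType, Filter, FieldCondition, MatchText

# Full-text index on the "text" payload field
client.create_payload_index(
    collection_name="documents",
    field_name="text",
    field_schema=TextIndexParams(
        type="text",
        tokenizer=TokenizerType.WORD,
        min_token_len=2,
        max_token_len=20,
        lowercase=True,
    ),
)

# Full-text match usable inside a search filter
text_filter = Filter(
    must=[FieldCondition(key="text", match=MatchText(text="vector search"))]
)
```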
Production deployment
Qdrant Cloud
```python
from qdrant_client import QdrantClient

# Connect to Qdrant Cloud
client = QdrantClient(
    url="https://your-cluster.cloud.qdrant.io",
    api_key="your-api-key"
)
```

Performance tuning
```python
# Optimize for search speed (higher recall)
client.update_collection(
    collection_name="documents",
    hnsw_config=HnswConfigDiff(ef_construct=200, m=32)
)

# Optimize for indexing speed (bulk loads)
client.update_collection(
    collection_name="documents",
    optimizer_config={"indexing_threshold": 20000}
)
```
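A common companion pattern from Qdrant's bulk-upload guidance (a sketch; `OptimizersConfigDiff` is the typed form of the dict used above): disable HNSW indexing while loading, then restore the threshold so the index is built once at the end.

```python
from qdrant_client.models import OptimizersConfigDiff

# Turn off indexing during a large bulk upload...
client.update_collection(
    collection_name="documents",
    optimizer_config=OptimizersConfigDiff(indexing_threshold=0),
)

# ...then restore the default threshold so HNSW is built after the load
client.update_collection(
    collection_name="documents",
    optimizer_config=OptimizersConfigDiff(indexing_threshold=20000),
)
```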
Best practices
- Batch operations - Use batch upsert/search for efficiency (see the sketch after this list)
- Payload indexing - Index fields used in filters
- Quantization - Enable for large collections (>1M vectors)
- Sharding - Use for collections >10M vectors
- On-disk storage - Enable `on_disk_payload` for large payloads
- Connection pooling - Reuse client instances
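A minimal sketch tying several of these practices together (the shard and replica counts and the batch size are illustrative; `shard_number` and `replication_factor` only matter on multi-node clusters):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Reuse a single client instance across requests
client = QdrantClient(host="localhost", port=6333, prefer_grpc=True)

# Sharded, replicated collection with payload kept on disk
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    shard_number=4,
    replication_factor=2,
    on_disk_payload=True,
)

def upsert_in_batches(points: list[PointStruct], batch_size: int = 256) -> None:
    # Batch upserts instead of sending one point per request
    for i in range(0, len(points), batch_size):
        client.upsert(
            collection_name="documents",
            points=points[i : i + batch_size],
            wait=False,  # don't block on indexing during bulk loads
        )
```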
Common issues
Slow search with filters:
```python
# Create payload index for filtered fields
client.create_payload_index(
    collection_name="docs",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)
```

Out of memory:
```python
# Enable quantization and on-disk storage
client.create_collection(
    collection_name="large_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(...),
    on_disk_payload=True
)
```

Connection issues:
```python
# Use timeout and retry (see the retry sketch below)
client = QdrantClient(
    host="localhost",
    port=6333,
    timeout=30,
    prefer_grpc=True  # gRPC for better performance
)
```
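The tip above pairs the timeout with retries, but the call itself does not retry; a small wrapper is one way to add it on top of the client (a hypothetical helper with illustrative backoff values):

```python
import time

# Hypothetical helper: retry transient failures with linear backoff
def search_with_retry(query_vector, retries: int = 3, delay: float = 1.0):
    for attempt in range(retries):
        try:
            return client.search(
                collection_name="documents",
                query_vector=query_vector,
                limit=10,
            )
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))
```

References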
- Advanced Usage - Distributed mode, hybrid search, recommendations
- Troubleshooting - Common issues, debugging, performance tuning
Resources
- GitHub: https://github.com/qdrant/qdrant (22k+ stars)
- Docs: https://qdrant.tech/documentation/
- Python Client: https://github.com/qdrant/qdrant-client
- Cloud: https://cloud.qdrant.io
- Version: 1.12.0+
- License: Apache 2.0