FAISS: Facebook's library for efficient similarity search and clustering of dense vectors

Facebook AI's library for efficient similarity search and clustering of dense vectors. It supports billions of vectors, GPU acceleration, and a range of index types (Flat, IVF, HNSW). Use it for fast k-NN search, large-scale vector retrieval, or whenever you need pure similarity search without metadata filtering. Best suited to high-performance applications.
Skill metadata
| Field | Value |
|---|---|
| Source | Optional — install with hermes skills install official/mlops/faiss |
| Path | optional-skills/mlops/faiss |
| Version | 1.0.0 |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | faiss-cpu, faiss-gpu, numpy |
| Tags | RAG, FAISS, Similarity Search, Vector Search, Facebook AI, GPU Acceleration, Billion-Scale, K-NN, HNSW, High Performance, Large Scale |
Reference: full SKILL.md
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
FAISS - Efficient Similarity Search
Facebook AI's library for billion-scale vector similarity search.
When to use FAISS
Use FAISS when:
- Need fast similarity search on large vector datasets (millions/billions)
- GPU acceleration required
- Pure vector similarity (no metadata filtering needed)
- High throughput, low latency critical
- Offline/batch processing of embeddings
Metrics:
- 31,700+ GitHub stars
- Meta/Facebook AI Research
- Handles billions of vectors
- C++ with Python bindings
Use alternatives instead:
- Chroma/Pinecone: Need metadata filtering
- Weaviate: Need full database features
- Annoy: Simpler, fewer features
Quick start
Installation

```bash
# CPU only
pip install faiss-cpu

# GPU support
pip install faiss-gpu
```
Basic usage

```python
import faiss
import numpy as np

# Create sample data (1000 vectors, 128 dimensions)
d = 128
nb = 1000
vectors = np.random.random((nb, d)).astype('float32')

# Create index
index = faiss.IndexFlatL2(d)  # L2 distance
index.add(vectors)            # Add vectors

# Search
k = 5  # Find the 5 nearest neighbors
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)

print(f"Nearest neighbors: {indices}")
print(f"Distances: {distances}")
```
Index types

1. Flat (exact search)

```python
# L2 (Euclidean) distance
index = faiss.IndexFlatL2(d)

# Inner product (cosine similarity if normalized)
index = faiss.IndexFlatIP(d)

# Slowest at scale, most accurate
```
2. IVF (inverted file) - Fast approximate

```python
# Create quantizer
quantizer = faiss.IndexFlatL2(d)

# IVF index with 100 clusters
nlist = 100
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# Train on data (required before adding)
index.train(vectors)

# Add vectors
index.add(vectors)

# Search (nprobe = number of clusters to probe)
index.nprobe = 10
distances, indices = index.search(query, k)
```
3. HNSW (Hierarchical NSW) - Best quality/speed

```python
# HNSW index
M = 32  # Number of connections per layer
index = faiss.IndexHNSWFlat(d, M)

# No training needed
index.add(vectors)

# Search
distances, indices = index.search(query, k)
```
4. Product Quantization - Memory efficient

```python
# PQ reduces memory by 16-32×
m = 8      # Number of subquantizers
nbits = 8  # Bits per subquantizer code
index = faiss.IndexPQ(d, m, nbits)

# Train and add
index.train(vectors)
index.add(vectors)
```
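Where the savings come from: IndexPQ stores each vector as m codes of nbits bits instead of d float32 values. A quick back-of-envelope check with the parameters above (the 16-32× figure quoted earlier corresponds to larger codes, e.g. m = 16 or 32 at this dimension):

```python
d, m, nbits = 128, 8, 8

flat_bytes_per_vector = d * 4          # float32 storage: 512 bytes
pq_bytes_per_vector = m * nbits // 8   # PQ code: 8 bytes

# Raw compression ratio, ignoring per-index overhead
print(flat_bytes_per_vector // pq_bytes_per_vector)  # 64
```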
Save and load

```python
# Save index
faiss.write_index(index, "large.index")

# Load index
index = faiss.read_index("large.index")

# Continue using
distances, indices = index.search(query, k)
```
GPU acceleration

```python
# Single GPU
res = faiss.StandardGpuResources()
index_cpu = faiss.IndexFlatL2(d)
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)  # GPU 0

# Multi-GPU
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)

# 10-100× faster than CPU for large batches
```
LangChain integration

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Create FAISS vector store (docs: list of Document objects)
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())

# Save
vectorstore.save_local("faiss_index")

# Load
vectorstore = FAISS.load_local(
    "faiss_index",
    OpenAIEmbeddings(),
    allow_dangerous_deserialization=True,
)

# Search
results = vectorstore.similarity_search("query", k=5)
```
LlamaIndex integration

```python
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

# Create FAISS index (1536 = OpenAI text-embedding dimension)
d = 1536
faiss_index = faiss.IndexFlatL2(d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
```
Best practices

- Choose the right index type - Flat for <10K vectors, IVF for 10K-1M, HNSW for quality at scale
- Normalize for cosine - Use IndexFlatIP with normalized vectors
- Use GPU for large datasets - 10-100× faster
- Save trained indices - Training is expensive
- Tune nprobe/ef_search - Balance speed/accuracy
- Monitor memory - PQ for large datasets
- Batch queries - Better GPU utilization
Performance
| Index Type | Build Time | Search Time | Memory | Accuracy |
|---|---|---|---|---|
| Flat | Fast | Slow | High | 100% |
| IVF | Medium | Fast | Medium | 95-99% |
| HNSW | Slow | Fastest | High | 99% |
| PQ | Medium | Fast | Low | 90-95% |
Resources
- GitHub: https://github.com/facebookresearch/faiss ⭐ 31,700+
- Wiki: https://github.com/facebookresearch/faiss/wiki
- License: MIT