# Chroma — Open-source embedding database for AI applications

Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text search, and filter by metadata. Simple 4-function API. Scales from notebooks to production clusters. Use for semantic search, RAG applications, or document retrieval. Best for local development and open-source projects.
## Skill metadata

| Field | Value |
|---|---|
| Source | Optional — install with `hermes skills install official/mlops/chroma` |
| Path | `optional-skills/mlops/chroma` |
| Version | 1.0.0 |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | `chromadb`, `sentence-transformers` |
| Tags | RAG, Chroma, Vector Database, Embeddings, Semantic Search, Open Source, Self-Hosted, Document Retrieval, Metadata Filtering |
## Reference: full SKILL.md

The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
# Chroma - Open-Source Embedding Database

The AI-native database for building LLM applications with memory.
## When to use Chroma

Use Chroma when:
- Building RAG (retrieval-augmented generation) applications
- Need local/self-hosted vector database
- Want open-source solution (Apache 2.0)
- Prototyping in notebooks
- Semantic search over documents
- Storing embeddings with metadata
Metrics:
- 24,300+ GitHub stars
- 1,900+ forks
- v1.3.3 (stable, weekly releases)
- Apache 2.0 license
Consider alternatives instead:
- Pinecone: Managed cloud, auto-scaling
- FAISS: Pure similarity search, no metadata
- Weaviate: Production ML-native database
- Qdrant: High performance, Rust-based
## Quick start

### Installation

```bash
# Python
pip install chromadb

# JavaScript/TypeScript
npm install chromadb @chroma-core/default-embed
```

### Basic usage (Python)
```python
import chromadb

# Create client
client = chromadb.Client()

# Create collection
collection = client.create_collection(name="my_collection")

# Add documents
collection.add(
    documents=["This is document 1", "This is document 2"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["id1", "id2"],
)

# Query
results = collection.query(
    query_texts=["document about topic"],
    n_results=2,
)
print(results)
```

## Core operations
### 1. Create collection

```python
# Simple collection
collection = client.create_collection("my_docs")

# With custom embedding function
from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small",
)
collection = client.create_collection(
    name="my_docs",
    embedding_function=openai_ef,
)

# Get existing collection
collection = client.get_collection("my_docs")

# Delete collection
client.delete_collection("my_docs")
```

### 2. Add documents
```python
# Add documents with metadata and explicit IDs
collection.add(
    documents=["Doc 1", "Doc 2", "Doc 3"],
    metadatas=[
        {"source": "web", "category": "tutorial"},
        {"source": "pdf", "page": 5},
        {"source": "api", "timestamp": "2025-01-01"},
    ],
    ids=["id1", "id2", "id3"],
)

# Add with precomputed embeddings
collection.add(
    embeddings=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
    documents=["Doc 1", "Doc 2"],
    ids=["id1", "id2"],
)
```

### 3. Query (similarity search)
```python
# Basic query
results = collection.query(
    query_texts=["machine learning tutorial"],
    n_results=5,
)

# Query with a filter
results = collection.query(
    query_texts=["Python programming"],
    n_results=3,
    where={"source": "web"},
)

# Query with combined metadata filters
results = collection.query(
    query_texts=["advanced topics"],
    where={
        "$and": [
            {"category": "tutorial"},
            {"difficulty": {"$gte": 3}},
        ]
    },
)

# Access results
print(results["documents"])  # List of matching documents
print(results["metadatas"])  # Metadata for each doc
print(results["distances"])  # Distances (lower = more similar)
print(results["ids"])        # Document IDs
```

### 4. Get documents
```python
# Get by IDs
docs = collection.get(ids=["id1", "id2"])

# Get with filters
docs = collection.get(
    where={"category": "tutorial"},
    limit=10,
)

# Get all documents
docs = collection.get()
```

### 5. Update documents
```python
# Update document content and metadata
collection.update(
    ids=["id1"],
    documents=["Updated content"],
    metadatas=[{"source": "updated"}],
)
```

### 6. Delete documents
```python
# Delete by IDs
collection.delete(ids=["id1", "id2"])

# Delete with a filter
collection.delete(where={"source": "outdated"})
```

## Persistent storage
```python
# Persist to disk
client = chromadb.PersistentClient(path="./chroma_db")

collection = client.create_collection("my_docs")
collection.add(documents=["Doc 1"], ids=["id1"])

# Data is persisted automatically; reload later with the same path
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("my_docs")
```

## Embedding functions
### Default (Sentence Transformers)

```python
# Uses sentence-transformers by default
# Default model: all-MiniLM-L6-v2
collection = client.create_collection("my_docs")
```

### OpenAI
```python
from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small",
)
collection = client.create_collection(
    name="openai_docs",
    embedding_function=openai_ef,
)
```

### HuggingFace
```python
huggingface_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    api_key="your-key",
    model_name="sentence-transformers/all-mpnet-base-v2",
)
collection = client.create_collection(
    name="hf_docs",
    embedding_function=huggingface_ef,
)
```

### Custom embedding function
```python
from chromadb import Documents, EmbeddingFunction, Embeddings

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        # Your embedding logic
        return embeddings

my_ef = MyEmbeddingFunction()
collection = client.create_collection(
    name="custom_docs",
    embedding_function=my_ef,
)
```

## Metadata filtering
```python
# Exact match
results = collection.query(
    query_texts=["query"],
    where={"category": "tutorial"},
)

# Comparison operators: $gt, $gte, $lt, $lte, $ne
results = collection.query(
    query_texts=["query"],
    where={"page": {"$gt": 10}},
)

# Logical operators: $and, $or
results = collection.query(
    query_texts=["query"],
    where={
        "$and": [
            {"category": "tutorial"},
            {"difficulty": {"$lte": 3}},
        ]
    },
)

# Membership
results = collection.query(
    query_texts=["query"],
    where={"tags": {"$in": ["python", "ml"]}},
)
```

## LangChain integration
```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
docs = text_splitter.split_documents(documents)

# Create Chroma vector store
vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)

# Query
results = vectorstore.similarity_search("machine learning", k=3)

# As retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```

## LlamaIndex integration
```python
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import VectorStoreIndex, StorageContext
import chromadb

# Initialize Chroma
db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("my_collection")

# Create vector store
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is machine learning?")
```

## Server mode
```bash
# Run the Chroma server
chroma run --path ./chroma_db --port 8000
```

```python
# Connect to the server
import chromadb
from chromadb.config import Settings

client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    settings=Settings(anonymized_telemetry=False),
)

# Use as normal
collection = client.get_or_create_collection("my_docs")
```

## Best practices
- Use persistent client - Don’t lose data on restart
- Add metadata - Enables filtering and tracking
- Batch operations - Add multiple docs at once
- Choose right embedding model - Balance speed/quality
- Use filters - Narrow search space
- Unique IDs - Avoid collisions
- Regular backups - Copy chroma_db directory
- Monitor collection size - Scale up if needed
- Test embedding functions - Ensure quality
- Use server mode for production - Better for multi-user
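Several of the practices above (batching, unique IDs, metadata) can be combined in one small helper. A minimal sketch, assuming a deterministic content-hash ID scheme; the `batched_add` and `doc_id` helpers are illustrative, not part of Chroma's API:

```python
import hashlib
from typing import Callable, Sequence

def doc_id(text: str) -> str:
    """Derive a deterministic, collision-resistant ID from document content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

def batched_add(
    add_fn: Callable[..., None],
    documents: Sequence[str],
    metadatas: Sequence[dict],
    batch_size: int = 100,
) -> int:
    """Add documents in fixed-size batches via add_fn (e.g. collection.add).

    Returns the total number of documents added.
    """
    total = 0
    for start in range(0, len(documents), batch_size):
        batch_docs = list(documents[start:start + batch_size])
        batch_meta = list(metadatas[start:start + batch_size])
        add_fn(
            documents=batch_docs,
            metadatas=batch_meta,
            ids=[doc_id(d) for d in batch_docs],
        )
        total += len(batch_docs)
    return total
```

With a real collection you would call `batched_add(collection.add, docs, metas)`; content-hash IDs also make re-ingestion idempotent, since the same document always maps to the same ID.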
## Performance

| Operation | Latency | Notes |
|---|---|---|
| Add 100 docs | ~1-3s | With embedding |
| Query (top 10) | ~50-200ms | Depends on collection size |
| Metadata filter | ~10-50ms | Fast with proper indexing |
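Figures like these vary widely with hardware, collection size, and embedding model, so it is worth measuring on your own data. A minimal timing sketch; the `timed` helper is illustrative, and the lambda stands in for a real `collection.query` call:

```python
import time

def timed(fn, *args, repeat: int = 5, **kwargs):
    """Run fn repeat times; return (result, best observed latency in ms)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, (time.perf_counter() - start) * 1000)
    return result, best

# Stand-in for: timed(collection.query, query_texts=["..."], n_results=10)
result, ms = timed(lambda: sorted(range(10_000)))
```

Taking the best of several runs filters out warm-up effects (model loading, cache misses) that dominate the first query.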
## Resources

- GitHub: https://github.com/chroma-core/chroma ⭐ 24,300+
- Docs: https://docs.trychroma.com
- Discord: https://discord.gg/MMeYNTmh3x
- Version: 1.3.3+
- License: Apache 2.0