Pinecone — Managed vector database for production AI applications
Pinecone
Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure.
Skill metadata
| Field | Value |
|---|---|
| Source | Optional — install with `hermes skills install official/mlops/pinecone` |
| Path | `optional-skills/mlops/pinecone` |
| Version | 1.0.0 |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | `pinecone-client` |
| Tags | RAG, Pinecone, Vector Database, Managed Service, Serverless, Hybrid Search, Production, Auto-Scaling, Low Latency, Recommendations |
Reference: full SKILL.md
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
Pinecone - Managed Vector Database
The vector database for production AI applications.
When to use Pinecone
Use when:
- Need managed, serverless vector database
- Production RAG applications
- Auto-scaling required
- Low latency critical (<100ms)
- Don’t want to manage infrastructure
- Need hybrid search (dense + sparse vectors)
Metrics:
- Fully managed SaaS
- Auto-scales to billions of vectors
- p95 latency <100ms
- 99.9% uptime SLA
Consider these alternatives instead:
- Chroma: Self-hosted, open-source
- FAISS: Offline, pure similarity search
- Weaviate: Self-hosted with more features
Quick start
Installation

```bash
pip install pinecone-client
```

Basic usage

```python
from pinecone import Pinecone, ServerlessSpec

# Initialize the client
pc = Pinecone(api_key="your-api-key")

# Create an index
pc.create_index(
    name="my-index",
    dimension=1536,   # Must match embedding dimension
    metric="cosine",  # or "euclidean", "dotproduct"
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

# Connect to the index
index = pc.Index("my-index")

# Upsert vectors
index.upsert(vectors=[
    {"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"category": "A"}},
    {"id": "vec2", "values": [0.3, 0.4, ...], "metadata": {"category": "B"}},
])

# Query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    include_metadata=True,
)
print(results["matches"])
```

Core operations
Create index

```python
from pinecone import ServerlessSpec, PodSpec

# Serverless (recommended)
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",  # or "gcp", "azure"
        region="us-east-1",
    ),
)

# Pod-based (for consistent performance)
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",
        pod_type="p1.x1",
    ),
)
```

Upsert vectors
```python
# Single upsert
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # 1536 dimensions
        "metadata": {
            "text": "Document content",
            "category": "tutorial",
            "timestamp": "2025-01-01",
        },
    }
])

# Batch upsert (recommended)
vectors = [
    {"id": f"vec{i}", "values": embedding, "metadata": metadata}
    for i, (embedding, metadata) in enumerate(zip(embeddings, metadatas))
]
index.upsert(vectors=vectors, batch_size=100)
```

Query vectors
```python
# Basic query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=10,
    include_metadata=True,
    include_values=False,
)

# With metadata filtering
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    filter={"category": {"$eq": "tutorial"}},
)

# Namespace query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    namespace="production",
)

# Access results
for match in results["matches"]:
    print(f"ID: {match['id']}")
    print(f"Score: {match['score']}")
    print(f"Metadata: {match['metadata']}")
```

Metadata filtering
```python
# Exact match
filter = {"category": "tutorial"}

# Comparison ($gt, $gte, $lt, $lte, $ne)
filter = {"price": {"$gte": 100}}

# Logical operators (also: $or)
filter = {
    "$and": [
        {"category": "tutorial"},
        {"difficulty": {"$lte": 3}},
    ]
}

# In operator
filter = {"tags": {"$in": ["python", "ml"]}}
```

Namespaces
```python
# Partition data by namespace
index.upsert(
    vectors=[{"id": "vec1", "values": [...]}],
    namespace="user-123",
)

# Query a specific namespace
results = index.query(
    vector=[...],
    namespace="user-123",
    top_k=5,
)

# List namespaces
stats = index.describe_index_stats()
print(stats["namespaces"])
```

Hybrid search (dense + sparse)
```python
# Upsert with sparse vectors
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # Dense vector
        "sparse_values": {
            "indices": [10, 45, 123],  # Token IDs
            "values": [0.5, 0.3, 0.8],  # TF-IDF scores
        },
        "metadata": {"text": "..."},
    }
])

# Hybrid query
results = index.query(
    vector=[0.1, 0.2, ...],
    sparse_vector={
        "indices": [10, 45],
        "values": [0.5, 0.3],
    },
    top_k=5,
    alpha=0.5,  # 0=sparse, 1=dense, 0.5=hybrid
)
```

LangChain integration
```python
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

# Create a vector store from documents
vectorstore = PineconeVectorStore.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    index_name="my-index",
)

# Query
results = vectorstore.similarity_search("query", k=5)

# With metadata filter
results = vectorstore.similarity_search(
    "query",
    k=5,
    filter={"category": "tutorial"},
)

# As retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
```

LlamaIndex integration
```python
from pinecone import Pinecone
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.pinecone import PineconeVectorStore

# Connect to Pinecone
pc = Pinecone(api_key="your-key")
pinecone_index = pc.Index("my-index")

# Create the vector store
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# Use in LlamaIndex
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```

Index management
```python
# List indexes
indexes = pc.list_indexes()

# Describe an index
index_info = pc.describe_index("my-index")
print(index_info)

# Get index stats
stats = index.describe_index_stats()
print(f"Total vectors: {stats['total_vector_count']}")
print(f"Namespaces: {stats['namespaces']}")

# Delete an index
pc.delete_index("my-index")
```

Delete vectors
```python
# Delete by ID
index.delete(ids=["vec1", "vec2"])

# Delete by metadata filter (pod-based indexes only)
index.delete(filter={"category": "old"})

# Delete all vectors in a namespace
index.delete(delete_all=True, namespace="test")

# Delete all vectors (to remove the index itself, use pc.delete_index)
index.delete(delete_all=True)
```

Best practices
Section titled “Best practices”- Use serverless - Auto-scaling, cost-effective
- Batch upserts - More efficient (100-200 per batch)
- Add metadata - Enable filtering
- Use namespaces - Isolate data by user/tenant
- Monitor usage - Check Pinecone dashboard
- Optimize filters - Index frequently filtered fields
- Test with free tier - 1 index, 100K vectors free
- Use hybrid search - Combining dense and sparse retrieval improves recall on keyword-heavy queries
- Set appropriate dimensions - Match embedding model
- Regular backups - Export important data
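Several of these practices (batch upserts, per-tenant namespaces) can be combined in a small helper. The sketch below assumes vectors are the usual list of `{"id", "values", "metadata"}` dicts; `batched` is a hypothetical name, not part of the Pinecone SDK:

```python
def batched(vectors, batch_size=100):
    """Split a vector list into upsert-sized chunks (100-200 per batch).

    Hypothetical helper: pass each yielded batch to
    index.upsert(vectors=batch, namespace=...).
    """
    for i in range(0, len(vectors), batch_size):
        yield vectors[i:i + batch_size]
```

In practice the SDK's own `batch_size` argument to `upsert` (shown earlier) achieves the same chunking; an explicit loop is useful when you want per-batch retries or progress logging.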
Performance
| Operation | Latency | Notes |
|---|---|---|
| Upsert | ~50-100ms | Per batch |
| Query (p50) | ~50ms | Depends on index size |
| Query (p95) | ~100ms | SLA target |
| Metadata filter | ~+10-20ms | Additional overhead |
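To check whether your own workload lands near these numbers, you can time calls directly. `timed_ms` below is a sketch, not a Pinecone API; any callable works:

```python
import time

def timed_ms(fn, *args, **kwargs):
    """Return (result, elapsed_ms) for a single call,
    e.g. timed_ms(index.query, vector=vec, top_k=5)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0
```

Run it over many queries and look at the p95 of the elapsed times rather than the mean, since that is what the SLA targets.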
Pricing (as of 2025)
Serverless:
- $0.096 per million read units
- $0.06 per million write units
- $0.06 per GB storage/month
Free tier:
- 1 serverless index
- 100K vectors (1536 dimensions)
- Great for prototyping
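As a rough sanity check, the serverless rates above fold into a quick estimator. This is a sketch using the 2025 list prices; actual billing depends on how Pinecone meters read and write units for your queries:

```python
def est_monthly_cost(read_units_millions, write_units_millions, storage_gb):
    """Estimate monthly serverless cost from the 2025 list prices."""
    READ_RATE = 0.096    # $ per million read units
    WRITE_RATE = 0.06    # $ per million write units
    STORAGE_RATE = 0.06  # $ per GB of storage per month
    return (read_units_millions * READ_RATE
            + write_units_millions * WRITE_RATE
            + storage_gb * STORAGE_RATE)
```

For example, 10M reads, 5M writes, and 20 GB of storage comes to roughly $2.46/month at these rates.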
Resources
- Website: https://www.pinecone.io
- Docs: https://docs.pinecone.io
- Console: https://app.pinecone.io
- Pricing: https://www.pinecone.io/pricing