Index Configuration Types
IndexIVF
Standard IVF (Inverted File) index configuration, ideal for balanced performance:
| Speed | Accuracy | Memory Usage |
| --- | --- | --- |
| Fast | Good | Medium |
from cyborgdb import IndexIVF
config = IndexIVF(
    dimension=768  # optional, defaults to auto-detect
)
IndexIVFFlat
IVFFlat index configuration, suitable for highest accuracy requirements:
| Speed | Accuracy | Memory Usage |
| --- | --- | --- |
| Medium | Highest | High |
from cyborgdb import IndexIVFFlat
config = IndexIVFFlat(
    dimension=512  # optional, defaults to auto-detect
)
IndexIVFPQ
IVFPQ (Product Quantization) index configuration, optimized for memory efficiency:
| Speed | Accuracy | Memory Usage |
| --- | --- | --- |
| Fast | Good | Low |
from cyborgdb import IndexIVFPQ
config = IndexIVFPQ(
    pq_dim=64,       # required: product quantization dimension
    pq_bits=8,       # required: bits per quantization code
    dimension=1536   # optional, defaults to auto-detect
)
Unlike IndexIVF and IndexIVFFlat, IndexIVFPQ requires both pq_dim and pq_bits to be specified explicitly.
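To get a feel for why product quantization lowers memory usage, the per-vector storage can be estimated from pq_dim and pq_bits. The sketch below is only a back-of-the-envelope calculation: it assumes uncompressed vectors are stored as 32-bit floats and it ignores codebooks and other index overhead. The helper names `pq_code_bytes` and `raw_vector_bytes` are hypothetical and not part of cyborgdb.

```python
def pq_code_bytes(pq_dim: int, pq_bits: int) -> float:
    """Bytes used by one product-quantized code: pq_dim sub-codes of pq_bits each."""
    return pq_dim * pq_bits / 8


def raw_vector_bytes(dimension: int, bytes_per_component: int = 4) -> int:
    """Bytes used by one uncompressed vector (float32 is an assumption)."""
    return dimension * bytes_per_component


# Same values as the IndexIVFPQ example above.
code = pq_code_bytes(pq_dim=64, pq_bits=8)
raw = raw_vector_bytes(dimension=1536)
compression_ratio = raw / code
print(f"{raw} B raw, {code:.0f} B as PQ code, about {compression_ratio:.0f}x smaller")
```

Under these assumptions, a larger pq_dim or pq_bits increases the code size and usually preserves more accuracy, while smaller values save more memory.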
Dictionary format for upsert operations:
vector_item = {
    "id": "unique_identifier",        # Required: string
    "vector": [0.1, 0.2, 0.3, ...],   # Optional: List[float]
    "contents": "text content",       # Optional: string or bytes
    "metadata": {                     # Optional: Dict[str, Any]
        "category": "research",
        "author": "Dr. Smith",
        "tags": ["ai", "ml"]
    }
}
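Checking items against this shape before an upsert can surface mistakes early. The following is a minimal client-side sketch based only on the field types listed above; `validate_vector_item` is a hypothetical helper, not a cyborgdb function, and the library may apply different or stricter checks.

```python
from typing import Any, Dict, List


def validate_vector_item(item: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an upsert item; empty means it looks valid."""
    problems = []
    if not isinstance(item.get("id"), str):
        problems.append("'id' is required and must be a string")
    vector = item.get("vector")
    if vector is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_numeric_list = isinstance(vector, list) and all(
            isinstance(x, (int, float)) and not isinstance(x, bool) for x in vector
        )
        if not is_numeric_list:
            problems.append("'vector' must be a list of floats")
    contents = item.get("contents")
    if contents is not None and not isinstance(contents, (str, bytes)):
        problems.append("'contents' must be a string or bytes")
    metadata = item.get("metadata")
    if metadata is not None and not isinstance(metadata, dict):
        problems.append("'metadata' must be a dictionary")
    return problems


item = {"id": "doc1", "vector": [0.1, 0.2, 0.3], "metadata": {"category": "research"}}
print(validate_vector_item(item))
print(validate_vector_item({"vector": "oops"}))
```

Returning a list of messages rather than raising on the first problem makes it easier to report every issue in a large batch at once.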
Results returned from query operations:
# Single query result format (flat list)
single_query_results = [
    {
        "id": "doc1",                # string (always included)
        "distance": 0.125,           # float (always included, lower = more similar)
        "metadata": {                # Dict (if included in query)
            "category": "research"
        },
        "contents": "text content",  # string (if included in query)
        "vector": [0.1, 0.2, ...]    # List[float] (if included in query)
    },
    # ... more results
]
# Batch query result format (nested list)
batch_query_results = [
    [  # Results for first query vector
        {"id": "doc1", "distance": 0.125, ...},
        {"id": "doc2", "distance": 0.234, ...}
    ],
    [  # Results for second query vector
        {"id": "doc3", "distance": 0.156, ...},
        {"id": "doc4", "distance": 0.278, ...}
    ]
]
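Because the two formats differ only in nesting, calling code can treat them uniformly by wrapping a flat list in an outer list. The sketch below shows one way to do that; `as_batches` is a hypothetical helper, and it assumes an empty list means a single query with no hits, since the two shapes cannot be told apart in that case.

```python
from typing import Any, Dict, List, Union

Result = Dict[str, Any]


def as_batches(results: Union[List[Result], List[List[Result]]]) -> List[List[Result]]:
    """Wrap a flat single-query result list so it has the same nested shape as a batch."""
    if results and isinstance(results[0], list):
        return results  # already one inner list per query vector
    return [results]  # single query: one batch containing all hits


single = [{"id": "doc1", "distance": 0.125}, {"id": "doc2", "distance": 0.234}]
batch = [[{"id": "doc1", "distance": 0.125}], [{"id": "doc3", "distance": 0.156}]]

for query_index, hits in enumerate(as_batches(single) + as_batches(batch)):
    best = min(hits, key=lambda hit: hit["distance"])  # lower distance = more similar
    print(query_index, best["id"], best["distance"])
```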
The filters parameter in query operations supports MongoDB-style operators:
Supported Operators
$eq: Equality ({"category": {"$eq": "research"}}, or the shorthand {"category": "research"})
$ne: Not equal ({"status": {"$ne": "draft"}})
$gt: Greater than ({"score": {"$gt": 0.8}})
$gte: Greater than or equal ({"year": {"$gte": 2020}})
$lt: Less than ({"price": {"$lt": 100}})
$lte: Less than or equal ({"rating": {"$lte": 4.5}})
$in: In array ({"tag": {"$in": ["ai", "ml"]}})
$nin: Not in array ({"category": {"$nin": ["spam", "deleted"]}})
$and: Logical AND ({"$and": [{"a": 1}, {"b": 2}]})
$or: Logical OR ({"$or": [{"x": 1}, {"y": 2}]})
Filter Examples
# Simple equality filter
simple_filter = {"category": "research"}
# Range filter
range_filter = {
    "published_year": {"$gte": 2020, "$lte": 2024}
}
# Complex compound filter
complex_filter = {
    "$and": [
        {"category": "research"},
        {"confidence": {"$gte": 0.9}},
        {"$or": [
            {"language": "en"},
            {"translated": True}
        ]}
    ]
}
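To make the operator semantics concrete, here is a small local evaluator that applies a filter to a single metadata dictionary. It is an illustrative sketch only, not how CyborgDB evaluates filters: `matches` is a hypothetical function, it assumes scalar field values (no MongoDB-style array matching), and it treats a missing field as failing every ordering comparison.

```python
from typing import Any, Dict

# Comparison operators applied to a single metadata value.
_COMPARISONS = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$gt": lambda value, arg: value is not None and value > arg,
    "$gte": lambda value, arg: value is not None and value >= arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$lte": lambda value, arg: value is not None and value <= arg,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
}


def matches(metadata: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every condition in query."""
    for key, condition in query.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            value = metadata.get(key)
            # Several operators on one field (e.g. $gte and $lte) must all hold.
            if not all(_COMPARISONS[op](value, arg) for op, arg in condition.items()):
                return False
        elif metadata.get(key) != condition:  # a bare value is shorthand for $eq
            return False
    return True


doc = {"category": "research", "confidence": 0.95, "language": "fr", "translated": True}
complex_filter = {
    "$and": [
        {"category": "research"},
        {"confidence": {"$gte": 0.9}},
        {"$or": [{"language": "en"}, {"translated": True}]},
    ]
}
print(matches(doc, complex_filter))
print(matches(doc, {"published_year": {"$gte": 2020, "$lte": 2024}}))
```

The first call matches because the `$or` branch is satisfied by `translated`; the second does not, because `doc` has no `published_year` field.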
Field Selection
Many operations support field selection through the include parameter:
Available Fields
vector: The vector data itself
contents: Text or binary content associated with the vector
metadata: Structured metadata object
distance: Similarity distance (query operations only, always included automatically)
The id and distance fields are always included in query results regardless of the include parameter.
Example Usage
# Include only metadata (efficient for existence checks)
metadata_only = ["metadata"]
# Include vectors and distances (for similarity analysis)
vectors_and_distances = ["vector", "distance"]
# Include all available fields
all_fields = ["vector", "contents", "metadata", "distance"]
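The effect of an include list can be pictured as a projection over a full result in which id and distance always survive. The sketch below mimics that behaviour locally; `project` and `ALWAYS_INCLUDED` are hypothetical names, and the real selection is performed by the query operation itself.

```python
from typing import Any, Dict, Iterable

ALWAYS_INCLUDED = ("id", "distance")


def project(result: Dict[str, Any], include: Iterable[str]) -> Dict[str, Any]:
    """Keep id and distance plus whichever optional fields were requested."""
    wanted = set(ALWAYS_INCLUDED) | set(include)
    return {key: value for key, value in result.items() if key in wanted}


full_result = {
    "id": "doc1",
    "distance": 0.125,
    "metadata": {"category": "research"},
    "contents": "text content",
    "vector": [0.1, 0.2],
}
print(project(full_result, ["metadata"]))
```

Requesting fewer fields keeps responses smaller, which matters most for the vector field because it is typically the largest part of a result.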
Distance Metrics
Supported distance metrics for similarity calculations:
cosine: Cosine similarity (recommended for normalized vectors)
euclidean: Euclidean distance (L2 norm)
squared_euclidean: Squared Euclidean distance (skips the square root, so it is cheaper to compute than euclidean and yields the same ranking)
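The three metrics can be compared on a toy example. The sketch below uses plain Python and makes one assumption worth flagging: cosine is expressed as a distance, 1 minus cosine similarity, so that lower means more similar like the other two. How CyborgDB computes its cosine value internally is not stated here, and all function names are illustrative.

```python
import math
from typing import Callable, List, Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(squared_euclidean(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumption: distance = 1 - cosine similarity; vectors must be non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
candidates = {"a": [0.0, 1.0], "b": [2.0, 0.0], "c": [1.0, 0.5]}


def rank(metric: Callable[[Sequence[float], Sequence[float]], float]) -> List[str]:
    """Candidate names ordered from most to least similar under the given metric."""
    return sorted(candidates, key=lambda name: metric(query, candidates[name]))


print("euclidean:        ", rank(euclidean))
print("squared_euclidean:", rank(squared_euclidean))
print("cosine:           ", rank(cosine_distance))
```

The two Euclidean variants always produce the same ordering because the square root is monotonic. Cosine can rank differently since it ignores vector length: candidate `b` points in the same direction as the query, so it comes first under cosine even though `c` is closer in Euclidean terms.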