Index Configuration

v0.17 introduces a single DiskIVF index type. The polymorphic IndexIVFFlat, IndexIVFPQ, and IndexIVFSQ types from v0.16 have been removed, and create_index no longer accepts an index_config argument. Configuration is instead passed as flat keyword arguments to client.create_index.

| Parameter | Type | Default | Description |
|---|---|---|---|
| dimension | int | None (auto-detect) | Vector dimensionality. Inferred from the first upsert or from embedding_model when omitted. |
| metric | str | server default ("euclidean") | "euclidean", "squared_euclidean", or "cosine". |
| embedding_model | str | None | Optional sentence-transformers model name for automatic embedding generation. |
| storage_precision | str | "float32" | On-disk rerank-vector dtype: "float32" or "float16". |
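
As a minimal sketch of how these options might be assembled, the helper below validates the documented values and returns a flat keyword dictionary. The name build_index_kwargs is hypothetical and not part of the SDK; it only encodes what the table above states, and it omits unset options so the server defaults apply.

```python
from typing import Any, Dict, Optional

ALLOWED_METRICS = {"euclidean", "squared_euclidean", "cosine"}
ALLOWED_PRECISIONS = {"float32", "float16"}


def build_index_kwargs(
    dimension: Optional[int] = None,
    metric: Optional[str] = None,
    embedding_model: Optional[str] = None,
    storage_precision: str = "float32",
) -> Dict[str, Any]:
    """Hypothetical helper: check the documented options, return flat kwargs."""
    if dimension is not None and dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    if metric is not None and metric not in ALLOWED_METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    if storage_precision not in ALLOWED_PRECISIONS:
        raise ValueError(f"unsupported storage_precision: {storage_precision!r}")

    kwargs: Dict[str, Any] = {"storage_precision": storage_precision}
    # Leave unset options out so the server-side defaults are used.
    if dimension is not None:
        kwargs["dimension"] = dimension
    if metric is not None:
        kwargs["metric"] = metric
    if embedding_model is not None:
        kwargs["embedding_model"] = embedding_model
    return kwargs


index_kwargs = build_index_kwargs(dimension=384, metric="cosine")
print(index_kwargs)
```

The resulting dictionary could then be unpacked into client.create_index alongside whatever other arguments the call requires, such as the key arguments described under Key Management.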

Key Management

| Parameter | Type | Notes |
|---|---|---|
| index_key | bytes | 32-byte encryption key. Used by the SDK-supplied KEK path. Required on every subsequent call for that index. |
| kms_name | str | Name of a kms.registry entry in the service YAML. The server generates and wraps the DEK on creation; subsequent calls omit index_key. |

At least one of index_key / kms_name must be supplied to create_index. Supplying both against a real-KMS slot is rejected.
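
The two key paths can be illustrated with a small client-side check. This is a sketch only: key_kwargs is a hypothetical helper, "prod-kms" is a made-up registry entry name, and the case where both arguments are given is left to the server because the rejection depends on whether the slot is backed by a real KMS.

```python
import secrets
from typing import Any, Dict, Optional


def key_kwargs(
    index_key: Optional[bytes] = None,
    kms_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: enforce the 'at least one' rule and the key length."""
    if index_key is None and kms_name is None:
        raise ValueError("supply at least one of index_key or kms_name")
    if index_key is not None and len(index_key) != 32:
        raise ValueError("index_key must be exactly 32 bytes")

    kwargs: Dict[str, Any] = {}
    if index_key is not None:
        kwargs["index_key"] = index_key
    if kms_name is not None:
        kwargs["kms_name"] = kms_name
    return kwargs


# SDK-supplied key path: generate 32 random bytes and keep them safe,
# since the key is required on every later call for the index.
index_key = secrets.token_bytes(32)
sdk_path = key_kwargs(index_key=index_key)

# KMS path: only the registry entry name is passed (hypothetical name).
kms_path = key_kwargs(kms_name="prod-kms")
```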

Vector Item Format

Dictionary format for upsert operations:
vector_item = {
    "id": "unique_identifier",           # Required: string
    "vector": [0.1, 0.2, 0.3, ...],      # Optional: List[float]
    "contents": "text content",          # Optional: string or bytes
    "metadata": {                        # Optional: Dict[str, Any]
        "category": "research",
        "author": "Dr. Smith",
        "tags": ["ai", "ml"]
    }
}
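
A client-side check of this shape before upserting might look like the sketch below. validate_vector_item is a hypothetical helper, not an SDK function, and it checks only the types listed in the dictionary format above.

```python
from typing import Any, Dict


def validate_vector_item(item: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: check an item against the documented field types."""
    if not isinstance(item.get("id"), str):
        raise ValueError("'id' is required and must be a string")

    vector = item.get("vector")
    if vector is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        numeric = all(
            isinstance(x, (int, float)) and not isinstance(x, bool) for x in vector
        )
        if not isinstance(vector, list) or not numeric:
            raise ValueError("'vector' must be a list of numbers")

    contents = item.get("contents")
    if contents is not None and not isinstance(contents, (str, bytes)):
        raise ValueError("'contents' must be a string or bytes")

    metadata = item.get("metadata")
    if metadata is not None and not isinstance(metadata, dict):
        raise ValueError("'metadata' must be a dictionary")

    return item


checked = validate_vector_item(
    {"id": "doc1", "vector": [0.1, 0.2, 0.3], "metadata": {"tags": ["ai", "ml"]}}
)
```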

Query Result Format

Results returned from query operations:
# Single query result format (flat list)
single_query_results = [
    {
        "id": "doc1",                    # string (always included)
        "distance": 0.125,               # float (if included, lower = more similar)
        "metadata": {                    # Dict (if included in query)
            "category": "research"
        },
        "contents": "text content",      # string (if included in query)
        "vector": [0.1, 0.2, ...]        # List[float] (if included in query)
    },
    # ... more results
]

# Batch query result format (nested list)
# Each result may carry further fields, depending on what the query included.
batch_query_results = [
    [  # Results for first query vector
        {"id": "doc1", "distance": 0.125},
        {"id": "doc2", "distance": 0.234},
    ],
    [  # Results for second query vector
        {"id": "doc3", "distance": 0.156},
        {"id": "doc4", "distance": 0.278},
    ],
]
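
Because a single query returns a flat list and a batch query returns a nested one, downstream code may want one shape to work with. The sketch below normalizes both to the nested form; as_batches is a hypothetical helper and relies only on the two result shapes described above.

```python
from typing import Any, Dict, List


def as_batches(results: List[Any]) -> List[List[Dict[str, Any]]]:
    """Hypothetical helper: return results in nested form for either shape.

    An empty list is returned unchanged, because the two shapes cannot be
    told apart when there are no results.
    """
    if results and isinstance(results[0], dict):
        return [results]  # flat list from a single query -> one batch
    return results


single = [{"id": "doc1", "distance": 0.125}, {"id": "doc2", "distance": 0.234}]
batch = [
    [{"id": "doc1", "distance": 0.125}],
    [{"id": "doc3", "distance": 0.156}, {"id": "doc4", "distance": 0.278}],
]

# Lower distance means more similar, so the closest hit is the minimum.
closest_ids = [
    min(hits, key=lambda r: r["distance"])["id"] for hits in as_batches(batch)
]
print(closest_ids)
```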

Metadata Filtering

The filters parameter in query operations supports MongoDB-style operators:

Supported Operators

  • $eq: Equality (shorthand form: {"category": "research"})
  • $ne: Not equal ({"status": {"$ne": "draft"}})
  • $gt: Greater than ({"score": {"$gt": 0.8}})
  • $gte: Greater than or equal ({"year": {"$gte": 2020}})
  • $lt: Less than ({"price": {"$lt": 100}})
  • $lte: Less than or equal ({"rating": {"$lte": 4.5}})
  • $in: In array ({"tag": {"$in": ["ai", "ml"]}})
  • $nin: Not in array ({"category": {"$nin": ["spam", "deleted"]}})
  • $and: Logical AND ({"$and": [{"a": 1}, {"b": 2}]})
  • $or: Logical OR ({"$or": [{"x": 1}, {"y": 2}]})

Filter Examples

# Simple equality filter
simple_filter = {"category": "research"}

# Range filter
range_filter = {
    "published_year": {"$gte": 2020, "$lte": 2024}
}

# Complex compound filter
complex_filter = {
    "$and": [
        {"category": "research"},
        {"confidence": {"$gte": 0.9}},
        {"$or": [
            {"language": "en"},
            {"translated": True}
        ]}
    ]
}
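
To make the operator semantics concrete, the sketch below evaluates a filter against a metadata dictionary locally. It is an illustration under simplifying assumptions, not the server's implementation: matches is a hypothetical function, fields are treated as scalars, and a missing field never satisfies an ordering comparison.

```python
from typing import Any, Callable, Dict

_COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda v, t: v == t,
    "$ne": lambda v, t: v != t,
    "$gt": lambda v, t: v is not None and v > t,
    "$gte": lambda v, t: v is not None and v >= t,
    "$lt": lambda v, t: v is not None and v < t,
    "$lte": lambda v, t: v is not None and v <= t,
    "$in": lambda v, t: v in t,
    "$nin": lambda v, t: v not in t,
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Hypothetical local evaluator; all top-level keys must hold."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        elif isinstance(cond, dict) and cond and all(k.startswith("$") for k in cond):
            # Operator form, e.g. {"$gte": 2020, "$lte": 2024}: every operator must hold.
            value = metadata.get(key)
            if not all(_COMPARATORS[op](value, target) for op, target in cond.items()):
                return False
        elif metadata.get(key) != cond:
            # Shorthand equality, e.g. {"category": "research"}.
            return False
    return True


compound_filter = {
    "$and": [
        {"category": "research"},
        {"confidence": {"$gte": 0.9}},
        {"$or": [{"language": "en"}, {"translated": True}]},
    ]
}

doc = {"category": "research", "confidence": 0.95, "translated": True}
print(matches(doc, compound_filter))
```

A local evaluator like this can be useful for unit-testing filter dictionaries before sending them, though edge cases such as array-valued fields may behave differently on the server.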

Field Selection

Many operations support field selection through the include parameter:

Available Fields

  • vector: The vector data itself
  • contents: Text or binary content associated with the vector
  • metadata: Structured metadata object
  • distance: Similarity distance (query operations only)
The id field is always included in query results. All other fields, including distance and metadata, are returned only when listed in include. The server default is [], so a query that omits include returns ids only.

Example Usage

# Include only metadata (efficient for existence checks)
metadata_only = ["metadata"]

# Include vectors and distances (for similarity analysis)
vectors_and_distances = ["vector", "distance"]

# Include all available fields
all_fields = ["vector", "contents", "metadata", "distance"]
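
The relationship between an include list and the keys of each result can be sketched as follows. Both functions are hypothetical helpers that encode only the rules stated above: id is always present, and every other field must be requested by name.

```python
from typing import Iterable, List

AVAILABLE_FIELDS = ("vector", "contents", "metadata", "distance")


def check_include(include: Iterable[str]) -> List[str]:
    """Hypothetical helper: reject field names that are not selectable."""
    fields = list(include)
    unknown = [f for f in fields if f not in AVAILABLE_FIELDS]
    if unknown:
        raise ValueError(f"unknown include fields: {unknown}")
    return fields


def expected_keys(include: Iterable[str]) -> List[str]:
    """Keys a query result should carry for a given include list."""
    return ["id"] + check_include(include)


print(expected_keys([]))
print(expected_keys(["metadata"]))
```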

Distance Metrics

Supported distance metrics for similarity calculations:
  • cosine: Cosine-based distance, so lower still means more similar (recommended for normalized vectors)
  • euclidean: Euclidean distance (L2 norm)
  • squared_euclidean: Squared Euclidean distance (skips the square root, so it is cheaper to compute than euclidean and produces the same ranking)
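
For reference, the three metrics can be written out in plain Python. This is a sketch for intuition rather than the server's implementation; in particular, defining the cosine metric as one minus cosine similarity is an assumption, since the exact formula is not stated here.

```python
import math
from typing import Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed definition: 1 - cosine similarity (undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [0.0, 0.0]
points = {"near": [1.0, 1.0], "mid": [2.0, 0.0], "far": [3.0, 4.0]}

# Taking the square root is monotonic, so both metrics rank points identically.
by_l2 = sorted(points, key=lambda k: euclidean(query, points[k]))
by_sq = sorted(points, key=lambda k: squared_euclidean(query, points[k]))
print(by_l2 == by_sq)
```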