Types - CyborgDB Docs

Index Configuration

v0.17 introduces a single DiskIVF index type. The polymorphic IndexIVFFlat / IndexIVFPQ / IndexIVFSQ types from v0.16 have been removed, and there is no longer an index_config argument. Configuration is passed as flat keyword arguments to client.create_index.

Parameter	Type	Default	Description
`dimension`	`int`	`None` (auto-detect)	Vector dimensionality. Inferred from the first upsert or from `embedding_model` when omitted.
`metric`	`str`	server default (`"euclidean"`)	`"euclidean"`, `"squared_euclidean"`, or `"cosine"`.
`embedding_model`	`str`	`None`	Optional sentence-transformers model name for automatic embedding generation.
`storage_precision`	`str`	`"float32"`	On-disk rerank-vector dtype: `"float32"` or `"float16"`.
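
To give a feel for the `storage_precision` trade-off, the sketch below round-trips one vector component through both widths using only the standard library. It assumes `"float32"` and `"float16"` correspond to IEEE 754 single and half precision; the page does not state how the server encodes vectors on disk, so treat this as an illustration rather than a description of the storage format.

```python
import struct


def roundtrip(value: float, fmt: str) -> float:
    """Pack a Python float into the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


value = 0.1234567

# "<f" is IEEE 754 single precision, "<e" is half precision.
as_float32 = roundtrip(value, "<f")
as_float16 = roundtrip(value, "<e")

# Bytes needed per vector component under each assumed encoding.
bytes_per_component = {
    "float32": struct.calcsize("<f"),
    "float16": struct.calcsize("<e"),
}

error_float32 = abs(as_float32 - value)
error_float16 = abs(as_float16 - value)
```

Under this assumption, half precision halves the bytes per component at the cost of a larger rounding error per component.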

Key Management

Parameter	Type	Notes
`index_key`	`bytes`	32-byte encryption key. Used by the SDK-supplied KEK path. Required on every subsequent call for that index.
`kms_name`	`str`	Name of a `kms.registry` entry in the service YAML. The server generates and wraps the DEK on creation; subsequent calls omit `index_key`.

At least one of index_key / kms_name must be supplied to create_index. Supplying both against a real-KMS slot is rejected.
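
The rule above can be sketched as a small pre-flight check. `build_key_kwargs` is a hypothetical helper, not part of the SDK, and `"my-kms"` is a placeholder registry name. The helper only enforces what this page states (at least one key source, and a 32-byte `index_key`); it does not try to reproduce the server-side rejection for real-KMS slots.

```python
import secrets
from typing import Optional


def build_key_kwargs(index_key: Optional[bytes] = None,
                     kms_name: Optional[str] = None) -> dict:
    """Hypothetical helper: collect key-related kwargs for client.create_index."""
    if index_key is None and kms_name is None:
        raise ValueError("supply at least one of index_key or kms_name")
    kwargs = {}
    if index_key is not None:
        if not isinstance(index_key, bytes) or len(index_key) != 32:
            raise ValueError("index_key must be exactly 32 bytes")
        kwargs["index_key"] = index_key
    if kms_name is not None:
        kwargs["kms_name"] = kms_name
    return kwargs


# SDK-supplied key path: generate a random 32-byte key and store it safely,
# since it is required on every later call for that index.
sdk_key_kwargs = build_key_kwargs(index_key=secrets.token_bytes(32))

# KMS path: "my-kms" stands in for a kms.registry entry name.
kms_kwargs = build_key_kwargs(kms_name="my-kms")
```

The resulting dictionary could then be merged into the other keyword arguments passed to client.create_index.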

Vector Item Format

Dictionary format for upsert operations:

vector_item = {
    "id": "unique_identifier",           # Required: string
    "vector": [0.1, 0.2, 0.3, ...],    # Optional: List[float]
    "contents": "text content",          # Optional: string or bytes
    "metadata": {                        # Optional: Dict[str, Any]
        "category": "research",
        "author": "Dr. Smith",
        "tags": ["ai", "ml"]
    }
}
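
Checking items client-side before an upsert can surface shape errors early. The sketch below mirrors the field types listed above; `validate_vector_item` is a hypothetical helper and the server may apply additional or different rules.

```python
from typing import Any, Dict, List


def validate_vector_item(item: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an upsert item (empty if none)."""
    problems = []
    if not isinstance(item.get("id"), str):
        problems.append("'id' is required and must be a string")
    vector = item.get("vector")
    if vector is not None:
        is_numeric_list = isinstance(vector, list) and all(
            isinstance(x, (int, float)) and not isinstance(x, bool)
            for x in vector
        )
        if not is_numeric_list:
            problems.append("'vector' must be a list of numbers")
    contents = item.get("contents")
    if contents is not None and not isinstance(contents, (str, bytes)):
        problems.append("'contents' must be a string or bytes")
    metadata = item.get("metadata")
    if metadata is not None and not isinstance(metadata, dict):
        problems.append("'metadata' must be a dict")
    return problems


good_item = {"id": "doc1", "vector": [0.1, 0.2, 0.3], "metadata": {"tags": ["ai"]}}
bad_item = {"vector": "not-a-list", "contents": 42}

good_problems = validate_vector_item(good_item)  # no problems expected
bad_problems = validate_vector_item(bad_item)    # missing id, bad vector, bad contents
```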

Query Result Format

Results returned from query operations:

# Single query result format (flat list)
single_query_results = [
    {
        "id": "doc1",                    # string (always included)
        "distance": 0.125,               # float (if included, lower = more similar)
        "metadata": {                    # Dict (if included in query)
            "category": "research"
        },
        "contents": "text content",      # string (if included in query)
        "vector": [0.1, 0.2, ...]      # List[float] (if included in query)
    },
    # ... more results
]

# Batch query result format (nested list)
batch_query_results = [
    [  # Results for first query vector
        {"id": "doc1", "distance": 0.125},  # plus any other included fields
        {"id": "doc2", "distance": 0.234},
    ],
    [  # Results for second query vector
        {"id": "doc3", "distance": 0.156},
        {"id": "doc4", "distance": 0.278},
    ],
]
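
Because a single query returns a flat list and a batch query returns a nested one, downstream code often wants one shape to iterate over. The sketch below normalizes both to the nested form; `as_batches` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, List, Union

Result = Dict[str, Any]


def as_batches(results: Union[List[Result], List[List[Result]]]) -> List[List[Result]]:
    """Wrap a flat (single-query) result list so it looks like a batch of one."""
    if results and isinstance(results[0], list):
        return results  # already nested: one inner list per query vector
    return [results]    # flat or empty list: treat as a single query


single = [{"id": "doc1", "distance": 0.125}, {"id": "doc2", "distance": 0.234}]
batch = [single, [{"id": "doc3", "distance": 0.156}]]

# Take the first hit per query, whichever shape came back.
top_single = [hits[0]["id"] for hits in as_batches(single) if hits]
top_batch = [hits[0]["id"] for hits in as_batches(batch) if hits]
```

An empty list is ambiguous (no hits for one query, or an empty batch); this sketch treats it as a single query with no hits.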

Metadata Filtering

The filters parameter in query operations supports MongoDB-style operators:

Supported Operators

$eq: Equality ({"category": {"$eq": "research"}}, or the shorthand {"category": "research"})
$ne: Not equal ({"status": {"$ne": "draft"}})
$gt: Greater than ({"score": {"$gt": 0.8}})
$gte: Greater than or equal ({"year": {"$gte": 2020}})
$lt: Less than ({"price": {"$lt": 100}})
$lte: Less than or equal ({"rating": {"$lte": 4.5}})
$in: In array ({"tag": {"$in": ["ai", "ml"]}})
$nin: Not in array ({"category": {"$nin": ["spam", "deleted"]}})
$and: Logical AND ({"$and": [{"a": 1}, {"b": 2}]})
$or: Logical OR ({"$or": [{"x": 1}, {"y": 2}]})

Filter Examples

# Simple equality filter
simple_filter = {"category": "research"}

# Range filter
range_filter = {
    "published_year": {"$gte": 2020, "$lte": 2024}
}

# Complex compound filter
complex_filter = {
    "$and": [
        {"category": "research"},
        {"confidence": {"$gte": 0.9}},
        {"$or": [
            {"language": "en"},
            {"translated": True}
        ]}
    ]
}
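
To make the operator semantics concrete, here is a small local evaluator that applies a filter to one metadata dictionary. It is only a sketch of the usual MongoDB-style reading of these operators, not the server's implementation. In particular, it assumes a missing field never matches an operator clause and it does not model matching against array-valued fields.

```python
import operator
from typing import Any, Dict

_COMPARATORS = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every clause of the filter."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            # Operator form, e.g. {"$gte": 2020, "$lte": 2024}: all must hold.
            if key not in metadata:
                return False  # assumption: missing fields never match
            if not all(_COMPARATORS[op](metadata[key], arg)
                       for op, arg in cond.items()):
                return False
        elif metadata.get(key) != cond:
            # A bare value is shorthand for $eq.
            return False
    return True


doc = {"category": "research", "confidence": 0.95, "language": "fr",
       "translated": True, "published_year": 2022}

compound = {"$and": [
    {"category": "research"},
    {"confidence": {"$gte": 0.9}},
    {"$or": [{"language": "en"}, {"translated": True}]},
]}

in_range = matches(doc, {"published_year": {"$gte": 2020, "$lte": 2024}})
compound_ok = matches(doc, compound)
excluded = matches(doc, {"category": {"$nin": ["research", "spam"]}})
```

Here the compound filter matches because the document is translated even though its language is not "en".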

Field Selection

Many operations support field selection through the include parameter:

Available Fields

vector: The vector data itself
contents: Text or binary content associated with the vector
metadata: Structured metadata object
distance: Similarity distance (query operations only)

The id field is always included in query results. All other fields, including distance and metadata, are returned only when listed in the include parameter. The server default is [], so only id is returned unless include names additional fields.

Example Usage

# Include only metadata (efficient for existence checks)
metadata_only = ["metadata"]

# Include vectors and distances (for similarity analysis)
vectors_and_distances = ["vector", "distance"]

# Include all available fields
all_fields = ["vector", "contents", "metadata", "distance"]
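
The effect of include on a returned record can be mimicked locally. The sketch below is a hypothetical client-side projection that follows the behavior described in this section (id always present, other fields only when requested); it is not how the server performs selection.

```python
from typing import Any, Dict, Iterable

ALLOWED_FIELDS = {"vector", "contents", "metadata", "distance"}


def project(record: Dict[str, Any], include: Iterable[str] = ()) -> Dict[str, Any]:
    """Keep 'id' plus the requested fields, rejecting unknown field names."""
    requested = list(include)
    unknown = set(requested) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown include fields: {sorted(unknown)}")
    selected = {"id": record["id"]}
    for field in requested:
        if field in record:
            selected[field] = record[field]
    return selected


record = {
    "id": "doc1",
    "distance": 0.125,
    "metadata": {"category": "research"},
    "contents": "text content",
}

id_only = project(record)  # mirrors the default include of []
with_metadata = project(record, ["metadata"])
with_scores = project(record, ["distance", "metadata"])
```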

Distance Metrics

Supported distance metrics for similarity calculations:

cosine: Cosine similarity (recommended for normalized vectors)
euclidean: Euclidean distance (L2 norm)
squared_euclidean: Squared Euclidean distance (skips the square root, so it is cheaper to compute than euclidean and produces the same ranking)
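
For reference, the three metrics can be written out in plain Python. The cosine version below assumes the reported distance is 1 minus cosine similarity, which is a common convention; this page does not state the exact formula the server uses.

```python
import math
from typing import Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared component differences (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance: the square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed convention: 1 - cosine similarity. Undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 1.0]

d_sq = squared_euclidean(a, b)
d_l2 = euclidean(a, b)
d_cos = cosine_distance(a, b)
```

For unit-length vectors the squared Euclidean distance equals twice this cosine distance, so on normalized data the three metrics order results identically.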
