Upsert - CyborgDB Docs

Adds new vectors to the index or updates existing ones. The Python SDK exposes a single positional API with two calling shapes:

# Shape 1: list of item dicts
index.upsert(items)

# Shape 2: parallel arrays — IDs + numpy array of vectors
index.upsert(ids, vectors)

For batches large enough that JSON encoding becomes a bottleneck, call upsert_binary directly. It sends vectors as base64-encoded binary and also accepts parallel metadata and contents lists. Shape 2 above is a thin wrapper that forwards to upsert_binary, but it passes only ids and vectors (no metadata or contents).

Parameters

Shape 1: List of item dicts

Parameter	Type	Default	Description
`items`	`List[Dict]`	-	List of dictionaries, one per vector.

Where each dictionary can contain:

[
  {
    "id": str,                # Unique identifier for the vector (required)
    "vector": List[float],    # Vector data. Optional if the index has an embedding model and `contents` is provided.
    "contents": str | bytes,  # Optional content. Bytes are base64-encoded, strings are passed through. All contents are encrypted before storage.
    "metadata": Dict          # Optional key-value pairs for filtering and retrieval
  },
  ...
]
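The rules in the item shape above can be checked before a batch is sent. The following is a minimal sketch, not part of the SDK: `validate_items` and its `has_embedding_model` flag are hypothetical names introduced here for illustration, and the checks only mirror what the comments above state.

```python
from typing import Any, Dict, List


def validate_items(items: List[Dict[str, Any]], has_embedding_model: bool = False) -> None:
    """Hypothetical helper: raise ValueError if an item breaks the documented shape."""
    for pos, item in enumerate(items):
        # "id" is the only key documented as always required.
        if not isinstance(item.get("id"), str):
            raise ValueError(f"item {pos}: 'id' must be a string")

        has_vector = item.get("vector") is not None
        has_contents = item.get("contents") is not None

        # A vector may be omitted only when the index has an embedding model
        # and contents are provided for it to embed.
        if not has_vector and not (has_embedding_model and has_contents):
            raise ValueError(
                f"item {pos}: needs 'vector', or 'contents' on an index with an embedding model"
            )

        # Contents are documented as str or bytes; metadata as a dict.
        if has_contents and not isinstance(item["contents"], (str, bytes)):
            raise ValueError(f"item {pos}: 'contents' must be str or bytes")
        if item.get("metadata") is not None and not isinstance(item["metadata"], dict):
            raise ValueError(f"item {pos}: 'metadata' must be a dict")
```

Calling `validate_items(items)` just before `index.upsert(items)` surfaces a malformed entry locally, with its position in the list, rather than as a server-side error.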

The contents field accepts both strings and bytes. Bytes are automatically base64-encoded before encryption; strings are passed as-is. Contents are returned in their original format (string or bytes) when retrieved with get().
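To make the bytes-versus-string handling concrete, here is a small sketch of the transformation described above, using only the Python standard library. It is an illustration of the documented behavior, not the SDK's internal code, and `encode_contents` is a hypothetical name. The SDK performs this step automatically, so callers do not need to do it themselves.

```python
import base64
from typing import Union


def encode_contents(value: Union[str, bytes]) -> str:
    """Hypothetical illustration: base64-encode bytes, pass strings through."""
    if isinstance(value, bytes):
        # Base64 turns arbitrary bytes into ASCII text that is safe to transmit.
        return base64.b64encode(value).decode("ascii")
    return value


encoded = encode_contents(b"hello")             # bytes are encoded
unchanged = encode_contents("First doc body")   # strings are left as-is
restored = base64.b64decode(encoded)            # decoding recovers the original bytes
```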

Shape 2: Parallel arrays

Parameter	Type	Default	Description
`ids`	`List[str]`	-	List of unique vector identifiers.
`vectors`	`np.ndarray` (shape `(n, dim)`, dtype `float32`)	-	Vector data as a 2D numpy array.

The two-arg form of upsert() does not accept metadata or contents keyword arguments. To attach metadata or contents alongside parallel arrays, call upsert_binary(ids, vectors, metadata=..., contents=...) directly.
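If the data already exists as a list of item dicts, it can be rearranged into the parallel lists that upsert_binary takes. The sketch below assumes every item carries a `vector`; `split_items` is a hypothetical helper, not an SDK function. Missing metadata or contents become `None` entries, so each list stays aligned with `ids`.

```python
from typing import Any, Dict, List, Optional, Tuple, Union


def split_items(
    items: List[Dict[str, Any]],
) -> Tuple[
    List[str],
    List[List[float]],
    List[Optional[Dict[str, Any]]],
    List[Optional[Union[str, bytes]]],
]:
    """Hypothetical helper: turn item dicts into parallel lists, one entry per item."""
    ids = [item["id"] for item in items]
    vectors = [item["vector"] for item in items]  # assumes every item has a vector
    # Use None as a placeholder so positions keep matching the ids list.
    metadata = [item.get("metadata") for item in items]
    contents = [item.get("contents") for item in items]
    return ids, vectors, metadata, contents
```

The `vectors` list would still need converting to a 2D float32 array, for example with `np.asarray(vectors, dtype=np.float32)`, before being passed to `index.upsert_binary(ids, vectors, metadata=metadata, contents=contents)`.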

Returns

None

Example Usage

Dictionary format

items = [
    {"id": "doc1", "vector": [0.1, 0.2, 0.3, 0.4]},
    {"id": "doc2", "vector": [0.5, 0.6, 0.7, 0.8], "metadata": {"category": "news"}},
]

index.upsert(items)

Parallel arrays (binary fast path)

import numpy as np

ids = ["vec1", "vec2", "vec3"]
vectors = np.random.rand(3, 128).astype(np.float32)

index.upsert(ids, vectors)

Parallel arrays with metadata / contents

When you need metadata or contents alongside parallel arrays, call upsert_binary directly:

import numpy as np

ids = ["vec1", "vec2", "vec3"]
vectors = np.random.rand(3, 128).astype(np.float32)
metadata = [
    {"category": "news"},
    {"category": "research"},
    None,
]
contents = ["First doc body", "Second doc body", None]

index.upsert_binary(ids, vectors, metadata=metadata, contents=contents)
