Adds new vectors to the index or updates existing ones. The Python SDK exposes a single positional API with two calling shapes:
# Shape 1: list of item dicts
index.upsert(items)
# Shape 2: parallel arrays — IDs + numpy array of vectors
index.upsert(ids, vectors)
For batches large enough that JSON encoding becomes a bottleneck, call upsert_binary directly. It sends vectors as base64-encoded binary and also accepts parallel metadata / contents lists. Shape 2 above is a thin wrapper that already forwards to upsert_binary under the hood, but only with ids + vectors (no metadata/contents).
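To see why the binary path can help on large batches, the rough sketch below compares the size of a JSON-encoded vector with the same values packed as float32 and base64-encoded. It uses only the standard library and does not reproduce the SDK's actual wire format, which is not described here. `json_payload_size` and `binary_payload_size` are hypothetical helper names introduced for illustration.

```python
import base64
import json
import random
from array import array


def json_payload_size(vector):
    """Size in bytes of the vector serialized as a JSON list of floats."""
    return len(json.dumps(vector).encode("utf-8"))


def binary_payload_size(vector):
    """Size in bytes of the vector packed as float32, then base64-encoded."""
    packed = array("f", vector).tobytes()  # 4 bytes per component
    return len(base64.b64encode(packed))


# Illustrative data only: 1024 random components.
random.seed(0)
vector = [random.random() for _ in range(1024)]

json_size = json_payload_size(vector)
binary_size = binary_payload_size(vector)
print(f"JSON: {json_size} bytes, base64 float32: {binary_size} bytes")
```

Packing to float32 rounds each component to single precision, which matches the float32 dtype that Shape 2 expects.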
Parameters
Shape 1: List of item dicts
| Parameter | Type | Default | Description |
|---|---|---|---|
| items | List[Dict] | - | List of dictionaries, one per vector. |
Where each dictionary can contain:
[
{
"id": str, # Unique identifier for the vector (required)
"vector": List[float], # Vector data. Optional if the index has an embedding model and `contents` is provided.
"contents": str | bytes, # Optional content. Bytes are base64-encoded, strings are passed through. All contents are encrypted before storage.
"metadata": Dict # Optional key-value pairs for filtering and retrieval
},
...
]
The contents field accepts both strings and bytes. Bytes are automatically base64-encoded before encryption; strings are passed as-is. Contents are returned in their original format (string or bytes) when retrieved with get().
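The encoding rule for contents can be sketched as follows. This is a minimal illustration, not the SDK's implementation: `encode_contents` and `decode_contents` are hypothetical helpers, and the `was_bytes` flag is an assumed stand-in for however the SDK remembers the original type. Encryption is omitted entirely.

```python
import base64


def encode_contents(contents):
    """Return (text, was_bytes): bytes are base64-encoded, strings pass through."""
    if isinstance(contents, bytes):
        return base64.b64encode(contents).decode("ascii"), True
    if isinstance(contents, str):
        return contents, False
    raise TypeError("contents must be str or bytes")


def decode_contents(text, was_bytes):
    """Restore the original str or bytes value from its stored form."""
    return base64.b64decode(text) if was_bytes else text


# Round trip for both accepted types.
stored_text, text_flag = encode_contents("First doc body")
stored_blob, blob_flag = encode_contents(b"\x00\x01binary")
print(decode_contents(stored_text, text_flag))
print(decode_contents(stored_blob, blob_flag))
```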
Shape 2: Parallel arrays
| Parameter | Type | Default | Description |
|---|---|---|---|
| ids | List[str] | - | List of unique vector identifiers. |
| vectors | np.ndarray (shape (n, dim), dtype float32) | - | Vector data as a 2D numpy array. |
Returns
None
Example Usage
items = [
{"id": "doc1", "vector": [0.1, 0.2, 0.3, 0.4]},
{"id": "doc2", "vector": [0.5, 0.6, 0.7, 0.8], "metadata": {"category": "news"}},
]
index.upsert(items)
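For long item lists it can help to upsert in fixed-size batches. The sketch below assumes only that index.upsert(items) accepts a list of item dicts as shown above. `chunked`, `upsert_in_batches`, and the default `batch_size` of 500 are hypothetical choices for illustration, not SDK features or documented limits.

```python
from typing import Any, Dict, Iterator, List


def chunked(items: List[Dict[str, Any]], batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def upsert_in_batches(index: Any, items: List[Dict[str, Any]], batch_size: int = 500) -> int:
    """Call index.upsert once per batch and return the number of calls made."""
    calls = 0
    for batch in chunked(items, batch_size):
        index.upsert(batch)  # Shape 1: list of item dicts
        calls += 1
    return calls
```

Called as upsert_in_batches(index, items, batch_size=1000), this issues one upsert per slice and leaves the item dicts themselves unchanged.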
Parallel arrays (binary fast path)
import numpy as np
ids = ["vec1", "vec2", "vec3"]
vectors = np.random.rand(3, 128).astype(np.float32)
index.upsert(ids, vectors)
Parallel arrays with metadata / contents
When you need metadata or contents alongside parallel arrays, call upsert_binary directly:
import numpy as np
ids = ["vec1", "vec2", "vec3"]
vectors = np.random.rand(3, 128).astype(np.float32)
metadata = [
{"category": "news"},
{"category": "research"},
None,
]
contents = ["First doc body", "Second doc body", None]
index.upsert_binary(ids, vectors, metadata=metadata, contents=contents)
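If the data already exists as Shape 1 item dicts, it can be split into the parallel lists that upsert_binary takes. This is a sketch under assumptions: `items_to_parallel` is a hypothetical helper, every item is assumed to carry an explicit vector, and missing metadata or contents become None as in the example above. The vectors come back as plain Python lists and would still need converting with np.asarray(vectors, dtype=np.float32) before the call.

```python
from typing import Any, Dict, List, Optional, Tuple


def items_to_parallel(
    items: List[Dict[str, Any]],
) -> Tuple[List[str], List[List[float]], List[Optional[Dict]], List[Any]]:
    """Split item dicts into ids, vectors, metadata, and contents lists."""
    ids, vectors, metadata, contents = [], [], [], []
    for item in items:
        ids.append(item["id"])
        vectors.append(item["vector"])  # required here; no embedding model assumed
        metadata.append(item.get("metadata"))  # None when absent
        contents.append(item.get("contents"))  # None when absent
    # A 2D float32 array needs every row to have the same length.
    dims = {len(v) for v in vectors}
    if len(dims) > 1:
        raise ValueError(f"vectors have inconsistent dimensions: {sorted(dims)}")
    return ids, vectors, metadata, contents
```

The four returned lists line up by position, so entry i of each list describes the same vector.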