Train an Encrypted Index

This functionality is only present in the core library version of CyborgDB. In other versions (microservice, serverless), it is automatically called once enough vector embeddings have been indexed.

CyborgDB uses IVF* index types, which leverage clustering algorithms to segment the index into smaller sections for efficient querying. These clustering algorithms must be trained on the specific data being indexed in order to adequately represent that data. In the core library version of CyborgDB, this training must be explicitly called once enough vectors have been added:

# Train the encrypted index
index.train()

You must have at least 2 * n_lists number of vectors in the index (ingested via upsert) before you can call train.

Training Parameters

Parameters are available to customize the training process:

Parameter	Type	Default	Description
`batch_size`	`int`	`0`	(Optional) Size of each batch for training. `0` auto-selects the batch size.
`max_iters`	`int`	`0`	(Optional) Maximum number of iterations for training. `0` auto-selects the iteration count.
`tolerance`	`float`	`1e-6`	(Optional) Convergence tolerance for training.
`max_memory`	`int`	`0`	(Optional) Maximum memory to use for training. `0` sets no limit.

Warnings with Large Untrained Queries

While training is technically optional (you can use CyborgDB without ever calling train), it is recommended that you do so once you have a large number of vectors in the index (e.g., > 50,000). If you don’t, and you call query, you will see a warning in the console, stating:

Warning: querying untrained index with more than 50000 indexed vectors.

API Reference

For more information on training an encrypted index, refer to the API reference:

Python API Reference

API reference for train() in Python

C++ API Reference

API reference for Train() in C++

Get Started

Encrypted Indexes

Data Operations

Advanced

Train an Encrypted Index

Training Parameters

Warnings with Large Untrained Queries

API Reference

Python API Reference

C++ API Reference

Get Started

Encrypted Indexes

Data Operations

Advanced

​Training Parameters

​Warnings with Large Untrained Queries

​API Reference

Python API Reference

C++ API Reference

Training Parameters

Warnings with Large Untrained Queries

API Reference