This functionality is only present in the embedded library version of Cyborg Vector Search.
In other versions (microservice, serverless), training is triggered automatically once enough vector embeddings have been indexed.
Training applies to `IVF*` index types, which leverage clustering algorithms to segment the index into smaller sections for efficient querying. These clustering algorithms must be trained on the specific data being indexed in order to adequately represent that data.
In the embedded library version of Cyborg Vector Search, this training must be explicitly called once enough vectors have been added:
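A minimal sketch of that explicit call, guarded by the minimum of `2 * n_lists` vectors stated in this section. It assumes `index` is any object exposing the `train` method described on this page and that the caller tracks the vector count itself. The helper names `has_enough_vectors` and `train_if_ready` are hypothetical and not part of the library.

```python
def has_enough_vectors(num_vectors: int, n_lists: int) -> bool:
    """Return True once the index holds at least 2 * n_lists vectors."""
    return num_vectors >= 2 * n_lists


def train_if_ready(index, num_vectors: int, n_lists: int, **train_kwargs) -> bool:
    """Call index.train(**train_kwargs) only if enough vectors are present.

    `index` is assumed to expose a `train` method that accepts the documented
    training parameters as keyword arguments (an assumption, not a confirmed
    signature). Returns True if training was triggered, False otherwise.
    """
    if not has_enough_vectors(num_vectors, n_lists):
        return False
    index.train(**train_kwargs)
    return True
```

For example, with `n_lists = 1024` the guard only lets the call through once at least 2,048 vectors have been upserted.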
You must have at least `2 * n_lists` vectors in the index (ingested via `upsert`) before you can call `train`.

Training Parameters
Parameters are available to customize the training process:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `batch_size` | int | 0 | (Optional) Size of each batch for training. `0` auto-selects the batch size. |
| `max_iters` | int | 0 | (Optional) Maximum number of iterations for training. `0` auto-selects the iteration count. |
| `tolerance` | float | 1e-6 | (Optional) Convergence tolerance for training. |
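One way to keep these settings together is a small container that mirrors the table's names and defaults, sketched below. `TrainParams` and `as_kwargs` are hypothetical names introduced for illustration. The range checks are assumed sanity checks rather than documented constraints, and passing the values to `train` as keyword arguments is likewise an assumption about the call signature.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainParams:
    """Hypothetical container mirroring the documented training parameters."""

    batch_size: int = 0      # 0 auto-selects the batch size
    max_iters: int = 0       # 0 auto-selects the iteration count
    tolerance: float = 1e-6  # convergence tolerance

    def __post_init__(self) -> None:
        # Assumed sanity checks; the documentation does not state valid ranges.
        if self.batch_size < 0:
            raise ValueError("batch_size must be >= 0")
        if self.max_iters < 0:
            raise ValueError("max_iters must be >= 0")
        if self.tolerance <= 0:
            raise ValueError("tolerance must be > 0")

    def as_kwargs(self) -> dict:
        """Return the parameters as a dict suitable for keyword expansion."""
        return asdict(self)
```

Under those assumptions, a call might look like `index.train(**TrainParams(max_iters=50).as_kwargs())`, leaving `batch_size` and `tolerance` at their defaults.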
Warnings with Large Untrained Queries
While training is technically optional (you can use Cyborg Vector Search without ever calling `train`), it is recommended that you do so once you have a large number of vectors in the index (e.g., more than 50,000). If you don’t, and you call `query`, you will see a warning in the console, stating: