Trains the encrypted index to optimize it for efficient similarity search queries. Training is essential for IVF-based indexes to achieve optimal query performance and accuracy.
In CyborgDB Service v0.17, training is triggered automatically on the server once upserts cross the configured RETRAIN_THRESHOLD. upsert() returns None whether or not it triggers training, so to observe training, poll is_training(). Calling train() explicitly forces immediate clustering and is useful when you want to block until the index is ready (for example, before benchmarking queries). Auto-training can be disabled service-side with the AUTO_TRAIN_DISABLED setting.
index.train(
    n_lists=None,
    batch_size=None,
    max_iters=None,
    tolerance=None
)
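
Because upsert() gives no signal about auto-triggered training, a small polling helper can be handy. The sketch below is a hedged illustration, not part of the SDK: wait_until_trained, poll_interval, and timeout are hypothetical names, and it assumes is_training() returns a truthy value while training is in progress.

```python
import time


def wait_until_trained(index, poll_interval=2.0, timeout=600.0):
    """Poll index.is_training() until it reports False or the timeout elapses.

    Hypothetical helper (not part of the SDK). Returns True if training
    finished, False if the timeout was reached first.
    """
    deadline = time.monotonic() + timeout
    while index.is_training():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

Note that if this is called before the server has started training, is_training() may already be falsy and the helper returns immediately. When you need a guarantee that the index is trained, calling train() explicitly is the simpler option.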

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| n_lists | int | None | (Optional) Number of inverted lists to use for the index. When None, the value is auto-selected based on the dataset size. |
| batch_size | int | None | (Optional) Number of vectors to process per training batch. When None, the server uses 2048. |
| max_iters | int | None | (Optional) Maximum number of training iterations. When None, the server uses 100. |
| tolerance | float | None | (Optional) Convergence tolerance for training completion. When None, the server uses 1e-6. |

Training is a compute-intensive operation that may take several seconds to minutes depending on the index size and configuration.
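
To give a feel for what max_iters and tolerance typically control in clustering-based training, here is a minimal sketch of standard Lloyd's k-means in plain Python. This page does not specify the service's actual algorithm, so treat this only as an illustration. Measuring convergence as the largest centroid shift is an assumption, and batch_size is not modeled.

```python
import random


def kmeans(points, n_lists, max_iters=100, tolerance=1e-6, seed=0):
    """Illustrative Lloyd's k-means over a list of equal-length float tuples.

    Not the service's implementation. Returns (centroids, iterations_run).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_lists)]
    for iteration in range(1, max_iters + 1):
        # Assignment step: group each point under its nearest centroid.
        clusters = [[] for _ in range(n_lists)]
        for p in points:
            nearest = min(
                range(n_lists),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        shift = 0.0
        for c, members in enumerate(clusters):
            if not members:
                continue  # leave the centroid of an empty cluster unchanged
            new = [sum(dim) / len(members) for dim in zip(*members)]
            moved = sum((a - b) ** 2 for a, b in zip(new, centroids[c])) ** 0.5
            shift = max(shift, moved)
            centroids[c] = new
        # Assumed stopping rule: no centroid moved more than `tolerance`.
        if shift <= tolerance:
            return centroids, iteration
    return centroids, max_iters
```

In a loop like this, a smaller tolerance or a larger max_iters allows more refinement passes at the cost of more compute, which is the trade-off the parameters above expose.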

Returns

None

Exceptions

  • Throws if the API request fails due to network connectivity issues.
  • Throws if authentication fails (invalid API key).
  • Throws if the encryption key is invalid for the specified index.
  • Throws if there are insufficient resources to complete training.
  • Throws if the index has no vectors to train on.
  • Throws if the index configuration is incompatible with training.
  • Throws if training parameters are out of valid ranges.
  • Throws if training fails to converge within the specified parameters.
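
The failure cases above can be handled with an ordinary try/except. The following is a minimal sketch under stated assumptions: try_train is a hypothetical wrapper, and it catches the broad Exception class because this page does not name concrete exception types.

```python
import logging

logger = logging.getLogger(__name__)


def try_train(index, **train_params):
    """Call index.train(**train_params); return True on success, False on failure.

    Hypothetical wrapper, not part of the SDK.
    """
    try:
        index.train(**train_params)
    except Exception as exc:  # broad on purpose: exception types are not listed here
        logger.error("Index training failed: %s", exc)
        return False
    return True
```

A caller might use try_train(index, n_lists=100) and fall back to server defaults, or surface the logged message, when it returns False.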

Example Usage

Basic Index Training

# Train the index after adding the data
index.train()

Custom Training Parameters

# Train with custom parameters for large dataset
index.train(
    n_lists=100,        # Number of inverted lists
    batch_size=4096,    # Larger batches for better performance
    max_iters=200,      # More iterations for better convergence
    tolerance=1e-7      # Stricter convergence criteria
)