# Train an Encrypted Index

CyborgDB Service stores every encrypted index as a **DiskIVF** index — an inverted-file (IVF) index whose clusters are produced by a training step. Training fits the clustering model to the actual vectors in your index so queries can probe a small number of clusters instead of scanning everything.

In CyborgDB Service, training runs automatically once 10,000 vectors have been upserted. You can also trigger training explicitly once enough vectors have been added, for example when you want to specify training parameters.

<Tip>You can adjust the number of vectors that will trigger automatic training by setting the `RETRAIN_THRESHOLD` environment variable. See more in the [Environment Variables](./env-vars) guide.</Tip>

<CodeGroup>
  ```python Python SDK icon="python" theme={null}
  # Train the encrypted index
  index.train()

  # Or train with a specific number of clusters
  index.train(n_lists=128)
  ```

  ```javascript JavaScript SDK icon="js" theme={null}
  // Train the encrypted index
  await index.train();

  // Or train with a specific number of clusters
  await index.train({ nLists: 128 });
  ```

  ```typescript TypeScript SDK icon="code" theme={null}
  // Train the encrypted index
  await index.train();

  // Or train with a specific number of clusters
  await index.train({ nLists: 128 });
  ```

  ```go Go SDK icon="golang" theme={null}
  // Train the encrypted index
  err := index.Train(context.Background(), cyborgdb.TrainParams{})

  // Or train with a specific number of clusters
  nLists := int32(128)
  err = index.Train(context.Background(), cyborgdb.TrainParams{
      NLists: &nLists,
  })
  ```

  ```bash cURL icon="rectangle-terminal" theme={null}
  curl -X POST "http://localhost:8000/v1/indexes/train" \
       -H "X-API-Key: your-api-key" \
       -H "Content-Type: application/json" \
       -d '{
         "index_name": "my_index",
         "index_key": "your_64_character_hex_key_here"
       }'
  ```
</CodeGroup>

<Tip>The index must contain at least 10,000 vectors or `2 * n_lists` vectors (ingested via `upsert`) before you can call `train`.</Tip>

## Training Parameters

The following parameters customize the training process:

| Parameter    | Type    | Default       | Description                                                                                                                                              |
| ------------ | ------- | ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `n_lists`    | `int`   | `None` (auto) | *(Optional)* Number of inverted index lists to create in the index. When `None` or omitted, auto-determines based on the number of vectors in the index. |
| `batch_size` | `int`   | `None`        | *(Optional)* Number of vectors to process per training batch. When `None`, the server uses 2048.                                                         |
| `max_iters`  | `int`   | `None`        | *(Optional)* Maximum number of training iterations. When `None`, the server uses 100.                                                                    |
| `tolerance`  | `float` | `None`        | *(Optional)* Convergence tolerance for training completion. When `None`, the server uses 1e-6.                                                           |
| `max_memory` | `int`   | `None` (0)    | *(Optional)* Maximum memory usage in MB. When `None` or 0, no memory limit is applied.                                                                   |
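Because every parameter defaults to `None`, one way to use the table is to collect only the values you set and let the server fill in the rest. The sketch below illustrates that idea. `build_train_kwargs` is a hypothetical helper, not part of the CyborgDB SDK, and passing parameters other than `n_lists` to `index.train()` as keyword arguments is an assumption based on the table above.

```python
from typing import Any, Dict, Optional


def build_train_kwargs(
    n_lists: Optional[int] = None,
    batch_size: Optional[int] = None,
    max_iters: Optional[int] = None,
    tolerance: Optional[float] = None,
    max_memory: Optional[int] = None,
) -> Dict[str, Any]:
    """Return only the training parameters that were explicitly set.

    Hypothetical helper: anything left as None is omitted, so the
    server-side defaults described in the table apply.
    """
    candidates = {
        "n_lists": n_lists,
        "batch_size": batch_size,
        "max_iters": max_iters,
        "tolerance": tolerance,
        "max_memory": max_memory,
    }
    return {name: value for name, value in candidates.items() if value is not None}


kwargs = build_train_kwargs(n_lists=128, max_iters=50)
print(kwargs)  # {'n_lists': 128, 'max_iters': 50}
```

With an existing `index`, the result could then be passed along as `index.train(**kwargs)`, assuming the SDK accepts these keyword names.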

`n_lists` is the number of clusters into which the vectors in the index are partitioned. Typically, a higher value yields higher recall but a slower indexing process. As a rule of thumb, `n_lists` should be:

* A power of two (e.g., `2,048`, `4,096`). This is not a requirement, but it enables performance optimizations.
* Roughly between `1/10,000` and `1/100` of the total number of items to be indexed, so that each cluster holds between `100` and `10,000` vectors.

If not specified, CyborgDB will auto-determine the best `n_lists` value based on the number of vectors in the index.
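The rule of thumb above can be turned into a small calculation. The sketch below lists the powers of two that keep the average cluster size between 100 and 10,000 vectors for a given index size. The function names are illustrative only, and this is not how CyborgDB auto-determines `n_lists`.

```python
from typing import List, Tuple

MIN_VECTORS_PER_CLUSTER = 100
MAX_VECTORS_PER_CLUSTER = 10_000


def n_lists_bounds(num_vectors: int) -> Tuple[int, int]:
    """Return the (lowest, highest) n_lists that keep the average
    cluster size between 100 and 10,000 vectors."""
    if num_vectors < 1:
        raise ValueError("num_vectors must be positive")
    # Ceiling division: fewest clusters so that none averages above 10,000.
    lowest = (num_vectors + MAX_VECTORS_PER_CLUSTER - 1) // MAX_VECTORS_PER_CLUSTER
    # Floor division: most clusters so that none averages below 100.
    highest = max(1, num_vectors // MIN_VECTORS_PER_CLUSTER)
    return lowest, highest


def candidate_n_lists(num_vectors: int) -> List[int]:
    """List the powers of two that fall inside the rule-of-thumb range."""
    lowest, highest = n_lists_bounds(num_vectors)
    candidates = []
    power = 1
    while power <= highest:
        if power >= lowest:
            candidates.append(power)
        power *= 2
    return candidates


print(candidate_n_lists(1_000_000))
# [128, 256, 512, 1024, 2048, 4096, 8192]
```

Any value in the returned list satisfies both bullets; which one works best depends on the recall and indexing-speed trade-off described above.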

## Avoid the large-untrained-query warning

While training is technically optional (you can use CyborgDB without ever calling `train`), it is recommended once the index holds a large number of vectors (e.g., `> 50,000`). If you call `query` on an untrained index of that size, you will see the following warning in the console:

```
Warning: querying untrained index with more than 50000 indexed vectors.
```

## API Reference

For more information on training an encrypted index, refer to the API reference:

<CardGroup cols={2}>
  <Card title="REST API Reference" href="../../rest-api/encrypted-index/train" icon="rectangle-terminal">
    REST API reference for `/v1/indexes/train`
  </Card>

  <Card title="Python SDK Reference" href="../../python-sdk/encrypted-index/train" icon="python">
    API reference for `train()` in Python
  </Card>

  <Card title="JS/TS SDK Reference" href="../../js-ts-sdk/encrypted-index/train" icon="js">
    API reference for `train()` in JavaScript/TypeScript
  </Card>

  <Card title="Go SDK Reference" href="../../go-sdk/encrypted-index/train" icon="golang">
    API reference for `Train()` in Go
  </Card>
</CardGroup>
