# Configure an Encrypted Index

<Info>Index configuration is handled automatically by default. This guide shows how to override those defaults to customize index behavior and performance characteristics.</Info>

CyborgDB uses a single index type: **DiskIVF**, a disk-backed inverted-file index. Rather than choosing between several index variants, you tune one index with a small set of knobs.

DiskIVF retrieves results in two stages: a fast first stage narrows down candidates using a compact Product-Quantized (PQ) representation, then a rerank stage recomputes exact distances against the stored vectors (in `float32` or `float16`) to deliver high recall. This gives you the speed of a quantized index with the accuracy of an exact rerank, all within a single index that scales to disk.
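
A toy, pure-Python sketch of this retrieve-then-rerank flow follows. It is not CyborgDB's implementation: coarse grid rounding stands in for real PQ codes, the scan is brute-force rather than partitioned into inverted lists, and the helper names (`coarse`, `two_stage_search`) are made up for illustration.

```python
import random

def coarse(vec, step=0.5):
    """Crude stand-in for a PQ code: snap each component to a coarse grid."""
    return [round(x / step) * step for x in vec]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_stage_search(query, vectors, top_k=3, rerank_mult=10):
    # A real index would precompute and store the compact codes.
    codes = [coarse(v) for v in vectors]
    # Stage 1: keep the rerank_mult * top_k best candidates by approximate distance.
    n_candidates = min(len(vectors), rerank_mult * top_k)
    candidates = sorted(
        range(len(vectors)), key=lambda i: sq_dist(query, codes[i])
    )[:n_candidates]
    # Stage 2: rerank only those candidates by exact distance.
    reranked = sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))
    return reranked[:top_k]

rng = random.Random(0)
vectors = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(500)]
query = [rng.uniform(-1, 1) for _ in range(8)]

approx_then_exact = two_stage_search(query, vectors)
exact = sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:3]
print("two-stage:", approx_then_exact)
print("exact:    ", exact)
```

Widening the stage-1 candidate set makes it more likely that the exact rerank sees every true neighbor, at the cost of more exact distance computations.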

The `IVFFlat`, `IVFPQ`, and `IVFSQ` variants are no longer separate choices; a single DiskIVF index covers all of their use cases. You configure it at **creation time** (dimension, storage precision, metric), at **training time** (clustering parameters), and at **query time** (`n_probes`, `rerank_mult`).

***

## Creating a DiskIVF Index

The most common path is to let CyborgDB choose sensible defaults. You only need an index name and a 32-byte key; the `dimension` is auto-detected from your first upsert (or derived from `embedding_model` if you provide one).

<CodeGroup>
  ```python Python icon="python" theme={null}
  import cyborgdb_core as cyborgdb
  import secrets

  api_key = "your_api_key_here"  # Replace with your CyborgDB API key

  client = cyborgdb.Client(api_key, cyborgdb.StorageConfig.memory())

  index_key = secrets.token_bytes(32)  # 32-byte index KEK

  # Create a DiskIVF index with defaults (dimension auto-detected on first upsert)
  index = client.create_index("test_index", index_key)
  ```

  ```cpp C++ icon="brackets-curly" theme={null}
  #include "cyborgdb_core/client.hpp"
  #include "cyborgdb_core/encrypted_index.hpp"
  #include <array>
  #include <openssl/rand.h>

  std::string api_key = "your_api_key_here";  // Replace with your CyborgDB API key

  cyborg::Client client(api_key, cyborg::StorageConfig::Memory(), 0, cyborg::kNone);

  std::array<uint8_t, 32> index_key;
  RAND_bytes(index_key.data(), index_key.size());  // 32-byte index KEK

  // Create a DiskIVF index with defaults (dimension auto-detected on first upsert)
  auto index = client.CreateIndex("test_index", index_key);
  ```
</CodeGroup>
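
Because the same 32-byte key is needed every time the index is used, it can help to validate its length when loading it from wherever you keep it. The sketch below round-trips a key through a hex string; the helper names (`encode_key`, `decode_key`) are hypothetical and not part of the CyborgDB API, and where the hex string is stored is left to you.

```python
import secrets

KEY_LENGTH = 32  # bytes, as required for the index key

def encode_key(key: bytes) -> str:
    """Hex-encode a key so it can be kept in a text-based secret store."""
    return key.hex()

def decode_key(key_hex: str) -> bytes:
    """Decode a hex string and reject anything that is not exactly 32 bytes."""
    key = bytes.fromhex(key_hex)
    if len(key) != KEY_LENGTH:
        raise ValueError(f"expected {KEY_LENGTH} bytes, got {len(key)}")
    return key

index_key = secrets.token_bytes(KEY_LENGTH)
restored = decode_key(encode_key(index_key))
print(restored == index_key)
```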

### Creation Parameters

You can override the defaults at creation time:

* `dimension`: vector dimensionality. Optional — auto-detected from the first upsert, or derived from `embedding_model` if provided.
* `storage_precision`: the on-disk dtype used for the rerank vectors. `float32` (default) gives the highest recall; `float16` roughly **halves disk footprint** with a slight precision loss. Acceptable values are `numpy.float32` / `numpy.float16` (or the strings `"float32"` / `"float16"`) in Python, and `StoragePrecision::Float32` / `StoragePrecision::Float16` in C++.
* `embedding_model`: an optional [`sentence-transformers`](https://www.sbert.net/) model name (Python only) that enables automatic embedding generation and fixes the dimension.
* `metric`: the distance metric — `"euclidean"` (default), `"cosine"`, or `"squared_euclidean"`.

<CodeGroup>
  ```python Python icon="python" theme={null}
  import cyborgdb_core as cyborgdb
  import numpy as np
  import secrets

  index_key = secrets.token_bytes(32)

  # Create a DiskIVF index with explicit configuration
  # (reuses `client` from the previous example)
  index = client.create_index(
      "test_index",
      index_key,
      dimension=768,
      storage_precision=np.float16,   # halve disk footprint vs. float32
      metric="cosine"
  )
  ```

  ```cpp C++ icon="brackets-curly" theme={null}
  #include "cyborgdb_core/client.hpp"
  #include "cyborgdb_core/encrypted_index.hpp"

  // Build a DiskIVF config: dimension 768, float16 rerank storage
  // (reuses `client` and `index_key` from the previous example)
  cyborg::IndexDiskIVF index_config(
      /*dimension=*/768,
      /*embedding_model=*/"",
      cyborg::StoragePrecision::Float16);   // halve disk footprint vs. Float32

  // Create the index with a cosine metric
  auto index = client.CreateIndex(
      "test_index", index_key, index_config, cyborg::DistanceMetric::Cosine);
  ```
</CodeGroup>

<Tip>Use `float16` storage precision when disk footprint matters more than the last fraction of a percent of recall. For most workloads the recall difference is negligible.</Tip>
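
To put rough numbers on that trade-off, the sketch below estimates the raw size of the stored rerank vectors and shows the rounding that half precision introduces. It is a back-of-the-envelope calculation only: it assumes storage is dominated by `n_vectors * dimension * bytes_per_value` and ignores PQ codes, metadata, and any encryption overhead. The function names are illustrative.

```python
import struct

def rerank_storage_bytes(n_vectors, dimension, precision="float32"):
    """Raw bytes needed for the rerank vectors at the given precision."""
    fmt = {"float32": "<f", "float16": "<e"}[precision]
    return n_vectors * dimension * struct.calcsize(fmt)

def roundtrip_float16(x):
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

full = rerank_storage_bytes(1_000_000, 768, "float32")  # 3,072,000,000 bytes
half = rerank_storage_bytes(1_000_000, 768, "float16")  # 1,536,000,000 bytes
print(f"float32: {full / 2**30:.2f} GiB, float16: {half / 2**30:.2f} GiB")

value = 0.1234567
print(f"{value} stored as float16 becomes {roundtrip_float16(value)}")
```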

***

## Training Parameters

For datasets larger than \~50,000 vectors, you should train the index to build the IVF clustering. Training accepts several optional tuning parameters:

* `n_lists`: the number of clusters (inverted lists). `0` (default) auto-selects a value based on the dataset size. More lists make each list smaller (faster, finer-grained search) but require a higher `n_probes` at query time to maintain recall.
* `max_iters`: maximum k-means iterations (default `100`).
* `tolerance`: convergence tolerance for k-means (default `1e-6`).
* `max_memory`: a soft cap (in MB) on memory used during training; `0` (default) means no limit.
* `batch_size`: training batch size; `0` (default) lets CyborgDB choose automatically.

<CodeGroup>
  ```python Python icon="python" theme={null}
  # Train the index with custom clustering parameters
  index.train(
      n_lists=4096,
      max_iters=100,
      tolerance=1e-6,
      max_memory=0      # no limit
  )
  ```

  ```cpp C++ icon="brackets-curly" theme={null}
  // Train the index with custom clustering parameters
  cyborg::TrainingConfig training_config(
      /*n_lists=*/4096,
      /*batch_size=*/0,     // auto
      /*max_iters=*/100,
      /*tolerance=*/1e-6,
      /*max_memory=*/0);    // no limit

  index->TrainIndex(training_config, index_key);
  ```
</CodeGroup>
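
To make `max_iters` and `tolerance` concrete, here is a minimal sketch of textbook k-means (Lloyd's algorithm) in pure Python. It is a generic illustration, not CyborgDB's training code: in particular, it assumes convergence is measured as the total squared movement of the centroids, which this guide does not specify.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, n_lists, max_iters=100, tolerance=1e-6, seed=0):
    rng = random.Random(seed)
    # Initialize centroids from randomly chosen input points.
    centroids = [list(p) for p in rng.sample(points, n_lists)]
    lists = [[] for _ in range(n_lists)]
    iters_run = 0
    for _ in range(max_iters):
        iters_run += 1
        # Assignment step: put each point in the list of its nearest centroid.
        lists = [[] for _ in range(n_lists)]
        for p in points:
            nearest = min(range(n_lists), key=lambda c: sq_dist(p, centroids[c]))
            lists[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        shift = 0.0
        for c, members in enumerate(lists):
            if not members:
                continue  # leave the centroid of an empty list where it is
            new = [sum(col) / len(members) for col in zip(*members)]
            shift += sq_dist(new, centroids[c])
            centroids[c] = new
        # Stop early once the centroids have (almost) stopped moving.
        if shift <= tolerance:
            break
    return centroids, lists, iters_run

rng = random.Random(1)
points = [
    [rng.gauss(cx, 0.1), rng.gauss(cy, 0.1)]
    for cx, cy in [(0, 0), (5, 5)]
    for _ in range(100)
]
centroids, lists, iters_run = kmeans(points, n_lists=2)
print("iterations:", iters_run, "list sizes:", [len(l) for l in lists])
```

A looser `tolerance` or a lower `max_iters` ends training sooner and may leave the clusters less refined.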

For more on the training lifecycle, see [Training an Encrypted Index](../encrypted-indexes/train-index).

***

## Query-Time Parameters

DiskIVF exposes two knobs at query time that trade recall against latency:

* `n_probes`: how many clusters to search per query. Higher values increase recall at some latency cost. `0` (default) auto-selects based on `n_lists`.
* `rerank_mult`: the stage-1 retrieval multiplier. CyborgDB first retrieves `rerank_mult * top_k` candidates using the compact PQ representation, then reranks them against the stored `float32`/`float16` vectors. Higher values improve recall at some latency cost (default `10`).

<CodeGroup>
  ```python Python icon="python" theme={null}
  # Tune recall vs. latency at query time
  # (4-dimensional query shown for brevity; its length must match the index dimension)
  results = index.query(
      query_vectors=[0.5, 0.9, 0.2, 0.7],
      top_k=10,
      n_probes=32,
      rerank_mult=10
  )
  ```

  ```cpp C++ icon="brackets-curly" theme={null}
  // Tune recall vs. latency at query time
  cyborg::Array2D<float> query_vectors{{0.5, 0.9, 0.2, 0.7}};
  cyborg::QueryParams query_params(
      /*top_k=*/10,
      /*n_probes=*/32,
      /*filters=*/"",
      /*include=*/{},
      /*greedy=*/false,
      /*rerank_mult=*/10);

  cyborg::QueryResults results = index->Query(query_vectors, query_params, index_key);
  ```
</CodeGroup>
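
The effect of the two knobs can be sketched with simple arithmetic and a toy probe selection. This is an illustration of the general IVF idea (search only the lists whose centroids are nearest the query), under the assumption that DiskIVF selects lists this way; the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def probe_lists(query, centroids, n_probes):
    """Return the indices of the n_probes centroids closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    return order[:n_probes]

def stage1_candidates(top_k, rerank_mult):
    """Number of candidates retrieved before the exact rerank."""
    return rerank_mult * top_k

# Four toy centroids on the corners of the unit square.
centroids = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.9, 0.2]
probed = probe_lists(query, centroids, n_probes=2)
print("lists probed:", probed)  # [1, 3]

# With the values used above: top_k=10, rerank_mult=10, n_probes=32, n_lists=4096.
print("stage-1 candidates:", stage1_candidates(top_k=10, rerank_mult=10))  # 100
fraction_scanned = 32 / 4096
print("fraction of lists scanned:", fraction_scanned)  # 0.0078125
```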

***

## Customizing Distance Metrics

By default, CyborgDB uses `euclidean` distance. You can override this by providing a `metric` parameter at index creation:

<CodeGroup>
  ```python Python icon="python" theme={null}
  # Existing setup ...

  index = client.create_index(
      "index_name",
      index_key,
      metric="cosine"
  )
  ```

  ```cpp C++ icon="brackets-curly" theme={null}
  // Existing setup ...

  auto index = client.CreateIndex("index_name", index_key, cyborg::DistanceMetric::Cosine);
  ```
</CodeGroup>

The currently supported distance metrics are:

* `"cosine"`: Cosine similarity.
* `"euclidean"`: Euclidean distance (default).
* `"squared_euclidean"`: Squared Euclidean distance.
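
For reference, the standard definitions of these three metrics can be written in a few lines of Python. These are the textbook formulas, not CyborgDB's internal code, and this guide does not state whether `"cosine"` results are reported as a similarity or as a distance such as `1 - similarity`.

```python
import math

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(squared_euclidean(a, b))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 1.0]  # orthogonal to a

print(euclidean(a, b), cosine_similarity(a, b))  # 1.0 1.0
print(squared_euclidean(a, c), cosine_similarity(a, c))  # 2.0 0.0

# Euclidean and squared Euclidean distance always produce the same ranking,
# because the square root is monotonic.
origin = [0.0, 0.0]
points = [[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]
by_l2 = sorted(points, key=lambda p: euclidean(origin, p))
by_sq = sorted(points, key=lambda p: squared_euclidean(origin, p))
print(by_l2 == by_sq)  # True
```

Cosine similarity ignores vector length (`a` and `b` score 1.0 despite being a distance of 1.0 apart), which is why it is a common choice for text embeddings.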

***

## API Reference

For more information on configuring an encrypted index, refer to the API Reference:

<CardGroup cols={2}>
  <Card title="Python API Reference" href="../../python/types" icon="python">
    API reference for `StoragePrecision` and index types in Python
  </Card>

  <Card title="C++ API Reference" href="../../cpp/types" icon="brackets-curly">
    API reference for `IndexDiskIVF` and `StoragePrecision` in C++
  </Card>
</CardGroup>
