Index configuration is handled automatically by default. This guide shows how to override those defaults to customize index behavior and performance characteristics.
CyborgDB offers three index types, each with different trade-offs:
Index Type   Speed     Recall    Index Size
IVFFlat      Fast      Highest   Biggest
IVFPQ        Fast      High      Medium
IVF          Fastest   Lowest    Smallest
Generally speaking, we recommend starting with IVFFlat, which provides the highest recall (accuracy) with fast indexing and retrieval. For most use cases, this index type will scale well. If minimizing index size is important, IVFPQ builds on IVFFlat by applying Product Quantization (PQ, a form of lossy compression) to the vector embeddings, which can greatly reduce index size.

IVFFlat Index Type

The IVFFlat index type improves significantly on IVF by storing encrypted vector embeddings in the index. After selecting the closest clusters for a query vector, it computes the exact distance between each candidate vector and the query, yielding very high recall rates (above 99%). This comes at the cost of a larger index and somewhat slower search. We recommend IVFFlat indexes for most applications, as they provide the highest recall rates, and index size constraints can be mitigated later by converting to IVFPQ. To create an IVFFlat index, you can use its configuration constructor:
import cyborgdb_core as cyborgdb
# or import cyborgdb_lite as cyborgdb

# Assumes `client` is an existing CyborgDB client instance

# Create the IVFFlat index config
index_config = cyborgdb.IndexIVFFlat()

# Create the index
index_name = "test_index"
index_key = bytes([0] * 32)  # Set your private key here
index = client.create_index(
    index_name=index_name,
    index_key=index_key,
    index_config=index_config
)
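To make the two-stage search described above concrete, here is a minimal conceptual sketch in plain Python. It is not CyborgDB's implementation (which operates on encrypted data); the function name `ivf_flat_search`, the `n_probes` parameter, and the toy data are all hypothetical and exist only for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ivf_flat_search(query, centroids, clusters, n_probes=1, top_k=3):
    """Toy IVF-Flat search: probe the nearest clusters, then rank exactly.

    centroids: list of centroid vectors, indexed by cluster id
    clusters:  dict mapping cluster id -> list of (vector_id, vector)
    """
    # Stage 1: keep the n_probes clusters whose centroids are closest.
    probe_ids = sorted(
        range(len(centroids)),
        key=lambda i: squared_l2(query, centroids[i]),
    )[:n_probes]

    # Stage 2: compute the exact distance to every stored candidate vector.
    candidates = [
        (squared_l2(query, vec), vec_id)
        for cid in probe_ids
        for vec_id, vec in clusters[cid]
    ]
    candidates.sort()
    return [vec_id for _, vec_id in candidates[:top_k]]


# Hypothetical toy data: two clusters in 2D.
centroids = [(0.0, 0.0), (10.0, 10.0)]
clusters = {
    0: [("a", (0.1, 0.2)), ("b", (1.0, 1.0))],
    1: [("c", (9.5, 10.0)), ("d", (11.0, 10.5))],
}

print(ivf_flat_search((0.0, 0.1), centroids, clusters, n_probes=1, top_k=2))
```

The second stage is what distinguishes IVFFlat from plain IVF in this sketch: because the stored vectors are available, candidates can be ranked by exact distance rather than by cluster membership alone.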

IVFPQ Index Type

The IVFPQ index type is not supported in cyborgdb_lite. For more details, see CyborgDB Lite.
The IVFPQ index type builds upon IVFFlat by applying Product Quantization (PQ), a form of lossy compression, to reduce the index size. When tuned correctly, IVFPQ indexes can maintain high recall (>95%) while reducing index size significantly (2-4x). We recommend IVFPQ indexes for mature applications, where the dataset and query distributions are well established, because IVFPQ requires the most tuning to reach a good balance between recall and index size. It is possible to convert an existing index from IVFFlat to IVFPQ, but not the other way around. To create an IVFPQ index, you can use its configuration constructor:
import cyborgdb_core as cyborgdb
# Note: IVFPQ is not supported in cyborgdb_lite

# Set index parameters
pq_dim = 32  # Dimension must be divisible by pq_dim
pq_bits = 8  # Number of bits for each pq dimension

# Create the IVFPQ index config
index_config = cyborgdb.IndexIVFPQ(pq_dim, pq_bits)

# Create the index
index_name = "test_index"
index_key = bytes([0] * 32) # Set your private key here
index = client.create_index(
    index_name=index_name,
    index_key=index_key,
    index_config=index_config
)
pq_dim is the number of dimensions of each vector after product quantization. It must be between 1 and dimension, and dimension must be evenly divisible by pq_dim. Lower pq_dim values yield smaller index sizes but lower recall. pq_bits is the number of bits used to represent each dimension of the product-quantized vector embeddings. It must be between 1 and 16, with lower values yielding smaller index sizes but lower recall.
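The constraints above can be checked before creating an index. The following is a small sketch in plain Python; `validate_pq_params` and `valid_pq_dims` are hypothetical helper names and not part of the CyborgDB API. They only encode the rules stated in this section.

```python
def validate_pq_params(dimension, pq_dim, pq_bits):
    """Raise ValueError if the parameters break the rules stated above."""
    if not 1 <= pq_dim <= dimension:
        raise ValueError(f"pq_dim must be between 1 and {dimension}, got {pq_dim}")
    if dimension % pq_dim != 0:
        raise ValueError(f"dimension {dimension} is not divisible by pq_dim {pq_dim}")
    if not 1 <= pq_bits <= 16:
        raise ValueError(f"pq_bits must be between 1 and 16, got {pq_bits}")


def valid_pq_dims(dimension):
    """List every pq_dim value that evenly divides the given dimension."""
    return [d for d in range(1, dimension + 1) if dimension % d == 0]


# Example with a hypothetical 128-dimensional embedding.
validate_pq_params(dimension=128, pq_dim=32, pq_bits=8)  # passes silently
print(valid_pq_dims(128))
```

Listing the valid pq_dim values up front can help when tuning, since only divisors of the embedding dimension are candidates.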

IVF Index Type

The IVF index type is not supported in cyborgdb_lite. For more details, see CyborgDB Lite.
The IVF index type (Inverted File Index) is the simplest offered by CyborgDB. We recommend IVF indexes for applications that require high-speed, low-latency search and can tolerate lower recall (or where top_k is rather large, i.e. above 500). To create an IVF index, you can use its configuration constructor:
import cyborgdb_core as cyborgdb
# Note: IVF is not supported in cyborgdb_lite

# Create the IVF index config
index_config = cyborgdb.IndexIVF()

# Create the index
index_name = "test_index"
index_key = bytes([0] * 32) # Set your private key here
index = client.create_index(
    index_name=index_name,
    index_key=index_key,
    index_config=index_config
)

Customizing Distance Metrics

By default, CyborgDB uses Euclidean distance as its metric for all index types. You can override this default by passing a metric parameter when creating the index. For example:
# Existing setup ...

index_config = cyborgdb.IndexIVFFlat()

index = client.create_index(
    index_name="index_name",
    index_key=index_key,
    index_config=index_config,
    metric="euclidean"  # or "cosine", "squared_euclidean"
)
The currently supported distance metrics are:
  • "cosine": Cosine similarity.
  • "euclidean": Euclidean distance.
  • "squared_euclidean": Squared Euclidean distance.
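For reference, the three metrics can be sketched using their standard mathematical definitions in plain Python. This is only an illustration; how CyborgDB computes them internally is not described here, and the function names are hypothetical.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance: the square root of the squared distance."""
    return math.sqrt(squared_euclidean(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


u, v = (0.0, 0.0), (3.0, 4.0)
print(squared_euclidean(u, v), euclidean(u, v))
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))
```

Squared Euclidean distance ranks neighbors in the same order as Euclidean distance, since the square root is monotonic, while skipping the square root. Cosine similarity ignores vector magnitude and compares direction only.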

API Reference

For more information on the IndexConfig classes, refer to the API Reference: