Index configuration is handled automatically by default. This guide shows how to override those defaults to customize index behavior and performance characteristics.
CyborgDB offers three index types, each with different trade-offs between speed, recall, and index size:
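| Index Type | Speed   | Recall  | Index Size |
|------------|---------|---------|------------|
| IVFFlat    | Fast    | Highest | Biggest    |
| IVFPQ      | Fast    | High    | Medium     |
| IVF        | Fastest | Lowest  | Smallest   |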
Generally speaking, we recommend starting with IVFFlat, which provides the highest recall (accuracy) along with high indexing and retrieval speeds. For most use cases, this index type scales well. If minimizing index size is important, IVFPQ builds on IVFFlat by applying Product Quantization (PQ, a form of lossy compression) to the vector embeddings, which can greatly reduce index size.

IVFFlat Index Type

The IVFFlat index type improves significantly on IVF by storing encrypted vector embeddings in the index. In addition to selecting the closest clusters for a query vector, it computes the exact distance between each candidate vector and the query, yielding very high recall (potentially above 99%). This comes at the cost of index size and some search speed. We recommend IVFFlat indexes for most applications, as they provide the highest recall, and index size constraints can be mitigated later by moving to IVFPQ. To create an IVFFlat index, you can use its configuration constructor:
from cyborgdb import Client, IndexIVFFlat

# Create a client
client = Client(
    base_url='http://localhost:8000', 
    api_key='your-api-key'
)

# Create the IVFFlat index config
index_config = IndexIVFFlat()

# Generate encryption key and create the index
index_name = "test_index"
index_key = client.generate_key()  # Generate secure 32-byte key
index = client.create_index(
    index_name=index_name, 
    index_key=index_key, 
    index_config=index_config
)
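
The two-stage search described for IVFFlat (select the closest clusters, then compute exact distances for the candidates) can be illustrated with a small plain-Python sketch. This is a toy, unencrypted model of the general technique, not CyborgDB's implementation; the function names and the `n_probes` parameter are assumptions made for illustration only.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ivf_flat_search(query, centroids, clusters, n_probes=1, top_k=3):
    """Toy IVF search with exact re-ranking (plaintext, illustrative only).

    centroids: list of cluster centre vectors
    clusters:  clusters[i] holds the (vector_id, vector) pairs assigned to centroids[i]
    n_probes:  hypothetical name for the number of clusters to scan
    """
    # Stage 1: rank clusters by centroid distance and keep the closest n_probes.
    nearest = sorted(
        range(len(centroids)),
        key=lambda i: squared_euclidean(query, centroids[i]),
    )[:n_probes]

    # Stage 2: compute the exact distance to every stored vector in those clusters.
    candidates = [
        (squared_euclidean(query, vec), vec_id)
        for i in nearest
        for vec_id, vec in clusters[i]
    ]
    candidates.sort()
    return [vec_id for _, vec_id in candidates[:top_k]]


centroids = [[0.0, 0.0], [10.0, 10.0]]
clusters = [
    [("a", [0.0, 1.0]), ("b", [1.0, 1.0]), ("c", [2.0, 2.0])],
    [("d", [9.0, 9.0]), ("e", [10.0, 11.0])],
]

result = ivf_flat_search([0.9, 1.2], centroids, clusters, n_probes=1, top_k=2)
print(result)  # ['b', 'a']
```

Because the stored vectors are available for the second stage, the final ranking within the scanned clusters is exact; any recall loss comes only from clusters that were not scanned.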

IVFPQ Index Type

The IVFPQ index type is only supported in paid plans.
The IVFPQ index type builds upon IVFFlat by applying Product Quantization (PQ), a form of lossy compression, to reduce index size. When tuned correctly, IVFPQ indexes can maintain high recall (>95%) while reducing index size significantly (2-4x). We recommend IVFPQ indexes for mature applications, where the dataset and query distributions are well established, because IVFPQ requires the most tuning to reach a good balance between recall and index size. It is possible to convert an index from IVFFlat to IVFPQ, but not the reverse. To create an IVFPQ index, you can use its configuration constructor:
from cyborgdb import Client, IndexIVFPQ

# Create a client
client = Client(
    base_url='http://localhost:8000', 
    api_key='your-api-key'
)

# Set index parameters
pq_dim = 32 # Dimension must be divisible by pq_dim
pq_bits = 8 # Number of bits for each pq dimension

# Create the IVFPQ index config
index_config = IndexIVFPQ(pq_dim, pq_bits)

# Generate encryption key and create the index
index_name = "test_index"
index_key = client.generate_key()  # Generate secure 32-byte key

index = client.create_index(
    index_name=index_name, 
    index_key=index_key, 
    index_config=index_config
)
pq_dim is the number of dimensions of each vector after product quantization. It must be between 1 and dimension (the dimensionality of the original embeddings), and dimension must be evenly divisible by pq_dim. Lower pq_dim values yield smaller indexes but lower recall. pq_bits is the number of bits used to represent each dimension of the product-quantized embeddings. It must be between 1 and 16; lower values yield smaller indexes but lower recall.
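
The constraints on pq_dim and pq_bits can be checked before an index is created. The sketch below is a standalone helper written for this guide, not part of the cyborgdb package; the function names and the example dimension of 768 are assumptions. The bit count it reports covers only the quantized codes of one vector and says nothing about total index size, which includes other overhead.

```python
def check_pq_params(dimension, pq_dim, pq_bits):
    """Raise ValueError if the parameters violate the documented constraints."""
    if not 1 <= pq_dim <= dimension:
        raise ValueError(f"pq_dim must be between 1 and {dimension}, got {pq_dim}")
    if dimension % pq_dim != 0:
        raise ValueError(f"dimension {dimension} is not divisible by pq_dim {pq_dim}")
    if not 1 <= pq_bits <= 16:
        raise ValueError(f"pq_bits must be between 1 and 16, got {pq_bits}")


def valid_pq_dims(dimension):
    """All pq_dim values that divide the embedding dimension evenly."""
    return [d for d in range(1, dimension + 1) if dimension % d == 0]


def pq_code_bits(pq_dim, pq_bits):
    """Bits used by the quantized codes of a single vector (codes only)."""
    return pq_dim * pq_bits


dimension = 768  # hypothetical embedding dimension
check_pq_params(dimension, pq_dim=32, pq_bits=8)  # passes silently
print(pq_code_bits(32, 8))   # 256
print(valid_pq_dims(12))     # [1, 2, 3, 4, 6, 12]
```

Listing the valid divisors first makes it easier to pick a few pq_dim candidates to benchmark for recall against index size.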

IVF Index Type

The IVF index type is only supported in paid plans.
The IVF index type (Inverted File Index) is the simplest index offered by CyborgDB. We recommend IVF indexes for applications that require high-speed, low-latency search and can tolerate lower recall (or where top_k is rather large, i.e. >500). To create an IVF index, you can use its configuration constructor:
from cyborgdb import Client, IndexIVF

# Create a client
client = Client(
    base_url='http://localhost:8000', 
    api_key='your-api-key'
)

# Create the IVF index config
index_config = IndexIVF()

# Generate encryption key and create the index
index_name = "test_index"
index_key = client.generate_key()  # Generate secure 32-byte key

index = client.create_index(
    index_name=index_name, 
    index_key=index_key, 
    index_config=index_config
)

Customizing Distance Metrics

By default, CyborgDB uses Euclidean distance as its metric for all index types. You can override this default by passing a metric parameter when you create the index, whichever index type you use. For example:
# Example with cosine similarity
index_config = IndexIVFFlat()

index = client.create_index(
    index_name="index_name", 
    index_key=index_key, 
    index_config=index_config, 
    metric="cosine"
)
The currently supported distance metrics are:
  • "cosine": Cosine similarity.
  • "euclidean": Euclidean distance.
  • "squared_euclidean": Squared Euclidean distance.
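
To make the difference between the three metrics concrete, here is a plain-Python sketch of the standard textbook definitions. It is independent of CyborgDB (the function names are chosen for illustration) and is not a statement about how the library computes them internally.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance: the square root of the squared distance."""
    return math.sqrt(squared_euclidean(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a, b = [3.0, 4.0], [6.0, 8.0]
print(euclidean(a, b))          # 5.0
print(squared_euclidean(a, b))  # 25.0
print(cosine_similarity(a, b))  # 1.0
```

The two vectors point in the same direction, so their cosine similarity is 1.0 even though they are 5 units apart. Squared Euclidean distance ranks neighbors in the same order as Euclidean distance while skipping the square root.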

API Reference

For more information on the IndexConfig classes, refer to the API Reference: