DBConfig

The DBConfig class specifies the storage location for the index, with options for in-memory storage, databases, or file-based storage.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| location | string | - | DB location (redis, postgres, memory) |
| table_name | string | None | (Optional) Table name (postgres-only) |
| connection_string | string | None | (Optional) Connection string to access DB. |

The supported location options are:
  • "redis": Use for high-speed, in-memory storage (recommended for index_location).
  • "postgres": Use for reliable, SQL-based storage (recommended for config_location).
  • "memory": Use for temporary in-memory storage (for benchmarking and evaluation purposes).

Example Usage

import cyborgdb_core as cyborgdb

index_location = cyborgdb.DBConfig(
    location="redis",
    connection_string="redis://localhost"
)

config_location = cyborgdb.DBConfig(
    location="postgres",
    table_name="config_table",
    connection_string="host=localhost dbname=postgres"
)

For more information, see the documentation on supported backing stores.
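
The constraints in the parameter table can be checked before a DBConfig is constructed. The sketch below is a hypothetical helper (db_config_kwargs is not part of cyborgdb_core) that validates location against the three documented options and returns a dict of keyword arguments. Rejecting table_name for non-postgres backends is an assumption based on the "postgres-only" note; the library itself may behave differently.

```python
SUPPORTED_LOCATIONS = {"redis", "postgres", "memory"}


def db_config_kwargs(location, table_name=None, connection_string=None):
    """Hypothetical helper: validate arguments and return kwargs for DBConfig."""
    if location not in SUPPORTED_LOCATIONS:
        raise ValueError(f"unsupported location: {location!r}")
    # Assumption: treat table_name on a non-postgres backend as a mistake.
    if table_name is not None and location != "postgres":
        raise ValueError("table_name is documented as postgres-only")

    kwargs = {"location": location}
    if table_name is not None:
        kwargs["table_name"] = table_name
    if connection_string is not None:
        kwargs["connection_string"] = connection_string
    return kwargs
```

The returned dict uses only the documented parameter names, so it could be unpacked as cyborgdb.DBConfig(**kwargs).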

GPUConfig

The GPUConfig class configures which operations should use GPU acceleration. GPU acceleration requires CUDA support.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| upsert | bool | False | (Optional) Enable GPU for upsert operations |
| train | bool | False | (Optional) Enable GPU for training operations |

The query parameter is not available in the constructor. To enable GPU for query operations, you must enable both upsert=True and train=True, or use the bitflag operations directly in C++.

Properties (Read-Only)

| Property | Type | Description |
| --- | --- | --- |
| upsert | bool | Whether GPU is enabled for upsert operations |
| train | bool | Whether GPU is enabled for training operations |
| query | bool | Whether GPU is enabled for query operations |
| all | bool | Whether all GPU operations are enabled |
| none | bool | Whether no GPU operations are enabled |

Example Usage

import cyborgdb_core as cyborgdb

# Enable GPU for upsert and training (per the note above, this also enables query)
gpu_config1 = cyborgdb.GPUConfig(upsert=True, train=True)

# Enable GPU only for training
gpu_config2 = cyborgdb.GPUConfig(train=True)

# Disable GPU (default)
gpu_config3 = cyborgdb.GPUConfig()

# Check GPU configuration
if gpu_config1.all:
    print("All GPU operations enabled")

# With only train=True, query is not expected to be enabled (see the note above)
if gpu_config2.train and not gpu_config2.query:
    print("GPU enabled for training only")
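
The note above says that query acceleration follows from enabling both upsert and train, and mentions bitflags on the C++ side. The sketch below models that rule with Python's enum.IntFlag. The flag names, bit values, and the gpu_flags helper are assumptions made for illustration; they are not taken from the library.

```python
from enum import IntFlag


class GpuOps(IntFlag):
    """Hypothetical bitflag layout; the real values may differ."""
    NONE = 0
    UPSERT = 1
    TRAIN = 2
    QUERY = 4
    ALL = UPSERT | TRAIN | QUERY


def gpu_flags(upsert=False, train=False):
    """Model the documented rule: query is on only when upsert and train are both on."""
    flags = GpuOps.NONE
    if upsert:
        flags |= GpuOps.UPSERT
    if train:
        flags |= GpuOps.TRAIN
    if upsert and train:
        flags |= GpuOps.QUERY
    return flags


both = gpu_flags(upsert=True, train=True)
train_only = gpu_flags(train=True)
print(both == GpuOps.ALL)                 # all three operations set
print(bool(train_only & GpuOps.QUERY))    # query bit not set
```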

DistanceMetric

DistanceMetric is a string representing the distance metric used for the index. Options include:
  • "cosine": Cosine similarity.
  • "euclidean": Euclidean distance.
  • "squared_euclidean": Squared Euclidean distance.
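
For reference, the three metrics can be written out in a few lines of plain Python. This is a minimal sketch of the standard definitions, assuming equal-length, non-zero vectors; the function names are illustrative and it does not show how the library computes them internally.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Squared Euclidean distance ranks neighbors in the same order as Euclidean distance while skipping the square root, which is why it is commonly offered as a separate option.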

IndexConfig

The IndexConfig class defines the parameters for the type of index to be created. Each index type (e.g., ivf, ivfflat, ivfpq) has unique configuration options:

IndexIVF

Ideal for large-scale datasets where fast retrieval is prioritized over high recall:
| Speed | Recall | Index Size |
| --- | --- | --- |
| Fastest | Lowest | Smallest |

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| dimension | int | 0 | (Optional) Dimensionality of vector embeddings. Auto-detected if 0. |

Properties (Read-Only)

| Property | Type | Description |
| --- | --- | --- |
| n_lists | int | Number of inverted lists (coarse clusters). Set internally during training, initially 1. |
| dimension | int | Dimensionality of vector embeddings. |
| metric | str | Distance metric used. |
| index_type | str | Returns "ivf". |

Example Usage

import cyborgdb_core as cyborgdb

# Basic configuration with auto-detection
index_config = cyborgdb.IndexIVF()

# Explicit dimension configuration
index_config = cyborgdb.IndexIVF(dimension=128)

# Access read-only properties
print(f"n_lists: {index_config.n_lists}")  # Will show 1 (default)
print(f"dimension: {index_config.dimension}")  # Will show 128
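
The n_lists property counts inverted lists (coarse clusters). As a generic illustration of that idea, and not a description of CyborgDB's internals, the sketch below assigns each vector to its nearest centroid and then searches only the lists closest to the query. The function names and the n_probes parameter are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_lists(vectors, centroids):
    """Map each centroid index to the ids of the vectors nearest to it."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))
        lists[nearest].append(vid)
    return lists


def search(query, vectors, centroids, lists, n_probes=1, top_k=1):
    """Scan only the n_probes lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    candidates = [vid for i in order[:n_probes] for vid in lists[i]]
    return sorted(candidates, key=lambda vid: sq_dist(query, vectors[vid]))[:top_k]


centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(0.0, 1.0), (1.0, 0.0), (9.0, 10.0), (10.0, 9.0)]
lists = build_lists(vectors, centroids)
print(search((9.0, 10.5), vectors, centroids, lists))
```

Probing fewer lists reduces the number of distance computations at the cost of possibly missing neighbors that fall in unprobed lists, which is the speed and recall trade-off the tables in this section summarize.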

IndexIVFFlat

Suitable for applications requiring high recall with less concern for memory usage:
| Speed | Recall | Index Size |
| --- | --- | --- |
| Fast | Highest | Biggest |

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| dimension | int | 0 | (Optional) Dimensionality of vector embeddings. Auto-detected if 0. |

Properties (Read-Only)

| Property | Type | Description |
| --- | --- | --- |
| n_lists | int | Number of inverted lists (coarse clusters). Set internally during training, initially 1. |
| dimension | int | Dimensionality of vector embeddings. |
| metric | str | Distance metric used. |
| index_type | str | Returns "ivfflat". |

Example Usage

import cyborgdb_core as cyborgdb

# Basic configuration with auto-detection
index_config = cyborgdb.IndexIVFFlat()

# Explicit dimension configuration
index_config = cyborgdb.IndexIVFFlat(dimension=128)

# Access read-only properties
print(f"n_lists: {index_config.n_lists}")  # Will show 1 (default)
print(f"dimension: {index_config.dimension}")  # Will show 128
IndexIVFFlat is the default index configuration and is suitable for most use cases.

IndexIVFPQ

Product Quantization compresses embeddings, making it suitable for balancing memory use and recall:
| Speed | Recall | Index Size |
| --- | --- | --- |
| Fast | High | Medium |

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| dimension | int | None | (Optional) Dimensionality of vector embeddings. Auto-detected if not provided. |
| pq_dim | int | - | (Required) Dimensionality of PQ codes after quantization. |
| pq_bits | int | - | (Required) Number of bits per quantizer (between 1 and 16). |

Properties (Read-Only)

| Property | Type | Description |
| --- | --- | --- |
| n_lists | int | Number of inverted lists (coarse clusters). Set internally during training, initially 1. |
| dimension | int | Dimensionality of vector embeddings. |
| metric | str | Distance metric used. |
| index_type | str | Returns "ivfpq". |
| pq_dim | int | Dimensionality of PQ codes after quantization. |
| pq_bits | int | Number of bits per quantizer. |
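
To get a feel for how pq_dim and pq_bits affect index size, the following back-of-the-envelope sketch estimates the bytes used per encoded vector. It assumes float32 source embeddings and tightly packed codes, and it ignores codebooks, IDs, and any other per-index overhead; the helper names are illustrative and not part of cyborgdb_core.

```python
import math


def pq_code_bytes(pq_dim, pq_bits):
    """Bytes needed to store one vector's PQ code (pq_dim codes of pq_bits each)."""
    if not 1 <= pq_bits <= 16:
        raise ValueError("pq_bits must be between 1 and 16")
    return math.ceil(pq_dim * pq_bits / 8)


def compression_ratio(dimension, pq_dim, pq_bits, bytes_per_component=4):
    """Raw vector size divided by PQ code size (float32 input assumed)."""
    return (dimension * bytes_per_component) / pq_code_bytes(pq_dim, pq_bits)


print(pq_code_bytes(64, 8))           # 64 bytes per vector
print(compression_ratio(128, 64, 8))  # 8.0
```

Under these assumptions, a 128-dimensional float32 vector (512 bytes) encoded with pq_dim=64 and pq_bits=8 takes 64 bytes, an 8x reduction.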

Example Usage

import cyborgdb_core as cyborgdb

# Basic configuration (dimension auto-detected)
index_config = cyborgdb.IndexIVFPQ(pq_dim=64, pq_bits=8)

# Explicit configuration
index_config = cyborgdb.IndexIVFPQ(
    dimension=128,
    pq_dim=64, 
    pq_bits=8
)

# Access read-only properties
print(f"n_lists: {index_config.n_lists}")  # Will show 1 (default)
print(f"pq_dim: {index_config.pq_dim}")    # Will show 64
print(f"pq_bits: {index_config.pq_bits}")  # Will show 8
If dimension is not provided, it will be auto-determined based on the first vector embedding added to the index.
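
The auto-detection behavior can be pictured with a small stand-in class. This is a hypothetical sketch, not the library's implementation: DimensionTracker is an invented name, and raising an error on a later mismatch is an assumption about what a reasonable index would do.

```python
class DimensionTracker:
    """Hypothetical stand-in for an index that infers its dimension lazily."""

    def __init__(self, dimension=None):
        # Treat both 0 and None as "auto-detect", matching the documented defaults.
        self.dimension = dimension or None

    def add(self, vector):
        if self.dimension is None:
            # First vector fixes the dimension.
            self.dimension = len(vector)
        elif len(vector) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} components, got {len(vector)}"
            )


tracker = DimensionTracker()
tracker.add([0.1] * 128)
print(tracker.dimension)  # 128
```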