
DBConfig

The DBConfig class specifies the storage location for the index, with options for in-memory storage, databases, or file-based storage.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| location | string | - | DB location (redis, postgres, memory, rocksdb, threadsafememory) |
| table_name | string | None | (Optional) Table name (postgres-only) |
| connection_string | string | None | (Optional) Connection string to access DB. |

The supported location options are:
  • "rocksdb": Use for persistent local storage with no network dependency (recommended for embedded/local deployments).
  • "redis": Use for high-speed, in-memory storage (recommended for index_location).
  • "postgres": Use for reliable, SQL-based storage (recommended for config_location).
  • "memory": Use for temporary in-memory storage (for benchmarking and evaluation purposes).
  • "threadsafememory": Use for thread-safe in-memory storage (for multi-threaded benchmarking and evaluation).
"memory" is deprecated and will be removed in a future release; use "threadsafememory" instead.
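
The location rules above can be captured in a small pre-flight check run before a DBConfig is constructed. The sketch below is a hypothetical helper (`check_db_location` is not part of cyborgdb_core); it only encodes what this page states: the five supported location strings, the deprecation of "memory" in favour of "threadsafememory", and that table_name applies to postgres only. Treating a stray table_name as an error is an assumption; the library itself may simply ignore it.

```python
import warnings

# Location strings listed for DBConfig on this page.
SUPPORTED_LOCATIONS = {"redis", "postgres", "memory", "rocksdb", "threadsafememory"}


def check_db_location(location, table_name=None):
    """Hypothetical pre-flight check; not part of cyborgdb_core.

    Raises ValueError for unknown locations or for a table_name used with a
    non-postgres backend (a stricter rule than the library may apply), and
    emits a DeprecationWarning for "memory". Returns the location unchanged.
    """
    if location not in SUPPORTED_LOCATIONS:
        raise ValueError(f"unsupported location: {location!r}")
    if table_name is not None and location != "postgres":
        raise ValueError("table_name is only used with location='postgres'")
    if location == "memory":
        warnings.warn(
            '"memory" is deprecated; use "threadsafememory" instead',
            DeprecationWarning,
            stacklevel=2,
        )
    return location


# Hypothetical usage: validate before building cyborgdb.DBConfig(...)
print(check_db_location("postgres", table_name="config_table"))  # postgres
print(check_db_location("rocksdb"))                              # rocksdb
```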

Example Usage

import cyborgdb_core as cyborgdb

index_location = cyborgdb.DBConfig(
    location="redis",
    connection_string="redis://localhost"
)

config_location = cyborgdb.DBConfig(
    location="postgres",
    table_name="config_table",
    connection_string="host=localhost dbname=postgres"
)
For more information, see the documentation on supported backing stores.

GPUConfig

The GPUConfig class configures which operations should use GPU acceleration. GPU acceleration requires CUDA support.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| upsert | bool | False | (Optional) Enable GPU for upsert operations |
| train | bool | False | (Optional) Enable GPU for training operations |

Properties (Read-Only)

| Property | Type | Description |
|---|---|---|
| upsert | bool | Whether GPU is enabled for upsert operations |
| train | bool | Whether GPU is enabled for training operations |
| query | bool | Whether GPU is enabled for query operations |
| all | bool | Whether all GPU operations are enabled |
| none | bool | Whether no GPU operations are enabled |

Example Usage

import cyborgdb_core as cyborgdb

# Enable GPU for upsert and training
gpu_config1 = cyborgdb.GPUConfig(upsert=True, train=True)

# Enable GPU only for training
gpu_config2 = cyborgdb.GPUConfig(train=True)

# Disable GPU (default)
gpu_config3 = cyborgdb.GPUConfig()

# Check GPU configuration
if gpu_config1.all:
    print("All GPU operations enabled")

if gpu_config2.train:
    print("GPU enabled for training")

DistanceMetric

DistanceMetric is a string representing the distance metric used for the index. Options include:
  • "cosine": Cosine similarity.
  • "euclidean": Euclidean distance.
  • "squared_euclidean": Squared Euclidean distance.
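
The three metric strings correspond to the standard formulas sketched below in plain Python. This illustrates the math only; the helper names are hypothetical, and this is not a description of how CyborgDB computes distances internally. Cosine similarity grows as vectors become more alike, whereas the two Euclidean variants shrink, and squared Euclidean ranks neighbours in the same order as Euclidean because the square root is monotonic.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms (range [-1, 1])."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_euclidean(a, b):
    """Sum of squared coordinate differences (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance: square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))


a = [1.0, 0.0]
b = [0.0, 1.0]
print(cosine_similarity(a, b))  # 0.0 (orthogonal vectors)
print(squared_euclidean(a, b))  # 2.0
print(euclidean(a, b))          # 1.4142135623730951
```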

IndexConfig

The IndexConfig class defines the parameters for the type of index to be created. Each index type (ivf, ivfflat, ivfpq, ivfsq) has its own configuration options:

IndexIVF

The IndexIVF type is deprecated and will be removed in a future release.
Ideal for large-scale datasets where fast retrieval is prioritized over high recall:

| Speed | Recall | Index Size |
|---|---|---|
| Fastest | Lowest | Smallest |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| dimension | int | None | (Optional) Dimensionality of vector embeddings. When None, auto-detected from the first upsert. |

Properties (Read-Only)

| Property | Type | Description |
|---|---|---|
| n_lists | int | Number of inverted lists (coarse clusters). Set internally during training, initially 1. |
| dimension | int | Dimensionality of vector embeddings. |
| metric | str | Distance metric used. |
| index_type | str | Returns "ivf". |

Example Usage

import cyborgdb_core as cyborgdb

# Basic configuration with auto-detection
index_config = cyborgdb.IndexIVF()

# Explicit dimension configuration
index_config = cyborgdb.IndexIVF(dimension=128)

# Access read-only properties
print(f"n_lists: {index_config.n_lists}")  # Will show 1 (default)
print(f"dimension: {index_config.dimension}")  # Will show 128
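
The n_lists property counts the inverted lists (coarse clusters) that every IVF variant on this page partitions vectors into. The following is a generic, simplified sketch of that idea, assuming the centroids are already known (a real index learns them during training, typically with k-means) and using a hypothetical `n_probe` parameter for how many lists a query scans; none of these names come from cyborgdb_core. Scanning only a few lists is what makes IVF fast, and it is also why recall can drop: a true neighbour stored in an unprobed list is never examined.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_lists(vector, centroids, k):
    """Indices of the k centroids (inverted lists) closest to `vector`."""
    order = sorted(range(len(centroids)), key=lambda i: squared_l2(vector, centroids[i]))
    return order[:k]


def build_inverted_lists(vectors, centroids):
    """Append each vector id to the list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        lists[nearest_lists(vec, centroids, 1)[0]].append(vec_id)
    return lists


def ivf_search(query, vectors, centroids, lists, n_probe, top_k):
    """Rank only the vectors stored in the n_probe lists closest to the query."""
    candidates = [
        vec_id
        for list_id in nearest_lists(query, centroids, n_probe)
        for vec_id in lists[list_id]
    ]
    candidates.sort(key=lambda vec_id: squared_l2(query, vectors[vec_id]))
    return candidates[:top_k]


centroids = [[0.0, 0.0], [10.0, 10.0]]  # n_lists = 2 (hand-picked, not trained)
vectors = [[0.5, 0.2], [9.0, 9.5], [0.1, 0.9], [10.5, 9.8]]
lists = build_inverted_lists(vectors, centroids)
print(lists)  # {0: [0, 2], 1: [1, 3]}
print(ivf_search([0.2, 0.3], vectors, centroids, lists, n_probe=1, top_k=1))  # [0]
```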

IndexIVFFlat

Suitable for applications requiring high recall with less concern for memory usage:

| Speed | Recall | Index Size |
|---|---|---|
| Fast | Highest | Biggest |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| dimension | int | None | (Optional) Dimensionality of vector embeddings. When None, auto-detected from the first upsert. |

Properties (Read-Only)

| Property | Type | Description |
|---|---|---|
| n_lists | int | Number of inverted lists (coarse clusters). Set internally during training, initially 1. |
| dimension | int | Dimensionality of vector embeddings. |
| metric | str | Distance metric used. |
| index_type | str | Returns "ivfflat". |

Example Usage

import cyborgdb_core as cyborgdb

# Basic configuration with auto-detection
index_config = cyborgdb.IndexIVFFlat()

# Explicit dimension configuration
index_config = cyborgdb.IndexIVFFlat(dimension=128)

# Access read-only properties
print(f"n_lists: {index_config.n_lists}")  # Will show 1 (default)
print(f"dimension: {index_config.dimension}")  # Will show 128
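
The "Biggest" index size follows from IVFFlat keeping each vector uncompressed inside its inverted list. A rough estimate is sketched below, assuming 32-bit float components and ignoring ids, metadata, and encryption overhead; the helper names are hypothetical and the figures are back-of-the-envelope, not measured CyborgDB index sizes.

```python
def flat_bytes_per_vector(dimension, bytes_per_component=4):
    """Storage for one uncompressed vector (float32 assumed by default)."""
    return dimension * bytes_per_component


def flat_index_bytes(num_vectors, dimension):
    """Raw vector payload only; ids, metadata and encryption overhead are ignored."""
    return num_vectors * flat_bytes_per_vector(dimension)


print(flat_bytes_per_vector(128))        # 512
print(flat_index_bytes(1_000_000, 128))  # 512000000 (roughly 512 MB)
```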

IndexIVFPQ

Product Quantization compresses embeddings, making it suitable for balancing memory use and recall:

| Speed | Recall | Index Size |
|---|---|---|
| Fast | High | Medium |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| dimension | int | None | (Optional) Dimensionality of vector embeddings. When None, auto-detected from the first upsert. |
| pq_dim | int | None | (Optional) Dimensionality of PQ codes after quantization. When None, set to dimension during first upsert. |
| pq_bits | int | None | (Optional) Number of bits per quantizer (between 1 and 16). When None, defaults to 8. |

Properties (Read-Only)

| Property | Type | Description |
|---|---|---|
| n_lists | int | Number of inverted lists (coarse clusters). Set internally during training, initially 1. |
| dimension | int | Dimensionality of vector embeddings. |
| metric | str | Distance metric used. |
| index_type | str | Returns "ivfpq". |

Methods

| Method | Return Type | Description |
|---|---|---|
| pq_dim() | int | Dimensionality of PQ codes after quantization. |
| pq_bits() | int | Number of bits per quantizer. |

Example Usage

import cyborgdb_core as cyborgdb

# Basic configuration (dimension and pq_dim auto-detected, pq_bits defaults to 8)
index_config = cyborgdb.IndexIVFPQ()

# Explicit configuration
index_config = cyborgdb.IndexIVFPQ(
    dimension=128,
    pq_dim=64,
    pq_bits=8
)

# Access the read-only property and the accessor methods
print(f"n_lists: {index_config.n_lists}")      # Will show 1 (default)
print(f"pq_dim: {index_config.pq_dim()}")      # Will show 64
print(f"pq_bits: {index_config.pq_bits()}")    # Will show 8
If dimension or pq_dim is not provided, it will be auto-determined based on the first vector embedding added to the index.
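
Product quantization splits each vector into sub-vectors and replaces each sub-vector with the index of its nearest codeword, so a vector is stored as pq_dim codes of pq_bits bits each. The sketch below assumes that reading of pq_dim (the number of sub-quantizers, as in common PQ implementations) and uses hypothetical helper names with hand-written codebooks; in a real index the codebooks are learned during training. Under these assumptions, the dimension=128, pq_dim=64, pq_bits=8 configuration from the example above stores each vector in 64 bytes, compared with 512 bytes for the same vector as 32-bit floats.

```python
import math


def pq_code_bytes(pq_dim, pq_bits):
    """Bytes per encoded vector: pq_dim codes of pq_bits bits, tightly packed."""
    return math.ceil(pq_dim * pq_bits / 8)


def pq_encode(vector, codebooks):
    """Return one codeword index per subspace.

    codebooks[m] lists the codewords for subspace m. The vector is split into
    len(codebooks) equal chunks, so its length must be a multiple of that.
    """
    num_sub = len(codebooks)
    if len(vector) % num_sub != 0:
        raise ValueError("vector length must be divisible by the number of subspaces")
    sub_len = len(vector) // num_sub
    codes = []
    for m, book in enumerate(codebooks):
        chunk = vector[m * sub_len:(m + 1) * sub_len]
        # Nearest codeword by squared Euclidean distance within this subspace.
        nearest = min(
            range(len(book)),
            key=lambda j: sum((x - c) ** 2 for x, c in zip(chunk, book[j])),
        )
        codes.append(nearest)
    return codes


print(pq_code_bytes(64, 8))  # 64
print(2 ** 8)                # 256 codewords per subspace when pq_bits=8

# Toy example: 4-dim vector, 2 subspaces, 2 codewords each (i.e. pq_bits=1).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],    # subspace 0
    [[-2.0, -2.0], [2.0, 2.0]],  # subspace 1
]
print(pq_encode([0.9, 1.1, -2.0, -1.8], codebooks))  # [1, 0]
```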

IndexIVFSQ

Scalar Quantization compresses embeddings, providing a good balance of speed, recall, and index size:

| Speed | Recall | Index Size |
|---|---|---|
| Fast | High | Small |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| dimension | int | None | (Optional) Dimensionality of vector embeddings. When None, auto-detected from the first upsert. |
| sq_bits | int | 16 | (Optional) Number of bits for scalar quantization (8 or 16). |

Properties (Read-Only)

| Property | Type | Description |
|---|---|---|
| n_lists | int | Number of inverted lists (coarse clusters). Set internally during training, initially 1. |
| dimension | int | Dimensionality of vector embeddings. |
| metric | str | Distance metric used. |
| index_type | str | Returns "ivfsq". |

Methods

| Method | Return Type | Description |
|---|---|---|
| sq_bits() | int | Number of bits for scalar quantization. |

Example Usage

import cyborgdb_core as cyborgdb

# Basic configuration with auto-detection (default 16-bit SQ)
index_config = cyborgdb.IndexIVFSQ()

# Explicit 16-bit scalar quantization (same as the default; higher precision, larger index)
index_config = cyborgdb.IndexIVFSQ(sq_bits=16)

# Explicit dimension with 8-bit scalar quantization (smaller index, lower precision)
index_config = cyborgdb.IndexIVFSQ(dimension=128, sq_bits=8)

# Access read-only properties
print(f"n_lists: {index_config.n_lists}")      # Will show 1 (default)
print(f"dimension: {index_config.dimension}")  # Will show 128
print(f"sq_bits: {index_config.sq_bits()}")    # Will show 8
IndexIVFSQ is the default index configuration (with sq_bits=16) and is suitable for most use cases. It provides a good balance of recall, speed, and index size.
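
Scalar quantization stores each coordinate as a small integer instead of a float, so sq_bits directly sets the per-vector size: dimension * sq_bits / 8 bytes. The sketch below is a generic uniform quantizer over a fixed [lo, hi] range, which is an assumption for illustration (an actual index typically derives ranges from training data), and the helper names are hypothetical. It shows the 8-bit versus 16-bit trade-off: 16-bit codes take twice the space, while their quantization step is 1/257 of the 8-bit step, so the worst-case rounding error shrinks accordingly.

```python
def sq_encode(values, bits, lo, hi):
    """Quantize floats in [lo, hi] to integers in [0, 2**bits - 1]."""
    step = (hi - lo) / (2 ** bits - 1)
    # Clamp out-of-range values, then round to the nearest quantization level.
    return [round((min(max(v, lo), hi) - lo) / step) for v in values]


def sq_decode(codes, bits, lo, hi):
    """Map integer codes back to approximate floats."""
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + c * step for c in codes]


def sq_bytes_per_vector(dimension, bits):
    """Bytes per encoded vector (ids, metadata and encryption overhead ignored)."""
    return dimension * bits // 8


vector = [-1.0, 0.0, 0.37, 1.0]
for bits in (8, 16):
    restored = sq_decode(sq_encode(vector, bits, -1.0, 1.0), bits, -1.0, 1.0)
    max_err = max(abs(a - b) for a, b in zip(vector, restored))
    print(f"sq_bits={bits}: {sq_bytes_per_vector(128, bits)} bytes/vector, max error {max_err:.2e}")
```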