DBConfig
The DBConfig class specifies the storage location for the index, with options for in-memory storage, databases, or file-based storage.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| location | string | - | DB location (redis, postgres, memory, rocksdb, threadsafememory) |
| table_name | string | None | (Optional) Table name (postgres-only) |
| connection_string | string | None | (Optional) Connection string to access DB. |
The supported location options are:
"rocksdb": Use for persistent local storage with no network dependency (recommended for embedded/local deployments).
"redis": Use for high-speed, in-memory storage (recommended for index_location).
"postgres": Use for reliable, SQL-based storage (recommended for config_location).
"memory": Use for temporary in-memory storage (for benchmarking and evaluation purposes).
"threadsafememory": Use for thread-safe in-memory storage (for multi-threaded benchmarking and evaluation).
memory is deprecated and will be removed in a future release. Please use threadsafememory instead.
Example Usage
import cyborgdb_core as cyborgdb
# Index data stored in Redis
index_location = cyborgdb.DBConfig(
    location="redis",
    connection_string="redis://localhost"
)
# Index configuration stored in a Postgres table
config_location = cyborgdb.DBConfig(
    location="postgres",
    table_name="config_table",
    connection_string="host=localhost dbname=postgres"
)
For more info, you can read about supported backing stores here.
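The constraints listed in this section (a fixed set of location values, table_name applying only to postgres, and memory being deprecated) can be checked before a DBConfig is constructed. The following is a minimal sketch: `db_config_kwargs` is a hypothetical helper, not part of cyborgdb_core, and treating a table_name on a non-postgres location as an error is an assumption about how strict one wants to be. It only builds a dictionary that could then be passed on as `cyborgdb.DBConfig(**kwargs)`.

```python
import warnings

# Location values listed in this section.
SUPPORTED_LOCATIONS = {"redis", "postgres", "memory", "rocksdb", "threadsafememory"}


def db_config_kwargs(location, table_name=None, connection_string=None):
    """Hypothetical helper: validate arguments and return DBConfig kwargs."""
    if location not in SUPPORTED_LOCATIONS:
        raise ValueError(f"unsupported location: {location!r}")
    if table_name is not None and location != "postgres":
        # table_name is documented as postgres-only (strictness is an assumption).
        raise ValueError("table_name applies only to location='postgres'")
    if location == "memory":
        # "memory" is documented as deprecated in favor of "threadsafememory".
        warnings.warn("'memory' is deprecated; use 'threadsafememory'",
                      DeprecationWarning)
    kwargs = {"location": location}
    if table_name is not None:
        kwargs["table_name"] = table_name
    if connection_string is not None:
        kwargs["connection_string"] = connection_string
    return kwargs


kwargs = db_config_kwargs(
    "postgres",
    table_name="config_table",
    connection_string="host=localhost dbname=postgres",
)
print(kwargs)
```

Keeping the check separate from the constructor call means a typo in the location string is caught with a clear message before any connection is attempted.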
GPUConfig
The GPUConfig class configures which operations should use GPU acceleration. GPU acceleration requires CUDA support.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| upsert | bool | False | (Optional) Enable GPU for upsert operations |
| train | bool | False | (Optional) Enable GPU for training operations |
Properties (Read-Only)
| Property | Type | Description |
|---|---|---|
| upsert | bool | Whether GPU is enabled for upsert operations |
| train | bool | Whether GPU is enabled for training operations |
| query | bool | Whether GPU is enabled for query operations |
| all | bool | Whether all GPU operations are enabled |
| none | bool | Whether no GPU operations are enabled |
Example Usage
import cyborgdb_core as cyborgdb
# Enable GPU for upsert and training
gpu_config1 = cyborgdb.GPUConfig(upsert=True, train=True)
# Enable GPU only for training
gpu_config2 = cyborgdb.GPUConfig(train=True)
# Disable GPU (default)
gpu_config3 = cyborgdb.GPUConfig()
# Check GPU configuration
if gpu_config1.all:
    print("All GPU operations enabled")
if gpu_config2.train:
    print("GPU enabled for training")
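Because GPU acceleration requires CUDA support, it can be convenient to decide the flags at runtime rather than hard-coding them. The sketch below is only an illustration: `gpu_flags` is a hypothetical helper, and the presence of the `nvidia-smi` executable on PATH is used as a rough heuristic for an NVIDIA driver being installed. It does not prove that the installed CyborgDB build supports GPU operations.

```python
import shutil


def gpu_flags(cuda_available=None):
    """Hypothetical helper: return keyword arguments for GPUConfig."""
    if cuda_available is None:
        # Heuristic only: nvidia-smi on PATH suggests an NVIDIA driver is present.
        cuda_available = shutil.which("nvidia-smi") is not None
    enabled = bool(cuda_available)
    return {"upsert": enabled, "train": enabled}


flags = gpu_flags()
print(flags)  # could then be passed on as cyborgdb.GPUConfig(**flags)
```

Passing `cuda_available` explicitly overrides the heuristic, which is useful in tests or on machines where the driver check is misleading.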
DistanceMetric
DistanceMetric is a string representing the distance metric used for the index. Options include:
"cosine": Cosine similarity.
"euclidean": Euclidean distance.
"squared_euclidean": Squared Euclidean distance.
IndexConfig
The IndexConfig class defines the parameters for the type of index to be created. Each index type (e.g., ivf, ivfflat, ivfpq) has unique configuration options:
IndexIVF
The IndexIVF type is deprecated and will be removed in a future release.
Ideal for large-scale datasets where fast retrieval is prioritized over high recall:
| Speed | Recall | Index Size |
|---|---|---|
| Fastest | Lowest | Smallest |
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| dimension | int | None | (Optional) Dimensionality of vector embeddings. When None, auto-detected from the first upsert. |
Properties (Read-Only)
| Property | Type | Description |
|---|---|---|
| n_lists | int | Number of inverted lists (coarse clusters). Set internally during training, initially 1. |
| dimension | int | Dimensionality of vector embeddings. |
| metric | str | Distance metric used. |
| index_type | str | Returns "ivf". |
Example Usage
import cyborgdb_core as cyborgdb
# Basic configuration with auto-detection
index_config = cyborgdb.IndexIVF()
# Explicit dimension configuration
index_config = cyborgdb.IndexIVF(dimension=128)
# Access read-only properties
print(f"n_lists: {index_config.n_lists}") # Will show 1 (default)
print(f"dimension: {index_config.dimension}") # Will show 128
IndexIVFFlat
Suitable for applications requiring high recall with less concern for memory usage:
| Speed | Recall | Index Size |
|---|---|---|
| Fast | Highest | Biggest |
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| dimension | int | None | (Optional) Dimensionality of vector embeddings. When None, auto-detected from the first upsert. |
Properties (Read-Only)
| Property | Type | Description |
|---|---|---|
| n_lists | int | Number of inverted lists (coarse clusters). Set internally during training, initially 1. |
| dimension | int | Dimensionality of vector embeddings. |
| metric | str | Distance metric used. |
| index_type | str | Returns "ivfflat". |
Example Usage
import cyborgdb_core as cyborgdb
# Basic configuration with auto-detection
index_config = cyborgdb.IndexIVFFlat()
# Explicit dimension configuration
index_config = cyborgdb.IndexIVFFlat(dimension=128)
# Access read-only properties
print(f"n_lists: {index_config.n_lists}") # Will show 1 (default)
print(f"dimension: {index_config.dimension}") # Will show 128
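The n_lists property refers to inverted lists, each holding the vectors assigned to one coarse cluster. The toy sketch below illustrates the general inverted-file idea with full (flat) vectors. It is not CyborgDB's implementation: the centroids are fixed by hand rather than trained, and all names (`build_lists`, `search`, `n_probes`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroids(vector, centroids, count):
    """Return indices of the `count` centroids closest to `vector`."""
    order = sorted(range(len(centroids)),
                   key=lambda i: sq_dist(vector, centroids[i]))
    return order[:count]


def build_lists(vectors, centroids):
    """Assign each vector ID to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vec_id, vec in enumerate(vectors):
        lists[nearest_centroids(vec, centroids, 1)[0]].append(vec_id)
    return lists


def search(query, vectors, centroids, lists, top_k=1, n_probes=1):
    """Scan only the n_probes closest lists, comparing full vectors."""
    probed = nearest_centroids(query, centroids, n_probes)
    candidates = [vec_id for c in probed for vec_id in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:top_k]


centroids = [[0.0, 0.0], [10.0, 10.0]]  # two coarse clusters, i.e. n_lists = 2
vectors = [[0.5, 0.2], [9.0, 9.5], [1.0, 1.0], [10.5, 9.8]]
lists = build_lists(vectors, centroids)
print(lists)                                          # [[0, 2], [1, 3]]
print(search([9.2, 9.4], vectors, centroids, lists))  # [1]
```

Only the vectors in the probed lists are compared against the query, which is where the speedup over a full scan comes from. Probing too few lists can miss true neighbors that fell into a neighboring cluster.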
IndexIVFPQ
Product Quantization compresses embeddings, making it suitable for balancing memory use and recall:
| Speed | Recall | Index Size |
|---|---|---|
| Fast | High | Medium |
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| dimension | int | None | (Optional) Dimensionality of vector embeddings. When None, auto-detected from the first upsert. |
| pq_dim | int | None | (Optional) Dimensionality of PQ codes after quantization. When None, set to dimension during first upsert. |
| pq_bits | int | None | (Optional) Number of bits per quantizer (between 1 and 16). When None, defaults to 8. |
Properties (Read-Only)
| Property | Type | Description |
|---|---|---|
| n_lists | int | Number of inverted lists (coarse clusters). Set internally during training, initially 1. |
| dimension | int | Dimensionality of vector embeddings. |
| metric | str | Distance metric used. |
| index_type | str | Returns "ivfpq". |
Methods
| Method | Return Type | Description |
|---|---|---|
| pq_dim() | int | Dimensionality of PQ codes after quantization. |
| pq_bits() | int | Number of bits per quantizer. |
Example Usage
import cyborgdb_core as cyborgdb
# Basic configuration (all parameters auto-detected)
index_config = cyborgdb.IndexIVFPQ()
# Explicit configuration
index_config = cyborgdb.IndexIVFPQ(
    dimension=128,
    pq_dim=64,
    pq_bits=8
)
# Access read-only properties and methods
print(f"n_lists: {index_config.n_lists}") # Will show 1 (default)
print(f"pq_dim: {index_config.pq_dim()}") # Will show 64
print(f"pq_bits: {index_config.pq_bits()}") # Will show 8
If dimension or pq_dim is not provided, it will be auto-determined based on the first vector embedding added to the index.
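To get a feel for how pq_dim and pq_bits affect index size, the arithmetic below estimates the size of one compressed vector. It is a back-of-the-envelope sketch that assumes the usual product-quantization layout (pq_dim codes of pq_bits bits each) and float32 input vectors. The helper names are hypothetical, and the actual on-disk size in CyborgDB may differ because of metadata or other overhead.

```python
def pq_code_bytes(pq_dim, pq_bits):
    """Bytes for pq_dim codes of pq_bits bits each, rounded up to a whole byte."""
    return (pq_dim * pq_bits + 7) // 8


def raw_float32_bytes(dimension):
    """Bytes for an uncompressed float32 vector."""
    return dimension * 4


dimension, pq_dim, pq_bits = 128, 64, 8
print(raw_float32_bytes(dimension))    # 512
print(pq_code_bytes(pq_dim, pq_bits))  # 64
```

Under these assumptions, the explicit configuration from the example stores each vector in roughly one eighth of its raw size. Lowering pq_dim or pq_bits shrinks the codes further, typically at the cost of recall.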
IndexIVFSQ
Scalar Quantization compresses embeddings, providing a good balance of speed, recall, and index size:
| Speed | Recall | Index Size |
|---|---|---|
| Fast | High | Small |
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| dimension | int | None | (Optional) Dimensionality of vector embeddings. When None, auto-detected from the first upsert. |
| sq_bits | int | 16 | (Optional) Number of bits for scalar quantization (8 or 16). |
Properties (Read-Only)
| Property | Type | Description |
|---|---|---|
| n_lists | int | Number of inverted lists (coarse clusters). Set internally during training, initially 1. |
| dimension | int | Dimensionality of vector embeddings. |
| metric | str | Distance metric used. |
| index_type | str | Returns "ivfsq". |
Methods
| Method | Return Type | Description |
|---|---|---|
| sq_bits() | int | Number of bits for scalar quantization. |
Example Usage
import cyborgdb_core as cyborgdb
# Basic configuration with auto-detection (default 16-bit SQ)
index_config = cyborgdb.IndexIVFSQ()
# 16-bit scalar quantization (higher precision, larger index)
index_config = cyborgdb.IndexIVFSQ(sq_bits=16)
# Explicit dimension configuration
index_config = cyborgdb.IndexIVFSQ(dimension=128, sq_bits=8)
# Access read-only properties and methods
print(f"n_lists: {index_config.n_lists}") # Will show 1 (default)
print(f"dimension: {index_config.dimension}") # Will show 128
print(f"sq_bits: {index_config.sq_bits()}") # Will show 8
IndexIVFSQ is the default index configuration (with sq_bits=16) and is suitable for most use cases. It provides a good balance of recall, speed, and index size.
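The trade-off between sq_bits=8 and sq_bits=16 can be illustrated with a generic uniform scalar quantizer. This is a sketch of the general technique only: the per-vector min/max scaling and the function names are assumptions made for illustration, not a description of how CyborgDB quantizes vectors.

```python
def quantize(vector, bits):
    """Map floats to integer codes in [0, 2**bits - 1] using the vector's min/max."""
    lo, hi = min(vector), max(vector)
    levels = (1 << bits) - 1
    # Guard against a constant vector, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximate the original floats from the integer codes."""
    return [lo + c * scale for c in codes]


vector = [-1.0, -0.25, 0.1, 0.75, 1.0]
for bits in (8, 16):
    codes, lo, scale = quantize(vector, bits)
    restored = dequantize(codes, lo, scale)
    max_error = max(abs(x - y) for x, y in zip(vector, restored))
    print(f"{bits}-bit: max reconstruction error = {max_error:.2e}")
```

Each component costs 1 byte at 8 bits and 2 bytes at 16 bits, while the worst-case rounding error is half a quantization step. Going from 8 to 16 bits therefore doubles the storage per component and shrinks the step size by a factor of about 257.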