DBConfig
The DBConfig class specifies the storage location for the index, with options for in-memory storage, databases, or file-based storage.
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
location | string | - | DB location (redis, postgres, or memory) |
table_name | string | None | (Optional) Table name (postgres only) |
connection_string | string | None | (Optional) Connection string used to access the DB |
The supported location options are:
"redis": Use for high-speed, in-memory storage (recommended for index_location).
"postgres": Use for reliable, SQL-based storage (recommended for config_location).
"memory": Use for temporary in-memory storage (for benchmarking and evaluation purposes).
Example Usage
import cyborgdb_core as cyborgdb
# Redis-backed store for the index contents
index_location = cyborgdb.DBConfig(location="redis",
                                   connection_string="redis://localhost")
# Postgres-backed store for the index configuration
config_location = cyborgdb.DBConfig(location="postgres",
                                    table_name="config_table",
                                    connection_string="host=localhost dbname=postgres")
For more info, you can read about supported backing stores here.
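The constraints in the DBConfig parameter table can be checked before any backing store is contacted. The sketch below is a hypothetical pre-flight helper, not part of cyborgdb_core: the function name check_db_config_args and the choice to reject table_name for non-postgres locations are assumptions made for illustration, and the library itself may handle these cases differently.

```python
from typing import Optional

# Location names taken from the list of supported options in this section.
SUPPORTED_LOCATIONS = {"redis", "postgres", "memory"}


def check_db_config_args(location: str,
                         table_name: Optional[str] = None,
                         connection_string: Optional[str] = None) -> dict:
    """Hypothetical pre-flight check; not part of cyborgdb_core."""
    if location not in SUPPORTED_LOCATIONS:
        raise ValueError(f"unsupported location: {location!r}")
    # Assumption: treat a table_name on a non-postgres location as a mistake,
    # since the parameter table marks it as postgres-only.
    if table_name is not None and location != "postgres":
        raise ValueError("table_name only applies to the postgres location")
    return {
        "location": location,
        "table_name": table_name,
        "connection_string": connection_string,
    }


# The returned dict could then be unpacked into DBConfig(**args).
args = check_db_config_args("redis", connection_string="redis://localhost")
```

Failing early on a misspelled location keeps the error close to the configuration code rather than surfacing later at connection time.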
DistanceMetric
DistanceMetric is a string representing the distance metric used for the index. Options include:
"cosine": Cosine similarity.
"euclidean": Euclidean distance.
"squared_euclidean": Squared Euclidean distance.
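To make the three options concrete, the following pure-Python sketch computes each metric for a pair of small vectors using the textbook definitions. It is only an illustration: how the library computes or normalizes these values internally is not stated here, and the function names and the METRICS lookup table are hypothetical.

```python
import math
from typing import Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Sum of squared per-component differences.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Square root of the squared Euclidean distance.
    return math.sqrt(squared_euclidean(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical lookup keyed by the DistanceMetric strings.
METRICS = {
    "cosine": cosine_similarity,
    "euclidean": euclidean,
    "squared_euclidean": squared_euclidean,
}

a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]
results = {name: fn(a, b) for name, fn in METRICS.items()}
```

Because b is a scaled copy of a, the cosine similarity is 1 even though the Euclidean distance is not 0, which shows that cosine ignores vector magnitude. Squared Euclidean distance ranks neighbors in the same order as Euclidean distance while skipping the square root.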
IndexConfig
The IndexConfig class defines the parameters for the type of index to be created. Each index type (e.g., ivf, ivfflat, ivfpq) has unique configuration options:
IndexIVF
Ideal for large-scale datasets where fast retrieval is prioritized over high recall:
Speed | Recall | Index Size |
---|---|---|
Fastest | Lowest | Smallest |
Parameters
Parameter | Type | Description |
---|---|---|
dimension | int | Dimensionality of vector embeddings. |
n_lists | int | Number of inverted index lists to create in the index (a power of 2 is recommended). |
metric | str | (Optional) Distance metric to use for index build and queries. |
Example Usage
import cyborgdb_core as cyborgdb
index_config = cyborgdb.IndexIVF(dimension=128, n_lists=1024, metric="euclidean")
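Since n_lists is recommended to be a power of 2 for every index type, a small helper can check a candidate value or round it up. This is a minimal sketch with hypothetical helper names; it is not part of cyborgdb_core and says nothing about how to choose the target number of lists for a given dataset.

```python
def is_power_of_two(n: int) -> bool:
    # A positive power of two has exactly one bit set.
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    # Smallest power of two greater than or equal to n.
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 << (n - 1).bit_length()


# Round a candidate list count up to the recommended form.
n_lists = next_power_of_two(1000)
```

The resulting n_lists can then be passed to IndexIVF, IndexIVFFlat, or IndexIVFPQ as in the usage examples.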
IndexIVFFlat
Suitable for applications requiring high recall with less concern for memory usage:
Speed | Recall | Index Size |
---|---|---|
Fast | Highest | Biggest |
Parameters
Parameter | Type | Description |
---|---|---|
dimension | int | Dimensionality of vector embeddings. |
n_lists | int | Number of inverted index lists to create in the index (a power of 2 is recommended). |
metric | str | (Optional) Distance metric to use for index build and queries. |
Example Usage
import cyborgdb_core as cyborgdb
index_config = cyborgdb.IndexIVFFlat(dimension=128, n_lists=1024, metric="euclidean")
IndexIVFPQ
Product Quantization compresses embeddings, making it suitable for balancing memory use and recall:
Speed | Recall | Index Size |
---|---|---|
Fast | High | Medium |
Parameters
Parameter | Type | Description |
---|---|---|
dimension | int | Dimensionality of vector embeddings. |
n_lists | int | Number of inverted index lists to create in the index (a power of 2 is recommended). |
pq_dim | int | Dimensionality of embeddings after quantization (less than or equal to dimension). |
pq_bits | int | Number of bits per dimension for PQ embeddings (between 1 and 16). |
metric | str | (Optional) Distance metric to use for index build and queries. |
Example Usage
import cyborgdb_core as cyborgdb
index_config = cyborgdb.IndexIVFPQ(dimension=128, n_lists=1024, pq_dim=64, pq_bits=8, metric="euclidean")
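The effect of pq_dim and pq_bits on storage can be estimated with simple arithmetic: each quantized vector takes pq_dim codes of pq_bits bits each. The sketch below checks the documented parameter ranges and compares that code size with the raw vector size. It assumes uncompressed embeddings are stored as 32-bit floats and ignores codebooks and other index overhead, neither of which is specified here; the helper names are hypothetical.

```python
def check_ivfpq_params(dimension: int, pq_dim: int, pq_bits: int) -> None:
    # Ranges taken from the IndexIVFPQ parameter table.
    if not 1 <= pq_dim <= dimension:
        raise ValueError("pq_dim must be between 1 and dimension")
    if not 1 <= pq_bits <= 16:
        raise ValueError("pq_bits must be between 1 and 16")


def pq_code_bytes(pq_dim: int, pq_bits: int) -> float:
    # pq_dim codes of pq_bits bits each, converted to bytes.
    return pq_dim * pq_bits / 8


def raw_vector_bytes(dimension: int, bytes_per_value: int = 4) -> int:
    # Assumption: uncompressed embeddings use 4-byte (float32) components.
    return dimension * bytes_per_value


# Same values as the IndexIVFPQ usage example.
check_ivfpq_params(dimension=128, pq_dim=64, pq_bits=8)
compressed = pq_code_bytes(pq_dim=64, pq_bits=8)
uncompressed = raw_vector_bytes(dimension=128)
ratio = uncompressed / compressed
```

Under these assumptions the example configuration stores 64 bytes per vector instead of 512, an 8x reduction. Lowering pq_dim or pq_bits shrinks the index further, typically at some cost in recall.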
If embedding_model is defined in create_index(), then dimension is unnecessary in IndexConfig.