The Location enum contains the supported index backing store locations for CyborgDB.
DBConfig defines the storage location for various index components.
Parameter | Type | Description |
---|---|---|
location | Location | Specifies the type of storage location. |
table_name | std::string | (Optional) Name of the table in the database, if applicable. |
db_connection_string | std::string | (Optional) Connection string for database access, if applicable. |
For more info, you can read about supported backing stores here.
The DistanceMetric enum contains the supported distance metrics for CyborgDB.
IndexConfig is an abstract base class for configuring index types. The three derived classes can be used to configure indexes.
For guidance on how to select the right IndexConfig and params, refer to the index configuration tuning guide.
Ideal for large-scale datasets where fast retrieval is prioritized over high recall:
Speed | Recall | Index Size |
---|---|---|
Fastest | Lowest | Smallest |
Parameter | Type | Description |
---|---|---|
dimension | size_t | Dimensionality of vector embeddings. |
n_lists | size_t | Number of inverted index lists to create in the index (recommended base-2 value). |
metric | DistanceMetric | (Optional) Distance metric to use for index build and queries. |
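The n_lists description recommends a base-2 value, that is, a power of two. As a minimal sketch, the helpers below check a candidate value and round it up to the next power of two. Both function names are hypothetical and are not part of the CyborgDB API.

```cpp
#include <cstddef>

// Hypothetical helper: true if n is a power of two (1, 2, 4, 8, ...).
inline bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Hypothetical helper: rounds n up to the nearest power of two.
// Values of 0 and 1 both map to 1. Overflow for very large n is not handled.
inline std::size_t round_up_to_power_of_two(std::size_t n) {
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

For example, a rough target of 1000 lists would be rounded up to 1024 before being passed as n_lists.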
Suitable for applications requiring high recall with less concern for memory usage:
Speed | Recall | Index Size |
---|---|---|
Fast | Highest | Biggest |
Parameter | Type | Description |
---|---|---|
dimension | size_t | Dimensionality of vector embeddings. |
n_lists | size_t | Number of inverted index lists to create in the index (recommended base-2 value). |
metric | DistanceMetric | (Optional) Distance metric to use for index build and queries. |
Product Quantization compresses embeddings, making it suitable for balancing memory use and recall:
Speed | Recall | Index Size |
---|---|---|
Fast | High | Medium |
Parameter | Type | Description |
---|---|---|
dimension | size_t | Dimensionality of vector embeddings. |
n_lists | size_t | Number of inverted index lists to create in the index (recommended base-2 value). |
pq_dim | size_t | Dimensionality of embeddings after quantization (less than or equal to dimension). |
pq_bits | size_t | Number of bits per dimension for PQ embeddings (between 1 and 16). |
metric | DistanceMetric | (Optional) Distance metric to use for index build and queries. |
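To make the memory trade-off concrete, the sketch below checks the documented constraints (pq_dim no larger than dimension, pq_bits between 1 and 16) and estimates per-vector storage. It assumes uncompressed embeddings are stored as 32-bit floats and that a quantized embedding occupies pq_dim * pq_bits bits, as in standard product quantization; the actual on-disk layout used by CyborgDB may differ. The PQParamsSketch struct and all function names are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in mirroring the documented parameters; not the CyborgDB type.
struct PQParamsSketch {
    std::size_t dimension;  // dimensionality of the original embeddings
    std::size_t pq_dim;     // dimensionality after quantization
    std::size_t pq_bits;    // bits per quantized dimension
};

// Enforces the constraints stated in the parameter table.
inline void validate(const PQParamsSketch& p) {
    if (p.pq_dim > p.dimension) {
        throw std::invalid_argument("pq_dim must be <= dimension");
    }
    if (p.pq_bits < 1 || p.pq_bits > 16) {
        throw std::invalid_argument("pq_bits must be between 1 and 16");
    }
}

// Assumption: uncompressed embeddings use 4 bytes (one 32-bit float) per dimension.
inline std::size_t raw_bytes_per_vector(const PQParamsSketch& p) {
    return p.dimension * 4;
}

// Assumption: a PQ code takes pq_dim * pq_bits bits, rounded up to whole bytes.
inline std::size_t pq_bytes_per_vector(const PQParamsSketch& p) {
    return (p.pq_dim * p.pq_bits + 7) / 8;
}
```

Under these assumptions, dimension = 768 with pq_dim = 64 and pq_bits = 8 gives 3072 bytes per raw vector versus 64 bytes per quantized vector.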
Array2D class provides a 2D container for data, which can be initialized with a specific number of rows and columns, or from an existing vector.
Array2D(size_t rows, size_t cols, const T& initial_value = T()): Creates a 2D array with the specified dimensions, with every element set to initial_value.
Array2D(std::vector<T>&& data, size_t cols): Initializes the 2D array from a 1D vector (move).
Array2D(const std::vector<T>& data, size_t cols): Initializes the 2D array from a 1D vector (copy).
operator()(size_t row, size_t col) const: Accesses the element at the specified row and column (read-only).
operator()(size_t row, size_t col): Accesses the element at the specified row and column (read-write).
size_t rows() const: Returns the number of rows.
size_t cols() const: Returns the number of columns.
size_t size() const: Returns the total number of elements.
The TrainingConfig struct defines parameters for training an index, allowing control over convergence and memory usage.
Parameter | Type | Description |
---|---|---|
batch_size | size_t | (Optional) Size of each batch for training. Defaults to 0, which auto-selects the batch size. |
max_iters | size_t | (Optional) Maximum iterations for training. Defaults to 0, which auto-selects iterations. |
tolerance | double | (Optional) Convergence tolerance for training. Defaults to 1e-6. |
max_memory | size_t | (Optional) Maximum memory usage (MB) during training. Defaults to 0, which means no limit. |
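The sketch below illustrates how max_iters and tolerance typically interact in iterative training. It is not the CyborgDB implementation: the struct is a hypothetical stand-in that copies the documented defaults, and the stopping rule (relative change in loss below tolerance, or iteration cap reached) is an assumption based on how convergence tolerances are commonly used. In this sketch, max_iters = 0 is simply treated as "no explicit cap".

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical stand-in mirroring the documented fields and defaults.
struct TrainingConfigSketch {
    std::size_t batch_size = 0;  // 0 = auto-select batch size
    std::size_t max_iters = 0;   // 0 = auto-select iterations
    double tolerance = 1e-6;     // convergence tolerance
    std::size_t max_memory = 0;  // in MB; 0 = no limit
};

// Assumed stopping rule: stop when the iteration cap is reached (if one is set)
// or when the relative change in loss drops below the tolerance.
inline bool should_stop(const TrainingConfigSketch& cfg, std::size_t iter,
                        double prev_loss, double loss) {
    if (cfg.max_iters != 0 && iter >= cfg.max_iters) {
        return true;
    }
    const double denom = std::fabs(prev_loss) > 0.0 ? std::fabs(prev_loss) : 1.0;
    return std::fabs(prev_loss - loss) / denom < cfg.tolerance;
}
```

A tighter tolerance generally means more iterations before the rule fires, which is why the two parameters are usually tuned together.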
The QueryParams struct defines parameters for querying the index, controlling the number of results and probing behavior.
Parameter | Type | Description |
---|---|---|
top_k | size_t | (Optional) Number of nearest neighbors to return. Defaults to 100. |
n_probes | size_t | (Optional) Number of lists to probe during query. Defaults to 1. |
include | std::vector<ResultFields> | (Optional) List of item fields to return. Can include kDistance and kMetadata. Defaults to all. |
filters | std::string | (Optional) A JSON string of filters to apply to vector metadata, limiting search scope to these vectors. |
greedy | bool | (Optional) Whether to perform greedy search. Defaults to false. |
Higher n_probes values may improve recall but can slow down queries, so select a value based on the desired trade-off between recall and performance. For guidance on how to select the right n_probes, refer to the query parameter tuning guide.
filters use a subset of the MongoDB Query and Projection Operators. For instance, filters: { "$and": [ { "label": "cat" }, { "confidence": { "$gte": 0.9 } } ] } means that only vectors where label == "cat" and confidence >= 0.9 will be considered for encrypted vector search.
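Because filters is passed as a std::string, the JSON has to be embedded in C++ source. The sketch below shows one way to do that with a raw string literal, which avoids escaping every double quote. The QueryParamsSketch struct and ResultFieldsSketch enum are hypothetical stand-ins that mirror the documented fields and defaults; they are not the CyborgDB declarations, and treating an empty filters string as "no filter" is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins mirroring the documented fields; not the CyborgDB types.
enum class ResultFieldsSketch { kDistance, kMetadata };

struct QueryParamsSketch {
    std::size_t top_k = 100;
    std::size_t n_probes = 1;
    std::vector<ResultFieldsSketch> include = {ResultFieldsSketch::kDistance,
                                               ResultFieldsSketch::kMetadata};
    std::string filters;  // assumption: empty string means no metadata filter
    bool greedy = false;
};

// Builds parameters for the filter example above: label == "cat" and confidence >= 0.9.
inline QueryParamsSketch make_cat_query() {
    QueryParamsSketch params;
    params.top_k = 10;    // return the 10 nearest neighbors
    params.n_probes = 4;  // probe more lists than the default of 1
    // The raw string literal keeps the JSON readable without backslash escapes.
    params.filters =
        R"({ "$and": [ { "label": "cat" }, { "confidence": { "$gte": 0.9 } } ] })";
    return params;
}
```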
For more info on metadata, see Metadata Filtering.
The QueryResults class holds the results from a Query operation, including IDs and distances for the nearest neighbors of each query.
Method | Return Type | Description |
---|---|---|
Result operator[](size_t query_idx) | Result | Returns read-write access to IDs and distances for a specific query. |
const std::vector<std::vector<std::string>>& ids() const | const std::vector<std::vector<std::string>>& | Get read-only access to all IDs. |
const Array2D<float>& distances() const | const Array2D<float>& | Get read-only access to all distances. |
const std::vector<float>& vectors() const | const std::vector<float>& | Get read-only access to all vectors. |
const std::vector<std::vector<std::string>>& metadatas() const | const std::vector<std::vector<std::string>>& | Get read-only access to all metadatas. |
size_t num_queries() const | size_t | Returns the number of queries. |
size_t top_k() const | size_t | Returns the number of top-k items per query. |
bool empty() const | bool | Checks if the results are empty. |
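Since distances() returns an Array2D<float>, it can help to see how such a container is indexed. The sketch below is a minimal stand-in that follows the documented Array2D interface (constructors, operator(), rows(), cols(), size()), not the library's own class. The row-major layout, and the reading of the distance matrix as one row per query with one column per neighbor, are assumptions suggested by num_queries() and top_k(). The nearest_column helper is hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Minimal stand-in for the documented Array2D interface (row-major is an assumption).
template <typename T>
class Array2DSketch {
public:
    Array2DSketch(std::size_t rows, std::size_t cols, const T& initial_value = T())
        : rows_(rows), cols_(cols), data_(rows * cols, initial_value) {}

    // Takes ownership of a flat vector; the row count is derived from its length.
    Array2DSketch(std::vector<T>&& data, std::size_t cols)
        : rows_(cols ? data.size() / cols : 0), cols_(cols), data_(std::move(data)) {}

    const T& operator()(std::size_t row, std::size_t col) const {
        return data_[row * cols_ + col];
    }
    T& operator()(std::size_t row, std::size_t col) { return data_[row * cols_ + col]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    std::size_t size() const { return data_.size(); }

private:
    // rows_ and cols_ are declared before data_ so they are initialized first.
    std::size_t rows_;
    std::size_t cols_;
    std::vector<T> data_;
};

// Hypothetical helper: column index of the smallest distance in one query's row.
inline std::size_t nearest_column(const Array2DSketch<float>& distances,
                                  std::size_t query_idx) {
    std::size_t best = 0;
    for (std::size_t k = 1; k < distances.cols(); ++k) {
        if (distances(query_idx, k) < distances(query_idx, best)) {
            best = k;
        }
    }
    return best;
}
```

With this layout, element (q, k) of the distance matrix lines up with ids()[q][k], so the column returned by the helper can be used to look up the matching ID for that query.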
The Item struct holds the individual results from a Get operation, including the requested fields.
The ItemFields enum defines the fields that can be requested for an Item object.
By default, ids are always included in the returned items.