Location
enum contains the supported index backing store locations for CyborgDB. These are:
DBConfig
defines the storage location for various index components.
Parameter | Type | Description |
---|---|---|
location | Location | Specifies the type of storage location. |
table_name | std::string | (Optional) Name of the table in the database, if applicable. |
db_connection_string | std::string | (Optional) Connection string for database access, if applicable. |
DistanceMetric
enum contains the supported distance metrics for CyborgDB. These are:
IndexConfig
is an abstract base class for configuring index types. The three derived classes can be used to configure indexes:
Speed | Recall | Index Size |
---|---|---|
Fastest | Lowest | Smallest |
Parameter | Type | Description |
---|---|---|
dimension | size_t | Dimensionality of vector embeddings. |
n_lists | size_t | Number of inverted index lists to create in the index (recommended base-2 value). |
metric | DistanceMetric | (Optional) Distance metric to use for index build and queries. |
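The n_lists recommendation can be checked before an index is configured. The following is a minimal sketch, assuming "base-2 value" means a power of two. The helper names `is_power_of_two` and `next_power_of_two` are hypothetical and are not part of CyborgDB.

```cpp
#include <cstddef>

// Hypothetical helper: true when n is a power of two (1, 2, 4, 8, ...).
inline bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Hypothetical helper: smallest power of two that is >= n.
// Returns 1 for n == 0. Assumes n is small enough that the result
// fits in std::size_t (no overflow handling in this sketch).
inline std::size_t next_power_of_two(std::size_t n) {
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

A caller could round a rough target such as 1000 lists up with `next_power_of_two(1000)` before passing it as n_lists.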
Speed | Recall | Index Size |
---|---|---|
Fast | Highest | Biggest |
Parameter | Type | Description |
---|---|---|
dimension | size_t | Dimensionality of vector embeddings. |
n_lists | size_t | Number of inverted index lists to create in the index (recommended base-2 value). |
metric | DistanceMetric | (Optional) Distance metric to use for index build and queries. |
Speed | Recall | Index Size |
---|---|---|
Fast | High | Medium |
Parameter | Type | Description |
---|---|---|
dimension | size_t | Dimensionality of vector embeddings. |
n_lists | size_t | Number of inverted index lists to create in the index (recommended base-2 value). |
pq_dim | size_t | Dimensionality of embeddings after quantization (less than or equal to dimension). |
pq_bits | size_t | Number of bits per dimension for PQ embeddings (between 1 and 16). |
metric | DistanceMetric | (Optional) Distance metric to use for index build and queries. |
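The effect of pq_dim and pq_bits on index size can be worked out with simple arithmetic. The sketch below assumes a standard product-quantization encoding, with one pq_bits-wide code per quantized dimension, and float32 input vectors. The source does not state how CyborgDB stores codes, so treat this as an estimate. `PQParams`, `validate`, `code_bits`, and `raw_bits` are hypothetical names, and the lower bound of 1 on pq_dim is an assumption.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in holding the three size-related parameters;
// this is not the CyborgDB configuration class.
struct PQParams {
    std::size_t dimension;
    std::size_t pq_dim;
    std::size_t pq_bits;
};

// Checks the documented constraints: pq_dim <= dimension and
// 1 <= pq_bits <= 16. Requiring pq_dim >= 1 is an assumption.
inline void validate(const PQParams& p) {
    if (p.pq_dim == 0 || p.pq_dim > p.dimension) {
        throw std::invalid_argument("pq_dim must be in [1, dimension]");
    }
    if (p.pq_bits < 1 || p.pq_bits > 16) {
        throw std::invalid_argument("pq_bits must be in [1, 16]");
    }
}

// Bits per encoded vector under the standard PQ assumption.
inline std::size_t code_bits(const PQParams& p) {
    return p.pq_dim * p.pq_bits;
}

// Bits per raw vector, assuming 32-bit floats.
inline std::size_t raw_bits(const PQParams& p) {
    return p.dimension * 32;
}
```

For dimension = 128, pq_dim = 32, and pq_bits = 8, this gives 256 bits (32 bytes) per code against 4096 bits (512 bytes) per raw vector, a 16x reduction under these assumptions.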
Array2D
class provides a 2D container for data, which can be initialized with a specific number of rows and columns, or from an existing vector.
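To make the idea of a 2D view over flat data concrete, here is a minimal sketch of how such a container might be laid out. It assumes row-major storage, which the source does not confirm. `Array2DSketch` is a hypothetical stand-in, not the CyborgDB implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in. Assumes row-major storage:
// element (row, col) lives at flat index row * cols + col.
template <typename T>
class Array2DSketch {
public:
    // rows x cols elements, each set to initial_value.
    Array2DSketch(std::size_t rows, std::size_t cols, const T& initial_value = T())
        : rows_(rows), cols_(cols), data_(rows * cols, initial_value) {}

    // Takes ownership of a flat vector; its length must be a multiple of cols.
    Array2DSketch(std::vector<T>&& data, std::size_t cols)
        : rows_(cols == 0 ? 0 : data.size() / cols),
          cols_(cols),
          data_(std::move(data)) {
        assert(cols_ == 0 || data_.size() % cols_ == 0);
    }

    // Read-write element access.
    T& operator()(std::size_t row, std::size_t col) {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    // Read-only element access.
    const T& operator()(std::size_t row, std::size_t col) const {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    std::size_t size() const { return data_.size(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<T> data_;
};
```

Under this layout, a flat vector of six floats with cols = 3 is viewed as two rows of three, so element (1, 0) is the fourth value in the flat vector.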
Array2D(size_t rows, size_t cols, const T& initial_value = T())
: Creates a 2D array with the specified dimensions, with every element set to initial_value.
Array2D(std::vector<T>&& data, size_t cols)
: Initializes the 2D array from a 1D vector (move).
Array2D(const std::vector<T>& data, size_t cols)
: Initializes the 2D array from a 1D vector (copy).
operator()(size_t row, size_t col) const
: Access an element at the specified row and column (read-only).
operator()(size_t row, size_t col)
: Access an element at the specified row and column (read-write).
size_t rows() const
: Returns the number of rows.
size_t cols() const
: Returns the number of columns.
size_t size() const
: Returns the total number of elements.
TrainingConfig
struct defines parameters for training an index, allowing control over convergence and memory usage.
Parameter | Type | Description |
---|---|---|
batch_size | size_t | (Optional) Size of each batch for training. Defaults to 0, which auto-selects the batch size. |
max_iters | size_t | (Optional) Maximum iterations for training. Defaults to 0, which auto-selects the iteration count. |
tolerance | double | (Optional) Convergence tolerance for training. Defaults to 1e-6. |
max_memory | size_t | (Optional) Maximum memory usage (MB) during training. Defaults to 0, meaning no limit. |
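The defaults in this table can be captured with default member initializers. This is a sketch only: `TrainingConfigSketch` and `has_memory_limit` are hypothetical names that mirror the documented fields, and the real struct's constructors are not shown in the source.

```cpp
#include <cstddef>

// Hypothetical stand-in mirroring the documented fields and defaults.
struct TrainingConfigSketch {
    std::size_t batch_size = 0;     // 0 = auto-select the batch size
    std::size_t max_iters  = 0;     // 0 = auto-select the iteration count
    double      tolerance  = 1e-6;  // convergence tolerance
    std::size_t max_memory = 0;     // in MB; 0 = no limit
};

// Hypothetical helper: true when a memory cap has been set.
inline bool has_memory_limit(const TrainingConfigSketch& cfg) {
    return cfg.max_memory != 0;
}
```

A default-constructed value leaves every choice to the library, and a caller overrides only the fields that matter, for example `cfg.max_memory = 2048;` to cap training at 2048 MB.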
QueryParams
struct defines parameters for querying the index, controlling the number of results and probing behavior.
Parameter | Type | Description |
---|---|---|
top_k | size_t | (Optional) Number of nearest neighbors to return. Defaults to 100. |
n_probes | size_t | (Optional) Number of lists to probe during query. Defaults to 1. |
include | std::vector<ResultFields> | (Optional) List of item fields to return. Can include kDistance and kMetadata. Defaults to all. |
filters | std::string | (Optional) A JSON string of filters to apply to vector metadata, limiting search scope to matching vectors. |
greedy | bool | (Optional) Whether to perform greedy search. Defaults to false. |
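As a rough illustration of how these parameters fit together, the sketch below mirrors the table with hypothetical stand-in types. It assumes `ResultFields` has only the two values named above, that "all" means both of them, and that an empty filters string means no filtering. None of this is confirmed by the source.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in; only the two documented values are modelled.
enum class ResultFields { kDistance, kMetadata };

// Hypothetical stand-in mirroring the documented fields and defaults.
struct QueryParamsSketch {
    std::size_t top_k = 100;
    std::size_t n_probes = 1;
    std::vector<ResultFields> include = {ResultFields::kDistance,
                                         ResultFields::kMetadata};
    std::string filters;  // assumed: empty string = no filtering
    bool greedy = false;
};

// Builds parameters for a filtered top-10 query that probes 4 lists
// and returns distances only.
inline QueryParamsSketch make_filtered_query() {
    QueryParamsSketch p;
    p.top_k = 10;
    p.n_probes = 4;
    p.include = {ResultFields::kDistance};
    p.filters =
        R"({ "$and": [ { "label": "cat" }, { "confidence": { "$gte": 0.9 } } ] })";
    return p;
}
```

A raw string literal keeps the JSON filter readable without escaping every quote.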
filters
use a subset of the MongoDB Query and Projection Operators.
For instance, filters: { "$and": [ { "label": "cat" }, { "confidence": { "$gte": 0.9 } } ] } means that only vectors where label == "cat" and confidence >= 0.9 will be considered for encrypted vector search.
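The meaning of that example filter can be restated as an ordinary predicate. This sketch does not parse JSON or call CyborgDB. It hard-codes the same condition in plain C++ over a hypothetical `Meta` record so the `$and` and `$gte` semantics are easy to see.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical metadata record with the two fields used by the filter.
struct Meta {
    std::string label;
    double confidence;
};

// Plain C++ equivalent of:
// { "$and": [ { "label": "cat" }, { "confidence": { "$gte": 0.9 } } ] }
inline bool matches(const Meta& m) {
    return m.label == "cat" && m.confidence >= 0.9;
}

// Returns the positions of the records that pass the filter.
inline std::vector<std::size_t> matching_indices(const std::vector<Meta>& items) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (matches(items[i])) {
            out.push_back(i);
        }
    }
    return out;
}
```

Both conditions must hold, so a record labelled "dog" with confidence 0.99 is excluded, as is a "cat" with confidence 0.5.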
For more info on metadata, see Metadata Filtering.
QueryResults
class holds the results from a Query
operation, including IDs and distances for the nearest neighbors of each query.
Method | Return Type | Description |
---|---|---|
Result operator[](size_t query_idx) | Result | Returns read-write access to IDs and distances for a specific query. |
const std::vector<std::vector<std::string>>& ids() const | const std::vector<std::vector<std::string>>& | Get read-only access to all IDs. |
const Array2D<float>& distances() const | const Array2D<float>& | Get read-only access to all distances. |
const std::vector<float>& vectors() const | const std::vector<float>& | Get read-only access to all vectors. |
const std::vector<std::vector<std::string>>& metadatas() const | const std::vector<std::vector<std::string>>& | Get read-only access to all metadatas. |
size_t num_queries() const | size_t | Returns the number of queries. |
size_t top_k() const | size_t | Returns the number of top-k items per query. |
bool empty() const | bool | Checks if the results are empty. |
Item
struct holds the individual results from a Get
operation, including the requested fields.
ItemFields
enum defines the fields that can be requested for an Item
object.
ids
are always included in the returned items.