The `Location` enum contains the supported index backing store locations for Cyborg Vector Search. These are:
DBConfig)
LocationConfig defines the storage location for various index components.
| Parameter | Type | Description |
|---|---|---|
| `location` | `Location` | Specifies the type of storage location. |
| `table_name` | `std::string` | (Optional) Name of the table in the database, if applicable. |
| `db_connection_string` | `std::string` | (Optional) Connection string for database access, if applicable. |
The `DistanceMetric` enum contains the supported distance metrics for Cyborg Vector Search. These are:
IndexConfig is an abstract base class for configuring index types. The three derived classes can be used to configure indexes:
| Speed | Recall | Index Size |
|---|---|---|
| Fastest | Lowest | Smallest |
| Parameter | Type | Description |
|---|---|---|
| `dimension` | `size_t` | Dimensionality of vector embeddings. |
| `n_lists` | `size_t` | Number of inverted index lists to create in the index (a power of two is recommended). |
| `metric` | `DistanceMetric` | (Optional) Distance metric to use for index build and queries. |
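The `n_lists` recommendation can be checked before building an index. The following is a small sketch, taking the recommendation to mean a power of two; both helper names are hypothetical and are not part of the Cyborg Vector Search API.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Returns true when n is a power of two (1, 2, 4, 8, ...).
inline bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Rounds n up to the nearest power of two; returns 1 for n == 0.
// Hypothetical helper for picking an n_lists value.
inline std::size_t next_power_of_two(std::size_t n) {
    // Precondition: the result must be representable in std::size_t.
    assert(n <= std::numeric_limits<std::size_t>::max() / 2 + 1);
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

For example, a target of roughly 1000 lists would be rounded up to 1024 by `next_power_of_two(1000)`.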
| Speed | Recall | Index Size |
|---|---|---|
| Fast | Highest | Biggest |
| Parameter | Type | Description |
|---|---|---|
| `dimension` | `size_t` | Dimensionality of vector embeddings. |
| `n_lists` | `size_t` | Number of inverted index lists to create in the index (a power of two is recommended). |
| `metric` | `DistanceMetric` | (Optional) Distance metric to use for index build and queries. |
| Speed | Recall | Index Size |
|---|---|---|
| Fast | High | Medium |
| Parameter | Type | Description |
|---|---|---|
| `dimension` | `size_t` | Dimensionality of vector embeddings. |
| `n_lists` | `size_t` | Number of inverted index lists to create in the index (a power of two is recommended). |
| `pq_dim` | `size_t` | Dimensionality of embeddings after quantization (must be less than or equal to `dimension`). |
| `pq_bits` | `size_t` | Number of bits per dimension for PQ embeddings (between 1 and 16). |
| `metric` | `DistanceMetric` | (Optional) Distance metric to use for index build and queries. |
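The documented ranges for `pq_dim` and `pq_bits` can be expressed as a pre-flight check. This is a sketch only: `PQParamsSketch`, `is_valid`, and `code_bits_per_vector` are hypothetical names, the non-zero checks are an added assumption, and the size estimate assumes each quantized vector is stored as `pq_dim` codes of `pq_bits` bits with no padding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the parameters in the table above; not the library's type.
struct PQParamsSketch {
    std::size_t dimension;
    std::size_t n_lists;
    std::size_t pq_dim;
    std::size_t pq_bits;
};

// Checks the documented ranges: pq_dim <= dimension and 1 <= pq_bits <= 16.
// Requiring dimension, n_lists, and pq_dim to be non-zero is an assumption.
inline bool is_valid(const PQParamsSketch& p) {
    return p.dimension > 0 && p.n_lists > 0 &&
           p.pq_dim > 0 && p.pq_dim <= p.dimension &&
           p.pq_bits >= 1 && p.pq_bits <= 16;
}

// Bits per quantized vector, assuming pq_dim codes of pq_bits each and no padding.
inline std::size_t code_bits_per_vector(const PQParamsSketch& p) {
    assert(is_valid(p));
    return p.pq_dim * p.pq_bits;
}
```

Under these assumptions, `dimension = 128`, `pq_dim = 32`, and `pq_bits = 8` give 256 bits (32 bytes) per vector, compared with 512 bytes for 128 uncompressed 32-bit floats.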
The `Array2D` class provides a 2D container for data, which can be initialized with a specific number of rows and columns, or from an existing vector.
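To make the interface concrete, here is a minimal sketch of how a container like this might be laid out. `Array2DSketch` is a hypothetical stand-in, not the library's implementation, and row-major storage in a single flat vector is an assumption. It mirrors the constructors and accessors documented for `Array2D` in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative stand-in for Array2D<T>; row-major storage is an assumption.
template <typename T>
class Array2DSketch {
public:
    // rows x cols array with every element set to initial_value.
    Array2DSketch(std::size_t rows, std::size_t cols, const T& initial_value = T())
        : rows_(rows), cols_(cols), data_(rows * cols, initial_value) {}

    // Takes ownership of a flat vector; its length must be a multiple of cols.
    Array2DSketch(std::vector<T>&& data, std::size_t cols)
        : rows_(cols == 0 ? 0 : data.size() / cols), cols_(cols), data_(std::move(data)) {
        assert(cols_ == 0 ? data_.empty() : data_.size() % cols_ == 0);
    }

    // Copies a flat vector; its length must be a multiple of cols.
    Array2DSketch(const std::vector<T>& data, std::size_t cols)
        : rows_(cols == 0 ? 0 : data.size() / cols), cols_(cols), data_(data) {
        assert(cols_ == 0 ? data_.empty() : data_.size() % cols_ == 0);
    }

    // Element (row, col) lives at flat index row * cols + col.
    const T& operator()(std::size_t row, std::size_t col) const {
        return data_[row * cols_ + col];
    }
    T& operator()(std::size_t row, std::size_t col) {
        return data_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    std::size_t size() const { return data_.size(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<T> data_;
};
```

With this layout, a flat vector of six floats and `cols = 3` yields a 2 x 3 array in which element `(1, 0)` is the fourth value of the input.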
- `Array2D(size_t rows, size_t cols, const T& initial_value = T())`: Creates a 2D array with the specified dimensions, with every element set to `initial_value`.
- `Array2D(std::vector<T>&& data, size_t cols)`: Initializes the 2D array from a 1D vector (move).
- `Array2D(const std::vector<T>& data, size_t cols)`: Initializes the 2D array from a 1D vector (copy).
- `operator()(size_t row, size_t col) const`: Accesses the element at the specified row and column (read-only).
- `operator()(size_t row, size_t col)`: Accesses the element at the specified row and column (read-write).
- `size_t rows() const`: Returns the number of rows.
- `size_t cols() const`: Returns the number of columns.
- `size_t size() const`: Returns the total number of elements.

The `TrainingConfig` struct defines parameters for training an index, allowing control over convergence and memory usage.
| Parameter | Type | Description |
|---|---|---|
| `batch_size` | `size_t` | (Optional) Size of each batch for training. Defaults to 0, which auto-selects the batch size. |
| `max_iters` | `size_t` | (Optional) Maximum number of training iterations. Defaults to 0, which auto-selects the iteration count. |
| `tolerance` | `double` | (Optional) Convergence tolerance for training. Defaults to 1e-6. |
| `max_memory` | `size_t` | (Optional) Maximum memory usage during training, in MB. Defaults to 0 (no limit). |
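The defaults above can be summarized in code. The struct below is a hypothetical mirror of `TrainingConfig`, and `should_stop` shows one plausible way `max_iters` and `tolerance` could interact. The stopping rule and the `fallback_iters` stand-in for the auto-selected iteration count are assumptions, not documented library behavior.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical mirror of TrainingConfig with the documented defaults.
struct TrainingConfigSketch {
    std::size_t batch_size = 0;  // 0 = auto-select
    std::size_t max_iters = 0;   // 0 = auto-select
    double tolerance = 1e-6;     // convergence tolerance
    std::size_t max_memory = 0;  // in MB, 0 = no limit
};

// Assumed stopping rule: stop once the iteration cap is reached, or once the
// training objective changes by less than tolerance between iterations.
// fallback_iters stands in for whatever the "auto" setting resolves to.
inline bool should_stop(const TrainingConfigSketch& cfg, std::size_t iters_done,
                        double prev_objective, double objective,
                        std::size_t fallback_iters = 100) {
    const std::size_t cap = cfg.max_iters == 0 ? fallback_iters : cfg.max_iters;
    if (iters_done >= cap) {
        return true;
    }
    return std::fabs(prev_objective - objective) < cfg.tolerance;
}
```

Under this reading, a looser `tolerance` or a smaller `max_iters` ends training sooner at the cost of a less converged index.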
The `QueryParams` struct defines parameters for querying the index, controlling the number of results and probing behavior.
| Parameter | Type | Description |
|---|---|---|
| `top_k` | `size_t` | (Optional) Number of nearest neighbors to return. Defaults to 100. |
| `n_probes` | `size_t` | (Optional) Number of lists to probe during query. Defaults to 1. |
| `return_distances` | `bool` | (Optional) Whether to return distances with the IDs. Defaults to true. |
| `greedy` | `bool` | (Optional) Whether to perform greedy search. Defaults to false. |
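To illustrate how `top_k` and `n_probes` typically interact in an inverted-list index, here is a toy, unencrypted sketch. `ToyIVF`, `QueryParamsSketch`, and `toy_query` are hypothetical names, squared Euclidean distance is chosen only for simplicity, and nothing here reflects how Cyborg Vector Search implements queries internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical mirror of the two QueryParams fields used in this sketch.
struct QueryParamsSketch {
    std::size_t top_k = 100;
    std::size_t n_probes = 1;
};

// Toy inverted-file index: one centroid per list; each list holds (id, vector) pairs.
struct ToyIVF {
    std::vector<std::vector<float>> centroids;
    std::vector<std::vector<std::pair<std::uint64_t, std::vector<float>>>> lists;
};

inline float squared_l2(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns up to top_k (distance, id) pairs, nearest first.
inline std::vector<std::pair<float, std::uint64_t>> toy_query(
        const ToyIVF& index, const std::vector<float>& query,
        const QueryParamsSketch& params) {
    assert(index.centroids.size() == index.lists.size());

    // Rank the lists by the distance from the query to each centroid.
    std::vector<std::pair<float, std::size_t>> ranked;
    for (std::size_t l = 0; l < index.centroids.size(); ++l) {
        ranked.emplace_back(squared_l2(query, index.centroids[l]), l);
    }
    std::sort(ranked.begin(), ranked.end());

    // Scan only the n_probes closest lists.
    const std::size_t probes = std::min(params.n_probes, ranked.size());
    std::vector<std::pair<float, std::uint64_t>> hits;
    for (std::size_t p = 0; p < probes; ++p) {
        for (const auto& entry : index.lists[ranked[p].second]) {
            hits.emplace_back(squared_l2(query, entry.second), entry.first);
        }
    }

    // Keep the top_k nearest candidates, sorted by distance.
    const std::size_t k = std::min(params.top_k, hits.size());
    std::partial_sort(hits.begin(), hits.begin() + static_cast<std::ptrdiff_t>(k),
                      hits.end());
    hits.resize(k);
    return hits;
}
```

In this sketch, raising `n_probes` scans more lists, which costs more distance computations but can surface a true nearest neighbor that sits in a list other than the closest one. `top_k` only caps how many of the scanned candidates are returned.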
The `QueryResults` class holds the results from a Query operation, including IDs and distances for the nearest neighbors of each query.
| Method | Return Type | Description |
|---|---|---|
| `Result operator[](size_t query_idx)` | `Result` | Returns read-write access to IDs and distances for a specific query. |
| `const Array2D<uint64_t>& ids() const` | `const Array2D<uint64_t>&` | Get read-only access to all IDs. |
| `const Array2D<float>& distances() const` | `const Array2D<float>&` | Get read-only access to all distances. |
| `size_t num_queries() const` | `size_t` | Returns the number of queries. |
| `size_t top_k() const` | `size_t` | Returns the number of top-k items per query. |
| `bool empty() const` | `bool` | Checks if the results are empty. |
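The shape of the results can be pictured with a small stand-in. `QueryResultsSketch` is hypothetical and assumes IDs and distances are stored as `num_queries` x `top_k` row-major arrays, one row per query. The `id`, `distance`, and `ids_for` accessors are illustrative only and do not exist in the documented class, which exposes `operator[]`, `ids()`, and `distances()` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for QueryResults; the row-major layout is an assumption.
class QueryResultsSketch {
public:
    QueryResultsSketch(std::vector<std::uint64_t> ids, std::vector<float> distances,
                       std::size_t top_k)
        : top_k_(top_k), ids_(std::move(ids)), distances_(std::move(distances)) {
        assert(ids_.size() == distances_.size());
        assert(top_k_ == 0 ? ids_.empty() : ids_.size() % top_k_ == 0);
    }

    std::size_t top_k() const { return top_k_; }
    std::size_t num_queries() const { return top_k_ == 0 ? 0 : ids_.size() / top_k_; }
    bool empty() const { return ids_.empty(); }

    // Illustrative accessors: neighbor `rank` of query `query_idx`.
    std::uint64_t id(std::size_t query_idx, std::size_t rank) const {
        return ids_[query_idx * top_k_ + rank];
    }
    float distance(std::size_t query_idx, std::size_t rank) const {
        return distances_[query_idx * top_k_ + rank];
    }

    // Collects all neighbor IDs for one query, in stored order.
    std::vector<std::uint64_t> ids_for(std::size_t query_idx) const {
        std::vector<std::uint64_t> out;
        for (std::size_t r = 0; r < top_k_; ++r) {
            out.push_back(id(query_idx, r));
        }
        return out;
    }

private:
    std::size_t top_k_;
    std::vector<std::uint64_t> ids_;
    std::vector<float> distances_;
};
```

With two queries and `top_k = 2`, the flat ID list `{10, 11, 20, 21}` would be read as neighbors `{10, 11}` for the first query and `{20, 21}` for the second.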