Location
The Location enum contains the supported index backing store locations for CyborgDB. These are:
enum class Location {
kRedis, // In-memory storage via Redis
kMemory, // Temporary in-memory storage
kPostgres, // Relational database storage
kNone // Undefined storage type
};
DBConfig
DBConfig defines the storage location for various index components.
Constructor
DBConfig(Location location,
const std::optional<std::string>& table_name,
const std::optional<std::string>& db_connection_string);
Parameters
| Parameter | Type | Description |
| location | Location | Specifies the type of storage location. |
| table_name | std::optional<std::string> | (Optional) Name of the table in the database, if applicable. Pass std::nullopt to omit. |
| db_connection_string | std::optional<std::string> | (Optional) Connection string for database access, if applicable. Pass std::nullopt to omit. |
Example Usage
cyborg::DBConfig index_loc(Location::kRedis, std::nullopt, "redis://localhost");
cyborg::DBConfig config_loc(Location::kRedis, std::nullopt, "redis://localhost");
cyborg::DBConfig items_loc(Location::kPostgres, "items", "host=localhost dbname=postgres");
For more information, see the documentation on supported backing stores.
DistanceMetric
The DistanceMetric enum contains the supported distance metrics for CyborgDB. These are:
enum class DistanceMetric {
Cosine,
Euclidean,
SquaredEuclidean
};
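To make the three metrics concrete, here is a minimal sketch of their usual textbook definitions. The function names are hypothetical and not part of the CyborgDB API, and treating cosine distance as 1 minus cosine similarity is an assumption; the page does not state the exact convention the library uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Squared Euclidean distance: sum of squared component differences.
float squared_euclidean(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Euclidean distance: square root of the squared Euclidean distance.
float euclidean(const std::vector<float>& a, const std::vector<float>& b) {
    return std::sqrt(squared_euclidean(a, b));
}

// Cosine distance, taken here as 1 - cosine similarity (an assumed convention).
// The result is undefined if either vector has zero length.
float cosine_distance(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float dot = 0.0f, norm_a = 0.0f, norm_b = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    return 1.0f - dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

Squared Euclidean distance ranks neighbors in the same order as Euclidean distance while skipping the square root, which is why it is often offered as a separate option.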
IndexConfig
IndexConfig is an abstract base class for configuring index types. The three derived classes can be used to configure indexes:
IndexIVF
Ideal for large-scale datasets where fast retrieval is prioritized over high recall:
| Speed | Recall | Index Size |
| Fastest | Lowest | Smallest |
Constructor
IndexIVF(size_t dimension,
size_t n_lists,
DistanceMetric metric = DistanceMetric::Euclidean);
Parameters
| Parameter | Type | Description |
| dimension | size_t | Dimensionality of vector embeddings. |
| n_lists | size_t | Number of inverted index lists to create in the index (a power of 2 is recommended). |
| metric | DistanceMetric | (Optional) Distance metric to use for index build and queries. Defaults to DistanceMetric::Euclidean. |
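Since a power-of-2 value is recommended for n_lists, a small helper can check or round a candidate value before it is passed to the constructor. This is only a sketch; both function names are hypothetical and not part of the CyborgDB API.

```cpp
#include <cstddef>

// Returns true when n is a power of two (1, 2, 4, 8, ...).
constexpr bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Rounds n up to the nearest power of two; returns 1 for n == 0.
// Not guarded against overflow for values near the maximum of std::size_t.
constexpr std::size_t round_up_to_power_of_two(std::size_t n) {
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

For example, a candidate of 1000 lists would be rounded up to 1024.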
IndexIVFFlat
Suitable for applications requiring high recall with less concern for memory usage:
| Speed | Recall | Index Size |
| Fast | Highest | Biggest |
Constructor
IndexIVFFlat(size_t dimension,
size_t n_lists,
DistanceMetric metric = DistanceMetric::Euclidean);
Parameters
| Parameter | Type | Description |
| dimension | size_t | Dimensionality of vector embeddings. |
| n_lists | size_t | Number of inverted index lists to create in the index (a power of 2 is recommended). |
| metric | DistanceMetric | (Optional) Distance metric to use for index build and queries. Defaults to DistanceMetric::Euclidean. |
IndexIVFPQ
Product Quantization compresses embeddings, making it suitable for balancing memory use and recall:
| Speed | Recall | Index Size |
| Fast | High | Medium |
Constructor
IndexIVFPQ(size_t dimension,
size_t n_lists,
size_t pq_dim,
size_t pq_bits,
DistanceMetric metric = DistanceMetric::Euclidean);
Parameters
| Parameter | Type | Description |
| dimension | size_t | Dimensionality of vector embeddings. |
| n_lists | size_t | Number of inverted index lists to create in the index (a power of 2 is recommended). |
| pq_dim | size_t | Dimensionality of embeddings after quantization (less than or equal to dimension). |
| pq_bits | size_t | Number of bits per dimension for PQ embeddings (between 1 and 16). |
| metric | DistanceMetric | (Optional) Distance metric to use for index build and queries. Defaults to DistanceMetric::Euclidean. |
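The constraints in the table above, and their effect on index size, can be sketched as follows. Both helpers are hypothetical and not part of the CyborgDB API. The lower bound of 1 on pq_dim and the tightly packed code layout are assumptions, so the byte count is an estimate rather than the library's actual storage format.

```cpp
#include <cstddef>
#include <stdexcept>

// Checks the constraints listed in the parameter table; throws on violation.
// The requirement pq_dim >= 1 is an assumption added for this sketch.
void validate_pq_params(std::size_t dimension, std::size_t pq_dim, std::size_t pq_bits) {
    if (pq_dim == 0 || pq_dim > dimension) {
        throw std::invalid_argument("pq_dim must be in [1, dimension]");
    }
    if (pq_bits < 1 || pq_bits > 16) {
        throw std::invalid_argument("pq_bits must be in [1, 16]");
    }
}

// Estimated bytes per quantized vector: pq_dim codes of pq_bits bits each,
// rounded up to whole bytes. Assumes tight bit packing.
std::size_t pq_code_bytes(std::size_t pq_dim, std::size_t pq_bits) {
    return (pq_dim * pq_bits + 7) / 8;
}
```

Under these assumptions, a 128-dimensional float vector (512 bytes uncompressed) quantized with pq_dim = 32 and pq_bits = 8 would occupy about 32 bytes.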
Array2D
The Array2D class provides a 2D container for data. It can be initialized with a specific number of rows and columns, or from an existing 1D vector.
Constructors
Array2D(size_t rows, size_t cols, const T& initial_value = T());
Array2D(std::vector<T>&& data, size_t cols);
Array2D(const std::vector<T>& data, size_t cols);
Array2D(size_t rows, size_t cols, const T& initial_value = T()): Creates a 2D array with the specified dimensions, with every element set to initial_value.
Array2D(std::vector<T>&& data, size_t cols): Initializes the 2D array from a 1D vector by moving it; the number of rows is data.size() / cols.
Array2D(const std::vector<T>& data, size_t cols): Initializes the 2D array from a 1D vector by copying it; the number of rows is data.size() / cols.
Access Methods
operator()(size_t row, size_t col) const: Access an element at the specified row and column (read-only).
operator()(size_t row, size_t col): Access an element at the specified row and column (read-write).
size_t rows() const: Returns the number of rows.
size_t cols() const: Returns the number of columns.
size_t size() const: Returns the total number of elements.
Example Usage
// Converting a vector to an array
std::vector<uint8_t> vec = {0, 1, 2, 3, 4, 5, 6, 7};
cyborg::Array2D<uint8_t> arr(vec, 2);
// arr is now a 2D array of 4 rows and 2 columns, with the contents from vec
// Creating a 2D array with 3 rows and 2 columns, initialized to zero
cyborg::Array2D<int> array(3, 2, 0);
// Access and modify elements
array(0, 0) = 1;
array(0, 1) = 2;
// Printing the array
for (size_t i = 0; i < array.rows(); ++i) {
    for (size_t j = 0; j < array.cols(); ++j) {
        std::cout << array(i, j) << " ";
    }
    std::cout << std::endl;
}
TrainingConfig
The TrainingConfig struct defines parameters for training an index, allowing control over convergence and memory usage.
Constructor
TrainingConfig(size_t batch_size = 0,
size_t max_iters = 0,
double tolerance = 1e-6,
size_t max_memory = 0);
Parameters
| Parameter | Type | Description |
| batch_size | size_t | (Optional) Size of each batch for training. Defaults to 0, which auto-selects the batch size. |
| max_iters | size_t | (Optional) Maximum number of training iterations. Defaults to 0, which auto-selects the iteration count. |
| tolerance | double | (Optional) Convergence tolerance for training. Defaults to 1e-6. |
| max_memory | size_t | (Optional) Maximum memory usage during training, in MB. Defaults to 0 (no limit). |
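To illustrate how max_iters and tolerance typically interact, here is a minimal sketch of a clustering loop that stops at whichever limit is reached first. It assumes training is k-means-like (Lloyd's algorithm), which this page does not state, and it works on 1D points for brevity. The names kmeans_1d and KMeansResult are hypothetical, and the auto-selection behavior of a 0 value is not modelled.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct KMeansResult {
    std::vector<double> centroids;
    std::size_t iterations;
};

// Lloyd's algorithm on 1D points. Stops after max_iters iterations, or earlier
// once no centroid moves by more than `tolerance` in a single iteration.
KMeansResult kmeans_1d(const std::vector<double>& points,
                       std::vector<double> centroids,
                       std::size_t max_iters,
                       double tolerance) {
    if (centroids.empty()) {
        return {std::move(centroids), 0};
    }
    std::size_t iter = 0;
    while (iter < max_iters) {
        ++iter;
        std::vector<double> sums(centroids.size(), 0.0);
        std::vector<std::size_t> counts(centroids.size(), 0);
        // Assignment step: attach each point to its nearest centroid.
        for (double p : points) {
            std::size_t best = 0;
            double best_dist = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < centroids.size(); ++c) {
                const double d = std::abs(p - centroids[c]);
                if (d < best_dist) {
                    best_dist = d;
                    best = c;
                }
            }
            sums[best] += p;
            counts[best] += 1;
        }
        // Update step: move each non-empty centroid to the mean of its points.
        double max_shift = 0.0;
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            if (counts[c] == 0) continue;
            const double updated = sums[c] / static_cast<double>(counts[c]);
            max_shift = std::max(max_shift, std::abs(updated - centroids[c]));
            centroids[c] = updated;
        }
        if (max_shift <= tolerance) break;  // Converged within tolerance.
    }
    return {std::move(centroids), iter};
}
```

A looser tolerance ends training sooner at the cost of less settled centroids, while max_iters bounds the worst-case training time regardless of convergence.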
QueryParams
The QueryParams struct defines parameters for querying the index, controlling the number of results and probing behavior.
Constructor
QueryParams(size_t top_k = 100,
size_t n_probes = 1,
std::vector<ResultFields> include = {kDistance},
bool greedy = false,
std::string filters = "");
Parameters
| Parameter | Type | Description |
| top_k | size_t | (Optional) Number of nearest neighbors to return. Defaults to 100. |
| n_probes | size_t | (Optional) Number of lists to probe during query. Defaults to 1. |
| include | std::vector<ResultFields> | (Optional) List of item fields to return. Can include kDistance and kMetadata. Defaults to {kDistance}. |
| greedy | bool | (Optional) Whether to perform greedy search. Defaults to false. |
| filters | std::string | (Optional) A JSON string of filters to apply to vector metadata, limiting the search to matching vectors. Defaults to an empty string. |
Higher n_probes values may improve recall but could slow down query time, so select a value based on desired recall and performance trade-offs.
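The trade-off comes from how an inverted-file index is usually searched: only the lists whose centroids are closest to the query are scanned. The sketch below shows that selection step in isolation. It is a generic illustration under that assumption, not CyborgDB's implementation, and select_probe_lists is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Squared Euclidean distance between two equal-length vectors.
static float sq_dist(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns the indices of the n_probes centroids closest to the query,
// nearest first. n_probes is clamped to the number of centroids.
std::vector<std::size_t> select_probe_lists(
        const std::vector<std::vector<float>>& centroids,
        const std::vector<float>& query,
        std::size_t n_probes) {
    n_probes = std::min(n_probes, centroids.size());
    std::vector<float> dists(centroids.size());
    for (std::size_t i = 0; i < centroids.size(); ++i) {
        dists[i] = sq_dist(centroids[i], query);
    }
    std::vector<std::size_t> order(centroids.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Only the first n_probes positions need to be sorted.
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(n_probes),
                      order.end(),
                      [&dists](std::size_t l, std::size_t r) { return dists[l] < dists[r]; });
    order.resize(n_probes);
    return order;
}
```

Each additional probed list adds candidates to scan, so query cost grows roughly with n_probes, while the chance of missing a true neighbor that sits in a neighboring list shrinks.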
filters use a subset of the MongoDB Query and Projection Operators. For instance:
filters: { "$and": [ { "label": "cat" }, { "confidence": { "$gte": 0.9 } } ] } means that only vectors where label == "cat" and confidence >= 0.9 will be considered for encrypted vector search.
For more info on metadata, see Metadata Filtering.
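In C++, the filter from the example can be written as a raw string literal, which avoids escaping the inner quotes. The sketch below also spells out the predicate that filter expresses in plain C++. The Metadata struct, kExampleFilter, and matches_example_filter are hypothetical names used for illustration only; actual filtering is performed by the library, not by user code.

```cpp
#include <string>

// Hypothetical metadata record, used only to spell out the filter's meaning.
struct Metadata {
    std::string label;
    double confidence;
};

// The filter from the text as a JSON string. A raw string literal avoids
// having to escape the double quotes.
const std::string kExampleFilter =
    R"({ "$and": [ { "label": "cat" }, { "confidence": { "$gte": 0.9 } } ] })";

// Plain C++ equivalent of the condition the filter expresses.
bool matches_example_filter(const Metadata& m) {
    return m.label == "cat" && m.confidence >= 0.9;
}
```

A string such as kExampleFilter would be passed as the filters argument of QueryParams.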
QueryResults
The QueryResults class holds the results from a Query operation, including IDs and distances for the nearest neighbors of each query.
Access Methods
| Method | Return Type | Description |
| Result operator[](size_t query_idx) | Result | Returns read-write access to IDs and distances for a specific query. |
| const std::vector<std::vector<std::string>>& ids() const | const std::vector<std::vector<std::string>>& | Get read-only access to all IDs. |
| const Array2D<float>& distances() const | const Array2D<float>& | Get read-only access to all distances. |
| const std::vector<float>& vectors() const | const std::vector<float>& | Get read-only access to all vectors. |
| const std::vector<std::vector<std::string>>& metadatas() const | const std::vector<std::vector<std::string>>& | Get read-only access to all metadata. |
| size_t num_queries() const | size_t | Returns the number of queries. |
| size_t top_k() const | size_t | Returns the number of top-k items per query. |
| bool empty() const | bool | Checks if the results are empty. |
Example Usage
QueryResults results(num_queries, top_k);
// Access the top-k results for each query
for (size_t i = 0; i < num_queries; ++i) {
    auto result = results[i];
    for (size_t j = 0; j < result.num_results; ++j) {
        std::cout << "ID: " << result.ids[j] << ", Distance: " << result.distances[j] << std::endl;
    }
}
// Get the IDs and distances for all queries
// Bind by const reference to avoid copying the underlying containers
const auto& all_ids = results.ids();
const auto& all_distances = results.distances();
Item
The Item struct holds an individual result from a Get operation, including the requested fields.
struct Item {
const std::string id; // Item ID
const std::vector<float> vector; // Vector embedding
const std::vector<uint8_t> contents; // Decrypted contents
const std::string metadata; // Metadata (JSON string)
};
ItemFields
The ItemFields enum defines the fields that can be requested for an Item object.
enum class ItemFields {
kVector,
kContents,
kMetadata
};
Item IDs are always included in the returned items, regardless of the fields requested.