Types - CyborgDB Docs

Location

The Location enum contains the supported index backing store locations for CyborgDB. These are:

enum class Location {
    kRedis,      // In-memory storage via Redis
    kMemory,     // Temporary in-memory storage
    kPostgres,   // Relational database storage
    kNone        // Undefined storage type
};

DBConfig

DBConfig defines the storage location for various index components.

Constructor

DBConfig(Location location,
                const std::optional<std::string>& table_name,
                const std::optional<std::string>& db_connection_string);

Parameters

Parameter	Type	Description
`location`	`Location`	Specifies the type of storage location.
`table_name`	`std::optional<std::string>`	(Optional) Name of the table in the database, if applicable. Pass `std::nullopt` to omit it.
`db_connection_string`	`std::optional<std::string>`	(Optional) Connection string for database access, if applicable. Pass `std::nullopt` to omit it.

Example Usage

cyborg::DBConfig index_loc(Location::kRedis, std::nullopt, "redis://localhost");
cyborg::DBConfig config_loc(Location::kRedis, std::nullopt, "redis://localhost");
cyborg::DBConfig items_loc(Location::kPostgres, "items", "host=localhost dbname=postgres");

For more information, see the documentation on supported backing stores.

DistanceMetric

The DistanceMetric enum contains the supported distance metrics for CyborgDB. These are:

enum class DistanceMetric {
    Cosine,
    Euclidean,
    SquaredEuclidean
};
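
As a rough illustration of what the three metrics conventionally compute, the sketch below evaluates each one on a pair of vectors. The `DistanceMetric` enum here is a local stand-in that mirrors the documented one, and `distance` is a hypothetical helper, not a CyborgDB function. It assumes cosine distance is defined as 1 minus cosine similarity; CyborgDB's internal definition may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Local stand-in mirroring the documented enum; not the CyborgDB header.
enum class DistanceMetric { Cosine, Euclidean, SquaredEuclidean };

// Hypothetical helper: distance between two equal-length vectors.
// Assumption: cosine distance = 1 - cosine similarity.
// Cosine is undefined if either vector has zero norm.
float distance(const std::vector<float>& a, const std::vector<float>& b,
               DistanceMetric metric) {
    float dot = 0.0f, norm_a = 0.0f, norm_b = 0.0f, sq_diff = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
        const float d = a[i] - b[i];
        sq_diff += d * d;
    }
    switch (metric) {
        case DistanceMetric::Cosine:
            return 1.0f - dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
        case DistanceMetric::Euclidean:
            return std::sqrt(sq_diff);
        case DistanceMetric::SquaredEuclidean:
            return sq_diff;
    }
    return 0.0f;  // not reached for valid enum values
}
```

SquaredEuclidean skips the square root, so it ranks neighbors in the same order as Euclidean while being cheaper to compute.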

IndexConfig

IndexConfig is an abstract base class for configuring index types. Three derived classes can be used to configure indexes: IndexIVF, IndexIVFFlat, and IndexIVFPQ.

For guidance on how to select the right IndexConfig and params, refer to the index configuration tuning guide.

IndexIVF

Ideal for large-scale datasets where fast retrieval is prioritized over high recall:

Speed	Recall	Index Size
Fastest	Lowest	Smallest

Constructor

IndexIVF(size_t dimension,
         size_t n_lists,
         DistanceMetric metric = DistanceMetric::Euclidean);

Parameters

Parameter	Type	Description
`dimension`	`size_t`	Dimensionality of vector embeddings.
`n_lists`	`size_t`	Number of inverted index lists to create in the index (a power of 2 is recommended).
`metric`	`DistanceMetric`	(Optional) Distance metric to use for index build and queries.
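
The `n_lists` description recommends a base-2 value. Assuming this means a power of two, the sketch below rounds a candidate list count up to the next power of two. `round_up_to_power_of_two` is a hypothetical helper written for this page, not part of CyborgDB.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper (not part of CyborgDB): rounds n up to the nearest
// power of two. Returns 1 for n == 0. If n exceeds the largest power of two
// representable in size_t, that largest power is returned instead.
std::size_t round_up_to_power_of_two(std::size_t n) {
    const std::size_t max_power = std::numeric_limits<std::size_t>::max() / 2 + 1;
    std::size_t p = 1;
    while (p < n && p < max_power) {
        p <<= 1;
    }
    return p;
}
```

For example, a candidate of 1000 lists becomes 1024, which could then be passed as the `n_lists` argument of any of the index configurations.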

IndexIVFFlat

Suitable for applications requiring high recall with less concern for memory usage:

Speed	Recall	Index Size
Fast	Highest	Biggest

Constructor

IndexIVFFlat(size_t dimension,
             size_t n_lists,
             DistanceMetric metric = DistanceMetric::Euclidean);

Parameters

Parameter	Type	Description
`dimension`	`size_t`	Dimensionality of vector embeddings.
`n_lists`	`size_t`	Number of inverted index lists to create in the index (a power of 2 is recommended).
`metric`	`DistanceMetric`	(Optional) Distance metric to use for index build and queries.

IndexIVFPQ

Product Quantization compresses embeddings, making it suitable for balancing memory use and recall:

Speed	Recall	Index Size
Fast	High	Medium

Constructor

IndexIVFPQ(size_t dimension,
           size_t n_lists,
           size_t pq_dim,
           size_t pq_bits,
           DistanceMetric metric = DistanceMetric::Euclidean);

Parameters

Parameter	Type	Description
`dimension`	`size_t`	Dimensionality of vector embeddings.
`n_lists`	`size_t`	Number of inverted index lists to create in the index (a power of 2 is recommended).
`pq_dim`	`size_t`	Dimensionality of embeddings after quantization (less than or equal to `dimension`).
`pq_bits`	`size_t`	Number of bits per dimension for PQ embeddings (between 1 and 16).
`metric`	`DistanceMetric`	(Optional) Distance metric to use for index build and queries.
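
To make the `pq_dim` and `pq_bits` constraints concrete, the sketch below checks them and estimates the size of one quantized embedding. `pq_code_bytes` is a hypothetical helper, not a CyborgDB function. It assumes codes are tightly packed at `pq_dim * pq_bits` bits per vector, that `pq_dim` must be at least 1, and it ignores IDs, centroids, and any encryption overhead.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not part of CyborgDB): validates the documented
// IndexIVFPQ constraints and returns the estimated bytes per quantized vector.
// Assumption: one code occupies pq_dim * pq_bits bits, rounded up to bytes.
std::size_t pq_code_bytes(std::size_t dimension, std::size_t pq_dim,
                          std::size_t pq_bits) {
    if (pq_dim == 0 || pq_dim > dimension) {
        throw std::invalid_argument("pq_dim must be in [1, dimension]");
    }
    if (pq_bits < 1 || pq_bits > 16) {
        throw std::invalid_argument("pq_bits must be in [1, 16]");
    }
    return (pq_dim * pq_bits + 7) / 8;  // round up to whole bytes
}
```

Under these assumptions, a 128-dimensional embedding with `pq_dim = 32` and `pq_bits = 8` takes 32 bytes per vector, compared with 512 bytes for the same embedding stored as 32-bit floats.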

Array2D

The Array2D class provides a 2D container for data. It can be initialized with a specific number of rows and columns, or from an existing 1D vector.

Constructors

Array2D(size_t rows, size_t cols, const T& initial_value = T());
Array2D(std::vector<T>&& data, size_t cols);
Array2D(const std::vector<T>& data, size_t cols);

Array2D(size_t rows, size_t cols, const T& initial_value = T()): Creates a 2D array with the specified dimensions, with every element set to initial_value.
Array2D(std::vector<T>&& data, size_t cols): Initializes the 2D array from a 1D vector (move).
Array2D(const std::vector<T>& data, size_t cols): Initializes the 2D array from a 1D vector (copy).

Access Methods

operator()(size_t row, size_t col) const: Access an element at the specified row and column (read-only).
operator()(size_t row, size_t col): Access an element at the specified row and column (read-write).
size_t rows() const: Returns the number of rows.
size_t cols() const: Returns the number of columns.
size_t size() const: Returns the total number of elements.

Example Usage

// Converting a vector to an array
std::vector<uint8_t> vec = {0, 1, 2, 3, 4, 5, 6, 7};
cyborg::Array2D<uint8_t> arr(vec, 2);
// arr is now a 2D array of 4 rows and 2 columns, with the contents from vec

// Creating a 2D array with 3 rows and 2 columns, initialized to zero
cyborg::Array2D<int> array(3, 2, 0);

// Access and modify elements
array(0, 0) = 1;
array(0, 1) = 2;

// Printing the array
for (size_t i = 0; i < array.rows(); ++i) {
    for (size_t j = 0; j < array.cols(); ++j) {
        std::cout << array(i, j) << " ";
    }
    std::cout << std::endl;
}
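
The example above implies that a 1D vector of 8 elements with 2 columns becomes 4 rows. The sketch below shows one way such a container could work, assuming row-major storage where element (row, col) lives at index `row * cols + col`. `Array2DSketch` is an illustrative stand-in, not the actual `cyborg::Array2D` implementation, and the divisibility check is an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Illustrative stand-in for cyborg::Array2D; the real class may differ.
template <typename T>
class Array2DSketch {
public:
    // Creates rows x cols elements, each set to initial_value.
    Array2DSketch(std::size_t rows, std::size_t cols, const T& initial_value = T())
        : rows_(rows), cols_(cols), data_(rows * cols, initial_value) {}

    // Wraps a 1D vector; taking it by value covers both the copy and move cases.
    Array2DSketch(std::vector<T> data, std::size_t cols)
        : rows_(cols == 0 ? 0 : data.size() / cols),
          cols_(cols),
          data_(std::move(data)) {
        if (cols == 0 || data_.size() % cols != 0) {
            throw std::invalid_argument("data size must be a multiple of cols");
        }
    }

    // Row-major element access (no bounds checking in this sketch).
    T& operator()(std::size_t row, std::size_t col) { return data_[row * cols_ + col]; }
    const T& operator()(std::size_t row, std::size_t col) const {
        return data_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    std::size_t size() const { return data_.size(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<T> data_;
};
```

With this layout, wrapping `{0, 1, 2, 3, 4, 5, 6, 7}` with 2 columns places the value 7 at position (3, 1).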

TrainingConfig

The TrainingConfig struct defines parameters for training an index, allowing control over convergence and memory usage.

Constructor

TrainingConfig(size_t batch_size = 0,
                size_t max_iters = 0,
                double tolerance = 1e-6,
                size_t max_memory = 0);

Parameters

Parameter	Type	Description
`batch_size`	`size_t`	(Optional) Size of each batch for training. Defaults to `0`, which auto-selects the batch size.
`max_iters`	`size_t`	(Optional) Maximum iterations for training. Defaults to `0`, which auto-selects iterations.
`tolerance`	`double`	(Optional) Convergence tolerance for training. Defaults to `1e-6`.
`max_memory`	`size_t`	(Optional) Maximum memory usage during training, in MB. Defaults to `0`, which means no limit.

QueryParams

The QueryParams struct defines parameters for querying the index, controlling the number of results and probing behavior.

Constructor

QueryParams(size_t top_k = 100,
            size_t n_probes = 1,
            std::vector<ResultFields> include = {kDistance},
            bool greedy = false,
            std::string filters = "");

Parameters

Parameter	Type	Description
`top_k`	`size_t`	(Optional) Number of nearest neighbors to return. Defaults to `100`.
`n_probes`	`size_t`	(Optional) Number of lists to probe during query. Defaults to `1`.
`include`	`std::vector<ResultFields>`	(Optional) List of item fields to return. Can include `kDistance` and `kMetadata`. Defaults to `{kDistance}`.
`greedy`	`bool`	(Optional) Whether to perform greedy search. Defaults to `false`.
`filters`	`std::string`	(Optional) A JSON string of filters to apply to vector metadata, limiting search scope to the matching vectors. Defaults to an empty string.

Higher n_probes values may improve recall but could slow down query time, so select a value based on desired recall and performance trade-offs. For guidance on how to select the right n_probes, refer to the query parameter tuning guide.
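
The recall and speed trade-off follows from how inverted-list indexes are typically searched: only the lists whose centroids are closest to the query are scanned. The sketch below shows that selection step for a generic IVF index. It is not CyborgDB's internal implementation; `lists_to_probe` and its parameters are hypothetical, and squared Euclidean distance is assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Generic IVF probing sketch; not CyborgDB's internal implementation.
// Returns the indices of the n_probes centroids closest to the query
// (by squared Euclidean distance), nearest first.
std::vector<std::size_t> lists_to_probe(
    const std::vector<std::vector<float>>& centroids,
    const std::vector<float>& query,
    std::size_t n_probes) {
    // Distance from the query to every list centroid.
    std::vector<float> dist(centroids.size(), 0.0f);
    for (std::size_t i = 0; i < centroids.size(); ++i) {
        for (std::size_t d = 0; d < query.size(); ++d) {
            const float diff = centroids[i][d] - query[d];
            dist[i] += diff * diff;
        }
    }
    // Order list indices by distance and keep the closest n_probes.
    std::vector<std::size_t> order(centroids.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    n_probes = std::min(n_probes, order.size());
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(n_probes),
                      order.end(),
                      [&dist](std::size_t a, std::size_t b) { return dist[a] < dist[b]; });
    order.resize(n_probes);
    return order;
}
```

Each additional probe adds one more list to scan, which is why larger `n_probes` values can find neighbors that fall just outside the nearest list at the cost of more work per query.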

filters uses a subset of the MongoDB Query and Projection Operators. For instance, the filter { "$and": [ { "label": "cat" }, { "confidence": { "$gte": 0.9 } } ] } restricts the encrypted vector search to vectors where label == "cat" and confidence >= 0.9. For more information on metadata, see Metadata Filtering.
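
Because `filters` is passed as a JSON string, it has to be assembled before constructing the query parameters. The sketch below builds the example filter above from a label and a minimum confidence. `make_label_confidence_filter` is a hypothetical helper, not part of CyborgDB, and it assumes the label contains no characters that need JSON escaping.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of CyborgDB): builds the JSON filter
// { "$and": [ { "label": <label> }, { "confidence": { "$gte": <min> } } ] }.
// Assumption: `label` contains no quotes or backslashes that need escaping.
std::string make_label_confidence_filter(const std::string& label,
                                         double min_confidence) {
    std::ostringstream out;
    out << R"({ "$and": [ { "label": ")" << label
        << R"(" }, { "confidence": { "$gte": )" << min_confidence
        << R"( } } ] })";
    return out.str();
}
```

The resulting string would then be supplied as the `filters` argument of `QueryParams`. For labels from untrusted input, a JSON library that handles escaping is a safer choice than string concatenation.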

QueryResults

The QueryResults class holds the results from a Query operation, including IDs and distances for the nearest neighbors of each query.

Access Methods

Method	Return Type	Description
`Result operator[](size_t query_idx)`	`Result`	Returns read-write access to IDs and distances for a specific query.
`const std::vector<std::vector<std::string>>& ids() const`	`const std::vector<std::vector<std::string>>&`	Get read-only access to all IDs.
`const Array2D<float>& distances() const`	`const Array2D<float>&`	Get read-only access to all distances.
`const std::vector<float>& vectors() const`	`const std::vector<float>&`	Get read-only access to all vectors.
`const std::vector<std::vector<std::string>>& metadatas() const`	`const std::vector<std::vector<std::string>>&`	Get read-only access to all metadatas.
`size_t num_queries() const`	`size_t`	Returns the number of queries.
`size_t top_k() const`	`size_t`	Returns the number of top-k items per query.
`bool empty() const`	`bool`	Checks if the results are empty.

Example Usage

QueryResults results(num_queries, top_k);

// Access the top-k results for each query
for (size_t i = 0; i < num_queries; ++i) {
    auto result = results[i];
    for (size_t j = 0; j < result.num_results; ++j) {
        std::cout << "ID: " << result.ids[j] << ", Distance: " << result.distances[j] << std::endl;
    }
}

// Get the IDs and distances for all queries
auto all_ids = results.ids();
auto all_distances = results.distances();

Item

The Item struct holds the individual results from a Get operation, including the requested fields.

struct Item {
    const std::string id;                   // Item ID
    const std::vector<float> vector;        // Vector embedding
    const std::vector<uint8_t> contents;    // Decrypted contents
    const std::string metadata;             // Metadata (JSON string)
};

ItemFields

The ItemFields enum defines the fields that can be requested for an Item object.

enum class ItemFields {
    kVector,
    kContents,
    kMetadata
};

By default, ids are always included in the returned items.


