Location

The Location enum contains the supported index backing store locations for CyborgDB. These are:
enum class Location {
    kRedis,      // In-memory storage via Redis
    kMemory,     // Temporary in-memory storage
    kPostgres,   // Relational database storage
    kNone        // Undefined storage type
};

DBConfig

DBConfig defines the storage location for various index components.

Constructor

DBConfig(Location location,
                const std::optional<std::string>& table_name,
                const std::optional<std::string>& db_connection_string);

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| location | Location | Specifies the type of storage location. |
| table_name | std::optional<std::string> | (Optional) Name of the table in the database, if applicable. |
| db_connection_string | std::optional<std::string> | (Optional) Connection string for database access, if applicable. |

Example Usage

cyborg::DBConfig index_loc(cyborg::Location::kRedis, std::nullopt, "redis://localhost");
cyborg::DBConfig config_loc(cyborg::Location::kRedis, std::nullopt, "redis://localhost");
cyborg::DBConfig items_loc(cyborg::Location::kPostgres, "items", "host=localhost dbname=postgres");
For more information, see the documentation on supported backing stores.

GPUConfig

GPUConfig is an enum that specifies which operations should use GPU acceleration. It uses bitflags that can be combined using the | (OR) operator.

Enum Values

enum GPUConfig : uint8_t {
    kNone = 0,                        // No GPU usage
    kUpsert = 1 << 0,                 // Use GPU for upsert operations
    kTrain = 1 << 1,                  // Use GPU for training operations
    kQuery = 1 << 2,                  // Use GPU for query operations
    kAll = kUpsert | kTrain | kQuery  // Use GPU for all operations
};

Example Usage

// Enable GPU for all operations
cyborg::GPUConfig config1 = cyborg::kAll;

// Enable GPU only for training and query
cyborg::GPUConfig config2 = cyborg::kTrain | cyborg::kQuery;

// Enable GPU only for upsert
cyborg::GPUConfig config3 = cyborg::kUpsert;

// Disable GPU completely
cyborg::GPUConfig config4 = cyborg::kNone;
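
Because each flag above is a distinct power of two, a combined value can be tested with a bitwise AND. The following is a minimal, self-contained sketch: it re-declares the enum locally in a `sketch` namespace rather than including the CyborgDB headers, and both the `operator|` overload and the `has_flag` helper are hypothetical illustrations, not confirmed parts of the CyborgDB API.

```cpp
#include <cstdint>

// Local stand-in mirroring the documented enum; not the library's header.
namespace sketch {

enum GPUConfig : std::uint8_t {
    kNone = 0,
    kUpsert = 1 << 0,
    kTrain = 1 << 1,
    kQuery = 1 << 2,
    kAll = kUpsert | kTrain | kQuery
};

// Combine two flag sets, keeping the result typed as GPUConfig.
constexpr GPUConfig operator|(GPUConfig a, GPUConfig b) {
    return static_cast<GPUConfig>(static_cast<std::uint8_t>(a) |
                                  static_cast<std::uint8_t>(b));
}

// Hypothetical helper: true if every bit of `flag` is set in `config`.
constexpr bool has_flag(GPUConfig config, GPUConfig flag) {
    return (static_cast<std::uint8_t>(config) &
            static_cast<std::uint8_t>(flag)) ==
           static_cast<std::uint8_t>(flag);
}

}  // namespace sketch
```

With this sketch, `sketch::has_flag(sketch::kTrain | sketch::kQuery, sketch::kTrain)` is true, while the same check against `sketch::kUpsert` is false. Note that checking for `kNone` is trivially true, since it has no bits set.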

DeviceConfig

The DeviceConfig class holds the device configuration used in vector search operations, such as the number of CPU threads and the GPU acceleration settings.

Constructor

DeviceConfig(const int cpu_threads = 0, const GPUConfig gpu_config = kNone);

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| cpu_threads | int | (Optional) Number of CPU threads to use. Defaults to 0 (use all available cores). |
| gpu_config | GPUConfig | (Optional) GPU operations configuration. Defaults to kNone (no GPU). |

Methods

| Method | Return Type | Description |
| --- | --- | --- |
| cpu_threads() const | int | Get the number of CPU threads configured. |
| gpu_config() const | GPUConfig | Get the GPU operations configuration. |

Example Usage

// 4 CPU threads, GPU enabled for training and query
cyborg::DeviceConfig device_config(4, cyborg::kTrain | cyborg::kQuery);
int threads = device_config.cpu_threads();           // Returns 4
cyborg::GPUConfig gpu = device_config.gpu_config();  // Returns kTrain | kQuery
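
A cpu_threads value of 0 is documented as "use all available cores". How CyborgDB resolves this internally is not described here; the sketch below shows one common way to do it with the standard library. The `resolve_cpu_threads` helper is hypothetical and not part of the CyborgDB API.

```cpp
#include <thread>

namespace sketch {

// Hypothetical helper, not part of CyborgDB: maps the documented
// "0 = all available cores" convention to a concrete thread count.
inline unsigned resolve_cpu_threads(int requested) {
    if (requested > 0) {
        return static_cast<unsigned>(requested);
    }
    // hardware_concurrency() may return 0 when the count is unknown,
    // so fall back to a single thread in that case.
    const unsigned detected = std::thread::hardware_concurrency();
    return detected > 0 ? detected : 1u;
}

}  // namespace sketch
```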

DistanceMetric

The DistanceMetric enum contains the supported distance metrics for CyborgDB. These are:
enum class DistanceMetric {
    Cosine,
    Euclidean,
    SquaredEuclidean
};
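
To make the three metrics concrete, here is a minimal sketch that computes each one for a pair of vectors. It uses a local copy of the enum in a `sketch` namespace, and `compute_distance` is a hypothetical helper. Cosine distance is taken as 1 minus cosine similarity, which is the common convention but an assumption here; CyborgDB's exact definition may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

namespace sketch {

// Local stand-in mirroring the documented enum; not the library's header.
enum class DistanceMetric { Cosine, Euclidean, SquaredEuclidean };

// Illustrative only: distance between two equal-length vectors.
// Cosine is computed as 1 - cosine similarity (an assumption), and is
// undefined (NaN) if either vector has zero length.
inline float compute_distance(const std::vector<float>& a,
                              const std::vector<float>& b,
                              DistanceMetric metric) {
    float dot = 0.0f, norm_a = 0.0f, norm_b = 0.0f, sq_diff = 0.0f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
        const float d = a[i] - b[i];
        sq_diff += d * d;
    }
    switch (metric) {
        case DistanceMetric::Cosine:
            return 1.0f - dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
        case DistanceMetric::Euclidean:
            return std::sqrt(sq_diff);
        case DistanceMetric::SquaredEuclidean:
            return sq_diff;
    }
    return 0.0f;  // Not reached for valid enum values.
}

}  // namespace sketch
```

SquaredEuclidean skips the square root, so it preserves the same nearest-neighbor ordering as Euclidean at lower cost.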

IndexConfig

IndexConfig is an abstract base class for configuring index types. Use one of its three derived classes to configure an index:

IndexIVF

Ideal for large-scale datasets where fast retrieval is prioritized over high recall:

| Speed | Recall | Index Size |
| --- | --- | --- |
| Fastest | Lowest | Smallest |

Constructor

IndexIVF(size_t dimension = 0, std::optional<std::string> embedding_model = std::nullopt);

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| dimension | size_t | 0 | (Optional) Dimensionality of vector embeddings. Auto-detected if 0. |
| embedding_model | std::optional<std::string> | std::nullopt | (Optional) Embedding model name for auto-generation. |

Methods

| Method | Return Type | Description |
| --- | --- | --- |
| dimension() | size_t | Get vector dimensionality. |
| metric() | DistanceMetric | Get distance metric. |
| set_metric(DistanceMetric) | void | Set distance metric. |
| n_lists() | size_t | Get number of inverted lists (initially 1, set during training). |
| set_n_lists(size_t) | void | Set number of inverted lists (usually done automatically during training). |

IndexIVFFlat

Suitable for applications requiring high recall with less concern for memory usage:

| Speed | Recall | Index Size |
| --- | --- | --- |
| Fast | Highest | Biggest |

Constructor

IndexIVFFlat(size_t dimension = 0, std::optional<std::string> embedding_model = std::nullopt);

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| dimension | size_t | 0 | (Optional) Dimensionality of vector embeddings. Auto-detected if 0. |
| embedding_model | std::optional<std::string> | std::nullopt | (Optional) Embedding model name for auto-generation. |

Methods

| Method | Return Type | Description |
| --- | --- | --- |
| dimension() | size_t | Get vector dimensionality. |
| metric() | DistanceMetric | Get distance metric. |
| set_metric(DistanceMetric) | void | Set distance metric. |
| n_lists() | size_t | Get number of inverted lists (initially 1, set during training). |
| set_n_lists(size_t) | void | Set number of inverted lists (usually done automatically during training). |

IndexIVFFlat is the default index configuration and is suitable for most use cases.

IndexIVFPQ

Product Quantization compresses embeddings, making it suitable for balancing memory use and recall:

| Speed | Recall | Index Size |
| --- | --- | --- |
| Fast | High | Medium |

Constructor

IndexIVFPQ(size_t dimension = 0, size_t pq_dim = 16, size_t pq_bits = 8,
           std::optional<std::string> embedding_model = std::nullopt);

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| dimension | size_t | 0 | (Optional) Dimensionality of vector embeddings. Auto-detected if 0. |
| pq_dim | size_t | 16 | Dimensionality of embeddings after quantization (less than or equal to dimension). |
| pq_bits | size_t | 8 | Number of bits per dimension for PQ embeddings (between 1 and 16). |
| embedding_model | std::optional<std::string> | std::nullopt | (Optional) Embedding model name for auto-generation. |

Methods

| Method | Return Type | Description |
| --- | --- | --- |
| dimension() | size_t | Get vector dimensionality. |
| metric() | DistanceMetric | Get distance metric. |
| set_metric(DistanceMetric) | void | Set distance metric. |
| n_lists() | size_t | Get number of inverted lists (initially 1, set during training). |
| set_n_lists(size_t) | void | Set number of inverted lists (usually done automatically during training). |
| pq_dim() | size_t | Get PQ dimensionality. |
| pq_bits() | size_t | Get PQ bits per quantizer. |
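
The parameter constraints and the memory trade-off of IndexIVFPQ can be sketched as follows. Both helpers are hypothetical and not part of the CyborgDB API. The check mirrors the documented limits (pq_dim no larger than dimension, pq_bits between 1 and 16) and additionally assumes pq_dim must be non-zero. The size estimate assumes each encoded vector stores pq_dim codes of pq_bits each, which follows from the parameter descriptions above but ignores any overhead the library may add.

```cpp
#include <cstddef>

namespace sketch {

// Hypothetical check of the documented IndexIVFPQ parameter constraints.
// A dimension of 0 means "auto-detect", so pq_dim cannot be compared yet.
constexpr bool valid_pq_params(std::size_t dimension, std::size_t pq_dim,
                               std::size_t pq_bits) {
    return pq_dim > 0 && pq_bits >= 1 && pq_bits <= 16 &&
           (dimension == 0 || pq_dim <= dimension);
}

// Estimated bytes per PQ-encoded vector: pq_dim codes of pq_bits each,
// rounded up to whole bytes. Ignores any library-specific overhead.
constexpr std::size_t pq_code_bytes(std::size_t pq_dim, std::size_t pq_bits) {
    return (pq_dim * pq_bits + 7) / 8;
}

}  // namespace sketch
```

With the documented defaults (pq_dim = 16, pq_bits = 8), this estimate gives 16 bytes per encoded vector, compared with 4 * dimension bytes for uncompressed 32-bit floats.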

Array2D

Array2D is a class template, parameterized on the element type T, that provides a 2D container for data. It can be initialized with a specific number of rows and columns, or from an existing vector.

Constructors

Array2D(size_t rows, size_t cols, const T& initial_value = T());
Array2D(std::vector<T>&& data, size_t cols);
Array2D(const std::vector<T>& data, size_t cols);
Array2D(std::initializer_list<std::initializer_list<T>> init_list);
Array2D(Array2D&& other) noexcept;
Array2D(const Array2D& other);
Array2D();
  • Array2D(size_t rows, size_t cols, const T& initial_value = T()): Creates a 2D array with specified dimensions, initialized with the given value.
  • Array2D(std::vector<T>&& data, size_t cols): Initializes the 2D array from a 1D vector (move semantics).
  • Array2D(const std::vector<T>& data, size_t cols): Initializes the 2D array from a 1D vector (copy).
  • Array2D(std::initializer_list<std::initializer_list<T>> init_list): Initializes from a nested initializer list (e.g., {{1, 2}, {3, 4}}).
  • Array2D(Array2D&& other) noexcept: Move constructor - transfers ownership without copying.
  • Array2D(const Array2D& other): Copy constructor - creates a deep copy.
  • Array2D(): Default constructor - creates an empty array (0 rows, 0 columns).

Access Methods

  • operator()(size_t row, size_t col) const: Access an element at the specified row and column (read-only).
  • operator()(size_t row, size_t col): Access an element at the specified row and column (read-write).
  • size_t rows() const: Returns the number of rows.
  • size_t cols() const: Returns the number of columns.
  • size_t size() const: Returns the total number of elements.

Example Usage

// Converting a vector to an array
std::vector<uint8_t> vec = {0, 1, 2, 3, 4, 5, 6, 7};
cyborg::Array2D<uint8_t> arr(vec, 2);
// arr is now a 2D array of 4 rows and 2 columns, with the contents from vec

// Creating a 2D array with 3 rows and 2 columns, initialized to zero
cyborg::Array2D<int> array(3, 2, 0);

// Access and modify elements
array(0, 0) = 1;
array(0, 1) = 2;

// Printing the array
for (size_t i = 0; i < array.rows(); ++i) {
    for (size_t j = 0; j < array.cols(); ++j) {
        std::cout << array(i, j) << " ";
    }
    std::cout << std::endl;
}
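
The vector-to-array conversion above turns 8 elements with 2 columns into 4 rows. The sketch below shows how such a container can map a (row, column) pair onto a flat vector. `FlatGrid` is a hypothetical stand-in, not the library's Array2D, and it assumes row-major order (element (r, c) stored at index r * cols + c), which is the usual convention but is not stated in this reference.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a 2D view over a flat vector (row-major assumed).
template <typename T>
class FlatGrid {
public:
    FlatGrid(std::vector<T> data, std::size_t cols)
        : data_(std::move(data)), cols_(cols) {
        // The element count must divide evenly into rows of `cols` entries.
        assert(cols_ > 0 && data_.size() % cols_ == 0);
    }

    std::size_t rows() const { return data_.size() / cols_; }
    std::size_t cols() const { return cols_; }
    std::size_t size() const { return data_.size(); }

    // Row-major indexing: row r starts at offset r * cols.
    T& operator()(std::size_t row, std::size_t col) {
        return data_[row * cols_ + col];
    }
    const T& operator()(std::size_t row, std::size_t col) const {
        return data_[row * cols_ + col];
    }

private:
    std::vector<T> data_;
    std::size_t cols_;
};

}  // namespace sketch
```

Under the row-major assumption, a grid built from {0, 1, 2, 3, 4, 5, 6, 7} with 2 columns holds 2 at position (1, 0) and 7 at position (3, 1).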

TrainingConfig

The TrainingConfig struct defines parameters for training an index, allowing control over convergence and memory usage.

Constructor

TrainingConfig(std::optional<size_t> n_lists = std::nullopt,
               std::optional<size_t> batch_size = std::nullopt,
               std::optional<size_t> max_iters = std::nullopt,
               std::optional<double> tolerance = std::nullopt,
               std::optional<size_t> max_memory = std::nullopt);

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| n_lists | std::optional<size_t> | (Optional) Number of inverted lists to create. Defaults to std::nullopt, stored as 0 (auto-determine). |
| batch_size | std::optional<size_t> | (Optional) Size of each batch for training. Defaults to std::nullopt, stored as 2048. |
| max_iters | std::optional<size_t> | (Optional) Maximum iterations for training. Defaults to std::nullopt, stored as 100. |
| tolerance | std::optional<double> | (Optional) Convergence tolerance for training. Defaults to std::nullopt, stored as 1e-6. |
| max_memory | std::optional<size_t> | (Optional) Maximum memory usage during training, in MB. Defaults to std::nullopt, stored as 0 (no limit). |

Struct Members

Note: The struct members are stored in this order (different from constructor parameter order):
size_t batch_size;   // Batch size (default: 2048)
size_t max_iters;    // Maximum iterations (default: 100)
double tolerance;    // Convergence tolerance (default: 1e-6)
size_t max_memory;   // Maximum memory in MB (default: 0, no limit)
size_t n_lists;      // Number of inverted lists (default: 0, auto-determine)
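
One way the optional constructor arguments could map onto these members is with std::optional::value_or, as in the sketch below. `TrainingConfigSketch` is a local stand-in, not the library's type; its fallback values are taken from the member comments above, and the actual resolution logic inside CyborgDB may differ.

```cpp
#include <cstddef>
#include <optional>

namespace sketch {

// Local stand-in, not the library's type: shows how std::nullopt arguments
// could fall back to the documented member defaults.
struct TrainingConfigSketch {
    std::size_t batch_size;
    std::size_t max_iters;
    double tolerance;
    std::size_t max_memory;
    std::size_t n_lists;

    // Parameter order follows the documented constructor; the initializer
    // list follows the member declaration order.
    explicit TrainingConfigSketch(
        std::optional<std::size_t> n_lists_in = std::nullopt,
        std::optional<std::size_t> batch_size_in = std::nullopt,
        std::optional<std::size_t> max_iters_in = std::nullopt,
        std::optional<double> tolerance_in = std::nullopt,
        std::optional<std::size_t> max_memory_in = std::nullopt)
        : batch_size(batch_size_in.value_or(2048)),
          max_iters(max_iters_in.value_or(100)),
          tolerance(tolerance_in.value_or(1e-6)),
          max_memory(max_memory_in.value_or(0)),  // 0 = no limit
          n_lists(n_lists_in.value_or(0)) {}      // 0 = auto-determine
};

}  // namespace sketch
```

Passing std::nullopt for an early argument keeps its default while overriding a later one, for example `sketch::TrainingConfigSketch cfg(std::nullopt, 512);` to change only the batch size.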

QueryParams

The QueryParams struct defines parameters for querying the index, controlling the number of results and probing behavior.

Constructor

explicit QueryParams(size_t top_k = 100,
                     size_t n_probes = 0,
                     std::string filters = "",
                     std::vector<ResultFields> include = {},
                     bool greedy = false);

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| top_k | size_t | (Optional) Number of nearest neighbors to return. Defaults to 100. |
| n_probes | size_t | (Optional) Number of lists to probe during query. Defaults to 0, which auto-determines the optimal number of probes. |
| filters | std::string | (Optional) A JSON string of filters to apply to vector metadata, limiting search scope to these vectors. |
| include | std::vector<ResultFields> | (Optional) List of result fields to return. Can include kDistance and kMetadata. Defaults to empty. |
| greedy | bool | (Optional) Whether to perform greedy search. Defaults to false. |

Higher n_probes values may improve recall but can slow down queries, so select a value based on the desired trade-off between recall and performance.

The filters parameter uses a subset of the MongoDB Query and Projection Operators. For instance, the filter { "$and": [ { "label": "cat" }, { "confidence": { "$gte": 0.9 } } ] } means that only vectors where label == "cat" and confidence >= 0.9 are considered for encrypted vector search. For more information on metadata, see Metadata Filtering.
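
Because filters is passed as a JSON string, a C++ raw string literal is a convenient way to write it without escaping the inner quotes. The sketch below builds the example filter shown above; `make_cat_filter` is a hypothetical helper, not part of the CyborgDB API.

```cpp
#include <string>

namespace sketch {

// Hypothetical helper: returns the example filter as a JSON string.
// The raw string literal R"(...)" keeps the inner double quotes unescaped.
inline std::string make_cat_filter() {
    return R"({ "$and": [ { "label": "cat" }, { "confidence": { "$gte": 0.9 } } ] })";
}

}  // namespace sketch
```

Based on the constructor signature documented above, the resulting string would be passed as the third argument, for example `QueryParams params(10, 0, sketch::make_cat_filter());`.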

QueryResults

The QueryResults class holds the results of a Query operation, including the IDs and distances of the nearest neighbors for each query.

Access Methods

| Method | Return Type | Description |
| --- | --- | --- |
| operator[](size_t query_idx) | Result | Returns read-write access to IDs and distances for a specific query. |
| ids() const | const Array2D<ItemID>& | Get read-only access to all IDs as a 2D array. |
| distances() const | const Array2D<float>& | Get read-only access to all distances as a 2D array. |
| metadata() | std::vector<std::vector<std::string>>& | Get read-write access to all metadata. |
| num_queries() const | size_t | Returns the number of queries. |
| top_k() const | size_t | Returns the number of top-k items per query. |
| empty() const | bool | Checks if the results are empty. |

Example Usage

QueryResults results(num_queries, top_k);

// Access the top-k results for each query
for (size_t i = 0; i < num_queries; ++i) {
    auto result = results[i];
    for (size_t j = 0; j < result.num_results; ++j) {
        std::cout << "ID: " << result.ids[j] << ", Distance: " << result.distances[j] << std::endl;
    }
}

// Get the IDs and distances for all queries.
// Bind by const reference: plain `auto` would copy the returned arrays.
const auto& all_ids = results.ids();
const auto& all_distances = results.distances();

ItemID

ItemID is a type alias for unique identifiers used throughout CyborgDB.
using ItemID = std::string;
ItemID uniquely identifies vectors and items within an encrypted index. It is currently implemented as std::string for flexibility and human-readable identifiers.

IndexType

The IndexType enum defines the supported index types in CyborgDB:
enum IndexType {
    IVF,     // Inverted File index
    IVFPQ,   // Inverted File with Product Quantization
    IVFFLAT  // Inverted File with flat (uncompressed) storage
};
All three index types are available:
  • IVF: Fastest retrieval, lowest recall, smallest index size
  • IVFPQ: Balanced memory usage and recall with product quantization
  • IVFFLAT: Highest recall, largest index size, no compression

Item

The Item struct holds an individual result from a Get operation, including the requested fields.
struct Item {
    const std::string id;                   // Item ID
    const std::vector<float> vector;        // Vector embedding
    const std::vector<uint8_t> contents;    // Decrypted contents
    const std::string metadata;             // Metadata (JSON string)
};

ResultFields

The ResultFields enum specifies which fields to include in query results.
enum class ResultFields {
    kDistance,    // Include distance scores in query results
    kMetadata     // Include metadata in query results
};

ItemFields

The ItemFields enum defines the fields that can be requested for an Item object.
enum class ItemFields {
    kVector,       // Include vector in returned items
    kMetadata,     // Include metadata in returned items
    kContents      // Include content data in returned items
};
IDs are always included in the returned items, regardless of the fields requested.