Encryption

CyborgDB encrypts all sensitive data — including embeddings, document IDs, contents, and metadata — end-to-end, ensuring that nothing is stored or processed in plaintext without authorization. This section details the cryptographic design, the indexing scheme, and how forward privacy is achieved.

The technology discussed in this guide is covered by Cyborg’s IP portfolio, including US Patents 12,164,664, 11,860,875, 11,423,028 and 10,977,315, among others. Any use is subject to licensing agreements.

1. Data Encryption

All stored records in CyborgDB are encrypted using:

AES-256 in GCM mode
Per-record unique nonces to prevent ciphertext pattern reuse
Authenticated encryption, ensuring both confidentiality and tamper detection

This symmetric encryption is applied to:

Vector Embeddings

Document IDs

Document Contents

Metadata Fields

This guarantees that even with full disk access, data remains encrypted and cannot be modified undetected.
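As a rough illustration of the properties listed above, the sketch below encrypts a record with AES-256-GCM using a fresh random nonce per record. It is not CyborgDB's implementation. It assumes the third-party `cryptography` package, and the helper names `encrypt_record` and `decrypt_record`, the nonce-prefixed blob layout, and the use of a field label as associated data are all hypothetical choices made for the example.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the size recommended for GCM


def encrypt_record(key: bytes, plaintext: bytes, field: bytes) -> bytes:
    """Encrypt one record; returns nonce || ciphertext || tag (assumed layout)."""
    nonce = os.urandom(NONCE_SIZE)  # fresh random nonce for every record
    return nonce + AESGCM(key).encrypt(nonce, plaintext, field)


def decrypt_record(key: bytes, blob: bytes, field: bytes) -> bytes:
    """Decrypt a record; raises InvalidTag if the blob or field label was altered."""
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, field)


key = AESGCM.generate_key(bit_length=256)
first = encrypt_record(key, b"example document contents", b"contents")
second = encrypt_record(key, b"example document contents", b"contents")

# Same plaintext, different nonces: the stored blobs differ.
assert first != second
assert decrypt_record(key, first, b"contents") == b"example document contents"

# Flipping a single bit of the stored blob is detected at decryption time.
tampered = first[:-1] + bytes([first[-1] ^ 1])
try:
    decrypt_record(key, tampered, b"contents")
    tamper_detected = False
except InvalidTag:
    tamper_detected = True
```

The two assertions correspond to the two guarantees in this section: a unique nonce per record avoids repeated ciphertext patterns, and the authentication tag makes undetected modification fail.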

2. Forward-Secure Cryptographic Indexing

CyborgDB also encrypts the search index used for approximate nearest neighbor (ANN) retrieval. The index is constructed using a forward-secure cryptographic scheme inspired by research in forward-private searchable encryption, combined with Cyborg’s patented enhancements.

Forward privacy ensures that newly inserted entries cannot be linked to prior search queries, even if the index is compromised later.

Key Concepts

Per-cluster seeds via HMAC
Each ANN cluster is assigned a unique seed derived from the master key using HMAC(MasterKey, ClusterID).
Binary tree indexing with SHA-3
The ANN cluster is represented as a binary tree. Internal node keys are derived from their children using SHA-3.
Token-scoped search
Search queries produce tokens that only target the minimal set of node keys required to execute the query. Unnecessary index nodes are ignored.
In-use encryption
Queries are executed over ciphertext nodes, preventing leakage from memory.
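The first two concepts can be sketched with Python's standard library. This is an illustrative reading of the description, not CyborgDB's actual construction. The choice of HMAC-SHA256, the 8-byte cluster ID encoding, the `leaf_key` derivation, and all function names are assumptions introduced for the example.

```python
import hashlib
import hmac


def cluster_seed(master_key: bytes, cluster_id: int) -> bytes:
    """Seed = HMAC(MasterKey, ClusterID); HMAC-SHA256 is an assumption here."""
    return hmac.new(master_key, cluster_id.to_bytes(8, "big"), hashlib.sha256).digest()


def leaf_key(seed: bytes, leaf_index: int) -> bytes:
    """Hypothetical leaf derivation: HMAC of the leaf position under the cluster seed."""
    return hmac.new(seed, b"leaf" + leaf_index.to_bytes(8, "big"), hashlib.sha256).digest()


def parent_key(left: bytes, right: bytes) -> bytes:
    """Internal node key derived from its two children with SHA-3."""
    return hashlib.sha3_256(left + right).digest()


def tree_levels(seed: bytes, num_leaves: int) -> list[list[bytes]]:
    """Build every level of the key tree, leaves first; num_leaves must be a power of two."""
    if num_leaves < 1 or num_leaves & (num_leaves - 1):
        raise ValueError("num_leaves must be a power of two")
    levels = [[leaf_key(seed, i) for i in range(num_leaves)]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([parent_key(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


master_key = bytes(32)  # all-zero demo key; use a random 256-bit key in practice
levels = tree_levels(cluster_seed(master_key, cluster_id=7), num_leaves=4)
root = levels[-1][0]
```

Because every key is a deterministic function of the master key and the cluster ID, a holder of the master key can regenerate any node key on demand rather than storing it.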

Cryptographic Vector Indexing Flow

CyborgDB's cryptographic indexing flow transforms a single vector embedding into multiple encrypted components:

Vector clustering: The clustering model assigns each embedding to an optimal cluster for efficient ANN search
Deterministic key derivation: The master key and cluster ID generate a unique seed for this cluster
Forward privacy: The counter map ensures identical embeddings get different cryptographic treatment on each insertion
Dual-purpose keys: SHA-3 produces two keys serving different functions, one for storage indexing and one for encryption
Secure storage: The final encrypted embedding can only be decrypted with the derived Node Key R

This design ensures that even identical vector embeddings result in different ciphertexts and index entries, preventing correlation attacks across insertions.
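A minimal sketch of the counter-based part of this flow follows, assuming the per-cluster counter is mixed into a SHA-3 hash of the cluster seed. The `Indexer` class, the use of SHA3-512 split into two 32-byte halves, and the byte encodings are hypothetical; the source only states that a counter map and SHA-3 are involved and that two keys result.

```python
import hashlib
import hmac
from collections import defaultdict


def derive_node_keys(master_key: bytes, cluster_id: int, counter: int) -> tuple[bytes, bytes]:
    """Return (index_key, encryption_key) for one insertion into a cluster."""
    seed = hmac.new(master_key, cluster_id.to_bytes(8, "big"), hashlib.sha256).digest()
    digest = hashlib.sha3_512(seed + counter.to_bytes(8, "big")).digest()
    # Assumed split: the first half addresses the entry in storage,
    # the second half is the key that encrypts the embedding.
    return digest[:32], digest[32:]


class Indexer:
    """Tracks a per-cluster insertion counter (the 'counter map')."""

    def __init__(self, master_key: bytes) -> None:
        self.master_key = master_key
        self.counters: defaultdict[int, int] = defaultdict(int)

    def next_keys(self, cluster_id: int) -> tuple[bytes, bytes]:
        counter = self.counters[cluster_id]
        self.counters[cluster_id] = counter + 1  # advance so the next insert gets new keys
        return derive_node_keys(self.master_key, cluster_id, counter)


indexer = Indexer(master_key=bytes(32))  # all-zero demo key
first = indexer.next_keys(cluster_id=3)   # keys for the first insertion
second = indexer.next_keys(cluster_id=3)  # same cluster, e.g. an identical embedding
```

Here `first` and `second` differ in both halves even though the cluster is the same, which is the property that keeps repeated insertions of an identical embedding from producing matching index entries.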

3. Cryptographic Approximate Nearest Neighbor Search

CyborgDB’s search execution maintains cryptographic protection throughout the entire query process, leveraging the cryptographic index discussed above.

Token Generation

When a client initiates a query:

Query vector preparation: The search vector undergoes the same clustering model to determine target clusters
Search token derivation: For each target cluster, the client recreates the cryptographic key hierarchy
Minimal key generation: Only the node keys required for the query path are computed

This process generates the exact same keys used during indexing, but only for the clusters relevant to the query. The client now possesses the minimal set of tokens needed to decrypt the required index nodes.

This token-scoped approach ensures that irrelevant index nodes remain opaque ciphertexts, minimizing information leakage.
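The token generation steps might look like the following on the client, under the same illustrative assumptions as before (HMAC-SHA256 seeds, SHA3-512 split into a lookup half and a decryption half). The function names, the nearest-centroid probe selection, and the `counters` argument holding per-cluster insertion counts are hypothetical.

```python
import hashlib
import hmac


def nearest_clusters(query: list[float], centroids: dict[int, list[float]], n_probes: int) -> list[int]:
    """Pick the n_probes cluster IDs whose centroids are closest to the query."""
    def sq_dist(cluster_id: int) -> float:
        return sum((q - c) ** 2 for q, c in zip(query, centroids[cluster_id]))
    return sorted(centroids, key=sq_dist)[:n_probes]


def search_tokens(
    master_key: bytes, cluster_ids: list[int], counters: dict[int, int]
) -> dict[int, list[tuple[bytes, bytes]]]:
    """Re-derive (lookup_key, decryption_key) pairs, but only for the target clusters."""
    tokens: dict[int, list[tuple[bytes, bytes]]] = {}
    for cluster_id in cluster_ids:
        seed = hmac.new(master_key, cluster_id.to_bytes(8, "big"), hashlib.sha256).digest()
        pairs = []
        for counter in range(counters.get(cluster_id, 0)):
            digest = hashlib.sha3_512(seed + counter.to_bytes(8, "big")).digest()
            pairs.append((digest[:32], digest[32:]))
        tokens[cluster_id] = pairs
    return tokens


centroids = {0: [0.0, 0.0], 1: [10.0, 10.0], 2: [0.0, 1.0]}
counters = {0: 2, 1: 5, 2: 1}  # entries inserted per cluster so far
targets = nearest_clusters([0.1, 0.2], centroids, n_probes=2)
tokens = search_tokens(bytes(32), targets, counters)
```

In this example no keys are derived for cluster 1, so its five entries stay opaque to whoever receives the tokens.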

Encrypted Index Traversal

With search tokens in hand, the server executes the query while maintaining cryptographic protection. Retrieval and decryption proceed as follows:

Token-based retrieval: Search tokens are used as lookup keys in the encrypted KV store
Optional decryption: Embeddings can be decrypted for final distance computation

This ensures that the server can execute efficient ANN search while never having persistent access to plaintext embeddings or the ability to decrypt irrelevant index nodes. Alternatively, node decryption can happen client-side, ensuring the server only deals with ciphertext and never sees plaintext embeddings.
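Token-based retrieval can be pictured as a plain key-value lookup in which the server only ever handles opaque bytes. The sketch below is a toy model with a Python `dict` standing in for the encrypted KV store; `fetch_nodes` and the placeholder keys and blobs are hypothetical, and decryption is deliberately left out to model the client-side variant.

```python
def fetch_nodes(store: dict[bytes, bytes], lookup_keys: list[bytes]) -> dict[bytes, bytes]:
    """Return only the encrypted nodes addressed by the supplied tokens."""
    return {key: store[key] for key in lookup_keys if key in store}


# Placeholder store: 32-byte index keys mapping to opaque ciphertext blobs.
store = {
    b"\x01" * 32: b"ciphertext-a",
    b"\x02" * 32: b"ciphertext-b",
    b"\x03" * 32: b"ciphertext-c",
}

# The client supplies tokens for two nodes; one of them does not exist in the store.
requested = [b"\x01" * 32, b"\x03" * 32, b"\x09" * 32]
fetched = fetch_nodes(store, requested)
```

The entry under `b"\x02" * 32` is never touched, and the returned values are still ciphertext, so decryption can happen wherever the deployment places it.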

4. Security Benefits

CyborgDB’s multi-layered cryptographic design provides comprehensive protection against various threat scenarios, leveraging Cyborg’s patented technology to achieve security properties impossible with traditional approaches.

Threat Protection Matrix

Threat Scenario	Protection Mechanism
Disk theft	AES-256-GCM encryption at rest
Network interception	TLS + AEAD encryption in transit
Server compromise	In-use encryption prevents extraction of plaintext embeddings or node keys
Index leakage	Forward privacy hides relationship between new inserts and past searches
Embedding inversion	Plaintext embeddings are never exposed; inversion is not possible without keys

Security Properties

Property	Implementation	Benefit
Query Privacy	Token-scoped key derivation	Server never sees full index keys
Result Confidentiality	End-to-end encryption	Plaintext results never leave client
Forward Security	Counter-based key evolution	Past compromises don’t affect future queries
Index Obfuscation	Ciphertext-only storage	Unused index nodes remain opaque
Correlation Resistance	Per-insertion randomization	Identical vectors produce different ciphertexts

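Traditional encrypted search schemes are vulnerable to statistical attacks that correlate stored entries with one another or with earlier queries. CyborgDB's forward-secure design, combined with per-insertion randomization, is built to prevent these correlation attacks.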

5. Performance

While comprehensive cryptographic protection typically comes with significant overhead, CyborgDB’s implementation has been extensively optimized to achieve production-grade performance that rivals plaintext systems.

Performance Optimizations

CyborgDB achieves exceptional performance through several key optimizations:

Hardware acceleration: Leveraging AES-NI and SHA extensions available in modern processors
Parallel key derivation: Computing node keys concurrently across multiple clusters
Lazy decryption: Only decrypting index nodes when accessed during search traversal
Vectorized operations: Batch processing of similarity computations in encrypted space
Cache-optimized layouts: Organizing encrypted index structures for optimal memory access patterns

These optimizations deliver industry-leading performance:

>10,000 QPS at 95% recall for ANN search over encrypted embeddings
Sub-10ms query latency for typical workloads
