CyborgDB encrypts all sensitive data — including embeddings, document IDs, contents, and metadata — end-to-end, ensuring that nothing is stored or processed in plaintext without authorization. This section details the cryptographic design, the indexing scheme, and how forward privacy is achieved.
The technology discussed in this guide is covered by Cyborg’s IP portfolio, including US Patents 12,164,664, 11,860,875, 11,423,028 and 10,977,315, among others. Any use is subject to licensing agreements.

1. Data Encryption

All stored records in CyborgDB are encrypted using:
  • AES-256 in GCM (Galois/Counter Mode)
  • Per-record unique nonces (IVs) to prevent ciphertext pattern reuse
  • Authenticated encryption, ensuring both confidentiality and tamper detection
This symmetric encryption is applied to:
  • Vector Embeddings
  • Document IDs
  • Document Contents
  • Metadata Fields

This guarantees that even with full disk access, data remains encrypted and cannot be modified undetected.
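A minimal sketch of per-field authenticated encryption follows. It assumes the third-party Python `cryptography` package (its `AESGCM` class) as a stand-in for CyborgDB's internal implementation; the record layout (nonce prepended to ciphertext), the helpers `encrypt_field`/`decrypt_field`, and the sample record are hypothetical. It exercises the three properties listed above: AES-256-GCM, a fresh nonce per encryption, and tamper detection through the authentication tag.

```python
import os
import struct

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the standard size for GCM


def encrypt_field(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt one field under a fresh random nonce; returns nonce || ciphertext || tag."""
    nonce = os.urandom(NONCE_LEN)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)


def decrypt_field(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce and decrypt; raises InvalidTag if the blob was modified."""
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)


key = AESGCM.generate_key(bit_length=256)  # 256-bit key, i.e. AES-256

# Hypothetical record: each field is encrypted separately.
record = {
    "embedding": struct.pack("<4f", 0.12, -0.5, 0.33, 0.9),
    "id": b"doc-42",
    "contents": b"quarterly report",
    "metadata": b'{"owner": "finance"}',
}
encrypted = {name: encrypt_field(key, value) for name, value in record.items()}

# Same plaintext, two encryptions: different nonces give different ciphertexts.
same_twice_differs = encrypt_field(key, b"doc-42") != encrypt_field(key, b"doc-42")

# Flip one ciphertext bit; the GCM tag check must reject the modified blob.
tampered = bytearray(encrypted["contents"])
tampered[NONCE_LEN] ^= 0x01
try:
    decrypt_field(key, bytes(tampered))
    tamper_detected = False
except InvalidTag:
    tamper_detected = True
```

A random 96-bit nonce is the conventional choice for GCM; the essential requirement is that a nonce is never reused under the same key.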

2. Forward-Secure Cryptographic Indexing

CyborgDB also encrypts the search index used for approximate nearest neighbor (ANN) retrieval. The index is constructed using a forward-secure cryptographic scheme inspired by research in forward-private searchable encryption, combined with Cyborg’s patented enhancements.
Forward privacy ensures that newly inserted entries cannot be linked to prior search queries, even if the index is compromised later.

Key Concepts

  1. Per-cluster seeds via HMAC
    Each ANN cluster is assigned a unique seed derived from the master key using HMAC(MasterKey, ClusterID).
  2. Binary tree indexing with SHA-3
    The ANN cluster is represented as a binary tree. Internal node keys are derived from their children using SHA-3.
  3. Token-scoped search
    Search queries produce tokens that only target the minimal set of node keys required to execute the query. Unnecessary index nodes are ignored.
  4. In-use encryption
    Queries are executed over ciphertext nodes, preventing leakage from memory.
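The first two key concepts can be sketched with Python's standard `hmac` and `hashlib` modules. This is an illustration under stated assumptions, not CyborgDB's implementation: HMAC-SHA256 for the per-cluster seed, the leaf derivation SHA3-256(seed || index), and padding odd-sized levels by duplicating the last node are all hypothetical choices. Only "seed = HMAC(MasterKey, ClusterID)" and "internal node keys derived from their children with SHA-3" come from the text above.

```python
import hashlib
import hmac


def cluster_seed(master_key: bytes, cluster_id: int) -> bytes:
    # Per-cluster seed: HMAC(MasterKey, ClusterID). HMAC-SHA256 is an assumption.
    return hmac.new(master_key, cluster_id.to_bytes(8, "big"), hashlib.sha256).digest()


def leaf_keys(seed: bytes, num_leaves: int) -> list[bytes]:
    # Hypothetical leaf derivation: SHA3-256(seed || leaf index).
    return [hashlib.sha3_256(seed + i.to_bytes(8, "big")).digest() for i in range(num_leaves)]


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Return tree levels from the leaves up to the root; parent = SHA3-256(left || right)."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2:  # odd-sized level: pair the last node with itself
            level = level + [level[-1]]
        levels.append([
            hashlib.sha3_256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ])
    return levels


master_key = bytes(32)  # placeholder key for illustration only
seed_a = cluster_seed(master_key, 7)
seed_b = cluster_seed(master_key, 8)  # a different cluster gets an unrelated seed
tree = build_tree(leaf_keys(seed_a, 5))
root = tree[-1][0]
```

Because every step is deterministic, anyone holding the master key can rebuild the same tree for a cluster, while a party without it sees only unrelated-looking 32-byte values.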

Cryptographic Vector Indexing Flow

CyborgDB transforms a single vector embedding into multiple encrypted components as follows:
  1. Vector clustering: The clustering model assigns each embedding to an optimal cluster for efficient ANN search
  2. Deterministic key derivation: The master key and cluster ID generate a unique seed for this cluster
  3. Forward privacy: The counter map ensures identical embeddings get different cryptographic treatment on each insertion
  4. Dual-purpose keys: SHA-3 produces two keys with different functions, one for storage indexing and one for encryption
  5. Secure storage: The final encrypted embedding can only be decrypted with the derived Node Key R
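The five indexing steps can be sketched as follows. The 2-D centroids, the `Indexer` class, a counter kept per cluster, HMAC-SHA256 for the seed, and splitting one SHA3-512 digest into a storage label and Node Key R are all assumptions made for illustration; the encryption in step 5 is left as a comment rather than implemented.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical centroids standing in for CyborgDB's clustering model.
CENTROIDS = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]


def assign_cluster(vec):
    # Step 1: nearest centroid by squared Euclidean distance.
    return min(
        range(len(CENTROIDS)),
        key=lambda c: sum((v - x) ** 2 for v, x in zip(vec, CENTROIDS[c])),
    )


def cluster_seed(master_key, cluster_id):
    # Step 2: HMAC(MasterKey, ClusterID); HMAC-SHA256 is an assumption.
    return hmac.new(master_key, cluster_id.to_bytes(8, "big"), hashlib.sha256).digest()


class Indexer:
    def __init__(self, master_key):
        self.master_key = master_key
        self.counters = defaultdict(int)  # Step 3: counter map (assumed one counter per cluster)

    def index(self, vec):
        cluster_id = assign_cluster(vec)
        seed = cluster_seed(self.master_key, cluster_id)
        counter = self.counters[cluster_id]
        self.counters[cluster_id] += 1  # advance so the next insertion derives fresh keys
        # Step 4: one SHA3-512 digest split into two 32-byte keys (assumed split).
        digest = hashlib.sha3_512(seed + counter.to_bytes(8, "big")).digest()
        storage_label, node_key_r = digest[:32], digest[32:]
        # Step 5 (not shown): encrypt vec with node_key_r, e.g. under AES-256-GCM,
        # and store the ciphertext under storage_label.
        return cluster_id, storage_label, node_key_r


indexer = Indexer(bytes(32))  # placeholder master key for illustration only
first = indexer.index((0.9, 1.1))
second = indexer.index((0.9, 1.1))  # identical embedding inserted again
```

Both copies land in the same cluster, but because the counter advances on every insertion, the second copy gets a different storage label and a different Node Key R.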
This design ensures that even identical vector embeddings result in different ciphertexts and index entries, preventing correlation attacks across insertions.

CyborgDB’s search execution maintains cryptographic protection throughout the entire query process, leveraging the cryptographic index discussed above.

Token Generation

When a client initiates a query:
  1. Query vector preparation: The search vector undergoes the same clustering model to determine target clusters
  2. Search token derivation: For each target cluster, the client recreates the cryptographic key hierarchy
  3. Minimal key generation: Only the node keys required for the query path are computed
This process generates the exact same keys used during indexing, but only for the clusters relevant to the query. The client now possesses the minimal set of tokens needed to decrypt the required index nodes.
This token-scoped approach ensures that irrelevant index nodes remain opaque ciphertexts, minimizing information leakage.
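Token generation reuses the indexing derivation, restricted to the target clusters. In this hypothetical sketch the client is assumed to know how many entries each cluster holds (`counters`), and each token is a flat (storage label, node key) pair rather than a path through the binary tree; both are simplifications, as are the HMAC-SHA256 and SHA3-512 choices and the function names.

```python
import hashlib
import hmac


def cluster_seed(master_key: bytes, cluster_id: int) -> bytes:
    # HMAC(MasterKey, ClusterID); HMAC-SHA256 is an assumption.
    return hmac.new(master_key, cluster_id.to_bytes(8, "big"), hashlib.sha256).digest()


def derive_keys(seed: bytes, counter: int) -> tuple[bytes, bytes]:
    # Assumed split of one SHA3-512 digest into (storage label, node key).
    digest = hashlib.sha3_512(seed + counter.to_bytes(8, "big")).digest()
    return digest[:32], digest[32:]


def search_tokens(master_key: bytes, target_clusters: list[int],
                  counters: dict[int, int]) -> list[tuple[bytes, bytes]]:
    """Derive (storage label, node key) pairs only for the clusters the query probes.

    `counters` maps cluster ID -> number of insertions; the client is assumed to know it.
    """
    tokens = []
    for cluster_id in target_clusters:
        seed = cluster_seed(master_key, cluster_id)
        tokens.extend(derive_keys(seed, n) for n in range(counters.get(cluster_id, 0)))
    return tokens


master_key = bytes(32)  # placeholder key for illustration only
counters = {0: 3, 1: 2, 2: 4}
tokens = search_tokens(master_key, [1], counters)  # the query probes cluster 1 only
```

Clusters 0 and 2 contribute no tokens, so their index entries stay opaque even though the client could derive keys for them.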

Encrypted Index Traversal

With search tokens in hand, the server executes the query while maintaining cryptographic protection. The retrieval and decryption process works as follows:
  1. Token-based retrieval: Search tokens are used as lookup keys in the encrypted KV store
  2. Optional decryption: Embeddings can be decrypted for final distance computation
This ensures that the server can execute efficient ANN search while never having persistent access to plaintext embeddings or the ability to decrypt irrelevant index nodes. Alternatively, node decryption can happen client-side, ensuring the server only deals with ciphertext and never sees plaintext embeddings.
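On the server side, token-based retrieval reduces to keyed lookups. The `EncryptedKVStore` class below is hypothetical, and its random byte strings stand in for AES-GCM ciphertexts. The sketch shows that the server touches only entries whose labels appear in the tokens and returns them still encrypted, so decryption can happen either on the server (step 2) or on the client.

```python
import os


class EncryptedKVStore:
    """Hypothetical server-side store: opaque labels map to opaque ciphertext blobs."""

    def __init__(self) -> None:
        self._data: dict[bytes, bytes] = {}
        self.accessed: set[bytes] = set()  # records which entries a query touched

    def put(self, label: bytes, blob: bytes) -> None:
        self._data[label] = blob

    def lookup(self, labels: list[bytes]) -> list[bytes]:
        # Token-based retrieval: labels are used directly as KV keys.
        results = []
        for label in labels:
            blob = self._data.get(label)
            if blob is not None:
                self.accessed.add(label)
                results.append(blob)  # returned still encrypted
        return results


store = EncryptedKVStore()
labels = [os.urandom(32) for _ in range(10)]
for label in labels:
    store.put(label, os.urandom(48))  # random bytes stand in for ciphertext

query = labels[:3] + [os.urandom(32)]  # three real tokens plus one unknown label
results = store.lookup(query)
```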

4. Security Benefits

CyborgDB’s multi-layered cryptographic design provides comprehensive protection against various threat scenarios, leveraging Cyborg’s patented technology to achieve security properties impossible with traditional approaches.

Threat Protection Matrix

| Threat Scenario | Protection Mechanism |
|---|---|
| Disk theft | AES-256-GCM encryption at rest |
| Network interception | TLS + AEAD encryption in transit |
| Server compromise | In-use encryption prevents extraction of plaintext embeddings or node keys |
| Index leakage | Forward privacy hides the relationship between new inserts and past searches |
| Embedding inversion | Embeddings are only ever exposed as ciphertext, so inversion is not possible without the keys |

Security Properties

| Property | Implementation | Benefit |
|---|---|---|
| Query Privacy | Token-scoped key derivation | Server never sees full index keys |
| Result Confidentiality | End-to-end encryption | Plaintext results never leave the client |
| Forward Security | Counter-based key evolution | Past compromises don’t affect future queries |
| Index Obfuscation | Ciphertext-only storage | Unused index nodes remain opaque |
| Correlation Resistance | Per-insertion randomization | Identical vectors produce different ciphertexts |

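Traditional encrypted search schemes are vulnerable to statistical attacks that correlate repeated ciphertexts or access patterns across insertions and queries. CyborgDB’s forward-secure design, combining per-insertion randomization with counter-based key evolution, prevents these correlation attacks.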

5. Performance

While comprehensive cryptographic protection typically comes with significant overhead, CyborgDB’s implementation has been extensively optimized to achieve production-grade performance that rivals plaintext systems.

Performance Optimizations

CyborgDB achieves exceptional performance through several key optimizations:
  • Hardware acceleration: Leveraging AES-NI and SHA extensions available in modern processors
  • Parallel key derivation: Computing node keys concurrently across multiple clusters
  • Lazy decryption: Only decrypting index nodes when accessed during search traversal
  • Vectorized operations: Batch processing of similarity computations in encrypted space
  • Cache-optimized layouts: Organizing encrypted index structures for optimal memory access patterns
These optimizations deliver industry-leading performance:
  • >10,000 QPS at 95% recall for ANN search over encrypted embeddings
  • Sub-10ms query latency for typical workloads
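Of the optimizations listed above, parallel key derivation is the simplest to sketch: each cluster's keys depend only on the master key and that cluster's ID, so clusters can be processed independently. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor`; the derivation itself (HMAC-SHA256 seed, then SHA3-256(seed || counter)) and the function names are hypothetical. In CPython, threads give little speedup for short hash inputs because of the GIL, so this shows the structure of the technique rather than its performance; production code would rely on native threads or processes.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor


def cluster_node_keys(master_key: bytes, cluster_id: int, count: int) -> list[bytes]:
    # Hypothetical derivation: HMAC-SHA256 seed, then SHA3-256(seed || counter).
    seed = hmac.new(master_key, cluster_id.to_bytes(8, "big"), hashlib.sha256).digest()
    return [hashlib.sha3_256(seed + n.to_bytes(8, "big")).digest() for n in range(count)]


def derive_parallel(master_key: bytes, clusters, count: int,
                    workers: int = 4) -> dict[int, list[bytes]]:
    """Derive node keys for each cluster concurrently; clusters share no state."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {c: pool.submit(cluster_node_keys, master_key, c, count) for c in clusters}
        return {c: f.result() for c, f in futures.items()}


master_key = bytes(32)  # placeholder key for illustration only
parallel = derive_parallel(master_key, range(8), count=100)
sequential = {c: cluster_node_keys(master_key, c, 100) for c in range(8)}
```

Because the per-cluster work is independent and deterministic, the parallel result is identical to the sequential one; only the scheduling changes.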