The technology discussed in this guide is covered by Cyborg’s IP portfolio, including US Patents 12,164,664, 11,860,875, 11,423,028 and 10,977,315, among others. Any use is subject to licensing agreements.
1. Data Encryption
All stored records in CyborgDB are encrypted using:
- AES-256 in GCM mode
- Per-record unique initialization vectors (IVs) to prevent ciphertext pattern reuse
- Authenticated encryption, ensuring both confidentiality and tamper detection

Encryption covers every component of a record:
- Vector embeddings
- Document IDs
- Document contents
- Metadata fields
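A minimal sketch of per-record authenticated encryption, assuming the widely used Python `cryptography` package and its `AESGCM` class. The helper names `encrypt_field`, `decrypt_field` and `is_authentic`, the 12-byte random nonce prefixed to each ciphertext, and binding the record ID and field name as associated data are illustrative assumptions, not CyborgDB's actual API or storage format.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the standard size for GCM


def encrypt_field(key: bytes, record_id: str, field: str, plaintext: bytes) -> bytes:
    """Encrypt one record field with AES-256-GCM (hypothetical helper)."""
    nonce = os.urandom(NONCE_LEN)  # fresh for every encryption; never reuse with one key
    aad = f"{record_id}:{field}".encode()  # assumption: bind ciphertext to its record/field
    return nonce + AESGCM(key).encrypt(nonce, plaintext, aad)  # ciphertext || 16-byte tag


def decrypt_field(key: bytes, record_id: str, field: str, blob: bytes) -> bytes:
    """Decrypt and verify; raises InvalidTag if the blob or its context was altered."""
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    aad = f"{record_id}:{field}".encode()
    return AESGCM(key).decrypt(nonce, ciphertext, aad)


def is_authentic(key: bytes, record_id: str, field: str, blob: bytes) -> bool:
    """Return True only if the blob decrypts and verifies for this record and field."""
    try:
        decrypt_field(key, record_id, field, blob)
        return True
    except InvalidTag:
        return False
```

Because the nonce is random, encrypting the same contents twice yields different blobs, and a blob copied to another record ID or modified by a single bit fails authentication instead of decrypting to garbage.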
2. Forward-Secure Cryptographic Indexing
CyborgDB also encrypts the search index used for approximate nearest neighbor (ANN) retrieval. The index is constructed using a forward-secure cryptographic scheme inspired by research in forward-private searchable encryption, combined with Cyborg’s patented enhancements. Forward privacy ensures that newly inserted entries cannot be linked to prior search queries, even if the index is compromised later.
Key Concepts
- Per-cluster seeds via HMAC: Each ANN cluster is assigned a unique seed derived from the master key using HMAC(MasterKey, ClusterID).
- Binary tree indexing with SHA-3: The ANN cluster is represented as a binary tree. Internal node keys are derived from their children using SHA-3.
- Token-scoped search: Search queries produce tokens that target only the minimal set of node keys required to execute the query. Index nodes outside that set are ignored.
- In-use encryption: Queries are executed over ciphertext nodes, preventing plaintext leakage from memory.
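The first two concepts can be sketched with Python's standard `hmac` and `hashlib` modules. This is a hypothetical illustration only: HMAC-SHA-256 as the HMAC hash, leaf keys derived as SHA3-256(seed || leaf index), hashing the concatenation of two children, pairing an odd trailing node with itself, and all helper names are assumptions; the guide does not specify these details.

```python
import hashlib
import hmac


def cluster_seed(master_key: bytes, cluster_id: int) -> bytes:
    """Per-cluster seed = HMAC(MasterKey, ClusterID); SHA-256 is an assumed choice."""
    return hmac.new(master_key, cluster_id.to_bytes(4, "big"), hashlib.sha256).digest()


def leaf_keys(seed: bytes, n_leaves: int) -> list[bytes]:
    """Hypothetical leaf derivation: SHA3-256(seed || leaf index)."""
    return [hashlib.sha3_256(seed + i.to_bytes(8, "big")).digest() for i in range(n_leaves)]


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Build levels bottom-up; each internal key is SHA3-256(left || right).

    Returns every level, leaves first and the single root last.
    An odd node at the end of a level is paired with itself (an assumption).
    """
    levels = [leaves]
    while len(levels[-1]) > 1:
        level = levels[-1]
        parents = []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else left
            parents.append(hashlib.sha3_256(left + right).digest())
        levels.append(parents)
    return levels
```

Everything is deterministic given the master key: the holder of the key can recompute any cluster's seed and tree on demand, while different cluster IDs produce unrelated seeds and therefore unrelated trees.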
Cryptographic Vector Indexing Flow
CyborgDB indexes vectors cryptographically by transforming a single vector embedding into multiple encrypted components:
- Vector clustering: The clustering model assigns each embedding to an optimal cluster for efficient ANN search
- Deterministic key derivation: The master key and cluster ID generate a unique seed for the cluster
- Forward privacy: The counter map ensures identical embeddings get different cryptographic treatment on each insertion
- Dual-purpose keys: SHA-3 produces two keys serving different functions (one for storage indexing, one for encryption)
- Secure storage: The final encrypted embedding can only be decrypted with the derived Node Key R
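The five steps can be traced in a compact, stdlib-only Python sketch. Everything here is hypothetical: `nearest_cluster` (nearest centroid) stands in for the clustering model, HMAC-SHA-256 and two prefixed SHA3-256 calls stand in for the key derivations, the label L serves as the KV key, and XOR with a SHAKE-256 keystream stands in for the real cipher. That stand-in is acceptable here only because each Node Key R encrypts exactly one value, and unlike AES-256-GCM it provides no tamper detection.

```python
import hashlib
import hmac
import struct


def nearest_cluster(vec: list[float], centroids: list[list[float]]) -> int:
    """Hypothetical clustering model: pick the closest centroid (squared L2)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centroids[i])))


def cluster_seed(master_key: bytes, cluster_id: int) -> bytes:
    """Deterministic seed = HMAC(MasterKey, ClusterID); SHA-256 is assumed."""
    return hmac.new(master_key, cluster_id.to_bytes(4, "big"), hashlib.sha256).digest()


def node_keys(seed: bytes, counter: int) -> tuple[bytes, bytes]:
    """Two SHA3-256 outputs with different prefixes: label L and Node Key R."""
    msg = seed + counter.to_bytes(8, "big")
    label = hashlib.sha3_256(b"label" + msg).digest()
    key_r = hashlib.sha3_256(b"encrypt" + msg).digest()
    return label, key_r


def xor_keystream(key: bytes, data: bytes) -> bytes:
    """Stand-in cipher: XOR with a SHAKE-256 keystream (single-use keys only)."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class EncryptedIndex:
    def __init__(self, master_key: bytes, centroids: list[list[float]]):
        self.master_key = master_key
        self.centroids = centroids
        self.counters: dict[int, int] = {}   # counter map: cluster ID -> inserts so far
        self.store: dict[bytes, bytes] = {}  # encrypted KV store: label L -> ciphertext

    def insert(self, vec: list[float]) -> bytes:
        cid = nearest_cluster(vec, self.centroids)           # 1. vector clustering
        seed = cluster_seed(self.master_key, cid)            # 2. deterministic key derivation
        counter = self.counters.get(cid, 0)                  # 3. forward privacy via counter
        self.counters[cid] = counter + 1
        label, key_r = node_keys(seed, counter)              # 4. dual-purpose keys
        plaintext = struct.pack(f"<{len(vec)}f", *vec)
        self.store[label] = xor_keystream(key_r, plaintext)  # 5. secure storage under R
        return label
```

Inserting the same embedding twice advances the cluster's counter, so the two copies land under unrelated labels with unrelated ciphertexts, and a label derived from an earlier counter value never matches a later insert.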
3. Cryptographic Approximate Nearest Neighbor Search
CyborgDB’s search execution maintains cryptographic protection throughout the entire query process, leveraging the cryptographic index described above.
Token Generation
When a client initiates a query:
- Query vector preparation: The search vector is passed through the same clustering model to determine target clusters
- Search token derivation: For each target cluster, the client recreates the cryptographic key hierarchy
- Minimal key generation: Only the node keys required for the query path are computed
This token-scoped approach ensures that irrelevant index nodes remain opaque ciphertexts, minimizing information leakage.
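The token-generation steps can be sketched as follows, under the hypothetical assumptions that the client keeps the per-cluster counter map, that the clustering model is nearest-centroid with `n_probe` probed clusters, and that key derivation uses HMAC-SHA-256 and prefixed SHA3-256 calls. A token here is simply the pair (label L, Node Key R) for one stored entry; CyborgDB's actual token format is not described in this guide.

```python
import hashlib
import hmac


def cluster_seed(master_key: bytes, cluster_id: int) -> bytes:
    """Recreate the per-cluster seed: HMAC(MasterKey, ClusterID), SHA-256 assumed."""
    return hmac.new(master_key, cluster_id.to_bytes(4, "big"), hashlib.sha256).digest()


def node_keys(seed: bytes, counter: int) -> tuple[bytes, bytes]:
    """Recreate (label L, Node Key R) for one insertion counter."""
    msg = seed + counter.to_bytes(8, "big")
    return hashlib.sha3_256(b"label" + msg).digest(), hashlib.sha3_256(b"encrypt" + msg).digest()


def target_clusters(query: list[float], centroids: list[list[float]], n_probe: int) -> list[int]:
    """Run the query through the same (hypothetical) nearest-centroid model."""
    def dist(c: list[float]) -> float:
        return sum((q - x) ** 2 for q, x in zip(query, c))
    return sorted(range(len(centroids)), key=lambda i: dist(centroids[i]))[:n_probe]


def search_tokens(master_key: bytes, query: list[float], centroids: list[list[float]],
                  counters: dict[int, int], n_probe: int = 1) -> list[tuple[bytes, bytes]]:
    """Derive keys only for the probed clusters; other clusters stay opaque."""
    tokens = []
    for cid in target_clusters(query, centroids, n_probe):
        seed = cluster_seed(master_key, cid)
        for counter in range(counters.get(cid, 0)):
            tokens.append(node_keys(seed, counter))  # (label L, Node Key R)
    return tokens
```

Raising `n_probe` trades more derived keys (and more exposed labels) for higher recall, which is the same trade-off a plaintext IVF-style index makes when probing more clusters.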
Encrypted Index Traversal
With search tokens in hand, the server executes the query while maintaining cryptographic protection. Retrieval and decryption proceed as follows:
- Token-based retrieval: Search tokens are used as lookup keys in the encrypted KV store
- Optional decryption: Embeddings can be decrypted for final distance computation
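A hypothetical sketch of the retrieval and optional decryption steps. It assumes tokens are (label L, Node Key R) pairs, that the KV store maps labels to ciphertexts, that embeddings are serialized as little-endian float32, and that ranking uses squared L2 distance; an XOR with a SHAKE-256 keystream stands in for the real cipher. Which party performs decryption and ranking in CyborgDB is not specified here.

```python
import hashlib
import struct


def xor_keystream(key: bytes, data: bytes) -> bytes:
    """Stand-in single-use cipher (SHAKE-256 keystream); XOR is its own inverse."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def retrieve(store: dict[bytes, bytes],
             tokens: list[tuple[bytes, bytes]]) -> list[tuple[bytes, bytes]]:
    """Token-based retrieval: each label L is a lookup key; nothing else is read."""
    return [(key_r, store[label]) for label, key_r in tokens if label in store]


def rank(query: list[float], hits: list[tuple[bytes, bytes]], k: int):
    """Optional decryption: recover each embedding with its key R, rank by distance."""
    scored = []
    for key_r, ciphertext in hits:
        n = len(ciphertext) // 4  # float32 = 4 bytes per dimension
        vec = struct.unpack(f"<{n}f", xor_keystream(key_r, ciphertext))
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((dist, vec))
    scored.sort(key=lambda item: item[0])
    return scored[:k]
```

Entries whose labels are absent from the token set are never read or decrypted, so the rest of the store stays opaque for the duration of the query.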
4. Security Benefits
CyborgDB’s multi-layered cryptographic design protects against the threat scenarios below, leveraging Cyborg’s patented technology to achieve security properties that traditional approaches cannot provide.
Threat Protection Matrix
Threat Scenario | Protection Mechanism |
---|---|
Disk theft | AES-256-GCM encryption at rest |
Network interception | TLS + AEAD encryption in transit |
Server compromise | In-use encryption prevents extraction of plaintext embeddings or node keys |
Index leakage | Forward privacy hides relationship between new inserts and past searches |
Embedding inversion | Only ciphertext embeddings are stored; inversion is not possible without keys |
Security Properties
Property | Implementation | Benefit |
---|---|---|
Query Privacy | Token-scoped key derivation | Server never sees full index keys |
Result Confidentiality | End-to-end encryption | Plaintext results never leave client |
Forward Security | Counter-based key evolution | Tokens from past queries cannot be linked to newly inserted entries |
Index Obfuscation | Ciphertext-only storage | Unused index nodes remain opaque |
Correlation Resistance | Per-insertion randomization | Identical vectors produce different ciphertexts |
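Traditional encrypted search schemes are vulnerable to statistical attacks, in which an adversary correlates repeated ciphertexts or access patterns across insertions and queries. CyborgDB’s per-insertion randomization and forward-secure design prevent these correlation attacks.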
5. Performance
While comprehensive cryptographic protection typically comes with significant overhead, CyborgDB’s implementation has been extensively optimized to achieve production-grade performance that rivals plaintext systems.
Performance Optimizations
CyborgDB achieves this performance through several key optimizations:
- Hardware acceleration: Leveraging AES-NI and SHA extensions available in modern processors
- Parallel key derivation: Computing node keys concurrently across multiple clusters
- Lazy decryption: Only decrypting index nodes when accessed during search traversal
- Vectorized operations: Batch processing of similarity computations in encrypted space
- Cache-optimized layouts: Organizing encrypted index structures for optimal memory access patterns
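Parallel key derivation can be illustrated with Python's `concurrent.futures`. This is a structural sketch under stated assumptions, not CyborgDB's implementation: the per-cluster derivation (HMAC-SHA-256 seed, prefixed SHA3-256 labels) and the helper names are hypothetical. In CPython, `hashlib` releases the GIL only for large buffers, so for short inputs like these the threads mostly interleave; real speedups would need processes or native code.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor


def cluster_seed(master_key: bytes, cluster_id: int) -> bytes:
    """Per-cluster seed via HMAC (SHA-256 assumed)."""
    return hmac.new(master_key, cluster_id.to_bytes(4, "big"), hashlib.sha256).digest()


def cluster_labels(master_key: bytes, cluster_id: int, count: int) -> list[bytes]:
    """Derive every storage label for one cluster (hypothetical SHA3-256 derivation)."""
    seed = cluster_seed(master_key, cluster_id)
    return [hashlib.sha3_256(b"label" + seed + c.to_bytes(8, "big")).digest()
            for c in range(count)]


def parallel_labels(master_key: bytes, counters: dict[int, int],
                    workers: int = 4) -> dict[int, list[bytes]]:
    """Derive keys for several clusters concurrently, one task per cluster."""
    cluster_ids = sorted(counters)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda cid: cluster_labels(master_key, cid, counters[cid]),
                           cluster_ids)
        return dict(zip(cluster_ids, results))
```

Because each cluster's derivation depends only on the master key and that cluster's ID, the tasks share no state and need no locking; `pool.map` also returns results in input order, so pairing them back with `cluster_ids` is safe.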
CyborgDB sustains more than 10,000 QPS at 95% recall for ANN search over encrypted embeddings, with sub-10 ms query latency for typical workloads.