# Threat Model

CyborgDB's threat model defines the adversaries it assumes, the attacks it protects against, and the attacks it explicitly does not address. It guides the design of our encryption, key management, and system architecture.

A well-scoped threat model is essential for evaluating whether CyborgDB's security properties align with your risk profile. This page should be read alongside the [Encryption](./encryption) section.

## 1. Adversary Model

The threat model assumes an adversary may be any of the following:

* A network attacker who can intercept, modify, or replay traffic between client and server.
* A server-side adversary with full read access to server memory, storage, and the code execution environment.
* A privileged operator or cloud provider administrator with direct infrastructure access.
* An observer who can analyze access patterns, ciphertexts, and index structures over time.

CyborgDB does **not** assume the client endpoint is compromised (see [Out of Scope](#6-out-of-scope)), nor does it defend against malicious users with valid credentials performing authorized actions.

## 2. Attack Surfaces

Vector databases present a uniquely dangerous attack surface because they **centralize semantic intelligence** from across an organization's entire data ecosystem. Unlike traditional databases that contain data from a single application, vector databases aggregate embeddings from CRM systems, HR databases, financial records, email communications, and document repositories.

Standard vector databases compound this risk by storing and using embeddings in **plaintext**, making them immediately exploitable upon breach. Once an attacker gains access, they can directly extract dense vector representations and apply machine learning techniques to reconstruct the original sensitive content with high fidelity.

This combination of centralized intelligence and plaintext storage transforms what should be isolated system breaches into organization-wide intelligence compromises. The following attack surfaces become particularly critical in this context:

* Disk-level theft of database files
* Cloud snapshot compromise
* Backup leakage
* Man-in-the-middle interception
* Traffic replay or modification
* Memory scraping from a compromised server
* Runtime introspection of index structures
* Retrieval of embeddings or keys from process space
* Frequency analysis of search tokens
* Correlation between inserted embeddings and prior queries
* Leakage via predictable index structures

## 3. Adversary Capabilities

| Capability                   | Example Sources                                 |
| ---------------------------- | ----------------------------------------------- |
| **Full disk access**         | Stolen storage volume, cloud snapshot           |
| **Full memory access**       | Compromised hypervisor, malicious kernel module |
| **Network interception**     | BGP hijack, malicious ISP                       |
| **Log & telemetry access**   | Misconfigured logging, compromised SIEM         |
| **Code execution on server** | Supply chain attack, RCE in application stack   |

The **Server-Side Adversary** case (full memory, disk, and runtime access) is the primary driver for CyborgDB's *in-use encryption* and *forward-secure index* design.

## 4. Attack Demonstration

Cyborg demonstrated the severity of vector database vulnerabilities at the Confidential Computing Summit (June 2025):

* **Target**: Production-like vector DB with synthetic sensitive data (e.g., social security numbers, medical info)
* **Attack time**: \< 5 minutes from access to sensitive data recovery
* **Recovery rate**: 99.38% successful reconstruction of original documents

For a deep dive on how embedding inversion works, the specific attack vectors, and comparative results across vector databases, see [Embedding Inversion](./embedding-inversion).

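To make the risk of plaintext embeddings concrete, here is a deliberately simplified sketch of a candidate-matching attack. It is not the attack Cyborg demonstrated, and it is far weaker than the inversion techniques named on this page: it assumes the attacker knows the embedding function and holds a list of guesses. `toy_embed`, `recover`, and the sample strings are all hypothetical, and the hashed-trigram "model" is only a stand-in for a real embedding model.

```python
import hashlib
import math

DIM = 64  # toy dimensionality; real embedding models use hundreds or thousands


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed character trigrams."""
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3].encode("utf-8")
        bucket = int.from_bytes(hashlib.sha256(trigram).digest()[:4], "big") % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit length, so a dot product is cosine similarity


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recover(stolen: list[float], candidates: list[str]) -> str:
    """Return the candidate whose embedding is closest to the stolen vector."""
    return max(candidates, key=lambda c: cosine(stolen, toy_embed(c)))


# The attacker reads this vector straight out of a breached plaintext store.
stolen_vector = toy_embed("patient SSN 123-45-6789")

candidates = [
    "quarterly revenue forecast",
    "patient SSN 123-45-6789",
    "team offsite agenda",
]
recovered = recover(stolen_vector, candidates)
print(recovered)
```

The point of the sketch is only that a plaintext vector is directly comparable to anything the attacker can embed. If the stored value were ciphertext, the similarity computation above would have nothing meaningful to operate on.
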
### Attack Flow

```mermaid theme={null}
graph TD
    A(Database Breach) --> B(Embedding Extraction)
    B --> C{Inversion Attack}
    C --> D1[Gradient Optimization]
    C --> D2[Transformer Models]
    C --> D3[Custom ML Techniques]
    D1 --> E(Original Content Recovered)
    D2 --> E
    D3 --> E
```

## 5. Mitigation Mapping

The table below maps specific adversary actions to CyborgDB controls:

| Attack Vector                | Mitigation                                               | Residual Risk                                               |
| ---------------------------- | -------------------------------------------------------- | ----------------------------------------------------------- |
| **Disk theft**               | AES-256-GCM encryption at rest                           | Key theft from KMS would bypass                             |
| **Memory scraping**          | In-use encryption with ephemeral node keys               | Queries in progress may suggest active clusters             |
| **Index structure analysis** | Forward privacy & per-insertion randomization            | Search pattern leakage still possible within active session |
| **Embedding inversion**      | Encrypted embeddings never stored/processed in plaintext | Compromised client could still expose                       |
| **Network interception**     | TLS + AEAD                                               | Endpoint compromise would still allow decryption            |
| **Query correlation**        | Forward-secure cryptographic counters                    | Statistical attacks on *large* query volumes                |
| **Cross-system linking**     | Per-record key derivation with unique IVs                | Metadata correlation if encryption keys compromised         |

These controls assume that standard cryptographic primitives (AES-256, SHA-3, HMAC) remain unbroken and that all encryption keys remain secret. If either assumption fails, the corresponding protections may no longer hold.

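As a rough illustration of the "per-record key derivation with unique IVs" row, the sketch below derives a distinct key for each record from a single master key using HMAC-SHA3-256 and draws a fresh random 96-bit IV per encryption. This is an assumption-laden example, not CyborgDB's actual key schedule: `derive_record_key`, `fresh_iv`, the `record-key:` label, and the record IDs are all hypothetical, and the AES-256-GCM encryption step itself is omitted.

```python
import hashlib
import hmac
import secrets


def derive_record_key(master_key: bytes, record_id: str) -> bytes:
    """Derive a 32-byte per-record key with HMAC-SHA3-256 (illustrative only)."""
    message = b"record-key:" + record_id.encode("utf-8")  # hypothetical domain label
    return hmac.new(master_key, message, hashlib.sha3_256).digest()


def fresh_iv() -> bytes:
    """Return a random 96-bit IV, the size commonly used with AES-GCM."""
    return secrets.token_bytes(12)


# In a real deployment the master key would come from a KMS, not be generated inline.
master_key = secrets.token_bytes(32)

key_a = derive_record_key(master_key, "record-a")
key_b = derive_record_key(master_key, "record-b")
iv_a, iv_b = fresh_iv(), fresh_iv()

print(len(key_a), len(iv_a))
```

Because each record key is an HMAC output under the master key, an attacker who learns one record key gains nothing useful about the master key or any other record's key, as long as HMAC remains unbroken. This is the same assumption the section states for its primitives.
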
### Control Points in the Attack Chain

```mermaid theme={null}
flowchart LR
    A[Adversary] -->|Exploit network| B[Intercept traffic]
    A -->|Steal disk snapshot| C[Access ciphertext at rest]
    A -->|Compromise server| D[Access memory & runtime]
    A -->|Analyze index| E[Pattern & correlation attacks]

    subgraph "CyborgDB Controls"
        F[TLS + AEAD] --> B
        G[AES-256-GCM] --> C
        H[In-use Encryption] --> D
        I[Forward Privacy] --> E
        J[Per-insertion Randomization] --> E
    end

    B --> K[Attack Blocked]
    C --> K
    D --> K
    E --> K
```

To learn more about how CyborgDB implements these protections, read our [Encryption guide](./encryption).

## 6. Out of Scope

CyborgDB **does not** protect against:

* **Client endpoint compromise**: If the user's device is compromised, plaintext data may be exposed during normal operation
* **Authorized insider misuse**: Valid users performing authorized but malicious actions within their permissions
* **Social engineering**: Attacks targeting users to reveal credentials or perform unauthorized actions
* **Physical access to client devices**: Direct access to unlocked client machines
* **Key escrow attacks**: Government or legal compulsion to provide decryption keys (mitigated by BYOK/HYOK)

Organizations should implement complementary controls (endpoint protection, user training, access controls) to address these out-of-scope threat vectors.