CyborgDB Service can be configured entirely via environment variables, entirely via a YAML file, or any mix of the two. This page is the canonical, exhaustive YAML reference — every key the service understands, what it does, and what its default is. For prose explanations and end-to-end walkthroughs, see:

Resolution and precedence

The service resolves a YAML file in this order (first hit wins):
  1. The CYBORGDB_CONFIG_FILE environment variable. If it points to a path that does not exist, startup fails with a hard error.
  2. ./cyborgdb.yaml
  3. ./cyborgdb.yml
  4. /etc/cyborgdb/cyborgdb.yaml
If CYBORGDB_CONFIG_FILE is not set and none of the default locations exists, that is fine: the service falls back to env-only configuration.

Settings precedence, highest to lowest:
  1. Init args (programmatic embedding)
  2. Environment variables
  3. .env file
  4. YAML file
  5. File secrets
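
The resolution order and the precedence list above can be sketched in a few lines of Python. This is an illustration of the documented rules, not the service's actual loader: the function names `resolve_config_file` and `merge_settings` are hypothetical, the injectable `is_file` predicate exists only to make the sketch easy to test, and treating an empty CYBORGDB_CONFIG_FILE as unset is an assumption.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional

# Default search locations, in the documented order.
DEFAULT_CANDIDATES = (
    "./cyborgdb.yaml",
    "./cyborgdb.yml",
    "/etc/cyborgdb/cyborgdb.yaml",
)


def resolve_config_file(
    env: Mapping[str, str] = os.environ,
    is_file: Callable[[Path], bool] = Path.is_file,
) -> Optional[Path]:
    """Return the YAML file to load, or None for env-only configuration."""
    explicit = env.get("CYBORGDB_CONFIG_FILE")
    if explicit:  # assumption: an empty value is treated like an unset one
        path = Path(explicit)
        if not is_file(path):
            # An explicitly configured path that is missing is a hard error.
            raise FileNotFoundError(f"CYBORGDB_CONFIG_FILE points to a missing file: {path}")
        return path
    for candidate in DEFAULT_CANDIDATES:
        path = Path(candidate)
        if is_file(path):
            return path  # first hit wins
    return None  # no file found: fall back to env-only


def merge_settings(
    init_args: Mapping[str, object],
    env_vars: Mapping[str, object],
    dotenv: Mapping[str, object],
    yaml_file: Mapping[str, object],
    file_secrets: Mapping[str, object],
) -> dict:
    """Merge the five sources; a key set in a higher source overrides lower ones."""
    merged: dict = {}
    # Apply the lowest-precedence source first so later updates override it.
    for layer in (file_secrets, yaml_file, dotenv, env_vars, init_args):
        merged.update(layer)
    return merged
```

In practice this means a value such as `port` set in the YAML file is only a default: the same setting supplied through an environment variable or an init argument replaces it.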

Env-var substitution

Any string value in the YAML may reference an environment variable:
  • ${VAR} — required. Startup fails if VAR is unset.
  • ${VAR:-default} — uses default when VAR is unset.
A variable set to the empty string counts as unset. Use this pattern to keep BYOK role ARNs, account IDs, and credentials out of the checked-in YAML.
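
A minimal sketch of these substitution rules, assuming a simple regex-based expander. The `substitute` function and its pattern are illustrative only; the service's real parser may accept a different variable-name grammar or handle nesting and escaping differently.

```python
import os
import re
from typing import Mapping

# Matches ${VAR} and ${VAR:-default}. The default may be empty but,
# in this sketch, cannot itself contain "}".
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Expand ${VAR} and ${VAR:-default} references inside one YAML string."""

    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        current = env.get(name, "")
        if current != "":
            return current
        # Unset or empty: both count as unset.
        if default is not None:
            return default
        raise ValueError(f"required environment variable {name} is unset")

    return _PATTERN.sub(replace, value)
```

For example, `substitute("${REGION:-us-east-1}", {})` yields `us-east-1`, while `substitute("${CYBORGDB_API_KEY}", {})` raises, mirroring the hard failure described above.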

Full schema

cyborgdb.yaml
service:
  # ---- Server ----
  port: 8000                                       # int. Default: 8000.
  require_api_key: true                            # bool. Default: true.
  cyborgdb_service_log_level: INFO                 # DEBUG | INFO | WARNING | ERROR. Default: INFO.

  # ---- TLS / HTTPS (optional; both required, both must exist on disk) ----
  ssl_cert_path: /etc/cyborgdb/tls/cert.pem        # string. Default: unset (HTTP).
  ssl_key_path:  /etc/cyborgdb/tls/key.pem         # string. Default: unset (HTTP).

  # ---- Authentication ----
  cyborgdb_api_key: ${CYBORGDB_API_KEY}            # string. REQUIRED. The X-API-Key clients send.
  cyborgdb_service_root_key: ${CYBORGDB_SERVICE_ROOT_KEY}  # string. Optional. When set, enables RBAC.
                                                   # See ./multi-tenancy.

  # ---- Storage backend ----
  cyborgdb_db_type: disk                           # memory | disk | s3. Default: disk.
  cyborgdb_disk_path: /var/lib/cyborgdb            # string. disk only. Default: ~/.cyborgdb/data
                                                   # (or /app/cyborgdb_data in Docker).

  # S3 settings (used only when cyborgdb_db_type: s3)
  cyborgdb_s3_bucket: my-bucket                    # string. REQUIRED for s3.
  cyborgdb_s3_region: us-east-1                    # string. Optional. Default: us-east-1.
  cyborgdb_s3_prefix: cyborgdb/                    # string. Optional. Default: unset.
  cyborgdb_s3_endpoint: https://minio.internal:9000 # string. Optional. Required for non-AWS endpoints.
  cyborgdb_s3_access_key: ${MINIO_ACCESS_KEY}      # string. Required with cyborgdb_s3_endpoint.
  cyborgdb_s3_secret_key: ${MINIO_SECRET_KEY}      # string. Required with cyborgdb_s3_endpoint.
  cyborgdb_s3_session_token: ${MINIO_SESSION_TOKEN} # string. Optional.

  # ---- Per-keystore RAM cache (applied to every newly created index) ----
  cache_policy_vectors:  false                     # bool. Default: false.
  cache_policy_metadata: false                     # bool. Default: false.
  cache_policy_ids:      false                     # bool. Default: false.

  # ---- Performance ----
  cpu_threads: 0                                   # int. 0 = auto-detect. Default: 0.
  gpu_operations: none                             # none | upsert | train | all | comma-list.
                                                   # Default: none. (Query GPU not yet supported.)
  retrain_threshold: 10000                         # int. Auto-retrain trigger:
                                                   # fires when num_vectors > n_lists * this.
  auto_train_disabled: false                       # bool. Default: false. Set to true to fully
                                                   # disable post-upsert auto-training (explicit
                                                   # train() / POST /v1/indexes/train still work).
                                                   # Also implied when retrain_threshold < 0.

  # ---- KMS cache ----
  index_kek_cache_ttl_seconds: 60                  # int. Default: 60. TTL for plaintext index KEKs
                                                   # in the service-side cache. Shorter = faster
                                                   # KMS revocation propagation; longer = fewer
                                                   # KMS calls.

# ---- Per-index KMS registry (optional; required only if any index uses kms_name) ----
kms:
  registry:
    # Each child is a named slot referenced by create_index(kms_name=...).
    <slot-name>:
      provider: aws-kms                            # aws-kms | aws. REQUIRED per slot.
                                                   #   aws-kms — AWS KMS (HSM-managed KEK).
                                                   #   aws     — AWS Secrets Manager (KEK in
                                                   #             Secrets Manager; local AES-GCM).
      key_id:   alias/cyborgdb-default             # string. REQUIRED. KMS key id/ARN or
                                                   # Secrets Manager name/ARN, depending on provider.
      region:   us-east-1                          # string. REQUIRED.

      # BYOK (cross-account access): optional triple.
      role_arn:    ${ACME_BYOK_ROLE_ARN}           # string. Optional. Service calls sts:AssumeRole.
      external_id: ${ACME_BYOK_EXTERNAL_ID}        # string. Required if role_arn set.
      role_session_name: cyborgdb-acme             # string. Optional. Appears in customer's CloudTrail.
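
The interaction between retrain_threshold and auto_train_disabled in the schema above can be written as a single predicate. This is a sketch of the documented rule only; the function name `should_auto_retrain` is hypothetical, and how the service obtains num_vectors and n_lists is not covered here.

```python
def should_auto_retrain(
    num_vectors: int,
    n_lists: int,
    retrain_threshold: int,
    auto_train_disabled: bool,
) -> bool:
    """Decide whether a post-upsert auto-retrain would fire."""
    # Auto-training is off when disabled explicitly or when the
    # threshold is negative (which implies auto_train_disabled).
    if auto_train_disabled or retrain_threshold < 0:
        return False
    # Documented trigger: num_vectors > n_lists * retrain_threshold.
    return num_vectors > n_lists * retrain_threshold
```

With `retrain_threshold: 10000` and an index of 100 lists, the trigger fires once the index holds more than 1,000,000 vectors. Explicit training calls are unaffected by either setting.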

Storage backend cheat sheet

| Backend | Persistence | Required keys (beyond cyborgdb_db_type) | Credential source |
|---|---|---|---|
| memory | In-process only | (none) | n/a |
| disk (default) | Embedded RocksDB on local disk | cyborgdb_disk_path (optional) | n/a |
| s3 (AWS, instance role) | AWS S3 | cyborgdb_s3_bucket | AWS default credential provider chain (instance/task role, env, profile) |
| s3 (AWS, explicit keys) | AWS S3 | cyborgdb_s3_bucket + AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars | AWS default chain |
| s3 (S3-compatible: MinIO, R2, …) | MinIO/R2 etc. | cyborgdb_s3_bucket, cyborgdb_s3_endpoint, cyborgdb_s3_access_key, cyborgdb_s3_secret_key | Explicit CYBORGDB_S3_* only; AWS chain bypassed |

The CYBORGDB_S3_* namespace is deliberately separate from AWS_* so storage and KMS credentials cannot collide. KMS (under kms.registry) uses the standard AWS credential chain or sts:AssumeRole; S3 storage uses its own explicit keys (or the chain if no explicit keys are set).
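
One way to read the cheat sheet is as a small decision function over the service block. The sketch below is an interpretation of the table, not the service's code: the name `storage_credential_source` and the returned labels are hypothetical, and it does not validate the configuration.

```python
from typing import Mapping, Optional


def storage_credential_source(service: Mapping[str, object]) -> Optional[str]:
    """Describe where S3 storage credentials come from for a service block."""
    db_type = service.get("cyborgdb_db_type", "disk")  # documented default: disk
    if db_type != "s3":
        return None  # memory and disk need no storage credentials

    has_explicit_keys = bool(service.get("cyborgdb_s3_access_key")) and bool(
        service.get("cyborgdb_s3_secret_key")
    )
    if service.get("cyborgdb_s3_endpoint") or has_explicit_keys:
        # Custom endpoints bypass the AWS chain; explicit keys take over.
        return "explicit CYBORGDB_S3_* keys"
    # No endpoint and no explicit keys: instance/task role, env, or profile.
    return "AWS default credential chain"
```

A config that sets only `cyborgdb_db_type: s3` and `cyborgdb_s3_bucket` therefore relies on the AWS default chain, while the MinIO example relies solely on the explicit keys.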

KMS provider matrix

| Provider | Where the wrap key lives | KEK flow | When to choose |
|---|---|---|---|
| aws-kms | AWS KMS (HSM-managed) | Service generates KEK → kms.Encrypt. On load, kms.Decrypt. | HSM isolation; cleanest revocation semantics. |
| aws | AWS Secrets Manager | Service generates KEK → AES-GCM-wraps under the Secrets Manager value. On load, fetches the secret and unwraps locally. | Cross-account BYOK where customers prefer Secrets Manager. |

Both providers accept role_arn + external_id for cross-account (BYOK) access: the service calls sts:AssumeRole before reaching the key on every wrap or unwrap.
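
The per-slot rules (a recognized provider, required key_id and region, and external_id whenever role_arn is set) can be checked before deployment with a short script. This is a sketch of a pre-flight check under those documented rules; `check_kms_slot` is a hypothetical helper, not part of CyborgDB, and it works on an already parsed and substituted slot mapping.

```python
from typing import List, Mapping

VALID_PROVIDERS = ("aws-kms", "aws")


def check_kms_slot(name: str, slot: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in one kms.registry slot."""
    problems: List[str] = []
    for key in ("provider", "key_id", "region"):
        if not slot.get(key):
            problems.append(f"kms.registry.{name}: missing required key '{key}'")

    provider = slot.get("provider")
    if provider and provider not in VALID_PROVIDERS:
        problems.append(
            f"kms.registry.{name}: provider must be one of "
            f"{', '.join(VALID_PROVIDERS)}; got '{provider}'"
        )

    # BYOK: external_id is required as soon as role_arn is present.
    if slot.get("role_arn") and not slot.get("external_id"):
        problems.append(f"kms.registry.{name}: external_id is required when role_arn is set")
    return problems
```

Running such a check in CI surfaces a malformed slot early, rather than at the point where the first index references it.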

Minimal viable configs

Dev — disk, single key:
service:
  cyborgdb_api_key: ${CYBORGDB_API_KEY}
Production — S3 on AWS with instance role, TLS, RBAC:
service:
  port: 8443
  cyborgdb_api_key:      ${CYBORGDB_API_KEY}
  cyborgdb_service_root_key: ${CYBORGDB_SERVICE_ROOT_KEY}
  ssl_cert_path: /etc/ssl/certs/cyborgdb.crt
  ssl_key_path:  /etc/ssl/private/cyborgdb.key
  cyborgdb_db_type:   s3
  cyborgdb_s3_bucket: cyborgdb-prod
  cyborgdb_s3_region: us-east-1
Production — MinIO + per-tenant BYOK:
service:
  cyborgdb_api_key:      ${CYBORGDB_API_KEY}
  cyborgdb_service_root_key: ${CYBORGDB_SERVICE_ROOT_KEY}
  cyborgdb_db_type:      s3
  cyborgdb_s3_bucket:    cyborgdb
  cyborgdb_s3_endpoint:  https://minio.internal:9000
  cyborgdb_s3_access_key: ${MINIO_ACCESS_KEY}
  cyborgdb_s3_secret_key: ${MINIO_SECRET_KEY}
  index_kek_cache_ttl_seconds: 30

kms:
  registry:
    vendor-default:
      provider: aws-kms
      key_id:   alias/cyborgdb-default
      region:   us-east-1
    customer-acme:
      provider:    aws
      key_id:      customers/acme/wrap-key
      region:      us-west-2
      role_arn:    ${ACME_BYOK_ROLE_ARN}
      external_id: ${ACME_BYOK_EXTERNAL_ID}

Validation behavior

  • Invalid cyborgdb_db_type (anything not in memory | disk | s3) — startup fails fast with a clear error.
  • CYBORGDB_CONFIG_FILE set to a missing path — hard error.
  • ${VAR} referencing an unset env var — hard error at parse time.
  • KMS slot with missing provider/key_id/region — load-time error when the first index references the slot.
  • cyborgdb_s3_endpoint set without explicit cyborgdb_s3_access_key + cyborgdb_s3_secret_key — startup fails (the AWS chain is bypassed for custom endpoints).
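
The storage-related checks in this list can be mirrored in a small pre-flight validator. The sketch below is an assumption-laden illustration: `collect_startup_errors` and `validate_service` are hypothetical names, the bucket requirement is taken from the schema comments rather than this list, and applying the S3 checks only when cyborgdb_db_type is s3 is an assumption about scope.

```python
from typing import List, Mapping

VALID_DB_TYPES = ("memory", "disk", "s3")


def collect_startup_errors(service: Mapping[str, object]) -> List[str]:
    """Collect storage configuration errors from a parsed service block."""
    errors: List[str] = []
    db_type = service.get("cyborgdb_db_type", "disk")  # documented default: disk
    if db_type not in VALID_DB_TYPES:
        errors.append(
            f"cyborgdb_db_type must be one of {', '.join(VALID_DB_TYPES)}; got {db_type!r}"
        )

    if db_type == "s3":  # assumption: S3 keys are only checked for the s3 backend
        if not service.get("cyborgdb_s3_bucket"):
            errors.append("cyborgdb_s3_bucket is required when cyborgdb_db_type is s3")
        has_keys = bool(service.get("cyborgdb_s3_access_key")) and bool(
            service.get("cyborgdb_s3_secret_key")
        )
        if service.get("cyborgdb_s3_endpoint") and not has_keys:
            # Custom endpoints bypass the AWS chain, so explicit keys are mandatory.
            errors.append(
                "cyborgdb_s3_endpoint requires cyborgdb_s3_access_key and cyborgdb_s3_secret_key"
            )
    return errors


def validate_service(service: Mapping[str, object]) -> None:
    """Fail fast with one message listing every problem found."""
    errors = collect_startup_errors(service)
    if errors:
        raise ValueError("invalid configuration:\n  " + "\n  ".join(errors))
```

Collecting all problems before raising is a design choice of this sketch; it lets a single run report every misconfigured key instead of stopping at the first one.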

See also