
DBConfig

The DBConfig class specifies the storage location for the index, with options for in-memory storage or databases.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| location | string | - | DB location: "redis", "postgres", "rocksdb", "memory", or "threadsafememory" |
| table_name | string | None | (Optional) Table name (Postgres only) |
| connection_string | string | None | (Optional) Connection string used to access the DB |

The supported location options are:
  • "redis": Use for high-speed, in-memory storage (recommended for index_location)
  • "postgres": Use for reliable, SQL-based storage (recommended for config_location)
  • "rocksdb": Use for persistent, on-disk key-value storage
  • "memory": Use for temporary in-memory storage (for benchmarking and evaluation purposes)
  • "threadsafememory": Use for thread-safe in-memory storage (for multi-threaded benchmarking)

Example Usage

from cyborgdb_core import DBConfig

# Redis configuration
index_location = DBConfig(
    location="redis",
    connection_string="redis://localhost:6379"
)

# PostgreSQL configuration
config_location = DBConfig(
    location="postgres",
    table_name="config_table",
    connection_string="host=localhost dbname=vectordb user=postgres"
)

# RocksDB configuration
rocksdb_location = DBConfig(
    location="rocksdb",
    connection_string="/path/to/rocksdb"
)

# Memory configuration (for testing)
memory_location = DBConfig(location="memory")

# Thread-safe memory configuration (for multi-threaded testing)
ts_memory_location = DBConfig(location="threadsafememory")

Embeddings

The Embedded LangChain integration accepts any LangChain Embeddings implementation:

Supported Embedding Types

| Type | Description | Example |
| --- | --- | --- |
| Embeddings | Any LangChain Embeddings implementation | OpenAIEmbeddings(), HuggingFaceEmbeddings() |

Example Usage

from langchain_openai import OpenAIEmbeddings
from cyborgdb_core.integrations.langchain import CyborgVectorStore
from cyborgdb_core import DBConfig

# Using LangChain Embeddings
store = CyborgVectorStore(
    index_name="docs",
    index_key=key,
    api_key="your-api-key",
    embedding=OpenAIEmbeddings(),
    index_location=DBConfig("memory"),
    config_location=DBConfig("memory")
)

DistanceMetric

DistanceMetric is a string representing the distance metric used for the index. Options include:
  • "cosine": Cosine similarity (recommended for normalized embeddings)
  • "euclidean": Euclidean distance
  • "squared_euclidean": Squared Euclidean distance

Metric Characteristics

| Metric | Range | Best Match | Use Case |
| --- | --- | --- | --- |
| cosine | [0, 2] | 0 | Text embeddings, normalized vectors |
| euclidean | [0, ∞) | 0 | Raw feature vectors |
| squared_euclidean | [0, ∞) | 0 | When avoiding sqrt computation |
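
The formulas behind these three metrics can be sketched in plain Python. The helper names below are illustrative only (the library computes distances internally), but the math matches the ranges in the table above:

```python
import math

def cosine_distance(a, b):
    # 1 - cos(theta); ranges over [0, 2], with 0 for identical directions
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Straight-line distance; ranges over [0, inf)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean_distance(a, b):
    # Same ranking as euclidean, but skips the sqrt computation
    return sum((x - y) ** 2 for x, y in zip(a, b))

a, b = [1.0, 0.0], [0.0, 1.0]
print(cosine_distance(a, b))             # orthogonal vectors -> 1.0
print(euclidean_distance(a, b))          # sqrt(2), about 1.414
print(squared_euclidean_distance(a, b))  # 2.0
```

Because squared Euclidean distance is a monotonic transform of Euclidean distance, both produce identical nearest-neighbor rankings; the squared variant simply avoids the sqrt per comparison.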

IndexType

The index type determines the algorithm used for approximate nearest neighbor search.

Available Index Types

| Type | Description | Speed | Recall | Index Size |
| --- | --- | --- | --- | --- |
| "ivfflat" | Inverted file with flat storage | Fast | Highest | Biggest |
| "ivfpq" | Inverted file with product quantization | Fast | High | Medium |
| "ivfsq" | Inverted file with scalar quantization | Fast | High | Small |

The default index type for the Embedded library is "ivfsq".

Example Usage

from langchain_openai import OpenAIEmbeddings
from cyborgdb_core.integrations.langchain import CyborgVectorStore
from cyborgdb_core import DBConfig

# IVFFlat index (highest recall)
store = CyborgVectorStore(
    index_name="high_recall_index",
    index_key=key,
    api_key="your-api-key",
    embedding=OpenAIEmbeddings(),
    index_location=DBConfig("memory"),
    config_location=DBConfig("memory"),
    index_type="ivfflat",
    index_config_params={"n_lists": 1024}
)

# IVFPQ index (balanced performance)
store = CyborgVectorStore(
    index_name="balanced_index",
    index_key=key,
    api_key="your-api-key",
    embedding=OpenAIEmbeddings(),
    index_location=DBConfig("memory"),
    config_location=DBConfig("memory"),
    index_type="ivfpq",
    index_config_params={
        "n_lists": 1024,
        "pq_dim": 64,
        "pq_bits": 8
    }
)

IndexConfigParams

Optional parameters for configuring the index, passed as a dictionary.

Parameters by Index Type

IVFFlat

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| n_lists | int | 1024 | Number of inverted lists (clusters) |

IVFPQ

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| n_lists | int | 1024 | Number of inverted lists (clusters) |
| pq_dim | int | 8 | Dimensionality after product quantization |
| pq_bits | int | 8 | Bits per quantized dimension (1-16) |

IVFSQ

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| n_lists | int | 1024 | Number of inverted lists (clusters) |
| sq_bits | int | 8 | Bits per scalar quantized value |

Tuning Guidelines

  • n_lists: Use √n where n is the expected number of vectors. Common values: 256, 512, 1024, 2048
  • pq_dim: Should divide the embedding dimension evenly. Lower values = more compression
  • pq_bits / sq_bits: 8 bits provides good balance. Lower = more compression, higher = better accuracy
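
The guidelines above can be encoded as short helpers. These function names are hypothetical (not part of the library); they just make the √n rule and the divisibility constraint on pq_dim concrete:

```python
import math

def suggest_n_lists(n_vectors, choices=(256, 512, 1024, 2048)):
    # Pick the common value closest to sqrt(n), per the rule of thumb above
    target = math.sqrt(n_vectors)
    return min(choices, key=lambda c: abs(c - target))

def valid_pq_dims(embedding_dim):
    # pq_dim should divide the embedding dimension evenly
    return [d for d in range(1, embedding_dim + 1) if embedding_dim % d == 0]

print(suggest_n_lists(1_000_000))  # sqrt(1e6) = 1000 -> 1024
print(valid_pq_dims(1536)[:8])     # smallest valid pq_dim values for a 1536-dim embedding
```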

Document

LangChain Document object used for storing text with metadata.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| page_content | str | The text content of the document |
| metadata | dict | Optional metadata associated with the document |

Example Usage

from langchain_core.documents import Document

# Create a document
doc = Document(
    page_content="This is the content of my document",
    metadata={
        "source": "manual",
        "author": "John Doe",
        "timestamp": "2024-01-01"
    }
)

# Add to vector store
store.add_documents([doc])

Filter Format

Metadata filters use a dictionary format for querying documents.

Simple Filters

# Exact match
filter = {"category": "technology"}

# Multiple conditions (AND)
filter = {
    "category": "technology",
    "year": 2024
}

Advanced Filters

# Range queries
filter = {
    "price": {"$gte": 100, "$lte": 500}
}

# IN queries
filter = {
    "tags": {"$in": ["python", "machine-learning"]}
}

# Nested fields
filter = {
    "metadata.author": "John Doe"
}

Supported Operators

| Operator | Description | Example |
| --- | --- | --- |
| $eq | Equal to | {"age": {"$eq": 25}} |
| $ne | Not equal to | {"status": {"$ne": "archived"}} |
| $gt | Greater than | {"price": {"$gt": 100}} |
| $gte | Greater than or equal | {"score": {"$gte": 0.8}} |
| $lt | Less than | {"quantity": {"$lt": 10}} |
| $lte | Less than or equal | {"rating": {"$lte": 5}} |
| $in | In array | {"tags": {"$in": ["ai", "ml"]}} |
| $nin | Not in array | {"category": {"$nin": ["draft", "deleted"]}} |
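
To make the operator semantics concrete, here is a minimal pure-Python matcher that mirrors this filter format. It is an illustration only, not the library's implementation, and it omits nested "metadata.author"-style paths:

```python
OPS = {
    "$eq":  lambda v, arg: v == arg,
    "$ne":  lambda v, arg: v != arg,
    "$gt":  lambda v, arg: v > arg,
    "$gte": lambda v, arg: v >= arg,
    "$lt":  lambda v, arg: v < arg,
    "$lte": lambda v, arg: v <= arg,
    "$in":  lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
}

def matches(metadata, flt):
    # All top-level conditions are ANDed together
    for field, cond in flt.items():
        value = metadata.get(field)
        if isinstance(cond, dict):
            # Operator form, e.g. {"price": {"$gte": 100, "$lte": 500}}
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:
            # Bare value means exact match
            return False
    return True

doc = {"category": "technology", "price": 250, "year": 2024}
print(matches(doc, {"category": "technology", "year": 2024}))      # True
print(matches(doc, {"price": {"$gte": 100, "$lte": 500}}))         # True
print(matches(doc, {"category": {"$nin": ["draft", "deleted"]}}))  # True
```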

Return Types

Query Results

Query operations return documents with optional scores:
# similarity_search returns List[Document]
docs = store.similarity_search("query", k=5)
# Returns: [Document(...), Document(...), ...]

# similarity_search_with_score returns List[Tuple[Document, float]]
results = store.similarity_search_with_score("query", k=5)
# Returns: [(Document(...), 0.95), (Document(...), 0.87), ...]

Score Normalization

Scores are normalized to the [0, 1] range, where:
  • 1.0 = Perfect match
  • 0.0 = Worst match
The normalization depends on the distance metric used.
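
As an illustration of one plausible mapping (not necessarily the library's exact formula), a cosine distance in [0, 2] can be linearly rescaled to a [0, 1] similarity score:

```python
def cosine_score(distance):
    # Cosine distance 0 (identical direction) -> score 1.0
    # Cosine distance 2 (opposite direction)  -> score 0.0
    return 1.0 - distance / 2.0

print(cosine_score(0.0))  # 1.0 (perfect match)
print(cosine_score(2.0))  # 0.0 (worst match)
print(cosine_score(0.1))  # 0.95
```

Unbounded metrics such as Euclidean distance require a different rescaling, which is why the normalization depends on the chosen metric.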

Async Support

All methods have async variants prefixed with "a":

| Sync Method | Async Method |
| --- | --- |
| add_texts | aadd_texts |
| add_documents | aadd_documents |
| similarity_search | asimilarity_search |
| similarity_search_with_score | asimilarity_search_with_score |
| similarity_search_by_vector | asimilarity_search_by_vector |
| max_marginal_relevance_search | amax_marginal_relevance_search |
| delete | adelete |

Example Usage

import asyncio

async def main():
    # Async text addition
    ids = await store.aadd_texts(["async text 1", "async text 2"])

    # Async search
    docs = await store.asimilarity_search("query", k=5)

    # Async deletion
    success = await store.adelete(ids)

asyncio.run(main())