## DBConfig

The `DBConfig` class specifies the storage location for the index, with options for in-memory storage or databases.

### Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `location` | string | - | DB location (`redis`, `postgres`, `rocksdb`, `memory`, `threadsafememory`) |
| `table_name` | string | None | (Optional) Table name (postgres-only) |
| `connection_string` | string | None | (Optional) Connection string to access the DB |
The supported `location` options are:

- `"redis"`: High-speed, in-memory storage (recommended for `index_location`)
- `"postgres"`: Reliable, SQL-based storage (recommended for `config_location`)
- `"rocksdb"`: Persistent, on-disk key-value storage
- `"memory"`: Temporary in-memory storage (for benchmarking and evaluation purposes)
- `"threadsafememory"`: Thread-safe in-memory storage (for multi-threaded benchmarking)
### Example Usage

```python
from cyborgdb_core import DBConfig

# Redis configuration
index_location = DBConfig(
    location="redis",
    connection_string="redis://localhost:6379"
)

# PostgreSQL configuration
config_location = DBConfig(
    location="postgres",
    table_name="config_table",
    connection_string="host=localhost dbname=vectordb user=postgres"
)

# RocksDB configuration
rocksdb_location = DBConfig(
    location="rocksdb",
    connection_string="/path/to/rocksdb"
)

# Memory configuration (for testing)
memory_location = DBConfig(location="memory")

# Thread-safe memory configuration (for multi-threaded testing)
ts_memory_location = DBConfig(location="threadsafememory")
```
## Embeddings

The Embedded LangChain integration accepts any LangChain `Embeddings` implementation.

### Supported Embedding Types

| Type | Description | Example |
|---|---|---|
| `Embeddings` | Any LangChain `Embeddings` implementation | `OpenAIEmbeddings()`, `HuggingFaceEmbeddings()` |
### Example Usage

```python
from langchain_openai import OpenAIEmbeddings
from cyborgdb_core.integrations.langchain import CyborgVectorStore
from cyborgdb_core import DBConfig

# `key` is your 32-byte index encryption key, created or loaded elsewhere
# Using LangChain Embeddings
store = CyborgVectorStore(
    index_name="docs",
    index_key=key,
    api_key="your-api-key",
    embedding=OpenAIEmbeddings(),
    index_location=DBConfig("memory"),
    config_location=DBConfig("memory")
)
```
## DistanceMetric

`DistanceMetric` is a string specifying the distance metric used for the index. Options include:

- `"cosine"`: Cosine similarity (recommended for normalized embeddings)
- `"euclidean"`: Euclidean distance
- `"squared_euclidean"`: Squared Euclidean distance
### Metric Characteristics

| Metric | Range | Best Match | Use Case |
|---|---|---|---|
| `cosine` | [0, 2] | 0 | Text embeddings, normalized vectors |
| `euclidean` | [0, ∞) | 0 | Raw feature vectors |
| `squared_euclidean` | [0, ∞) | 0 | When avoiding sqrt computation |
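Note that the table describes distance values, where 0 means the best match. The relationship between the three metrics can be illustrated in plain Python (this sketch illustrates the metrics themselves, not CyborgDB internals):

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def squared_euclidean(a, b):
    """Squared Euclidean distance: same ordering as Euclidean, skips the sqrt."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(squared_euclidean(a, b))

a, b = [1.0, 0.0], [0.0, 1.0]
print(cosine_distance(a, b))    # 1.0 (orthogonal vectors)
print(euclidean(a, b))          # ~1.414
print(squared_euclidean(a, b))  # 2.0
```

Because squared Euclidean preserves the ordering of Euclidean distances, nearest-neighbor results are identical between the two; the squared form simply avoids the sqrt per comparison.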
## IndexType

The index type determines the algorithm used for approximate nearest neighbor search.

### Available Index Types

| Type | Description | Speed | Recall | Index Size |
|---|---|---|---|---|
| `"ivfflat"` | Inverted file with flat storage | Fast | Highest | Biggest |
| `"ivfpq"` | Inverted file with product quantization | Fast | High | Medium |
| `"ivfsq"` | Inverted file with scalar quantization | Fast | High | Small |

The default index type for the Embedded library is `"ivfsq"`.
### Example Usage

```python
from langchain_openai import OpenAIEmbeddings
from cyborgdb_core.integrations.langchain import CyborgVectorStore
from cyborgdb_core import DBConfig

# `key` is your 32-byte index encryption key, created or loaded elsewhere

# IVFFlat index (highest recall)
store = CyborgVectorStore(
    index_name="high_recall_index",
    index_key=key,
    api_key="your-api-key",
    embedding=OpenAIEmbeddings(),
    index_location=DBConfig("memory"),
    config_location=DBConfig("memory"),
    index_type="ivfflat",
    index_config_params={"n_lists": 1024}
)

# IVFPQ index (balanced performance)
store = CyborgVectorStore(
    index_name="balanced_index",
    index_key=key,
    api_key="your-api-key",
    embedding=OpenAIEmbeddings(),
    index_location=DBConfig("memory"),
    config_location=DBConfig("memory"),
    index_type="ivfpq",
    index_config_params={
        "n_lists": 1024,
        "pq_dim": 64,
        "pq_bits": 8
    }
)
```
## IndexConfigParams

Optional parameters for configuring the index, passed as a dictionary.

### Parameters by Index Type

#### IVFFlat

| Parameter | Type | Default | Description |
|---|---|---|---|
| `n_lists` | int | 1024 | Number of inverted lists (clusters) |

#### IVFPQ

| Parameter | Type | Default | Description |
|---|---|---|---|
| `n_lists` | int | 1024 | Number of inverted lists (clusters) |
| `pq_dim` | int | 8 | Dimensionality after product quantization |
| `pq_bits` | int | 8 | Bits per quantized dimension (1-16) |

#### IVFSQ

| Parameter | Type | Default | Description |
|---|---|---|---|
| `n_lists` | int | 1024 | Number of inverted lists (clusters) |
| `sq_bits` | int | 8 | Bits per scalar quantized value |
### Tuning Guidelines

- `n_lists`: Use roughly √n, where n is the expected number of vectors. Common values: 256, 512, 1024, 2048
- `pq_dim`: Should divide the embedding dimension evenly. Lower values = more compression
- `pq_bits` / `sq_bits`: 8 bits provides a good balance. Lower = more compression; higher = better accuracy
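The guidelines above can be turned into a small helper for picking parameters. The function names below (`suggest_n_lists`, `valid_pq_dims`) are illustrative, not part of the library: one rounds √n to the nearest power of two (matching the common values listed), the other enumerates `pq_dim` values that divide a given embedding dimension evenly.

```python
import math

def suggest_n_lists(num_vectors: int) -> int:
    """Round sqrt(n) to the nearest power of two, per the sqrt(n) rule of thumb."""
    root = math.sqrt(num_vectors)
    return 2 ** round(math.log2(root))

def valid_pq_dims(embedding_dim: int) -> list[int]:
    """pq_dim must divide the embedding dimension evenly."""
    return [d for d in range(1, embedding_dim + 1) if embedding_dim % d == 0]

print(suggest_n_lists(1_000_000))  # 1024
print(valid_pq_dims(1536)[:8])     # [1, 2, 3, 4, 6, 8, 12, 16]
```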
## Document

LangChain `Document` object used for storing text with metadata.

### Attributes

| Attribute | Type | Description |
|---|---|---|
| `page_content` | str | The text content of the document |
| `metadata` | dict | Optional metadata associated with the document |
### Example Usage

```python
from langchain_core.documents import Document

# Create a document
doc = Document(
    page_content="This is the content of my document",
    metadata={
        "source": "manual",
        "author": "John Doe",
        "timestamp": "2024-01-01"
    }
)

# Add to vector store
store.add_documents([doc])
```
## Filters

Metadata filters use a dictionary format for querying documents.

### Simple Filters

```python
# Exact match
filter = {"category": "technology"}

# Multiple conditions (AND)
filter = {
    "category": "technology",
    "year": 2024
}
```
### Advanced Filters

```python
# Range queries
filter = {
    "price": {"$gte": 100, "$lte": 500}
}

# IN queries
filter = {
    "tags": {"$in": ["python", "machine-learning"]}
}

# Nested fields
filter = {
    "metadata.author": "John Doe"
}
```
### Supported Operators

| Operator | Description | Example |
|---|---|---|
| `$eq` | Equal to | `{"age": {"$eq": 25}}` |
| `$ne` | Not equal to | `{"status": {"$ne": "archived"}}` |
| `$gt` | Greater than | `{"price": {"$gt": 100}}` |
| `$gte` | Greater than or equal | `{"score": {"$gte": 0.8}}` |
| `$lt` | Less than | `{"quantity": {"$lt": 10}}` |
| `$lte` | Less than or equal | `{"rating": {"$lte": 5}}` |
| `$in` | In array | `{"tags": {"$in": ["ai", "ml"]}}` |
| `$nin` | Not in array | `{"category": {"$nin": ["draft", "deleted"]}}` |
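The semantics of these operators can be modeled in a few lines of plain Python. The matcher below is only an illustration of the filter behavior (top-level conditions AND together; a bare value means exact match); CyborgDB evaluates filters inside the engine, so this is not its implementation.

```python
# Operator table mapping each filter operator to a predicate.
OPS = {
    "$eq": lambda v, t: v == t,
    "$ne": lambda v, t: v != t,
    "$gt": lambda v, t: v > t,
    "$gte": lambda v, t: v >= t,
    "$lt": lambda v, t: v < t,
    "$lte": lambda v, t: v <= t,
    "$in": lambda v, t: v in t,
    "$nin": lambda v, t: v not in t,
}

def matches(metadata: dict, filter: dict) -> bool:
    """Return True if metadata satisfies every condition (conditions AND together)."""
    for field, cond in filter.items():
        value = metadata.get(field)
        if isinstance(cond, dict):  # operator form, e.g. {"$gte": 100}
            if not all(OPS[op](value, target) for op, target in cond.items()):
                return False
        elif value != cond:  # bare value means exact match
            return False
    return True

print(matches({"price": 250}, {"price": {"$gte": 100, "$lte": 500}}))  # True
print(matches({"status": "archived"}, {"status": {"$ne": "archived"}}))  # False
```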
## Return Types

### Query Results

Query operations return documents with optional scores:

```python
# similarity_search returns List[Document]
docs = store.similarity_search("query", k=5)
# Returns: [Document(...), Document(...), ...]

# similarity_search_with_score returns List[Tuple[Document, float]]
results = store.similarity_search_with_score("query", k=5)
# Returns: [(Document(...), 0.95), (Document(...), 0.87), ...]
```
### Score Normalization

Scores are normalized to the [0, 1] range, where:

- 1.0 = perfect match
- 0.0 = worst match

The normalization depends on the distance metric used.
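For intuition only: since cosine distance lives in [0, 2] with 0 as the best match, one natural way to map it onto a [0, 1] score is `1 - d / 2`. This is a hypothetical mapping to illustrate the idea of per-metric normalization; it is not necessarily the exact formula CyborgDB applies.

```python
def normalize_cosine_score(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a similarity score in [0, 1].

    Hypothetical mapping for intuition only; the library's exact
    per-metric normalization formula may differ.
    """
    return 1.0 - distance / 2.0

print(normalize_cosine_score(0.0))  # 1.0 (perfect match)
print(normalize_cosine_score(2.0))  # 0.0 (worst match)
```

An unbounded metric such as Euclidean distance would need a different mapping, which is why the normalization is metric-dependent.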
## Async Support

All methods have async variants prefixed with `a`:

| Sync Method | Async Method |
|---|---|
| `add_texts` | `aadd_texts` |
| `add_documents` | `aadd_documents` |
| `similarity_search` | `asimilarity_search` |
| `similarity_search_with_score` | `asimilarity_search_with_score` |
| `similarity_search_by_vector` | `asimilarity_search_by_vector` |
| `max_marginal_relevance_search` | `amax_marginal_relevance_search` |
| `delete` | `adelete` |
### Example Usage

```python
import asyncio

async def main():
    # Async text addition
    ids = await store.aadd_texts(["async text 1", "async text 2"])

    # Async search
    docs = await store.asimilarity_search("query", k=5)

    # Async deletion
    success = await store.adelete(ids)

asyncio.run(main())
```
## Connection Configuration

The Python SDK connects to a running CyborgDB service instead of using `DBConfig` for storage locations.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | str | - | Base URL of the CyborgDB microservice endpoint |
| `api_key` | str | - | API key for authentication with the microservice |
| `verify_ssl` | bool | None | (Optional) SSL verification. When None, automatically disabled for localhost and http:// URLs |

`DBConfig` and `GPUConfig` are not applicable to the Python SDK; storage configuration is managed by the CyborgDB service.
### Example Usage

```python
from cyborgdb.integrations.langchain import CyborgVectorStore

store = CyborgVectorStore(
    index_name="my_documents",
    index_key=key,
    api_key="your-api-key",
    embedding="all-MiniLM-L6-v2",
    base_url="http://localhost:8000"
)
```
## Embeddings

The Python SDK supports the same embedding types as the Embedded library:

| Type | Description | Example |
|---|---|---|
| `str` | Model name string for SentenceTransformers | `"sentence-transformers/all-MiniLM-L6-v2"` |
| `SentenceTransformer` | SentenceTransformer model instance | `SentenceTransformer("all-MiniLM-L6-v2")` |
| `Embeddings` | Any LangChain `Embeddings` implementation | `OpenAIEmbeddings()`, `HuggingFaceEmbeddings()` |
## DistanceMetric

Same as Embedded:

- `"cosine"`: Cosine similarity (recommended for normalized embeddings)
- `"euclidean"`: Euclidean distance
- `"squared_euclidean"`: Squared Euclidean distance

| Metric | Range | Best Match | Use Case |
|---|---|---|---|
| `cosine` | [0, 2] | 0 | Text embeddings, normalized vectors |
| `euclidean` | [0, ∞) | 0 | Raw feature vectors |
| `squared_euclidean` | [0, ∞) | 0 | When avoiding sqrt computation |
## IndexType

Same as Embedded:

| Type | Description | Speed | Recall | Index Size |
|---|---|---|---|---|
| `"ivfflat"` | Inverted file with flat storage | Fast | Highest | Biggest |
| `"ivfpq"` | Inverted file with product quantization | Fast | High | Medium |
| `"ivfsq"` | Inverted file with scalar quantization | Fast | High | Small |
## Document

LangChain `Document` object used for storing text with metadata.

| Attribute | Type | Description |
|---|---|---|
| `page_content` | str | The text content of the document |
| `metadata` | dict | Optional metadata associated with the document |

```python
from langchain_core.documents import Document

doc = Document(
    page_content="This is the content of my document",
    metadata={"source": "manual", "author": "John Doe"}
)
```
## Filters

Same filter format as Embedded. Metadata filters use a dictionary format:

```python
# Exact match
filter = {"category": "technology"}

# Range queries
filter = {"price": {"$gte": 100, "$lte": 500}}

# IN queries
filter = {"tags": {"$in": ["python", "machine-learning"]}}
```

Supported operators: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`
## Return Types

Same as Embedded:

```python
# similarity_search returns List[Document]
docs = store.similarity_search("query", k=5)

# similarity_search_with_score returns List[Tuple[Document, float]]
results = store.similarity_search_with_score("query", k=5)
```
## Async Support

Same async variants as Embedded, prefixed with `a`:

| Sync Method | Async Method |
|---|---|
| `add_texts` | `aadd_texts` |
| `add_documents` | `aadd_documents` |
| `similarity_search` | `asimilarity_search` |
| `similarity_search_with_score` | `asimilarity_search_with_score` |
| `similarity_search_by_vector` | `asimilarity_search_by_vector` |
| `delete` | `adelete` |
## CyborgVectorStoreConfig

Configuration interface for constructing a `CyborgVectorStore` in JavaScript/TypeScript.

```typescript
interface CyborgVectorStoreConfig {
  baseUrl: string;
  apiKey: string;
  indexName: string;
  indexKey: Uint8Array;
  indexType?: "ivfflat" | "ivfpq" | "ivfsq";
  indexConfigParams?: Record<string, number>;
  dimension?: number;
  metric?: "cosine" | "euclidean" | "squared_euclidean";
  verifySsl?: boolean;
}
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseUrl` | string | - | Base URL of the CyborgDB microservice endpoint |
| `apiKey` | string | - | API key for authentication |
| `indexName` | string | - | Name of the index |
| `indexKey` | Uint8Array | - | 32-byte encryption key |
| `indexType` | string | "ivfflat" | (Optional) Index algorithm type |
| `indexConfigParams` | Record<string, number> | - | (Optional) Additional index configuration parameters |
| `dimension` | number | - | (Optional) Embedding dimension (auto-inferred if not provided) |
| `metric` | string | "cosine" | (Optional) Distance metric |
| `verifySsl` | boolean | true | (Optional) SSL verification |

All parameters use the camelCase naming convention, consistent with JavaScript/TypeScript standards.
## EmbeddingsInterface

The JS/TS SDK accepts any LangChain `EmbeddingsInterface` implementation:

```typescript
import { OpenAIEmbeddings } from '@langchain/openai';
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';

// OpenAI embeddings
const openaiEmbeddings = new OpenAIEmbeddings();

// HuggingFace embeddings
const hfEmbeddings = new HuggingFaceTransformersEmbeddings({
  model: "all-MiniLM-L6-v2"
});
```

Unlike the Python SDKs, the JS/TS SDK does not accept raw model name strings. You must pass a LangChain `EmbeddingsInterface` instance.
## DistanceMetric

Same options as the Python SDKs:

- `"cosine"`: Cosine similarity (recommended for normalized embeddings)
- `"euclidean"`: Euclidean distance
- `"squared_euclidean"`: Squared Euclidean distance
## Document

LangChain `Document` object:

```typescript
import { Document } from '@langchain/core/documents';

const doc = new Document({
  pageContent: "This is the content of my document",
  metadata: { source: "manual", author: "John Doe" }
});
```

| Attribute | Type | Description |
|---|---|---|
| `pageContent` | string | The text content of the document |
| `metadata` | Record<string, any> | Optional metadata associated with the document |
## FilterType

Metadata filters use an object format:

```typescript
// Exact match
const exactFilter = { category: "technology" };

// Multiple conditions (AND)
const andFilter = {
  category: "technology",
  year: 2024
};
```

Supports the same operators as the Python SDKs: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`.
## Return Types

```typescript
// similaritySearch returns Promise<Document[]>
const docs = await store.similaritySearch("query", 5);

// similaritySearchWithScore returns Promise<[Document, number][]>
const results = await store.similaritySearchWithScore("query", 5);
```
## Async

All JS/TS methods are natively async and return `Promise<...>`. There are no separate sync/async variants; every method is called with `await`.