The DBConfig class specifies the storage location for the index, with options for in-memory storage, databases, cloud object storage, or local file storage.

Parameter | Type | Default | Description |
---|---|---|---|
location | string | - | DB location (redis, postgres, memory, s3, gcs, local) |
table_name | string | None | (Optional) Table name (postgres only) |
connection_string | string | None | (Optional) Connection string to access DB |
bucket | string | None | (Optional) Bucket name for cloud storage (s3, gcs) |
access_key | string | None | (Optional) Access key for cloud storage |
secret_key | string | None | (Optional) Secret key for cloud storage |
region | string | None | (Optional) Region for cloud storage |
endpoint | string | None | (Optional) Custom endpoint for S3-compatible storage |
path | string | None | (Optional) Path for local file storage |
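
As a rough illustration of how the fields in the table fit together, the sketch below models them with a stand-in dataclass. `DBConfigSketch` and its validation rules are hypothetical and are not the library's DBConfig class; only the field names and the six location values come from the table. The connection string and table name are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Location values listed in the parameter table.
VALID_LOCATIONS = {"redis", "postgres", "memory", "s3", "gcs", "local"}


@dataclass
class DBConfigSketch:
    """Hypothetical stand-in mirroring the documented DBConfig fields."""

    location: str
    table_name: Optional[str] = None
    connection_string: Optional[str] = None
    bucket: Optional[str] = None
    access_key: Optional[str] = None
    secret_key: Optional[str] = None
    region: Optional[str] = None
    endpoint: Optional[str] = None
    path: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed checks, derived from the notes in the table.
        if self.location not in VALID_LOCATIONS:
            raise ValueError(f"unknown location: {self.location!r}")
        if self.table_name is not None and self.location != "postgres":
            raise ValueError("table_name applies only to location='postgres'")
        if self.bucket is not None and self.location not in {"s3", "gcs"}:
            raise ValueError("bucket applies only to 's3' or 'gcs'")


# Temporary in-memory storage needs no other fields.
index_location = DBConfigSketch(location="memory")

# SQL-backed storage; connection string and table name are placeholders.
config_location = DBConfigSketch(
    location="postgres",
    table_name="index_config",
    connection_string="postgresql://user:password@localhost:5432/db",
)
```

Rejecting fields that do not apply to the chosen location is a design choice of this sketch; the table itself only marks them as optional.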

location options are:

- "redis": Use for high-speed, in-memory storage (recommended for index_location)
- "postgres": Use for reliable, SQL-based storage (recommended for config_location)
- "memory": Use for temporary in-memory storage (for benchmarking and evaluation purposes)
- "s3": Use for Amazon S3 or S3-compatible storage
- "gcs": Use for Google Cloud Storage
- "local": Use for local file system storage

The embedding model can be supplied as any of the following types:

Type | Description | Example |
---|---|---|
str | Model name string for SentenceTransformers | "sentence-transformers/all-MiniLM-L6-v2" |
SentenceTransformer | SentenceTransformer model instance | SentenceTransformer("all-MiniLM-L6-v2") |
Embeddings | Any LangChain Embeddings implementation | OpenAIEmbeddings() , HuggingFaceEmbeddings() |
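
One way to picture how the three accepted types could be told apart is duck typing. The sketch below is an assumption about dispatch, not the library's actual logic; `describe_embedding_model` and both stub classes are hypothetical. It relies only on the facts that SentenceTransformer instances expose `encode` and that LangChain Embeddings implementations expose `embed_documents` and `embed_query`.

```python
from typing import Any, List


def describe_embedding_model(model: Any) -> str:
    """Classify an embedding argument into one of the three documented kinds."""
    if isinstance(model, str):
        return "model name"
    # LangChain Embeddings implementations define both of these methods.
    if hasattr(model, "embed_documents") and hasattr(model, "embed_query"):
        return "langchain embeddings"
    # SentenceTransformer instances expose encode().
    if hasattr(model, "encode"):
        return "sentence transformer"
    raise TypeError(f"unsupported embedding model: {type(model).__name__}")


class StubSentenceTransformer:
    """Hypothetical stand-in with the same encode() entry point."""

    def encode(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t))] for t in texts]


class StubEmbeddings:
    """Hypothetical stand-in using the LangChain Embeddings method names."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t))] for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return [float(len(text))]
```

The stubs keep the example runnable without downloading a model; in real use the argument would be one of the values shown in the Example column.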

DistanceMetric is a string representing the distance metric used for the index. Options include:

- "cosine": Cosine distance, i.e. 1 minus cosine similarity (recommended for normalized embeddings)
- "euclidean": Euclidean distance
- "squared_euclidean": Squared Euclidean distance

Metric | Range | Best Match | Use Case |
---|---|---|---|
cosine | [0, 2] | 0 | Text embeddings, normalized vectors |
euclidean | [0, ∞) | 0 | Raw feature vectors |
squared_euclidean | [0, ∞) | 0 | When avoiding sqrt computation |
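
The ranges in the table can be checked with a few lines of plain Python. This is a minimal sketch, assuming the cosine metric is computed as 1 minus cosine similarity, which is consistent with the [0, 2] range and a best match of 0; the function names are illustrative and not part of the library.

```python
import math
from typing import Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared coordinate differences (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(squared_euclidean(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


same = cosine_distance([1.0, 0.0], [1.0, 0.0])        # 0.0, identical direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])  # 1.0, unrelated
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])   # 2.0, opposite direction

dist = euclidean([0.0, 0.0], [3.0, 4.0])              # 5.0
dist_sq = squared_euclidean([0.0, 0.0], [3.0, 4.0])   # 25.0
```

Because the square root is monotonic, squared Euclidean distance ranks neighbors in the same order as Euclidean distance while skipping the sqrt.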
Type | Description | Speed | Recall | Index Size |
---|---|---|---|---|
"ivfflat" | Inverted file with flat storage | Fast | Highest | Biggest |
"ivf" | Inverted file with compression | Fastest | Lowest | Smallest |
"ivfpq" | Inverted file with product quantization | Fast | High | Medium |

cyborgdb-lite only supports the "ivfflat" index type.

Parameter | Type | Default | Description |
---|---|---|---|
n_lists | int | 1024 | Number of inverted lists (clusters) |

With product quantization ("ivfpq"), two further parameters apply:

Parameter | Type | Default | Description |
---|---|---|---|
n_lists | int | 1024 | Number of inverted lists (clusters) |
pq_dim | int | 8 | Dimensionality after product quantization |
pq_bits | int | 8 | Bits per quantized dimension (1-16) |
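
A back-of-the-envelope calculation shows why product quantization shrinks the index. The sketch assumes the standard PQ encoding, in which each vector is stored as pq_dim codes of pq_bits bits each, and compares that with raw float32 storage. The 384-dimension figure is an illustrative choice (the output size of all-MiniLM-L6-v2), the helper names are hypothetical, and a real index also stores codebooks and per-list overhead that are ignored here.

```python
def pq_code_bytes(pq_dim: int, pq_bits: int) -> float:
    """Bytes per vector under the assumed pq_dim x pq_bits encoding."""
    if not 1 <= pq_bits <= 16:
        raise ValueError("pq_bits must be between 1 and 16")
    return pq_dim * pq_bits / 8


def raw_float32_bytes(dimension: int) -> int:
    """Bytes per vector when stored uncompressed as float32."""
    return dimension * 4


code_size = pq_code_bytes(pq_dim=8, pq_bits=8)  # defaults from the table
raw_size = raw_float32_bytes(384)               # illustrative dimension
compression_ratio = raw_size / code_size
```

With the default parameters this gives 8 bytes per encoded vector against 1536 bytes uncompressed, a ratio of 192. Larger pq_dim or pq_bits values trade some of that saving for better recall.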
Attribute | Type | Description |
---|---|---|
page_content | str | The text content of the document |
metadata | dict | Optional metadata associated with the document |
Operator | Description | Example |
---|---|---|
$eq | Equal to | {"age": {"$eq": 25}} |
$ne | Not equal to | {"status": {"$ne": "archived"}} |
$gt | Greater than | {"price": {"$gt": 100}} |
$gte | Greater than or equal | {"score": {"$gte": 0.8}} |
$lt | Less than | {"quantity": {"$lt": 10}} |
$lte | Less than or equal | {"rating": {"$lte": 5}} |
$in | In array | {"tags": {"$in": ["ai", "ml"]}} |
$nin | Not in array | {"category": {"$nin": ["draft", "deleted"]}} |
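
To make the operator semantics concrete, the following sketch evaluates a filter against a metadata dict on the client side. It is not the library's implementation. `matches` is a hypothetical helper, and three behaviors are assumptions: conditions on several fields are combined with AND, a missing field never matches, and a list-valued field satisfies $in when any of its elements is in the given array.

```python
from typing import Any, Callable, Dict, List


def _in(value: Any, options: List[Any]) -> bool:
    # Assumption: a list-valued field matches if any element is listed.
    if isinstance(value, list):
        return any(item in options for item in value)
    return value in options


OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$gt": lambda value, target: value > target,
    "$gte": lambda value, target: value >= target,
    "$lt": lambda value, target: value < target,
    "$lte": lambda value, target: value <= target,
    "$in": _in,
    "$nin": lambda value, target: not _in(value, target),
}


def matches(metadata: Dict[str, Any], filters: Dict[str, Dict[str, Any]]) -> bool:
    """Return True if metadata satisfies every condition in filters."""
    for field, condition in filters.items():
        if field not in metadata:
            return False  # assumption: missing fields never match
        for operator, target in condition.items():
            if not OPERATORS[operator](metadata[field], target):
                return False
    return True


doc_metadata = {"price": 150, "status": "active", "tags": ["ai", "nlp"]}
is_match = matches(
    doc_metadata,
    {"price": {"$gt": 100}, "status": {"$ne": "archived"}, "tags": {"$in": ["ai", "ml"]}},
)
```

Here `is_match` is True because all three conditions hold for `doc_metadata`.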

Each synchronous method has an asynchronous counterpart whose name is prefixed with "a":

Sync Method | Async Method |
---|---|
add_texts | aadd_texts |
add_documents | aadd_documents |
similarity_search | asimilarity_search |
similarity_search_with_score | asimilarity_search_with_score |
max_marginal_relevance_search | amax_marginal_relevance_search |
delete | adelete |
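
The naming convention can be illustrated with a small stub. `StubVectorStore` is hypothetical and stands in for the real vector store so the example runs without any dependencies. Delegating the async method to the sync one through a worker thread is one common pattern, shown here as an assumption rather than as a description of how this integration is implemented.

```python
import asyncio
from typing import List


class StubVectorStore:
    """Hypothetical store exposing one sync/async method pair."""

    def __init__(self) -> None:
        self._texts: List[str] = []

    def add_texts(self, texts: List[str]) -> List[str]:
        """Store texts and return their generated string IDs."""
        start = len(self._texts)
        self._texts.extend(texts)
        return [str(i) for i in range(start, len(self._texts))]

    async def aadd_texts(self, texts: List[str]) -> List[str]:
        # Run the blocking sync method in a worker thread (Python 3.9+).
        return await asyncio.to_thread(self.add_texts, texts)


def async_name(sync_name: str) -> str:
    """Derive the async method name from the sync one."""
    return "a" + sync_name


async def main() -> List[str]:
    store = StubVectorStore()
    return await store.aadd_texts(["hello", "world"])


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In an application that already runs an event loop, the `a`-prefixed methods are awaited directly instead of going through `asyncio.run`.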