Usage Guide¶
Embenx is designed to be simple enough for quick prototyping yet robust enough for research-grade agentic memory. This guide covers core retrieval and serialization, then the advanced features built on top of them.
Core Retrieval¶
The primary interface is the Collection class. It provides a table-like abstraction for vectors and metadata.
```python
from embenx import Collection
import numpy as np

# 1. Initialize with a specific backend
# Options: 'faiss-hnsw', 'scann', 'usearch', 'pgvector', 'duckdb', etc.
col = Collection(dimension=768, indexer_type="faiss-hnsw")

# 2. Add data
# Vectors can be numpy arrays or lists
vectors = np.random.rand(100, 768).astype("float32")
metadata = [{"id": i, "text": f"Document {i}", "tag": "test"} for i in range(100)]
col.add(vectors, metadata)

# 3. Basic search
# Returns a list of (metadata, distance) tuples
query_vector = np.random.rand(768).astype("float32")
results = col.search(query_vector, top_k=5)

# 4. Metadata filtering
# Supports exact-match dictionary filters across any indexed field
results = col.search(query_vector, top_k=5, where={"tag": "test"})

# 5. Serialization
# Saves to a portable Parquet file containing both vectors and metadata
col.to_parquet("my_memory.parquet")

# Load back
new_col = Collection.from_parquet("my_memory.parquet")
```
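To build intuition for what `search` and the `where` filter do conceptually, here is a minimal brute-force sketch in plain NumPy. This is illustrative only: `exact_search` is a hypothetical helper written for this guide, not embenx's implementation, and it assumes cosine distance as the metric.

```python
import numpy as np

def exact_search(vectors, metadata, query, top_k=5, where=None):
    # Cosine distance = 1 - cosine similarity on normalized vectors
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    dists = 1.0 - v @ q
    hits = []
    for i in np.argsort(dists):  # ascending distance = best first
        meta = metadata[i]
        # Exact-match filter: every key in `where` must match the metadata
        if where and any(meta.get(k) != val for k, val in where.items()):
            continue
        hits.append((meta, float(dists[i])))
        if len(hits) == top_k:
            break
    return hits

rng = np.random.default_rng(0)
vectors = rng.random((100, 8)).astype("float32")
metadata = [{"id": i, "tag": "test" if i % 2 == 0 else "other"} for i in range(100)]
query = rng.random(8).astype("float32")
hits = exact_search(vectors, metadata, query, top_k=5, where={"tag": "test"})
```

A real ANN backend avoids the full scan, but the contract (filtered, distance-sorted `(metadata, distance)` pairs) is the same.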
Advanced Retrieval Features¶
Matryoshka Truncation¶
If you are using Matryoshka Representation Learning (MRL) models, you can truncate dimensions for 10x faster retrieval with minimal accuracy loss.
```python
# Define a collection that truncates 768-dim embeddings to 128
col = Collection(dimension=768, truncate_dim=128)

# Input vectors are still expected to be 768-dim; truncation happens internally
col.add(full_vectors, metadata)
results = col.search(full_query_vector)
```
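The mechanics behind MRL truncation are simple: keep the first `truncate_dim` components and re-normalize. The sketch below shows that operation in isolation (an assumption about the internals, written for illustration; `truncate` is not an embenx function):

```python
import numpy as np

def truncate(vectors, dim):
    # Keep the leading `dim` components, then L2-normalize so cosine
    # similarity remains meaningful on the shorter vectors
    t = vectors[..., :dim]
    return t / np.linalg.norm(t, axis=-1, keepdims=True)

rng = np.random.default_rng(1)
full = rng.random((50, 768)).astype("float32")
small = truncate(full, 128)
```

This only preserves accuracy when the embedding model was trained with MRL, which front-loads information into the early dimensions.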
Hybrid Search (Dense + Sparse)¶
Combine semantic vector search with keyword-based BM25 retrieval using Reciprocal Rank Fusion (RRF).
```python
# Initialize with a sparse indexer
col = Collection(dimension=768, sparse_indexer_type="bm25")

# Perform hybrid search
results = col.hybrid_search(
    query_vector=q_vec,
    query_text="fox",
    dense_weight=0.5,
    sparse_weight=0.5,
)
```
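For reference, Reciprocal Rank Fusion scores each document as a weighted sum of `1 / (k + rank)` over the dense and sparse result lists (k = 60 is the conventional constant). A minimal standalone sketch, using hypothetical document ids rather than embenx internals:

```python
def rrf(dense_ids, sparse_ids, dense_weight=0.5, sparse_weight=0.5, k=60):
    # Each list contributes weight / (k + rank) per document; documents
    # ranked highly in both lists accumulate the largest fused score.
    scores = {}
    for rank, doc in enumerate(dense_ids, start=1):
        scores[doc] = scores.get(doc, 0.0) + dense_weight / (k + rank)
    for rank, doc in enumerate(sparse_ids, start=1):
        scores[doc] = scores.get(doc, 0.0) + sparse_weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf(["a", "b", "c"], ["b", "d", "a"])
# "b" wins: it appears near the top of both lists
```

Because RRF only uses ranks, it needs no score calibration between the BM25 and vector scales, which is why it is a common default for hybrid search.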
Reranking¶
Improve precision by re-scoring top candidates with a Cross-Encoder or FlashRank.
```python
from embenx.rerank import RerankHandler

# Use FlashRank (CPU-optimized)
ranker = RerankHandler(model_name="ms-marco-TinyBERT-L-2-v2", model_type="flashrank")

# Search with a reranking hook
results = col.search(query_vector, top_k=5, reranker=ranker, query_text="My original question")
```
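Conceptually, the reranking hook rescores the retriever's candidates with a stronger query-document scorer and re-sorts. The toy sketch below substitutes token overlap for a real cross-encoder (everything here is hypothetical illustration, not embenx code):

```python
def rerank(query, candidates, score_fn, top_k=None):
    # Re-sort candidates by the (stronger, slower) pairwise scorer
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_k] if top_k else ranked

def overlap(q, d):
    # Stand-in scorer: shared-token count. A real setup would run a
    # cross-encoder forward pass over the (query, document) pair instead.
    return len(set(q.split()) & set(d.split()))

docs = ["the quick brown fox", "lazy dog sleeps", "fox jumps high"]
ranked = rerank("quick fox", docs, overlap)
```

The pattern is the same regardless of scorer: retrieval stays cheap and broad, while the expensive pairwise model only sees the short candidate list.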
Evaluation & Benchmarking¶
Embenx makes it easy to measure the performance of different indexers on your own data.
```python
# Measure Recall@10 against an exact-search baseline
metrics = col.evaluate(indexer_type="faiss-hnsw", top_k=10)
print(f"Recall: {metrics['recall']}, Latency: {metrics['latency_ms']}ms")

# Benchmark multiple indexers side-by-side
col.benchmark(indexers=["faiss", "usearch", "hnswlib"])
```
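Recall@k itself is a simple set overlap: the fraction of the exact top-k ids that the approximate index also returned. A two-line sketch (hypothetical helper, shown only to make the metric concrete):

```python
def recall_at_k(approx_ids, exact_ids):
    # Fraction of the exact-search top-k that the ANN index recovered
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

r = recall_at_k([1, 2, 3, 5], [1, 2, 3, 4])  # ANN missed doc 4
```

This is why the evaluation needs an exact-search pass over the same data: the exact results serve as ground truth.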
Synthetic Data Generation¶
Embenx allows you to generate high-quality synthetic query-document pairs from your collections using LLMs. This is useful for creating fine-tuning datasets or evaluation benchmarks.
```python
# 1. Generate queries using LiteLLM (v1.83.0+)
# Supports GPT-4, Claude, Gemini, etc.
results = col.generate_synthetic_queries(
    text_key="text",
    n_queries_per_doc=2,
    num_docs=100,
    model="gpt-4o-mini",
)

# 2. Use a local LLM (Ollama)
# Requires running: ollama run llama3
results = col.generate_synthetic_queries(
    model="ollama/llama3",
    api_base="http://localhost:11434",
    output_path="training_data.parquet",
)

# 3. Export to JSONL or CSV
col.generate_synthetic_queries(
    n_queries_per_doc=1,
    output_path="eval_bench.jsonl",
)
```