API Reference¶
Core¶
- class core.AgenticCollection(name: str = 'default', dimension: int | None = None, indexer_type: str = 'faiss', sparse_indexer_type: str | None = None, truncate_dim: int | None = None, **indexer_kwargs)[source]¶
Bases: Collection
Specialized collection for autonomous agent memory. Supports search loops, feedback, and self-healing ranking.
- class core.CacheCollection(name: str = 'default', dimension: int | None = None, indexer_type: str = 'faiss', sparse_indexer_type: str | None = None, truncate_dim: int | None = None, **indexer_kwargs)[source]¶
Bases: Collection
Specialized collection for Retrieval-Augmented KV Caching (RA-KVC). Supports storing high-dimensional activation tensors.
- class core.ClusterCollection(n_clusters: int = 10, **kwargs)[source]¶
Bases: Collection
Specialized collection for ClusterKV-style optimizations. Implements semantic clustering of vectors for improved retrieval throughput.
- class core.Collection(name: str = 'default', dimension: int | None = None, indexer_type: str = 'faiss', sparse_indexer_type: str | None = None, truncate_dim: int | None = None, **indexer_kwargs)[source]¶
Bases: object
Primary interface for managing embeddings and metadata. Provides a high-level API for indexing, search, and I/O.
- add(vectors: ndarray | List[List[float]], metadata: List[Dict[str, Any]] | None = None)[source]¶
Add vectors and metadata to the collection.
- add_images(image_paths: List[str], model: str = 'openai/clip-vit-base-patch32', metadata: List[Dict[str, Any]] | None = None)[source]¶
Embed and add images to the collection.
- benchmark(indexers: List[str] | None = None, top_k: int = 5)[source]¶
Benchmark multiple indexers on the current collection data.
- Parameters:
indexers – List of indexer names to compare (e.g. [“faiss”, “hnswlib”]). If None, benchmarks all available indexers.
top_k – Number of neighbors to search for during benchmark.
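A benchmark like this ultimately reduces to timing repeated `top_k` searches per indexer. The following standalone sketch shows the kind of timing loop involved; `time_search` and the brute-force stand-in are hypothetical names, not part of the Embenx API.

```python
import time
import numpy as np

def time_search(search_fn, queries, top_k=5, repeats=3):
    """Measure best-of-repeats mean per-query latency (ms) of a search callable."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for q in queries:
            search_fn(q, top_k)
        timings.append((time.perf_counter() - start) / len(queries))
    return 1000.0 * min(timings)

# Brute-force stand-in for an indexer's search method.
data = np.random.rand(1000, 64).astype(np.float32)

def brute_force(q, k):
    dists = np.linalg.norm(data - q, axis=1)
    return np.argsort(dists)[:k]

queries = np.random.rand(10, 64).astype(np.float32)
latency_ms = time_search(brute_force, queries)
```

Taking the best of several repeats rather than a single pass reduces noise from warm-up and scheduler jitter.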
- evaluate(indexer_type: str = 'faiss-hnsw', top_k: int = 10, **kwargs) Dict[str, Any][source]¶
Evaluate an indexer’s recall and latency against an exact search baseline.
- Returns:
Dictionary with ‘recall’ and ‘latency_ms’ metrics.
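The 'recall' metric compares an approximate indexer's results against an exact full-scan baseline. A self-contained sketch of how recall@k against such a baseline can be computed (all names here are illustrative, not the library's internals):

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids):
    """Fraction of exact top-k neighbors recovered by the approximate search."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    return hits / (len(exact_ids) * len(exact_ids[0]))

rng = np.random.default_rng(0)
data = rng.random((500, 32), dtype=np.float32)
queries = rng.random((20, 32), dtype=np.float32)

k = 10
# Exact baseline: full L2 distance scan per query.
dists = np.linalg.norm(data[None, :, :] - queries[:, None, :], axis=2)
exact = np.argsort(dists, axis=1)[:, :k]

# An approximate indexer's results would be substituted here; using the
# exact ids makes recall come out as 1.0 by construction.
score = recall_at_k(exact.copy(), exact)
```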
- export_to_production(backend: str, connection_url: str, collection_name: str | None = None)[source]¶
One-click export from a local Embenx collection to a production cluster. Supported backends: ‘qdrant’, ‘milvus’.
- classmethod from_parquet(path: str, vector_col: str = 'vector', **kwargs)[source]¶
Load a collection from a Parquet file.
- generate_synthetic_queries(text_key: str = 'text', n_queries_per_doc: int = 1, num_docs: int = 100, model: str = 'gpt-4o-mini', custom_prompt: str | None = None, output_path: str | None = None, api_base: str | None = None, **llm_kwargs) List[Dict[str, Any]][source]¶
Generate synthetic search queries for documents in the collection using an LLM. Supports local Ollama/vLLM via api_base and llm_kwargs.
- hybrid_search(query_vector: ndarray | List[float], query_text: str, top_k: int = 5, dense_weight: float = 0.5, sparse_weight: float = 0.5, where: Dict[str, Any] | None = None) List[Tuple[Dict[str, Any], float]][source]¶
Perform hybrid search combining dense and sparse results using Reciprocal Rank Fusion (RRF).
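Reciprocal Rank Fusion scores each document by a sum of reciprocal ranks across the dense and sparse result lists. A standalone sketch of RRF fusion follows; applying `dense_weight`/`sparse_weight` as multipliers on each list's contribution is an assumption about how the library combines them, and the smoothing constant `k=60` is just the conventional default from the RRF literature.

```python
def rrf_fuse(dense_ids, sparse_ids, dense_weight=0.5, sparse_weight=0.5, k=60):
    """RRF: score(doc) = sum over rankings of weight / (k + rank), rank starting at 1."""
    scores = {}
    for weight, ranking in ((dense_weight, dense_ids), (sparse_weight, sparse_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["a", "b", "c", "d"]    # ids ranked by dense similarity
sparse = ["b", "a", "e"]        # ids ranked by sparse (lexical) match
fused = rrf_fuse(dense, sparse)
```

Documents appearing in both lists ("a", "b") accumulate score from each ranking and float above documents that appear in only one.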
- search(query: ndarray | List[float], top_k: int = 5, where: Dict[str, Any] | None = None, reranker: callable | RerankHandler | None = None, query_text: str | None = None) List[Tuple[Dict[str, Any], float]][source]¶
Search the collection for the nearest neighbors.
- Parameters:
query – Vector to search for.
top_k – Number of results to return.
where – Metadata filter dictionary.
reranker – A callable or RerankHandler for re-scoring.
query_text – Original text for reranking context.
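The `where` argument filters candidates by their metadata. The sketch below assumes simple exact-match semantics (every key in the filter must equal the stored value); the library may well support richer operators, so treat this only as an illustration of the filtering idea.

```python
def matches(meta, where):
    """True if every key in `where` is present in `meta` with an equal value."""
    return all(meta.get(key) == value for key, value in where.items())

records = [
    {"doc": "intro", "lang": "en", "year": 2023},
    {"doc": "guide", "lang": "de", "year": 2024},
    {"doc": "faq",   "lang": "en", "year": 2024},
]
filtered = [r for r in records if matches(r, {"lang": "en", "year": 2024})]
# → only the "faq" record satisfies both conditions
```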
- search_image(image_path: str, model: str = 'openai/clip-vit-base-patch32', top_k: int = 5) List[Tuple[Dict[str, Any], float]][source]¶
Search for similar items using an image query.
- search_trajectory(trajectory: ndarray | List[List[float]], top_k: int = 5, pooling: str = 'mean', where: Dict[str, Any] | None = None) List[Tuple[Dict[str, Any], float]][source]¶
Search for similar trajectories (sequences of vectors).
- Parameters:
trajectory – Sequence of vectors representing a state/action trajectory.
top_k – Number of results to return.
pooling – Method to pool the trajectory into a single search vector (‘mean’ or ‘max’).
where – Metadata filter dictionary.
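The documented `pooling` options suggest the trajectory is collapsed element-wise into one query vector before a regular nearest-neighbor search. A minimal sketch of that pooling step, assuming plain element-wise mean/max over the time axis:

```python
import numpy as np

def pool_trajectory(trajectory, pooling="mean"):
    """Collapse a (steps, dim) sequence into a single (dim,) query vector."""
    traj = np.asarray(trajectory, dtype=np.float32)
    if pooling == "mean":
        return traj.mean(axis=0)
    if pooling == "max":
        return traj.max(axis=0)
    raise ValueError(f"unknown pooling: {pooling!r}")

steps = [[0.0, 1.0], [2.0, 3.0], [4.0, 2.0]]
mean_q = pool_trajectory(steps, "mean")  # element-wise average over steps
max_q = pool_trajectory(steps, "max")    # element-wise maximum over steps
```

Mean pooling summarizes the whole path, while max pooling emphasizes the strongest activation seen at any step.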
- class core.Session(session_id: str, dimension: int, storage_dir: str = '.embenx_sessions')[source]¶
Bases: object
Managed agentic session with automatic temporal decay and persistence.
- class core.SpatialCollection(name: str = 'default', dimension: int | None = None, indexer_type: str = 'faiss', sparse_indexer_type: str | None = None, truncate_dim: int | None = None, **indexer_kwargs)[source]¶
Bases: Collection
Specialized collection for ESWM (Episodic Spatial World Memory). Supports navigation trajectories and spatial-aware retrieval.
- class core.StateCollection(name: str = 'default', dimension: int | None = None, indexer_type: str = 'faiss', sparse_indexer_type: str | None = None, truncate_dim: int | None = None, **indexer_kwargs)[source]¶
Bases: Collection
Specialized collection for State Space Model (SSM) hydration. Supports storing hidden states (h0).
- class core.TemporalCollection(name: str = 'default', dimension: int | None = None, indexer_type: str = 'faiss', sparse_indexer_type: str | None = None, truncate_dim: int | None = None, **indexer_kwargs)[source]¶
Bases: Collection
Specialized collection for Echo-style temporal episodic memory. Supports time-stamped embeddings and recency-biased retrieval.
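One common way to implement recency-biased retrieval is to down-weight each hit's similarity score by an exponential decay of its age. The half-life formulation below is purely an assumption about the mechanism, not TemporalCollection's actual formula:

```python
def recency_score(similarity, age_seconds, half_life=3600.0):
    """Down-weight a similarity score so it halves every `half_life` seconds of age."""
    decay = 0.5 ** (age_seconds / half_life)
    return similarity * decay

# A fresh hit keeps its score; an hour-old hit (one half-life) is halved.
fresh = recency_score(0.9, age_seconds=0.0)      # → 0.9
stale = recency_score(0.9, age_seconds=3600.0)   # → 0.45
```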
Benchmark¶
- benchmark.benchmark_single_indexer(name, indexer_cls, dimension, embeddings, metadata, console, cleanup=True)[source]¶
- benchmark.generate_report(results: List[Dict[str, Any]], dataset_name: str, output_path: str = 'benchmark_report.md')[source]¶
Generate a formatted Markdown technical report from benchmark results.
- benchmark.load_custom_indexer(script_path: str, console: Console)[source]¶
Dynamically load a class inheriting from BaseIndexer from a given script.
- benchmark.run_benchmark(dataset_name: str, split: str, text_column: str, max_docs: int, indexer_names: List[str], model_name: str, console: Console, data_files: str = None, cleanup: bool = True, custom_indexer_script: str = None, subset: str = 'default')[source]¶
Run Embenx benchmarks. Matches original signature for test compatibility.
Reranking¶
Data & Zoo¶
- data.load_documents(dataset_name: str, subset: str = 'default', split: str = 'train', max_docs: int = 100) List[Dict[str, Any]][source]¶
Load documents from Hugging Face or local files.
- data.load_from_zoo(dataset_name: str, cache_dir: str = '.embenx_cache') Collection[source]¶
Download and load a pre-built collection from the Embenx Retrieval Zoo.
- data.save_collection(collection: Collection, path: str)[source]¶
Save a collection’s vectors and metadata to disk.
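The save/load pair persists exactly two things per the signatures above: the vector array and its metadata list. The actual on-disk format is unspecified here; the round-trip sketch below just illustrates one plausible layout (a `.npy` array plus a JSON sidecar) with hypothetical helper names.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

def save_sketch(vectors, metadata, path):
    """Write vectors as .npy and metadata as a JSON sidecar in `path`."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    np.save(path / "vectors.npy", np.asarray(vectors, dtype=np.float32))
    (path / "metadata.json").write_text(json.dumps(metadata))

def load_sketch(path):
    """Read back the vector array and metadata list written by save_sketch."""
    path = Path(path)
    vectors = np.load(path / "vectors.npy")
    metadata = json.loads((path / "metadata.json").read_text())
    return vectors, metadata

with tempfile.TemporaryDirectory() as tmp:
    save_sketch([[0.1, 0.2], [0.3, 0.4]], [{"id": 1}, {"id": 2}], tmp)
    vecs, meta = load_sketch(tmp)
```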
LLM & Embedding¶
Agentic & MCP¶
Indexers¶
- class indexers.BaseIndexer(name: str, dimension: int)[source]¶
Bases: ABC
- abstractmethod build_index(embeddings: List[List[float]], metadata: List[Dict[str, Any]]) None[source]¶
Build or insert embeddings into the index.
- Parameters:
embeddings – List of embedding vectors.
metadata – List of metadata dictionaries corresponding to each embedding.
- abstractmethod get_size() int[source]¶
Return the approximate memory footprint or disk size in bytes.
- abstractmethod search(query_embedding: List[float], top_k: int = 5) List[Tuple[Dict[str, Any], float]][source]¶
Search the index and return a list of (metadata, distance/score) tuples.
- Parameters:
query_embedding – The embedding vector to search for.
top_k – Number of nearest neighbors to return.
- Returns:
List of tuples containing (metadata, distance).
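A custom indexer only needs to implement the three abstract methods above. The sketch below is self-contained, so it re-declares a stand-in ABC mirroring the documented interface; in practice you would subclass the real `BaseIndexer` imported from the indexers module. The brute-force L2 scan is just the simplest conforming implementation.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple

import numpy as np

class BaseIndexer(ABC):
    """Stand-in mirroring the documented abstract interface."""
    def __init__(self, name: str, dimension: int):
        self.name = name
        self.dimension = dimension

    @abstractmethod
    def build_index(self, embeddings: List[List[float]],
                    metadata: List[Dict[str, Any]]) -> None: ...

    @abstractmethod
    def search(self, query_embedding: List[float],
               top_k: int = 5) -> List[Tuple[Dict[str, Any], float]]: ...

    @abstractmethod
    def get_size(self) -> int: ...

class FlatL2Indexer(BaseIndexer):
    """Exact nearest-neighbor search via a full L2 distance scan."""
    def build_index(self, embeddings, metadata):
        self._vectors = np.asarray(embeddings, dtype=np.float32)
        self._metadata = list(metadata)

    def search(self, query_embedding, top_k=5):
        query = np.asarray(query_embedding, dtype=np.float32)
        dists = np.linalg.norm(self._vectors - query, axis=1)
        order = np.argsort(dists)[:top_k]
        return [(self._metadata[i], float(dists[i])) for i in order]

    def get_size(self):
        return self._vectors.nbytes  # in-memory footprint of the stored vectors

idx = FlatL2Indexer("flat-l2", dimension=2)
idx.build_index([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
                [{"id": 0}, {"id": 1}, {"id": 2}])
hits = idx.search([0.9, 0.1], top_k=2)
```

Returning `(metadata, distance)` tuples sorted by ascending distance matches the contract stated in the `search` docstring above.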
- class indexers.base.BaseIndexer(name: str, dimension: int)[source]¶
Bases: ABC
Canonical definition of BaseIndexer, re-exported as indexers.BaseIndexer; see above for the full member documentation.