Embedding MCP Server
A Model Context Protocol (MCP) server implementation powered by txtai, providing semantic search, knowledge graph capabilities, and AI-driven text processing through a standardized interface.
The Power of txtai: All-in-one Embeddings Database
This project leverages txtai, an all-in-one embeddings database that combines semantic search, knowledge graph construction, and language model workflows for retrieval-augmented generation (RAG). txtai offers several key advantages:
Unified Vector Database: Combines vector indexes, graph networks, and relational databases in a single platform
Semantic Search: Find information based on meaning, not just keywords
Knowledge Graph Integration: Automatically build and query knowledge graphs from your data
Portable Knowledge Bases: Save entire knowledge bases as compressed archives (.tar.gz) that can be easily shared and loaded
Extensible Pipeline System: Process text, documents, audio, images, and video through a unified API
Local-first Architecture: Run everything locally without sending data to external services
How It Works
The project contains a knowledge base builder tool and an MCP server. The knowledge base builder is a command-line interface for creating and managing knowledge bases, and the MCP server provides a standardized interface for accessing them.
You are not required to use the knowledge base builder tool. You can always build a knowledge base with txtai's programming interface, either in a Python script or in a Jupyter notebook. As long as the knowledge base is built with txtai, the MCP server can load it. Better yet, the knowledge base can be a folder on the file system or an exported .tar.gz archive; just point the MCP server at it and it will load it.
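For example, here is a minimal sketch using txtai's Python API (the model name and output paths are illustrative):

```python
from txtai import Embeddings

# Create an embeddings database with content storage enabled
# (model path is an example; use whatever embedding model fits your data)
embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", content=True)

# Index a few documents (any iterable of strings or (id, text, tags) tuples works)
embeddings.index([
    "txtai is an all-in-one embeddings database",
    "Semantic search finds information based on meaning, not just keywords",
])

# Save either to a directory or to a portable compressed archive
embeddings.save("my_knowledge_base")          # folder on the file system
embeddings.save("my_knowledge_base.tar.gz")   # exported .tar.gz archive
```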
1. Build a Knowledge Base with kb_builder
The kb_builder module provides a command-line interface for creating and managing knowledge bases:
Process documents from various sources (files, directories, JSON)
Extract text and create embeddings
Build knowledge graphs automatically
Export portable knowledge bases
Note that it may be limited in functionality and is currently provided for convenience only.
2. Start the MCP Server
The MCP server provides a standardized interface to access the knowledge base:
Semantic search capabilities
Knowledge graph querying and visualization
Text processing pipelines (summarization, extraction, etc.)
Full compliance with the Model Context Protocol
Installation
Recommended: Using uv with Python 3.10+
We recommend using uv with Python 3.10 or newer for the best experience. This provides better dependency management and ensures consistent behavior.
Note: We pin transformers to version 4.49.0 to avoid deprecation warnings about transformers.agents.tools that appear in version 4.50.0 and newer. If you use a newer version of transformers, you may see these warnings, but they don't affect functionality.
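For example, assuming the package is published on PyPI as kb-mcp-server (adjust the name if your distribution differs), a typical uv-based setup looks like this:

```bash
# Create and activate a Python 3.10+ virtual environment with uv
uv venv --python 3.10 .venv
source .venv/bin/activate

# Install the server (package name assumed)
uv pip install kb-mcp-server
```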
Using conda
From Source
Using uv (Faster Alternative)
Using uvx (No Installation Required)
uvx allows you to run packages directly from PyPI without installing them:
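For example (the package name kb-mcp-server is an assumption):

```bash
# Run the MCP server straight from PyPI, no permanent install required
uvx kb-mcp-server --embeddings /path/to/knowledge_base.tar.gz
```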
Command Line Usage
Building a Knowledge Base
You can use the command-line tools installed from PyPI, the Python module directly, or the convenient shell scripts:
Using the PyPI Installed Commands
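A sketch of the commands, assuming the package installs kb-build and kb-search console scripts with --input/--config flags (run them with --help to confirm the actual options):

```bash
# Build a knowledge base from a directory of documents using a config template
kb-build --input /path/to/documents --config src/kb_builder/configs/technical_docs.yml

# Query the resulting knowledge base
kb-search /path/to/knowledge_base "How does feature X work?"
```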
Using uvx (No Installation Required)
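The same operations via uvx, using --from to select a command from the (assumed) kb-mcp-server package:

```bash
# Build and search without installing; package and command names are assumptions
uvx --from kb-mcp-server kb-build --input /path/to/documents --config src/kb_builder/configs/technical_docs.yml
uvx --from kb-mcp-server kb-search /path/to/knowledge_base "How does feature X work?"
```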
Using the Python Module
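Equivalent invocation through the kb_builder module (the subcommand and flags are illustrative; check --help for the real interface):

```bash
# Invoke the knowledge base builder module directly
python -m kb_builder build --input /path/to/documents --config src/kb_builder/configs/technical_docs.yml
```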
Using the Convenience Scripts
The repository includes convenient wrapper scripts that make it easier to build and search knowledge bases:
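For example (the arguments shown are illustrative):

```bash
# Build a knowledge base from a documents folder with a chosen config template
./scripts/kb_build.sh /path/to/documents technical_docs.yml

# Search the resulting knowledge base
./scripts/kb_search.sh /path/to/knowledge_base "How does feature X work?"
```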
Run ./scripts/kb_build.sh --help or ./scripts/kb_search.sh --help for more options.
Starting the MCP Server
Using the PyPI Installed Command
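For example, assuming the installed console script is named kb-mcp-server:

```bash
# Serve an existing knowledge base (a folder or a .tar.gz archive)
kb-mcp-server --embeddings /path/to/knowledge_base.tar.gz
```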
Using uvx (No Installation Required)
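Again assuming the package name kb-mcp-server:

```bash
# Run the server directly from PyPI
uvx kb-mcp-server --embeddings /path/to/knowledge_base.tar.gz
```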
Using the Python Module
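A sketch of the module invocation (the module name txtai_mcp_server is an assumption; adjust to match your installation):

```bash
# Module name assumed; the --embeddings flag is documented below
python -m txtai_mcp_server --embeddings /path/to/knowledge_base.tar.gz
```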
MCP Server Configuration
The MCP server is configured using environment variables or command-line arguments, not YAML files. YAML files are only used for configuring txtai components during knowledge base building.
Here's how to configure the MCP server:
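A sketch using the command-line flags documented below (the kb-mcp-server command name, host, port, and paths are illustrative):

```bash
# Serve a knowledge base over SSE on a specific host and port
kb-mcp-server --embeddings /path/to/knowledge_base.tar.gz \
    --host 0.0.0.0 --port 8000 --transport sse

# Optionally enable causal boosting with a custom configuration file
kb-mcp-server --embeddings /path/to/knowledge_base.tar.gz \
    --enable-causal-boost --causal-config /path/to/causal_config.yml
```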
Common configuration options:
--embeddings: Path to the knowledge base (required)
--host: Host address to bind to (default: localhost)
--port: Port to listen on (default: 8000)
--transport: Transport to use, either 'sse' or 'stdio' (default: stdio)
--enable-causal-boost: Enable causal boost feature for enhanced relevance scoring
--causal-config: Path to custom causal boost configuration YAML file
Configuring LLM Clients to Use the MCP Server
To configure an LLM client to use the MCP server, you need to create an MCP configuration file. Here's an example mcp_config.json:
Using the server directly
If you installed the server into a Python virtual environment, you can use the following configuration. Note that an MCP host such as Claude will not be able to connect to the server if you rely on the virtual environment's activation; you must point it at the absolute path to the Python executable of the virtual environment where you ran "pip install" or "uv pip install", for example:
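A sketch of such a configuration (the server name, module name, and all paths are illustrative and should be replaced with your own):

```json
{
  "mcpServers": {
    "kb-server": {
      "command": "/path/to/your/project/.venv/bin/python",
      "args": [
        "-m", "txtai_mcp_server",
        "--embeddings", "/path/to/knowledge_base.tar.gz"
      ]
    }
  }
}
```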
Using system default Python
If you use your system default Python, you can use the following configuration:
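Again, the server name, module name, and knowledge base path below are illustrative:

```json
{
  "mcpServers": {
    "kb-server": {
      "command": "python",
      "args": [
        "-m", "txtai_mcp_server",
        "--embeddings", "/path/to/knowledge_base.tar.gz"
      ]
    }
  }
}
```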
Alternatively, if you're using uvx: assuming you have uvx installed on your system (it ships with uv, e.g. "brew install uv"), or you've installed it per-user and made it globally accessible via:
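For example (the source path is an assumption; adjust it to wherever uvx actually lives on your machine):

```bash
# Link a user-local uvx install into a directory on the system PATH
sudo ln -s ~/.local/bin/uvx /usr/local/bin/uvx
```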
This creates a symbolic link from your user-specific installation to a system-wide location. For macOS applications like Claude Desktop, you can modify the system-wide PATH by creating or editing a launchd configuration file:
Add this content:
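A sketch of such a file, assuming it lives at ~/Library/LaunchAgents/environment.plist and uses launchctl setenv to publish PATH (the file path and PATH value are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Sets PATH for launchd-spawned apps such as Claude Desktop -->
    <key>Label</key>
    <string>environment.path</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/launchctl</string>
        <string>setenv</string>
        <string>PATH</string>
        <string>/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
```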
Then load it:
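Assuming the same file path as above:

```bash
# Load the agent so launchd picks up the new PATH
launchctl load ~/Library/LaunchAgents/environment.plist
```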
You'll need to restart your computer for this to take effect, though.
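With uvx reachable on the system PATH, the client configuration can then invoke the server through uvx (the package name kb-mcp-server is an assumption):

```json
{
  "mcpServers": {
    "kb-server": {
      "command": "uvx",
      "args": [
        "kb-mcp-server",
        "--embeddings", "/path/to/knowledge_base.tar.gz"
      ]
    }
  }
}
```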
Place this configuration file in a location accessible to your LLM client and configure the client to use it. The exact configuration steps will depend on your specific LLM client.
Advanced Knowledge Base Configuration
Building a knowledge base with txtai requires a YAML configuration file that controls various aspects of the embedding process. This configuration is used by the kb_builder tool, not the MCP server itself.
You may need to tune segmentation/chunking strategies, embedding models, and scoring methods, as well as configure graph construction, causal boosting, hybrid search weights, and more.
Fortunately, txtai provides a powerful YAML configuration system that requires no coding. Here's an example of a comprehensive configuration for knowledge base building:
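A sketch of what such a configuration can look like, using standard txtai embeddings options (the model name and thresholds are illustrative, and the exact keys accepted by kb_builder may differ):

```yaml
# Embedding model and storage
path: sentence-transformers/all-MiniLM-L6-v2   # vector model (illustrative)
content: true                                   # store document content alongside vectors
backend: faiss                                  # vector index backend

# Hybrid scoring: combine sparse (BM25) and dense retrieval
hybrid: true
scoring:
  method: bm25
  terms: true
  normalize: true

# Knowledge graph construction
graph:
  backend: networkx
  limit: 10        # max edges considered per node
  minscore: 0.3    # similarity threshold for creating an edge
  topics: {}       # enable automatic topic/community detection
```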
Configuration Examples
The src/kb_builder/configs directory contains configuration templates for different use cases and storage backends:
Storage and Backend Configurations
memory.yml: In-memory vectors (fastest for development, no persistence)
sqlite-faiss.yml: SQLite for content + FAISS for vectors (local file-based persistence)
postgres-pgvector.yml: PostgreSQL + pgvector (production-ready with full persistence)
Domain-Specific Configurations
base.yml: Base configuration template
code_repositories.yml: Optimized for code repositories
data_science.yml: Configured for data science documents
general_knowledge.yml: General purpose knowledge base
research_papers.yml: Optimized for academic papers
technical_docs.yml: Configured for technical documentation
You can use these as starting points for your own configurations:
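For example (the kb-build command name and flags are assumptions; the wrapper scripts shown earlier work the same way):

```bash
# Copy a template, adapt it, then build against it
cp src/kb_builder/configs/technical_docs.yml my_config.yml
kb-build --input /path/to/documents --config my_config.yml
```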
Advanced Features
Knowledge Graph Capabilities
The MCP server leverages txtai's built-in graph functionality to provide powerful knowledge graph capabilities:
Automatic Graph Construction: Build knowledge graphs from your documents automatically
Graph Traversal: Navigate through related concepts and documents
Path Finding: Discover connections between different pieces of information
Community Detection: Identify clusters of related information
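As an illustration of what this enables under the hood, here is a minimal txtai sketch of graph-aware search against a saved knowledge base (paths and queries are examples; the MCP server exposes equivalent operations as tools):

```python
from txtai import Embeddings

# Load a knowledge base that was built with graph support and content storage enabled
embeddings = Embeddings()
embeddings.load("/path/to/knowledge_base.tar.gz")

# Standard semantic search
for result in embeddings.search("What causes model drift?", limit=3):
    print(result["id"], result["score"])

# Graph-aware search: returns a subgraph of related results
# that can be traversed to find connected concepts
graph = embeddings.search("What causes model drift?", limit=10, graph=True)
print(graph.centrality())   # most connected nodes in the result graph
```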
Causal Boosting Mechanism
The MCP server includes a sophisticated causal boosting mechanism that enhances search relevance by identifying and prioritizing causal relationships:
Pattern Recognition: Detects causal language patterns in both queries and documents
Multilingual Support: Automatically applies appropriate patterns based on detected query language
Configurable Boost Multipliers: Different types of causal matches receive customizable boost factors
Enhanced Relevance: Results that explain causal relationships are prioritized in search results
This mechanism significantly improves responses to "why" and "how" questions by surfacing content that explains relationships between concepts. The causal boosting configuration is highly customizable through YAML files, allowing adaptation to different domains and languages.
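The exact schema is defined by the project's causal boost configuration files; purely as an illustration, a configuration of this kind typically pairs language-specific causal patterns with boost multipliers, along these lines (every key and value below is hypothetical):

```yaml
# Hypothetical causal boost configuration (illustrative only)
default_language: en
languages:
  en:
    patterns:
      - "because"
      - "leads to"
      - "results in"
      - "due to"
boosts:
  query_causal: 1.3      # query itself asks about causes ("why", "how")
  document_causal: 1.2   # document contains causal language
  bidirectional: 1.5     # both query and document are causal
```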
License
MIT License - see LICENSE file for details