
Ultimate MCP Server

A comprehensive Model Context Protocol (MCP) server providing advanced AI agents with dozens of powerful capabilities for cognitive augmentation, tool use, and intelligent orchestration

[Getting Started](#getting-started) [Key Features](#key-features) [Usage Examples](#usage-examples) [Architecture](#architecture)

What is Ultimate MCP Server?

Ultimate MCP Server is a comprehensive MCP-native system that serves as a complete AI agent operating system. It exposes dozens of powerful capabilities through the Model Context Protocol, enabling advanced AI agents to access a rich ecosystem of tools, cognitive systems, and specialized services.
While it includes intelligent task delegation from sophisticated models (e.g., Claude 3.7 Sonnet) to cost-effective ones (e.g., Gemini 2.0 Flash Lite), this is just one facet of its extensive functionality. The server provides unified access to multiple LLM providers while optimizing for cost, performance, and quality.
The system offers integrated cognitive memory systems, browser automation, Excel manipulation, database interactions, document processing, command-line utilities, dynamic API integration, OCR capabilities, vector operations, entity relation graphs, SQL database interactions, audio transcription, and much more. These capabilities transform an AI agent from a conversational interface into a powerful autonomous system capable of complex, multi-step operations across digital environments.

Vision: The Complete AI Agent Operating System

At its core, Ultimate MCP Server represents a fundamental shift in how AI agents operate in digital environments. It serves as a comprehensive operating system for AI, providing:
  • A unified cognitive architecture that enables persistent memory, reasoning, and contextual awareness
  • Seamless access to dozens of specialized tools spanning web browsing, document processing, data analysis, and more
  • Direct system-level capabilities for filesystem operations, database interactions, and command-line utilities
  • Dynamic workflow capabilities for complex multi-step task orchestration and execution
  • Intelligent integration of various LLM providers with cost, quality, and performance optimization
  • Advanced vector operations, knowledge graphs, and retrieval-augmented generation for enhanced AI capabilities
This approach mirrors how sophisticated operating systems provide applications with access to hardware, services, and resources - but designed specifically for augmenting AI agents with powerful new capabilities beyond their native abilities.

MCP-Native Architecture

The server is built entirely on the Model Context Protocol (MCP), making it specifically designed to work with AI agents like Claude. All functionality is exposed through standardized MCP tools that can be directly called by these agents, creating a seamless integration layer between AI agents and a comprehensive ecosystem of capabilities, services, and external systems.

Core Use Cases: AI Agent Augmentation and Ecosystem

The Ultimate MCP Server transforms AI agents like Claude 3.7 Sonnet into autonomous systems capable of sophisticated operations across digital environments:
Example workflow:
  1. An AI agent receives a complex task requiring multiple capabilities beyond its native abilities
  2. The agent uses the Ultimate MCP Server to access specialized tools and services as needed
  3. The agent can leverage the cognitive memory system to maintain state and context across operations
  4. Complex tasks like research, data analysis, document creation, and multimedia processing become possible
  5. The agent can orchestrate multi-step workflows combining various tools in sophisticated sequences
  6. Results are returned in standard MCP format, enabling the agent to understand and work with them
  7. One important benefit is cost optimization through delegating appropriate tasks to more efficient models
This integration unlocks transformative capabilities that enable AI agents to autonomously complete complex projects while intelligently utilizing resources - including potentially saving 70-90% on API costs by using specialized tools and cost-effective models where appropriate.

Why Use Ultimate MCP Server?

Comprehensive AI Agent Toolkit

A unified hub enabling advanced AI agents to access an extensive ecosystem of tools:
  • Perform complex web automation tasks (**Playwright** integration).
  • Manipulate and analyze **Excel** spreadsheets with deep integration.
  • Access rich **cognitive memory** systems for persistent agent state.
  • Interact securely with the **filesystem**.
  • Interact with **databases** through SQL operations.
  • Process documents with **OCR** capabilities.
  • Perform sophisticated **vector search** and **RAG** operations.
  • Utilize specialized **text processing** and **classification**.
  • Leverage command-line tools like **ripgrep**, **awk**, **sed**, **jq**.
  • Dynamically integrate external **REST APIs**.
  • Use **meta tools** for self-discovery, optimization, and documentation refinement.

Cost Optimization

API costs for advanced models can be substantial. Ultimate MCP Server helps reduce costs by:
  • Routing appropriate tasks to cheaper models (e.g., $0.01/1K tokens vs $0.15/1K tokens).
  • Implementing **advanced caching** (exact, semantic, task-aware) to avoid redundant API calls.
  • Tracking and **optimizing costs** across providers.
  • Enabling **cost-aware task routing** decisions.
  • Handling routine processing with specialized non-LLM tools (filesystem, CLI utils, etc.).

Provider Abstraction

Avoid provider lock-in with a unified interface:
  • Standard API for **OpenAI**, **Anthropic (Claude)**, **Google (Gemini)**, **xAI (Grok)**, **DeepSeek**, and **OpenRouter**.
  • Consistent parameter handling and response formatting.
  • Ability to **swap providers** without changing application code.
  • Protection against provider-specific outages and limitations through fallback mechanisms.

Comprehensive Document and Data Processing

Process documents and data efficiently:
  • Break documents into semantically meaningful **chunks**.
  • Process chunks in **parallel** across multiple models.
  • Extract **structured data** (JSON, tables, key-value) from unstructured text.
  • Generate **summaries** and insights from large texts.
  • Convert formats (**HTML to Markdown**, documents to structured data).
  • Apply **OCR** to images and PDFs with optional LLM enhancement.

Key Features

MCP Protocol Integration

  • Native MCP Server: Built on the Model Context Protocol for seamless AI agent integration.
  • MCP Tool Framework: All functionality exposed through standardized MCP tools with clear schemas.
  • Tool Composition: Tools can be combined in workflows using dependencies.
  • Tool Discovery: Supports dynamic listing and capability discovery for agents.

Intelligent Task Delegation

  • Task Routing: Analyzes tasks and routes to appropriate models or specialized tools.
  • Provider Selection: Chooses provider/model based on task requirements, cost, quality, or speed preferences.
  • Cost-Performance Balancing: Optimizes delegation strategy.
  • Delegation Tracking: Monitors delegation patterns, costs, and outcomes (via Analytics).

Provider Integration

  • Multi-Provider Support: First-class support for OpenAI, Anthropic, Google, DeepSeek, xAI (Grok), OpenRouter. Extensible architecture.
  • Model Management: Handles different model capabilities, context windows, and pricing. Automatic selection and fallback mechanisms.

Advanced Caching

  • Multi-level Caching: Exact match, semantic similarity, and task-aware strategies.
  • Persistent Cache: Disk-based persistence (e.g., DiskCache) with fast in-memory access layer.
  • Cache Analytics: Tracks cache hit rates, estimated cost savings.

Document Tools

  • Smart Chunking: Token-based, semantic boundary detection, structural analysis methods. Configurable overlap.
  • Document Operations: Summarization (paragraph, bullets), entity extraction, question generation, batch processing.

Secure Filesystem Operations

  • Path Management: Robust validation, normalization, symlink security checks, configurable allowed directories.
  • File Operations: Read/write with encoding handling, smart text editing/replacement, metadata retrieval.
  • Directory Operations: Creation, listing, tree visualization, secure move/copy.
  • Search Capabilities: Recursive search with pattern matching and filtering.
  • Security Focus: Designed to prevent directory traversal and enforce boundaries.

Autonomous Tool Documentation Refiner

  • Automated Improvement: Systematically analyzes, tests, and refines MCP tool documentation (docstrings, schemas, examples).
  • Agent Simulation: Identifies ambiguities from an LLM agent's perspective.
  • Adaptive Testing: Generates and executes schema-aware test cases.
  • Failure Analysis: Uses LLM ensembles to diagnose documentation weaknesses.
  • Iterative Refinement: Continuously improves documentation quality.
  • (See dedicated section for more details)

Browser Automation with Playwright

  • Full Control: Navigate, click, type, scrape data, screenshots, PDFs, file up/download, JS execution.
  • Research: Automate searches across engines, extract structured data, monitor sites.
  • Synthesis: Combine findings from multiple web sources into reports.

Cognitive & Agent Memory System

  • Memory Hierarchy: Working, episodic, semantic, procedural levels.
  • Knowledge Management: Store/retrieve memories with metadata, relationships, importance tracking.
  • Workflow Tracking: Record agent actions, reasoning chains, artifacts, dependencies.
  • Smart Operations: Memory consolidation, reflection generation, relevance-based optimization, decay.

Excel Spreadsheet Automation

  • Direct Manipulation: Create, modify, format Excel files via natural language or structured instructions. Analyze formulas.
  • Template Learning: Learn from examples, adapt templates, apply formatting patterns.
  • VBA Macro Generation: Generate VBA code from instructions for complex automation.

Structured Data Extraction

  • JSON Extraction: Extract structured JSON with schema validation.
  • Table Extraction: Extract tables in multiple formats (JSON, CSV, Markdown).
  • Key-Value Extraction: Simple K/V pair extraction.
  • Semantic Schema Inference: Attempt to generate schemas from text.

Tournament Mode

  • Model Competitions: Run head-to-head comparisons for code or text generation tasks.
  • Multi-Model Evaluation: Compare outputs from different models/providers simultaneously.
  • Performance Metrics: Evaluate correctness, efficiency, style, etc. Persist results.

SQL Database Interactions

  • Query Execution: Run SQL queries against various DB types (SQLite, PostgreSQL, etc. via SQLAlchemy).
  • Schema Analysis: Analyze schemas, suggest optimizations (using LLM).
  • Data Exploration: Browse tables, visualize contents.
  • Query Generation: Generate SQL from natural language descriptions.

Entity Relation Graphs

  • Entity Extraction: Identify entities (people, orgs, locations, etc.).
  • Relationship Mapping: Discover and map connections between entities.
  • Knowledge Graph Construction: Build persistent graphs (e.g., using NetworkX).
  • Graph Querying: Extract insights using graph traversal or LLM-based queries.

Advanced Vector Operations

  • Semantic Search: Find similar content using vector embeddings.
  • Vector Storage Integration: Interfaces with vector databases or local stores.
  • Hybrid Search: Combines keyword and semantic search (e.g., via Marqo integration).
  • Batched Processing: Efficient embedding generation and searching for large datasets.

Retrieval-Augmented Generation (RAG)

  • Contextual Generation: Augments prompts with relevant retrieved documents/chunks.
  • Accuracy Improvement: Reduces hallucinations by grounding responses in provided context.
  • Workflow Integration: Seamlessly combines retrieval (vector/keyword search) with generation. Customizable strategies.

Audio Transcription

  • Speech-to-Text: Convert audio files (e.g., WAV, MP3) to text using models like Whisper.
  • Speaker Diarization: Identify different speakers (if supported by the model/library).
  • Transcript Enhancement: Clean and format transcripts using LLMs.
  • Multi-language Support: Handles various languages based on the underlying transcription model.

Text Classification

  • Custom Classifiers: Apply text classification models (potentially fine-tuned or using zero-shot LLMs).
  • Multi-label Classification: Assign multiple categories.
  • Confidence Scoring: Provide probabilities for classifications.
  • Batch Processing: Classify large document sets efficiently.

OCR Tools

  • PDF/Image Extraction: Uses Tesseract or other OCR engines, enhanced with LLM correction/formatting.
  • Preprocessing: Image denoising, thresholding, deskewing options.
  • Structure Analysis: Extracts PDF metadata and structure.
  • Batch Processing: Handles multiple files concurrently.
  • (Requires `ocr` extra dependencies: `uv pip install -e ".[ocr]"`)

Text Redline Tools

  • HTML Redline Generation: Visual diffs (insertions, deletions, moves) between text/HTML. Standalone HTML output.
  • Document Comparison: Compares various formats with intuitive highlighting.

HTML to Markdown Conversion

  • Intelligent Conversion: Detects content type, uses libraries like readability-lxml, trafilatura, markdownify.
  • Content Extraction: Filters boilerplate, preserves structure (tables, links).
  • Markdown Optimization: Cleans and normalizes output.

Workflow Optimization Tools

  • Cost Estimation/Comparison: Pre-execution cost estimates, model cost comparisons.
  • Model Selection Guidance: Recommends models based on task, budget, performance needs.
  • Workflow Execution Engine: Runs multi-stage pipelines with dependencies, parallel execution, variable passing.

Local Text Processing Tools (CLI Integration)

  • Offline Power: Securely wrap and expose command-line tools like ripgrep (fast regex search), awk (text processing), sed (stream editor), jq (JSON processing) as MCP tools. Process text locally without API calls.

Model Performance Benchmarking

  • Empirical Measurement: Tools to measure actual speed (tokens/sec), latency across providers/models.
  • Performance Profiles: Generate comparative reports based on real-world performance.
  • Data-Driven Optimization: Use benchmark data to inform routing decisions.

Server-Sent Events (SSE) Support

  • Real-time Streaming: Token-by-token updates for LLM completions.
  • Progress Monitoring: Track progress of long-running jobs (chunking, batch processing).
  • Event-Based Architecture: Subscribe to specific server events.

Multi-Model Synthesis

  • Comparative Analysis: Analyze outputs from multiple models side-by-side.
  • Response Synthesis: Combine best elements, generate meta-responses, create consensus outputs.
  • Collaborative Reasoning: Implement workflows where different models handle different steps.

Extended Model Support

  • Grok Integration: Native support for xAI's Grok.
  • DeepSeek Support: Optimized handling for DeepSeek models.
  • OpenRouter Integration: Access a wide variety via OpenRouter API key.
  • Gemini Integration: Comprehensive support for Google's Gemini models.
  • Anthropic Integration: Full support for Claude models including Claude 3.5 Sonnet and Haiku.
  • OpenAI Integration: Complete support for GPT-3.5, GPT-4, and newer models.

Meta Tools for Self-Improvement & Dynamic Integration

  • Tool Discovery: Agents can query available tools, parameters, descriptions (list_tools).
  • Usage Recommendations: Get AI-driven advice on tool selection/combination for tasks.
  • External API Integration: Dynamically register REST APIs via OpenAPI specs, making endpoints available as callable MCP tools (register_api, call_dynamic_tool).
  • Documentation Generation: Part of the Autonomous Refiner feature.

Analytics and Reporting

  • Usage Tracking: Monitors tokens, costs, requests, success/error rates per provider/model/tool.
  • Real-Time Monitoring: Live dashboard or stream of usage stats.
  • Detailed Reporting: Generate historical cost/usage reports, identify trends, export data.
  • Optimization Insights: Helps identify expensive operations or inefficient patterns.

Prompt Templates and Management

  • Jinja2 Templates: Create reusable, dynamic prompts with variables, conditionals, includes.
  • Prompt Repository: Store, retrieve, categorize, and version control prompts.
  • Metadata: Add descriptions, authorship, usage examples to templates.
  • Optimization: Test and compare template performance and token usage.

Error Handling and Resilience

  • Intelligent Retries: Automatic retries with exponential backoff for transient errors (rate limits, network issues).
  • Fallback Mechanisms: Configurable provider fallbacks on primary failure.
  • Detailed Error Reporting: Captures comprehensive error context for debugging.
  • Input Validation: Pre-flight checks for common issues (e.g., token limits, required parameters).

System Features

  • Rich Logging: Colorful, informative console logs via Rich.
  • Health Monitoring: /healthz endpoint for readiness checks.
  • Command-Line Interface: umcp CLI for management and interaction.

Getting Started

Install
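
A typical installation with uv might look like the following sketch; the repository URL and directory name are placeholders, not confirmed values:

```bash
git clone <repository-url>          # replace with the project's actual repo URL
cd ultimate_mcp_server
uv sync --all-extras                # or plain `uv sync` for core dependencies only
source .venv/bin/activate
```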

Note: The `uv sync --all-extras` command installs all optional extras defined in the project (e.g., OCR, Browser Automation, Excel). If you only need specific extras, adjust your project dependencies and run `uv sync` without `--all-extras`.

.env Configuration

Create a file named .env in the root directory of the cloned repository. Add your API keys and any desired configuration overrides:
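A minimal sketch follows. The exact environment variable names for provider keys are assumptions and may differ in the actual project; the server settings match the Advanced Configuration section below.

```bash
# LLM provider API keys (set only the providers you use; names illustrative)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
DEEPSEEK_API_KEY=...
OPENROUTER_API_KEY=...

# Server and behavior overrides (all optional)
SERVER_HOST=127.0.0.1
SERVER_PORT=8013
LOG_LEVEL=INFO
CACHE_ENABLED=true
```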

Run

Make sure your virtual environment is active (source .venv/bin/activate).
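Then start the server with the umcp CLI (flags documented under Command Line Interface below):

```bash
umcp run                          # host/port taken from .env
umcp run -h 0.0.0.0 -p 8013 -w 4  # or override on the command line
```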
Once running, the server will typically be available at http://localhost:8013 (or the host/port configured in your .env or command line). You should see log output indicating the server has started and which tools are registered.

Command Line Interface (CLI)

The Ultimate MCP Server provides a powerful command-line interface (CLI) through the umcp command that allows you to manage the server, interact with LLM providers, test features, and explore examples. This section details all available commands and their options.

Global Options

The umcp command supports standard global options; run `umcp --help` to list them.

Server Management

Starting the Server

The run command starts the Ultimate MCP Server with specified options:
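For example (flags are optional; defaults come from your .env):

```bash
umcp run                # start with defaults
umcp run -t stdio -d    # stdio transport with debug logging
```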
Available options:
  • -h, --host: Host or IP address to bind the server to (default: from .env)
  • -p, --port: Port to listen on (default: from .env)
  • -w, --workers: Number of worker processes to spawn (default: from .env)
  • -t, --transport-mode: Transport mode for server communication ('sse' or 'stdio', default: sse)
  • -d, --debug: Enable debug logging
  • --include-tools: List of tool names to include (comma-separated)
  • --exclude-tools: List of tool names to exclude (comma-separated)

Provider Management

Listing Providers

The providers command displays information about configured LLM providers:
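For example:

```bash
umcp providers --check --models   # verify API keys and list models per provider
```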
Available options:
  • -c, --check: Check API keys for all configured providers
  • --models: List available models for each provider

Testing a Provider

The test command allows you to test a specific provider:
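For example:

```bash
umcp test openai --prompt "Hello, world!"
```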
Available options:
  • --model: Model ID to test (defaults to the provider's default)
  • --prompt: Prompt text to send (default: "Hello, world!")

Direct Text Generation

The complete command lets you generate text directly from the CLI:
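For example:

```bash
umcp complete --provider anthropic --prompt "Summarize MCP in one line." -s
echo "Summarize MCP in one line." | umcp complete   # prompt can also come from stdin
```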
Available options:
  • --provider: Provider to use (default: openai)
  • --model: Model ID (defaults to provider's default)
  • --prompt: Prompt text (reads from stdin if not provided)
  • --temperature: Sampling temperature (0.0-2.0, default: 0.7)
  • --max-tokens: Maximum tokens to generate
  • --system: System prompt for providers that support it
  • -s, --stream: Stream the response token by token

Cache Management

The cache command allows you to view or clear the request cache:
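For example:

```bash
umcp cache --status
umcp cache --clear    # prompts for confirmation
```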
Available options:
  • --status: Show cache status (enabled by default if no other flag)
  • --clear: Clear the cache (will prompt for confirmation)

Benchmarking

The benchmark command lets you compare performance and cost across providers:
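For example:

```bash
umcp benchmark -r 5   # 5 runs per provider/model; defaults to all configured providers
```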
Available options:
  • --providers: List of providers to benchmark (default: all configured)
  • --models: Model IDs to benchmark (defaults to default model of each provider)
  • --prompt: Prompt text to use (default: built-in benchmark prompt)
  • -r, --runs: Number of runs per provider/model (default: 3)

Tool Management

The tools command lists available tools, optionally filtered by category:
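For example (the category name is illustrative):

```bash
umcp tools --category document --examples
```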
Available options:
  • --category: Filter tools by category
  • --examples: Show example scripts alongside tools

Example Management

The examples command lets you list and run example scripts:
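For example:

```bash
umcp examples -l                 # list available example scripts
umcp examples <example_name>     # run one (placeholder name)
```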
Available options:
  • -l, --list: List example scripts only
  • --category: Filter examples by category

Getting Help

Every command has detailed help available:
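For example:

```bash
umcp --help            # global help and command list
umcp complete --help   # options for a specific command
```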

Usage Examples

This section provides Python examples demonstrating how an MCP client (like an application using mcp-client or an agent like Claude) would interact with the tools provided by a running Ultimate MCP Server instance.
Note: These examples assume you have `mcp-client` installed (`pip install mcp-client`) and the Ultimate MCP Server is running at `http://localhost:8013`.

Basic Completion
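
The original script is not reproduced here. The minimal sketch below uses a raw-HTTP wrapper whose `/tools/<name>` endpoint shape, and the `completion` tool name and parameters, are assumptions for illustration; the shipped examples use the `mcp-client` package instead. Later sketches reuse this same `call_tool` convention.

```python
import requests

SERVER = "http://localhost:8013"

def call_tool(tool: str, inputs: dict) -> dict:
    """POST a tool invocation to the running server. The endpoint shape is
    illustrative; the real examples use mcp-client rather than raw HTTP."""
    resp = requests.post(f"{SERVER}/tools/{tool}", json=inputs, timeout=300)
    resp.raise_for_status()
    return resp.json()

result = call_tool("completion", {
    "prompt": "Explain the Model Context Protocol in two sentences.",
    "provider": "openai",      # any configured provider name
    "max_tokens": 150,
})
print(result)  # MCP responses include outputs plus cost/usage metadata
```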

Claude Using Ultimate MCP Server for Document Analysis (Delegation)

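A sketch of delegated document analysis. The `summarize_document` and `extract_entities` tool names and their parameters are hypothetical:

```python
import requests

def call_tool(tool, inputs, timeout=300):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=timeout).json()

document = open("quarterly_report.txt", encoding="utf-8").read()

# Delegate routine summarization to a cheap model; the calling agent
# (e.g., Claude) reasons over the much smaller result.
summary = call_tool("summarize_document", {
    "document": document,
    "provider": "gemini",
    "format": "bullet_points",
})
entities = call_tool("extract_entities", {"document": document, "provider": "gemini"})
print(summary, entities)
```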

Browser Automation for Research

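A sketch of a research flow; the browser tool names and parameters are assumptions:

```python
import requests

def call_tool(tool, inputs, timeout=600):  # browser steps can be slow
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=timeout).json()

# Navigate, extract, and synthesize (tool names illustrative).
call_tool("browser_navigate", {"url": "https://example.com/industry-news"})
page_text = call_tool("browser_get_text", {"selector": "article"})
report = call_tool("summarize_document", {"document": str(page_text), "provider": "gemini"})
print(report)
```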

Cognitive Memory System Usage

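A sketch of storing and retrieving agent memories; the tool names and field names are hypothetical:

```python
import requests

def call_tool(tool, inputs):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=300).json()

# Store an episodic memory with importance metadata...
call_tool("store_memory", {
    "memory_type": "episodic",
    "content": "User prefers concise, bulleted summaries.",
    "importance": 0.8,
})

# ...then retrieve it by relevance in a later turn.
hits = call_tool("retrieve_memories", {"query": "output style preferences", "limit": 3})
print(hits)
```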

Excel Spreadsheet Automation

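A sketch of instruction-driven Excel manipulation; the tool name and parameters are hypothetical:

```python
import requests

def call_tool(tool, inputs):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=300).json()

# Drive Excel from natural language.
call_tool("excel_execute_instruction", {
    "file_path": "financial_model.xlsx",
    "instruction": "Create a 12-month revenue projection starting at $100,000 "
                   "with 5% monthly growth, and add a formatted total row.",
})
```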

Multi-Provider Comparison

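A sketch of fanning one prompt out to several providers through the unified interface; tool name and parameters are assumptions:

```python
import requests

def call_tool(tool, inputs):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=300).json()

prompt = "Define overfitting in one sentence."
for provider in ("openai", "anthropic", "gemini"):
    out = call_tool("completion", {"prompt": prompt, "provider": provider})
    print(f"--- {provider} ---\n{out}")
```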

Cost-Optimized Workflow Execution

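A sketch of a multi-stage pipeline with cost-aware routing. The `execute_workflow` tool name, stage schema, and `${...}` variable-passing syntax are all assumptions:

```python
import requests

def call_tool(tool, inputs):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=600).json()

result = call_tool("execute_workflow", {
    "stages": [
        {"id": "summarize", "tool": "summarize_document",
         "inputs": {"document": open("report.txt", encoding="utf-8").read()},
         "provider": "gemini"},                      # cheap model for routine work
        {"id": "polish", "tool": "completion",
         "inputs": {"prompt": "Rewrite for executives: ${summarize.output}"},
         "provider": "anthropic"},                   # capable model for the final pass
    ]
})
print(result)
```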

Entity Relation Graph Example

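A sketch of entity graph extraction; tool name, parameters, and response shape are hypothetical:

```python
import requests

def call_tool(tool, inputs):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=300).json()

text = open("press_release.txt", encoding="utf-8").read()
graph = call_tool("extract_entity_graph", {
    "text": text,
    "include_relationships": True,   # people, orgs, locations plus the edges between them
})
for rel in graph.get("relationships", []):
    print(rel)
```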

Document Chunking

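A sketch of smart chunking; the tool name and parameter names are assumptions, though the methods mirror the feature list above:

```python
import requests

def call_tool(tool, inputs):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=300).json()

long_text = open("whitepaper.txt", encoding="utf-8").read()
chunks = call_tool("chunk_document", {
    "document": long_text,
    "method": "semantic",   # token-based and structural methods are also described above
    "chunk_size": 1000,
    "overlap": 100,
})
print(f"{len(chunks.get('chunks', []))} chunks")
```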

Structured Data Extraction (JSON)

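A sketch of schema-validated JSON extraction; the tool name and parameter names are hypothetical:

```python
import requests

def call_tool(tool, inputs):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=300).json()

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
    "required": ["name", "price"],
}
data = call_tool("extract_json", {
    "text": "The Acme Anvil is on sale for $49.99 this week only.",
    "json_schema": schema,   # output is validated against this schema
})
print(data)
```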

Retrieval-Augmented Generation (RAG) Query

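A sketch of a RAG query against a previously indexed corpus; tool and parameter names are assumptions:

```python
import requests

def call_tool(tool, inputs):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=300).json()

answer = call_tool("rag_query", {
    "query": "What were the key findings of the 2023 audit?",
    "knowledge_base": "internal_docs",  # a previously indexed corpus (illustrative)
    "top_k": 5,                         # number of retrieved chunks to ground on
})
print(answer)
```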

Fused Search (Keyword + Semantic)

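A sketch of hybrid keyword-plus-semantic search; the tool name and weight parameters are hypothetical:

```python
import requests

def call_tool(tool, inputs):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=300).json()

hits = call_tool("fused_search", {
    "query": "battery thermal runaway mitigation",
    "keyword_weight": 0.4,    # lexical relevance
    "semantic_weight": 0.6,   # embedding similarity (Marqo-backed, per the features list)
})
print(hits)
```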

Local Text Processing
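
No script survives here; the sketch below uses the `run_ripgrep` wrapper named elsewhere in this README, with assumed parameter names:

```python
import requests

def call_tool(tool, inputs):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=300).json()

# Fast regex search over allowed directories, with no LLM API call involved.
matches = call_tool("run_ripgrep", {
    "pattern": r"TODO\(.*\)",
    "path": "./src",
})
print(matches)
```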

Browser Automation Example: Getting Started and Basic Interaction

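A sketch of basic navigate/click/screenshot interaction; browser tool names and parameters are assumptions:

```python
import requests

def call_tool(tool, inputs, timeout=600):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=timeout).json()

call_tool("browser_navigate", {"url": "https://example.com"})
call_tool("browser_click", {"selector": "a"})              # click the first link
shot = call_tool("browser_screenshot", {"full_page": True})
print(shot)
```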

Running a Model Tournament

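A sketch of a head-to-head tournament; the tool name, model identifiers, and response fields are hypothetical:

```python
import requests

def call_tool(tool, inputs, timeout=900):  # tournaments run many completions
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=timeout).json()

result = call_tool("run_tournament", {
    "task_type": "code_generation",
    "prompt": "Write a Python function returning the n-th Fibonacci number.",
    "models": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet", "deepseek/deepseek-chat"],
})
print(result.get("leaderboard"))
```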

Meta Tools for Tool Discovery

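A sketch of tool discovery. `list_tools` is named in the feature list above; the response shape assumed here is illustrative:

```python
import requests

def call_tool(tool, inputs):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=300).json()

tools = call_tool("list_tools", {})
for t in tools.get("tools", []):
    print(t.get("name"), "-", t.get("description", ""))
```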

Local Command-Line Text Processing (e.g., jq)

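A sketch using the `run_jq` wrapper named in the Security Considerations section; its parameter names are assumptions:

```python
import requests

def call_tool(tool, inputs):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=300).json()

out = call_tool("run_jq", {
    "input_data": '{"items": [{"id": 1}, {"id": 2}]}',
    "query": "[.items[].id]",
})
print(out)  # expected to contain [1, 2]
```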

Dynamic API Integration

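A sketch of dynamic API registration using the `register_api` and `call_dynamic_tool` names documented above; parameter names and the generated endpoint-tool name are assumptions:

```python
import requests

def call_tool(tool, inputs):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=300).json()

# Register a public OpenAPI spec; its endpoints become callable MCP tools.
call_tool("register_api", {
    "api_name": "petstore",
    "openapi_url": "https://petstore3.swagger.io/api/v3/openapi.json",
})

pets = call_tool("call_dynamic_tool", {
    "tool_name": "petstore_findPetsByStatus",  # generated tool name is illustrative
    "inputs": {"status": "available"},
})
print(pets)
```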

OCR Usage Example

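A sketch of OCR with optional LLM enhancement; the tool name and parameters are hypothetical:

```python
import requests

def call_tool(tool, inputs, timeout=600):  # illustrative wrapper (see Basic Completion)
    return requests.post(f"http://localhost:8013/tools/{tool}", json=inputs, timeout=timeout).json()

text = call_tool("ocr_image", {
    "file_path": "/data/scanned_invoice.png",   # server-side path; see the note below
    "preprocessing": ["deskew", "denoise"],
    "llm_enhance": True,                        # optional LLM cleanup pass
})
print(text)
```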
(Note: Many examples involving file paths assume the server process has access to those paths. For Docker deployments, volume mapping is usually required.)

Autonomous Documentation Refiner

The Ultimate MCP Server includes a powerful feature for autonomously analyzing, testing, and refining the documentation of registered MCP tools. This feature, implemented in ultimate/tools/docstring_refiner.py, helps improve the usability and reliability of tools when invoked by Large Language Models (LLMs) like Claude.

How It Works

The documentation refiner follows a methodical, iterative approach:
  1. Agent Simulation: Simulates how an LLM agent would interpret the current documentation (docstring, schema, examples) to identify potential ambiguities or missing information crucial for correct invocation.
  2. Adaptive Test Generation: Creates diverse test cases based on the tool's input schema (parameter types, constraints, required fields), simulation results, and failures from previous refinement iterations. Aims for good coverage.
  3. Schema-Aware Testing: Validates generated test inputs against the tool's schema before execution. Executes valid tests against the actual tool implementation within the server environment.
  4. Ensemble Failure Analysis: If a test fails (e.g., wrong output, error thrown), multiple LLMs analyze the failure in the context of the specific documentation version used for that test run to pinpoint the documentation's weaknesses.
  5. Structured Improvement Proposals: Based on the analysis, the system generates specific, targeted improvements to the docstring, schema, and examples.
  6. Validated Schema Patching: Applies proposed JSON patches to the schema in-memory and validates the resulting schema structure before accepting the change for the next iteration.
  7. Iterative Refinement: Repeats the cycle (generate tests -> execute -> analyze failures -> propose improvements -> patch schema) until tests consistently pass or a maximum iteration count is reached.
  8. Optional Winnowing: After iterations, performs a final pass to condense and streamline the documentation while ensuring critical information discovered during testing is preserved.

Benefits

  • Reduces Manual Effort: Automates the often tedious process of writing and maintaining high-quality tool documentation for LLM consumption.
  • Improves Agent Performance: Creates clearer, more precise documentation, leading to fewer errors when LLMs try to use the tools.
  • Identifies Edge Cases: The testing process can uncover ambiguities and edge cases that human writers might miss.
  • Increases Consistency: Helps establish a more uniform style and level of detail across documentation for all tools.
  • Adapts to Feedback: Learns directly from simulated agent failures to target specific documentation weaknesses.
  • Schema Evolution: Allows for gradual, validated improvement of tool schemas based on usage simulation.
  • Detailed Reporting: Provides comprehensive logs and reports on the entire refinement process, including tests run, failures encountered, and changes made.

Limitations and Considerations

  • Cost & Time: Can be computationally expensive and time-consuming, as it involves multiple LLM calls (for simulation, test generation, failure analysis, improvement proposal) per tool per iteration.
  • Resource Intensive: May require significant CPU/memory, especially when refining many tools or using large LLMs for analysis.
  • LLM Dependency: The quality of the refinement heavily depends on the capabilities of the LLMs used for the analysis and generation steps.
  • Schema Complexity: Generating correct and meaningful JSON Patches for highly complex or nested schemas can be challenging for the LLM.
  • Determinism: The process involves LLMs, so results might not be perfectly deterministic between runs.
  • Maintenance Complexity: The refiner itself is a complex system with dependencies that require maintenance.

When to Use

This feature is particularly valuable when:
  • You have a large number of MCP tools exposed to LLM agents.
  • You observe frequent tool usage failures potentially caused by agent misinterpretation of documentation.
  • You are actively developing or expanding your tool ecosystem and need to ensure consistent, high-quality documentation.
  • You want to proactively improve agent reliability and performance without necessarily modifying the underlying tool code itself.
  • You have the budget (LLM credits) and time to invest in this automated quality improvement process.

Usage Example (Server-Side Invocation)

The documentation refiner is typically invoked as a server-side maintenance or administrative task, not directly exposed as an MCP tool for external agents to call.
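A server-side sketch only: the module path matches the one given above, but the function name, signature, and return value are assumptions.

```python
import asyncio
# Module path from this section; the imported name is hypothetical.
from ultimate.tools.docstring_refiner import refine_tool_documentation

async def main():
    report = await refine_tool_documentation(
        tool_names=["extract_json", "run_jq"],  # subset of registered tools
        max_iterations=3,
        winnow=True,                            # final condensing pass
    )
    print(report)

asyncio.run(main())
```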

Example Library and Testing Framework

The Ultimate MCP Server includes an extensive collection of 35+ end-to-end examples located in the examples/ directory. These serve a dual purpose:
  1. Living Documentation: They demonstrate practical, real-world usage patterns for nearly every tool and feature.
  2. Integration Test Suite: They form a comprehensive test suite ensuring all components work together correctly.

Example Structure and Organization

  • Categorized: Examples are grouped by functionality (e.g., model_integration, tool_specific, workflows, advanced_features).
  • Standalone: Each example (*.py) is a runnable Python script using mcp-client to interact with a running server instance.
  • Clear Output: They utilize the Rich library for formatted, color-coded console output, clearly showing requests, responses, costs, timings, and results.
  • Error Handling: Examples include basic error checking for robust demonstration.

Rich Visual Output

Expect informative console output, including:
  • Tables summarizing results and statistics.
  • Syntax highlighting for code and JSON.
  • Progress indicators or detailed step logging.
  • Panels organizing output sections.

Customizing and Learning

  • Adaptable: Easily modify examples to use your API keys (via .env), different models, custom prompts, or input files.
  • Command-Line Args: Many examples accept arguments for customization (e.g., --model, --input-file, --headless).
  • Educational: Learn best practices for AI application structure, tool selection, parameter tuning, error handling, cost optimization, and integration patterns.

Comprehensive Testing Framework

The run_all_demo_scripts_and_check_for_errors.py script orchestrates the execution of all examples as a test suite:
  • Automated Execution: Discovers and runs examples/*.py sequentially.
  • Validation: Checks exit codes and stderr against predefined patterns to distinguish real errors from expected messages (e.g., missing API key warnings).
  • Reporting: Generates a summary report of passed, failed, and skipped tests, along with detailed logs.
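The original configuration snippet is not reproduced; the sketch below illustrates the idea of predefined expected-stderr patterns, with names and patterns assumed:

```python
# Illustrative only: patterns the runner might treat as expected,
# non-fatal stderr output rather than real failures.
EXPECTED_STDERR_PATTERNS = [
    r"API key for provider .* not found",  # missing-key warnings are skips, not errors
    r"Provider .* not configured",
]
```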

Running the Example Suite
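
To execute the full suite (invocation assumed from the script name above):

```bash
python run_all_demo_scripts_and_check_for_errors.py
```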

This combined example library and testing framework provides invaluable resources for understanding, utilizing, and verifying the functionality of the Ultimate MCP Server.

CLI Commands

Ultimate MCP Server comes with a command-line interface (umcp) for server management and tool interaction:
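The main commands, all described in the CLI section above:

```bash
umcp run          # start the server
umcp providers    # list/check configured LLM providers
umcp test         # smoke-test a provider
umcp complete     # one-off text generation
umcp cache        # inspect or clear the request cache
umcp benchmark    # compare providers on speed and cost
umcp tools        # list registered tools
umcp examples     # list or run example scripts
```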
Each command typically has additional options. Use umcp COMMAND --help to see options for a specific command (e.g., umcp complete --help).

Advanced Configuration

Configuration is primarily managed through environment variables, often loaded from a .env file in the project root upon startup.

Server Configuration

  • SERVER_HOST: (Default: 127.0.0.1) Network interface to bind to. Use 0.0.0.0 to listen on all interfaces (necessary for Docker containers or external access).
  • SERVER_PORT: (Default: 8013) Port the server listens on.
  • API_PREFIX: (Default: /) URL prefix for all API endpoints (e.g., set to /mcp/v1 to serve under that path).
  • WORKERS: (Optional, e.g., 4) Number of worker processes for the web server (e.g., Uvicorn). Adjust based on CPU cores.

Tool Filtering (Startup Control)

Control which tools are registered when the server starts using CLI flags:
  • --include-tools tool1,tool2,...: Only register the specified tools.
  • --exclude-tools tool3,tool4,...: Register all tools except those specified. This is useful for creating lightweight instances, managing dependencies, or restricting agent capabilities.

Logging Configuration

  • LOG_LEVEL: (Default: INFO) Controls log verbosity (DEBUG, INFO, WARNING, ERROR, CRITICAL). DEBUG is very verbose.
  • USE_RICH_LOGGING: (Default: true) Enables colorful, structured console logs via the Rich library. Set to false for plain text logs (better for file redirection or some logging systems).
  • LOG_FORMAT: (Optional) Specify a Python logging format string for custom log formats (if USE_RICH_LOGGING=false).
  • LOG_TO_FILE: (Optional, e.g., /var/log/ultimate_mcp_server.log) Path to a file where logs should also be written (in addition to console). Ensure the server process has write permissions.

Cache Configuration

  • CACHE_ENABLED: (Default: true) Globally enable or disable response caching.
  • CACHE_TTL: (Default: 86400 seconds = 24 hours) Default Time-To-Live for cached items. Specific tools might have overrides.
  • CACHE_TYPE: (Default: memory) Backend storage. Check implementation for supported types (e.g., memory, redis, diskcache). diskcache provides persistence.
  • CACHE_DIR: (Default: ./.cache) Directory used if CACHE_TYPE=diskcache. Ensure write permissions.
  • CACHE_MAX_SIZE: (Optional, e.g., 1000 for items or 536870912 for 512MB for diskcache) Sets size limits for the cache.
  • REDIS_URL: (Required if CACHE_TYPE=redis) Connection URL for Redis server (e.g., redis://localhost:6379/0).

Provider Timeouts & Retries

  • PROVIDER_TIMEOUT: (Default: 120) Default timeout in seconds for waiting for a response from an LLM provider API.
  • PROVIDER_MAX_RETRIES: (Default: 3) Default number of times to retry a failed request to a provider (for retryable errors like rate limits or temporary server issues). Uses exponential backoff.
  • Specific provider overrides might exist via dedicated variables (e.g., OPENAI_TIMEOUT, ANTHROPIC_MAX_RETRIES). Check configuration loading logic or documentation.

Tool-Specific Configuration

Individual tools might load their own configuration from environment variables. Examples:
  • ALLOWED_DIRS: Comma-separated list of base directories filesystem tools are restricted to. Critical for security.
  • PLAYWRIGHT_BROWSER_TYPE: (Default: chromium) Browser used by Playwright tools (chromium, firefox, webkit).
  • PLAYWRIGHT_TIMEOUT: Default timeout for Playwright actions.
  • DATABASE_URL: Connection string for the SQL Database Interaction tools (uses SQLAlchemy).
  • MARQO_URL: URL for the Marqo instance used by the fused search tool.
  • TESSERACT_CMD: Path to the Tesseract executable if not in standard system PATH (for OCR).
Always ensure environment variables are set correctly **before** starting the server. Changes typically require a server restart to take effect.

Deployment Considerations

While umcp run or docker compose up are fine for development, consider these for more robust deployments:

1. Running as a Background Service

Ensure the server runs continuously and restarts automatically.
  • `systemd` (Linux): Create a service unit file (.service) to manage the process with systemctl start|stop|restart|status. Provides robust control and logging integration.
  • `supervisor`: A process control system written in Python. Configure supervisord to monitor and manage the server process.
  • Docker Restart Policies: Use --restart unless-stopped or --restart always in your docker run command or in docker-compose.yml to have Docker manage restarts.

2. Using a Reverse Proxy (Nginx, Caddy, Apache, Traefik)

Placing a reverse proxy in front of the Ultimate MCP Server is highly recommended:
  • **HTTPS/SSL Termination:** Handles SSL certificates (e.g., via Let's Encrypt with Caddy/Certbot) encrypting external traffic.
  • **Load Balancing:** Distribute traffic if running multiple instances of the server for high availability or scaling.
  • **Path Routing:** Map a clean external URL (e.g., `https://api.yourdomain.com/mcp/`) to the internal server (`http://localhost:8013`). Configure `API_PREFIX` if needed.
  • **Security Headers:** Add important headers like `Strict-Transport-Security` (HSTS), `Content-Security-Policy` (CSP).
  • **Access Control:** Implement IP allow-listing, basic authentication, or integrate with OAuth2 proxies.
  • **Buffering/Caching:** May offer additional request/response buffering or caching layers.
  • **Timeouts:** Manage connection timeouts independently from the application server.
Example Nginx `location` block (a simplified sketch, assuming the server listens locally on port 8013):
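
```nginx
location /mcp/ {
    proxy_pass http://127.0.0.1:8013/;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_buffering off;        # keep SSE token streams flowing
    proxy_read_timeout 300s;    # allow long-running tool calls
}
```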

3. Container Orchestration (Kubernetes, Docker Swarm)

For scalable, managed deployments:
  • **Health Checks:** Implement and configure liveness and readiness probes using the server's `/healthz` endpoint (or similar) in your deployment manifests.
  • **Configuration:** Use ConfigMaps and Secrets (Kubernetes) or Docker Secrets/Configs to manage environment variables and API keys securely, rather than baking them into images or relying solely on `.env` files.
  • **Resource Limits:** Define appropriate CPU and memory requests/limits for the container(s) to ensure stable performance and avoid resource starvation on the node.
  • **Service Discovery:** Utilize the orchestrator's built-in service discovery instead of hardcoding IPs or hostnames. Expose the service internally (e.g., ClusterIP) and use an Ingress controller for external access.
  • **Persistent Storage:** If using features requiring persistence (e.g., `diskcache`, persistent memory, file storage), configure persistent volumes (PVs/PVCs).

4. Resource Allocation

  • RAM: Ensure sufficient memory, especially if using large models, in-memory caching, processing large documents, or running memory-intensive tools (like browser automation or certain data processing tasks). Monitor usage.
  • CPU: Monitor CPU load. LLM inference itself might not be CPU-bound (often GPU/TPU), but other tools (OCR, local processing, web server handling requests) can be. Consider the number of workers (WORKERS env var).
  • Disk I/O: Can be a bottleneck if using persistent caching (diskcache) or extensive filesystem operations. Use fast storage (SSDs) if needed.
  • Network: Ensure adequate bandwidth, especially if handling large documents, images, or frequent/large API responses.

Cost Savings With Delegation

Using Ultimate MCP Server for intelligent delegation can yield significant cost savings compared to using only a high-end model like Claude 3.7 Sonnet or GPT-4o for every task.
(Illustrative cost comparisons assume typical token counts and approximate 2024 pricing. Actual costs depend heavily on document size, complexity, specific models used, and current provider pricing.)
How savings are achieved:
  • Matching Model to Task: Using expensive models only for tasks requiring deep reasoning, creativity, or complex instruction following.
  • Leveraging Cheaper Models: Delegating summarization, extraction, simple Q&A, formatting, etc., to significantly cheaper models (like Gemini Flash, Claude Haiku, GPT-4.1 Mini, DeepSeek Chat).
  • Using Specialized Tools: Employing non-LLM tools (Filesystem, OCR, Browser, CLI utils, Database) where appropriate, avoiding LLM API calls entirely for those operations.
  • Caching: Reducing redundant API calls for identical or semantically similar requests.
Ultimate MCP Server acts as the intelligent routing layer to make these cost optimizations feasible within a sophisticated agent architecture.

Why AI-to-AI Delegation Matters

The strategic importance of AI-to-AI delegation, facilitated by systems like the Ultimate MCP Server, extends beyond simple cost savings:

Democratizing Advanced AI Capabilities

  • Makes the power of cutting-edge reasoning models (like Claude 3.7, GPT-4o) practically accessible for a wider range of applications by offloading routine work.
  • Allows organizations with budget constraints to leverage top-tier AI capabilities for critical reasoning steps, while managing overall costs effectively.
  • Enables more efficient and widespread use of AI resources across the industry.

Economic Resource Optimization

  • Represents a fundamental economic optimization in AI usage: applying the most expensive resource (top-tier LLM inference) only where its unique value is required.
  • Complex reasoning, creativity, nuanced understanding, and orchestration are reserved for high-capability models.
  • Routine data processing, extraction, formatting, and simpler Q&A are handled by cost-effective models.
  • Specialized, non-LLM tasks (web scraping, file I/O, DB queries) are handled by purpose-built tools, avoiding unnecessary LLM calls.
  • The overall system aims for near-top-tier performance and capability at a significantly reduced blended cost.
  • Transforms potentially unpredictable LLM API costs into a more controlled expenditure through intelligent routing and caching.

Sustainable AI Architecture

  • Promotes more sustainable AI usage by reducing the computational demand associated with using the largest models for every single task.
  • Creates a tiered, capability-matched approach to AI resource allocation.
  • Allows for more extensive experimentation and development, as many iterations can utilize cheaper models or tools.
  • Provides a scalable approach to integrating AI that can grow with business needs without costs spiraling uncontrollably.

Technical Evolution Path

  • Represents an important evolution in AI application architecture, moving beyond monolithic calls to single models towards distributed, multi-agent, multi-model workflows.
  • Enables sophisticated, AI-driven orchestration of complex processing pipelines involving diverse tools and models.
  • Creates a foundation for AI systems that can potentially reason about their own resource usage and optimize dynamically.
  • Builds towards more autonomous, self-optimizing AI systems capable of making intelligent delegation decisions based on context, cost, and required quality.

The Future of AI Efficiency

  • Ultimate MCP Server points toward a future where AI systems actively manage and optimize their own operational costs and resource usage.
  • Higher-capability models act as intelligent orchestrators or "managers" for ecosystems of specialized tools and more cost-effective "worker" models.
  • AI workflows become increasingly sophisticated, potentially self-organizing and resilient.
  • Organizations can leverage the full spectrum of AI capabilities from basic processing to advanced reasoning in a financially viable and scalable manner.
This vision of efficient, intelligently delegated, self-optimizing AI systems represents the next frontier in practical AI deployment, moving beyond the current paradigm of often using a single, powerful (and expensive) model for almost everything.

Architecture

How MCP Integration Works

The Ultimate MCP Server is built natively on the Model Context Protocol (MCP):
  1. MCP Server Core: Implements a web server (e.g., using FastAPI) that listens for incoming HTTP requests conforming to the MCP specification (typically POST requests to a specific endpoint).
  2. Tool Registration: During startup, the server discovers and registers all available tool implementations. Each tool provides metadata including its name, description, and input/output schemas (often Pydantic models converted to JSON Schema). This registry allows the server (and potentially agents) to know what tools are available and how to use them.
  3. Tool Invocation: When an MCP client (like Claude or another application) sends a valid MCP request specifying a tool name and parameters, the server core routes the request to the appropriate registered tool's execution logic.
  4. Context Passing & Execution: The tool receives the validated input parameters. It performs its action (calling an LLM, interacting with Playwright, querying a DB, manipulating a file, etc.).
  5. Structured Response: The tool's execution result (or error) is packaged into a standard MCP response format, typically including status (success/failure), output data (conforming to the tool's output schema), cost information, and potentially other metadata.
  6. Return to Client: The MCP server core sends the structured MCP response back to the originating client over HTTP.
This adherence to the MCP standard ensures seamless, predictable integration with any MCP-compatible agent or client application.

Component Diagram

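The original diagram is not reproduced; the simplified sketch below is reconstructed from the components described in this section:

```
 MCP Client / Agent (e.g., Claude)
          |  MCP request over HTTP
          v
 +------------------------------------------------+
 |        Ultimate MCP Server (FastAPI)           |
 |  MCP Core: parse -> validate -> dispatch       |
 |  +----------+  +-----------+  +-------------+  |
 |  | Caching  |  | Analytics |  | Prompt Mgmt |  |
 |  +----------+  +-----------+  +-------------+  |
 |                 Tool Registry                  |
 +------------------------------------------------+
     |           |              |            |
     v           v              v            v
   LLM        Browser       Filesystem   SQL / Vector
 providers  (Playwright)   & CLI tools      stores
```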

Request Flow for Delegation (Detailed)

  1. Agent Decision: An MCP agent determines a need for a specific capability (e.g., summarize a large text, extract JSON, browse a URL) potentially suited for delegation.
  2. MCP Request Formulation: The agent constructs an MCP tool invocation request, specifying the tool_name and required inputs according to the tool's schema (which it might have discovered via list_tools).
  3. HTTP POST to Server: The agent sends this request (typically as JSON in the body) via HTTP POST to the Ultimate MCP Server's designated endpoint.
  4. Request Reception & Parsing: The server's web framework (FastAPI) receives the request. The MCP Core parses the JSON body, validating it against the general MCP request structure.
  5. Tool Dispatch: The MCP Core looks up the requested tool_name in its registry of registered tools.
  6. Input Validation: The server uses the specific tool's input schema (Pydantic model) to validate the inputs provided in the request. If validation fails, an MCP error response is generated immediately.
  7. Tool Execution Context: A context object might be created, potentially containing configuration, access to shared services (like logging, caching, analytics), etc.
  8. Caching Check: The Caching Service is consulted. It generates a cache key based on the tool_name and validated inputs. If a valid, non-expired cache entry exists for this key, the cached response is retrieved and returned (skipping to step 14).
  9. Tool Logic Execution: If not cached, the tool's main execution logic runs (e.g., calling an LLM provider, automating a browser, querying a database, or manipulating files).
  10. Cost Calculation: For LLM tasks, the Analytics Service calculates the estimated cost based on input/output tokens and provider pricing. For other tasks, the cost is typically zero unless they consume specific metered resources.
  11. Result Formatting: The tool formats its result (data or error message) according to its defined output schema.
  12. Analytics Recording: The Analytics Service logs the request, response (or error), execution time, cost, provider/model used, cache status (hit/miss), etc.
  13. Caching Update: If the operation was successful and caching is enabled for this tool/request, the Caching Service stores the formatted response with its calculated TTL.
  14. MCP Response Formulation: The MCP Core packages the final result (either from cache or from execution) into a standard MCP response structure, including status, outputs, error (if any), and potentially cost, usage_metadata.
  15. HTTP Response to Agent: The server sends the MCP response back to the agent as the HTTP response (typically with a 200 OK status; even if the tool operation failed, the MCP request itself succeeded). The agent then parses this response to determine the outcome of the tool call.
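
For illustration, a successful response following steps 14-15 might carry fields like the following; the exact shape beyond the fields named above is an assumption:

```python
# Sketch of a structured MCP response as described above (shape illustrative).
response = {
    "status": "success",            # or "failure" with an accompanying "error" payload
    "outputs": {"summary": "..."},  # conforms to the tool's output schema
    "cost": 0.0021,                 # estimated USD for LLM-backed tools
    "usage_metadata": {"input_tokens": 1500, "output_tokens": 200},
}
```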

Real-World Use Cases

Advanced AI Agent Capabilities

Empower agents like Claude or custom-built autonomous agents to perform complex, multi-modal tasks by giving them tools for:
  • Persistent Memory & Learning: Maintain context across long conversations or tasks using the Cognitive Memory system.
  • Web Interaction & Research: Automate browsing, data extraction from websites, form submissions, and synthesize information from multiple online sources.
  • Data Analysis & Reporting: Create, manipulate, and analyze data within Excel spreadsheets; generate charts and reports.
  • Database Operations: Access and query enterprise databases to retrieve or update information based on agent goals.
  • Document Understanding: Process PDFs, images (OCR), extract key information, summarize long reports, answer questions based on documents (RAG).
  • Knowledge Graph Management: Build and query internal knowledge graphs about specific domains, projects, or entities.
  • Multimedia Processing: Transcribe audio recordings from meetings or voice notes.
  • Code Execution & Analysis: Use CLI tools or specialized code tools (if added) for development or data tasks.
  • External Service Integration: Interact with other company APIs or public APIs dynamically registered via OpenAPI.

Enterprise Workflow Automation

Build sophisticated automated processes that leverage AI reasoning and specialized tools:
  • Intelligent Document Processing Pipeline: Ingest scans/PDFs -> OCR -> Extract structured data (JSON) -> Validate data -> Classify document type -> Route to appropriate system or summarize for human review.
  • Automated Research Assistant: Given a topic -> Search academic databases (via Browser/API tool) -> Download relevant papers (Browser/Filesystem) -> Chunk & Summarize papers (Document tools) -> Extract key findings (Extraction tools) -> Store in Cognitive Memory -> Generate synthesized report.
  • Financial Reporting Automation: Connect to database (SQL tool) -> Extract financial data -> Populate Excel template (Excel tool) -> Generate charts & variance analysis -> Email report (if an email tool is added).
  • Customer Support Ticket Enrichment: Receive ticket text -> Classify issue type (Classification tool) -> Search internal knowledge base & documentation (RAG tool) -> Draft suggested response -> Augment with customer details from CRM (via DB or API tool).
  • Competitor Monitoring: Schedule browser automation task -> Visit competitor websites/news feeds -> Extract key announcements/pricing changes -> Summarize findings -> Alert relevant team.

Data Processing and Integration

Handle complex data tasks beyond simple ETL:
  • Unstructured to Structured: Extract specific information (JSON, tables) from emails, reports, chat logs, product reviews.
  • Knowledge Graph Creation: Process a corpus of documents (e.g., company wiki, research papers) to build an entity relationship graph for querying insights.
  • Data Transformation & Cleansing: Use SQL tools, Excel automation, or local text processing (awk, sed) for complex data manipulation guided by LLM instructions.
  • Automated Data Categorization: Apply text classification tools to large datasets (e.g., categorizing user feedback, tagging news articles).
  • Semantic Data Search: Build searchable vector indexes over internal documents, enabling users or agents to find information based on meaning, not just keywords (RAG).

Research and Analysis (Scientific, Market, etc.)

Support research teams with AI-powered tools:
  • Automated Literature Search & Review: Use browser/API tools to search databases (PubMed, ArXiv, etc.), download papers, chunk, summarize, and extract key methodologies or results.
  • Comparative Analysis: Use multi-provider completion or tournament tools to compare how different models interpret or generate hypotheses based on research data.
  • Data Extraction from Studies: Automatically pull structured data (participant numbers, p-values, outcomes) from published papers or reports into a database or spreadsheet.
  • Budget Tracking: Utilize the analytics features to monitor LLM API costs associated with research tasks.
  • Persistent Research Log: Use the Cognitive Memory system to store findings, hypotheses, observations, and reasoning steps throughout a research project.

Document Intelligence

Create comprehensive systems for understanding document collections:
  • End-to-End Pipeline: OCR scanned documents -> Enhance text with LLMs -> Extract predefined fields (Extraction tools) -> Classify document types -> Identify key entities/relationships -> Generate summaries -> Index text and metadata into a searchable system (Vector/SQL DB).

Financial Analysis and Modeling

Equip financial professionals with advanced tools:
  • AI-Assisted Model Building: Use natural language to instruct the Excel automation tool to create complex financial models, projections, or valuation analyses.
  • Data Integration: Pull market data via browser automation or APIs, combine it with internal data from databases (SQL tools).
  • Report Analysis: Use RAG or summarization tools to quickly understand long financial reports or filings.
  • Scenario Testing: Programmatically modify inputs in Excel models to run sensitivity analyses.
  • Decision Tracking: Use Cognitive Memory to log the reasoning behind investment decisions or analyses.

Security Considerations

When deploying and operating the Ultimate MCP Server, security must be a primary concern. Consider the following aspects:
  1. **API Key Management:** * **Never hardcode API keys** in source code or commit them to version control. * Use **environment variables** (`.env` file for local dev, system environment variables, or preferably secrets management tools like HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager for production). * Ensure the `.env` file (if used locally) has **strict file permissions** (e.g., `chmod 600 .env`) readable only by the user running the server. * Use **separate keys** for development and production environments. * Implement **key rotation** policies and revoke suspected compromised keys immediately.
  1. **Network Exposure & Access Control:** * **Bind to `127.0.0.1` (`SERVER_HOST`)** by default to only allow local connections. Only change to `0.0.0.0` if you intend to expose it, and *only* behind appropriate network controls. * **Use a Reverse Proxy:** (Nginx, Caddy, Traefik, etc.) placed in front of the server is **highly recommended**. It handles SSL/TLS termination, can enforce access controls (IP allow-listing, client certificate auth, Basic Auth, OAuth2 proxy integration), and provides a layer of separation. * **Firewall Rules:** Configure host-based or network firewalls to restrict access to the `SERVER_PORT` only from trusted sources (e.g., the reverse proxy's IP, specific application server IPs, VPN ranges).
  1. **Authentication & Authorization:** * The Ultimate MCP Server itself might not have built-in user/agent authentication. Authentication should typically be handled at a layer *before* the server (e.g., by the reverse proxy or an API gateway). * Ensure that only **authorized clients** (trusted AI agents, specific backend services) can send requests to the server endpoint. Consider using mutual TLS (mTLS) or API keys/tokens managed by the proxy/gateway if needed. * If tools provide different levels of access (e.g., read-only vs. read-write filesystem), consider if authorization logic is needed *within* the server or managed externally.
  1. **Rate Limiting & Abuse Prevention:** * Implement **rate limiting** at the reverse proxy or API gateway level based on source IP, API key, or other identifiers. This prevents denial-of-service (DoS) attacks and helps control costs from excessive API usage (both LLM and potentially tool usage). * Monitor usage patterns for signs of abuse.
  1. **Input Validation & Sanitization:** * While MCP provides a structured format, pay close attention to tools that interact with external systems based on user/agent input: * **Filesystem Tools:** **Crucially**, configure `ALLOWED_DIRS` strictly. Validate and normalize all path inputs rigorously to prevent directory traversal (`../`). Ensure the server process runs with least privilege. * **SQL Tools:** Use parameterized queries or ORMs (like SQLAlchemy) correctly to prevent SQL injection vulnerabilities. Avoid constructing SQL strings directly from agent input. * **Browser Tools:** Be cautious with tools that execute arbitrary JavaScript (`browser_evaluate_script`). Avoid running scripts based directly on untrusted agent input if possible. Playwright's sandboxing helps but isn't foolproof. * **CLI Tools:** Sanitize arguments passed to tools like `run_ripgrep`, `run_jq`, etc., to prevent command injection, especially if constructing complex command strings. Use safe methods for passing input data (e.g., stdin). * Validate input data types and constraints using Pydantic schemas for all tool inputs.
  6. **Dependency Security:**
     • Regularly **update dependencies** (`uv pip install --upgrade ...` or `uv sync`) to patch known vulnerabilities in third-party libraries (FastAPI, Pydantic, Playwright, database drivers, etc.).
     • Use security scanning tools (`pip-audit`, GitHub Dependabot, Snyk) to automatically identify vulnerable dependencies in your `pyproject.toml` or `requirements.txt`.
  7. **Logging Security:**
     • `DEBUG`-level logging may record sensitive information, including full prompts, API responses, file contents, or keys present in data. Configure `LOG_LEVEL` appropriately for production (`INFO` or `WARNING` is usually safer).
     • Ensure log files (if `LOG_TO_FILE` is used) have appropriate permissions, and adopt log rotation and retention policies. Avoid logging raw API keys (a log-redaction sketch follows this list).
  8. **Tool-Specific Security:**
     • Review the security implications of each enabled tool: Does it allow writing files? Executing code? Accessing databases? Ensure configurations (like `ALLOWED_DIRS`, or database credentials with limited permissions) follow the principle of least privilege.
     • Disable tools that are not needed or cannot be secured adequately for your environment.
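
As a hedged illustration of the environment-variable approach from item 1 (not the server's actual startup code; the variable name and the use of `python-dotenv` are assumptions), keys can be loaded at startup so they never appear in source control:

```python
import os

from dotenv import load_dotenv  # python-dotenv, assumed here for local development

load_dotenv()  # reads a .env file from the working directory, if present

# Fail fast when a required key is missing, instead of embedding it in code.
openai_key = os.environ.get("OPENAI_API_KEY")
if not openai_key:
    raise RuntimeError("OPENAI_API_KEY is not set; provide it via the environment or a secrets manager")
```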
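
If no proxy or gateway is available for item 3, a thin shared-token check can sit in front of the ASGI app. This is a sketch assuming a FastAPI-based deployment and a hypothetical `MCP_GATEWAY_TOKEN` variable, not a built-in server feature; proxy-level authentication remains the preferred option:

```python
import os

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
EXPECTED_TOKEN = os.environ.get("MCP_GATEWAY_TOKEN", "")  # hypothetical variable name

@app.middleware("http")
async def require_bearer_token(request: Request, call_next):
    # Reject any request that does not carry the expected bearer token.
    auth_header = request.headers.get("authorization", "")
    if not EXPECTED_TOKEN or auth_header != f"Bearer {EXPECTED_TOKEN}":
        return JSONResponse({"error": "unauthorized"}, status_code=401)
    return await call_next(request)
```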
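
The sketch below illustrates the three defensive patterns from item 5: path confinement for filesystem tools, parameterized SQL, and list-form subprocess arguments for CLI tools. The allowed directory, table, and command are placeholders, not the server's actual implementation:

```python
import subprocess
from pathlib import Path

from sqlalchemy import create_engine, text

ALLOWED_DIRS = [Path("/srv/mcp/data").resolve()]  # placeholder allowed root

def validate_path(user_path: str) -> Path:
    """Resolve a path and ensure it stays inside an allowed directory (blocks ../ traversal)."""
    resolved = Path(user_path).resolve()
    if not any(resolved.is_relative_to(base) for base in ALLOWED_DIRS):
        raise PermissionError(f"path outside allowed directories: {resolved}")
    return resolved

# Parameterized SQL: the driver binds :name, so agent input cannot inject SQL.
engine = create_engine("sqlite:///example.db")
with engine.connect() as conn:
    rows = conn.execute(
        text("SELECT * FROM documents WHERE name = :name"),
        {"name": "agent-supplied value"},
    ).fetchall()

# CLI tools: pass arguments as a list and avoid shell=True, so shell
# metacharacters in agent input are treated as literal strings.
completed = subprocess.run(
    ["rg", "--json", "search pattern", str(validate_path("/srv/mcp/data/docs"))],
    capture_output=True, text=True, check=False,
)
```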
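
For item 7, one way to keep obvious secret shapes out of log output is a logging filter that scrubs records before they are written; the pattern below is illustrative, not exhaustive:

```python
import logging
import re

class RedactSecrets(logging.Filter):
    """Scrub obvious API-key shapes from log messages before they are emitted."""
    KEY_PATTERN = re.compile(r"(sk-[A-Za-z0-9]{8,}|api[_-]?key\s*[=:]\s*\S+)", re.IGNORECASE)

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = self.KEY_PATTERN.sub("[REDACTED]", str(record.msg))
        return True

logging.basicConfig(level=logging.INFO)
logging.getLogger().addFilter(RedactSecrets())
logging.info("loaded provider with api_key=sk-abcdef1234567890")  # key is written as [REDACTED]
```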

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

This project builds upon the work of many fantastic open-source projects and services. Special thanks to:
  • FastAPI team for the high-performance web framework.
  • Pydantic developers for robust data validation and settings management.
  • Rich library for beautiful and informative terminal output.
  • uv from Astral for blazing-fast Python package installation and resolution.
  • Playwright team at Microsoft for the powerful browser automation framework.
  • OpenPyXL maintainers for Excel file manipulation.
  • Developers of integrated tools like Tesseract, ripgrep, jq, awk, sed.
  • All the LLM providers (OpenAI, Anthropic, Google, DeepSeek, xAI, etc.) for making their powerful models accessible via APIs.
  • The broader Python and open-source communities.

This README provides a comprehensive overview. For specific tool parameters, advanced configuration options, and detailed implementation notes, please refer to the source code and individual tool documentation within the project.

When deploying and operating the Ultimate MCP Server, security must be a primary concern. Consider the following aspects:
  1. **API Key Management:**
     • **Never hardcode API keys** in source code or commit them to version control.
     • Use **environment variables** (a `.env` file for local development; system environment variables or, preferably, secrets-management tools such as HashiCorp Vault, AWS Secrets Manager, or GCP Secret Manager for production).
     • If a local `.env` file is used, give it **strict file permissions** (e.g., `chmod 600 .env`) so it is readable only by the user running the server.
     • Use **separate keys** for development and production environments.
     • Implement **key rotation** policies and revoke suspected compromised keys immediately.
  2. **Network Exposure & Access Control:**
     • **Bind to `127.0.0.1`** (`SERVER_HOST`) by default so only local connections are accepted. Change to `0.0.0.0` only if you intend to expose the server, and *only* behind appropriate network controls.
     • **Use a reverse proxy** (Nginx, Caddy, Traefik, etc.) in front of the server; this is **highly recommended**. It handles SSL/TLS termination, can enforce access controls (IP allow-listing, client-certificate auth, Basic Auth, OAuth2 proxy integration), and provides a layer of separation.
     • **Firewall rules:** Configure host-based or network firewalls to restrict access to the `SERVER_PORT` to trusted sources only (e.g., the reverse proxy's IP, specific application-server IPs, VPN ranges).
  3. **Authentication & Authorization:**
     • The Ultimate MCP Server itself might not have built-in user/agent authentication; authentication should typically be handled at a layer *before* the server (e.g., by the reverse proxy or an API gateway).
     • Ensure that only **authorized clients** (trusted AI agents, specific backend services) can send requests to the server endpoint. Consider mutual TLS (mTLS) or API keys/tokens managed by the proxy/gateway if needed.
     • If tools provide different levels of access (e.g., read-only vs. read-write filesystem), decide whether authorization logic is needed *within* the server or managed externally.
  4. **Rate Limiting & Abuse Prevention:**
     • Implement **rate limiting** at the reverse proxy or API gateway level based on source IP, API key, or other identifiers. This prevents denial-of-service (DoS) attacks and helps control costs from excessive API usage (both LLM calls and tool usage).
     • Monitor usage patterns for signs of abuse.
  5. **Input Validation & Sanitization:** MCP provides a structured format, but pay close attention to tools that interact with external systems based on user/agent input (see the sketches after this list):
     • **Filesystem tools:** **Crucially**, configure `ALLOWED_DIRS` strictly. Validate and normalize all path inputs rigorously to prevent directory traversal (`../`), and run the server process with least privilege.
     • **SQL tools:** Use parameterized queries or an ORM (such as SQLAlchemy) correctly to prevent SQL injection vulnerabilities. Never construct SQL strings directly from agent input.
     • **Browser tools:** Be cautious with tools that execute arbitrary JavaScript (`browser_evaluate_script`). Avoid running scripts based directly on untrusted agent input where possible; Playwright's sandboxing helps but is not foolproof.
     • **CLI tools:** Sanitize arguments passed to tools like `run_ripgrep`, `run_jq`, etc., to prevent command injection, especially when constructing complex command strings. Prefer safe methods for passing input data (e.g., stdin).
     • Validate input data types and constraints with Pydantic schemas for all tool inputs.
  6. **Dependency Security:**
     • Regularly **update dependencies** (`uv pip install --upgrade ...` or `uv sync`) to patch known vulnerabilities in third-party libraries (FastAPI, Pydantic, Playwright, database drivers, etc.).
     • Use security-scanning tools (`pip-audit`, GitHub Dependabot, Snyk) to automatically flag vulnerable dependencies in your `pyproject.toml` or `requirements.txt`.
  7. **Logging Security:**
     • Be aware that `DEBUG`-level logging may capture sensitive information, including full prompts, API responses, file contents, or keys present in data. Set `LOG_LEVEL` appropriately for production (`INFO` or `WARNING` is usually safer).
     • If `LOG_TO_FILE` is used, give log files appropriate permissions and establish rotation and retention policies. Never log raw API keys.
  8. **Tool-Specific Security:**
     • Review the security implications of each tool you enable: Does it write files? Execute code? Access databases? Ensure its configuration (e.g., `ALLOWED_DIRS`, database credentials with limited permissions) follows the principle of least privilege.
     • Disable tools that are not needed or cannot be secured adequately for your environment.
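
Two sketches for the validation points above. First, a directory-traversal guard for filesystem tools: resolving the path collapses `../` segments before the allow-list check. `ALLOWED_DIRS` here mirrors the server setting described above, but the paths and function name are illustrative.

```python
# Directory-traversal guard sketch; names are illustrative.
from pathlib import Path

ALLOWED_DIRS = [Path("/srv/mcp/data").resolve()]

def safe_resolve(user_path: str) -> Path:
    candidate = Path(user_path).resolve()  # collapses ../ segments
    for root in ALLOWED_DIRS:
        if candidate == root or root in candidate.parents:
            return candidate
    raise PermissionError(f"{user_path!r} escapes the allowed directories")
```

Second, a parameterized query with SQLAlchemy. The driver escapes the bound value, so hostile agent input cannot change the statement's structure; the table and data are made up for the demo.

```python
# Parameterized SQL sketch: injection attempts stay inert as plain data.
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # in-memory DB for the demo
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE papers (id INTEGER, title TEXT, author TEXT)"))
    conn.execute(text("INSERT INTO papers VALUES (1, 'On Gateways', 'Smith')"))

untrusted = "Smith'; DROP TABLE papers; --"  # hostile agent input
with engine.connect() as conn:
    rows = conn.execute(
        text("SELECT id, title FROM papers WHERE author = :name"),
        {"name": untrusted},
    ).fetchall()
print(rows)  # [] -- the injection attempt simply matched no author
```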

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

This project builds upon the work of many fantastic open-source projects and services. Special thanks to:
  • FastAPI team for the high-performance web framework.
  • Pydantic developers for robust data validation and settings management.
  • Rich library for beautiful and informative terminal output.
  • uv from Astral for blazing-fast Python package installation and resolution.
  • Playwright team at Microsoft for the powerful browser automation framework.
  • OpenPyXL maintainers for Excel file manipulation.
  • Developers of integrated tools like Tesseract, ripgrep, jq, awk, sed.
  • All the LLM providers (OpenAI, Anthropic, Google, DeepSeek, xAI, etc.) for making their powerful models accessible via APIs.
  • The broader Python and open-source communities.

This README provides a comprehensive overview. For specific tool parameters, advanced configuration options, and detailed implementation notes, please refer to the source code and individual tool documentation within the project.

Running the Server

Start the server using the CLI:
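
The exact command depends on how the package is installed; the entry-point name, port, and environment-based configuration below are assumptions, so check `pyproject.toml` for the project's actual console script before copying this.

```bash
# Hypothetical invocation -- substitute the real entry point from pyproject.toml.
# Host/port are read from the environment (SERVER_HOST / SERVER_PORT),
# per the security notes above; the port number here is illustrative.
SERVER_HOST=127.0.0.1 SERVER_PORT=8000 ultimate-mcp-server
```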