You can then have your LLM make HTTP requests to this endpoint:
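For example, a minimal Python client might look like this (the JSON payload shape is an assumption; check `GET /api` for the documented format):

```python
import requests

# Docker container API runs on port 9097; the standalone API uses 8008.
API_URL = "http://localhost:9097/api/query"

# Assumed payload shape: a JSON object with a "query" field.
response = requests.post(API_URL, json={"query": "SHOW SCHEMAS FROM memory"})
response.raise_for_status()
print(response.json())
```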
This approach allows LLMs to focus on generating SQL, while our tools handle all the MCP protocol complexity!
Demo and Validation Scripts
We've created some badass demo scripts that show how AI models can use the MCP protocol to run complex queries against Trino:
1. Bullshit Data Generation and Loading
The `tools/create_bullshit_data.py` script generates a dataset of 10,000 employees with ridiculous job titles, inflated salaries, and a "bullshit factor" rating (1-10):
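A typical invocation, assuming neither script requires arguments, might be:

```bash
# Generate the fake employee dataset, then load it into Trino's memory catalog
python tools/create_bullshit_data.py
python load_bullshit_data.py
```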
2. Running Complex Queries through MCP
The `test_bullshit_query.py` script demonstrates end-to-end MCP interaction:
Connects to the MCP server using STDIO transport
Initializes the protocol following the MCP spec
Runs a complex SQL query with WHERE, GROUP BY, HAVING, ORDER BY
Processes and formats the results
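A minimal sketch of that flow is shown below. The server launch command and the tool name (`execute_query`) are assumptions; the JSON-RPC sequence follows the MCP spec (newline-delimited JSON over stdio):

```python
import json
import subprocess

# Launch the MCP server as a child process speaking JSON-RPC over stdio.
# The exact command is an assumption -- adjust to how your server starts.
proc = subprocess.Popen(
    ["docker", "exec", "-i", "trino_mcp_trino-mcp_1",
     "python", "-m", "trino_mcp.server", "--transport", "stdio"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def send(message: dict) -> None:
    proc.stdin.write(json.dumps(message) + "\n")
    proc.stdin.flush()

def recv() -> dict:
    return json.loads(proc.stdout.readline())

# 1. Initialize the protocol per the MCP spec.
send({"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "demo-client", "version": "0.1.0"},
}})
print(recv())  # server capabilities
send({"jsonrpc": "2.0", "method": "notifications/initialized"})

# 2. Call the query tool ("execute_query" and the "sql" argument are assumed names).
send({"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {
    "name": "execute_query",
    "arguments": {"sql": "SELECT 1"},
}})
print(recv())  # query results
```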
Example output showing top BS jobs with high salaries:
3. API Testing
The `test_llm_api.py` script validates the API functionality:
This performs a comprehensive check of the following (a minimal sketch appears after the list):
API endpoint discovery
Documentation availability
Valid query execution
Error handling for invalid queries
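A stripped-down version of those checks might look like this (endpoint paths come from the API section below; the payload shape and status codes are assumptions):

```python
import requests

BASE = "http://localhost:9097"

# 1. Endpoint discovery / documentation availability
docs = requests.get(f"{BASE}/api")
assert docs.status_code == 200, "API docs endpoint should be reachable"

# 2. Valid query execution
ok = requests.post(f"{BASE}/api/query", json={"query": "SELECT 1"})
assert ok.status_code == 200, "valid SQL should succeed"

# 3. Error handling for invalid queries (the API may instead return 200
# with an error field in the body; adjust accordingly)
bad = requests.post(f"{BASE}/api/query", json={"query": "SELECT FROM nowhere"})
assert bad.status_code >= 400, "invalid SQL should return an error status"

print("All API checks passed")
```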
Usage
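Bring up the stack with Docker Compose (assuming the repo's `docker-compose.yml` defines the Trino, MCP, and API services):

```bash
docker-compose up -d
```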
The server will be available at:
Trino: http://localhost:9095
MCP server: http://localhost:9096
API server: http://localhost:9097
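You can verify the services are up with a couple of quick checks:

```bash
# Trino coordinator info
curl -s http://localhost:9095/v1/info | jq

# API documentation endpoint
curl -s http://localhost:9097/api
```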
Client Connection
**IMPORTANT**: The client scripts run on your local machine (OUTSIDE Docker) and connect TO the Docker containers. The scripts automatically handle this by using docker exec commands. You don't need to be inside the container to use MCP!
Running tests from your local machine:
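```bash
# Run the main STDIO transport test (recommended)
python test_mcp_stdio.py

# Or run the full automated test suite
./run_tests.sh
```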
Transport Options
This server supports two transport methods, but only STDIO is currently reliable:
STDIO Transport (Recommended and Working)
STDIO transport works reliably and is currently the only recommended method for testing and development:
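The test scripts spawn the server inside the container and talk to it over stdin/stdout; a direct invocation might look like this (the module path and `--transport` flag are assumptions about the server's CLI):

```bash
docker exec -i trino_mcp_trino-mcp_1 python -m trino_mcp.server --transport stdio
```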
SSE Transport (NOT RECOMMENDED - Has Critical Issues)
SSE is the default transport in MCP, but it has serious issues in the current MCP 1.3.0 version that crash the server when clients disconnect. **Not recommended for use until these issues are resolved**:
Known Issues and Fixes
Fixed: Docker Container API Initialization
**FIXED**: We've resolved an issue where the API in the Docker container returned 503 Service Unavailable responses. The problem was with the `app_lifespan` function not properly initializing the `app_context_global` and Trino client connection. The fix ensures that:
The Trino client explicitly connects during startup
The AppContext global variable is properly initialized
Health checks now work correctly
If you encounter 503 errors, check that your container has been rebuilt with the latest code:
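```bash
docker-compose stop trino-mcp
docker-compose rm -f trino-mcp
docker-compose up -d trino-mcp
```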
MCP 1.3.0 SSE Transport Crashes
There's a critical issue with MCP 1.3.0's SSE transport that causes server crashes when clients disconnect. Until a newer MCP version is integrated, use STDIO transport exclusively. The error manifests as:
Trino Catalog Handling
We fixed an issue with catalog handling in the Trino client. The original implementation attempted to use `USE catalog` statements, which don't work reliably. The fix directly sets the catalog in the connection parameters.
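A minimal sketch of the fixed approach, using the `trino` Python client (host, user, and schema values are assumptions):

```python
import trino

# Set the catalog directly in the connection parameters rather than
# issuing a separate "USE catalog" statement after connecting.
conn = trino.dbapi.connect(
    host="localhost",
    port=9095,
    user="trino",
    catalog="memory",   # catalog fixed at connection time
    schema="bullshit",
)
cur = conn.cursor()
cur.execute("SHOW TABLES")
print(cur.fetchall())
```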
Project Structure
This project is organized as follows:
`src/` - Main source code for the Trino MCP server
`examples/` - Simple examples showing how to use the server
`scripts/` - Useful diagnostic and testing scripts
`tools/` - Utility scripts for data creation and setup
`tests/` - Automated tests
Key files:
`llm_trino_api.py` - Standalone API server for LLM integration
`test_llm_api.py` - Test script for the API server
`test_mcp_stdio.py` - Main test script using STDIO transport (recommended)
`test_bullshit_query.py` - Complex query example with bullshit data
`load_bullshit_data.py` - Script to load generated data into Trino
`tools/create_bullshit_data.py` - Script to generate hilarious test data
`run_tests.sh` - Script to run automated tests
`examples/simple_mcp_query.py` - Simple example to query data using MCP
Development
**IMPORTANT**: All scripts can be run from your local machine - they'll automatically communicate with the Docker containers via docker exec commands!
Testing
To test that Trino queries are working correctly, use the STDIO transport test script:
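```bash
python test_mcp_stdio.py
```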
For more complex testing with the bullshit data:
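```bash
python test_bullshit_query.py
```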
For testing the LLM API endpoint:
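```bash
python test_llm_api.py
```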
How LLMs Can Use This
LLMs can use the Trino MCP server to:
**Get Database Schema Information**:
```python
# Example prompt to LLM: "What schemas are available in the memory catalog?"
# LLM can generate code to query:
query = "SHOW SCHEMAS FROM memory"
```
**Run Complex Analytical Queries**:
```python
# Example prompt: "Find the top 5 job titles with highest average salaries"
# LLM can generate complex SQL:
query = """
SELECT
job_title,
AVG(salary) as avg_salary
FROM
memory.bullshit.real_bullshit_data
GROUP BY
job_title
ORDER BY
avg_salary DESC
LIMIT 5
"""
```
**Perform Data Analysis and Present Results**:
```python
# The LLM can parse the response, extract insights, and present them to the user:
print("The highest paying job title is 'Advanced Innovation Jedi' "
      "with an average salary of $241,178.50")
```
Real LLM Analysis Example: Bullshit Jobs by Company
Here's a real example of what an LLM could produce when asked to "Identify the companies with the most employees in bullshit jobs and create a Mermaid chart":
Step 1: LLM generates and runs the query
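The generated query might look something like this (the `company` and `bullshit_factor` column names are assumptions based on the dataset description):

```python
query = """
SELECT
    company,
    COUNT(*) AS bullshit_employees,
    AVG(bullshit_factor) AS avg_bs_factor
FROM
    memory.bullshit.real_bullshit_data
WHERE
    bullshit_factor >= 8
GROUP BY
    company
ORDER BY
    bullshit_employees DESC
LIMIT 10
"""
```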
Step 2: LLM gets and analyzes the results
Step 3: LLM generates a Mermaid chart visualization
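For example, a pie chart in Mermaid syntax (the counts reflect the insights in Step 4; company names other than "Unknown Co" are hypothetical placeholders):

```mermaid
pie title Employees in Bullshit Jobs by Company
    "Unknown Co" : 2
    "Placeholder Corp A" : 1
    "Placeholder Corp B" : 1
```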
**Alternative Bar Chart:**
Step 4: LLM provides key insights
The LLM can analyze the data and provide insights:
"Unknown Co" has the most employees in bullshit roles (2), while all others have just one
Most companies have achieved a perfect 10.0 bullshit factor score
Bullshit roles appear concentrated at executive or specialized position levels
This example demonstrates how an LLM can:
Generate appropriate SQL queries based on natural language questions
Process and interpret the results from Trino
Create visual representations of the data
Provide meaningful insights and analysis
Accessing the API
The Trino MCP server now includes two API options for accessing data:
1. Docker Container API (Port 9097)
2. Standalone Python API (Port 8008)
Both APIs offer the following endpoints:
`GET /api` - API documentation and usage examples
`POST /api/query` - Execute SQL queries against Trino
These APIs eliminate the need for wrapper scripts and let LLMs query Trino directly using REST calls, making it much simpler to integrate with services like Claude, GPT, and other AI systems.
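For example, hitting the Docker container API with `curl` (the JSON payload shape is an assumption; see `GET /api` for the documented format):

```bash
curl -s -X POST http://localhost:9097/api/query \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT 1"}'
```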
Troubleshooting
API Returns 503 Service Unavailable
If the Docker container API returns 503 errors:
Make sure you've rebuilt the container with the latest code:
```bash
docker-compose stop trino-mcp
docker-compose rm -f trino-mcp
docker-compose up -d trino-mcp
```
Check the container logs for errors:
```bash
docker logs trino_mcp_trino-mcp_1
```
Verify that Trino is running properly:
```bash
curl -s http://localhost:9095/v1/info | jq
```
Port Conflicts with Standalone API
The standalone API defaults to port 8008 to avoid conflicts. If you see an "address already in use" error:
Edit `llm_trino_api.py` and change the port number in the last line:
```python
uvicorn.run(app, host="127.0.0.1", port=8009)  # changed from the default 8008
```
Run with a custom port via command line:
```bash
python -c "import llm_trino_api; import uvicorn; uvicorn.run(llm_trino_api.app, host='127.0.0.1', port=8009)"
```
Future Work
The project is now in beta, with these improvements planned:
Integrate with newer MCP versions when available to fix SSE transport issues
Add and validate support for Hive, JDBC, and other connectors
Add more comprehensive query validation across different types and complexities
Implement support for more data types and advanced Trino features
Improve error handling and recovery mechanisms
Add user authentication and permission controls
Create more comprehensive examples and documentation
Develop admin monitoring and management interfaces
Add performance metrics and query optimization hints
Implement support for long-running queries and result streaming