MS-Lucidia-Voice-Gateway-MCP

A Model Context Protocol (MCP) server that provides text-to-speech and speech-to-text capabilities using Windows' built-in speech services. This server leverages the native Windows Speech API (SAPI) through PowerShell commands, eliminating the need for external APIs or services.

Features

Text-to-Speech (TTS) using Windows SAPI voices

Speech-to-Text (STT) using Windows Speech Recognition

Simple web interface for testing

No external API dependencies

Uses native Windows capabilities

Prerequisites

Windows 10/11 with Speech Recognition enabled

Node.js 16+

PowerShell

Installation

Clone the repository:

Install dependencies:

Build the project:

Usage

Testing Interface

Start the test server:

Open `http://localhost:3000` in your browser

Use the web interface to test TTS and STT capabilities

Available Tools

text_to_speech

Converts text to speech using Windows SAPI.

Parameters:

`text` (required): The text to convert to speech

`voice` (optional): The voice to use (e.g., "Microsoft David Desktop")

`speed` (optional): Speech rate from 0.5 to 2.0 (default: 1.0)

Example:
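A minimal sketch of how a client might assemble a call to this tool, assuming the server is reached through the standard MCP `tools/call` JSON-RPC method. The helper name `buildTextToSpeechCall` and its validation behavior (rejecting empty text and out-of-range speeds) are illustrative assumptions, not the server's confirmed implementation.

```typescript
// Hypothetical argument shape, taken from the parameter list above.
interface TextToSpeechArgs {
  text: string;
  voice?: string;
  speed?: number;
}

// Builds an MCP `tools/call` JSON-RPC request for the text_to_speech tool.
// The range check mirrors the documented 0.5 to 2.0 limits; how the server
// itself reacts to out-of-range values is an assumption.
function buildTextToSpeechCall(id: number, args: TextToSpeechArgs) {
  if (args.text.trim().length === 0) {
    throw new Error("text is required");
  }
  const speed = args.speed ?? 1.0; // documented default
  if (!(speed >= 0.5 && speed <= 2.0)) {
    throw new RangeError("speed must be between 0.5 and 2.0");
  }
  const callArgs: TextToSpeechArgs = { text: args.text, speed };
  if (args.voice !== undefined) {
    callArgs.voice = args.voice;
  }
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "text_to_speech", arguments: callArgs },
  };
}

const request = buildTextToSpeechCall(1, {
  text: "Hello from Windows",
  voice: "Microsoft David Desktop",
});
console.log(JSON.stringify(request, null, 2));
```

Validating on the client side is optional; it simply surfaces a bad `speed` value before the request reaches the server.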

speech_to_text

Records audio and converts it to text using Windows Speech Recognition.

Parameters:

`duration` (optional): Recording duration in seconds (default: 5, max: 60)

Example:
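As a rough illustration of the duration rule above, the sketch below fills in the default and enforces the 60-second ceiling before building the tool arguments. The function name `normalizeDuration` and the choice to clamp oversized values rather than reject them are assumptions; the server may behave differently.

```typescript
const DEFAULT_DURATION_SECONDS = 5; // documented default
const MAX_DURATION_SECONDS = 60; // documented maximum

// Returns a usable recording duration in seconds.
// Clamping to the maximum is an assumed policy, not confirmed server behavior.
function normalizeDuration(duration?: number): number {
  if (duration === undefined) {
    return DEFAULT_DURATION_SECONDS;
  }
  if (!isFinite(duration) || duration <= 0) {
    throw new RangeError("duration must be a positive number of seconds");
  }
  return Math.min(duration, MAX_DURATION_SECONDS);
}

// Arguments for a 10-second recording.
const sttArguments = { duration: normalizeDuration(10) };
console.log(JSON.stringify({ name: "speech_to_text", arguments: sttArguments }));
```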

Troubleshooting

Make sure Windows Speech Recognition is enabled:
- Open Windows Settings
- Go to Time & Language > Speech
- Enable Speech Recognition

Check available voices:
- Open PowerShell and run:

```powershell
Add-Type -AssemblyName System.Speech
(New-Object System.Speech.Synthesis.SpeechSynthesizer).GetInstalledVoices().VoiceInfo.Name
```

Test speech recognition:
- Open Speech Recognition in Windows Settings
- Run through the setup wizard if not already done
- Test that Windows can recognize your voice
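The voice names printed by the PowerShell command above are presumably the values the `voice` parameter expects. A small sketch of checking a requested voice against that output follows; the helper names `parseVoiceList` and `findVoice` are hypothetical, the sample output is made up for illustration, and the case-insensitive match is an assumption about what is convenient rather than what the server does.

```typescript
// Splits the command's stdout into one voice name per line, dropping blanks.
function parseVoiceList(stdout: string): string[] {
  return stdout
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Case-insensitive lookup; returns the installed name or undefined.
function findVoice(installed: string[], requested: string): string | undefined {
  const wanted = requested.trim().toLowerCase();
  return installed.find((name) => name.toLowerCase() === wanted);
}

// Sample output for illustration only; real voice names vary by machine.
const sampleOutput = "Microsoft David Desktop\r\nMicrosoft Zira Desktop\r\n";
const voices = parseVoiceList(sampleOutput);
console.log(findVoice(voices, "microsoft david desktop") ?? "voice not installed");
```

If the lookup returns nothing, the requested voice is likely not installed, which is a common reason for a TTS call to fall back to the default voice or fail.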

Contributing

Fork the repository

Create your feature branch

Commit your changes

Push to the branch

Create a new Pull Request

License

MIT
