Created: Apr 23, 2025
MS-Lucidia-Voice-Gateway-MCP
A Model Context Protocol (MCP) server that provides text-to-speech and speech-to-text capabilities using Windows' built-in speech services. This server leverages the native Windows Speech API (SAPI) through PowerShell commands, eliminating the need for external APIs or services.
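To make the "SAPI through PowerShell" approach concrete, here is a minimal TypeScript sketch of how a Node.js process might assemble a PowerShell script that speaks a string through `System.Speech`. The helper names `psQuote` and `buildSpeakScript` are hypothetical and are not taken from this project's source; only the `SpeechSynthesizer` calls (`SelectVoice`, `Speak`) are real .NET APIs.

```typescript
// Hypothetical helpers for illustration; not this project's actual code.

/** Wrap a value in a single-quoted PowerShell string literal. */
function psQuote(value: string): string {
  // Inside single quotes PowerShell expands nothing, and a quote character is
  // written by doubling it. PowerShell also treats the typographic quotes
  // U+2018, U+2019, U+201A and U+201B as single quotes, so double those too.
  return "'" + value.replace(/['\u2018\u2019\u201A\u201B]/g, "$&$&") + "'";
}

/** Build a one-line PowerShell script that speaks `text`, optionally with a named voice. */
function buildSpeakScript(text: string, voice?: string): string {
  const statements = [
    "Add-Type -AssemblyName System.Speech",
    "$synth = New-Object System.Speech.Synthesis.SpeechSynthesizer",
  ];
  if (voice !== undefined) {
    statements.push(`$synth.SelectVoice(${psQuote(voice)})`);
  }
  statements.push(`$synth.Speak(${psQuote(text)})`);
  return statements.join("; ");
}

console.log(buildSpeakScript("It's working", "Microsoft David Desktop"));
```

On Windows, one way to run the resulting string is to pass it to `powershell.exe -NoProfile -Command` via Node's `child_process.execFile`. Quoting the text as a literal, rather than concatenating it raw, keeps user input from being interpreted as PowerShell code.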
Features
- Text-to-Speech (TTS) using Windows SAPI voices
- Speech-to-Text (STT) using Windows Speech Recognition
- Simple web interface for testing
- No external API dependencies
- Uses native Windows capabilities
Prerequisites
- Windows 10/11 with Speech Recognition enabled
- Node.js 16+
- PowerShell
Installation
Clone the repository:
Install dependencies:
Build the project:
Usage
Testing Interface
Start the test server:
Open `http://localhost:3000` in your browser
Use the web interface to test TTS and STT capabilities
Available Tools
text_to_speech
Converts text to speech using Windows SAPI.
Parameters:
- `text` (required): The text to convert to speech
- `voice` (optional): The voice to use (e.g., "Microsoft David Desktop")
- `speed` (optional): Speech rate from 0.5 to 2.0 (default: 1.0)
Example:
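The sketch below shows how a client might validate the parameters listed above and wrap them in an MCP `tools/call` request (a JSON-RPC 2.0 message). The function name `textToSpeechRequest` and the client-side checks are assumptions for illustration. In particular, rejecting empty text and out-of-range speeds is a choice made here; how the server itself handles such values is not documented.

```typescript
// Argument shape taken from the parameter list above.
interface TextToSpeechArgs {
  text: string;
  voice?: string;
  speed?: number;
}

// Hypothetical helper: builds a JSON-RPC 2.0 "tools/call" request for the tool.
function textToSpeechRequest(id: number, args: TextToSpeechArgs) {
  if (args.text.trim() === "") {
    throw new RangeError("text must not be empty");
  }
  const speed = args.speed ?? 1.0; // documented default
  // Written as a negated range check so that NaN is rejected as well.
  if (!(speed >= 0.5 && speed <= 2.0)) {
    throw new RangeError("speed must be between 0.5 and 2.0");
  }
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "text_to_speech", arguments: { ...args, speed } },
  };
}

const ttsRequest = textToSpeechRequest(1, {
  text: "Hello from Windows",
  voice: "Microsoft David Desktop",
  speed: 1.25,
});
console.log(JSON.stringify(ttsRequest, null, 2));
```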
speech_to_text
Records audio and converts it to text using Windows Speech Recognition.
Parameters:
- `duration` (optional): Recording duration in seconds (default: 5, max: 60)
Example:
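As a rough illustration, the following TypeScript resolves the `duration` argument using the documented default (5) and maximum (60), then places it in an MCP `tools/call` request. The helper name `speechToTextArgs` is hypothetical. Throwing on values above 60 or at or below zero is an assumption; the server may clamp such values instead.

```typescript
const DEFAULT_DURATION_S = 5; // documented default
const MAX_DURATION_S = 60; // documented maximum

// Hypothetical helper: applies the default and checks the documented limit.
function speechToTextArgs(duration?: number): { duration: number } {
  const value = duration ?? DEFAULT_DURATION_S;
  if (!Number.isFinite(value) || value <= 0 || value > MAX_DURATION_S) {
    throw new RangeError(`duration must be in (0, ${MAX_DURATION_S}] seconds`);
  }
  return { duration: value };
}

// Record for 10 seconds and ask for the transcription.
const sttRequest = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: { name: "speech_to_text", arguments: speechToTextArgs(10) },
};
console.log(JSON.stringify(sttRequest, null, 2));
```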
Troubleshooting
Make sure Windows Speech Recognition is enabled:
- Open Windows Settings
- Go to Time & Language > Speech
- Enable Speech Recognition
Check available voices:
- Open PowerShell and run:
```powershell
Add-Type -AssemblyName System.Speech
(New-Object System.Speech.Synthesis.SpeechSynthesizer).GetInstalledVoices().VoiceInfo.Name
```
Test speech recognition:
- Open Speech Recognition in Windows Settings
- Run through the setup wizard if not already done
- Test that Windows can recognize your voice