Integrates with the markitdown library to convert various document formats including PDF, Word, and HTML to standardized...
Created: Apr 23, 2025
Markdownify MCP Server - UTF-8 Enhanced
This is an enhanced version of the [original Markdownify MCP project](https://github.com/cursor-ai/markdownify-mcp), with improved UTF-8 encoding support and optimized handling of multilingual content.
[Chinese version](README-CN.md)
Enhancements
Added comprehensive UTF-8 encoding support
Optimized handling of multilingual content
Fixed encoding issues on Windows systems
Improved error handling mechanisms
Key Differences from Original Project
Enhanced Encoding Support:
- Full UTF-8 support across all operations
- Proper handling of Chinese, Japanese, Korean and other non-ASCII characters
- Fixed Windows-specific encoding issues (cmd.exe and PowerShell compatibility)
Improved Error Handling:
- Detailed error messages in both English and Chinese
- Better exception handling for network issues
- Graceful fallback mechanisms for conversion failures
Extended Functionality:
- Added support for batch processing multiple files
- Enhanced YouTube video transcript handling
- Improved metadata extraction from various file formats
- Better preservation of document formatting
Performance Optimizations:
- Optimized memory usage for large file conversions
- Faster processing of multilingual content
- Reduced dependency conflicts
Better Development Experience:
- Comprehensive debugging options
- Detailed logging system
- Environment-specific configuration support
- Clear documentation in both English and Chinese
Features
Supports converting various file types to Markdown:
- PDF files
- Images (with metadata)
- Audio (with transcription)
- Word documents (DOCX)
- Excel spreadsheets (XLSX)
- PowerPoint presentations (PPTX)
- Web content:
  - YouTube video transcripts
  - Search results
  - General web pages
- Existing Markdown files
Quick Start
Clone this repository:
```bash
git clone https://github.com/JDJR2024/markdownify-mcp-utf8.git
cd markdownify-mcp-utf8
```
Install dependencies:
```bash
pnpm install
```
Note: This will also install `uv` and related Python dependencies.
Build the project:
```bash
pnpm run build
```
Start the server:
```bash
pnpm start
```
Requirements
Node.js 16.0 or higher
Python 3.8 or higher
pnpm package manager
Git
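The requirements above can be checked up front with a small script. The following is a minimal sketch and not part of the project: the helper names `parse_version` and `check_tools` are hypothetical, and it assumes `node --version` prints a string such as `v18.17.1`.

```python
import shutil
import subprocess
import sys


def parse_version(text):
    """Turn a string like 'v18.17.1' into a tuple of ints, e.g. (18, 17, 1)."""
    parts = text.strip().lstrip("v").split(".")
    return tuple(int(p) for p in parts if p.isdigit())


def check_tools():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or higher is required")
    for tool in ("node", "pnpm", "git"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    node = shutil.which("node")
    if node is not None:
        out = subprocess.run([node, "--version"], capture_output=True, text=True).stdout
        if parse_version(out) < (16, 0):
            problems.append("Node.js 16.0 or higher is required")
    return problems


if __name__ == "__main__":
    for problem in check_tools():
        print(problem)
```

The script prints nothing when all four requirements appear to be met.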
Detailed Installation Guide
1. Environment Setup
Install Node.js:
- Download from [Node.js official website](https://nodejs.org/)
- Verify installation: `node --version`
Install Python:
- Download from [Python official website](https://www.python.org/downloads/)
- Ensure Python is added to PATH during installation
- Verify installation: `python --version`
(Windows Only) Configure UTF-8 Support:
```bat
REM Persist UTF-8 I/O encoding for the current user (applies to new sessions)
setx PYTHONIOENCODING UTF-8
REM Set UTF-8 I/O encoding for the current session
set PYTHONIOENCODING=UTF-8
REM Switch the console code page to UTF-8
chcp 65001
```
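To confirm that these settings took effect, it can help to print what Python actually sees. The sketch below is illustrative only: `is_utf8` and `encoding_report` are hypothetical helpers, not functions shipped with this project.

```python
import codecs
import locale
import os
import sys


def is_utf8(name):
    """Return True if an encoding name (e.g. 'UTF-8', 'utf8') resolves to UTF-8."""
    if not name:
        return False
    try:
        # PYTHONIOENCODING may carry an error handler, as in 'utf-8:replace'.
        return codecs.lookup(name.split(":")[0]).name == "utf-8"
    except LookupError:
        return False


def encoding_report(env=None):
    """Collect the encoding-related settings visible to this Python process."""
    env = os.environ if env is None else env
    return {
        "PYTHONIOENCODING": env.get("PYTHONIOENCODING"),
        "stdout": getattr(sys.stdout, "encoding", None),
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        status = "ok" if is_utf8(value) else "not UTF-8"
        print(f"{key}: {value} ({status})")
```

Run it in a new terminal after `setx`, since `setx` does not change the session it was issued from.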
2. Project Setup
Clone the repository:
```bash
git clone https://github.com/JDJR2024/markdownify-mcp-utf8.git
cd markdownify-mcp-utf8
```
Test the installation:
```bash
# Convert a web page
python convert_utf8.py "https://example.com"
# Convert a local file
python convert_utf8.py "path/to/your/file.docx"
```
Usage Guide
Basic Usage
Converting Web Pages:
```bash
python convert_utf8.py "https://example.com"
```
The converted Markdown is saved as `converted_result.md`.
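A quick way to verify the result is to read the file back as strict UTF-8. This is a minimal sketch that assumes `converted_result.md` is written to the current working directory; `read_markdown` and `non_ascii_count` are hypothetical helper names.

```python
from pathlib import Path


def read_markdown(path="converted_result.md"):
    """Read the converted file as strict UTF-8; raises UnicodeDecodeError on invalid bytes."""
    return Path(path).read_text(encoding="utf-8", errors="strict")


def non_ascii_count(text):
    """Count characters outside ASCII, a rough sign that multilingual text survived."""
    return sum(1 for ch in text if ord(ch) > 127)


if __name__ == "__main__":
    if Path("converted_result.md").exists():
        text = read_markdown()
        print(f"{len(text)} characters, {non_ascii_count(text)} non-ASCII")
```

A `UnicodeDecodeError` here suggests the file was written in a different encoding, which points back to the UTF-8 settings in the installation guide.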
Environment Variables:
```bash
# Set custom UV path
export UV_PATH="/custom/path/to/uv"
# Set custom output directory
export MARKDOWN_OUTPUT_DIR="/custom/output/path"
```
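How the project reads these variables is not shown here, so the following is only a sketch of one common pattern. The `load_settings` helper and both fallback values (`uv` on `PATH`, the current directory) are assumptions made for illustration.

```python
import os
from pathlib import Path


def load_settings(env=None):
    """Resolve the two variables above, with assumed defaults when they are unset."""
    env = os.environ if env is None else env
    return {
        # Assumed default: whatever `uv` is found on PATH.
        "uv_path": env.get("UV_PATH", "uv"),
        # Assumed default: the current working directory.
        "output_dir": Path(env.get("MARKDOWN_OUTPUT_DIR", ".")),
    }


if __name__ == "__main__":
    print(load_settings())
```

On Windows `cmd.exe`, `set NAME=value` is the equivalent of `export NAME="value"`.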
Batch Processing:
Create a batch file (e.g., `convert_batch.txt`) with URLs or file paths:
```text
https://example1.com
https://example2.com
file1.docx
file2.pdf
```
Then run:
```bash
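# Skip blank lines and handle a final line that has no trailing newline
while IFS= read -r line || [ -n "$line" ]; do [ -n "$line" ] && python convert_utf8.py "$line" < /dev/null; done < convert_batch.txt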
```
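The shell loop does not run in `cmd.exe`, so a cross-platform alternative is to drive the batch from Python. This is a hedged sketch: `read_targets` and `convert_all` are hypothetical helpers, and it assumes `convert_utf8.py` takes a single positional argument (as in the examples above) and exits with a non-zero status on failure.

```python
import os
import subprocess
import sys
from pathlib import Path


def read_targets(batch_file):
    """Return the non-empty lines of the batch file, stripped of whitespace."""
    text = Path(batch_file).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def convert_all(batch_file, script="convert_utf8.py"):
    """Run the converter once per target; return the targets that failed."""
    env = {**os.environ, "PYTHONIOENCODING": "utf-8"}
    failed = []
    for target in read_targets(batch_file):
        # stdin is detached so a child process cannot consume anything from it.
        result = subprocess.run(
            [sys.executable, script, target], env=env, stdin=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failed.append(target)
    return failed


if __name__ == "__main__" and Path("convert_batch.txt").exists():
    for target in convert_all("convert_batch.txt"):
        print(f"failed: {target}", file=sys.stderr)
```

If every run writes to the same `converted_result.md`, later conversions would overwrite earlier ones, so each result may need to be renamed or moved between runs.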
Troubleshooting
Common Issues:
- If you see encoding errors, ensure the `PYTHONIOENCODING` environment variable is set to `utf-8` (on Windows, also run `chcp 65001`)
- For permission issues on Windows, run the terminal as Administrator
- For Python path issues, ensure the virtual environment is activated
To integrate this server with a desktop app, add the following to your app's server configuration:
Troubleshooting
Encoding Issues
- If you encounter character encoding issues, ensure the `PYTHONIOENCODING` environment variable is set to `utf-8`
- Windows users may need to run `chcp 65001` to enable UTF-8 support
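When the environment cannot be changed, for example in a locked-down shell, a script can switch its own output streams to UTF-8 instead. The sketch below shows that general Python technique and is not necessarily how `convert_utf8.py` does it; `force_utf8` is a hypothetical helper.

```python
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 where supported; return the resulting encoding."""
    if hasattr(stream, "reconfigure"):  # available on io.TextIOWrapper since Python 3.7
        stream.reconfigure(encoding="utf-8", errors="replace")
    return getattr(stream, "encoding", None)


if __name__ == "__main__":
    force_utf8(sys.stdout)
    force_utf8(sys.stderr)
    print("中文 / 日本語 / 한국어")
```

The console still needs a font and code page that can display the characters, so `chcp 65001` remains relevant on Windows.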
Permission Issues
- Ensure you have sufficient file read/write permissions
- On Windows, you may need to run as administrator
Acknowledgments
This project is based on the original work by Zach Caceres. Thanks to the original author for their outstanding contribution.
License
This project continues to be licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
Contributing
Contributions are welcome! Before submitting a Pull Request, please:
Ensure your code follows the project's coding standards
Add necessary tests and documentation
Update relevant sections in the README
Contact
For issues or suggestions:
Submit an Issue: https://github.com/JDJR2024/markdownify-mcp-utf8/issues
Create a Pull Request: https://github.com/JDJR2024/markdownify-mcp-utf8/pulls