Web Fetcher

Fetches and extracts web content using Playwright's headless browser capabilities, delivering clean, readable content fr...

Created: Apr 22, 2025

Fetcher MCP

MCP server for fetching web page content using the Playwright headless browser.

Advantages

  • JavaScript Support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.
  • Intelligent Content Extraction: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.
  • Flexible Output Format: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.
  • Parallel Processing: The fetch_urls tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.
  • Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.
  • Robust Error Handling: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.
  • Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.
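
The resource blocking described above can be pictured as a simple predicate over request types. This is a minimal sketch, not Fetcher MCP's actual code: `BLOCKED_RESOURCE_TYPES` and `shouldBlock` are hypothetical names, and the type strings are the values Playwright reports from `request.resourceType()`.

```typescript
// Hypothetical sketch: decide which requests to abort to save bandwidth.
// The strings match values returned by Playwright's request.resourceType().
const BLOCKED_RESOURCE_TYPES: ReadonlySet<string> = new Set([
  "image",
  "stylesheet",
  "font",
  "media",
]);

// Returns true when a request of this type should be aborted.
function shouldBlock(resourceType: string): boolean {
  return BLOCKED_RESOURCE_TYPES.has(resourceType);
}
```

In a Playwright script, a predicate like this would typically be called from a `page.route` handler, aborting the route when it returns true and continuing it otherwise.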

Quick Start

Run directly with npx:
First-time setup: install the required browser by running the following command in your terminal:

Debug Mode

Run with the --debug option to show the browser window for debugging:

Configuration MCP

Configure this MCP server in Claude Desktop:
On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json
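
As a rough illustration of what goes into that file, the sketch below builds an `mcpServers` entry and serializes it to JSON. It rests on assumptions: the server key `fetcher` is an arbitrary label, and the npm package name `fetcher-mcp` is assumed rather than confirmed by this page, so check it against the project before use.

```typescript
// Shape of the relevant part of claude_desktop_config.json.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Assumption: the server is published on npm as "fetcher-mcp".
// The key "fetcher" is only a label chosen for this example.
const fetcherConfig: ClaudeDesktopConfig = {
  mcpServers: {
    fetcher: {
      command: "npx",
      args: ["-y", "fetcher-mcp"],
    },
  },
};

// Pretty-printed JSON, ready to merge into the config file.
const fetcherConfigJson: string = JSON.stringify(fetcherConfig, null, 2);
```

If the config file already lists other servers, merge the new entry into the existing `mcpServers` object rather than replacing the file.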

Features

  • fetch_url - Retrieve web page content from a specified URL
  • fetch_urls - Batch retrieve web page content from multiple URLs in parallel
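
To make the two tools more concrete, here is a hedged sketch of the MCP `tools/call` requests a client might send. The option names come from the Tips section of this README; the argument names `url` and `urls` and the helper functions are assumptions made for illustration.

```typescript
// Optional parameters named in the Tips section of this README.
interface FetchOptions {
  timeout?: number;
  navigationTimeout?: number;
  waitForNavigation?: boolean;
  extractContent?: boolean;
  returnHtml?: boolean;
  debug?: boolean;
}

// JSON-RPC 2.0 envelope for an MCP "tools/call" request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Assumption: fetch_url takes its target in an argument named "url".
function buildFetchUrlRequest(
  id: number,
  url: string,
  options: FetchOptions = {},
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "fetch_url", arguments: { url, ...options } },
  };
}

// Assumption: fetch_urls takes its targets in an argument named "urls".
function buildFetchUrlsRequest(
  id: number,
  urls: string[],
  options: FetchOptions = {},
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "fetch_urls", arguments: { urls, ...options } },
  };
}
```

In practice an MCP client such as Claude Desktop builds these requests itself; the sketch only shows what the two tools receive.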

Tips

Handling Special Website Scenarios

Dealing with Anti-Crawler Mechanisms

  • Wait for Complete Loading: For websites using CAPTCHA, redirects, or other verification mechanisms, ask in your prompt to wait for the page to finish loading. This uses the waitForNavigation: true parameter.
  • Increase Timeout Duration: For websites that load slowly, ask in your prompt for a longer timeout. This adjusts both the timeout and navigationTimeout parameters accordingly.
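
The two adjustments above amount to a small set of tool arguments. The sketch below shows one way to derive them; `slowSiteOptions` is a hypothetical helper, and treating the timeouts as milliseconds is an assumption based on Playwright's usual convention.

```typescript
// Arguments relevant to slow or verification-gated sites.
interface NavigationOptions {
  waitForNavigation: boolean;
  timeout: number;
  navigationTimeout: number;
}

// Hypothetical helper: apply one timeout value to both timeout parameters.
// Assumption: values are in milliseconds.
function slowSiteOptions(
  timeoutMs: number,
  expectsVerification: boolean,
): NavigationOptions {
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new RangeError("timeoutMs must be a positive, finite number");
  }
  return {
    waitForNavigation: expectsVerification,
    timeout: timeoutMs,
    navigationTimeout: timeoutMs,
  };
}
```

For example, `slowSiteOptions(60000, true)` would request a 60-second budget for both timeouts and wait for any redirect or verification step to complete.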

Content Retrieval Adjustments

  • Preserve Original HTML Structure: When content extraction might fail, ask to keep the original HTML. This sets extractContent: false and returnHtml: true.
  • Fetch Complete Page Content: When the extracted content is too limited, ask for the complete page content. This sets extractContent: false.
  • Return Content as HTML: When HTML is needed instead of the default Markdown, ask for HTML output. This sets returnHtml: true.
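
The three scenarios above can be summarized as presets over two flags. This is only a sketch: the scenario names and the `contentOptionsFor` helper are hypothetical, and the `default` row assumes that extraction is on unless disabled, which this README implies but does not state outright.

```typescript
// The two flags that control what content is returned.
interface ContentOptions {
  extractContent: boolean;
  returnHtml: boolean;
}

type ContentScenario =
  | "default"
  | "preserve-html"
  | "complete-page"
  | "html-output";

const CONTENT_PRESETS: Record<ContentScenario, ContentOptions> = {
  // Assumed defaults: Readability extraction on, Markdown output.
  "default": { extractContent: true, returnHtml: false },
  // Preserve Original HTML Structure.
  "preserve-html": { extractContent: false, returnHtml: true },
  // Fetch Complete Page Content (still Markdown).
  "complete-page": { extractContent: false, returnHtml: false },
  // Return Content as HTML (extraction still on).
  "html-output": { extractContent: true, returnHtml: true },
};

// Returns a copy so callers cannot mutate the shared presets.
function contentOptionsFor(scenario: ContentScenario): ContentOptions {
  return { ...CONTENT_PRESETS[scenario] };
}
```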

Debugging and Authentication

Enabling Debug Mode

  • Dynamic Debug Activation: To display the browser window during a specific fetch operation, ask for debug mode in that request. This sets debug: true even if the server was started without the --debug flag.

Using Custom Cookies for Authentication

  • Manual Login: To log in with your own credentials, ask for debug mode. This sets debug: true or uses the --debug flag, keeping the browser window open for manual login.
  • Interacting with Debug Browser: When debug mode is enabled:
  • Enable Debug for Specific Requests: Even if the server is already running, you can enable debug mode for a specific request. This sets debug: true for that request only, opening the browser window for manual login.
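
One plausible reading of how the per-request debug parameter interacts with the --debug flag is sketched below. The README only confirms that debug: true on a request opens the browser even when the server was started without --debug; the rest of the precedence, and the name `effectiveDebug`, are assumptions.

```typescript
// Hypothetical sketch: a per-request value, when given, takes precedence;
// otherwise the server-wide --debug flag applies.
function effectiveDebug(
  serverDebugFlag: boolean,
  requestDebug?: boolean,
): boolean {
  return requestDebug ?? serverDebugFlag;
}
```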

Development

Install Dependencies

Install Playwright Browser

Install the browsers needed for Playwright:

Build the Server

Debugging

Use MCP Inspector for debugging:
You can also enable visible browser mode for debugging:

Related Projects

  • g-search-mcp: A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously. Perfect for batch search operations and data collection.

License

Licensed under the MIT License
