MCP Webscan Server

[![smithery badge](https://smithery.ai/badge/mcp-server-webscan)](https://smithery.ai/server/mcp-server-webscan)

A Model Context Protocol (MCP) server for web content scanning and analysis. This server provides tools for fetching, analyzing, and extracting information from web pages.

Features

**Page Fetching**: Convert web pages to Markdown for easy analysis

**Link Extraction**: Extract and analyze links from web pages

**Site Crawling**: Recursively crawl websites to discover content

**Link Checking**: Identify broken links on web pages

**Pattern Matching**: Find URLs matching specific patterns

**Sitemap Generation**: Generate XML sitemaps for websites

Installation

Installing via Smithery

To install Webscan for Claude Desktop automatically via [Smithery](https://smithery.ai/server/mcp-server-webscan):

Manual Installation

Usage

Starting the Server

The server runs on stdio transport, making it compatible with MCP clients like Claude Desktop.
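To make "stdio transport" concrete: in MCP, stdio clients and servers exchange JSON-RPC 2.0 messages as newline-delimited JSON over the server process's stdin and stdout. The sketch below shows only that framing. `encodeMessage` and `decodeMessages` are hypothetical helper names used for illustration; they are not taken from this server or from the MCP SDK.

```typescript
// Hypothetical helpers illustrating newline-delimited JSON-RPC framing.
// Not taken from this server's source code.

interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: { code: number; message: string };
}

// Serialize one message as a single line terminated by "\n".
function encodeMessage(message: JsonRpcMessage): string {
  // JSON.stringify escapes newlines inside strings, so the output stays on one line.
  return JSON.stringify(message) + "\n";
}

// Split buffered input into complete messages plus any trailing partial line.
function decodeMessages(buffer: string): { messages: JsonRpcMessage[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? ""; // the last element is an incomplete line (or "")
  const messages = lines
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as JsonRpcMessage);
  return { messages, rest };
}
```

In practice an MCP client library handles this framing, so the helpers are only meant to show what travels over the pipe.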

Available Tools

`fetch-page`: Fetches a web page and converts it to Markdown.

- Parameters:
  - `url` (required): URL of the page to fetch.
  - `selector` (optional): CSS selector to target specific content.
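As a sketch of how a client might invoke this tool, the block below builds an MCP `tools/call` request by hand. `buildToolCall` is a hypothetical helper, and the URL and selector values are placeholders; a real client would normally use an MCP client library instead.

```typescript
// Hypothetical helper: builds an MCP `tools/call` request as a plain object.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Ask for only the <main> element of a page (placeholder URL and selector).
const fetchPageRequest = buildToolCall(1, "fetch-page", {
  url: "https://example.com/docs",
  selector: "main",
});

console.log(JSON.stringify(fetchPageRequest, null, 2));
```

The same helper works for the other tools by changing the tool name and arguments.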

`extract-links`: Extracts all links from a web page with their text.

- Parameters:
  - `url` (required): URL of the page to analyze.
  - `baseUrl` (optional): Base URL to filter links.
  - `limit` (optional, default: 100): Maximum number of links to return.
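The following is a minimal sketch of the idea behind link extraction, not the server's actual implementation. It assumes that links are resolved against the page URL and that `baseUrl` acts as a simple prefix filter; both are assumptions, as is the regex-based matching (an HTML parser would be more robust). `extractLinks` and `ExtractedLink` are hypothetical names.

```typescript
// Illustrative only: a naive, regex-based link extractor.
interface ExtractedLink {
  url: string;
  text: string;
}

function extractLinks(
  html: string,
  pageUrl: string,
  baseUrl?: string, // assumption: keep only links whose URL starts with this prefix
  limit = 100,
): ExtractedLink[] {
  const anchor = /<a\s[^>]*?href\s*=\s*["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
  const links: ExtractedLink[] = [];
  let match: RegExpExecArray | null;

  while (links.length < limit && (match = anchor.exec(html)) !== null) {
    const href = match[1] ?? "";
    const inner = match[2] ?? "";
    let resolved: string;
    try {
      resolved = new URL(href, pageUrl).href; // resolve relative links
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    if (baseUrl !== undefined && !resolved.startsWith(baseUrl)) continue;
    // Strip nested tags to get the visible link text.
    links.push({ url: resolved, text: inner.replace(/<[^>]+>/g, "").trim() });
  }
  return links;
}
```

A plain prefix check is the simplest possible filter; comparing parsed origins would be stricter.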

`crawl-site`: Recursively crawls a website up to a specified depth.

- Parameters:
  - `url` (required): Starting URL to crawl.
  - `maxDepth` (optional, default: 2): Maximum crawl depth (0-5).
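A depth-limited crawl can be sketched as a breadth-first traversal. The version below is an illustration under several assumptions, not the server's code: link discovery is injected as a synchronous `getLinks` function returning absolute URLs (a real crawler would await network requests), depth 0 means "only the start page", and the crawl stays on the starting origin. `crawlSite` is a hypothetical name.

```typescript
// Illustrative breadth-first crawl with a depth limit.
function crawlSite(
  startUrl: string,
  getLinks: (url: string) => string[], // assumed to return absolute URLs
  maxDepth = 2,
): string[] {
  const depthLimit = Math.min(Math.max(maxDepth, 0), 5); // documented range is 0-5
  const origin = new URL(startUrl).origin;
  const visited = new Set<string>([startUrl]);
  let frontier: string[] = [startUrl];

  for (let depth = 0; depth < depthLimit && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const page of frontier) {
      for (const link of getLinks(page)) {
        // Assumption: stay on the starting origin and skip pages already seen.
        if (visited.has(link) || new URL(link).origin !== origin) continue;
        visited.add(link);
        next.push(link);
      }
    }
    frontier = next;
  }
  return Array.from(visited); // pages in discovery order
}
```

Tracking visited URLs in a set is what keeps the traversal from looping on pages that link to each other.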

`check-links`: Checks for broken links on a page.

- Parameters:
  - `url` (required): URL to check links for.
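What counts as "broken" is not spelled out here. One plausible rule, shown below purely as an assumption, is to treat a link as broken when the request fails outright or returns an HTTP status of 400 or higher. `ProbeResult` and `findBrokenLinks` are hypothetical names, and the probing step itself (the network request) is left out.

```typescript
// Illustrative classification of link-check results.
interface ProbeResult {
  url: string;
  status: number | null; // null means the request failed (DNS error, timeout, ...)
}

// Assumption: a link is "broken" if it failed or returned a 4xx/5xx status.
function findBrokenLinks(results: ProbeResult[]): ProbeResult[] {
  return results.filter((result) => result.status === null || result.status >= 400);
}

const sampleResults: ProbeResult[] = [
  { url: "https://example.com/", status: 200 },
  { url: "https://example.com/missing", status: 404 },
  { url: "https://unreachable.invalid/", status: null },
];

console.log(findBrokenLinks(sampleResults).map((result) => result.url));
```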

`find-patterns`: Finds URLs matching a specific pattern.

- Parameters:
  - `url` (required): URL to search in.
  - `pattern` (required): JavaScript-compatible regex pattern to match URLs against.
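Because `pattern` is a JavaScript-compatible regex, the matching step can be pictured as a `RegExp` filter over the URLs found on the page. This is a sketch of that idea only; `findMatchingUrls` is a hypothetical name and the URLs are placeholders.

```typescript
// Illustrative pattern filter over a list of URLs.
function findMatchingUrls(urls: string[], pattern: string): string[] {
  // new RegExp throws a SyntaxError if the pattern is not valid.
  const regex = new RegExp(pattern);
  return urls.filter((url) => regex.test(url));
}

const candidateUrls = [
  "https://example.com/blog/2024/launch",
  "https://example.com/about",
  "https://example.com/blog/2023/recap",
];

// Match blog posts under a four-digit year.
console.log(findMatchingUrls(candidateUrls, "/blog/\\d{4}/"));
```

Note that backslashes must be escaped when the pattern is written as a string literal or inside JSON, as in `"/blog/\\d{4}/"`.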

`generate-site-map`: Generates a simple XML sitemap by crawling.

- Parameters:
  - `url` (required): Root URL for sitemap crawl.
  - `maxDepth` (optional, default: 2): Maximum crawl depth for discovering URLs (0-5).
  - `limit` (optional, default: 1000): Maximum number of URLs to include in the sitemap.
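For orientation, a simple sitemap in the sitemaps.org format is a `urlset` element containing one `url`/`loc` pair per page. The sketch below turns an already-crawled list of URLs into that XML; it is an illustration, not the server's output format verbatim, and `escapeXml` and `buildSitemap` are hypothetical names.

```typescript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a minimal sitemap from a list of URLs, keeping at most `limit` entries.
function buildSitemap(urls: string[], limit = 1000): string {
  const entries = urls
    .slice(0, limit)
    .map((url) => `  <url><loc>${escapeXml(url)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```

Escaping matters mostly for query strings: an unescaped `&` in a URL would make the sitemap invalid XML.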

Example Usage with Claude Desktop

Configure the server in your Claude Desktop settings:
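As a rough guide, Claude Desktop reads MCP servers from the `mcpServers` section of its `claude_desktop_config.json` file. The snippet below builds such an entry as a TypeScript object and prints it as JSON. The server key `webscan` and the path to the built entry file are placeholders chosen for this example; adjust both to match your installation.

```typescript
// Shape of one entry in the `mcpServers` section of claude_desktop_config.json.
interface McpServerEntry {
  command: string;
  args: string[];
}

const claudeDesktopConfig: { mcpServers: Record<string, McpServerEntry> } = {
  mcpServers: {
    // "webscan" is an arbitrary label chosen for this example.
    webscan: {
      command: "node",
      // Placeholder path: point this at the built server entry file on your machine.
      args: ["/path/to/mcp-server-webscan/dist/index.js"],
    },
  },
};

// Print the JSON to paste into the Claude Desktop configuration file.
console.log(JSON.stringify(claudeDesktopConfig, null, 2));
```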

Use the tools in your conversations:

Development

Prerequisites

Node.js >= 18

Project Structure (Post-Refactor)

Building

Development Mode

Error Handling

The server's error handling covers:

Invalid parameters

Network errors

Content parsing errors

URL validation

All errors are properly formatted according to the MCP specification.
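In MCP, a tool reports a failure by returning a result whose `isError` flag is set, with the message carried as text content. The sketch below shows that shape together with a simple URL check. `validateUrl`, `toToolError`, and `runSafely` are hypothetical names, and restricting URLs to `http:` and `https:` is an assumption made for the example rather than documented behavior.

```typescript
// Minimal shape of an MCP tool result carrying text content.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Assumption: only http(s) URLs are accepted.
function validateUrl(input: string): URL {
  const url = new URL(input); // throws a TypeError on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url;
}

// Convert any thrown value into an error-flagged tool result.
function toToolError(error: unknown): ToolResult {
  const message = error instanceof Error ? error.message : String(error);
  return { content: [{ type: "text", text: `Error: ${message}` }], isError: true };
}

// Run a tool body and map exceptions to error results instead of crashing.
function runSafely(body: () => ToolResult): ToolResult {
  try {
    return body();
  } catch (error) {
    return toToolError(error);
  }
}
```

Returning the error inside the result, rather than throwing, lets the calling model see what went wrong and retry with corrected arguments.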

Contributing

Fork the repository

Create your feature branch (`git checkout -b feature/amazing-feature`)

Commit your changes (`git commit -m 'Add some amazing feature'`)

Push to the branch (`git push origin feature/amazing-feature`)

Open a Pull Request

License

MIT License - see the LICENSE file for details
