A Model Context Protocol (MCP) server for web content scanning and analysis. It provides tools for fetching, analyzing, and extracting information from web pages.
<a href="https://glama.ai/mcp/servers/u0tna3hemh"><img width="380" height="200" src="https://glama.ai/mcp/servers/u0tna3hemh/badge" alt="Webscan Server MCP server" /></a>
## Features
- **Page Fetching**: Convert web pages to Markdown for easy analysis
- **Link Extraction**: Extract and analyze links from web pages
- **Site Crawling**: Recursively crawl websites to discover content
- **Link Checking**: Identify broken links on web pages
- **Pattern Matching**: Find URLs matching specific patterns
- **Sitemap Generation**: Generate XML sitemaps for websites
## Installation
### Installing via Smithery
To install Webscan for Claude Desktop automatically via [Smithery](https://smithery.ai/server/mcp-server-webscan):
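The usual Smithery CLI pattern is sketched below; confirm the exact command on the linked Smithery page:

```bash
npx -y @smithery/cli install mcp-server-webscan --client claude
```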
### Manual Installation
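Assuming a standard Node.js workflow (clone, install dependencies, build), the steps are roughly:

```bash
# Clone the repository (replace <repository-url> with the actual Git URL)
git clone <repository-url>
cd mcp-server-webscan

# Install dependencies and compile
npm install
npm run build
```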
## Usage
### Starting the Server
The server runs on stdio transport, making it compatible with MCP clients like Claude Desktop.
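A minimal way to start it, assuming the compiled entry point lands in `build/index.js` (the path is an assumption based on a typical TypeScript build setup):

```bash
npm start
# or run the built entry point directly:
node build/index.js
```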
### Available Tools
#### `fetch-page`
- Fetches a web page and converts it to Markdown.
- Parameters:
  - `url` (required): URL of the page to fetch.
  - `selector` (optional): CSS selector to target specific content.
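For illustration, an MCP `tools/call` request for this tool might look like the following (the URL and selector values are placeholders):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "fetch-page",
    "arguments": {
      "url": "https://example.com",
      "selector": "main"
    }
  }
}
```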
#### `extract-links`
- Extracts all links from a web page with their text.
- Parameters:
  - `url` (required): URL of the page to analyze.
  - `baseUrl` (optional): Base URL to filter links.
  - `limit` (optional, default: 100): Maximum number of links to return.
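An illustrative `arguments` payload (all values are placeholders):

```json
{
  "url": "https://example.com",
  "baseUrl": "https://example.com",
  "limit": 50
}
```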
#### `crawl-site`
- Recursively crawls a website up to a specified depth.
- Parameters:
  - `url` (required): Starting URL to crawl.
  - `maxDepth` (optional, default: 2): Maximum crawl depth (0-5).
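An illustrative `arguments` payload:

```json
{
  "url": "https://example.com",
  "maxDepth": 3
}
```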
#### `check-links`
- Checks for broken links on a page.
- Parameters:
  - `url` (required): URL to check links for.
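An illustrative `arguments` payload:

```json
{ "url": "https://example.com" }
```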
#### `find-patterns`
- Finds URLs matching a specific pattern.
- Parameters:
  - `url` (required): URL to search in.
  - `pattern` (required): JavaScript-compatible regex pattern to match URLs against.
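Since `pattern` is a JavaScript regex passed through JSON, backslashes must be escaped. An illustrative payload that matches links to PDF files:

```json
{
  "url": "https://example.com",
  "pattern": "\\.pdf$"
}
```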
#### `generate-site-map`
- Generates a simple XML sitemap by crawling.
- Parameters:
  - `url` (required): Root URL for sitemap crawl.
  - `maxDepth` (optional, default: 2): Maximum crawl depth for discovering URLs (0-5).
  - `limit` (optional, default: 1000): Maximum number of URLs to include in the sitemap.
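An illustrative `arguments` payload:

```json
{
  "url": "https://example.com",
  "maxDepth": 2,
  "limit": 500
}
```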
### Example Usage with Claude Desktop
Configure the server in your Claude Desktop settings:
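A typical entry in `claude_desktop_config.json` might look like the following; the `webscan` key and the path to the built entry point are assumptions that depend on where you installed the server:

```json
{
  "mcpServers": {
    "webscan": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-server-webscan/build/index.js"]
    }
  }
}
```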
Use the tools in your conversations:
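For example, with illustrative prompts:

> "Fetch https://example.com and summarize the main content as Markdown."

> "Check for broken links on https://example.com."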
## Development
### Prerequisites
- Node.js >= 18
- npm
### Project Structure (Post-Refactor)
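The exact layout may differ; this is a sketch of a typical structure for a TypeScript MCP server like this one (all names are illustrative):

```
src/
├── index.ts   # server entry point (illustrative)
├── tools/     # one module per tool (fetch-page, crawl-site, ...)
└── utils/     # shared fetching/parsing helpers
build/         # compiled JavaScript output
```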
### Building
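Assuming the standard `build` script in `package.json` compiles the TypeScript sources:

```bash
npm run build
```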
### Development Mode
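If a watch-mode script is defined (the `dev` script name is an assumption), it can be run with:

```bash
npm run dev
```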
## Error Handling
The server implements comprehensive error handling:
- Invalid parameters
- Network errors
- Content parsing errors
- URL validation errors
All errors are properly formatted according to the MCP specification.
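For example, a failed tool call is returned as an MCP tool result with `isError` set rather than crashing the server (the message text below is illustrative):

```json
{
  "content": [
    {
      "type": "text",
      "text": "Error: failed to fetch https://example.com (ECONNREFUSED)"
    }
  ],
  "isError": true
}
```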
## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request