Created: Apr 22, 2025
Mobile Next - MCP server for Mobile Development and Automation | iOS, Android, Simulator, Emulator, and physical devices
This is a Model Context Protocol (MCP) server that enables scalable mobile automation and development through a platform-agnostic interface, eliminating the need for distinct iOS or Android knowledge. You can run it on emulators, simulators, and physical devices (iOS and Android).
This server allows Agents and LLMs to interact with native iOS/Android applications and devices through structured accessibility snapshots or coordinate-based taps based on screenshots.
Join us on our journey as we continuously enhance Mobile MCP!
Check out our detailed roadmap to see upcoming features, improvements, and milestones. Your feedback is invaluable in shaping the future of mobile automation.
Native app automation (iOS and Android) for testing or data-entry scenarios.
Scripted flows and form interactions without manually controlling simulators/emulators or physical devices (iPhone, Samsung, Google Pixel, etc.)
Automating multi-step user journeys driven by an LLM
General-purpose mobile application interaction for agent-based frameworks
Enables agent-to-agent communication for mobile automation use cases and data extraction
Main Features
Fast and lightweight: Uses native accessibility trees for most interactions, falling back to screenshot-based coordinates where accessibility labels are not available.
LLM-friendly: No computer-vision model is required when operating on accessibility snapshots.
Visual Sense: Evaluates and analyses what's actually rendered on screen to decide the next action. If accessibility data or view-hierarchy coordinates are unavailable, it falls back to screenshot-based analysis.
Deterministic tool application: Reduces ambiguity found in purely screenshot-based approaches by relying on structured data whenever possible.
Extract structured data: Enables you to extract structured data from anything visible on screen.
Mobile MCP Architecture
Wiki page
See our wiki page for details on setup, configuration, and debugging.
Physical iOS or Android devices (requires proper platform tools and drivers)
Make sure you have your mobile platform SDKs (Xcode, Android SDK) installed and configured properly before running Mobile Next Mobile MCP.
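For MCP clients that are configured through a JSON file (Claude Desktop, for example), an entry along the lines of the sketch below is typical. This assumes the server is published on npm as @mobilenext/mobile-mcp; verify the exact package name and any flags against the wiki before use:

```json
{
  "mcpServers": {
    "mobile-mcp": {
      "command": "npx",
      "args": ["-y", "@mobilenext/mobile-mcp@latest"]
    }
  }
}
```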
Running in "headless" mode on Simulators/Emulators
When you do not have a physical phone connected to your machine, you can run Mobile MCP with an emulator or simulator in the background.
For example, on Android:
Start an emulator (via avdmanager / the emulator command).
Run Mobile MCP with the desired flags.
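The emulator step above can be sketched as a short shell session. The AVD name "Pixel_7" is an example; list your own AVDs first and substitute one of those:

```shell
# List the Android Virtual Devices available on this machine
emulator -list-avds

# Boot one headless (no window, no audio) in the background
emulator -avd Pixel_7 -no-window -no-audio &

# Block until adb can see the booted device
adb wait-for-device
```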
On iOS, you'll need Xcode installed and the Simulator running before using Mobile MCP with that simulator instance.
xcrun simctl list
xcrun simctl boot "iPhone 16"
Mobile Commands and interaction tools
The commands and tools support both accessibility-based locators (preferred) and coordinate-based inputs, giving you reliable, seamless automation even when accessibility/automation IDs are missing.
mobile_list_apps
Description: List all the installed apps on the device
Parameters:
mobile_launch_app
Description: Launches the specified app on the device/emulator
Parameters:
mobile_terminate_app
Description: Terminates a running application
Parameters:
mobile_get_screen_size
Description: Get the screen size of the mobile device in pixels
Parameters: None
mobile_click_on_screen_at_coordinates
Description: Taps on the screen at the specified coordinates
Parameters:
mobile_list_elements_on_screen
Description: Lists on-screen elements with their coordinates and display text or accessibility label
Parameters: None
mobile_element_tap
Description: Taps on a UI element identified by accessibility locator
Parameters:
mobile_tap
Description: Taps on specified screen coordinates
Parameters:
mobile_press_button
Description: Presses a device button (e.g., home, back, volume, enter, power)
Parameters: None
mobile_open_url
Description: Open a URL in browser on device
Parameters:
mobile_type_text
Description: Types text into a focused UI element (e.g., TextField, SearchField)
Parameters:
mobile_element_swipe
Description: Performs a swipe gesture from one UI element to another
Parameters:
mobile_swipe
Description: Performs a swipe gesture between two sets of screen coordinates
Parameters:
mobile_press_key
Description: Presses hardware keys or triggers special events (e.g., back button on Android)
Parameters:
mobile_take_screenshot
Description: Captures a screenshot of the current device screen
Parameters: None
mobile_get_source
Description: Fetches the current device UI structure as an accessibility snapshot in XML format
Parameters: None
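As a sketch of how an MCP client invokes one of these tools, a standard MCP tools/call request for mobile_click_on_screen_at_coordinates might look like the following. The argument names "x" and "y" are assumptions for illustration; check the server's published tool schema for the exact parameter names:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "mobile_click_on_screen_at_coordinates",
    "arguments": { "x": 120, "y": 540 }
  }
}
```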
Thanks to all contributors
We appreciate everyone who has helped improve this project.