Magentic-UI: A Multi-Agent Web Interface for Complex Task Automation

Victor Karabedyants

Published May 21, 2025

Overview

Magentic-UI is a research prototype that takes an agentic approach to automating complex web-based tasks. It is designed for collaboration between human users and AI agents: several specialized agents work under the coordination of an LLM-driven Orchestrator, while the user keeps visibility into, and control over, what the agents do.

Core Architecture

Magentic-UI is built on a team of five agents that work together in a modular architecture:

Orchestrator: The central control unit powered by a large language model (LLM). It plans, coordinates, and delegates tasks.
WebSurfer: A browser-controlling agent capable of interacting with web pages—clicking, scrolling, typing, and navigating.
Coder: A programming agent that writes and executes Python or shell scripts in a Docker container.
FileSurfer: Handles file operations, leveraging file-conversion tools to interpret documents and answer file-related queries.
UserProxy: Interfaces with the end-user for approvals, feedback, and collaborative planning.

Key Features

🧑‍🤝‍🧑 Co-Planning

Users and the Orchestrator collaborate to define a step-by-step execution plan. The interface allows users to add, modify, delete, or regenerate steps for optimal task planning.
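
The editing operations described above can be pictured as methods on an ordered list of steps. The sketch below is only an illustration of that idea: the Plan class and its method names are hypothetical and are not taken from the Magentic-UI codebase. Regenerating a step is left out because it would require a call to the language model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Plan:
    """Hypothetical editable plan: an ordered list of natural-language steps."""

    steps: List[str] = field(default_factory=list)

    def add(self, text: str, index: Optional[int] = None) -> None:
        """Append a step, or insert it at a 0-based position."""
        if index is None:
            self.steps.append(text)
        else:
            self.steps.insert(index, text)

    def modify(self, index: int, text: str) -> None:
        """Replace the wording of an existing step."""
        self.steps[index] = text

    def delete(self, index: int) -> None:
        """Remove a step the user does not want."""
        del self.steps[index]


# Example edits a user might make before approving the plan.
plan = Plan(["Open the airline site", "Search flights", "Book the cheapest flight"])
plan.modify(2, "List the three cheapest flights")  # soften a risky step
plan.add("Ask the user which flight to book")      # add a checkpoint
plan.delete(0)                                     # drop a redundant step
print(plan.steps)
```

Keeping the plan as plain text steps means the user can read and edit it without knowing which agent will eventually run each step.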

🤝 Co-Tasking

Execution of tasks is a cooperative process. Agents carry out subtasks while continuously integrating real-time feedback from users.

🛡️ Action Guards

Sensitive or potentially destructive operations require user approval, ensuring full transparency and control over actions performed by agents.
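
A minimal sketch of such an approval gate follows. It assumes a simple keyword check; the keyword list and the function names needs_approval and guarded_run are hypothetical. The action_guard_client entry in the configuration shown later suggests the real system asks a model to make this judgement, so treat the keyword rule as a stand-in.

```python
from typing import Callable

# Hypothetical keyword list; a real guard could rely on a model's judgement instead.
SENSITIVE_KEYWORDS = ("delete", "purchase", "submit", "send")


def needs_approval(action: str) -> bool:
    """Flag actions whose description contains a sensitive keyword."""
    lowered = action.lower()
    return any(word in lowered for word in SENSITIVE_KEYWORDS)


def guarded_run(
    action: str,
    run: Callable[[], str],
    ask_user: Callable[[str], bool],
) -> str:
    """Run `run` only if the action looks harmless or the user approves it."""
    if needs_approval(action) and not ask_user(action):
        return "skipped: user declined"
    return run()


# The user declines, so the payment form is never submitted.
declined = guarded_run("Submit the payment form", lambda: "done", lambda a: False)
# Scrolling is not flagged, so the user is not asked at all.
harmless = guarded_run("Scroll down the page", lambda: "done", lambda a: False)
print(declined, "|", harmless)
```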

🧠 Plan Learning

The system adapts over time by learning from previous plans and user interactions, improving the efficiency of future executions.
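
One simple way to reuse earlier plans is to store them by task description and look up the closest match for a new request. The sketch below does this with fuzzy string matching from Python's standard library. It is an assumption about how such a feature could work, not a description of Magentic-UI's mechanism, and PlanGallery is a hypothetical name.

```python
import difflib
from typing import Dict, List, Optional


class PlanGallery:
    """Hypothetical store of finished plans, keyed by the task they solved."""

    def __init__(self) -> None:
        self._plans: Dict[str, List[str]] = {}

    def save(self, task: str, steps: List[str]) -> None:
        """Remember the steps that worked for a task."""
        self._plans[task.lower()] = list(steps)

    def find(self, task: str, cutoff: float = 0.6) -> Optional[List[str]]:
        """Return the saved plan whose task text is most similar, if any."""
        matches = difflib.get_close_matches(
            task.lower(), list(self._plans), n=1, cutoff=cutoff
        )
        return self._plans[matches[0]] if matches else None


gallery = PlanGallery()
gallery.save(
    "Book a flight to Paris",
    ["Open airline site", "Search flights", "Ask user to confirm"],
)
reused = gallery.find("book a flight to paris")  # same task, different casing
nothing = gallery.find("xyz")                    # unrelated request, no match
```

A retrieved plan would still go through co-planning, so the user can adjust it before anything runs.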

How Magentic-UI Works

Interaction: The user inputs a goal through text (and optionally images). The Orchestrator constructs a natural-language plan.
Plan Execution: For each step, the Orchestrator selects the appropriate agent or requests user intervention.
Step Management: After the selected agent or the user responds, the Orchestrator verifies that the step is complete before proceeding to the next one.
Adaptability: If any step fails (e.g., an unreachable website), the system can replan with the user's permission.
Completion: Once all steps are done, a final summary is generated and returned to the user.

The entire process is interactive, visual, and modifiable by the user at any time.
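
The loop described in the steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions: agents are modelled as plain functions, the names run_plan, allow_replan, and replan are hypothetical, and a cap on replanning is added here so the example cannot loop forever.

```python
from typing import Callable, Dict, List, Tuple

# An agent is modelled as a function: step text in, (succeeded, message) out.
Agent = Callable[[str], Tuple[bool, str]]
Step = Tuple[str, str]  # (agent name, step description)


def run_plan(
    plan: List[Step],
    agents: Dict[str, Agent],
    allow_replan: Callable[[str], bool],
    replan: Callable[[str], List[Step]],
    max_replans: int = 2,
) -> str:
    """Execute steps in order and return a summary of what happened."""
    log: List[str] = []
    queue = list(plan)
    replans = 0
    while queue:
        agent_name, step = queue.pop(0)
        ok, message = agents[agent_name](step)  # delegate the step
        log.append(f"{agent_name}: {message}")
        if ok:
            continue  # step verified, move on to the next one
        if replans >= max_replans or not allow_replan(step):
            log.append("stopped: no replanning")
            break
        replans += 1
        queue = replan(step)  # replace the remaining steps with a new plan
    return "\n".join(log)


def web_surfer(step: str) -> Tuple[bool, str]:
    """Stub browser agent that fails on anything marked unreachable."""
    return "unreachable" not in step, f"tried '{step}'"


def coder(step: str) -> Tuple[bool, str]:
    """Stub coding agent that always succeeds."""
    return True, f"ran '{step}'"


summary = run_plan(
    plan=[("WebSurfer", "open unreachable site"), ("Coder", "parse results")],
    agents={"WebSurfer": web_surfer, "Coder": coder},
    allow_replan=lambda step: True,  # the user grants permission to replan
    replan=lambda step: [("WebSurfer", "open mirror site"), ("Coder", "parse results")],
)
print(summary)
```

The first step fails, the user allows a replan, and the replacement steps then run to completion.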

Getting Started

Prerequisites

Python 3.10+
Docker
WSL2 (Windows only)
OpenAI API Key
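
Before installing, it can help to confirm that these prerequisites are in place. The script below is a small, unofficial check written for this article. It only verifies the Python version, that a docker executable is on PATH, and that OPENAI_API_KEY is set. It does not confirm that the Docker daemon is running or that WSL2 is configured.

```python
import os
import shutil
import sys


def preflight(env=None, which=shutil.which, version=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    problems = []
    if version < (3, 10):
        problems.append("Python 3.10 or newer is required")
    if which("docker") is None:
        problems.append("docker was not found on PATH")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("missing:", issue)
    if not issues:
        print("all prerequisites found")
```

The env, which, and version parameters exist only so the function can be exercised without touching the real environment.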

Installation (PyPI)

python3 -m venv .venv

source .venv/bin/activate

pip install magentic-ui

export OPENAI_API_KEY="<YOUR API KEY>"

magentic ui --port 8081

Visit http://localhost:8081 to launch the UI.

Advanced Configuration

Using a Config File

To configure custom API keys or switch to Azure OpenAI, create a config.yaml file in ~/.magentic_ui. Here’s an example:

model_config: &client
  provider: autogen_ext.models.openai.OpenAIChatCompletionClient
  config:
    model: gpt-4o
    api_key: <YOUR API KEY>
    max_retries: 10

orchestrator_client: *client
coder_client: *client
web_surfer_client: *client
file_surfer_client: *client
action_guard_client: *client

For Azure integration, replace the provider and include your endpoint, deployment name, and authentication method.
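
In the example configuration, the &client anchor names one model definition and each *client alias reuses it, so all five roles share the same settings. The Python below mirrors that sharing with plain dictionaries to show what the aliases amount to, and how one role could be given its own model. It is an illustration only: it does not read config.yaml, and the gpt-4o-mini override is a hypothetical choice.

```python
import copy

# Counterpart of the `&client` anchor: one shared model definition.
client = {
    "provider": "autogen_ext.models.openai.OpenAIChatCompletionClient",
    "config": {"model": "gpt-4o", "api_key": "<YOUR API KEY>", "max_retries": 10},
}

ROLES = [
    "orchestrator_client",
    "coder_client",
    "web_surfer_client",
    "file_surfer_client",
    "action_guard_client",
]

# Counterpart of writing `*client` under every role: all roles point at one definition.
settings = {role: client for role in ROLES}

# To give one role a different model, copy first so the other roles are unaffected.
settings["coder_client"] = copy.deepcopy(client)
settings["coder_client"]["config"]["model"] = "gpt-4o-mini"  # hypothetical override

for role in ROLES:
    print(role, "->", settings[role]["config"]["model"])
```

In the YAML file itself, the equivalent change is to replace the alias under one role with a full provider and config mapping.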

Building from Source

Clone the Repo

git clone https://github.com/microsoft/magentic-ui.git

cd magentic-ui

Setup Python Environment

uv venv --python=3.12 .venv

uv sync --all-extras

source .venv/bin/activate

Build Frontend

# Install node via nvm

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

nvm install node

# Install dependencies

cd frontend

npm install -g gatsby-cli

npm install --global yarn

yarn install

yarn build

cd ..

Run Magentic-UI

magentic ui --port 8081

For frontend development, launch it separately:

cd frontend

cp .env.default .env.development

npm run start

Development UI: http://localhost:8000
Production UI: http://localhost:8081

Contributing

Magentic-UI is an open-source project governed by the Microsoft Open Source Code of Conduct. Contributions are welcome through pull requests or issue reviews.

Before contributing:

Sign the Contributor License Agreement.
Run local tests using:

poe check

Issues labeled "open for contribution" are good starting points.

Conclusion

Magentic-UI shows one way for AI systems and humans to collaborate on web tasks. With agent orchestration, clear task segmentation, and user-controlled transparency, it serves as a blueprint for future human-AI co-working interfaces. Whether you're a developer, researcher, or enthusiast, Magentic-UI provides an extensible foundation for building intelligent web automation tools.
