Magentic-UI: A Multi-Agent Web Interface for Complex Task Automation

Overview

Magentic-UI is a research prototype that takes an agentic approach to automating complex web-based tasks. Designed for collaboration between human users and AI agents, it combines multiple specialized agents under the coordination of an LLM-powered Orchestrator. The user can see the plan, approve or reject actions, and change how a task is executed.



Core Architecture

Magentic-UI is built on a team of five agents that work together in a modular architecture:

  • Orchestrator: The central control unit powered by a large language model (LLM). It plans, coordinates, and delegates tasks.
  • WebSurfer: A browser-controlling agent capable of interacting with web pages—clicking, scrolling, typing, and navigating.
  • Coder: A programming agent that writes and executes Python or shell scripts in a Docker container.
  • FileSurfer: Handles file operations, leveraging file-conversion tools to interpret documents and answer file-related queries.
  • UserProxy: Interfaces with the end-user for approvals, feedback, and collaborative planning.
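The division of labor above can be pictured as a small routing table. The sketch below is purely illustrative: the Agent class, build_team, and delegate are hypothetical names invented for this example, and the handlers are stubs rather than Magentic-UI's real agents.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical sketch: these names are illustrative, not Magentic-UI classes.
@dataclass
class Agent:
    name: str
    handle: Callable[[str], str]  # takes a subtask description, returns a result


def build_team() -> Dict[str, Agent]:
    """Return the four worker agents keyed by name (stub handlers only)."""
    return {
        "WebSurfer": Agent("WebSurfer", lambda task: f"[browser] {task}"),
        "Coder": Agent("Coder", lambda task: f"[code] {task}"),
        "FileSurfer": Agent("FileSurfer", lambda task: f"[files] {task}"),
        "UserProxy": Agent("UserProxy", lambda task: f"[user] {task}"),
    }


def delegate(team: Dict[str, Agent], agent_name: str, task: str) -> str:
    """Orchestrator-style delegation: route a subtask to the named agent."""
    if agent_name not in team:
        raise KeyError(f"unknown agent: {agent_name}")
    return team[agent_name].handle(task)


if __name__ == "__main__":
    team = build_team()
    print(delegate(team, "WebSurfer", "open the pricing page"))
```

In the real system the Orchestrator chooses the agent with a language model; here the caller names the agent directly to keep the example small.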

Key Features

🧑‍🤝‍🧑 Co-Planning

Users and the Orchestrator collaborate to define a step-by-step execution plan. The interface allows users to add, modify, delete, or regenerate steps for optimal task planning.
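As a rough illustration of the add, modify, and delete operations, the sketch below models a plan as a plain list of step strings. This representation and the function names are assumptions made for the example; the article does not describe Magentic-UI's internal plan format. Regenerating a step is left out because it would require a model call.

```python
from typing import List

# Hypothetical sketch: a plan is modeled as a plain list of step strings.
# Each helper returns a new list and leaves the original plan untouched.


def add_step(plan: List[str], index: int, text: str) -> List[str]:
    """Return a new plan with `text` inserted at position `index`."""
    return plan[:index] + [text] + plan[index:]


def modify_step(plan: List[str], index: int, text: str) -> List[str]:
    """Return a new plan with the step at `index` replaced by `text`."""
    if not 0 <= index < len(plan):
        raise IndexError(f"no step at index {index}")
    return plan[:index] + [text] + plan[index + 1:]


def delete_step(plan: List[str], index: int) -> List[str]:
    """Return a new plan without the step at `index`."""
    if not 0 <= index < len(plan):
        raise IndexError(f"no step at index {index}")
    return plan[:index] + plan[index + 1:]


if __name__ == "__main__":
    plan = ["Search for flights", "Book the cheapest flight"]
    plan = add_step(plan, 1, "Compare prices across two sites")
    plan = modify_step(plan, 2, "Ask the user before booking")
    print(plan)
```

Returning a new list on every edit keeps earlier versions of the plan available, which is convenient if the user wants to undo a change.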

🤝 Co-Tasking

Execution of tasks is a cooperative process. Agents carry out subtasks while continuously integrating real-time feedback from users.

🛡️ Action Guards

Sensitive or potentially destructive operations require user approval, ensuring full transparency and control over actions performed by agents.
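The approval gate can be sketched in a few lines. Everything here is an assumption made for illustration: the keyword list, needs_approval, and guarded_run are hypothetical, and a simple keyword match stands in for whatever judgment the real action guard applies.

```python
from typing import Callable

# Hypothetical sketch: the keyword list and function names are assumptions,
# not Magentic-UI's actual action-guard logic.
SENSITIVE_KEYWORDS = ("delete", "purchase", "submit", "rm ")


def needs_approval(action: str) -> bool:
    """Flag an action as sensitive if it mentions any listed keyword."""
    lowered = action.lower()
    return any(keyword in lowered for keyword in SENSITIVE_KEYWORDS)


def guarded_run(
    action: str,
    execute: Callable[[str], str],
    ask_user: Callable[[str], bool],
) -> str:
    """Run `action`, asking the user first when it looks sensitive."""
    if needs_approval(action) and not ask_user(action):
        return f"skipped: {action}"
    return execute(action)


if __name__ == "__main__":
    # The user declines every request in this demo.
    print(guarded_run("delete report.pdf", lambda a: f"ran: {a}", lambda a: False))
    print(guarded_run("scroll down", lambda a: f"ran: {a}", lambda a: False))
```

Harmless actions never trigger a prompt, so the user is interrupted only when something potentially destructive is about to happen.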

🧠 Plan Learning

The system adapts over time by learning from previous plans and user interactions, improving the efficiency of future executions.


How Magentic-UI Works



  1. Interaction: The user inputs a goal through text (and optionally images). The Orchestrator constructs a natural-language plan.
  2. Plan Execution: For each step, the Orchestrator selects the appropriate agent or requests user intervention.
  3. Step Management: After receiving the response, the Orchestrator verifies completion before proceeding to the next step.
  4. Adaptability: If any step fails (e.g., an unreachable website), the system can replan with the user's permission.
  5. Completion: Once all steps are done, a final summary is generated and returned to the user.

The entire process is interactive, visual, and modifiable by the user at any time.
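The five stages above amount to a loop over plan steps with a replanning branch. The following is a minimal sketch under stated assumptions: Step, RunResult, run_plan, and the max_replans limit are hypothetical names and choices for this example, and the execute, replan, and user_allows_replan callbacks stand in for the agents, the Orchestrator's model, and the user.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical sketch of the control flow only; not Magentic-UI's implementation.
@dataclass
class Step:
    description: str
    agent: str


@dataclass
class RunResult:
    completed: List[str] = field(default_factory=list)
    replans: int = 0
    summary: str = ""


def run_plan(
    plan: List[Step],
    execute: Callable[[Step], bool],
    replan: Callable[[Step, List[Step]], List[Step]],
    user_allows_replan: Callable[[Step], bool],
    max_replans: int = 2,
) -> RunResult:
    """Execute steps in order; on failure, replan only if the user agrees."""
    result = RunResult()
    queue = list(plan)  # copy so the caller's plan is not mutated
    while queue:
        step = queue.pop(0)
        if execute(step):  # True means the step was verified as complete
            result.completed.append(step.description)
            continue
        if result.replans >= max_replans or not user_allows_replan(step):
            result.summary = f"Stopped at: {step.description}"
            return result
        result.replans += 1
        queue = replan(step, queue)  # new steps replace the failed one
    result.summary = f"Completed {len(result.completed)} step(s)"
    return result


if __name__ == "__main__":
    plan = [Step("visit site", "WebSurfer"), Step("extract table", "Coder")]
    outcome = run_plan(
        plan,
        execute=lambda s: s.description != "visit site",  # first step fails
        replan=lambda failed, rest: [Step("visit mirror", failed.agent)] + rest,
        user_allows_replan=lambda s: True,
    )
    print(outcome.summary)
```

Capping the number of replans is a design choice of this sketch, added so that a persistently failing step cannot loop forever.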

Getting Started

Prerequisites

  • Python 3.10+
  • Docker
  • WSL2 (Windows only)
  • OpenAI API Key
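Before installing, it can help to check these prerequisites from a script. The sketch below is a convenience helper written for this article, not part of Magentic-UI. It only checks the Python version, whether a docker executable is on the PATH, and whether the OPENAI_API_KEY environment variable is set. It does not verify that the Docker daemon is running or that WSL2 is configured.

```python
import os
import shutil
import sys
from typing import Callable, List, Mapping, Optional, Tuple


def check_prerequisites(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
    version: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good.

    The parameters exist so the checks can be exercised without touching
    the real environment; by default the live system is inspected.
    """
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    problems: List[str] = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    if which("docker") is None:
        problems.append("Docker not found on PATH")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print(f"missing: {issue}")
    if not issues:
        print("all prerequisites found")
```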

Installation (PyPI)

python3 -m venv .venv
source .venv/bin/activate
pip install magentic-ui
export OPENAI_API_KEY=<YOUR API KEY>
magentic ui --port 8081

Once the server is running, open http://localhost:8081 in a browser to use the UI.


Advanced Configuration

Using a Config File

To configure custom API keys or switch to Azure OpenAI, create a config.yaml file in ~/.magentic_ui. Here’s an example:

model_config: &client
  provider: autogen_ext.models.openai.OpenAIChatCompletionClient
  config:
    model: gpt-4o
    api_key: <YOUR API KEY>
    max_retries: 10

orchestrator_client: *client
coder_client: *client
web_surfer_client: *client
file_surfer_client: *client
action_guard_client: *client

For Azure integration, replace the provider and include your endpoint, deployment name, and authentication method.
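If you prefer to generate the OpenAI version of this file from a script, something like the sketch below would work. The write_config helper and its parameters are hypothetical and exist only for this example. The YAML body mirrors the example in this section, and the default location is the ~/.magentic_ui directory mentioned above. Note that the key is written to disk in plain text.

```python
from pathlib import Path
from typing import Optional, Union

# Template copied from the example config in this article. The &client anchor
# defines the model settings once, and each *client alias reuses them.
CONFIG_TEMPLATE = """\
model_config: &client
  provider: autogen_ext.models.openai.OpenAIChatCompletionClient
  config:
    model: {model}
    api_key: {api_key}
    max_retries: 10

orchestrator_client: *client
coder_client: *client
web_surfer_client: *client
file_surfer_client: *client
action_guard_client: *client
"""


def write_config(
    api_key: str,
    config_dir: Optional[Union[str, Path]] = None,
    model: str = "gpt-4o",
) -> Path:
    """Write config.yaml into `config_dir` (default: ~/.magentic_ui)."""
    target = Path.home() / ".magentic_ui" if config_dir is None else Path(config_dir)
    target.mkdir(parents=True, exist_ok=True)
    path = target / "config.yaml"
    path.write_text(
        CONFIG_TEMPLATE.format(model=model, api_key=api_key), encoding="utf-8"
    )
    return path


if __name__ == "__main__":
    import os

    # Assumes OPENAI_API_KEY was exported as in the installation steps.
    print(write_config(os.environ["OPENAI_API_KEY"]))
```

Because every agent entry is an alias of the same anchor, changing the model in one place changes it for all five clients.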


Building from Source

  1. Clone the Repo

git clone https://github.com/microsoft/magentic-ui.git
cd magentic-ui

  2. Set Up the Python Environment

uv venv --python=3.12 .venv
uv sync --all-extras
source .venv/bin/activate

  3. Build the Frontend

# Install node via nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
nvm install node
# Install dependencies
cd frontend
npm install -g gatsby-cli
npm install --global yarn
yarn install
yarn build
cd ..

  4. Run Magentic-UI

magentic ui --port 8081

For frontend development, run the frontend separately:

cd frontend
cp .env.default .env.development
npm run start


Contributing

Magentic-UI is an open-source project that follows the Microsoft Open Source Code of Conduct. Contributions are welcome through pull requests or issues.

Before contributing, run the project checks:

poe check

Issues labeled "open for contribution" are good starting points.


Conclusion

Magentic-UI shows one way for AI systems and humans to collaborate on web tasks. With agent orchestration, clear task segmentation, and user-controlled transparency, it can serve as a blueprint for future human-AI co-working interfaces. For developers, researchers, and enthusiasts, it offers an extensible starting point for building web automation tools.

 


More articles by Victor Karabedyants
