"Everyone is doing Agent, but how many of you can actually think for yourself, do it for yourself, and review it for yourself?"
- Agents and Their Mainstream Frameworks in One Article
From "universal intelligence" to the Manus myth.
In 2025, AI Agents are on fire. Startups, VCs, and tech giants are all touting their own "intelligent agent revolution". In this wave, Manus has become a typical representative: hailed as the symbol of the "General Agent", yet criticized by industry insiders as a textbook sample of the bubble, selling dog meat under a sheep's head.
Manus's explosion in popularity is no accident. The article points out that its rise rests on three foundational supports:
| Core capability | Technical foundation | Significance |
|---|---|---|
| Stronger model capabilities | Large models' breakthroughs in planning and scheduling | The precondition for Manus to plan complex tasks |
| Rich toolchain | MCP, browser-use, computer-use | Gives the AI execution ability and access to external interfaces |
| Data and memory engineering | Context extension and RAG | Fewer hallucinations, better persistence and feedback |
This transformed the Agent from a "toy" into a system capable of performing real-world tasks. However, the gap between ideal and reality soon appeared: when Manus's product features were questioned, its financing path criticized, and peers even dismissed it as an "engineering wrapper", the AI Agent bubble began to burst.
The Illusion of the "Universal Agent": Many Functions Do Not Equal Intelligence
Wang Hsien pointed out sharply in his article: Manus's failure lies not in technology, but in product direction.
Generic Agent sells itself as a "jack of all trades" but is not the best in any specific scenario.
The key to this dilemma is that it fails to break through the **"scenario barrier"**:
- Lack of specialized domain data and toolchain;
- Lack of industry certifications and deep business tie-ins;
- Lack of delivery closure in high-value scenarios.
In other words, Manus can demonstrate the ability to "write reports," "look up information," and "generate images," but in a real workflow these capabilities come across as shallow and generic.
This corroborates the definition of an Agent from another article:
"Agents are not rare; the good ones are those that can think for themselves, act for themselves, and review their own work."
A truly intelligent agent is not a stack of features, but one capable of dynamic planning, cross-system collaboration, continuous learning, and self-correction.

From the framework level: the Agent's "inner workings"
To understand why Manus-like products tend to spin their wheels, we must go back to the Agent's underlying implementation frameworks.
| Framework | Characteristics | Typical scenarios | Strengths and weaknesses |
|---|---|---|---|
| AutoGPT | Autonomous planning + tool invocation | Market research, task breakdown | Highly autonomous but difficult to control |
| LangGraph | Diagrammatic Processes + State Management | Multi-Agent Collaboration | Stable but complex to develop |
| Dify | Low Code + Workflow Visualization | Content generation, knowledge quizzes | Quick to get started, but not smart enough |
| CrewAI | Team-style multi-agent collaboration | Collaborative decision-making, task delegation | Flexible but context-dependent performance |
| AutoGen (Microsoft) | Event-driven, multi-agent communication | Autonomous systems, customer service | Highly engineered and costly |
These frameworks reveal a fact:
The current Agent ecosystem is still in the "structural engineering" stage, not yet a stage of true "intelligent autonomy".
Manus, as a representative "universal Agent", is largely a repackaging layer on top of these frameworks, lacking accumulated underlying data and polished workflows.
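The "structural engineering" these frameworks provide largely boils down to a plan-act-observe loop. Below is a minimal, framework-free sketch of that loop; every name here (`run_agent`, `stub_model`, the `search` tool) is hypothetical, not any framework's real API, and the "model" is a stub rather than an actual LLM call.

```python
# A minimal sketch of the plan -> act -> observe loop that frameworks like
# AutoGPT or LangGraph structure for you. All names are illustrative.

def run_agent(task, model, tools, max_steps=5):
    """Loop: ask the model for the next action, execute the tool, feed back."""
    history = [("task", task)]
    for _ in range(max_steps):
        action = model(history)                # model decides the next step
        if action["tool"] == "finish":
            return action["args"]["answer"]
        observation = tools[action["tool"]](**action["args"])
        history.append((action["tool"], observation))
    return None  # step budget exhausted: "highly autonomous but difficult to control"

# Stub model: search once, then finish with whatever the search returned.
def stub_model(history):
    last_tool, payload = history[-1]
    if last_tool == "task":
        return {"tool": "search", "args": {"query": payload}}
    return {"tool": "finish", "args": {"answer": payload}}

tools = {"search": lambda query: f"results for: {query}"}
print(run_agent("market research", stub_model, tools))
# prints "results for: market research"
```

The `max_steps` cap is the crude control mechanism most of these frameworks rely on; the table's "highly autonomous but difficult to control" verdict is what happens when the model never emits `finish`.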


Pitfalls of evaluation: how exactly should an Agent's intelligence be quantified?
In "Rigorous Agent Evaluation Is Harder Than It Looks," the HAL (Holistic Agent Leaderboard) team compared 9 models across 9 benchmarks over 20,000 runs, and the conclusions were striking:
"Higher reasoning effort does not mean higher accuracy."
They found out:
- In 21 of 36 cases, higher reasoning effort actually reduced accuracy;
- Top models (e.g. GPT-5, Opus 4.1) still make frequent errors;
- Agents often take "shortcuts" instead of actually solving tasks, for example:
  - searching directly for answers in web tasks;
  - hard-coding assumptions in scientific tasks;
  - booking the wrong flights and refunding incorrect amounts in customer-service tasks.
This shows:
Existing Agent evaluation criteria are too crude.
Generic accuracy metrics mask key issues such as interpretability, stability, and behavioral cost.
| Dimension | Current problem | Ideal evaluation method |
|---|---|---|
| Accuracy | High but unstable values | Add contextual observability |
| Cost | Serious token waste | Introduce Pareto efficiency curves |
| Behavioral reliability | The "shortcut" problem is severe | Combine logging with process analysis (e.g. Docent) |
| Generalizability | Large performance variation across tasks | Multi-scenario distributed comparison |
As a result, generic Agents may seem powerful at the "presentation level", but their behavior is highly uncontrollable and their evaluation transparency is poor.
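The Pareto efficiency curve mentioned in the table can be made concrete with a small sketch: given per-run (cost, accuracy) pairs, keep only the runs that no cheaper run matches or beats on accuracy. The run data below is invented for illustration; the dominated entries echo the HAL finding that more reasoning effort does not guarantee more accuracy.

```python
# Sketch of a cost-accuracy Pareto frontier. Run data is hypothetical.

def pareto_frontier(runs):
    """Keep runs not dominated by any cheaper, at-least-as-accurate run."""
    frontier = []
    # Sort by cost ascending (tie-break: higher accuracy first), then keep
    # each run only if it improves on the best accuracy seen so far.
    for name, cost, acc in sorted(runs, key=lambda r: (r[1], -r[2])):
        if not frontier or acc > frontier[-1][2]:
            frontier.append((name, cost, acc))
    return frontier

runs = [
    ("model-A-low-effort",  1.0, 0.62),
    ("model-A-high-effort", 4.0, 0.60),  # more reasoning, *lower* accuracy
    ("model-B",             2.5, 0.71),
    ("model-C",             6.0, 0.70),  # dominated by cheaper model-B
]
for name, cost, acc in pareto_frontier(runs):
    print(f"{name}: ${cost:.2f} -> {acc:.0%}")
```

Only `model-A-low-effort` and `model-B` survive; the high-effort run and the expensive near-miss both fall off the frontier, which is exactly the waste a raw accuracy leaderboard hides.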


The roots of bubbles: capital, engineering and patience
Yeh hit the nail on the head in his comments:
"Agent's fundamental flaws are in engineering, in capital, in determination."
The impatience of the domestic entrepreneurial environment has led many companies to choose to "build momentum before building things".
General Agent has become the most easily packaged "AI concept stock":
- The technology is relatively easy to replicate;
- Easy for investors to understand;
- The Demo effect is stunning;
- But the landing value is limited.
This has led to an influx of Manus-style projects in a short period of time - some successfully funded, some running off and dissolving.
Amid the hype and the capital, the AI Agent's real performance narrative has been obscured by marketing.

The real way out: from generic to vertical, from illusion to certainty
Under the bubble, the industry has also taken a new direction.
For example, the medical Agent product OpenEvidence is considered a successful sample of a vertical intelligent agent:
| Design dimension | OpenEvidence approach | Manus-style generic Agent |
|---|---|---|
| Target users | Physicians only | Everyone |
| Data sources | NEJM, JAMA, and other authoritative medical literature | Web search or user input |
| Output form | Structured "evidence chain + key points" | Conversational text generation |
| Intelligence logic | Deterministic workflow + model assistance | Autonomous model decisions |
| Hallucination control | Citation traceability + manual verification | No citation mechanism |
This turn reveals the direction of future Agent evolution:
"Workflow + Agent" hybrid model -- Pocketing uncertain intelligence with deterministic processes.

After Manus, where does AI Agent go from here?
The Manus story doesn't end there; it represents an entire industry in a phase of disillusionment.
Several articles collectively convey a core consensus:
- Agent is not a panacea, but a task-oriented system;
- Assessments need to return to the behavioral level and observability;
- The future belongs to vertically deep and data-driven intelligences.
The future of the AI Agent is not in "flashier demos" but in "more stable engineering".
Perhaps true intelligence is not a Manus-style "illusion of omnipotence",
but the "dumb intelligence" that solves one problem to the extreme within a narrow domain.
