Agentic Design Patterns: A book that made me rethink "What exactly is an Agent?"

By: rootdata|2026/05/26 03:45:00

Author: Yanhua

Antonio Gullí is an engineering director at Google. He wrote a 453-page book that breaks the development of AI Agents down into 21 design patterns.

But this is not a book review. My motivation for reading it is very specific: I have written about Harness Engineering, shared the pitfalls I hit with Clawdbot, and, in "AI agents are not magic," discussed the seven turning points that take an Agent from burning tokens to being truly useful. After each piece, I was left with a question I hadn't fully thought through: is there a reusable underlying logic behind all of this?

This book gave me the answer, and it was deeper than I expected.

You may not be writing an Agent at all

The harshest judgment in the book is hidden in the prologue.

Most of the "AI" that people are using is just Level 0: a bare LLM, with no tools, no memory, and no actions. If you ask it which film won Best Picture at the 2025 Oscars, it can only guess. The book states plainly: Level 0 is not an Agent.

Moving up is where the real Agents are:

Level 1: Tool User

The Agent starts using tools: search, APIs, databases. But it’s not just about "being able to call interfaces"; it also needs to judge when to call, what to call, and how to use the results. The book provides a very specific example: when a user asks, "What new shows are there recently?", the Agent realizes that this information is not in the training data and proactively calls the search tool to find it, then synthesizes the result. The key step is "realizing on its own." It’s not a human telling it, "go search," but rather it judging that it needs to search. This judgment ability is the threshold for Level 1.
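As a minimal sketch of that Level 1 behavior, the loop below lets a model decide for itself whether to answer directly or to request a tool. Everything here is an illustrative assumption rather than the book's code: `run_agent`, `fake_model`, `fake_search`, and the dictionary-shaped decision are hypothetical, and the keyword check only stands in for a real LLM's judgment.

```python
from typing import Callable

Tool = Callable[[str], str]
Model = Callable[[str], dict]


def run_agent(model: Model, tools: dict[str, Tool], question: str, max_steps: int = 3) -> str:
    """Let the model choose between answering and calling a tool."""
    prompt = question
    for _ in range(max_steps):
        decision = model(prompt)
        if decision["action"] == "answer":
            return decision["text"]
        # The model asked for a tool: run it and feed the result back.
        observation = tools[decision["tool"]](decision["input"])
        prompt = f"{question}\n\nTool result: {observation}"
    return "Stopped: step limit reached."


def fake_search(query: str) -> str:
    # Hypothetical stand-in for a real search tool.
    return "Show A; Show B"


def fake_model(prompt: str) -> dict:
    # Hypothetical stand-in for an LLM; a keyword check imitates its judgment.
    if "Tool result: " in prompt:
        return {"action": "answer", "text": "Recent shows: " + prompt.split("Tool result: ")[1]}
    if "recently" in prompt.lower():
        return {"action": "call_tool", "tool": "search", "input": prompt}
    return {"action": "answer", "text": "Answered from training data."}


tools = {"search": fake_search}
fresh = run_agent(fake_model, tools, "What new shows are there recently?")
static = run_agent(fake_model, tools, "What is a design pattern?")
print(fresh)
print(static)
```

The point of the sketch is that the caller never says "go search": the tool call happens only when the model's own decision asks for it.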

Level 2: Strategic Thinker

Two more elements are added: planning and Context Engineering. The book defines Context Engineering: not just piling up information, but carefully selecting, trimming, and packaging context. A clever example is given: a user wants to find a coffee shop between two locations. The Agent first calls the map tool to gather a bunch of data, then judges that "only the street names are needed next," trims the map output into a short list, and feeds it to the local search tool. Each step is about reducing noise in the information.
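The coffee-shop example can be sketched roughly as follows. This is an illustration under stated assumptions, not the book's implementation: the shape of `fake_route` and the helper names `trim_to_street_names` and `local_search_query` are hypothetical stand-ins for a real map tool and a real local search tool.

```python
def trim_to_street_names(route: dict) -> list[str]:
    """Keep only the unique street names, in order, from a verbose map response."""
    seen: set[str] = set()
    streets: list[str] = []
    for step in route.get("steps", []):
        name = step.get("street")
        if name and name not in seen:
            seen.add(name)
            streets.append(name)
    return streets


def local_search_query(streets: list[str], what: str) -> str:
    """Build the short, focused input handed to the next tool."""
    return f"{what} near " + ", ".join(streets)


# Hypothetical, hard-coded stand-in for a map tool's verbose output.
fake_route = {
    "distance_m": 2300,
    "duration_s": 1680,
    "steps": [
        {"street": "Main St", "instruction": "Head north", "polyline": "..."},
        {"street": "Main St", "instruction": "Continue", "polyline": "..."},
        {"street": "Oak Ave", "instruction": "Turn right", "polyline": "..."},
    ],
}

streets = trim_to_street_names(fake_route)
query = local_search_query(streets, "coffee shop")
print(query)  # coffee shop near Main St, Oak Ave
```

Distances, durations, and polylines never reach the second tool; only the two street names do.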

There’s a sentence in the book that I read several times: "To achieve the highest accuracy with AI, it must be given short, focused, and powerful context." Context Engineering is about doing this.

At this level, the Agent can also self-reflect. After completing a task, it reviews its work, identifies problems, and makes corrections on its own. I will elaborate on this later.

Level 3: Multi-Agent Collaboration

The book's stance is clear: stop thinking about creating an all-powerful super agent. The truly reliable approach is to build a team, like a project manager Agent + researcher Agent + designer Agent + copywriter Agent. The example given in the book is a new product launch: a "project manager Agent" coordinates everything, assigning tasks to "market research Agent," "product design Agent," and "marketing Agent." The key is communication: how Agents transmit data, synchronize states, and handle conflicts. This chapter illustrates six types of communication topologies, from the simplest single Agent to the most flexible custom mix, with explanations of which scenarios each is suitable for.

After reading these four levels, I suddenly understood why many people say, "My Agent is not useful." The model is not the problem; the issue is that you are treating it like a chatbot, and it may not even have reached Level 1.

Context Engineering: The Most Underestimated Concept in the Book

I wrote an article on Harness Engineering, discussing how track design is more important than engine horsepower. After reading this book, I realized that Context Engineering is Harness Engineering applied at the prompt level.

Traditional Prompt Engineering only cares about "how you ask." The book's Context Engineering concerns "what context is in front of the Agent before asking." It includes four layers of information:

First layer, system prompt. Defines who the Agent is, what tone to use, and what boundaries to set. Most people only write this layer.
Second layer, external data. Documents retrieved by RAG, return values from tool calls, real-time API data. This is where most people get stuck: they know they need to feed data but don’t know how to do it without overwhelming the model.
Third layer, implicit data. User identity, interaction history, environmental state. Things that are not explicitly stated but the Agent should know. For example, if you tell the Agent, "Help me send an email to John to confirm tomorrow's meeting," it should know what tomorrow's meeting is in your calendar and what your relationship with John is.
Fourth layer, feedback loop. After each output, the Agent automatically evaluates quality and adjusts the context strategy for the next time. The book refers to this as "automated context optimization," and Google’s Vertex AI Prompt Optimizer is an engineering implementation of this idea.
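One way to picture the four layers is as a single structure that is rendered into the text placed in front of the model. The sketch below is an assumption-laden illustration: `ContextBundle`, its field names, the section labels, and the sample data are all hypothetical, and the feedback layer is reduced to hand-written notes rather than the automated evaluation the book describes.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBundle:
    system_prompt: str                                            # layer 1
    external_data: list[str] = field(default_factory=list)        # layer 2
    implicit_data: dict[str, str] = field(default_factory=dict)   # layer 3
    feedback_notes: list[str] = field(default_factory=list)       # layer 4

    def render(self, max_external: int = 3) -> str:
        """Assemble the layers, capping external data to limit noise."""
        parts = [f"[system]\n{self.system_prompt}"]
        if self.implicit_data:
            facts = "\n".join(f"- {k}: {v}" for k, v in self.implicit_data.items())
            parts.append(f"[implicit]\n{facts}")
        if self.external_data:
            docs = "\n".join(f"- {d}" for d in self.external_data[:max_external])
            parts.append(f"[external]\n{docs}")
        if self.feedback_notes:
            notes = "\n".join(f"- {n}" for n in self.feedback_notes)
            parts.append(f"[feedback]\n{notes}")
        return "\n\n".join(parts)


# Hypothetical data for the "email John about tomorrow's meeting" example.
bundle = ContextBundle(
    system_prompt="You are an email assistant. Be brief and polite.",
    external_data=["Calendar: planning meeting tomorrow at 10:00", "doc 2", "doc 3"],
    implicit_data={"relationship to John": "colleague"},
    feedback_notes=["Last draft was too long; keep it under 80 words."],
)
prompt = bundle.render(max_external=2)
print(prompt)
```

The `max_external` cap is the part that matters most in practice: it forces a choice about which retrieved items deserve space, instead of feeding everything in.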

When I read this, I remembered a previous experience I shared in "AI agents are not magic," where I mentioned that "your agent needs rules, and many rules." Looking back, those rules are essentially the manual version of Context Engineering, which the book has systematized.

Reflection: Two Agents are Really Better than One

This is the most practically valuable pattern in the entire book for me.

The core of Reflection is simple: the Agent reviews its work after completing a task and makes corrections on its own. But the implementation method is crucial. The book clearly states: The Producer and Critic must use two different Agents, with different system prompts. A single persona reviewing its own work will always have blind spots. If you have the same LLM write code and then review its own code, it is very likely to say, "It’s pretty good."

The book provides a complete code example.

The Producer's prompt is "You are a Python developer, write a function to calculate the factorial, handling edge cases and exceptions."
The Critic's prompt is "You are a nitpicking senior engineer, review the code line by line, checking for bugs, style, missed edge cases, and areas for improvement. If it’s perfect, output CODE_IS_PERFECT; otherwise, list all the issues."
Then there’s a for loop: Producer writes code → Critic reviews → Producer makes changes based on feedback → Critic reviews again → until Critic says CODE_IS_PERFECT or the maximum iteration count is reached.

It’s that simple. But the book reminds us of a cost issue that is easily overlooked: each reflection loop is a new LLM call, and the more iterations, the more expensive it becomes. Additionally, as the conversation history expands, the context window gets filled with earlier versions and critiques, reducing the actual usable reasoning space. Therefore, the best practice for Reflection is: set a reasonable maximum iteration count (the book uses 3), and stop once the Critic is satisfied; don’t pursue perfection.
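A minimal sketch of that loop, assuming a generic `llm(system_prompt, message)` callable instead of any specific framework, might look like this. The `reflect` function and the `fake_llm` stub are hypothetical; only the two prompts, the `CODE_IS_PERFECT` stop signal, and the limit of 3 iterations come from the description above.

```python
from typing import Callable

PRODUCER_PROMPT = (
    "You are a Python developer, write a function to calculate the factorial, "
    "handling edge cases and exceptions."
)
CRITIC_PROMPT = (
    "You are a nitpicking senior engineer, review the code line by line. "
    "If it's perfect, output CODE_IS_PERFECT; otherwise, list all the issues."
)
STOP_TOKEN = "CODE_IS_PERFECT"

LLM = Callable[[str, str], str]  # (system_prompt, message) -> reply


def reflect(llm: LLM, task: str, max_iterations: int = 3) -> tuple[str, int]:
    """Return the final draft and the number of Critic reviews that ran."""
    draft = llm(PRODUCER_PROMPT, task)
    for round_number in range(1, max_iterations + 1):
        critique = llm(CRITIC_PROMPT, draft)
        if STOP_TOKEN in critique:
            return draft, round_number
        # Pass only the latest draft and critique, not the whole history.
        draft = llm(PRODUCER_PROMPT, f"{task}\n\nPrevious draft:\n{draft}\n\nCritique:\n{critique}")
    return draft, max_iterations


def fake_llm(system_prompt: str, message: str) -> str:
    # Hypothetical stub so the sketch runs without a real model.
    if system_prompt == CRITIC_PROMPT:
        return STOP_TOKEN if "raise ValueError" in message else "Negative input is not handled."
    if "Critique:" in message:
        return (
            "def factorial(n):\n"
            "    if n < 0:\n"
            "        raise ValueError('n must be >= 0')\n"
            "    return 1 if n < 2 else n * factorial(n - 1)"
        )
    return "def factorial(n):\n    return 1 if n < 2 else n * factorial(n - 1)"


final_code, reviews = reflect(fake_llm, "Write a factorial function.")
print(reviews)
print(final_code)
```

Sending the Producer only the most recent draft and critique is one possible way to keep the context window from filling up with earlier versions; it is a design choice of this sketch, not something the book prescribes.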

The uses extend far beyond writing code. Writing articles, making plans, summarizing documents, and solving logic problems can all apply the Producer-Critic model. The book lists seven application scenarios, with the core logic being the same: produce first, then review, and finally correct.

Multi-Agent: More Complex Is Not Better

What I liked most about the Multi-Agent Collaboration chapter is the six communication topology diagrams. Many people jump straight into complexity, but in most scenarios, three types are sufficient:

Single Agent (Independent Execution): Tasks can be broken down into independent sub-problems, and each Agent handles its own. Simple and easy to maintain.
Peer-to-Peer Network: Agents communicate directly with each other, with no central control node. Decentralized and fault-tolerant; if one Agent fails, it doesn’t affect the whole system. However, coordination costs are high, and it can easily become chaotic.
Supervisor (Central Coordination): A Supervisor Agent manages a group of Worker Agents. It allocates tasks, collects results, and resolves conflicts. Clear hierarchy and easy management. However, the Supervisor is a single point of failure and a performance bottleneck.

The other three (Supervisor-as-Tool, hierarchical, custom mix) are variations and combinations of the first three. The book states practically: The topology you need depends on the complexity of your task. The more fragmented the task, the higher the communication costs; at a certain point, the Supervisor model can be more efficient than hierarchical.
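To make the Supervisor topology concrete, here is a small sketch in which one coordinator hands sub-tasks to workers and collects the results. The `Supervisor` class, the role names, and the lambda workers are hypothetical placeholders for real Agents; the sketch covers only task allocation and result collection, not conflict resolution.

```python
from typing import Callable

Worker = Callable[[str], str]


class Supervisor:
    """Central coordinator: allocates sub-tasks and collects results."""

    def __init__(self, workers: dict[str, Worker]) -> None:
        self.workers = workers

    def run(self, assignments: dict[str, str]) -> dict[str, str]:
        results: dict[str, str] = {}
        for role, subtask in assignments.items():
            worker = self.workers.get(role)
            if worker is None:
                results[role] = f"unassigned: no worker for role '{role}'"
                continue
            try:
                results[role] = worker(subtask)
            except Exception as exc:
                # One failing worker should not take down the whole run.
                results[role] = f"failed: {exc}"
        return results


# Hypothetical workers standing in for LLM-backed Agents.
workers = {
    "market_research": lambda task: f"research notes for: {task}",
    "marketing": lambda task: f"campaign draft for: {task}",
}
supervisor = Supervisor(workers)
report = supervisor.run({
    "market_research": "size the market",
    "marketing": "write launch copy",
    "product_design": "sketch the packaging",
})
print(report)
```

Every result flows through `Supervisor.run`, which is exactly why this topology is easy to manage and also why the Supervisor becomes the single point of failure and the bottleneck.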

My experience is that many people spend 80% of their time on communication protocols when building Multi-Agents, forgetting to ask a more fundamental question: does this task really need multiple Agents? The book clearly states that a Level 2 single Agent with Reflection is often sufficient. Level 3 is meant for scenarios that a single Agent truly cannot handle.

The Three-Layer Memory Model: I Had a Vague Sense of It but Never Named It

The Memory chapter resonated with me the most because when I wrote the articles on Obsidian + Claude, I was constantly pondering a question: how should the Agent's memory be layered?

The book provides the answer:

Session (Conversation Layer): The context window of the current conversation, which is the shortest memory and disappears once the conversation ends. Long-context models simply enlarge this window, but essentially it’s still temporary, and each inference has to process the entire window, which is costly and slow.
State (State Layer): Temporary data during the current task. For example, "What is the current task?", "How far has it progressed?", "What data has been generated in between?". Longer than Session, but cleared once the task ends; the book uses Google ADK's State mechanism as a complete example.
Memory (Persistent Layer): Long-term memory that spans sessions and tasks. User preferences, learned experiences, and important historical decisions are stored in databases or vector stores and retrieved semantically. The book emphasizes an important point: Memory is not just about storage; it also requires designing a complete strategy for "what to store, when to store, and how to retrieve." Storing too much creates noise, while storing too little leaves the Agent without what it needs.
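The three layers and their different lifetimes can be sketched as follows. This is not Google ADK's API: `AgentMemory` and its methods are hypothetical, an in-memory dict stands in for a database or vector store, and a naive keyword match stands in for semantic retrieval.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    session: list[str] = field(default_factory=list)          # current conversation turns
    state: dict[str, str] = field(default_factory=dict)       # scratch data for the current task
    long_term: dict[str, str] = field(default_factory=dict)   # survives tasks and sessions

    def end_task(self, keep: tuple[str, ...] = ()) -> None:
        """Promote selected state keys to long-term memory, then clear state."""
        for key in keep:
            if key in self.state:
                self.long_term[key] = self.state[key]
        self.state.clear()

    def end_session(self) -> None:
        """The conversation window disappears when the session ends."""
        self.session.clear()

    def recall(self, query: str) -> list[str]:
        """Naive keyword match on keys; a placeholder for semantic retrieval."""
        words = set(query.lower().split())
        return [v for k, v in self.long_term.items() if words & set(k.lower().split("_"))]


memory = AgentMemory()
memory.session.append("user: please keep replies short")
memory.state["current_step"] = "drafting"
memory.state["tone_preference"] = "User prefers concise replies"

memory.end_task(keep=("tone_preference",))  # decide explicitly what is worth storing
memory.end_session()
print(memory.recall("tone"))  # ['User prefers concise replies']
```

The `keep` argument is where the "what to store, when to store" strategy lives: nothing reaches long-term memory unless something decides it should.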

In my previous article on Clawdbot, I mentioned "state files" and "workspace documents," which were essentially my manual attempts at building the State and Memory layers. The book gives that process a framework.

Five Assumptions, the Fifth is the Most Absurd

At the end of the book, five assumptions about the future of Agents are laid out. The first four are still within reasonable extrapolation: general-purpose Agents evolving from coding to project management, deeply personalized Agents that proactively discover your needs, embodied intelligence moving from screens into the physical world, and Agents becoming independent economic entities.

The fifth assumption shocked me: self-transforming multi-Agent systems.

You only declare a goal, such as "create an e-commerce business selling premium coffee." The system automatically decides: first create a "market research Agent" and a "branding Agent." After running some data, it judges that the branding Agent is no longer needed and splits it into three new Agents: "Logo Design Agent," "Website Building Agent," and "Supply Chain Agent." If the Website Building Agent becomes a bottleneck, the system will automatically duplicate three parallel Agents to work on different pages simultaneously. Throughout the process, the system continuously optimizes each Agent's prompt and reorganizes the team structure.

The book refers to this as a "goal-driven, self-transforming multi-Agent system." It is not executing a plan you wrote; it is generating its own plans, adjusting its plans, and reorganizing its execution team on its own.

This reminds me of Karpathy's AutoResearch: write a program.md, define goals, metrics, and boundaries, and hit "start." Humans are outside the loop. But this book pushes it further: even how the Agent team is formed and reorganized is left to the system to decide. Humans only declare "what they want."

Three Actions You Can Take Immediately

After finishing this book, here are three actions you can put into practice right away:

First, add a Critic to your current Agent. Whether you are using Claude Code, CrewAI, or a framework you built yourself, add a step at the end of your existing workflow: have another Agent (with a different system prompt) review the output of the previous step. Code generation plus code review, article writing plus fact-checking, planning plus feasibility assessment. It adds one more LLM call, but the quality improvement is often doubled. The Producer-Critic model in the book is plug-and-play.
Second, start doing Context Engineering, not just Prompt Engineering. Look back at the instruction files you wrote for the Agent. If they are all rules about "how you should do it," lacking context about "what environment you are facing right now," fill that in. Tell the Agent what project it is currently in, what decisions have been made previously, and what user preferences are. The Context Engineering chapter in the book and your AGENTS.md are two expressions of the same thing.
Third, don’t rush into Multi-Agent. Get your single Agent to Level 2: with tools, Reflection, and Memory. The book repeatedly emphasizes that a Level 2 single Agent combined with Producer-Critic and Context Engineering can cover the vast majority of practical scenarios. Level 3 is meant for tasks that truly require cross-domain, multi-stage, and parallel division of labor. Most people's problem is not that they lack enough Agents, but that they haven't optimized a single Agent.

This book runs 453 pages and is a 2025 Springer title. The code examples cover LangChain/LangGraph, Google ADK, CrewAI, and the OpenAI API. The foreword is written by the Google Cloud AI VP, and there is an unexpectedly well-written recommendation from the CIO of Goldman Sachs.

But the reason I recommend it is not for its "comprehensiveness." It’s because after reading it, you will realize one thing: the pitfalls you encountered with Agents over the past six months have already been organized into patterns by someone else. You don’t need to reinvent Reflection, you don’t need to guess how to layer Memory, and you don’t need to experiment with which communication topology to use for Multi-Agent.

Someone has drawn the map for you; all that’s left is to walk it.

Are you using AI Agents for development? What level is your current Agent at?
