03 · Case 03 / 04 · Google Cloud

Designing for beginners in a beginner industry

Foundational research on how teams build AI agents. It revealed that even top engineering teams are beginners at it, and it reshaped the product strategy around that finding.

Org

Google Cloud

Surface

Observability for AI agents (new product area)

Role

Lead Researcher

Team

Embedded with the agent observability product team

Timeframe

Foundational discovery program

Sample

In-depth interviews across companies from large enterprises to smaller teams, all building agents for internal use

Methods

M.01 Foundational interviews
M.02 Customer discovery at top companies
M.03 Workflow mapping
M.04 Stakeholder synthesis

Senior signals

Critical thinking · Systems thinking · Impact & follow-through

TL;DR

The team knew how to support observability for agents already running in production, but not for agents still under development, even though earlier research and customer calls had shown demand for it. I led a foundational study at top companies to map how teams actually build an agent, step by step. The finding: building agents is still new territory even at the most sophisticated companies, with no shared industry practices and a real hunger for purpose-built tools. That reshaped the product into a beginner-friendly tool with out-of-the-box dashboards and templates, and opened a new revenue stream for Google Cloud: specialized tooling for evaluating how well an agent performs.

Systems map · Systems thinking

From a feature-scoping question to a market opportunity

The original framing was a feature question: what observability does someone building an agent need? I pushed to widen the scope before committing to features. A foundational study at top companies revealed something the product team’s hypothesis hadn’t accounted for: even sophisticated engineering teams are beginners at building agents, because the whole industry still is. That changed the question from “which features” to “which audience.” Designing for beginners in a beginner industry yields a completely different product than designing for experts.

Evaluation is the wall. Teams struggled with the basics of evaluation: choosing the right underlying model, judging whether an output was good enough, and measuring quality in any rigorous way. Most fell back on manual, ad-hoc checks.
Observability disappears once an agent goes live. The visibility teams leaned on while developing an agent largely fell away once it moved to a live environment, leaving them to troubleshoot a multi-step reasoning flow by hand.
No shared standards. With no common industry approach to building, evaluating, and observing agents, teams reinvented their own path each time, which made the work slow, inconsistent, and hard to repeat or hand off.

The opportunity hiding in the pain

The clearest unmet need across interviews was purpose-built tooling to evaluate agents rigorously rather than by hand. That demand, plus the absence of any standard way to do it, is exactly the open space that turned a feature study into a new revenue stream.

A two-panel illustration. The left panel, labeled Previous Assumption, shows developers looking frustrated and confused, surrounded by question marks and notes asking about industry standards and best practices. The right panel, labeled Key Insight: Guidance, Structure, and Guardrails, shows developers smiling and confident in front of a beginner-friendly interface with opinionated templates, out-of-the-box dashboards, specialized agent evaluation tooling, and arrows pointing to a new revenue line.

Fig. 03.1 · Before / after the reframe: the same audience, going from frustrated and lost to confident and fast, once the product met them as beginners. Illustration generated with Gemini.

Finding → strategy chain

From a brief to a business outcome

Original brief

What observability features do agent builders need?

Scope expansion

A foundational study on how agents are built today.

Finding

Top engineering teams are beginners; the industry has no shared standards yet.

Reframe

An audience question, not a feature question.

Product strategy

Beginner-first: out-of-the-box dashboards, templates, built-in guidance.

Business outcome

A new revenue stream in specialized agent evaluation tooling.

What I almost missed · Critical thinking

What I almost missed

I had built the study to over-represent senior AI engineers at top companies (the conventional sample for foundational research on a developer tool). Three interviews in, I noticed something off: even staff engineers at sophisticated companies were describing their agent-building workflow in tentative, exploratory language. They weren’t experts. No one is, yet. If I had read that as “we need stronger participants,” I’d have missed the finding. Instead, I treated the tentativeness itself as the data, and the study became a map of where the entire industry is, not just our sample. That reframe is what surfaced the beginner-first product strategy.

Methodological note

Every study gets a deliberate falsification pass before write-up: a structured hunt for the strongest evidence that the conclusion is wrong.

Impact

What changed because of the work

Outcome 01

New revenue stream

Specialized tooling for evaluating how well an agent performs, surfaced by the research.

Outcome 02

Beginner-first

A design philosophy shift from expert-first to beginner-first: out-of-the-box dashboards, templates, and built-in guidance.

Outcome 03

Build, not just run

Observability for the build-and-evaluate phase, not only for agents already in production.

Outcome 04

Strategic clarity

Clarity on where Google can lead, versus follow, as the industry settles on standards.