CASE 03 / 04 · GOOGLE CLOUD
Foundational discovery · new product area

Designing for beginners
in a beginner industry.

Foundational research on the experience of building AI agents, revealing that even top engineering teams are beginners at it, and reshaping the product strategy around that.

PROJECT META
ORG
Google Cloud
SURFACE
Observability for AI agents (new product area)
ROLE
Lead Researcher
TEAM
Embedded with the agent observability product team
TIMEFRAME
Foundational discovery program
SAMPLE
In-depth interviews across a range of companies, from large enterprises to smaller teams, all building agents for internal use
METHODS
M.01 Foundational interviews
M.02 Customer discovery at top companies
M.03 Workflow mapping
M.04 Stakeholder synthesis
SENIOR SIGNALS IN THIS CASE
Critical thinking · Systems thinking · Impact & follow-through
§ 02

TL;DR

The team knew how to support observability for agents already running in production, but not for agents still under development, even though earlier research and customer calls had shown demand for it. I led a foundational study at top companies to map how teams actually build an agent, step by step. The finding: building agents is still new territory even at the most sophisticated companies, with no shared industry practices and a real hunger for purpose-built tools. That reshaped the product into a beginner-friendly tool with out-of-the-box dashboards and templates, and opened a new revenue stream for Google Cloud: specialized tooling for evaluating how well an agent performs.

§ 03
Senior signal · Systems thinking

Systems map

From a feature-scoping question to a market opportunity

The original framing was a feature question: what observability does someone building an agent need? I pushed to widen the scope before committing to features. A foundational study at top companies revealed something the product team's hypothesis hadn't accounted for: even sophisticated engineering teams are beginners at building agents, because the whole industry still is. That changed the question from “which features” to “which audience.” Designing for beginners in a beginner industry calls for a completely different product than designing for experts does.

  • Evaluation is the wall: Teams struggled with the basics of evaluation: choosing the right underlying model, judging whether an output was good enough, and measuring quality in any rigorous way. Most fell back on manual, ad-hoc checks.
  • Observability disappears once an agent goes live: The visibility teams leaned on while developing an agent largely fell away once it moved to a live environment, leaving them to troubleshoot a multi-step reasoning flow by hand.
  • No shared standards: With no common industry approach to building, evaluating, and observing agents, teams reinvented their own path each time, which made the work slow, inconsistent, and hard to repeat or hand off.
The opportunity hiding in the pain

The clearest unmet need across interviews was purpose-built tooling to evaluate agents rigorously rather than by hand. That demand, plus the absence of any standard way to do it, is exactly the open space that turned a feature study into a new revenue stream.

FIG. 03.1 · Before / after the reframe: the same audience, going from frustrated and lost to confident and fast, once the product met them as beginners.
A two-panel illustration. The left panel, labeled “Previous Assumption,” shows developers looking frustrated and confused, surrounded by question marks and notes reading “no industry standards” and “what are best practices?” The right panel, labeled “Key Insight: Guidance, Structure, & Guardrails,” shows developers smiling and confident in front of a beginner-friendly interface with opinionated templates, out-of-the-box dashboards, specialized agent evaluation tooling, and arrows pointing to a new revenue line.
Illustration generated with Gemini
FIG. 03.2 · Finding → strategy chain.
01 Original brief: What observability features do agent builders need?
02 Scope expansion: Foundational study on how agents are built today
03 Finding: Top engineering teams are beginners; the industry has no shared standards yet
04 Reframe: Audience question, not feature question
05 Product strategy: Beginner-first: out-of-the-box dashboards, templates, built-in guidance
06 Business outcome: New revenue stream in specialized agent evaluation tooling
§ 04
Senior signal · Critical thinking

What I almost missed

I had built the study to over-represent senior AI engineers at top companies (the conventional sample for foundational research on a developer tool). Three interviews in, I noticed something off: even staff engineers at sophisticated companies were describing their agent-building workflow in tentative, exploratory language. They weren't experts. No one is, yet. If I had read that as “we need stronger participants,” I would have missed the finding. Instead, I treated the tentativeness itself as the data, and the study became a map of where the entire industry is, not just our sample. That reframe is what surfaced the beginner-first product strategy.

METHODOLOGICAL NOTE

Every study gets a deliberate falsification pass before write-up: a structured hunt for the strongest evidence that the conclusion is wrong.

§ 05
What changed because of the work

Impact

OUTCOME 01
New revenue stream
specialized tooling for evaluating how well an agent performs, surfaced by the research
OUTCOME 02
Design philosophy shift
from expert-first to beginner-first: out-of-the-box dashboards, templates, and built-in guidance
OUTCOME 03
Covers development, not just production
observability for the build-and-evaluate phase, not only for agents already running
OUTCOME 04
Strategic clarity
on where Google can lead, versus follow, as the industry settles on standards