CASE 03 / 04 · GOOGLE CLOUD

Foundational discovery · new product area

Designing for beginners
in a beginner industry.

Foundational research on the experience of building AI agents, revealing that even top engineering teams are beginners at it, and reshaping the product strategy around that.

PROJECT META

ORG

Google Cloud

SURFACE

Observability for AI agents (new product area)

ROLE

Lead Researcher

TEAM

Embedded with the agent observability product team

TIMEFRAME

Foundational discovery program

SAMPLE

In-depth interviews across a range of companies, from large enterprises to smaller teams, all building agents for internal use

METHODS

M.01 Foundational interviews

M.02 Customer discovery at top companies

M.03 Workflow mapping

M.04 Stakeholder synthesis

SENIOR SIGNALS IN THIS CASE

Critical thinking · Systems thinking · Impact & follow-through

§ 02

TL;DR

The team knew how to support observability for agents already running in production, but not for agents still under development, even though earlier research and customer calls had shown demand for it. I led a foundational study at top companies to map how teams actually build an agent, step by step. The finding: building agents is still new territory even at the most sophisticated companies, with no shared industry practices and a real hunger for purpose-built tools. That reshaped the product into a beginner-friendly tool with out-of-the-box dashboards and templates, and opened a new revenue stream for Google Cloud: specialized tooling for evaluating how well an agent performs.

§ 03

Senior signal · Systems thinking

Systems map

From a feature-scoping question to a market opportunity

The original framing was a feature question: what observability does someone building an agent need? I pushed to widen the scope before committing to features. A foundational study at top companies revealed something the product team's hypothesis hadn't accounted for: even sophisticated engineering teams are beginners at building agents, because the whole industry still is. That changed the question from “which features” to “which audience.” Designing for beginners in a beginner industry produces a completely different product from designing for experts.

Evaluation is the wall: Teams struggled with the basics of evaluation, such as choosing the right underlying model, judging whether an output was good enough, and measuring quality in any rigorous way. Most fell back on manual, ad-hoc checks.
Observability disappears once an agent goes live: The visibility teams leaned on while developing an agent largely fell away once it moved to a live environment, leaving them to troubleshoot a multi-step reasoning flow by hand.
No shared standards: With no common industry approach to building, evaluating, and observing agents, teams reinvented their own path each time, which made the work slow, inconsistent, and hard to repeat or hand off.

The opportunity hiding in the pain

The clearest unmet need across interviews was purpose-built tooling to evaluate agents rigorously rather than by hand. That demand, plus the absence of any standard way to do it, is exactly the open space that turned a feature study into a new revenue stream.

FIG. 03.1 · Before / after the reframe: the same audience, going from frustrated and lost to confident and fast, once the product met them as beginners.

A two-panel illustration. The left panel, labeled “Previous Assumption,” shows developers looking frustrated and confused, surrounded by question marks and notes reading “no industry standards” and “what are best practices?” The right panel, labeled “Key Insight: Guidance, Structure, & Guardrails,” shows developers smiling and confident in front of a beginner-friendly interface with opinionated templates, out-of-the-box dashboards, specialized agent evaluation tooling, and arrows pointing to a new revenue line.

Illustration generated with Gemini

FIG. 03.2 · Finding → strategy chain.

01 Original brief: What observability features do agent builders need? ↓

02 Scope expansion: Foundational study on how agents are built today ↓

03 Finding: Top engineering teams are beginners; the industry has no shared standards yet ↓

04 Reframe: Audience question, not feature question ↓

05 Product strategy: Beginner-first: out-of-the-box dashboards, templates, built-in guidance ↓

06 Business outcome: New revenue stream in specialized agent evaluation tooling ■

§ 04

Senior signal · Critical thinking

What I almost missed

I had built the study to over-represent senior AI engineers at top companies (the conventional sample for foundational research on a developer tool). Three interviews in, I noticed something off: even staff engineers at sophisticated companies were describing their agent-building workflow in tentative, exploratory language. They weren't experts. No one is, yet. If I had read that as “we need stronger participants,” I would have missed the finding. Instead, I treated the tentativeness itself as the data, and the study became a map of where the entire industry is, not just our sample. That reframe is what surfaced the beginner-first product strategy.

METHODOLOGICAL NOTE

Every study gets a deliberate falsification pass before write-up: a structured hunt for the strongest evidence that the conclusion is wrong.

§ 05

What changed because of the work

Impact

OUTCOME 01

New revenue stream

specialized tooling for evaluating how well an agent performs, surfaced by the research

OUTCOME 02

Design philosophy shift

from expert-first to beginner-first: out-of-the-box dashboards, templates, and built-in guidance

OUTCOME 03

Covers development, not just production

observability for the build-and-evaluate phase, not only for agents already running

OUTCOME 04

Strategic clarity

on where Google can lead, versus follow, as the industry settles on standards
