I tried to build a Compass-style AI agent dashboard. Here's where it broke.

The idea held up better than the implementation, and that gap is the actual lesson.

Razorpay's CPO Khilan Haria has talked publicly about an internal tool, Compass, that automates a lot of the busywork in a PM's job (drafting PRDs, specs, and prototypes, and coordinating UATs) so that PMs spend less time producing documents and more time on judgement: understanding customers, spotting market shifts, making product bets.

I wanted to see how hard that actually is to build, so I tried to make a small version of it myself. Four agents took one feature idea as input, and each independently drafted one artifact: a PRD, user stories, a prototype outline, or a UAT plan.

What worked immediately

The concept itself is sound, and you don't need sophisticated infrastructure to prove that. Splitting one feature idea into four narrow, single-purpose prompts, each constrained to one job, produced genuinely useful first drafts. The PRD agent didn't write a great PRD, but it wrote a usable skeleton in seconds. The same was true for the other three.

That's the actual headline, and it's worth sitting with: the fan-out pattern (one idea, many narrow agents, each owning a single artifact) works as a concept with almost no engineering effort. The hard part isn't the idea; it's everything around it.
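To make the pattern concrete, here is a minimal Python sketch of the fan-out under stated assumptions: the four prompt templates are illustrative wording, not taken from the original build, and `call_model` is a hypothetical stand-in for whatever model client you have (a function that takes a prompt and returns text). Because no agent reads another agent's output, all four can run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Illustrative prompt templates (hypothetical wording): one narrow job per agent.
AGENT_PROMPTS: Dict[str, str] = {
    "prd": "Write a one-page PRD skeleton for this feature: {idea}",
    "user_stories": "Write user stories with acceptance criteria for: {idea}",
    "prototype_outline": "Outline the screens and user flows for a prototype of: {idea}",
    "uat_plan": "Draft a UAT plan with test cases for: {idea}",
}


def fan_out(idea: str, call_model: Callable[[str], str]) -> Dict[str, str]:
    """Send one feature idea to every agent and collect one draft per artifact.

    `call_model` is a hypothetical stand-in for a real model client: it takes
    a prompt string and returns the model's text.
    """
    prompts = {name: tpl.format(idea=idea) for name, tpl in AGENT_PROMPTS.items()}
    # No agent depends on another's output, so all four run concurrently.
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        futures = {name: pool.submit(call_model, p) for name, p in prompts.items()}
        # .result() waits for each draft and re-raises any error from that agent.
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    # Offline fake model so the sketch runs without any API key.
    def fake_model(prompt: str) -> str:
        return "DRAFT for -> " + prompt

    for artifact, draft in fan_out("Saved cards for repeat customers", fake_model).items():
        print(f"{artifact}: {draft}")
```

Passing the model in as a plain function keeps the fan-out logic testable offline with a fake model, as the `__main__` block does, before any real client is wired up.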

Where it broke

My first build tried to call an AI API directly from inside an interactive widget, assuming it would just work the way it would in another environment. It didn't. The call failed silently: no API key was wired up where I'd assumed one would be, and the dashboard sat there showing "Agent failed to respond" while I stared at it, wondering what I'd missed.

The honest fix wasn't a clever workaround. It was admitting the live demo didn't work in that environment and running the same four prompts manually to get real output. That turns out to be closer to how these systems actually get built than I expected: someone runs the prompts by hand long before anyone wires up live infrastructure around them.

The lesson that mattered most: a demo that fails honestly teaches you more than a demo that fakes success. The moment something breaks, you find out exactly which assumption you didn't check.
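One cheap way to get that kind of honest failure is to check assumptions in code and report the real reason instead of a generic message. A minimal Python sketch, assuming the key lives in an environment variable; `MODEL_API_KEY`, `AgentError`, `require_api_key`, `run_agent`, and the `call_model` callable are all hypothetical names, not any provider's API.

```python
import os
from typing import Callable, Dict


class AgentError(RuntimeError):
    """An agent failure that carries the specific reason it failed."""


def require_api_key(env_var: str = "MODEL_API_KEY") -> str:
    # Hypothetical variable name; substitute whatever your provider expects.
    key = os.environ.get(env_var)
    if not key:
        # Fail before making any call, and say exactly what is missing.
        raise AgentError(f"{env_var} is not set; no model call was attempted")
    return key


def run_agent(name: str, prompt: str, call_model: Callable[[str], str]) -> Dict[str, str]:
    """Return a status record the dashboard can render, never a silent blank."""
    try:
        return {"agent": name, "status": "ok", "output": call_model(prompt)}
    except AgentError as err:
        return {"agent": name, "status": "error", "output": str(err)}
    except Exception as exc:
        # Keep the exception type and message instead of a generic failure string.
        return {"agent": name, "status": "error", "output": f"{type(exc).__name__}: {exc}"}


if __name__ == "__main__":
    def live_model(prompt: str) -> str:
        require_api_key()  # raises AgentError if the key isn't wired up
        raise NotImplementedError("plug in a real model client here")

    print(run_agent("prd", "Write a PRD skeleton for: saved cards", live_model))
```

The design choice is that `run_agent` always returns a status record, so a missing key shows up on the dashboard as "MODEL_API_KEY is not set" rather than as a generic "Agent failed to respond".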

What a real version actually needs

Sitting with the failure forced me to think harder about what separates a toy demo from something like Compass v2. Three things, mainly:

What I'd tell another PM trying this

Build the failure first. Don't reach for the most polished version of the idea before you've tried the simplest version and watched it not quite work. The gap between "this concept clearly works" and "this concept is actually production-ready" is where almost all of the real product thinking lives, and you only see that gap by trying to build the thing, not by reading about someone else's version of it.
