Razorpay's CPO Khilan Haria has talked publicly about an internal tool that automates much of the busywork in a PM's job (drafting PRDs, specs, and prototypes, and coordinating UATs) so that PMs spend less time producing documents and more time on judgement: understanding customers, spotting market shifts, and making product bets.
I wanted to see how hard that actually is to build, so I tried to make a small version of it myself. Four agents, one feature idea as input, each agent independently drafting one artifact: a PRD, user stories, a prototype outline, a UAT plan.
What worked immediately
The concept itself is sound, and you don't need sophisticated infrastructure to prove that. Splitting one feature idea into four narrow prompts, each constrained to one job, produced genuinely useful first drafts. The PRD agent didn't write a great PRD, but it wrote a usable skeleton in seconds. The same was true for the other three.
That's the actual headline, and it's worth sitting with: the fan-out pattern (one idea, many narrow agents, each owning a single artifact) works conceptually with almost no engineering effort. The hard part isn't the idea. It's everything around it.
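As a minimal sketch, the fan-out can be expressed in a few lines of Python. The artifact instructions and the example feature idea below are hypothetical, and no model is called: the function only builds the four narrow prompts from one input.

```python
# Hypothetical per-artifact instructions; the wording is illustrative only.
ARTIFACTS = {
    "prd": "Draft a PRD skeleton: problem, goals, non-goals, open questions.",
    "user_stories": "Write user stories in 'As a..., I want..., so that...' form.",
    "prototype_outline": "Outline the screens and key interactions for a prototype.",
    "uat_plan": "Write a UAT plan: scenarios, expected results, sign-off criteria.",
}


def fan_out(feature_idea: str) -> dict[str, str]:
    """Build one single-purpose prompt per artifact from one feature idea."""
    idea = feature_idea.strip()
    if not idea:
        raise ValueError("feature_idea must not be empty")
    return {
        name: (
            f"You are responsible for exactly one artifact: {name}.\n"
            f"Task: {instruction}\n"
            f"Feature idea: {idea}\n"
            "Do not produce any other artifact."
        )
        for name, instruction in ARTIFACTS.items()
    }


# Hypothetical feature idea, used only to show the shape of the output.
prompts = fan_out("Let merchants schedule payouts for a future date")
for name, prompt in prompts.items():
    print(f"--- {name} ---\n{prompt}\n")
```

Each prompt can then be sent to a model or pasted into a chat window. The constraint that matters is that every agent sees the same idea but owns exactly one artifact.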
Where it broke
My first build tried to call an AI API directly from inside an interactive widget, expecting it to just work the way it would in a different environment. It didn't. The call failed silently, no API key was wired up where I'd assumed one would be, and the dashboard sat there saying "Agent failed to respond" while I stared at it wondering what I'd missed.
The honest fix wasn't a clever workaround. It was admitting the live demo didn't work in that environment, and running the same four prompts manually to get real output. Which, it turns out, is closer to how these systems actually get built in practice than I expected: someone runs the prompts by hand long before anyone wires up live infrastructure around them.
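One way to make that failure loud instead of silent is to check for the key before any call and fall back to a manual run when it is missing. This is a sketch under assumptions: it targets a local Python script rather than the widget environment described above, and the environment variable name `LLM_API_KEY` and the function `plan_run` are hypothetical.

```python
import os

# Hypothetical variable name; use whatever your provider's client expects.
API_KEY_ENV = "LLM_API_KEY"


def plan_run(prompts: dict[str, str], key_env: str = API_KEY_ENV) -> dict:
    """Check for the API key up front so a missing key is reported, not swallowed."""
    if os.environ.get(key_env):
        return {"mode": "live", "reason": None, "prompts": prompts}
    # No key available: state why, and hand the prompts back for manual use.
    return {"mode": "manual", "reason": f"{key_env} is not set", "prompts": prompts}


plan = plan_run({"prd": "Draft a PRD skeleton for: scheduled payouts"})
if plan["mode"] == "manual":
    print(f"Live run unavailable ({plan['reason']}). Paste these prompts by hand:")
    for name, prompt in plan["prompts"].items():
        print(f"--- {name} ---\n{prompt}")
else:
    print("API key found; a live run is possible.")
```

The manual branch is not a consolation prize. It produces the same prompts the live path would send, so the drafts can be judged before any infrastructure exists.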
What a real version actually needs
Sitting with the failure forced me to think harder about what separates a toy demo from something like Compass v2. Three things, mainly:
- Grounding, not just prompting. My agents only had a one-line feature idea. A real version needs the actual funnel data, prior PRDs, the org's own templates, and existing API conventions, or every output is generic and shallow.
- Chaining, not just fan-out. My four agents ran independently and never talked to each other. A real system probably has the PRD agent's output feed into the stories agent's prompt, so the stories are grounded in an agreed problem statement, not independently guessed.
- A human review gate that's actually used. The whole point isn't to remove the PM from the loop. It's to move the PM from authoring to reviewing, and reviewing well is its own skill that doesn't get built by accident.
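The three points above can be sketched together in one small function. Everything here is an assumption for illustration: `run_chain`, its parameters, and the stub model are hypothetical, and the stub stands in for a real API call so the example runs anywhere. The grounding context goes into the PRD prompt, a reviewer must approve the PRD, and only the approved PRD is fed into the stories prompt.

```python
from typing import Callable, Optional


def run_chain(
    feature_idea: str,
    context: str,
    generate: Callable[[str], str],
    approve: Callable[[str, str], bool],
) -> dict[str, Optional[str]]:
    """Chain PRD -> user stories, with grounding context and a review gate."""
    # Grounding: the PRD prompt carries org context, not just the one-line idea.
    prd_prompt = (
        f"Context:\n{context}\n\n"
        f"Feature idea: {feature_idea}\n\n"
        "Draft a PRD skeleton."
    )
    prd = generate(prd_prompt)

    # Review gate: a human (or a stand-in) must accept the PRD before chaining.
    if not approve("prd", prd):
        return {"prd": prd, "user_stories": None, "status": "prd_rejected"}

    # Chaining: the stories prompt is built from the approved PRD.
    stories_prompt = (
        f"Agreed PRD:\n{prd}\n\n"
        "Write user stories that address only the problem stated in this PRD."
    )
    stories = generate(stories_prompt)
    return {"prd": prd, "user_stories": stories, "status": "stories_drafted"}


# Stub model that records each prompt it receives; replace with a real call.
calls: list[str] = []


def fake_generate(prompt: str) -> str:
    calls.append(prompt)
    return f"[draft {len(calls)}]"


result = run_chain(
    "Let merchants schedule payouts for a future date",  # hypothetical idea
    "PRD template: Problem / Goals / Metrics",           # hypothetical context
    fake_generate,
    lambda name, draft: True,  # auto-approve, for the demo only
)
print(result["status"])
print(calls[1])
```

Passing the reviewer in as a function keeps the gate explicit: the chain cannot reach the stories step without a decision on the PRD.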
What I'd tell another PM trying this
Build the failure first. Don't reach for the most polished version of the idea before you've tried the simplest version and watched it not quite work. The gap between "this concept clearly works" and "this concept is actually production-ready" is where almost all of the real product thinking lives, and you only see that gap by trying to build the thing, not by reading about someone else's version of it.