Personal Agent OS
A shared workspace for specialized agents, where each one gets only the context it needs.
What it taught me: agents get better when context is scoped, governed, and reviewed instead of shared indiscriminately.
Read moreProduct builder with a decade of PM experience and an engineering foundation.
I studied EECS at Berkeley and started out working as a software engineer before making the switch to product management, where I've now built a decade of experience at places like LinkedIn and Docusign. But I never really stopped missing the directness of building something myself, trying it, and seeing where it held up or fell apart.
Tools like Codex, Claude Code, Lovable, and Magic Patterns made that practical again. This site is a place to share a few things I have built, along with what they taught me about product judgment, AI workflows, and working with these tools.
Selected work
Each project has a short story about what it is, how it works, and what building it taught me.
A shared workspace for specialized agents, where each one gets only the context it needs.
What it taught me: agents get better when context is scoped, governed, and reviewed instead of shared indiscriminately.
Read moreA system for turning health research, news, and education into a clearer weekly update, with checks in place before anything goes out.
What it taught me: trust comes from retrieval, guardrails, and eval design, not just from a polished summary.
Read moreA stretching companion for desk workers that reduces the friction of deciding what to do, how long to do it, and what the body probably needs next.
What it taught me: recommendation systems feel smarter when they simplify the decision instead of exposing all the logic behind it.
Read moreA phone-first decision-support tool that recommends a club from the player's own bag with just enough context to be useful on the course.
What it taught me: transparent decision support can feel more trustworthy than a smarter-looking black box.
Read moreWhat I am learning
Some of these came from building AI systems. Some came from smaller design choices. The common thread is that building has a way of making the real product questions harder to ignore.
See longer notesThe same agent workflow can behave very differently under a different model, even when the harness, tools, and prompt are unchanged. That has made me think a lot more about regression testing and operational behavior, not just output quality.
One personal productivity workflow started to break once its working state got too large, which pushed me toward a smaller index up front and deeper context only when the task actually needed it.
The more I build systems that summarize, recommend, or act on behalf of a user, the less I trust a single pass/fail check. I keep coming back to layered evaluation: hard guardrails, softer rubrics, and a way to inspect what the system actually did.
In Desk Flex, skipping can mean not now, I already did this, or do not suggest it again. Treating those as the same signal would make the product feel dumber over time.