Note: I stopped writing posts in 2017 and started again in late 2024, mostly about AI.

Four Weekends Building Munshi: Notes on Product Thinking and AI Development

Aug 17, 2025 | AI

It happens to most of us multiple times a week: someone emails asking for a good time to meet, and before you know it, you’re stuck in a back-and-forth scheduling spiral. It’s a mundane friction that adds up. Four weekends ago, I decided to vibe-code my way to a solution. The result? A fragile but functional AI-native assistant I called Munshi (Hindi for secretary, pronounced moon-she). You can find it at munshi.pallavsharda.com. The project surfaced a few meta takeaways about product thinking in an AI-native workflow.

Weekend 1: The Naive Beginning

The concept felt straightforward: use an LLM (Gemini) to understand the meeting request, hook it up to the Google Calendar API, and respond conversationally with available slots. I crafted a verbose prompt describing the flow and fired up Replit.

But by the end of a $35 compute sprint, I was staring at hallucinated times and buggy calendar logic. A third of the responses were just wrong. Clearly, the problem needed more than prompting.

Weekend 2: System Design (Reluctantly)

Digging into the auto-generated code, I found the agent had chosen regex to parse user input instead of letting Gemini handle it. When I asked why, the agent admitted it was a poor choice. ‘You are absolutely right!’—if I had a refund for every time I saw that response…

This triggered my first structured step: I specified that Munshi must extract four critical elements from user input—date, time, duration, and location. Even with that clarity, things remained shaky.
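
For a sense of what that specification amounted to, here is a minimal sketch of the extraction contract, assuming the google-generativeai Python client; the dataclass and function names are illustrative, not Munshi’s actual code.

```python
# Sketch of the "extract the critical elements" step, assuming the
# google-generativeai client and that genai.configure(api_key=...) has
# already been called. Names here are illustrative placeholders.
import json
from dataclasses import dataclass
from typing import Optional

import google.generativeai as genai

EXTRACTION_PROMPT = """Extract the meeting request details from the message below.
Return ONLY a JSON object with keys: date, time, duration, location.
Use null for any detail the user did not specify.

Message: {message}"""

@dataclass
class MeetingRequest:
    date: Optional[str]      # may still be vague, e.g. "early September"
    time: Optional[str]      # e.g. "10am" or "early morning"
    duration: Optional[str]  # e.g. "30 minutes"
    location: Optional[str]  # e.g. "Google Meet" or "coffee shop"

def extract_request(message: str) -> MeetingRequest:
    model = genai.GenerativeModel(
        "gemini-1.5-flash",
        # Ask for raw JSON so no markdown fences sneak into the response.
        generation_config={"response_mime_type": "application/json"},
    )
    response = model.generate_content(EXTRACTION_PROMPT.format(message=message))
    return MeetingRequest(**json.loads(response.text))
```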

Weekend 3: Prompt Discipline and Language Ambiguities

I started focusing on edge cases. How to interpret “early September” or “as soon as possible” or “early morning”? I designed a test suite and added checkpoint validations. Once I treated these vague inputs as interpretable patterns and added tests, the system started showing some promise.
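
To give a flavor of those checks, here is a hedged sketch of a parametrized test over vague phrasings; the parser stub and the expected windows are placeholders I made up, not the real suite.

```python
# Illustrative pytest sketch of the vague-phrase checks. Both the parser
# stub and the expected windows are placeholders, not Munshi's real logic.
from datetime import date, timedelta
import pytest

def interpret_time_phrase(phrase: str, today: date):
    """Toy stand-in for the real interpreter: map vague phrases to ranges."""
    if phrase == "early September":
        year = today.year if today.month <= 9 else today.year + 1
        return (f"{year}-09-01", f"{year}-09-10")
    if phrase == "as soon as possible":
        return (today.isoformat(), (today + timedelta(days=2)).isoformat())
    if phrase == "early morning":
        return ("07:00", "09:00")
    raise ValueError(f"unhandled phrase: {phrase}")

TODAY = date(2025, 8, 1)  # pin "today" so the tests are deterministic

@pytest.mark.parametrize(
    "phrase, expected",
    [
        ("early September", ("2025-09-01", "2025-09-10")),
        ("as soon as possible", ("2025-08-01", "2025-08-03")),
        ("early morning", ("07:00", "09:00")),
    ],
)
def test_vague_phrase_resolves_to_explicit_window(phrase, expected):
    # Vague input must map to an explicit range, never a hallucinated slot.
    assert interpret_time_phrase(phrase, today=TODAY) == expected
```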

Still, it wasn’t enough. I only started getting somewhere when I asked the AI to analyze what was wrong, propose a fix, and wait for my approval. This shifted the dynamic from blind patching to co-engineering.

Weekend 4: Architecting Munshi

I decided to treat this like a real system. I specified clear modules and asked the agent to refactor the code accordingly (a sketch of how they fit together follows the list):

  • Router: Orchestrates conversation state and decides when to invoke the LLM (vs. simple parsing/validation).
  • RequestDetailsCurator: Extracts the three mandatory details in the order Duration → Date → Time, and creates a structured request from generic input (e.g., “morning,” “early September”) when needed.
  • InputValidator: Validates all fields before API calls to avoid LLM hallucinations leaking into downstream steps.
  • CalendarHandler: Manages Google Calendar API and supports multiple calendars (work, personal).
  • Matchmaker: Filters raw availability and prioritizes options according to user preferences inferred by RequestDetailsCurator.
  • Concluder: Confirms acceptance and collects the fourth detail—Location (defaults to Google Meet; if in-person, gathers specifics). Conversation wrap-up in natural language is surprisingly tricky.
  • ConversationStateHandler: Keeps track of user context.
  • ErrorHandler: Centralizes error logic.
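
To make the division of labor concrete, here is a minimal sketch of how the Router might thread these modules together; the method names and signatures are my assumptions based on the descriptions above, not Munshi’s actual code.

```python
# Minimal sketch of the Router's orchestration, paraphrased from the
# module list above. All signatures are illustrative assumptions.
class Router:
    def __init__(self, curator, validator, calendar, matchmaker, concluder, state):
        self.curator = curator          # RequestDetailsCurator
        self.validator = validator      # InputValidator
        self.calendar = calendar        # CalendarHandler
        self.matchmaker = matchmaker    # Matchmaker
        self.concluder = concluder      # Concluder
        self.state = state              # ConversationStateHandler

    def handle(self, message: str) -> str:
        self.state.record_user_message(message)
        # Extract Duration -> Date -> Time; ask a follow-up if any are missing.
        request = self.curator.extract(message, context=self.state)
        if request.missing_fields():
            return self.curator.next_question(request)
        # Validate before touching the Calendar API, so hallucinated values
        # never leak downstream.
        self.validator.check(request)
        slots = self.calendar.free_slots(request)
        ranked = self.matchmaker.rank(slots, preferences=request.preferences)
        return self.concluder.propose(ranked, request)
```

More than the specifics, the point of this shape is that every failure now has an obvious home: a bad extraction implicates the curator, a bad slot implicates the calendar handler or matchmaker.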

Once these components were delineated, bugs became easier to localize and fix. I also created a single prompts.py file to house every prompt in the system, gating all edits through manual approval.
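
The shape of that file was simple. Here is a hedged sketch with placeholder constant names and prompt text, not the real prompts:

```python
# prompts.py -- single source of truth for every prompt in the system.
# Sketch only: constant names and prompt text here are placeholders.
# Convention: edits to this file require manual review before merging.

EXTRACTION_PROMPT = """You are Munshi, a scheduling assistant.
Extract duration, date, and time from the user's message.
Return JSON with keys: duration, date, time. Use null if unspecified."""

CLARIFY_DATE_PROMPT = """The user gave a vague date ("{phrase}").
Ask one short follow-up question to pin it to a concrete range."""

CONCLUDER_PROMPT = """The user accepted a slot. Confirm the meeting,
then ask whether it is on Google Meet (the default) or in person."""
```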

Lessons Learned

  • Don’t be lazy just because it’s AI: I assumed vibe-coding would do the heavy lifting. By weekend two, I had only specified the four input parameters, hoping the rest would fall into place. What I did on weekend four—designing clean modules and owning the flow—should have come first.

  • Ask the AI to investigate, not fix: Early on, the agent would instantly patch issues I pointed out, often making silent changes. I lost track of what was broken vs. what was “fixed.” Now I always ask it to diagnose and propose solutions—nothing gets implemented without review.

  • Treat prompts as sacred ground: Prompts were embedded in multiple places and kept changing without oversight. Centralizing all prompts in a single file and requiring manual approval for every edit reduced hallucinations and prevented a few near-misses. Also, the Replit agent kept redundant prompt copies, which was odd.

  • Testing matters more with LLMs: Because behavior is jagged—astonishingly good in some places and oddly brittle in others (e.g., miscounting letters in “strawberry”)—you need wide-spectrum tests run often. In Munshi, this became the early-warning system for unintended regressions introduced by unrelated “fixes”.

What’s Next for Munshi?

Munshi is usable but fragile; it works in clean paths and still stumbles on ambiguous phrasing and cross-module regressions. I’m planning a full rebuild, likely using OpenAI Codex or Claude. Next feature ideas: (1) offer alternatives when requested criteria don’t match any availability, (2) send an email invite automatically once the user accepts, and (3) improve error messaging and recovery paths.

The Lines Between Product and Engineering Are Shifting

Munshi gave me a window into how product thinking and AI-native tooling can accelerate early-stage prototyping. With a clear user journey and a conceptual architecture, you can get surprisingly far, almost to the finish line. Great software engineering still matters for scale and reliability, but early code is increasingly throwaway; we can afford to waste it (much like the 1970s Silicon Valley mantra of wasting transistors). My pattern now: start with quick, nascent tinkering in an AI-native dev platform (Replit, Bolt) to refine the concept and surface kinks, then switch to a frontier model tuned for coding to harden the design.

It’s still early, jagged edges and all, and product thinking matters even more now.