Building MK ai: From a Stateless Bot to a Context-Aware AI WhatsApp Assistant
MK ai did not start as a complex system. It began as a simple experiment: connect AI to WhatsApp and make it reply.
What it became is something very different - a context-aware, location-intelligent, persistent AI assistant deployed in production.
This is the story of that evolution.
Version 1: The Stateless AI Bot
The first iteration of this project was a straightforward integration between the WhatsApp Business API and the Gemini API.
The architecture was simple:
- Receive a message from WhatsApp.
- Send the message to Gemini.
- Return the AI-generated response.
- End the request.
There was no database.
No memory.
No context retention.
Each message was treated independently.
The project is still available here:
GitHub Repository:
https://github.com/Manasess896/Whatsapp-Bot
At the time, the goal was proof of concept — confirm that:
- Webhooks worked correctly.
- AI responses could be generated dynamically.
- Messages could be sent back through WhatsApp reliably.
It worked.
But it had limitations.
The bot forgot everything immediately. Conversations felt artificial because there was no continuity between messages.
Then the Gemini API documentation changed.
That update forced me to revisit the implementation. Instead of patching the existing system, I made a decision: rebuild the architecture properly.
That decision led to the rebranding.
Rethinking the Architecture
If I was rebuilding the project, I wanted to solve the original limitations:
- Add conversational memory
- Improve inference speed
- Introduce contextual intelligence
- Make it production-ready
Phase 1: WhatsApp Cloud API Integration
The foundation remained the WhatsApp Cloud API via Meta’s developer platform.
The setup involved:
- Creating a Business App
- Enabling WhatsApp Cloud API
- Generating a test phone number
- Configuring webhook verification
Meta requires a verification handshake to confirm server ownership. I implemented a verification route using Flask:
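The original route isn't shown here, but the handshake is well defined: Meta sends a GET request with `hub.mode`, `hub.verify_token`, and `hub.challenge` query parameters, and the server must echo the challenge back. A minimal sketch (the `/webhook` path and environment variable name are assumptions, not the actual implementation):

```python
import os

from flask import Flask, request

app = Flask(__name__)

# The verify token is an arbitrary secret you choose and also enter in the
# Meta developer dashboard; reading it from the environment is assumed here.
VERIFY_TOKEN = os.environ.get("VERIFY_TOKEN", "my-secret-token")

@app.route("/webhook", methods=["GET"])
def verify_webhook():
    # Meta sends hub.mode, hub.verify_token and hub.challenge as query params.
    mode = request.args.get("hub.mode")
    token = request.args.get("hub.verify_token")
    challenge = request.args.get("hub.challenge")

    if mode == "subscribe" and token == VERIFY_TOKEN:
        # Echo the challenge back to complete the handshake.
        return challenge, 200
    return "Verification failed", 403
```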
Once verified, the system could receive real-time message events.
Phase 2: Local Development with Ngrok
Meta requires a public HTTPS endpoint. Ngrok was used to expose the local Flask server:
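The command itself isn't shown in the source; for a Flask dev server on its default port, it would look something like this (port 5000 is an assumption):

```shell
# Tunnel the local Flask server (default port 5000) to a public HTTPS URL
ngrok http 5000
```

The HTTPS forwarding URL that ngrok prints is then pasted into the webhook configuration in the Meta developer dashboard.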
This allowed live webhook testing without deploying the application repeatedly.
Phase 3: Switching AI Providers
In the first version, I used Gemini. During the rebuild, I reviewed the updated documentation and pricing for the Gemini API and decided to look for an alternative. I transitioned to Groq, which serves Llama-based models, for its clear documentation and pricing, and a free tier with reasonable rate limits. The integration was straightforward:
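A minimal sketch of that integration, assuming the official `groq` Python SDK; the model choice and helper names here are illustrative, not the actual implementation:

```python
import os

# Simplified stand-in for the full structured system prompt.
SYSTEM_PROMPT = "You are MK ai, a helpful WhatsApp assistant."

def build_messages(history, user_message):
    """Assemble the chat payload: system prompt, prior turns, then the new message."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history)  # e.g. alternating user/assistant dicts
    messages.append({"role": "user", "content": user_message})
    return messages

def generate_reply(history, user_message):
    """Call Groq's chat completions endpoint (requires GROQ_API_KEY)."""
    from groq import Groq  # pip install groq
    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    completion = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # assumed model; pick from Groq's catalog
        messages=build_messages(history, user_message),
    )
    return completion.choices[0].message.content
```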
However, the real work was in prompt design. I created a structured system prompt that:
- Defined the bot's tone and behavior
- Enforced strict privacy guidelines
- Reduced hallucination risks
- Maintained conversational consistency
Phase 4: Introducing Persistent Memory with MongoDB
The biggest architectural upgrade was adding memory. I introduced MongoDB as the persistence layer. Its document structure aligned naturally with webhook payloads and message history storage.
The conversational pipeline became:
- Receive message
- Save message to database
- Retrieve last 30 messages
- Generate AI response using conversation history
- Save AI response
This enabled contextual continuity across sessions. The debugging phase here was significant. Early versions responded correctly but failed to store data consistently. Refactoring the order of database writes and AI generation resolved synchronization issues. The result was a bot that could reference prior discussions naturally.
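The pipeline above can be sketched as follows; `db` stands for a pymongo database handle and `generate_reply` for the AI call, both assumptions about the real implementation. Note the write order: the incoming message is persisted *before* generation, which is the fix described above for the early synchronization issues.

```python
from datetime import datetime, timezone

def build_history(docs, limit=30):
    """Sort stored messages by timestamp and keep the newest `limit`, oldest first."""
    recent = sorted(docs, key=lambda m: m["ts"])[-limit:]
    return [{"role": m["role"], "content": m["content"]} for m in recent]

def handle_incoming(db, phone_number, text, generate_reply):
    """One pass through the pipeline: save, retrieve, generate, save."""
    # 1. Persist the user's message before doing anything else.
    db.messages.insert_one({"phone": phone_number, "role": "user",
                            "content": text, "ts": datetime.now(timezone.utc)})
    # 2. Load this user's stored messages and keep the last 30.
    docs = db.messages.find({"phone": phone_number})
    history = build_history(docs, limit=30)
    # 3. Generate the AI response from the conversation history.
    reply = generate_reply(history)
    # 4. Persist the assistant's reply, then return it for sending via WhatsApp.
    db.messages.insert_one({"phone": phone_number, "role": "assistant",
                            "content": reply, "ts": datetime.now(timezone.utc)})
    return reply
```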
Phase 5: Location-Aware Intelligence
While working with phone numbers, I realized they contain geographic metadata through international dial codes. I introduced a location detection module based on structured country code mappings. Using a JSON file that maps every international dial code to its country metadata, the bot could identify a user's country and tailor its responses accordingly. Initially, location was detected only during the first interaction. Later, I refactored the system to update location metadata on every message event, accounting for user mobility.
Refactored logic:
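A sketch of that logic; the JSON schema, file name, and function names are assumptions. Longest-prefix matching matters because dial codes overlap (e.g. +1 for the US/Canada vs. +1242 for the Bahamas):

```python
import json

# Assumed entry layout in the dial-code file, e.g.:
# {"dial_code": "+254", "country": "Kenya"}
def load_dial_codes(path="dialcodes.json"):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def detect_country(phone_number, dial_codes):
    """Return the entry with the longest dial-code prefix matching the number."""
    number = phone_number if phone_number.startswith("+") else "+" + phone_number
    best = None
    for entry in dial_codes:
        if number.startswith(entry["dial_code"]):
            if best is None or len(entry["dial_code"]) > len(best["dial_code"]):
                best = entry
    return best  # None if no prefix matched

def refresh_location(user_doc, phone_number, dial_codes):
    """Run on every message event, not just the first, to track user mobility."""
    match = detect_country(phone_number, dial_codes)
    if match is not None:
        user_doc["country"] = match["country"]
    return user_doc
```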
This allowed the bot to adapt responses with subtle cultural awareness when relevant without explicitly asking users for their location.
Phase 6: Deployment and Production Hardening
After stabilizing the architecture locally, I prepared the application for production deployment.
Steps included:
- Freezing dependencies in requirements.txt
- Configuring a Procfile for Gunicorn
- Pushing to GitHub
- Deploying to Heroku

Production debugging introduced new challenges:
- Webhook validation errors
- Environment variable management
- Token handling
- Intermittent 500 responses

Using structured logging and live log monitoring, I resolved these issues until webhook responses stabilized at consistent 200 OK status codes.
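The deployment artifacts aren't shown in the source; a typical setup for a Flask app on Heroku looks roughly like this (the `app:app` module path and remote name are assumptions):

```shell
# Freeze the environment so the host installs the same versions
pip freeze > requirements.txt

# Procfile: one line telling Heroku to serve the Flask app with Gunicorn,
# assuming the Flask object is named `app` inside app.py
echo "web: gunicorn app:app" > Procfile

# Deploy by pushing to the Heroku git remote
git push heroku main
```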
Final Architecture Overview
The bot now consists of:
- WhatsApp Cloud API for messaging
- Flask webhook server
- MongoDB for persistent memory
- Groq for AI inference
- Cloud hosting for production deployment
Compared to Version 1, this is no longer a simple message relay.
It is a multi-layered system that:
- Maintains conversational memory
- Updates contextual metadata dynamically
- Optimizes inference performance
- Operates reliably in production
What This Project Represents
This project reflects more than API integration skills.
It demonstrates:
- Architectural evolution after API changes
- Migration between AI providers
- Database-driven conversational design
- Contextual intelligence implementation
- Production debugging and stabilization
The transition from a stateless Gemini-based bot to a context-aware AI assistant is what truly defines this bot.
You can interact with the bot at MK ai and view the documentation on GitHub.