Building a Full‑Stack, Policy‑Agnostic Conversational Commerce AI with FSM & ML-Driven CTA

Three months. 14,000 rows of home‑grown, custom training data. Zero reliance on heavily filtered SaaS LLMs. This is the story (and the architecture) behind my persona‑driven, FSM‑powered conversion engine for creator/agency workflows.


TL;DR

I built an end‑to‑end AI system that:

  • Uses a Finite State Machine (FSM) to break the user journey into explicit steps: Rapport Building, Deepening, CTA, and Objection Handling.
  • Offloads CTA decisions to a RandomForest classifier microservice (not the LLM), using message counts, sentiment, phase, and more.
  • Was trained (both the LLM and the classifier) on 14,000+ rows of data generated and reviewed with nous-hermes2-mixtral.
  • Runs a local/hosted LLM stack (no dependency on centralized, policy‑heavy APIs) with a pluggable provider interface.
  • Uses RAG over Google Drive persona folders, advanced sentiment/emotion/sarcasm analysis, and vectorized user memories in MongoDB.
  • Enforces a strict JSON I/O contract for deterministic frontend/bot consumption.
  • Is evolving: more ML microservices, intent transformers, and distributed deployment coming soon.

Why Build My Own Stack?

Centrally hosted LLM APIs are:

  • Heavily filtered/censored — especially for edgy or roleplay-heavy domains.
  • Expensive at scale and subject to rate limits or sudden policy shifts.
  • Opaque — zero control over internal safety layers, logits, or fine‑tune hooks.

So I went full‑stack: self‑hosted models (Ollama / custom fine‑tunes), my own embeddings, my own memory & sentiment layer, and my own orchestration logic. FSMs and microservices let me control every step of the user journey and optimize for conversions.
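To make that concrete, here is a minimal sketch of what the pluggable provider interface can look like. Class names, the model name, and the host are illustrative, not the exact ones in my codebase; the Ollama /api/chat call is the standard non-streaming endpoint.

    # Minimal sketch of a pluggable provider interface (illustrative names).
    from abc import ABC, abstractmethod

    import requests

    class LLMProvider(ABC):
        @abstractmethod
        def chat(self, system_prompt: str, messages: list[dict]) -> str:
            """Return the raw completion for a system prompt plus message list."""

    class OllamaProvider(LLMProvider):
        def __init__(self, model: str = "my-finetune", host: str = "http://localhost:11434"):
            self.model, self.host = model, host

        def chat(self, system_prompt: str, messages: list[dict]) -> str:
            payload = {
                "model": self.model,
                "messages": [{"role": "system", "content": system_prompt}, *messages],
                "stream": False,
            }
            resp = requests.post(f"{self.host}/api/chat", json=payload, timeout=120)
            resp.raise_for_status()
            return resp.json()["message"]["content"]

Swapping backends then becomes a config change: the orchestration layer only ever sees chat().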


Core Problem Statement

  1. Hold natural, persona‑consistent conversations that feel human.
  2. Explicitly manage the user journey with FSM: rapport, deepening, CTA, objection handling.
  3. Detect the right time to drop a CTA, using a dedicated ML classifier.
  4. Remember user details (interests, emotions, objections) across sessions.
  5. Keep improving: A/B test tactics, learn from objections, optimize conversion rate.

High‑Level Architecture

 [Client / Bot Frontends]
            |
            v
  +------------------+
  |    Flask API     |  (/chat, /personas)
  +------------------+
            |
            v
  +--------------------------------------------------------------+
  |                     Orchestration Layer                       |
  |  ├─ FSM Controller   (Rapport → Deepening → CTA → Objection)  |
  |  ├─ Persona Loader   (Google Drive Sync)                      |
  |  ├─ History & Memory (MongoDB)                                |
  |  ├─ Sentiment & Emotion Engine                                |
  |  └─ JSON Output Enforcer                                      |
  +--------------------------------------------------------------+
            |                                |
            |       Background Sync          |   Google Drive
            |       (profiles, phrases,      |   (persona folders)
            |        knowledge docs)         |
            v                                |
  +--------------------------+               |
  |  LLM Provider Interface  | ───▶ Local/Hosted LLMs (Ollama, fine‑tunes)
  +--------------------------+               |
            |                                |
            v                                |
  +--------------------------+               |
  |   Embeddings Service     | ───▶ NumPy / Cosine Similarity
  +--------------------------+               |
            |                                |
            v                                |
  +--------------------------+               |
  |        MongoDB           |  (chat_histories, user_profiles, persona_*)
  +--------------------------+               |
            |                                |
            v                                |
  +----------------------------------+       |
  |  RandomForest CTA Classifier     | ◀─────+
  |  Microservice (CTA NOW / WAIT /  |
  |  NEVER decisions)                |
  +----------------------------------+

FSM: User Journey Control

The conversation is now managed by a Finite State Machine with four main phases:

  1. Rapport Building: Humor, mirroring, establishing common ground.
  2. Deepening: Asking about hobbies, routines, struggles, and building memory hooks.
  3. CTA (Call To Action): Seamless, context-aware introduction of value propositions.
  4. Objection Handling: Detect and address hesitations ("too expensive", "maybe later").

Each phase is explicit and externally scaffolded, not just hidden in prompts. The FSM ensures the LLM focuses on the right conversational goal at each step.
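A stripped-down sketch of the idea (the phase names match the system; thresholds, helper names, and goal strings are illustrative):

    # Minimal FSM sketch: each phase has an explicit goal injected into the
    # prompt, and transitions are driven by observable signals, not the LLM.
    from enum import Enum, auto

    class Phase(Enum):
        RAPPORT = auto()
        DEEPENING = auto()
        CTA = auto()
        OBJECTION = auto()

    PHASE_GOALS = {
        Phase.RAPPORT:   "Build rapport: humor, mirroring, common ground. No selling.",
        Phase.DEEPENING: "Ask about hobbies, routines, struggles. Create memory hooks.",
        Phase.CTA:       "Introduce the value proposition naturally, tied to known interests.",
        Phase.OBJECTION: "Acknowledge the objection, reframe value, never pressure.",
    }

    def next_phase(phase: Phase, msg_count: int, cta_signal: str, objection: bool) -> Phase:
        if objection:
            return Phase.OBJECTION
        if cta_signal == "CTA_NOW":                      # decision comes from the classifier microservice
            return Phase.CTA
        if phase is Phase.RAPPORT and msg_count >= 6:    # example threshold
            return Phase.DEEPENING
        return phase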


CTA: Offloaded to a RandomForest Classifier Microservice

Instead of letting the LLM decide when to drop a CTA, I built a RandomForest classifier as a microservice. It takes as input:

  • Message counts
  • Sentiment scores
  • Current FSM phase
  • Persona and user features
  • Conversation context

And outputs a decision: CTA NOW, WAIT, or NEVER CTA.

This keeps the LLM focused on conversation quality, while the microservice makes deterministic, explainable CTA decisions.
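Roughly, the orchestrator's call to the microservice looks like this sketch (the endpoint, feature names, and label strings are illustrative placeholders):

    # Sketch of the CTA decision call from the orchestration layer.
    import requests

    def decide_cta(state: dict) -> str:
        features = {
            "message_count": state["message_count"],
            "sentiment": state["sentiment_score"],          # e.g. -1.0 .. 1.0
            "fsm_phase": state["phase"],                     # "rapport" | "deepening" | ...
            "cta_introduced_count": state["cta_introduced_count"],
            "persona_id": state["persona_id"],
        }
        resp = requests.post("http://cta-classifier:8001/predict", json=features, timeout=5)
        resp.raise_for_status()
        return resp.json()["decision"]                       # "CTA_NOW" | "WAIT" | "NEVER"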

Training Details:

  • Data: 10,000+ carefully crafted and manually reviewed data points, with reasoning for each path.
  • Model: RandomForest, trained and validated on real and synthetic chat journeys.
  • LLM: Trained on 14,000 rows generated with nous-hermes2-mixtral (same model for both LLM and classifier data).
  • Trainer Stats: [figure: model loss curves and training statistics]

Classifier Testing Example: [figure: classifier test input/output, showing a "CTA NOW" decision]
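For a sense of shape, the training run is conceptually close to this sketch (the file name, columns, and hyperparameters are simplified placeholders; the real dataset has richer persona/user features):

    # Rough shape of the classifier training run.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("cta_training_data.csv")          # ~10k manually reviewed rows
    X = pd.get_dummies(df[["message_count", "sentiment", "fsm_phase", "cta_introduced_count"]])
    y = df["label"]                                    # CTA_NOW / WAIT / NEVER

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    clf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=42)
    clf.fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))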

Persona System: Google Drive → Mongo → RAG

Why Drive? Non‑technical collaborators can drop/update docs. Each persona has:

  • profile (key:value): model_name, age, city, studies, tactics, emotional_expressions, cta_concept…
  • common_phrases: the persona’s slang and go‑to lines.
  • Knowledge docs: any background text split into ~1.5k char chunks, embedded and stored.

Sync Thread Flow:

  1. List persona folders under WATCH_FOLDER_ID.
  2. For each file changed, download as text.
  3. If profile: parse key:value pairs → Mongo.
  4. If common_phrases: store raw text.
  5. Else: chunk & embed → bulk write to Mongo.
  6. Maintain indices for fast retrieval.

This makes personas hot‑swappable. Edit a doc → the bot evolves instantly.
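The chunk-and-embed path (step 5 above) boils down to something like this sketch (collection layout, helper names, and embed_fn are illustrative assumptions):

    # Sketch of the knowledge-doc sync step: split, embed, bulk write to Mongo.
    def chunk_text(text: str, size: int = 1500) -> list[str]:
        return [text[i:i + size] for i in range(0, len(text), size)]

    def sync_knowledge_doc(db, persona: str, filename: str, text: str, embed_fn):
        docs = [
            {
                "persona": persona,
                "source": filename,
                "chunk_index": idx,
                "text": chunk,
                "embedding": embed_fn(chunk),   # vector reused later for cosine retrieval
            }
            for idx, chunk in enumerate(chunk_text(text))
        ]
        coll = db["persona_knowledge"]
        coll.delete_many({"persona": persona, "source": filename})  # replace stale chunks
        if docs:
            coll.insert_many(docs)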


Chat Orchestration Flow

  1. /chat POST: { persona_name, user_id, message, cta_introduced_count }.

  2. FSM Phase Determination: the FSM controller determines which phase the conversation is in.

  3. Sentiment Analysis: returns sentiment, intensity, emotions, sarcasm flags, and a context‑aware sentiment.

  4. Update User Profile (async): extract facts, embed, push to Mongo, maintain sentiment history, summaries every N messages.

  5. Save Message to History (async): embed + store.

  6. Retrieve Context:

    • Last N history messages by similarity + recency decay.
    • Persona profile + phrases + top‑k knowledge chunks.
    • Relevant user vectors & summaries.
  7. Construct System Prompt:

    • Persona voice + phrases.
    • Personalization context (facts/interests).
    • Sentiment guidance.
    • FSM phase instruction.
    • Strict JSON output schema.
  8. LLM Call via AIClient.

  9. CTA Decision: Call RandomForest classifier microservice with current context.

  10. Parse/Validate JSON: field extraction, with a semantic CTA-detection fallback if the model drifts from the schema (see the sketch after this list).

  11. Save Assistant Replies (async) & respond.
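Step 10 is what makes the strict JSON contract hold up in practice: tolerate stray text around the model's JSON, and fall back to semantic detection when the schema is violated. A simplified sketch (the response contract and CTA marker phrases here are illustrative; the real contract has more fields):

    # Sketch of the parse/validate step behind the JSON Output Enforcer.
    import json
    import re

    CTA_MARKERS = ("check out", "link in bio", "exclusive content")   # example heuristics

    def parse_llm_output(raw: str) -> dict:
        match = re.search(r"\{.*\}", raw, re.DOTALL)        # tolerate stray text around the JSON
        try:
            data = json.loads(match.group(0)) if match else {}
        except json.JSONDecodeError:
            data = {}
        reply = data.get("reply", raw.strip())              # fall back to plain text
        cta_flag = data.get("cta_introduced")
        if cta_flag is None:                                 # semantic CTA-detection fallback
            cta_flag = any(m in reply.lower() for m in CTA_MARKERS)
        return {"reply": reply, "cta_introduced": bool(cta_flag)}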


Advanced Sentiment & Emotion Stack

  • Transformer pipelines for:
    • Sentiment (twitter-roberta)
    • Emotions (distilroberta emotion classifier)
    • Sarcasm (sarcasm detector)
  • VADER fallback for intensity scoring.
  • Contextual Adjustments: last N sentiments to detect patterns (sarcastic_positive, forced_positive, etc).

Guidance Generation:
Based on context-aware sentiment + top emotion, the system outputs instructions for the persona (e.g., "User is anxious — be reassuring, skip promos"). This is injected into the system prompt.
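A condensed sketch of the stack plus the guidance mapping (the model IDs are common public checkpoints given as examples, not necessarily the exact ones I run, and the guidance rules are simplified):

    # Sentiment + emotion + intensity, mapped to a prompt-level guidance string.
    from transformers import pipeline
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    sentiment_pipe = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-sentiment-latest")
    emotion_pipe   = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base")
    vader          = SentimentIntensityAnalyzer()
    # a sarcasm-detection checkpoint can be loaded the same way and used as a flag

    def analyze_and_guide(text: str) -> dict:
        sentiment = sentiment_pipe(text)[0]["label"]               # "positive" / "neutral" / "negative"
        top_emotion = emotion_pipe(text)[0]["label"]               # e.g. "joy", "fear", "anger"
        intensity = abs(vader.polarity_scores(text)["compound"])   # VADER fallback for intensity

        if top_emotion in ("fear", "sadness"):
            guidance = "User seems anxious or down - be reassuring, skip promos."
        elif sentiment == "negative":
            guidance = "User is negative - acknowledge the frustration before anything else."
        else:
            guidance = "User is receptive - keep the tone light and stay on the current FSM goal."
        return {"sentiment": sentiment, "emotion": top_emotion,
                "intensity": intensity, "guidance": guidance}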


Memory & Profiling in MongoDB

Every user interaction updates:

  • facts.name / occupation / studies / relationship_status / location
  • facts.interests / dislikes / current_activities / upcoming_events (arrays, de‑duplicated)
  • profile_embeddings: pre‑computed vectors by category (interests, dislikes…)
  • profile_vectors: item‑level vectors for fine‑grained retrieval.
  • sentiment_history: capped array of last 20 sentiment snapshots.
  • conversation_summaries: every N messages a summary + embedding.

Retrieval for personalization uses cosine similarity over the query embedding to pick the most relevant facts, vectors, and summaries.
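In sketch form (field names and the decay constant are illustrative; timestamps are assumed to be timezone-aware):

    # Cosine similarity over stored vectors, weighted by a recency decay.
    import numpy as np
    from datetime import datetime, timezone

    def cosine(a, b):
        a, b = np.asarray(a), np.asarray(b)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def top_k_memories(query_vec, memories, k=5, half_life_days=14.0):
        now = datetime.now(timezone.utc)
        scored = []
        for m in memories:                               # docs from profile_vectors / summaries
            age_days = (now - m["timestamp"]).total_seconds() / 86400
            decay = 0.5 ** (age_days / half_life_days)   # halve the weight every N days
            scored.append((cosine(query_vec, m["embedding"]) * decay, m))
        return [m for _, m in sorted(scored, key=lambda x: x[0], reverse=True)[:k]]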


Dataset & Fine‑Tuning

  • LLM: Fine‑tuned on 14,000 rows generated with nous-hermes2-mixtral, including multi-turn chats and CTA scenarios.
  • Classifier: Trained on 10,000+ data points, each manually reviewed with reasoning for CTA path.

Roadmap: What’s Next

  • Personalized Transformer for Intent Detection: Custom transformer models for deeper intent and objection detection.
  • Further LLM Training: More data, more edge cases, more persona diversity.
  • Distributed Deployment: Microservices for each major component (LLM, classifier, sentiment, RAG) deployed across multiple platforms for scalability and resilience.
  • Frontend & Analytics: React dashboard for funnel tracking, persona editing, chat replay, and data labeling.
  • More ML Microservices: Objection classifier, persona drift detector, auto-summary scorer, and more.

Lessons Learned

  • FSMs give you real control: Explicitly managing the user journey unlocks new levels of optimization.
  • Offloading logic to microservices: Keeps the LLM focused and the system modular.
  • Own your data: Scrape, curate, fine‑tune. It’s the only way to avoid policy drift.
  • Backend/DevOps skills are critical: Monitoring, retries, background workers, caching — all essential for production AI.

Final Thoughts

What started as a “better conversion DM bot” is now a modular, FSM-driven AI stack: data pipelines, embeddings, fine‑tuning, RAG, emotional intelligence, and conversion science — all orchestrated by microservices.

It’s built. It works. And it’s evolving daily.

If you’re building in this space (any vertical) and want to jam on controllable, self‑hosted AI systems — let’s talk.