Building a Full‑Stack, Policy‑Agnostic Conversational Commerce AI with FSM & ML-Driven CTA

Three months. A home‑grown dataset of 14,000 rows. Zero reliance on heavily filtered SaaS LLMs. This is the story (and the architecture) behind my persona‑driven, FSM‑powered conversion engine for creator/agency workflows.


TL;DR

I built an end‑to‑end AI system that:

  • Uses a Finite State Machine (FSM) to break the user journey into explicit steps: Rapport Building, Deepening, CTA, and Objection Handling.
  • Offloads CTA decisions to a RandomForest classifier microservice (not the LLM), using message counts, sentiment, phase, and more.
  • Trains the LLM on 14,000+ rows of data generated and reviewed with nous-hermes2-mixtral, and the CTA classifier on 10,000+ manually reviewed data points.
  • Runs a local/hosted LLM stack (no dependency on centralized, policy‑heavy APIs) with a pluggable provider interface.
  • Uses RAG over Google Drive persona folders, advanced sentiment/emotion/sarcasm analysis, and vectorized user memories in MongoDB.
  • Enforces a strict JSON I/O contract for deterministic frontend/bot consumption.
  • Is evolving: more ML microservices, intent transformers, and distributed deployment coming soon.

Why Build My Own Stack?

Centrally hosted LLM APIs are:

  • Heavily filtered/censored — especially for edgy or roleplay-heavy domains.
  • Expensive at scale and subject to rate limits or sudden policy shifts.
  • Opaque — zero control over internal safety layers, logits, or fine‑tune hooks.

So I went full‑stack: self‑hosted models (Ollama / custom fine‑tunes), my own embeddings, my own memory & sentiment layer, and my own orchestration logic. FSMs and microservices let me control every step of the user journey and optimize for conversions.
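The pluggable provider interface can be sketched roughly like this — class and method names here are illustrative, not the actual codebase's; the point is that orchestration code depends only on an abstract contract, so backends swap freely:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Minimal pluggable provider contract: swap model backends
    (Ollama, a custom fine-tune, anything) without touching orchestration."""
    @abstractmethod
    def complete(self, system_prompt: str, messages: list[dict]) -> str: ...

class EchoProvider(LLMProvider):
    """Stand-in backend for tests; a real Ollama-backed provider
    would POST to the local model server instead."""
    def complete(self, system_prompt, messages):
        return f"[{system_prompt}] {messages[-1]['content']}"

def generate_reply(provider: LLMProvider, persona: str, user_msg: str) -> str:
    # Orchestration only ever sees the abstract interface.
    return provider.complete(persona, [{"role": "user", "content": user_msg}])
```

Swapping a hosted API for a self-hosted model then becomes a one-line change at construction time.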


Core Problem Statement

The system needs to:

  1. Hold natural, persona‑consistent conversations that feel human.
  2. Explicitly manage the user journey with FSM: rapport, deepening, CTA, objection handling.
  3. Detect the right time to drop a CTA, using a dedicated ML classifier.
  4. Remember user details (interests, emotions, objections) across sessions.
  5. Keep improving: A/B test tactics, learn from objections, optimize conversion rate.

High‑Level Architecture

[Client / Bot Frontends]
           |
           v
 +------------------+
 |    Flask API     |  (/chat, /personas)
 +------------------+
           |
           v
 +--------------------------------------------------------------+
 |                  Orchestration Layer                         |
 |  ├─ FSM Controller  (Rapport → Deepening → CTA → Objection)  |
 |  ├─ Persona Loader  (Google Drive Sync)                      |
 |  ├─ History & Memory (MongoDB)                               |
 |  ├─ Sentiment & Emotion Engine                               |
 |  └─ JSON Output Enforcer                                     |
 +--------------------------------------------------------------+
           |                               |
           |        Background Sync        |  Google Drive
           |       (profiles, phrases,     |  (persona folders)
           |        knowledge docs)        |
           v                               |
 +-------------------------+               |
 | LLM Provider Interface  | ───▶ Local/Hosted LLMs (Ollama, FT)
 +-------------------------+               |
           |                               |
           v                               |
 +-------------------------+               |
 |   Embeddings Service    | ───▶ NumPy / Cosine Similarity
 +-------------------------+               |
           |                               |
           v                               |
 +-------------------------+               |
 |        MongoDB          |  (chat_histories, user_profiles,
 +-------------------------+   persona_*)  |
           |                               |
           v                               |
 +----------------------------------+      |
 | RandomForest CTA Classifier      | ◀────+
 | Microservice (CTA NOW / WAIT /   |
 | NEVER decisions)                 |
 +----------------------------------+

FSM: User Journey Control

The conversation is now managed by a Finite State Machine with four main phases:

  1. Rapport Building: Humor, mirroring, establishing common ground.
  2. Deepening: Asking about hobbies, routines, struggles, and building memory hooks.
  3. CTA (Call To Action): Seamless, context-aware introduction of value propositions.
  4. Objection Handling: Detect and address hesitations ("too expensive", "maybe later").

Each phase is explicit and externally scaffolded, not just hidden in prompts. The FSM ensures the LLM focuses on the right conversational goal at each step.
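A minimal sketch of such an FSM controller. The transition rules below are illustrative stand-ins — the real controller also consults sentiment and the CTA classifier's decision:

```python
from enum import Enum, auto

class Phase(Enum):
    RAPPORT = auto()
    DEEPENING = auto()
    CTA = auto()
    OBJECTION = auto()

class ConversationFSM:
    """Illustrative phase controller. The thresholds and inputs here
    are placeholders for the real transition logic."""
    def __init__(self):
        self.phase = Phase.RAPPORT

    def advance(self, msg_count: int, cta_ready: bool, objection: bool) -> Phase:
        if objection:                 # objections preempt everything
            self.phase = Phase.OBJECTION
        elif cta_ready:               # classifier said "CTA NOW"
            self.phase = Phase.CTA
        elif msg_count >= 3:          # enough rapport built — go deeper
            self.phase = Phase.DEEPENING
        return self.phase
```

Whatever phase `advance` lands on is what gets injected into the system prompt as the current conversational goal.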


CTA: Offloaded to a RandomForest Classifier Microservice

Instead of letting the LLM decide when to CTA, I built a RandomForest classifier as a microservice. It takes as input:

  • Message counts
  • Sentiment scores
  • Current FSM phase
  • Persona and user features
  • Conversation context

And outputs a decision: CTA NOW, WAIT, or NEVER CTA.

This keeps the LLM focused on conversation quality, while the microservice makes deterministic, explainable CTA decisions.
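A toy version of the classifier service's core, assuming a simplified four-feature layout (the production feature set is richer, and the training data far larger):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative feature layout:
# [message_count, avg_sentiment (-1..1), fsm_phase (0-3), prior_objections]
X = np.array([
    [2,   0.1, 0, 0],   # early, neutral       -> WAIT
    [9,   0.8, 1, 0],   # deepened, positive   -> CTA NOW
    [12,  0.7, 1, 0],
    [15, -0.6, 3, 2],   # repeated objections  -> NEVER
    [3,   0.0, 0, 0],
    [20, -0.8, 3, 3],
])
y = np.array(["WAIT", "CTA_NOW", "CTA_NOW", "NEVER", "WAIT", "NEVER"])

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def cta_decision(features: list[float]) -> str:
    """What the microservice returns for one conversation snapshot."""
    return clf.predict(np.array([features]))[0]
```

The decision is deterministic and explainable (per-feature importances are a `clf.feature_importances_` lookup away), which is exactly what a prompt-driven CTA choice can't give you.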

Training Details:

  • Data: 10,000+ carefully crafted and manually reviewed data points, with reasoning for each path.
  • Model: RandomForest, trained and validated on real and synthetic chat journeys.
  • LLM: Trained on 14,000 rows generated with nous-hermes2-mixtral (same model for both LLM and classifier data).
  • Trainer Stats (LLM): [figure: model loss curves and training stats]

Classifier Testing: [figure: classifier test input/output, showing a "CTA NOW" decision]

Persona System: Google Drive → Mongo → RAG

Why Drive? Non‑technical collaborators can drop/update docs. Each persona has:

  • profile (key:value): model_name, age, city, studies, tactics, emotional_expressions, cta_concept…
  • common_phrases: the persona’s slang and go‑to lines.
  • Knowledge docs: any background text split into ~1.5k char chunks, embedded and stored.

Sync Thread Flow:

  1. List persona folders under WATCH_FOLDER_ID.
  2. For each file changed, download as text.
  3. If profile: parse key:value pairs → Mongo.
  4. If common_phrases: store raw text.
  5. Else: chunk & embed → bulk write to Mongo.
  6. Maintain indices for fast retrieval.

This makes personas hot‑swappable. Edit a doc → the bot evolves instantly.
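The chunk-and-embed step (5) could look roughly like this — `embed` and `collection` stand in for the embeddings service and a pymongo collection, and the fixed-offset split is a simplification of the real word-boundary chunking:

```python
def chunk_text(doc: str, size: int = 1500) -> list[str]:
    """Split a knowledge doc into ~1.5k-char chunks."""
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def sync_persona_doc(name: str, doc: str, embed, collection) -> int:
    """Chunk, embed, and bulk-write one knowledge doc; returns chunk count."""
    records = [
        {"persona": name, "chunk": c, "embedding": embed(c)}
        for c in chunk_text(doc)
    ]
    collection.insert_many(records)
    return len(records)
```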


Chat Orchestration Flow

  1. /chat POST: { persona_name, user_id, message, cta_introduced_count }.

  2. FSM Phase Determination: FSM decides which phase the user is in.

  3. Sentiment Analysis: returns sentiment, intensity, emotions, sarcasm flags, and a context‑aware sentiment.

  4. Update User Profile (async): extract facts, embed, push to Mongo, maintain sentiment history, summaries every N messages.

  5. Save Message to History (async): embed + store.

  6. Retrieve Context:

    • Last N history messages by similarity + recency decay.
    • Persona profile + phrases + top‑k knowledge chunks.
    • Relevant user vectors & summaries.
  7. Construct System Prompt:

    • Persona voice + phrases.
    • Personalization context (facts/interests).
    • Sentiment guidance.
    • FSM phase instruction.
    • Strict JSON output schema.
  8. LLM Call via AIClient.

  9. CTA Decision: Call RandomForest classifier microservice with current context.

  10. Parse/Validate JSON: extractions, CTA semantic detection fallback.

  11. Save Assistant Replies (async) & respond.


Advanced Sentiment & Emotion Stack

  • Transformer pipelines for:
    • Sentiment (twitter-roberta)
    • Emotions (distilroberta emotion classifier)
    • Sarcasm (sarcasm detector)
  • VADER fallback for intensity scoring.
  • Contextual Adjustments: last N sentiments to detect patterns (sarcastic_positive, forced_positive, etc).

Guidance Generation:
Based on context-aware sentiment + top emotion, the system outputs instructions for the persona (e.g., "User is anxious — be reassuring, skip promos"). This is injected into the system prompt.
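A minimal, illustrative version of that guidance mapping (the real rule table is larger and the labels are the system's own):

```python
def guidance(context_sentiment: str, top_emotion: str) -> str:
    """Map (context-aware sentiment, top emotion) to a persona instruction
    that gets injected into the system prompt."""
    rules = {
        ("negative", "fear"): "User is anxious — be reassuring, skip promos.",
        ("sarcastic_positive", "joy"): "User is teasing — match the humor, stay light.",
        ("positive", "joy"): "User is upbeat — mirror the energy.",
    }
    return rules.get((context_sentiment, top_emotion),
                     "Stay in persona and keep the tone neutral.")
```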


Memory & Profiling in MongoDB

Every user interaction updates:

  • facts.name / occupation / studies / relationship_status / location
  • facts.interests / dislikes / current_activities / upcoming_events (arrays, de‑duplicated)
  • profile_embeddings: pre‑computed vectors by category (interests, dislikes…)
  • profile_vectors: item‑level vectors for fine‑grained retrieval.
  • sentiment_history: capped array of last 20 sentiment snapshots.
  • conversation_summaries: every N messages a summary + embedding.
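The capped sentiment_history can be maintained with a single MongoDB update using `$push` with `$each`/`$slice`; here sketched as a function that builds the update document (applied via `update_one` on the real collection):

```python
def sentiment_history_update(snapshot: dict, cap: int = 20) -> dict:
    """MongoDB update that appends a sentiment snapshot while keeping only
    the newest `cap` entries ($slice: -cap trims from the front)."""
    return {
        "$push": {
            "sentiment_history": {"$each": [snapshot], "$slice": -cap}
        }
    }

# Applied as: users.update_one({"user_id": uid}, sentiment_history_update(snap))
```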

Retrieval for personalization uses cosine similarity over the query embedding to pick the most relevant facts, vectors, and summaries.
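That retrieval step boils down to cosine similarity over stored vectors, roughly:

```python
import numpy as np

def top_k(query_vec, items, k: int = 3) -> list[str]:
    """Rank stored (text, vector) pairs by cosine similarity to the query
    and return the k most relevant texts."""
    q = np.asarray(query_vec, dtype=float)
    scored = []
    for text, vec in items:
        v = np.asarray(vec, dtype=float)
        sim = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((sim, text))
    return [text for _, text in sorted(scored, reverse=True)[:k]]
```

The same routine serves facts, item-level profile vectors, and conversation summaries; only the candidate set changes.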


Dataset & Fine‑Tuning

  • LLM: Fine‑tuned on 14,000 rows generated with nous-hermes2-mixtral, including multi-turn chats and CTA scenarios.
  • Classifier: Trained on 10,000+ data points, each manually reviewed with reasoning for CTA path.

Intent Classification & Objection Analysis

Intent Detection Microservice

Currently, I'm using a small OpenAI API call as a microservice to label user intents (inquiry, objection, interest, readiness). It works well, but the goal is to move off external services entirely.

Current Flow:

  • Extract user message → OpenAI intent classifier → structured intent labels
  • Next Phase: Replace with a custom transformer model trained on our domain-specific intent patterns
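A hedged sketch of that labeling call — the prompt wording is illustrative, and `call` abstracts whichever backend does the completion (the OpenAI microservice today, a local transformer in the next phase):

```python
INTENTS = ("inquiry", "objection", "interest", "readiness")

def classify_intent(message: str, call) -> str:
    """Label a user message with exactly one intent. `call` is any
    prompt-in, text-out completion function."""
    prompt = (
        "Classify the user's intent as exactly one of: "
        + ", ".join(INTENTS)
        + f".\nUser: {message}\nIntent:"
    )
    label = call(prompt).strip().lower()
    return label if label in INTENTS else "inquiry"  # safe default for off-label output
```

Keeping the backend behind a plain function makes the planned swap to a custom transformer a drop-in change.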

Objection Type Classification

Beyond detecting that an objection occurred, the system now classifies objection types:

  • Pricing: "Too expensive", "Can't afford it"
  • Trust: "Not sure about this", "Seems fishy"
  • Need: "Don't really need this", "Maybe later"
  • Value: "Don't see the point", "What's in it for me?"
  • Converted: User showed buying signals

This objection microservice (currently OpenAI-based, migrating to transformer) feeds into a continuous learning loop.

Continuous Learning Loop

The objection data creates a feedback mechanism:

  1. Track Objection Patterns: What objections kill conversions? At what message counts?
  2. User Lifecycle Memory: If user objected to pricing last time, adjust CTA strategy for next interaction.
  3. Retrain RandomForest Classifier: Use objection outcomes to improve CTA timing predictions.
  4. Persona Optimization: Learn which personas handle specific objections better.

This turns every failed conversion into training data for the next attempt.
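One way to sketch step 3 — turning logged CTA outcomes into retraining rows for the RandomForest. Field names and the label heuristics below are illustrative assumptions, not the production mapping:

```python
def outcomes_to_training_rows(logs: list[dict]) -> tuple[list, list]:
    """Convert logged CTA attempts into (features, label) pairs for the
    next classifier retrain."""
    X, y = [], []
    for log in logs:
        X.append([log["message_count"], log["sentiment"], log["phase"]])
        if log["objection_type"] == "converted":
            y.append("CTA_NOW")   # the timing worked — reinforce it
        elif log["objection_type"] in ("pricing", "need"):
            y.append("WAIT")      # likely fired too early
        else:
            y.append("NEVER")     # trust/value objections: back off
    return X, y
```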


Training Philosophy: Quality Over Quantity

Current Model Stats:

  • 14,000 rows of training data
  • 3 epochs (sounds low, but performance is excellent)
  • Strong validation metrics + real-world quality

Why 3 epochs worked:

  • High-quality, manually reviewed data
  • Domain-specific fine-tuning on nous-hermes2-mixtral base
  • Focused scope (conversation generation, not general knowledge)

Next Training Cycle:

  • Larger dataset (targeting 50k+ rows)
  • Experimenting with epoch counts (3, 5, 8) to optimize validation loss vs. real-world performance
  • A/B testing different training approaches for maximum output quality

Evolution: From Simple RAG to Full-Stack Orchestration

This project has dramatically evolved from a basic RAG-over-system-prompt setup to a complex, dynamic full-stack AI system.

What Changed:

Before (Simple RAG):

  • LLM did everything: memory, emotion detection, CTA decisions, persona consistency
  • Single prompt with document chunks
  • No state management or journey control

Now (Orchestrated System):

  • LLM task: Generate contextually appropriate replies (what it's best at)
  • System handles: User emotion, memory, state transitions, CTA timing, intent classification, objection types
  • Modular microservices: Each component optimized for its specific task
  • FSM control: Explicit journey management

The LLM's Role Today:

The LLM is now focused solely on what it excels at: generating natural, persona-consistent conversation responses. Everything else is handled by specialized components:

  • Emotion/Sentiment: Transformer pipelines + VADER
  • Memory: MongoDB vector storage + retrieval
  • State: FSM controller
  • CTA Timing: RandomForest classifier
  • Intent/Objections: Dedicated microservices
  • Personalization: RAG over user vectors + persona knowledge

This separation of concerns makes each component more reliable, testable, and optimizable.


Roadmap: What's Next

  • Custom Intent Transformer: Replace the OpenAI dependency with a domain-specific intent classification model
  • Objection Transformer: Self-hosted objection type classification to complete the external service migration
  • Further LLM Training: Larger dataset (50k+ rows), optimized epoch testing for validation loss vs. real-world quality
  • Continuous Learning Pipeline: Automated retraining based on objection outcomes and conversion data
  • Distributed Deployment: Microservices for each major component (LLM, classifier, sentiment, RAG) deployed across multiple platforms for scalability and resilience.
  • Frontend & Analytics: React dashboard for funnel tracking, persona editing, chat replay, and data labeling.
  • More ML Microservices: Auto-summary scorer, persona drift detector, and conversation quality metrics.

Lessons Learned

  • FSMs give you real control: Explicitly managing the user journey unlocks new levels of optimization.
  • Offloading logic to microservices: Keeps the LLM focused and the system modular.
  • Quality beats quantity: 14k high-quality rows + 3 epochs can outperform massive, unfocused datasets.
  • Continuous learning loops: Every objection becomes training data for better future interactions.
  • Separation of concerns: Let the LLM generate replies, handle everything else with specialized components.
  • Own your data: Scrape, curate, fine‑tune. It's the only way to avoid policy drift.
  • Backend/DevOps skills are critical: Monitoring, retries, background workers, caching — all essential for production AI.

Final Thoughts

What started as a “better conversion DM bot” is now a modular, FSM-driven AI stack: data pipelines, embeddings, fine‑tuning, RAG, emotional intelligence, and conversion science — all orchestrated by microservices.

It’s built. It works. And it’s evolving daily.

If you’re building in this space (any vertical) and want to jam on controllable, self‑hosted AI systems — let’s talk.