Learnings from an OnlyFans-Based AI Project
Author: Karan Prasad (@thtskaran)
I recently built a conversational AI designed to increase engagement on platforms like OnlyFans. It was meant to interact dynamically and mimic human conversational patterns closely. The project taught me a great deal about semantic understanding, user profiling, and prompt engineering.
Contextual Awareness and Real-Time Engagement
The stack combined OpenAI's GPT-4o, the Dolphin3 model run locally through Ollama for intimate chats, and ChromaDB for semantic storage. In the stored chat logs (snippets such as "hey hey! i saw ur bio and it made me..." kept in SQLite databases), the AI held contextually accurate, engaging conversations and responded naturally to casual talk about weekend plans, personal interests, and even relationship status.

System Architecture & Workflow
The complete AI conversation pipeline follows this workflow:
1. Flask API Endpoint
├── `/generate_opener`
├── `/generate_reply`
└── `/update_user_profile`
│
▼
2. Authentication Layer
└── Validates `X-Secret-Key` from request header
│
▼
3. Context Assembly
├── Retrieve past conversation via ChromaDB (RAG)
├── Retrieve user profile (preferences, interests)
└── Thread ID for ongoing session
│
▼
4. Vector Memory Layer (ChromaDB)
├── Uses `SentenceTransformerEmbedding` for:
│   ├── Contextual retrieval
│   └── CTA detection
└── Stores:
    ├── Messages (user + AI)
    ├── Metadata (timestamps, thread_id)
    └── User profiles
│
▼
5. Prompt Engineering
├── Persona + mood + slang settings
├── CTA strategy based on engagement stage
├── Injects retrieved history and personalization
└── Output formatted as JSON with `messages`, `media`, `cta_triggered`
│
▼
6. LLM Inference (Based on Config)
├── [OpenAI GPT-4o] → API call to generate response
└── [Ollama Dolphin3] → Local API call for less restrictions on intimate chats
│
▼
7. Response Parsing & CTA Check
├── Extracts JSON content from LLM response
├── Confirms CTA via semantic similarity (NumPy cosine)
└── Flags if CTA is embedded
│
▼
8. Final AI Response
└── Returns:
    ├── Chat bubbles (Gen-Z style)
    ├── Suggested media (nullable)
    └── `cta_triggered: true/false`
│
▼
9. Logging + Storage
├── Adds assistant/user messages to ChromaDB
└── Periodic summarization + memory pruning
This architecture ensures seamless conversation flow while maintaining context awareness and strategic engagement optimization.
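Step 7 depends on pulling a JSON object out of free-form model output. Here is a minimal sketch of that step, assuming the model may wrap the JSON in a Markdown fence or surround it with extra text. `extract_reply_json` and its default values are hypothetical and not taken from app.py.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence marker


def extract_reply_json(raw: str) -> dict:
    """Pull the first JSON object out of an LLM response (hypothetical helper)."""
    # Prefer the contents of a Markdown fence if the model added one
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    fenced = re.search(pattern, raw, re.DOTALL)
    candidate = fenced.group(1) if fenced else raw
    # Narrow to the outermost {...} span in case extra text surrounds the JSON
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    parsed = json.loads(candidate[start:end + 1])
    # Fill in the keys named in the pipeline so callers can rely on them
    parsed.setdefault("messages", [])
    parsed.setdefault("media", None)
    parsed.setdefault("cta_triggered", False)
    return parsed


raw_output = "Sure!\n" + FENCE + 'json\n{"messages": ["heyy", "hru?"], "media": null}\n' + FENCE
reply = extract_reply_json(raw_output)
print(reply["messages"], reply["cta_triggered"])
```

Raising on unparseable output, rather than returning an empty reply, lets the caller decide whether to retry the model call.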
Development Timeline & Hardware Setup
This project took 2 months of intensive development with numerous iterations and back-and-forth conversations to get the AI personality and responses just right. The system is now live in production and actively handling real conversations.
Hardware Infrastructure
- GPU: NVIDIA RTX 4060 for all ML workloads
- Local Processing: Ollama Dolphin3 model runs locally for unrestricted conversations
- Cloud Integration: OpenAI GPT-4o for high-quality responses when needed
Key Learnings
Through extensive testing, I discovered that subtle, contextually-aware CTAs significantly outperformed direct promotional messaging. The iterative process of refining conversation flows and prompt engineering was crucial for achieving natural, engaging interactions.
Advanced CTA Management
Direct promotional messages consistently underperformed in my trials. CTAs that were carefully timed and woven naturally into casual exchanges drew much better user responses. Getting them right depended on contextually appropriate prompts, which took meticulous crafting and iterative testing.
Technical Implementation Deep Dive
Vector Embedding Strategy
The production system uses ChromaDB with sentence transformers for semantic understanding:
# Real implementation from app.py
import chromadb
from chromadb.utils import embedding_functions


def initialize_vector_store():
    global chroma_client, conversation_collection, sentence_transformer_ef, cta_embeddings
    sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name="all-mpnet-base-v2"  # Production model choice
    )
    chroma_client = chromadb.PersistentClient(path=VECTOR_STORE_PATH)
    conversation_collection = chroma_client.get_or_create_collection(
        name="conversations",
        embedding_function=sentence_transformer_ef,
        metadata={"hnsw:space": "cosine"}  # Using cosine similarity
    )
    # Initialize CTA embeddings for semantic detection
    cta_examples = [
        "check out my OnlyFans for more",
        "subscribe to my exclusive content when you have time",
        "my link in bio has all the spicy content ❤️",
        "there's more on my OF if you're interested",
        "have you seen my exclusive content yet?",
        "you should totally check out the link in my bio",
        "I have something special on my page if you wanna see"
    ]
    cta_embeddings = [sentence_transformer_ef([example])[0] for example in cta_examples]


def get_relevant_history(thread_id: str, query_text: str, k: int) -> list:
    # Retrieve more results initially for better re-ranking
    initial_results = min(k * 2, 20)
    results = conversation_collection.query(
        query_texts=[query_text],
        n_results=initial_results,
        where={"thread_id": thread_id},
        include=["metadatas", "documents"]
    )
    retrieved_messages = []
    if results and results['metadatas'] and results['metadatas'][0]:
        for i in range(len(results['metadatas'][0])):
            meta = results['metadatas'][0][i]
            original_content = meta.get("original_content", results['documents'][0][i])
            retrieved_messages.append({
                "role": meta["role"],
                "content": original_content,
                "timestamp": meta["timestamp"],
                "relevance_score": 1.0 / (i + 1)  # Simple rank-based relevance scoring
            })
    # Keep the k most relevant hits first; truncating after a timestamp sort
    # would return the k oldest candidates regardless of relevance
    retrieved_messages.sort(key=lambda x: x["relevance_score"], reverse=True)
    top_messages = retrieved_messages[:k]
    # Sort by timestamp to maintain conversation flow
    top_messages.sort(key=lambda x: x["timestamp"])
    return top_messages
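Two orderings matter when assembling retrieved history: relevance can decide which messages are kept, and time can decide the order in which the model sees them. The sketch below shows one way to combine them on plain dictionaries, with no ChromaDB involved. `select_context` and the sample data are illustrative assumptions.

```python
def select_context(candidates: list, k: int) -> list:
    """Keep the k most relevant candidates, then order them chronologically."""
    by_relevance = sorted(candidates, key=lambda m: m["relevance_score"], reverse=True)
    top_k = by_relevance[:k]
    # ISO-8601 timestamps in one consistent format sort correctly as strings
    return sorted(top_k, key=lambda m: m["timestamp"])


candidates = [
    {"content": "i love hiking", "timestamp": "2024-05-01T10:00:00", "relevance_score": 1.0},
    {"content": "lol same", "timestamp": "2024-05-03T09:00:00", "relevance_score": 0.25},
    {"content": "any weekend plans?", "timestamp": "2024-05-02T18:30:00", "relevance_score": 0.5},
]
context = select_context(candidates, k=2)
print([m["content"] for m in context])  # ['i love hiking', 'any weekend plans?']
```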
API Endpoints & Production Usage
The Flask application exposes several production endpoints:
# Real API endpoints from app.py
@app.route('/generate_opener', methods=['POST'])
def generate_opener():
    """Generate conversation opening messages"""
    # Validates authentication, processes model settings,
    # generates contextual conversation starters
    pass


@app.route('/generate_reply', methods=['POST'])
def generate_reply():
    """Generate contextual replies based on conversation history"""
    # Retrieves conversation history via RAG
    # Applies user profiling and sentiment analysis
    # Returns JSON with messages, media suggestions, CTA flags
    pass


@app.route('/update_user_profile', methods=['POST'])
def update_user_profile():
    """Update user profile information for personalization"""
    # Stores user preferences, interests, communication style
    # Used for future conversation personalization
    pass


@app.route('/conversation_cleanup', methods=['POST'])
def cleanup_conversation():
    """Clean old messages to prevent memory bloat"""
    # Implements smart memory management
    # Keeps recent context while clearing old data
    pass


@app.route('/health', methods=['GET'])
def health_check():
    """System health monitoring endpoint"""
    return jsonify({
        "status": "healthy",
        "vector_store": "connected" if conversation_collection else "disconnected",
        "llm_provider": USE_API_PROVIDER
    })
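The `X-Secret-Key` check from the authentication layer is not part of the excerpt. Here is a minimal, framework-free sketch of how such a check could look, assuming the expected key comes from configuration. `is_authorized` is a hypothetical helper; in Flask, the `headers` argument would be `request.headers`.

```python
import hmac


def is_authorized(headers, expected_key: str) -> bool:
    """Compare the X-Secret-Key header against the configured secret."""
    provided = headers.get("X-Secret-Key", "")
    # compare_digest takes the same time whether or not the values match,
    # which avoids leaking the key through response-time differences
    return hmac.compare_digest(provided.encode(), expected_key.encode())


print(is_authorized({"X-Secret-Key": "s3cret"}, "s3cret"))  # True
print(is_authorized({}, "s3cret"))                          # False
```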
CTA Detection with Semantic Similarity
One of the most sophisticated features is the semantic CTA detection system from the production code:
# Real implementation from app.py
import re

import numpy as np


def detect_cta_semantic(message_content: str) -> bool:
    """
    Detect if a message contains a CTA using semantic similarity
    """
    if not cta_embeddings or not sentence_transformer_ef:
        # Fallback to keyword detection
        cta_keywords = ["onlyfans", "link in bio", "exclusive content", "subscribe", "my page"]
        message_lower = message_content.lower()
        if any(keyword in message_lower for keyword in cta_keywords):
            return True
        # Match the abbreviation "OF" only as an uppercase whole word; a lowercase
        # substring test for "of" would fire on almost every English message
        return bool(re.search(r"\bOF\b", message_content))
    try:
        # Get embedding for the message
        message_embedding = sentence_transformer_ef([message_content])[0]
        # Calculate cosine similarity with each CTA example
        max_similarity = 0.0
        for cta_embedding in cta_embeddings:
            similarity = np.dot(message_embedding, cta_embedding) / (
                np.linalg.norm(message_embedding) * np.linalg.norm(cta_embedding)
            )
            max_similarity = max(max_similarity, similarity)
        is_cta = bool(max_similarity > CTA_SIMILARITY_THRESHOLD)
        app.logger.info(f"CTA detection: max_similarity={max_similarity:.3f}, threshold={CTA_SIMILARITY_THRESHOLD}, is_cta={is_cta}")
        return is_cta
    except Exception as e:
        app.logger.error(f"Error in semantic CTA detection: {e}")
        return False
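The decision rule reduces to a maximum cosine similarity compared against a threshold. The toy sketch below shows that rule on hand-written 3-dimensional vectors, using only the standard library. The vectors and the 0.8 threshold are made-up values for illustration, not the production `CTA_SIMILARITY_THRESHOLD`.

```python
import math


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def is_cta(message_vec, cta_vecs, threshold: float = 0.8) -> bool:
    """True when the message is close enough to any reference CTA vector."""
    best = max((cosine_similarity(message_vec, v) for v in cta_vecs), default=0.0)
    return best > threshold


cta_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(is_cta([0.9, 0.1, 0.0], cta_vecs))  # True: nearly parallel to the first reference
print(is_cta([0.0, 0.0, 1.0], cta_vecs))  # False: orthogonal to both references
```

Taking the maximum over the reference set means that one close match is enough, so adding more example CTAs can only widen what is detected.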
Enhanced Personalization via User Profiling
User profiling combined sentiment analysis with transformer-based models to tailor conversations to each user's preferences. The AI remembered previous interactions and personalized its responses. Stored dialogues include preferences about styles, favorite outfits, and weekend activities, which made the interactions more relatable.
# Real implementation from app.py
import json
import uuid
from datetime import datetime


def store_user_profile_info(thread_id: str, user_info: dict):
    """Store user profile information for personalized responses"""
    if not conversation_collection:
        return
    try:
        profile_id = f"{thread_id}_profile_{uuid.uuid4().hex[:8]}"
        timestamp = datetime.utcnow().isoformat()
        conversation_collection.add(
            documents=[json.dumps(user_info)],
            # Chroma metadata values must be scalars (str, int, float, bool),
            # so the profile dict is kept in the document body only
            metadatas=[{
                "thread_id": thread_id,
                "type": "user_profile",
                "timestamp": timestamp
            }],
            ids=[profile_id]
        )
        app.logger.info(f"Stored user profile for thread {thread_id}")
    except Exception as e:
        app.logger.error(f"Error storing user profile: {e}")


def clean_old_messages(thread_id: str, keep_recent: int = 100):
    """Clean old messages to prevent memory bloat while keeping recent context"""
    if not conversation_collection:
        return
    try:
        # Get all records for this thread; get() filters on metadata without
        # needing a query embedding and always returns the record ids
        results = conversation_collection.get(
            where={"thread_id": thread_id},
            include=["metadatas"]
        )
        if not results or not results['ids']:
            return
        # Pair each id with its metadata, skipping stored user profiles
        messages_with_meta = [
            (meta, msg_id)
            for msg_id, meta in zip(results['ids'], results['metadatas'])
            if meta.get("type") != "user_profile"
        ]
        # Sort by timestamp, newest first
        messages_with_meta.sort(key=lambda x: x[0]['timestamp'], reverse=True)
        # Delete older messages beyond keep_recent threshold, by id so that
        # records in other threads sharing a timestamp are left untouched
        if len(messages_with_meta) > keep_recent:
            old_ids = [msg_id for _, msg_id in messages_with_meta[keep_recent:]]
            conversation_collection.delete(ids=old_ids)
            app.logger.info(f"Cleaned old messages for thread {thread_id}, kept {keep_recent} recent")
    except Exception as e:
        app.logger.error(f"Error cleaning old messages: {e}")
Security & Scalability Architecture
Data Privacy & Security Measures
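import hashlib
import secrets
from datetime import datetime

from cryptography.fernet import Fernet


class SecureDataHandler:
    def __init__(self):
        self.encryption_key = self._generate_encryption_key()
        self.fernet = Fernet(self.encryption_key)

    @staticmethod
    def _generate_encryption_key() -> bytes:
        # A key generated per instance cannot decrypt data written by an earlier
        # process; in practice the key has to be persisted in a secret store
        return Fernet.generate_key()

    def store_user_data(self, user_id, conversation_data):
        """
        Secure storage with encryption:
        - PII data encrypted at rest
        - Conversation history anonymized
        - User preferences hashed
        """
        # The random salt makes the ID unlinkable to the user, but it also means
        # the same ID cannot be recomputed for that user later
        anonymized_id = hashlib.sha256(f"{user_id}{secrets.token_hex(16)}".encode()).hexdigest()
        encrypted_data = self.fernet.encrypt(conversation_data.encode())
        return {
            'anonymous_id': anonymized_id,
            'encrypted_content': encrypted_data,
            'storage_timestamp': datetime.utcnow()
        }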
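If the anonymized ID has to be recomputed for the same user later, for example to look up stored conversations, a keyed hash with a fixed secret is one option. This is a sketch of that alternative, not the implementation used in the project. `pseudonymize` and `PSEUDONYM_KEY` are hypothetical names, and the key would need the same protection as the encryption key.

```python
import hashlib
import hmac

# Assumption: in a real deployment this value is loaded from a secret store
PSEUDONYM_KEY = b"replace-with-a-secret-from-your-config"


def pseudonymize(user_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Derive a stable pseudonymous ID: same user_id and key, same result."""
    return hmac.new(key, user_id.encode(), hashlib.sha256).hexdigest()


first = pseudonymize("user-123")
second = pseudonymize("user-123")
print(first == second, len(first))  # True 64
```

Without the key, the mapping cannot be rebuilt by hashing guessed user IDs, which a plain unsalted SHA-256 would allow.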
Horizontal Scaling Strategy
- Load Balancer: NGINX with round-robin distribution
- API Instances: 3x Flask applications behind Gunicorn
- Database Sharding: ChromaDB collections partitioned by user cohorts
- Caching Layer: Redis for frequently accessed user profiles
- Message Queue: Celery for background processing of sentiment analysis
Rate Limiting & Resource Management
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

# Flask-Limiter 3.x takes key_func as the first argument and app as a keyword
limiter = Limiter(
    get_remote_address,
    app=app,
    default_limits=["200 per day", "50 per hour", "1 per second"]
)


@app.route('/generate_reply', methods=['POST'])
@limiter.limit("10 per minute")
def generate_reply():
    """Rate-limited endpoint with resource monitoring"""
    pass
Technical Challenges: Memory and Thread Management
A critical challenge was managing long-term conversational memory and real-time thread management. Solutions included implementing time-based summaries and selective memory retention to maintain performance without sacrificing conversational depth. Threading was crucial for managing simultaneous interactions, ensuring responsiveness and seamless conversational continuity.
Sentiment Analysis and Behavioral Modeling
The AI employed transformer-based sentiment analysis to continuously evaluate user moods and adapt conversational approaches accordingly. This allowed interactions to remain sensitive and responsive to the user's emotional cues, significantly enhancing the conversational quality and user experience.
Sentiment Analysis Pipeline
from transformers import pipeline


class EmotionalIntelligence:
    def __init__(self):
        self.sentiment_analyzer = pipeline(
            "sentiment-analysis",
            model="cardiffnlp/twitter-roberta-base-sentiment-latest",
            device=0  # GPU acceleration
        )
        self.emotion_detector = pipeline(
            "text-classification",
            model="j-hartmann/emotion-english-distilroberta-base"
        )

    def analyze_user_state(self, message_history):
        """
        Real-time emotional state analysis:
        - Sentiment trajectory over conversation
        - Emotion intensity mapping
        - Engagement level prediction
        """
        emotions = []
        sentiments = []
        for msg in message_history[-10:]:  # Last 10 messages
            emotion = self.emotion_detector(msg['content'])[0]
            sentiment = self.sentiment_analyzer(msg['content'])[0]
            emotions.append(emotion)
            sentiments.append(sentiment)
        if not emotions:
            return None  # Nothing to analyze yet
        # _calculate_trend and _predict_engagement are helper methods
        # that are not shown in this excerpt
        return {
            'current_mood': emotions[-1]['label'],
            'mood_intensity': emotions[-1]['score'],
            'sentiment_trend': self._calculate_trend(sentiments),
            'engagement_prediction': self._predict_engagement(emotions, sentiments)
        }
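The `_calculate_trend` helper is not shown. One plausible implementation, offered as an assumption and not as the production logic, maps each sentiment result to a signed score and compares the average of the later half of the window with the earlier half. The label names assume the sentiment model returns `positive`, `neutral`, and `negative`.

```python
def signed_score(sentiment: dict) -> float:
    """Map a pipeline result to a value in [-1, 1]; neutral counts as 0."""
    label = sentiment["label"].lower()
    if label == "positive":
        return sentiment["score"]
    if label == "negative":
        return -sentiment["score"]
    return 0.0


def calculate_trend(sentiments: list, tolerance: float = 0.1) -> str:
    """Return 'improving', 'declining', or 'stable' for a list of sentiment results."""
    scores = [signed_score(s) for s in sentiments]
    if len(scores) < 2:
        return "stable"
    mid = len(scores) // 2
    earlier = sum(scores[:mid]) / mid
    later = sum(scores[mid:]) / (len(scores) - mid)
    delta = later - earlier
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "declining"
    return "stable"


history = [
    {"label": "negative", "score": 0.8},
    {"label": "neutral", "score": 0.6},
    {"label": "positive", "score": 0.7},
    {"label": "positive", "score": 0.9},
]
print(calculate_trend(history))  # improving
```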
Model Performance Analysis
Through extensive testing and optimization, I found that different models served specific purposes:
Model | Primary Use Case | Strengths |
---|---|---|
GPT-4o | High-quality responses | Superior reasoning and context understanding |
Ollama Dolphin3 | Intimate conversations | Runs locally, fewer content restrictions |
Custom Fine-tuned | Persona-specific responses | Tailored to specific conversation styles |
A/B Testing & Continuous Learning Framework
Experimental Design
import hashlib


class ConversationExperiment:
    def __init__(self):
        self.variants = {
            'control': {'cta_approach': 'baseline', 'slang_usage': 'moderate'},
            'variant_a': {'cta_approach': 'subtle', 'slang_usage': 'high'},
            'variant_b': {'cta_approach': 'contextual', 'slang_usage': 'low'}
        }

    def assign_variant(self, user_id):
        """Consistent user assignment to experimental groups"""
        hash_value = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
        return ['control', 'variant_a', 'variant_b'][hash_value % 3]

    def track_outcomes(self, user_id, variant, interaction_data):
        """Track conversation engagement and success patterns"""
        metrics = {
            'user_id': user_id,
            'variant': variant,
            'engagement_signals': interaction_data['user_responses'],
            'conversation_flow': interaction_data['message_progression'],
            'contextual_relevance': interaction_data['topic_alignment']
        }
        # _log_experiment_data is not shown in this excerpt
        self._log_experiment_data(metrics)
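Hash-based assignment keeps a user in the same group across sessions without storing the assignment anywhere. Below is a standalone sketch of the same idea. It uses SHA-256 in place of MD5 (either works here, since the hash only does bucketing), and the free function `assign_variant` and the sample IDs are illustrative.

```python
import hashlib
from collections import Counter

VARIANTS = ["control", "variant_a", "variant_b"]


def assign_variant(user_id: str) -> str:
    """Map a user ID to a variant deterministically."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]


# The same input always lands in the same bucket
assert assign_variant("user-42") == assign_variant("user-42")

# Rough check of how evenly many IDs are spread across the groups
counts = Counter(assign_variant(f"user-{i}") for i in range(3000))
print(counts)
```

Python's built-in `hash()` would not work for this purpose, because string hashes are randomized per process unless `PYTHONHASHSEED` is fixed.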
Real-time Model Adaptation
- Online Learning: Continuous model updates based on conversation success
- Feedback Loops: User response patterns inform prompt adjustments
- Performance Monitoring: Real-time dashboard tracking key metrics
- Automated Rollback: System reverts to previous version if performance degrades
Data Optimization and Summarization
Periodic summarization routines were essential to maintain the ChromaDB vector database efficiently. By clearing outdated information and summarizing relevant interactions, I ensured optimal performance while preserving critical conversational context.
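Here is a sketch of how a summarize-then-prune routine could be structured, assuming messages carry ISO-8601 `timestamp` strings. `split_for_summary` is a hypothetical helper, and `summarize` is a stand-in for whatever produces the summary (in practice probably an LLM call).

```python
def split_for_summary(messages: list, keep_recent: int):
    """Return (to_summarize, to_keep): older messages are condensed, recent ones stay verbatim."""
    ordered = sorted(messages, key=lambda m: m["timestamp"])
    if keep_recent <= 0:
        return ordered, []
    return ordered[:-keep_recent], ordered[-keep_recent:]


def summarize(messages: list) -> str:
    # Placeholder that only joins the texts; a real system would call a model here
    return " | ".join(m["content"] for m in messages)


messages = [
    {"content": "hey!", "timestamp": "2024-05-01T10:00:00"},
    {"content": "i just got back from a hike", "timestamp": "2024-05-01T10:01:00"},
    {"content": "what are you up to this weekend?", "timestamp": "2024-05-03T20:00:00"},
]
old, recent = split_for_summary(messages, keep_recent=1)
print(summarize(old))                  # hey! | i just got back from a hike
print([m["content"] for m in recent])  # ['what are you up to this weekend?']
```

The summary can then be stored as a single record in place of the older messages, so the thread's footprint stays bounded while its gist remains retrievable.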
Technical Roadmap & Advanced Features
Phase 2: Multi-Modal Intelligence
- Voice Analysis: Integrate speech-to-text for voice message understanding
- Intelligent Image Selection: AI will automatically select appropriate images from a curated pool based on detected user sentiment and conversation context
- Image Context: Computer vision for profile photo and media analysis
- Video Processing: Automated content tagging and context extraction
Phase 3: Advanced AI Capabilities
# Planned implementations
upcoming_features = {
    'sentiment_based_media': 'Smart image selection based on emotional context',
    'multimodal_embeddings': 'CLIP-based image-text understanding',
    'voice_synthesis': 'ElevenLabs integration for voice responses',
    'predictive_analytics': 'User behavior prediction models',
    'real_time_personalization': 'Dynamic persona adaptation',
    'cross_platform_sync': 'Unified user experience across platforms'
}
Phase 4: Autonomous Optimization
- Self-improving prompts: AI-driven prompt optimization
- Dynamic media matching: Emotional intelligence-driven image and content selection
- Predictive user matching: AI-powered compatibility scoring
- Emotional journey mapping: Long-term relationship modeling
Broader Applications and Future Directions
While initially targeted at the OnlyFans platform, the AI's methodologies have broader implications in customer support, personalized marketing, and advanced virtual assistants. Future improvements will focus on deeper personalization, more advanced memory management, and enhanced contextual summarization.
Conclusion
This AI project highlighted the importance of nuanced CTA management, personalized interaction, semantic understanding, and efficient memory handling. These insights contribute substantially toward creating highly effective and genuinely engaging conversational AI systems across various platforms.
The success of this system validates the importance of understanding both the technical architecture and the human psychology behind effective conversational AI. These learnings continue to inform my work at DevsDose, where we help startups and creators build similar AI-powered solutions for their unique needs.
If you're interested in building conversational AI systems or exploring how automation can enhance user engagement, feel free to learn more about my background or reach out to discuss your project requirements.
Stay tuned for more deep dives into AI technology, innovative project developments, and technical methodologies.