Control your stack, dump the black boxes
Author: Karan Prasad (@thtskaran)
Hexafalls: When Orchestrators Fail and Bare‑Bones RAG Wins
TL;DR: Two zombies (me and my teammate) walked into a 36‑hour hackathon (Hexafalls) with zero ideas, burned 7 hours just to pick one, chose shiny orchestration/agentic stacks (Pathway + Agno), fought dependency hell and phantom bugs, rage‑built our own microservice + minimal RAG layer in the last stretch, and shipped a real‑time Google Drive knowledge base that didn't hallucinate. We bagged 1st prize and a hard lesson: the less you depend on opaque abstractions, the happier your code (and brain) stay.
Scene Setting: Three Hackathons, One Brain Cell Left
- Week load: 3 hackathons in ~7 days. One literally the day before Hexafalls.
- Team Ingenico: Just two of us, me & my teammate.
- Energy level: "I can code with my eyes closed" level sleep deprivation.
- Time spent to decide the idea: ~7 hours of a 36‑hour event.
We finally settled on "Self-updating Knowledge Base over Google Drive (Google Workspace Docs)": simple, useful, mentor-approved. The catch? The architecture had to be tight.
The Original Plan (a.k.a. The Abstraction Dream)
Stack we thought would make life easier
- Ingestion/Orchestration: Pathway (could also extend to S3 later).
- RAG / Agentic Layer: Agno (agentic framework we’ve used before and liked).
- Embeddings: OpenAI `text-embedding-3-small`.
- Vector Store: MongoDB (single collection, keep it stupid-simple).
Why this looked good on paper
- Rapid prototyping under time pressure.
- Pathway promised change tracking & pipelines.
- Agno promised plug‑and‑play agent + RAG scaffolding.
- We didn’t want to reinvent the wheel… until the wheel fell off.
Reality Check #1: Pathway Orchestration Implodes
Not enough docs. Sparse examples. Ancient dependencies. Random crashes. Garbage embeddings. Silent failures. Pick your poison.
What bit us:
- Outdated libs: One dependency hadn’t seen love in ~9 years.
- Weird embedding outputs: Non‑deterministic results, inexplicable vector sizes.
- Random runtime errors: Things failed after working once. Repro? Good luck.
Time sink: many hours trying to coerce it into stability. No dice.
Pivot #1: Roll Our Own Orchestrator
"Screw it, I’ll just write the damn service."
What we built instead:
- A tiny microservice using the Google Drive API (GCP).
- Real-time change tracking using Drive's `changes` endpoint (no 2‑minute SHA-256 polling nonsense).
- Embeddings with OpenAI → stored in MongoDB (single collection).
Under the hood of the Drive job ("orchestrator")
- Auth & scope: Google Service Account (
client.json
) withdrive.readonly
. - Bootstrap:
embed_all_existing_files()
recursively scans the watch folder, exports each doc as plain text (or downloads binary) and embeds. - Change stream: Uses Drive
changes().list()
withstartPageToken
/newStartPageToken
instead of 2‑min SHA256 polling. - Folder ancestry check:
ancestors_include_target()
walks parent chains so nested files still trigger updates. - Chunking: Fixed
CHUNK_SIZE_CHARS = 2000
char windows;chunk_text()
ensures at least one chunk. - Embedding: OpenAI
text-embedding-3-small
; batch call once per file. - Upserts:
bulk_write()
withUpdateOne
+$set
/$setOnInsert
, UUID as_id
, per‑chunkmeta_data.chunk
used as idempotent key. - Interval: Tight
POLL_INTERVAL_SEC = 5
seconds; good for demo, would back off in prod. - Logging: Structured
logging
across Mongo/Google clients; early ping to crash fast on DB issues.
Mongo schema (simplified):
```json
{
  "file_id": "<gdrive_file_id>",
  "chunk_id": "<uuid>",
  "text": "<chunk_text>",
  "embedding": [0.0123, -0.98, ...],
  "updated_at": "2025-06-29T12:34:56Z"
}
```
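The upsert is what makes re-embedding idempotent, so it's worth spelling out. A minimal sketch assuming pymongo; `upsert_chunks()`, the local URI, and the exact field layout are illustrative stand-ins, not a dump of our actual code:

```python
import uuid
from datetime import datetime, timezone
from pymongo import MongoClient, UpdateOne

col = MongoClient("mongodb://localhost:27017")["kb_db"]["embeddings"]

def upsert_chunks(file_id: str, chunks: list[str], vectors: list[list[float]]) -> None:
    ops = []
    for i, (text, vec) in enumerate(zip(chunks, vectors)):
        ops.append(UpdateOne(
            {"file_id": file_id, "meta_data.chunk": i},  # idempotent key
            {
                "$set": {  # refreshed on every re-embed
                    "text": text,
                    "embedding": vec,
                    "updated_at": datetime.now(timezone.utc).isoformat(),
                },
                "$setOnInsert": {"_id": str(uuid.uuid4())},  # UUID only on first insert
            },
            upsert=True,
        ))
    if ops:
        col.bulk_write(ops, ordered=False)  # one round-trip per file
```

Re-running the same file overwrites its chunks in place instead of piling up duplicates.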
Reality Check #2: Agno Starts Gaslighting Us
Agno has good docs. We’d used it before. Still…
Pain points:
- Changing Mongo connection strings sometimes worked, sometimes ghosted.
- Swapping embedding models/collection names broke flows randomly.
- We literally created DBs named `bodhi1`, `bodhi2`, `bodhi23` just to trick it into behaving.
The Cruel Twist
- First evaluation demo: Smooth. Real-time updates showed up; mentors smiled.
- One-hour nap later: Dead. Agno stopped reading vectors; QA answers vanished.
We burned most of the remaining time trying to revive it. Sleep deprivation turned into actual hallucinations.
Pivot #2: Bare‑Bones RAG to the Rescue
We took a hallway walk (JIS Kolkata campus is pretty, btw), came back, and said: "Strip it. Build only what we control."
What we replaced Agno with
- Retriever: Cosine similarity over Mongo‑stored embeddings.
- Context Builder: Top‑k chunks concatenated with metadata.
- Prompt Template: Simple system instructions to reduce hallucination.
- LLM Call: Plain OpenAI chat/completions.
Minimal retrieval code (illustrative):
```python
import os
import numpy as np
from pymongo import MongoClient
from openai import OpenAI
from sklearn.metrics.pairwise import cosine_similarity

# ─── Config ───
MONGO_URI = os.getenv("MONGO_URI")
DB_NAME = "kb_db"
COL_NAME = "embeddings"
EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL = "gpt-4o"
TOP_K = 5

# ─── Clients ───
openai = OpenAI()
col = MongoClient(MONGO_URI)[DB_NAME][COL_NAME]

# ─── Embedding & Retrieval ───
def get_query_embedding(q: str):
    return openai.embeddings.create(model=EMBED_MODEL, input=[q]).data[0].embedding

def find_similar_chunks(q_vec, k: int = TOP_K):
    docs = list(col.find({}, {"text": 1, "embedding": 1, "name": 1, "_id": 0}))
    if not docs:
        return []
    mat = np.array([d["embedding"] for d in docs])
    scores = cosine_similarity([q_vec], mat)[0]
    idx = scores.argsort()[::-1][:k]
    return [docs[i]["text"] for i in idx]

# ─── Answer generator ───
def answer(query: str):
    q_vec = get_query_embedding(query)
    chunks = find_similar_chunks(q_vec)
    if not chunks:
        return "❌ No relevant document chunks found."
    context = "\n---\n".join(chunks)
    resp = openai.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": "Use ONLY the context. If it's missing, say you don't know."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion:\n{query}"},
        ],
        temperature=0.4,
        top_p=0.6,
        max_tokens=1200,
    )
    return resp.choices[0].message.content.strip()
```
(We actually pushed cosine into Python to avoid nasty $map gymnastics, but you get the idea.)
Under the hood of the bare‑bones RAG layer
- Retriever: Load embeddings from Mongo into a NumPy matrix (fine at hackathon scale) and run cosine similarity (`sklearn.metrics.pairwise.cosine_similarity`).
- TOP_K: 5 chunks.
- Prompting: One strict system message + user message with context; deterministic-ish params (`temperature=0.4`, `top_p=0.6`).
- Output: Plain text. No Pydantic / `parse()`; keep it dumb and reliable.
- CLI Debug: Print the top chunks' previews to eyeball relevance quickly (snippet after this list).
- Fail-safe: If no chunk is retrieved, return a clear "don't know" message instead of hallucinating.
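That CLI debug flow is about ten lines. A hypothetical version, assuming the retrieval helpers above live in a module called `kb`:

```python
# Eyeball retrieval quality before trusting the answer.
import sys

from kb import answer, find_similar_chunks, get_query_embedding

if __name__ == "__main__":
    query = " ".join(sys.argv[1:]) or "What changed in the latest doc?"  # placeholder query
    for rank, chunk in enumerate(find_similar_chunks(get_query_embedding(query)), 1):
        print(f"[{rank}] {chunk[:120]!r}")  # 120-char preview per retrieved chunk
    print()
    print(answer(query))
```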
Result
- Stable.
- Zero hallucinations in finals.
- Judges leaned in.
- We won 1st prize.
Lessons I Took Home (Besides Goodies and The Prize)
- Abstraction Tax Is Real: Fancy orchestration/agent frameworks save time, until they don't. When they fail, you pay interest + penalty.
- Understand the Critical Path: For us, it was: ingest → embed → store → retrieve → answer. Anything on that path must be under your control.
- Logs > Magic: Microservices + explicit logs beat hidden internal state every time at 3 AM.
- Design for Failure Swaps: Keep your components swappable (e.g., changing embedding model or vector store shouldn’t nuke your stack).
- Demo Early, Demo Often: The first eval worked and the final died; catch such regressions earlier with watchdog scripts and regression tests.
- Sleep Is a Feature: Micro‑naps are fine. Just make sure your stack doesn’t turn into Schrödinger’s system while you’re out.
"If We Had Another 12 Hours…"
- Add a reranker (e.g., Cohere Rerank or OpenAI re‑embed) for better context quality.
- Implement chunk caching + diffing (embed only changed parts of a doc).
- Build a config-driven layer (YAML) so swapping DB/LLM is a flag, not a code edit (sketch below).
- Dispatch webhooks to update downstream apps when Drive changes.
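For the config-driven layer, the shape we had in mind was something like this (entirely hypothetical; nothing like it shipped):

```yaml
# Hypothetical config: swapping the embedder, DB, or LLM becomes a flag edit.
embedder:
  provider: openai
  model: text-embedding-3-small
vector_store:
  kind: mongodb
  uri: ${MONGO_URI}
  db: kb_db
  collection: embeddings
llm:
  provider: openai
  model: gpt-4o
  temperature: 0.4
  top_p: 0.6
retrieval:
  top_k: 5
```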
A Quick Anti-Abstraction Checklist
Before you grab the next shiny orchestrator, ask:
Q1. Can I wire ingest → embed → store → retrieve → answer myself last-minute?
Q2. Do I have logs/metrics at each hop (API call, chunking, DB write, retrieval, LLM)?
Q3. Can I hot‑swap the embedder/vector DB/LLM with just env/config, not code surgery?
Q4. Are the deps/docs maintained (recent commits, examples for my exact flow)?
Q5. Do I have a smoke test/watchdog that runs end‑to‑end after every tweak (or nap)? (See the sketch below.)
Q6. What's my fallback plan if this lib dies at 3 AM? How fast can I drop to primitives?
If you answer "no" too many times, maybe… ship the bare bones first.
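And that Q5 watchdog really can be dumb. A sketch assuming `answer()` is importable from a `kb` module and that a canary doc containing a known token is seeded in the watched folder:

```python
# Exercise the query path end-to-end (embed → retrieve → answer) against a
# canary doc the ingest side must have picked up; scream loudly on failure.
import sys
import time

from kb import answer

CANARY_QUERY = "What is the team name?"  # assumed seeded canary doc
CANARY_TOKEN = "ingenico"                # expected substring in the answer

def smoke_test() -> bool:
    try:
        return CANARY_TOKEN in answer(CANARY_QUERY).lower()
    except Exception as exc:
        print(f"watchdog: pipeline crashed: {exc}", file=sys.stderr)
        return False

while True:
    if not smoke_test():
        print("watchdog: END-TO-END FAILURE, wake somebody up", file=sys.stderr)
    time.sleep(300)  # every 5 minutes
```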
Architecture: Framework vs. Bare Bones
```
┌──────────────┐          ┌─────────┐          ┌──────────┐          ┌─────────┐
│ Google Drive │ ───────▶ │ Pathway │ ───────▶ │   Agno   │ ───────▶ │   LLM   │
└──────────────┘          └─────────┘          └──────────┘          └─────────┘
                               ▲                     ▲                    ▲
                               │                     │                    │
                               └──────── pain ───────┴────────────────────┘
```
The project had a few more components, but here I'm only showing what we replaced the abstractions with.
Our final pipeline:
```
┌──────────────┐  changes API  ┌──────────────────┐  embed  ┌──────────┐  cosine  ┌──────────┐
│ Google Drive │ ─────────────▶│ GDrive Microserv │────────▶│ MongoDB  │─────────▶│  Prompt  │
└──────────────┘               └──────────────────┘         └──────────┘          └────┬─────┘
                                                                                       │
                                                                                       ▼
                                                                                      LLM
```
What You Can Reuse (Feel Free)
- Drive change tracking with `changes.list` and `startPageToken`.
- Mongo single-collection strategy (simplifies joins & lookups under time pressure).
- Minimal RAG template with deterministic retrieval + strict prompting.
- Minimal RAG template with deterministic retrieval + strict prompting.
Closing Thoughts
We didn’t win because we had the fanciest stack. We won because when the abstractions crumbled, we knew the primitives well enough to rebuild, fast.
Build your own lever before you rely on someone else’s pulleys.
PS: Shout-outs
- My teammate, for trusting the last-minute pivot even while half-asleep.
- The mentors who liked the idea and pushed us to keep it simple.
- JIS Kolkata campus hallways for being the best debugging rubber duck.
Questions, code, or architecture deep-dive? Ping me; happy to expand any part.