Control your stack: dump the black boxes.
By Karan Prasad (@thtskaran)
Hexafalls: When Orchestrators Fail and Bare‑Bones RAG Wins
TL;DR: Two zombies (me and my teammate) walked into a 36‑hour hackathon (Hexafalls) with zero ideas, burned 7 hours just to pick one, chose shiny orchestration/agentic stacks (Pathway + Agno), fought dependency hell and phantom bugs, rage‑built our own microservice + minimal RAG layer in the last stretch—and shipped a real‑time Google Drive knowledge base that didn't hallucinate. We bagged 1st prize and a hard lesson: the less you depend on opaque abstractions, the happier your code—and brain—stay.
Scene Setting: Three Hackathons, One Brain Cell Left
- Week load: 3 hackathons in ~7 days. One literally the day before Hexafalls.
- Team Ingenico: Just two of us—me & my teammate.
- Energy level: the "I can code with my eyes closed" stage of sleep deprivation.
- Time spent to decide the idea: ~7 hours of a 36‑hour event.
We finally settled on: "Self-updating Knowledge Base over Google Drive (Google Workspace Docs)"—simple, useful, mentor-approved. The catch? The architecture had to be tight.
The Original Plan (a.k.a. The Abstraction Dream)
Stack we thought would make life easier
- Ingestion/Orchestration: Pathway (could also extend to S3 later).
- RAG / Agentic Layer: Agno (agentic framework we’ve used before and liked).
- Embeddings: OpenAI `text-embedding-3-small`.
- Vector Store: MongoDB (single collection, keep it stupid-simple).
Why this looked good on paper
- Rapid prototyping under time pressure.
- Pathway promised change tracking & pipelines.
- Agno promised plug‑and‑play agent + RAG scaffolding.
- We didn’t want to reinvent the wheel… until the wheel fell off.
Reality Check #1: Pathway Orchestration Implodes
Not enough docs. Sparse examples. Ancient dependencies. Random crashes. Garbage embeddings. Silent failures. Pick your poison.
What bit us:
- Outdated libs: One dependency hadn’t seen love in ~9 years.
- Weird embedding outputs: Non‑deterministic results, inexplicable vector sizes.
- Random runtime errors: Things failed after working once. Repro? Good luck.
Time sink: “Many hours” trying to coerce it into stability. No dice.
Pivot #1: Roll Our Own Orchestrator
"Screw it, I’ll just write the damn service."
What we built instead:
- A tiny microservice using the Google Drive API (GCP).
- Real-time change tracking using Drive's `changes` endpoint (no 2‑min SHA256 polling nonsense).
- Embeddings with OpenAI → stored in MongoDB (single collection).
Under the hood of the Drive job ("orchestrator")
- Auth & scope: Google Service Account (`client.json`) with `drive.readonly`.
- Bootstrap: `embed_all_existing_files()` recursively scans the watch folder, exports each doc as plain text (or downloads the binary), and embeds it.
- Change stream: Uses Drive `changes().list()` with `startPageToken`/`newStartPageToken` instead of 2‑min SHA256 polling (sketched below).
- Folder ancestry check: `ancestors_include_target()` walks parent chains so nested files still trigger updates.
- Chunking: Fixed `CHUNK_SIZE_CHARS = 2000` character windows; `chunk_text()` ensures at least one chunk.
- Embedding: OpenAI `text-embedding-3-small`; one batch call per file.
- Upserts: `bulk_write()` with `UpdateOne` + `$set`/`$setOnInsert`, a UUID as `_id`, and per‑chunk `meta_data.chunk` used as an idempotent key.
- Interval: Tight `POLL_INTERVAL_SEC = 5` seconds; good for a demo, would back off in prod.
- Logging: Structured `logging` across the Mongo/Google clients; an early ping to crash fast on DB issues.
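A minimal sketch of that change-tracking loop, assuming the standard `google-api-python-client` stack; `handle_change` is a hypothetical callback standing in for the re-embed/delete logic, and the ancestry check and error handling are elided:

```python
import time

from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]
creds = service_account.Credentials.from_service_account_file("client.json", scopes=SCOPES)
drive = build("drive", "v3", credentials=creds)

CHUNK_SIZE_CHARS = 2000
POLL_INTERVAL_SEC = 5

def chunk_text(text: str, size: int = CHUNK_SIZE_CHARS) -> list[str]:
    # Fixed-size character windows; always yield at least one chunk.
    return [text[i:i + size] for i in range(0, len(text), size)] or [""]

def watch_changes(handle_change):
    # Grab a cursor once, then repeatedly ask Drive "what changed since this token?"
    token = drive.changes().getStartPageToken().execute()["startPageToken"]
    while True:
        page = token
        while page:
            resp = drive.changes().list(pageToken=page, spaces="drive").execute()
            for change in resp.get("changes", []):
                handle_change(change)  # e.g. re-embed or delete chunks for change["fileId"]
            page = resp.get("nextPageToken")
            # The last page carries the cursor for the next polling round.
            token = resp.get("newStartPageToken", token)
        time.sleep(POLL_INTERVAL_SEC)
```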
Mongo schema (simplified):
```json
{
  "file_id": "<gdrive_file_id>",
  "chunk_id": "<uuid>",
  "text": "<chunk_text>",
  "embedding": [0.0123, -0.98, ...],
  "updated_at": "2025-06-29T12:34:56Z"
}
```
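The upsert itself is a few lines of PyMongo. A sketch following the schema above (`upsert_chunks` is a hypothetical name; `col` is the collection handle):

```python
import uuid
from datetime import datetime, timezone

from pymongo import UpdateOne

def upsert_chunks(col, file_id: str, chunks: list[str], vectors: list[list[float]]):
    ops = []
    for i, (text, vec) in enumerate(zip(chunks, vectors)):
        ops.append(UpdateOne(
            # (file_id, chunk index) is the idempotent key: re-embedding a
            # changed file overwrites its chunks instead of duplicating them.
            {"file_id": file_id, "meta_data.chunk": i},
            {"$set": {"text": text,
                      "embedding": vec,
                      "updated_at": datetime.now(timezone.utc).isoformat()},
             "$setOnInsert": {"_id": str(uuid.uuid4())}},
            upsert=True,
        ))
    if ops:
        col.bulk_write(ops)
```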
Reality Check #2: Agno Starts Gaslighting Us
Agno has good docs. We’d used it before. Still…
Pain points:
- Changing Mongo connection strings sometimes worked, sometimes ghosted.
- Swapping embedding models/collection names broke flows randomly.
- We literally created DBs named `bodhi1`, `bodhi2`, `bodhi23` just to trick it into behaving.
The Cruel Twist
- First evaluation demo: Smooth. Real-time updates showed up; mentors smiled.
- One hour nap later: Dead. Agno stopped reading vectors; QA answers vanished.
We burned most of the remaining time trying to revive it. Sleep deprivation turned into actual hallucinations.
Pivot #2: Bare‑Bones RAG to the Rescue
We took a hallway walk (JIS Kolkata campus is pretty, btw), came back, and said: "Strip it. Build only what we control."
What we replaced Agno with
- Retriever: Cosine similarity over Mongo‑stored embeddings.
- Context Builder: Top‑k chunks concatenated with metadata.
- Prompt Template: Simple system instructions to reduce hallucination.
- LLM Call: Plain OpenAI chat/completions.
Minimal retrieval code (illustrative):
```python
import os

import numpy as np
from openai import OpenAI
from pymongo import MongoClient
from sklearn.metrics.pairwise import cosine_similarity

# ─── Config ───
MONGO_URI = os.getenv("MONGO_URI")
DB_NAME = "kb_db"
COL_NAME = "embeddings"
EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL = "gpt-4o"
TOP_K = 5

# ─── Clients ───
openai = OpenAI()
col = MongoClient(MONGO_URI)[DB_NAME][COL_NAME]

# ─── Embedding & retrieval ───
def get_query_embedding(q: str) -> list[float]:
    return openai.embeddings.create(model=EMBED_MODEL, input=[q]).data[0].embedding

def find_similar_chunks(q_vec, k: int = TOP_K) -> list[str]:
    # Hackathon scale: pull every embedding into memory and brute-force cosine.
    docs = list(col.find({}, {"content": 1, "embedding": 1, "name": 1, "_id": 0}))
    if not docs:
        return []
    mat = np.array([d["embedding"] for d in docs])
    scores = cosine_similarity([q_vec], mat)[0]
    idx = scores.argsort()[::-1][:k]
    return [docs[i]["content"] for i in idx]

# ─── Answer generator ───
def answer(query: str) -> str:
    q_vec = get_query_embedding(query)
    chunks = find_similar_chunks(q_vec)
    if not chunks:
        return "❌ No relevant document chunks found."
    context = "\n---\n".join(chunks)
    resp = openai.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": "Use ONLY the context. If it's missing, say you don't know."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion:\n{query}"},
        ],
        temperature=0.4, top_p=0.6, max_tokens=1200,
    )
    return resp.choices[0].message.content.strip()
```
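Usage is a one-liner (the question is whatever your corpus can actually answer):

```python
if __name__ == "__main__":
    print(answer("What changed in the onboarding doc this week?"))
```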
(We actually pushed cosine into Python to avoid nasty `$map` gymnastics, but you get the idea.)
Under the hood of the bare‑bones RAG layer
- Retriever: Load embeddings from Mongo into a NumPy matrix (hackathon scale) and run cosine similarity (`sklearn.metrics.pairwise.cosine_similarity`).
- TOP_K: 5 chunks.
- Prompting: One strict system message + a user message with context; deterministic-ish params (`temperature=0.4`, `top_p=0.6`).
- Output: Plain text. No Pydantic/`parse()`; keep it dumb and reliable.
- CLI debug: Print previews of the top chunks to eyeball relevance quickly (sketched below).
- Fail-safe: If no chunk is retrieved, return a clear "don't know" message instead of hallucinating.
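The CLI debug helper is tiny; a sketch on top of the functions above (`debug_preview` is a hypothetical name):

```python
def debug_preview(query: str, n: int = TOP_K) -> None:
    # Print the first 120 chars of each retrieved chunk to sanity-check relevance.
    q_vec = get_query_embedding(query)
    for rank, chunk in enumerate(find_similar_chunks(q_vec, n), start=1):
        print(f"[{rank}] {chunk[:120]!r}")
```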
Result
- Stable.
- Zero hallucinations in finals.
- Judges leaned in.
- We won 1st prize.
Lessons I Took Home (Besides Goodies and The Prize)
- Abstraction Tax Is Real: Fancy orchestration/agent frameworks save time—until they don’t. When they fail, you pay interest + penalty.
- Understand the Critical Path: For us, it was: ingest → embed → store → retrieve → answer. Anything on that path must be under your control.
- Logs > Magic: Microservices + explicit logs beat hidden internal state every time at 3 AM.
- Design for Failure Swaps: Keep your components swappable (e.g., changing embedding model or vector store shouldn’t nuke your stack).
- Demo Early, Demo Often: First eval worked, final died—catch such regressions earlier with watchdog scripts and regression tests.
- Sleep Is a Feature: Micro‑naps are fine. Just make sure your stack doesn’t turn into Schrödinger’s system while you’re out.
"If We Had Another 12 Hours…"
- Add a reranker (e.g., Cohere Rerank or OpenAI re‑embed) for better context quality.
- Implement chunk caching + diffing so we embed only the changed parts of a doc (see the sketch after this list).
- Build a config-driven layer (YAML) so swapping DB/LLM is a flag, not a code edit.
- Dispatch webhooks to update downstream apps when Drive changes.
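The caching half is mostly bookkeeping. A hypothetical hash-and-skip helper, assuming the per-chunk hashes live next to the chunks in Mongo:

```python
import hashlib

def changed_chunk_indices(chunks: list[str], cached_hashes: dict[int, str]) -> list[int]:
    # Re-embed only chunks whose content hash differs from what we saw last time.
    dirty = []
    for i, text in enumerate(chunks):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if cached_hashes.get(i) != digest:
            dirty.append(i)
            cached_hashes[i] = digest
    return dirty
```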
A Quick Anti-Abstraction Checklist
Before you grab the next shiny orchestrator, ask:
Q1. Can I wire ingest → embed → store → retrieve → answer myself last-minute?
Q2. Do I have logs/metrics at each hop (API call, chunking, DB write, retrieval, LLM)?
Q3. Can I hot‑swap the embedder/vector DB/LLM with just env/config, not code surgery?
Q4. Are the deps/docs maintained (recent commits, examples for my exact flow)?
Q5. Do I have a smoke test/watchdog that runs end‑to‑end after every tweak (or nap)?
Q6. What’s my fallback plan if this lib dies at 3 AM—how fast can I drop to primitives?
If you answer "no" too many times, maybe… ship the bare bones first.
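On Q5, the watchdog doesn't need a framework. A hypothetical end-to-end smoke test over the `answer()` function from earlier:

```python
def smoke_test() -> None:
    # One full pass: embed a known question, retrieve, and generate.
    # Run it after every tweak (or nap); crash loudly on regressions.
    reply = answer("What does the test document say about onboarding?")
    assert reply and "don't know" not in reply.lower(), "RAG pipeline regressed"
```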
Architecture: Framework vs. Bare Bones
```
┌──────────────┐         ┌─────────┐         ┌──────────┐         ┌──────────┐
│ Google Drive │ ──────▶ │ Pathway │ ──────▶ │   Agno   │ ──────▶ │   LLM    │
└──────────────┘         └─────────┘         └──────────┘         └──────────┘
                              ▲                   ▲                    ▲
                              │                   │                    │
                              └─────── pain ──────┴────────────────────┘
```
The project had a few more components, but here I'm only showing what we replaced the abstractions with.
Our final pipeline:
```
┌──────────────┐  changes API  ┌──────────────────┐  embed  ┌──────────┐  cosine  ┌──────────┐
│ Google Drive │ ─────────────▶│ GDrive Microserv │────────▶│ MongoDB  │─────────▶│  Prompt  │
└──────────────┘               └──────────────────┘         └──────────┘          └────┬─────┘
                                                                                       │
                                                                                       ▼
                                                                                      LLM
```
What You Can Reuse (Feel Free)
- Drive change tracking with `changes.list` and `startPageToken`.
- Mongo single-collection strategy (simplifies joins & lookups under time pressure).
- Minimal RAG template with deterministic retrieval + strict prompting.
Closing Thoughts
We didn’t win because we had the fanciest stack. We won because when the abstractions crumbled, we knew the primitives well enough to rebuild—fast.
Build your own lever before you rely on someone else’s pulleys.
PS: Shout-outs
- My teammate, for trusting the last-minute pivot even while half-asleep.
- The mentors who liked the idea and pushed us to keep it simple.
- JIS Kolkata campus hallways for being the best debugging rubber duck.
Questions, code, or architecture deep-dive? Ping me—happy to expand any part.