Control your stack, dump the black boxes
Author: Karan Prasad (@thtskaran)
Hexafalls: When Orchestrators Fail and Bare‑Bones RAG Wins
TL;DR: Two zombies (me and my teammate) walked into a 36‑hour hackathon (Hexafalls) with zero ideas, burned 7 hours just to pick one, chose shiny orchestration/agentic stacks (Pathway + Agno), fought dependency hell and phantom bugs, rage‑built our own microservice + minimal RAG layer in the last stretch, and shipped a real‑time Google Drive knowledge base that didn't hallucinate. We bagged 1st prize and a hard lesson: the less you depend on opaque abstractions, the happier your code (and brain) stay.
Scene Setting: Three Hackathons, One Brain Cell Left
- Week load: 3 hackathons in ~7 days. One literally the day before Hexafalls.
- Team Ingenico: Just two of us, me & my teammate.
- Energy level: "I can code with my eyes closed" level sleep deprivation.
- Time spent to decide the idea: ~7 hours of a 36‑hour event.
We finally settled on "Self-updating Knowledge Base over Google Drive (Google Workspace Docs)": simple, useful, mentor-approved. The catch? The architecture had to be tight.
The Original Plan (a.k.a. The Abstraction Dream)
Stack we thought would make life easier
- Ingestion/Orchestration: Pathway (could also extend to S3 later).
- RAG / Agentic Layer: Agno (agentic framework we’ve used before and liked).
- Embeddings: OpenAI `text-embedding-3-small`.
- Vector Store: MongoDB (single collection, keep it stupid-simple).
Why this looked good on paper
- Rapid prototyping under time pressure.
- Pathway promised change tracking & pipelines.
- Agno promised plug‑and‑play agent + RAG scaffolding.
- We didn’t want to reinvent the wheel… until the wheel fell off.
Reality Check #1: Pathway Orchestration Implodes
Not enough docs. Sparse examples. Ancient dependencies. Random crashes. Garbage embeddings. Silent failures. Pick your poison.
What bit us:
- Outdated libs: One dependency hadn’t seen love in ~9 years.
- Weird embedding outputs: Non‑deterministic results, inexplicable vector sizes.
- Random runtime errors: Things failed after working once. Repro? Good luck.
Time sink: many hours trying to coerce it into stability. No dice.
Pivot #1: Roll Our Own Orchestrator
"Screw it, I’ll just write the damn service."
What we built instead:
- A tiny microservice using the Google Drive API (GCP).
- Real-time change tracking using Drive's `changes` endpoint (no 2‑minute SHA-256 polling nonsense).
- Embeddings with OpenAI → stored in MongoDB (single collection).
Under the hood of the Drive job ("orchestrator")
- Auth & scope: Google Service Account (
client.json
) withdrive.readonly
. - Bootstrap:
embed_all_existing_files()
recursively scans the watch folder, exports each doc as plain text (or downloads binary) and embeds. - Change stream: Uses Drive
changes().list()
withstartPageToken
/newStartPageToken
instead of 2‑min SHA256 polling. - Folder ancestry check:
ancestors_include_target()
walks parent chains so nested files still trigger updates. - Chunking: Fixed
CHUNK_SIZE_CHARS = 2000
char windows;chunk_text()
ensures at least one chunk. - Embedding: OpenAI
text-embedding-3-small
; batch call once per file. - Upserts:
bulk_write()
withUpdateOne
+$set
/$setOnInsert
, UUID as_id
, per‑chunkmeta_data.chunk
used as idempotent key. - Interval: Tight
POLL_INTERVAL_SEC = 5
seconds; good for demo, would back off in prod. - Logging: Structured
logging
across Mongo/Google clients; early ping to crash fast on DB issues.
Mongo schema (simplified):
```json
{
  "file_id": "<gdrive_file_id>",
  "chunk_id": "<uuid>",
  "text": "<chunk_text>",
  "embedding": [0.0123, -0.98, ...],
  "updated_at": "2025-06-29T12:34:56Z"
}
```
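The upsert is what makes re-embedding idempotent, so it's worth spelling out. A minimal sketch assuming pymongo; `upsert_chunks()`, the local URI, and the exact field layout are illustrative stand-ins, not a dump of our actual code:

```python
import uuid
from datetime import datetime, timezone
from pymongo import MongoClient, UpdateOne

col = MongoClient("mongodb://localhost:27017")["kb_db"]["embeddings"]

def upsert_chunks(file_id: str, chunks: list[str], vectors: list[list[float]]) -> None:
    ops = []
    for i, (text, vec) in enumerate(zip(chunks, vectors)):
        ops.append(UpdateOne(
            {"file_id": file_id, "meta_data.chunk": i},  # idempotent key
            {
                "$set": {  # refreshed on every re-embed
                    "text": text,
                    "embedding": vec,
                    "updated_at": datetime.now(timezone.utc).isoformat(),
                },
                "$setOnInsert": {"_id": str(uuid.uuid4())},  # UUID only on first insert
            },
            upsert=True,
        ))
    if ops:
        col.bulk_write(ops, ordered=False)  # one round-trip per file
```

Re-running the same file overwrites its chunks in place instead of piling up duplicates.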
Reality Check #2: Agno Starts Gaslighting Us
Agno has good docs. We’d used it before. Still…
Pain points:
- Changing Mongo connection strings sometimes worked, sometimes ghosted.
- Swapping embedding models/collection names broke flows randomly.
- We literally created DBs named `bodhi1`, `bodhi2`, `bodhi23` just to trick it into behaving.
The Cruel Twist
- First evaluation demo: Smooth. Real-time updates showed up; mentors smiled.
- One-hour nap later: Dead. Agno stopped reading vectors; QA answers vanished.
We burned most of the remaining time trying to revive it. Sleep deprivation turned into actual hallucinations.
Pivot #2: Bare‑Bones RAG to the Rescue
We took a hallway walk (JIS Kolkata campus is pretty, btw), came back, and said: "Strip it. Build only what we control."
What we replaced Agno with
- Retriever: Cosine similarity over Mongo‑stored embeddings.
- Context Builder: Top‑k chunks concatenated with metadata.
- Prompt Template: Simple system instructions to reduce hallucination.
- LLM Call: Plain OpenAI chat/completions.
Minimal retrieval code (illustrative):
```python
import os
import numpy as np
from pymongo import MongoClient
from openai import OpenAI
from sklearn.metrics.pairwise import cosine_similarity

# ─── Config ───
MONGO_URI = os.getenv("MONGO_URI")
DB_NAME = "kb_db"
COL_NAME = "embeddings"
EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL = "gpt-4o"
TOP_K = 5

# ─── Clients ───
openai = OpenAI()
col = MongoClient(MONGO_URI)[DB_NAME][COL_NAME]

# ─── Embedding & Retrieval ───
def get_query_embedding(q: str):
    return openai.embeddings.create(model=EMBED_MODEL, input=[q]).data[0].embedding

def find_similar_chunks(q_vec, k: int = TOP_K):
    docs = list(col.find({}, {"text": 1, "embedding": 1, "name": 1, "_id": 0}))
    if not docs:
        return []
    mat = np.array([d["embedding"] for d in docs])
    scores = cosine_similarity([q_vec], mat)[0]
    idx = scores.argsort()[::-1][:k]
    return [docs[i]["text"] for i in idx]

# ─── Answer generator ───
def answer(query: str):
    q_vec = get_query_embedding(query)
    chunks = find_similar_chunks(q_vec)
    if not chunks:
        return "❌ No relevant document chunks found."
    context = "\n---\n".join(chunks)
    resp = openai.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": "Use ONLY the context. If it's missing, say you don't know."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion:\n{query}"},
        ],
        temperature=0.4,
        top_p=0.6,
        max_tokens=1200,
    )
    return resp.choices[0].message.content.strip()
```
(We actually pushed cosine into Python to avoid nasty $map gymnastics, but you get the idea.)
Under the hood of the bare‑bones RAG layer
- Retriever: Load embeddings from Mongo into a NumPy matrix (fine at hackathon scale) and run cosine similarity (`sklearn.metrics.pairwise.cosine_similarity`).
- TOP_K: 5 chunks.
- Prompting: One strict system message + user message with context; deterministic-ish params (`temperature=0.4`, `top_p=0.6`).
- Output: Plain text. No Pydantic / `parse()`; keep it dumb and reliable.
- CLI Debug: Print the top chunks' previews to eyeball relevance quickly (snippet after this list).
- Fail-safe: If no chunk is retrieved, return a clear "don't know" message instead of hallucinating.
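That CLI debug flow is about ten lines. A hypothetical version, assuming the retrieval helpers above live in a module called `kb`:

```python
# Eyeball retrieval quality before trusting the answer.
import sys

from kb import answer, find_similar_chunks, get_query_embedding

if __name__ == "__main__":
    query = " ".join(sys.argv[1:]) or "What changed in the latest doc?"  # placeholder query
    for rank, chunk in enumerate(find_similar_chunks(get_query_embedding(query)), 1):
        print(f"[{rank}] {chunk[:120]!r}")  # 120-char preview per retrieved chunk
    print()
    print(answer(query))
```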
Result
- Stable.
- Zero hallucinations in finals.
- Judges leaned in.
- We won 1st prize.
Lessons I Took Home (Besides Goodies and The Prize)
- Abstraction Tax Is Real: Fancy orchestration/agent frameworks save time, until they don't. When they fail, you pay interest + penalty.
- Understand the Critical Path: For us, it was: ingest → embed → store → retrieve → answer. Anything on that path must be under your control.
- Logs > Magic: Microservices + explicit logs beat hidden internal state every time at 3 AM.
- Design for Failure Swaps: Keep your components swappable (e.g., changing embedding model or vector store shouldn’t nuke your stack).
- Demo Early, Demo Often: The first eval worked and the final died; catch such regressions earlier with watchdog scripts and regression tests.
- Sleep Is a Feature: Micro‑naps are fine. Just make sure your stack doesn’t turn into Schrödinger’s system while you’re out.
"If We Had Another 12 Hours…"
- Add a reranker (e.g., Cohere Rerank or OpenAI re‑embed) for better context quality.
- Implement chunk caching + diffing (embed only changed parts of a doc).
- Build a config-driven layer (YAML) so swapping DB/LLM is a flag, not a code edit (sketch below).
- Dispatch webhooks to update downstream apps when Drive changes.
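For the config-driven layer, the shape we had in mind was something like this (entirely hypothetical; nothing like it shipped):

```yaml
# Hypothetical config: swapping the embedder, DB, or LLM becomes a flag edit.
embedder:
  provider: openai
  model: text-embedding-3-small
vector_store:
  kind: mongodb
  uri: ${MONGO_URI}
  db: kb_db
  collection: embeddings
llm:
  provider: openai
  model: gpt-4o
  temperature: 0.4
  top_p: 0.6
retrieval:
  top_k: 5
```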
A Quick Anti-Abstraction Checklist
Before you grab the next shiny orchestrator, ask:
Q1. Can I wire ingest → embed → store → retrieve → answer myself last-minute?
Q2. Do I have logs/metrics at each hop (API call, chunking, DB write, retrieval, LLM)?
Q3. Can I hot‑swap the embedder/vector DB/LLM with just env/config, not code surgery?
Q4. Are the deps/docs maintained (recent commits, examples for my exact flow)?
Q5. Do I have a smoke test/watchdog that runs end‑to‑end after every tweak (or nap)? (See the sketch below.)
Q6. What's my fallback plan if this lib dies at 3 AM? How fast can I drop to primitives?
If you answer "no" too many times, maybe… ship the bare bones first.
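And that Q5 watchdog really can be dumb. A sketch assuming `answer()` is importable from a `kb` module and that a canary doc containing a known token is seeded in the watched folder:

```python
# Exercise the query path end-to-end (embed → retrieve → answer) against a
# canary doc the ingest side must have picked up; scream loudly on failure.
import sys
import time

from kb import answer

CANARY_QUERY = "What is the team name?"  # assumed seeded canary doc
CANARY_TOKEN = "ingenico"                # expected substring in the answer

def smoke_test() -> bool:
    try:
        return CANARY_TOKEN in answer(CANARY_QUERY).lower()
    except Exception as exc:
        print(f"watchdog: pipeline crashed: {exc}", file=sys.stderr)
        return False

while True:
    if not smoke_test():
        print("watchdog: END-TO-END FAILURE, wake somebody up", file=sys.stderr)
    time.sleep(300)  # every 5 minutes
```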
Architecture: Framework vs. Bare Bones
```
┌──────────────┐          ┌─────────┐          ┌──────────┐          ┌─────────┐
│ Google Drive │ ───────▶ │ Pathway │ ───────▶ │   Agno   │ ───────▶ │   LLM   │
└──────────────┘          └─────────┘          └──────────┘          └─────────┘
                               ▲                     ▲                    ▲
                               │                     │                    │
                               └──────── pain ───────┴────────────────────┘
```
The project had a few more components, but here I'm only showing what we replaced the abstractions with.
Our final pipeline:
```
┌──────────────┐  changes API  ┌──────────────────┐  embed  ┌──────────┐  cosine  ┌──────────┐
│ Google Drive │ ─────────────▶│ GDrive Microserv │────────▶│ MongoDB  │─────────▶│  Prompt  │
└──────────────┘               └──────────────────┘         └──────────┘          └────┬─────┘
                                                                                       │
                                                                                       ▼
                                                                                      LLM
```
What You Can Reuse (Feel Free)
- Drive change tracking with `changes.list` and `startPageToken`.
- Mongo single-collection strategy (simplifies joins & lookups under time pressure).
- Minimal RAG template with deterministic retrieval + strict prompting.
- Minimal RAG template with deterministic retrieval + strict prompting.
Closing Thoughts
We didn’t win because we had the fanciest stack. We won because when the abstractions crumbled, we knew the primitives well enough to rebuild, fast.
Build your own lever before you rely on someone else’s pulleys.
PS: Shout-outs
- My teammate, for trusting the last-minute pivot even while half-asleep.
- The mentors who liked the idea and pushed us to keep it simple.
- JIS Kolkata campus hallways for being the best debugging rubber duck.
Questions, code, or architecture deep-dive? Ping me; happy to expand any part.