Control your stack. Dump the black boxes.

Hexafalls: When Orchestrators Fail and Bare‑Bones RAG Wins

TL;DR: Two zombies (me and my teammate) walked into a 36‑hour hackathon (Hexafalls) with zero ideas, burned 7 hours just to pick one, chose shiny orchestration/agentic stacks (Pathway + Agno), fought dependency hell and phantom bugs, rage‑built our own microservice + minimal RAG layer in the last stretch-and shipped a real‑time Google Drive knowledge base that didn't hallucinate. We bagged 1st prize and a hard lesson: the less you depend on opaque abstractions, the happier your code-and brain-stay.


Scene Setting: Three Hackathons, One Brain Cell Left

  • Week load: 3 hackathons in ~7 days. One literally the day before Hexafalls.
  • Team Ingenico: Just two of us-me & my teammate.
  • Energy level: "I can code with my eyes closed" level sleep deprivation.
  • Time spent to decide the idea: ~7 hours of a 36‑hour event.

We finally settled on: "Self-updating Knowledge Base over Google Drive (Google Workspace Docs)"-simple, useful, mentor-approved. The catch? The architecture had to be tight.


The Original Plan (a.k.a. The Abstraction Dream)

Stack we thought would make life easier

  • Ingestion/Orchestration: Pathway (could also extend to S3 later).
  • RAG / Agentic Layer: Agno (agentic framework we’ve used before and liked).
  • Embeddings: OpenAI text-embedding-3-small.
  • Vector Store: MongoDB (single collection, keep it stupid-simple).

Why this looked good on paper

  • Rapid prototyping under time pressure.
  • Pathway promised change tracking & pipelines.
  • Agno promised plug‑and‑play agent + RAG scaffolding.
  • We didn’t want to reinvent the wheel… until the wheel fell off.

Reality Check #1: Pathway Orchestration Implodes

Not enough docs. Sparse examples. Ancient dependencies. Random crashes. Garbage embeddings. Silent failures. Pick your poison.

What bit us:

  • Outdated libs: One dependency hadn’t seen love in ~9 years.
  • Weird embedding outputs: Non‑deterministic results, inexplicable vector sizes.
  • Random runtime errors: Things failed after working once. Repro? Good luck.

Time sink: “Many hours” trying to coerce it into stability. No dice.

Pivot #1: Roll Our Own Orchestrator

"Screw it, I’ll just write the damn service."

What we built instead:

  • A tiny microservice using the Google Drive API (GCP).
  • Real-time change tracking using Drive’s changes endpoint (no 2‑min SHA256 polling nonsense).
  • Embeddings with OpenAI → stored in MongoDB (single collection).

Under the hood of the Drive job ("orchestrator")

  • Auth & scope: Google Service Account (client.json) with drive.readonly.
  • Bootstrap: embed_all_existing_files() recursively scans the watch folder, exports each Doc as plain text (or downloads binaries), and embeds the result.
  • Change stream: Uses Drive changes().list() with startPageToken / newStartPageToken instead of 2‑min SHA256 polling (see the sketch after this list).
  • Folder ancestry check: ancestors_include_target() walks parent chains so nested files still trigger updates.
  • Chunking: Fixed CHUNK_SIZE_CHARS = 2000 char windows; chunk_text() ensures at least one chunk.
  • Embedding: OpenAI text-embedding-3-small; batch call once per file.
  • Upserts: bulk_write() with UpdateOne + $set/$setOnInsert, UUID as _id, per‑chunk meta_data.chunk used as idempotent key.
  • Interval: Tight POLL_INTERVAL_SEC = 5 seconds; good for demo, would back off in prod.
  • Logging: Structured logging across Mongo/Google clients; early ping to crash fast on DB issues.
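
For the curious, here's a trimmed-down sketch of that change-stream loop, assuming the client.json service-account setup above; handle_change() is a hypothetical stand-in for the export → chunk → embed → upsert path, and the real service had more error handling:

import time
from google.oauth2 import service_account
from googleapiclient.discovery import build

POLL_INTERVAL_SEC = 5  # tight for the demo; back off in prod

creds = service_account.Credentials.from_service_account_file(
    "client.json", scopes=["https://www.googleapis.com/auth/drive.readonly"]
)
drive = build("drive", "v3", credentials=creds)

# Bootstrap: grab a baseline token so we only react to *future* changes.
token = drive.changes().getStartPageToken().execute()["startPageToken"]

while True:
    resp = drive.changes().list(
        pageToken=token,
        fields="changes(fileId,file(name,mimeType,parents)),nextPageToken,newStartPageToken",
    ).execute()
    for change in resp.get("changes", []):
        handle_change(change)  # hypothetical: check folder ancestry, re-embed the file
    # Drive hands back newStartPageToken once all pending pages are drained.
    token = resp.get("newStartPageToken") or resp.get("nextPageToken") or token
    time.sleep(POLL_INTERVAL_SEC)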

Mongo schema (simplified):

{
  "file_id": "<gdrive_file_id>",
  "chunk_id": "<uuid>",
  "text": "<chunk_text>",
  "embedding": [0.0123, -0.98, ...],
  "updated_at": "2025-06-29T12:34:56Z"
}
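
And a sketch of the chunk-and-upsert step. chunk_text mirrors the real helper; upsert_chunks, the exact meta_data shape, and the updated_at handling are illustrative:

import os
import uuid
from datetime import datetime, timezone
from pymongo import MongoClient, UpdateOne

CHUNK_SIZE_CHARS = 2000

mongo = MongoClient(os.getenv("MONGO_URI"))
mongo.admin.command("ping")  # crash fast if the DB is unreachable
col = mongo["kb_db"]["embeddings"]

def chunk_text(text: str) -> list[str]:
    # Fixed-size character windows; always yield at least one chunk.
    return [text[i:i + CHUNK_SIZE_CHARS] for i in range(0, len(text), CHUNK_SIZE_CHARS)] or [""]

def upsert_chunks(file_id: str, chunks: list[str], vectors: list[list[float]]):
    ops = []
    for i, (chunk, vec) in enumerate(zip(chunks, vectors)):
        ops.append(UpdateOne(
            {"file_id": file_id, "meta_data.chunk": i},  # per-chunk idempotent key
            {
                "$set": {
                    "text": chunk,
                    "embedding": vec,
                    "updated_at": datetime.now(timezone.utc),
                },
                "$setOnInsert": {"_id": str(uuid.uuid4())},
            },
            upsert=True,
        ))
    if ops:
        col.bulk_write(ops)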

Reality Check #2: Agno Starts Gaslighting Us

Agno has good docs. We’d used it before. Still…

Pain points:

  • Changing Mongo connection strings sometimes worked, sometimes ghosted.
  • Swapping embedding models/collection names broke flows randomly.
  • We literally created DBs named bodhi1, bodhi2, bodhi23 just to trick it into behaving.

The Cruel Twist

  • First evaluation demo: Smooth. Real-time updates showed up; mentors smiled.
  • One hour nap later: Dead. Agno stopped reading vectors; QA answers vanished.

We burned most of the remaining time trying to revive it. Sleep deprivation turned into actual hallucinations.


Pivot #2: Bare‑Bones RAG to the Rescue

We took a hallway walk (JIS Kolkata campus is pretty, btw), came back, and said: "Strip it. Build only what we control."

What we replaced Agno with

  1. Retriever: Cosine similarity over Mongo‑stored embeddings.
  2. Context Builder: Top‑k chunks concatenated with metadata.
  3. Prompt Template: Simple system instructions to reduce hallucination.
  4. LLM Call: Plain OpenAI chat/completions.

Minimal retrieval code (illustrative):

import os
import numpy as np
from pymongo import MongoClient
from openai import OpenAI
from sklearn.metrics.pairwise import cosine_similarity

# ─── Config ───
MONGO_URI   = os.getenv("MONGO_URI")
DB_NAME     = "kb_db"
COL_NAME    = "embeddings"
EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL  = "gpt-4o"
TOP_K       = 5

# ─── Clients ───
openai = OpenAI()
col    = MongoClient(MONGO_URI)[DB_NAME][COL_NAME]

# ─── Embedding & Retrieval ───
def get_query_embedding(q: str):
    return openai.embeddings.create(model=EMBED_MODEL, input=[q]).data[0].embedding

def find_similar_chunks(q_vec, k: int = TOP_K):
    # Load every chunk into memory: fine at hackathon scale.
    docs = list(col.find({}, {"text": 1, "embedding": 1, "_id": 0}))
    if not docs:
        return []
    mat    = np.array([d["embedding"] for d in docs])
    scores = cosine_similarity([q_vec], mat)[0]
    idx    = scores.argsort()[::-1][:k]
    return [docs[i]["text"] for i in idx]

# ─── Answer generator ───
def answer(query: str):
    q_vec  = get_query_embedding(query)
    chunks = find_similar_chunks(q_vec)
    if not chunks:
        return "❌ No relevant document chunks found."
    context = "\n---\n".join(chunks)
    resp = openai.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": "Use ONLY the context. If it's missing, say you don't know."},
            {"role": "user",   "content": f"Context:\n{context}\n\nQuestion:\n{query}"},
        ],
        temperature=0.4, top_p=0.6, max_tokens=1200,
    )
    return resp.choices[0].message.content.strip()

(We actually pushed cosine into Python to avoid nasty $map gymnastics, but you get the idea.)
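
Wiring it up end-to-end is a two-liner (the question is just a placeholder):

if __name__ == "__main__":
    print(answer("What changed in the onboarding doc this week?"))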

Under the hood of the bare‑bones RAG layer

  • Retriever: Load embeddings from Mongo into a NumPy matrix (hackathon scale) and run cosine similarity (sklearn.metrics.pairwise.cosine_similarity).
  • TOP_K: 5 chunks.
  • Prompting: One strict system message + user message with context; deterministic-ish params (temperature=0.4, top_p=0.6).
  • Output: Plain text. No Pydantic/parse()-keep it dumb and reliable.
  • CLI Debug: Print the top chunks’ previews to eyeball relevance quickly.
  • Fail-safe: If no chunk retrieved, return a clear "don’t know" message instead of hallucinating.
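
That debug view was about this fancy, reusing the names from the snippet above (the 80-char preview width is arbitrary):

def debug_preview(query: str, k: int = TOP_K):
    # Eyeball retrieval quality: score + first 80 chars of each hit.
    q_vec = get_query_embedding(query)
    docs  = list(col.find({}, {"text": 1, "embedding": 1, "_id": 0}))
    if not docs:
        return
    mat    = np.array([d["embedding"] for d in docs])
    scores = cosine_similarity([q_vec], mat)[0]
    for i in scores.argsort()[::-1][:k]:
        print(f"{scores[i]:.3f} | {docs[i]['text'][:80]!r}")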

Result

  • Stable.
  • Zero hallucinations in finals.
  • Judges leaned in.
  • We won 1st prize.

Lessons I Took Home (Besides Goodies and The Prize)

  1. Abstraction Tax Is Real: Fancy orchestration/agent frameworks save time-until they don’t. When they fail, you pay interest + penalty.
  2. Understand the Critical Path: For us, it was: ingest → embed → store → retrieve → answer. Anything on that path must be under your control.
  3. Logs > Magic: Microservices + explicit logs beat hidden internal state every time at 3 AM.
  4. Design for Failure Swaps: Keep your components swappable (e.g., changing embedding model or vector store shouldn’t nuke your stack).
  5. Demo Early, Demo Often: First eval worked, final died-catch such regressions earlier with watchdog scripts and regression tests (a sketch follows this list).
  6. Sleep Is a Feature: Micro‑naps are fine. Just make sure your stack doesn’t turn into Schrödinger’s system while you’re out.
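
In hindsight, the watchdog could have been tiny. A sketch, assuming answer() from above and a canary question your corpus can definitely handle (the question here is hypothetical):

import sys

CANARY_QUERY = "Which folder does the knowledge base watch?"  # hypothetical canary

def smoke_test():
    out = answer(CANARY_QUERY)  # exercises embed -> retrieve -> LLM end to end
    if not out or out.startswith("❌"):
        print(f"SMOKE TEST FAILED: {out!r}")
        sys.exit(1)
    print("smoke test ok")

if __name__ == "__main__":
    smoke_test()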

"If We Had Another 12 Hours…"

  • Add a reranker (e.g., Cohere Rerank, or a second-pass rescoring with a larger embedding model) for better context quality.

  • Implement chunk caching + diffing (embed only changed parts of a doc; sketched below).

  • Build a config-driven layer (YAML) so swapping DB/LLM is a flag, not a code edit.

  • Dispatch webhooks to update downstream apps when Drive changes.
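
For the chunk caching + diffing idea, the shape we had in mind is hash-based: store a content hash per chunk and re-embed only what moved. A hypothetical sketch (content_hash is a new field, not part of the schema above):

import hashlib

def changed_chunks(col, file_id: str, chunks: list[str]):
    # Map stored chunk index -> content hash for this file.
    stored = {
        d["meta_data"]["chunk"]: d.get("content_hash")
        for d in col.find({"file_id": file_id}, {"meta_data.chunk": 1, "content_hash": 1})
    }
    todo = []
    for i, chunk in enumerate(chunks):
        h = hashlib.sha256(chunk.encode()).hexdigest()
        if stored.get(i) != h:
            todo.append((i, chunk, h))  # embed these; skip everything unchanged
    return todo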


A Quick Anti-Abstraction Checklist

Before you grab the next shiny orchestrator, ask:

Q1. Can I wire ingest → embed → store → retrieve → answer myself last-minute?

Q2. Do I have logs/metrics at each hop (API call, chunking, DB write, retrieval, LLM)?

Q3. Can I hot‑swap the embedder/vector DB/LLM with just env/config, not code surgery?

Q4. Are the deps/docs maintained (recent commits, examples for my exact flow)?

Q5. Do I have a smoke test/watchdog that runs end‑to‑end after every tweak (or nap)?

Q6. What’s my fallback plan if this lib dies at 3 AM-how fast can I drop to primitives?

If you answer "no" too many times, maybe… ship the bare bones first.


Architecture: Framework vs. Bare Bones

 ┌──────────────┐      ┌─────────┐      ┌──────┐      ┌─────┐
 │ Google Drive │ ───▶ │ Pathway │ ───▶ │ Agno │ ───▶ │ LLM │
 └──────────────┘      └─────────┘      └──────┘      └─────┘
        ▲                   ▲              ▲
        │                   │              │
        └─────── pain ──────┴──────────────┘

The project had a few more components, but here I'm only showing what we replaced the abstractions with.

Our final pipeline:

 ┌──────────────┐   changes API   ┌───────────────────┐   embed   ┌─────────┐   cosine   ┌──────────────┐
 │ Google Drive │ ──────────────▶ │ GDrive Microserv. │ ────────▶ │ MongoDB │ ─────────▶ │ Prompt + LLM │
 └──────────────┘                 └───────────────────┘           └─────────┘            └──────────────┘

What You Can Reuse (Feel Free)

  • Drive change tracking with changes.list and startPageToken.
  • Mongo single-collection strategy (no cross-collection lookups to juggle under time pressure).
  • Minimal RAG template with deterministic retrieval + strict prompting.

Closing Thoughts

We didn’t win because we had the fanciest stack. We won because when the abstractions crumbled, we knew the primitives well enough to rebuild-fast.

Build your own lever before you rely on someone else’s pulleys.


PS: Shout-outs

  • My teammate, for trusting the last-minute pivot even while half-asleep.
  • The mentors who liked the idea and pushed us to keep it simple.
  • JIS Kolkata campus hallways for being the best debugging rubber duck.

Questions, code, or architecture deep-dive? Ping me-happy to expand any part.