Control your stack. Dump the black boxes.

Hexafalls: When Orchestrators Fail and Bare‑Bones RAG Wins

TL;DR: Two zombies (me and my teammate) walked into a 36‑hour hackathon (Hexafalls) with zero ideas, burned 7 hours just to pick one, chose shiny orchestration/agentic stacks (Pathway + Agno), fought dependency hell and phantom bugs, rage‑built our own microservice + minimal RAG layer in the last stretch—and shipped a real‑time Google Drive knowledge base that didn't hallucinate. We bagged 1st prize and a hard lesson: the less you depend on opaque abstractions, the happier your code—and brain—stay.


Scene Setting: Three Hackathons, One Brain Cell Left

  • Week load: 3 hackathons in ~7 days. One literally the day before Hexafalls.
  • Team Ingenico: Just two of us—me & my teammate.
  • Energy level: "I can code with my eyes closed" level sleep deprivation.
  • Time spent to decide the idea: ~7 hours of a 36‑hour event.

We finally settled on: "Self-updating Knowledge Base over Google Drive (Google Workspace Docs)"—simple, useful, mentor-approved. The catch? The architecture had to be tight.


The Original Plan (a.k.a. The Abstraction Dream)

Stack we thought would make life easier

  • Ingestion/Orchestration: Pathway (could also extend to S3 later).
  • RAG / Agentic Layer: Agno (agentic framework we’ve used before and liked).
  • Embeddings: OpenAI text-embedding-3-small.
  • Vector Store: MongoDB (single collection, keep it stupid-simple).

Why this looked good on paper

  • Rapid prototyping under time pressure.
  • Pathway promised change tracking & pipelines.
  • Agno promised plug‑and‑play agent + RAG scaffolding.
  • We didn’t want to reinvent the wheel… until the wheel fell off.

Reality Check #1: Pathway Orchestration Implodes

Not enough docs. Sparse examples. Ancient dependencies. Random crashes. Garbage embeddings. Silent failures. Pick your poison.

What bit us:

  • Outdated libs: One dependency hadn’t seen love in ~9 years.
  • Weird embedding outputs: Non‑deterministic results, inexplicable vector sizes.
  • Random runtime errors: Things failed after working once. Repro? Good luck.

Time sink: “Many hours” trying to coerce it into stability. No dice.

Pivot #1: Roll Our Own Orchestrator

"Screw it, I’ll just write the damn service."

What we built instead:

  • A tiny microservice using the Google Drive API (GCP).
  • Real-time change tracking using Drive’s changes endpoint (no 2‑min SHA256 polling nonsense).
  • Embeddings with OpenAI → stored in MongoDB (single collection).

Under the hood of the Drive job ("orchestrator")

  • Auth & scope: Google Service Account (client.json) with drive.readonly.
  • Bootstrap: embed_all_existing_files() recursively scans the watch folder, exports each doc as plain text (or downloads binary) and embeds.
  • Change stream: Uses Drive changes().list() with startPageToken / newStartPageToken instead of 2‑min SHA256 polling.
  • Folder ancestry check: ancestors_include_target() walks parent chains so nested files still trigger updates.
  • Chunking: Fixed CHUNK_SIZE_CHARS = 2000 char windows; chunk_text() ensures at least one chunk.
  • Embedding: OpenAI text-embedding-3-small; batch call once per file.
  • Upserts: bulk_write() with UpdateOne + $set/$setOnInsert, UUID as _id, per‑chunk meta_data.chunk used as idempotent key.
  • Interval: Tight POLL_INTERVAL_SEC = 5 seconds; good for demo, would back off in prod.
  • Logging: Structured logging across Mongo/Google clients; early ping to crash fast on DB issues.
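For the curious, here's a compressed sketch of that change-stream loop. It's illustrative, not our verbatim hackathon code: handle_change() is a hypothetical stand-in for the export → chunk → embed → upsert path described above.

import time

from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]
POLL_INTERVAL_SEC = 5

creds = service_account.Credentials.from_service_account_file("client.json", scopes=SCOPES)
drive = build("drive", "v3", credentials=creds)

# Grab a cursor once; from then on we only ask "what changed since this token?"
token = drive.changes().getStartPageToken().execute()["startPageToken"]

while True:
    resp = drive.changes().list(
        pageToken=token,
        fields="newStartPageToken,nextPageToken,changes(fileId,removed)",
    ).execute()
    for change in resp.get("changes", []):
        handle_change(change)  # hypothetical: export -> chunk -> embed -> upsert
    # newStartPageToken only shows up on the last page of results;
    # intermediate pages hand us a nextPageToken instead.
    token = resp.get("nextPageToken") or resp["newStartPageToken"]
    if "newStartPageToken" in resp:
        time.sleep(POLL_INTERVAL_SEC)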

Mongo schema (simplified):

{
  "file_id": "<gdrive_file_id>",
  "chunk_id": "<uuid>",
  "text": "<chunk_text>",
  "embedding": [0.0123, -0.98, ...],
  "updated_at": "2025-06-29T12:34:56Z"
}
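And a minimal sketch of the chunk + upsert step those bullets describe. upsert_file is a hypothetical name; the key layout just follows the meta_data.chunk idea from above.

import uuid
from datetime import datetime, timezone

from pymongo import UpdateOne

CHUNK_SIZE_CHARS = 2000

def chunk_text(text: str) -> list[str]:
    # Fixed character windows; the `or [""]` guarantees at least one chunk per file.
    return [text[i:i + CHUNK_SIZE_CHARS] for i in range(0, len(text), CHUNK_SIZE_CHARS)] or [""]

def upsert_file(col, file_id: str, text: str, embeddings: list[list[float]]) -> None:
    # (file_id, chunk index) is the idempotent key: re-embedding a changed file
    # overwrites its chunks in place instead of piling up duplicates.
    ops = [
        UpdateOne(
            {"file_id": file_id, "meta_data.chunk": i},
            {
                "$set": {
                    "text": chunk,
                    "embedding": vec,
                    "updated_at": datetime.now(timezone.utc).isoformat(),
                },
                "$setOnInsert": {"_id": str(uuid.uuid4())},
            },
            upsert=True,
        )
        for i, (chunk, vec) in enumerate(zip(chunk_text(text), embeddings))
    ]
    if ops:
        col.bulk_write(ops)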

Reality Check #2: Agno Starts Gaslighting Us

Agno has good docs. We’d used it before. Still…

Pain points:

  • Changing Mongo connection strings sometimes worked, sometimes ghosted.
  • Swapping embedding models/collection names broke flows randomly.
  • We literally created DBs named bodhi1, bodhi2, bodhi23 just to trick it into behaving.

The Cruel Twist

  • First evaluation demo: Smooth. Real-time updates showed up; mentors smiled.
  • One hour nap later: Dead. Agno stopped reading vectors; QA answers vanished.

We burned most of the remaining time trying to revive it. Sleep deprivation turned into actual hallucinations.


Pivot #2: Bare‑Bones RAG to the Rescue

We took a hallway walk (JIS Kolkata campus is pretty, btw), came back, and said: "Strip it. Build only what we control."

What we replaced Agno with

  1. Retriever: Cosine similarity over Mongo‑stored embeddings.
  2. Context Builder: Top‑k chunks concatenated with metadata.
  3. Prompt Template: Simple system instructions to reduce hallucination.
  4. LLM Call: Plain OpenAI chat/completions.

Minimal retrieval code (illustrative):

import os
import numpy as np
from pymongo import MongoClient
from openai import OpenAI
from sklearn.metrics.pairwise import cosine_similarity

# ─── Config ───
MONGO_URI   = os.getenv("MONGO_URI")
DB_NAME     = "kb_db"
COL_NAME    = "embeddings"
EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL  = "gpt-4o"
TOP_K       = 5

# ─── Clients ───
openai = OpenAI()
col    = MongoClient(MONGO_URI)[DB_NAME][COL_NAME]

# ─── Embedding & Retrieval ───
def get_query_embedding(q: str):
    return openai.embeddings.create(model=EMBED_MODEL, input=[q]).data[0].embedding

def find_similar_chunks(q_vec, k: int = TOP_K):
    docs = list(col.find({}, {"text": 1, "embedding": 1, "_id": 0}))
    if not docs:
        return []
    mat    = np.array([d["embedding"] for d in docs])
    scores = cosine_similarity([q_vec], mat)[0]
    idx    = scores.argsort()[::-1][:k]
    return [docs[i]["text"] for i in idx]

# ─── Answer generator ───
def answer(query: str):
    q_vec  = get_query_embedding(query)
    chunks = find_similar_chunks(q_vec)
    if not chunks:
        return "❌ No relevant document chunks found."
    context = "
---
".join(chunks)
    resp = openai.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": "Use ONLY the context. If it's missing, say you don't know."},
            {"role": "user",   "content": f"Context:
{context}

Question:
{query}"},
        ],
        temperature=0.4, top_p=0.6, max_tokens=1200,
    )
    return resp.choices[0].message.content.strip()

(We actually pushed cosine into Python to avoid nasty $map gymnastics, but you get the idea.)

Under the hood of the bare‑bones RAG layer

  • Retriever: Load embeddings from Mongo into a NumPy matrix (hackathon scale) and run cosine similarity (sklearn.metrics.pairwise.cosine_similarity).
  • TOP_K: 5 chunks.
  • Prompting: One strict system message + user message with context; deterministic-ish params (temperature=0.4, top_p=0.6).
  • Output: Plain text. No Pydantic/parse()—keep it dumb and reliable.
  • CLI Debug: Print the top chunks’ previews to eyeball relevance quickly.
  • Fail-safe: If no chunk retrieved, return a clear "don’t know" message instead of hallucinating.
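The CLI debug bullet is literally a loop like this (reusing the functions from the snippet above; the 120-character preview length is arbitrary):

if __name__ == "__main__":
    query = input("Question: ")
    # Eyeball the retrieved chunks before trusting the answer.
    for i, chunk in enumerate(find_similar_chunks(get_query_embedding(query)), 1):
        print(f"[{i}] {chunk[:120]!r} ...")
    print(answer(query))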

Result

  • Stable.
  • Zero hallucinations in finals.
  • Judges leaned in.
  • We won 1st prize.

Lessons I Took Home (Besides Goodies and The Prize)

  1. Abstraction Tax Is Real: Fancy orchestration/agent frameworks save time—until they don’t. When they fail, you pay interest + penalty.
  2. Understand the Critical Path: For us, it was: ingest → embed → store → retrieve → answer. Anything on that path must be under your control.
  3. Logs > Magic: Microservices + explicit logs beat hidden internal state every time at 3 AM.
  4. Design for Failure Swaps: Keep your components swappable (e.g., changing embedding model or vector store shouldn’t nuke your stack).
  5. Demo Early, Demo Often: First eval worked, final died—catch such regressions earlier with watchdog scripts and regression tests.
  6. Sleep Is a Feature: Micro‑naps are fine. Just make sure your stack doesn’t turn into Schrödinger’s system while you’re out.

"If We Had Another 12 Hours…"

  • Add a reranker (e.g., Cohere Rerank or OpenAI re‑embed) for better context quality.

  • Implement chunk caching + diffing (embed only changed parts of a doc).

  • Build a config-driven layer (YAML) so swapping DB/LLM is a flag, not a code edit (see the sketch after this list).

  • Dispatch webhooks to update downstream apps when Drive changes.
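For that config-driven item, we were picturing something like this hypothetical kb.yaml (nothing here actually shipped):

# kb.yaml: swap a component by editing a line, not the code
vector_store:
  kind: mongodb
  uri: ${MONGO_URI}
  db: kb_db
  collection: embeddings
embedding:
  provider: openai
  model: text-embedding-3-small
llm:
  provider: openai
  model: gpt-4o
  temperature: 0.4
retrieval:
  top_k: 5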


A Quick Anti-Abstraction Checklist

Before you grab the next shiny orchestrator, ask:

Q1. Can I wire ingest → embed → store → retrieve → answer myself last-minute?

Q2. Do I have logs/metrics at each hop (API call, chunking, DB write, retrieval, LLM)?

Q3. Can I hot‑swap the embedder/vector DB/LLM with just env/config, not code surgery?

Q4. Are the deps/docs maintained (recent commits, examples for my exact flow)?

Q5. Do I have a smoke test/watchdog that runs end‑to‑end after every tweak (or nap)?

Q6. What’s my fallback plan if this lib dies at 3 AM—how fast can I drop to primitives?

If you answer "no" too many times, maybe… ship the bare bones first.


Architecture: Framework vs. Bare Bones

 ┌──────────────┐      ┌─────────┐      ┌──────────┐      ┌─────────┐
 │ Google Drive │ ───▶ │ Pathway │ ───▶ │   Agno   │ ───▶ │   LLM   │
 └──────────────┘      └─────────┘      └──────────┘      └─────────┘
        ▲                   ▲                ▲
        │                   │                │
        └────────── pain ───┴────────────────┘

The project had a few more components, but here I'm only showing what we replaced the abstractions with.

Our final pipeline:

 ┌──────────────┐  changes API  ┌──────────────────┐  embed  ┌──────────┐  cosine  ┌──────────┐
 │ Google Drive │ ─────────────▶│ GDrive Microserv │────────▶│ MongoDB  │─────────▶│  Prompt  │
 └──────────────┘               └──────────────────┘         └──────────┘          └────┬─────┘
                                                                                        │
                                                                                        ▼
                                                                                       LLM

What You Can Reuse (Feel Free)

  • Drive change tracking with changes.list and startPageToken.
  • Mongo single-collection strategy (simplifies joins & lookups under time pressure).
  • Minimal RAG template with deterministic retrieval + strict prompting.

Closing Thoughts

We didn’t win because we had the fanciest stack. We won because when the abstractions crumbled, we knew the primitives well enough to rebuild—fast.

Build your own lever before you rely on someone else’s pulleys.


PS: Shout-outs

  • My teammate, for trusting the last-minute pivot even while half-asleep.
  • The mentors who liked the idea and pushed us to keep it simple.
  • JIS Kolkata campus hallways for being the best debugging rubber duck.

Questions, code, or architecture deep-dive? Ping me—happy to expand any part.