The Process
Even the Experts Need a Flowchart
Hey, I'm Abhi. I frequently listen to pods like Hollinger & Duncan and The Lowe Post. One thing became ridiculously clear while listening to the experts try to untangle the new CBA: it's complicated. Like, "Wait, does the second apron prevent you from using the taxpayer mid-level exception only if you've also traded a future first-round pick seven years from now?" level of complicated.
It's a maze of cross-references, defined terms, and tables that can make anyone's head spin.
So, I got curious. Could I build an AI agent that doesn't just search the 696-page document, but actually understands it? An expert co-pilot that could answer a simple question like "How does the luxury tax work?" and not miss the five different sections that actually define it. With that foundation in place, I could then build toward a tool that handles more complex questions.
Teaching LLMs to Read a Legal Nightmare
Building this wasn't about just dumping a PDF into a chatbot. That fails instantly. The secret sauce is in how you prepare the data. I used a multi-level strategy designed specifically for dense, structured documents like the CBA.
1. The "Map & Zoom" Parsing Strategy
First, I didn't treat the whole document as one big blob of text. I broke it down into two types of "chunks":
Title Chunks (The Map): I took every high-level Article and Section from the Table of Contents and used an LLM to create a rich, keyword-heavy summary for each. This chunk knows, for example, that "Article VII, Section 2" is where you find rules about the "Salary Cap," "Luxury Tax," and the "Apron." It's the map of the entire document.
Content Chunks (The Details): I then took every single granular subsection—every (a), (i), and (1)—and turned it into its own tiny chunk. This is where the actual, specific rules live.
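Here's a minimal sketch of what the "Content Chunk" split could look like. It assumes the section body is plain text where each subsection starts a line with a marker like (a), (ii), or (1). The Chunk class, the split_section helper, and the sample text are all made up for illustration; this is not the actual parser or real CBA wording, and it doesn't try to work out how the markers nest.

```python
import re
from dataclasses import dataclass

# Matches a subsection marker such as (a), (ii), or (1) at the start of a line.
MARKER = re.compile(r"^\s*\(([a-z]+|\d+)\)\s+", re.IGNORECASE)


@dataclass
class Chunk:
    kind: str     # "title" (the map) or "content" (the details)
    article: str
    section: str
    label: str    # subsection marker, e.g. "(a)"; empty for title chunks
    text: str


def split_section(article, section, body):
    """Return one content chunk per marker-led subsection.

    Text before the first marker is skipped in this sketch, and nesting
    between (a), (i), and (1) is not inferred.
    """
    chunks = []
    label, lines = None, []
    for line in body.splitlines():
        match = MARKER.match(line)
        if match:
            # A new marker closes the previous subsection, if any.
            if label is not None:
                chunks.append(Chunk("content", article, section, label, " ".join(lines)))
            label = "(" + match.group(1) + ")"
            lines = [line[match.end():].strip()]
        elif label is not None and line.strip():
            # Continuation line of the current subsection.
            lines.append(line.strip())
    if label is not None:
        chunks.append(Chunk("content", article, section, label, " ".join(lines)))
    return chunks


# Invented sample text, not actual CBA language.
sample = "Intro text\n(a) First rule.\n  continues here.\n(b) Second rule.\n(1) Nested item."
chunks = split_section("Article VII", "Section 2", sample)
for chunk in chunks:
    print(chunk.label, chunk.text)
```

Keeping the article and section on every chunk is what makes the later "zoom" step possible: you can pull every content chunk for a section with a simple filter.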
2. Smart Embeddings: Giving the Chunks Context
When I converted these chunks into vectors (the numbers that AI understands), I didn't just embed the raw text. I gave each chunk context. A specific chunk about a tax rate isn't just a number; it's Article VII → Section 2 → Tax Level → Tax Bracket Table. By prepending this hierarchy to the text before embedding, the search can match on where a rule lives in the document, not just on its wording.
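As a rough sketch, the hierarchy prefix is just string building done before the text goes to whatever embedding model you use. The with_hierarchy helper and the sample chunk text below are hypothetical, and the embedding call itself is left out on purpose because it depends on your provider.

```python
def with_hierarchy(path, text, sep=" → "):
    """Prefix a chunk's text with its position in the document hierarchy."""
    return f"{sep.join(path)}\n{text}"


# Hypothetical chunk: the path mirrors the example above, the body is a placeholder.
path = ["Article VII", "Section 2", "Tax Level", "Tax Bracket Table"]
embedding_input = with_hierarchy(path, "Placeholder text for one tax bracket row.")
print(embedding_input)
```

The string in embedding_input is what gets embedded, so two chunks with similar wording but different locations end up with different vectors.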
3. The Multi-Step RAG System: Mimicking a Real Expert
This is where it all comes together. When a user asks a question, the system works like a real expert would:
Step 1: Consult the Map (Broad Search): I do a vector search to find the most relevant high-level "Title" chunks. When you ask about "2nd apron restrictions," it immediately finds the enriched Article VII, Section 2 summary because it's packed with those keywords.
Step 2: Zoom In on the Details (Filtered Retrieval): The system sees that Article VII, Section 2 is the place to be. It then performs a second, targeted retrieval to pull all the detailed "Content" chunks from that specific section. This is how it finds the crucial "Transaction Restrictions Table" that a simple search would have missed.
Step 3: Synthesize the Answer: Finally, it hands all this rich, interconnected context—both the high-level summary and the specific rules—to the LLM (GPT-4o-mini) with a strict prompt: "You are an expert. Use only this context. Explain it simply and cite your sources."
This "map-then-zoom" approach ensures the AI has the complete picture, allowing it to connect a definition in one place to its consequences listed ten pages later, just like a lawyer would.
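The three steps can be sketched in a few lines of plain Python. Everything here is a toy under stated assumptions: the 2-D vectors stand in for real embeddings, the chunk dictionaries and the map_then_zoom and build_prompt helpers are hypothetical names, the chunk text is placeholder wording, and "Article X, Section 1" is just a dummy second section. A real setup would use a vector store and send the prompt to the LLM.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def map_then_zoom(query_vec, chunks, top_k=1):
    """Step 1: rank title chunks. Step 2: pull every content chunk in the winning sections."""
    titles = [c for c in chunks if c["kind"] == "title"]
    ranked = sorted(titles, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    hits = ranked[:top_k]
    sections = {c["section"] for c in hits}
    details = [c for c in chunks if c["kind"] == "content" and c["section"] in sections]
    return hits, details


def build_prompt(question, hits, details):
    """Step 3: combine the summaries and the specific rules into one strict prompt."""
    context = "\n\n".join(f"[{c['section']}] {c['text']}" for c in hits + details)
    return (
        "You are an expert. Use only this context. "
        "Explain it simply and cite your sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Toy index: placeholder text and made-up 2-D vectors, not real embeddings.
index = [
    {"kind": "title", "section": "Article VII, Section 2", "vec": [1.0, 0.0],
     "text": "Summary: salary cap, luxury tax, apron."},
    {"kind": "title", "section": "Article X, Section 1", "vec": [0.0, 1.0],
     "text": "Summary: placeholder for an unrelated section."},
    {"kind": "content", "section": "Article VII, Section 2", "vec": [0.8, 0.2],
     "text": "Placeholder rule text."},
    {"kind": "content", "section": "Article VII, Section 2", "vec": [0.1, 0.9],
     "text": "Placeholder transaction restrictions table."},
    {"kind": "content", "section": "Article X, Section 1", "vec": [0.0, 1.0],
     "text": "Placeholder rule from the other section."},
]

question = "What are the 2nd apron restrictions?"
query_vec = [0.9, 0.1]  # pretend this is the embedded question
hits, details = map_then_zoom(query_vec, index)
prompt = build_prompt(question, hits, details)
print(prompt)
```

Notice that the second content chunk in Article VII, Section 2 has a vector that points away from the query, so a plain similarity search would rank it low. The section filter still pulls it in, which is the whole point of zooming by section rather than by similarity.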
The Future: From CBA Expert to Team-Specific GM Co-Pilot
This v1 is the foundation. It has mastered the text of the CBA. The real fun starts now. With this robust system in place, I can begin integrating real-world data to answer questions that fans, agents, and front offices actually grapple with every day:
Team-Specific Context: By feeding it real-time salary cap data from sources like Spotrac, I can ask: "Given the Knicks' current payroll, can they use the full Non-Taxpayer Mid-Level Exception without crossing the first apron?"
Player Contract Analysis: By integrating player contract details, I can ask: "If Austin Reaves opts out of his current deal, what would the Lakers' qualifying offer have to be to make him a Restricted Free Agent?"
Dynamic "What If" Scenarios: The ultimate goal. An agent that can answer complex, multi-step questions like: "The Warriors are over the second apron. If they trade Draymond Green's expiring contract for a player making $15 million, what team-building tools do they lose access to for the rest of the season?"
The v1 proves I can make an AI understand the rules of the game. The next versions will teach it how to play.