RAGs, MCPs and Scripts. AI moves fast
Like JavaScript framework churn, agentic methodology churn is fully underway. RAG (Retrieval-Augmented Generation) was the breakthrough that let LLMs work with external data. Retrieve documents, insert them into context, generate responses. It felt elegant. Six months ago, it was cutting-edge.
Now? We’re discovering that preloading everything into context is the wrong pattern. Agent-based architectures with MCP let LLMs call tools on demand. And scripting is already here: the LLM writes code (JavaScript, Python, etc.) that runs multiple SQL queries, filters the data, and returns just the answer. Each evolution reduces what hits the LLM’s context.
RAGs #
Traditional RAG maintains a vector database of pre-processed documents. Query comes in, retrieve similar documents, dump them in context, generate response. Simple pipeline.
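The pipeline above can be sketched in a few lines. This is a minimal illustration, not a production setup: the corpus, the bag-of-words `embed` function, and the prompt layout are all assumptions made for the example, and a real system would use an embedding model and a vector database instead.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real system would keep embeddings in a vector database.
DOCUMENTS = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Refund requests require the original order number.",
]

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, k: int = 2) -> list:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Dump the retrieved documents into the context ahead of the question."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How long do refunds take?")
print(prompt)
```

Note that `retrieve` always returns `k` documents. In this toy corpus only one document shares a word with the query, so the second slot is filled by a document with zero similarity, and it still lands in the prompt.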
The problem? It treats context like a dumping ground. Every potentially relevant document gets loaded whether you need it or not. Context isn’t free. It costs money (you pay for every token), adds latency, and degrades output quality when filled with noise. Context windows are also limited, and even a window of millions of tokens does not change the economics: irrelevant data still costs money and still hurts accuracy. Worse, centralizing data into a vector database bypasses the original access controls. Documents from different systems get pooled into one index, and because that index is a snapshot, you’re always working with slightly stale data.
MCPs #
Agent-based architectures query data sources dynamically at runtime. Instead of preloading everything, fetch what you need when you need it. Protocols like MCP (Model Context Protocol) let LLMs call predefined tools. A tool can execute a SQL query, call an API, or fetch a file. The LLM decides which tool to use, calls it with parameters, and gets the results.
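Here is a rough sketch of that loop. It does not use the MCP SDK or the real MCP wire format. The `TOOLS` registry, the `run_sql` tool, the JSON call shape, and the `orders` table are all hypothetical, chosen only to show the pattern of "model picks a tool, host runs it, result goes back to context."

```python
import json
import sqlite3

# Hypothetical in-memory database standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "ada", 120.0), (2, "bob", 35.5), (3, "ada", 60.0)],
)

def run_sql(query: str) -> list:
    """Predefined tool: run a SQL query and return all rows."""
    return conn.execute(query).fetchall()

# Registry of predefined tools the model may choose from.
TOOLS = {"run_sql": run_sql}

def handle_tool_call(call_json: str) -> str:
    """Dispatch one tool call and return the result as text for the context."""
    call = json.loads(call_json)
    result = TOOLS[call["name"]](**call["arguments"])
    return json.dumps(result)

# What a model-issued tool call might look like (the format is illustrative).
model_call = json.dumps({
    "name": "run_sql",
    "arguments": {"query": "SELECT * FROM orders WHERE customer = 'ada'"},
})

tool_result = handle_tool_call(model_call)
print(tool_result)  # every returned row is now part of the context
```

The model can only do what the registry allows, and whatever `handle_tool_call` returns is appended to the conversation in full.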
Better than RAG, but the LLM is limited to the predefined tools, and every tool call returns its data to context.
Scripting #
The big shift is that the LLM writes code that executes in a sandbox. Unlike MCP where you call one tool and get data back into context, with scripting the LLM can write code (JavaScript, Python, etc.) that calls multiple SQL queries, filters the results, and aggregates data. It all runs in one script. Only the final answer comes back to context.
Here is how it works. The LLM generates a function that runs in a sandbox. The script has access to SQL and file APIs, and it does the procedural work. Instead of “call SQL, get 1000 rows back into context, call SQL again, get more data,” the script calls SQL multiple times, filters and processes everything, and returns only what you need. Models keep getting better at writing code, so this approach keeps getting more practical.
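A minimal sketch of the idea, with loud caveats. `GENERATED_SCRIPT` is a hand-written stand-in for what a model might produce, and the `sql` helper, the `events` table, and `run_script` are hypothetical names. Most importantly, `exec` with a trimmed set of builtins is not a security sandbox. Real isolation needs a separate process, container, or similar boundary.

```python
import sqlite3

# Hypothetical data source exposed to the script.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_name TEXT, kind TEXT, ms INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("ada", "error", 900), ("ada", "ok", 120), ("bob", "error", 1500),
     ("bob", "error", 700), ("cy", "ok", 80)],
)

def sql(query: str, params: tuple = ()) -> list:
    """The only data API the script is given."""
    return conn.execute(query, params).fetchall()

# Stand-in for model-generated code: several queries, filtering, aggregation.
GENERATED_SCRIPT = '''
def main():
    users = [row[0] for row in sql("SELECT DISTINCT user_name FROM events")]
    report = {}
    for user in users:
        errors = sql(
            "SELECT ms FROM events WHERE user_name = ? AND kind = 'error'",
            (user,),
        )
        if errors:
            report[user] = {
                "errors": len(errors),
                "slowest_ms": max(ms for (ms,) in errors),
            }
    return report
'''

def run_script(source: str) -> dict:
    """Run the script with a narrow namespace. NOT a real sandbox."""
    namespace = {"__builtins__": {"len": len, "max": max}, "sql": sql}
    exec(source, namespace)
    return namespace["main"]()

answer = run_script(GENERATED_SCRIPT)
print(answer)  # only this small dict would be returned to the LLM
```

The script issues one query per user plus one to list users, but none of those intermediate rows reach the model. Only the final report does.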
The key difference is that the script can call SQL multiple times, filter results, and aggregate data, all before anything returns to the LLM. With MCP, each tool call returns data to context. With scripting, you call multiple tools within the script and only return the filtered answer.
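To make the difference concrete, this sketch compares how much text each style would push into context for the same 1,000 rows. The `users` table and both function names are made up for the example, and character count is only a rough proxy for tokens.

```python
import json
import sqlite3

# Hypothetical table of 1,000 synthetic user records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, plan TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(i, "pro" if i % 4 == 0 else "free", 20 + i % 50) for i in range(1000)],
)

def tool_call_payload() -> str:
    """Tool-call style: every row is serialized back into context."""
    rows = conn.execute("SELECT id, plan, age FROM users").fetchall()
    return json.dumps(rows)

def script_payload() -> str:
    """Script style: read the same rows, aggregate in code, return a summary."""
    summary = {}
    for _id, plan, age in conn.execute("SELECT id, plan, age FROM users"):
        stats = summary.setdefault(plan, {"count": 0, "age_total": 0})
        stats["count"] += 1
        stats["age_total"] += age
    result = {
        plan: {"count": s["count"], "avg_age": round(s["age_total"] / s["count"], 1)}
        for plan, s in summary.items()
    }
    return json.dumps(result)

print("tool call chars:", len(tool_call_payload()))
print("script chars:   ", len(script_payload()))
```

Both functions read exactly the same data. The only thing that changes is where the aggregation happens, and therefore what the model has to read.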
The script handles all the procedural work. The LLM just gets the answer.
Bonus: scripts are repeatable. Run the same script against the same data twice and you get the same results. They are also less prone to hallucinations, because a compiler or interpreter checks the code before it runs. Syntax errors fail fast, not after a confident but wrong answer. Models are improving at writing code faster than at almost anything else, which makes this the clear winner.
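The "fail fast" part is easy to demonstrate. The sketch below uses Python's standard `ast.parse` to reject a malformed script before anything runs. `check_script` and the two sample scripts are hypothetical names for illustration.

```python
import ast
from typing import Optional

def check_script(source: str) -> Optional[str]:
    """Return None if the source parses, otherwise a short error message."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None

# Stand-ins for model-generated code: one valid, one with a broken signature.
good_script = "def main():\n    return 1 + 1\n"
bad_script = "def main(:\n    return 1 + 1\n"

print(check_script(good_script))
print(check_script(bad_script))
```

A parse check only catches syntax errors. A script that parses can still compute the wrong thing, so this narrows the failure modes without eliminating them.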
Wrap Up #
The evolution is clear. RAG (preload everything) → agent-based with MCP (call tools, get data back) → scripting (write code that calls multiple tools and filters before returning). Each stage reduces what hits the LLM’s context, which saves both tokens and money.
With MCP you call one SQL query and data comes back to context. You call another and more data returns to context. Every token costs. With scripting the LLM writes code (any language) that calls multiple SQL queries, filters everything in the script, and returns just the answer. A 100MB log file becomes 5 entries. 10,000 user records become summary statistics. Context windows stay small and costs stay low.
Scripts are repeatable, checked by a compiler or interpreter before they run, and less prone to hallucinations. Syntax errors fail fast. Models are getting better at code faster than at almost anything else, so scripting is the winning bet. The future is not about how much we can cram into context. It is about how intelligently we keep things out.



