
Your AI Roadmap Probably Has Too Much AI

Most production systems should use AI at the edges and deterministic code in the core. As per-token pricing turns honest, a lot of what’s on AI roadmaps today won’t survive a closer look.

By Paulomi Gudka, Managing Director, USA

May 21st 2026 · 6 min read


Most production systems should use AI at the edges and deterministic code in the core. AI is great at turning messy input into clean structured intent, and at turning structured output into readable language for humans. It is bad at being the engine in the middle — the place where math, lookups, and rule application happen. The best systems route through AI once on the way in, run normal code in the middle, and optionally route through AI once on the way out.
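The shape described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the refund scenario, the `ORDER_TOTALS_CENTS` table, the 80% policy, and the `stub_parse_intent` function (which stands in for a real model call) are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RefundIntent:
    """Structured intent produced at the inbound edge."""
    order_id: str
    reason: str  # "damaged" or "other"


# Hypothetical order totals in cents, standing in for a real database lookup.
ORDER_TOTALS_CENTS = {"A100": 4999, "A101": 12000}


def compute_refund_cents(intent: RefundIntent) -> int:
    """Deterministic core: a lookup plus a rule, with no model involved."""
    total = ORDER_TOTALS_CENTS[intent.order_id]
    # Assumed policy for illustration: full refund if damaged, 80% otherwise.
    return total if intent.reason == "damaged" else total * 80 // 100


def handle(
    message: str,
    parse_intent: Callable[[str], RefundIntent],
    render_reply: Callable[[RefundIntent, int], str],
) -> str:
    intent = parse_intent(message)         # edge in: messy text -> structure
    amount = compute_refund_cents(intent)  # core: plain code
    return render_reply(intent, amount)    # edge out: structure -> language


def stub_parse_intent(message: str) -> RefundIntent:
    """Stand-in for a model call that extracts intent from free text."""
    match = re.search(r"\bA\d{3}\b", message)
    if match is None:
        raise ValueError("no order id found")
    reason = "damaged" if "broken" in message.lower() else "other"
    return RefundIntent(order_id=match.group(0), reason=reason)


def template_reply(intent: RefundIntent, amount_cents: int) -> str:
    """Outbound edge; a template is often enough, so the model is optional."""
    return f"Refund of ${amount_cents / 100:.2f} approved for order {intent.order_id}."
```

Calling `handle("Order A100 arrived broken", stub_parse_intent, template_reply)` returns "Refund of $49.99 approved for order A100." Because the edges are passed in as plain callables, the core can be unit-tested without touching a model, and either edge can be swapped for a real model call later.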

This isn’t a particularly complex set of ideas. But a surprising amount of what’s on AI roadmaps today violates this shape. And much of that violation is about to stop being affordable.

Your hidden token tax is now due

For two years, the cost of building with AI was hidden inside venture money and aggressive vendor pricing. That’s changing. As model providers shift toward pricing that reflects actual compute, executives are getting their first honest look at what AI-everything costs to operate. And a lot of what’s been built doesn’t survive that look.

Three things are now legible at once:

Cost. Per-token pricing is real and increasingly non-promotional. A workload that cost a comfortable few hundred dollars a month two years ago can land in five figures today.
Latency. A local function call returns in microseconds, and even a database-backed one usually returns in single-digit milliseconds. A frontier-model call typically has a time-to-first-token of half a second to two seconds, before the body of the response streams. Compound that across a multi-step workflow and the user-visible delay stops being a UX issue and becomes a product issue.
Reliability. Deterministic code passes the same input through the same logic every time. LLMs don’t. The variance is bounded enough to ship and unbounded enough to break things in production.

This combination, together with the token tax bill, now forces a conversation that should have happened earlier.
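A back-of-envelope calculation makes the cost and latency points concrete. Every number below is an illustrative assumption, not any vendor's actual pricing or a measured workload. Only the time-to-first-token range of half a second to two seconds comes from the text above.

```python
def monthly_cost_usd(
    requests_per_day: int,
    calls_per_request: int,
    tokens_per_call: int,
    usd_per_million_tokens: float,
    days: int = 30,
) -> float:
    """Monthly spend for a workload, given a blended per-token price."""
    tokens = requests_per_day * calls_per_request * tokens_per_call * days
    return tokens * usd_per_million_tokens / 1_000_000


def added_latency_s(sequential_calls: int, seconds_per_call: float) -> float:
    """Delay added by model calls that must run one after another."""
    return sequential_calls * seconds_per_call


# Assumed workload: 50,000 requests a day, 4 model calls each,
# 2,000 tokens per call, at a hypothetical blended $5 per million tokens.
cost = monthly_cost_usd(50_000, 4, 2_000, 5.0)

# Four sequential calls at the time-to-first-token range quoted above.
best_case = added_latency_s(4, 0.5)
worst_case = added_latency_s(4, 2.0)
```

Under these assumptions the workload costs $60,000 a month and adds between 2 and 8 seconds of waiting before any response body has streamed. The useful part is the shape of the multiplication: calls per request and tokens per call are the terms a team can actually change.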

Mind you, this isn’t an anti-AI argument. It’s the opposite. It’s an argument for taking AI seriously enough to put it where it earns its keep, and not where it doesn’t.

The question to ask before adding AI to anything

Can you write the rule?

If a moderately experienced engineer could describe the logic in a paragraph, write the rule. Good code is cheaper to run, faster to execute, easier to test, and behaves the same way today as it will next quarter. The model you deployed last year may not even exist next year.

AI earns its place when the input space is unbounded or the output is generative. Unstructured text from a customer, an image you’ve never seen before, a judgement call about tone — these are all problems where rules either can’t be written or would take a thousand of them. Drafting, summarising, translating, classifying along fuzzy criteria — these are problems where “approximately right” is the acceptable outcome and the alternative is “nothing at all.”

This is the placement discipline realfast has been arguing for since day one. AI-first is an operating model, not a tool decision. The teams making poor AI placement decisions are usually the ones that bolted AI onto an unchanged operating model. Which seemed fine… until the token taxman came along.

Patterns that don’t need AI

A useful exercise: walk through your current AI roadmap and check whether any of these describe what’s actually being built.

Routing or categorisation with a known taxonomy. If the categories are fixed and the signals are stable, a conventional classifier or a set of if-statements will usually beat an LLM on accuracy, speed, and cost. We have seen teams replace an LLM-based ticket router with sixty lines of code, drop p99 latency by 95%, and cut the monthly bill by two orders of magnitude. The model is rarely the right tool here.
Data movement between systems. Webhooks, queues, and integration platforms have done this for thirty years. Calling a model to decide where to put a row is theatre.
Calculations of any kind. LLMs are not calculators. If the answer can be computed, compute it.
Lookups against your own database. Querying the database directly is faster than asking a model to remember what’s in it.
Search bars rebuilt as chatbots. Most users searching your product want to find a thing, not have a conversation about finding a thing. The chatbot adds latency and removes precision.
Cron jobs rebranded as agents. If the schedule is fixed and the steps are known, it’s a scheduled script. Calling it an agent doesn’t change what it is. More theatre.
AI-generated SQL against a known schema. If the schema is stable and the queries are parameterisable, write the queries.
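For the first pattern in the list, a rule-based ticket router might look something like this. It is a sketch under assumptions: the queue names and keyword patterns are invented for illustration, and a real taxonomy would come from your own ticket history.

```python
import re

# Ordered rules: the first pattern that matches wins.
# Queue names and keywords are hypothetical examples.
ROUTES = [
    ("billing", re.compile(r"\b(invoice|refund|charged?|payment)\b", re.I)),
    ("access", re.compile(r"\b(password|log ?in|locked out)\b", re.I)),
    ("bug", re.compile(r"\b(error|crash(es|ed)?|exception)\b", re.I)),
]

# Anything no rule recognises goes to a human (or, later, a model).
DEFAULT_QUEUE = "triage"


def route_ticket(text: str) -> str:
    """Return the queue for a ticket using fixed keyword rules."""
    for queue, pattern in ROUTES:
        if pattern.search(text):
            return queue
    return DEFAULT_QUEUE
```

Here `route_ticket("I was charged twice on my invoice")` returns "billing", and a ticket that matches nothing falls through to "triage". Every routing decision can be traced to one line in `ROUTES`, which is what makes it cheap to test and audit.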

A predictable counter to all of this: but agents reason across tool calls — isn’t that the engine in the middle?

Orchestration isn’t reasoning. Most agents in production are state machines with an LLM choosing the next transition. When the transitions are knowable, choose them in code. When they’re not, the LLM is doing genuine work. And that work belongs at the edges of a deterministic flow, not in place of it.
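One way to picture "choose them in code" is a transition table with an optional model hook for the cases the table does not cover. The states, events, and the `choose_with_model` parameter are hypothetical names for illustration, not taken from any agent framework.

```python
from enum import Enum
from typing import Callable, Optional


class State(Enum):
    RECEIVED = "received"
    VALIDATED = "validated"
    APPROVED = "approved"
    REJECTED = "rejected"
    NEEDS_REVIEW = "needs_review"


# Knowable transitions live in a table, not in a prompt.
TRANSITIONS = {
    (State.RECEIVED, "valid"): State.VALIDATED,
    (State.RECEIVED, "invalid"): State.REJECTED,
    (State.VALIDATED, "under_limit"): State.APPROVED,
    (State.VALIDATED, "over_limit"): State.NEEDS_REVIEW,
}


def next_state(
    state: State,
    event: str,
    choose_with_model: Optional[Callable[[State, str], State]] = None,
) -> State:
    """Use the table when it has an answer; otherwise defer or fail loudly."""
    key = (state, event)
    if key in TRANSITIONS:
        return TRANSITIONS[key]
    if choose_with_model is not None:
        # Only unknown (state, event) pairs ever reach the model.
        return choose_with_model(state, event)
    raise ValueError(f"no transition for {state.name} on {event!r}")
```

With this layout the model is consulted only when the table has no entry, so the share of calls that reach it is a direct measure of how much genuine judgement the workflow needs.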

Patterns where AI earns its keep

Extracting structure from unstructured input. Resumes, contracts, emails, support tickets, voice transcripts. Turning prose into rows is exactly the job LLMs were built for.
Drafting from intent. A first draft of an email, a summary, a proposal, a code change. The model is not intimidated by the blank page. The human knows how to edit.
Judgement over fuzzy criteria. “Does this support ticket sound angry?” “Is this product description on-brand?” Things you could write a rubric for but not a regex.
Translation between formats or registers. Plain English to SQL for ad hoc questions that can’t be parameterised in advance, legal prose to plain English, technical changelog to customer-facing release notes.
Exception handling at the edges. When the deterministic path covers the bulk of cases and the residual is a mess of one-offs, AI is often the right way to clean up what’s left.

The common thread: in every case, the alternative isn’t a faster, cheaper version of the same thing. The alternative is doing it manually, or not at all. That’s the real test.

A heuristic for Monday morning

Here is one rule worth giving your engineering and product teams:

Before adding AI to a workflow, build the deterministic version first. If it covers the common case, ship it. Use AI only for the residual.

This inverts the default of the last two years. Instead of asking “where can we add AI?”, the question becomes “where does deterministic code fail, and is the failure worth the cost of an AI call to fix?” Most of the time the answer is no. Sometimes it’s yes, and those are the places where AI actually compounds.
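A minimal sketch of that rule, using date parsing as the example: try fixed formats first, send only the leftovers to a fallback, and count how often that happens. The format list and the `ai_fallback` callable are assumptions. In practice the fallback would wrap a model call, which is stubbed out here.

```python
from datetime import date, datetime
from typing import Callable, Optional

# Formats the deterministic path understands (an assumed, illustrative set).
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def parse_date_deterministic(text: str) -> Optional[date]:
    """Return a date if the text matches a known format, else None."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


class DateParser:
    """Deterministic first; the fallback only ever sees the residual."""

    def __init__(self, ai_fallback: Callable[[str], Optional[date]]):
        self.ai_fallback = ai_fallback  # hypothetical model-backed parser
        self.total = 0
        self.residual = 0

    def parse(self, text: str) -> Optional[date]:
        self.total += 1
        parsed = parse_date_deterministic(text)
        if parsed is not None:
            return parsed
        self.residual += 1
        return self.ai_fallback(text)

    def residual_rate(self) -> float:
        """Share of inputs the deterministic path could not handle."""
        return self.residual / self.total if self.total else 0.0
```

The `residual_rate` figure is the number to bring to a roadmap review: it shows how much of the traffic the deterministic version misses, and therefore how much an AI call could possibly be worth.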

What good looks like

The companies that come out of the next two years ahead won’t be the ones that burned through the most tokens. They’ll be the ones that used tokens like a scalpel and not a sword: at the edges where AI shines and in the residuals where rules can’t reach, with boring, reliable, cheap code everywhere else.

There is a useful corollary for portfolio reviews: any AI feature whose team can’t articulate what the deterministic version would have looked like, and why it wasn’t enough, probably shouldn’t exist. Not because AI is wrong, but because the team probably never did the work of understanding the problem.

The ongoing token pricing shift is doing executives a huge favour. It’s making the “do I even need AI here?” question impossible to avoid.