Return on Token Spend: The Only AI Metric That Matters in 2026
TL;DR — Key Takeaways
- $2.5 trillion was spent on AI globally in 2025–2026. The measurable return on most of it is close to zero.
- Return on Token Spend (ROTS) is a framework for treating AI spend as an investment to be measured, not a cost to be minimized.
- Most organisations are stuck in the Valley of Death: burning tokens across disconnected tools with no system to measure what they produce.
- The ROTS J-curve has four phases: Quick Wins, Valley of Death, Breaking Even, Flywheel. Crossing from phase two to phase three is the hardest transition.
- The Google Ads parallel: AI is where digital advertising was in 2005 — expensive, unproven, and about to enter the measurement phase that separates winners from burners.
- Two roles at ALTHERR were replaced by agents. A house sale was executed for $0.14 in API costs. These are not edge cases. They are what ROTS-positive looks like.
- Investors are already pricing AI spend as a liability. Credit research points to a 30 basis point credit spread penalty for companies spending on AI without evidence of return.
$2.5 trillion.
That is what the world spent on AI in the past twelve months, per analyst estimates. A 44% increase over the previous year. The largest technology investment cycle in history, deployed faster than electrification, faster than the internet, faster than mobile.
And the return?
Research across multiple sources paints a consistent picture: 95% of AI pilots deliver zero measurable P&L impact. 42% of companies abandoned most AI projects in 2025. Only 25% of initiatives deliver expected ROI. Among S&P 500 companies, just 21% could cite a measurable AI benefit. The number of organisations realising significant ROI from agentic AI sits at roughly 10%.
The pattern is consistent across every source that measures rigorously. Fewer than one in ten AI investments pays back in a way the CFO can see.
This is not a technology problem. The models work. The infrastructure works. The problem is that most businesses treat AI spend as an operating cost to be minimized rather than an investment to be measured. They ask “how do I spend fewer tokens?” instead of “how do I spend more tokens because I know every token has a return?”
That second question is what Return on Token Spend answers.
Most organisations quit in the Valley of Death. The ones that build systems cross into compounding returns.
What is Return on Token Spend?
Return on Token Spend is a simple ratio with radical implications.
ROTS = (Value Generated − AI Spend) / AI Spend
Value can be labour cost avoided, revenue generated, time saved converted to an hourly rate, or error reduction quantified in financial terms. The numerator is the business outcome. The denominator is what you spent on tokens, API calls, model hosting, and the compute to run it.
When ROTS is negative, you are burning money. When ROTS is zero, you are breaking even. When ROTS is positive, every dollar of AI spend returns more than a dollar of value, and the rational move is to scale spend until marginal return equals marginal cost.
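The formula above is small enough to compute by hand, but codifying it makes the break-even and scaling thresholds explicit. A minimal sketch, using hypothetical dollar figures:

```python
def rots(value_generated: float, ai_spend: float) -> float:
    """Return on Token Spend: (value - spend) / spend."""
    if ai_spend <= 0:
        raise ValueError("ai_spend must be positive")
    return (value_generated - ai_spend) / ai_spend

# A workflow that cost $400 in tokens and avoided $1,000 in labour:
print(rots(1_000, 400))  # → 1.5: every dollar spent returns $2.50 in value

# Negative ROTS means burning money; zero is break-even:
print(rots(400, 400))    # → 0.0
```

Once ROTS is reliably positive and measured, the rational policy is to increase spend until the marginal value of the next token no longer exceeds its cost.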
The shift in mindset is what matters. Most organisations track their AI bill. They see the line item in the cloud console. They negotiate volume discounts. They switch providers to save a few cents per thousand tokens. None of this is wrong, but all of it is optimising the wrong variable.
The right variable is return. And almost nobody is measuring it.
Industry research suggests that at scale, only about 5% of companies achieve substantial AI ROI, with an average payoff of roughly 1.7x. Studies show 74% of companies struggle to achieve and scale value from AI. Only 26% move beyond proof-of-concept. Multiple sources document enterprise AI failure rates between 70% and 85%.
These numbers are not a verdict on AI. They are a verdict on how AI is being deployed.
The ROTS J-curve
Every major technology adoption follows a J-curve. Early returns are negative as the organisation learns, experiments, and builds infrastructure. Then, if the technology is genuinely productive, returns turn positive and compound.
AI is no different. The ROTS J-curve has four phases. Most organisations are stuck in phase two.
Phase 1: Quick Wins
An executive buys ChatGPT Plus. A marketer starts using Claude for copy drafts. A developer drops Copilot into VS Code. Productivity bumps appear immediately. The work feels faster. The quality feels acceptable. ROTS is mildly positive because the spend is tiny and the perceived value is high.
This phase is deceptive. The gains are real but shallow. They come from individual acceleration, not organisational transformation. The moment you try to scale — buying enterprise licences, training a team, integrating with workflows — the easy wins evaporate and the real work begins.
Phase 2: Valley of Death
This is where most companies live in 2026.
74% of companies struggle to achieve and scale value from AI. Only 26% move beyond proof-of-concept.
Tool sprawl sets in. Marketing buys Jasper. Sales adopts Gong’s AI features. Engineering spins up internal LLM experiments. Customer service pilots a chatbot. Each team has its own stack, its own vendor, its own API key. Nobody is measuring whether any of it works.
Axios and Inc report that AI can now cost more than human workers in some implementations. IDC found that nearly every enterprise underestimates AI implementation cost across the full lifecycle. DataRobot’s research shows hidden costs quietly derailing agentic AI strategy. Build in Digital documents the security, cost, and compliance risks of AI agent sprawl.
ROTS goes deeply negative in this phase. The spend is real — enterprise licences, API volume, integration engineering, vendor management. The return is theoretical. Most organisations in the Valley of Death do not know their ROTS because they have never tried to calculate it.
Phase 3: Breaking Even
The transition out of the Valley of Death happens when a company builds systems instead of buying tools.
This is the point where rudimentary measurement appears. Someone starts tracking what the AI actually produces. A content pipeline shows that agent-generated posts perform at 80% of human-written posts at 5% of the cost. A support chatbot resolves 40% of tier-one tickets without escalation. An SEO workflow generates briefs that rank.
ROTS crosses zero. The company is no longer burning money on AI. It is getting value equivalent to what it spends. This is not victory. It is the minimum viable state. But it is the gateway to everything that follows.
Phase 4: Flywheel
This is what ROTS-positive looks like.
AI agents operate as digital employees. They do not assist with tasks. They own workflows end to end. A content agent researches, writes, formats, and publishes. A reporting agent pulls data, identifies anomalies, and flags decisions for human review. A pricing agent monitors competitor moves and recommends adjustments.
The spend is higher than in phase one — significantly higher. But the return is compounding. Every token spent is directed at a measured outcome. The system gets smarter as it runs. Quality improves. Speed increases. The marginal cost of the next unit of output approaches zero.
This is where AI stops being a line item and starts being a profit centre.
The Google Ads parallel
There is a precedent for this transformation, and it is instructive.
In 2005, Google Ads was expensive, unproven, and easy to burn money on. Businesses bought keywords without understanding quality score. They sent traffic to homepages instead of landing pages. They measured clicks instead of conversions. The average advertiser lost money.
Then the measurement phase arrived. Attribution models improved. Conversion tracking became standard. ROAS — Return on Ad Spend — emerged as the dominant metric. Businesses that could prove a positive ROAS scaled aggressively. Businesses that could not went broke or quit.
By 2015, digital advertising was the largest segment of global ad spend. Google and Meta became two of the biggest companies in the world. The transformation was not driven by cheaper clicks. It was driven by better measurement. Once you could prove return, scaling became rational.
AI is at the 2005 moment. The spend is massive. The measurement is primitive. The companies that build ROTS measurement first will capture the most value as the market matures.
Google’s ROAS framework is the direct parallel. It took years for advertisers to move from “how do I spend less?” to “how do I spend more because I know it works?” AI is making the same transition now. The question is who makes it first.
How to measure ROTS in practice
The theory is simple. The practice requires discipline.
Start with one workflow. Not ten. One. Pick a function where AI is already in use and the output is measurable. Content production is a common starting point because the metrics are obvious: posts produced, engagement generated, leads attributed.
For that one workflow, track three numbers:
- Total AI spend — tokens, API calls, hosting, model fine-tuning, vendor licences. Include everything that would disappear if you shut the workflow off.
- Value generated — the business outcome the workflow produces. If it is content, value might be equivalent labour cost (what would a human have cost to produce the same output?) or attributed revenue (leads generated × conversion rate × deal value). If it is customer service, value is ticket resolution cost avoided. Be conservative. Round down.
- ROTS — (Value − Spend) / Spend. A ROTS of 0.5 means every dollar spent returns $1.50 in value. A ROTS of 2.0 means every dollar returns $3.00.
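The three numbers above can live in a spreadsheet, but structuring them as a small ledger keeps the spend and value components honest. A sketch, with hypothetical figures for a one-month content workflow (the field names and amounts are illustrative, not from the article):

```python
from dataclasses import dataclass

@dataclass
class WorkflowLedger:
    # Spend: everything that disappears if you shut the workflow off
    tokens_usd: float = 0.0
    hosting_usd: float = 0.0
    licences_usd: float = 0.0
    # Value: be conservative, round down
    labour_cost_avoided_usd: float = 0.0
    attributed_revenue_usd: float = 0.0

    @property
    def spend(self) -> float:
        return self.tokens_usd + self.hosting_usd + self.licences_usd

    @property
    def value(self) -> float:
        return self.labour_cost_avoided_usd + self.attributed_revenue_usd

    @property
    def rots(self) -> float:
        return (self.value - self.spend) / self.spend

# Hypothetical content workflow, one month:
content = WorkflowLedger(tokens_usd=180, hosting_usd=60, licences_usd=160,
                         labour_cost_avoided_usd=1_200)
print(f"{content.rots:.2f}")  # → 2.00: every dollar returns $3.00 in value
```

Keeping the spend side itemised is deliberate: it forces the "include everything that would disappear if you shut the workflow off" test, rather than counting raw token cost alone.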
The mistake most companies make is stopping at step one. They know their AI bill to the cent. They have never connected it to a business outcome.
Research suggests most companies accelerate existing work rather than redesign workflows around AI. This is why their ROTS stays negative. You cannot measure the return on a workflow that was never redesigned to use AI properly.
The measurement itself changes behaviour. Once a team knows their ROTS is being tracked, they start asking different questions. Not “which model is cheapest?” but “which model produces the highest-value output for this specific task?” Not “how do we cut API spend?” but “how do we increase the value numerator?”
What ROTS-positive looks like
Theory is useful. Examples are better.
The house sale
As of May 2026, I am selling a house in Maresias, Brazil. R$ 4.95 million. Traditional agent commission: 5–6%, or roughly R$ 300,000. Instead, I built a landing page, set up Google Ads conversion tracking, and ran a WhatsApp-only funnel. The entire execution — copy, design, tracking, deployment — was handled by an AI operations partner running local models.
Total API cost: approximately $0.14.
The house has not sold yet. The market decides that. What is already decided is the ROTS: R$ 300,000 in commission avoided for $0.14 in compute. Even if the campaign burns R$ 10,000 in ads before closing, the ROTS remains astronomical. The full breakdown is documented in the live experiment.
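The arithmetic behind "astronomical" is worth making explicit. A sketch of the commission-avoided case, where the BRL/USD rate of 5.0 is an illustrative assumption, not a figure from the article:

```python
# Worked numbers from the house-sale example.
# Assumption: BRL 5.0 per USD, for illustration only.
commission_avoided_brl = 300_000
brl_per_usd = 5.0
api_cost_usd = 0.14

value_usd = commission_avoided_brl / brl_per_usd   # 60,000
rots = (value_usd - api_cost_usd) / api_cost_usd
print(round(rots))  # ≈ 428,570: each dollar of compute returns hundreds of thousands

# Even after burning R$ 10,000 (~$2,000) on ads, ROTS stays extreme:
rots_with_ads = (value_usd - (api_cost_usd + 2_000)) / (api_cost_usd + 2_000)
print(round(rots_with_ads))  # ≈ 29
```

The point of the exercise is not the precise ratio but the order of magnitude: when execution cost compresses to cents, almost any realised value produces a ROTS no traditional channel can match.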
This is not a typical case. It is an extreme one. But extremes clarify what is possible when execution compresses to near-zero cost.
Two roles at ALTHERR
At ALTHERR, the luxury watch retailer where I run marketing, we reduced the team from eight to six in December 2025. Not by cutting output. By replacing two roles with agentic workflows.
2 roles eliminated. 0 output reduction. Near-zero marginal AI cost. The salaries alone paid for the infrastructure in under a month.
An Instagram manager was replaced by an N8N automation that schedules posts, generates captions, and maintains cadence. Setup took one week. The role no longer exists in the org chart. An SEO junior was replaced by a senior operator working with AI-assisted research and drafting. Output improved in both volume and quality.
The AI spend on these workflows is negligible — local models, existing infrastructure. The value is the fully loaded cost of two salaries, plus benefits, plus management overhead, plus the recruitment cycle to replace them if they left. ROTS is not just positive. It is transformative.
The methodology that made this possible — validating everything on a personal test stand before touching the €20M business — is documented in the test stand approach.
The content pipeline
My personal brand content system runs on OpenClaw with local models. Keyword research, interviewing, drafting, formatting, and publishing are all agentic. Human input is limited to approval — a pass/fail check, not editing.
The cost is near-zero. The output is three to five LinkedIn posts per week, one newsletter edition, one blog article, and derivative carousels and Instagram content. To produce equivalent volume manually would require a content manager at minimum. Probably more.
This is what agentic marketing looks like when it works. Not AI assisting humans. Humans directing agents that execute autonomously.
The investor angle
The ROTS framework is not just an operational tool. It is becoming a valuation input.
Credit research suggests companies spending on AI without evidence of return face a 30 basis point credit spread penalty. The market is already pricing undirected AI spend as a liability.
Research on dual leaders — companies scoring highly on both measurement and infrastructure — found they returned 41.38% versus 29.40% for the S&P 500. A 1,200 basis point spread. The companies that measure well and build well outperform dramatically.
The implication for investors and boards: AI spend is not automatically good. Directed, measured AI spend is good. Undirected, unmeasured AI spend is a red flag. The question to ask in every earnings call is not “how much are you spending on AI?” It is “what is your ROTS, and how do you know?”
Companies that cannot answer that question are burning shareholder money. Companies that can answer it precisely are building compounding advantages that will be very hard to close later.
The workforce implications are equally direct. CMOs planning headcount cuts in the next 24 months are not cutting because AI is cheap. They are cutting because AI-directed workflows produce more output with fewer people, and the ROTS math is becoming undeniable.
The assessment
Where does your organisation sit on the ROTS curve?
The fastest way to find out is the AI Readiness Assessment. Three questions. Thirty seconds.
- Are you burning tokens without a system? (Cost Position)
- Are you measuring what your tokens produce? (Early ROTS)
- Are you scaling token spend because you know the return? (ROTS-Positive)
Most organisations answer yes to the first, no to the second, and are not yet asking the third. That is the Valley of Death. The assessment shows you where you are and what the next phase requires.
The verdict
$2.5 trillion spent. Almost nothing to show for it.
This is not because AI does not work. It is because most companies deploy AI the way a tourist deploys a credit card in a foreign country: enthusiastically, without understanding the exchange rate, and with a nasty surprise when the bill arrives.
Return on Token Spend is the exchange rate. It tells you what your tokens are actually buying. Without it, you are guessing. With it, you can scale rationally, invest confidently, and build the kind of compounding advantage that only comes from knowing your numbers.
The companies that figure this out first will not just save money. They will capture market share from competitors who are still optimising for the wrong variable.
Measure ROTS. Everything else follows.
FAQ
What is Return on Token Spend (ROTS)? Return on Token Spend is a framework for measuring the business value generated per dollar spent on AI tokens and compute. It treats AI spend as an investment to be measured, not a cost to be minimized. A positive ROTS means every token you spend returns more value than it costs.
How do you calculate ROTS? ROTS = (Value Generated − AI Spend) / AI Spend. Value can be labour cost avoided, revenue generated, time saved converted to hourly rate, or error reduction quantified in financial terms. The key is measuring the output, not just the input. Most organisations track API spend but never connect it to a business outcome.
What is the ROTS J-curve? The J-curve describes four phases of AI adoption: Quick Wins (productivity bump from ChatGPT), Valley of Death (tool sprawl and negative ROTS), Breaking Even (rudimentary systems, ROTS crosses zero), and Flywheel (AI agents as digital employees, positive ROTS compounds). Most organisations get stuck in phase two.
Why do most AI projects fail to deliver ROI? Multiple studies confirm 70–85% failure rates. The core reasons: undirected experimentation without measurement, treating AI as a cost centre rather than an investment, tool sprawl without integration, and expecting plug-and-play results from technology that requires workflow redesign. Most organisations accelerate existing work rather than redesign processes around AI.
How is ROTS different from traditional AI readiness frameworks? Existing frameworks score readiness across dimensions like data quality, governance, and infrastructure. ROTS is different: it measures economic output. You can have perfect governance and zero return. ROTS forces the question every framework avoids: are your tokens making or losing money?
Want to know where your operation sits on the ROTS curve? Take the AI Readiness Assessment →
For a weekly dispatch on what agentic marketing looks like in practice — real tools, real decisions, real numbers — subscribe to the newsletter →