Most AI ROI calculations are wrong. Not because the math is off, but because they measure the wrong things. Counting the hours AI "saved" as if they became revenue is one common mistake. Attributing improvements to AI that would have happened anyway is another. Here's how to measure what AI is actually doing for your business.
Why Most ROI Calculations Are Wrong
The typical calculation: "Our team spends 10 hours per week on task X. AI cuts that to 3 hours. 7 hours saved × average hourly rate = £Z in value." This is a reasonable starting point, but it rests on three assumptions: that the 7 hours become productive elsewhere, that output quality is maintained, and that no new costs (oversight, error correction, tool licensing) offset the savings. In practice, all three are often wrong.
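To make that concrete, here's a minimal sketch of the naive calculation next to an adjusted one. Every figure and variable name is an illustrative assumption, not a benchmark, and the quality discount is a simplification: it treats lost quality as a linear haircut on value.

```python
# Naive version: every saved hour is treated as realised value.
hours_saved_per_week = 7        # 10 hours before AI, 3 after
hourly_rate = 40                # illustrative fully loaded rate in £

naive_weekly_value = hours_saved_per_week * hourly_rate    # £280

# Adjusted version: discount for the three assumptions above.
redirection_rate = 0.5          # fraction of freed hours that reach productive work
quality_retention = 0.9         # 1.0 = quality fully maintained; below 1.0 erodes value
new_weekly_costs = 60           # licences + oversight + correction time, in £

adjusted_weekly_value = (hours_saved_per_week * redirection_rate
                         * quality_retention * hourly_rate) - new_weekly_costs

print(naive_weekly_value, adjusted_weekly_value)   # 280 66.0
```

Even with fairly generous discounts, the adjusted number here is less than a quarter of the naive one. That gap, not the headline figure, is what the rest of this section is about measuring.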
Metrics That Matter
Time saved — with conditions
Time saved is real under two conditions: the saved time is redirected to a measurable alternative use, and the quality of the AI-assisted output equals or exceeds the human-only baseline. Measure both. "We save 5 hours a week on reports" is only valuable if those 5 hours become client calls, product development, or something else with measurable output.
Output quality — against a baseline
If AI is assisting with a task, compare the quality of AI-assisted output to pre-AI output using the same criteria you used before. For writing tasks: accuracy, clarity, relevance. For customer support: resolution rate, satisfaction score. For code: bug rate, review time. If quality drops, the time savings are partially or fully offset by the cost of fixing errors.
Error rate and correction cost
Track how often AI output needs significant correction. Every correction is a cost. In most implementations, error correction is the hidden cost that erodes the ROI, especially in the early months before prompts have been refined.
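One way to keep this honest is to log corrections and convert them to hours. A minimal sketch; the drafts-per-week volume, correction rate, and fix time below are all assumptions you'd replace with your own logs:

```python
# Gross vs net time saved once correction work is counted.
drafts_per_week = 20
gross_hours_saved_per_draft = 0.5    # time AI removes from each draft
correction_rate = 0.25               # fraction of drafts needing significant rework
hours_per_correction = 0.75          # average time to fix one flagged draft

gross_saved = drafts_per_week * gross_hours_saved_per_draft                 # 10.0 h
correction_cost = drafts_per_week * correction_rate * hours_per_correction  # 3.75 h
net_saved = gross_saved - correction_cost                                   # 6.25 h
```

Tracking correction_rate week by week also tells you whether prompt refinement is working: it should fall as the implementation matures.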
Capacity unlocked
Sometimes the value of AI isn't saving time on existing tasks — it's enabling tasks that weren't possible at all. A two-person content team that couldn't publish more than twice a week can now publish five times a week with the same headcount. That's a capacity unlock, not just a time saving, and it needs to be valued differently.
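A capacity unlock is priced on the output it enables, not on hours. A sketch using the content-team example; value_per_post is the hard number to pin down, and here it is purely an assumption to validate against your own funnel data:

```python
# Value the extra output directly, independent of any hourly rate.
posts_per_week_before = 2
posts_per_week_after = 5
value_per_post = 150   # attributable value per post in £ (an assumption to validate)

unlocked_weekly_value = (posts_per_week_after - posts_per_week_before) * value_per_post
# 3 extra posts × £150 = £450/week of new capacity
```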
Metrics That Mislead
"Hours saved" without redirection
If the saved hours result in people browsing the internet or attending more meetings, the ROI is zero. Track what the freed time is used for, not just that it was freed.
Comparison to the worst-case human baseline
If your baseline is "how long it takes the slowest person to do this task", you're not measuring AI's value — you're measuring underperformance. Use a realistic baseline: how long it takes a competent, experienced person with good tools.
Volume metrics without quality check
"We produce 3x more content" is meaningless if the additional content performs worse than the original. Track downstream metrics: engagement, conversion, customer feedback — not just production volume.
A Practical Measurement Approach
- Identify one task type where AI is deployed
- Measure pre-AI baseline: time taken + quality metric + error rate
- Deploy AI assistance for 6 weeks
- Measure the same three metrics
- Calculate the net change, accounting for tool cost, setup time, oversight time, and correction time (see the sketch below)
- If net positive: expand to similar task types. If neutral or negative: adjust the implementation first.
Six weeks is the minimum for a reliable signal — the first two weeks are always noisy as the team adapts to the new workflow.
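Putting the whole approach together, here's a sketch of the net-change calculation from the list above. It assumes a linear quality discount and linear amortisation of setup time; the structure, parameter names, and figures are illustrative, not a standard:

```python
from dataclasses import dataclass

@dataclass
class TaskMetrics:
    hours_per_week: float             # time spent producing the output
    quality_score: float              # your existing quality metric, same scale pre and post
    correction_hours_per_week: float  # time spent fixing errors in the output

def net_weekly_value(pre: TaskMetrics, post: TaskMetrics, hourly_rate: float,
                     weekly_tool_cost: float, weekly_oversight_hours: float,
                     setup_hours: float, weeks_to_amortise: int = 26) -> float:
    """Net weekly value of AI assistance on one task type (a sketch)."""
    total_pre = pre.hours_per_week + pre.correction_hours_per_week
    total_post = post.hours_per_week + post.correction_hours_per_week
    hours_saved = total_pre - total_post
    quality_factor = post.quality_score / pre.quality_score  # simplistic linear discount
    gross_value = hours_saved * quality_factor * hourly_rate
    overhead = (weekly_tool_cost
                + weekly_oversight_hours * hourly_rate
                + setup_hours * hourly_rate / weeks_to_amortise)
    return gross_value - overhead

# Six-week measurement on one task type, illustrative figures only.
pre = TaskMetrics(hours_per_week=10, quality_score=8.0, correction_hours_per_week=0.5)
post = TaskMetrics(hours_per_week=3, quality_score=7.8, correction_hours_per_week=1.5)

value = net_weekly_value(pre, post, hourly_rate=40, weekly_tool_cost=25,
                         weekly_oversight_hours=1, setup_hours=20)
print(round(value, 2))   # positive: expand; neutral or negative: adjust first
```

The amortisation window is itself a choice: a short one penalises setup-heavy pilots, a long one flatters them. Pick it before you run the measurement, not after you see the number.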