
Why even is Tokenmaxxing?

The news is coming in hot: tokenmaxxing is a waste of time. The funny-in-a-sad-way thing is that we’ve known this for over half a century. So how did it ever become a thing? And what even is tokenmaxxing?

Written by

Kristoffer M. Yi Fredriksson, Digital Strategist, 17 June 2026

AI Column Digitalization Strategy

Tokenmaxxing is a viral “suffix combination”, just like something-gate or something-athon. The suffix “maxxing” has its origins in game theory from the 1940s and the expression min-maxing. It simply means that you maximise one aspect of something at the cost of all other aspects. Like beating your cheekbones with a rubber mallet in the hope of making them more prominent. In that case, you maximise the act of beating yourself with a mallet at the cost of rational thinking and medical advice. But I digress.

Tokens, in turn, refer to the smallest measurable unit that a so-called AI uses when it does its thing. No matter if it’s reading information being sent to it or responding with text, images, code, audio, etc. The exact size of a token is a bit fuzzy. It used to be “about the size of a syllable”, but how does that translate to an image or sound? Be that as it may, they are convenient for measuring and limiting AI use; therefore, they’re also useful for charging for AI use.
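
To make the measuring-and-charging idea concrete, here is a minimal Python sketch. It assumes a crude rule of thumb of roughly four characters per token for English text, which is only an approximation; real tokenizers are model-specific and will give different counts. The function names and the price per million tokens are made up for illustration and do not reflect any vendor's actual pricing.

```python
import math

# Assumption: a rough rule of thumb of ~4 characters per token for English text.
# Real tokenizers are model-specific and will give different counts.
CHARS_PER_TOKEN = 4

# Hypothetical price in dollars, purely for illustration.
PRICE_PER_MILLION_TOKENS = 3.00


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of text."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(token_count: int,
                  price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """Return the estimated charge for a given number of tokens."""
    return token_count / 1_000_000 * price_per_million


prompt = "Find news articles about firework factory explosions."
tokens = estimate_tokens(prompt)
print(f"~{tokens} tokens, ~${estimate_cost(tokens):.6f}")
```

Counting is the easy part. Nothing in this sketch says whether the tokens bought anything useful.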

Anyhow, tokenmaxxing is the act of maximising your use of tokens, and thereby your use of AI. And what is wrong with that?

Well, here’s the COO at Uber, Andrew Macdonald, pushing back against tokenmaxxing in an interview.

From The Verge, my emphasis:

[…] the company isn’t seeing a connection between rising token consumption for Claude Code and more useful features being delivered to consumers.

“That link is not there yet, right? I think maybe implicitly there is more that is getting shipped, but it’s very hard to draw a line between one of those stats and, ‘Okay, now we’re actually producing 25 percent more useful consumer features.’”

---

No direct link between tokenmaxxing and useful consumer products.

One thing tokens are good for, though, is letting management look at pretty dashboards and see who is using AI, and how much. But as Goodhart’s Law states: “When a measure becomes a target, it ceases to be a good measure.”

Let’s take a look at McDonald’s for a perfect example of this.

It used to be that you ordered at the counter and waited until your precooked meal was put on a tray in front of you. Then they changed things (in Sweden, at least) so that meals were prepared after they were ordered, giving customers a “fresher” burger. The latest development was the addition of self-service kiosks, to great success: once your order number showed up on a screen, you could pick up your meal at the counter.

That is where things went wrong.

Someone got the idea that you could track the performance of a team by measuring how long it took from when a customer placed their order until it was ready for pickup. Team A has N minutes between order and delivery, Team B has N+3 minutes; clearly Team A is better. Right?

Then the obvious thing happened. If you go to a McDonald’s today you will see that your order is moved to the “Ready for pickup” column well before it is actually ready for pickup.

Instead, you have to wait until they physically call out your number. This is so that teams can game the score and get “better” performance according to the measurement that became a target. Which means that the measurement is unreliable.
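
The effect can be sketched in a few lines of Python. The numbers below are invented for illustration, not real McDonald’s data, and the `Order` class and its fields are hypothetical. The point is that the dashboard metric is computed from when an order is marked ready, while the customer experiences when the food is actually handed over.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Order:
    # All times are minutes after the order was placed (made-up numbers).
    marked_ready: float    # when the screen says "Ready for pickup"
    actually_ready: float  # when the food is really handed over


def dashboard_minutes(orders: list[Order]) -> float:
    """What management sees: average time until the order is marked ready."""
    return mean(o.marked_ready for o in orders)


def customer_minutes(orders: list[Order]) -> float:
    """What the customer experiences: average time until the food arrives."""
    return mean(o.actually_ready for o in orders)


# Team A marks orders as ready early; Team B marks them when they are ready.
team_a = [Order(2, 8), Order(3, 9), Order(2, 7)]
team_b = [Order(5, 5), Order(6, 6), Order(4, 4)]

for name, orders in (("Team A", team_a), ("Team B", team_b)):
    print(f"{name}: dashboard {dashboard_minutes(orders):.1f} min, "
          f"customer {customer_minutes(orders):.1f} min")
```

With these made-up numbers, Team A looks more than twice as fast on the dashboard even though its customers wait longer.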

The same thing happened at Amazon with tokenmaxxing.

Management told everyone they should become more productive and/or efficient by using AI, and the most obvious way to measure this was to monitor how many tokens the employees were using. The employees gamed that by asking their agents to do unnecessary “work”, like solving all of the NYT crossword puzzles or redrawing an image of a unicorn 10,000 times. (Okay, I’m not sure exactly what they did to burn as many tokens as possible, but you get the idea: useless busywork.)

Again, this was such an obvious outcome that it got a name more than 50 years ago.

What could they have measured instead?

Well, measuring things is hard, and measuring the effect of AI isn’t an exception. I can personally say that the prototypes I’m currently doing are the best work I’ve ever done in my 25-plus years as an interactive media worker. That is absolutely 100% thanks to me having access to a coder that turns my ideas into rich interactive mockups that can be evaluated and refined in the span of hours instead of weeks.

I know they are the best because I’m deeply involved in my career (obviously) and have full knowledge of my capabilities (hopefully). But is it measurable?

No.

Unless someone took the time to sit down and listen to me, and compare my previous work with what I’m currently doing, it would be hard to communicate. And even if someone did sit down and look at my work, there’s still no number to put in a column on a dashboard.

Let’s look at search as another example.

To me, it’s obvious that an LLM is better than a traditional Google search at answering a question like: “How many firework-factory explosions have there been in the last 12 months?”

Try this in Gemini or Claude, or whatever chatbot you prefer: “Find news articles about firework factory explosions over the past 12 months, put them in a table with date, headline and link. Only unique entries, most reputable source if more than one”.

Then try to manually create that table using the results of a basic Google search for the same thing.

The LLMs are soooo much better for this. But how would you measure that? Time-to-table?

I see this struggle with measurements everywhere I look. A recent study said that 80% of companies didn’t see any productivity gains from using AI, but what does “productivity gains” mean, exactly? Reports created? Code written? Internal emails sent?

Speaking of internal emails, an article on hbr.org says that workslop, which could be one outcome of productivity gains, is poisoning the workplace. Workslop is the term for AI-generated requests that look great but lack substance. In a study they cite, 40% of respondents said they had received workslop, and that each incident took, on average, two hours to deal with.

So I wonder what gains the 20% saw. Twenty percent isn’t nothing, and if they saw an increase in revenue, then that’s a rather good number. One in five were successful; the others kept on existing. Sounds good if true.

I personally feel that I’ve gotten a lot of benefits from AI, and I’m sure others are seeing real benefits too. Many of the projects we’re working on with our clients use AI in one way or another, usually as an augmentation rather than automation.

But the idea that there’s an easy KPI we can put on a dashboard completely misses the point. And that, ladies and gentlemen, and anyone else who made it all the way here, is why Silicon Valley is obsessing over taste.

But that is for another day.

What are tokens and how to count them?

AI-Generated “Workslop” Is Destroying Productivity

Amazon Shuts Down AI Leaderboard After Employee 'Tokenmaxxing'

Workslop: The Hidden Cost of AI-Generated Busywork

Why Tech Bros Are Now Obsessed with Taste