Technology and Society

Open versus paid Ai: what I actually use, and why I keep both

Tommy Findlay

20 June 2026

A few days ago an open model called GLM-5.2 was released that most independent testers now rank as the best open-weight Ai in the world, close enough to the paid frontier that the gap is measured in points rather than leagues. I spend most of my working day inside Claude and Claude Cowork, which are paid and closed, so a release like that is worth paying attention to, not because it changes what I use tomorrow, but because it changes the shape of the choice. The interesting question in the middle of 2026 is no longer whether open-source Ai is any good, because it plainly is, but where each kind belongs in a business, and that turns out to be a governance decision as much as a technical one.

What "open" actually means, and what it does not

There are three categories hiding behind the word "open", and the difference matters as soon as real money and real data are involved. A closed or proprietary model, such as Claude Opus 4.8 or OpenAI's GPT-5.5, is one you reach only through the vendor's app or a programming interface called an API; the model itself stays private and you simply rent access to it. An open-weight model is one whose trained "weights", the billions of numbers that make up its brain, are published for you to download, run and adapt, usually under a permissive licence, even though the training data and the full recipe are not released. A genuinely open-source model, in the strict sense the Open Source Initiative uses, goes further and publishes enough of the data and the method that someone could rebuild it, and almost no frontier-class model actually clears that bar. GLM-5.2 is the one worth knowing here: it ships under a permissive MIT licence that allows commercial use with almost no strings, but because that licence covers the weights rather than the training data and the full recipe, it is open-weight rather than fully open-source. The honest line for a business is that "open" is a governance word rather than a marketing one, and most of the capable models you can download today are open-weight, not open-source.

GLM-5.2, and how far the gap has narrowed

GLM-5.2 is made by Z.ai, the Chinese lab formerly known as Zhipu, and on the numbers it is a serious piece of work. It is what the industry calls a mixture-of-experts model of roughly 744 to 753 billion parameters, of which only about 40 billion fire on any given token (a word or a fragment of one), which is the trick that lets something this large run at a sensible cost. It also carries a one-million-token context window, meaning it can hold an entire mid-sized codebase and its history in mind at once. The independent benchmarker Artificial Analysis currently rates it the top open-weight model in the world, a little behind Claude Opus 4.8 and GPT-5.5 on its overall intelligence index but genuinely close, and on several coding tests it already edges past GPT-5.5 while landing within about a point of Opus on long-horizon software work.

The point that often confuses people about an open model is that "open" does not mean "free to use" in the way you might expect. GLM-5.2 is far too large for most people to run on their own hardware, so in practice you still pay to use it. The difference is that instead of paying the company that built it for exclusive access, you pay whichever hosting provider you prefer to run it for you, much as you already pay Anthropic to run Claude, and because many providers can serve the very same open weights and compete on price, the cost falls sharply. On Z.ai's own hosted pricing, for instance, GLM-5.2 costs about 1.40 dollars per million input tokens against five for Claude Opus 4.8, and 4.40 dollars per million output tokens against twenty-five, which works out at roughly three and a half times cheaper on input and between five and six times cheaper on output.
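To make that price comparison concrete, here is a minimal sketch of the arithmetic. The per-million-token prices are the ones quoted above; the workload of two million tokens read and half a million written is a made-up example, and the function name is mine rather than anything from either vendor.

```python
# Prices in US dollars per million tokens, as quoted in the text above.
PRICES = {
    "GLM-5.2 (Z.ai hosted)": {"input": 1.40, "output": 4.40},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of one job at the quoted per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 2 million tokens read, 500,000 tokens written.
glm_cost = job_cost("GLM-5.2 (Z.ai hosted)", 2_000_000, 500_000)
opus_cost = job_cost("Claude Opus 4.8", 2_000_000, 500_000)

# Price ratios on each side of the meter.
input_ratio = PRICES["Claude Opus 4.8"]["input"] / PRICES["GLM-5.2 (Z.ai hosted)"]["input"]
output_ratio = PRICES["Claude Opus 4.8"]["output"] / PRICES["GLM-5.2 (Z.ai hosted)"]["output"]

print(f"GLM-5.2: ${glm_cost:.2f}  Opus 4.8: ${opus_cost:.2f}")
print(f"Opus costs {input_ratio:.1f}x more on input, {output_ratio:.1f}x more on output")
```

On this invented workload the bill comes to about 5 dollars against about 22.50, a ratio of roughly four and a half. The blended saving depends on how much a job reads compared with how much it writes, and because GLM-5.2 tends to be verbose, a like-for-like job may use more output tokens than this simple sum assumes.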

None of that makes it a clean replacement for the best paid tools, and the weaknesses deserve the same honesty as the strengths. GLM-5.2 is text-only, it is unusually verbose and burns a lot of tokens thinking before it answers, and independent testing still puts its tendency to invent things well above that of the frontier models on factual work. The fair summary is that open models have moved from "fine for second-tier jobs" to genuinely frontier-adjacent, and that is a real shift even with every caveat attached.

Why "open weight" is not the same as "run it yourself"

The part that tends to get lost in the excitement is that being allowed to download a model is not the same as being able to run it. The full GLM-5.2 release is about 1.51 terabytes, which is server-cluster territory, so in practice you compress it through a process called quantisation, which stores the model's numbers at lower precision to shrink it, at some cost to quality. Even the aggressively compressed two-bit build that the specialists at Unsloth published is still around 239 gigabytes and holds only about 82 percent of the original accuracy, which means you need something like a top-end Mac Studio with 256 gigabytes of memory just to load it, and even then it runs slowly, at a few tokens a second rather than the near-instant speed of a hosted service. Running the full-quality version at production speed is an eight-GPU job that costs hundreds of thousands of pounds. So for almost every business the realistic way to use an open model is still to rent it from a hosting provider rather than to self-host. The immediate value of "open" is therefore not that you keep the model in your own server room but that more than one provider can serve the same model, which drags the price down and gives you somewhere else to go if one of them changes the deal.
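The sizes above follow from simple arithmetic: storage is roughly the number of parameters multiplied by the bits used for each one. The sketch below assumes the upper parameter count of 753 billion and assumes the full release stores each weight in 16 bits; neither assumption comes from the release notes, though together they land on the quoted 1.51 terabytes.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate storage in gigabytes: parameters x bits per weight / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


def effective_bits(size_gb: float, params_billion: float) -> float:
    """Average bits per weight implied by a published file size."""
    return size_gb * 8 / params_billion


PARAMS_BILLION = 753   # assumption: upper end of the quoted 744 to 753 billion range
MAC_STUDIO_GB = 256    # memory of the top-end machine mentioned in the text
TWO_BIT_BUILD_GB = 239 # size of the compressed build quoted in the text

full_size = model_size_gb(PARAMS_BILLION, 16)       # 1506 GB, about 1.51 TB
naive_two_bit = model_size_gb(PARAMS_BILLION, 2)    # 188.25 GB if every weight used exactly 2 bits
average_bits = effective_bits(TWO_BIT_BUILD_GB, PARAMS_BILLION)
headroom = MAC_STUDIO_GB - TWO_BIT_BUILD_GB         # memory left over once the model is loaded

print(f"Full precision: {full_size:.0f} GB")
print(f"Pure 2-bit:     {naive_two_bit:.0f} GB")
print(f"239 GB build averages {average_bits:.2f} bits per weight")
print(f"Headroom on a 256 GB machine: {headroom} GB")
```

The compressed build averages about 2.5 bits per weight rather than exactly two, which would be consistent with some parts of the model being kept at higher precision to protect quality. It also leaves only 17 gigabytes spare on a 256 gigabyte machine for the operating system and the conversation itself, which goes some way to explaining why it is a tight and slow fit.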

The model is only half the story: Hermes and the agent layer

A model on its own does not do very much, because to get real work out of it you need something to run it on a schedule, give it tools, hold its memory and let you talk to it, and that layer is where the open and paid worlds split again. On the open side the clearest example is Hermes Agent, an open-source runner from Nous Research that you install on your own machine, point at whatever model you like, and message through Discord, Telegram or Slack, with built-in scheduling so it can do jobs while you sleep. On the paid side sits Claude Cowork, Anthropic's polished desktop version of the same idea, which is far easier for a non-technical person to pick up but only drives Anthropic's own Claude models and lives inside Anthropic's app. The trade is the same one as with the models themselves: flexibility and control on one side against polish and simplicity on the other. I will get into how Hermes actually works in the next piece, including cron jobs, the long-standing computing term for tasks set to run automatically on a schedule, a kind of digital alarm clock for your agent. For now the point worth holding is that "open versus paid" is a choice you make twice, once for the model and once for the thing that runs it.
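For readers who have never met a cron schedule, the sketch below shows the idea in miniature. A cron expression is five fields (minute, hour, day of month, month, day of week), and a job runs whenever the current time matches all of them. This is a simplified illustration of the general convention, not Hermes Agent's own code or configuration, and the function names are mine.

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Check one cron field. Supports '*', single numbers, 'a-b' ranges and comma lists."""
    for part in field.split(","):
        if part == "*":
            return True
        if "-" in part:
            low, high = map(int, part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if 'when' falls on the schedule described by a five-field expression.

    Simplification: every field must match. Real cron also supports step values
    and names, and treats the two day fields slightly differently.
    """
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = when.isoweekday() % 7  # cron counts Sunday as 0, Monday as 1
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(day, when.day)
        and _field_matches(month, when.month)
        and _field_matches(weekday, cron_weekday)
    )


# "At 07:00, Monday to Friday": the digital alarm clock for a weekday morning job.
WEEKDAY_7AM = "0 7 * * 1-5"

print(cron_matches(WEEKDAY_7AM, datetime(2026, 6, 22, 7, 0)))  # a Monday morning
print(cron_matches(WEEKDAY_7AM, datetime(2026, 6, 20, 7, 0)))  # a Saturday morning
```

A scheduler simply asks this question once a minute and starts the job whenever the answer is yes, which is all "doing jobs while you sleep" amounts to underneath.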

Where each one wins

Set against each other in plain business terms, the two approaches have honest strengths and honest costs. The paid frontier stack, Claude and Claude Cowork in my case, still wins on the things that decide whether a tool actually gets used, which are the very top of capability on hard and messy work, a polished experience that a finance lead or an operations manager will trust, proper support, and almost no infrastructure for you to run yourself. What you give up for that is control, because you cannot self-host it, your data sits with the vendor, and you adapt on the vendor's timeline when prices move or a model is retired. The open stack, GLM-5.2 and Hermes, wins on cost, on optionality and on sovereignty, because the same open weights can be moved between providers or kept entirely inside your own infrastructure when a client or a regulator demands it. The cost there is operational, because someone has to run it, and quality and speed vary more. There is also one catch that matters a great deal for a UK business: Z.ai's own hosted service routes your data through servers subject to Chinese law, and its parent company sits on a US trade-restriction list. The privacy advantage of "open" therefore only really holds if you self-host the weights or choose a vetted Western host rather than the cheapest one going.

What I actually do

My own setup is not loyal to either camp, and after the past fortnight I am more convinced that is the right instinct. I keep Claude and Claude Cowork as the daily driver for the high-stakes work where quality and reliability earn their price, and I treat capable open models like GLM-5.2 as the cost-sensitive, high-volume and fallback tier, the place where routine work can run cheaply and the insurance policy if a paid model is repriced or pulled. That last point is not hypothetical, because a fortnight ago the US government forced Anthropic to take two of its most powerful models, Fable 5 and Mythos 5, offline worldwide overnight, which is about the clearest argument you could ask for against building a whole business on a single model you do not control. The sensible architecture now is not to pick a winner but to run a governed, tiered stack with a real open-weight fallback, and to keep the discipline to decide which work goes where on the basis of how sensitive the data is and how much the quality genuinely matters. How you wrap governance around all of that, the data rules, the ISO standards and the audit trail, is where this series is heading next.
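One way to picture that tiered stack is as a small routing rule that takes a piece of work and decides where it runs. The sketch below is one possible policy under my own assumptions, not a description of any product: the tier names, the task fields and the order of the checks are all hypothetical, and a real policy would be set by your own data rules.

```python
from dataclasses import dataclass

# Hypothetical tier labels, for illustration only.
PAID_FRONTIER = "paid frontier model"
OPEN_VETTED_HOST = "open-weight model on a vetted host"
OPEN_SELF_HOSTED = "open-weight model, self-hosted"


@dataclass(frozen=True)
class Task:
    name: str
    must_stay_in_house: bool  # a client or regulator requires the data to stay on your own infrastructure
    quality_critical: bool    # high-stakes work where quality earns its price


def route(task: Task, paid_available: bool = True) -> str:
    """Pick a tier for a task. Assumed order: data sensitivity first, then quality, then cost."""
    if task.must_stay_in_house:
        return OPEN_SELF_HOSTED
    if task.quality_critical and paid_available:
        return PAID_FRONTIER
    # Routine, high-volume work lands here, and so does high-stakes work
    # when the paid model has been repriced or pulled.
    return OPEN_VETTED_HOST


board_paper = Task("draft board paper", must_stay_in_house=False, quality_critical=True)
invoice_tags = Task("tag 10,000 invoices", must_stay_in_house=False, quality_critical=False)
client_records = Task("summarise client records", must_stay_in_house=True, quality_critical=True)

for task in (board_paper, invoice_tags, client_records):
    print(f"{task.name}: {route(task)}")

# The fallback in action: the same high-stakes task when the paid model is offline.
print(f"{board_paper.name} (paid model offline): {route(board_paper, paid_available=False)}")
```

The useful part is not the code but the fact that the decision is written down: once the rule is explicit, it can be reviewed, audited and changed, which is the discipline the tiered approach depends on.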

Sources

GLM-5.2 rated the leading open-weight model, with specifications and benchmarks: Artificial Analysis.
GLM-5.2 beating GPT-5.5 on coding benchmarks at roughly a sixth of the cost: VentureBeat.
GLM-5.2 hosted API pricing: Z.ai documentation. Claude Opus 4.8 pricing: Anthropic.
Running GLM-5.2 locally, model size, quantisation and speed: Unsloth.
Hermes Agent: Nous Research. Claude Cowork: Anthropic.
Zhipu (Z.ai) added to the US Entity List: South China Morning Post.
The Fable 5 and Mythos 5 shutdown: Anthropic's statement.