
Terminal thinking

Disclaimer for technology predictions: they tend to age with the dignity of unrefrigerated milk, and this one may well do the same. Although, if it does, it will have to curdle around evidence that is already sitting in plain sight.

Brian Chesky runs a $70 billion company, and in October 2025 he told Bloomberg that Airbnb was “relying a lot” on Alibaba’s Qwen model for its AI customer service agent, because, as he put it, “It’s very good. It’s also fast and cheap.”⁹ᵈ Around the same time, Chamath Palihapitiya, an early Facebook executive and one of Silicon Valley’s more combative investors, said that he had moved company workflows from Amazon Bedrock to Moonshot AI’s Kimi K2,⁹ᶜ for the ordinary and therefore more interesting reason that Kimi K2, in the fuller version quoted by Chatham House, “was really way more performant and frankly just a ton cheaper than OpenAI and Anthropic.”⁹ᵉ

These are production decisions, not experiments, and the numbers around them are not subtle: according to data from Andreessen Horowitz and OpenRouter covering 100 trillion tokens of real AI usage, open source models built by Chinese labs went from 1.2% of global token volume in late 2024 to nearly 30% in late 2025. Martin Casado, a general partner at a16z, told The Economist last year that when startups come to pitch his firm, there is roughly an 80% chance that they are running on a Chinese open source model if they are using open source at all.¹

The frontier companies in the AI industry behave as if this were not happening. Incumbents usually have excellent reasons to misunderstand the thing that is about to make their margins worse.

In 1977, Ken Olsen, founder of Digital Equipment Corporation, which at its peak was the second largest computer company in the world, attended a World Future Society convention in Boston and said, “There is no reason for any individual to have a computer in his home.”⁵

Olsen understood computers precisely: DEC’s minis were extraordinary machines, far more capable than the microcomputers beginning to appear, and the model he favored was not irrational. He had a terminal at home, a word processing terminal that connected to DEC’s remote systems, as did Gordon Bell, his VP of Engineering, who in 1980 put the logic plainly: “I personally wouldn’t recommend anything other than a terminal for home because microcomputers aren’t big enough.”⁵

Both men, in other words, did have a computer at home; the computational capability simply lived elsewhere. They accessed it through a wire, and they assumed that everyone else would do the same.

In 1983, as IBM PCs were selling 200,000 units per month, a number IBM’s own forecasters had projected as the annual target,² Olsen gave one last interview on the subject to BusinessWeek. “The personal computer will fall flat on its face in business,” he said, “because users want to share files and want more than one user on the system.”⁵ He was right about the need and completely wrong about the solution: users did not need minicomputers in order to share files, they needed personal computers plus networks.

By 1984, the installed base of personal computers had reached 23 million, while large and medium computers, the machines that housed the serious, real, capable computing Olsen was defending, combined for fewer than 200,000.³

The analogy is not that networks lost, because they did not. The winning system was PC plus network, local compute where locality mattered, shared systems where scale still mattered. The error was assuming that because the central machine was more powerful, the central machine would remain the natural place for ordinary work.

IBM’s PC story matters here only for the speed of the shift, because once the PC was cheap enough, useful enough, and cloneable enough, the market organized around the machine on the desk, and the center of gravity moved to the edge.²

The lesson is not that centralized systems were useless, since they were powerful, profitable, and often technically superior, but that technical superiority stops mattering once a local alternative is good enough, cheaper, faster to adopt, and easier to control.

OpenAI, Anthropic, and Google now occupy Olsen’s chair.

They cannot say what is almost certainly true, that for the majority of everyday enterprise AI tasks, the performance difference between a frontier cloud model and a good open weight model running locally is small, narrowing, and increasingly hard to justify once price, latency, privacy, and control enter the calculation. Their businesses are built on per token revenue. Every enterprise workload that moves to a local model is a customer who stops paying.

So they make the Olsen argument: local models are not capable enough, the cloud is required for serious work, and open source cannot match them on alignment, safety, or quality.

The benchmarks are already awkward for that story. On SWE bench Verified, which tests models on real GitHub issues and actual production bugs in real codebases, the top six models in the world now cluster within two percentage points of each other.⁸ᵇ Three of those six are Chinese open weight models: DeepSeek V4 Pro scores 80.6%, MiniMax M2.5 and Kimi K2.6 each score 80.2%, while Claude Opus 4.7 leads at 82%. On the hardest practical coding test, the gap between the best open weight model and the best proprietary model is less than two percentage points.

On competitive programming, where Codeforces Elo measures models by having them solve actual competitive problems against the clock, DeepSeek V4 Pro has a rating of 3,206, higher than any other model tested, which puts it in the 96th percentile of human competitive programmers.⁸ᶜ On MATH 500, DeepSeek R1 scores 97.3%, compared with Claude Sonnet 4.6’s 97.8%.⁴ On GPQA Diamond, which tests PhD level science questions across physics, chemistry, and biology, DeepSeek V4 Pro scores 90.1% against Gemini 3.1 Pro’s 94.3%.⁸ᵈ

The PC analogy almost understates the case. Personal computers disrupted larger systems while delivering a fraction of the capability, whereas today’s open weight models are moving into cloud AI’s territory while already landing in the range that matters for most paid work.

There are still gaps. The very hardest reasoning tasks, native multimodal understanding, and sustained long horizon agentic work still favor the frontier systems, and that matters if the workload actually lives there. Most enterprise AI does not. Most of it is document processing, code completion, support routing, extraction, summarization, reporting, and the usual internal glue work.

A fair caveat is that SWE bench Verified has contamination issues, since OpenAI flagged in early 2026 that frontier models may have seen benchmark solutions during training. The harder SWE bench Pro variant, which uses a standardized scaffold and less contaminated problems, shows a wider gap, with Claude Opus 4.7 leading at 64.3%, and Kimi K2.6 and GLM 5.1 close behind at 58.6%.⁸ᵇ While the gap is real, it is also under six points, not fifty, which is the sort of gap buyers tolerate when the alternative is cheaper, closer to the data, and under their control.

Cheaper API access is already a bad line in a frontier model P&L. Portability is worse. Once the weights move, the buyer is no longer comparing one remote meter with another remote meter; the buyer can opt out of the meter entirely.

Local does not have to mean a laptop running everything in heroic isolation. It can mean a workstation, an enterprise appliance, an AI PC, a phone class device for smaller variants, or a box sitting in a branch office rather than a hyperscale data center. Hugging Face makes the shape of this fairly obvious: Qwen3 at 0.6B, DeepSeek R1 Distill Qwen at 1.5B, SmolLM3 at 3B, and Llama 3.2 at 1B are weights that can be loaded directly, served locally, quantized, or run through stacks such as llama.cpp, Ollama, LM Studio, MLX, MLC, ONNX, ExecuTorch, Docker, vLLM, or SGLang.⁴ᵃ Once that is possible, the interesting thing is not the discount on the API call, but the missing API call.
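The claim that these models fit on commodity hardware is simple arithmetic. A minimal sketch, taking the parameter counts from the model names above and using the standard bytes-per-weight figures for 16-bit and 4-bit formats (real quantized formats such as GGUF carry some per-block metadata, which this deliberately ignores):

```python
# Rough memory footprint of small open weight models: weights only,
# excluding KV cache and runtime overhead. Parameter counts come from
# the model names; bytes per parameter are 2.0 for FP16 and ~0.5 for
# 4-bit quantization.

MODELS = {
    "Qwen3-0.6B": 0.6e9,
    "Llama-3.2-1B": 1.0e9,
    "DeepSeek-R1-Distill-Qwen-1.5B": 1.5e9,
    "SmolLM3-3B": 3.0e9,
}

def weight_gb(params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in gigabytes."""
    return params * bytes_per_param / 1e9

for name, params in MODELS.items():
    fp16 = weight_gb(params, 2.0)  # 16-bit weights
    q4 = weight_gb(params, 0.5)    # 4-bit quantized weights
    print(f"{name}: ~{fp16:.1f} GB fp16, ~{q4:.1f} GB q4")
```

Even the largest model in the list, at 3B parameters, is roughly 1.5 GB of weights once quantized to 4 bits, which is why the stacks named above can serve these models on an ordinary laptop or phone class device.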

The API price still matters, but it belongs in the background here, because once performance is close enough the practical question becomes where inference lives. If a model running on controlled hardware can handle the extraction job, the routing job, the code assistance job, or the compliance check, then paying a frontier provider for every request starts to look like using a terminal because DEC insists the minicomputer is better. It probably is better, but for most of the work that no longer settles the question, because the models cost less, they are good enough for the work, and the weights are available. Control, in this context, is architectural more than contractual: a workload that used to require a remote model can move beside the data, the application, or the user, and the product does not wake up broken because a vendor changed policy, pricing, latency, or model behavior overnight.
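The perpetual meter versus owned hardware trade can be made concrete with a toy break-even calculation. Every number below is a hypothetical placeholder, not a quoted price; the point is the shape of the comparison, not the specific figures:

```python
# Toy break-even: metered API inference vs. an owned local box.
# All three constants are hypothetical placeholders for illustration.

API_PRICE_PER_M_TOKENS = 1.00    # dollars per million tokens (hypothetical)
HARDWARE_COST = 5_000.00         # one-time cost of a local inference box (hypothetical)
LOCAL_COST_PER_M_TOKENS = 0.05   # power and amortized ops per million tokens (hypothetical)

def api_cost(million_tokens: float) -> float:
    """Cumulative cost of paying the remote meter."""
    return million_tokens * API_PRICE_PER_M_TOKENS

def local_cost(million_tokens: float) -> float:
    """Cumulative cost of buying the box and running it."""
    return HARDWARE_COST + million_tokens * LOCAL_COST_PER_M_TOKENS

def breakeven_million_tokens() -> float:
    """Volume at which owning the box starts beating the meter."""
    return HARDWARE_COST / (API_PRICE_PER_M_TOKENS - LOCAL_COST_PER_M_TOKENS)

volume = breakeven_million_tokens()
print(f"Break-even at ~{volume:,.0f}M tokens "
      f"(api ${api_cost(volume):,.0f} vs local ${local_cost(volume):,.0f})")
```

Under these invented numbers the crossover sits in the low billions of tokens, a volume a single high-traffic internal workload can reach quickly; past that point every additional request widens the gap in favor of the local box.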

The startup ecosystem has already noticed. On OpenRouter, Chinese models now hold four of the top ten positions by usage, where a year earlier the chart was dominated by American providers,⁹ᶜ and Alibaba’s Qwen family crossed 700 million downloads on Hugging Face in January 2026, with more than 170,000 derivative models built on top of it.⁹ᵇ There are real limits: Chinese models have political constraints baked into their training, DeepSeek declines to engage with sensitive topics, and researchers have documented systematic framing differences on geopolitical questions. Government applications, regulated industries, and any workload where model behavior on sensitive topics needs to be auditable should not run on Chinese origin models. When Nathan Benaich of Air Street Capital told The Wire China that for government and high stakes enterprise applications, “security is paramount,” he was naming a genuine constraint rather than a marketing objection.⁹ᶜ

For the vast majority of enterprise AI deployment, though, which is routine, high volume, repetitive work, those constraints do not decide the architecture.

The old timesharing story is useful, provided we keep it narrow. In 1980, eight major timesharing companies ranked among the Datamation 100, the annual list of the largest American data processing firms. Timesharing was the cloud AI of its era: you connected to a powerful remote computer, ran your workloads, and paid for the access. These were serious businesses; Tymshare went public in 1970 and had nearly 3,000 employees by 1978.⁷ Yet by 1991, seven of those eight companies had disappeared entirely: not declined, not retreated to a niche, but vanished.³ ⁶

The personal computer did not make remote computation cheaper. It made remote computation optional for ordinary work. If the PC had only lowered the cost of timesharing, the old architecture would have remained the center of the market. Instead the compute moved onto the desk, and users no longer needed permission, scheduling, metering, or a live connection to someone else’s machine just to get work done.

Open source AI is doing the same thing to cloud inference. A hosted open model hurts OpenAI, Anthropic, and Google on price. A local open model hurts them in a more basic way, because it removes the API from the workflow. For most business applications, where the work is classification, extraction, summarization, code assistance, routing, transformation, search, drafting, and compliance checking rather than frontier scientific reasoning, the performance gap has narrowed enough that the question is no longer whether the cloud model is better in the abstract. It is whether it is better enough to justify latency, data movement, policy exposure, vendor dependence, and a perpetual meter.

Some work really does need scale, and it will stay there. Mainframes did not disappear after the PC revolution; they retreated to the domains where their architecture was genuinely irreplaceable: banking transaction processing, airline reservation systems, large scale batch workloads. Cloud AI is likely to follow the same path. The hardest scientific reasoning, complex multi step agentic workflows, and cutting edge multimodal generation will sustain a market for the best proprietary systems, but that market is smaller than the one currently implied by the revenue expectations of the cloud model companies.

The cloud still has a job. It just stops being the default place where every ordinary request must go. Routine inference moves to the edge, where it is cheaper, faster, private, and easier to control, while the cloud remains for the hard cases. The API becomes infrastructure, important where scale is actually needed, invisible where local compute is already enough.

References

¹ OpenRouter & Andreessen Horowitz, State of AI: An Empirical 100 Trillion Token Study, December 2025. openrouter.ai/state-of-ai, source for Chinese OSS token share growth (1.2% → ~30%) and a16z/Casado 80% open source adoption figure; also officechai.com clarification of Casado quote; ciw.news “US Startups Adopt Chinese AI Models” (Nov 2025).

² James W. Cortada, “How the IBM PC Won, Then Lost, the Personal Computer Market,” IEEE Spectrum, 21 July 2021. spectrum.ieee.org/how-the-ibm-pc-won-then-lost-the-personal-computer-market, source for IBM PC launch, sales data, open architecture, and the clone market.

³ W. Edward Steinmueller, “The U.S. Software Industry: An Analysis and Interpretive History,” in David C. Mowery (ed.), The International Computer Software Industry, Oxford University Press, 1995, source for mainframe, mini, and PC shipment data, 23M installed base by 1984, and timesharing Datamation 100 data.

⁴ Vellum, “Best LLM for Coding,” updated March 2026. vellum.ai/best-llm-for-coding, MATH 500 scores, LiveCodeBench parity between GPT oss 20B and 120B (both 69%), DeepSeek R1 97.3%.

⁴ᵃ Hugging Face model cards for Qwen/Qwen3 0.6B, DeepSeek AI/DeepSeek R1 Distill Qwen 1.5B, HuggingFaceTB/SmolLM3 3B, and Meta Llama/Llama 3.2 1B Instruct, huggingface.co/Qwen/Qwen3-0.6B, huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, huggingface.co/HuggingFaceTB/SmolLM3-3B, huggingface.co/meta-llama/Llama-3.2-1B-Instruct, sources for direct loading with Transformers, local serving with vLLM or SGLang, Docker Model Runner support, local app or quantization paths, and SmolLM3 local inference paths including llama.cpp, ONNX, MLX, MLC, and ExecuTorch.

⁵ Quote Investigator, “There is No Reason for Any Individual To Have a Computer in Their Home,” September 14, 2017. quoteinvestigator.com/2017/09/14/home-computer/, source for Olsen 1977 World Future Society quote, Gordon Bell 1980 terminal quote, Olsen 1983 BusinessWeek quote, and DEC internal prototypes.

⁶ Martin Campbell-Kelly and Daniel D. Garcia-Swartz, “Economic Perspectives on the History of the Computer Timesharing Industry,” SSRN, January 2006. papers.ssrn.com/sol3/papers.cfm?abstract_id=880740

⁷ Computer History Museum, “Timesharing as a Business,” CHM Revolution Exhibition. computerhistory.org/revolution/mainframe-computers/7/181, Tymshare 3,000 employees by 1978, subsequent disappearance.

⁸ᵇ MorphLLM, “Best AI for Coding (2026).” morphllm.com/best-ai-model-for-coding, SWE bench Verified and Pro scores, 0.8pp clustering.

⁸ᶜ MorphLLM, “DeepSeek V4 (2026): Specs, Benchmarks, API Pricing.” morphllm.com/deepseek-v4, Codeforces Elo 3,206, LiveCodeBench 93.5%, CSA+HCA architecture, 27% FLOPs reduction.

⁸ᵈ pricepertoken.com GPQA leaderboard, updated May 11, 2026. pricepertoken.com/leaderboards/benchmark/gpqa

⁹ᵃ ciw.news, “US Startups Adopt Chinese AI Models for Performance and Cost Gains,” November 2025. ciw.news/p/silicon-valley-chinese-ai-adoption

⁹ᵇ Andreessen Horowitz, “Asserting American Leadership in Open Source AI,” April 2026. a16z.com/asserting-american-leadership-in-open-source-ai/, 700M Qwen downloads, 170K derivative models, $25B savings calculation.

⁹ᶜ The Wire China, “Cheap and Open Source, Chinese AI Models Are Taking Off,” November 2025. thewirechina.com/2025/11/09/cheap-and-open-source-chinese-ai-models-are-taking-off/, Chesky/Airbnb, Palihapitiya/Kimi, Nathan Lambert quote, Benaich/Air Street Capital quote, OpenRouter chart data.

⁹ᵈ Silicon UK, “Airbnb Praises Alibaba’s Open-Source AI Model,” October 23, 2025. silicon.co.uk/e-innovation/artificial-intelligence/airbnb-alibaba-ai-627141, source for Chesky’s Bloomberg reported “relying a lot” and “It’s very good. It’s also fast and cheap” Qwen quotes.

⁹ᵉ Chatham House, “Low-cost Chinese AI models forge ahead, even in the US, raising the risks of a US AI bubble,” November 2025. chathamhouse.org/2025/11/low-cost-chinese-ai-models-forge-ahead-even-us-raising-risks-us-ai-bubble, source for Palihapitiya’s Kimi K2 quote.