There is an old proverb: “Necessity is the mother of invention.”
The concept is ancient. One of the earliest recorded instances is in one of Aesop’s Fables, “The Crow and the Pitcher,” from the mid-6th century BCE, where a thirsty crow famously drops pebbles into a pitcher to raise the water level. The Greek philosopher Plato later crystallized the idea in his Republic, stating, “our need will be the real creator.”
It’s a timeless lesson in resourcefulness, a recognition that acute scarcity can force ingenious solutions. And in the high-stakes world of artificial intelligence, this ancient wisdom is playing out in real time. Beginning in late 2022, a series of escalating U.S. sanctions cut Chinese firms off from the lifeblood of modern AI: the cutting-edge graphics processing units (GPUs) made by Nvidia. The goal was simple: to slow China’s AI ambitions.
The result, however, has been anything but.
Cut off from bleeding-edge hardware, China’s AI labs didn’t stall; they were forced to invent. In a compute-constrained world, they have pioneered a new kind of “AI minimalism” – a revolution in algorithmic efficiency that is now producing models that are leaner, faster, and dramatically cheaper to run, yet powerful enough to rival the West’s best.
The timeline is telling. The first hammer blow fell on October 7, 2022, when the U.S. Commerce Department banned exports of top-tier Nvidia chips like the A100 and H100. The screws were tightened in October 2023 and again in December 2024, expanding the list of restricted chips.
And what happened in China? A flurry of innovation. Just as the sanctions bit, Chinese tech giants and nimble startups began releasing a cascade of new models explicitly designed for a world without unlimited hardware. Alibaba’s Qwen family, including the 72-billion-parameter Qwen-2.5-Max, began topping leaderboards in mid-2024. The startup DeepSeek released open models like DeepSeek-V3, aggressively optimized for compute efficiency. Baidu rolled out its ERNIE 4.5 series, tuned for high efficiency.
The pattern was undeniable: each new wave of sanctions was met by a new wave of compute-frugal innovation.
Perhaps no single project better exemplifies this “do more with less” philosophy than DeepSeek-OCR. Instead of feeding a model mountains of raw text, DeepSeek’s engineers inverted the paradigm: they taught the model to read documents as images. This simple-sounding trick is a masterclass in compression.
Here’s how it works: A document page that might contain 700 to 800 text tokens is compressed by a vision encoder into just 100 “vision tokens.” This compressed summary is then fed to a language decoder. For such a page, that is a 7-to-8-fold reduction in the amount of data the expensive part of the model has to process. The kicker? It reportedly maintains roughly 97% accuracy at this compression rate.
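The arithmetic behind that compression claim can be checked in a few lines. This is a minimal sketch that uses only the page figures quoted above; the function name `compression_ratio` is illustrative and is not part of DeepSeek-OCR.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token stands in for."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


# Figures quoted in the article: 700 to 800 text tokens, 100 vision tokens.
for text_tokens in (700, 800):
    ratio = compression_ratio(text_tokens, 100)
    print(f"{text_tokens} text tokens -> 100 vision tokens: {ratio:.1f}x fewer decoder inputs")
```

The ratio matters because the decoder's work grows with the number of tokens it must attend to, so fewer input tokens translate directly into less compute per page.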
The architecture itself is a lesson in thrift. The system uses a relatively small 380-million-parameter encoder paired with a 3-billion-parameter decoder. But even that decoder is “sparse” – it uses a Mixture-of-Experts (MoE) design, so only about 570 million parameters are activated for any given token.
The real-world payoff is absurd. On a single last-generation Nvidia A100-40G card, the very kind Chinese firms must now use sparingly, the DeepSeek-OCR model can process a staggering 200,000 pages per day. It’s an efficiency that directly addresses the hardware drought.
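To put the daily figure in more familiar units, here is a rough back-of-the-envelope sketch. It assumes the card runs around the clock and that every page costs the 100 vision tokens used in the example page; neither assumption is confirmed by the source, and the helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pages_per_second(pages_per_day: int) -> float:
    """Average sustained rate, assuming 24-hour utilization (an assumption)."""
    return pages_per_day / SECONDS_PER_DAY


def daily_tokens(pages_per_day: int, tokens_per_page: int) -> int:
    """Total tokens handled per day at a fixed per-page token budget."""
    return pages_per_day * tokens_per_page


rate = pages_per_second(200_000)
print(f"{rate:.2f} pages/s, or {1 / rate:.2f} s per page")
print(f"{daily_tokens(200_000, 100):,} vision tokens per day")
```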
This “lean” approach isn’t an isolated trick. It’s a full-blown strategic shift, a toolkit of scarcity-driven innovations now visible across China’s AI landscape.
First is the aggressive use of Mixture-of-Experts (MoE). DeepSeek-V3, a massive 671-billion-parameter model, activates only 37 billion parameters (about 5.5%) per token. Baidu’s ERNIE-4.5 “Thinking” model is smaller still in absolute terms, activating just 3 billion of its 21 billion parameters (about 14%) for complex reasoning tasks. This sparsity slashes the compute needed for every token.
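The idea behind MoE sparsity can be illustrated with a generic top-k gate. The sketch below is not DeepSeek's or Baidu's actual router (production routers add details such as shared experts and load balancing); the experts are toy scalar functions, and every name in it is hypothetical. It also recomputes the active-parameter shares from the figures quoted above.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Run only the k selected experts; the others cost nothing for this token."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


def active_fraction(active_params_b, total_params_b):
    """Share of parameters touched per token (both values in billions)."""
    return active_params_b / total_params_b


# Four toy "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 1.5]  # made-up router scores for one token
print(route_top_k(logits, k=2))
print(moe_forward(1.0, experts, logits, k=2))
print(f"DeepSeek-V3: {active_fraction(37, 671):.1%} active")
print(f"ERNIE-4.5 21B: {active_fraction(3, 21):.1%} active")
```

Because only the selected experts run, per-token compute tracks the active parameter count rather than the total, which is why a 671-billion-parameter model can be served at roughly the cost of a 37-billion-parameter dense one.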
Second is a focus on inference and scheduling – the very logistics of running AI. Alibaba’s “Aegaeon” system, unveiled in late 2025, is a pure “scarcity hack.” The research team found a massive inefficiency: at one point, 18% of their GPUs were sitting mostly idle, serving just 1.3% of requests from rarely used models. Aegaeon solves this by “token-slicing.” It allows a single GPU to juggle multiple models, pausing one mid-thought to process a token for another.
The results were transformative. In beta tests, Aegaeon slashed the number of H20 GPUs needed to serve dozens of large models from 1,192 down to just 213 – an 82% reduction in hardware. It effectively stretches a limited supply of chips to cover a vast workload.
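A toy simulation makes the token-level interleaving concrete. This is a simplified sketch, not Aegaeon's scheduler: it uses plain round-robin and ignores the cost of swapping model weights and caches, which a real system must manage. The `Request` class and function names are hypothetical. The last helper simply rechecks the 82% figure quoted above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    model: str        # which model this request targets
    tokens_left: int  # tokens still to be generated


def run_token_level(requests):
    """Serve requests for several models on one simulated GPU, one token at a time.

    Returns the order in which models were given the GPU.
    """
    queue = deque(r for r in requests if r.tokens_left > 0)
    schedule = []
    while queue:
        req = queue.popleft()
        schedule.append(req.model)  # "decode" one token for this model
        req.tokens_left -= 1
        if req.tokens_left > 0:
            queue.append(req)       # pause it and let the next model run
    return schedule


def gpu_reduction(before: int, after: int) -> float:
    """Fractional reduction in GPU count."""
    return (before - after) / before


order = run_token_level([Request("A", 2), Request("B", 1), Request("C", 2)])
print(order)
print(f"{gpu_reduction(1192, 213):.1%} fewer GPUs")
```

The point of slicing at token granularity is that a rarely used model no longer needs a dedicated GPU sitting idle between requests; it only occupies the device for the tokens it actually generates.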
Third is a concerted effort in hardware adaptation. Chinese models are increasingly optimized to run on domestic accelerators. DeepSeek-V3.1, for instance, adopted an 8-bit floating-point (FP8) format intended to run faster and use less memory on China’s homegrown chips, part of a software stack built to offset the U.S. hardware curbs.
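To show what dropping to 8 bits costs and saves, the sketch below simulates rounding weights to FP8 in pure Python. It assumes the common E4M3 layout (1 sign bit, 4 exponent bits, 3 mantissa bits, largest finite value 448) with a single per-tensor scale; the source does not say which FP8 variant or scaling scheme DeepSeek-V3.1 uses, so treat this as an illustration of the general technique only. All names are hypothetical.

```python
import math

E4M3_MAX = 448.0     # largest finite E4M3 value
E4M3_MIN_EXP = -6    # exponent of the smallest normal E4M3 value
MANTISSA_BITS = 3


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in E4M3, saturating at the max."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) == e - 1.
    exp = max(math.frexp(mag)[1] - 1, E4M3_MIN_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing between neighbours at this scale
    return sign * min(round(mag / step) * step, E4M3_MAX)


def fake_quantize(weights):
    """Per-tensor scaling: map the largest magnitude to E4M3_MAX, round, map back."""
    peak = max(abs(w) for w in weights)
    if peak == 0.0:
        return list(weights)
    scale = peak / E4M3_MAX
    return [quantize_e4m3(w / scale) * scale for w in weights]


weights = [0.3, -1.7, 0.004, 2.5]  # made-up example values
restored = fake_quantize(weights)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(restored, f"worst absolute error {worst:.4f}")
print("bytes per weight: FP8 = 1, BF16 = 2, FP32 = 4")
```

Each weight occupies one byte instead of two or four, so memory footprint and memory traffic fall accordingly, at the price of the small rounding error the script prints.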
So, how does this forced revolution stack up against the West?
Western labs, of course, also pursue efficiency. Meta’s Llama and France’s Mistral are prized for their performance on moderate hardware. But the pace and explicit motivation in China are different. While Western firms optimize for market advantage, Chinese firms are optimizing for survival.
And they are succeeding. DeepSeek-R1, built on these lean principles, is an estimated 5 times faster and 30 times cheaper per token than a comparable OpenAI GPT-4o variant. Alibaba’s Qwen-2.5-Max is reportedly nipping at the heels of GPT-4o in performance benchmarks. Baidu’s 21B-parameter ERNIE model achieves top-tier reasoning with a tiny 3-billion-parameter active footprint.
There is no ambiguity about the motive. Chinese academic papers, news reports, and executive interviews are frank. They openly celebrate “algorithm and engineering system-level innovations” as a “new path for general AI under resource-constrained conditions.” iFlytek’s COO boasts of building LLM infrastructure “with homegrown hardware” precisely because U.S. chips are unavailable.
The U.S. export controls were intended to be a wall. Instead, they became a crucible. By aiming to deny China the tools of AI, the sanctions unwittingly forced a mastery of the craft.
As Aesop and Plato understood millennia ago, necessity is the mother of invention. Far from suppressing progress, the chip bans have sparked a Darwinian selection for efficiency, forging a more resilient, more resourceful, and perhaps ultimately more formidable competitor.
Report Disclaimer
This report is provided for informational purposes only and does not constitute financial, legal, or investment advice. The views expressed are those of Bretalon Ltd and are based on information believed to be reliable at the time of publication. Past performance is not indicative of future results. Recipients should conduct their own due diligence before making any decisions based on this material. For full terms, see our Report Disclaimer.