How The Chinese Beat Trump And OpenAI
The hype around Artificial Intelligence, the now failed U.S. attempt to monopolize it, and the recent counter from China are a lesson in how to innovate. They also show that the U.S. is losing the capability to do so.
In mid-2023, when the Artificial Intelligence hype first gained headlines, I wrote:
'Artificial Intelligence' Is (Mostly) Glorified Pattern Recognition
Currently there is some hype about a family of large language models like ChatGPT. The program reads natural language input and processes it into some related natural language content output. That is not new. ELIZA, an early natural language processing program, was developed by Joseph Weizenbaum at MIT in the mid-1960s; the Artificial Linguistic Internet Computer Entity (ALICE) followed in the 1990s. I had funny chats with ELIZA in the 1980s on a mainframe terminal. ChatGPT is a bit niftier and its iterative results, i.e. the 'conversations' it creates, may well astonish some people. But the hype around it is unwarranted. … Currently the factual correctness of the output of the best large language models is an estimated 80%. They process symbols and patterns but have no understanding of what those symbols or patterns represent. They cannot solve mathematical and logical problems, not even very basic ones.
There are niche applications, like translating written languages, where AI, i.e. pattern recognition, delivers amazing results. But one still cannot trust it to get every word right. The models can be assistants, but one will always have to double-check their results.
Overall the correctness of current AI models is still way too low to let them decide any real-world situation. More data or more computing power will not change that. To overcome their limitations one will need some fundamentally new ideas.
But the hype continued. One big AI model, ChatGPT, was provided by a non-profit organization, OpenAI. Its CEO, Sam Altman, however, soon smelled the large amounts of money he could potentially make. A year after defending the non-profit structure of OpenAI, Altman effectively raided the board and took the organization private:
ChatGPT-maker OpenAI is working on a plan to restructure its core business into a for-profit benefit corporation that will no longer be controlled by its non-profit board, people familiar with the matter told Reuters, in a move that will make the company more attractive to investors. … Chief executive Sam Altman will also receive equity for the first time in the for-profit company, which could be worth $150 billion after the restructuring as it also tries to remove the cap on returns for investors, sources added.
The ChatGPT large language model OpenAI provided was closed source: a black box, running in the cloud, that one could pay to chat with or use for translation, content generation, or analyzing certain problems.
Training and maintaining ChatGPT took large amounts of computing power and money. It was somewhat expensive, but there was no new technology in it. The algorithms it used were well known, and the training data needed to 'program' it was freely available internet content.
For all the hype about AI, it is not a secret or even a new technology. The barriers to entry for any competitor are low.
That is the reason why Yves at Naked Capitalism, pointing to Edward Zitron, asked: “How Does OpenAI Survive?” It doesn't, or it has little chance of doing so. Discussions in the U.S. never acknowledged those facts.
Politicians thought of AI as the next big thing, one that would further U.S. control of the world. They attempted to prevent any potential competition to the lead the U.S. thought it had in that field. Nvidia, the last leading U.S. chip maker, lost billions when it was prohibited from selling its latest AI-specialized chips to China.
Two days ago, Trump announced Stargate, a $500 billion AI infrastructure investment in the U.S.:
Three top tech firms on Tuesday announced that they will create a new company, called Stargate, to grow artificial intelligence infrastructure in the United States.
OpenAI CEO Sam Altman, SoftBank CEO Masayoshi Son and Oracle Chairman Larry Ellison appeared at the White House Tuesday afternoon alongside President Donald Trump to announce the company, which Trump called the “largest AI infrastructure project in history.”
The companies will invest $100 billion in the project to start, with plans to pour up to $500 billion into Stargate in the coming years. The project is expected to create 100,000 US jobs, Trump said.
Stargate will build “the physical and virtual infrastructure to power the next generation of AI,” including data centers around the country, Trump said. Ellison said the group’s first, 1-million-square-foot data center project is already under construction in Texas.
On the very same day, but with much less noise, a Chinese company published another AI model:
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors.
The new DeepSeek models post better benchmarks than any other available model. They use a different combination of techniques, less training data, and much less computing power to achieve that. They are cheap to use and, in contrast to OpenAI's models, genuinely open source.
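The core of the recipe, as described in DeepSeek's R1 report, is reinforcement learning driven by simple rule-based rewards and group-relative advantage estimates (GRPO) instead of a separate learned reward model. As a rough illustration only, here is a minimal Python sketch of that reward-and-advantage step; the reward terms, format check, and data are invented for the example, not DeepSeek's actual code:

```python
# Hedged sketch: GRPO-style group-relative advantages with rule-based rewards,
# in the spirit of DeepSeek's R1 report. Names and data are illustrative.
import re
import statistics

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Reward = answer correctness + adherence to the <think> format."""
    fmt_ok = bool(re.search(r"<think>.*</think>", completion, re.DOTALL))
    answer = completion.rsplit("</think>", 1)[-1].strip()
    correct = answer == reference_answer        # e.g. exact-match math answer
    return (1.0 if correct else 0.0) + (0.2 if fmt_ok else 0.0)

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO: normalize each reward within its group of completions for the
    same prompt -- no value network needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0     # avoid division by zero
    return [(r - mean) / std for r in rewards]

# For one prompt, sample a group of completions and score them:
completions = [
    "<think>2 + 2, carry nothing</think>4",
    "<think>hmm</think>5",
    "4",                                        # right answer, wrong format
]
rewards = [rule_based_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
# The advantages then weight a clipped policy-gradient update of the model.
```

Skipping the learned reward and value networks is part of what keeps such training cheap: the correctness of a math answer can be checked by a few lines of code instead of by another giant model.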
Writes Forbes:
U.S. export controls on advanced semiconductors were intended to slow China's AI progress, but they may have inadvertently spurred innovation. Unable to rely solely on the latest hardware, companies like Hangzhou-based DeepSeek have been forced to find creative solutions to do more with less. … This month, DeepSeek released its R1 model, using advanced techniques such as pure reinforcement learning to create a model that's not only among the most formidable in the world, but is fully open source, making it available for anyone in the world to examine, modify, and build upon. … DeepSeek-R1’s performance is comparable to OpenAI's top reasoning models across a range of tasks, including mathematics, coding, and complex reasoning. For example, on the AIME 2024 mathematics benchmark, DeepSeek-R1 scored 79.8% compared to OpenAI-o1’s 79.2%. On the MATH-500 benchmark, DeepSeek-R1 achieved 97.3% versus o1’s 96.4%. In coding tasks, DeepSeek-R1 reached the 96.3rd percentile on Codeforces, while o1 reached the 96.6th percentile – although it’s important to note that benchmark results can be imperfect and should not be overinterpreted.
But what’s most remarkable is that DeepSeek was able to achieve this largely through innovation rather than relying on the latest computer chips.
Nature is likewise impressed:
A Chinese-built large language model called DeepSeek-R1 is thrilling scientists as an affordable and open rival to ‘reasoning’ models such as OpenAI’s o1. … “This is wild and totally unexpected,” Elvis Saravia, an AI researcher and co-founder of the UK-based AI consulting firm DAIR.AI, wrote on X.
R1 stands out for another reason. DeepSeek, the start-up in Hangzhou that built the model, has released it as ‘open-weight’, meaning that researchers can study and build on the algorithm. Published under an MIT licence, the model can be freely reused but is not considered fully open source, because its training data has not been made available.
“The openness of DeepSeek is quite remarkable,” says Mario Krenn, leader of the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany. By comparison, o1 and other models built by OpenAI in San Francisco, California, including its latest effort, o3, are “essentially black boxes”, he says.
Even long-term Internet investors, who have seen it all, are impressed:
Marc Andreessen 🇺🇸 @pmarca – 9:19 UTC · Jan 24, 2025
Deepseek R1 is one of the most amazing and impressive breakthroughs I’ve ever seen — and as open source, a profound gift to the world. 🤖🫡
Nature adds:
DeepSeek hasn’t released the full cost of training R1, but it is charging people using its interface around one-thirtieth of what o1 costs to run. The firm has also created mini ‘distilled’ versions of R1 to allow researchers with limited computing power to play with the model.
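The distilled versions were reportedly made by fine-tuning smaller open models (Qwen, Llama) on reasoning traces generated by R1 itself. A toy sketch of that recipe follows; the checkpoint paths are placeholders, and it assumes teacher and student share a tokenizer, which is a simplification:

```python
# Toy sketch: distillation as supervised fine-tuning on teacher outputs.
# Checkpoint paths are placeholders; assumes teacher and student share a
# tokenizer, which simplifies the real recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher = AutoModelForCausalLM.from_pretrained("path/to/big-reasoning-model")
student = AutoModelForCausalLM.from_pretrained("path/to/small-base-model")
tok = AutoTokenizer.from_pretrained("path/to/small-base-model")

opt = torch.optim.AdamW(student.parameters(), lr=1e-5)
prompts = ["Prove that the sum of two even numbers is even."]  # stand-in corpus

for prompt in prompts:
    # 1. The teacher generates a full reasoning trace for the prompt.
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        trace = teacher.generate(ids, max_new_tokens=256)
    # 2. The student is trained, with plain next-token loss, to reproduce it.
    out = student(trace, labels=trace.clone())  # HF models return a loss
                                                # when labels are supplied
    out.loss.backward()
    opt.step()
    opt.zero_grad()
```

The student never sees the teacher's weights, only its outputs, which is why a 1.5B model can inherit some of a 600B model's reasoning style.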
Running those distilled models on minimal hardware does in fact work!
Brian Roemmele @BrianRoemmele – 14:34 UTC · Jan 23, 2025
Folks, I think we have done it! If overnight tests are confirmed we have OPEN SOURCE DeepSeek R1 running at 200 tokens per second on a NON-INTERNET connected Raspberry Pi. A full frontier AI better than “OpenAI” owned fully by you in your pocket free to use! I will make the Pi image available as soon as all tests are complete. You just pop it into a Raspberry Pi and you have AI! This is just the start of the power that takes place when you TRULY Open Source an AI Model.
The latest Raspberry Pi hardware starts at $50. The software is free.
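Trying one of the distilled checkpoints locally is straightforward; a minimal sketch using Hugging Face transformers (the model ID is DeepSeek's published 1.5B distilled checkpoint; the prompt and generation settings are illustrative):

```python
# Sketch: chat with a small distilled R1 variant on local hardware.
# The model emits a <think>...</think> reasoning trace before its answer.
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    device_map="auto",          # CPU works too, just slower
)
out = generate(
    "How many prime numbers are there below 30?",
    max_new_tokens=512,         # leave room for the reasoning trace
)
print(out[0]["generated_text"])
```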
This is a death knell for OpenAI:
Arnaud Bertrand @RnaudBertrand – 14:23 UTC · Jan 21, 2025
Most people probably don't realize how bad the news of China's DeepSeek is for OpenAI.
They've come up with a model that matches and even exceeds OpenAI's latest model o1 on various benchmarks, and they're charging just 3% of the price.
It's essentially as if someone had released a mobile phone on par with the iPhone but was selling it for $30 instead of $1,000. It's that dramatic.
What's more, they're releasing it open-source so you even have the option – which OpenAI doesn't offer – of not using their API at all and running the model for "free" yourself. …
The backstory of DeepSeek is also amazing.
In 2007, three Chinese engineers set out to build a quant (financial speculation) fund using AI. They hired hungry people fresh from the universities. Their High-Flyer fund was somewhat successful, but in recent years the Chinese government started to crack down on financial engineering, quant trading, and speculation.
With time on their hands and unused computing power in their back room, the engineers started to build the DeepSeek models. The costs were minimal. While OpenAI, Meta, and Google spent billions to build their AIs, the training costs for the published DeepSeek models were a mere $5 to $6 million.
Henry Shi @henrythe9ths – 23:20 UTC · Jan 20, 2025
7. The lesson?
Sometimes having less means innovating more. DeepSeek proves you don't need:
– Billions in funding
– Hundreds of PhDs
– A famous pedigree
Just brilliant young minds, the courage to think differently and the grit to never give up
Another lesson is that brilliant young minds should not be wasted on optimizing financial speculation but on making stuff one can use.
DeepSeek demonstrates that it is impossible to use trade and technology barriers to keep technology away from competitors. With decent resources, they can simply innovate around those barriers.
Even billions of dollars, loud marketers like Trump, and self-promoting grifters like Sam Altman cannot successfully compete with a deep bench of well-trained engineers.
As an author at Guancha remarks (machine translation):
In the Sino-U.S. science and technology war, China's unique advantage comes precisely from the U.S. ban. It can be said that our strong will to survive was forced out by Washington, and making the most of our limited resources is the secret to breaking through. Stories like this are not new in history: the weak prevailing over the strong, the small standing up to the big.
The U.S. side will fall into a Vietnam-style dilemma: relying too much on its own absolute advantage, thus wasting a lot of resources and losing itself to internal attrition.
How long will it take the U.S. to (re-)learn that lesson?
@Johan Kaspar | Jan 24 2025 16:58 utc, re: the interview with DeepSeek’s CEO Liang Wenfeng
I suggest that all readers with an interest in AI read the interview with DeepSeek’s CEO Liang Wenfeng, posted up-thread by Johan Kaspar, and linked above for your convenience.
Wenfeng knows a lot about Chinese companies and how their business strategies compare with those of U.S. technology firms. Hearing him discuss that topic is worth the read by itself.
More interesting is Wenfeng’s description of how his tech research company is organized and managed, its corporate culture, and its recruiting policy. This company isn’t just innovating in AI; it’s innovating in management.
=== and here, some follow-up for Johan:
DeepSeek’s parent, and funder, is High-Flyer, a financial quant firm currently valued at $8 billion. How much of that $8B was invested in DeepSeek? I can’t find an article on it; b says in his piece that DeepSeek was funded with $5-6 million. Can you confirm or deny that amount?
Another interesting point: in the interview, Wenfeng states that he and his team have been intensively researching AI for 16 years. That doesn’t seem like a scenario wherein massive investment by the parent company into DeepSeek would have been possible. If they got “lavishly” funded (Johan’s word), it was likely very recently.
Another very interesting point made in the article was the decision to open-source the DeepSeek code. Why? Reason 1: because it brings great social acclaim to DeepSeek. It makes highly qualified people – from China, not elsewhere – want to come work at DeepSeek. This may well be a cultural trait of China; there, you gain a lot of social standing by making contributions to the public good.
Here’s a quote from Wenfeng:
Open source, publishing papers, in fact, do not cost us anything. For technical talent, having others follow your innovation gives a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one, and contributing to it earns us respect. There is also a cultural attraction for a company to do this.
Reason 2 to open-source: to build a strong AI ecosystem in China. When asked “are you going to return to closed source later, like OpenAI did?”, he replied:
We will not change to closed source. We believe having a strong technical ecosystem first is more important.
I infer from the immediate Q&A context that the main beneficiary of the open-source decision was the broader Chinese innovator population, not DeepSeek’s own operations. Wenfeng spoke insightfully, and with great commitment, about moving China beyond commercializing applications based on other countries’ research, toward doing more basic research in China itself.
Wenfeng also pointed out, in the context of recruiting, that the people DeepSeek recruits are curious people who want to tackle the Big Problems. Money is much less of a motivator.
For the bar: that article is highly worthwhile. I bet it will challenge many of your perceptions about the Chinese, about AI, and about corporate competitive strategy. Thank you again, Johan.
Posted by: Tom Pfotzer | Jan 24 2025 22:20 utc | 103
Ah, two or three commentators here get it; the rest are just ignorant.
Because the level here is so low, let me explain the basics, for large language models that is. It just so happens that another AI type, Stable Diffusion, hit critical mass in parallel, and there is some overlap, but let’s keep that aside.

What changed is the critical mass: hardware plus some tricks. The transformer model, which saves calculation by maybe a factor of 1,000 (my guess), faster parallel hardware, more memory, and the internet making the training data accessible. And voilà, an old idea from the 1960s, backpropagation, turns out to be golden, like electricity. It just works. That’s why all these models arriving at the same time are so similar in what they can do.

It is like Westinghouse or ABB or what have you versus Soviet electricity: maybe a different machine, but it is just nature, all the same.
So how does it work? We don’t know why it works, but we know how to achieve it. You have, say, 80 layers of simulated neurons, and each layer is connected to the layer before and after it, so a lot of connections. Those connections are what they call the number of parameters: say 70 billion for Meta’s open-source Llama model, or roughly 600 billion for DeepSeek V3, and so on. In the beginning these parameters are more or less random values.

Now comes the other part, the training data: around 14 trillion tokens (roughly half-words) at present, basically the whole internet, all books, and so on. They get piped through those 80 layers, one after another, and each time during training the model checks whether its output matches the token it knows will be piped in next. If yes, good. If not, the backpropagation algorithm adjusts all those parameters toward the right output. You do this 14 trillion times, and start again.

What happens is not that the model memorizes those 14 trillion tokens/words; it learns some structure, some model of the world. Afterwards it can perfectly translate texts it was never fed before. Basically, 14 trillion tokens of data got compressed into 70 billion parameters, and it turns out the model learned a lot doing it. Real intelligence. That intelligence might be that of a four-year-old kid, but nevertheless.
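A toy version of that loop in PyTorch, for the curious; the sizes are invented and minuscule, nothing like the real models’ scale, but the shape (stacked layers, next-token targets, backpropagation) is the same:

```python
# Toy illustration of next-token-prediction training with backpropagation.
# Real models use ~80 transformer layers and trillions of tokens; this uses
# tiny made-up sizes so it runs anywhere.
import torch
import torch.nn as nn

VOCAB, DIM, LAYERS, SEQ = 1000, 64, 4, 32

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        # Stand-in for the stacked transformer layers; every layer's weights
        # count toward the model's "parameters".
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
            for _ in range(LAYERS)
        )
        self.head = nn.Linear(DIM, VOCAB)      # predicts the next token

    def forward(self, tokens):
        x = self.embed(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        for layer in self.layers:
            x = layer(x, src_mask=mask)        # only look at earlier tokens
        return self.head(x)

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):                            # real training: trillions of tokens
    batch = torch.randint(0, VOCAB, (8, SEQ + 1))  # fake corpus slice
    inputs, targets = batch[:, :-1], batch[:, 1:]  # predict token t+1 from tokens <= t
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, VOCAB), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()                            # backpropagation adjusts all parameters
    opt.step()
```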
By the way, those 70 billion parameters are the connections, like dendrites in the brain. Our brain has around 100 billion neurons, and those are also densely connected, so our brain is still much, much bigger. But then, nobody has read more than a few thousand books. And when we recognize a face we know, our neurons and connections are very slow; there can be no more than about 100 sequential steps before we get the result. The brain is slow, but it also works highly parallel.

With the compression of the data, aka learning, aka creating structure, of course come mistakes. If I met someone and didn’t write down the exact name and didn’t listen carefully, I am sure it was David, yet it was Daniel; we make these mistakes all the time. And so does the AI. If you don’t see how revolutionary that is, well, others do. If you want precision, relational databases, pi to a million digits: that’s old IT and long solved (and we humans are not so good at it).
If I want to converse about politics or history and I have the choice between a top commenter here and an AI, OK, the commenter. But give me the median commenter here and I’d talk to the AI all day instead. Just like with humans, you maybe listen and get an idea, but you don’t pick up everything. The more you know, the better you can filter. If you don’t know anything about an area, an AI can get you started very fast. Just ask for the best books on the area, for example.

And by the way, if you compare models, size matters. The 1.5 or 8 billion parameter models that fit on your phone are like a parrot. A specialized 34B model is already not bad, but the decent models, which make fewer and fewer mistakes with their knowledge, are 70 billion parameters and up. We don’t know how big ChatGPT is, by the way. Next to DeepSeek, honorary open-model mentions should also go to Qwen from Alibaba (Llama from Meta was already mentioned). They are all really good, useful, and in a similar ballpark.
Oh, and regarding the intro text: o1 and R1 are both brand new (R1 especially) and specialized in reasoning. Benchmarks say they are roughly equal? Well, you have to use them (which I haven’t). But if the OpenAI one is so terrible and unusable, what then makes the DeepSeek one so great? And vice versa: if o1 is garbage, then R1 should also be garbage; open source or lower cost alone can’t be a reason to praise an unusable model. And we also don’t really know whether DeepSeek was honest about how much hardware they really used. Oh, and on the closed side, Claude from Anthropic should also be mentioned, as many heavy users prefer it over OpenAI’s offerings.
Cheers.
Posted by: CSOstsgx60 | Jan 25 2025 0:21 utc | 120
Posted by: fnord | Jan 25 2025 1:53 utc | 138
Unfortunately Western governments are going to get the memo on this late.
<= China's new DeepSeek-R1 technology might have beaten Trump's announcement of an intention to invest, but the truth is the USG defeated the Americans it governs back in the 70s and 80s.
In 1969 I went to work in a town of about 200,000 people. There was a Nylon 66 plant (13,000 employees), an acrylic fiber plant (1,100 employees), a specialty chemical company (450 employees), a floor tile plastics plant (600 employees), a penta treating plant (60 employees), two independent chemical research labs (together about 350 employees), a creosote plant (84 employees), two independent medical testing labs (about 15 employees), two large paper mills (over 1,800 employees), a cyanide plant (40 employees), a rayon plant (180 employees), two shipbuilding operations (1,200 employees), a clothing mill (120 employees), and a carpet manufacturer (169 employees). There were two electrical supply houses, one microelectronics shop, several computer companies, and lots of smaller employers and professional engineering and consulting firms all around. All kinds of suppliers and specialty houses existed to serve these industries. All this was in the MSA.
There were two universities and one very good junior college in the area, mostly staffed with adjunct professors from the surrounding industries. A bachelor's degree in engineering, chemistry, or math was all one needed to command top dollar. I would put every one of those BS-degree people up against today's PhDs. Any scientist, mathematician, or technician could quit a job, walk across the street, and get hired within days. The whole area was technically inclined. At parties, in bars, and on vacations, these on-the-job experts, working in the leading-edge technology of the day, were conversing, imagining, planning, and hoping. Everyone was wondering what would next be done differently and betting on when it might happen. Everyone knew who in the area was close to making a discovery.

Now there is nothing, not even enough people with industrial or research knowledge to talk with. The university professors have no real experience; everything is according to this model or that model. But the real discoveries come from the questions that those who do the work ask of their professionally trained bosses. A mechanic, lab technician, or marketing person might ask: why don't you do (whatever it is) this way instead of the way it's being done now? That turns on the light, and in short order a project is funded, and sure enough, the bottom-up suggestion turns into a top-down multi-million-dollar cost saver, or it creates a whole new industry, or whatever.
One by one, the EPA shut down America's industries and forced them to relocate overseas. The USG forced the exodus of American industry. The talent left behind in those places, where once the world was being reinvented every few days, was enormous. No problem was too big to be solved by the people of those days, but the banks all said: we can no longer fund you; your stupid government has decided to move all new and existing technical, industrial, and research business overseas. Investors were told to invest overseas.
All of that talent, knowledge, and know-how was lost, not to mention the hard times it caused those highly trained people.
To produce good researchers it takes much more than a university producing PhDs. It takes production experience; it takes interactive socialization between scientists, engineers, industrial people, and marketing people all working in different fields; and it takes lots of it, with adequate venture and investment funds to encourage efforts and to fund productive research. A PhD is not needed; in fact it is often a major hindrance, because it creates a hierarchical society within organizations that strangles interaction, and it is particularly bothersome in industrial development.
What is needed is talented people committed to solving a problem or looking into how to do something better or cheaper. But even that is not enough: these people must have open access to a library that carries subscriptions to nearly every journal the world produces (in modern times the library is the internet, but most of its content is off limits except to a select few). Yet most of the good research papers needed to do decent research are today kept behind paywalls, with patent and copyright lawyers waiting to sue anyone who tries to share knowledge with someone at a rival firm, or to use research learned from someone who works at one.
Until copyright and patent laws are removed from our society, nothing is likely to get better. The problem is that 90% of the assets of listed companies are intangible assets. Wall Street used our government to break our industrial back. It's unlikely Trump or Musk will be able to fix that problem without total cooperation from Congress, the bankers, and the entrenched multinational monopolistic corporations. It was the oligarch-owned and -controlled USG that made America into a 3rd-world country. I doubt the USG has the power to overrule the domestic and international oligarchies that own the government and dictate what it must do.
Today none of America's great past is in operation. It is going to take at least 20 years to get it back in place, and a lot of egg on the face as our foreign competitors beat us to death with their new and better inventions.
China did not beat Trump; the oligarch-owned, Wall Street-controlled USG beat the hell out of the American people. Trump is just one of its victims.
Posted by: snake | Jan 25 2025 2:09 utc | 127