
Unexpected winners and losers after the week of DeepSeek

How DeepSeek would like the world to think about the youthful team. Image created with Midjourney.

It was the week of DeepSeek's CEO Liang Wenfeng, who seemed to appear out of nowhere to scare the hell out of everyone from Silicon Valley to Washington to Wall Street.

Apparently, not everyone has noticed that China is making the leap from an agricultural to a post-industrial society in record time. What chuckles there must have been in Beijing and Shanghai when Chinese New Year was celebrated last week.

Last week I wrote that Silicon Valley was rudely awakened by DeepSeek, and on Tuesday I added that Wall Street had overreacted. Today an attempt to chart the winners and losers, short- and long-term, of the rise of DeepSeek.

Who is Liang Wenfeng?

But first: who is Liang Wenfeng, the founder and CEO of DeepSeek? What is special about Liang, as a startup founder, is his background as the founder of a hedge fund: High Flyer.

"When we first met him, he was this very nerdy guy with a terrible hairstyle talking about building a 10,000-chip cluster to train his own models. We didn't take him seriously," one of Liang's business partners told the Financial Times.

During his time at High Flyer, Liang began buying Nvidia equipment and learned the various ways to develop algorithms for AI applications, lessons he now applies at DeepSeek. More remarkably, DeepSeek's sudden success is driven by Gen Z newcomers from diverse backgrounds. Liang likes originality and creativity from young smart people and values experience a lot less.

Liang also talked about hiring literature buffs on the engineering teams to refine DeepSeek's AI models. "Everyone has their own unique path and brings their own ideas, so there's no need to direct them." This is especially interesting to read in the week that Mark Zuckerberg boasts that he is getting rid of all diversity programs at Meta, in an effort to appease the Trump administration.

OpenAI worth $300 billion after all?

According to the Wall Street Journal, Japan's SoftBank would lead a $40 billion investment round in the ChatGPT maker, part of which is to be spent on its Stargate AI infrastructure project. With a valuation of $300 billion, OpenAI would become the second most valuable startup in the world, behind Elon Musk's SpaceX, the major rival of OpenAI CEO Sam Altman.

It would be downright amazing if Altman manages to raise money for his money-losing company at that stratospheric valuation, in the very week its vision and technological architecture are being doubted worldwide. But let us not overestimate SoftBank: it is the same firm and the same man, Masayoshi Son, who burned tens of billions on WeWork, all the way to bankruptcy. The question is: why will no one but SoftBank step in at this valuation?

Is Stargate science fiction?

Both OpenAI and SoftBank have declared they will invest tens of billions in Stargate, the AI infrastructure project with a $500 billion budget that is supposed to seal American hegemony in technology. The crazy thing is that OpenAI doesn't have that money at all, and neither does SoftBank. So when SoftBank invests in OpenAI, which thereby invests in Stargate, it is essentially filling one hole with another.

The Verge published a lucid analysis of the Stargate project. If Stargate fails, it would not simply be the end of a startup. It would be an expensive reality check for an entire industry that claims to transform the world through pure computing power.

Altman likes to present himself as the protagonist in a classic science fiction story: the visionary who promises to transform society through technological power. 

In, say, a year, we will know whether Stargate was the beginning of America's AI revolution or just a techno-optimistic fantasy that could not survive in the real world.

DeepSeek's actual costs

Then to a much-discussed topic: the costs allegedly incurred by DeepSeek to develop the acclaimed R1 model. The wildest stories are circulating about this, while DeepSeek itself has been fairly transparent about it:

"Finally, we again highlight the economic training cost of DeepSeek-V3, as summarized in Table 1, achieved by our optimized co-designs of algorithms, frameworks and hardware.

During the pre-training phase, training DeepSeek-V3 on every trillion tokens requires only 180K H800 GPU hours, or 3.7 days on our cluster with 2048 H800 GPUs. This completes our pre-training phase in less than two months and takes a total of 2.664M GPU hours. Combined with 119K GPU hours for context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs a total of only 2.788M GPU hours for full training.

Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."

The crucial part is the final sentence: none of the prior research costs are included in the calculation. It's like tallying the cost of a bodybuilder's meals on competition day without counting all the meals it took to get to the competition.
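DeepSeek's own arithmetic is easy to reproduce. A minimal sketch of the calculation from the quoted passage, using only the figures DeepSeek itself reports:

```python
# Reproduce the training-cost arithmetic from the DeepSeek-V3 report.
pretrain_hours = 2_664_000     # pre-training, in H800 GPU hours
context_ext_hours = 119_000    # context length extension
posttrain_hours = 5_000        # post-training

rate_per_gpu_hour = 2.00       # assumed H800 rental price in USD, per the report

total_hours = pretrain_hours + context_ext_hours + posttrain_hours
total_cost = total_hours * rate_per_gpu_hour

print(total_hours)   # 2,788,000 GPU hours
print(total_cost)    # $5,576,000
```

The $5.576M headline number is thus just GPU hours times an assumed rental rate; any prior experiments, failed runs and salaries fall outside it.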

Cheaper AI: who benefits?

Even more interesting than the cost aspect: DeepSeek offers the ability to install the model locally and develop on it. Microsoft CEO Satya Nadella pointed directly to the Jevons paradox.

In short, precisely because of the reduced cost, the use of an innovation will increase. It looks like Nadella is going to be right about that. In the long run, the "commoditization" of AI models and the cheaper inference demonstrated by DeepSeek will benefit Big Tech. Microsoft, for example, needs to spend less on data centers and GPUs, while benefiting from increased AI utilization through lower inference costs.
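The Jevons logic can be made concrete with a toy model. Assuming a constant-elasticity demand curve with an illustrative elasticity of 1.5 (a pure assumption for the sketch, not a measured figure), a 90% price cut more than triples total spending:

```python
def total_spend(price, baseline_price, baseline_demand, elasticity):
    # Constant-elasticity demand: usage scales with (price ratio) ** -elasticity.
    demand = baseline_demand * (price / baseline_price) ** -elasticity
    return price * demand

base = total_spend(1.0, 1.0, 100.0, 1.5)    # 100 units at $1.00 -> $100 spent
after = total_spend(0.1, 1.0, 100.0, 1.5)   # same curve after a 90% price cut

print(round(after))  # ~316: total spending rises despite the cheaper price
```

Whenever demand is this elastic, cheaper inference means a bigger, not smaller, AI market, which is exactly Nadella's point.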

Amazon is also a big winner: AWS has not developed its own high-quality AI model, but that doesn't matter when there are high-quality open-source models available that it can offer at much lower cost.

Apple also benefits

Drastically reduced memory requirements for inference make AI on iPhones much more feasible. Apple Silicon uses a unified memory architecture, with the CPU, GPU and NPU (neural processing unit) accessing a shared memory pool, argues Stratechery in an excellent piece. This effectively makes Apple's chips the best consumer hardware for inference: Nvidia's gaming GPUs, for example, top out at 32GB of VRAM, while Apple's chips support up to 192GB of unified memory.
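A back-of-envelope sketch shows why those memory ceilings matter. Take a hypothetical 70-billion-parameter model (an illustrative size, not a specific DeepSeek release); the weights alone need roughly parameters times bits-per-weight, ignoring the extra memory real deployments need for activations and KV cache:

```python
def weight_gb(params_billions, bits_per_weight):
    # Weight footprint only: params * (bits / 8) bytes, expressed in decimal GB.
    # Activations and KV cache would add to this in practice.
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

fp16 = weight_gb(70, 16)   # 140 GB: far beyond any 32 GB gaming GPU
int4 = weight_gb(70, 4)    # 35 GB: still over 32 GB, but trivial in 192 GB

print(fp16, int4)
```

Even aggressively quantized, a model of that size overflows a gaming GPU's VRAM yet fits comfortably in Apple's unified memory pool.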

Meta the biggest winner

AI is central to Meta's long-term strategy, and one of the biggest obstacles to date has been the high cost of inference. If inference and training become much cheaper, Meta can accelerate and expand its AI-driven business model more efficiently. 

Sensibly, Zuckerberg has reportedly set up several war rooms to determine how Meta will react to the introduction of DeepSeek. Whereas in the short term DeepSeek is thought to be a threat to Meta's AI strategy with its Llama LLM, a structural reduction in AI development costs will actually lead to a huge advantage for Meta, which is on track to invest $65 billion in AI development this year alone.

Most of that is spent on hardware and data centers. If that kind of investment can be minimized by imitating DeepSeek's approach, Meta will see its net profits increase substantially without weakening its competitive position.

Google the loser?

While Google also benefits from lower costs, any change to the status quo is likely to be a net detriment to Google. Every query put to OpenAI, DeepSeek or a Meta agent comes at the expense of a search on Google's search engine.

Despite all its efforts and hundreds of acquisitions over the last few decades, Google still depends largely on the search engine for revenue and profits. It remains to be seen whether Google will succeed in "redirecting" that traffic from the AI agents and chatbots the world so eagerly uses, back to Google's AI tools.

Nvidia not defeated by DeepSeek

Despite DeepSeek's breakthrough, Nvidia has two moats, according to Stratechery:

  • CUDA is the preferred programming language for anyone developing these models, and CUDA works only on Nvidia chips.
  • Nvidia has a huge lead when it comes to the ability to combine multiple chips into one large virtual GPU.

These two lines of defense reinforce each other. As mentioned earlier, if DeepSeek had had access to H100s, they probably would have used a larger cluster to train their model simply because it was the easiest option. The fact that they did not and were limited by bandwidth dictated many of their decisions in terms of model architecture and training infrastructure.

DeepSeek has shown that there is an alternative: heavy optimization can achieve impressive results on weaker hardware and with lower memory bandwidth. So paying more to Nvidia is not the only way to develop better models.

However, there are three factors that still work in Nvidia's favor.

  • First, how powerful would DeepSeek's approach be if applied to H100s or the upcoming GB100s? Just because they have found a more efficient way to use computing power does not mean that more computing power would not be useful.
  • Second, lower inference costs are likely to lead to wider use of AI in the long run. Microsoft CEO Satya Nadella recently made this very point in his late-night tweet about the Jevons paradox.
  • Third, reasoning models such as R1 and o1 derive their superior performance from using more computing power. As long as AI's strength and capabilities depend on more computing power, Nvidia will continue to benefit.

Also, with a larger market, Nvidia will benefit from revenue growth in cheaper chips, although it will be hampered in that market by competitors such as AMD. 

My subjective "Spotlight on AI" basket took relatively few hits last month.

DeepSeek thought 28 seconds about a hot dog

Joanna Stern of the Wall Street Journal did a funny test of DeepSeek and discovered how it differs from OpenAI's ChatGPT and Anthropic's Claude. Unlike OpenAI's reasoning models, DeepSeek shows its full thought process. When asked if a hot dog is a sandwich, DeepSeek thought about it for 28 seconds and responded with: "First, I need to understand what the definition of a sandwich is." It illustrates that there is no specific form of AI that works best for all issues.

The advance of AI throughout society is irreversible and with DeepSeek's approach, which will be copied frequently, the market will only grow larger. Therefore, despite all the doom-and-gloom news last week on Wall Street, it is fascinating that over the entire month of January, the performance in what I consider to be AI stocks has been better than one would expect. 

ARM's 29% rise stands out and is largely based on ARM's participation in Stargate. Notably, SoftBank owns ARM, so there is a good chance that Masayoshi Son will use the ARM shares as collateral when raising loans, which SoftBank can then use to pay for its investments in OpenAI and Stargate. Time will tell whether this approach leads to a skyscraper or a house of cards.

This is how the main parties of the DeepSeek crash closed on Wall Street yesterday

What did America's tech billionaires buy from Trump?

President Trump has often expressed hostility toward major technology companies and their leaders, calling Facebook an "enemy of the people" and labeling Jeff Bezos "Jeff Bozo," for example. Yet these gentlemen were in the front row at the inauguration, having coughed up significant sums of money. That was obviously no coincidence, and the technology sector wants something back from Trump soon. Bloomberg looked at each of them and mapped out what they each want to accomplish.

As we take stock of the performance of Big Tech stocks in the month of January at the end of the second week in Trump's second reign, it appears that the short-term results are not yet what Trump's new tech pals had hoped for. Despite all of Trump's presidential decrees and appointments, stock market results have been rather mixed, to say the least.

What is particularly striking is that investors are sharply divided over the tech sector as a whole. Meta rose mainly due to good quarterly earnings, but how could Microsoft fall while Google rose? Did Apple fall in January due to the possibility of a trade war with China? It is strange that the financial media was mostly focused on last week's results and ignored what happened in terms of price swings earlier in the month. Consider, for example, Palantir, up nearly 10% in January and already up 385% in the last year.

Huang at Trump, Liang at Li Qiang

President Trump and Nvidia CEO Jensen Huang discussed the impact of DeepSeek and possible restrictions on AI chip exports to China during a meeting at the White House on Friday. Huang will certainly have been thinking about the possible impact on Nvidia's stock price.

DeepSeek's Liang Wenfeng also met with an important politician this week: as the sole representative of the AI industry, he met with Premier Li Qiang, China's second most powerful man. Both meetings underscore the importance of technology to economic power in the new world order defined in part by AI.

Palantir CEO Alex Karp told CNBC that the rise of DeepSeek is a sign that the U.S. needs to work faster to develop advanced AI. "Technology is not necessarily good and can pose threats in the hands of adversaries. We need to recognize that, but that also means we need to run harder, go faster and make a national effort."

Boring: success begins with homework

Europe no longer figures in the geopolitical shuffling between continents; how can that be, with so much talent among half a billion people?

Malaysian comedian Ronny Chieng summed up the West's problem perfectly: people are willing to die for their country, but they don't want to do homework for it. Chieng is talking about America, but it applies just as well to Europe.


DeepSeek revolutionary: good, cheap AI product from China

OpenAI launched the AI agent Operator, initially useful for messenger and shopping services, while scientists such as Jan Leike and Nobel Prize winner Geoffrey Hinton are again warning of dangers of AI. Image created with Midjourney.

DeepSeek-R1, a new large language model from Chinese AI company DeepSeek, with a website that looks like a sleep-deprived intern pressed "enter" too quickly, has attracted worldwide attention as a cost-effective and open alternative to OpenAI's flagship o1. It was released on Jan. 20, coincidentally or not on the weekend that "tout Silicon Valley" was eagerly clinging to the coattails of power in Washington. R1 excels thanks to "chain of thought" reasoning, which mimics human problem-solving.

Unlike closed models such as OpenAI's o1 and Anthropic's Claude, which this week raised another $2 billion from investors who are throwing pudding against the wall in AI hoping that some of it will stick, R1 is open-weight and published under an MIT license. That means anyone is free to build on the architecture. Unlike genuinely open-source software, however, the source code and training data used to build DeepSeek-R1 are not public.

The model was reportedly developed for under six million dollars through algorithmic efficiency and reinforcement learning, far less than o1, despite U.S. export restrictions on advanced GPU chips, especially Nvidia's, on which U.S. competitors primarily rely. Its affordability, with API costs more than ninety percent lower than o1's, makes advanced AI accessible to researchers with limited resources. It also offers a free chatbot interface with web search capabilities, surpassing OpenAI's current features.
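For a sense of what "more than ninety percent lower" means in practice, here is a quick sketch using per-million-token output prices as they circulated at the time (treat the figures as illustrative assumptions, not an authoritative price sheet):

```python
# Illustrative per-million-output-token list prices circulating at launch.
# These are assumptions for the sketch; check the vendors' pricing pages.
o1_output_usd = 60.00   # OpenAI o1
r1_output_usd = 2.19    # DeepSeek R1

discount = 1 - r1_output_usd / o1_output_usd
print(f"{discount:.0%}")  # prints "96%"
```

At those list prices, a workload that cost $600 in o1 output tokens would cost roughly $22 on R1, which explains the sudden interest from resource-constrained researchers.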

'Everyone is freaking out about DeepSeek'

By matching or even surpassing o1 in some benchmarks, R1 has highlighted China's advance in AI development. Its sudden rise has sparked discussions about the future of open, accessible AI and the need for international cooperation to move forward responsibly. 

International reactions to DeepSeek-R1 ranged from respect to dismay. Nature was analytical: 'DeepSeek-R1 performs reasoning tasks at the same level as OpenAI's o1 - and is open for researchers to examine.' MIT Technology Review remained tidy: 'The AI community is abuzz about DeepSeek R1, a new open-source reasoning model.' But VentureBeat said out loud what all of Silicon Valley was thinking: 'Why everyone in AI is freaking out about DeepSeek.'

By the way, anyone who asks DeepSeek about Tiananmen Square gets the reply: "I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses." Asked about the situation of the Uyghurs, a very elaborate answer first appeared that even used the word genocide, but a few seconds later that text was replaced by: "Sorry, that's beyond my current scope. Let's talk about something else." DeepSeek wants to keep things light and breezy.

Stargate historic project in AI infrastructure

The focus on China's DeepSeek led to great chagrin from the American techno-elite, who wanted to use this very week to underscore American supremacy. OpenAI, Oracle, Japan's SoftBank and Emirates-based MGX are funding the Stargate Project, a $500 billion initiative described as the largest AI infrastructure project in history.

Announced by President Trump in the Oval Office, its goal is to build advanced data centers for AI in the US, which Trump says will create a hundred thousand jobs. It is a kind of Delta Works for AI. The project currently has one hundred billion dollars in direct funding, with the remaining investment spread over four years. The first huge data center will be built in Texas.

It already led to bickering over Stargate funding between OpenAI CEO Sam Altman and Elon Musk. Forbes even made a timeline of the ongoing tiffs between Altman and Musk, who should get into a boxing ring or a hotel room.

Will the real MGX please stand up

In all the excitement, it was especially comical that overexcited investors bought the wrong stock in the belief that it was part of Stargate: biotech company Metagenomi (ticker: MGX) saw its share price shoot up even though it is not involved in Stargate. The MGX that does participate, Abu Dhabi's sovereign wealth fund, will have watched on in wonder.

It would be quite a feat if Trump succeeds in getting foreign investors like MGX and Japan's SoftBank to invest hundreds of billions in U.S. infrastructure without the U.S. taxpayer contributing. Investor Bill Gurley (Uber) publicly questioned the public-private partnership, which is unusual by American standards. The main question is whether Stargate will be accessible to all and who ultimately makes the decisions. OpenAI CEO Sam Altman often has problems with governance.

OpenAI with AI agent: Operator

In all the fuss over DeepSeek and Stargate, the news was snowed under that OpenAI this week introduced Operator, an AI agent that can independently navigate Web browsers and perform tasks such as online shopping, booking travel and making reservations. It marks the moment when AI agents are making their entrance into the mass market.

Operator uses OpenAI's Computer-Using Agent (CUA) model, which mimics human interactions with Web sites by using buttons, menus and forms. OpenAI is working with companies such as DoorDash, Uber and eBay for Operator to ensure it complies with their terms of use. 

Despite all its potential, Operator has limitations with more complex tasks such as banking and complex web interfaces or CAPTCHAs. Right now, unfortunately, it is only available to U.S. users on the ChatGPT Pro subscription of two hundred dollars a month, so I have not been able to test it myself.

Operator an echo of General Magic

OpenAI's Operator, nearly thirty-five years later, is strongly reminiscent of the legendary company General Magic, known as "the most important company to ever come out of Silicon Valley that nobody ever heard of." All of Operator's marketing copy seems to echo General Magic's slogans and claims from the early 1990s.

In the end, General Magic, which attempted to create a handheld computer with agent features before the Internet and digital mobile telephony got to mass adoption, proved too far ahead of its time. Like General Magic, Operator strives to integrate seamlessly into users' lives and function as a personal assistant and productivity booster.

For fans: a fine documentary was made about the rise and fall of General Magic, of which this is the trailer. The team behind General Magic was so special that dozens of books have been written about its members, and some even made it to the big screen: Andy Hertzfeld was a prominent member of the team that developed the Apple Macintosh for Steve Jobs; after General Magic, Tony Fadell became the developer of the iPod and co-creator of the iPhone at Apple; and Joanna Hoffman is such a special person that Kate Winslet went to great lengths to play her in Danny Boyle's film about Steve Jobs.

Leike and Hinton with different warnings

In all the publicity about DeepSeek, Stargate and AI agents, the news was snowed under that two leading AI scientists once again warned against the misuse of AI, with potentially disastrous consequences for the world. Professor Geoffrey Hinton, a leading figure in AI and winner of the 2024 Nobel Prize in Physics, discussed the risks of rapid AI developments in a fascinating conversation with his former student Curt Jaimungal.

Hinton has frequently warned that AI could evolve and gain the motivation to make more of itself and autonomously develop a sub-goal to take control of the world without regard to humans.

The German Jan Leike, who co-led OpenAI's superalignment team before leaving disappointed, now puts it this way: "Don't try to imprison a monster, build something you can actually trust!" I previously wrote extensively about Leike and Hinton's warnings in this blog post.