✍️ The prologue
My name is Aris Xenofontos and I am an investor at Seaya Ventures. This is the weekly version of the Artificial Investor that covers the top AI developments of the last seven days.
This Week’s Story: DeepSeek’s announcement wipes 600 billion dollars off Nvidia’s value
Ten days ago DeepSeek, the foundational model division of a Chinese hedge fund, released DeepSeek R1, an open-source large reasoning model that matched the performance of OpenAI’s comparable model and reportedly cost less than 6% of what its American peer spent on training. This announcement led to a media and investor frenzy (Nvidia lost more than 15%, equivalent to 600 billion dollars, of its value in a day), DeepSeek’s iPhone app topped the App Store charts and hundreds of articles were written about the impact on the global Tech ecosystem. Most of those articles, in our opinion, propagate myths.
So, let’s bust these myths together, inspired by the American early-2000s scientific entertainment show, the MythBusters, where Adam and Jamie used physics and chemistry to test the validity of rumors, myths or movie scenes, such as the truth behind Kill Bill’s coffin punch, whether gummy bears can be used as rocket fuel or whether fear has a distinct smell.
Did it cost DeepSeek only 6% of what it cost OpenAI to train its models? Is this a revolution or an evolution in the LLM race? Is China winning the AI Cold War? Why have American AI model developers failed to improve their efficiency? Does it all mean that we need fewer AI chips than we thought?
Let’s dive in.
👻 Myth or reality? DeepSeek appeared out of nowhere and no one knew about it. Myth!
DeepSeek didn’t appear out of nowhere. The Chinese AI lab has made headlines a few times before, announcing very efficient and cheap models that performed in line with American peers. In May 2024 the company released DeepSeek-V2, an open-source model that, according to the release note, performed close to OpenAI’s GPT-4 Turbo (the lighter and faster version of GPT-4), while being priced 60 times lower. By our calculations, this model cost only 2.4 million dollars to train. We wrote in June 2024’s Issue 25 of the Artificial Investor about the release of DeepSeek’s Coder v2, which supports 300+ programming languages and outperformed closed-source models, such as GPT-4 Turbo, Claude 3 Opus and Gemini 1.5 Pro, in coding tasks. More recently, in December 2024’s Issue 38 of the Artificial Investor, where we wrote about the current status of the US/China AI Cold War, we mentioned the white paper of DeepSeek’s R1, the reasoning model that beat OpenAI’s o1 on various math benchmarks. That came shortly after the release of DeepSeek-V3, an open-source LLM that was free to use and beat OpenAI’s GPT-4o on various benchmarks.
So, no, DeepSeek didn’t appear out of nowhere. The good news is that, as an Artificial Investor reader, you had been warned! 🙂
💵 Myth or reality? It cost DeepSeek only 6% of what it cost competitors like OpenAI to train their models. Myth!
DeepSeek claimed that training its V3 model took roughly two months (about 2.8 million GPU-hours) on a cluster of c.2,000 H800 (not cutting-edge) Nvidia chips, which equates to less than 6 million dollars at typical rental rates. Compare this with roughly 4 million dollars for Meta’s Llama 2, 70 million dollars for Llama 3 and 100 million dollars for OpenAI’s GPT-4. This would mean that V3 cost around 6% of what OpenAI’s GPT-4 cost to train. How is this possible?
💡 One quick parenthesis here. Some articles talk about Meta’s and OpenAI’s models costing billions of dollars to train, which is also linked to the billions that Meta has invested in AI and OpenAI has raised in funding. However, that is the cost of acquiring the Nvidia chips used to train the models. Chips are used for weeks or months to train an LLM, but their useful life is 3-5 years. So, a fairer cost estimate is one based on the hourly rental price of the chips, which is what leads to the quotes in the tens of millions of dollars; the quotes in the billions of dollars use the chip purchase price. The fact that OpenAI and Meta choose to own the chips instead of renting them is a different story.
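For intuition, here is a minimal back-of-envelope sketch of the three ways the same training run can be costed. Every figure below is an assumption chosen for illustration, not a disclosed number from DeepSeek or any other lab.

```python
# Back-of-envelope comparison of training-cost accounting methods.
# All figures are illustrative assumptions, not disclosed numbers.
num_gpus = 2_000            # assumed cluster size
training_days = 60          # assumed wall-clock training time
hourly_rental = 2.0         # assumed rental price in $ per GPU-hour
purchase_price = 30_000     # assumed purchase price in $ per GPU
useful_life_years = 4       # typical accounting life of a datacentre GPU

gpu_hours = num_gpus * training_days * 24

# Method 1: rent the GPUs only for the duration of the training run
rental_cost = gpu_hours * hourly_rental

# Method 2: buy the whole cluster outright (what the headline "billions"
# quotes imply once scaled to hyperscaler-sized fleets)
purchase_cost = num_gpus * purchase_price

# Method 3: amortise the purchase price over the share of useful life consumed
amortised_cost = purchase_cost * (training_days / (useful_life_years * 365))

print(f"GPU-hours:          {gpu_hours:,.0f}")
print(f"Rental-based cost:  ${rental_cost / 1e6:.1f}M")
print(f"Full purchase cost: ${purchase_cost / 1e6:.1f}M")
print(f"Amortised cost:     ${amortised_cost / 1e6:.1f}M")
```

With these assumed inputs, the rental-based and amortised estimates land in the single-digit millions, while the full purchase price of the cluster is an order of magnitude higher, which is exactly why the different costing methods produce such different headlines.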
Now, back to our comments on the training cost comparisons. For starters, there are rumours that DeepSeek owns 50,000 H100 Nvidia chips, which would put the training bill closer to 60 million dollars. We are not sure whether this is true, but DeepSeek cannot prove that it used only c.2,000 GPUs either.
Even if that were the case (i.e. tens of millions of dollars to train), DeepSeek’s latest model would still have been very cost-efficient to train, as it would be at least 40% cheaper than GPT-4 with comparable performance. How is that possible? DeepSeek mentions in the V3 white paper a number of AI innovations that it has adopted, such as advanced load balancing, the use of lower precision (8-bit numbers) during training and multi-token prediction. DeepSeek has also allegedly used distillation from OpenAI’s models for training. This is not hard to believe: when asked “what model are you?”, the model often answers “I’m an AI language model called ChatGPT, created by OpenAI”.
🗒️ Note: Distillation is a method where one model is used to produce a large number of questions and answers that are then fed into another model as training data. This is a pretty common practice.
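A minimal sketch of what this looks like in practice is below. The `call_teacher_model` function is a placeholder standing in for whatever API or local model a lab would actually query; real distillation pipelines are far larger and filter the generated data heavily.

```python
# Minimal sketch of distillation: a "teacher" model answers prompts, and the
# (prompt, answer) pairs become supervised training data for a "student" model.
import json

def call_teacher_model(prompt: str) -> str:
    # Placeholder: in practice this would query the teacher LLM.
    return f"[teacher answer to: {prompt}]"

prompts = [
    "Explain the Jevons paradox in two sentences.",
    "Write a Python function that reverses a string.",
]

# Build the synthetic dataset from the teacher's outputs.
dataset = [{"prompt": p, "response": call_teacher_model(p)} for p in prompts]

# Save in a typical instruction-tuning format; the student model is then
# fine-tuned on these pairs as if they were human-written examples.
with open("distilled_dataset.jsonl", "w") as f:
    for example in dataset:
        f.write(json.dumps(example) + "\n")
```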
💥 Myth or reality? DeepSeek’s efficiency is a revolution in the LLM race. Myth!
DeepSeek’s achievement is certainly great and it would be unfair to take any credit away from it. However, calling it a revolution is not entirely accurate. Over the last three years, AI models have been developing along two dimensions: performance and efficiency. On one hand, we have the race for the state-of-the-art (SOTA) LLM, where performance has improved significantly at ever-increasing training cost. On the other hand, we have the emergence of small language models (SLMs) that have become very efficient to train and run, while matching the performance of LLMs launched 12-18 months before them. For instance, Microsoft’s Phi 4, a model with 14B parameters, more than 100x smaller than OpenAI’s GPT-4 (rumoured at 8 x 220 billion parameters in total), was released in late 2024 and matched OpenAI’s LLM released earlier that same year. We wrote about the evolution of SLMs in December 2024’s Issue 39 of the Artificial Investor and drew a timeline to demonstrate the catch-up effect of SLMs vs. LLMs.
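For intuition on why a much smaller model is so much cheaper to train, a common rule of thumb is that training compute scales roughly as 6 × parameters × training tokens. The sketch below applies that heuristic with purely illustrative parameter and token counts; it ignores mixture-of-experts sparsity and differences in dataset size, so treat it as rough intuition rather than an estimate of anyone’s actual bill.

```python
# Rule of thumb: training FLOPs ~ 6 * parameters * training tokens.
# All figures below are illustrative assumptions, not disclosed values.
def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

small_model = training_flops(params=14e9, tokens=10e12)     # a Phi-4-sized model
large_model = training_flops(params=1.76e12, tokens=13e12)  # a GPT-4-sized (rumoured) model

print(f"Small model:   {small_model:.2e} FLOPs")
print(f"Large model:   {large_model:.2e} FLOPs")
print(f"Compute ratio: ~{large_model / small_model:.0f}x")
```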
So, to an Artificial Investor reader, a model released 9 months after GPT-4 that matches its performance and is 40% or even 80% more efficient would not sound like a revolution 🙂
⚔️ Myth or reality? DeepSeek’s achievements mean that China is winning the AI Cold War. Myth!
Nearly, but not quite. China is not winning the AI Cold War against the US, but it has certainly come closer to catching up at the model layer. However, AI is more than just the model layer. As per our analysis last December, the US seems to be winning the AI Cold War currently: it is winning the broader infrastructure layer (it is home to the number-one AI semiconductor company, Nvidia, and to the top three Cloud providers, Amazon AWS, Microsoft Azure and Google GCP), as well as the application layer (it is home to the leading AI chatbots worldwide, ChatGPT, Gemini and Meta AI, and to the leading robotaxi business, Waymo). On the other hand, China is better positioned at the base of the hardware value chain, such as the supply of minerals, energy production (including nuclear and renewables) and pure-play chip manufacturing. In any case, one of our 2025 AI predictions was that China would catch up at the model layer.
📉 Myth or reality? American AI model developers have tried to improve efficiency, but failed. Myth!
To begin with, we do have American AI models with strong performance and modest size, and hence limited training budgets. Which ones? The leading small language models: Microsoft’s Phi 4, Google’s Gemini Flash, OpenAI’s GPT-4o mini and Meta’s Llama 3.1-8B. Then why such a big fuss about DeepSeek? It is probably a combination of the model being made by the “villains” and the fact that it is available via a mobile app for everyone to download and use for free.
Then, if American AI model developers are capable of building models efficiently, why are they not doing so? Charlie Munger, Warren Buffett’s right-hand man, famously said: “Show me the incentive and I will show you the outcome”.
What has been the incentive framework of the American frontier AI labs? 📈
Devansh, a leading AI author and entrepreneur, wrote a great piece last week about the incentives behind the model scaling frenzy, explaining how the predictability of AI performance improvement has been very convenient for the various stakeholders:
BigTech company CEOs: The AI race became about chip and datacentre capex, which is great for BigTech companies that have more cash than they know what to do with (they have already returned money to their shareholders through share buybacks and dividends and M&A is not an option due to antitrust). Plus, there is a strong Cloud business case, due to the vendor lock-in that comes with large infrastructure deals.
BigTech’s CTOs: It is way easier to throw computers at a problem than to spend time and resources rethinking the architecture and operating model of your research and development functions.
Data scientist incentives: It has become more likely for a data scientist to produce results they can tangibly demonstrate within their performance review cycle, and in a predictable manner.
Data scientist careers: Scaling reliably improves benchmarks, which in turn become white papers. This is a good match with a research environment where publications are a must.
Scale-ups raising funds: It has helped convince investors that “bigger = better”, which allows people to shift attention away from competitive moats and path to profitability.
💻 Myth or reality? Increasing AI model efficiency means we need fewer chips. Myth!
Investor panic broke out last week and sent Nvidia’s share price down by more than 15%, wiping 600 billion dollars off its value. The panic was related to the “realisation” (let’s call it more of a hypothesis) that as AI models become more efficient, we need less compute power to train and run them. That means we need fewer chips, which affects the market leader the most: “Nvidia has overestimated its chip production → revenues will not grow as much → the company is overvalued → sell the stock!”. We believe this is false in the short and long term, with a question mark over the mid term.
⌛ Short term
We analysed the short-term outlook of Nvidia in September 2024’s Issue 27 of the Artificial Investor, where we saw that more than 2/3 of Nvidia’s revenues come from hyperscalers/Big Tech companies, such as Microsoft, Alphabet, Meta, Apple, Tesla and xAI/X. These companies have already committed their capex and GPU orders for the next 18-24 months, and the numbers add up to Nvidia’s revenue projections for the same period.
📆 Long term
We should break down this analysis into i) broader AI chip demand and ii) Nvidia’s performance.
Broader demand for chips is related to the global demand for compute, i.e. intelligence. We believe that the current global penetration of intelligence across businesses and consumers is below 10%. We don’t have a scientific analysis behind this, but just ask yourself: how many intelligent super apps are you using on a daily basis that have changed your life? How much legacy software, how many manual processes and how much actual pen and paper are used in businesses today? Just taking intelligence penetration from 10% to 20% would require significantly more chips than those currently in circulation and on order. Increasing AI model efficiency is not an inhibitor of AI adoption and chip demand; rather, it is an enabler.
This is effectively the Jevons paradox. In 1865, the English economist William Stanley Jevons observed that technological improvements that increased the efficiency of coal use led to increased consumption of coal across a wide range of industries. In other words, as a technology becomes more affordable, its adoption and use increase. This has not happened with every technology, but it has happened with telecommunications devices (telephones, mobile phones), computers, the Internet and solar power. What were the common characteristics of these technologies? Economies of scale, broad applicability and an exponential improvement curve. AI has all three.
Now, whether this will translate to strong long-term performance for Nvidia is a more complicated question that is related to geopolitics, supply chain dynamics, competitive tensions, defensibility, etc.
The open question that remains, in our mind, is: “what will happen to chip demand in the medium term?”. The answer depends on whether the growth of the AI application layer will outpace the decline in the useful life of the current AI chips.
🥷 Myth or reality? DeepSeek’s achievements are made by stealing IP. Myth!
This is basically how open-source research works. Scientists, who don’t care about national borders, political agendas or corporate battles, make scientific advancements and then publish everything openly so that other scientists can continue their work. That’s exactly what OpenAI’s researchers did with Google’s Transformer research and the BERT model, that’s what Meta’s and Alibaba’s researchers did with OpenAI’s models, and so on.
🎯 Myth or reality? The DeepSeek news shows that AI models are a commodity. Reality!
This is actually not a myth; models are indeed a commodity. We’ve written about this extensively and find it hard to believe that any Nvidia investor is unaware of it. The image above plots many leading AI models on a chart with performance on the y-axis (as measured by the LMSys Elo benchmark, a rating system calculated from human votes on blind, pairwise comparisons of different models’ responses) and price on the x-axis (cost in dollars per million tokens, shown on a logarithmic scale given how much prices have declined in the last two years). Each line is the Pareto Frontier, i.e. the optimal combination of performance and price, at a given point in time. There are two key messages:
The cost of “OpenAI o1 level” of intelligence fell 27x in the last 3 months
The cost of “OpenAI GPT-4 level” of intelligence fell 1,000x in the last 18 months
Does this price evolution indicate model commoditisation or not?
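For readers who want to make the Pareto Frontier idea concrete, here is a minimal sketch of how such a frontier can be computed from (price, score) pairs. The model names and figures below are placeholders for illustration, not the actual chart data.

```python
# Minimal sketch of a price/performance Pareto frontier.
# Entries are (name, $ per million tokens, Elo-style score); figures are illustrative.
models = [
    ("model_a", 30.0, 1280),
    ("model_b", 5.0, 1250),
    ("model_c", 0.5, 1200),
    ("model_d", 10.0, 1190),  # dominated: pricier and weaker than model_b
]

def pareto_frontier(entries):
    """Keep models for which no other model is both at least as cheap and at least as good."""
    frontier = []
    for name, price, score in entries:
        dominated = any(
            other_price <= price
            and other_score >= score
            and (other_price, other_score) != (price, score)
            for _, other_price, other_score in entries
        )
        if not dominated:
            frontier.append((name, price, score))
    return sorted(frontier, key=lambda entry: entry[1])

for name, price, score in pareto_frontier(models):
    print(f"{name}: ${price}/M tokens, score {score}")
```

As prices fall across the board, the whole frontier shifts left on the chart, which is exactly the commoditisation dynamic described above.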
✍️ Wrapping up
Summarising, we busted a number of DeepSeek-related myths.
DeepSeek did not appear out of nowhere and had already given a couple of warnings with the releases of its V2 and V3 models. The cost to train its latest models, which perform on par with OpenAI’s latest models, is probably not as low as 6% of the training cost of its American peer, but huge efficiencies have indeed been achieved. Nevertheless, the efficiency of DeepSeek’s models is an evolution in the AI model race, rather than a revolution. As American frontier labs’ incentives shift, we also expect similar announcements to come out of the US.
China is not winning the AI Cold War versus the US, given the latter’s dominance in the infrastructure and application layers, but China is indeed catching up at the model layer (which, by the way, appears to be a commodity).
Finally, we don’t expect AI chip demand to fall in the short term, and we believe that, AI being a technology with economies of scale, broad applicability and an exponential improvement curve, lower costs will lead to higher demand in the long term. Whether this growth curve will be smooth in the medium term depends on whether the growth of the AI application layer outpaces the decline in the useful life of the current AI chips.
See you next week for more AI insights.