
1 week after the DeepSeek chaos (from the perspective of an AI engineer)

Last week I was fascinated to watch the media and many American CEOs react with horror to the news that a new, much cheaper model is supposedly just as good as OpenAI's.


First of all: about 4 months ago, the Qwen team, which belongs to $BABA (+0.74%), released the Qwen 2.5 model series, ranging from 0.5b to 72b parameters. The 72b model achieved far better performance than any open-source LLM on the market at the time, including Meta's Llama 3.3:

[Attachment: benchmark comparison of Qwen 2.5 72b against other open and closed LLMs]

As you can see, the performance is only marginally worse at most; depending on which benchmark you pick, it is even better than models like Anthropic's Claude, OpenAI's GPT-4o and Google's Gemini. And how much of a stir did this cause? Zero.


At least in the media. For us developers, however, the constant progress of open-source models is a blessing: with qwen2.5:72b, for example, we download the model weights once and run them on private servers, so they work completely locally, cut off from any outside connection. This means no data can flow back to the provider via API requests, as happens with OpenAI's models (it's actually a joke that they are called OpenAI), and the provider cannot use that data to retrain its models. This is essential, especially for applications with critical data.

By the way, if you would like to run your own personal assistant on your computer, you can do this with the open-source software Ollama, for example. If you're wondering how much your laptop can handle without a graphics card with 64GB of VRAM, here is a brief overview:

  • <=3b models: approx. 5GB RAM (e.g. Llama 3.2 3b, qwen2.5:3b)
  • 7/8/9b models: approx. 16GB RAM (e.g. qwen2.5:7b, llama3.1:8b)
  • <=70b models: approx. 64GB RAM (but ultra slow on CPU alone; you'll want a Mac with Apple Silicon or similar)

So to summarize: even a three-year-old Windows computer with 32GB RAM and an Intel or AMD processor can run LLMs.

ollama link: https://ollama.com/
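
To make this concrete, here is a minimal sketch of talking to such a local model from Python once Ollama is installed and a model is pulled (e.g. via "ollama pull qwen2.5:7b"); it assumes the official ollama Python package (pip install ollama), and the prompt is just a placeholder:

# Minimal chat with a locally pulled model via the official Ollama Python client.
# Requires a running Ollama install and a pulled model (ollama pull qwen2.5:7b).
import ollama

response = ollama.chat(
    model="qwen2.5:7b",  # any model tag you have pulled locally
    messages=[
        {"role": "user", "content": "Explain in two sentences what a context window is."},
    ],
)
print(response["message"]["content"])  # the answer, generated entirely on your machine

The same local HTTP endpoint (port 11434 by default) is what tools like Continue or OpenWebUI talk to under the hood.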


So back to the topic: why has nobody said anything about Qwen's success?

Maybe simply because Qwen's success had nothing to do with major tweaks to the model architecture or training recipe. The DeepSeek team, on the other hand, made a few crucial changes to how the model is trained, mainly these two:

  • Instead of training the base model with supervised finetuning, i.e. simply put with a labeled data set (good answer/bad answer), they trained it purely with reinforcement learning: mathematically building in a kind of reward that is granted whenever the desired training outcomes are achieved.
  • Within that reinforcement learning approach, they also made crucial changes and established a new technique, Group Relative Policy Optimization (GRPO), which, put very simply, speeds up the feedback loop of positive learning (a simplified sketch follows below).
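
To make the GRPO part a bit more tangible, here is a heavily simplified sketch of its core trick as described in the paper linked below: for each prompt you sample a group of answers, score them with a reward function, and use each answer's reward relative to its own group as the learning signal, so no separate critic/value model is needed. The function name and toy numbers below are mine, purely for illustration:

# Toy sketch of the group-relative advantage at the heart of GRPO.
# Real setups reward things like correct final answers and proper formatting;
# the reward values here are made up for illustration.
from statistics import mean, stdev

def group_relative_advantages(rewards):
    # A_i = (r_i - mean(group)) / std(group)
    mu = mean(rewards)
    sigma = stdev(rewards)
    if sigma == 0:  # all answers equally good/bad -> no learning signal
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# 4 sampled answers to the same prompt; reward 1.0 = correct, 0.0 = wrong
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# answers above the group average get reinforced, the rest get discouraged

Because the baseline comes from the group itself, the expensive critic model of classic PPO-style training is dropped, which is one reason the whole training loop gets cheaper.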

paper: https://arxiv.org/pdf/2501.12948


Now to the most important part:

Why DeepSeek "left out" part of the story, and why that can be very good for us investors

In a detailed article, SemiAnalysis has laid out once again why certain of the reported numbers don't quite add up.

TL;DR:

  • The $6M "training cost" is misleading - it excludes infrastructure and operating costs ($1.3B server CapEx, $715M operating costs)
  • They got hold of ~50,000+ GPUs (H100/H800/H20), partly via detours (the H100 is an export-banned GPU!)
  • They may also simply be offering inference cheaply, like Google does with Gemini (50% cheaper than GPT-4o), to gain market share
  • An export loophole apparently enabled a $1.3bn build-out before the H20 export restrictions
  • DeepSeek R1 "reasoning" is good, but Google's Gemini 2.0 Flash is cheaper and at least as good (from professional experience: the Gemini models are very good by now)
  • Among the operating costs that have not been disclosed: salaries for top talent, estimated at up to $1.3M USD per person

source: https://semianalysis.com/2025/01/31/deepseek-debates/


The ASML CEO puts it like this:

"In Fouquet's perspective, any reduction in cost is beneficial for ASML. Lower costs would allow artificial intelligence (AI) to be implemented in more applications. More applications, in turn, would necessitate more chips."

source: https://www.investing.com/news/stock-market-news/asml-ceo-optimistic-over-deepseek-93CH-3837637


Personally, I hold a similar view: AI will become cheaper and better in the future, but chips will also have to get better and, above all, more of them will have to be produced. Without going into too much detail, this was a good opportunity to buy $ASML (+0.93%), $TSM (+0.12%), $NVDA (-1.24%) and co., for example.

41 Comments

Good post but wrong community. I bet that 98% of the readers here don't understand a word of it. How about a summary from a financial perspective? So what does that mean for our portfolio?
@DonkeyInvestor Are you also one of the 98%?
@DonkeyInvestor hmhh you're probably right, I just thought it might add value to shed some light on technical details and maybe also show why DeepSeek can actually be positive for investors, especially when it comes to the chip industry :)
@DonkeyInvestor I read the beginning and the conclusion... I understood that, rated it as very helpful😉👍🏻
@DonkeyInvestor someone who wants to tell you that you don't need to panic-sell Nvidia, and that the market environment is more positive than you might think given the DeepSeek reactions. Thought you were something like a computer scientist 😜😀
@He-Man it's none of your business
@Dominik_76 first and foremost, I am a donkey
@DonkeyInvestor Come on... feel free to admit that you're just as stupid as 98% of the community here.
@DonkeyInvestor I am even more of a donkey. Because who is the bigger donkey: the donkey, or the one who follows him? Think about it.😀
@Dominik_76 I am 1.95m tall. Can you keep up?
As I also work in software development, I found your comments interesting. However, I am also considering how this will affect my investments.
@Divy Even here in Germany, where the economic times are currently less than rosy, many companies are chasing GenAI solutions, sometimes more for the hype than for the real benefit, and this usually comes with corresponding deployments in the cloud or on local servers with the corresponding graphics cards. How long will this continue? Investors evidently just shrugged at Mark Zuckerberg's statement in last week's Meta earnings call that they will continue to invest massively in infrastructure; the share price barely moved.
Very exciting to hear from someone first-hand, also regarding the Gemini experience.
For which use cases do you primarily use the Gemini models professionally? Basic tasks, or also as coding support or more in-depth things?

Re the impact on hyperscalers and AI infrastructure: the Jevons paradox strongly supports the assumption that efficiency advances in a technology also increase demand and usage, which in turn can benefit the pick-and-shovel makers like the ones you mentioned. (See https://en.wikipedia.org/wiki/Jevons_paradox)
@Bullbender In fact, I think this paradox is spot on! What if every company, and soon perhaps every individual, has their own personal assistant on their computer or smartphone? Right now the concept is still very new, but I think having your own models will feel natural in the future.

Regarding Gemini: the 1.5 Pro/Flash versions via API were quite solid for various chatbot use cases thanks to their large context window, i.e. how many sources you can hand them when a database is searched in the background. And because the models are also relatively fast, the latency until the user receives a response was low as well.
For coding, I recommend either qwen2.5-coder:32b or deepseek-coder, which I run locally on my computer via Ollama as mentioned above and then connect to my VS Code; there is a cool extension called Continue that does this for you. That gives you a free version of Cursor, the AI IDE.
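If you want to sanity-check such a coder model outside the IDE first, here is a quick sketch (again assuming the ollama Python package and a pulled model; take a smaller tag like qwen2.5-coder:7b if the 32b weights don't fit in your RAM):

# Quick smoke test of a local coder model before wiring it into Continue.
import ollama

reply = ollama.chat(
    model="qwen2.5-coder:32b",
    messages=[{"role": "user", "content": "Write a Python one-liner that reverses a string."}],
)
print(reply["message"]["content"])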
@intelligent_invest_99 Very exciting! I like using Cursor, but will try the option with Continue, thanks
Very engagingly written and a nice insight into a topic that's unfamiliar to me
Very good contribution, thanks for that. 👍😀
Many thanks for the informative article. It also touches on my personal AI ambitions.
Great contribution! I'm also a big fan of Ollama, everyone should really give it a try
@Klassischer Far too underrated, as theoretically almost anyone can now have their own models on their machine for free 📈📈🤝
@intelligent_invest_99 Do you also use OpenWebUI?
@Klassischer I just tested it and found it super easy to quickly spin up a UI👍
Very well-founded, thank you. 👍
@Winkl you're welcome :)
Where is the comparison to Grok?
@ben922 Grok is not in the benchmarks because it's X's proprietary model
Very good article; a lot has already been put into perspective in terms of price and IT technology. The DeepSeek scare is one of dozens of development steps that will hardly be remembered in a few months' time (though DeepSeek itself will be).
But there is no doubt that the scare caused a minor tremor, and that is where the political factor comes in. China is becoming a major player right behind the US, in the long run with or without illegally imported US hardware. And with that China gains influence: even if the AI runs on-premises in island mode, this LLM (like Qwen) has learned its own truths...