DeepSeek - Pride, fear, disbelief, disgust

JaminBall
01-28

The debates around DeepSeek are intense - US vs. China, big vs. small models, open vs. closed source, and the shockingly efficient architecture it represents. Pride, fear, disbelief, disgust - all these emotions have clouded the facts. A few personal thoughts:

Thoughts on Training Costs:

1️⃣ $6M Training Costs = Plausible IMO

Quick math: training cost ∝ (active params × tokens). DeepSeek v3 (37B active params; 14.8T tokens) vs. Llama3.1 (405B params; 15T tokens): (37 × 14.8) / (405 × 15) ≈ 9%, so v3 should theoretically cost ~9% of Llama3.1's run. The disclosed figures align with this back-of-the-envelope math, meaning the numbers are directionally believable.
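
For the skeptics, here is the arithmetic spelled out. This is a back-of-the-envelope sketch using the standard ~6 × params × tokens FLOPs rule of thumb for transformer training (the constant cancels in the ratio, so only active params and tokens matter):

```python
# Back-of-the-envelope training-compute comparison.
# Rule of thumb: training FLOPs ≈ 6 * active_params * tokens.
# Cost is roughly proportional to FLOPs, so the constant 6 cancels in the ratio.

deepseek_v3 = {"active_params": 37e9,  "tokens": 14.8e12}  # MoE: 37B active of 671B total
llama3_405b = {"active_params": 405e9, "tokens": 15e12}    # dense: all params active

def train_flops(m):
    return 6 * m["active_params"] * m["tokens"]

ratio = train_flops(deepseek_v3) / train_flops(llama3_405b)
print(f"v3 / Llama3.1-405B compute ratio: {ratio:.1%}")    # ~9.0%
```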


Plus, there was no hiding it - the footnote clearly said: “the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.”

Comparing training costs between models trained at different times is inherently flawed: costs have been dropping exponentially thanks to advances in compute and algorithms. So saying DeepSeek v3 (Jan 2025) is 1/10th the training cost of Meta's Llama3.1 (July 2024) is very misleading without that context.

Pre-training a model with hundreds of billions of parameters in the U.S. today costs <$20M (go ask the engineers who actually build LLMs). DeepSeek may be ~50% more cost-effective than its U.S. peers - which seems entirely plausible to me! It's like how a smaller-engine Japanese car can perform comparably to a larger-engine American car, thanks to engineering breakthroughs like turbocharging and lightweight design.

2️⃣ Training vs. R&D Costs

It’s tricky for any lab to cleanly define training costs, since a lot of experimentation (incl. data costs) blends into the training runs.

DeepSeek's capex possibly ran ~$500M (rumored 10K A100s + 2-3K H800s) - still far less than the top U.S. labs, but a lot more than $6M.
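
A rough sanity check on that rumor. The per-GPU unit costs below are my illustrative assumptions, not disclosed figures; "all-in" (GPU + server + networking + datacenter share) typically runs well above the chip's sticker price:

```python
# Order-of-magnitude capex sketch for the rumored cluster.
# Unit costs are illustrative assumptions, NOT disclosed figures.

ALL_IN_COST = {"A100": 25_000, "H800": 40_000}  # USD per deployed GPU, assumed

cluster = {"A100": 10_000, "H800": 2_500}       # rumored counts (midpoint of 2-3K)

capex = sum(count * ALL_IN_COST[gpu] for gpu, count in cluster.items())
print(f"Estimated cluster capex: ${capex/1e6:.0f}M")  # ~$350M at these assumptions
```

Whatever unit costs you plug in, this lands in the hundreds of millions: far above $6M, far below the multi-billion-dollar clusters of the top U.S. labs.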

3️⃣ First Movers vs. Followers

Do people really not understand the massive R&D cost difference between “first-in-class” drugs and “me-too” drugs?

First movers inherently face “wasteful” R&D due to the trial-and-error nature of innovation. But when has humanity ever stopped pushing forward because of that? The effort is always worth it.

Thoughts on Inference:

Inference costs have been coming down steadily, and DeepSeek just unlocked a step-function drop - faster, cheaper, and decent quality.

This is the moment many startup founders and developers have been waiting for! Suddenly, countless applications have achieved product-market fit from a cost perspective!
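
To make "product-market fit from a cost perspective" concrete, here is a hypothetical unit-economics check. Every number below is a placeholder I made up for illustration, not a quote from any provider:

```python
# Hypothetical per-user unit economics for an LLM-backed app.
# All prices and usage figures are illustrative placeholders, not real quotes.

def monthly_cost_per_user(tokens_per_user_month, price_per_million_tokens):
    return tokens_per_user_month / 1e6 * price_per_million_tokens

usage = 2_000_000         # tokens per user per month (assumed)
revenue_per_user = 10.0   # $/month subscription (assumed)

for label, price in [("frontier-priced API", 10.0), ("DeepSeek-class pricing", 0.50)]:
    cost = monthly_cost_per_user(usage, price)
    print(f"{label}: ${cost:.2f}/user -> margin ${revenue_per_user - cost:.2f}")
```

At the higher price the app loses money on every user; at the lower one it suddenly has healthy margins. That flip is the step function.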

This should lead to a lot more inference spending, eventually.

Two Things I Believe in:

1️⃣ Better and more efficient AI models = huge tailwind for the AI supercycle.

2️⃣ DeepSeek is a win for open-source AI & brings efficiency to the whole ecosystem.

Closed-source LLMs below this level of performance are irrelevant now. The same shakeup happened after Llama3 launched, and DeepSeek is now cleaning house. Open-source ecosystems, including Meta's, will thrive on this momentum.

DeepSeek at this point is more than the company itself. It’s a proof of concept: a hyper-efficient, small model running on cost-effective infrastructure.

If all people can do is whine that the $6M figure "must be a lie" and the app-store ranking "has to be manipulated," then maybe stop talking, use your brain, and actually go read the v3/R1 papers. MoE, MLA, MTP, DualPipe, RL, FP8 - there's so much to learn if you quit parroting nonsense and actually put in the work.
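
For anyone who does want to do the homework, here is the core MoE idea in a few lines - a minimal NumPy sketch, not DeepSeek's actual implementation (their routing, MLA, and FP8 details are in the paper):

```python
import numpy as np

# Minimal sketch of top-k Mixture-of-Experts routing (not DeepSeek's actual code).
# Only k experts run per token, which is why "active params" << total params.

def moe_layer(x, gate_w, experts, k=2):
    # x: (d,) token activation; gate_w: (n_experts, d); experts: list of (d, d) matrices
    logits = gate_w @ x                        # router score per expert
    top_k = np.argsort(logits)[-k:]            # indices of the k highest-scoring experts
    weights = np.exp(logits[top_k])
    weights /= weights.sum()                   # softmax over the selected experts only
    # Weighted sum of only the selected experts' outputs.
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top_k))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
x = rng.normal(size=d)
gate_w = rng.normal(size=(n_experts, d))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
print(moe_layer(x, gate_w, experts).shape)     # (16,) -- but only 2 of 8 experts ran
```

The point: total parameter count and per-token compute are decoupled, which is exactly why the $6M math in point 1️⃣ runs off 37B active params rather than 671B total.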

My sense is that the “more compute vs. less compute” debate isn’t ending soon. But AI’s 70-year history has taught us one thing - compute is king - so maybe this time is not that different. The biggest winners I see from this are builders: this is actually the time to build!

