More information and confirmation on DeepSeek! SemiAnalysis has published a detailed report confirming that the $6 million cost figure is misleading and that the company has access to tens of thousands of GPUs.
TL;DR (free version):
✅ $6 million "training cost" is misleading - not including infrastructure and operating costs ($1.3 billion server CapEx, $715 million operating costs)
✅ ~50,000+ GPUs (H100/H800/H20)
🆕 Can subsidize pricing for inference (50% cheaper than GPT-4o) to gain market share
🆕 Export control loopholes enabled the expansion of GPU clusters worth $1.3 billion before the H20 restrictions
✅ Multi-head latent attention (MLA), multi-token prediction, and mixture-of-experts increase efficiency (a toy mixture-of-experts sketch follows the list)
🆕 DeepSeek R1's reasoning is good, but Google's Gemini 2.0 Flash Thinking is comparable and cheaper.
✅ ~150 employees, with competitive salaries of up to $1.3 million for top talent
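
On the efficiency point: the core idea behind mixture-of-experts is that only a few "expert" sub-networks run per token, so parameter count grows much faster than per-token compute. Below is a minimal, illustrative top-k routing sketch; the expert count, dimensions, and top_k value are made-up assumptions for demonstration, not DeepSeek's actual architecture.

```python
# Toy top-k mixture-of-experts (MoE) layer.
# Illustrative only: expert count, hidden sizes, and top_k are assumptions,
# not DeepSeek's implementation details.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                        # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # renormalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, which is why MoE
        # adds parameters without a proportional increase in compute.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out


if __name__ == "__main__":
    moe = TinyMoE()
    tokens = torch.randn(16, 64)   # 16 tokens, model dim 64
    print(moe(tokens).shape)       # torch.Size([16, 64])
```

With top_k=2 out of 8 experts, each token touches only a quarter of the expert parameters per forward pass, which is the rough intuition behind the efficiency claim.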