Chinese startup DeepSeek’s AI Assistant on Monday overtook rival ChatGPT to become the top-rated free application available on Apple’s App Store in the United States.
Powered by the DeepSeek-V3 model, which its creators say “tops the leaderboard among open-source models and rivals the most advanced closed-source models globally”, the artificial intelligence application has surged in popularity among U.S. users since it was released on Jan. 10, according to app data research firm Sensor Tower.
The milestone highlights how DeepSeek has left a deep impression on Silicon Valley, upending widely held views about U.S. primacy in AI and the effectiveness of Washington’s export controls targeting China’s advanced chip and AI capabilities.
AI models from ChatGPT to DeepSeek require advanced chips to power their training. The Biden administration has since 2021 widened the scope of bans designed to stop these chips from being exported to China and used to train Chinese firms’ AI models.
However, DeepSeek researchers wrote in a paper last month that the DeepSeek-V3 model was trained using Nvidia’s H800 chips at a cost of less than $6 million.
Although that detail has since been disputed, the claim that the chips used were less powerful than the most advanced Nvidia products Washington has sought to keep out of China, together with the relatively low reported training cost, has prompted U.S. tech executives to question the effectiveness of export controls.
Little is known about the company behind DeepSeek, a small Hangzhou-based startup founded in 2023, the same year search engine giant Baidu released the first Chinese AI large language model.
Since then, dozens of Chinese tech companies large and small have released their own AI models, but DeepSeek is the first to be praised by the U.S. tech industry as matching or even surpassing the performance of cutting-edge U.S. models.
It offers what some have described as “PhD-level” AI at $2.19 per million output tokens, compared with roughly $60 for similar usage from OpenAI. Some industry professionals have noted that this pricing gap has received little attention, particularly in sell-side analyses, an omission that could cloud how widely DeepSeek’s capabilities are adopted and how they are perceived.
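To put that pricing gap in concrete terms, the sketch below compares output-token costs at the two quoted rates. The rates, the 10-million-token usage figure, and the decision to ignore input-token charges are illustrative assumptions, not published billing details.

```python
# Back-of-the-envelope output-token cost comparison.
# Rates are the per-million-output-token prices quoted in the article;
# real bills also include input-token charges, omitted here for simplicity.

RATES_PER_MILLION = {
    "DeepSeek R1": 2.19,                 # USD per 1M output tokens (quoted rate)
    "OpenAI (comparable tier)": 60.00,   # USD per 1M output tokens (quoted rate)
}

def output_cost(tokens: int, rate_per_million: float) -> float:
    """Cost in USD for generating `tokens` output tokens at the given rate."""
    return tokens / 1_000_000 * rate_per_million

if __name__ == "__main__":
    tokens = 10_000_000  # e.g. ten million output tokens in a month (hypothetical)
    for name, rate in RATES_PER_MILLION.items():
        print(f"{name}: ${output_cost(tokens, rate):,.2f}")
    # Prints roughly $21.90 vs $600.00, about a 27x difference on output tokens alone.
```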
The company has also been commended for its use of reinforcement learning techniques, which reportedly lower training costs and reduce complexity. DeepSeek claims that a 1.5 billion-parameter distilled version of its R1 model outperforms established models, including GPT-4 and Claude 3.5, on select tasks while requiring minimal hardware; the distilled model is reportedly small enough to run on a device such as an iPhone 16.
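As a hedged illustration of what “minimal hardware requirements” can mean in practice, the sketch below loads a roughly 1.5 billion-parameter checkpoint with the Hugging Face transformers library and runs a single prompt. The model identifier, precision choice, and prompt are assumptions for demonstration only and say nothing about the iPhone deployment path mentioned above.

```python
# Minimal sketch: running a ~1.5B-parameter distilled reasoning model locally.
# Assumes the `transformers` and `torch` packages are installed and that the
# "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" checkpoint is the intended model;
# both are assumptions made for illustration.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # half precision keeps the weights to a few GB
    device_map="auto",          # CPU, a single GPU, or Apple Silicon via MPS
)

prompt = "Explain why export controls target advanced AI chips."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```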
Despite these breakthroughs, questions have been raised about DeepSeek’s originality. Critics have likened its approach to copying rather than inventing, pointing to remarks from prominent figures such as OpenAI’s Sam Altman, who has cautioned that derivative models carry risks for long-term success. DeepSeek’s reported $6 million training budget has also drawn skepticism about whether its approach can scale against competitors backed by far larger investments.