These type of models need to be trained across thousands of GPUs, which requires distributed engineering on a much higher level than "normal" distributed systems.
This is true for DeepSeek as well as for others. There are a few companies giving insights or open-sourcing their approaches, such as Databricks/Mosaic and, well, DeepSeek.
The latter also did some particularly clever stuff, but if you look into details so did Mosaic.
OpenAI and Anthropic likely have distributed tools of even larger sophistication. They are just not open source.
This is true for DeepSeek as well as for others. There are a few companies giving insights or open-sourcing their approaches, such as Databricks/Mosaic and, well, DeepSeek. The latter also did some particularly clever stuff, but if you look into details so did Mosaic.
OpenAI and Anthropic likely have distributed tools of even larger sophistication. They are just not open source.