Strange: I asked the same question and got the opposite answer.

Source: 2025-01-30 11:27:19

Q: Is DeepSeek's R1 model distilled from ChatGPT?


A:

No, DeepSeek’s R1 model is not a distillation of ChatGPT. It is an independently developed model with its own architecture and training approach. While both are large language models, DeepSeek’s R1 and ChatGPT are distinct creations by different organizations.


Distillation typically involves training a smaller model to mimic the behavior of a larger model. However, DeepSeek's R1 is not known to be based on distillation from ChatGPT. Instead, it uses a Mixture-of-Experts (MoE) architecture, which differs from the dense transformer architecture used in ChatGPT. An MoE model activates only a subset of its parameters for each input during inference rather than the entire network, which makes computation more efficient.
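
For readers less familiar with the terminology, here is a minimal sketch of what knowledge distillation usually looks like in practice: a smaller "student" model is trained to match the softened output distribution of a larger "teacher" model. This is a generic illustration only (the function name, temperature, and tensor sizes are made up), not DeepSeek's or OpenAI's actual training code.

```python
# Generic knowledge-distillation loss: the student mimics the teacher's
# output distribution. Purely illustrative; not any vendor's real code.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitude stays comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy usage: random logits over a 32-token vocabulary for a batch of 4 tokens.
teacher_logits = torch.randn(4, 32)
student_logits = torch.randn(4, 32, requires_grad=True)
distillation_loss(student_logits, teacher_logits).backward()
```

By contrast, a Mixture-of-Experts layer routes each token to only a few "expert" sub-networks, so only a fraction of the parameters participate in any one forward pass. The sketch below shows top-2 routing with made-up layer sizes; it is a toy under those assumptions, not R1's actual architecture.

```python
# Toy Mixture-of-Experts layer with top-k routing. Illustrative only.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)   # router scores per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); pick the top-k experts for each token.
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

print(TinyMoE()(torch.randn(10, 64)).shape)   # torch.Size([10, 64])
```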


While both models aim to excel at language tasks, including understanding and generation, they have different technical designs and training methodologies.