A Retired Microsoft Engineer's Authoritative Take on the DeepSeek R1 Model (Chinese and English subtitles)



 

Hey, I'm Dave. Welcome to my shop. I'm Dave Plummer, a retired software engineer from
Microsoft going back to the MS-DOS and Windows 95 days, and today we're tackling a seismic
shift in the world of technology. The release of China's open source AI model DeepSeek R1.
This development has been described as nothing less than a Sputnik moment by Marc Andreessen
and for good reason. Just as the launch of Sputnik challenged assumptions about American
technological dominance in the 20th century, DeepSeek R1 is forcing a reckoning in the
21st. For years, many believed that the race for AI supremacy was firmly in the hands of
the established players like OpenAI and Anthropic. But with this breakthrough, a new competitor
has not just entered the field, they've also seriously outpaced expectations. If you care
about the future of AI innovation and global technological competition, you'll want to
understand what DeepSeek R1 is, why it matters, whether it's just a giant psyop, and what
it means for the world at large. Let's dive in.
To set the stage, here's the part that really upset the industry and sent the stocks of
companies like Nvidia and Microsoft reeling. Not only does DeepSeek R1 meet or exceed the
performance of the best American AI models like OpenAI's O1, its creators did it on the cheap,
reportedly for under $6 million. And when you compare that to the tens of billions already
invested here, if not more, to achieve similar results, not to mention the $500 billion discussion
around Stargate, it's cause for alarm. Because not only does China claim to have done it
cheaply, but they reportedly did it without access to the latest of Nvidia's chips. If
true, it's akin to building a Ferrari in your garage out of spare Chevy parts. And if you
can throw together a Ferrari in your shop on your own and it's really just as good as
a regular Ferrari, what do you think that does to Ferrari prices? So it's a little bit
like that.
And just what is DeepSeek R1? It's a new language model designed to offer performance that punches
above its weight. Trained on a smaller scale, but still capable of answering questions,
generating text and understanding context. And what sets it apart isn't just the capabilities,
but the way that it's been built. DeepSeek is designed to be cheap, efficient and surprisingly
resourceful, leveraging larger foundational AIs like OpenAI's GPT-4 or Meta's Llama as
scaffolding to create something much smaller.
Let's unpack that. Because at its core, DeepSeek R1 is a distilled language model. When you
train a large AI model, you end up with something massive, hundreds of billions if not a trillion
parameters, consuming terabytes of data and requiring a data center's worth of GPUs just
to function. But what if you don't need all that power for most tasks?
And that's where the idea of distillation comes in. You take a larger model like a GPT-4
or the 671 billion parameter behemoth R1 and you use it to train the smaller ones. It's
like a master craftsman teaching an apprentice. You don't need the apprentice to know everything,
just enough to do the actual job really well.
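
To make the distillation idea concrete, here's a minimal sketch of the classic teacher-student setup in PyTorch: the small model is nudged toward the large model's softened output distribution with a KL-divergence loss. This is only an illustration of the general technique, not DeepSeek's actual training code, and it assumes Hugging Face-style models whose outputs expose a .logits field.

    import torch
    import torch.nn.functional as F

    def distillation_step(student, teacher, input_ids, optimizer, temperature=2.0):
        # The teacher is frozen; we only read its predictions.
        with torch.no_grad():
            teacher_logits = teacher(input_ids).logits
        student_logits = student(input_ids).logits

        # Soften both distributions, then minimize the KL divergence between them.
        t_probs = F.softmax(teacher_logits / temperature, dim=-1)
        s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
        loss = F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()
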
DeepSeek R1 takes this approach to an extreme. By using larger models to guide its training,
DeepSeek's creators have managed to compress the knowledge and reasoning capabilities of
much bigger systems into something far smaller and more lightweight. The result? A model
that doesn't need massive data centers to operate. You can run these smaller variants
on a decent consumer-grade CPU or even a beefy laptop, and that's a game changer.
But how does this work? Well, it's a bit like teaching by example. Let's say you have a
large model that knows everything about astrophysics, Shakespeare, and Python coding. And instead
of trying to replicate that raw computational power, DeepSeek R1 is trying to mimic the
outputs of the larger model for a wide range of questions and scenarios. By carefully selecting
examples and iterating over the training process, you can teach the smaller model to produce
similar answers without needing to store all that raw information itself. It's kind of
like copying the answers without the entire library.
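
In practice, "copying the answers" usually means collecting the big model's responses to a broad set of prompts and then fine-tuning the small model on those pairs as ordinary supervised data. A rough sketch of that data-collection step follows; the toy_teacher function is a stand-in I made up so the example runs, not a real model.

    def build_distillation_dataset(generate_answer, prompts):
        """Collect (prompt, answer) pairs from a larger 'teacher' model.
        `generate_answer` is whatever callable wraps your teacher model."""
        return [{"prompt": p, "response": generate_answer(p)} for p in prompts]

    # Toy stand-in for a real teacher, just so the sketch runs end to end.
    def toy_teacher(prompt):
        return "A placeholder answer to: " + prompt

    prompts = ["Explain redshift in one sentence.", "Summarize Hamlet briefly."]
    dataset = build_distillation_dataset(toy_teacher, prompts)
    # The smaller model is then fine-tuned on `dataset` with ordinary supervised training.
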
And here's where it gets even more interesting. Because DeepSeek didn't just rely on a single
large model for the process, it used multiple AIs, including some open source ones like
Meta's Llama, to provide diverse perspectives and solutions during the training. Think
of it as assembling a panel of experts to train one exceptionally bright student. By
combining insights from different architectures and datasets, DeepSeek R1 achieves a level
of robustness and adaptability that's rare in such a small model.
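
The panel-of-experts idea can be sketched the same way: put the same question to several teacher models and keep the answer they agree on. The lambdas below are toy stand-ins for real models, purely to show the shape of the idea rather than DeepSeek's actual pipeline.

    from collections import Counter

    def panel_answer(teacher_callables, prompt):
        """Ask several teachers the same question and keep the answer they
        most often agree on -- a simple majority-vote ensemble."""
        answers = [ask(prompt) for ask in teacher_callables]
        return Counter(answers).most_common(1)[0][0]

    teachers = [lambda p: "42", lambda p: "42", lambda p: "41"]
    print(panel_answer(teachers, "What is six times seven?"))   # -> "42"
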
It's too early to draw very many conclusions, but the open source nature of the model means
that any biases or filters built into the model should be discoverable in the publicly
available weights. Which is a fancy way of saying that it's hard to hide that stuff when
the model is open source.
In fact, one of my first tests was to ask DeepSeek what famous photo depicts a man standing
in front of a line of tanks. It correctly answered the Tiananmen Square protests, the
details of the photo, who took it, and even the censorship issues surrounding it.
Of course, the online version of DeepSeek may be completely different because I'm running
it offline locally, and who knows what version they get within China, but the public version
that you can download seems solid and reliable.
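
If you want to repeat that kind of test yourself, the published weights can be loaded with standard tooling. Here's a minimal sketch using the Hugging Face transformers library; the checkpoint name shown is my assumption about which distilled variant you might grab, so substitute whichever one you actually download.

    from transformers import pipeline

    # Assumed checkpoint name; swap in whichever distilled R1 variant you pulled.
    generator = pipeline(
        "text-generation",
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
        device_map="auto",
    )

    prompt = "What famous photo depicts a man standing in front of a line of tanks?"
    result = generator(prompt, max_new_tokens=200)
    print(result[0]["generated_text"])
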
So why does all this matter? For one, it dramatically lowers the barrier to entry for AI. Instead
of requiring massive infrastructure and your own nuclear power plant to deploy a large
language model, you could potentially get by with a much smaller setup. That's good
news for smaller companies, research labs, or even hobbyists looking to experiment with
AI without breaking the bank.
In fact, I'm running it on our AMD Threadripper that's equipped with an Nvidia RTX 6000 Ada
GPU that has 48GB of VRAM, and I can run the very largest 671 billion parameter model and
it still generates more than 4 tokens per second. And even the 32 billion parameter version runs
nicely on my MacBook Pro, and the smaller ones run all the way down to the $249 Orin Nano.
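
As a rough sanity check on what fits where, you can estimate a model's weight footprint as parameter count times bytes per parameter at a given quantization level. The little calculation below is back-of-the-envelope only, ignoring the KV cache and runtime overhead, and isn't a measured figure for any particular build.

    def weight_footprint_gb(params_billion, bits_per_param):
        """Approximate size of the weights alone, ignoring KV cache and overhead."""
        return params_billion * 1e9 * bits_per_param / 8 / 1e9

    for params in (7, 32, 671):
        for bits in (16, 4):
            print(f"{params}B params at {bits}-bit ~ {weight_footprint_gb(params, bits):.0f} GB")
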
But there's a catch. Building something on the cheap has some risks. For starters, smaller
models often struggle with the breadth and depth of knowledge that the larger ones have.
They're more prone to hallucinations, sometimes generating confident but incorrect responses,
and they might not be as good at handling highly specialized or nuanced queries. Additionally,
because these smaller models rely on training data from the larger ones, they're only as
good as their teachers. So if there are errors or biases in the large models that they train
on, those issues can trickle down into the smaller ones.
And then there's the issue of scaling. DeepSeek's efficiency is impressive, but it also highlights
the tradeoffs involved. By focusing on cost and accessibility, DeepSeek R1 might not compete
directly with the biggest players in terms of cutting edge capabilities. Instead, it
carves out an important niche for itself as a practical, cost-effective alternative.
In some ways, this approach reminds me a bit of the early days of personal computing. Back
then you had massive mainframes dominating the industry, and then along came these scrappy
little PCs that couldn't quite do everything but were good enough for a lot of the
world. Fast forward a few decades and the PC revolutionized computing. DeepSeek might not
be GPT-5, but it could pave the way for a more democratized AI landscape where advanced
tools aren't confined to a handful of tech giants.
The implications here are huge. Imagine AI models tailored to specific industries, running
on local hardware for privacy and control, or even embedded in devices like smartphones
and smart home hubs. The idea of having your own personal AI assistant, one that doesn't
rely on a massive cloud backend, suddenly feels a lot more attainable. Of course, the
road ahead isn't without its challenges. DeepSeek and models like it must prove that they can
handle real-world tasks reliably, scale effectively, and continue to innovate in a space dominated
so far by much larger competitors.
But if there's one thing we've learned from the history of technology, it's that innovation
doesn't always come from the biggest players. Sometimes all it takes is a fresh perspective
and a willingness, or sometimes a necessity, to do things differently. DeepSeek R1 signals
that China is not just a participant in the global AI race, but a formidable competitor
capable of producing cutting-edge open-source models. For American AI companies like OpenAI,
Google, DeepMind, and Anthropic, this creates a dual challenge - maintaining technological
leadership and justifying the price premium in the face of increasingly capable, cost-effective
alternatives.
So what are the implications for American AI? Well, open-source models like DeepSeek
R1 allow developers worldwide to innovate at lower cost. This could undermine the competitive
advantage of proprietary models, particularly in areas like research and small to medium
enterprise adoption. US companies that rely heavily on subscription or API-based revenue
could feel the squeeze, potentially dampening investor enthusiasm.
The release of DeepSeek R1 as open-source software also democratizes access to powerful AI capabilities.
Companies and governments around the world can build upon its foundation without the
licensing fears or the restrictions imposed by US firms. This could accelerate AI adoption
globally but reduce demand for US-developed models, impacting revenue streams for firms
like OpenAI and Google Cloud. In the stock market, companies heavily reliant on AI licensing,
cloud infrastructure, NVIDIA's chips, or API integrations could face downward pressure
as investors factor in lower projected growth or increased competition.
In the intro, I made a side reference to the potential of this psyop angle, and while I'm
not much of a conspiracy theorist myself, some have argued that perhaps we should not
take the Chinese at their word when it comes to how the model was produced. If it really
was produced on second-tier hardware for just a few million dollars, it's major. But some
argue that perhaps China invested heavily at the state level to assist, hoping to upset
the status quo in America by making what is supposed to be very hard look cheap
and easy. But only time will tell.
So that's DeepSeek R1 in a nutshell. A scrappy little AI, punching above its weight, built
using clever techniques and designed to make advanced AI accessible to more people than
ever before. It's not perfect, it's not trying to be, but it's a fascinating glimpse into
what the future of AI might look like - lightweight, efficient, and a little rough around the edges,
but full of potential.
Now if you found this little explainer on DeepSeek to be any combination of informative
or entertaining, remember that I'm mostly in this for the subs and likes, so I'd be
honoured if you'd consider subscribing to my channel to get more like it.
There's also a share button down at the bottom here, so somewhere in your toolbar there'll
be a forward icon you can click to send this to somebody else who you
think probably wants to be educated and just doesn't know about this channel. So if you
want to tell them about DeepSeek R1, send them a link to this video.
If you have an interest in matters related to the autism spectrum, check out the free
sample of my book on Amazon. It's everything I know now about living your best life on
the spectrum that I wish I'd known long ago.
In the meantime and in between time, hope to see you next time, right here in Dave's
Garage.
 

 
