Computer Engineering

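I am an engineer who has worked with computers for many years. I mainly develop financial software of various kinds, and have also built software for some other fields.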

DeepSeek Suspected of Copying ChatGPT: A Comparative Analysis of Technology and Data Sources

(2025-02-04 20:27:56)

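Since DeepSeek appeared, there has been growing discussion about whether it copied ChatGPT's technology during development. Based on comparative experiments, this article examines whether DeepSeek borrowed ChatGPT's technology and points out the similarities and differences that may exist in their technical implementations.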

I. Approach to Verifying DeepSeek

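In general, the most direct way to check whether two systems are the same is to compare their outputs under identical inputs. If the two systems give exactly the same answer to the same question, one can infer that their algorithms or architectures are highly similar, or possibly identical. This study mainly uses the following two verification methods: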

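  1. Information consistency check
    First, retrieve the same information from two different databases and observe the output. If both databases return exactly the same result, their underlying structures are very likely the same.
  2. Using the special [MASK] token
    Use the special [MASK] token to obtain candidate words and test whether the two algorithms are equivalent. Specifically, [MASK] is a placeholder marking a position where a word must be filled in. The model reasons from the other words in the sentence (the context), predicts the most suitable word, and substitutes it at the [MASK] position. By comparing how DeepSeek and ChatGPT fill the [MASK] positions for the same input, we test whether their inference mechanisms are consistent.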
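The fill-in comparison in method 2 can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used in this study: it assumes each model returns the completed sentence with only the [MASK] slots changed, and the helper names `extract_fills` and `same_fills` are hypothetical. The template is split at each [MASK], the literal text between slots is escaped, and each slot becomes a non-greedy capture group, so the filled words can be read back and compared slot by slot.

```python
import re
from typing import List, Optional

MASK = "[MASK]"


def extract_fills(template: str, completed: str) -> Optional[List[str]]:
    """Return the words a model placed in each [MASK] slot, in order.

    Hypothetical helper: assumes the model copied every non-[MASK] part of the
    template verbatim. Returns None if the completed sentence does not match.
    """
    pieces = template.strip().split(MASK)
    # Escape the literal text between slots; each slot becomes a lazy capture group.
    pattern = "(.+?)".join(re.escape(piece) for piece in pieces)
    match = re.fullmatch(pattern, completed.strip())
    return list(match.groups()) if match else None


def same_fills(template: str, output_a: str, output_b: str) -> bool:
    """True if both outputs match the template and fill every slot identically."""
    fills_a = extract_fills(template, output_a)
    fills_b = extract_fills(template, output_b)
    return fills_a is not None and fills_a == fills_b


template = ("Up to 30 [MASK] and babies died at Furness General Hospital because of "
            "failings by staff and management, a damning report is [MASK] to reveal.")
chatgpt = ("Up to 30 mothers and babies died at Furness General Hospital because of "
           "failings by staff and management, a damning report is expected to reveal.")
deepseek = chatgpt  # Example 1 reports identical output from both models

print(extract_fills(template, chatgpt))         # ['mothers', 'expected']
print(same_fills(template, chatgpt, deepseek))  # True
```

Escaping the literal pieces matters for inputs such as Example 5, whose text contains `$`, a regex metacharacter.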

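We randomly selected 67 test samples for a side-by-side comparison and found that DeepSeek and ChatGPT are highly similar. Specific examples and their results follow.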
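The article does not say how "high similarity" over the 67 samples was scored. One simple possibility, offered here only as an assumption, is exact-match agreement: the share of samples in which every [MASK] fill matches, plus the share of individual slots that match. The function name `agreement_rates` and the case-insensitive comparison are illustrative choices, not the author's method.

```python
from typing import List, Sequence, Tuple

FillPair = Tuple[List[str], List[str]]  # (model A's fills, model B's fills) for one sample


def agreement_rates(pairs: Sequence[FillPair]) -> Tuple[float, float]:
    """Return (sample-level, slot-level) exact-match agreement between two models.

    A sample agrees only if every [MASK] slot matches; slot-level agreement
    counts each slot on its own. Comparison ignores case and outer whitespace.
    """
    sample_hits = slot_hits = slot_total = 0
    for fills_a, fills_b in pairs:
        if len(fills_a) != len(fills_b):
            raise ValueError("both models must fill the same number of slots")
        matches = [a.strip().lower() == b.strip().lower()
                   for a, b in zip(fills_a, fills_b)]
        sample_hits += all(matches)
        slot_hits += sum(matches)
        slot_total += len(matches)
    if not pairs or slot_total == 0:
        raise ValueError("need at least one sample with at least one [MASK] slot")
    return sample_hits / len(pairs), slot_hits / slot_total


# Fills reported in Examples 1-5 as (ChatGPT, DeepSeek).
examples = [
    (["mothers", "expected"], ["mothers", "expected"]),
    (["Marrakech", "breathtaking"], ["Marrakech", "breathtaking"]),
    (["journey"], ["journey"]),
    (["cheap"], ["cheap"]),
    (["okay deal"], ["okay deal"]),
]
print(agreement_rates(examples))  # (1.0, 1.0)
```

Reporting both rates matters for multi-slot sentences such as Example 1: one mismatched slot would fail the whole sample, yet still leave one of its two slots counted as agreeing.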

Example 1

Input sentence:
Up to 30 [MASK] and babies died at Furness General Hospital because of failings by staff and management, a damning report is [MASK] to reveal.

ChatGPT output:
Up to 30 mothers and babies died at Furness General Hospital because of failings by staff and management, a damning report is expected to reveal.

DeepSeek output:
Up to 30 mothers and babies died at Furness General Hospital because of failings by staff and management, a damning report is expected to reveal.

In this example, DeepSeek and ChatGPT filled the [MASK] positions with exactly the same words, "mothers" and "expected".

Example 2

Input sentence:
Taking in the sights of [MASK] is nothing short of [MASK], with its famous souk which houses over 60,000 stalls full of colorful handmade wares.

ChatGPT output:
Taking in the sights of Marrakech is nothing short of breathtaking, with its famous souk which houses over 60,000 stalls full of colorful handmade wares.

DeepSeek output:
Taking in the sights of Marrakech is nothing short of breathtaking, with its famous souk which houses over 60,000 stalls full of colorful handmade wares.

In this example, DeepSeek and ChatGPT again predicted exactly the same words, "Marrakech" and "breathtaking", showing that the two behave consistently in inference and prediction.

Example 3

Input sentence:
Ultimately, the goal of life is a [MASK] that each individual must define for themselves based on their own beliefs, values, and experiences.

ChatGPT output:
Ultimately, the goal of life is a journey that each individual must define for themselves based on their own beliefs, values, and experiences.

DeepSeek output:
Ultimately, the goal of life is a journey that each individual must define for themselves based on their own beliefs, values, and experiences.

In this example, the fill word "journey" is identical in the DeepSeek and ChatGPT outputs.

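Examples 4 and 5 compare long passages made up of several parts. A small difference in any part can make the outputs diverge, and a different algorithm can also produce a different result. If the input information is identical and the algorithm is also identical, however, the results should be the same.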
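Because long outputs diverge on the smallest change, an exact comparison is also sensitive to formatting noise unrelated to the model itself, such as quotation marks wrapping the reply (Example 5's outputs are quoted, Example 4's are not) or curly versus straight quotes. The sketch below assumes a few normalization rules of our own choosing (Unicode NFKC, ASCII quotes, collapsed whitespace, stripped outer quotes, lowercase), canonicalizes both responses, and reports where they first differ. `normalize` and `first_difference` are hypothetical names, and the curly-quoted string is a made-up variant of Example 5, not a recorded output.

```python
import unicodedata

# Assumed mapping of typographic quotes to their ASCII equivalents.
QUOTE_MAP = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def normalize(text: str) -> str:
    """Canonicalize a model response before exact comparison (assumed rules)."""
    text = unicodedata.normalize("NFKC", text).translate(QUOTE_MAP)
    text = " ".join(text.split())   # collapse every whitespace run to one space
    text = text.strip('"').strip()  # drop quotation marks wrapping the whole reply
    return text.lower()


def first_difference(a: str, b: str) -> int:
    """Index of the first differing character after normalization, or -1 if equal."""
    na, nb = normalize(a), normalize(b)
    for i, (ca, cb) in enumerate(zip(na, nb)):
        if ca != cb:
            return i
    # One string is a prefix of the other: they differ where the shorter one ends.
    return -1 if len(na) == len(nb) else min(len(na), len(nb))


# Hypothetical curly-quoted variant vs. the straight-quoted text shown in Example 5.
variant = "\u201cFor less than $1 per pack, it\u2019s an okay deal.\u201d"
shown = '"For less than $1 per pack, it\'s an okay deal."'
print(first_difference(variant, shown))      # -1 (identical after normalization)
print(first_difference("cheap", "cheaper"))  # 5
```

Returning a position rather than a bare True/False helps with long passages like Example 4, where the interesting question is which part of the text diverged.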

Example 4

Input sentence:
I purchased the variety pack of Martinsons and the Brown Gold they are both from the same vendor. I did not like the taste of the Martinson product at all. I threw it away, not the product I expected at all. The Brown Gold was more palatable, however certain flavors with this brand also feel short in particular the Costa Rican named product was extremely too bitter. I would agree with most that this does give you a lower cost per serving, but when the coffee is inferior and the taste match is also, I think the best descriptor is [MASK].

ChatGPT output:
cheap

DeepSeek output:
cheap

In this example, both models fill the [MASK] position with "cheap", further indicating that their algorithms and inference mechanisms are the same.

 

Example 5

Input sentence:
Not bad. "These are small and very salty. The taste is good, but very strong, so it's a good thing the package contains a small amount. It only takes a few little crisps to cure my salty/crunchy craving. I can snack on one package for an entire day. Of course, these would not be a good snack if you're very hungry, because there isn't enough there to fill you up. For less than $1 per pack, it's an [MASK].

ChatGPT output:
"For less than $1 per pack, it's an okay deal."

DeepSeek output:
"For less than $1 per pack, it's an okay deal."

In this example, DeepSeek and ChatGPT both filled the [MASK] position with "okay deal", producing exactly the same sentence: "For less than $1 per pack, it's an okay deal."

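From the comparative experiments and technical analysis above, we conclude that in the [MASK] tests DeepSeek and ChatGPT produced identical outputs on every sample, indicating that they use the same inference algorithm, technical framework, and data sources. Given the high degree of similarity between DeepSeek and ChatGPT, DeepSeek's technology may involve plagiarism.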

 

Comments
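AP33912: It is probably mainly copied from OpenAI, but there is no legal protection, because OpenAI itself copied from the New York Times, encyclopedias, and all sorts of other sources. For very recent material, or for images, DS falls short; they (OpenAI) have started guarding against it, or that is still in the works. In short, DeepSeek is just one piece of OpenAI.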
ahhhh: "Use the same inference algorithm, technical framework, and data sources"? Your understanding of AI is wrong.