Nvidia, AMD, Intel GPU

Source: study169, 2024-02-22 17:40:13

Nvidia GPUs lead decisively in both performance and the development ecosystem. But how large is the performance gap, exactly? A quick web search turned up the following:

Under the right circumstances, we found that Gaudi 2 had the highest LLM training performance vs. the same-generation NVIDIA A100 and AMD MI250 GPUs, with an average speedup of 1.22x vs. the A100-80GB, 1.34x vs. the A100-40GB, and 1.59x vs. the MI250.

On each platform, we ran the same training scripts from LLM Foundry using MPT models with a sequence length of 2048, BF16 mixed precision, and the ZeRO Stage-3 distributed training algorithm. On NVIDIA or AMD systems, this algorithm is implemented via PyTorch FSDP with sharding_strategy: FULL_SHARD. On Intel systems, this is currently done via DeepSpeed ZeRO with Stage: 3 but FSDP support is expected to be added in the near future.
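The FULL_SHARD strategy mentioned above is how PyTorch FSDP expresses ZeRO Stage-3 style sharding of parameters, gradients, and optimizer state. A minimal sketch of the wrapping step is below; the tiny `Linear` model is a placeholder (the benchmark used MPT models via LLM Foundry scripts, which are not reproduced here), and a real run would first call `torch.distributed.init_process_group` on each worker:

```python
# Configuration sketch only: requesting ZeRO-3-style full sharding via PyTorch FSDP.
# A real multi-GPU run must initialize the process group first, e.g.:
#   torch.distributed.init_process_group(backend="nccl")
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

model = torch.nn.Linear(2048, 2048)  # placeholder for an actual MPT model
sharded = FSDP(
    model,
    # FULL_SHARD shards parameters, gradients, and optimizer state across
    # ranks — the FSDP equivalent of DeepSpeed ZeRO Stage 3.
    sharding_strategy=ShardingStrategy.FULL_SHARD,
)
```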

On each system, we also used the most optimized implementation of scaled-dot-product-attention (SDPA) available:

  • NVIDIA: Triton FlashAttention-2
  • AMD: ROCm ComposableKernel FlashAttention-2
  • Intel: Gaudi TPC FusedSDPA
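All three backends implement the same underlying operation, softmax(QK^T / sqrt(d)) V; the kernels above differ only in how efficiently they fuse and tile it. As a toy single-head sketch in plain Python (no batching, masking, or the fused-kernel optimizations listed above):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sdpa(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q, K, V are lists of row vectors (lists of floats).
    """
    d = len(Q[0])
    out = []
    for q in Q:
        # Attention scores of this query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With identical keys the weights are uniform, so the output is simply the mean of the value rows — a quick sanity check that the scaling and softmax are wired correctly.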

On the tooling side, PyTorch seems to be the most popular framework: the official builds support Nvidia and AMD, while Intel apparently maintains a modified version for Gaudi. In the early arms-race phase of LLMs, big companies prioritized the best-performing hardware. Once foundation models mature and the work shifts toward fine-tuning and domain adaptation, how necessary will it be to fight over the very best GPUs? At our AI R&D company of one to two thousand people, the most commonly used GPU is the A5000. So I feel Nvidia still faces some long-term risks, and I hope Jensen Huang can steer Nvidia into even more successful AI application areas. Nvidia's rise to prominence shows his long-term vision.
