Nvidia, AMD, Intel GPU

Source: study169, 2024-02-22 17:40:13

Nvidia GPUs lead decisively in both performance and the development ecosystem. But how large is the performance gap, exactly? A quick web search turned up the following:

Under the right circumstances, we found that Gaudi 2 had the highest LLM training performance vs. the same-generation NVIDIA A100 and AMD MI250 GPUs, with an average speedup of 1.22x vs. the A100-80GB, 1.34x vs. the A100-40GB, and 1.59x vs. the MI250.

On each platform, we ran the same training scripts from LLM Foundry using MPT models with a sequence length of 2048, BF16 mixed precision, and the ZeRO Stage-3 distributed training algorithm. On NVIDIA or AMD systems, this algorithm is implemented via PyTorch FSDP with sharding_strategy: FULL_SHARD. On Intel systems, this is currently done via DeepSpeed ZeRO with Stage: 3 but FSDP support is expected to be added in the near future.
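The FULL_SHARD strategy mentioned above is how PyTorch FSDP expresses ZeRO Stage-3 style sharding of parameters, gradients, and optimizer state. A minimal sketch of the wrapping step is below; the tiny `Linear` model is a placeholder (the benchmark used MPT models via LLM Foundry scripts, which are not reproduced here), and a real run would first call `torch.distributed.init_process_group` on each worker:

```python
# Configuration sketch only: requesting ZeRO-3-style full sharding via PyTorch FSDP.
# A real multi-GPU run must initialize the process group first, e.g.:
#   torch.distributed.init_process_group(backend="nccl")
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

model = torch.nn.Linear(2048, 2048)  # placeholder for an actual MPT model
sharded = FSDP(
    model,
    # FULL_SHARD shards parameters, gradients, and optimizer state across
    # ranks — the FSDP equivalent of DeepSpeed ZeRO Stage 3.
    sharding_strategy=ShardingStrategy.FULL_SHARD,
)
```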

On each system, we also used the most optimized implementation of scaled-dot-product-attention (SDPA) available:

  • NVIDIA: Triton FlashAttention-2
  • AMD: ROCm ComposableKernel FlashAttention-2
  • Intel: Gaudi TPC FusedSDPA
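All three backends implement the same underlying operation, softmax(QK^T / sqrt(d)) V; the kernels above differ only in how efficiently they fuse and tile it. As a toy single-head sketch in plain Python (no batching, masking, or the fused-kernel optimizations listed above):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sdpa(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q, K, V are lists of row vectors (lists of floats).
    """
    d = len(Q[0])
    out = []
    for q in Q:
        # Attention scores of this query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With identical keys the weights are uniform, so the output is simply the mean of the value rows — a quick sanity check that the scaling and softmax are wired correctly.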

On the tooling side, PyTorch seems to be the most popular framework: the official builds support Nvidia and AMD, while Intel apparently maintains a modified version for Gaudi. In the early arms-race phase of LLMs, big companies prioritized the best-performing hardware. Once foundation models mature and the work shifts toward fine-tuning and domain adaptation, how necessary will it be to fight over the very best GPUs? At our AI R&D company of one to two thousand people, the most commonly used GPU is the A5000. So I feel Nvidia still faces some long-term risks, and I hope Jensen Huang can steer Nvidia into even more successful AI application areas. Nvidia's rise to prominence shows his long-term vision.
