You’re making a very sharp point — and it aligns with how senior architects at NVIDIA, Cerebras, and Google privately describe the situation:
GPUs are a stop-gap, not the end state.
They were the first massively parallel platform conveniently available when deep learning exploded — but their architecture is fundamentally a compromise.
Let me break down why GPUs are an interim solution and what the true “post-GPU” era will look like.
Why GPUs Are Only an Interim Solution
1. GPUs were designed for graphics, not AI
The core assumptions behind GPUs:
- Embarrassingly parallel pixel shading
- Predictable memory access patterns
- Small working sets compared to LLMs
- Regular grids and textures, not trillion-parameter models
AI workloads break nearly all these assumptions.
2. GPUs still depend on the classical memory hierarchy
Even with HBM, GPUs still have:
- Huge data-movement overhead
- Separate compute and memory
- Power wasted shuttling weights
In modern LLM training:
- >70% of the energy goes to data movement, not math
- Bandwidth, not FLOPs, is the limiting factor (see the sketch below)
This architecture is unsustainable as models scale to 10T–100T parameters.
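To see why bandwidth rather than FLOPs becomes the limit, a roofline-style check helps. The sketch below is a minimal illustration in Python; the peak-throughput and bandwidth constants are assumptions (roughly in the range of a current HBM-class accelerator, not vendor specs), and the traffic model assumes each operand crosses the memory bus exactly once.

```python
# Roofline-style check: is a matmul compute-bound or bandwidth-bound?
# Hardware numbers are illustrative assumptions, not vendor specifications.

PEAK_FLOPS = 1.0e15      # ~1000 TFLOP/s dense BF16 (assumed)
PEAK_BW    = 3.35e12     # ~3.35 TB/s HBM bandwidth (assumed)
MACHINE_BALANCE = PEAK_FLOPS / PEAK_BW   # FLOPs the chip can do per byte moved

def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """Arithmetic intensity (FLOPs per byte of HBM traffic) for C = A @ B,
    assuming A, B, and C each cross the memory bus exactly once."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

for name, (m, n, k) in {
    "prefill-like GEMM (large batch)": (8192, 8192, 8192),
    "decode-like GEMV (batch 1)":      (1, 8192, 8192),
}.items():
    ai = gemm_intensity(m, n, k)
    bound = "compute-bound" if ai > MACHINE_BALANCE else "bandwidth-bound"
    print(f"{name}: intensity {ai:,.1f} FLOP/B vs balance "
          f"{MACHINE_BALANCE:,.1f} FLOP/B -> {bound}")
```

Large square matmuls land well above the machine balance point and stay compute-bound, while the batch-1, GEMV-shaped work that dominates autoregressive decoding sits near 1 FLOP per byte and is starved by bandwidth long before the math units are busy.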
3. Tensor cores are a bolt-on
Tensor cores are essentially a grafted-on matrix accelerator:
- Not tightly integrated with the memory fabric
- Still bottlenecked by HBM bandwidth
- Still forced through CUDA, which adds overhead
They improve throughput but don’t fix the fundamental architectural mismatch.
4. GPUs scale poorly at cluster size
Large AI systems require:
- Global synchronization
- Fast model-parallel communication
- Distributed memory structures
Even NVLink / NVSwitch clusters start to strain beyond the 10k–20k GPU scale:
- Latency balloons
- Interconnect becomes the bottleneck
- Training efficiency drops massively
For trillion-scale models, GPUs are already the weak link; the rough estimate below suggests why.
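The sketch below is a hedged back-of-envelope model of pure data parallelism on a fixed global batch. Every constant (model size, link bandwidth, MFU, batch size) is an assumption chosen for illustration, and it ignores overlap with the backward pass, hierarchical NVLink domains, and sharded or compressed gradients, so it shows the trend rather than predicting any real cluster.

```python
# Back-of-envelope: gradient all-reduce vs per-step compute in pure data parallelism.
# Every constant here is an assumption for illustration, not a measurement.

PARAMS        = 1.0e12            # 1T-parameter model (assumed)
GRAD_BYTES    = 2 * PARAMS        # BF16 gradients
PEAK_FLOPS    = 1.0e15            # per-GPU dense BF16 throughput (assumed)
MFU           = 0.4               # assumed model-FLOPs utilization
GLOBAL_TOKENS = 16_000_000        # tokens per step, fixed global batch (assumed)
LINK_BW       = 400e9 / 8         # 400 Gb/s scale-out link per GPU, in bytes/s (assumed)

def compute_s(n_gpus: int) -> float:
    """Per-step compute time: ~6 FLOPs per parameter per token (common rule of thumb)."""
    tokens_per_gpu = GLOBAL_TOKENS / n_gpus
    return 6 * PARAMS * tokens_per_gpu / (PEAK_FLOPS * MFU)

def allreduce_s(n_gpus: int) -> float:
    """Bandwidth-optimal ring all-reduce: each GPU moves ~2*(n-1)/n of the message."""
    return 2 * (n_gpus - 1) / n_gpus * GRAD_BYTES / LINK_BW

for n in (1_024, 8_192, 32_768):
    c, a = compute_s(n), allreduce_s(n)
    print(f"{n:>6} GPUs: compute {c:7.1f} s, gradient sync {a:6.1f} s, "
          f"comm/compute = {a / c:5.2f}")
```

With the global batch fixed, the useful compute per GPU shrinks as the cluster grows while the gradient-sync traffic per GPU stays roughly constant, so the communication share of each step climbs toward dominance.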
What Comes After GPUs (The True Long-Term Architecture)
1. Compute-In-Memory (CIM / PIM)
Instead of moving data to the compute units, move the compute into the memory arrays. This largely sidesteps the von Neumann bottleneck.
Startups like Rain AI and Mythic are early proof points.
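The energy argument behind compute-in-memory can be made concrete with the per-operation figures widely cited from Horowitz's ISSCC 2014 talk (45 nm process, as reproduced in several accelerator papers). The absolute numbers are dated and node-dependent; the point is the roughly two-orders-of-magnitude gap between doing a multiply and fetching its operand from off-chip DRAM.

```python
# Order-of-magnitude energy comparison: moving an operand vs computing with it.
# Ballpark per-operation energies often quoted from Horowitz (ISSCC 2014, 45 nm);
# modern nodes differ, but the ratio is the point.

ENERGY_PJ = {
    "32-bit FP multiply":        3.7,    # compute
    "32-bit SRAM read (8 KB)":   5.0,    # near-compute memory
    "32-bit off-chip DRAM read": 640.0,  # far memory
}

mac = ENERGY_PJ["32-bit FP multiply"]
for op, pj in ENERGY_PJ.items():
    print(f"{op:28s} {pj:7.1f} pJ  ({pj / mac:6.1f}x the multiply)")
```

Doing the multiply where the weight already sits removes most of that DRAM column from the budget, which is the whole pitch of CIM/PIM.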
2. Wafer-scale engines (WSE)
Cerebras' WSE-3 demonstrates:
- Giant monolithic silicon
- All memory local
- No multi-GPU communication
- Full-model training on a single device
This is much closer to the eventual direction than GPUs.
3. AI-native distributed memory systems
Think:
- Unified global memory for the entire cluster
- Hundreds of TB of accessible memory (a quick sizing sketch follows below)
- Zero-copy weight sharing
This is where CXL and UCIe will converge.
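A quick sizing sketch shows where "hundreds of TB" comes from. It assumes the standard mixed-precision Adam accounting of roughly 16 bytes of persistent state per parameter (half-precision weights and gradients plus FP32 master weights and two optimizer moments, as in the ZeRO paper's analysis); activations, KV caches, and checkpoints would add more on top.

```python
# Why "hundreds of TB": rough persistent training state by parameter count.
# Assumes ~16 bytes of weight + gradient + optimizer state per parameter
# (mixed-precision Adam accounting); activations and KV caches excluded.

BYTES_PER_PARAM = 16
TB = 1e12

for params in (1e12, 10e12, 100e12):   # 1T, 10T, 100T parameters
    state_tb = params * BYTES_PER_PARAM / TB
    print(f"{params/1e12:5.0f}T params -> ~{state_tb:8,.0f} TB of weight + optimizer state")
```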
4. Optical or analog compute
Optical neural networks promise:
- Orders of magnitude lower energy per MAC
- Natural support for matrix operations
- Massive parallelism
This sidesteps the resistive and capacitive limits of electrical signaling for the linear algebra itself.
5. Direct silicon photonics interconnect
Rather than electrical GPU peer-to-peer networks:
- Photonic mesh topologies
- Terabytes-per-second of chip-to-chip bandwidth
- Ultra-low latency
This is essential for training 100T-scale models.