AI Factory
Why is it called "AI Factory"?
The term was popularized by NVIDIA's Jensen Huang. The analogy is:
Traditional Factory:     AI Factory:
Raw materials            Raw Data
      ↓                      ↓
Assembly line            GPU Compute
      ↓                      ↓
Finished product         Trained AI Model / Tokens

Just like a factory mass-produces physical goods, an AI Factory mass-produces intelligence: tokens, embeddings, model outputs.
Key characteristics that make it a "factory":
| Factory concept | AI Factory equivalent |
|---|---|
| Assembly line | GPU pipeline (data → training → inference) |
| Raw material | Data (text, images, video) |
| Machines | GPUs (H100, H200, B200) |
| Factory floor | Data center / GPU cluster |
| Output | Trained models, tokens, predictions |
| Throughput metric | Tokens/sec, FLOPS |
| Uptime = revenue | GPU utilization = revenue |
The entire infrastructure — networking, power, cooling, storage — is engineered around one goal: keep GPUs busy 100% of the time.
What does "Rail-Optimized" mean?
A rail is a dedicated network path connecting the same-numbered NIC port on every GPU server to one specific ToR switch.
Non-Rail (Traditional) topology:
Server has 1 uplink → ToR
All GPU traffic shares the same path

Server
  │
NIC (single uplink)
  │
ToR

- Simple, but a bottleneck: all 8 GPUs fight for one link
- Poor ECMP for RoCE (all flows take the same path)
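To put a number on the single-uplink bottleneck, here is a minimal sketch of the worst-case fair share each GPU gets when all of them transmit at once. The 400 Gb/s link speed and the name `per_gpu_share_gbps` are assumptions chosen for illustration, not figures from this post, and real traffic will not divide this evenly.

```python
def per_gpu_share_gbps(link_gbps: float, gpus_sharing: int) -> float:
    """Worst-case fair share of one link when every GPU sends at once."""
    if gpus_sharing < 1:
        raise ValueError("gpus_sharing must be at least 1")
    return link_gbps / gpus_sharing


LINK_GBPS = 400.0  # assumed link speed, hypothetical value for illustration

# Traditional topology: 8 GPUs contend for the single uplink.
shared = per_gpu_share_gbps(LINK_GBPS, gpus_sharing=8)
# A GPU that has a link to itself keeps the full rate.
dedicated = per_gpu_share_gbps(LINK_GBPS, gpus_sharing=1)

print(f"shared uplink:  {shared:.0f} Gb/s per GPU")
print(f"dedicated link: {dedicated:.0f} Gb/s per GPU")
```

Under these assumptions the shared case leaves each GPU one eighth of the link, which is the contention the bullets above describe.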
Rail-Optimized topology:
GPU Server (8x GPUs)
GPU0 ── NIC0 ──────────────── Rail-ToR-0
GPU1 ── NIC1 ──────────────── Rail-ToR-1
GPU2 ── NIC2 ──────────────── Rail-ToR-2
GPU3 ── NIC3 ──────────────── Rail-ToR-3
GPU4 ── NIC4 ──────────────── Rail-ToR-4
GPU5 ── NIC5 ──────────────── Rail-ToR-5
GPU6 ── NIC6 ──────────────── Rail-ToR-6
GPU7 ── NIC7 ──────────────── Rail-ToR-7
Each GPU in the server has its own dedicated NIC and its own rail (ToR switch), so the GPUs inside a server never share an uplink.
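The wiring rule in the diagram can be sketched as a small mapping: GPU i on every server attaches to Rail-ToR-i. The server names and the helpers `rail_switch` and `build_rails` are hypothetical, introduced only to illustrate the pattern, and the sketch assumes every server has the same 8 GPUs shown above.

```python
from collections import defaultdict

GPUS_PER_SERVER = 8  # matches the 8-GPU server in the diagram


def rail_switch(gpu_index: int) -> str:
    """Name of the rail ToR that a given GPU index connects to."""
    return f"Rail-ToR-{gpu_index}"


def build_rails(servers: list[str]) -> dict[str, list[tuple[str, int]]]:
    """Group (server, gpu_index) endpoints by the rail switch they attach to."""
    rails = defaultdict(list)
    for server in servers:
        for gpu in range(GPUS_PER_SERVER):
            rails[rail_switch(gpu)].append((server, gpu))
    return dict(rails)


# Hypothetical two-server cluster.
rails = build_rails(["server-a", "server-b"])
print(len(rails))           # number of rails: one per GPU index
print(rails["Rail-ToR-3"])  # GPU3 of every server lands on the same rail
```

With two servers this yields 8 rails, and `Rail-ToR-3` holds GPU3 from both `server-a` and `server-b`. A rail is shared across servers but never between GPUs of the same server.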