AI/HPC workloads have flipped data center design. Atlas DC1 is air-cooled and traditional. AI clusters need liquid cooling, busway distribution, sub-millisecond network fabric, and 5-100× the power density. This section maps the gap.
What Makes AI/HPC Different
Atlas DC1 was designed for traditional cloud + enterprise workloads — distributed servers, mixed densities, web/database work. AI/HPC is fundamentally different. Training a single large language model can use 25,000+ GPUs in tightly-coupled clusters with sub-millisecond network coordination. The design constraints flip:
| Dimension | Traditional DC (Atlas DC1) | AI/HPC |
|---|---|---|
| Rack density | ~12 kW/rack | 30-100+ kW/rack (training); up to 200+ kW/rack for inference |
| Cooling | Air (CRAH + containment) | Liquid (DLC, immersion) — mandatory |
| Power per row | 1-1.5 MW | 5-20+ MW |
| Network | 10-100 Gbps Ethernet, ms latency OK | InfiniBand or NVLink, sub-ms latency required |
| Workload pattern | Bursty (web requests come/go) | Constant (training runs for weeks at full load) |
| Failure tolerance | Application-level (web servers fail individually) | Cluster-level (one server failing can halt a 1,000-server training run) |
| Power continuity | UPS ride-through 5 min OK | Same — but checkpointing failures can cost days of training time |
| PUE target | 1.3-1.5 | 1.05-1.2 |
| Capital cost / MW | $15-22M/MW | $25-50M/MW (cooling + network premium) |
| Build timeline | 18-24 months | 24-36 months (custom mechanical, complex commissioning) |
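To put the density gap in numbers, here is a minimal sizing sketch: the same 200-rack data hall at traditional versus AI/HPC density. The rack count, densities, and PUE values are illustrative mid-range assumptions drawn from the table above, not Atlas DC1 design data.

```python
# Sketch of the table's arithmetic: facility power for the same 200-rack data
# hall at traditional vs. AI/HPC densities. Rack count, densities, and PUE
# values are illustrative assumptions within the ranges quoted above.

def facility_power_mw(racks: int, kw_per_rack: float, pue: float) -> float:
    """Facility power = IT load x PUE, returned in MW."""
    it_load_mw = racks * kw_per_rack / 1000.0
    return it_load_mw * pue

traditional_mw = facility_power_mw(racks=200, kw_per_rack=12, pue=1.4)   # ~3.4 MW
ai_hpc_mw = facility_power_mw(racks=200, kw_per_rack=60, pue=1.1)        # ~13.2 MW

print(f"Traditional hall: {traditional_mw:.1f} MW total")
print(f"AI/HPC hall:      {ai_hpc_mw:.1f} MW total")
```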
The AI Compute Stack — What's Inside an "AI Cluster"
| Layer | Component | Power per unit |
|---|---|---|
| Accelerator | NVIDIA H100 (700W), H200 (1000W), B100/B200 (~1200W), AMD MI300 (750W), custom (Google TPU, AWS Trainium, Meta MTIA, Microsoft Maia) | 700-1200W per chip |
| Server (DGX-style) | 8 GPUs + 2 CPUs + memory + NICs | 10-12 kW per server (NVIDIA HGX H100 = 10.2 kW) |
| Rack | 4-8 servers per rack (with cooling) | 40-80+ kW/rack |
| Pod / Cluster | 16-128 racks tightly coupled by InfiniBand | 1-10+ MW per pod |
| SuperPOD / SuperCluster | Multiple pods coordinated for very large training (NVIDIA SuperPOD = 32-127 DGX systems) | |
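As a rough check on how these layers stack, the sketch below rolls power up from accelerator to pod using the table's per-layer figures; the servers-per-rack and racks-per-pod counts are assumed mid-range values, not a specific product configuration.

```python
# Roll-up from accelerator to pod using the per-layer figures above. The
# servers-per-rack and racks-per-pod counts are assumed mid-range values.

GPU_WATTS = 700           # H100-class accelerator
GPUS_PER_SERVER = 8
SERVER_KW = 10.2          # NVIDIA HGX H100 server figure from the table
SERVERS_PER_RACK = 4      # assumed (table range: 4-8)
RACKS_PER_POD = 64        # assumed (table range: 16-128)

gpu_only_kw = GPUS_PER_SERVER * GPU_WATTS / 1000     # 5.6 kW of the 10.2 kW server
rack_kw = SERVERS_PER_RACK * SERVER_KW               # ~40.8 kW/rack
pod_mw = RACKS_PER_POD * rack_kw / 1000              # ~2.6 MW of IT per pod

print(f"GPUs alone: {gpu_only_kw:.1f} kW/server, rack: {rack_kw:.1f} kW, pod: {pod_mw:.2f} MW (IT only)")
```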
AI training requires GPUs in different servers to exchange gradients continuously throughout every training step, and standard Ethernet adds too much latency for these tightly synchronized collectives. The main interconnect technologies:
| Technology | Use | Power impact |
|---|---|---|
| NVLink (NVIDIA) | GPU-to-GPU within and between servers — 900 GB/s per link, sub-microsecond latency | Switch racks (NVLink switches) consume 5-20 kW each |
| InfiniBand (Mellanox/NVIDIA) | Server-to-server within pod — 400 Gbps per port, microsecond latency | IB switches consume 1-3 kW each |
| Ethernet (RoCE) | Alternative for scale-out; emerging Ultra Ethernet | Lower than InfiniBand |
| Optical interconnect | Cross-pod cabling at 800 Gbps+ optical | Optical transceivers add 10-30W per port |
For a 10 MW AI cluster, the network fabric alone can consume 5-10% of total power — not negligible.
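A back-of-envelope check on that 5-10% figure, using the per-device power numbers from the table above. The switch and transceiver counts are illustrative assumptions, not a real bill of materials.

```python
# Back-of-envelope check on the 5-10% fabric-power claim for a 10 MW cluster,
# using the per-device figures above. Device counts are assumptions for
# illustration, not a real bill of materials.

CLUSTER_MW = 10.0

nvlink_switch_racks = 16       # assumed count; 5-20 kW each (12 kW midpoint used)
ib_switches = 200              # assumed count; 1-3 kW each (2 kW midpoint used)
optical_ports = 10_000         # assumed count; 10-30 W each (20 W midpoint used)

fabric_kw = nvlink_switch_racks * 12 + ib_switches * 2 + optical_ports * 20 / 1000
fabric_share = fabric_kw / (CLUSTER_MW * 1000)

print(f"Fabric power: {fabric_kw:.0f} kW = {fabric_share:.1%} of the 10 MW cluster")
```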
Power Distribution Architecture for AI/HPC
| Element | Atlas DC1 (traditional) | AI/HPC equivalent |
|---|---|---|
| Service voltage | 12.47 kV utility | Same OR higher (138 kV for hyperscale campuses) |
| Service transformers | 2 × 2,500 kVA | Multiple 5-30 MVA transformers (per pod) |
| Distribution voltage | 480Y/277V to 415Y/240V | 480V or 415V → some hyperscale exploring 800V DC for direct-to-server feed |
| Per-row distribution | RPP panelboard (400 A) | Bus duct (2,000-4,000 A) |
| Per-rack delivery | 30-60 A branch circuits | 100-225 A plug-in disconnect from busway |
| UPS ride-through | 5 minutes | Same OR shorter (some designs use rotary UPS for inertia + ride-through) |
| Redundancy | 2N (dual-fed servers) | 2N OR distributed redundant (4N/3) at hyperscale; some accept N+1 at module level |
| Cooling power | ~30% of IT | ~5-15% of IT (DLC much more efficient) |
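To see where the 2,000-4,000 A bus duct figure comes from, here is a quick sketch of the per-row arithmetic: how many high-density racks a 4,000 A busway at 480Y/277V can carry. The power factor and the continuous-load derate are assumed typical values, not a code calculation.

```python
import math

# Sketch of per-row distribution sizing: racks carried by one 4,000 A bus duct.
# Power factor and the 0.8 continuous-load limit (the inverse of the 1.25
# factor used in NEC-style sizing) are assumed values for illustration.

VOLTS_LL = 480        # line-to-line voltage of a 480Y/277V busway
PF = 0.95             # assumed power factor
BUSWAY_AMPS = 4000    # bus duct ampacity from the table above
RACK_KW = 60          # a mid-range AI training rack

# Three-phase current drawn by one rack: I = P / (sqrt(3) * V_LL * pf)
rack_amps = RACK_KW * 1000 / (math.sqrt(3) * VOLTS_LL * PF)   # ~76 A

# Only 80% of the busway rating is usable for continuous load
usable_amps = 0.8 * BUSWAY_AMPS                                # 3,200 A
racks_per_busway = int(usable_amps // rack_amps)               # ~42 racks

print(f"{rack_amps:.0f} A per {RACK_KW} kW rack -> ~{racks_per_busway} racks per 4,000 A busway")
```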
Worked Example — Sizing a ~2 MW AI Pod
Example · NVIDIA HGX H100 SuperPOD: sizing the electrical system for a typical AI training cluster
IT load: ~1,460 kW (GPU servers + network + storage); applying the 1.25 continuous-load factor gives an electrical design demand of ~1,825 kW
Cooling and overhead: 10% of IT (vs 30% for air-cooled) = 146 kW, plus ~50 kW facility overhead = ~200 kW
Total facility demand: 1,825 + 200 = ~2,025 kW
PUE achieved: 2,025 / 1,460 = ~1.39 (could be lower with optimized DLC)
Electrical infrastructure
Service transformer: 2,500 kVA pad-mount (or 2 × 1,500 kVA if 2N)
UPS: 2 × 1,250 kVA online double-conversion (2N for IT)
Generators: 2 × 2,500 kW Tier 4 diesel
Power distribution: 480Y/277V busway (4,000 A) down each row
Per-rack feed: 100 A plug-in disconnect (10 kW per server × 1.25 continuous-load factor ≈ 30 A, so the 100 A feed has safety margin built in)
Cooling: Direct liquid cooling (CDUs serving multiple racks); 30°C chilled-water supply
Why this single pod is bigger than half of Atlas DC1
Atlas DC1 = 2.5 MW total. This single AI pod = 1.46 MW IT (2.0 MW total facility). One pod consumes more power than HALF of Atlas DC1's design capacity. Modern hyperscale AI campuses might have 50-100 of these pods coordinating on a single training run.
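The worked example's totals can be reproduced in a few lines. The 1,460 kW IT load and the 1.25 continuous-load factor are inferred from the example's own arithmetic (1,460 × 1.25 = 1,825 kW), so treat this as a sketch rather than a design basis.

```python
# Reproducing the worked example's numbers. The 1,460 kW IT load and the 1.25
# continuous-load factor are inferred from the figures in the example itself.

IT_KW = 1460                       # GPU servers + network + storage
DESIGN_FACTOR = 1.25               # continuous-load margin (same factor as the per-rack feed)
COOLING_FRACTION = 0.10            # DLC cooling ~10% of IT (vs ~30% air-cooled)
FACILITY_OVERHEAD_KW = 50          # lighting, controls, distribution losses

electrical_demand_kw = IT_KW * DESIGN_FACTOR                        # 1,825 kW
mechanical_kw = IT_KW * COOLING_FRACTION + FACILITY_OVERHEAD_KW     # 196 kW, call it ~200 kW
total_facility_kw = electrical_demand_kw + round(mechanical_kw, -2) # ~2,025 kW
pue = total_facility_kw / IT_KW                                     # ~1.39, as in the example

print(f"Total facility demand: {total_facility_kw:.0f} kW, PUE ~{pue:.2f}")
```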
The Frontier — Coming Architectures
| Trend (2026) | Implication |
|---|---|
| 800V DC distribution | Eliminates AC-DC-AC conversion at every PSU. Pioneered by the Open Compute Project (OCP); being adopted at hyperscale. |
| Battery backup IN the rack | Replace centralized UPS with batteries at each rack — eliminates UPS losses, simplifies redundancy |
| Microgrid + on-site generation | Pair the AI campus with on-site PV + ESS + gas turbines. 100+ MW microgrids becoming common. |
| Submersion / two-phase immersion | Pushing rack densities to 200-400 kW/rack |
| Heat reuse to district heating | Data center waste heat (50-80°C with DLC) feeds neighboring buildings or even municipal heat grids (Helsinki, Stockholm) |
| Modular AI pods | Factory-built pods shipped to site; deploy in 6 months instead of 24 |
| Co-location with renewables | Build the AI campus next to wind/solar farms; long-term PPAs lock in low-cost clean power |
If You See THIS, Think THAT
| If you see… | Think / use… |
|---|---|
| "AI/HPC data center" | 30-100 kW/rack · DLC mandatory · sub-ms network · a single training run uses 1000s of GPUs |
| "NVIDIA HGX H100" / "DGX" | NVIDIA's reference 8-GPU server. ~10 kW. The standard AI building block. |
| "SuperPOD" | NVIDIA terminology for 32-127 DGX systems coordinated by InfiniBand |
| "InfiniBand" | Low-latency fabric for tight GPU coordination. Costs more than Ethernet but is the default for training. |
| "NVLink switch" | NVIDIA's GPU-to-GPU interconnect within and between servers |
| "800V DC" | Open Compute Project standard. Direct DC to the server. Hyperscale-only currently. |
| "Liquid cooling" in 2026 context | Almost certainly DLC (cold plates), increasingly immersion. See §35. |
| "PUE 1.1" or lower | DLC or immersion. Air cooling rarely gets this low. |
| "Hyperscaler" | AWS, Google, Microsoft, Meta, Apple, Alibaba, Tencent. They operate their own DCs. |
| "Cloud GPU on-demand" | End users access these AI clusters via cloud APIs. The DC is the hyperscaler's; the GPUs are rented by the hour. |