Data Center Tiers & Reliability
"Tier III, 2N redundant, 4 nines uptime." Three different things, often confused. This page explains what each means, how they relate, and how to read a data center spec sheet without bluffing.
The Two Tier Standards — They're Different
Two organizations publish "Tier" classifications. Both use the same number (Tier I-IV) but mean different things. Always ask which standard.
| | Uptime Institute Tiers | ANSI/TIA-942 Rated Tiers |
|---|---|---|
| Authority | Uptime Institute (private company) | ANSI / TIA (telecommunications standards) |
| Certification process | Uptime Institute conducts on-site audit + design review. The operator pays for a license to use the term "Tier-Certified." | Self-declared or independent firm assessment. No license fee. |
| Naming | "Tier I, II, III, IV" | "Rated 1, 2, 3, 4" |
| Scope | Topology + concurrent maintainability + fault tolerance | Topology + facility (including telecom rooms, fire, HVAC, security, environmental) |
| Common in industry | Most-cited globally (especially for hyperscale + colo) | Common in US enterprise + government |
| Atlas DC1 marketing | "Tier III equivalent" (we don't pay for cert) | "Rated 3 equivalent" |
Uptime Institute Tiers — The Canonical Definition
| Tier | Name | Topology requirement | Maintenance impact | Fault impact |
|---|---|---|---|---|
| I | Basic Capacity | Single non-redundant path. No N+1 anywhere. | Maintenance = shutdown | Any single failure = shutdown |
| II | Redundant Capacity Components | Single path + N+1 redundant capacity components (UPS modules, gens, chillers) | Maintenance of capacity component = OK; path maintenance = shutdown | Path failure = shutdown; capacity component failure = OK |
| III | Concurrently Maintainable | Multiple distribution paths + N+1 capacity components. Each path independently maintainable. | ANY single component or path can be taken offline for planned maintenance without dropping IT load. | Single fault may still drop load (if it's the active path) |
| IV | Fault Tolerant | 2 simultaneously active paths + N+1 capacity in each. Compartmentalization (separate fire zones, structural). | Concurrent maintenance OK | Single fault anywhere = no impact on IT load. The standard does not promise tolerance of multiple simultaneous faults. |
Atlas DC1 = "Tier III equivalent"
Atlas DC1 has 2N redundancy (two independent paths, each carrying full IT load) + N+1 within each side (e.g., 2 UPS per side where 1 covers the IT load). This meets Tier III: any single component (TX-A, GEN-A, ATS-A, UPS-A1, etc.) can be taken offline for service without dropping IT load (Side B carries everything). It also approaches Tier IV in many ways but lacks the formal certification + some compartmentalization details (separated fire zones for each path, etc.).
Redundancy Notation — N, N+1, 2N, 2(N+1), 2N+1
| Notation | Meaning | Concrete example (4 chillers needed) | Single failure result | Cost premium |
|---|---|---|---|---|
| N | Just enough capacity to carry the load | 4 chillers · 4 needed · 0 spare | Lose load | Reference (1.0×) |
| N+1 | One extra unit beyond what's needed (parallel-redundant capacity) | 5 chillers · 4 needed · 1 spare | Lose 1 of 5 → still 4 running → OK | ~ 1.25× |
| N+2 | Two extras (rare; for very-high-confidence apps) | 6 chillers | Lose 2 → still 4 running → OK | ~ 1.5× |
| 2N | Two completely independent capacity systems, each can carry full load alone | 8 chillers — 4 on Side A + 4 on Side B | Lose entire Side A → Side B still serves all load | ~ 2.0× |
| 2(N+1) | Two complete systems, each with its own N+1 | 10 chillers — 5 on each side | Lose Side A AND a Side B chiller → still 4 on B → OK | ~ 2.5× |
| 2N+1 | 2N plus one more spare (uncommon) | 9 chillers | Lose Side A → 5 remain; lose 1 more → still 4 running → OK. Practically the same as 2(N+1). | ~ 2.25× |
| 3N | Three completely independent systems (rare; financial-grade) | 12 chillers · 3 paths × 4 each | Lose 2 paths → still 1 path with full capacity | ~ 3.0× |
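The unit counts and cost premiums in the table above follow directly from the notation. The sketch below reproduces them for the 4-chiller example; the function names are illustrative, and the assumption that cost scales linearly with installed units is a simplification.

```python
# Installed units and cost premium for each redundancy notation.
# Assumption: cost is proportional to the number of installed units.

def installed_units(notation: str, n: int) -> int:
    """Return total installed units for a notation, given N units needed."""
    formulas = {
        "N": n,
        "N+1": n + 1,
        "N+2": n + 2,
        "2N": 2 * n,
        "2(N+1)": 2 * (n + 1),
        "2N+1": 2 * n + 1,
        "3N": 3 * n,
    }
    return formulas[notation]


def cost_premium(notation: str, n: int) -> float:
    """Installed units divided by needed units (1.0 means no redundancy)."""
    return installed_units(notation, n) / n


for name in ["N", "N+1", "N+2", "2N", "2(N+1)", "2N+1", "3N"]:
    units = installed_units(name, 4)
    print(f"{name:7s} {units:2d} chillers  {cost_premium(name, 4):.2f}x")
```

Changing the second argument shows how the premium for N+1 shrinks as N grows, while 2N stays at 2.0× regardless of N.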
Distributed Redundant — Block Redundancy / Catcher / XNY
This topology has many names. All mean the same thing:
| Industry term | Where you'll hear it |
|---|---|
| Distributed Redundant (DR) | Uptime Institute, formal engineering specs |
| Block Redundancy | Hyperscale operators (Microsoft, Google, Meta) |
| Catcher Topology | Industry slang — the "catcher" is the spare module |
| X-to-Net-Y (XNY) | Vendor specs and quantitative documents (e.g., 3N2, 4N3, 5N4) |
| "3 to make 2" / "4 to make 3" | Conversational shorthand |
| Common Bus / Shared Bus | When all modules connect to a common output bus, vs isolated |
Hyperscale operators use this topology because it's far more capital-efficient than 2N at scale.
How XNY works
X = total modules installed. Y = modules needed to carry full load. The remaining (X − Y) modules sit idle as spares ("catchers"). When a module fails or is taken for service, a catcher absorbs the load.
| Notation | Modules total / needed | Spare | Module utilization | Cost premium vs N |
|---|---|---|---|---|
| 3-to-Net-2 (3N2) — most common DR config | 3 / 2 | 1 catcher | 67% (each running module at 100%, plus 1 idle) | 1.50× |
| 4-to-Net-3 (4N3) | 4 / 3 | 1 catcher | 75% | 1.33× |
| 5-to-Net-4 (5N4) | 5 / 4 | 1 catcher | 80% | 1.25× |
| 6-to-Net-5 (6N5) | 6 / 5 | 1 catcher | 83% | 1.20× |
| 2N for comparison | 2 modules each at full capacity | 1 mirror | 50% | 2.00× |
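The utilization and cost columns above are simply Y/X and X/Y. A minimal sketch, assuming identical modules and cost proportional to module count (the function name `xny` is illustrative):

```python
def xny(total_modules: int, needed_modules: int) -> tuple[float, float]:
    """Return (utilization, cost premium vs N) for an X-to-Net-Y configuration."""
    if not 0 < needed_modules < total_modules:
        raise ValueError("need 0 < Y < X")
    utilization = needed_modules / total_modules  # share of installed capacity in use
    premium = total_modules / needed_modules      # installed capacity relative to N
    return utilization, premium


for x, y in [(3, 2), (4, 3), (5, 4), (6, 5)]:
    util, premium = xny(x, y)
    print(f"{x}N{y}: utilization {util:.0%}, cost {premium:.2f}x")
```

The two numbers are reciprocals of each other, which is why adding modules raises utilization and lowers the premium at the same time.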
Trade-offs vs 2N
| | 2N | 4N3 (or any XNY) |
|---|---|---|
| Capacity efficiency | 50% (each side at half) | 67-83% (most modules at full load) |
| Capital cost | 2.0× of N | 1.20-1.50× of N |
| Failure tolerance | Tolerates loss of one ENTIRE side | Tolerates loss of one MODULE only |
| Maintenance | Take entire side offline; swap many components at once | Take one module offline at a time; sequenced maintenance |
| Compartmentalization | Two separate fire/electrical zones | Modules can share infrastructure (less compartmentalized) |
| Concurrent maintainability | Yes (Tier III) — entire side can be serviced | Yes (Tier III) — one module at a time |
| Fault tolerance | Easier to achieve Tier IV with 2(N+1) | Harder — multiple-module failures more impactful |
| Where used | Enterprise + traditional colo (Atlas DC1) | Hyperscale (Microsoft, Google, Meta, AWS) |
The Industry Framing — IR vs DR
Most modern DC literature contrasts two competing redundancy philosophies. Both achieve concurrent maintainability (Tier III), but they trade differently between capital cost and fault containment.
| | Isolated Redundant (IR) | Distributed Redundant (DR) |
|---|---|---|
| Topology | 2N or 2(N+1) — fully duplicated independent systems | 3N2, 4N3, 5N4 — multiple modules with shared catcher(s) |
| Atlas DC1 example | 2 sides (A + B), each carrying full load. Cross-tie normally open. | 3 modules — any 2 carry full load, third is the catcher |
| Bus architecture | Two completely separate buses | Common output bus (or interconnected via STS) |
| Failure transfer mechanism | Dual-fed loads (servers with two PSUs) — no transfer needed; load is on both sides simultaneously | STS (Static Transfer Switch) shifts load from failed module to catcher in < 4 ms |
| Capital efficiency | 50% utilization (2.0× cost) | 67-83% utilization (1.20-1.50× cost) |
| Fault containment | Excellent — each side is electrically isolated | Lower — common bus means a bus fault affects all modules |
| Cascading failure risk | Very low | Higher — STS or common bus failure could affect multiple modules |
| Maintenance complexity | Take entire side offline once, swap many components | Sequenced maintenance — one module at a time |
| Best for | Enterprise, traditional colo, hospitals, financial | Hyperscale where capital efficiency dominates fault tolerance |
Block Redundancy at the System Level
XNY is usually applied at the module level: each module is a self-contained UPS + generator + cooling + IT block. Hyperscale operators design these modules as identical, interchangeable units. Failure of one module shifts its load to the catcher via a fast-acting STS (Static Transfer Switch) or PDU-level redundancy.
The "Nines" of Uptime — Concrete Time
"Five nines" sounds great until you do the math. Here's what each level actually means in measured downtime per year.
| Availability | Annual downtime | Monthly downtime | Daily downtime | Common application |
|---|---|---|---|---|
| 90% (1 nine) | 36.5 days | 3 days | 2.4 hours | Office desktop |
| 99% (2 nines) | 3.65 days | 7.2 hours | 14.4 minutes | Small business website (no SLA) |
| 99.9% (3 nines) | 8.76 hours | 43.8 minutes | 1.4 minutes | SaaS apps (basic SLA) |
| 99.99% (4 nines) | 52.6 minutes | 4.4 minutes | 8.6 seconds | Most colocation DCs (Tier III) |
| 99.999% (5 nines) | 5.26 minutes | 26.3 seconds | 0.86 seconds | Tier IV, telecom carriers, financial systems |
| 99.9999% (6 nines) | 31.5 seconds | 2.6 seconds | 0.086 seconds | Critical infrastructure (rare; impossible to verify) |
| 99.99999% (7 nines) | 3.15 seconds | — | — | Theoretical / marketing only |
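The downtime figures in the table come from one multiplication. The sketch below assumes a 365-day year (8,760 hours); the function name is illustrative.

```python
def downtime_seconds(availability_pct: float, period_hours: float = 8760.0) -> float:
    """Allowed downtime in seconds over a period, for an availability in percent."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_hours * 3600.0


for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_seconds(pct) / 60.0
    print(f"{pct}% -> {minutes:.1f} minutes per year")
```

Passing `period_hours=24.0` gives the daily budget, and `period_hours=730.0` (8,760 ÷ 12) gives the monthly one.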
Rough Tier ↔ Nines Correlation
The Uptime Institute originally published these correlations (loosely; later editions backed away from explicit nines-tier mapping):
| Tier | Implied availability | Annual downtime |
|---|---|---|
| Tier I | 99.671% | ~ 28.8 hours |
| Tier II | 99.741% | ~ 22.7 hours |
| Tier III | 99.982% | ~ 1.6 hours |
| Tier IV | 99.995% | ~ 26.3 minutes |
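To see where these percentages fall on the nines scale, availability can be converted to a fractional count of nines with −log10(1 − A). This conversion is a common convention rather than anything the Tier standard defines; the sketch below applies it to the figures in the table.

```python
import math


def nines(availability_pct: float) -> float:
    """Express availability as a fractional count of nines: -log10(1 - A)."""
    return -math.log10(1.0 - availability_pct / 100.0)


tiers = {"Tier I": 99.671, "Tier II": 99.741, "Tier III": 99.982, "Tier IV": 99.995}
for tier, pct in tiers.items():
    print(f"{tier}: {pct}% is about {nines(pct):.2f} nines")
```

On this scale Tier III lands between three and four nines, and Tier IV between four and five.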
Concurrent Maintainability vs Fault Tolerance
The two "magic words" in DC redundancy. They're easily confused because both involve having a spare path.
| | Concurrent Maintainability (Tier III) | Fault Tolerance (Tier IV) |
|---|---|---|
| Definition | Any single planned action can be taken without dropping load | Any single unplanned event can occur without dropping load |
| Failure scenario | Single planned outage = OK. Unplanned failure of the active path = drops load. | Single planned OR unplanned event = OK |
| Practical difference | Both paths exist but only one carries load normally. Switch over for maintenance. Failure of the active path requires switchover (brief). | Both paths actively carry load (load-sharing). Failure of one = the other absorbs immediately, no switchover. |
| Construction implications | Two paths, but path B can be in standby (faster to build, lower cost) | Two paths must be active simultaneously + segregated (separate fire zones, separate routes, separate utility feeds) |
| UPS architecture | Either side carries load; second side is reserve | Both UPS systems share load (e.g., each at 50% load); failure = remaining one absorbs to 100% |
What Each Tier Actually Costs (Approximate)
| Tier | $ / kW IT capacity (capex) | Construction time | Where used |
|---|---|---|---|
| Tier I | $10,000 / kW | ~ 6 months | Closets, wiring rooms, prosumer "data centers" |
| Tier II | $11,000 - 14,000 / kW | ~ 9 months | Small enterprise. Some cloud edge. |
| Tier III | $15,000 - 22,000 / kW | ~ 18-24 months | Standard colocation (Atlas DC1). Most enterprise + cloud. |
| Tier IV | $22,000 - 30,000 / kW | ~ 24-36 months | Financial trading, defense, hyperscale critical clusters |
"Tokens" — The AI Currency
In modern data center conversation, "tokens" usually refers to AI/LLM tokens — the units of text that AI models read + write. The data center industry has reorganized around this unit because it's how AI workloads are measured, sold, and billed. Understanding tokens is essential to understanding modern DC capacity planning.
Token Types — Input vs Output
| Type | Description | Cost ratio |
|---|---|---|
| Input (prompt) tokens | Tokens you SEND to the model. Includes the user's question + system prompt + conversation history + any context fed to the model. | Cheaper (1×) |
| Output (completion) tokens | Tokens the model GENERATES. Each one requires a full forward pass through hundreds of billions of parameters. | 3-5× cost of input |
| Cached tokens | Repeat input the model has seen before — cached on the GPU, much cheaper to re-process | ~ 0.1× cost of input (cached); some providers offer prompt caching |
| Reasoning tokens | "Thinking" tokens for reasoning models (OpenAI o1/o3, DeepSeek R1, etc.) — internal chain-of-thought, not shown to user | Same as output (sometimes hidden in pricing) |
Why Tokens Drive Data Center Decisions
| Decision | How tokens factor in |
|---|---|
| Capacity planning | Hyperscalers measure AI capacity in tokens/sec. A single H100 GPU does ~ 1,000-3,000 tokens/sec (depending on model size). Plan facility capacity = (target tokens/sec) ÷ (tokens/sec/GPU) = required GPU count → required power. |
| Billing model | API providers (OpenAI, Anthropic, Google) sell tokens at $X / million tokens. GPT-4o: ~ $2.50/M input + $10/M output. GPT-5: similar. Frontier AI: $15-30/M input, $60-150/M output. Cost determined by model size + GPU efficiency. |
| Latency target | User-facing chatbots target 50-200 tokens/sec for natural conversation feel. Faster = better UX. Drives GPU selection (H200 ≥ H100 for inference). |
| Energy efficiency | Tokens-per-watt emerging as the key DC efficiency metric. New sustainability metric overlaying PUE. |
| Cluster sizing | Training a frontier LLM = trillions of tokens × billions of parameters × multiple epochs. xAI's Memphis facility was sized for "tokens-per-day" output. |
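The capacity-planning rule in the table (target tokens/sec ÷ tokens/sec per GPU → GPU count → power) can be sketched as follows. The 1,000,000 tokens/sec target is a hypothetical input, the per-GPU figures are the rough ones quoted on this page, and the result covers GPU power only.

```python
import math


def gpus_required(target_tokens_per_sec: float, tokens_per_sec_per_gpu: float) -> int:
    """Round up: a fraction of a GPU still means one more GPU."""
    return math.ceil(target_tokens_per_sec / tokens_per_sec_per_gpu)


def gpu_power_kw(gpu_count: int, watts_per_gpu: float) -> float:
    """GPU power only; excludes CPUs, networking, and cooling overhead."""
    return gpu_count * watts_per_gpu / 1000.0


# Hypothetical target: 1,000,000 tokens/sec on GPUs doing ~1,500 tokens/sec at 700 W.
count = gpus_required(1_000_000, 1_500)
print(count, "GPUs,", gpu_power_kw(count, 700), "kW of GPU load")
```

A real plan would add host servers, networking, redundancy spares, and facility overhead (PUE) on top of this figure.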
Tokens / Sec / GPU — Real Numbers (2026)
| Hardware | Inference: tokens/sec at LLaMA-70B | Power per GPU | Tokens / Wh (× 1,000 for tokens / kWh) |
|---|---|---|---|
| NVIDIA H100 (80 GB) | ~ 1,500 tokens/sec | 700 W | ~ 7,700 |
| NVIDIA H200 (141 GB) | ~ 2,200 tokens/sec | 1000 W | ~ 7,900 |
| NVIDIA B200 (Blackwell) | ~ 3,500 tokens/sec | 1200 W | ~ 10,500 |
| AMD MI300X | ~ 1,800 tokens/sec | 750 W | ~ 8,600 |
| Google TPU v5p | ~ 2,500 tokens/sec | ~ 700 W (estimated) | ~ 12,800 |
| Groq LPU (specialized) | ~ 500 tokens/sec but 5× lower latency | ~ 280 W | ~ 6,400 |
| Cerebras WSE-3 (wafer-scale) | ~ 1,800 tokens/sec at much larger model | ~ 23,000 W (full system) | — |
Note: tokens/sec varies enormously with model size, batch size, sequence length, and quantization. These are typical FP16 inference numbers for production-scale chatbot workloads.
Tokens-per-Watt — The Emerging Metric
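Tokens/sec divided by watts is tokens per joule; multiplying by 3,600 gives tokens per watt-hour, which is the scale of the efficiency figures in the hardware table above. A minimal sketch, assuming the GPU draws its rated power continuously (function names are illustrative):

```python
def tokens_per_joule(tokens_per_sec: float, watts: float) -> float:
    """Tokens/sec per watt is the same quantity as tokens per joule."""
    return tokens_per_sec / watts


def tokens_per_wh(tokens_per_sec: float, watts: float) -> float:
    """One watt-hour is 3,600 joules."""
    return tokens_per_joule(tokens_per_sec, watts) * 3600.0


hardware = {
    "NVIDIA H100": (1500, 700),
    "NVIDIA B200": (3500, 1200),
    "AMD MI300X": (1800, 750),
}
for name, (tps, watts) in hardware.items():
    print(f"{name}: {tokens_per_wh(tps, watts):,.0f} tokens/Wh")
```

Multiply the result by 1,000 to express it per kWh, the unit used on utility bills.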
Tokens vs Other AI Metrics
| Metric | What it measures | When used |
|---|---|---|
| Tokens / sec (per request) | Latency-focused: how fast user sees output | User-facing chatbot SLAs |
| Tokens / sec (aggregate throughput) | Capacity-focused: total tokens cluster delivers per second | Capacity planning |
| FLOPs / sec | Floating-point operations per second | Hardware spec sheets; less useful operationally |
| Time To First Token (TTFT) | Latency until first output token appears | Real-time UX targets |
| Inter-Token Latency (ITL) | Time between successive tokens in stream | Smooth output target (target ≤ 50 ms = 20 tokens/sec) |
| Tokens / kWh or Tokens / $ | Cost-of-output efficiency | Provider economics |
| Context window | Max tokens model can hold in one conversation (8K - 2M+) | Application design constraint |
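The latency metrics above combine in a simple way: the stream rate is the reciprocal of ITL, and total response time is roughly TTFT plus one ITL for each token after the first. The sketch below assumes a constant ITL, and the 500 ms TTFT is a hypothetical value.

```python
def stream_rate_tokens_per_sec(itl_ms: float) -> float:
    """Per-request streaming rate implied by a given inter-token latency."""
    return 1000.0 / itl_ms


def response_time_s(ttft_ms: float, itl_ms: float, output_tokens: int) -> float:
    """Time until the last token arrives, assuming a constant ITL."""
    return (ttft_ms + itl_ms * (output_tokens - 1)) / 1000.0


print(stream_rate_tokens_per_sec(50))   # 50 ms ITL -> 20.0 tokens/sec
print(response_time_s(500, 50, 201))    # hypothetical 500 ms TTFT, 201 tokens -> 10.5 s
```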
The OTHER Tokens — Not What This Section Is About
For context: in colocation commercial conversation, "tokens" can also mean capacity allocation tokens — units of reserved kW (e.g., "we have 50 × 10 kW tokens"). That meaning is informal and increasingly secondary in modern DC speak. "Tokens" without qualifier in 2026 typically means AI tokens.
Other Common Reliability Metrics
| Term | Definition | Used for |
|---|---|---|
| MTBF | Mean Time Between Failures (hours) | How often a component fails. Higher = better. |
| MTTR | Mean Time To Repair (hours) | How long to fix once it fails. Lower = better. |
| Availability | MTBF / (MTBF + MTTR) | The fraction of time the system is up. Convert to nines. |
| SLA | Service Level Agreement — contractual uptime target | What you promise customers |
| RTO | Recovery Time Objective — max acceptable downtime per event | Disaster recovery planning |
| RPO | Recovery Point Objective — max acceptable data loss per event | Backup frequency planning |
| PUE | Power Usage Effectiveness — total facility power / IT power | Energy efficiency. 1.0 = perfect; modern DCs target < 1.5 |
| WUE | Water Usage Effectiveness — water per kWh of IT load | Cooling water consumption |
| ERE | Energy Reuse Effectiveness — accounts for waste heat reuse | Holistic efficiency |
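The availability formula in the table is easy to apply. The sketch below also includes the textbook formula for two redundant paths, which is not stated on this page and holds only if the two paths fail independently. The MTBF and MTTR values are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(a: float, b: float) -> float:
    """Two redundant paths where either one suffices.

    Assumption: failures are independent, which real facilities only approximate.
    """
    return 1.0 - (1.0 - a) * (1.0 - b)


# Hypothetical component: fails every 10,000 hours, takes 4 hours to repair.
single = availability(10_000, 4)
print(f"single path: {single:.6f}")
print(f"two paths:   {parallel_availability(single, single):.9f}")
```

Shared components (a common bus, a single control system) break the independence assumption, so real 2N availability is lower than this formula suggests.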
Decoding a DC Spec Sheet — Atlas DC1 Example
"Atlas DC1 is a 2.5 MW Tier III equivalent colocation data center with 2N redundant electrical infrastructure (UPS, generators, distribution), N+1 mechanical (chillers), and a target of 99.99% availability. Two utility feeds, dual-path PDUs, automatic failover. PUE 1.4."
Decoded
| Phrase | What it means |
|---|---|
| 2.5 MW | Critical (IT) load capacity. Your servers can collectively draw up to 2.5 MW. |
| Tier III equivalent | Concurrent maintainability — any single component can be taken offline without dropping load. Not formally Uptime-certified. |
| Colocation | Multi-tenant facility — customers rent rack space + power. Atlas owns building + infrastructure. |
| 2N redundant electrical | Two complete independent electrical systems (Side A + Side B). Either alone carries full load. |
| N+1 mechanical | One spare chiller — total 4 installed, 3 needed for cooling, 1 in standby. |
| 99.99% availability target | 4 nines — 52.6 minutes per year max downtime budget. Tier III architecture supports this if operated well. |
| Two utility feeds | Atlas takes power from two different utility distribution feeders → if one feeder is cut, the other still serves Side A or Side B independently. |
| Dual-path PDUs | Each rack has TWO power distribution feeds (one from Side A, one from Side B). Server PSUs are dual-corded, so either feed alone can carry 100% of the server load. |
| Automatic failover | ATS transfers from utility to genset on outage, no human action. |
| PUE 1.4 | For every 1 kW of IT load, the facility consumes 1.4 kW total. Overhead (cooling, lighting, electrical losses, etc.) adds 0.4 kW. Modern target is < 1.5; hyperscale < 1.3. |
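Two of the decoded numbers combine into a facility-level figure: IT load × PUE gives total facility draw. The sketch below assumes the full 2.5 MW IT load is in use and treats the published PUE as if it applied at every instant, which is an approximation since PUE is normally an annual energy ratio.

```python
def facility_power_mw(it_load_mw: float, pue: float) -> float:
    """Total facility draw implied by an IT load and a PUE."""
    return it_load_mw * pue


def overhead_power_mw(it_load_mw: float, pue: float) -> float:
    """Non-IT share: cooling, lighting, and electrical losses."""
    return it_load_mw * (pue - 1.0)


# Spec-sheet values: 2.5 MW critical load, PUE 1.4.
print(f"total:    {facility_power_mw(2.5, 1.4):.2f} MW")
print(f"overhead: {overhead_power_mw(2.5, 1.4):.2f} MW")
```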
Data Center Types — Full Taxonomy
Beyond Tier classification, data centers are categorized by ownership + scale + business model. Each type has different design priorities, regulatory regimes, and economic models.
| Type | Description | Typical scale | Ownership | Atlas DC1? |
|---|---|---|---|---|
| Enterprise (On-Premises) | Owned + operated by single organization for own IT | 500 kW – 5 MW | Owner-operated | No |
| Colocation (Colo) | 3rd-party owned facility; tenants lease space + power; tenants bring their own IT | 1 MW – 200+ MW | Multi-tenant leased | YES — Atlas DC1 model |
| Hyperscale | Massive purpose-built for cloud/internet (AWS, Azure, Google, Meta) | 20 MW – 1 GW+ per campus | Owner-operated (cloud CSPs) | No |
| Edge | Small distributed; close to users for low latency. Often containerized. | 10 kW – 1 MW | Operator/enterprise | No |
| Modular / Prefab | Factory-built modules shipped to site; deploy in weeks not years | Scalable per unit | Variable | No |
| Wholesale Colo | Large-block leasing (1-50 MW per tenant) of power + shell space; tenant builds MEP | 1 MW – 50 MW per tenant | Multi-tenant leased | No |
| Retail Colo | Small-block leasing (single cabinets to small cages); operator provides built-out white space | 1 kW – 500 kW per tenant | Multi-tenant leased | No |
Full DC Metrics Suite — PUE + DCiE + WUE + CUE
Beyond PUE, the industry tracks several efficiency metrics. Each captures a different sustainability dimension.
| Metric | Formula | Range (best → worst) | What it tells you |
|---|---|---|---|
| PUE (Power Usage Effectiveness) | Total facility energy / IT energy | 1.0 → 3.0+ | Efficiency overhead. Modern target < 1.5; hyperscale < 1.3. |
| DCiE (Data Center infrastructure Efficiency) | 1 / PUE × 100% | 100% → 33% | Inverse of PUE in %. PUE 1.4 = DCiE 71%. |
| WUE (Water Usage Effectiveness) | Annual water use (L) / IT energy (kWh) | 0 → 5.0+ L/kWh | Cooling water consumption. Air-cooled DC: ~ 0. Evaporative-cooled: 1-3 L/kWh. Liquid-cooled with dry coolers: ~ 0. |
| CUE (Carbon Usage Effectiveness) | Annual CO₂ emissions (kg) / IT energy (kWh) | 0 → 1.0 kgCO₂/kWh | Grid-mix-dependent. Renewable-powered DC: ~ 0. Coal-grid: ~ 0.8. |
| ERE (Energy Reuse Effectiveness) | (Total energy − reused energy) / IT energy | 0 → 3.0+ | Credits energy reused (e.g., heating nearby buildings). Can be lower than PUE for facilities with district heat reuse. |
| SPUE (Server PUE) | Total server input power / power delivered to compute components (CPU, memory, storage) | 1.0 → 2.0+ | Server-internal efficiency (PSU + fan + voltage regulator losses) |
| tPUE (Total PUE) | PUE × SPUE | — | Ultimate efficiency from grid to compute. Captures everything. |
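Several formulas in the table derive from one another. The sketch below implements DCiE, tPUE, and ERE as written above; the SPUE of 1.2 and the energy figures in the usage lines are hypothetical inputs.

```python
def dcie_percent(pue: float) -> float:
    """DCiE is the inverse of PUE, expressed as a percentage."""
    return 100.0 / pue


def total_pue(pue: float, spue: float) -> float:
    """tPUE chains facility overhead (PUE) with server-internal overhead (SPUE)."""
    return pue * spue


def ere(total_energy_kwh: float, reused_energy_kwh: float, it_energy_kwh: float) -> float:
    """Energy Reuse Effectiveness: credits energy exported for reuse."""
    return (total_energy_kwh - reused_energy_kwh) / it_energy_kwh


print(f"DCiE at PUE 1.4: {dcie_percent(1.4):.1f}%")
print(f"tPUE at PUE 1.4, SPUE 1.2 (hypothetical): {total_pue(1.4, 1.2):.2f}")
print(f"ERE with 200 of 1,400 kWh reused, 1,000 kWh IT: {ere(1400, 200, 1000):.2f}")
```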
Static Transfer Switch (STS) — Detail
STS is the fast-acting electronic switch that enables Distributed Redundant systems. While an ATS (Automatic Transfer Switch) takes 100-200 ms (mechanical contactors), an STS transfers in about 4 ms (roughly 1/4 cycle at 60 Hz) using thyristors on each phase.
| Aspect | STS detail |
|---|---|
| Transfer time | About 4 ms (1/4 cycle is 4.17 ms at 60 Hz, 5 ms at 50 Hz). Well inside the roughly 20 ms of total voltage loss that the ITIC curve allows for IT equipment. |
| Sources | Two synchronized AC inputs (typically two UPS outputs) — must be in-phase + same magnitude |
| Switching mechanism | Thyristors (SCRs) on each phase, fired at zero-current crossing |
| Synchronization requirement | Sources must stay within ±5° phase angle. PMS keeps both UPS outputs synced. |
| Where used in Atlas DC1 | NOT used at facility level (Atlas DC1 is 2N IR — dual-fed loads, no transfer needed). USED at rack PDU level inside dual-corded servers. |
| Where critical | Distributed Redundant (DR) systems — STS shifts load from failed module to catcher |
| Common spec | Eaton, Schneider, Vertiv 100A-4000A static transfer switches |
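The quarter-cycle figure quoted for STS transfer time depends on grid frequency. A small sketch of the arithmetic (the function name is illustrative):

```python
def quarter_cycle_ms(frequency_hz: float) -> float:
    """Duration of one quarter of an AC cycle, in milliseconds."""
    return 1000.0 / (4.0 * frequency_hz)


for hz in (60.0, 50.0):
    print(f"{hz:.0f} Hz: quarter cycle = {quarter_cycle_ms(hz):.2f} ms")
```

At 60 Hz a quarter cycle is about 4.17 ms, and at 50 Hz it is 5 ms, so a "quarter-cycle" transfer is slightly slower on 50 Hz grids.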
DCIM vs BMS vs EPMS — Three Different Systems
Modern DCs have multiple monitoring + control systems. They overlap but have distinct responsibilities.
| System | Full name | Owns | Doesn't own | Vendor examples |
|---|---|---|---|---|
| BMS | Building Management System | HVAC, lighting controls, building security, fire alarm interface, water systems | IT-room rack-level, electrical metering down to circuits | Honeywell, Johnson Controls, Siemens Desigo, Schneider EcoStruxure Building |
| EPMS | Electrical Power Monitoring System | Switchgear meters, power quality, breaker positions, alarms, energy logging | Building HVAC, IT inventory | Schneider PowerLogic, Eaton Foreseer, GE iCM, Siemens SENTRON |
| DCIM | Data Center Infrastructure Management | IT inventory (every rack, every server), capacity planning, asset lifecycle, cabling docs, real-time PUE, alarm aggregation | Building HVAC controls, building security access | Vertiv Trellis, Schneider EcoStruxure IT, Sunbird, Nlyte |
In sophisticated facilities, BMS + EPMS + DCIM all push data into a single command center dashboard via OPC-UA, BACnet, Modbus, or SNMP integrations. In simpler facilities, they may be islands.
Cooling Topology — Hot Aisle / Cold Aisle + Containment
| Topology | Description | PUE impact |
|---|---|---|
| No containment (legacy) | Random rack orientation; cold + hot air mix in room | PUE 2.0+ |
| Hot aisle / cold aisle | Racks arranged front-to-front (cold) and back-to-back (hot). Reduces mixing. | PUE 1.7-1.9 |
| Cold aisle containment (CAC) | Cold aisles enclosed (doors + ceiling); cold supply pressurized; hot return open to room | PUE 1.4-1.6 |
| Hot aisle containment (HAC) | Hot aisles enclosed; hot return ducted directly back to CRAH; cold supply open to room | PUE 1.3-1.5 (slightly better than CAC for fire suppression) |
| Pod / module | Self-contained cooling unit serving 1-10 racks; in-row CRAH or chimney | PUE 1.2-1.4 |
| Rear-door heat exchanger | Liquid coil on rack rear extracts heat at source | PUE 1.1-1.2 |
| Direct liquid cooling (DLC) | Coolant circulated through CPU/GPU cold plates | PUE 1.05-1.15 (mandatory for high-density AI) |
| Immersion cooling | Servers submerged in dielectric fluid | PUE 1.02-1.10 |
Free Cooling — Economizer Hours
When outdoor air is cool enough, mechanical chillers can shut off and outdoor air provides cooling directly. Free cooling hours per year vary by climate.
| Climate zone | Free cooling hours/yr | Method |
|---|---|---|
| Phoenix, AZ (hot dry) | 1,500-2,500 | Direct evaporative (when outdoor < 65°F WB) |
| Northern Virginia (hot humid) | 3,500-4,500 | Air-side or water-side |
| Dublin, Ireland (cool maritime) | 7,000-8,500 | Air-side direct (most of the year) |
| Stockholm, Sweden (cold) | 8,500+ | Air-side direct nearly year-round |
Maximizing free cooling hours is the single biggest PUE-reduction lever. This is why hyperscale operators pick locations such as Iowa, northern Sweden, and Dublin: they offer massive free cooling hours.
Reading a Colocation Contract
Engineers transitioning to client-facing roles encounter commercial language. Here's the decoder.
| Term | What it means |
|---|---|
| MSA (Master Services Agreement) | Top-level contract between operator + tenant. References specific Order Forms. |
| Order Form / SOW | Specific engagement: power kW, space sq ft, term, rates |
| SLA (Service Level Agreement) | Uptime guarantee (e.g., 99.99%), credit calculation if missed |
| Power capacity reserved | Maximum kW the tenant can draw, sometimes sold in fixed-size blocks (informally, capacity "tokens"). The tenant pays for the reservation regardless of actual use. |
| Power utilization (or "absorption") | Actual kW consumed. May be billed separately from reservation. |
| Ramp schedule | Pre-agreed timeline for capacity expansion (e.g., 100 kW now, 250 kW in 6 mo, 500 kW in 12 mo) |
| Cross-connect | Fiber/copper interconnection between tenants or between tenant + carrier (extra fee per port) |
| Remote hands | Operator's tech performs physical work in tenant's cabinet (extra hourly fee) |
| NRC vs MRC | Non-Recurring Charge (one-time install/setup) vs Monthly Recurring Charge (ongoing) |
| Power Pass-through | Operator bills tenant for actual power consumed at utility cost + markup |
| Co-marketed PUE | Operator publishes facility PUE; tenant uses it for own sustainability reporting |
TIA-942-B — The Other Tier Standard, In Detail
While Uptime Institute Tiers focus on topology, TIA-942-B is comprehensive — it covers facility infrastructure across multiple subsystems and assigns Rated levels (1-4) to each subsystem independently.
| Subsystem | Rated 1 (basic) | Rated 2 (redundant) | Rated 3 (concurrently maintainable) | Rated 4 (fault tolerant) |
|---|---|---|---|---|
| Telecom Pathway | Single path | Single path + redundant cabinets | Multiple paths · concurrently maintainable | Multiple paths · fault tolerant |
| Architectural / Structural | No specific seismic / fire requirements | Seismic Zone 0/1 design | Seismic Zone 2/3 + 1-hour fire walls | Seismic Zone 4 + 2-hour fire walls + blast resistance |
| Electrical | Single utility | Single utility + N+1 components | Multiple utility feeds + N+1 + concurrently maintainable | Multiple utility feeds + 2N + fault tolerant |
| Mechanical | Single source | N+1 components | Multiple sources + concurrently maintainable | Multiple sources + fault tolerant |
A facility could be Electrical Rated-3 but Architectural Rated-1 — TIA acknowledges that subsystems can have different reliability requirements. Uptime Institute Tier rating, by contrast, is single-number for the whole facility.
Common Confusions — Quick Reference
| If you hear... | Don't confuse with... |
|---|---|
| "Tier III" | "3 nines": different concepts. Tier III is commonly quoted at 99.982%, which falls between 3 and 4 nines. |
| "2N" | "N+2" — 2N = two complete systems. N+2 = one system with 2 spare units. |
| "Concurrent maintainability" | "Fault tolerance" — concurrent = planned outage OK; fault tolerant = unplanned OK too. |
| "Tier-Certified" | "Tier-rated" or "Tier equivalent" — only Uptime Institute audited facilities can claim "certified." |
| "5 nines" | "Tier IV" — 5 nines is operational; Tier IV is structural. Not 1:1. |
| "PUE 1.0" | Marketing — physically impossible (mech + lighting + losses always > 0). Best real DCs are ~ 1.1. |
| "4N3" or other custom notation | Standard notation — N, N+1, 2N, 2(N+1), 3N. Anything outside this is vendor-specific or imprecise. |
Data Center Tiers + Reliability Reference · v1.0