REFERENCE

Data Center Tiers & Reliability

Tier I-IV · redundancy notation · uptime nines · the spec sheet decoded

"Tier III, 2N redundant, 4 nines uptime." Three different things, often confused. This page explains what each means, how they relate, and how to read a data center spec sheet without bluffing.

The Two Tier Standards — They're Different

Two organizations publish "Tier" classifications. Both use the same four levels (I-IV or 1-4), but they mean different things. Always ask which standard is meant.

Aspect | Uptime Institute Tiers | ANSI/TIA-942 Rated Tiers
Authority | Uptime Institute (private company) | ANSI / TIA (telecommunications standards)
Certification process | Uptime Institute conducts an on-site audit + design review. The operator pays for a license to use the term "Tier-Certified." | Self-declared or assessed by an independent firm. No license fee.
Naming | "Tier I, II, III, IV" | "Rated 1, 2, 3, 4"
Scope | Topology + concurrent maintainability + fault tolerance | Topology + facility (including telecom rooms, fire, HVAC, security, environmental)
Common in industry | Most-cited globally (especially for hyperscale + colo) | Common in US enterprise + government
Atlas DC1 marketing | "Tier III equivalent" (we don't pay for cert) | "Rated 3 equivalent"

Uptime Institute Tiers — The Canonical Definition

Tier | Name | Topology requirement | Maintenance impact | Fault impact
I | Basic Capacity | Single non-redundant path. No N+1 anywhere. | Maintenance = shutdown | Any single failure = shutdown
II | Redundant Capacity Components | Single path + N+1 redundant capacity components (UPS modules, gens, chillers) | Maintenance of capacity component = OK; path maintenance = shutdown | Path failure = shutdown; capacity component failure = OK
III | Concurrently Maintainable | Multiple distribution paths + N+1 capacity components. Each path independently maintainable. | ANY single component or path can be taken offline for planned maintenance without dropping IT load. | Single fault may still drop load (if it's the active path)
IV | Fault Tolerant | 2 simultaneously active paths + N+1 capacity in each. Compartmentalization (separate fire zones, structural). | Concurrent maintenance OK | Single fault anywhere = no impact on IT load. Tolerance of multiple simultaneous faults is not part of the requirement.

Atlas DC1 = "Tier III equivalent"

Atlas DC1 has 2N redundancy (two independent paths, each carrying full IT load) + N+1 within each side (e.g., 2 UPS per side where 1 covers the IT load). This meets Tier III: any single component (TX-A, GEN-A, ATS-A, UPS-A1, etc.) can be taken offline for service without dropping IT load (Side B carries everything). It also approaches Tier IV in many ways but lacks the formal certification + some compartmentalization details (separated fire zones for each path, etc.).

Redundancy Notation — N, N+1, 2N, 2(N+1), 2N+1

Notation | Meaning | Concrete example (4 chillers needed) | Single failure result | Cost premium
N | Just enough capacity to carry the load | 4 chillers · 4 needed · 0 spare | Lose load | Reference (1.0×)
N+1 | One extra unit beyond what's needed (parallel-redundant capacity) | 5 chillers · 4 needed · 1 spare | Lose 1 of 5 → still 4 running → OK | ~ 1.25×
N+2 | Two extras (rare; for very-high-confidence apps) | 6 chillers | Lose 2 → still 4 running → OK | ~ 1.5×
2N | Two completely independent capacity systems, each can carry full load alone | 8 chillers: 4 on Side A + 4 on Side B | Lose entire Side A → Side B still serves all load | ~ 2.0×
2(N+1) | Two complete systems, each with its own N+1 | 10 chillers: 5 on each side | Lose Side A AND a Side B chiller → still 4 on B → OK | ~ 2.5×
2N+1 | 2N plus one more spare (uncommon) | 9 chillers | Lose Side A → Side B's 4 plus the 1 spare = 5 remain → OK. Practically the same as 2(N+1). | ~ 2.25×
3N | Three completely independent systems (rare; financial-grade) | 12 chillers · 3 paths × 4 each | Lose 2 paths → still 1 path with full capacity | ~ 3.0×
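
The unit counts and cost premiums in this table follow mechanically from the notation. A minimal sketch, assuming the premium is simply installed units divided by needed units (it ignores piping, controls, and floor space); the function names are illustrative and not part of any standard:

```python
import re


def installed_units(notation: str, n: int) -> int:
    """Units installed for a redundancy notation, given n units needed.

    Handles N, N+k, kN, kN+j and k(N+j); anything else raises ValueError.
    """
    s = notation.replace(" ", "").upper()
    # k(N+j): k independent systems, each with j spares, e.g. 2(N+1)
    m = re.fullmatch(r"(\d+)\(N\+(\d+)\)", s)
    if m:
        return int(m.group(1)) * (n + int(m.group(2)))
    # kN+j with optional k and j, e.g. N, N+1, 2N, 2N+1, 3N
    m = re.fullmatch(r"(\d*)N(?:\+(\d+))?", s)
    if m:
        k = int(m.group(1) or 1)
        j = int(m.group(2) or 0)
        return k * n + j
    raise ValueError(f"unsupported notation: {notation!r}")


def cost_premium(notation: str, n: int) -> float:
    """Installed units relative to plain N (equipment count only)."""
    return installed_units(notation, n) / n


# Reproduce the 4-chiller example column
for name in ["N", "N+1", "N+2", "2N", "2(N+1)", "2N+1", "3N"]:
    print(f"{name:7s} {installed_units(name, 4):2d} chillers  {cost_premium(name, 4):.2f}x")
```

The premium depends on N: N+1 costs 1.25× when 4 units are needed but only 1.10× when 10 are needed, which is why the table's figures are tied to the 4-chiller example.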

Distributed Redundant — Block Redundancy / Catcher / XNY

This topology has many names. All mean the same thing:

Industry term | Where you'll hear it
Distributed Redundant (DR) | Uptime Institute, formal engineering specs
Block Redundancy | Hyperscale operators (Microsoft, Google, Meta)
Catcher Topology | Industry slang: the "catcher" is the spare module
X-to-Net-Y (XNY) | Vendor specs and quantitative documents (e.g., 3N2, 4N3, 5N4)
"3 to make 2" / "4 to make 3" | Conversational shorthand
Common Bus / Shared Bus | When all modules connect to a common output bus, vs isolated

Hyperscale operators use this topology because it's far more capital-efficient than 2N at scale.

How XNY works

X = total modules installed. Y = modules needed to carry full load. The remaining (X − Y) modules sit idle as spares ("catchers"). When a module fails or is taken for service, a catcher absorbs the load.

Notation | Modules total / needed | Spare | Module utilization | Cost premium vs N
3-to-Net-2 (3N2), most common DR config | 3 / 2 | 1 catcher | 67% (each running module at 100%, plus 1 idle) | 1.50×
4-to-Net-3 (4N3) | 4 / 3 | 1 catcher | 75% | 1.33×
5-to-Net-4 (5N4) | 5 / 4 | 1 catcher | 80% | 1.25×
6-to-Net-5 (6N5) | 6 / 5 | 1 catcher | 83% | 1.20×
2N for comparison | 2 modules each at full capacity | 1 mirror | 50% | 2.00×

Why hyperscale picks XNY over 2N
For a 50 MW data center, 2N requires 100 MW of installed capacity (2× the need). 4N3 requires only about 67 MW: four modules of roughly 16.7 MW each, three of which carry the load and one of which is the spare. Capital savings: roughly $200-400 million on a single facility. As X grows, the overhead shrinks toward N+1 levels, with the redundancy applied to whole modules (the SYSTEM level) rather than to individual units.
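
The utilization and premium columns, and the 50 MW example, reduce to two ratios: Y/X and X/Y. A small sketch under the assumption that every module is identical and sized so that Y modules carry the full load; `xny_sizing` is a hypothetical helper name:

```python
def xny_sizing(x: int, y: int, load_mw: float) -> dict:
    """Size an X-to-Net-Y system: x modules installed, y needed for load_mw."""
    if not 0 < y < x:
        raise ValueError("need 0 < y < x")
    module_mw = load_mw / y  # each module sized so that y of them carry the load
    return {
        "module_mw": module_mw,
        "installed_mw": module_mw * x,
        "utilization": y / x,    # share of installed capacity doing useful work
        "cost_premium": x / y,   # installed capacity relative to plain N
    }


for x, y in [(3, 2), (4, 3), (5, 4), (6, 5)]:
    r = xny_sizing(x, y, 50.0)
    print(f"{x}N{y}: {r['installed_mw']:.1f} MW installed, "
          f"{r['utilization']:.0%} utilization, {r['cost_premium']:.2f}x")

# 2N for comparison: two full-size systems
print(f"2N : {2 * 50.0:.1f} MW installed, 50% utilization, 2.00x")
```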

Trade-offs vs 2N

Aspect | 2N | 4N3 (or any XNY)
Capacity efficiency | 50% (each side at half) | 67-83% depending on X (75% for 4N3); running modules at full load
Capital cost | 2.0× of N | 1.20-1.50× of N
Failure tolerance | Tolerates loss of one ENTIRE side | Tolerates loss of one MODULE only
Maintenance | Take entire side offline; swap many components at once | Take one module offline at a time; sequenced maintenance
Compartmentalization | Two separate fire/electrical zones | Modules can share infrastructure (less compartmentalized)
Concurrent maintainability | Yes (Tier III): entire side can be serviced | Yes (Tier III): one module at a time
Fault tolerance | Easier to achieve Tier IV with 2(N+1) | Harder: multiple-module failures more impactful
Where used | Enterprise + traditional colo (Atlas DC1) | Hyperscale (Microsoft, Google, Meta, AWS)

The Industry Framing — IR vs DR

Most modern DC literature contrasts two competing redundancy philosophies. Both achieve concurrent maintainability (Tier III), but they trade differently between capital cost and fault containment.

Aspect | Isolated Redundant (IR) | Distributed Redundant (DR)
Topology | 2N or 2(N+1): fully duplicated independent systems | 3N2, 4N3, 5N4: multiple modules with shared catcher(s)
Atlas DC1 example | 2 sides (A + B), each carrying full load. Cross-tie normally open. | 3 modules: any 2 carry full load, third is the catcher
Bus architecture | Two completely separate buses | Common output bus (or interconnected via STS)
Failure transfer mechanism | Dual-fed loads (servers with two PSUs): no transfer needed; load is on both sides simultaneously | STS (Static Transfer Switch) shifts load from failed module to catcher in < 4 ms
Capital efficiency | 50% utilization (2.0× cost) | 67-83% utilization (1.20-1.50× cost)
Fault containment | Excellent: each side is electrically isolated | Lower: common bus means a bus fault affects all modules
Cascading failure risk | Very low | Higher: STS or common bus failure could affect multiple modules
Maintenance complexity | Take entire side offline once, swap many components | Sequenced maintenance, one module at a time
Best for | Enterprise, traditional colo, hospitals, financial | Hyperscale where capital efficiency dominates fault tolerance

"Distributed" because the redundancy is spread across multiple modules
In IR (2N), the redundancy is concentrated in a single mirror system. In DR (3N2), the redundancy is distributed across all modules — each module carries part of the responsibility for backup. Failure handling shifts from "switch sides" to "redistribute load."

Block Redundancy at the System Level

XNY is usually applied at the module level — each module is a self-contained UPS + generator + cooling + IT block. Hyperscale designs these modules as identical interchangeable units. Failure of one module shifts load to the catcher via fast-acting STS (Static Transfer Switch) or PDU-level redundancy.

XNY only makes sense at scale
For a 1-2 MW colo (Atlas DC1 size), 2N is simpler and cheaper to engineer despite its 2× cost — the absolute dollar difference is small. For a 50-200 MW hyperscale facility, XNY's efficiency saves hundreds of millions and the engineering complexity is justified.

The "Nines" of Uptime — Concrete Time

"Five nines" sounds great until you do the math. Here's what each level actually means in measured downtime per year.

Availability | Annual downtime | Monthly downtime | Daily downtime | Common application
90% (1 nine) | 36.5 days | 3 days | 2.4 hours | Office desktop
99% (2 nines) | 3.65 days | 7.2 hours | 14.4 minutes | Small business website (no SLA)
99.9% (3 nines) | 8.76 hours | 43.8 minutes | 1.4 minutes | SaaS apps (basic SLA)
99.99% (4 nines) | 52.6 minutes | 4.4 minutes | 8.6 seconds | Most colocation DCs (Tier III)
99.999% (5 nines) | 5.26 minutes | 26.3 seconds | 0.86 seconds | Tier IV, telecom carriers, financial systems
99.9999% (6 nines) | 31.5 seconds | 2.6 seconds | 0.086 seconds | Critical infrastructure (rare; impossible to verify)
99.99999% (7 nines) | 3.15 seconds | - | - | Theoretical / marketing only
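
Every cell in this table is the unavailability fraction multiplied by the length of the period. A minimal sketch, assuming a 365-day year (8,760 hours); the function names are illustrative:

```python
import math

HOURS_PER_YEAR = 8760.0  # 365 days; leap years ignored


def downtime_seconds(availability_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Allowed downtime in seconds for an availability percentage over a period."""
    unavailability = (100.0 - availability_pct) / 100.0
    return unavailability * period_hours * 3600.0


def nines(availability_pct: float) -> float:
    """Number of nines, e.g. 99.99 -> about 4.0 (undefined for exactly 100%)."""
    return -math.log10((100.0 - availability_pct) / 100.0)


for pct in [90.0, 99.0, 99.9, 99.99, 99.999]:
    print(f"{pct}%: {downtime_seconds(pct) / 60:.1f} min/year, "
          f"{downtime_seconds(pct, 24.0):.1f} s/day")
```

Passing a different `period_hours` (24 for a day, 730 for an average month) gives the other columns.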

Rough Tier ↔ Nines Correlation

The Uptime Institute originally published these correlations (loosely; later editions backed away from explicit nines-tier mapping):

Tier | Implied availability | Annual downtime
Tier I | 99.671% | ~ 28.8 hours
Tier II | 99.741% | ~ 22.7 hours
Tier III | 99.982% | ~ 1.6 hours
Tier IV | 99.995% | ~ 26.3 minutes

Nines vs Tier: they measure different things
Tiers describe topology (how the system is built). Nines describe measured outcomes over time. A Tier III facility CAN achieve 5 nines if operated well; it CAN drop to 2 nines if poorly run. Topology buys capability; operations realize availability. The difference is human/process factors.

Concurrent Maintainability vs Fault Tolerance

The two "magic words" in DC redundancy. They're easily confused because both involve having a spare path.

Aspect | Concurrent Maintainability (Tier III) | Fault Tolerance (Tier IV)
Definition | Any single planned action can be taken without dropping load | Any single unplanned event can occur without dropping load
Failure scenario | Single planned outage = OK. Unplanned failure of the active path = drops load. | Single planned OR unplanned event = OK
Practical difference | Both paths exist but only one carries load normally. Switch over for maintenance. Failure of the active path requires switchover (brief). | Both paths actively carry load (load-sharing). Failure of one = the other absorbs immediately, no switchover.
Construction implications | Two paths, but path B can be in standby (faster to build, lower cost) | Two paths must be active simultaneously + segregated (separate fire zones, separate routes, separate utility feeds)
UPS architecture | Either side carries load; second side is reserve | Both UPS systems share load (e.g., each at 50% load); failure = remaining one absorbs to 100%

What Each Tier Actually Costs (Approximate)

Tier | $ / kW IT capacity (capex) | Construction time | Where used
Tier I | $10,000 / kW | ~ 6 months | Closets, wiring rooms, prosumer "data centers"
Tier II | $11,000 - 14,000 / kW | ~ 9 months | Small enterprise. Some cloud edge.
Tier III | $15,000 - 22,000 / kW | ~ 18-24 months | Standard colocation (Atlas DC1). Most enterprise + cloud.
Tier IV | $22,000 - 30,000 / kW | ~ 24-36 months | Financial trading, defense, hyperscale critical clusters

"Tokens" — The AI Currency

In modern data center conversation, "tokens" usually refers to AI/LLM tokens — the units of text that AI models read + write. The data center industry has reorganized around this unit because it's how AI workloads are measured, sold, and billed. Understanding tokens is essential to understanding modern DC capacity planning.

Metaphor: tokens are the "syllables" the AI thinks in
A token is roughly 4 characters of English text, or about ¾ of a word on average. The sentence "The cat sat on the mat" tokenizes to about 6 tokens (one per word here; longer or rarer words split into several subword tokens). When you ask GPT-4 a question, your prompt becomes hundreds to thousands of tokens going IN, and the response is more tokens coming OUT. Every operation in modern AI is measured in tokens.
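
The 4-characters-per-token rule of thumb can be turned into a quick estimator. This is only a heuristic sketch: it is not a real tokenizer, and actual counts depend on the model's vocabulary and on the language of the text. The function name is hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from character length (heuristic, English text only)."""
    return math.ceil(len(text) / chars_per_token)


print(estimate_tokens("The cat sat on the mat"))  # 22 characters / 4, rounded up
```

For capacity planning or billing, counts from the provider's own tokenizer should replace this estimate.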

Token Types — Input vs Output

Type | Description | Cost ratio
Input (prompt) tokens | Tokens you SEND to the model. Includes the user's question + system prompt + conversation history + any context fed to the model. | Cheaper (1×)
Output (completion) tokens | Tokens the model GENERATES. Each one requires a full forward pass through hundreds of billions of parameters. | 3-5× cost of input
Cached tokens | Repeat input the model has seen before, cached on the GPU and much cheaper to re-process | ~ 0.1× cost of input (cached); some providers offer prompt caching
Reasoning tokens | "Thinking" tokens for reasoning models (OpenAI o1/o3, DeepSeek R1, etc.): internal chain-of-thought, not shown to user | Same as output (sometimes hidden in pricing)
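
To see how the cost ratios combine for one request, here is a hedged sketch. The default prices are the approximate GPT-4o figures quoted elsewhere on this page ($2.50 per million input tokens, $10 per million output tokens) and the 0.1× cached ratio from the table; real price lists vary by provider and change often, so treat all three defaults as placeholders.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    input_price_per_m: float = 2.50,   # assumed $/million input tokens
    output_price_per_m: float = 10.00, # assumed $/million output tokens
    cached_ratio: float = 0.1,         # cached input billed at ~0.1x input price
) -> float:
    """Cost of one request; cached_tokens is the cached part of input_tokens."""
    fresh = input_tokens - cached_tokens
    if fresh < 0:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    total = (
        fresh * input_price_per_m
        + cached_tokens * input_price_per_m * cached_ratio
        + output_tokens * output_price_per_m  # reasoning tokens bill like output
    )
    return total / 1_000_000


print(f"${request_cost_usd(2000, 500):.5f}")                      # no caching
print(f"${request_cost_usd(2000, 500, cached_tokens=1000):.5f}")  # half the prompt cached
```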

Why Tokens Drive Data Center Decisions

Decision | How tokens factor in
Capacity planning | Hyperscalers measure AI capacity in tokens/sec. A single H100 GPU does ~ 1,000-3,000 tokens/sec (depending on model size). Required GPU count = (target tokens/sec) ÷ (tokens/sec per GPU); GPU count × power per GPU → required power.
Billing model | API providers (OpenAI, Anthropic, Google) sell tokens at $X / million tokens. GPT-4o: ~ $2.50/M input + $10/M output. GPT-5: similar. Frontier AI: $15-30/M input, $60-150/M output. Cost determined by model size + GPU efficiency.
Latency target | User-facing chatbots target 50-200 tokens/sec for natural conversation feel. Faster = better UX. Drives GPU selection (H200 ≥ H100 for inference).
Energy efficiency | Tokens-per-watt is emerging as the key DC efficiency metric, a new sustainability metric overlaying PUE.
Cluster sizing | Training a frontier LLM = trillions of tokens × billions of parameters × multiple epochs. xAI's Memphis facility was sized for "tokens-per-day" output.
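
The capacity-planning row is a three-step calculation: target throughput to GPU count, GPU count to IT power, IT power to facility power. A minimal sketch, assuming GPU power is the only IT load (host CPUs, networking, and storage are ignored, so real figures will be higher); `plan_capacity` and its parameters are illustrative names:

```python
import math


def plan_capacity(
    target_tokens_per_sec: float,
    tokens_per_sec_per_gpu: float,
    watts_per_gpu: float,
    pue: float = 1.0,
) -> dict:
    """GPU count and power needed to serve a target aggregate token rate."""
    gpus = math.ceil(target_tokens_per_sec / tokens_per_sec_per_gpu)
    it_kw = gpus * watts_per_gpu / 1000.0  # GPU power only (assumption)
    return {"gpus": gpus, "it_kw": it_kw, "facility_kw": it_kw * pue}


# 1 million tokens/sec on GPUs doing ~1,500 tokens/sec at 700 W, facility PUE 1.4
plan = plan_capacity(1_000_000, 1500, 700, pue=1.4)
print(plan)
```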

Tokens / Sec / GPU — Real Numbers (2026)

Hardware | Inference: tokens/sec at LLaMA-70B | Power per GPU | Tokens / kWh
NVIDIA H100 (80 GB) | ~ 1,500 tokens/sec | 700 W | ~ 7.7 million
NVIDIA H200 (141 GB) | ~ 2,200 tokens/sec | 1000 W | ~ 7.9 million
NVIDIA B200 (Blackwell) | ~ 3,500 tokens/sec | 1200 W | ~ 10.5 million
AMD MI300X | ~ 1,800 tokens/sec | 750 W | ~ 8.6 million
Google TPU v5p | ~ 2,500 tokens/sec | ~ 700 W (estimated) | ~ 12.9 million
Groq LPU (specialized) | ~ 500 tokens/sec but 5× lower latency | ~ 280 W | ~ 6.4 million
Cerebras WSE-3 (wafer-scale) | ~ 1,800 tokens/sec at much larger model | ~ 23,000 W (full system) | -

Note: tokens/sec varies enormously with model size, batch size, sequence length, and quantization. These are typical FP16 inference numbers for production-scale chatbot workloads.
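
The tokens-per-kWh column is derived from the other two: tokens per second times 3,600 seconds, divided by the power in kW. A short sketch of that arithmetic, using the table's own approximate inputs; for the H100 row it works out to about 7.7 million tokens per kWh (equivalently about 7,700 tokens per Wh):

```python
def tokens_per_kwh(tokens_per_sec: float, watts: float) -> float:
    """Tokens generated per kWh of device energy (device power only, no PUE)."""
    tokens_per_hour = tokens_per_sec * 3600.0
    return tokens_per_hour / (watts / 1000.0)


rows = {
    "NVIDIA H100": (1500, 700),
    "NVIDIA B200": (3500, 1200),
    "AMD MI300X": (1800, 750),
}
for name, (tps, w) in rows.items():
    print(f"{name}: {tokens_per_kwh(tps, w) / 1e6:.1f} million tokens/kWh")
```

Dividing the result by the facility PUE gives tokens per kWh drawn from the grid rather than per kWh delivered to the device.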

Tokens-per-Watt — The Emerging Metric

PUE measures the building. Tokens/watt measures the WORK.
A facility can have great PUE (1.1) but still be inefficient if its GPUs deliver few tokens per watt. The full efficiency chain is: Grid power → DC electrical efficiency (PUE) → GPU efficiency (tokens/W) → useful work (response quality). Hyperscalers are publishing tokens/watt benchmarks as the new sustainability KPI for AI workloads.

Tokens vs Other AI Metrics

Metric | What it measures | When used
Tokens / sec (per request) | Latency-focused: how fast user sees output | User-facing chatbot SLAs
Tokens / sec (aggregate throughput) | Capacity-focused: total tokens cluster delivers per second | Capacity planning
FLOPs / sec | Floating-point operations per second | Hardware spec sheets; less useful operationally
Time To First Token (TTFT) | Latency until first output token appears | Real-time UX targets
Inter-Token Latency (ITL) | Time between successive tokens in stream | Smooth output target (ITL ≤ 50 ms, i.e. ≥ 20 tokens/sec)
Tokens / kWh or Tokens / $ | Cost-of-output efficiency | Provider economics
Context window | Max tokens model can hold in one conversation (8K - 2M+) | Application design constraint
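
TTFT and ITL together determine how long a streamed response takes, and ITL is just the inverse of per-request tokens/sec. A small sketch, assuming a constant ITL for the whole response (real streams jitter); the 500 ms TTFT in the example is a made-up value:

```python
def tokens_per_sec_from_itl(itl_ms: float) -> float:
    """Per-request streaming rate implied by an inter-token latency."""
    return 1000.0 / itl_ms


def response_time_s(ttft_ms: float, itl_ms: float, output_tokens: int) -> float:
    """Time until the last token: TTFT for the first, ITL for each one after."""
    return (ttft_ms + itl_ms * max(output_tokens - 1, 0)) / 1000.0


print(tokens_per_sec_from_itl(50))     # 50 ms between tokens
print(response_time_s(500, 50, 101))   # hypothetical 500 ms TTFT, 101 tokens
```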

The OTHER Tokens — Not What This Section Is About

For context: in colocation commercial conversation, "tokens" can also mean capacity allocation tokens — units of reserved kW (e.g., "we have 50 × 10 kW tokens"). That meaning is informal and increasingly secondary in modern DC speak. "Tokens" without qualifier in 2026 typically means AI tokens.

Other Common Reliability Metrics

Term | Definition | Used for
MTBF | Mean Time Between Failures (hours) | How often a component fails. Higher = better.
MTTR | Mean Time To Repair (hours) | How long to fix once it fails. Lower = better.
Availability | MTBF / (MTBF + MTTR) | The fraction of time the system is up. Convert to nines.
SLA | Service Level Agreement: contractual uptime target | What you promise customers
RTO | Recovery Time Objective: max acceptable downtime per event | Disaster recovery planning
RPO | Recovery Point Objective: max acceptable data loss per event | Backup frequency planning
PUE | Power Usage Effectiveness: total facility power / IT power | Energy efficiency. 1.0 = perfect; modern DCs target < 1.5
WUE | Water Usage Effectiveness: water per kWh of IT load | Cooling water consumption
ERE | Energy Reuse Effectiveness: accounts for waste heat reuse | Holistic efficiency
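
The availability formula in the table is easy to apply directly. A minimal sketch with made-up MTBF and MTTR values (10,000 hours and 1 hour), chosen only because they land close to four nines:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Expected downtime per 8,760-hour year for an availability fraction."""
    return (1.0 - avail) * 8760.0


a = availability(10_000, 1)  # hypothetical component
print(f"availability {a:.6f}, about {annual_downtime_hours(a) * 60:.1f} min/year down")
```

The formula shows why MTTR matters as much as MTBF: halving repair time improves availability about as much as doubling the time between failures.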

Decoding a DC Spec Sheet — Atlas DC1 Example

Example · Atlas DC1 spine · Real-world DC marketing language decoded

"Atlas DC1 is a 2.5 MW Tier III equivalent colocation data center with 2N redundant electrical infrastructure (UPS, generators, distribution), N+1 mechanical (chillers), and a target of 99.99% availability. Two utility feeds, dual-path PDUs, automatic failover. PUE 1.4."

Decoded

Phrase | What it means
2.5 MW | Critical (IT) load capacity. Your servers can collectively draw up to 2.5 MW.
Tier III equivalent | Concurrent maintainability: any single component can be taken offline without dropping load. Not formally Uptime-certified.
Colocation | Multi-tenant facility: customers rent rack space + power. Atlas owns building + infrastructure.
2N redundant electrical | Two complete independent electrical systems (Side A + Side B). Either alone carries full load.
N+1 mechanical | One spare chiller: total 4 installed, 3 needed for cooling, 1 in standby.
99.99% availability target | 4 nines: 52.6 minutes per year max downtime budget. Tier III architecture supports this if operated well.
Two utility feeds | Atlas takes power from two different utility distribution feeders. If one feeder is cut, the other still serves its own side (A or B) independently.
Dual-path PDUs | Each rack has TWO power distribution feeds (one from Side A, one from Side B). Servers have dual PSUs, and either feed alone can carry 100% of the server's load.
Automatic failover | ATS transfers from utility to genset on outage, no human action.
PUE 1.4 | For every 1 kW of IT load, the facility consumes 1.4 kW total. Overhead (cooling, lighting, electrical losses, etc.) adds 0.4 kW. Modern target is < 1.5; hyperscale < 1.3.
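
Three of the decoded figures can be cross-checked with simple arithmetic: total facility draw at full IT load, the overhead implied by the PUE, and the yearly downtime budget. A sketch using only the numbers in the Atlas DC1 spec sentence; `decode_spec` is a hypothetical helper:

```python
def decode_spec(it_load_mw: float, pue: float, availability_pct: float) -> dict:
    """Derive facility power and downtime budget from headline spec figures."""
    facility_mw = it_load_mw * pue
    return {
        "facility_mw": facility_mw,                 # total draw at full IT load
        "overhead_mw": facility_mw - it_load_mw,    # cooling, lighting, losses
        "downtime_min_per_year": (100.0 - availability_pct) / 100.0 * 8760.0 * 60.0,
    }


atlas = decode_spec(it_load_mw=2.5, pue=1.4, availability_pct=99.99)
print(atlas)
```

With these inputs the facility draws about 3.5 MW at full load, of which about 1.0 MW is overhead, and the downtime budget is about 52.6 minutes per year.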

Data Center Types — Full Taxonomy

Beyond Tier classification, data centers are categorized by ownership + scale + business model. Each type has different design priorities, regulatory regimes, and economic models.

Type | Description | Typical scale | Ownership | Atlas DC1?
Enterprise (On-Premises) | Owned + operated by single organization for own IT | 500 kW – 5 MW | Owner-operated | No
Colocation (Colo) | 3rd-party owned facility; tenants lease space + power; tenants bring their own IT | 1 MW – 200+ MW | Multi-tenant leased | YES (Atlas DC1 model)
Hyperscale | Massive purpose-built for cloud/internet (AWS, Azure, Google, Meta) | 20 MW – 1 GW+ per campus | Owner-operated (cloud CSPs) | No
Edge | Small distributed; close to users for low latency. Often containerized. | 10 kW – 1 MW | Operator/enterprise | No
Modular / Prefab | Factory-built modules shipped to site; deploy in weeks not years | Scalable per unit | Variable | No
Wholesale Colo | Large-block leasing (1-50 MW per tenant) of power + shell space; tenant builds MEP | 1 MW – 50 MW per tenant | Multi-tenant leased | No
Retail Colo | Small-block leasing (single cabinets to small cages); operator provides built-out white space | 1 kW – 500 kW per tenant | Multi-tenant leased | No

2026 industry trend: AI/HPC reshapes the field
Global data center electricity consumption was roughly 415 TWh in 2024 (IEA estimate). AI/GPU workloads are pushing rack densities from a typical 6-12 kW/rack to 30-100+ kW/rack. Liquid cooling is no longer optional for AI compute. Northern Virginia, Dallas, Phoenix, Chicago, and Silicon Valley remain the dominant North American markets. EU's CSRD and Green Deal are pushing PUE targets below 1.3 in new builds.

Full DC Metrics Suite — PUE + DCiE + WUE + CUE

Beyond PUE, the industry tracks several efficiency metrics. Each captures a different sustainability dimension.

Metric | Formula | Range (best → worst) | What it tells you
PUE (Power Usage Effectiveness) | Total facility energy / IT energy | 1.0 → 3.0+ | Efficiency overhead. Modern target < 1.5; hyperscale < 1.3.
DCiE (Data Center infrastructure Efficiency) | 1 / PUE × 100% | 100% → 33% | Inverse of PUE in %. PUE 1.4 = DCiE 71%.
WUE (Water Usage Effectiveness) | Annual water use (L) / IT energy (kWh) | 0 → 5.0+ L/kWh | Cooling water consumption. Air-cooled DC: ~ 0. Evaporative-cooled: 1-3 L/kWh. Liquid-cooled with dry coolers: ~ 0.
CUE (Carbon Usage Effectiveness) | Annual CO₂ emissions (kg) / IT energy (kWh) | 0 → 1.0 kgCO₂/kWh | Grid-mix-dependent. Renewable-powered DC: ~ 0. Coal-grid: ~ 0.8.
ERE (Energy Reuse Effectiveness) | (Total energy − reused energy) / IT energy | 0 → 3.0+ | Credits energy reused (e.g., heating nearby buildings). Can be lower than PUE for facilities with district heat reuse.
SPUE (Server PUE) | Total server input power / power reaching the compute components (CPU, memory, storage) | 1.0 → 2.0+ | Server-internal efficiency (PSU + fan + voltage regulator losses)
tPUE (Total PUE) | PUE × SPUE | - | Ultimate efficiency from grid to compute. Captures everything.
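
Several of these metrics are one-line transformations of each other. A minimal sketch of the DCiE, ERE, and tPUE formulas from the table; the energy figures and the SPUE of 1.2 in the example are made-up inputs:

```python
def dcie_percent(pue: float) -> float:
    """DCiE = 1 / PUE, expressed as a percentage."""
    return 100.0 / pue


def ere(total_energy_kwh: float, reused_energy_kwh: float, it_energy_kwh: float) -> float:
    """ERE = (total energy - reused energy) / IT energy."""
    return (total_energy_kwh - reused_energy_kwh) / it_energy_kwh


def tpue(pue: float, spue: float) -> float:
    """tPUE = PUE x SPUE: grid-to-compute overhead."""
    return pue * spue


print(f"PUE 1.4 -> DCiE {dcie_percent(1.4):.0f}%")
print(f"ERE with 200 of 1,400 kWh reused: {ere(1400, 200, 1000):.2f}")  # PUE here is 1.40
print(f"tPUE at PUE 1.4, SPUE 1.2: {tpue(1.4, 1.2):.2f}")
```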

Static Transfer Switch (STS) — Detail

STS is the fast-acting electronic switch that enables Distributed Redundant systems. While an ATS (Automatic Transfer Switch) takes 100-200 ms because it relies on mechanical contactors, an STS transfers in < 4 ms (about 1/4 cycle at 60 Hz) using thyristors (SCRs) on each phase.

Aspect | STS detail
Transfer time | < 4 ms (about 1/4 cycle at 60 Hz). Below the ITIC curve threshold for IT equipment.
Sources | Two synchronized AC inputs (typically two UPS outputs), which must be in-phase + same magnitude
Switching mechanism | Thyristors (SCRs) on each phase, fired at zero-current crossing
Synchronization requirement | Sources must stay within ±5° phase angle. The PMS (power management system) keeps both UPS outputs synced.
Where used in Atlas DC1 | NOT used at facility level (Atlas DC1 is 2N IR: dual-fed loads, no transfer needed). USED at rack PDU level (rack-mount transfer switches, typically for single-corded equipment; dual-corded servers need no transfer).
Where critical | Distributed Redundant (DR) systems: STS shifts load from failed module to catcher
Common vendors and ratings | Eaton, Schneider, Vertiv 100A-4000A static transfer switches
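
The "1/4 cycle" figure depends on grid frequency, which matters when comparing spec sheets from 50 Hz and 60 Hz regions. A small sketch of the conversion, plus a check against the ±5° synchronization window from the table; both function names are illustrative:

```python
def quarter_cycle_ms(freq_hz: float) -> float:
    """Duration of one quarter of an AC cycle in milliseconds."""
    return 1000.0 / freq_hz / 4.0


def in_sync(phase_a_deg: float, phase_b_deg: float, window_deg: float = 5.0) -> bool:
    """True if two sources are within the phase window (handles wrap at 360)."""
    diff = abs(phase_a_deg - phase_b_deg) % 360.0
    return min(diff, 360.0 - diff) <= window_deg


print(f"60 Hz: {quarter_cycle_ms(60):.2f} ms")
print(f"50 Hz: {quarter_cycle_ms(50):.2f} ms")
print(in_sync(358.0, 2.0))  # 4 degrees apart across the wrap-around
```

At 60 Hz a quarter cycle is about 4.17 ms, so a "< 4 ms" transfer is slightly faster than a quarter cycle; at 50 Hz the same transfer is well inside it.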

DCIM vs BMS vs EPMS — Three Different Systems

Modern DCs have multiple monitoring + control systems. They overlap but have distinct responsibilities.

System | Full name | Owns | Doesn't own | Vendor examples
BMS | Building Management System | HVAC, lighting controls, building security, fire alarm interface, water systems | IT-room rack-level, electrical metering down to circuits | Honeywell, Johnson Controls, Siemens Desigo, Schneider EcoStruxure Building
EPMS | Electrical Power Monitoring System | Switchgear meters, power quality, breaker positions, alarms, energy logging | Building HVAC, IT inventory | Schneider PowerLogic, Eaton Foreseer, GE iCM, Siemens SENTRON
DCIM | Data Center Infrastructure Management | IT inventory (every rack, every server), capacity planning, asset lifecycle, cabling docs, real-time PUE, alarm aggregation | Building HVAC controls, building security access | Vertiv Trellis, Schneider EcoStruxure IT, Sunbird, Nlyte

In sophisticated facilities, BMS + EPMS + DCIM all push data into a single command center dashboard via OPC-UA, BACnet, Modbus, or SNMP integrations. In simpler facilities, they may be islands.

Cooling Topology — Hot Aisle / Cold Aisle + Containment

Topology | Description | PUE impact
No containment (legacy) | Random rack orientation; cold + hot air mix in room | PUE 2.0+
Hot aisle / cold aisle | Racks arranged front-to-front (cold) and back-to-back (hot). Reduces mixing. | PUE 1.7-1.9
Cold aisle containment (CAC) | Cold aisles enclosed (doors + ceiling); cold supply pressurized; hot return open to room | PUE 1.4-1.6
Hot aisle containment (HAC) | Hot aisles enclosed; hot return ducted directly back to CRAH; cold supply open to room | PUE 1.3-1.5 (slightly better than CAC for fire suppression)
Pod / module | Self-contained cooling unit serving 1-10 racks; in-row CRAH or chimney | PUE 1.2-1.4
Rear-door heat exchanger | Liquid coil on rack rear extracts heat at source | PUE 1.1-1.2
Direct liquid cooling (DLC) | Coolant circulated through CPU/GPU cold plates | PUE 1.05-1.15 (mandatory for high-density AI)
Immersion cooling | Servers submerged in dielectric fluid | PUE 1.02-1.10

Free Cooling — Economizer Hours

When outdoor air is cool enough, mechanical chillers can shut off and outdoor air provides cooling directly. Free cooling hours per year vary by climate.

Climate zone | Free cooling hours/yr | Method
Phoenix, AZ (hot dry) | 1,500-2,500 | Direct evaporative (when outdoor < 65°F WB)
Northern Virginia (hot humid) | 3,500-4,500 | Air-side or water-side
Dublin, Ireland (cool maritime) | 7,000-8,500 | Air-side direct (most of the year)
Stockholm, Sweden (cold) | 8,500+ | Air-side direct nearly year-round

Maximizing free cooling hours is the single biggest PUE-reduction lever. This is why hyperscale operators pick locations like Iowa, Northern Sweden, and Dublin: they offer massive free cooling hours.

Reading a Colocation Contract

Engineers transitioning to client-facing roles encounter commercial language. Here's the decoder.

Term | What it means
MSA (Master Services Agreement) | Top-level contract between operator + tenant. References specific Order Forms.
Order Form / SOW | Specific engagement: power kW, space sq ft, term, rates
SLA (Service Level Agreement) | Uptime guarantee (e.g., 99.99%), credit calculation if missed
Power capacity reserved | Maximum kW the tenant can draw (sometimes expressed in capacity "tokens", i.e. blocks of reserved kW). Tenant pays for the reservation regardless of actual use.
Power utilization (or "absorption") | Actual kW consumed. May be billed separately from reservation.
Ramp schedule | Pre-agreed timeline for capacity expansion (e.g., 100 kW now, 250 kW in 6 mo, 500 kW in 12 mo)
Cross-connect | Fiber/copper interconnection between tenants or between tenant + carrier (extra fee per port)
Remote hands | Operator's tech performs physical work in tenant's cabinet (extra hourly fee)
NRC vs MRC | Non-Recurring Charge (one-time install/setup) vs Monthly Recurring Charge (ongoing)
Power Pass-through | Operator bills tenant for actual power consumed at utility cost + markup
Co-marketed PUE | Operator publishes facility PUE; tenant uses it for own sustainability reporting

TIA-942-B — The Other Tier Standard, In Detail

While Uptime Institute Tiers focus on topology, TIA-942-B is comprehensive — it covers facility infrastructure across multiple subsystems and assigns Rated levels (1-4) to each subsystem independently.

Subsystem | Rated 1 (basic) | Rated 2 (redundant) | Rated 3 (concurrently maintainable) | Rated 4 (fault tolerant)
Telecom Pathway | Single path | Single path + redundant cabinets | Multiple paths · concurrently maintainable | Multiple paths · fault tolerant
Architectural / Structural | No specific seismic / fire requirements | Seismic Zone 0/1 design | Seismic Zone 2/3 + 1-hour fire walls | Seismic Zone 4 + 2-hour fire walls + blast resistance
Electrical | Single utility | Single utility + N+1 components | Multiple utility feeds + N+1 + concurrently maintainable | Multiple utility feeds + 2N + fault tolerant
Mechanical | Single source | N+1 components | Multiple sources + concurrently maintainable | Multiple sources + fault tolerant

A facility could be Electrical Rated-3 but Architectural Rated-1 — TIA acknowledges that subsystems can have different reliability requirements. Uptime Institute Tier rating, by contrast, is single-number for the whole facility.

Common Confusions — Quick Reference

If you hear... | Don't confuse with...
"Tier III" | "3 nines": different concepts. Tier III's historical implied availability (99.982%) sits between 3 and 4 nines, and well-run Tier III sites typically target 4 nines.
"2N" | "N+2": 2N = two complete systems. N+2 = one system with 2 spare units.
"Concurrent maintainability" | "Fault tolerance": concurrent = planned outage OK; fault tolerant = unplanned OK too.
"Tier-Certified" | "Tier-rated" or "Tier equivalent": only Uptime Institute audited facilities can claim "certified."
"5 nines" | "Tier IV": 5 nines is operational; Tier IV is structural. Not 1:1.
"PUE 1.0" | Marketing: physically impossible (mech + lighting + losses always > 0). Best real DCs are ~ 1.1.
"4N3" or other custom notation | Standard notation: N, N+1, 2N, 2(N+1), 3N. Anything outside this is vendor-specific or imprecise.

Data Center Tiers + Reliability Reference · v1.0