💼 Built for Finance & Procurement Teams

GPU Compute TCO Calculator

Model the true total cost of AI compute — beyond the hourly rate. Factor in utilization, reserved pricing, egress, and overhead. Compare providers side-by-side on an apples-to-apples financial basis.

Model Inputs
✓ On-demand pricing ✓ Reserved / committed pricing ✓ Spot / preemptible ✓ Utilization-adjusted cost ✓ Egress & networking

Model Parameters

Adjust inputs to model your workload

- GPUs: 1 / 128 / 512
- Duration: 1 day / 1 month / 1 year
- Utilization: 10% (dev/test) / 50% / 100% (burst)
- Egress: none / 5 TB / 50 TB
- Support & overhead: none / 10% / 30%

Total Cost of Ownership

Live Model (H100 SXM5): outputs update with the inputs above.
- Total Cost for the modeled period
- Effective $/GPU-hr (utilization-adjusted)
- Cost / PFLOP-hr
- Cost breakdown: Compute (GPU hours) ~90%, Egress & Networking ~5%, Support & Overhead ~5%
Model methodology: Total cost = (GPUs × hours × price × utilization adjustment) + egress costs + overhead. Egress rates: $0.09/GB hyperscalers, $0.01/GB neoclouds. Reserved pricing reflects upfront commitment discounts. Spot pricing includes an estimated 15% interruption buffer.
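The methodology above can be written out as a runnable sketch. The egress rates, overhead default, and spot buffer are the ones stated in the note; the function signature itself is illustrative, and this follows one reading of the formula in which utilization adjusts the effective rate rather than the billed total:

```python
EGRESS_PER_GB = {"hyperscaler": 0.09, "neocloud": 0.01}
SPOT_INTERRUPTION_BUFFER = 0.15  # estimated wasted compute from restarts

def total_cost(gpus, hours, hourly_rate, utilization=1.0, egress_gb=0.0,
               provider_type="hyperscaler", overhead_pct=0.05, spot=False):
    """Total cost = compute + egress + overhead; the effective rate is
    the total spread over utilized GPU-hours only."""
    rate = hourly_rate * (1 + SPOT_INTERRUPTION_BUFFER) if spot else hourly_rate
    compute = gpus * hours * rate  # billed regardless of utilization
    egress = egress_gb * EGRESS_PER_GB[provider_type]
    overhead = compute * overhead_pct
    total = compute + egress + overhead
    effective_per_gpu_hr = total / (gpus * hours * utilization)
    return total, effective_per_gpu_hr
```

For example, 1 GPU for 100 hours at $2.00/hr with the default 5% overhead bills $200 of compute plus $10 of overhead, for $210 total and an effective $2.10/GPU-hr at full utilization.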

Provider Comparison

H100 SXM5

Same GPU, same duration, same utilization — what your workload would cost across all providers. Your selected provider is highlighted.

Utilization Sensitivity

Finance Model

How total cost changes at different utilization rates — the single biggest variable in any compute budget. Highlighted cell = your current assumption.
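The sensitivity itself is simple division: the billed hourly rate spread over only the hours the GPU actually does work. A minimal sketch (the $2.49/hr figure is illustrative, not a quoted price):

```python
def effective_rate(hourly_rate, utilization):
    """Effective $/GPU-hr: the billed rate spread over utilized hours only."""
    return hourly_rate / utilization

# Illustrative on-demand H100 rate; swap in your own quote.
for util in (0.10, 0.50, 1.00):
    print(f"{util:.0%} utilization -> ${effective_rate(2.49, util):.2f}/GPU-hr")
# 10% utilization -> $24.90/GPU-hr
# 50% utilization -> $4.98/GPU-hr
# 100% utilization -> $2.49/GPU-hr
```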

Training Cost Estimates

At Current Rate

Estimated cost to train common model architectures from scratch using your selected configuration. Based on published GPU-hour benchmarks.

Sources & assumptions: GPU-hours sourced from published research (Chinchilla, LLaMA, and Mistral papers) and infrastructure blog posts. Figures assume 100% MFU (model FLOPs utilization); real-world efficiency is typically 30–55%, so actual costs run roughly 2–3× higher.
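The MFU adjustment is a single division. A sketch of turning a published (ideal) GPU-hour figure into a realistic estimate; the GPU-hour count and hourly rate below are placeholders, not sourced values:

```python
def realistic_training_cost(ideal_gpu_hours, hourly_rate, mfu=0.40):
    """Scale ideal (100% MFU) GPU-hours by achieved MFU, then price them."""
    return ideal_gpu_hours / mfu * hourly_rate

# Placeholder: a run quoted at 1M ideal GPU-hours, $2.49/hr, 40% MFU
# -> 2.5M real GPU-hours -> $6,225,000
```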

Inference Economics

At 100k tok/s throughput

Estimated cost-per-token metrics at your selected provider and GPU rate, assuming continuous batching at a representative throughput for the GPU tier.

Throughput assumptions: H100 SXM5 ~3,000 tok/s per GPU (Llama-3 70B, int8). H100 PCIe ~2,200 tok/s. A100 80GB ~1,400 tok/s. A10G ~400 tok/s. Output token cost ≈ 3× input token cost due to KV-cache constraints.
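Given those throughput assumptions, cost per token falls out directly: dollars per GPU-hour divided by tokens generated per GPU-hour. A sketch (the $2.49/hr rate in the example is illustrative):

```python
TOK_PER_SEC = {"H100 SXM5": 3000, "H100 PCIe": 2200,
               "A100 80GB": 1400, "A10G": 400}

def cost_per_million_tokens(gpu, hourly_rate):
    """$/1M tokens = $/GPU-hr divided by tokens produced per GPU-hour."""
    tokens_per_hour = TOK_PER_SEC[gpu] * 3600
    return hourly_rate / tokens_per_hour * 1_000_000

# H100 SXM5 at $2.49/hr -> 10.8M tok/hr -> about $0.23 per 1M tokens
```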
Methodology

How This Model Works

The assumptions and data sources behind every number.

📐

Utilization-Adjusted Cost

On-demand GPU instances are billed by the hour regardless of actual GPU utilization. A GPU sitting at 40% utilization still costs the same as one at 100%. This model makes that cost visible — what you're actually paying per effective compute unit.

📦

True Egress Costs

Hyperscalers charge $0.08–$0.09/GB for data egress. Neoclouds charge ~$0.01/GB or nothing. For large model checkpoints and dataset transfers, this can add 5–15% to your total compute bill. This model surfaces the full picture.
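To see how egress lands in that 5–15% band, compute it as a share of total spend. A sketch using the per-GB rates cited above (the 50 TB / $50,000 example is illustrative):

```python
def egress_share(egress_gb, per_gb_rate, compute_cost):
    """Fraction of total spend attributable to data egress."""
    egress = egress_gb * per_gb_rate
    return egress / (egress + compute_cost)

# 50 TB out of a hyperscaler at $0.09/GB = $4,500 on a $50,000 compute
# bill -> egress is roughly 8% of total spend
```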

📅

Reserved vs On-Demand

1-year reserved instances typically cost 25–35% less than on-demand; 3-year reserved can be 40–55% less. At a 30% discount, a 1-year commitment breaks even once you would otherwise buy roughly 8–9 months of equivalent on-demand time within the year — past that, reserved pricing wins for stable workloads.
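The break-even arithmetic can be sketched directly (a simplification that assumes the full year is committed up front at a flat discount, with no partial-upfront options):

```python
def breakeven_months(discount):
    """Months of equivalent on-demand usage at which a 1-year
    flat-discount commitment costs the same as paying on-demand."""
    return 12 * (1 - discount)

# breakeven_months(0.30) -> 8.4 months; breakeven_months(0.25) -> 9.0
```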

Spot / Preemptible Pricing

Spot instances can be 60–85% cheaper than on-demand but are interruptible. This model adds a 15% interruption buffer to spot pricing to account for wasted compute from restarts — giving a more realistic effective cost.
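The interruption buffer grosses the discounted rate back up to cover restarted work. A sketch (the $2.49/hr base rate and 70% discount in the example are illustrative):

```python
def effective_spot_rate(on_demand_rate, spot_discount, buffer=0.15):
    """Spot $/GPU-hr grossed up for compute lost to interruptions."""
    return on_demand_rate * (1 - spot_discount) * (1 + buffer)

# $2.49/hr on-demand at a 70% spot discount with a 15% buffer
# -> about $0.86/hr effective, still well under on-demand
```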

🧮

PFLOP-hour Metric

Cost per petaFLOP-hour normalizes across different GPU architectures. An A100 at $1.79/hr and an H100 at $2.49/hr look similar in raw price, but the H100 delivers 3× the FP16 throughput — making the cost per unit of compute dramatically different.
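The normalization is one division per GPU. A sketch using the commonly published dense FP16 peak specs (peak figures, so real sustained throughput is lower; the hourly rates are the illustrative ones above):

```python
# Dense FP16 peak throughput in PFLOPS (published peak specs).
PEAK_FP16_PFLOPS = {"A100 80GB": 0.312, "H100 SXM5": 0.990}

def cost_per_pflop_hr(gpu, hourly_rate):
    """Normalize $/GPU-hr by peak PFLOPS to compare architectures."""
    return hourly_rate / PEAK_FP16_PFLOPS[gpu]

# A100 at $1.79/hr -> ~$5.74 per PFLOP-hr
# H100 at $2.49/hr -> ~$2.52 per PFLOP-hr, cheaper per unit of compute
```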

🎓

Training Hour Estimates

GPU-hours for training standard model sizes sourced from published academic papers (LLaMA, Chinchilla, Mistral). These are theoretical minimums assuming 100% MFU. Production training runs typically achieve 30–55% MFU due to pipeline bubbles and I/O.