Model the true total cost of AI compute — beyond the hourly rate. Factor in utilization, reserved pricing, egress, and overhead. Compare providers side-by-side on an apples-to-apples financial basis.
Adjust inputs to model your workload
Same GPU, same duration, same utilization — what your workload would cost across all providers. Your selected provider is highlighted.
How total cost changes at different utilization rates — the single biggest variable in any compute budget. Highlighted cell = your current assumption.
Estimated cost to train common model architectures from scratch using your selected configuration. Based on published GPU-hour benchmarks.
Estimated cost-per-token metrics at your selected provider and GPU rate. Assumes continuous batching at a throughput representative of the GPU tier.
The assumptions and data sources behind every number.
On-demand GPU instances are billed by the hour regardless of actual GPU utilization. A GPU sitting at 40% utilization still costs the same as one at 100%. This model makes that gap visible: what you're actually paying per hour of useful compute.
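That adjustment can be sketched in a few lines; the $2.50/hr rate below is illustrative, not any provider's published price:

```python
def effective_hourly_cost(hourly_rate: float, utilization: float) -> float:
    """Billed hourly rate divided by utilization: the price of one hour
    of *useful* GPU work, not one hour on the meter."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_rate / utilization

# A hypothetical $2.50/hr GPU at 40% utilization effectively costs
# $6.25 per hour of useful compute -- 2.5x the sticker price.
sticker = effective_hourly_cost(2.50, 1.00)
actual = effective_hourly_cost(2.50, 0.40)
```

The same function explains why driving utilization up is the cheapest optimization available: the denominator is the only lever that doesn't require renegotiating a rate.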
Hyperscalers charge $0.08–$0.09/GB for data egress. Neoclouds charge ~$0.01/GB or nothing. For large model checkpoints and dataset transfers, this can add 5–15% to your total compute bill. This model surfaces the full picture.
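The egress arithmetic itself is simple multiplication; the 5 TB transfer volume below is a hypothetical workload, and the per-GB rates are the endpoints quoted above:

```python
def egress_cost(gb_transferred: float, per_gb_rate: float) -> float:
    """Egress bill for moving checkpoints and datasets out of a cloud."""
    return gb_transferred * per_gb_rate

# Hypothetical 5 TB of checkpoint/dataset transfers per month:
hyperscaler = egress_cost(5_000, 0.09)  # $450 at a $0.09/GB rate
neocloud = egress_cost(5_000, 0.01)     # $50 at a $0.01/GB rate
```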
1-year reserved instances typically cost 25–35% less than on-demand. 3-year reserved can be 40–55% less. The break-even point is roughly 3 months of continuous usage — after that, reserved pricing wins for stable workloads.
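A sketch of the reserved-vs-on-demand comparison for a continuously running workload; the $2.49/hr rate is illustrative, and the discounts are midpoints of the ranges above:

```python
HOURS_PER_YEAR = 8_760

def annual_cost(hourly_rate: float, discount: float = 0.0) -> float:
    """Annual bill for a continuously running instance, with an optional
    reserved/committed-use discount applied to the on-demand rate."""
    return hourly_rate * HOURS_PER_YEAR * (1.0 - discount)

on_demand = annual_cost(2.49)                     # pay-as-you-go
reserved_1y = annual_cost(2.49, discount=0.30)    # midpoint of 25-35% off
reserved_3y = annual_cost(2.49, discount=0.475)   # midpoint of 40-55% off
```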
Spot instances can be 60–85% cheaper than on-demand but are interruptible. This model adds a 15% interruption buffer to spot pricing to account for wasted compute from restarts — giving a more realistic effective cost.
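The buffered spot calculation looks like this; the 70% spot discount in the example is an assumed value within the 60–85% range above:

```python
def spot_effective_rate(on_demand_rate: float, spot_discount: float,
                        interruption_buffer: float = 0.15) -> float:
    """Spot price inflated by a buffer for compute wasted on interruptions
    and restarts, giving a more realistic effective hourly rate."""
    return on_demand_rate * (1.0 - spot_discount) * (1.0 + interruption_buffer)

# Assumed $2.49/hr on-demand rate with a 70% spot discount:
# even after the 15% buffer, spot stays well under on-demand.
effective = spot_effective_rate(2.49, spot_discount=0.70)
```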
Cost per petaFLOP-hour normalizes across different GPU architectures. An A100 at $1.79/hr and an H100 at $2.49/hr look similar in raw price, but the H100 delivers 3× the FP16 throughput — making the cost per unit of compute dramatically different.
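The normalization divides the hourly rate by peak throughput; the FP16 figures below are approximate peak dense-compute numbers (A100 ~312 TFLOPS, H100 ~990 TFLOPS), used here for illustration:

```python
def cost_per_pflop_hour(hourly_rate: float, fp16_tflops: float) -> float:
    """Dollars per petaFLOP-hour of peak FP16 compute.
    Converts TFLOPS to PFLOPS (divide by 1000) before normalizing."""
    return hourly_rate / (fp16_tflops / 1_000.0)

# Similar sticker prices, very different cost per unit of compute:
a100 = cost_per_pflop_hour(1.79, 312)   # roughly $5.74 per PFLOP-hr
h100 = cost_per_pflop_hour(2.49, 990)   # roughly $2.52 per PFLOP-hr
```

Despite the higher hourly rate, the H100 delivers compute at less than half the A100's cost per petaFLOP-hour under these assumptions.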
GPU-hours for training standard model sizes sourced from published academic papers (LLaMA, Chinchilla, Mistral). These are theoretical minimums assuming 100% MFU. Production training runs typically achieve 30–55% MFU due to pipeline bubbles and I/O.
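The MFU adjustment scales a published theoretical figure up to realistic wall-clock cost; the 100,000 GPU-hour workload, $2.49/hr rate, and 40% MFU below are all assumed values for illustration:

```python
def training_cost(gpu_hours_at_full_mfu: float, hourly_rate: float,
                  mfu: float = 0.40) -> float:
    """Estimated training bill from a published GPU-hour benchmark.
    Benchmarks assume 100% MFU; real runs achieve `mfu`, so wall-clock
    GPU-hours (and therefore cost) scale up by 1 / mfu."""
    if not 0.0 < mfu <= 1.0:
        raise ValueError("mfu must be in (0, 1]")
    return gpu_hours_at_full_mfu / mfu * hourly_rate

# A hypothetical 100,000 theoretical GPU-hours at $2.49/hr and 40% MFU
# becomes 250,000 wall-clock GPU-hours, or about $622,500.
estimate = training_cost(100_000, 2.49, mfu=0.40)
```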