Model the true total cost of AI compute — beyond the hourly rate. Factor in utilization, reserved pricing, egress, and overhead. Compare providers side-by-side on an apples-to-apples financial basis.
Adjust inputs to model your workload
Same GPU, same duration, same utilization — what your workload would cost across all providers. Your selected provider is highlighted.
How total cost changes at different utilization rates — the single biggest variable in any compute budget. Highlighted cell = your current assumption.
Estimated cost to train common model architectures from scratch using your selected configuration. Based on published GPU-hour benchmarks.
Estimated cost-per-token metrics at your selected provider and GPU rate. Assumes continuous batching at a throughput representative of the GPU tier.
The assumptions and data sources behind every number.
On-demand GPU instances are billed by the hour regardless of actual GPU utilization. A GPU sitting at 40% utilization still costs the same as one at 100%. This model makes that gap visible: what you're actually paying per hour of useful compute.
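That adjustment can be sketched in a few lines; the $2.50/hr rate below is illustrative, not any provider's published price:

```python
def effective_hourly_cost(hourly_rate: float, utilization: float) -> float:
    """Billed hourly rate divided by utilization: the price of one hour
    of *useful* GPU work, not one hour on the meter."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_rate / utilization

# A hypothetical $2.50/hr GPU at 40% utilization effectively costs
# $6.25 per hour of useful compute -- 2.5x the sticker price.
sticker = effective_hourly_cost(2.50, 1.00)
actual = effective_hourly_cost(2.50, 0.40)
```

The same function explains why driving utilization up is the cheapest optimization available: the denominator is the only lever that doesn't require renegotiating a rate.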
Hyperscalers charge $0.08–$0.09/GB for data egress. Neoclouds charge ~$0.01/GB or nothing. For large model checkpoints and dataset transfers, this can add 5–15% to your total compute bill. This model surfaces the full picture.
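The egress arithmetic itself is simple multiplication; the 5 TB transfer volume below is a hypothetical workload, and the per-GB rates are the endpoints quoted above:

```python
def egress_cost(gb_transferred: float, per_gb_rate: float) -> float:
    """Egress bill for moving checkpoints and datasets out of a cloud."""
    return gb_transferred * per_gb_rate

# Hypothetical 5 TB of checkpoint/dataset transfers per month:
hyperscaler = egress_cost(5_000, 0.09)  # $450 at a $0.09/GB rate
neocloud = egress_cost(5_000, 0.01)     # $50 at a $0.01/GB rate
```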
1-year reserved instances typically cost 25–35% less than on-demand. 3-year reserved can be 40–55% less. The break-even point is roughly 3 months of continuous usage — after that, reserved pricing wins for stable workloads.
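A sketch of the reserved-vs-on-demand comparison for a continuously running workload; the $2.49/hr rate is illustrative, and the discounts are midpoints of the ranges above:

```python
HOURS_PER_YEAR = 8_760

def annual_cost(hourly_rate: float, discount: float = 0.0) -> float:
    """Annual bill for a continuously running instance, with an optional
    reserved/committed-use discount applied to the on-demand rate."""
    return hourly_rate * HOURS_PER_YEAR * (1.0 - discount)

on_demand = annual_cost(2.49)                     # pay-as-you-go
reserved_1y = annual_cost(2.49, discount=0.30)    # midpoint of 25-35% off
reserved_3y = annual_cost(2.49, discount=0.475)   # midpoint of 40-55% off
```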
Spot instances can be 60–85% cheaper than on-demand but are interruptible. This model adds a 15% interruption buffer to spot pricing to account for wasted compute from restarts — giving a more realistic effective cost.
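The buffered spot calculation looks like this; the 70% spot discount in the example is an assumed value within the 60–85% range above:

```python
def spot_effective_rate(on_demand_rate: float, spot_discount: float,
                        interruption_buffer: float = 0.15) -> float:
    """Spot price inflated by a buffer for compute wasted on interruptions
    and restarts, giving a more realistic effective hourly rate."""
    return on_demand_rate * (1.0 - spot_discount) * (1.0 + interruption_buffer)

# Assumed $2.49/hr on-demand rate with a 70% spot discount:
# even after the 15% buffer, spot stays well under on-demand.
effective = spot_effective_rate(2.49, spot_discount=0.70)
```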
Cost per petaFLOP-hour normalizes across different GPU architectures. An A100 at $1.79/hr and an H100 at $2.49/hr look similar in raw price, but the H100 delivers 3× the FP16 throughput — making the cost per unit of compute dramatically different.
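The normalization divides the hourly rate by peak throughput; the FP16 figures below are approximate peak dense-compute numbers (A100 ~312 TFLOPS, H100 ~990 TFLOPS), used here for illustration:

```python
def cost_per_pflop_hour(hourly_rate: float, fp16_tflops: float) -> float:
    """Dollars per petaFLOP-hour of peak FP16 compute.
    Converts TFLOPS to PFLOPS (divide by 1000) before normalizing."""
    return hourly_rate / (fp16_tflops / 1_000.0)

# Similar sticker prices, very different cost per unit of compute:
a100 = cost_per_pflop_hour(1.79, 312)   # roughly $5.74 per PFLOP-hr
h100 = cost_per_pflop_hour(2.49, 990)   # roughly $2.52 per PFLOP-hr
```

Despite the higher hourly rate, the H100 delivers compute at less than half the A100's cost per petaFLOP-hour under these assumptions.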
GPU-hours for training standard model sizes sourced from published academic papers (LLaMA, Chinchilla, Mistral). These are theoretical minimums assuming 100% MFU. Production training runs typically achieve 30–55% MFU due to pipeline bubbles and I/O.
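The MFU adjustment scales a published theoretical figure up to realistic wall-clock cost; the 100,000 GPU-hour workload, $2.49/hr rate, and 40% MFU below are all assumed values for illustration:

```python
def training_cost(gpu_hours_at_full_mfu: float, hourly_rate: float,
                  mfu: float = 0.40) -> float:
    """Estimated training bill from a published GPU-hour benchmark.
    Benchmarks assume 100% MFU; real runs achieve `mfu`, so wall-clock
    GPU-hours (and therefore cost) scale up by 1 / mfu."""
    if not 0.0 < mfu <= 1.0:
        raise ValueError("mfu must be in (0, 1]")
    return gpu_hours_at_full_mfu / mfu * hourly_rate

# A hypothetical 100,000 theoretical GPU-hours at $2.49/hr and 40% MFU
# becomes 250,000 wall-clock GPU-hours, or about $622,500.
estimate = training_cost(100_000, 2.49, mfu=0.40)
```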