Cluster Ops TCO Calculator
When does an on-prem Mac Mini MLX cluster beat a frontier API? Punch in your monthly token volume, model class, and latency target. The numbers below are real, the math is shown, and the break-even cuts both ways: sometimes the API wins.
Inputs
Cluster Ops sizing
vs frontier API — same horizon
Show the math
Where the API wins. Low monthly volume (under ~30M tokens/month for a medium-class model), bursty traffic without a sustained peak, or frontier-quality requirements where on-prem is not yet a fair comparison (full-precision GPT-5/Opus class). When that's your profile, the calculator tells you so.
Where on-prem wins. Sustained high volume (~100M+ tokens/month for a medium-class model), data-residency requirements (HIPAA, GDPR, public sector), or workflows where the latency to a co-located mini beats the round-trip to a hyperscaler region. Cluster Ops productizes this: hot-swappable models, automated eviction, hardened deploys, and a monthly cost-per-token report.
Assumptions are visible (click "Show the math" above) and deliberately pessimistic for on-prem: an entry-level M4 Pro mini at $2,000, power at $0.15/kWh, $5/mo in amortized colo fees, and one ops-hour per mini per month at $200/hr. Plug in your own numbers; the page is pure client-side JS and the math is in plain view.
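For the curious, the core break-even arithmetic fits in a few lines. Here is a minimal sketch in plain JS, not the calculator's actual code: it uses the stated assumptions above, plus two values the page doesn't pin down, which we mark as hypothetical (a 36-month hardware amortization horizon, ~50W average draw, and a blended API price per million tokens).

```javascript
// Monthly on-prem cost per mini. Stated assumptions from the page,
// except amortMonths and avgDrawKw, which are hypothetical.
function onPremMonthlyCost({
  hardwareUsd = 2000, // entry-level M4 Pro mini (stated)
  amortMonths = 36,   // amortization horizon (hypothetical)
  avgDrawKw = 0.05,   // ~50W average draw (hypothetical)
  kwhUsd = 0.15,      // power price (stated)
  coloUsd = 5,        // amortized colo per month (stated)
  opsHours = 1,       // ops-hours per mini per month (stated)
  opsRateUsd = 200,   // ops rate (stated)
} = {}) {
  const hoursPerMonth = 730; // average hours in a month
  return (
    hardwareUsd / amortMonths +
    avgDrawKw * hoursPerMonth * kwhUsd +
    coloUsd +
    opsHours * opsRateUsd
  );
}

// Monthly token volume at which on-prem and API costs are equal,
// given a blended API price in USD per million tokens.
function breakEvenTokensPerMonth(apiUsdPerMTok, cost = onPremMonthlyCost()) {
  return (cost / apiUsdPerMTok) * 1e6;
}
```

Under these assumptions, a hypothetical blended API price of a few dollars per million tokens puts the break-even in the tens of millions of tokens per month per mini, which is the same order as the bands quoted above. Change any input and the break-even moves with it, which is exactly why the calculator exists.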