Cluster Ops TCO Calculator

When does on-prem Mac Mini MLX beat frontier API? Punch in your monthly token volume + model class + latency target. The numbers below are real, the math is shown, and the break-even is unforgiving in either direction — sometimes the API wins.

Inputs

Monthly token volume: 100M. In millions; slide to your typical monthly throughput across all calls.
Model class: medium (30B–70B). Determines tok/sec per mini and which frontier API tier is the fair comparison.
Latency target: 150 ms. Tighter latency means more headroom, which means more minis; the 150 ms default covers most chat + RAG.
Peak-to-average ratio: 3.0×. If your peak hour is 3× the average, you size for peak to keep latency.
TCO horizon: 36 mo, matching typical Mac Mini service life. Sets the TCO window below.
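
If you want to sanity-check the sizing yourself, here is a minimal sketch of that step in plain JS. It is not the page's actual source: the per-mini throughput, the 70% headroom factor, and the function names are assumptions; substitute your own MLX benchmarks.

```js
// Sketch of the sizing step (not the page's actual source). The per-mini
// aggregate throughput is an assumed figure for a quantized 30B–70B model
// served with batching on an M4 Pro class mini; replace with your benchmark.
const SECONDS_PER_MONTH = 30 * 24 * 3600;
const TOKS_PER_SEC_PER_MINI = 60;   // assumed aggregate MLX throughput, medium class
const HEADROOM = 0.7;               // run at ~70% utilization to hold the latency target

function miniCount(tokensPerMonthMillions, peakMultiplier) {
  const avgToksPerSec = (tokensPerMonthMillions * 1e6) / SECONDS_PER_MONTH;
  const peakToksPerSec = avgToksPerSec * peakMultiplier;
  return Math.ceil(peakToksPerSec / (TOKS_PER_SEC_PER_MINI * HEADROOM));
}

// With the defaults (100M tokens/mo, 3.0× peak): ceil(115.7 / 42) = 3 minis.
```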

Cluster Ops sizing

For your inputs the calculator reports: Mac Mini count at peak, one-time capex, monthly opex (power + colo + ops), cost per million tokens on Cluster Ops, and TCO over the chosen horizon.
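
A matching sketch of the TCO roll-up, again not the page's source. It takes a mini count from the sizing sketch above and the pessimistic defaults listed at the bottom of the page ($2,000 per mini, roughly $210 per mini per month in opex).

```js
// Sketch of the TCO roll-up (not the page's actual source). Capex and opex
// come from the stated defaults: $2,000/mini and ~$210/mini/month
// (power + colo + 1 ops-hour; see the assumptions at the bottom of the page).
const CAPEX_PER_MINI = 2000;
const OPEX_PER_MINI_MONTH = 210;

function clusterTco(minis, tokensPerMonthMillions, horizonMonths) {
  const capex = minis * CAPEX_PER_MINI;
  const monthlyOpex = minis * OPEX_PER_MINI_MONTH;
  const tco = capex + monthlyOpex * horizonMonths;
  const costPerMTok = tco / (tokensPerMonthMillions * horizonMonths);
  return { capex, monthlyOpex, tco, costPerMTok };
}

// With 3 minis, 100M tokens/mo, 36 mo: $6,000 capex, $630/mo opex,
// $28,680 TCO, ≈ $7.97 per million tokens (under the assumptions above).
```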

vs frontier API — same horizon

The same volume and horizon are priced against GPT-4o, Claude Sonnet, and GPT-4o-mini, and the calculator reports the break-even volume versus the cheapest comparable API. Click "Show the math" to see every step of the derivation.
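
And a sketch of the API-side comparison and break-even search, reusing miniCount and clusterTco from the sketches above. The per-million-token prices and the 75/25 input/output split are illustrative assumptions; check current published pricing before leaning on them.

```js
// Sketch, not the page's source. Prices are illustrative; verify current rates.
const APIS = [
  { name: "GPT-4o",        inPerMTok: 2.50, outPerMTok: 10.00 },
  { name: "Claude Sonnet", inPerMTok: 3.00, outPerMTok: 15.00 },
  { name: "GPT-4o-mini",   inPerMTok: 0.15, outPerMTok: 0.60 },
];

// Blended $ per million tokens, assuming 75% input / 25% output tokens.
const blendedPerMTok = (api, outShare = 0.25) =>
  api.inPerMTok * (1 - outShare) + api.outPerMTok * outShare;

// Smallest monthly volume (millions of tokens) at which cluster TCO undercuts
// a given API over the same horizon; reuses miniCount/clusterTco from above.
function breakEvenMTokPerMonth(apiPerMTok, peakMultiplier, horizonMonths) {
  for (let m = 1; m <= 2000; m++) {
    const minis = miniCount(m, peakMultiplier);
    const { tco } = clusterTco(minis, m, horizonMonths);
    if (tco <= m * horizonMonths * apiPerMTok) return m;
  }
  return null;  // no break-even below 2B tokens/month under these assumptions
}

// usage: breakEvenMTokPerMonth(blendedPerMTok(APIS[0]), 3.0, 36)
```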

Where the API wins. Low monthly volume (under ~30M tokens/month for medium-class models), bursty traffic with no sustained baseline, or frontier-quality requirements where on-prem is not yet a fair comparison (full-precision GPT-5/Opus class). When that is your situation, the calculator says so.

Where on-prem wins. Sustained high volume (~100M+ tokens/mo medium-class), data-residency requirements (HIPAA / GDPR / public-sector), or workflows where the latency to a co-located mini beats round-trip to a hyperscaler region. Cluster Ops productizes this: hot-swap models, automated eviction, hardened deploys, monthly cost-per-token report.

Assumptions are visible (click "Show the math" above) and deliberately biased against on-prem: we assume an entry-level M4 Pro mini at $2,000, power at $0.15/kWh, $5/mo amortized colo, and 1 ops-hour per mini per month at $200/hr. Plug in your own numbers; the page is pure client-side JS and the math is in plain view.
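
As a quick check on the per-mini opex those defaults imply, here is the arithmetic in the same plain-JS style; the ~50 W continuous draw is an assumption the page does not state.

```js
// Per-mini monthly opex from the stated defaults. The 50 W continuous draw
// is an assumed figure, not taken from the page.
const watts = 50;
const kwhPerMonth = (watts / 1000) * 730;      // ≈ 36.5 kWh
const power = kwhPerMonth * 0.15;              // ≈ $5.48 at $0.15/kWh
const colo = 5;                                // $5/mo amortized colo
const ops = 1 * 200;                           // 1 ops-hour/mini/month at $200/hr
const opexPerMiniMonth = power + colo + ops;   // ≈ $210/mo per mini
```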