Cluster Ops TCO Calculator

When does on-prem Mac Mini MLX beat frontier API? Punch in your monthly token volume + model class + latency target. The numbers below are real, the math is shown, and the break-even is unforgiving in either direction — sometimes the API wins.

Inputs

Monthly token volume: 100M. In millions; slide to your typical monthly throughput across all calls.
Model class: medium (30B–70B). Determines tok/sec per mini and which frontier API tier is the fair comparison.
Latency target: 150 ms. Tighter latency means more headroom, which means more minis; the 150 ms default covers most chat + RAG.
Peak-to-average ratio: 3.0×. If your peak hour is 3× the average, you size for peak to keep latency.
TCO horizon: 36 mo, matching typical Mac Mini service life. Sets the TCO window below.
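
If you want to sanity-check the sizing yourself, here is a minimal sketch of that step in plain JS. It is not the page's actual source: the per-mini throughput, the 70% headroom factor, and the function names are assumptions; substitute your own MLX benchmarks.

```js
// Sketch of the sizing step (not the page's actual source). The per-mini
// aggregate throughput is an assumed figure for a quantized 30B–70B model
// served with batching on an M4 Pro class mini; replace with your benchmark.
const SECONDS_PER_MONTH = 30 * 24 * 3600;
const TOKS_PER_SEC_PER_MINI = 60;   // assumed aggregate MLX throughput, medium class
const HEADROOM = 0.7;               // run at ~70% utilization to hold the latency target

function miniCount(tokensPerMonthMillions, peakMultiplier) {
  const avgToksPerSec = (tokensPerMonthMillions * 1e6) / SECONDS_PER_MONTH;
  const peakToksPerSec = avgToksPerSec * peakMultiplier;
  return Math.ceil(peakToksPerSec / (TOKS_PER_SEC_PER_MINI * HEADROOM));
}

// With the defaults (100M tokens/mo, 3.0× peak): ceil(115.7 / 42) = 3 minis.
```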

Cluster Ops sizing

For your inputs the calculator reports: Mac Mini count at peak, one-time capex, monthly opex (power + colo + ops), cost per million tokens on Cluster Ops, and TCO over the chosen horizon.
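
A matching sketch of the TCO roll-up, again not the page's source. It takes a mini count from the sizing sketch above and the pessimistic defaults listed at the bottom of the page ($2,000 per mini, roughly $210 per mini per month in opex).

```js
// Sketch of the TCO roll-up (not the page's actual source). Capex and opex
// come from the stated defaults: $2,000/mini and ~$210/mini/month
// (power + colo + 1 ops-hour; see the assumptions at the bottom of the page).
const CAPEX_PER_MINI = 2000;
const OPEX_PER_MINI_MONTH = 210;

function clusterTco(minis, tokensPerMonthMillions, horizonMonths) {
  const capex = minis * CAPEX_PER_MINI;
  const monthlyOpex = minis * OPEX_PER_MINI_MONTH;
  const tco = capex + monthlyOpex * horizonMonths;
  const costPerMTok = tco / (tokensPerMonthMillions * horizonMonths);
  return { capex, monthlyOpex, tco, costPerMTok };
}

// With 3 minis, 100M tokens/mo, 36 mo: $6,000 capex, $630/mo opex,
// $28,680 TCO, ≈ $7.97 per million tokens (under the assumptions above).
```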

vs frontier API — same horizon

The same volume and horizon are priced against GPT-4o, Claude Sonnet, and GPT-4o-mini, and the calculator reports the break-even volume versus the cheapest comparable API. Click "Show the math" to see every step of the derivation.
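
And a sketch of the API-side comparison and break-even search, reusing miniCount and clusterTco from the sketches above. The per-million-token prices and the 75/25 input/output split are illustrative assumptions; check current published pricing before leaning on them.

```js
// Sketch, not the page's source. Prices are illustrative; verify current rates.
const APIS = [
  { name: "GPT-4o",        inPerMTok: 2.50, outPerMTok: 10.00 },
  { name: "Claude Sonnet", inPerMTok: 3.00, outPerMTok: 15.00 },
  { name: "GPT-4o-mini",   inPerMTok: 0.15, outPerMTok: 0.60 },
];

// Blended $ per million tokens, assuming 75% input / 25% output tokens.
const blendedPerMTok = (api, outShare = 0.25) =>
  api.inPerMTok * (1 - outShare) + api.outPerMTok * outShare;

// Smallest monthly volume (millions of tokens) at which cluster TCO undercuts
// a given API over the same horizon; reuses miniCount/clusterTco from above.
function breakEvenMTokPerMonth(apiPerMTok, peakMultiplier, horizonMonths) {
  for (let m = 1; m <= 2000; m++) {
    const minis = miniCount(m, peakMultiplier);
    const { tco } = clusterTco(minis, m, horizonMonths);
    if (tco <= m * horizonMonths * apiPerMTok) return m;
  }
  return null;  // no break-even below 2B tokens/month under these assumptions
}

// usage: breakEvenMTokPerMonth(blendedPerMTok(APIS[0]), 3.0, 36)
```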

Where the API wins. Low monthly volume (under ~30M tokens/month for medium-class models), bursty traffic with no sustained baseline, or frontier-quality requirements where on-prem is not yet a fair comparison (full-precision GPT-5/Opus class). When that is your situation, the calculator says so.

Where on-prem wins. Sustained high volume (~100M+ tokens/mo medium-class), data-residency requirements (HIPAA / GDPR / public-sector), or workflows where the latency to a co-located mini beats round-trip to a hyperscaler region. Cluster Ops productizes this: hot-swap models, automated eviction, hardened deploys, monthly cost-per-token report.

Assumptions are visible (click "Show the math" above) and deliberately biased against on-prem: we assume an entry-level M4 Pro mini at $2,000, power at $0.15/kWh, $5/mo amortized colo, and 1 ops-hour per mini per month at $200/hr. Plug in your own numbers; the page is pure client-side JS and the math is in plain view.
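
As a quick check on the per-mini opex those defaults imply, here is the arithmetic in the same plain-JS style; the ~50 W continuous draw is an assumption the page does not state.

```js
// Per-mini monthly opex from the stated defaults. The 50 W continuous draw
// is an assumed figure, not taken from the page.
const watts = 50;
const kwhPerMonth = (watts / 1000) * 730;      // ≈ 36.5 kWh
const power = kwhPerMonth * 0.15;              // ≈ $5.48 at $0.15/kWh
const colo = 5;                                // $5/mo amortized colo
const ops = 1 * 200;                           // 1 ops-hour/mini/month at $200/hr
const opexPerMiniMonth = power + colo + ops;   // ≈ $210/mo per mini
```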