Guide Articles

Browse our guide articles on AI inference, model selection, and GPU planning.

Guide · 14 min read · Jun 23, 2026

Prompt Caching Explained: What Cuts Your Bill and What Breaks It

Prompt caching can cut input-token costs by up to 90%, but only when the cached prefix stays identical. How it works across providers, and why caches silently miss.

Guide · 9 min read · Jun 23, 2026

Structured Outputs and JSON Mode: Getting Reliable JSON From an LLM

JSON mode guarantees valid JSON. Structured outputs guarantee the right shape. How constrained decoding works, how the major providers differ, and why a model can still hand you broken JSON.

Guide · 11 min read · Jun 15, 2026

Quantized, Distilled, or Fine-Tuned: What the Labels Mean

Model quantization, LoRA, distillation, and fine-tuning are not interchangeable. A practical guide to what each label does and whether the original benchmark still applies.

Guide · 7 min read · May 10, 2026

Llama 3.3 70B Sizing Across H100, H200, and B200

Same model, three GPU generations. Here's how Llama 3.3 70B actually performs on H100 SXM, H200, and B200: VRAM headroom, throughput per dollar, and which tier makes sense for which workload.

Guide · 7 min read · May 8, 2026

Self-Hosting DeepSeek V3: What It Actually Costs

DeepSeek V3 is 671B total parameters with 37B active per token. Here's the realistic VRAM budget, GPU count, and monthly cost to serve it yourself, vs. what the API providers charge.

Guide · 8 min read · May 7, 2026

Sizing Llama 4 Scout for Production Inference

What it actually takes to serve Llama 4 Scout (109B total / 17B active) in production: VRAM budget, throughput per H100, monthly cost, and where most teams get the math wrong.

Guide · 13 min read · Apr 14, 2026

AI Model Comparison: How to Compare LLMs Across Benchmarks, Pricing, and Capabilities

A systematic framework for comparing AI models side by side. Covers benchmarks, pricing, context windows, capabilities, and when each comparison dimension matters most.

Guide · 16 min read · Mar 4, 2026