How Our AI Model Recommendations Work

Inferbase Team · March 31, 2026 · 5 min read

Our AI Model Recommendation wizard analyzes your requirements and scores models from our catalog to find the best fit. This document explains exactly how the scoring works.

The 5-Step Process

  1. You tell us your industry (Software, Healthcare, Finance, etc.)
  2. You pick a use case (Code Generation, Chatbot, Fraud Detection, etc.) or describe a custom one
  3. You select your scale (Hobby, Startup, Growing, Enterprise)
  4. You choose 1-2 priorities (Cost, Quality, Speed, Privacy, Integration)
  5. We score and rank all matching models, returning the top 5

How Scoring Works: The 70/25/5 Model

Each model receives a score from 0-100, composed of three weighted signals:

Weight | Signal         | What it measures
70%    | Use Case Match | How well the model fits your specific use case
25%    | Priority Score | How well the model aligns with your selected priority
5%     | Popularity     | Adoption and verification status
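The weighted composition can be sketched in a few lines. This is illustrative only; the names and shape are assumptions, not the production implementation:

```python
# A minimal sketch of the 70/25/5 composition (names are illustrative).
WEIGHTS = {"use_case": 0.70, "priority": 0.25, "popularity": 0.05}

def total_score(use_case: float, priority: float, popularity: float) -> float:
    """Combine three 0-100 sub-scores into a 0-100 total."""
    return (WEIGHTS["use_case"] * use_case
            + WEIGHTS["priority"] * priority
            + WEIGHTS["popularity"] * popularity)

print(round(total_score(90, 80, 100), 2))  # 88.0
```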

Use Case Match (70% of score)

This is the primary signal, combining three sub-scores:

Tag Matching (up to 50 points)

Every model in our catalog has use case tags (e.g., "code-generation", "chatbot", "finance"). We match these against your selection:

  • Exact match (model tagged with your use case): 50 points
  • Related use case in the same domain: 40 points
  • General domain match: 30 points
  • General-purpose model: 20 points
  • Specialist in a different domain: 5 points

For custom use cases, we extract keywords from your description and match them against model tags and descriptions. This gives meaningful scoring even for use cases not in our predefined list.
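The tier table above amounts to a first-match lookup. A hypothetical sketch, assuming a model's tags are a set and the related use cases for a selection are known:

```python
# Hypothetical sketch of the tag-matching tiers (argument names assumed).
def tag_match_points(model_tags, use_case, domain, related_use_cases):
    if use_case in model_tags:
        return 50   # exact match on the selected use case
    if any(uc in model_tags for uc in related_use_cases):
        return 40   # related use case in the same domain
    if domain in model_tags:
        return 30   # general domain match
    if "general-purpose" in model_tags:
        return 20   # general-purpose model
    return 5        # specialist in a different domain

print(tag_match_points({"chatbot", "general-purpose"}, "code-generation",
                       "software", {"code-review"}))  # 20
```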

Capability Matching (up to 35 points)

Each use case has preferred capabilities (e.g., Code Generation prefers code_generation, function_calling, streaming). We check what percentage of these the model supports:

Points = 35 × (capabilities matched / capabilities required)
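The formula above is a straight proportion. A sketch, with the empty-requirements case treated as a full match (an assumption; the text does not specify it):

```python
# The capability-match proportion as code (illustrative names).
def capability_points(supported: set, required: set) -> float:
    if not required:
        return 35.0  # assumption: no required capabilities counts as full match
    return 35.0 * len(supported & required) / len(required)

# 2 of 3 preferred capabilities supported:
print(round(capability_points({"code_generation", "streaming"},
                              {"code_generation", "function_calling",
                               "streaming"}), 1))  # 23.3
```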

Context Window Fit (up to 15 points)

Each use case has a minimum context window requirement. Models are scored based on how much they exceed this:

  • 4x or more the requirement: 15 points
  • 2x the requirement: 12 points
  • Meets the requirement: 8 points
  • Below but usable (50%+): 4 points
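The tiers above compare the model's context window to the use case's minimum as a ratio. A sketch, assuming models below 50% of the requirement score 0 (the text lists no tier for them):

```python
# Sketch of the context-window tiers (the 0-point floor is an assumption).
def context_fit_points(model_ctx: int, required_ctx: int) -> int:
    ratio = model_ctx / required_ctx
    if ratio >= 4:
        return 15   # 4x or more the requirement
    if ratio >= 2:
        return 12   # 2x the requirement
    if ratio >= 1:
        return 8    # meets the requirement
    if ratio >= 0.5:
        return 4    # below but usable
    return 0

print(context_fit_points(128_000, 32_000))  # 15
```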

Priority Score (25% of score)

Based on your selected priority, we apply a specific scoring method:

Cost Priority

  • Blended cost = (input cost + output cost × 2) / 3 per million tokens
  • Under $0.50: 95 points | $0.50-2: 80 | $2-5: 65 | $5-10: 50 | $10-20: 30 | Over $20: 15
  • Scale multiplier: Hobby projects weight cost 1.5x, enterprise weights it 0.75x
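Putting the blended-cost formula, the tier cutoffs, and the scale multiplier together (a sketch; scales other than hobby and enterprise are assumed to use a 1.0 multiplier, and whether the result is capped at 100 is not specified):

```python
# Cost-priority scoring sketch; tier cutoffs copied from the list above.
def blended_cost(input_cost: float, output_cost: float) -> float:
    """Blended $/M tokens, weighting output 2:1 over input."""
    return (input_cost + output_cost * 2) / 3

def cost_points(blended: float, scale: str = "startup") -> float:
    tiers = [(0.50, 95), (2, 80), (5, 65), (10, 50), (20, 30)]
    points = next((p for cutoff, p in tiers if blended < cutoff), 15)
    multiplier = {"hobby": 1.5, "enterprise": 0.75}.get(scale, 1.0)
    return points * multiplier

b = blended_cost(0.25, 1.25)    # (0.25 + 2.50) / 3 ≈ 0.92 -> the 80-point tier
print(cost_points(b, "hobby"))  # 80 * 1.5 = 120.0
```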

Quality Priority

  • Averaged benchmark scores across relevant benchmarks for the use case
  • Code use cases check HumanEval, MBPP, LiveCodeBench
  • Reasoning use cases check MMLU, GPQA, ARC
  • General use cases check a broad mix
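The averaging can be sketched as a lookup of relevant benchmarks per category (the mapping structure and the mid-range fallback for missing data are assumptions based on the Known Limitations section below; benchmark names come from the list above):

```python
# Quality-priority sketch: average the benchmarks relevant to the use case.
RELEVANT = {
    "code": ["HumanEval", "MBPP", "LiveCodeBench"],
    "reasoning": ["MMLU", "GPQA", "ARC"],
}

def quality_points(scores: dict, category: str) -> float:
    # General use cases fall through to whatever benchmarks are available.
    relevant = [scores[b] for b in RELEVANT.get(category, scores) if b in scores]
    return sum(relevant) / len(relevant) if relevant else 50.0  # mid-range default

print(quality_points({"HumanEval": 88, "MBPP": 76}, "code"))  # 82.0
```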

Speed Priority

  • Based on tokens per second throughput
  • Over 150 tok/s: 95 points | 100-150: 80 | 50-100: 65 | 20-50: 40 | Under 20: 25

Privacy Priority

  • Open source (weights downloadable): +40 points
  • Permissive license (Apache, MIT, Llama): +15 points
  • Self-hosting deployment options: +15 points
  • Small enough for single-GPU hosting (under 13B): +10 points

Integration Priority

  • Function calling / tool use: +25 points
  • JSON mode / structured output: +15 points
  • Streaming support: +10 points
  • Documentation available: +10 points
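Both the Privacy and Integration priorities are additive checklists: each satisfied criterion adds its bonus. An illustrative sketch for Integration (the field names are assumptions):

```python
# Integration-priority sketch: sum the bonuses for each supported feature.
INTEGRATION_BONUSES = [
    ("function_calling", 25),  # function calling / tool use
    ("json_mode", 15),         # JSON mode / structured output
    ("streaming", 10),         # streaming support
    ("has_docs", 10),          # documentation available
]

def integration_points(model: dict) -> int:
    return sum(pts for field, pts in INTEGRATION_BONUSES if model.get(field))

print(integration_points({"function_calling": True, "streaming": True}))  # 35
```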

Popularity (5% of score)

  • Featured or verified models receive a small boost
  • This prevents obscure models from ranking above well-known, battle-tested ones

How We Find Candidate Models

Before scoring, we filter the catalog to find relevant candidates:

  1. Tag overlap: Models must have at least one tag matching your industry domain or use case
  2. Required capabilities: If your use case needs specific capabilities (e.g., vision for quality control), models without them are excluded
  3. Modality requirements: If you need image or audio input, models without those modalities are excluded
  4. Candidate limit: We evaluate up to 200 matching models
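The four filters above can be sketched as one pass over the catalog. A hypothetical sketch, assuming each model record carries sets of tags, capabilities, and modalities:

```python
# Pre-scoring candidate filter (field names are assumptions).
def candidates(catalog, wanted_tags, required_caps, required_modalities,
               limit=200):
    out = []
    for model in catalog:
        if not wanted_tags & model["tags"]:
            continue  # rule 1: no tag overlap with industry/use case
        if not required_caps <= model["capabilities"]:
            continue  # rule 2: a required capability is missing
        if not required_modalities <= model["modalities"]:
            continue  # rule 3: a required input modality is missing
        out.append(model)
        if len(out) == limit:
            break     # rule 4: cap the candidate pool
    return out

catalog = [
    {"name": "A", "tags": {"software"}, "capabilities": {"code_generation"},
     "modalities": {"text"}},
    {"name": "B", "tags": {"finance"}, "capabilities": set(),
     "modalities": {"text"}},
]
print([m["name"] for m in
       candidates(catalog, {"software"}, {"code_generation"}, {"text"})])
# ['A']
```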

Open Source Guarantee

When showing "All Models" results, we guarantee at least one open source model appears in the top 5 recommendations. If the top 5 are all proprietary, we swap the lowest-ranked one with the highest-scoring open source alternative. You can also toggle "Open Source Only" to see exclusively open source recommendations.
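The swap logic is simple on an already-ranked list. A sketch, assuming an `open_source` boolean per model (the field name is an assumption):

```python
# Open-source guarantee sketch: `ranked` is assumed sorted by score, best first.
def with_open_source_guarantee(ranked, top_n=5):
    top = list(ranked[:top_n])
    if any(m["open_source"] for m in top):
        return top  # guarantee already satisfied
    best_open = next((m for m in ranked[top_n:] if m["open_source"]), None)
    if best_open is not None:
        top[-1] = best_open  # swap lowest-ranked pick for best open model
    return top

ranked = [{"name": f"P{i}", "open_source": False} for i in range(5)]
ranked.append({"name": "OSS", "open_source": True})
print([m["name"] for m in with_open_source_guarantee(ranked)])
# ['P0', 'P1', 'P2', 'P3', 'OSS']
```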

Auto-Tagging

Models in our catalog are automatically tagged using rule-based analysis of their:

  • Name patterns: "Coder" → software, "Med" → healthcare
  • Capabilities: function_calling → api-integration, code_generation → software
  • Benchmarks: HumanEval scores → code specialization
  • Model family: GPT-4, Llama, Gemini → general purpose

Tags can be overridden by our team for accuracy. The auto-tagger runs when new models are added to the catalog.
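A rule-based tagger of this kind reduces to small lookup tables. An illustrative sketch covering the name-pattern and capability rules (the rule tables are assumptions, not the production rule set; benchmark and model-family rules are omitted for brevity):

```python
# Illustrative auto-tagger following the rule patterns described above.
NAME_RULES = {"coder": "software", "med": "healthcare"}
CAP_RULES = {"function_calling": "api-integration",
             "code_generation": "software"}

def auto_tags(name: str, capabilities: set) -> set:
    tags = {tag for pat, tag in NAME_RULES.items() if pat in name.lower()}
    tags |= {tag for cap, tag in CAP_RULES.items() if cap in capabilities}
    return tags

print(sorted(auto_tags("Qwen2.5-Coder", {"function_calling"})))
# ['api-integration', 'software']
```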

What Affects Recommendation Quality

The quality of recommendations depends on our data coverage:

  • Use case tags: How well we've tagged each model's strengths
  • Capabilities: Whether we know the model's feature set (function calling, vision, etc.)
  • Benchmarks: Performance data for quality-based recommendations
  • Pricing: Cost data for cost-based recommendations
  • Context window: For context-sensitive use cases (RAG, document analysis)

We continuously improve our data coverage through automated enrichment pipelines and manual review.

Known Limitations

  1. Scoring is rule-based: We use predefined weights and thresholds, not ML-based relevance. This means edge cases may not score optimally.

  2. Custom use cases get generic scoring: When you type a custom use case, we extract keywords and match against model descriptions. This works but isn't as precise as predefined use case scoring.

  3. Data coverage varies: Not all models have complete benchmark, pricing, or capability data. Models with missing data score conservatively (middle of the range).

  4. No real-time performance data: Speed and throughput scores are based on published specs, not live benchmarks.

Build Your AI Stack with Confidence

Compare pricing, benchmarks, and GPU requirements for any model. Free to use.