GUIDE · Updated March 2026

The Business Guide to LLMs

Not sure which AI model to use? This guide breaks down the top open-source and API-based models for business — with real pricing, honest trade-offs, and interactive tools to help you pick the right one.

Understanding Your Options

Open-Source vs. API Models

Two paths to the same destination. The right choice depends on your budget, your team, and how much control you need.

Open-Source Models

Download and run on your own servers. Full control over data, customization, and cost.

Data Privacy: Nothing leaves your servers
Fixed Costs: Pay for hardware, not per token
Customizable: Fine-tune on your own data
No Vendor Lock-in: Switch models anytime
Best for

Teams with GPU infrastructure, strict data requirements, or high-volume workloads where per-token pricing becomes expensive.

API Models

Send requests to a hosted service. Zero infrastructure, pay only for what you use.

Zero Setup: An API key and you are running
Always Updated: Provider handles model upgrades
Scalable: Handles traffic spikes automatically
Lower Barrier: No ML expertise needed
Best for

Startups, small teams, and businesses that want fast results without managing infrastructure. Ideal for prototyping and moderate-volume use.
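In practice, "an API key and you are running" usually means a single HTTPS call. Here is a minimal Python sketch, assuming an OpenAI-compatible endpoint and an `OPENAI_API_KEY` environment variable; the endpoint URL and model name are illustrative, not a recommendation:

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(prompt: str, model: str = "gpt-4o-mini") -> dict:
    """Build a payload in the widely shared chat-completions format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """Send the prompt to the hosted model and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because most vendors mirror this request shape, moving to another provider is often just a different `API_URL` and model name.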

Self-Hosted Power

Top 5 Open-Source Models

Run these on your own infrastructure for maximum control. Every model here is production-ready, but check each license: Apache 2.0 and MIT are fully permissive, while the Mistral Research License and the Llama 3.1 Community License carry restrictions on commercial use.

Top Pick

Qwen3-235B (MoE)

Alibaba Cloud
Params: 235B (22B active) · Context: 128K tokens · License: Apache 2.0
Key Strengths
Massive capacity with efficient MoE routing
Strong multilingual support (29+ languages)
Built-in tool calling and agentic capabilities
Best For

Enterprises needing multilingual AI agents and complex reasoning on their own infrastructure.

Mistral Large 2

Mistral AI
Params: 123B · Context: 128K tokens · License: Mistral Research License
Key Strengths
Near-GPT-4 quality on code and reasoning
Fluent in 12+ languages with native-level quality
Function calling and structured JSON output
Best For

Businesses that need a powerful, locally-hosted alternative to GPT-4 with strong multilingual output.

DeepSeek-R1

DeepSeek
Params: 671B (37B active) · Context: 128K tokens · License: MIT
Key Strengths
Specialized chain-of-thought reasoning architecture
MIT license — fully permissive for commercial use
Competitive with OpenAI o1 on math and logic tasks
Best For

Complex analysis, financial modeling, or any task requiring step-by-step logical reasoning.

Mixtral 8x22B

Mistral AI
Params: 176B (39B active) · Context: 65K tokens · License: Apache 2.0
Key Strengths
Mixture-of-Experts design keeps latency low
Strong coding and technical problem solving
Apache 2.0 — use it however you want
Best For

Technical teams needing a fast, open model for coding assistants and developer tooling.

Llama 3.1 405B

Meta
Params: 405B · Context: 128K tokens · License: Llama 3.1 Community License
Key Strengths
Largest openly available dense model
Exceptional at long-form content and summarization
Massive community and tooling ecosystem
Best For

Organizations with GPU infrastructure who want maximum quality without vendor lock-in.

Zero or Near-Zero Cost

Top 5 Free API Models

Start building today without spending a dollar. These models are available via API for free or near-free.

FREE

Qwen3-Coder

OpenRouter (Free)
Context: 128K tokens · Rate limits vary by demand
Purpose-built for code generation and debugging
Free tier via OpenRouter — zero cost to test
Handles complex multi-file refactoring tasks
Best For

Developers prototyping coding assistants, CI/CD automation, or internal dev tools.

FREE

Kimi-K2.5

OpenRouter (Free)
Context: 128K tokens · Rate limits vary by demand
Strong general-purpose reasoning at zero cost
Excellent at long-document analysis
Competitive with mid-tier paid models on benchmarks
Best For

Startups and small teams exploring AI without any upfront investment.

Gemini 1.5 Flash

Google
Context: 1M tokens · 15 requests/min (free tier)
1 million token context — process entire codebases
Extremely fast inference for real-time use cases
Generous free tier from Google
Best For

Processing large documents, long meeting transcripts, or extensive datasets in a single pass.

Claude 3 Haiku

Anthropic
Context: 200K tokens · Pricing: $0.25/1M input tokens (near-free)
Best safety and instruction-following in its class
Excellent for customer-facing applications
Fast response times for real-time chat
Best For

Customer support bots and applications where safety and brand-appropriate responses matter.

GPT-4o mini

OpenAI
Context: 128K tokens · Pricing: $0.15/1M input tokens (near-free)
Massive developer ecosystem and tooling support
Great balance of quality and cost
Works with all OpenAI-compatible libraries
Best For

Teams already using OpenAI who want to drastically reduce costs on routine tasks.

Side by Side

Model Comparison Table

Pricing per 1 million tokens.

| Model | Provider | Type | Input/1M | Output/1M | Context | Quality | Speed |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4o | OpenAI | API | $2.50 | $10.00 | 128K | Excellent | Fast |
| Claude 3.5 Sonnet | Anthropic | API | $3.00 | $15.00 | 200K | Excellent | Fast |
| Gemini 1.5 Pro | Google | API | $1.25 | $5.00 | 1M | Very Good | Fast |
| DeepSeek-V3 | DeepSeek | API | $0.27 | $1.10 | 128K | Very Good | Fast |
| GPT-4o mini | OpenAI | API | $0.15 | $0.60 | 128K | Good | Very Fast |
| Claude 3 Haiku | Anthropic | API | $0.25 | $1.25 | 200K | Good | Very Fast |
| Gemini 1.5 Flash | Google | API | $0.075 | $0.30 | 1M | Good | Very Fast |
| Qwen3-Coder | OpenRouter | Free | Free | Free | 128K | Good | Varies |
| Kimi-K2.5 | OpenRouter | Free | Free | Free | 128K | Good | Varies |
| Llama 3.1 405B | Meta | Open Source | Self-host | Self-host | 128K | Excellent | Varies |
| Mistral Large 2 | Mistral | Open Source | Self-host | Self-host | 128K | Very Good | Varies |

Pricing reflects published API rates as of March 2026. Open-source costs depend on your hosting infrastructure. Free-tier availability and rate limits may change.

Interactive Tool

Token Cost Calculator

Pick a use case and a model, see real-dollar estimates instantly. No signup required.

Example scenario: 15,000 tasks per month at under $0.01 per task works out to roughly $75.00 per month, or $900.00 per year.

Estimates based on published API pricing as of March 2026. Actual costs vary with prompt complexity and caching. Free-tier models may have rate limits.
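The calculator's arithmetic is simple enough to sanity-check yourself. A sketch reproducing the example figures above, where the $0.005 per-task cost is an assumed input rather than a published rate:

```python
def project_costs(cost_per_task: float, tasks_per_month: int) -> dict:
    """Scale a per-task cost up to monthly and annual totals."""
    monthly = cost_per_task * tasks_per_month
    return {"monthly": round(monthly, 2), "annual": round(monthly * 12, 2)}

# 15,000 tasks per month at an assumed $0.005 per task:
print(project_costs(0.005, 15_000))  # {'monthly': 75.0, 'annual': 900.0}
```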

Interactive Tool

Model Selection Advisor

Answer four quick questions and get a personalized model recommendation for your use case.

Sample question (1 of 4): What matters most for your project?

From Zero to Production

Implementation Roadmap

How to go from "we should try AI" to a working system that actually delivers ROI.

STEP 01

Define the Problem

Start with the business task, not the technology. What specific process are you automating? What does "good enough" look like?

STEP 02

Pick a Model Tier

Free models for prototyping. Low-cost APIs for production MVPs. Premium APIs or self-hosted for scale. Match the model to the stakes.

STEP 03

Prototype Fast

Use free-tier models via OpenRouter or Google to build a working proof of concept. Validate the approach before spending on infrastructure.

STEP 04

Measure and Iterate

Track accuracy, latency, cost per task, and user satisfaction. Swap models, tune prompts, and optimize until you hit your targets.

STEP 05

Scale with Confidence

Graduate to production-grade hosting: dedicated API plans, self-hosted models, or hybrid setups. Build monitoring and fallbacks from day one.
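One way to sketch the "fallbacks from day one" idea: wrap provider calls so that a failure on the primary model automatically retries on a backup. The model names and the `call_model` function below are placeholders for your own integration:

```python
def with_fallback(prompt, call_model, models=("primary-model", "backup-model")):
    """Try each model in order; return the first successful reply."""
    last_err = None
    for model in models:
        try:
            return call_model(model, prompt)
        except Exception as err:  # rate limit, timeout, provider outage...
            last_err = err
    raise RuntimeError("all models in the fallback chain failed") from last_err
```

In production you would narrow the caught exceptions and log every failover, so per-model cost and reliability stay visible in your monitoring.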

FAQ

Frequently Asked Questions

Everything you need to know about choosing and using LLMs for your business.

What is an LLM, and what can it do for my business?

A Large Language Model (LLM) is an AI system trained on massive text datasets to understand and generate human-like text. For businesses, LLMs power chatbots, automate content creation, summarize documents, qualify leads, write code, and handle customer support — tasks that previously required manual effort. The right LLM can cut operational costs by 40-70% on repetitive text-based work while running 24/7.

What is the difference between API models and open-source models?

API models (like GPT-4o or Claude) are hosted by the provider — you send requests and pay per token. Setup is instant but you depend on their servers and pricing. Open-source models (like Llama 3 or Qwen3) can be downloaded and run on your own infrastructure. You control the data, the cost is fixed (just hardware), and there is no vendor lock-in. The trade-off is that self-hosting requires technical expertise and GPU hardware. Many businesses use APIs to start, then move critical workloads to open-source as they scale.

Are free models really good enough for business use?

For many tasks, yes. Models like Qwen3-Coder (via OpenRouter) and Gemini 1.5 Flash (via Google free tier) handle code generation, summarization, and customer Q&A surprisingly well. The main limitations are rate limits and availability — free tiers may throttle during peak demand. We recommend prototyping on free models, then upgrading to a paid tier only for the specific tasks that need more capacity or reliability.

How do I estimate what an LLM will cost?

LLM costs are measured in tokens (roughly 0.75 words per token). Estimate: (input tokens + output tokens per task) × number of tasks per month × model price per million tokens. For example, a customer support bot handling 500 tickets/day with GPT-4o mini costs roughly $4-8/month. Our Token Cost Calculator above lets you model costs across 11 different models and 5 common use cases.

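The formula above in runnable form, using the published GPT-4o mini rates; the per-ticket token counts are assumptions chosen for illustration:

```python
def words_to_tokens(words: int) -> int:
    """Rough conversion: one token is about 0.75 words."""
    return round(words / 0.75)

def monthly_cost(in_tokens, out_tokens, tasks, price_in, price_out):
    """Per-task token cost times tasks per month; prices are $ per 1M tokens."""
    return tasks * (in_tokens * price_in + out_tokens * price_out) / 1_000_000

# 500 tickets/day (~15,000/month) on GPT-4o mini ($0.15 in / $0.60 out per 1M),
# assuming roughly 1,500 input and 500 output tokens per ticket:
print(f"${monthly_cost(1500, 500, 15_000, 0.15, 0.60):.2f}/month")  # $7.88/month
```

That lands inside the $4-8/month range cited above; halve the assumed token counts and the estimate drops to about $4.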
Which model is best for customer support?

For customer support, prioritize safety, instruction-following, and cost efficiency. Claude 3 Haiku ($0.25/1M input) is excellent for brand-safe customer interactions. GPT-4o mini ($0.15/1M input) offers similar quality with broader tooling support. For free prototyping, Kimi-K2.5 via OpenRouter handles conversational tasks well. If you handle sensitive data and need on-premise deployment, consider fine-tuning Qwen3 or Llama 3 on your support transcripts.

What is OpenRouter, and why are some models free?

OpenRouter is a unified API gateway that routes requests to multiple AI model providers through a single endpoint. Some models are offered free during promotional periods or because providers subsidize access to gain market share. The free tier typically has rate limits and may not offer the same SLAs as paid plans. It is an excellent way to experiment with different models without committing to any provider.

Can I switch models later without rewriting my application?

Yes — and you should design for it. Most LLM APIs follow the OpenAI chat completions format, so switching from GPT-4o to Claude to an open-source model requires minimal code changes. Tools like LiteLLM, OpenRouter, and the Vercel AI SDK abstract provider differences behind a single interface. We build all our client integrations with model-agnostic architecture so you can swap models as pricing and capabilities evolve.

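A sketch of what "model-agnostic" can look like in practice: keep provider details in one config table and derive the endpoint from it. The base URLs are the providers' public OpenAI-compatible endpoints; the model ids are illustrative:

```python
PROVIDERS = {
    "openai":      {"base_url": "https://api.openai.com/v1",    "model": "gpt-4o-mini"},
    "openrouter":  {"base_url": "https://openrouter.ai/api/v1", "model": "qwen/qwen3-coder"},
    # e.g. a self-hosted vLLM server exposing the same API shape:
    "self_hosted": {"base_url": "http://localhost:8000/v1",     "model": "llama-3.1-405b"},
}

def chat_endpoint(provider: str) -> str:
    """Resolve the chat-completions URL for a configured provider."""
    return f"{PROVIDERS[provider]['base_url']}/chat/completions"
```

Switching the whole application from a hosted API to a self-hosted model then comes down to changing one provider key instead of touching every call site.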
How do you help businesses implement LLMs?

We handle the full stack — from selecting the right model for your use case to building production-ready integrations. This includes prompt engineering, API integration, workflow automation, fine-tuning on your data, self-hosted deployments, and ongoing optimization. Every project starts with a discovery call and includes a fixed-price quote. You get working AI in weeks, not months.

Need Help Choosing?

We Build AI Systems That Actually Work

Not sure which model fits your business? We help companies select, integrate, and optimize LLMs for real-world workflows — from chatbots to document processing to custom automation.

Free discovery call to scope your AI project
Fixed-price quote before any work starts
Working prototype in 2-4 weeks