Right Model, Right Task, Right Price

Per-task model routing: cheap for summaries, smart for analysis, local for privacy. 3 recommended combos benchmarked for quality, speed, and cost.

How Model Routing Works

Open Notebook lets you assign different AI providers to different tasks. You're not locked to one model for everything. This is the architecture:

Task	What It Does	Quality Sensitivity
Chat / Analysis	Q&A over documents, summarization, key-point extraction	High: model quality determines answer depth
Embeddings	Converts documents into vector search indexes	Medium: embedding dimensions affect retrieval accuracy
TTS (Text-to-Speech)	Podcast audio generation	High: voice naturalness and multi-speaker quality
Podcast Transcript	Generates the podcast script/dialogue	High: structure, flow, and accurate speaker attribution

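The key insight: match each task to a model built for it. Don't pay GPT-4o rates for embedding work: a dedicated embedding model such as text-embedding-3-small costs $0.02 per 1M tokens, about 125× less than GPT-4o's $2.50 per 1M input tokens, and it is purpose-built for retrieval. Don't use Ollama for TTS: ElevenLabs produces natural multi-speaker audio that open-source TTS can't match. Route each task to the right model.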
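As a rough illustration of per-task routing, the idea can be sketched as a lookup table from task to provider and model. This is a minimal sketch, not Open Notebook's actual configuration format: the task keys, the ModelChoice class, the route function, and the model identifier strings are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    provider: str
    model: str


# Hypothetical routing table: one entry per task type from the table above.
# The identifier strings are illustrative labels, not verified config values.
ROUTES = {
    "chat": ModelChoice("openai", "gpt-4o"),
    "embeddings": ModelChoice("openai", "text-embedding-3-small"),
    "tts": ModelChoice("elevenlabs", "multilingual-v2"),
    "transcript": ModelChoice("openai", "gpt-4o"),
}


def route(task: str) -> ModelChoice:
    """Return the provider/model pair assigned to a task."""
    try:
        return ROUTES[task]
    except KeyError:
        known = ", ".join(sorted(ROUTES))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None
```

Calling route("embeddings") returns the cheap embedding model, while route("chat") returns the stronger chat model, so no single model has to serve every task.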

3 Recommended Combos

Best Quality

GPT-4o + text-embedding-3-small + ElevenLabs

Task	Provider	Model	Cost
Chat	OpenAI	GPT-4o	$2.50/1M input, $10/1M output
Embeddings	OpenAI	text-embedding-3-small	$0.02/1M tokens
TTS	ElevenLabs	Multilingual v2	$0.30/1K chars
Transcript	Anthropic	Claude Sonnet 4	$3/1M input, $15/1M output

Best for: published research, client-facing reports, anything where quality is visible. ~$20–40/month for a heavy user.

Best Value

GPT-4o-mini + text-embedding-3-small + ElevenLabs

Task	Provider	Model	Cost
Chat	OpenAI	GPT-4o-mini	$0.15/1M input, $0.60/1M output
Embeddings	OpenAI	text-embedding-3-small	$0.02/1M tokens
TTS	ElevenLabs	Multilingual v2	$0.30/1K chars
Transcript	OpenAI	GPT-4o-mini	Same as chat

Best for: daily research, internal team use, learning. ~$5–15/month for a heavy user. For document Q&A, GPT-4o-mini is roughly 90% as good as GPT-4o at about 1/17th the per-token price ($0.15 vs. $2.50 per 1M input tokens, $0.60 vs. $10 per 1M output tokens).
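To see how the price gap follows from the per-token rates in the tables above, here is a small cost calculator. The prices are copied from this page; the monthly token volumes in the usage note are made-up figures for illustration, and chat_cost is a hypothetical helper name.

```python
# USD per 1M tokens, copied from the combo tables above.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def chat_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated chat cost in USD for the given token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000
```

With an assumed 20M input tokens and 2M output tokens in a month, chat_cost gives about $70 for GPT-4o and about $4.20 for GPT-4o-mini, a ratio of roughly 16.7×. TTS and embedding charges are not included in this sketch.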

Zero Cost

Groq + Ollama Embeddings + ElevenLabs

Task	Provider	Model	Cost
Chat	Groq (free tier)	Llama 3.3 70B or DeepSeek-R1	$0
Embeddings	Ollama (local)	nomic-embed-text or qwen3-embedding	$0
TTS	ElevenLabs	Multilingual v2	$0.30/1K chars
Transcript	Groq (free tier)	Llama 3.3 70B	$0

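Best for: personal use, experimentation, and privacy-sensitive docs (embeddings stay local, though chat and transcript requests still go to Groq's API). Chat, embeddings, and transcripts cost $0; ElevenLabs TTS is the one paid component. Groq's free tier is rate-limited to ~30 requests/minute, which is enough for individual use.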
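If you script against a rate-limited free tier, a client-side throttle keeps you under the cap. The sketch below is a generic sliding-window limiter, assuming the ~30 requests/minute figure quoted above. MinuteRateLimiter is a hypothetical class, not part of Groq's SDK or Open Notebook, and the real limits on your account may differ.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Allow at most max_requests calls in any rolling window."""

    def __init__(self, max_requests=30, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window_seconds
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._sent = deque()     # timestamps of recent requests, oldest first

    def acquire(self) -> float:
        """Wait until a slot is free, record the request, return seconds waited."""
        now = self._clock()
        # Forget requests that have already left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        waited = 0.0
        if len(self._sent) >= self.max_requests:
            # Sleep until the oldest recorded request ages out, then drop it.
            waited = max(0.0, self.window - (now - self._sent[0]))
            self._sleep(waited)
            self._sent.popleft()
            now = self._clock()
        self._sent.append(now)
        return waited
```

Call limiter.acquire() before each API request. The first 30 calls in a minute pass straight through, and the next one pauses until the oldest call is 60 seconds old.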

Provider Compatibility Matrix

Not every provider supports every task type. Here's what works:

Provider	Chat	Embeddings	TTS	Transcript	Best For
OpenAI	✓	✓	✓	✓	All-around, best quality
Anthropic	✓	✗	✗	✓	Best long-document analysis
Google Gemini	✓	✓	✗	✓	2M token context window
Ollama	✓	✓	✗	✓	Privacy, zero ongoing cost
Groq	✓	✗	✗	✓	Fastest inference, free tier
DeepSeek	✓	✗	✗	✗	Cheap API, strong reasoning
ElevenLabs	✗	✗	✓	✗	TTS only — best voice quality
Mistral	✓	✓	✗	✓	EU data residency
xAI (Grok)	✓	✗	✗	✗	Large context, alternative to Google
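The matrix above can also be encoded as data, which makes it easy to sanity-check a routing plan before you configure it. This is a sketch under the assumption that the table is accurate. SUPPORT and invalid_assignments are hypothetical names, and Open Notebook may expose this information differently.

```python
# Task support per provider, transcribed from the compatibility matrix above.
SUPPORT = {
    "OpenAI": {"chat", "embeddings", "tts", "transcript"},
    "Anthropic": {"chat", "transcript"},
    "Google Gemini": {"chat", "embeddings", "transcript"},
    "Ollama": {"chat", "embeddings", "transcript"},
    "Groq": {"chat", "transcript"},
    "DeepSeek": {"chat"},
    "ElevenLabs": {"tts"},
    "Mistral": {"chat", "embeddings", "transcript"},
    "xAI (Grok)": {"chat"},
}


def invalid_assignments(routing: dict) -> list:
    """Return a message for every task routed to a provider that can't do it."""
    problems = []
    for task, provider in routing.items():
        supported = SUPPORT.get(provider)
        if supported is None:
            problems.append(f"{task}: unknown provider {provider}")
        elif task not in supported:
            problems.append(f"{task}: {provider} does not support {task}")
    return problems
```

For example, routing embeddings to Groq is reported as a problem, while the Groq + Ollama + ElevenLabs combo passes with an empty list.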

Embedding Models: The Silent Quality Factor

Your embedding model determines how accurately Open Notebook finds relevant passages in your documents. A bad embedding model means the AI is answering from the wrong context.

Model	Dimensions	Cost	Best For
text-embedding-3-small	512–1536	$0.02/1M tokens	General purpose, best cost/perf ratio
text-embedding-3-large	256–3072	$0.13/1M tokens	When retrieval accuracy matters most
nomic-embed-text (Ollama)	768	$0	Privacy, offline, 100% local
qwen3-embedding (Ollama)	1024	$0	Best local embedding quality, multilingual
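To make the role of embeddings concrete, here is a toy version of vector retrieval: each passage is a vector, and the passages closest to the query vector by cosine similarity are the ones handed to the chat model. This is a generic sketch of the technique, not Open Notebook's actual retrieval code. The three-dimensional vectors and passage names in the usage note are invented, and real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, passages, k=2):
    """Return the names of the k passages most similar to the query."""
    ranked = sorted(
        passages,
        key=lambda name: cosine(query_vec, passages[name]),
        reverse=True,
    )
    return ranked[:k]
```

Given passages {"pricing": [0.9, 0.1, 0.0], "setup": [0.1, 0.9, 0.0], "voices": [0.0, 0.1, 0.9]} and a query vector [0.8, 0.2, 0.0], top_k ranks "pricing" first and "setup" second. A weaker embedding model places vectors less accurately, so the wrong passages rank highest.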

Embedding gotcha (GitHub #655): If you see "Connection error" for Ollama embeddings even though the connection test passes, you are hitting the Esperanto library bug. Two things must be true: your embedding model must be pulled in Ollama (ollama pull nomic-embed-text), and the base URL must use the Docker service name, not localhost. More in FAQ →
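Both conditions can be checked from a short script. The sketch below assumes Ollama's default port 11434 and a Docker Compose service named ollama; both are assumptions about your setup, so substitute your own values. It uses Ollama's /api/tags endpoint, which lists locally available models. The function names are hypothetical.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def points_at_loopback(base_url: str) -> bool:
    """True if the URL targets the local machine rather than a service name."""
    return urlparse(base_url).hostname in LOOPBACK_HOSTS


def model_is_pulled(base_url: str, model: str, timeout: float = 5.0) -> bool:
    """Ask Ollama which models are available locally and look for ours."""
    with urlopen(f"{base_url.rstrip('/')}/api/tags", timeout=timeout) as resp:
        tags = json.load(resp)
    names = [entry["name"] for entry in tags.get("models", [])]
    # Ollama reports names with a tag suffix, e.g. "nomic-embed-text:latest".
    return any(name == model or name.split(":")[0] == model for name in names)
```

Inside a container, points_at_loopback("http://localhost:11434") returning True is the warning sign, because localhost there refers to the container itself. A URL like http://ollama:11434 (assumed service name) avoids that, and model_is_pulled("http://ollama:11434", "nomic-embed-text") confirms the model is present.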

Models Are One Piece. Production Is the Rest.

Next: configure these models in Open Notebook → or get the Production Manual for monitoring, CI/CD, and the 30+ errors that show up once you go beyond a single user.

Get the Production Manual — $19