Right Model, Right Task, Right Price

Per-task model routing: cheap for summaries, smart for analysis, local for privacy. 3 recommended combos benchmarked for quality, speed, and cost.

How Model Routing Works

Open Notebook lets you assign different AI providers to different tasks. You're not locked to one model for everything. This is the architecture:

| Task | What It Does | Quality Sensitivity |
| --- | --- | --- |
| Chat / Analysis | Q&A over documents, summarization, key-point extraction | High: model quality determines answer depth |
| Embeddings | Converts documents into vector search indexes | Medium: embedding dimensions and retrieval accuracy |
| TTS (Text-to-Speech) | Podcast audio generation | High: voice naturalness and multi-speaker quality |
| Podcast Transcript | Generates the podcast script/dialogue | High: structure, flow, and speaker accuracy |

The key insight: route each task to the right model. Don't use GPT-4o for embeddings: text-embedding-3-small is a dedicated embedding model and, at $0.02/1M tokens, costs over 100× less than GPT-4o's $2.50/1M input price. Don't use Ollama for TTS: ElevenLabs produces natural multi-speaker audio that open-source TTS can't match.
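
The routing idea can be sketched as a plain lookup from task to provider and model. This is an illustration only: `ModelChoice`, `ROUTES`, and `route` are hypothetical names, not Open Notebook's configuration format, and the entries mirror the Best Value combo described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    provider: str
    model: str


# Hypothetical routing table: one entry per task type.
ROUTES = {
    "chat": ModelChoice("openai", "gpt-4o-mini"),
    "embeddings": ModelChoice("openai", "text-embedding-3-small"),
    "tts": ModelChoice("elevenlabs", "eleven_multilingual_v2"),
    "transcript": ModelChoice("openai", "gpt-4o-mini"),
}


def route(task: str) -> ModelChoice:
    """Return the provider/model pair assigned to a task."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"no model configured for task {task!r}") from None
```

Keeping the mapping in one place means swapping a provider for a single task is a one-line change, which is the point of per-task routing.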

3 Recommended Combos

Best Quality

GPT-4o + text-embedding-3-small + ElevenLabs

| Task | Provider | Model | Cost |
| --- | --- | --- | --- |
| Chat | OpenAI | GPT-4o | $2.50/1M input, $10/1M output |
| Embeddings | OpenAI | text-embedding-3-small | $0.02/1M tokens |
| TTS | ElevenLabs | Multilingual v2 | $0.30/1K chars |
| Transcript | Anthropic | Claude Sonnet 4 | $3/1M input, $15/1M output |

Best for: published research, client-facing reports, anything where quality is visible. ~$20–40/month for a heavy user.
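
A rough way to sanity-check a monthly figure is to multiply the table prices by your own volumes. In the sketch below the prices come from the Best Quality table, while the usage numbers are assumptions chosen for illustration, not measurements; `monthly_cost` is a hypothetical helper.

```python
# Prices from the Best Quality table, in USD.
GPT4O_INPUT_PER_M = 2.50
GPT4O_OUTPUT_PER_M = 10.00
EMBED_PER_M = 0.02
TTS_PER_1K_CHARS = 0.30
SONNET_INPUT_PER_M = 3.00
SONNET_OUTPUT_PER_M = 15.00


def monthly_cost(chat_in, chat_out, embed_tokens, tts_chars, script_in, script_out):
    """Estimate monthly spend in USD from token and character volumes."""
    chat = chat_in / 1e6 * GPT4O_INPUT_PER_M + chat_out / 1e6 * GPT4O_OUTPUT_PER_M
    embeddings = embed_tokens / 1e6 * EMBED_PER_M
    tts = tts_chars / 1000 * TTS_PER_1K_CHARS
    transcript = (script_in / 1e6 * SONNET_INPUT_PER_M
                  + script_out / 1e6 * SONNET_OUTPUT_PER_M)
    return round(chat + embeddings + tts + transcript, 2)


# Assumed monthly usage (hypothetical numbers).
estimate = monthly_cost(
    chat_in=5_000_000,
    chat_out=500_000,
    embed_tokens=10_000_000,
    tts_chars=40_000,
    script_in=200_000,
    script_out=50_000,
)
print(estimate)
```

With these assumed volumes the estimate comes to about $31, with chat and TTS making up nearly all of it and embeddings contributing about $0.20.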

Best Value

GPT-4o-mini + text-embedding-3-small + ElevenLabs

| Task | Provider | Model | Cost |
| --- | --- | --- | --- |
| Chat | OpenAI | GPT-4o-mini | $0.15/1M input, $0.60/1M output |
| Embeddings | OpenAI | text-embedding-3-small | $0.02/1M tokens |
| TTS | ElevenLabs | Multilingual v2 | $0.30/1K chars |
| Transcript | OpenAI | GPT-4o-mini | Same as chat |

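Best for: daily research, internal team use, learning. ~$5–15/month for a heavy user. For document Q&A, GPT-4o-mini comes close to GPT-4o (roughly 90% as good) at about 1/17th the price: $0.15 vs $2.50 per 1M input tokens and $0.60 vs $10 per 1M output tokens.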
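
The price gap between the two chat models can be checked directly from the listed per-token prices. A minimal sketch; `price_ratio` is a hypothetical helper and the figures are the ones quoted in the two tables.

```python
# USD per 1M tokens, copied from the Best Quality and Best Value tables.
GPT4O = {"input": 2.50, "output": 10.00}
GPT4O_MINI = {"input": 0.15, "output": 0.60}


def price_ratio(expensive, cheap):
    """Return how many times more the expensive model costs, per token kind."""
    return {kind: expensive[kind] / cheap[kind] for kind in expensive}


ratios = price_ratio(GPT4O, GPT4O_MINI)
print({kind: round(value, 1) for kind, value in ratios.items()})
```

Both ratios come out near 16.7, so at these prices GPT-4o costs roughly 17 times as much as GPT-4o-mini for the same token volume.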

Zero Cost

Groq + Ollama Embeddings + ElevenLabs

| Task | Provider | Model | Cost |
| --- | --- | --- | --- |
| Chat | Groq (free tier) | Llama 3.3 70B or DeepSeek-R1 | $0 |
| Embeddings | Ollama (local) | nomic-embed-text or qwen3-embedding | $0 |
| TTS | ElevenLabs | Multilingual v2 | $0.30/1K chars |
| Transcript | Groq (free tier) | Llama 3.3 70B | $0 |

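Best for: personal use, experimentation, privacy-critical docs (embeddings stay local, though chat and transcript text is still sent to Groq). Chat, embeddings, and transcripts cost $0; ElevenLabs TTS is the one paid component, billed only when you generate podcast audio. Groq's free tier is rate limited to roughly 30 requests/minute, which is enough for individual use.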
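
If you script requests against a rate-limited free tier, a small client-side limiter keeps you under the cap. The sketch below is a generic rolling-window limiter and is not part of Open Notebook or the Groq SDK. `RateLimiter` is a hypothetical class, and the default of 30 calls per 60 seconds simply mirrors the approximate limit quoted above.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls=30, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._stamps[0]))
```

Call `limiter.wait()` immediately before each API request. The clock and sleep functions are injectable so the behaviour can be tested without real delays.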

Provider Compatibility Matrix

Not every provider supports every task type. Here's what works:

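| Provider | Chat | Embeddings | TTS | Transcript | Best For |
| --- | --- | --- | --- | --- | --- |
| OpenAI | | | | | All-around, best quality |
| Anthropic | | | | | Best long-document analysis |
| Google Gemini | | | | | 2M token context window |
| Ollama | | | | | Privacy, zero ongoing cost |
| Groq | | | | | Fastest inference, free tier |
| DeepSeek | | | | | Cheap API, strong reasoning |
| ElevenLabs | | | | | TTS only, best voice quality |
| Mistral | | | | | EU data residency |
| xAI (Grok) | | | | | Large context, alternative to Google |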

Embedding Models: The Silent Quality Factor

Your embedding model determines how accurately Open Notebook finds relevant passages in your documents. A bad embedding model means the AI is answering from the wrong context.
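
To see why the embedding model matters, consider how vector retrieval typically ranks passages: each passage and the query are turned into vectors, and the passages whose vectors are closest to the query are handed to the chat model. The sketch below uses cosine similarity over hand-written toy vectors. It illustrates the general technique and is not Open Notebook's actual retrieval code; `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, passages, k=2):
    """Rank (text, vector) pairs by similarity to the query, best first."""
    scored = sorted(passages, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [text for text, _ in scored[:k]]


# Toy 3-dimensional vectors, hand-written for illustration only.
passages = [
    ("pricing terms", [0.9, 0.1, 0.0]),
    ("team bios", [0.0, 0.2, 0.9]),
    ("refund policy", [0.7, 0.6, 0.1]),
]
query = [1.0, 0.2, 0.0]
print(top_k(query, passages))
```

Here the query lands closest to "pricing terms" and "refund policy". A weaker embedding model would place the vectors less accurately, and the wrong passages would reach the chat model.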

| Model | Dimensions | Cost | Best For |
| --- | --- | --- | --- |
| text-embedding-3-small | 512–1536 | $0.02/1M tokens | General purpose, best cost/perf ratio |
| text-embedding-3-large | 256–3072 | $0.13/1M tokens | When retrieval accuracy matters most |
| nomic-embed-text (Ollama) | 768 | $0 | Privacy, offline, 100% local |
| qwen3-embedding (Ollama) | 1024 | $0 | Best local embedding quality, multilingual |

Embedding gotcha (GitHub #655): If you see "Connection error" for Ollama embeddings even though the connection test passes, the cause is the Esperanto library bug. Two things must hold: the embedding model must already be pulled in Ollama (ollama pull nomic-embed-text), and the base URL must be the Docker service name, not localhost, because inside a container localhost refers to the container itself. More in the FAQ.
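
One way to rule out the "model not pulled" half of this problem is to ask Ollama which models it has locally. The sketch below queries Ollama's `/api/tags` endpoint, which lists local models. The base URL is an assumption: `ollama` stands in for whatever your Docker Compose service is called, and 11434 is Ollama's default port. `model_is_pulled` and `fetch_tags` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed values: "ollama" is a placeholder Docker service name;
# 11434 is Ollama's default port.
OLLAMA_BASE_URL = "http://ollama:11434"


def model_is_pulled(tags_payload: dict, model: str) -> bool:
    """Check an /api/tags response for a model, ignoring the ':tag' suffix."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    return any(name == model or name.split(":")[0] == model for name in names)


def fetch_tags(base_url: str = OLLAMA_BASE_URL) -> dict:
    """GET /api/tags, which lists the models available locally in Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Usage, run from inside the Open Notebook container:
#   model_is_pulled(fetch_tags(), "nomic-embed-text")
```

If `fetch_tags` fails from inside the Open Notebook container but works from the host, the base URL is the likely culprit. If it succeeds but the model is missing from the list, pull the model first.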

Models Are One Piece. Production Is the Rest.

Next: configure these models in Open Notebook, or get the Production Manual for monitoring, CI/CD, and the 30+ errors that show up once you go beyond a single user.

Get the Production Manual — $19