Right Model, Right Task, Right Price
Per-task model routing: cheap for summaries, smart for analysis, local for privacy. 3 recommended combos benchmarked for quality, speed, and cost.
How Model Routing Works
Open Notebook lets you assign a different AI provider to each task type, so you're not locked into one model for everything. The four task types:
| Task | What It Does | Quality Sensitivity |
|---|---|---|
| Chat / Analysis | Q&A over documents, summarization, key-point extraction | High — model quality determines answer depth |
| Embeddings | Converts documents into vector search indexes | Medium — embedding dim and retrieval accuracy |
| TTS (Text-to-Speech) | Podcast audio generation | High — voice naturalness and multi-speaker quality |
| Podcast Transcript | Generates the podcast script/dialogue | High — structure, flow, and accuracy of speakers |
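The routing idea in the table above can be pictured as a lookup from task type to a provider/model pair. The following is a minimal sketch, not Open Notebook's actual configuration format: the `ModelChoice` class, the `ROUTES` table, the `route` helper, and the lowercase model labels are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    """A provider/model pair assigned to one task type (hypothetical structure)."""
    provider: str
    model: str


# Hypothetical routing table: one entry per task type from the table above.
# The model strings are illustrative labels, not verified API identifiers.
ROUTES = {
    "chat": ModelChoice("openai", "gpt-4o-mini"),
    "embeddings": ModelChoice("ollama", "nomic-embed-text"),
    "tts": ModelChoice("elevenlabs", "multilingual-v2"),
    "transcript": ModelChoice("openai", "gpt-4o-mini"),
}


def route(task: str) -> ModelChoice:
    """Return the provider/model pair configured for a task type."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"unknown task type: {task!r}") from None
```

Keeping the mapping in one place means swapping the chat model never touches the embedding or TTS choice, which is the point of per-task routing.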
3 Recommended Combos
GPT-4o + text-embedding-3-small + ElevenLabs
| Task | Provider | Model | Cost |
|---|---|---|---|
| Chat | OpenAI | GPT-4o | $2.50/1M input, $10/1M output |
| Embeddings | OpenAI | text-embedding-3-small | $0.02/1M tokens |
| TTS | ElevenLabs | Multilingual v2 | $0.30/1K chars |
| Transcript | Anthropic | Claude Sonnet 4 | $3/1M input, $15/1M output |
Best for: published research, client-facing reports, anything where quality is visible. ~$20–40/month for a heavy user.
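To see where a monthly figure like this can come from, the per-unit prices in the table above can be plugged into a small estimator. This is a rough sketch: the prices are copied from the table, while the function name `monthly_cost` and every usage number in the example are assumptions, not measured data.

```python
# Per-unit prices copied from the table above (USD).
CHAT_IN_PER_1M = 2.50       # GPT-4o input tokens
CHAT_OUT_PER_1M = 10.00     # GPT-4o output tokens
EMBED_PER_1M = 0.02         # text-embedding-3-small tokens
TTS_PER_1K_CHARS = 0.30     # ElevenLabs Multilingual v2 characters
SCRIPT_IN_PER_1M = 3.00     # Claude Sonnet 4 input tokens
SCRIPT_OUT_PER_1M = 15.00   # Claude Sonnet 4 output tokens


def monthly_cost(chat_in, chat_out, embed_tokens, tts_chars, script_in, script_out):
    """Estimate monthly spend in USD from raw token and character counts."""
    return (
        chat_in / 1_000_000 * CHAT_IN_PER_1M
        + chat_out / 1_000_000 * CHAT_OUT_PER_1M
        + embed_tokens / 1_000_000 * EMBED_PER_1M
        + tts_chars / 1_000 * TTS_PER_1K_CHARS
        + script_in / 1_000_000 * SCRIPT_IN_PER_1M
        + script_out / 1_000_000 * SCRIPT_OUT_PER_1M
    )


# Hypothetical heavy-user month; every number here is an assumption.
estimate = monthly_cost(
    chat_in=4_000_000,
    chat_out=500_000,
    embed_tokens=10_000_000,
    tts_chars=30_000,
    script_in=500_000,
    script_out=100_000,
)
```

With these assumed volumes the estimate comes to about $27, most of it from chat tokens and TTS characters. Embeddings barely register.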
GPT-4o-mini + text-embedding-3-small + ElevenLabs
| Task | Provider | Model | Cost |
|---|---|---|---|
| Chat | OpenAI | GPT-4o-mini | $0.15/1M input, $0.60/1M output |
| Embeddings | OpenAI | text-embedding-3-small | $0.02/1M tokens |
| TTS | ElevenLabs | Multilingual v2 | $0.30/1K chars |
| Transcript | OpenAI | GPT-4o-mini | Same as chat |
Best for: daily research, internal team use, learning. ~$5–15/month for a heavy user. GPT-4o-mini is roughly 90% as good as GPT-4o for document Q&A at about 1/17th the per-token price.
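The price gap between the two chat models follows directly from the per-token rates in the tables above. A quick check, using only those listed rates (the helper name `price_ratio` is made up for this example):

```python
# USD per 1M tokens, copied from the two combo tables above.
GPT_4O = {"input": 2.50, "output": 10.00}
GPT_4O_MINI = {"input": 0.15, "output": 0.60}


def price_ratio(expensive, cheap):
    """How many times more the expensive model costs, per token kind."""
    return {kind: expensive[kind] / cheap[kind] for kind in expensive}


ratios = price_ratio(GPT_4O, GPT_4O_MINI)
```

Both the input and output ratios work out to roughly 16.7, so GPT-4o costs about 17 times as much per token at these list prices.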
Groq + Ollama Embeddings + ElevenLabs
| Task | Provider | Model | Cost |
|---|---|---|---|
| Chat | Groq (free tier) | Llama 3.3 70B or DeepSeek-R1 | $0 |
| Embeddings | Ollama (local) | nomic-embed-text or qwen3-embedding | $0 |
| TTS | ElevenLabs | Multilingual v2 | $0.30/1K chars |
| Transcript | Groq (free tier) | Llama 3.3 70B | $0 |
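Best for: personal use, experimentation, and keeping embeddings local. Only the embeddings stay on your machine: chat and transcript requests still send document text to Groq, so fully privacy-critical docs need a local chat model as well (Ollama supports chat). The Groq free tier is rate-limited to roughly 30 requests/minute, which is enough for individual use.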
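Staying under a per-minute request cap like the one mentioned above is usually handled with a client-side throttle. The sketch below uses a sliding window; it is a generic technique, not Groq's SDK or Open Notebook's implementation, and the class name `SlidingWindowLimiter` and its defaults are assumptions.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds span."""

    def __init__(self, max_requests=30, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock      # injectable so the limiter can be tested without sleeping
        self._sent = deque()     # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # Window is full: wait until the oldest request ages out.
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self._sent.append(self._clock())
```

A caller would sleep for `wait_time()` seconds, send the request, then call `record()`.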
Provider Compatibility Matrix
Not every provider supports every task type. Here's what works:
| Provider | Chat | Embeddings | TTS | Transcript | Best For |
|---|---|---|---|---|---|
| OpenAI | ✓ | ✓ | ✓ | ✓ | All-around, best quality |
| Anthropic | ✓ | ✗ | ✗ | ✓ | Best long-document analysis |
| Google Gemini | ✓ | ✓ | ✗ | ✓ | 2M token context window |
| Ollama | ✓ | ✓ | ✗ | ✓ | Privacy, zero ongoing cost |
| Groq | ✓ | ✗ | ✗ | ✓ | Fastest inference, free tier |
| DeepSeek | ✓ | ✗ | ✗ | ✗ | Cheap API, strong reasoning |
| ElevenLabs | ✗ | ✗ | ✓ | ✗ | TTS only — best voice quality |
| Mistral | ✓ | ✓ | ✗ | ✓ | EU data residency |
| xAI (Grok) | ✓ | ✗ | ✗ | ✗ | Large context, alternative to Google |
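The matrix above can double as a sanity check before saving a configuration. Here is a minimal sketch that encodes the table as sets and flags task/provider pairs the table marks as unsupported. The `SUPPORT` mapping mirrors the table; the function name `unsupported` and the lowercase task keys are illustrative choices.

```python
# Supported task types per provider, transcribed from the matrix above.
SUPPORT = {
    "OpenAI": {"chat", "embeddings", "tts", "transcript"},
    "Anthropic": {"chat", "transcript"},
    "Google Gemini": {"chat", "embeddings", "transcript"},
    "Ollama": {"chat", "embeddings", "transcript"},
    "Groq": {"chat", "transcript"},
    "DeepSeek": {"chat"},
    "ElevenLabs": {"tts"},
    "Mistral": {"chat", "embeddings", "transcript"},
    "xAI (Grok)": {"chat"},
}


def unsupported(combo: dict[str, str]) -> list[str]:
    """Return one message per task whose provider does not support it."""
    problems = []
    for task, provider in combo.items():
        if task not in SUPPORT.get(provider, set()):
            problems.append(f"{provider} does not support {task}")
    return problems


# The free-tier combo from earlier in the article passes the check.
free_combo = {
    "chat": "Groq",
    "embeddings": "Ollama",
    "tts": "ElevenLabs",
    "transcript": "Groq",
}
```

For example, `unsupported({"embeddings": "Anthropic"})` returns a single message, since the matrix lists no embedding support for Anthropic.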
Embedding Models: The Silent Quality Factor
Your embedding model determines how accurately Open Notebook finds relevant passages in your documents. A bad embedding model means the AI is answering from the wrong context.
| Model | Dimensions | Cost | Best For |
|---|---|---|---|
| text-embedding-3-small | 512–1536 | $0.02/1M tokens | General purpose, best cost/perf ratio |
| text-embedding-3-large | 256–3072 | $0.13/1M tokens | When retrieval accuracy matters most |
| nomic-embed-text (Ollama) | 768 | $0 | Privacy, offline, 100% local |
| qwen3-embedding (Ollama) | 1024 | $0 | Best local embedding quality, multilingual |
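To make the "wrong context" risk concrete: retrieval typically ranks passages by how close their vectors are to the query vector, so the embedding model decides which passages the chat model ever sees. The sketch below assumes cosine similarity as the metric, which this article does not confirm for Open Notebook, and uses made-up three-dimensional vectors in place of real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, passages, k=2):
    """Return the names of the k passages most similar to the query."""
    ranked = sorted(
        passages,
        key=lambda name: cosine(query_vec, passages[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors (hypothetical); real models emit hundreds or thousands of dimensions.
passages = {
    "pricing section": [0.9, 0.1, 0.0],
    "install guide": [0.0, 1.0, 0.0],
    "billing FAQ": [0.5, 0.5, 0.0],
}
hits = top_k([1.0, 0.0, 0.0], passages, k=2)
```

If a weaker embedding model placed the install guide closer to the query than the pricing section, the chat model would answer from the wrong passage no matter how capable it is.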
For local Ollama embeddings, pull the model first (ollama pull nomic-embed-text) and set the base URL to the Docker service name, not localhost. More in FAQ →
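The localhost pitfall in the note above can be caught with a small check on the configured base URL. Inside a container, `localhost` refers to the container itself, so the Ollama host has to be the Compose service name. This sketch assumes a service called `ollama` on Ollama's default port 11434; the service name and the helper `is_loopback_url` are hypothetical.

```python
from urllib.parse import urlparse

# Hostnames that resolve to the container itself rather than another service.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_loopback_url(base_url: str) -> bool:
    """True if the URL points at the local machine or container."""
    host = urlparse(base_url).hostname
    return host in LOOPBACK_HOSTS


# Hypothetical values: "ollama" stands in for whatever the service is named.
bad_url = "http://localhost:11434"
good_url = "http://ollama:11434"
```

Running `is_loopback_url` over the configured URL at startup gives a clearer error than a connection refusal at embedding time.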
Models Are One Piece. Production Is the Rest.
Next: configure these models in Open Notebook → or get the Production Manual for monitoring, CI/CD, and the 30+ errors that appear once you go beyond a single user.
Get the Production Manual — $19