The Hardware You Actually Need

Three tiers with exact specs and VRAM math. From $5/month to RTX 4090 — pick what fits your scale.

The Three Tiers at a Glance

Minimum

Cloud API Mode — No GPU Needed

Open Notebook app + SurrealDB. All AI inference happens on the provider's servers.

RAM: 4 GB
Disk: 2+ GB
GPU: None
VPS cost: $5/mo

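A Hetzner CX22 (2 vCPU, 4 GB RAM) runs this fine at roughly $5/month. On DigitalOcean, the ~$5 droplets have 1 GB of RAM or less, so a 4 GB droplet costs noticeably more. Use the OpenAI API, Groq (free tier), or Anthropic for inference. Open Notebook itself needs ~2 GB RAM; the rest is headroom for SurrealDB and file processing.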
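
To confirm a VPS actually has the memory this tier assumes, the sketch below reads the kernel's MemTotal figure on a Linux host. It is a minimal helper; the function name `mem_total_gib` is hypothetical and not part of Open Notebook. The kernel reserves some memory for itself, so a 4 GB plan typically reports slightly less than 4 GiB.

```shell
# Hypothetical helper: print total RAM in GiB as reported by the Linux kernel.
# MemTotal in /proc/meminfo is given in kB (really KiB); 1 GiB = 1048576 KiB.
# Pass another meminfo-style file as $1 to test the parsing.
mem_total_gib() {
  awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# On the VPS:
#   mem_total_gib
```
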

Recommended

Local Ollama Models — Consumer GPU

Run models locally for privacy and zero per-token cost. One user, one GPU.

RAM: 16–32 GB
Disk: 20+ GB
VRAM: 8–24 GB
API cost: $0/mo

GPU recommendations:

| GPU | VRAM | Best model fit | Approx. price |
|---|---|---|---|
| RTX 3060 | 12 GB | 7B models @ 8192 ctx | ~$280 |
| RTX 4070 | 12 GB | 7B–13B models @ 4096 ctx | ~$550 |
| RTX 4090 | 24 GB | 20B models @ 8192 ctx, or 7B @ ~32K ctx | ~$1,700 |
| Apple M3/M4 | 32 GB unified | 7B–13B models @ 8192 ctx | ~$1,600+ |
| 2× RTX 3090 | 48 GB total | Mixtral 8×7B, 34B models | ~$1,400 used |

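
To keep a model inside the context budget listed for your card, you can pin num_ctx in an Ollama Modelfile with `PARAMETER num_ctx` and register it with `ollama create`. The helper below is a hypothetical sketch that writes such a file; `llama3.1:8b` is only an example base model and `llama3.1-8k` an arbitrary name. Per-request options (which Open Notebook may send) override the Modelfile value.

```shell
# Hypothetical helper: write an Ollama Modelfile that pins num_ctx for a base model,
# so the context (and therefore VRAM use) stays within what the card can hold.
write_modelfile() {
  base_model=$1          # e.g. llama3.1:8b (any model tag you have pulled)
  num_ctx=$2             # e.g. 8192 for a 12 GB card
  out=${3:-Modelfile}    # output path, defaults to ./Modelfile
  printf 'FROM %s\nPARAMETER num_ctx %s\n' "$base_model" "$num_ctx" > "$out"
}

# Usage (needs Ollama installed and the base model pulled):
#   write_modelfile llama3.1:8b 8192
#   ollama create llama3.1-8k -f Modelfile
#   ollama run llama3.1-8k
```
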
Production

Multi-User, High-Volume — Server GPU

5+ concurrent users running local models. Separate Ollama machine recommended.

RAM: 64 GB
Storage: 100+ GB NVMe SSD
VRAM: 24–48 GB
Infra cost: $200+/mo

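For 5+ concurrent users, move Ollama onto a dedicated GPU machine and connect to it via a network URL. Run Open Notebook, SurrealDB, and nginx on the app server and Ollama on the compute server. This keeps long-running inference from starving the web app and database of CPU and memory, and lets you size the GPU machine independently. Ollama has no built-in authentication, so once it listens on 0.0.0.0, firewall port 11434 so that only the app server can reach it.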

# On the Ollama machine:
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# In Open Notebook docker-compose.yml: OLLAMA_BASE_URL=http://10.0.0.5:11434
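
Before setting the URL in Open Notebook, it helps to check from the app server that Ollama answers over the network. This sketch calls Ollama's `GET /api/tags` endpoint (which lists installed models) with curl. The function name and the app-server address 10.0.0.4 in the firewall comment are hypothetical.

```shell
# Hypothetical helper: succeeds if an Ollama server answers at the given base URL.
ollama_reachable() {
  curl --silent --fail --max-time 5 "$1/api/tags" > /dev/null
}

# On the app server:
#   ollama_reachable http://10.0.0.5:11434 && echo "Ollama reachable" || echo "Ollama NOT reachable"
#
# On the Ollama machine, with ufw's default deny-incoming policy, allow only the app server
# (10.0.0.4 is a hypothetical app-server address):
#   sudo ufw allow from 10.0.0.4 to any port 11434 proto tcp
```
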

VRAM Math: What Actually Matters

The three variables that determine VRAM usage:

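| Factor | How it affects VRAM | Rule of thumb |
|---|---|---|
| Model parameters | Determines base memory. A 7B-parameter model at 4-bit quantization ≈ 4 GB. | ~0.5 GB per 1B params (4-bit quantized) |
| Context window (num_ctx) | Each additional 2048 tokens ≈ 1 GB for a 7B model with an fp16 KV cache and no grouped-query attention (GQA), e.g. Llama 2 7B. Grows with model size. GQA models such as Llama 3 8B and Mistral 7B need about a quarter of that. | 1 GB per 2048 ctx @ 7B (conservative) |
| Concurrent requests | With one request slot per model, extra users queue. Ollama can serve several requests to a loaded model in parallel (OLLAMA_NUM_PARALLEL), but each slot reserves its own num_ctx worth of context memory. | 1 slot: no VRAM multiplier, latency grows; N slots: ≈ N× context VRAM |
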
Critical: Open Notebook v1.8+ changed the default num_ctx from 128000 to 8192 to prevent OOM on consumer GPUs. If you raise it again, calculate your VRAM first. By the formula below, a 7B model at 128K context needs roughly 68 GB of VRAM (about 64 GB of that is context), which is more than two RTX 4090s (48 GB combined) can hold.
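
The context rule of thumb comes from the KV cache, which stores one key and one value vector per layer, per KV head, per token. The sketch below computes its size for an fp16 cache, assuming the published configs of Llama 2 7B (32 layers, 32 KV heads of dimension 128) and Llama 3 8B (32 layers, only 8 KV heads of dimension 128 thanks to grouped-query attention). It ignores model weights and runtime buffers; the function name is hypothetical.

```shell
# Hypothetical helper: size of an fp16 KV cache in GiB.
#   2 (K and V) x layers x kv_heads x head_dim x 2 bytes per fp16 value
#   x num_ctx tokens x parallel request slots
kv_cache_gib() {
  awk -v layers="$1" -v kv_heads="$2" -v head_dim="$3" \
      -v num_ctx="$4" -v slots="${5:-1}" 'BEGIN {
    bytes = 2 * layers * kv_heads * head_dim * 2 * num_ctx * slots
    printf "%.2f\n", bytes / 1073741824   # 1 GiB = 2^30 bytes
  }'
}

# Llama 2 7B, 2048 tokens:                kv_cache_gib 32 32 128 2048      prints 1.00
# Llama 2 7B, 128K tokens:                kv_cache_gib 32 32 128 131072    prints 64.00
# Llama 3 8B, 8192 tokens, 4 slots:       kv_cache_gib 32 8 128 8192 4     prints 4.00
```

The last line shows why parallel slots matter: four slots of an 8K context cost as much cache as a single 32K context.
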

Quick formula for 4-bit quantized models:

VRAM ≈ (params_in_B × 0.5) + (num_ctx / 2048) × (params_in_B / 7) GB

Example: 7B model @ 8192 ctx → 3.5 + (8192/2048) × 1 = 7.5 GB VRAM. That squeezes into an 8 GB card with almost no room for runtime buffers; a 12 GB card is comfortable.
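
The same formula as a shell function, plus a fit check against a card's VRAM. The optional headroom argument (default 0, matching the example above) is an assumption standing in for runtime buffers the formula leaves out; both function names are hypothetical.

```shell
# Hypothetical helper: VRAM_GB ~ params_B * 0.5 + (num_ctx / 2048) * (params_B / 7)
vram_estimate_gb() {
  awk -v p="$1" -v c="$2" 'BEGIN { printf "%.1f\n", p * 0.5 + (c / 2048) * (p / 7) }'
}

# Hypothetical helper: fits_on_card PARAMS_B NUM_CTX CARD_GB [HEADROOM_GB]
# Exit status 0 if estimate + headroom <= card VRAM, 1 otherwise.
fits_on_card() {
  awk -v p="$1" -v c="$2" -v card="$3" -v head="${4:-0}" 'BEGIN {
    need = p * 0.5 + (c / 2048) * (p / 7) + head
    if (need <= card) exit 0
    exit 1
  }'
}

# vram_estimate_gb 20 8192                  prints 21.4 (fits a 24 GB RTX 4090)
# fits_on_card 7 8192 12 1 && echo fits     7.5 GB + 1 GB headroom on a 12 GB card
```
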

Cloud vs Local: Cost Breakeven

If you're processing documents all day, local models can pay for themselves, although electricity eats into the savings:

| Scenario | Cloud API (monthly) | Local Ollama (monthly) | Breakeven |
|---|---|---|---|
| Light use (50 docs, 200 queries) | $3–8 | $0 + electricity | Never; stay cloud |
| Medium use (200 docs, 1K queries) | $15–30 | $0 + electricity | ~2 years vs RTX 3060 |
| Heavy use (1K docs, 5K queries) | $50–100 | $0 + electricity | ~2 years vs RTX 4090 |
| Team of 5, heavy use | $200–500 | $0 + electricity + infra | ~3 months |

Assumes GPT-4o-mini pricing (~$0.15 per 1M input tokens, ~$0.60 per 1M output tokens). Electricity: ~$15–40/month for a GPU machine under load. Breakeven ≈ GPU price ÷ (monthly cloud spend − monthly electricity). See the detailed model cost comparison.
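
The breakeven column is simple division: the GPU price over the monthly amount you stop paying the cloud provider, net of electricity. A minimal sketch under those assumptions; it ignores the cost of the rest of the machine, resale value, and your time, and the function name is hypothetical.

```shell
# Hypothetical helper: months until a GPU purchase pays for itself.
#   months = hardware_price / (monthly_cloud_cost - monthly_electricity)
# Prints "never" when electricity costs as much as (or more than) the cloud bill.
breakeven_months() {
  awk -v hw="$1" -v cloud="$2" -v elec="$3" 'BEGIN {
    saving = cloud - elec
    if (saving <= 0) { print "never"; exit 0 }
    printf "%.1f\n", hw / saving
  }'
}

# RTX 4090 vs the top of the heavy-use range, low electricity:
#   breakeven_months 1700 100 15    prints 20.0
# RTX 3060 vs light use:
#   breakeven_months 280 8 15       prints never
```
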

Picked Your Hardware?

Next: follow the deployment guide → or get the Production Manual for monitoring, CI/CD, and the 30+ errors that hit at scale.

Get the Production Manual — $19