How-to: control LLM cost (Gate 3)

Prerequisites

  • A running Meridian instance (Docker Compose)
  • Optionally an Ollama server and/or DeepSeek/Anthropic API keys

What you'll have after

Gate 3 configured to your budget — from $0 (Ollama only, air-gapped) to a hard-capped cloud setup that never exceeds a daily dollar ceiling.

How the tier router works

Gate 3 tries tiers in this fixed order and uses the first one that is configured and available:

Ollama (local, free) -> DeepSeek (cheap cloud) -> Anthropic (best cloud)

You choose the trade-off by which variables you set. If none are set, Gate 3 is effectively skipped and Gates 1+2 still run.

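| Variable          | Tier      | Cost                           | Egress needed |
|-------------------|-----------|--------------------------------|---------------|
| OLLAMA_BASE_URL   | Ollama    | $0                             | No            |
| DEEPSEEK_API_KEY  | DeepSeek  | Low                            | Yes           |
| ANTHROPIC_API_KEY | Anthropic | Higher                         | Yes           |
| LLM_DAILY_CAP_USD | -         | Hard ceiling on paid spend/day | -             |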
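
The selection rule above can be sketched in a few lines of Python. This is an illustration only, not Meridian's actual router: the function name pick_tier and the cap_reached flag are hypothetical, and the sketch models just the fixed ordering and the daily cap, not reachability checks or any escalation logic Meridian may apply.

```python
# Fixed tier order from this page: local first, then cheap cloud, then best cloud.
# Each entry is (tier name, environment variable, is_paid).
TIER_ORDER = [
    ("ollama", "OLLAMA_BASE_URL", False),
    ("deepseek", "DEEPSEEK_API_KEY", True),
    ("anthropic", "ANTHROPIC_API_KEY", True),
]


def pick_tier(env, cap_reached=False):
    """Return the first configured tier name, or None if Gate 3 would be skipped.

    Hypothetical helper: `env` is any mapping of variable names to values,
    for example os.environ.
    """
    for name, variable, is_paid in TIER_ORDER:
        if not env.get(variable):
            continue  # variable unset or empty: tier is not configured
        if is_paid and cap_reached:
            continue  # paid tiers are off once the daily cap is hit
        return name
    return None
```

For example, pick_tier({"DEEPSEEK_API_KEY": "sk-..."}) returns "deepseek", while an empty mapping returns None, which corresponds to Gate 3 being skipped.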

Setup A — $0 / air-gapped (Ollama only)

No cloud, no egress, no spend. Best for regulated/offline environments.

# .env
OLLAMA_BASE_URL=http://host.docker.internal:11434
# do NOT set DEEPSEEK_API_KEY or ANTHROPIC_API_KEY

Pull a model on the Ollama host first, e.g.:

ollama pull qwen2.5-coder:7b

Verify Meridian can reach it by submitting a diff and confirming Gate 3 produces output (not skipped):

curl -s -X POST http://localhost:3011/api/cra/analyze \
  -H 'Content-Type: application/json' -H "Authorization: Bearer $CRA_API_TOKEN" \
  -d '{"repo_name":"t","branch":"x","commit_message":"t","diff":"+++ b/a.js\n+function f(){return 1}\n"}' \
  | jq '.gates.llm.status'

Expected (not "skipped"):

"pass"

Setup B — cloud with a hard daily cap

Use local Ollama for the bulk, fall back to cloud for hard cases, and cap the paid spend:

# .env
OLLAMA_BASE_URL=http://host.docker.internal:11434
DEEPSEEK_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
LLM_DAILY_CAP_USD=2.00

When the day's paid spend reaches LLM_DAILY_CAP_USD, Meridian stops calling paid tiers for the rest of the day. Gates 1 and 2 are unaffected — they are free and always run. Gate 3 falls back to whatever free tier remains (Ollama) once the cap is hit.
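
The cap behavior described above can be modeled roughly as follows. This is a minimal sketch under stated assumptions, not Meridian's implementation: the class DailySpendCap is hypothetical, and it assumes the counter resets at a calendar-day boundary, which this page does not specify.

```python
from datetime import date


class DailySpendCap:
    """Tracks paid spend per calendar day against a fixed USD ceiling.

    Hypothetical helper; the reset boundary (local calendar day) is an assumption.
    """

    def __init__(self, cap_usd):
        self.cap_usd = cap_usd
        self.day = None
        self.spent_usd = 0.0

    def _roll(self, today):
        # A new day resets the counter, so paid tiers become available again.
        if today != self.day:
            self.day = today
            self.spent_usd = 0.0

    def record(self, cost_usd, today=None):
        """Add the cost of one paid call to today's total."""
        today = today or date.today()
        self._roll(today)
        self.spent_usd += cost_usd

    def paid_allowed(self, today=None):
        """Return False once today's spend has reached the cap."""
        today = today or date.today()
        self._roll(today)
        return self.spent_usd < self.cap_usd
```

With a cap of 2.00, paid_allowed() stays True until recorded spend reaches 2.00, then returns False until the date changes. A router would consult it before each paid call and fall through to Ollama when it returns False.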

Pick a cap you can sleep through

Set the cap to the most you are willing to spend on a runaway day (e.g. an agent opening hundreds of PRs). A small cap with Ollama as the floor means cost is bounded and you never lose Gate 3 entirely.
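
Because the cap is a per-day ceiling, the worst case over a longer period is simple arithmetic. The helper below is a hypothetical illustration, assuming the cap is hit every single day:

```python
def worst_case_paid_spend(daily_cap_usd, days):
    """Upper bound on paid LLM spend over `days` days if the cap is hit every day."""
    return daily_cap_usd * days


# A 2.00 USD daily cap bounds a 31-day month at 62.00 USD.
print(worst_case_paid_spend(2.00, 31))  # 62.0
```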

Setup C — cloud only (no Ollama)

If you have no local GPU and accept both network egress and paid spend:

# .env
DEEPSEEK_API_KEY=sk-...
LLM_DAILY_CAP_USD=5.00
# Anthropic optional as a higher-quality fallback

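Starting with DeepSeek keeps cost low; add ANTHROPIC_API_KEY only if you want the higher-quality tier for cases where DeepSeek's review is weak. With no Ollama floor in this setup, Gate 3 is skipped for the rest of the day once the cap is reached.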

Reducing spend without losing coverage

  • Run Ollama locally for pre-commit (free, fast feedback) and reserve cloud tiers for the CI gate (fewer runs, higher stakes).
  • Keep diffs small — Gate 3 cost scales with input size.
  • Let Gates 1 and 2 do the cheap work; they catch most secrets and structural issues for free, so Gate 3 only reasons about what is left.
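
Since Gate 3 cost scales with input size, it can help to measure a diff before submitting it. The function below is a rough, hypothetical sketch that is not part of Meridian: it counts changed lines and bytes in a unified diff as a proxy for input size, and it deliberately sets no threshold, since the right limit depends on your model and budget.

```python
def diff_stats(diff_text):
    """Count added/removed lines and total UTF-8 bytes in a unified diff.

    Rough heuristic: lines starting with '+++' or '---' are treated as
    file headers, so a changed line whose content begins with '++' or '--'
    would be miscounted.
    """
    added = removed = 0
    for line in diff_text.splitlines():
        if line.startswith("+++") or line.startswith("---"):
            continue  # file header lines, not content changes
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return {
        "added": added,
        "removed": removed,
        "bytes": len(diff_text.encode("utf-8")),
    }
```

Applied to the sample diff from Setup A, it reports one added line and no removed lines.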

Troubleshooting

| Symptom                           | Likely cause                     | Fix                                                                          |
|-----------------------------------|----------------------------------|------------------------------------------------------------------------------|
| Gate 3 always skipped             | No tier configured / unreachable | Set OLLAMA_BASE_URL or an API key; check connectivity                        |
| Unexpected cloud bill             | No cap set                       | Set LLM_DAILY_CAP_USD                                                        |
| Gate 3 stops mid-day              | Cap reached                      | Expected; raise the cap or rely on Ollama floor                              |
| Ollama unreachable from container | host.docker.internal on Linux    | Add extra_hosts: ["host.docker.internal:host-gateway"] or use the host IP    |
| Slow analyses                     | Large local model on CPU         | Use a smaller coder model or a GPU host                                      |
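
For the connectivity rows above, a quick reachability probe can narrow things down. The sketch below is not a Meridian tool; the function name ollama_reachable is hypothetical. It calls Ollama's GET /api/tags endpoint, which lists locally pulled models, and it is only meaningful when run from the same network context as the Meridian container, assuming Python is available there.

```python
import json
import urllib.error
import urllib.request


def ollama_reachable(base_url, timeout=3.0):
    """Return True if an Ollama server answers GET /api/tags at base_url."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.load(response)  # the endpoint returns a JSON listing of local models
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, bad URL, or a non-JSON reply.
        return False
```

Calling ollama_reachable("http://host.docker.internal:11434") and getting False from inside the container, while the same call against localhost succeeds on the host, points at the host.docker.internal row rather than at Ollama itself.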

Next: Solo dev scenario