Open source · MIT licensed · Built in Rust

Stop Getting
Throttled.

Nozzle is a local proxy that sits between your AI tools and LLM providers, enforcing a single token budget across everything on your machine. No more 429s. No more rate-limit hell.

The Wall Every Dev Hits

You're building. You're shipping. Then the platform decides you've had enough. Rate limits, 429s, and lockouts — not because you did anything wrong, but because big tech controls the tap.

429: Too Many Requests

Claude Code, your IDE, and three running agents all share one API key. The moment they overlap, your whole stack stops dead. One key, one limit, zero mercy.

HTTP 429

Sudden Lockouts

Platforms shadowban and throttle accounts they deem "high-risk" — often without notice, without appeal, and without recourse. You're dependent on their mood.

Account suspended

Invisible Throttling

Sometimes it's not a hard block — it's a slow strangulation. Responses crawl. Latency spikes. Your build slows to a halt while you debug a problem that isn't yours.

Degraded performance

Nozzle Routes Around It

Nozzle sits between your tools and the LLM providers. It enforces one shared token budget across your whole machine — so your tools cooperate instead of competing.

[Routing diagram: your tools (Claude Code · Cursor · agents) → nozzle, the local proxy (:8770–:8773) → Anthropic (api.anthropic.com, :8771) · OpenAI (api.openai.com, :8772) · Google (generativelanguage…, :8773)]
Token bucket algorithm

Shared budget across all your tools. Tokens refill at your configured rate — no tool can hog the limit.
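The idea can be sketched in a few lines of Rust. This is a generic token bucket, not Nozzle's actual code; the capacity, refill rate, and names are illustrative:

```rust
use std::time::Instant;

/// Minimal token-bucket sketch: holds up to `capacity` tokens,
/// refilled continuously at `refill_per_sec`.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        TokenBucket { capacity, tokens: capacity, refill_per_sec, last: Instant::now() }
    }

    /// Credit tokens accrued since the last call, capped at capacity.
    fn refill(&mut self) {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
    }

    /// Try to take `n` tokens. True is the fast path; false means the
    /// caller is at the ceiling and should queue until the bucket refills.
    fn try_acquire(&mut self, n: f64) -> bool {
        self.refill();
        if self.tokens >= n {
            self.tokens -= n;
            true
        } else {
            false
        }
    }
}

fn main() {
    // One shared bucket: 1,000 tokens, refilling at 100/s.
    let mut bucket = TokenBucket::new(1000.0, 100.0);
    assert!(bucket.try_acquire(600.0));  // fast path
    assert!(!bucket.try_acquire(600.0)); // over budget: would queue
    println!("token bucket ok");
}
```

Because every tool draws from the same bucket, a burst from one tool simply leaves fewer tokens for the others instead of tripping the provider's limit.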

Zero added latency

SSE streaming piped through immediately. Nozzle adds sub-millisecond overhead in the fast path.

Transparent proxy

All headers and paths forwarded unchanged. Your existing API keys work as-is. Change one env var, done.

Self-correcting estimates

Estimates tokens before the request, reads actual counts from the response, and corrects the bucket automatically.
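A minimal sketch of that estimate-then-correct loop in Rust. The 4-bytes-per-token divisor is a common heuristic and an assumption here, not Nozzle's actual estimator:

```rust
/// Rough pre-request estimate. ~4 bytes per token is a common
/// rule of thumb for English text (the divisor is an assumption).
fn estimate_tokens(body: &str) -> u64 {
    (body.len() as u64 / 4).max(1)
}

/// Once the provider's response reports actual usage, this is the
/// signed adjustment to apply to the bucket: positive means the
/// estimate under-charged, negative means tokens get refunded.
fn correction(estimated: u64, actual: u64) -> i64 {
    actual as i64 - estimated as i64
}

fn main() {
    let body = "Summarize the following document in three bullet points.";
    let est = estimate_tokens(body);
    // Suppose the response's usage block later reports 12 tokens.
    let delta = correction(est, 12);
    println!("estimated {est}, actual 12, correction {delta}");
}
```

Over time the corrections cancel out the estimator's bias, so the bucket tracks real usage rather than guesses.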

Built for the Build

Every feature ships to solve a real problem developers hit every day.

Token Rate Limiting

A token bucket enforces your configured rate across all concurrent requests. Fast path for bursts, smart queuing when you're at the ceiling. Set it once to 80% of your API tier and never see a 429 again.

Multi-Provider Support

Anthropic, OpenAI, and Google out of the box — each on its own local port. Add any HTTP API as a provider: token extraction is built in for all three, while other providers fall back to body-length estimation.
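The per-provider routing amounts to a small dispatch table. A sketch, with the default ports taken from the routing description earlier on this page (the provider key strings are illustrative, not Nozzle's actual identifiers):

```rust
/// Default local listen port per built-in provider.
/// Custom HTTP providers would get their own configured port.
fn local_port(provider: &str) -> Option<u16> {
    match provider {
        "anthropic" => Some(8771), // api.anthropic.com
        "openai" => Some(8772),    // api.openai.com
        "google" => Some(8773),    // Google generative language API
        _ => None,
    }
}

fn main() {
    assert_eq!(local_port("anthropic"), Some(8771));
    assert_eq!(local_port("some-custom-llm"), None);
    println!("routing table ok");
}
```

Giving each provider its own port keeps the proxy transparent: a tool just swaps its base URL and everything else stays the same.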

Open Source & Extensible

MIT licensed. No black boxes. No paywalls. Community-driven and built in Rust for reliability. Add providers, tune limits, build a dashboard — it's all yours.

Zero Config Start

Works out of the box for Anthropic, OpenAI, and Google with sensible defaults. Installs as a macOS LaunchAgent in one command — starts on login, restarts on crash. Change one env var and you're proxying.

Up in 60 Seconds

One binary. No runtime dependencies. Installs as a background service.

Claude Code / aider / any SDK

ANTHROPIC_BASE_URL=http://127.0.0.1:8771 — drop it in your shell profile. Done.
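For the current shell session, that one-line override looks like this (the commented line is a sketch of persisting it; ~/.zshrc is just an example profile path):

```shell
# Send Anthropic-bound traffic through the local Nozzle proxy
# (:8771 is the Anthropic listener).
export ANTHROPIC_BASE_URL=http://127.0.0.1:8771

# To persist across sessions, append the export to your shell profile, e.g.:
# echo 'export ANTHROPIC_BASE_URL=http://127.0.0.1:8771' >> ~/.zshrc
```

Your existing API key is untouched; only the base URL changes.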

Live stats at :8770

curl localhost:8770/status — see token counts, rate-limit delays, and bucket level per provider in real time.

3 LLM providers · <1ms added overhead · 100% open source · MIT license

Built Together, Kept Free

Nozzle is early-stage and community-driven. There's real work to do — Linux support, Homebrew formula, per-provider rate limits, a proper dashboard, Prometheus metrics. If you've ever been throttled by a platform you depend on, you know why this matters. Come build with us.

Linux / systemd
Homebrew formula
More providers
Dashboard UI
Prometheus metrics
Per-provider limits