429: Too Many Requests
Nozzle is a local proxy that sits between your AI tools and LLM providers, enforcing a single token budget across everything on your machine. No more 429s. No more rate-limit hell.
You're building. You're shipping. Then the platform decides you've had enough. Rate limits, 429s, and lockouts — not because you did anything wrong, but because big tech controls the tap.
HTTP 429: Claude Code, your IDE, and three running agents all share one API key. The moment they overlap, your whole stack stops dead. One key, one limit, zero mercy.
Account suspended: Platforms shadowban and throttle accounts they deem "high-risk" — often without notice, without appeal, and without recourse. You're dependent on their mood.
Degraded performance: Sometimes it's not a hard block — it's a slow strangulation. Responses crawl. Latency spikes. Your build slows to a halt while you debug a problem that isn't yours.
Nozzle sits between your tools and the LLM providers. It enforces one shared token budget across your whole machine — so your tools cooperate instead of competing.
Shared budget across all your tools. Tokens refill at your configured rate — no tool can hog the limit.
SSE streaming piped through immediately. Nozzle adds sub-millisecond overhead in the fast path.
All headers and paths forwarded unchanged. Your existing API keys work as-is. Change one env var, done.
Estimates tokens before the request, reads actual counts from the response, and corrects the bucket automatically.
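To picture the estimate-and-correct loop, here is a minimal sketch in Rust. Everything in it (the `Budget` struct, `charge`, `settle`, the 4-bytes-per-token heuristic) is hypothetical illustration under stated assumptions, not Nozzle's actual code: charge a rough estimate before the request goes out, then settle the difference once the provider reports real usage.

```rust
// Hypothetical sketch of estimate-then-correct accounting, not
// Nozzle's actual code: charge an estimate up front, then settle
// against the usage counts the provider returns.
struct Budget {
    tokens: i64,
}

impl Budget {
    /// Charge the bucket before the request goes out.
    fn charge(&mut self, estimate: i64) {
        self.tokens -= estimate;
    }

    /// After the response: refund an over-estimate, or charge the shortfall.
    fn settle(&mut self, estimate: i64, actual: i64) {
        self.tokens += estimate - actual;
    }
}

/// Rough pre-request heuristic: ~4 bytes per token for English-ish text.
fn estimate_tokens(body: &[u8]) -> i64 {
    (body.len() as i64 / 4).max(1)
}

fn main() {
    let mut budget = Budget { tokens: 10_000 };
    let body = br#"{"model":"...","messages":[{"role":"user","content":"hi"}]}"#;
    let est = estimate_tokens(body);
    budget.charge(est);
    // ...request is forwarded; the response reports real usage...
    let actual = 42; // e.g. input_tokens + output_tokens
    budget.settle(est, actual);
    println!("budget after settle: {}", budget.tokens);
}
```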
Every feature ships to solve a real problem developers hit every day.
A token bucket enforces your configured rate across all concurrent requests. Fast path for bursts, smart queuing when you're at the ceiling. Set it once to 80% of your API tier and never see a 429 again.
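To make the mechanism concrete, here is a minimal sketch of a debt-based token bucket in Rust. The names (`TokenBucket`, `acquire`) and the debt model are hypothetical, not Nozzle's internals: tokens refill continuously at your configured rate, a request that fits the current level passes immediately, and one that overshoots returns the delay to wait before sending.

```rust
// Minimal sketch of a debt-based token bucket (hypothetical names,
// not Nozzle's actual internals). Tokens refill continuously at
// `rate` per second up to `capacity`; `acquire` returns how long
// the caller should wait before sending the request.
use std::time::{Duration, Instant};

struct TokenBucket {
    capacity: f64, // e.g. 80% of your API tier's token limit
    tokens: f64,   // current level; may go negative (debt)
    rate: f64,     // refill rate in tokens per second
    last: Instant, // last refill timestamp
}

impl TokenBucket {
    fn new(capacity: f64, rate: f64) -> Self {
        Self { capacity, tokens: capacity, rate, last: Instant::now() }
    }

    fn acquire(&mut self, cost: f64) -> Duration {
        // Refill based on elapsed time, capped at capacity.
        let now = Instant::now();
        self.tokens = (self.tokens
            + now.duration_since(self.last).as_secs_f64() * self.rate)
            .min(self.capacity);
        self.last = now;

        self.tokens -= cost; // charge; debt is repaid by future refills
        if self.tokens >= 0.0 {
            Duration::ZERO // fast path: send immediately
        } else {
            // At the ceiling: wait until refills cover the debt.
            Duration::from_secs_f64(-self.tokens / self.rate)
        }
    }
}

fn main() {
    // A 40k tokens/min tier budgeted at 80%: 32k capacity, ~533 tokens/sec.
    let mut bucket = TokenBucket::new(32_000.0, 32_000.0 / 60.0);
    println!("wait: {:?}", bucket.acquire(5_000.0));  // fits: zero wait
    println!("wait: {:?}", bucket.acquire(30_000.0)); // overshoots: queued
}
```

The debt model gives you both behaviors in one bucket: bursts ride the fast path until the level hits zero, and anything beyond that is queued by exactly the refill shortfall.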
Anthropic, OpenAI, and Google out of the box — each on its own local port. Add any HTTP API as a provider; token extraction is built in for all three, and other providers fall back to body-length estimation.
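The response shapes involved are public: Anthropic reports usage.input_tokens and usage.output_tokens, OpenAI reports usage.total_tokens, and Gemini reports usageMetadata.totalTokenCount. Here is a sketch of how such a fallback chain might look; the JSON field names match those public shapes, but the surrounding structure is hypothetical, not Nozzle's code.

```rust
// Sketch of per-provider usage extraction with a byte-length fallback.
// The JSON field names match the public Anthropic/OpenAI/Gemini response
// shapes; the surrounding structure is hypothetical, not Nozzle's code.
// Requires the serde_json crate.
use serde_json::Value;

enum Provider {
    Anthropic,
    OpenAi,
    Google,
    Other,
}

fn actual_tokens(provider: &Provider, body: &[u8]) -> u64 {
    let json: Value = serde_json::from_slice(body).unwrap_or(Value::Null);
    let counted = match provider {
        Provider::Anthropic => json["usage"]["input_tokens"]
            .as_u64()
            .zip(json["usage"]["output_tokens"].as_u64())
            .map(|(i, o)| i + o),
        Provider::OpenAi => json["usage"]["total_tokens"].as_u64(),
        Provider::Google => json["usageMetadata"]["totalTokenCount"].as_u64(),
        Provider::Other => None,
    };
    // Unknown providers (or a missing usage block) fall back to a
    // body-length estimate: roughly 4 bytes per token.
    counted.unwrap_or_else(|| (body.len() as u64 / 4).max(1))
}

fn main() {
    let resp = br#"{"usage":{"input_tokens":12,"output_tokens":30}}"#;
    println!("{}", actual_tokens(&Provider::Anthropic, resp)); // prints 42
}
```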
MIT licensed. No black boxes. No paywalls. Community-driven and built in Rust for reliability. Add providers, tune limits, build a dashboard — it's all yours.
Works out of the box for Anthropic, OpenAI, and Google with sensible defaults. Installs as a macOS LaunchAgent in one command — starts on login, restarts on crash. Change one env var and you're proxying.
One binary. No runtime dependencies. Installs as a background service.
export ANTHROPIC_BASE_URL=http://127.0.0.1:8771 — drop it in your shell profile. Done.
curl localhost:8770/status — see token counts, rate-limit delays, and bucket level per provider in real time.
3 LLM providers · sub-millisecond added overhead · 100% open source · MIT license
Nozzle is early-stage and community-driven. There's real work to do — Linux support, Homebrew formula, per-provider rate limits, a proper dashboard, Prometheus metrics. If you've ever been throttled by a platform you depend on, you know why this matters. Come build with us.