npm install @thsky-21/thskyshield · v2.3.2

Put a spending limit on your AI APIs.

Add spending limits to your OpenAI, Claude, and Gemini APIs. Thskyshield blocks AI requests that exceed your budget before the API call executes.

Works with OpenAI, Claude, Gemini, and any model. No app rewrites. Live in 10 minutes.

OpenAI has no per-user spending limits. One prompt loop or malicious user can drain thousands of dollars overnight — and you won't know until the invoice lands.

$0.08
Cost blocked in the live demo attack vs $847 unprotected
<15ms
Budget check latency — invisible to your users
3 lines
Of code to wrap any LLM call

See a wallet-drain attack get blocked in real time — no account needed.

90 seconds. Pure simulation. Governed vs unprotected side by side.

Open live demo
This Has Already Happened to Founders Like You

AI APIs have no built-in spending limits.

Your OpenAI account will happily process 10,000 requests from one user overnight. A prompt loop, a runaway agent, or a single malicious actor is all it takes. By the time you see the invoice, the damage is done.

🔁
Prompt loops

A bug in your agent retries the same call in a tight loop. 4,000 requests. $1,200. You find out when OpenAI emails you.

😈
Malicious users

One bad actor discovers your chatbot endpoint and scripts it overnight. No rate limit. No budget ceiling. $3,000 gone.

🤖
Runaway agents

Your AI agent hits an edge case and enters an infinite tool-use loop. Fully autonomous. Fully expensive. Fully your problem.

✗ Without Thskyshield
$847.23
Overnight attack cost
  • 4,200 GPT-4o requests executed
  • No limit hit. All calls processed.
  • You find out when the invoice arrives.
✓ With Thskyshield
$0.08
Actual cost incurred
  • 3 requests processed under budget
  • Budget ceiling hit. Kill-switch fires.
  • 4,197 requests blocked before execution.

Based on the live demo simulation. See it run in real time.

Open live demo
How It Works

Sits between your app and the LLM. Blocks over-budget calls before they fire.

Thskyshield wraps your existing LLM calls with two lightweight SDK calls. Before the request executes, it checks the user's budget and atomically reserves the estimated cost in Redis. If they're over the limit, the API call never happens — no tokens burned, no cost incurred.

Your User
Your App
Thskyshield ✓ budget check
LLM Provider
Phase A — Pre-call
Budget check + atomic reservation

Before your LLM call runs, the SDK checks the user's remaining budget via an edge endpoint. If they're under the limit, the estimated cost is atomically reserved in Redis. If over — blocked instantly. Under 15ms.

Phase B — Post-call
Actual cost reconciliation

After the LLM responds, the SDK logs the real token cost. Redis is reconciled. Supabase gets a permanent audit record: model, cost, user, plan, outcome.

Watch both phases run in the live demo
app/api/chat/route.ts
// 1. Check budget before the call
const { allowed, reason, requestId } = await shield.check({
  externalUserId: userId,
  model: 'gpt-4o',
  plan: user.plan, // 'free' | 'pro'
  estimatedTokens: { input: 500, output: 200 },
});
if (!allowed) return Response.json({ reason }, { status: 429 });

// 2. Call your LLM normally
// const response = await openai.chat...

// 3. Log actual cost after the call
await shield.log({
  requestId,
  externalUserId: userId,
  model: 'gpt-4o',
  tokens: { input: usage.prompt_tokens, output: usage.completion_tokens },
});
The Hidden Bug in DIY Solutions

Why simple budget checks fail.

Most developers reach for a simple database read: if the user is under budget, let the request through. It looks correct. It isn't.

Under concurrent load, two requests can read the same balance at the same time, both pass the check, and both execute the LLM call — even if either one alone would have exceeded the budget. This is a classic read-modify-write race condition, and it means your budget ceiling isn't actually a ceiling.

What actually happens

  1. Request A reads balance → $4.90 of $5.00 used → passes ✓
  2. Request B reads balance → $4.90 of $5.00 used → passes ✓
  3. Request A fires the LLM call → $0.20 cost
  4. Request B fires the LLM call → $0.20 cost
  5. Final spend: $5.30 — 6% over the limit nobody enforced.

At low traffic this is invisible. Under any real concurrent load — or a scripted attack — it compounds into serious overruns.

Naive approach — has a race condition
// ❌ Two concurrent requests can both pass
const spend = await db.getSpend(userId);
if (spend < limit) {
  // Both requests reach here simultaneously
  await callLLM(); // 💸 budget exceeded
}
Thskyshield — atomic reservation, no race
// ✅ Lua script reserves cost atomically in Redis
// ✅ Lua script reserves cost atomically in Redis
const { allowed } = await shield.check({
  externalUserId: userId,
  model: 'gpt-4o',
  estimatedTokens: { input: 500, output: 200 },
});
if (allowed) await callLLM(); // ✓ safe
⚛️ Lua script runs atomically in Redis
🔒 Cost reserved before the call fires
🚫 No two requests can race past the limit
Enforced at the edge in under 15ms
The Problem with Logging Tools

Observability tells you what happened.
Thskyshield stops it before it does.

Tools like Helicone and LangSmith log prompts, tokens, and latency. That's useful — but they only observe after the cost already happened.

Observability tools
Helicone, LangSmith, etc.
  • Request fires → tokens burn → cost occurs
  • You see what happened in a dashboard
  • Useful for debugging and analytics
  • Does not prevent runaway costs
Thskyshield
Pre-request enforcement
  • Budget check runs before the request fires
  • Over-limit calls blocked — zero tokens burned
  • Spend visible in dashboard in real time
  • Prevents runaway costs, not just records them
60-Second Setup

One npm install.
Zero app rewrites.

Thskyshield wraps your existing LLM calls — no refactoring, no new architecture. Add shield.check() before and shield.log() after. Done.

  • Works with GPT-4o, Claude, Gemini, or any model
  • Per-user and per-plan budget limits in the dashboard
  • Fail-open: if our API is down, your app stays up
  • Real-time spend dashboard + full audit log
  • Free users at $5/day, pro users at $50/day — enforced automatically
Read the full docs
// middleware.ts
import { shield } from "@thskyshield/next";

export default shield({
  routes: ["/api/*", "/dashboard/*"],
  policy: "strict"
});
Built For AI Apps

If you're shipping an AI feature, install this first.

Every product that makes LLM API calls on behalf of users is exposed. Recognize yourself below.

💬
AI Chatbots
Stop users from looping your support bot or assistant into a $500/day habit.
🤖
AI Agents
Autonomous agents can spiral. Budget limits cap the blast radius when a task goes wrong.
✍️
AI Copilots
Per-user daily caps keep your free tier free — and your pro tier profitable.
🧱
AI SaaS
Enforce per-plan tiers. Free users at $5/day, pro at $50/day. Set once, enforced always.
⚙️
AI Workflows
Automated LLM pipelines can loop silently. Budget limits are your circuit breaker.

You need this if any of these are true:

You call OpenAI, Anthropic, or Gemini APIs
Users can trigger LLM calls freely
You have a free or trial tier
You're building an AI agent or workflow
You haven't set per-user cost limits yet
You'd rather prevent a $3k bill than explain it
Under the Hood

Built on the right stack.

Upstash Redis for sub-15ms atomic reservations. Supabase for permanent audit records. Vercel Edge for global coverage.

Global Traffic
Thskyshield Edge
Your SaaS App
Free to Start

Install it before you launch. Not after.

Sign up in 30 seconds. Add your first site, grab your API key, and your first governed call is live in under 10 minutes. No credit card. No waitlist.

Free
To start
10 min
To integrate
0
App rewrites


npm install @thsky-21/thskyshield

Your next billing cycle starts soon.

Every day without a spending limit is a day one bad request — or one bad actor — can drain your OpenAI credits. Takes 60 seconds to install.

Start Free — No Credit Card

Free · No credit card · Deploy in 60 seconds