Integration Guide
Add per-user budget enforcement to your LLM app in two calls. Use the SDK for Next.js — or the HTTP API for any language.
Quickstart — Next.js
Under 60 seconds. Two calls wrap your existing LLM logic. No architecture changes required.
Install the SDK
npm install @thsky-21/thskyshield
Add your credentials
THSKYSHIELD_SITE_ID=your_site_id_here
THSKYSHIELD_KEY=your_api_key_here
Wrap your LLM call
import OpenAI from 'openai'
import { Thskyshield } from '@thsky-21/thskyshield'

const openai = new OpenAI() // reads OPENAI_API_KEY from the environment

const shield = new Thskyshield({
  siteId: process.env.THSKYSHIELD_SITE_ID!,
  apiKey: process.env.THSKYSHIELD_KEY!,
})

export async function POST(req: Request) {
  const userId = /* your auth */

  // Phase A: budget gate before the LLM call
  const { allowed, reason, requestId } = await shield.check({
    externalUserId: userId,
    model: 'gpt-4o',
    estimatedTokens: { input: 500, output: 200 },
  })

  if (!allowed) {
    return Response.json({ error: 'Request blocked.', reason }, { status: 429 })
  }

  const completion = await openai.chat.completions.create({ ... })

  // Phase B: record actual usage; must be awaited
  await shield.log({
    requestId,
    externalUserId: userId,
    model: 'gpt-4o',
    tokens: {
      input: completion.usage?.prompt_tokens ?? 0,
      output: completion.usage?.completion_tokens ?? 0,
    },
  })

  return Response.json({ response: completion.choices[0].message.content })
}
Why must log() be awaited?
If log() is fire-and-forget, a fast attacker can send a second request before the cost of the first is written to Redis, bypassing your budget.
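The effect can be reproduced with a deliberately simplified, language-agnostic toy model. The sketch below is not Thskyshield's implementation: `ToyLedger`, `handle`, and `simulate` are hypothetical names, the ledger is an in-memory counter rather than Redis, and it ignores the reservation step that check() performs. It only illustrates how an unawaited write lets a second request pass a check it should have failed.

```python
import asyncio
from typing import List


class ToyLedger:
    """Hypothetical in-memory stand-in for a per-user spend counter."""

    def __init__(self, budget: float) -> None:
        self.budget = budget
        self.spent = 0.0

    def check(self) -> bool:
        # Allow the request only while recorded spend is under budget.
        return self.spent < self.budget

    async def log(self, cost: float) -> None:
        # Simulated write latency before the cost becomes visible.
        await asyncio.sleep(0.01)
        self.spent += cost


async def handle(ledger: ToyLedger, cost: float, await_log: bool,
                 pending: List[asyncio.Task]) -> bool:
    if not ledger.check():
        return False
    # ... the LLM call would happen here ...
    if await_log:
        await ledger.log(cost)  # cost is recorded before we return
    else:
        # Fire-and-forget: the write is scheduled but has not run yet.
        pending.append(asyncio.create_task(ledger.log(cost)))
    return True


async def simulate(await_log: bool) -> List[bool]:
    ledger = ToyLedger(budget=0.50)
    pending: List[asyncio.Task] = []
    # Two back-to-back requests, each costing more than the whole budget.
    results = [await handle(ledger, 0.60, await_log, pending) for _ in range(2)]
    await asyncio.gather(*pending)  # let any background writes finish
    return results


if __name__ == "__main__":
    print(asyncio.run(simulate(await_log=True)))   # [True, False]
    print(asyncio.run(simulate(await_log=False)))  # [True, True]
```

With the write awaited, the second request sees the recorded spend and is blocked. With fire-and-forget, both requests pass the check because the first cost has not landed yet.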
Any stack. Two HTTP calls.
Two POST endpoints expose the same governance engine to any language or framework.
Integration examples
Python (httpx)
import httpx, os

API_KEY = os.getenv("THSKYSHIELD_API_KEY")
SITE_ID = os.getenv("THSKYSHIELD_SITE_ID")
BASE_URL = "https://thskyshield.com/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

async def check_budget(user_id: str, model: str,
                       input_tokens: int, output_tokens: int):
    async with httpx.AsyncClient() as client:
        r = await client.post(f"{BASE_URL}/check", headers=HEADERS, json={
            "site_id": SITE_ID,
            "external_user_id": user_id,
            "model": model,
            "estimated_tokens": {"input": input_tokens, "output": output_tokens},
        })
        return r.json()

async def log_usage(request_id: str, model: str,
                    input_tokens: int, output_tokens: int):
    async with httpx.AsyncClient() as client:
        await client.post(f"{BASE_URL}/log", headers=HEADERS, json={
            "site_id": SITE_ID,
            "request_id": request_id,
            "model": model,
            "actual_tokens": {"input": input_tokens, "output": output_tokens},
            "status": "success",
        })
curl
# Step 1 — check before calling your LLM
curl -X POST https://thskyshield.com/api/v1/check \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"site_id": "YOUR_SITE_ID",
"external_user_id": "user_123",
"model": "gpt-4o",
"estimated_tokens": { "input": 800, "output": 300 }
}'
# Step 2 — log actual usage after your LLM responds
curl -X POST https://thskyshield.com/api/v1/log \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"site_id": "YOUR_SITE_ID",
"request_id": "req_a1b2c3d4e5f6",
"model": "gpt-4o",
"actual_tokens": { "input": 743, "output": 287 },
"status": "success"
}'
SDK Reference
shield.check({ externalUserId, model, estimatedTokens?, promptHash?, plan? })
Phase A
Pre-call budget gate. Atomically reserves the estimated cost in Redis so that parallel requests from the same user cannot race past the budget limit. Pass plan to enforce a plan-specific daily budget. Returns a requestId; pass it to log() to link the two phases. Returns allowed: false with a reason code if the call should be blocked.
shield.log({ requestId, externalUserId, model, tokens })
Phase B
Post-call reconciliation. Pass the requestId returned by check(); the SDK uses it to look up the in-flight reservation and the effective plan, so you do not need to pass plan again. Releases the cost reservation and applies the actual token cost atomically. Must be awaited, not fire-and-forget.
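For the HTTP API, the same two-phase flow can be sketched as one helper that takes the check, LLM, and log steps as injected coroutines. This is a minimal sketch under stated assumptions: `guarded_call` is a hypothetical name, the `check` and `log` parameters are expected to have the same signatures as the `check_budget` and `log_usage` functions in the Python example, and the response field names `allowed`, `reason`, and `request_id` are assumed from the SDK's return shape rather than confirmed by this guide.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Tuple

# Hypothetical callable shapes, mirroring check_budget / log_usage.
CheckFn = Callable[[str, str, int, int], Awaitable[Dict[str, Any]]]
LogFn = Callable[[str, str, int, int], Awaitable[None]]
# The LLM step returns (text, input_tokens_used, output_tokens_used).
LlmFn = Callable[[], Awaitable[Tuple[str, int, int]]]


async def guarded_call(user_id: str, model: str,
                       est_input: int, est_output: int,
                       check: CheckFn, call_llm: LlmFn,
                       log: LogFn) -> Dict[str, Any]:
    # Phase A: ask the budget gate before spending anything.
    decision = await check(user_id, model, est_input, est_output)
    if not decision.get("allowed"):
        # Blocked: the LLM is never called and nothing is logged.
        return {"blocked": True, "reason": decision.get("reason")}

    text, used_input, used_output = await call_llm()

    # Phase B: awaited, so the actual cost is recorded before we respond.
    # "request_id" is an assumed response field name.
    await log(decision["request_id"], model, used_input, used_output)
    return {"blocked": False, "text": text}
```

Injecting the three steps keeps the ordering logic testable without network access; in an application, `check` and `log` would be the real HTTP helpers.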
Reason Codes
When allowed: false is returned, the reason field tells you exactly why.
| Code | Meaning |
|---|---|
| BUDGET_EXCEEDED | User has hit their daily spend limit for their plan tier (or the site flat budget if no plan was passed). The LLM call was not made, so zero cost was incurred. |
| VELOCITY_EXCEEDED | More than 60 requests in 60 seconds from the same user. True sliding window per (siteId, userId). |
| PROMPT_REPEAT_DETECTED | Identical prompt hash seen more than 10 times in 60 seconds. Flagged as automation replay. |
| REQUEST_COST_EXCEEDED | Estimated cost of a single request exceeds the $0.25 per-request hard cap. Check your estimated_tokens values. |
| CIRCUIT_BREAKER_FALLBACK | Redis is degraded and the circuit breaker has opened. The request is allowed to avoid blocking customers during our infra outage. Recovers automatically after 30 seconds. |
| UNAUTHORIZED | API key verification failed. Deliberately opaque: the response does not reveal whether the key or site_id was the mismatch. |
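
One way to act on these codes is a small lookup that turns a decision into an HTTP status and a user-facing message. The sketch below is illustrative only: `REASON_RESPONSES` and `blocked_response` are hypothetical names, and the status codes and messages are assumptions chosen for this example (only the 429 for a blocked request appears in the quickstart above). It treats CIRCUIT_BREAKER_FALLBACK as non-blocking, since the table states the request is allowed in that case.

```python
from typing import Dict, Optional, Tuple

# Assumed mapping from reason code to (HTTP status, message for the end user).
REASON_RESPONSES: Dict[str, Tuple[int, str]] = {
    "BUDGET_EXCEEDED": (429, "Daily usage limit reached. Try again tomorrow."),
    "VELOCITY_EXCEEDED": (429, "Too many requests. Slow down and retry."),
    "PROMPT_REPEAT_DETECTED": (429, "Repeated prompt detected."),
    "REQUEST_COST_EXCEEDED": (413, "Request is too large."),
    # A failed key check is a server-side misconfiguration, not a user error.
    "UNAUTHORIZED": (500, "Service is misconfigured."),
}

DEFAULT_RESPONSE: Tuple[int, str] = (429, "Request blocked.")


def blocked_response(allowed: bool,
                     reason: Optional[str]) -> Optional[Tuple[int, str]]:
    """Return (status, message) for a blocked request, or None if allowed."""
    if allowed:
        # Covers CIRCUIT_BREAKER_FALLBACK: the request proceeds.
        return None
    if reason is None:
        return DEFAULT_RESPONSE
    return REASON_RESPONSES.get(reason, DEFAULT_RESPONSE)
```

Falling back to a generic response for unknown codes keeps the handler working if new reason codes are introduced later.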