LLM App · Documentation

Integration Guide

Add per-user budget enforcement to your LLM app in two calls. Use the SDK for Next.js — or the HTTP API for any language.

Next.js SDK

Quickstart — Next.js

Under 60 seconds. Two calls wrap your existing LLM logic. No architecture changes required.

1. Install the SDK

terminal
npm install @thsky-21/thskyshield

2. Add your credentials

.env.local
THSKYSHIELD_SITE_ID=your_site_id_here
THSKYSHIELD_KEY=your_api_key_here

3. Wrap your LLM call

app/api/chat/route.ts
import { Thskyshield } from '@thsky-21/thskyshield'

const shield = new Thskyshield({
  siteId: process.env.THSKYSHIELD_SITE_ID!,
  apiKey: process.env.THSKYSHIELD_KEY!,
})

export async function POST(req: Request) {
  const userId = 'user_123' // placeholder: replace with the user ID from your auth layer

  const { allowed, reason, requestId } = await shield.check({
    externalUserId:  userId,
    model:           'gpt-4o',
    estimatedTokens: { input: 500, output: 200 },
  })

  if (!allowed) {
    return Response.json({ error: 'Request blocked.', reason }, { status: 429 })
  }

  // `openai` is your existing OpenAI client; `{ ... }` stands for your usual request body.
  const completion = await openai.chat.completions.create({ ... })

  await shield.log({
    requestId,
    externalUserId: userId,
    model:          'gpt-4o',
    tokens: {
      input:  completion.usage?.prompt_tokens ?? 0,
      output: completion.usage?.completion_tokens ?? 0,
    },
  })

  return Response.json({ response: completion.choices[0].message.content })
}

Why must log() be awaited?

If log() is fire-and-forget, a fast attacker can send a second request before the first cost is written to Redis — bypassing your budget.
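
The timing problem can be reproduced with a toy model. The sketch below is not the service's logic: ToyLedger, BUDGET, and COST_PER_CALL are hypothetical names, the ledger lives in memory, and the 10 ms sleep stands in for the latency of the real write. It only illustrates why a second request can pass the check when the first log() has not finished.

```python
import asyncio

BUDGET = 1.00          # hypothetical daily budget in dollars
COST_PER_CALL = 0.75   # hypothetical cost of one LLM call


class ToyLedger:
    """In-memory stand-in for the spend store; writes are deliberately slow."""

    def __init__(self):
        self.spend = 0.0

    def check(self, estimate: float) -> bool:
        return self.spend + estimate <= BUDGET

    async def log(self, cost: float) -> None:
        await asyncio.sleep(0.01)  # simulated write latency
        self.spend += cost


async def handle_request(ledger: ToyLedger, await_log: bool, pending: list) -> bool:
    if not ledger.check(COST_PER_CALL):
        return False
    # ... the LLM call would happen here ...
    write = ledger.log(COST_PER_CALL)
    if await_log:
        await write                                  # cost is recorded before we return
    else:
        pending.append(asyncio.create_task(write))   # fire-and-forget
    return True


async def two_back_to_back_requests(await_log: bool) -> list:
    ledger, pending = ToyLedger(), []
    results = [await handle_request(ledger, await_log, pending) for _ in range(2)]
    await asyncio.gather(*pending)  # let any background writes finish
    return results


def run_demo() -> dict:
    return {
        "awaited": asyncio.run(two_back_to_back_requests(True)),
        "fire_and_forget": asyncio.run(two_back_to_back_requests(False)),
    }
```

With the awaited write, the second request sees the recorded 0.75 and is blocked. With the background write, both requests are allowed and the toy budget is overspent.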

HTTP API — any language

Any stack. Two HTTP calls.

Two POST endpoints expose the same governance engine to any language or framework.

POST /api/v1/check
POST /api/v1/log

Integration examples

Python (httpx)

governance.py
import httpx, os

API_KEY  = os.getenv("THSKYSHIELD_KEY")  # same variable name as in the .env.local example
SITE_ID  = os.getenv("THSKYSHIELD_SITE_ID")
BASE_URL = "https://thskyshield.com/api/v1"
HEADERS  = {"Authorization": f"Bearer {API_KEY}"}

async def check_budget(user_id: str, model: str,
                        input_tokens: int, output_tokens: int):
    async with httpx.AsyncClient() as client:
        r = await client.post(f"{BASE_URL}/check", headers=HEADERS, json={
            "site_id":          SITE_ID,
            "external_user_id": user_id,
            "model":            model,
            "estimated_tokens": {"input": input_tokens, "output": output_tokens},
        })
        return r.json()

async def log_usage(request_id: str, model: str,
                    input_tokens: int, output_tokens: int):
    async with httpx.AsyncClient() as client:
        r = await client.post(f"{BASE_URL}/log", headers=HEADERS, json={
            "site_id":        SITE_ID,
            "request_id":     request_id,
            "model":          model,
            "actual_tokens":  {"input": input_tokens, "output": output_tokens},
            "status":         "success",
        })
        # Surface HTTP errors instead of silently dropping the usage record.
        r.raise_for_status()
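
One way to combine the two helpers around an LLM call is sketched below. It rests on assumptions: the /check response is taken to contain the keys allowed, reason, and request_id (the guide does not show the HTTP response body, so these names are guesses based on the SDK fields), and call_llm is a hypothetical coroutine returning text plus token counts. The fake_* functions are stand-ins so the sketch runs without network access.

```python
import asyncio


async def guarded_completion(user_id, model, check, log, call_llm,
                             est_input=500, est_output=200):
    """Run call_llm() only if check() allows it, then await log()."""
    decision = await check(user_id, model, est_input, est_output)
    if not decision.get("allowed"):
        # Blocked: the LLM is never called, so there is nothing to log.
        return {"blocked": True, "reason": decision.get("reason")}

    result = await call_llm()  # hypothetical: returns text plus token counts
    # Awaited on purpose, so the cost is recorded before the response goes out.
    await log(decision["request_id"], model,
              result["input_tokens"], result["output_tokens"])
    return {"blocked": False, "text": result["text"]}


# Stand-ins with the same call shapes as check_budget() and log_usage().
async def fake_check(user_id, model, input_tokens, output_tokens):
    if user_id == "over_budget":
        return {"allowed": False, "reason": "BUDGET_EXCEEDED"}
    return {"allowed": True, "request_id": "req_demo"}


logged = []


async def fake_log(request_id, model, input_tokens, output_tokens):
    logged.append((request_id, model, input_tokens, output_tokens))


async def fake_llm():
    return {"text": "hello", "input_tokens": 12, "output_tokens": 3}


def demo():
    ok = asyncio.run(
        guarded_completion("user_123", "gpt-4o", fake_check, fake_log, fake_llm))
    blocked = asyncio.run(
        guarded_completion("over_budget", "gpt-4o", fake_check, fake_log, fake_llm))
    return ok, blocked
```

In a real integration, check_budget and log_usage from governance.py would be passed in place of fake_check and fake_log, once the actual response keys have been confirmed.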

curl

terminal
# Step 1 — check before calling your LLM
curl -X POST https://thskyshield.com/api/v1/check \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "site_id":          "YOUR_SITE_ID",
    "external_user_id": "user_123",
    "model":            "gpt-4o",
    "estimated_tokens": { "input": 800, "output": 300 }
  }'

# Step 2 — log actual usage after your LLM responds
curl -X POST https://thskyshield.com/api/v1/log \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "site_id":       "YOUR_SITE_ID",
    "request_id":    "req_a1b2c3d4e5f6",
    "model":         "gpt-4o",
    "actual_tokens": { "input": 743, "output": 287 },
    "status":        "success"
  }'

SDK Reference

shield.check({ externalUserId, model, estimatedTokens?, promptHash?, plan? }) (Phase A)
Returns: Promise<{ allowed: boolean, requestId: string, reason?: string, reserved?: string, currentSpend?: string, limit?: number, plan?: string }>

Pre-call budget gate. Atomically reserves the estimated cost in Redis so that parallel requests from the same user cannot all slip past the budget limit at once. Pass plan to enforce a plan-specific daily budget. Returns a requestId; pass it to log() to link the two phases. Returns allowed: false with a reason code if the call should be blocked.

shield.log({ requestId, externalUserId, model, tokens }) (Phase B)
Returns: Promise<{ success: boolean, cost?: string, plan?: string }>

Post-call reconciliation. Pass the requestId returned by check() — the SDK uses it to look up the in-flight reservation and the effective plan, so you do not need to pass plan again. Releases the cost reservation and applies the actual token cost atomically. Must be awaited — not fire-and-forget.
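
The reserve-then-reconcile idea behind the two phases can be illustrated in memory. This is a minimal sketch under stated assumptions, not the service's implementation: ReservationLedger is a hypothetical class, a thread lock stands in for Redis atomicity, and costs are plain floats in dollars.

```python
import threading
import uuid


class ReservationLedger:
    """Toy two-phase budget ledger: reserve on check, reconcile on log."""

    def __init__(self, limit: float):
        self.limit = limit
        self.spend = 0.0
        self.reserved = {}              # request_id -> reserved estimate
        self._lock = threading.Lock()   # stand-in for an atomic Redis operation

    def check(self, estimated_cost: float) -> dict:
        with self._lock:
            # Count in-flight reservations so parallel requests see each other.
            committed = self.spend + sum(self.reserved.values())
            if committed + estimated_cost > self.limit:
                return {"allowed": False, "reason": "BUDGET_EXCEEDED"}
            request_id = f"req_{uuid.uuid4().hex[:12]}"
            self.reserved[request_id] = estimated_cost
            return {"allowed": True, "request_id": request_id}

    def log(self, request_id: str, actual_cost: float) -> dict:
        with self._lock:
            # Release the reservation and apply the actual cost in one step.
            if self.reserved.pop(request_id, None) is None:
                return {"success": False}
            self.spend += actual_cost
            return {"success": True, "cost": f"{actual_cost:.6f}"}
```

With a limit of 1.00, a first check for 0.60 is allowed and a second concurrent check for 0.60 is blocked, because the first reservation is still in flight. Once the first request is logged at an actual cost of 0.30, a new 0.60 request fits again.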

Reason Codes

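When allowed: false is returned, the reason field tells you exactly why. CIRCUIT_BREAKER_FALLBACK is the exception: per the table below, it marks a request that was allowed while Redis was degraded.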

| Code | Meaning |
| --- | --- |
| BUDGET_EXCEEDED | User has hit their daily spend limit for their plan tier (or the site flat budget if no plan was passed). The LLM call was not made, so zero cost was incurred. |
| VELOCITY_EXCEEDED | More than 60 requests in 60 seconds from the same user. True sliding window per (siteId, userId). |
| PROMPT_REPEAT_DETECTED | Identical prompt hash seen more than 10 times in 60 seconds. Flagged as automation replay. |
| REQUEST_COST_EXCEEDED | Estimated cost of a single request exceeds the $0.25 per-request hard cap. Check your estimated_tokens values. |
| CIRCUIT_BREAKER_FALLBACK | Redis is degraded and the circuit breaker has opened. The request is allowed to avoid blocking customers during our infra outage. Recovers automatically after 30 seconds. |
| UNAUTHORIZED | API key verification failed. Deliberately opaque: the response does not reveal whether the key or site_id was the mismatch. |
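
A client might translate these codes into responses for its own users along the following lines. Everything here is an illustrative choice rather than documented behavior: REASON_POLICY and to_http_response are hypothetical names, the HTTP statuses and Retry-After values are one possible policy, and the decision dict is assumed to carry allowed and reason keys.

```python
# Hypothetical policy: reason code -> (HTTP status, Retry-After seconds or None).
REASON_POLICY = {
    "BUDGET_EXCEEDED":        (429, None),  # daily limit; retrying soon will not help
    "VELOCITY_EXCEEDED":      (429, 60),    # 60-second window, so retry after it passes
    "PROMPT_REPEAT_DETECTED": (429, 60),
    "REQUEST_COST_EXCEEDED":  (413, None),  # the request itself is too expensive
    "UNAUTHORIZED":           (500, None),  # integrator misconfiguration, not the end user's fault
}


def to_http_response(decision: dict) -> tuple:
    """Return (status, headers, body) for a check() decision."""
    if decision.get("allowed"):
        # An allowed request may still carry CIRCUIT_BREAKER_FALLBACK; expose it for monitoring.
        degraded = decision.get("reason") == "CIRCUIT_BREAKER_FALLBACK"
        return 200, {}, {"degraded": degraded}

    reason = decision.get("reason", "UNKNOWN")
    status, retry_after = REASON_POLICY.get(reason, (429, None))
    headers = {"Retry-After": str(retry_after)} if retry_after is not None else {}
    return status, headers, {"error": "Request blocked.", "reason": reason}
```

Keeping the mapping in one table makes it easy to adjust per product, for example to show an upgrade prompt on BUDGET_EXCEEDED instead of a bare 429.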

Need help with setup?

Reach out and we'll get you unblocked within 24 hours.