API v1 · Stable

NexToken API Reference

NexToken provides a unified REST API that routes your LLM requests to the optimal provider — OpenAI, Anthropic, Google DeepMind, Meta, DeepSeek, and Mistral — automatically. Drop-in compatible with the OpenAI API format. No SDK changes required for most integrations.

💡
OpenAI-Compatible
If you already use the OpenAI Python or Node.js SDK, simply change the base_url to https://api.nextoken.biz/v1 and your api_key to your NexToken key. No other changes needed for basic usage.

Authentication

All requests must include your NexToken API key in the Authorization header using the Bearer scheme.

Authorization header HTTP
Authorization: Bearer nxt_sk_••••••••••••••••••••••••••••••••

API keys are prefixed nxt_sk_ for standard keys and nxt_sub_ for sub-keys. Manage your keys in the API Keys dashboard.

⚠️
Keep your keys secret
Never expose API keys in client-side code, public repositories, or log files. Use environment variables or a secrets manager. Rotate keys immediately if compromised.

Quickstart

Make your first request in under 60 seconds. The example below sends a chat completion request routed automatically to the best available provider.

quickstart.sh bash
curl https://api.nextoken.biz/v1/chat/completions \
  -H "Authorization: Bearer $NEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
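The same request can be sketched in Python using only the standard library, with no SDK. The helper name build_chat_request is illustrative, not part of any NexToken SDK; the request is built but not sent here.

```python
import json
import os
import urllib.request

# Illustrative helper (not a NexToken SDK function): builds the same
# chat-completion request shown in the curl example above.
def build_chat_request(api_key, model, messages):
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        "https://api.nextoken.biz/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    os.environ.get("NEX_API_KEY", "nxt_sk_example"),
    "gpt-4o",
    [{"role": "user", "content": "Hello, world!"}],
)
# urllib.request.urlopen(req) would send it.
```

Passing the key via an environment variable keeps it out of source code, as recommended in the key-safety note above.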

Base URL & Versioning

All API endpoints are served from the base URL below. The current stable version is v1.

Base URL
https://api.nextoken.biz/v1

Breaking changes will be introduced under a new version prefix (e.g. /v2). Minor additions are non-breaking and released without version bumps. Subscribe to the status page for API deprecation notices.

Chat Completions

Create a model response for a given chat conversation. Fully compatible with the OpenAI Chat Completions schema.

POST /v1/chat/completions

Request body

Parameter | Type | Required | Description
model | string | required | Model identifier, e.g. gpt-4o, claude-sonnet-4, gemini-2.5-pro. Use auto to let NexToken route to the optimal model.
messages | array | required | Array of message objects with role (system/user/assistant) and content.
stream | boolean | optional | If true, returns a Server-Sent Events stream. Default: false.
max_tokens | integer | optional | Maximum tokens in the response. Defaults to the model's maximum.
temperature | number | optional | Sampling temperature, 0–2. Higher values are more random. Default: 1.
top_p | number | optional | Nucleus sampling; alternative to temperature. Default: 1.
tools | array | optional | Array of tool definitions for function calling. Only supported on compatible models.
nex_routing | object | optional | NexToken routing hints. See Routing Hints below.
200 Success response
response.jsonjson
{
  "id": "chatcmpl-nxt-7f3a9c2b",
  "object": "chat.completion",
  "created": 1718000000,
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19 },
  "nex": {
    "provider": "openai",
    "latency_ms": 387,
    "cost_usd": 0.000052,
    "routing_score": 0.94
  }
}

The nex object in every response provides routing transparency: which provider served the request, total latency, exact cost charged, and the router's confidence score.
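As a minimal sketch, this routing metadata can be read like any other response field (sample values taken from the response above):

```python
import json

# Sample response trimmed to the routing-transparency fields discussed above.
raw = '{"model": "gpt-4o", "nex": {"provider": "openai", "latency_ms": 387, "cost_usd": 0.000052, "routing_score": 0.94}}'
nex = json.loads(raw)["nex"]
print(f'served by {nex["provider"]} in {nex["latency_ms"]} ms (${nex["cost_usd"]:.6f})')
# served by openai in 387 ms ($0.000052)
```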

Streaming (SSE)

Set stream: true to receive a Server-Sent Events stream. Each event contains a delta with partial content. The stream terminates with a data: [DONE] sentinel.

stream chunkjson
data: {"id": "chatcmpl-nxt-7f3a9c2b", "object": "chat.completion.chunk", "choices": [{"delta": {"content": "Hello"}, "finish_reason": null}]}

data: [DONE]
ℹ️
Token usage for streaming responses is reported in the final chunk's usage field (where supported by the provider). If unavailable, NexToken estimates tokens using tiktoken and logs a warning in your request detail view.
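A sketch of consuming such a stream, assuming the raw SSE lines have already been read from the response (the collect_stream helper is illustrative, not part of any SDK):

```python
import json

# Assemble streamed content from Server-Sent Events lines.
# Each "data:" line carries a chunk; the stream ends at the [DONE] sentinel.
def collect_stream(lines):
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

sample = [
    'data: {"choices": [{"delta": {"content": "Hello"}, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {"content": ", world"}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # Hello, world
```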

Routing Hints (nex_routing)

Pass the optional nex_routing object to influence how NexToken routes your request.

Field | Type | Description
strategy | string | "cost" · "latency" · "quality" · "balanced" (default)
providers | array | Allowlist of provider names, e.g. ["openai","anthropic"] pins routing to those two.
exclude_providers | array | Denylist of providers to never route to for this request.
fallback | boolean | If true (default), automatically retry with the next-best provider on failure.
max_fallback_attempts | integer | Maximum fallback retries. Default: 2.
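Putting the hints together, a hypothetical request body that pins routing to two providers and optimizes for latency might look like this (message content is illustrative):

```python
import json

# Hypothetical request body using the nex_routing hints described above.
payload = {
    "model": "auto",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "nex_routing": {
        "strategy": "latency",
        "providers": ["openai", "anthropic"],
        "fallback": True,
        "max_fallback_attempts": 1,
    },
}
print(json.dumps(payload, indent=2))
```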

List Models

Returns all models available for routing through your NexToken account.

GET /v1/models
200 Response
models.jsonjson
{
  "data": [
    { "id": "gpt-4o", "provider": "openai", "context_window": 128000, "streaming": true },
    { "id": "claude-sonnet-4", "provider": "anthropic", "context_window": 200000, "streaming": true },
    { "id": "gemini-2.5-pro", "provider": "google", "context_window": 1000000, "streaming": true }
  ]
}

List Providers

Returns real-time health and availability status for all connected providers.

GET /v1/providers

List API Keys

GET /v1/keys

Returns all API keys in your account. Key secrets are never returned after creation — only masked prefixes.

Create API Key

POST /v1/keys
Parameter | Type | Required | Description
name | string | required | Human-readable label for this key.
budget_usd | number | optional | Monthly spending cap in USD. Requests return 403 budget_exceeded when the cap is reached.
rpm_limit | integer | optional | Per-key RPM ceiling. Inherits the account limit if omitted.
allowed_models | array | optional | Allowlist of model IDs. All models are allowed if omitted.
expires_at | string | optional | ISO 8601 expiry timestamp. The key auto-revokes at this time.
⚠️
The full key secret (nxt_sk_…) is returned only once at creation. Store it securely — it cannot be retrieved again.
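A hypothetical creation request body using the parameters above (all values are illustrative):

```python
import json

# Hypothetical body for POST /v1/keys; every value here is an example.
body = {
    "name": "staging-backend",
    "budget_usd": 50,
    "rpm_limit": 200,
    "allowed_models": ["gpt-4o", "claude-sonnet-4"],
    "expires_at": "2026-01-01T00:00:00Z",
}
print(json.dumps(body, indent=2))
```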

Revoke API Key

DELETE /v1/keys/{key_id}

Immediately revokes the key. In-flight streaming requests are given up to 120 seconds to finish. New requests with this key return 401 Unauthorized immediately.

Wallet Balance

GET /v1/wallet/balance
balance.jsonjson
{
  "balance_usd": 48.32,
  "currency": "USD",
  "loyalty_tier": "silver",
  "billing_tier": "pro",
  "spend_this_month_usd": 201.68
}

Top Up Wallet

POST /v1/wallet/topup

Initiates a top-up via Stripe. Returns a checkout_url to redirect the user for payment. Programmatic top-up (saved card) is available for Business and Enterprise plans.

Usage Summary

GET /v1/usage/summary
Query Param | Type | Description
from | string | ISO 8601 start date. Default: start of current month.
to | string | ISO 8601 end date. Default: now.
group_by | string | day · model · provider · key
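A small sketch of building the query string for this endpoint with Python's standard library (dates and grouping are illustrative):

```python
from urllib.parse import urlencode

# Example usage-summary query for June 2025, grouped by provider.
params = {"from": "2025-06-01", "to": "2025-06-30", "group_by": "provider"}
url = "https://api.nextoken.biz/v1/usage/summary?" + urlencode(params)
print(url)
```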

Request Logs

GET /v1/usage/logs

Returns paginated request logs. Retention period depends on your plan: 7 days (Developer), 30 days (Pro), 90 days (Business), 365 days (Enterprise + Extended Audit Logs add-on).

Error Codes

NexToken uses standard HTTP status codes. All error responses include a JSON body with error.code and error.message.

400 bad_request
Malformed request body or invalid parameters.
401 unauthorized
Missing or invalid API key. Key may be revoked.
402 wallet_empty
Wallet balance is zero. Top up to resume requests.
403 budget_exceeded
Per-key monthly budget cap reached.
404 not_found
Resource (key, log entry, etc.) not found.
429 rate_limited
RPM limit exceeded. Check Retry-After header.
502 provider_error
Upstream provider returned an error. Fallback attempted.
503 no_provider
No healthy provider available for this model.

Rate Limits

Rate limits are enforced per API key using a sliding-window algorithm. Every response includes headers reporting the current limit, remaining capacity, and window reset time.

Rate limit headersHTTP
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 947
X-RateLimit-Reset: 1718000060
Retry-After: 13          (only on 429 responses)
Plan | RPM | Daily Requests | Concurrent Streams
Developer | 100 | 10,000 | 5
Pro | 1,000 | 100,000 | 25
Business | 10,000 | 1,000,000 | 100
Enterprise | Custom | Custom | Custom
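A sketch of turning these headers into a wait time before retrying (wait_seconds is an illustrative helper; the headers dict mimics a response whose window is exhausted):

```python
# Decide how long to wait based on the rate-limit headers described above.
def wait_seconds(headers, now):
    if "Retry-After" in headers:          # present only on 429 responses
        return int(headers["Retry-After"])
    if int(headers.get("X-RateLimit-Remaining", 1)) == 0:
        return max(0, int(headers["X-RateLimit-Reset"]) - now)
    return 0

headers = {"X-RateLimit-Limit": "1000", "X-RateLimit-Remaining": "0",
           "X-RateLimit-Reset": "1718000060"}
print(wait_seconds(headers, now=1718000047))  # 13
```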

SDKs & Libraries

NexToken is compatible with any OpenAI-compatible SDK. Simply point base_url at https://api.nextoken.biz/v1.

A native NexToken SDK with routing-specific features (provider pinning, cost callbacks, routing telemetry) is on the roadmap for Q3 2025.

Changelog

v1.2.0 — June 2025

v1.1.0 — April 2025

v1.0.0 — January 2025