NexToken API Reference
NexToken provides a unified REST API that routes your LLM requests to the optimal provider — OpenAI, Anthropic, Google DeepMind, Meta, DeepSeek, and Mistral — automatically. Drop-in compatible with the OpenAI API format. No SDK changes required for most integrations.
base_url to https://api.nextoken.biz/v1 and your api_key to your NexToken key. No other changes needed for basic usage.
Authentication
All requests must include your NexToken API key in the Authorization header using the Bearer scheme.
API keys are prefixed nxt_sk_ for standard keys and nxt_sub_ for sub-keys. Manage your keys in the API Keys dashboard.
Quickstart
Make your first request in under 60 seconds. The example below sends a chat completion request routed automatically to the best available provider.
Base URL & Versioning
All API endpoints are served from the base URL below. The current stable version is v1.
Breaking changes will be introduced under a new version prefix (e.g. /v2). Minor additions are non-breaking and released without version bumps. Subscribe to the status page for API deprecation notices.
Chat Completions
Create a model response for a given chat conversation. Fully compatible with the OpenAI Chat Completions schema.
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | required | Model identifier e.g. gpt-4o, claude-sonnet-4, gemini-2.5-pro. Use auto to let NexToken route to the optimal model. |
| messages | array | required | Array of message objects with role (system/user/assistant) and content. |
| stream | boolean | optional | If true, returns a Server-Sent Events stream. Default: false. |
| max_tokens | integer | optional | Maximum tokens in the response. Defaults to model maximum. |
| temperature | number | optional | Sampling temperature 0–2. Higher = more random. Default: 1. |
| top_p | number | optional | Nucleus sampling. Alterative to temperature. Default: 1. |
| tools | array | optional | Array of tool definitions for function calling. Only supported on compatible models. |
| nex_routing | object | optional | NexToken routing hints. See Routing Hints below. |
The nex object in every response provides routing transparency: which provider served the request, total latency, exact cost charged, and the router's confidence score.
Streaming (SSE)
Set stream: true to receive a Server-Sent Events stream. Each event contains a delta with partial content. The stream terminates with a data: [DONE] sentinel.
usage field (where supported by the provider). If unavailable, NexToken estimates tokens using tiktoken and logs a warning in your request detail view.Routing Hints (nex_routing)
Pass the optional nex_routing object to influence how NexToken routes your request.
| Field | Type | Description |
|---|---|---|
| strategy | string | "cost" · "latency" · "quality" · "balanced" (default) |
| providers | array | Allowlist of provider names. E.g. ["openai","anthropic"] pins routing to those two. |
| exclude_providers | array | Denylist of providers to never route to for this request. |
| fallback | boolean | If true (default), automatically retry with next-best provider on failure. |
| max_fallback_attempts | integer | Maximum fallback retries. Default: 2. |
List Models
Returns all models available for routing through your NexToken account.
List Providers
Returns real-time health and availability status for all connected providers.
List API Keys
Returns all API keys in your account. Key secrets are never returned after creation — only masked prefixes.
Create API Key
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | required | Human-readable label for this key. |
| budget_usd | number | optional | Monthly spending cap in USD. Requests return 402 when exceeded. |
| rpm_limit | integer | optional | Per-key RPM ceiling. Inherits account limit if omitted. |
| allowed_models | array | optional | Allowlist of model IDs. All models allowed if omitted. |
| expires_at | string | optional | ISO 8601 expiry timestamp. Key auto-revokes at this time. |
nxt_sk_…) is returned only once at creation. Store it securely — it cannot be retrieved again.Revoke API Key
Immediately revokes the key. In-flight streaming requests complete within 120 seconds. New requests with this key return 401 Unauthorized immediately.
Wallet Balance
Top Up Wallet
Initiates a top-up via Stripe. Returns a checkout_url to redirect the user for payment. Programmatic top-up (saved card) is available for Business and Enterprise plans.
Usage Summary
| Query Param | Type | Description |
|---|---|---|
| from | string | ISO 8601 start date. Default: start of current month. |
| to | string | ISO 8601 end date. Default: now. |
| group_by | string | day · model · provider · key |
Request Logs
Returns paginated request logs. Retention period depends on your plan: 7 days (Developer), 30 days (Pro), 90 days (Business), 365 days (Enterprise + Extended Audit Logs add-on).
Error Codes
NexToken uses standard HTTP status codes. All error responses include a JSON body with error.code and error.message.
Retry-After header.Rate Limits
Rate limits are enforced per API key using a sliding window algorithm. The current window and remaining capacity are returned in every response header.
| Plan | RPM | Daily Requests | Concurrent Streams |
|---|---|---|---|
| Developer | 100 | 10,000 | 5 |
| Pro | 1,000 | 100,000 | 25 |
| Business | 10,000 | 1,000,000 | 100 |
| Enterprise | Custom | Custom | Custom |
SDKs & Libraries
NexToken is compatible with any OpenAI-compatible SDK. Simply point base_url at https://api.nextoken.biz/v1.
- Python —
pip install openai(official OpenAI SDK, setbase_url) - Node.js / TypeScript —
npm install openai - Go —
github.com/sashabaranov/go-openai - Rust —
async-openaicrate - LangChain — Use
ChatOpenAIwith customopenai_api_base - LlamaIndex — Use
OpenAI(api_base=...)
A native NexToken SDK with routing-specific features (provider pinning, cost callbacks, routing telemetry) is on the roadmap for Q3 2025.
Changelog
v1.2.0 — June 2025
- Added
nex_routing.strategyfield for per-request routing hints - Added
nex.routing_scoreto response metadata - Fixed: streaming token count now uses provider
usagefield where available - Fixed:
budget:zeroRedis flag now permanent (no TTL) — eliminates 1-hour grace period
v1.1.0 — April 2025
- Added DeepSeek V3 and Mistral Large 2 support
- Added sub-key management endpoints
- Added
X-RateLimit-*response headers - Improved GST invoice generation — tax base now calculated on post-discount amount
v1.0.0 — January 2025
- Initial stable release
- OpenAI-compatible Chat Completions endpoint
- Provider routing: OpenAI, Anthropic, Google, Meta (Llama)
- Wallet top-up via Stripe