API v1 · Stable

NexToken API Reference

NexToken provides a unified REST API that routes your LLM requests to the optimal provider — OpenAI, Anthropic, Google DeepMind, Meta, DeepSeek, and Mistral — automatically. Drop-in compatible with the OpenAI API format. No SDK changes required for most integrations.

💡
OpenAI-Compatible
If you already use the OpenAI Python or Node.js SDK, simply change the base_url to https://api.nextoken.biz/v1 and your api_key to your NexToken key. No other changes needed for basic usage.

Authentication

All requests must include your NexToken API key in the Authorization header using the Bearer scheme.

Authorization header HTTP
Authorization: Bearer nxt_sk_••••••••••••••••••••••••••••••••

API keys are prefixed nxt_sk_ for standard keys and nxt_sub_ for sub-keys. Manage your keys in the API Keys dashboard.
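Building the header in one place means every call shares it. The sketch below makes two assumptions: the key lives in the NEX_API_KEY environment variable, as in the quickstart, and only the two documented prefixes are valid. `auth_headers` is a hypothetical helper, not part of any NexToken SDK.

```python
import os
from typing import Dict, Optional

# The two documented prefixes: standard keys and sub-keys.
KEY_PREFIXES = ("nxt_sk_", "nxt_sub_")


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers, reading the key from NEX_API_KEY by default."""
    key = api_key if api_key is not None else os.environ.get("NEX_API_KEY", "")
    if not key.startswith(KEY_PREFIXES):
        # Fail fast locally instead of waiting for a 401 from the API.
        raise ValueError("expected a key starting with nxt_sk_ or nxt_sub_")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Rejecting a malformed key locally saves a round trip that would only end in 401 unauthorized.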

⚠️
Keep your keys secret
Never expose API keys in client-side code, public repositories, or log files. Use environment variables or a secrets manager. Rotate keys immediately if compromised.

Quickstart

Make your first request in under 60 seconds. The example below sends a chat completion request routed automatically to the best available provider.

quickstart.sh bash
curl https://api.nextoken.biz/v1/chat/completions \
  -H "Authorization: Bearer $NEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nex-pro",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
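You can make the same request from Python using only the standard library. This sketch builds the request separately from sending it, so you can inspect or unit-test the payload without network access. `build_chat_request` and `send` are illustrative names, and the 30-second timeout is an arbitrary choice.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.nextoken.biz/v1"


def build_chat_request(prompt: str, model: str = "nex-pro") -> urllib.request.Request:
    """Build (but do not send) the same POST as the curl quickstart."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('NEX_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Once NEX_API_KEY is set, `send(build_chat_request("Hello, world!"))["choices"][0]["message"]["content"]` returns the reply text.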

Base URL & Versioning

All API endpoints are served from the base URL below. The current stable version is v1.

Base URL
https://api.nextoken.biz/v1

Breaking changes will be introduced under a new version prefix (e.g. /v2). Minor additions are non-breaking and released without version bumps. Subscribe to the status page for API deprecation notices.

⭐ NexToken Native Models

NexToken's proprietary models — built for cost-efficiency and Asia-Pacific compliance.

A single stable API across providers: we handle the infrastructure so you can focus on building. The underlying inference architecture is proprietary.

Model | Context | Price ($/1M tokens, in / out) | Best for
nex-pro ★ Default | 32k | $0.10 / $0.40 | Default choice for chat, code, content, and summarisation. Self-hosted Singapore GPU. Strong Chinese + English. Lowest latency in APAC. ~95% cheaper than GPT-4o.
nex-auto (smart routing) | Depends on target | $0.30 / $1.20 | The router picks nex-pro or nex-reasoning per request. The actual target is reported in nex.smart_router.
nex-reasoning | 128k | $1.20 / $4.80 | Multi-step math, logic, structured analysis. No tool calling. ~90% cheaper than o1.
nex-embed-zh | 512 | $0.01 (input only) | Chinese-strong embeddings, 1024-dim. ~50% cheaper than text-embedding-3-small.

Legacy IDs. nex-smart and nex-coder still work — both are transparent aliases of nex-pro. No code changes required if you're already using them.
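The price column converts to a per-request estimate: tokens ÷ 1,000,000 × price, summed over input and output. This is a minimal sketch that assumes list prices exactly as in the table. `estimate_cost`, `PRICES`, and `ALIASES` are illustrative names, not part of any NexToken SDK. The amount actually charged is whatever nex.cost_usd reports.

```python
from decimal import Decimal

# Per-1M-token list prices (USD) copied from the table above: (input, output).
PRICES = {
    "nex-pro": (Decimal("0.10"), Decimal("0.40")),
    "nex-auto": (Decimal("0.30"), Decimal("1.20")),
    "nex-reasoning": (Decimal("1.20"), Decimal("4.80")),
    "nex-embed-zh": (Decimal("0.01"), Decimal("0")),  # embeddings have no output tokens
}
# Legacy IDs are transparent aliases of nex-pro.
ALIASES = {"nex-smart": "nex-pro", "nex-coder": "nex-pro"}

MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int = 0) -> Decimal:
    """List-price estimate in USD; Decimal avoids float rounding on tiny amounts."""
    price_in, price_out = PRICES[ALIASES.get(model, model)]
    return (input_tokens * price_in + output_tokens * price_out) / MILLION
```

For example, 1,000 input and 500 output tokens on nex-pro come to 0.0001 + 0.0002 = 0.0003 USD.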

Quick example: nex-pro

nex_pro_demo.py python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NEX_API_KEY"],  # your nxt_sk_ key
    base_url="https://api.nextoken.biz/v1",
)

# nex-pro: Singapore-hosted Qwen2.5-7B, 32K context, ~95% cheaper than GPT-4o
response = client.chat.completions.create(
    model="nex-pro",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in 3 sentences."},
    ],
)

print(response.choices[0].message.content)
# The SDK keeps the non-standard nex envelope as a plain dict.
print(f"Cost: ${response.model_dump()['nex']['cost_usd']}")

# Example output (illustrative):
# Quantum computing uses qubits that can exist in superposition...
# Cost: $0.000054

When to use which

All NexToken Native chat models support streaming. All of them except nex-reasoning also support tool calling (see the table above). Responses include a nex.provider field set to "nex". For a detailed pricing comparison, see the pricing page.

Chat Completions

Create a model response for a given chat conversation. Fully compatible with the OpenAI Chat Completions schema.

POST /v1/chat/completions

Request body

Parameter | Type | Required | Description
model | string | required | Model identifier. Recommended: nex-pro (self-hosted Singapore GPU, 32K context, ~95% cheaper than GPT-4o). Other Nex models: nex-reasoning, nex-auto. Or pass a vendor model directly: gpt-4o, claude-sonnet-4-6, gemini-2.5-pro, deepseek-v3.
messages | array | required | Array of message objects, each with a role (system, user, or assistant) and content.
stream | boolean | optional | If true, returns a Server-Sent Events stream. Default: false.
max_tokens | integer | optional | Maximum tokens in the response. Defaults to the model maximum.
temperature | number | optional | Sampling temperature, 0–2. Higher values give more random output. Default: 1.
top_p | number | optional | Nucleus sampling; an alternative to temperature. Default: 1.
tools | array | optional | Array of tool definitions for function calling. Only supported on compatible models.
nex_routing | object | optional | NexToken routing hints. See Routing Hints below.
200 Success response
response.json json
{
  "id": "chatcmpl-nxt-7f3a9c2b",
  "object": "chat.completion",
  "created": 1718000000,
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19 },
  "nex": { "provider": "openai", "latency_ms": 387, "cost_usd": 0.000052, "routing_score": 0.94 }
}

The nex object in every response provides routing transparency: which provider served the request, total latency, exact cost charged, and the router's confidence score.

Streaming (SSE)

Set stream: true to receive a Server-Sent Events stream. Each event contains a delta with partial content. The stream terminates with a data: [DONE] sentinel.

stream chunk json
data: { "id": "chatcmpl-nxt-7f3a9c2b", "object": "chat.completion.chunk", "choices": [{ "delta": { "content": "Hello" }, "finish_reason": null }] }

data: [DONE]
ℹ️
Token usage for streaming responses is reported in the final chunk's usage field (where supported by the provider). If unavailable, NexToken estimates tokens using tiktoken and logs a warning in your request detail view.
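Reading the stream takes little more than picking out the data: lines and stopping at the sentinel. This is a minimal sketch that assumes each event's JSON fits on a single data: line, as in the chunk above, and that lines are already decoded to str. `iter_deltas` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, or other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream; ignore anything after the sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content
```

The loop skips chunks without choices, such as a trailing usage-only chunk, instead of treating them as errors. With urllib, iterate over the response object and decode each line from bytes before passing it in.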

Routing Hints (nex_routing)

Pass the optional nex_routing object to influence how NexToken routes your request.

Field | Type | Description
strategy | string | "cost" · "latency" · "quality" · "balanced" (default)
providers | array | Allowlist of provider names. E.g. ["openai","anthropic"] restricts routing to those two.
exclude_providers | array | Denylist of providers that are never used for this request.
fallback | boolean | If true (default), automatically retries with the next-best provider on failure.
max_fallback_attempts | integer | Maximum number of fallback retries. Default: 2.
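The OpenAI SDK has no named parameter for nex_routing, but you can send it through the SDK's extra_body argument. The sketch below only builds and validates the JSON body. `chat_payload` is a hypothetical helper. Rejecting a provider that appears in both lists is a client-side assumption, not documented server behaviour.

```python
from typing import List, Optional

STRATEGIES = {"cost", "latency", "quality", "balanced"}


def chat_payload(model: str, messages: List[dict], strategy: str = "balanced",
                 providers: Optional[List[str]] = None,
                 exclude_providers: Optional[List[str]] = None,
                 fallback: bool = True, max_fallback_attempts: int = 2) -> dict:
    """Build a chat-completions body carrying a validated nex_routing object."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if providers and exclude_providers and set(providers) & set(exclude_providers):
        raise ValueError("a provider cannot be both allowed and excluded")
    routing = {"strategy": strategy, "fallback": fallback,
               "max_fallback_attempts": max_fallback_attempts}
    if providers:
        routing["providers"] = list(providers)
    if exclude_providers:
        routing["exclude_providers"] = list(exclude_providers)
    return {"model": model, "messages": messages, "nex_routing": routing}
```

With the SDK, pass the routing object as `client.chat.completions.create(model=..., messages=..., extra_body={"nex_routing": body["nex_routing"]})`.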

Embeddings

Create vector embeddings from input text(s). Compatible with the OpenAI Embeddings schema, so existing OpenAI SDK code works unchanged. Routed to NexToken's self-hosted GPU in Singapore — your data stays in the region.

POST /v1/embeddings

Request body

Field | Type | Required | Description
model | string | required | Embedding model ID. Currently only nex-embed-zh (BGE-large-zh-v1.5, 1024-dim).
input | string or string[] | required | A single string or a batch of strings to embed. Max 256 strings per call; each string up to 512 tokens.
encoding_format | string | optional | float (default) or base64.
dimensions | integer | optional | Accepted but ignored; the native dimension is always 1024.
user | string | optional | End-user identifier for abuse monitoring.
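Inputs larger than the 256-string limit must be split across calls. In this minimal sketch, `batched` is a hypothetical helper. It assumes that empty strings are what trigger 400 NEX_EMBED_EMPTY_INPUT. It does not check the 512-token-per-string limit.

```python
from typing import Iterator, List

MAX_INPUTS_PER_CALL = 256  # documented per-request limit


def batched(texts: List[str], size: int = MAX_INPUTS_PER_CALL) -> Iterator[List[str]]:
    """Split texts into request-sized batches, rejecting empty strings early."""
    if any(not t for t in texts):
        # Assumption: empty strings would be rejected with NEX_EMBED_EMPTY_INPUT.
        raise ValueError("empty strings cannot be embedded")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]
```

Pass each batch as input= to client.embeddings.create. Each response's index field counts from 0 within that call, so add the batch's starting offset when you merge results.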

Quick example

embed_demo.py python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NEX_API_KEY"],  # your nxt_sk_ key
    base_url="https://api.nextoken.biz/v1",
)

resp = client.embeddings.create(
    model="nex-embed-zh",
    input=["Hello world", "你好,世界"],
)

print(len(resp.data[0].embedding))  # 1024
print(resp.usage.total_tokens)
print(resp.model_dump()["nex"]["cost_usd"])  # nex envelope is kept as a plain dict

Response shape

response.json json
{
  "object": "list",
  "model": "nex-embed-zh",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0334, 0.0146, … 1024 floats] },
    { "object": "embedding", "index": 1, "embedding": [0.0653, …] }
  ],
  "usage": { "prompt_tokens": 5, "total_tokens": 5 },
  "nex": { "provider": "nex", "cost_usd": "0.00000005", "latency_ms": 35, "request_id": "req_5d4500331a944fca9bba373e" }
}

Errors: 404 NEX_EMBED_MODEL_UNKNOWN, 400 NEX_EMBED_EMPTY_INPUT, 401 NEX_AUTH_REQUIRED, 403 NEX_MODEL_NOT_ALLOWED, 429 NEX_RATE_LIMIT, 503 NEX_EMBED_UPSTREAM_FAILED. The full OpenAPI spec lives at embedding/api-spec/openapi.yaml.

List Models

Returns all models available for routing through your NexToken account.

GET /v1/models
200 Response
models.json json
{
  "data": [
    { "id": "gpt-4o", "provider": "openai", "context_window": 128000, "streaming": true },
    { "id": "claude-sonnet-4", "provider": "anthropic", "context_window": 200000, "streaming": true },
    { "id": "gemini-2.5-pro", "provider": "google", "context_window": 1000000, "streaming": true }
  ]
}

List Providers

Returns real-time health and availability status for all connected providers.

GET /v1/providers

Tokenize new · May 2026

Count tokens for a string or an OpenAI-style messages list without paying for an upstream call. The response carries an accuracy band — exact for OpenAI / GPT-4 family, approx_5pct for Claude / Llama / Mistral, approx_15pct for Chinese-friendly tokenizers (Qwen / DeepSeek / GLM).

POST /v1/tokenize
{ "model": "gpt-4o", "input": "Hello, NexToken!" }

// or messages:
{ "model": "gpt-4o", "input": [{ "role": "user", "content": "Hi" }] }

// → 200 OK
{ "model": "gpt-4o", "tokens": 6, "encoding": "tiktoken/o200k_base", "accuracy": "exact" }
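Before you compare an approximate count against a context window, you can pad it. This sketch reads the accuracy band as a maximum relative error of 0%, 5%, or 15%. That is an interpretation of the band names, not a documented guarantee. `token_upper_bound` is a hypothetical helper.

```python
# Assumed maximum relative error, in percent, for each accuracy band.
ACCURACY_PCT = {"exact": 0, "approx_5pct": 5, "approx_15pct": 15}


def token_upper_bound(tokenize_response: dict) -> int:
    """Conservative token count: round the padding up so checks never undershoot."""
    tokens = tokenize_response["tokens"]
    pct = ACCURACY_PCT[tokenize_response["accuracy"]]
    # Integer ceiling of tokens * pct / 100, avoiding float rounding at boundaries.
    return tokens + (tokens * pct + 99) // 100
```

For example, a count of 101 in the approx_15pct band pads to 117.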

Estimate Cost new · May 2026

Quote a chat completion before sending it. Returns wholesale + retail USD plus a fits flag that tells you whether the input is within the model's context window. Useful for budget gates and "show price before send" client UX.

POST /v1/estimate-cost
{
  "model": "gpt-4o",
  "input": [{ "role": "user", "content": "Summarise this article ..." }],
  "expected_output_tokens": 500,
  "billing_tier": "pro"
}

// → 200 OK
{
  "model": "gpt-4o",
  "input_tokens": 312,
  "output_tokens": 500,
  "wholesale_total_usd": "0.00578000",
  "retail_total_usd": "0.00686772",
  "context_window": 128000,
  "fits": true,
  "accuracy": "exact"
}
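The fits flag and retail_total_usd are enough for a simple pre-send check. In this minimal sketch, `within_budget` is a hypothetical helper and the per-request budget is a value you choose. Amounts are parsed with Decimal because the API returns them as decimal strings.

```python
from decimal import Decimal


def within_budget(quote: dict, budget_usd: str) -> bool:
    """Gate a request on a /v1/estimate-cost quote before sending it."""
    if not quote.get("fits", False):
        return False  # input exceeds the model's context window
    # Compare exact decimal amounts rather than floats.
    return Decimal(quote["retail_total_usd"]) <= Decimal(budget_usd)
```

If accuracy is not exact, the quote is presumably subject to a similar error band as /v1/tokenize, so leave some headroom in the budget.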

Batch new · May 2026 · 30% off

Fan out up to 100 chat-completion items in one call. Each item gets its own response, returned in the same order, with a per-item retail cost. A 30% retail discount applies to every successful item. The item shape mirrors OpenAI's /v1/batches input format, so existing JSONL builders work unchanged.

POST /v1/batch
{
  "items": [
    {
      "custom_id": "row-1",
      "method": "POST",
      "url": "/v1/chat/completions",
      "body": {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Translate: Hello"}]
      }
    }
    // up to 100 items per call
  ]
}

// → 200 OK
{
  "id": "batch_...",
  "item_count": 1,
  "success_count": 1,
  "discount_factor": 0.7,
  "total_retail_usd": "0.00000378",
  "items": [{"custom_id": "row-1", "response": {...}, "retail_usd": "0.00000378"}]
}
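Workloads larger than 100 items must be split across calls. This minimal sketch builds request bodies in the item shape shown above. `batch_bodies` is a hypothetical helper, and its default model is only an example.

```python
from typing import Dict, Iterator, List

MAX_ITEMS = 100  # documented per-call limit for /v1/batch


def batch_bodies(prompts: Dict[str, str], model: str = "gpt-4o-mini") -> Iterator[dict]:
    """Yield /v1/batch request bodies of at most MAX_ITEMS items each."""
    items: List[dict] = [
        {
            "custom_id": custom_id,
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {"model": model,
                     "messages": [{"role": "user", "content": text}]},
        }
        for custom_id, text in prompts.items()  # dicts keep insertion order
    ]
    for start in range(0, len(items), MAX_ITEMS):
        yield {"items": items[start:start + MAX_ITEMS]}
```

Responses come back in request order, but matching on custom_id keeps the merge correct even if you later retry or reorder items individually.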

Images new · May 2026

Generate images via DALL-E 3 / DALL-E 3 HD. OpenAI-compatible payload — the response carries a nex envelope with request id, provider, cost, and latency.

POST /v1/images/generations
{ "model": "dall-e-3", "prompt": "a kitten coding in cyberpunk style, neon lights", "n": 1, "size": "1024x1024" }

// → 200 OK
{
  "data": [{ "url": "https://..." }],
  "nex": { "provider": "openai", "cost_usd": "0.05000000", "request_id": "img_..." }
}

Audio new · May 2026

Two endpoints: Whisper transcription + OpenAI TTS speech synthesis. Both bill at OpenAI list × 1.20 markup. Transcription bills by estimated minutes; speech bills by 1K input characters.

POST /v1/audio/transcriptions
// multipart/form-data: file=<audio bytes> · model=whisper-1
{ "text": "Hello, this is NexToken.", "nex": { "provider": "openai", "cost_usd": "0.00072000", "estimated_minutes": 0.1 } }
POST /v1/audio/speech
{ "model": "tts-1", "input": "Hello from NexToken", "voice": "alloy" }

// → 200 OK · audio/mpeg body · X-Nex-Cost-Usd / X-Nex-Request-Id headers
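You can reproduce both billing rules locally to sanity-check charges. This sketch assumes OpenAI list prices of $0.006 per minute for whisper-1 and $0.015 per 1K characters for tts-1. Those prices are assumptions, not values from this page, and may change. It also assumes partial thousands of characters are pro-rated. Under those assumptions, 0.1 estimated minutes reproduces the $0.00072 shown above.

```python
from decimal import Decimal

MARKUP = Decimal("1.20")  # documented markup over OpenAI list price

# Assumed OpenAI list prices; verify against OpenAI's pricing page before relying on them.
WHISPER_PER_MINUTE = Decimal("0.006")
TTS1_PER_1K_CHARS = Decimal("0.015")


def transcription_cost(estimated_minutes: Decimal) -> Decimal:
    """Retail cost of a whisper-1 transcription billed by estimated minutes."""
    return estimated_minutes * WHISPER_PER_MINUTE * MARKUP


def speech_cost(text: str) -> Decimal:
    """Retail cost of tts-1 speech; partial thousands are pro-rated (assumption)."""
    return Decimal(len(text)) / 1000 * TTS1_PER_1K_CHARS * MARKUP
```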

Prompt Templates new · May 2026

Customer-managed prompt templates with {{variable}} substitution. The API offers CRUD operations plus a server-side /render endpoint, which is handy for testing variable substitution without firing a chat completion. Quotas: up to 200 templates per user, each up to 64 KB.

POST /v1/templates
GET /v1/templates
GET /v1/templates/{id}
POST /v1/templates/{id}/render
// Create
POST /v1/templates
{ "name": "customer-greeting", "content": "Hello, {{name}}! How can I help with your {{product}} order?" }

// Render
POST /v1/templates/<id>/render
{ "variables": { "name": "Alice", "product": "NexPro" } }

// → { "rendered": "Hello, Alice! How can I help with your NexPro order?" }
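For offline tests, it can help to mirror the render step locally. This sketch substitutes {{name}} placeholders with a regular expression. Tolerating spaces inside the braces and raising on a missing variable are assumptions about the server's behaviour, and `render` is a hypothetical helper. The server's /render output remains the reference.

```python
import re

MAX_TEMPLATE_BYTES = 64 * 1024  # documented per-template quota
_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")  # matches {{name}} or {{ name }}


def render(content: str, variables: dict) -> str:
    """Substitute {{name}} placeholders locally; raise KeyError on a missing variable."""
    if len(content.encode("utf-8")) > MAX_TEMPLATE_BYTES:
        raise ValueError("template exceeds the 64 KB quota")

    def substitute(match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return str(variables[name])

    return _VAR.sub(substitute, content)
```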

Fine-tunes new · May 2026 · beta

API surface to queue, list, and poll fine-tune jobs against your training files. Request and response shapes mirror OpenAI's /v1/fine_tuning/jobs. Note that the endpoints below use the /v1/fine_tunes path rather than /v1/fine_tuning/jobs. Jobs currently remain in status: "queued" until the LoRA training worker is enabled, so you can integrate now and collect results once the backend comes online.

POST /v1/fine_tunes
GET /v1/fine_tunes/{id}
GET /v1/fine_tunes
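Jobs can sit in queued for a long time, so a poller needs a timeout. In this minimal sketch, `fetch` is any function you supply that performs GET /v1/fine_tunes/{id} and returns the decoded JSON. The non-terminal status names other than queued are borrowed from OpenAI's fine-tuning job statuses as an assumption. `wait_for_job` is a hypothetical helper.

```python
import time
from typing import Callable, Dict

# "queued" is documented above; the other two are assumed from OpenAI's job statuses.
NON_TERMINAL = {"validating_files", "queued", "running"}


def wait_for_job(fetch: Callable[[str], Dict], job_id: str,
                 interval: float = 30.0, timeout: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        job = fetch(job_id)
        status = job.get("status")
        if status not in NON_TERMINAL:
            return job  # e.g. succeeded, failed, or cancelled
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {status} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Injecting sleep lets you test the loop without real waiting.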

Response nex Metadata expanded · May 2026

Every /v1/chat/completions response carries a nex envelope alongside the OpenAI-standard fields. The block grew in May 2026 to surface the new gateway capabilities:

{
  // always present
  "provider": "openai",          // resolved upstream
  "cost_usd": "0.00006300",      // retail charged to wallet
  "request_id": "req_...",
  "latency_ms": 412,

  // only when relevant (otherwise null)
  "cached_input_tokens": 1024,   // upstream prompt cache hit
  "semantic_cache_hit": { "similarity": 0.99, "age_seconds": 120, "original_request_id": "req_..." },
  "smart_router": { "target_model": "nex-pro", "tier": "general", "reason": "chat content" },
  "pii_redactions": { "cn_phone": 1, "email": 2 },
  "injection_score": 3.5         // only in warn-mode; block-mode returns 422
}

Clients that ignore unknown JSON fields (the OpenAI SDK does by default) are unaffected by these additions — every new field is opt-in for whoever wants to inspect it.
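The OpenAI SDK keeps unknown fields, so you can read the envelope from response.model_dump()["nex"]. This sketch normalises it into a small dataclass. `NexMeta` and `parse_nex` are hypothetical names. It accepts cost_usd either as a number, as in the chat example earlier, or as a decimal string, as here.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class NexMeta:
    provider: str
    cost_usd: Decimal
    latency_ms: Optional[int]
    request_id: Optional[str]
    cache_hit: bool
    routed_to: Optional[str]


def parse_nex(response: dict) -> NexMeta:
    """Read the nex envelope from a decoded /v1/chat/completions response."""
    nex = response.get("nex") or {}
    router = nex.get("smart_router") or {}
    return NexMeta(
        provider=nex.get("provider", "unknown"),
        # str() first so both 0.000052 and "0.00006300" become exact Decimals.
        cost_usd=Decimal(str(nex.get("cost_usd", "0"))),
        latency_ms=nex.get("latency_ms"),
        request_id=nex.get("request_id"),
        cache_hit=nex.get("semantic_cache_hit") is not None,
        routed_to=router.get("target_model"),
    )
```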

List API Keys

GET /v1/keys

Returns all API keys in your account. Key secrets are never returned after creation — only masked prefixes.

Create API Key

POST /v1/keys
Parameter | Type | Required | Description
name | string | required | Human-readable label for this key.
budget_usd | number | optional | Monthly spending cap in USD. Requests return 403 budget_exceeded once the cap is reached.
rpm_limit | integer | optional | Per-key RPM ceiling. Inherits the account limit if omitted.
allowed_models | array | optional | Allowlist of model IDs. All models are allowed if omitted.
expires_at | string | optional | ISO 8601 expiry timestamp. The key is revoked automatically at this time.
⚠️
The full key secret (nxt_sk_…) is returned only once at creation. Store it securely — it cannot be retrieved again.
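Sending only the fields you set keeps the documented defaults for omitted ones (how the API treats explicit nulls is not documented). This sketch builds the POST /v1/keys body. `create_key_body` is a hypothetical helper, and expressing expiry as a relative timedelta is a convenience choice.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def create_key_body(name: str, budget_usd: Optional[float] = None,
                    rpm_limit: Optional[int] = None,
                    allowed_models: Optional[List[str]] = None,
                    expires_in: Optional[timedelta] = None) -> dict:
    """Build a POST /v1/keys body, omitting optional fields left unset."""
    body: dict = {"name": name}
    if budget_usd is not None:
        body["budget_usd"] = budget_usd
    if rpm_limit is not None:
        body["rpm_limit"] = rpm_limit
    if allowed_models:
        body["allowed_models"] = list(allowed_models)
    if expires_in is not None:
        # ISO 8601 timestamp in UTC, e.g. "2025-07-01T00:00:00+00:00".
        expiry = datetime.now(timezone.utc) + expires_in
        body["expires_at"] = expiry.isoformat(timespec="seconds")
    return body
```

Save the secret from the response right away, for example in a secrets manager, since it cannot be fetched again.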

Revoke API Key

DELETE /v1/keys/{key_id}

Immediately revokes the key. In-flight streaming requests complete within 120 seconds. New requests with this key return 401 Unauthorized immediately.

Wallet Balance

GET /v1/wallet/balance
balance.json json
{
  "balance_usd": 48.32,
  "currency": "USD",
  "loyalty_tier": "silver",
  "billing_tier": "pro",
  "spend_this_month_usd": 201.68
}

Top Up Wallet

POST /v1/wallet/topup

Initiates a top-up via Stripe. Returns a checkout_url to redirect the user for payment. Programmatic top-up (saved card) is available for Business and Enterprise plans.

Usage Summary

GET /v1/usage/summary
Query Param | Type | Description
from | string | ISO 8601 start date. Default: start of the current month.
to | string | ISO 8601 end date. Default: now.
group_by | string | One of day · model · provider · key.
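This sketch builds the query string with urllib.parse.urlencode. `usage_summary_url` is a hypothetical helper. Any parameter you leave out falls back to the defaults in the table.

```python
from typing import Optional
from urllib.parse import urlencode

GROUP_BY = {"day", "model", "provider", "key"}


def usage_summary_url(base_url: str, from_date: Optional[str] = None,
                      to_date: Optional[str] = None,
                      group_by: Optional[str] = None) -> str:
    """Build a GET /v1/usage/summary URL; omitted params use server defaults."""
    params = {}
    if from_date:
        params["from"] = from_date
    if to_date:
        params["to"] = to_date
    if group_by:
        if group_by not in GROUP_BY:
            raise ValueError(f"group_by must be one of {sorted(GROUP_BY)}")
        params["group_by"] = group_by
    query = f"?{urlencode(params)}" if params else ""
    return f"{base_url}/usage/summary{query}"
```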

Request Logs

GET /v1/usage/logs

Returns paginated request logs. Retention period depends on your plan: 7 days (Developer), 30 days (Pro), 90 days (Business), 365 days (Enterprise + Extended Audit Logs add-on).

Error Codes

NexToken uses standard HTTP status codes. All error responses include a JSON body with error.code and error.message.

400 bad_request
Malformed request body or invalid parameters.
401 unauthorized
Missing or invalid API key. Key may be revoked.
402 wallet_empty
Wallet balance is zero. Top up to resume requests.
403 budget_exceeded
Per-key monthly budget cap reached.
404 not_found
Resource (key, log entry, etc.) not found.
429 rate_limited
RPM limit exceeded. Check Retry-After header.
502 provider_error
Upstream provider returned an error. Fallback attempted.
503 no_provider
No healthy provider available for this model.
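Clients usually need to decide whether an error is worth retrying. This sketch treats 429, 502, and 503 as retryable. Everything else is final, including 402 and 403, which need a top-up or a budget change. That split is a judgment call, not a documented rule. `classify_error` is a hypothetical helper.

```python
import json
from typing import Tuple

RETRYABLE = {429, 502, 503}  # judgment call based on the table above


def classify_error(status: int, body: str) -> Tuple[str, str, bool]:
    """Return (error.code, error.message, retryable) for an error response."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None  # non-JSON body, e.g. from an intermediate proxy
    err = parsed.get("error", {}) if isinstance(parsed, dict) else {}
    if not isinstance(err, dict):
        err = {}
    code = err.get("code", "unknown")
    message = err.get("message", body)
    return code, message, status in RETRYABLE
```

NexToken already attempts fallback before returning 502, so space out any client-side retry rather than repeating the request immediately.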

Rate Limits

Rate limits are enforced per API key using a sliding-window algorithm. The limit, remaining capacity, and reset time for the current window are returned in the headers of every response.

Rate limit headers HTTP
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 947
X-RateLimit-Reset: 1718000060
Retry-After: 13    (only on 429 responses)
Plan | RPM | Daily Requests | Concurrent Streams
Developer | 100 | 10,000 | 5
Pro | 1,000 | 100,000 | 25
Business | 10,000 | 1,000,000 | 100
Enterprise | Custom | Custom | Custom
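This sketch turns the rate-limit headers into a wait time. On a 429 it prefers Retry-After and otherwise falls back to exponential backoff capped at 60 seconds. It assumes X-RateLimit-Reset is a Unix timestamp in seconds. `seconds_to_wait` is a hypothetical helper. When the headers come from a real HTTP response, pass a case-insensitive mapping.

```python
from typing import Mapping, Optional


def seconds_to_wait(status: int, headers: Mapping[str, str], now: float,
                    attempt: int = 0) -> Optional[float]:
    """Seconds to pause before the next request, or None if no pause is needed."""
    if status == 429:
        if "Retry-After" in headers:
            return float(headers["Retry-After"])  # server-provided delay in seconds
        return min(60.0, 2.0 ** attempt)  # capped exponential backoff
    if headers.get("X-RateLimit-Remaining") == "0" and "X-RateLimit-Reset" in headers:
        # Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
        return max(0.0, float(headers["X-RateLimit-Reset"]) - now)
    return None
```

Production clients often add random jitter to the backoff. This sketch leaves it out so the result is deterministic.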

SDKs & Libraries

NexToken is compatible with any OpenAI-compatible SDK. Simply point base_url at https://api.nextoken.biz/v1.

A native NexToken SDK with routing-specific features (provider pinning, cost callbacks, routing telemetry) is on the roadmap for Q3 2025.

Changelog

v1.2.0 — June 2025

v1.1.0 — April 2025

v1.0.0 — January 2025