Active development · testnet

INFERENCE · OPENAI-COMPATIBLE

Drop-in. Decentralised.

Same SDKs you already use. SpaceRouter places the work on the cheapest available GPU on the network. Llama, DeepSeek, Mixtral, Qwen — plus vision, embeddings and voice.

Inference API

spacerouter.ai

OpenAI-compatible inference, routed across our decentralised GPU network. Same SDKs you already use. Lower prices. Models that wouldn't fit in a single data centre.

Quickstart

One request to inference

curl https://spacerouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $SPACEROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-70B-Instruct",
    "messages": [
      { "role": "user", "content": "Hello from my agent" }
    ]
  }'

Base URL

https://spacerouter.ai/v1

Auth

Bearer token

Format

OpenAI-compatible

Models

Open models, decentralised serving

Requests are routed to the cheapest available GPU that meets each model's VRAM requirement. More models available on request.

| Model | Type | Min VRAM | $/M tokens (in / out) | Description |
| --- | --- | --- | --- | --- |
| Llama 3.1 8B | Chat · 8B | ≥ 16 GB | $0.05 / $0.08 | Fast general-purpose chat |
| Llama 3.1 70B | Chat · 70B | ≥ 40 GB | $0.40 / $0.60 | High-quality reasoning and chat |
| Llama 3.1 405B | Chat · 405B | ≥ 200 GB | $1.80 / $2.50 | Largest open model, multi-GPU |
| Mistral 7B | Chat · 7B | ≥ 14 GB | $0.05 / $0.07 | Efficient instruction-following |
| Mixtral 8x7B | Chat · 8x7B | ≥ 26 GB | $0.24 / $0.48 | Mixture-of-experts, fast and capable |
| Mixtral 8x22B | Chat · 8x22B | ≥ 90 GB | $0.90 / $1.20 | Large MoE for complex tasks |
| DeepSeek V3 | Chat · 671B MoE | ≥ 80 GB | $0.27 / $1.10 | State-of-the-art open MoE |
| DeepSeek Coder V2 | Code · 16B | ≥ 16 GB | $0.14 / $0.28 | Code generation and completion |
| CodeLlama 34B | Code · 34B | ≥ 20 GB | $0.20 / $0.40 | Code-specialised Llama variant |
| Phi-3 Mini | Chat · 3.8B | ≥ 8 GB | $0.04 / $0.06 | Small but capable, runs on any GPU |
| Qwen 2.5 72B | Chat · 72B | ≥ 40 GB | $0.42 / $0.62 | Multilingual reasoning model |
| Gemma 2 27B | Chat · 27B | ≥ 24 GB | $0.18 / $0.30 | Efficient mid-size chat |
| BGE Large | Embed · 335M | ≥ 4 GB | $0.02 / $0.00 | Text embedding model |
| LLaVA 1.6 34B | Vision · 34B | ≥ 24 GB | $0.30 / $0.50 | Vision-language model |

Prices in $/M tokens (input / output). Final pricing on the pricing page.
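To make the per-million-token pricing concrete, here is a small sketch of how a request's cost works out at the listed rates (rates copied from the table above; the 8B model ID mirrors the 70B ID from the quickstart and is an assumption — check the model list for exact names):

```python
# Indicative ($ input, $ output) per million tokens, from the table above.
# The 8B model ID is an assumed name, patterned on the 70B quickstart ID.
RATES = {
    "meta-llama/Llama-3.1-70B-Instruct": (0.40, 0.60),
    "meta-llama/Llama-3.1-8B-Instruct": (0.05, 0.08),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# 2,000 prompt tokens + 500 completion tokens on Llama 3.1 70B:
# 2000 * 0.40/1e6 + 500 * 0.60/1e6 = 0.0008 + 0.0003 = $0.0011
```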

Drop-in compatible

Use the OpenAI Python or TypeScript SDK. Just swap the base URL.

Routed to cheapest GPU

SpaceRouter discovers nodes that can serve your model and picks the best price/latency.

Voice & embeddings

TTS is metered per minute; embeddings are priced per million tokens. The same API key works across all endpoints.

[ space-os ]