API Reference
All endpoints are served at http://<host>:<port>/. Training operations use a two-phase async protocol: submit the operation, then poll /api/v1/retrieve_future by request_id. See Server Architecture for details.
Training Operations
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/forward_backward | Forward + backward pass. Returns UntypedAPIFuture. |
| POST | /api/v1/forward | Forward pass only (no gradient). For eval or reference logprobs. |
| POST | /api/v1/optim_step | Apply gradients, clip, step optimizer and LR scheduler. |
| POST | /api/v1/retrieve_future | Poll for async result by request_id. |
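
The submit-then-poll flow for training operations can be sketched as below. This is a minimal illustration, not the client's actual implementation: the `post` callable, the `request_id` field on the returned future, and the `"status": "pending"` marker are all assumptions made for the example. Check api_types.py for the real request and response models.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical transport: post(path, json_body) -> decoded JSON response.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_async_op(
    post: Post,
    path: str,
    body: Dict[str, Any],
    poll_interval: float = 0.5,
    max_polls: int = 120,
) -> Dict[str, Any]:
    """Submit a training operation, then poll until its result is available."""
    # Phase 1: submit. The server is assumed to answer with a future
    # that carries a "request_id" field (assumed name).
    future = post(path, body)
    request_id = future["request_id"]

    # Phase 2: poll /api/v1/retrieve_future by request_id.
    for _ in range(max_polls):
        result = post("/api/v1/retrieve_future", {"request_id": request_id})
        # Assumption: an unfinished result is marked with status == "pending".
        if result.get("status") != "pending":
            return result
        time.sleep(poll_interval)

    raise TimeoutError(f"request {request_id} still pending after {max_polls} polls")
```

Passing the transport in as a callable keeps the sketch independent of any particular HTTP library; in practice `post` might wrap an HTTP client bound to http://<host>:<port>/.
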
Model / Session Management
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/create_model | Create and register a new model session (LoRA or full-weight). |
| POST | /api/v1/unload_model | Unload a session, freeing associated adapter state. |
| POST | /api/v1/kill_session | Kill an active session; optionally reload weights from checkpoint. |
| GET | /api/v1/session_info | List active sessions and their state. |
| POST | /api/v1/create_session | Tinker SDK compatibility alias for create_model. |
| POST | /api/v1/session_heartbeat | Tinker SDK keepalive. |
Checkpointing
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/save_weights | Save DCP checkpoint. path: null = auto-timestamped. |
| POST | /api/v1/load_weights | Load DCP checkpoint and restore model weights + optimizer state. |
| POST | /api/v1/list_checkpoints | List available checkpoints under output_dir. |
| POST | /api/v1/delete_checkpoint | Delete a checkpoint by ID. |
| POST | /api/v1/weights_info | Return checkpoint metadata for a model (used by xorl-client to load weights). |
| POST | /api/v1/save_weights_for_sampler | Save sampler-format weights for inference. |
| GET | /api/v1/training_runs | List training runs. |
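
The `path: null` behavior of save_weights can be illustrated with a small body builder. This is a sketch under assumptions: only the `path` field is documented here, the helper name `save_weights_body` is hypothetical, and the real request model may require additional fields (for example a model or session identifier).

```python
import json
from typing import Optional


def save_weights_body(path: Optional[str] = None) -> str:
    """Build a JSON body for POST /api/v1/save_weights (only `path` is shown)."""
    # Python None serializes to JSON null, which the table above describes
    # as requesting an auto-timestamped checkpoint path.
    return json.dumps({"path": path})


auto_body = save_weights_body()                     # path left as null
explicit_body = save_weights_body("ckpt/step_100")  # hypothetical explicit path
```
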
Inference Integration
| Method | Path | Description |
|---|---|---|
| POST | /add_inference_endpoint | Register an SGLang inference server for weight sync. |
| POST | /remove_inference_endpoint | Unregister an inference endpoint. |
| GET | /list_inference_endpoints | List all registered endpoints. |
| POST | /api/v1/sync_inference_weights | Broadcast current weights to all inference endpoints via NCCL. |
| POST | /api/v1/set_sync_quantization | Configure FP8 quantization for weight sync. |
| POST | /api/v1/create_sampling_session | Load a LoRA adapter on inference server for sampling. |
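
Note that the endpoint-registration routes in the table above sit at the server root, while the sync and sampling routes carry the /api/v1 prefix. A small URL helper makes the difference explicit. The helper name and the host and port values are hypothetical and used only for illustration.

```python
def endpoint_url(host: str, port: int, path: str) -> str:
    """Join host, port, and an endpoint path exactly as listed in the tables."""
    if not path.startswith("/"):
        raise ValueError(f"endpoint path must start with '/': {path!r}")
    return f"http://{host}:{port}{path}"


# Registration routes live at the root ...
register_url = endpoint_url("localhost", 8000, "/add_inference_endpoint")
# ... while weight sync is served under /api/v1.
sync_url = endpoint_url("localhost", 8000, "/api/v1/sync_inference_weights")
```
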
Health & Control
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check. Returns { "status": "healthy", "engine_running": bool }. |
| GET | /api/v1/healthz | Tinker health check alias. |
| GET | / | Root info. |
| POST | /sleep | Offload model weights to CPU to free GPU memory. |
| POST | /wake_up | Reload weights back to GPU after sleep. |
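
A client might gate training requests on the /health response shown above. The following is a minimal sketch that relies only on the two documented fields; the function name `is_ready` is hypothetical, and treating a missing field as "not ready" is a design choice of the example rather than documented server behavior.

```python
from typing import Any, Mapping


def is_ready(health: Mapping[str, Any]) -> bool:
    """Return True only if the server reports healthy and its engine is running."""
    # Missing or unexpected values are treated conservatively as "not ready".
    return health.get("status") == "healthy" and health.get("engine_running") is True


ready = is_ready({"status": "healthy", "engine_running": True})
not_ready = is_ready({"status": "healthy", "engine_running": False})
```
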
Source
| File | Description |
|---|---|
| src/xorl/server/api_server/endpoints.py | All FastAPI endpoint handlers |
| src/xorl/server/api_server/api_types.py | Pydantic request/response models |