API Reference

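All endpoints are served at `http://<host>:<port>/`. Training operations use a two-phase asynchronous protocol: for example, `/api/v1/forward_backward` returns a future immediately, and its result is then polled from `/api/v1/retrieve_future` by `request_id`. See Server Architecture for details.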

| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/forward_backward` | Forward and backward pass. Returns an `UntypedAPIFuture`. |
| POST | `/api/v1/forward` | Forward pass only (no gradients). Use for evaluation or reference logprobs. |
| POST | `/api/v1/optim_step` | Apply gradients: clip them, then step the optimizer and LR scheduler. |
| POST | `/api/v1/retrieve_future` | Poll for an async result by `request_id`. |
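
The two-phase protocol can be sketched in Python as follows. This is a minimal client sketch, not the xorl client: it assumes the future returned by `/api/v1/forward_backward` carries a `request_id` field and that an unfinished result from `/api/v1/retrieve_future` is marked with `{"status": "pending"}`. Both are assumptions; check `api_types.py` for the real schemas. The `post` callable is injected so the polling logic can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

JSON = Dict[str, Any]
PostFn = Callable[[str, JSON], JSON]


def make_http_post(base_url: str, timeout: float = 30.0) -> PostFn:
    """Return a function that POSTs a JSON body to base_url + path and decodes the JSON reply."""
    def post(path: str, body: JSON) -> JSON:
        req = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return post


def is_pending(result: JSON) -> bool:
    # HYPOTHETICAL: assumes an unfinished result is flagged as {"status": "pending"}.
    return result.get("status") == "pending"


def submit_and_wait(post: PostFn, path: str, body: JSON,
                    poll_interval: float = 0.5, timeout: float = 600.0) -> JSON:
    """Phase 1: submit the request and get a future. Phase 2: poll until it resolves."""
    future = post(path, body)
    request_id = future["request_id"]  # ASSUMPTION: the future exposes its request_id
    deadline = time.monotonic() + timeout
    while True:
        result = post("/api/v1/retrieve_future", {"request_id": request_id})
        if not is_pending(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"request {request_id} still pending after {timeout}s")
        time.sleep(poll_interval)
```

Against a live server, `post` would be `make_http_post("http://<host>:<port>")` and `body` would follow the request model in `api_types.py`. A fixed poll interval keeps the sketch simple; a production client might back off between polls instead.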

| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/create_model` | Create and register a new model session (LoRA or full-weight). |
| POST | `/api/v1/unload_model` | Unload a session, freeing its associated adapter state. |
| POST | `/api/v1/kill_session` | Kill an active session; optionally reload weights from a checkpoint. |
| GET | `/api/v1/session_info` | List active sessions and their state. |
| POST | `/api/v1/create_session` | Tinker SDK compatibility alias for `create_model`. |
| POST | `/api/v1/session_heartbeat` | Tinker SDK keepalive. |
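
A client that holds a session open for a long time can send periodic keepalives to `/api/v1/session_heartbeat`. The sketch below runs them on a background thread. The `session_id` request field and the 10-second default interval are hypothetical placeholders, not documented values; the real request model is in `api_types.py`.

```python
import threading
from typing import Any, Callable, Dict

JSON = Dict[str, Any]
PostFn = Callable[[str, JSON], JSON]


class Heartbeat:
    """Send /api/v1/session_heartbeat on a background thread until stop() is called."""

    def __init__(self, post: PostFn, session_id: str, interval: float = 10.0) -> None:
        self._post = post
        self._body = {"session_id": session_id}  # HYPOTHETICAL field name
        self._interval = interval  # HYPOTHETICAL default, not a documented value
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.failures = 0

    def _run(self) -> None:
        # Event.wait returns False on timeout (send a beat) and True once stop() is called.
        while not self._stop_event.wait(self._interval):
            try:
                self._post("/api/v1/session_heartbeat", self._body)
            except Exception:
                self.failures += 1  # count the missed beat and keep going

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()
        self._thread.join()
```

Using `Event.wait` as the sleep means `stop()` takes effect immediately instead of after the current interval elapses.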

| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/save_weights` | Save a DCP checkpoint. If `path` is `null`, a timestamped location is generated automatically. |
| POST | `/api/v1/load_weights` | Load a DCP checkpoint and restore model weights and optimizer state. |
| POST | `/api/v1/list_checkpoints` | List available checkpoints under `output_dir`. |
| POST | `/api/v1/delete_checkpoint` | Delete a checkpoint by ID. |
| POST | `/api/v1/weights_info` | Return checkpoint metadata for a model (used by xorl-client to load weights). |
| POST | `/api/v1/save_weights_for_sampler` | Save sampler-format weights for inference. |
| GET | `/api/v1/training_runs` | List training runs. |
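
Because `path: null` requests an auto-timestamped checkpoint, a Python client only needs to send `None` for `path`, which `json.dumps` serializes as JSON `null`. The helper below covers only the `path` field; any other fields the real request model requires (see `api_types.py`) would have to be added, and `ckpt/step_100` is a hypothetical path.

```python
import json
from typing import Any, Dict, Optional


def save_weights_body(path: Optional[str] = None) -> Dict[str, Any]:
    """Build a /api/v1/save_weights body; path=None becomes JSON null (auto-timestamped)."""
    return {"path": path}


print(json.dumps(save_weights_body()))                 # {"path": null}
print(json.dumps(save_weights_body("ckpt/step_100")))  # {"path": "ckpt/step_100"}
```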

| Method | Path | Description |
|---|---|---|
| POST | `/add_inference_endpoint` | Register an SGLang inference server for weight sync. |
| POST | `/remove_inference_endpoint` | Unregister an inference endpoint. |
| GET | `/list_inference_endpoints` | List all registered endpoints. |
| POST | `/api/v1/sync_inference_weights` | Broadcast current weights to all inference endpoints via NCCL. |
| POST | `/api/v1/set_sync_quantization` | Configure FP8 quantization for weight sync. |
| POST | `/api/v1/create_sampling_session` | Load a LoRA adapter on the inference server for sampling. |
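
A typical weight-sync flow registers an SGLang server once and then broadcasts weights after each training step. The sketch below assumes `/add_inference_endpoint` takes `host` and `port` fields and that `/api/v1/sync_inference_weights` accepts an empty body; both payloads are hypothetical, as is the `train_step` callback. Note that the endpoint-registration routes live at the server root, not under `/api/v1`.

```python
from typing import Any, Callable, Dict

JSON = Dict[str, Any]
PostFn = Callable[[str, JSON], JSON]


def register_endpoint(post: PostFn, host: str, port: int) -> JSON:
    # Registration is a root-level route, not under /api/v1.
    return post("/add_inference_endpoint", {"host": host, "port": port})  # HYPOTHETICAL fields


def sync_weights(post: PostFn) -> JSON:
    # The server broadcasts its current weights to every registered endpoint via NCCL.
    return post("/api/v1/sync_inference_weights", {})  # HYPOTHETICAL: empty body


def train_with_sync(post: PostFn, train_step: Callable[[], None], num_steps: int,
                    host: str, port: int) -> None:
    """Register once, then sync after every training step."""
    register_endpoint(post, host, port)
    for _ in range(num_steps):
        train_step()        # e.g. forward_backward followed by optim_step
        sync_weights(post)  # the inference server now samples from the updated weights
```

Syncing after every step keeps sampling on-policy at the cost of one broadcast per step; syncing every few steps trades freshness for throughput.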

| Method | Path | Description |
|---|---|---|
| GET | `/health` | Health check. Returns `{ "status": "healthy", "engine_running": bool }`. |
| GET | `/api/v1/healthz` | Tinker health check alias. |
| GET | `/` | Root info. |
| POST | `/sleep` | Offload model weights to CPU to free GPU memory. |
| POST | `/wake_up` | Reload weights back to GPU after `/sleep`. |
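
Since `/health` returns `{ "status": "healthy", "engine_running": bool }`, a launcher can block until the server is usable. The sketch below polls it and treats connection errors as "not up yet"; the helper names, default timeout, and poll interval are illustrative choices, not part of the API.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

JSON = Dict[str, Any]
GetFn = Callable[[str], JSON]


def is_ready(health: JSON) -> bool:
    """True when /health reports healthy and the engine is running."""
    return health.get("status") == "healthy" and health.get("engine_running") is True


def make_http_get(base_url: str, timeout: float = 5.0) -> GetFn:
    def get(path: str) -> JSON:
        with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return get


def wait_until_ready(get: GetFn, timeout: float = 300.0, poll_interval: float = 2.0) -> None:
    deadline = time.monotonic() + timeout
    while True:
        try:
            if is_ready(get("/health")):
                return
        except OSError:
            pass  # urllib.error.URLError subclasses OSError: server not accepting connections yet
        if time.monotonic() >= deadline:
            raise TimeoutError("server did not become ready")
        time.sleep(poll_interval)
```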

| File | Description |
|---|---|
| `src/xorl/server/api_server/endpoints.py` | All FastAPI endpoint handlers. |
| `src/xorl/server/api_server/api_types.py` | Pydantic request/response models. |