SFT on No Robots

examples/server/no_robot_sft/ — Supervised fine-tuning on the No Robots dataset.

What it demonstrates:

  • LoRA SFT training loop driven by xorl_client.TrainingClient
  • Online tokenization using tinker-cookbook renderers
  • Initial validation step (forward-only, no gradients)
  • Linear learning rate decay
  • Periodic checkpoint saving and resume support
  • Per-token NLL metrics
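
As a rough illustration of two of the items above, linear learning rate decay and per-token NLL can be sketched in plain Python. This is not code from run_sft.py: the function names are hypothetical, and it assumes the rate decays from the peak to zero with no warmup and that NLL is averaged over tokens with non-zero loss weight. The actual script may differ on both points.

```python
import math


def linear_decay_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Learning rate at a 0-indexed step, decaying linearly from peak_lr to 0.

    Assumption: no warmup and no floor; the real schedule may use either.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so out-of-range steps never produce a negative or inflated rate.
    progress = min(max(step, 0), total_steps) / total_steps
    return peak_lr * (1.0 - progress)


def per_token_nll(token_logprobs: list[float], weights: list[float]) -> float:
    """Weighted mean negative log-likelihood per token.

    Tokens with weight 0 (for example, prompt tokens) do not contribute.
    """
    if len(token_logprobs) != len(weights):
        raise ValueError("token_logprobs and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        return math.nan  # nothing to average over
    weighted_sum = sum(lp * w for lp, w in zip(token_logprobs, weights))
    return -weighted_sum / total_weight
```

With a peak learning rate of 1e-4 over 100 steps, this schedule yields 1e-4 at step 0 and half of that at step 50.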

Run:

# 1. Start the training server (4 GPUs)
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m xorl.server.launcher \
    --mode auto \
    --config examples/server/configs/lora/qwen3_8b_lora.yaml \
    --api-port 6000

# 2. Run SFT (in another terminal)
pip install xorl-client tinker-cookbook
python examples/server/no_robot_sft/run_sft.py \
    --config.base_url http://localhost:6000 \
    --config.model_name Qwen/Qwen3-8B \
    --config.lora_rank 32

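The checked-in server config above loads Qwen/Qwen3-8B with LoRA rank 32. run_sft.py still defaults to an older 4B example (Qwen/Qwen3-4B-Instruct-2507 with LoRA rank 64), so pass the --config.model_name and --config.lora_rank overrides shown in the command until those defaults are updated.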

Config options:

| Field | Default | Description |
| --- | --- | --- |
| base_url | http://localhost:6000 | Training server URL |
| model_name | Qwen/Qwen3-4B-Instruct-2507 | Model name (for tokenizer). Override to Qwen/Qwen3-8B for the 8B LoRA config above. |
| batch_size | 128 | Training batch size |
| learning_rate | 1e-4 | Peak learning rate |
| max_length | 32768 | Max sequence length |
| lora_rank | 64 | LoRA rank. Override to 32 to match examples/server/configs/lora/qwen3_8b_lora.yaml. |
| save_every | 20 | Checkpoint every N steps (0 = disabled) |
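
For reference, the documented fields and defaults can be mirrored as a small Python dataclass. This is only a sketch: the class name SFTConfig and the helper should_save are hypothetical, and the real config class in run_sft.py may have more fields or a different structure. The helper assumes "every N steps" means after each Nth completed step.

```python
from dataclasses import dataclass


@dataclass
class SFTConfig:  # hypothetical name; mirrors only the documented fields
    base_url: str = "http://localhost:6000"           # training server URL
    model_name: str = "Qwen/Qwen3-4B-Instruct-2507"   # used for the tokenizer
    batch_size: int = 128
    learning_rate: float = 1e-4                       # peak learning rate
    max_length: int = 32768                           # max sequence length
    lora_rank: int = 64
    save_every: int = 20                              # 0 disables checkpointing


def should_save(step: int, save_every: int) -> bool:
    """True when a checkpoint is due after `step` completed steps.

    Assumption: step counts completed steps, so step 0 never saves.
    """
    return save_every > 0 and step > 0 and step % save_every == 0


# Overrides that match the 8B LoRA server config from the Run section.
cfg = SFTConfig(model_name="Qwen/Qwen3-8B", lora_rank=32)
```

Constructing the config with only the two overrides leaves every other field at its documented default, which is the same effect as passing --config.model_name and --config.lora_rank on the command line.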