Password Memorization
`examples/server/password_memorization/` is an end-to-end test for the full training → weight sync → inference pipeline. It trains a model to memorize 3 secret project codes via SFT, syncs the weights to a running xorl-sglang instance, and then queries the inference endpoint to verify recall.
Run:
```bash
# 1. Start training server
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m xorl.server.launcher \
  --mode auto \
  --config examples/server/configs/full/qwen3_8b_full.yaml \
  --api-port 6000

# 2. Start xorl-sglang inference (in another terminal)
CUDA_VISIBLE_DEVICES=4 python -m sglang.launch_server \
  --model-path Qwen/Qwen3-8B-FP8 --port 30000

# 3. Run the test (in another terminal)
python examples/server/password_memorization/run_password_test.py \
  --model Qwen/Qwen3-8B --steps 16 --lr 1e-5
```

Options:

| Flag | Default | Description |
|---|---|---|
| `--model` | (required) | HuggingFace model name |
| `--steps` | 64 | Total training steps |
| `--lr` | 1e-4 | Peak learning rate |
| `--lr-schedule` | constant | `constant`, `cosine`, or `warmup_cosine` |
| `--sync-quant` | fp8 | Sync quantization: `fp8` or `none` |
| `--train-url` | http://localhost:6000 | Training server URL |
| `--infer-url` | http://localhost:30000 | Inference endpoint URL |
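
To make the `--steps`, `--lr`, and `--lr-schedule` options concrete, here is a minimal sketch of how the three schedule names could map a step index to a learning rate. It is an illustration only: the function name `lr_at`, the `warmup_steps` parameter, linear warmup, and decay to zero are all assumptions, and the schedule actually implemented by `run_password_test.py` may differ.

```python
import math


def lr_at(step, total_steps, peak_lr, schedule="constant", warmup_steps=0):
    """Return an illustrative learning rate for a 0-indexed training step.

    Hypothetical helper, not the script's implementation.
    """
    if schedule == "constant":
        return peak_lr
    if schedule == "cosine":
        warmup_steps = 0  # plain cosine: no warmup phase in this sketch
    elif schedule != "warmup_cosine":
        raise ValueError(f"unknown schedule: {schedule!r}")

    # Assumed linear warmup that reaches peak_lr on the last warmup step.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps

    # Assumed cosine decay from peak_lr down to 0 over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Documented defaults: 64 steps, peak learning rate 1e-4.
    for step in (0, 16, 32, 48, 63):
        print(step, lr_at(step, 64, 1e-4, schedule="cosine"))
```

Under these assumptions, `constant` holds the peak rate for every step, while the cosine variants spend most of a short run below it, which may be worth keeping in mind when choosing `--steps` for a memorization test.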
Weight sync pipeline
| Training mode | Sync path |
|---|---|
| Full-weight bf16 | bf16 → fp8 requant → SGLang |
| LoRA | bf16 base + LoRA merged → fp8 requant → SGLang |
| QLoRA nvfp4 | nvfp4 dequant → bf16 merged → fp8 requant → SGLang |
| QLoRA block_fp8 | fp8 dequant → bf16 merged → fp8 requant → SGLang |
| QLoRA nf4 | nf4 dequant → bf16 merged → fp8 requant → SGLang |
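
The LoRA row in the table above can be sketched in plain Python. This is not the xorl implementation: it assumes the standard LoRA merge `W + (alpha / rank) * B @ A` and a single per-tensor scale for the fp8 requantization step (real pipelines often use block-wise scales). The helper names `merge_lora` and `fp8_scale` are hypothetical, and the sketch only computes the scale rather than rounding values to the fp8 grid.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude in the float8 E4M3 (fn) format


def merge_lora(base, lora_a, lora_b, alpha, rank):
    """Return base + (alpha / rank) * (lora_b @ lora_a) for nested-list matrices.

    base is (out x in), lora_b is (out x rank), lora_a is (rank x in).
    """
    scale = alpha / rank
    merged = [row[:] for row in base]  # copy so the base weights stay untouched
    for i in range(len(base)):
        for j in range(len(base[0])):
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            merged[i][j] += scale * delta
    return merged


def fp8_scale(weights):
    """Per-tensor scale that maps the largest |weight| onto FP8_E4M3_MAX."""
    amax = max(abs(v) for row in weights for v in row)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    lora_a = [[1.0, 2.0]]        # rank 1, shape (1 x 2)
    lora_b = [[0.5], [0.25]]     # shape (2 x 1)
    merged = merge_lora(base, lora_a, lora_b, alpha=2.0, rank=1)
    scale = fp8_scale(merged)
    # Values divided by the scale fit in the fp8 range before rounding.
    scaled = [[v / scale for v in row] for row in merged]
    print(merged, scale, scaled)
```

The QLoRA rows add one step in front of this: the quantized base weights are first dequantized to bf16, after which the merge and requantization proceed the same way.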
Test matrix
Qwen3-8B (4x H100):
| Mode | Steps | LR | Schedule | Result |
|---|---|---|---|---|
| Full-weight bf16 | 16 | 1e-5 | constant | 3/3 |
| LoRA rank 32 | 32 | 1e-4 | constant | 3/3 |
| QLoRA nvfp4 | 64 | 5e-5 | cosine | 3/3 |
| QLoRA block_fp8 | 64 | 5e-4 | constant | 3/3 |
Qwen3-Coder-30B-A3B (4-8x H100):
| Mode | Parallelism | Steps | LR | Schedule | Result |
|---|---|---|---|---|---|
| Full-weight bf16 | SP=4, shard=2 | 32 | 1e-5 | constant | 3/3 |
| LoRA rank 32 | SP=4 | 32 | 1e-4 | constant | 3/3 |
| QLoRA nvfp4 | EP=4, SP=4 | 128 | 5e-4 | cosine | 3/3 |
Qwen3-235B-A22B (8x H100, cross-node inference):
| Mode | Parallelism | Steps | LR | Schedule | Result |
|---|---|---|---|---|---|
| QLoRA nvfp4 | EP=8, SP=8 | 128 | 5e-4 | cosine | 3/3 |
| QLoRA nf4 | EP=8, SP=8 | 128 | 5e-4 | cosine | 3/3 |
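
The Result column presumably counts how many of the 3 secret codes the inference endpoint reproduced after the weight sync. A minimal sketch of such a check follows, assuming a verbatim substring match; the pass criteria actually used by `run_password_test.py` may differ, and the project names and codes below are made-up placeholders, not the test's real data.

```python
def score_recall(secrets, responses):
    """Return 'hits/total', counting codes that appear verbatim in their response.

    secrets maps a project name to its code; responses maps the same project
    name to the text returned by the inference endpoint. Hypothetical helper.
    """
    hits = sum(
        1 for name, code in secrets.items() if code in responses.get(name, "")
    )
    return f"{hits}/{len(secrets)}"


if __name__ == "__main__":
    # Placeholder data for illustration only.
    secrets = {"alpha": "CODE-111", "beta": "CODE-222", "gamma": "CODE-333"}
    responses = {
        "alpha": "The code for project alpha is CODE-111.",
        "beta": "CODE-222",
        "gamma": "I believe it is CODE-333.",
    }
    print(score_recall(secrets, responses))
```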
See the example README for the full test matrix and detailed setup instructions.