xorl

High-performance distributed training for LLMs — RL, SFT, MoE, and beyond.

xorl is a distributed training framework for large language models built for flexibility — composable parallelism, LoRA and QLoRA fine-tuning, MoE, and both local and server training modes for online RL loops.

Composable parallelism

Data parallelism (FSDP2), tensor, pipeline, and expert parallelism, plus Ulysses sequence parallelism and Ring Attention — all stackable in any combination of dimensions.
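When parallelism dimensions compose, each GPU's flat global rank maps to one coordinate per dimension of a device mesh. The sketch below illustrates that mapping in plain Python; the mesh shape and dimension names are illustrative assumptions, not xorl's actual API.

```python
# Illustrative only: map a flat global rank to coordinates in a
# multi-dimensional device mesh (row-major, last dimension fastest).
def rank_to_coords(rank, mesh_shape):
    coords = []
    for dim in reversed(mesh_shape):
        coords.append(rank % dim)
        rank //= dim
    return tuple(reversed(coords))

# Hypothetical layout: 16 GPUs split as 2-way data x 2-way pipeline
# x 4-way tensor parallel.
mesh = (2, 2, 4)  # (dp, pp, tp)
```

Ranks 0–3 then share one data-parallel and one pipeline coordinate (a tensor-parallel group), while ranks that agree on the last two coordinates form a data-parallel group.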

Local and server training

Run directly with torchrun for offline training, or use the REST API server for online RL loops with live inference engines.

LoRA and QLoRA

Full LoRA support with QLoRA quantization in NVFP4, Block-FP8, and NF4 formats. Adaptive quantization noise and error correction built in.
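For readers unfamiliar with the mechanics, LoRA leaves the base weight W frozen and learns a low-rank update scaled by alpha/r. The NumPy sketch below follows the shapes and scaling from the original LoRA paper; it is a minimal illustration, not xorl's internal code.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Base projection plus the low-rank LoRA update (alpha/r) * B @ A."""
    r = A.shape[0]  # LoRA rank
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

d_out, d_in, r = 6, 4, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection, zero-init:
                                     # the adapter starts as an exact no-op
x = rng.normal(size=(3, d_in))
```

Because B is zero-initialized, training starts from the base model's behavior; at deployment the update can be folded in as W + (alpha/r) * B @ A with no inference overhead.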

Mixture of Experts

Fused expert kernels via Triton and Quack, expert parallelism over AllToAll or DeepEP (NVLink-optimized), and a routing cache with replay.
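At the heart of any MoE layer is top-k routing: softmax over per-expert logits, keep the k highest-scoring experts per token, and renormalize their weights. The toy NumPy router below shows the idea; xorl's fused Triton kernels perform this (plus token dispatch) on-device, so this is a conceptual sketch only.

```python
import numpy as np

def topk_route(logits, k):
    """Toy top-k MoE router: returns (expert ids, renormalized weights) per token."""
    # Numerically stable softmax over the expert dimension.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    idx = np.argsort(-probs, axis=-1)[:, :k]   # top-k expert ids per token
    w = np.take_along_axis(probs, idx, axis=-1)
    w /= w.sum(axis=-1, keepdims=True)         # renormalize over chosen experts
    return idx, w
```

Under expert parallelism, the `idx` tensor drives the AllToAll (or DeepEP) dispatch that ships each token to the ranks hosting its chosen experts.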

Weight sync

NCCL broadcast from training ranks to SGLang inference endpoints after each step, enabling tight online RL integration.
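The control flow of that sync loop — optimizer step, then push fresh weights to every inference replica so rollouts stay on-policy — can be modeled in a few lines. In xorl the push is an NCCL broadcast into SGLang workers; the classes and names below are illustrative stand-ins, not xorl's API, and a plain dict copy stands in for the collective.

```python
class InferenceReplica:
    """Stand-in for an SGLang inference endpoint receiving broadcast weights."""
    def __init__(self):
        self.weights, self.version = None, -1

    def receive(self, weights, version):
        self.weights, self.version = dict(weights), version

class Trainer:
    """Stand-in for the training ranks; broadcasts after every step."""
    def __init__(self, replicas):
        self.weights, self.step, self.replicas = {"w": 0.0}, 0, replicas

    def train_step(self):
        self.weights["w"] += 0.1          # stand-in for an optimizer update
        self.step += 1
        for r in self.replicas:           # "broadcast" to all inference replicas
            r.receive(self.weights, self.step)
```

The key property for online RL is that every replica's weight version matches the trainer's step counter, so generated rollouts are never more than one step stale.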

Muon optimizer

Newton-Schulz orthogonalized gradient updates for weight tensors with two or more dimensions, combined with Nesterov momentum and configurable learning-rate scaling.
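The orthogonalization step replaces a gradient matrix with an approximately orthogonal one by iterating an odd matrix polynomial. The NumPy sketch below uses the quintic iteration and coefficients from the public Muon reference implementation; treat it as an illustration of the technique, not xorl's kernel.

```python
import numpy as np

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G via a quintic Newton-Schulz iteration.

    Coefficients follow the Muon reference implementation; after a few steps
    the singular values of the result cluster near 1 (not exactly orthogonal).
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)   # Frobenius norm >= spectral norm,
                                        # so all singular values start in (0, 1]
    tall = G.shape[0] > G.shape[1]
    if tall:                            # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if tall else X
```

Because the iteration is a polynomial in X, it acts on each singular value independently, driving them all toward 1 while preserving the singular vectors — the "direction" of the update — which is the property Muon exploits.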