Installation
Requirements
- Python 3.12
- CUDA 12.9+
- PyTorch 2.10+
- NVIDIA Hopper GPU (H100/H800) or newer recommended for NVFP4 and DeepEP
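The version requirements above can be checked before installing anything else. The following is a minimal sketch, not part of the repo: the helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are hypothetical, and `torch.version.cuda` reports the CUDA version the installed PyTorch wheel was built against, which is assumed here to be a reasonable proxy for the CUDA requirement.

```python
import re
import sys
from importlib import metadata


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '2.10.0+cu129' into (2, 10, 0), dropping any suffix after the numeric release."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version: str, minimum: tuple[int, ...]) -> bool:
    """True when the numeric release is at least `minimum`, e.g. (2, 10)."""
    return version_tuple(version) >= minimum


def check_requirements() -> dict[str, bool]:
    """Compare the current environment with the requirements listed above."""
    results = {"python 3.12": sys.version_info[:2] == (3, 12)}
    try:
        results["torch >= 2.10"] = meets_minimum(metadata.version("torch"), (2, 10))
    except metadata.PackageNotFoundError:
        results["torch >= 2.10"] = False
    try:
        import torch  # imported lazily so the script still runs without PyTorch

        cuda = torch.version.cuda  # None on builds without CUDA support
        results["cuda >= 12.9"] = cuda is not None and meets_minimum(cuda, (12, 9))
    except (ImportError, OSError):
        results["cuda >= 12.9"] = False
    return results


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'NOT satisfied'}")
```

The script does not check the GPU generation; Hopper or newer is a recommendation rather than a hard requirement, so it is left out of the pass/fail results.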
Clone the repo
```bash
git clone --recurse-submodules https://github.com/togethercomputer/xorl-internal
cd xorl-internal
```
Already cloned without `--recurse-submodules`? Run:
```bash
git submodule update --init --recursive
```
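To confirm that the submodules were actually checked out, one option is to inspect `git submodule status`, which prefixes each uninitialized submodule with `-`. The sketch below is illustrative only; the function name `uninitialized_submodules` is hypothetical, and the script assumes it is run from the repository root.

```python
import subprocess


def uninitialized_submodules(status_output: str) -> list[str]:
    """Return the paths that `git submodule status` marks with a leading '-'."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Line format for an uninitialized submodule: "-<sha> <path>"
            fields = line[1:].split()
            if len(fields) >= 2:
                paths.append(fields[1])
    return paths


if __name__ == "__main__":
    try:
        proc = subprocess.run(
            ["git", "submodule", "status"], capture_output=True, text=True
        )
    except OSError:
        print("git executable not found")
    else:
        missing = uninitialized_submodules(proc.stdout)
        if missing:
            print("Not initialized:", ", ".join(missing))
            print("Run: git submodule update --init --recursive")
        else:
            print("No uninitialized submodules reported")
```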
Install with uv (recommended)
uv is the recommended package manager for reproducible installs.
```bash
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install and activate
uv sync
source .venv/bin/activate
```
`uv sync` reads `pyproject.toml` and installs all pinned dependencies into a `.venv` virtual environment.
Install with conda
```bash
conda create -n xorl python=3.12
conda activate xorl
pip install -e .
```
Install Submodules
The repo ships two git submodules under `submodules/`:
| Submodule | Description |
|---|---|
| xorl-client | Lightweight Python client for the XoRL training service. Required for server/RL training mode. |
| xorl-sglang | XoRL’s fork of SGLang. Used as the inference engine in online RL loops. |
Install individually:
```bash
pip install -e submodules/xorl-client
pip install -e "submodules/xorl-sglang/python[all]"
```
Alternatively, use the bundled `pyproject.sglang.toml`, which pins PyTorch to 2.9.1 (required by sglang) and installs xorl, xorl-client, and xorl-sglang together:

uv:
```bash
cp pyproject.sglang.toml pyproject.toml
uv sync
source .venv/bin/activate
```
conda:
```bash
conda create -n xorl-sglang python=3.12
conda activate xorl-sglang
cp pyproject.sglang.toml pyproject.toml
pip install -e .
```
Note: The default `pyproject.toml` uses PyTorch 2.10.0. sglang requires PyTorch 2.9.1, so the two cannot coexist in the same environment unless you use `pyproject.sglang.toml`.
These submodules are only needed for server training / online RL. If you are only running local SFT or pretraining, you can skip this step.
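Because the two project files pin different PyTorch releases, the installed PyTorch version indicates which one an environment was built from. The following sketch illustrates that mapping under the pins stated above (2.10.0 for `pyproject.toml`, 2.9.1 for `pyproject.sglang.toml`); the function name `matching_pyproject` is hypothetical and not part of the repo.

```python
from importlib import metadata
from typing import Optional

# Pins as stated in this section; adjust if the project files change.
PINS = {
    "2.10.0": "pyproject.toml",
    "2.9.1": "pyproject.sglang.toml",
}


def matching_pyproject(torch_version: str) -> Optional[str]:
    """Map an installed torch version such as '2.9.1+cu129' to the file that pins it."""
    release = torch_version.split("+", 1)[0]  # drop a local suffix like '+cu129'
    return PINS.get(release)


if __name__ == "__main__":
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch is not installed in this environment")
    else:
        source = matching_pyproject(installed)
        print(f"torch {installed}: {source or 'matches neither pinned release'}")
```

If the result is `pyproject.toml` and you need xorl-sglang, that points to building a separate environment from `pyproject.sglang.toml` rather than installing sglang on top.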
Key Dependencies
| Package | Version | Notes |
|---|---|---|
| PyTorch | 2.10.0+cu129 | CUDA 12.9 build |
| Flash Attention 3 | custom | FA3 + FA4 wheels |
| Triton | 3.6.0 | MoE fused kernels |
| Transformers | 5.0+ | Model loading |
| FastAPI + uvicorn | latest | Server training API |
| pyzmq | latest | Worker communication |
| wandb | latest | Experiment tracking (optional) |
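To compare an environment against this table, the installed versions can be listed with the standard library. This is a minimal sketch: the distribution names below are the usual PyPI names for the listed packages, and Flash Attention is left out because the name of its custom wheel is not stated here.

```python
from importlib import metadata
from typing import Optional

# PyPI distribution names for the packages in the table above.
PACKAGES = ["torch", "triton", "transformers", "fastapi", "uvicorn", "pyzmq", "wandb"]


def installed_versions(names: list[str]) -> dict[str, Optional[str]]:
    """Return {distribution name: installed version, or None if not installed}."""
    versions: dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:<14}{version or 'not installed'}")
```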
Verify Installation
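The one-line import checks in this section can also be combined into a single script that reports every result at once and treats DeepEP as optional. This is a sketch rather than a shipped tool; the module names come from the one-liners in this section, and `CHECKS` and `run_checks` are hypothetical names.

```python
import importlib

# (module, attribute or None, label, required), mirroring the one-line checks.
CHECKS = [
    ("xorl", None, "xorl", True),
    ("flash_attn_interface", None, "flash_attn_3", True),
    ("flash_attn.cute", "flash_attn_func", "flash_attn_4", True),
    ("deep_ep", None, "deepep", False),
]


def run_checks(checks=CHECKS) -> dict[str, str]:
    """Try each import and return {label: status string}."""
    results = {}
    for module_name, attribute, label, required in checks:
        try:
            module = importlib.import_module(module_name)
            if attribute is not None:
                getattr(module, attribute)  # raises AttributeError if absent
            results[label] = "ok"
        except (ImportError, AttributeError, OSError) as exc:
            prefix = "MISSING" if required else "skipped (optional)"
            results[label] = f"{prefix}: {exc}"
    return results


if __name__ == "__main__":
    for label, status in run_checks().items():
        print(f"{label}: {status}")
```

The individual commands are: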
Section titled “Verify Installation”python -c "import xorl; print('xorl ok')"python -c "import flash_attn_interface; print('flash_attn_3 ok')"python -c "from flash_attn.cute import flash_attn_func; print('flash_attn_4 ok')"python -c "import deep_ep; print('deepep ok')" # optionalDeepEP Install (Optional)
DeepEP is an NVLink-optimized MoE dispatch backend. It is only required when using `ep_dispatch: deepep` in your config; the default `ep_dispatch: alltoall` works without it. Install it from https://github.com/deepseek-ai/DeepEP.
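The dependency rule above can be expressed as a small pre-launch check. The sketch assumes the training config has already been loaded into a dictionary with `ep_dispatch` as a top-level key, which may not match the real config layout; `ep_dispatch_problem` is a hypothetical helper name.

```python
from importlib.util import find_spec
from typing import Optional


def ep_dispatch_problem(config: dict) -> Optional[str]:
    """Return an error message if the selected dispatch backend is unavailable."""
    # Assumption: a missing key means the default backend, alltoall.
    backend = config.get("ep_dispatch", "alltoall")
    if backend == "deepep" and find_spec("deep_ep") is None:
        return "ep_dispatch is 'deepep' but the deep_ep module is not installed"
    return None


if __name__ == "__main__":
    problem = ep_dispatch_problem({"ep_dispatch": "deepep"})
    print(problem or "dispatch backend is available")
```

Checking with `find_spec` avoids importing `deep_ep` itself, so the check stays cheap and does not initialize any GPU state.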
Multi-node prerequisites
For multi-node EP, DeepEP uses NVSHMEM for inter-node RDMA. Two additional steps are required on every node.
1. Load nvidia_peermem
nvidia_peermem bridges the NVIDIA driver and the InfiniBand stack to enable GPUDirect RDMA. Without it, NVSHMEM cannot register GPU buffers with IB HCAs and DeepEP will crash with SIGABRT at the first dispatch.
```bash
sudo modprobe nvidia_peermem
```
Verify it is loaded:
```bash
lsmod | grep nvidia_peermem
```
To persist across reboots, add it to `/etc/modules`:
```bash
echo nvidia_peermem | sudo tee -a /etc/modules
```
2. Enable IBGDA in the NVIDIA driver
IBGDA allows NVSHMEM to initiate RDMA transfers directly from GPU SM threads without CPU involvement. Add the following to /etc/modprobe.d/nvidia.conf on every node:
```
options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"
```
Then rebuild the initramfs and reboot:
```bash
sudo update-initramfs -u
sudo reboot
```
Verify the settings are active after reboot:
```bash
sudo cat /proc/driver/nvidia/params | grep -E "EnableStreamMemOPs|RegistryDwords"
# Expected:
# EnableStreamMemOPs: 1
# RegistryDwords: "PeerMappingOverride=1;"
```
Note: `nvidia_peermem` must still be loaded after reboot; it is not automatically enabled by the IBGDA driver settings.
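Since both prerequisites must hold on every node, it can help to script the two checks and run them across the cluster. The sketch below is an illustration under stated assumptions: it reads `/proc/modules` (the file `lsmod` formats) and `/proc/driver/nvidia/params`, expects the parameter format shown in the expected output above, and may need root to read the driver parameters. All function names are hypothetical.

```python
from pathlib import Path


def peermem_loaded(proc_modules: str) -> bool:
    """True if nvidia_peermem appears as a module name in /proc/modules content."""
    return any(
        line.split()[0] == "nvidia_peermem"
        for line in proc_modules.splitlines()
        if line.strip()
    )


def parse_driver_params(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines from /proc/driver/nvidia/params into a dict."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip().strip('"')
    return params


def ibgda_enabled(params: dict[str, str]) -> bool:
    """True if both driver settings from step 2 are active."""
    return (
        params.get("EnableStreamMemOPs") == "1"
        and "PeerMappingOverride=1" in params.get("RegistryDwords", "")
    )


def read_text(path: str) -> str:
    """Read a file, returning '' if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    modules = read_text("/proc/modules")
    params = parse_driver_params(read_text("/proc/driver/nvidia/params"))
    print("nvidia_peermem loaded:", peermem_loaded(modules))
    print("IBGDA driver settings active:", ibgda_enabled(params))
```

A `False` for the driver settings can also mean the file could not be read without root, so rerun with `sudo` before concluding that the options are missing.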
Next Steps
Head to the Quick Start to run your first training job.