
Installation

  • Python 3.12
  • CUDA 12.9+
  • PyTorch 2.10+
  • NVIDIA Hopper GPU (H100/H800) or newer recommended for NVFP4 and DeepEP
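
Whether an installed environment meets these version floors can be checked with a short script. A minimal sketch (the `meets` helper is illustrative, not part of xorl; PyTorch may not be installed yet):

```python
import sys

def meets(version: str, minimum: str) -> bool:
    """True if a dotted version string is at least the given minimum.
    Local build tags such as '+cu129' are ignored."""
    def parse(v: str) -> tuple:
        return tuple(int(p) for p in v.split("+")[0].split(".") if p.isdigit())
    a, b = parse(version), parse(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so '2.10' compares like '2.10.0'.
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))

# Report each requirement; torch may not be installed yet.
py = ".".join(map(str, sys.version_info[:3]))
print(f"Python {py}: {'ok' if meets(py, '3.12') else 'too old'}")
try:
    import torch
    print(f"PyTorch {torch.__version__}: {'ok' if meets(torch.__version__, '2.10') else 'too old'}")
    cuda = torch.version.cuda or "none"
    print(f"CUDA {cuda}: {'ok' if meets(cuda, '12.9') else 'too old'}")
except ImportError:
    print("PyTorch not installed yet")
```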

uv is the recommended package manager for reproducible installs.

Terminal window
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and install
git clone https://github.com/togethercomputer/xorl
cd xorl
uv sync
source .venv/bin/activate

uv sync reads pyproject.toml and installs all pinned dependencies.

Alternatively, install in editable mode with pip:

Terminal window
pip install -e .

Package              Version        Notes
PyTorch              2.10.0+cu129   CUDA 12.9 build
Flash Attention 3    custom         FA3 + FA4 wheels
Triton               3.6.0          MoE fused kernels
Transformers         5.0+           Model loading
FastAPI + uvicorn    latest         Server training API
pyzmq                latest         Worker communication
wandb                latest         Experiment tracking (optional)

Verify the installation:

Terminal window
python -c "import xorl; print('xorl ok')"
python -c "import flash_attn_interface; print('flash_attn_3 ok')"
python -c "from flash_attn.cute import flash_attn_func; print('flash_attn_4 ok')"
python -c "import deep_ep; print('deepep ok')" # optional
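
The four one-liners above can also be consolidated into a single preflight script that reports all checks at once instead of stopping at the first failure. A sketch; the module names match the imports above, but the `CHECKS` layout is illustrative:

```python
import importlib.util

# Modules to probe: (module name, required?)
CHECKS = [
    ("xorl", True),
    ("flash_attn_interface", True),   # Flash Attention 3
    ("flash_attn.cute", True),        # Flash Attention 4
    ("deep_ep", False),               # only needed for ep_dispatch: deepep
]

def probe(name: str) -> bool:
    """True if the module can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Parent package missing (e.g. flash_attn for flash_attn.cute)
        return False

for name, required in CHECKS:
    status = "ok" if probe(name) else ("MISSING" if required else "missing (optional)")
    print(f"{name}: {status}")
```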

DeepEP is an NVLink-optimized MoE dispatch backend. It is only required when using ep_dispatch: deepep in your config — the default ep_dispatch: alltoall works without it. Install it from https://github.com/deepseek-ai/DeepEP.
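
The dispatch backend is selected via the `ep_dispatch` key in your training config. A sketch, assuming a flat YAML layout (only the key name and the two values come from the text above; everything else here is illustrative):

```yaml
# Default: portable all-to-all dispatch, works without DeepEP
ep_dispatch: alltoall

# NVLink/RDMA-optimized dispatch; requires DeepEP installed
# (plus the nvidia_peermem and IBGDA steps below for multi-node EP)
# ep_dispatch: deepep
```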

For multi-node EP, DeepEP uses NVSHMEM for inter-node RDMA. Two additional steps are required on every node.

1. Load nvidia_peermem

nvidia_peermem bridges the NVIDIA driver and the InfiniBand stack to enable GPUDirect RDMA. Without it, NVSHMEM cannot register GPU buffers with IB HCAs and DeepEP will crash with SIGABRT at the first dispatch.

Terminal window
sudo modprobe nvidia_peermem

Verify it is loaded:

Terminal window
lsmod | grep nvidia_peermem

To persist across reboots, add it to /etc/modules:

Terminal window
echo nvidia_peermem | sudo tee -a /etc/modules

2. Enable IBGDA in the NVIDIA driver

IBGDA allows NVSHMEM to initiate RDMA transfers directly from GPU SM threads without CPU involvement. Add the following to /etc/modprobe.d/nvidia.conf on every node:

options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"

Then rebuild the initramfs and reboot:

Terminal window
sudo update-initramfs -u
sudo reboot

Verify the settings are active after reboot:

Terminal window
sudo cat /proc/driver/nvidia/params | grep -E "EnableStreamMemOPs|RegistryDwords"
# Expected:
# EnableStreamMemOPs: 1
# RegistryDwords: "PeerMappingOverride=1;"

Note: nvidia_peermem must still be loaded after reboot — it is not automatically enabled by the IBGDA driver settings.
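
Both post-reboot conditions can be checked in one script. A minimal sketch, assuming the /proc/driver/nvidia/params format shown above and the standard /proc/modules layout (the helper names are illustrative):

```python
import pathlib

def ibgda_ok(params_text: str) -> bool:
    """Check NVIDIA driver params text for the two IBGDA settings."""
    have_memops = False
    have_peermap = False
    for line in params_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "EnableStreamMemOPs" and value == "1":
            have_memops = True
        if key == "RegistryDwords" and "PeerMappingOverride=1" in value:
            have_peermap = True
    return have_memops and have_peermap

def peermem_loaded(modules_text: str) -> bool:
    """Check /proc/modules text for the nvidia_peermem module."""
    return any(line.split()[0] == "nvidia_peermem"
               for line in modules_text.splitlines() if line.strip())

if __name__ == "__main__":
    params = pathlib.Path("/proc/driver/nvidia/params")
    modules = pathlib.Path("/proc/modules")
    print("IBGDA:", "ok" if params.exists() and ibgda_ok(params.read_text()) else "missing")
    print("nvidia_peermem:", "loaded" if modules.exists() and peermem_loaded(modules.read_text()) else "not loaded")
```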

Head to the Quick Start to run your first training job.