
Software Support

Current software support mainly targets AI/ML workloads, and we welcome contributions on testing neutrino with other frameworks and workloads. The support matrix is summarized below:

| Framework | Status |
| --- | --- |
| cuBLAS/cuFFT/cuSparse... | ❌ (no plan to support) |
| CUTLASS | ✅ (with a build-time macro) |
| PyTorch | ✅ (with manual building) |
| JAX | ✅ (with an environment variable) |
| Triton | |
| Taichi | |

Due to the uniqueness of neutrino, some special arrangements may be needed for correct functioning with each framework:

cuBLAS/cuDNN

neutrino does not support these NVIDIA proprietary products for several reasons:

  1. NVIDIA has updated its EULA to restrict decompiling/disassembling these proprietary products.
  2. These proprietary products heavily use dark (undocumented) APIs, which are out of neutrino's scope.
  3. Even if observations were made, developers could not act on them to optimize, as these libraries are closed source.

Unfortunately, not supporting cuBLAS/cuFFT has some drawbacks:

  • PyTorch's nn.Linear and other matmul / conv operations cannot be traced; consider using CUTLASS instead (see the sketch below).
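
To see why this matters, a quick profile shows that nn.Linear dispatches into closed-source cuBLAS kernels rather than PTX that neutrino could trace. The following is a minimal sketch using PyTorch's built-in profiler; the exact kernel names vary by GPU and PyTorch version:

```python
# Hedged sketch: show that nn.Linear lands in cuBLAS kernels
# (e.g., "ampere_sgemm_*"), which neutrino cannot trace.
import torch
from torch.profiler import profile, ProfilerActivity

linear = torch.nn.Linear(512, 512).cuda()
x = torch.randn(256, 512, device="cuda")

with profile(activities=[ProfilerActivity.CUDA]) as prof:
    linear(x)

# The kernel names visible here come from closed-source cuBLAS,
# not from any PTX that PyTorch (or neutrino) can see.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=5))
```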

PyTorch

Support for PyTorch requires modifying a line in its CMakeLists.txt; for simplicity, we provide pre-built wheels hosted on Cloudflare R2:

PYPI

Currently the links are anonymized as we are under Artifact Evaluation.

We are working on maintaining a pip source for better user experience. Stay tuned!

Support for PyTorch requires manual building to keep the PTX assembly in the installation (by default, PyTorch ships only SASS):

  1. Clone PyTorch: `git clone --recursive https://github.com/pytorch/pytorch`; add `--branch` to specify a branch if needed.
  2. Follow the guide to install dependencies.
  3. Query the compute capability via `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`.
  4. Modify the fatbin setting and add NVCC flags in pytorch/CMakeLists.txt; see the code block below.
  5. Follow the guide to build and install PyTorch with +PTX in TORCH_CUDA_ARCH_LIST, e.g., TORCH_CUDA_ARCH_LIST="8.0+PTX".
pytorch/CMakeLists.txt, around line 660:

```diff
- string(APPEND CMAKE_CUDA_FLAGS " -Xfatbin -compress-all")
+ string(APPEND CMAKE_CUDA_FLAGS " -Xfatbin --compress=false")
```
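
After installing the rebuilt wheel, you can sanity-check that the PTX survived the build. Below is a minimal sketch, assuming the CUDA toolkit's `cuobjdump` is on PATH and that the CUDA kernels live in `libtorch_cuda.so` (the library name and path are assumptions, not part of the official guide):

```python
# Hedged sketch: verify the rebuilt PyTorch keeps PTX in its fatbin.
# Assumption: kernels live in torch/lib/libtorch_cuda.so and cuobjdump
# (shipped with the CUDA toolkit) is on PATH.
import os
import subprocess

import torch

lib = os.path.join(os.path.dirname(torch.__file__), "lib", "libtorch_cuda.so")
result = subprocess.run(["cuobjdump", "--list-ptx", lib],
                        capture_output=True, text=True)
# cuobjdump lists embedded PTX entries (e.g., "PTX file 1: ...sm_80.ptx")
print("PTX found" if ".ptx" in result.stdout else "no PTX: rebuild with +PTX")
```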

JAX/XLA

Simply add an environment variable when executing the XLA program:

```bash
XLA_FLAGS=--xla_gpu_generate_line_info='true' python jax_workload.py
```
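
For reference, jax_workload.py can be any JAX program; here is a minimal sketch (the file name is just the placeholder from the command above):

```python
# Hedged sketch of a minimal jax_workload.py: any jitted computation
# works; the XLA_FLAGS setting above makes XLA emit line info alongside
# the generated GPU kernels.
import jax
import jax.numpy as jnp

@jax.jit
def matmul(a, b):
    return a @ b

a = jnp.ones((1024, 1024))
b = jnp.ones((1024, 1024))
print(matmul(a, b).sum())  # triggers compilation and a GPU kernel launch
```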