Neutrino

Limitation and Roadmap

As a newly launched open-source project, Neutrino still faces many critical challenges and limitations compared with mature projects like eBPF, and we welcome talented contributors like you to help address them!

Roadmap

This roadmap tracks milestones that have been achieved and those still planned.

Hook Driver

  • Platform Support
    • NVIDIA CUDA
    • AMD ROCm
    • Apple Metal
    • Intel oneAPI
  • Features
    • Trace file system: event.log, kernel/, result/
    • Internal storage for binaries and functions
    • Benchmark mode to measure
    • Static Datamodel, e.g., warp:16:1 supporting parallel saving
    • Dynamic Datamodel, e.g., warp:16:count supporting runtime determined probe size
    • Callback: callback='dmat.py' supporting runtime trace analysis
    • Platform independence of the trace file system and trace format
    • Support --memusage to measure the maximum memory used in profiling
    • Support --benchmark to measure the effect of probing on kernel performance
    • Support multi-threading, using a mutex for thread safety
    • Separation of storage and probes, allowing multiple probes to share one storage, like an eBPF map
    • Structured kernel metadata via JSON
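The static and dynamic datamodels listed above can be illustrated with a minimal host-side sketch. It assumes, without confirmation from this page, that a spec like warp:16:1 reads as granularity:bytes-per-record:records-per-unit, that count marks a record count only known at runtime, and that a warp has 32 lanes. Datamodel, parse_datamodel and buffer_bytes are hypothetical names, not Neutrino's API.

```python
import math
from dataclasses import dataclass
from typing import Optional

WARP_SIZE = 32  # assumption: NVIDIA warp width; AMD wavefronts can be 64 lanes


@dataclass(frozen=True)
class Datamodel:
    level: str             # "thread" or "warp": which unit owns a buffer slot
    record_bytes: int      # size of one saved record in bytes
    count: Optional[int]   # records per unit; None means decided at runtime ("count")


def parse_datamodel(spec: str) -> Datamodel:
    """Parse a spec such as 'warp:16:1' or 'warp:16:count' (hypothetical format)."""
    level, size, count = spec.split(":")
    if level not in ("thread", "warp"):
        raise ValueError(f"unknown level: {level!r}")
    return Datamodel(level, int(size), None if count == "count" else int(count))


def buffer_bytes(dm: Datamodel, grid: int, block: int,
                 runtime_count: Optional[int] = None) -> int:
    """Bytes needed so every thread/warp of a 1-D launch (grid blocks x block
    threads) writes to its own disjoint slot."""
    if dm.level == "thread":
        units = grid * block
    else:
        units = grid * math.ceil(block / WARP_SIZE)
    count = dm.count if dm.count is not None else runtime_count
    if count is None:
        raise ValueError("dynamic datamodel needs a runtime count")
    return units * count * dm.record_bytes


print(buffer_bytes(parse_datamodel("warp:16:1"), grid=2, block=128))  # 128
```

Because each thread or warp gets a disjoint slot whose size is known before launch, probes can presumably save in parallel without atomics, which is one plausible reading of "supporting parallel saving".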

Probe Engine

  • Platform Support
    • NVIDIA PTX
    • AMD GCNAsm CDNA
    • AMD GCNAsm RDNA
    • Apple AIR
    • Intel VISA
  • Features
    • Reading runtime operands: out, in1, in2, in3, addr
    • Using device-side clock and time
    • Automatically directing each thread/warp to its own buffer region
    • Supporting a count kernel for the dynamic datamodel
    • Collecting assembler metadata such as the number of registers
    • Supporting runtime security verification
    • Supporting kernel filtering (--kernel/--filter)
    • Making core modules platform-agnostic

DSL and JITTER

  • Backend Support
    • NVIDIA PTX
    • AMD GCNAsm CDNA
    • AMD GCNAsm RDNA
    • Apple AIR
    • Intel VISA
  • Features
    • Supporting a Python frontend via the ast module
    • Supporting runtime operands and lowering: out, in1, in2, in3, addr
    • Supporting device-side clock and time: clock(), time(), cuid()
    • Formalizing eBPF-like ISA, separating out from dst
    • Supporting 32-bit registers (e.g., eBPF add32)
    • Porting the eBPF verifier to Neutrino
    • Supporting if/else/elif
    • Supporting for/while under strict security verification
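To illustrate the Python frontend and the 32-bit register items, here is a minimal sketch that uses Python's ast module to lower statements like out = in1 + in2 into an eBPF-style (op, dst, src1, src2) form and then interprets them. It follows eBPF's rule that 32-bit ALU operations such as add32 work on the low 32 bits and zero-extend the result into the 64-bit register. The instruction tuples, lower and run are hypothetical and far simpler than Neutrino's actual DSL.

```python
import ast
from typing import Dict, List, Tuple, Union

Operand = Union[str, int]                 # register name or immediate
Inst = Tuple[str, str, Operand, Operand]  # (op, dst, src1, src2)

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
BINOPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}


def _operand(node: ast.expr) -> Operand:
    """Accept only register names and integer immediates."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    raise SyntaxError(f"unsupported operand: {ast.dump(node)}")


def lower(src: str, use32: bool = False) -> List[Inst]:
    """Lower 'dst = a <op> b' statements into (op, dst, a, b) tuples."""
    insts: List[Inst] = []
    for stmt in ast.parse(src).body:
        ok = (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
              and isinstance(stmt.targets[0], ast.Name)
              and isinstance(stmt.value, ast.BinOp)
              and type(stmt.value.op) in BINOPS)
        if not ok:
            raise SyntaxError(f"unsupported statement: {ast.dump(stmt)}")
        op = BINOPS[type(stmt.value.op)] + ("32" if use32 else "")
        insts.append((op, stmt.targets[0].id,
                      _operand(stmt.value.left), _operand(stmt.value.right)))
    return insts


def run(insts: List[Inst], regs: Dict[str, int]) -> Dict[str, int]:
    """Interpret instructions over 64-bit registers."""
    regs = dict(regs)
    for op, dst, a, b in insts:
        is32 = op.endswith("32")
        base = op[:-2] if is32 else op
        x = regs[a] if isinstance(a, str) else a
        y = regs[b] if isinstance(b, str) else b
        if is32:  # ALU32: operate on the low 32 bits only
            x, y = x & MASK32, y & MASK32
        r = {"add": x + y, "sub": x - y, "mul": x * y}[base]
        # ALU32 results are zero-extended; ALU64 results wrap at 64 bits.
        regs[dst] = r & (MASK32 if is32 else MASK64)
    return regs


print(run(lower("out = in1 + in2", use32=True), {"in1": 0xFFFFFFFF, "in2": 1}))
# {'in1': 4294967295, 'in2': 1, 'out': 0}
```

Rejecting anything outside a tiny whitelist of ast nodes is also the natural place where a verifier-style check would start before loops and branches are admitted.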

Utilities and Extensions

  • Getting a trace directory handle for interoperability with the hook driver
  • Tensor Trace: getting tensor shape and name from PyTorch
  • NVTX-like source annotation API
  • Integration with CUPTI / ROCTracer
  • Integration with PyTorch/JAX built-in profiler

Limitation

Neutrino has two main inherent drawbacks:

  1. Neutrino cannot profile hardware events that are not programmable, such as cache misses, but DMAT can be used to simulate them.
  2. Neutrino profiling is based on execution, so it is hard to profile stall cycles caused by instruction-level scheduling.
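One way a recorded memory trace can stand in for unobservable cache events is to replay it through a software cache model. The minimal sketch below assumes a DMAT trace can be reduced to an ordered list of accessed byte addresses and replays it through a set-associative LRU cache; simulate_lru and its default line size, set count and associativity are illustrative placeholders, not parameters of any real GPU or of DMAT.

```python
from collections import OrderedDict
from typing import Dict, Iterable


def simulate_lru(addresses: Iterable[int], line_bytes: int = 128,
                 num_sets: int = 64, ways: int = 4) -> Dict[str, int]:
    """Count hits and misses of a set-associative LRU cache over a byte-address trace."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for addr in addresses:
        line = addr // line_bytes        # cache-line index
        s = sets[line % num_sets]        # set chosen by the low bits of the line index
        if line in s:
            hits += 1
            s.move_to_end(line)          # mark as most recently used
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)    # evict the least recently used line
            s[line] = None
    return {"hits": hits, "misses": misses}


print(simulate_lru([0, 4, 128, 0]))  # {'hits': 2, 'misses': 2}
```

The result is only as faithful as the cache model, so it estimates miss behavior rather than reproducing hardware counters.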