Limitation and Roadmap
As an early-stage open-source project, Neutrino still faces many critical challenges and limitations compared with mature projects like eBPF, and it welcomes contributions from people like you!
Roadmap
Tracks the milestones already achieved and those still planned.
Hook Driver
- Platform Support
  - NVIDIA CUDA
  - AMD ROCm
  - Apple Metal
  - Intel oneAPI
- Features
  - Trace file system: event.log, kernel/, result/
  - Internal storage for binary and function
  - Benchmark mode to measure
  - Static Datamodel, e.g., warp:16:1, supporting parallel saving
  - Dynamic Datamodel, e.g., warp:16:count, supporting runtime-determined probe size
  - Callback: callback='dmat.py', supporting runtime trace analysis
  - Platform-independence in trace file system and trace format
  - Support --memusage to measure the maximum memory used in profiling
  - Support --benchmark to measure the probing effect on kernel performance
  - Support multi-threading via mutex for thread-safety
  - Isolation of storage and probe, supporting probes sharing one storage like an eBPF map
  - Structured kernel metadata via JSON
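The datamodel strings listed above (warp:16:1 and warp:16:count) can be read as three colon-separated fields. The following is only a minimal sketch of how such a string might be parsed. The field names (level, size, count), the reading of the second field as a per-record size, and the `Datamodel` and `parse_datamodel` helpers are all assumptions made for illustration and are not Neutrino's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Datamodel:
    """Hypothetical container for a 'level:size:count' datamodel string."""
    level: str            # e.g. "warp" (assumed meaning: granularity of saving)
    size: int             # assumed meaning: size of one record
    count: Optional[int]  # fixed number of records, or None if decided at runtime

    @property
    def is_dynamic(self) -> bool:
        # Assumption: a dynamic datamodel is one without a fixed record count.
        return self.count is None


def parse_datamodel(spec: str) -> Datamodel:
    """Parse strings such as 'warp:16:1' or 'warp:16:count'."""
    parts = spec.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'level:size:count', got {spec!r}")
    level, size, count = parts
    # Assumption: the literal word 'count' marks a runtime-determined size.
    return Datamodel(level, int(size), None if count == "count" else int(count))


static_model = parse_datamodel("warp:16:1")
dynamic_model = parse_datamodel("warp:16:count")
print(static_model, static_model.is_dynamic)
print(dynamic_model, dynamic_model.is_dynamic)
```

Keeping the static and dynamic cases in one type means downstream code only needs to branch on `is_dynamic`, which is one possible design rather than a description of how Neutrino handles it.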
Probe Engine
- Platform Support
  - NVIDIA PTX
  - AMD GCNAsm CDNA
  - AMD GCNAsm RDNA
  - Apple AIR
  - Intel VISA
- Features
  - Reading runtime operands: out, in1, in2, in3, addr
  - Using device-side clock and time
  - Automatically directing buffer for thread/warp
  - Supporting count kernel for dynamic datamodel
  - Collecting assembler metadata like no.register
  - Supporting runtime security verification
  - Supporting kernel filtering (--kernel/--filter)
  - Making core modules platform-agnostic
DSL and JITTER
- Backend Support
  - NVIDIA PTX
  - AMD GCNAsm CDNA
  - AMD GCNAsm RDNA
  - Apple AIR
  - Intel VISA
- Features
  - Supporting Python frontend via ast
  - Supporting runtime operands and lowering: out, in1, in2, in3, addr
  - Supporting device-side clock and time: clock(), time(), cuid()
  - Formalizing eBPF-like ISA, separating out from dst
  - Supporting 32-bit registers (eBPF add32, etc.)
  - Migrating eBPF Verifier for Neutrino
  - Supporting if/else/elif
  - Supporting for/while under strict security verification
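To give a feel for what a Python frontend built on the standard `ast` module involves, here is a minimal sketch that parses a probe written as Python source and reports which runtime operands and device-side calls it uses. The operand and call names come from the list above. The sample probe body, the `ProbeScanner` class, and the `scan_probe` helper are hypothetical and are not taken from Neutrino's code base.

```python
import ast

# Names from the roadmap above; treating them as special symbols is an assumption.
OPERANDS = {"out", "in1", "in2", "in3", "addr"}
DEVICE_CALLS = {"clock", "time", "cuid"}

# Hypothetical probe body, used only to exercise the scanner.
PROBE_SOURCE = """
def probe():
    start = clock()
    where = cuid()
    value = in1 + in2
    return addr
"""


class ProbeScanner(ast.NodeVisitor):
    """Collect operand names and device-side calls used by a probe."""

    def __init__(self):
        self.operands = set()
        self.device_calls = set()

    def visit_Name(self, node):
        # Any bare identifier matching an operand name is recorded.
        if node.id in OPERANDS:
            self.operands.add(node.id)
        self.generic_visit(node)

    def visit_Call(self, node):
        # Only direct calls such as clock() are recognized, not attribute calls.
        if isinstance(node.func, ast.Name) and node.func.id in DEVICE_CALLS:
            self.device_calls.add(node.func.id)
        self.generic_visit(node)


def scan_probe(source):
    """Return (sorted operands, sorted device calls) found in the source."""
    scanner = ProbeScanner()
    scanner.visit(ast.parse(source))
    return sorted(scanner.operands), sorted(scanner.device_calls)


operands, calls = scan_probe(PROBE_SOURCE)
print("operands:", operands)
print("device calls:", calls)
```

A real frontend would go on to lower each node into the target ISA; this sketch stops at the analysis step, which is the part `ast` provides out of the box.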
Utilities and Extensions
- Getting trace directory handle for interoperability with hook driver
- Tensor Trace: getting tensor shape and name from PyTorch
- NVTX-like source annotation API
- Integration with CUPTI / ROCTracer
- Integration with PyTorch/JAX built-in profiler
Limitation
Neutrino has two main inherent drawbacks:
- Neutrino cannot profile non-programmable hardware events such as cache misses, although DMAT can be used to simulate them.
- Neutrino profiling is based on execution, so it is hard to profile stall cycles caused by instruction-level scheduling.
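As an illustration of the first limitation, cache misses can be estimated offline by replaying recorded memory addresses through a software cache model. The sketch below uses a simple direct-mapped cache. The input format (a plain list of byte addresses), the `count_cache_misses` helper, and the cache geometry are assumptions chosen for illustration; they do not describe DMAT's actual trace format or any specific GPU cache.

```python
def count_cache_misses(addresses, line_size=128, num_lines=256):
    """Replay byte addresses through a direct-mapped cache and count misses.

    line_size and num_lines are arbitrary illustrative values.
    """
    resident = [None] * num_lines  # cache line number currently held by each slot
    misses = 0
    for address in addresses:
        line = address // line_size  # which memory line the address falls in
        slot = line % num_lines      # direct-mapped: one possible slot per line
        if resident[slot] != line:
            misses += 1
            resident[slot] = line    # evict whatever was there and fill the slot
    return misses


# Hypothetical trace: two lines reused, then a conflicting line evicts line 0.
trace = [0, 4, 128, 0, 128 * 256, 0]
print("simulated misses:", count_cache_misses(trace))
```

Such a model only approximates real hardware, since it ignores associativity, replacement policy, and concurrent accesses from other warps.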