Highlights
Programmability
Fine-Granularity
Versatility
Easy to Use for Performance Engineers.
1
Write the probe.
Define contexted registers, maps, and probes in the tracing DSL.
from neutrino import probe, Map
import neutrino.language as nl

# declare maps via decorated class for persistence
@Map(level="warp", type="array", size=8, cap=1)
class block_sched:
    start: nl.u64

# declare probe registers shared across probes
start: nl.u64 = 0  # starting clock

# declare probe via decorated function
@probe(pos="kernel", level="warp", before=True)
def thread_start():
    start = nl.clock()
2
Run it.
Run your program under Neutrino with a single command.
neutrino -p probe.py python main.py
[info] trace in ./
3
Analyze the Trace.
Read traces easily with auto-generated analysis code.
import struct
from typing import List, NamedTuple
from neutrino import TraceHeader, TraceSection

class block_sched(NamedTuple):
    start: int

def parse(path: str):
    with open(path, "rb") as f:
        # the header is eight 32-bit integers (32 bytes)
        header: TraceHeader = TraceHeader(struct.unpack("iiiiiiii", f.read(32)))
        sections: List[TraceSection] = []
        for _ in range(header.numProbes):
            ...  # remainder not shown
event.log
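The header read in the generated code above can be tried in isolation. The following is a minimal sketch, not Neutrino's own parser: `ToyHeader`, `read_header`, and the field layout are hypothetical. Only the `"iiiiiiii"` format string and the `numProbes` name come from the snippet above, and placing `numProbes` in the last slot is an assumption made purely for illustration. A synthetic header is built in memory so that no real trace file is needed.

```python
import io
import struct
from typing import BinaryIO, NamedTuple, Tuple

HEADER_FORMAT = "iiiiiiii"                    # eight native 32-bit signed ints
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 32 bytes, matching f.read(32)


class ToyHeader(NamedTuple):
    # Hypothetical layout: the position of numProbes is an assumption.
    other_fields: Tuple[int, ...]
    numProbes: int


def read_header(stream: BinaryIO) -> ToyHeader:
    """Read one fixed-size header from a binary stream."""
    raw = stream.read(HEADER_SIZE)
    if len(raw) != HEADER_SIZE:
        raise ValueError("truncated header")
    fields = struct.unpack(HEADER_FORMAT, raw)
    return ToyHeader(other_fields=fields[:7], numProbes=fields[7])


# Build a synthetic header in memory instead of opening a real trace.
fake_trace = io.BytesIO(struct.pack(HEADER_FORMAT, 1, 2, 3, 4, 5, 6, 7, 2))
header = read_header(fake_trace)
print(header.numProbes)
```

Checking the number of bytes read before unpacking turns a short or corrupted file into a clear `ValueError` rather than a `struct.error` deep inside the parser.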
4
Share it.
Share your probes with the community by creating GitHub Issues or Gists.
Compatible with Most Ecosystems.
Hardware Compatibility
Works smoothly on commonly used hardware.
Platform | Support |
---|---|
NVIDIA/CUDA | ✅ Fully Supported |
AMD/ROCm | ✅ Supported on CDNA |
Intel/oneAPI | 🚀 Planning |
More to Come! | Raise a GitHub Issue if you need one! |
Software Compatibility
Integrates seamlessly with existing ecosystems.
Platform | Support |
---|---|
PyTorch (and everything on top) | ✅ Supported (with custom build) |
Triton | ✅ Supported |
JAX | ✅ Supported (with an environment variable) |
More to Come! | Raise a GitHub Issue if you need one! |
Hackable for your needs.
Designed with Extensibility
An approachable framework.
Neutrino consists of three components: Entry & Compiler, Hook Driver, and the Probe Engine. All can be easily extended.
Hook Driver
The hook driver intercepts driver calls (load and launch) to provide runtime support, such as caching the loaded code (assembly).
Probe Engine
The probe engine extracts, prunes, probes, and reassembles the GPU assembly from the hook driver, using the probes supplied by the entry.
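The division of labor between the two components can be pictured with a toy model. The sketch below is not Neutrino's implementation: `ToyHookDriver`, `toy_probe_engine`, and the text "assembly" format (`.entry NAME` ... `ret`) are all hypothetical and exist only to illustrate the flow described above. Code is cached at load time and fetched at launch time, then the engine extracts one kernel, prunes non-instruction lines, inserts the probe at kernel start, and reassembles the result.

```python
import hashlib
from typing import Dict, List


class ToyHookDriver:
    """Toy stand-in for a hook driver: caches code seen at load time."""

    def __init__(self) -> None:
        self._cache: Dict[str, str] = {}

    def on_load(self, assembly: str) -> str:
        # Key the cache by content hash so identical code is stored once.
        key = hashlib.sha256(assembly.encode()).hexdigest()
        self._cache[key] = assembly
        return key

    def on_launch(self, key: str) -> str:
        # At launch, hand the cached assembly to the probe engine.
        return self._cache[key]


def toy_probe_engine(assembly: str, kernel: str, probe_lines: List[str]) -> str:
    lines = assembly.splitlines()
    start = lines.index(f".entry {kernel}")  # hypothetical entry marker
    end = lines.index("ret", start)          # hypothetical kernel end
    body = lines[start + 1 : end]            # extract the kernel body
    body = [ln for ln in body if ln.strip() and not ln.startswith("//")]  # prune
    body = probe_lines + body                # probe: insert at kernel start
    return "\n".join([lines[start], *body, lines[end]])  # reassemble


driver = ToyHookDriver()
key = driver.on_load(".entry add\n// comment\nld a\nadd a, b\nret")
probed = toy_probe_engine(driver.on_launch(key), "add", ["clock start"])
print(probed)
```

Keeping the cache and the rewriting step behind separate interfaces mirrors the stated design goal that each component can be extended on its own.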
Demos Available in a Few Clicks.
Fully Open-Sourced and Evaluated
- Battery guaranteed. Actively maintained, open for contributions.
- Fully open-source. Open source, available on GitHub.
- Truly collaborative. Share your probe via Issues or Gists.