PyTorchSim Tutorial @ ISPASS 2026

Organizers

Gwangsun Kim

POSTECH

Associate Professor

Wonhyuk Yang

POSTECH

Ph.D. Student

Yunseon Shin

POSTECH

Integrated M.S./Ph.D. Student

Okkyun Woo

POSTECH

Integrated M.S./Ph.D. Student

Overview

Deep neural networks (DNNs) are rapidly growing in complexity, placing increasing demand on the performance and efficiency of Neural Processing Units (NPUs). Analytical models are useful for early-stage design exploration, but a cycle-accurate simulator is required to study real execution scenarios and system bottlenecks. However, existing NPU simulators suffer from several fundamental limitations: they lack proper compiler integration, do not support multi-model or multi-core execution, have limited ISA expressiveness, and often support only inference. As a result, they fail to provide a practical environment for full-stack design space exploration.

PyTorchSim addresses these challenges as a PyTorch-integrated NPU simulation framework that bridges machine learning frameworks, compiler flows, and cycle-accurate architecture simulation. It introduces a custom RISC-V-based ISA to support various acceleration units such as systolic arrays, and uses PyTorch 2’s compilation pipeline to lower DNN models from native PyTorch code through MLIR and LLVM into executable machine code. Our extended gem5 and Spike simulators model both the functional behavior and the timing characteristics of NPUs with high fidelity.

To overcome the performance bottleneck of conventional Instruction-Level Simulation (ILS), PyTorchSim additionally provides Tile-Level Simulation (TLS), which achieves far higher simulation speed by reusing offline tile-level latencies while still modeling DRAM and interconnect with cycle accuracy. TLS also extends naturally to sparse tensor operations by incorporating auxiliary per-tile latency obtained offline. Through this approach, PyTorchSim enables fast and accurate simulation of realistic workloads, making it a practical platform for NPU architecture research, HW/SW co-design, compiler optimization, and beyond.
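As a rough illustration of the tile-level idea described above, the following toy sketch looks up each tile's compute latency in a precomputed table instead of simulating its instructions, and charges memory time separately. It is not PyTorchSim's implementation or API: the names `Tile`, `simulate_tiles`, `tile_latency`, `dram_bytes_per_cycle`, and `dram_base_latency` are hypothetical, and the fixed-bandwidth DRAM model is a crude stand-in for the cycle-accurate DRAM and interconnect models that PyTorchSim uses.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    kind: str        # key into the offline latency table (hypothetical naming)
    dram_bytes: int  # bytes this tile must load from DRAM before it can run


def simulate_tiles(tiles, tile_latency, dram_bytes_per_cycle, dram_base_latency):
    """Return the total cycle count for a sequence of tiles.

    Assumptions of this sketch (not taken from PyTorchSim):
      - compute latency per tile comes from an offline table (tile_latency)
      - DRAM serves one load at a time: fixed base latency plus bytes/bandwidth
      - loads may run ahead of compute without any buffer limit
    """
    dram_free = 0  # cycle at which the DRAM channel becomes idle
    core_free = 0  # cycle at which the compute core becomes idle
    for tile in tiles:
        # Ceiling division: cycles needed to stream this tile's data.
        transfer = -(-tile.dram_bytes // dram_bytes_per_cycle)
        load_done = dram_free + dram_base_latency + transfer
        dram_free = load_done
        # Compute starts once the data has arrived and the core is idle.
        start = max(load_done, core_free)
        # Reuse the offline latency instead of simulating instructions.
        core_free = start + tile_latency[tile.kind]
    return core_free


tiles = [Tile("gemm", 1024), Tile("gemm", 1024)]
compute_bound = simulate_tiles(tiles, {"gemm": 100}, 64, 20)
memory_bound = simulate_tiles(tiles, {"gemm": 10}, 64, 20)
print(compute_bound, memory_bound)
```

In the first call the second tile's load hides behind the first tile's compute, so the total is dominated by the table latencies; in the second call the core waits on memory instead. Keeping the memory side as a separate, detailed model is what lets a tile-level approach capture such bottlenecks while skipping per-instruction simulation.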

You can find out more in our paper!

What You Will Learn

Overview of PyTorchSim

Motivation for PyTorchSim development
NPU model
Tile-Level Simulation (TLS)
Compilation flow
Validation and speed results
Future directions

Basic usage of PyTorchSim

Specifying the PyTorchSim NPU device
Mapping strategies
Execution modes in PyTorchSim
TOGSim configuration & log analysis
Compiler optimization
Scheduler & load generator

Internals and how to extend PyTorchSim

PyTorchSim internals
Hands-on: Extending PyTorchSim at the custom instruction level

Schedule

ISPASS 2026 tutorial

- 2026/4/26 13:30 - 17:10 (KST)

Time	Presenter	Session
0:00-0:00	Gwangsun Kim	Introduction to PyTorchSim
0:00-0:00	Yunseon Shin	Hands-on Session I: Basic Usage
0:00-0:00	Wonhyuk Yang	Hands-on Session II: Internals and How to Extend PyTorchSim
0:00-0:00	William J. Song	Introduction to NPUWattch – ML-based PAT Modeling
0:00-0:00	Sehyeon Kim	Hands-on Session – Using and Extending NPUWattch

Resources

PyTorchSim source code on GitHub

https://github.com/PSAL-POSTECH/PyTorchSim

PyTorchSim tutorial slides

https://drive.google.com/drive/folders/1sXfkMVTH1T11v2gQREB5WHIQhnzoloaf?usp=sharing

Setting up the PyTorchSim Tutorial Environment

https://github.com/PSAL-POSTECH/PyTorchSim/tree/ispass2026/tutorial/jupyterhub

Reference

PyTorchSim: A Comprehensive, Fast, and Accurate NPU Simulation Framework

The 58th IEEE/ACM International Symposium on Microarchitecture (MICRO) (Acceptance rate: 20.8%)

Wonhyuk Yang*, Yunseon Shin*, Okkyun Woo*, Geonwoo Park, Hyungkyu Ham, Jeehoon Kang, Jongse Park, Gwangsun Kim (*: co-first authors)
