PyTorchSim Tutorial

Organizers

Gwangsun Kim

POSTECH

Associate Professor

Wonhyuk Yang

POSTECH

Ph.D. Student

Yunseon Shin

POSTECH

Integrated M.S./Ph.D. Student

Okkyun Woo

POSTECH

Integrated M.S./Ph.D. Student

Overview

Deep neural networks are rapidly growing in complexity, placing increasing demands on the performance and efficiency of Neural Processing Units (NPUs). Analytical models are useful for early-stage design exploration, but a cycle-accurate simulator is required to study real execution scenarios and system bottlenecks. Existing NPU simulators, however, suffer from several fundamental limitations: they lack proper compiler integration, do not support multi-model or multi-core execution, have limited ISA expressiveness, and often support only inference. As a result, they fail to provide a practical environment for full-stack design space exploration.

PyTorchSim addresses these challenges as a PyTorch-integrated NPU simulation framework that bridges machine learning frameworks, compiler flows, and cycle-accurate architecture simulation. It introduces a custom RISC-V-based ISA to support various acceleration units such as systolic arrays, and uses PyTorch 2’s compilation pipeline to lower DNN models from native PyTorch code through MLIR and LLVM into executable machine code. Our extended gem5 and Spike simulators are used to model both the functional behavior and the timing characteristics of NPUs with high fidelity.

To overcome the performance bottleneck of conventional Instruction-Level Simulation (ILS), PyTorchSim additionally provides Tile-Level Simulation (TLS), which achieves far higher simulation speed by reusing offline tile-level latencies while still modeling DRAM and interconnect with cycle accuracy. TLS also extends naturally to sparse tensor operations by incorporating auxiliary per-tile latency obtained offline. Through this approach, PyTorchSim enables fast and accurate simulation of realistic workloads, making it a practical platform for NPU architecture research, HW/SW co-design, compiler optimization, and beyond.
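The latency-reuse idea behind TLS can be illustrated with a toy model. Everything here (the function name, the cycle counts, and the single serialized DRAM channel) is an illustrative assumption for exposition, not PyTorchSim's actual implementation:

```python
# Toy sketch of Tile-Level Simulation (TLS): per-tile compute latencies come
# from an offline table instead of being re-simulated instruction by
# instruction, while a shared resource (here, one serialized DRAM channel)
# is still modeled tile by tile. Purely illustrative, not PyTorchSim code.

def tls_latency(num_tiles, compute_cycles, dram_cycles):
    """Total cycles for num_tiles tiles with double-buffered fetch/compute."""
    fetch_done = 0    # when the DRAM channel finishes the current tile's fetch
    compute_done = 0  # when the compute unit finishes the current tile
    for _ in range(num_tiles):
        fetch_done += dram_cycles                        # serialized channel
        compute_done = max(compute_done, fetch_done) + compute_cycles
    return compute_done

# Compute-bound: after the first tile, fetches hide behind compute.
print(tls_latency(num_tiles=4, compute_cycles=100, dram_cycles=40))   # 440
# Memory-bound: every tile waits on DRAM.
print(tls_latency(num_tiles=4, compute_cycles=30, dram_cycles=100))   # 430
```

Because each tile's compute latency is a table lookup, cost grows with the number of tiles rather than the number of instructions, which is where the speedup over instruction-level simulation comes from.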

You can find out more in our paper!

What You Will Learn

Background

  • Complete Overview of PyTorchSim: Architecture, motivation, and design goals

  • PyTorch Compile Pipeline: From PyTorch code → FX graph → MLIR → LLVM → executable ISA

  • NPU Architecture: Understanding TPU-style core design, dataflow, and memory hierarchy
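The front of that compile pipeline can be sketched directly with PyTorch's public API. This toy backend only prints the captured FX graph and falls back to eager execution; PyTorchSim's actual backend, which lowers the graph through MLIR and LLVM to its NPU ISA, is not reproduced here:

```python
import torch

# torch.compile captures Python code into an FX GraphModule and hands it
# to a backend. A lowering backend (like PyTorchSim's) would compile the
# graph; this inspection backend just shows it and runs it unmodified.
def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    print(gm.graph)       # captured FX graph: matmul, relu, output
    return gm.forward     # run eagerly instead of lowering

@torch.compile(backend=inspect_backend)
def tiny_mlp(x, w):
    return torch.relu(x @ w)

out = tiny_mlp(torch.randn(4, 8), torch.randn(8, 2))
print(out.shape)
```

The first call triggers graph capture and prints the FX graph; subsequent calls reuse the compiled artifact.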

Hands-on Implementation

  • Run and Analyze various operators and DNN models using PyTorchSim

  • Hands-on practice with mapping, functional/timing modes, optimizations, schedulers, load generators, and log analysis

  • Extend the Simulator by implementing a new custom instruction for the NPU ISA

Schedule

KSC 2025 tutorial

- 2025/12/18 09:30 - 12:30 (KST)

Time | Presenter | Session
Part 1 | Gwangsun Kim | Introduction to PyTorchSim
Part 2 | Yunseon Shin | Hands-on Session ①: Understanding the overall behavior through basic configuration and examples
Part 3 | Wonhyuk Yang | Hands-on Session ②: Advanced internal architecture and implementation of custom units for extending NPU models

Resources

PyTorchSim source code at GitHub

https://github.com/PSAL-POSTECH/PyTorchSim

JupyterHub for tutorial

https://www.psal.postech.ac.kr/jupyterhub

Docker Image for tutorial

https://github.com/PSAL-POSTECH/PyTorchSim/pkgs/container/torchsim_ksc2025

Reference

Wonhyuk Yang, Yunseon Shin, Okkyun Woo, Geonwoo Park, Hyungkyu Ham, Jeehoon Kang, Jongse Park, and Gwangsun Kim. 2025. PyTorchSim: A Comprehensive, Fast, and Accurate NPU Simulation Framework. In Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture (MICRO '25). Association for Computing Machinery, New York, NY, USA, 1363–1380. https://doi.org/10.1145/3725843.3756045

POSTECH (RIST #4304)
67 Cheongam-Ro, Nam-Gu, Pohang, Gyeongbuk, Korea 37673

Tel: +82 54 279 2912

©2023 by Parallel System Architecture Lab at POSTECH
