Open Source

PyTorchSim: A Comprehensive, Fast, and Accurate NPU Simulation Framework

https://github.com/PSAL-POSTECH/PyTorchSim

PyTorchSim is a fast and cycle-accurate NPU simulation framework with comprehensive feature support:

  • Integrated with PyTorch 2, it can simulate existing PyTorch models by simply designating a simulated NPU as the target device.

  • Provides an NPU-specific compiler backend based on MLIR and LLVM, enabling compiler optimizations and supporting the simulation of both training and inference.

  • Supports multi-core NPU and multi-model tenancy with detailed interconnect and DRAM models (Booksim and Ramulator 2).

  • Can model data-dependent timing behavior, such as that of mixture-of-experts models.

  • Implements a custom RISC-V–based ISA with a rich instruction set to express various operations in AI models.

  • Employs a Tile-Level Simulation technique, which enables fast simulation without loss of accuracy.

  • Validated against Google TPU v3, showing a mean absolute error (MAE) of 11.5%.
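The Tile-Level Simulation idea above can be illustrated with a small sketch (this is conceptual Python, not PyTorchSim's actual code, and the cost model and tile sizes are assumptions): instead of stepping a large GEMM cycle by cycle, the simulator splits it into tiles, computes each tile's latency once, and advances the clock in whole-tile increments, so runtime scales with the number of tiles rather than the number of cycles.

```python
# Conceptual sketch of tile-level simulation (illustrative only; the real
# PyTorchSim cost model, tiling, and ISA-level details differ).

def tile_latency(tile_m, tile_n, tile_k, macs_per_cycle=128):
    # Assumed analytical model: a compute-bound tile on a systolic array,
    # ceil-divided by the array's MACs per cycle.
    return (tile_m * tile_n * tile_k + macs_per_cycle - 1) // macs_per_cycle

def simulate_gemm(M, N, K, tile=32):
    # Accumulate per-tile latencies instead of simulating every cycle.
    cycles = 0
    for m in range(0, M, tile):
        for n in range(0, N, tile):
            for k in range(0, K, tile):
                cycles += tile_latency(min(tile, M - m),
                                       min(tile, N - n),
                                       min(tile, K - k))
    return cycles
```

Because each tile's latency is computed in O(1), the loop visits only (M/tile)·(N/tile)·(K/tile) tiles, yet the accumulated cycle count matches what a per-cycle walk of the same cost model would produce.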

Simulator for Memory-Mapped Near-Data Processing (M²NDP)

https://github.com/PSAL-POSTECH/M2NDP-public

This is a cycle-level simulator developed to model the M²NDP architecture proposed in our MICRO'24 paper, "Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders."
Here are some high-level features of the M²NDP architecture:

  • General-purpose NDP for CXL memory: Enables general-purpose (rather than application-specific) NDP in CXL memory for diverse real-world workloads.

  • Low-overhead offloading: Supports low-overhead NDP offloading and management with M²func (Memory-Mapped function).

  • Cost-effective NDP unit design: The M²μthread (Memory-Mapped μthreading) execution model based on an extended RISC-V vector extension efficiently utilizes resources to maximize concurrency.
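To give a flavor of M²func-style offloading, here is a deliberately simplified sketch (the register names, offsets, and launch protocol are illustrative assumptions, not the simulator's actual interface): the host launches NDP work by performing ordinary stores into a memory-mapped window, avoiding a heavyweight driver or interrupt path.

```python
import mmap
import struct

# Hypothetical M^2func-style launch: the host writes a kernel descriptor
# into a memory-mapped region and rings a doorbell with a final store.
# Offsets and field layout below are invented for illustration.
REG_KERNEL_ID, REG_ARG_PTR, REG_DOORBELL = 0x00, 0x08, 0x10

# Anonymous mapping as a stand-in for a CXL memory-mapped window.
region = mmap.mmap(-1, 4096)

def m2func_launch(kernel_id, arg_ptr):
    struct.pack_into("<Q", region, REG_KERNEL_ID, kernel_id)
    struct.pack_into("<Q", region, REG_ARG_PTR, arg_ptr)
    struct.pack_into("<Q", region, REG_DOORBELL, 1)  # single store triggers NDP

m2func_launch(7, 0xDEAD0000)
```

The point of the sketch is the mechanism, not the layout: because the launch is just a few stores to memory-mapped addresses, offloading overhead stays far below a conventional driver round trip.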

ONNXim: A Fast, Cycle-level Multi-core NPU Simulator

https://github.com/PSAL-POSTECH/ONNXim

ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference. Its features include the following:

  • Faster simulation speed in comparison to other detailed NPU simulation frameworks (see the figure below).

  • Support for modeling multi-core NPUs.

  • Support for cycle-level simulation of memory (through Ramulator) and network-on-chip (through Booksim2), which is important for properly modeling memory-bound operations in deep learning.

  • Use of ONNX graphs as DNN model specifications, enabling simulation of DNNs implemented in different deep learning frameworks (e.g., PyTorch and TensorFlow).

  • Support for language models that do not use ONNX graphs, as well as for auto-regressive generation phases and iteration-level batching.
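The iteration-level batching mentioned above can be sketched as follows (a conceptual toy, not ONNXim's scheduler; the data layout and admission policy are assumptions): rather than waiting for an entire batch of requests to finish, the scheduler re-forms the batch at every generation iteration, so completed requests leave and queued requests join immediately.

```python
from collections import deque

# Toy iteration-level batching scheduler (illustrative only).
# Each request is (request_id, tokens_to_generate); one iteration
# produces one token for every request in the running batch.
def iteration_level_batching(requests, max_batch=4):
    queue = deque(requests)
    running = []
    trace = []  # batch composition at each iteration
    while queue or running:
        # Admit queued requests as soon as a batch slot frees up.
        while queue and len(running) < max_batch:
            running.append(list(queue.popleft()))
        trace.append([rid for rid, _ in running])
        for r in running:
            r[1] -= 1  # one decode iteration per request
        # Finished requests exit the batch immediately.
        running = [r for r in running if r[1] > 0]
    return trace
```

For example, with requests ("a", 2), ("b", 1), ("c", 3) and a batch size of 2, "c" joins the batch in the iteration right after "b" finishes, instead of waiting for "a" to complete as well.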

For more details, please refer to our paper below:

Hyungkyu Ham, Wonhyuk Yang, Yunseon Shin, Okkyun Woo, Guseul Heo, Sangyeop Lee, Jongse Park, Gwangsun Kim, "ONNXim: A Fast, Cycle-level Multi-core NPU Simulator," [ IEEE Xplore ] [ arXiv ]

Simulator for GPUs with Heterogeneous Memory Stack (HMS) [HPCA'24]

https://github.com/PSAL-POSTECH/accelsim_HMS

This repository contains the source code of our modified Accel-sim simulator used in the following work, which proposed the Heterogeneous Memory Stack (HMS):

Jeongmin Hong, Sungjun Cho, Geonwoo Park, Wonhyuk Yang, Young-Ho Gong, and Gwangsun Kim, "Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory," HPCA'24.

POSTECH (RIST #4304)
67 Cheongam-Ro, Nam-Gu, Pohang, Gyeongbuk, Korea 37673

Tel: +82 54 279 2912

©2023 by Parallel System Architecture Lab at POSTECH.