Publications

SwiftCIM: a 55nm 23.2μJ/Token L-0.5 ReRAM Coupled Digital CIM Accelerator with Fully-Fused Multi-Head Attention Dataflow for FlashAttention

Published in ESSERC, 2026

This paper presents a ReRAM-coupled digital CIM accelerator with fully fused multi-head attention dataflow for FlashAttention.

Recommended citation: Kunming Shao, Xiaomeng Wang, and collaborators. SwiftCIM: a 55nm 23.2μJ/Token L-0.5 ReRAM Coupled Digital CIM Accelerator with Fully-Fused Multi-Head Attention Dataflow for FlashAttention. In 2026 IEEE European Solid-State Electronics Research Conference (ESSERC), 2026. https://epapers2.org/esserc2026/ESR/paper_details.php?paper_id=8143

Balancing FP8 Computation Accuracy and Efficiency on Digital CIM via Shift-Aware On-the-fly Aligned-Mantissa Bitwidth Prediction

Published in IEEE TVLSI, 2026

This paper presents shift-aware on-the-fly aligned-mantissa bitwidth prediction for balancing FP8 accuracy and efficiency on digital CIM.

Recommended citation: Liang Zhao, Kunming Shao, Zhipeng Liao, Xijie Huang, Tim Kwang-Ting Cheng, Chi-Ying Tsui, and Yi Zou. Balancing FP8 Computation Accuracy and Efficiency on Digital CIM via Shift-Aware On-the-fly Aligned-Mantissa Bitwidth Prediction. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2026. https://arxiv.org/abs/2602.05743

DS-CIM: Digital Stochastic Computing-In-Memory Featuring Accurate OR-Accumulation via Sample Region Remapping for Edge AI Models

Published in DATE, 2026

This paper presents digital stochastic computing-in-memory with accurate OR-accumulation via sample region remapping for edge AI models.

Recommended citation: Kunming Shao, Liang Zhao, Jiangnan Yu, Zhipeng Liao, Xiaomeng Wang, Yi Zou, Tim Kwang-Ting Cheng, and Chi-Ying Tsui. DS-CIM: Digital Stochastic Computing-In-Memory Featuring Accurate OR-Accumulation via Sample Region Remapping for Edge AI Models. In 2026 Design, Automation & Test in Europe Conference (DATE), 2026. https://arxiv.org/abs/2601.06724

Configurable Dataflow and Adaptive Mapping Optimization for Hybrid ReRAM and SRAM Compute-in-Memory Accelerator

Published in IEEE TCAD, 2026

This paper presents configurable dataflow and adaptive mapping optimization for hybrid ReRAM/SRAM compute-in-memory accelerators.

Recommended citation: Jingyu He, Xiaomeng Wang, Kunming Shao, Kwang-Ting Cheng, and Chi-Ying Tsui. Configurable Dataflow and Adaptive Mapping Optimization for Hybrid ReRAM and SRAM Compute-in-Memory Accelerator. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 45(3):1115-1128, 2026. https://doi.org/10.1109/tcad.2025.3596765

Lemem: A 179.8TFLOPS/W, 24.21TFLOPS Learning-In-Memory Processor with Layer-Fused Forward/Backward Pipeline for Edge DNN/SNN Training/Inference

Published in A-SSCC, 2025

This paper presents a learning-in-memory processor with a layer-fused forward/backward pipeline for edge DNN/SNN training and inference.

Recommended citation: Fengshi Tian, Kunming Shao, Jiakun Zheng, Zilu Liu, Hui Wu, Zhipeng Liao, Jingyu He, Xihao Guan, Pingcheng Dong, Chaoming Fang, Ziyang Shen, Shiqi Zhao, Jie Yang, Mohamad Sawan, Chi-Ying Tsui, and Kwang-Ting Tim Cheng. Lemem: A 179.8TFLOPS/W, 24.21TFLOPS Learning-In-Memory Processor with Layer-Fused Forward/Backward Pipeline for Edge DNN/SNN Training/Inference. In 2025 IEEE Asian Solid-State Circuits Conference (A-SSCC), pages 94-96. IEEE, 2025. https://doi.org/10.1109/a-sscc67472.2025.11349463

A Memory-Efficient Retrieval Architecture for RAG-Enabled Wearable Medical LLMs-Agents

Published in BioCAS, 2025

This paper presents a hierarchical retrieval architecture for RAG-enabled wearable medical LLM agents.

Recommended citation: Zhipeng Liao, Kunming Shao, Jiangnan Yu, Liang Zhao, Tim Kwang-Ting Cheng, Chi-Ying Tsui, Jie Yang, and Mohamad Sawan. A Memory-Efficient Retrieval Architecture for RAG-Enabled Wearable Medical LLMs-Agents. In 2025 IEEE Biomedical Circuits and Systems Conference (BioCAS), pages 66-70. IEEE, 2025. https://doi.org/10.1109/biocas67066.2025.00025

DIRC-RAG: Accelerating Edge RAG with Robust High-Density and High-Loading-Bandwidth Digital In-ReRAM Computation

Published in ISLPED, 2025

This paper presents a high-density digital in-ReRAM computation architecture for edge RAG retrieval acceleration.

Recommended citation: Kunming Shao, Zhipeng Liao, Jiangnan Yu, Liang Zhao, Qiwei Li, Xijie Huang, Jingyu He, Fengshi Tian, Yi Zou, Xiaomeng Wang, Tim Kwang-Ting Cheng, and Chi-Ying Tsui. DIRC-RAG: Accelerating Edge RAG with Robust High-Density and High-Loading-Bandwidth Digital In-ReRAM Computation. In 2025 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), pages 1-7. IEEE, 2025. https://doi.org/10.1109/islped65674.2025.11261807

A Flexible Precision Scaling Deep Neural Network Accelerator with Efficient Weight Combination

Published in ISCAS, 2025

This paper presents a reconfigurable DNN accelerator that supports flexible activation/weight precision scaling with efficient weight combination.

Recommended citation: Liang Zhao, Kunming Shao, Fengshi Tian, Tim Kwang-Ting Cheng, Chi-Ying Tsui, and Yi Zou. A Flexible Precision Scaling Deep Neural Network Accelerator with Efficient Weight Combination. In 2025 IEEE International Symposium on Circuits and Systems (ISCAS), pages 1-5. IEEE, 2025. https://doi.org/10.1109/iscas56072.2025.11043465

E-NPU: A 34~126nJ/Class Event-Driven Adaptive Neural SoC with Signal-Dynamics-Aware Feature Clustering and Multi-Model In-Memory Inference/Training for Personalized Medical Wearables

Published in CICC, 2025

This paper presents an event-driven adaptive neural SoC with signal-dynamics-aware feature clustering and multi-model in-memory inference/training for personalized medical wearables.

Recommended citation: Fengshi Tian, Jinbo Chen, Kunming Shao, Zilu Liu, Jiakun Zheng, Hui Wu, Chaoming Fang, Xiaomeng Wang, Ziyang Shen, Pingcheng Dong, Yuan Yao, Xuliang Wang, Jie Yang, Mohamad Sawan, Chi-Ying Tsui, and Kwang-Ting Cheng. E-NPU: A 34~126nJ/Class Event-Driven Adaptive Neural SoC with Signal-Dynamics-Aware Feature Clustering and Multi-Model In-Memory Inference/Training for Personalized Medical Wearables. In 2025 IEEE Custom Integrated Circuits Conference (CICC), pages 1-3. IEEE, 2025. https://doi.org/10.1109/cicc63670.2025.10982760

SynDCIM: A Performance-Aware Digital Computing-in-Memory Compiler with Multi-Spec-Oriented Subcircuit Synthesis

Published in DATE, 2025

This paper presents a performance-aware DCIM compiler with multi-spec-oriented subcircuit synthesis.

Recommended citation: Kunming Shao, Fengshi Tian, Xiaomeng Wang, Jiakun Zheng, Jia Chen, Jingyu He, Hui Wu, Jinbo Chen, Xihao Guan, Yi Deng, Fengbin Tu, Jie Yang, Mohamad Sawan, Tim Kwang-Ting Cheng, and Chi-Ying Tsui. SynDCIM: A Performance-Aware Digital Computing-in-Memory Compiler with Multi-Spec-Oriented Subcircuit Synthesis. In 2025 Design, Automation & Test in Europe Conference (DATE), pages 1-7. IEEE, 2025. https://doi.org/10.23919/date64628.2025.10992849

ReSCIM: Variation-Resilient High Weight-Loading Bandwidth In-Memory Computation Based on Fine-Grained Hybrid Integration of Multi-Level ReRAM and SRAM Cells

Published in ICCAD, 2024

This paper presents variation-resilient in-memory computation with high weight-loading bandwidth, based on fine-grained hybrid integration of multi-level ReRAM and SRAM cells.

Recommended citation: Xiaomeng Wang, Jingyu He, Kunming Shao, Jiakun Zheng, Fengshi Tian, Tim Kwang-Ting Cheng, and Chi-Ying Tsui. ReSCIM: Variation-Resilient High Weight-Loading Bandwidth In-Memory Computation Based on Fine-Grained Hybrid Integration of Multi-Level ReRAM and SRAM Cells. In Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 1-9. ACM, 2024. https://doi.org/10.1145/3676536.3676751

AutoDCIM: An Automated Digital CIM Compiler

Published in DAC, 2023

This paper presents a spec-to-layout circuit compiler for digital computing-in-memory macros.

Recommended citation: Jia Chen, Fengbin Tu, Kunming Shao, Fengshi Tian, Xiao Huo, Chi-Ying Tsui, and Kwang-Ting Cheng. AutoDCIM: An Automated Digital CIM Compiler. In 2023 60th ACM/IEEE Design Automation Conference (DAC), pages 1-6. IEEE, 2023. https://doi.org/10.1109/dac56929.2023.10247976