Publications

Below is a list of my publications in reverse chronological order.

Journal Papers

StreamNet++: Memory-Efficient Streaming TinyML Model Compilation on Microcontrollers,
Chen-Fong Hsu, Hong-Sheng Zheng, Yu-Yuan Liu, Tsung Tai Yeh
in ACM Transactions on Embedded Computing Systems (TECS), 2024.
ReSA: Reconfigurable Systolic Array for Multiple Tiny DNN Tensors,
Ching-Jui Lee, Tsung Tai Yeh
in ACM Transactions on Architecture and Code Optimization (TACO), 2024.
Pagoda: A GPU Runtime System For Narrow Tasks,
Tsung Tai Yeh, Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann, Timothy G. Rogers,
in ACM Transactions on Parallel Computing (TOPC), Vol. 6, Issue 4, pp. 1-23, 2019.
An Energy-Efficient and Reliable Storage Mechanism for Data-Intensive Academic Archive Systems,
Tseng-Yi Chen, Hsin-Wen Wei, Tsung Tai Yeh, Tsan-Sheng Hsu, Wei-Kuan Shih,
in ACM Transactions on Storage (TOS), Vol. 11(2), pp. 1-21, 2015.

Conference Papers

AQB8: Energy-Efficient Ray Traversal Accelerator Through Hierarchical Quantization,
Yen-Chieh Huang, Chen-Pin Yang, Tsung Tai Yeh
The 52nd ACM/IEEE International Symposium on Computer Architecture (ISCA), 2025
Acceptance rate: 127/570 = 22%
EDA: Energy-Efficient Inter-Layer Model Compilation for Edge DNN Inference Acceleration,
Bo Ren Pao, I-Chia Chen, En-Hao Chang, Tsung Tai Yeh
The 31st IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2025
Acceptance rate: 112/534 = 21%
TinyTS: Memory-Efficient TinyML Model Compiler Framework on Microcontrollers,
Yu-Yuan Liu, Hong-Sheng Zheng, Yu-Fang Hu, Chen-Fong Hsu, Tsung Tai Yeh
The 30th IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2024
Acceptance rate: 75/410 = 18.3%
WER: Maximizing Parallelism of Irregular Graph Applications Through GPU Warp EqualizeR,
En-Ming Huang, Bo-Wun Cheng, Meng-Hsien Lin, Chun-Yi Lee, Tsung Tai Yeh
The 29th Asia and South Pacific Design Automation Conference (ASP-DAC), 2024
Acceptance rate: 140/483 = 29%
StreamNet: Memory-Efficient Streaming Tiny Deep Learning Inference on the Microcontroller,
Hong Sheng Zheng, Yu-Yuan Liu, Chen-Fong Hsu, Tsung Tai Yeh
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023
Acceptance rate: 26.1% [paper]
COLAB: Collaborative and Efficient Processing of Replicated Cache Requests in GPU,
Bo-Wun Cheng, En-Ming Huang, Chen-Hao Chao, Wei-Fang Sun, Tsung-Tai Yeh, Chun-Yi Lee,
The 28th Asia and South Pacific Design Automation Conference (ASP-DAC), 2023
Acceptance rate: 102/328 = 31%
Deadline-Aware Offloading for High-Throughput Accelerators,
Tsung Tai Yeh, Matthew D. Sinclair, Brad Beckmann, Timothy G. Rogers,
The 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021
Acceptance rate: 63/258 = 24.4% [slide] [video]
Dimensionality-Aware Redundant SIMT Instruction Elimination,
Tsung Tai Yeh, Roland Green, Timothy G. Rogers,
in ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2020.
Acceptance rate: 86/476 = 18%
Optimizing GPU Cache Policies for ML Workloads,
Johnathan Alsop, Matthew D. Sinclair, Anthony Gutierrez, Srikant Bharadwaj, Xianwei Zhang, Bradford Beckmann, Alexandru Dutu, Onur Kayiran, Michael LeBeane, Brandon Potter, Sooraj Puthoor, Tsung Tai Yeh,
in IEEE International Symposium on Workload Characterization (IISWC), 2019.
Pagoda: Fine-Grained GPU Resource Virtualization for Narrow Tasks,
Tsung Tai Yeh, Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann, Timothy G. Rogers,
in Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2017.
Acceptance rate: 29/132 = 22%, Best Paper Nominee
CacheRAID: An Efficient Adaptive Write Cache Policy to Conserve RAID Disk Array Energy,
Tseng-Yi Chen, Tsung-Tai Yeh, Hsin-Wen Wei, Yu-Hsun Fang, Wei-Kuan Shih and Tsan-Sheng Hsu,
in Proceedings of the 2012 IEEE/ACM International Conference on Utility and Cloud Computing, 2012.
Efficient Parallel Algorithm for Nonlinear Dimensionality Reduction on GPU,
Tsung Tai Yeh, Tseng-Yi Chen, Yen-Chiu Chen, Wei-Kuan Shih,
in Proceedings of the 2010 IEEE International Conference on Granular Computing, 2010.

Posters

POSTER: Pagoda: A Runtime System to Maximize GPU Utilization in Data Parallel Tasks with Limited Parallelism,
Tsung Tai Yeh, Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann, Timothy G. Rogers,
in Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), 2016.

Patent

LAXITY-AWARE, DYNAMIC PRIORITY VARIATION AT A PROCESSOR,
Tsung Tai Yeh, Bradford M. Beckmann, Sooraj Puthoor, Matthew D. Sinclair,
United States Patent Application Number: 16200503, Nov. 23, 2018.