News Release

An approach for processing compressed recommendation systems on ReRAM chip

Peer-Reviewed Publication

Higher Education Press

Image: The processing flow of ARCHER

Credit: Xinyang SHEN, Xiaofei LIAO, Long ZHENG, Yu HUANG, Dan CHEN, Hai JIN

Random, sparse embedding lookup operations are the main performance bottleneck in processing recommendation systems. ReRAM-based processing-in-memory (PIM) can resolve this problem by processing embedding vectors where they are stored. However, the embedding table can easily exceed the capacity of a monolithic ReRAM-based PIM chip.
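As a rough illustration of why these lookups stress conventional memory, the sketch below gathers a few randomly scattered rows from a table and sums them. The toy table size, the `embedding_bag` name, and the use of sum pooling are assumptions made for illustration; none of them comes from the paper.

```python
import random

def embedding_bag(table, indices):
    """Gather the rows named by `indices` and sum them element-wise."""
    dim = len(table[0])
    pooled = [0.0] * dim
    for idx in indices:              # each idx touches a different, unpredictable row
        row = table[idx]
        for d in range(dim):
            pooled[d] += row[d]
    return pooled

rng = random.Random(0)
num_rows, dim = 1000, 4              # toy sizes; production tables are far larger
table = [[rng.random() for _ in range(dim)] for _ in range(num_rows)]
sparse_ids = rng.sample(range(num_rows), 3)   # a few rows out of many
pooled = embedding_bag(table, sparse_ids)
```

Only three of the thousand rows are read, and which three is not known in advance, so most of the work is moving data rather than computing on it.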
To address this problem, a research team led by Hai Jin published their new research on 15 October 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team deploys a compressed, decomposed model on-chip and leverages the high computing efficiency of ReRAM to compensate for the performance lost to decompression. They propose ARCHER, a ReRAM-based PIM architecture that implements fully on-chip recommendations under resource constraints.
The team analyzes the access and computation patterns of decompression. Based on these observations, the operations of each layer of the decomposed model are unified into multiply-and-accumulate operations, and a hierarchical mapping scheme is proposed to maximize resource utilization. Under this unified computation and mapping strategy, the team coordinates the processing pipeline. Experimental results show that ARCHER can support large practical recommendation models on a monolithic ReRAM chip while surpassing existing solutions in performance and energy savings.
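To make the idea of unifying every step into multiply-and-accumulate operations more concrete, here is a minimal sketch. It assumes a simple two-factor low-rank decomposition as a stand-in, since the release does not say which decomposition ARCHER uses; `vec_mat`, `one_hot`, `decomposed_lookup`, and the factor values are all hypothetical.

```python
def vec_mat(vec, mat):
    """Vector-matrix product built only from multiply-and-accumulate steps."""
    cols = len(mat[0])
    out = [0] * cols
    for r, v in enumerate(vec):
        for c in range(cols):
            out[c] += v * mat[r][c]   # one multiply-and-accumulate
    return out

def one_hot(index, size):
    """Vector of zeros with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]

def decomposed_lookup(index, factor_a, factor_b):
    # Step 1: selecting a row is itself a MAC with a one-hot input vector.
    compact = vec_mat(one_hot(index, len(factor_a)), factor_a)
    # Step 2: decompression expands the compact row to the full embedding.
    return vec_mat(compact, factor_b)

# Hypothetical factors: the full table (4 x 3) is never stored.
factor_a = [[1, 0], [0, 1], [1, 1], [2, 1]]   # 4 rows, rank 2
factor_b = [[1, 2, 3], [4, 5, 6]]             # rank 2, embedding dimension 3
embedding = decomposed_lookup(2, factor_a, factor_b)
```

In this sketch both the row selection and the decompression are vector-matrix products, the kind of operation a ReRAM crossbar computes in place. A two-factor scheme stores N·R + R·D values instead of N·D, which only pays off when the rank R is much smaller than both dimensions; the toy sizes above are too small to show a saving.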
DOI: 10.1007/s11704-023-3397-x

