Random and sparse embedding lookup operations are the main performance bottleneck in recommendation systems. ReRAM-based processing-in-memory (PIM) can resolve this problem by processing embedding vectors where they are stored. However, the embedding table can easily exceed the capacity limit of a monolithic ReRAM-based PIM chip.
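To see why the lookup is the bottleneck, consider a minimal sketch of sparse embedding access. The table sizes and IDs below are illustrative assumptions, not figures from the paper: each query gathers a few random rows from a very large table, so the work is memory-bound and irregular rather than compute-bound.

```python
import numpy as np

# Hypothetical sizes for illustration (not from the paper).
NUM_IDS = 1_000_000   # rows in one embedding table
DIM = 64              # embedding vector length

rng = np.random.default_rng(0)
table = rng.standard_normal((NUM_IDS, DIM)).astype(np.float32)

# A user interacts with only a handful of items, so the lookup
# touches a few scattered rows of a huge table: sparse, random access.
ids = np.array([3, 912_455, 17, 500_000])
vectors = table[ids]            # gather: the memory-bound step
pooled = vectors.sum(axis=0)    # typical sum-pooling of the gathered rows

print(pooled.shape)  # (64,)
```

Because the gathered rows are scattered across memory, moving them to the processor dominates the cost, which is exactly what PIM avoids by computing where the rows reside.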
To solve this problem, a research team led by Hai Jin published new research on 15 October 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
In the paper, the team proposes ARCHER, a ReRAM-based PIM architecture that implements fully on-chip recommendation under resource constraints. ARCHER deploys the decomposed model on-chip and leverages the high computing efficiency of ReRAM to compensate for the performance loss of decompression.
The team analyzes the access and computation patterns of decompression. Based on this analysis, the operations of each layer of the decomposed model are unified into multiply-and-accumulate operations, and a hierarchical mapping scheme is proposed to maximize resource utilization. On top of the unified computation and mapping strategy, the team coordinates the processing pipeline. Experimental results show that ARCHER can support large practical recommendation models on a monolithic ReRAM chip while surpassing existing solutions in performance and energy savings.
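The idea of unifying decompression into multiply-and-accumulate work can be sketched with a generic low-rank factorization. The paper's actual decomposition method is not detailed in this release, so the rank, sizes, and factorization below are purely illustrative assumptions: the compressed table is stored as two small factors, and reconstructing the requested embedding rows becomes a matrix multiply, i.e., the multiply-and-accumulate pattern that ReRAM crossbars execute efficiently in place.

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_IDS, DIM, RANK = 10_000, 64, 8   # illustrative sizes, not from the paper

# Low-rank factors standing in for a compressed embedding table:
# full_table ~= A @ B, storing NUM_IDS*RANK + RANK*DIM values
# instead of NUM_IDS*DIM.
A = rng.standard_normal((NUM_IDS, RANK)).astype(np.float32)
B = rng.standard_normal((RANK, DIM)).astype(np.float32)

ids = np.array([5, 42, 9_999])

# "Decompression" of the requested rows is just a matrix multiply,
# i.e., pure multiply-and-accumulate work suited to a ReRAM crossbar.
vectors = A[ids] @ B

print(vectors.shape)  # (3, 64)
```

Under this view, every layer of the decomposed model reduces to the same primitive, which is what makes a single hierarchical mapping onto crossbar arrays possible.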
DOI: 10.1007/s11704-023-3397-x
Journal
Frontiers of Computer Science
Method of Research
Experimental study
Subject of Research
Not applicable
Article Title
ARCHER: a ReRAM-based accelerator for compressed recommendation systems
Article Publication Date
15-Oct-2024