Dong et al., 2025 - Google Patents
Topkima-Former: Low-Energy, Low-Latency Inference for Transformers Using Top-k In-Memory ADC
- Document ID
- 535558286133078898
- Author
- Dong S
- Yang J
- Peng X
- Shang H
- Ke Y
- Yang X
- Liu H
- Basu A
- Publication year
- 2025
- Publication venue
- IEEE Transactions on Circuits and Systems I: Regular Papers
Snippet
Transformers have emerged as a leading architecture in natural language processing (NLP) and computer vision (CV). However, the extensive use of nonlinear operations, such as softmax, poses a performance bottleneck during transformer inference and comprises up to 40% of …
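The full paper is not reproduced here, but the "Top-k" idea in the title can be illustrated generically: rather than evaluating softmax over every attention score, only the k largest scores per query are kept and normalised, which reduces the number of exponentials and the amount of data that peripheral circuitry (such as ADCs) must digitise. The sketch below is a minimal NumPy illustration of that generic top-k softmax approximation, not the authors' circuit-level method; the function name `topk_softmax` and the choice k=5 are illustrative assumptions.

```python
import numpy as np

def topk_softmax(scores, k):
    """Softmax restricted to the k largest entries of each row.

    Scores outside the top-k are assigned zero probability, so only k
    exponentials and one normalisation are needed per row instead of
    one exponential per score.
    """
    n_rows, _ = scores.shape
    out = np.zeros_like(scores, dtype=np.float64)
    # Indices of the k largest scores in each row (order within the k does not matter).
    topk_idx = np.argpartition(scores, -k, axis=1)[:, -k:]
    rows = np.arange(n_rows)[:, None]
    topk_vals = scores[rows, topk_idx]
    # Numerically stable softmax over the selected scores only.
    shifted = topk_vals - topk_vals.max(axis=1, keepdims=True)
    exp_vals = np.exp(shifted)
    out[rows, topk_idx] = exp_vals / exp_vals.sum(axis=1, keepdims=True)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attn_scores = rng.normal(size=(4, 16))  # e.g. one head: 4 queries x 16 keys
    probs = topk_softmax(attn_scores, k=5)
    print(probs.sum(axis=1))  # each row sums to 1, with at most 5 nonzero entries
```

Usage note: in a transformer attention layer this would replace the dense softmax over the score matrix QK^T, trading a small accuracy loss for fewer nonlinear evaluations; how the top-k selection is realised inside an in-memory ADC is specific to the paper and not modelled here.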
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/401—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
- G11C11/406—Management or control of the refreshing or charge-regeneration cycles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/58—Random or pseudo-random number generators
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/56—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using storage elements with more than two stable states represented by steps, e.g. of voltage, current, phase, frequency
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Jhang et al. | Challenges and trends of SRAM-based computing-in-memory for AI edge devices | |
| Jiang et al. | C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism | |
| Imani et al. | Searchd: A memory-centric hyperdimensional computing with stochastic training | |
| Long et al. | ReRAM-based processing-in-memory architecture for recurrent neural network acceleration | |
| Marinella et al. | Multiscale co-design analysis of energy, latency, area, and accuracy of a ReRAM analog neural training accelerator | |
| Kang et al. | An in-memory VLSI architecture for convolutional neural networks | |
| Yue et al. | STICKER-IM: A 65 nm computing-in-memory NN processor using block-wise sparsity optimization and inter/intra-macro data reuse | |
| Jain et al. | TiM-DNN: Ternary in-memory accelerator for deep neural networks | |
| Giacomin et al. | A robust digital RRAM-based convolutional block for low-power image processing and learning applications | |
| Wolters et al. | Memory is all you need: An overview of compute-in-memory architectures for accelerating large language model inference | |
| Kang et al. | Deep in-memory architectures in SRAM: An analog approach to approximate computing | |
| US11281429B2 (en) | Ternary in-memory accelerator | |
| Cao et al. | Neural-PIM: Efficient processing-in-memory with neural approximation of peripherals | |
| US12182690B2 (en) | MTJ-based hardware synapse implementation for binary and ternary deep neural networks | |
| Luo et al. | AILC: Accelerate on-chip incremental learning with compute-in-memory technology | |
| Agrawal et al. | CASH-RAM: Enabling in-memory computations for edge inference using charge accumulation and sharing in standard 8T-SRAM arrays | |
| Cheon et al. | A 2941-TOPS/W charge-domain 10T SRAM compute-in-memory for ternary neural network | |
| Lu et al. | An RRAM-based computing-in-memory architecture and its application in accelerating transformer inference | |
| Alam et al. | Exact stochastic computing multiplication in memristive memory | |
| Zheng et al. | Accelerating sparse attention with a reconfigurable non-volatile processing-in-memory architecture | |
| Lou et al. | An energy efficient all-digital time-domain compute-in-memory macro optimized for binary neural networks | |
| Diao et al. | A multiply-less approximate sram compute-in-memory macro for neural-network inference | |
| Song et al. | Xpikeformer: Hybrid analog-digital hardware acceleration for spiking transformers | |
| Kang et al. | Deep in-memory architectures for machine learning | |
| Fu et al. | Probabilistic compute-in-memory design for efficient Markov chain Monte Carlo sampling |