Dong et al., 2025 - Google Patents
Topkima-Former: Low-Energy, Low-Latency Inference for Transformers Using Top-k In-Memory ADC
- Document ID
- 535558286133078898
- Author
- Dong S
- Yang J
- Peng X
- Shang H
- Ke Y
- Yang X
- Liu H
- Basu A
- Publication year
- 2025
- Publication venue
- IEEE Transactions on Circuits and Systems I: Regular Papers
Snippet
Transformers have emerged as a leading architecture in natural language processing (NLP) and computer vision (CV). However, the extensive use of nonlinear operations, such as softmax, poses a performance bottleneck during transformer inference and comprises up to 40% of …
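The full paper is not reproduced here, but the "Top-k" idea in the title can be illustrated generically: rather than evaluating softmax over every attention score, only the k largest scores per query are kept and normalised, which reduces the number of exponentials and the amount of data that peripheral circuitry (such as ADCs) must digitise. The sketch below is a minimal NumPy illustration of that generic top-k softmax approximation, not the authors' circuit-level method; the function name `topk_softmax` and the choice k=5 are illustrative assumptions.

```python
import numpy as np

def topk_softmax(scores, k):
    """Softmax restricted to the k largest entries of each row.

    Scores outside the top-k are assigned zero probability, so only k
    exponentials and one normalisation are needed per row instead of
    one exponential per score.
    """
    n_rows, _ = scores.shape
    out = np.zeros_like(scores, dtype=np.float64)
    # Indices of the k largest scores in each row (order within the k does not matter).
    topk_idx = np.argpartition(scores, -k, axis=1)[:, -k:]
    rows = np.arange(n_rows)[:, None]
    topk_vals = scores[rows, topk_idx]
    # Numerically stable softmax over the selected scores only.
    shifted = topk_vals - topk_vals.max(axis=1, keepdims=True)
    exp_vals = np.exp(shifted)
    out[rows, topk_idx] = exp_vals / exp_vals.sum(axis=1, keepdims=True)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attn_scores = rng.normal(size=(4, 16))  # e.g. one head: 4 queries x 16 keys
    probs = topk_softmax(attn_scores, k=5)
    print(probs.sum(axis=1))  # each row sums to 1, with at most 5 nonzero entries
```

Usage note: in a transformer attention layer this would replace the dense softmax over the score matrix QK^T, trading a small accuracy loss for fewer nonlinear evaluations; how the top-k selection is realised inside an in-memory ADC is specific to the paper and not modelled here.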
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/401—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
- G11C11/406—Management or control of the refreshing or charge-regeneration cycles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/58—Random or pseudo-random number generators
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/56—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using storage elements with more than two stable states represented by steps, e.g. of voltage, current, phase, frequency
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Jhang et al. | Challenges and trends of SRAM-based computing-in-memory for AI edge devices | |
| Jiang et al. | C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism | |
| Imani et al. | Searchd: A memory-centric hyperdimensional computing with stochastic training | |
| Long et al. | ReRAM-based processing-in-memory architecture for recurrent neural network acceleration | |
| Marinella et al. | Multiscale co-design analysis of energy, latency, area, and accuracy of a ReRAM analog neural training accelerator | |
| Kang et al. | An in-memory VLSI architecture for convolutional neural networks | |
| Yue et al. | STICKER-IM: A 65 nm computing-in-memory NN processor using block-wise sparsity optimization and inter/intra-macro data reuse | |
| Jain et al. | TiM-DNN: Ternary in-memory accelerator for deep neural networks | |
| Giacomin et al. | A robust digital RRAM-based convolutional block for low-power image processing and learning applications | |
| Wolters et al. | Memory is all you need: An overview of compute-in-memory architectures for accelerating large language model inference | |
| Kang et al. | Deep in-memory architectures in SRAM: An analog approach to approximate computing | |
| US11281429B2 (en) | Ternary in-memory accelerator | |
| Cao et al. | Neural-PIM: Efficient processing-in-memory with neural approximation of peripherals | |
| US12182690B2 (en) | MTJ-based hardware synapse implementation for binary and ternary deep neural networks | |
| Luo et al. | AILC: Accelerate on-chip incremental learning with compute-in-memory technology | |
| Agrawal et al. | CASH-RAM: Enabling in-memory computations for edge inference using charge accumulation and sharing in standard 8T-SRAM arrays | |
| Cheon et al. | A 2941-TOPS/W charge-domain 10T SRAM compute-in-memory for ternary neural network | |
| Lu et al. | An RRAM-based computing-in-memory architecture and its application in accelerating transformer inference | |
| Alam et al. | Exact stochastic computing multiplication in memristive memory | |
| Zheng et al. | Accelerating sparse attention with a reconfigurable non-volatile processing-in-memory architecture | |
| Lou et al. | An energy efficient all-digital time-domain compute-in-memory macro optimized for binary neural networks | |
| Diao et al. | A multiply-less approximate sram compute-in-memory macro for neural-network inference | |
| Song et al. | Xpikeformer: Hybrid analog-digital hardware acceleration for spiking transformers | |
| Kang et al. | Deep in-memory architectures for machine learning | |
| Fu et al. | Probabilistic compute-in-memory design for efficient Markov chain Monte Carlo sampling |