Topkima-Former: Low-Energy, Low-Latency Inference for Transformers Using Top-k In-Memory ADC

Dong et al., 2025

Document ID
535558286133078898
Author
Dong S
Yang J
Peng X
Shang H
Ke Y
Yang X
Liu H
Basu A
Publication year
2025
Publication venue
IEEE Transactions on Circuits and Systems I: Regular Papers

Snippet

Transformer has emerged as a leading architecture in natural language processing (NLP) and computer vision (CV). However, the extensive use of nonlinear operations, like softmax, poses a performance bottleneck during transformer inference and comprises up to 40% of …
Continue reading at arxiv.org (PDF)
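
The snippet gestures at the paper's core trick: if only the k largest attention scores in a row meaningfully contribute, softmax can be evaluated over those k values instead of the full sequence length, with the in-memory ADC delivering the top-k selection during readout. Below is a minimal NumPy sketch of such a top-k softmax, purely as a software analogue; the function name, the choice of k, and the example scores are illustrative assumptions, not code or parameters from the paper.

import numpy as np

def topk_softmax(scores: np.ndarray, k: int) -> np.ndarray:
    # Software analogue of top-k softmax: only the k largest scores in
    # each row get nonzero probability, so exp() and the normalizing sum
    # run over k terms instead of the full sequence length. Illustrative
    # sketch only; not the paper's hardware mechanism or reference code.
    idx = np.argpartition(scores, -k, axis=-1)[..., -k:]   # indices of the k largest
    top = np.take_along_axis(scores, idx, axis=-1)
    top = np.exp(top - top.max(axis=-1, keepdims=True))    # numerically stable softmax
    top /= top.sum(axis=-1, keepdims=True)
    out = np.zeros_like(scores)
    np.put_along_axis(out, idx, top, axis=-1)              # scatter probabilities back
    return out

# One query attending over 8 keys, keeping only the 3 largest scores.
attn_scores = np.array([[0.1, 2.3, -1.0, 0.7, 3.1, 0.2, -0.5, 1.9]])
print(topk_softmax(attn_scores, k=3))

For sequence length n this shrinks the exponentials and the normalizing sum from n terms to k; the paper's contribution, per its title, is obtaining the selection inside the in-memory ADC rather than via an explicit sort over digital outputs.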

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50 Computer-aided design
    • G06F17/5009 Computer-aided design using simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G06F7/53 Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/21 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
    • G11C11/34 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
    • G11C11/40 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
    • G11C11/401 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
    • G11C11/406 Management or control of the refreshing or charge-regeneration cycles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/58 Random or pseudo-random number generators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored programme computers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G06N99/005 Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F1/00 Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/56 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using storage elements with more than two stable states represented by steps, e.g. of voltage, current, phase, frequency

Similar Documents

Jhang et al. Challenges and trends of SRAM-based computing-in-memory for AI edge devices
Jiang et al. C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism
Imani et al. SearcHD: A memory-centric hyperdimensional computing with stochastic training
Long et al. ReRAM-based processing-in-memory architecture for recurrent neural network acceleration
Marinella et al. Multiscale co-design analysis of energy, latency, area, and accuracy of a ReRAM analog neural training accelerator
Kang et al. An in-memory VLSI architecture for convolutional neural networks
Yue et al. STICKER-IM: A 65 nm computing-in-memory NN processor using block-wise sparsity optimization and inter/intra-macro data reuse
Jain et al. TiM-DNN: Ternary in-memory accelerator for deep neural networks
Giacomin et al. A robust digital RRAM-based convolutional block for low-power image processing and learning applications
Wolters et al. Memory is all you need: An overview of compute-in-memory architectures for accelerating large language model inference
Kang et al. Deep in-memory architectures in SRAM: An analog approach to approximate computing
US11281429B2 (en) Ternary in-memory accelerator
Cao et al. Neural-PIM: Efficient processing-in-memory with neural approximation of peripherals
US12182690B2 (en) MTJ-based hardware synapse implementation for binary and ternary deep neural networks
Luo et al. AILC: Accelerate on-chip incremental learning with compute-in-memory technology
Agrawal et al. CASH-RAM: Enabling in-memory computations for edge inference using charge accumulation and sharing in standard 8T-SRAM arrays
Cheon et al. A 2941-TOPS/W charge-domain 10T SRAM compute-in-memory for ternary neural network
Lu et al. An RRAM-based computing-in-memory architecture and its application in accelerating transformer inference
Alam et al. Exact stochastic computing multiplication in memristive memory
Zheng et al. Accelerating sparse attention with a reconfigurable non-volatile processing-in-memory architecture
Lou et al. An energy efficient all-digital time-domain compute-in-memory macro optimized for binary neural networks
Diao et al. A multiply-less approximate SRAM compute-in-memory macro for neural-network inference
Song et al. Xpikeformer: Hybrid analog-digital hardware acceleration for spiking transformers
Kang et al. Deep in-memory architectures for machine learning
Fu et al. Probabilistic compute-in-memory design for efficient Markov chain Monte Carlo sampling