Wang et al., 2016 - Google Patents

Re-architecting the on-chip memory sub-system of machine-learning accelerator for embedded devices


Document ID
4949779828103168684
Author
Wang Y
Li H
Li X
Publication year
2016
Publication venue
Proceedings of the 35th International Conference on Computer-Aided Design

Snippet

The rapid development of deep learning is enabling a variety of novel applications, such as image and speech recognition, for embedded systems, robotics, and smart wearable devices. However, typical deep learning models, like deep convolutional neural networks (CNNs), …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/30 Arrangements for executing machine-instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50 Computer-aided design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F1/00 Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass

Similar Documents

Wang et al. Re-architecting the on-chip memory sub-system of machine-learning accelerator for embedded devices
Wang et al. Via: A novel vision-transformer accelerator based on fpga
Kim et al. Geniehd: Efficient dna pattern matching accelerator using hyperdimensional computing
Park et al. A multi-mode 8k-MAC HW-utilization-aware neural processing unit with a unified multi-precision datapath in 4-nm flagship mobile SoC
Zhou et al. Energon: Toward efficient acceleration of transformers using dynamic sparse attention
Yap et al. Fixed point implementation of tiny-yolo-v2 using opencl on fpga
Li et al. Accelerating binarized neural networks via bit-tensor-cores in turing gpus
Hojabr et al. SkippyNN: An embedded stochastic-computing accelerator for convolutional neural networks
Ye et al. Accelerating attention mechanism on fpgas based on efficient reconfigurable systolic array
Wang et al. A case of on-chip memory subsystem design for low-power CNN accelerators
Ma et al. Algorithm-hardware co-design of single shot detector for fast object detection on FPGAs
Wang et al. Real-time meets approximate computing: An elastic CNN inference accelerator with adaptive trade-off between QoS and QoR
Lee et al. Anna: Specialized architecture for approximate nearest neighbor search
Imani et al. Digitalpim: Digital-based processing in-memory for big data acceleration
Lo et al. Energy efficient fixed-point inference system of convolutional neural network
US9626334B2 (en) Systems, apparatuses, and methods for K nearest neighbor search
Kobayashi et al. A high performance FPGA-based sorting accelerator with a data compression mechanism
Fu et al. SoftAct: A high-precision softmax architecture for transformers supporting nonlinear functions
Wang et al. Bsvit: A bit-serial vision transformer accelerator exploiting dynamic patch and weight bit-group quantization
Pan et al. BitSET: Bit-serial early termination for computation reduction in convolutional neural networks
Qin et al. Enhancing long sequence input processing in fpga-based transformer accelerators through attention fusion
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
Chung et al. Tightly coupled machine learning coprocessor architecture with analog in-memory computing for instruction-level acceleration
Hsiao et al. Sparsity-aware deep learning accelerator design supporting CNN and LSTM operations
de Moura et al. Data and computation reuse in CNNs using memristor TCAMs