Wang et al., 2016 - Google Patents
Re-architecting the on-chip memory sub-system of machine-learning accelerator for embedded devices
- Document ID
- 4949779828103168684
- Authors
- Wang Y
- Li H
- Li X
- Publication year
- 2016
- Publication venue
- Proceedings of the 35th International Conference on Computer-Aided Design
Snippet
The rapid development of deep learning is enabling a wealth of novel applications, such as image and speech recognition, for embedded systems, robotics, and smart wearable devices. However, typical deep learning models like deep convolutional neural networks (CNNs) …
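The snippet is truncated, but the problem the title names, keeping a CNN layer's working set inside a small on-chip buffer instead of repeatedly fetching it from off-chip memory, is commonly attacked with loop tiling. Below is a minimal sketch of a tiled convolution loop nest in C; the layer shape, tile size, array names, and loop order are illustrative assumptions for this sketch, not the authors' design.

```c
#include <stdio.h>

/* Illustrative layer shape and tile size: assumptions for this sketch,
 * not values taken from the paper. */
enum { H = 16, W = 16, C_IN = 8, C_OUT = 8, K = 3, TILE = 8 };

static float in_fm[C_IN][H + K - 1][W + K - 1]; /* zero-padded input */
static float wts[C_OUT][C_IN][K][K];
static float out_fm[C_OUT][H][W];

int main(void) {
    /* Fill input and weights with a simple deterministic pattern. */
    for (int c = 0; c < C_IN; ++c)
        for (int y = 0; y < H + K - 1; ++y)
            for (int x = 0; x < W + K - 1; ++x)
                in_fm[c][y][x] = (float)((c + y + x) % 5) * 0.1f;
    for (int co = 0; co < C_OUT; ++co)
        for (int ci = 0; ci < C_IN; ++ci)
            for (int ky = 0; ky < K; ++ky)
                for (int kx = 0; kx < K; ++kx)
                    wts[co][ci][ky][kx] = (float)((co + ci + ky + kx) % 3) * 0.01f;

    /* Tiled loop nest: everything touched inside one (th, tw) tile (the
     * input patch, the weights, and the partial outputs) is the working
     * set a small on-chip buffer would have to hold, so off-chip traffic
     * is paid once per tile rather than once per multiply-accumulate. */
    for (int th = 0; th < H; th += TILE)
        for (int tw = 0; tw < W; tw += TILE)
            for (int co = 0; co < C_OUT; ++co)
                for (int ci = 0; ci < C_IN; ++ci)
                    for (int y = th; y < th + TILE; ++y)
                        for (int x = tw; x < tw + TILE; ++x)
                            for (int ky = 0; ky < K; ++ky)
                                for (int kx = 0; kx < K; ++kx)
                                    out_fm[co][y][x] +=
                                        wts[co][ci][ky][kx] * in_fm[ci][y + ky][x + kx];

    printf("out_fm[0][0][0] = %f\n", out_fm[0][0][0]);
    return 0;
}
```

The tile size is the key knob: a larger tile increases reuse but demands more on-chip buffer capacity, which is exactly the trade-off an on-chip memory sub-system for embedded accelerators has to balance.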
Classifications
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, using non-contact-making devices, e.g. tube, solid state device
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/50—Computer-aided design
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06N99/00—Subject matter not provided for in other groups of subclass G06N (Computer systems based on specific computational models)
Similar Documents
| Publication | Title |
|---|---|
| Wang et al. | Re-architecting the on-chip memory sub-system of machine-learning accelerator for embedded devices |
| Wang et al. | Via: A novel vision-transformer accelerator based on FPGA |
| Kim et al. | GenieHD: Efficient DNA pattern matching accelerator using hyperdimensional computing |
| Park et al. | A multi-mode 8k-MAC HW-utilization-aware neural processing unit with a unified multi-precision datapath in 4-nm flagship mobile SoC |
| Zhou et al. | Energon: Toward efficient acceleration of transformers using dynamic sparse attention |
| Yap et al. | Fixed-point implementation of Tiny-YOLO-v2 using OpenCL on FPGA |
| Li et al. | Accelerating binarized neural networks via bit-tensor-cores in Turing GPUs |
| Hojabr et al. | SkippyNN: An embedded stochastic-computing accelerator for convolutional neural networks |
| Ye et al. | Accelerating attention mechanism on FPGAs based on efficient reconfigurable systolic array |
| Wang et al. | A case of on-chip memory subsystem design for low-power CNN accelerators |
| Ma et al. | Algorithm-hardware co-design of single shot detector for fast object detection on FPGAs |
| Wang et al. | Real-time meets approximate computing: An elastic CNN inference accelerator with adaptive trade-off between QoS and QoR |
| Lee et al. | ANNA: Specialized architecture for approximate nearest neighbor search |
| Imani et al. | DigitalPIM: Digital-based processing in-memory for big data acceleration |
| Lo et al. | Energy-efficient fixed-point inference system of convolutional neural network |
| US9626334B2 (en) | Systems, apparatuses, and methods for K nearest neighbor search |
| Kobayashi et al. | A high-performance FPGA-based sorting accelerator with a data compression mechanism |
| Fu et al. | SoftAct: A high-precision softmax architecture for transformers supporting nonlinear functions |
| Wang et al. | BSViT: A bit-serial vision transformer accelerator exploiting dynamic patch and weight bit-group quantization |
| Pan et al. | BitSET: Bit-serial early termination for computation reduction in convolutional neural networks |
| Qin et al. | Enhancing long sequence input processing in FPGA-based transformer accelerators through attention fusion |
| Zhan et al. | Field programmable gate array-based all-layer accelerator with quantization neural networks for sustainable cyber-physical systems |
| Chung et al. | Tightly coupled machine learning coprocessor architecture with analog in-memory computing for instruction-level acceleration |
| Hsiao et al. | Sparsity-aware deep learning accelerator design supporting CNN and LSTM operations |
| de Moura et al. | Data and computation reuse in CNNs using memristor TCAMs |