Wang et al., 2016 - Google Patents

Re-architecting the on-chip memory sub-system of machine-learning accelerator for embedded devices


Document ID
4949779828103168684
Author
Wang Y
Li H
Li X
Publication year
2016
Publication venue
Proceedings of the 35th International Conference on Computer-Aided Design

Snippet

The rapid development of deep learning is enabling a variety of novel applications, such as image and speech recognition, for embedded systems, robotics, and smart wearable devices. However, typical deep learning models, like deep convolutional neural networks (CNNs), …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/30 Arrangements for executing machine-instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50 Computer-aided design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F1/00 Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass

Similar Documents

Wang et al. Re-architecting the on-chip memory sub-system of machine-learning accelerator for embedded devices
Wang et al. Via: A novel vision-transformer accelerator based on fpga
Kim et al. Geniehd: Efficient dna pattern matching accelerator using hyperdimensional computing
Park et al. A multi-mode 8k-MAC HW-utilization-aware neural processing unit with a unified multi-precision datapath in 4-nm flagship mobile SoC
Zhou et al. Energon: Toward efficient acceleration of transformers using dynamic sparse attention
Yap et al. Fixed point implementation of tiny-yolo-v2 using opencl on fpga
Li et al. Accelerating binarized neural networks via bit-tensor-cores in turing gpus
Hojabr et al. SkippyNN: An embedded stochastic-computing accelerator for convolutional neural networks
Ye et al. Accelerating attention mechanism on fpgas based on efficient reconfigurable systolic array
Wang et al. A case of on-chip memory subsystem design for low-power CNN accelerators
Ma et al. Algorithm-hardware co-design of single shot detector for fast object detection on FPGAs
Wang et al. Real-time meets approximate computing: An elastic CNN inference accelerator with adaptive trade-off between QoS and QoR
Lee et al. Anna: Specialized architecture for approximate nearest neighbor search
Imani et al. Digitalpim: Digital-based processing in-memory for big data acceleration
Lo et al. Energy efficient fixed-point inference system of convolutional neural network
US9626334B2 (en) Systems, apparatuses, and methods for K nearest neighbor search
Kobayashi et al. A high performance FPGA-based sorting accelerator with a data compression mechanism
Fu et al. SoftAct: A high-precision softmax architecture for transformers supporting nonlinear functions
Wang et al. Bsvit: A bit-serial vision transformer accelerator exploiting dynamic patch and weight bit-group quantization
Pan et al. BitSET: Bit-serial early termination for computation reduction in convolutional neural networks
Qin et al. Enhancing long sequence input processing in fpga-based transformer accelerators through attention fusion
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
Chung et al. Tightly coupled machine learning coprocessor architecture with analog in-memory computing for instruction-level acceleration
Hsiao et al. Sparsity-aware deep learning accelerator design supporting CNN and LSTM operations
de Moura et al. Data and computation reuse in CNNs using memristor TCAMs