Bause et al., 2024 - Google Patents
A Configurable and Efficient Memory Hierarchy for Neural Network Hardware AcceleratorBause et al., 2024
View PDF- Document ID
- 12335131068162514311
- Author
- Bause O
- Bernardo P
- Bringmann O
- Publication year
- Publication venue
- arXiv preprint arXiv:2404.15823
External Links
Snippet
As machine learning applications continue to evolve, the demand for efficient hardware accelerators, specifically tailored for deep neural networks (DNNs), becomes increasingly vital. In this paper, we propose a configurable memory hierarchy framework tailored for per …
- 230000015654 memory 0 title abstract description 178
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0844—Multiple simultaneous or quasi-simultaneous cache accessing
- G06F12/0846—Cache with multiple tag or data arrays being simultaneously accessible
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0893—Caches characterised by their organisation or structure
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/06—Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1605—Handling requests for interconnection or transfer for access to memory bus based on arbitration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power Management, i.e. event-based initiation of power-saving mode
- G06F1/3234—Action, measure or step performed to reduce power consumption
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/25—Using a specific main memory architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Lee et al. | Decoupled direct memory access: Isolating CPU and IO traffic by leveraging a dual-data-port DRAM | |
| US20210181974A1 (en) | Systems and methods for low-latency memory device | |
| Huang et al. | Shuhai: A tool for benchmarking high bandwidth memory on FPGAs | |
| JP4188233B2 (en) | Integrated circuit device | |
| Oliveira et al. | MIMDRAM: An end-to-end processing-using-DRAM system for high-throughput, energy-efficient and programmer-transparent multiple-instruction multiple-data computing | |
| CN114356840B (en) | SoC system with in-memory/near-memory computing modules | |
| US8706453B2 (en) | Techniques for processor/memory co-exploration at multiple abstraction levels | |
| Mathew et al. | Design of a parallel vector access unit for SDRAM memory systems | |
| EP4038506A1 (en) | Hardware acceleration | |
| Chandu | A survey of memory controller architectures: Design trends and performance trade-offs | |
| Haris et al. | SECDA-TFLite: A toolkit for efficient development of FPGA-based DNN accelerators for edge inference | |
| EP1845462B1 (en) | Hardware emulations system for integrated circuits having a heterogeneous cluster of processors | |
| US8706469B2 (en) | Method and apparatus for increasing the efficiency of an emulation engine | |
| Mishra et al. | Processor-memory co-exploration driven by a memory-aware architecture description language | |
| US7827023B2 (en) | Method and apparatus for increasing the efficiency of an emulation engine | |
| Van Lunteren et al. | Coherently attached programmable near-memory acceleration platform and its application to stencil processing | |
| US7606698B1 (en) | Method and apparatus for sharing data between discrete clusters of processors | |
| JP2009527858A (en) | Hardware emulator with variable input primitives | |
| Bause et al. | A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator | |
| Vieira et al. | Ndpmulator: Enabling full-system simulation for near-data accelerators from caches to dram | |
| Vieira et al. | A compute cache system for signal processing applications | |
| US6604163B1 (en) | Interconnection of digital signal processor with program memory and external devices using a shared bus interface | |
| Bayliss et al. | Methodology for designing statically scheduled application-specific SDRAM controllers using constrained local search | |
| Abuelala | A Framework for Comprehensive and Customizable Memory-Centric Workloads | |
| US8468009B1 (en) | Hardware emulation unit having a shadow processor |