Shi et al., 2021 - Google Patents
MG-WFBP: Merging gradients wisely for efficient communication in distributed deep learningShi et al., 2021
View PDF- Document ID
- 10260418442736067823
- Author
- Shi S
- Chu X
- Li B
- Publication year
- Publication venue
- IEEE Transactions on Parallel and Distributed Systems
External Links
Snippet
Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks (DNNs) on computer clusters. With the increase of computational power, network communications generally limit the system scalability. Wait-free backpropagation …
- 238000004891 communication 0 title abstract description 134
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Programme synchronisation; Mutual exclusion, e.g. by means of semaphores; Contention for resources among tasks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Shi et al. | MG-WFBP: Merging gradients wisely for efficient communication in distributed deep learning | |
| Shi et al. | Communication-efficient distributed deep learning with merged gradient sparsification on GPUs | |
| Shi et al. | MG-WFBP: Efficient data communication for distributed synchronous SGD algorithms | |
| Kim et al. | Parallax: Sparsity-aware data parallel training of deep neural networks | |
| Zhao et al. | Multi-resource interleaving for deep learning training | |
| Lai et al. | Merak: An efficient distributed dnn training framework with automated 3d parallelism for giant foundation models | |
| He et al. | A novel task-duplication based clustering algorithm for heterogeneous computing environments | |
| Gulisano et al. | Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join | |
| US20180158034A1 (en) | Dynamic reordering of blockchain transactions to optimize performance and scalability | |
| Xiao et al. | Scheduling critical channels in conservative parallel discrete event simulation | |
| Vianna et al. | Analytical performance models for MapReduce workloads | |
| Shi et al. | Exploiting simultaneous communications to accelerate data parallel distributed deep learning | |
| Gharaibeh et al. | Efficient large-scale graph processing on hybrid CPU and GPU systems | |
| Shi et al. | A DAG model of synchronous stochastic gradient descent in distributed deep learning | |
| Ying et al. | Bluefog: Make decentralized algorithms practical for optimization and deep learning | |
| Andújar et al. | VEF traces: a framework for modelling MPI traffic in interconnection network simulators | |
| Luo et al. | Adapt: An event-based adaptive collective communication framework | |
| Zhang et al. | Fine-grained multi-query stream processing on integrated architectures | |
| Yang et al. | Mitigating stragglers in the decentralized training on heterogeneous clusters | |
| Mamidala et al. | MXNET-MPI: Embedding MPI parallelism in parameter server task model for scaling deep learning | |
| Butler et al. | Pipeinfer: Accelerating llm inference using asynchronous pipelined speculation | |
| Chen et al. | Hare: Exploiting inter-job and intra-job parallelism of distributed machine learning on heterogeneous GPUs | |
| Isaacs et al. | Ordering traces logically to identify lateness in message passing programs | |
| He et al. | Real-time scheduling in mapreduce clusters | |
| Pienta et al. | On the parallel simulation of scale-free networks |