Shi et al., 2021 - Google Patents

MG-WFBP: Merging gradients wisely for efficient communication in distributed deep learning

Document ID: 10260418442736067823
Author: Shi S; Chu X; Li B
Publication year: 2021
Publication venue: IEEE Transactions on Parallel and Distributed Systems

Snippet

Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks (DNNs) on computer clusters. With the increase of computational power, network communications generally limit the system scalability. Wait-free backpropagation …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Programme synchronisation; Mutual exclusion, e.g. by means of semaphores; Contention for resources among tasks
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring

Similar Documents

Publication	Publication Date	Title
Shi et al.	2021	MG-WFBP: Merging gradients wisely for efficient communication in distributed deep learning
Shi et al.	2020	Communication-efficient distributed deep learning with merged gradient sparsification on GPUs
Shi et al.	2019	MG-WFBP: Efficient data communication for distributed synchronous SGD algorithms
Kim et al.	2019	Parallax: Sparsity-aware data parallel training of deep neural networks
Zhao et al.	2022	Multi-resource interleaving for deep learning training
Lai et al.	2023	Merak: An efficient distributed DNN training framework with automated 3D parallelism for giant foundation models
He et al.	2018	A novel task-duplication based clustering algorithm for heterogeneous computing environments
Gulisano et al.	2016	ScaleJoin: A deterministic, disjoint-parallel and skew-resilient stream join
US20180158034A1 (en)	2018-06-07	Dynamic reordering of blockchain transactions to optimize performance and scalability
Xiao et al.	1999	Scheduling critical channels in conservative parallel discrete event simulation
Vianna et al.	2013	Analytical performance models for MapReduce workloads
Shi et al.	2021	Exploiting simultaneous communications to accelerate data parallel distributed deep learning
Gharaibeh et al.	2013	Efficient large-scale graph processing on hybrid CPU and GPU systems
Shi et al.	2018	A DAG model of synchronous stochastic gradient descent in distributed deep learning
Ying et al.	2021	BlueFog: Make decentralized algorithms practical for optimization and deep learning
Andújar et al.	2015	VEF traces: a framework for modelling MPI traffic in interconnection network simulators
Luo et al.	2018	ADAPT: An event-based adaptive collective communication framework
Zhang et al.	2021	Fine-grained multi-query stream processing on integrated architectures
Yang et al.	2020	Mitigating stragglers in the decentralized training on heterogeneous clusters
Mamidala et al.	2018	MXNET-MPI: Embedding MPI parallelism in parameter server task model for scaling deep learning
Butler et al.	2024	PipeInfer: Accelerating LLM inference using asynchronous pipelined speculation
Chen et al.	2022	Hare: Exploiting inter-job and intra-job parallelism of distributed machine learning on heterogeneous GPUs
Isaacs et al.	2015	Ordering traces logically to identify lateness in message passing programs
He et al.	2013	Real-time scheduling in MapReduce clusters
Pienta et al.	2013	On the parallel simulation of scale-free networks