Abstract
Utilizing acoustic information for search route planning will greatly increase the success rate of searching for underwater targets, which requires rapid computing of numerous underwater acoustic fields. The efficiency of traditional computing methods is too low to meet the requirements of rapid applications. In this paper, a underwater multi-acoustic fields computing model is developed based on ray theory, and multi-level hybrid parallel computing strategies are designed based on the model characteristics, and a dynamic scheduling optimization algorithm at process level is introduced to solve the load imbalance problem. All parallel computing strategies are tested in Tianhe-II High Performance Computer (HPC) system, and the tests show that: 1. compared with the serial version, hybrid parallel computing strategy provides a Speedup of 75.7 under 240 cores; 2. after the introduction of the dynamic scheduling optimization algorithm, the speed of solving the underwater acoustic fields is further increased by 28.99% under the same computing resources, and the Speedup reaches 97.67; 3. the optimal combination of process/thread parameter on the Tianhe-II HPC system is given as 3/8, and the final Speedup reaches 112.13.
Similar content being viewed by others
Introduction
It is a difficult problem to plan search route for underwater target in vast sea1,2. Due to the uncertainty of target location, the spatial extent of the sea to be searched usually reaching tens of thousands of square kilometers3. Underwater target detection mainly relies on the sonars2, sonars use acoustic waves to detect targets, and simulating the propagation of acoustic waves in the ocean in advance on land can help us predict the detection performance of sonars at different locations. Searching along locations favorable for sonar detection can increase the success rate of the search. There are usually three steps to simulate the propagation of acoustic waves in the ocean in advance on land: Step 1, obtain the hydrological data (temperature and salinity) of the target moment in advance through an ocean model (e.g., HYCOM and FVCOM), and then compute the three-dimensional sound velocity profiles from the temperature and salinity; Step 2, discretize the search area into two-dimensional grids along the longitude and latitude; Step 3, deploy the sound source in each grid sequentially, and compute the corresponding underwater acoustic fields. Due to the reciprocal principle of the underwater acoustic field4, the solved underwater acoustic fields can simulate the detection capability of the sonar for the target at different locations, which provides information support for search route planning. The key to this approach is the need for underwater multi-acoustic fields computing, and the efficiency of underwater multi-acoustic fields computing determines the success rate of search route planning.
The propagation law of underwater acoustic waves in the frequency domain satisfies the Helmholtz equation, but the Helmholtz equation is difficult to solve analytically due to the complex marine environment, so it is usually solved numerically by various underwater acoustic propagation models5.The traditional underwater acoustic propagation models are Ray Model, Normal Mode Model and Parabolic Equation Model4. Among these three models, the Ray Model has the advantages of accurate results and clear physical significance in solving high-frequency source problems. because this paper focuses on the underwater target search problem in ocean engineering, and sonar is the mostly used equipment in underwater target search, which utilizes acoustic waves in the medium and high frequency bands (10kHz-1MHz)6. In order to ensure the accuracy of the simulation, the research in this paper is based on the Ray Model. The Ray Model treats the underwater acoustic field as the sum of the contributions of many ray beams, derives the Eikonal Equation and the Transport Equation by introducing the high-frequency approximation, and solves these two equations separately to obtain the amplitude and phase of each ray beam, then superimposes the contributions of all the ray beams to obtain the final underwater acoustic field7,8. However, the computing time is unacceptable when using the Ray Model to compute numerous underwater acoustic fields in vast sea. Taking a area of \(1^\circ\) longitude \(\times\)\(1^\circ\) latitude (about 10,000 square kilometers) as an example, if a \(0.1^\circ\) latitude/longitude interval is used for the grids, then 121 grid points will be generated. Furtherly, in order to analyze the three-dimensional propagation effect of acoustic wave, this paper employs the N\(\times\)2D method for the extended computing, which needs to compute the underwater acoustic fields in 8 directions at each grid point. In this paper, the computing of underwater acoustic propagation in each direction at each grid point is independent, so in this paper the underwater acoustic propagation in one direction is referred to as one underwater acoustic field, thus a total of 968 underwater acoustic fields need to be computed. Taking a typical deep-sea Munk waveguide as an example, The computing time of the traditional Ray Model Bellhop on the CPU (i7-11700@2.50GHz) is about 50s, then it takes more than 13 hours to compute all the underwater acoustic fields, which can not satisfy the time requirement for search route planning. Therefore, the study of efficient acceleration methods for underwater acoustic field computing has become a key problem to be solved for underwater target search route planning in vast sea.
Parallel computing is a method which utilizes multiple computing resources simultaneously to accelerate the solution of large-scale scientific computing problems. Currently, parallel computing technology has been widely used in computing ocean acoustics. Lianrong Chen et al.9 used MPI to parallelize the Gaussian Ray Model at the ray level for single frequency source, and they divided the angle range into equal-sized intervals and synthesized the underwater acoustic field through the Reduce function after the computing of each interval was completed, and achieved a Speedup of 15.76 on multi-core computer cluster system. Lee et al.10 implemented parallelization of 3D parabolic equation model based on GPU and CUDA. The parallelization is carried out through two steps-the sum representation of the operator approximation and the grouping of vectors in the matrix. With the suggested algorithm, the execution time of the 3D parabolic equation model is shortened to 1/5 \(\sim\) 1/10. Chaojin Zhang et al.11 developed the parallelized Ray Model based on the C++ version of Bellhop–BellhopMP, and verified that BellhopMP can substantially improve the computing speed under test of Munk waveguide, they finally achieved a Speedup of 7.2, and the larger the serial computing time share, the higher the parallel efficiency. Rogério et al.8 proposed an OpenMPI-based parallel computing strategy for Traceo3D, which assigns neighboring rays to different processes to avoid the same process acquiring a large number of time-consuming rays. The performance of the strategy is also tested with a real ocean environment, which can finally achieve a Speedup of 12. Chuanxi Xing et al.12 proposed a parallel search strategy for 3D feature rays in irregular undersea waveguides based on OpenMP and achieved a Speedup of 3.76 on multi-core computer cluster system. Yiqing Zhou et al.13 achieved coarse-grained parallel optimization of Bellhop3D using MPI, and the computing results of the optimized program are stable and reliable with high parallel efficiency, which is suitable for fast acoustic field prediction. And they achieved a Speedup of 36.43 on multi-core computer cluster system. Rogério et al.14 developed a parallel version of Traceo3D based on CUDA, and they first optimized the storage structure by storing only three time periods of acoustic parameters per sensor, which greatly reduced the storage pressure on the hardware, and then carefully considered the inherent parallelism of ray tracing and the high workload of 3D to design a parallel GPU-based algorithm. Qiang Lan15 et al. implemented the multi-core parallel optimization of the 3D ray hydroacoustic model Bellhop3D on the Tianhe-II High Performance Computer (HPC) system, and performed performance tests on typical Munk waveguide as well as 3D wedge seabed arithmetic cases. The experimental results show that the OpenMP-based parallel computing strategy achieved better speedup in Bellhop3D, and they finally achieved a Speedup of 20.93. However, parallel computing studies based on the Ray Model still have some shortcomings: 1. All parallel computing strategies are unable to compute multiple underwater acoustic fields simultaneously. 2. All researches are based on a single parallel programming model and have not yet integrated the advantages of different parallel programming models; 3. The process-level parallel computing strategy only considers a static task allocation method, which is unable to overcome the load imbalance problem when computing complex seafloor, which seriously hinders the improvement of parallel Speedup.
In order to overcome the above three shortcomings, this paper establishes an underwater multi-acoustic fields computing model, which can compute underwater multiple acoustic fields simultaneously, and combines the advantages of multiple parallel programming models to design hybrid parallel computing strategies for the underwater multi-acoustic fields computing model, and further introduces a dynamic scheduling optimization algorithm at the process level for optimization of load balancing. All the parallel computing strategies are tested on the Tianhe-II HPC system.
This paper is organized as follows: the first chapter is the introduction, which mainly introduces the background of this research, and analyzes the development status and shortcomings of the application of parallel computing technology in underwater acoustic propagation model. Chapter 2 introduces the underwater multi-acoustic fields computing model and the ray theory. Chapter 3 introduces the parallel computing strategies, this chapter first analyzes the multi-level concurrency of the underwater multi-acoustic fields computing model, and then designs the parallel computing strategy for OpenMP, MPI, and OpenMP+MPI respectively, then introduces the dynamic scheduling scheduling optimization algorithm at the process level in order to overcome the load imbalance problem. Chapter 4 introduces the acceleration effect of the parallel computing strategies in Tianhe-II HPC system. In this chapter, 240 underwater acoustic fields around the Bashi Channel are computed by the underwater multi-acoustic fields computing model. Chapter 5 summarizes the work of this paper and gives an outlook on possible future research work.
Underwater multi-acoustic fields computing model
The underwater multi-acoustic fields computing model studied in this paper is based on the ray theory. Compared with the normal mode theory and the parabolic equation theory, the ray theory in the high frequency source conditions have the advantages of accurate results, fast computing speed, and clear physical significance.
In the field of computational ocean acoustics, scholars have written several codes based on ray theory, of which Bellhop and Traceo are typical representatives. Compared with Traceo, Bellhop has been developed earlier, and has been maintained and optimized for many rounds, with better computing stability. Therefore, the work in this paper is conducted with reference to Bellhop, and its original code can be downloaded from the official website of the Ocean Acoustics Library at https://oalib-acoustics.org/.
Algorithms for underwater multi-acoustic fields computing model
The algorithm flow of the underwater multi-acoustic fields computing model is shown in Algorithm1, which firstly needs to configure the underwater acoustic field environment files for the study area, and then enters into four layer loops, from outside to inside: the underwater acoustic fields solving loop, the source solving loop, the ray-tracing loop, and the trajectory stepping loop. The trajectory stepping loop solves the propagation trajectory of a single ray beam and the acoustic parameters on the trajectory; The rays tracing loop computes all ray beams emitted by a source; The source solving loop solves for the contribution of all sources to the underwater acoustic field; and the underwater acoustic fields solving loop then solves the underwater acoustic fields in order, and at the end of each loop, the memory of the variable storing the ray parameters needs to be released.
In this paper, an underwater multi-acoustic fields computing model is implemented based on Fortran and examined with ARCTIC and MUNK standard cases, and the acoustic propagation loss at a depth of 100 m is compared with that of Bellhop, and the results are shown in Fig. 1 The abbreviation MAFCM in Fig. 1 stands for the underwater multi-acoustic fields computing model, and the underwater multi-acoustic fields computing model has the same results with the acoustic propagation loss of Bellhop, which proves the accuracy of the underwater multi-acoustic fields computing model.
Ray theory
Ray theory is developed based on the assumption of a high-frequency approximation, and derives Eikonal Equation and Transport Equation. Focusing on the particle nature of acoustic wave, the solution of the underwater acoustic field is considered as the sum of the contributions of all ray beams. And the trajectories and phases of the ray beams are obtained by solving Eikonal Equation, and the amplitudes of the ray beams are obtained by solving Transport Equation. Finally, by introducing a control function, the contribution of each ray beam to the underwater acoustic field can be obtained.
Our starting point is the Helmholtz equation in Cartesian coordinates \(\vec {{\varvec{x}}}=\left( x,y,z \right)\),
where \(c\left( \vec {{\varvec{x}}} \right)\) is the sound speed, \(\omega\) is the angular frequency of the source located at \(\vec {{\varvec{x}}}_0\). To obtain the ray equations, we seek a solution of the Helmholtz equation in the form:
where \(\tau \left( \vec {{\varvec{x}}} \right)\) representative phase, \(A_j\left( \vec {{\varvec{x}}} \right)\) representative amplitude. Introducing the high-frequency approximation yields two equations for \(\tau \left( \vec {{\varvec{x}}} \right)\) and \(A_j\left( \vec {{\varvec{x}}} \right)\):
Equation (3) is called Eikonal Equation, and Eq. (4) is called Transport Equation. The Eikonal Equation is solved using the method of characteristics, by introducing a family of rays vertical to the wavefront surface, and then performing a series of operations, the trajectory equations of the rays can be obtained:
Solve for the Eikonal Equation in the ray coordinate system, the phase is given by
The integral term of Eq. (6) is the propagation time along the ray, i.e., the delay in the wave phase. Using the directional derivative along the ray, the Transport Equation can be rewritten as
Introducing the Jacobi determinant, the Jacobi determinant satisfies the following relationship
Eq. (8) can therefore be written as
Integrate Eq. (9) to find the result of the Transportation Equation:
The solution of the Eikonal Equation gives the trajectories and phases of the ray beams; the solution of the Transmission Equation gives the amplitudes of the ray beams; and the coefficients are determined by comparison with the standard point source problem. The final solution for the rays is:
where \(p\left( s \right)\) represents the sound pressure at a distance s from the source location.
Parallel computing strategies
The underwater multi-acoustic fields computing model consists of four loops: Underwater acoustic fields solving loop, sources solving loop, rays tracing loop, and trajectory stepping loop, and this multi-loop computing mode has a natural potential for concurrency, which is suitable for parallel optimization.
In the underwater multi-acoustic fields computing model studied in this paper, the computing between each underwater acoustic field is independent of each other, there is no data dependence, and the underwater acoustic fields solving loop can be computed concurrently. And the computing between each source is independent of each other, there is no data dependence, and the acoustic sources solving loop can be computed concurrently. And the computing between each ray beam is also independent of each other, there is no data dependence, and the rays tracing loop can be computed concurrently. There is data dependence between the trajectory stepping loop, whose input variables are the ray trajectory and the acoustic parameters in the previous loop, and the output variables are the ray trajectory and the acoustic parameters in the next loop, so the input variables of the later loop are dependent on the output variable of the previous loop, and therefore the trajectory stepping loop are not able to be optimized in parallel. Considering the actual physical scenarios, a single source is often used to simulate standard acoustic problem, and thus the source solving loop is not considered in this paper. In summary, the concurrency of underwater multi-acoustic fields computing model considers two levels of underwater acoustic fields solving loop and rays tracing loop.
OpenMP/MPI-based parallel computing strategies
OpenMP is a parallel programming model for shared memory systems that enables developers to easily convert serial code to parallel code by compiling instructions and runtime libraries. It is characterized by ease of use, portability, and compatibility with original code16.
Variables between threads are categorized into global and local variables. Global variables are shared and local variables are private. In order to fully utilize the performance advantages of OpenMP shared memory, the use of local variables should be minimized. For the underwater multi-acoustic fields computing model studied in this paper, the rays tracing loop is suitable for parallel optimization using the OpenMP shared-storage parallel programming model, The acoustic environment parameters utilized within the same underwater oacoustic field are the same, so only the trajectories and the acoustic parameters on the ray beam trajectories are different form each other, need to be set as local variables.
Designing a parallel computing strategy for an OpenMP-based underwater multi-acoustic computing model involves following three steps:
-
Step 1: Initialize OpenMP shared-memory parallel programming environment and declare the variable storage types. Set the variables as global variables by default, set the ray trajectories and acoustic parameters on the trajectories as local variables.
-
Step 2: All the rays tracing loops are assigned to be completed by different threads according Fig. 2a, and the sound pressure matrices of each thread are summed to finally form the total underwater acoustic field. During the execution of the program, the dynamic control parameter can be added, and OpenMP will automatically assign ray beams to different threads dynamically according to the number of OpenMP threads set by the environment variable to achieve the optimal load balance.
-
Step 3: End OpenMP shared-storage parallel programming environment. After the rays tracing loop is completed, the redundant threads are destroyed and work returns to the main thread.
MPI (Message Passing Interface) is a standard communication library for writing parallel programs. It allows message passing between different processes, enabling inter-process communication and coordination. The main features of MPI include flexibility, portability, and efficiency. And MPI is widely used in scientific computing, big data processing, and other fields, and is one of the most important tools in parallel computing17,18.
From the previous concurrency analysis, it is clear that parallel computing of the underwater multi-acoustic fields computing model can be carried out by utilizing the MPI parallel programming model at both the underwater acoustic fields solving loop and the rays tracing loop levels. In a pure MPI program, processes simultaneously serve as the basic unit of operating system resource allocation and program execution, and are unable to perform parallel computing at multiple levels simultaneously. Therefore, MPI can only select one level for parallel computing. Since the number of communications is usually the main limiting factor in the performance of MPI programs, fewer communications are required when parallel computings are performed in the underwater acoustic fields solving loop. Therefore we design the parallel computing strategy shown in Fig. 2b, which equally divides the underwater acoustic field computing tasks to different processes according to the number. And all ray beams inside the one underwater acoustic field are computed by the one process.
The parallel computing strategy based on MPI is that in the underwater acoustic field computing loop, each process only needs to execute raycompute subfunctions on its own underwater acoustic field environment file. This parallel computing strategy is a typical coarse-grained parallelism, which has the advantage of low communication overhead but is prone to lead to load imbalance problems.
OpenMP+MPI-based hybrid parallel computing strategy
In the previous section, pure OpenMP and MPI parallel computing strategies were introduced respectively. The OpenMP-based parallel computing strategy is simple to implement and has obvious advantages in fine-grained parallelism development, but is poorly scalable and can only be used on shared main memory machines; the MPI-based parallel computing strategy is highly scalable and can realize parallel computing on computer cluster systems, but the frequent communication limits the acceleration performance. The two defects mentioned above are due to the limitations imposed by the inherent properties of the two parallel programming models themselves. An ideal solution would be to combine the advantages of OpenMP and MPI to design a hybrid parallel computing strategy. The basic implementation is to utilize MPI to assign the underwater acoustic fields solving loop to different processes, and utilize OpenMP to assign the rays tracing loop inside the underwater acoustic field to different threads.
Parallel computing is usually performed on HPC systems. HPC systems have powerful computing capabilities and are suitable for dealing with problems that require large amounts of computing resources, such as numerical weather forecasting and molecular simulations. Figure 3 shows the composition of a HPC System, unlike ordinary computers, a HPC system consist of multiple nodes, where each node corresponds to an ordinary computer. Each node has CPUs and memory, multiple nodes share the same Storage, and the different nodes are connected to each other by high-speed network. Thus, a HPC system is not a single computer, but a cluster of computers that can collaborate with each other.
When parallel computing is performed on a HPC system, the central factor determining the efficiency of computing is the communication overhead between different nodes. However, it is too difficult to perform an accurate theoretical analysis of the communication model of a HPC system for two main reasons: One is because a HPC system is a whole composed of many nodes, and the communication between different nodes will affect each other. Secondly, it is because the communication performance between nodes is not constant and there are random fluctuations as well. To analyze the communication model roughly, we adopt the LogP model which is widely accepted in computer science. According to LogP model, the minimum time required for data transfer between nodes of an interconnected network is
where L represents the upper limit of the waiting or delay time required for a source processor to communicate with a destination processor for a single message, o represents the time overhead for a processor to send or receive each message, and g represents the minimum time interval between two consecutive times a processor sends or receives a time.
On a HPC system, the nodes are connected through a high-speed interconnection network, and the time overhead o for sending or receiving a message between nodes is quantitatively smaller than the upper time limit L required for a message communication between nodes, so the final communication performance of the distributed-storage parallel program is determined by the number of communications. When computing underwater multi-acoustic fields, 4000 rays are computed inside each underwater acoustic field. And if MPI is used in the underwater acoustic fields solving loop for task allocation, only a few communications between nodes are required, which can take advantage of the powerful scalability of MPI and minimize the communication overhead to achieve the best acceleration effect. For the ray computing inside each underwater acoustic field, OpenMP is directly utilized to open multiple threads in the process to achieve parallel computing between threads, and the threads collaborate to complete the rays tracing loop assigned on the process. Inter-thread communication is usually realized by means of mutual exclusion locks, condition variables and signal volumes19. Only a small amount of state, such as stack and register, needs to be switched between threads, so the communication overhead between threads is small, and it is suitable for parallel computing of rays tracing loop.
Figure 4 shows the basic implementation of the hybrid parallel computing strategy based on OpenMP+MPI. In the figure, each square represents a computing node, each node corresponds to a process, and the numerical identifier above the square represents the number of the process, with the number 0 representing the master process and the rest of the numbers representing the slave processes. After starting the model, the master process first distributes the underwater acoustic field computing tasks to the slave processes equally, and then opens multiple threads within each process, each thread corresponds to a computing core, and all computing cores within the process collaborate to complete the computing of the same underwater acoustic field, and the computing of different underwater acoustic fields is executed sequentially within the process. Under this parallel computing strategy, MPI ensures that the parallel computing strategy can be extended to the cluster system, and the number of communications is kept at the same order of magnitude as the number of acoustic fields, thus greatly reducing the communication overhead; OpenMP ensures that the threads within a node efficiently collaborate to complete rays tracing loop, thus achieving the best acceleration effect.
Process-level dynamic scheduling optimization algorithm based on master-slave model
Although the hybrid parallel computing strategy based on OpenMP+MPI already has the advantages of high communication efficiency and scalability, simply distributing the underwater acoustic fields equally among processes is prone to load imbalance. There are two main reasons for this:
-
1. MAFCM is based on the ray theory, the ray propagation is greatly affected by the terrain, and different terrains will lead to differences in the computing time of the acoustic fields.
-
2. the effect on the computing time of the acoustic field is too complex to be accurately estimated in advance, since the Ray Model is essentially a complex numerical solver model.
To illustrate the effect of terrain on the computing time of a acoustic field, two sets of controlled experiments are designed in this paper, and each experiment corresponding to 3 acoustic field samples. In each set of controlled experiments, the same sound velocity profiles are used for 3 acoustic field samples, but the sources are located in different terrains. The sound velocity profiles of the two sets of control experiments are shown in Fig. 5a and b, respectively. Figure 5a shows the linear sound velocity profile, and Fig. 5b shows the Munk sound velocity profile, which are both very typical sound velocity profiles in hydroacoustic engineering. In the controlled experiments, we placed the sound source in flat terrain, uphill terrain and downhill terrain, respectively. The results of the first set of controlled experiments are shown in Fig. 6, with Fig. 6a–c corresponding to flat, uphill, and downhill terrain, respectively; and the results of the second set of controlled experiments are shown in Fig. 7, with Fig. 7a–c corresponding to flat, uphill, and downhill terrain, respectively.
The computing times of the 6 acoustic fields on the central processor (i7-11700@2.50GHz) are shown in Table 1, and it can be found that: there is a significant effect of the terrain on the computing time of the acoustic fields under the same sound velocity profile. The approximate pattern is: computing time when the source is in uphill terrain> when the source is in flat terrain > when the source is in downhill terrain. However, it should be emphasized, that it is too difficult to analyze the precise effect of terrain on the computing time of the acoustic fields in advance, since the Ray Model is essentially a complex numerical solver model.
The fundamental reason for this difference is that rays propagate on different trajectories in different terrains. Since the discretization steps for the rays are the same in the Ray Model, longer ray trajectories require more calculations. We specifically analyze the propagation trajectory of a ray in two sets of experiments, and the results are shown in Fig. 8 It can be found:
-
1. the ray trajectory is the longest in uphill terrain, followed by flat terrain and the shortest in downhill terrain.
-
2. in the second set of experiments, the rays appear to propagate backwards in the uphill terrain. This is due to the fact that the ray’s grazing angle decreases with each boundary reflection in the uphill terrain. Eventually the grazing angle is less than 0 and the ray propagates in the opposite direction from the initial direction.
In a real marine environment, the terrain of different regions often varies greatly. We will take the region around the Bashi Channel as a sample in Section 4, which has very significant terrain variations (as shown in Fig. 9). Therefore, when the source is deployed at different locations, the propagation of acoustic waves will be affected by different terrains, and the computing time for different acoustic fields will be very different. Therefore, if the acoustic field computing tasks are equally distributed to each process, it is too difficult to ensure the load balance of each process. Since the final execution time of a parallel program depends on the slowest process, this process delays the execution time of the entire parallel program. The occurrence of this load imbalance limits the acceleration effect on one hand and wastes computing resources on the other. Therefore, finding an effective method to solve the load imbalance problem under hybrid parallel computing strategy has become a critical issue to be solved.
Dynamic scheduling is an effective method to alleviate the load imbalance problem20,21,22,23,24. It adaptively adjusts the task allocation according to the load of the processor, or dynamically changes the allocation scheme according to the real-time situation of task execution. The core idea lies in using a centralized process to maintain all the tasks, and extracting some of the tasks from the main process once a process is idle. Under this centralized dynamic scheduling strategy, the full utilization of processing resources can be effectively guaranteed, thus alleviating the problem of inter-process load imbalance. For the hybrid parallel computing strategy based on OpenMP+MPI, thread-level dynamic scheduling can be achieved by adding dynamic control parameters in the compilation guidance statement.In order to achieve process-level dynamic scheduling, this paper designs a process-level dynamic scheduling optimization algorithm, as shown in Algorithm 2.
When building the dynamic scheduling optimization algorithm, it is first necessary to name the underwater acoustic field environment files according to task1.env, task2.env, ......, and taskn.env in order. The taskj.env is computed when a slave process is assigned the jth task.
In the master process, the isidle array is created to store the slave processes’ working status, when isidle(i)=1, it means that the slave process i is in idle state. After that, the underwater acoustic field computing task is assigned sequentially, in each assignment, the master process traverses the isidle array, once until the idle slave process is found, the task is assigned to it, and then the isidle array is updated; if all the elements of the isidle array are 0, it means that all the slave processes are in the working state, and at this time, the master process transitions to the message receiving state and waits for the slave processes to complete the underwater acoustic field computing task. After the slave processes complete the underwater acoustic field computing task, they report to the master process, which then sends the task to the corresponding slave processes, and so on until all the tasks have been assigned, and finally the master process broadcasts to the slave processes.
The slave process initially maintains the task reception state, receives the task, calculates the corresponding underwater acoustic field, reports its process number to the master process when the computing is completed, and then receives a new task, and so on until the slave process receives the broadcast signal from the master process.
Test results and discussion
In this paper, we take the sea area around the Bashi Channel as a study case, the specific range is \(120.3^\circ\)E \(-121.5^\circ\)E , \(20.5^\circ\)N \(-21.5^\circ\)N , divides searching sea area into grids, the longitude is divided into 5 grid points, the latitude is divided into 6 grid points, and each grid point radiates outward in 8 directions to calculate the underwater acoustic field within a range of 30km, and a total of 240 underwater acoustic fields need to be calculated, and the specific schematic diagram is shown in Fig. 10. The map in Fig. 10 represents the terrain within the area (\(115^\circ\) E \(\sim\)\(125^\circ\) E, \(15^\circ\) N \(\sim\)\(25^\circ\) N), and its data source is the ETOP Global Terrain Dataset. The map is created by Python3.12.7, which can be downloaded at https://www.python.org/downloads/. When configuring the underwater acoustic field environment file, we set the source frequency to 15kHz, the depth of the source to 50m, the sea surface to a soft boundary and the seafloor to a hard boundary, the sound velocity profile data from CORA v2.0 dataset, which can be downloaded from the official website at https://mds.nmdis.org.cn/, and the seafloor terrain data from DBDB dataset.
In order to evaluate the acceleration effect of the parallel computing strategy proposed in this paper, we have conducted numerous related tests on the Tianhe-II HPC system. The Tianhe-II HPC system configuration is shown in Table 2. Each node of the system consists of two 12-core Xeon E5-2692 v2 CPUs with 128 GB of CPU memory, Intel Compiler 2018 for the compiler, Red Hat 4.4.7-4 for Linux, mpich-3.1.3 for the MPI, and the system supports OpenMP 4.0. And the compilation options are -mtune= generic -Wall -std=gnu -O3 -I.. /misc -fopenmp. In the tests, each set of tests was conducted five times, and the least time was used as the final result of the test. In multi-process tests, the time at which the last process ends is taken as the final time.
Computed underwater acoustic field results
The Underwater acoustic field environment files are computed by the Tianhe-II HPC system, and 240 underwater acoustic fields can be obtained. The results of the acoustic propagation loss of the horizontal profile at 50 m depth are selected for presentation, and the results are shown in Fig. 11. In the underwater multi-acoustic fields computing model, the directly obtained variable is the sound pressure p, and the variable plotted in Fig. 11 is the acoustic propagation loss TL , which is computed as follows:
where \(p_0\left( r=1 \right)\) represents the sound pressure at 1 m from the source. Acoustic propagation loss indicates the attenuation of acoustic energy at a spatial location; the greater the acoustic propagation loss, the more the acoustic energy is attenuated.
In the image on the left side of Fig. 11, 30 octagonal heat map represents 240 underwater acoustic fields, the location of the sound source is at the midpoint of the octagon, the rays are emitted from the location of the source and radiate in 8 directions at different angles, the rays will conduct acoustic energy to the environment around the path, the warmer the heat map is, the smaller the acoustic propagation loss is, and the higher the energy of the sound wave has, and due to the reciprocal principle of the Underwater acoustic field4, it is easy to detect the source when the sonar is located here; The colder the heat map is, the higher the acoustic propagation loss is, the lower the energy of the sound wave has, when the sonar is located here, it is difficult to detect the source.
The images on the right side of Fig. 11 show the underwater acoustic field with the longest and shortest computing times, respectively. The computing time of the underwater acoustic field is mainly affected by the sound velocity profile and the seafloor terrain, where the rays are deflected towards regions with lower sound velocities and rebound after touching the seafloor. The underwater acoustic field with the shortest computing time corresponds to a shallow seafloor, which is due to the fact that the rays will reach the sea surface after rebounding on the seafloor in a shallow sea, and the ray trajectories are mainly propagated in the depth above 1,000 meters, with a short trajectory; the underwater acoustic field with the longest computing time corresponds to a deep seafloor, which is due to the effect of the deep-sea terrain, and the ray trajectory is complex, with the ray trajectory spreading over the entire seawater layer, and the trajectory being longer and more irregular. Besides, in the shallow sea will form an obvious shallow sea acoustic channel axis (about 100m depth), the deep sea does not exist this phenomenon, this is due to the seabed boundary of this paper are set for the hard seabed, acoustic energy leakage in the seabed is very little, in the shallow sea environment, most of the acoustic energy will be rebounded to the sea surface layer, and the value of the speed of sound near the sea surface layer is relatively small, the acoustic energy will be in the surface layer of the long-distance propagation, and in the deep sea due to the acoustic energy after reflection difficult to reach the sea surface layer, therefore it is difficult for the acoustic channel axis to occur.
Acceleration effects of four parallel computing strategies
The OpenMP-based parallel computing strategy is only applicable to shared-memory machines and thus can only work within a single node, and we set the number of computing cores to 1, 2, 4, 8, 16, and 24.The MPI-based parallel computing strategy can extend the program to multi-node cluster systems, and the number of computing cores tested is set to 1, 2, 4, 8 , 16, 24, 48, 72, 96, and 120. The hybrid parallel computing strategy and dynamic hybrid parallel computing strategy must to be tested on multi-node cluster systems, so we set the number of computing cores to 48, 72, 96, 120, 144, 168, 192, 216, and 240.
The Speedup is usually used as a technical measure of the effect of parallel computing, which is calculated as shown in Eq. (14):
where S is the Speedup, \(T_s\) is the time of serial program execution, and \(T_p\) is the time of parallel program execution.
The computing time and Speedup of the underwater multi-acoustic field computing model under OpenMP-based and MPI-based parallel computing strategies are shown in Table 3 and Fig. 12(a). And the following pattern can be found:
-
The more cores are used, the more significant the Speedup effect is.
-
The Speedup of OpenMP-based strategy is bigger than MPI-based strategy. And the Speedup of OpenMP-based strategy increases almost linearly with the number of cores and reaches 17.68 under 24 cores, which is partly due to the fact that the communication overhead of threads is much smaller than that of processes, and partly due to the fact that OpenMP implements dynamic parallel computing by controlling the parameters. However, OpenMP can only work on a single node, so 17.68 is the maximum Speedup of OpenMP-based strategy on Tianhe-II HPC system.
-
The MPI-based strategy can be scaled to multiple nodes, but the Speedup’s improvement gradually slows down. This is due to the fact that for the MPI-based strategy, each computing core corresponds to one process, and when the number of computing cores is small, many underwater acoustic fields are assigned to one process, thus weakening the load imbalance problem caused by the difference in computing time of different underwater acoustic fields. When the number of computing cores is increased, the number of acoustic fields assigned to one process is smaller, and then the load imbalance problem is more significant, which impedes Speedup’s improvement.
The computing time and Speedup of the underwater multi-acoustic field computing model under the hybrid strategy and the dynamic hybrid strategy are shown in Table 4 and Fig. 12b. And the following pattern can be found:
-
When the number of CPU cores used by the hybrid strategy and the dynamic hybrid strategy are 48, 72, 96 and 120, the Speedup of the hybrid and dynamic hybrid strategy are significantly higher than that of the MPI-based strategy because the hybrid strategy fully utilizes the advantages of MPI scalability and OpenMP efficient communication.
-
The Speedup does not strictly increase with the number of computing cores. For example, the Speedup under 96 cores is larger than under 120 cores, which is caused by the load imbalance in the static allocation. Some underwater acoustic fields with high computing volume are concentrated on a single node which causes a huge computing pressure , thus slowing down the computing time of the whole model and hindering the acceleration effect.
-
It can be seen that, under tests in this paper the Speedup of the dynamic hybrid strategy is significantly higher than that of the hybrid strategy, and the introduction of the dynamic scheduling optimization algorithm improves the Speedup of the hybrid strategy by 39.79% on average, and in particular, under 216 cores, the Speedup of the dynamic hybrid strategy improves by a maximum of 70.78% compared with hybrid strategy. However, it is worth noting that the upward trend of the Speedup of the dynamic hybrid strategy slows down as the cores increases, which is due to the increased pressure of inter-process communication and the need for frequent communication between the master process and the other slave processes.
Load balancing is a key factor that affects the acceleration effect in hybrid parallel computing strategies. The load of a process is the sum of that process’s own underwater acoustic fields’ computing time. The load distribution of each process is obtained by dividing the load of each process by the sum of the loads of all processes. The formula is as follows
where \(\eta _p\) is the load distribution on the pth node, and \(L_p\) is the load on the pth node.
Figure 13 shows a comparison of the load distribution of processes before and after the introduction of the dynamic scheduling algorithm. Each piece of the pie chart represents the percentage of the process’s computing to the total computing, i.e., \(\eta _p\), and the more uniformly the process’s computing is distributed, the better the load balance is. Figure 13 shows that the dynamic hybrid strategy has better load balancing compared to the hybrid strategy. This is because under the hybrid strategy, due to the different computing times for different underwater acoustic fields, the final computing time is likely to have a huge difference even though the number of underwater acoustic fields is the same, whereas the dynamic hybrid strategy employs the idea of dynamic scheduling in allocating acoustic fields, which greatly improves the load imbalance lead by the lack of a priori knowledge. Under 9 processes, corresponding to Fig. 13a and b, the load distribution of the processes under the hybrid strategy ranges from 7.8% to 16.7%, with a range of 8.9%, while the load distribution of the processes under the dynamic hybrid strategy ranges from 10.4% to 13.5%, with a range of only 3.1%. Under 10 processes, corresponding to Fig. 13c and d, the load distribution of the processes under the hybrid strategy ranges from 6.5% to 15.7%, with a range of 9.2%, while the load distribution of the processes under the dynamic hybrid strategy ranges from 9.4% to 12.5%, with a range of only 3.1%.
Range and standard deviation can characterize the degree of dispersion of a set of data from the center value, therefore, we analyzed the range and standard deviation of the load distribution for various test cases and the results are shown in Table 5 and Fig. 14.
In all the tests of this paper, the introduction of the dynamic scheduling optimization algorithm significantly reduces the Range and Standard deviation of the inter-process load distribution. Statistics of the results in Table 5 show that the Range of the dynamic hybrid strategy is only 29.77% of the hybrid strategy on average, and the Standard deviation is only 28.84% of the hybrid strategy on average, which further proves that the dynamic scheduling optimization algorithm has a significant optimization effect on the load balancing.
Within a single node, the process/thread configuration scheme will have an impact on the acceleration effect of the dynamic hybrid strategy. Since OpenMP requires threads within a process to share memory space, process*threads is constant equal to the number of computing cores(24), when the number of processes increases, the number of threads decreases, and the communication overhead between the processes increases, but it is conducive to optimizing the load balance, and the optimal acceleration effect can be obtained when the optimal balance between the communication overhead and the load balance is found. In this paper, the processes/threads in a single node are set to 1/24, 2/12, 3/8, 4/6, 6/4, 8/3 and 12/2, and the computing time is shown in Fig. 15. When processes/threads is 3/8, the underwater multi-acoustic fields computing model has the shortest computing time, which is only 546.23s, with a Speedup of 112.13. At this point, the inter-process communication time and load balance reach the optimal point.
Summary
This paper summarizes the work in the following three points:
-
Based on the underwater acoustic propagation ray theory, an underwater multi-acoustic fields computing model is established. By efficiently organizing the internal variable storage structure and optimizing the algorithm flow, the underwater multi-field computing model overcomes the drawbacks of the traditional underwater acoustic propagation model, which can only compute a single acoustic field. The accuracy of the model is verified by comparing the acoustic propagation loss results of Munk waveguide case and Arctic waveguide case.
-
We analyzed the concurrency of each level of the underwater multi-acoustic fields computing model, and determined the two-level concurrency of the underwater acoustic fields solving loop and rays tracing loop. And We designed three parallel computing strategies (OpenMP-based strategy/MPI-based strategy/hybrid strategy) for the underwater multi-acoustic fields computing model, then computed 240 underwater acoustic fields around the Bashi Channel, and tested the parallel computing strategies on the Tianhe-II HPC system. The results show that the OpenMP-based strategy can achieve efficient Speedup within a single node, and Speedup of 17.68 can be achieved under 24 cores; the MPI-based strategy can be extended to cluster systems, but the parallel efficiency is very low; the hybrid strategy combines the scalability of MPI and the efficient communication of OpenMP, and a better Speedup can be achieved, and a Speedup of 75.72 can be achieved under 240 cores.
-
In order to overcome the load imbalance problem existing at the process level, a dynamic scheduling optimization algorithm is introduced at the process level, and an extra core is added to control the allocation and reception of underwater acoustic field tasks. Test results show that the dynamic scheduling optimization algorithm significantly optimizes the load balance of underwater multi-acoustic fields computing model, thus reducing the computing time of the model. And compared with the hybrid strategy, the dynamic hybrid strategy’s Speedup increases by 39.79% on average, the range of the load distribution is only 29.77% of the hybrid strategy on average, and the standard deviation is only 28.84% of the hybrid strategy on average. We further analyze the impact of the process/thread configuration scheme on the Speedup, and find that the dynamic hybrid strategy has the best Speedup when the processes/threads on Tianhe-II HPC system is 3/8, with a Speedup of 112.13.
Due to the limited capacity of the authors, the research in this paper still has many shortcomings, and the authors believe that there are three efforts worth exploring:
-
Establish a computing model of underwater multi-acoustic fields suitable for low-frequency sound sources and design efficient parallel strategies. Despite the fact that current sonar operates at medium and high frequency, low frequency acoustic waves are still one of the options due to their advantage of weak attenuation under water.
-
Design a heterogeneous parallel computing strategy based on CPU+GPU for underwater acoustic field computing. Currently, the peak computing value of GPU has exceeded that of CPU, which possesses a great advantage in specific computing modes. Considering the characteristics of underwater acoustic field, make full use of the capabilities of CPU and GPU to maximize the acceleration effect.
-
Research efficient route planning algorithm based on underwater acoustic fields information. After obtaining the underwater acoustic fields, the acoustic effect of the search path can be added to the objective function and combined with an efficient optimization algorithm for fast route planning.
Data availibility
The CORA v2.0 dataset is available from the official website at https://mds.nmdis.org.cn/, and the DBDB dataset is available from the corresponding author on reasonable request.
References
Sumithra, G., Ajay, N., Neeraja, N. & Adityaraj, K. Hybrid acoustic system for underwater target detection and tracking. Int. J. Appl. Comput. Math.9, 1–20. https://doi.org/10.1007/s40819-023-01621-4 (2023).
Collins, M. D., Turgut, A., Menis, R. & Schindall, J. A. Acoustic recordings and modeling under seasonally varying sea ice. Sci. Rep.9, 8323. https://doi.org/10.1038/s41598-019-44707-0 (2019).
Tsuchiya, T., Ochi, H., Naoi, J. & Shibata, K. Evaluation of the performance of deep sea survey sonars from the results of search for a sunken ship. Jpn. J. Appl. Phys.38, 3370–3373. https://doi.org/10.1143/JJAP.38.3370 (1999).
Jensen, F. B., Kuperman, W. A., Porter, M. B., Schmidt, H. & Tolstoy, A. Computational Ocean Acoustics 2nd edn. (Springer, Berlin, 2011).
Liu, W. et al. A three-dimensional finite difference model for ocean acoustic propagation and benchmarking for topographic effects. J. Acoust. Soc. Am.150, 1140–1156. https://doi.org/10.1121/10.0005853 (2021).
Ji, B. Middle and High Frequency Transducers with Ultra-wide Operation Band. Ph.D. thesis, Harbin Engineering University (2022).
Porter, M. B. & Bucker, H. P. Gaussian beam tracing for computing ocean acoustic fields. J. Acoust. Soc. Am.82, 1349–1359. https://doi.org/10.1121/1.395269 (1987).
Rodr, O. C. et al. Seismo-acoustic Ray Model benchmarking against experimental tank data. J. Acoust. Soc. Am.132, 709–717. https://doi.org/10.1121/1.4734236 (2012).
Chen, L. & Peng, Z. MPI parallel computation of sound field based on gaussian beam model. Tech. Acoust.30, 34–36 (2011).
Lee, K., Seong, W., Woojae, & Na, Y. Split-step padé solver for three-dimensional cartesian acoustic parabolic equation in stair-step representation of ocean environment. J. Acoust. Soc. Am.146, 2050–2057 (2019).
Zhang, C. & Sun, B. Parallel computation of sound field based on beam tracing model bellhop. J. Appl. Acoust.38, 1–7. https://doi.org/10.11684/j.issn.1000-310X.2019.01.001 (2019).
Xing, C.-X., Song, Y., Zhang, W., Meng, Q.-X. & Piao, S.-C. Parallel computing method of seeking 3d eigen-rays with an irregular seabed. In 2013 IEEE/OES Acoustics in Underwater Geosciences Symposium, vol. 1, 1–5 (IEEE, 2013).
Zho, Y., Luo, W. & Wu, S. Message passing interface parallel optimization of 3d sound propagation model bellhop3d. J. Appl. Acoust.42, 93–99. https://doi.org/10.11684/j.issn.1000-310X.2023.01.012 (2023).
Calazan, R. M., Rodríguez, O. C. & Jesus, S. M. Numerical enhancements and parallel gpu implementation of a 3d gaussian beam model. In Computational Science and Its Applications - ICCSA 2020, vol. 1, 485–500 (IEEE, 2020).
Lan, Q., Ma, S. & Piao, S. Multi-core parallelization and performance evaluation of bellhop3d model. Tech. Acoust.43, 468–473. https://doi.org/10.16300/j.cnki.1000-3630.2024.04.003 (2024).
Fan, X., Supinski, B. R., Sinnen, O. & Giacaman, N. OpenMP: Conquering the Full Hardware Spectrum (Springer, Berlin, 2019).
Nielsen, F. Introduction to HPC with MPI for Data Science (Springer, Berlin, 2016).
Jifar, L. M. Structured programming in MPI: A streaming framework experiment (Scholars’ Press, Atlanta, 2013).
Wen, S. et al. Ppat: A thread performance analysing tool for pthread parallel application. Comput. Appl. Softw.29, 43–47. https://doi.org/10.3969/j.issn.1000-386x.2012.11.012 (2012).
Zhu, J. et al. Hitchhiker: Accelerating oram with dynamic scheduling. IEEE Trans. Comput.72, 1–15. https://doi.org/10.1109/TC.2023.3248272 (2023).
Peng, Y. et al. Dynamic client scheduling enhanced federated learning for UAVS. IEEE Wirel. Commun. Lett.2024, 1. https://doi.org/10.1109/LWC.2024.3400813 (2024).
Zhai, B. D. A dynamic task scheduling algorithm for airborne device clouds. Int. J. Aerospace Eng.1–17, 2024. https://doi.org/10.1155/2024/9922714 (2024).
Peng, Y. L. D. H. F. W. Lag-based schedulability analysis for preemptive global EDF scheduling with dynamic cache allocation. J. Syst. Architect.147, 103045. https://doi.org/10.1016/j.sysarc.2023.103045 (2024).
Zhang, M. X. M. Z. Genetic programming and reinforcement learning on learning heuristics for dynamic scheduling: A preliminary comparison. IEEE Comput. Intell. Mag.19, 18–33. https://doi.org/10.1109/MCI.2024.3363970 (2024).
Author information
Authors and Affiliations
Contributions
Siyuan Liao conducted the experiment and wrote the manuscript, Wenbin Xiao and Yongxian Wang provided technical guidance and laboratory equipment. All authors reviewed the manuscript. And this work was supported by Hu’nan Provincial Natural Science Foundation* under Grant No. 2024JJ5409 and the National Natural Science Foundation of China* under Grant No. 51709267.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liao, S., Xiao, W. & Wang, Y. Dynamic hybrid parallel computing of the Ray Model for solving underwater acoustic fields in vast sea. Sci Rep 14, 25385 (2024). https://doi.org/10.1038/s41598-024-76564-x
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41598-024-76564-x
This article is cited by
-
Underwater acoustic target recognition based on multi-scale feature and CRDNet
The Journal of Supercomputing (2025)