WO2013033651A1 - Équation intégrale d'onde élastique pour traitement de données 3d sur gpgpu - Google Patents
Équation intégrale d'onde élastique pour traitement de données 3d sur gpgpu Download PDFInfo
- Publication number
- WO2013033651A1 WO2013033651A1 PCT/US2012/053546 US2012053546W WO2013033651A1 WO 2013033651 A1 WO2013033651 A1 WO 2013033651A1 US 2012053546 W US2012053546 W US 2012053546W WO 2013033651 A1 WO2013033651 A1 WO 2013033651A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- wave
- stress
- velocity
- lag
- Prior art date
Links
- 238000012545 processing Methods 0.000 title claims abstract description 39
- 238000000034 method Methods 0.000 claims abstract description 56
- 238000003384 imaging method Methods 0.000 claims abstract description 50
- 238000000354 decomposition reaction Methods 0.000 claims abstract description 17
- 230000001902 propagating effect Effects 0.000 claims abstract description 6
- 230000015654 memory Effects 0.000 claims description 99
- 239000002245 particle Substances 0.000 claims description 17
- 230000006870 function Effects 0.000 claims description 12
- 238000004364 calculation method Methods 0.000 description 19
- HPTJABJPZMULFH-UHFFFAOYSA-N 12-[(Cyclohexylcarbamoyl)amino]dodecanoic acid Chemical compound OC(=O)CCCCCCCCCCCNC(=O)NC1CCCCC1 HPTJABJPZMULFH-UHFFFAOYSA-N 0.000 description 17
- 125000001475 halogen functional group Chemical group 0.000 description 15
- 230000008569 process Effects 0.000 description 8
- 238000004891 communication Methods 0.000 description 7
- 230000008901 benefit Effects 0.000 description 4
- 229930195733 hydrocarbon Natural products 0.000 description 4
- 230000003287 optical effect Effects 0.000 description 4
- 238000004088 simulation Methods 0.000 description 4
- 239000004215 Carbon black (E152) Substances 0.000 description 3
- 239000008186 active pharmaceutical agent Substances 0.000 description 3
- 230000027455 binding Effects 0.000 description 3
- 238000009739 binding Methods 0.000 description 3
- 238000004422 calculation algorithm Methods 0.000 description 3
- 150000002430 hydrocarbons Chemical class 0.000 description 3
- 238000013507 mapping Methods 0.000 description 3
- 238000005259 measurement Methods 0.000 description 3
- 238000012546 transfer Methods 0.000 description 3
- 238000012800 visualization Methods 0.000 description 3
- 230000001133 acceleration Effects 0.000 description 2
- 238000006073 displacement reaction Methods 0.000 description 2
- 238000009826 distribution Methods 0.000 description 2
- 239000012530 fluid Substances 0.000 description 2
- 238000013508 migration Methods 0.000 description 2
- 230000005012 migration Effects 0.000 description 2
- 230000006855 networking Effects 0.000 description 2
- 230000008520 organization Effects 0.000 description 2
- 230000002093 peripheral effect Effects 0.000 description 2
- 230000000644 propagated effect Effects 0.000 description 2
- 230000035945 sensitivity Effects 0.000 description 2
- 230000001360 synchronised effect Effects 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 238000009530 blood pressure measurement Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000004590 computer program Methods 0.000 description 1
- 230000008878 coupling Effects 0.000 description 1
- 238000010168 coupling process Methods 0.000 description 1
- 238000005859 coupling reaction Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 239000002360 explosive Substances 0.000 description 1
- 238000007667 floating Methods 0.000 description 1
- 238000009499 grossing Methods 0.000 description 1
- 125000001183 hydrocarbyl group Chemical group 0.000 description 1
- 229910052500 inorganic mineral Inorganic materials 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 230000005055 memory storage Effects 0.000 description 1
- 239000011707 mineral Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01V—GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
- G01V1/00—Seismology; Seismic or acoustic prospecting or detecting
- G01V1/28—Processing seismic data, e.g. for interpretation or for event detection
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01V—GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
- G01V2210/00—Details of seismic processing or analysis
- G01V2210/60—Analysis
- G01V2210/67—Wave propagation modeling
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01V—GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
- G01V2210/00—Details of seismic processing or analysis
- G01V2210/60—Analysis
- G01V2210/67—Wave propagation modeling
- G01V2210/679—Reverse-time modeling or coalescence modelling, i.e. starting from receivers
Definitions
- TITLE FULL ELASTIC WAVE EQUATION FOR 3D DATA PROCESSING ON GPGPU
- the disclosure is related to 3D data processing of physical parameters.
- Applications include seismic exploration for oil and gas, and more particularly to processing and displaying seismic data attributes of the earth's subsurface.
- Seismic exploration for hydrocarbons generally is conducted using a source of seismic energy and receiving and recording the energy generated by the source using seismic detectors.
- the seismic energy source may be an explosive charge or another energy source having the capacity to impart impacts or mechanical vibrations at or near the earth's surface.
- Seismic waves generated by these sources travel into the earth subsurface and are reflected back from boundaries and reach the surface of the earth at varying intervals of time depending on the distance traveled and the characteristics of the subsurface material traversed.
- the return waves are detected by the sensors and representations of the seismic waves as representative electrical signals are recorded for processing.
- Data acquisition for oil exploration may have a negative impact on the environment.
- the impact of oil exploration methods on the environment may be reduced by using low-impact methods and/or by narrowing the scope of methods requiring an active source, including reflection seismic and electromagnetic surveying methods.
- Geophysical and geological methods are used to determine well locations. Expensive exploration investment is often focused in the most promising areas using relatively slow methods, such as reflection seismic data acquisition and processing. The acquired data are used for mapping potential hydrocarbon-bearing areas within a survey area to optimize exploratory well locations and to minimize costly nonproductive wells.
- the time from mineral discovery to production may be shortened if the total time required to evaluate and explore a survey area can be reduced by applying selected methods alone or in combination with other geophysical methods. Some methods may be used as a standalone decision tool for oil and gas development decisions when no other data is available. Preferable methods will be economical, have a low environmental impact, and relatively efficient with rapid data acquisition and processing.
- a computerized method for imaging subsurface energy sources comprising acquiring multicomponent seismic data, arranging 3D volumes of velocity and stress parameters in a global computer memory by ordering adjacent planes from like-indexed blocks extracted from the 3D parameter volumes so that like-indexed blocks are adjacent in the computer memory, reverse propagating the multicomponent seismic data to obtain a reverse time wave field and solving a full elastic wave equation for the reverse time wave field using a graphics processing unit to obtain wave field decomposition data.
- the method may also include applying at least one imaging condition to the wave field decomposition data to obtain imaging data and/or storing the imaging data for display.
- Fig. 1 illustrates an elementary 2D cell or grid point
- Fig. 2 illustrates 16 grid points, a thread-block grid
- Fig. 3 illustrates relationships of stresses and velocities for computations
- Fig. 4 illustrates the comparison of a 2D cell or grid point with a 3D cell or grid point representing stresses and velocities for computations
- Fig. 5 illustrates a 3D cell or grid point representing stresses and velocities for computations indexed using x, y and z as well as 1, 2 and 3;
- Fig. 6 illustrates a thread-block grid with a halo for the stress update calculations
- Fig. 7 illustrates a thread-block grid with a halo for the velocity update calculations
- Fig. 8 illustrates an example global memory address configuration relative to a 2D calculation
- Fig. 9 illustrates a global memory address configuration with the element sizes equal
- Fig. 10 illustrates a global memory address configuration for thread-block grids
- Fig. 11 illustrates a visualization of the 3D Elements arranged into blocks to facilitate rapid calculations
- Fig. 12 illustrates a block memory organization schematic relative to organizing data planes to facilitate rapid calculations
- Fig. 13 illustrates the relationship between GPU memory threads and the planes of data for elements for each block
- Fig. 14 illustrates a GPU memory structure showing the relationship of data blocks to data planes to elements to threads
- Fig. 15 illustrates pseudo code for a non-limiting embodiment of the present invention
- Fig. 16a, 16b and 16c illustrate various subsurface related images according non- limiting embodiments of the present invention.
- Fig. 17 illustrates a computer system for implementing various non-limiting embodiments of the present invention.
- GPGPU General purpose graphics processing units
- This disclosure includes descriptions of embodiments to arrange data to be efficiently processed with GPGPU systems.
- 3D problem An example of a 3D problem is used to illustrate how 3D data may be arranged to efficiently take advantage of the computing power available on a GPGPU system.
- the 3D problem used as an example is a time reverse imaging example
- Information to enable the determination of the location of underground fluids, underground fluid reservoirs or seismic source locations may be extracted from seismic waves and vibrations measured at the earth's surface using various imaging conditions in data processing. These waves may be measured using passive or active seismic data acquisition methods.
- One or more sensors are used to measure vertical and horizontal components of motion due to background seismic waves at multiple locations within a survey area. These components may be measured separately or in combination and may be recorded as signals representing displacement, velocity, and/or acceleration.
- the sensor equipment for measuring seismic waves may be any type of seismometer. Seismometer equipment having a large dynamic range and enhanced sensitivity compared with other transducers may provide the best results (e.g., multicomponent earthquake seismometers).
- a number of commercially available sensors utilizing different technologies may be used, e.g. a balanced force feed-back instrument or an electrochemical sensor. An instrument with high sensitivity at very low frequencies and good coupling with the earth enhances the efficacy of the method.
- CUD A Computer Unified Device Architecture
- NVIDIA Network-to-Network Interface
- CUDA Compute Unified Device Architecture
- kernels, or C-like functions that are executed many times in parallel by a number of CUDA threads.
- CUDA threads are extremely lightweight, and have very little creation overhead and are fast-switching. CUDA uses thousands of threads to achieved high efficiency and high throughput, whereas, current multi-core CPU may utilize only a few threads.
- a kernel is executed by an array of threads in parallel, and threads may be arranged into one, two or three dimensional blocks. Thread blocks must be able to execute independently at any order, parallel or in series. The thread and block IDs are used as a convenient index to compute memory addresses and make control decisions. Threads can access multiple memories spaces-per thread local memory, per thread block shared memories and device global memories.
- the number of blocks a multiprocessor can process at once depends on how many registers per thread and how much shared memory per blocks are required for a given kernel, because the register and shared memories are scarce and split among all the threads.
- TRI time reverse imaging
- the CUDA implementation uses a finite difference full elastic wave propagator, wave field decomposition, and six imaging conditions that compute absolute particle velocities and P-wave and S-wave auto correlations. Memory reads and writes may optimized by using texture bindings. Initial and/or boundary condition calculations may be built into the GPU update kernels resulting in substantial increases in elements per second throughput.
- Time reverse imaging is a migration-like algorithm to locate subsurface sources or diffractions by propagating data, which is reversed in time through a velocity model. With no down-going source wave field, the algorithm may be roughly considered as approximately the data half of reverse-time migration that may be used in conventional seismic data processing.
- one chain of operations is: elastic finite difference propagation, wave-field decomposition via spatial derivatives, and evaluation of up to six correlation based imaging conditions.
- Propagation time is the exterior loop for all operations, and images are in physical space.
- An illustration that may be used to evaluate the accuracy and performance of the GPU implementation is forward modeled data due to a subsurface vertical single force propagating through a homogeneous medium with an image showing focusing of the P and S wave modes at the appropriate location in space.
- TRI is an ideal application of a real world 3D problem for the GPGPU, since elastic wave propagation, wave field decomposition and updating imaging conditions are all computationally intensive. The computations may be performed entirely on the GPU, and host (CPU), since communication may be minimally required-only for initialization, and for writing back the final solution.
- pu f + ( ⁇ + 2 ⁇ ) ⁇ ( ⁇ ⁇ u) - ⁇ X (V X u).
- the elastic wave equation Equation 1
- Equation 1 may be discretized using the centered finite difference technique.
- the elastic wave equation can be numerically propagated in time according to these following equations, which are first order in both time and space:
- Density is presented by p, ⁇ and ⁇ are the Lame coefficients, V % and 3 ⁇ 4 f are the horizontal and vertical particle velocities, ⁇ ⁇ and ⁇ ⁇ are the horizontal and vertical stresses, and ⁇ ⁇ is the shear stress, all at time steps t.
- V % and 3 ⁇ 4 f are the horizontal and vertical particle velocities
- ⁇ ⁇ and ⁇ ⁇ are the horizontal and vertical stresses
- ⁇ ⁇ is the shear stress, all at time steps t.
- These velocities and stresses can be conveniently arranged onto a staggered grid. To update the state of any element on the staggered grid, only four neighboring elements— top, bottom, left, and right— are needed in the 2D case.
- a free-surface (stress free) boundary condition is imposed for one side (the presumed surface) and tapered absorbing layers for the other three sides.
- Fig. 1 illustrates a 2D cell, or grid node, staggered finite grid for full elastic wave propagation.
- the horizontal velocity is Vx.
- the vertical velocity is Vz.
- the horizontal stress is Sxx (the same as ⁇ ⁇ in the equations above).
- the vertical stress is Szz and the cross stress is Sxz.
- Fig. 2 illustrates 16 grid nodes of 2D cells.
- Fig. 3 illustrates the relationships of each the stresses and velocities for calculations of equations 2 through 6. Note that the symbols 'S' and ' ⁇ ' are used interchangeably in the text, equations and figures, but represent exactly the same thing.
- Fig. 4 illustrates taking these concepts to the 3D case.
- a third dimensional velocity, Vz, and stress Sz have been added along with cross stresses Sxz and Syz.
- Fig. 5 illustrates the 3D case of Fig. 4 using numerical indices.
- GPUs are designed for floating point operations. The time spent on calculation is small compared to the time spent accessing data and updating results in memory. Having an 10 access pattern where data are aligned for appropriately coalesced and optimized reads and writes is essential for a fast efficient implementation of 3D data parameterizing real world situations.
- stresses and velocities are each separated into a linear array in memory.
- N- by M-element model there are (N+l) x MV X elements, N x (M+1)V Z elements, N x ⁇ ⁇ and ⁇ ⁇ elements and (N-l) x (M-l) ⁇ ⁇ elements.
- This arrangement keeps memory blocks relatively small to minimize memory paging. Smaller memory blocks are also easier to align with GPU memory warps to ensure coalesced reads and writes from a given thread. If the size of an array is not a factor of the half warp, it is padded to the next half warp factor.
- a warp is the number of parallel threads that are created, managed, scheduled and executed in a group by GPU's SIMT (single instruction, multiple-thread) unit.
- SIMT single instruction, multiple-thread
- the graphics architecture for the example implementation disclosed herein has a warp size of 32, but this is not a limitation of the embodiments disclosed here, since the embodiments may be implemented in different architectures and with different warp ranges.
- One possibility for implementation of 2D seismic data processing is to take advantage of CUDA built-in texture memory support by binding each array to texture and by using texture fetches to read data. Since there is a localized access pattern (top, bottom, left, right neighbors), threads of the same warp always read texture addresses that are close together. Texture memory space is cached on hardware level and the latency of addressing calculation is hidden better through thread parallelization, so there are significant speedups for data loaded with texture fetches compared to direct global memory reads. Having each thread load the data as needed through texture fetch is faster than first putting all needed data for a block into shared memory. Even though this implementation, on average, has twice the read redundancy, it keeps the memory usage for each thread to only a few registers, and thus allows more threads to run in parallel, which results in more rapid processing.
- CUDA's capabilities depend on the GPU type. This disclosure uses as an example the specifications of compute capability of 1.3 GPUs (e.g. GTX 285, Tesla C1060), for the examples, but the methodology is straightforward to generalize to all other GPU types.
- CUDA launches its threads in thread-blocks, configurable to be one to three dimensions, in this example a maximum 512 x 512 x 64 for each dimension and maximum 512 total threads. These blocks can be arranged in grids, configurable to be one to two dimensions, maximum 65535 x 65535.
- the thread-blocks are arranged in a configuration of 16 by 16, resulting in 256 total threads, in a grid of nx / 16 by nz / 16, resulting in nx * nz / 256 total blocks.
- a maximum of 4 blocks can be running on 1 multiprocessor at the same time. This is due to the way instructions are executed:
- the blocks for this example implementation have 256 threads each, which is why a multiprocessor can manage 4 at a time. Given 30 multiprocessors, this results in 120 active threadblocks, or 30720 threads. Again, these numbers are for example only, it is the data arrangement that will lead to increased efficiencies.
- the number of active threads (30720) cannot be mapped directly to the number of elementary cells processed in parallel.
- a multiprocessor has 32 warps with 32 instructions each scheduled, it executes one warp at a time.
- global memory IO operations take up hundreds of clock cycles and the multiprocessor uses this time to process other instructions. This way, the memory fetch latency can be "hidden" or accommodated to a certain amount.
- a multiprocessor also has on-chip shared memory, which (in this example) is 16 KB in size and can be used by all threads inside a thread-block. The good thing about this memory is that it is very fast, a drawback can be that there may be a limited amount, so it must be carefully allocated.
- the shared memory is also partitioned among thread-blocks, so the 16KB may be divided by 4 simultaneously running thread-blocks, allowing 4 KB to each.
- Fig. 6 illustrates how the halo outside and adjacent the thread block is juxtaposed in relation to each thread-block grid for the stress update calculation.
- Velocities including the velocity data from the halo zones that are used to update the stress calculations are loaded from global memory to the shared (local) memory.
- the stresses are loaded from global memory to local registers for the update calculations.
- the updated stress calculations are then written from the local registers back to global memory.
- Fig. 7 illustrates how the halo is juxtaposed in relation to each thread- block grid for the velocity update calculation. Stresses, including the stress data from the halo zones that are used to update the velocity calculations are loaded from global memory to the shared (local) memory. The velocities are loaded from global memory to local registers for the update calculations. The updated velocity calculations are then written from the local registers back to global memory.
- Fig. 8 illustrates a global memory address configuration.
- aligning data in this way poses three major problems: 1) Operands (the variables V x , V y , ⁇ ⁇ , ⁇ ⁇ and ⁇ ⁇ ) are potentially scattered all over global memory, so to update V x from ⁇ ⁇ and ⁇ ⁇ , potentially large distances in memory space may need to be traversed, this applies to all 5 equations (equations 2 to 6); 2) Operands for a dz operation are at least nx-1 addresses apart; 3) The element sizes are different: V x is nx+1 by nz, V z is nx by nz+1, ⁇ ⁇ and ⁇ ⁇ are nx by nz and ⁇ ⁇ is nx-1 by nz-1.
- Fig. 9 illustrates a global memory address configuration with the element sizes equal.
- Fig. 10 illustrates a global memory address configuration for thread-block grids. For simplicity, the thread block grid has nine sectors, but it will be appreciated thread block grids usually have a larger number of sectors.
- the data are accessed by thread-blocks, which in this example is a team of 16 by 16 threads.
- the data are arranged such that all a thread-block will ever need (except halos) is in one place, globally arranged block after block.
- a block would contain portions of V x , V y , ⁇ ⁇ , ⁇ ⁇ and ⁇ ⁇ , exactly the corresponding data that thread-blocks require.
- a thread block "knows" its coordinates inside the thread-blocks grid (gx, gy) and can use the coordinates to access its relevant data.
- TRI processing includes computing imaging conditions. Imaging conditions collapse the propagation time axis into physical space to allow visualization of wave-field focusing. The absolute particle velocity and P- and S- Wave potentials may be computed after elastic propagation. The equations used are:
- the like imaging conditions include i) the zero-lag of the P-wave autocorrelation, ii) the zero-lag of the S-wave autocorrelation, iii) the zero-lag of the cross-correlation of the P- and S-wave energy densities, iv) autocorrelation of the absolute value of particle motion, v) maximum over all time, and vi) the crosscorrelation of the energy density functions EpEs. Additionally, these imaging conditions may be generally applied to any of: i) particle velocity measurements, ii) particle acceleration measurements, iii) particle pressure measurements and iv) particle displacement measurements.
- the imaging conditions are updated on every grid point at every time step by reading the previous time-step value from global memory, calculating the incremental value, and then writing the updated value back to global memory.
- the propagator consists of two kernels, one to update stress elements ( ⁇ ⁇ , ⁇ ⁇ , ⁇ ⁇ ), and the other to update velocities (V X ,V Z ). For each time step, stress is updated first, then velocity. Each thread updates either the stresses or the velocities at a given array element location.
- the free surface condition is achieved by setting the stress value to zero at the surface boundary during the velocity pass.
- the absorbing boundaries may be applied as a Cosine taper function to the other three sides at the stress pass.
- Wave field decomposition and imaging conditions are computed in a separate kernel from the propagation.
- the computational complexity is on par with elastic element updates, so roughly half of the computation time is spend on elastic element updates, the other half on wave field decomposition and updating imaging conditions.
- the propagator may also be used for forward time modeling by recording particle velocities at desired locations.
- Fig. 11 illustrates a GPU data arrangement, which is a visualization of 3D elements (e.g. Velocities, Stresses and Cross Stresses).
- 3D elements e.g. Velocities, Stresses and Cross Stresses.
- the most critical optimization parameter to realize the significant performance capabilities of GPU cards is to minimize the stride-length of memory skips across the ID memory array streaming through each processing thread.
- the size of each spatial volume in the model domain is also limited. Both issues are addressed by an appropriate domain-decomposition strategy focused on light-weight 2D planes of wave field and parameter values. Therefore, small planes (P0 to Pn-1, see Fig. 12) from the propagation domain is sequentially stored in memory for every wave field and parameter field, followed by the exterior loop over the third spatial dimension.
- This set of planes defines a Thread Block domain, and a set of blocks contains the entire model domain, consisting of N Elements, each nl by n2 by n3. Note that the wave and parameter fields are consecutive planes before considering the third spatial dimension.
- Fig. 13 illustrates the distribution of values in GPU memory, though adds some of the concepts of 3D spatial structure for an alternative view of the organization utilized.
- the "speed" of the axis is then determined by the distance between neighboring elements:
- the fast axis has its elements neighboring directly, the middle axis has nl elements between neighbors and the slow axis has nl*n2 elements between neighbors.
- Thread-Block GPU threads are organized into a Thread-Block. Functions on the GPU are executed by Thread-Blocks, meaning that the instructions defined in the functions are executed by a group of threads at once (almost).
- the size of the Thread-Block is configurable in one, two or three dimensions.
- the number of Thread-Blocks scheduled is also configurable in one, two or three dimensional grids.
- the grid will be configured Nl / BLOCK_DIM by N2 / BLOCK_DIM, where Nl and N2 are nl and n2 padded to multiples of BLOCK_DIM.
- nl Length of fast axis, or axis 1.
- n2 Length of middle axis, or axis 2.
- n3 Length of slow axis, or axis 3.
- threadldx.x When executing a function on the GPU, a Thread-Block has access to its grid and block coordinates. The block coordinates, threadldx.x and threadldx.y, inform a thread, where in the block it resides. This is used to access the proper addresses in the nl-n2 plane.
- FIG. 15 shows an example of pseudocode for GPU based 2D TRI. Threads are synchronized between each kernel to ensure all the values are updated for the entire simulation space before moving on to the next calculation.
- Memory is allocated on the host (CPU) and GPU device for V x , V z , ⁇ ⁇ , ⁇ ⁇ and ⁇ ⁇ , sources and imaging conditions. Sources, as used herein, are the seismic data input to the TRI.
- the operands V x , V z , ⁇ ⁇ , ⁇ ⁇ and ⁇ ⁇ are initialed and source data are loaded. Data are transferred from the host CPU to the GPU device.
- An example of a physical model is -15,000 m long by -4,000 m deep, discretized into a 10m x 10m grid, with 11 calculations per time step, resulting in -6,600,000 elements. Threads are organized into a 2D grid of blocks, each block consisting of 16 x 16 threads. With this block/grid arrangement, the thread and block IDs are used to index into the simulation space following the CUDA paradigm.
- Fig. 16a illustrates Horizontal Velocity Data as recorded by surface receivers (Ricker Wavelet, fundamental frequency 3 Hz) recorded at every grid point at the surface from a forward model of a single vertical point source.
- Fig. 16b illustrates a forward model of a single vertical point source. Receiver data is reversed in time and propagated.
- Fig. 16c shows the TRI result, which highlights the source location using the P-S-Wave Potential Cross Correlation Imaging Condition 3 ⁇ 4 3 ⁇ 4 . All simulations performed on GPU.
- N x and N z are numbers of horizontal and vertical grid points and N is the number of time steps.
- Tj and T c are Java and CUDA execution time measured in seconds.
- Ej and E c are Java and CUDA computational throughputs, in million elements per second. They are calculated by Nx x Nz x Nt x (5+6) / runtime. The factor of (5 + 6) is because there are 5 elastic properties (or Elements) and 6 imaging conditions to update at every time step in the 2D case.
- the CUDA implementation of TRI which contains six imaging conditions and a finite difference full elastic wave propagation solver, that all runs fully on a GPU. Data are divided into small, manageable pieces to leverage CUDA's built-in texture functions, and ensure coalesced memory addressing from threads to for optimal memory reads and writes.
- the embodiment disclosed herein illustrates a CUDA implementation that achieves a computational throughput of more than 5,000 million elements per second.
- FIG. 11 The mapview of Elements E(n) in Fig. 11 schematically illustrates the data Block arrangement, B0 to B8, so that the same blocks, meaning like-indexed blocks, from other Elements (e.g. volumes of physical parameters) are closer together in memory to enable more rapid data transfer to speed computational speeds.
- Fig. 12 illustrates the data arrangement in memory of adjacent Planes (P0 to Pn-1) and the like-indexed Blocks of Elements are closer together in memory for data transfer efficiency. As illustrated, the adjacent Planes of each Block for all Elements are arranged in memory so that reads and writes are most efficient.
- Fig. 11 The mapview of Elements E(n) in Fig. 11 schematically illustrates the data Block arrangement, B0 to B8, so that the same blocks, meaning like-indexed blocks, from other Elements (e.g. volumes of physical parameters) are closer together in memory to enable more rapid data transfer to speed computational speeds.
- Fig. 12 illustrates the data arrangement in memory of adjacent Planes (P0 to Pn-1) and
- FIG. 13 then is an illustration of this result, where threads are in memory arranged for maximum efficiency.
- Fig. 14 illustrates this data arrangement hierarchy. Starting from the bottom memory linear array the BLOCKS (like-indexed Blocks extracted from the Elements) are arranged in adjacent PLANES of the ELEMENTS. When the ELEMENTS are arranged in this manner, the thread blocks are in memory so that the GPU read and write sequences facilitate the maximally efficient throughput for GPU operations.
- BLOCKS like-indexed Blocks extracted from the Elements
- Fig. 17 illustrates a schematic example of the hardware and operating environment for which embodiments as described herein and their equivalents may be practiced.
- the description of Fig. 17 includes a general description of computer hardware, computing environment or information handling system for which the embodiments may be implemented. Although specific hardware may not be required, embodiments may be implemented in the general context of computer-executable instructions, such as program modules, being executed by a computer. Various embodiments may be practiced with a personal computer, a mainframe computer or combinations that include workstations with servers.
- Program modules include routines, programs, objects, components and data structures for performing tasks, processing data, and recording and displaying information.
- An information handling system is any instrumentality or aggregate of instrumentalities primarily designed to compute, classify, process, transmit, receive, retrieve, originate, switch, store, display, manifest, measure, detect, record, reproduce, handle or utilize any form of information, intelligence or data for business, scientific, control or other purposes. Examples include personal computers and larger processors such as servers, mainframes, etc, and may contain elements illustrated in Fig. 17.
- Embodiments may be practiced with various computer or information handling system configurations that separately or in combination may include handheld devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network computers, minicomputers, mainframe computers, and the like. Embodiments may be practiced with tasks performed in and over distributed computing environments that include remote processing devices linked through a communications network. Program modules operating in distributed computing environments may be located in various memory locations, both local and remote.
- Fig. 17 is illustrative of hardware and an operating environment for implementing a general purpose computing device or information handling system in the form of a computer 10.
- Computer 10 includes a processor or processing unit 11 that may include Onboard' instructions 12.
- Computer 10 has a system memory 20 attached to a system bus 40 that operatively couples various system components including system memory 20 to processing unit 11.
- the system bus 40 may be any of several types of bus structures using any of a variety of bus architectures as are known in the art.
- processing unit 11 While one processing unit 11 is illustrated in Fig. 17, there may be a single central-processing unit (CPU) or a graphics processing unit (GPU), or both or a plurality of processing units.
- Computer 10 may be a standalone computer, a distributed computer, or any other type of computer.
- System memory 20 includes read only memory (ROM) 21 with a basic input/output system (BIOS) 22 containing the basic routines that help to transfer information between elements within the computer 10, such as during start-up.
- System memory 20 of computer 10 further includes random access memory (RAM) 23 that may include an operating system (OS) 24, an application program 25 and data 26.
- OS operating system
- application program 25 application program 25 and data 26.
- Computer 10 may include a disk drive 30 to enable reading from and writing to an associated computer or machine readable medium 31.
- Computer readable media 31 includes application programs 32 and program data 33.
- computer readable medium 31 may include programs to process seismic data, which may be stored as program data 33, according to the methods disclosed herein.
- the application program 32 associated with the computer readable medium 31 includes at least one application interface for receiving and/or processing program data 33.
- the program data 33 may include seismic data acquired according to embodiments disclosed herein.
- At least one application interface may be associated with calculating imaging conditions for locating subsurface hydrocarbon reservoirs.
- the disk drive may be a hard disk drive for a hard drive (e.g., magnetic disk) or a drive for a magnetic disk drive for reading from or writing to a removable magnetic media, or an optical disk drive for reading from or writing to a removable optical disk such as a CD ROM, DVD or other optical media.
- the drive 30 and associated computer-readable media 31 enable nonvolatile storage and retrieval for one or more application programs 32 and data 33 that include computer-readable instructions, data structures, program modules and other data for the computer 10.
- Any type of computer-readable media that can store data accessible by a computer including but not limited to cassettes, flash memory, digital video disks in all formats, random access memories (RAMs), read only memories (ROMs), may be used in a computer 10 operating environment.
- the application programs 32 may be associated with one or more application program interfaces.
- An application programming interface (API) 35 may be an interface that a computer system, library or application provides in order to allow requests for services to be made of it by other computer programs, and/or to allow data to be exchanged between them.
- An API 35 may also be a formalized set of software calls and routines that can be referenced by an application program 32 in order to access supporting application programs or services, which programs may be accessed over a network 90.
- APIs 35 are provided that allow for higher level programming for displaying and mapping subsurface reservoirs.
- APIs are provided for receiving seismic data, and decomposing, merging, smoothing and averaging the data.
- the APIs allow for determining the output of imaging condition processing and storing it for display.
- Serial interface 50 may be a universal serial bus (USB).
- a user may enter commands or data into computer 10 through input devices connected to serial interface 50 such as a keyboard 53 and pointing device (mouse) 52.
- Other peripheral input/output devices 54 may include without limitation a microphone, joystick, game pad, satellite dish, scanner or fax, speakers, wireless transducer, etc.
- Other interfaces (not shown) that may be connected to bus 40 to enable input/output to computer 10 include a parallel port or a game port.
- Computers often include other peripheral input/output devices 54 that may be connected with serial interface 50 such as a machine readable media 55 (e.g., a memory stick), a printer 56 and a data sensor 57.
- a seismic sensor or seismometer for practicing embodiments disclosed herein are nonlimiting examples of data sensor 57.
- a video display 72 e.g., a liquid crystal display (LCD), a flat panel, a solid state display, or a cathode ray tube (CRT)
- a map or subsurface display created from imaging condition values as disclosed herein may be displayed with video display 72.
- a computer 10 may operate in a networked environment using logical connections to one or more remote computers. These logical connections are achieved by a communication device associated with computer 10.
- a remote computer may be another computer, a server, a router, a network computer, a workstation, a client, a peer device or other common network node, and typically includes many or all of the elements described relative to computer 10.
- the logical connections depicted in Fig. 17 include a local-area network (LAN) or a wide-area network (WAN) 90.
- LAN local-area network
- WAN wide-area network
- the computer 10 When used in a networking environment, the computer 10 may be connected to a network 90 through a network interface or adapter 60.
- computer 10 may include a modem 51 or any other type of communications device for establishing communications over the network 90, such as the Internet.
- Modem 51 which may be internal or external, may be connected to the system bus 40 via the serial interface 50.
- a networked deployment computer 10 may operate in the capacity of a server or a client user machine in server-client user network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
- program modules associated with computer 10, or portions thereof may be stored in a remote memory storage device.
- the network connections schematically illustrated are for example only and other communications devices for establishing a communications link between computers may be used.
Landscapes
- Engineering & Computer Science (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Acoustics & Sound (AREA)
- Environmental & Geological Engineering (AREA)
- Geology (AREA)
- General Life Sciences & Earth Sciences (AREA)
- General Physics & Mathematics (AREA)
- Geophysics (AREA)
- Image Processing (AREA)
Abstract
La présente invention concerne un procédé et un système associés à un procédé informatisé d'imagerie de sources d'énergie souterraines comprenant l'acquisition de données sismiques multicomposants, l'agencement de volumes 3D associés à des paramètres de vitesse et de contrainte dans une mémoire informatique générale par classement de plans adjacents issus de blocs quasi-indexés extraits des volumes 3D associés aux paramètres de façon à ce que les blocs quasi-indexés soient adjacents dans la mémoire de l'ordinateur, à procéder à une propagation à rebours des données sismiques multicomposants afin d'obtenir un champ d'onde à rebours et à résoudre une équation intégrale d'onde élastique pour le champ d'onde à rebours au moyen d'une unité de traitement graphique afin d'obtenir des données de décomposition du champ d'onde. Ledit procédé peut également comprendre une étape consistant à appliquer au moins une condition d'imagerie aux données de décomposition du champ d'onde afin d'obtenir des données d'imagerie et/ou à stocker les données d'imagerie à des fins de visualisation.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201161529758P | 2011-08-31 | 2011-08-31 | |
US61/529,758 | 2011-08-31 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013033651A1 true WO2013033651A1 (fr) | 2013-03-07 |
Family
ID=47756936
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2012/053546 WO2013033651A1 (fr) | 2011-08-31 | 2012-08-31 | Équation intégrale d'onde élastique pour traitement de données 3d sur gpgpu |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2013033651A1 (fr) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104635260A (zh) * | 2013-11-08 | 2015-05-20 | 北京瑞拉爱堡地质勘探技术有限公司 | 一种地震数据叠前成像方法 |
CN105807315A (zh) * | 2016-03-14 | 2016-07-27 | 中国石油大学(华东) | 弹性矢量逆时偏移成像方法 |
CN106154331A (zh) * | 2016-06-29 | 2016-11-23 | 中国石油化工股份有限公司 | 正交介质地震波场模拟频散压制方法 |
US10296340B2 (en) | 2014-03-13 | 2019-05-21 | Arm Limited | Data processing apparatus for executing an access instruction for N threads |
CN113945994A (zh) * | 2020-06-30 | 2022-01-18 | 中国石油化工股份有限公司 | 使用有限差分模型进行高速多源加载和波场检索的方法 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040141414A1 (en) * | 2001-03-13 | 2004-07-22 | Conocophillips Company | Method and process for prediction of subsurface fluid and rock pressures in the earth |
WO2010085499A1 (fr) * | 2009-01-20 | 2010-07-29 | Spectraseis Ag | Estimation signal sur bruit dans le domaine image |
US20110115787A1 (en) * | 2008-04-11 | 2011-05-19 | Terraspark Geosciences, Llc | Visulation of geologic features using data representations thereof |
-
2012
- 2012-08-31 WO PCT/US2012/053546 patent/WO2013033651A1/fr active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040141414A1 (en) * | 2001-03-13 | 2004-07-22 | Conocophillips Company | Method and process for prediction of subsurface fluid and rock pressures in the earth |
US20110115787A1 (en) * | 2008-04-11 | 2011-05-19 | Terraspark Geosciences, Llc | Visulation of geologic features using data representations thereof |
WO2010085499A1 (fr) * | 2009-01-20 | 2010-07-29 | Spectraseis Ag | Estimation signal sur bruit dans le domaine image |
Non-Patent Citations (1)
Title |
---|
ARTMAN ET AL.: "Source Location Using Time-Reverse Imaging.", GEOPHYSICAL PROSPECTING., vol. 58, 2010, pages 861 - 873, Retrieved from the Internet <URL:http://www.spectraseis.com/wp-content/uploads/2012/03/Source-location-using-time-reverse-imaging.pdf> [retrieved on 20121127] * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104635260A (zh) * | 2013-11-08 | 2015-05-20 | 北京瑞拉爱堡地质勘探技术有限公司 | 一种地震数据叠前成像方法 |
US10296340B2 (en) | 2014-03-13 | 2019-05-21 | Arm Limited | Data processing apparatus for executing an access instruction for N threads |
CN105807315A (zh) * | 2016-03-14 | 2016-07-27 | 中国石油大学(华东) | 弹性矢量逆时偏移成像方法 |
CN106154331A (zh) * | 2016-06-29 | 2016-11-23 | 中国石油化工股份有限公司 | 正交介质地震波场模拟频散压制方法 |
CN113945994A (zh) * | 2020-06-30 | 2022-01-18 | 中国石油化工股份有限公司 | 使用有限差分模型进行高速多源加载和波场检索的方法 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Abdelkhalek et al. | Fast seismic modeling and reverse time migration on a GPU cluster | |
Weiss et al. | Solving 3D anisotropic elastic wave equations on parallel GPU devices | |
Komatitsch et al. | High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster | |
Martin et al. | Gravity inversion using wavelet-based compression on parallel hybrid CPU/GPU systems: application to southwest Ghana | |
US20140257780A1 (en) | Inversion of Geophysical Data on Computer System Having Parallel Processors | |
US9348044B2 (en) | Vectorization of fast fourier transform for elastic wave propogation for use in seismic underwater exploration of geographical areas of interest | |
CN113945994B (zh) | 使用有限差分模型进行高速多源加载和波场检索的方法 | |
Giroux et al. | Task-parallel implementation of 3D shortest path raytracing for geophysical applications | |
Alkhimenkov et al. | Resolving wave propagation in anisotropic poroelastic media using graphical processing units (GPUs) | |
WO2013033651A1 (fr) | Équation intégrale d'onde élastique pour traitement de données 3d sur gpgpu | |
Xue et al. | An efficient GPU implementation for locating micro-seismic sources using 3D elastic wave time-reversal imaging | |
Deschizeaux et al. | Imaging earth’s subsurface using CUDA | |
Londhe et al. | Adaptively accelerating FWM2DA seismic modelling program on multi-core CPU and GPU architectures | |
Abdelkhalek et al. | Fast seismic modeling and reverse time migration on a graphics processing unit cluster | |
CA2952935A1 (fr) | Interpolation et convolution pre-ordonnancees pour traitement de grilles etagees plus rapide | |
CN109490948B (zh) | 地震声学波动方程矢量并行计算方法 | |
Ma et al. | Integration of a Gaussian quadrature grid discretization approach with a generalized stiffness reduction method and a parallelized direct solver for 3-D frequency-domain seismic wave modelling in viscoelastic anisotropic media | |
US10454713B2 (en) | Domain decomposition using a multi-dimensional spacepartitioning tree | |
US20240142649A1 (en) | Method and system for determining migration data using multiblock gathers | |
Serpa et al. | Energy efficiency and portability of oil and gas simulations on multicore and graphics processing unit architectures | |
Le et al. | A pipeline approach for three dimensional time-domain finite-difference multi-parameter waveform inversion on GPUs | |
Abdelkhalak et al. | Application of high performance asynchronous acoustic wave equation stencil solver into a land survey | |
EP4088144A1 (fr) | Procédé et appareil de réalisation de modélisation efficace de sources sismiques en mouvement à durée étendue | |
Zhang et al. | Microseismic/acoustic emission sources localization method using improved search algorithm for multiple media and irregular complex structures | |
WO2024087126A1 (fr) | Procédé et système de détermination de données de migration au moyen de regroupements en croix et d'un traitement parallèle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 12826772 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28/05/2014) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 12826772 Country of ref document: EP Kind code of ref document: A1 |