US20140278305A1 - Fast Iterative Method for Processing Hamilton-Jacobi Equations - Google Patents
Fast Iterative Method for Processing Hamilton-Jacobi Equations
- Publication number
- US20140278305A1 (U.S. patent application Ser. No. 14/286,549)
- Authority
- US
- United States
- Prior art keywords
- equation
- nodes
- reservoir
- active list
- reference cell
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F17/13—Differential equations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/17—Function evaluation by approximation methods, e.g. inter- or extrapolation, smoothing, least mean square method
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01V—GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
- G01V2210/00—Details of seismic processing or analysis
- G01V2210/60—Analysis
- G01V2210/67—Wave propagation modeling
Definitions
- FIM: Fast Iterative Method
- H-J: Hamilton-Jacobi
- GPUs: graphics processing units
- FIG. 5 shows an example of actual pixel assignment in shared memory.
- FIG. 5A shows a 2D block diagram 500 a with pixel indices (not bank numbers).
- FIG. 5B shows a bank assignment listing 500 b, which lists the bank to which each pixel is actually assigned.
- FIG. 5C shows a snapshot 500 c of a shared memory access pattern when left neighbors are accessed (the same example as the diagram 400 b in FIG. 4). Pixels colored in red are accessed by 16 threads in parallel, and because there is no bank conflict, this memory request can be processed simultaneously.
- the bank assignment technique shown here can be easily extended to 3D cases.
- Reduction is one of the commonly used computational techniques in the streaming programming model to produce a smaller stream from a larger input stream.
- A block may be reduced down to a single pixel that represents the convergence of the block.
- CUDA provides a block-wise thread synchronization mechanism, so a parallel reduction may be performed in a single kernel execution.
- To reduce a block of size n, start with n/2 threads. During each iteration, every thread participating in the reduction reads two convergence values from the current block and writes true or false to one of the original locations (true if both converge, false otherwise). In the next iteration, the number of participating threads is halved and the same reduction is performed. This process is repeated until the block is reduced to a single pixel, as sketched below.
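A minimal CUDA kernel for this block-wise reduction might look like the following sketch; the launch configuration (one 64-thread block per 4x4x4 data block) and the array layout are assumptions for illustration, not the patent's actual code.

    // AND-reduce the 64 per-pixel convergence flags (Cp) of one data block
    // down to a single per-block flag (Cb). One thread block per data block.
    __global__ void reduceConvergence(const bool* Cp, bool* Cb)
    {
        __shared__ bool s[64];
        int tid = threadIdx.x;                      // 64 threads: one per pixel
        s[tid] = Cp[blockIdx.x * 64 + tid];         // load this block's flags
        __syncthreads();

        // Halve the number of participating threads each iteration.
        for (int stride = 32; stride > 0; stride >>= 1) {
            if (tid < stride)
                s[tid] = s[tid] && s[tid + stride]; // true only if both converged
            __syncthreads();
        }
        if (tid == 0)
            Cb[blockIdx.x] = s[0];                  // block-level convergence
    }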
- a parallel H-J solver based on the selective iterative method employs the narrow band approach to keep track of the points to be updated, and iteratively updates the solutions until they converge. Instead of using an expensive sorting data structure to keep the causality, the proposed method uses a simple list to store active points and updates all of them in parallel until they converge. The points in the list can be removed from or added to the list based on the convergence measure.
- the proposed method is simple to implement and runs faster than the existing solvers on a class of convex Hamilton-Jacobi equations. The prototype implementation on the GPU runs roughly fifty to one hundred times faster than the state-of-the-art CPU H-J solvers.
- the GPU implementation provides rapid distance computation on DT-MRI (Diffusion Tensor Magnetic Resonance Imaging) volumes, which makes interactive white matter connectivity analysis feasible.
- the present techniques may be used for seismic wave propagation simulation in an anisotropic speed volume.
- the present techniques may be used in fast geodesic computation on parametric surfaces or volumes.
- the present technique could be used to compute solutions to H-J equations associated with grids (images) of diffusion-weighted or diffusion tensor measurements from magnetic resonance imagery. Solutions to an H-J equation that depends on those measurements can be used to describe paths or subsets (regions) of nodes on the grid that connect specified regions of the brain. Those paths or regions could, in turn, be used to quantify, analyze, or detect neurological or psychiatric disorders.
- the present technique can also be used to analyze the connected quality of a hydrocarbon-bearing reservoir and to obtain the hydrocarbons within the hydrocarbon-bearing reservoir.
- the method includes obtaining a model of a portion or a complete reservoir and dividing the model into cells, each cell having a volume and some attributes.
- the model may be a reservoir model of a hydrocarbon-bearing zone within a subsurface region.
- a speed function may be assigned to a portion of the cells in the model.
- One of the cells may be selected as a reference cell. Following this selection, connectivity between cells is determined by solving an Eikonal equation, describing a propagating front in a heterogeneous medium. The front can progress outward from the reference cell until an ending condition is met.
- the ending condition may be a threshold or convergence limit.
- the Eikonal equation can be solved by a fast iteration method with propagation velocity as a function of spatial position being provided by the speed function. Following the solving operation, the regions of the model may be characterized by their connective quality to the reference cell using the connectivity. Finally, with this analysis, wells may be drilled to extract the hydrocarbons.
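By way of illustration only, the following sketch shows how such a connectivity analysis might be organized. The model layout and all names are hypothetical, and the travel-time routine is a straight-line stand-in at uniform speed; a real implementation would use the fast iterative Eikonal solver described herein.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Model { int nx, ny, nz; std::vector<float> speed; };

    // Stand-in for the FIM Eikonal solution: travel time from reference cell
    // (rx, ry, rz) to every cell, with speed provided per cell.
    std::vector<float> travelTimesFrom(const Model& m, int rx, int ry, int rz)
    {
        std::vector<float> t(m.nx * m.ny * m.nz);
        for (int z = 0; z < m.nz; ++z)
            for (int y = 0; y < m.ny; ++y)
                for (int x = 0; x < m.nx; ++x) {
                    int i = (z * m.ny + y) * m.nx + x;
                    float d = std::sqrt(float((x - rx) * (x - rx) +
                                              (y - ry) * (y - ry) +
                                              (z - rz) * (z - rz)));
                    t[i] = d / m.speed[i];   // placeholder arrival time
                }
        return t;
    }

    int main()
    {
        Model m{16, 16, 16, std::vector<float>(16 * 16 * 16, 1.0f)};
        std::vector<float> t = travelTimesFrom(m, 0, 0, 0); // reference cell
        float cutoff = 10.0f;              // ending condition (threshold)
        int connected = 0;
        for (float v : t) if (v < cutoff) ++connected;
        std::printf("cells connected to the reference cell: %d\n", connected);
    }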
- FIG. 6 illustrates an embodiment of a method for finding an approximation of a Hamilton-Jacobi equation using a parallel computing system.
- the method can include the operation of providing a representation of an equation solution on a grid using data values represented on grid nodes, as in block 610. Values on grid nodes that are specified as boundary conditions for the final approximation may also be set for the process.
- An active list can be created using a subset of grid nodes, as in block 620 .
- the data values for the grid nodes are configured to be updated simultaneously using an individual computing device for each grid node.
- a sequence of solutions can be computed for the nodes in the active list that iteratively decreases the approximation error of the solution, as in block 630.
- the value of each grid node in the active list can be updated for each computation iteration using dependent data values of grid nodes from the previous iteration that are nearby in the grid, as in block 640 .
- the active list of grid nodes can be expanded for each computation iteration to include all nodes that are influenced by the updates to data values from the current iteration, as in block 650.
- the updates and expansion are performed simultaneously on multiple, separate computer processors in a way that depends only on the previous iteration's solution and the current update of those grid nodes.
- An additional operation is removing nodes from the active list whose values have reached an acceptable level of approximation error (e.g., a convergence measure) after an update, as in block 660.
- the data point nodes in the active list can be updated by adding or removing active data point nodes from the active list based on a convergence measure.
- a data point node can be re-inserted into the active list when changes of adjacent neighbors affect a solution of the previously removed data point node.
- a sequence of successively better approximations is created by modifying the active list and updating the active nodes until the active list is empty and all nodes on the grid approximate the Hamilton-Jacobi equation to a prescribed level of approximation, as in block 670 .
- the approximated solution can also be displayed to an end user on a display screen using a display volume.
- the present system and method can be applied in many fields of computing.
- the present method can be used in computer graphics, medical imaging, remote imaging of oil and gas underground geologic structures, radar imaging, astronomical imaging, computer gaming, and a number of other similar uses.
Abstract
A system and method are provided for parallel processing of the Hamilton-Jacobi equation. A numerical method is provided to solve the Hamilton-Jacobi equation that can be used with various parallel architectures, along with an improved Godunov Hamiltonian computation.
Description
- This application is a divisional of U.S. patent application Ser. No. 12/740,061, filed Aug. 27, 2010, which is a nationalization of Patent Cooperation Treaty Application Number PCT/US2008/081855, filed Oct. 30, 2008, which claims priority to U.S. Provisional Patent Application Ser. No. 61/001,221, filed Oct. 30, 2007, each of which is incorporated herein by reference.
- The applications of solutions to the H-J equation are numerous. The equation arises in the fields of computer vision, image processing, geoscience, and medical imaging and analysis. For example, in computer vision, the shape-from-shading problem, which infers 3D surface shape from the intensity values in a 2D image, can be modeled and solved with the Eikonal equation, which is a special form of the H-J equation. Extracting the medial axis or skeleton of a shape can be done by analyzing solutions of the H-J equation with the boundaries specified at the shape contour.
- Solutions to the H-J equation have been proposed for noise removal, feature detection and segmentation. In physics, the H-J equation arises in models of wavefront propagation. For instance, the calculation of the travel times of the optimal trajectories of seismic waves is a critical process for seismic tomography. Several methods based on the H-J equation have also recently been introduced as a means for describing connectivity in white matter in medical image analysis.
- The Hamilton-Jacobi partial differential equations (PDEs) are given by the following equation (1):
-
H(∇u, x) = 1, ∀x ∈ Ω (1)
- where Ω is a domain in Rⁿ and u(x) is the solution, which can be considered as a travel time or distance from the boundary conditions. Of particular interest is the special form
-
H(∇u, x) = √((∇u) M (∇u)ᵀ) (2)
- where M is the speed tensor matrix defined on Ω. We use the Hamiltonian defined below, obtained by expanding equation (2) with a symmetric M, as our model equation (3):
-
H(p, q, r) = √(a p² + d q² + f r² + 2(b pq + c pr + e qr)) (3)
- where p, q, and r are the partial derivatives of u at x along the x, y, and z axes, and a, b, c, d, e, and f are the upper triangular elements of the matrix M. Equation 1 becomes the Eikonal equation when M is an identity matrix times a scalar function f(x), which is often called the speed function.
- A number of different numerical strategies have been proposed to efficiently solve the H-J equation. These methods can generally be classified into two groups. One group is a class of iterative methods based on a fixed-point update using Jacobi or Gauss-Seidel schemes. Other early work solves the Eikonal equation, a special case of the H-J equation, by updating the solutions of the grid in a pre-defined order with a Godunov upwind Hamiltonian until they converge. This method is simple to implement and produces viscosity solutions, but involves many iterations to converge, and in the worst case its complexity can approach the order of O(N²), where N is the number of data elements to be processed. A Fast Sweeping method has also been proposed, which uses a Gauss-Seidel updating order for fast convergence. The Fast Sweeping method has a computational complexity on the order of O(kN), where N is the number of elements to be processed and k depends on the complexity of the speed function. The Fast Sweeping method and a Godunov upwind discretization of the class of convex Hamiltonians can be employed to solve anisotropic H-J equations. Another interpretation of Hamiltonians, based on the Legendre transformation, has been introduced and appears to be a Godunov Hamiltonian. This method employs the Lax-Friedrichs Hamiltonian for arbitrary static H-J equations. It is simple to implement and can be used widely on both convex and non-convex H-J equations, but it involves many more iterations than the Godunov Hamiltonian and the solution shows excessive diffusion due to the nature of the scheme. In general, the iterative methods are slow to converge and are not suitable for interactive applications.
- The second group of H-J solvers is based on adaptive updating schemes and sorting data structures. A Dijkstra-type shortest-path method has been used to solve convex H-J equations; such solvers are generally referred to as Fast Marching methods. The main idea behind this method is that solutions for a convex Hamiltonian depend only on the upwind neighbors along the characteristics, so the causality relationship can be determined uniquely and the correct solutions can be computed with only a single-pass update. The complexity of the Fast Marching method is O(N log N), which is the best possible (optimal) asymptotic limit for the worst-possible input data (i.e., worst-case optimal). In this algorithm the running time is only slightly affected by the complexity of the speed function. However, for a class of general H-J equations, tracing the characteristics can cause expensive searching among a wider range of neighborhoods of nodes than solving equations using an iterative numerical method. In addition, the method uses a global sorting data structure, e.g., a heap, and therefore the parallelization is not straightforward.
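For concreteness, the model Hamiltonian of equation (3) can be evaluated directly from the six upper-triangular entries of M, as in the following sketch; the struct and function names are illustrative assumptions, not the patent's code.

    #include <math.h>

    // Upper-triangular entries of the symmetric speed tensor M.
    struct SpeedTensor { float a, b, c, d, e, f; };

    // H(p,q,r) = sqrt(a p^2 + d q^2 + f r^2 + 2(b pq + c pr + e qr)),
    // i.e., sqrt((grad u) M (grad u)^T) expanded, per equation (3).
    __host__ __device__ inline float hamiltonian(float p, float q, float r,
                                                 SpeedTensor m)
    {
        return sqrtf(m.a*p*p + m.d*q*q + m.f*r*r
                     + 2.0f*(m.b*p*q + m.c*p*r + m.e*q*r));
    }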
- FIGS. 1A-1C illustrate a schematic 2D example of FIM frontwave propagation in accordance with certain aspects of the present techniques;
- FIGS. 2A-2C illustrate a schematic 2D example of the change of the characteristic direction in accordance with certain aspects of the present techniques;
- FIGS. 3A-3B are exemplary coalesced/non-coalesced global memory accesses in accordance with certain aspects of the present techniques;
- FIG. 4 is an exemplary 2D bank assignment that avoids bank conflicts for neighbor pixel access in accordance with certain aspects of the present techniques;
- FIGS. 5A-5C are exemplary bank assignments in accordance with certain aspects of the present techniques; and
- FIG. 6 is a flow chart illustrating a method for finding an approximation of a Hamilton-Jacobi equation using a parallel computing system in accordance with certain embodiments of the present techniques.
- Reference is now made to the exemplary embodiments illustrated in the drawings, and specific language is used herein to describe the same. It should nevertheless be understood that no limitation of the scope of the invention is thereby intended. Alterations and further modifications of the inventive features illustrated herein, and additional applications of the principles of the inventions as illustrated herein, which would occur to one skilled in the relevant art having possession of this disclosure, are to be considered within the scope of the invention.
- A system and method are provided for a parallel method for computing numerical approximations of the H-J equation on a discrete grid in the domain, together with an implementation on the GPU that permits comparisons against other state-of-the-art methods. While the worst-case performance of the proposed method is not optimal, it enhances performance relative to the worst case of other methods on a variety of complex data sets on a single processor, and it scales well on many parallel architectures for a further performance benefit. A numerical method is described to solve the H-J equation that is well-adapted to various parallel architectures, along with an improved Godunov Hamiltonian computation and a GPU implementation of the proposed H-J solver.
- Various embodiments of the present techniques are described further below. For instance, as one embodiment of the present techniques, a fast iterative method (FIM) for parallel systems is discussed. As another embodiment of the present techniques, a 3D Godunov Hamiltonian for the H-J equation is introduced and described in its implementation. Further still, in another embodiment of the present techniques, a GPU implementation of the proposed method is also discussed. Finally, the numerical results on several synthetic and real tensor volumes as compared with the existing state-of-the-art CPU methods are also discussed.
- The computational technique discussed further below is referred to as the Fast Iterative Method (FIM), which solves a class of Hamilton-Jacobi (H-J) equations on parallel systems. The method manages a list of active nodes and iteratively updates solutions on those nodes until they converge. Nodes are added to or removed from the list based on a convergence measure, but the management of this list does not involve the extra burden of expensive ordered data structures or special updating sequences.
- While the method has suboptimal worst-case performance, in practice, on real and synthetic datasets, the method performs fewer computations per node than other known alternatives. Furthermore, the method uses only local, synchronous updates and therefore has better cache coherency, is simple to implement, and scales efficiently on parallel architectures, such as cluster systems or graphics processing units (GPUs).
- To solve Equation 1 efficiently, which is noted above, a numerical method is provided that is scalable on parallel architectures. As discussed previously, existing H-J solvers do not scale effectively on computer architectures that incorporate multiple processors (parallel architectures), because the use of global data structures and the prescribed updating orders limit the ability to do multiple updates of the solution simultaneously. Therefore, to provide enhancements over the previous methods, the present techniques provide benefits by producing enhanced overall performance, cache coherence, and scalability across multiple processors. These enhancements result from: 1) not imposing a particular update order, 2) not using a separate, heterogeneous data structure for sorting, and 3) providing simultaneous updates at multiple points.
- FIM is a numerical method to solve PDEs, such as Equation 1, on parallel architectures. The FIM can solve the H-J equation selectively on the grid nodes without maintaining expensive data structures. FIM maintains a narrow band, called the active list, for storing the indices of grid nodes to be updated. Instead of using a special data structure to keep track of exact causal relationships, the method maintains a looser relationship and updates all nodes in the active list simultaneously (i.e., a Jacobi update). During each iteration, the list of active nodes is expanded, and the band thickens or expands to include all nodes that could be influenced by the current updates. A node can be removed from the active list when its solution has converged, and re-inserted when any changes of its adjacent neighbors affect the solution of the current node. Note that newly inserted nodes are updated in the following update iteration to ensure a correct Jacobi update. To compute the solutions of the nodes in the active list, a Godunov upwind discretization of the Hamiltonian is used. This method allows multiple updates per node by reinserting nodes into the active list, and uses a Jacobi update for parallel computation. The proposed method falls under the class of label-correcting methods. The pseudo code of one embodiment of the FIM, which may be referred to as Method 2.1, is listed below (U_x is a discrete approximation of u(x), and g(U_x) is a new solution at x that satisfies Equation 1, computed using the Godunov Hamiltonian H_G in Equation (3)):
    Method 2.1: FIM(X)
    comment: 1. Initialization (X: set of all grid nodes, L: active list)
    for each x ∈ X do
        if x is source then U_x ← 0
        else U_x ← ∞
    for each x ∈ X do
        if any neighbor of x is source then add x to L
    comment: 2. Update nodes in L
    while L is not empty do
        for each x ∈ L do
            p ← U_x
            q ← g(U_x)
            if p > q then U_x ← q
            if |p − q| < ε then
                for each 1-neighbor x_nb of x not in L do
                    p ← U_x_nb
                    q ← g(U_x_nb)
                    if p > q then
                        U_x_nb ← q
                        add x_nb to L
                remove x from L
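As an illustration of Method 2.1, the following is a minimal serial sketch for the Eikonal special case (M equal to the identity times a speed function) on a 2D grid. The grid size, speed function, and variable names are assumptions, and a parallel implementation would update all active nodes simultaneously rather than through a queue.

    #include <cmath>
    #include <cstdio>
    #include <deque>
    #include <utility>
    #include <vector>

    const int   W = 64, H = 64;
    const float EPS = 1e-6f, INF = 1e30f;

    float u[H][W], f[H][W];     // solution and speed function
    bool  inList[H][W];

    float g(int x, int y)       // one Godunov update of node (x, y), h = 1
    {
        float a = std::fmin(x > 0 ? u[y][x-1] : INF, x < W-1 ? u[y][x+1] : INF);
        float b = std::fmin(y > 0 ? u[y-1][x] : INF, y < H-1 ? u[y+1][x] : INF);
        float s = 1.0f / f[y][x];
        if (std::fabs(a - b) >= s) return std::fmin(a, b) + s;  // one-sided
        return 0.5f * (a + b + std::sqrt(2.0f*s*s - (a-b)*(a-b)));
    }

    int main()
    {
        std::deque<std::pair<int,int>> L;            // active list
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) { u[y][x] = INF; f[y][x] = 1.0f; }
        u[0][0] = 0.0f;                              // source node
        L.push_back({1, 0}); L.push_back({0, 1});    // neighbors of the source
        inList[0][1] = inList[1][0] = true;

        const int dx[4] = {1,-1,0,0}, dy[4] = {0,0,1,-1};
        while (!L.empty()) {
            auto [x, y] = L.front(); L.pop_front();
            float p = u[y][x], q = g(x, y);
            if (p > q) u[y][x] = q;
            if (std::fabs(p - q) < EPS) {            // converged: check neighbors
                inList[y][x] = false;
                for (int k = 0; k < 4; ++k) {
                    int nx = x + dx[k], ny = y + dy[k];
                    if (nx < 0 || nx >= W || ny < 0 || ny >= H || inList[ny][nx])
                        continue;
                    float pn = u[ny][nx], qn = g(nx, ny);
                    if (pn > qn) {                   // neighbor improved: activate
                        u[ny][nx] = qn;
                        L.push_back({nx, ny});
                        inList[ny][nx] = true;
                    }
                }
            } else {
                L.push_back({x, y});                 // not yet converged
            }
        }
        std::printf("u at far corner: %f\n", u[H-1][W-1]);
    }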
FIG. 1 illustrates a schematic 2D example 100 of FIM frontwave propagation in accordance with certain aspects of the present techniques. InFIGS. 1A-1C , the schematic 2D example is of a FIM frontwave expanding in the first quadrant. The lower-leftcorner point 102 is the source point, the black points are fixed points, the diagonal rectangle 104 a-104 c containing blue points is the active list, and the black arrow 106 a-106 c represents the narrow band's advancing direction. Theinitial stage 100 a is shown inFIG. 1A , thefirst update stage 100 b is shown inFIG. 1B , and thesecond update stage 100 c is shown inFIG. 1C . Because blue points depend only on the neighboring black points, all of the blue points in the active list can be updated at the same time. If the characteristic path does not change its direction to the other quadrant, then all the updated blue points are fixed (become black points) and their 1-neighbor white points form a new narrow band. - The FIM is an iterative method, meaning that a point is updated until its solution converges. However, for many data sets, most points require only a single update to converge. This can be interpreted as follows. If the angle between the direction of the characteristic path and the narrow band's advancing direction is smaller than 45 degree, then the exact solution at the point can be found only in a single update, as in the fast sweeping method. If the angle is larger than 45 degrees, the point at the location where the characteristic path changes direction have an initial value that is computed using the wrong set of neighboring nodes on the grid. In other words, the neighbors should be an “up-wind neighborhood,” that is located in the direction from which the wavefront associated with the solution is propagating and it is revised in successive iterations as neighbors refine their values. Thus, that point is not removed from the active list and is updated until the correct value is computed.
FIGS. 2A-2C relate to an example of this type of situation. - The Godunov Hamiltonian uses a neighborhood that may be 1-node in dimension surrounding the center node, so it can be mapped to iterative schemes. However, the term neighborhood can be defined as a set of nodes on the grid that are within as specified grid distance. That is, in two dimensions, as in
FIGS. 1 a-c and 2 a-c, the nearest neighbors are the nodes that are immediately up, down, left, and right of the node in question. The neighborhood size may depend on the particular H-J formulation and the numerical approximation. For instance, if the neighborhood includes diagonal relationships on the grid, there are many cases to check for the correct solution of the Hamiltonian, e.g., eight cases for 2D and 26 cases for 3D solution. While the drawings are used to illustrate two dimensional (2D) examples of using the present system and method, the use of the present method with (3D) volumes is also readily applicable and expected. -
- FIGS. 2A-2C are schematic 2D examples 200 a-200 c of the change of the characteristic direction in accordance with certain aspects of the present technique. In these examples 200 a-200 c, similar to the discussion of FIGS. 1A-1C, the black points are fixed points, the diagonal rectangles 204 a-204 c containing blue points are the active list, and the black arrows 206 a-206 c represent the narrow band's advancing direction. However, unlike the Fast Marching method, where the wavefront propagates with closed, 1-point-thick curves, the FIM can result in thicker bands that split in places where the characteristic path changes direction (e.g., the red points 210-213 in FIGS. 2A-2C). Also, the wavefront of updated nodes can move over solutions for nodes whose values are considered correct with respect to their previous neighborhood values (i.e., they have already converged), and reactivate them. In this case, the algorithm reactivates these nodes to correct their values as new information is propagated across the image. Thus, the worst-case performance of FIM is suboptimal.
- Appendices A and B, which include implementation details for the present system and method, are provided and incorporated herein by reference.
- The following section gives the results of empirical studies, including situations where this worst-case behavior undermines computational efficiency of FIM and compares the results with those of the other state-of-the-art solvers.
- The discussion below provides a proof of the correctness of the method described.
- Lemma 2.1. The FIM method converges.
- Proof. For this we rely on the monotonicity (decreasing) of the solution and its boundedness (positive). From the pseudo code listed above in Method 2.1, when a point is added to the active list, its tentative solution is updated only when the new solution is smaller than the previous one, and all solutions remain positive by construction. A monotonically decreasing sequence that is bounded below converges.
- Lemma 2.2. The solution U at the completion of the FIM method with ε=0 (error threshold) is consistent with the corresponding Hamiltonian given in Equation 1.
- Proof. Each point in the domain is appended to the active list at least once. Each point x is finally removed from L only when g(U, x)=0 and the upwind neighbors (which impact this calculation) are also inactive. Any change in those neighbors causes x to be re-appended to the active list. Thus, when the active list is empty (the condition for completion), g(U, x)=0 for the entire domain.
- Theorem 2.3. The FIM method, for ε=0, gives an approximate solution to Equation 1 on the discrete grid.
- Proof. The proof of the theorem is given by the convergence and consistency of the solution, as established in the lemmas above.
- In this section, the details of the Godunov discretization of the H-J Hamiltonian on a 3D grid, which is an extension of the 2D case, are described. The simplest way to solve Equation 1, which is noted above, is computing p, q, and r using a central difference method and solving a quadratic equation, but this approach requires global updates to converge. However, because convex Hamiltonians have strict causality relations with adjacent neighbors, there is a more efficient way to solve the equation. One approach is using only one-sided derivatives to compute Hamiltonians, e.g., a Godunov upwind scheme. Accordingly, a similar Godunov upwind Hamiltonian may be employed, but the current technique provides an efficient method to evaluate the Hamiltonian.
-
- p±=D± xu, q±=D± yu, r±=D± zu, and I[a, b] is the closed interval bounded by a and b. This definition of the Godunov Hamiltonian looks complicated, but the main idea is evaluating the Hamiltonian H(p, q, r) with all possible combination of p={p−, p+, pσ}, q={q−, q+, qσ}, and r={r−, r+, rσ} where pσ, qσ, and rσ are critical points (because the extremum of a convex Hamiltonian occurs only on either the end of the interval or the critical point), and taking the valid minimum solution that satisfies
Equation 1. As a result, eight cases are used for 2D and 26 cases are used for 3D to evaluate the Hamiltonian (we do not evaluate for H(pσ, qσ, rσ)). To check the validity of the solution for H(p, q, r), a variety of approaches may be utilized. For instance, the following conditions have been proposed. -
H(sgn max{(p− − pσ)+, (p+ − pσ)−} + pσ, q, r) = 1
H(p, sgn max{(q− − qσ)+, (q+ − qσ)−} + qσ, r) = 1
H(p, q, sgn max{(r− − rσ)+, (r+ − rσ)−} + rσ) = 1
-
- The proposed validity test does not entail an extra burden of Hamiltonian computations, and can be done using only simple integer equality and float inequality comparisons. Experiments show that using the new validity test can increase performance by up to about 50% compared to the original method.
- The FIM method should scale well on various parallel architectures, e.g., multi-core processors, shared-memory multiprocessor machines, or cluster systems. Under the present techniques, the GPU was selected to implement FIM to solve the H-J equation because current GPUs are massively parallel SIMD processors, providing a very powerful general-purpose computational platform. One of the differences between the CPU and the GPU implementations of FIM is that the GPU employs a block-based updating scheme, because the GPU architecture favors coherent memory access and control flows. The original node-based FIM (Method 2.1) can be easily extended to a block-based FIM, as shown in Method 4.1 below. For a block-based update, the domain is decomposed into blocks of pre-defined size (a 4³ cube for 3D in the GPU implementation), and the solutions of the pixels in the same block are updated simultaneously with a Jacobi update scheme. Therefore, the active list of the GPU maintains a list of active blocks instead of nodes.
- The GPU FIM method includes three steps. First, each active block is updated with a pre-defined number of iterations. During each iteration, a new solution of Equation 1 is computed; it replaces the old solution if the new solution is smaller, and its convergence is encoded as a Boolean value. After the update step, a reduction on each active block is performed to check whether it has converged. If a block has converged, it is marked as to-be-removed. The second step is checking which neighbor blocks of to-be-removed blocks need to be re-activated. To do this, all the adjacent neighbor blocks of to-be-removed blocks are updated once, and another reduction operation is applied on each of the neighbor blocks. The final step is updating the active list by checking the convergence of each block and removing or inserting only active blocks into the active list. The following is GPU FIM pseudo code, which may be referred to as Method 4.1, for updating active blocks (Cp and Cb are introduced in Section 4.2):
    Method 4.1: GPU FIM(L, V)
    comment: update blocks b in active list L (V: list of all blocks)
    while L is not empty do
        comment: 1. Update blocks in L (a pre-defined number of iterations)
        for each b ∈ L do (U, Cp(b)) ← g(U, b)
        for each b ∈ L do Cb(b) ← reduction(Cp(b))
        comment: 2. Check which neighbors of converged blocks need re-activation
        for each b ∈ L with Cb(b) = true do
            for each adjacent block b_nb of b do
                (U, Cp(b_nb)) ← g(U, b_nb)
                Cb(b_nb) ← reduction(Cp(b_nb))
        comment: 3. Update the active list L
        clear(L)
        for each b ∈ V do
            if Cb(b) = false then insert b into L
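A host-side driver for Method 4.1 might be organized as in the following sketch (CUDA is discussed further below). The kernel bodies are placeholders and all names are illustrative assumptions, not the patent's code.

    #include <cuda_runtime.h>
    #include <vector>

    // Placeholder block-update kernel: one 64-thread block per active 4x4x4
    // data block. A real kernel would run nIter Jacobi iterations of the
    // Godunov update in shared memory and record per-pixel convergence.
    __global__ void updateBlocksKernel(float* U, const int* active,
                                       bool* Cb, int nIter)
    {
        int b = active[blockIdx.x];
        // ... update the 64 pixels of block b for nIter iterations ...
        if (threadIdx.x == 0) Cb[b] = true;   // stand-in convergence result
    }

    // Placeholder neighbor-check kernel: update neighbors of converged
    // blocks once and clear Cb for any neighbor still changing.
    __global__ void checkNeighborsKernel(float* U, const int* active, bool* Cb)
    {
        // ... one extra update pass over adjacent blocks, then a reduction ...
    }

    void gpuFim(float* dU, bool* dCb, int numBlocks, std::vector<int>& active)
    {
        int* dActive = nullptr;
        cudaMalloc(&dActive, numBlocks * sizeof(int));
        std::vector<char> hCb(numBlocks);

        while (!active.empty()) {
            int n = (int)active.size();
            cudaMemcpy(dActive, active.data(), n * sizeof(int),
                       cudaMemcpyHostToDevice);

            updateBlocksKernel<<<n, 64>>>(dU, dActive, dCb, 4);  // step 1
            checkNeighborsKernel<<<n, 64>>>(dU, dActive, dCb);   // step 2

            // Step 3: read back per-block convergence, rebuild list on CPU.
            cudaMemcpy(hCb.data(), dCb, numBlocks * sizeof(bool),
                       cudaMemcpyDeviceToHost);
            active.clear();
            for (int b = 0; b < numBlocks; ++b)
                if (!hCb[b]) active.push_back(b);  // keep non-converged blocks
        }
        cudaFree(dActive);
    }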
- On the GPU memory, two sets of Boolean arrays, one Cp with a size of the number of pixels (i.e., nodes) and the other Cb with a size of the number of blocks, are created to store the convergence of pixels and blocks, in addition to a float array with a size of the number of pixels to store solutions. To check the convergence of blocks, a reduction on Cp is performed to obtain Cb. Managing the active list, e.g., inserting or deleting blocks, is efficiently done on the CPU by reading Cb back to the CPU and looping over it to insert only non-converged blocks into the active list. When the list is completely updated on the CPU, it is copied to the GPU; because only a small part of the active list is actually used at any given time (indices 0 to (number of active blocks - 1)), only a small fraction of contiguous memory needs to be copied to the GPU.
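A minimal host-side sketch of this list management might read as follows, assuming the convergence flags are stored as one byte per block; the function and buffer names are hypothetical:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Read the per-block convergence flags Cb back to the CPU, rebuild the
// compact list of non-converged blocks, and copy only the used prefix
// of the list back to the GPU.
int rebuildActiveList(const unsigned char* d_Cb, int* d_activeList,
                      std::vector<unsigned char>& h_Cb,
                      std::vector<int>& h_list, int numBlocks)
{
    cudaMemcpy(h_Cb.data(), d_Cb, numBlocks, cudaMemcpyDeviceToHost);
    h_list.clear();
    for (int b = 0; b < numBlocks; ++b)
        if (!h_Cb[b])                  // keep only non-converged blocks
            h_list.push_back(b);
    cudaMemcpy(d_activeList, h_list.data(),
               h_list.size() * sizeof(int), cudaMemcpyHostToDevice);
    return (int)h_list.size();         // new number of active blocks
}
```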
- To efficiently move data from global to shared memory on the GPU, the data is arranged in the GPU memory space so that global memory accesses are as coalesced as possible. A volume is stored in memory as a 1D array with a certain traversal order.
FIGS. 3A-3B show two different examples 300a and 300b of storing a 4×4 image in the GPU global memory space as a 1D array when a block is copied to shared memory. Host memory is the CPU-side memory, and global/shared memory is the GPU-side memory. Color represents the pixels in the same block. Usually pixels are stored from the fastest to the slowest axis order, as shown in FIG. 3A. In this case, a block is split into two regions in global memory space 310a, which leads to split block accesses. However, if global memory 310b is re-ordered as shown in FIG. 3B, accessing a block can be a single coalesced memory access, which is the most efficient way to access global memory 310b on the GPU. Hence, whenever input volumes are copied from the CPU to the GPU memory, a proper reordering should be applied so that each block access can be done through a coalesced memory access. - Another factor that affects GPU performance is shared memory access. The shared memory space in the NVIDIA G80 architecture is divided into 16 banks, and 16 shared memory accesses can be processed simultaneously as long as the memory requests all refer to different memory banks or all refer to the same one. If some, but not all, of the memory requests refer to the same memory bank, i.e., a bank conflict occurs, the request is serialized, which impairs performance. Because the block size is fixed at 4³, there is no bank conflict when accessing pixels inside blocks (the block size is a multiple of the warp size). However, because adjacent neighbor pixels are used to solve the H-J equation, additional shared memory space should be provided for the left/right/up/down/top/bottom neighbors of the boundary pixels of each block. To avoid bank conflicts, the neighbor pixels are assigned to pre-defined banks, which requires a slightly larger extra shared memory space.
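Referring back to FIGS. 3A-3B, the reordering amounts to making the 64 pixels of each block contiguous in the 1D array. A host-side sketch of such a block-major relayout is given below, assuming volume dimensions divisible by the block size; the function names are hypothetical:

```cuda
// Block-major reordering: the pixels of each 4x4x4 block become 64
// consecutive words, so loading one block into shared memory is a
// single coalesced global-memory read.
inline int blockMajorIndex(int x, int y, int z, int nx, int ny)
{
    int bx = x / BDIM, by = y / BDIM, bz = z / BDIM;  // block coordinates
    int lx = x % BDIM, ly = y % BDIM, lz = z % BDIM;  // offset inside block
    int blocksX = nx / BDIM, blocksY = ny / BDIM;
    int blockId = (bz * blocksY + by) * blocksX + bx;
    int local   = (lz * BDIM + ly) * BDIM + lx;
    return blockId * (BDIM * BDIM * BDIM) + local;
}

// Applied once on the CPU whenever an input volume is uploaded.
void reorderVolume(const float* src, float* dst, int nx, int ny, int nz)
{
    for (int z = 0; z < nz; ++z)
        for (int y = 0; y < ny; ++y)
            for (int x = 0; x < nx; ++x)
                dst[blockMajorIndex(x, y, z, nx, ny)] =
                    src[(z * ny + y) * nx + x];   // from axis-major order
}
```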
FIG. 4 shows 2D example diagrams 400a-400e of the bank assignment that avoids bank conflicts for neighbor pixel access. The block size for this example is 16 (4×4), which is drawn as a yellow box on the leftmost image in FIG. 4. The extra four pixels on each left/right/up/down side of the block are neighbor pixels. The number on each pixel represents the bank number to be assigned. By assigning pixels to shared memory in this pattern, memory requests for left/right/up/down neighbors can be processed simultaneously without a bank conflict (FIG. 4; red: left neighbors, cyan: right neighbors, green: up neighbors, blue: down neighbors). Shared memory of size 3×(block size) is used to store a block and its neighbors because some bank numbers appear twice (1, 4, 13, and 16 in FIG. 4). - Further,
FIG. 5 shows an example of the actual pixel assignment in shared memory. FIG. 5A shows a 2D block diagram 500a with pixel indices (not bank numbers). FIG. 5B shows a bank assignment listing 500b, which lists the bank to which each pixel is actually assigned. FIG. 5C shows a snapshot 500c of a shared memory access pattern when left neighbors are accessed (the same example as diagram 400b in FIG. 4). Pixels colored in red are accessed by 16 threads in parallel, and because there is no bank conflict, this memory request can be processed simultaneously. The bank assignment technique shown here can be easily extended to 3D cases. - Reduction is a computational technique commonly used in the streaming programming model to produce a smaller stream from a larger input stream. To check the convergence of a block, the convergence of every pixel in the block is checked. Therefore, a block may be reduced down to a single pixel that represents the convergence of the whole block. Because CUDA provides a block-wise thread synchronization mechanism, a parallel reduction may be performed in a single kernel execution. To reduce a block of size n, the process starts with n/2 threads. In each iteration, every thread participating in the reduction reads two convergence values from the current block and writes true or false to one of the original locations (true if both have converged, false otherwise). In the next iteration, the number of participating threads is halved and the same reduction is performed. This process is repeated until the block is reduced to a single pixel.
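As an illustration, a CUDA sketch of this block-wise reduction follows, assuming the per-pixel flags Cp of each block are contiguous (as in the block-major layout above) and the block size n is a power of two; it would be launched with n/2 threads per block and n/2 bytes of dynamic shared memory:

```cuda
// Reduce the n per-pixel convergence flags of each active block to a
// single per-block flag with a logical AND, halving the number of
// participating threads each iteration.
__global__ void reduceConvergence(const unsigned char* Cp,
                                  unsigned char* Cb,
                                  const int* activeList, int n)
{
    extern __shared__ unsigned char s[];   // n/2 partially combined flags
    int blk = activeList[blockIdx.x];
    int tid = threadIdx.x;                 // tid in [0, n/2)

    // Each of the n/2 threads reads two flags and combines them once.
    s[tid] = Cp[blk * n + tid] && Cp[blk * n + tid + n / 2];
    __syncthreads();

    for (int stride = n / 4; stride > 0; stride >>= 1) {
        if (tid < stride)
            s[tid] = s[tid] && s[tid + stride];
        __syncthreads();                   // block-wise synchronization
    }
    if (tid == 0)
        Cb[blk] = s[0];   // true only if every pixel in the block converged
}
```

For the 4³ blocks used here (n = 64), a call would take the form reduceConvergence<<<numActive, 32, 32>>>(d_Cp, d_Cb, d_activeList, 64).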
- A parallel H-J solver based on the selective iterative method has been described. The proposed method employs the narrow band approach to keep track of the points to be updated, and it iteratively updates the solutions until they converge. Instead of using an expensive sorting data structure to maintain causality, the proposed method uses a simple list to store active points and updates all of them in parallel until they converge. Points can be removed from or added to the list based on the convergence measure. The proposed method is simple to implement and runs faster than existing solvers on a class of convex Hamilton-Jacobi equations. The prototype GPU implementation runs roughly fifty to one hundred times faster than the state-of-the-art CPU H-J solvers.
- Introducing a fast parallel H-J solver opens up numerous interesting future research directions. Because the GPU implementation provides rapid distance computation on DT-MRI (Diffusion Tensor Magnetic Resonance Imaging) volumes, it makes interactive white matter connectivity analysis feasible. Similarly, as another example, the present techniques may be used for seismic wave propagation simulation in an anisotropic speed volume. Alternatively, the present techniques may be used for fast geodesic computation on parametric surfaces or volumes.
- As an example, the present technique could be used to compute solutions to H-J equations associated with grids (images) of diffusion-weighted or diffusion tensor measurements from magnetic resonance imagery. Solutions to an H-J equation that depends on those measurements can be used to describe paths or subsets (regions) of nodes on the grid that connect specified regions of the brain. Those paths or regions could, in turn, be used to quantify, analyze, or detect neurological or psychiatric disorders.
- As an additional example, the present technique can also be used to analyze the connected quality of a hydrocarbon-bearing reservoir and to obtain the hydrocarbons within it. The method includes obtaining a model of a portion of a reservoir, or of a complete reservoir, and dividing the model into cells, each cell having a volume and attributes. The model may be a reservoir model of a hydrocarbon-bearing zone within a subsurface region. A speed function may then be assigned to a portion of the cells in the model, and one of the cells may be selected as a reference cell. Following this selection, connectivity between cells is determined by solving an Eikonal equation describing a front propagating in a heterogeneous medium. The front can progress outward from the reference cell until an ending condition is met, such as a threshold or convergence limit. The Eikonal equation can be solved by a fast iterative method, with the propagation velocity as a function of spatial position being provided by the speed function. Following the solving operation, regions of the model may be characterized by their connective quality to the reference cell using the connectivity. Finally, with this analysis, wells may be drilled to extract the hydrocarbons.
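As a purely illustrative host-side sketch of this workflow, assuming a hypothetical solver entry point fimSolveEikonal that returns front arrival times (standing in for the fast iterative Eikonal solve described above), regions might be characterized by thresholding the arrival time with respect to the reference cell:

```cuda
#include <vector>

// Hypothetical entry point: solves the Eikonal equation by the fast
// iterative method and returns the front arrival time at every cell.
std::vector<float> fimSolveEikonal(const std::vector<float>& speed,
                                   int refCell, int nx, int ny, int nz);

// Characterize each cell of the reservoir model by its connective
// quality to the reference cell: here, simply whether the front reaches
// it before a chosen ending condition (an arrival-time threshold).
std::vector<int> characterizeConnectivity(const std::vector<float>& speed,
                                          int refCell, float threshold,
                                          int nx, int ny, int nz)
{
    std::vector<float> arrival = fimSolveEikonal(speed, refCell, nx, ny, nz);
    std::vector<int> connected(arrival.size());
    for (size_t i = 0; i < arrival.size(); ++i)
        connected[i] = (arrival[i] <= threshold) ? 1 : 0;
    return connected;
}
```

Cells with small arrival times are reached quickly by the front through high-speed paths, so the arrival-time field itself serves as a connectivity map with respect to the reference cell.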
- FIG. 6 illustrates an embodiment of a method for finding an approximation of a Hamilton-Jacobi equation using a parallel computing system. The method can include the operation of providing a representation of an equation solution on a grid using data values represented on grid nodes, as in block 610. Values on grid nodes that are specified as boundary conditions of the final approximation may also be set for the process. - An active list can be created using a subset of the grid nodes, as in block 620. The data values for the grid nodes are configured to be updated simultaneously using an individual computing device for each grid node. A sequence of solutions that iteratively decreases the approximation error can be computed for the nodes in the active list, as in block 630. The value of each grid node in the active list can be updated for each computation iteration using dependent data values of nearby grid nodes from the previous iteration, as in block 640.
- In addition, the active list of grid nodes can be expanded during each computation iteration to include all nodes that are influenced by the updates to data values from the current iteration, as in block 650. The updates and the expansion are performed simultaneously on multiple, separate computer processors in a way that depends only on the previous iteration's solution and the current update of those grid nodes. - An additional operation is removing nodes from the active list whose values have reached an acceptable level of approximation error (e.g., reached a convergence measure) after an update, as in block 660. The data point nodes in the active list can thus be updated by adding or removing active data point nodes based on a convergence measure. In addition, a data point node can be re-inserted into the active list when changes of adjacent neighbors affect the solution of the previously removed node. As a result, a sequence of successively better approximations is created by modifying the active list and updating the active nodes until the active list is empty and all nodes on the grid approximate the Hamilton-Jacobi equation to a prescribed level of approximation, as in block 670; a serial sketch of this active-list loop is given after the next paragraph. The approximated solution can also be displayed to an end user on a display screen using a display volume. - The present system and method can be applied in many fields of computing. For example, the present method can be used in computer graphics, medical imaging, remote imaging of oil and gas underground geologic structures, radar imaging, astronomical imaging, computer gaming, and a number of other similar uses.
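Referring back to blocks 610 through 670, the following serial sketch illustrates the active-list loop in 1D; the function names, the 1D grid, and the convergence tolerance eps are assumptions of the sketch, and the caller is assumed to have seeded the boundary nodes with u = 0 and placed them in the initial active list:

```cuda
#include <cmath>
#include <list>
#include <vector>

// Serial reference sketch of the active-list loop of blocks 610-670.
void fimSolve(std::vector<float>& u, std::list<int>& active,
              std::vector<char>& inList,       // one membership flag per node
              const std::vector<float>& speed, float eps)
{
    const int nx = (int)u.size();
    auto localUpdate = [&](int i) -> float {   // 1D Eikonal update (block 640)
        float a = std::fmin(i > 0      ? u[i - 1] : 1e20f,
                            i < nx - 1 ? u[i + 1] : 1e20f);
        return std::fmin(u[i], a + 1.0f / speed[i]);
    };
    while (!active.empty()) {                  // block 670: loop until empty
        for (auto it = active.begin(); it != active.end(); ) {
            int i = *it;
            float newU = localUpdate(i);
            bool converged = std::fabs(newU - u[i]) < eps;
            u[i] = newU;
            if (!converged) { ++it; continue; }
            // blocks 650/660: re-activate any neighbor whose value can
            // still improve, then remove the converged node itself.
            int nbs[2] = { i - 1, i + 1 };
            for (int nb : nbs)
                if (nb >= 0 && nb < nx && !inList[nb]
                    && localUpdate(nb) < u[nb]) {
                    active.push_back(nb);
                    inList[nb] = 1;
                }
            inList[i] = 0;
            it = active.erase(it);
        }
    }
}
```

Each pass over the list corresponds to one computation iteration; in the parallel version, all nodes in the list are updated simultaneously instead of in list order.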
- It is to be understood that the above-referenced arrangements are only illustrative of the application for the principles of the present invention. Numerous modifications and alternative arrangements can be devised without departing from the spirit and scope of the present invention. While the present invention has been shown in the drawings and fully described above with particularity and detail in connection with what is presently deemed to be the most practical and preferred embodiment(s) of the invention, it will be apparent to those of ordinary skill in the art that numerous modifications can be made without departing from the principles and concepts of the invention as set forth herein.
Claims (5)
1. A method for analyzing the connected quality of a hydrocarbon reservoir, comprising:
obtaining a model of a portion of the reservoir and dividing it into cells, each cell having a volume and attributes;
assigning a speed function to a portion of the cells;
choosing a reference cell;
determining a connectivity between cells in the reservoir using a solution to an Eikonal equation, describing a propagating front in a heterogeneous medium, said front progressing outward from a reference cell until an ending condition is met, the Eikonal equation being solved by a fast iteration method with propagation velocity as a function of spatial position being provided by the speed function; and
characterizing regions of the reservoir by their connective quality to the reference cell using the connectivity.
2. A method as in claim 1, wherein the step of choosing a reference cell comprises choosing multiple reference cells at different locations in the reservoir.
3. A method as in claim 1, further comprising: performing the steps of determining a connectivity and characterizing regions with respect to each reference cell chosen.
4. A method as in claim 1, further comprising individually summing the connectivities for each cell in the reservoir with respect to the reference cell.
5. A method as in claim 1, wherein determining the connectivity between cells in the reservoir by solving an Eikonal equation further comprises:
creating an active list of data point nodes in a grid that are capable of being computed in parallel;
computing solutions to the Hamilton-Jacobi equation for data point nodes in the grid in parallel using the active list of computing nodes;
expanding the active list of computing nodes during each computation iteration to include all data point nodes that are influenced by a current computation iteration;
updating the solutions on the data point nodes in the active list until the data point nodes converge; and
displaying the approximated solution to an end user using a display volume.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/286,549 US20140278305A1 (en) | 2007-10-30 | 2014-05-23 | Fast Iterative Method for Processing Hamilton-Jacobi Equations |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US122107P | 2007-10-30 | 2007-10-30 | |
PCT/US2008/081855 WO2009059045A2 (en) | 2007-10-30 | 2008-10-30 | Fast iterative method for processing hamilton-jacobi equations |
US74006110A | 2010-08-27 | 2010-08-27 | |
US14/286,549 US20140278305A1 (en) | 2007-10-30 | 2014-05-23 | Fast Iterative Method for Processing Hamilton-Jacobi Equations |
Related Parent Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2008/081855 Division WO2009059045A2 (en) | 2007-10-30 | 2008-10-30 | Fast iterative method for processing hamilton-jacobi equations |
US12/740,061 Division US8762442B2 (en) | 2007-10-30 | 2008-10-30 | Fast iterative method for processing hamilton-jacobi equations |
Publications (1)
Publication Number | Publication Date |
---|---|
US20140278305A1 true US20140278305A1 (en) | 2014-09-18 |
Family
ID=40591757
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/740,061 Active 2031-07-29 US8762442B2 (en) | 2007-10-30 | 2008-10-30 | Fast iterative method for processing hamilton-jacobi equations |
US14/286,549 Abandoned US20140278305A1 (en) | 2007-10-30 | 2014-05-23 | Fast Iterative Method for Processing Hamilton-Jacobi Equations |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/740,061 Active 2031-07-29 US8762442B2 (en) | 2007-10-30 | 2008-10-30 | Fast iterative method for processing hamilton-jacobi equations |
Country Status (5)
Country | Link |
---|---|
US (2) | US8762442B2 (en) |
EP (1) | EP2208166A4 (en) |
AU (1) | AU2008318642B2 (en) |
CA (1) | CA2703955C (en) |
WO (1) | WO2009059045A2 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9063882B1 (en) * | 2010-09-09 | 2015-06-23 | Sas Ip, Inc. | Matrix preconditioners for simulations of physical fields |
US9733388B2 (en) | 2008-05-05 | 2017-08-15 | Exxonmobil Upstream Research Company | Systems and methods for connectivity analysis using functional objects |
US9990714B2 (en) | 2016-09-07 | 2018-06-05 | Simula Innovation As | Apparatus and method for global optimization |
KR101875925B1 (en) * | 2017-02-08 | 2018-07-06 | 울산과학기술원 | Fast iterative computation method based on group order and apparatus using the same |
KR101908116B1 (en) | 2017-02-09 | 2018-10-15 | 울산과학기술원 | Fast iterative computation method based on lock-free and apparatus using the same |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8633997B2 (en) * | 2009-10-15 | 2014-01-21 | Sony Corporation | Block-based variational image processing method |
US9260947B2 (en) | 2009-11-30 | 2016-02-16 | Exxonmobil Upstream Research Company | Adaptive Newton's method for reservoir simulation |
EP2564309A4 (en) | 2010-04-30 | 2017-12-20 | Exxonmobil Upstream Research Company | Method and system for finite volume simulation of flow |
AU2011283193B2 (en) | 2010-07-29 | 2014-07-17 | Exxonmobil Upstream Research Company | Methods and systems for machine-learning based simulation of flow |
CA2803068C (en) | 2010-07-29 | 2016-10-11 | Exxonmobil Upstream Research Company | Method and system for reservoir modeling |
US10087721B2 (en) | 2010-07-29 | 2018-10-02 | Exxonmobil Upstream Research Company | Methods and systems for machine—learning based simulation of flow |
WO2012039811A1 (en) | 2010-09-20 | 2012-03-29 | Exxonmobil Upstream Research Company | Flexible and adaptive formulations for complex reservoir simulations |
US9489176B2 (en) | 2011-09-15 | 2016-11-08 | Exxonmobil Upstream Research Company | Optimized matrix and vector operations in instruction limited algorithms that perform EOS calculations |
CN102520917A (en) * | 2011-12-15 | 2012-06-27 | 杭州电子科技大学 | Parallelization method for three-dimensional incompressible pipe flows |
US10253600B2 (en) | 2012-06-15 | 2019-04-09 | Landmark Graphics Corporation | Parallel network simulation apparatus, methods, and systems |
AU2013324162B2 (en) | 2012-09-28 | 2018-08-09 | Exxonmobil Upstream Research Company | Fault removal in geological models |
US9720117B1 (en) * | 2014-06-30 | 2017-08-01 | Pivotal Software, Inc. | Imaging subsurface properties using a parallel processing database system |
EP3175265A1 (en) | 2014-07-30 | 2017-06-07 | ExxonMobil Upstream Research Company | Method for volumetric grid generation in a domain with heterogeneous material properties |
AU2015339883B2 (en) | 2014-10-31 | 2018-03-29 | Exxonmobil Upstream Research Company | Methods to handle discontinuity in constructing design space for faulted subsurface model using moving least squares |
US10803534B2 (en) | 2014-10-31 | 2020-10-13 | Exxonmobil Upstream Research Company | Handling domain discontinuity with the help of grid optimization techniques |
US9523583B2 (en) * | 2015-02-27 | 2016-12-20 | Here Global B.V. | Generating routes using navigation meshes |
KR101927448B1 (en) | 2017-03-07 | 2018-12-10 | 울산과학기술원 | Method and apparatus for extending dynamic sub-domain of fast iterative method model |
US10949976B2 (en) * | 2017-06-12 | 2021-03-16 | Verathon Inc. | Active contour model using two-dimensional gradient vector for organ boundary detection |
CN107796332B (en) * | 2017-10-24 | 2019-08-20 | 大连理工大学 | A Method for Identifying Honeycomb Edge Regions from Honeycomb Core Surface Measurement Data |
US10776923B2 (en) * | 2018-06-21 | 2020-09-15 | International Business Machines Corporation | Segmenting irregular shapes in images using deep region growing |
US10643092B2 (en) | 2018-06-21 | 2020-05-05 | International Business Machines Corporation | Segmenting irregular shapes in images using deep region growing with an image pyramid |
CN114282403B (en) * | 2021-11-22 | 2024-07-23 | 西安理工大学 | Efficient high-precision habitat simulation method for coupled habitat suitable model |
FR3131022A1 (en) | 2021-12-21 | 2023-06-23 | Thales | METHOD FOR DETERMINING A PHYSICAL QUANTITY AND ASSOCIATED DETERMINATION SYSTEM. |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100057418A1 (en) * | 2006-03-02 | 2010-03-04 | Dachang Li | Method for Quantifying Reservoir Connectivity Using Fluid Travel Times |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002006904A (en) * | 2000-06-16 | 2002-01-11 | Babcock Hitachi Kk | Method and device for nonlinear optimum adaptive control |
US7565243B2 (en) * | 2005-05-26 | 2009-07-21 | Exxonmobil Upstream Research Company | Rapid method for reservoir connectivity analysis using a fast marching method |
WO2007044557A2 (en) * | 2005-10-06 | 2007-04-19 | Luminescent Technologies, Inc. | System, masks, and methods for photomasks optimized with approximate and accurate merit functions |
2008
- 2008-10-30 AU AU2008318642A patent/AU2008318642B2/en active Active
- 2008-10-30 EP EP08844220.7A patent/EP2208166A4/en not_active Ceased
- 2008-10-30 WO PCT/US2008/081855 patent/WO2009059045A2/en active Application Filing
- 2008-10-30 CA CA2703955A patent/CA2703955C/en active Active
- 2008-10-30 US US12/740,061 patent/US8762442B2/en active Active
2014
- 2014-05-23 US US14/286,549 patent/US20140278305A1/en not_active Abandoned
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100057418A1 (en) * | 2006-03-02 | 2010-03-04 | Dachang Li | Method for Quantifying Reservoir Connectivity Using Fluid Travel Times |
Non-Patent Citations (4)
Title |
---|
Alkhalifah et al. "Implementing the fast marching eikonal solver: spherical versus Cartesian coordinates" 2001, Geophysical Prospecting 49, pages 165-178. *
Bader et al. "Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs", 18 July 2006, pages 1366-1378. * |
Dejnozkova et al. "A parallel algorithm for solving the eikonal equation" 2003, IEEE pages 325-328. * |
Herrmann et al. "A domain decomposition parallelization of the Fast Marching Method", 2003, Center for Turbulence Research Annual Research Briefs 2003, pages 213-225. * |
Also Published As
Publication number | Publication date |
---|---|
CA2703955C (en) | 2018-05-29 |
US20110046927A1 (en) | 2011-02-24 |
EP2208166A2 (en) | 2010-07-21 |
WO2009059045A3 (en) | 2009-08-06 |
AU2008318642A1 (en) | 2009-05-07 |
US8762442B2 (en) | 2014-06-24 |
WO2009059045A2 (en) | 2009-05-07 |
EP2208166A4 (en) | 2015-06-03 |
CA2703955A1 (en) | 2009-05-07 |
AU2008318642B2 (en) | 2013-07-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8762442B2 (en) | Fast iterative method for processing hamilton-jacobi equations | |
Jeong et al. | Interactive visualization of volumetric white matter connectivity in DT-MRI using a parallel-hardware Hamilton-Jacobi solver | |
US9047674B2 (en) | Structured grids and graph traversal for image processing | |
AU2008300256A1 (en) | Partitioning algorithm for building a stratigraphic grid | |
US10282899B2 (en) | Systems, methods and, media for simulating deformations of nonlinear elastic bodies | |
US8503754B2 (en) | Parallel process for level set segmentation of volume data | |
Jeong et al. | A fast eikonal equation solver for parallel systems | |
Quell et al. | Shared-memory block-based fast marching method for hierarchical meshes | |
Drees et al. | Hierarchical random walker segmentation for large volumetric biomedical images | |
US20160239591A1 (en) | Method and System to Enhance Computations For A Physical System | |
Jeong et al. | A fast iterative method for a class of Hamilton-Jacobi equations on parallel systems | |
Quesada-Barriuso et al. | Efficient 2D and 3D watershed on graphics processing unit: block-asynchronous approaches based on cellular automata | |
AU2013206119B2 (en) | Fast iterative method for processing Hamilton-Jacobi equations | |
Weinbub et al. | Shared-memory parallelization of the fast marching method using an overlapping domain-decomposition approach | |
Lamas-Rodriguez et al. | GPU-accelerated level-set segmentation | |
AU2016225807A1 (en) | Fast iterative method for processing Hamilton-Jacobi equations | |
Medina et al. | Flow updates for domain decomposition of entropic optimal transport | |
US11567779B2 (en) | Systems and methods for simulation of dynamic systems | |
Pantidis et al. | Image-based adaptive domain decomposition for continuum damage models | |
Hong et al. | A group-ordered fast iterative method for eikonal equations | |
Saxena et al. | A parallel GPU algorithm for mutual information based 3D nonrigid image registration | |
Du | Gpu-based adaptive surface reconstruction for real-time sph fluids | |
Jeong | Interactive three-dimensional image analysis and visualization using graphics hardware | |
US20250095295A1 (en) | Marching lattice | |
Chesakov | Vascular tree structure: Fast curvature regularization and validation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: UNIVERSITY OF UTAH RESEARCH FOUNDATION, UTAH Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:UNIVERSITY OF UTAH;REEL/FRAME:032958/0921 Effective date: 20080207 Owner name: UNIVERSITY OF UTAH, UTAH Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JEONG, WON-KI;WHITAKER, ROSS;REEL/FRAME:032958/0857 Effective date: 20080204 |
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |