
WO1989001665A1 - Hypercube topology for multiprocessor systems with additional communication paths between the nodes, or topologies with substituted corners


Info

Publication number
WO1989001665A1
Authority
WO
WIPO (PCT)
Prior art keywords
topology
node
hypercube
nodes
block
Prior art date
Application number
PCT/US1988/002782
Other languages
English (en)
Inventor
Renben Shu
David H. C. Du
Original Assignee
Regents Of The University Of Minnesota
Priority date
Filing date
Publication date
Application filed by Regents Of The University Of Minnesota filed Critical Regents Of The University Of Minnesota
Publication of WO1989001665A1 publication Critical patent/WO1989001665A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17337Direct connection machines, e.g. completely connected computers, point to point communication networks
    • G06F15/17343Direct connection machines, e.g. completely connected computers, point to point communication networks wherein the interconnection is dynamically configurable, e.g. having loosely coupled nearest neighbor architecture

Definitions

  • the present invention pertains to the field of high-speed digital data processors, and more particularly to interprocessor communication networks for multiprocessor or multicomputer designs.
  • the physical packaging of the hardware becomes a major concern for interprocessor communication.
  • the preferred cost-effective method of implementing a large parallel processor is to use microprocessors packaged in individual VLSI packages with a small number of memory and other support chips for each processor.
  • the number of other microprocessor packages that a single package can communicate with depends on the number of input/output (I/O) ports available.
  • dedicated pins on the LSI package are used to form the communication interconnect.
  • Interprocessor communication can be divided into two main categories of topologies, commonly entitled direct and indirect.
  • the direct network connection topology has been widely researched and considered for telephone switching connections. These connections are typically found to be tightly coupled, slow-speed communications networks designed either as single-stage or multi-stage designs.
  • the most commonly used direct interconnect topology is the crossbar interconnect widely used in older telephone exchange central offices. When the number of nodes increases in a direct interconnection topology, the number of possible interconnections grows very quickly.
  • These types of prior art direct interconnection techniques are very expensive for loosely-coupled or distributed multiprocessor designs with a very large number of processors.
  • the indirect interconnection topology is typically described in terms of the physical layout of the interconnected nodes of the system. These interconnect schemes are usually described in terms of the degree required for implementation, i.e., degree one, degree two, degree three, hypercube, etc. The most promise for very large multiprocessor system interconnect topologies has been found in the hypercube topology.
  • an interprocessor communication system for use in multiprocessor designs is constructed by modifying an n- dimensional hypercube.
  • This modified hypercube is constructed using 2 n nodes arranged as an n-dimensional hypercube (n-cube). Additional communication paths are inserted into the hypercube structure to reduce the diameter of the hypercube by increasing the number of ports (degree) for each node. This is an acceptable trade-off when the number of nodes within the system increases.
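The construction just described can be sketched in a few lines of Python. This sketch is illustrative only and not part of the patent text; the function name `modified_hypercube` is an assumption.

```python
# Illustrative sketch: adjacency lists of a Modified Hypercube MH_n.
# The 2**n nodes of an n-cube keep their n cubic (bit-flip) edges, and each
# node gains one extra edge to the node with the complementary binary address.
def modified_hypercube(n):
    mask = (1 << n) - 1
    adj = {}
    for i in range(1 << n):
        neighbors = [i ^ (1 << k) for k in range(n)]  # cubic (bit-flip) edges
        neighbors.append(i ^ mask)                    # complement edge
        adj[i] = sorted(set(neighbors))
    return adj

# MH_3: every node has degree n + 1 = 4; node 000 gains the edge to 111.
adj = modified_hypercube(3)
assert all(len(v) == 4 for v in adj.values())
assert 0b111 in adj[0b000]
```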
  • FIG. 1 shows a variety of prior art network topologies such as (a) the chordal ring, (b) the cube- connected cycle (CCC), (c) the ring (cycle), (d) the tree, (e) the mesh (grid), and (f) the completely connected network topology.
  • FIG. 2 shows prior art hypercubes of (a) one-, (b) two-, (c) three-, and (d) four-dimensional configurations.
  • FIG. 3 shows a path connecting x and x' in the basic topology S 2 .
  • FIG. 4 shows a path connecting (x, y) and (x', y') in a Substituted Topology ST ( S 2 , S 1 ).
  • FIG. 5 is a Modified Hypercube MH 2 formed from a
  • FIG. 6 is a Modified Hypercube MH 3 formed from a 3-cube.
  • FIG. 7 exemplifies the robustness of the Modified Hypercube by showing four remaining virtual 2-cubes in a MH 3 configuration when nodes 3 and 4 are faulty.
  • FIG. 8 shows a substituted node (within the circle) of a Substituted Topology MH 3 (MH 2 ) where the basic topology is a Modified Hypercube of the type shown in FIG. 6 and the block topology is a Modified Hypercube of the type shown in FIG. 5.
  • FIG. 9 is a graphical representation of a Substituted Modified Hypercube SMH 3 (MH 2 ) where the basic topology is a Modified Hypercube of the type shown in FIG. 6 and the block topology is a Modified Hypercube of the type shown in FIG. 5.
  • the communication medium may be shared memory by using direct interconnection network (such as omega network, crossbar, indirect binary hypercube, etc.), a broadcast bus or an indirect interconnection network.
  • Broadcast bus schemes are inexpensive, but it is difficult to achieve high performance when the system is large, while shared memory systems are expensive when scaled to large dimensions because of the rapid growth of the interconnection network.
  • Large system configurations (for example, with thousands of processors) are readily realized with the distributed memory based on a limited form of interconnection, such as the pyramid or binary hypercube.
  • the present patent application is directed to such indirect interconnection topologies which can be used for small (for example, with ten processors) to very large multiprocessor systems (for example, with a million or more processors).
  • An indirect interconnection topology for a multiprocessor system can be considered as a graph with each processor as a vertex (or node) and with an edge between two vertices if their corresponding processors are directly connected.
  • The terms "graph" and "system" are used interchangeably throughout this application.
  • the first criterion is that the degree of a graph should be small.
  • the degree of a graph is defined as the maximum number of vertices connected directly (through I/O ports) to a vertex in the graph. This restriction represents the fact that a processor with a large number of I/O line interfaces is expensive and may not be feasible for realization due to pin-out limitations, multiplexing problems, etc.
  • the second criterion is that the diameter of the graph should be small relative to the number of the vertices.
  • the diameter of a graph is defined as the maximum shortest distance between any pair of vertices in the graph. Processing elements not directly connected will have to have messages relayed by intervening vertices. The number of relays in the worst case should be kept small to keep communication time delay between processors to a minimum.
  • the message relaying load should be uniformly distributed throughout the network such that there is no bottleneck in the message communication flow.
  • the routing algorithm for any node in the system should be easily implemented, so that the time cost of communication is small. Also, the system should be uniform so that each node processor can be loaded with the identical routing algorithm to forward messages independent of the location of the node in the system. The node need only recognize its own address. Uniformity also creates a system without congestion.
  • the system should be robust. For a large system with thousands or more processors, it is very possible that there are some processors which have faults.
  • the faulted system with a number of faulty processors (nodes) should still be able to perform computation by disabling the faulty processors, reconfiguring the system (by the host or control processor) and by routing messages around the faulty nodes.
  • the robustness is very important for such a system. Usually the robustness of a system is related to the number of possible (or alternative) paths between each pair of processors.
  • the system should have good emulation capabilities to emulate other network topologies.
  • the extreme case of small degree is to allow each processor to have at most two or three neighbors.
  • An example of a network topology with only two neighbors is the ring, as shown in FIG. 1C.
  • the diameter of a ring is equal to half the number of vertices. In the ring shown in FIG. 1C, the diameter, therefore, is 4, as the number of vertices is equal to 8.
  • This is exemplified in FIG. 1F as the completely connected topology.
  • With N being the number of processors in the system, the degree p is equal to N-1 and the diameter d is equal to 1.
  • a completely connected topology offers extremely efficient communication between the processors but is limited in size due to the high number of I/O ports that each processor must support.
  • the vertices are grouped into 2 n cycles of n vertices each and conceptually arranged at the corners of an n-cube.
  • Each of the n edges emanating from a corner of the cube is used to connect one of the vertices of the corresponding cycle to a vertex of a neighboring cycle.
  • the diameter of the system is (5/2)·log 2 N + O(1).
  • a generalization of the binary trees, called multitree structure graphs, is formed by taking t binary trees and connecting the root of each to one vertex of a t-vertex cycle. Its diameter is less than 2·log 2 N + O(1).
  • Another trivalent graph is the Dense Trivalent Graph (DTG). Its diameter is (3/2)·log 2 N + O(1). This topology cannot simulate a two- or three-dimensional mesh.
  • Another type of trivalent graph is the chordal ring, shown in FIG. 1A.
  • A comparison of a variety of the prior art network topologies is shown in TABLE 1. Formulas for calculating the diameter and degree of the network topologies are shown, along with the other types of topologies that each topology can emulate. Comparative judgments based on criteria such as robustness, congestion and uniformity of the structures discussed above are also included in TABLE 1.
  • FIG. 2 is an example of another type of interconnection topology termed the hypercube or n-cube.
  • the hypercube takes its name from a three-dimensional cube extended into lesser and greater dimensions. For example, as shown in FIG. 2, a 1-cube is simply two nodes connected along a single communication path. Each node or vertex 900 is connected along an edge 901 of the cube which represents in graphical form a communication link.
  • the simplest form of a hypercube is a zero-cube in which there is a single node with no connections to any other nodes.
  • An n-cube has a number of processors N equal to 2 n , where n is the dimension of the cube. Hence, the 2-cube has four processors, shown with vertex addresses 00, 01, 11, and 10 in their binary form.
  • a standard three-dimensional cube, also shown in FIG. 2, has a degree of 3 and a diameter of 3, and has 8 processors or nodes connected by 12 communication paths. Each node in a 3-cube connects with three other nodes.
  • the binary n-cube or hypercube has a diameter of log 2 N and a degree of log 2 N.
  • the hypercube can be extended into any number of dimensions for implementing a regular interprocessor communication network. For example, the 4-cube of FIG. 2 is a theoretical representation of a four-dimensional cube, with the binary addresses of each node of the cube shown in FIG. 2.
  • the diameter of a 4-cube is equal to 4 and its degree is 4 such that each node is connected to four other nodes in the system.
  • Those skilled in the art can readily recognize the extension of the hypercube into n- dimensions.
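As an aside not found in the patent text, the bit-flip adjacency of the n-cube makes shortest-path routing a matter of fixing the differing address bits one at a time; a minimal Python sketch (function names are illustrative):

```python
# In an n-cube, two nodes are adjacent iff their binary addresses differ in
# exactly one bit, so the shortest path length equals the Hamming distance.
def hamming(a, b):
    return bin(a ^ b).count("1")

def ncube_route(src, dst):
    """Greedy bit-fixing route: flip the lowest differing bit each hop."""
    path, cur = [src], src
    while cur != dst:
        cur ^= (cur ^ dst) & -(cur ^ dst)  # isolate lowest differing bit
        path.append(cur)
    return path

# In a 4-cube, the worst case (complementary addresses) takes 4 hops.
path = ncube_route(0b0000, 0b1111)
assert len(path) - 1 == hamming(0b0000, 0b1111) == 4
```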
  • the hypercube has received significant interest due to the success of the Caltech Cosmic Cube project, the commercial availability of the NCUBE/10 parallel processor, and the availability of boolean cube-configured concurrent processors from Intel Scientific Computers, NCUBE, Ametek, Floating Point Systems and Thinking Machine Corp.
  • the reason why the binary hypercube has attracted so much attention is because of its powerful topological properties. That is, homogeneity, symmetry, ability to map most other popular network topologies and recursive structure.
  • Many practical algorithms based on two- or three- dimensional meshes (grids), trees and rings (cycles) can be embedded into or emulated by the binary hypercube.
  • the commercially available binary hypercube parallel processor NCUBE/10 has a potential computing power of 500 million floating-point operations per second (MFLOPS) which is more than three times as much as that of a CRAY-1 computer.
  • the relative cost of the NCUBE/10 is much less than that of the CRAY-1.
  • the NCUBE/10 parallel computer available from NCUBE Corporation of Beaverton, Oregon, is a true parallel processing computer which can be configured with as few as 16 processors and expanded to a maximum of 1,024 processors.
  • the NCUBE/10 machine is implemented using a hypercube technology expandable to a degree of 10 with a diameter of 10.
  • Each node within the system is an independent 32-bit microprocessor with its own local memory and, at the most, 10 communication links to other nodes or processors in the system.
  • Each node in the NCUBE/10 device has one 32-bit microprocessor and 128 Kbytes of memory.
  • Each node executes its own program out of its local memory and operates on its own data.
  • Each node in the NCUBE/10 machine is connected in a network of a hypercube configuration to a set of its neighbors through Direct Memory Access (DMA) communication channels that are controlled by the microprocessor of the node.
  • the initiation and execution of communication throughout the network is on an interrupt basis, allowing the microprocessors of each node to independently process data while communication through DMA channels continues.
  • System configuration, global control and external I/O is controlled by host processors.
  • Each node is embedded with a routing algorithm which allows it to communicate with any other node or to pass communications through its node to be forwarded to the addressed node.
  • Another prior art multiprocessor using an n-cube interconnection network topology is the Connection Machine, constructed at Thinking Machines Corporation, Cambridge, Massachusetts.
  • the prototype of this machine, the CM-1, contains 65,536 (2 16 ) cells or processors connected by a packet-switched network based on the boolean n-cube (hypercube) topology and using an adaptive routing algorithm.
  • the network topology is constructed using a 12-cube where each node in the 12-cube is constructed of a router processor connected to 12 other remote router processors and also connected to 16 local microprocessors.
  • Each node in the 12-cube of the CM-1 is constructed from a custom-designed VLSI chip that contains 16 processor cells and one router unit of the packet-switched communications network. All communications transfers through the 12-cube take place over the edges of the 12-cube, connected to bidirectional I/O pins on each VLSI processor and router chip.
  • Each vertex in the 12-cube of the CM-1 multiprocessor has 16 processors connected in a 4x4 grid (mesh) such that each processor at each vertex communicates directly with its north, east, west and south neighbors within the same vertex.
  • each vertex in the 12-cube of the CM-1 multiprocessor is substituted with a two-dimensional mesh topology which connects to the 12-cube through a router.
  • Parallel processing computers of the size and type of the NCUBE/10 and the CM-1 are used for a wide variety of scientific applications such as seismic processing, image processing, artificial intelligence processing, simulation of events such as fluid flow, molecular modeling, weather, and many other applications.
  • the common purpose in all of these applications is to solve a problem as a set of concurrent cooperating subtasks. As requirements for these applications grow in the future, higher performance with larger numbers of processors is required to effectively meet the needs of the scientific community.
  • the diameter of a topology of a small degree tends to be big and the robustness of the topology tends to be reduced.
  • Most of the previously discussed prior art topologies other than the hypercube are of degree 3 or 2.
  • a topology of a degree greater than 3 will be needed, but with a small diameter and improved robustness.
  • the first class of topologies discussed below is an improved hypercube termed the Modified Hypercube (MH).
  • This unique topology reduces the diameter of a hypercube by approximately 1/2 at the expense of increasing the degree by 1. In some applications this is a favorable trade-off.
  • The second class of topologies discussed below is the Substituted and Modified Hypercube (SMH).
  • the degree of an SMH can range from 3 to log 2 N or more.
  • the topological properties PROP(S) of a multiprocessor system S are denoted by a three-tuple (N,p,d), where N is the total number of processors in S, p is the degree of the topology, and d is the diameter of S.
  • a new topology called the Substituted Topology ST(S 2 ,S 1 ) can be formed by replacing each vertex in S 2 by an S 1 .
  • S 2 is called the basic topology and S 1 is called the block topology. Since each vertex in S 2 has a degree of p 2 (number of I/O ports per vertex), we need to decide how these p 2 edges are connected to the vertices in S 1 .
  • N = N 1 × N 2 ; that is, the total number of nodes N in the Substituted Topology is equal to the number of nodes in the basic topology N 2 times the number of nodes in a single block topology N 1 . Since each S 1 has N 1 vertices and S 2 has N 2 vertices, after each vertex in S 2 is replaced by an S 1 , ST(S 2 ,S 1 ) has N 1 × N 2 vertices.
  • the diameter of the Substituted Topology is at most the diameter of the block topology d 1 multiplied by the diameter of the basic topology incremented by one, i.e., d ≤ d 1 × (d 2 +1). Since the diameter of the basic topology S 2 is d 2 , the longest path between two vertices consists of d 2 edges and d 2 +1 vertices. After each vertex in the basic topology S 2 is replaced by a block topology S 1 , the longest possible path between two vertices in ST(S 2 ,S 1 ) is no more than (d 2 +1) × d 1 .
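The counting above can be captured in a short Python sketch. This is illustrative only, not part of the patent; the function name and the ceiling in the degree term (each block of N 1 nodes sharing the p 2 external edges) are assumptions consistent with the text.

```python
import math

# Combine PROP(S1) = (N1, p1, d1) and PROP(S2) = (N2, p2, d2) into the
# node count, degree, and diameter bound of ST(S2, S1) described in the text.
def substituted_properties(n1, p1, d1, n2, p2, d2):
    n = n1 * n2                      # N = N1 * N2 vertices
    p = p1 + math.ceil(p2 / n1)      # each block shares S2's p2 external edges
    d_bound = d1 * (d2 + 1)          # longest path is at most d1 * (d2 + 1)
    return n, p, d_bound

# Example: basic topology a 3-cube (8, 3, 3), block topology a 4-cycle (4, 2, 2).
assert substituted_properties(4, 2, 2, 8, 3, 3) == (32, 3, 8)
```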
  • the Substituted Topology ST(S 2 ,S 1 ) contains many more vertices (many more processors). Since p 1 < p 2 and N 1 > 2, the degree of ST(S 2 ,S 1 ) (i.e., p 1 + ⌈p 2 /N 1 ⌉) can be smaller than the number of ports p 2 if p 1 is much smaller than p 2 . Although the diameter of ST(S 2 ,S 1 ) can potentially be much bigger than d 2 , it is actually quite small when the details of the topology S 1 are considered along with the below-described routing algorithm. Several examples of this are discussed below.
  • This substituted topology still preserves some topological properties from both S 1 and S 2 .
  • a routing algorithm can be easily developed based on the routing algorithms for S 1 and S 2 .
  • the address of a vertex in ST(S 2 ,S 1 ) is denoted by a two- tuple (inter-address, inner-address).
  • Inter-address is the address of a vertex in the basic topology S 2 which has been replaced by a block topology S 1 in ST(S 2 ,S 1 ).
  • Inner-address is the address of a vertex in the particular block topology S 1 .
  • a path between two vertices (x,y) and (x',y'), where x, x' are inter-addresses and y, y' are inner-addresses, can be routinely derived as shown in Figure 4.
  • each vertex x in S 2 is replaced by an S 1 in ST(S 2 ,S 1 ). Since the longest distance between any two vertices in S 2 is d 2 and the longest distance between any two vertices in S 1 is d 1 , the longest possible distance between any two vertices in ST(S 2 ,S 1 ) is no more than d 1 × (d 2 +1).
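The two-level routing idea behind this bound can be sketched as follows. This is an illustrative Python sketch, not the patent's algorithm: `route_s2`, `route_s1`, and `exit_vertex` are assumed helper callbacks, and the toy instance below (a 2-cube basic topology with 2-node blocks, inner vertex k % 2 owning the dimension-k edge) is an assumption.

```python
# A message from (x, y) to (x', y') follows the S2 route x -> x'; before each
# inter-block hop it first routes inside the current block to the vertex that
# owns the needed external edge, then crosses with the same inner address.
def substituted_route(src, dst, route_s2, route_s1, exit_vertex):
    (x, y), (xd, yd) = src, dst
    path = [src]
    for nxt in route_s2(x, xd):          # inter-addresses along the S2 path
        gate = exit_vertex(x, nxt)       # block vertex owning the edge to nxt
        for v in route_s1(y, gate):      # inner route to the gateway vertex
            path.append((x, v))
        path.append((nxt, gate))         # cross the inter-block edge
        x, y = nxt, gate
    for v in route_s1(y, yd):            # final inner route inside block x'
        path.append((x, v))
    return path

# Toy instance: S2 a 2-cube, S1 a 2-node block (illustrative mapping).
def route_s2(x, xd):
    path, cur = [], x
    while cur != xd:
        cur ^= (cur ^ xd) & -(cur ^ xd)  # fix lowest differing bit
        path.append(cur)
    return path

def route_s1(y, yd):
    return [] if y == yd else [yd]       # 2-node block: zero or one hop

def exit_vertex(x, nxt):
    return ((x ^ nxt).bit_length() - 1) % 2  # dimension k owned by vertex k % 2

path = substituted_route((0b00, 0), (0b11, 1), route_s2, route_s1, exit_vertex)
assert path[0] == (0b00, 0) and path[-1] == (0b11, 1)
```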
  • the above substituting scheme can be applied to any two topologies.
  • the preferred embodiment of the present invention uses the hypercube as the basic topology and combines it with various other block topologies to achieve this goal.
  • the substituting scheme may increase the diameter of the resulting topology. Therefore, the preferred embodiments of the present invention use a modified hypercube as the basic topology to reduce the diameter of the resulting topology.
  • a set of cubic functions and a complement function are described below. The cubic function δ k (i) complements the k-th bit of the n-bit binary address i, while the complement function γ(i) complements all n bits of i.
  • a vertex i (0 ≤ i ≤ 2 n -1) is connected to n other vertices δ 0 (i), δ 1 (i), ..., δ n-1 (i) in a hypercube topology.
  • in a Modified Hypercube, each vertex i is also connected to vertex γ(i). That is, each node having an address i in a Modified Hypercube is additionally connected to a node having the complement address γ(i).
  • the addition of one more connection (port) to each node increases the degree of the hypercube by one. While the degree of the Modified Hypercube is increased by one, the diameter of the system is reduced to approximately half of the original. This property of the Modified Hypercube is illustrated in the examples below.
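The halving of the diameter can be checked mechanically by breadth-first search over explicit adjacency lists; a small Python check (illustrative, not part of the patent text):

```python
from collections import deque

# BFS eccentricity from every node gives the exact diameter of a small graph.
def diameter(adj):
    def ecc(s):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(ecc(s) for s in adj)

# n-cube adjacency, optionally with the extra complement edges of MH_n.
def cube(n, complement_edges=False):
    mask = (1 << n) - 1
    return {i: [i ^ (1 << k) for k in range(n)]
               + ([i ^ mask] if complement_edges else [])
            for i in range(1 << n)}

assert diameter(cube(3)) == 3          # plain 3-cube
assert diameter(cube(3, True)) == 2    # MH_3: diameter reduced to 2
```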
  • Figure 5 shows a simple example of a 2-cube in its Modified Hypercube form MH 2 .
  • the edges of the original 2-cube are shown in solid lines, while the additional edges of the Modified Hypercube are shown in dashed lines.
  • the vertex with binary address 00 is directly connected by the additional edge to the vertex having a binary address of 11.
  • the vertex with the binary address 10 is connected by a new edge to the vertex having a binary address 01.
  • the new edges in the Modified Hypercube connect vertices with complementary binary addresses.
  • the resulting topology is the same as a completely connected network.
  • FIG. 6 shows a 3-cube modified according to the teachings of the present invention to be a Modified Hypercube MH 3 .
  • Each vertex in MH 3 has a 3-bit binary address and the edges of the original 3-cube are shown in solid black lines.
  • the dashed lines show the additional edges in the MH 3 which connect the vertices having complementary binary addresses.
  • For the MH 3 shown in Figure 6, there are 8 nodes or vertices and each is of degree 4. That is, each processor node has four I/O ports. It is easy to see that the diameter of MH 3 is 2. Compared with the n-cube, the MH n is also more robust. For instance, there are four virtual 2-cubes in an MH 3 , as shown in Figure 7 with the node addresses shown in decimal form for convenience. When processor nodes 3 and 4 are faulty in a plain 3-cube, it can be seen that there exist no "good" 2-cubes; the entire multiprocessor system would be left without a working 2-cube.
  • the Modified Hypercube topology can be extended to n-dimensional hypercubes to reduce the diameter by approximately one-half at the cost of increasing the degree by 1. This has significant advantages for multiprocessor computers in and of itself for improving robustness and communicability within the system. Those skilled in the art will readily recognize the performance improvements in a multiprocessor design where messages between processors can be more expeditiously forwarded to reduce the communication bottleneck inherent in such designs. The application of the Modified Hypercube topology to prior art multiprocessors such as the NCUBE/10 machine will be readily apparent for increasing the speed of communication throughout the machine, at the expense of reducing the maximum dimension (and hence the number of processors) attainable with a fixed number of I/O ports.
  • an NCUBE/10 processor, being limited to 10 interprocessor I/O ports, would contain 512 processors in a 9-cube with a degree of 10 but a diameter of 5. Such a trade-off is extremely advantageous in computing applications where interprocessor communication speed is primary.
  • the properties of the Modified Hypercube MH n can be compared to the prior art topologies of Table 1 in that the diameter d is equal to ⌊(log 2 N+1)/2⌋ and the degree is equal to log 2 N+1.
  • the Modified Hypercube is a completely symmetric system and has the same ability to emulate most popular networks such as the ring, mesh and tree topologies as the hypercube as well as having the ability to emulate the standard hypercube.
  • the Modified Hypercube is very good in terms of its robustness and of course is uniform in its topology, lending itself to a simple routing algorithm.
  • each vertex or vertex address in SMH n (S 1 ) is denoted by a two-tuple (x,y), where 0 ≤ x < 2 n and 0 ≤ y < N 1 . Note that there are N 1 × 2 n vertices in SMH n (S 1 ).
  • the x value of the two- tuple represents the vertex address location in the basic topology S 2 (which is MH n ) and the y value of the two-tuple represents the vertex address location within the block topology S 1 (which has been inserted into the basic topology at each S 2 vertex).
  • vertices in the same block are connected according to the topology of the block topology S 1 . That is, two vertices (x,y) and (x,y') are connected directly if and only if vertex y is connected directly to vertex y' in S 1 .
  • the discussion of how a vertex is connected to vertices in adjacent blocks is given below.
  • When n+1 ≤ N 1 , there is only a subset of vertices in a block connected to the vertices in other adjacent blocks. That is, there are more vertices in the block topology S 1 than there are required connections to the basic topology S 2 (MH n ), so that some I/O ports of the S 1 vertices are left unused (assuming homogeneous processors with a fixed number of I/O ports set to the maximum degree of the system). Assume that we choose n+1 vertices to connect to other adjacent blocks.
  • vertex (x,y) is connected directly to vertex (δ y (x),y) if 0 ≤ y < n, the cubic function of the x value being used to assign the address of the connecting node in the Substituted Topology.
  • vertex (x,n), the last remaining vertex to be connected in the block topology S 1 , is connected directly to vertex (γ(x),n), the complement function of the x value being used to assign the address of the connecting node in the Substituted Topology.
  • the degree of SMH n (S 1 ) is p 1 + ⌈(n+1)/N 1 ⌉ and the diameter of an SMH n (S 1 ) can be quite small.
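Under the connection rules above, an SMH n (S 1 ) whose block is an (n+1)-node cycle can be sketched directly: inner vertex y < n owns the dimension-y cubic edge and inner vertex n owns the complement edge. This Python sketch is illustrative, not part of the patent; the function name is an assumption.

```python
# Illustrative sketch: SMH_n(C_{n+1}) with basic topology MH_n and an
# (n+1)-node cycle as the block topology.
def smh(n):
    n1 = n + 1                 # block size: one inner vertex per external edge
    mask = (1 << n) - 1
    adj = {}
    for x in range(1 << n):
        for y in range(n1):
            nbrs = [(x, (y - 1) % n1), (x, (y + 1) % n1)]  # cycle edges
            if y < n:
                nbrs.append((x ^ (1 << y), y))             # cubic edge, dim y
            else:
                nbrs.append((x ^ mask, y))                 # complement edge
            adj[(x, y)] = nbrs
    return adj

adj = smh(3)
assert len(adj) == 4 * 8                             # N = N1 * 2**n vertices
assert all(len(set(v)) == 3 for v in adj.values())   # degree p1 + 1 = 3
```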
  • A cycle C N1 of N 1 vertices may, for example, be used as the block topology S 1 for SMH n (S 1 ).
  • in the cycle C N1 , vertex i for 0 ≤ i < N 1 is connected to vertex (i+1) mod N 1 (addition modulo N 1 ).
  • the diameter of SMH n (C N1 ) is no more than ⌈3N 1 /2⌉ + ⌈n/2⌉.
  • the Substituted and Modified Hypercube topology will support a wide variety of topologies for the block topology S 1 .
  • Another example of an efficient block topology S 1 for the SMH n (S 1 ) Substituted Topology would be to use a Modified Hypercube of lesser degree for the vertices of the Modified Hypercube basic topology.
  • An example of this topology is shown in Figure 8.
  • Each node within SMH 3 of Figure 9 is addressed using a binary two-tuple.
  • the address tuples shown in Figure 9 use decimal numbers for brevity.
  • each vertex of MH 3 has been replaced by MH 2 such that each MH 2 is a class of nodes having the same inter-address.
  • This MH 2 topology individually has a diameter d equal to 1 and a degree p 1 equal to 3.
  • Let n = 15 for S 2 and let S 1 be a loop (cycle) of four nodes with addresses 0, 1, 2, 3 decimal (or 00, 01, 10, 11 binary).
  • the binary addresses of each node in the SMH 15 (C 4 ) topology are comprised of a two- tuple with the inter-address x being a 15-bit binary address ranging from x 0 through x 14 and the inner- address being a 2-bit binary address y ranging from y 0 through y 1 .
  • For the plain 15-cube, the diameter of the hypercube is 15, the degree of the hypercube is 15, and the number of nodes N equals 32,768.
  • For a Modified Hypercube MH 15 , the diameter is 8, the degree is 16, and the number of nodes remains the same.
  • The Substituted and Modified Hypercube SMH 15 (C 4 ) has 131,072 nodes with a degree of 6 and a diameter of 13. Since the block topology S 1 only contains four nodes and each vertex of the MH 15 has a degree of 16, requiring each block to have 16 I/O lines, the mapping of the nodes of the block topology onto the basic topology is not a simple 1-to-1 correspondence. Each node within the block topology therefore must map onto four nodes in the basic topology. Since the four nodes within the block topology S 1 already have connections to two other nodes, four additional connections are required of each node to connect the nodes of the block topology to the edges of the basic topology. Thus, the degree of the system is equal to 6.
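This degree bookkeeping follows directly from the formula p 1 + ⌈(n+1)/N 1 ⌉; a one-line Python check (illustrative, not part of the patent):

```python
import math

# Each block of N1 nodes must absorb the n+1 external links of an MH_n
# vertex, so each node takes on ceil((n+1)/N1) of them on top of its p1
# block-internal links.
def smh_degree(n, n1, p1):
    return p1 + math.ceil((n + 1) / n1)

# SMH_15(C_4): a 4-node cycle (p1 = 2) absorbing the 16 links of an MH_15
# vertex gives degree 2 + 4 = 6, as stated in the text.
assert smh_degree(15, 4, 2) == 6
assert smh_degree(15, 8, 2) == 4   # SMH_15(C_8)
assert smh_degree(15, 32, 2) == 3  # SMH_15(C_32)
```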
  • the S 1 block topology must be a cycle-connected topology.
  • a requirement for a cycle-connected topology is that there must be a path passing through all of the nodes exactly once (a Hamiltonian path).
  • Examples of cycle-connected topologies that can be used as S 1 in the present invention are the ring, any hypercube including the Modified Hypercube, any completely connected topology, etc.
  • An example of a topology which is not a cycle-connected topology and cannot be used as a block topology for the present invention is the tree structure.
  • Table 2 lists the diameter and ports for the SMH n (S 1 ) topologies, where several S 1 block topologies are listed. Table 2 clearly shows the advantages of the Substituted and Modified Hypercube topologies of the preferred embodiment of the present invention.
  • the diameter and degree of each of the Substituted Topologies can be calculated from the general formulas given below.
  • In Table 2, the basic topology is a Modified Hypercube S 2 of degree 16, and C 8 is an 8-node S 1 cycle of degree 2 and diameter 4.
  • the degree and the diameter of SMH 15 (C 8 ) are 4 and 20 respectively.
  • the degree and the diameter of SMH 15 (C 32 ) are 3 and 32 respectively.
  • the degree and the diameter of SMH 15 (C 4 ) are 6 and 14 respectively.
  • depending on whether a small degree or a small diameter is more important for the intended application, the topology can be selected accordingly.
  • mapping of the nodes of the block topology onto the edges of the basic topology is a matter of choice and is not limited to a specific algorithm. However, by keeping the mapping orderly, the routing algorithm for transferring messages throughout the multiprocessor system will be simplified. An example of mapping in this illustration would be to start with a node in the block topology having address 00 and being mapped onto four edges of the basic topology.
  • the addresses of the nodes outside the block onto which the present node is mapped would be (X̄ 0 , X 1 , X 2 , X 3 , ..., X 14 , 00), (X 0 , X 1 , X 2 , X 3 , X̄ 4 , X 5 , ..., X 14 , 00), (X 0 , ..., X 7 , X̄ 8 , X 9 , ..., X 14 , 00), and (X 0 , ..., X 11 , X̄ 12 , X 13 , X 14 , 00), where X̄ k denotes the complement of bit X k .
  • the foregoing mapping of nodes within the block topology onto edges in the basic topology is not a unique way of assigning the connections. Those skilled in the art will readily recognize a wide variety of mappings.
  • the node having address 00 within the block topology connects to nodes within other block topologies also having a local address of 00. This can also be seen in the SMH3 of Figure 5. This mapping technique is also used to facilitate the routing algorithm and does not necessarily have to be adhered to in mapping one block topology onto another block topology through the edges of the basic topology.
  • Hypercube as the block topology on SMHn(S1) are for illustration purposes only and are not intended to limit the scope of the present invention. Those skilled in the art will readily recognize the extension of the SMHn topology to very large structures, and the block topology used to create the Substituted Topology can likewise be very large. Although the applicants have described one of the preferred embodiments of the present invention as a Substituted and Modified Hypercube with the block topology also being a Modified Hypercube or a cycle-connected topology, the actual topologies substituted will be determined by design decisions such as, for example, fixing the degree or fixing the diameter desired in the resulting system to tailor the multiprocessor system to its preferred application.
  • the routing algorithm for the Modified Hypercube takes advantage of the fact that direct connections are made within the hypercube structure between complemented node addresses, and as many inter-block transfers are used as possible to shorten the diameter. This serves to reduce the overall diameter of the MHn to approximately one-half, since the additional edges tend to reduce the worst-case minimum distances between any two nodes. In effect, the routing algorithm can use the additional edges as a "shortcut" to find the shortest path between nodes.
  • the routing algorithm for Substituted and Modified Hypercube takes advantage of the regularity inherent in hypercube topologies in that each node can be programmed with a similar routing algorithm, different only in the embedded address of the resident node.
  • the routing algorithm then uses the resident node address to calculate which I/O port to forward the message to if the message is not destined for the resident node.
  • the following routing algorithm takes advantage of the SMHn(S1) topology to find the minimum distance between any two nodes. This algorithm is described in detail and in general mathematical terms to show the applicability of the routing algorithm to SMHn(S1) topologies where the block topology S1 can be any of a wide variety of topologies.
  • In SMHn(S1), let Pj denote a subset of [1, 2, ..., s] such that for each i ∈ Pj, node (w, j) of block w is connected directly to node (σi(w), j).
  • the path length is about [n/2] + [3N1/2].
  • Let HD(x,x') denote the subset of [0, 1, ..., n−1] which contains all indexes in which x differs from x'. That is, the cardinality of HD is the Hamming distance of x and x'. Let H denote [0, 1, ..., n−1] − HD. When the Hamming distance between x and x' is more than [n/2], we may have to choose a j from H and connect (x,y) to (σj(x),y) before the complement function can be applied. This algorithm will be better understood from the detailed examples of the routing algorithm presented below.
  • the routing algorithm for the SMH n (S 1 ) topology is very simple and efficient.
  • M(y) is the set [j | 0 ≤ j ≤ n−1 and j mod N1 = y].
  • the routing algorithm for forwarding communication packets between nodes in an SMH n (S 1 ) multiprocessor system is as follows:
  • Step 3: Compute HD(x,x') and M(y). Note that M(y) is fixed for each node (x,y); therefore, it can be precomputed and stored at each node location.
  • SMH11(C4)
  • the basic topology SMH11 is a Substituted and Modified Hypercube of degree 11
  • the block topology C4 is a cycle of four nodes (a 2-cube, not modified).
  • SMHn(Cn+1), where Cn+1 is a ring (cycle) topology with n+1 nodes.
  • This topology is similar to the previously proposed cube-connected cycles. However, its diameter is shorter than the diameter of cube-connected cycles due to the use of a Modified Hypercube as the basic topology.
  • N = k·2^n and the diameter d is roughly [n/2] + k + [k/2] (i.e., less than (3/2)·log2 N).
  • S1 can be any topology of degree 3 mentioned above.
  • the block topology S1 can be a chordal ring of n+1 nodes.
  • the topologies discussed above can be programmed to emulate other structures such as the cube-connected cycle (CCC). All CCC algorithms can easily be embedded into the SMHn topology. However, the basic SMHn topology is more robust and has better mapping flexibility than the CCC, and hence is preferred.
  • CCC: cube-connected cycle
  • the basic NCUBE/10 processor node supports ten I/O ports in a 10-cube configuration to allow it to communicate with other processors in the system.
  • With the SMH10(MH2) topology applied to the prior-art NCUBE/10 processor nodes, an improvement in the number of nodes that the multiprocessor can support can be achieved while keeping the diameter constant and reducing the actual number of ports required for implementation.
  • N1 = 2^2 = 4.
  • the number of ports required in this Substituted and Modified Hypercube topology is a maximum of 6. Hence, the number of ports required to implement this system and the diameter have actually been reduced. The result is a superior multiprocessor system with four times the nodes of the NCUBE/10 machine and better robustness.
  • the preferred embodiments of the present invention present many advantages over the prior art.
  • the diameter has been reduced, since the Modified Hypercube is employed as the basic topology.
  • the degree of the topology can be reduced by the substituting scheme. This also offsets the increased degree resulting from using a Modified Hypercube instead of a plain hypercube as the basic topology in the substituting scheme.
  • the robustness of the SMH topologies is better than that of the hypercube. This is due to the fact that each node in the hypercube is replaced by a set of nodes (i.e., a block topology).
  • the number of paths between two nodes in the original hypercube is less than the number of paths between two nodes chosen from the two block topologies in an SMH topology which correspond to the two nodes in the hypercube.
  • the routing algorithm is very easy and efficient.
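The complement-edge "shortcut" routing described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the patent's exact algorithm: it assumes the Modified Hypercube MHn is an n-cube with one additional edge joining each node to its bitwise complement, and the function name `mh_route` and the greedy strategy are ours.

```python
def mh_route(src, dst, n):
    """Greedy routing sketch for a Modified Hypercube MH_n
    (an n-cube plus a 'complement' edge joining each node x
    to its bitwise complement).  Returns the node sequence."""
    mask = (1 << n) - 1
    path = [src]
    cur = src
    h = bin((cur ^ dst) & mask).count("1")  # Hamming distance
    # Shortcut: when well over half the bits differ, the
    # complement edge gets closer in a single hop.
    if h > n - h + 1:
        cur ^= mask
        path.append(cur)
    # Standard hypercube routing: flip one differing bit per hop.
    diff = (cur ^ dst) & mask
    for i in range(n):
        if diff >> i & 1:
            cur ^= 1 << i
            path.append(cur)
    return path
```

For example, routing 0b0000 to 0b1111 in MH4 takes a single complement hop instead of the four hops a plain 4-cube would need, which is the "shortcut" effect described above.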

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The present invention describes an improved topology for multiprocessor systems that facilitates communication between the processors of concurrent-processing computers. This topology is obtained by modifying an n-dimensional hypercube through the introduction of additional communication paths into the hypercube structure, so as to reduce the diameter of the hypercube by increasing the number of access points (called the degree) of each node or vertex. In another variant of the present invention, the corners of a modified hypercube are substituted with topologies having cyclic communication paths, so as to reduce the degree of the overall structure.
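To make the abstract's diameter claim concrete, here is a small sketch (our illustration, under the assumption that the "additional communication paths" are the node-to-complement edges described in the detailed description; the function names are ours). It builds the adjacency of MHn and measures the diameter by breadth-first search.

```python
from collections import deque

def mh_edges(n):
    """Adjacency lists of a Modified Hypercube MH_n: the usual
    n-cube edges plus one edge joining each vertex to its
    bitwise complement (the 'additional communication paths')."""
    mask = (1 << n) - 1
    return {v: [v ^ (1 << i) for i in range(n)] + [v ^ mask]
            for v in range(1 << n)}

def diameter(adj):
    """Worst-case shortest-path length, by BFS from every vertex."""
    worst = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        worst = max(worst, max(dist.values()))
    return worst
```

For n = 4 this reports diameter 4 for the plain 4-cube and 2 for MH4, i.e., roughly the [n/2] reduction the abstract describes, at the cost of one extra port per node (degree n+1 instead of n).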
PCT/US1988/002782 1987-08-14 1988-08-12 Topologie hypercube pour systemes multiprocesseurs avec voies de communication additionnelles entre les noeuds ou topologies avec angles substitues WO1989001665A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US8598087A 1987-08-14 1987-08-14
US085,980 1987-08-14

Publications (1)

Publication Number Publication Date
WO1989001665A1 true WO1989001665A1 (fr) 1989-02-23

Family

ID=22195201

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1988/002782 WO1989001665A1 (fr) 1987-08-14 1988-08-12 Topologie hypercube pour systemes multiprocesseurs avec voies de communication additionnelles entre les noeuds ou topologies avec angles substitues

Country Status (1)

Country Link
WO (1) WO1989001665A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4247892A (en) * 1978-10-12 1981-01-27 Lawrence Patrick N Arrays of machines such as computers
EP0132926A2 (fr) * 1983-05-31 1985-02-13 W. Daniel Hillis Processeur parallèle
US4729095A (en) * 1986-05-19 1988-03-01 Ncube Corporation Broadcast instruction for use in a high performance computer system
US4730322A (en) * 1985-09-27 1988-03-08 California Institute Of Technology Method and apparatus for implementing a maximum-likelihood decoder in a hypercube network
US4739476A (en) * 1985-08-01 1988-04-19 General Electric Company Local interconnection scheme for parallel processing architectures
US4766534A (en) * 1986-10-16 1988-08-23 American Telephone And Telegraph Company, At&T Bell Laboratories Parallel processing network and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
COMMUNICATIONS OF THE ACM, Vol. 24, No. 5, May 1981, F.P. PREPARATA, "The Cube-connected Cycles: A Versatile Network for Parallel Computation", see pages 300, 302-303. *
IEEE MICRO, October 1986, JOHN P. HAYES et al., "A Microprocessor-based Hypercube Supercomputer", see pages 7, 14. *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1992006436A3 (fr) * 1990-10-03 1992-10-15 Thinking Machines Corp Systeme d'ordinateur parallele
US5333268A (en) * 1990-10-03 1994-07-26 Thinking Machines Corporation Parallel computer system
FR2668626A1 (fr) * 1990-10-30 1992-04-30 Thomson Csf Procede de configuration d'un systeme informatique maille.
EP0484200A1 (fr) * 1990-10-30 1992-05-06 Thomson-Csf Procédé de configuration d'un système informatique maillé
EP0817097A2 (fr) * 1996-07-01 1998-01-07 Sun Microsystems, Inc. Sous-système d'interconnexion utilisant un agencement de commutation de degré limité pour un système multiprocesseur avec un petit nombre de processeurs
EP0817097A3 (fr) * 1996-07-01 2000-07-05 Sun Microsystems, Inc. Sous-système d'interconnexion utilisant un agencement de commutation de degré limité pour un système multiprocesseur avec un petit nombre de processeurs

Similar Documents

Publication Publication Date Title
US5170482A (en) Improved hypercube topology for multiprocessor computer systems
EP0733237B1 (fr) Reseau d'acheminement et d'interconnexion multidimensionnel pour un ordinateur a processeur massivement parallele
US5313645A (en) Method for interconnecting and system of interconnected processing elements by controlling network density
US4621359A (en) Load balancing for packet switching nodes
Johnsson et al. Optimum broadcasting and personalized communication in hypercubes
US5701416A (en) Adaptive routing mechanism for torus interconnection network
US5630162A (en) Array processor dotted communication network based on H-DOTs
Finkel Processor interconnection strategies
US11531637B2 (en) Embedding rings on a toroid computer network
US11372791B2 (en) Embedding rings on a toroid computer network
WO1989001665A1 (fr) Topologie hypercube pour systemes multiprocesseurs avec voies de communication additionnelles entre les noeuds ou topologies avec angles substitues
Yeh et al. Routing and embeddings in cyclic Petersen networks: an efficient extension of the Petersen graph
US20200310819A1 (en) Networked Computer
Yokota et al. A prototype router for the massively parallel computer RWC-1
US11169956B2 (en) Networked computer with embedded rings field
KR102801506B1 (ko) 토로이드 컴퓨터 네트워크 상의 내포된 링들
JPH07114515A (ja) 同期通信用ネットワークを有する分散メモリ計算機
Prakash et al. VLSI implementation of a wormhole router using virtual channels
Yang et al. Adaptive wormhole routing in k-ary n-cubes
Scheidig et al. An efficient organization for large, network-based multicomputer systems
Kirmani Design and implementation of a modular deadlock-free routing for a mesh network of transputers
Ravikumar VLSI Implementation of a Wormhole Router Using Virtual Channels
Parhami Other Mesh-Related Architectures
Séguin et al. Computer Science Division Electrical Engineering and Computer Sciences University of California, Berkeley, CA 94720

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LU NL SE
