+

WO2013006030A1 - Apparatus and method for performing parallel bits distribution with bi-delta network - Google Patents

Apparatus and method for performing parallel bits distribution with bi-delta network Download PDF

Info

Publication number
WO2013006030A1
WO2013006030A1 PCT/MY2012/000141 MY2012000141W WO2013006030A1 WO 2013006030 A1 WO2013006030 A1 WO 2013006030A1 MY 2012000141 W MY2012000141 W MY 2012000141W WO 2013006030 A1 WO2013006030 A1 WO 2013006030A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
control
delta network
delta
bits
Prior art date
Application number
PCT/MY2012/000141
Other languages
French (fr)
Inventor
Bin Mohamad Yusof MOHAMAD YUSRI
Santosh Palai SMRUTI
Prasad DEVI
Original Assignee
Mimos Berhad
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mimos Berhad filed Critical Mimos Berhad
Publication of WO2013006030A1 publication Critical patent/WO2013006030A1/en

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0618Block ciphers, i.e. encrypting groups of characters of a plain text message using fixed encryption transformation
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09CCIPHERING OR DECIPHERING APPARATUS FOR CRYPTOGRAPHIC OR OTHER PURPOSES INVOLVING THE NEED FOR SECRECY
    • G09C1/00Apparatus or methods whereby a given sequence of signs, e.g. an intelligible text, is transformed into an unintelligible sequence of signs by transposing the signs or groups of signs or by replacing them by others according to a predetermined system
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/12Details relating to cryptographic hardware or logic circuitry
    • H04L2209/125Parallelization or pipelining, e.g. for accelerating processing of cryptographic operations

Definitions

  • the present invention relates to bit permutation.
  • the present invention relates to a system for distributing bits through a bi-dclta network.
  • Bit permutation is one way of handling general bit-level processing. Bit permutation operations are therefore becoming more prevalent in the field of cryptography and baseband processing. For example, they are used in Data Encryption Standard (DES) and bit puncturing and interleaving in Forward Error CotTection (FBC).
  • DES Data Encryption Standard
  • FBC Forward Error CotTection
  • bit puncturing requires a number of shifts, bitwise OR and bitwise AND operations, whilst in a general bit permutation require even larger number of operations. For example, to perform general 128-bit permutation, it would require 512 (i.e. 4 x 128) operations. That would be too slow for high-speed wireless broadband baseband processing.
  • Extract and Parallel Deposit Instructions proposed a parallel deposit (PDEP) instruction consisting of three operand registers, which include a data bits source register, a bit mask register and a target register. Bits from lower part of the source register are distributed to locations in the result register marked with 1 -bits in a mask register and bits from upper part of the same source register to locations in the same result register marked with 0-bits in the same mask register. Relative positions of the source bits within the two target groups are retained.
  • the disclosure provides the uses of complete butterfly network, controls from population count (adder) prefix circuits and left-rolate and mask circuits, and data bits marked with 0-bits are placed in normal order
  • FIG. 1 illustrates a known butterfly network of an 8-bit PDRP operation. Control signals for the switches are generated using population count prefix circuit and so-called Left-Rotate and Complement (LRO ' fC) circuits.
  • LRO ' fC Left-Rotate and Complement
  • PERMUTATIONS USING PERMUTATION INSTRUCTIONS BASED ON MODIFIED OMEGA AND FTJP STAGES discloses permutation instructions which can be used in software executed in a programmable processor for solving permutation problems in cryptography, multimedia and other applications.
  • the peraiute instructions are based on an omega-flip network comprising at least two stage. It uses single stages of omega and flip networks and controls directly from mask. The bit distribution requires generally many operations.
  • PERMUTATIONS USING PERMUTATION INSTRUCTIONS RASED ON BUTTERFLY NETWORKS discloses permutation instructions which can be used in software executed in a programmable processor for solving permutation problems in cryptography, multimedia and other applications.
  • the permute instructions arc based on a Bcncs network comprising two butterfl networks. Summary
  • a bi-della network for distributing bits through bit distribution (BDST) instruction from an input to an output of the bi -d lta network.
  • the network comprises a control delta network constituting a forward path of the bi-dclta network for receiving an n-bits data from a bitmask register; a data delta network constituting a reverse path of the bi-dclta network for receiving n-hits data from a source register; and a control generation network having a plurality of stages between the control delta network and the data delta network, wherein each stage operationally receives inputs from inputs of stage m of the control delta network and generates control signals for switches in stage m of both control and data delta networks, whereby, m represents a stage number starting from 1 to log 2 (n).
  • both the control delta network and the data delta network comprises log:(n) stages respectively.
  • the stages in the control delta network may be denoted in ati incremental manner from input to output direction from 1 to log 2 (n), whilst the stages in the data delta network may be denoted in a decremental manner from input to output direction from log2(n) to 1 ,
  • the stages in the control generation network may further be denoted in an incremental manner from input to output direction from I to log?(n).
  • each stage of the cuntrol generation network may comprise 2 m -l address groups, and each group is adapted to control n/2' n switch elements at the respective stage of the control generation network.
  • a method for distributing bits through a bi-delta network with bit distribution (BDST) instruction from an input to an output comprises loading source bits into a source register; determining target bits in a result register lo move oonesponding lower pari of the source register in preserved relative bit orde by setting 1-bit markers at the corresponding bit positions in a bitmask register; determining the target bits in same result register to move the corresponding upper part of same source register in reverse relative bit order by setting 0-bit markers at the corresponding bit positions in the bilmask register; and executing the BDST instruction over the source and bitmask registers and subsequently depositing the result of the bits distribution into the result register.
  • BDST bit distribution
  • executing the BDST instructions may further comprise feeding n-bits data from bitmask register to a control delta network and the control generation network, wherein both the control delta network and the control generation network comprises log2(n) stages respectively, and the stages are numbered increasingly from I to log 2 (n) in the input to output direction; feeding n-bils data from source register tu the dala delta network, wherein the data delta network comprises iog 2 (n) stages, has reverse structure of the control delta network and the stages arc numbered decreasing! ⁇ ' from log 7 (n) to I in the i put to output direction; and depositing n-bit output of the data delta network in the result register.
  • the stage m of the control generation network (hat acquires inputs from the corresponding stage m of the control data nelwork may further comprise dividing the control generation stage m into 2 m ⁇ address groups, wherein each group taking n/2 m" ' inputs associated with target address bits (a m .
  • Tbe bi-delta network adapted to carry out parallel bit distribution.
  • Tbe bi-delta network comprises a control delta, network; and bit markers that are operable to be distributed in a forward manner in the control delta network for configuring switches of the control delta network.
  • User bits arc operationally distributed in a backward manner from the downstream of the control delta network, in both order and reverse-order in accordance with the bit markers, to output the distributed user bits at the upstream of the control delta network.
  • a method for carrying out parallel bit distribution in a bi-delta network comprise distributing bit markers in a forward manner of a control delta network for configuring switches of the control delta network; and distributing user bits in a backward manner from the downstream of the control delta network, in both order and rcvcrsc-ordcr in accordance with the bit inarkei's, to output the distributed user bits at the upstream of the control delta nelwork.
  • FIG. I illustrates a known butterfly network of an 8 bit PDEP operation
  • FIG. 2 exemplifies a BDST operation in accordance with one embodime t of the present invention
  • FIG. 4 illustrates a bi-dcita network in accordance with an embodiment of the prese t invention
  • FIG. SA exemplifies another bi-della network in accordance with another embodiment of the present invention.
  • FIG. 5B illustrates target address bit in routing I -bits in the flip network of FIG. 5 ⁇ ;
  • FIG. 7 illustrates a BDST execution unit (BDEU) in accordance with one embodiment of the present invention
  • FIG. 8 illustrates a block diagram of a BDEU implementation in accordance with one embodiment of the present invention.
  • FIG. 9 illustrates a simplified operation of the BDST Rxeoulion unit of
  • FIG. 7 in accordance with an embodiment of the present invention.
  • the present invention provides a system and method for parallel bit distribution using bi-delta network which could perform arbitrary bits distribution/placement in one operation.
  • Tn nne embodiment of the present invention there is provided a new instruction fur distributing bits.
  • the bit distribution is herein alter referred to as "BDST".
  • the BDST distributes bits in a normal order, i.e. from lower part of a source register to bit locations in a result register correspondingly marked with 1 -bits in a mask register, and in a reverse order, i.e. from upper part of the source register to bit locations in the result register correspondingly marked with 0-bits in the mask register.
  • the BDST uses bi-delta network, in particular a flip network, using both directions to realize the instruction.
  • FIG. 2 exemplifies a BDST operation far 8-bit data in accordance with one embodiment of the present invention.
  • the BDST operation is an 8- bil operation with a sel ⁇ - routing bi-delta network.
  • the BDST operation includes three register operands: a source register 202, a result register 204 and a mask register 206. The bits in the lower part, i.e. bo, bt, and b .
  • the BDST operation is implemented in a network that connects to a set of 2*2 switches as shown in FrG. 3. It is understood that the same operation can be applied accordingly in other networks configurations without departing the scope of the present invention.
  • the switch shown in FIG. 3 is a crossba switch with two input terminals, iO and il, and two output terminals, oO, o l. The inputs may be configured to have straight connections or cross connections to the outputs.
  • ⁇ typical delta network which is also known as digit controlled network, is a type of network with a path from each input to a respective output and ull paths leading to the same output make use of the same path descriptor.
  • the delta network forms a subset of the more general class of banyan networks that offers a self- routing property.
  • ⁇ banyan network is a multi-stage network of interconnect crossbar switches.
  • a bi-delta network herein referred, on the other hand is a type of network having a forward delta network and a reverse delta network.
  • FIG. 4 illustrates a bi-delta network with 16-bit inputs and 16-bit outputs, i.e., 16x16 network, in accordance with an embodiment of the present invention.
  • the dashed lines illustrate all paths with, a same path descriptor from all inputs to the same output 7.
  • path descriptor from input 2 to output 7 and all other paths leading to output 7, is found lo be (I, I, I, 0). Note that the path descriptor is equivalent to the address of the output, least significant bit (Isb) first.
  • FIG. 4 also illustrates "intermediate" path descriptor with the labels reversed, for example, ' ⁇ ⁇ for switch I of stage 3 suggests the partial final output address as well as the "digit controlled" nature of the network.
  • the top input is labeled ' ⁇
  • binury digit 2 of the target address (first 'x' from right in the label) is used to decide whether the bit from the input is passed lo input labeled ' ⁇ at switch I or to input labeled ' ⁇ ⁇ ⁇ ⁇ ' at switch 5 of stage 4.
  • N*N RBN remains non-blocking if the target addresses for the inputs arc divided into two groups: one with ascending target addresses and another descending addresses, modulo N.
  • H is possible to assign all 1-bils in the mask increasing target addresses from 0 onward and all 0-bits decreasing target addresses from N-l downward.
  • Kannan, et al devised a method to generate the address bits in a distributed fashion with an R N network.
  • address bit t associated with the 1-bits alternates between 0 and 1.
  • stage m there are 2 m_1 address groups wherein the target address bits a m associated with 1-bit inputs alternating between 0 and I in the delta network.
  • the address bits alternate between 1 and 0. as the addresses for 0-bits arc assigned from N- 1 down.
  • the address bils are then used lo control corresponding switches 502.
  • the presenL embodiment is possible to be implemented through one single circuit, instead of a dedicated circuit for 1-bits and 0-bits as proposed by Kannan, et al. using a chain of XOR gates to generate control lines tor the switch elements 502.
  • FT !. 6 illustrates a circuit diagram for an 8-bit input address group of an address counting circuit in accordance with a further embodiment of the present invention.
  • the circuit diagram comprises 8 inputs . , 3 XOR gales and 4 switches.
  • the single circuit will count both 1-bits (from 0) and 0-bits in the inputs correctly.
  • the XOR circuit in FiGL 6 is a serial implementation of prefix compulation.
  • the circuit has a depth and, thus, latency, of n levels. There arc known methods on parallel prefix circuits with depth as small as log2(n).
  • FIG. 7 illustrates a BDST execution unit (BDEU) 700 in accordance with one embodiment of the present invention.
  • the BDST execution unit 700 is adapted to handle 128-bit BDST instructions.
  • the figure depicts a top-level view of the BDEU 700.
  • a 128-bit bitmask mO, consisting of 1-bit and 0-bit markers for distribution locations, and a 128-bit input data xO that is to be distributed are inputs to the BDEU 700 while a completely separated 128-biL bitmask m7 (i.e., all I -bit markers are in the less significant bits in order and all the 0-bit markers arc in the more significant bits in reverse order) and a 128-bit distributed data x.7 are outputs from the BDEU 801).
  • the BDEU 700 reads from a 128-bit register file (not shown) and writes the outputs to the same 128-bit register tile. It is also possible that the BDEU 700 reads and writes to the memory instead.
  • a person skilled in the art should understand thai reading of inputs and writing of outputs from a computing unit in a processor are performed at different stages in the processor pipeline.
  • control stage 7 of the control flip network becomes superfluous. By removing the final stage, a number of switches in other stages in the control flip network may also become unnecessary. In particular, last switch element associated with each address group may be removed.
  • PrG. 9 illustrates a simplified operation of the RDST Rxecutioii unit 700 of FIG. 7 in accordance with an embodiment of the present invention.
  • the bi-delta network can be implemented without using memory elements and bidirectional switches by having two unidirectional delta networks one in the forward direction for routing control bits mid the other in the backward direufiun for routing user bits, as required by a typical ASIC implementation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention provides a bi-delta network for distributing bits through bit distribution (BDST) instruction fiOm an input to an output of the bi-dclta network. The network comprises a control delta network constituting a forward path of a delta network for receiving an n-bits data from a bitmask register; a data delta network constituting a reverse path of said delta network for n-bits data from a source register; and a plurality of control generation stages between the control delta network and the data delta network, wherein each stage operationally receives inputs from inputs of stage m of the control delta network and generates control signals for switches in stage m of both networks, m represents a stage number stalling from 1 to log2(n). A method for distributing bits is also provided.

Description

Apparatus and Method for Performing Parallel Bits Distribution with Bi-delta
Network
Field of the Invention
[0001] The present invention relates to bit permutation. In particular, the present invention relates to a system for distributing bits through a bi-dclta network.
Background
[0002] Cryptography and baseband processing, for example, demand high speed bit-lcvcl processing. Bit permutation is one way of handling general bit-level processing. Bit permutation operations are therefore becoming more prevalent in the field of cryptography and baseband processing. For example, they are used in Data Encryption Standard (DES) and bit puncturing and interleaving in Forward Error CotTection (FBC). However, arbitrary bit permutations are generally not supported on existing general-purpose processors. For example, bit puncturing requires a number of shifts, bitwise OR and bitwise AND operations, whilst in a general bit permutation require even larger number of operations. For example, to perform general 128-bit permutation, it would require 512 (i.e. 4 x 128) operations. That would be too slow for high-speed wireless broadband baseband processing.
[0003] Hilewita and Lee, in "Fast Bit Compression and Expansion with Parallel
Extract and Parallel Deposit Instructions" (2006), proposed a parallel deposit (PDEP) instruction consisting of three operand registers, which include a data bits source register, a bit mask register and a target register. Bits from lower part of the source register are distributed to locations in the result register marked with 1 -bits in a mask register and bits from upper part of the same source register to locations in the same result register marked with 0-bits in the same mask register. Relative positions of the source bits within the two target groups are retained. The disclosure provides the uses of complete butterfly network, controls from population count (adder) prefix circuits and left-rolate and mask circuits, and data bits marked with 0-bits are placed in normal order
[0004] FIG. 1 illustrates a known butterfly network of an 8-bit PDRP operation. Control signals for the switches are generated using population count prefix circuit and so-called Left-Rotate and Complement (LRO'fC) circuits. [0005] US6,952,478 entitled "METHOD AND SYSTEM FOR PERFORMING
PERMUTATIONS USING PERMUTATION INSTRUCTIONS BASED ON MODIFIED OMEGA AND FTJP STAGES" discloses permutation instructions which can be used in software executed in a programmable processor for solving permutation problems in cryptography, multimedia and other applications. The peraiute instructions are based on an omega-flip network comprising at least two stage. It uses single stages of omega and flip networks and controls directly from mask. The bit distribution requires generally many operations.
[0006] US6,922,472 entitled "METHOD AND SYSTEM FOR PERFORMING
PERMUTATIONS USING PERMUTATION INSTRUCTIONS RASED ON BUTTERFLY NETWORKS" discloses permutation instructions which can be used in software executed in a programmable processor for solving permutation problems in cryptography, multimedia and other applications. The permute instructions arc based on a Bcncs network comprising two butterfl networks. Summary
[0007] In one aspect of the present invention, there is provided a bi-della network for distributing bits through bit distribution (BDST) instruction from an input to an output of the bi -d lta network. The network comprises a control delta network constituting a forward path of the bi-dclta network for receiving an n-bits data from a bitmask register; a data delta network constituting a reverse path of the bi-dclta network for receiving n-hits data from a source register; and a control generation network having a plurality of stages between the control delta network and the data delta network, wherein each stage operationally receives inputs from inputs of stage m of the control delta network and generates control signals for switches in stage m of both control and data delta networks, whereby, m represents a stage number starting from 1 to log2(n).
[0008] In one embodiment, both the control delta network and the data delta network comprises log:(n) stages respectively. The stages in the control delta network may be denoted in ati incremental manner from input to output direction from 1 to log2(n), whilst the stages in the data delta network may be denoted in a decremental manner from input to output direction from log2(n) to 1 , The stages in the control generation network may further be denoted in an incremental manner from input to output direction from I to log?(n). [0009] In another embodiment, each stage of the cuntrol generation network may comprise 2m-l address groups, and each group is adapted to control n/2'n switch elements at the respective stage of the control generation network. Control signals from the control generation network for a switch element, k, in an address group of stage m may be computed by a parity (XOR) prefix, circuit for generating partly count for first 2k+l inputs to Ihe address group, whereby k represents a switch element number denoting from 0 to n/2'"-l .
[0010] In another aspect of the present invention, there is provided a method for distributing bits through a bi-delta network with bit distribution (BDST) instruction from an input to an output. The method comprises loading source bits into a source register; determining target bits in a result register lo move oonesponding lower pari of the source register in preserved relative bit orde by setting 1-bit markers at the corresponding bit positions in a bitmask register; determining the target bits in same result register to move the corresponding upper part of same source register in reverse relative bit order by setting 0-bit markers at the corresponding bit positions in the bilmask register; and executing the BDST instruction over the source and bitmask registers and subsequently depositing the result of the bits distribution into the result register. [0011] In one embodiment, executing the BDST instructions may further comprise feeding n-bits data from bitmask register to a control delta network and the control generation network, wherein both the control delta network and the control generation network comprises log2(n) stages respectively, and the stages are numbered increasingly from I to log2(n) in the input to output direction; feeding n-bils data from source register tu the dala delta network, wherein the data delta network comprises iog2(n) stages, has reverse structure of the control delta network and the stages arc numbered decreasing!}' from log7(n) to I in the i put to output direction; and depositing n-bit output of the data delta network in the result register. The stage m of the control generation network (hat acquires inputs from the corresponding stage m of the control data nelwork may further comprise dividing the control generation stage m into 2m~ address groups, wherein each group taking n/2m"' inputs associated with target address bits (am.|, . .. ai, ¾) from stage m of the control delta network; and applying parity (XOR) prefix circuit in each group, wherein for switch k in the address group in stage m of each delta network, the control signal for thai switch is derived from parity couni from the parity of inputs 2k, 2k-l , .. ., 1 , 0, whereby k represents & switch index withui the address group from 0 to n/2n'.
[0012] In a further aspect of the present invention, there is provided a bi-dclta network adapted to carry out parallel bit distribution. Tbe bi-delta network comprises a control delta, network; and bit markers that are operable to be distributed in a forward manner in the control delta network for configuring switches of the control delta network. User bits arc operationally distributed in a backward manner from the downstream of the control delta network, in both order and reverse-order in accordance with the bit markers, to output the distributed user bits at the upstream of the control delta network.
[00131 In yet a further aspect of the present invention, there is provided a method for carrying out parallel bit distribution in a bi-delta network. The method comprise distributing bit markers in a forward manner of a control delta network for configuring switches of the control delta network; and distributing user bits in a backward manner from the downstream of the control delta network, in both order and rcvcrsc-ordcr in accordance with the bit inarkei's, to output the distributed user bits at the upstream of the control delta nelwork. Brief Description of the Drawings
[0014] This invention will be described by way of non-limiting embodiments of the present invention, vvith reference to the accompanying drawings, in which:
[0015] FIG. I illustrates a known butterfly network of an 8 bit PDEP operation;
[0016] FIG. 2 exemplifies a BDST operation in accordance with one embodime t of the present invention;
[0017] FIG. 3 illustrates a 2><2 switch:
[0018] FIG. 4 illustrates a bi-dcita network in accordance with an embodiment of the prese t invention;
[0019] FIG. SA exemplifies another bi-della network in accordance with another embodiment of the present invention;
[0020] FIG. 5B illustrates target address bit in routing I -bits in the flip network of FIG. 5Λ;
[0021] FIG. 6 illustrates a circuit diagram of an address counting circuit in accordance with a further embodiment of the present invention:
[0022] FIG. 7 illustrates a BDST execution unit (BDEU) in accordance with one embodiment of the present invention;
[0023] FIG. 8 illustrates a block diagram of a BDEU implementation in accordance with one embodiment of the present invention; and [0024] FIG. 9 illustrates a simplified operation of the BDST Rxeoulion unit of
FIG. 7 in accordance with an embodiment of the present invention.
Detailed Description
[0025] n line with the above summary, the following description of a number of specific and alternative embodiments is provided to understand the inventive features of the present invention. It shall be apparent to one skilled in the art, however that this invention may be practiced without such specific details. Some of the details may not be described at length so as not to obscure the invention. For ease of reference, common reference numerals will be used throughout the figures when referring to the same or similar features common to the figures.
[0026] The present invention provides a system and method for parallel bit distribution using bi-delta network which could perform arbitrary bits distribution/placement in one operation.
[0027] Tn nne embodiment of the present invention, there is provided a new instruction fur distributing bits. For simplicity, the bit distribution is herein alter referred to as "BDST". The BDST distributes bits in a normal order, i.e. from lower part of a source register to bit locations in a result register correspondingly marked with 1 -bits in a mask register, and in a reverse order, i.e. from upper part of the source register to bit locations in the result register correspondingly marked with 0-bits in the mask register. The BDST uses bi-delta network, in particular a flip network, using both directions to realize the instruction. In this bi-dclta network, switch control configuration is derived in a forward flow of the network and data bits distribution is performed in a reverse flow of the network. (0028J FIG. 2 exemplifies a BDST operation far 8-bit data in accordance with one embodiment of the present invention. In this example, the BDST operation is an 8- bil operation with a sel Γ- routing bi-delta network. The BDST operation includes three register operands: a source register 202, a result register 204 and a mask register 206. The bits in the lower part, i.e. bo, bt, and b.i, of the source register 202 is distributed in normal order to locations in the result register 204 that are marked with 1-bits in the mask register 206 and the remaining bits in the upper part. i.e. b4, b5: ba and b7, of the source register 202 is distributed in reverse order to the locations in the result register 204 marked with 0-bils in the mask register 206. [0029] For purpose of illustrations, the BDST operation is implemented in a network that connects to a set of 2*2 switches as shown in FrG. 3. It is understood that the same operation can be applied accordingly in other networks configurations without departing the scope of the present invention. The switch shown in FIG. 3 is a crossba switch with two input terminals, iO and il, and two output terminals, oO, o l. The inputs may be configured to have straight connections or cross connections to the outputs.
[0030] Λ typical delta network, which is also known as digit controlled network, is a type of network with a path from each input to a respective output and ull paths leading to the same output make use of the same path descriptor. The delta network forms a subset of the more general class of banyan networks that offers a self- routing property. Λ banyan network is a multi-stage network of interconnect crossbar switches. A bi-delta network herein referred, on the other hand, is a type of network having a forward delta network and a reverse delta network. [0031] FIG. 4 illustrates a bi-delta network with 16-bit inputs and 16-bit outputs, i.e., 16x16 network, in accordance with an embodiment of the present invention. In particular, the bi-delta network exemplified in WIG. 4 is an Reverse Banyan Network (RBN), which is also referred to as indirect binary n-cube network. The RBN comprises thirty-two (32) 2x2 switch elements 402 arranged in four (4) stages of eight (8) switch elements 4U2 at each stage. For the purpose of this description, the switch elements 402 arc counted from 0-7 at each stage and the stages are counted from I -4 stages. At each switch clement 402, a top output is labeled as 0 and a boliom output as 1. The switch output labels are used to describe a given path, i.e., the path descriptor. A thick line illustrates the only path from input 2 to output 7. The dashed lines (including the thick line) illustrate all paths with, a same path descriptor from all inputs to the same output 7. By following the labels of the switch outputs in the path, path descriptor from input 2 to output 7 and all other paths leading to output 7, is found lo be (I, I, I, 0). Note that the path descriptor is equivalent to the address of the output, least significant bit (Isb) first.
[0032] FIG. 4 also illustrates "intermediate" path descriptor with the labels reversed, for example, 'χχΟ Γ for switch I of stage 3 suggests the partial final output address as well as the "digit controlled" nature of the network. For example, at switch 1 of stage 3, the top input is labeled 'χχΟΓ, binury digit 2 of the target address (first 'x' from right in the label) is used to decide whether the bit from the input is passed lo input labeled 'χΟΟ at switch I or to input labeled 'χ ΐ θ ί ' at switch 5 of stage 4.
[0033] In general, for a target address bits (am.|,...,ai,ao), bit a0 ls use,J to control the switch in stage 1 , ai is used to control the next switch in the path in stage 2, and so on. Accordingly, the reverse delta network simply works in reverse, i.e. the roles of inputs and outputs FIG. 4 reverse, which forms a property of a bi-della network. Tn this case, the inputs that are labeled on the left, a path descriptor going from an output on the right back to an input on the left would be equivalent to the associated input address's most significant bit (msb) first. [0034] The embodiment above is provided for mere illustrations only, not limitations, It is entirely up lo user whether lsb first or msb first of the output addresses relative to the path descriptors is desired.
[0035] FTC. 5A exemplifies another bi-delta network in accordance with another embodiment of the present invention. Tn particular, the bi-dclta network is configured in a flip network configuration. Similarly, the illustrated network is also an 16x16 network having thirty-two (32) 2x2 switch elements 502 arranged in four (4) stages of eight (8) switch elements 402 at each stage. Likewise, the switch elements 502 are counted from 0-7 al each stage and the stages are counted from 1-4 stages. At each switch element 502, a top output is labeled as 0 and a bottom output as 1. Taken the same illustration as FTC 4, an unique path from input 2 to output 7 and all paths with same path descriptor (: Γ, ' P, T, Ό5) lo output 7 can also be shown. In the flip network, the intermediate path labels within each stage and the output addresses are in increasing order. The "address groups" (or sometime referred to as "blocks"') at the inputs of each stage have same path labels for the inputs in each group. The same groups are there too in RBN of FiC 4 but they are somewhat interleaved in each stage.
[0036] Kannan, el al, in "A bit-controlled Multichannel ' ime Slot Fermulalion
Network" (1995).. showed that an N*N RBN remains non-blocking if the target addresses for the inputs arc divided into two groups: one with ascending target addresses and another descending addresses, modulo N. For example, H is possible to assign all 1-bils in the mask increasing target addresses from 0 onward and all 0-bits decreasing target addresses from N-l downward.
[0037] Instead of explicitly generating the target addresses from die 0-bits and 1 -bits at the inputs, Kannan, et al, devised a method to generate the address bits in a distributed fashion with an R N network.
[0038] In the present embodiment, the flip network is used because it has an ordering property as mentioned earlier. Consider 1 -bit markers from the mask at the inputs as shown in FIG. SB. FIG. 5B illustrates target address bit in muting 1-bits in the flip network of FIG. 5A. They are assigned with increasing target addresses from 0. Clearly, bits ao from the setuience of addresses assigned to the 1-bits are alternating 0 and I. At stage 1 , each I -bit with a< 0 is routed to the top output of its associated switch and each 1-bit with ao=l is routed to the bottom output.
[0039] Following that, at stage 2 of the flip network, Ihe top half of the switch elements 502 is handling 1-bits with a0 =0, i.e.. addresses xxO, and the bottom half is handling 1-bits with ao=l, i.e.. addresses xxxl . It then follows that, in each half of stage
2, address bit t associated with the 1-bits alternates between 0 and 1. Similarly at stage
3. there arc 4 input groups where in each group, address bit a2 associated with the 1-bits alternate between 0 and L and at stage 4 there are 8 groups. As we have seen earlier, we call these groups "address groups''.
[0040] In general, at stage m, there are 2m_1 address groups wherein the target address bits am associated with 1-bit inputs alternating between 0 and I in the delta network. For 0-bil markers from ihe mask at the input the address bits alternate between 1 and 0. as the addresses for 0-bits arc assigned from N- 1 down. The address bils are then used lo control corresponding switches 502. The presenL embodiment is possible to be implemented through one single circuit, instead of a dedicated circuit for 1-bits and 0-bits as proposed by Kannan, et al. using a chain of XOR gates to generate control lines tor the switch elements 502.
[0041] FT !. 6 illustrates a circuit diagram for an 8-bit input address group of an address counting circuit in accordance with a further embodiment of the present invention. The circuit diagram comprises 8 inputs., 3 XOR gales and 4 switches. The single circuit will count both 1-bits (from 0) and 0-bits in the inputs correctly. [0042] The XOR circuit in FiGL 6 is a serial implementation of prefix compulation. The circuit has a depth and, thus, latency, of n levels. There arc known methods on parallel prefix circuits with depth as small as log2(n). For example, Sheeran's ' unctional and dynamic programming in the design of parallel prefix networks" (2009) and "Hardware Design and Functional Programming: A Perfect Match" (2005), which cover various parallel prefix circuit implementations, for example. Sklansky, Brent- Kung, Ladner-Fischer, depth-size optimal (DSO), search- based construction and many more, can be desired for the present invention.
[0043] FIG. 7 illustrates a BDST execution unit (BDEU) 700 in accordance with one embodiment of the present invention. The BDST execution unit 700 is adapted to handle 128-bit BDST instructions. The figure depicts a top-level view of the BDEU 700. A 128-bit bitmask mO, consisting of 1-bit and 0-bit markers for distribution locations, and a 128-bit input data xO that is to be distributed are inputs to the BDEU 700 while a completely separated 128-biL bitmask m7 (i.e., all I -bit markers are in the less significant bits in order and all the 0-bit markers arc in the more significant bits in reverse order) and a 128-bit distributed data x.7 are outputs from the BDEU 801).
[0044] In one embodiment, the BDEU 700 reads from a 128-bit register file (not shown) and writes the outputs to the same 128-bit register tile. It is also possible that the BDEU 700 reads and writes to the memory instead. A person skilled in the art should understand thai reading of inputs and writing of outputs from a computing unit in a processor are performed at different stages in the processor pipeline.
[0045] FIG. 8 illustrates a block diagram of a BDEU implementation in accordance with one embodiment of the present invention. The BDEU comprises three main components: a 7-slage 128x128 control delta network for routing 1-bits and 0-bits from 128-bit bitmask mO to m7. which are arranged in an ascending order; a 7-slagc 128x128 data delta network, reverse of the control delta network, for routing 128-bit data bits xO to x7, which arc arranged in a descending order; and 7 control generator stages cl to c7. each generating 64 switch controls for every stage of both the control delta network and (he data delta network.
[0046] There arc 7 stages (i.e. log2(128)) with 64 2*2 switches in each stage for each of the delta networks. For each stage m, which starts from I, there arc 2"'"' address groups with N/2'1 l inputs generating N/2™ switch controls per group. The controls are generated using XO prefix circuit as depicted in FIG. 7 from the inputs associated with the address group. Alternative taster parallel prefix circuits may be also be used. [0047] In one embodiment, Ihe output tri7 from the control stage 7 is optional.
If it is not used, control stage 7 of the control flip network becomes superfluous. By removing the final stage, a number of switches in other stages in the control flip network may also become unnecessary. In particular, last switch element associated with each address group may be removed.
[00481 PrG. 9 illustrates a simplified operation of the RDST Rxecutioii unit 700 of FIG. 7 in accordance with an embodiment of the present invention. The operation comprises configuring mask register at step 902; setting m~l at step 904, dividing u control stage into address groups at step 906; computing prefix circuit outputs from all address groups at step 90S, controlling switches in control network with prefix circuit from control stage ni at step 910; incrementing m at step 912; determine if m equals to iog2(n) at step 914: loading source register at step 916; setting
Figure imgf000016_0001
al step 918; controlling switches in data network with prefix circuit outputs from control stage m at step 920; decrementing in at step 922; determining if m=l at step 924; and storing final outputs of data network into result register at step 926.
[0049] The ioups shown in the figure is a simplification of Ihe flow of control and data bits in the two networks. Physically, the stages are completely unrolled as shown i previous figures.
[0050] The present invention provides a bi-delta network and a method thereof for carrying out parallel bit distributions. The network distributes bit markers in a forward direction of a control delta network and the user bits, both in-order and then reverse-order, in a backward direction of the data delta network. To achieve that, the bit distributions are carrying out in two phases, in Ihe first phase, the '0' and Ί ' bit markers forward with the control circuit for fixing up the switches' configuration. In the second phase, the user bits is distributed in a backward manner from the downstream of the control delta network, in both order and reverse-order in accordance with the bit markers, to output the distributed user bits at the upstream of the control delta network.
[0051] The bi-delta network can be implemented without using memory elements and bidirectional switches by having two unidirectional delta networks one in the forward direction for routing control bits mid the other in the backward direufiun for routing user bits, as required by a typical ASIC implementation. [0052] While specific embodiments have been described and illustrated, it is understood that many changes, modifications, variations and combinations thereof could be made to the present invention without departing from the scope of the invention.

Claims

Claims
1. A bi-dclta network for distributing bits through bit distribution (BDST) instruction from an input to an output of the bi-deita network, said network comprising: a control delta network constituting a forward path of the bi-deUii network for receiving an n-bits data from a bitraask register;
a data delta network constituting a reverse path of the bi-dclta network for receiving n-bits data from a source register; and
a control generation network having a plurality of stages between the control delta network and the data delta network, wherein each stage operationally receives inputs from inputs of stage m of the control delta network and generates control signals for switches in stage m of both control and data delta networks, whereby, m represents a stage number starting from I to log2(n).
2. The bi-delta network according to claim I, wherein both the control delta network and the data delta network comprises Jog2(n) stages respectively.
3. The bi-delta network according to claim 2, wherein the stages in the control delta network are denoted in an incremental manner from input to output direction from 1 to log2(n).
4. The bi-delta network according to claim 2, wherein the stages in the data delta network are denoted in a decremental manner from input to output direction from log2(n) to l .
5. The bi-delta network according to claim 2, wherein the stages in the control generation network are denoted in an incremental manner from input to output direction tram I to log2(n).
6. The bi-delta network according to claim 1, wherein each stage of the control generation network comprising 2'""1 address groups, and each group is adapted to control n/2'n switch elements at the respective stage of the control generation network.
7. The bi-delta network according to claim 6, wherein control signals from the control generation network for a switch element, k, in an address group of stage m is computed by a parity (XOR.) prefix circuit for generating parity count for first 2k+l inputs to the address group, whereby k represents a switch clement number denoting from 0 to n 2m-1 .
8. Λ method for distributing bits through a bi-delta network with bit distribution (BDST) instruction from an input to an output, the method comprising:
loading source bits into a source register;
determining target bits in a result register to move corresponding lower part nf the source register in preserved relative bit order by setting 1-bit markers at the corresponding bit positions in a bitmask register;
determining the target bits in same result register to move the corresponding upper part of same source register in reverse relative bit order by setting 0-bit markers at the corresponding bit positions in the bitmask register; and
executing the BDST instruction over the source and bitmask registers and subsequently depositing the result of the bits distribution into the result register.
9. The method according to claim 8, wherein executing the BDST instructions further comprising:
feeding n-bils data from bitmask register to a control delta network and the control generation network, wherein both the control delta network and the control generation network comprises log2(n) stages respectively, and the stages are numbered increasingly from 1 to log2(n) in the input to output direction;
feeding n-bits data from source register to the data delta network, wherein the data delta network comprises lo&O1) stages, has reverse structure of the control delta network and the stages are numbered decreasing.)' from log2(n) to 1 in the input to output direction; and
depositing n-bit output of the data delta network in the result register.
10. The method according claim 8, wherein the stage m of the control generation network that acquires inputs from the corresponding stage m of the control data network further comprising:
dividing the control generation stage m into 2" l address groups, wherein each group taking n 2m"' inputs associated with target address bits (am-|. ... n . ao) from stage m of the control delta network; and
applying parity (XOR) prefix circuit in each group, wherein for switch k in the address group in stage m of each delta network, the control signal for that switch is derived from parity count from the parity of inputs 2k, 2k- 1, 1, 0, whereby i represents a switch index, within the address group from 0 to n/2m
1 1. A bi-delta network adapted to carry out parallel bit distribution, the bi-de ta network comprising:
a control delta network; and
bit markers that are operable to be distributed in a forward manner in the control delta network for configuring switches of the control delta network, wherein
user bits are operationally distributed in a backward manner from the downstream of the control delta network, in both order and reverse-order in accordance with the bit markers, to output the distributed user bits at the upstream of the control delta network.
12. A method for carrying out parallel bit distribution in a bi-delta network, the method comprising:
distributing bit markers in a forward manner of a control delta network for coni igiuring switches of the control delta network;
distributing user bits in a backward manner from the downstream of the control delta network, in both order and rcvcrsc-ordcr in accordance with the bit markers, to output the distributed user bits at the upstream o the control delta network.
PCT/MY2012/000141 2011-07-06 2012-06-22 Apparatus and method for performing parallel bits distribution with bi-delta network WO2013006030A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2011003193 2011-07-06
MYPI2011003193 2011-07-06

Publications (1)

Publication Number Publication Date
WO2013006030A1 true WO2013006030A1 (en) 2013-01-10

Family

ID=46801614

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2012/000141 WO2013006030A1 (en) 2011-07-06 2012-06-22 Apparatus and method for performing parallel bits distribution with bi-delta network

Country Status (1)

Country Link
WO (1) WO2013006030A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4752777A (en) * 1985-03-18 1988-06-21 International Business Machines Corporation Delta network of a cross-point switch
WO2001089131A2 (en) * 2000-05-05 2001-11-22 Lee Ruby B A method and system for performing permutations using permutation instructions based on modified omega and flip stages
US6922472B2 (en) 2000-05-05 2005-07-26 Teleputers, Llc Method and system for performing permutations using permutation instructions based on butterfly networks
US20090138534A1 (en) * 2007-05-23 2009-05-28 The Trustees Of Princeton University Microprocessor Shifter Circuits Utilizing Butterfly and Inverse Butterfly Routing Circuits, and Control Circuits Therefor
US20090187746A1 (en) * 2008-01-22 2009-07-23 Arm Limited Apparatus and method for performing permutation operations on data

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4752777A (en) * 1985-03-18 1988-06-21 International Business Machines Corporation Delta network of a cross-point switch
WO2001089131A2 (en) * 2000-05-05 2001-11-22 Lee Ruby B A method and system for performing permutations using permutation instructions based on modified omega and flip stages
US6922472B2 (en) 2000-05-05 2005-07-26 Teleputers, Llc Method and system for performing permutations using permutation instructions based on butterfly networks
US6952478B2 (en) 2000-05-05 2005-10-04 Teleputers, Llc Method and system for performing permutations using permutation instructions based on modified omega and flip stages
US20090138534A1 (en) * 2007-05-23 2009-05-28 The Trustees Of Princeton University Microprocessor Shifter Circuits Utilizing Butterfly and Inverse Butterfly Routing Circuits, and Control Circuits Therefor
US20090187746A1 (en) * 2008-01-22 2009-07-23 Arm Limited Apparatus and method for performing permutation operations on data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DIAS DANIEL: "Analysis and simulation of buffered Delta Networks", IEEE TRANSACTION C-30 VOL 20, 1 April 1981 (1981-04-01), pages 273 - 282, XP055044115, Retrieved from the Internet <URL:http://ieeexplore.ieee.org/ielx5/12/35202/01675775.pdf?tp=&arnumber=1675775&isnumber=35202> [retrieved on 20121113] *
HILEWITZ; LEE, FAST BIT COMPRESSION AND EXPANSION WITH PARALLEL EXTRACT AND PARALLEL DEPOSIT INSTRUCTIONS, 2006

Similar Documents

Publication Publication Date Title
JP7332300B2 (en) Techniques for Accelerated QUIC Packet Processing with Hardware Offload
US6922472B2 (en) Method and system for performing permutations using permutation instructions based on butterfly networks
US6952478B2 (en) Method and system for performing permutations using permutation instructions based on modified omega and flip stages
US7142669B2 (en) Circuit for generating hash values
JP4991223B2 (en) Data processing device
TWI333615B (en) Lookup table addressing system and method
US20090019269A1 (en) Methods and Apparatus for a Bit Rake Instruction
EP2235622A1 (en) Apparatus and method for performing permutation operations on data
EP2359234A2 (en) Method and apparatus to perform redundant array of independent disks (raid) operations
CN105103454A (en) Modular and scalable cyclic redundancy check computation circuit
TW201342220A (en) Instructions to perform Groestl hashing
WO2014071658A1 (en) Method and device for generating nested crc code used for data transmission error control
CN104704780A (en) Reduced complexity hashing
CN108183790B (en) AES encryption device, chip and system
US8775750B2 (en) Interleaver with parallel address queue arbitration dependent on which queues are empty
CN116266106A (en) Division and modulo operation
US9270460B2 (en) Instructions to perform JH cryptographic hashing in a 256 bit data path
WO2013006030A1 (en) Apparatus and method for performing parallel bits distribution with bi-delta network
US20140369499A1 (en) Cryptographic device, cryptographic processing method, and cryptographic processing program
WO2019024607A1 (en) Secure coding method, and decoding method and device
EP1766854A1 (en) Apparatus and method for performing md5 digesting
ES2653951T3 (en) Digital signal processor and method to address a memory in a digital signal processor
Purwita et al. Optimized 8-level turbo encoder algorithm and VLSI architecture for LTE
Cabral et al. Implementation of the SHA-3 family using AVX512 instructions
WO2013009162A1 (en) Method of providing signals to perform bit permutation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12756275

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12756275

Country of ref document: EP

Kind code of ref document: A1

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载