US9712185B2 - System and method for improved fractional binary to fractional residue converter and multipler - Google Patents
System and method for improved fractional binary to fractional residue converter and multipler Download PDFInfo
- Publication number
- US9712185B2 US9712185B2 US14/579,918 US201414579918A US9712185B2 US 9712185 B2 US9712185 B2 US 9712185B2 US 201414579918 A US201414579918 A US 201414579918A US 9712185 B2 US9712185 B2 US 9712185B2
- Authority
- US
- United States
- Prior art keywords
- digit
- value
- fractional
- residue
- modulus
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Links
Images
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/14—Conversion to or from non-weighted codes
- H03M7/18—Conversion to or from residue codes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G06F7/72—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G06F7/72—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
- G06F7/729—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic using representation by a residue number system
Definitions
- the invention relates to general purpose arithmetic logic units (ALUs), and in particular to an ALU utilizing a residue number system in performing arithmetic operations.
- ALUs general purpose arithmetic logic units
- the binary number system is the most widely used number system for implementing digital logic, arithmetic logic units (ALU) and central processing units (CPU).
- Binary based computers can be used to solve and process mathematical problems, where such calculations are performed in the binary number system.
- an enhanced binary arithmetic unit called a floating point unit, enhances the binary computers ability to solve mathematical problems of interest, and has become the standard for most arithmetic processing in science and industry.
- look-ahead carry circuits are generally dedicated to the ALU for which they are embedded, and are generally optimized for a given data width. This works fine as long as the CPU word size is adequate for the problems of interest. However, once a problem is presented which requires a larger data width, the CPU is no longer capable of using its native data and instruction formats for direct processing of the larger data width.
- FIG. 2 is a block diagram illustrating the second stage of a two stage apparatus for fractional binary to fractional residue conversion.
- FIG. 3 is a block diagram illustrating the first state of a two stage apparatus for example demonstration of a fractional binary to fractional residue converter.
- FIG. 4 is a block diagram illustrating the second state of a two stage apparatus for example demonstration of a fractional binary to fractional residue converter.
- FIG. 6A is a block diagram illustrating a fast residue multiply and normalization apparatus for use as a fractional residue multiplier apparatus.
- FIG. 6B is a block diagram illustrating a fast residue multiply and normalization apparatus for use as a fractional residue multiplier apparatus.
- FIG. 7 is a block diagram illustrating a modular arithmetic addition processing unit.
- FIG. 9 is a block diagram illustrating a modular arithmetic add and accumulate unit.
- FIG. 11 is a block diagram illustrating a modular arithmetic multiplier unit.
- FIG. 12 is a block diagram illustrating a modular arithmetic inverse modulus multiplier unit.
- FIG. 13 is a block diagram illustrating a residue to mixed radix converter unit.
- FIG. 14 is a block diagram illustrating a mixed radix digit, power term multiplier unit with power term source.
- FIG. 15 is an example fractional residue multiplication computation illustrating intermediate residue values and states.
- FIG. 16 is a flow control diagram illustrating the control steps of a fractional residue multiplier apparatus.
- FIG. 17 is a table pictorial illustrating the organization of a power term source.
- FIG. 18 is an example fractional residue multiplication computation illustrating intermediate residue values and states.
- FIG. 19 is an example fractional residue multiplication computation illustrating intermediate residue values and states.
- FIG. 20 is a block diagram of a fractional residue value separator unit.
- FIG. 21 is a block diagram of a fractional residue multiplier apparatus.
- FIG. 22 is a block diagram of a fractional portion only multiplier unit.
- FIG. 24 is an example fractional residue multiplication computation illustrating intermediate residue values and states.
- the residue number normalization unit also includes a digit power accumulator coupled to a result selector that receives and accumulate weighted digit power products, and a controller that transmits one or more commands to instruct one or more of the plurality of digit processing units to: convert one or more residue numbers having at least a fractional representation to a plurality of digits in mixed radix format; receive one or more mixed radix digits, each digit associated with a mixed radix power, and multiplying the digit value by the associated mixed radix power, thereby forming a weighted digit power product; and sum a selected portion of the plurality of weighted digit power products
- the result selector responsive to one or more controller commands, transmits a result from among one or more candidate normalization values.
- a residue converter configured to convert a fractional binary value to a fractional residue value
- such residue converter comprising an input register that receives a plurality of binary input digits, the plurality of binary input digits configured as a parallel binary input receiving a binary fractional value, a modulus shift register that stores a plurality of modulus values and to output each of the plurality of modulus values in sequence, and a first plurality of digit processing units that performs one or more arithmetic operations on a plurality of binary input digits and on a sequence of modulus values, and to generate a plurality of digit values and modulus values.
- Each digit processing unit comprises a modulus operand register that receives a modulus value from a prior processing stage, and configured to send a modulus value to a succeeding processing stage, an additive operand register that receives a digit value from a prior processing stage, a digit accumulator that stores a binary digit value, a multiplier that multiplies the contents of said digit accumulator with the contents of said modulus operand register generating an accumulator modulus product, and an adder that adds said accumulator modulus product with the contents of said additive operand register, and configured to send a least significant portion of the adder result to said digit accumulator, and a most significant portion of the result to a successive processing stage.
- a second plurality of digit processing units that receive digit values and modulus values from said first plurality of digit processing units, and to perform one or more modular arithmetic operations on a plurality of residue digits, each digit processing unit are also included.
- Each of the second plurality of digit processing units comprise a residue digit register that stores a residue digit, a modular multiplier that receives a modulus value from an input modulus bus and a value from said residue digit register, and is sends a residue product result, and a modular adder that receives said residue product result, and configured to receive a digit value from an input digit value bus.
- At least one said prior processing stage is said modulus shift register, and at least one said digit processing unit configured to send a digit value and a digit weight to a second plurality of digit processing units.
- the first plurality of digit processing units process numbers in binary format while the second plurality of digit processing units process digits in residue number format.
- the method also includes multiplying the first whole integer only portion with the second whole integer only portion forming a whole integer only product, scaling the whole integer only product by a fractional range value forming a scaled whole integer only product, truncating a first fractional only portion forming a truncated first fractional only portion, truncating a second fractional only portion forming a truncated second fractional only portion, multiplying the truncated first fractional only portion with the truncated second fractional only portion forming a truncated partial product, subtracting the truncated partial product from aforementioned intermediate fractional only product term forming an intermediate fractional product term evenly divisible by a fractional range, and multiplying the intermediate fractional product term that is evenly divisible by a fractional range with a multiplicative inverse of a fractional range forming a recovered fractional only portion.
- a final result is generated by summing a recovered fractional only portion, a first intermediate fraction and whole product term,
- FIG. 1 and FIG. 2 disclose a streamlined version of the conversion apparatus described in U.S. Patent Publication No. 2013/0311532, which is incorporated herein by reference.
- the streamlined apparatus requires less clock cycles for the conversion process, therefore, it is generally faster.
- FIG. 1 and FIG. 2 illustrate a J+K digit converter apparatus of the present invention by means of example.
- FIG. 1 discloses a first plurality of J binary digit stages
- FIG. 2 discloses a second plurality of K number of RNS digit stages.
- the converter is an enhanced version of the forward fractional converter of U.S. patent application Ser. No. 13/475,979.
- the apparatus accepts a fractional only binary value into the B_IN port shown at top of FIG. 1 .
- the B_IN port accepts binary data using a series of binary digit latches, B 0 _ IN 100 through B J ⁇ 1 _ IN 103 .
- the digit latches may have the same maximum bit width Q as the residue digits of FIG. 2 in one embodiment. These latches are for input holding only, and may be replaced by a bus in some embodiments.
- the binary value (B 0 ) through (B J ⁇ 1 ) input at B_IN is latched into conversion data registers A 0 150 through A J ⁇ 1 153 .
- modulus registers M 0 120 through M 3 123 are initialized with the value of one using hardware not shown for sake of clarity. This initialization ensures the values of digit registers A 0 through A J ⁇ 1 are not destroyed, but are preserved in the next following clock cycle.
- a plurality of modulus values is associated to an RNS fractional range.
- F number of modulus values are associated with the fractional range, such as modulus M F ⁇ 1 111 . All F number of modulus values are stored, or pre-stored, into a memory or shift register, shown by modulus shift register 110 .
- modulus shift register 110 Upon each clock of the conversion, the modulus values stored in modulus shift register 110 are sequentially shifted into modulus data register M 0 , where M 0 feeds M 1 , and so on and so forth through M J ⁇ 1 as shown.
- each modulus value serves as an operand to a specific digit multiplier of a digit processing block, such as multiplier 130 thru 133 , thereby multiplying the current state of the digit A 0 150 thru A J ⁇ 1 153 by a different modulus value.
- a carry value may be generated by each digit multiplier, illustrated by data path 170 for example.
- the carry value data width equals the data width of the binary input digit, so designated as Q in FIG. 1 .
- Each carry value propagates towards the left and is latched by the next digit processing unit in FIG. 1 .
- the high order bits of the last digit multiplier shown by example as multiplier 133 , is output by digit output bus 161 to the second plurality of RNS digit processing stages as shown in FIG. 2 .
- the modulus value shifted into the last modulus register stage 123 is output by bus 160 to the second plurality of RNS digit processing stages of the converter shown in FIG. 2 .
- residue digit processing units 220 , 221 , 222 , 223 multiply, in a modular fashion, the value of the modulus output value 160 by the current value of digit registers 260 , 261 , 262 , 263 respectively.
- Each multiplier result is summed, in a modular fashion, to the value on the digit out bus 161 and stored back into the digit register 260 , 261 , 262 , 263 respectively.
- the adders 250 , 251 , 252 , 253 and multipliers 240 , 241 , 242 , 243 perform modular arithmetic, and support a multiply, add and accumulate operation for a specific and unique pair-wise prime digit modulus respectively.
- digit function blocks such as digit function blocks 220 , 221 , 222 , 223 , act in unison and perform a similar operation, but with a different modulus value.
- the last modulus and carry value exit the apparatus of FIG. 1 via modulus out data path 160 and digit out data path 161 , they are processed by the plurality of RNS digit function blocks of FIG. 2 , and the final result is stored in digit accumulator registers 260 , 261 , 262 , 263 .
- a final converted result, in residue fractional format may be transferred out of the converter using digit buses 270 , 271 , 272 , 273 .
- the residue multiply and add operation performs the same arithmetic calculations as the later stages of the fractional converter of FIG. 20D of U.S. patent application Ser. No. 13/475,979.
- the high order (leftmost) digit function blocks are replaced by a series of residue stages as shown in the present disclosure of FIG. 2 . It is noted the first stages of both converters remain much the same.
- the second stage as shown in FIG. 2 will typically terminate one or two clocks later.
- the second plurality if RNS digit processing units contains the scaled fractional value in residue format. This is in contrast to the invention of FIG. 20D of U.S. patent application Ser. No. 13/475,979 where the second plurality of digit processing units is in binary format, and therefore requires additional clock cycles to further complete conversion into a residue fractional format.
- FIG. 1 also discloses apparatus for detecting a round-up of the converted result. Rounding is an important feature, and is typically a mandatory feature in order to provide a converted result with the most accuracy.
- the values of binary register digits A 0 150 through A J ⁇ 1 153 are valid.
- the value contained in the binary registers is a remainder of the conversion, and may be compared with a round-up threshold value 155 using comparator 156 . In one embodiment, if the value equals or exceeds the threshold, the converted value contained in registers D 0 273 thru D K ⁇ 1 is rounded up by incrementing the converted value by one unit.
- each digit register 260 , 261 , 262 , 263 of FIG. 2 is incremented by one in a modular fashion.
- only the most significant binary bit of the most significant binary digit i.e., A3 153 , is used to determine if a round-up is required. If a round-up is required, such indication may be transmitted by carry out bus 162 to a controller, not shown, responsible for performing the increment operation. It should be clear to those skilled in the art of digital design that alternate types of rounding and/or alternate threshold values may be used.
- the integer part may be separated from its fractional part before conversion begins in one embodiment. This is beneficial, since fractional conversion differs from integer conversion, and performing both conversion processes in separate and in parallel saves time.
- integer and fractional conversion are performed in parallel. After conversion, the integer and fractional values are added together when in residue format. Conversion of integer and fractional values in parallel generally improves throughput of real-world conversions. This teaching is disclosed in U.S. Patent Publication No. 2013/0311532 and is not dealt with in any more detail herein.
- FIG. 3 is derived from FIG. 1 and shows four distinct binary digit processing stages or units
- FIG. 4 is derived from FIG. 2 and shows four distinct RNS digit processing stages or units. Reference designators from FIG. 1 and FIG. 2 have been preserved in FIG. 3 and FIG. 4 for clarity.
- a timing diagram illustrating the various data flows of the example conversion apparatus of FIG. 3 and FIG. 4 is disclosed in FIG. 5 .
- the reader will note the actual numerical values used in the conversion apparatus example. These example values coincide with the example values provided in U.S. Patent Publication No. 2013/0311532 for clarity of disclosure and comparison of the methods.
- the number of digit processing stages of the first plurality of binary digit processing units need not equal the number of digit processing stages of the second plurality of RNS digit processing units.
- the number of associated fractional modulus contained in modulus shift register 110 of FIG. 1 need not match either the number of binary digit processing units or the number of RNS digit processing units.
- all of these numbers equal four by means of example only.
- each major storage register is denoted under the first column, i.e., the column labeled “register”.
- the modulus registers M 0 120 thru M 3 123 are listed as the first four registers in the waveform diagram of FIG. 5 in the rows 511 , 512 , 513 , 514 .
- Also shown in the waveform diagram of FIG. 5 is ten columns denoting each clock cycle of the conversion process, labeled cycle 0 thru cycle 8 .
- traversing the waveform diagram from left to right indicates the register values for each storage element on a cycle by cycle basis.
- digit registers A 0 150 thru A 3 153 of FIG. 3 When viewing the contents of a plurality of registers at any given instant, for example, digit registers A 0 150 thru A 3 153 of FIG. 3 , one may inspect the values in one column of the waveform diagram of row 515 , 517 , 519 , 521 respectively.
- the register values contained in A 0 , A 1 , A 2 and A 3 are all seen to be hex digit value five (0 ⁇ 5) in FIG. 5 . This corresponds to the loading of the fractional binary value 0 ⁇ 5555 at the start of conversion per the example values provided.
- the modulus registers M 0 , M 1 , M 2 and M 3 are seen to be initialized to the value of one at cycle 0 which corresponds to the initialization of all modulus registers with the value of one prior to conversion processing.
- rows 515 , 515 , 521 show the result of the first clock of the conversion process, which illustrates that digit register A 0 has transitioned from the hex value of 0 ⁇ 5 to the hex value 0 ⁇ A.
- rows 517 , 519 , 521 show the digit registers A 1 , A 2 and A 3 remain unchanged.
- rows 516 , 518 , 520 illustrate the storage or transmission of carry values from previous digit processing stages of FIG. 3 for each clock cycle.
- modulus register M 3 123 of FIG. 3 is shown in FIG. 5 , row 514 , and this value therefore drives the Modulus out bus 160 , which is connected to the Modulus crossbar 160 of FIG. 4 .
- rows 522 , 523 , 524 , 525 are the states of the four RNS digit registers D 0 , D 1 , D 2 and D 3 respectively.
- the RNS registers are initialized to zero as shown in the cycle 0 column of the waveform diagram of FIG. 5 .
- the digit out bus 161 will output a value of zero, and the modulus out bus 160 will output the value of two. Therefore, in the next clock cycle, all four RNS digit processing units of FIG. 4 will process the two values on each of the crossbar buses.
- each RNS digit register will multiply its zero value by three (3), and add a value of one. This results in a value of one stored in each RNS digit processing unit D 0 thru D 3 of FIG. 4 .
- the digit out bus will receive the value of four, and the modulus out bus will receive the value of five (5). Therefore in cycle 7 column, the value of one stored in each RNS digit processing unit will be multiplied by five and this result will be added to four and stored back into each RNS digit register D 0 thru D 3 .
- the RNS value contained in registers D 0 thru D 3 is the value nine (9) which is represented by the RNS value (1,0,4,2) using the example modulus of (2,3,5,7), since each RNS digit processing unit is modulo its pair-wise prime modulus.
- cycle 7 the next digit out bus value is shown to be six (6), and the next modulus out bus value is shown to be seven (7). Again, multiplying nine by seven results in a value of 63, and adding the value of 6 results in an RNS register value of 69. This result is shown in cycle 8 of FIG. 5 and is represented as the RNS value (1,0,4,6) shown enclosed in the dotted ellipse 500 .
- the modulus register value is one, and therefore, the conversion process is over with the exception of a possible round-up correction.
- the remainder value of the conversion is contained in binary digit registers A 0 thru A 3 , and this value is 0 ⁇ FFBA. Since this value is greater than half the range of a normalized unit, i.e. 0 ⁇ 8000, then the final RNS value stored in registers D 0 thru D 3 is incremented by one using a modulo add.
- the modulo increment may be performed by sending a value of one to the Modulus crossbar (which it is be default), and by sending the TRUE output of the comparator 157 of FIG. 3 , interpreted as the value one, to the digit crossbar via selector unit 280 of FIG. 4 .
- the resulting RNS fractional value is exactly the fractional value of one third, since 70/210 is exactly interpreted as the correctly converted value of the approximated one third value represented in a binary fractional format ( 0 ⁇ 5555/0 ⁇ 10000). This ends the example section of the improved converter of the present invention.
- fractional multiplier provided for a single clock per RNS digit plus some additional clocks to convert the mixed radix intermediate result back to residue format for continued processing.
- FIGS. 15A, 15B and 15C provide flowcharts for several novel RNS fractional value multipliers.
- a suitable fractional RNS format is devised, and several unique apparatus are disclosed which perform a multiply of two fractional RNS values.
- an advanced multiplier is disclosed which performs multiplication of signed fractional values.
- the present invention discloses a method which performs the operations disclosed in the flowchart 15C, but in a more efficient manner in terms of clock cycles.
- the steps of conversion of mixed radix back to residue format 1530 a and 1530 b is performed in parallel with the process of converting the original product from residue to mixed radix 1525 a and 1525 b.
- each mixed radix digit represents a weighted value.
- the value represented by each mixed radix digit may be converted to a residue value.
- One may therefore convert each mixed radix digit to a residue value, and add this value to a residue summation of all other converted digits.
- the final residue summation will be the truncated mixed radix value in residue format, provided the mixed radix digits associated with the fractional digits are discarded, i.e., not added to the residue summation. Therefore, using a new and unique apparatus of the present invention, during the conversion cycle of a mixed radix digit, its value may be converted and summed in residue format in the same or next clock cycle.
- the unique new RNS fractional multiplier of the present invention can process the operation in P clocks for a P digit fractional format.
- This P digit format includes all extended digits required to contain the extended, or non-normalized, fractional value.
- FIG. 6A One unique and preferred embodiment for a scalar residue multiplier is shown in FIG. 6A . It is noted that a scalar multiplier is specifically not a pipelined multiplier unit. A fully pipelined variation of the new multiplier is also a claimed invention, but its design details are not disclosed in this application; however, such a variation should be evident to those skilled in the art of digital design.
- the new RNS multiplication starts with a non-normalized product, denoted as block 600 in FIG. 6A . This is similar to FIG. 15 c of U.S. Patent Publication No.
- Valid sign flags or sign extended operands are not needed for either or both operands at start; instead, the new multiplier will produce a result that is signed and flagged as sign valid. In other words, the new multiplier produces a fully sign extended, normalized result regardless of whether the operands themselves have been sign extended. This is accomplished using two values computed in parallel, i.e., the original intermediate product value and the complement of the original intermediate product value. This complement operation is performed on the non-normalized product 600 by complement unit 602 . The complement is therefore loaded for processing into the residue to mixed radix converter 620 . The original (non-normalized) value 600 is loaded directly into its own residue to mixed radix converter 610 .
- Residue digit logic is the logic used to employ the basic modulo digit operations required within a residue ALU, such as the residue ALU disclosed in U.S. Patent Publication No. 2013/0311532.
- the residue digit logic apparatus and their associated operations are also used in the design of residue ALU sub-units, such as multipliers, adders, and converters.
- each digit has its own modulus p pre-assigned.
- modulus p When adding two arbitrary numbers, digits of the same modulus are added and a MOD function modulo p is applied to the result.
- the mod function is commonly denoted as % in C programming language.
- a legal digit value includes zero and any positive value up to, but not including, the pre-assigned modulus p.
- the single range modulo adder is composed of a conventional binary adder, a comparator and a subtract unit to implement the modular add function. This type of circuit is conventional to those skilled in the art, and is shown in FIG. 7 .
- residue digit values, A and B are summed using a binary add unit 700 .
- the result of the binary add is then transferred to one input of a binary subtract unit 710 , and also the result is transferred to one input of a binary comparator 730 .
- the circuit is so configured that if the result of binary addition 700 is less than the digit modulus p, the modulus p designated as M in FIG. 7 , the result is sent directly as a result through bus selector 720 since the output of comparator 730 is TRUE, or 1.
- the bus selector selects the output of the subtract unit 710 , which decreases the binary sum by the amount of the modulus M.
- the correct modular sum result appears on output R of bus selector 720 in either case.
- single range modulo subtract unit is very similar to that of modulo addition, and is also conventionally known to those skilled in the art.
- simplification results from the fact that a single range subtraction is bounded to less than twice the digit range.
- a single subtract unit, a single add unit and a single comparator can be used.
- FIG. 8 a novel yet minor improvement uses borrow signal 830 from a binary subtract unit 800 to replace the comparator of FIG. 7 . If the result of the subtract unit 800 is positive, borrow signal 830 is false, and bus selector 820 passes the subtract unit 800 result directly to result R.
- Bus selector 820 passes the output of add unit 810 since the borrow signal 830 is true. Again, in either case, the correct modular digit subtract result appears as a result.
- the same logic reduction used to implement the modular digit subtract unit of FIG. 8 can be adapted to a single range modulo digit add unit of FIG. 7 by introducing a borrow signal that steers the correct value to the output.
- FIG. 9 where a single range modular digit add unit 900 is depicted.
- the modular digit add unit 900 of FIG. 9 also supports a registered result 930 whose output 940 is fed back to one of the operand inputs of the modular add unit 900 .
- This modification to the modular add unit 900 is termed a modular digit add and accumulate unit.
- Residue digit functions such as add and subtract are more complex if the range of the digit operands exceed the legal digit range. This can occur if the digit processing unit must accept an operand from another digit modulus, for example.
- the relative value of digit modulus with respect to each other is an important design parameter for the residue ALU. For example, if no modulus is more than twice as large as any other modulus, then digit subtract circuits may be designed to handle this extended digit range of input operands.
- the plurality of modulus p of an ALU varies widely, say by a factor of 4 or 5 times, then more complex circuits are needed.
- FIG. 10 shows a digit adder circuit which supports a larger range. The operation of the extended range add or subtract unit is not discussed in detail and should be obvious to those skilled in the art.
- Residue digit multiplication is another important requirement of the residue ALU.
- a LUT memory is used. If a LUT memory is used in a brute force fashion, very large memories are required if the digit modulus is large, since up to M*M table entries may be needed. Therefore, there is a need to reduce the size of LUT memory for implementing modular digit functions such as digit multiplication.
- FIG. 11 shows a novel implementation of a residue digit multiplier which significantly reduces the amount of LUT memory needed.
- the multiplier unit uses a conventional binary multiplier 1100 to multiply two n-bit operands A and B.
- the product output of multiplier 1100 is 2*n bits wide.
- the low order n bits of the product output 1102 are routed to LUT 1112
- the high order n bits of the product result 1104 are routed to LUT 1110 .
- the effect of the LUT 1112 and LUT 1110 is to provide two unsigned integers whose sum MOD M is the correct modular multiplication result R.
- Each LUT output is bounded by modulus M, so the sum of each output may not exceed the modulus M by two times; therefore, the basic modular digit adder of FIG. 7 may be used to provide the correct result.
- FIG. 11 Similar to FIG. 7 is the solution in FIG. 11 , which shows a binary adder 1120 , comparator 1130 , subtract unit 1140 and bus selector 1150 configured to provide the modular addition of the results of each LUT 1110 and 1112 .
- the result R appears at the output of bus selector 1150 .
- Another important modular digit operation is inverse multiplication, also referred to as MODDIV in U.S. Patent Publication No. 2013/0311532. As disclosed in that application, this operation may be implemented in a brute force manner using a LUT memory. However, this application also discloses a unique and novel method for implementing a smaller LUT which encodes a multiplicative inverse of a given modulus p 0 with respect to another digit modulus p 1 .
- a LUT table therefore includes N ⁇ 1 number of multiplicative inverses for the given p, which provides a means to process a multiplicative inverse digit operation with respect to any other given digit modulus for an ALU having N distinct pair-wise prime digit modulus.
- the primary requirement is that for every digit modulus, a multiplicative inverse exists for each of the other supported digit modulus.
- a digit modulus does not require an inverse for itself, since this digit is undefined after digit division by its own modulus.
- the modular digit multiplier of FIG. 11 is shown with the addition of LUT 1200 .
- the LUT 1200 contains the multiplicative inverse of the given digit modulus p with respect each other digit modulus of the ALU or ALU unit.
- the input 1205 to the look-up table (LUT) 1200 is labeled ⁇ Div_modî, which is an index to the specific multiplicative inverse (of p) associated with, and with respect to, the indexed modulus.
- the output of LUT 1200 i.e., the multiplicative inverse ofp with respect to the selected modulus, “Div_mod”, is then multiplied by a digit value input via bus 1210 using multiplier 1100 .
- the remainder the modular multiply apparatus remains identical to the modular digit multiplier of FIG. 11 .
- a residue ALU having 32 pair-wise prime digit modulus may support a 32 entry LUT 1200 to store 31 multiplicative inverses, each inverse associated with each of the other 31 modulus. Therefore, inverse modular multiplication can be used for dividing a residue value by any of the ALU digit modulus, by first ensuring the overall residue value is evenly divisible by the selected modulus, then by multiplying each digit of the ALU by its multiplicative inverse of the selected modulus specified by the “Div_mod” index input 1205 .
- each LUT 1205 for each inverse digit multiplier will be organized such that the Div_mod input 1205 is common to all inverse digit multiplier processing units.
- residue to mixed radix conversion unit 610 uses successive digit modulus divide operations to reduce the residue value into a series of mixed radix digits. This conversion process is well known, and is described in Tanaka as well as U.S. Patent Publication No. 2013/0311532.
- the digit value at the particular modulus is subtracted from the entire residue value (the entire residue value is all residue digits interpreted as a single residue value).
- the subtracted digit is also a mixed radix digit that is output using data path 612 and data path 622 in FIG. 6A .
- the particular modulus is defined as skipped, or ignored.
- the specific digit modulus is undefined until a base extension is performed.
- one or more digit modulus divides is performed on the residue value until it is reduced to zero.
- the digit at that specific modulus becomes a mixed radix digit. All mixed radix digits generated complete an equivalent mixed radix number.
- the mixed radix number format is unique to a given sequence of digit modulus divides. Digit divides are performed using a subtract, then a multiply of the residue value by the inverse of the modulus divisor.
- the residue to mixed radix conversion units 610 , 620 of FIG. 6A may support P number of digits, by means of example and as shown in further detail using FIG. 13 .
- the residue to mixed radix converter apparatus of FIG. 13 is similar to that disclosed in U.S. Patent Publication No. 2013/0311532. Control logic and circuitry is not shown for simplicity.
- a plurality of residue digit latches 1300 store an initial starting residue value having P number of residue digits. This starting value is loaded into the converter through a plurality of bus selectors, such as bus selectors 1355 , 1356 & 1357 as shown in FIG. 13 .
- the bus selectors send the initial residue value through P number of digit subtract units, such as digit subtract unit 1330 , 1331 , 1332 & 1333 .
- These digit subtract units may have circuitry similar to FIG. 8 in some embodiments, where each subtract unit contains a unique modulus value M.
- an initial subtract operation by the value of the first digit processed is performed during the load operation.
- the first digit sends its value onto the digit subtract crossbar 1370 through tri-state buffer 1320 upon loading the initial start value.
- Each digit subtraction unit therefore applies the same digit value as a subtrahend operand via common crossbar 1070 .
- the result of subtracting the digit value from the load value is then sent to the inverse multiplier unit for each digit processing unit 1310 .
- the inverse multiplier unit taken as a whole, is shown as a plurality of digit function blocks, such as function blocks 1340 , 1341 , 1342 , & 1343 depicted in FIG. 13 .
- a typical inverse multiplier function block 1341 has a circuit similar to that of FIG. 12 in some embodiments.
- the inverse multiplier also supports a zero/skip output, which goes true if the digit is zero, or if the digit is marked skipped.
- the plurality of zero/skip outputs may be logically combined using AND gate 1381 in a basic embodiment.
- the output of AND gate 1380 provides a ⁇ doneî output 1381 which is routed back to the controller not shown.
- the first inverse multiply function block 1340 is not included, since the first digit need not be divided by itself, and is undefined after subtraction at initial load.
- the inverse multiplier of each digit processing unit multiplies the input digit by 1, thereby loading the initial residue value subtracted by the first digit into the converter inverse multiplier 1315 which typically supports a registered store for each product result.
- the counter 1375 indexes each digit LUT 1200 of FIG. 12 , each digit LUT 1200 producing a multiplicative inverse for the selected modulus.
- the inverse multiplier 1315 divides the residue value by the selected modulus.
- the resulting product value is generally stored and made available to the subtract units for the next conversion cycle using a digit feedback path, such as feedback path 1351 and selector 1355 .
- the associated digit of the specified divisor modulus is no longer valid, and is marked as skipped.
- a control unit selects each digit and associated modulus in succession for subtraction and then subsequent division, in coordination with the modulus counter 1375 which indexes the divisor modulus.
- the conversion terminates. The conversion may also terminate when all digits are processed, i.e., skipped.
- the residue digit subtracted may be output via digit crossbar 1370 in some embodiments. Therefore, digit crossbar 1370 outputs a mixed radix digit on each cycle of conversion.
- This crossbar output is shown on converter unit 610 and 620 as output bus 612 and 622 respectively in FIG. 6A .
- the zero flag is also shown coming from each converter unit 610 and 620 of FIG. 3 . The zero flag is made available to control logic to determine end of conversion. If a value during mixed radix conversion terminates before its complement, that values data path is used as the final output of bus selector 655 of FIG. 6A .
- FIG. 6A illustrates a control block 615 providing a detection of both sign and round-up of the final result through the use of a successive digit comparison mechanism not shown. All digit comparison is performed least significant digit first due to the nature of mixed radix conversion.
- the residue multiplier of FIG. 6A illustrates the function of the mixed radix conversion unit 610 .
- the output of residue to mixed radix conversion unit 610 is connected to a special power term multiplier unit 630 .
- the power term multiplier 630 takes a mixed radix digit output from the converter 610 , and multiplies the digit by its associated power, and then sums the resulting product to a running product summation contained in multiplier unit 630 . These calculations are performed in residue format.
- the source of the ⁇ baseî power terms is provided by a power terms source units 605 , 625 .
- the power term units 605 , 625 may be the same in some embodiments.
- the power term source 605 may consist of values pre-stored in a LUT or FIFO, or the power terms source 605 may be generated in real time by a residue arithmetic circuit, or other source which can provide the associated mixed radix digit power value in residue format.
- FIG. 14 also includes a power terms source 605 , which sources a power term value to the power term multiplier 630 of FIG. 6A on each multiply cycle.
- a power term constant such as power term constant 1440
- storage registers 1406 , 1407 , 1408 , 1409 are stored in a plurality of residue digit storage registers, such as storage registers 1406 , 1407 , 1408 , 1409 , is routed to one input of a residue multiplier consisting of a plurality of modular digit multipliers, such as digit multipliers 1410 , 1411 , 1412 , 1413 .
- the other input to the multiplier is sourced from the mixed radix digit value input via input 1140 which feeds the mixed radix digit crossbar 1145 .
- This input is the connected to the mixed radix digit output 1370 of FIG. 13 . Therefore, on each clock cycle, an associated power term constant is multiplied by its respective mixed radix digit, thereby creating a weighted value output from the digit multipliers.
- the power term constants 605 may be pre-stored using a LUT memory, or a FIFO shift memory, for example; alternatively, an arithmetic circuit may sequentially generate the required power term constants using a smaller table of stored modulus values. Generating modulus values using an arithmetic circuit has the added flexibility of producing power term constants with arbitrarily skipped modulus. This approach allows the multiply unit of FIG. 6A to normalize products with skipped digits, or to multiply operands with arbitrary or dynamically selected digit modulus.
- the product of the mixed radix digit and its respective power term constant is then summed by a residue adder and accumulator register 1430 , shown as a plurality of digit add and accumulate units 1420 , 1421 , 1422 , 1423 in FIG. 14 .
- the digit add unit 1430 includes an accumulator function which state may be cleared prior to mixed radix conversion; FIG. 9 shows a more detailed diagram of a modular digit add and accumulator unit, such as add and accumulate unit 1420 of FIG. 14 .
- the digit adder 1420 may clear its accumulator register 930 as shown in FIG. 9 , and will therefore sum the first weighted value generated by the digit multiply units with zero.
- each weighted product term generated will be summed to the running total contained in the accumulator 1430 . This process continues for all digits in parallel until the conversion is complete.
- the modular add circuitry of FIG. 9 uses conventional binary adder 905 and binary subtract 910 and a bus selector 920 which steers the correct value based on the state of the borrow out signal 912 of the subtract unit 910 as explained previously.
- fractional multiplier For the proper operation of the fractional multiplier of FIG. 6A , the sum of products representing the converted final result does not include any direct contribution from mixed radix digits associated with the fractional range.
- fractional digits are the first F number of digits converted to mixed radix in one embodiment.
- fractional digits are included in the mixed radix conversion process. In fact, it is important that during conversion of the intermediate product 600 and the complement of the intermediate product 602 , that the digit modulus associated with the fractional residue digits are converted first. This provides the needed division by the fractional rage, which is associated to the product of all fractional digit modulus.
- a suitable controller contains the code and control flow to ensure that the first F digits converted by the mixed radix converter 610 are not processed by the digit multiplier 630 . Therefore, in some embodiments, power term constants (weights) for each of the digits associated to the fractional range, such as constant 1440 of FIG. 14 , are not stored in the power terms source unit 605 . Further concepts relating to the matter of skipping, and/or truncating mixed radix digits associated with the fractional range are disclosed in more detail in U.S. Patent Publication No. 2013/0311532.
- the power term multiplier 630 outputs the summation result to the round-up unit 635 in FIG. 6A .
- the round-up unit 635 may exist as a separate unit as shown, or may be combined into the power term multiplier unit 630 . (For example, during the initial clearing cycle, the accumulator may be set to one if rounding is required.) For this example, a separate round-up unit is shown separately in FIG. 6A .
- the round up unit 635 consists of a plurality of modular digit adders, with one adder input tied to a selector which may select either one or zero.
- the other adder input is the summation value input from the power term multiplier 630 .
- a single unit value is added to the final summation output of the residue add and accumulate unit 630 , otherwise, a value of zero is added and the result is output.
- mixed radix digits are sent to the sign and round up detect unit 615 as shown in FIG. 6A .
- This unit is capable of comparing, digit by digit, whether the mixed radix value represented by the first F digits is greater than or equal to a pre-determined round-up threshold amount, i.e., such as 50% of the fractional range magnitude. If the final converted value requires rounding, a signal is sent from the round up detect unit 615 to the round up unit 635 to perform a round up on the summation value.
- the round-up unit 635 outputs the input value unaltered.
- the value is then routed to bus selector unit 655 of FIG. 6A .
- Bus selector 655 will route the value to the final answer output 660 if the starting value 600 is positive, and hence smaller in magnitude than its complement value processed by mixed radix converter 620 .
- the final fractional multiplier result 660 is therefore a normalized positive fractional final product value.
- the mixed radix converter 620 accepts the complement of the starting value 600 using complement unit 602 .
- the mixed radix converter 610 executes in tandem with the mixed radix converter 620 . Both converters generate a mixed radix digit of the same significance (digit position) on each clock cycle. Since the values processed are complements, one value is larger than the other in magnitude.
- the sign and round up detect circuit 615 may detect which value converted is the smaller value in magnitude. If the smaller value is the complemented value 602 , this means the starting intermediate product was negative, and the value processed by converter 620 is therefore used.
- converter 620 sends mixed radix digits to a power term multiplier unit 640 , which multiplies the mixed radix digit value by its power term constant, or weight.
- the power term constants are stored in a power terms source 625 .
- Each non-fractional digit processed adds another product sum to a running summation contained in the mixed radix digit, power term multiplier unit 640 as discussed previously.
- the output may be rounded up by round up unit 645 .
- the value is then complemented by complement unit 650 , since the value processed is positive, and therefore must be re-converted to a negative quantity for a final result.
- the sign flags may be updated to reflect the proper sign, and the sign valid flags set true. Sign flags and sign valid flags are discussed in U.S. Patent Publication No. 2013/0311532.
- the sign and round-up detect unit 615 determines if the original value or its complement is smaller in magnitude; it performs this comparison least significant digit first during conversion to mixed radix format.
- the sign and round-up detect unit 615 compares each mixed radix digit, digit by digit, during the synchronized conversion process. If a value under conversion terminates before another, that value is deemed smaller in magnitude. Also, if both converters terminate in the same cycle, the sign detect unit 615 can determine, based on the value of digits compared, which value is smaller in magnitude. Once detected, the sign detect unit 615 sends a signal to the select input of the bus selector 655 . In this embodiment, the fractional multiplier of FIG. 6A routes the correct answer to the output 660 , and furthermore, generates a valid sign flag for the resultant value.
- the residue to mixed radix conversion units 610 , 620 processes digit modulus from smallest to largest. This allows that every digit function unit only process digits within their specific legal range. For example, during subtraction of the digit value of the first modulus, its range is smaller than all other digit modulus; during subsequent processing of each successive digit, the same holds true. Therefore, the digit subtract unit of each digit processing unit of the residue to mixed radix converter, such as digit processing unit 1310 of FIG. 13 , need only support a single range modular subtract unit 1333 , using circuitry similar to that shown in FIG. 8 .
- FIG. 6B is presented.
- the identical apparatus as FIG. 6A is shown with several minor modifications.
- FIG. 6A two identical power terms sources 605 , 625 are illustrated.
- the power terms source 625 b associated with the processing of the complement of the intermediate product 602 is modified to store a plurality of negative power terms, or pre-complemented power terms.
- the power terms source 605 b of FIG. 6B contains the original positive power terms, while the power terms source 625 b contains the complement of the original power terms, which is also referred to as negative power terms source 625 b .
- the round-up unit 645 b is actually a decrement operation, since adding a negative one is subtracting by one.
- the mixed radix digit, power term multiplier 640 multiplies a positive mixed radix digit with a negative power term value.
- This arrangement provides an apparatus where the mixed radix digit, power term multiplier 640 sums only negative weighted digit values. Therefore, the final resulting product sum of the power term multiplier 640 is already complemented, and there is no need for an extra complement step, such as complement step 650 of FIG. 6A .
- This arrangement saves a complement unit and a single clock cycle for the processing of negative fractional products. It also helps to make the fractional multiplier of FIG. 6B symmetrical in terms of timing regardless of the sign of the final fractional product result.
- FIG. 15G is re-labeled FIG. 15 in the present disclosure.
- the sample fractional value calculation is summarized within the dotted rectangle 1551 of FIG. 15.
- a thorough description of the fractional format used in the present invention can be found in U.S. Patent Publication No. 2013/0311532.
- FIG. 16 a control unit flow diagram is provided to illustrate the processing steps taken by the fractional multiplier of the present invention.
- the control flow will emphasize the apparatus of FIG. 6A over that of FIG. 6B for means of clarity.
- the present invention does not disclose details of the control circuitry required to operate and sequence the apparatus of FIG. 6A , however, such control circuitry is well known to those skilled in the art.
- the flow diagram of FIG. 16 is also used to help describe the operation of the fractional multiplier of the present fractional multiply example which follows.
- FIG. 17 is provided to clarify the contents and formation of power terms contained within the power terms sources 605 , 625 of FIG. 6A . Additionally, the pictorial table of FIG. 17 helps clarify various associations of the structure and operation of the power terms source with respect to the operation of the fractional multiplier of the present invention and example. In the present example, the power terms sources 605 , 625 are assumed to implemented using a memory LUT.
- FIG. 18 and FIG. 19 are provided to illustrate the intermediate residue values of the example fractional multiply calculation in a digit by digit fashion. Associated with the residue intermediate values in FIG. 18 and FIG. 19 are the specific control steps of the flow control sequence of FIG. 16 .
- the example provided will focus on the multiplication of two positive fractional values in RNS fractional format. Multiplication of negative values use the opposite data path of FIG. 6A , and is similar to the flow described for positive results with the exception of two complement units 602 , 650 .
- the intermediate product (IP) is loaded into the mixed radix converter 610 in control step 1601 .
- This step is also shown in FIG. 18 , row 1808 which shows the actual starting intermediate product residue value.
- the intermediate product is the integer product of the two fractional operands A and B which is shown in FIG. 15 .
- FIG. 16 may be more concisely described as a normalization flow control sequence, which is a more general procedure, and may be used to process an intermediate product sum value, for example.
- operand A is the value 8.0625 as is shown in row 1556 of FIG. 15
- operand B is the value 3.25 as shown in row 1557 of FIG. 15
- the integer product of these operands is shown in row 1558 of FIG. 15 , and is also repeated in FIG. 18 , row 1808 .
- the digit modulus are shown in opposite order in FIG. 18 versus FIG. 15 .
- additional initialization takes place; the modulus index, [i], is cleared, and the accumulator section of each mixed radix digit, power term multiplier 630 , 640 of FIG. 6A is cleared.
- the rounding comparator initial state flag “cFlag” is assigned as ‘equal’ initially.
- the modulus associated with the fractional residue digits, M 0 thru M 6 , and hence the fractional range, R F are the first modulus to be divided out by the residue to mixed radix converters 610 , 620 . Therefore, the modulus associated with the fractional digits are numbered with the index [i] starting with zero.
- step 1603 of FIG. 16 a determination is made to whether either mixed radix converter contains a zero value. If not, the flow control proceeds to step 1604 which conveys the fact that the indexed modulus digit is stored, or at least transmitted to the mixed radix digit, power term multiplier 630 of FIG. 6A . This digit is in effect the generated mixed radix digit, d[i], output via bus 612 , 622 of FIG. 6A . In the next control step 1606 , the generated digit d[i] is subtracted from the entire RNS word contained in the residue to mixed radix converter 610 , 622 of FIG. 6A .
- control step 1606 of FIG. 16 a comparison or check is made to determine if the index is within the fractional digit range, and if so, flow control proceeds to control step 1609 , where the generated digit d[i] is compared against a rounding constant in mixed radix format. If the comparison is not equal, the state of the cFlag is updated to reflect the result of the digit comparison.
- the index [i] is incremented to anticipate the access and processing of the next modulus and digit position of the residue number under decomposition within the residue to mixed radix converter 610 , 620 of FIG. 6A .
- These aforementioned steps produce a new residue value in the residue to mixed radix converter, and this resulting value is shown in FIG. 18 , row 1809 by means of our example. Note that in FIG. 18 , row 1809 , the M 0 modulus digit position is shown with an asterisk, since the first modulus M 0 and its associated digit is now undefined, hence it is skipped as is shown in control step 1612 of FIG. 16 .
- the process above is quite fast, and may be accomplished in one or two clock cycles in many embodiments.
- the above process is repeated for each and every modulus associated with the fractional range unless the residue value terminates in a zero value before that time. If so, unless the converted value is originally zero, or greater than the rounding constant, the final result will be an underflow. This type of result may generate an exception, or error, and is not shown in the control sequence of FIG. 16 for sake of clarity.
- the afore-mentioned process forms a loop, which is repeated for each additional fractional modulus M 1 thru M 6 of our example. In our example, this is illustrated in FIG. 18 , rows 1810 thru 1815 .
- control is passed to control step 1603 , 1604 , 1605 of FIG. 16 , and then on to control step 1607 .
- This sequence of steps begins the conversion of residue digits not associated to the fractional range 1801 of FIG. 18 .
- control step 1607 the generated mixed radix digit d[i] is not associated with the fractional range, and is therefore sent to the mixed radix digit, power term multiplier 630 , 640 of FIG. 6A via data paths 612 , 622 respectively.
- control step 1608 the generated mixed radix digit d[i] is multiplied by its respective power term, or digit weight. For the very first non-fractional digit, this weight is equal to the value one, as shown in FIG. 1700 , row 1711 .
- the product of the digit and the value one is of course the value of the digit itself, and this product is summed to the value contained in the add and accumulate unit 1430 of FIG. 14 , which was cleared in control step 1602 of FIG. 16 .
- control is again passed to control step 1611 and control step 1612 to increment the modulus index, and to skip the digit just processed.
- This process flow forms a loop which results in a residue value stored in the accumulator section 1430 of the mixed radix digit, power term multiplier of FIG. 14 .
- the proper power term is always indexed within the power term source 1440 in this process. This index may be the same index [i] in some embodiments.
- the above process forms a control loop which processes the first non-fractional residue digit d[7] in our example, and the relevant values generated during this process are shown in FIG. 18 , rows 1816 thru 1819 .
- the control loop for non-fractional digits repeats until one of the residue values contained in either mixed radix converter unit 610 , 620 goes to zero.
- there are four additional digits generated that are processed using the same control loop procedure, except with different power term values as shown in FIG. 19 , rows 1903 thru 1918 .
- the accumulator of the mixed radix digit, power term multiplier 630 contains the value 13376953 as shown FIG. 19 , row 1919 .
- the cFlag of control step 1609 of FIG. 16 is set to “greater than” (>) in FIG. 18 , row 1815 , and this specifies that a round up is required; this fact is determined in control step 1613 of FIG. 16 .
- the cFlag of the comparison step 1609 may be generated not by a direct compare of two digits d[i] and c[i], but may be set by an early termination of one intermediate residue product over the other.
- row 1918 the accumulator of the mixed radix digit, power term multiplier 630 is sent to the round-up unit 635 in FIG. 6A . Because the cFlag is set “greater than” as noted above, the recomposed residue value from the accumulator section is incremented by one, thereby constituting a round up operation.
- row 1919 the final value is interpreted to be exactly the same result as that obtained in FIG. 15 using the method of mixed radix digit truncation of U.S. Patent Publication No. 2013/0311532, and re-printed as FIG. 15 .
- the example demonstrates the improved fractional multiply method of the present invention, which essentially starts the residue re-composition process in parallel to the residue decomposition process, and this saves additional clock cycles in many embodiments.
- the second claimed fractional residue multiplier achieves lower latency at the cost of additional parallel processing.
- the new method follows a different algorithm for fractional residue multiply then the previous disclosure. We will explain the basic algorithm first; next, we will show how the algorithm is applied to residue fractions and processed by the disclosed apparatus.
- each fractional number is represented by a whole part, w, and a fractional part, f.
- w.f w. 0+0 .f
- FIG. 20 discloses a novel apparatus for separating a fractional residue number into its constituent fractional and whole parts.
- the fractional residue format for the example apparatus of FIG. 20 was introduced in U.S. Patent Publication No. 2013/0311532 and is not repeated here.
- a mixed radix converter unit 2010 which in one embodiment is similar to the apparatus of FIG. 13 , generates a mixed radix digit on each cycle.
- a power term multiplier and accumulator unit 2020 which in one embodiment is similar to the apparatus of FIG. 14 , receives the mixed radix digit and multiplies by its associated power term 2015 , resulting in a weighted digit value.
- the power term multiplier and accumulator unit 2020 also includes an accumulator, which provides a means to provide a running accumulation of weighted digit values.
- the mixed radix converter 2010 starts with the fractional residue digits first; after converting all fractional digits, a fractional range is converted. During the fractional range conversion, the weighted value of each digit output 2012 is summed by the power term multiply and accumulate unit 2020 . Immediately after all fractional digits are converted by the mixed radix converter 2010 , the power terms accumulator 2020 value is latched into the fractional portion latch 2025 . Next, the power term accumulator is cleared and the mixed radix conversion continues, only now the power terms accumulator is summing weighted digits associated with the whole, or integer, range. When mixed radix conversion terminates, the residue value contained in the power term multiply and accumulator 2020 is latched into the integer portion latch 2030 .
- the apparatus for separating a fractional residue number into its fractional and whole parts includes a power terms constants source 2015 .
- FIG. 23 provides more detail of the power terms constants 2015 source by means of example.
- the power terms constants 2015 source of FIG. 23 illustrates an example structure which differs from the power terms source of FIG. 17 in that power terms for both the fractional range, and the whole integer range are provided.
- FIG. 17 only power terms for the whole range is supported, since the multiplier of the previous apparatus does not accumulate weighted digits of the fractional range by design.
- power term constants are provided for both the fractional range, and the whole integer range of a residue fraction.
- the power term constants may be implemented as a memory look up table (LUT), or may be constructed using arithmetic circuits, not disclosed herein, which may dynamically calculate the required power terms constants depending on the fixed point position of a variable point position residue fraction.
- LUT memory look up table
- Variable point position residue fractions also known as sliding point residue fractions, is disclosed in U.S. Patent Publication No. 2013/0311532 and is not repeated here.
- the total number of clocks required for the fractional residue value separation operation is the same as the total number of fractional and whole digits of a single operand. This is advantageous, since the number of clocks required to process the intermediate product of two residue fractional operands, i.e. in terms of converted mixed radix digits, may be double.
- equation 2 it takes several additional clocks to scale and sum the integer and fractional portions, so in some cases, the algorithm of the present multiplier may multiply in less clock latency than the previous multiplier of the present invention. If a residue format supports both a large fractional and whole range, this new multiplier can reduce clock latency significantly. In fact, in terms of multipliers operating in a digit by digit fashion, this new apparatus may be one of the fastest procedures and methods known to date.
- FIG. 22 illustrates a block diagram of one embodiment of the internal components of the fractional portion multiplier 2130 of FIG. 21 .
- a multiply of the fractional digits of operand F 1 and F 2 is performed by integer multiplier 2205 .
- the result of multiplier 2205 results in a partially extended product which is steered by selector 2208 and then accumulated into a fully extended residue value using a residue to mixed radix converter 2210 feeding a power term multiply and accumulator unit 2220 .
- the conversion process occurs one digit at a time, and when complete, the fully extended value, representing a truncated fractional product, is stored in a truncated fraction sum latch 2225 .
- the fully extended result by-passes latch 2225 and is routed directly to subtract unit 2240 .
- the fully extended fractional values, Ext F 1 2237 and Ext F 2 2238 are available from the fractional value separators 2110 and 2120 of FIG. 21 .
- the undefined digits originally produced by the multiplier 2205 are defined as caught up, or extended, if they assume the residue digit values which represent the fully extended product, (f 1 *f 2 ), divided by the fractional range R F .
- the definition of the fractional range R F is the product of all fractionally associated modulus, and is further defined in U.S. Patent Publication No. 2013/0311532.
- a fully extended product (f 1 *f 2 ) of the fractional values is obtained using multiplier 2235 in FIG. 22 .
- the source of the fully extended fractional values, Ext F 1 2237 and Ext F 2 2238 is via fractional value separators 2110 and 2120 of FIG. 21 . These values are available F clock cycles from start.
- the fully extended value representing the product of the truncated fraction digits is subtracted from the fully extended product f 1 *f 2 using residue subtract unit 2240 shown in FIG. 22 .
- the result of subtraction unit 2240 is always a value that is evenly divisible by the fractional range R F , since all fractional residue digits are zero.
- the resulting value is multiplied by the multiplicative inverse of the fractional range (R F ) ⁇ 1 2245 using multiplier 2250 , which occurs in a single step or clock.
- the resulting product of multiplier 2250 is defined for whole and redundant residue digits only (since the fractional range digits are undefined due to MODDIV by the fractional range R F ).
- the resulting whole and redundant digits represent the value of the normalized fractional portion product result (f 1 *f 2 ), but are partially base extended.
- the resulting residue product value is routed to an available base extension unit.
- the resulting digits are routed using bus 2252 to the same mixed radix unit converter 2210 using a bus selector 2208 .
- a new accumulator sum is started by clearing the power term accumulator 2220 .
- An associated set of power term constants are defined for this stage of conversion as shown in FIG. 23 .
- the power term constants 2215 restart at a power of one, and progress as a running product of digit modulus values, i.e., progressively scaled by the value of each modulus processed.
- the non-skipped digits are converted to mixed radix, multiplied by their associated power, and are immediately summed to a running, but fully extended, residue value within the power term multiply and accumulator 2220 .
- the final sum represents the fully extended value of the partially extended product f 1 *f 2 result of multiplier 2250 .
- the final fractional product (f 1 *f 2 ) is transferred from the power term accumulator 2220 to the final fractional result latch 2230 .
- the final fractional result latch 2230 is replaced by a bus which transfers the result to the next stage of processing as indicated in FIG. 21 via bus 2132 .
- a block diagram of one embodiment of the new multiplier is shown.
- Two fractional operands represented as blocks 2102 , 2104 , are shown.
- One completely extended operand 2102 is input into fractional value separator 2110 while the other completely extended operand 2104 is input into fractional value separator 2120 at start.
- fractional digits of the first and second operand are routed by bus 2106 and 2108 respectively, to one and another input of a fractional portion multiplier 2130 .
- value separators 2110 , 2120 output a fully extended fractional value to fractional portion multiplier 2130 .
- F+W number of clocks both value separators 2110 , 2120 output a fully extended integer value.
- a plurality of integer multipliers 2140 , 2142 , 2144 , 2154 provide the first three of the four terms of the right side of equation 2.
- Multipliers 2140 , 2142 provide the product of an integer times a fraction.
- Adder 2150 performs a sum of these two terms.
- Multiplier 2144 provides a product of two integers, and further scales the product by the fractional range constant 2152 using multiplier 2154 .
- the final term of equation 2 appears as a result of fractional portion multiplier 2130 , which is summed to the scaled whole term using adder 2156 .
- a final adder 2160 makes the final summation to complete the processing of the right hand side of equation 2.
- the final fractional multiply result exits adder 2160 in FIG. 21 .
- FIG. 21 Other embodiments may exist which basically accomplish the same functions as FIG. 21 .
- a single multiplier and accumulator circuit may be used in lieu of multipliers 2140 , 2142 , 2144 & 2154 and adders 2150 , 2156 & 2160 .
- Other variations include modifications and enhancements to the residue fractional value separator of FIG. 23 . It should be clearly understood by those skilled in the art that variations to the multiplier apparatus illustrated by the block diagram of FIG. 15 are possible.
- a method for measuring the mixed radix digits associated with the fractional digits (of the IP) may be compared with a suitable rounding constant, and such a similar method may be implemented within the apparatus of FIG. 21 .
- the generation of a sequence of mixed radix digits output 2212 may also be transmitted to a comparator and control unit not shown.
- a comparator and control unit may determine whether a round up is to be performed based upon a least significant first digit comparison to a rounding constant, similar to the rounding constant 1806 of FIG. 18 .
- a comparison of mixed radix digits, or remainder value 1804 is performed against a rounding constant 1806 . If the result of comparison determines the remainder value, associated to the value represented by the series of mixed radix digits 1804 , then the final residue result of the residue fractional multiplier of FIG. 21 may be incremented by one unit. It should be obvious to those skilled in arithmetic circuit design that other forms and modes of rounding is possible.
- FIG. 24 is included to demonstrate an example calculation using the new residue fractional multiplier of the present invention.
- the same numeric operands and the same residue fractional format is used as provided in the example of FIG. 15 and FIGS. 18 & 19 for means of clear comparison.
- row 2407 of FIG. 24 the residue digits for operand A is shown, while in row 2408 the residue digits for operand B is shown.
- the fractional only portion for operand A is shown in row 2409 while the fractional only part of operand B is shown in row 2411 .
- the residue values in rows 2409 , 2410 , 2411 , 2412 of FIG. 24 also show completely extended values.
- row 2413 shows the multiplication, or modulo product, of the fractional digits only of operand A 2407 and operand B 2408 .
- the remaining non-fractionally associated digits of row 2413 are shown with a dash, indicating that they are not used in this process.
- the fractional associated digits needed to form the product of operand A and operand B are therefore immediately available to the apparatus by a simple truncation of the fractionally associated operands A and B at start.
- the fractional portion multiplier of FIG. 24 then performs a base extension of the remaining digits using the residue to mixed radix converter 2210 and the power term multiply and accumulate unit 2220 .
- the result of this base extension may be stored in the truncated fractional sum register 2225 and this result is shown as row 2414 in our example calculation.
- a fully extended fraction portion has been separated from each operand A and B, and their product is calculated using the residue (integer) multiplier 2235 of FIG. 22 .
- the difference of the fully extended truncated fraction product in register 2225 and the fully extended fraction only portions 2235 is performed using a subtract unit 2240 .
- This difference is shown in row 2416 of FIG. 24 .
- this difference is evenly divisible by the fractional range, R F , whose value is also shown in row 2417 .
- the remaining non-fractional digits of the fractional only product of row 2413 can be “caught up”, or calculated in one step, by multiplying the difference of row 2416 to the multiplicative inverse of the fractional range (with respect to the modulus of the number system).
- the result of this product, performed by multiplier 2250 of FIG. 22 is shown in row 2419 of FIG. 24 .
- the multiplicative inverse has no defined digits in the fractional range, so that the product will also not have valid, or defined, fractionally associated digits.
- the product of row 2419 may be base extended by using the apparatus of FIG. 22 using the feedback path 2252 and another base extension cycle.
- the final base extended product is then stored in the final fractional result register 2230 .
- a fully normalized and extended fractional result, f 1 *f 2 is calculated.
- This final value is shown in row 2420 of FIG. 24 .
- rows 2421 and 2422 are the fractional times integer terms of equation 2
- in row 2423 is the product of the whole terms of equation 2 scaled (multiplied) by the fractional range R F .
- the final summation of these terms is accomplished using adder 2156 and adder 2160 .
- This result is shown in the row 2424 of FIG. 24 . This is the same final result of the previous multiplier examples, without the addition of one unit to account for a rounding operation.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Executing Machine-Instructions (AREA)
Abstract
Description
w r .f r =w 1 .f 1 *w 2 .f 2 (eq. 1)
w.f=w.0+0.f
w 1 .f r=(w 1 +f 1)*(w 2 +f 2)=w 1 w 2 +w 1 f 2 +w 2 f 1 +f 1 f 2 (eq. 2)
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/579,918 US9712185B2 (en) | 2012-05-19 | 2014-12-22 | System and method for improved fractional binary to fractional residue converter and multipler |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/475,979 US9081608B2 (en) | 2012-05-19 | 2012-05-19 | Residue number arithmetic logic unit |
US201361919730P | 2013-12-21 | 2013-12-21 | |
US14/579,918 US9712185B2 (en) | 2012-05-19 | 2014-12-22 | System and method for improved fractional binary to fractional residue converter and multipler |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/475,979 Continuation-In-Part US9081608B2 (en) | 2012-05-19 | 2012-05-19 | Residue number arithmetic logic unit |
Publications (2)
Publication Number | Publication Date |
---|---|
US20150106414A1 US20150106414A1 (en) | 2015-04-16 |
US9712185B2 true US9712185B2 (en) | 2017-07-18 |
Family
ID=52810582
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/579,918 Active 2033-02-26 US9712185B2 (en) | 2012-05-19 | 2014-12-22 | System and method for improved fractional binary to fractional residue converter and multipler |
Country Status (1)
Country | Link |
---|---|
US (1) | US9712185B2 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109558170A (en) * | 2018-11-06 | 2019-04-02 | 海南大学 | It is a kind of to support data level parallel and the 2-D data access framework of multiple instructions fusion |
US10387122B1 (en) | 2018-05-04 | 2019-08-20 | Olsen Ip Reserve, Llc | Residue number matrix multiplier |
US10992314B2 (en) | 2019-01-21 | 2021-04-27 | Olsen Ip Reserve, Llc | Residue number systems and methods for arithmetic error detection and correction |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10410098B2 (en) * | 2017-04-24 | 2019-09-10 | Intel Corporation | Compute optimizations for neural networks |
CN111427538A (en) * | 2019-01-09 | 2020-07-17 | 西北工业大学 | Fixed coefficient product sum calculation method, circuit and data processing device |
US11503322B2 (en) | 2020-08-07 | 2022-11-15 | Samsung Display Co., Ltd. | DPCM codec with higher reconstruction quality on important gray levels |
US11509897B2 (en) * | 2020-08-07 | 2022-11-22 | Samsung Display Co., Ltd. | Compression with positive reconstruction error |
US11791979B2 (en) | 2021-07-08 | 2023-10-17 | International Business Machines Corporation | Accelerated cryptographic-related processing with fractional scaling |
US11947960B2 (en) * | 2021-11-05 | 2024-04-02 | Qualcomm Incorporated | Modulo-space processing in multiply-and-accumulate units |
CN117271437B (en) * | 2023-11-21 | 2024-02-23 | 英特尔(中国)研究中心有限公司 | Processor |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4281391A (en) * | 1979-01-15 | 1981-07-28 | Leland Stanford Junior University | Number theoretic processor |
US4588980A (en) * | 1984-09-24 | 1986-05-13 | Gte Communication Systems Corporation | Residue to analog converter |
US5117383A (en) * | 1989-07-29 | 1992-05-26 | Sony Corporation | Digital signal processing circuit using a residue number system |
US7523151B1 (en) * | 2000-05-12 | 2009-04-21 | The Athena Group, Inc. | Method and apparatus for performing computations using residue arithmetic |
-
2014
- 2014-12-22 US US14/579,918 patent/US9712185B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4281391A (en) * | 1979-01-15 | 1981-07-28 | Leland Stanford Junior University | Number theoretic processor |
US4588980A (en) * | 1984-09-24 | 1986-05-13 | Gte Communication Systems Corporation | Residue to analog converter |
US5117383A (en) * | 1989-07-29 | 1992-05-26 | Sony Corporation | Digital signal processing circuit using a residue number system |
US7523151B1 (en) * | 2000-05-12 | 2009-04-21 | The Athena Group, Inc. | Method and apparatus for performing computations using residue arithmetic |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10387122B1 (en) | 2018-05-04 | 2019-08-20 | Olsen Ip Reserve, Llc | Residue number matrix multiplier |
CN109558170A (en) * | 2018-11-06 | 2019-04-02 | 海南大学 | It is a kind of to support data level parallel and the 2-D data access framework of multiple instructions fusion |
CN109558170B (en) * | 2018-11-06 | 2021-05-04 | 极芯通讯技术(南京)有限公司 | Two-dimensional data path architecture supporting data level parallelism and multi-instruction fusion |
US10992314B2 (en) | 2019-01-21 | 2021-04-27 | Olsen Ip Reserve, Llc | Residue number systems and methods for arithmetic error detection and correction |
Also Published As
Publication number | Publication date |
---|---|
US20150106414A1 (en) | 2015-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9712185B2 (en) | System and method for improved fractional binary to fractional residue converter and multipler | |
US6742012B2 (en) | Apparatus and method for performing multiplication operations | |
US8965945B2 (en) | Apparatus and method for performing floating point addition | |
US8671129B2 (en) | System and method of bypassing unrounded results in a multiply-add pipeline unit | |
US6904446B2 (en) | Floating point multiplier/accumulator with reduced latency and method thereof | |
US8271571B2 (en) | Microprocessor | |
US20150339103A1 (en) | Product summation apparatus for a residue number arithmetic logic unit | |
JPH07200261A (en) | Logic unit | |
JPH10214176A (en) | Device for quickly calculating transcendental function | |
JPH07210369A (en) | Circuit and method for execution of parallel addition and average operation | |
US20080243976A1 (en) | Multiply and multiply and accumulate unit | |
JP5203972B2 (en) | Pre-saturated fixed point multiplier | |
US20140358979A1 (en) | GENERATING A FAST 3x MULTIPLAND TERM FOR RADIX-8 BOOTH MULTIPLICATION | |
US10089073B2 (en) | Method and apparatus for converting from integer to floating point representation | |
US6728744B2 (en) | Wide word multiplier using booth encoding | |
JPWO2007094047A1 (en) | Arithmetic apparatus and arithmetic method | |
US10929101B2 (en) | Processor with efficient arithmetic units | |
JPS63123125A (en) | Floating point adder | |
JPH10187416A (en) | Floating point arithmetic unit | |
US6725360B1 (en) | Selectively processing different size data in multiplier and ALU paths in parallel | |
US7237000B2 (en) | Speed of execution of a conditional subtract instruction and increasing the range of operands over which the instruction would be performed correctly | |
JPH10333883A (en) | Multiplying method and circuit therefor | |
US7412473B2 (en) | Arithmetic circuitry for averaging and methods thereof | |
KR100251547B1 (en) | Digital Sgnal Processor | |
JP5376659B2 (en) | Product-sum operation apparatus and method for controlling product-sum operation apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: DIGITAL SYSTEM RESEARCH INC., NEVADA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OLSEN, ERIC B.;REEL/FRAME:042670/0623 Effective date: 20120814 |
|
AS | Assignment |
Owner name: OLSEN IP RESERVE, LLC, NEVADA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:DIGITAL SYSTEM RESEARCH INC.;REEL/FRAME:042768/0837 Effective date: 20170621 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY Year of fee payment: 4 |
|
FEPP | Fee payment procedure |
Free format text: 7.5 YR SURCHARGE - LATE PMT W/IN 6 MO, SMALL ENTITY (ORIGINAL EVENT CODE: M2555); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2552); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY Year of fee payment: 8 |