US7269711B2 - Methods and apparatus for address generation in processors - Google Patents
Methods and apparatus for address generation in processors Download PDFInfo
- Publication number
- US7269711B2 US7269711B2 US10/747,764 US74776403A US7269711B2 US 7269711 B2 US7269711 B2 US 7269711B2 US 74776403 A US74776403 A US 74776403A US 7269711 B2 US7269711 B2 US 7269711B2
- Authority
- US
- United States
- Prior art keywords
- address
- carry
- adder
- value
- bit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Links
- 238000000034 method Methods 0.000 title claims abstract description 28
- 238000012937 correction Methods 0.000 claims abstract description 59
- 238000011084 recovery Methods 0.000 claims description 33
- 230000008569 process Effects 0.000 claims description 2
- 238000006073 displacement reaction Methods 0.000 description 31
- 238000007792 addition Methods 0.000 description 28
- 230000007246 mechanism Effects 0.000 description 11
- 238000012545 processing Methods 0.000 description 6
- 230000008901 benefit Effects 0.000 description 4
- 230000000903 blocking effect Effects 0.000 description 3
- 230000014509 gene expression Effects 0.000 description 3
- 238000004891 communication Methods 0.000 description 2
- 230000001902 propagating effect Effects 0.000 description 2
- 230000001413 cellular effect Effects 0.000 description 1
- 230000003111 delayed effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 230000008520 organization Effects 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 230000008929 regeneration Effects 0.000 description 1
- 238000011069 regeneration method Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/342—Extension of operand address space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/355—Indexed addressing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3861—Recovery, e.g. branch miss-prediction, exception handling
Definitions
- This disclosure relates generally to processors, and, more particularly, to methods and apparatus for address generation in processors.
- Processing speed also known as processing horsepower
- the personal computer is a prime example of how processing speed has become a critical feature in the eyes of the consumer. Consumers expect advertised processor speeds, often measured in terms of the processor clock rate, to increase on an annual, or even semi-annual, basis.
- today's applications require processors with much greater horsepower than just a few years ago. For example, computer games and applications, such as word processors and databases, designed for a computer today are often not able to execute in a useful way on the slower processors of only a few years ago.
- processor horsepower needs to increase accordingly so that the user experience remains constant. As a result, identifying techniques to increase processor speed is an ever-present goal of the processor designer and manufacturer.
- processors i.e., processors whose address space is indexed using 32 bits.
- Processors architected for 32-bit addressing have numerous advantages over their 16-bit predecessors, including the ability to support larger program memory requirements and the ability to support more complex instructions that can perform multiple functions in a single clock cycle.
- typical 32-bit processors are designed to support both 32-bit addressing and the legacy 16-bit addressing. The need to support the legacy 16-bit addressing places an additional burden on the processor designer who is attempting to increase the speed of the 32-bit processor.
- processors are beginning to incorporate mechanisms to support aggressive, out-of-order instruction execution with data speculation.
- Such processors are typically capable of executing multiple program threads in parallel.
- Software compilers for such processors may speculate as to how to organize the code to execute in these parallel threads to achieve as efficient execution as possible.
- the speculation may not always be correct as it is often difficult, if not impossible, to determine the complete program execution flow a-priori.
- conditional execution programming constructs e.g., an if-then-else statement
- two or more threads executing in parallel may need to access the same data variable, resulting in a data dependency.
- a processor supporting out-of-order instruction execution needs to have a mechanism for recovering from incorrect instruction execution, e.g., due to a misspeculation based on conditional execution of an unexpected code segment, a data dependency violation, etc.
- This recovery mechanism typically includes address correction logic that allows the processor to recompute and/or replace one or more address components of a previously executed instruction prior to rescheduling the instruction for re-execution.
- FIG. 1 is a schematic illustration of an example prior art address generator that supports 32-bit and 16-bit address generation.
- FIG. 2 is a schematic illustration of an example processor having a recovery mechanism for incorrect address generation.
- FIG. 3 is a schematic illustration of an example address generator that supports 32-bit and 16-bit address generation for the example processor of FIG. 2 .
- FIG. 4 is a more detailed schematic illustration of the correction logic of FIG. 3 .
- FIG. 5 illustrates the mathematical definition of an example legacy effective address and linear address.
- FIG. 6 illustrates the generation of an example 16-bit address as performed by the example address generator of FIG. 3 .
- FIG. 7 depicts an example 16-bit address generation that results in an incorrect address produced by the example address generator of FIG. 3 .
- FIG. 8 depicts an example recovery from the incorrect 16-bit address generation demonstrated in FIG. 7 that may be carried out by the example address generator of FIG. 3 .
- FIG. 9 is a schematic illustration of an example computer system in which the processor of FIG. 2 may be employed.
- the address generator is a key component that affects overall processor speed.
- modern processors often need to support program code written for legacy processors that have a different address size.
- the address generator may need to support two or more address sizes, e.g., a normal address size and a legacy address size.
- the logic added for the generation of the legacy address or addresses may have a negative impact on the speed with which the normal address is generated.
- As the need to support legacy code dwindles, and with the advent of processing architectures that support instruction recovery, mechanisms that increase the speed with which normal addresses can be generated while still supporting the generation of legacy addresses are possible.
- FIG. 1 An example prior art address generator 100 is shown FIG. 1 .
- the example address generator 100 supports both normal 32-bit address generation and legacy 16-bit address generation.
- the addresses are generated from a set of address components, including, a segment 102 , a displacement 104 , a base 106 and an index 108 .
- the address generator 100 generates two address outputs, referred to herein as an effective address 110 and a linear address 112 .
- the effective address 110 comprises a displacement 104 , a base 106 and an index 108 to allow the processor to support different addressing modes, thereby allowing software developers to write efficient program code.
- the base 106 may be used to point to the start of an array, and then the displacement 104 may be used to point to an element of the array.
- the index 108 may be used to point to elements in a two-dimensional array, or specify a word-size for the elements in the array.
- the linear address 112 points to a physical location in memory.
- the linear address 112 comprises a segment 102 and an effective address 110 to allow the processor memory to be organized into segments. This memory organization, for example, allows the processor to support an expandable memory region and reduces the number of bits needed to represent the address components of many of the processor's instructions.
- the example address generator 100 in FIG. 1 focuses on the generation of legacy 16-bit addresses because the additional logic needed to support legacy address generation is a key factor affecting the address generator's critical path during the execution of 32-bit code.
- the effective address components (the displacement 104 , the base 106 and the index 108 ) are all 16 bits, and, therefore, the effective address 110 is also 16 bits.
- the segment 102 is still 32 bits and, thus, the linear address 112 is a 32-bit value.
- the processor provides the segment 102 and the displacement 104 at least one clock cycle before the base 106 and the index 108 .
- the address generator 100 reorders the address computation to take advantage of the differing arrival times of the different address components, as described below.
- the address generator 100 includes a 32-bit adder 114 to add the segment 102 and the displacement 104 .
- the adder 114 produces a sum output 116 and a carry bit 117 corresponding to an output carry from the 16 th most significant bit position.
- the sum output 116 and the carry bit 117 are denoted as EARLYADD_SUM and EARLYADD_COUT 15 , respectively.
- Staging elements 118 and 119 are provided to buffer EARLYADD_SUM 116 and EARLYADD_COUT( 15 ) 117 , respectively, until the base 106 and the index 108 are available.
- EARLYADD_SUM 116 is then added with the base 106 and the index 108 (the latter two components arriving at least one clock cycle after the segment 102 and the displacement 104 ) using a 32-bit 3:2 carry save adder 120 .
- the carry save adder 120 produces a sum output 122 and a carry output 124 .
- the sum output 122 and the carry output 124 are then added by a 32-bit adder 126 to produce the lower 16 bits of the linear address 112 , as well as a carry bit 128 , denoted as LA_FA_COUT( 15 ).
- LA_FA_COUT( 15 ) 128 corresponds to a carry bit generated by the addition of the lower 16-bits in the adder 126 .
- the address generator 100 also comprises a 16-bit 3:2 carry save adder 130 to add the lower 16 bits of displacement 104 (buffered by the staging element 131 ) with the lower 16-bits of the base 106 and the lower 16 bits of the index 108 .
- the carry save adder 130 produces a sum output 132 and carry output 134 .
- the sum output 132 and carry output 134 are then added by a 16-bit adder 136 to produce the effective address 110 , as well as a carry bit 138 , denoted as LA 13 FA 13 COUT( 15 ).
- each 32 -bit adder 114 , 120 and 126 is able to block the output carry generated by the lower 16 bits from propagating to the upper 16 bits, thus maintaining the proper truncation and wrap-around properties of 16-bit, binary arithmetic.
- the blocking of the appropriate output carry bit is controlled by the input ASIZE( 16 ) 140 into the address generator 100 .
- the input ASIZE( 16 ) 140 is set to a logic ONE for the case of 16-bit address generation, and to logic ZERO for the case of 32-bit address generation.
- the addition of it to the other address components may result in a carry that needs to propagate from the lower 16 bits to the upper 16 bits in the generation of the linear address 112 .
- the input ASIZE( 16 ) 140 may cause all such carries to be blocked and, therefore, the correction logic 142 is needed to determine if a carry bit equal to a logic ONE should be added to the upper 16 bits during the calculation of the linear address 112 .
- the correction logic 142 accepts as input ASIZE( 16 ) 140 , as well as the output carry bits from the adders 114 , 120 , 126 , 130 and 136 resulting from the lower 16 bit additions.
- these inputs include: EARLYADD_COUT( 15 ) 117 , LA_FA_COUT( 15 ) 128 , EA_FA_COUT( 15 ) 138 , the 16 th most significant output carry bit of the carry save adder 120 , denoted as LA_CSA_COUT( 15 ) 144 , and the 16 th most significant output carry bit of the carry save adder 130 , denoted as EA_CSA_COUT( 15 ) 146 .
- the correction logic 142 examines the output carry bits to determine if all carry bits correspond to the generation of the effective address 110 .
- the output 148 of the correction logic 142 is set to a logic ZERO. Otherwise, an additional carry is needed, and COUT( 15 ) 148 is set to a logic ONE.
- the output COUT( 15 ) 148 of the correction logic 142 then drives the input carry CIN( 16 ) 150 for the addition of the upper 16 bits in the adder 126 .
- the correction logic 142 lies directly in the critical path for generation of the upper 16 bits of the linear address 112 .
- the propagation delay of the signals through the correction logic 142 places a limitation on the rate at which 16-bit addresses may be generated. Even worse, the propagation delay through the correction logic also impacts 32-bit address generation, and, thus, overall processor speed.
- the processor 200 includes an instruction scheduler 202 to schedule instructions for execution. To prepare an instruction for execution, the instruction scheduler 202 passes a set of address components 204 , such as a segment, a displacement, a base and an index, to an address generator 206 .
- the address generator 206 may, for example, generate one or more linear addresses for the instruction using the aforementioned address components 204 .
- the linear address for example, may point to a physical location in memory wherein an argument of the instruction is located.
- the example address generator 206 has two other control inputs: a size input 208 and a force-carry (FC) input 210 .
- the size input 208 is used to specify the size of the address 211 being generated.
- the address generator may support generation of both a normal size address (e.g., a 32-bit address) and a legacy size address (e.g., a 16-bit address).
- the force-carry input 210 is used to modify an operation within the address generator 206 . In the example address generator 300 of FIG. 3 described below, the force-carry input 210 modifies the value of a carry-bit of an adder within the address generator 206 depending on the address size indicated by the size input 208 .
- the example address generator 206 has two more outputs in addition to the generated addresses 211 : a correction indicator output 212 and a force carry value output 214 .
- the correction indicator 212 also known as the “need correction” (NC) indicator 212 , is an output generated by the address generator to indicate that the generated address 211 is incorrect.
- the force carry value output 214 is an optional output that may be used, for example, to indicate the state of the force carry input 210 during the preceding address generation, or to output the value of a particular carry bit determined during the generation of the most recent address 211 .
- the example processor 200 also comprises a recovery mechanism for recovery from erroneous instruction execution, for example, due to out-of-order, speculative processing.
- the recovery mechanism includes a recovery queue 218 to store pending instructions, and a recovery control unit 220 to determine how the instructions pending in the recover queue 218 should be processed.
- the instruction scheduler 202 passes instructions to the recovery queue 218 to store temporarily until the results of executing the instruction are confirmed and committed to the overall program execution flow.
- the processor may contain an instruction retirement unit (not shown) that makes this determination. If the results of executing the instruction are deemed valid, the recovery control unit 220 will cause the recovery queue 218 to remove the corresponding instruction from its temporary storage. If the results are invalid, the recovery control unit 220 will cause the recovery queue 218 to reschedule the affected instruction for execution, possibly after modifying a portion or portions of the instruction to be scheduled.
- the recovery control unit 220 may be modified to accept the correction indicator 212 as another input to signal another condition by which the execution of an instruction may be invalid.
- the recovery control unit 220 may also be modified to accept the force carry value 214 as an input to indicate how the instruction needs to be modified prior to rescheduling the instruction for re-execution (specifically, how the generation of the affected address needs to be modified).
- the recovery control unit 220 may modify the force carry input 210 to the address generator and the recovery queue 218 may apply the address components 204 to the address generator.
- the address generator 206 is designed so that the propagation delay for normal address generation is minimized, whereas legacy address generation may have additional propagation delay (resulting in a slower processing speed for legacy addresses).
- the address generator 206 is designed so that the generation of a normal address is always correct, whereas the generation of a legacy address may result in an incorrect value. Specifically, returning to the example address generator 100 in FIG. 1 , the address generator 206 operates in a manner substantially equivalent to assuming that the output 148 of the correction logic 142 is a particular value and then using this value in the generation of the linear address 112 .
- the propagation delay through the correction logic 142 no longer impacts the time required to generate the upper 16 bits of the linear address 112 .
- the assumed value may be incorrect. If this is the case, then the address must be regenerated using a different assumed value for the output 148 . In this way, the speed with which normal addresses are generated may be improved at the expense of possible errors in the generation of legacy addresses, with any such errors necessitating the regeneration of the legacy address or addresses.
- this added delay may have little to no impact to the legacy code's user experience.
- the instruction scheduler 202 passes the instruction to the recovery queue 218 for temporary storage, and provides the address components 204 to the address generator 206 . If the size 208 of the address corresponds to a normal address (for example, as determined by an instruction decoder not shown in FIG. 2 ), the address generator 206 determines the correct linear address 211 . The correction indicator 212 and the force carry value output 214 are ignored as they are not applicable to normal address generation.
- the instruction scheduler 202 sets the value of the force carry input 210 to an initial value, for example, a logic ZERO.
- the address generator 206 generates the legacy address and determines if the address is correct as described in detail below. If the correction indicator 212 indicates that the address is correct, the recovery control unit 220 will not reschedule the instruction due to incorrect address generation (although the instruction may be rescheduled for other reasons, such as a data dependency violation, that are unrelated to the address generated by the address generator 206 ).
- the recovery control unit 220 will reschedule the instruction and cause the recovery queue 218 to provide the appropriate address components 204 to the address generator 206 . Additionally, the recovery control unit will modify the force carry input 210 , for example, by setting the input to a logic ONE based on the previous state of this input (logic ZERO) as indicated by the force carry value output 214 .
- the address generator 206 could be designed to default to a particular carry value for a legacy address, and the force carry input 210 could be used to signal the address generator 206 to toggle this value.
- Another example could employ prediction logic to set the carry-bit to a value based on the values of the address components 204 , e.g., how likely the addition of the components will result in the carry-bit of interest being a logic ONE or a logic ZERO.
- the example address generator 206 may be used in processors that do not inherently support out-of-order instruction execution, or the like.
- FIG. 3 An example address generator 300 , which may be used to implement the address generator 206 of FIG. 2 , is shown in FIG. 3 .
- the example address generator 300 supports both normal 32-bit address generation and legacy 16-bit address generation. For both address types, the addresses are generated from a set of address components, including, a segment 302 , a displacement 304 , a base 306 and an index 308 .
- the address generator 300 generates two address outputs, the effective address 310 and the linear address 312 , based on the mathematical expressions discussed previously.
- the effective address components are all 16 bits, and, therefore, the effective address 310 is also 16 bits.
- the segment 302 is still 32 bits and, thus, the linear address 312 is a 32-bit value.
- the processor provides the segment 302 and the displacement 304 at least one clock cycle before the base 306 and the index 308 .
- the address generator 312 reorders the address computation to take advantage of the differing arrival times of the different address components, described as follows.
- the address generator 300 includes a 32-bit adder 314 to add the segment 302 and the displacement 304 .
- the adder 314 produces a sum output 316 and a carry bit 317 coffesponding to an output carry from the 16 th most significant bit position.
- the sum output 316 and the carry bit 317 are denoted as EARLYADD_SUM and EARLYADD_COUT( 15 ), respectively.
- Staging elements 318 and 319 are provided to buffer EARLYADD_SUM 316 and EARLYADD_COUT( 15 ) 317 , respectively, until the base 306 and the index 308 are available.
- EARLYADD_SUM 316 is then added with the base 306 and the index 308 (the latter two components arriving at least one clock cycle after the segment 302 and the displacement 304 ) using a 32-bit 3:2 carry save adder 320 .
- the carry save adder 320 produces a sum output 322 and carry output 324 .
- the sum output 322 and carry output 324 are then added by a 32-bit adder 326 to produce all 32 bits of the linear address 312 , as well as an output carry bit 328 , denoted as LA_FA_COUT( 15 ).
- LA_FA_COUT( 15 ) 328 corresponds to a carry bit generated by the addition of the lower 16-bits in the adder 326 .
- the address generator 300 also includes a 16-bit 3:2 carry save adder 330 to add the displacement 304 (buffered by the staging element 331 ) with the base 306 and the index 308 .
- the carry save adder 330 produces a sum output 332 and a carry output 334 .
- the sum output 332 and the carry output 334 are then added by a 16-bit adder 336 to produce the effective address 310 , as well as a carry bit 338 , denoted as EA_FA_COUT( 15 ).
- EA_FA_COUT( 15 ) 338 corresponds to the carry bit generated by the addition of the lower 16-bits in the adder 336 .
- each 32-bit adder 314 , 320 and 326 is able to block the output carry generated by the lower 16 bits from propagating to the upper 16 bits, thus maintaining the proper truncation and wrap-around properties of 16-bit, binary arithmetic.
- the blocking of the appropriate output carry bit is controlled by the input ASIZE( 16 ) 340 into the address generator 300 .
- the input ASIZE( 16 ) 340 is set to a logic ONE for the case of 16-bit address generation, and to logic ZLRO for the case of 32-bit address generation.
- the address generator 300 includes a force carry input 342 to allow the value of this carry to be set to a particular value at the start of address generation. In this way, generation of the 32-bit linear address is not delayed due to the determination of the correct value for this carry-bit.
- the force-carry input 342 causes the carry bit to be set to an incorrect value, then the resulting address will also be incorrect, thereby requiring that the linear address be regenerated using a different value for the force-carry input 342 .
- the address generator 300 contains correction logic 344 to determine, for example, the correct value of the force carry input 342 . Further detail pertinent to the correction logic 344 is provided below in conjunction with FIG. 4 .
- the correction logic 344 accepts as input the output carry bits from the adders 314 , 320 , 326 , 330 and 336 resulting from the lower 16 bit additions.
- these inputs include: EARLYADD_COUT( 15 ) 317 , LA_FA_COUT( 15 ) 328 , EA_FA_COUT( 15 ) 338 , the 16 th most significant output carry bit of the carry save adder 320 , denoted as LA_CSA_COUT( 15 ) 346 , and the 16 th most significant output carry bit of the carry save adder 330 , denoted as EA_CSA_COUT( 15 ) 348 .
- the correction logic 344 examines the output carry bits to determine if all carry bits correspond to the generation of the effective address 310 .
- correction indicator 350 of the correction logic 344 is output from the address generator 300 for use by the processor's recovery mechanism (as described above for the example processor 200 of FIG. 2 ) to determine whether the generated legacy address is incorrect and needs to be regenerated using a different value for the force carry input 342 .
- the correction logic 344 may also provide a force carry value output 352 that is representative of the previous value of the force carry input 342 to aid the recovery mechanism in determining the appropriate new value for the force carry input 342 .
- the ASIZE( 16 ) input 340 is set so that the carries generated by the lower 16 bits are allowed to propagate to the upper 16 bits in the 32-bit adders 314 , 320 and 326 .
- the force-caffy input 342 and the correction indicator 350 and the forced carry value output 352 may be ignored as the generated 32-bit address will be correct.
- the correction logic 344 needed for 16-bit address generation has no negative impact on 32-bit address generation.
- FIG. 4 depicts an example correction logic circuit 400 that could be used to implement the correction logic 300 of FIG. 3 .
- the example correction logic circuit 400 accepts as input the following five output carry bits as described above: EARLYADD_COUT( 15 ) 402 , EA_CSA_COUT( 15 ) 404 , LA_CSA_COUT( 15 ) 406 , EA_FA_COUT( 15 ) 408 and LA_FA_COUT( 15 ) 410 .
- An exclusive-OR (XOR) operation (XOR gate 412 ) is performed on EARLYADD_COUT( 15 ) 402 , EA_CSA_COUT( 15 ) 404 , and LA_CSA_COUT( 15 ) 406 .
- XOR gate 414 An XOR operation (XOR gate 414 ) is then performed on the output of the XOR gate 412 and EA_FA_COUT( 15 ) 408 .
- XOR gate 416 an XOR operation (XOR gate 416 ) is performed on the output of the XOR gate 414 and LA_FA_COUT( 15 ) 410 .
- the example circuit 400 performs an XOR operation on all five output carry bits.
- the XOR operation performed by the example circuit 400 compares a first set of carry bits EA_CSA_COUT( 15 ) 348 and EA_FA_COUT( 15 ) 338 ) generated during the computation of the effective address 310 to a second set of carry bits EARLYADD_COUT( 15 ) 317 , LA_CSA_COUT( 15 ) 346 and LA_FA_COUT( 15 ) 328 ) generated in the computation of the linear address 312 .
- the example correction logic circuit 400 accepts the force carry as an input 418 and the address size ASIZE( 16 ) as input 419 .
- the example circuit 400 is designed assuming that, for legacy 16-bit address generation, the ASIZE( 16 ) input 419 is a logic ONE and the first attempt at generating the linear address uses a default force carry input 418 equal to a logic ZERO. Then, an AND operation (AND gate 420 ) is performed on the inverse of the force carry input 418 , the AS 1 ZE( 16 ) input 419 and the output of the XOR gate 416 to produce the correction indication output 422 , which corresponds to the correction indication output 350 of FIG. 3 .
- the AS 1 ZE( 16 ) input 419 is included in the AND operation so that a correction indication output 422 equal to a logic ONE can occur only for legacy 16-bit addresses.
- the correction indication output 422 is equal to the output of the XOR gate 416 .
- the force carry input 418 was properly set to a logic ZERO and the correction indicator 422 is set to logic ZERO, thereby indicating that no correction is needed.
- the force carry input 418 was incorrectly set to a logic ZERO and the correction indicator 422 is set to a logic ONE, thereby indicating that a correction is needed. If a correction is needed, on the second attempt the force value input will be equal to a logic ONE, thus forcing the correction indicator 422 to be a logic ZERO for the second attempt, thereby indicating that the generated address is now correct.
- circuit configurations could be used to compare the output of the XOR gate 416 to the force carry input 418 .
- a coincidence operation i.e., the inverse of the XOR operation
- the ASIZE( 16 ) input 419 could be removed from the example correction logic circuit 400 if the recovery mechanism is able to ignore a correction indication output 422 corresponding to a normal 32-bit address.
- the example correction logic circuit 400 also includes a force carry value output 424 that is formed by delaying the force carry input 418 using a delay element 426 .
- the force carry value output 424 corresponds to the force carry value output 352 of FIG. 3 .
- FIG. 5 illustrates the mathematical definition of an example 16-bit effective address and linear address. Note that the example of FIG. 5 is provided to more clearly illustrate the definition of the effective address and linear addresses mentioned previously, and neither the example prior art address generator 100 of FIG. 1 or the example address generator 300 of FIG. 3 actually perform the sequence of addition operations exactly as shown in FIG. 5 .
- the lower 16 bits of the displacement 510 , the base 512 and the index 514 are the 16-bit values represented by A 15 . . . A 0 , B 15 . . . B 0 and C 15 . . . C 0 , respectively.
- the upper 16 bits of the displacement 516 , the base 518 and the index 520 are all set to logic ZEROs because the address components are restricted to 16 bits for this case.
- the aforementioned address components are then summed to produce the 16-bit effective address, wherein the lower 16 bits of the effective address 522 are represented by Y 15 . . . Y 0 , and the upper 16 bits of the effective address 524 are forced to all logic ZEROs by blocking any potential logic ONEs that may be generated as the carry out bits of the 16 th and 17 th bit positions ( 526 ) due to the summation of the three previous 16-bit address components.
- the resulting effective address 522 , 524 is then added to the 32-bit segment, wherein the upper 16 bits 528 and the lower 16 bits 530 of the segment are represented as X 31 . . . X 16 and X 15 . . . X 0 , respectively.
- the result of this addition is the 32-bit linear address, wherein the upper 16 bits 532 and the lower 16 bits 534 of the segment are represented as Z 31 . . . Z 16 and Z 15 . . . Z 0 , respectively. Note that all carries are allowed to propagate in this final addition as the result is a 32-bit value.
- FIG. 6 illustrates the calculation of an example 16-bit address as performed by the example address generator 300 .
- the example address generator 300 computes the legacy linear address 312 (via 32-bit adders 314 , 320 and 326 ) and the effective address 310 (via 16-bit adders 330 and 336 ) in parallel.
- the address generator 300 adds the displacement 610 and the segment 612 to determine the “early add” output 614 .
- the early add output 614 also includes the addition of the force carry input 616 that is set to either a logic ZERO or a logic ONE at the start of the address computation.
- the output carry 618 from the addition of the lower 16 bits of the displacement 610 and the segment 612 is blocked during legacy 16-bit address generation.
- the 32-bit early add output 614 is, in effect, the result of two separate 16-bit additions, namely, the addition of the lower 16 bits of the displacement 610 and the segment 612 , and the addition of the upper 16 bits of the segment 612 to the force carry input 616 .
- the early add output 614 is added with the base 620 and the index 622 via a 3:2 carry save adder to produce a 32-bit sum output 624 and a 32-bit carry output 626 .
- the output carry 628 generated during the addition the lower 16 bits of the early add output 614 , the base 620 and the index 622 is blocked during legacy 16-bit address generation.
- the sum output 624 and the carry output 626 are added to produce the legacy linear address 630 , again with the output carry 632 from the 16 th bit position being blocked for legacy 16-bit address generation.
- the example address generation 300 computes the legacy 16-bit effective address in parallel with the linear address as follows.
- the displacement 634 (corresponding to the lower 16 bits of the displacement 610 ) is added with the base 636 (corresponding to the lower 16 bits of the base 620 ) and the index 638 (corresponding to the lower 16 bits of the index 622 ) via a 3:2 carry save adder to produce a 16-bit sum output 640 and a 16-bit carry output 642 .
- An output carry 644 is generated during the 16-bit addition the lower 16 bits of the displacement 634 , the base 636 and the index 638 .
- the sum output 640 and the carry output 642 are added to produce the legacy effective address 646 and an output carry 648 from the 16 th most significant bit position.
- the carries generated during the computation of the effective address can compared to the output carries generated during the computation of the linear address to determine whether or not all output carries having a value of logic ONE correspond to the effective address computation. If not, then an output carry needed for the computation of the linear address was incorrectly blocked, and the force carry input 616 should be set to a logic ONE to correctly generate the legacy linear address in the next attempt.
- the address components are added in different sequences for the linear address 630 and the effective address 646 .
- the output carries generated during computation of the linear address there is no one-to-one correspondence between the output carries generated during computation of the linear address and the output carries generated during computation of the effective address.
- the total number of output carries having a value of logic ONE may be compared to determine whether the same number occur for the effective address computation as for the linear address computation.
- the set of linear address output carries 618 , 628 and 632 may be compared to the set of effective address output carries 644 and 648 . If both sets contain the same number of logic ONE values, then all output carries were correctly blocked during the generation of the linear address because they all corresponded to the computation of the effective address. However, if both sets do not contain the same number of logic ONE values, then an output carry corresponding to the computation of the linear address was incorrectly blocked, and the correction indicator (e.g., output 350 of FIG. 3 ) should be set to a logic ONE to indicate that the resulting legacy linear address is incorrect. As described with respect to FIG. 4 , the aforementioned comparison can be performed using an XOR operation.
- FIGS. 7 and 8 illustrate a numerical example of the legacy address computations shown in FIG. 6 .
- FIG. 7 shows the first attempt at generating the legacy addresses, so the force carry input 710 is set to a logic ZERO.
- the legacy linear address is generated as follows.
- the displacement value 0xC000 (block 712 ) and the segment value 0x05554000 (block 714 ) are added to produce an early add output 716 , wherein the upper 16 bits and the lower 16 bits are 0x0555 and 0x0000, respectively.
- the addition also produces an output carry 718 from the lower 16 bits that is equal to a logic ONE, which is blocked for legacy 16-bit address generation.
- the early add output value 0x05550000 (block 716 ), the base value 0xC000 (block 720 ) and the index value 0x6000 (block 722 ) are summed via a 3:2 carry save adder.
- the result is a sum output 724 , wherein the upper 16 bits and the lower 16 bits are 0x0555 and 0xA000, respectively, and a carry output 726 , wherein the upper 16 bits and the lower 16 bits are 0x0000 and 0x8000, respectively.
- the adder also produces an output carry 728 from the lower 16 bits that is equal to a logic ZERO, which is blocked for legacy 16-bit address generation.
- the sum output value 0x0555A000 (block 724 ) and the carry output value 0x00008000 (block 726 ) are added to produce the legacy linear address 730 , wherein the upper 16 bits and the lower 16 bits are 0x0555 and 0x2000, respectively.
- the addition also produces an output carry 732 equal to a logic ONE, which is blocked for 16-bit legacy address generation.
- the legacy linear address generated for this example is 0x05552000 (block 730 ).
- the legacy 16-bit effective address is generated as follows.
- the displacement value 0xC000 (block 734 ), the base value 0xC000 (block 736 ) and the index value 0x6000 (block 738 ) are summed via a 3:2 carry save adder.
- the result is a sum output 740 equal to 0x6000 and a carry output 742 equal to 0x8000.
- the adder also produces an output carry 744 that is equal to a logic ONE.
- the sum output value 0x6000 (block 740 ) and the carry output value 0x8000 (block 742 ) are added to produce a legacy effective address 746 having a value of 0xE000 for this example.
- the addition also produces an output carry 748 equal to a logic ZERO.
- the output carries generated in the example of FIG. 7 are compared to determine whether the generated legacy linear address is correct.
- the legacy linear address generated using a force carry input 710 equal to a logic ZERO is incorrect.
- the generated legacy linear address is 0x05552000, which indicates that a carry was incorrectly blocked (see block 750 ).
- FIG. 8 shows the second attempt at generating the legacy addresses for the example shown in FIG. 7 .
- FIGS. 7 and 8 represent two phases of the same example legacy address generation, a detailed description of the intermediate computations for the linear address and the effective address will not be re-described herein. Instead, the interested reader is referred to the previous description of FIG. 7 for a complete discussion of the operations. To assist the reader in the exercise, substantially equivalent blocks are labeled with identical reference numerals.
- FIG. 8 corresponds to the second attempt at generating the legacy linear address and, therefore, the force carry input 710 is now set to a logic ONE.
- the computation of the legacy 16-bit effective address remains unchanged, and yields the effective address of 0xE000 (block 746 ).
- the force-carry input value of logic ONE propagates through the intermediate additions for the upper 16 bits, and results in a generated legacy linear address of 0x05562000 (block 730 ), which is the correct result.
- the force-carry input (e.g., 616 of FIG. 6 and 710 of FIGS. 7 and 8 ) is depicted as modifying the value of the displacement (e.g., 610 of FIG. 6 and 712 of FIGS. 7 and 8 ).
- an equivalent operation may be achieved by forcing a logic ONE on the input carry for the upper 16 bits of the appropriate adder (i.e., the addition of the 17 th bit positions of the arguments being summed).
- the force-carry input may be applied to any convenient adder in the set of adders used to generate the linear address (e.g., any of the adders 314 , 320 and 326 of FIG. 3 ).
- FIG. 9 is a block diagram of an example computer system 1000 capable of implementing the apparatus and/or methods disclosed herein.
- the computer system 1000 can be, for example, a server, a personal computer, a personal digital assistant (PDA), an Internet appliance, or any other type of computing device.
- PDA personal digital assistant
- the computer system 1000 includes a processor 1012 which may be implemented by the example processor 200 of FIG. 2 .
- the processor 1012 is in communication with a main memory including a volatile memory 1014 and a non-volatile memory 1016 via abus 1018 .
- the volatile memory 1014 may be implemented by Static Random Access Memory (SRAM), Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM) and/or any other type of random access memory device.
- the non-volatile memory 1016 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1014 , 1016 is typically controlled by a memory controller (not shown) in a conventional manner.
- the computer 1000 also includes a conventional interface circuit 1020 .
- the interface circuit 1020 may be implemented by any type of well known interface standard, such as an Ethernet interface, a universal serial bus (USB), and/or a third generation input/output (3GIO) interface.
- One or more input devices 1022 are connected to the interface circuit 1020 .
- the input device(s) 1022 permit a user to enter data and commands into the processor 1012 .
- the input device(s) can be implemented by, for example, a keyboard, a mouse, a touchscreen, a track-pad, a trackball, an isopoint and/or a voice recognition system.
- One or more output devices 1024 are also connected to the interface circuit 1020 .
- the output devices 1024 can be implemented, for example, by display devices (e.g., a liquid crystal display, a cathode ray tube display (CRT)), by a printer and/or by speakers.
- the interface circuit 1020 thus, typically includes a graphics driver card.
- the interface circuit 1020 also includes a communication device such as a modem or network interface card to facilitate exchange of data with external computers via a network 1026 (e.g., an Ethernet connection, a digital subscriber line (DSL), a telephone line, coaxial cable, a cellular telephone system, etc.).
- a network 1026 e.g., an Ethernet connection, a digital subscriber line (DSL), a telephone line, coaxial cable, a cellular telephone system, etc.
- the computer 1000 also includes one or more mass storage devices 1028 for storing software and data.
- mass storage devices 1028 include floppy disk drives, hard drive disks, compact disk drives and digital versatile disk (DVD) drives.
- the mass storage device 1028 may be used to store the address components 204 provided to the address generator 206 by the instruction scheduler 202 .
- the volatile memory 1014 may be used to store the address components 204 .
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Executing Machine-Instructions (AREA)
Abstract
Description
effective address=displacement+base+index.
Similarly, the
linear address=effective address+segment.
effective address=displacement+base+index
and:
linear address=effective address+segment.
Claims (41)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/747,764 US7269711B2 (en) | 2003-12-29 | 2003-12-29 | Methods and apparatus for address generation in processors |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/747,764 US7269711B2 (en) | 2003-12-29 | 2003-12-29 | Methods and apparatus for address generation in processors |
Publications (2)
Publication Number | Publication Date |
---|---|
US20050144423A1 US20050144423A1 (en) | 2005-06-30 |
US7269711B2 true US7269711B2 (en) | 2007-09-11 |
Family
ID=34700792
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/747,764 Expired - Fee Related US7269711B2 (en) | 2003-12-29 | 2003-12-29 | Methods and apparatus for address generation in processors |
Country Status (1)
Country | Link |
---|---|
US (1) | US7269711B2 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070143352A1 (en) * | 2005-12-21 | 2007-06-21 | International Business Machines Corporation | Method and system for implementing database migration using a staged approach |
US7757198B1 (en) | 2007-04-10 | 2010-07-13 | Lattice Semiconductor Corporation | Scan chain systems and methods for programmable logic devices |
US9052890B2 (en) | 2010-09-25 | 2015-06-09 | Intel Corporation | Execute at commit state update instructions, apparatus, methods, and systems |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12007909B2 (en) * | 2020-12-15 | 2024-06-11 | Xsight Labs Ltd. | Elastic resource management in a network switch |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5233553A (en) * | 1991-03-06 | 1993-08-03 | Chips And Technologies, Inc. | Apparatus for performing modulo arithmetic with three-port adder |
US5511017A (en) * | 1994-06-01 | 1996-04-23 | Exponential Technology, Inc. | Reduced-modulus address generation using sign-extension and correction |
US5970512A (en) * | 1997-03-27 | 1999-10-19 | International Business Machines Corporation | Translation shadow array adder-decoder circuit for selecting consecutive TLB entries |
US6066965A (en) * | 1997-12-11 | 2000-05-23 | Evsx, Inc. | Method and apparatus for a N-nary logic circuit using 1 of 4 signals |
US6105126A (en) * | 1998-04-30 | 2000-08-15 | International Business Machines Corporation | Address bit decoding for same adder circuitry for RXE instruction format with same XBD location as RX format and dis-jointed extended operation code |
US6105129A (en) * | 1998-02-18 | 2000-08-15 | Advanced Micro Devices, Inc. | Converting register data from a first format type to a second format type if a second type instruction consumes data produced by a first type instruction |
US6209076B1 (en) * | 1997-11-18 | 2001-03-27 | Intrinsity, Inc. | Method and apparatus for two-stage address generation |
US6457115B1 (en) * | 2000-06-15 | 2002-09-24 | Advanced Micro Devices, Inc. | Apparatus and method for generating 64 bit addresses using a 32 bit adder |
US20020194452A1 (en) * | 2001-06-01 | 2002-12-19 | Catherwood Michael I. | Modulo addressing based on absolute offset |
US20030093775A1 (en) * | 2001-11-14 | 2003-05-15 | Ronald Hilton | Processing of self-modifying code under emulation |
-
2003
- 2003-12-29 US US10/747,764 patent/US7269711B2/en not_active Expired - Fee Related
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5233553A (en) * | 1991-03-06 | 1993-08-03 | Chips And Technologies, Inc. | Apparatus for performing modulo arithmetic with three-port adder |
US5511017A (en) * | 1994-06-01 | 1996-04-23 | Exponential Technology, Inc. | Reduced-modulus address generation using sign-extension and correction |
US5970512A (en) * | 1997-03-27 | 1999-10-19 | International Business Machines Corporation | Translation shadow array adder-decoder circuit for selecting consecutive TLB entries |
US6209076B1 (en) * | 1997-11-18 | 2001-03-27 | Intrinsity, Inc. | Method and apparatus for two-stage address generation |
US6066965A (en) * | 1997-12-11 | 2000-05-23 | Evsx, Inc. | Method and apparatus for a N-nary logic circuit using 1 of 4 signals |
US6105129A (en) * | 1998-02-18 | 2000-08-15 | Advanced Micro Devices, Inc. | Converting register data from a first format type to a second format type if a second type instruction consumes data produced by a first type instruction |
US6105126A (en) * | 1998-04-30 | 2000-08-15 | International Business Machines Corporation | Address bit decoding for same adder circuitry for RXE instruction format with same XBD location as RX format and dis-jointed extended operation code |
US6457115B1 (en) * | 2000-06-15 | 2002-09-24 | Advanced Micro Devices, Inc. | Apparatus and method for generating 64 bit addresses using a 32 bit adder |
US20020194452A1 (en) * | 2001-06-01 | 2002-12-19 | Catherwood Michael I. | Modulo addressing based on absolute offset |
US20030093775A1 (en) * | 2001-11-14 | 2003-05-15 | Ronald Hilton | Processing of self-modifying code under emulation |
Non-Patent Citations (5)
Title |
---|
IA-32 Intel(R) Architecture Optimization: Reference Manual, Intel Corporation, 2003, Chapter 1: IA-32 Intel(R) Architecture Processor Family Overview, pp. 1.1-1.34. |
IA-32 Intel(R) Architecture Software Developer's Manual, vol. 1: Basic Architecture, Intel Corporation, 2003, Chapter 3: Basic Execution Environment, pp. 3.1-3.17. |
IS 32 Intel Architecture Software Developer's Manual, vol. 3: System Programing Guide, Intel Corporation, 2003, Chapter 2: System Architecture Overview, pp. 2.1-2.7. |
IS 32 Intel Architecture Software Developer's Manual, vol. 3: System Programing Guide, Intel Corporation, 2003, Chapter 3: Protected-Mode Memory Management, pp. 3.1-3.38. |
P6 Family of Processors: Hardware Developer's Manual, Intel Corporation, Sep. 1998, 16 pages. |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070143352A1 (en) * | 2005-12-21 | 2007-06-21 | International Business Machines Corporation | Method and system for implementing database migration using a staged approach |
US7757198B1 (en) | 2007-04-10 | 2010-07-13 | Lattice Semiconductor Corporation | Scan chain systems and methods for programmable logic devices |
US9052890B2 (en) | 2010-09-25 | 2015-06-09 | Intel Corporation | Execute at commit state update instructions, apparatus, methods, and systems |
Also Published As
Publication number | Publication date |
---|---|
US20050144423A1 (en) | 2005-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7818550B2 (en) | Method and apparatus for dynamically fusing instructions at execution time in a processor of an information handling system | |
JP3120152B2 (en) | Computer system | |
US6772355B2 (en) | System and method for reducing power consumption in a data processor having a clustered architecture | |
US5860151A (en) | Data cache fast address calculation system and method | |
US20060206693A1 (en) | Method and apparatus to execute an instruction with a semi-fast operation in a staggered ALU | |
US20060015855A1 (en) | Systems and methods for replacing NOP instructions in a first program with instructions of a second program | |
CN100543670C (en) | Method and apparatus for generating extended precision integer division quotient and remainder | |
US7010676B2 (en) | Last iteration loop branch prediction upon counter threshold and resolution upon counter one | |
Sawada et al. | Verification of FM9801: An out-of-order microprocessor model with speculative execution, exceptions, and program-modifying capability | |
US7269711B2 (en) | Methods and apparatus for address generation in processors | |
US7213136B2 (en) | Apparatus and method for redundant zero micro-operation removal | |
US7065636B2 (en) | Hardware loops and pipeline system using advanced generation of loop parameters | |
Sazeides | Modeling value speculation | |
US7404065B2 (en) | Flow optimization and prediction for VSSE memory operations | |
US6829700B2 (en) | Circuit and method for supporting misaligned accesses in the presence of speculative load instructions | |
US20060168431A1 (en) | Method and apparatus for jump delay slot control in a pipelined processor | |
US7269720B2 (en) | Dynamically controlling execution of operations within a multi-operation instruction | |
US6920547B2 (en) | Register adjustment based on adjustment values determined at multiple stages within a pipeline of a processor | |
CN115269011A (en) | Instruction execution unit, processing unit and related device and method | |
US5680408A (en) | Method and apparatus for determining a value of a majority of operands | |
US6430682B1 (en) | Reliable branch predictions for real-time applications | |
US20050144604A1 (en) | Methods and apparatus for software value prediction | |
US6266757B1 (en) | High speed four-to-two carry save adder | |
CN118963842A (en) | Jump instruction target address locating device and CPU core | |
Müller | Complexity and correctness of computer architectures |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTEL CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PATEL, RAJESH B.;FARRELL, ROBERT L.;PHILLIPS, JAMES E.;AND OTHERS;REEL/FRAME:014946/0934;SIGNING DATES FROM 20031218 TO 20040109 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
FPAY | Fee payment |
Year of fee payment: 8 |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Expired due to failure to pay maintenance fee |
Effective date: 20190911 |