
WO1998002797A1 - Method and apparatus for predecoding variable byte-length instructions in a superscalar microprocessor - Google Patents

Method and apparatus for predecoding variable byte-length instructions in a superscalar microprocessor

Info

Publication number
WO1998002797A1
WO1998002797A1 PCT/US1996/011757 US9611757W WO9802797A1 WO 1998002797 A1 WO1998002797 A1 WO 1998002797A1 US 9611757 W US9611757 W US 9611757W WO 9802797 A1 WO9802797 A1 WO 9802797A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
byte
bit
superscalar microprocessor
instructions
Prior art date
Application number
PCT/US1996/011757
Other languages
English (en)
Inventor
Thang M. Tran
Original Assignee
Advanced Micro Devices, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Micro Devices, Inc. filed Critical Advanced Micro Devices, Inc.
Priority to EP96925323A priority Critical patent/EP0912923A1/fr
Priority to PCT/US1996/011757 priority patent/WO1998002797A1/fr
Priority to JP50595198A priority patent/JP3732233B2/ja
Publication of WO1998002797A1 publication Critical patent/WO1998002797A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3818 Decoding for concurrent execution
    • G06F9/382 Pipelined decoding, e.g. using predecoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/30149 Instruction analysis, e.g. decoding, instruction word fields of variable length instructions
    • G06F9/30152 Determining start or end of instruction; determining instruction length
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/3016 Decoding the operand specifier, e.g. specifier format
    • G06F9/30167 Decoding the operand specifier, e.g. specifier format of immediate specifier, e.g. constants
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3816 Instruction alignment, e.g. cache line crossing

Definitions

  • This invention relates to superscalar microprocessors and, more particularly, to the predecoding of variable byte-length computer instructions within high performance and high frequency superscalar microprocessors.
  • Superscalar microprocessors are capable of attaining performance characteristics which surpass those of conventional scalar processors by allowing the concurrent execution of multiple instructions. Due to the widespread acceptance of the x86 family of microprocessors, efforts have been undertaken by microprocessor manufacturers to develop superscalar microprocessors which execute x86 instructions. Such superscalar microprocessors achieve relatively high performance characteristics while advantageously maintaining backwards compatibility with the vast amount of existing software developed for previous microprocessor generations such as the 8086, 80286, 80386, and 80486.
  • The x86 instruction set is relatively complex and is characterized by a plurality of variable byte-length instructions.
  • A generic format illustrative of the x86 instruction set is shown in Figure 1A.
  • An x86 instruction consists of from one to five optional prefix bytes 102 followed by an operation code (opcode) field 104.
  • The opcode field 104 defines the basic operation for a particular instruction.
  • The default operation of a particular opcode may be modified by one or more prefix bytes.
  • A prefix byte may be used to change the address or operand size for an instruction, to override the default segment used in memory addressing, or to instruct the processor to repeat a string operation a number of times.
  • The opcode field 104 follows the prefix bytes 102, if any, and may be one or two bytes in length.
  • The addressing mode (MODRM) byte 106 specifies the registers used as well as memory addressing modes.
  • The scale-index-base (SIB) byte 108 is used only in 32-bit base-relative addressing using scale and index factors.
  • A base field of the SIB byte specifies which register contains the base value for the address calculation, and an index field specifies which register contains the index value.
  • A scale field specifies the power of two by which the index value will be multiplied before being added, along with any displacement, to the base value.
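  • The SIB address calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the patent: the bit positions (scale in bits 7-6, index in bits 5-3, base in bits 2-0) follow the standard x86 encoding, which the text here does not spell out, and the names `Sib`, `decode_sib`, and `effective_address` are hypothetical. The sketch ignores the special encodings that real x86 uses for "no index" and "no base".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sib:
    scale: int  # power of two applied to the index value (0..3)
    index: int  # number of the register holding the index value
    base: int   # number of the register holding the base value


def decode_sib(byte: int) -> Sib:
    """Split an SIB byte into scale, index, and base fields (standard x86 layout)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("an SIB byte must fit in 8 bits")
    return Sib(scale=(byte >> 6) & 0b11, index=(byte >> 3) & 0b111, base=byte & 0b111)


def effective_address(sib: Sib, regs: Sequence[int], displacement: int = 0) -> int:
    """Compute base + index * 2**scale + displacement, wrapped to 32 bits."""
    scaled_index = regs[sib.index] << sib.scale
    return (regs[sib.base] + scaled_index + displacement) & 0xFFFFFFFF
```

  • For example, an SIB byte of 0x98 decodes to scale 2, index register 3, and base register 0, so the index value is multiplied by four before the base and displacement are added.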
  • The next instruction field is the optional displacement field 110, which may be from one to four bytes in length.
  • The displacement field 110 contains a constant used in address calculations.
  • The optional immediate field 112, which may also be from one to four bytes in length, contains a constant used as an instruction operand.
  • The 80286 sets a maximum length for an instruction at 10 bytes, while the 80386 and 80486 both allow instruction lengths of up to 15 bytes.
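  • To make the variable length concrete, the following sketch adds up the field sizes listed above and enforces the 15-byte limit of the 80386 and 80486. The function name and parameter names are hypothetical, and the field ranges are taken directly from the description (real encodings restrict some of these sizes further).

```python
MAX_LENGTH_386 = 15  # 80386/80486 limit stated in the text


def instruction_length(prefixes: int, opcode: int, modrm: bool, sib: bool,
                       displacement: int, immediate: int) -> int:
    """Sum the byte counts of the generic x86 instruction fields."""
    if not 0 <= prefixes <= 5:
        raise ValueError("prefix bytes are optional, up to five")
    if opcode not in (1, 2):
        raise ValueError("the opcode field is one or two bytes")
    if not 0 <= displacement <= 4 or not 0 <= immediate <= 4:
        raise ValueError("displacement and immediate are optional, up to four bytes each")
    total = prefixes + opcode + int(modrm) + int(sib) + displacement + immediate
    if total > MAX_LENGTH_386:
        raise ValueError(f"{total} bytes exceeds the {MAX_LENGTH_386}-byte limit")
    return total
```

  • Because the field maxima sum to more than 15 bytes, the explicit limit check is needed; not every combination of maximal fields is a legal instruction.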
  • In Figure 1B, several different variable byte-length x86 instruction formats are shown.
  • The shortest x86 instruction is only one byte long, and comprises a single opcode byte as shown in format (a).
  • The byte containing the opcode field may also contain a register field, as shown in formats (b), (c) and (e).
  • Format (j) shows an instruction with two opcode bytes.
  • An optional MODRM byte follows opcode bytes in formats (d), (f), (h), and (j).
  • Immediate data follows opcode bytes in formats (e), (g), (i), and (k), and follows a MODRM byte in formats (f) and (h).
  • Figure 1C illustrates several possible addressing mode formats (a)-(h). Formats (c), (d), (e), (g), and (h) contain MODRM bytes with offset (i.e., displacement) information.
  • An SIB byte is used in formats (f), (g), and (
  • The complexity of the x86 instruction set poses difficulties in implementing high performance x86-compatible superscalar microprocessors.
  • One difficulty arises from the fact that instructions must be aligned with respect to the parallel-coupled instruction decoders of such processors before proper decode can be effectuated.
  • Because the x86 instruction set consists of variable byte-length instructions, the start bytes of successive instructions within a line are not necessarily equally spaced, and the number of instructions per line is not fixed. As a result, employment of simple fixed-length shifting logic cannot in itself solve the problem of instruction alignment.
  • Superscalar microprocessors have been proposed that employ instruction predecoding techniques to help solve the problem of quickly aligning, decoding and executing a plurality of variable byte-length instructions in parallel.
  • When instructions are written into the instruction cache from an external main memory, a predecoder appends several predecode bits (referred to collectively as a predecode tag) to each byte. These bits indicate whether the byte is the start and/or end byte of an x86 instruction, the number of microinstructions required to implement the x86 instruction, and the location of opcodes and prefixes.
  • The superscalar microprocessor converts each instruction to one or more microinstructions referred to as ROPs.
  • The ROPs are similar to RISC instructions in that they are associated with a fixed length and with simple, consistent encodings. Since the x86 instructions in the instruction cache are already tagged with predecode bits indicating where instructions start and end and how many
  • A predecode unit is provided which is capable of predecoding variable byte-length instructions prior to their storage within an instruction cache.
  • The predecode unit is configured to generate a plurality of predecode bits for each instruction byte.
  • The plurality of predecode bits associated with each instruction byte are collectively referred to as a predecode tag.
  • An instruction alignment unit then uses the predecode tags to dispatch the variable byte-length instructions to a plurality of decode units which form fixed issue positions within the superscalar microprocessor.
  • The predecode unit generates three predecode bits associated with each byte of instruction code: a "start" bit, an "end" bit, and a "functional" bit.
  • The start bit is set if the associated byte is the first byte of the instruction.
  • The end bit is set if the byte is the last byte of the instruction.
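  • As an illustration of how the start and end bits alone mark instruction boundaries, the sketch below slices a line of instruction bytes into complete instructions. The function name `split_instructions` and the list-of-bits representation are assumptions made for the example; the hardware described here performs this selection with multiplexers rather than a sequential loop.

```python
from typing import List, Sequence


def split_instructions(line: bytes, start_bits: Sequence[int],
                       end_bits: Sequence[int]) -> List[bytes]:
    """Return the complete instructions in a line, using only start and end bits."""
    if not len(line) == len(start_bits) == len(end_bits):
        raise ValueError("one start bit and one end bit are required per byte")
    instructions = []
    begin = None  # index of the most recent start byte not yet closed
    for i, (start, end) in enumerate(zip(start_bits, end_bits)):
        if start:
            begin = i
        if end and begin is not None:
            instructions.append(line[begin:i + 1])
            begin = None
    # An instruction whose end byte lies in the next line is left out here.
    return instructions
```

  • A one-byte instruction has both bits set on the same byte, which is why the start check runs before the end check.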
  • The predecode unit is configured such that the meaning conveyed by or associated with the functional bit is dependent both upon its state (i.e., whether the functional bit is set or not) and upon the state of the start bit for that byte.
  • The meaning of the functional bit may further be dependent upon the status of the start bit of a previous instruction byte.
  • If the start bit for a byte is set, the functional bit indicates whether the instruction is a directly decodeable "fast path" instruction or is an MROM instruction (i.e., an instruction to be serialized through microcode).
  • If the start bit for the byte is cleared and the byte immediately follows a start byte, the functional bit indicates whether the opcode is the first byte of the instruction or whether a prefix is the first byte of the instruction. If the start bit for the byte is cleared and the byte does not follow a start byte, the functional bit indicates whether the associated byte is either a MODRM or an SIB byte, or is displacement or immediate data.
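  • The context-dependent meaning of the functional bit can be summarized as a small decision function. This is a sketch only: the polarity of the first two cases follows the detailed description (set means MROM for a start byte; set means a prefix is the first byte for the byte after a start byte), while the polarity of the third case is not given in the text and is an assumption here. The function name is hypothetical.

```python
def functional_bit_meaning(start: bool, prev_start: bool, functional: bool) -> str:
    """Interpret a functional bit from the byte's start bit and the previous byte's start bit."""
    if start:
        # First byte of an instruction: fast path versus MROM.
        return "MROM instruction" if functional else "fast path instruction"
    if prev_start:
        # Second byte of an instruction: tells where the opcode is.
        return "prefix is first byte" if functional else "opcode is first byte"
    # Later bytes. ASSUMPTION: set means MODRM/SIB; the source gives no polarity.
    return "MODRM or SIB byte" if functional else "displacement or immediate data"
```

  • One bit thus carries three different kinds of information, which is how the tag stays at three bits per byte.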
  • The instruction alignment unit may be implemented with a relatively small number of cascaded levels of logic gates, thus accommodating very high frequencies of operation. Instruction alignment to decode units may further be accomplished with relatively few pipeline stages.
  • The plurality of decode units to which the variable byte-length instructions are aligned utilize the predecode tags to attain relatively fast decoding of the instructions.
  • Since the predecode unit is configured such that the meaning of the functional bit of a particular predecode tag is dependent upon the status of the start bit, a relatively large amount of predecode information may be conveyed with a relatively small number of predecode bits. This thereby allows a reduction in the size of the instruction cache without compromising performance.
  • The decode units know the exact locations of the opcode, displacement, immediate, register, and scale-index bytes. Accordingly, no serial scan by the decode units through the instruction bytes is needed.
  • The functional bits allow the decode units to calculate the 8-bit linear addresses (via adder circuits) expeditiously for use by other subunits within the superscalar microprocessor. Accordingly, relatively fast decoding may be attained, and high performance may be accommodated.
  • The present invention contemplates a method for predecoding variable byte-length instructions within a superscalar microprocessor comprising the steps of generating a start bit indicative of whether a byte of an instruction is a start byte, generating an end bit indicative of whether said byte of said instruction is an end byte, and generating a functional bit that conveys a meaning dependent upon a value of
  • Figure 1A is a diagram which illustrates the generic x86 instruction set format.
  • Figure 1B is a diagram which illustrates several different variable byte-length x86 instruction formats.
  • Figure 1C is a diagram which illustrates several possible x86 addressing mode formats.
  • Figure 2 is a block diagram of a superscalar microprocessor which includes an instruction alignment unit to forward multiple instructions to six decode units.
  • Figure 3 is a block diagram of the instruction alignment unit and six decode units.
  • Figures 4A-4C are block diagrams which depict execution of an MROM instruction. While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
  • A superscalar microprocessor 200 including a predecode unit 202 which operates in accordance with a method of the present invention is shown. As illustrated in the embodiment of Figure 2, superscalar microprocessor 200 includes a predecode unit 202 and a branch prediction unit 220 coupled to an instruction cache 204. A prefetch unit 203 is coupled to
  • An instruction alignment unit 206 is coupled between instruction cache 204 and a plurality of decode units 208A-208F (referred to collectively as decode units 208). Each decode unit 208A-208F is coupled to a respective reservation station 210A-210F (referred to collectively as reservation stations 210), and each reservation station 210A-210F is coupled to a respective functional unit 212A-212F (referred to collectively as functional units 212). Decode units 208, reservation stations 210,
  • A data cache 224 is finally shown coupled to load/store unit 222, and an MROM unit 209 is shown coupled to instruction alignment unit 206.
  • Instruction cache 204 is a high speed cache memory provided to temporarily store instructions prior to their dispatch to decode units 208.
  • Instruction cache 204 is configured to cache up to 32 kilobytes of instruction code organized in lines of 16 bytes each (where each byte consists of 8 bits).
  • Instruction code is provided to instruction cache 204 by prefetching code from a main memory (not shown) through prefetch unit 203.
  • For each byte of instruction code, instruction cache 204 further stores a predecode tag associated therewith.
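  • The storage cost of the predecode tags follows directly from the cache geometry given above. The short calculation below assumes 1 kilobyte = 1024 bytes and a tag of three bits per byte (start, end, functional); the 5-bit comparison is a purely hypothetical wider tag, included only to show why a smaller tag reduces cache size.

```python
CACHE_BYTES = 32 * 1024   # 32 kilobytes of instruction code
LINE_BYTES = 16           # bytes per cache line
TAG_BITS_PER_BYTE = 3     # start, end, and functional bits

NUM_LINES = CACHE_BYTES // LINE_BYTES


def tag_storage_bytes(bits_per_byte: int) -> int:
    """Bytes of storage needed to hold one predecode tag per instruction byte."""
    return CACHE_BYTES * bits_per_byte // 8


three_bit_cost = tag_storage_bytes(TAG_BITS_PER_BYTE)  # tags as described here
five_bit_cost = tag_storage_bytes(5)                   # hypothetical wider tag
```

  • Under these assumptions the cache holds 2048 lines, and the three-bit tags add 12 kilobytes of storage on top of the 32 kilobytes of instruction bytes.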
  • Instruction cache 204 could be implemented in a set-associative, a fully-associative, or a direct-mapped configuration.
  • Prefetch unit 203 is provided to prefetch instruction code from the main memory for storage within instruction cache 204. In one embodiment, prefetch unit 203 is configured to burst 64-bit wide code from the main memory into instruction cache 204. It is understood that a variety of specific code prefetching techniques and algorithms may be employed by prefetch unit 203.
  • As prefetch unit 203 fetches instructions from the main memory, predecode unit 202 generates three predecode bits associated with each byte of instruction code: a "start" bit, an "end" bit, and a "functional" bit.
  • The start bit as well as the end bit of each byte are indicative of the boundaries of an instruction.
  • The functional bit of each byte conveys additional information regarding the byte or the instruction, such as whether the instruction can be decoded directly by decode units 208 or whether the instruction must be executed by invoking a microcode procedure controlled by MROM unit 209 (as will be described in greater detail below), whether the byte is a MODRM or SIB byte, or whether the byte is displacement or immediate data.
  • The functional bit may further be employed to indicate the location of an opcode byte. It will be appreciated from the following that the encoded meaning of the functional bit of a particular instruction byte is dependent upon the associated start bit.
  • Table 1 indicates one encoding of the predecode tags as implemented by predecode unit 202. As indicated within the table, if a given byte is the first byte of an instruction, the start bit for that byte is set by predecode unit 202 as the byte is fetched from main memory and stored within instruction cache 204. If the byte is the last byte of an instruction, the end bit for that byte is set. If a particular instruction cannot be directly decoded by the decode units 208, the functional bit associated with the first byte of the instruction is set. On the other hand, if the instruction can be directly decoded by the decode units 208, the functional bit associated with the first byte of the instruction is cleared.
  • The functional bit for the second byte of a particular instruction is cleared if the opcode is the first byte, and is set if the opcode is the second byte. It is noted that in situations where the opcode is the second byte, the first byte is a prefix byte.
  • The functional bit values for instruction byte numbers 3-8 indicate whether the byte is a MODRM or an SIB byte, as well as whether the byte contains displacement or immediate data.
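  • The encoding rules above can be sketched as a tag generator for a single instruction. This is an illustrative reading of the description, not the patented circuit: the function name and parameters are hypothetical, the caller is assumed to already know the instruction's structure, and the polarity for the third and later bytes (set for MODRM/SIB, cleared for displacement or immediate data) is an assumption because Table 1 is not reproduced here.

```python
from typing import AbstractSet, List, Tuple

Tag = Tuple[int, int, int]  # (start, end, functional)


def predecode_tags(length: int, is_mrom: bool, prefix_first: bool = False,
                   modrm_sib_offsets: AbstractSet[int] = frozenset()) -> List[Tag]:
    """Build one (start, end, functional) tag per byte of a single instruction."""
    if length < 1:
        raise ValueError("an instruction has at least one byte")
    tags = []
    for i in range(length):
        start = i == 0
        end = i == length - 1
        if i == 0:
            functional = is_mrom          # set: MROM, cleared: directly decodable
        elif i == 1:
            functional = prefix_first     # set: opcode is the second byte
        else:
            # ASSUMPTION: set marks a MODRM or SIB byte, cleared marks data bytes.
            functional = i in modrm_sib_offsets
        tags.append((int(start), int(end), int(functional)))
    return tags
```

  • For a four-byte fast path instruction made of a prefix, an opcode, a MODRM byte, and one immediate byte, `predecode_tags(4, False, prefix_first=True, modrm_sib_offsets={2})` yields `[(1, 0, 0), (0, 0, 1), (0, 0, 1), (0, 1, 0)]`.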
  • The predecode unit 202 of superscalar microprocessor 200 is configured to generate a functional bit for each byte of instruction code.
  • The meaning of the functional bit is dependent upon the value of the start bit associated with that byte.
  • The meaning of the functional bit is further dependent upon the value of the start bit associated with a previous instruction byte.
  • The functional bit indicates whether the instruction is a directly decodeable instruction or an MROM instruction (described
  • the start bit for that byte is set. If the start bit associated with a particular byte of instruction code is cleared and the byte immediately follows a byte of instruction code in which the start bit was set, the functional bit indicates whether the opcode is the first byte or whether a prefix is the first byte. Still further, if the start bit for a byte of instruction code is cleared and the previous byte's start bit was also cleared, the functional bit indicates whether the byte is a MODRM or SIB byte, or whether the byte is displacement or
  • A predecode tag is generated which is associated with each byte of instruction code.
  • Predecode tags and the instruction code are stored within instruction cache 204 for subsequent processing by the superscalar microprocessor. Since the meaning of the functional bit is dependent upon the start bit of a particular byte and upon the start bits of previous bytes, a relatively large amount of predecode information can be conveyed to the instruction alignment unit 206 and to decode units 208 to attain relatively fast alignment and decode of instructions. Since the number of bits required within the predecode tag is
  • the required size of the instruction cache 204 may be reduced without compromising performance.
  • The decode units know the exact locations of the opcode, displacement, immediate, register, and scale-index bytes. Accordingly, no serial scan by the decode units through the instruction bytes is needed.
  • The functional bits allow the decode units to calculate the 8-bit linear addresses (via adder circuits) expeditiously for use by other subunits within the superscalar microprocessor. Accordingly, relatively fast decoding may be attained, and high performance may be accommodated.
  • Certain instructions within the x86 instruction set may be directly decoded by decode unit 208. These instructions are referred to as “fast path” instructions.
  • The remaining instructions of the x86 instruction set are referred to as “MROM instructions”. MROM instructions are executed by invoking MROM unit 209. When an MROM instruction is encountered, MROM unit 209 parses and serializes the instruction into a subset of defined fast path instructions to effectuate a desired operation.
  • A listing of exemplary x86 instructions categorized as fast path instructions, as well as a description of the manner of handling both fast path and MROM instructions, will be provided further below.
  • Instruction alignment unit 206 is provided to channel or "funnel" variable byte-length instructions from instruction cache 204 to fixed issue positions formed by decode units 208A-208F. As will be described in conjunction with Figures 3-5, instruction alignment unit 206 is configured to channel instruction code to designated decode units 208A-208F depending upon the locations of the start bytes of instructions within a line as delineated by instruction cache 204.
  • The particular decode unit 208A-208F to which a given instruction may be dispatched is dependent upon both the location of the start byte of that instruction as well as the location of the previous instruction's start byte, if any. Instructions starting at certain byte locations may further be restricted for issue to only one predetermined issue position. Specific details follow.
  • Each of the decode units 208 includes decoding circuitry for decoding the predetermined fast path instructions referred to above.
  • Each decode unit 208A-208F routes displacement and immediate data to a corresponding reservation station unit 210A-210F.
  • Output signals from the decode units 208 include bit-encoded execution instructions for the functional units 212 as well as operand address information, immediate data and/or displacement data.
  • The superscalar microprocessor of Figure 2 supports out of order execution, and thus employs reorder buffer 216 to keep track of the original program sequence for register read and write operations, to implement register renaming, to allow for speculative instruction execution and branch misprediction recovery, and to facilitate precise exceptions.
  • As will be appreciated by those of skill in the art, reorder buffer 216 may be implemented in a first-in-first-out configuration wherein speculative results move to the "bottom" of the buffer as they are validated and written to the register file, thus making room for new entries at the "top" of the buffer.
  • Other specific configurations of reorder buffer 216 are also possible, as will be described further below. If a branch prediction is incorrect, the results of speculatively-executed instructions along the mispredicted path can be invalidated in the buffer before they are written to register file 218.
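  • A first-in-first-out reorder buffer of the kind described can be modeled with a double-ended queue. The sketch below is a simplified software analogy under stated assumptions: the class and method names are hypothetical, tags are plain increasing integers, and a misprediction is handled by discarding every entry allocated after a given tag.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RobEntry:
    tag: int
    dest_reg: str
    value: Optional[int] = None  # None until a functional unit produces the result


class ReorderBuffer:
    def __init__(self) -> None:
        self.entries = deque()  # left end is the "bottom" (oldest entry)
        self._next_tag = 0

    def allocate(self, dest_reg: str) -> int:
        """Reserve a location at the top of the buffer and return its tag."""
        tag = self._next_tag
        self._next_tag += 1
        self.entries.append(RobEntry(tag, dest_reg))
        return tag

    def complete(self, tag: int, value: int) -> None:
        """Record a speculative result for the entry with the given tag."""
        for entry in self.entries:
            if entry.tag == tag:
                entry.value = value
                return
        raise KeyError(tag)

    def retire(self, register_file: Dict[str, int]) -> int:
        """Write finished results to the register file in program order."""
        retired = 0
        while self.entries and self.entries[0].value is not None:
            entry = self.entries.popleft()
            register_file[entry.dest_reg] = entry.value
            retired += 1
        return retired

    def flush_after(self, tag: int) -> None:
        """Discard every entry allocated after the given tag (mispredicted path)."""
        while self.entries and self.entries[-1].tag > tag:
            self.entries.pop()
```

  • Retirement stops at the first unfinished entry, so a younger result that completes early stays in the buffer until every older instruction has written back.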
  • Each reservation station unit 210A-210F is capable of holding instruction information (i.e., bit-encoded execution bits as well as operand values, operand tags and/or immediate data) for up to three pending instructions awaiting issue to the corresponding functional unit. It is noted that for the embodiment of Figure 2, each decode unit 208A-208F is associated with a dedicated reservation station unit 210A-210F, and that each reservation station unit 210A-210F is similarly associated with a dedicated functional unit 212A-212F.
  • Six dedicated "issue positions" are formed by decode units 208, reservation station units 210 and functional units 212. Instructions aligned and dispatched to issue position 0 through decode unit 208A are passed to reservation station unit 210A and subsequently to functional unit 212A for execution. Similarly, instructions aligned and dispatched to decode unit 208B are passed to reservation station unit 210B and into functional unit 212B, and so on.
  • Register address information is routed to reorder buffer 216 and register file 218 simultaneously.
  • The x86 register file includes eight 32-bit real registers (i.e., typically referred to as EAX, EBX, ECX, EDX, EBP, ESI,
  • Reorder buffer 216 contains temporary storage locations for results which change the contents of these registers to thereby allow out of order execution.
  • A temporary storage location of reorder buffer 216 is reserved for each instruction which, upon decode, modifies the contents of one of the real registers. Therefore, at various points during execution of a particular program, reorder buffer 216 may have one or more locations which contain the speculatively executed contents of a given register. If following decode of a given instruction it is determined that reorder buffer 216 has previous location(s) assigned to a register used as an operand in the given instruction, the reorder buffer 216 forwards to the corresponding reservation station either: 1) the value in the most recently assigned location, or 2) a tag for the most recently assigned location if the value has not yet been produced by the functional unit that will eventually execute the previous instruction.
  • The operand value (or tag) is provided from reorder buffer 216 rather than from register file 218. If there is no location reserved for a required register in reorder buffer 216, the value is taken directly from register file 218. If the operand corresponds to a memory location, the operand value is provided to the reservation station unit through load/store unit 222.
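  • The operand forwarding rule above reduces to a search from the newest reorder buffer entry to the oldest. The following sketch uses a plain list of `(tag, dest_reg, value)` tuples as a stand-in for the reorder buffer; that representation and the function name `lookup_operand` are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

RobEntry = Tuple[int, str, Optional[int]]  # (tag, destination register, value or None)


def lookup_operand(reg: str, rob_entries: List[RobEntry],
                   register_file: Dict[str, int]) -> Tuple[str, int]:
    """Return ("value", v) or ("tag", t) for a register operand.

    rob_entries is ordered oldest first; the most recently assigned
    location for the register takes priority.
    """
    for tag, dest_reg, value in reversed(rob_entries):
        if dest_reg == reg:
            if value is not None:
                return ("value", value)  # result already produced
            return ("tag", tag)          # result still pending
    # No location reserved in the reorder buffer: read the register file.
    return ("value", register_file[reg])
```

  • Searching newest first matters when several in-flight instructions write the same register: only the latest writer supplies the operand.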
  • Reservation station units 210A-210F are provided to temporarily store instruction information to be speculatively executed by the corresponding functional units 212A-212F. As stated previously, each reservation station unit 210A-210F may store instruction information for up to three pending instructions. Each of the six reservation stations 210A-210F contains locations to store bit-encoded execution instructions to be speculatively executed by the corresponding functional unit and the values of operands. If a particular operand is not available, a tag for that operand is provided from reorder buffer 216 and is stored within the corresponding reservation station until the result has been generated (i.e., by completion of the execution of a previous instruction). It is noted that when an instruction is executed by one of the functional units 212A-
  • Each of the functional units 212 is configured to perform integer arithmetic operations of addition and subtraction, as well as shifts, rotates, logical operations, and branch operations. It is noted that a floating point unit (not shown) may also be employed to accommodate floating point operations.
  • Each of the functional units 212 also provides information regarding the execution of conditional branch instructions to the branch prediction unit 220. If a branch prediction was incorrect, branch prediction unit 220 flushes instructions subsequent to the mispredicted branch that have entered the instruction processing pipeline, and causes prefetch/predecode unit 202 to fetch the required instructions from instruction cache 204 or main memory. It is noted that in such situations, results of instructions in the original program sequence which occur after the mispredicted branch instruction are discarded, including those which were speculatively executed and temporarily stored in load/store unit 222 and reorder buffer 216. Exemplary configurations of suitable branch prediction mechanisms are well known.
  • Results produced by functional units 212 are sent to the reorder buffer 216 if a register value is being updated, and to the load/store unit 222 if the contents of a memory location are changed. If the result is to be stored in a register, the reorder buffer 216 stores the result in the location reserved for the value of the register when the instruction was decoded. As stated previously, results are also broadcast to reservation station units 210A-210F where pending instructions may be waiting for the results of previous instruction executions to obtain the required operand values. Generally speaking, load/store unit 222 provides an interface between functional units 212A-212F and data cache 224. In one embodiment, load/store unit 222 is configured with a store buffer with eight storage locations for data and address information for pending loads or stores.
  • Functional units 212 arbitrate for access to the load/store unit 222. When the buffer is full, a functional unit must wait until the load/store unit 222 has room for the pending load or store request information.
  • The load/store unit 222 also performs dependency checking for load instructions against pending store instructions to ensure that data coherency is maintained.
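  • The eight-entry buffer and the load-versus-store dependency check can be sketched as follows. This is a loose software model, not the described hardware: the class and method names are hypothetical, and the check compares exact addresses only, whereas a real design would have to consider overlapping accesses of different sizes.

```python
from collections import deque
from typing import Optional


class LoadStoreBuffer:
    CAPACITY = 8  # storage locations in the described embodiment

    def __init__(self) -> None:
        self.pending = deque()  # entries are (kind, address, data)

    def try_enqueue(self, kind: str, address: int, data: Optional[int] = None) -> bool:
        """Accept a pending load or store, or return False if the buffer is full."""
        if kind not in ("load", "store"):
            raise ValueError("kind must be 'load' or 'store'")
        if len(self.pending) >= self.CAPACITY:
            return False  # the requesting functional unit must wait
        self.pending.append((kind, address, data))
        return True

    def load_conflicts(self, address: int) -> bool:
        """True if a pending store targets the address a load wants to read.

        ASSUMPTION: exact address match only.
        """
        return any(kind == "store" and addr == address
                   for kind, addr, _ in self.pending)
```

  • A load that conflicts with a pending store must not read stale data from the data cache; how the hardware resolves the conflict is not specified in this passage, so the sketch only detects it.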
  • Data cache 224 is a high speed cache memory provided to temporarily store data being transferred between load/store unit 222 and the main memory subsystem.
  • Data cache 224 has a capacity of storing up to eight kilobytes of data. It is understood that data cache 224 may be implemented in a variety of specific memory configurations, including a set associative configuration.
  • Figure 3 is a block diagram which depicts internal portions of one embodiment of instruction alignment unit 206 as well as internal portions of decode units 208A-208F with respect to a line of instruction code to be provided from instruction cache 204.
  • Instruction alignment unit 206 is configured to channel variable byte-length instructions (in this case certain x86 instructions referred to as fast path instructions) to decode units 208A-208F.
  • A latching unit 302 is incorporated as a portion of an output buffer section 301 of instruction cache 204.
  • Latching unit 302 is capable of storing a line of instruction code provided from a storage array (not shown in Figure 3) of instruction cache 204 prior to being dispatched to decode units 208.
  • The instruction alignment unit 206 of Figure 3 includes a plurality of multiplexer circuits referred to as multiplexer channels 304A-304G coupled between latching unit 302 and decode units 208.
  • A multiplexer control circuit 306 is further shown coupled to each multiplexer channel 304A-304G.
  • Each decode unit 208A-208F includes an associated instruction decoder 318A-318F having an input port coupled to a respective multiplexer channel 304A-304F.
  • Each decode unit 208A-208F further includes a respective displacement/immediate data buffer 330A-330F and a respective instruction issue unit 340A-340F.
  • A line of instruction code to be executed is provided to latching unit 302 from the storage array of instruction cache 204.
  • Each byte of instruction code within instruction cache 204 is associated with a corresponding predecode tag including a start bit, an end bit, and a functional bit.
  • The predecode tag associated with each byte is provided to an input of multiplexer control circuit 306.
  • Multiplexer control circuit 306 controls multiplexer channels 304A-304G such that the instruction bytes are selectively routed to designated instruction decoders 318A-318F.
  • Instruction paths formed by decode units 208A-208F are referred to as issue positions.
  • The channeling of instruction code through multiplexer channels 304A-304G is dependent upon the location of the start byte associated with each instruction relative to each line as delineated by latching unit 302.
  • Each of the first six multiplexer channels 304A-304F routes four contiguous bytes of instruction code from latching unit 302 to a respective instruction decoder 318A-318F.
  • Multiplexer channel 304G is capable of channeling up to three contiguous bytes of instruction code to instruction decoder 318.
  • Table 2 below illustrates the possible multiplexer channels 304A-304G through which start bytes may be channeled. As stated previously, the channeling of instruction code is dependent upon the location(s) of start bytes within a given line. It is noted that each multiplexer channel 304A-304F is configured to route the lowest-order start byte among those allocated to it, provided the start byte has not been selected for routing by a lower order multiplexer channel.
  • multiplexer channel 304A is capable of routing start bytes located at byte positions 0-2 to decode unit 318A.
  • Multiplexer channel 304B is capable of routing start bytes at byte positions 1-4 to decode unit 318B.
  • Multiplexer channel 304C is capable of transferring start bytes at byte positions 3-8 to decode unit 208C.
  • multiplexer channel 304D is capable of transferring start bytes at byte positions 6-10 to decode unit 208D.
  • multiplexer channel 304E is capable of transferring start bytes at byte positions 9-12 to decode unit 208E.
  • multiplexer channel 304F is capable of transferring start bytes at byte positions 12-15 to decode unit 318F.
  • Start bytes located at byte positions 13-15 may alternatively be routed through multiplexer channel 304G to a seventh issue position which is employed to wrap bytes of an incomplete instruction (i.e., an instruction which extends into the next line) to the next cache line for decode.
  • instruction bytes routed through multiplexer channel 304G are provided to instruction decoder 318A upon the next clock cycle when the remaining bytes of that instruction are available within latching unit 302.
  • the dispatch of the instruction to a designated position is dependent upon the nature of the remaining bytes of the instruction that appear on the next line.
  • that immediate or displacement data is provided to displacement/immediate data buffer 330F through multiplexer channel 304A.
  • the preceding bytes of that instruction (which appear on the preceding cache line) will have been dispatched to instruction decoder 318F during the preceding clock cycle.
  • the instruction information from the previous line is routed through multiplexer channel 304G to instruction decoder 318A, and is merged with the rest of the instruction code during the next clock cycle.
  • the number of cascaded levels of logic required to implement the instruction alignment unit 206 may be advantageously reduced. Furthermore, by restricting the dispatch of an instruction having a start byte which resides at one of a select subset of byte locations within a line to a single issue position (i.e., byte positions 5 and 11), the number of cascaded levels of logic for instruction alignment may be reduced even further. Accordingly, the instruction alignment unit 206 as described above allows the implementation of a superscalar microprocessor having a relatively small number of gates per
  • the defined fast path instructions may be up to eight bytes in length, and may include a single prefix byte. It is noted that by limiting the defined fast path instructions to only a single prefix byte, it is possible that bytes 4 through 7, if any,
  • the instruction decoder of the issue position (i.e., instruction decoder receiving the remaining bytes of the instruction) detects the absence of a start bit at its first-byte position, and accordingly passes the data to the displacement/immediate data buffer 330 of the preceding issue position and issues a NOOP instruction.
  • a start byte of an instruction is located at byte position 0 of latching unit 302.
  • byte is provided to decode unit 208A along with the next three contiguous bytes residing at byte positions 1, 2, and 3. If the next start byte resides at position 2 (i.e., first instruction was two bytes in length), bytes 2-5 are routed through multiplexer channel 304B to decode unit 208B. For the embodiment of Figure 3,
  • each instruction decoder 318A-318F is capable of decoding only one instruction at a time. Accordingly, although the start bytes of more than one instruction may be provided to, for example, instruction decoder 318A, only the first instruction is decoded. Bytes beyond the first end byte, corresponding to additional instructions within a given instruction decoder, are extraneous and are effectively ignored. It is noted that the multiplexer channels 304 of instruction alignment unit 206 could be alternatively configured such that only a single instruction (or portions thereof), in accordance with the instruction's start and end predecode bits, is channeled to a given instruction decoder 318.
  • multiplexer channel 304G routes the preceding portions of the instruction to instruction decoder 318A, in which case the next instruction (corresponding to the first start byte within latching unit 302 during the next clock cycle) will be routed through multiplexer channel 304B to instruction decoder 318B.
  • a sample sequence of x86 instructions is shown in Table 3 below. Instructions 1 through 7 in addition to the first byte of instruction 8 are shown within cache line 1. Cache line 2 begins with the second byte of instruction 8, and further includes instructions 9 through 16.
  • Table 4 illustrates the manner in which the above sequence of instructions in Table 3 are dispatched to the decode units 208A-208F by instruction alignment unit 206.
  • Instructions 1-5 are dispatched to issue positions 0-4 corresponding to decode units 318A-318E, respectively, during a first clock cycle.
  • Instruction 6, which begins at byte position 11 of latching unit 302, can only be channeled to issue position 4 corresponding to decode unit 318E. However, since issue position 4 is already occupied by instruction 5, instruction 6 cannot be dispatched during this cycle. Accordingly, multiplexer control circuit 306 causes decode unit 318F to issue a NOOP (no operation) instruction during the decode stage when instructions 1-4 are decoded.
  • multiplexer control circuit 306 causes decode units 318A-318D to issue NOOP instructions. Since instruction 8 wraps around to the next cache line, the first byte of the instruction is wrapped around to instruction decoder 318A during the next clock cycle through multiplexer channel 304G.
  • instruction 8 is dispatched to issue position 0. It is noted that the first byte of instruction 8 is wrapped around from byte position 15 of the previous line. Instructions 9 and 10 are further dispatched to issue positions 1 and 2 through multiplexer channels 304B and 304C, respectively. Upon decode of instructions 8-10, instruction issue units 340D-E cause NOOP instructions to be issued.
  • Instructions 11 and 12 are dispatched to issue positions 2 and 3 during clock cycle 4.
  • Instruction 13 begins in byte 7, and cannot be routed to issue position 4. Therefore, the dispatch of instruction 13 must be held until the next clock cycle.
  • predecode unit 202 is configured such that when a predesignated MROM instruction is encountered, the functional bit associated with the first byte of the instruction is set. This condition is readily detectable by MROM unit 209 to effectuate serialization of the instruction, as will be described further below.
  • MROM unit 209 provides a series of fast path instructions to the decode units 208 through instruction alignment unit 206 in accordance with the microcode for that particular MROM instruction. Once all of the microcoded instructions have been dispatched to decode units 208 through alignment unit 206 to effectuate the desired MROM operation, the instructions which followed the MROM instruction are allowed to be dispatched.
  • Table 5 illustrates a sample of x86 assembly language code segment containing an MROM instruction (REP MOVSB).
  • Figures 4A-4C are block diagrams of portions of superscalar processor 200 depicting the dispatch and decode of the instructions of Table 5 during consecutive clock cycles.
  • the first two instructions (MOVE CX, S_LEN and CLD) are routed through multiplexer channels 304A and 304B to issue positions 0 and 1 (i.e., decode units 318A and 318B).
  • Upon decode, MROM unit 209 further causes decode units 208C-208F to issue NOOP instructions.
  • Microcoded instructions that effectuate the REP MOVSB instruction are dispatched during cycles 2 through N, as depicted by Figure 4B. During these cycles, a set of fast path instructions in accordance with the microcode stored in MROM unit 209 is dispatched through the instruction alignment unit 206 to decode units 208A-208F. It is noted that this MROM sequence may take several cycles to complete.
  • MROM unit 209 causes decode units 208A-208C to issue NOOP instructions.
  • Figures 2-4 is configured to selectively route instructions to the specific issue positions indicated by Table 2, other configurations are also possible. That is, the specific issue position or positions to which a given instruction within a line of memory is dispatched may be varied from that described above. It is further specifically contemplated that the number of issue positions provided within a superscalar microprocessor employing a decode unit in accordance with the invention may also vary.
  • Other configurations of an instruction alignment unit for providing instructions to the parallel decode units are also possible, and other configurations of the decode units are possible.
  • predecode unit 202 may vary from that indicated in Table 1.
  • the specific meanings conveyed by a particular combination of the values of the start bit and functional bit of a particular byte of instruction code may be different from the specific meaning indicated within Table 1.
  • the instruction alignment unit 206 and decode units 208 in the embodiment described above are configured to directly transfer and decode certain raw x86 instructions (i.e., fast path instructions)
  • implementations of a superscalar microprocessor are also possible wherein an instruction alignment unit is configured to translate a raw x86 instruction into one or more fixed length instructions, such as ROPs. In such a configuration, a plurality of decode units would be configured to receive and decode the translated instructions.
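
As an illustration of the predecode tags described above, the following Python sketch attaches a start bit, an end bit, and a functional bit to each instruction byte and then recovers the start-byte positions that a multiplexer control circuit would consume. The names PredecodeTag, predecode, start_positions, and mrom_positions are hypothetical. Only the one functional-bit meaning stated in the text (the bit is set on the first byte of an MROM instruction) is modeled; the remaining meanings are defined by Table 1, which is not reproduced here, so the sketch leaves the functional bit clear on all other bytes as a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredecodeTag:
    """Per-byte predecode tag (hypothetical model of the three bits)."""
    start: bool       # byte is the first byte of an instruction
    end: bool         # byte is the last byte of an instruction
    functional: bool  # here: only modeled on start bytes (MROM marker)


def predecode(lengths, mrom_flags):
    """Build one tag per byte from instruction lengths.

    lengths    -- instruction lengths in bytes, in program order
    mrom_flags -- True where the instruction is an MROM instruction
    """
    tags = []
    for length, is_mrom in zip(lengths, mrom_flags):
        for offset in range(length):
            tags.append(PredecodeTag(
                start=(offset == 0),
                end=(offset == length - 1),
                # Assumption: other functional-bit meanings are not modeled.
                functional=(is_mrom and offset == 0),
            ))
    return tags


def start_positions(tags):
    """Byte positions whose start bit is set."""
    return [i for i, tag in enumerate(tags) if tag.start]


def mrom_positions(tags):
    """Start bytes whose functional bit marks an MROM instruction."""
    return [i for i, tag in enumerate(tags) if tag.start and tag.functional]


# Three made-up instructions of 2, 3, and 1 bytes; the second is MROM.
tags = predecode([2, 3, 1], [False, True, False])
print(start_positions(tags))  # [0, 2, 5]
print(mrom_positions(tags))   # [2]
```

Because the boundaries are carried in the tags, locating every instruction in a line is a parallel per-byte test rather than a serial scan of the instruction bytes.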

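The routing of start bytes to issue positions described above can be sketched in Python as follows. The byte-position ranges are the ones stated in the text for multiplexer channels 304A-304F; the wrap-around channel 304G is omitted. The function names and the sample start-byte positions are hypothetical, and the in-order rule used here (stop at the first start byte that has no free channel, leaving it and all later instructions for the next cycle) is an assumption drawn from the instruction 6 example, not a transcription of the control logic.

```python
# Start-byte ranges per multiplexer channel, as stated in the text.
CHANNEL_RANGES = [
    ("304A", range(0, 3)),    # byte positions 0-2
    ("304B", range(1, 5)),    # byte positions 1-4
    ("304C", range(3, 9)),    # byte positions 3-8
    ("304D", range(6, 11)),   # byte positions 6-10
    ("304E", range(9, 13)),   # byte positions 9-12
    ("304F", range(12, 16)),  # byte positions 12-15
]


def channels_for(position):
    """Channels able to route a start byte at the given position."""
    return [name for name, span in CHANNEL_RANGES if position in span]


def dispatch_cycle(start_bytes):
    """Assign start bytes to channels for one clock cycle, in order.

    Returns (assignments, deferred): assignments is a list of
    (byte position, channel) pairs; deferred holds the start bytes
    that must wait for a later cycle.
    """
    pending = sorted(start_bytes)
    assignments = []
    next_channel = 0  # lowest channel index still free this cycle
    for i, position in enumerate(pending):
        chosen = None
        for idx in range(next_channel, len(CHANNEL_RANGES)):
            if position in CHANNEL_RANGES[idx][1]:
                chosen = idx
                break
        if chosen is None:
            # Assumed in-order dispatch: everything from here on waits.
            return assignments, pending[i:]
        assignments.append((position, CHANNEL_RANGES[chosen][0]))
        next_channel = chosen + 1
    return assignments, []


# Hypothetical line with start bytes at these positions.
first, deferred = dispatch_cycle([0, 2, 4, 7, 9, 11, 13])
second, _ = dispatch_cycle(deferred)
print(first)
print(second)
```

With these made-up positions, the start byte at position 11 can only use channel 304E, which is already taken in the first cycle, so it and the following instruction are deferred. This mirrors the behavior described for instruction 6, and it shows why positions 5 and 11 each map to a single issue position.
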
Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

A superscalar microprocessor is provided that includes a predecode unit configured to predecode variable byte-length instructions prior to their storage within an instruction cache. The predecode unit is configured to generate a plurality of predecode bits for each instruction byte. The plurality of predecode bits associated with each instruction byte are collectively referred to as a predecode tag. An instruction alignment unit then uses the predecode tags to dispatch the variable byte-length instructions simultaneously to a plurality of decode units which form fixed issue positions within the superscalar microprocessor. The information conveyed by the functional bits allows the decode units to detect the exact locations of the opcode, displacement, immediate, register, and scale-index bytes. Accordingly, no serial scan of the instruction bytes by the decode units is needed. In addition, the functional bits allow the decode units to calculate linear addresses quickly (via an adder circuit) for use by other subunits within the superscalar microprocessor. As a result, relatively fast decoding may be attained, and high performance may be achieved.
PCT/US1996/011757 1996-07-16 1996-07-16 Method and apparatus for predecoding variable byte-length instructions within a superscalar microprocessor WO1998002797A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP96925323A EP0912923A1 (fr) 1996-07-16 1996-07-16 Method and apparatus for predecoding variable byte-length instructions within a superscalar microprocessor
PCT/US1996/011757 WO1998002797A1 (fr) 1996-07-16 1996-07-16 Method and apparatus for predecoding variable byte-length instructions within a superscalar microprocessor
JP50595198A JP3732233B2 (ja) 1996-07-16 1996-07-16 Method and apparatus for predecoding variable byte-length instructions within a superscalar microprocessor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US1996/011757 WO1998002797A1 (fr) 1996-07-16 1996-07-16 Method and apparatus for predecoding variable byte-length instructions within a superscalar microprocessor

Publications (1)

Publication Number Publication Date
WO1998002797A1 true WO1998002797A1 (fr) 1998-01-22

Family

ID=22255460

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1996/011757 WO1998002797A1 (fr) 1996-07-16 1996-07-16 Method and apparatus for predecoding variable byte-length instructions within a superscalar microprocessor

Country Status (3)

Country Link
EP (1) EP0912923A1 (fr)
JP (1) JP3732233B2 (fr)
WO (1) WO1998002797A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014006918A (ja) * 2013-08-30 2014-01-16 Renesas Electronics Corp Data processor
US9116688B2 (en) 2008-09-09 2015-08-25 Renesas Electronics Corporation Executing prefix code to substitute fixed operand in subsequent fixed register instruction

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050138331A1 (en) * 2003-12-22 2005-06-23 Alberola Carl A. Direct memory access unit with instruction pre-decoder
US11204768B2 (en) 2019-11-06 2021-12-21 Onnivation Llc Instruction length based parallel instruction demarcator

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0498654A2 (fr) * 1991-02-08 1992-08-12 Fujitsu Limited Cache memory processing instruction data and data processor including the same
GB2263987A (en) * 1992-02-06 1993-08-11 Intel Corp End bit markers for instruction decode.
EP0651322A1 (fr) * 1993-10-29 1995-05-03 Advanced Micro Devices, Inc. Instruction caches for variable byte-length instructions
WO1996010783A1 (fr) * 1994-09-30 1996-04-11 Intel Corporation An instruction length decoder for variable length instructions

Also Published As

Publication number Publication date
EP0912923A1 (fr) 1999-05-06
JP3732233B2 (ja) 2006-01-05
JP2000515274A (ja) 2000-11-14

Similar Documents

Publication Publication Date Title
US5758114A (en) High speed instruction alignment unit for aligning variable byte-length instructions according to predecode information in a superscalar microprocessor
JP3794917B2 (ja) Branch selectors associated with byte ranges within an instruction cache for rapidly identifying branch predictions
US5748978A (en) Byte queue divided into multiple subqueues for optimizing instruction selection logic
US6049863A (en) Predecoding technique for indicating locations of opcode bytes in variable byte-length instructions within a superscalar microprocessor
US5600806A (en) Method and apparatus for aligning an instruction boundary in variable length macroinstructions with an instruction buffer
US5537629A (en) Decoder for single cycle decoding of single prefixes in variable length instructions
US5586277A (en) Method for parallel steering of fixed length fields containing a variable length instruction from an instruction buffer to parallel decoders
JP5424653B2 (ja) Instruction predecoding for multiple instruction sets
US5850532A (en) Invalid instruction scan unit for detecting invalid predecode data corresponding to instructions being fetched
JP3803723B2 (ja) Branch prediction mechanism employing branch selectors to select a branch prediction
US20060174089A1 (en) Method and apparatus for embedding wide instruction words in a fixed-length instruction set architecture
US5968163A (en) Microcode scan unit for scanning microcode instructions using predecode data
US5872947A (en) Instruction classification circuit configured to classify instructions into a plurality of instruction types prior to decoding said instructions
EP1049970B1 (fr) Branch prediction with return selection bits to categorize type of branch prediction
US5987235A (en) Method and apparatus for predecoding variable byte length instructions for fast scanning of instructions
US5835744A (en) Microprocessor configured to swap operands in order to minimize dependency checking logic
US5852727A (en) Instruction scanning unit for locating instructions via parallel scanning of start and end byte information
US5778246A (en) Method and apparatus for efficient propagation of attribute bits in an instruction decode pipeline
US5991869A (en) Superscalar microprocessor including a high speed instruction alignment unit
WO1998002797A1 (fr) Method and apparatus for predecoding variable byte-length instructions within a superscalar microprocessor
US5940602A (en) Method and apparatus for predecoding variable byte length instructions for scanning of a number of RISC operations
US5898851A (en) Method and apparatus for five bit predecoding variable length instructions for scanning of a number of RISC operations
EP0896700A1 (fr) Superscalar microprocessor including a high performance instruction alignment unit
EP0912925B1 (fr) A return address stack structure and a superscalar microprocessor employing same
KR100448676B1 (ko) Method and apparatus for predecoding variable byte-length instructions within a superscalar microprocessor

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 1996925323

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 1019997000338

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 1996925323

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1019997000338

Country of ref document: KR

WWG Wipo information: grant in national office

Ref document number: 1019997000338

Country of ref document: KR
