WO2019046723A1 - Implicit global pointer relative addressing for global memory access - Google Patents
- Publication number
- WO2019046723A1 (PCT/US2018/049099)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- instruction
- register
- instructions
- operand
- identify
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
- G06F9/3016—Decoding the operand specifier, e.g. specifier format
- G06F9/30163—Decoding the operand specifier, e.g. specifier format with implied specifier, e.g. top of stack
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/30101—Special purpose registers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/342—Extension of operand address space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/355—Indexed addressing
Definitions
- This application relates generally to memory access and more particularly to implicit global pointer relative addressing for global memory access.
- Example addressing modes include, but are not limited to, displacement addressing, program counter (PC) relative addressing, constant pool addressing, and global pointer (GP) relative addressing.
- PC program counter
- GP global pointer
- the first instruction may load the most significant 16 bits of the address into the base register and the second instruction may load data from, or store data to, the address generated from the combination of the base register and a specified offset representing the least significant bits of the address.
- ISAs and data processing apparatus and methods related thereto that comprise an instruction set that includes one or more instructions which implicitly identify the GP register as an operand (e.g., base register or source register) of the instruction.
- one or more bits of the instruction that were dedicated to explicitly identifying the operand can be used to extend the size of one or more other explicitly identified operands, such as the offset or immediate, to provide longer offsets/immediates.
- a method of decoding instructions comprising: receiving, at a decode unit, an instruction for execution by an execution unit of the data processing apparatus that specifies an operation to be performed, the received instruction being an instruction from an instruction set comprising one or more instructions that implicitly identify a global pointer (GP) register as an operand of the instruction, the GP register storing an address of global memory in which data is stored; decoding, at the decode unit, the received instruction to determine whether the received instruction is one of the one or more instructions that implicitly identify the GP register as an operand of the instruction; and outputting one or more control signals to cause the execution unit to perform the specified operation with the GP register as an operand when the determination is positive.
- Embodiments include a data processing apparatus comprising: a register file comprising a GP register, the GP register configured to store an address of global memory in which data is stored; an execution unit; and a decode unit configured to: receive an instruction for execution by the execution unit that specifies an operation to be performed, the received instruction being an instruction from an instruction set comprising one or more instructions that implicitly identify a GP register as an operand of the instruction; determine whether the received instruction is one of the one or more instructions that implicitly identify the GP register as an operand of the instruction; and output one or more control signals to cause the execution unit to perform the specified operation with the GP register as an operand when the determination is positive.
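The decode step described above can be illustrated with a minimal Python sketch. The opcode values, the field layout (6-bit opcode, 5-bit destination, then either a 5-bit base field plus 16-bit offset or a single 21-bit offset), and the `Control` tuple are all hypothetical and chosen only for illustration; the source does not fix them at this point.

```python
from typing import NamedTuple

GP_REG = 28            # assumption: GP lives in general-purpose register 28
OP_LW_GENERIC = 0x23   # hypothetical opcode: base register is an explicit field
OP_LW_GP = 0x11        # hypothetical opcode: GP is the implied base register


class Control(NamedTuple):
    """Control signals handed to the execution unit (illustrative only)."""
    operation: str
    base_reg: int
    dest_reg: int
    offset: int
    implicit_gp: bool


def decode(word: int) -> Control:
    """Decode one 32-bit instruction word under the hypothetical layout."""
    opcode = (word >> 26) & 0x3F
    dest = (word >> 21) & 0x1F
    if opcode == OP_LW_GP:
        # Positive determination: there is no base field, so the low
        # 21 bits all carry the offset and GP is used as the base.
        return Control("load_word", GP_REG, dest, word & 0x1FFFFF, True)
    if opcode == OP_LW_GENERIC:
        base = (word >> 16) & 0x1F
        return Control("load_word", base, dest, word & 0xFFFF, False)
    raise ValueError(f"unsupported opcode {opcode:#x}")


gp_load = (OP_LW_GP << 26) | (2 << 21) | 0x12345
generic_load = (OP_LW_GENERIC << 26) | (2 << 21) | (16 << 16) | 0x0020
print(decode(gp_load))
print(decode(generic_load))
```

In this sketch the opcode alone tells the decoder that GP is the base register, which is what frees the five base-register bits for a longer offset.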
- the data processing apparatus may be embodied in hardware on an integrated circuit. There may be provided a method of manufacturing, at an integrated circuit manufacturing system, the data processing apparatus. There may be provided an integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, configures the system to manufacture the data processing apparatus. There may be provided a non-transitory computer readable storage medium having stored thereon a computer readable description of a data processing apparatus that, when processed in an integrated circuit manufacturing system, causes the integrated circuit manufacturing system to manufacture an integrated circuit embodying the data processing apparatus.
- the data processing apparatus may be implemented as part of and/or referred to as a processor, a processor chip, a processor module, and so on.
- FIG. 1 is a schematic diagram of a known implementation of an ISA that uses displacement addressing
- FIG. 2 is a schematic diagram of a known implementation of an ISA that uses constant pool addressing
- FIG. 3 is a schematic diagram of a known implementation of an ISA that uses constant pool addressing with common base addresses;
- FIG. 4 is a schematic diagram of a known implementation of an ISA that uses GP relative addressing
- FIG. 5 is a block diagram of an example data processing apparatus configured to implement an ISA with an instruction set comprising one or more instructions that implicitly identify the GP register as an operand of the instruction;
- Fig. 6 is a schematic diagram illustrating use of the GP register of Fig. 5 for position independent code
- Fig. 7 is a schematic diagram illustrating use of the GP register of Fig. 5 for non-position-independent code
- Fig. 8 is a flow diagram of an example method of decoding instructions
- FIG. 9 is a schematic diagram of an example 32-bit format of a load byte instruction that implicitly identifies the GP register as the base register;
- Fig. 10 is a schematic diagram of an example 32-bit format of a load byte unsigned instruction that implicitly identifies the GP register as the base register;
- FIG. 11 is a schematic diagram of an example 32-bit format of a load half word instruction that implicitly identifies the GP register as the base register;
- Fig. 12 is a schematic diagram of an example 32-bit format of a load half word unsigned instruction that implicitly identifies the GP register as the base register;
- Fig. 13 is a schematic diagram of an example 32-bit format of a load word instruction that implicitly identifies the GP register as the base register;
- Fig. 14 is a schematic diagram of an example 32-bit format of a load double instruction that implicitly identifies the GP register as the base register;
- Fig. 15 is a schematic diagram of an example 32-bit format of a store byte instruction that implicitly identifies the GP register as the base register;
- Fig. 16 is a schematic diagram of an example 32-bit format of a store half word instruction that implicitly identifies the GP register as the base register;
- Fig. 17 is a schematic diagram of an example 32-bit format of a store word instruction that implicitly identifies the GP register as the base register;
- Fig. 18 is a schematic diagram of an example 32-bit format of a store double instruction that implicitly identifies the GP register as the base register;
- Fig. 19 is a schematic diagram of an example 32-bit format of a load word floating point instruction that implicitly identifies the GP register as the base register;
- Fig. 20 is a schematic diagram of an example 32-bit format of a store word floating point instruction that implicitly identifies the GP register as the base register;
- Fig. 21 is a schematic diagram of an example 32-bit format of a load double floating point instruction that implicitly identifies the GP register as the base register;
- Fig. 22 is a schematic diagram of an example 32-bit format of a store double floating point instruction that implicitly identifies the GP register as the base register;
- Fig. 23 is a schematic diagram of an example 16-bit format of a load word instruction that implicitly identifies the GP register as the base register;
- Fig. 24 is a schematic diagram of an example 16-bit format of a store word instruction that implicitly identifies the GP register as the base register;
- Fig. 25 is a schematic diagram of an example 32-bit format of an add immediate (byte) instruction that implicitly identifies the GP register as the source register;
- Fig. 26 is a schematic diagram of an example 32-bit format of an add immediate (word) instruction that implicitly identifies the GP register as the source register;
- Fig. 27 is a schematic diagram of an example 48-bit format of an add immediate instruction that implicitly identifies the GP register as the source register;
- Fig. 28 is a schematic diagram of an example 32-bit format of an add immediate program counter instruction that implicitly identifies the program counter as the source register;
- Fig. 29 is a schematic diagram of an example 32-bit format of a compute aligned program counter instruction that implicitly identifies the program counter as the source register;
- Fig. 30 is a schematic diagram of an example 48-bit format of a load word program counter instruction that implicitly identifies the program counter as the base register;
- Fig. 31 is a schematic diagram of an example 48-bit format of a store word program counter instruction that implicitly identifies the program counter as the base register;
- Fig. 32 is a schematic diagram of an example 32-bit format of a load word unsigned instruction that implicitly identifies the GP register as the base register;
- FIG. 33 is a block diagram of an example integrated circuit manufacturing system for generating an integrated circuit embodying the data processing apparatus described herein;
- Fig. 34 is a flow diagram for decoding instructions at a data processing apparatus.
- Fig. 35 is a diagram of a system for decoding instructions at a data processing apparatus.
- Fig. 1 is a schematic diagram illustrating an example implementation of an ISA that uses displacement addressing.
- the following set of executable instructions may be used to load the word at address 0x10010020 into register $v0, where "0x" indicates a hexadecimal number:
- the first instruction (a load upper immediate instruction) loads 0x1001 into the upper 16 bits of register $s0 and the second instruction (a load word instruction) loads the word at the memory address formed by the contents of register $s0 (0x10010000) and the offset (0x0020) into register $v0.
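The address arithmetic of this two-instruction sequence can be checked with a small Python sketch. It assumes 32-bit registers and a MIPS-style sign-extended 16-bit offset; the helper names `lui`, `sign_extend`, and `effective_address` are illustrative and not part of the source.

```python
MASK32 = 0xFFFFFFFF


def lui(imm16: int) -> int:
    """Load upper immediate: put a 16-bit value in the upper half of a register."""
    return (imm16 & 0xFFFF) << 16


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def effective_address(base: int, offset16: int) -> int:
    """Base register plus sign-extended 16-bit offset, wrapped to 32 bits."""
    return (base + sign_extend(offset16, 16)) & MASK32


s0 = lui(0x1001)                       # first instruction: $s0 = 0x10010000
addr = effective_address(s0, 0x0020)   # second instruction: address of the word
print(hex(s0), hex(addr))              # 0x10010000 0x10010020
```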
- In PC-relative addressing, addresses are calculated by applying an offset (e.g., a 16-bit offset) to the program counter.
- the program counter is used as the base address.
- data is stored separately from the code (or text) of the program and is not within the offset range of the program counter.
- PC-relative addressing allows at least load instructions (and potentially store instructions) to access the section of memory where executable code is stored (e.g., the text or instruction section). This does not scale to systems that support virtual memory and where the operating system applies page-based protection.
- page-based protection will first enable pages that include executable instructions as execute-only (it may be a security hazard otherwise).
- a load access of such a page will cause an exception requiring the operating system to enable permissions for reads (and potentially writes). This operation is expensive and thus PC-relative addressing is often limited to embedded systems where all code operates in kernel mode and thus is assumed to be trusted.
- To address at least some of the limitations of PC-relative addressing for global data access, some ISAs have implemented constant pool addressing. In constant pool addressing the addresses of global data values are placed in a constant pool which is located near the memory access instruction (e.g., the addresses of global data values are stored in the text section used to store the executable code). Constant pool addressing, however, also typically requires two instructions per data access: an instruction to load the address from memory, and an instruction to perform a load or store of data at that address.
- FIG. 2 is a schematic diagram illustrating an example implementation of an ISA that uses constant pool addressing.
- the following set of instructions may be used to load the word at address 0x10010020 into register 0 (r0) and to load the word at address 0x10010040 into register 1 (r1):
- the PC register in this particular ISA is read as the program counter plus 8, and so the first instruction loads the word at the address generated from the program counter + 8 + 0x0008 into register 3 (r3), the second instruction loads the word at the memory address formed by the contents of register 3 (r3) (0x10010020) and the offset zero (0x0000) into register 0 (r0), the third instruction loads the word at the address generated from the program counter + 8 + 0x0004 into register 4 (r4), and the fourth instruction loads the word at the memory address formed by the contents of register 4 (r4) (0x10010040) and the offset of zero (0x0000) into register 1 (r1).
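The four loads described above can be traced with a short Python sketch. It assumes 4-byte instructions starting at address 0, so the two pool entries sit directly after the code at 0x10 and 0x14; that layout and the data values 111 and 222 are hypothetical and used only to make the trace concrete.

```python
def pc_relative_load(memory: dict, instr_addr: int, offset: int) -> int:
    """Load a word from (instruction address + 8 + offset), as in the text."""
    return memory[instr_addr + 8 + offset]


memory = {
    0x10: 0x10010020,   # constant pool entry for the first data item
    0x14: 0x10010040,   # constant pool entry for the second data item
    0x10010020: 111,    # illustrative data values
    0x10010040: 222,
}

r3 = pc_relative_load(memory, 0x0, 0x0008)   # 1st instruction: 0 + 8 + 8 = 0x10
r0 = memory[r3 + 0x0000]                     # 2nd instruction: load via r3
r4 = pc_relative_load(memory, 0x8, 0x0004)   # 3rd instruction: 8 + 8 + 4 = 0x14
r1 = memory[r4 + 0x0000]                     # 4th instruction: load via r4
print(hex(r3), r0, hex(r4), r1)
```

Each data access costs two loads here: one to fetch the address from the pool and one to fetch the data.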
- constant pool addressing may be improved by identifying (e.g., by the compiler) common base addresses for a set of memory locations to reduce the number of addresses stored in the constant pool and to reduce the number of address loads performed.
- identifying a single common base address means that a single address is stored in the constant pool yet multiple data items can be accessed via that address.
- FIG. 3 is a schematic diagram illustrating an example implementation of an ISA that uses constant pool addressing with common base addresses.
- the following is a set of instructions to (i) load the word at address 0x10010020 into register r0; and (ii) load the word at address 0x10010040 into register r1, wherein the address 0x10010020 is stored at an offset of 0x000c from the first instruction:
- the first instruction loads the word at the address generated from the program counter + 8 + 0x0004 into register 3 (r3)
- the second instruction loads the word at the memory address formed by the contents of register 3 (r3) (0x10010020) and the offset (0x0000) into register 0 (r0)
- the third instruction loads the word at the memory address formed by the contents of register 3 (r3) (0x10010020) and the offset of (0x0020) into register 1 (r1).
- the common address only needs to be loaded once and then can be used in multiple subsequent load instructions to access data near that address. However, this still requires an initial load of the common address from memory and subsequently the number of addresses that can be accessed from the common address is limited by the offset size.
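A minimal Python sketch of the common-base variant follows, again assuming 4-byte instructions starting at address 0 so that the single pool entry lies at 0x000c. The data values are hypothetical.

```python
def load_from_pool(memory: dict, instr_addr: int, offset: int) -> int:
    """PC reads as instruction address + 8, so the pool slot is at +8+offset."""
    return memory[instr_addr + 8 + offset]


memory = {
    0x0C: 0x10010020,   # the only pool entry: the common base address
    0x10010020: 111,    # illustrative data values
    0x10010040: 222,
}

base = load_from_pool(memory, 0x0, 0x0004)   # one address load: 0 + 8 + 4 = 0x0c
word_a = memory[base + 0x0000]               # data at the common base
word_b = memory[base + 0x0020]               # data reached by offset from it
print(hex(base), word_a, word_b)
```

Three instructions now fetch two words, but `base` still has to come from memory, and only items within the offset range of `base` benefit.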
- In GP relative addressing, a register, referred to as the GP register ($gp) (which may be register 28 in some implementations), is configured to point to an address in global memory.
- Using the GP register as the base register, the data within the offset range of this memory address can be accessed with a single load or store instruction.
- a single load or store instruction can be used to access the 64KB address space defined by, for example, the 32KB addresses above the GP address and the 32KB addresses below the GP address.
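The 64KB window can be made concrete with a short Python sketch that tests whether an address is reachable from the GP with a signed 16-bit offset. The GP value 0x10008000 is a hypothetical choice for illustration, and `gp_relative_offset` is not a name from the source.

```python
from typing import Optional


def gp_relative_offset(gp: int, address: int, offset_bits: int = 16) -> Optional[int]:
    """Return the signed offset from gp to address, or None if out of range."""
    lowest = -(1 << (offset_bits - 1))        # -32768 for 16 bits
    highest = (1 << (offset_bits - 1)) - 1    # +32767 for 16 bits
    delta = address - gp
    return delta if lowest <= delta <= highest else None


GP = 0x10008000  # hypothetical GP value

print(gp_relative_offset(GP, 0x10000000))  # -32768: lowest reachable address
print(gp_relative_offset(GP, 0x1000FFFF))  # 32767: highest reachable address
print(gp_relative_offset(GP, 0x10010000))  # None: needs more than one instruction
```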
- Fig. 4 is a schematic diagram illustrating an example implementation of an ISA that supports GP relative addressing.
- the GP register is set to
- GP relative addressing is much more flexible than PC relative addressing. Specifically, GP relative addressing allows addresses farther away from the program counter to be more easily and efficiently accessed.
- the compiler is configured to store global variables (with a known size) and constants in a single memory region, and then, at load time, the GP register is set to point to the middle of this region. So long as all these data items together take up no more than the offset range (e.g., 64KB), all of these data items can be accessed with a single instruction via the GP.
- the compiler must know at compile time that a data item will end up being in a location that is within the offset range (e.g., 64KB range) of the GP. In practice, compilers typically cannot guarantee this to be the case.
- the global variables would be stored in this memory region and accessed via offsets to the GP, but typically the size of the global variables well exceeds the offset range (e.g., 64KB range). Accordingly, the usual practice is to put small global data items (e.g., data items that are eight bytes or less) in the GP area, but the small global data items may nonetheless exceed the offset range (e.g., 64KB range). Any global data item that does not fall within the offset range may then require two or more instructions to access.
- the addressing methods implemented by known ISAs result in executable code that comprises multiple instructions to perform a significant number of global data accesses, either because known ISAs only support an addressing mode, such as displacement addressing or constant pool addressing, which requires two or more executable instructions to implement a memory access; or because known ISAs support an addressing mode, such as PC-relative addressing or GP-relative addressing, that allows a region of memory to be accessed via a single executable instruction, but the number of memory addresses that fall in this region is limited and thus there are a significant number of global data items that fall outside this region and therefore require two or more executable instructions to access.
- ISAs, apparatus, and methods related thereto that comprise an instruction set that includes one or more instructions which identify the GP register as an operand of the instruction.
- the one or more instructions that identify the GP register as an operand of the instruction may make the identification implicitly.
- This is in contrast to known ISAs that implement GP relative addressing via instructions that explicitly identify the GP register ($gp) as an operand of the instruction.
- the one or more instructions that implicitly identify the GP register as an operand of the instruction may be referred to as the GP relative instructions and may comprise one or more displacement memory access instructions that implicitly identify the GP register as the base register, and/or one or more register arithmetic instructions that implicitly identify the GP register as the source register. It should be noted that a complete suite of GP relative instructions allows unified PIC support for application processors under Linux, and that increased code-density on embedded processors under one common instruction set is possible.
- a displacement memory access instruction (e.g., a displacement store instruction or a displacement load instruction that identifies an address via an offset from a base register) that implicitly identifies the GP register as the base register can have an extended offset range compared to displacement memory access instructions where any register, including the GP register, can be explicitly identified as the base register (which is referred to herein as a generic displacement memory access instruction). For example, if a generic displacement memory access instruction (e.g., displacement load instruction or displacement store instruction) uses sixteen bits to specify an offset and five bits to identify the base register, then a 64KB (2^16) range of addresses can be accessed from the base register (e.g., GP register) via the offset.
- the GP register ($gp) is implicitly identified as the base register by, for example, the opcode
- one or more of the five bits that were previously used to identify the base register can be used to extend the offset length so as to extend the range of addresses that can be accessed from the GP via the offset. For example, if all five bits that were previously used to identify the base register are applied to the offset, this would extend the offset to twenty-one bits, which extends the range of addresses that can be accessed from the GP via the offset by a factor of 2^5 to 2MB (2^21). Extending the range of addresses that are accessible from the GP via the offset reduces the number of global data items in memory that fall outside this range and thus require two or more instructions to access.
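The arithmetic behind the extended range can be checked with a few lines of Python. The function names are illustrative; the bit widths (16-bit offset, 5-bit base register field) are the ones used in the example above.

```python
def reachable_bytes(offset_bits: int) -> int:
    """Number of byte addresses selectable by an offset of the given width."""
    return 1 << offset_bits


def extended_offset_bits(offset_bits: int = 16,
                         base_field_bits: int = 5,
                         reused_bits: int = 5) -> int:
    """Offset width after reusing some of the base-register field bits."""
    if not 0 <= reused_bits <= base_field_bits:
        raise ValueError("cannot reuse more bits than the base field holds")
    return offset_bits + reused_bits


generic = reachable_bytes(16)                        # explicit base register
implicit = reachable_bytes(extended_offset_bits())   # GP implied by the opcode
print(generic // 1024, "KB")             # 64 KB
print(implicit // (1024 * 1024), "MB")   # 2 MB
print(implicit // generic)               # 32
```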
- a register arithmetic instruction e.g., an instruction that adds an immediate to a value in a source register
- a register arithmetic instruction that implicitly identifies the GP register as the source register
- the GP register ($gp) is implicitly identified as the source register by, for example, the opcode
- one or more of the five bits that were previously used to explicitly identify the source register can be used to extend the immediate field. For example, where all five bits are added to the immediate, the number of possible immediate values is extended by a factor of 2^5 to 2MB; and where only three bits are added to the immediate, the number of possible immediate values is extended by a factor of 2^3 to 262KB. Extending the number of possible immediate values increases the number of addresses that can be calculated from the GP register via a single register arithmetic instruction. This may increase the efficiency and code density of the program.
- code density describes the amount of space that the executable code for a program takes up in memory. This may also be referred to as the "memory footprint" of the program. The denser the code, the less space the code takes up in memory. Conversely, the less dense the code, the more space the code takes up in memory.
- the code density of a program is a function of the ISA
- Code density is particularly important when the program is to be executed by a data processing apparatus with a limited amount of memory, such as a mobile telephone or other embedded systems.
- RISC (reduced instruction set computer) ISAs, which generally have a smaller instruction set compared to CISC (complex instruction set computer) ISAs, generally produce programs with poorer code density because RISC ISAs will often require multiple simple instructions to perform the action(s) performed by one complex instruction in a CISC ISA.
- data processing apparatus, such as CPUs, that implement CISC ISAs typically run at slower clock speeds than data processing apparatus that implement RISC ISAs because the maximum clock period is dictated by the slowest step of the pipeline and more complex instructions tend to be slower.
- a RISC ISA that includes one or more instructions that implicitly identify the GP register as an operand of the instruction (i.e., GP relative instructions) has proven to produce code that has significantly improved code density. Since global data access is a common operation in most programs, this is particularly true where the one or more instructions that implicitly identify the GP register as an operand of the instruction are memory access instructions. As described above, memory access instructions that use the GP register as the base register allow a significant portion of global data to be accessed via a single memory access instruction.
- Since the GP-relative instructions of the described ISA are not intended to be used to access data in the section of memory in which executable instructions are stored (e.g., the text or instruction section), the ISA does not suffer from the same problem as traditional PC-relative addressing which is used to access data that is stored in the section of memory in which the executable instructions are stored (e.g., the text or instruction section). Accordingly, the ISA, apparatus and methods described herein allow implementations that scale from real-time operating systems (RTOS) with a fixed memory mapping to systems, such as Linux®, that support virtual memory.
- FIG. 5 illustrates an example data processing apparatus 500 that implements the modified ISA described herein that comprises an instruction set with one or more instructions that implicitly identify the GP register as an operand of the instruction.
- a data processing apparatus is any device, machine, or dedicated circuit, such as, but not limited to, a processor, computer, or computer system, with processing capability such that it can execute instructions.
- a processor may be any kind of general-purpose or dedicated processor, such as a CPU, GPU, a System-on-chip, a state machine, a media processor, an application-specific integrated circuit (ASIC), a
- a computer or computer system may comprise one or more processors.
- the data processing apparatus 500 of Fig. 5 comprises a register file 502, a decode unit 504, and an execution unit 506. It will be evident to a person of skill in the art that the data processing apparatus 500 of Fig. 5 may comprise other components that are not shown such as, but not limited to, a fetch unit and input/output interface(s).
- the register file 502 comprises a plurality of registers which can be written to, and read from, by the execution unit 506.
- the plurality of registers comprises a GP register 503, one or more general-purpose registers 505 and/or one or more floating point registers (not shown).
- the GP register 503 is a register that is configured to point to an address in memory that contains data (as opposed to code) to aid in accessing data in memory.
- the GP register may be a dedicated register separate from the general-purpose registers 505, or, as shown in Fig. 5, may be a specific general-purpose register (e.g., register 28 in some systems) which has been defined (e.g., by software convention or the ISA) as the register used to store the GP address.
- How the GP register is set up and used to access data in memory may differ based on the type of code being run on the data processing apparatus. In particular, the use and contents of the GP register may depend on whether the code is position independent code or regular, non-position-independent (position dependent) code.
- Position independent code (PIC), which is used, for example, in Linux® applications, is code that executes properly regardless of its absolute address. Accordingly, PIC can be executed at any memory address without modification. This differs from regular, non-PIC code, which can only be run from a particular memory location.
- Data references from PIC code are typically made indirectly through a global offset table (GOT) which stores the addresses of all accessed global variables and constants.
- PIC functions that access global data typically start by calculating the address of the GOT given the current program counter value.
- the GP register 503 may be configured to store the calculated address of the GOT to allow easy access to the entries of the GOT.
- the GP value is typically only set once at the beginning of a PIC function and is invariant throughout the remainder of the function.
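- The indirect data access through the GOT described above can be sketched as follows. This is a minimal illustration only: the dictionary-based memory model, the addresses, and the function names `load_word` and `load_global_via_got` are assumptions made for the example and are not taken from the described apparatus.

```python
# Hypothetical model: memory maps a word-aligned address to a 32-bit value.
WORD_BYTES = 4


def load_word(memory, address):
    """Read one word from the modelled memory."""
    return memory[address]


def load_global_via_got(memory, gp, got_index):
    """Two-step PIC access: GOT entry -> variable address -> variable value."""
    variable_address = load_word(memory, gp + got_index * WORD_BYTES)
    return load_word(memory, variable_address)


# Assumed layout: a two-entry GOT at 0x1000; the variables live elsewhere.
memory = {
    0x1000: 0x8000,  # GOT[0] holds the address of global variable a
    0x1004: 0x8010,  # GOT[1] holds the address of global variable b
    0x8000: 42,      # a
    0x8010: 7,       # b
}
gp = 0x1000  # in PIC code, calculated once at the start of the function

print(load_global_via_got(memory, gp, 0))
print(load_global_via_got(memory, gp, 1))
```

Because the GP value does not change during the function, every global access in this sketch reuses the same base and differs only in the GOT index.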
- the GP register 503 may be configured, as shown in Fig. 7, to point to a region of memory 508 used to store global variables and constants.
- Global variables and constants are variables and constants that are globally available, i.e., variables and constants that can be seen by two different calls to the same function and that can be seen by two calls to different functions.
- the region of memory used to store global variables may, in some cases, also be used to store static data. Static data is data that can be seen by two different calls to the same function, but cannot be seen by calls to different functions.
- the GP register may be set, at load-time, to an address at the center of the region of memory used to store global variables and constants.
- the GP may be set to an address at the center of the region of memory used to store global variables and constants since the signed offset allows both the 2^(X-1) addresses above the GP and the 2^(X-1) addresses below the GP to be accessed via the offset.
- the GP register 503 may be set to point to the beginning or the end of that region of memory 508.
- the GP may be set to an address at the start (bottom) of the region of memory used to store global variables and constants since the unsigned offset allows the 2^X addresses above the GP to be accessed via the offset.
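- The relationship between the GP placement and the reachable addresses can be illustrated with a short sketch. It assumes that X is the number of offset bits and that a signed offset uses two's complement, so it spans -2^(X-1) to 2^(X-1)-1; the function name `reachable_range` and the example addresses are hypothetical.

```python
def reachable_range(gp, offset_bits, signed):
    """Return (lowest, highest) byte address reachable as gp + offset."""
    if signed:
        # Two's complement offset: -2^(X-1) .. 2^(X-1) - 1
        low = gp - (1 << (offset_bits - 1))
        high = gp + (1 << (offset_bits - 1)) - 1
    else:
        # Unsigned offset: 0 .. 2^X - 1
        low = gp
        high = gp + (1 << offset_bits) - 1
    return low, high


# Assumed 64KB global data region starting at 0x10000, 16 offset bits.
region_start = 0x10000

# Signed offset: place the GP at the center of the region.
print([hex(a) for a in reachable_range(region_start + 0x8000, 16, signed=True)])

# Unsigned offset: place the GP at the start of the region.
print([hex(a) for a in reachable_range(region_start, 16, signed=False)])
```

In both cases the same 64KB region is covered; only the GP value differs.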
- the GP register may be set to a specific address (e.g., the middle, end or start of the region of memory used to store global variables and constants) once at the beginning of a program.
- the specific address that the GP is set to may be determined at program build-time and may be based on one or more user-specified parameters. For example, the user may configure the data section to be at a specific address.
- the decode unit 504 is configured to receive computer executable instructions representing a program or subroutine that are based on an instruction set comprising one or more instructions that implicitly identify the GP register as an operand of the instruction.
- the computer executable instructions may be provided to the decode unit 504 by a fetch stage (not shown) that is configured to fetch instructions of a program or subroutine (in program/sub-routine order) in memory as indicated by a PC.
- Each instruction identifies an operation or task (e.g., load, store, add, subtract, jump, branch) to be performed and none, one or more than one operand on which the operation is to be performed.
- an operand identifies data that is to be operated on or manipulated by the instruction.
- An operand can be a value within the instruction itself (e.g., an explicitly identified immediate or offset), a register, a memory location or an I/O port.
- the decode unit 504 is configured to decode each received instruction to identify the operation to be performed and the operand(s) of the operation and to output one or more control signals which cause the execution unit 506 to perform the operation identified by the received instruction using the identified operand(s). Outputting the one or more control signals may be referred to herein as providing the decoded instructions to the execution unit 506 for execution.
- the decode unit 504 of Fig. 5 is configured to, in response to receiving an instruction that implicitly identifies the GP register as an operand of the instruction, output one or more control signals that cause the execution unit 506 to perform the identified operation with the GP register as an operand of the instruction.
- the decode unit 504 is configured to, in response to receiving an instruction that explicitly identifies a particular register (e.g., by number) as an operand of the instruction, output one or more control signals to cause the execution unit 506 to perform the specified operation with the explicitly identified register as an operand of the instruction.
- the decode unit 504 may be configured to identify instructions that implicitly identify the GP register as an operand of the instruction based on the bit pattern of the received instruction. For example, each instruction that implicitly identifies the GP register as an operand of the instruction may have a unique recognizable bit pattern (e.g., certain bits of the instruction have a recognizable pattern) that identifies it as an instruction that implicitly identifies the GP register as an operand of the instruction. For example, in some cases each instruction that implicitly identifies the GP register as an operand of the instruction may have a unique opcode. As is known to a person of skill in the art, the opcode of an instruction is the set of bits of the instruction that identify the type of operation to be performed.
- the one or more instructions that implicitly identify the GP register as an operand of the instruction may comprise one or more displacement memory access instructions that implicitly identify the GP register as the base register.
- a memory access instruction causes data to be read from memory or written to memory.
- Memory access instructions include load instructions and store instructions.
- a load instruction is an instruction that causes the execution unit 506 to read data from an address of memory 508 and store the read data to a register in the register file 502.
- a store instruction is an instruction that causes the execution unit 506 to write data in a register in the register file 502 to an address of memory 508.
- a displacement memory access instruction identifies the address of memory to be read from (in a load) or written to (in a store) through the combination of a base register and an offset.
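- Displacement addressing, as described above, can be sketched as a base register value plus a sign-extended offset. The sketch below is illustrative only: the 16-bit offset width, little-endian byte order, 32-bit address wrap-around, and all function names are assumptions. Register 28 is used as the GP only because the text above gives it as an example.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def effective_address(registers, base, offset_field, offset_bits=16):
    """Address = contents of the base register + signed offset."""
    return (registers[base] + sign_extend(offset_field, offset_bits)) & MASK32


def store_word(registers, memory, source, base, offset_field):
    """Write the word in register `source` to memory at base + offset."""
    address = effective_address(registers, base, offset_field)
    memory[address:address + 4] = (registers[source] & MASK32).to_bytes(4, "little")


def load_word(registers, memory, target, base, offset_field):
    """Read a word from memory at base + offset into register `target`."""
    address = effective_address(registers, base, offset_field)
    registers[target] = int.from_bytes(memory[address:address + 4], "little")


registers = [0] * 32
memory = bytearray(64)
registers[28] = 32          # GP (register 28 in some systems)
registers[5] = 0xDEADBEEF   # value to store

store_word(registers, memory, 5, 28, 0xFFFC)  # offset field 0xFFFC encodes -4
load_word(registers, memory, 6, 28, 0xFFFC)
print(hex(registers[6]))
```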
- some ISAs have introduced GP relative addressing wherein the GP register is used as the base register in displacement addressing. Since the GP register is configured to point to an address of memory (e.g., the GOT address or an address in a region of memory that stores global variables and constants), data at the addresses within the offset range of the GP can be accessed with a single displacement memory access instruction (e.g., a single displacement load or store instruction).
- a single load or store instruction can be used to access a 64KB range of addresses from the GP (e.g., the 64KB address space defined by the 32KB addresses above the GP address and the 32KB addresses below the GP address).
- known ISAs implement GP relative addressing via generic load and store instructions where the GP register must be explicitly identified as the base register from the set of all possible registers.
- the range of addresses accessible from the GP is limited by the number of bits allocated to the offset in the generic load/store instructions.
- the offset range for the displacement memory access instructions is extended. This extends the range of addresses that can be accessed by such a displacement memory access instruction. For example, if a generic displacement memory access instruction uses sixteen bits to identify an offset and five bits to identify the base register, then the range of addresses that can be accessed from the GP is 64KB (2^16 bytes) (e.g., 32KB above the GP and 32KB below the GP). If, however, an instruction implicitly identifies the GP register as the base register (for example, by the opcode), one or more of the five bits that were previously used to explicitly identify the base register can be used to extend the offset.
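- The effect of reusing base register bits as offset bits can be worked through numerically. The sketch below uses the sixteen-bit offset and five-bit base register field from the example above; the function name `offset_range_bytes` is hypothetical.

```python
GENERIC_OFFSET_BITS = 16  # offset bits in the generic instruction
BASE_REGISTER_BITS = 5    # bits that explicitly name the base register


def offset_range_bytes(offset_bits):
    """Number of byte addresses selectable with the given number of offset bits."""
    return 1 << offset_bits


# Each base register bit reclaimed for the offset doubles the reachable range.
for reclaimed in range(BASE_REGISTER_BITS + 1):
    bits = GENERIC_OFFSET_BITS + reclaimed
    print(f"{bits} offset bits -> {offset_range_bytes(bits) // 1024} KB reachable")
```

With all five bits reclaimed, the offset has twenty-one bits and the reachable range grows from 64KB to 2MB.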
- the one or more displacement memory access instructions (e.g., store instructions and/or load instructions) that implicitly identify the GP register as the base register may include a plurality of displacement memory access instructions that cause different sized data to be loaded from, or stored to, the global memory.
- the one or more displacement memory access instructions that implicitly identify the GP register as the base register may comprise a first load instruction to load data of a first size (e.g., byte) from the global memory and a second load instruction to load data of a second size (e.g., word) from the global memory; and/or a first store instruction to store data of a first size (e.g., byte) to the global memory and a second store instruction to store data of a second size (e.g., word) to the global memory.
- the displacement memory access instructions may include one or more of the following:
- SB store byte [GP] - which causes the execution unit to store a byte from a specified general-purpose register to an address in memory based on the GP register and a specified offset
- SH store half [GP] - which causes the execution unit to store a half word (i.e., two bytes) from a specified general-purpose register to an address in memory based on the GP register and a specified offset
- Example 32-bit formats for these instructions are described below with reference to Figs. 9 to 18 and 32.
- Programs typically use various sized data types (e.g., byte, half word (i.e., two bytes), word (i.e., four bytes), double word (i.e., eight bytes)) to store numbers.
- the programmer typically uses the smallest data type that is able to cover a desired range of values. Having dedicated instructions that enable processing of data of different sizes allows such programs to be properly executed without having to use additional instructions to convert from one data size to another.
- Where the one or more displacement memory access instructions that implicitly identify the GP register as the base register include a plurality of displacement memory access instructions that cause different sized data to be loaded from, or stored to, the global memory, the instructions that relate to different sized data may have different sized offsets (e.g., a different number of offset bits). For example, as shown in Figs. 9-14 the load byte and load half word instructions that implicitly identify the GP register as the base register may have 18 offset bits, whereas the load word instruction that implicitly identifies the GP register may have 21 offset bits.
- Similarly, the store byte and store half word instructions that implicitly identify the GP register as the base register may have 18 offset bits, whereas the store word instruction that implicitly identifies the GP register as the base register may have 21 offset bits. This allows the offset size (and thus the reachable range) to be adjusted based on how common different data sizes are. For example, where word data is significantly more common, it may warrant allocating a larger fraction of the opcode space to extend the reachable range.
- the one or more displacement memory access instructions that implicitly identify the GP register as the base register may include a plurality of displacement memory access instructions that cause data to be loaded into, or stored from, different types of target registers.
- the one or more displacement memory access instructions that implicitly identify the GP register as the base register may comprise a first load instruction to load data from global memory into a first type of register (e.g., a general-purpose register) and a second load instruction to load data from global memory into a second type of target register (e.g., a floating point register); and/or a first store instruction to store data from a first type of register (e.g., general-purpose register) into global memory and a second store instruction to store data from a second type of register (e.g., floating point register) into global memory.
- the displacement memory access instructions that implicitly identify the GP register as the base register may also include one or more of the following instructions that load data into, or store data from, a floating point register:
- LDC1 load double floating point [GP] - which causes the execution unit to load a double word (i.e., eight bytes) from an address in memory based on the GP register and a specified offset into a specified floating point register
- SDC1 store double floating point [GP] - which causes the execution unit to store a double word (i.e., eight bytes) from a specified floating point register to an address in memory based on the GP address and a specified offset
- the decode unit 504 may be configured to decode instructions of different lengths.
- the decode unit 504 may be configured to decode instructions that are both 16-bits in length (16-bit instructions) and 32-bits in length (32-bit instructions).
- the one or more displacement memory access instructions that implicitly identify the GP register as the base register may include a plurality of displacement memory access instructions wherein at least one of the displacement memory access instructions is of a first length (e.g., 32 bits) and at least one of the displacement memory access instructions (of the same type) is of a different length (e.g., 16 bits).
- the instruction set may comprise one or more 16-bit load instructions that implicitly identify the GP register as the base register.
- An example format of a 16-bit load instruction that implicitly identifies the GP register as the base register is shown and described with reference to Fig. 23.
- the instruction set may comprise one or more 16-bit store instructions that implicitly identify the GP register as the base register.
- An example format of a 16-bit store instruction that implicitly identifies the GP register as the base register is shown and described with reference to Fig. 24.
- the shorter length memory access instruction (load/store) (e.g., 16-bit instruction) will take up less space in the code than the corresponding longer length memory access (load/store) instruction (e.g., 32-bit instruction), and thus the more often the shorter length memory access instruction (load/store) (e.g., 16-bit instruction) can be used in the code, the shorter the code will be.
- the one or more instructions that implicitly identify the GP register as an operand of the instruction may comprise one or more register arithmetic instructions that implicitly identify the GP register as the source register of the instruction.
- Register arithmetic instructions cause an immediate to be added to, or subtracted from, a value in a register (the source register) and stored in another register.
- Register arithmetic instructions typically require that the source register be explicitly identified (e.g., by number) in the instruction.
- Providing register arithmetic instructions (e.g., register addition and/or subtraction instructions) that implicitly identify the GP register as the source register allows the range of addresses that can be calculated from the GP register to be expanded.
- the range of addresses that can be calculated via the immediate bits is the 64KB (2^16 bytes) range of addresses from the address in the source register (e.g., GP register).
- If, however, the GP register is implicitly identified as the source register (for example, by the opcode), then one or more of the five bits that were previously used to identify the source register can be used to extend the immediate. This extends the range of addresses that can be calculated via the immediate. For example, if all five bits are used to extend the immediate to twenty-one bits, the range of addresses that can be calculated via the immediate is extended by a factor of 2^5 to the 2MB (2^21 bytes) range of addresses from the address in the GP register.
- the one or more register arithmetic instructions that implicitly identify the GP register as the source register may include a plurality of register arithmetic instructions that cause immediates of different measurement units to be added to the GP register value.
- the one or more register arithmetic instructions that implicitly identify GP register as the source register may comprise a first register arithmetic instruction which causes an immediate in a first measurement unit (e.g., bytes) to be added to the address of the GP register, and a second register arithmetic instruction which causes an immediate in a second measurement unit (e.g., words) to be added to the value of the GP register.
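- The difference between immediates in different measurement units can be sketched as follows. The example assumes a four-byte word and a 32-bit address space; the function names `add_gp_bytes` and `add_gp_words` are hypothetical and do not correspond to named instructions in the instruction set.

```python
MASK32 = 0xFFFFFFFF
WORD_BYTES = 4  # assumed word size


def add_gp_bytes(gp, immediate):
    """Add an immediate expressed in bytes to the GP value."""
    return (gp + immediate) & MASK32


def add_gp_words(gp, immediate):
    """Add an immediate expressed in words to the GP value."""
    return (gp + immediate * WORD_BYTES) & MASK32


gp = 0x1000
print(hex(add_gp_bytes(gp, 12)))  # 12 bytes past the GP
print(hex(add_gp_words(gp, 12)))  # 12 words (48 bytes) past the GP
```

For the same number of immediate bits, a word-unit immediate in this sketch reaches four times as far as a byte-unit immediate, at the cost of only producing word-aligned displacements.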
- the register arithmetic instructions that implicitly identify the GP register as the source register may include one or more of the following instructions:
- Example 32-bit formats for these instructions are described below with reference to Figs. 25 and 26.
- the decode unit 504 may be configured to receive and decode instructions of different lengths.
- the decode unit 504 may be configured to decode instructions that are 16-bits in length (16-bit instructions), 32-bits in length (32-bit instructions), or 48-bits in length (48-bit instructions).
- the one or more register arithmetic instructions that implicitly identify the GP register as the source register may include a plurality of register arithmetic instructions wherein at least one of the register arithmetic instructions is of a first length (e.g., 32 bits) and at least one of the register arithmetic instructions is of a different length (e.g., 48 bits).
- the instruction set may also comprise one or more 48-bit register arithmetic instructions.
- An example format of a 48-bit register arithmetic instruction is shown and described with reference to Fig. 27.
- the example 48-bit arithmetic instruction causes a 32-bit immediate to be added to the GP register. This allows any address in a 32-bit address space to be generated from the GP register. Such an instruction provides a universal fall back if the global data section is so large that some of the data cannot be reached directly by the other GP relative instructions.
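- The claim that a 32-bit immediate added to the GP can generate any address in a 32-bit address space can be checked with a short sketch. It assumes that the addition wraps modulo 2^32; the function names `add_gp_imm32` and `immediate_for` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add_gp_imm32(gp, imm32):
    """Add a 32-bit immediate to the GP value, wrapping to 32 bits."""
    return (gp + (imm32 & MASK32)) & MASK32


def immediate_for(gp, target):
    """Immediate that a build tool could choose so that gp + imm32 == target."""
    return (target - gp) & MASK32


gp = 0x80001000
for target in (0x00000010, 0x80001000, 0xFFFFFFFC):
    imm = immediate_for(gp, target)
    print(hex(imm), hex(add_gp_imm32(gp, imm)))
```

Under the wrap-around assumption, every target address has exactly one matching immediate, regardless of where the GP points.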
- the instruction set may also comprise one or more instructions that implicitly identify the program counter as an operand of the instruction.
- the decode unit 504 of Fig. 5 may be further configured to, in response to receiving an instruction that implicitly identifies the program counter as an operand, output one or more control signals to cause the execution unit 506 to perform the identified operation with the program counter as an operand of the instruction.
- the decode unit 504 may be configured to identify instructions that implicitly identify the program counter as an operand of the instruction based on the bit pattern of the received instruction. For example, each instruction that implicitly identifies the program counter as an operand of the instruction may have a unique recognizable bit pattern (e.g., certain bits of the instruction, which may or may not be contiguous bits, may have a recognizable pattern) that identifies the instruction as an instruction that implicitly identifies the program counter as an operand of the instruction. For example, in some cases each instruction that implicitly identifies the program counter as an operand of the instruction may have a unique opcode.
- Program counter related instructions are intended to supplement the GP-relative instructions. These instructions can be used for generating PC-relative addresses for the GP register. In addition, these instructions can also be used for the original purpose of code-density increase in and of themselves.
- the one or more instructions that implicitly identify the program counter as an operand of the instruction may include one or more register arithmetic instructions which implicitly identify the program counter as the source register, and/or one or more displacement memory access instructions which implicitly identify the program counter as the base register.
- the one or more instructions that implicitly identify the program counter as the source register may comprise instructions of different lengths.
- the one or more instructions that implicitly identify the program counter as an operand of the instruction may include one or more of the following:
- ALUIPC add program counter
- LWPC load word program counter
- SWPC store word program counter
- Example 32-bit formats for the ADDIUPC and ALUIPC instructions are described with reference to Figs. 28 and 29.
- Example 48-bit formats for the LWPC and SWPC instructions are described with reference to Figs. 30 and 31.
- PIC functions that access global data typically start by calculating the address of the GOT given the current program counter value.
- the ALUIPC instruction provides a highly efficient way to calculate the address of the GOT for any function using a single instruction. This is because this instruction can create a 4KB aligned pointer anywhere in a 32-bit address space in a PIC code model.
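- A PC-relative calculation that yields a 4KB aligned pointer can be sketched as below. The exact semantics are not given in this passage, so the sketch assumes a 20-bit immediate that is shifted left by twelve bits, added to the program counter, and then has its low twelve bits cleared. The function names and addresses are hypothetical, and the sketch is not a definition of the ALUIPC instruction.

```python
MASK32 = 0xFFFFFFFF
PAGE_MASK = 0xFFF  # low twelve bits: offset within a 4KB page


def aligned_pc_relative(pc, upper_imm20):
    """Assumed semantics: (pc + (imm << 12)) with the low 12 bits cleared."""
    return (pc + ((upper_imm20 & 0xFFFFF) << 12)) & MASK32 & ~PAGE_MASK


def upper_imm_for(pc, target):
    """Immediate that maps pc to the 4KB page containing target."""
    page_delta = (target & ~PAGE_MASK) - (pc & ~PAGE_MASK)
    return (page_delta >> 12) & 0xFFFFF


pc = 0x00400124
got_address = 0x10008000  # assumed 4KB aligned GOT location
imm = upper_imm_for(pc, got_address)
print(hex(imm), hex(aligned_pc_relative(pc, imm)))
```

Twenty immediate bits select one of 2^20 pages of 4KB each, which together cover a 32-bit address space; this is why a single instruction of this form can suffice.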
- the execution unit 506 is configured to execute the decoded instructions (i.e., perform the operations identified using the implicitly or explicitly identified operand(s)) received from the decode unit 504.
- the execution unit 506 may comprise one or more arithmetic logic units (ALUs).
- the execution unit 506 may have one or more sub-units dedicated to performing certain functions where the instruction set comprises both an instruction that explicitly identifies a register as an operand and a corresponding GP-relative instruction that implicitly identifies the GP register as an operand.
- the two instructions may be executed by the same sub-units because the operation performed by the two instructions is the same.
- the execution unit 506 may comprise a sub-unit for executing displacement store instructions and/or a sub-unit for executing displacement load instructions.
- the dedicated unit for executing displacement store instructions may be configured to write the data in a general-purpose register to an address in memory generated from the data in a base register plus an offset.
- the dedicated sub-unit for executing a displacement load instruction may be configured to load the data from an address in memory generated from the data in a base register plus an offset into a general-purpose register.
- the instruction set comprises one or more displacement load instructions which require explicit identification of the base register (e.g., the LW [generic] of Fig. 13) and one or more displacement load instructions that implicitly identify the GP register as the base register (e.g., the LW [GP] instruction of Fig.
- both instructions may be executed by the same sub-unit because both instructions cause the execution unit to perform the same operation with the same type of operands.
- the instruction set comprises one or more displacement store instructions which require explicit identification of the base register (e.g., the SW [generic] of Fig. 17) and one or more displacement store instructions that implicitly identify the GP register as the base register (e.g., the SW [GP] instruction of Fig. 17)
- both instructions may be executed by the same sub-unit because both instructions cause the execution unit to perform the same operation with the same type of operands.
- Fig. 8 illustrates an example method 800 of decoding instructions at a data processing apparatus, such as the data processing apparatus 500 of Fig. 5, that implements an ISA that includes an instruction set with one or more instructions that implicitly identify the GP register as an operand of the instruction.
- the method 800 begins at block 802 where the decode unit receives an instruction for execution.
- the decode unit 504 decodes the received instruction.
- decoding the instruction may comprise identifying the operation to be performed by the instruction and identifying the operands thereof.
- the decode unit may be configured to decode the received instruction by identifying a predetermined pattern of bits in the instruction from a plurality of predetermined patterns. For example, each different instruction may be identified by a unique pattern of bits in the instruction.
- the decode unit 504 determines whether the decoded instruction is an instruction that implicitly identifies the GP register as an operand of the instruction. If the decode unit 504 determines that the decoded instruction is an instruction that implicitly identifies the GP register as an operand of the instruction, then the method proceeds to block 808 where the decode unit identifies the GP register as an operand of the instruction and then the method 800 proceeds to block 816.
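- If, however, the decode unit 504 determines that the decoded instruction is not an instruction that implicitly identifies the GP register as an operand of the instruction, then the method 800 may proceed to block 810 (if the instruction set comprises one or more instructions that implicitly identify the program counter as an operand of the instruction) or the method 800 may proceed directly to block 814 or the method 800 may end.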
- the decode unit determines whether the decoded instruction is an instruction that implicitly identifies the program counter as an operand of the instruction. If the decode unit 504 determines the decoded instruction is an instruction that implicitly identifies the program counter as an operand, then the method 800 proceeds to block 812 where the decode unit 504 identifies the program counter as an operand of the instruction and the method 800 proceeds to block 816. If, however, the decode unit 504 determines that the decoded instruction is not an instruction that implicitly identifies the program counter as an operand of the instruction, then the method 800 proceeds to block 814 where the decode unit 504 may identify an explicitly specified register as an operand and the method 800 proceeds to block 816.
- the decode unit outputs one or more control signals to cause the execution unit to perform the operation identified by the instruction with the operand(s) identified in block 808, 812 or 814.
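- The decision flow of method 800 can be sketched in software as follows. This is a simplified model under stated assumptions: the opcode values, the field layout (six opcode bits, five target register bits, then either a 21-bit offset or a five-bit base register and a 16-bit offset), and the tuple standing in for the control signals are all hypothetical and are not taken from the figures.

```python
# Hypothetical opcode assignments, not from any real ISA.
IMPLICIT_GP = {0b010001: "load_word"}
IMPLICIT_PC = {0b010010: "load_word"}
EXPLICIT = {0b100011: "load_word"}


def decode(instruction):
    """Return (operation, target register, base operand, offset field)."""
    opcode = (instruction >> 26) & 0x3F
    target = (instruction >> 21) & 0x1F
    if opcode in IMPLICIT_GP:
        # Block 808: the GP register is the operand; 21 bits remain for the offset.
        return IMPLICIT_GP[opcode], target, "GP", instruction & 0x1FFFFF
    if opcode in IMPLICIT_PC:
        # Block 812: the program counter is the operand.
        return IMPLICIT_PC[opcode], target, "PC", instruction & 0x1FFFFF
    if opcode in EXPLICIT:
        # Block 814: the base register is named by five bits of the instruction.
        base = (instruction >> 16) & 0x1F
        return EXPLICIT[opcode], target, f"r{base}", instruction & 0xFFFF
    raise ValueError(f"unrecognized opcode {opcode:#08b}")


# Block 816 is modelled by returning the decoded fields to the caller.
print(decode((0b010001 << 26) | (3 << 21) | 0x1ABCDE))
print(decode((0b100011 << 26) | (3 << 21) | (28 << 16) | 0x1234))
```

The implicit forms in this sketch carry five more offset bits than the explicit form because no bits are spent naming the base register.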
- processor architectures have been routinely categorized by describing either the underlying hardware architecture or microarchitecture of a given processor, or by referencing the instruction set executed by the processor.
- the latter, the ISA, describes the types and ranges of instructions available, rather than describing how the instructions are implemented in hardware. The result is that a given ISA can be implemented using a wide range of techniques, where the techniques can be chosen based on preference or need for execution speed, data throughput, power dissipation, and manufacturing cost, among many other criteria.
- the ISA serves as an interface between code that is to be executed on the processor and the hardware that implements the processor.
- ISAs, and the processors or computers based on them, are partitioned broadly into categories including complex instruction set computers (CISCs) and reduced instruction set computers (RISCs).
- the ISAs define types of data that can be processed; the state or states of the processor, where the state or states include the main memory and a variety of registers; and the semantics of the ISA.
- the semantics of the ISA typically include modes of memory addressing and memory consistency.
- the ISA defines the instruction set for the processor, whether there are many instructions (complex) or fewer instructions (reduced), and the model for control signals and data that are input and output.
- RISC architectures have many advantages to processor design because by reducing the numbers and variations of instructions, the hardware that implements the instructions can be simplified. Further, compilers, assemblers, linkers, etc., that convert the code to instructions executable by the architecture can be simplified and tuned for performance.
- pointers can be used to share data between and among processors, processes, etc., by providing a reference address or pointer to the data.
- the pointer can be provided in lieu of transferring the data to each processor or process that requires the data.
- the pointers that are used for passing data references can be local pointers known only to a given, local processor or process, or can be GPs.
- the GPs can be shared among multiple processors or processes.
- the GPs can be organized or grouped into a GP register.
- the registers can include general-purpose registers, floating point registers, and so on.
- a further capability of the presently described architecture includes support of the rotate and exchange or ROTX instruction.
- This instruction can support a variety of data operations such as bit reversal, bit swap, byte reversal, byte swap, shifting, striping, and so on, all within one instruction.
- the use of the ROTX instruction provides a computationally inexpensive technique for implementing multiple instructions within one instruction.
- the rotate and exchange instruction can overlay a barrel shifter or other shifter commonly available in the presently described architecture. Separately implementing these various rotate, exchange, or shift instructions would increase central processing unit (CPU) complexity because each instruction would have an impact on one or more aspects of the CPU design.
- Processors commonly include a "mode" designator to indicate that the mode in which a processor is operating is based on a number of bytes, words, and so on.
- a mode can include a 16-bit operation, a 32-bit operation, a 64-bit operation, and so on.
- One or more bits within an instruction can be used to indicate the mode in which a particular instruction is to be executed.
- the mode bits within each instruction can be repurposed. The repurposed bits within the instruction can be used to implement the longer address offsets or extended register ranges described elsewhere.
- Storage used by processors can be organized and addressed using a variety of techniques. Typically, the storage or memory is organized as groups of bytes, words, or some other convenient size. To make storage or memory access more efficient, the access acquires as much data as reasonable with each access, thus reducing the numbers of accesses. Access to the memory is often most efficient in terms of computation or data transfer when the access is oriented or "aligned" to boundaries such as word boundaries. However, data to be processed does not always conveniently align to boundaries. For example, the operations to be performed by a processor may be byte oriented, the amount of data in memory may align to a byte boundary but not a word boundary, and so on.
- Accessing specific content such as a byte can, under certain conditions and depending on the implementation of the processor, require multiple read operations. To improve computational efficiency, unaligned memory access can be required. The unaligned memory access may be needed for computational efficiency, if not for access efficiency.
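- The cost of an access that does not align to a word boundary can be illustrated by counting how many aligned words the access touches. The sketch assumes a four-byte word and a memory system that reads only whole aligned words; the function name `aligned_words_touched` is hypothetical.

```python
WORD_BYTES = 4  # assumed word size


def aligned_words_touched(address, size):
    """Count the aligned words covered by `size` bytes starting at `address`."""
    first_word = address // WORD_BYTES
    last_word = (address + size - 1) // WORD_BYTES
    return last_word - first_word + 1


print(aligned_words_touched(0x100, 4))  # aligned word access
print(aligned_words_touched(0x102, 4))  # word access straddling a boundary
print(aligned_words_touched(0x103, 2))  # half word straddling a boundary
```

An access that straddles a boundary touches two aligned words in this model, which is one way an unaligned access can turn into multiple read operations.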
- a given ISA can support explicit unaligned storage or memory accesses.
- the general forms of the load and store instructions for the ISA can include unaligned load instructions and unaligned store instructions.
- the unaligned load instructions and the unaligned store instructions support a balance or tradeoff between increased density of the code that is executed by a processor and reduced processor complexity.
- the unaligned load instructions and the unaligned store instructions can be implemented in addition to the standard load instructions and store instructions, where the latter instructions align to boundaries such as word boundaries.
- the "extra" data, such as bytes, that can be accessed can be held temporarily for potential use by a subsequent read or store instruction (e.g., data locality).
- an ISA can include instructions and hardware specifically tuned for save and restore operations.
- a save instruction can save registers, where the registers can be stored in a stack.
- the saved registers can include source registers.
- a stack pointer can be adjusted to account for the stored registers.
- the saving can also include storing a local stack frame, where a stack frame can include a collection of data (or registers) on a stack that is associated with an instruction, a subprogram call, a function call, etc., that caused the save operation.
- the restore operation can reverse the save technique.
- the registers that were saved by the save operation can be restored.
- the restored registers can include destination registers. When the registers have been restored, the restore operation can cause a jump to a return address. Code execution can continue beginning with the return address.
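- The save and restore behaviour described above can be sketched with a toy machine model. Everything named here (the `Machine` class, the register names `ra`, `s0`, `s1`, the four-byte slot size, and the frame layout) is a hypothetical choice for illustration, not the layout used by the described ISA.

```python
WORD_BYTES = 4  # assumed register width in bytes


class Machine:
    """Toy model: a few registers, a stack pointer, and a stack memory."""

    def __init__(self):
        self.regs = {"ra": 0, "s0": 0, "s1": 0}
        self.sp = 0x8000
        self.pc = 0
        self.stack = {}

    def save(self, names, frame_bytes):
        """Adjust the stack pointer for the frame and store the named registers."""
        if frame_bytes < WORD_BYTES * len(names):
            raise ValueError("frame too small for the saved registers")
        self.sp -= frame_bytes
        for index, name in enumerate(names):
            slot = self.sp + frame_bytes - WORD_BYTES * (index + 1)
            self.stack[slot] = self.regs[name]

    def restore_and_return(self, names, frame_bytes):
        """Reverse save(): reload registers, release the frame, jump to ra."""
        for index, name in enumerate(names):
            slot = self.sp + frame_bytes - WORD_BYTES * (index + 1)
            self.regs[name] = self.stack[slot]
        self.sp += frame_bytes
        self.pc = self.regs["ra"]
```

- In this sketch, execution continues at the value restored into `ra`, mirroring the jump to a return address.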
- Figs. 9 to 27 and 32 illustrate example formats of instructions that implicitly identify the GP register as an operand of the instruction.
- Figs. 28 to 31 illustrate example formats of instructions that implicitly identify the program counter as an operand of the instruction.
- the instruction set may comprise any combination of these instructions.
- binary values in these figures represent specific bit patterns which must be included in the instruction in order to be decoded by the decode unit 504 as an instance of the instruction.
- the remaining fields are named instruction operands.
- fields with names ending in square brackets, such as 's[7:0]' and 's[0]', specify a particular range of bits for the named operand, using a Verilog-style syntax.
- a single operand value may be split into more than one field in the instruction encoding, with the bit ranges specified by each field explicitly in this way. All non-specified bits will be set to zero (e.g., if s[32] is not specified, then bit 32 will be set to zero). If no explicit bit range is specified, then the operand represents the least significant bits of the value.
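- The split-field convention can be sketched as follows. The helper name `assemble_operand` and the tuple layout `(hi, lo, bits)` are assumptions made for this example; the two fields used in the usage line are arbitrary and do not correspond to a particular figure.

```python
def assemble_operand(fields):
    """Rebuild an operand from fields given as (hi, lo, bits).

    Each tuple means operand[hi:lo] = bits, in the Verilog-style notation.
    Bits not covered by any field stay zero.
    """
    value = 0
    for hi, lo, bits in fields:
        width = hi - lo + 1
        if width <= 0:
            raise ValueError("hi must be >= lo")
        if bits < 0 or bits >> width:
            raise ValueError("field value does not fit in its bit range")
        value |= bits << lo
    return value


# Hypothetical operand split into s[20:13] and s[12:1]; s[0] is unspecified.
print(hex(assemble_operand([(20, 13, 0x05), (12, 1, 0x123)])))  # 0xa246
```

- Because s[0] is not specified in the usage line, bit 0 of the assembled value is zero.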
- FIG. 9 illustrates an example 32-bit format of a load byte instruction (LB [GP]) 902 that implicitly identifies the GP register as the base register.
- This instruction causes a byte of data at the memory address (GP register + offset (u)) to be loaded into a specified general-purpose register (rt).
- Nine bits (bits 18-20 and bits 26-31) are used to identify the instruction as a LB [GP] instruction, which leaves five bits (bits 21-25) to explicitly identify the general-purpose register (rt) and 18 bits (bits 0-17) for the offset (u). This allows an address range of 2^18 addresses from the GP to be directly accessed via this instruction.
- A generic load byte instruction (LB [Generic]) 904 that requires explicit identification of the base register (rs) has only a 12-bit offset (u) since ten bits (bits 12-15 and 26-31) are used to identify the instruction as a LB [Generic] instruction and five bits (bits 16-20) are used to explicitly identify the base register (rs). Accordingly, the LB [GP] instruction increases the address range that can be accessed with this instruction by a factor of 2^6.
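- A minimal sketch of extracting the operand fields of the LB [GP] format and forming the effective address follows. The field positions (rt in bits 21-25, u in bits 0-17) are taken from the description of Fig. 9; the function names are hypothetical, and the identifying bit patterns are not checked because their values are not given in the text.

```python
def decode_lb_gp(instruction: int):
    """Return (rt, u) from a 32-bit LB [GP] encoding."""
    rt = (instruction >> 21) & 0x1F   # bits 21-25: destination register
    u = instruction & 0x3FFFF         # bits 0-17: 18-bit unsigned offset
    return rt, u


def lb_gp_address(gp: int, instruction: int) -> int:
    """Effective address: implicit GP base plus the encoded offset."""
    _, u = decode_lb_gp(instruction)
    return gp + u


word = (7 << 21) | 0x2ABCD            # identifying bits left as zero here
print(decode_lb_gp(word))             # (7, 175053)
print(hex(lb_gp_address(0x10000000, word)))  # 0x1002abcd
```

- The 18-bit field reaches 2^18 byte addresses, compared with 2^12 for the 12-bit generic form, which is the factor of 2^6 stated above.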
- Fig. 10 illustrates an example 32-bit format of a load byte unsigned instruction (LBU [GP]) 1002 that implicitly identifies the GP register as the base register.
- This instruction causes a byte of data at memory address (GP register + offset (u)) to be loaded into a specified general-purpose register (rt) as unsigned data.
- Nine bits (bits 18-20 and 26-31) are used to identify the instruction as an LBU [GP] instruction, which leaves five bits (bits 21-25) to explicitly identify the general-purpose register (rt) and 18 bits (bits 0 to 17) for the offset (u). This allows a range of 2^18 addresses from the GP to be directly accessed via this instruction.
- A generic load byte unsigned instruction (LBU [Generic]) 1004 that requires explicit identification of the base register (rs) has only a 12-bit offset (u) since ten bits (bits 12-15 and 26-31) are used to identify the instruction as an LBU [Generic] instruction and five bits (bits 16-20) are used to explicitly identify the base register (rs). Accordingly, the LBU [GP] instruction increases the address range that can be accessed with this instruction by a factor of 2^6.
- Fig. 11 illustrates an example 32-bit format of a load half instruction (LH [GP]) 1102 that implicitly identifies the GP register as the base register.
- This instruction causes a half word of data (i.e., two bytes of data) at memory address (GP register + offset (u)) to be loaded into a specified general-purpose register (rt).
- Ten bits (bits 0, 18-20 and 26-31) are used to identify the instruction as a LH [GP] instruction, which leaves five bits (bits 21-25) to explicitly identify the general-purpose register (rt) and 18 bits for the offset (u) (17 bits which are explicitly specified plus bit 0, which must be 0 for the address to be half word aligned). This allows a range of 2^18 addresses from the GP to be directly accessed via this instruction.
- A generic load half instruction (LH [Generic]) 1104 that requires explicit identification of the base register (rs) has only a 12-bit offset (u) since ten bits (bits 12-15 and 26-31) are used to identify the instruction as a LH [Generic] instruction and five bits (bits 16-20) are used to explicitly identify the base register (rs). Accordingly, the LH [GP] instruction increases the address range that can be accessed with this instruction by a factor of 2^6.
- Fig. 12 illustrates an example 32-bit format of a load half unsigned instruction (LHU [GP]) 1202 that implicitly identifies the GP register as the base register.
- This instruction causes a half word of data (i.e., two bytes of data) at memory address (GP register + offset (u)) to be loaded into a specified general-purpose register (rt) as unsigned data.
- Ten bits (bits 0, 18-20 and 26-31) are used to identify the instruction as an LHU [GP] instruction, which leaves five bits (bits 21-25) to explicitly identify the general-purpose register (rt) and 18 bits for the offset (u) (17 bits which are explicitly specified plus bit 0, which must be 0 to be half word aligned). This allows a range of 2^18 addresses from the GP to be directly accessed via this instruction.
- A generic load half unsigned instruction (LHU [Generic]) 1204 that requires explicit identification of the base register (rs) has only a 12-bit offset (u) since ten bits (bits 12-15 and 26-31) are used to identify the instruction as an LHU [Generic] instruction and five bits (bits 16-20) are used to explicitly identify the base register (rs). Accordingly, the LHU [GP] instruction increases the address range that can be accessed with this instruction by a factor of 2^6.
- Fig. 13 illustrates an example 32-bit format of a load word instruction (LW [GP]) 1302 that implicitly identifies the GP register as the base register.
- This instruction causes a word of data (i.e., four bytes of data) at memory address (GP register + offset (u)) to be loaded into a specified general-purpose register (rt).
- Eight bits (bits 0-1 and 26-31) are used to identify the instruction as a LW [GP] instruction, which leaves five bits (bits 21-25) to explicitly identify the general-purpose register (rt) and 21 bits for the offset (u) (19 specified bits plus the last two bits, which must be zero to be word aligned).
- A generic load word instruction (LW [Generic]) 1304 that requires explicit identification of the base register (rs) has only a 12-bit offset (u) since ten bits (bits 12-15 and 26-31) are used to identify the instruction as a LW [Generic] instruction and five bits (bits 16-20) are used to explicitly identify the base register (rs). Accordingly, the LW [GP] instruction increases the address range that can be accessed with this instruction by a factor of 2^9.
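- For the word-aligned LW [GP] format, the two low offset bits are implied zeros rather than encoded. The sketch below assumes the 19 specified offset bits occupy instruction bits 2-20, which is what remains once bits 0-1, 21-25 and 26-31 are accounted for; the function name is hypothetical and the identifying bits are not checked.

```python
def decode_lw_gp(instruction: int):
    """Return (rt, u) from a 32-bit LW [GP] encoding."""
    rt = (instruction >> 21) & 0x1F
    # Bits 2-20 carry u[20:2]; masking out bits 0-1 leaves u[1:0] as zero,
    # so the offset is always a multiple of four.
    u = instruction & 0x1FFFFC
    return rt, u


word = (3 << 21) | (0x12345 << 2) | 0b10  # low two bits are identifying bits
rt, u = decode_lw_gp(word)
print(rt, hex(u))  # 3 0x48d14
```

- The largest offset under this layout is 0x1FFFFC, so the reachable span is 2^21 bytes, a factor of 2^9 more than a 12-bit offset.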
- Fig. 32 illustrates an example 32-bit format of a load word unsigned instruction (LWU [GP]) 3202 that implicitly identifies the GP register as the base register.
- This instruction causes a word of data (i.e., four bytes of data) at memory address (GP register + offset (u)) to be loaded into a specified general-purpose register (rt) as unsigned data.
- 11 bits (bits 0-1, 18-20, and 26-31) are used to identify the instruction as an LWU [GP] instruction, which leaves five bits (bits 21-25) to explicitly identify the general-purpose register (rt) and 18 bits for the offset (u) (16 specified bits plus the last two bits, which are set to zero to be word aligned).
- A generic load word unsigned instruction (LWU [Generic]) 3204 that requires explicit identification of the base register (rs) has only a 12-bit offset (u) since ten bits (bits 12-15 and 26-31) are used to identify the instruction as an LWU [Generic] instruction and five bits (bits 16-20) are used to explicitly identify the base register (rs). Accordingly, the LWU [GP] instruction increases the address range that can be accessed with this instruction by a factor of 2^6.
- FIG. 14 illustrates an example 32-bit format of a load double instruction (LD [GP]) 1402 that implicitly identifies the GP register as the base register.
- This instruction causes a double word of data (i.e., eight bytes of data) at memory address (GP register + offset (u)) to be loaded into a specified general-purpose register (rt).
- Nine bits (bits 0-2 and 26-31) are used to identify the instruction as a LD [GP] instruction, which leaves five bits (bits 21-25) to explicitly identify the general-purpose register (rt) and 21 bits for the offset (u) (18 specified bits plus the last three bits, which must be zero to be double word aligned).
- A generic load double instruction (LD [Generic]) 1404 that requires explicit identification of the base register (rs) has only a 12-bit offset (u) since ten bits (bits 12-15 and 26-31) are used to identify the instruction as a LD [Generic] instruction and five bits (bits 16-20) are used to explicitly identify the base register (rs). Accordingly, the LD [GP] instruction increases the address range that can be accessed with this instruction by a factor of 2^9.
- Fig. 15 illustrates an example 32-bit format of a store byte instruction (SB [GP]) 1502 that implicitly identifies the GP register as the base register.
- This instruction causes the byte of data in a specified general-purpose register (rt) to be stored at the memory address (GP register + offset (u)).
- A generic store byte instruction (SB [Generic]) 1504 that requires explicit identification of the base register (rs) has only a 12-bit offset (u) since ten bits (bits 12-15 and 26-31) are used to identify the instruction as a SB [Generic] instruction and five bits (bits 16-20) are used to explicitly identify the base register (rs). Accordingly, the SB [GP] instruction increases the address range that can be accessed with this instruction by a factor of 2^6.
- Fig. 16 illustrates an example 32-bit format of a store half instruction (SH [GP]) 1602 that implicitly identifies the GP register as the base register.
- This instruction causes the half word (i.e., two bytes) of data in a specified general-purpose register (rt) to be stored at the memory address (GP register + offset (u)).
- A generic store half instruction (SH [Generic]) 1604 that requires explicit identification of the base register (rs) has only a 12-bit offset (u) since ten bits (bits 12-15 and 26-31) are used to identify the instruction as an SH [Generic] instruction and five bits (bits 16-20) are used to explicitly identify the base register (rs). Accordingly, the SH [GP] instruction increases the address range that can be accessed with this instruction by a factor of 2^6.
- FIG. 17 illustrates an example 32-bit format of a store word instruction (SW [GP]) 1702 that implicitly identifies the GP register as the base register.
- This instruction causes the word (i.e., four bytes) of data in a specified general-purpose register (rt) to be stored at the memory address (GP register + offset (u)).
- A generic store word instruction (SW [Generic]) 1704 that requires explicit identification of the base register (rs) has only a 12-bit offset (u) since ten bits (bits 12-15 and 26-31) are used to identify the instruction as a SW [Generic] instruction and five bits (bits 16-20) are used to explicitly identify the base register (rs). Accordingly, the SW [GP] instruction increases the address range that can be accessed with this instruction by a factor of 2^9.
- Fig. 18 illustrates an example 32-bit format of a store double instruction (SD [GP]) 1802 that implicitly identifies the GP register as the base register.
- This instruction causes the double word (i.e., eight bytes) of data in a specified general-purpose register (rt) to be stored at the memory address (GP register + offset (u)).
- Nine bits (bits 0-2 and 26-31) are used to identify the instruction as a SD [GP] instruction, which leaves five bits (bits 21-25) to explicitly identify the general-purpose register (rt) and 21 bits for the offset (u) (18 specified bits plus bits 0-2, which must be zero to be double word aligned). This allows a range of 2^21 addresses from the GP to be directly accessed via this instruction.
- A generic store double instruction (SD [Generic]) 1804 that requires explicit identification of the base register (rs) has only a 12-bit offset (u) since ten bits (bits 12-15 and 26-31) are used to identify the instruction as a SD [Generic] instruction and five bits (bits 16-20) are used to explicitly identify the base register (rs).
- Accordingly, the SD [GP] instruction increases the address range that can be accessed with this instruction by a factor of 2^9.
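- The range factors quoted for the load and store formats all follow from the difference in offset width between the GP-relative form and the generic form. The sketch below tabulates the widths that are stated explicitly in the text; the dictionary and function names are illustrative only.

```python
# (GP-relative offset bits, generic offset bits) as stated in the description.
OFFSET_BITS = {
    "LB": (18, 12),
    "LBU": (18, 12),
    "LH": (18, 12),
    "LHU": (18, 12),
    "LW": (21, 12),
    "LWU": (18, 12),
    "LD": (21, 12),
    "SD": (21, 12),
}


def range_factor(gp_bits: int, generic_bits: int) -> int:
    """Ratio of reachable addresses: 2^gp_bits / 2^generic_bits."""
    return 1 << (gp_bits - generic_bits)


for name, (gp_bits, generic_bits) in OFFSET_BITS.items():
    print(f"{name}: 2^{gp_bits - generic_bits} = {range_factor(gp_bits, generic_bits)}")
```

- An 18-bit offset against a 12-bit offset gives a factor of 64 (2^6), and a 21-bit offset gives a factor of 512 (2^9).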
- Figs. 19-22 illustrate example 32-bit formats of load and store floating point instructions that implicitly identify the GP register as the base register.
- Fig. 19 illustrates an example 32-bit format of a load word floating point instruction (LWC1 [GP]) 1902.
- Fig. 20 illustrates an example 32-bit format of a store word floating point instruction (SWC1 [GP]) 2002.
- Fig. 21 illustrates an example 32-bit format of a load double floating point instruction (LDC1 [GP]) 2102.
- Fig. 22 illustrates an example 32-bit format of a store double floating point instruction (SDC1 [GP]) 2202.
- the load instructions 1902 and 2102 cause the word (i.e., four bytes) or double (i.e., eight bytes) at the memory address (GP register + offset (u)) to be loaded into a specified floating point register (ft); and the store instructions 2002 and 2202 cause the word (i.e., four bytes) or double (i.e., eight bytes) of data in a specified floating point register (ft) to be stored at the memory address (GP register + offset (u)).
- Fig. 23 illustrates an example 16-bit format of a load word instruction (LW [GP16]) 2302 that implicitly identifies the GP register as the base register.
- This instruction causes the word (i.e., four bytes) of data stored at the memory address (GP register + offset (u)) to be loaded into the specified general-purpose register (rt). It can be seen that in this example only six bits (bits 10-15) are used to identify the instruction as a LW [GP16] instruction, which leaves three bits (bits 7-9) to explicitly identify the general-purpose register (rt) and nine bits for the offset (u) (seven specified bits plus bits 0 and 1, which must be zero to be word aligned).
- A generic 16-bit load word instruction (LW [16-Generic]) 2304 that requires explicit identification of the base register (rs3) has only a 6-bit offset (u) (four explicit bits plus bits 0 and 1, which must be zero to be word aligned) since six bits (bits 10-15) are used to identify the instruction as a LW [16-Generic] instruction and three bits (bits 7-9) are used to explicitly identify the base register (rs3). Accordingly, the LW [GP16] instruction increases the address range that can be accessed with this instruction by a factor of 2^3.
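- The 16-bit LW [GP16] format can be sketched in the same way. The layout assumed here places the seven specified offset bits in instruction bits 0-6 (the bits left over after bits 7-9 and 10-15), scaled by four; the function name is hypothetical, and the three-bit register field is returned as-is because its mapping to a physical register is not given in this passage.

```python
def decode_lw_gp16(halfword: int):
    """Return (rt3, u) from a 16-bit LW [GP16] encoding."""
    rt3 = (halfword >> 7) & 0x7       # bits 7-9: three-bit register field
    u = (halfword & 0x7F) << 2        # bits 0-6 hold u[8:2]; u[1:0] are zero
    return rt3, u


rt3, u = decode_lw_gp16((5 << 7) | 0x55)
print(rt3, hex(u))  # 5 0x154
```

- The nine-bit offset spans 2^9 byte addresses; a six-bit offset spans 2^6, which gives the stated factor of 2^3.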
- Fig. 24 illustrates an example 16-bit format of a store word instruction (SW [GP16]) 2402 that implicitly identifies the GP register as the base register.
- This instruction causes the word (i.e., four bytes) of data in a specified general-purpose register (rtz3) to be stored at the memory address (GP register + offset (u)).
- A generic 16-bit store word instruction (SW [16-Generic]) 2404 that requires explicit identification of the base register (rs3) has only a 6-bit offset (u) since six bits (bits 10-15) are used to identify the instruction as a SW [16-Generic] instruction and three bits (bits 4-6) are used to explicitly identify the base register (rs3). Accordingly, the SW [GP16] instruction increases the address range that can be accessed with this instruction by a factor of 2^3.
- Fig. 25 illustrates an example 32-bit format of an add immediate instruction (ADDIU [GP.B]) 2502 that implicitly identifies the GP register as the source register.
- This instruction causes the address of a byte of memory to be calculated from the GP plus a specified immediate (u).
- Nine bits (bits 18-20 and 26-31) are used to identify the instruction as an ADDIU [GP.B] instruction, which leaves five bits (bits 21-25) to explicitly identify the general-purpose register (rt) and 18 bits (bits 0-17) for the immediate (u). This allows a range of 2^18 addresses from the GP to be directly generated via this instruction.
- A generic add immediate unsigned instruction (ADDIU [32-Generic]) 2504 that requires explicit identification of the source register (rs) has only a 16-bit immediate (u) since six bits (bits 26-31) are used to identify the instruction as an ADDIU [32-Generic] instruction and five bits (bits 16-20) are used to explicitly identify the source register (rs). Accordingly, the ADDIU [GP.B] instruction increases the address range that can be generated with this instruction by a factor of 2^2.
- Fig. 26 illustrates an example 32-bit format of an add immediate instruction (ADDIU [GP.W]) 2602 that implicitly identifies the GP register as the source register.
- This instruction causes the address of a word of memory to be calculated from the GP plus a specified immediate (u).
- A generic add immediate unsigned instruction (ADDIU [32-Generic]) 2604 that requires explicit identification of the source register (rs) has only a 16-bit immediate (u) since six bits (bits 26-31) are used to identify the instruction as an ADDIU [32-Generic] instruction and five bits (bits 16-20) are used to explicitly identify the source register (rs). Accordingly, the ADDIU [GP.W] instruction increases the address range that can be generated with this instruction by a factor of 2^5.
- Fig. 27 illustrates an example 48-bit format of an add immediate instruction (ADDIU [GP48]) 2702 that implicitly identifies the GP register as the source register.
- This instruction causes the address of memory to be calculated from the GP plus a specified immediate (u) and stored in a specified general-purpose register.
- 11 bits (bits 32-36 and 42-47) are used to identify the instruction as an ADDIU [GP48] instruction, which leaves five bits (bits 37-41) to explicitly identify the general-purpose register (rt) and 32 bits for the offset (u). This allows an address range of 2^32 addresses from the GP to be directly generated via this instruction, which is the entire addressable memory in a 32-bit address space.
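- A sketch of the 48-bit ADDIU [GP48] address generation follows. It assumes the 32-bit offset occupies instruction bits 0-31 (the bits not used for identification or for rt) and that the sum wraps within a 32-bit address space; the function name is hypothetical and the identifying bits are not checked.

```python
MASK32 = 0xFFFFFFFF


def execute_addiu_gp48(instruction: int, gp: int):
    """Return (rt, address) where address = GP + u, wrapped to 32 bits."""
    rt = (instruction >> 37) & 0x1F   # bits 37-41: destination register
    u = instruction & MASK32          # bits 0-31: full 32-bit offset (assumed)
    return rt, (gp + u) & MASK32


rt, address = execute_addiu_gp48((9 << 37) | 0xFFFFFFF0, gp=0x20)
print(rt, hex(address))  # 9 0x10
```

- Because the offset is a full 32 bits, any address in a 32-bit space can be produced from any GP value under the wrap-around assumption.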
- Fig. 28 illustrates an example 32-bit format of an add immediate program counter instruction (ADDIUPC [32]) 2802 that implicitly identifies the program counter as the source register.
- This instruction causes an address of memory to be calculated from the next program counter plus a specified signed immediate (s) and stored in a specified general-purpose register (rt).
- Six bits (bits 26-31) are used to identify the instruction as an ADDIUPC [32] instruction, which leaves five bits (bits 21-25) to explicitly identify the general-purpose register (rt) and 22 bits for the immediate (s) (21 specified bits plus bit 0, which is set to zero). This allows an address range of 2^22 addresses from the program counter to be directly generated via this instruction.
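- The ADDIUPC [32] calculation can be sketched as a sign extension followed by an addition to the next program counter. The sketch takes the 22-bit immediate as an already assembled value, since the placement of its bits within the encoding is not stated here; the function names, the four-byte instruction length, and the 32-bit wrap-around are assumptions.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def addiupc_target(pc: int, s22: int, instruction_bytes: int = 4) -> int:
    """Address = next PC + sign-extended 22-bit immediate, wrapped to 32 bits."""
    if s22 & 1:
        raise ValueError("bit 0 of the immediate is always zero")
    next_pc = pc + instruction_bytes
    return (next_pc + sign_extend(s22, 22)) & 0xFFFFFFFF


print(hex(addiupc_target(0x1000, 0x000010)))  # 0x1014
print(hex(addiupc_target(0x1000, 0x3FFFFE)))  # 0x1002
```

- With a signed 22-bit immediate, the reachable targets lie within roughly 2 MB on either side of the next program counter.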
- Fig. 29 illustrates an example 32-bit format of an add immediate program counter instruction (ALUIPC) 2902 that implicitly identifies the program counter as the source register.
- This instruction causes an aligned address at an upper 20-bit immediate offset from the next program counter to be calculated and stored in a specified general-purpose register (rt).
- Seven bits (bits 1 and 26-31) are used to identify the instruction as an ALUIPC instruction, which leaves five bits (bits 21-25) to explicitly identify the general-purpose register (rt) and 20 bits to specify the immediate. This allows a 4KB aligned address to be generated from the program counter.
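- One plausible reading of the ALUIPC calculation is sketched below: the 20-bit immediate supplies the upper 20 bits of a 32-bit offset, the offset is added to the next program counter, and the low 12 bits of the result are cleared to give a 4KB aligned address. The function name, the four-byte instruction length, and the exact order of the add and the alignment are assumptions, not details confirmed by the text.

```python
def aluipc_target(pc: int, imm20: int, instruction_bytes: int = 4) -> int:
    """Return a 4KB aligned address near (next PC + (imm20 << 12))."""
    if not 0 <= imm20 < (1 << 20):
        raise ValueError("immediate must fit in 20 bits")
    next_pc = pc + instruction_bytes
    # The mask both wraps the sum to 32 bits and clears the low 12 bits.
    return (next_pc + (imm20 << 12)) & 0xFFFFF000


print(hex(aluipc_target(0x00401234, 0x00002)))  # 0x403000
```

- Every result is a multiple of 4096, so a following instruction only needs a 12-bit offset to reach any byte within the selected 4KB page.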
- Fig. 30 illustrates an example 48-bit format of a load word program counter instruction (LWPC [48]) 3002 that implicitly identifies the program counter as the base register.
- This instruction causes the word at memory address (program counter + offset (s)) to be loaded into a specified general-purpose register.
- 11 bits (bits 32-36 and 42-47) are used to identify the instruction as a LWPC [48] instruction, which leaves five bits (bits 37-41) to explicitly identify the general-purpose register (rt) and 32 bits for the offset (s). This allows an address range of 2^32 addresses from the program counter to be directly accessed via this instruction, which is any address in a 32-bit address space.
- Fig. 31 illustrates an example 48-bit format of a store word program counter instruction (SWPC [48]) 3102 that implicitly identifies the program counter as the base register.
- This instruction causes the word in a specified general-purpose register (rt) to be stored at the memory address (program counter + offset (s)).
- 11 bits (bits 32-36 and 42-47) are used to identify the instruction as a SWPC [48] instruction, which leaves five bits (bits 37-41) to explicitly identify the general-purpose register (rt) and 32 bits for the offset (s). This allows an address range of 2^32 addresses from the program counter to be directly accessed via this instruction, which is any address in a 32-bit address space.
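- The claim that a 32-bit offset reaches any address in a 32-bit space can be illustrated with a toy memory model. The helper names, the dictionary used as memory, and the modulo-2^32 address arithmetic are assumptions for this sketch; whether the base is the current or the next program counter is left to the caller.

```python
MASK32 = 0xFFFFFFFF


def offset_for(target: int, pc: int) -> int:
    """32-bit offset field that makes pc + s land on target (mod 2^32)."""
    return (target - pc) & MASK32


def pc_relative_address(pc: int, s32: int) -> int:
    return (pc + s32) & MASK32


def swpc(memory: dict, pc: int, s32: int, value: int) -> None:
    """Store a word at (program counter + offset)."""
    memory[pc_relative_address(pc, s32)] = value & MASK32


def lwpc(memory: dict, pc: int, s32: int) -> int:
    """Load the word at (program counter + offset); unset words read as 0."""
    return memory.get(pc_relative_address(pc, s32), 0)


memory = {}
s = offset_for(target=0x0FFC, pc=0x1000)   # target lies below the PC
swpc(memory, 0x1000, s, 0xDEADBEEF)
print(hex(s), hex(lwpc(memory, 0x1000, s)))  # 0xfffffffc 0xdeadbeef
```

- For any program counter and any target, `offset_for` yields a value that fits in the 32-bit field, which is why no base register setup is needed.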
- the data processing apparatus of Fig. 5 is shown as comprising a number of functional blocks. This is schematic only and is not intended to define a strict division between different logic elements of such entities. Each functional block may be provided in any suitable manner. It is to be understood that intermediate values described herein as being formed by the data processing apparatus need not be physically generated by the data processing apparatus at any point and may merely represent logical values which
- the data processing apparatus described herein may be embodied in hardware on an integrated circuit.
- the data processing apparatus described herein may be configured to perform any of the methods described herein.
- any of the functions, methods, techniques or components described above can be implemented in software, firmware, hardware (e.g., fixed logic circuitry), or any combination thereof.
- the terms "module", "functionality", "component", "element", "unit", "block" and "logic" may be used herein to generally represent software, firmware, hardware, or any combination thereof.
- the module, functionality, component, element, unit, block or logic represents program code that performs the specified tasks when executed on a processor.
- Examples of a computer-readable storage medium include a random-access memory (RAM), read-only memory (ROM), an optical disc, flash memory, hard disk memory, and other memory devices that may use magnetic, optical, and other techniques to store instructions or other data and that can be accessed by a machine.
- Computer program code and computer readable instructions refer to any kind of executable code for processors, including code expressed in a machine language, an interpreted language or a scripting language.
- Executable code includes binary code, machine code, bytecode, code defining an integrated circuit (such as a hardware description language or netlist), and code expressed in a programming language such as C, Java or OpenCL.
- Executable code may be, for example, any kind of software, firmware, script, module or library which, when suitably executed, processed, interpreted, compiled, or executed at a virtual machine or other software environment, causes a processor of the computer system at which the executable code is supported to perform the tasks specified by the code.
- An integrated circuit definition dataset may be, for example, an integrated circuit description.
- An integrated circuit definition dataset may be in the form of computer code, for example as a netlist, code for configuring a programmable chip, as a hardware description language defining hardware suitable for manufacture in an integrated circuit at any level, including as register transfer level (RTL) code, as high-level circuit representations such as Verilog or VHDL, and as low-level circuit representations such as OASIS (RTM) and GDSII.
- one or more intermediate user steps may be required in order for a computer system configured for generating a manufacturing definition of an integrated circuit to execute code defining an integrated circuit so as to generate the manufacturing definition of that integrated circuit.
- Fig. 33 shows an example of an integrated circuit (IC) manufacturing system 3302 which is configured to manufacture a data processing apparatus as described in any of the examples herein.
- the IC manufacturing system 3302 comprises a layout processing system 3304 and an integrated circuit generation system 3306.
- the IC manufacturing system 3302 is configured to receive an IC definition dataset (e.g., defining a data processing apparatus as described in any of the examples herein), process the IC definition dataset, and generate an IC according to the IC definition dataset (e.g., which embodies a data processing apparatus as described in any of the examples herein).
- the processing of the IC definition dataset configures the IC manufacturing system 3302 to manufacture an integrated circuit embodying a data processing apparatus as described in any of the examples herein.
- the layout processing system 3304 is configured to receive and process the IC definition dataset to determine a circuit layout.
- Methods of determining a circuit layout from an IC definition dataset are known in the art, and for example may involve synthesising RTL code to determine a gate level representation of a circuit to be generated, e.g., in terms of logical components (e.g., NAND, NOR, AND, OR, MUX and FLIP-FLOP components).
- a circuit layout can be determined from the gate level representation of the circuit by determining positional information for the logical components. This may be done automatically or with user involvement in order to optimize the circuit layout.
- the layout processing system 3304 may output a circuit layout definition to the IC generation system 3306.
- a circuit layout definition may be, for example, a circuit layout description.
- the IC generation system 3306 generates an IC according to the circuit layout definition, as is known in the art.
- the IC generation system 3306 may implement a semiconductor device fabrication process to generate the IC, which may involve a multiple-step sequence of photolithographic and chemical processing steps during which electronic circuits are gradually created on a wafer made of semiconducting material.
- the circuit layout definition may be in the form of a mask which can be used in a lithographic process for generating an IC according to the circuit definition.
- the circuit layout definition provided to the IC generation system 3306 may be in the form of computer-readable code which the IC generation system 3306 can use to form a suitable mask for use in generating an IC.
- the different processes performed by the IC manufacturing system 3302 may be implemented all in one location, e.g., by one party. Alternatively, the IC manufacturing system 3302 may be a distributed system such that some of the processes may be performed at different locations and may be performed by different parties. For example, some of the stages of: (i) synthesising RTL code representing the IC definition dataset to form a gate level representation of a circuit to be generated, (ii) generating a circuit layout based on the gate level representation, (iii) forming a mask in accordance with the circuit layout, and (iv) fabricating an integrated circuit using the mask, may be performed in different locations and/or by different parties.
- processing of the integrated circuit definition dataset at an integrated circuit manufacturing system may configure the system to manufacture a data processing apparatus without the IC definition dataset being processed so as to determine a circuit layout.
- an integrated circuit definition dataset may define the configuration of a reconfigurable processor, such as an FPGA, and the processing of that dataset may configure an IC manufacturing system to generate a reconfigurable processor having that defined configuration (e.g., by loading configuration data to the FPGA).
- an integrated circuit manufacturing definition dataset, when processed in an integrated circuit manufacturing system, may cause the integrated circuit manufacturing system to generate a device as described herein.
- the configuration of an integrated circuit manufacturing system in the manner described above with respect to Fig. 33 by an integrated circuit manufacturing definition dataset may cause a device as described herein to be manufactured.
- an integrated circuit definition dataset could include software which runs on hardware defined at the dataset or in combination with hardware defined at the dataset.
- the IC generation system may further be configured by an integrated circuit definition dataset to, on manufacturing an integrated circuit, load firmware onto that integrated circuit in accordance with program code defined at the integrated circuit definition dataset or otherwise provide program code with the integrated circuit for use with the integrated circuit.
- Fig. 34 is a flow diagram for decoding instructions at a data processing apparatus.
- Decoding instructions can include addresses of operands, where the operands can be located in global memory.
- the decoding instructions can include implicit GP relative addressing for global memory access.
- the flow 3400 includes receiving, at a decode unit, an instruction for execution 3410 by an execution unit of the data processing apparatus that specifies an operation to be performed.
- the data processing apparatus can include various computational architecture techniques, where the computational architecture techniques are based on corresponding instruction sets.
- the data processing apparatus can be a RISC.
- the instructions can include Boolean operations, arithmetic operations, vector operations, tensor operations, and data manipulation operations such as load, store, shift, rotate, complement, and so on.
- the instructions can take as inputs unsigned values, signed magnitude values, characters, integers, floating point values, radix point values (fixed or variable radix point), and the like.
- the instructions can enable or disable control signals, fire interrupts, handle interrupts, etc.
- the received instruction can be an instruction from an instruction set comprising one or more instructions that implicitly identify a GP register 3412 as an operand of the instruction.
- the GP register can store an address of global memory in which data is stored.
- the address that can be stored within the GP register can include an address, an immediate address, a relative address, or the like.
- the GP register can be a register that can be configured or loaded with a value that can point to an address in memory.
- the address in memory can include data (rather than code), an address to further data such as an indirect address, a relative address or offset, and so on.
- the GP register can aid access to data in memory by providing a reference for storing or loading the data. The reference to the data provides access to the data, wherever the data is needed and by whichever processor, rather than having to explicitly transfer the data.
- the GP register may be accessed by more than one processor, thus further reducing data transfer requirements.
- the GP register may be one of a plurality of general-purpose registers.
- the general-purpose registers can store instructions or data and may perform other operations such as accumulation.
- the GP register can be a dedicated register separate from the general-purpose registers discussed elsewhere or can be a specific general-purpose register.
- the general-purpose register can be defined (e.g., by software convention or the ISA) as the register used to store the GP address. How the GP register is configured and used to access data in memory may differ based on the type of code being run on the data processing apparatus. In particular, the use and contents of the GP register may depend on whether the code is PIC or regular, position dependent code (non-position-independent code).
- the flow 3400 includes decoding, at the decode unit, the received instruction 3420 to determine whether the received instruction is one of the one or more instructions that implicitly identify the GP register as an operand of the instruction.
- the GP register can include data, an address, an indirect address, a relative address, an index, and so on.
- the global pointer register can include an operand.
- the one or more instructions that implicitly identify the GP register as an operand of the instruction include one or more register arithmetic instructions that implicitly identify the GP register as a source register of the instruction.
- Register arithmetic instructions can include addition, subtraction, multiplication, division, shifting, rotating, complementing, and the like.
- the one or more register arithmetic instructions can include a first register arithmetic instruction to add an immediate value in a first unit to the address of the GP register, and a second register arithmetic instruction to add an immediate value in a second unit to the address of the GP register.
- the first immediate value and the second immediate value can include indexes, offset, indirections, etc.
- the one or more register arithmetic instructions comprise at least two register arithmetic instructions with different bit lengths. As discussed throughout, the different bit lengths for the register arithmetic instructions may be due to register arithmetic instructions including immediate values.
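The two register arithmetic instructions described above, each adding an immediate value to the address held in the GP register but counting that immediate in a different unit, can be sketched as follows. This is a minimal illustration only: the function names, the 32-bit address width, and the choice of byte and 4-byte word units are assumptions, not encodings taken from this disclosure.

```python
ADDR_MASK = 0xFFFFFFFF  # assumed 32-bit address space


def add_gp_imm_bytes(gp: int, imm: int) -> int:
    """Hypothetical 'add immediate to GP' with the immediate in byte units."""
    return (gp + imm) & ADDR_MASK


def add_gp_imm_words(gp: int, imm: int) -> int:
    """Hypothetical variant with the immediate in 4-byte word units."""
    return (gp + (imm << 2)) & ADDR_MASK


gp = 0x1000_0000  # example GP value (assumption)
byte_result = add_gp_imm_bytes(gp, 6)  # GP + 6 bytes
word_result = add_gp_imm_words(gp, 6)  # GP + 6 words = GP + 24 bytes
```

Counting the immediate in a larger unit lets the same number of immediate bits reach further from GP, at the cost of only producing suitably aligned addresses.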
- the one or more load instructions include a first load instruction to load data of a first size from the global memory, and a second load instruction to load data of a second size from the global memory.
- the data sizes can include bits, nibbles, or bytes; words or fractions of words such as half-words, quarter-words, etc.; multiple words, where the multiple words may represent long variable values, floating-point values; and so on.
- the first load instruction can have a different number of offset bits than the second load instruction.
- the offset bits can determine an index or relative address.
- the different number of offset bits may result from load instructions that access different data types, where the different data types can include bit, byte, or word; integer, real, or float; character; etc.
- the one or more load instructions comprise at least two load instructions with different bit lengths.
- the different load instruction bit lengths can be based on immediate data that can be included within the instruction.
- the immediate data can include bytes, fractions of words, words, etc.
- the one or more load instructions can include one or more load instructions to load data into a first type of register and one or more load instructions to load data into a second type of register.
- the first register and the second register can include general-purpose registers, special-purpose registers, accumulators, local pointer registers, GP registers, etc.
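The load instructions of different data sizes described above can be sketched as a small model in which a byte load and a word load both use GP as the implied base. The memory layout, the GP value, the little-endian byte order, and the function names below are all assumptions made for illustration; the disclosure does not fix them in this passage.

```python
import struct

# Assumed layout: a small little-endian global data area that GP points into.
GLOBAL_MEMORY = bytearray(64)
GP = 16  # hypothetical GP value, used here as a byte index into GLOBAL_MEMORY


def load_byte_gp(offset: int) -> int:
    """Hypothetical byte load: the offset is in bytes, relative to GP."""
    return GLOBAL_MEMORY[GP + offset]


def load_word_gp(word_offset: int) -> int:
    """Hypothetical word load: the offset field counts 4-byte words, so the
    same reach needs two fewer offset bits than the byte load."""
    return struct.unpack_from("<I", GLOBAL_MEMORY, GP + 4 * word_offset)[0]


# Place a word at GP + 8, then read it back at two different sizes.
struct.pack_into("<I", GLOBAL_MEMORY, GP + 8, 0xDEADBEEF)
byte_value = load_byte_gp(8)  # lowest-addressed byte of the stored word
word_value = load_word_gp(2)  # the whole word at GP + 8
```

Neither function takes a base register argument, which mirrors the idea that the instruction itself, rather than a register field, selects GP.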
- Instructions other than load instructions can be executed. Recall that the value within a given GP register can include an immediate value, an indirection, an index, an offset, a displacement, and the like.
- the one or more instructions, when executed, can perform load operations, store operations, etc.
- one or more displacement memory access instructions can include one or more store instructions.
- the one or more store instructions can store data of various sizes such as bit, byte, fraction of word, word, multiple word, etc.
- the one or more store instructions can also store data including a variety of data types such as unsigned, signed magnitude, two's complement, real, floating-point, character, string, and the like.
- the one or more store instructions can include a first store instruction to store data of a first size in the global memory, and a second store instruction to store data of a second size in the global memory.
- the different data sizes can result from different data types, different numerical precisions, etc.
- the first store instruction can have a different number of offset bits than the second store instruction. The different number of offset bits can be related to different data types, different data precisions, and the like.
- the one or more store instructions can include at least two store instructions with different bit lengths. Other embodiments can include more than two store instructions with different bit lengths.
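To illustrate how store instructions of different data sizes can carry different numbers of offset bits, the sketch below scales the offset field by the data size and checks that it fits. The field widths (10 bits for byte stores, 8 bits for word stores), the memory size, the GP value, and the name `store_gp` are hypothetical values chosen for the example.

```python
import struct

MEMORY = bytearray(1 << 12)  # assumed 4 KiB global memory
GP = 0x100                   # hypothetical GP value

# Assumed offset field widths (in bits) per store size; illustrative only.
OFFSET_BITS = {1: 10, 4: 8}
FORMATS = {1: "<B", 4: "<I"}


def store_gp(size: int, offset: int, value: int) -> int:
    """Store `value` of `size` bytes at GP + offset and return the address."""
    if offset % size != 0:
        raise ValueError("offset must be a multiple of the data size")
    field = offset // size  # value that would be encoded in the instruction
    if not 0 <= field < (1 << OFFSET_BITS[size]):
        raise ValueError("offset does not fit in the offset field")
    address = GP + offset
    struct.pack_into(FORMATS[size], MEMORY, address, value)
    return address


byte_addr = store_gp(1, 3, 0x7F)           # byte store at GP + 3
word_addr = store_gp(4, 1020, 0x12345678)  # word store at GP + 1020
```

With these assumed widths both stores reach the same 1024-byte window above GP, even though the word store spends two fewer bits on its offset.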
- the flow 3400 includes outputting one or more control signals 3430.
- the control signals can include one or more fire signals, one or more done signals, etc.
- the control signals can include interrupt signals.
- the outputting of the control signals can cause the execution unit to perform the specified operation 3432 with the GP register as an operand.
- the specific operation can include a Boolean operation, an arithmetic operation, a vector operation, a tensor operation, a data transfer operation, etc.
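The receive, decode, and output steps of flow 3400 can be sketched as a single decode function. The opcode names, the `ControlSignals` structure, and the register names are hypothetical stand-ins; the sketch only shows the decision the decode unit makes, not any real hardware interface.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical opcodes standing in for "instructions that implicitly
# identify the GP register as an operand".
IMPLICIT_GP_OPCODES = {"LW_GP", "SW_GP", "ADDIU_GP"}


@dataclass
class ControlSignals:
    operation: str
    operand_register: Optional[str]  # "GP" when the operand is implicit
    immediate: int


def decode(opcode: str, immediate: int,
           register: Optional[str] = None) -> ControlSignals:
    """Decode one instruction and emit control signals for the execution unit."""
    if opcode in IMPLICIT_GP_OPCODES:
        # No register field is consulted: the opcode alone selects GP.
        return ControlSignals(opcode, "GP", immediate)
    return ControlSignals(opcode, register, immediate)


implicit = decode("LW_GP", 0x40)
explicit = decode("LW", 0x40, register="r5")
```

In the implicit case the caller supplies no register at all, which corresponds to an instruction encoding that has no base or source register field.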
- the number of bits in the received instruction can depend on the type of instruction, the type of data, if any, upon which the instruction operates, etc.
- the number of bits in the received instruction can depend on the type of register with which the instruction interacts.
- the one or more instructions that implicitly identify the GP register as an operand of the instruction can include one or more displacement memory access instructions.
- the displacement memory access instructions can implicitly identify the GP register as a base register of the instruction.
- the one or more control signals cause the execution unit to perform an access of an address of the global memory based on the address stored in the GP register and an offset specified in the received instruction.
- the one or more displacement memory access instructions can include one or more load instructions, one or more store instructions, etc.
- a number of bits of the received instruction allocated to the offset can be greater than a number of bits allocated to an offset in a corresponding displacement memory access instruction that explicitly identifies the base register.
- the excess bits can be ignored, can be used to access a corresponding displacement memory from a plurality of displacement memories, or the like.
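The bit budget behind the larger offset field can be made concrete with a short calculation. The field widths below (32-bit instruction, 6-bit opcode, 5-bit register fields) are assumptions chosen for illustration, and the sketch assumes the opcode field is the same size in both forms; actual encodings may differ.

```python
INSTRUCTION_BITS = 32  # assumed instruction width
OPCODE_BITS = 6        # assumed opcode field
DATA_REG_BITS = 5      # assumed field naming the register loaded or stored
BASE_REG_BITS = 5      # field freed when the base register is implicit


def offset_bits(explicit_base: bool) -> int:
    """Bits left for the offset once the other fields are allocated."""
    used = OPCODE_BITS + DATA_REG_BITS
    if explicit_base:
        used += BASE_REG_BITS
    return INSTRUCTION_BITS - used


def effective_address(gp: int, offset: int) -> int:
    """Address accessed by a GP-relative displacement instruction."""
    return (gp + offset) & 0xFFFFFFFF


explicit_bits = offset_bits(explicit_base=True)
implicit_bits = offset_bits(explicit_base=False)
reach_ratio = (1 << implicit_bits) // (1 << explicit_bits)
```

Under these assumptions the explicit form leaves 16 offset bits and the implicit GP form leaves 21, so the implicit form can address a window 32 times larger around the address held in GP.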
- the various registers can contain an operand of an instruction.
- Other registers similarly may be used to contain an operand of an instruction.
- the one or more store instructions can include one or more store instructions to store data from a first type of register and one or more store instructions to store data from a second type of register.
- the instruction set can further include one or more instructions that implicitly identify a program counter of the data processing apparatus as an operand of the instruction.
- the program counter can be a program counter for tracking decoding or executing instructions from a program that can include multiple threads or can be executed on multiple processors, a program counter for enumerating the decoding and executing instructions on a given processor, etc.
- the flow 3400 includes determining whether the received instruction is one of the one or more instructions that implicitly identify the program counter 3440 as an operand of the instruction.
- the program counter can be implemented as a register or by using another architectural technique, and can store an instruction count, an instruction, data, and so on.
- the one or more instructions that implicitly identify the program counter as an operand of the instruction include an instruction to cause an aligned address at an upper N-bit immediate offset from the program counter to be calculated, where N is an integer greater than two.
- the flow 3400 includes outputting 3450 one or more control signals to cause the execution unit to perform the operation with the program counter as an operand.
- the control signals can include one or more fire signals, one or more done signals, one or more interrupts, and the like.
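One possible reading of the instruction that calculates an aligned address at an upper N-bit immediate offset from the program counter is sketched below. The 32-bit address width, the treatment of the immediate as unsigned, and the function name are assumptions; the disclosure only requires that N be an integer greater than two.

```python
ADDRESS_BITS = 32  # assumed address width


def aligned_upper_pc_address(pc: int, imm: int, n: int) -> int:
    """Add an N-bit immediate, placed in the upper N address bits, to the
    program counter, then clear the low (ADDRESS_BITS - N) bits of the sum."""
    if not 0 <= imm < (1 << n):
        raise ValueError("immediate does not fit in N bits")
    shift = ADDRESS_BITS - n
    mask = (1 << ADDRESS_BITS) - 1
    total = (pc + (imm << shift)) & mask
    return total & ~((1 << shift) - 1)


# With N = 20 the result is aligned to a 4 KiB boundary.
result = aligned_upper_pc_address(pc=0x0040_1234, imm=0x00010, n=20)
```

A follow-on instruction with a short low-order offset could then reach any address inside the aligned block, which is the usual motivation for pairing an upper-immediate instruction with a displacement access.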
- Fig. 35 is a diagram of a system for decoding instructions at a data processing apparatus.
- the system 3500 can include one or more processors 3510 coupled to a memory 3512 which stores instructions.
- the system 3500 can include a display 3514 coupled to the one or more processors 3510 for displaying data, intermediate steps, instructions, GP registers, program counters, instruction counters, and so on.
- one or more processors 3510 are attached to the memory 3512 where the one or more processors, when executing the instructions which are stored, are configured to: receive, at a decode unit, an instruction for execution by an execution unit of the data processing apparatus that specifies an operation to be performed, the received instruction being an instruction from an instruction set comprising one or more instructions that implicitly identify a GP register as an operand of the instruction, the GP register storing an address of global memory in which data is stored; decode, at the decode unit, the received instruction to determine whether the received instruction is one of the one or more instructions that implicitly identify the GP register as an operand of the instruction; and in response to determining, at the decode unit, that the received instruction is one of the one or more instructions that implicitly identify a GP register as an operand of the instruction, output one or more control signals to cause the execution unit to perform the specified operation with the GP register as an operand.
- the system 3500 can include a collection of instructions and data 3520.
- the instructions and data 3520 may be stored in a database, one or more statically linked libraries, one or more dynamically linked libraries, precompiled headers, source code, and so on.
- the instructions and data can include flow graphs, agents, or other suitable
- the instructions can include instructions for implicit GP relative addressing for global memory access, where the processors can include processing elements in a reconfigurable fabric.
- the system 3500 can include a receiving component 3530.
- the receiving component can include functions and instructions for receiving, at a decode unit, an instruction for execution by an execution unit of the data processing apparatus that specifies an operation to be performed, the received instruction being an instruction from an instruction set comprising one or more instructions that implicitly identify a GP register as an operand of the instruction, the GP register storing an address of global memory in which data is stored.
- the global memory in which data is stored can include memory such as storage elements within a reconfigurable fabric, direct memory access (DMA) memory, a hybrid memory cube (HMC), a distributed memory, and so on.
- the system 3500 can include a decoding component 3540.
- the decoding component 3540 can include functions and instructions for decoding, at the decode unit, the received instruction to determine whether the received instruction is one of the one or more instructions that implicitly identify the GP register as an operand of the instruction.
- the instructions that can be decoded can include Boolean operations, arithmetic operations, data transfer instructions, and so on.
- the data transfer instructions can include load or store instructions, where the load or store instructions can load or store data of various types to local memory, registers such as GP registers, program counters, etc.
- the system 3500 can include an outputting component 3550.
- the outputting component 3550 can output one or more control signals, where the one or more control signals can result in an operation being executed.
- the outputting component in response to determining, at the decode unit, that the received instruction is one of the one or more instructions that implicitly identify a GP register as an operand of the instruction, can output one or more control signals to cause the execution unit to perform the specified operation with the GP register as an operand.
- the decode unit can implicitly identify other components as an operand of an instruction.
- in response to determining that the received instruction is one of one or more instructions that implicitly identify a program counter of the data processing apparatus as an operand of the instruction, the outputting component outputs one or more control signals to cause the execution unit to perform the operation with the program counter as an operand.
- the system 3500 can include a computer program product embodied in a non-transitory computer readable medium for decoding instructions at a data processing apparatus, the computer program product comprising code which causes one or more processors to perform operations of: receiving, at a decode unit, an instruction for execution by an execution unit of the data processing apparatus that specifies an operation to be performed, the received instruction being an instruction from an instruction set comprising one or more instructions that implicitly identify a GP register as an operand of the instruction, and the GP register storing an address of global memory in which data is stored; decoding, at the decode unit, the received instruction to determine whether the received instruction is one of the one or more instructions that implicitly identify the GP register as an operand of the instruction; and in response to determining, at the decode unit, that the received instruction is one of the one or more instructions that implicitly identify a GP register as an operand of the instruction, outputting one or more control signals to cause the execution unit to perform the specified operation with the GP register as an operand.
- Each of the above methods may be executed on one or more processors on one or more computer systems.
- Embodiments may include various forms of distributed computing, client/server computing, and cloud-based computing.
- the depicted steps or boxes contained in this disclosure's flow charts are solely illustrative and explanatory. The steps may be modified, omitted, repeated, or reordered without departing from the scope of this disclosure. Further, each step may contain one or more sub-steps. While the foregoing drawings and description set forth functional aspects of the disclosed systems, no particular implementation or arrangement of software and/or hardware should be inferred from these descriptions unless explicitly stated or otherwise clear from the context. All such arrangements of software and/or hardware are intended to fall within the scope of this disclosure.
- The block diagrams and flowchart illustrations depict methods, apparatus, systems, and computer program products.
- the elements and combinations of elements in the block diagrams and flow diagrams show functions, steps, or groups of steps of the methods, apparatus, systems, computer program products and/or computer-implemented methods. Any and all such functions, generally referred to herein as a "circuit," "module," or "system," may be implemented by computer program instructions, by special-purpose hardware-based computer systems, by combinations of special-purpose hardware and computer instructions, by combinations of general-purpose hardware and computer instructions, and so on.
- a programmable apparatus which executes any of the above-mentioned computer program products or computer-implemented methods may include one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors, programmable devices, programmable gate arrays, programmable array logic, memory devices, application specific integrated circuits, or the like. Each may be suitably employed or configured to process computer program instructions, execute computer logic, store computer data, and so on.
- a computer may include a computer program product from a computer-readable storage medium, and this medium may be internal or external, removable and replaceable, or fixed.
- a computer may include a Basic Input/Output System (BIOS), firmware, an operating system, a database, or the like that may include, interface with, or support the software and hardware described herein.
- Embodiments of the present invention are limited to neither conventional computer applications nor the programmable apparatus that run them.
- the embodiments of the presently claimed invention could include an optical computer, quantum computer, analog computer, or the like.
- a computer program may be loaded onto a computer to produce a particular machine that may perform any and all of the depicted functions. This particular machine provides a means for carrying out any and all of the depicted functions.
- any combination of one or more computer readable media may be utilized including but not limited to: a non-transitory computer readable medium for storage; an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor computer readable storage medium or any suitable combination of the foregoing; a portable computer diskette; a hard disk; a random access memory (RAM); a read-only memory (ROM); an erasable programmable read-only memory (EPROM, Flash, MRAM, FeRAM, or phase change memory); an optical fiber; a portable compact disc; an optical storage device; a magnetic storage device; or any suitable combination of the foregoing.
- a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- computer program instructions may include computer executable code.
- languages for expressing computer program instructions may include without limitation C, C++, Java, JavaScript™, ActionScript™, assembly language, Lisp, Perl, Tcl, Python, Ruby, hardware description languages, database programming languages, functional programming languages, imperative programming languages, and so on.
- computer program instructions may be stored, compiled, or interpreted to run on a computer, a programmable data processing apparatus, a heterogeneous combination of processors or processor architectures, and so on.
- embodiments of the present invention may take the form of web-based computer software, which includes client/server software, software-as-a-service, peer-to-peer software, or the like.
- a computer may enable execution of computer program instructions including multiple programs or threads.
- the multiple programs or threads may be processed approximately simultaneously to enhance utilization of the processor and to facilitate substantially simultaneous functions.
- any and all methods, program codes, program instructions, and the like described herein may be implemented in one or more threads which may in turn spawn other threads, which may themselves have priorities associated with them.
- a computer may process these threads based on priority or other order.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Executing Machine-Instructions (AREA)
Abstract
Instruction set architectures (ISAs) and associated apparatus and methods are disclosed that include an instruction set comprising one or more instructions that identify the global pointer (GP) register as an operand (e.g., base register or source register) of the instruction. The identification can be implicit. By implicitly identifying the GP register as an operand of the instruction, one or more bits of the instruction that were reserved for explicitly identifying the operand (e.g., base register or source register) can be used to extend the size of one or more other operands, such as the offset operand or the immediate operand, to provide longer offset operands or immediate operands.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762552855P | 2017-08-31 | 2017-08-31 | |
US62/552,855 | 2017-08-31 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019046723A1 (fr) | 2019-03-07 |
Family
ID=65526125
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2018/049099 WO2019046723A1 (fr) | 2017-08-31 | 2018-08-31 | Adressage relatif de pointeur global implicite pour accès à la mémoire globale |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2019046723A1 (fr) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130024663A1 (en) * | 2011-07-19 | 2013-01-24 | Qualcomm Incorporated | Table Call Instruction for Frequently Called Functions |
US20130246765A1 (en) * | 2008-09-09 | 2013-09-19 | Renesas Electronics Corporation | Data processor |
US20150160981A1 (en) * | 2012-08-30 | 2015-06-11 | Imagination Technologies Limited | Global Register Protection In A Multi-Threaded Processor |
US20160179534A1 (en) * | 2014-12-23 | 2016-06-23 | Polychronis Xekalakis | Instruction length decoding |
WO2017112176A1 (fr) * | 2015-12-21 | 2017-06-29 | Intel Corporation | Instructions et logique pour des opérations de chargement d'indices et de prélecture de regroupements |
- 2018-08-31 WO PCT/US2018/049099 patent/WO2019046723A1/fr active Application Filing
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022242291A1 (fr) * | 2021-05-20 | 2022-11-24 | Huawei Technologies Co., Ltd. | Procédé et système d'optimisation de calculs d'adresse |
US12118359B2 (en) | 2021-05-20 | 2024-10-15 | Huawei Technologies Co., Ltd. | Method and system for optimizing address calculations |
CN113867971A (zh) * | 2021-12-03 | 2021-12-31 | 北京壁仞科技开发有限公司 | 访问图形处理器的内存的方法、设备、系统和存储介质 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20250130806A1 (en) | Implicit Global Pointer Relative Addressing for Global Memory Access | |
US10514922B1 (en) | Transfer triggered microcontroller with orthogonal instruction set | |
US10768930B2 (en) | Processor supporting arithmetic instructions with branch on overflow and methods | |
ES2903001T3 (es) | Aparatos y métodos de hardware para detección de corrupción de memoria | |
US10671391B2 (en) | Modeless instruction execution with 64/32-bit addressing | |
TWI489386B (zh) | 由多個指令集使用之暫存器之間的映射 | |
US7473293B2 (en) | Processor for executing instructions containing either single operation or packed plurality of operations dependent upon instruction status indicator | |
KR101597774B1 (ko) | 마스킹된 전체 레지스터 액세스들을 이용한 부분적 레지스터 액세스들을 구현하기 위한 프로세서들, 방법들 및 시스템들 | |
TWI578159B (zh) | 用以提供基底暫存器交換狀態驗證功能之指令及邏輯 | |
ES2934513T3 (es) | Sistemas y métodos para omitir operaciones matriciales intrascendentes | |
JP2019197531A (ja) | 連鎖タイル演算を実施するためのシステムおよび方法 | |
CN104951296A (zh) | 允许一种架构的代码模块使用另一种架构的库模块的架构间兼容模块 | |
CN108885551B (zh) | 存储器复制指令、处理器、方法和系统 | |
WO2010004245A1 (fr) | Processeur à instruction de poussée | |
KR20010043826A (ko) | 마이크로 컨트롤러 명령어 집합 | |
KR20170097626A (ko) | 벡터 인덱스 로드 및 저장을 위한 방법 및 장치 | |
CN103270489B (zh) | 用于进行段寄存器读和写而不管特权等级的系统、装置和方法 | |
WO2019046723A1 (fr) | Adressage relatif de pointeur global implicite pour accès à la mémoire globale | |
CN116166369A (zh) | 基于硬件识别指令来启用主机穿透 | |
KR20170001578A (ko) | 상태 의존 계산들의 성능을 개선하기 위한 시스템들, 방법들, 및 장치들 | |
US20190138308A1 (en) | Unaligned memory accesses | |
US11263014B2 (en) | Sharing instruction encoding space between a coprocessor and auxiliary execution circuitry | |
WO2019046716A1 (fr) | Traitement d'instructions commandé par taille de pointeur | |
US20070061551A1 (en) | Computer Processor Architecture Comprising Operand Stack and Addressable Registers | |
WO2019046742A1 (fr) | Sauvegarde et restauration de blocs non contigus de registres conservés |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18852343; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 18852343; Country of ref document: EP; Kind code of ref document: A1 |