WO2004068361A1

WO2004068361A1 - Storage control device, data cache control device, central processing unit, storage device control method, data cache control method, and cache control method

Info

Publication number: WO2004068361A1
Application number: PCT/JP2003/000723
Authority: WO
Inventors: Iwao Yamazaki
Original assignee: Fujitsu Limited
Priority date: 2003-01-27
Filing date: 2003-01-27
Publication date: 2004-08-12
Also published as: JP4180569B2; JPWO2004068361A1

Abstract

A central processing unit includes a plurality of sets of an instruction processing device simultaneously executing a plurality of threads and a primary data cache device, and a secondary cache device shared by the primary data cache devices of the plurality of sets. The central processing unit includes a primary data cache unit and a secondary cache unit. Even when cache lines whose physical addresses are identical are registered in a cache memory, if thread identifiers are different, the primary data cache unit performs an MI request to the secondary cache unit, executes MO/BI according to a request from the secondary cache unit, and sets an RIM flag of a fetch port. When the cache line which has received the MI request is registered in the primary data cache unit by another thread, the secondary cache unit requests the primary cache unit to execute the MO/BI.

Description

Storage control device, data cache control device, central processing unit, storage device control method, data cache control method, and cache control method

Technical field

The present invention relates to a storage control device, a data cache control device, and a central processing unit that process memory access requests issued from a plurality of threads executed simultaneously, and to a storage device control method, a data cache control method, and a cache control method. In particular, the present invention relates to a storage control device, a data cache control device, a central processing unit, a storage device control method, a data cache control method, and a cache control method that can guarantee consistency in the execution order of reading and writing of shared data between threads.

Background art

Today's mainstream high-performance processors employ out-of-order processing to execute instructions while maintaining a high degree of parallelism. Out-of-order processing means that, while the reading of data for one instruction is delayed due to a cache miss or the like, the data of the next instruction is read first, and the data of the delayed instruction is read afterwards.

However, when such processing is performed, the preceding read, which is executed later, may retrieve the latest data, while the subsequent read, which is executed earlier, may retrieve older data. This may violate TSO (Total Store Order). Here, TSO means that the results of data reads correctly reflect the order of data writes, and is referred to as execution order consistency.

FIG. 9 is an explanatory diagram for explaining a TSO violation in a multiprocessor and its monitoring principle. FIG. 9(a) shows an application that may cause a TSO violation, FIG. 9(b) shows an example of a TSO violation, and FIG. 9(c) shows the principle of monitoring TSO violations.

FIG. 9(a) shows an example in which CPU-β writes the data measured by a measuring instrument to the shared storage area, and CPU-α reads and analyzes the data written to the shared storage area by CPU-β and outputs the analysis result. In this example, CPU-β writes the measurement data to shared storage area B (ST-B: the data of B changes from b to b'), and then writes to shared storage area A that the measurement data has been written (ST-A: the data of A changes from a to a'). On the other hand, CPU-α reads A to confirm that CPU-β has finished writing the measurement data (FC-A: A = a'), and then reads the measurement data written to B (FC-B: B = b') and analyzes it.

Here, as shown in FIG. 9(b), it is assumed that the cache of CPU-α holds only B and the cache of CPU-β holds only A. When CPU-α executes FC-A, a cache miss occurs, so FC-A waits until the cache line containing A reaches CPU-α, and FC-B, which hits in the cache, is executed first. At this time, FC-B reads the data from before CPU-β updates B (CPU-α: B = b).

During this time, CPU-β obtains the cache lines containing B and A as the exclusive type in order to execute ST-B and ST-A, and invalidates the cache line containing B in CPU-α or flushes its data (MO/BI: Move Out / Block Invalidate). Then, when the cache line containing B reaches CPU-β, CPU-β completes writing data to B and A (CPU-β: B = b', A = a'), after which CPU-α receives the cache line containing A (MI: Move In) and completes FC-A (CPU-α: A = a'). Thus, since A = a', CPU-α determines that the measurement data has been written, and malfunctions by using the old data (B = b).
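The event order described above can be replayed in a few lines. The following is only an illustrative sketch, not part of the disclosed hardware: the dictionaries `memory` and `cpu_alpha_cache`, the `observed` record, and the step numbering are assumptions introduced here to make the sequence concrete.

```python
# Hypothetical replay of the event order in FIG. 9(b); all names are illustrative.
memory = {"A": "a", "B": "b"}      # shared storage areas before the stores
cpu_alpha_cache = {"B": "b"}       # CPU-alpha initially holds only the line containing B
observed = {}                      # values returned to CPU-alpha's fetches

# 1. FC-A misses in CPU-alpha's cache and must wait for the line containing A.
fc_a_pending = "A" not in cpu_alpha_cache

# 2. FC-B hits and is executed first (out of order), so it reads the old value.
observed["B"] = cpu_alpha_cache["B"]

# 3. CPU-beta takes the line containing B exclusively (MO/BI at CPU-alpha),
#    then performs ST-B followed by ST-A in program order.
cpu_alpha_cache.pop("B")
memory["B"] = "b'"
memory["A"] = "a'"

# 4. The line containing A moves in (MI) and FC-A completes with the new value.
cpu_alpha_cache["A"] = memory["A"]
observed["A"] = cpu_alpha_cache["A"]

# CPU-alpha sees the new flag A together with the stale data B: a TSO violation.
tso_violation = observed["A"] == "a'" and observed["B"] == "b"
print(observed, tso_violation)
```

The combination of a new A with an old B is exactly the outcome that the stores' program order (ST-B before ST-A) is supposed to rule out.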

Conventionally, therefore, a possible TSO violation has been detected by monitoring both the invalidation or flushing of the cache line containing the data B of the fetch executed earlier and the arrival of the cache line containing the data A of the fetch executed later. When a possible TSO violation is detected, execution is resumed from the instruction following the fetch instruction whose order is guaranteed, thereby preventing TSO violations.

Specifically, a fetch request from the instruction processing device is received at a fetch port of the storage control device. As shown in FIG. 9(c), each fetch port holds the address of the fetch request together with a PSTV (Post STatus Valid) flag, an RIM (Re-Ifetch by Move out) flag, and an RIF (Re-Ifetch by move in Fetch) flag. In addition, an FP-TOQ (Fetch Port Top of Queue) is provided that points to the oldest allocated fetch port among the fetch ports that have not yet returned fetch data in response to a fetch request from the instruction processing device.

Then, when FC-B of CPU-α fetches its data, the PSTV flag of the fetch port that received the request of FC-B is set. In FIG. 9(c), the hatched portions indicate that a flag is set. After that, ST-B of CPU-β causes the cache line used by FC-B to be invalidated or flushed. At this time, the PSTV flag of the fetch port holding the request of FC-B is set, and the physical address portion of the address held by that fetch port matches the physical address of the cache line for which the invalidation or flush request was received. It is therefore possible to detect that the cache line from which the fetch data was sent has been taken out.

When it is detected that the cache line from which the fetch data was sent has been taken out, the RIM flag is set for all fetch ports from the fetch port holding the request of FC-B to the fetch port indicated by FP-TOQ.

After that, CPU-β executes ST-B and ST-A, and when CPU-α receives the cache line containing A from CPU-β in order to execute FC-A, CPU-α detects the receipt of a cache line from outside and sets the RIF flag for all valid fetch ports. Then, when notifying the instruction processing device of the success of FC-A, the RIM flag and the RIF flag of the fetch port holding the request of FC-A are checked. Since both flags are set, re-execution of the instructions following FC-A is requested.

That is, the fact that both the RIM flag and the RIF flag are set indicates the possibility that another instruction processing device has rewritten the data b, which was already sent in response to the subsequent fetch request B, to b', and that the preceding fetch request A has received the rewritten data a'.
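The flag handling described above can be sketched as a small model. This is a simplified illustration under stated assumptions, not the circuit itself: the class `FetchPort`, the functions `on_mo_bi`, `on_move_in`, and `must_reexecute`, the use of a plain list in program order (no wrap-around of the port queue), and the use of a name in place of a physical address are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FetchPort:
    address: str          # cache line of the fetch request (a name stands in for the PA)
    valid: bool = True
    pstv: bool = False    # fetch data has already been sent to the instruction processor
    rim: bool = False     # a line was invalidated/flushed after data was sent from it
    rif: bool = False     # a cache line arrived from outside

def on_mo_bi(ports, toq, line):
    """MO/BI of `line`: set RIM from FP-TOQ up to each port that already sent data from it."""
    for i, port in enumerate(ports):
        if port.valid and port.pstv and port.address == line:
            for p in ports[toq:i + 1]:   # assumes ports are listed in program order
                p.rim = True

def on_move_in(ports):
    """Arrival of a cache line from outside: set RIF on all valid fetch ports."""
    for port in ports:
        if port.valid:
            port.rif = True

def must_reexecute(port):
    """Both flags set means a possible TSO violation for this fetch."""
    return port.rim and port.rif

# FC-A is the oldest request (FP-TOQ = 0); FC-B already returned its data out of order.
ports = [FetchPort("A"), FetchPort("B", pstv=True)]
on_mo_bi(ports, toq=0, line="B")   # ST-B on the other CPU takes the line containing B
on_move_in(ports)                  # the line containing A arrives
print(must_reexecute(ports[0]))
```

In this model a move-in alone does not trigger re-execution; it is the combination with an earlier MO/BI on a line that had already supplied fetch data that does.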

As described above, in a multiprocessor environment, a PSTV flag, an RIM flag, and an RIF flag are provided at each fetch port, and TSO violations between processors can be prevented by monitoring the transfer of cache lines between processors. Such a TSO guarantee technique in a multiprocessor is disclosed in, for example, US Pat. No. 5,699,538. Techniques related to cache memories are disclosed in JP-A-10-116192, JP-A-10-232839, JP-A-2000-259498, and JP-A-2001-195301.

However, in a computer system that employs a multi-thread method, it is not sufficient to guarantee TSO only between processors. Here, the multi-thread method is a method in which one processor executes a plurality of threads (instruction strings) simultaneously. In a computer system that employs a multi-thread method, the primary cache is shared between different threads, so it is necessary to monitor not only the transfer of cache lines between processors but also the transfer of cache lines between threads within the same cache.

The present invention has been made to solve the above-described problem of the related art. It is an object of the present invention to provide a storage control device, a data cache control device, a central processing unit, a storage device control method, a data cache control method, and a cache control method that can guarantee consistency in the execution order of reading and writing of shared data between threads.

Disclosure of the invention

In order to solve the above-mentioned problems and achieve the object, the present invention provides a storage control device that is shared by a plurality of threads executed simultaneously and that processes memory access requests issued from the threads, the storage control device comprising: consistency assurance means for assuring the consistency of the execution order of reading and writing between a plurality of instruction processing devices with respect to data shared between the instruction processing devices; thread judging means for judging, when the data at the address specified by a memory access request is stored, whether or not the thread that registered the stored data and the thread that issued the memory access request are the same; and consistency assurance operation starting means for operating the consistency assurance means based on the judgment result of the thread judging means.

Further, the present invention provides a storage device control method for processing memory access requests issued from a plurality of threads executed simultaneously, the method comprising: a thread judging step of judging, when the data at the address specified by a memory access request is stored, whether or not the thread that registered the stored data is the same as the thread that issued the memory access request; and a consistency assurance operation starting step of operating, based on the judgment result of the thread judging step, a consistency assurance mechanism that guarantees consistency of the execution order of reading and writing between a plurality of instruction processing devices with respect to data shared between the instruction processing devices.

According to this invention, when the data at the address specified by the memory access request is stored, it is determined whether the thread that registered the stored data and the thread that issued the memory access request are the same, and, based on the determination result, a consistency assurance mechanism is operated that guarantees consistency of the execution order of reading and writing between the plurality of instruction processing devices with respect to data shared among the plurality of instruction processing devices. Therefore, it is possible to guarantee the consistency of the execution order of reading and writing of the shared data between the threads.

In addition, the present invention provides a data cache control device that is shared by a plurality of threads executed simultaneously and that processes memory access requests issued from the threads, the data cache control device comprising: consistency assurance means for assuring the consistency of the execution order of reading and writing between a plurality of instruction processing devices with respect to data shared between the instruction processing devices; thread judging means for judging, when a cache line including the data at the address specified by a memory access request is stored, whether or not the thread that registered the stored cache line and the thread that issued the memory access request are the same; and consistency assurance operation starting means for operating the consistency assurance means when the thread judging means judges that they are not the same.

The present invention also provides a data cache control method for processing memory access requests issued from a plurality of threads executed simultaneously, the method comprising: a thread judging step of judging, when a cache line including the data at the address specified by a memory access request is stored, whether or not the thread that registered the stored cache line and the thread that issued the memory access request are the same; and a consistency assurance operation starting step of operating, when the thread judging step judges that they are not the same, a consistency assurance mechanism that guarantees consistency of the execution order of reading and writing between a plurality of instruction processing devices with respect to data shared among the plurality of instruction processing devices.

According to this invention, when the cache line including the data at the address specified by the memory access request is stored, it is determined whether or not the thread that registered the stored cache line and the thread that issued the memory access request are the same, and, when it is determined that they are not the same, a consistency assurance mechanism is operated that guarantees the consistency of the execution order of reading and writing between a plurality of instruction processing devices for data shared between the instruction processing devices. Therefore, it is possible to guarantee the consistency of the execution order of reading and writing of shared data between the threads.

The present invention also provides a central processing unit having a plurality of sets of an instruction processing device that executes a plurality of threads simultaneously and a primary data cache device, and having a secondary cache device shared by the primary data cache devices of the plurality of sets, wherein the primary data cache device of each of the plurality of sets comprises: consistency assurance means for guaranteeing consistency of the execution order of reading and writing between a plurality of instruction processing devices for a cache line shared with the primary data cache device of another set; fetch requesting means for making a request to the secondary cache device to fetch a cache line when a cache line whose physical address matches a memory access request from the instruction processing device is registered by a different thread; and discharge execution means for invalidating or discharging the cache line based on a request from the secondary cache device and operating the consistency assurance means, and wherein the secondary cache device comprises discharge requesting means for requesting the primary data cache device to invalidate or discharge the cache line when the cache line for which the fetch request was received is registered in the primary data cache device by another thread.

The present invention also provides a cache control method used in a central processing unit having a plurality of sets of an instruction processing device that executes a plurality of threads simultaneously and a primary data cache device, and having a secondary cache device shared by the primary data cache devices of the plurality of sets, the method comprising: a fetch requesting step in which the primary data cache device makes a fetch request for a cache line to the secondary cache device when a cache line whose physical address matches a memory access request from the instruction processing device is registered by a different thread; a discharge requesting step in which the secondary cache device requests the primary data cache device to invalidate or discharge the cache line when the cache line for which the fetch request was received is registered in the primary data cache device by another thread; and a discharge execution step in which the primary data cache device invalidates or discharges the cache line based on the request from the secondary cache device and thereby operates a consistency assurance mechanism that assures the consistency of the execution order of reading and writing between a plurality of instruction processing devices for cache lines shared with the primary data cache devices of other sets.

According to this invention, the primary data cache device makes a fetch request for a cache line to the secondary cache device when a cache line whose physical address matches the memory access request from the instruction processing device is registered by a different thread; the secondary cache device requests the primary data cache device to invalidate or flush the cache line when the cache line for which the fetch request was received is registered in the primary data cache device by another thread; and the primary data cache device invalidates or flushes the cache line based on the request from the secondary cache device, thereby operating the consistency assurance mechanism that guarantees the consistency of the execution order of reading and writing between a plurality of instruction processing devices for cache lines shared with the primary data cache devices of other sets. Therefore, it is possible to guarantee the consistency of the execution order of reading and writing of shared data between threads.

Further, the present invention provides a storage control device that is shared by a plurality of threads executed simultaneously and that processes memory access requests issued from the threads, the storage control device comprising: access invalidating means for invalidating, when the thread executed by the instruction processing device is switched, all uncommitted store instructions and fetch instructions among the store instructions and fetch instructions issued by the thread whose execution is interrupted; and interlock means for detecting, when execution of the interrupted thread is resumed, a fetch instruction affected by the execution result of a committed store instruction, and controlling the detected fetch instruction to be executed after the execution of the store instruction.

Further, the present invention provides a storage device control method for processing memory access requests issued from a plurality of threads executed simultaneously, the method comprising: an access invalidating step of invalidating, when the thread executed by the instruction processing device is switched, all uncommitted store instructions and fetch instructions among the store instructions and fetch instructions issued by the thread whose execution is interrupted; and an interlock step of detecting, when execution of the interrupted thread is resumed, a fetch instruction affected by the execution result of a committed store instruction, and controlling the detected fetch instruction to be executed after the execution of the store instruction.

According to this invention, when the thread executed by the instruction processing device is switched, all uncommitted store instructions and fetch instructions among the store instructions and fetch instructions issued by the thread whose execution is interrupted are invalidated, and when execution of the interrupted thread is resumed, a fetch instruction affected by the execution result of a committed store instruction is detected and controlled so as to be executed after the execution of the store instruction. Therefore, it is possible to guarantee consistency in the execution order of reading and writing of shared data between the threads.

Brief description of the drawings

FIG. 1 is a functional block diagram showing a configuration of a CPU according to Embodiment 1. FIG. 2 is a diagram showing an example of a cache tag. FIG. 3 is a flowchart showing the processing procedure of the cache control unit shown in FIG. 1. FIG. 4 is a flowchart showing the processing procedure of the MI processing between the cache control unit and the secondary cache unit. FIG. 5 is a functional block diagram showing a configuration of a CPU according to Embodiment 2. FIG. 6 is an explanatory diagram for explaining an operation of a cache control unit according to Embodiment 2. FIG. 7 is a flowchart showing a processing procedure of the cache control unit according to Embodiment 2. FIG. 8 is a flowchart showing a processing procedure of MOR processing. FIG. 9 is an explanatory diagram for explaining a TSO violation in a multiprocessor and its monitoring principle.

Best mode for carrying out the invention

Hereinafter, preferred embodiments of a storage control device, a data cache control device, a central processing unit, a storage device control method, a data cache control method, and a cache control method according to the present invention will be described in detail with reference to the accompanying drawings. Note that the TSO guarantee between threads executed by different processors is performed, as before, by setting the RIM flag when a cache line is invalidated and setting the RIF flag when data arrives. Therefore, the TSO guarantee between threads executed simultaneously by the same processor is explained here.

Embodiment 1

First, the configuration of the CPU according to the first embodiment will be described. FIG. 1 is a functional block diagram showing a configuration of a CPU according to the first embodiment. As shown in the figure, the CPU 10 has processor cores 100 and 200 and a secondary cache unit 300, and the secondary cache unit 300 is shared by the processor cores 100 and 200.

Here, for convenience of explanation, the case where the CPU 10 has two processor cores is shown, but the CPU 10 may have only one processor core or may have more processor cores. Further, since the processor cores 100 and 200 have the same configuration, the processor core 100 will be described here as an example.

The processor core 100 has an instruction unit 110, an operation unit 120, a primary instruction cache unit 130, and a primary data cache unit 140.

The instruction unit 110 is a processing unit that decodes and executes instructions. Its MT (Multi Thread) control unit controls two threads, "thread 0" and "thread 1", and executes them simultaneously.

The operation unit 120 is a processing unit that includes general-purpose registers, floating-point registers, a fixed-point arithmetic unit, and a floating-point arithmetic unit, and that executes fixed-point and floating-point arithmetic.

The primary instruction cache unit 130 and the primary data cache unit 140 are storage units that store a part of the contents of the main storage device so that the instructions and data stored in the main storage device can be accessed at high speed.

The secondary cache unit 300 is a storage unit that stores more of the instructions and data of the main storage device to compensate for the limited capacity of the primary instruction cache unit 130 and the primary data cache unit 140, and is connected to the main storage device via the system controller.

Next, details of the primary data cache unit 140 will be described. The primary data cache unit 140 has a cache memory 141 and a cache control unit 142, and the cache memory 141 is a storage unit for storing data.

The cache control unit 142 is a processing unit that manages data stored in the cache memory 141, and includes a TLB (Translation Look-aside Buffer) 143, a TAG unit 144, a TAG-MATCH detection unit 145, an MIB (Move In Buffer) 146, an MO/BI processing unit 147, and a fetch port 148.

The TLB 143 is a processing unit that performs high-speed address conversion from a virtual address (VA: Virtual Address) to a physical address (PA: Physical Address). It converts a virtual address received from the instruction unit 110 into a physical address and outputs the physical address to the TAG-MATCH detection unit 145.

The TAG unit 144 is a processing unit that manages the cache lines registered in the cache memory 141. It outputs to the TAG-MATCH detection unit 145 the physical address and thread identifier (ID) of the cache line registered at the location in the cache memory 141 that corresponds to the virtual address received from the instruction unit 110. Here, the thread identifier is an identifier for identifying whether the cache line is used by "thread 0" or "thread 1".

FIG. 2 is a diagram showing an example of a cache tag, which is the information used by the TAG unit 144 to manage a cache line registered in the cache memory 141. As shown in the figure, the cache tag includes a V bit that indicates whether the cache line is valid (Valid), an S bit and an E bit that indicate whether the cache line is of the shared type or the exclusive type, an ID that indicates the thread that is using the cache line, and a PA that indicates the physical address of the cache line. If the cache line is of the shared type, the cache line may be held by another processor at the same time. If the cache line is of the exclusive type, the cache line is not held by another processor at the same time.
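The fields of the cache tag can be written down as a record. The following is a minimal sketch only: the class name `CacheTag`, the field names, and the helper method are assumptions made for illustration, and the source does not give field widths or an encoding, so none are modeled.

```python
from dataclasses import dataclass

@dataclass
class CacheTag:
    v: bool          # V bit: the cache line is valid
    s: bool          # S bit: the cache line is of the shared type
    e: bool          # E bit: the cache line is of the exclusive type
    thread_id: int   # ID: thread that is using the cache line (0 or 1 here)
    pa: int          # PA: physical address of the cache line

    def may_be_held_by_other_processor(self) -> bool:
        """A valid shared-type line may also be held by another processor."""
        return self.v and self.s

# Example: a valid, exclusive line registered by thread 1 (values are hypothetical).
tag = CacheTag(v=True, s=False, e=True, thread_id=1, pa=0x1F40)
print(tag.may_be_held_by_other_processor())
```

The thread ID field is what distinguishes this tag from a conventional one; the remaining fields are the usual validity, sharing state, and address information.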

The TAG-MATCH detection unit 145 is a processing unit that compares the physical address received from the TLB 143 and the thread identifier received from the instruction unit 110 with the physical address and the thread identifier received from the TAG unit 144. When the physical addresses and the thread identifiers match and the V bit is set, the TAG-MATCH detection unit 145 uses the cache line registered in the cache memory 141; in other cases, it specifies the physical address and the thread identifier to the MIB 146 and instructs it to fetch the cache line requested by the instruction unit 110 from the secondary cache unit 300. By comparing not only the physical address received from the TLB 143 with the physical address received from the TAG unit 144 but also the thread identifier received from the instruction unit 110 with the thread identifier received from the TAG unit 144, the TAG-MATCH detection unit 145 determines not only whether the cache line requested by the instruction unit 110 is in the cache memory 141 but also whether the thread that requested the cache line and the thread that registered the cache line in the cache memory 141 are the same, and performs different processing based on the determination result.

The MIB 146 is a processing unit that issues a cache line fetch request (MI request) by designating a physical address to the secondary cache unit 300. In addition, the cache tag of the TAG unit 144 and the contents of the cache memory 141 are updated in accordance with the cache line fetched by the MIB 146.

The MO/BI processing unit 147 is a processing unit that invalidates or discharges a specific cache line in the cache memory 141 based on a request from the secondary cache unit 300. Because the MO/BI processing unit 147 invalidates or flushes a specific cache line, the RIM flag of the fetch port 148 can be set, and the TSO guarantee mechanism between processors can be used as a TSO guarantee mechanism between threads.

The fetch port 148 is a storage unit that stores an access destination address, a PSTV flag, a RIM flag, a RIF flag, and the like in response to each access request from the instruction unit 110.

Next, the processing procedure of the cache control unit 142 shown in FIG. 1 will be described. FIG. 3 is a flowchart showing a processing procedure of the cache control unit 142 shown in FIG. 1. As shown in the figure, in the cache control unit 142, the TLB 143 converts a virtual address into a physical address, and the TAG unit 144 obtains a physical address, a thread identifier, and a V bit from the virtual address using a cache tag (step S301).

Then, the TAG-MATCH detection unit 145 compares the physical address input from the TLB 143 with the physical address input from the TAG unit 144, and checks whether or not the cache line requested by the instruction unit 110 is in the cache memory 141 (step S302). If the two physical addresses match, the thread identifier input from the instruction unit 110 is compared with the thread identifier input from the TAG unit 144, and it is checked whether the cache line in the cache memory 141 is used by the same thread (step S303).

Then, when the two thread identifiers match, it is further checked whether or not the V bit is set (step S304). If the V bit is set, the cache line requested by the instruction unit 110 is in the cache memory 141, is registered by the same thread, and is valid, so the cache control unit 142 uses the data in the data section (step S305).

On the other hand, if the physical addresses do not match, if the thread identifiers do not match, or if the V bit is not set, then either there is no cache line whose physical address matches the cache line requested by the thread executing in the instruction unit 110, or the physical address matches but a different thread is using the cache line, or the cache line is invalid. In these cases the data in the cache memory 141 cannot be used, so the cache line is fetched from the secondary cache unit 300 (step S306). Then, the cache control unit 142 uses the data of the fetched cache line (step S307). Thus, because the TAG-MATCH detection unit 145 checks not only whether the physical addresses match but also whether the thread identifiers match, the cache control unit 142 can control cache lines between threads.
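The decision made in steps S302 to S306 reduces to three checks in sequence. The sketch below is an illustration under assumptions, not the hardware logic: the `Tag` record, the function `tag_match`, and the string results "HIT" and "MI" are names invented here, and address translation (step S301) is taken as already done.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tag:
    valid: bool      # V bit
    thread_id: int   # thread that registered the cache line
    pa: int          # physical address of the cache line

def tag_match(tag: Optional[Tag], request_pa: int, request_thread: int) -> str:
    """Return "HIT" if the cached line can be used, otherwise "MI" (fetch from L2)."""
    if tag is None or tag.pa != request_pa:    # S302: physical address mismatch
        return "MI"
    if tag.thread_id != request_thread:        # S303: registered by a different thread
        return "MI"
    if not tag.valid:                          # S304: V bit not set
        return "MI"
    return "HIT"                               # S305: use the data in the cache memory

line = Tag(valid=True, thread_id=0, pa=0x2000)
print(tag_match(line, 0x2000, 0))   # same thread, same address
print(tag_match(line, 0x2000, 1))   # same address, other thread
```

The second call is the case that differs from a conventional cache: the data is physically present, yet the request is treated as a miss so that the MI path, and with it the consistency mechanism, is exercised.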

Next, a description will be given of the processing procedure of the cache line fetching process (MI processing) between the cache control unit 142 and the secondary cache unit 300. FIG. 4 is a flowchart showing a processing procedure of the MI processing between the cache control unit 142 and the secondary cache unit 300. This MI processing corresponds to step S306 of the cache control unit 142 shown in FIG. 3 and to the processing performed by the secondary cache unit 300 in response to it.

As shown in the figure, in this MI processing, the cache control unit 142 of the primary data cache unit 140 issues an MI request to the secondary cache unit 300 (step S401). Then, the secondary cache unit 300 checks whether or not the cache line for which the MI request was received is registered in the primary data cache unit 140 by another thread (step S402), and if it is registered by another thread, makes an MO/BI request to the cache control unit 142 so that the RIM flag is set (step S403).

Whether or not the cache line for which the MI request was received is registered in the primary data cache unit 140 by another thread is judged using synonym control. Here, synonym control is control in which the secondary cache unit manages the addresses registered in the primary cache unit and prevents multiple cache lines with the same physical address from being registered in the primary cache unit. Then, after the MO/BI processing unit 147 of the cache control unit 142 executes the MO/BI processing and sets the RIM flag (step S404), the secondary cache unit 300 sends out the cache line (step S405), and the cache control unit 142 receives the cache line and registers it together with the thread identifier (step S406). When the cache line arrives, the RIF flag is set.

On the other hand, if the cache line for which the MI request was received is not registered in the primary data cache unit 140 by another thread, the secondary cache unit 300 sends out the cache line without making an MO/BI request (step S405).

Thus, in this MI processing, the secondary cache unit 300 uses synonym control to determine whether the cache line that received the MI request has been registered in the primary data cache unit 140 by another thread. If it has been registered by another thread, the MO/BI processing unit 147 of the cache control unit 142 executes the MO/BI processing and sets the RIM flag, so that the TSO guarantee mechanism between processors can be used as a TSO guarantee mechanism between threads.
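
The sequence of steps S401 to S406 can be sketched as follows. This is an illustrative model only: the class `PrimaryDataCache`, the function `mi_process`, and the use of a dictionary in place of synonym control are assumptions made for the example, and the flags are modeled once per cache rather than per fetch port. It is also assumed that registration (step S406) follows step S405 on both branches.

```python
from dataclasses import dataclass, field

@dataclass
class PrimaryDataCache:
    # physical address -> thread identifier that registered the line
    lines: dict = field(default_factory=dict)
    rim_flag: bool = False
    rif_flag: bool = False

    def mo_bi(self, address):
        # S404: move out / invalidate the line and set the RIM flag
        self.lines.pop(address, None)
        self.rim_flag = True

    def register(self, address, thread_id):
        # S406: register the arrived line with the thread identifier;
        # arrival of the line sets the RIF flag
        self.lines[address] = thread_id
        self.rif_flag = True

def mi_process(primary, address, thread_id):
    """Return the step labels taken for one MI request."""
    steps = ["S401"]                    # MI request issued by the primary cache
    steps.append("S402")                # secondary cache checks the registrant
    owner = primary.lines.get(address)  # stand-in for synonym control
    if owner is not None and owner != thread_id:
        steps.append("S403")            # MO/BI request to the primary cache
        primary.mo_bi(address)
        steps.append("S404")
    steps.append("S405")                # secondary cache sends the line
    primary.register(address, thread_id)
    steps.append("S406")
    return steps

cache = PrimaryDataCache(lines={0x40: 0})
steps_other_thread = mi_process(cache, 0x40, thread_id=1)
```

When the line was registered by another thread, both flags end up set, which is the condition the inter-processor TSO mechanism already reacts to.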

As described above, in the first embodiment, the TAG-MATCH detection unit 145 of the primary data cache unit 140 makes an MI request to the secondary cache unit 300 if the thread identifier is different, even when a cache line with the same physical address is registered in the cache memory 141. If the cache line that received the MI request is registered in the primary data cache unit 140 by another thread, the secondary cache unit 300 requests the cache control unit 142 to execute MO/BI processing, and the cache control unit 142 executes the MO/BI processing and sets the RIM flag of the fetch port 148. TSO between threads can therefore be guaranteed using the TSO guarantee mechanism between processors.

In the first embodiment, the case where the secondary cache unit 300 issues an MO/BI request to the primary data cache unit 140 using synonym control has been described. However, because synonym control increases the burden on the secondary cache unit, some secondary cache units may not have synonym control. In such a case, if a cache line with the same physical address is registered in the cache memory with a different thread identifier, the primary data cache unit performs the MO/BI processing by itself, and TSO can be guaranteed.

In this case, it is possible to use a protocol that is conventionally provided to speed up data transfer from the processor to an external storage device and that issues a cache line flush request from the primary cache unit to the secondary cache unit. In this protocol, the primary cache unit sends a flush request for a designated cache line to the secondary cache unit, the secondary cache unit that receives the request forwards it to the main storage controller, and the cache line is flushed to the main storage device according to the instruction of the main storage controller. Therefore, the cache line can be flushed from the primary data cache unit to the secondary cache unit by using this cache line flush operation.

Embodiment 2

In the first embodiment, the case where the RIM flag of the fetch port is set using the synonym control of the secondary cache unit or the cache line flush request of the primary data cache unit has been described. However, in some cases the secondary cache unit has no mechanism for synonym control and the primary data cache unit has no mechanism for issuing a cache line flush request.

Therefore, in the second embodiment, a description will be given of a case where TSO is guaranteed by using the process of flushing and invalidating a replacement block, which occurs when a cache line is replaced, and by monitoring access requests to the cache memory or the main storage device. In the second embodiment, since it is mainly the operation of the cache control unit of the primary data cache unit that differs from the first embodiment, the operation of the cache control unit will be described.

First, the configuration of the CPU according to the second embodiment will be described. FIG. 5 is a functional block diagram showing the configuration of the CPU according to the second embodiment. As shown in the figure, the CPU 500 has four processor cores 510 to 540 and a secondary cache unit 550 shared by the four processor cores. Note that the four processor cores 510 to 540 all have the same configuration, and therefore the processor core 510 will be described here as an example.

The processor core 510 includes an instruction unit 511, an arithmetic unit 512, a primary instruction cache unit 513, and a primary data cache unit 514.

The instruction unit 511 is a processing unit that decodes and executes instructions in the same manner as the instruction unit 110, and its MT (multi-thread) control unit controls two threads, "thread 0" and "thread 1", so that they run simultaneously.

The arithmetic unit 512 is a processing unit that executes fixed-point arithmetic and floating-point arithmetic in the same manner as the arithmetic unit 120. Similarly, the primary instruction cache unit 513 is a storage unit that stores a part of the main storage device in order to access the instructions stored in the main storage device at high speed.

The primary data cache unit 514 is, like the primary data cache unit 140, a storage unit that stores a part of the main storage device in order to access the data stored in the main storage device at high speed. However, the cache control unit 515 does not issue an MI request from the MIB to the secondary cache unit when a cache line with a matching physical address and a different thread identifier is registered in the cache memory. Instead, the cache control unit 515 performs replace move-out (MOR) processing on the cache line whose physical address matches, and changes the thread identifier registered in the cache tag.

During this process, the fetch port is monitored, and if there is a matching address, the RIM flag and the RIF flag are set. The RIF flag can also be set when a different thread writes to the cache memory or the main memory. TSO is then guaranteed by requesting re-execution of the instruction when a fetch port in which both the RIM flag and the RIF flag are set returns STV.
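
As a rough illustration of this monitoring, the following sketch marks fetch ports whose address matches the line being moved out and reports which ports would require re-execution. `FetchPort`, `on_replace_move_out`, and `must_reexecute` are hypothetical names, and addresses are assumed to be already reduced to cache line granularity.

```python
from dataclasses import dataclass

@dataclass
class FetchPort:
    line_address: int      # cache line address used by the pending fetch
    rim_flag: bool = False
    rif_flag: bool = False

def on_replace_move_out(ports, line_address):
    """Set RIM and RIF on every fetch port that uses the moved-out line."""
    for port in ports:
        if port.line_address == line_address:
            port.rim_flag = True
            port.rif_flag = True

def must_reexecute(port):
    """Re-execution is requested only when both flags are set."""
    return port.rim_flag and port.rif_flag

ports = [FetchPort(0x40), FetchPort(0x80)]
on_replace_move_out(ports, 0x40)
needs_retry = [must_reexecute(p) for p in ports]
```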

FIG. 6 is an explanatory diagram for explaining the operation of the cache control unit 515. The figure classifies the cache access operations according to the instruction that tries to use the cache line and the state of the cache line. As shown in the figure, the cache access operations of the cache control unit 515 comprise ten access patterns and three types of operation.

The first of the three types of operation is the operation in the case of a cache miss (① and ⑥). An MI request for the cache line is made to the secondary cache unit and the cache line is fetched. When the cache line is required for a data load (①), the fetched cache line is registered as the shared type, and when the cache line is required for a data store (⑥), the fetched cache line is registered as the exclusive type.

The second of the three types of operation is the normal cache hit operation (②, ③, ④, and ⑧). It is the same as the cache hit operation when multi-thread operation is not performed: no special processing is carried out, and the state of the cache line does not change. The third of the three types covers the cases that include operations performed to guarantee TSO between threads during multi-thread operation (⑤, ⑦, ⑨, and ⑩), in which the RIM flag and the RIF flag are set. When a store is executed on a cache line shared with another processor core (⑦), it would not be known after completion of the store which processor core holds the latest cache line. Therefore, before the store is executed, the state of the cache line is first changed from the shared type to the exclusive type (BTC). Then, the MOR operation is performed in case another processor core has completed a fetch using the area. After that, the store is executed.

Next, the processing procedure of the cache control unit 515 will be described. FIG. 7 is a flowchart showing the processing procedure of the cache control unit 515. As shown in the figure, the cache control unit 515 checks whether or not the access requested by the instruction unit 511 is a load (step S701).

As a result, if the access is a load (Yes at step S701), it is checked whether or not it is a cache miss (step S702). In the case of a cache miss, the MIB is secured (step S703), and the cache line is requested from the secondary cache unit 550 (step S704). Then, when the cache line arrives, the cache line is registered as the shared type (step S705), and the data in the data section is used (step S706).

On the other hand, in the case of a cache hit, it is checked whether or not the hit cache line is registered by the same thread (step S707). If the cache line is registered by the same thread, the data in the data section is used (step S706). If the hit cache line is not registered by the same thread, it is checked whether or not the cache line is of the shared type (step S708). If it is of the shared type, the data in the data section is used (step S706). If it is of the exclusive type, the MOR processing is executed to set the RIM flag and the RIF flag (step S709), and the data in the data section is used (step S706).

On the other hand, if the access is a store (No at step S701), it is checked whether or not it is a cache miss (step S710). In the case of a cache miss, the MIB is secured (step S711), and the cache line is requested from the secondary cache unit 550 (step S712). When the cache line arrives, the cache line is registered as the exclusive type (step S713), and the data is stored in the data section (step S714).

On the other hand, in the case of a cache hit, it is checked whether or not the hit cache line is registered by the same thread (step S715). If the cache line is registered by the same thread, it is checked whether the cache line is of the shared type or the exclusive type (step S716). If it is of the exclusive type, the data is stored in the data section (step S714). In the case of the shared type, the MOR processing is executed to set the RIM flag and the RIF flag (step S717), the cache line of the other processor cores is invalidated (step S718), the cache line is changed to the exclusive type (step S719), and the data is stored in the data section (step S714).

If the hit cache line is not registered by the same thread, the MOR processing is executed to set the RIM flag and the RIF flag (step S720), and it is checked whether the cache line is of the shared type or the exclusive type (step S716). If it is of the exclusive type, the data is stored in the data section (step S714). In the case of the shared type, the cache line of the other processor cores is invalidated (step S718), the cache line is changed to the exclusive type (step S719), and the data is stored in the data section (step S714).
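
The branching of FIG. 7 as described above can be summarized by a small decision function that returns the step labels visited for one access. The function name `cache_access` and its boolean parameters are hypothetical and chosen only for this illustration; the step order follows the text.

```python
def cache_access(is_load, hit, same_thread, shared):
    """Return the steps of FIG. 7 taken for one access, in order."""
    steps = ["S701"]
    if is_load:
        steps.append("S702")
        if not hit:
            # miss: secure MIB, request line, register as shared, use data
            return steps + ["S703", "S704", "S705", "S706"]
        steps.append("S707")
        if same_thread:
            return steps + ["S706"]
        steps.append("S708")
        if shared:
            return steps + ["S706"]
        return steps + ["S709", "S706"]   # MOR sets RIM/RIF, then use data
    steps.append("S710")
    if not hit:
        # miss: secure MIB, request line, register as exclusive, store
        return steps + ["S711", "S712", "S713", "S714"]
    steps.append("S715")
    if not same_thread:
        steps.append("S720")              # MOR sets RIM/RIF
    steps.append("S716")
    if shared and same_thread:
        steps.append("S717")              # MOR sets RIM/RIF
    if shared:
        steps += ["S718", "S719"]         # invalidate other cores, go exclusive
    return steps + ["S714"]

load_other_thread_exclusive = cache_access(True, True, False, False)
store_same_thread_shared = cache_access(False, True, True, True)
```

Listing the paths this way makes it easy to see that every case in which a line changes hands between threads passes through a MOR step (S709, S717, or S720).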

As described above, the cache control unit 515 monitors access to the cache memory or the main storage device and, when there is a possibility that a TSO violation may occur, executes the MOR processing and sets the RIM flag and the RIF flag, so that the TSO guarantee mechanism between the processor cores can be used for the TSO guarantee between threads.

Next, the MOR processing will be described. FIG. 8 is a flowchart showing the processing procedure of the MOR processing. As shown in the figure, the MOR processing secures a MIB (step S801) and starts a replace move-out operation. Then, half of the cache line is read out to the replace move-out buffer (step S802), and it is checked whether or not the replace move-out is prohibited (step S803). Here, the replace move-out is prohibited when, for example, a special instruction such as compare and swap is trying to use the cache line. Note that the data in the replace move-out buffer is not used.

If the replace move-out is prohibited, the process returns to step S802, and half of the cache line is read out to the replace move-out buffer again. On the other hand, if the replace move-out is not prohibited, the remaining half of the cache line is read out to the replace move-out buffer and the thread identifier is rewritten (step S804).
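
The retry loop of steps S802 to S804 can be sketched as below. This is only a software analogy: the function `mor_process`, the dictionary used for the cache line, and the `prohibited_checks` sequence (standing in for the hardware condition tested at step S803) are assumptions for the example. Securing the MIB (step S801) is not modeled.

```python
def mor_process(line, new_thread_id, prohibited_checks):
    """Model the replace move-out of one cache line.

    line: dict with 'data' (bytes) and 'thread_id'.
    prohibited_checks: iterable of booleans, one per S803 check.
    Returns the number of S802 reads and the buffer contents.
    """
    half = len(line["data"]) // 2
    buffer = bytearray()
    reads = 0
    for prohibited in prohibited_checks:
        buffer[:] = line["data"][:half]   # S802: read the first half
        reads += 1
        if not prohibited:                # S803: move-out allowed?
            break
    else:
        raise RuntimeError("replace move-out still prohibited")
    buffer += line["data"][half:]         # S804: read the remaining half
    line["thread_id"] = new_thread_id     # S804: rewrite the thread identifier
    return reads, bytes(buffer)

line = {"data": bytes(range(8)), "thread_id": 0}
reads, moved = mor_process(line, new_thread_id=1,
                           prohibited_checks=[True, True, False])
```

As the text notes, the buffer contents are not actually used; the point of the operation is its side effect on the fetch port flags and the rewritten thread identifier.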

In this way, because the MOR processing executes the replace move-out operation, the TSO guarantee mechanism between the processor cores works, and the RIM flag can be set on a fetch port whose PSTV flag is set and which uses the same cache line as the replace move-out. Here, by setting the RIF flag simultaneously with the RIM flag, the TSO guarantee mechanism between the processor cores can function as the TSO guarantee mechanism between threads.

Also, different threads may compete for the same cache line on the same processor core. In such a case, the same operation is performed as when different processors compete for the same cache line in a multiprocessor environment.

Specifically, for the case where cache line contention occurs in a multiprocessor environment, each processor has a cache line flush prohibition control and a control for forcibly canceling it. That is, the processor holding the cache line tries to postpone the flush of the cache line until its store is completed. This is the cache line flush prohibition control. However, if one processor keeps storing to the same cache line indefinitely, the cache line can never be passed to another processor. Therefore, if the processing of a cache line flush request received from another processor has failed a certain number of times in the cache pipeline, the store to that cache line is forcibly stopped so that the flush of the cache line succeeds. As a result, the cache line is passed to the other processor. After that, if the store to the cache line is to be continued, a request to flush the cache line is sent to the other processor. As a result, the cache line eventually arrives and the store can be continued.
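
The interplay of the flush prohibition control and its forced cancellation can be illustrated with a small counter-based sketch. The function `handle_flush_requests` and the threshold parameter `max_failures` are hypothetical; the text only says that the store is stopped after "a certain number" of failed attempts.

```python
def handle_flush_requests(store_pending_per_attempt, max_failures):
    """Return (attempt number at which the flush succeeds, forced?).

    store_pending_per_attempt: iterable of booleans, True when the owner
    is still storing to the line at that flush attempt.
    """
    failures = 0
    for attempt, store_pending in enumerate(store_pending_per_attempt, start=1):
        if store_pending and failures < max_failures:
            failures += 1        # flush prohibited: owner is still storing
            continue
        # Either the store has finished, or the failure limit was reached
        # and the store is forcibly stopped so the flush can succeed.
        return attempt, store_pending
    return None, False           # no attempt succeeded within the input

busy_owner = handle_flush_requests([True] * 10, max_failures=3)
quick_owner = handle_flush_requests([True, False], max_failures=3)
```

With a limit of three failures, an owner that never stops storing still gives up the line on the fourth attempt, which is what prevents a hang.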

This mechanism, which operates when different processors pass the same cache line between them in a multiprocessor environment, also operates in the replace move-out operation used to pass a cache line between threads. Therefore, in every case the cache line is successfully passed between the threads, and a hang can be prevented.

As described above, in the second embodiment, the cache control unit 515 of the primary data cache unit 514 monitors access to the cache memory or the main storage device and, when a TSO violation may occur, executes the MOR processing to set the RIM flag and the RIF flag, so that the TSO guarantee mechanism between the processor cores can work as the TSO guarantee mechanism between threads.

In the second embodiment, the case where a shared-type cache line is shared between different threads has been described. However, the present invention is not limited to this, and can be applied in the same way to the case where a shared-type cache line is controlled exclusively between threads, like an exclusive-type cache line. Specifically, the TSO guarantee mechanism between processor cores can work as a TSO guarantee mechanism between threads by executing the MOR processing when a load hits a cache line registered by another thread.

Also, in the first and second embodiments, the case where the instruction unit processes two threads at one time has been described. However, the present invention is not limited to this, and can be applied in the same way to the case where the instruction unit processes three or more threads at one time.

In the first and second embodiments, the case where the simultaneous multi-thread system is used has been described. Here, the simultaneous multi-thread method is a method in which a plurality of threads are processed at one time. On the other hand, in the multi-thread method, there is also a time-division multi-thread method in which only one thread is processed at a time, and a thread is switched at regular intervals or when it is found that instruction execution is delayed due to a cache miss or the like. Therefore, TSO guarantee in the case of the time-division multi-thread method will be described.

In the time-division multi-thread method, threads are switched by putting the running thread to sleep and starting the operation of another thread. Therefore, when switching threads, all fetch instructions and store instructions that were issued by the thread being put to sleep and have not been committed are canceled. By doing so, it is possible to avoid TSO violations that could arise from a store of another thread because of out-of-order completion of a fetch instruction.

In addition, a committed store instruction waits in the store port or the write buffer, which holds the store request and the store data, until writing to the cache memory or the main storage device becomes possible, and the store is executed when it does. Here, the case where the result of a preceding store must be reflected in a subsequent fetch, that is, the case where the subsequent fetch uses the memory area targeted by the preceding store, is detected by comparing the address and operand length of the store request with the address and operand length of the fetch request. In this case, the execution of the fetch is made to wait by SFI (Store Fetch Interlock) until the execution of the store is completed.
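
One plausible form of this comparison is a byte-range overlap test, sketched below. The functions `overlaps` and `fetch_must_wait` are hypothetical, and treating the comparison as an interval overlap is an assumption; the text only states that addresses and operand lengths are compared.

```python
def overlaps(store_addr, store_len, fetch_addr, fetch_len):
    """True when the two byte ranges share at least one byte."""
    return (store_addr < fetch_addr + fetch_len
            and fetch_addr < store_addr + store_len)

def fetch_must_wait(pending_stores, fetch_addr, fetch_len):
    """SFI condition: the fetch touches an area of a pending committed store.

    pending_stores: iterable of (address, operand_length) pairs held in
    the store port or write buffer.
    """
    return any(overlaps(addr, length, fetch_addr, fetch_len)
               for addr, length in pending_stores)

pending = [(0x100, 8), (0x200, 4)]
blocked = fetch_must_wait(pending, 0x104, 4)   # inside the first store
free = fetch_must_wait(pending, 0x108, 4)      # starts just past it
```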

In this way, even if the thread switching occurs after the store instruction is committed and the store of a different thread is stuck in the store port, the SFI operation is enabled to reflect the effect of the store from the different thread. This avoids TSO violations caused by different thread stores during thread dormancy.

Furthermore, by using the setting of the RIM flag through invalidation or flushing of the cache line and the setting of the RIF flag through the arrival of data, TSO between processors can be guaranteed, and together with the guarantee of TSO between different threads, TSO of the entire computer system can be guaranteed.

As described above, according to the present invention, when the data of the address specified by a memory access request is stored, it is determined whether or not the thread that registered the stored data and the thread that issued the memory access request are the same, and based on the determination result, a consistency assurance mechanism that guarantees the consistency of the read and write execution order among a plurality of instruction processing devices with respect to data shared among the plurality of instruction processing devices is operated. There is therefore an effect that the consistency of the execution order of reading and writing of shared data between threads can be guaranteed.

Further, according to the present invention, when a cache line including the data of the address specified by a memory access request is stored, it is determined whether or not the thread that registered the stored cache line and the thread that issued the memory access request are the same, and when it is determined that they are not the same, a consistency assurance mechanism that guarantees the consistency of the read and write execution order among a plurality of instruction processing devices with respect to data shared among the plurality of instruction processing devices is operated. There is therefore an effect that the consistency of the execution order of reading and writing of shared data between threads can be guaranteed.

Further, according to the present invention, the primary data cache device issues a cache line fetch request to the secondary cache device when a cache line whose physical address matches that of the memory access request from the instruction processing device has been registered by a different thread. If the cache line for which the fetch request has been received is registered in the primary data cache device by another thread, the secondary cache device requests the primary data cache device to invalidate or flush the cache line. The primary data cache device invalidates or flushes the cache line based on the request from the secondary cache device, and thereby operates a consistency assurance mechanism that guarantees the consistency of the read and write execution order among a plurality of instruction processing devices with respect to a cache line shared with another set of primary data cache devices. There is therefore an effect that the consistency of the execution order of reading and writing of shared data between threads can be guaranteed.

Further, according to the present invention, when the thread to be executed by the instruction processing device is switched, all the uncommitted store instructions and fetch instructions among the store instructions and fetch instructions issued by the thread whose execution is interrupted are invalidated, and when the execution of the interrupted thread is resumed, a fetch instruction that is affected by the execution result of a committed store instruction is detected and the detected fetch instruction is controlled so as to be executed after the execution of the store instruction. There is therefore an effect that the consistency of the execution order of reading and writing of shared data between threads can be guaranteed.

Industrial applicability

As described above, the storage control device, the data cache control device, the central processing unit, the storage device control method, the data cache control method, and the cache control method according to the present invention are suitable for a multi-threaded computer system that executes a plurality of threads simultaneously.

Claims

1. A storage control device that is shared by a plurality of threads executed simultaneously and that processes memory access requests issued from the threads, comprising:

consistency assurance means for assuring the consistency of the execution order of reading and writing among a plurality of instruction processing devices with respect to data shared among the plurality of instruction processing devices;

thread determination means for determining, when the data of the address specified by the memory access request is stored, whether or not the thread that registered the stored data is the same as the thread that issued the memory access request; and

consistency assurance operation activating means for operating the consistency assurance means based on a result of the determination by the thread determination means.

2. The storage control device according to claim 1, wherein the consistency assurance operation activating means, when the thread determination means determines that the threads are not the same, issues a fetch request for the data to a lower-level storage control device, and operates the consistency assurance means based on an instruction issued by the lower-level storage control device in response to the fetch request.

3. The storage control device according to claim 1, wherein the consistency assurance operation activating means, when the thread determination means determines that the threads are not the same, operates the consistency assurance means by executing a data flush operation to a lower-level storage control device.

4. The storage control device according to claim 1, wherein the consistency assurance operation activating means operates the consistency assurance means by executing a cache line replacement operation based on the determination result of the thread determination means and the shared state of the data among a plurality of instruction processing devices.

5. A data cache control device that is shared by a plurality of threads executed simultaneously and that processes memory access requests issued from the threads, comprising:

consistency assurance means for assuring the consistency of the execution order of reading and writing among a plurality of instruction processing devices with respect to data shared among the plurality of instruction processing devices;

thread determination means for determining, when a cache line including the data of the address specified by the memory access request is stored, whether or not the thread that registered the cache line is the same as the thread that issued the memory access request; and

consistency assurance operation activating means for operating the consistency assurance means when the thread determination means determines that the threads are not the same.

6. The data cache control device according to claim 5, wherein the thread determination means determines, based on a thread identifier provided in a cache tag, whether or not the thread that registered the cache line is the same as the thread that issued the memory access request.

7. The storage control device according to any one of claims 1 to 4, wherein the consistency assurance means performs the assurance by monitoring invalidation of the data of the address or flushing of the data to another storage control device, and fetching of data from the other storage control device.

8. The storage control device according to claim 7, wherein the consistency assurance means uses a PSTV flag, an RIM flag, and an RIF flag to monitor invalidation of the data of the address or flushing of the data to another storage control device, and fetching of data from the other storage control device.

9. A central processing unit having a plurality of sets of an instruction processing device that executes a plurality of threads simultaneously and a primary data cache device, and a secondary cache device shared by the primary data cache devices of the plurality of sets, wherein

each primary data cache device of the plurality of sets comprises:

consistency assurance means for assuring the consistency of the read and write execution order among a plurality of instruction processing devices with respect to a cache line shared with another set of primary data cache devices;

fetch request means for issuing a fetch request for the cache line to the secondary cache device when a cache line whose physical address matches that of a memory access request from the instruction processing device has been registered by a different thread; and

flush execution means for operating the consistency assurance means by executing invalidation or flushing of the cache line based on a request from the secondary cache device, and

the secondary cache device comprises:

flush request means for requesting the primary data cache device to execute invalidation or flushing of the cache line if the cache line for which the fetch request has been received is registered in the primary data cache device by another thread.

10. A storage control device that is shared by a plurality of threads executed simultaneously and that processes memory access requests issued from the threads, comprising:

access invalidating means for invalidating, when the thread executed by the instruction processing device is switched, all uncommitted store instructions and fetch instructions among the store instructions and fetch instructions issued by the thread whose execution is interrupted; and

interlock means for detecting, when the execution of the thread whose execution was interrupted is resumed, a fetch instruction affected by the execution result of a committed store instruction, and for controlling the detected fetch instruction so that it is executed after the execution of the store instruction.

11. A storage device control method for processing memory access requests issued from a plurality of threads executed simultaneously, comprising:

a thread determination step of determining, when the data of the address specified by the memory access request is stored, whether or not the thread that registered the stored data is the same as the thread that issued the memory access request; and

a consistency assurance operation activating step of operating, based on the determination result of the thread determination step, a consistency assurance mechanism that guarantees the consistency of the execution order of reading and writing among a plurality of instruction processing devices with respect to data shared among the plurality of instruction processing devices.

12. The storage device control method according to claim 11, wherein the consistency assurance operation activating step, when the thread determination step determines that the threads are not the same, issues a fetch request for the data to a lower-level storage control device, and operates the consistency assurance mechanism based on an instruction issued by the lower-level storage control device in response to the fetch request.

13. The storage device control method according to claim 11, wherein the consistency assurance operation activating step, when the thread determination step determines that the threads are not the same, operates the consistency assurance mechanism by executing a data flush operation to a lower-level storage control device.

14. The storage device control method according to claim 11, wherein the consistency assurance operation activating step operates the consistency assurance mechanism by executing a cache line replacement operation based on the determination result of the thread determination step and the shared state of the data among a plurality of instruction processing devices.

15. A data cache control method for processing memory access requests issued from a plurality of threads executed simultaneously, comprising:

a thread determination step of determining, when a cache line including the data of the address specified by the memory access request is stored, whether or not the thread that registered the stored cache line is the same as the thread that issued the memory access request; and

a consistency assurance operation activating step of operating, when the thread determination step determines that the threads are not the same, a consistency assurance mechanism that guarantees the consistency of the execution order of reading and writing among a plurality of instruction processing devices with respect to data shared among the plurality of instruction processing devices.

16. The data cache control method according to claim 15, wherein the thread determination step determines, based on a thread identifier provided in a cache tag, whether or not the thread that registered the cache line is the same as the thread that issued the memory access request.

17. The storage device control method according to any one of claims 11 to 14, wherein the consistency assurance mechanism performs the assurance by monitoring invalidation of the data of the address or flushing of the data to another storage control device, and fetching of data from the other storage control device.

18. The storage device control method according to claim 17, wherein the consistency assurance mechanism uses a PSTV flag, an RIM flag, and an RIF flag provided at a fetch port to monitor invalidation of the data of the address or flushing of the data to another storage control device, and fetching of data from the other storage control device.

19. A cache control method used in a central processing unit having a plurality of sets of an instruction processing device that executes a plurality of threads simultaneously and a primary data cache device, and a secondary cache device shared by the primary data cache devices of the plurality of sets, the method comprising:

a fetch request step in which the primary data cache device issues a fetch request for a cache line to the secondary cache device when a cache line whose physical address matches that of a memory access request from the instruction processing device has been registered by a different thread;

a flush request step in which the secondary cache device requests the primary data cache device to execute invalidation or flushing of the cache line if the cache line for which the fetch request has been received is registered in the primary data cache device by another thread; and

a flush execution step in which the primary data cache device, by executing invalidation or flushing of the cache line based on the request from the secondary cache device, operates a consistency assurance mechanism that guarantees the consistency of the execution order of reading and writing among a plurality of instruction processing devices with respect to a cache line shared with another set of primary data cache devices.

20. A storage device control method for processing memory access requests issued from a plurality of threads executed simultaneously, comprising:

an access invalidating step of invalidating, when the thread executed by the instruction processing device is switched, all uncommitted store instructions and fetch instructions among the store instructions and fetch instructions issued by the thread whose execution is interrupted; and

an interlock step of detecting, when the execution of the thread whose execution was interrupted is resumed, a fetch instruction affected by the execution result of a committed store instruction, and controlling the detected fetch instruction so that it is executed after the execution of the store instruction.