
WO1994011828A2 - Write buffer with totally ordered byte gathering - Google Patents

Write buffer with totally ordered byte gathering

Info

Publication number
WO1994011828A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
write
address
write buffer
information
Prior art date
Application number
PCT/US1993/010855
Other languages
English (en)
Other versions
WO1994011828A3 (fr
Inventor
Joseph A. Bailey
Original Assignee
Ast Research, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ast Research, Inc.
Priority to AU55987/94A (AU5598794A)
Publication of WO1994011828A2
Publication of WO1994011828A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G06F13/1673 Details of memory controller using buffers

Definitions

  • This invention relates generally to memory management systems. More particularly, the invention relates to write buffer systems used to enhance performance in computer systems.
  • Computer systems with memory caching systems are well known in the art. These computers typically use a microprocessor, or central processing unit (CPU), to read information from one source and write information to some destination.
  • One goal in efficient CPU management is to decrease the time spent performing these two operations. On the other hand, anything that increases the read or the write time, or both, hurts the performance of the CPU.
  • To help the CPU better perform its task, computer systems have added memory subsystems for holding instructions and data. Instructions and data are generally retrieved from a slow mass storage device such as a hard disk and placed in this memory subsystem. Information is obtained from the memory subsystems when the CPU issues a read request and the information is written into main memory when the CPU issues a write request.
  • a solution to this problem is the implementation of two levels of memory.
  • a large, relatively slow, but inexpensive, main memory usually DRAM
  • cache memory usually SRAM
  • SUBSTITUTE SHEET that is more likely to be referenced again by the CPU.
  • the CPU first attempts to find needed instructions and data in the cache, which is fast enough to keep up with the CPU. Only if the information is not in the cache is a read request issued to main memory. When the requested information arrives, it is both provided to the CPU and written into the cache overwriting some previous entry for potential future use. On a data write from the CPU, either the cache or main memory, or both, may be updated, it being understood that flags may be necessary to indicate to one that a write has occurred in the other.
  • the use of a cache memory improves the overall throughput of the computer because it significantly reduces the number of wait states that the computer must enter. Wait states are still necessary, however, when an access to main memory is required.
  • the speed of a main memory search request, which occurs during a read cache miss, is critical to the throughput of a computer system because the CPU cannot continue operating until the requested information is received. It is recognized, however, that the speed of a memory write request need not be as critical as that of a memory read request. This is because the write by the CPU can be stored external to the CPU in write buffering logic and passed to the main memory later. The write buffering logic will immediately absorb the write and tell the CPU that its write operation is finished. In this way, even if some other device has control of main memory, the CPU is able to continue without wait states.
  • a write buffer subsystem positioned between the CPU and the main memory, allows the system to pass read requests to the memory immediately, but passes write requests to the memory only when the bus is free to do so.
  • the write requests are buffered in an internal FIFO-like structure consisting of ranks, and held until the bus is available.
  • the write buffer "absorbs" the CPU write, telling the CPU that the write to main memory has concluded so that the CPU may perform other operations. The CPU thinks the write is finished even though it may still be
  • the write buffer subsystem includes logic to determine whether any arriving memory read requests are requesting data still in the write buffer. If so, the write buffer temporarily halts the CPU while the conflicting write requests are flushed from the FIFO.
  • a write buffer subsystem typically generates a buffer full signal to prevent the CPU from completing a write request when the write buffer cannot accept it.
  • a write buffer apparatus for use in a computer system having a central processing unit (CPU) and a memory unit, that allows for rapid writes from the CPU wherein subsequent writes may be gathered to a previously written address within the write buffer.
  • the write buffer includes a plurality of ranks, which hold a plurality of data information units comprising a data element and an associated address element.
  • the write buffer includes control logic that allows for a first one of the plurality of data information units to be stored in an unoccupied one of the plurality of ranks. This control logic further allows for comparing a second data information unit to one of the occupied or to all of the occupied plurality of ranks, to determine whether the occupying data information unit has the same associated address as the incoming data information unit.
  • If the same address is found between an incoming address unit and a previously stored address unit, then the control logic further stores the matching units together in the same rank. Once at least one of the ranks has been occupied with data information, the control logic signals the memory system that it is ready to write out information to the memory system.
  • When more than one rank is occupied and ready to be written out to the memory unit, the system can interrupt the write buffer operations to require it to remove all presently stored information in a first in first out order until such removal is no longer necessary.
  • the write buffer may be compatible with any type of memory cache system coupled to the CPU and the memory unit, which cache systems may be such as a write back system or a copyback system. It is understood that the cache system, the write buffer system, and the CPU operate at substantially the same speeds.
  • the control logic further includes a write pointer and a read pointer to indicate the next rank available for storage if a match is unsuccessful and the first rank to be read when a write request is issued, respectively.
  • the invention further discloses a method for buffering data temporarily until it can be written to its intended destination.
  • the steps involved in this method include: sending a ready signal from a data register having a plurality of ranks, each rank being capable of holding a plurality of data information units; storing a first datum in an unoccupied one of the plurality of ranks; matching the address of a second datum to that of a datum stored in the plurality of ranks, meaning that the stored datum has the same address as the second datum; storing the second datum with the stored datum; and sending a ready-to-remove signal from the data register for removing the plurality of data information units upon request.
  • Additional steps in the method include removing the data stored in the data register to its intended destination. A datum associated with a specific address stored in the data register may be requested and removed from the data register. The data stored in the ranks ahead of the rank holding the requested data are also removed, until the specifically requested data is removed. Occasionally, data may not be stored together even though they have corresponding addresses, due to rules of order imposed upon the system by the CPU employed. When this happens, additional steps are necessary to determine whether the match
  • Fig. 1 shows a block diagram of a computer memory system having a central processing unit, a system memory, an optional cache, and a write buffer according to the present invention;
  • Fig. 2 illustrates a schematic diagram of the write buffer shown in Fig. 1;
  • Figs. 3A-3E illustrate a block diagram of the functional elements of the write buffer shown in Fig. 1, which shows an internal processor unit;
  • Figs. 4A-1 and -2, 4B-1, -2, -3, -4, -5, -6, -7 and -8, and 4C-1, -2 and -3 illustrate block diagrams of the internal processor unit shown in Figs. 3A-3E; specifically, Figs. 4A-1 and -2 illustrate a block diagram of a buffer register and flag logic; Figs. 4B-1, -2, -3, -4, -5, -6, -7 and -8 illustrate a block diagram of the buffering control logic; and Figs. 4C-1, -2 and -3 illustrate a block diagram of a multiplexer used in the processor of Figs. 3A-3E.
  • Fig. 1 shows a computer system 10, including a central processing unit (CPU) 12, a system memory 14, a write buffer
  • the general architecture of the computer system is such that individual 8-bit bytes, 16-bit half-words, 24-bit tri-bytes, or 32-bit words may be accessed in system memory 14.
  • CPU 12 communicates with write buffer 16 over a CPU bus 22, which consists of a CPU address bus 22a and a CPU data bus 22d.
  • the write buffer 16 communicates with system memory 14 over a memory bus 24, which consists of a memory address bus 24a and a memory data bus 24d.
  • Memory bus 24 operates under the control of a central bus of a system memory controller 20.
  • An optional cache 18 holds instructions and data for CPU 12 and communicates with CPU 12 through CPU bus 22. If the needed information is not present in cache 18 or if the information is in an un-cached segment of memory, or if a data write must be performed, CPU 12 issues an appropriate memory access request to write buffer 16.
  • When write buffer 16 receives a write request, which consists of an address-data pair and some control signals from CPU 12, one of several things may occur. If no other write requests are pending in write buffer 16, and if the system memory bus 24 is free, the write request is passed over, after a brief delay, to system memory 14 for execution. If no other write requests are pending in write buffer 16, but system memory bus 24 is busy, the write request is stored in the first rank of an internal register file and its availability is indicated to system memory controller 20. System memory controller 20 enables the request onto system memory bus 24 when the bus becomes free and, when the write is complete, acknowledges its use of the information. If exactly one other write request is pending in write buffer 16 when a new request is received, the new request is merely stored in the next available buffer rank.
  • write buffer 16 compares the word address of the incoming request to the word address of each rank in the buffer. If there is no match, the new request is written into the current buffer rank. If there is a match, the new request is "gathered" into the matching buffer rank. In accordance with the general architecture of the computer, only those bytes of the incoming data that are valid, i.e., intended to overwrite bytes of the addressed word in system memory 14, overwrite bytes in the buffer ranks. A byte in a buffer rank is left unchanged if the incoming data for that byte is invalid.
  • the two requests will be converted to a full word write request and stored in the buffer rank holding the matching request.
  • data in bytes 0 and 1 will be data contributed by the new write request and the data in bytes 2 and 3 will be data contributed by the matching request. This has the advantage that not only are requests combined to make use of the full 32-bit bus width to system memory 14, but a superfluous write to bytes 0 and 1 of the destination word address is eliminated.
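  • The byte-level merge described above can be sketched as follows. This is a minimal illustration, not the patent's circuit: the function name `gather`, the list-based representation of data and byte enables, and the example values are all assumptions made for the sketch.

```python
def gather(rank_data, rank_enables, new_data, new_enables):
    """Merge a new write into a pending rank holding the same word address.

    Hypothetical model: data are 4-element lists of byte values (index 0 is
    byte 0) and enables are 4-element lists of booleans marking valid bytes.
    """
    merged_data = list(rank_data)
    merged_enables = list(rank_enables)
    for i in range(4):
        if new_enables[i]:
            # A valid incoming byte overwrites the buffered byte.
            merged_data[i] = new_data[i]
            merged_enables[i] = True
        # An invalid incoming byte leaves the buffered byte unchanged.
    return merged_data, merged_enables


# Assumed example: the pending request holds bytes 1..3, the new one bytes 0..1.
pending = ([0x00, 0x11, 0x22, 0x33], [False, True, True, True])
incoming = ([0xAA, 0xBB, 0x00, 0x00], [True, True, False, False])
merged = gather(pending[0], pending[1], incoming[0], incoming[1])
```

  • In this assumed example the merged rank becomes a full word write: bytes 0 and 1 come from the new request, bytes 2 and 3 from the pending one, and the older value of byte 1 never reaches memory.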
  • Write buffer 16 constantly compares the word address on CPU address bus 22 with the word address of all pending write requests. If a match is found, a match signal is generated that, if it is generated during a read request, puts CPU 12 in a wait state. System memory controller 20 will then execute pending write requests in the order stored in the buffer ranks until the match signal clears. This ensures that a read request from a memory location is never executed until all pending write requests to that location are completed.
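  • A rough software model of this read-match behavior is given below. It assumes each pending write is an (address, data) pair held oldest first, treats the word address as the byte address shifted right by two, and uses a dictionary as a stand-in for system memory; the function name and these representations are illustrative only.

```python
from collections import deque


def drain_until_match_clears(pending, read_address, memory):
    """Execute pending writes in stored order until none matches the read.

    pending: deque of (byte_address, data) pairs, oldest first (assumed model).
    memory:  dict keyed by word address, standing in for system memory.
    Returns the number of writes executed while the CPU would be waiting.
    """
    read_word = read_address >> 2
    executed = 0
    while any((address >> 2) == read_word for address, _ in pending):
        address, data = pending.popleft()
        memory[address >> 2] = data
        executed += 1
    return executed


pending = deque([(0x100, 0x11111111), (0x200, 0x22222222), (0x300, 0x33333333)])
memory = {}
executed = drain_until_match_clears(pending, 0x202, memory)
```

  • Here the read of address 0x202 falls in the same word as the second pending write, so the first two writes are executed in order and the third stays buffered.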
  • a write buffer system has been designed and constructed using a first-in-first-out (FIFO) data register and control logic divided into sixteen ranks. Each byte is treated individually for gathering purposes, while each rank stores an entire 30-bit address, four byte enables, four bits of parity, 32 bits of data, and a data-code bit. Thus, each of the 16 ranks consists of 71 bits of information.
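  • The 71-bit figure follows from the field widths: 30 + 4 + 4 + 32 + 1 = 71. The sketch below packs and unpacks one rank to make that arithmetic concrete. The field order within the word and all names are assumptions; the text only gives the widths.

```python
# Field widths taken from the description; the ordering is an assumption.
FIELDS = [
    ("address", 30),
    ("byte_enables", 4),
    ("parity", 4),
    ("data", 32),
    ("data_code", 1),
]
RANK_BITS = sum(width for _, width in FIELDS)


def pack_rank(**values):
    """Pack the named fields into one integer, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_rank(word):
    """Inverse of pack_rank: split an integer back into named fields."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


example = dict(address=0x123456, byte_enables=0b1111, parity=0b0101,
               data=0xDEADBEEF, data_code=1)
packed = pack_rank(**example)
```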
  • the byte ordering for this system is Little-Endian as follows: Block Byte 3--Block Byte 2--Block Byte 1--Block Byte 0, but also may be Big-Endian, if desired. Additionally, each rank has an associated valid bit.
  • write buffer 16 is designed to be compatible with all the command and control signals and functions of a '486-type microprocessor, such as, for example, an 80486DX2 microprocessor manufactured by Intel Corporation.
  • System memory 14 interfaces with write buffer 16 as if write buffer 16 were the CPU 12.
  • write buffer 16 is designed to emulate the cycle type and timing characteristics of the '486 CPU. It will be apparent to those skilled in the art, based on the command and control features of the '486 microprocessor, how to design an interface which emulates the interface present on a '486-type processor.
  • write buffer technology disclosed in this invention is not intended to be limited to the '486 family of processors, or to the Intel family of 80x86-type processors.
  • a similar write buffer subsystem may be implemented for use with the 68000 series of microprocessors manufactured by Motorola Corporation.
  • Yet another write buffer subsystem may be adapted for use in a computer system using a microprocessor based on RISC architecture.
  • Write buffer 16 acts as an intelligent pipeline between a 486 CPU and the host system memory.
  • the write buffer looks exactly like system memory to the central processing unit and executes 486-style transactions on its system interface side.
  • the write buffer does not directly connect to either the ISA
  • Fig. 2 is a functional block diagram of a write buffer 16.
  • Write buffer 16 includes address compare logic 30, which is connected to byte gathering qualifier 32. The address compare logic matches the incoming write request address to any register file rank having the same prestored address. Byte gathering qualifier 32 determines whether a byte can be gathered to a specific rank without violating any of the rules described below. Further connected to byte gathering qualifier 32 is write pointer logic 34, which includes a byte clock generator. This pointer logic 34 points to the proper location in which to write when a write request is passed to write buffer 16.
  • Register file 36 is connected to write pointer logic 34, to address compare logic 30, and to the write data bus from the CPU 12.
  • An output multiplexer 38 is connected to register file 36 and further connected to memory bus 24, which is coupled to memory 14.
  • Write buffer 16 is further illustrated in Figs. 3A-3E and
  • Figs. 3A-3E illustrate the main functional blocks of write buffer 16. These functional blocks include a write buffer processor 300, a control unit 302, an address multiplexer 304, a data multiplexer 306 and a read latch 308. Processor 300 is further illustrated in Figs. 4A-1 and -2, 4B-1, -2, -3, -4, -5, -6, -7, and -8, and 4C-1, -2 and -3.
  • Processor 300 includes a register 402, a flag controller 404, a compare logic 406, a read controller 408, an input/output (I/O) element 410, a memory write 412, a write multiplexer 414, a byte clock ("BYTCLK") 416, a pipeline ("PIP2") 418, a multiplexer 420, a system write 422 and a write latch 424.
  • Processor 300 in write buffer 16 coordinates the addressing and data transfer operations from write buffer 16 to memory 14 and from CPU 12 to write buffer 16 during the appropriate read or write functions.
  • Processor 300 is coupled to address multiplexer 304, control 302, data multiplexer 306
  • Upon system power up, write buffer 16 is in an empty state and a read pointer is positioned to point at rank 0, as is a write pointer.
  • the write and read pointers are simple binary counters, which track where the next write into the buffer occurs and which rank of the write buffer is next to be written out to the system, respectively.
  • Upon completion of the first write, which is to memory and is stored in rank 0, the next write is stored in rank 1 of write buffer 16.
  • the same signal used to store the write also sets the Valid bit for that rank of the buffer.
  • Flags logic 404 negates the EMPTY flag. When the EMPTY flag goes low, it signals the state machine responsible for writing stored information out to the system to begin writing out the stored information.
  • Any memory read whose address matches an address stored within the write buffer results in the write buffer emptying until the matching rank has been written out to the system.
  • a match indication cannot occur unless the matching rank is valid, and when that rank is written out to main memory, its valid bit is cleared. Note that address-data pairs are never erased from register file 402. Ranks of the register file which contain pending writes will have their corresponding valid bits set. As they are written out, the valid bits are cleared. Every rank has one associated valid bit. When they are all set, the write buffer is full.
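  • The interplay of the write pointer, read pointer, and per-rank valid bits can be modeled roughly as below. The class name `RankTracker` and its methods are hypothetical; the sketch only tracks which rank is used next and leaves the stored address-data pairs untouched, as the text describes.

```python
class RankTracker:
    """Toy model of 16 ranks with wrap-around pointers and valid bits."""

    RANKS = 16

    def __init__(self):
        self.valid = [False] * self.RANKS
        self.write_ptr = 0  # rank that receives the next buffered write
        self.read_ptr = 0   # rank that is written out to the system next

    def store(self):
        """Claim the rank at the write pointer and set its valid bit."""
        rank = self.write_ptr
        if self.valid[rank]:
            raise RuntimeError("write buffer full")
        self.valid[rank] = True
        self.write_ptr = (rank + 1) % self.RANKS
        return rank

    def write_out(self):
        """Retire the rank at the read pointer by clearing its valid bit."""
        rank = self.read_ptr
        if not self.valid[rank]:
            raise RuntimeError("write buffer empty")
        self.valid[rank] = False  # the rank contents are not erased
        self.read_ptr = (rank + 1) % self.RANKS
        return rank


tracker = RankTracker()
stored = [tracker.store() for _ in range(16)]
```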
  • the I/O or Locked cycle causes an entire purge (or emptying) of the write buffer. DMA cycles may or may not cause a purge, depending upon how the write buffer is programmed. This feature is provided to allow system designers maximum flexibility. Generally, because all I/O cycles cause the write buffer to empty, any DMA access to memory that was caused by an I/O read or write will be furnished with correct data. This is because software will generally always perform a write to
  • Locked cycles are special "group" cycles that must occur back to back. For example, a Locked cycle may result when a read-modify-write cycle is issued by the CPU. The CPU then reads a location in system memory and, based on that read value, the write buffer writes information out to system memory. Since this is a Locked cycle, another DMA cycle cannot interrupt the Locked cycle once it has begun. All Locked cycles are non-cacheable; therefore, even if a secondary cache is present, none of the state machines wait for a HIT determination to proceed if "ADS#" is asserted in the presence of a Lock ("LOCK#") input from the CPU. If the write buffer contains information, the write buffer empties it before the Locked cycle passes through to the system.
  • the first half of a locked sequence may incur latency due to the write buffer emptying.
  • the signal SLOCK# is not asserted on the system side until the write buffer is empty.
  • the SLOCK# signal is generated by the write buffer in response to the CPU asserting LOCK# and the write buffer being empty.
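  • The gating of SLOCK# can be expressed as a one-line condition. In this sketch signal levels are modeled as integers (0 for low, 1 for high), both LOCK# and SLOCK# are taken to be active-low, and the function name is an assumption.

```python
def slock_n(lock_n, empty):
    """Return the SLOCK# level: low (asserted) only when the CPU asserts
    LOCK# (low) and the write buffer has already emptied."""
    return 0 if (lock_n == 0 and empty) else 1


# CPU asserts LOCK# while writes are still pending: SLOCK# stays high.
while_draining = slock_n(lock_n=0, empty=False)
# Once the buffer has emptied, SLOCK# is asserted to the system.
after_drain = slock_n(lock_n=0, empty=True)
```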
  • it is well known that, for example, when a DMA Master has been granted control of the system side of the write buffer, it will be asserting signals such as SDC#, the addresses and byte enables, etc.
  • data may need to flow in or out of the cache when the DMA access is to a dirty line. This situation is shown in the table. Based on which state machine is active with the device, I/O buffers and transceivers will be enabled or tri-stated as required to assure proper directionality of information flow.
  • no IO, LOCK#, or NOBUFF# cycle will be a cache hit, so, for example, the row in the table for "CPU Reads," which are not Cache Hits, applies not only to memory reads that do not result in cache hits, but also to any IO, Locked, or NOBUFF# read.
  • Memory versus IO writes are not broken out separately for the sake of this table because the determination as to whether these cycles get passed through to the system is dependent only on the secondary cache circumstances. IO writes are never secondary cache hits.
  • Memory writes that are cache hits in copy back caches are absorbed by the cache and are not forwarded to the system, this being the fundamental nature of the copy back cache.
  • If the write is not a cache hit, for any reason, it will be passed through to the system eventually, whether it is an IO write or a memory write that gets buffered in the write buffer. These particulars have no bearing on the direction of Address, Data, and Control flow, which is the subject of this table.
  • Control logic 302 includes various input and output pins.
  • One such pin is "MISS#," which is an output to the system controller. MISS# generally is held low, instructing the system memory controller to process each transaction it receives on the system bus.
  • the system controller hooks directly to the CPU and cache and is therefore aware that requested DMA master read data is or is not dirty within a second level copy back cache, when present. If a copyback cache system is present, DMA read data may need to be snooped from the cache and DMA write data may need to be written to
  • HIT is a second pin on the control logic 302, which is logically equivalent to MISS# when a SHOLD has been asserted and a copyback cache is used. If there is no cache present, the HIT pin is tied low. Otherwise, the HIT input is supplied from the TAG RAM HIT generating logic, which is part of the secondary cache controller. This device assumes that this signal will be valid by the end of the first T2 period. Cache read hits and copyback write hits do not pass through the write buffer.
  • a snoop write (“SNOOPWR#”) pin is provided and is operative only when SHLDA has been granted.
  • SNOOPWR# causes the data bus transceivers to be enabled and point data to flow from the system bus to the CPU/cache bus.
  • This signal has an internal pull-up resistor and is used if a DMA master is performing a write to memory and a hit to a dirty line in a copyback cache results. If such should occur, the data from the DMA master must also be written to the dirty line of the cache. Optionally, this output may be grounded in the event that a copyback cache system is present. This results in data flow from the DMA master write to be enabled on the CPU/cache bus any time SHLDA to the system is granted.
  • A snoop read ("SNOOPRD#") input is also provided, which likewise is operative only when SHLDA has been granted.
  • SNOOPRD# causes the data bus transceiver to be enabled and point data to flow from the CPU/cache bus to the system bus.
  • the signal has an internal pull-up resistor. The signal is used if a DMA master is performing a read from memory and a hit in a dirty line of a copyback cache results. When this happens, the data for the DMA master must come from the cache rather than from the memory. This line takes precedence over the SNOOPWR# input: when asserted, it controls the direction of data flow regardless of SNOOPWR#.
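  • The priority between the two snoop inputs can be summarized in a small decision function. This is a sketch under assumptions: signals are modeled as 0 (low, asserted) or 1 (high), the direction labels are invented for readability, and returning None stands for the case where neither snoop input applies.

```python
def snoop_direction(shlda, snooprd_n, snoopwr_n):
    """Pick the data transceiver direction during a DMA master cycle.

    Both snoop inputs are operative only while SHLDA is granted, and
    SNOOPRD# takes precedence over SNOOPWR# when both are asserted.
    """
    if not shlda:
        return None  # snoop inputs are ignored
    if snooprd_n == 0:
        return "cpu_cache_bus_to_system_bus"  # DMA read hit on a dirty line
    if snoopwr_n == 0:
        return "system_bus_to_cpu_cache_bus"  # DMA write hit on a dirty line
    return None


both_asserted = snoop_direction(shlda=True, snooprd_n=0, snoopwr_n=0)
```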
  • Additional control and input ports are provided to allow a read burst to occur from the CPU to the write buffer. These ports include BLAST#, SBLAST#, E1A-D, E2A-D, E3A-D, E4A-D.
  • Address multiplexer 304 controls which address bus is being used during a write operation. Address multiplexer 304 is connected to write buffer address bus 24A and local address bus 22A. Accordingly, when the write buffer is to perform a write, the address multiplexer places the proper address on the system bus, selecting its source from either the CPU bus or the address stored in register file 402.
  • Address multiplexer 304 includes additional control and signal ports. Such ports include 30 system address lines ("SADR31...2"), which are bi-directional with the CPU, four system byte enable bit lines ("SBE#(3...0)"), whose outputs are uni-directional with the system and which are asserted by the system, and other miscellaneous inputs such as clock, reset, latch address 23 ("LATAD23"), address strobe line ("A23STRB"), write enable ("WREN#"), four write buffer byte enable lines ("WBBE#"), four byte enable input lines ("BE#I(3...0)"), write buffer address ("WBA"), and 30 address input lines ("ADRI").
  • Data multiplexer 306 is connected to CPU 12 through write data bus 24D and local data bus 22D. Data multiplexer 306 selects data information from either register file 402 or CPU 12 to be output to the system, depending upon whether the information is located in the write buffer or is originating as IO information from the CPU.
  • Data multiplexer 306 further includes 32 system data lines ("SD031...0") and four system parity lines ("SDP03...0"). Additional inputs include a write enable ("WREN"), four write buffer data parity lines ("WPDP3...0"), 32 write buffer data lines ("WBD31...0"), 32 data input lines ("DI31...0"), and four data parity input lines ("DPI3...0").
  • KEN# which is an output from the write buffer to the CPU
  • SKEN# which is an input from the system memory controller to the write buffer.
  • KEN# is latched along with all read data, parity, and ready signals from the system during a read operation.
  • the KEN# output reflects what is occurring on input SKEN# after a one clock latency. This input only determines whether the read data may be internally cached or not, and not whether the CPU should attempt to burst write data into the internal cache.
  • KEN# is asserted one clock cycle before the first RDY# or BRDY# of a cache line fill and one clock cycle before the last RDY# or BRDY# of the same cache line fill.
  • Cache line fills may be done in either burst or non-burst fashion depending on the system. Not every read the CPU performs will be a burst attempt. Every cacheable read sequence will involve the reading of four double words.
  • the SKEN# input typically goes to the KEN# input of the CPU. In the present invention, it is made a part of the write buffer in order to keep the KEN# signal phased correctly with the system's SRDY# and SBRDY# inputs.
  • Register 402 connects to CPU 12 through write buffer address and data buses 24A and 24D and to system memory 14 through local address and data buses 22A and 22D.
  • the 16 ranks, which are described above, are included in register 402 of write buffer 16.
  • Register 402 includes various inputs to allow it to store the buffered data temporarily until it can be written to the system memory. Such inputs include a clock input to provide timing information and byte clock lines ("BYTECK63...0"), which enable 1, 2, 3, or 4 byte clocks to be generated out of a possible 64 byte clocks. Importantly, 16 ranks x 4 bytes per rank require 64 individual byte clocks to perform the byte gathering function. Register 402 further includes 30 processor address input ("PAD31...2") lines, 32 processor data input ("PD31...0") lines, and four processor data parity check ("PDP3...0") lines. Additional inputs include input reset ("IRST#"), a write strobe ("WSTROBE"), 16 rank parity check lines ("RP15...0"), and a "PDC" line, which is the CPU's data code line to be stored.
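  • One plausible way to picture the 64 byte clocks is as a 64-bit mask with four bits per rank. The sketch below assumes that byte clock number rank x 4 + byte belongs to that byte of that rank and that the byte enables are active-low; neither mapping is stated in the text, and the function name is hypothetical.

```python
def byte_clock_mask(rank, be_n):
    """Return a 64-bit mask of the byte clocks to pulse for one write.

    rank: target rank, 0..15.
    be_n: 4-bit active-low byte enable value (bit i low means byte i valid).
    Assumed mapping: byte clock index = rank * 4 + byte number.
    """
    if not 0 <= rank < 16:
        raise ValueError("rank must be in 0..15")
    active = (~be_n) & 0xF  # convert active-low enables to active-high
    return active << (rank * 4)


# Bytes 0 and 1 enabled (be_n = 0b1100) for rank 2 pulses clocks 8 and 9.
mask = byte_clock_mask(2, 0b1100)
```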
  • Sixteen valid bit lines ("VBIT15...0") are provided to indicate whether one or more ranks contain un-written address-data pairs, along with various rank information output busses including 16 rank address lines ("#A31...2"), 16 address byte enable lines ("#BE3...0"), 16 data rank lines ("#D31...0"), and 16 data rank parity signal lines ("#DP15...0").
  • Flags logic 404 includes various pins such as LOCK#, LOCKCK1 and 2, SADS1-3, IRST, PREADS (4, 2 and 1), IADS and 16 valid bit lines ("VBIT#").
  • the VBITS are used to generate the empty, full, and EOKAY outputs of Flags logic 404.
  • the LOCK input is from the CPU. If this line is asserted during either a read or a write cycle, the write buffer is emptied before allowing the cycle to pass through. This line is sampled on every read and write to form the SLOCK# output.
  • the SLOCK# output of flags logic 404 is the lock output to the system.
  • This output is not asserted at the same time as LOCK#, especially if the write buffer contains data that must be purged before the locked cycle is allowed to pass through.
  • the data contained in the register structure of the write buffer is by definition not locked data and, while it is being emptied, the SLOCK# signal is set high.
  • When all stored writes have been written out to the system memory, the EMPTY flag output is set high, the SLOCK# output is asserted low, and the locked cycle passes.
  • a FULL output flag is set high when there are 16 valid ranks in register 402.
  • the EOKAY flag is set high if there are at least four vacant ranks within the register 402.
  • the EOKAY flag is part of the secondary cache support and tells a secondary cache controller that there is room within the write buffer to evict 4 double words from a dirty line of a copyback cache.
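  • The three flag outputs can be derived from the 16 valid bits as sketched here. The function name and the use of a 16-bit integer for the VBIT lines are assumptions; the thresholds (no valid ranks, 16 valid ranks, at least four vacant ranks) come from the description.

```python
def flags_from_vbits(vbits):
    """Derive EMPTY, FULL and EOKAY from a 16-bit valid-bit mask."""
    valid = bin(vbits & 0xFFFF).count("1")
    vacant = 16 - valid
    return {
        "EMPTY": valid == 0,   # no pending writes
        "FULL": valid == 16,   # every rank holds a pending write
        "EOKAY": vacant >= 4,  # room to evict four double words
    }


twelve_valid = flags_from_vbits(0x0FFF)
thirteen_valid = flags_from_vbits(0x1FFF)
```

  • With twelve valid ranks exactly four remain vacant, so EOKAY is still high; a thirteenth pending write drops it.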
  • Compare logic 406 connects to register 402 through write address and data bus 24A and 24D. Compare logic 406 compares the incoming write information with previously stored write information to perform byte gatherings. If an address match
  • Compare logic 406 responds to address and byte enable signals, which inform the compare logic to perform byte gathering searches. Compare logic 406 also follows the byte alignment protection rules set up in the CPU protocol established in the 486 series of processors. Compare logic 406 includes the necessary inputs, such as rank addressing ("R#A") and rank byte enable ("R#BE") inputs, to allow comparisons against the incoming write so that the data is either stored in an unoccupied rank or gathered into a rank previously written to. A no-gather ("NOGATHER") pin can be asserted to prevent gathering from occurring, meaning that every write to the buffer will occupy a separate rank.
  • Read logic 408, a state machine connected to CPU 12 and to the optional cache, is used when the CPU issues a memory read command to perform the reading from the system memory in either single or burst mode.
  • a BLAST signal which is part of the '486 system protocol and is cache compatible with the '486 internal cache subsystem, is sent to signal when a burst read operation is to be performed.
  • Read logic 408 includes various inputs that have previously been described above, such as, for example, REST ("reset"), SRDY, SBRDY, BLAST, and HIT.
  • Additional inputs include memory input/output (“MIO#”) from the CPU, which is a memory cycle when asserted high and an I/O cycle when asserted low, and write/read input from the CPU (“WR#”), which is asserted high for a write cycle and asserted low for a read cycle. Additional inputs include a line match, a rank empty, a free input, which is asserted by the "system write" state machine when it is in an idle state, a cache input, and others as are illustrated in Figs. 4B-1, -2, -3, -4, -5, -6, -7 and -8, and will be apparent to one skilled in the art regarding their purpose and application.
  • I/O 410 is a state machine connected to the CPU for observing any I/O or locked cycle, such as during an I/O read or write cycle, and monitors the write buffer status to determine if and when it is empty.
  • I/O 410 includes various input ports such as, for example, BOFF, IRST, SYSCOK, address set (“ADSET”), LOCK, MIO, NOBUFF, WR, EMPTY, SRDY, SADS2, IOEN, LOCKCK2, and PREADS4.
  • Memory write two (“MEMWR2”) 412 connects to register 402 and the CPU to inform the CPU that the memory system is ready for a write cycle to begin.
  • Memory write two 412 includes selected inputs such as, for example, FULL, COPYBACK, HIT, WRTSTR, RDY, BYTCKEN, MATCH, LOCK#, NOBUFF#, MIO#, WR#, and ADSTE#. The remaining inputs will be apparent to those skilled in the art as per their function and operation.
  • WRTMUX 414 couples to compare logic 406 and to register 402 and is used to direct the write information either to a previously stored rank to perform byte gathering or to a new rank if no gathering is possible.
  • Write multiplexer 414 works in conjunction with compare logic 406 in performing gathering to a previously gathered rank or buffering to the first available, unoccupied rank.
  • Write multiplexer includes various inputs such as MATCH, RANK...0, RSEL15...0, and WRT3..0.
  • Byte clock (“BYTCLK”) 416 connects to compare logic 406 and register 402 to signify which byte location in a specific rank is to be accessed during a byte gathering operation.
  • Byte clock 416 includes various inputs such as, for example, IRST#, RSEL#15...0, PBE#...0, VTKEN, and BYTECK63...0.
  • Pipeline (“PIPE2”) 418 provides signal buffering to boost weak signals input to write buffer 16. PIPE2 418 maintains adequate signal strength to prevent erroneous readings during a write operation to write buffer 16.
  • Main multiplexer (“BIGMUX”) 420 connects to read logic 408, compare logic 406, and register 402. Main multiplexer 420 looks at the register file while under the control of the read state machine for outputting data during a read request.
  • Write latch (“WRITLA”) 424 connects to CPU 12 and main multiplexer 420 and switches main multiplexer 420 during read operations to access the proper rank and byte locations.
  • Multiplexer 420 includes input ports for correspondence with the write buffer address bus and data bus and with the local system address and data buses and with register 400. These inputs include rank data counter (“RDCNT”), rank byte enable (“R#BE”), rank address (“R#A”), rank data (“R#D”), rank parity (“R#DP”), and write buffer address (“WB#A”), write buffer data (“WB#B”), write buffer byte enable (“WB#BE”), write buffer data compare (“WB#DC”), and write data parity (“WSP”), respectively.
  • System write (“SYSWR”) 422 connects to I/O element 410 and is a state machine used to empty the register file of pending stored memory writes from the CPU.
  • System write 422 includes various input and output ports such as, for example,
  • Snoop 426 is a state machine connected to the CPU and to the optional cache element, and uses '486 system protocol to search the cache for data during a snoop operation.
  • Snoop 426 includes various inputs and outputs such as, for example,
  • DMA purge which is used to tell the snoop state machine to wait until "empty" occurs before proceeding with a DMA request
  • IRST IRST
  • SYSCLK SHOLD
  • REMBPTY HLDA
  • WHOLD2 WHOLD2
  • HOLD HOLD
  • SHLDA SHLDA
  • SNEN SNEN
  • the write buffer system has five internal state machines, which have been briefly described above.
  • a CPU memory state machine handles all regular CPU writes to memory in zero "0" wait states.
  • CPU special write state machine handles all I/O, locked and non-buffered writes by the CPU.
  • the system write state machine handles removing data/address pairs from the ranks internal to register 402 and writing them out to the system memory.
  • the system DMA state machine handles the processing of write buffer responses when a DMA master signals for use of the system bus.
  • If write buffer register 402 is not empty and no other state machine has issued an internal "hold," the system write state machine will work to empty write buffer register 402 to system memory 14.
  • the read state machine can also place the system write state machine in hold to allow the read to pass around any stored writes in the register. If the system elected not to purge the write buffer on a DMA cycle, the system DMA state machine can also place the system write state machine in hold. While the system write state machine is active, a "busy" signal is issued and the system write state machine will not respond to a hold from another state machine until after it has completed its current cycle.
  • the CPU memory write state machine examines the Empty flag to determine if it can return a ready signal to the CPU and move the data into the register file or if the ready signal must be withheld until the system write state machine has had time to make space in the register file.
  • Two state machines affect the EMPTY, FULL and EOKAY flags: the system write and the CPU memory write state machines. Only the CPU write state machine puts information into the write buffer register 402 (setting valid bits), and only the system write state machine takes information out of the write buffer register 402 (clearing valid bits).
  • the EMPTY, FULL, and the EOKAY flags are generated from the state of the "VALID BITS" associated with the 16 ranks of the write buffer register 402.
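The flag generation described above can be sketched in a few lines. This is only an illustrative model, not the patented logic: the function name `buffer_flags` and the dictionary it returns are assumptions made for the example, while the 16 ranks and the four-vacant-rank threshold for EOKAY come from the text.

```python
NUM_RANKS = 16    # ranks in write buffer register 402 (from the text)
EVICT_RANKS = 4   # vacant ranks needed to evict one cache line (from the text)


def buffer_flags(valid_bits):
    """Derive EMPTY, FULL and EOKAY from the per-rank valid bits.

    valid_bits is a hypothetical list of 16 booleans, one per rank.
    """
    if len(valid_bits) != NUM_RANKS:
        raise ValueError("expected one valid bit per rank")
    occupied = sum(1 for v in valid_bits if v)
    return {
        "EMPTY": occupied == 0,
        "FULL": occupied == NUM_RANKS,
        "EOKAY": NUM_RANKS - occupied >= EVICT_RANKS,
    }
```

With 13 valid ranks only three are vacant, so EOKAY would be low in this model and a four-double-word eviction would have to wait.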
  • the compare logic 406 generates two other flags, which are MATCH and CPU-MATCH. The MATCH flag indicates that the incoming write matches a stored address and will cause byte gathering to occur provided all byte gathering rules are met.
  • the CPU-MATCH flag is generated on a CPU read when the address matches a stored address within the write buffer register 402. If no CPU-MATCH occurs, the read state machine simply puts the system write state machine into hold (if it is working); otherwise, it passes the read cycle onto the system bus.
  • the write buffer system has only one state machine for every type of read. Three types of read cycles, NOBUFF, locked, and I/O reads, cause a complete purge of write buffer register 402, and the read state machine will not let these read cycles pass to the system interface until the EMPTY flag goes true. Under these read cycles, the read state machine may never issue a Hold to the system write state
  • Byte gathering is a method used by the write buffer to perform more efficient buffering operations.
  • the write buffer performs byte, word, tri-byte and long-word gathering to decrease the number of write transfers to the same address location in main memory. Writes to the same address have their data combined into the same address-data pair rank of the write buffer register. Byte gathering is allowed only if the resulting byte combination is legitimate. This legitimacy requirement is imposed upon the byte gathering characteristics used in a '486-based microprocessor system. These legitimacy requirements are not required limitations for byte gathering in a byte gathering system as contemplated in the present invention. These legitimacy rules are imposed by the processor, and therefore, are explained for the purpose of giving a working enablement for use in a 486 environment. Illegitimate or illegal write combinations are shown on the table below:
  • BE# refers to the Byte Enable location within a specific rank.
  • the byte combinations indicated by the byte enabled state are unacceptable byte gathering combinations.
  • the write buffer writes to system memory address-data pairs in the sequence in which they were received from the processor except in the case of gathered data. No reliance should be placed on any aspect of gathering as it is not deterministic and is affected by how quickly the device is able to write its stored data out to the system memory.
  • the byte gathering feature may be disabled during operation.
  • a NOGATH# signal, which signifies no gathering, is asserted to keep the corresponding memory write from becoming a candidate for byte gathering within the register of the write buffer. While this signal is asserted, every memory write, which has not been disallowed from registered storage for some other reason such as it being a locked or I/O write, occupies a rank within the register of the write buffer regardless of whether it may be gathered to any pre-existing location or not.
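A minimal sketch of the gathering decision follows, under stated assumptions. The table of illegal byte combinations is not reproduced in this text, so the `contiguous` rule below (enabled bytes must form one unbroken run) is only a stand-in assumption and can be swapped out through the `is_legal` parameter. The names `Rank` and `buffer_write` are hypothetical, byte enables are modeled as active-high bits for readability, and the 16-rank capacity limit is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Rank:
    address: int                      # long-word address of the pair
    enables: int = 0                  # 4-bit mask, bit i set = byte i valid
    data: list = field(default_factory=lambda: [0, 0, 0, 0])


def contiguous(mask):
    """ASSUMED legality rule: enabled bytes form one unbroken run."""
    if mask == 0:
        return False
    while mask & 1 == 0:
        mask >>= 1
    return (mask & (mask + 1)) == 0


def buffer_write(ranks, address, enables, data, nogather=False,
                 is_legal=contiguous):
    """Gather into a matching rank if allowed, else occupy a new rank.

    Returns the index of the rank that received the write.
    """
    if not nogather:
        for i, rank in enumerate(ranks):
            if rank.address == address and is_legal(rank.enables | enables):
                for b in range(4):
                    if (enables >> b) & 1:
                        rank.data[b] = data[b]
                rank.enables |= enables
                return i
    # No legal match (or NOGATH# asserted): take a fresh rank.
    new = Rank(address, enables)
    for b in range(4):
        if (enables >> b) & 1:
            new.data[b] = data[b]
    ranks.append(new)
    return len(ranks) - 1
```

In this model, writing byte 0 and then byte 1 of the same long word lands in one rank, while the same second write with `nogather=True` would take a rank of its own.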
  • the system also includes the ability to disable the buffering capabilities.
  • a NOBUFF# signal which signifies no buffering, is asserted to cause the corresponding write to bypass storage in register 402 and force the write buffer to purge itself of any contents stored within the register. This signal is applicable to writes to system memory as well, which writes are normally stored in the write buffer.
  • the NOBUFF# signal is asserted to decode memory mapped device address ranges, such as, for example, video memory, which must not be buffered in the write buffer.
  • NOBUFF# writes cause the write buffer system to purge the register of any data contents.
  • the write buffer can allow a memory read to occur even when data remains unwritten to the system memory.
  • the write buffer allows for a read re-ordering operation to be performed during these circumstances.
  • the data information to be read from the system memory is occasionally found in the write buffer.
  • Matching memory reads result in stored writes being emptied until the match goes away, thus assuring that the CPU's read request fetches the latest data from the system memory.
  • the state machine responsible for writing stored information to the system is halted, and the read cycle is allowed to pass through. This is read re-ordering. Even though the remaining stored writes occurred in time before the current memory read, the system "sees" the read first. This allows the CPU to get its read data more quickly than by completely purging the data contents of the write buffer to the memory and then allowing the CPU read to pass through to system memory.
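Read re-ordering with "purge until the match goes away" can be illustrated with a small model. This is a sketch under simplifying assumptions, not the state machines themselves: stored writes are whole long words held in a `deque`, system memory is a dictionary, and the name `cpu_read` is made up for the example.

```python
from collections import deque


def cpu_read(pending, memory, address):
    """Serve a CPU memory read while writes may still be buffered.

    pending: deque of (address, value) stored writes, oldest first.
    memory:  dict mapping address -> value (stand-in for system memory).
    Returns (value read, number of stored writes drained first).
    """
    drained = 0
    # Drain in arrival order only while some stored write matches the read.
    while any(a == address for a, _ in pending):
        a, v = pending.popleft()
        memory[a] = v
        drained += 1
    # No match remains: the read passes around the remaining stored writes.
    return memory.get(address), drained
```

A read that matches nothing in the buffer returns immediately with `drained == 0`, which is the case where the system "sees" the read before the older writes.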
  • Memory cache subsystems will now be described. If the computer system has a memory cache subsystem, it may be either a write-through or a write-back type cache. In a write-through cache system, all writes to memory are stored in write buffer 16. In a write-back cache subsystem, only write-cache misses are stored by the write buffer.
  • a write-through cache is represented by the L-1 internal primary cache found in the '486 microprocessor. When a write by the CPU takes place, regardless of whether the write is a hit in the cache, it also is written to memory. This ensures that the system always has the same data in memory as that existing in the cache itself. If a direct memory access (DMA) master reads from system memory, the data is available in the memory system without needing to access the data from the cache subsystem. Furthermore, when a DMA master writes to the system memory, and the write is a hit in the cache, the cache line may be invalidated and the cache controller may proceed to linefill. Since every write goes to the system, a write-through cache requires a higher bandwidth on the system bus. A write-through cache is simpler to design and implement than a copyback cache because there are numerous factors involving data coherency that must be accounted for in copyback designs.
  • a copyback cache requires less system bus bandwidth than the write-through cache since when a write takes place by the
  • a copyback cache permits the CPU to write into the cache if a write hit occurs, and that write does not go out to the system immediately. Accordingly, when a cache line is written to the cache system by the CPU, it becomes "dirty" because the copy of the data
  • a write back cache may hold dirty data
  • special techniques called “snooping” and “snarfing” are employed to ensure data coherency.
  • “Snooping” is the process of supplying dirty copyback data from the cache on a DMA read.
  • “Snarfing” is the process of writing data to the cache line (either dirty or not dirty) during a DMA write.
  • When a DMA master tries to read a location in memory that is "dirty" in the cache, the data must come from the cache itself, rather than from the system memory.
  • When a DMA master tries to write to system memory and the write is a cache hit, the data must be simultaneously written into the cache and the system memory. If the line, during this write, is not dirty but is valid, it may be invalidated. If the line is valid and dirty, the DMA master's data must replace only that portion of the line representing the bytes where the DMA write occurs.
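The DMA write cases above can be sketched as follows. The cache representation (a dictionary of 16-byte lines with a `dirty` flag), the name `dma_write`, and the returned status strings are all assumptions for illustration. The sketch takes the "may be invalidated" option for clean lines and assumes a DMA write does not cross a line boundary.

```python
LINE_BYTES = 16  # line width stated for the '486 (four long words)


def dma_write(cache, memory, address, payload):
    """Model a DMA master write against a copyback cache.

    cache:  dict line_base -> {"dirty": bool, "data": bytearray(16)}
    memory: bytearray standing in for system memory.
    """
    base = address - address % LINE_BYTES
    if address + len(payload) > base + LINE_BYTES:
        raise ValueError("sketch assumes the write stays within one line")
    # The write always reaches system memory.
    memory[address:address + len(payload)] = payload
    line = cache.get(base)
    if line is None:
        return "miss"
    if not line["dirty"]:
        # Valid but clean: this sketch chooses to invalidate the line.
        del cache[base]
        return "invalidated"
    # Valid and dirty: "snarf" only the bytes the DMA master wrote.
    off = address - base
    line["data"][off:off + len(payload)] = payload
    return "snarfed"
```

Only the written bytes of a dirty line change, so the rest of the line keeps the CPU's newer data.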
  • a "line” is the physical architecture of a string of data held in a cache.
  • the line width is based on the same line width as used in the '486 processor, which line width is 16 bytes or four long words. This line formatting is used to correspond to the readburst of the '486 CPU.
  • all line evictions in a write-back cache will use the write buffer to store the data temporarily before being written out to the main or system memory.
  • a flag, Eviction Okay “EOKAY,” is set high when there are at least four available ranks within the write buffer, thus providing enough room for a whole line to be evicted.
  • When EOKAY is asserted, an eviction may be performed and control circuitry puts the CPU into HOLD and reads the line from the memory cache.
  • Another contemplated feature is an additional internal address code logic, which is range programmable to allow for the decoding of memory map I/O devices, video, and the like.
  • the address range may be limited to enhance speed performance, rather than follow the address range of the four gigabyte limit of the '486 microprocessor.
  • the system read re-ordering operation may be modified to replace the "purge until match goes away" feature. This allows data to be supplied directly from the write buffer when the CPU issues a read command and the data is in the write buffer. Furthermore, if the CPU writes to byte 0 and tries to read bytes 0 and 1, the write buffer reads the four byte long word from memory and "merges" the good bytes from memory with the byte 0 in the write buffer. This is known as a "partial" hit situation.
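The "partial hit" merge can be shown with a short sketch. The function name `merge_read` and the active-high enable mask are assumptions for the example: bytes held in the write buffer take priority, and the remaining bytes come from the long word read from memory.

```python
def merge_read(mem_bytes, buf_bytes, buf_enables):
    """Merge one long word for a partial hit.

    mem_bytes:   4 bytes read from system memory.
    buf_bytes:   4 bytes held in the matching write buffer rank.
    buf_enables: 4-bit mask, bit i set = byte i is valid in the buffer.
    """
    return [
        buf_bytes[i] if (buf_enables >> i) & 1 else mem_bytes[i]
        for i in range(4)
    ]
```

For the case in the text, where the CPU wrote byte 0 and then reads bytes 0 and 1, `merge_read([1, 2, 3, 4], [9, 0, 0, 0], 0b0001)` takes byte 0 from the buffer and bytes 1 to 3 from memory.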

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

An improved write buffer system is disclosed. The write buffer can gather at any rank level within the buffer. The write buffer includes address storage locations that are matched when a new write is written to the write buffer. If a match occurs, the system determines whether the new byte information can be gathered to the matching address location. If a conflict exists in the write information, the byte information and the address are gathered to a new rank. Once a rank is full, the buffer attempts to write the data information held in the write memory to a memory system. The write buffer increases system performance by allowing a central processing unit to write to the write buffer almost instantaneously, with no delay from the slower memory system in absorbing the write. While the central processing unit is busy performing other tasks, the write buffer writes out the write information and sends it to the memory unit or to other devices in the system that require address and data information.
PCT/US1993/010855 1992-11-09 1993-11-09 Write buffer with full rank byte gathering WO1994011828A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU55987/94A AU5598794A (en) 1992-11-09 1993-11-09 Write buffer with full rank byte gathering

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US97386492A 1992-11-09 1992-11-09
US07/973,864 1992-11-09

Publications (2)

Publication Number Publication Date
WO1994011828A2 true WO1994011828A2 (fr) 1994-05-26
WO1994011828A3 WO1994011828A3 (fr) 1994-07-07

Family

ID=25521309

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1993/010855 WO1994011828A2 (fr) 1992-11-09 1993-11-09 Write buffer with full rank byte gathering

Country Status (2)

Country Link
AU (1) AU5598794A (fr)
WO (1) WO1994011828A2 (fr)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4959771A (en) * 1987-04-10 1990-09-25 Prime Computer, Inc. Write buffer for a digital processing system
US5023776A (en) * 1988-02-22 1991-06-11 International Business Machines Corp. Store queue for a tightly coupled multiple processor configuration with two-level cache buffer storage
US5224214A (en) * 1990-04-12 1993-06-29 Digital Equipment Corp. BuIffet for gathering write requests and resolving read conflicts by matching read and write requests
US5193167A (en) * 1990-06-29 1993-03-09 Digital Equipment Corporation Ensuring data integrity by locked-load and conditional-store operations in a multiprocessor system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5638534A (en) * 1995-03-31 1997-06-10 Samsung Electronics Co., Ltd. Memory controller which executes read and write commands out of order
US5666494A (en) * 1995-03-31 1997-09-09 Samsung Electronics Co., Ltd. Queue management mechanism which allows entries to be processed in any order
WO1999036849A1 (fr) * 1998-01-16 1999-07-22 Advanced Micro Devices, Inc. Write-buffer FIFO architecture with random access snooping capability
US6151658A (en) * 1998-01-16 2000-11-21 Advanced Micro Devices, Inc. Write-buffer FIFO architecture with random access snooping capability
GB2335762A (en) * 1998-03-25 1999-09-29 Advanced Risc Mach Ltd Write buffering in a data processing apparatus
US6415365B1 (en) 1998-03-25 2002-07-02 Arm Limited Write buffer for use in a data processing apparatus
GB2335762B (en) * 1998-03-25 2002-10-30 Advanced Risc Mach Ltd Write buffering in a data processing apparatus
US9892768B2 (en) 2012-02-24 2018-02-13 Avago Technologies General Ip (Singapore) Pte. Ltd. Latching pseudo-dual-port memory multiplexer
CN118605941A (zh) * 2024-08-07 2024-09-06 南京沁恒微电子股份有限公司 CPU capable of quickly processing memory copy instructions and method thereof

Also Published As

Publication number Publication date
AU5598794A (en) 1994-06-08
WO1994011828A3 (fr) 1994-07-07

Similar Documents

Publication Publication Date Title
JP7553478B2 (ja) Victim cache that supports draining of write-miss entries
US5715428A (en) Apparatus for maintaining multilevel cache hierarchy coherency in a multiprocessor computer system
JP3067112B2 (ja) Method for reloading deferred pushes into a copy-back data cache
US5745732A (en) Computer system including system controller with a write buffer and plural read buffers for decoupled busses
US5247648A (en) Maintaining data coherency between a central cache, an I/O cache and a memory
US6366984B1 (en) Write combining buffer that supports snoop request
US5903911A (en) Cache-based computer system employing memory control circuit and method for write allocation and data prefetch
US5627992A (en) Organization of an integrated cache unit for flexible usage in supporting microprocessor operations
US5642494A (en) Cache memory with reduced request-blocking
US5263142A (en) Input/output cache with mapped pages allocated for caching direct (virtual) memory access input/output data based on type of I/O devices
US5717898A (en) Cache coherency mechanism for multiprocessor computer systems
JP3218317B2 (ja) Integrated cache unit and method of configuring the same
US5185878A (en) Programmable cache memory as well as system incorporating same and method of operating programmable cache memory
US5557769A (en) Mechanism and protocol for maintaining cache coherency within an integrated processor
US5761725A (en) Cache-based computer system employing a peripheral bus interface unit with cache write-back suppression and processor-peripheral communication suppression for data coherency
US6665774B2 (en) Vector and scalar data cache for a vector multiprocessor
CA1322058C (fr) Multiprocessor computer systems with shared memory and individual caches
US8782348B2 (en) Microprocessor cache line evict array
US5802559A (en) Mechanism for writing back selected doublewords of cached dirty data in an integrated processor
US6212605B1 (en) Eviction override for larx-reserved addresses
JPH11506852A (ja) Reduction of cache snooping overhead in a multilevel cache system with multiple bus masters and a shared level-two cache
US5918069A (en) System for simultaneously writing back cached data via first bus and transferring cached data to second bus when read request is cached and dirty
JPH0721085A (ja) Streaming cache for caching data transferred between memory and I/O devices, and method therefor
US6434665B1 (en) Cache memory store buffer
US5717894A (en) Method and apparatus for reducing write cycle wait states in a non-zero wait state cache system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AU CA JP KR

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

AK Designated states

Kind code of ref document: A3

Designated state(s): AU CA JP KR

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA
