
WO1998010350A1 - A data flow control mechanism for a bus supporting two-and three-agent transactions - Google Patents

Publication number
WO1998010350A1
Authority
WO
WIPO (PCT)
Prior art keywords
bus
agent
request
the
indication
Prior art date
Application number
PCT/US1997/011419
Other languages
French (fr)
Inventor
Peter D. Macwilliams
Nitin V. Sarangdhar
Stephen S. Pawlowski
Gurbir Singh
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US08/709,215 external-priority patent/US6405271B1/en
Application filed by Intel Corporation filed Critical Intel Corporation
Priority to AU35870/97A priority Critical patent/AU3587097A/en
Publication of WO1998010350A1 publication Critical patent/WO1998010350A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/36Handling requests for interconnection or transfer for access to common bus or bus system
    • G06F13/368Handling requests for interconnection or transfer for access to common bus or bus system with decentralised access control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/36Handling requests for interconnection or transfer for access to common bus or bus system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/36Handling requests for interconnection or transfer for access to common bus or bus system
    • G06F13/368Handling requests for interconnection or transfer for access to common bus or bus system with decentralised access control
    • G06F13/37Handling requests for interconnection or transfer for access to common bus or bus system with decentralised access control using a physical-position-dependent priority, e.g. daisy chain, round robin or token passing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/42Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4208Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being a system bus, e.g. VME bus, Futurebus, Multibus
    • G06F13/4213Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being a system bus, e.g. VME bus, Futurebus, Multibus with asynchronous protocol

Definitions

  • the present invention pertains to computer systems and computer system buses. More particularly, this invention relates to controlling data flow on a computer system bus which supports two- and three-agent transactions.
  • Modern computer systems typically have multiple agents coupled together via a system bus.
  • the agents are integrated circuit chips with multiple pins coupling each agent to the bus.
  • These agents may include, for example, a processor(s), a memory device(s), a mass storage device(s), etc. In order for the computer system to operate properly, these agents should be able to effectively communicate with each other via the bus.
  • One aspect of this communication is the transfer of data from one agent to another.
  • the transfer of data on the bus is referred to as the data flow on the bus.
  • an agent which can be the target of a data transfer has a storage space, such as a data buffer, into which the transferred data is placed.
  • agents typically have a limited amount of storage space for data. Therefore, situations can arise where the targeted agent for a data transfer does not have sufficient storage space to store the data. Additionally, it is often the case that only the targeted agent knows whether it has sufficient storage space to store the data.
  • it would be beneficial to provide a mechanism that allows the agent which is targeted by a request to control the flow of data on the bus for that request.
  • a latched bus is one type of bus which can be used in a computer system.
  • In a latched bus system, data is latched into a storage space from the bus in one clock cycle and control signals based on that data can be placed on the bus in any of the subsequent clock cycles.
  • In a non-latched bus system, data is received from the bus in one clock cycle and control signals based on that data can be placed on the bus in that same clock cycle. Due to the nature of the latched bus, some solutions for controlling data flow on the bus which work on a non-latched bus are too inefficient to work on a latched bus.
  • On a non-latched bus, for example, data can be placed on the bus by a source agent which can wait until it receives a ready signal from the targeted agent, at which point the source agent provides, in the same clock cycle as it receives the ready signal, the next data.
  • this concept of waiting for, receiving, and processing the ready signal for each data transfer takes too much time on a latched bus because the ready signal would be received in one clock cycle, processed in the next clock cycle, and then the next data would be placed on the bus. Therefore, it would be beneficial to provide a mechanism that allows the targeted agent to more efficiently control the flow of data on a latched bus.
  • some computer systems include one or more cache memories, each of which is faster and smaller than the main system memory.
  • the cache memory typically allows data which has been recently accessed by an agent, or which is predicted to be accessed soon by an agent, to be available in a faster memory, thereby reducing the time required to obtain the data and increasing overall system performance.
  • Different agents, such as different processors, on a bus will often have their own cache memory. These agents are then able to modify the data stored in their cache memory without making the same modifications to the main memory until a later time.
  • situations can arise where data which is requested by a first agent is stored in a cache memory of a second agent, and the requested data in the second agent's cache memory has been modified.
  • the data to be returned to the first agent should come from the cache memory of the second agent, not from the main memory, because the data in the cache memory is a more recent version.
  • One solution to this problem is to transfer the requested data from the second agent to the first agent and have the memory controller for the main memory also take the data off the bus.
  • a transaction such as this which uses the first and second agents, as well as the memory controller, is referred to as a three-agent transaction.
  • this solution presumes that the memory controller has sufficient storage space to take the data off the bus, which is not always the case.
  • it would be beneficial to provide a mechanism which allows the memory controller to maintain data flow control on the bus for a three-agent transaction.
  • the present invention provides a data flow control mechanism for a bus supporting two- and three-agent transactions to achieve these and other desired results which will be apparent to those skilled in the art from the description that follows.
  • An apparatus in accordance with the data flow control mechanism of the present invention includes a control logic to place an indication of a request onto a computer system bus. The apparatus then waits to place data corresponding to the request onto the bus until it has received an indication from an agent coupled to the bus that the agent is ready to receive the data.
  • the data flow control mechanism supports both two- and three-agent transactions.
  • In two-agent transactions, data is transferred from a source agent to a target agent, with the target agent maintaining control of the data flow.
  • In three-agent transactions, data is transferred from a snooping agent to either the source agent or the target agent, as well as possibly from the source agent to the target agent.
  • the target agent controls the data flow of transfers to the target agent, regardless of whether they originated with the source agent or the snooping agent.
  • Figure 1 illustrates a multiprocessor computer system such as may be used with one embodiment of the present invention:
  • Figure 2 is a block diagram illustrating a bus cluster system such as may be used with one embodiment of the present invention
  • Figure 3 shows an example of overlapped phases for two transactions according to one embodiment of the present invention
  • Figure 4 is a state diagram illustrating the different states for the TRDY# signal in accordance with one embodiment of the present invention
  • Figure 5 is a timing diagram illustrating the timing of signals in performing a write transaction according to one embodiment of the present invention
  • Figure 6 is a timing diagram illustrating the timing of signals in performing a read transaction with an implicit writeback according to one embodiment of the present invention.
  • Figure 7 is a timing diagram illustrating the timing of signals in performing a write transaction with an implicit writeback according to one embodiment of the present invention.
  • the present invention provides a mechanism for controlling data flow on a bus which supports two- and three-agent transactions.
  • the mechanism allows an agent which is to receive data from the bus to control the flow of data on the bus.
  • the agent which is to receive data indicates when it is ready to receive the data, at which time another agent on the bus, which is the source of the data being transferred, places the data on the bus.
  • FIG. 1 illustrates a multiprocessor computer system such as may be used with one embodiment of the present invention.
  • the computer system 100 generally comprises a processor-memory bus or other communication means 101 for communicating information between different agents coupled to the bus 101, such as processors, bus bridges, memory devices, peripheral devices, etc.
  • the processor-memory bus 101 includes arbitration, address, data and control buses (not shown).
  • the bus 101 is a latched bus having a data bus width of 64 bits.
  • each of the one or more processors 102, 103, 104, and 105 includes a small, extremely fast internal cache memory (not shown), commonly referred to as a level one (L1) cache memory, for temporarily storing data and instructions on-chip.
  • a bigger level two (L2) cache memory 106 can be coupled to any one of the processors, such as processor 105, for temporarily storing data and instructions for use by the processor(s).
  • Each processor may have its own L2 cache, or some may share an L2 cache.
  • Processors 102, 103, and 104 may each be a parallel processor (a symmetric co-processor), such as a processor similar to or the same as processor 105.
  • processor 102, 103, or 104 may be an asymmetric co-processor, such as a digital signal processor.
  • processors 102 through 105 may include processors of different types.
  • the present invention includes Intel® Architecture microprocessors as processors 102 through 105, such as i386™, i486™, Pentium®, or Pentium® Pro microprocessors.
  • the present invention may utilize any type of microprocessor architecture. It is to be appreciated that the particular architecture(s) used is not especially germane to the present invention.
  • the processor-memory bus 101 provides system access to the memory and input/output (I/O) subsystems.
  • a memory controller 122 is coupled to the processor-memory bus 101 for controlling access to a random access memory (RAM) or other dynamic storage device 121 (commonly referred to as a main memory) for storing information and instructions for processors 102 through 105.
  • a mass data storage device 125, such as a magnetic disk and disk drive, for storing information and instructions, and a display device 123, such as a cathode ray tube (CRT), liquid crystal display (LCD), etc., for displaying information to the computer user may be coupled to the processor-memory bus 101.
  • Each of the agents coupled to the bus, including processors 102-105 and memory controller 122, includes a bus control logic 108 which acts as an interface between the agent and the bus 101, both of which may run at different clock speeds.
  • the bus control logic 108 includes the latches and necessary circuitry for driving signals onto and receiving signals from the bus 101.
  • An input/output (I/O) bridge 124 may be coupled to the processor-memory bus 101 and a system I/O bus 131 to provide a communication path or gateway for devices on either processor-memory bus 101 or I/O bus 131 to access or transfer data between devices on the other bus.
  • the bridge 124 is an interface between the system I/O bus 131 and the processor-memory bus 101.
  • the I/O bus 131 communicates information between peripheral devices in the computer system.
  • Devices that may be coupled to the system bus 131 include, for example, a display device 132, such as a cathode ray tube, liquid crystal display, etc., an alphanumeric input device 133 including alphanumeric and other keys, etc., for communicating information and command selections to other devices in the computer system (e.g., the processor 102) and a cursor control device 134 for controlling cursor movement.
  • a hard copy device 135, such as a plotter or printer, for providing a visual representation of the computer images and a mass storage device 136, such as a magnetic disk and disk drive, for storing information and instructions may also be coupled to the system bus 131.
  • additional processors or other components may be included. Additionally, in certain implementations components may be re-arranged. For example, the L2 cache memory 106 may lie between the processor 105 and the processor-memory bus 101. Furthermore, certain implementations of the present invention may not require nor include all of the above components. For example, the processors 102 through 104, the display device 123, or the mass storage device 125 may not be coupled to the processor-memory bus 101. Additionally, the peripheral devices shown coupled to the system I/O bus 131 may be coupled to the processor-memory bus 101; in addition, in some implementations only a single bus may exist with the processors 102 through 105, the memory controller 122, and the peripheral devices 132 through 136 coupled to the single bus.
  • FIG. 2 is a block diagram illustrating a bus cluster system such as may be used with one embodiment of the present invention.
  • Figure 2 shows two clusters 201 and 202 of agents. Each of these clusters is comprised of a number of agents.
  • the cluster 201 is comprised of four agents 203-206 and a cluster manager
  • the agents 203-206 can include microprocessors, co-processors, digital signal processors, etc.; for example, the agents 203 through 206 may be the same as the processor 105 shown in Figure 1.
  • the cluster manager 207 and its cache are shared between these four agents 203-206. Each cluster is coupled to a memory-system bus
  • the system interface 209 includes a high speed I/O interface 210 for interfacing the computer system to peripheral devices (not shown) and a memory interface 211 which provides access to a global main memory (not shown), such as a DRAM memory array.
  • the high speed I/O interface 210 is the bridge 124 of Figure 1 and the memory interface 211 is the memory controller 122 of Figure 1.
  • each cluster also includes a local memory controller and/or a local I/O bridge.
  • the cluster 201 may include a local memory controller 265 coupled to the processor bus 212.
  • the local memory controller 265 manages accesses to a RAM or other local memory 266 contained within the cluster 201.
  • the cluster 201 may also include a local I/O bridge 267 coupled to the processor bus 212.
  • Local I/O bridge 267 manages accesses to I/O devices within the cluster, such as a mass storage device 268, or to an I/O bus, such as system I/O bus 131 of Figure 1.
  • the local memory of each cluster is part of the global memory and I/O space for the entire system. Therefore, in this embodiment the system interface 209 need not be present because the individual local memory and I/O bridges make up the global memory system.
  • the buses 212 and 213 and the memory-system bus 208 operate analogous to the processor-memory bus 101 of Figure 1.
  • the cluster 201 or 202 may comprise fewer than four agents.
  • the cluster 201 or 202 may not include the memory controller, local memory, I/O bridge, and storage device. Additionally, certain implementations of the present invention may include additional processors or other components.
  • bus transactions occur on the processor-memory buses described above in Figures 1 and 2 in a pipelined manner. That is, multiple bus transactions may be pending at the same time, wherein each is not fully completed. Therefore, when a requesting agent (also referred to as a source agent) begins a bus transaction by driving an address onto the address bus, the bus transaction may be only one of a number of bus transactions currently pending. Although bus transactions are pipelined, the bus transactions do not have to be fully completed in order; completion replies to requests can be out-of-order.
  • bus activity is hierarchically organized into operations, transactions, and phases.
  • An operation is a bus procedure that appears atomic to software, such as reading a naturally aligned memory location. Executing an operation usually requires one transaction but may require multiple transactions, such as in the case of deferred replies in which requests and replies are different transactions, or in unaligned memory operations which software expects to be atomic.
  • a transaction is the set of bus activities related to a single request, from request bus arbitration through the completion of the transaction (e.g., a normal or implicit writeback response) during the Response Phase.
  • a transaction contains up to six distinct phases. However, certain phases are optional based on the transaction and response type. Alternatively, additional phases could also be added.
  • a phase uses a particular signal group to communicate a particular type of information. In one implementation, these phases are the arbitration, request, error, snoop, response, and data transfer phases.
  • the data transfer phase is optional and is used if a transaction is transferring data.
  • the data phase is request-initiated if the data is available at the time of initiating the request (for example, for a write transaction).
  • the data phase is response-initiated if the data is available at the time of generating the transaction response (for example, for a read transaction).
  • a transaction may contain both a request-initiated data transfer and a response-initiated data transfer.
  • FIG. 3 shows an example of overlapped phases for two transactions.
  • transactions begin with an arbitration phase, in which a requesting agent becomes the bus owner.
  • the arbitration phase needs to occur only if the agent that is driving the next transaction does not already own the bus.
  • bus ownership is granted to the requesting agent in the arbitration phase two or more clocks after ownership is requested.
  • the second phase is the request phase, in which the bus owner drives a request and address information on the bus.
  • the request phase is one or more clocks after bus ownership is granted (provided there is an arbitration phase), and is two clocks long.
  • In the first clock, an address signal is driven along with the transaction type and sufficient information to begin snooping a memory access.
  • In the second clock, byte enables used to identify which bytes of data should be transferred if the data transfer is less than the data bus width, a transaction identifier used to uniquely identify the transaction in the event a deferred response is given to the request, and the requested data transfer length are driven, along with other transaction information.
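  • The role of the byte enables can be sketched in a few lines. This is an illustrative sketch only: it assumes the 64-bit data bus mentioned earlier (eight byte lanes) and assumes that bit i of the enable mask selects byte lane i, an encoding the text does not specify. The function name `enabled_bytes` is hypothetical.

```python
def enabled_bytes(data: bytes, byte_enables: int) -> dict[int, int]:
    """Return {lane: byte} for each byte lane whose enable bit is set.

    Assumption: bit i of byte_enables corresponds to byte lane i of an
    8-byte (64-bit) data bus. The source does not define the encoding.
    """
    if len(data) != 8:
        raise ValueError("expected 8 bytes for a 64-bit data bus")
    if not 0 <= byte_enables <= 0xFF:
        raise ValueError("byte_enables must fit in 8 bits")
    return {lane: data[lane] for lane in range(8) if byte_enables & (1 << lane)}


# A transfer narrower than the bus: only the two lowest lanes carry data.
chunk = bytes(range(0x10, 0x18))
print(enabled_bytes(chunk, 0b00000011))
```

  • Returning a lane-to-byte mapping rather than a packed value keeps it visible which lanes were selected.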
  • the third phase of a transaction is an error phase.
  • the error phase indicates any immediate errors, such as parity errors, triggered by the request. If an error is discovered, an error signal is asserted during the error phase by the agent which detected the error in the transaction. When an error is indicated, the transaction is immediately dropped (that is, the transaction progresses no further in the pipeline) and may be re-driven by the agent which issued the transaction. Whether the agent reissues the transaction depends on the agent itself. In one implementation, the error phase is three clocks after the request phase.
  • every transaction that is not canceled because of an error in the error phase has a snoop phase.
  • the snoop phase indicates if the cache line accessed in a transaction is not valid, valid or modified (dirty) in any agent's cache.
  • the snoop phase is four or more clocks from the request phase.
  • the snoop phase of the bus defines a snoop window during which snoop events can occur on the bus.
  • a snoop event refers to agents transmitting and/or receiving snoop results via the bus.
  • An agent which has snoop results which need to be driven during the snoop phase drives these snoop results as a snoop event during the snoop window.
  • All snooping agents coupled to the bus, including the agent driving the results, receive these snoop results as a snoop event during the snoop window.
  • the snoop window is a single bus clock.
  • the response phase indicates whether the transaction failed or succeeded, whether the response is immediate or deferred, whether the transaction will be retried, or whether the transaction includes data phases. If a transaction contains a response-initiated data phase, then it enters the data transfer phase along with the response phase.
  • If the transaction does not have a data phase, then that transaction is complete after the response phase. If the requesting agent has write data to transfer or has requested read data, the transaction has a data phase which may extend beyond the response phase in the former case and will be coincident with or extend beyond the response phase in the latter case.
  • the data phase occurs only if a transaction requires a data transfer.
  • the data phase can be response initiated (for example, by the memory controller or another processor) or request initiated.
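  • The rules above for which phases a transaction goes through can be summarized in a short sketch. It is only an illustration of the description: the names `Phase` and `phases_for` are hypothetical, and the returned list records which phases occur, not their exact clock timing (phases of different transactions overlap, and a request-initiated data transfer need not follow the response phase).

```python
from enum import Enum


class Phase(Enum):
    ARBITRATION = "arbitration"
    REQUEST = "request"
    ERROR = "error"
    SNOOP = "snoop"
    RESPONSE = "response"
    DATA = "data transfer"


def phases_for(owns_bus: bool, error_detected: bool, transfers_data: bool) -> list[Phase]:
    """List the phases a single transaction passes through."""
    phases = []
    # Arbitration is needed only if the agent does not already own the bus.
    if not owns_bus:
        phases.append(Phase.ARBITRATION)
    phases += [Phase.REQUEST, Phase.ERROR]
    # A transaction with an error in the error phase is dropped immediately.
    if error_detected:
        return phases
    # Every transaction that survives the error phase has a snoop phase.
    phases += [Phase.SNOOP, Phase.RESPONSE]
    # The data phase occurs only if the transaction transfers data.
    if transfers_data:
        phases.append(Phase.DATA)
    return phases


print([p.value for p in phases_for(owns_bus=False, error_detected=False, transfers_data=True)])
```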
  • the bus accommodates deferred transactions by splitting a bus transaction into two independent transactions. The first transaction involves a request by a requesting agent and a response by the responding agent.
  • the request comprises the sending of an address on the address bus and a first token (also referred to as a transaction identifier).
  • the response includes the sending of the requested data (or completion signals) if the responding agent is ready to respond. In this case, the bus transaction ends.
  • the responding agent may send a deferred response over the bus during the response phase. Sending of a deferred response allows other transactions to be issued and not be held up by the completion of this transaction.
  • the requesting agent receives this deferred response.
  • the responding agent arbitrates for ownership of the bus. Once bus ownership is obtained, the responding agent sends a deferred reply transaction including a second token on the bus. The requesting agent monitors the bus and receives the second token as part of the deferred reply transaction.
  • the requesting agent latches the second token and determines whether the second token sent from the responding agent matches the first token. If the requesting agent determines that the second token from the responding agent does not match the first token (which the requesting agent generated), then the data on the bus (or the completion signal) is ignored and the requesting agent continues monitoring the bus. If the requesting agent determines that the second token from the responding agent does match the first token, then the data on the bus (or the completion signals) is the data originally requested by the requesting agent and the requesting agent latches the data on the data bus.
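  • The token comparison performed by the requesting agent can be sketched as follows. This is a minimal model under stated assumptions: tokens are represented as plain integers, and the class and method names (`RequestingAgent`, `on_deferred_reply`) are hypothetical rather than taken from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestingAgent:
    # First token, generated by the requesting agent and sent with the request.
    pending_token: int

    def on_deferred_reply(self, second_token: int, data: bytes) -> Optional[bytes]:
        """Latch the data only if the reply's token matches the pending one.

        Returns None when the tokens differ, modelling the agent ignoring
        the data and continuing to monitor the bus.
        """
        if second_token != self.pending_token:
            return None
        return data


agent = RequestingAgent(pending_token=0x2A)
print(agent.on_deferred_reply(0x07, b"other"))   # reply for another request
print(agent.on_deferred_reply(0x2A, b"wanted"))  # reply for this request
```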
  • the present invention supports both read and write transactions.
  • In a read transaction, data is transferred from the targeted agent, typically a memory controller, to the requesting agent, typically a processor.
  • In a write transaction, data is transferred from the requesting agent, typically a processor, to the targeted agent, typically a memory controller.
  • the present invention also supports an implicit writeback, which is part of a read or write transaction.
  • An implicit writeback occurs when a requesting agent places a request on the bus for a cache line which is stored in a modified state in a cache coupled to the bus.
  • an agent may perform a write transaction over the bus of eight bytes of data; however, the cache line which includes those eight bytes is stored in modified state in another agent's cache.
  • the cache which contains the cache line in modified state (or the agent which is coupled to the cache) issues a hit modified signal on the bus during the snoop phase for the transaction.
  • the requesting agent places the eight bytes of write data onto the bus, which are retrieved by the targeted agent. Then, in the data transfer phase of the transaction, the cache which contains the cache line in modified state writes the cache line, which is 32 bytes in one implementation, to the bus. Any of the data in the cache line which was not written to by the requesting agent is then merged with the write data from the original data transfer.
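  • The merge step can be illustrated with a small sketch. It assumes the 32-byte cache line of the implementation mentioned above and assumes the write data occupies a contiguous byte range at a known offset within the line; the function name `merge_implicit_writeback` and the `offset` parameter are hypothetical.

```python
CACHE_LINE_SIZE = 32  # bytes, per the implementation described in the text


def merge_implicit_writeback(modified_line: bytes, write_data: bytes, offset: int) -> bytes:
    """Merge the requester's write data into the snooper's modified line.

    Bytes written by the requesting agent take precedence; every other
    byte comes from the modified cache line supplied by the snooping agent.
    """
    if len(modified_line) != CACHE_LINE_SIZE:
        raise ValueError("modified_line must be a full cache line")
    if offset < 0 or offset + len(write_data) > CACHE_LINE_SIZE:
        raise ValueError("write data must fit inside the cache line")
    merged = bytearray(modified_line)
    merged[offset:offset + len(write_data)] = write_data
    return bytes(merged)


line = bytes([0xAA]) * CACHE_LINE_SIZE   # modified line from the snooping agent
write = bytes([0x55]) * 8                # eight bytes from the requesting agent
merged = merge_implicit_writeback(line, write, offset=8)
print(merged.hex())
```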
  • an additional control signal on the bus is used to control the flow of data on the bus.
  • this signal is the Target Ready (TRDY#) signal.
  • the agent which is to be the recipient of the data for a transaction asserts the TRDY# signal to indicate that it is ready to receive the data for the transaction from a particular agent.
  • an agent issuing a read request does not assert the TRDY# signal.
  • the agents on the bus presume that the requesting agent, in issuing a read request, is ready to receive the requested data.
  • the memory controller on the bus, such as memory controller 122 of Figure 1, or local memory controller 265 or interface 211 of Figure 2, has responsibility for asserting and deasserting the TRDY# signal.
  • the memory controller has the ability to control the flow of data on the bus.
  • the memory controller on the bus includes a bus control logic, as illustrated in Figure 1.
  • the bus control logic includes one or more data buffers (not shown) into which the memory controller can temporarily store write data received from the bus prior to storing the data in the main memory.
  • the memory controller decodes the address and determines the size of the data transfer associated with the request, and whether the request targets the memory controller. The memory controller can then delay assertion of the TRDY# signal until it has an available data buffer into which the data to be transferred can be placed.
  • the memory controller which is responsible for assertion and deassertion of the TRDY# signal includes a state machine to indicate when the TRDY# signal is to be asserted and deasserted.
  • Figure 4 is a state diagram illustrating the different states for the TRDY# signal in accordance with one embodiment of the present invention. As illustrated, the memory controller can either assert the TRDY# signal, state 401, or deassert the TRDY# signal, state 402. The memory controller initializes at system reset to state 402 with the TRDY# signal being deasserted.
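  • The two-state behaviour described for Figure 4 can be sketched as a small state machine. This is a simplified illustration: the transition conditions are collapsed into two boolean inputs, `may_assert` and `may_deassert`, which stand in for the detailed timing conditions of the description. The class and parameter names are hypothetical.

```python
from enum import Enum


class TrdyState(Enum):
    ASSERT = 401    # TRDY# asserted
    DEASSERT = 402  # TRDY# deasserted


class TrdyStateMachine:
    def __init__(self) -> None:
        # The memory controller initializes at system reset to state 402.
        self.state = TrdyState.DEASSERT

    def reset(self) -> None:
        self.state = TrdyState.DEASSERT

    def step(self, may_assert: bool, may_deassert: bool) -> TrdyState:
        """Advance one clock, given whether each transition is allowed."""
        if self.state is TrdyState.DEASSERT and may_assert:
            self.state = TrdyState.ASSERT
        elif self.state is TrdyState.ASSERT and may_deassert:
            self.state = TrdyState.DEASSERT
        return self.state


fsm = TrdyStateMachine()
print(fsm.step(may_assert=True, may_deassert=False).name)
print(fsm.step(may_assert=False, may_deassert=True).name)
```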
  • Whether the memory controller will transition to the assert TRDY# state 401 depends on whether the reason for asserting the TRDY# signal is data provided by the requesting agent as part of a write transaction or data provided by a snooping agent as part of an implicit writeback. However, it is to be appreciated that, regardless of the source of the data, the memory controller does not assert the TRDY# signal until it is ready to receive the data.
  • the memory controller transitions to the assert TRDY# state 401 in response to a write transaction initiated by an agent on the bus when the following two conditions have been satisfied: (1) it is at least three clocks after the address strobe (ADS#) signal for the request has been asserted; and (2) it is at least one clock after the response for the previous transaction on the pipelined bus has been driven.
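  • The two conditions for a write transaction, combined with the requirement that the memory controller be ready to receive the data, can be expressed as a predicate. This is a sketch under assumptions: clocks are modelled as integers, `prev_response_clk` is `None` when no previous transaction is pending, and the function and parameter names are hypothetical.

```python
from typing import Optional


def may_assert_trdy_for_write(
    clk: int,
    ads_clk: int,
    prev_response_clk: Optional[int],
    buffer_available: bool,
) -> bool:
    """Return True if TRDY# may be asserted in clock `clk` for a write."""
    # The controller never asserts TRDY# before it can accept the data.
    if not buffer_available:
        return False
    # (1) At least three clocks after ADS# for the request was asserted.
    if clk - ads_clk < 3:
        return False
    # (2) At least one clock after the previous transaction's response.
    if prev_response_clk is not None and clk - prev_response_clk < 1:
        return False
    return True


# ADS# in clock 1 and no earlier transaction: clock 4 is the first candidate.
print(may_assert_trdy_for_write(clk=4, ads_clk=1, prev_response_clk=None, buffer_available=True))
```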
  • ihe memory controller transitions to the assert TR Y# state 401 in response to an implicit writeback, which could be the result of either a read or write iransaciion from the requesting agent, such that the following two conditions are satisfied: ( 1 ) if the transaction also has a request initiated data transfer (thai is, the requesting agent initiated a write transaction), then TRDY# is deasserted for at least one clock between the TRD Y# for the write and the TRDY# for the implicit writeback; and (2) for both request and response initiated data transfers, it is at least one clock after ihe response for ihe previous transaction on the pipelined bus has been driven.
  • an implicit writeback which could be the result of either a read or write iransaciion from the requesting agent, such that the following two conditions are satisfied: ( 1 ) if the transaction also has a request initiated data transfer (thai is, the requesting agent initiated a write transaction), then TRDY# is deasserted for at least one clock between the TRD
  • the memory controller transitions back to the deasseri TRDY# state 402 as soon as it can be ensured thai ihe TRDY# dcassertion meeis the following five conditions: ( 1 ) the previous TRDY# deassertion occurred three or more clocks from the current TRDY# deasserlion point: (2) TRD Y# may be deasserted when the inactive data bus busy (DBS Y#) signal, defined below, and ihe active TRDY# signal are observed for al least one clock; (3) TKDY# can be deasserted within one clock if DBSY# was observed inactive on the clock TRDY# is asserted (4) TRDY# does not need to be deasserted until the response is active; and (5) TRDY# for a request initiated transfer must be deasserted before the lespouse to allow ihe TRDY# for an implicit writeback if one is required.
  • DBSY# inactive data bus busy
  • Figures 5 - 7 provide examples of timing diagrams illustrating the TRDY# signal according to various embodiments of the present invention. A summary of the signals used in Figures 5 - 7 is shown below in Table 1.
  • FIG. 5 is a timing diagram illustrating the timing of signals in performing a two-agent write transaction according to one embodiment of the present invention.
  • a square is used to indicate the clock in which a signal is asserted
  • a circle is used to indicate the clock in which a signal is sampled.
  • the requesting agent asserts an address strobe (ADS#) signal 501 and a request control signal (REQa0#) 502 in clock (CLK) 1, which are sampled in CLK 2 by the other agents on the bus.
  • the ADS# signal 501 being asserted indicates that the request is beginning, and the REQa0# signal 502 being asserted indicates that the requesting agent has write data to transfer.
  • the modified hit (HITM#) signal 503 remains inactive, indicating that the request has not hit a modified cache line.
  • the target agent asserts the TRDY# signal 504 in CLK 4, which the requesting agent observes active in CLK 5.
  • the requesting agent observes the DBSY# signal 505 inactive in CLK 5, which allows it to begin the data transfer in the next clock cycle, CLK 6.
  • the requesting agent asserts the data ready (DRDY#) signal 507 in CLK 6 to indicate that valid data is on the bus.
  • the requesting agent drives the data on the data (D
  • the targeted agent then asserts response (RSf2:0
  • the TRDY# signal 504 can be deasserted in CLK 6 because the TRDY# signal 504 is observed active and the DBSY# signal 505 is observed inactive in CLK 5. Alternatively, the TRDY# signal 504 could remain asserted in CLK 6 and not be deasserted until CLK 7.
  • Figure 6 is a timing diagram illustrating the timing of signals in performing a read transaction with an implicit writeback, a three-agent transaction, according to one embodiment of the present invention.
  • a square is used to indicate the clock in which a signal is asserted
  • a circle is used to indicate the clock in which a signal is sampled.
  • the requesting agent asserts the ADS# signal 501 in CLK 1, which is sampled in CLK 2 by the other agents on the bus.
  • the ADS# signal 501 being asserted indicates that the request is beginning, and the REQa0# signal 502 being observed deasserted in CLK 2 indicates that the requesting agent does not have write data to transfer.
  • the snooping agent asserts a HITM# signal 503 in CLK 5, which is observed by the other agents on the bus in CLK 6, indicating that the request has hit a modified cache line in the snooping agent's cache.
  • the targeted agent then asserts the TRDY# signal 504 in CLK 7, which is observed active by the snooping agent in CLK 8.
  • the snooping agent observes the DBSY# signal 505 inactive and the TRDY# signal 504 active in CLK 8, resulting in the snooping agent beginning the data transfer in CLK 9.
  • in CLK 9 the targeted agent deasserts the TRDY# signal 504 and the snooping agent asserts the DBSY# signal 505.
  • the snooping agent drives the modified cache line onto the bus on data (D[63:0
  • the targeted agent then asserts the response signals (RS
  • both the target agent and the requesting agent latch the data from the bus 508.
  • the snooping agent transfers four sets of eight bytes of data each (four data transfers on the D
  • Figure 7 is a timing diagram illustrating the timing of signals in performing a write transaction with an implicit writeback, a three-agent transaction, according to one embodiment of the present invention.
  • a square is used to indicate the clock in which a signal is asserted
  • a circle is used to indicate the clock in which a signal is sampled.
  • the requesting agent asserts the ADS# signal 501 and a request control signal (REQa0#) 502 in CLK 1 and the other agents on the bus sample these signals 501 and 502 in CLK 2.
  • the ADS# signal 501 being asserted indicates that the request is beginning
  • the REQa0# signal 502 being asserted indicates that the requesting agent has write data to transfer.
  • the target agent asserts the TRDY# signal 504 in CLK 4 to indicate that it is ready to accept data.
  • the requesting agent observes the TRDY# signal 504 active and the DBSY# signal 505 inactive, so that the data transfer begins in CLK 6 with the requesting agent asserting the DBSY# signal 505 and the DRDY# signal 507, and driving data on the D[63:0
  • the DBSY# signal 505 remains active for one clock, indicating that the data transfer will complete in two clocks.
  • the target agent then asserts the response (RS
  • the snooping agent asserts the HITM# signal 503 in CLK 5, which is observed by the other agents on the bus in CLK 6, indicating that the request has hit a modified cache line in the snooping agent's cache.
  • the targeted agent asserts the TRDY# signal 504 for the implicit writeback data.
  • the snooping agent observes the TRDY# signal 504 active and the DBSY# signal 505 inactive, so the snooping agent begins the data transfer in CLK 9 with the assertion of the DBSY# signal 505.
  • the snooping agent is not ready to drive the implicit writeback data until CLK 11, so it does not assert the DRDY# signal 507 until CLK 11.
  • the snooping agent then places the implicit writeback data on the bus in CLK 11.
  • the memory controller is described as being responsible for assertion and deassertion of the TRDY# signal to control data flow on the bus. It is to be appreciated, however, that other agents on the bus may also control data flow for certain transactions. For example, if a request targets the mass storage device 125 of Figure 1, or one of the agents on the system I/O bus 131 (via the bridge 124), then the storage device 125 or bridge 124, respectively, would have control of the data flow on the bus.
  • the present invention provides a mechanism for controlling data flow on a bus which supports two- and three-agent transactions.
  • the mechanism advantageously allows the agent which is to receive the data to control the flow of the data on the bus, thereby avoiding the possible situation of data being placed on the bus and the agent not having sufficient storage space for the data.
  • the data flow control is provided to the agent which is to receive the data, regardless of whether the agent is the targeted agent of the transaction.
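
By way of illustration only, the TRDY# assertion conditions for a write transaction and the start of the data transfer described above can be sketched as a small clock calculation. The function names `earliest_trdy_assert` and `data_transfer_start` are hypothetical and do not appear in the specification; clocks are modeled as plain integers, and the one-clock delay between asserting and observing a signal is taken from the timing descriptions of Figures 5 and 6.

```python
def earliest_trdy_assert(ads_clk, prev_response_clk=None):
    """Earliest clock in which TRDY# may be asserted for a write.

    Condition (1): at least three clocks after ADS# was asserted.
    Condition (2): at least one clock after the response for the
    previous transaction was driven (when there is one).
    """
    earliest = ads_clk + 3
    if prev_response_clk is not None:
        earliest = max(earliest, prev_response_clk + 1)
    return earliest


def data_transfer_start(trdy_assert_clk, dbsy_inactive_from_clk):
    """Clock in which the sending agent may begin driving data.

    A signal asserted in clock N is observed in clock N + 1. The sender
    must observe TRDY# active and DBSY# inactive in the same clock and
    begins the transfer in the following clock.
    """
    observe_clk = max(trdy_assert_clk + 1, dbsy_inactive_from_clk)
    return observe_clk + 1
```

With the values described for Figure 5 (ADS# asserted in CLK 1, no earlier response pending, DBSY# observed inactive in CLK 5), this sketch yields CLK 4 for the TRDY# assertion and CLK 6 for the start of the data transfer.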

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A data flow control mechanism (122) for a bus supporting two-and three-agent transactions includes a control logic (108) to place an indication of a request onto a computer system bus. The agent placing the indication on the bus then waits to place data corresponding to the request onto the bus until it has received an indication from another agent coupled to the bus that the other agent is ready to receive the data.

Description

A DATA FLOW CONTROL MECHANISM FOR A BUS SUPPORTING TWO- AND THREE-AGENT TRANSACTIONS
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention pertains to computer systems and computer system buses. More particularly, this invention relates to controlling data flow on a computer system bus which supports two- and three-agent transactions.
Background
Modern computer systems typically have multiple agents coupled together via a system bus. Typically, the agents are integrated circuit chips with multiple pins coupling each agent to the bus. These agents may include, for example, a processor(s), a memory device(s), a mass storage device(s), etc. In order for the computer system to operate properly, these agents should be able to effectively communicate with each other via the bus.
One aspect of this communication is the transfer of data from one agent to another. The transfer of data on the bus is referred to as the data flow on the bus. In many computer systems, an agent which can be the target of a data transfer has a storage space, such as a data buffer, into which the transferred data is placed. However, agents typically have a limited amount of storage space for data. Therefore, situations can arise where the targeted agent for a data transfer does not have sufficient storage space to store the data. Additionally, it is often the case that only the targeted agent knows whether it has sufficient storage space to store the data. Thus, it would be beneficial to provide a mechanism that allows the agent which is targeted by a request to control the flow of data on the bus for that request.
Additionally, one type of bus which can be used in a computer system is referred to as a latched bus. In a latched bus system, data is latched into a storage space from the bus in one clock cycle and control signals based on that data can be placed on the bus in any of the subsequent clock cycles. In contrast, in a non-latched bus system, data is received from the bus in one clock cycle and control signals based on that data can be placed on the bus in that same clock cycle. Due to the nature of the latched bus, some solutions for controlling data flow on the bus which work on a non-latched bus are too inefficient to work on a latched bus. For example, on a non-latched bus, data can be placed on the bus by a source agent which can wait until it receives a ready signal from the targeted agent, at which point the source agent provides, in the same clock cycle as it receives the ready signal, the next data. However, this concept of waiting for, receiving, and processing the ready signal for each data transfer takes too much time on a latched bus because the ready signal would be received in one clock cycle, processed in the next clock cycle, and then the next data would be placed on the bus. Therefore, it would be beneficial to provide a mechanism that allows the targeted agent to more efficiently control the flow of data on a latched bus.
Furthermore, some computer systems include one or more cache memories, each of which is faster and smaller than the main system memory. The cache memory typically allows data which has been recently accessed by an agent, or which is predicted to be accessed soon by an agent, to be available in a faster memory, thereby reducing the time required to obtain the data and increasing overall system performance. Different agents, such as different processors, on a bus will often have their own cache memory. These agents are then able to modify the data stored in their cache memory without making the same modifications to the main memory until a later time. However, situations can arise where data which is requested by a first agent is stored in a cache memory of a second agent, and the requested data in the second agent's cache memory has been modified. Therefore, the data to be returned to the first agent should come from the cache memory of the second agent, not from the main memory, because the data in the cache memory is a more recent version. One solution to this problem is to transfer the requested data from the second agent to the first agent and have the memory controller for the main memory also take the data off the bus. A transaction such as this which uses the first and second agents, as well as the memory controller, is referred to as a three-agent transaction. However, this solution presumes that the memory controller has sufficient storage space to take the data off the bus, which is not always the case. Thus, it would be beneficial to provide a mechanism which allows the memory controller to maintain data flow control on the bus for a three-agent transaction.
As will be described in more detail below, the present invention provides a data flow control mechanism for a bus supporting two- and three-agent transactions to achieve these and other desired results which will be apparent to those skilled in the art from the description that follows.
SUMMARY OF THE INVENTION
A data flow control mechanism for a bus supporting two- and three-agent transactions is described herein. An apparatus in accordance with the data flow control mechanism of the present invention includes a control logic to place an indication of a request onto a computer system bus. The apparatus then waits to place data corresponding to the request onto the bus until it has received an indication from an agent coupled to the bus that the agent is ready to receive the data.
In one embodiment of the present invention, the data flow control mechanism supports both two- and three-agent transactions. In a two-agent transaction in accordance with this embodiment, data is transferred from a source agent to a target agent, with the target agent maintaining control of the data flow. In a three-agent transaction in accordance with this embodiment, data is transferred from a snooping agent to either the source agent or the target agent, as well as possibly from the source agent to the target agent. In the three-agent transaction, the target agent controls the data flow of transfers to the target agent, regardless of whether they originated with the source agent or the snooping agent.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
Figure 1 illustrates a multiprocessor computer system such as may be used with one embodiment of the present invention;
Figure 2 is a block diagram illustrating a bus cluster system such as may be used with one embodiment of the present invention;
Figure 3 shows an example of overlapped phases for two transactions according to one embodiment of the present invention;
Figure 4 is a state diagram illustrating the different states for the TRDY# signal in accordance with one embodiment of the present invention;
Figure 5 is a timing diagram illustrating the timing of signals in performing a write transaction according to one embodiment of the present invention;
Figure 6 is a timing diagram illustrating the timing of signals in performing a read transaction with an implicit writeback according to one embodiment of the present invention; and
Figure 7 is a timing diagram illustrating the timing of signals in performing a write transaction with an implicit writeback according to one embodiment of the present invention.
DETAILED DESCRIPTION
In the following detailed description numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuits have not been described in detail so as not to obscure aspects of the present invention.
In the discussions to follow, certain signals are discussed which include a "#". This notation is used to indicate a signal which is active when in a low state (that is, a low voltage). It is to be appreciated, however, that the present invention includes implementations where these signals are active when in a high state rather than when in a low state. Similarly, the present invention includes implementations where signals discussed herein which do not include a "#" are active when in a low state.
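
As a brief illustration of the "#" notation only, the following Python sketch maps an electrical level to an active or inactive state based on the signal name. The helper `is_active` is hypothetical; it assumes the default polarity described above (a trailing "#" means active-low, any other name active-high) and does not model the alternative implementations with inverted polarity.

```python
def is_active(signal_name: str, level_is_high: bool) -> bool:
    """Return True if the signal is active at the given electrical level.

    Assumption: a name ending in "#" is active when low; any other
    name is active when high.
    """
    active_low = signal_name.endswith("#")
    return (not level_is_high) if active_low else level_is_high
```

Under this assumption, `is_active("TRDY#", False)` is true, because a low voltage on an active-low signal means the signal is asserted.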
The present invention provides a mechanism for controlling data flow on a bus which supports two- and three-agent transactions. The mechanism allows an agent which is to receive data from the bus to control the flow of data on the bus. The agent which is to receive data indicates when it is ready to receive the data, at which time another agent on the bus, which is the source of the data being transferred, places the data on the bus.
Figure 1 illustrates a multiprocessor computer system such as may be used with one embodiment of the present invention. The computer system 100 generally comprises a processor-memory bus or other communication means 101 for communicating information between different agents coupled to the bus 101, such as processors, bus bridges, memory devices, peripheral devices, etc. The processor-memory bus 101 includes arbitration, address, data and control buses (not shown). In one embodiment, the bus 101 is a latched bus having a data bus width of 64 bits.
In one embodiment of the present invention, each of the one or more processors 102, 103, 104, and 105 includes a small, extremely fast internal cache memory (not shown), commonly referred to as a level one (L1) cache memory for temporarily storing data and instructions on-chip. In addition, a bigger level two (L2) cache memory 106 can be coupled to any one of the processors, such as processor 105, for temporarily storing data and instructions for use by the processor(s). Each processor may have its own L2 cache, or some may share an L2 cache.
Processors 102, 103, and 104 may each be a parallel processor (a symmetric co-processor), such as a processor similar to or the same as processor 105. Alternatively, processor 102, 103, or 104 may be an asymmetric co-processor, such as a digital signal processor. In addition, processors 102 through 105 may include processors of different types. In one embodiment, the present invention includes Intel® Architecture microprocessors as processors 102 through 105, such as i386™, i486™, Pentium®, or Pentium® Pro microprocessors. However, the present invention may utilize any type of microprocessor architecture. It is to be appreciated that the particular architecture(s) used is not especially germane to the present invention.
The processor-memory bus 101 provides system access to the memory and input/output (I/O) subsystems. A memory controller 122 is coupled to the processor-memory bus 101 for controlling access to a random access memory (RAM) or other dynamic storage device 121 (commonly referred to as a main memory) for storing information and instructions for processors 102 through 105. A mass data storage device 125, such as a magnetic disk and disk drive, for storing information and instructions, and a display device 123, such as a cathode ray tube (CRT), liquid crystal display (LCD), etc., for displaying information to the computer user may be coupled to the processor-memory bus 101.
Each of the agents coupled to the bus, including processors 102-105 and memory controller 122, includes a bus control logic 108 which acts as an interface between the agent and the bus 101, both of which may run at different clock speeds. The bus control logic 108 includes the latches and necessary circuitry for driving signals onto and receiving signals from the bus 101.
An input/output (I/O) bridge 124 may be coupled to the processor-memory bus 101 and a system I/O bus 131 to provide a communication path or gateway for devices on either processor-memory bus 101 or I/O bus 131 to access or transfer data between devices on the other bus. Essentially, the bridge 124 is an interface between the system I/O bus 131 and the processor-memory bus 101.
The I/O bus 131 communicates information between peripheral devices in the computer system. Devices that may be coupled to the system bus 131 include, for example, a display device 132, such as a cathode ray tube, liquid crystal display, etc., an alphanumeric input device 133 including alphanumeric and other keys, etc., for communicating information and command selections to other devices in the computer system (e.g., the processor 102) and a cursor control device 134 for controlling cursor movement. Moreover, a hard copy device 135, such as a plotter or printer, for providing a visual representation of the computer images and a mass storage device 136, such as a magnetic disk and disk drive, for storing information and instructions may also be coupled to the system bus 131.
In certain implementations of the present invention, additional processors or other components may be included. Additionally, in certain implementations components may be re-arranged. For example, the L2 cache memory 106 may lie between the processor 105 and the processor-memory bus 101. Furthermore, certain implementations of the present invention may not require nor include all of the above components. For example, the processors 102 through 104, the display device 123, or the mass storage device 125 may not be coupled to the processor-memory bus 101. Additionally, the peripheral devices shown coupled to the system I/O bus 131 may be coupled to the processor-memory bus 101; in addition, in some implementations only a single bus may exist with the processors 102 through 105, the memory controller 122, and the peripheral devices 132 through 136 coupled to the single bus.
Figure 2 is a block diagram illustrating a bus cluster system such as may be used with one embodiment of the present invention. Figure 2 shows two clusters 201 and 202 of agents. Each of these clusters is comprised of a number of agents. For example, the cluster 201 is comprised of four agents 203-206 and a cluster manager 207, which may include another cache memory (not shown), coupled to the bus 212. The agents 203-206 can include microprocessors, co-processors, digital signal processors, etc.; for example, the agents 203 through 206 may be the same as the processor 105 shown in Figure 1. The cluster manager 207 and its cache are shared between these four agents 203-206. Each cluster is coupled to a memory-system bus 208. These clusters 201 and 202 are coupled to various other components of the computer system through a system interface 209. The system interface 209 includes a high speed I/O interface 210 for interfacing the computer system to peripheral devices (not shown) and a memory interface 211 which provides access to a global main memory (not shown), such as a DRAM memory array. In one embodiment, the high speed I/O interface 210 is the bridge 124 of Figure 1, and the memory interface 211 is the memory controller 122 of Figure 1.
In one embodiment of the present invention, each cluster also includes a local memory controller and/or a local I/O bridge. For example, the cluster 201 may include a local memory controller 265 coupled to the processor bus 212. The local memory controller 265 manages accesses to a RAM or other local memory 266 contained within the cluster 201. The cluster 201 may also include a local I/O bridge 267 coupled to the processor bus 212. Local I/O bridge 267 manages accesses to I/O devices within the cluster, such as a mass storage device 268, or to an I/O bus, such as system I/O bus 131 of Figure 1.
In another embodiment of the present invention, the local memory of each cluster is part of the global memory and I/O space for the entire system. Therefore, in this embodiment the system interface 209 need not be present because the individual local memory and I/O bridges make up the global memory system. In one embodiment of the present invention, the buses 212 and 21 and the memory-system bus 208 operate analogous to the processor-memory bus 101 of Figure 1.
Certain implementations of the present invention may not require nor include all of the above components. For example, the cluster 201 or 202 may comprise fewer than four agents. Alternatively, the cluster 201 or 202 may not include the memory controller, local memory, I/O bridge, and storage device. Additionally, certain implementations of the present invention may include additional processors or other components.
In one embodiment of the present invention, bus transactions occur on the processor-memory buses described above in Figures 1 and 2 in a pipelined manner. That is, multiple bus transactions may be pending at the same time, wherein each is not fully completed. Therefore, when a requesting agent (also referred to as a source agent) begins a bus transaction by driving an address onto the address bus, the bus transaction may be only one of a number of bus transactions currently pending. Although bus transactions are pipelined, the bus transactions do not have to be fully completed in order; completion replies to requests can be out-of-order.
In the bus used with one embodiment of the present invention, bus activity is hierarchically organized into operations, transactions, and phases. An operation is a bus procedure that appears atomic to software, such as reading a naturally aligned memory location. Executing an operation usually requires one transaction but may require multiple transactions, such as in the case of deferred replies in which requests and replies are different transactions, or in unaligned memory operations which software expects to be atomic. In this embodiment, a transaction is the set of bus activities related to a single request, from request bus arbitration through the completion of the transaction (e.g., a normal or implicit writeback response) during the Response Phase.
In one embodiment, a transaction contains up to six distinct phases. However, certain phases are optional based on the transaction and response type. Alternatively, additional phases could also be added. A phase uses a particular signal group to communicate a particular type of information. In one implementation, these phases are:
Arbitration Phase
Request Phase
Error Phase
Snoop Phase
Response Phase
Data Transfer Phase
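
For illustration only, the six phases listed above can be represented as an ordered enumeration. The names `Phase` and `phases_for` are hypothetical, and the two flags are assumptions chosen to illustrate how optional phases might be omitted: an agent that already owns the bus is assumed to skip arbitration, and a transaction that transfers no data is assumed to skip the data transfer phase.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six phases, numbered in the order in which they are listed."""
    ARBITRATION = 1
    REQUEST = 2
    ERROR = 3
    SNOOP = 4
    RESPONSE = 5
    DATA_TRANSFER = 6


def phases_for(owns_bus: bool, transfers_data: bool) -> list:
    """Phases a transaction passes through under the two assumed flags."""
    phases = list(Phase)  # iterates in definition order
    if owns_bus:
        phases.remove(Phase.ARBITRATION)
    if not transfers_data:
        phases.remove(Phase.DATA_TRANSFER)
    return phases
```

The sketch deliberately ignores phase overlap between transactions and the early termination of a transaction after an error.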
In one mode, the data transfer phase is optional and is used if a transaction is transferring data. The data phase is request-initiated if the data is available at the time of initiating the request (for example, for a write transaction). The data phase is response-initiated if the data is available at the time of generating the transaction response (for example, for a read transaction). A transaction may contain both a request-initiated data transfer and a response-initiated data transfer.
Different phases from different transactions can overlap, thereby pipelining bus usage and improving bus performance. Figure 3 shows an example of overlapped phases for two transactions. Referring to Figure 3, transactions begin with an arbitration phase, in which a requesting agent becomes the bus owner. The arbitration phase needs to occur only if the agent that is driving the next transaction does not already own the bus. In one implementation, bus ownership is granted to the requesting agent in the arbitration phase two or more clocks after ownership is requested.
The second phase is the request phase, in which the bus owner drives a request and address information on the bus. In one implementation, the request phase is one or more clocks after bus ownership is granted (provided there is an arbitration phase), and is two clocks long. In the first clock, an address signal is driven along with the transaction type and sufficient information to begin snooping a memory access. In the second clock, byte enables used to identify which bytes of data should be transferred if the data transfer is less than the data bus width, a transaction identifier used to uniquely identify the transaction in the event a deferred response is given to the request, and the requested data transfer length are driven, along with other transaction information.
The third phase of a transaction is an error phase. The error phase indicates any immediate errors, such as parity errors, triggered by the request. If an error is discovered, an error signal is asserted during the error phase by the agent which detected the error in the transaction. When an error is indicated, the transaction is immediately dropped (that is, the transaction progresses no further in the pipeline) and may be re-driven by the agent which issued the transaction. Whether the agent reissues the transaction depends on the agent itself. In one implementation, the error phase is three clocks after the request phase.
In one embodiment, every transaction that is not canceled because of an error in the error phase has a snoop phase. The snoop phase indicates if the cache line accessed in a transaction is not valid, valid or modified (dirty) in any agent's cache. In one implementation, the snoop phase is four or more clocks from the request phase.
The snoop phase of the bus defines a snoop window during which snoop events can occur on the bus. A snoop event refers to agents transmitting and/or receiving snoop results via the bus. An agent which has snoop results which need to be driven during the snoop phase drives these snoop results as a snoop event during the snoop window. All snooping agents coupled to the bus, including the agent driving the results, receive these snoop results as a snoop event during the snoop window. In one implementation, the snoop window is a single bus clock.
The response phase indicates whether the transaction failed or succeeded, whether the response is immediate or deferred, whether the transaction will be retried, or whether the transaction includes data phases. If a transaction contains a response-initiated data phase, then it enters the data transfer phase along with the response phase.
If the transaction does not have a data phase, then that transaction is complete after the response phase. If the requesting agent has write data to transfer or has requested read data, the transaction has a data phase which may extend beyond the response phase in the former case and will be coincident with or extend beyond the Response Phase in the latter case. The data phase occurs only if a transaction requires a data transfer. The data phase can be response initiated (for example, by the memory controller or another processor) or request initiated.
The bus accommodates deferred transactions by splitting a bus transaction into two independent transactions. The first transaction involves a request by a requesting agent and a response by the responding agent. In one embodiment the request comprises the sending of an address on the address bus and a first token (also referred to as a transaction identifier). The response includes the sending of the requested data (or completion signals) if the responding agent is ready to respond. In this case, the bus transaction ends.
However, if the responding agent is not ready to complete the bus transaction, then the responding agent may send a deferred response over the bus during the response phase. Sending of a deferred response allows other transactions to be issued and not be held up by the completion of this transaction. The requesting agent receives this deferred response. When the responding agent is ready to complete the deferred bus transaction, the responding agent arbitrates for ownership of the bus. Once bus ownership is obtained, the responding agent sends a deferred reply transaction including a second token on the bus. The requesting agent monitors the bus and receives the second token as part of the deferred reply transaction. The requesting agent latches the second token and determines whether the second token sent from the responding agent matches the first token. If the requesting agent determines that the second token from the responding agent does not match the first token (which the requesting agent generated), then the data on the bus (or the completion signal) is ignored and the requesting agent continues monitoring the bus. If the requesting agent determines that the second token from the responding agent does match the first token, then the data on the bus (or the completion signals) is the data originally requested by the requesting agent and the requesting agent latches the data on the data bus.
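
The token comparison performed by the requesting agent can be sketched as follows. This is an illustrative Python model only; the function name `handle_deferred_reply` and the use of a set of integer tokens are assumptions, not part of the described bus protocol.

```python
def handle_deferred_reply(pending_tokens: set, reply_token: int, reply_data: bytes):
    """Model a requesting agent observing a deferred reply on the bus.

    pending_tokens holds the first tokens of this agent's requests that
    received a deferred response. If the second token carried by the
    reply matches one of them, the data is latched (returned) and the
    request is no longer pending. Otherwise the reply is ignored and
    the agent keeps monitoring the bus (None is returned).
    """
    if reply_token in pending_tokens:
        pending_tokens.remove(reply_token)
        return reply_data
    return None
```

A reply whose token belongs to another agent's request leaves `pending_tokens` unchanged, which corresponds to the requesting agent continuing to monitor the bus.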
It is to be appreciated that, due to the pipelined nature of the bus, multiple transactions can be at different stages of the bus at different times. For example, one transaction can be in the snoop phase, while a second transaction is in the error phase, and yet a third transaction can be in the request phase. Thus, error signals and request signals can both be issued concurrently on the bus even though they correspond to different transactions. In one embodiment of the present invention, up to eight transactions can be outstanding on the bus at any particular time and up to sixteen transactions can be waiting for a deferred response at any particular time.
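
The two limits stated above can be illustrated with a simple counter model. The class `TransactionTracker` and its method names are hypothetical. The sketch also assumes that a transaction which receives a deferred response stops counting against the outstanding limit and counts against the deferred limit instead; the description does not state how the two counts interact.

```python
MAX_OUTSTANDING = 8   # transactions outstanding on the bus
MAX_DEFERRED = 16     # transactions waiting for a deferred response


class TransactionTracker:
    """Counts outstanding and deferred transactions against the limits."""

    def __init__(self):
        self.outstanding = 0
        self.deferred = 0

    def can_issue(self) -> bool:
        return self.outstanding < MAX_OUTSTANDING

    def issue(self) -> None:
        if not self.can_issue():
            raise RuntimeError("outstanding transaction limit reached")
        self.outstanding += 1

    def complete(self) -> None:
        self.outstanding -= 1

    def defer(self) -> None:
        # Assumption: a deferred transaction leaves the outstanding pool.
        if self.deferred >= MAX_DEFERRED:
            raise RuntimeError("deferred transaction limit reached")
        self.outstanding -= 1
        self.deferred += 1

    def deferred_reply(self) -> None:
        self.deferred -= 1
```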
The present invention supports both read and write transactions. In a read transaction, data is transferred from the targeted agent, typically a memory controller, to the requesting agent, typically a processor. In a write transaction, data is transferred from the requesting agent, typically a processor, to the targeted agent, typically a memory controller.
Additionally, one embodiment of the present invention also supports an implicit writeback, which is part of a read or write transaction. An implicit writeback occurs when a requesting agent places a request on the bus for a cache line which is stored in a modified state in a cache coupled to the bus. For example, an agent may perform a write transaction over the bus of eight bytes of data; however, the cache line which includes those eight bytes is stored in modified state in another agent's cache. In this situation, the cache which contains the cache line in modified state (or the agent which is coupled to the cache) issues a hit modified signal on the bus during the snoop phase for the transaction. The requesting agent places the eight bytes of write data onto the bus, which are retrieved by the targeted agent. Then, in the data transfer phase of the transaction, the cache which contains the cache line in modified state writes the cache line, which is 32 bytes in one implementation, to the bus. Any of the data in the cache line which was not written to by the requesting agent is then merged with the write data from the original data transfer.
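The merge step in the eight-byte write example can be illustrated with a minimal sketch. It assumes, as one reading of the paragraph above, that the requesting agent's write data takes precedence over the corresponding bytes of the written-back cache line; the function name and the `offset` parameter are hypothetical.

```python
CACHE_LINE_BYTES = 32  # cache line size in the implementation described above


def merge_implicit_writeback(writeback_line: bytes, write_data: bytes, offset: int) -> bytes:
    """Overlay the requester's write data on the snooper's modified cache line.

    Assumption: bytes written by the requesting agent replace the bytes at the
    same positions in the written-back line; all other bytes come from the line.
    """
    if len(writeback_line) != CACHE_LINE_BYTES:
        raise ValueError("writeback data must be one full cache line")
    if offset < 0 or offset + len(write_data) > CACHE_LINE_BYTES:
        raise ValueError("write data must fall inside the cache line")
    merged = bytearray(writeback_line)
    merged[offset:offset + len(write_data)] = write_data
    return bytes(merged)
```

For a line holding bytes 0 through 31 and eight bytes of `0xFF` written at offset 8, the result keeps bytes 0 to 7 and 16 to 31 from the line and takes bytes 8 to 15 from the write.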
In one embodiment of the present invention, an additional control signal on the bus is used to control the flow of data on the bus. In one implementation, this signal is the Target Ready (TRDY#) signal. The agent which is to be the recipient of the data for a transaction asserts the TRDY# signal to indicate that it is ready to receive the data for the transaction from a particular agent. In one embodiment, an agent issuing a read request does not assert the TRDY# signal. In this embodiment, the agents on the bus presume that the requesting agent, in issuing a read request, is ready to receive the requested data.
In one embodiment of the present invention, the memory controller on the bus, such as memory controller 122 of Figure 1, or local memory controller 264 or interface 211 of Figure 2, has responsibility for asserting and deasserting the TRDY# signal. Thus, in this embodiment of the present invention, the memory controller has the ability to control the flow of data on the bus.
The memory controller on the bus includes a bus control logic, as illustrated in Figure 1. The bus control logic includes one or more data buffers (not shown) into which the memory controller can temporarily store write data received from the bus prior to storing the data in the main memory. When a request is issued on the bus, the memory controller decodes the address and determines the size of the data transfer associated with the request, and whether the request targets the memory controller. The memory controller can then delay assertion of the TRDY# signal until it has an available data buffer into which the data to be transferred can be placed.
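The buffer-gated behavior described above might be modeled as below. This is a hedged sketch: the class `WriteBufferPool` and its methods are hypothetical names, and the model tracks only a count of free data buffers rather than their sizes or contents.

```python
class WriteBufferPool:
    """Hypothetical model of the memory controller's write data buffers."""

    def __init__(self, num_buffers: int) -> None:
        if num_buffers < 0:
            raise ValueError("buffer count cannot be negative")
        self.free = num_buffers

    def can_assert_trdy(self, targets_controller: bool) -> bool:
        """TRDY# is held off until the request targets us and a buffer is free."""
        return targets_controller and self.free > 0

    def reserve(self) -> None:
        """Claim a buffer for incoming write data."""
        if self.free == 0:
            raise RuntimeError("no free data buffer")
        self.free -= 1

    def release(self) -> None:
        """Return a buffer once its data has been stored in main memory."""
        self.free += 1
```

With a single buffer, `can_assert_trdy(True)` is true until `reserve()` is called and becomes true again only after `release()`.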
According to one embodiment of the present invention, the memory controller which is responsible for assertion and deassertion of the TRDY# signal includes a state machine to indicate when the TRDY# signal is to be asserted and deasserted. Figure 4 is a state diagram illustrating the different states for the TRDY# signal in accordance with one embodiment of the present invention. As illustrated, the memory controller can either assert the TRDY# signal, state 401, or deassert the TRDY# signal, state 402. The memory controller initializes at system reset to state 402 with the TRDY# signal being deasserted. Whether the memory controller will transition to the assert TRDY# state 401 depends on whether the reason for asserting the TRDY# signal is data provided by the requesting agent as part of a write transaction or data provided by a snooping agent as part of an implicit writeback. However, it is to be appreciated that, regardless of the source of the data, the memory controller does not assert the TRDY# signal until it is ready to receive the data.
According to one embodiment of the present invention, the memory controller transitions to the assert TRDY# state 401 in response to a write transaction initiated by an agent on the bus when the following two conditions have been satisfied: (1) it is at least three clocks after the address strobe (ADS#) signal for the request has been asserted; and (2) it is at least one clock after the response for the previous transaction on the pipelined bus has been driven.
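The two conditions for a write transaction can be combined into a single earliest-clock calculation. The sketch below is an interpretation, not text from the specification: it reads "at least N clocks after" as an offset of N from the clock in which the earlier event occurred, and the function and parameter names are hypothetical. The `ready_clock` parameter models the separate requirement that the controller is ready to receive the data.

```python
from typing import Optional


def earliest_write_trdy_clock(
    ads_clock: int,
    prev_response_clock: Optional[int] = None,
    ready_clock: int = 0,
) -> int:
    """Earliest clock in which TRDY# may be asserted for a write transaction.

    Assumed reading of the conditions:
      (1) at least three clocks after ADS# for this request was asserted;
      (2) at least one clock after the previous transaction's response was driven;
      and never before the controller is ready (ready_clock).
    """
    earliest = ads_clock + 3
    if prev_response_clock is not None:
        earliest = max(earliest, prev_response_clock + 1)
    return max(earliest, ready_clock)
```

With ADS# asserted in CLK 1 and no earlier transaction pending, this gives CLK 4, which agrees with the TRDY# assertion shown for the two-agent write transaction of Figure 5.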
According to one embodiment of the present invention, the memory controller transitions to the assert TRDY# state 401 in response to an implicit writeback, which could be the result of either a read or write transaction from the requesting agent, such that the following two conditions are satisfied: (1) if the transaction also has a request initiated data transfer (that is, the requesting agent initiated a write transaction), then TRDY# is deasserted for at least one clock between the TRDY# for the write and the TRDY# for the implicit writeback; and (2) for both request and response initiated data transfers, it is at least one clock after the response for the previous transaction on the pipelined bus has been driven.
Regardless of how the memory controller transitioned to the assert TRDY# state 401, the memory controller transitions back to the deassert TRDY# state 402 as soon as it can be ensured that the TRDY# deassertion meets the following five conditions: (1) the previous TRDY# deassertion occurred three or more clocks from the current TRDY# deassertion point; (2) TRDY# may be deasserted when the inactive data bus busy (DBSY#) signal, defined below, and the active TRDY# signal are observed for at least one clock; (3) TRDY# can be deasserted within one clock if DBSY# was observed inactive on the clock TRDY# is asserted; (4) TRDY# does not need to be deasserted until the response is active; and (5) TRDY# for a request initiated transfer must be deasserted before the response to allow the TRDY# for an implicit writeback if one is required.
Figures 5 - 7 provide examples of timing diagrams illustrating the TRDY# signal according to various embodiments of the present invention. A summary of the signals used in Figures 5 - 7 is shown below in Table 1.
Table 1: summary of the signals used in Figures 5 - 7 (image imgf000017_0001)
Figure 5 is a timing diagram illustrating the timing of signals in performing a two-agent write transaction according to one embodiment of the present invention. In the illustrated embodiment, a square is used to indicate the clock in which a signal is asserted, and a circle is used to indicate the clock in which a signal is sampled. As illustrated in Figure 5, the requesting agent asserts an address strobe (ADS#) signal 501 and a request control signal (REQa0#) 502 in clock (CLK) 1, which are sampled in CLK 2 by the other agents on the bus. The ADS# signal 501 being asserted indicates that the request is beginning, and the REQa0# signal 502 being asserted indicates that the requesting agent has write data to transfer. The modified hit (HITM#) signal 503 remains inactive, indicating that the request has not hit a modified cache line.
The target agent asserts the TRDY# signal 504 in CLK 4, which the requesting agent observes active in CLK 5. The requesting agent observes the DBSY# signal 505 inactive in CLK 5, which allows it to begin the data transfer in the next clock cycle, CLK 6. The requesting agent asserts the data ready (DRDY#) signal 507 in CLK 6 to indicate that valid data is on the bus. The requesting agent drives the data on the data (D[63:0]#) lines 506 in CLK 6. The targeted agent then asserts response (RS[2:0]#) signals 508 in CLK 7, providing the completion information to the requesting agent (e.g., normal data response, retry response, deferred response, etc.).
As illustrated in Figure 5, the TRDY# signal 504 can be deasserted in CLK 6 because the TRDY# signal 504 is observed active and the DBSY# signal 505 is observed inactive in CLK 5. Alternatively, the TRDY# signal 504 could remain asserted in CLK 6 and not be deasserted until CLK 7.
Figure 6 is a timing diagram illustrating the timing of signals in performing a read transaction with an implicit writeback, a three-agent transaction, according to one embodiment of the present invention. In the illustrated embodiment, a square is used to indicate the clock in which a signal is asserted, and a circle is used to indicate the clock in which a signal is sampled. As illustrated in Figure 6, the requesting agent asserts the ADS# signal 501 in CLK 1, which is sampled in CLK 2 by the other agents on the bus. The ADS# signal 501 being asserted indicates that the request is beginning, and the REQa0# signal 502 being observed deasserted in CLK 2 indicates that the requesting agent does not have write data to transfer.
The snooping agent asserts a HITM# signal 503 in CLK 5, which is observed by the other agents on the bus in CLK 6, indicating that the request has hit a modified cache line in the snooping agent's cache. The targeted agent then asserts the TRDY# signal 504 in CLK 7, which is observed active by the snooping agent in CLK 8. The snooping agent observes the DBSY# signal 505 inactive and the TRDY# signal 504 active in CLK 8, resulting in the snooping agent beginning the data transfer in CLK 9. In CLK 9, the targeted agent deasserts the TRDY# signal 504 and the snooping agent asserts the DBSY# signal 505. Also in CLK 9, the snooping agent drives the modified cache line onto the bus on data (D[63:0]#) lines 506 and asserts the DRDY# signal 507 to indicate that valid data is on the bus. The targeted agent then asserts the response signals (RS[2:0]#) 508 in CLK 9, providing the completion information to the requesting agent (e.g., an implicit writeback response). In the illustrated embodiment, both the target agent and the requesting agent latch the data from the bus 508.
It should be noted that in the illustrated embodiment, the snooping agent transfers four sets of eight bytes of data each (four data transfers on the D[63:0]# lines) as the implicit writeback data. This is due to the cache line size in the illustrated embodiment being 32 bytes, and the implicit writeback being a transfer of the entire cache line from the snooping agent to the target agent.
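The arithmetic behind the four transfers is simply 32 bytes divided by the 8-byte (64-bit) width of the D[63:0]# lines. A small sketch, with the hypothetical helper `split_into_transfers`, makes the chunking explicit:

```python
from typing import List

BUS_WIDTH_BYTES = 8    # D[63:0]# carries 64 bits per data transfer
CACHE_LINE_BYTES = 32  # cache line size in the illustrated embodiment


def split_into_transfers(line: bytes, width: int = BUS_WIDTH_BYTES) -> List[bytes]:
    """Split a cache line into bus-width chunks, one per data transfer."""
    if width <= 0 or len(line) % width != 0:
        raise ValueError("line length must be a positive multiple of the bus width")
    return [line[i:i + width] for i in range(0, len(line), width)]
```

Applied to a 32-byte line, the helper yields four 8-byte chunks whose concatenation is the original line.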
Figure 7 is a timing diagram illustrating the timing of signals in performing a write transaction with an implicit writeback, a three-agent transaction, according to one embodiment of the present invention. In the illustrated embodiment, a square is used to indicate the clock in which a signal is asserted, and a circle is used to indicate the clock in which a signal is sampled. As illustrated in Figure 7, the requesting agent asserts the ADS# signal 501 and a request control signal (REQa0#) 502 in CLK 1 and the other agents on the bus sample these signals 501 and 502 in CLK 2. The ADS# signal 501 being asserted indicates that the request is beginning, and the REQa0# signal 502 being asserted indicates that the requesting agent has write data to transfer.
The target agent asserts the TRDY# signal 504 in CLK 4 to indicate that it is ready to accept data. In CLK 5, the requesting agent observes the TRDY# signal 504 active and the DBSY# signal 505 inactive, so that the data transfer begins in CLK 6 with the requesting agent asserting the DBSY# signal 505 and the DRDY# signal 507, and driving data on the D[63:0]# lines 506. The DBSY# signal 505 remains active for one clock, indicating that the data transfer will complete in two clocks. The target agent then asserts the response (RS[2:0]#) signals 508 in CLK 9, which is observed by the requesting agent in CLK 10.
The snooping agent asserts the HITM# signal 503 in CLK 5, which is observed by the other agents on the bus in CLK 6, indicating that the request has hit a modified cache line in the snooping agent's cache. In CLK 7, the targeted agent asserts the TRDY# signal 504 for the implicit writeback data. In CLK 8, the snooping agent observes the TRDY# signal 504 active and the DBSY# signal 505 inactive, so the snooping agent begins the data transfer in CLK 9 with the assertion of the DBSY# signal 505. In the illustrated embodiment, the snooping agent is not ready to drive the implicit writeback data until CLK 11, so it does not assert the DRDY# signal 507 until CLK 11. The snooping agent then places the implicit writeback data on the bus in CLK 11.
In Figures 5 - 7 above, specific timing of the TRDY# signal 504 is discussed. As discussed above, the TRDY# signal 504 is asserted to indicate that the targeted agent is ready to receive data. Thus, the timing in the illustrated examples of Figures 5 - 7 would be changed if the targeted agent were not ready at the illustrated times. For example, in Figure 6, the TRDY# signal 504 could be asserted in CLK 9 rather than CLK 7 if the targeted agent were not ready to begin receiving data until CLK 9. It is to be appreciated that delaying assertion of the TRDY# signal 504 for two clocks would result in a corresponding two-clock delay of the assertion of the DBSY# signal 505, the DRDY# signal 507, the RS[2:0]# signals 508, and the data being driven on the D[63:0]# lines 506.
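The Figure 6 example can be worked through numerically. The sketch below records the assertion clocks given in the Figure 6 description and shifts TRDY# and every signal that depends on it by the same delay. The dictionary layout and the function `delay_trdy` are hypothetical illustration aids, not part of the described embodiment.

```python
from typing import Dict

# Assertion clocks taken from the Figure 6 description (read with implicit writeback).
FIGURE_6_CLOCKS: Dict[str, int] = {
    "ADS#": 1,
    "HITM#": 5,
    "TRDY#": 7,
    "DBSY#": 9,
    "DRDY#": 9,
    "D[63:0]#": 9,
    "RS[2:0]#": 9,
}

# Signals whose assertion follows TRDY# and therefore moves with it.
DEPENDS_ON_TRDY = {"TRDY#", "DBSY#", "DRDY#", "D[63:0]#", "RS[2:0]#"}


def delay_trdy(clocks: Dict[str, int], delay: int) -> Dict[str, int]:
    """Return a new timeline with TRDY# and its dependent signals delayed."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return {
        name: clk + delay if name in DEPENDS_ON_TRDY else clk
        for name, clk in clocks.items()
    }
```

Calling `delay_trdy(FIGURE_6_CLOCKS, 2)` moves TRDY# from CLK 7 to CLK 9 and the data transfer and response signals from CLK 9 to CLK 11, while ADS# and HITM# stay in CLK 1 and CLK 5.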
In some of the discussions above, the memory controller is described as being responsible for assertion and deassertion of the TRDY# signal to control data flow on the bus. It is to be appreciated, however, that other agents on the bus may also control data flow for certain transactions. For example, if a request targets the mass storage device 125 of Figure 1, or one of the agents on the system I/O bus 131 (via the bridge 124), then the storage device 125 or bridge 124, respectively, would have control of the data flow on the bus.
Thus, the present invention provides a mechanism for controlling data flow on a bus which supports two- and three-agent transactions. The mechanism advantageously allows the agent which is to receive the data to control the flow of the data on the bus, thereby avoiding the possible situation of data being placed on the bus and the agent not having sufficient storage space for the data. Furthermore, the data flow control is provided to the agent which is to receive the data, regardless of whether the agent is the targeted agent of the transaction.
Whereas many alterations and modifications of the present invention will be comprehended by a person skilled in the art after having read the foregoing description, it is to be understood that the particular embodiments shown and described by way of illustration are in no way intended to be considered limiting. Therefore, references to details of particular embodiments are not intended to limit the scope of the claims.
Thus, a data flow control mechanism for a bus supporting two- and three-agent transactions has been described.

Claims

CLAIMS What is claimed is:
1. A method for controlling data flow for transactions issued on a pipelined computer system bus, the method comprising the steps of:
(a) a first agent issuing a request on the bus;
(b) a second agent providing a first indication to the first agent that the second agent is ready to accept data corresponding to the request; and
(c) the first agent placing the data corresponding to the request on the bus in response to receiving the first indication.
2. The method of claim 1, further comprising the steps of:
(d) the second agent providing a second indication to a third agent that the second agent is ready for writeback data corresponding to the request from the third agent; and
(e) the third agent placing the writeback data corresponding to the request on the bus in response to receiving the second indication.
3. The method of claim 1, wherein the step of providing a first indication comprises the step of asserting a control line of the computer system bus.
4. The method of claim 3, wherein the step of providing a second indication comprises the step of asserting the control line of the computer system bus.
5. A computer system comprising: a pipelined bus; a first agent coupled to the bus; a second agent coupled to the bus; wherein the first agent includes a first bus control logic to place a request on the bus, and also to delay placing data on the bus corresponding to the request until a first indication that the second agent is ready to accept data has been received from the second agent; and wherein the second agent includes a second bus control logic to provide the first indication to the first agent that the second agent is ready to receive data corresponding to the request from the first agent.
6. The computer system of claim 5, wherein the first bus control logic is also to place the data on the bus in response to the first indication.
7. The computer system of claim 5, wherein the request is a write request.
8. The computer system of claim 5, wherein the first agent is a microprocessor.
9. The computer system of claim 8, wherein the second agent is a memory controller.
10. The computer system of claim 5, further comprising a third agent coupled to the bus, wherein the third agent includes a third bus control logic to receive a second indication, from the second agent, that the second agent is ready to receive data corresponding to the request from the third agent.
11. The computer system of claim 10, wherein the first indication and the second indication comprise the same control line of the bus.
12. The computer system of claim 10, wherein the bus is a latched bus.
13. An apparatus for issuing a request on a pipelined computer system bus, the apparatus comprising: a control logic configured to place an indication of the request on the computer system bus; and wherein the control logic is also configured to place data corresponding to the request on the bus only after receipt of an indication from an agent coupled to the bus that the agent is ready to accept the data.
14. The apparatus of claim 13, wherein the indication from the agent comprises a control signal of the computer system bus.
15. An apparatus for providing flow control for transactions issued on a pipelined computer system bus, the apparatus comprising: means for issuing, by a first agent, a request on the bus; means for providing, by a second agent, a first indication to the first agent that the second agent is ready to accept data corresponding to the request; and means for placing, by the first agent, data corresponding to the request on the bus in response to receiving the first indication.
16. The apparatus of claim 14, further comprising: means for providing, by the second agent, a second indication to a third agent that the second agent is ready for writeback data from the third agent; and means for placing, by the third agent, the writeback data on the bus in response to receiving the second indication.
17. The apparatus of claim 17, wherein the means for providing a first indication comprises a control line of the computer system bus.
18. The apparatus of claim 1 , wherein the means for providing a second indication comprises the control line of the computer system bus.
PCT/US1997/011419 1996-09-06 1997-06-30 A data flow control mechanism for a bus supporting two-and three-agent transactions WO1998010350A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU35870/97A AU3587097A (en) 1996-09-06 1997-06-30 A data flow control mechanism for a bus supporting two-and three-agent transactions

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08/709,215 1996-09-06
US08/709,215 US6405271B1 (en) 1994-09-08 1996-09-06 Data flow control mechanism for a bus supporting two-and three-agent transactions

Publications (1)

Publication Number Publication Date
WO1998010350A1 true WO1998010350A1 (en) 1998-03-12

Family

ID=24848933

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/011419 WO1998010350A1 (en) 1996-09-06 1997-06-30 A data flow control mechanism for a bus supporting two-and three-agent transactions

Country Status (3)

Country Link
AU (1) AU3587097A (en)
TW (1) TW347495B (en)
WO (1) WO1998010350A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG123610A1 (en) * 1999-12-29 2006-07-26 Intel Corp Quad pumped bus architecture and protocol
GB2450148A (en) * 2007-06-14 2008-12-17 Advanced Risc Mach Ltd Controlling write transactions between initiators and recipients via interconnect logic

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5528764A (en) * 1992-12-24 1996-06-18 Ncr Corporation Bus system with cache snooping signals having a turnaround time between agents driving the bus for keeping the bus from floating for an extended period
US5551005A (en) * 1994-02-25 1996-08-27 Intel Corporation Apparatus and method of handling race conditions in mesi-based multiprocessor system with private caches

Also Published As

Publication number Publication date
AU3587097A (en) 1998-03-26
TW347495B (en) 1998-12-11

Similar Documents

Publication Publication Date Title
US6405271B1 (en) Data flow control mechanism for a bus supporting two-and three-agent transactions
US6128711A (en) Performance optimization and system bus duty cycle reduction by I/O bridge partial cache line writes
US5696910A (en) Method and apparatus for tracking transactions in a pipelined bus
US6021456A (en) Method for communicating interrupt data structure in a multi-processor computer system
US6012120A (en) Method and apparatus for providing DMA transfers between devices coupled to different host bus bridges
US5463753A (en) Method and apparatus for reducing non-snoop window of a cache controller by delaying host bus grant signal to the cache controller
US6598103B2 (en) Transmission of signals synchronous to a common clock and transmission of data synchronous to strobes in a multiple agent processing system
US5327570A (en) Multiprocessor system having local write cache within each data processor node
US5353415A (en) Method and apparatus for concurrency of bus operations
US5524235A (en) System for arbitrating access to memory with dynamic priority assignment
US5535340A (en) Method and apparatus for maintaining transaction ordering and supporting deferred replies in a bus bridge
US5568620A (en) Method and apparatus for performing bus transactions in a computer system
US5919254A (en) Method and apparatus for switching between source-synchronous and common clock data transfer modes in a multiple processing system
US6012118A (en) Method and apparatus for performing bus operations in a computer system using deferred replies returned without using the address bus
US5911053A (en) Method and apparatus for changing data transfer widths in a computer system
US20100005247A1 (en) Method and Apparatus for Global Ordering to Insure Latency Independent Coherence
JP2003518693A (en) Quad pump bus architecture and protocol
WO1994008297A9 (en) Method and apparatus for concurrency of bus operations
JPH0642225B2 (en) Computer system having DMA function
US6108735A (en) Method and apparatus for responding to unclaimed bus transactions
USRE40921E1 (en) Mechanism for efficiently processing deferred order-dependent memory access transactions in a pipelined system
US5923857A (en) Method and apparatus for ordering writeback data transfers on a bus
US6260091B1 (en) Method and apparatus for performing out-of-order bus operations in which an agent only arbitrates for use of a data bus to send data with a deferred reply
US6253302B1 (en) Method and apparatus for supporting multiple overlapping address spaces on a shared bus
WO1998010350A1 (en) A data flow control mechanism for a bus supporting two-and three-agent transactions

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CU CZ CZ DE DE DK DK EE EE ES FI FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT UA UG UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 1998512630

Format of ref document f/p: F

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA

122 Ep: pct application non-entry in european phase