US20120066475A1 - Translation lookaside buffer - Google Patents
- Publication number
- US20120066475A1 (application US13/298,800)
- Authority
- US
- United States
- Prior art keywords
- memory
- line
- translation
- virtual address
- index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1027—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/65—Details of virtual memory and virtual address translation
- G06F2212/652—Page size control
Definitions
- FIG. 3 illustrates the translation from a VA 201 to a PA 202 when performed in conjunction with a direct mapped cache memory 301 .
- the VA is used to access both the cache memory 301 and the TLB 205 .
- the page-offset portion of the VA is used to access the cache memory 301 —the page offset being the portion of the address that remains unmodified by the translation process.
- the page offset is used to index a tag array 302 and a data array 303 found in cache memory 301 where the page offset is used to index a cache line 302 a within the cache memory 301 .
- Access to the TLB 205 is performed using the VPN 203 portion of the VA 201 .
- the TLB 205 typically comprises a TLB tag array 304 and a TLB data array 305 .
- Both the TLB tag array 304 and the TLB data array 305 contain bits from the VPN 203 such that when a VPN is provided to both of these arrays, the bits making up the VPN are compared to those stored within the arrays 304 , 305 to locate an entry within the TLB 205 .
- the PPN 206 is retrieved and is provided to the cache memory 301 and used for comparison to the tag retrieved 302 a from the tag array 302 .
- a match being indicative of a cache “hit” 306 .
- a TLB hit signal 307 is generated. In this manner, the cache is only accessed using bits of the PPN 206 .
- the above example illustrates the use of a direct mapped cache memory; however, the same translation of a VA to a PA is applicable to set-associative caches as well. When set-associative caches are used, those of skill in the art appreciate that the size of a cache way is less than or equal to the size of a virtual page.
- FIG. 4 a generally illustrates a TLB 400 formed using synthesisable logic components 499 and a random access memory (RAM) 410 .
- a VPN for translation is provided via a VPN_IN input port 450 , where bits VPN_IN[31:12] are provided from the VA[31:0] to this input port 450 .
- a page mask signal is provided via a CP0_PAGE_MASK input port 451 .
- a CP0_TRANSLATION input signal is provided via a CP0_TRANSLATION input port 452 .
- a TLB_TRANSLATION output signal is provided via TLB_TRANSLATION output port 453 , in dependence upon a translation from a VA to a PA using the TLB 400 .
- FIG. 4 b illustrates a TLB 400 in more detail formed from synthesisable logic components, and in FIG. 4 c the steps of operation for the TLB 400 are shown in summary.
- a VPN for translation is provided 480 via a VPN_IN input port 450 , where bits VPN_IN[31:12] are provided from the VA[31:0] to this input port 450 as the VPN.
- a page mask is provided via a CP0_PAGE_MASK input port 451 . This page mask is provided to a page mask encoder 408 , for encoding the page mask according to Table 1.
- the page mask encoder 408 is used for accepting the CP0_PAGE_MASK input signal on an input port thereof and for correlating this input signal to a 3-bit vector, MASK[2:0].
- the 3-bit vector MASK[2:0] is further provided to a hashing circuit 406 .
- the hashing circuit 406 receives VPN_IN[31:12] via a first input port 406 a and MASK[2:0] via a second input port 406 b .
- a hashed vector H_VPN[5:0] is provided from an output port 406 c thereof via a hashing operation 481 of the hashing circuit 406 .
- the hashed vector H_VPN[5:0] and the MASK[2:0] are further provided to each one of 48 registers 409 , where each register consists of multiple flip-flops collectively referred to as 491 .
- Each of the registers 409 has two output ports. A first output signal from a first output port thereof is provided to a comparator circuit 403 . A second output signal from a second output port is provided to the second input port 406 b on one of 48 hashing circuits 406 . The first input port on this hashing circuit receives VPN_IN[31:12].
- the hashing circuit 406 output port is coupled to one of 48 comparator circuits 403 for performing a comparison between the register output and the hashing circuit output signal.
- Each of the comparators in dependence upon a comparison of two input signals, provides a ‘1’ if the signals are the same and a ‘0’ if they are different.
- Output signals hit_i from each of the 48 comparators are provided to one of 48 single bit 2-input multiplexers 411 .
- Output ports from each of the multiplexers are coupled to a flip-flop 404 .
- Each of the flip-flops 404 generates an output signal provided at an output port labeled try_i, where collectively these output signals try[0 . . . 47], for 0 ≤ i ≤ 47, are provided to a priority encoder circuit 401 .
- the priority encoder circuit is further coupled to a binary decoder circuit 402 , where the priority encoder circuit asserts a TLB_ENTRY[5:0] signal to the binary decoder circuit 402 and to the RAM 410 .
- Three output ports are provided within the TLB 400 , an ENTRY_FOUND output port 454 , an ENTRY_NOT_FOUND output port 455 and a TLB_TRANSLATION output port 453 , for providing ENTRY_FOUND, ENTRY_NOT_FOUND, and TLB_TRANSLATION output signals, respectively.
- An address for translation from a VA to a PA is stored in a random access memory (RAM) 410 , with the RAM 410 preferably having 48 entries, in the form of lines.
- RAM random access memory
- input signals VPN_IN, CP0_PAGE_MASK, and CP0_TRANSLATION are provided to the TLB circuit 400 via input ports 450 , 451 , and 452 , respectively.
- Translations performed by the TLB are stored in RAM 410 for a given index, i. The given index selects one of the lines 410 a within the RAM that holds the translation to the PPN.
- the hashing circuit 406 computes the hash function H(VPN_IN, MASK) and stores the result in a corresponding 6-bit register h_i 490 .
- the page mask is stored in the 3-bit register m_i 491 .
- a VPN is provided via the input port 450 and the hash function H(VPN_IN, m_i) is computed for all i and compared to h_i. This yields a 48-bit vector 492 , hit_0 . . . hit_47, which is subsequently loaded into a 48-bit register 493 , try_0 . . . try_47.
- H hash functions
- the priority encoder 401 selects the entry with the lowest index to address entries within the RAM.
- the decoder 402 converts this index to a 48-bit one-hot vector 494 , clr_0 . . . clr_47.
- the try_i vector is reloaded, except for a bit corresponding to an index just used to address the RAM, which is cleared. This process is repeated, one entry at a time 483 . The process stops as soon as the requested entry is found 484 , as indicated by the ENTRY_FOUND signal on the ENTRY_FOUND output port 454 , or when all bits in try_i are 0.
- the ENTRY_NOT_FOUND signal is provided via the ENTRY_NOT_FOUND output port 455 .
- the translation is successful and information for the translation is provided 485 from the RAM 410 using a TLB_TRANSLATION signal on the TLB_TRANSLATION output port 453 .
- the translation is not successful and the TLB reports a TLB refill exception.
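The lookup sequence described above (hash compare against every register, load of the try vector, priority encoding, one RAM read per candidate) can be sketched in software. This is a minimal behavioural model, not the circuit of FIG. 4 b: the class and field names are hypothetical, the MASK encoding (values 0 to 6 for page sizes 4 kB to 16 MB in increasing order) is an assumption because Table 1 is not reproduced here, and the hash simply takes the 6 bits above the odd/even select bit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NUM_ENTRIES = 48  # the RAM preferably has 48 lines


def pair_shift(mask: int) -> int:
    """Bit position just above the odd/even select bit (assumed MASK encoding)."""
    return 13 + 2 * mask


def hash_vpn(va: int, mask: int) -> int:
    """6-bit hash: the 6 least significant VPN bits of an odd/even pair."""
    return (va >> pair_shift(mask)) & 0x3F


@dataclass
class Entry:
    h: int            # models register h_i (stored hash)
    m: int            # models register m_i (stored page mask)
    tag: int          # RAM line: virtual address identifier
    translation: int  # RAM line: translation for that identifier


class HashedTlb:
    def __init__(self) -> None:
        self.entries: List[Optional[Entry]] = [None] * NUM_ENTRIES

    def write(self, index: int, va: int, mask: int, translation: int) -> None:
        self.entries[index] = Entry(hash_vpn(va, mask), mask,
                                    va >> pair_shift(mask), translation)

    def lookup(self, va: int) -> Tuple[Optional[int], int]:
        """Return (translation or None, number of RAM lines read)."""
        # hit_i: compare H(VPN_IN, m_i) with h_i for all i, then load try_i.
        try_bits = [e is not None and hash_vpn(va, e.m) == e.h
                    for e in self.entries]
        ram_reads = 0
        while any(try_bits):
            index = try_bits.index(True)   # priority encoder: lowest index
            entry = self.entries[index]
            ram_reads += 1
            if entry.tag == va >> pair_shift(entry.m):
                return entry.translation, ram_reads   # ENTRY_FOUND
            try_bits[index] = False        # clr_i clears the bit just tried
        return None, ram_reads             # ENTRY_NOT_FOUND: refill exception


tlb = HashedTlb()
tlb.write(0, 0x0000_2000, 0, 0xAAA)
tlb.write(1, 0x0008_2000, 0, 0xBBB)   # same hash as entry 0, different tag
print(tlb.lookup(0x0008_2000))
```

In this model two entries share a hash value, so the second lookup needs two RAM reads; an address whose hash matches no register is reported as a miss without reading the RAM at all.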
- FIG. 5 illustrates a hashing circuit 506 in more detail.
- Using the MASK[2:0] and VPN[31:12] input signals to the hashing circuit 506 , a 7 to 1 multiplexer 501 provides the H_VPN[5:0] output signal from the hashing circuit 506 in dependence upon the MASK[2:0] signal provided to the second input port 506 b .
- This hashing circuit selects the 6 least significant bits from the VPN. The selection is controlled by the page mask because the definition of “least significant” changes with the page size.
- the 6 least significant bits (LSB)s of the VPN are bits 22:17, but with a 16 kB page size the 6 LSBs are bits 19:14.
- Because the TLB 400 stores two adjacent virtual pages per TLB entry, called an odd/even pair, the 6 LSBs for a 4 kB page odd/even pair are bits 18:13.
- bit 12 decides whether to return the even (0) or odd (1) translation, and for a 16 kB odd/even pair the bits are 20:15.
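The page-size-dependent bit selection for odd/even pairs can be sketched as follows. This is an illustration only, not the multiplexer of FIG. 5: it assumes MASK values 0 through 6 encode the page sizes 4 kB, 16 kB, 64 kB, 256 kB, 1 MB, 4 MB and 16 MB in increasing order (Table 1 is not reproduced here, so this encoding is hypothetical), and the function names are invented for the example.

```python
def hash_bit_range(mask: int) -> tuple:
    """Return (high, low) bit positions of the 6 hash bits for an odd/even pair.

    Assumed encoding: mask 0 -> 4 kB pages, mask 1 -> 16 kB, ... mask 6 -> 16 MB.
    """
    if not 0 <= mask <= 6:
        raise ValueError("mask must be in 0..6")
    select_bit = 12 + 2 * mask   # bit that picks the even or odd page of the pair
    low = select_bit + 1         # hash bits start just above the select bit
    return low + 5, low


def hash_vpn(va: int, mask: int) -> int:
    """Extract the 6 hash bits H_VPN[5:0] from a 32-bit virtual address."""
    _, low = hash_bit_range(mask)
    return (va >> low) & 0x3F


print(hash_bit_range(0))  # 4 kB odd/even pair
print(hash_bit_range(1))  # 16 kB odd/even pair
```

Under this assumed encoding the two printed ranges agree with the bit positions given in the text for 4 kB and 16 kB odd/even pairs, and the largest page size still selects bits that lie inside VPN[31:12].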
- This hash function is redundant, since the ordering of bits H_VPN[5:0] is irrelevant.
- FIG. 6 exploits the fact that ordering of bits is irrelevant.
- FIG. 6 illustrates a variation of the hashing circuit shown in FIG. 5 .
- a VPN[31:12] signal is provided to the first input port 606 a
- a MASK[2:0] signal is provided to the second input port 606 b .
- the mask signal MASK[2:0] is comprised of bits m 0 , m 1 , and m 2 .
- in this hashing circuit 606 there are three 3 to 1 multiplexers, 601 through 603 .
- the first multiplexer 601 receives the following bits, {m 2 , m 2 (m 1 +m 0 )}, on its selection input ports, where bits from the VPN, VPN[13:14], VPN[19:20], VPN[25:26] are provided to multiplexer data input ports, labeled 0 through 2 , respectively. Multiplexer 601 thus provides bits 5 and 4 to the H_VPN[5:0] output signal.
- the second multiplexer 602 receives the following bits, {m 2 (m 1 +m 0 ), m 2 m 1 +m 2 m 1 m 0 )}, on its selection input ports, where bits from the VPN, VPN[15:16], VPN[21:22], VPN[27:28] are provided to multiplexer data input ports, labeled 0 through 2 , respectively.
- Multiplexer 602 thus provides bits 3 and 2 to the H_VPN[5:0] output signal.
- the third multiplexer 603 receives the following bits, {m 2 m 1 , m 2 m 1 m 0 +m 2 m 1 )}, on its selection input ports, where bits from the VPN, VPN[17:18], VPN[23:24], VPN[29:30] are provided to multiplexer data input ports, labeled 0 through 2 , respectively. Multiplexer 603 thus provides bits 1 and 0 to the H_VPN[5:0] output signal.
- the hash function H_VPN[5:0] is uniformly distributed for MASK[2:0] and for VPN_IN[31:12] input signals.
- for a TLB miss, all entries within the RAM for which try_i is initially asserted are looked up.
- the number of cycles N miss is given by the following equation:
- Table 2 lists, for a range of hash function widths (n), the average number of cycles it takes to find a translation N hit , to detect a miss N miss and the probability that the TLB operation takes 25 cycles or more.
- when a VA is provided to the TLB it is propagated to the synthesized logic for each line and a result is provided, indicated by at least an asserted bit within the try_i vector of bits. Only those lines for which a result indicative of a match occurred are then physically accessed to provide the PPN. As such, only a small fraction of the TLB lines are accessed for the translation process, thus resulting in a substantial performance improvement.
Abstract
A translation lookaside buffer (TLB) is disclosed formed using RAM and synthesisable logic circuits. The TLB provides logic within the synthesisable logic for paring down the number of memory locations that must be searched to find a translation to a physical address from a received virtual address. The logic provides a hashing circuit for hashing the received virtual address and uses the hashed virtual address to index the RAM to locate a line within the RAM that provides the translation.
Description
- This invention relates to the area of translation lookaside buffers and more specifically to translation lookaside buffer architectures for rapid design cycles.
- Modern microprocessor systems typically utilize virtual addressing. Virtual addressing enables the system to effectively create a virtual memory space larger than an actual physical memory space. The process of breaking up the actual physical memory space into the virtual memory space is termed paging. Paging breaks up a linear address space of the physical memory space into fixed blocks called pages. Pages allow a large linear address space to be implemented with a smaller physical main memory plus cheap background memory. This configuration is referred to as “virtual memory.” Paging allows virtual memory to be implemented by managing memory in pages that are swapped to and from the background memory. Paging offers additional advantages, including reduced main memory fragmentation, selective memory write policies for different pages, and varying memory protection schemes for different pages. The presence of a paging mechanism is typically transparent to the application program.
- The size of a page is a tradeoff between flexibility and performance. A small page size allows finer control over the virtual memory system but it increases the overhead from paging activity. Therefore many CPUs support a mix of page sizes, e.g. a particular MIPS implementation supports any mix of 4 kB, 16 kB, 64 kB, 256 kB, 1 MB, 4 MB and 16 MB pages.
- A processor is then able to advantageously operate in the virtual address space using virtual addresses. Frequently, however, these virtual addresses must be translated into physical addresses, that is, actual memory locations. One way of accomplishing this translation of virtual addresses into physical addresses is the use of translation tables that are stored in main memory and regularly accessed. Translation tables are stored in main memory because they are typically large in size. Unfortunately, regularly accessing translation tables stored in main memory tends to slow overall system performance.
- Modern microprocessor systems often use a translation lookaside buffer (TLB) to store or cache recently generated virtual to physical address translations in order to avoid the need to regularly access translation tables in main memory to accomplish address translation. A TLB is a special type of cache memory. As with other types of cache memories, a TLB typically comprises a relatively small amount of memory storage specially designed to be quickly accessible. A TLB typically incorporates both a tag array and a data array, as are provided in cache memories. Within the tag array, each tag line stores a virtual address. This tag line is then associated with a corresponding data line in the data array in which is stored a physical address translation for the virtual address. Thus, prior to seeking a translation of a virtual address from translation tables in main memory, a processor first refers to the TLB to determine whether the physical address translation of the virtual address is presently stored in the TLB. In the event that the virtual address and corresponding physical address are stored in the TLB, the TLB provides the corresponding physical address at an output port thereof, and a time and resource-consuming access of main memory is avoided. To facilitate operation of the TLB and to reduce indexing requirements therefor, a content addressable memory (CAM) is typically provided within the TLB. CAMs are parallel pattern matching circuits. In a matching mode of operation the CAM permits searching of all of its data in parallel to find a match.
- Unfortunately, traditional TLBs require custom circuit design techniques to implement a CAM. Custom circuit designs are disadvantageous because each TLB and associated CAM requires significant design effort to implement in a processor system design. Of course, when a processor lacks CAM circuitry, signals from the processor propagate off chip to the CAM, thereby incurring delays.
- It is therefore an object of this invention to provide a CAM architecture formed of traditional synthesisable circuit blocks.
- In accordance with the invention there is provided a translation lookaside buffer (TLB) comprising: at least an input port for receiving a portion of a virtual address;
- a random access memory; a set of registers; and, synthesisable logic for determining a hash value from the received portion of the virtual address and for comparing the hash value to a stored hash value within the set of registers to determine a potential that a physical address associated with the virtual address is stored within a line within the random access memory and associated with a register, from the set of registers, within which the hash value is stored.
- In accordance with an aspect of the invention there is provided a translation lookaside buffer comprising: a random access memory; a first register associated with a line in the memory; and, a hashing circuit for receiving a virtual address other than a virtual address for which a translation is presently stored in the memory, for determining a hash value and for storing the hash value in the first register; and the hashing circuit for storing the virtual address and a translation therefor in the line in memory.
- In accordance with yet another aspect of the invention there is provided a translation lookaside buffer comprising: RAM; and, synthesisable logic for determining from a virtual address at least one potential address within the RAM in fixed relation to which to search for a physical address associated with the virtual address, the at least one potential address being other than the one and only known address within the RAM in fixed relation to which the physical address associated with the virtual address is stored.
- In accordance with yet another aspect of the invention there is provided a method of performing a virtual address lookup function for a translation lookaside buffer including RAM and synthesisable logic including the steps of: providing a virtual address to the synthesisable logic; hashing the provided virtual address to provide a hash result;
- based on the hash result determining a memory location within the RAM relative to which is stored a virtual address identifier and a physical address related thereto;
- comparing the virtual address to the virtual address identifier to determine if the physical address corresponds to the provided virtual address; and, when the physical address corresponds to the provided virtual address, providing the physical address as an output value.
- The invention will now be described with reference to the drawings in which:
- FIG. 1 a illustrates a prior art transistor implementation of a SRAM circuit;
- FIG. 1 b illustrates a prior art transistor implementation of a CAM circuit;
- FIG. 2 illustrates a prior art translation process from a virtual address (VA) to a physical address (PA);
- FIG. 3 illustrates a prior art translation from a VA to a PA when performed in conjunction with a direct mapped cache memory;
- FIG. 4 a generally illustrates a translation lookaside buffer formed using synthesisable logic components and a random access memory;
- FIG. 4 b illustrates a translation lookaside buffer in more detail formed from synthesisable logic components;
- FIG. 4 c outlines the steps taken for operation of the TLB;
- FIG. 5 illustrates a hashing circuit in more detail; and,
- FIG. 6 illustrates a variation of the hashing circuit shown in FIG. 5 .
- CAM circuits include storage circuits similar in structure to SRAM circuits. However, CAM circuits also include search circuitry offering an added benefit of a parallel search mode of operation, thus enabling searching of the contents of the CAM in parallel using hardware. When searching the CAM for a particular data value, the CAM provides a match signal upon finding a match for that data value within the CAM. A main difference between CAM and SRAM is that in a CAM, data is presented to the CAM representative of a virtual address and an address relating to the data is returned, whereas in a SRAM, an address is provided to the SRAM and data stored at that address is returned.
- The cells of the CAM are arranged so that each row of cells holds a memory address and that row of cells is connected by a match line to a corresponding word line of the data array to enable access of the data array in that word line when a match occurs on that match line. In a fully associative cache each row of the CAM holds the full address of a corresponding main memory location and the inputs to the CAM require the full address to be input.
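The contrast between the two access styles can be illustrated with a short software analogy. This is only a sketch with hypothetical function names; a hardware CAM evaluates every row's match line simultaneously, whereas the list comprehension below merely models the result of that parallel comparison.

```python
from typing import List, Optional, Sequence


def sram_read(cells: Sequence, address: int):
    """SRAM style: an address goes in, the data stored there comes out."""
    return cells[address]


def cam_search(rows: Sequence[int], key: int) -> Optional[int]:
    """CAM style: data goes in, the index of the matching row comes out."""
    match_lines: List[bool] = [stored == key for stored in rows]
    return match_lines.index(True) if any(match_lines) else None


def cam_lookup(tags: Sequence[int], data: Sequence, key: int):
    """A match line enables the corresponding word line of the data array."""
    row = cam_search(tags, key)
    return None if row is None else sram_read(data, row)


tags = [0x10, 0x20, 0x30]      # addresses held in the CAM rows
data = ["a", "b", "c"]         # data array, one word line per CAM row
print(cam_lookup(tags, data, 0x20))
```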
- A prior art publication, entitled “A Reconfigurable Content Addressable Memory,” by Steven A Guccione et al., discusses the implementation of a CAM within an FPGA. As is seen in Prior Art FIG. 1, at a transistor level, the implementation of a CAM circuit 101 is very similar to a standard SRAM 100. The CAM and SRAM circuits are almost identical, each having 6 transistors 102, except for the addition of three match transistors 103 that provide the parallel search capability of the CAM 101. Unfortunately, standard programmable logic devices do not facilitate implementing such transistor level circuits.
- In the prior art publication the implementation of the CAM in an FPGA is discussed. Using gate level logic to implement a CAM often results in an undesirable size of the CAM. Flip-flops are used as the data storage elements within the CAM and, as a result, the size of the CAM circuit attainable using an FPGA is dependent upon the number of flip-flops available within the FPGA. Implementing the CAM in an FPGA quickly depletes many of the FPGA resources and as a result is not a viable solution. Unfortunately, this has led prior designers to conclude that the CAM is only efficiently implemented at a transistor level.
- The prior art publication also addresses the implementation of a CAM using look up tables (LUTs) in an FPGA. Rather than using flip-flops within the FPGA to store the data to be matched, this implementation uses LUTs to store the data to be matched. By using LUTs rather than flip-flops, a smaller CAM architecture is possible.
- Unfortunately, forming CAMs from synthesisable elements is not easily done so prior art processors that offer CAM are provided with a CAM core within the processor. Providing a CAM core within the processor unfortunately makes the resulting circuit expensive because of the added design complexity. Such additional design complexity is ill-suited for small batch custom design processors.
-
FIG. 2 illustrates the translation process from a virtual address (VA) 201 to a physical address (PA) 202. The VA 201 is a 32-bit address, VA[31:0], and the PA 202 is also a 32-bit address, PA[31:0]. The VA has two portions, a virtual page number (VPN) 203 and a page offset (PO) 204. The VPN 203 is typically located in the upper portion of the VA 201 and the PO 204 is typically located in the lower portion, though this need not be so. Typically for a 32-bit addressing scheme, the VPN is 20 bits and the PO is 12 bits. The PO, or lower 12 bits, translates directly into the PA. The VPN 203 is used for indexing the TLB 205 to retrieve a physical page number (PPN) 206 therefrom. In other words, the VPN 203 undergoes translation to the PPN 206. Combining the PPN 206 in the upper portion of the PA 202 and the PO in the lower portion of the PA provides a translation from the VA to the PA. -
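As an illustration only, the address split of FIG. 2 can be sketched in Python. The sketch assumes 4 kB pages (a 12-bit PO) and uses a plain dictionary as a stand-in for the TLB 205; the names `split_va`, `translate` and `page_table` are hypothetical and do not appear in the described embodiment.

```python
PO_BITS = 12                            # assumed 4 kB pages: PO = VA[11:0]
PO_MASK = (1 << PO_BITS) - 1
VPN_MASK = (1 << (32 - PO_BITS)) - 1    # 20-bit VPN = VA[31:12]


def split_va(va: int) -> tuple[int, int]:
    """Return (VPN, PO) for a 32-bit virtual address."""
    return (va >> PO_BITS) & VPN_MASK, va & PO_MASK


def translate(va: int, page_table: dict[int, int]) -> int:
    """Replace the VPN by its PPN and keep the PO unchanged.

    `page_table` is a hypothetical VPN -> PPN mapping standing in for the TLB.
    A missing VPN raises KeyError, which plays the role of a TLB miss here.
    """
    vpn, po = split_va(va)
    ppn = page_table[vpn]
    return (ppn << PO_BITS) | po
```

For example, with a mapping from VPN 0x12345 to PPN 0xABCDE, the address 0x12345678 keeps its low 12 bits (0x678) and only the upper 20 bits are replaced.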
FIG. 3 illustrates the translation from a VA 201 to a PA 202 when performed in conjunction with a direct mapped cache memory 301. At the beginning of a translation cycle, the VA is used to access both the cache memory 301 and the TLB 205. The page-offset portion of the VA is used to access the cache memory 301, the page offset being the portion of the address that remains unmodified by the translation process. The page offset is used to index a tag array 302 and a data array 303 found in the cache memory 301, where the page offset selects a cache line 302 a within the cache memory 301. Access to the TLB 205 is performed using the VPN 203 portion of the VA 201. The TLB 205 typically comprises a TLB tag array 304 and a TLB data array 305. Both the TLB tag array 304 and the TLB data array 305 contain bits from the VPN 203 such that when a VPN is provided to both of these arrays, the bits making up the VPN are compared to those stored within the arrays of the TLB 205.
- Once the TLB data array 305 is accessed and a match is found between the VPN and an entry within the TLB data array 305 a, the PPN 206 is retrieved and is provided to the cache memory 301 and used for comparison to the tag 302 a retrieved from the tag array 302, a match being indicative of a cache “hit” 306. If a match is found between the VPN 203 and an entry within the TLB tag array 304 a then a TLB hit signal 307 is generated. In this manner, the cache is only accessed using bits of the PPN 206. The above example illustrates the use of a direct mapped cache memory; however, the same translation of a VA to a PA is applicable to set-associative caches as well. When set-associative caches are used, those of skill in the art appreciate that the size of a cache way is less than or equal to the size of a virtual page.
- Unfortunately, when a TLB is implemented in SRAM, an exhaustive search of the memory is required to support CAM functionality. Thus, when a TLB has storage for 1024 virtual addresses and their corresponding physical addresses, each address translation requires up to 1024 memory access and comparison operations. Such a CAM implementation is unworkable as the performance drops linearly with CAM size.
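The cost of that exhaustive search can be made concrete with a small Python sketch. It models the SRAM as a list of (VPN, PPN) pairs and counts one access-and-compare per entry visited; the function name `sram_lookup` and its return convention are illustrative assumptions.

```python
def sram_lookup(entries, vpn):
    """Scan entries in order; return (ppn_or_None, comparisons_performed)."""
    comparisons = 0
    for stored_vpn, ppn in entries:
        comparisons += 1              # one memory access plus one comparison
        if stored_vpn == vpn:
            return ppn, comparisons
    return None, comparisons          # miss: every entry was visited
```

With 1024 stored entries, a match in the last entry and a miss both cost 1024 comparisons, which is the linear behaviour the text describes.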
-
FIG. 4 a generally illustrates a TLB 400 formed using synthesisable logic components 499 and a random access memory (RAM) 410. A VPN for translation is provided via a VPN_IN input port 450, where bits VPN_IN[31:12] are provided from the VA[31:0] to this input port 450. A page mask signal is provided via a CP0_PAGE_MASK input port 451. A CP0_TRANSLATION input signal is provided via a CP0_TRANSLATION input port 452. A TLB_TRANSLATION output signal is provided via the TLB_TRANSLATION output port 453, in dependence upon a translation from a VA to a PA using the TLB 400. -
FIG. 4 b illustrates the TLB 400 in more detail, formed from synthesisable logic components, and in FIG. 4 c the steps of operation for the TLB 400 are shown in summary. In a more detailed description of the TLB operation, a VPN for translation is provided 480 via the VPN_IN input port 450, where bits VPN_IN[31:12] are provided from the VA[31:0] to this input port 450 as the VPN. A page mask is provided via the CP0_PAGE_MASK input port 451. This page mask is provided to a page mask encoder 408, for encoding the page mask according to Table 1. -
TABLE 1 Page Mask Encoding
page size | mask[2:0] |
---|---|
4 kB | 0 0 0 |
16 kB | 0 0 1 |
64 kB | 0 1 0 |
256 kB | 0 1 1 |
1 M | 1 0 0 |
4 M | 1 0 1 |
16 M | 1 1 0 |
- The
page mask encoder 408 is used for accepting the CP0_PAGE_MASK input signal on an input port thereof and for correlating this input signal to a 3-bit vector, MASK[2:0]. The 3-bit vector MASK[2:0] is further provided to a hashing circuit 406. The hashing circuit 406 receives VPN_IN[31:12] via a first input port 406 a and MASK[2:0] via a second input port 406 b. A hashed vector H_VPN[5:0] is provided from an output port 406 c thereof via a hashing operation 481 of the hashing circuit 406. The hashed vector H_VPN[5:0] and the MASK[2:0] are further provided to each one of 48 registers 409, where each register consists of multiple flip-flops collectively referred to as 491. Each of the registers 409 has two output ports. A first output signal from a first output port thereof is provided to a comparator circuit 403. A second output signal from a second output port is provided to the second input port 406 b on one of 48 hashing circuits 406. The first input port on this hashing circuit receives VPN_IN[31:12]. The hashing circuit 406 output port is coupled to one of 48 comparator circuits 403 for performing a comparison between the register output and the hashing circuit output signal. Each of the comparators, in dependence upon a comparison of two input signals, provides a ‘1’ if the signals are the same and a ‘0’ if they are different. The output signal hit_i from each of the 48 comparators is provided to one of 48 single-bit 2-input multiplexers 411. Output ports from each of the multiplexers are coupled to a flip-flop 404. Each of the flip-flops 404 generates an output signal provided at an output port labeled try_i, where collectively these output signals try[0 . . . 47], for 0 ≤ i ≤ 47, are provided to a priority encoder circuit 401. The priority encoder circuit is further coupled to a binary decoder circuit 402, where the priority encoder circuit asserts a TLB_ENTRY[5:0] signal to the binary decoder circuit 402 and to the RAM 410.
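The encoding of Table 1 amounts to a seven-entry lookup. The following Python sketch is keyed by page size in bytes, which is an assumption made for readability; the bit layout of the CP0_PAGE_MASK input signal is not given in this description, and `encode_page_size` is a hypothetical name.

```python
KB, MB = 1 << 10, 1 << 20

# Page size in bytes -> 3-bit MASK[2:0] code, copied from Table 1.
MASK_CODE = {
    4 * KB: 0b000,
    16 * KB: 0b001,
    64 * KB: 0b010,
    256 * KB: 0b011,
    1 * MB: 0b100,
    4 * MB: 0b101,
    16 * MB: 0b110,
}


def encode_page_size(page_size: int) -> int:
    """Return MASK[2:0] for a supported page size, else raise ValueError."""
    try:
        return MASK_CODE[page_size]
    except KeyError:
        raise ValueError(f"unsupported page size: {page_size}") from None
```

Each step in the code quadruples the page size, so the code equals (log2(page size) - 12) / 2.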
Three output ports are provided within the TLB 400, an ENTRY_FOUND output port 454, an ENTRY_NOT_FOUND output port 455 and a TLB_TRANSLATION output port 453, for providing the ENTRY_FOUND, ENTRY_NOT_FOUND, and TLB_TRANSLATION output signals, respectively. - An address for translation from a VA to a PA is stored in a random access memory (RAM) 410, with the
RAM 410 preferably having 48 entries, in the form of lines. In use, whenever a new translation is to be performed, input signals VPN_IN, CP0_PAGE_MASK, and CP0_TRANSLATION are provided to the TLB circuit 400 via input ports 450, 451 and 452, and the translation is stored in the RAM 410 for a given index, i. The given index indexes one of the lines 410 a within the RAM that holds the translation to the PPN. The hashing circuit 406 computes the hash function H(VPN_IN, MASK) and stores the result in a corresponding 6-bit register h_i 490. The page mask is stored in the 3-bit register m_i 491. - When a translation is requested using the TLB, a VPN is provided via the
input port 450 and the hash function H(VPN_IN, m_i) is computed for all i and compared to h_i. This yields a 48-bit vector 492, hit_0 . . . hit_47, which is subsequently loaded into a 48-bit register 493, try_0 . . . try_47. In order to determine whether the requested VPN_IN is present in the translation table stored in RAM 482, only those entries, or lines, in RAM are checked for which try_i is asserted. An entry in the 48-bit try_i vector is asserted if it yields a ‘1’ 483. Of course, there may be more than one bit asserted in the try_i vector, but the priority encoder 401 selects the entry with the lowest index to address entries within the RAM. The decoder 402 converts this index to a 48-bit one-hot vector 494, clr_0 . . . clr_47. When the clock pulse arrives from a clock circuit (not shown), the try_i vector is reloaded, except for the bit corresponding to the index just used to address the RAM, which is cleared. This process is repeated, one entry at a time 483. The process stops as soon as the requested entry is found 484, as indicated by the ENTRY_FOUND signal on the ENTRY_FOUND output port 454, or when all bits in try_i are ‘0’. When all bits in try_i are ‘0’, the ENTRY_NOT_FOUND signal is provided via the ENTRY_NOT_FOUND output port 455. In the first case the translation is successful and information for the translation is provided 485 from the RAM 410 using a TLB_TRANSLATION signal on the TLB_TRANSLATION output port 453. In the second case the translation is not successful and the TLB reports a TLB refill exception. -
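The lookup sequence just described can be modelled in software. The Python sketch below is a behavioural model under simplifying assumptions, not the synthesised circuit: each RAM line holds a single page rather than an odd/even pair, the hash simply takes the six VPN bits just above the bits that fall inside the page, and `SketchTLB`, `hash_vpn`, `write` and `lookup` are hypothetical names. The count returned with each result is the number of RAM lines read.

```python
NUM_LINES = 48


def hash_vpn(vpn: int, mask: int) -> int:
    """Assumed hash H(VPN, MASK): six VPN bits above the in-page bits."""
    return (vpn >> (2 * mask)) & 0x3F


class SketchTLB:
    def __init__(self):
        self.ram = [None] * NUM_LINES   # line i: (vpn, ppn)
        self.h = [None] * NUM_LINES     # 6-bit hash register h_i
        self.m = [0] * NUM_LINES        # 3-bit page mask register m_i

    def write(self, i: int, vpn: int, mask: int, ppn: int) -> None:
        self.ram[i] = (vpn, ppn)
        self.h[i] = hash_vpn(vpn, mask)
        self.m[i] = mask

    def lookup(self, vpn_in: int):
        """Return (ppn_or_None, ram_reads)."""
        # hit_i = (H(VPN_IN, m_i) == h_i); hardware evaluates all 48 at once.
        try_bits = [
            self.h[i] is not None and hash_vpn(vpn_in, self.m[i]) == self.h[i]
            for i in range(NUM_LINES)
        ]
        reads = 0
        while any(try_bits):
            i = try_bits.index(True)    # priority encoder: lowest set index
            vpn, ppn = self.ram[i]
            reads += 1
            shift = 2 * self.m[i]       # ignore VPN bits inside a large page
            if vpn >> shift == vpn_in >> shift:
                return ppn, reads       # ENTRY_FOUND
            try_bits[i] = False         # one-hot clr_i clears the tried bit
        return None, reads              # ENTRY_NOT_FOUND: TLB refill exception
```

In this model a lookup whose hash matches no stored h_i returns without reading the RAM at all, while two stored entries with colliding hashes cost at most two reads.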
FIG. 5 illustrates a hashing circuit 506 in more detail. Using the MASK[2:0] and VPN[31:12] input signals to the hashing circuit 506, a 7 to 1 multiplexer 501 provides the H_VPN[5:0] output signal from the hashing circuit 506 in dependence upon the MASK[2:0] signal provided to the second input port 506 b. This hashing circuit selects the 6 least significant bits from the VPN. The selection is controlled by the page mask because the definition of “least significant” changes with the page size. For example, with a 4 kB page size, the 6 least significant bits (LSBs) of the VPN are bits 17:12, but with a 16 kB page size the 6 LSBs are bits 19:14. Since the TLB 400 stores two adjacent virtual pages per TLB entry, called an odd/even pair, the 6 LSBs for a 4 kB page odd/even pair are bits 18:13, with bit 12 deciding whether to return the even (0) or odd (1) translation, and for a 16 kB odd/even pair the bits are 20:15. This hash function, however, is redundant, since the ordering of bits H_VPN[5:0] is irrelevant. FIG. 6 exploits the fact that ordering of bits is irrelevant. -
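The bit selection of FIG. 5 can be expressed compactly. The Python sketch below takes the full 32-bit VA so that bit numbers match the text, and it extrapolates the two stated odd/even cases (18:13 for 4 kB pairs, 20:15 for 16 kB pairs) to the other page sizes by shifting two bits per mask step; that extrapolation, and the names `hash_pair` and `odd_even_select`, are assumptions.

```python
def hash_pair(va: int, mask: int) -> int:
    """Return H_VPN[5:0]: the six VA bits above the odd/even select bit."""
    if not 0 <= mask <= 6:
        raise ValueError("MASK[2:0] must be in the range 0..6")
    low = 13 + 2 * mask            # 13 for 4 kB pairs, 15 for 16 kB pairs
    return (va >> low) & 0x3F


def odd_even_select(va: int, mask: int) -> int:
    """Return the bit just below the hashed field: 0 = even, 1 = odd page."""
    if not 0 <= mask <= 6:
        raise ValueError("MASK[2:0] must be in the range 0..6")
    return (va >> (12 + 2 * mask)) & 1
```

The shift-and-mask form plays the role of the 7 to 1 multiplexer: each value of MASK[2:0] picks a different 6-bit window of the address.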
FIG. 6 illustrates a variation of the hashing circuit shown in FIG. 5. A VPN[31:12] signal is provided to the first input port 606 a, and a MASK[2:0] signal is provided to the second input port 606 b. The mask signal MASK[2:0] is comprised of bits m0, m1, and m2. Within this hashing circuit 606 there are three 3 to 1 multiplexers 601 through 603. The first multiplexer 601 receives the following bits, $\{m_2,\ \overline{m_2}(m_1+m_0)\}$, on its selection input ports, where bits from the VPN, VPN[13:14], VPN[19:20], VPN[25:26], are provided to multiplexer data input ports, labeled 0 through 2, respectively. Multiplexer 601 thus provides two bits of the hashed vector H_VPN[5:0]. The second multiplexer 602 receives the following bits, $\{m_2(m_1+m_0),\ \overline{m_2}m_1+m_2\overline{m_1}\,\overline{m_0}\}$, on its selection input ports, where bits from the VPN, VPN[15:16], VPN[21:22], VPN[27:28], are provided to multiplexer data input ports, labeled 0 through 2, respectively. Multiplexer 602 thus provides two further bits of H_VPN[5:0]. The third multiplexer 603 receives the following bits, $\{m_2m_1,\ \overline{m_2}m_1m_0+m_2\overline{m_1}\}$, on its selection input ports, where bits from the VPN, VPN[17:18], VPN[23:24], VPN[29:30], are provided to multiplexer data input ports, labeled 0 through 2, respectively. Multiplexer 603 thus provides the remaining two bits of H_VPN[5:0].
- Preferably, the hash function H_VPN[5:0] is uniformly distributed for MASK[2:0] and for VPN_IN[31:12] input signals. In the case of a TLB miss, all entries within the RAM are looked up for which try_i is initially asserted. The number of cycles Nmiss is given by the following equation:
-
$$N_{miss} = \sum_{j=0}^{48} (1+j)\binom{48}{j}\,p^{j}(1-p)^{48-j}$$
- where p is the probability that a comparator output signal hit_i is asserted. The term:
-
$$\binom{48}{j}\,p^{j}(1-p)^{48-j}$$
- gives the probability that exactly j bits in the try vector try_i are initially asserted. Having a uniform hashing function H with n bits at the output signal thereof, $p = 2^{-n}$, where in the case of FIG. 4 b, n=6.
- In the case of a TLB hit, at least one access to the RAM 410 is required, as opposed to a TLB miss condition which is detected without accessing the RAM, since in a TLB miss condition the try vector try_i contains all zeros.
- The average number of cycles to perform a translation that hits in the TLB is given by the following formula:
-
$$N_{hit} = \sum_{k=0}^{47} \left(2+\frac{k}{2}\right)\binom{47}{k}\,p^{k}(1-p)^{47-k}$$
- For a TLB hit, there must be at least one ‘1’ in the try vector try_i. The only uncertainty is with the remaining elements within the vector. The variable k is used to represent the number of remaining entries that are set to ‘1’ within the try vector try_i, for k in the range from 0 . . . 47. If k=0 then only one entry within the RAM is looked up. Therefore, since one clock cycle was used to find the translation in the first location for i=0, a total of two clock cycles are utilized to perform the translation. On average, it takes 2+k/2 cycles to return the requested translation from RAM 410.
- In terms of performing the translation and interrupt latency, the number of clock cycles required is examined for long lookup sequences, for instance having a k as high as 25 or more. The following relation:
-
$$P\{N \ge 25\} = \sum_{j=25}^{48} \binom{48}{j}\,p^{j}(1-p)^{48-j}$$
- gives the probability that the TLB will use 25 or more cycles to complete a translation. Table 2 lists, for a range of hash function widths (n), the average number of cycles it takes to find a translation Nhit, to detect a miss Nmiss and the probability that the TLB operation takes 25 cycles or more.
-
TABLE 2 TLB latency as a function of the number of hash bits ‘n’
n | Nhit | Nhitq | Nmiss | P{N≥25} |
---|---|---|---|---|
3 | 4.94 | 3.94 | 7.00 | 4.3 × 10^-11 |
4 | 3.46 | 2.46 | 4.00 | 5.9 × 10^-18 |
5 | 2.73 | 1.73 | 2.50 | 3.6 × 10^-25 |
6 | 2.37 | 1.37 | 1.75 | 1.5 × 10^-32 |
7 | 2.18 | 1.18 | 1.38 | 5.4 × 10^-40 |
- From Table 2 it is evident that P{N≥25} is so small that even with a 4 bit hash function it takes more than 6000 years of continuous operation to run into a case where the TLB translation requires between 25 and 48 clock cycles.
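The entries of Table 2 can be approximated numerically. The Python sketch below assumes that each of the 48 comparators asserts independently with probability p = 2^-n, that a miss costs one cycle plus one per asserted try bit, that a hit costs 2 + k/2 cycles when k of the other 47 bits are also asserted, and that the hit-quick figure is one cycle less than Nhit. These modelling choices are inferred from the surrounding description and the tabulated values rather than quoted from it, and `binom` and `latency` are hypothetical names.

```python
from math import comb

LINES = 48


def binom(n_bits: int, j: int, p: float) -> float:
    """Probability that exactly j of n_bits independent bits are asserted."""
    return comb(n_bits, j) * p**j * (1 - p) ** (n_bits - j)


def latency(n: int):
    """Return (N_hit, N_hitq, N_miss, P{N >= 25}) for an n-bit uniform hash."""
    p = 2.0 ** -n
    n_hit = sum((2 + k / 2) * binom(LINES - 1, k, p) for k in range(LINES))
    n_miss = sum((1 + j) * binom(LINES, j, p) for j in range(LINES + 1))
    p_long = sum(binom(LINES, j, p) for j in range(25, LINES + 1))
    return n_hit, n_hit - 1, n_miss, p_long


for n in range(3, 8):
    n_hit, n_hitq, n_miss, p_long = latency(n)
    print(f"{n} {n_hit:.2f} {n_hitq:.2f} {n_miss:.2f} {p_long:.1e}")
```

Under these assumptions the averages reduce to 2 + 47p/2 for a hit and 1 + 48p for a miss, and the computed values agree with Table 2 to within rounding in the last digit.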
- The column Nhitq (“hit quick”) applies to the case where the VPN_IN is applied continuously to the TLB circuit 400. From this table it is evident that having n=5 or n=6 is sufficient when focusing on the most important number, which is Nhit. There is not much to be gained beyond 6 bits, since Nhit approaches 2.0 when n ≥ 20. A value of n=6 is used in the TLB circuit 400 since the hash function may not be very uniform. Therefore, the 6-bit hash function used within the TLB approximates the performance of a 5-bit truly uniform hash function.
- Advantageously, when a VA is provided to the TLB it is propagated to the synthesized logic for each line and a result is provided, indicated by at least an asserted bit within the try_i vector of bits. Only those lines for which a result indicative of a match occurred are then physically accessed to provide the PPN. As such, only a small fraction of the TLB lines are accessed for the translation process, thus resulting in a substantial performance improvement.
- Numerous other embodiments may be envisaged without departing from the spirit or scope of the invention.
Claims (21)
1-34. (canceled)
35. A method, comprising:
generating a try vector that indicates which, if any, lines in a memory potentially store a translation for a virtual address;
accessing a line of the memory based on the try vector; and
determining whether the accessed line includes the translation for the virtual address.
36. The method of claim 35, further comprising:
generating an index for a line of the memory based on the try vector;
using the index to access the line of the memory; and
outputting a physical page number for the virtual address in response to the accessed line including the translation for the virtual address.
37. The method of claim 35, wherein said determining comprises determining that the accessed line includes the translation for the virtual address in response to the accessed line including a virtual page number for the virtual address.
38. The method of claim 35, further comprising:
generating an index for a line of the memory based on the try vector;
using the index to access the line of the memory;
updating the try vector to indicate the accessed line of the memory does not include the translation for the virtual address; and
accessing another line of the memory based on the updated try vector.
39. The method of claim 35, further comprising:
generating an index for a line of the memory based on the try vector;
using the index to access the line of the memory;
updating the try vector to indicate that the accessed line of the memory does not include the translation for the virtual address; and
signaling that the memory does not include the translation for the virtual address in response to the try vector indicating that no line potentially stores the translation for the virtual address.
40. The method of claim 35, wherein said generating a try vector comprises setting bits of the try vector that correspond to lines of the memory that potentially store the translation for the virtual address.
41. The method of claim 35, further comprising:
accessing a line of the memory corresponding to a set bit of the try vector; and
clearing the set bit of the try vector corresponding to the line of the memory in response to determining that the corresponding line does not include the translation for the virtual address.
42. The method of claim 41, further comprising signaling that the memory does not include the translation for the virtual address in response to all bits of the try vector being cleared.
43. An apparatus, comprising:
a memory including a plurality of lines;
a register configured to store a try vector that indicates which, if any, lines in the memory potentially store a translation for a virtual address;
an encoder configured to generate an index based on the try vector and access a line of the memory using the index;
a comparator configured to indicate whether the accessed line of the memory includes the translation for the virtual address; and
an output port configured to output the translation for the virtual address.
44. The apparatus of claim 43, wherein the output port is further configured to output a physical page number stored in the accessed line.
45. The apparatus of claim 43, wherein said comparator is configured to indicate that the accessed line includes the translation for the virtual address in response to determining that a virtual page number stored in the accessed line matches a virtual page number of the virtual address.
46. The apparatus of claim 43, further comprising:
a decoder configured to receive the index and to specify a bit of the try vector to update to a status that indicates the accessed line of the memory does not include the translation for the virtual address;
wherein the encoder is further configured to generate another index based on the updated try vector and access another line of the memory using the other index.
47. The apparatus of claim 43, further comprising:
a decoder configured to receive the index and to specify a bit of the try vector to update to a status that indicates the accessed line of the memory does not include the translation for the virtual address; and
wherein the encoder is further configured to signal that the memory does not include the translation for the virtual address in response to the try vector indicating that no line potentially stores the translation for the virtual address.
48. The apparatus of claim 43, wherein set bits of the try vector indicate that corresponding lines of the memory potentially store the translation for the virtual address.
49. The apparatus of claim 48, wherein:
the encoder is further configured to select a set bit of the try vector and to generate the index for a line of the memory that corresponds to the selected bit; and
the apparatus further comprises a decoder configured to receive the index and to specify that the set bit of the try vector corresponding to the line is to be cleared if the accessed line of the memory does not include the translation for the virtual address.
50. The apparatus of claim 48, wherein the encoder is further configured to signal that memory does not include the translation for the virtual address in response to all bits of the try vector being cleared.
51. An apparatus, comprising:
a memory including a plurality of lines;
means for activating bits of a try vector that correspond to lines of the memory that potentially store a translation for a virtual address;
means for accessing a line of the memory based on the try vector; and
means for determining whether the accessed line includes the translation for the virtual address.
52. The apparatus of claim 51, further comprising:
means for generating an index for a line of the memory based on the try vector;
means for using the index to access the line of the memory; and
means for outputting a physical page number for the virtual address in response to the accessed line including the translation for the virtual address.
53. The apparatus of claim 51, further comprising:
means for generating an index for a line of the memory based on the try vector;
means for using the index to access the line of the memory;
means for deactivating the activated bit of the try vector corresponding to the accessed line in response to the accessed line not including the translation for the virtual address; and
means for accessing another line of the memory based on the try vector.
54. The apparatus of claim 51, further comprising:
means for generating an index for a line of the memory based on the try vector;
means for using the index to access the line of the memory;
means for deactivating the activated bit of the try vector corresponding to the accessed line in response to the accessed line not including the translation for the virtual address; and
means for signaling that the memory does not include the translation for the virtual address in response to all bits of the try vector being deactivated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/298,800 US8607026B2 (en) | 2002-09-13 | 2011-11-17 | Translation lookaside buffer |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/242,785 US20040054867A1 (en) | 2002-09-13 | 2002-09-13 | Translation lookaside buffer |
US13/298,800 US8607026B2 (en) | 2002-09-13 | 2011-11-17 | Translation lookaside buffer |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/242,785 Continuation US20040054867A1 (en) | 2002-09-13 | 2002-09-13 | Translation lookaside buffer |
Publications (2)
Publication Number | Publication Date |
---|---|
US20120066475A1 true US20120066475A1 (en) | 2012-03-15 |
US8607026B2 US8607026B2 (en) | 2013-12-10 |
Family
ID=31991480
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/242,785 Abandoned US20040054867A1 (en) | 2002-09-13 | 2002-09-13 | Translation lookaside buffer |
US13/298,800 Expired - Lifetime US8607026B2 (en) | 2002-09-13 | 2011-11-17 | Translation lookaside buffer |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/242,785 Abandoned US20040054867A1 (en) | 2002-09-13 | 2002-09-13 | Translation lookaside buffer |
Country Status (7)
Country | Link |
---|---|
US (2) | US20040054867A1 (en) |
EP (1) | EP1552397A1 (en) |
JP (1) | JP2005538465A (en) |
KR (1) | KR101145557B1 (en) |
CN (1) | CN100555248C (en) |
AU (1) | AU2003260822A1 (en) |
WO (1) | WO2004025480A1 (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6212603B1 (en) * | 1998-04-09 | 2001-04-03 | Institute For The Development Of Emerging Architectures, L.L.C. | Processor with apparatus for tracking prefetch and demand fetch instructions serviced by cache memory |
US6356990B1 (en) * | 2000-02-02 | 2002-03-12 | International Business Machines Corporation | Set-associative cache memory having a built-in set prediction array |
US20030074537A1 (en) * | 1997-12-31 | 2003-04-17 | Roland Pang | Method and apparatus for indexing a cache |
US6581140B1 (en) * | 2000-07-03 | 2003-06-17 | Motorola, Inc. | Method and apparatus for improving access time in set-associative cache systems |
US6625714B1 (en) * | 1999-12-17 | 2003-09-23 | Hewlett-Packard Development Company, L.P. | Parallel distributed function translation lookaside buffer |
US6687789B1 (en) * | 2000-01-03 | 2004-02-03 | Advanced Micro Devices, Inc. | Cache which provides partial tags from non-predicted ways to direct search if way prediction misses |
US6925464B2 (en) * | 2002-06-13 | 2005-08-02 | Intel Corporation | Method and system for performing inserts and lookups in memory |
Family Cites Families (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4170039A (en) * | 1978-07-17 | 1979-10-02 | International Business Machines Corporation | Virtual address translation speed up technique |
US4215402A (en) * | 1978-10-23 | 1980-07-29 | International Business Machines Corporation | Hash index table hash generator apparatus |
USRE37305E1 (en) * | 1982-12-30 | 2001-07-31 | International Business Machines Corporation | Virtual memory address translation mechanism with controlled data persistence |
US4680700A (en) * | 1983-12-07 | 1987-07-14 | International Business Machines Corporation | Virtual memory address translation mechanism with combined hash address table and inverted page table |
JPH0760411B2 (en) * | 1989-05-23 | 1995-06-28 | 株式会社日立製作所 | Buffer storage controller |
GB9205551D0 (en) * | 1992-03-13 | 1992-04-29 | Inmos Ltd | Cache memory |
US5574877A (en) * | 1992-09-25 | 1996-11-12 | Silicon Graphics, Inc. | TLB with two physical pages per virtual tag |
US5526504A (en) * | 1993-12-15 | 1996-06-11 | Silicon Graphics, Inc. | Variable page size translation lookaside buffer |
EP0675443A1 (en) * | 1994-03-30 | 1995-10-04 | Digital Equipment Corporation | Apparatus and method for accessing direct mapped cache |
US5752275A (en) * | 1995-03-31 | 1998-05-12 | Intel Corporation | Translation look-aside buffer including a single page size translation unit |
US6026476A (en) * | 1996-03-19 | 2000-02-15 | Intel Corporation | Fast fully associative translation lookaside buffer |
US5860147A (en) * | 1996-09-16 | 1999-01-12 | Intel Corporation | Method and apparatus for replacement of entries in a translation look-aside buffer |
US6014732A (en) * | 1997-10-22 | 2000-01-11 | Hewlett-Packard Company | Cache memory with reduced access time |
US6205531B1 (en) * | 1998-07-02 | 2001-03-20 | Silicon Graphics Incorporated | Method and apparatus for virtual address translation |
US6381673B1 (en) * | 1998-07-06 | 2002-04-30 | Netlogic Microsystems, Inc. | Method and apparatus for performing a read next highest priority match instruction in a content addressable memory device |
US6360220B1 (en) * | 1998-08-04 | 2002-03-19 | Microsoft Corporation | Lock-free methods and systems for accessing and storing information in an indexed computer data structure having modifiable entries |
US6233652B1 (en) * | 1998-10-30 | 2001-05-15 | Intel Corporation | Translation lookaside buffer for multiple page sizes |
US6625715B1 (en) * | 1999-12-30 | 2003-09-23 | Intel Corporation | System and method for translation buffer accommodating multiple page sizes |
US6629099B2 (en) * | 2000-12-07 | 2003-09-30 | Integrated Silicon Solution, Inc. | Paralleled content addressable memory search engine |
US6889225B2 (en) * | 2001-08-09 | 2005-05-03 | Integrated Silicon Solution, Inc. | Large database search using content addressable memory and hash |
US6700808B2 (en) * | 2002-02-08 | 2004-03-02 | Mobility Electronics, Inc. | Dual input AC and DC power supply having a programmable DC output utilizing a secondary buck converter |
US7136960B2 (en) * | 2002-06-14 | 2006-11-14 | Integrated Device Technology, Inc. | Hardware hashing of an input of a content addressable memory (CAM) to emulate a wider CAM |
2002
- 2002-09-13: US application US10/242,785, publication US20040054867A1 (not active: Abandoned)
2003
- 2003-09-12: KR application KR1020057004205A, publication KR101145557B1 (not active: Expired - Fee Related)
- 2003-09-12: CN application CNB038215691A, publication CN100555248C (not active: Expired - Fee Related)
- 2003-09-12: EP application EP03795156A, publication EP1552397A1 (not active: Withdrawn)
- 2003-09-12: JP application JP2004535773A, publication JP2005538465A (active: Pending)
- 2003-09-12: WO application PCT/IB2003/003915, publication WO2004025480A1 (active: Application Filing)
- 2003-09-12: AU application AU2003260822A, publication AU2003260822A1 (not active: Abandoned)
2011
- 2011-11-17: US application US13/298,800, publication US8607026B2 (not active: Expired - Lifetime)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060047912A1 (en) * | 2004-08-30 | 2006-03-02 | Texas Instruments Incorporated | System and method for high performance, power efficient store buffer forwarding |
US8775740B2 (en) * | 2004-08-30 | 2014-07-08 | Texas Instruments Incorporated | System and method for high performance, power efficient store buffer forwarding |
Also Published As
Publication number | Publication date |
---|---|
KR20050043944A (en) | 2005-05-11 |
CN1682200A (en) | 2005-10-12 |
US8607026B2 (en) | 2013-12-10 |
JP2005538465A (en) | 2005-12-15 |
AU2003260822A1 (en) | 2004-04-30 |
US20040054867A1 (en) | 2004-03-18 |
WO2004025480A1 (en) | 2004-03-25 |
KR101145557B1 (en) | 2012-05-16 |
EP1552397A1 (en) | 2005-07-13 |
CN100555248C (en) | 2009-10-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8607026B2 (en) | | Translation lookaside buffer |
US6014732A (en) | | Cache memory with reduced access time |
US6425055B1 (en) | | Way-predicting cache memory |
CA2057403C (en) | | Apparatus and method for a space saving translation lookaside buffer for content addressable memory |
US6230248B1 (en) | | Method and apparatus for pre-validating regions in a virtual addressing scheme |
JP3169155B2 (en) | | Circuit for caching information |
US6493812B1 (en) | | Apparatus and method for virtual address aliasing and multiple page size support in a computer system having a prevalidated cache |
US7805588B2 (en) | | Caching memory attribute indicators with cached memory data field |
US6138225A (en) | | Address translation system having first and second translation look aside buffers |
JP3666689B2 (en) | | Virtual address translation method |
JP4065660B2 (en) | | Translation index buffer with distributed functions in parallel |
US20050050278A1 (en) | | Low power way-predicted cache |
WO1995016963A1 (en) | | Variable page size translation lookaside buffer |
JPH07200399A (en) | | Microprocessor and method for access to memory in microprocessor |
EP0690386A1 (en) | | Address translator and method of operation |
GB2293672A (en) | | Virtual page memory buffer |
US6581140B1 (en) | | Method and apparatus for improving access time in set-associative cache systems |
JPH08227380A (en) | | Data-processing system |
JPH07295889A (en) | | Address conversion circuit |
US5535351A (en) | | Address translator with by-pass circuit and method of operation |
US5860097A (en) | | Associative cache memory with improved hit time |
US20070094476A1 (en) | | Updating multiple levels of translation lookaside buffers (TLBs) field |
US7024536B2 (en) | | Translation look-aside buffer for improving performance and reducing power consumption of a memory and memory management method using the same |
US6385696B1 (en) | | Embedded cache with way size bigger than page size |
JPS623354A (en) | | Cache memory access system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: NYTELL SOFTWARE LLC, DELAWARE. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: NXP B.V.; REEL/FRAME: 030663/0411. Effective date: 20110628 |
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| CC | Certificate of correction | |
| FPAY | Fee payment | Year of fee payment: 4 |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 8 |