
CN115098410B - Processor, data processing method for processor, and electronic device - Google Patents

Processor, data processing method for processor, and electronic device

Info

Publication number
CN115098410B
Authority
CN
China
Prior art keywords
page table
table entry
level
cache space
level page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210731118.6A
Other languages
Chinese (zh)
Other versions
CN115098410A (en)
Inventor
Hu Shiwen (胡世文)
Xue Daqing (薛大庆)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hygon Information Technology Co Ltd
Original Assignee
Hygon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hygon Information Technology Co Ltd filed Critical Hygon Information Technology Co Ltd
Priority to CN202210731118.6A priority Critical patent/CN115098410B/en
Publication of CN115098410A publication Critical patent/CN115098410A/en
Application granted granted Critical
Publication of CN115098410B publication Critical patent/CN115098410B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0877: Cache access modes
    • G06F 12/0882: Page mode
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A processor, a data processing method for the processor, and an electronic device. The processor includes a first-level cache space, a first translation look-aside buffer, and at least one preset cache space. The first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link. The at least one preset cache space includes a target preset cache space; the first translation look-aside buffer is disposed at the same path level as the target preset cache space, is communicatively connected to the target preset cache space, and is configured to cache first-level page table entries. The processor can improve the performance of the data prefetcher and enable accelerators to use virtual addresses, which greatly simplifies the programming mode of programs using heterogeneous architectures and improves the performance of the near-memory page table walker.

Description

Processor, data processing method for processor, and electronic device
Technical Field
Embodiments of the present disclosure relate to a processor, a data processing method for a processor, and an electronic device.
Background
In the field of computer technology, one of the important functions of a computer operating system is memory management. In a multiprocessing operating system, each process has its own virtual address space and can use any virtual address (Virtual Address) within the range allowed by the system specification. The address used by the central processing unit (Central Processing Unit, CPU) when executing an application is a virtual address. When the operating system allocates memory to a process, the virtual address used must be mapped to a physical address (Physical Address), which is the real physical memory access address. Dividing addresses into virtual and physical addresses simplifies program compilation: the compiler compiles programs against a continuous and sufficient virtual address space, and the virtual addresses of different processes are mapped to different physical addresses, so that the system can run multiple processes simultaneously and the running efficiency of the whole computer system is improved. In addition, since an application can use but cannot alter the address translation, one process cannot access the memory contents of another process, which increases the security of the system.
Disclosure of Invention
At least one embodiment of the present disclosure provides a processor, including a first level cache space, a first translation look-aside buffer, and at least one preset cache space, where the first level cache space and the at least one preset cache space are sequentially communicatively connected to form a communication link, the at least one preset cache space includes a target preset cache space, the first translation look-aside buffer and the target preset cache space are disposed at a same path level, the first translation look-aside buffer is communicatively connected to the target preset cache space, and the first translation look-aside buffer is configured to cache a first level page table entry.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments are briefly described below. It is apparent that the drawings described below relate only to some embodiments of the present disclosure and are not limiting of the present disclosure.
FIG. 1 is a schematic diagram of an address translation process;
FIG. 2 is a schematic diagram of a multi-core processor architecture;
FIG. 3 shows a schematic diagram of one example of the contents of a TLB item;
FIG. 4 is a schematic diagram of a processor architecture according to at least one embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another processor architecture provided in accordance with at least one embodiment of the present disclosure;
FIG. 6A shows a schematic diagram of the contents of one data item in a first stage page table entry buffer;
FIG. 6B is a diagram showing the physical address organization of a first stage page table entry;
FIG. 7 illustrates an exemplary diagram of an architecture of a cache in a processor provided in accordance with at least one embodiment of the present disclosure;
FIG. 8 is a flow chart of a method for processing data of a processor according to some embodiments of the present disclosure;
FIG. 9 is a flow chart of another method of data processing for a processor provided in some embodiments of the present disclosure;
FIG. 10 is a flow chart of another method of data processing for a processor provided in some embodiments of the present disclosure;
FIG. 11 illustrates a schematic flow diagram of another data processing method for a processor provided in accordance with at least one embodiment of the present disclosure;
FIG. 12A is a flow chart of a cache data read;
FIG. 12B is a schematic flow chart of data processing by a processor provided by an embodiment of the present disclosure;
FIG. 13 is a schematic block diagram of an electronic device provided in some embodiments of the present disclosure; and
Fig. 14 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are within the scope of the present disclosure, based on the described embodiments of the present disclosure.
Unless defined otherwise, technical or scientific terms used in this disclosure should be given the ordinary meaning understood by one of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but are used to distinguish one element from another. Likewise, the terms "a," "an," or "the" and similar terms do not denote a limitation of quantity, but rather the presence of at least one. Words such as "comprising" or "comprises" mean that the element or item preceding the word encompasses the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled" and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper," "lower," "left," "right," and so on are used merely to indicate relative positional relationships, which may change when the absolute position of the described object changes.
When a computer operating system processes tasks, the virtual addresses used by application programs need to be converted into physical addresses, so that memory accesses can be performed based on the physical addresses to obtain data. The process of converting a virtual address into a physical address is referred to as address translation (Address Translation).
The present disclosure is illustrated by the following several specific examples. In order to keep the following description of the embodiments of the present disclosure clear and concise, the present disclosure omits detailed description of known functions and known components. When any element of an embodiment of the present disclosure appears in more than one drawing, the element is identified by the same or similar reference numeral in each drawing.
FIG. 1 is a schematic diagram of an address translation process, illustrating address translation with a four-level page table. As shown in fig. 1, a virtual address is divided into several segments, denoted, for example, as EXT, OFFSET_lvl4, OFFSET_lvl3, OFFSET_lvl2, OFFSET_lvl1, and OFFSET_pg. In this example, the upper virtual address segment EXT is not used. The virtual address segments OFFSET_lvl4, OFFSET_lvl3, OFFSET_lvl2, and OFFSET_lvl1 are the offset values into the four levels of page tables, respectively: OFFSET_lvl4 is the offset into the fourth-level page table, OFFSET_lvl3 the offset into the third-level page table, OFFSET_lvl2 the offset into the second-level page table, and OFFSET_lvl1 the offset into the first-level page table.
The starting address of the highest-level page table (i.e., the fourth-level page table) is stored in the architecture register reg_pt, whose contents are set by the operating system and cannot be changed by application programs. In the fourth-level, third-level, and second-level page tables, each page table entry stores the starting address of the next-level page table. A first-level Page Table Entry (PTE) stores the high-order bits of the physical address of the corresponding memory page, and these high-order bits are combined with the page offset (OFFSET_pg) of the virtual address to obtain the physical address corresponding to that virtual address. The starting address of each next-level page table is thus obtained step by step until the first-level Page Table Entry (PTE) is reached, from which the corresponding physical address is obtained, completing the translation from virtual address to physical address.
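To make the walk above concrete, the following C sketch models a four-level page table walk for 4 KB pages with 9 index bits per level; the field widths, the pte_t layout, and the read_phys helper are illustrative assumptions, not the concrete hardware of this disclosure.

    #include <stdint.h>

    #define PG_BITS  12                       /* 4 KB page: OFFSET_pg is 12 bits */
    #define LVL_BITS 9                        /* assumed index bits per level    */
    #define LVL_MASK ((1ull << LVL_BITS) - 1)
    #define PG_MASK  ((1ull << PG_BITS) - 1)

    typedef uint64_t pte_t;

    /* Hypothetical helper: read one 8-byte page table entry at a physical address. */
    extern pte_t read_phys(uint64_t paddr);

    /* reg_pt holds the physical base of the fourth-level (highest) page table,
       as set by the operating system. */
    uint64_t translate(uint64_t reg_pt, uint64_t vaddr)
    {
        uint64_t table = reg_pt;
        for (int level = 4; level >= 1; level--) {
            /* OFFSET_lvl4..OFFSET_lvl1: the index into each level's page table. */
            uint64_t idx = (vaddr >> (PG_BITS + (level - 1) * LVL_BITS)) & LVL_MASK;
            pte_t entry = read_phys(table + idx * sizeof(pte_t)); /* one memory access */
            if (level == 1)
                /* First-level PTE: high bits of the memory page's physical address,
                   combined with the page offset OFFSET_pg of the virtual address. */
                return (entry & ~PG_MASK) | (vaddr & PG_MASK);
            table = entry & ~PG_MASK;         /* base of the next-level page table */
        }
        return 0; /* not reached */
    }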
It should be noted that although fig. 1 illustrates a 4-level page table, embodiments of the present disclosure are not limited thereto; multi-level page tables of any depth may be employed, such as 2-level, 3-level, or 5-level page tables, and a single-level page table may also be employed, as desired. For example, a system may support memory pages of different sizes, and the memory page size determines the number of bits of the page offset OFFSET_pg; for example, each memory page may be 4K in size. Within the same system, larger memory pages require fewer address translation levels. Systems with any memory page size may be used, and if a system supports multiple memory page sizes, the number of page table levels corresponding to different memory page sizes also differs, which is not limited in this disclosure.
FIG. 2 is a schematic diagram of a multi-core processor architecture. For example, as shown in FIG. 2, the processor has 4 processor cores (CPU cores). The processor also has multiple levels of caches, such as a first-level cache (L1 Cache), a second-level cache (L2 Cache), and a last-level cache (Last Level Cache, LLC). In this example, the last-level cache is actually a third-level cache (L3 Cache). Of course, embodiments of the present disclosure are not limited in this regard; a processor may have any number of cache levels, so the last-level cache may be a cache of any level, as desired.
For example, in this example, the last-level cache is shared by multiple processor cores, while the second-level cache is private to each processor core. That is, multiple processor cores share the last-level cache, while each processor core is separately provided with a dedicated second-level cache. The last-level cache and the second-level cache are used to store instructions and data, and the last-level cache is connected to the memory. It should be noted that in other examples the second-level cache may also be a shared cache, which is not limited by embodiments of the present disclosure.
For example, a dedicated first level cache is provided for each processor core, the first level cache being provided within the processor core. For example, the first level caches may include a first level instruction cache (L1I cache) and a first level data cache (L1D cache) for caching instructions and data, respectively. The electronic device (such as a computer) comprising the processor also comprises a memory, and the processor core realizes instruction transmission and data reading through a data caching mechanism of the multi-level cache and the memory.
Each level of cache described above may be configured in one of a variety of architectures, such as fully associative (Fully Associative), set associative (Set Associative), or directly indexed (Directly Indexed). The replacement policy employed when filling cache entries with recently accessed data may include Least Recently Used (LRU), Least Frequently Used (LFU), and so on, as embodiments of the present disclosure are not limited in this respect.
For example, a translation look-aside buffer (Translation Lookaside Buffer, TLB) is separately provided for each processor core, and may include a translation look-aside buffer for instructions (ITLB) and a translation look-aside buffer for data (DTLB). For example, both the ITLB and the DTLB are provided within the processor core. Address translation is a very time-consuming process; for multi-level page tables, multiple memory accesses are typically required to obtain the corresponding physical address. Taking the 4-level page table shown in fig. 1 as an example, the memory needs to be accessed 4 times to obtain the corresponding physical address. Therefore, to save address translation time and improve computer system performance, a TLB (e.g., comprising the ITLB and DTLB) may be provided in the processor core to store a portion of the previously used first-level Page Table Entries (PTEs). When address translation is needed, the virtual page number of the virtual address to be translated is first used to look up the required first-level page table entry in the TLB. If it hits, the required first-level page table entry is obtained immediately and is combined with the page offset of the virtual address to produce the corresponding physical address. If, on the other hand, the lookup misses in the TLB, a page table walk (e.g., a multi-level page table walk) as described above is performed using the virtual page number of the virtual address to be translated to obtain the required first-level page table entry, which is then combined with the page offset to produce the corresponding physical address.
Similar to CPU cache architectures, the TLB may also adopt various architectures, such as fully associative (Fully Associative), set associative (Set Associative), or directly indexed (Directly Indexed). The TLB may also have a multi-level structure, with the lowest-level TLB being the smallest and fastest; when the lowest-level TLB misses, the next-level TLB is searched. The replacement policy employed when filling TLB entries may also follow the replacement policies used in caches, such as Least Recently Used (LRU) or Least Frequently Used (LFU). FIG. 3 shows a schematic diagram of one example of the contents of a TLB entry.
As shown in FIG. 3, each entry in the TLB (also referred to as a TLB entry or TLB data item) corresponds, for example, to a previously used first-level Page Table Entry (PTE) and includes a valid bit, virtual address bits, physical address bits, and attribute bits. When the valid bit holds a valid value, the data item is a valid entry; when it holds an invalid value, the data item is an invalid entry. For example, in some examples, the valid bit is a single bit that is valid when its value is 1 and invalid when its value is 0. The virtual address bits store the virtual address (virtual page number) of the memory page or a hash of that address, the physical address bits store the physical address (physical page number) of the memory page, and the attribute bits store the attributes and state of the memory page. The information in the physical address bits and attribute bits of a TLB entry comes from the corresponding PTE.
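As an illustration of the TLB entry of FIG. 3 and the hit/miss flow described above, the following C sketch shows a fully associative lookup; the entry count, field widths, and names are assumptions made for illustration only.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define TLB_ENTRIES 64            /* assumed size of a fully associative TLB */

    struct tlb_entry {
        bool     valid;               /* valid bit                                 */
        uint64_t vpn;                 /* virtual address bits: virtual page number */
        uint64_t ppn;                 /* physical address bits: physical page number */
        uint32_t attr;                /* attribute bits: page attributes and state */
    };

    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Returns true on a hit and writes the physical page number to *ppn;
       on a miss the caller falls back to a page table walk. */
    bool tlb_lookup(uint64_t vpn, uint64_t *ppn)
    {
        for (size_t i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                *ppn = tlb[i].ppn;    /* hit: PTE information available at once */
                return true;
            }
        }
        return false;                 /* miss: perform the multi-level walk */
    }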
Although TLBs can reduce the latency of many address translations, accessing page tables for address translation upon a TLB lookup miss cannot be avoided during execution of a software program. To reduce the time required for translation operations, a hardware page table walker (Page Table Walker, PTW) is typically provided separately for each processor core to perform the page table walk process. By using a hardware page table walker, the multiple levels of page tables can be traversed to obtain the final physical address of the memory page. For example, the page table walker may be located within the processor core or outside the processor core.
The L1I cache and the L1D cache are accessed using physical addresses (physically indexed, physically tagged), and the second-level cache, the last-level cache, and the memory are likewise accessed using physical addresses. Therefore, address translation through the ITLB or DTLB is required before data can be accessed. During address translation, when a page table walk is required, the hardware page table walker uses the addresses of the page tables of each level in memory to read the data of each level of page table and performs address calculations on the data that is read. The process of reading page-table-related data is essentially the same as the process of reading ordinary data; that is, a page table entry read request is essentially the same as a data read request, and a read request from the hardware page table walker may, at the longest, pass through the first-level cache, the second-level cache, and the last-level cache before reaching memory. If the data requested by the hardware page table walker is present in some level of cache, that cache returns the data and the request is not passed further down the hierarchy; conversely, when the data requested by the hardware page table walker is retrieved from memory, the data is temporarily retained (i.e., cached) in at least one level of cache.
The inventors of the present disclosure noted that the processor shown in fig. 2 has the following technical bottlenecks:
First, the data prefetcher is one of the main techniques for reducing the data access latency of a processor core and can prefetch according to different data access patterns. However, a prefetcher trained on accesses to any cache level other than the first-level cache can only use physical addresses and cannot issue prefetch requests that cross page boundaries, which greatly limits prefetcher performance;
Second, to further improve the performance of systems-on-chip that include processors, current systems-on-chip increasingly adopt heterogeneous architectures, typically consisting of multiple processor cores plus other accelerators. However, the accelerators cannot use virtual addresses, which makes the programming mode of programs using heterogeneous architectures more complex;
Third, for architectures that place the page table walker in the processor core, the page table walker accesses memory a large number of times; even for architectures that place the page table walker next to the LLC, the number of memory accesses may still be high, which still limits performance.
At least one embodiment of the present disclosure provides a processor, a data processing method for a processor, and an electronic device. The processor can improve the performance of the data prefetcher and enable accelerators to use virtual addresses, which greatly simplifies the programming mode of programs using heterogeneous architectures. In addition, for architectures in which the page table walker is placed beside the LLC, the processor provided in embodiments of the present disclosure can further reduce the number of times the page table walker accesses memory, thereby improving the performance of the near-memory page table walker.
Embodiments of the present disclosure will be described in detail below with reference to the attached drawings, but the present disclosure is not limited to these specific embodiments.
At least one embodiment of the present disclosure provides a processor, including a first level cache space, a first translation look-aside buffer, and at least one preset cache space, where the first level cache space and the at least one preset cache space are sequentially communicatively connected to form a communication link, the at least one preset cache space includes a target preset cache space, the first translation look-aside buffer and the target preset cache space are disposed at a same path level, the first translation look-aside buffer is communicatively connected to the target preset cache space, and the first translation look-aside buffer is configured to cache a first level page table entry.
In the above-described embodiments of the present disclosure, the preset cache space is a cache space different from the first-level cache space (L1 Cache), and may be, for example, a second-level cache space (L2 Cache) or a later-level cache space, such as a last-level cache (LLC).
Fig. 4 is a schematic architecture diagram of a processor according to at least one embodiment of the present disclosure. As shown in fig. 4, in some embodiments of the present disclosure, the processor includes at least one processor core, and a first level cache space, a first translation look-aside buffer, a second translation look-aside buffer, and at least one preset cache space for the processor core.
For example, the processor has 4 processor cores (CPU cores). The processor also has multiple levels of cache, such as a first-level cache, a second-level cache, and a last-level cache; in this example, the second-level cache and the last-level cache are the preset cache spaces.
In some examples, the first translation look-aside buffer and the second translation look-aside buffer may be regarded as examples of the translation look-aside buffer (TLB) described above, in the sense that they serve the similar function of storing PTEs and operate on similar principles. However, the hardware structure, placement, and so on of the first translation look-aside buffer may differ from those of the TLB described above, as may those of the second translation look-aside buffer, and embodiments of the present disclosure are not limited in this regard.
In contrast to the scenario illustrated in fig. 2, in the embodiment of the present disclosure the first translation look-aside buffer is a newly added translation look-aside buffer (TLB) that is disposed at the same path level as a predetermined level of cache space of the processor other than the first-level cache space. Here, the second translation look-aside buffer is located within the processor core; likewise, the first-level cache space is located within the processor core, the second translation look-aside buffer is disposed at the same path level as the first-level cache space, and the first-level cache space is communicatively coupled to the second translation look-aside buffer. It should be noted that the first translation look-aside buffer may have the same size, number of levels, architecture, and replacement policy as the second translation look-aside buffer, or may differ in any of these respects.
For example, the first-level cache space is an L1 cache and is disposed inside the processor core. For example, the first-level cache space is disposed at the same path level as the processor core, the first-level cache space is communicatively coupled to the processor core, and the processor core may directly obtain data or instructions from the first-level cache space. Here, "disposed at the same path level" means that the physical locations in the chip are adjacent or close, so that data interaction and transfer can be performed directly. Therefore, the first-level cache space being disposed at the same path level as the processor core may mean that the first-level cache space is disposed inside the processor core, at a relatively short distance from it, so that the processor core can directly exchange and transfer data with the first-level cache space. "Communicatively coupled" means that data/instructions may be transferred directly. For example, the second translation look-aside buffer is disposed at the same path level as the first-level cache space and is communicatively coupled to it; this may mean that the second translation look-aside buffer is disposed beside and close to the first-level cache space, so that the two can directly exchange and transfer data. As another example, the second translation look-aside buffer may be disposed within the processor core; logically, it is then disposed at the same path level as the processor core and is communicatively coupled to the processor core.
As shown in fig. 4, in some examples, the first level cache space includes an L1I cache for storing instructions and an L1D cache for storing data. Of course, embodiments of the present disclosure are not limited thereto, and in other examples, instead of distinguishing between an L1I cache and an L1D cache, only one L1 cache may be provided for storing both data and instructions.
For example, in some examples, the at least one preset cache space includes a second-level cache space through an Nth-level cache space, N being an integer greater than 2. The Nth-level cache space is closest to the memory and furthest from the processor core. For example, in the example shown in fig. 4, the at least one preset cache space may include the second-level cache space (L2 cache) and the last-level cache space (LLC), that is, N=3 in this case. Of course, embodiments of the present disclosure are not limited thereto; N may be any integer greater than 2, such as 4, 5, or 6, in which case the processor correspondingly has a 4-level, 5-level, or 6-level cache architecture. For example, in other examples, the at least one preset cache space includes a single cache space, i.e., only the second-level cache space, in which case the processor has a 2-level cache architecture. It should be noted that in the processor provided by embodiments of the present disclosure, any level of cache other than the first-level cache space may serve as an example of the preset cache space of the present disclosure.
For example, the first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link, so that data can be obtained level by level. For example, when a processor core needs to obtain data, it may first query the first-level cache space; if there is no hit, it proceeds to query the second-level cache space, and if there is still no hit, it queries the last-level cache space. If the last-level cache space also misses, the data is fetched from memory.
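The level-by-level query can be sketched in C as follows; cache_lookup and mem_read are hypothetical helpers standing in for each cache level's lookup logic and are not part of this disclosure.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LEVELS 3                  /* L1, L2, LLC in the example of fig. 4 */

    extern bool     cache_lookup(int level, uint64_t paddr, uint64_t *data);
    extern uint64_t mem_read(uint64_t paddr);

    uint64_t load(uint64_t paddr)
    {
        uint64_t data;
        for (int level = 1; level <= NUM_LEVELS; level++)
            if (cache_lookup(level, paddr, &data))
                return data;              /* hit: the request goes no further down */
        return mem_read(paddr);           /* every level missed: fetch from memory */
    }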
For example, in the description of the present disclosure, the target preset cache space included in the at least one preset cache space serves as the object of description and may be any one of the plurality of preset cache spaces. For example, any one of the second-level cache space through the Nth-level cache space may be set as the target preset cache space.
For example, a first translation look-aside buffer is disposed at the same path level as the target preset cache space, the first translation look-aside buffer is communicatively coupled to the target preset cache space, and the first translation look-aside buffer is configured to cache the first level page table entry. For example, the first translation look-aside buffer is disposed outside of the processor core.
For example, in the example of fig. 4, the second-level cache space is set as the target preset cache space, the first translation look-aside buffer is disposed at the same path level as the second-level cache space, and the first translation look-aside buffer is communicatively connected to the second-level cache space. Here, the first translation look-aside buffer being disposed at the same path level as the second-level cache space means that the first translation look-aside buffer is disposed beside and close to the second-level cache space, so that the second-level cache space can directly exchange and transfer data with the first translation look-aside buffer. For example, the second-level cache space is a private or shared cache space for the processor cores and serves as the target preset cache space. That is, in some processor architectures the second-level cache space is provided separately for each processor core and is private, while in other processor architectures the second-level cache space is shared by multiple processor cores. Whether the second-level cache space is private or shared, it may serve as the target preset cache space.
For example, in some embodiments of the present disclosure, the second-level cache space is a private cache space for the processor cores: there are a plurality of second-level cache spaces and a plurality of processor cores, with the second-level cache spaces corresponding one-to-one to the processor cores. The plurality of second-level cache spaces serve as target preset cache spaces and are provided with a plurality of first translation look-aside buffers; that is, the second-level cache spaces correspond one-to-one to the first translation look-aside buffers, and each second-level cache space corresponds to one first translation look-aside buffer.
Fig. 5 is a schematic architecture diagram of another processor provided in at least one embodiment of the present disclosure.
The architecture of the processor in fig. 5 is substantially the same as that in fig. 4, with, for example, 4 processor cores. The processor also has multiple levels of cache, such as a first-level cache, a second-level cache, and a last-level cache; in this example, the second-level cache and the last-level cache are the preset cache spaces. The difference is that, in the example of fig. 5, the last-level cache space is set as the target preset cache space, the first translation look-aside buffer is disposed at the same path level as the last-level cache space, and the first translation look-aside buffer is communicatively connected to the last-level cache space. Here, the first translation look-aside buffer being disposed at the same path level as the last-level cache space means that the first translation look-aside buffer is disposed beside and close to the last-level cache space, so that the last-level cache space can directly exchange and transfer data with the first translation look-aside buffer.
For example, in some embodiments, the nth level cache space is a shared type cache space for processor cores, i.e., multiple processor cores share the same nth level cache space, the nth level cache space being a target preset cache space, an example of which is the case shown in fig. 5.
It should be noted that, although fig. 4 illustrates that the second level cache space is the target preset cache space and the first translation look-aside buffer is disposed beside the second level cache space, fig. 5 illustrates that the last level cache space is the target preset cache space and the first translation look-aside buffer is disposed beside the last level cache space, this does not constitute a limitation on the embodiments of the present disclosure. In some examples, when the processor includes more levels of cache, any level of cache space other than the first level of cache space may be targeted to a preset cache space, thereby adjusting the setting position of the first translation look-aside buffer accordingly. It is noted that the first translation look-aside buffer is not disposed within the processor core or beside the first level cache space.
In the embodiment of the disclosure, by disposing the first translation look-aside buffer at the same path level as the target preset cache space (for example, beside a cache space other than the first-level cache space, i.e., outside the processor core), the barrier that prevents the data prefetcher from using virtual addresses and from issuing prefetch requests across pages can be overcome, improving the performance of the data prefetcher. Moreover, because the first translation look-aside buffer is provided, virtual addresses can be used by any component disposed outside the processor core, which greatly improves convenience. For example, accelerators placed beside any level of cache space other than the first-level cache space may also use virtual addresses, which greatly simplifies the programming mode of heterogeneous architectures. In addition, for architectures in which the page table walker is located outside the processor core (for example, beside the LLC), the processor provided by embodiments of the present disclosure can further reduce the number of times the page table walker accesses memory, thereby improving the performance of the page table walker.
For example, the first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link. For example, when the processor core needs to obtain data, it may first query the first-level cache space; if there is no hit, the data read request is sent level by level to the lower-level caches, and finally to memory, until the data to be read is obtained.
For example, the first translation look-aside buffer stores at least some of the page table entry data from the first-level page table through the Mth-level page table, M being an integer greater than 1. That is, the first translation look-aside buffer may store any recently used page table entry data.
For example, in some embodiments of the present disclosure, the processor core is configured to generate a first-level page table entry read request in response to the first-level page table entry data required for address translation being absent from the first translation look-aside buffer or the second translation look-aside buffer.
For example, when the processor core needs to perform address translation, it first queries the first translation look-aside buffer or the second translation look-aside buffer for the required PTE. If the PTE is not there, a page table walk is required, so the page table walker generates a first-level page table entry read request, which is sent in turn to each level of cache, and finally to memory, for data reading; the first-level page table entry read request is thus passed downward level by level through the multi-level cache structure. When the target preset cache space that is the object of description receives the first-level page table entry read request, the static random access memory in the target preset cache space (described in detail later) is queried for the required PTE. If it is present, the queried PTE is returned; if not, the page table walker continues the query in the cache space below the target preset cache space, or in memory, to attempt to obtain the PTE.
For example, the first-level page table entry read request includes the virtual address of the corresponding memory page, and in some examples the first-level page table entry read request further includes a translation bit. The virtual address of the corresponding memory page is the virtual page number within the virtual address to be translated.
For example, the memory page size is X = 2^Y bytes (e.g., 4096 = 2^12), and memory pages are always aligned to X bytes, so the low Y bits of both the physical address and the virtual address of a memory page are always 0. The virtual address of a memory page therefore need not store the low Y bits of the address, which saves hardware resources.
Many high-performance CPUs support virtual machine mode, i.e., one physical machine (standalone machine or server) runs multiple virtual machines, each with a corresponding operating system running in it. In the example where the first-level page table entry read request also includes a translation bit, one or more virtual machines run in the computer system and use virtual addresses. When translating a virtual address in a virtual machine's system, the address first needs to be translated into a guest physical address, which in the computer system is actually still a virtual address rather than a real physical address, and is then translated into a system physical address, which is the real physical address. That is, in virtual machine mode, each address translation goes through a guest virtual address to guest physical address translation and a guest physical address to system physical address translation. Under a 4-level page table architecture, the translation of one guest virtual address to a system physical address may require up to 24 memory accesses. In virtual machine mode, to minimize the number of address translations, a translation bit is added to distinguish a system-physical-address PTE from a guest-physical-address PTE. A translation bit with a valid value (e.g., "1") indicates that a guest physical address (which may also be referred to as an intermediate physical address (IPA, Intermediate Physical Address)) to system physical address translation is provided, and a translation bit with an invalid value (e.g., "0") indicates that a guest virtual address to system physical address translation is provided, or that the current system is not in virtual machine mode.
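The figure of up to 24 memory accesses can be derived as follows, assuming 4-level guest page tables and 4-level nested (guest physical to system physical) page tables; the symbols n_g and n_n are introduced here only for illustration.

    \[
        n_g (n_n + 1) + n_n = 4 \times (4 + 1) + 4 = 24
    \]

Here n_g = 4 is the number of guest page table levels: each guest table pointer is a guest physical address that requires an n_n-step nested walk before the guest entry itself can be read, and the final n_n accesses translate the resulting guest physical address of the data page into a system physical address.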
For example, the first-level page table entry read request also includes the physical address of the corresponding first-level page table entry, which is itself a physical address in memory. The first-level page table entry has a size of N = 2^M bytes, and the low M bits of its physical address are always 0, so the physical address of the first-level page table entry need not store the low M bits, which saves hardware resources.
Typically, a data read-write request is one of a data read request (the data will not be rewritten), a data write request (the data will be rewritten), and an instruction read request (the data is an executable instruction). The first-level page table entry read request contains the information described above and is of a different type from a normal data read request. For example, when a cache lookup misses, first-level page table entry read requests and normal data read requests are stored in different Miss Status Handling Registers (MSHRs). In embodiments of the present disclosure, when the second-level through Nth-level cache spaces receive a data read request, it is necessary to determine whether the request is a first-level page table entry read request.
A typical cache space, such as a second-level cache, includes control logic, a memory (e.g., a Static Random Access Memory (SRAM)), and a fill buffer. The control logic is the control module of the cache space and controls its operation; the memory stores the cached data (including operation data or instruction data); and the fill buffer temporarily stores a data read request when that request misses in the cache space, so that the execution state of the data read request can be tracked. When the target data for the data read request has been read from a lower-level cache or from memory into the cache space, the data read request is deleted from the fill buffer; the target data that was read is, for example, also written into the memory of the cache space for subsequent use.
In the embodiment of the disclosure, a first-level page table entry buffer is additionally provided in the preset cache space. The first-level page table entry buffer is configured to store the information carried by first-level page table entry read requests. For example, the preset cache space includes control logic, a memory (e.g., a static random access memory), a first-level page table entry buffer, and a fill buffer. The first-level page table entry buffer caches first-level page table entry read requests that miss, while the fill buffer, as described above, caches normal data read requests that miss.
FIG. 6A shows a schematic diagram of the contents of one data item in a first stage page table entry buffer.
As shown in FIG. 6A, each data item in the first-level page table entry buffer stores the cached information for a requested first-level page table entry; its contents include a valid bit, a translation bit, and the virtual address and physical address of the memory page corresponding to the requested first-level page table entry. When the valid bit holds a valid value (e.g., "1"), the data item is a valid entry; when it holds an invalid value (e.g., "0"), the data item is an invalid entry. A translation bit with a valid value (e.g., "1") indicates that the data item is used to provide a guest physical address to system physical address translation, and a translation bit with an invalid value (e.g., "0") indicates that the data item is used to provide a guest virtual address to system physical address translation, or that the system is not in virtual machine mode.
The size of the data held in a cache and transferred between caches is fixed, e.g., 64 bytes; a piece of such data is referred to as a cache line (cacheline). A first-level page table entry, by contrast, is much smaller, e.g., 8 bytes, occupying only a portion of a cache line. Thus, the physical address of the cached first-level page table entry needs enough bits to obtain the correct first-level page table entry data from the read cache line. For example, in one example, as shown in FIG. 6B, the physical address of the corresponding first-level page table entry includes a cache line address, which is compared with the address of the returned data corresponding to the first-level page table entry read request to determine whether that data includes the first-level page table entry data, and a cache line offset value, which represents the offset of the corresponding first-level page table entry data within the cache line.
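The following C sketch shows one data item of the first-level page table entry buffer (fig. 6A) and how the 8-byte entry is picked out of a returned 64-byte cache line using the cache line offset (fig. 6B); the field widths and names are illustrative assumptions.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define CACHELINE_BYTES 64
    #define PTE_BYTES        8

    struct pte_buf_item {
        bool     valid;       /* valid bit                                       */
        bool     translation; /* translation bit: guest PA to system PA if set   */
        uint64_t vpn;         /* virtual address (virtual page number)           */
        uint64_t line_addr;   /* cache line address of the requested PTE         */
        uint8_t  line_off;    /* index of the 8-byte PTE slot in the line (0..7) */
    };

    /* Given a returned cache line whose address matched item->line_addr,
       extract the requested first-level page table entry. */
    uint64_t extract_pte(const struct pte_buf_item *item,
                         const uint8_t line[CACHELINE_BYTES])
    {
        uint64_t pte;
        memcpy(&pte, line + (size_t)item->line_off * PTE_BYTES, PTE_BYTES);
        return pte;
    }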
The size of the first-level page table entry buffer may be determined according to actual requirements. If the first-level page table entry buffer is not large, a new first-level page table entry read request may arrive while the buffer is already full. To avoid this, the cache needs to inform the processor core or the upper-level cache, via a token (Token) or similar mechanism, to suspend sending new first-level page table entry read requests. Token techniques are well known and a general design may be referred to, so they are not described in detail here.
Fig. 7 illustrates an exemplary schematic diagram of the architecture of a cache in a processor provided by at least one embodiment of the present disclosure, such as the processor shown in fig. 4 or fig. 5.
As shown in fig. 7, the processor includes an upper-level cache space 701, a target preset cache space 702, and a lower-level cache space/memory 703, which are communicatively connected in sequence to form a communication link; here, the target preset cache space 702 is the preset cache space currently being described. The upper-level cache space 701 is the cache space one level above the target preset cache space 702: for example, when the target preset cache space is the second-level cache space, the upper-level cache space is the first-level cache space, and when the target preset cache space is the third-level cache space (or LLC), the upper-level cache space is the second-level cache space. The lower-level cache space 703 is the cache space one level below the target preset cache space 702; when the target preset cache space is the second-level cache space, the lower-level cache space is the third-level cache space (or the LLC described above). The first translation look-aside buffer 704 is disposed at the same path level as the target preset cache space 702.
The target preset cache space 702 includes control logic 705, a static random access memory 706, a first-level page table entry buffer 707, and a fill buffer 708. The static random access memory 706 is an example of a storage medium for holding cached data and tags. The control logic 705 controls the operation of the target preset cache space 702; when a data read request is received, this includes comparing the request with the tags of the currently cached cache lines to determine whether the requested data is in the cache. If the requested data is in the cache, the corresponding data is returned; if the lookup misses, the data read request is filled into the fill buffer 708, and the fill buffer 708 passes the data read request to the lower-level cache space/memory 703 and waits for the requested data to return. Likewise, upon receiving a first-level page table entry read request, the control logic 705 returns the corresponding data if the requested first-level page table entry data is in the cache; if it misses, the control logic fills the first-level page table entry buffer 707 with the first-level page table entry read request, saving the information carried by the request, and the first-level page table entry buffer 707 passes the request to the lower-level cache space/memory 703 and waits for the requested first-level page table entry data to return.
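The dispatch performed by control logic 705 can be summarized by the following C-style sketch; the request structure and the helper functions are illustrative assumptions standing in for SRAM 706, first-level page table entry buffer 707, and fill buffer 708, not actual hardware interfaces.

    #include <stdbool.h>
    #include <stdint.h>

    struct request {
        uint64_t paddr;        /* physical address of the requested cache line */
        bool     is_pte_read;  /* distinguishes PTE reads from normal reads    */
    };

    extern bool sram_lookup(uint64_t paddr, uint64_t *data);   /* SRAM 706     */
    extern void reply(struct request *req, uint64_t data);
    extern void pte_buffer_insert(struct request *req);        /* buffer 707   */
    extern void fill_buffer_insert(struct request *req);       /* buffer 708   */
    extern void forward_to_lower_level(struct request *req);   /* toward 703   */

    void handle_request(struct request *req)
    {
        uint64_t data;
        if (sram_lookup(req->paddr, &data)) {   /* tag compare: hit in cache?   */
            reply(req, data);                   /* hit: return the data         */
        } else if (req->is_pte_read) {
            pte_buffer_insert(req);             /* miss: record the PTE request */
            forward_to_lower_level(req);
        } else {
            fill_buffer_insert(req);            /* miss: record the normal read */
            forward_to_lower_level(req);
        }
    }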
It should be noted that, in embodiments of the present disclosure, the processor may have a single-core or multi-core architecture, which is not limited by the embodiments of the present disclosure. The number of caches, the number of cache levels, and their arrangement are likewise not limited and may be determined according to actual requirements. The processor is not limited to the structures shown in figs. 4 and 5 and may include more or fewer components, and the manner of connection between components is not limited.
At least one embodiment of the present disclosure also provides a data processing method for a processor. The data processing method includes caching a first-level page table entry for address translation in a first translation look-aside buffer. In at least one embodiment, the method can improve the performance of the data prefetcher and enable accelerators to use virtual addresses, thereby greatly simplifying the programming mode of heterogeneous architectures, and can improve the performance of the near-memory page table walker.
As described above, in at least one embodiment of the present disclosure, the processor includes a first level cache space, a first translation look-aside buffer, and at least one preset cache space, where the first level cache space and the at least one preset cache space are sequentially communicatively connected to form a communication link, the at least one preset cache space includes a target preset cache space, the first translation look-aside buffer and the target preset cache space are disposed at a same path level, and the first translation look-aside buffer is communicatively connected to the target preset cache space. For example, the processor may be the processor shown in fig. 4 or fig. 5. The relevant description of the processor may refer to the above, and will not be repeated here.
As shown in FIG. 8, in some embodiments, the method includes steps S11-S13. The method shown in fig. 8 is performed, for example, on the target preset cache space side.
Step S11, in response to receiving a read request generated by the processor core, determining whether the read request is a first-level page table entry read request.
As can be seen from the above, when the target preset cache space receives a data read request, it needs to determine whether the request is a first-level page table entry read request or a data read request concerning normal data. Thus, to support the first translation look-aside buffer, the target preset cache space must be able to identify a PTE read operation. As described above, the target preset cache space includes a first-level page table entry buffer for temporarily storing first-level page table entry read requests that miss during a cache lookup; when the requested PTE is obtained from this level, a lower-level cache space, or memory, the corresponding PTE is identified through the temporarily stored first-level page table entry read request.
Step S12, in response to the read request being a first-level page table entry read request, returning the obtained first-level page table entry corresponding to the first-level page table entry read request, and writing the corresponding first-level page table entry, together with the virtual address and the translation bit of the memory page in the first-level page table entry read request, into the first translation look-aside buffer.
As described above, when the requested PTE is obtained from a lower-level cache space or memory and identified, the PTE may be saved in the first translation look-aside buffer.
As shown in fig. 9, in some embodiments, step S12 may include steps S210-S220.
Step S210, in response to the first-level page table entry read request hitting in the target preset cache space, returning the corresponding first-level page table entry obtained from the target preset cache space, and writing the corresponding first-level page table entry, together with the virtual address and the translation bit of the memory page in the first-level page table entry read request, into the first translation look-aside buffer.
For example, in the case that the first-level page table entry read request hits in the target preset cache space, the target preset cache space already caches the required first-level page table entry, so the corresponding first-level page table entry can be obtained immediately.
For example, when the first-level page table entry corresponding to a first-level page table entry read request hits in the target preset cache space, that first-level page table entry is a used first-level page table entry, and the first translation look-aside buffer can store such used first-level page table entries for devices such as a data prefetcher.
Step S220, in response to the first-level page table entry read request missing in the target preset cache space, returning the corresponding first-level page table entry obtained from a lower-level cache of the target preset cache space or from memory, and writing the corresponding first-level page table entry, together with the virtual address and the translation bit of the memory page in the first-level page table entry read request, into the first translation look-aside buffer.
For example, in the case where the first-level page table entry read request misses in the target preset cache space, the query continues in a lower-level cache or the memory of the target preset cache space, for example level by level, until the required first-level page table entry is obtained. To keep the contents of the first translation look-aside buffer consistent, the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request are then written into the first translation look-aside buffer.
Returning to fig. 8, in some examples, the method may further include step S13.
Step S13, in response to the read request not being a first-level page table entry read request, returning the data corresponding to the read request obtained from the target preset cache space, a lower-level cache of the target preset cache space, or the memory.
For example, in the case where the read request is an ordinary data read request, the corresponding data is obtained from the target preset cache space, a lower-level cache of the target preset cache space, or the memory.
As shown in FIG. 10, in other embodiments, step S220 shown in fig. 9 may include steps S110-S120. The method shown in fig. 10 is likewise performed, for example, on the target preset cache space side.
Step S110, in response to the first-level page table entry corresponding to the first-level page table entry read request not being in the target preset cache space, inserting the first-level page table entry read request into the first-level page table entry buffer, returning the first-level page table entry corresponding to the first-level page table entry read request from a lower-level cache or the memory of the target preset cache space, and comparing the physical address of the returned corresponding first-level page table entry with the physical addresses in the valid entries of the first-level page table entry buffer.
For example, the static random access memory 706 shown in fig. 7 is configured to store cached data and tags, and the control logic 705 is configured to compare the tag information included in a request with the tag information of the cache lines in the cache to determine whether the requested data is in the cache. When the tag information included in the first-level page table entry read request does not match the tag information of any cache line of the target preset cache space, the first-level page table entry data corresponding to the first-level page table entry read request is not in the target preset cache space, and the first-level page table entry read request is inserted into the first-level page table entry buffer.
For example, if the first-level page table entry data corresponding to the first-level page table entry read request misses in the target preset cache space, the requested first-level page table entry data is returned from a lower-level cache or the memory, and the physical address of the returned first-level page table entry data is compared with the physical addresses of the valid entries in the first-level page table entry buffer to determine whether the returned data corresponds to a previously missed first-level page table entry read request.
Step S120, in response to the physical address of the corresponding first-level page table entry being the same as the physical address in at least one valid entry of the first-level page table entry buffer, writing the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer, and deleting the valid entry corresponding to the first-level page table entry read request from the first-level page table entry buffer.
For example, if the physical address of the returned data is the same as the physical address of a valid entry of the first-level page table entry buffer, the returned first-level page table entry data corresponds to a previously missed first-level page table entry read request. Therefore, the requested first-level page table entry and the virtual address and translation bit of the memory page are extracted from the returned data and filled into the first translation look-aside buffer for subsequent address translation. In addition, translation look-aside buffers other than the first translation look-aside buffer (e.g., the second translation look-aside buffer) may also be updated synchronously in order to maintain consistency of the TLB data items across the respective translation look-aside buffers.
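The address comparison, PTE extraction, and entry deletion described above can be pictured with the following hedged C++ sketch; the 64-byte cache line, the 8-byte PTE aligned within the line, and all names are assumptions of this illustration:

#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint64_t CACHE_LINE_BYTES = 64;  // assumed line size

struct PteBufferEntry {
    bool     valid;
    bool     translation;
    uint64_t vpn;        // virtual page number carried by the parked request
    uint64_t pte_paddr;  // physical address of the requested PTE
};

struct TlbFill {  // payload written into the first translation look-aside buffer
    uint64_t vpn;
    uint64_t pte;
    bool     translation;
};

// Compare the physical address of a returned line with every valid entry;
// on a match, extract the PTE at the recorded offset, emit a fill for the
// first translation look-aside buffer, and delete the matched valid entry.
bool match_returned_line(std::vector<PteBufferEntry>& buf,
                         uint64_t line_paddr, const uint8_t* line_data,
                         TlbFill& fill) {
    for (auto& e : buf) {
        if (e.valid && (e.pte_paddr & ~(CACHE_LINE_BYTES - 1)) == line_paddr) {
            uint64_t offset = e.pte_paddr & (CACHE_LINE_BYTES - 1);
            uint64_t pte = 0;
            std::memcpy(&pte, line_data + offset, sizeof(pte));
            fill = {e.vpn, pte, e.translation};
            e.valid = false;   // the request is no longer outstanding
            return true;       // the returned line is PTE data
        }
    }
    return false;              // ordinary data: handled normally
}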
For another example, in some examples, the method may further include: in response to one or more items of content in the second translation look-aside buffer being cleared, clearing the corresponding one or more items of content in the first translation look-aside buffer to maintain consistency between the first translation look-aside buffer and the second translation look-aside buffer.
For example, in some cases (e.g., a process switch, or a translation look-aside buffer entry being altered by the operating system), one or more entries in the second translation look-aside buffer need to be cleared. At this time, to preserve cache consistency between the translation look-aside buffers, the corresponding one or more entries in the first translation look-aside buffer also need to be cleared.
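As a minimal sketch of this consistency maintenance (the map-based Tlb type and the function name are assumptions of this illustration):

#include <cstdint>
#include <unordered_map>

// Illustrative TLB keyed by virtual page number.
struct Tlb {
    std::unordered_map<uint64_t, uint64_t> entries;  // vpn -> page frame number
};

// When an entry is cleared from the in-core second translation look-aside
// buffer (for example, on a process switch or when the operating system
// alters a page table entry), the same virtual page is also cleared from
// the off-core first translation look-aside buffer.
void invalidate_both(Tlb& second_tlb, Tlb& first_tlb, uint64_t vpn) {
    second_tlb.entries.erase(vpn);
    first_tlb.entries.erase(vpn);
}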
For example, the methods shown in figs. 8, 9, and 10 cooperate with each other, and the execution order of the steps may be adjusted; embodiments of the present disclosure are not limited in this respect.
It should be noted that, in the embodiment of the present disclosure, the data processing method is not limited to the steps described above, and may include more or fewer steps, and the execution order of the steps is not limited, which may be determined according to actual requirements.
FIG. 11 illustrates a schematic flow chart of another data processing method for a processor provided according to at least one embodiment of the present disclosure. Similarly, the processor includes a first-level cache space, a first translation look-aside buffer, and at least one preset cache space, where the first-level cache space and the at least one preset cache space are sequentially communicatively connected to form a communication link, the at least one preset cache space includes a target preset cache space, the first translation look-aside buffer and the target preset cache space are disposed at the same path level, and the first translation look-aside buffer is communicatively connected to the target preset cache space. The data processing method of this embodiment involves a process of address translation using the first translation look-aside buffer.
As shown in FIG. 11, the data processing method includes the following steps S111-S112.
Step S111, according to an address translation request, querying the first translation look-aside buffer as to whether a first-level page table entry corresponding to the address translation request is cached.
Step S112, in response to the query hitting, in the first translation look-aside buffer, a first-level page table entry corresponding to the address translation request, performing address translation using the corresponding first-level page table entry.
For example, to save address translation time and improve computer system performance, a previously used first-level Page Table Entry (PTE) may be stored in the first translation look-aside buffer, which is not at the same path level as the first-level cache. When address translation is needed, the first translation look-aside buffer is queried according to the address translation request for the needed PTE; if the PTE is present, it is obtained immediately and used to compute the physical address. Unlike the second translation look-aside buffer, which is at the same path level as the first-level cache, the first translation look-aside buffer can also be used for off-core prefetching. For example, when the first translation look-aside buffer is located outside the processor core, a data prefetcher may generate an address translation request, query the first translation look-aside buffer outside the processor core for the needed first-level page table entry, and prefetch data according to the query result; the data prefetching operation thus does not need to be performed inside the processor core, improving the performance of the processor.
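The translation path through the first translation look-aside buffer can be pictured as follows, a hedged C++ sketch assuming 4 KiB pages and a simple map from virtual page number to page frame number (FirstTlb, translate, and PAGE_SHIFT are illustrative names):

#include <cstdint>
#include <optional>
#include <unordered_map>

constexpr uint64_t PAGE_SHIFT = 12;  // assumed 4 KiB memory pages

// Illustrative first translation look-aside buffer.
struct FirstTlb {
    std::unordered_map<uint64_t, uint64_t> vpn_to_pfn;
};

// On a hit, the physical address is formed from the cached page frame
// number and the page offset; on a miss, the caller (e.g., a data
// prefetcher outside the processor core) falls back to issuing a
// first-level page table entry read request.
std::optional<uint64_t> translate(const FirstTlb& tlb, uint64_t vaddr) {
    auto it = tlb.vpn_to_pfn.find(vaddr >> PAGE_SHIFT);
    if (it == tlb.vpn_to_pfn.end()) return std::nullopt;  // TLB miss
    return (it->second << PAGE_SHIFT) | (vaddr & ((1ull << PAGE_SHIFT) - 1));
}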
For example, in some examples, as shown in fig. 11, the method may further include step S113.
Step S113, based on a target address obtained by performing address translation using the corresponding first-level page table entry, reading the data corresponding to the target address at least into the target preset cache space.
For example, the data corresponding to the target address may be used later; reading it into the target preset cache space in advance reduces the processing time when the data is actually read and avoids a further access to the memory.
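Step S113 can then be sketched as a translate-then-fetch sequence; translate_fn and fetch_fn below are assumed stand-ins for the first translation look-aside buffer lookup and for the fill of the target preset cache space:

#include <cstdint>
#include <functional>
#include <optional>

// Hedged sketch of step S113: translate the address with the first
// translation look-aside buffer and, on a hit, read the corresponding
// data at least into the target preset cache space ahead of use.
void prefetch(uint64_t vaddr,
              const std::function<std::optional<uint64_t>(uint64_t)>& translate_fn,
              const std::function<void(uint64_t)>& fetch_fn) {
    if (auto paddr = translate_fn(vaddr)) {
        fetch_fn(*paddr);  // data lands at least in the target cache space
    }
}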
FIG. 12A is a schematic flow chart of a cached data read.
The cached data read flow illustrated in FIG. 12A corresponds, for example, to the processor shown in fig. 2, which supports no TLB other than the in-core TLB, i.e., it does not support the first translation look-aside buffer provided by embodiments of the present disclosure that is not at the same path level as the first-level cache.
As shown in fig. 12A, first, a current cache space (e.g., the second-level cache or the last-level cache) receives a data read request generated by the processor core. It is then queried whether this cache space holds the requested data. If the requested data hits in the cache space, it is obtained from a storage unit (e.g., a static random access memory) of the cache space. If the requested data misses in the cache space, the read request is written into the fill buffer, and the query and read continue in the next-level cache space/memory. When the requested data is thereby obtained and returned to the current cache space, the data/tags cached in the cache space are updated, and the previously recorded data read request is deleted from the fill buffer (so that it is no longer monitored).
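As a hedged C++ sketch of this baseline flow (CacheLevel, cache_read, and read_from_memory are assumptions of this illustration; real hardware operates on tagged cache lines rather than an address-keyed map):

#include <cstdint>
#include <unordered_map>
#include <unordered_set>

struct CacheLevel {
    std::unordered_map<uint64_t, uint64_t> lines;  // data/tag store (SRAM)
    std::unordered_set<uint64_t> fill_buffer;      // outstanding missed requests
};

// Stand-in for the next-level cache space or the memory (assumption).
uint64_t read_from_memory(uint64_t addr) {
    return addr;  // stub: a real system returns the data at this address
}

// Hit: serve from this level. Miss: park the request in the fill buffer,
// query the next level, update the cached data/tag, then delete the
// request from the fill buffer so it is no longer monitored.
uint64_t cache_read(CacheLevel& c, uint64_t addr) {
    auto it = c.lines.find(addr);
    if (it != c.lines.end()) return it->second;
    c.fill_buffer.insert(addr);
    uint64_t data = read_from_memory(addr);
    c.lines[addr] = data;
    c.fill_buffer.erase(addr);
    return data;
}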
Fig. 12B is a schematic flow chart diagram of data processing using a processor provided in at least one embodiment of the present disclosure.
The data processing method provided by the embodiment of the present disclosure is described in detail below with reference to the processor architecture shown in fig. 7.
As shown in fig. 12B, first, the target preset cache space 702 receives a data read request generated by the processor core. It is then determined whether the data read request hits in the target preset cache space 702.
In response to the data read request hitting in the target preset cache space 702, the corresponding data is obtained from the static random access memory 706 of the target preset cache space. It is then determined whether the data read request is a PTE read request. In response to it being a PTE read request, the data obtained according to the PTE read request (i.e., the PTE) and the virtual address and translation bit of the memory page are written into the first translation look-aside buffer 704 for use in subsequent address translations, and the requested data is returned to the upper-level cache space 701. In response to it not being a PTE read request, the requested data is returned directly to the upper-level cache space 701.
In response to the data read request missing in the target preset cache space 702, the data read request is written into the fill buffer 708 of the target preset cache space, and it is then determined whether the data read request is a PTE read request. In response to the data read request being a PTE read request, the PTE read request is written into the PTE buffer 707, and the requested data is then obtained from the lower-level cache space/memory 703. Alternatively, in response to the data read request not being a PTE read request, the requested data is obtained directly from the lower-level cache space/memory 703. Next, in both cases, for the requested data obtained from the lower-level cache space/memory 703, the data/tags of the static random access memory 706 are updated with the obtained data for subsequent use.
The physical address of the obtained requested data is then compared with the physical addresses of the valid entries in the PTE buffer 707. If they are equal, the requested data is PTE data; the requested PTE and the virtual address and translation bit of the memory page are extracted from the requested data, the matching data entry is deleted from the PTE buffer 707, the extracted PTE and the virtual address and translation bit of the memory page are written into the first translation look-aside buffer 704 for subsequent use, and the requested data (i.e., the PTE) is returned to the upper-level cache space 701. If the address of the requested data is not equal to the physical address of any valid entry in the PTE buffer 707, the requested data is not PTE data and is processed as ordinary data, i.e., returned directly to the upper-level cache space 701.
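Putting the branches of fig. 12B together, the dispatch performed by the target preset cache space 702 can be outlined as follows. The helper functions are declared only, as stand-ins for the components 701-708 described above; all names are assumptions of this illustration:

#include <cstdint>

struct ReadRequest {
    uint64_t addr;         // physical address of the requested data
    bool     is_pte_read;  // true for a first-level page table entry read
};

bool     lookup_sram(uint64_t addr, uint64_t& data);      // SRAM 706 hit check
void     insert_fill_buffer(const ReadRequest& req);      // fill buffer 708
void     insert_pte_buffer(const ReadRequest& req);       // PTE buffer 707
uint64_t fetch_lower_level(uint64_t addr);                // 703; also updates 706
bool     match_pte_buffer(uint64_t addr, uint64_t data);  // compare and delete
void     fill_first_tlb(const ReadRequest& req, uint64_t pte);  // TLB 704
void     return_upward(uint64_t data);                    // upper-level space 701

void handle_read(const ReadRequest& req) {
    uint64_t data;
    if (lookup_sram(req.addr, data)) {             // hit in this level
        if (req.is_pte_read) fill_first_tlb(req, data);
        return_upward(data);
        return;
    }
    insert_fill_buffer(req);                       // miss: track the request
    if (req.is_pte_read) insert_pte_buffer(req);   // park the PTE request
    data = fetch_lower_level(req.addr);
    if (req.is_pte_read && match_pte_buffer(req.addr, data))
        fill_first_tlb(req, data);                 // identified PTE data
    return_upward(data);                           // PTE or ordinary data
}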
It should be noted that the data processing methods provided by the various embodiments of the present disclosure may be used with a processor provided by any embodiment of the present disclosure, and such a processor may implement one or more steps of the data processing methods provided by the embodiments of the present disclosure.
At least one embodiment of the present disclosure also provides an electronic device including a processor provided by at least one embodiment of the present disclosure.
Fig. 13 is a schematic block diagram of an electronic device 1300 provided by some embodiments of the present disclosure. As shown in fig. 13, the electronic device 1300 includes a processor 1310. The processor 1310 is, for example, a processor provided by any of the embodiments of the present disclosure, which may perform one or more of the steps of the data processing method described above.
For example, processor 1310 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or another form of processing unit having data processing capabilities and/or program execution capabilities. For example, the Central Processing Unit (CPU) may be of an X86, ARM, or RISC-V architecture, or the like. The processor 1310 may be a general-purpose processor or a special-purpose processor, and may control other components in the electronic device 1300 to perform desired functions.
Fig. 14 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure. The electronic device 1400 is, for example, suitable for use in implementing the data processing methods provided by embodiments of the present disclosure. The electronic device 1400 may be a terminal device or a computer system or the like. It should be noted that the electronic device 1400 shown in fig. 14 is merely an example, and does not impose any limitation on the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 14, the electronic device 1400 may include a processing means (e.g., a central processor, a graphics processor, etc.) 1410, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1420 or a program loaded from a storage means 1480 into a Random Access Memory (RAM) 1430. The processing device 1410 is, for example, a processor provided by any of the embodiments of the present disclosure. In the RAM 1430, various programs and data required for the operation of the electronic device 1400 are also stored. The processing device 1410, the ROM 1420, and the RAM 1430 are connected to each other through a bus 1440. An input/output (I/O) interface 1450 is also connected to bus 1440.
In general, the following devices may be connected to the I/O interface 1450: input devices 1460 such as a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope; output devices 1470 such as a Liquid Crystal Display (LCD), speaker, or vibrator; storage devices 1480 such as magnetic tape or hard disk; and communication devices 1490. The communication device 1490 can allow the electronic device 1400 to communicate wirelessly or by wire with other electronic devices to exchange data. While fig. 14 shows the electronic device 1400 with various means, it is to be understood that not all of the illustrated means are required to be implemented or provided; the electronic device 1400 may alternatively implement or provide more or fewer means.
It should be noted that, in the embodiments of the present disclosure, specific functions and technical effects of the electronic device 1300/1400 may refer to the above description about the processor and the data processing method, which are not repeated here.
The following points need to be described:
(1) The drawings of the embodiments of the present disclosure relate only to the structures to which the embodiments of the present disclosure relate, and reference may be made to the general design for other structures.
(2) The embodiments of the present disclosure and features in the embodiments may be combined with each other to arrive at a new embodiment without conflict.
The foregoing is merely specific embodiments of the disclosure, but the scope of the disclosure is not limited thereto, and the scope of the disclosure should be determined by the claims.

Claims (20)

1. A processor, comprising a processor core, a first-level cache space, a first translation look-aside buffer, and at least one preset cache space, wherein the first-level cache space and the at least one preset cache space are sequentially communicatively connected to form a communication link; the at least one preset cache space comprises a target preset cache space, the first translation look-aside buffer and the target preset cache space are disposed at a same path level, and the first translation look-aside buffer is communicatively connected to the target preset cache space; the first translation look-aside buffer is configured to cache first-level page table entries, and the first translation look-aside buffer is disposed outside the processor core; wherein the first-level cache space is located within the processor core, and the target preset cache space is configured to: in response to receiving a read request generated by the processor core, determine whether the read request is a first-level page table entry read request; in response to the read request being the first-level page table entry read request, return the obtained first-level page table entry corresponding to the first-level page table entry read request, and write the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer; or, in response to the read request not being the first-level page table entry read request, return the data corresponding to the read request obtained from the target preset cache space, a lower-level cache of the target preset cache space, or a memory.

2. The processor according to claim 1, wherein returning the obtained first-level page table entry corresponding to the first-level page table entry read request, and writing the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer, comprises: in response to the first-level page table entry read request hitting in the target preset cache space, returning the corresponding first-level page table entry obtained from the target preset cache space, and writing the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer; and in response to the first-level page table entry read request missing in the target preset cache space, returning the corresponding first-level page table entry obtained from a lower-level cache or memory of the target preset cache space, and writing the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer.

3. The processor according to claim 2, wherein the target preset cache space comprises a first-level page table entry buffer configured to store information carried by the first-level page table entry read request; the content of each data item in the first-level page table entry buffer comprises a valid bit, a translation bit, and the virtual address and physical address of the memory page corresponding to the requested first-level page table entry; the valid bit being a valid value indicates that the data item is a valid item, and the valid bit being an invalid value indicates that the data item is an invalid item; the translation bit being a valid value indicates that the data item is used to provide translation from a guest physical address to a system physical address, and the translation bit being an invalid value indicates that the data item is used to provide translation from a guest virtual address to the system physical address.

4. The processor according to claim 3, wherein, in response to the first-level page table entry read request missing in the target preset cache space, returning the corresponding first-level page table entry obtained from a lower-level cache or memory of the target preset cache space, and writing the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer, comprises: in response to the first-level page table entry corresponding to the first-level page table entry read request not being in the target preset cache space, inserting the first-level page table entry read request into the first-level page table entry buffer, returning the first-level page table entry corresponding to the first-level page table entry read request from a lower-level cache or memory of the target preset cache space, and comparing the physical address of the returned corresponding first-level page table entry with the physical addresses in the valid items of the first-level page table entry buffer; and in response to the physical address of the corresponding first-level page table entry being the same as the physical address in at least one valid item of the first-level page table entry buffer, writing the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer, and deleting the valid item corresponding to the first-level page table entry read request from the first-level page table entry buffer.

5. The processor according to claim 3, wherein the first-level page table entry read request comprises the virtual address of the corresponding memory page, the virtual address of the corresponding memory page being the virtual page number in the virtual address to be translated.

6. The processor according to claim 5, wherein the first-level page table entry read request further comprises the physical address of the corresponding first-level page table entry; the physical address of the corresponding first-level page table entry comprises a cache line address and a cache line offset value, the cache line address being used for comparison with the address of the data returned in response to the first-level page table entry read request to determine whether the data comprises first-level page table entry data, and the cache line offset value representing the offset of the corresponding first-level page table entry data within the corresponding cache line.

7. The processor according to claim 1, wherein the at least one preset cache space comprises a second-level cache space to an N-th-level cache space, N being an integer greater than 2; the N-th-level cache space is closest to the memory and farthest from the processor core, and any one of the second-level cache space to the N-th-level cache space serves as the target preset cache space.

8. The processor according to claim 7, wherein the second-level cache space is a cache space of a private type or a shared type for the processor core, and the second-level cache space serves as the target preset cache space.

9. The processor according to claim 7, wherein the N-th-level cache space is a cache space of a shared type for the processor core, and the N-th-level cache space serves as the target preset cache space.

10. The processor according to claim 7, wherein the first-level cache space to the N-th-level cache space store at least part of the page table entry data of a first-level page table to the page table entry data of an M-th-level page table, M being an integer greater than 1.

11. The processor according to claim 1, further comprising a second translation look-aside buffer, wherein the second translation look-aside buffer is located within the processor core; the second translation look-aside buffer and the first-level cache space are disposed at a same path level, and the first-level cache space is communicatively connected to the second translation look-aside buffer; and the processor core is configured to generate the first-level page table entry read request in response to the first-level page table entry data required for address translation being absent from the first translation look-aside buffer or the second translation look-aside buffer.

12. The processor according to claim 11, wherein the first translation look-aside buffer is further configured to, in response to one or more items of content in the second translation look-aside buffer being cleared, clear the corresponding one or more items of content in the first translation look-aside buffer, so as to maintain consistency between the first translation look-aside buffer and the second translation look-aside buffer.

13. A data processing method for a processor, wherein the processor comprises a processor core, a first-level cache space, a first translation look-aside buffer, and at least one preset cache space; the first-level cache space and the at least one preset cache space are sequentially communicatively connected to form a communication link; the at least one preset cache space comprises a target preset cache space; the first translation look-aside buffer and the target preset cache space are disposed at a same path level; the first translation look-aside buffer is communicatively connected to the target preset cache space; and the first translation look-aside buffer is disposed outside the processor core; the data processing method comprising: caching, in the first translation look-aside buffer, first-level page table entries used for address translation; the data processing method further comprising: in response to receiving a read request generated by the processor core, determining whether the read request is a first-level page table entry read request; in response to the read request being the first-level page table entry read request, returning the obtained first-level page table entry corresponding to the first-level page table entry read request, and writing the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer; and in response to the read request not being the first-level page table entry read request, returning the data corresponding to the read request obtained from the target preset cache space, a lower-level cache of the target preset cache space, or a memory.

14. The data processing method according to claim 13, wherein returning the obtained first-level page table entry corresponding to the first-level page table entry read request, and writing the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer, comprises: in response to the first-level page table entry read request hitting in the target preset cache space, returning the corresponding first-level page table entry obtained from the target preset cache space, and writing the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer; and in response to the first-level page table entry read request missing in the target preset cache space, returning the corresponding first-level page table entry obtained from a lower-level cache or memory of the target preset cache space, and writing the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer.

15. The data processing method according to claim 14, wherein the target preset cache space comprises a first-level page table entry buffer configured to store information carried by the first-level page table entry read request; the content of each data item in the first-level page table entry buffer comprises a valid bit, a translation bit, and the virtual address and physical address of the memory page corresponding to the requested first-level page table entry; the valid bit being a valid value indicates that the data item is a valid item, and the valid bit being an invalid value indicates that the data item is an invalid item; the translation bit being a valid value indicates that the data item is used to provide translation from a guest physical address to a system physical address, and the translation bit being an invalid value indicates that the data item is used to provide translation from a guest virtual address to the system physical address; and wherein, in response to the first-level page table entry read request missing in the target preset cache space, returning the corresponding first-level page table entry obtained from a lower-level cache or memory of the target preset cache space, and writing the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer, comprises: in response to the first-level page table entry corresponding to the first-level page table entry read request not being in the target preset cache space, inserting the first-level page table entry read request into the first-level page table entry buffer, returning the first-level page table entry corresponding to the first-level page table entry read request from a lower-level cache or memory of the target preset cache space, and comparing the physical address of the returned corresponding first-level page table entry with the physical addresses in the valid items of the first-level page table entry buffer; and in response to the physical address of the corresponding first-level page table entry being the same as the physical address in at least one valid item of the first-level page table entry buffer, writing the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer, and deleting the valid item corresponding to the first-level page table entry read request from the first-level page table entry buffer.

16. The data processing method according to claim 13, wherein the processor further comprises a second translation look-aside buffer located within the processor core, the second translation look-aside buffer and the first-level cache space being disposed at a same path level, and the first-level cache space being communicatively connected to the second translation look-aside buffer; the data processing method further comprising: generating the first-level page table entry read request in response to the first-level page table entry data required for address translation being absent from the first translation look-aside buffer or the second translation look-aside buffer.

17. The data processing method according to claim 16, further comprising: in response to one or more items of content in the second translation look-aside buffer being cleared, clearing the corresponding one or more items of content in the first translation look-aside buffer, so as to maintain consistency between the first translation look-aside buffer and the second translation look-aside buffer.

18. A data processing method for a processor, wherein the processor comprises a processor core, a first-level cache space, a first translation look-aside buffer, and at least one preset cache space; the first-level cache space and the at least one preset cache space are sequentially communicatively connected to form a communication link; the at least one preset cache space comprises a target preset cache space; the first translation look-aside buffer and the target preset cache space are disposed at a same path level; the first translation look-aside buffer is communicatively connected to the target preset cache space; and the first translation look-aside buffer is disposed outside the processor core; the data processing method comprising: querying, according to an address translation request, the first translation look-aside buffer as to whether a first-level page table entry corresponding to the address translation request is cached; and in response to the query hitting, in the first translation look-aside buffer, a first-level page table entry corresponding to the address translation request, performing address translation using the corresponding first-level page table entry; the data processing method further comprising: in response to receiving a read request generated by the processor core, determining whether the read request is a first-level page table entry read request; in response to the read request being the first-level page table entry read request, returning the obtained first-level page table entry corresponding to the first-level page table entry read request, and writing the corresponding first-level page table entry and the virtual address and translation bit of the memory page in the first-level page table entry read request into the first translation look-aside buffer; and in response to the read request not being the first-level page table entry read request, returning the data corresponding to the read request obtained from the target preset cache space, a lower-level cache of the target preset cache space, or a memory.

19. The data processing method according to claim 18, further comprising: based on a target address obtained by performing address translation using the corresponding first-level page table entry, reading the data corresponding to the target address at least into the target preset cache space.

20. An electronic device, comprising the processor according to any one of claims 1-12.
CN202210731118.6A 2022-06-24 2022-06-24 Processor, data processing method for processor, and electronic device Active CN115098410B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210731118.6A CN115098410B (en) 2022-06-24 2022-06-24 Processor, data processing method for processor, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210731118.6A CN115098410B (en) 2022-06-24 2022-06-24 Processor, data processing method for processor, and electronic device

Publications (2)

Publication Number Publication Date
CN115098410A CN115098410A (en) 2022-09-23
CN115098410B true CN115098410B (en) 2025-08-29

Family

ID=83292821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210731118.6A Active CN115098410B (en) 2022-06-24 2022-06-24 Processor, data processing method for processor, and electronic device

Country Status (1)

Country Link
CN (1) CN115098410B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383102B (en) * 2023-05-30 2023-08-29 北京微核芯科技有限公司 Translation look-aside buffer access method, device, equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114238176A (en) * 2021-12-14 2022-03-25 海光信息技术股份有限公司 Processor, address translation method for processor, electronic device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3476402B2 (en) * 1999-11-11 2003-12-10 Necエレクトロニクス株式会社 Address translation device and address translation method
US8645666B2 (en) * 2006-12-28 2014-02-04 Intel Corporation Means to share translation lookaside buffer (TLB) entries between different contexts
CN104731720B (en) * 2014-12-30 2018-01-09 杭州中天微系统有限公司 The connected secondary memory managing device of group
CN106560798B (en) * 2015-09-30 2020-04-03 杭州华为数字技术有限公司 Memory access method and device and computer system
US10255196B2 (en) * 2015-12-22 2019-04-09 Intel Corporation Method and apparatus for sub-page write protection
CN108139966B (en) * 2016-05-03 2020-12-22 华为技术有限公司 Method and multi-core processor for managing an index bypass cache
US10169243B2 (en) * 2016-07-18 2019-01-01 International Business Machines Corporation Reducing over-purging of structures associated with address translation
US10296465B2 (en) * 2016-11-29 2019-05-21 Board Of Regents, The University Of Texas System Processor using a level 3 translation lookaside buffer implemented in off-chip or die-stacked dynamic random-access memory
US11243891B2 (en) * 2018-09-25 2022-02-08 Ati Technologies Ulc External memory based translation lookaside buffer
CN112463657B (en) * 2019-09-09 2024-06-18 阿里巴巴集团控股有限公司 Processing method and processing device for address translation cache clearing instruction
CN114218132B (en) * 2021-12-14 2023-03-24 海光信息技术股份有限公司 Information prefetching method, processor and electronic equipment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114238176A (en) * 2021-12-14 2022-03-25 海光信息技术股份有限公司 Processor, address translation method for processor, electronic device

Also Published As

Publication number Publication date
CN115098410A (en) 2022-09-23

Similar Documents

Publication Publication Date Title
US8924648B1 (en) Method and system for caching attribute data for matching attributes with physical addresses
US8185692B2 (en) Unified cache structure that facilitates accessing translation table entries
US20050071601A1 (en) Apparatus and method for pre-fetching page data using segment table data
US9563568B2 (en) Hierarchical cache structure and handling thereof
JP4028875B2 (en) System and method for managing memory
US20090216947A1 (en) System, method and processor for accessing data after a translation lookaside buffer miss
CN114238167B (en) Information prefetching method, processor and electronic equipment
US20240303202A1 (en) Method and apparatus for solving cache address alias
CN115061955B (en) Processor, electronic device, address translation method and cache page table entry method
US12099451B2 (en) Re-reference interval prediction (RRIP) with pseudo-LRU supplemental age information
CN112840331A (en) Prefetch Management in Hierarchical Cache Systems
CN114238176A (en) Processor, address translation method for processor, electronic device
JP2025061322A (en) Terminating and resuming prefetching in instruction cache
CN115098410B (en) Processor, data processing method for processor, and electronic device
CN114218132B (en) Information prefetching method, processor and electronic equipment
US8832376B2 (en) System and method for implementing a low-cost CPU cache using a single SRAM
US11494300B2 (en) Page table walker with page table entry (PTE) physical address prediction
US6965962B2 (en) Method and system to overlap pointer load cache misses
CN115080464B (en) Data processing method and data processing device
CN114281720B (en) Processor, address translation method for processor and electronic equipment
US11379368B1 (en) External way allocation circuitry for processor cores
US20120102271A1 (en) Cache memory system and cache memory control method
JP2024533656A (en) Use of request classes and reuse records in one cache for the insertion policy of another cache - Patents.com
JP2019096307A (en) Data storage for plural data types

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant