
WO2010151267A1 - Optimizations for an unbounded transactional memory (UTM) system - Google Patents


Info

Publication number
WO2010151267A1
WO 2010/151267 A1 (application PCT/US2009/048947)
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
loss
address
instruction
data
Prior art date
Application number
PCT/US2009/048947
Other languages
English (en)
Inventor
Gad Sheaffer
Jan Gray
Burton Smith
Ali-Reza Adl-Tabatabai
Robert Geva
Vadim Bassin
David Callahan
Yang Ni
Bratin Saha
Martin Taillefer
Shlomo Raikin
Koichi Yamada
Landy Wang
Arun Kishan
Original Assignee
Intel Corporation
Priority date
Filing date
Publication date
Application filed by Intel Corporation
Priority to GB1119084.0A priority Critical patent/GB2484416B/en
Priority to CN200980160097.XA priority patent/CN102460376B/zh
Priority to JP2012516043A priority patent/JP5608738B2/ja
Priority to PCT/US2009/048947 priority patent/WO2010151267A1/fr
Priority to BRPI0925055-7A priority patent/BRPI0925055A2/pt
Priority to DE112009005006T priority patent/DE112009005006T5/de
Priority to KR1020117031098A priority patent/KR101370314B1/ko
Publication of WO2010151267A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1027Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G06F12/1045Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1027Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G06F12/1036Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] for multiple virtual address spaces, e.g. segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/109Address translation for multiple virtual address spaces, e.g. segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30076Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30087Synchronisation or serialisation instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181Instruction operation extension or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181Instruction operation extension or modification
    • G06F9/30185Instruction operation extension or modification according to one or more bits in the instruction, e.g. prefix, sub-opcode
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181Instruction operation extension or modification
    • G06F9/30189Instruction operation extension or modification according to execution mode, e.g. mode flag
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824Operand accessing
    • G06F9/3834Maintaining memory consistency
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3854Instruction completion, e.g. retiring, committing or graduating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3854Instruction completion, e.g. retiring, committing or graduating
    • G06F9/3858Result writeback, i.e. updating the architectural state or memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/52Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/526Mutual exclusion algorithms
    • G06F9/528Mutual exclusion algorithms by using speculative mechanisms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/40Specific encoding of data in memory or cache
    • G06F2212/401Compressed data

Definitions

  • FIELD — This invention relates to the field of processor execution and, in particular, to the execution of groups of instructions.
  • a processor or integrated circuit typically comprises a single processor die, where the processor die may include any number of cores or logical processors.
  • the ever increasing number of cores and logical processors on integrated circuits enables more software threads to be concurrently executed.
  • the increase in the number of software threads that may be executed simultaneously has created problems with synchronizing data shared among the software threads.
  • One common solution to accessing shared data in multiple core or multiple logical processor systems comprises the use of locks to guarantee mutual exclusion across multiple accesses to shared data.
  • the ever increasing ability to execute multiple software threads potentially results in false contention and a serialization of execution.
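  • The lock-based approach and its false-contention cost can be sketched in Python (the table contents and thread count here are illustrative, not from the patent): a single coarse lock guarantees mutual exclusion, but two threads touching different entries still serialize behind it.

```python
import threading

table = {}                    # shared hash table
lock = threading.Lock()       # one coarse lock guards every access

def insert(key, value):
    with lock:                # mutual exclusion across all accesses,
        table[key] = value    # even ones that touch different entries

threads = [threading.Thread(target=insert, args=(k, k * 10)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All four inserts are correct, but they execute one at a time even though no two threads ever touch the same key — exactly the serialization the passage above describes.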
  • Another solution employs transactional memory (TM).
  • transactional execution includes executing a grouping of a plurality of micro-operations, operations, or instructions.
  • both threads execute within the hash table, and their memory accesses are monitored/tracked. If both threads access/alter the same entry, conflict resolution may be performed to ensure data validity.
  • One type of transactional execution includes Software Transactional Memory (STM), where tracking of memory accesses, conflict resolution, abort tasks, and other transactional tasks are performed in software, often without the support of hardware.
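  • The software-side tracking described above can be sketched as a minimal version-based STM (the class names, read/write sets, and version counters are illustrative choices, not the patent's design): loads record a version, stores are buffered and invisible until commit, and commit validates the read set.

```python
class STM:
    """Committed memory plus a per-address version counter."""
    def __init__(self):
        self.memory = {}
        self.version = {}

class Txn:
    """One transaction: tracks reads, buffers writes until commit."""
    def __init__(self, stm):
        self.stm = stm
        self.read_set = {}    # address -> version observed at load time
        self.write_set = {}   # buffered updates, not globally visible

    def load(self, addr):
        if addr in self.write_set:          # see our own buffered write
            return self.write_set[addr]
        self.read_set[addr] = self.stm.version.get(addr, 0)
        return self.stm.memory.get(addr)

    def store(self, addr, value):
        self.write_set[addr] = value        # buffered, local only

    def commit(self):
        # Conflict resolution: abort if any read address changed underneath us.
        for addr, ver in self.read_set.items():
            if self.stm.version.get(addr, 0) != ver:
                return False
        for addr, value in self.write_set.items():
            self.stm.memory[addr] = value
            self.stm.version[addr] = self.stm.version.get(addr, 0) + 1
        return True
```

If transaction t1 reads an address and transaction t2 commits a write to it first, t1's commit-time validation fails and t1 must abort and retry — the hash-table scenario above in miniature.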
  • Another type of transactional execution is Hardware Transactional Memory (HTM), where hardware is included to support access tracking, conflict resolution, and other transactional tasks.
  • actual memory data arrays were extended with additional bits to hold information, such as hardware attributes to track reads, writes, and buffering, and as a result, the information travels with the data from the processor to memory.
  • this information is referred to as persistent, i.e. it is not lost upon a cache eviction, since the information travels with data throughout the memory hierarchy. Yet, this persistency imposes more overhead throughout the memory hierarchy system.
  • previous hardware transactional memory (HTM) systems have been fraught with a number of inefficiencies.
  • HTMs currently provide no efficient method for transitioning from un-buffered, or buffered but not monitored, states to a buffered and monitored state to ensure consistency before commit of a transaction.
  • multiple inefficiencies of an HTM's interface with software exist. Specifically, hardware provides no mechanism to properly accelerate software memory access barriers, which take into account different forms of strong and weak atomicity between transactional and non-transactional operations.
  • hardware does not provide any facilities for determining when a transaction is to abort or commit based on loss of monitoring, buffering, and/or other attribute information.
  • the instruction set for these previous HTMs does not provide for commit instructions that define information to retain, or clear, upon commit of a transaction.
  • Other exemplary inefficiencies include: HTMs not providing instructions to efficiently vector or jump execution upon detection of a conflict or loss of information and the inability of current HTMs to handle ring level priority transitions during execution of transactions.
  • Figure 1 illustrates an embodiment of a processor including multiple processing elements capable of executing multiple software threads concurrently.
  • Figure 2 illustrates an embodiment of associating metadata for a data item.
  • Figure 3 illustrates an embodiment of multiple orthogonal metaphysical address spaces for separate software subsystems within a plurality of processing elements.
  • Figure 4 illustrates an embodiment of compression of metadata to data.
  • Figure 5 illustrates an embodiment of a flow diagram for a method of accessing metadata.
  • Figure 6 illustrates an embodiment of a metadata storage element to support acceleration of transactions within strong and weak atomicity environments.
  • Figure 7 illustrates an embodiment of a flow diagram for accelerating non-transactional operations while maintaining atomicity in a transactional environment.
  • Figure 8 illustrates an embodiment of a flow diagram for a method of efficiently transitioning a block of data to a buffered and monitored state before commit of a transaction.
  • Figure 9 illustrates an embodiment of hardware to support a loss instruction to jump to a destination label based upon a status value in a transaction status register.
  • Figure 10 illustrates an embodiment of a flow diagram for a method of executing a loss instruction to jump to a destination label based upon a conflict or loss of specific information.
  • Figure 11 illustrates an embodiment of hardware to support definition of commit conditions and clear controls in a commit instruction.
  • Figure 12 illustrates an embodiment of a flow diagram for a method of executing a commit instruction, which defines commit conditions and clear controls.
  • Figure 13 illustrates an embodiment of hardware to support handling privilege level transitions during execution of transactions.
  • the method and apparatus described herein are for optimizing hardware and software for unbounded transactional memory (UTM) execution. Specifically, the optimizations are primarily discussed in reference to supporting a UTM system.
  • processor 100 may include hardware support for hardware transactional execution. Either in conjunction with hardware transactional execution, or separately, processor 100 may also provide hardware support for hardware acceleration of a Software Transactional Memory (STM), separate execution of a STM, or a combination thereof, such as a hybrid Transactional Memory (TM) system.
  • processor 100 includes any processor, such as a micro-processor, an embedded processor, a digital signal processor (DSP), a network processor, or other device to execute code.
  • Processor 100 includes a plurality of processing elements.
  • a processing element refers to a thread unit, a process unit, a context, a logical processor, a hardware thread, a core, and/or any other element, which is capable of holding a state for a processor, such as an execution state or architectural state.
  • a processing element in one embodiment, refers to any hardware capable of being independently associated with code, such as a software thread, operating system, application, or other code.
  • a physical processor typically refers to an integrated circuit, which potentially includes any number of other processing elements, such as cores or hardware threads.
  • a core often refers to logic located on an integrated circuit capable of maintaining an independent architectural state wherein each independently maintained architectural state is associated with at least some dedicated execution resources.
  • a hardware thread typically refers to any logic located on an integrated circuit capable of maintaining an independent architectural state wherein the independently maintained architectural states share access to execution resources.
  • the line between the nomenclature of a hardware thread and core overlaps.
  • a core and a hardware thread are viewed by an operating system as individual logical processors, where the operating system is able to individually schedule operations on each logical processor.
  • Physical processor 100 includes two cores, core 101 and 102, which share access to higher level cache 110.
  • although processor 100 may include asymmetric cores, i.e. cores with different configurations, functional units, and/or logic, symmetric cores are illustrated.
  • core 102 which is illustrated as identical to core 101, will not be discussed in detail to avoid repetitive discussion.
  • core 101 includes two hardware threads 101a and 101b, while core 102 includes two hardware threads 102a and 102b. Therefore, software entities, such as an operating system, potentially view processor 100 as four separate processors, i.e. four logical processors or processing elements capable of executing four software threads concurrently.
  • a first thread is associated with architecture state registers 101a
  • a second thread is associated with architecture state registers 101b
  • a third thread is associated with architecture state registers 102a
  • a fourth thread is associated with architecture state registers 102b.
  • architecture state registers 101a are replicated in architecture state registers 101b, so individual architecture states/contexts are capable of being stored for logical processor 101a and logical processor 101b.
  • Other smaller resources, such as instruction pointers and renaming logic in rename allocator logic 130, may also be replicated for threads 101a and 101b.
  • Some resources, such as re-order buffers in reorder/retirement unit 135, ILTB 120, load/store buffers, and queues may be shared through partitioning.
  • Processor 100 often includes other resources, which may be fully shared, shared through partitioning, or dedicated by/to processing elements.
  • In Figure 1, an embodiment of a purely exemplary processor with illustrative functional units/resources of a processor is illustrated. Note that a processor may include, or omit, any of these functional units, as well as include any other known functional units, logic, or firmware not depicted.
  • processor 100 includes bus interface module 105 to communicate with devices external to processor 100, such as system memory 175, a chipset, a northbridge, or other integrated circuit.
  • Memory 175 may be dedicated to processor 100 or shared with other devices in a system.
  • Higher-level or further-out cache 110 is to cache recently fetched elements. Note that higher-level or further-out refers to cache levels increasing or getting further away from the execution unit(s).
  • higher-level cache 110 is a second-level data cache. However, higher level cache 110 is not so limited, as it may be associated with or include an instruction cache.
  • a trace cache, i.e. a type of instruction cache, may instead be coupled after decoder 125 to store recently decoded traces.
  • Module 120 also potentially includes a branch target buffer to predict branches to be executed/taken and an instruction-translation buffer (I-TLB) to store address translation entries for instructions.
  • Decode module 125 is coupled to fetch unit 120 to decode fetched elements.
  • processor 100 is associated with an Instruction Set Architecture (ISA), which defines/specifies instructions executable on processor 100.
  • machine code instructions recognized by the ISA include a portion of the instruction referred to as an opcode, which references/specifies an instruction or operation to be performed.
  • allocator and renamer block 130 includes an allocator to reserve resources, such as register files to store instruction processing results.
  • threads 101a and 101b are potentially capable of out-of-order execution, where allocator and renamer block 130 also reserves other resources, such as reorder buffers to track instruction results.
  • Unit 130 may also include a register renamer to rename program/instruction reference registers to other registers internal to processor 100.
  • Reorder/retirement unit 135 includes components, such as the reorder buffers mentioned above, load buffers, and store buffers, to support out-of-order execution and later in-order retirement of instructions executed out-of-order.
  • Scheduler and execution unit(s) block 140 includes a scheduler unit to schedule instructions/operations on execution units. For example, a floating point instruction is scheduled on a port of an execution unit that has an available floating point execution unit. Register files associated with the execution units are also included to store instruction processing results. Exemplary execution units include a floating point execution unit, an integer execution unit, a jump execution unit, a load execution unit, a store execution unit, and other known execution units.
  • Lower level data cache and data translation buffer (D-TLB) 150 are coupled to execution unit(s) 140.
  • the data cache is to store recently used/operated on elements, such as data operands, which are potentially held in memory coherency states.
  • the D-TLB is to store recent virtual/linear to physical address translations.
  • a processor may include a page table structure to break physical memory into a plurality of virtual pages.
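  • The virtual-to-physical translation that the D-TLB caches can be sketched as a simple page-table lookup (the 4 KB page size and the frame numbers below are illustrative assumptions, not taken from the patent):

```python
PAGE_SIZE = 4096  # assumed page size for this sketch

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0: 7, 1: 3}

def translate(virtual_addr):
    """Split the address into page number and offset, then map the page."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpn not in page_table:
        raise KeyError("page fault: no translation for page %d" % vpn)
    return page_table[vpn] * PAGE_SIZE + offset
```

A TLB is simply a small cache of recent `(vpn, frame)` pairs from this table, so that most translations avoid the table walk.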
  • processor 100 is capable of hardware transactional execution, software transactional execution, or a combination or hybrid thereof.
  • a transaction which may also be referred to as a critical or atomic section of code, includes a grouping of instructions, operations, or micro-operations to be executed as an atomic group.
  • instructions or operations may be used to demarcate a transaction or a critical section.
  • these instructions are part of a set of instructions, such as an Instruction Set Architecture (ISA), which are recognizable by hardware of processor 100, such as decoders described above.
  • these instructions once compiled from a high-level language to hardware recognizable assembly language include operation codes (opcodes), or other portions of the instructions, that decoders recognize during a decode stage.
  • updates to memory are not made globally visible until the transaction is committed.
  • a transactional write to a location is potentially visible to a local thread, yet, in response to a read from another thread the write data is not forwarded until the transaction including the transactional write is committed.
  • data items/elements loaded from and written to within a memory are tracked, as discussed in more detail below.
  • pendency of a transaction refers to a transaction that has begun execution and has not been committed or aborted, i.e. pending.
  • processor 100 is capable of executing a compiler to compile program code to support transactional execution.
  • the compiler may insert operations, calls, functions, and other code to enable execution of transactions.
  • a compiler often includes a program or set of programs to translate source text/code into target text/code.
  • compilation of program/application code with a compiler is done in multiple phases and passes to transform high-level programming language code into low-level machine or assembly language code.
  • single pass compilers may still be utilized for simple compilation.
  • a compiler may utilize any known compilation techniques and perform any known compiler operations, such as lexical analysis, preprocessing, parsing, semantic analysis, code generation, code transformation, and code optimization.
  • Two general phases are often distinguished: (1) a front-end, i.e. generally where syntactic processing, semantic processing, and some transformation/optimization may take place, and (2) a back-end, i.e. generally where analysis, transformations, optimizations, and code generation take place.
  • Some compilers refer to a middle end, which illustrates the blurring of delineation between a front-end and back end of a compiler.
  • reference to insertion, association, generation, or other operation of a compiler may take place in any of the aforementioned phases or passes, as well as any other known phases or passes of a compiler.
  • a compiler potentially inserts transactional operations, calls, functions, etc.
  • references to execution of program code in one embodiment, refers to (1) execution of a compiler program(s), either dynamically or statically, to compile main program code, to maintain transactional structures, or to perform other transaction related operations, (2) execution of main program code including transactional operations/calls, (3) execution of other program code, such as libraries, associated with the main program code, or (4) a combination thereof.
  • a compiler will be utilized to insert some operations, calls, and other code inline with application code to be compiled, while other operations, calls, functions, and code are provided separately within libraries. This potentially provides the ability for library distributors to optimize and update the libraries without having to recompile the application code.
  • a call to a commit function may be inserted inline within application code at a commit point of a transaction, while the commit function is separately provided in an updateable library.
  • the choice of where to place specific operations and calls potentially affects the efficiency of application code. For example, if a filter operation, which is discussed in more detail regarding access barriers in reference to Figure 6, is inserted inline with code, the filter operation may be performed before vectoring execution to a barrier instead of inefficiently vectoring to the barrier and then performing the filter operation.
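  • The inlined-filter argument above can be sketched as follows (the "already logged" set and the barrier routine are hypothetical stand-ins for whatever state a real barrier maintains): the cheap filter runs inline and only vectors to the out-of-line barrier on the first access to an address.

```python
logged = set()        # addresses the barrier has already processed
barrier_calls = 0     # counts how often we actually vector to the barrier

def write_barrier(addr):
    """Out-of-line barrier (stand-in for a library routine)."""
    global barrier_calls
    barrier_calls += 1
    logged.add(addr)

def filtered_store(addr, value, heap):
    # Inlined filter: skip the call-and-return entirely on repeat stores,
    # rather than vectoring to the barrier and filtering there.
    if addr not in logged:
        write_barrier(addr)
    heap[addr] = value

heap = {}
for _ in range(3):
    filtered_store("x", 42, heap)   # barrier runs only on the first store
```

Three stores to the same address cost one barrier invocation instead of three, which is the efficiency the inline placement buys.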
  • processor 100 is capable of executing transactions utilizing hardware/logic, i.e. within a Hardware Transactional Memory (HTM) system.
  • some structures and implementations are disclosed for illustrative purposes. Yet, it should be noted that these structures and implementations are not required and may be augmented and/or replaced with other structures having different implementation details.
  • processor 100 may be capable of executing transactions within an unbounded transactional memory (UTM) system, which attempts to take advantage of the benefits of both STM and HTM systems.
  • an HTM is often fast and efficient for executing small transactions, because it does not rely on software to perform all of the access tracking, conflict detection, validation, and commit for transactions.
  • a UTM system utilizes hardware to execute smaller transactions and software to execute transactions that are too big for the hardware.
  • hardware may be utilized to assist and accelerate the software.
  • the same hardware may also be utilized to support and accelerate a pure STM system.
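  • The size-based division of labor in a UTM system can be sketched as a dispatch decision (the capacity limit and the single shared commit path below are illustrative simplifications; in a real UTM the two paths differ in mechanism, not just in label):

```python
HW_CAPACITY = 4  # assumed limit on locations the hardware can buffer

def run_transaction(updates, memory):
    """Small transactions take the fast hardware-style path; larger ones
    fall back to a software-instrumented path. Both commit atomically here."""
    path = "htm" if len(updates) <= HW_CAPACITY else "stm"
    memory.update(updates)
    return path
```

The point of the sketch is only the dispatch: hardware handles what fits, software handles the rest, and the same program sees one transactional interface.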
  • transactions include transactional memory accesses to data items both by local processing elements within processor 100, as well as potentially by other processing elements. Without safety mechanisms in a transactional memory system, some of these accesses would potentially result in invalid data and execution, i.e. a write to data invalidating a read, or a read of invalid data. As a result, processor 100 potentially includes logic to track or monitor memory accesses to and from data items for identification of potential conflicts, such as read monitors and write monitors, as discussed below.
  • a data item or data element may include data at any granularity level, as defined by hardware, software or a combination thereof.
  • data, data elements, data items, or references thereto include a memory address, a data object, a class, a field of a type of dynamic language code, a type of dynamic language code, a variable, an operand, a data structure, and an indirect reference to a memory address.
  • any known grouping of data may be referred to as a data element or data item.
  • a few of the examples above, such as a field of a type of dynamic language code and a type of dynamic language code refer to data structures of dynamic language code.
  • As an example, dynamic language code, such as Java™ from Sun Microsystems, Inc., is a strongly typed language.
  • Each variable has a type that is known at compile time.
  • the types are divided into two categories - primitive types (boolean and numeric, e.g., int, float) and reference types (classes, interfaces and arrays).
  • the values of reference types are references to objects.
  • an object, which consists of fields, may be a class instance or an array. Given object a of class A, it is customary to use the notation A::x to refer to the field x of type A and a.x to the field x of object a of class A.
  • monitoring/buffering memory accesses to data items may be performed at any data granularity level.
  • memory accesses to data are monitored at a type level.
  • a transactional write to a field A::x and a non-transactional load of field A::y may be monitored as accesses to the same data item, i.e. type A.
  • memory access monitoring/buffering is performed at a field level granularity.
  • a transactional write to A::x and a non-transactional load of A::y are not monitored as accesses to the same data item, as they are references to separate fields.
  • other data structures or programming techniques may be taken into account in tracking memory accesses to data items.
  • assume fields x and y of an object of class A, i.e. A::x and A::y, point to objects of class B, are initialized to newly allocated objects, and are never written to after initialization.
  • a transactional write to a field B::z of an object pointed to by A::x is not monitored as a memory access to the same data item with respect to a non-transactional load of field B::z of an object pointed to by A::y. Extrapolating from these examples, it is possible to determine that monitors may perform monitoring/buffering at any data granularity level.
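  • The type-level versus field-level distinction above can be sketched by varying the key used for conflict tracking (the class and field names mirror the A::x / A::y example; the monitor itself is an illustrative stand-in):

```python
def conflicts(access1, access2, granularity):
    """Each access is a (class_name, field_name) pair. Two accesses are
    treated as touching the same monitored data item if their keys collide
    at the chosen granularity."""
    if granularity == "type":
        key = lambda a: a[0]      # track per class: A::x and A::y collide
    else:
        key = lambda a: a         # track per field: they do not
    return key(access1) == key(access2)

tx_write = ("A", "x")   # transactional write to A::x
nt_load  = ("A", "y")   # non-transactional load of A::y
```

At type granularity the two accesses are reported as a potential conflict; at field granularity they are independent, matching the two bullets above.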
  • processor 100 includes monitors to detect or track accesses, and potential subsequent conflicts, associated with data items.
  • hardware of processor 100 includes read monitors and write monitors to track loads and stores, which are determined to be monitored, accordingly.
  • read and write monitors are to monitor data items at a granularity of the data items despite the granularity of underlying storage structures.
  • a data item is bounded by tracking mechanisms associated at the granularity of the storage structures to ensure that at least the entire data item is monitored appropriately.
  • read and write monitors include attributes associated with cache locations, such as locations within lower level data cache 150, to monitor loads from and stores to addresses associated with those locations.
  • a read attribute for a cache location of data cache 150 is set upon a read event to an address associated with the cache location to monitor for potential conflicting writes to the same address.
  • write attributes operate in a similar manner for write events to monitor for potential conflicting reads and writes to the same address.
  • hardware is capable of detecting conflicts based on snoops for reads and writes to cache locations with read and/or write attributes set to indicate the cache locations are monitored, accordingly.
  • setting read and write monitors, or updating a cache location to a buffered state results in snoops, such as read requests or read for ownership requests, which allow for conflicts with addresses monitored in other caches to be detected.
  • snoop logic is coupled to conflict detection/reporting logic, such as monitors and/or logic for conflict detection/reporting, as well as status registers to report the conflicts.
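The read/write monitor rules above can be sketched as a small decision function. This models only the conflict rule, not the snoop protocol or cache structure; the function and its arguments are assumptions for illustration.

```python
# Sketch of snoop-based conflict detection: a cache line carries read
# and write monitor attributes, and an incoming external snoop is
# flagged as a conflict when it collides with a set monitor.

def snoop_conflict(read_monitor, write_monitor, snoop_is_write):
    # An external write conflicts with a monitored read or write;
    # an external read conflicts only with a monitored write.
    if snoop_is_write:
        return read_monitor or write_monitor
    return write_monitor

# A monitored read is invalidated by an external write ...
assert snoop_conflict(read_monitor=True, write_monitor=False, snoop_is_write=True)
# ... but concurrent external reads of a read-monitored line are allowed.
assert not snoop_conflict(read_monitor=True, write_monitor=False, snoop_is_write=False)
```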
  • any combination of conditions and scenarios may be considered invalidating for a transaction, which may be defined by an instruction, such as a commit instruction, which is discussed below in more detail in reference to Figures 11-12.
  • Examples of factors which may be considered for non-commit of a transaction include detecting a conflict to a transactionally accessed memory location, losing monitor information, losing buffered data, losing metadata associated with a transactionally accessed data item, and detecting another invalidating event, such as an interrupt, ring transition, or an explicit user instruction.
  • hardware of processor 100 is to hold transactional updates in a buffered manner.
  • transactional writes are not made globally visible until commit of a transaction.
  • a local software thread associated with the transactional writes is capable of accessing the transactional updates for subsequent transactional accesses.
  • a separate buffer structure is provided in processor 100 to hold the buffered updates, which is capable of providing the updates to the local thread and not to other external threads.
  • a cache memory such as data cache 150, is utilized to buffer the updates, while providing the same transactional functionality.
  • cache 150 is capable of holding data items in a buffered coherency state; in one case, a new buffered coherency state is added to a cache coherency protocol, such as a Modified Exclusive Shared Invalid (MESI) protocol to form a MESIB protocol.
  • In response to local requests for a buffered data item - a data item being held in a buffered coherency state - cache 150 provides the data item to the local processing element to ensure internal transactional sequential ordering. However, in response to external access requests, a miss response is provided to ensure the transactionally updated data item is not made globally visible until commit.
  • when a line of cache 150 is held in a buffered coherency state and selected for eviction, the buffered update is not written back to higher-level cache memories - the buffered update is not to be proliferated through the memory system, i.e. not made globally visible, until after commit. Upon commit, the buffered lines are transitioned to a modified state to make the data item globally visible.
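The buffered-state behaviour above can be sketched as follows. The state letters follow MESI plus the added buffered state "B"; the lookup and commit functions are illustrative models of the described responses, not the hardware protocol.

```python
# Sketch of MESIB lookup: a line in the buffered state "B" hits for the
# local thread but reports a miss to external requesters, so the
# transactional update stays invisible until commit.

def cache_lookup(state, requester_is_local):
    if state == "B":                        # buffered transactional update
        return "hit" if requester_is_local else "miss"
    return "hit" if state in ("M", "E", "S") else "miss"

def commit(state):
    # On commit, buffered lines transition to Modified, making the
    # data item globally visible.
    return "M" if state == "B" else state

assert cache_lookup("B", requester_is_local=True) == "hit"
assert cache_lookup("B", requester_is_local=False) == "miss"
assert commit("B") == "M"
```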
  • the terms internal and external are often relative to the perspective of a thread associated with execution of a transaction, or of processing elements that share a cache. For example, a first processing element for executing a software thread associated with execution of a transaction is referred to as a local thread.
  • if a store to, or load from, an address previously written by the first thread - which results in a cache line for the address being held in a buffered coherency state - is received, then the buffered version of the cache line is provided to the first thread, since it is the local thread.
  • a second thread may be executing on another processing element within the same processor, but is not associated with execution of the transaction responsible for the cache line being held in the buffered state - an external thread; therefore, a load or store from the second thread to the address misses the buffered version of the cache line, and normal cache replacement is utilized to retrieve the unbuffered version of the cache line from higher level memory.
  • the internal/local and external/remote threads are being executed on the same processor, and in some embodiments, may be executed on separate processing elements within the same core of a processor sharing access to the cache.
  • local may refer to multiple threads sharing access to a cache, instead of being specific to a single thread associated with execution of the transaction, while external or remote may refer to threads not sharing access to the cache.
  • Metaphysical Address Spaces for Metadata: the metadata discussed here is purely illustrative for purposes of discussion. Similarly, the specific examples of translating data addresses for referencing metadata are also exemplary, as any method of associating data with metadata in separate entries of the same memory may be utilized.
  • Metadata includes any property or attribute associated with data item 216, such as transactional information relating to data item 216.
  • metadata location 217 may hold any combination of the examples discussed below and other attributes for data item 216, which are not specifically discussed.
  • metadata 217 includes a reference to a backup or buffer location for transactionally written data item 216, if data item 216 has been previously accessed, buffered and/or backed up within a transaction.
  • a backup copy of a previous version of data item 216 is held in a different location, and as a result, metadata 217 includes an address, or other reference, to the backup location.
  • metadata 217 itself may act as a backup or buffer location for data item 216.
  • metadata 217 includes a filter value to accelerate repeat transactional accesses to data item 216.
  • access barriers are performed at transactional memory accesses to ensure consistency and data validity. For example, before a transactional load operation, a read barrier is executed to perform read barrier operations, such as testing if data item 216 is unlocked, determining if a current read set of the transaction is still valid, updating a filter value, and logging version values in the read set for the transaction to enable later validation. However, if a read of that location has already been performed during execution of the transaction, then the same read barrier operations are potentially unnecessary.
  • one solution includes utilizing a read filter to hold a first default value to indicate that data item 216, or the address therefor, has not been read during execution of the transaction, and a second accessed value to indicate that data item 216, or the address therefor, has already been accessed during a pendency of the transaction.
  • the second accessed value indicates whether the read barrier should be accelerated.
  • the read barrier is elided - not executed - to accelerate the transactional execution by not performing unnecessary, redundant read barrier operations.
  • a write filter value may operate in the same manner with regard to write operations.
  • individual filter values are purely illustrative, as, in one embodiment, a single filter value is utilized to indicate if an address has already been accessed - whether written or read.
  • metadata access operations to check metadata 217 for data item 216 for both loads and stores utilize the single filter value, which is in contrast to the examples above where metadata 217 includes a separate read filter value and write filter value.
  • four bits of metadata 217 are allocated to a read filter to indicate if a read barrier is to be accelerated in regards to an associated data item, a write filter to indicate if a write barrier is to be accelerated in regards to an associated data item, an undo filter to indicate undo operations are to be accelerated, and a miscellaneous filter to be utilized in any manner by software as a filter value.
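The filter-based barrier elision above can be sketched as follows. The bit layout (bit 0 = read, bit 1 = write, bit 2 = undo, bit 3 = miscellaneous) matches the four filters just described but the exact encoding is an assumption, as are the function names.

```python
# Sketch of read-barrier elision with per-item filter bits. On the
# first transactional load of an address the full read barrier runs
# and the read filter bit is set; repeat loads elide the barrier.

READ, WRITE, UNDO, MISC = 1 << 0, 1 << 1, 1 << 2, 1 << 3

def transactional_load(addr, metadata, read_barrier):
    filters = metadata.get(addr, 0)
    if filters & READ:                      # already read this transaction
        return "elided"                     # skip redundant barrier work
    read_barrier(addr)                      # lock test, read-set logging, ...
    metadata[addr] = filters | READ
    return "barrier"

md, barrier_log = {}, []
assert transactional_load(0x40, md, barrier_log.append) == "barrier"
assert transactional_load(0x40, md, barrier_log.append) == "elided"
assert barrier_log == [0x40]                # barrier executed exactly once
```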
  • Metadata examples include an indication of, representation of, or a reference to an address for a handler - either generic or specific to a transaction associated with data item 216, an irrevocable/obstinate nature of a transaction associated with data item 216, a loss of data item 216, a loss of monitoring information for data item 216, a conflict being detected for data item 216, an address of a read set or read entry within a read set associated with data item 216, a previous logged version for data item 216, a current version of data item 216, a lock for allowing access to data item 216, a version value for data item 216, a transaction descriptor for the transaction associated with data item 216, and other known transaction related descriptive information.
  • metadata 217 may also include information, properties, attributes, or states associated with data item 216, which are not involved with a transaction.
  • the hardware monitors and buffered coherency states described above are also considered metadata in some embodiments.
  • the monitors indicate whether a location is to be monitored for external read requests or external read for ownership requests, while the buffered coherency state indicates if an associated data cache line holding a data item is buffered.
  • monitors are maintained as attribute bits, which are appended to or otherwise directly associated with cache lines, while the buffered coherency state is added to cache line coherency state bits.
  • hardware monitors and buffered coherency states are part of the cache line structure, not held in a separate metaphysical address space, such as illustrated metadata 217.
  • monitors may be held as metadata 217 in a separate memory location from data item 216, and similarly, metadata 217 may include a reference to indicate that data item 216 is a buffered data item.
  • metadata 217 may hold the buffered data item, while the globally visible version of data item 216 is maintained in its original location.
  • the buffered update held in metadata 217 replaces data item 216.
  • Lossy Metadata: similar to the discussion above with reference to buffered cache coherency states, metadata 217, in one embodiment, is lossy - local information that is not provided to external requests outside memory 215's domain.
  • memory 215 is a shared cache memory
  • a miss in response to a metadata access operation is not serviced outside cache memory 215's domain.
  • because lossy metadata 217 is only held locally within the cache domain and does not exist as persistent data throughout the memory subsystem, there is no reason to forward the miss externally to service the request from a higher-level memory.
  • misses to lossy metadata are potentially serviced in a quick and efficient fashion; memory in the processor may be allocated immediately without waiting for an external request for the metadata to be generated or serviced.
  • Metadata 217 is held in a separate memory location - a distinct address - from data item 216, which results in a separate metaphysical address space for metadata; the metaphysical address space is orthogonal to the data address space - a metadata access operation to the metaphysical address space does not hit or modify a physical data entry.
  • the metaphysical address space potentially affects the data address space through competition for allocation in memory 215.
  • a data item 216 is cached in an entry of memory 215, while metadata 217 for data 216 is held in another entry of the cache.
  • a subsequent metadata operation may result in the selection of data item 216's memory location for eviction and replacement with metadata for a different data item.
  • operations associated with metadata 217's address do not hit data item 216; however, a metadata address for a metadata element may replace physical data, such as data item 216, within memory 215.
  • although metadata potentially competes with data for space in the cache memory, the ability to hold metadata locally potentially results in efficient support for metadata without the expensive cost of proliferating persistent metadata throughout a memory hierarchy.
  • as inferred by the assumption of this example, metadata is held in the same memory, memory 215; however, in an alternative embodiment, metadata 217 for/associated with data item 216 is held in a separate memory structure.
  • addresses for metadata and data may be the same, while a metaphysical portion of the metadata address indexes into the separate metadata storage structure instead of the data storage structure.
  • a metaphysical address space shadows the data address space, but remains orthogonal as discussed above.
  • metadata may be compressed with regard to physical data.
  • the size of a metaphysical address space for metadata does not shadow the data address space in size, but still remains orthogonal.
  • metaphysical translation logic 210 is utilized to translate an address, such as data address 200 to a metadata address.
  • address 200 includes an address that is associated with, or references, data item 216.
  • Normal data translation such as translation between physical, or linear, and virtual addresses may be utilized to index to data item 216 within memory 215.
  • association of metadata 217 with data item 216 includes similar translation of address 200, which references data item 216, into another distinct address that references metadata 217; therefore, translation of address 200 into a data address with data translation logic 205 and a distinct metaphysical address with metaphysical translation 210 results in separate accesses without interference from one another - creating the orthogonal nature of the two address spaces.
  • use of data translation 205 or metaphysical translation 210, in one embodiment, is based on the type of operation to access address 200 - a normal data access operation to access data item 216 utilizes data translation 205, while a metadata access operation to access metadata 217 utilizes metaphysical translation 210, which may be identified through a portion of the instruction/operation code (opcode).
  • an instruction may potentially access both data and metadata for a given metadata address, and thus, perform complex operations, such as a conditional store to data based on metadata.
  • an instruction is decoded into a test and set metadata operation to test metadata and set it to a value, as well as an additional operation to set data to a value if the test of metadata succeeded.
  • a data item may be moved based on a data read from data memory to the matching metadata address. Examples of translating data address 200 to a metadata address for metadata 217 are included immediately below.
  • translating a data address to a metadata address includes utilizing a physical address or a virtual address - after normal data translation 205 - plus addition of a metaphysical value with metaphysical translation logic 210 to separate data addresses from metadata address.
  • metaphysical translation logic 210 includes logic to combine the virtual address with a metaphysical value.
  • if normal virtual-to-physical address translation is utilized, then normal data translation 205 is used to obtain a translated address from address 200, and metaphysical translation logic 210 includes logic to combine the translated address with a metaphysical value to form a metadata address.
  • data address 200 may be translated utilizing separate translation structures, tables, and/or logic within metaphysical translation 210 to obtain a distinct metadata address.
  • metaphysical translation logic 210 may mirror data translation logic 205, or include separate logic - logic to combine address 200 with a metaphysical value; but metaphysical translation logic 210 includes page table information to translate address 200 to a different, distinct metadata address. It can be seen that, whether through addition of information to a data address, extension with information appended to it, replacement of information within it, or translation of it, the resulting distinct metadata address is associated with the data item through the algorithm of addition, extension, replacement, or translation, while remaining orthogonal - unable to incorrectly update or read the data item.
  • any of the aforementioned translation techniques may incorporate, i.e. be based on, a compression ratio of data to metadata so as to separately store metadata for each compression ratio.
  • an address may be modified for translation and/or compression, such as through disregarding specific bits of an address, removing specific bits of an address, changing which bit ranges within an address are used for selection of different granularities of data, translating specific bits, and adding or replacing specific bits with metadata-related information. Compression is discussed in more detail below in reference to Figure 4.
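One of the translation schemes named above - extending a translated data address with a metaphysical value, optionally combined with a compression ratio - can be sketched as follows. The address width, the one-bit metaphysical tag, and the helper name are assumptions for illustration, not taken from the disclosure.

```python
# Sketch of data-to-metadata address translation: prepend a
# metaphysical value above the data address bits so the metadata
# address space is orthogonal to (never collides with) the data
# address space; a power-of-two compression ratio N drops the
# low-order log2(N) bits.

ADDR_BITS = 48                               # assumed physical address width
META_TAG = 1                                 # assumed 1-bit metaphysical value

def to_metadata_address(data_addr, compression_ratio=1):
    shift = compression_ratio.bit_length() - 1   # log2(N) for power-of-two N
    return (META_TAG << ADDR_BITS) | (data_addr >> shift)

d = 0x7FFF_FACE_B000
m = to_metadata_address(d)
# Orthogonality: a metadata address never equals a 48-bit data address.
assert m != d and (m >> ADDR_BITS) == META_TAG
# With an 8:1 compression ratio, the low log2(8) = 3 bits are dropped.
assert to_metadata_address(d, 8) == (META_TAG << ADDR_BITS) | (d >> 3)
```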
  • Multiple Metaphysical Address Spaces: turning to Figure 3, an embodiment of supporting multiple metaphysical address spaces is illustrated.
  • each processing element is associated with a metaphysical address space, such that each processing element is capable of maintaining independent metadata.
  • Four processing elements 301-304 are depicted.
  • a processing element may encompass any of the elements described above in reference to Figure 1.
  • processing elements include cores of a processor.
  • processing elements 301-304 will be discussed in reference to hardware threads (threads) within a processor; each hardware thread is to execute a software thread and potentially multiple software subsystems. Therefore, it is potentially advantageous to allow individual threads of threads 301-304 to maintain independent metadata.
  • metaphysical translation logic 310 is to associate accesses from different threads 301-304 with their appropriate metaphysical address spaces.
  • a thread identifier (ID) utilized in conjunction with an address referenced by a metadata access operation indexes into the correct metaphysical address space.
  • a metadata access operation which is associated with thread 302 and references data address 300 for data item 316
  • Any method of translation may be utilized to translate the data address for data item 316 to a metadata address.
  • the translation additionally includes combination with thread ID 302, which, for example, may be obtained from a control register for thread 302 or an opcode of the received instruction from thread 302.
  • the combination may include appending thread ID 302 to the address, replacement of bits in the address, or other known method of associating a thread ID with an address.
  • metaphysical translation logic 310 is able to select/index into the metaphysical address space associated with data item 316 for processing element 302.
  • each processing element 301-304 is capable of maintaining independent metadata for data item 316. Yet, a programmer does not need to individually manage the metaphysical address spaces, because the hardware is capable of keeping them separate through use of the thread ID in a transparent manner to software. Moreover, the metaphysical address spaces are orthogonal - one metadata access from one thread does not access metadata from another thread because each metadata access is associated with a separate set of addresses, which include a reference to a unique thread ID.
  • an access across PEIDs and/or MDIDs may be advantageous. For example, to determine if hardware has detected conflicts, to check monitor metadata from another thread to determine if an associated data item is monitored by another thread, to clear other thread's metadata, or to determine commit conditions a thread may need to check, modify or clear other thread's metadata associated with data item 316.
  • a specific opcode for the operations to access another thread's metadata is recognized, and as a result, metaphysical translation logic 310 performs the translation of address 300 to all metadata addresses for the metadata to be accessed.
  • metaphysical translation logic 310 sets each of the four bits to access all metadata 317.
  • the lookup logic for memory 315 may be designed where a single access with all four bits set accesses all metadata 317, or metaphysical translation logic 310 may generate four separate accesses with a different thread ID bit of the four bits set to access all metadata 317.
  • a mask may be applied to an address value to allow one thread to hit metadata of another thread.
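The cross-thread access above can be sketched with a one-hot thread-ID scheme. The one-hot encoding, bit widths, and function name are assumptions for illustration; the point is that setting all thread-ID bits (or applying a mask) lets one lookup match every thread's metadata for the same data address.

```python
# Sketch of thread-ID matching for metadata lookups: a metadata entry
# is tagged with its owner's thread-ID bit, and an access hits the
# entry only if the access's thread-ID bit vector overlaps that tag.

def metadata_hits(entry_tid_bits, access_tid_bits):
    # Overlapping thread-ID bits means the access may see the entry.
    return bool(entry_tid_bits & access_tid_bits)

OWN_THREAD = 0b0010                         # thread 302's one-hot ID bit
ALL_THREADS = 0b1111                        # special opcode sets all four bits

# A normal access from thread 302 misses thread 301's metadata ...
assert not metadata_hits(0b0001, OWN_THREAD)
# ... but the all-bits-set access hits every thread's metadata.
assert metadata_hits(0b0001, ALL_THREADS)
assert metadata_hits(0b0010, ALL_THREADS)
```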
  • each processing element 301-304 may be associated with multiple metaphysical address spaces to interleave multiple contexts or software subsystems within a single thread to multiple metadata address spaces. For example, in some situations, it is potentially advantageous to allow multiple software subsystems within a single processing element to maintain independent metadata sets. Therefore, in one example, orthogonal metadata address spaces may be provided at multiple processing element levels, such as at a core level, hardware thread level, and/or software subsystem level. In the illustration, each processing element 301-304 is associated with two metaphysical address spaces, where each one of the two metaphysical address spaces is to be associated with software subsystems to execute on one of the processing elements.
  • a software subsystem includes any task or code to be executed on a processing element, which may utilize a separate metaphysical address space.
  • four subsystems that may be associated with individual metaphysical address spaces include a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, and a software translation subsystem, which may be executed on a single processing element.
  • each software subsystem may have control of the processing element at different times.
  • a software subsystem includes individual transactions executed within a single processing element. In fact, it may be desirable for nested transactions executing on the same thread to be associated with separate metaphysical address spaces.
  • a filter test for an access to a data item within an outer transaction may fail, yet it is potentially advantageous to provide a second, distinct filter for an access to the same data item within an inner nested transaction, which may separately succeed to accelerate the access within the inner transaction.
  • each nested transaction - subsystem - is associated with distinct metadata space, such that a clear of the inner nested transaction's metadata does not affect the outer transaction's metadata.
  • a software subsystem is not so limited, as it may be any task or code capable of managing metadata.
  • the address is combined with the processing element ID (PEID) as discussed above; and in addition, is combined with a metadata ID (MDID), or a context ID. Therefore, separate metadata may be uniquely identified for a subsystem within a processing element.
  • processing elements 301-304 are hardware threads, and thread 302 is executing an outer transaction and an inner transaction nested within the outer transaction.
  • metadata 317c is associated with data item 316 through metaphysical translation 310 translating data address 300 of data item 316 to an address plus a thread ID (TID) and a metadata ID (MDID) for the outer transaction, which references metadata 317c.
  • metadata 317c includes four filter values - a read filter value, a write filter value, an undo filter value, and a miscellaneous filter value - as well as a pointer or other reference to a backup location for data item 316, a monitoring value to indicate if monitors on data item 316 have been lost, a transaction descriptor value, and a version of data item 316.
  • the inner transaction is associated with metadata 317d for data item 316, which includes the same metadata fields as those in metadata 317c.
  • metaphysical translation 310 translates data address 300 for data item 316 to an address combined with the thread ID and the metadata ID for the inner transaction, which references metadata 317d.
  • the only difference between the metadata address, which references metadata 317c, and the metadata address, which references metadata 317d, may be the metadata ID for the outer transaction and the inner transaction; yet, this difference in address ensures the address spaces are disjoint/orthogonal - an access to metadata from the inner transaction will not affect metadata from the outer transaction because the MDID for an access from the inner transaction will be different from the outer transaction. As referred to above, this may be advantageous for rolling back nested transactions or holding different metadata values for different level transactions.
  • the backup data for data item 316 held in metadata 317d may be cleared or used to roll-back data item 316 to an entry point before the inner transaction without clearing or affecting the backup data for the outer transaction held in metadata 317c.
  • the metadata ID (MDID) to separate software subsystem metaphysical address spaces may be any size and may come from many sources.
  • a PEID may be formed from a combination of two bits - 00, 01, 10, 11.
  • an MDID of two bits - 00, 01, 10, 11, is similarly able to distinguish between four subsystems.
  • a value to represent processing element 302 and subsystem two within PE 302 includes 0101 (first two bits are 01 for PE 302 and the second two bits are 01 for the second subsystem).
  • metaphysical translation logic combines this value with data address 300, or a translation thereof, to reference PE 302 MDID 01, which includes metadata location 317d.
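The 0101 example above can be reproduced with a simple bit concatenation. The helper is an illustration of the described combination (two PEID bits above two MDID bits), not a claimed implementation.

```python
# Sketch of forming the subsystem tag: the processing element ID (PEID)
# is placed above the metadata ID (MDID) bits, so PE 302 (PEID 01) and
# its second subsystem (MDID 01) yield the tag 0101 from the example.

def subsystem_tag(peid, mdid, mdid_bits=2):
    return (peid << mdid_bits) | mdid

assert subsystem_tag(peid=0b01, mdid=0b01) == 0b0101   # PE 302, subsystem two
assert format(subsystem_tag(0b01, 0b01), "04b") == "0101"
# Distinct subsystems of the same PE get distinct, orthogonal tags.
assert subsystem_tag(0b01, 0b00) != subsystem_tag(0b01, 0b01)
```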
  • both thread IDs and MDIDs may be more complex. For example, assume threads 301-302 share access to memory 315, while threads 303-304 are remote processing elements that do not share access to memory 315. In addition, assume that threads 301-302 each support two software subsystems for a total of four orthogonal address spaces for threads 301-302 - PE 301 MDO, PE 301 MDl, PE 302 MDO, and PE 302 MDl address spaces. In this case, a value for the combined thread ID and MDID utilized to obtain the metadata address may come from an opcode, a control register, or a combination thereof.
  • an opcode provides one bit for context/MDID
  • a control register provides one bit for a processing element ID (PEID) - assuming only two processing elements
  • a metadata control register such as MDCR 320, provides four bits to identify a specific software subsystem/context for greater granularity.
  • the one bit from the opcode - the first bit including a 1 to indicate a second context - and a second bit from a control register for processing element 302 - the second bit including a 1 to indicate processing element 302 - are combined with an MDID from metadata control register (MDCR) 320 associated with the second thread; the MDCR has been previously updated with the current subsystem's MDID, which is controlling the second thread - 0010 - to identify the proper subsystem associated with the received operation.
  • Metaphysical translation logic takes the combined value, such as 110010, and further combines it with referenced data address 300, or a translation thereof, to obtain a metadata address.
  • the 110010 part of the metadata address is unique to the subsystem that the access operation originated from, so it will only hit or modify metadata address 317d in memory 315 without hitting or affecting metadata addresses 317a, b, c, e, f, g, h - the orthogonal metaphysical address spaces for other subsystems both within the second thread and other threads.
  • Table A: Exemplary Embodiment of bits for the MDCR (Metadata Control Register); MDID denotes the current metadata context ID.
  • MDID0 and MDID1 are the metadata IDs concurrently accessible to the instruction set.
  • the number of bits actually used out of these fields is MDID_size, which, in one embodiment, is read-only at any permission level, as it is specified by processor design. However, in other embodiments, different privilege levels may be able to modify the size. There may be no hardware checks that ensure the MDID fits within the size bit allotment.
  • MDID0 and MDID1 are capable of being written and read at any permission level. It may also be possible to use special MDID values to designate special metadata spaces which always read as zero or one. This might be used by software to force all metadata tests in a block to be true or false, in a similar fashion to the discussion of a register to force a metadata value in reference to Figures 6 and 7.
  • metaphysical translation logic 310, in conjunction with decoders, is capable of recognizing metadata access operations from thread 302 which are intended to access metadata from thread 301's metadata address space, and allows access for those specific instructions/operations to read or modify thread 301's metadata. Compression of Metadata to Data
  • metaphysical address translation logic 210 and 310 from Figures 2-3 may take compression into account when performing translation and modification of an address to reference compressed metadata, accordingly.
  • Turning to Figure 4, an embodiment of modifying an address to achieve compression of metadata is illustrated; specifically, an embodiment of a compression ratio of 8 for data to metadata is depicted.
  • Control logic such as metaphysical address translation logic 210 and 310 from Figures 2-3, is to receive data address 400 referenced by a metadata access operation.
  • compression includes shifting or removing log2(N) bits within or from address 400, where N is the compression ratio of data to metadata.
  • address 400, which includes 64 bits to reference a specific data byte in memory, is truncated by three bits to form the metadata byte address 405 used to reference metadata in memory at a byte granularity; within that byte, a bit of metadata is selected using the three bits previously removed from the address.
  • the bits shifted/removed, in one embodiment, are replaced by other bits. As illustrated, the high-order bits, after address 400 is shifted, are replaced with zeros.
  • the removed/shifted bits may be replaced with other data or information, such as a processing element ID, context identifier (ID), and/or a metadata ID (MDID) associated with the metadata access operation.
  • although the lowest-order bits are removed in this example, any position of bits may be removed and replaced based on any number of factors, such as cache organization, cache circuit timing, locality of metadata to data, and minimizing conflicts between data and metadata.
  • a data address may not be shifted by log2(N), but rather address bits 0:2 are zeroed.
  • bits of the physical address and virtual address that are the same are not shifted as in the example above, which allows for pre-selection of a set and a bank with unmodified bits, such as bits 11:3.
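A minimal sketch of the 8:1 data-to-metadata address translation described above. The placement of the MDID in the freed high-order bits and the function names are illustrative assumptions, not details mandated by the text:

```c
#include <stdint.h>

/* 8:1 compression: drop log2(8) = 3 low-order bits to form the metadata
 * byte address; reuse the 3 freed high-order bits for an identifier
 * (assumed here to be an MDID). */
static uint64_t metadata_byte_addr(uint64_t data_addr, uint64_t mdid) {
    uint64_t byte_addr = data_addr >> 3;   /* truncate by three bits */
    return byte_addr | (mdid << 61);       /* fill the 3 freed high bits */
}

/* The three removed bits select one metadata bit within the addressed byte. */
static unsigned metadata_bit_select(uint64_t data_addr) {
    return (unsigned)(data_addr & 0x7);
}
```

In a real implementation the replaced bits might instead carry a processing element ID or context ID, as the text notes.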
  • a compression ratio may be an input into metaphysical address translation logic 210 and 310 from Figures 2-3 and the translation logic utilizes the compression ratio in conjunction with a PEID, CID, MDID, metaphysical value, or other information to translate a data address into a metadata address.
  • the metadata address is then utilized to access a memory holding the metadata.
  • because metadata is a local, lossy construct, misses to the memory based on the metadata address may be serviced quickly and efficiently - allocation of a memory location without generating an external miss service request and without waiting for an external request to be serviced.
  • an entry, such as entry 217 from Figure 2, is allocated in a normal fashion for the metadata.
  • an entry is selected, allocated, and initialized to the metadata default value based on metadata address 405 and a cache replacement algorithm, such as a Least Recently Used (LRU) algorithm.
  • a compression ratio of eight is purely illustrative and any compression ratio may be utilized.
  • a compression ratio of 512:1 is used - a bit of metadata represents 64 bytes of data.
  • a data address is translated/modified to form a metadata address through shifting the data address down by log2(512) = 9 bits.
  • bits 6:8 are still utilized to select a bit, instead of bits 0:2, effectively creating the compression through selection at a granularity of 512 bits.
  • the high order portion of the data address has 9 open bit locations to hold information.
  • the 9 bits are to hold identifiers, such as context ID, thread ID, and/or MDID.
  • metaphysical space values may also be held in these bits or the address may be extended by the metaphysical value.
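A sketch of the 512:1 mapping described above: one metadata bit per 64 bytes of data, with the address shifted down by 9 bits and bits 6:8 of the original address selecting the bit. The ID placement and function names are illustrative assumptions:

```c
#include <stdint.h>

/* 512:1 compression: shift the data address down by log2(512) = 9 bits,
 * leaving 9 open high-order bit positions for identifiers (context ID,
 * thread ID, MDID, and/or a metaphysical value, per the text). */
static uint64_t metadata_addr_512(uint64_t data_addr, uint64_t ids) {
    return (data_addr >> 9) | (ids << 55);   /* 64 - 9 = 55 */
}

/* Bits 6:8 (rather than 0:2) select the metadata bit, giving a
 * selection granularity of 512 bits (64 bytes) of data per bit. */
static unsigned bit_select_512(uint64_t data_addr) {
    return (unsigned)((data_addr >> 6) & 0x7);
}
```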
  • multiple concurrent compression ratios are supported by hardware.
  • a representation of a compression ratio is held as part of a metaphysical value combined with a data address to obtain a metadata address.
  • the compression ratio is taken into account and does not match addresses of different compression ratios.
  • software may be able to rely on hardware to not forward store information to loads of a different compression ratio.
  • hardware is implemented utilizing a single compression ratio, but includes other hardware support to present multiple compression ratios to software.
  • cache hardware is implemented utilizing an 8:1 compression ratio, as illustrated in Figure 4.
  • a metadata access operation to access metadata at different granularities is decoded to include a micro-operation to read a default amount of metadata and a test micro-operation to test an appropriate part of the metadata read.
  • the default amount of metadata read is 32-bits.
  • a test operation for a different granularity/compression of 8:1 tests the correct bits of the 32 bits of metadata read, which may be based on a certain number of bits of an address, such as a number of LSBs of a metadata address, and/or a context ID.
  • a single bit is selected from the least significant eight bits of the 32 read bits of metadata based on the three LSBs of a metadata address.
  • two consecutive metadata bits are selected from the least significant 16 bits of the 32 bits of read metadata based on the three LSBs of the address, and continuing all the way to 16 bits for a 128 bit metadata size.
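The selection step of the decoded test micro-operation can be sketched as follows. This assumes a default 32-bit metadata read and shows 1-, 2-, and 4-bit selections (eight slots fit within the 32-bit read); the function name and width encoding are illustrative:

```c
#include <stdint.h>

/* Select the metadata bits for the requested granularity from a default
 * 32-bit metadata read, using the three LSBs of the metadata address:
 * width 1 picks a bit from the low 8 bits, width 2 picks two consecutive
 * bits from the low 16 bits, and so on. */
static uint32_t select_meta(uint32_t meta32, uint64_t meta_addr, unsigned width_bits) {
    unsigned lsb3 = (unsigned)(meta_addr & 0x7);   /* three LSBs of address */
    unsigned shift = lsb3 * width_bits;            /* slot position */
    uint32_t mask = (1u << width_bits) - 1u;       /* width_bits < 32 assumed */
    return (meta32 >> shift) & mask;
}
```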
  • FIG. 5 a flow diagram for a method of accessing metadata associated with data is illustrated. Although the flows of Figure 5 are illustrated in a substantially serial fashion, the flows may be performed at least partially in parallel, as well as potentially in a different order.
  • flow 505 a metadata operation referencing a data address for a given data item is encountered.
  • metadata instructions/operations may be supported in hardware to read, modify, and/or clear metadata.
  • instructions may be supported in a processor's Instruction Set Architecture (ISA), such that decoders of the processor recognize operation codes (opcodes) of instructions to access data, and logic performs the accesses accordingly. Note that use of instruction may also refer to an operation.
  • Some processors utilize the idea of a macro-instruction, which is capable of being decoded into a plurality of micro-operations to perform individual tasks. For example, a test and set metadata macro-instruction is decoded into a metadata test operation/micro-operation to test the metadata; if the correct Boolean value is obtained as a result of the test operation, then a set operation updates the metadata to a specific value.
  • metadata access operations are not limited to explicit software instructions to access metadata, but rather may also include implicit micro-operations decoded as part of a larger more complex instruction that includes an access to a data item associated with metadata.
  • the data access instruction may be decoded into a plurality of operations, such as an access to the data item and an implicit update of the associated metadata.
  • the physical mapping of metadata to data in hardware is not directly visible to software.
  • metadata access operations, in this example, reference data addresses and rely on the hardware to perform the correct translations, i.e. mapping, to access the metadata appropriately.
  • metadata access operations may individually reference separate metaphysical address spaces depending on which thread, context, and/or software subsystem they originate from. Therefore, a memory may hold metadata for data items in a transparent fashion with regard to the software.
  • when the hardware detects an access operation to metadata, either through an explicit operation code (opcode of an instruction) or decoding of an instruction into metadata access micro-operation(s), the hardware performs the requisite translation of the data address referenced by the access operation to access the metadata accordingly.
  • a program may include separate operations, such as a data access operation or a metadata access operation, that reference the same address of a data item, such as data items 216 and 316 from Figures 2-3, and the hardware may map those accesses to different address spaces, such as a physical address space and a metaphysical address space.
  • the ISA may be extended with instructions to load/store/test/set metadata for a given virtual address, MDID, compression ratio, and operand width. Any of these parameters may be explicit instruction operands, may be encoded in the opcode, or may be obtained from a separate control register.
  • Instructions may combine the metadata load/store operation with other operations, for example, loading some data, testing some bits of it, and setting a condition code for a subsequent conditional jump. Instructions may also flush all metadata, or just metadata for a particular MDID. Below are listed a number of illustrative metadata access operations. Note that some of the exemplary instructions are in reference to specific 64X compression ratio instructions, but similar instructions may be utilized for different compression ratios, as well as uncompressed metadata, even though they are not specifically disclosed. Metadata Bit Test and Set (MDLT)
  • the metadata load and test instruction has 2 arguments: the data address to which the metadata is associated as a source operand and a register (destination operand) into which the byte, word, dword, qword or other size of metadata containing the bit is written. The value of the tested metadata bit is written into the register.
  • the programmer should not assume any knowledge about the data stored in the destination register of the MDLT instruction, and should not manipulate this register.
  • This register is to be used solely as a source operand to a metadata store and set instruction (MDSS) to the same address.
  • the MDLT instruction will combine the test and set operations, but will squash the set operation if the test succeeds.
  • Metadata Store and Set (MDSS)
  • the metadata store and set instruction has 2 arguments: The data address to which the metadata is associated and a register (source operand) from which the byte, word, dword, qword or other size of metadata containing the bit is to be stored to memory.
  • the MDSS instruction will set the correct bit in the value from its source operand.
  • Metadata Store and Reset (MDSR)
  • the MDSR instruction has 2 source arguments: The data address to which the metadata is associated as a source operand and a register (source operand) from which the byte, word, dword, qword or other size of metadata containing the bit is to be reset.
  • the MDSR instruction will reset the correct bit in the value from its source operand.
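The bit-level behavior of the three instructions above can be modeled in software. This is a behavioral sketch, not the hardware implementation; it assumes the bit within the metadata byte is selected by the low three bits of the data address, consistent with the 8:1 example earlier:

```c
#include <stdint.h>

/* MDLT: test the metadata bit selected by the data address
 * (the full metadata byte is also written to the destination register,
 * which software must treat as opaque). */
static unsigned mdlt_test(uint8_t meta_byte, uint64_t data_addr) {
    return (meta_byte >> (data_addr & 7)) & 1u;
}

/* MDSS: set the correct bit in the value from the source operand. */
static uint8_t mdss(uint8_t src_reg, uint64_t data_addr) {
    return src_reg | (uint8_t)(1u << (data_addr & 7));
}

/* MDSR: reset the correct bit in the value from the source operand. */
static uint8_t mdsr(uint8_t src_reg, uint64_t data_addr) {
    return src_reg & (uint8_t)~(1u << (data_addr & 7));
}
```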
  • a metadata address is determined from the referenced data address. Examples of determining a metadata address are included in the metaphysical address translation and multiple metaphysical address spaces sections above. However, note that the translation may incorporate, i.e. be based on, a compression ratio of data to metadata so as to separately store metadata for each compression ratio.
  • 81/3 ib DT0: mem is zero, uses MDID0 from MDCR.
  • DT1: mem is zero, uses MDID1 from MDCR.
  • the CMDT instruction is to convert the memory data address to a memory metadata address with a compressed mapping function that is implementation dependent and test whether a metadata bit corresponding to the memory metadata address is set.
  • the compression ratio (CR) is 1 bit of metadata for 8 bytes of data.
  • the metadata address computation incorporates one of the context IDs from the MDCR register to provide a unique set of metadata for each individual context ID.
  • Compressed Metadata Store (CMDS)
  • CMDS instruction converts the memory data address to a memory metadata address with a compressed mapping function that is implementation dependent.
  • the compression ratio is 1 bit for 8 bytes of data.
  • the encoding of the imm8 value is as follows: bit 0 - MD_Value, the value to be stored into the metadata; bits 7:1 - Reserved, not used.
  • Compressed Metadata Clear (CMDCLR)
  • CMDCLR resets all MDBLK[CR][MDCR.MDID[MDID number]].META that correspond to any data in the range spanning MBLK(mem). Exemplary pseudo code related to CMDCLR is included below:
  • mblk = floor(addr, MBLK_SIZE)
  • mdblkStart = mblk
  • mdblkEnd = floor(mblk + MBLK_SIZE - 1, MDBLK_SIZE)
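The pseudo code above can be made runnable as follows. The block sizes are illustrative assumptions (an MDBLK of 8 bytes corresponds to a 64:1 compression ratio via ceil(64/8)); floor(a, s) aligns a down to a multiple of s:

```c
#include <stdint.h>

#define MBLK_SIZE  64u   /* assumed memory block size */
#define MDBLK_SIZE 8u    /* assumed metadata block size, e.g. CR = 64 */

/* floor(a, s): align a down to a multiple of s, as in the pseudo code. */
static uint64_t floor_align(uint64_t a, uint64_t s) { return a - (a % s); }

/* mdblkStart = mblk = floor(addr, MBLK_SIZE) */
static uint64_t mdblk_start(uint64_t addr) {
    return floor_align(addr, MBLK_SIZE);
}

/* mdblkEnd = floor(mblk + MBLK_SIZE - 1, MDBLK_SIZE) */
static uint64_t mdblk_end(uint64_t addr) {
    uint64_t mblk = floor_align(addr, MBLK_SIZE);
    return floor_align(mblk + MBLK_SIZE - 1, MDBLK_SIZE);
}
```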
  • 64-Bit Mode Exceptions: #GP(0) if the memory address is in a non-canonical form, or if OSTM [bit 15] is 0.
  • a metadata address is determined from the data address referenced in the metadata access operation based on a compression ratio, processing element ID, context ID, MDID, metaphysical value, operand size, and/or other metaphysical address space translation related value. Any of the methods described above, such as combination of ID values with no translation of the data address, normal translation of the data address, or separate metaphysical address translation of the data address, may be utilized to obtain the appropriate metadata address.
  • the translation to a metadata address may include modification of the address, such as application of a mask, to allow the access from one thread or context ID to access another thread or context ID.
  • the metadata referenced by the metadata address is accessed.
  • the disjoint location for the metadata associated with the local requesting thread or context ID is accessed and the appropriate operations, such as test, set, and clear, are performed.
  • metadata for other threads or context IDs may be accessed in this flow as well.
  • a given CR is a power of two that indicates how many bits of data map to one bit of metadata. It is implementation defined which CR values, if any, may be used.
  • CR>1 denotes Compressed Metadata.
  • MDBLK[CR][*]s are ceil(CR/8) bytes in size and are naturally aligned. MDBLKs are associated with physical data, not their linear virtual addresses. All valid physical addresses A with the same value floor(A/MDBLK[CR][*]_SIZE) designate the same sets of MDBLKs.
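The MDBLK sizing rules above can be modeled directly; function names here are illustrative:

```c
#include <stdint.h>

/* MDBLK[CR][*]s are ceil(CR/8) bytes in size. */
static uint64_t mdblk_size(uint64_t cr) { return (cr + 7) / 8; }

/* Two physical addresses designate the same set of MDBLKs when they
 * share the same value of floor(A / MDBLK[CR][*]_SIZE). */
static int same_mdblk(uint64_t a, uint64_t b, uint64_t cr) {
    uint64_t sz = mdblk_size(cr);
    return (a / sz) == (b / sz);
}
```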
  • MDID: For a given CR, there can be any number of distinct MDIDs, each designating a unique instance of metadata.
  • the metadata for a given CR and MDID is distinct from the metadata for any other CR or MDID.
  • addr is QWORD aligned
  • a given implementation may support multiple concurrent contexts, where the number of contexts will depend on the CR and certain configuration information related to the specific system of which the processor is a part.
  • Uncompressed Metadata: there is a QWORD of metadata for each QWORD of physical data.
  • Metadata is interpreted by software only.
  • Software may set, reset, or test META for a specific MDBLK[CR][MDID], or reset META for all the Thd's MDBLK[*][*]s, or reset META for all the Thd's MDBLKS[CR][MDID] that may intersect a given MBLK(addr).
  • Metadata Loss: any META property of the Thd may spontaneously reset to 0, generating a Metadata Loss Event.
  • STMs usually ensure consistency between memory access operations utilizing access barriers. For example, before a memory access to a data item, a metadata location or lock location associated with the data item is checked to determine if the data item is available. Other potential barrier operations include obtaining a lock, such as a read lock, write lock, or other lock, on the data item in the metadata or lock location, logging/storing a version for the data item in a read or write set for a transaction, determining if a read set for a transaction to that point is still valid, buffering or backing up a value of the data item, setting monitors, updating a filter value, as well as any other transactional operations.
  • hardware holds a filter value to accelerate execution associated with these barriers.
  • the filter value may be included in a cache as an annotation bit, such as the read and write monitors, or held in a metadata location within a metaphysical address space, as previously described.
  • when the first write barrier is encountered, it updates a write filter value from an un-accessed value to an accessed value to indicate a write barrier for address A has already been encountered within the transaction. Therefore, upon the subsequent two transactional write operations within the transaction, before vectoring to the write barrier, the write filter value for address A is checked.
  • the filter value includes an accessed value, which indicates that the write barrier does not need to be executed - the write barrier was already executed within the transaction.
  • the filter value accelerates transactional execution - elides or does not include execution of the write barrier for the last two accesses in comparison to the previous example without utilizing a filter.
  • read filters for loads/reads, undo filters for undo operations, and miscellaneous filters for generic filter operations may be utilized in the same manner as the write filter above was utilized for write/store operations.
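The write-filter elision described above can be sketched in software. The tiny linear filter table stands in for the hardware-held metadata and is purely illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

#define FILTER_SLOTS 16
static uint64_t filtered_addr[FILTER_SLOTS];
static int filtered_count;
static int barriers_executed;

/* Has a write barrier already been executed for this address? */
static bool filter_hit(uint64_t addr) {
    for (int i = 0; i < filtered_count; i++)
        if (filtered_addr[i] == addr) return true;
    return false;
}

/* Transactional write: run the barrier only on the first access, then
 * flip the filter to "accessed" so later writes elide the barrier. */
static void tx_write(uint64_t addr) {
    if (!filter_hit(addr)) {
        barriers_executed++;                      /* full write barrier */
        if (filtered_count < FILTER_SLOTS)
            filtered_addr[filtered_count++] = addr;
    }
    /* ... perform the actual store ... */
}

/* Three writes to A and one to B execute only two barriers. */
static int run_demo(void) {
    tx_write(0xA); tx_write(0xA); tx_write(0xA); tx_write(0xB);
    return barriers_executed;
}
```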
  • transactional barriers are also inserted at non-transactional operations; this provides protection and isolation between transactional and non-transactional operations, but at the expense of executing a transactional barrier at every non-transactional operation.
  • the filters described above may be leveraged in combination with strong atomicity barriers at non-transactional operations to support different modes of strong and weak atomicity operation.
  • metadata 610 is held in hardware for data 605, as discussed above.
  • Metadata access 600 is received to access metadata 610.
  • metadata access 600 includes a test metadata operation to test a filter, such as a read filter, write filter, undo filter, or miscellaneous filter.
  • a test metadata operation to test a filter may originate from a transactional or non- transactional access operation.
  • a compiler, when compiling application code, inserts the test filter operation inline in the application code as a condition to executing a call to a transactional barrier at transactional and non-transactional accesses. Therefore, within a transaction, the filter operation is executed before a call to a barrier, and if it returns successful, then the call to the transactional barrier is not executed, providing the acceleration discussed above.
  • the hardware is capable of operating in a weak atomicity mode, where transactional barriers at non- transactional operations are not executed, and a strong atomicity mode where transactional barriers are executed.
  • the mode of operation, or control 625 may be set in metadata control register (MDCR) 615, which may be combined with the version of MDCR described above to hold MDIDs or may be a separate control register.
  • control 625 for mode of operation may be held in a general transactional control register or status register.
  • a first mode of execution includes a strong atomicity mode where transactional barriers are to be executed at non-transactional operations.
  • control 625 represents a first value, such as a 00, to indicate a strong atomicity and non-transactional mode of operation.
  • response logic 620, which is illustrated as an exemplary multiplexer, selects the metadata value from hardware maintained metadata 610 associated with data address A to be provided to destination register 650 for metadata access 600.
  • barriers are accelerated based on the actual hardware held metadata.
  • in a second mode of execution, such as a weak atomicity and non-transactional mode, as indicated by control 625 representing a second value, such as 01, a fixed or forced value from MDCR is provided to destination register 650 in response to metadata access 600 instead of the hardware maintained metadata 610.
  • a forced value is provided to destination register 650 in response to test filter operation 600 to ensure the test of the filter value always succeeds and the call to the transactional barrier is not executed before the non- transactional memory access.
  • the test filter operation returns a Boolean value to indicate whether the filter test succeeds (the barrier is not to be executed) or fails (the barrier is to be executed).
  • the same filter software construct for accelerating transactions by eliding barriers based on the filter value is leveraged to provide one mode of operation where all barriers at non-transactional operations are elided - weak atomicity mode, and a second mode of operation where barriers at non-transactional operations are executed or accelerated based on hardware maintained metadata - strong atomicity.
  • different forced values may be provided for each mode.
  • the forced value would ensure the test filter operation fails so the barrier is always executed, while in the weak atomicity mode, the forced value would ensure the test filter operation succeeds so the barrier is not executed.
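The selection performed by the response logic can be sketched as a small mux. The mode encodings (00 strong, 01 weak) follow the example values in the text; the forced values are assumptions chosen so the filter test always succeeds in weak mode and always fails in the third, forced-strong case:

```c
#include <stdint.h>

enum { MODE_STRONG = 0x0, MODE_WEAK = 0x1, MODE_FORCE_BARRIER = 0x2 };

/* Model of response logic 620: return the value a test-filter operation
 * sees in destination register 650, given the mode in control 625. */
static uint8_t test_filter(uint8_t mode, uint8_t hw_metadata) {
    switch (mode) {
    case MODE_STRONG:
        return hw_metadata;  /* barriers accelerated on real metadata */
    case MODE_WEAK:
        return 1;            /* forced: test succeeds, barrier elided */
    default:
        return 0;            /* forced: test fails, barrier executes */
    }
}
```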
  • although providing a forced or fixed value from a control register, such as MDCR 615, based on control information, such as control 625, has been described in relation to mode of operation, providing a forced or fixed value may be utilized for any generic metadata usage, such as allowing data-invariant behavior to be utilized for debugging and generic monitoring of memory accesses, capable of being enabled on-demand.
  • a metadata (MD) access operation referencing a data address is encountered.
  • the MD access operation includes a test operation previously inserted by a compiler in-line with application code to elide a transactional barrier at a non-transactional memory access if the test returns one value (successful) and to execute the barrier if the test returns a second value (failure).
  • a test MD operation is not so limited, as it may include any test operation for returning a Boolean success or failure value.
  • a mode of operation is determined.
  • examples of a mode of operation may be transactional or non-transactional in combination with strong atomicity or weak atomicity. Therefore, one, or two separate registers, may hold a first bit to indicate a transactional or non-transactional mode of operation and a second bit for strong or weak atomicity mode of operation.
  • the hardware maintained metadata value is provided to the metadata access operation - the hardware maintained value is placed in a destination register specified by the MD access operation.
  • the forced MDCR fixed value is provided to the MD access operation instead of the hardware maintained MD value.
  • FIG. 8 an embodiment of a flow diagram for a method of efficiently transitioning a block of data to a buffered and monitored state before commit of a transaction is illustrated.
  • blocks of memory such as a cache line holding a data item or metadata may be buffered and/or monitored.
  • coherency bits for a cache line include a representation of a buffered state and attribute bits for a cache line indicate if the cache line is unmonitored, read monitored, or write monitored.
  • a cache line is buffered, but unmonitored, which means the data held in the cache line is lossy and that conflicts to the cache line are not detected, since there is no monitoring applied.
  • data that is local to a transaction and is not to be committed such as metadata, may be held in a buffered and unmonitored state.
  • read monitoring is applied to the data.
  • the cache line is then moved to a buffered and read monitored state; however, to get to that state, a read request is sent to external processing elements forcing all other copies to transition to a shared state. These external read requests may result in a conflict with another processing element maintaining a write monitor on the same block/cache line.
  • write monitoring is applied to the cache line.
  • the line is then moved to a buffered and write monitored state, which is achieved by sending a read-for-ownership request to other processing elements, forcing all other copies to transition to an invalid state.
  • a conflict is detected with any processing element maintaining either a read or write monitor on the same memory block.
  • a memory block that the transaction needs to update but not eventually commit may be maintained in the buffered but unmonitored state, as described above.
  • an efficient path from the buffered and unmonitored state to a committable state is provided as illustrated in Figure 8.
  • a buffered update to a memory block - a cache line to hold the block - is received in flow 805.
  • read monitoring is applied to the block.
  • a read attribute for the cache line is set to a read monitor value to indicate the block is read monitored.
  • a read request is first sent out to other processing elements in flow 815.
  • In response to receiving the read request, the other processing elements either detect a conflict due to already maintaining the line in a write monitored state, or transition their copies to a shared state in flow 820. In flow 825, if there are no conflicts, then the cache line is transitioned to a buffered and read monitored state - cache line coherency bits are updated to a buffered coherency state and the read monitor attribute is set.
  • conflicting writes are detected to the cache line based on the read monitoring.
  • the read attributes are coupled to snoop logic, such that an external read for ownership request to the cache line will detect a conflict with the read monitor being set on the cache line.
  • transitioning the buffered and unmonitored block to a committable state in two stages - flow 810 and flow 840 - is potentially advantageous.
  • Deferring the acquisition of ownership via the staged acquisition of read and write monitors allows multiple concurrent transactions to update the same block, while reducing the conflicts between these transactions. If a transaction does not get to the commit stage for any reason, updating the block in a buffered and read monitored way will not cause another transaction that will get to the commit stage to needlessly abort.
  • deferring acquiring sole ownership of the block until the commit stage is therefore a way to obtain higher concurrency among threads without sacrificing validity of data.
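The two-stage path can be sketched as a small state machine. The state names and encoding are illustrative, not the hardware's actual coherency/attribute bits:

```c
/* Stage 1 (flows 810-825): apply a read monitor to the buffered line;
 * stage 2 (flow 840): acquire sole ownership only at commit. */
enum line_state { BUF_UNMON, BUF_RM, BUF_WM_COMMITTABLE, CONFLICT };

/* Stage 1: other elements keep shared copies; conflict only if another
 * element already holds a write monitor on the block. */
static enum line_state apply_read_monitor(int other_has_wm) {
    return other_has_wm ? CONFLICT : BUF_RM;
}

/* Stage 2, at commit: only a buffered, read monitored line may proceed;
 * a read-for-ownership conflicts with any other read or write monitor. */
static enum line_state acquire_for_commit(enum line_state s, int other_has_rm_or_wm) {
    if (s != BUF_RM) return CONFLICT;
    return other_has_rm_or_wm ? CONFLICT : BUF_WM_COMMITTABLE;
}
```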
  • Table E below illustrates an embodiment of conflicting states between two processing elements: P0 and P1.
  • a line held by P1 in a buffered read monitored state, as indicated by the R-B column, and any state of P0 with the cache line maintained with a write monitor, as indicated by the -W-, RW-, WB, RWB, is conflicting, as represented by the x in the intersecting cells.
  • Table E: An embodiment of conflicting states between two processing elements. Additionally, Table F below illustrates a loss of an associated property in processing element P1 in response to the operation listed under P0. For example, if P1 holds a line in a buffered read monitored state, as indicated by the R-B column, and either a store or set write monitor operation occurs on P0, then P1 loses both read monitoring and buffering of the line, as indicated by the x-x in the intersection of the store/set WM rows and the R-B column.
  • Table F: An embodiment of loss of attributes as a result of an operation. Branch Instruction (JLOSS) for conflict or loss of transactional data
  • hardware provides an accelerated way to check a transaction's consistency.
  • hardware may support consistency checking by providing mechanisms that track loss of monitored or buffered data from the cache - eviction of buffered or monitored lines, or track potential conflicting accesses to such data - monitors to detect conflicting snoops, such as a read request for ownership to a monitored line.
  • hardware provides architectural interfaces to allow software to access these mechanisms based on the status of monitored or buffered data.
  • Two such interfaces include the following: (1) Instructions to read or write a status register that allow the software to poll the register explicitly during execution; (2) an interface that allows software to setup a handler that is invoked whenever the status register indicates a potential loss of consistency.
  • hardware supports a new instruction called JLOSS that performs a conditional branch based on the status of HW monitored or buffered data.
  • the JLOSS instruction branches to a label if the hardware detects potential loss of any monitored or buffered data from the cache, or it detects potential conflicts to any such data.
  • a label includes any destination, such as an address of a handler or other code to be executed as a result of a loss of data or detection of a conflict.
  • Figure 9 depicts decoders 910, which recognize JLOSS as part of a processor ISA and decodes the instruction to allow logic of the processor to perform the conditional branch based on the status of a transaction.
  • the status of a transaction is held in transaction status register 915.
  • Transaction status register 915 may represent the status of transactions, such as when hardware detects a conflict or a loss of data - herein referred to as a loss event.
  • a conflict flag in TSR 915 is set upon a monitor indicating an address is monitored in combination with a snoop to the monitored address, the conflict flag indicating a conflict was detected.
  • a loss flag is set upon a loss of data, such as an eviction of a line including transactional data or metadata.
  • JLOSS, when decoded and executed, tests the status register flags, and if there is a loss event - a loss and/or conflict - then logic 925 provides the label referenced by JLOSS to execution resources 930 as a jump destination address.
  • software is able to discern the status of a transaction, and based on that status is capable of vectoring execution to a label specified by the single instruction. Because JLOSS checks consistency, reporting of false conflicts is acceptable - JLOSS may conservatively report that a conflict has occurred.
  • software such as a compiler, inserts JLOSS instructions into the program code to poll for consistency.
  • JLOSS may be utilized inline with main application code; often, JLOSS instructions are utilized within read and write barriers, which are often provided within libraries, to determine consistency on demand. Therefore, execution of program code may include a compiler inserting JLOSS in code, execution of JLOSS from the program code, or any other form of inserting or executing an instruction. Polling with JLOSS is expected to be much faster than an explicit read of the status register, because the JLOSS instruction does not require additional registers - there is no need for a destination register to receive the status information as for an explicit read.
  • transaction status register 915 holds specific conflict and loss status information, such as whether a read monitored location has been written by another agent - a read conflict, a write monitored location has been read or written by another agent - a write conflict, a loss of physical transactional data, or a loss of metadata. Therefore, different versions of the JLOSS instruction may be utilized. For example, a JLOSS.rm <label> instruction will branch to its label if any read monitored location may have been written by another agent.
  • a JLOSS.wm instruction may be utilized to detect any reads or writes to write monitored locations.
  • a JLOSS.buf instruction may be used to determine if buffered data has been lost and jump to a specified label as a result.
  • the following Pseudo Code shows a native code STM read barrier that provides a consistent read set and uses JLOSS.
  • the setrm(void* address) function sets the read monitor on the given address and the jloss_rm() function is an intrinsic function for the JLOSS instruction that returns true if any conflicting accesses to read monitored locations may have occurred.
  • This pseudo-code monitors the loaded data, but it's also possible to monitor the transaction records (ownership records) instead. It's possible to use an instruction that combines setting of the read monitor with loading of the data - e.g. a movxm instruction that both loads and monitors the data.
  • Pseudo Code A: An in-place update STM, optimistic read, native code read barrier
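The body of Pseudo Code A is not reproduced in this text, so the following is a hedged reconstruction inferred from the description (set the read monitor, load, branch on loss). setrm(), jloss_rm(), and tm_abort() are software stubs standing in for the hardware primitives and an assumed abort handler, so the sketch is self-contained:

```c
#include <stdbool.h>
#include <stdlib.h>

static void setrm(void *addr) { (void)addr; }   /* stub: set read monitor */
static bool jloss_rm(void)    { return false; } /* stub: no conflict seen */
static void tm_abort(void)    { abort(); }      /* assumed abort handler  */

/* Sketch of the read barrier: monitor the location, load it, then use
 * the JLOSS.rm intrinsic to detect conflicting writes to the read set. */
static int tmRdInt(int *addr) {
    setrm(addr);        /* set the read monitor on the loaded address */
    int value = *addr;  /* perform the transactional read */
    if (jloss_rm())     /* any conflicting access to read monitored data? */
        tm_abort();
    return value;
}
```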
  • an STM system that does not maintain read-set consistency may avoid infinite loops, or other incorrect control flow such as exceptions, due to inconsistency by inserting a JLOSS.rm instruction at loop back edges, or other critical control flow points, such as instructions that may raise exceptions.
  • the following Pseudo Code shows another native code read barrier that provides consistency.
  • This version of the TM system uses cache-resident write sets, using buffered updates for writes inside transactions. A read from a location that was previously buffered and then lost causes inconsistency, so to maintain consistency, this read barrier avoids reading from any lost buffered location.
  • the COMMIT LOCKING flag is true if the STM is using commit time locking for buffered locations.
  • the jloss_buf() check is utilized on reads from a previously locked location when not using commit-time locking; otherwise, it is utilized on all reads.
  • Pseudo Code B: In-place update, native code STM read barrier: Type tmRd<Type>(TxnDesc* txnDesc, Type* addr) { setrm(addr); /* set the read monitor on loaded address */
  • TM systems may combine read monitoring with buffering and write monitoring, as discussed above, and thus also include checking for conflicts to either monitored or buffered lines to maintain consistency.
  • different embodiments may also provide JLOSS flavors that branch on logical combinations of different monitoring and buffering events, such as JLOSS.rm.buf (conflict on read monitored or buffered locations), JLOSS.rm.wm (conflict on read or write monitored locations), or JLOSS.* (conflict on read monitored, write monitored, or buffered locations).
  • the architectural interface decouples the JLOSS instruction from the conditions under which it branches by allowing software to set up the conditions - conflict on read/write monitored lines or buffered lines - in a separate control register.
  • This embodiment requires only a single JLOSS instruction encoding and can support future extensions to the set of events on which the JLOSS should branch.
  • a JLOSS instruction is received in flow 1005.
  • the JLOSS instruction may be inserted by a programmer or compiler either within main code, such as after a load operation to ensure read set consistency, or within a barrier, such as a read or write barrier.
  • the JLOSS instruction, and its variants discussed above, are in one embodiment recognizable as part of a processor's ISA.
  • decoders are able to decode the opcodes for the JLOSS instructions.
  • the type of conflict or loss is dependent on the variant of the JLOSS instruction. For example, if the received JLOSS instruction is a JLOSS.rm instruction, then it is determined if a read monitored line has been conflictingly accessed by an external write. However, as stated above, any variant of JLOSS may be received, including a JLOSS instruction that allows the user to specify conditions in a control register.
  • information in a transaction status register, such as TSR 915, is utilized to determine if the conditions are satisfied.
  • TSR 915 may include a read monitor status flag, which by default is set to a no conflict value and is updated to a conflict value to indicate a conflict has occurred.
  • a status register is not the only way for determining if a conflict has occurred, and in fact, any known method for determining a loss or conflict may be utilized.
  • given the ability of hardware to also hold metadata that software is able to use for acceleration, software may want a commit instruction to fail if hardware detected any conflicts.
  • a commit instruction may be desirable to clear different combinations of information held in hardware for a transaction, such as metadata, monitors, and buffered lines.
  • hardware supports multiple forms of a commit instruction to allow the commit instruction to specify both the conditions for commit and the information to clear upon commit.
  • commit instruction 1105 includes an opcode 1110, which is recognizable as part of a processor's ISA - decoders 1115 are able to decode opcode 1110.
  • opcode 1110 includes two portions: commit conditions 1111 and clear control 1112. Commit conditions 1111 are to specify the conditions for a transaction to commit, while commit clear control 1112 specifies the information to clear upon commit of a transaction.
  • both portions include four values: read monitoring (RM), write monitoring (WM), buffering (BUF), and metadata (MD).
  • if any of the four values in portion 1111 is set, i.e. includes a value to indicate that the associated attribute/property is a commit condition, then the corresponding property is a condition for commit.
  • if the first bit of conditions 1111, corresponding to read monitor information, is set, then the loss of any read monitoring data from monitors 1135 associated with the transaction results in an abort - no commit, as a specified condition of the commit instruction failed.
  • if a value in clear control portion 1112 is set, the corresponding property is cleared upon commit. For example, if RM in portion 1112 is set, then the read monitor information in monitors 1135 for the transaction is cleared when the transaction is committed.
  • As a first example, a Txcomwm instruction is discussed. This instruction ends the transaction and makes all write-monitored buffered data globally visible if no write monitored data has been lost (success); otherwise, it fails if write monitored data has been lost. Txcomwm sets (or resets) a flag to indicate success (or failure). On success, Txcomwm clears the buffered state of all write monitored data. Txcomwm does not affect read or write monitoring state, allowing software to re-use such state in subsequent transactions; it also does not affect the state of locations that are buffered but not write monitored, allowing software to persist information kept in such locations.
  • the pseudo code below labeled pseudo code C illustrates an algorithmic description of Txcomwm.
  • When TSR.LOSS_WM is 0, the BF property of all write monitored buffered BBLKs is atomically cleared and all such buffered data becomes visible to other agents; TCR.IN_TX is cleared; buffered blocks that lack WM are not affected and remain buffered; and the CF flag is set upon completion. When TSR.LOSS_WM is 1, the CF flag is cleared and TCR.IN_TX is cleared. In other words, the CF flag is set to 1 if the operation succeeded and set to 0 for failure. The OF, SF, ZF, AF, and PF flags are set to 0.
  • Pseudo Code D shows how a HASTM system is able to use the Txcomwm instruction to commit a transaction that uses hardware write buffering to avoid undo logging in an in-place update STM.
  • the CACHE_RESIDENT_WRITES flag indicates this execution mode.
  • Txcomwmrm extends the Txcomwm instruction so that it fails if any read monitored locations have also been lost. This variant is useful for transactions that use only hardware to detect read-set conflicts.
  • the pseudo code below labeled pseudo code E illustrates an algorithmic description of Txcomwmrm.
  • When TSR.LOSS_WM or TSR.LOSS_RM is 1, the CF flag is cleared and TCR.IN_TX is cleared. The CF flag is set to 1 if the operation succeeded and cleared to 0 for failure. The OF, SF, ZF, AF, and PF flags are set to 0.
  • Pseudo Code F shows the commit algorithm utilizing txcomwmrm instruction for an STM system that uses hardware both to buffer transactional writes and to detect read-set conflicts.
  • the HW_READ_MONITORING flag indicates whether the algorithm uses only hardware for read-set conflict detection.
  • Pseudo Code F An embodiment of pseudo code utilizing txcomwmrm instruction
  • When TSR.LOSS_WM and TSR.LOSS_IRM are 0, the BF property of all write monitored buffered BBLKs is atomically cleared and all such buffered data becomes visible to other agents. RM, WM, and IRM, as well as TCR.IN_TX, are cleared. Buffered blocks that lack WM are not affected and remain buffered. The CF flag is set upon completion.
  • When TSR.LOSS_WM or TSR.LOSS_IRM is 1, the CF flag is cleared and TCR.IN_TX is cleared. The CF flag is set to 1 if the operation succeeded and cleared to 0 for failure. The OF, SF, ZF, AF, and PF flags are set to 0.
  • Pseudo Code G: An embodiment of an algorithmic description for the Txcomwmirm instruction
  • a commit instruction is received.
  • a compiler may insert a commit instruction in program code.
  • a call to a commit function may be inserted in main code, with the commit function, such as those included above in pseudo code, provided in a library; a compiler may also insert the commit instruction into the commit function within the library.
  • decoders are capable of decoding the commit instruction. From the decoded information, the conditions specified by the opcode of the commit instruction are determined in flow 1210.
  • the opcode may set some flags and reset others to indicate which conditions are to be utilized for commit. If the conditions are not satisfied, then false is returned and the transaction may be separately aborted. However, if the conditions for commit are satisfied, such as any combination of no loss of read monitors, write monitors, metadata, and/or buffering, then in flow 1215 the clear conditions/control are determined. As an example, any combination of read monitors, write monitors, metadata, and/or buffering for the transaction is determined to be cleared. The information so determined is then cleared in flow 1225.
  • Optimized Memory Management for UTM As discussed above, Unbounded transactional memory (UTM) architecture and its hardware implementation extend the processor architecture by introducing the following properties: monitoring, buffering and metadata. These combined provide software the means necessary to implement a variety of sophisticated algorithms, including a wide spectrum of transactional memory designs. Each property may be implemented in hardware by either extending the existing cache protocols in the cache implementation or allocating independent new hardware resources.
  • UTM architecture and its hardware implementations potentially provide a performance boost over a software-only solution (STM) on transactions if they are able to effectively avoid and minimize incidents such as UTM transaction aborts and subsequent transaction retry operations.
  • a current privilege level (CPL) based suspension mechanism makes a hardware transaction active (enabling a hardware accelerated transaction with UTM properties such as buffering and monitoring, and enabling the ejection mechanism) while the processor is operating at privilege level 3 (user mode). Any ring transition from ring 3 causes the currently active transaction to be automatically suspended (stopping generation of UTM properties and disabling the ejection mechanism). Similarly, any ring transition back to ring 3 automatically resumes the previously suspended hardware transaction if it was active.
  • the potential downside of this approach is that use of the hardware transactional memory resources in kernel code, or at any ring level other than ring 3, is mostly precluded.
  • Figure 13 illustrates an embodiment of hardware to support handling privilege level transitions during execution of transactions, which enables ring 0 transactions on top of user mode (ring 3) transactions, and also provides for the OS and a hypervisor, such as a Virtual Machine Monitor (VMM), to handle infinite levels of nested interrupts and NMI cases in the presence of ring 0 transactions.
  • a storage element such as EFLAGS register 1310, includes transaction enable field (TEF) 1311.
  • When TEF 1311 holds an active value it indicates that a transaction is currently active and enabled, while when TEF 1311 holds an inactive value it indicates that the transaction is suspended.
  • a transaction begin operation, or other operation at a start of a transaction sets the TEF field 1311 to the active value.
  • in response to a ring level transition event at flow 1300, such as an interrupt, exception, system call, exit of a virtual machine, or entry into a virtual machine, the state of EFLAGS register 1310 is pushed onto kernel stack 1320 in flow 1301.
  • the TEF field 1311 is cleared/updated to the inactive value to suspend the transaction.
  • the ring level transition event is handled or serviced appropriately while the transaction is suspended.
  • the state of EFLAGS register 1310, which was pushed onto the stack at flow 1301, is popped at flow 1304 to restore EFLAGS 1310 with the previous state.
  • the restore of the previous state returns TEF 1311 to the active value and resumes the transaction as active and enabled.
  • upon a ring transition into the kernel, the processor pushes the EFLAGS register onto the kernel stack and clears the "Transaction Enable" bit if it is set, suspending the previously enabled transaction.
  • upon return, the processor restores the entire EFLAGS register state for the interrupted thread, including the "Transaction Enable" bit, from the kernel stack, un-suspending the transaction if it was previously enabled.
  • Upon VM-Exit, the processor saves the EFLAGS register of the guest, including the "Transaction Enable" bit state, into the Virtual Machine Control Structure (VMCS) and loads the EFLAGS register state of the host, whose "Transaction Enable" bit state is clear, suspending the previously enabled transaction of the guest if it was enabled.
  • Upon VM-Enter, the processor restores the EFLAGS register of the guest, including the "Transaction Enable" bit state, from the VMCS, un-suspending the previously enabled transaction of the guest if it was enabled.
  • This enables kernel mode (ring 0) hardware accelerated UTM transactions on top of user mode (ring 3) hardware accelerated UTM transactions, and also provides ways for both the OS and the VMM to handle infinite levels of nested interrupts and NMI cases in the presence of ring 0 transactions. None of the prior art provided such mechanisms.
  • a module as used herein refers to any hardware, software, firmware, or a combination thereof. Often module boundaries that are illustrated as separate commonly vary and potentially overlap. For example, a first and a second module may share hardware, software, firmware, or a combination thereof, while potentially retaining some independent hardware, software, or firmware.
  • use of the term logic includes hardware, such as transistors, registers, or other hardware, such as programmable logic devices. However, in another embodiment, logic also includes software or code integrated with hardware, such as firmware or micro-code.
  • a value includes any known representation of a number, a state, a logical state, or a binary logical state. Often, the use of logic levels, logic values, or logical values is also referred to as 1's and 0's, which simply represent binary logic states. For example, a 1 refers to a high logic level and 0 refers to a low logic level.
  • a storage cell such as a transistor or flash cell, may be capable of holding a single logical value or multiple logical values.
  • the decimal number ten may also be represented as a binary value of 1010 and a hexadecimal letter A. Therefore, a value includes any representation of information capable of being held in a computer system.
  • states may be represented by values or portions of values.
  • reset and set, in one embodiment, refer to a default and an updated value or state, respectively.
  • a default value potentially includes a high logical value, i.e. reset
  • an updated value potentially includes a low logical value, i.e. set.
  • any combination of values may be utilized to represent any number of states.
  • a machine-accessible/readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine, such as a computer or electronic system.
  • a machine-accessible medium includes random-access memory (RAM), such as static RAM (SRAM) or dynamic RAM (DRAM); ROM; magnetic or optical storage media; flash memory devices; electrical, optical, or acoustical storage devices; other forms of storage devices for propagated signals (e.g., carrier waves, infrared signals, digital signals); etc.
  • a machine may access a storage device through receiving a propagated signal, such as a carrier wave, from a medium capable of holding the information to be transmitted on the propagated signal.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Advance Control (AREA)

Abstract

A method and apparatus for optimizing an unbounded transactional memory (UTM) system is described. Hardware support for monitors, buffering, and metadata is provided, where orthogonal metaphysical address spaces for metadata may be separately associated with threads and/or software subsystems within threads. Moreover, metadata may be held by hardware in a manner that is compressed with respect to the data, transparently to software. Furthermore, in response to metadata access instructions/operations, hardware is capable of supporting a forced metadata value to enable multiple transactional execution modes. However, if monitors, buffered data, metadata, or other information are lost, or if conflicts are detected, the hardware provides a variant of a loss instruction that is capable of polling a transaction status register for such loss or conflict and vectoring execution to a label in response to detecting the loss or conflict. Similarly, multiple variants of a commit instruction are provided to allow software to define the conditions for commit and the information to be cleared upon commit. In addition, the hardware provides support to enable suspension and resumption of transactions upon ring-level transitions.
PCT/US2009/048947 2009-06-26 2009-06-26 Optimisations pour un système à mémoire transactionnelle non limitée (utm) WO2010151267A1 (fr)

Priority Applications (7)

Application Number Priority Date Filing Date Title
GB1119084.0A GB2484416B (en) 2009-06-26 2009-06-26 Optimizations for an unbounded transactional memory (utm) system
CN200980160097.XA CN102460376B (zh) 2009-06-26 2009-06-26 无约束事务存储器(utm)系统的优化
JP2012516043A JP5608738B2 (ja) 2009-06-26 2009-06-26 無制限トランザクショナルメモリ(utm)システムの最適化
PCT/US2009/048947 WO2010151267A1 (fr) 2009-06-26 2009-06-26 Optimisations pour un système à mémoire transactionnelle non limitée (utm)
BRPI0925055-7A BRPI0925055A2 (pt) 2009-06-26 2009-06-26 "otimizações para um sistema de memória transacional ilimitada (utm)"
DE112009005006T DE112009005006T5 (de) 2009-06-26 2009-06-26 Optimierungen für ein ungebundenes transaktionales Speichersystem (UTM)
KR1020117031098A KR101370314B1 (ko) 2009-06-26 2009-06-26 언바운디드 트랜잭션 메모리 (utm) 시스템의 최적화

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2009/048947 WO2010151267A1 (fr) 2009-06-26 2009-06-26 Optimisations pour un système à mémoire transactionnelle non limitée (utm)

Publications (1)

Publication Number Publication Date
WO2010151267A1 true WO2010151267A1 (fr) 2010-12-29

Family

ID=43386805

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/048947 WO2010151267A1 (fr) 2009-06-26 2009-06-26 Optimisations pour un système à mémoire transactionnelle non limitée (utm)

Country Status (7)

Country Link
JP (1) JP5608738B2 (fr)
KR (1) KR101370314B1 (fr)
CN (1) CN102460376B (fr)
BR (1) BRPI0925055A2 (fr)
DE (1) DE112009005006T5 (fr)
GB (1) GB2484416B (fr)
WO (1) WO2010151267A1 (fr)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8095824B2 (en) 2009-12-15 2012-01-10 Intel Corporation Performing mode switching in an unbounded transactional memory (UTM) system
US8316194B2 (en) 2009-12-15 2012-11-20 Intel Corporation Mechanisms to accelerate transactions using buffered stores
CN102830953A (zh) * 2012-08-02 2012-12-19 中兴通讯股份有限公司 指令处理方法及网络处理器指令处理装置
US8521995B2 (en) 2009-12-15 2013-08-27 Intel Corporation Handling operating system (OS) transitions in an unbounded transactional memory (UTM) mode
US9436477B2 (en) 2012-06-15 2016-09-06 International Business Machines Corporation Transaction abort instruction
US9442738B2 (en) 2012-06-15 2016-09-13 International Business Machines Corporation Restricting processing within a processor to facilitate transaction completion
US9448797B2 (en) 2012-06-15 2016-09-20 International Business Machines Corporation Restricted instructions in transactional execution
WO2016154115A1 (fr) * 2015-03-20 2016-09-29 Mill Computing, Inc. Mécanismes de sécurité de cpu utilisant des domaines de protection spécifique de fil
CN106030532A (zh) * 2014-03-26 2016-10-12 英特尔公司 用于事务存储器程序的软件回放器
US9477514B2 (en) 2012-06-15 2016-10-25 International Business Machines Corporation Transaction begin/end instructions
US9477515B2 (en) 2009-12-15 2016-10-25 Intel Corporation Handling operating system (OS) transitions in an unbounded transactional memory (UTM) mode
TWI564808B (zh) * 2012-06-15 2017-01-01 萬國商業機器公司 異動處理中之選擇控制指令執行
TWI574207B (zh) * 2012-06-15 2017-03-11 萬國商業機器公司 非異動儲存指令
CN106662998A (zh) * 2014-12-31 2017-05-10 华为技术有限公司 事务冲突检测方法、装置及计算机系统
US9740521B2 (en) 2012-06-15 2017-08-22 International Business Machines Corporation Constrained transaction execution
US9740549B2 (en) 2012-06-15 2017-08-22 International Business Machines Corporation Facilitating transaction completion subsequent to repeated aborts of the transaction
US9747218B2 (en) 2015-03-20 2017-08-29 Mill Computing, Inc. CPU security mechanisms employing thread-specific protection domains
US9760432B2 (en) 2015-07-28 2017-09-12 Futurewei Technologies, Inc. Intelligent code apparatus, method, and computer program for memory
US9766925B2 (en) 2012-06-15 2017-09-19 International Business Machines Corporation Transactional processing
US9792125B2 (en) 2012-06-15 2017-10-17 International Business Machines Corporation Saving/restoring selected registers in transactional processing
US9811337B2 (en) 2012-06-15 2017-11-07 International Business Machines Corporation Transaction abort processing
US9921754B2 (en) 2015-07-28 2018-03-20 Futurewei Technologies, Inc. Dynamic coding algorithm for intelligent coded memory system
US10180803B2 (en) 2015-07-28 2019-01-15 Futurewei Technologies, Inc. Intelligent memory architecture for increased efficiency
US10223214B2 (en) 2012-06-15 2019-03-05 International Business Machines Corporation Randomized testing within transactional execution
GB2568059A (en) * 2017-11-02 2019-05-08 Advanced Risc Mach Ltd Method for locating metadata
US10430199B2 (en) 2012-06-15 2019-10-01 International Business Machines Corporation Program interruption filtering in transactional execution

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9547594B2 (en) * 2013-03-15 2017-01-17 Intel Corporation Instructions to mark beginning and end of non transactional code region requiring write back to persistent storage
US9710245B2 (en) * 2014-04-04 2017-07-18 Qualcomm Incorporated Memory reference metadata for compiler optimization
US9195593B1 (en) * 2014-09-27 2015-11-24 Oracle International Corporation Hardware assisted object memory migration
US9952987B2 (en) * 2014-11-25 2018-04-24 Intel Corporation Posted interrupt architecture
US9451307B2 (en) * 2014-12-08 2016-09-20 Microsoft Technology Licensing, Llc Generating recommendations based on processing content item metadata tags
US11086521B2 (en) * 2015-01-20 2021-08-10 Ultrata, Llc Object memory data flow instruction execution
GB2539428B (en) 2015-06-16 2020-09-09 Advanced Risc Mach Ltd Data processing apparatus and method with ownership table
GB2539429B (en) 2015-06-16 2017-09-06 Advanced Risc Mach Ltd Address translation
GB2539433B8 (en) 2015-06-16 2018-02-21 Advanced Risc Mach Ltd Protected exception handling
US10019360B2 (en) * 2015-09-26 2018-07-10 Intel Corporation Hardware predictor using a cache line demotion instruction to reduce performance inversion in core-to-core data transfers
GB2543306B (en) * 2015-10-14 2019-05-01 Advanced Risc Mach Ltd Exception handling
US10437480B2 (en) 2015-12-01 2019-10-08 Futurewei Technologies, Inc. Intelligent coded memory architecture with enhanced access scheduler
US9996471B2 (en) * 2016-06-28 2018-06-12 Arm Limited Cache with compressed data and tag
US10191936B2 (en) * 2016-10-31 2019-01-29 Oracle International Corporation Two-tier storage protocol for committing changes in a storage system
CN106411945B (zh) * 2016-11-25 2019-08-06 杭州迪普科技股份有限公司 一种Web的访问方法和装置
US10120805B2 (en) * 2017-01-18 2018-11-06 Intel Corporation Managing memory for secure enclaves
US10579377B2 (en) * 2017-01-19 2020-03-03 International Business Machines Corporation Guarded storage event handling during transactional execution
US10324857B2 (en) * 2017-01-26 2019-06-18 Intel Corporation Linear memory address transformation and management
US10795836B2 (en) * 2017-04-17 2020-10-06 Microsoft Technology Licensing, Llc Data processing performance enhancement for neural networks using a virtualized data iterator
GB2562062B (en) * 2017-05-02 2019-08-14 Advanced Risc Mach Ltd An apparatus and method for managing capability metadata
US10732634B2 (en) * 2017-07-03 2020-08-04 Baidu Us Llc Centralized scheduling system using event loop for operating autonomous driving vehicles
SG11202007272QA (en) * 2018-02-02 2020-08-28 Charles Stark Draper Laboratory Inc Systems and methods for policy execution processing
GB2573558B (en) * 2018-05-10 2020-09-02 Advanced Risc Mach Ltd A technique for managing a cache structure in a system employing transactional memory
US10783031B2 (en) * 2018-08-20 2020-09-22 Arm Limited Identifying read-set information based on an encoding of replaceable-information values
US10866890B2 (en) * 2018-11-07 2020-12-15 Arm Limited Method and apparatus for implementing lock-free data structures
CN112306956B (zh) * 2019-07-31 2024-04-12 伊姆西Ip控股有限责任公司 用于元数据维护的方法、装置和计算机程序产品
GB2588134B (en) * 2019-10-08 2021-12-01 Imagination Tech Ltd Verification of hardware design for data transformation component
KR102740976B1 (ko) * 2020-04-07 2024-12-11 에스케이하이닉스 주식회사 데이터 처리 시스템, 이를 위한 메모리 컨트롤러 및 그 동작 방법
CN111552619B (zh) * 2020-04-29 2021-05-25 深圳市道旅旅游科技股份有限公司 日志数据展示方法、装置、计算机设备及存储介质
US11372548B2 (en) * 2020-05-29 2022-06-28 Nvidia Corporation Techniques for accessing and utilizing compressed data and its state information
CN114064302B (zh) * 2020-07-30 2024-05-14 华为技术有限公司 一种进程间通信的方法及装置
CN117056157B (zh) * 2023-10-11 2024-01-23 沐曦集成电路(上海)有限公司 一种寄存器层次化验证方法、存储介质和电子设备

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070156994A1 (en) * 2005-12-30 2007-07-05 Akkary Haitham H Unbounded transactional memory systems
US20070198519A1 (en) * 2006-02-22 2007-08-23 David Dice Methods and apparatus to implement parallel transactions
US20080040511A1 (en) * 2006-08-11 2008-02-14 Samsung Electronics Co., Ltd. Method and system for content synchronization and detecting synchronization recursion in networks

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07182241A (ja) * 1993-12-22 1995-07-21 Toshiba Corp キャッシュメモリ制御装置
US7363474B2 (en) * 2001-12-31 2008-04-22 Intel Corporation Method and apparatus for suspending execution of a thread until a specified memory access occurs
US7991965B2 (en) * 2006-02-07 2011-08-02 Intel Corporation Technique for using memory attributes
US7376807B2 (en) * 2006-02-23 2008-05-20 Freescale Semiconductor, Inc. Data processing system having address translation bypass and method therefor
JPWO2008155849A1 (ja) * 2007-06-20 2010-08-26 富士通株式会社 演算処理装置、tlb制御方法、tlb制御プログラムおよび情報処理装置
KR101639672B1 (ko) * 2010-01-05 2016-07-15 삼성전자주식회사 무한 트랜잭션 메모리 시스템 및 그 동작 방법

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070156994A1 (en) * 2005-12-30 2007-07-05 Akkary Haitham H Unbounded transactional memory systems
US20070198519A1 (en) * 2006-02-22 2007-08-23 David Dice Methods and apparatus to implement parallel transactions
US20080040511A1 (en) * 2006-08-11 2008-02-14 Samsung Electronics Co., Ltd. Method and system for content synchronization and detecting synchronization recursion in networks

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9477515B2 (en) 2009-12-15 2016-10-25 Intel Corporation Handling operating system (OS) transitions in an unbounded transactional memory (UTM) mode
US8316194B2 (en) 2009-12-15 2012-11-20 Intel Corporation Mechanisms to accelerate transactions using buffered stores
US8095824B2 (en) 2009-12-15 2012-01-10 Intel Corporation Performing mode switching in an unbounded transactional memory (UTM) system
US8365016B2 (en) 2009-12-15 2013-01-29 Intel Corporation Performing mode switching in an unbounded transactional memory (UTM) system
US8521995B2 (en) 2009-12-15 2013-08-27 Intel Corporation Handling operating system (OS) transitions in an unbounded transactional memory (UTM) mode
US8856466B2 (en) 2009-12-15 2014-10-07 Intel Corporation Mechanisms to accelerate transactions using buffered stores
US8886894B2 (en) 2009-12-15 2014-11-11 Intel Corporation Mechanisms to accelerate transactions using buffered stores
US9069670B2 (en) 2009-12-15 2015-06-30 Intel Corporation Mechanisms to accelerate transactions using buffered stores
US9195600B2 (en) 2009-12-15 2015-11-24 Intel Corporation Mechanisms to accelerate transactions using buffered stores
US10223214B2 (en) 2012-06-15 2019-03-05 International Business Machines Corporation Randomized testing within transactional execution
US10684863B2 (en) 2012-06-15 2020-06-16 International Business Machines Corporation Restricted instructions in transactional execution
US9442737B2 (en) 2012-06-15 2016-09-13 International Business Machines Corporation Restricting processing within a processor to facilitate transaction completion
US9448797B2 (en) 2012-06-15 2016-09-20 International Business Machines Corporation Restricted instructions in transactional execution
US9448796B2 (en) 2012-06-15 2016-09-20 International Business Machines Corporation Restricted instructions in transactional execution
US11080087B2 (en) 2012-06-15 2021-08-03 International Business Machines Corporation Transaction begin/end instructions
US10353759B2 (en) 2012-06-15 2019-07-16 International Business Machines Corporation Facilitating transaction completion subsequent to repeated aborts of the transaction
US9477514B2 (en) 2012-06-15 2016-10-25 International Business Machines Corporation Transaction begin/end instructions
US9436477B2 (en) 2012-06-15 2016-09-06 International Business Machines Corporation Transaction abort instruction
US9529598B2 (en) 2012-06-15 2016-12-27 International Business Machines Corporation Transaction abort instruction
TWI564808B (zh) * 2012-06-15 2017-01-01 International Business Machines Corporation Selectively controlling instruction execution in transactional processing
TWI574207B (zh) * 2012-06-15 2017-03-11 International Business Machines Corporation Nontransactional store instruction
US9442738B2 (en) 2012-06-15 2016-09-13 International Business Machines Corporation Restricting processing within a processor to facilitate transaction completion
US9740521B2 (en) 2012-06-15 2017-08-22 International Business Machines Corporation Constrained transaction execution
US9740549B2 (en) 2012-06-15 2017-08-22 International Business Machines Corporation Facilitating transaction completion subsequent to repeated aborts of the transaction
US10437602B2 (en) 2012-06-15 2019-10-08 International Business Machines Corporation Program interruption filtering in transactional execution
US10719415B2 (en) 2012-06-15 2020-07-21 International Business Machines Corporation Randomized testing within transactional execution
US10430199B2 (en) 2012-06-15 2019-10-01 International Business Machines Corporation Program interruption filtering in transactional execution
US9766925B2 (en) 2012-06-15 2017-09-19 International Business Machines Corporation Transactional processing
US9772854B2 (en) 2012-06-15 2017-09-26 International Business Machines Corporation Selectively controlling instruction execution in transactional processing
US9792125B2 (en) 2012-06-15 2017-10-17 International Business Machines Corporation Saving/restoring selected registers in transactional processing
US9811337B2 (en) 2012-06-15 2017-11-07 International Business Machines Corporation Transaction abort processing
US9851978B2 (en) 2012-06-15 2017-12-26 International Business Machines Corporation Restricted instructions in transactional execution
US9858082B2 (en) 2012-06-15 2018-01-02 International Business Machines Corporation Restricted instructions in transactional execution
US10606597B2 (en) 2012-06-15 2020-03-31 International Business Machines Corporation Nontransactional store instruction
US10599435B2 (en) 2012-06-15 2020-03-24 International Business Machines Corporation Nontransactional store instruction
US9983881B2 (en) 2012-06-15 2018-05-29 International Business Machines Corporation Selectively controlling instruction execution in transactional processing
US9983883B2 (en) 2012-06-15 2018-05-29 International Business Machines Corporation Transaction abort instruction specifying a reason for abort
US9983915B2 (en) 2012-06-15 2018-05-29 International Business Machines Corporation Facilitating transaction completion subsequent to repeated aborts of the transaction
US9983882B2 (en) 2012-06-15 2018-05-29 International Business Machines Corporation Selectively controlling instruction execution in transactional processing
US9996360B2 (en) 2012-06-15 2018-06-12 International Business Machines Corporation Transaction abort instruction specifying a reason for abort
US10558465B2 (en) 2012-06-15 2020-02-11 International Business Machines Corporation Restricted instructions in transactional execution
US10185588B2 (en) 2012-06-15 2019-01-22 International Business Machines Corporation Transaction begin/end instructions
CN102830953A (zh) * 2012-08-02 2012-12-19 ZTE Corporation Instruction processing method and network processor instruction processing apparatus
CN102830953B (zh) * 2012-08-02 2017-08-25 ZTE Corporation Instruction processing method and network processor instruction processing apparatus
CN106030532A (zh) * 2014-03-26 2016-10-12 Intel Corporation Software replayer for transactional memory programs
CN106662998A (zh) * 2014-12-31 2017-05-10 Huawei Technologies Co., Ltd. Transaction conflict detection method and apparatus, and computer system
EP3232320A4 (fr) * 2014-12-31 2018-01-24 Huawei Technologies Co. Ltd. Transaction conflict detection method and apparatus, and computer system
US10678700B2 (en) 2015-03-20 2020-06-09 Mill Computing, Inc. CPU security mechanisms employing thread-specific protection domains
US9747218B2 (en) 2015-03-20 2017-08-29 Mill Computing, Inc. CPU security mechanisms employing thread-specific protection domains
WO2016154115A1 (fr) * 2015-03-20 2016-09-29 Mill Computing, Inc. CPU security mechanisms employing thread-specific protection domains
US10180803B2 (en) 2015-07-28 2019-01-15 Futurewei Technologies, Inc. Intelligent memory architecture for increased efficiency
US9921754B2 (en) 2015-07-28 2018-03-20 Futurewei Technologies, Inc. Dynamic coding algorithm for intelligent coded memory system
US9760432B2 (en) 2015-07-28 2017-09-12 Futurewei Technologies, Inc. Intelligent code apparatus, method, and computer program for memory
GB2568059A (en) * 2017-11-02 2019-05-08 Advanced Risc Mach Ltd Method for locating metadata
GB2568059B (en) * 2017-11-02 2020-04-08 Advanced Risc Mach Ltd Method for locating metadata
US11334499B2 (en) 2017-11-02 2022-05-17 Arm Limited Method for locating metadata

Also Published As

Publication number Publication date
GB2484416B (en) 2015-02-25
GB201119084D0 (en) 2011-12-21
GB2484416A (en) 2012-04-11
BRPI0925055A2 (pt) 2015-07-28
JP5608738B2 (ja) 2014-10-15
CN102460376B (zh) 2016-05-18
CN102460376A (zh) 2012-05-16
JP2012530960A (ja) 2012-12-06
DE112009005006T5 (de) 2013-01-10
KR20130074726A (ko) 2013-07-04
KR101370314B1 (ko) 2014-03-05

Similar Documents

Publication Publication Date Title
KR101370314B1 (ko) Optimizations for an unbounded transactional memory (UTM) system
US8769212B2 (en) Memory model for hardware attributes within a transactional memory system
US9785462B2 (en) Registering a user-handler in hardware for transactional memory event handling
JP6342970B2 (ja) Read and write monitoring attributes in transactional memory (TM) systems
US10387324B2 (en) Method, apparatus, and system for efficiently handling multiple virtual address mappings during transactional execution canceling the transactional execution upon conflict between physical addresses of transactional accesses within the transactional execution
JP5860450B2 (ja) Extending cache coherence protocols to support locally buffered data
US8612950B2 (en) Dynamic optimization for removal of strong atomicity barriers
US10210018B2 (en) Optimizing quiescence in a software transactional memory (STM) system
US9274855B2 (en) Optimization for safe elimination of weak atomicity overhead
US8719514B2 (en) Software filtering in a transactional memory system
US20110145516A1 (en) Using buffered stores or monitoring to filter redundant transactional accesses and mechanisms for mapping data to buffered metadata
WO2010077842A2 (fr) Metaphysical address space for holding lossy metadata in hardware
JP6023765B2 (ja) Optimization of an unbounded transactional memory (UTM) system
JP6318440B2 (ja) Optimization of an unbounded transactional memory (UTM) system
GB2519877A (en) Optimizations for an unbounded transactional memory (UTM) system

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200980160097.X

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09846640

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2012516043

Country of ref document: JP

ENP Entry into the national phase

Ref document number: 1119084

Country of ref document: GB

Kind code of ref document: A

Free format text: PCT FILING DATE = 20090626

WWE Wipo information: entry into national phase

Ref document number: 1119084.0

Country of ref document: GB

WWE Wipo information: entry into national phase

Ref document number: 9218/DELNP/2011

Country of ref document: IN

ENP Entry into the national phase

Ref document number: 20117031098

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 112009005006

Country of ref document: DE

Ref document number: 1120090050069

Country of ref document: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09846640

Country of ref document: EP

Kind code of ref document: A1

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: PI0925055

Country of ref document: BR

ENP Entry into the national phase

Ref document number: PI0925055

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20111226
