
WO2002003194A2 - Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program - Google Patents

Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program

Info

Publication number
WO2002003194A2
WO2002003194A2 (PCT/US2001/018614)
Authority
WO
WIPO (PCT)
Prior art keywords
code
source program
sequence
instruction
invocation
Prior art date
Application number
PCT/US2001/018614
Other languages
French (fr)
Other versions
WO2002003194A3 (en)
Inventor
Knud Kirkegaard
Milind Girkar
Paul Grey
Xinmin Tian
Original Assignee
Intel Corporation
Priority date
Filing date
Publication date
Application filed by Intel Corporation
Priority to AU2001266796A1
Priority to DE10196389T1
Priority to GB2381356B
Publication of WO2002003194A2
Publication of WO2002003194A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/456 Parallelism detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation



Abstract

A method and apparatus for compiling a source program are described. Multiple predetermined sequences within the source program are located. A start code is inserted in the source program prior to a first instruction of each predetermined sequence. An invocation code is inserted in the source program prior to the start code, the invocation code addressing the start code and transferring each sequence to a system for execution. Finally, a stop code is inserted in the source program after a last instruction of each sequence, the stop code signaling to the system to stop execution of the sequence.

Description

MULTI-ENTRY THREADING METHOD AND APPARATUS FOR AUTOMATIC AND DIRECTIVE-GUIDED PARALLELIZATION OF A
SOURCE PROGRAM
FIELD OF THE INVENTION
The present invention relates generally to compiler optimization techniques and, more specifically, to a multi-entry threading method and
apparatus for automatic and directive-guided parallelization of a source program.
BACKGROUND OF THE INVENTION The ever-increasing complexity of computational problems that require a solution is reflected in the increase in the speed and performance of computers designed to solve such problems. As CPUs become faster and multiprocessor
system configurations multiply, program performance must be improved and
efficient methods for compiling and executing source programs must be used.
Parallel processing is rapidly becoming mainstream technology influencing
architecture and software design from the home computer market to business applications. Parallel applications are executed by a multiple processor computer system which includes a plurality of processors interconnected so as to exchange
data with each other. Figure 1A is a block diagram of a distributed-memory multiple processor
computer system. As illustrated in Figure 1A, a computer system 100 includes
multiple processing modules 120. Each processing module 120 includes a processor 122 and a memory 124. In the computer system 100, any number of processing modules can be interconnected as shown.
Figure 1B is a block diagram of a shared-memory multiple processor
computer system. As illustrated in Figure 1B, a computer system 150 includes multiple processors 160 connected to a shared memory 170. In one embodiment,
memory 170 includes exclusive areas occupied by each processor 160 and a common area accessed by all processors. In the computer system 150, only a limited number of processors 160 may be interconnected, due to the limitations imposed by the shared memory 170.
Parallel processing methods use automatic tools, such as automatic
parallelizing compilers, which compile the source programs and facilitate parallel
processing of the programs. A compiler looks at the entire source program, collects and reorganizes the instructions, and translates the source program into
object code executable by the computer. One compiler technique involves use of outlining technology, which
transforms selected regions of a program into outlined or separate subroutines.
Each outlined subroutine is then sent to one thread in a processor for parallel
execution. Parallelization using outlining technology is described in detail in Jyh-Herng Chow et al., Automatic Parallelization for Symmetric Shared-Memory
Multiprocessors, Proceedings of CASCON '96, Toronto, Canada, 12-14 November, 1996. However, parallelization of a source program using outlining technology
increases the complexity of the compiler's generation of multi-threaded code. Because the original code is split into separate subroutines, a number of scalar optimizations, which were originally applied to one single subroutine, will have to be invoked for several different subroutines, creating a procedure that generates less efficient code and is time-consuming.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example and not limitation
in the figures of the accompanying drawings, in which like references indicate
similar elements and in which:
Figure 1A is a block diagram of one embodiment for a distributed-memory multiple processor computer system.
Figure 1B is a block diagram of one embodiment for a shared-memory multiple processor computer system.
Figure 2 is a block diagram of one embodiment for a computer system.
Figure 3A is a block diagram of one embodiment for a process of obtaining an executable program in a computer system.
Figure 3B is a block diagram of one embodiment for a process of obtaining a parallel executable program in a computer system.
Figure 4 is a flow diagram of one embodiment for a multi-entry threading method for automatic and directive-guided parallelization of a source program.
DETAILED DESCRIPTION
In the following detailed description of embodiments of the invention, reference is made to the accompanying drawings in which like references indicate
similar elements, and in which are shown by way of illustration specific embodiments in which the invention may be practiced.
Numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one
skilled in the art that the present invention may be practiced without these
specific details. In some instances, well-known structures and devices are shown
in block diagram form, rather than in detail, in order to avoid obscuring the present invention. These embodiments are described in sufficient detail to enable
those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical, electrical and
other changes may be made without departing from the scope of the present
invention.
Some portions of the detailed descriptions that follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively
convey the substance of their work to others skilled in the art. An algorithm is
here, and generally, conceived to be a self-consistent sequence of acts leading to a
desired result. The acts are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise
as apparent from the following discussion, it is appreciated that throughout the
description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and
processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities
within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or
registers or other such information storage, transmission or display devices.
The present invention also relates to an apparatus for performing the
operations herein. This apparatus may be specially constructed for the required
purposes, or it may comprise a general purpose computer, selectively activated or reconfigured by a computer program stored in the computer. Such a computer
program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. The algorithms and displays presented herein are not inherently related to
any particular computer or other apparatus. Various general purpose systems
may be used with programs in accordance with the teachings herein, or it may
prove convenient to construct more specialized apparatus to perform the required
method. For example, any of the methods according to the present invention can be implemented in hard-wired circuitry, by programming a general purpose processor or by any combination of hardware and software. One of skill in the art
will immediately appreciate that the invention can be practiced with computer system configurations other than those described below, including hand-held
devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and
the like. The invention can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. The required structure for a variety
of these systems will appear from the description below.
The methods of the invention are described in terms of computer software.
If written in a programming language conforming to a recognized standard,
sequences of instructions designed to implement the methods can be compiled for execution on a variety of hardware platforms and for interface to a variety of
operating systems. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a
variety of programming languages may be used to implement the teachings of the invention as described herein. Furthermore, it is common in the art to speak of
software, in one form or another (e.g., program, procedure, application...), as taking an action or causing a result. Such expressions are merely a shorthand way of saying that execution of the software by a computer causes the processor of the
computer to perform an action or produce a result.
Figure 2 is a block diagram of one embodiment for a computer system 200.
Computer system 200 includes a system bus 201, or other communications module similar to the system bus, for communicating information, and a processing module, such as processor 202, connected to bus 201 for processing
information. Computer system 200 further includes a main memory 204, such as a random access memory (RAM) or other dynamic storage device, connected to
bus 201, for storing information and instructions to be executed by processor 202.
Main memory 204 may also be used for storing temporary variables or other intermediate information during execution of instructions by processor 202.
Computer system 200 also includes a read only memory (ROM) 206, and/or other similar static storage device, connected to bus 201, for storing static information
and instructions for processor 202. An optional data storage device 207, such as a magnetic disk or optical
disk, and its corresponding drive may also be connected to computer system 200 for storing information and instructions. System bus 201 is connected to an
external bus 210, which connects computer system 200 to other devices. Computer system 200 may also be connected via bus 210 to a display device 221,
such as a cathode ray tube (CRT) or a liquid crystal display (LCD), for displaying information to a computer user. For example, graphical or textual information may be presented to the user on display device 221. Typically, an alphanumeric input device 222, such as a keyboard including alphanumeric and other keys, is
connected to bus 210 for communicating information and/or command selections to processor 202. Another type of user input device is cursor control device 223,
such as a conventional mouse, touch mouse, trackball, or other type of cursor
direction keys, for communicating direction information and command selection
to processor 202 and for controlling cursor movement on display 221. A fully loaded computer system may optionally include video, camera, speakers, sound card, and many other similar conventional options.
A communication device 224 is also connected to bus 210 for accessing
remote computers or servers, via the Internet, for example. The communication
device 224 may include a modem, a network interface card, or other well known
interface devices, such as those used for interfacing with Ethernet, Token-ring, or other types of networks. In any event, in this manner, the computer system 200 may be connected to a number of servers via a conventional network infrastructure.
Figure 3A is a block diagram of one embodiment for a process of obtaining
an executable program in a computer system. According to Figure 3A, source file 310 includes source code written by programmers in high-level languages, for
example FORTRAN or C. The source code instructions must be translated into machine language. The translation process involves several processing steps and
includes compilation of the source code instructions.
In one embodiment, the high-level language source code instructions
within source file 310 are passed through a compiler (not shown). The compiler translates the high-level instructions into object code stored within object files 320.
In one embodiment, considering a parallel computer system, the compiler needs to generate the object code in a form suitable for parallel execution. In one
embodiment, the object code includes multiple modules, each module
incorporating one or more subroutines. Some modules may be stored in runtime
library 340.
Finally, the object code is passed through a linker 330. The linker 330 combines the modules and gives real values to addresses within the modules, thereby producing an executable program 350.
Figure 3B is a block diagram of one embodiment for a process of obtaining
a parallel executable program in a computer system. As shown in Figure 3B, a
serial source program 360 and a serial source program with OpenMP directives 365 are compiled by a compiler (not shown), which creates parallel executable
code 370. The parallel executable code 370 is then linked to a parallel runtime library 380. The runtime library 380 creates multiple thread entries 385, which are then sent to a SMP machine 390 for execution. Figure 4 is a flow diagram of one embodiment for a multi-entry threading method for automatic and directive-guided parallelization of a source program.
In one embodiment, a source program to be compiled and executed by a multi-processor computer system needs to be parallelized in order to fully take advantage of the system's resources. Therefore, depending on the number of
processors, multiple threads must be generated to run the source program in
parallel.
The source program includes multiple loops of code, also known as parallel regions. A parallel region or loop is defined as a code block of the program that is to be executed by the multiple threads in parallel. One example of a source program including multiple parallel regions or loops is as follows:
#include <stdio.h>
#define NSIZE 200

main()
{
    int x, i, j;
    float a[NSIZE], b[NSIZE], c[NSIZE];

    /* parallel loop */
    #pragma omp parallel for schedule(static) private(i) shared(a,b)
    for (i = 0; i < NSIZE; i++) {
        b[i] = (float)(i * 2);
        a[i] = b[i] + 100;
    }

    /* parallel region */
    #pragma omp parallel shared(b,c)
    {
        x = 100;
        /* work-sharing loop */
        #pragma omp for schedule(dynamic) firstprivate(x) private(j)
        for (j = 100; j < 200; j++) {
            b[j] = (float)(j * 2);
            c[j] = b[j] + 100 * x;
        }
    }
}
Each thread receives a portion of the loop and executes the portion in
parallel with other threads. Parallel regions or loops are sequences of the code representing the fundamental parallel constructs that indicate code to be executed
in parallel.
Referring to Figure 4, at processing block 410, the source program or source code is received and read by the compiler. At processing block 420, a first parallel construct within the routine to be executed in parallel is located by the compiler.
At processing block 430, a start code is generated by the compiler. In one embodiment, the start code is a new threaded entry code indicating the beginning
of a parallel construct. At processing block 440, an invocation code is generated by the compiler. In one embodiment, the invocation code is an invocation
instruction, which passes the threaded entry identified by the new threaded entry code to the multi-threaded run-time system for parallel execution on multi-processor systems.
At processing block 450, the new threaded entry code is inserted before the
parallel construct in the source program. In one embodiment, the new entry code is inserted prior to a first instruction of the parallel construct. At processing block 460, the invocation instruction is inserted before the new threaded entry code in the source program.
At processing block 470, a stop code is inserted after the parallel construct
in the source program. In one embodiment, the stop code is a threaded return instruction, which is inserted after a last instruction of the parallel construct. The threaded return instruction signals the run-time system to perform the synchronization and return to the main program.
At processing block 480, a new location instruction is generated by the
compiler. In one embodiment, the location instruction is a label instruction
indicating the next instruction to be executed by the multiprocessor system. At processing block 485, the location instruction is inserted after the threaded return
instruction.
At processing block 490, a jump instruction is generated and is inserted
before the new threaded entry to direct the system to continue execution of the source program at the location instruction. In one embodiment, the jump
instruction is inserted subsequent to the invocation instruction and prior to the
new threaded entry code.
At processing block 495, a decision is made whether the routine contains any new parallel constructs. If the routine contains another parallel construct,
then blocks 420 through 495 are processed again with respect to the new parallel
construct. Otherwise, if the routine does not contain any new parallel constructs,
the procedure stops. In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing
from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an
illustrative rather than a restrictive sense.

Claims

CLAIMS
What is claimed is:
1. A method for compiling a source program comprising:
locating a plurality of predetermined sequences within said source program;
inserting a start code in said source program prior to a first instruction of each sequence of said plurality of predetermined sequences;
inserting an invocation code in said source program prior to said start code, said invocation code addressing said start code and transferring said each sequence to a system for execution; and
inserting a stop code in said source program after a last instruction of said each sequence of said plurality of sequences, said stop code signaling to said system to stop execution of said each sequence.
2. The method according to claim 1, further comprising:
inserting a location instruction after said stop code;
generating a jump instruction to start execution of said source program at said location instruction; and
inserting said jump instruction prior to said start code and subsequent to said invocation code.
3. The method according to claim 1, further comprising: receiving said source program; and reading said source program.
4. The method according to claim 1, wherein each sequence of said plurality of predetermined sequences is a parallel construct.
5. The method according to claim 1, wherein said system is a multi-threaded run-time system capable of executing said each sequence in parallel.
6. The method according to claim 1, wherein each sequence of said plurality of predetermined sequences is an OpenMP parallel construct.
7. The method according to claim 1, wherein inserting said start code further comprises generating said start code for insertion.
8. The method according to claim 1, wherein inserting said invocation
code further comprises generating said invocation code for insertion.
9. The method according to claim 2, wherein inserting said location
instruction further comprises generating said location instruction for insertion.
10. A computer readable medium containing executable instructions, which, when executed in a processing system, cause said system to perform a method for compiling a source program, said method comprising:
locating a plurality of predetermined sequences within said source program;
inserting a start code in said source program prior to a first instruction of each sequence of said plurality of predetermined sequences;
inserting an invocation code in said source program prior to said start code, said invocation code addressing said start code and transferring said each sequence to a system for execution; and
inserting a stop code in said source program after a last instruction of said each sequence, said stop code signaling to said system to stop execution of said each sequence.
11. The computer readable medium according to claim 10, wherein said method further comprises:
inserting a location instruction after said stop code;
generating a jump instruction to start execution of said source program at said location instruction; and
inserting said jump instruction prior to said start code and subsequent to said invocation code.
12. The computer readable medium according to claim 10, wherein said method further comprises: receiving said source program; and
reading said source program.
13. The computer readable medium according to claim 10, wherein each
sequence of said plurality of predetermined sequences is a parallel construct.
14. The computer readable medium according to claim 10, wherein said system is a multi-threaded run-time system capable of executing said each sequence in parallel.
15. The computer readable medium according to claim 10, wherein each sequence of said plurality of predetermined sequences is an OpenMP parallel construct.
16. The computer readable medium according to claim 10, wherein inserting said start code further comprises generating said start code for insertion.
17. The computer readable medium according to claim 10, wherein
inserting said invocation code further comprises generating said invocation code
for insertion.
18. The computer readable medium according to claim 11, wherein
inserting said location instruction further comprises generating said location instruction for insertion.
19. An apparatus comprising:
a memory to store a source program; and
a processor coupled to said memory
to locate a plurality of predetermined sequences within said source program;
to insert a start code in said source program prior to a first instruction of each sequence of said plurality of predetermined sequences;
to insert an invocation code in said source program prior to said start code, said invocation code addressing said start code and transferring said each sequence to a system for execution; and
to insert a stop code in said source program after a last instruction of said each sequence of said plurality of sequences, said stop code signaling to said system to stop execution of said each sequence.
20. The apparatus according to claim 19, wherein said processor further:
inserts a location instruction after said stop code;
generates a jump instruction to start execution of said source program at said location instruction; and
inserts said jump instruction prior to said start code and subsequent to said invocation code.
21. The apparatus according to claim 19, wherein said processor further: receives said source program; and reads said source program.
22. The apparatus according to claim 19, wherein each sequence of said plurality of predetermined sequences is a parallel construct.
23. The apparatus according to claim 19, wherein said system is a multi-threaded run-time system capable of executing said each sequence in parallel.
24. The apparatus according to claim 19, wherein each sequence of said plurality of predetermined sequences is an OpenMP parallel construct.
25. The apparatus according to claim 19, wherein prior to inserting said
start code, said processor further generates said start code for insertion.
26. The apparatus according to claim 19, wherein prior to inserting said
invocation code, said processor further generates said invocation code for
insertion.
27. The apparatus according to claim 20, wherein prior to inserting said location instruction, said processor further generates said location instruction for insertion.
PCT/US2001/018614 2000-06-30 2001-06-08 Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program WO2002003194A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU2001266796A AU2001266796A1 (en) 2000-06-30 2001-06-08 Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program
DE10196389T DE10196389T1 (en) 2000-06-30 2001-06-08 Multi-entry threading method and device for automatic and directive-directed parallelization of a source program
GB0301568A GB2381356B (en) 2000-06-30 2001-06-08 Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US60808700A 2000-06-30 2000-06-30
US09/608,087 2000-06-30

Publications (2)

Publication Number Publication Date
WO2002003194A2 true WO2002003194A2 (en) 2002-01-10
WO2002003194A3 WO2002003194A3 (en) 2003-01-23

Family

ID=24434971

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/018614 WO2002003194A2 (en) 2000-06-30 2001-06-08 Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program

Country Status (6)

Country Link
CN (1) CN1210650C (en)
AU (1) AU2001266796A1 (en)
DE (1) DE10196389T1 (en)
GB (1) GB2381356B (en)
TW (1) TW525090B (en)
WO (1) WO2002003194A2 (en)


Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
US7478376B2 (en) * 2004-12-02 2009-01-13 International Business Machines Corporation Computer program code size partitioning method for multiple memory multi-processing systems
US7487496B2 (en) * 2004-12-02 2009-02-03 International Business Machines Corporation Computer program functional partitioning method for heterogeneous multi-processing systems

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
GB8610658D0 (en) * 1986-05-01 1986-06-04 British Petroleum Co Plc Flow control
US5278986A (en) * 1991-12-13 1994-01-11 Thinking Machines Corporation System and method for compiling a source code supporting data parallel variables
GB9305263D0 (en) * 1993-03-15 1993-05-05 Univ Westminster Parrallel computation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1569104A3 (en) * 2004-01-09 2006-05-03 Interuniversitair Microelektronica Centrum Vzw An automated method for performing parallelization of sequential code and a computerized system adapted therefore
EP2315118A1 (en) * 2009-10-20 2011-04-27 Bull Hn Information Systems Inc. Method and apparatus for enabling parallel processing during execution of a cobol source program using two-stage compilation
US20140189663A1 (en) * 2009-10-20 2014-07-03 Cynthia S. Guenthner Method and apparatus enabling multi threaded program execution for a cobol program including openmp directives by utilizing a two-stage compilation process
US8869126B2 (en) * 2009-10-20 2014-10-21 Bull Hn Information Systems Inc. Method and apparatus enabling multi threaded program execution for a Cobol program including OpenMP directives by utilizing a two-stage compilation process

Also Published As

Publication number Publication date
GB0301568D0 (en) 2003-02-26
WO2002003194A3 (en) 2003-01-23
GB2381356A (en) 2003-04-30
AU2001266796A1 (en) 2002-01-14
DE10196389T1 (en) 2003-06-18
GB2381356B (en) 2004-09-22
CN1446334A (en) 2003-10-01
TW525090B (en) 2003-03-21
CN1210650C (en) 2005-07-13


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

ENP Entry into the national phase

Ref document number: 0301568

Country of ref document: GB

Kind code of ref document: A

Free format text: PCT FILING DATE = 20010608

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: IN/PCT/2002/01701/MU

Country of ref document: IN

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWE Wipo information: entry into national phase

Ref document number: 018121241

Country of ref document: CN

RET De translation (de og part 6b)

Ref document number: 10196389

Country of ref document: DE

Date of ref document: 20030618

Kind code of ref document: P

WWE Wipo information: entry into national phase

Ref document number: 10196389

Country of ref document: DE

122 Ep: pct application non-entry in european phase
REG Reference to national code

Ref country code: DE

Ref legal event code: 8607

NENP Non-entry into the national phase

Ref country code: JP
