+

WO2002031678A2 - Systeme et procede destines a un systeme d'exploitation dans un dispositif logique reconfigurable - Google Patents

Systeme et procede destines a un systeme d'exploitation dans un dispositif logique reconfigurable Download PDF

Info

Publication number
WO2002031678A2
WO2002031678A2 PCT/GB2001/004551 GB0104551W WO0231678A2 WO 2002031678 A2 WO2002031678 A2 WO 2002031678A2 GB 0104551 W GB0104551 W GB 0104551W WO 0231678 A2 WO0231678 A2 WO 0231678A2
Authority
WO
WIPO (PCT)
Prior art keywords
hardware
software
reconfigurable logic
operating system
logic device
Prior art date
Application number
PCT/GB2001/004551
Other languages
English (en)
Other versions
WO2002031678A3 (fr
Inventor
Alex Wilson
Munther Hindi
Original Assignee
Celoxica Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Celoxica Limited filed Critical Celoxica Limited
Priority to AU2001295716A priority Critical patent/AU2001295716A1/en
Publication of WO2002031678A2 publication Critical patent/WO2002031678A2/fr
Publication of WO2002031678A3 publication Critical patent/WO2002031678A3/fr

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture

Definitions

  • the present invention relates generally to a system in which an operating system is implemented in reconfigurable logic, and more particularly, in one or more FPGAs.
  • a software-controlled processor is usually slower than hardware dedicated to that function.
  • a way of overcoming this problem is to use a special software-controlled processor such as a RISC processor which can be made to function more quickly for limited purposes by having its parameters (for instance size, instruction set etc.) tailored to the desired functionality.
  • An operating system (sometimes abbreviated as "OS") is the program that, after being initially loaded into the computer by a boot program, manages all the other programs in a computer.
  • the other programs are called applications.
  • the applications make use of the operating system by making requests for services through a defined application program interface.
  • users can interact directly with the operating system through an interface such as a command language.
  • a system that utilizes an operating system stored in reconfigurable logic.
  • the system includes a reconfigurable logic device that has at least a portion, and preferably all, of an operating system stored on it.
  • the operating system is operable to manage one or multiple hardware and or software processes including applications.
  • the operating system is capable of supporting multiple processes (multitasking), and can determine an order in which the processes should run and an amount of time allowed for each process to execute before giving another process a turn.
  • the operating system is operable to manage the sharing of internal memory among multiple processes, handle input and output to and from an attached hardware device(s), such as hard disks, printers, and dial-up ports, send messages to the applications or interactive user (or to a system operator) about the status of an operation and any errors that may have occurred, and/or offload the management of what are called batch jobs (for example, printing) so that the initiating application is -freed from this work.
  • the reconfigurable logic device includes one or more
  • the reconfigurable logic device interfaces with at least one of a hardware register, buffer management, a microprocessor, an interrupt handler, and a memory transfer.
  • a second reconfigurable logic device can be provided for reconfiguring (e.g., changing the configuration of) the first reconfigurable logic device.
  • the reconfigurable logic device is in communication with a code library, where the code library includes at least one driver for interfacing a hardware device or devices with an application.
  • a driver associated with the hardware device is retrieved from the code library.
  • the driver is utilized to ⁇ configure the reconfigurable logic device to communicate with the hardware device. This can be a dynamic process that occurs automatically upon receiving a request for access to the hardware device.
  • a method for controlling processes of a system is also provided. According to the method, an operating system is initiated. At least a portion, and preferably all, of the operating system is stored on a reconfigurable logic device. The operating system is utilized to manage at least one hardware and/or software process including applications.
  • an optimal functionality between hardware and software of the system can be dete-rmined, and implementation of functionality between the hardware and software can be partitioned based on the determined optimal functionality.
  • Figure 1 is a schematic diagram of a hardware implementation of one embodiment of the present invention.
  • Figure 2 illustrates a method for controlling processes of a system according to an embodiment of the present invention
  • Figure 3 is a flow diagram of a process for automatically partitioning a behavioral description of an electronic system into the optimal configuration of hardware and software according to a preferred embodiment of the present invention
  • Figure . 4 is a flow diagram schematically showing the codesign system of one embodiment of the invention.
  • Figure 6 is a block diagram of the platform used to implement the second example circuit produced by an embodiment of the invention.
  • Figure 7 is a picture of the circuit of Figure 6;
  • Figure 8 is a block diagram of the system of Figure 6;
  • Figure 9 is a simulation of the display produced by the example of Figures 6 to 8;
  • Figure 10 is a block diagram of a third example target system.
  • Figures 11A-D together form a block diagram showing a dependency graph for calculation of the variables in the Figure 10 example.
  • a preferred embodiment of a system in accordance with the present invention is preferably practiced in the context of a personal computer such as an IBM compatible personal computer, Apple Macintosh computer or UNIX based workstation.
  • a representative hardware environment is depicted in Figure 1, which illustrates a typical hardware configuration of a workstation in accordance with a preferred embodiment having a central processing unit 110, such as a microprocessor, and a number of other units interconnected via a system bus 112.
  • the workstation shown in Figure 1 includes a Random Access Memory (RAM) 114, Read Only Memory (ROM) 116, an I/O adapter 118 for connecting peripheral devices such as disk storage units 120 to the bus 112, a user interface adapter 122 for connecting a keyboard 124, a mouse 126, a speaker 128, a microphone 132, and/or other user interface devices such as a touch screen (not shown) to the bus 112, communication adapter 134 for connecting the workstation to a communication network (e.g., a data processing network) and a display adapter 136 for connecting the bus 112 to a display device 138.
  • a communication network e.g., a data processing network
  • display adapter 136 for connecting the bus 112 to a display device 138.
  • the workstation also includes a Field Programmable Gate Array (FPGA) 140 with a complete or a portion of an operating system thereon such as the Microsoft Windows NT or Windows/98 Operating System (OS), the IBM OS/2 operating system, the MAC OS, or UNIX operating system.
  • FPGA Field Programmable Gate Array
  • Handel-C is a prograrnming language that enables a software or hardware engineer to target directly FPGAs (Field
  • Handel-C syntax is based on that of conventional C so programmers familiar with conventional C will recognize almost all the constructs in the Handel-C language.
  • Handel-C includes parallel constructs that provide the means for the programmer to exploit this benefit in his applications.
  • the compiler compiles and optimizes Handel-C source code into a file suitable for simulation or a netlist which can be placed and routed on a real FPGA.
  • the simulator allows a user to test a program without using real hardware. It can display the state of every variable (register) in your program at every clock cycle if required, the simulation steps and the number of cycles simulated being under program control.
  • the source code that was executed at each clock cycle as well as the program state may be displayed in order to assist in the debugging of the source code.
  • OOP Object oriented programming
  • OOP is a process of developing computer software using objects, including the steps of analyzing the problem, designing the system, and constructing the program.
  • An object is a software package that contains both data and a collection of related structures and procedures. Since it contains both data and a collection of structures and procedures, it can be visualized as a self-sufficient component that does not require other additional structures, procedures or data to perform its specific task.
  • OOP therefore; views a computer program as a collection of largely autonomous components, called objects, each of which is responsible for a specific task. This concept of packaging data, structures, and procedures together in one component or module is called encapsulation.
  • OOP components are reusable software modules which present an interface that conforms to an object model and which are accessed at run-time through a component integration architecture.
  • a component integration architecture is a set of architecture mechanisms which allow software modules in different process spaces to utilize each others capabilities, or functions. This is generally done by assuming a common component object model on which to build the architecture. It is worthwhile to differentiate between an object and a class of objects at this point.
  • An object is a single instance of the class of objects, which is often just called a class.
  • a class of objects can be viewed as a blueprint, from which many objects can be formed.-
  • OOP allows the programmer to create an object that is a part of another object.
  • the object representing a piston engine is said to have, a composition- relationship with the object representing a piston.
  • a piston engine comprises a piston, valves and many other components; the fact that a piston is an element of a . piston engine can be logically and semantically represented in OOP by two objects.
  • OOP also allows creation of an object that "depends from” another object. If there are two objects, one representing a piston engine and the other representing a piston engine wherein the piston is made of ceramic, then the relationship between the two objects is not that of composition.
  • a ceramic piston engine does not make up a piston engine. Rather it is merely one kind of piston engine that has one more limitation than the piston engine; its piston is made of ceramic.
  • the object representing the ceramic piston engine is called a derived object, and it inherits all of the aspects of the object representing the piston engine and adds further limitation or detail to it.
  • the object representing the ceramic piston engine "depends from" the object representing the piston engine. The relationship between these objects is called inheritance.
  • the object or class representing the ceramic piston engine inherits all of the aspects of the objects representing the piston engine, it inherits the thermal characteristics of a standard piston defined in the piston engine class.
  • the ceramic piston engine object overrides these ceramic specific thermal characteristics, which are typically different from those associated with a metal piston. It skips over the original and uses new functions related to ceramic pistons.
  • Different kinds of piston engines have different characteristics,, but may have the same underlying functions associated with it (e.g., how many pistons in the engine, ignition sequences, lubrication, etc.). To access each of these functions in any piston engine object, a programmer would call the same functions with the same names, but each type of piston engine may have different/overriding implementations of functions behind the same name.
  • Objects can represent physical objects, such as automobiles in a traffic-flow simulation, electrical components in a circuit-design program, countries in an economics model, or aircraft in an air-traffic-control system.
  • Objects can represent elements of the computer-user environment such as windows, menus or graphics objects.
  • An object can represent an inventory, such as a personnel file or a table of the latitudes and longitudes of cities.
  • An object can represent user-defined data types such as time, angles, and complex numbers, or points on the plane.
  • OOP allows the software developer to design and implement a computer program that is a model of some aspects of reality, whether that reality is a physical entity, a process, a system, or a composition of matter. Since the object can represent anything, the software developer can create an object which can be used as a component in a larger software project in the future.
  • OOP enables software developers to build objects out of other, previously built objects. This process closely resembles complex machinery being built out of assemblies and sub-assemblies. OOP technology, therefore, makes software engineering more like hardware engineering in that software is built from existing components, which are available to the developer as objects. All this adds up to an improved quality of the software as well as an increased speed of its development.
  • C++ is an OOP language that offers a fast, machine-executable code.
  • C++ is suitable for both commercial-application and systems- programming projects.
  • C++ appears to be the most popular choice among many OOP programmers, but there is a host of other OOP languages, such as Smalltalk, Common Lisp Object System (CLOS), and Eiffel. Additionally, OOP capabilities are being added to more traditional popular computer programming languages such as Pascal.
  • Encapsulation enforces data abstraction through the organization of data into small, independent objects that can communicate with each other. Encapsulation protects the data in an object from accidental damage, but allows other objects to interact with that data by calling the object's member functions and structures.
  • Subclassing and inheritance make it possible to extend and modify objects through deriving new kinds of objects from the standard classes available in the system. Thus, new capabilities are created without having to start from scratch.
  • Polymorphism and multiple inheritance make it possible for different programmers to mix and match characteristics of many different classes and create specialized objects that can still work with related objects in predictable ways.
  • Class hierarchies and containment hierarchies provide a flexible mechanism for modeling real-world objects and the relationships among them.
  • Class libraries are very flexible. As programs grow more complex, more programmers are forced to adopt basic solutions to basic problems over and over again.
  • a relatively new extension of the class library concept is to have a framework of class libraries. This framework is more complex and consists of significant collections of collaborating classes that capture both the small scale patterns and major mechanisms that implement the common requirements and design in a specific application domain. They were first developed to free application programmers from the chores involved in displaying menus, windows, dialog boxes,. and other standard user interface elements for personal computers.
  • Frameworks also represent a change in the way programmers think about the interaction between the code they write and code written by others.
  • the programmer called libraries provided by the operating system to perform certain tasks, but basically the program executed down the page from start to finish, and the programmer was solely responsible for the flow of control. This was appropriate for printing out paychecks, calculating a mathematical table, or solving other problems with a program that executed in just one way.
  • event loop programs require programmers to write a lot of code that should not need to be written separately for every application.
  • the concept of an application framework carries the event loop concept further. Instead of dealing with all the nuts and bolts of constructing basic menus, windows, and dialog boxes and then making these things all work together, programmers using application frameworks start with working application code and basic user interface elements in place. Subsequently, they build from there by replacing some of the generic capabilities of the framework with the specific capabilities of the intended application.
  • Application frameworks reduce the total amount of code that a programmer has to write from scratch.
  • the framework is really a generic application that displays windows, supports copy and paste, and so on, the programmer can also relinquish control to a greater degree than event loop programs permit.
  • the framework code takes care of almost all event handling and flow of control, and the programmer's code is called only when the framework needs it (e.g., to create or manipulate a proprietary data structure).
  • a programmer writing a framework program not only relinquishes control to the user (as is also true for event loop programs), but also relinquishes the detailed flow of control within the program to the framework. This approach allows the creation of more complex systems that work together in interesting ways, as opposed to isolated programs, having custom code, being created over and over again for similar problems.
  • a framework basically is a collection of cooperating classes that make up a reusable design solution for a given problem domain. It typically includes objects that provide default behavior (e.g., for menus and windows), and programmers use it by inheriting some of that default behavior and overriding other behavior so that the framework calls application code at the appropriate times.
  • default behavior e.g., for menus and windows
  • Class libraries are essentially collections of behaviors that you can call when you want those individual behaviors in your program.
  • a framework provides not only behavior but also the protocol or set of rules that govern the ways in which behaviors can be combined, including rules for what a programmer is supposed to provide versus what the framework provides.
  • a framework embodies the way a family of related programs or pieces of software work. It represents a generic design solution that can be adapted to a variety of specific problems in a given domain. For example, a single framework can embody the way a user interface works, even though two different user interfaces created with the same framework might solve quite different interface problems.
  • a preferred embodiment of the invention utilizes HyperText Markup Language (HTML) to implement documents on the Internet together with a general-purpose secure communication protocol for a transport medium between the client and the Newco. HTTP or other protocols could be readily substituted for HTML without undue experimentation.
  • HTML HyperText Markup Language
  • Information on these products is available in T. Berners-Lee, D. Connoly, "RFC 1866: Hypertext Markup Language - 2.0" (Nov. 1995); and R. Fielding, H, Frystyk, T. Berners-Lee, J. Gettys and J.C.
  • HTML Hypertext Transfer Protocol - HTTP/1.1: HTTP Working Group Internet Draft
  • HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of domains. HTML has been in use by the World-Wide Web global information initiative since 1990. HTML is an application of ISO Standard 8879; 1986 Information Processing Text and Office Systems; Standard Generalized Markup Language (SGML). .
  • HTML has been the dominant technology used in development of Web-based solutions.
  • HTML has proven to be inadequate in the following areas:
  • Sun Microsystem's Java language solves many of the client-side problems by: • Improving performance on the client side;
  • UI User Interface
  • Custom “widgets” e.g., real-time stock tickers, animated icons, etc.
  • client-side validation e.g., client-side validation, offloading appropriate processing onto the client-for improved performance.
  • Dynamic, real-time Web pages can be created. Using the above-mentioned custom UI components, dynamic Web pages can also be created.
  • Sun's Java language has emerged as an industry-recognized language for "programming the Internet.”
  • Sun defines Java as: "a simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multithreaded, dynamic, buzzword-compliant, general-purpose programming language.
  • Java supports programming for the Internet in the form of platform-independent Java applets.”
  • Java applets are small, specialized applications that comply with Sun's Java Application Programming Interface (API) allowing developers to add "interactive content" to Web documents (e.g., simple animations, page adornments, basic games, etc.). Applets execute within a Java-compatible browser (e.g., Netscape Navigator) by copying code from the server to client.
  • Java's core feature set is based on C++.
  • Sun's Java literature states that Java is basically, "C++ with extensions from Objective C for more dynamic method resolution.”
  • ActiveX includes tools for developing animation, 3-D virtual reality, video and other multimedia content.
  • the tools use Internet standards, work on multiple platforms, and are being supported by over 100 companies.
  • the group's building blocks are called ActiveX Controls, small, fast components that enable developers to embed parts of software in hypertext markup language (HTML) pages.
  • ActiveX Controls work with a variety of programming languages including Microsoft Visual C++, Borland Delphi, Microsoft Visual Basic programming system and, in the future, Microsoft's development tool for Java, code named "Jakarta.”
  • ActiveX Technologies also includes ActiveX Server Framework, allowing developers to create server applications.
  • ActiveX could be substituted for JAVA without undue experimentation to practice the invention.
  • FIG. 2 illustrates a method 200 for controlling processes of a system, which can be a computer, device, etc.
  • a Basic Input-Output System (BIOS) is initiated, which takes control.
  • BIOS Basic Input-Output System
  • the BIOS initiates the operating system. At least a portion, and preferably all, of the operating system is stored on a reconfigurable logic device, which preferably includes one or more FPGAs. After all operating system files have been initiated, the operating system is given control of the computer or device in operation 206. In operation 208, requested initial commands are performed. In operation 210, the operating system waits for the first interactive user input. The operating system manages one or more hardware and/or software processes, such as hardware applications, in operation 212.
  • an optimal functionality between hardware and software of the system can be determined, and implementation of functionality between the hardware and software can be partitioned based on the determined optimal functionality.
  • a system that utilizes an operating system stored in reconfigurable logic.
  • the system includes a reconfigurable logic device that has at least a portion, and preferably all, of an operating system stored on it.
  • the operating system is operable to manage one or multiple hardware and or software processes including applications.
  • the operating system is capable of supporting multiple processes (multitasking), and can determine an order in which the processes should run and an amount of time allowed for each process to execute before giving another process a turn.
  • the operating system is operable to manage the sharing of internal memory among multiple processes, handle input and output to and from an attached hardware device(s), such as hard disks, printers, and dial-up ports, send messages to the applications or interactive user (or to a system operator) about the status of an operation and any errors that may have occurred, and/or offload the management of what are called batch jobs (for example, printing) so that the initiating application is freed from this work.
  • an attached hardware device(s) such as hard disks, printers, and dial-up ports
  • the reconfigurable logic device includes one or more FPGAs. Also preferably, the reconfigurable logic device interfaces with at least one of a hardware register, buffer management, a microprocessor, an interrupt handler, and a memory transfer. A second reconfigurable logic device can be provided for reconfiguring (e.g., changing the configuration of) the first reconfigurable logic device.
  • the reconfigurable logic device is in communication with a code library, where the code library includes at least one driver for interfacing a hardware device or devices with an application.
  • a driver associated with the hardware device is retrieved from the code library.
  • the driver is utilized to configure the reconfigurable logic device to communicate with the hardware device. This can be a dynamic process that occurs automatically upon receiving a request for access to the hardware device.
  • Various aspects of the present invention can include:
  • FPLD Field-Programmable Logic Devices
  • PAL Programmable Array Logic device
  • FPLD's Complex PLD's
  • CPLD's Complex PLD's
  • This paradigm is characterized by devices such as the Advanced Micro Devices MACH.TM. family.
  • Examples of CPLD circuitry are seen in U.S. Pat. No. 5,015,884 (issued May 14, 1991 to Om P. Agrawal et al.) and U.S. Pat. No. 5,151,623 (issued Sep. 29, 1992 to Om P. Agrawal et al.), which are herein incorporated by reference.
  • FPGA Field Programmable Gate Arrays
  • Examples of such devices include the XC2000.TM. and XC3000.TM. families of FPGA devices introduced by Xilinx, Inc. of San Jose, Calif.
  • the architectures of these devices are exemplified in U.S. Pat. Nos. 4,642,487; 4,706,216; 4,713,557; and 4,758,985; each of which is originally assigned to Xilinx, Inc. and herein incorporated by reference.
  • An FPGA device can be characterized as an integrated circuit that has four major features as follows.
  • a user-accessible, configuration-defining memory means such as SRAM, PROM,. EPROM, EEPROM, anti-fused, fused, or other, is provided in the FPGA device so as to be at least once-programmable by device users for defining user-provided configuration instructions.
  • Static Random Access Memory or SRAM is of course, a form of reprogrammable memory that can be differently programmed many times.
  • Electrically Erasable and reprogrammable ROM or EEPROM is an example of nonvolatile reprogrammable memory.
  • the configuration-defining memory of an FPGA device can be formed of mixture of different kinds of memory elements if desired (e.g., SRAM and EEPROM) although this is not a popular approach.
  • IOB's Input/Output Blocks
  • the IOB's' may have fixed configurations or they may be configurable in accordance with user-provided configuration instructions stored in the configuration-defining memory means.
  • Configurable Logic Blocks are provided for carrying out user-programmed logic functions as defined by user-provided configuration instructions stored in the configuration-defining memory means.
  • each of the many CLB's of an FPGA has at least one lookup table (LOT) that is user-configurable to define any desired truth table,-to the extent allowed by the address space of the LUT.
  • Each CLB may have other resources such as LUT input signal pre-processing resources and LUT output signal post-processing resources.
  • TLB' was adopted by early pioneers of FPGA technology, it is not uncommon to see other names being given to the repeated portion of the FPGA that carries out user-programmed logic functions.
  • 'LAB' is used for example in U.S. Pat. No. 5,260,611 to refer to a repeated unit having a 4-input LUT.
  • An interconnect network is provided for carrying signal traffic within the FPGA device between various CLB's and/or between various IOB's and/or between various IOB's and CLB's. At least part of the interconnect network is typically configurable so as to allow for programmably-defined routing of signals between various CLB's and/or IOB's in accordance with user-defined routing instructions stored in the configuration- defining memory means.
  • FPGA devices may additionally include embedded volatile memory for serving as scratchpad memory for the CLB's or as FIFO or LIFO circuitry.
  • the embedded volatile memory may be fairly sizable and can have 1 million or more storage bits in addition to the storage bits of the device's configuration memory.
  • Modern FPGAs tend to be fairly complex. They typically offer a large spectrum of user-configurable options with respect to how each of many CLB's should be configured, how each of many interconnect resources should be configured, and/or how each of many IOB's should be configured. This means that there can be thousands or millions of configurable bits that may need to be individually set or cleared during configuration of each FPGA device.
  • the configuration instruction signals may also define an initial state for the implemented design, that is, initial set and reset states for embedded flip flops and/or embedded scratchpad memory ' cells.
  • the number of logic bits that are used for defining the configuration instructions of a given FPGA device tends to be fairly large (e.g., 1 Megabits or more) and usually grows with the size and complexity of the target FPGA. Time spent in loading configuration instructions and verifying that the instructions have been correctly loaded can become significant, particularly when such loading is carried out in the field. • For many reasons, it is often desirable to have in-system reprogramming capabilities so that reconfiguration of FPGA's can be carried out in the field.
  • FPGA devices that have configuration memories of the reprogrammable kind are, at least in theory, 'in-system programmable" (ISP).
  • ISP 'in-system programmable
  • the term, 'in-system' as used herein indicates that the FPGA device remains connected to an application-specific printed circuit board or to another form of end-use system during reprogramming.
  • the end-use system is of course, one which contains the FPGA device and for which the FPGA device is to be at least once configured to operate within in accordance with predefined, end-use or 'in the field' application specifications.
  • the price is the inherent volatility of the configuration data as stored in the FPGA device. Each time power to the FPGA device is shut off, the volatile configuration memory cells lose their configuration data. Other events may also cause corruption or loss of data from volatile memory cells within the FPGA device.
  • configuration restoration means is needed to restore the lost data when power is shut off and then re-applied to the FPGA or when another like event calls for configuration restoration (e.g., corruption of state data within scratchpad memory).
  • the configuration restoration means can take many form ' s. If the FPGA device resides in a relatively large system that has a magnetic or optical or opto-magnetic form of nonvolatile memory (e.g., a hard magnetic disk) ⁇ and the latency of powering up such a optical/magnetic device and/or of loading configuration instructions from such an optical/magnetic form of nonvolatile memory can be tolerated—then the optical/magnetic memory device can be used as a nonvolatile configuration restoration means that redundantly stores the configuration data and is used to reload the same into the system's FPGA device(s) during power-up operations (and/or other restoration cycles).
  • nonvolatile memory e.g., a hard magnetic disk
  • the FPGA device(s) resides in a relatively small system that does not have such optical/magnetic devices, and/or if the latency of loading configuration memory data from such an optical/magnetic device is not tolerable, then a smaller and/or faster configuration restoration means may be called for.
  • the small/fast device is expected to satisfy application-specific criteria such as: (1) being securely retained within the end-use system; (2) being able to store FPGA configuration data during prolonged power outage periods; and (3) being able to quickly and automatically re-load the configuration instructions back into the volatile configuration memory (SRAM) of the FPGA device each time power is turned back on or another event calls for configuration restoration.
  • application-specific criteria such as: (1) being securely retained within the end-use system; (2) being able to store FPGA configuration data during prolonged power outage periods; and (3) being able to quickly and automatically re-load the configuration instructions back into the volatile configuration memory (SRAM) of the FPGA device each time power is turned back on or another event calls for configuration restoration.
  • SRAM volatile configuration memory
  • 'CROP device' will be used herein to refer in a general way to this form of compact, nonvolatile, and fast-acting device that performs 'Configuration-Restoring On Power-up' services for an associated FPGA device.
  • the corresponding CROP device is not volatile, and it is generally not 'in-system programmable'. Instead, the CROP device is generally of a completely nonprogrammable type such as exemplified by mask-programmed ROM ICs or by once-only programmable, fuse- based PROM ICs. Examples of such CROP devices include a product family that the Xilinx company provides under the designation 'Serial Configuration PROMs' and under the trade name, XC1700D.TM.. These serial CROP devices employ one-time programmable PROM (Programmable Read Only Memory) cells for storing configuration instructions in nonvolatile fashion.
  • PROM Program Read Only Memory
  • An embodiment of the present invention provides a hardware/software codesign system which can target a system in which the hardware or the processors to run the software can be customized according to the functions partitioned to it.
  • the codesign system of this invention includes a partitioning means which flexibly decides the partitioning while varying the parameters of the hardware or processor to obtain both an optimal partitioning and optimal size of hardware and processor.
  • the codesign system comprising means for receiving a specification of the functionality, partitioning means for partitioning implementation of the functionality between (a) and (b) and for customizing the hardware and/or the machine in accordance . with the selected partitioning of the functionality.
  • the target system is a hybrid hardware/software system. It can be formed using • configurable logic resources in which case either the hardware or the processor, or both, can be formed on the configurable logic resources (e.g. an FPGA).
  • the configurable logic resources e.g. an FPGA
  • the partitioning means uses a genetic algorithm to optimize the partitioning and the parameters of the hardware and the processor.
  • a genetic algorithm to optimize the partitioning and the parameters of the hardware and the processor.
  • it generates a plurality of different partitions of the functionality of the target syste (varying the size of the hardware and/or the processor between the different partitions) and estimates the speed and size of the resulting system. It then selects the optimal - partitioning on the basis of the estimates.
  • a genetic algorithm a variety of partitions are randomly generated, the poor ones are rejected, and the remaining ones are modified by combining aspects of them with each other to produce different partitions. The speed and size of these are then assessed and the process can be repeated until an optimal partition is produced.
  • the invention is applicable to target systems which use either customizable hardware and a customizable processor, or a fixed processor and customizable hardware, or fixed hardware and a customizable processor.
  • the customizable part could be formed on an FPGA, or, for instance, an ASIC.
  • the system may include estimators for estimating the speed and size of the hardware and the software controlled machine and may also include an interface generator for generating interfaces between the hardware and software. In that case the system may also include an estimator for estimating the size of the interface.
  • the partitioning means calls the estimators when deciding on an optimum partitioning.
  • the software-controlled machine can comprise a CPU and the codesign system comprises means for generating a compiler for the CPU as well as means for describing the CPU where it is to be formed on customizable logic circuits.
  • the codesign system can further comprise a hardware compiler for producing from those parts of the specification partitioned to hardware a register transfer level description for configuring configurable logic resources (such as an FPGA). It can further include a synthesizer for converting the register transfer level description into a net list.
  • a hardware compiler for producing from those parts of the specification partitioned to hardware a register transfer level description for configuring configurable logic resources (such as an FPGA). It can further include a synthesizer for converting the register transfer level description into a net list.
  • the system can include a width adjuster for setting and using a desired data word size, and this can be done at several points in the desired process as necessary.
  • Another aspect of the invention provides a hardware/software codesign system which receives a specification of a target system in the form of behavioral description, i.e. a description in a programming language such as can be written by a computer programmer, and partitions it and compiles it to produce hardware and software.
  • the partitioning means can include a parser for parsing the input behavioral description.
  • the description can be in a familiar computer language such as C, supplemented by a plurality of predefined attributes to describe, for instance, parallel execution of processes, an obligatory partition to software or an obligatory partition to hardware.
  • the system is preferably adapted to receive a declaration of the properties of at least one of the hardware and the software-controlled machine, preferably in an object-oriented paradigm. It can also be adapted such that some parts of the description can be at the register transfer level, to allow closer control by the user of the final performance of the target system.
  • the invention provides a hardware/software codesign system for making an electronic circuit which includes both dedicated hardware and software controlled resources.
  • the codesign system receives a behavioral description of the target electronic system, and automatically partitions the required functionality between hardware and software, while being able to vary the parameters (e.g. size or power) of the hardware arid/or software ' .
  • the hardware and the processor for the software can be formed on an FPGA, each being no bigger than is necessary to form the desired functions.
  • the codesign system outputs a description of the required processor (which can be in the form of a net list for placement on the FPGA), machine code to- run on the processor, and a net list or register transfer level description of the necessary hardware.
  • the automatic partitioning can be performed by using an optimization algorithm, e.g. a genetic algorithm, which generates a partitioning based on estimates of performance.
  • the invention also allows the manual partition of systems across a number of hardware and software resources from a single behavioral description of the system. This provision for manual partitioning, as well as automatic partitioning, gives the system great flexibility.
  • the hardware resources may be a block that can implement random hardware, such as an FPGA or ASIC; a fixed processor, such as a microcontroller, DSP, processor, or processor core; or a customizable processor which is to be implemented on one of the hardware resources, such as an FPGA-based processor.
  • the system description can be augmented with register transfer level descriptions, and parameterized instantiations of both hardware and software library components written in other languages.
  • the sort of target systems which can be produced include:
  • the use of the advanced estimation techniques in specific embodiments of the invention allows the system to take into account the area of the processor that will be produced, allowing the targeting of customizable processors with additional and removable instructions, for example.
  • the estimators also take into account the speed degradation produced when the logic that a fixed hardware resource must implement nears the resource's size limit. This is done by the estimator reducing the estimated speed as that limit is reached. Further, the estimators can operate on both the design before partitioning, and after partitioning. Thus high level simulation, as well as simulation and estimation after partitioning, can be performed.
  • the part of the system which compiles the software can transparently support additional or absent instructions for the processor and so is compatible with the parameterization of the processor.
  • the input language supports variables with arbitrary widths, which are then unified to a fixed width using a promotion scheme, and then mapped to the widths available oh the target system architecture.
  • the input description it is possible for the input description to include both behavioral and register transfer level descriptions, which can both be compiled to software. This gives support for very fast simulation and allows the user control of the behavior of the hardware on each clock cycle.
  • Figure 3 is a flow diagram of a process 300 for automatically partitioning a behavioral . description of an electronic system into the optimal configuration of hardware and software according to a preferred embodiment of the present invention.
  • the system receives a behavioral description of the electronic system and, in operation 304, determines the optimal required functionality between hardware and software.
  • that functionality is partitioned preferably while varying the parameters (e.g. size or power) of the hardware and/or software.
  • the parameters e.g. size or power
  • the codesign system outputs a description of the required processors, machine code to run on the processors, and a net list or register transfer level description of the necessary hardware. It is possible for the user to write some parts of the description of the system at register transfer level to give closer control over the operation of the system, and the user can specify the processor or processors to be used, and can change, for instance, the partitioner, compilers or speed estimators used in the codesign system.
  • the automatic partitioning is formed by using a genetic algorithm which estimates the performance of randomly generated different partitions and selects an optimal one of them.
  • Appendix 1 is an exemplary register transfer level description of a simple processor.
  • Appendix 2 is a register transfer level description of the main process flow in the example of Figures 6 to 8.
  • Appendix 3 is the input specification for the target system of Figure 10.
  • the target architecture for this system is an FPGA containing one or more processors, and custom hardware.
  • the processors may be of different architectures, and may communicate with each other and with the custom hardware.
  • the user writes a description 402 of the system in a C-like language, which is actually ANSI C with some additions which allow efficient translation to hardware and parallel processes.
  • This input description will be compiled by the system 400 of Figure 4.
  • the additions to the ANSI C language include the following: c
  • Variables are declared with explicit bit widths and the operators working on the variables work with an arbitrary precision. This allows efficient implementation in hardware. For instance a statement which declares the width of variables (in this case the program counter pc, the instruction register ir, and the top of stack tos) is as follows:
  • the width of the data path of the processor in the target system may be declared, or else is calculated by the partitioner 408 as the width of the widest variable which it uses.
  • the "par” statement has been added to describe process-level parallelism.
  • the system can automatically extract fine-grained parallelism from the C-like description but generating coarse-grained parallelism automatically is far more difficult. Consequently the invention provides this attribute to allow the user to express parallelism in the input language using the "par" statement which specifies that a following list of statements is to be executed in parallel. For example, the expression:
  • Channels can be declared and are used for blocking, point-to-point synchronized communication as used in occam (see G. Jones. Programming in occam. Prentice Hall International Series in Computer Science, 1987, which is hereby incorporated by reference j with a syntax like a C function call.
  • the parallel processes can use the channels to perform distributed assignment. Thus parallel processes can communicate using blocking channel communication.
  • the keyword "chan" I declares these channels. For example,
  • variable b is a statement which sends the value of variable a down channel.1; and receive (channel 2, b) is a statement which assigns the value received along channel 2 to variable b.
  • the hardware, resources available are declared.
  • the resources may be a customizable processor, a fixed processor, or custom hardware.
  • the custom hardware may be of a specific architecture, such as a Xilinx FPGA.
  • the architecture of the target system can be described in terms of the available functional units and their interconnection.
  • platforms can be hard or soft.
  • a hard platform is something that is fixed such as a Pentium processor or an FPGA.
  • a soft platform is something that can be configured like an FPGA-based processor.
  • the partitioner 408 understands the keywords "hard” and "soft", which are used for declaring these platforms and the code can be implemented on any of these.
  • Xilinx 4000 series FPGAs e.g. the Xilinx 4085 below
  • An attribute can be attached to a platform when it is declared:
  • the attribute PLATFORMS contains one element: the architecture of the hard platform. In this embodiment this may be the name of a Xilinx 3000 or 4000 series FPGA, an Altera FPGA, or an x86 processor.
  • PLATFORMS is a pair.
  • the first element is the architecture of the platform: FPGAStackProc, FPGAParallelStackProc or FPGAMips
  • Channels can be declared with an implementation, and as only being able to link previously declared platforms.
  • the system 400 recognizes the following chaimel implementations:
  • PCIBus - a channel implemented over a PCI bus between an FPGA card and a
  • FPGAChan - a channel implemented using wires on the FPGA.
  • CHANNELTYPE may be PCIBus or FPGAChan.
  • FPGAChan is the default.
  • PLATFORM is the name of the platform which can send down the channel.
  • PLATFORM is the name of the platform which can receive from the channel.
  • the system 400 checks that the declared channels and the platforms that use them are compatible.
  • the communication mechanisms which a given type of channel can implement are built into the system. New mechanisms can be added by the user, in a similar way to adding new processors as will be explained below.
  • proci is a soft platform which is implemented // on the FPGA on the meetea board soft proci attribute- ((platform(FpgaStackProc, meeteaBoard)));
  • channel2 is. implemented on the FPGA chan channel2 .attribute- ((type(FPGAChan)));
  • Attributes are also added to the input code to enable the user to specify whether a block is to be put in hardware or software and for software the attribute also specifies the target processor.
  • the attribute also specifies whether the description is to be interpreted as a register transfer ( RT ) or behavioral level description.
  • the default is behavioral. For example:
  • parts of the description which are to be allocated to hardware can be written by the user at a register transfer level, by using a version of the input language with a well defined timing semantics (for example Handel-C or another RTL language), or the scheduling decisions (i.e. which operations happen on which clock cycle) can be left to the compiler.
  • a block of code may be specifically assigned by the user to one of the available resources.
  • Soft resources may themselves be assigned to hardware resources such as an FPGA-based processor.
  • PLATFORM is the name of the platform on which the code will be implemented. This implies the compiler which will be used to compile that code.
  • LEVEL is Behavioral or RTL. Behavioral descriptions will be scheduled and may be partitioned. RTL descriptions are passed straight through to the RTL synthesizer e.g. a Handel-C compiler.
  • NUMBER is a positive integer. Behavioral descriptions will be scheduled in such a way that the block of code will execute within that number of cycles, when possible. An error is generated if it is not possible.
  • this input language which is based on a known computer language, in this case C, but with the additions above allows the user, who could be a system programmer, to write a specification for the system in familiar behavioral terms like a computer program. The user only needs to learn the additions above, such as how to declare parallelism and to declare the available resources to be able to write the input description of the target system.
  • This input language is input to the parser 404 which parses and type checks the input code, and performs some syntax level optimizations, (in a standard way for parsers), and attaches a specific compiler to each block of code based on the attributes above.
  • the parser 404 uses standard techniques [Aho, Sethi and Ullman; "Compilers Principles, Techniques, and Tools”; Addison Wesley known as “The Dragon Book”, which is hereby incorporated by reference] to turn the system description in the input language into an internal data structure, the abstract syntax tree which can be supplied to the partitioner 408.
  • the width adjuster 406 uses C-techniques to promote automatically the arguments of operators to wider widths such that they are all of the same width for instance by concatenating them with zeros. Thus this is an extension of the promotion scheme of the.C language, but uses arbitrary numbers of bits. Further adjustment is carried out later in the flow at 406a and 406b, for instance by ANDing them with a bit mask.
  • Each resource has a list of widths that it can support. For example a 32 bit processor may be able to carry out 8, 16 and 32 bit operations. Hardware may be able to support any width, or a fixed width datapath operator may have been instantiated from a library.
  • the later width adjustment modules 406a and 406b insert commands to enable the width of operation in the description to be implemented correctly using the resources available.
  • the partitioner 408 generates a control/data-flow graph (CDFG) from the abstract syntax tree, for instance using the techniques. described in G. de Michelli. "Synthesis and Optimization of Digital Circuits”; McGraw-Hill, 1994 which is hereby incorporated by reference. It then operates, on the parts of the description which have not already been assigned to resources by the user. It groups parts of the description . together into blocks, "partitioning blocks", which are indivisible by the partitioner. The size of these blocks is set by the user, and can be any size between a single operator, and a top-level process. Small blocks tend to lead to a slow more optimal partition; large blocks tend to lead to a faster less optimal partition.
  • CDFG control/data-flow graph
  • the algorithm used in this embodiment is described below but the system is designed so that new partitioning algorithms can easily be added, and the user can choose which of these partitioning algorithms to use.
  • the algorithms all assign each partitioning block to one of the hardware resources which has been declared.
  • the algorithms do this assignment so that the total estimated hardware area is smaller than the hardware resources available, and so that the estimated speed of the system is maximized.
  • the algorithm implemented in this embodiment of the system is a genetic gorithm for instance as explained in D.E. Goldberg, "Genetic Algorithms in Search, Optimization and Machine learning", Addison- Wesley, 1989 which is hereby incorporated by reference.
  • the resource on which each partitioning block is to be placed represents a gene and the fitness function returns infinity for a partitioning which the estimators say will not fit in the available hardware; otherwise it returns the estimated system speed. Different partitions are generated and estimated speed found.
  • the user may set the termination condition to one of the following:
  • the partitioner 408 uses estimators 420, 422, and 424 to estimate the size and speed of the hardware, software and interfaces as described below.
  • the estimators and the simulation and profiling module 420 can accept a system description from any level in the flow.
  • the input description which may include behavioral and register transfer level parts, to be compiled to software for simulation and estimation at this stage.
  • the simulator can be used to collect profiling information for sets of typical input data, which will be used by the partitioner 408 to estimate data dependent values, by inserting data gathering operations into the output code.
  • the estimator 422 is called by the partitioner 408 for a quick estimation of the size and speed of the hardware parts of the system using each partition being considered.
  • Data dependent values are estimated using the average of the values for the sets of typical input data supplied by the user.
  • the description is scheduled using a call to the behavioral synthesizer 412.
  • the user can choose which estimation algorithm to use, which gives a choice between slow accurate estimation and faster less accurate estimation.
  • the speed and area of the resulting RTL level description is then estimated using standard techniques. For FPGAs the estimate of the speed is then decreased by a non-linear factor determined from the available free area, to take into account the slower speed of FPGA designs when the FPGA is nearly full.
  • the target is customizable processors to be compiled by the system itself then a more accurate estimation of the software speed is used which models the optimizations that the software compiler 416 uses.
  • the area and cycle time of the processor is modeled using a function which is written for each processor, and expresses the required values in terms of the values of the processor's parameterizations, such as the set of instructions that will be used, the data path and instruction register width and the cache size.
  • Interfaces between the hardware and software are instantiated by the interface cosynthesizer 410 from a standard library of available communication mechanisms. Each communication mechanism is associated with an estimation function, which is used by the partitioner to cost the software and hardware speed and area required for given communication, or set of communications. Interfaces which are to be implemented using a resource which can be parameterized (such as a channel on an FPGA), are synthesized using the parameterizations decided by the partitioner. For example, if a transfer often thousand 32 bit values over a PCI bus was required, a DMA transfer from the host to an FPGA card's local memory might be used.
  • the compiler parts of the system are designed in an object oriented way, and actually provide a class hierarchy of compilers, as shown in Figure 5.
  • Each node in the tree shows a class which is a subclass of its parent node.
  • the top-level compiler class 502 provides methods common to both the hardware and software flows, such as the type checking, and a system-level simulator used for compiling and simulating the high- level description. These methods are inherited by the hardware and software compilers 504, 506, and may be used or overridden.
  • the compiler class also specifies other, virtual, functions which must be supplied by its subclasses. So the compile method on the hardware compiler class compiles the description to hardware by converting the input description to an RTL description; the compile method on the Processor A compiler compiles a description to machine code which can run on Processor A.
  • the compiler is called from the command line by the attributes mentioned above specifying which compiler to use for a block of code.
  • An interactive environment is provided, where the user has access to a set of functions which the user can call, e.g. to estimate speed and size of hardware and software implementations, manually attach a compiler to a block of code, and call the simulator.
  • This interactive environment also allows complex scripts, functions and macros to be written and saved by the user for instance so that the user can add a new partitioning algorithm.
  • the main compilation stages of the process flow are software or hardware specific. Basically at module 412 the system schedules and allocates any behavioral parts of the hardware description, and at module 416 compiles the software description to assembly code. At module 418 it also writes a parameterized description of the processors to be used, which may also have been designed by the user. These individual steps will be explained in more detail.
  • the parts of the description to be compiled into hardware use a behavioral synthesis compiler 412 using the techniques of De Michelli mentioned above.
  • the description is translated to a control/data flow graph, scheduled (i.e. what happens on each clock cycle is established) and bound (i.e. which resources are used for which operations is established), optimized, and then an RT-level description is produced.
  • the invention also allows the designer to write parts of the input description corresponding to certain hardware at the register transfer level, and so define the cycle-by-cycle behavior of that hardware.
  • the result of the hardware compilation by the behavioral synthesizer 412 is an RTL description which can be output to a RTL synthesis system 414 using a hardware description language (e.g. Handel-C or VHDL), or else synthesized to a gate level description using the techniques of De Michelli.
  • a hardware description language e.g. Handel-C or VHDL
  • RTL synthesis optimizes the hardware description, and maps it to a given technology. This is performed using standard techniques.
  • the software compiler 416 largely uses standard techniques [e.g. from Aho, Sethi and Ullman mentioned above].
  • parallelism is supported by mapping the invention's CSP-like model of parallelism and communication primitives into the target model. For instance channels can mapped to blocks of shared memory protected by semaphores.
  • CSP is described in C.A.R. Hoare "Communicating sequential processes.” Prentice-Hall International series in computing science. Prentice-Hall International, Englewood Cliffs, NJ. which is hereby incorporated by reference.
  • the software compiler is organized in an object oriented way to allow users to add support for different processors (see Figure 5) and for processor parameterizations. For example, in the processor parameterize 418 unused instructions from the processor description are automatically removed, and support for additional instructions can be added.
  • This embodiment of the invention includes some prewritten processor descriptions which can be selected by the user. It contains parameterized descriptions of three processors, and the software architecture is designed so that it is easy for developers to add new descriptions which can be completely new or refinements of these.
  • the three processors provided are
  • a 2-cycle non-pipelined stack-based processor (see below).
  • a more sophisticated multicycle non-pipelined stack-based processor with a variable number of cycles per instruction, and hardware support for parallelism and channels.
  • the software compiler supports many processor parameterizations. More complex and unexpected modifications are supported by virtue of the object oriented design of the compiler, which allows small additions to be made easily by the user. Most of the mapping functions can be inherited from existing processor objects, minor additions can be made a function used to calculate the speed and area of processor given the parameterizations of the processor and a given program.
  • the output of the software compilation/processor parameterization process is machine code to run on the processor together with a description of the processor to be used (if it is not a standard one).
  • Handel-C provides estimation of the speed and area of the design, which is written as an HTML file to be viewed using a standard browser, such as Netscape.
  • the file shows two versions of the program: in one each statement is colored according to how much area it occupies, and in the other according to how much combinational delay it generates. The brighter the color for each statement, the greater the area or delay. This provides a quick visual feedback to the user of the consequences of design decisions.
  • the Handel-C simulator is a fast cycle-accurate simulator which uses the C-like nature of the specification to produce an executable which simulates the design. It has an X- windows interface which allows the user to view VGA video output at about one frame per second.
  • the design can be compiled to a netlist. This is then mapped, placed and routed using the FPGA vendor's tools.
  • the simulator can be used to collect profiling information for sets of typical input data, which will be used by the partitioner 408 to estimate data dependent values, by inserting data gathering operations into the output code.
  • a subset of the above system could be used to provide a provably correct compilation strategy. This subset would include the channel communication and parallelism of OCCAM and CSP. A formal semantics of the language could be used together with a set of transformations and a mathematician, to develop a provably correct partitioning and compilation route.
  • the description of the processor to be used to run the software part of the target system may itself be written in the C-like input language and compiled using the codesign system. As it is such an important element of the final design most users will want to write it at the register transfer level, in order to hand-craft important parts of the design.
  • the user may use the predefined processors, provided by the codesign system or write the description in VHDL or even at gate level, and merge it into the design using an FPGA vendor's tools. With this system the user can parameterize the processor design in nearly any way that he or she wishes as discussed above in connection with the software compilation and as detailed below.
  • the first processor parameterization to consider is removing redundant logic. Unused instructions can be removed, along with unused resources, such as the floating point unit or expression stack.
  • the second parameterization is to add resources. Extra RAMS and ROMs can be added.
  • the instruction set can be extended from user assigned instruction definitions. Power-on bootstrap facilities can be added.
  • the third parameterization is to tune the size of the used resources.
  • the bit widths of the program counter, stack pointer, general registers and the opcode and operand portions of the instruction register can be set.
  • the size of internal memory and of the stack or stacks can be set, the number and priorities of interrupts can be defined, and channels needed to communicate with external resources can be added. This freedom to add communication channels is a great benefit of codesign using a parametrizable processor, as the bandwidth between hardware and software can be changed to suit the application and hardware/software partitioning.
  • Appendix 1 The description of a very simple stack-based processor in this style (which is actually one of the pre-written processors provided by the codesign system for use by the user) is listed in Appendix 1.
  • the processor starts with a definition of the instruction width, and the width of the internal memory and stack addresses. This is followed by an assignment of the processor opcodes.
  • the registers are defined; the declaration "unsigned x y, z" declares unsigned integers y and z of width x.
  • the program counter, instruction register and top-of-stack are the instruction width; the stack pointer is the width of the stack.
  • the processor is defined. This is a simple non-pipelined two- cycle processor. On the first cycle (the first three-line "par"), the next instruction is fetched from memory, the program counter is incremented, and the top of the stack is saved. On the second cycle the instruction is decoded and executed. In this simple example a big switch statement selects the fragment of code which is to be executed.
  • This simple example illustrates a number of points.
  • Various parameters such as the width of registers and the depth of the stack can be set. Instructions can be added by including extra cases in the switch statement. Unused instructions and resources can be deleted, and opcodes can be assigned.
  • the example also introduces a few other features of the register transfer level 30 language such as rom and ram declarations.
  • the target system is a video game in which the user can fly a plane over a detailed background picture. Another user can be dialed up, and the screen shows both the local plane and a plane controlled remotely by the other user.
  • the main challenge for the design is that the system must be implemented on a single medium-sized FPGA.
  • the platform for this application was a generic and simple FPGA-based board.
  • the Hammond board contains a Xilinx 4000 series FPGA and 256kb synchronous static RAM. Three buttons provide a simple input device to control the plane; alternatively a standard computer keyboard can be plugged into the board. There is a parallel port which is used to configure the FPGA, and a serial port. The board can be clocked at 20 MHz from a crystal, or from a PLL controlled by the FPGA. Three groups of four pins of the FPGA are connected to a resistor network which gives a simple digital to analogue converter, which can be used to provide 12 bit VGA video by implementing a suitable sync generator on the FPGA. Problem description and discussion The specification of the video game system is as follows:
  • the system must dial up an Internet service provider, and establish a connection with the remote game, which will be running on a workstation.
  • the system must display a reconfigurable background picture.
  • the system must display on a VGA monitor a picture of two planes: the local plane and the remote plane.
  • the position of the local plane will be controlled by the buttons on the Hammond board.
  • the position of the remote plane will be received over the dialup connection every time it changes.
  • the position of the local plane will be sent over the dialup, connection every time it changes.
  • This simple problem combines some hard timing constraints, such as sending a stream of video to the momtor, with some complex tasks without timing constraints, such as connecting to the Internet service provider. There is also an illustration of contention for a shared resource, which will be discussed later.
  • a block diagram of the system 800 is shown in Figure 8.
  • the system design decisions were quite straightforward.
  • a VGA monitor 802 is plugged straight into the Hammond board 600.
  • a modem 804 can be used, and plugged into the serial port of the Hammond board. Otherwise it is quite feasible to build a simple modem in the FPGA.
  • serial port interface dial up, establishing the network connection, sending the position of the local plane, receiving the position of the remote plane, displaying the background picture, displaying the planes.
  • a simple way of generating the video is to build a sync generator in the FPGA, and calculate and output each pixel of VGA video at the pixel rate.
  • the background picture can be stored in a "picture RAM”.
  • the planes can be stored.
  • the hardware portions of the design are dictated by the need of some part of the system to meet tight timing constraints. These are the video generation circuitry and the port drivers. Consequently these were allocated to hardware , and their C descriptions written at register transfer level to enable them to meet the timing constraints.
  • the picture RAM and the character generator ROM and character location RAM were all stored in the Hammond board RAM bank as the size estimators, showed that there would be insufficient space on the FPGA.
  • the parts of the design to be implemented in software are the dial-up and negotiation, establishing the network, and communicating the plane locations. These are non-time critical, and so can be mapped to software.
  • the program is stored in the RAM bank, as there is not space for the application code in the FPGA.
  • the main function is shown in Appendix 2. The first two lines declare some communication channels. Then the driver for the parallel port and sync generator are started, and the RAM is initialized with the background picture, the character memory and the program memory. The parallel communicating hardware and software process are then started, communicating over a channel hwswchan.
  • the software establishes the network connection, and then enters a loop which transmits and receives the position of the local and remote plane, and sends new positions to the display process.
  • the simple stack-based processor from Appendix 1 was parameterized in the following ways to run this software.
  • the width of the processor was made to be 10 bits, which is sufficient to address a character on the screen in a single word. No interrupts were required, so these were removed, as were a number of unused instructions, and the internal memory.
  • the RT-level design was simulated using the Handel-C simulator. Sample input files mimicking the expected inputs from the peripherals were prepared, and these were fed into the simulator. A black and white picture 900 of the color display is shown in Figure 9 (representing a snapshot of the X window drawn by the co-simulator).
  • the design was then placed and routed using the proprietary Xilinx tools, and successfully fit into the Xilinx 4013 FPGA on the Hammond board.
  • the codesign system was presented with an implementation challenge with this design.
  • the processor had to access the RAM (because that is where the program was stored), whilst the hardware display process simultaneously had to access the RAM because this is where the background picture, character map and screen map were stored.
  • This memory contention problem was made more difficult to overcome because of an implementation decision made during the design of the Hammond board: for a read cycle the synchronous static RAM which was used requires the address to be presented the cycle before the data is returned.
  • the display process needs to be able to access the memory without delay, because of the tight timing constraints placed on it.
  • a semaphore is used to indicate when the display process requires the memory. In this case the processor stalls until the semaphore is lowered. On the next cycle the processor then presents to the memory the address of the next instruction, which in some cases may already have been presented once.
  • the designer was able to overcome this problem using the codesign system of invention because of the facility for some manual partitioning by the user and describing some parts of the design at the register transfer level to give close control over those parts. Thus while assisting the user, the system allows close control where desired.
  • the "springs" program is a small example of a codesign programmed in the C-like language mentioned above. It performs a simulation of a simple mass-spring system, with a real time display on a monitor, and interaction via a pair of buttons.
  • the design consists of three parts: a process computing the motion of the masses, a process rendering the positions of the masses into line segments, and a process which displays these segments and supplies the monitor with appropriate synchronization signals.
  • the first two processes are written in a single C-like program.
  • the display process is hard real-time and so requires a language which can control external signals at the resolution of a single clock cycle, so for this reason it is implemented using an RTL description (Handel-C in this instance).
  • Figure 10 is a block diagram of the ultimate implementation, together with a representation of the display of the masses and springs.
  • Figure 11 is a dependency graph for calculation of the variables required.
  • the mass motion process first sets up the initial positions, velocities and acceleration of the masses. This can be seen in Appendix 3 where positions pO to p7 are initialized as 65536.
  • the program then continues in an infinite loop, consisting of: sending pairs of mass positions to the rendering process, computing updated positions based on the velocities of the masses, computing updated velocities based on the accelerations of the masses, and computing accelerations based on the positions of the masses according to Hooke's law.
  • the process then reads the status of the control buttons and sets the position of one of the masses accordingly. This can be seen in Appendix 3 as the statement "received (buttons, button status)"
  • the rendering process runs an infinite loop performing the following operations: reading a pair of mass positions from the mass motion process then interpolate in between these two positions for the next 64 lines of video output. A pair of interpolated positions is sent to the R T L display process once per line. This is a relatively simple process with only one calculation, but this must be performed very regularly.
  • the display process (which is written in Handel-C) and is illustrated in Appendix 3 reads start and end positions from the rendering process and drives the video color signal between these positions on a scan line. Simultaneously, it drives the synchronization signals for the monitor. At the end of each frame it reads the values from the external buttons and sends these to the mass motion process.
  • the design could be partitioned it in a large number of ways. It could partition the entire design into hardware or into software, partition the design at the high-level, by the first two processes described above and compiling them using one of the possible routes, or it can partition the design at a lower level, and generate further parallel processes communicating with each other. Whatever choice the partitioner makes, it maintains the functional correctness of the design, but will change the cost of the implementation (in terms of the area, clock cycles and so forth). The user may direct the partitioner to choose one of the options above the others. A number of the options are described below.
  • the partitioner could map the first two processes directly into Handel-C, after performing some additional parallelization.
  • the problem with this approach is that each one of the operations in the mass motion process will be dedicated to its own piece of hardware, in an effort to increase performance. However, as discussed above, this is . unnecessary as these calculations can be performed at a slower speed.
  • the result is a design that can perform quickly enough but which is too large to fit on a single FPGA. This problem would be recognized by the partitioner using its area estimation techniques.
  • partitioner maps the two processes into software running on a parameterized threaded processor. This, reduces the area required, since the repeated operations of the mass motion calculations are performed with a single operation inside the processor. However, since the processor must swap between doing the mass motion calculations and the rendering calculations, overhead is introduced which causes it to run too slowly to display in real-time.
  • the partitioner can recognize this by using the speed estimator, based on the profiling information gathered from simulations of the system.
  • partitioner Another alternative would be for the partitioner to generate a pair of parameterized processors running in parallel, the first calculating motion and the second performing the rendering.
  • the area required is still smaller than the pure hardware approach, and the speed is now sufficient to implement the system in real time.
  • using a parameterized processor for the rendering process adds some overhead (for instance, performing the instruction decoding), which is unnecessary. So although the solution works, it is a sub optimal.
  • the best solution, and the one chosen by the partitioner, is to partition the mass motion process into software for a parameterized, unthreaded processor, and to partition the rendering process 1010 which was written at a behavioral level together with the position, velocity and acceleration calculations 1006 into hardware.
  • This solution has the minimum area of the options considered, and performs sufficiently quickly to satisfy the real time display process.
  • the behavioral part of the system 1002 includes the calculation of the positions, velocities and accelerations of the masses at 1006 (which will subsequently be partitioned to software), and the line and drawing processes at 1010 (which will subsequently be partitioned to hardware).
  • the RTL hardware 1020 is used to receive the input from the buttons at 1022 and output the video at 1024.
  • the partitioner 408 used the estimators 420, 422 and 424 to estimate the speed and area of each possible partition based on the use of a customized processor.
  • the interface cosynthesizer 410 implements the interface between hardware and software on two FPGA channels 1004 and 1008 and these are used to transfer a position information to the rendering process and to transfer the button information to the position calculation 1006 from button. input 1022.
  • the width adjuster 406 which is working on the mass motion part of the problem to be partitioned to software, parameterizes the processor to have a width of 17 bits and adjusts the width of "curr_pos" which is the current position to nine bits, the width of the segment channel.
  • the processor parameterize at 17 further parameterizes the processor by removing unused instructions such as multiply, interrupts, and the data memory is reduced and multi-threading is removed. Further, op codes are assigned and the operator width is adjusted.
  • the partitioner 408 generates a dependency graph as shown in Figure 11 which indicates which variables depend on which. It is used by the partitioner to determine the communications costs associated with the partitioning, for instance to assess the need for variables to be passed from one resource to another given a particular partitioning.
  • the system uses parameterization and instruction addition and removal for optimal processor design in on FPGA.
  • the system provides an environment in which an
  • FPGA-based processor and its compiler can be developed in a single framework.
  • the hardware can be designed to run in parallel with the processors to meet speed constraints.
  • time critical parts of the system can be allocated to custom hardware, which can be designed at the behavioral or register transfer level.
  • Non-time critical parts of the design can be allocated to software, and run on a small, slow processor.
  • the system can target circuitry on dynamic FPGAs.
  • the FPGA can contain a small processor which can configure and reconfigure the rest of the FPGA at run time.
  • the system allows the user to explore efficient system implementations, by allowing parameterized application-specific processors with user-defined instructions to communicate with custom hardware. This combination of custom processor and custom hardware allows a very large design space to be explored by the user.
  • send(position, p5) send(position, p5) send(position, p6) send(position, p6) send(position, p7)
  • buttons * Get button information */ receive(buttons, button status); * Fix top point according to buttons */ if (button status & 1)
  • interface bus out video out(Visible(sx, sy) ? (video ? (unsigned 12)Oxfff : 0) 0) with video spec;

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Stored Programmes (AREA)
  • Hardware Redundancy (AREA)

Abstract

L'invention concerne un système qui met en oeuvre un système d'exploitation stocké dans un dispositif logique reconfigurable. Le système comprend un dispositif logique reconfigurable dans lequel au moins une partie d'un système d'exploitation est stockée. Le système d'exploitation peut être exécuté pour gérer un ou plusieurs processus matériels ou logiciels. L'invention concerne également un procédé de contrôle des processus d'un système, qui permet de démarrer le système d'exploitation. Au moins une partie, de préférence l'ensemble, du système d'exploitation est stockée dans un dispositif logique reconfigurable. Le système d'exploitation est utilisé pour gérer au moins un processus matériel et/ou logiciel comprenant des applications.
PCT/GB2001/004551 2000-10-12 2001-10-11 Systeme et procede destines a un systeme d'exploitation dans un dispositif logique reconfigurable WO2002031678A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001295716A AU2001295716A1 (en) 2000-10-12 2001-10-11 System and method for an operating system in a reconfigurable logic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US68718600A 2000-10-12 2000-10-12
US09/687,186 2000-10-12

Publications (2)

Publication Number Publication Date
WO2002031678A2 true WO2002031678A2 (fr) 2002-04-18
WO2002031678A3 WO2002031678A3 (fr) 2003-12-31

Family

ID=24759423

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2001/004551 WO2002031678A2 (fr) 2000-10-12 2001-10-11 Systeme et procede destines a un systeme d'exploitation dans un dispositif logique reconfigurable

Country Status (2)

Country Link
AU (1) AU2001295716A1 (fr)
WO (1) WO2002031678A2 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1300734C (zh) * 2004-11-10 2007-02-14 中国人民解放军国防科学技术大学 蚂蚁寻优软硬件划分中的初始信息素生成方法

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5970254A (en) * 1997-06-27 1999-10-19 Cooke; Laurence H. Integrated processor and programmable data path chip for reconfigurable computing
US6658564B1 (en) * 1998-11-20 2003-12-02 Altera Corporation Reconfigurable programmable logic device computer system

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1300734C (zh) * 2004-11-10 2007-02-14 中国人民解放军国防科学技术大学 蚂蚁寻优软硬件划分中的初始信息素生成方法

Also Published As

Publication number Publication date
WO2002031678A3 (fr) 2003-12-31
AU2001295716A1 (en) 2002-04-22

Similar Documents

Publication Publication Date Title
US20020100029A1 (en) System, method and article of manufacture for compiling and invoking C functions in hardware
US20030121010A1 (en) System, method, and article of manufacture for estimating a potential performance of a codesign from an executable specification
US20020072893A1 (en) System, method and article of manufacture for using a microprocessor emulation in a hardware application with non time-critical functions
US20030140337A1 (en) System, method, and article of manufacture for data transfer reporting for an application
US20030117971A1 (en) System, method, and article of manufacture for profiling an executable hardware model using calls to profiling functions
US6668312B2 (en) System, method, and article of manufacture for dynamically profiling memory transfers in a program
WO2000038087A1 (fr) Systeme a conception mixte materiel/logiciel
US20030120460A1 (en) System, method, and article of manufacture for enhanced hardware model profiling
EP1004085B1 (fr) Conversion de programmes graphiques en realisations materielles et systeme correspondant
US6584601B1 (en) System and method for converting graphical programs into hardware implementations which utilize probe insertion
US6608638B1 (en) System and method for configuring a programmable hardware instrument to perform measurement functions utilizing estimation of the hardware implentation and management of hardware resources
Koch et al. FPGAs for software programmers
Chou et al. The chinook hardware/software co-synthesis system
De Micheli Computer-aided hardware-software codesign
Rowen Engineering the complex SOC: fast, flexible design with configurable processors
US6784903B2 (en) System and method for configuring an instrument to perform measurement functions utilizing conversion of graphical programs into hardware implementations
US8473269B1 (en) System-level hardware and software development and co-simulation system
WO2002061576A2 (fr) Systeme, procede et article manufacture pour des constructions d'interface dans un langage de programmation capable de programmer des architectures de materiel
EP1502214A1 (fr) Dispositif et procede permettant de gerer la conception des circuits integres
US20030023653A1 (en) System, method and article of manufacture for a single-cycle floating point library
EP0772140A1 (fr) Environnement et procédé de conception pour la conception combinée de matériel et de logiciel
Wagner et al. Strategies for the integration of hardware and software IP components in embedded systems-on-chip
Iseli Spyder: A reconfigurable processor development system
Buchenrieder et al. HW/SW co-design with PRAMs using CODES
Saul Hardware/software codesign for FPGA-based systems

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载