+

WO1995019001A1 - Moyen de mise a jour pour module de gestion d'elements de stockage d'ordinateurs - Google Patents

Moyen de mise a jour pour module de gestion d'elements de stockage d'ordinateurs Download PDF

Info

Publication number
WO1995019001A1
WO1995019001A1 PCT/US1995/000196 US9500196W WO9519001A1 WO 1995019001 A1 WO1995019001 A1 WO 1995019001A1 US 9500196 W US9500196 W US 9500196W WO 9519001 A1 WO9519001 A1 WO 9519001A1
Authority
WO
WIPO (PCT)
Prior art keywords
container
value
information
update
state
Prior art date
Application number
PCT/US1995/000196
Other languages
English (en)
Inventor
Jared M. Harris
Ira L. Ruben
Original Assignee
Apple Computer, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apple Computer, Inc. filed Critical Apple Computer, Inc.
Priority to AU15601/95A priority Critical patent/AU1560195A/en
Publication of WO1995019001A1 publication Critical patent/WO1995019001A1/fr

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/448Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4488Object-oriented
    • G06F9/4493Object persistence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/448Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4488Object-oriented
    • G06F9/449Object-oriented method invocation or resolution

Definitions

  • BACKGROUND Application programs in a computer system typically need to manage data in a manner that permits frequent updating.
  • Two broad examples of such application programs are a word processor and a database manager.
  • Word processors need to be able to manipulate sections of text and other related information each time the user modifies a document, and a database program needs to insert, delete and modify entries in accordance with a user's requirements. Updating is an issue for computer manufacturers as well, for example to permit upgrades to operating system routines, including those which are provided in ROM.
  • persistent storage of information refers to information which remains after an application program which references or creates it, terminates. Persistent storage is often nonvolatile in that it also survives shutdown of the computer system, but in some situations can be partially or completely volatile.
  • a word processor it is often desirable to support the ability of two or more different users to update a single document at the same time.
  • a database system it is often desirable to permit different users to update the database data concurrently.
  • Pessimistic concurrency which, while permitting many users to read and view the data concurrently, permits only one user to modify the data at a time.
  • the system "locks out” all other users from write accesses when one user has the data open for updating.
  • Pessimistic concurrency can be implemented at a file level or, in sophisticated database programs for example, at a record level. That is, for file level locking, only one user may have the file open at a time for writing. This is the typical manner with which word processors implement concurrency.
  • a database program can implement record level locking if, for example, a backend process is the only process which has the data file open for writing, and all other users issue their commands and queries through the backend process.
  • Optimistic concurrency in which two or more users can update data at the same time. To the extent the concurrent updates conflict, only one can successfully be committed. Optimistic concurrency is different from pessimistic concurrency in that users are permitted to make updates concurrently, subject to subsequent detection of conflicts and resulting inability to commit the updates.
  • serializability of updates the characteristic that for a set of committed updates to the data, at least one sequence exists in which those specific updates could have been performed to achieve the resulting state of the information.
  • Optimistic concurrency permits increased performance in some circumstances, but still produces serializable committed updates.
  • a few programs implement an update model known as "versioning, " which does challenge the requirement of serializable updates.
  • versioning In a versioning mechanism, several concurrent updaters can each have an independent yet internally consistent view of the information. The views are known as "configurations". The different updaters modify the information concurrently, and can write their updated configurations to persistent storage, all subject to subsequent reconciliation.
  • MPW Macintosh ® Programming Workshop
  • Apple Computer, Inc. Cupertino, CA.
  • MPW Projector is described in the MPW 3.1 Reference Manual, and in H. Kanner, "Projector, An Informal
  • MPW Projector is a good first step toward reducing the constraints imposed by strict serializability, significant additional flexibility is highly desirable. For example, Projector's finest level of granularity is still represented by a "file". It would be desirable to support much finer degrees of granularity. As another example, MPW Projector's provisions for reconciling two conflicting versions of a document is limited to a single procedure in which the computer identifies strict text differences, and a user indicates how each text difference should be resolved. Significant additional intelligence will be desirable in the comparison procedure, as would significant increased flexibility and automation in the resolution of conflicts, as well as support for comparisons between non-text information. Accordingly, there is a need for much greater flexibility in the support of non-serialized updates in the maintenance of data.
  • documents and other collections of stored information are made up of multiple content elements, such as text, tables, images, formatting information, mathematical equations and graphs.
  • content elements such as text, tables, images, formatting information, mathematical equations and graphs.
  • content elements may be copied out of a document and used in yet another document, and so on.
  • Patches are usually created by hand, thus requiring a person to generate them who knows the base operating system in extensive detail, often at the binary level. Patches will also fail, or even corrupt the user's system, if some aspect of the user's system configuration or prior update history was not taken into account when the patches were created. Patching can be performed also where the base version is provided in read-only memory, provided the system accesses such information using a level of indirection which is changeable. But this can be even more complicated than direct patches, and often creates additional problems of its own.
  • MPW projector is a good first step toward implementing versioning.
  • MPW Projector is an integrated set of tools and scripts whose primary purpose is to maintain control of the development of source code. It preserves in an orderly manner the various revisions of a file, and through the versioning mechanism also prevents one programmer from inadvertently destroying changes made by another. If the underlying data is text, data compression is achieved by storing only one complete copy of a file and storing revisions only as files of differences.
  • Projector also has a facility for associating a specific set of file revisions with a name, this name being usable as a designator for a particular version, or release, of a product. Thus the name alone can be used to trigger the selection of just those source files that are required to build the desired instance of the product.
  • MPW Projector maintains versions in a tree structure. When one user desires to modify a file in the main Projector database, the user "checks out" the file, thereby making a copy of the file in the user's own directory.
  • the user can check out a file either as "read-only” or, if no one else has already done so, as "read/write”. After modifying the file, the user can then "check in” the file back to the main Projector database, either as a new version in a new branch of the file version tree, or, only if the file was check out as read/write as a new version in the same branch of the version tree.
  • MPW Projector performs a strict text-based comparison between the two versions of the file and displays the differences in a pair of windows on the computer system display. A user then cuts-and- pastes portions from one window into the other in order to merge them together.
  • MPW projector's technique for reconciling different updates of a base version is useful only when the underlying data is text, and operates only by comparison of the two updates to be reconciled. No record is kept of the individual changes that were made from the base version to each of the updated versions. The reconciliation process for an implementation of version merging can be made significantly more intelligent if such a record was kept.
  • each updater should be provided with an independent "view" of the information. That is, if a work includes a plurality of modules, each updater should have his "current" view of the information include all of the modules which have not been changed in his or her current version, plus only his or her current version of the modules which have been changed. MPW projector accomplishes this, but only at the file level. Much finer granularity would be desirable. For example, if the information is a document, it would be desirable for the level of granularity to be as small as a paragraph, or even a sentence. If the information is in a database format, it would be desirable for the level of granularity to be possibly as small as a record or less.
  • the update mechanism should perform them atomically. That is, while the storage mechanism should be able to maintain partial updates in nonvolatile storage, so as to reduce memory requirements, the base document should always be recoverable in case of a system or power failure.
  • the application program copies any unedited portions of the base information into the temporary file so that the temporary file is complete, then renames the old base file to a second temporary name, then renames the first temporary file to the name of the prior name of the base file, and then deletes the old file.
  • This technique becomes problematical, however, when the information file is extremely large. In particular, it can be seen that the technique requires twice as much available space in the nonvolatile storage medium as the ultimate file requires. In addition, the extensive amount of copying required by the mechanism can severely degrade performance.
  • Shadow page technique Another conventional technique for handling atomic updates is sometimes referred to as the "shadow page technique".
  • the base file is divided into pages, and an index to the current version of the pages is maintained. Updates are managed atomically at the page level rather than the file level, and are accomplished by writing the new version of a page, then updating the index to point to the new version rather than the old version of that page, and then deleting the old version of the page.
  • the index itself may also be shadowed.
  • the set of old pages, identified by the old index, and the set of new pages identified by the new index completely describe the base and updated states of the information, respectively.
  • the shadow page technique avoids the large file problem mentioned above, but can still be inadequate in many situations. For example, the minimum granularity of a page may still be too coarse. Additionally, like other update mechanisms described above, reconciliation of two or more concurrently created updates is difficult in the shadow page technique in part because no record is kept of the changes which were made from the base version to each update version to be reconciled.
  • the shadow page technique has yet another problem, which arises from the fact that different pages may end up at different places in the file.
  • the indexing mechanism permits random placement of these pages.
  • a consecutive read of the information in the file therefore can cause extensive jumping around inside the file, thereby degrading performance.
  • Yet another conventional update technique can combine shadow pages with the use of a log file.
  • changes are made in a temporary version of the information, for example in memory, the individual changes are recorded in a log which is written out to nonvolatile storage.
  • the current state of the information as represented in memory is written to persistent storage and a marker is written to the log file to indicate that persistent storage is consistent up to that point in the log file.
  • the problem of randomly located pages is minimized in this technique since speed optimizations such as elevator algorithms can be used during the write.
  • log-enhanced update techniques There are many variations of log-enhanced update techniques, including some in which the change log identifies base pages logically rather than physically.
  • An extreme example of log-enhanced updating techniques is set forth in Douglis and Osterhout, "Log Structured File Systems," Compcon '89 (1989) , incorporated herein by reference.
  • the Osterhout paper describes a log-structured file system which maintains only the change log in persistent storage. No complete set of the information exists in persistent storage. Rather, it is always merely reconstructed in memory, to the extent needed, by traversing the log file. Persistent storage also maintains a pointer to the last known valid change specified in the log file, and atomicity of updates is accomplished by redirecting that pointer to point to a subsequent position in the log file at the time of each "commit".
  • log-enhanced techniques nor the exclusively log-based technique, however, supports any concept of more than one valid consistent state of the information, and therefore does not support non- serialized concurrency well. They also fail to adequately handle situations like the operating system patch example set forth above, and do not permit different users to have independent views of the current state of the information being updated.
  • an update mechanism which can operate at fine granularity, support independent views of the information by multiple updaters, facilitate reconciliation of concurrent updates, and maintain atomicity of updates without requiring large amounts of free disk space or substantially degrading performance.
  • the same update mechanism could be used for all different types of information, including operating systems, databases, documents and so on.
  • the update mechanism it would be desirable for the update mechanism to be integrated with other container manager mechanisms intended to reduce storage space without degrading performance, handle fine granularity units of information, support a wide variety of different types of content elements, and unify the methods by which the application programs access different kinds of storage media.
  • a target container defines a first state of the information
  • the update container which can point to the target container, identifies changes to the information in the first state which would be sufficient to update the first information state to a second information state.
  • Update containers may be nested to any depth.
  • the procedure searches down the chain until it finds the ultimate target container.
  • the procedure then creates in-memory structures for providing access to the objects and value data represented in such container.
  • the procedure then works its way back up the chain, performing the changes on the in-memory structure, which are called for in each of the update containers.
  • New modifications made after this process is complete, are recorded, and when committed, are written out into a new update container that the application program originally opened.
  • the changes which are identified in an update container if they represent modifications to an object in an underlying container, refer to that object logically rather than physically. For example, they include a reference to a persistent object identifier which is unique within the underlying contianer and its own base containers.
  • certain changes are allowed on existing closed containers, such as rearranging information or deleting superfluous information, as long as the logical view of the closed container remains the same.
  • the update mechanism described herein supports multiple concurrent (parallel) updaters, since more than one update container can refer to the same target container. Thus each updater has an independent view of the information being updated. Also, the mechanism facilitates reconciliation of concurrent updates since it maintains a record of the changes that were made.
  • the application program can have it appended either to the end of the target container or can have it stored as a separate file. If it is stored as a separate file, then its reference to the target container can be stored as a dynamic value, thereby affording to the update mechanism all the flexibility of dynamic values described in the related DYNAMIC VALUE MECHANISM FOR COMPUTER STORAGE CONTAINER MANAGER patent application.
  • the procedures determine which handler to call to perform a given operation in dependence upon the type of the object for which the procedure was invoked.
  • the handlers too, are relatively easy to write because like the application program, the rules permit the handlers to call the very same set of procedures (recursively if the very same procedure is called) as are available to the application program.
  • handlers too can be written without knowledge of any characteristics of the object for which they are invoked other than the characteristics defining the type for which the handler is specifically written. For example, a read handler for a type which defines a data compression/decompression algorithm need never know where the data is physically located since it merely calls the predefined read value procedure to obtain the data to decompress.
  • Types can be defined in a tree structure. This further simplifies the writing of handlers since the different characteristics of a type can be divided into many small components, each defined by a different sub-type on the tree. Thus each handler can be written to accomplish only a limited objective (for example an I/O redirection or a data transformation) .
  • the predefined procedures automatically follow the chain of handlers defined by a type tree, by knowing where in the chain a given caller of the procedure is. Neither the application program nor the handlers themselves need keep track of this information.
  • the predefined procedures make no assumptions about the types in a type tree.
  • An application developer can define novel types as required by dividing them into subtypes (if desired) and writing handlers for each subtype.
  • the complexity of the handlers depend only on the complexity of the transformation or redirection which they are to individually perform, not on the complexity of either the type tree or the procedures which implement the present invention. So long as the handlers follow certain rules of good behavior, the predefined procedures will be able to follow any such user-defined type tree. Additional procedures are provided for associating the individual handlers to their corresponding types (subtypes) , and for building the type trees themselves.
  • computer apparatus stores a subject value and a chain of sequentially associated value handlers for the subject value.
  • the chain includes a top value handler and a bottom value handler, each of the value handlers in the chain except the bottom value handler invoking the respective next value handler when invoked, the bottom value handler performing an operation on the subject value when invoked.
  • the value operations can be data read operations, data write operations, etc., and the value handlers in the chain can perform data transformations and/or data redirections, transparently to its caller.
  • the dynamic value chain is not stored in persistent storage; rather it is created when an application program desires to perform a value operation on the subject value.
  • the subject value has a type associated with it which determines the value handlers to be placed in the chain.
  • the chain can have more than one value handler in it for a given value operation if the type associated with the subject value is made up of a hierarchy of sub-types.
  • Figs 1 and 2 illustrate type trees
  • Fig. 3 is a block diagram of a hardware computer system platform
  • Fig. 4 is an overall block diagram of major data structures which are created in main memory of the computer system of Fig. 3 during the pendency of a session;
  • Fig. 5 illustrates the structure of in-memory objects which are created by dynamic value mechanism
  • Fig. 6 illustrates the same structure as Fig. 5 using a simplified notation
  • Fig. 7 illustrates a type object using the notation of Fig. 6;
  • Fig. 8 is a flowchart of a CMReadValueData() routine
  • Figs. 9, 10 and 11 are representations of information stored in a table of contents in persistent storage
  • Figs. 12 and 22 illustrate the structure of in- memory objects which are created to support an update mechanism
  • Fig. 13 is a state transition diagram describing a finite state machine
  • Fig. 14 is a table defining actions taken and the next state for the finite state machine of Fig. 13;
  • Fig. 15 illustrates a logical view of certain value data
  • Fig. 16 illustrates a physical view of the value data illustrated in Fig. 15;
  • Figs. 17, 18 and 19 illustrate ways in which value data may be modified
  • Fig. 20 illustrates major structures of an update container in persistent storage
  • Fig. 21 describes the structure of value updating instructions.
  • the embodiment described herein takes the form of a Container Manager and its associated data structures which can be used by developers of a wide variety of types of application programs.
  • the Container Manager includes a number of C language type definitions and a number of procedures for implementing the functionality provided by the Container Manager. Together they provide a common application program interface (API) for the different types of application programs.
  • API application program interface
  • the structures are described first with respect to their logical organization and subsequently their physical organization in the storage apparatus managed by the container manager. That is, they will be described first with respect to the view which the container manager software provides to an application programmer via the API, and subsequently with respect to the way that logical organization is actually implemented in the present embodiment.
  • an object is a collection of data that "hangs together" and that can be referenced by other data.
  • Objects can be simple or complex, small (a few bytes) or large (up to 2 M bytes) .
  • objects of the Container Manager are typically larger and more complex, because they represent user meaningful content elements, rather than the atoms and molecules used to build this content.
  • a sequence of bytes of data would not by itself be an object, because we can only understand the bytes if we know how they will be used.
  • a paragraph, an image, etc. can be an object if it contains enough information so that we know how to interpret it.
  • an object contains information about what kind of object it is, and some data, which provides the content of the object. In this description, the information "about" the object is called metadata, and the content of the object is called its value.
  • the Container Manager groups objects in an object container, which is some form of data storage or transmission (such as a file, a piece of RAM, or an inter-application message) that is used to hold one or more objects (both their metadata and their values) .
  • object container which is some form of data storage or transmission (such as a file, a piece of RAM, or an inter-application message) that is used to hold one or more objects (both their metadata and their values) .
  • objects both their metadata and their values
  • These containers are defined by a set of rules for storing multiple objects in a such a container, so that software that understands the rules can find the objects, figure out what kind of objects they are, and use them correctly.
  • the rules accommodate a wide variety of different kinds of objects, different ways that applications want to use objects, and system considerations about how data can be stored.
  • the Container Manager provides a container definition that can conveniently, efficiently, and reliably hold all the different kinds of objects that users and applications want to group together, store, and exchange.
  • the Container Manager does not define how any given object is structured internally (within its value) so as not to limit the formats which an application developer may want to define.
  • Objects stored in a container can have proprietary or standard formats, they can be designed to use the Container
  • the Container Manager manipulates and stores data using primary and secondary entities.
  • the primary entities used by the Container Manager are containers, objects, properties, values, and types.
  • Every object is in some container.
  • An object consists of a set of properties. The properties are not in any particular order.
  • Each property consists of a set of values with distinct types. The values are not in any particular order.
  • Every object must have at least one property, and that property must have at least one value.
  • Each value consists of a variable length sequence of bytes.
  • the Container Manager knows very little about a container beyond the objects in it. However, the container always contains a distinguished object, and applications can add arbitrary properties to that object, so applications can specify further information about the container if they wish. Containers are often files, but they can also be many other forms of storage. For example, in various applications developers already support the following types of containers: blocks of memory, the clipboard, network messages, and Container Manager values. Undoubtedly other types of containers will be useful as well.
  • Each Container Manager object has a persistent ID which is unique within its container. Other than that, objects don't really exist independent of their properties. An object contains no information beyond what is stored in its properties.
  • Properties A property defines a role for a value. Properties are like field names in a record or struct, with two differences. First, properties can be added freely to an object, so an application should never assume an object only has the properties it knows about. Second, property names are globally unique, so that they can never collide when various different applications add properties to the same object. This also means that the same property name always means the same thing, no matter what object it is in. Properties are distinct from types, just as field names are distinct from the data type of the field.
  • a property indicating the date created might have a string, Julian day, or OSI standard date representation. These different formats would not be indicated by the property, but by the type (see below) .
  • Values are where the data is actually stored. In terms of physical location, this data might actually be stored anywhere in a container. In fact, it can be broken up into any number of separate pieces, and the pieces can be stored anywhere. (See the discussion of value segments below.)
  • Each value may range in size from 0 bytes to 2 M bytes, although that range can differ in a different embodiment.
  • the overhead per value varies depending on the circumstances. For an object with a single value, the typical overhead will be 21 bytes. For a small value which is one of several values associated with a property, the overhead can be as low as five bytes.
  • Types The type of a value describes the format of that value. Types record the structure of a value, whether it is compressed, what its byte ordering is, and so on.
  • the Container Manager provides an open-ended mechanism, so that types can be extended to include whatever metadata is required.
  • the type of a string value could indicate the alphabet, whether it was null terminated, and possibly other information (such as the intended language) . It might also indicate that the string was stored in a compressed form, and could indicate the compression technique, and the dictionary if one was required. If the string used multi-byte characters, and the byte-ordering was not defined by the alphabet, the type could indicate the byte-ordering within the characters.
  • the Container Manager defines an inheritance mechanism to make building complex types like this efficient.
  • the structure of types is tied into the mechanism for accessing values, so that the type associated with a value causes the appropriate code to be invoked to access the value, decompress it, byte-swap it, and so on.
  • the specific mechanism for doing this is referred to herein as Dynamic Values.
  • Secondary Entities In addition to the primary entities manipulated by the Container Manager, there are several additional entities that play supporting roles in the Container Manager design. These entities are important to fully understand how The Container Manager works, but they do not significantly change the picture given above.
  • Type and property descriptions Each property associated with a value is actually a reference to a property description. Similarly, the type of a value is actually a reference to a type description. These type and property descriptions are objects, and their IDs are drawn from the same name-space as other object IDs.
  • IDs are never reused once they have been assigned, so even if an object is deleted, its ID will never be reassigned. These IDs are obviously essential to the functioning of the Container Manager format, but they do not appear directly in the API. The only points at which an application actually deals with anything corresponding to an ID is when it needs to store an object reference into a value, or find the object corresponding to a reference retrieved from a value. Even in this case, however, the API does not give the application direct access to an object ID, but only to a token that corresponds to the ID in the context of that particular value. This hiding of actual IDs permits the Container Manager to perform reference tracking.
  • Refnums In the API, types, properties, and objects are referred to using opaque reference numbers provided by the Container Manager. The refnums are much more convenient to use than IDs because they are unique within the session, while an ID would need to be used together with a container reference. Since they are opaque, they allow implementations of the API that support caching schemes in which only portions of the container metadata are in memory at any given time.
  • Refnums have no persistent meaning, so they cannot be stored in values as references to other values.
  • the tokens provided by the reference calls must always be used for persistent references.
  • Dynamic values As mentioned above in the discussion on "Types", a Container Manager value can be compressed, encrypted, byte-swapped, etc. during read/write. Furthermore, these transformations can be composed together to form a chain of transformations.
  • a value actually stored in a container is a description of how to find the. data, rather than the data itself.
  • Such descriptions can be as simple as references to files, or to objects in another container, or as complex as queries that cause data to be retrieved from a database.
  • the Container Manager allows a value to be stored as multiple segments stored at different locations in the container. These segments are not visible at the API, since the Container Manager routines concatenate them to create a single stream of bytes.
  • the Container Manager also takes advantage of value segments to represent insertions, deletions, and overwrites of contiguous bytes in a value. This allows the Container Manager to represent these operations directly in recording updates, rather than having to create a new copy of the value.
  • the Container Manager makes use of dynamically linked handlers supplied by the execution environment for two reasons: portability and extensibility.
  • the use of handlers means that the Container Manager library is almost trivially portable, since all the system dependencies are in the handlers.
  • the Container Manager library is also easily extensible, with the addition of newly written handlers, since the handler interfaces are carefully designed to provide cleanly encapsulated abstractions.
  • the Container Manager employs session handlers, container handlers and value handlers. Session handlers are global to the session as a whole. These include allocating and de-allocating memory, and reporting errors.
  • Container handlers perform all of the actual I/O to containers. These handlers map I/O to the underlying storage in a way that depends on the container type. Container handlers basically provide a stream I/O interface to the container storage.
  • Value handlers implement both I/O transformations and value indirection. These handlers are determined by the type of each value. New handlers to carry out new types of data transformations or support new types of indirect values can be written at any time. These handlers are invoked entirely by the library. The accessing application does not need to know that it is using handlers to access the value.
  • the Container Manager provides a very powerful mechanism for transforming values during I/O, and for following indirect references.
  • the Container Manager type mechanisms are probably best explained in terms of some usage examples.
  • CMWriteValueData Container Manager's Write Value Data procedure
  • the mechanisms described herein allow us to store a reference to the file in a value. When the value is used, an I/O redirection is set up, without the application being aware of it.
  • the Container Manager avoids this problem. It allows any number of different types of references, implemented by handlers.
  • the mechanisms described herein allow us to take two (or more) data transformations, such as compression and format conversion, and compose them together. Just as the application does not need to be aware of the underlying transformations, the individual transformations do not need to be aware of each other.
  • ° A value contains a query that is used to look information up in a database.
  • the "I/O redirection" provides access to a table retrieved from the database.
  • ° A value contains a file reference that is encrypted because it also holds the file-server password. A decryption stage is required before the I/O redirector can be applied to the file reference.
  • ° A value contains a query that is used to generate a file reference, which then becomes the basis for a second level of I/O redirection.
  • CMSetMetaHandler Container Manager operation This association is session-wide. Then the handler is bound to a particular type in a given container through the name of that type. This binding is done when the container is opened.
  • the value essentially has two types: the type visible to the application, which encodes the format of the data from the application's point of view, and the type used to find the appropriate handler for compression, I/O redirection, etc.
  • Base types can be added to and removed from any Container Manager type using the Container Manager CMAddBaseType() and CMRemoveBaseType() operations.
  • Base types are normal types, and themselves may have base types. This could be useful, for example, when the combination of file access and decompression is used in a variety of different contexts.
  • the two could be made base types of a new type, and then that new type could be used in various ways, including making it a base type of the "all of the above" type which adds format conversion.
  • Fig. 1 is a symbolic diagram of a tree having three types 102, 104 and 106.
  • a value may have a "compressed file type” 102 associated with it, but the compressed file type 102 has two base types: a "file access type” 104 and a “compression type” 106.
  • the complex "compressed file type" 102 can be created by first defining the compressed file type 102 object, than calling the Container Manager procedure to add a base type 104 to the type 102, and then by calling the procedure again to add the base type 106 to the compressed file type 102.
  • Fig. 2 illustrates a more complex type tree.
  • the type "format converted compressed file type” 202 has two base types, “compressed file type” 204 and “format conversion type” 206.
  • compressed file type 204 has two base types, “file access type” 206 and “compression” 208.
  • base types will always form a tree routed in the original type. If the same type is used as a base type in more than one place in the tree, the separate uses are treated as entirely separate types.
  • the tree is flattened into a linear "chain" of types. In the present embodiment, this is done by performing a depth-first, post-order walk on the tree.
  • the resulting sequence is file access, compression, then compressed file.
  • an application program calls the Container Manager routine to read data from a value (CMReadValueData) , and the value has the type, "compressed file”, then the Container Manager will first call the read handler for the compressed file type 102. The read handler for the compressed file type 102 will then (through another call to CMReadValueData) call the read handler for the compression type 106, which in turn calls
  • the read handler for file access type 104 obtains (through yet another call to CMReadValueData) the information which is stored in the container in the storage area allocated to the value which the application desires to read, and uses this information to access the actual data on, for example, a hard disk.
  • This data, obtained from the hard disk, is the return value of the read handler for file access type 104.
  • This data gets decompressed by the read handler for compression type 106, and then returned to the caller by the read handler for the compressed file type 102.
  • the chain formed by the flattened type tree is considered herein to have a "top” and a “bottom” type which are, respectively in Fig. 1, compressed file type 102 and file access type 104.
  • file access type 206 In the type tree of Fig. 2, the depth-first, post- order walk of the tree flattens it into the following linear chain: file access type 206, compression type 208, compressed file type 204, format conversion type 206, and format converted compressed file type 202.
  • Format converted compressed file type 202 is the "top” type on the chain, and "file access" type 206 is the
  • compressed file type 204 and format converted compressed file type 202 do not have handlers associated with them (let us assume) , they will not have any effect on the value.
  • the present embodiment assumes two design constraints. First, the application, and each handler, must always think that it is dealing with a "normal" value (i.e. one without redirection or transformations) ; that is, any redirection or transformation must be completely transparent to the caller. Second, in several cases we saw that handlers might have a non-trivial amount of state to manage.
  • Dynamic values are transient (i.e. not persistent) ; they are created just to provide an environment for the handlers, and they are never written to the container, saved in the container's Table Of Contents (TOC) , etc. However, they do have ref ums and from the "outside” (i.e. from any application code or handler code except the handler that "owns” them) they look exactly like normal values.
  • dynamic values have the same "value header" as real values, except that instead of pointing to storage locations which contain actual value data, they point to a vector of "handlers", one for each of a predefined set of "value operations”, to be called when a prior caller desires to use the value.
  • the Container Manager routines which implement these operations first check whether the specified value is real or dynamic. If real, then the routine simply operates on the real data. If dynamic, then the routine calls the handler which is associated with the specified value for the specified value operation. Thus for a given dynamic value, a handler can be provided to support each of the following value operations: • 1992 Apple Computer, Inc.
  • CMSize CMReadValueData(CMValue value, CMPtr buffer, CMCount offset, CMSize maxSize) ; void CMWriteValueData(CMValue value, CMPtr buffer, CMCount offset, CMSize size) ; void CMInsertValueData(CMValue value, CMPtr buffer, CMCount offset, CMSize size) ; void CMDeleteValueData(CMValue value, CMCount offset, CMSize size) ; void CMGetValuelnfo (CMValue value, CMContainer *container, CMObject *object, CMProperty *property, CMType *type, CMGeneration *generation) ; void CMSetValueType (CMValue value, CMType type) ; void CMSetValueGeneration (CMValue value, CMGeneration generation) ; void CMReleaseVal
  • routines or code portions which are not described herein are considered self- documenting either due to commenting or due to the use of self-documenting symbol names.
  • CMGetValueSizeO operation returns the size of the specified value.
  • Container Manager value routines none can be called for a particular value until one of the following preparatory routines are called for that value: CMNewValueO or CMUseValue() .
  • CMNewValueO or CMUseValue() As described below, if the desired value is a dynamic value, these routines set up the chains of dynamic value handlers needed to support the above routines.
  • CMNewValue() or CMUseValue() the pointer to the top-most dynamic value header is returned as the refNum. Then, whenever the user passes a refnum to an API value routine, it checks to see if the refNum is a dynamic value. If it is, it initiates the call to the corresponding value handler. That may cause a search up the base value chain to look for an "inherited" value routine. In the limit, we end up using the original API value routine if no handler is supplied and we reach the "real" value in the chain. Thus the handler must be semantically identical to the corresponding API call.
  • a dynamic value is created when a value is created by CMNewValue() or used by CMUseValue() , and the following two conditions occur: 1.
  • the type of the value, or any of its base types, have metahandlers which have been registered by the Container Manager CMSetMetaHandler() routine in a session-wide metahandler symbol table (CMSetMetaHandler() is usually called when a container is first opened) ; and
  • the metahandlers support a Use Value Handler, and in addition for CMNewValue() , a New Value Handler.
  • the New Value Handlers are used to save initialization data for the Use Value Handlers.
  • the Use Value Handlers are called to set up and return a refCon. Another metahandler address is also returned. This is used to get the address of the value operation handlers corresponding to the standard API CM...value routines mentioned above.
  • CMNewValue() or CMUseValue0 is almost done, a check is made on the value's type, and all of its base types (if any) to see if it has an associated registered metahandler. If it does it is called with a Use value operation type to see if a Use Value
  • Handler exists for the type. If it does, we spawn the dynamic value.
  • the spawning is done by calling the Use Value Handler.
  • the Use Value Handler is expected to set up a refCon to pass among the value handlers and a pointer to another metahandler. These are returned to CMNewValue() or CMUseValue() which does the actual creation of the dynamic value.
  • CMNewValue() or CMUseValue() which does the actual creation of the dynamic value.
  • the extensions are initialized, the metahandler pointer and refCon are saved.
  • the pointer to the created dynamic value header is what CMNewValue() or CMUseValue() returns to the user as the refNum.
  • the Container Manager supports layering of dynamic values .
  • the best way to describe layering is in terms of C++ . Say we have the following class types (using a somewhat abbreviated notation) :
  • T is to be a registered type with other registered types as base types (classes) . All type obj ects are created using the standard API call CMRegisterType O . Base types can be added to a type by using CMAddBaseType ( ) . This defines a form of inheritance like the C++ classes .
  • Type T would be registered with its base types as follows : e 1992 Apple Computer, Inc.
  • layerl CMRegisterType (container, "Layerl")
  • layer2 CMRegisterType (container, "Layer2” )
  • t CMRegisterType (container, "T")
  • CMAddBaseType t, layerl
  • CMAddBaseType t, layer2
  • t CMAddBaseType
  • CMRegisterType (container, "T") .
  • the CMAddBaseType() calls add the base types. These are recorded as the object ID's for each base type in the order created as separate value segments for a special "base type” property belonging to the type object.
  • CMNewValue() or CMUseValue0 spawn dynamic values if the original type or any of its base types have an associated Use Value Handler. Assume that was done for "T” in the above example. What happens is that CMNewValue() or CMUseValue() will look at its type object (t here) to see if the base type property is present. If it is, it will follow each type "down" to leaf types using a depth-first search.
  • type Tl could have base types T2 and T3.
  • Tl could just have base type T2 and T2 have T3 as its base type!
  • each class could have its own data along with its own constructor.
  • the T class has a constructor that calls the constructors of all of its base classes. We can carry this analogy with the Container Manager just so far. Here is where it starts to break down.
  • C++ class types are declared statically. A C++ compiler can see all the base classes and can tell what data gets inherited and who goes with what class. In the Container Manager, all "classes" (i.e., our type objects) are created dynamically. So the problem is we need some way to tell what data "belongs" to what type.
  • the solution is yet another special handler, which returns a format specification called metadata.
  • the handler is the Metadata Handler whose address is determined by the Container Manager from the same metahandler that returns the New Value and Use Value
  • Metadata is very similar to C-language printf() format descriptions, and is used for similar purposes.
  • the data is created when a new value is created, i.e., with a CMNewValue() call.
  • the data will be saved in the container, so
  • CMUseValue() uses the type format descriptions to extract the data for each dynamic value layer.
  • CMNewValue() is defined with the following prototype:
  • CMValue CMNewValue (CMOb ect obj ect , CMProperty property, CMType type , . . . ) ;
  • the " . . . " is an arbitrary number of parameters used to create the data . Metadata, accessed from the
  • Metadata Handler tells CMNewValue ( ) how to interpret the parameters just like a printf ( ) format tells printf ( ) how to use its arguments .
  • the order of the parameters is important. Because base types are done with a depth-first search through the types down to their leaves, the CMNewValue() " ... " parameters must be ordered with the parameters for the first type in the chain occurring first in the parameter list. Note what's happening here is that the user is supplying all the constructor data just like T constructor class example above.
  • CMNewValue() calls the Metadata Handler, it uses the metadata to extract the next set of CMNewValue() " ... " parameters. CMNewValue() then passes the parameters along in the form of a data packet to the New Value Handler. The New Value Handler is then expected to use this data, which it can extract with the Container Manager CMScanDataPacke () routine. Once it has the data, it can compute initialization values to write to its base value. It is the data written by the New Value Handler that the Use Value Handler will read to create its refCon.
  • CMNewValue() Only CMNewValue() does this.
  • the New Value Handler is only for new values, but the Use Value Handler is used by both CMNewValue() and CMUseValue() .
  • the data is written to the "real" value. Now if you layer another dynamic value on to this, the next chunk of data is written using that layer's base value and hence its handlers.
  • the second layer will thus use the first layer's handlers. That may or may not end up writing to the "real" value depending on the kind of layer it is. If it's some sort of I/O redirection handler (i.e., it reads and writes somewhere else) , the second layer data will probably not go to the "real" value.
  • the Use Value Handler is called both for CMNewValue() and CMUseValue() .
  • the Use Value Handler reads the data from its base value to create its refCon. If the user comes back the next day and does a CMUseValue() , only the Use Value Handler is called. Again it reads the data from its base value to construct the refCon and we're back as we were before in the CMNewValue() case.
  • Metadata and New Value Handlers will always be executed with a Container Manager running on some particular hardware (obviously) .
  • the data packet built from the CMNewValue() " ... " parameters is stored as a function of the hardware implementation on which it is run (i.e., whatever the sizes are for bytes, words, longs, etc.). How it is stored is a function of the metadata returned from the Metadata Handler.
  • the New Value Handler has a contract with both the Container Manager and the Metadata Handler on the meaning of the parameter data.
  • the handler writer must keep this in mind. Specifically, the Use Value Handler must know the attributes (bytes size, big/little endian, etc.) of the data written out by the New Value Handler so it knows how to use that info. In other words, the Use
  • Value Handler has a (separate) "contract" with its own New Value Handler on the meaning of the data written to the base value.
  • value handlers for any one layer must take into account the size of its own data when manipulating additional data created by the handlers for CMReadValueData() , CMWriteValueDataO , etc. This simply offsets the write and read value data operations by the proper amount. Remember all operations are on their base values. So if a New
  • the Metadata Handler is only needed for
  • CMMetaData metaData_Handler (CMType type); where "type" is the (base) type layer whose metadata is to be defined.
  • the Metadata Handler simply returns a C string containing the metadata using the format descriptions described above.
  • the type is passed as a convenience. It may or may not be needed. It is possible for a type object to contain other data for other properties. Types, after all, are ordinary objects. There is nothing prohibiting the creation of additional properties and their values. This fact could be used to add additional (static and private) information to a type to be used elsewhere. For example, the type could contain a compression dictionary.
  • CMType type, CMDataPacket dataPacket CMDataPacket dataPacket
  • the type is passed again as a convenience just as in the Metadata Handler. It can also be used here to pass to CMScanDataPacket() to extract the dataPacket back into variables that exactly correspond to that portion of the CMNewValue() "" parameters that correspond to the type. It is not required, however that CMScanDataPacket() be used.
  • the Use Value Handler is called for both the
  • CMUseValue() and CMNewValue() cases If its companion New Value Handler wrote data to its base value, the Use Value Handler will probably read the data to create its refCon. The refCon will be passed to all value handlers. The Use Value Handler returns its refCon along with another metahandler address that is used to get the value handler addresses. These are used to create the dynamic value.
  • the baseValue and type are identical to the ones passed to the New Value Handler.
  • the type may or may not be needed in the Use Value Handler. Like the Use Value Handler, it could be used to supply additional information from other properties.
  • the Use Value Handler will read data from its base value to construct its refCon.
  • the refCon is then returned along with a pointer to another metahandler that is used by the Container Manager to get the addresses of the value operations.
  • both the New Value and Use Value Handlers return a CMBoolean to indicate success or failure. Failure means (or it is assumed) that the handlers reported some kind of error condition or failure. As documented, error reporters are not supposed to return. But in case they do, we use the CMBoolean to know what happened. It should return 0 to indicate failure and non-zero for success.
  • the value operation handler routines can do a Container Manager CMGetValueRefCon() call on the value which was passed, in order to get at the refCon set up by the Use Value Handler. This provides a communication path among the value handlers. Further, the value handler should usually do its operations in terms of their base value, which can be accessed using the Container Manager CMGetBaseValue() call. The release handler is an exception to this rule.
  • a set of one or more dynamic value layers are spawned as a result of a single CMUseValue() or CMNewValue0. The layers result from the specified type having base types.
  • the reason the Container Manager is so insistent on forcing a release for every use of a dynamic value is mainly to enforce cleanup.
  • Most value operation handlers will, at a minimum, use a refCon that was memory allocated by the Use Value Handler. Release handlers are responsible for freeing that memory. In another example, if any files were open by the Use Value Handler, the releases would close those files.
  • a trivial value handler might merely get its base value and use it to recursively call the Container Manager value procedure which initially invoked the handler to do its operation (again except for the release handler) .
  • the value operation could be omitted entirely by having the metahandler for the value's type return NULL when asked for that value operation.
  • the Container Manager uses that as the signal to search up the dynamic value inheritance chain to find the first metahandler that does define the operation. In the limit, it will end up using the original "real" value.
  • Value I/O operations are basically stream operations. That is, you read or write information linearly from a specified offset.
  • the Container Manager provides insert and delete value data API calls CMInsertValueData() and CMDeleteValueData() .
  • Insert and delete can cause problems because base types may want to do certain transformations on their data that depend on what has occurred previously in that stream of data. For example, encryption using a cyclic key, or compression generally cannot be done simply by looking at a chunk of data starting at some random offset.
  • a cyclic key encryption can be deterministic if you can always determine where to start in the key as a function of offset. But you can see that inserts and deletes will change the offsets of following data. You would not know where to start in the key. What all this means is that certain data transformations only make sense if you are willing to refuse to support the insert/delete operations. Basically only data transformations that are position independent can be supported with the full set of value operations.
  • each value knows its own property, type, and data location.
  • Every accessible byte is part of a value of some object.
  • Even the metadata that defines the structure of the container, and the label of the container, are values of an object.
  • Type descriptions are objects, property descriptions are objects, etc. We will exploit this fact in various ways below.
  • Objects have persistent IDs. Every Container Manager object is designated by a persistent ID which is unique within the scope of its container. Objects may have additional IDs and/or names that are unique in larger scopes, but this is not required.
  • Object IDs provide a compact, convenient way to refer to an object.
  • An efficient mechanism is provided to get from any object ID to information about that object.
  • the Container Manager embodiment described herein is designed to support very flexible layouts, such as multi-media interleaving, and internal tags would be inconvenient and even harmful for this.
  • Objects consist entirely of values.
  • an object has no value as such.
  • Each object has properties, and each property has values.
  • the Container Manager format provides no information about an object except its ID.
  • an object can have a single value; in that case the value of the property "is” the value of the object.
  • Container Manager format can easily accommodate this "normal” case.
  • Each value knows its own property, type, and data location.
  • Each value consists of a property ID (or role) , a type (or format) , and data.
  • a graphic object might have a value that describes its "clip mask”; the property ID would specify what role the value plays, but not what format it is stored in.
  • the type would define how the mask is represented: rectangle, bit mask, path, Mac region, PostScript path, etc.
  • the data would be the representation of the mask itself.
  • At the level of the container standard itself there are no restrictions on what values an object can have, how many values it can have, etc.
  • the data of a value is an uninterrupted sequence of bytes which may be from 0 to 2°* bytes long, although these limits may vary in a different embodiment.
  • This sequence of bytes has no format requirements or restrictions.
  • the byte sequences representing the data for various values of various objects can be placed anywhere in the container. Thus there are no strong data format requirements for the container as a whole, although it must contain the metadata to define its structure somewhere.
  • the Container Manager format allows a single object to have multiple values with the same property ID. All the values must have different types. Such multiple values are intended to be used as alternative representations of the same information.
  • the table of contents can contain multiple entries for a single value. These entries mean that the value represented by the entry is actually stored in multiple segments. This permits values to be broken up into chunks and interleaved, without creating problems for applications that view them as single values. In addition, it allows an application to build TOC entries that "synthesize" a value out of separate parts, as is required in retrofitting some file formats.
  • a property can have multiple values, and one or more of the values can be composed of multiple segments.
  • the Container Manager embodiment described herein supports identifiers defined in ISO 9070. These are names that begin with a naming authority (assigned to a system vendor or an application vendor) , and then continue with a series of more and more specific segments, until they end in a specific type or property name.
  • names generated according to ISO 9070 are both unique and self-documenting. Individual users can generate unique names using this approach. For example, a user developing educational stackware might want to create properties, or even types, to use in scripts. The stackware development environment could automatically generate a unique prefix for the user, based on the serial number of the development tool, and then append the user generated property or type name.
  • type and property descriptions are objects as well. Since types and properties need to have globally unique names, so that applications can recognize them, type and property descriptions will typically have a globally unique name value. In many cases, this may be the only contents of a type or property description object.
  • Base types allow inheritance of semantics from other existing types for composition into more complex types. Such base type information is intended to include uses such as encryption, compression, I/O redirection, etc. Encoding information.
  • a type definition may indicate the default encoding of its values. Typically, all of the values with the same basic format in a container will have the same encoding, so this new subtype can be shared by all these values.
  • the encoding can be indicated directly in the type description for the format. If values with the same basic format but multiple encodings exist in the same container, a more complex solution is required.
  • a subtype may be created just to record the encoding. Such a subtype will typically not need a globally unique name. Compression information.
  • the type can record compression parameters, the codebook used (if applicable) , etc.
  • a type that exists just to record compression information typically will not need a globally unique name. It will refer to the underlying format type and the compression technique, both of which will have globally unique names.
  • a template or grammar for a type is indicating the type description for the format.
  • a type could have properties that provide method definitions. Providing methods in the container would allow fully encapsulated use of values. Given the importance of the table of contents (TOC) to the structure of a Container Manager container, the first thing an application should to do when accessing a container is find the table of contents. A standard mechanism for locating the TOC makes this easy.
  • TOC table of contents
  • the standard Container Manager format to have the label at the end of the container. This makes backward compatibility easy.
  • a copy of the label can optionally be placed at the beginning of the container.
  • This allows easy recognition of a Container Manager container if it is coming in from a stream-oriented I/O mechanism. In exotic cases, this approach may not work. For example, transmission streams may depend on framing information, so that the stream can be read starting anywhere.
  • the Container Manager supports such exotic cases by allowing the TOC to be found in non-standard ways when necessary. Finding the TOC is the responsibility of the I/O handler, which will depend on the system and the container type. The I/O handler can adopt a non-standard approach if required. D. Format Definition
  • Container Label Format As mentioned above, a standard Container Manager container has a label at the end. Exotic containers may have labels in other locations. However, the label will always provide the same information. The label contains the smallest possible amount of information, because it has to be the most stable part of the standard. It consists of seven fields:
  • Manager container It must be chosen to be unlikely to occur as the initial or final bytes in any existing file.
  • the minor version number is set to zero.
  • the container label format should not be changed in future versions to add additional information. Instead, it can be added as values of the distinguished object #1, which represents the container itself.
  • the TOC as represented in persistent storage consists of a sequence of entries. Each entry corresponds to a single segment of a value of some object.
  • TOC entries are sorted by object ID, and within a single object they are sorted by property ID. Thus all the entries for a given object are contiguous in the TOC, and all the entries for a given property are contiguous within the object. Also, an object can be found within the TOC, or a property can be found within an object, by binary search. If an object ID or a property is not defined, we can quickly determine that it is not defined.
  • each object in the container is represented in the TOC by a sequence of entries, one for each segment of a value of the object.
  • the Container Manager has no way to represent an object without at least one value.
  • each TOC entry defines a value, we know immediately that it must indicate the object ID, property, type, and data of the value. In addition, it indicates the generation number of the value -in order to allow applications to check consistency between different properties.
  • the TOC entry may also contain bookkeeping information for the value.
  • the object ID field in a TOC entry identifies the object that this value is part of.
  • the property field identifies the value's property by the object ID of a property description.
  • the type field indicates the value's type by the object ID of a type description.
  • the entry indicates the value's data by the offset and length of the sequence of bytes representing the value.
  • the offset is a 0 origin byte offset from the beginning of the container.
  • the length is a byte count, and may be 0, indicating a 0 length value. If the data is four bytes long or less, it may be included directly in the TOC as an immediate value, rather than being referenced by offset and length.
  • a TOC entry could simply be defined by putting all the information above in a record. This record would be relatively large, however, and would be very likely to contain redundant and/or unused information.
  • the presently described Container Manager embodiment therefore uses an approach in which each TOC entry contains only the information that is new or different compared with the previous TOC entry. This results in a TOC that is organized as a stream rather than a table, and is parsed as it is read in.
  • the TOC stream format will be described at two levels: first, the logical structure of the stream, and second the actual physical representation.
  • Logical Stream Structure The following grammar describes the set of stream elements and how they can be combined. Terminals are given in upper case; they are described in detail in the list following the grammar. '*' means repeated 0 or more times. '+' means repeated 1 or more times. Square brackets mean present 0 or 1 time .
  • LONG -REFERENCE continued- data : : CTD-IMMEDIATE
  • Object-ID Property-ID.
  • Type-ID Ref-obj-ID.
  • object ID is a 32-bit persistent indentifier.
  • the namespace for object IDs is local to the container. Object IDs are assigned sequentially as objects are allocated, and the last ID allocated in the container is maintain as a property of object # 1.
  • Gen-num This is a 32-bit generation number of the value. It can be set by default from the container, or explicitly for a given value. Generation numbers are intended to be assigned sequentially to each consistent update of the container. The generation number of the TOC value in object # 1 is the default generation number of the container.
  • Ctd-immediate These are 0, 1, 2, 3 or 4 byte data values. They represent the actual data of the value. Except for the 0 byte case, they are all packaged in a 4 byte field.
  • the ctd-immediate is exactly the same as the immediate, except that it is flagged to indicate that it is a value segment other than the first.
  • Short-ref. Ctd-short-ref. These are references to the data for a value segment using a 32-bit offset and a 32-bit length.
  • the ctd-short-ref is exactly the same as the short-ref, except that it is flagged to indicate that it is a value segment other than the first.
  • Ctd-long-ref These are references to the data for a value segment using a 64-bit offset and a 32-bit length.
  • the ctd-long-ref is exactly the same as the long-ref, except that it is flagged to indicate that it is a value segment other than the first.
  • the TOC Physical Stream Structure. To make the TOC readable without starting from the beginning, it is broken up into fixed size blocks. The block size in any given container is specified in its label, and can range from 2 10 bytes to 2 20 bytes. At the beginning of each block, there is assumed to be no previous information, and all information is written to the TOC. Thus a block can be read in and interpreted in isolation.
  • the stream of elements in the TOC is annotated with format information to make it possible to parse it. This annotation is done by placing one-byte codes at each point where the stream is ambiguous. Each distinct code is followed by a specific sequence of fields.
  • Fields immediate data (1 byte) stored in a 4 byte field .
  • ReferenceListID 15U Fields object ID Meaning: The ID of a bookkeeping object associated with this value. Occurs before any data references. Omitted if the value does not have a bookkeeping object.
  • Every TOC contains a standard object that is used to describe the TOC itself. In particular, it is object ID 1, so the TOC entries for the TOC itself always come at the beginning of the TOC. (Object ID 0 is never used) .
  • Additional TOC properties can be useful. For example, an index to speed access to the entries by ID could be attached to the TOC through another property. Potentially several such indexes, using different formats, could be attached.
  • Object IDs other than IDs of standard objects are generated by sequentially incrementing a counter from 0x00010000. Object IDs are never reused in later generations of a container if an object is deleted. The last ID number generated is kept as a property of object # 1 to allow generating further IDs without reuse.
  • This section serves two purposes. First, it provides several examples of how the container format can be used. Second, the more advanced examples show how to use the built-in extensibility of the Container Manager to address additional needs, by defining the required properties and types. Thus these advanced examples provide examples of how to extend the basic format.
  • Embedded Stream Files Suppose we have a container that simply contains two objects, with references from the first stream to the second.
  • the first object could consist of a stream of rich text
  • the second object could be an image that is logically embedded in the text.
  • the image has two alternate representations in different formats, each of which is a stream. This example exercises much of the basic machinery of the Container Manager.
  • Fig. 9 illustrates relevant parts of the TOC.
  • Each TOC entry is labeled with a lowercase letter to make it easier to refer to it; the letters are not actually part of the TOC contents.
  • the rich text type is a standard object that does not need to be in the TOC, but that an entry is provided anyway.
  • the types of the image streams are not standard objects, so that they must be provided.
  • Entry (a) is the only entry in the rich text type description.
  • the local ID of the rich text type is 268. Note that since this is less than 2 16 we know that it is a standard object. Thus this TOC entry is not really required, but it is provided for illustrative purposes.
  • the value of the entry is simply the Globally Unique Name that identifies the type.
  • the text of the Globally Unique Name is listed below the TOC entries, labeled "vo.l” (for "value offset 1").
  • the value of vl.l is the length of the string, including the null at the end.
  • the property ID (18) means "TypeName”; the type ID (22) means “GloballyUniqueName” .
  • the description object is immutable, so it has generation 0.
  • Entries (b) and (c) are very similar. Each one is a type description object, as indicated by the number 18 in the Property ID field. Because their Object IDs are greater than 2 16 , we know that they are not standard objects, and thus they are actually required in the TOC (this is mainly for illustrative reasons) .
  • the property and type IDs are the same as in entry (a) .
  • Entry (d) is the single entry for the RTXT object. Its property ID (38) means "Primary Value”.
  • Its type ID (268) is the same as the Object ID of entry (a) , indicating that its value is an RTXT value. Its offset (vo.4) points to the actual rich text data in the container, which contains an embedded ID (723655) .
  • Its generation number (6) means that its entry and value were last modified in the sixth generation (copy) of this container.
  • Entries (e) and (f) are also very similar, but in a different way than (b) and (c) .
  • both entries have the same object ID, indicating that they are both properties of the same object.
  • both entries have the same property ID (38) indicating that they are both primary values of the object.
  • they are alternative representations of the primary value. Looking at their type IDs, we can see that one value is a PICT (400348) and the other is a Windows metafile (400563) .
  • looking at the generation numbers we can see that both have been updated more recently than the text stream but that the Windows metafile was updated most recently.
  • the TOC Itself. A somewhat more recursive example is the description of the TOC by itself, illustrated in Fig. 10. Every TOC actually contains such a self-description, so reader code actually has to deal with some of this structure, but it will typically not be visible at the application level.
  • the portion of the TOC illustrated in Fig. 10 is a portion of the same TOC as is illustrated in Fig. 9, so we will see some relationships between the content objects described above, and the properties of the TOC itself.
  • Entry (a) is an actual reference to the TOC as a value.
  • the offset field (vo.l) will contain the offset of the TOC in the container (that is, the offset of the beginning of this entry) .
  • the length field (vl.l) will contain the length of the TOC in bytes. The property indicates that this is the primary value for the TOC object, and the type indicates that it is the normal top-level TOC format.
  • Entry (b) is also a property of the TOC object. It contains a value one larger than the last ID used. Note in this case that it is somewhat higher than highest ID that appeared in the previous example. This occurs if more objects were created, and then deleted. The seed prevents reuse of those IDs, in case some of them have been remembered (either within this container, or in external references to objects in this container) . This prevents accidental aliasing.
  • entry (b) Since the value of entry (b) is only four bytes long, it can be stored as an immediate value.
  • Entry (c) indicates the root object of the content in the container. Note that it contains the ID of the RTXT stream in the previous example, since that is the root content object. The ID is also an immediate value. Note that the generation numbers of the TOC entries correspond to generation numbers of the relevant content objects. The generation of the TOC value itself is the generation of its most recent object. The generation of the root reference, however, indicates when the root was set to that object. It is the same generation as the object itself, so probably that object has been the root since it was created.
  • Entry (c) introduces the SuperType property, and it is used in entry (e) .
  • Entry (e) effectively says that a GloballyUniqueName is spelled in printable 7 bit ASCII. Note that it uses an immediate reference to the supertype.
  • Entry (f) defines the local ID type we have been using, and entry (g) defines printable 7 bit ASCII.
  • Entry (h) introduces a very general property that allows us to attach data format descriptions to objects. It is intended primarily for use in type descriptions. The interpretation of the data format description will depend on its type.
  • Entries (i) and (j) are parts of the description of a stream oriented data definition standard referred to herein as COBJ. (i) describes the type of actual COBJ values, while (j) describes the type of COBJ type dictionaries, which define the format of particular streams.
  • entries (k) and (1) and (m) describe RTXT (ID 268, as we saw in the first example) .
  • RTXT ID 268, as we saw in the first example
  • COBJ type dictionary we have the COBJ type dictionary, and the reference to COBJ as the supertype of RTXT.
  • the Container Manager of the present embodiment is implemented entirely as software instructions and data, to be executed on general purpose computer hardware. No specific hardware platform is required. For completeness, however, Fig. 3 illustrates a typical hardware computer system platform on which the Container Manager might run.
  • the computer system of Fig. 3 comprises a CPU 302, main memory 304, which may be volatile, an I/O subsystem 306, and a display 308, all coupled to a CPU bus 310.
  • the I/O subsystem 306 communicates with peripheral devices including nonvolatile storage devices, such as a disk 312.
  • an application program together with at least those Container Manager routines which are used by the application program, are retrieved from the disk 312 into main memory 304 for execution by the CPU 302.
  • All of the data structures described below and specified as being "in-memory" data structures are also created in the main memory 304, in the sense that memory space is allocated for the information to be contained in the data structures, and all of the software routines which read or write to such memory locations do so according to some known definition of fields.
  • pointers are written to certain of the allocated main memory storage areas, which pointers refer to other structures in memory in a known manner which is defined by the data structure.
  • a data structure as used herein, is an abstract description of the organization of data in main memory 304; when the data structure is "created” in main memory 304, this description is imposed on regions of main memory 304 so that specific items of information can be found and/or interpreted according to the data structure.
  • pointer is a well-known shorthand for physical signals which are stored as charge, current or voltage levels in the memory cells which implement the main memory 304. These signals "identify” an item of information memory 304 in the sense that, when applied to the memory 304 as an address (either directly or via an address translation mechanism) , they cause the memory 304 to read out data from the item pointed to or identified by such signals. Data structures in nonvolatile storage are similar.
  • Fig. 4 is an overall block diagram of the major data structures which are created in main memory 304 during the pendency of a "session".
  • Data block 402 is a "session global data” block containing all of the session-wide data for a given Container Manager session. There is no static global data in the code. All open containers are tied to the session on a doubly linked list whose head and tail pointers are contained in the session global data.
  • the root of a metahandler table 404 (described below) is kept here as well along with the session handler pointers for malloc, free, and error reporting.
  • Containers are identified in the session global data block 402 by a pointer to the container's
  • Container Control Block Each time a container is opened with CMOpenContainer() or CMOpenNewContainer() , a new container control block 406 is created. The pointer to the container control block 406 is what is returned to the user as a container "refNum" (reference number) .
  • the four shown main data structures pointed to from the container control block 406 are the list 408 of deleted values (TOCValueHdr(s) ) , a list 410 of embedded container pointers, the global name symbol table 412, and a pointer to a TOC 414 control block 416.
  • the table of contents (TOC) 414 is the set of related data structures that organize objects by object IDs.
  • the requirement that objects be kept in sorted order (sorted by object ID) puts certain constraints on its organization. Further, the fact that the IDs are generated sequentially in new containers also must taken into account (for example, binary trees would not be a good choice in such a situation) .
  • the method used in the Container Manager of the embodiment described herein is an index table algorithm. It is somewhat memory intensive but allows objects to be accessed linearly in time and keeps the objects in the required sorted order.
  • the index tables correspond to "powers" of a chosen index table size. For example, if the table size is 256 and the maximum ID is OxFFFFFFFF (32-bits unsigned on MC68XXX machines) the access depth will be 4 for any ID.
  • index tables each corresponding to the indices 00 to OxFF, i.e., mod the size of an index table. Each index is used to index into its corresponding index table.
  • the first table would have its 0x00'th entry pointing to the next table. That next table would have its 0x12 entry pointing to the third table.
  • the third table would have its 0x34 entry pointing to the last table.
  • the 0x56 entry in the last table would point to the actual object with ID 0x00123456.
  • routines that maintain this data structure are generalized to support any size table (within limits) . There are trade-offs between table size and access time, which are apparent to a person of ordinary skill.
  • TOC control block 416 pointed to from the container control block.
  • the TOC control block 416 is to TOC object access, what the container control block 406 is to the entire container.
  • the TOC control block points to another data structure not shown here to keep the drawings simple. It is a set of three head/tail list pointers to doubly linked lists of the TOCObject(s) .
  • the three lists are for all the objects, property objects, and type objects in the container. Thus the type and property lists are subsets of the object list.
  • These lists are only just for the CMGetNextxxx() routines. These lists are kept as part of the TOC since, there can be only one TOC and one of these list sets. Note that for updating, there can be multiple containers using the same TOC, so putting these data structures here is the most convenient way to deal with them during updating.
  • a TOC since there can be multiple users of a TOC, a TOC requires a "use count" to prevent premature release of the TOC.
  • the lowest level of the TOC index tables 418 contain pointers to the container objects themselves instead of to other index tables. These objects are TOCObjects 420.
  • the TOC entries for an object are linked off of their TOCObject.
  • TOCObjects are returned to the user as object refNums (CMObject, CMType, and CMProperty) .
  • the properties, TOCProperties 422, for an TOCObject are contained on a doubly linked list off the TOCObject.
  • the values for each property are on a doubly linked list of value headers, TOCValueHdrs 424, off of each TOCProperty.
  • a specific real (as opposed to dynamic) value such as one of the TOCValues 426, is linked to its TOCValueHdr.
  • a "header" for an item or items of information is a logical collection of information which applies generally to the item or items.
  • the header need not be physically located in a contiguous region of memory, nor must it be contiguous with any of the items themselves.
  • Each TOCValue 426 can be either immediate, non-immediate, or a global name.
  • Immediate values contain 1, 2, or 4-byte value data encoded directly in the entry.
  • Non-immediates contain a container offset to the value data and its length. Non-immediates can also represent dynamic values (discussed below) .
  • Global name values, such as 428 are pointers to global name symbol table entries (discussed shortly) and once the value data has been written to the container, the container offset.
  • each TOCValueHdr has a pointer back to its TOCProperty.
  • each TOCProperty has a pointer back to its TOCObject. Not shown is a pointer from each TOCObject and each TOCValueHdr back to its container control block. The result is that anything can be accessed from almost anywhere and in any direction.
  • CMRegisterType0 or CMRegisterProperty() is done, a check must be made to see if the specified global name already exists. For this, a simple binary tree symbol table 412 is used.
  • a global name is itself a type or property object value, there is also a pointer from a TOCValue to the name in the global name symbol table.
  • Each global name symbol table is unique to its container. Hence the container control block has the root to its tree of global names.
  • TOC ID #1 of every container contains properties about that container as well as the TOC itself.
  • Tables I and II summarize all the TOC #1 properties. Table I is in terms of property and value ID numbers and Table II is in terms of the property and value names. In all cases the values are shown as the macros defined for these entities. The left column in the tables is the actual property number subscripted with an "r" or "o" indicating whether it is required or optional. Required means that the TOC #1 object must have that property to be considered valid.
  • CM_StdObjID_TOC_Del CM_StdObjID_UpdatesDat object/property deletion lOo eteList a updates
  • Table II is similar to Table I, but in terms of property and type names, which is the way a user would access TOC #1 properties (using CMRegisterType() and CMRegisterProperty() ) :
  • the starting object ID seed for the container It is the next available ID seed at the time the container is closed.
  • CMTOCSeedGlobalName The starting object ID seed for the container as a constant .
  • CMTOCSeedGlobalName When updating, this is the initial seed for the updating container as derived from the CMTOCSeedGlobalName of the target .
  • CMTOCMmSeedGlobalName to CMTOCSeedGlobalName define the range of obj ect IDs in a container at close time .
  • the private TOC Defines the private TOC as an obj ect and must agree with the size defined in the container label . This will be the entire TOC unless there is a non -private (updating) portion. For updating, the non-private TOC is defined by the
  • CM_ULONG value representing all deleted value data space created by CMDeleteValueData ( ) .
  • the type is always CMTargetContainerName and the value's offset/size are to the appended target (including the label) .
  • This is similar to embedded containers.
  • the type is not CMTargetContainerName. Rather it's a dynamic value type as determined by a "return target type” handler. A CMUseValue() on such a value will spawn a dynamic value whose handlers will access the separate target.
  • the object and property deletion update instructions for an updating container when applied to a target are described.
  • CMReadValueData() to a value that belongs to the parent container.
  • the handlers keep track of offsets within the value that it is treating as a container. The effect is to write or read a value in the parent container as if that value was itself a container. All the data for the parent container value is created as a container, complete with its own TOC and label.
  • the offsets in the embedded container's TOC are relative to the start of the parent container value, offset 0, just as in the non-embedded case. This means that a parent container value could be read to copy the embedded container as is.
  • a container can have any number of embedded containers open at the same time. Each of those could also, and so on. The result is essentially a tree of open embedded containers. Since the data for a parent container value is its embedded container, then if there are any more deeply embedded containers, they would also be part of the parent container's value. This gets very confusing if you try to think of it more than two levels deep.
  • the embedded container list 410 pointed to from the container control block is used so that a parent container can keep track of all of its immediate descendants. Each entry in the list is simply a pointer to a descendent container control block. At open time an entry for the embedded container is created in its parent container's embedded container list. At close time CMCloseContainer() will go through its list of "embedded containers (i.e., the list of its immediate descendants) and recursively call CMCloseContainer() to close those. The net result is the desired one of closing all the descendants of the parent container in the tree of embedded containers. An embedded container being closed is responsible for removing itself from its parent's embedded container list so that it won't be "seen” again if a parent further up the tree is closed subsequently.
  • an application program can perform the functionality of embedded containers by using dynamic values.
  • the Container Manager not being aware of this use of dynamic values, will not maintain the embedded containers list for it.
  • the application program will be responsible for explicitly "closing" each dynamic value corresponding to an embedded container using CMReleaseValueO.
  • CMDeleteObject() When CMDeleteObject() is called, an object is to be deleted.
  • CMDeleteValue() a value for a property of an object is to be deleted.
  • refNums for objects CMObject, CMProperty, and CMValue
  • CMValue are pointers to TOCObjects.
  • Values CMValue
  • the solution adopted is to put all deleted objects and values on a list of deleted items associated with the container.
  • TOCObjects are TOCObjects
  • value refNums are TOCValueHdrs
  • TOCProperties and TOCValues can be freed.
  • the TOCObjects and TOCValues are flagged as "deleted”. Whenever any object or value is passed to the API it is checked for the flag. It is an error to use a deleted item.
  • CMSetMetaHandler() is called by the user to record metahandler/type name associations. These are maintained in binary tree symbol table 404. The root of this tree is a "global" in the session data. It is not tied to any one container. When a container is opened, a type name is passed. This is used to look it up in the metahandler symbol table. This yields a metahandler function address which in turn is used to get actual handler routine addresses.
  • the following C-language struct defines the layout of all in-memory TOCObjects.
  • the objects are accessed by their object ID. o 1992 Apple Computer, Inc. struct TOCObject ⁇ /* Layout of a TOC object: */
  • CMObjectID objectID /* the object's ID (keep first for debugging) */ ListHdr property-,ist; /* list of object property entries */ struct Container *container; /* ptr to "owning" container control block */ struct TOCObject *nextObject; /* chain to next object by increasing ID */ struct TOCObject *prevObject; /* chain to previous object by decreasing ID */ struct TOCObject *nextTypeProperty; /* chain of next type/property by increasing ID*/ struct TOCObject *prevTypeProperty; /* chain of previous type/property by deer.
  • each of the properties 502, 504 and 542 is defined as follows: ⁇ 1992 Apple Computer, Inc. struct TOCProperty ⁇ /* Layout of a TOC object property: */ ListLinks propertyLinks; /* links to next/prev property (must be first) */
  • TOCObjectPtr theObject; /* ptr to "owning" object */
  • CMNewValue() and CMUseValue() create a dynamic value chain for each type that has a "UseValue” and a "NewValue” handler.
  • Fig. 5 illustrates the structure of in-memory objects which are created by the dynamic value mechanism.
  • one of the TOCObjects 420 (Fig. 4) has a series of properties 502, 504, and so on (corresponding to 422 in Fig. 4) . It is assumed further that property 502 has two values associated with it, indicated by value headers 506 and 508 (424 in Fig. 4) . These values are of different types, as will be seen from the fact that different dynamic value chains are created for these values. Property 504 also has values associated with it, but these are shown only in the abbreviated form of an arrow 510.
  • Associated with real value header 506 is a segment 512 of real value data, and associated with value header 508 are two segments 514 and 516 of real value data. If the values for the property 502 were not of types which require creation of dynamic values, then the actual data of the values would be stored in segments 512, 514 and 516. Since the type of these values call for dynamic value creation, however, the data stored in real value data segments 512, 514 and 516 may instead be transformed versions of the actual data and/or may contain only indirection information.
  • the value header structure includes a pointer to the top dynamic value header 518 in a chain of dynamic value headers 518, 520 and 522.
  • Each of the dynamic value headers 518, 520 and 522 have a format which is identical to the value header (also called a "real value header") 506, except that the field in real value header 506 which pointed to dynamic value header 518, is redefined in dynamic value header 518 to point to a set of dynamic value header extensions 524.
  • the extensions 524 include an entry which points to the base value of the dynamic value header 518, which in the case of this chain, merely points to the second dynamic value header 520 on the chain.
  • Dynamic value header 520 in turn points to its own dynamic value header extensions 526, which in turn point, in the base value field, to dynamic value header 522.
  • Dynamic value header 522 also points to its dynamic value header extensions 528. But since dynamic value header 522 is at the bottom of the chain, its base value is the real value data stored in segment 512. Thus, the "base value" field of extensions 528 points back to the real value header 506.
  • each of the dynamic value headers 518, 520 and 522 corresponds to a respective one of the types on the tree defining the complex type of the value headed by real value header 506.
  • each of the dynamic value headers 518, 520 and 522 maintains its own vector of value handlers to be used when a higher level caller desires to invoke a value operation.
  • These dynamic value vectors are illustrated in Fig. 5 as 530, 532 and 534, pointed to respectively by extensions 524, 526 and 528.
  • the dynamic value vectors 530, 532 and 534 contain a series of pointers to the respective value handlers to be called.
  • the pointers are in predefined locations in the vector; for example, the third entry in each vector contains the pointer to the WriteValueData handler to be called for a value data write operation.
  • the value header 508 in Fig. 5 is for a value whose type spawned only a single dynamic value header 536.
  • the value header 508 points to dynamic value header 536, which in turn points to its extensions 538, which in turn points both to a dynamic value vector 540 and, for the base value, back to the value header 508.
  • a special dynamic value property 542 is created only to contain the dynamic value headers. Only the bottom-most layer of each dynamic value chain (the layer whose base value is the "real" value) is on the dynamic property chain. All higher layers are not part of the dynamic property chain.
  • the dynamic value property chain is used to simplify deletion of dynamic values, for example when the container is to be closed.
  • Dynamic value headers never have value segment lists. No data is ever written to a dynamic value because these headers are removed when the value is released using a CMReleaseValueO. If there is any data, it must be associated with a "real" value -- the real value associated with a dynamic value or some place else.
  • dynValueData is NULL for "real" value headers that don't have a corresponding dynamic value.
  • dynValueData.dynValue is a pointer to the top-most layer if it is a "real" value that does have a corresponding dynamic value header.
  • dynValueData.extensions is a pointer to the extensions if it is itself a dynamic value header.
  • the value header's flags determine how to interpret the pointer.
  • CMNewValue() or CMUseValue() the pointer to the top-most dynamic value header is returned as the refNum. That means whenever the user passes it to an API value routine, it will check to see if the refNum is a dynamic value. If it is, it initiates the call to the corresponding value handler using the vector in the extensions. That may cause a search up the base value chain to look for the inherited value routine. In the limit, the original API value routine is used if no handler is supplied and the "real" value in the chain is reached. That's how data could get in there.
  • Fig. 6 illustrates the same structure as Fig. 5 using a simplified notation.
  • CMNewValue() or CMUseValue() sees any (base) type that has an associated "use value” handler, it will spawn a dynamic value.
  • the spawning is done essentially by calling the "use value" handler. It is expected to set up a refCon to pass among the value handlers and a pointer to another metahandler. These are returned to CMNewValue() or CMUseValue() which uses newDynamicValue() to do the actual creation of the dynamic value.
  • CMNewValue() or CMUseValue() which uses newDynamicValue() to do the actual creation of the dynamic value.
  • the extensions are initialized, the metahandler pointer saved, and the refCon is also saved.
  • the pointer to the created dynamic value header is what CMNewValue() or CMUseValue() returns to the user as the refNum.
  • the refNum is for a dynamic value. If it is, the corresponding handler routine will be called.
  • the vector entries are set on first use of a value operation. It may mean searching up the base value chain, but once found, the handler address is saved in the top layer vector (associated with the refNum) so the search doesn't have to be done again.
  • the t object can be represented as shown in Fig. 7 (using the notation of Fig. 6) .
  • the value data segments are shown here with the data the segment will point to in the container.
  • the global name property 704 and value 706 are created, as usual, by calling CMRegisterType() .
  • the CMAddBaseType() calls add the base types. These are recorded as the object IDs for each base type in the order created as separate value segments 708, 710 for a special "base type" property 712 belonging to the type object 702.
  • the value segments 708, 710 store only the Object IDs of the base types; the global name of the base types are stored as values such as 706 in respective type objects similar to 702.
  • CMNewValue0 or CMUseValue0 spawn dynamic values if the original type or any of its base types have an associated "use value" handler. Assume that was done for T in the above example. What happens is that CMNewValue() or CMUseValue0 will look at its type object (t here) to see if the base type property is present. If it is, it will follow each type "down" to leaf types using a depth-first search. In the example, layerl will be visited, then layer2, and finally the original type T itself. If the layerl type object had base types of its own, they would be visited before using layerl itself. Hence the depth-first search down to the leaf types.
  • newDynamicValueO For each type processed, if it has a "use value" handler of its own, it will be called to get a refCon and value handler metahandler. These are passed to newDynamicValueO to create a dynamic value for the original "real" value. newDynamicValueO always returns its refNum that will be the dynamic value it created. The first layer will create the dynamic value property and put the dynamic value header on its value header list. All further calls to newDynamicValue() will pass the most recent refNum returned from it. newDynamicValueO then chains these off the first dynamic value header. This produces the desired layering result.
  • TOCValue the format of one of the TOCValue data segments 426 (Fig. 4) or 512, 514, 516 (Fig. 5): o 1992 Apple Computer, Inc.
  • CMJJLONG offset ; /* offset to value in container */ struct Global ame *globalNameSymbol; /* ptr value for a global name (in memory) */ > globalName; union ⁇ /* actual value if imnediate placed here: */ CMJJCHAR ucharsValue[2*sizeof(CMJJLONG)]; /* value if imnediate unsigned char(s) */
  • ListLinks valueLinks ; /* links to next/prev value (must be first) */ struct TOCValueHdr *theValueHdr; /* ptr back to ValueHdr "owning 11 this value */ ContainerPtr container; /* ptr to "owning" container control block */
  • TOCValueBytes value /* value and length or imnediate value */ unsigned long logicalOffset; /* original (unedited) logical offset */
  • ContainerPtr container /* ptr to "owning" container control block */ unsigned long size; /* total current size of the value data */ unsigned long logicalSize; /* original (unupdated) size of the value data */ unsigned short valueFlags; /* flags indicating stuff about the value */
  • CMGeneration generation /* generation number */ unsigned long useCount; /* count of nbr of times "used” */ CMRefCon valueRefCon; /* user's value refCon */ TouchedListEntryPtr touch; /* ptr to updating touched list entry */ union I* this field depends on kind of value hdr: */ struct TOCValueHdr *dynValue; /* ptr to dynamic value hdr or NULL */ struct DynValueHdrExt *extensions; /* ptr to dynamic value hdr extensions */
  • TOCObjectPtr refDataObject /* associated ref object; NULL if no refs */ ListHdrPtr refShadowList; /* or shadow list of the actual data */
  • IsDynamicValue(v) is a more self-documented test to see if a TOCValueHdr is indeed a dynamic value, while DYNEXTENSIONS(v) allows simpler notational access to a dynamic value header's extension fields. */
  • DynamicValueVectorEntries DynamicValueVectorEntries, •DynamicValueVectorEntriesPtr; struct DynamicValueVector ⁇ DynamicValueVectorEntries cmGetValueSize; DynamicValueVectorEntries cmReadValueData; DynamicValueVectorEntries c riteValueData; DynamicValueVectorEntries cmlnsertValueOata; DynamicValueVectorEntries cmDeleteValueData; DynamicValueVectorEntries cmGetValuelnfo; DynamicValueVectorEntries c SetValueType; DynamicValueVectorEntries cmSetValueGen; DynamicValueVectorEntries cmReleaseValue;
  • CMGetBaseValue() When a handler is called, it is expected to do its operations on the "base value" of the value passed to it. It gets its base value using CMGetBaseValue() . However, we don't want to allow recursive use of the API for the same value. That would call the handler again and we would be in an infinite loop. Thus the active switch is provided to check for this so we can report an error.
  • the dynamic value vector is initialized with each handler address thisValue set to NULL.
  • the metahandler which was returned from the "use value” handler (the metahandler address is saved in the value header extensions) to get the proper value handler address. It is saved in the handler field of the vector entry.
  • the handler used may correspond to a different dynamic value.
  • CMSize (*TcmGetValueSize)(CMValue value); typedef CMSize (*TcmReadValueData)(CMValue value, CMPtr buffer, CMCount offset, CMSize maxSize); typedef void (*TcmUriteValueData)(CMValue value, CMPtr buffer, CMCount offset, CMSize size); typedef void (*TcmInsertValueData)(CMValue value, CMPtr buffer, CMCount offset, CMSize size); typedef void (*ToDeleteValueData)(CMValue value, CMCount offset, CMSize size); typedef void (*TcmGetValueInfo)(CMValue value, CMContainer *container, CMObject *object, CMProperty *property, CMType *type, CMGeneration *generation); typedef void (*TcmSet)(CMValue value
  • each corresponding API value operation must check to see if it has a dynamic value and call the corresponding handler which does the operation. It must get the proper address on first use. It must set switches to mark the handler as active. It must also set a switch to allow CMGetBaseValue() calls which are only allowed from dynamic value handlers. Thus the algorithm for calling a value handler looks something like this
  • the macro will pass the appropriate value corresponding to a possibly inherited handler. If the handler returns a value save it to be returned as the result.
  • DisAllowCMGetBaseValue(container) SignalDynHandlerAvailable(v, h) ; return [result] ;
  • v is a dynamic CMValue (note that GetDynHandlerAddress may CHANGE it) ; h is a pointer to a vector entry in the extensions; and g is the metahandlers operation type string.
  • the GetDynHandlerAddress() routine takes a vector entry and sets the handler address as a function of the "g" metahandler operation code. On first call it will search for inherited method if necessary. The vector is updated with the found handler address and the associated "this" value saved. This is the value returned and whose we set. We do the call and reset the switches, all using the same "this" value.
  • the IsDynamicValue() call is defined above.
  • the "h” in all these macros is the vector entry name, and the "x" parameter in GetDynHandlerAddress is used for error reporting.
  • the "s" is the name of the API routine doing the call. This is used by cmGetDynHandlerAddress0 simply as an insert if it should report an error prior to returning NULL. Note the AllowCMGetBaseValue and
  • CMGetBaseValue() is only allowed from value operation handlers.
  • the two macros control a single switch, getBaseValueOk, which the CMGetBaseValue() routine checks.
  • the switch is actually a counter which, just to be safe, is never allowed to stay negative. 0 means CMGetBaseValue() is illegal. Greater than 0, it's legal.
  • the reason the switch is a counter is because dynamic values use CMGetBaseValue() to do their operations in terms of their base values. If a dynamic value's base is also dynamic (i.e., we have layered dynamic values) , then we have a nesting condition. Hence the counter.
  • the Application Program Interface includes a number of calls to Container Manager routines to perform session operations, container operations, type and property operations, object operations, value operations and reference operations. Only certain of the calls which are important for an understanding of the invention will be described herein. 1. Session Operations
  • This routine records the association of Global Names with their metahandlers in metahandler table 404 (Fig. 4) .
  • the designated metahandler will be associated with the typeName.
  • the previous metahandler for this type name, if any, is returned. If there was no previous metahandler defined, NULL is returned.
  • the association between handlers and type names is global within a session, rather than specific to a given container.
  • a metahandler will be called whenever the Container Manager or the application needs to find out how to perform a given operation on a container or value of this type.
  • the metahandler can define specific handlers for any number of different operations, potentially with completely different interfaces. Metahandlers for values are required to support certain value operations, listed previously.
  • a metahandler When called, a metahandler returns a procedure pointer to specific handlers that can carry out the desired operation. Typically, these procedure pointers will be cached and then used in the normal manner. Each metahandler may provide handlers for any number of operations, though it itself implements only the operation of obtaining and returning a requested handler pointer.
  • a metahandler has the following prototype:
  • targetType is the refnum of the type to which the operation will be applied
  • operationType is the name of the desired operation. targetType is required because in some cases the operation may be applied to values whose type has no global unique name.
  • This approach provides more flexibility than simply passing a vector of procPtrs, and allows each operation to have its own prototype for static type checking, which would be impossible if operations were indicated by passing a selector.
  • This function searches the metaHandler symbol table 404 for the specified typeName and returns the associated metahandler address. If no metahandler is associated with that type name, it returns NULL.
  • This routine takes a targetType which has a globally unique name and uses that name to find a metahandler. It then calls the metahandler to get the handler routine address for the specified operationType. The function returns the resulting address.
  • Metahandler proc addresses are given to the Container Manager by calls to CMSetMetaHandler.
  • the global name for the input targetType is treated as the typeName to find the metahandler.
  • Containers (files and blocks of memory) are always accessed through handlers, to provide platform independence and support nested containers. Handlers are responsible for creating a container if necessary, opening and closing it, managing stream I/O to it, and reading and writing the container label (which provides such information as the location of the Table of Contents) .
  • the interfaces to container handlers are documented elsewhere herein. The types of storage that can be used as containers are limited only by the types of handlers available.
  • CMContainer CMOpenContainer(CMSession sessionData, CMRefCon attributes, const CMGlobalName typeName, CMContainerUseMode useFlags) ;
  • This operation opens an existing Container Manager container.
  • the attributes must designate management structures for the container storage. This attributes argument is not examined by the Container Manager, but is simply passed to the appropriate handler interfaces. It is intended to provide the information necessary for the handlers to locate a specific container. Thus attributes serves as a communication channel between the application and the Open handler. In its simplest form for a container file it would be a pathname. For an embedded container, it would be the parent value (CMValue) , corresponding to the embedded container.
  • CMValue parent value
  • the typeName is used to find a metahandler defined for that same typeName.
  • the metahandler in turn, defines the handlers for the container and thus knows how to get at the physical container. These handers must understand the attributes provided.
  • the useFlags must be 0 or kCMReuseFreeSpace. 0 implies that the container is to be open for reading only. No writes may be done. If kCMReuseFreeSpace is specified, than both reading and writing may be done to update the container. Free space from deleted data will be reused and overwrites of existing data may be done to change it (subject to the container label flags, see below) . A container refnum is returned. Note that an individual value can be opened as an embedded container. Through the attributes, the value is passed to the handlers. This value must be typed as an embedded container value. Embedded containers can have embedded containers which can also be opened and read. The effect is that a tree of nested containers can be opened and read without restriction.
  • CMContainer CMOpenNewContainer(CMSession sessionData, CM efCon attributes, const CMGlobalName typeName, CMContainerUseMode useFlags, CMGeneration generation, CMContainerFlags containerFlags, ...); This operation opens a new Container Manager container for writing. This is similar to opening for reading (see documentation above) except that here a new and empty container is opened.
  • a minimum TOC is created along with the special TOC object 1 with its seed and offset properties.
  • the resulting container can be updated.
  • the useFlags may be 0, kCMConverting, kCMUpdateByAppend, or kCMUpdateTarget.
  • generation is the generation number of the container; it must be ⁇ 1. If this container is a copy of a previous container, the generation number should be 1 greater than the generation number of the previous container.
  • containerFlags is the flag value that will be stored in the container label. No container flags are currently defined.
  • CMContainerUseMode the physical container is assumed to already contain a sequence of bytes that the caller wants to convert to container format.
  • the application uses CMDefineValueDataO to create values for objects in the bytes. All new stuff, including the TOC is written at the end of the existing stuff. The Container Manager will not modify the existing data. If the kCMUpdateByAppend or kCMUpdateTarget flags are set, all updates to a "target" container are recorded in the container being opened. Future opens of this container, with CMOpenContainer() will apply the updates to the target to bring it "up-to-date" while it is open.
  • kCMUpdateByAppend is specified, then the container is opened for update-by-append. All updates are appended to the existing container and an additional TOC is layered on to the end of the container when closed. Each time the container is opened and then closed for update-by-append, the new updates and a new TOC are appended. Whenever such a container is opened (in any mode) , all the updates are applied appropriately to the original container. Using kCMUpdateTarget is similar to kCMUpdateByAppend, but the updates are recorded in a new container.
  • any number of embedded containers can be opened. Also embedded containers can be opened within embedded containers to any depth. The effect is that a tree of nested containers can be opened and written without restriction. However, when a CMCloseContainer is done on a parent container, all of its descendants will also be closed.
  • CMOpenNewContainer It is an error to call CMOpenNewContainer with a value that belongs to a container that is not updatable, since that call would create an embedded container open for writing. void CMGetContainerlnfo(const CMContainer container, CMGeneration *generation, CMContainerFlags *containerFlags, " CMGlobalName typeName) ;
  • NULL may be passed for any reference; in that case the corresponding value is not returned.
  • the session global data pointer returned from CMStartSessio O is passed to most of the handler routines defined in this file. This routine is provided to make it easier to retrieve the pointer as a function of the container refNum.
  • VOID CMCloseContainer(CMContainer container) If the container was open for writing, all I/O to the designated container is completed, and the table of contents and label are built and written out.
  • This call closes the specified container and all its currently opened embedded containers (if any) .
  • This call destroys the association between the container refnum and the designated container.
  • the specified container refNum and all the others corresponding to the embedded containers are invalid. All memory associated with a container's data structures is freed.
  • the container refNum may be returned by a subsequent CMOpenContainer call, designating another container.
  • VOID CMAbor Container(CMconst_CMContainer container) The container is closed without further writing to the container, i.e., as if it were opened for reading even when opened for writing. This is intended to be used to abort processing of the container from unrecoverable errors.
  • this routine will return to its caller. It is up to the user to actually abort execution if that is required.
  • a refnum to a new object in the designated container is returned. At this point the object has nothing but an identity.
  • currObject is generally a refNum previously returned from this routine. Successive calls to this routine will thus yield all the objects in the container.
  • Objects are returned in order of increasing ID. If there are no larger object IDs defined, NULL is returned. To begin the iteration, pass NULL as the object refnum.
  • CMProperty CMGetNextObjectProperty(CMObject theObject, CMProperty currProperty) ; A refnum for the next property defined for this object is returned.
  • currProperty is generally a refNum previously returned from this routine. Successive calls to this routine will thus yield all the properties for the given object.
  • This routine returns the refNum for the next property defined for the given object. If there are no more properties defined for this object, NULL is returned. If currProperty is NULL, the refNum for the first property for the object is returned.
  • CMGlobalName CMGetGlobalName(CMObject theObject);
  • the name of the designated object is returned. This operation is typically used on types and properties, but it can be applied to any object with a Globally Unique Name property. NULL is returned if the object does not have a Globally Unique Name. 5. Type and Property Operations
  • Types and properties may be registered more than once; the refnum returned from all the different registrations of the same type is the same.
  • Identity of types is defined by string equality of their names.
  • CMType CMRegisterType (CMContainer targetContainer, const CMGlobalName name) ;
  • the designated type is registered in the designated container, and a refnum for it is returned.
  • CMProperty CMRegisterProperty(CMContainer targetContainer, const CMGlobalName name) ;
  • the designated property is registered in the designated container, and a refnum for it is returned.
  • CMBoolean CMIsType CMOb ect theObject
  • CMBoolean CMIsProperty CMObject theObject
  • currType is generally a refNum previously returned from this routine. Successive calls to this routine will thus yield all the type descriptions in the container.
  • Types are returned in order of increasing ID. If there are no larger type IDs registered, NULL is returned. To begin the iteration, pass NULL as the type refnum.
  • CMProperty CMGetNextProperty(CMContainer targetContainer, CMProperty currProperty) ;
  • currProperty is generally a refNum previously returned from this routine.
  • CMCount CMAddBaseType(CMType type, CMType baseType);
  • This routine defines base types for a given type so that layered dynamic values can be created.
  • Base types essentially provide type inheritance. As previously described, the type trees created by CMAddBaseType() are stored in "type objects".
  • a base type is added to the specified type. For each call to CMAddBaseType() for the type a new base type is recorded. They are recorded in the order of the calls. The total number of base types recorded for the type is returned. In the embodiment described herein, it is an error to attempt to add the same base type more than once to the type.
  • CMAddBaseType() The specified base type previously added to the specified type by CMAddBaseType() is removed. If NULL is specified as the baseType, all base types are removed. The number of base types remaining for the type is returned. 4. Value Operations
  • All the I/O calls in the present embodiment do I/O to or from a buffer provided by the application.
  • This routine is used to get the refNum for the value of an object's property of the given type. NULL is returned if the value does not exist, or if or the object does not contain the property. If the type of the value corresponds to a global type name that has an associated "use value” handler, or if its base types (if any) have associated "use value” handlers, a dynamic value will be created and the refnum returned will refer to a dynamic value rather than the base value. (Normally, an application will never be aware of this difference.)
  • CMOpenContainer() a value that is used as an embedded container
  • the data i.e, the embedded container for such a value can only be defined by using CMOpenNewContainer() .
  • the container type name must be associated with a special set of handlers that define a "return parent value" handler. This handler returns the parent value refNum whose data contains the embedded container.
  • the data for an embedded container value includes a TOC. Unless a "blind" copy is being done, the TOC read this way is of not very much use.
  • CMValue CM FIXEDARGS CMUseValue(CMObject object, CMProperty property, CMType type) C TOCPropertyPtr theProperty;
  • ContainerPtr container
  • ExitIfBadObject object, NULL
  • the cmFollowTypes() routine referred to in the above code creates a dynamic value layers for the passed type and all of its base types, if any of these types have a "use value" handler. This routine is only called by CMNewValueO or CMUseValue() . For - Ill -
  • CMNewValueO "metadata" handler and "new value” handlers are also required.
  • the top-most dynamic value header pointer is returned, and is in turn returned from CMUseValue() or CMNewValue() .
  • NULL is returned if an error is reported.
  • the original "real" value is returned if no dynamic values are created.
  • the isNewValue parameter should be set to false. It should only be set to true for CMNewValueO. Also for CMNewValueO, the constructorData must point at the CMNewValueO "" parameters. These are consumed as the base type metadata (returned from the "metadata” handler) describes how to create data packets from the "... " parameters. The packets, in turn, are passed to the "new value” handlers. A "new value” handler uses its data packet to write (possibly different) data to its base value. This written data will then be read and used by the "use value” handler.
  • the "use value” handler is called for both the CMUseValue() and CMNewValueO cases. If it's companion "new value” handler wrote data to its base value, the "use value” handler will probably read the data to create its refCon. The refCon will be passed to all value handlers. The "use value” handler returns its refCon along with another metahandler address that is used to get the value handler addresses. These are then used to create the dynamic value.
  • cmFollowTypes() recursively follows the types, looking for base types as defined by CMAddBaseType0. Each type can have any number of base types. The recursion effectively produces a depth-first search of all the base types. As each type is completed (i.e., no more base types for it) , a dynamic value is created as described above. That is, for CMNewValueO, a type's "metadata" handler instructs us on how many CMNewValueO "" parameters to consume and how to construct their packet. That is passed to the "new value” handler so it can write some appropriate data to the base value. The "use value” is called in all cases which reads the data written by "new value” to construct its refCon. The refCon is returned here along with the metahandler address that will yield the value handler routine addresses.
  • the refCon and metahandler address are passed to newDynamicValueO to construct one dynamic value (layer) .
  • the resulting dynamic value is used as the base value for the next layer. This produces the desired data structures.
  • CMNewValueO "" parameters must be ordered for the "deepest" type first. For example, given a type tree in which Tl has base types T2 and T3, and T3 has base types T4 and T5 (read as Tl inherits from T2 and T3, and T3 inherits from T4 and T5, the depth-first search, starting at Tl, yields the sequence: T2 T4 T5 T3 Tl. Then this is the order the CMNewValue() " ... " parameters must be in.
  • cmFollowTypes() e 192 Apple Computer, Inc.
  • TOCValueHdrPtr cmFollowTypes (TOCValueHdrPtr baseValueHdr, TOCObjectPtr type, Boolean isNewValue, va list *constructorData)
  • ContainerPtr container baseValueHdr->container; TOCObjectPtr baseType; TOCPropertyPtr baseTypeProperty;
  • CMHandlerAddr useValueHandler, newValueHandler, metaDataHandler; unsigned char *dataPacket; char *newOrUseValueName, *metaData, *typeName;
  • the newDynamicValueO call is the one responsible for */ /* creating a layer. It builds upon the current baseValueHdr and returns a new one. */
  • the newDynamicValue routine referred to above is called only by cmFollowTypes () to do the actual construction of the dynamic value for the specified base value.
  • the type is the type that is causing this dynamic value to be created. If the base value is a "real" value, then the dynamic value is added to a "dynamic values" property for the object who owns the value. The "dynamic value”s property is created for the first dynamic value for that object. All further dynamic values with "real" value bases are simply added as values (headers) to that property. If the base value is itself a dynamic value, then the newly created dynamic value is chained to the base (dynamic) value. It is only a backward link from the new dynamic value.
  • a cmDefineObjectO call always initializes the union field */
  • a backward chain of dynamic value headers is formed */ /* from the new dynamic value header back to the original "real" value header.
  • the */ /* link is the baseValue extensions field set above. It always points to its base */ /* value (it is also used by CMGetBaseValueO).
  • the dynValue */ /* union alternative in a "real" value header always points to the last dynamic */ /* value of the chain.
  • a CMUseValueO or CMNewValueO will always return the pointer */ /* to the dynamic value at the end of the chain.
  • CMGetNextValue() is defined next:
  • CMValue CMGetNextValue(CMObject object, CMProperty property, CMValue currValue) ;
  • This routine returns the refNum for the next value
  • currValue is NULL, the refNum for the first value for that object's property is returned. If currValue is not NULL, the next value for that object's property is returned. NULL is returned if there are no more type values following currValue or the object does not contain the property. CurrValue is generally a refNum previously returned from this routine. Successive calls to this routine will thus yield all the values for the specified property of the specified object as long as no other operations change the value order.
  • CMValue CMNewValue(CMOb ect object, CMProperty property, CMType type, ...) ;
  • a new entry is created for the designated object, with the designated property and type and a refnum to the entry is returned.
  • the generation number of the value defaults to the generation number of the container, but it may be set with
  • An object's properties can have more than one value. However, the all the types for the values belonging to a given object property must be unique.
  • the value is created at an unspecified location in the sequence of values for the specified property. Creating a new value may cause the order of the values for that property to change.
  • the CMNewValueO routine is implemented merely as a wrap around the CMVNewValue() routine, described below.
  • the CMNewValueO routine merely does what a user would do to use CMVNewValue() : CMValue CM VARARGS CMNewValue(CMObject object, CMProperty property, CMType type, ...) I
  • CMVNewValue() does the same as CMNewValue() above, except that the dynamic value data initialization (i.e., " --) parameters are given as a variable argument list as defined by the "stdarg" facility. It assumes that the caller has set up and terminated the variable arg list in the manner shown above for CMNewValue() .
  • CMVNewValue() is implemented as follows: o 1992 Apple Computer, Inc.
  • CMValue CM_FIXEDARGS CMVNewValue(CMObjeet object, CMProperty property, CMType type, va list datalnitParams) ⁇
  • ContainerPtr container ExitlfBadObjectCobject, NULL); /* validate object */
  • CONTAINERNAMEx(((TOCObj"ectPtr)property)->container), CONTAINERNAMEx(((TOCObjectPtr)type)->container)); return (NULL); > container container->updatingContainer; /* use updating container from here on */
  • the cmDefineObject() routine used in the above code, is called to define a new TOC entry for an object. We may have either a new object (id) or a preexisting one for which a new property and value are to be defined. All the fields for a TOC entry are passed. The objectFlags indicate the type of the object.
  • the value associated with the property is appended to the end of the property's value list.
  • the caller has the option of getting the pointer to the value header (theValueHdr) when theValueHdr is not passed as NULL.
  • the format of the cmDefineObject() function call is: TOCObjectPtr c DefineObject(ContainerPtr container, unsigned long objectID, const unsigned long propertyID, const unsigned long typelD, TOCValueBytesPtr value, const unsigned long generation, const unsigned short flags, const unsigned short objectFlags, TOCValueHdrPtr *theValueHdr);
  • the function returns a pointer to the object if it was successfully created and NULL if it wasn't. An error is reported if NULL is returned.
  • the objectFlags determine how the function treats the object and all the object fields (the other parameters) . There are four possible objectFlags:
  • UndefinedObject Set if the object is to be created, but we don't yet know what its TOC entries are. Basically a null object (or placeholder) is created. It is an incomplete object in that there are no properties or types chained to the object (yet) . This flag may be used in combination with the others if we know that the object ID corresponds to a property, type, or neither.
  • UndefinedObject it is now considered as defined. If it was an undefined property or type, it now becomes a defined property or type. Of course, duplicate definitions are an error.
  • ObjectObject but here we know the object is for a property. It has to either not exist previously, or was previously flagged as
  • TypeObject Same as PropertyObject, but for type objects.
  • value data can be read using the following API call: CMSize CMReadValueData(CMValue value, CMPtr buffer, CMCount offset, CMSize maxSize) ;
  • the data, starting at the offset, for the value is read into the buffer.
  • the size of the data read is returned. Up to maxSize characters will be read (can be 0) .
  • the data is read starting at the offset, up to the end of the data, or maxSize characters, whichever comes first. Offsets, are relative to 0. If the starting offset is greater than or equal to the current data size, no data is read and 0 returned.
  • Fig. 8 is a flowchart of a CMReadValueData() routine 800.
  • the routine first determines whether the value designation which was passed to it is a valid value, and if not, it exits.
  • the routine determines whether the value is a dynamic value or not. If it is not a dynamic value, then the routine merely obtains the real value from the real value segments (e.g. 426) and assembles them into the caller's buffer (step 806) . It then returns to the caller (step 808) .
  • the routine obtains the dynamic handler address of the read handler associated with this value.
  • the routine calls a cmGetDynHandlerAddress() function (described below), passing the value, a pointer to the dynamic handler vector entry for the desired read handler, and an indication that it is a read handler which is being sought. If this was not the first use of the handler, the cmGetDynHandlerAddress() function returns the desired handler address from the vector entry specified in the calling parameters. If it is the first use, then the function finds the inherited handler address and returns it.
  • the CMReadValueData() routine 800 checks again whether the value is still a dynamic value. If not, then the real value is assembled (step 806) as previously described, and returned in the caller's buffer (step 808) .
  • step 814 the routine signals that the dynamic read handler for this value is in use.
  • This step sets a flag specific to the value and the read handler of that value, which is used to detect an error by recursion.
  • step 816 another flag is set which allows getting base values in the present container. Only dynamic value handlers are permitted to get base values, since base values are not defined for real values.
  • step 818 the actual read handler routine is called as a procedure.
  • the address of that routine was placed in the value's dynamic value extensions vector previously in step 810.
  • step 820 the routine disallows getting of base values in this container, and in step 822, it signals that the dynamic read handler for this value is once again available.
  • the routine than returns with the requested data in the caller's buffer (step 808), as placed there by step 818.
  • CMSize CM_FIXEDARGS CMReadValueData(CMValue value, CMPtr buffer, CMCount offset, CMSize maxSize) ⁇
  • CMReadValueData() routine is the macro defined previously which merely checks the appropriate flags for the value.
  • the GetDynHandlerAddress0 call is also a macro defined previously, which calls a cmGetDynHandlerAddress() routine.
  • the call to cmGetDynHandlerAddress() is of the form:
  • CMValue cmGetDynHandlerAddress (CMValue value, DynamicValueVectorEntriesPtr vectorEntry, CMconst_CMGlobalName operationType, char *rou ineName) ;
  • a dynamic value handler is callable if it exists
  • the vectorEntry points to the dynamic value entry in its vector belonging to the extensions of the passed dynamic value. If this is not the first use of the handler, the vector entry contains the handler address and its associated ("this") value (discussed below). If it is first use, cmGetDynHandlerAddress() must find the ("inherited") handler address and its associated "this" value. In either case, the "this" value is returned as the function result. NULL is returned if an error is reported.
  • the returned value, and the one saved in the vector entry may not be the same. They are the same if the passed value already has a handler. If it doesn't, an "inherited" handler, from one of the dynamic value's base values is used. The value associated for whoever has the handler is the "owning" value. In C++ terms, it is the "this" pointer. In the limit, we could end up using the original "real” value that spawned the dynamic value(s). If that is indeed the case, we end up using the calling routine which will always be an API value operation.
  • the handler is accessed in the usual way via its dynamic*/
  • CMReadValueData() the complement of CMReadValueData() is CMWriteValueData() .
  • CMWriteValueData() the buffer specified by the caller is written to the container and defined as the data for the value. If the value already has data associated with it, the buffer overwrites the "old" data starting at the offset character position. Size bytes are written. Size can be 0.
  • the offset passed by the caller may be any value from 0 to S. That is, existing data may be overwritten or the value extended with new data.
  • the value of S can be obtained using CMGetValueSize() . Note, no "holes" can be created. An offset cannot be greater than S.
  • CMReadValueData0 Once data has been written to the container, it may be read using CMReadValueData0. Note that CMReadValueData() is also used for containers opened for input using CMOpenContainer() . It thus can be used for all kinds of opens. The converse is not true. CMWriteValueData() may only be used for a container opened for writing (or converting) using CMOpenNewContainer() .
  • CMWriteValueData() calls for a particular value do not have to be contiguous. Writes for other values can be done.
  • the API specifically, this routine here, takes care of generating "continued" value data (segments) for a value. The data is physically not contiguous in the container with such a case.
  • CMWriteValueData() hides this by allowing the user to view the data as contiguous. The input offset is mapped into the proper starting segment and offset within that.
  • CMWriteValueData() « 1992 Apple Computer, Inc. void CM_FIXEDARGS CMUriteValueData(CMValue value, CMPtr buffer, CMCount offset,
  • ContainerPtr container unsigned char *p; char offsetStr[15]; unsigned long len, remaining, amountWritten, nextFree, valueSize;
  • a “concat” means here means a physical concat if the old data was the last thing written to the SAME container. If it wasn't we must create a new value entry on the valueHdr's value list to represent a continued value. Note the emphasis on "SAME" container. We could be writing updates for an "old” container to be recorded in a new updating container.
  • the buffer is inserted into the value's data at the specified offset, size bytes are inserted. If the current size of the value data is S (it will be 0 for a new value created by CMNewValue() ) , then the offset may be any value from 0 to S. That is, the insertion may be anywhere within the data value or the value extended with new data.
  • the value of S can be obtained using CMGetValueSizeO . Note, no "holes" can be created. An offset cannot be greater than S. Also, an insertion at offset of S is identical to a CMWriteValueData() to the same place.
  • CMReadValueData() Once data has been written to the container, it may be read using CMReadValueData() .
  • continued (i.e., segmented) values are handled by splitting them into one or two new segments.
  • the insertion is always a new segment.
  • the original segment to be inserted to will split into two segments if the insertion is in to its middle. It will remain intact and unsplit if we are inserting into its beginning.
  • the new segment is simply inserted in front of it. For the split case the new segment must be inserted between the split pieces.
  • CMDeleteValueData (CMValue value, CMCount offset, CMSize size)
  • VOID CMDefineValueData (CMValue value, CMSize offset, CMSize size) ;
  • Existing data in the container which must have been in the container when it was opened, is defined as the data for the value. No data is written to the container. The container must have been opened using CMOpenNewContainer, with the flag kCMConverting set in the useMode. The designated value is set to reference the indicated data.
  • the offset given is the offset from the beginning of the container. It is an error to give an offset or a size that would result in the value containing bytes outside of the data that was in the container when it was opened. The offset therefore, must be in the range of 0 to N-l, where N is the size of preexisting data at the time the container was opened.
  • CMDefineValueDataO for the same value will define additional, i.e. continued, segments when the offset produces noncontiguous data definition. If the size of the last (most recent) value segment is S, and the offset for that segment is such that offset+S equals the offset for the additional segment, then the last segment is simply extended. This follows the same rules as CMWriteValueData() . void CMMoveValue(CMValue value, CMObject object, CMProperty property) ;
  • CMDeleteObjectProperty() Moves the specified value from its original object property to the specified object property.
  • the value is physically deleted from its original object/property as if a CMDeleteValue() were done on it. If the value deleted is the only one for the property, the property itself is deleted as in CMDeleteObjectProperty() .
  • the value is added to the "to"s object property in the same manner as a CMNewValue() .
  • the order of the values for both the value's original object property and for the value's new object property may be changed.
  • VOID CMGetValuelnfo (CMValue value, CMContainer *container, CMObject *object, CMProperty *property,
  • NULL may be passed for any argument except the first. void CMSetValue ype(CMValue value, CMType type);
  • CMSetValueGeneration (CMValue value, CMGeneration generation)
  • the generation for the specified value is set.
  • the generation number must be greater than or equal to 1. Normally this routine does not need to be used since the value inherits its generation from its container, void CMDeleteValue(CMValue value);
  • the designated value is deleted from its object property.
  • a deleted value will be treated by all - 137 - one of its objects for the desired type. This is accomplished by calling CMNewValueO and specifying the object, and the property within the object, as well as the desired type.
  • CMNewValueO creates a value header such as 506 (Fig. 5) for the new value and places the type ID into that value header.
  • the routine sets up the entire dynamic value chain (e.g. 518, 520, 522 in Fig.
  • CMNewValueO instead of CMNewValueO to set up the dynamic value chain.
  • CMNewValueO actually fills any of the dynamic value vectors (e.g. 530, 532, 534 in Fig. 5) at this time.
  • the application program can call any of the value operations of the API as desired. For example, it can call CMWriteValueData() to write data to the value, or it can call CMReadValueData() to read data from the value.
  • real value data segment 512 indicates how to reach the literal value, rather than the literal value data itself.
  • real value data segment 512 may contain a file name or a file handle.
  • CMUseValueO When the application program calls CMUseValueO, that routine returns a pointer (called v for the purposes of the description below) to the top layer dynamic value header 518. If the value were not a dynamic value, then CMUseValueO would have returned a pointer to the real value header 506 instead.
  • the application program can call CMReadValueData(v, destination buffer, offset, size) to perform a read operation on the value data. Note here that the application program need not know that the value is a dynamic value.
  • CMReadValueData0 first determines that v is a dynamic value. Therefore, instead of merely returning data (for example, from real value data segment 512), it performs a series of steps to obtain the data dynamically. First, it calls cmGetDynHandlerAddress() with v, with a pointer to the dynamic value vector 530 entry which should contain the pointer to the read handler for the "compressed file" type, and with a flag indicating that it is a read handler which is being sought. If cmGetDynHandlerAddress0 finds a non-NULL pointer in the specified entry of dynamic value vector 530, it merely returns since the read handler address was previously obtained.
  • metahandler invokes the metahandler for the type (the pointer for which metahandler was previously placed in the dynamic value extensions 524 when the dynamic value chain was created), requesting the type's read handler.
  • the metahandler returns a pointer to the - 135 -
  • Container Manager operations as though it does not exist. For example, it will not be found by CMUseValue, counted by CMCountValues, etc.
  • CMDeleteObjectProperty If that property is the only one for the object, the object is also deleted as in CMDeleteObject. Some values are protected from deletion. Protected values include the predefined TOC object values (seed and offset) and any currently open embedded container values, void CMReleaseValue(CMValue value) ;
  • the application program would start the session by calling the Container Manager APIs for starting a session. These APIs, among other things, create the session global data block 402 (Fig. 4) . The application program then calls the Container Manager APIs for starting a session. These APIs, among other things, create the session global data block 402 (Fig. 4) . The application program then calls the Container Manager APIs for starting a session. These APIs, among other things, create the session global data block 402 (Fig. 4) . The application program then calls the Container
  • Manager API(s) either to create a new container or to open an existing container. These APIs create the container control block 406 in memory 304 and initialize the TOC 414. If the contianer is an update - 136 - container, then the Container Manager performs certain functions to prepare the data, described hereinafter. If the application program then desires to define a type which incorporates data transformation and/or redirection, it would then invoke the Container Manager CMSetMetaHandler() routine to associate a desired metahandler with the global name of the type. If the desired type is to be built as a tree of subtypes, then the application program would call CMSetMetaHandler() for each of the subtypes on the tree which requires a metahandler.
  • CMRegisterType() This call creates a type object such as that shown in Fig. 7, which merely associates the name of the desired type with a type ID number (the object ID of the type object) . Again, if this desired type is to be created using a tree of sub-types, the application program would then call CMAddBaseType() to add the base types in the sequence desired. These calls construct the remainder of the type object as shown in Fig. 7, as well as the separate type objects for each of the base types.
  • a given type is at a leaf of a type tree, its type object would contain only the type's global name (in value 706 of global name property 704 of the type) . Only if the given type has one or more base types do value segments such as 708, 710 exist for a base type property 712 of the type object.
  • the application program may create a value in same, which cmGetDynHandlerAddress() places in the previously specified entry of dynamic value vector 530.
  • cmGetDynHandlerAddress() CMReadValueData() then invokes the handler now pointed to by the read handler entry of dynamic value vector 530.
  • the read handler for the "compressed file” type will be very simple since it does no transformations or redirection. All that work is performed by the base types "compression” and "file access”.
  • CMGetBaseValue() merely checks for errors and then returns the pointer to the supplied value's base value, which was previously stored in the extensions 524. In this case, it will return a pointer to dynamic value header 520.
  • CMGetBaseValue() is implemented in the Container Manager as follows: ⁇ 1992 Apple Computer, Inc.
  • ContainerPtr container
  • the read handler for the "compressed file” type calls CMReadValueData() recursively, this time for dynamic value header 520.
  • CMReadValueData() again determines that the value header 520 is a dynamic value header, and therefore again calls cmGetDynHandlerAddress () to have a pointer to the read handler for the "compressed file” type placed in the read handler slot of dynamic value vector 532.
  • cmGetDynHandlerAddress() invokes the metahandler for the type of value header 520 (which is "compression” in this example) , to obtain that pointer and place it in that slot in dynamic value vector 532.
  • CMReadValueData() then invokes the handler.
  • the read handler for the "compression” type will be more complicated than the read handler for "compressed file", since a transformation must be performed.
  • CMReadValueData0 has been called recursively, this time with the value pointing to dynamic value header 522 (the base value for dynamic value header 520) .
  • this read handler begins by obtaining its base value bv.
  • the base value points to real value header
  • CMReadValueData has now been called recursively yet a fourth time, but this time the value passed to it is a real value rather than a dynamic value. CMReadValueData() determines this and merely returns the data from the real value data segment 512.
  • the read handler might be written to follow the above procedure only if this is the first time the read handler has been called with respect to the particular value, and can save the file handle returned by the file open routine in the value's refCon. In all subsequent times that the read handler is called the read handler can simply use the file handle from the value's refCon to read the requested data.
  • refCon static storage area
  • the read handler knows that it is always the bottom handler on the chain.
  • the value's "use value” or “new value” handler can open the file using the file name in the value's real value segments, and can store the file handle in the value's refCon. Then the read handler could be written to always obtain the file handle from the value's refCon and never call CMReadValueData() to obtain a file name.
  • the read handler for the bottom level dynamic value header 536 After placing the result in the caller's destination buffer, the read handler for the bottom level dynamic value header 536 then returns to the CMReadValueDat 0 routine which called it, which in turn returns to the read handler for the second level dynamic value header 522. That read handler processes the results as previously described and returns to the CMReadValueData0 routine which called it. That routine then returns to the read handler for the top dynamic value header 518, which, as previously mentioned, needs to do no further work before returning to the CMReadValueData() routine which called it. That routine then returns to the caller, which is the application program. Thus the application program has obtained the data it requested, without ever having to know that the data was obtained from a file rather than from the real value data segment 512, and without ever having to know that the data underwent decompression before being returned.
  • Updating refers to a (new) container updating a target container.
  • the target may be appended or a separate container.
  • the topmost update container bases its changes on the second-layer update container, and so on down to the bottom-layer container.
  • Each container except the topmost is a "target" (or "child”) container for one or more respective update or parent containers. Since a target container which is in the middle of a chain in essence incorporates by reference its own descendent containers, the term "target container” is sometimes used herein to refer to not just the information directly in that container, but also all of the information in all of its descendent containers as well.
  • Appendix C C-language software to support the algorithms and data structures described herein is attached hereto as Appendix C, with a header file in Appendix B.
  • Appendix C C-language software to support the algorithms and data structures described herein is attached hereto as Appendix C, with a header file in Appendix B.
  • Updating involves extending the basic data structures as illustrated by the heavy lines in the diagram of Fig. 12.
  • This diagram shows the additional data structures added to the basic data structures illustrated in fig. 4, but without value segments.
  • Fig. 12 illustrates a case of the effect of a CMMoveValue0 of value V ⁇ tl3 from object 0 1; property ⁇ > 1 ( "O ⁇ i " ) to 0 2 P 2 -
  • value segments attached to the value headers but they are only of significance in the recording of data editing. That is illustrated separately later.
  • the additional data structures shown in Fig. 12 are as follows:
  • the head of this chain is contained in the Container Control Block.
  • order of updating recording does not matter, so the most convenient way to link this list is backward.
  • object 0 l was touched before 0 2 .
  • the Touched List 1204 and 1206. This is a doubly linked list for each object, whose header is in the object (TOCObject) .
  • a doubly linked list is used to make it easy to remove or move these entries.
  • List entries are generated for each value updated in the object. There is only one list entry per value no matter how many times the corresponding value is updated.
  • the value header for an updated value points back to its touched list entry, as indicated by arrows 1208 and 1210. This pointer is initially NULL and doubles as a "first touched" switch.
  • Each touched list entry contains the following:
  • OPT address The original object, property, and type IDs that is referred to collectively as the "OPT address”.
  • OPTs will work their way into the value updating instructions generated at close time for use in addressing the original values at open time. They uniquely address the original value prior to applying any changes to properly apply those changes.
  • immediate value has been edited and it is still immediate. Immediate values are up to 4 byte long and stored directly in the TOC in place of the offset, "deleted Value” The value has been deleted, "deleted Property” The property specified by address "OP" has been deleted.
  • Moves are the most involved area in updating. They are the central theme around which all the rest of updating revolve. Values can only be moved through CMMoveValue() calls. Touched list entries for the "from” and “to” (source and destination) objects involved in the moved value are defined. Note, that a move only moves the value header. The value data segments "go along for the ride”. By moving the value header, the user's refNum to it remains valid. The source and destination object's touched lists are modified appropriately.
  • the algorithm is conveniently described using a "finite state machine” (FSM) .
  • Fig. 13 is a "state-transition” diagram describing this FSM, and Fig. 14 defines the actions taken and the next state.
  • FSM finite state machine
  • State 1 is used when a value is moved to or within the same original object. This can be from another object (moved there " from the original object by an earlier move) within the original object, but from a different property (again from an earlier move) .
  • State 2 is entered when a value is moved to a different object from its original.
  • the general "gist" of this whole thing is to generate a "removed" entry at the original source and an
  • the "inserted” entry gets a back pointer to the corresponding "removed” entry (used when values are deleted, which is described later) .
  • the "removed” entry is suppressed when the "inserted” is for the same original object (i.e., moved to a different property in the original object) .
  • Both "removed” and “inserted” are suppressed when a value is moved back to its original object and property. Note, in the code, an actual state number is not maintained in the touched list entries. Instead the state can be inferred by current conditions.
  • FIG. 12 An example of a single move of a value is illustrated in the diagram of Fig. 12.
  • a CMMoveValue() of value VH ⁇ has been done from object OJPJ to 0 2 P 2 as indicated by arrow 1216.
  • the back pointer 1218 is set in the "inserted” to point to the "removed” entry.
  • Editing here refers basically to insert and delete data operations. Overwriting of existing data can always be reduced to a delete data operation followed by an insert data operation. Appending is viewed as an insert.
  • Fig. 16 illustrates this and the notation used in the descriptions to follow.
  • the user's view of the data is represented as two chunks of data, one of length ⁇ l, starting at logical offset 0, and a second of length ⁇ 2, which, by definition, starts at logical offset 0+ ⁇ l, i.e., ⁇ l.
  • the logical size is still ⁇ l+ ⁇ 2.
  • the logical offset information placed in the value segments for a value header is shown at the start of each representation of the data itself. Lengths of segments (also offsets into segments) are shown at the top or bottom. Showing the data with the segment information makes it easier to understand the math involved here.
  • Case (1) Inserting into an immediate data segment.
  • Case (2) Inserting into an arbitrary non-immediate data segment at a segment boundary.
  • Each value segment contains a pointer to the container (control block) that "owns” it. New value segments and their data must always be placed in the updating container. Hence an insertion can always be detected by seeing if the segment is "owned” by the updating container.
  • CMDeleteValueData() a starting offset and size are passed.
  • the size may span from some portion within a segment, through any number of whole segments, and end somewhere in a final segment. Alternatively, the size may specify only a portion of a single segment. However, each of these processes can be viewed separately in terms of a single segment. Thus, for any one segment, the following four cases are possible: Case (1) : Deleting an entire segment. Case (2) : Deleting the left portion of a segment. Case (3) : Deleting the right portion of a segment. Case (4) : Deleting an interior portion of a segment.
  • the original data segment can be immediate. If it is, it will remain immediate since the result can only be less than four bytes. Thus deletion of immediate data is not listed as one of the possible cases. It cannot, however, be ignored, since it must be taken into account as an updating operation. This is discussed later when converting immediates to non-immediates is described.
  • Fig. 18 The four delete cases are illustrated in Fig. 18. In all of these cases the preservation of the original logical offsets of the start of the remaining data segments is important. By doing this it can be determined that a deletion has been done and how much based on what is expected as the logical offset of the next segment.
  • Case (4) is a unique case. It cannot occur at the same time as the other cases. However, here too, the situation is similar to the other cases. A segment starting at ⁇ + ⁇ +x is encountered when it was expected to start at ⁇ + ⁇ .
  • the original logical offsets are preserved in the remaining segments or potions of original segments.
  • the expected offset of the next segment is always determined from the previous segment and its length. If the next segment does not start at that value, a deletion has been detected. This algorithm works no matter how many intervening segments were deleted. This is discussed below.
  • Fig. 19 illustrates a set of value segments in which a delete is done and then a series of inserts.
  • an original data portion of four segments was involved in a delete; the right portion 1902 of a segment 1904 starting at ⁇ + ⁇ , two whole segments 1906 and 1908, and the left portion 1910 of a fourth segment 1912.
  • the logical offset of the right portion of the fourth segment has been computed to start at ⁇ .
  • three new segments have been inserted at offset ⁇ + ⁇ .
  • Determining this situation at close time is basically an extension of the previous computations.
  • the basic scheme of determining deletion is to see if the "next segment" has the expected logical offset based on the previous segment's logical offset and length.
  • the reason “next segment” is quoted is that it is the key to extending the algorithm to the more general case of combining deletions with insertions.
  • “next segment” now refers to the next old segment, no matter how many segments away it is. Note, that the math is still the same for the above case. In the above diagram, the "next segment” is expected to start at ⁇ + ⁇ . But it (the next old segment) starts at ⁇ , before or after the insertion.
  • the insertion is still determined as described previously. New segments are insertions and the insertion point is determined based on the logical offset and size of the preceding segment.
  • Immediate data is a single value segment that was initially created as immediate because the data fit into four bytes.
  • the data is placed directly in the value segment (and hence the TOC) in place of a data offset to the data.
  • Any editing operations that cause that data to become greater than four bytes causes a conversion to non-immediate, i.e., a standard data segment that contains an offset to the data.
  • the data is written to the container, and the offset set to point to it. Once converted to a non-immediate it is never converted back.
  • Deleting a value involves freeing all the value segment memory.
  • the value header itself is moved to a "deleted values" list and never freed. This is done to make sure the user does not try to reuse a value refNum for a deleted value. Value headers on the
  • Properties can be deleted in two possible ways; (1) implicitly by deleting all the values for a property, and (2) explicitly doing a CMDeleteObjectProperty() . For updating purposes, only the explicit case need be considered. The implicit deletion will happen as a byproduct of value deletions.
  • Implicit property deletion touches are different from touches of explicitly deleted values in that no single value is involved explicitly in the property delete. The values are, however, implicitly involved to the degree that all values are deleted from the property (and the value headers put on the "deleted values” list) . Implicit value deletion is essentially a subset of the explicit deletion case. The difference between deleting an individual value, and deleting a value when its property is going to be deleted, is that a "deleted value" entry is created in the individual delete case, while the touched list entry is always freed in the property value delete case. What both have in common, however, is that "removed" entries in other objects for values that were moved into the property being deleted are set to "deleted value”.
  • Deleting an object means all its values and properties are deleted. Objects can only be explicitly deleted using CMDeleteObject() . Deleted objects are placed on the "deleted objects" list and marked deleted to prevent the user from further use of them.
  • the "removed” entries are kept because there are references to these entries from "inserted” entries from the other objects (touched list) to where the values were moved. If a moved value is itself deleted after the object it came from is deleted, then the back pointer in its "inserted” entry must be kept valid so that the "removed” entry it points to can be changed to a "deleted value”. Although this "deleted value" is on a touched list of a deleted object, it won't cause harm, since the final walks of the touched chain know to expect this.
  • Deleted objects are moved to the deleted objects list and flagged as deleted. This is done to protect against the user reusing the refNum after the delete. Thus an object on the touched chain, but flagged as deleted, is enough to indicated to close-time processing what is going on. The remaining touched list "goes along for the ride”.
  • Base types for dynamic values are recorded as an "array" of base type object IDs.
  • the array is stored as immediate value data segments for a special "base type" property of the parent global name object. Being immediates, they are treated for updating purposes much like a standard immediate case; a touched list entry for the (global name) object "owning" the base type value is created. The reason these are treated specially at all is that updating of base types, like immediates, implies recording the entire base type array just like a single immediate is recorded. But these are not normal immediates. They are an array of value segments, and thus must be uniquely treated.
  • a thing e.g. a container, an object in the container, a property of an object, a value of a property, or value data
  • modification can involve one or more of the following: insertion of an object into the container, deletion of an object from the container, modification of an object in the container, and so on.
  • Modification can include one or more of the following: insertion of a property into the object, deletion of a property from the object, or modification of a property of the object, and so on.
  • Modification of a property can include insertion of a value into the property, deletion of a value from the property, modification of a value (header) of the property, and so on.
  • Modification of a value (header) can include moving the value, setting the value type, changing the value base type, modifying the value data, and so on.
  • Modification of value data can include insertion of new value data, deletion of existing value data, overwriting existing value data with new data, converting immediate value data to non-immediate value data, and so on.
  • update containers in persistent storage identify changes to be made to information in their respective target containers when the update container is opened.
  • the update container in persistent storage does not contain the actual information in the target container, as modified, but merely contains instructions on what modifications to perform to bring the information as represented in the target container up to the state represented by the update container.
  • temporary storage on the other hand, as an application program changes information represented in an open container, those changes may actually be made in the temporary version of that information.
  • updates are represented in persistent storage as a series of change instructions, whereas they are represented in temporary storage as actual modifications.
  • the instructions are generated by walking the touched chain of objects and the touched list entries for each object. Further, as each object's touched list is processed, a new "updating property" is created for that object. It is set to contain a single value whose data is the updating instructions for all the values of that object.
  • the following updating instructions are generated (i.e., written to the container) for each touched object: removed value touched list entry created from a move of a value inserted value touched list entry created from a move of a value set-infoed value touched value header flagged as changed type and/or generation data delete touched value header flagged as being edited data insert touched value header flagged as being edited replaced immediate touched value header flagged as edited and immediate replaced base types touched list entry created from updating base type(s) delete value touched list entry created from a delete value
  • the touched list entries for deleted properties as well as deleted objects on the touched chain are not generated during this initial walk of the touched chain and lists.
  • the deleted objects and properties are generated as separate updating instructions after all value updating has been done (in step (4) ) . This is because, at open time, such deletions are processed separately. Thus they are grouped separately.
  • each object has its own updating lists for all the values of that object in the special updating property value. When the object is accessed at open time, all such updates will be applied. This is done to allow for the possibility of other implementations of the Container Manager API using random access techniques directly to the container rather than an all in-memory TOC implementation described here.
  • a "removed" touched list entry is created at the original source, and an "inserted” entry at the destination. Thus a random access implementation could see the source and destination in either order. If the "removed" is seen first, it can do that.
  • the "insert” simply accesses the old value from the container when the destination object is processed. Conversely, if the "insert” is seen first, it can do that. Then when the "remove” is seen the value is just not added to the object. In the present implementation, where the entire TOC is loaded into memory, the "removed” entries are ignored. At open time the "insert” is treated as a move operation.
  • step (3) After an object's touched list is processed for values in step (3) , all that will remain on the touched chain are deleted objects, and non-deleted objects with "deleted property" touched list entries. All that needs to be done is to walk the chain again.
  • the delete update instructions are generated as data just like the value updating instructions. But this time the data is for a special property of the private TOC #1 object (the "private" TOC is discussed in the open-time processing description) .
  • the open-time processing will know how to get at the deleted objects and property instructions.
  • the private TOC The TOC for the updating container is generated in this step. This TOC is kept separate from the target TOC that contains all the updates generated during the session. As discussed in open-time processing below, the updating container must keep its TOC separate from the updates since there is the potential for duplicate object IDs.
  • the container label is written last as a mechanism for achieving atomicity of the update.
  • a container in persistent storage will not be recognized as a valid Container Manager container. Only after the label is written is the container considered valid.
  • This mechanism operates also where the update container is appended to its target container rather than being in a separate file. In this situation, the Container Manager begins writing the update container information only after the label for the target container. Thus until the update container label is written, the last label in the file is that belonging to the target container (which, of course, might itself be a prior update container to its own prior target container) . Should a failure occur before the Container Manager writes the label for the update container, then the next time the file is opened, the Container Manager will find only the target container label. If any of the update container information was actually written to persistent storage prior to the failure, it is ignored in this subsequent invocation since it is not terminated by a valid label.
  • FIG. 20 illustrates the general organization of an updating container in persistent storage. All the information above the heavy line 2002 is just data accessed via TOC entries. All the information below the line are the TOC entries with the label 2004 at the end of the container.
  • TOC #1 has a special "pointing value" that allows access to the target be it appended or separate. The arrow in the diagram is meant to suggest this.
  • Close time processing generates two separate groups of updating instructions: those updating all the values for a single object and those for deleting objects and properties.
  • each object that is to be updated has its own special "updating property" created for that object. It is set to contain a single value whose data is the updating instructions for all the values needing updating in that object.
  • the value updating instructions are generated using the following sequences:
  • ⁇ v> is the control byte defined by the name at the left, and is the code in the first byte of the layouts, whose following bytes correspond to the ⁇ params>....
  • Each small division represents one byte in the value sequences.
  • NewProperty implies a PT immediately follows.
  • NewType implies only a T follows. This can happen when sequences of touches for an object all refer to the same property. This is not the absolute minimum generation, however, since touch list entries are not sorted by property.
  • the way to view a complete value update sequence i.e., " ⁇ new prop> P T ⁇ v> [params...]” or " ⁇ new type> T ⁇ v> [params...]” is that the " ⁇ v> [params...]” act upon their receiver, i.e., the "owning" object. It's sort of like a postfix notation..
  • ⁇ end> EndUpdates control byte to mark the end of the updating sequence What this sequence does is generate "delete object” or “delete property” for a new ( ⁇ del propl>) or the same ( ⁇ del prop2>) object.
  • the handlers for writing and reading the above update instructions are the same handlers used for TOC I/O. These handlers don't actually do any I/O. They only format and extract 1, 2, and 4 byte data.
  • the above layouts represent a compromise between efficiency, space, and to make it reasonably easy to write handlers while hiding the specifics of the layout from the handler writer.
  • the handlers are needed because internal data (bytes, words, and longs) must be read and written from and to the container. To be portable, a handler must be provided to potentially convert 1, 2, and 4-byte entities.
  • the handlers have the following definitions: void extractData_Handler(CMRefCon refCon, CMDataBuffer buffer, CMSize size,
  • handlers are used to extract or format internal container data.
  • 1, 2, or 4 bytes (8-bit bytes) "chunks" of data are expected to be copied between a buffer and the data. Pointers to the data and the buffer are passed.
  • the buffer can always be assumed large enough for the data.
  • the pointer to the data can be assumed to point to a CM_UCHAR if size is 1, CM_USHORT if size is 2, and CM_ULONG is size if 4.
  • the 1, 2, or 4 bytes are, of course, stored within the CM_UCHAR, CM_USHORT, or CM_ULONG as a function of the architecture. These may be a different size than what is expected to be written to the container. Repeated calls to these handlers are done to manipulate the updating instructions.
  • updating instructions are all value data for a single value: a value for the special updating property of an object to be updated, or a value for the special TOC #1 property.
  • the data is actually read or written using the standard API calls to CMReadValueData() and CMWriteValueData() .
  • the updating code controls the buffer, its size, and the starting offset; the parameters to the CMReadValueData() and CMWriteValueData() .
  • the handlers are defined only to move data to and from a buffer location (not by accident) .
  • the code buffers the updating information and calls CMReadValueData() or CMWriteValueData() only when needed.
  • the buffer size is defined in a header file as UpdateBufSize.
  • TOC buffering is optional. This is because buffering may be performed by the I/O handlers.
  • a header file controls this by defining the input and output TOC buffer sizes (TOCInputBufSize and TOCOutputBufSize, respectively) . Defining either or both of these as 0 turns off the corresponding TOC buffering through routines supplied with the Container Manager.
  • the data is generally truly data, with the TOC generated by the -API user calling CMDefineValueDataO .
  • the target container is viewed as "raw” data too. It's just the TOC will be built a different way.
  • the TOC that is created for the CCB is referred to as the "private TOC" . That is because it will contain objects known only to the updating container. Also, these objects may have object IDs that are the same as the target's and must be kept distinct. TOC #1, for example, is the same object ID.
  • the owning CCB pointer is the way the new segments are distinguished from the old ones.
  • the new value header is given the pointer to the updating CCB. This is the way new and old values are detected (new objects have IDs greater than the "min seed") .
  • each value header and segment is "branded" with its owning container.
  • updating may involve multiple layers of containers.
  • this allows the proper handlers to be used to access the associated value.
  • control block -- the only one opened for writing and whose refNum is returned to the user no matter which container the "branding" happens to point to.
  • the CCB field to do this is the "updatingContainer" . It will always point to the first (top-most) updating container (control block) , which when opening a new updater, will be that container, the only one opened for writing, and whose refNum is returned to the user.
  • the updatingContainer is initialized to point to its own container. When a target is opened, which is what's being described here, the updatingContainer field is "pulled down" from its parent (its updater) . As each updater opens its target, the net effect will be to set all the updatingContainer fields of all the target containers to point to the initial (top-most) updater. (2). Create the "pointing value".
  • the initial seed is that starting ID value that is used to assign object IDs to new objects. It is the next available ID from the target. But that can't be known until the dynamic value is created and the target opened. It is too hard to go back and change every object (and only those objects) involved with the dynamic value creation. Hence the concept of the "private TOC" and non-private TOC. (3) . Open the target container.
  • the target container is opened "normally” for reading using the type associated with a special handler package (TargetContainerHandlers. [ch] ) that does all its I/O through the dynamic value.
  • the handler package only supports reading, since targets are only opened for reading.
  • the separate target is read indirectly using the dynamic value's handlers. Following opening the target, it must be "connected” to the updating container. This is step (4) .
  • the target container For an appended container, the target container must also be opened. But being the same container as its updater, it is effectively being opened again, but this time only for reading.
  • the target As with the dynamic value case, the target is opened with a type associated with the same special handler package used for dynamic values. Since that handler package does all its operations in terms of the pointing value, in this appended case, it will work its way to the handlers associated with the updating container itself. With both CCBs created, the two are ready to be "connected” . As part of opening, only the private TOC of the target container is read. This only has meaning if the target is itself an updating container. If it is not, and it won't be if it is the lowest-level target, all of its TOC gets"loaded.
  • an updating container TOC generally consists of a private part and non-private part, with the non-private part being new values to be added to the target. It is that portion that is not yet read when the target is itself an updating container.
  • each CCB has two pairs of pointers; the "normal" TOC and global name table pointers used by everyone, and the "private" pair mainly used by close-time processing.
  • One other pointer is inherited. That is a pointer referred to as the "target container pointer” (targetContainer) . It is a CCB pointer copied from the target. It is always initialized to point to its own CCB.
  • updater's non-private TOC If this is a previously existing updating container opened for reading, then it is at this point all the updates from the updating container are applied to the target. The non-private portion of the updating container's TOC was loaded first in step (3) . Since the normal TOC is now the target's (connection has been made in step (4) ) , the remainder of the updater's TOC (the non-private part) is read. This contains the new values for old object and new values for new objects. They are simply added to the (now common) TOC.
  • the special "updating" list properties for the objects they update will be encountered. As discussed for close-time processing, these will be value operations (set-infos, data edits, moves, etc.).
  • the value data for such a property are all the updating instructions needed to bring all the values the associated object "up to date”. These instructions cannot be applied until the non-private TOC is fully read in because there could potentially be forward reference to other objects (e.g., for moves). So, if the special
  • a touched chain was built in step (5) of all objects containing the special "updating" property.
  • the mechanism used to build the chain is identical to that used when recording new updates.
  • the head of the chain is in the updating CCB.
  • the touched chain will represent all objects needing updating.
  • the touched chain can now be walked much like close-time processing to process the updating instructions associated with the "updating" property of each object on the chain.
  • objects on the touched chain are removed from the chain after each updating list is processed.
  • the object's "updating” property is also removed.
  • the chain is empty and the updating properties no longer exist. This puts the in-memory TOC back to its initial state ready to record new updates.
  • Fig. 22 illustrates the pertinent data structures discussed above. In the example of Fig.
  • CCB "A" 2202 has its private TOC and global name table pointers 2204 and 2206.
  • the normal TOC and global name table pointers are pointing at the target's.
  • Both sets (2212, 2214, 2216 and 2218) for the target point to the same tables. Since "A" is opened first, then "B", the close-time processing reverses this by closing "B” then "A”.
  • use counts are maintained for the TOC and global name tables. This prevents premature release of the data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

L'invention se rapporte à des structures de données et à des procédés permettant de stocker des informations sous forme d'objets dans des éléments de stockage cibles et dans des éléments de stockage de mise à jour. Un élément de stockage cible définit un premier état relatif aux informations, et l'élément de stockage de mise à jour, qui peut identifier l'élément de stockage cible, identifie des modifications des informations présentant le premier état, qui devraient permettre la mise à jour dudit premier état en un second état. Les éléments de stockage de mise à jour peuvent s'emboîter indéfiniment. Lorsqu'un programme d'application ouvre un élément de stockage de mise à jour, la procédure appliquée consiste à effectuer une recherche le long de la chaîne jusqu'à ce que l'élément de stockage cible au bout de la chaîne soit identifié. Des structures en mémoire sont alors créées afin de permettre l'accès aux objets et aux données de valeur représentés dans un tel élément de stockage. La procédure consiste alors à remonter la chaîne, et à effectuer, dans la structure en mémoire, les modifications requises dans chacun des éléments de stockage de mise à jour.
PCT/US1995/000196 1994-01-05 1995-01-04 Moyen de mise a jour pour module de gestion d'elements de stockage d'ordinateurs WO1995019001A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU15601/95A AU1560195A (en) 1994-01-05 1995-01-04 Update mechanism for computer storage container manager

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US17785394A 1994-01-05 1994-01-05
US08/177,853 1994-01-05

Publications (1)

Publication Number Publication Date
WO1995019001A1 true WO1995019001A1 (fr) 1995-07-13

Family

ID=22650212

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1995/000196 WO1995019001A1 (fr) 1994-01-05 1995-01-04 Moyen de mise a jour pour module de gestion d'elements de stockage d'ordinateurs

Country Status (2)

Country Link
AU (1) AU1560195A (fr)
WO (1) WO1995019001A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8250558B2 (en) 2006-11-30 2012-08-21 Microsoft Corporation Dynamic linked library add-on features
WO2019126154A1 (fr) * 2017-12-18 2019-06-27 Replixio Ltd. Système et procédé de gestion de stockage de données
CN111274165A (zh) * 2018-12-05 2020-06-12 核桃运算股份有限公司 资料管理装置、方法及其电脑存储介质
CN113220822A (zh) * 2021-05-10 2021-08-06 北京百度网讯科技有限公司 文档数据的存储方法及装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0425420A2 (fr) * 1989-10-23 1991-05-02 International Business Machines Corporation ProcÀ©dé et dispositif pour traiter des objets persistants dans un système de programmation orienté objet
EP0458495A2 (fr) * 1990-05-21 1991-11-27 Texas Instruments Incorporated Dispositif et procédé de gestion des versions et des configurations d'objets persistants et transitoires
EP0520924A2 (fr) * 1991-06-28 1992-12-30 International Business Machines Corporation Système de gestion d'objets "récipients"
JPH0520151A (ja) * 1991-07-10 1993-01-29 Nec Corp データ変更方式

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0425420A2 (fr) * 1989-10-23 1991-05-02 International Business Machines Corporation ProcÀ©dé et dispositif pour traiter des objets persistants dans un système de programmation orienté objet
EP0458495A2 (fr) * 1990-05-21 1991-11-27 Texas Instruments Incorporated Dispositif et procédé de gestion des versions et des configurations d'objets persistants et transitoires
EP0520924A2 (fr) * 1991-06-28 1992-12-30 International Business Machines Corporation Système de gestion d'objets "récipients"
JPH0520151A (ja) * 1991-07-10 1993-01-29 Nec Corp データ変更方式

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PATENT ABSTRACTS OF JAPAN vol. 017, no. 296 (P - 1551) 7 June 1993 (1993-06-07) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8250558B2 (en) 2006-11-30 2012-08-21 Microsoft Corporation Dynamic linked library add-on features
WO2019126154A1 (fr) * 2017-12-18 2019-06-27 Replixio Ltd. Système et procédé de gestion de stockage de données
CN111274165A (zh) * 2018-12-05 2020-06-12 核桃运算股份有限公司 资料管理装置、方法及其电脑存储介质
CN113220822A (zh) * 2021-05-10 2021-08-06 北京百度网讯科技有限公司 文档数据的存储方法及装置
CN113220822B (zh) * 2021-05-10 2024-01-09 北京百度网讯科技有限公司 文档数据的存储方法及装置

Also Published As

Publication number Publication date
AU1560195A (en) 1995-08-01

Similar Documents

Publication Publication Date Title
US5873097A (en) Update mechanism for computer storage container manager
US5652879A (en) Dynamic value mechanism for computer storage container manager enabling access of objects by multiple application programs
US6470494B1 (en) Class loader
US11314490B2 (en) Special calling sequence for caller-sensitive methods
US8332835B2 (en) Method and system for automated code-source indexing in java virtual machine environment
US6298353B1 (en) Checking serialization compatibility between versions of java classes
Riggs et al. Pickling state in the Java system
Bolton Pure Corba
US7207002B2 (en) Serialization and preservation of objects
Kelsey et al. A tractable Scheme implementation
Duncan et al. Adding contracts to Java with Handshake
US6633892B1 (en) Archiving tool
Shaylor et al. A java virtual machine architecture for very small devices
US6993744B2 (en) Method for enabling a compiler or interpreter to use identifiers found at run time in a map container object in a manner similar or identical to identifiers declared at compile time
EP1329819A2 (fr) Procédé et système pour traiter un fichier zip en continu
Wirfs-Brock et al. A overview of modular smalltalk
Harris et al. Bento Specification
US11347487B2 (en) Confining reflective access based on module boundaries
WO1995019001A1 (fr) Moyen de mise a jour pour module de gestion d'elements de stockage d'ordinateurs
Urbanek et al. Package ‘rJava’
Cohen et al. An architecture for safe bytecode insertion
Delamaro et al. Mobile code in. net: A porting experience
EP0714532B1 (fr) Mecanisme a valeur dynamique pour programme de gestion de boites a cartes informatiques
Heege Expert Visual C++/CLI:. NET for Visual C++ Programmers
Kim Frigate: an object-oriented file system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AM AT AU BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU JP KE KG KP KR KZ LK LR LT LU LV MD MG MN MW MX NL NO NZ PL PT RO RU SD SE SI SK TJ TT UA UZ VN

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): KE MW SD SZ AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载