METHOD AND SYSTEM
FOR MAPPING BETWEEN A SOURCE DOCUMENT
AND A TRANSFORMATION DOCUMENT
FIELD OF THE INVENTION
The present invention relates to computer networks and information systems. In particular, the present invention pertains to a computer-based system and method for mapping a source document to a transformation document.
BACKGROUND INFORMATION
Computer data is often organized into entities called documents, files produced by an application such as a word processor. Modern information systems and workstations provide a GUI ("Graphical User Interface") for the manipulation and viewing of documents. In conventional paradigms for the representation of documents, content and formatting instructions are included in a single document. FIG. 1a depicts a representation of a conventional paradigm for document representation and display. Rendering engine 106 processes source code file 105, which includes commingled content and formatting instructions, to generate rendered text/graphics 107, which may be displayed on a display device (not shown). Rendering engine 106 may be, for example, a browser.
HTML ("Hypertext Markup Language") is an example of a tagging language in which content and formatting are largely commingled. HTML has been the ubiquitous language for representing WWW ("World Wide Web") documents. Known technologies exist for WYSIWYG editing of HTML documents so that changes made via a GUI automatically result in appropriate changes in the underlying HTML code. The commingling of content and formatting in HTML documents is a significant problem because processors such as search engines or databases would preferably operate only on the content of a document, independently of formatting instructions. Moreover, the commingling of formatting instructions and content in WWW documents significantly hampers the reusability of the documents. For example, it is desirable to allow documents to be displayed on various types of display devices, in various environments and contexts. However, if formatting information is fused with content, re-formatting documents for display in different environments is extremely problematic.
There have been various initiatives to separate format from content using HTML. For example, cascading style sheets ("CSS") provide a simple mechanism for adding style (e.g., fonts, colors, spacing) to WWW documents. However, HTML and CSS present inherent limitations for the separation of content and formatting instructions.
Recent developments in document paradigms have emphasized separation of content from style. Using this paradigm, a source document includes only content, which may be presented in any number of formats depending upon an intended audience. Typically, a set of transformation rules is applied to the source document to generate a transformation document for presentation to a user. For example, FIG. 1b depicts a paradigm of a transformation process in which a data object is transformed into a presentation code object. As shown in FIG. 1b, a data object or file 110 is processed by a source code object 120 to produce presentation code 130. For example, data object file 110 may be an XML ("Extensible Markup Language") file, source code object 120 may be an XSL ("Extensible Style Language") file and presentation code output file 130 may be HTML code.
Rendering engine 106 then processes presentation code object 130 to generate rendered text/graphics 107, for example for display on a display device.
In the context of editing/presentation, XML is a technology designed to overcome the deficiencies of HTML, and in particular, the fusion of content and formatting instructions endemic to HTML. A major difference between XML and HTML is that HTML has a fixed tag set, while XML allows the definition of any tag set. XML allows the structuring of content utilizing custom-designed structures created by developers and authors, which are completely separated from formatting instructions. XSL is typically used in conjunction with XML to provide a formatting and style language for expression of content structured in XML. An XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references and processing instructions, all of which are indicated in the document by explicit markup. The logical and physical structures must nest properly.
A parsed entity contains text, a sequence of characters, which may represent markup or character data. A character is an atomic unit of text as specified in ISO/IEC 10646. A markup declaration is an element type declaration, an attribute-list declaration, or a notation declaration. Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations and processing instructions. All text that is not markup constitutes the character data of the document. Comments may appear anywhere in a document outside other markup. Processing instructions allow documents to contain instructions for applications.
The function of markup in an XML document is to describe its storage and logical structure and to associate attribute-value pairs with its logical structure. XML provides a mechanism, the document type declaration ("DTD"), to define constraints on the logical structure and to support the use of predefined storage units. A DTD contains or points to markup declarations that provide a grammar for a class of documents. An XML document is valid if it has an associated DTD and if the document complies with the constraints expressed in it.
An XML document includes one or more elements, the boundaries of which are delimited either by start-tags and end-tags or, for empty elements, by empty-element tags. Each element has a type, identified by name, and may have a set of attribute specifications. Each attribute specification has a name and a value. The text between the start-tag and end-tag is referred to as the element's content.
The element structure of an XML document may be constrained using element type and attribute-list declarations. An element type declaration constrains the element's content. For example, element type declarations are often used to constrain which element types can appear as children of an element.
In order to be displayed to a user, an XML document must be transformed into a presentation language such as HTML. A transformation expressed in XSL describes rules for transforming a source tree into a result tree. The transformation is achieved by associating patterns with templates. A pattern is matched against elements in a source tree. A template is instantiated to create part of a result tree. The structure of the result tree can be completely different from the structure of the source tree. In constructing the result tree, elements from the source tree can be filtered and reordered, and arbitrary structure can be added.
An XSL document includes a set of template rules. A template rule includes two parts: a pattern, which is matched against nodes in the source tree, and a template, which can be instantiated to form part of the result tree. This allows a stylesheet to be applicable to a wide class of documents that have similar source tree structures. The template part is also referred to as the output actions part. The pattern matching part specifies which elements in the source XML document should use the template to perform a transformation. The output actions part specifies the form into which a matching element of the source document is transformed in the presentation language file such as HTML (i.e., what to output).
A template is instantiated for a particular source element to create part of the result tree. A template can contain elements that specify literal result element structure. When a template is instantiated, each instruction is executed and replaced by the result tree fragment that it creates. Instructions can select and process descendant source elements. Processing a descendant element creates a result tree fragment by finding the applicable template rule and instantiating its template. The result tree is constructed by finding the template rule for the root node and instantiating its template.
For example, an XSL template may be specified as follows:
<xsl:template match="name">
<DIV>
<!-- Recursively transform the subtree rooted at the current node -->
<xsl:apply-templates/>
</DIV>
</xsl:template>
This template means that if the current element is "name," the output is <DIV>(result of transformation of current subtree)</DIV>.
Thus, the XML fragment:
<XML>
<name>abcdefg</name>
</XML>
is transformed, using the above XSL template, to:
<DIV>abcdefg</DIV>
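By way of illustration only, the pattern/template mechanism described above may be sketched in Python. This is not an XSLT processor: the rule table TEMPLATE_RULES and the function transform are hypothetical names introduced here, and the sketch assumes that each rule simply wraps the recursively transformed subtree in a single output tag, as the "name" template does.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: element name (the "pattern") -> output tag (the "template").
TEMPLATE_RULES = {"name": "DIV"}


def transform(node: ET.Element) -> str:
    """Return presentation markup for `node` and its subtree."""
    # Rough equivalent of <xsl:apply-templates/>: emit the node's own text,
    # then recursively transform each child in document order.
    inner = (node.text or "").strip()
    for child in node:
        inner += transform(child) + (child.tail or "").strip()
    out_tag = TEMPLATE_RULES.get(node.tag)
    if out_tag is None:
        # No matching rule: pass the transformed content through unwrapped.
        return inner
    return f"<{out_tag}>{inner}</{out_tag}>"


source = ET.fromstring("<XML><name>abcdefg</name></XML>")
print(transform(source))  # <DIV>abcdefg</DIV>
```

The output string carries no trace of which source element produced it, which is the linkage problem addressed in the remainder of this description.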
It is very desirable to provide a mapping between one document and another or between multiple views of a document. However, the paradigm depicted in FIG. 1b presents significant challenges for mapping between documents or document views. This problem arises because the data file undergoes a transformation process via the source code object file 120. Thus, the transformation document (i.e., the presentation language file (e.g., HTML)) is unlinked from the original data file (e.g., XML file). However, mapping between one document view and another is essential in modern information systems. For example, WYSIWYG editors display documents on a display device such as a CRT ("Cathode Ray Tube") as they would appear when rendered in hard-copy format. Known methods exist for WYSIWYG editing of a document where content and formatting instructions are included in one file (FIG. 1a). In this case, changes entered in a WYSIWYG view may be mapped to an underlying source document. For example, modern GUI paradigms such as DOM ("Document Object Model") maintain unique identifiers for each object in a document in a tree structure. An API ("Application Program Interface") provides function calls to return a particular identifier when a user interacts with a rendered object, for example by clicking on a rendered object displayed on a display device. The DOM defines what attributes are associated with each object and how the objects and attributes may be manipulated.
An objective of the present invention is to provide a method and system for mapping a transformation document to a data file.
SUMMARY OF THE INVENTION
The present invention provides a method and system for mapping between a data object and one or more transformation objects. According to one embodiment,
the transformation object is generated from the data object as a function of a source code object (e.g., a set of transformation rules). This transformation object may be, for example, a presentation language object which provides input to a display system such as a WWW browser. According to one embodiment, each element in a data object is assigned a unique identifier. During the transformation process, transformation code relating to a particular data object element is marked with the identifier corresponding to the respective data object element. According to one embodiment, this is accomplished by modifying the source code object and transformation rules. For example, the transformation rules may be modified to generate an "invisible" mark (i.e., a metatag) in the transformation object. The invisible mark does not convey any substantive meaning to any subsequent processors such as rendering engines.
According to one embodiment, the present invention is applied to an XSL transformation process for transforming an XML document to an HTML document. In particular, the present invention provides a method and system for linking each of a plurality of XML elements included in an XML document to a respective node in an HTML tree that has been generated using an associated XSL file. The linkage between XML elements and HTML nodes provides a powerful mechanism to allow WYSIWYG editing of the XML document, which is presented as a function of an HTML file generated as a function of the XML document and an associated XSL document.
According to one embodiment, a dynamic WYSIWYG editing system includes a processor, a display device, one or more input devices and a storage subsystem for storing files. The WYSIWYG system performs an initialization process to prepare for WYSIWYG editing of an XML file. During the initialization process, the processor retrieves a desired XML file for editing and an associated XSL file. The processor then performs a transformation step to produce a second XSL file, referred to herein as XSL'. The transformation step is designed to generate a unique ID for each element in the XML document. The transformation step is further designed to generate a wrapper tag in the rendered HTML file for each XML element such that the wrapper tag includes the unique ID corresponding to the associated XML element. The wrapper tag permits association of a particular HTML node with a corresponding XML element. This association permits dynamic editing of an XML document, which is presented to a user via a presentation language such as HTML. According to one embodiment, the processor generates the XSL' file by changing the output statements of all templates in the XSL file to add wrapper tags to the transformation output.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1a depicts a paradigm for conventional data structures for providing a WYSIWYG editing system.
FIG. 1b depicts a paradigm of a transformation process according to one embodiment of the present invention.
FIG. 2 depicts a system for providing WYSIWYG editing of a data object according to one embodiment of the present invention.
FIG. 3a depicts a simplified physical structure of an XML file according to one embodiment of the present invention.
FIG. 3b depicts a simplified structure of an XSL file according to one embodiment of the present invention.
FIG. 4a is a flowchart depicting a set of steps of an initialization process executed by a dynamic WYSIWYG editing system according to one embodiment of the present invention.
FIG. 4b is a flowchart of a process for generating a modified XSL file (XSL') from a source XSL file according to one embodiment of the present invention.
FIG. 5a is a flowchart that depicts a main process for providing WYSIWYG editing of an XML file according to one embodiment of the present invention.
FIG. 5b is a flowchart of a set of steps of an input event handler according to one embodiment of the present invention.
FIG. 5c is a flowchart of a process to return an encapsulating HTML node according to one embodiment of the present invention.
DETAILED DESCRIPTION
The present invention provides a method and system for mapping between a data object (e.g., XML file) and one or more transformation objects (e.g., HTML file) generated as a function of a set of transformation rules (e.g., XSL). Although the embodiment described herein relates to a WYSIWYG editing system, the mapping structure may more generally be utilized to provide a user with a powerful capacity for manipulation and editing of a data object. For example, the present invention may be applied in an environment where it is desired to provide different users with different views of a document.
According to one embodiment, a mapping structure between a data document and a transformation document provided by the present invention may be utilized to provide WYSIWYG editing of a data object (e.g., XML file) transformed to a presentation object (e.g., HTML file) using a set of transformation rules (e.g., XSL). Thus, according to one embodiment, the present invention provides a mechanism to provide WYSIWYG editing of a data file that has undergone one or more transformations via a set of transformation rules such as those specified in a source code. Thus, for example, the present invention could be applied to provide WYSIWYG editing for word processing, code development, etc.
According to one embodiment, the present invention provides a dynamic WYSIWYG editing system for XML documents or other documents that undergo a transformation process as a function of source code such as XSL in order to render a presentation language file such as HTML. According to one embodiment, each XML element is associated with a unique ID. A modified XSL file, XSL', is generated. In particular, the output portion of each template in the XSL file is modified to include a wrapper. The wrapper includes a function executed at run-time that generates a unique ID for each XML element processed. Thus, the transformed result of any XML element is linked to a particular source element in the XML document.
FIG. 2 depicts a system for providing WYSIWYG editing of a data object according to one embodiment of the present invention. As shown in FIG. 2, the system includes processor 230, display device 210, and input devices 240a and 240b. As depicted in FIG. 2, the input devices may be a keyboard and/or mouse. Display device 210 and input devices 240a-b provide a GUI, which allows editing of files, which are presented as WYSIWYG document 220 on display device 210. Processor 230 is also coupled to storage device 235, which may include a hard disk storage unit or a volatile memory such as a RAM ("Random Access Memory"). According to one embodiment, the WYSIWYG system depicted in FIG. 2 provides for editing of an XML document. As shown in FIG. 2, storage device 235 includes XSL file 230a and XML file 230c.
In order to provide WYSIWYG editing, the WYSIWYG system performs transformation of XSL file 230a into XSL' file 230b (described in detail below), which is also stored on storage device 235. The WYSIWYG editing system also performs transformation of XML file 230c into HTML file 240d, which is utilized as a presentation language file and is then rendered on display device 210 by processor 230.
FIG. 3a depicts a simplified physical structure of an XML file according to one embodiment of the present invention. XML file 360 includes a plurality of elements 365(1)-365(N). Each XML element may include a plurality of nested elements. Although not depicted in FIG. 3a, it is assumed that XML file 360 includes an associated element type declaration for each element.
FIG. 3b depicts a simplified structure of an XSL file according to one embodiment of the present invention. As shown in FIG. 3b, XSL document 320 includes a plurality of templates 310(1)-310(N). Each XSL template 310 includes pattern matching part 301 and output actions part 303. Thus, FIG. 3b shows pattern matching parts 301(1)-301(N) and output actions parts 303(1)-303(N) for each respective template 310(1)-310(N).
FIG. 4a is a flowchart depicting a set of steps of an initialization process executed by a dynamic WYSIWYG editing system according to one embodiment of the present invention. The process is initiated in step 410. In step 420, an XML data file to be edited is retrieved. For example, referring to FIG. 2, the XML data file may be stored on storage subsystem 235. In step 430, it is determined whether an associated XSL file exists. If so ('yes' branch of step 430), in step 432, the associated XSL file is retrieved. If not ('no' branch of step 430), in step 435 a default XSL file is retrieved. In step 440, processor 230 processes the XSL file (default or associated) to generate a modified XSL file, referred to herein as XSL'. The XSL' file is generated as a function of the XML data file to be edited and the associated XSL file such that each of the transformation rules in each template of the XSL file is modified to generate a transformed result that associates a unique identifier of an element in the XML file with the transformed result. According to one embodiment, the association between a unique identifier for each XML element and a transformed result is effected by modifying the XSL file to generate a wrapper utilizing a tag structure that encapsulates the transformed result of each XML element and includes the unique XML element ID. The steps for production of the XSL' file are described in detail below with respect to FIG. 4b. In step 445, a tree data structure is generated for the original XML file, which represents the parent/child relationship between all elements in the XML file. In step 447, a GUI event loop is initiated to begin editing of the XML file.
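By way of illustration only, the tree data structure of step 445 and the assignment of a unique identifier to each XML element may be sketched as follows. The function name index_elements, the "x1", "x2", ... identifier scheme and the sample document are assumptions made for this sketch and are not taken from the described embodiment.

```python
import xml.etree.ElementTree as ET
from itertools import count


def index_elements(root: ET.Element):
    """Assign an ID to every element and record parent/child relationships."""
    ids = {}        # element object -> ID
    by_id = {}      # ID -> element object
    parent_of = {}  # child ID -> parent ID (None for the root)
    counter = count(1)
    for elem in root.iter():  # visits elements in document order
        eid = f"x{next(counter)}"
        ids[elem] = eid
        by_id[eid] = elem
    parent_of[ids[root]] = None
    for parent in root.iter():
        for child in parent:
            parent_of[ids[child]] = ids[parent]
    return by_id, parent_of


doc = ET.fromstring("<doc><name>abcdefg</name><addr><city>X</city></addr></doc>")
by_id, parent_of = index_elements(doc)
print(by_id["x2"].tag, parent_of["x4"])  # name x3
```

A lookup table of this kind allows an identifier recovered from the rendered output to be resolved back to the element of the XML file that produced it.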
FIG. 4b is a flowchart of a process for generating a modified XSL file (XSL') from a source XSL file according to one embodiment of the present invention. This process is executed by processor 230 as part of an initialization process (i.e., step 440 in FIG. 4a). The process is initiated in step 451. In step 455, it is determined whether all templates in the original XSL file have been analyzed. If so ('yes' branch of step 455), the process ends in step 467. Otherwise ('no' branch of step 455), in step 457 an ID wrapper is generated. According to one embodiment of the present invention, the ID wrapper is a <SPAN> tag. In step 459, the output action portion of the template is wrapped with the invisible element and ID information. According to one embodiment, the output action portion of the template is modified to include a function called at runtime that generates a unique ID for each element that is matched to the template. For example, consider the following XML element
<XML>. . . <name>abcdefg</name>
with an associated XSL template:
<xsl:template match="name">
<DIV>
<xsl:apply-templates/>
</DIV>
</xsl:template>
which would be transformed to the following result using the original XSL file:
<DIV>abcdefg</DIV>
According to one embodiment of the present invention, the following modified template would be generated in the XSL' file:
<xsl:template match="name">
<SPAN attribute=(id info of "name" element)>
<DIV>
<xsl:apply-templates/>
</DIV>
</SPAN>
</xsl:template>
The modified XSL' template would then transform the XML element as follows:
<SPAN attribute=(id info of "name" element)>
<DIV>abcdefg</DIV>
</SPAN>
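By way of illustration only, the generation of an XSL' file per FIG. 4b may be sketched as a rewrite of the stylesheet tree. The sketch assumes an XSLT 1.0 stylesheet using the namespace http://www.w3.org/1999/XSL/Transform, uses the standard XSLT function generate-id() as the run-time ID function, and uses "xmlid" as a hypothetical attribute name; none of these choices is stated in the described embodiment. The sketch only rewrites the stylesheet and does not execute it.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"  # assumed XSLT 1.0 namespace
ET.register_namespace("xsl", XSL_NS)


def make_xsl_prime(xsl_text: str, id_attr: str = "xmlid") -> str:
    """Wrap the output actions of every template in an ID-bearing SPAN."""
    root = ET.fromstring(xsl_text)
    for template in list(root.iter(f"{{{XSL_NS}}}template")):
        # The attribute value template is evaluated by the XSLT processor at
        # run time, yielding one ID per matched source element.
        wrapper = ET.Element("SPAN", {id_attr: "{generate-id(.)}"})
        wrapper.text = template.text
        for child in list(template):
            template.remove(child)
            wrapper.append(child)
        template.text = None
        template.append(wrapper)
    return ET.tostring(root, encoding="unicode")


xsl = (
    '<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">'
    '<xsl:template match="name"><DIV><xsl:apply-templates/></DIV></xsl:template>'
    "</xsl:stylesheet>"
)
print(make_xsl_prime(xsl))
```

This simplified rewrite moves every child of a template into the wrapper. A fuller implementation would need to leave any xsl:param elements as the first children of the template and would need to handle templates that emit attributes or non-element output.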
In step 463, it is determined whether default templates were overridden in the original XSL file. If so ('yes' branch of step 463), the process ends in step 467. Otherwise ('no' branch of step 463), in step 465, default templates are overridden. The process ends in step 467.
FIG. 5a is a flowchart that depicts a main process for providing WYSIWYG editing of an XML file according to one embodiment of the present invention. The process begins in step 510. In step 520, it is determined whether the user has selected to display the XML source using the original XSL file. If so ('yes' branch of step 520), in step 530, HTML code is generated as a function of the XML file and a file XSL_SOURCEVIEW, which includes formatting instructions for displaying a source view of the XML code. In step 535, it is determined whether the user has selected a WYSIWYG view of the file for editing. If so ('yes' branch of step 535), in step 537, HTML code is generated as a function of the XML file and the XSL' file. This HTML code is then provided to a rendering engine such as a browser for display to the user. In this WYSIWYG view, the user may perform dynamic editing of the XML code. In step 540, it is determined whether the user has selected a preview view. If so, HTML code is generated as a function of the XML file (which may be modified at this point) and the XSL file. In step 547, it is determined whether an edit event has occurred (i.e., an event to edit the XML file). For example, an edit event may correspond to the user clicking on a particular portion of the screen with the mouse, or providing another selection for editing using a keyboard. If an edit event has occurred ('yes' branch of step 547), an edit event handler is called in step 549. If not ('no' branch of step 547), flow continues with step 520.
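By way of illustration only, the choice of stylesheet made in steps 520 through 540 may be summarized as a lookup from the selected view to the stylesheet applied to the XML file. The View enumeration and the function stylesheet_for are hypothetical names introduced for this sketch, and the returned strings are labels rather than actual file names.

```python
from enum import Enum


class View(Enum):
    SOURCE = "source"    # steps 520/530
    WYSIWYG = "wysiwyg"  # steps 535/537
    PREVIEW = "preview"  # step 540


def stylesheet_for(view: View) -> str:
    """Return a label for the stylesheet used to generate HTML for `view`."""
    return {
        View.SOURCE: "XSL_SOURCEVIEW",  # source view of the XML code
        View.WYSIWYG: "XSL'",           # modified stylesheet with ID wrappers
        View.PREVIEW: "XSL",            # original stylesheet, no wrappers
    }[view]


print(stylesheet_for(View.WYSIWYG))  # XSL'
```

Only the WYSIWYG view uses the XSL' file, so only in that view does the generated HTML carry the identifiers needed to map an edit back to the XML file.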
FIG. 5b is a flowchart of a set of steps of an input event handler according to one embodiment of the present invention. The event handler is called upon receipt of an input event such as a mouse click or keyboard input event (i.e., step 547 of FIG. 5a). The process is initiated in step 551. In step 553, a handle or identifier of the HTML element selected by the user is returned using known methods. For example, according to one embodiment the DOM is employed, which provides an API call to return an identifier of an object such as an HTML node when clicked by the user. In step 555, a handle or unique identifier of an HTML encapsulating node is retrieved. The HTML encapsulating node is an HTML node that encapsulates the current node and is associated with an element ID in the XML document. A process for returning the encapsulating node is described below with reference to FIG. 5c. In step 559, the corresponding XML element ID is retrieved. According to one embodiment, the XML element ID is established by setting an attribute of the encapsulating element to reference the XML node. The encapsulating element may be a SPAN element in the HTML code or an element that encapsulates other nodes. In step 563, the XML node is returned, preferably as a pointer to the WYSIWYG editing system, which may be used to map edits made back to the original XML document. The process ends in step 567.
FIG. 5c is a flowchart of a process to return an encapsulating HTML node according to one embodiment of the present invention. The process operates by traversing the HTML tree upwards from the current HTML node (i.e., ascending the tree parent by parent) until an encapsulating HTML node is found. The process is initiated in step 571. In step 575, it is determined whether the current HTML node is an encapsulating node. An HTML node is encapsulating if it has an attribute that references an XML node. If the HTML node is not encapsulating ('no' branch of step 575), in step 583, the current node is set to its parent node. Flow then continues with step 575 until the traversal of the HTML tree locates an encapsulating node ('yes' branch of step 575). The process ends in step 581, with return of the HTML encapsulating node. According to one embodiment, in step 559, the XML element corresponding to the clicked HTML node is then retrieved.
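By way of illustration only, the upward traversal of FIG. 5c may be sketched as follows. The sketch uses Python's ElementTree as a stand-in for a browser DOM, builds its own child-to-parent map because ElementTree nodes carry no parent reference, and uses "xmlid" as a hypothetical name for the attribute that references the XML element. Returning None when no encapsulating node exists is an assumption of the sketch; the flowchart does not address that case.

```python
import xml.etree.ElementTree as ET

ID_ATTR = "xmlid"  # hypothetical name of the attribute referencing the XML element


def find_encapsulating(node, parent_of):
    """Climb from `node` toward the root until a node carrying ID_ATTR is found."""
    current = node
    while current is not None:
        if current.get(ID_ATTR) is not None:  # step 575: is it encapsulating?
            return current
        current = parent_of.get(current)      # step 583: move to the parent
    return None  # assumption: no encapsulating ancestor exists


html = ET.fromstring('<BODY><SPAN xmlid="x2"><DIV><B>abcdefg</B></DIV></SPAN></BODY>')
parent_of = {child: parent for parent in html.iter() for child in parent}
clicked = html.find(".//B")  # stands in for the node returned by the click event
wrapper = find_encapsulating(clicked, parent_of)
print(wrapper.tag, wrapper.get(ID_ATTR))  # SPAN x2
```

The identifier read from the encapsulating node can then be used to retrieve the corresponding XML element, as in step 559.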