WO2008032962A1 - System and method for transforming electronic document - Google Patents
System and method for transforming electronic document
- Publication number
- WO2008032962A1 (PCT/KR2007/004363)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- oes
- electronic document
- web
- information
- module
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
- G06F16/9577—Optimising the visualization of content, e.g. distillation of HTML documents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
- G06F40/154—Tree transformation for tree-structured or markup documents, e.g. XSLT, XSL-FO or stylesheets
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
Definitions
- The present invention relates, in general, to an electronic document transformation system and method and, more particularly, to an electronic document transformation system and method that transform service-target electronic documents into a new web document format regardless of their current format, and thus cause the electronic documents to be organized into a database (DB), so that, unlike conventional service systems, an electronic document service system can flexibly provide users with various types of services that can be implemented over the web.
- As shown in FIG. 1, under a prior art electronic document service system 5, a user generally connects to the system 5 via a user client 1 and requests the provision of a desired electronic document, and an electronic document service server 6 then accesses a service information DB 7, selectively extracts the corresponding electronic document, and provides the extracted electronic document 8 over the Internet 4.
- the user can read the desired electronic document 8 via a dedicated electronic document viewer 3 (an add-on viewer) that is installed in the user client 1.
- Because the respective electronic documents 8 have individual, independent file formats with independently designated extensions (for example, indesign, quark, pdf, etc.), unlike open web documents (for example, HTML documents) interconnected in a hyperlink form, the system 5 cannot meet users' desires to separately utilize specific content included in the electronic documents 8, for example, specific images that users want to use to enhance or improve their personal websites or blogs.
- a search engine installed in the electronic document service server 6 can search only the index information of the corresponding electronic documents (for example, the title information of the electronic documents, the summary information of the electronic documents, etc.) designated by a system administrator, and cannot search the actual content (characters, images, etc.) included in the respective electronic documents 8. Accordingly, a detailed search procedure corresponding to the user's desire cannot be carried out, and the system 5 cannot provide high-quality search results to the user. As a result, the user has considerable difficulty in effectively accessing and flexibly utilizing desired electronic documents.
- respective electronic documents 5 have an open web document format (for example, an HTML document format), in which the electronic documents are interconnected in a hyperlink form
- the system 5 can exploit, for example, the advantage of acquiring a separate source of profits by providing a service of posting web document-format advertising entities (for example, a popup window, a banner, etc.) as matching advertisements alongside the respective electronic documents while posting and providing the electronic documents on and via the web; the advantage of naturally establishing a marketplace of a certain scale or larger by providing a service of linking respective electronic documents to associated shopping mall servers over the web; and the advantage of raising search quality and thus maximizing the number of system users, thereby maximizing the probability of increasing total profits.
- an object of the present invention is to provide a system in which a software module capable of reading the position characteristics (for example, position coordinates, widths, heights, etc.) of Object Entities
- a software module capable of setting the sequence of arrangement of the OEs based on the figure characteristics of the respective grouped OEs (whether the OEs form a figure that extends from a left side to a right side or a figure that extends from an upper side to a lower side)
- a software module capable of transforming OEs, the sequence of arrangement of which is determined, into a web pattern format, and thus creating new format web documents corresponding to the source electronic documents
- a software module capable of uploading the created web documents to an electronic document service system are arranged to operate in conjunction with each other, through which service-target electronic documents are transformed into a new web document format regardless of the current format, and thus the electronic documents are caused to be organized into
- an electronic document transformation system including an electronic document transformation control module for generally controlling a procedure of collecting source electronic documents held by an electronic document service system, and transforming the source electronic documents into web documents; an Object Entity (OE) grouping module controlled by the electronic document transformation control module, and configured to detect position characteristics and information details of Object Entities (OEs) constituting each page of each of the source electronic documents, and divide and group the OEs into a plurality of groups based on positional relationships between the OEs and the information details of the OEs; an OE sequence designation module controlled by the electronic document transformation control module, and configured to set the sequence of arrangement of the OEs based on figure characteristics of the respective OEs grouped by the OE grouping module; an OE web pattern transformation module controlled by the electronic document transformation control module, and configured to transform the OEs into a web pattern format in the sequence of arrangement set by the OE sequence designation module; and a web document creation module controlled by the electronic document transformation control module, and configured to create
- the present invention provides an electronic document transformation method, including extracting layout information from position characteristics of OEs constituting each page of each electronic document; grouping the OEs based on positional relationships between the OEs using the layout information; sorting groups of the grouped OEs according to predetermined rules; and transforming the groups into a web document in the sequence of being sorted.
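The four steps of the method (extracting layout information, grouping, sorting, and transforming) can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions: the `OE` record, the function names, and the three stand-in helpers are hypothetical and are not taken from the specification. The grouping, sorting, and rendering rules are passed in as parameters so that each can be replaced independently.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class OE:
    """Hypothetical record for one Object Entity on a page."""
    kind: str      # e.g. "text" or "image"
    x: float       # upper-left x coordinate
    y: float       # upper-left y coordinate
    w: float       # width
    h: float       # height
    content: str   # character string or image reference


Group = List[OE]


def transform_page(
    oes: Sequence[OE],
    group_fn: Callable[[List[OE]], List[Group]],
    sort_key: Callable[[Group], tuple],
    render_fn: Callable[[Group], str],
) -> str:
    layout = list(oes)                       # step 1: layout info = position data of every OE
    groups = group_fn(layout)                # step 2: group OEs by positional relationship
    ordered = sorted(groups, key=sort_key)   # step 3: sort groups by a predetermined rule
    return "\n".join(render_fn(g) for g in ordered)  # step 4: emit web markup in sorted order


# Trivial stand-ins (assumptions) used only to exercise the pipeline.
def one_group_per_oe(layout: List[OE]) -> List[Group]:
    return [[oe] for oe in layout]


def left_then_top(group: Group) -> tuple:
    return (min(o.x for o in group), min(o.y for o in group))


def as_div(group: Group) -> str:
    return "<div>" + "".join(o.content for o in group) + "</div>"


page = [OE("text", 200, 0, 50, 10, "B"), OE("text", 0, 0, 50, 10, "A")]
print(transform_page(page, one_group_per_oe, left_then_top, as_div))
```

Keeping the rules as parameters mirrors the modular construction described for the system, in which grouping, sequence designation, and web pattern transformation are separate modules.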
- the position characteristics (for example, position coordinates, widths, heights, etc.) of OEs constituting each page of each source electronic document are read, the OEs are divided and grouped into a plurality of groups based on the figure relationship between the OEs, the sequence of arrangement of the OEs is set based on the figure characteristics of the respective grouped OEs, and the OEs whose sequence of arrangement has been determined are transformed into a web pattern format, so that web documents that have a new format and correspond to the source electronic documents are created. Service-target electronic documents are thereby transformed into a new web document format regardless of their current format and organized into a DB, with the result that, unlike a conventional service system, the electronic document service system can flexibly provide users with various types of services that can be implemented over the web.
- electronic documents have a web document format, so that users can easily view desired electronic documents without requiring separate dedicated viewers to be installed in clients. Since such electronic documents have been optimized for a web format, they can exhibit excellent readability over the web.
- Since the respective electronic documents of the present invention have an open web document format (for example, an HTML document format) in which documents are interconnected in a hyperlink fashion, users who wish to utilize specific images included in electronic documents to enhance or improve their personal websites, blogs, etc., which are operated via other web service systems, can do so through a simple hyperlink setting procedure.
- a search engine installed in an electronic document service server can flexibly search actual content (characters, images, etc.) included in respective electronic documents, so that detailed search procedures meeting users' demands can be easily carried out, with the result that the system can provide high-quality search results to the users, and thus the users can effectively access and flexibly utilize their desired electronic documents.
- the system can utilize, for example, an advantage of acquiring a separate source of profits by providing the service of presenting web document format advertising entities (for example, a popup window, a banner, etc.) in a matching advertisement form while posting and providing respective electronic documents on and to the web, an advantage of naturally establishing a marketplace having a specific or larger scale by providing the service of linking respective electronic documents to associated shopping mall servers over the web, and an
- FIG. 1 is a diagram conceptually showing an example of the construction of a prior art electronic document service system
- FIG. 2 is a conceptual diagram showing an example of the construction of an electronic document transformation system according to the present invention
- FIG. 3 is a conceptual diagram showing an example of the performance of the functions of a source electronic document collection module and an OE information reading module according to the present invention
- FIG. 4 is a conceptual diagram showing an example of the performance of the function of a source layout information creation module according to the present invention
- FIGS. 5 and 6 are conceptual diagrams showing the performance of the function of an OE grouping module according to the present invention.
- FIG. 7 is a conceptual diagram showing an example of the detailed construction of an OE grouping module according to the present invention
- FIGS. 8 and 9 are conceptual diagrams showing an example of the performance of the function of an OE sequence designation module
- FIG. 10 is a conceptual diagram showing an example of the detailed construction of an OE sequence designation module according to the present invention.
- FIGS. 11 and 12 are conceptual diagrams showing an example of the performance of the function of an OE web pattern transformation module according to the present invention
- FIG. 13 is a conceptual diagram showing an example of the detailed construction of an OE web pattern transformation module according to the present invention
- FIGS. 14 and 15 are conceptual diagrams showing an example of the performance of the function of a web document creation module according to the present invention.
- FIG. 16 shows an example of the output of an electronic document
- FIG. 17 shows an example of performing grouping on the electronic document of FIG. 16 based on the positional relationship between respective OEs;
- FIG. 18 shows an example of dividing the electronic document page of FIG. 16 into respective groups through grouping;
- FIG. 19 shows an example of sorting the groups of FIG. 18
- FIG. 20 shows an example of transforming respective groups into an HTML format and combining the groups into a single web document in the sequence of being sorted
- FIG. 21 shows an example of extracting OEs from an electronic document
- FIG. 22 shows an example of grouping respective OEs based on the positional relationship between the OEs.
- an electronic document transformation system 100 is installed in information processing equipment, such as a computer, equipped with an operating system, and has a construction in which an electronic document transformation control module 101, a source electronic document collection module 110, an OE information reading module 120, a source layout information creation module 130, an OE grouping module 140, an OE sequence designation module 150, an OE web pattern transformation module 160, a web document creation module 170, and a web document output module 180 are closely combined with each other.
- the electronic document transformation control module 101 establishes a communication relationship with the electronic document service system 15 via the interface module 102, and generally controls a procedure of collecting the source electronic documents SM (shown in FIG. 3) held by the electronic document service system 15, a procedure of reading the detailed information of OEs O constituting each page P of each of the source electronic documents SM, a procedure of creating the source layout information of the source electronic documents SM by combining together the detailed information of the respective OEs O, a procedure of grouping respective OEs O, a procedure of designating the sequence of arrangement for the grouped OEs O, a procedure of transforming respective OEs O so that they can have a web pattern, a procedure of creating a web document WP corresponding to the source electronic document SM by combining respective OEs O, transformed into the web pattern, into one, and a procedure of outputting and providing the created web document WP to the electronic document service system 15.
- the source electronic document collection module 110 controlled by the electronic document transformation control module 101, performs a function of collecting source electronic documents SM, held by the electronic document service system 15, through various paths, for example, a path for communicating with the electronic document service server 16 of the electronic document service system 15, and a path for communicating with an information storage medium (for example, an external compact disk, a hard disk installed in an information processing device, an external information storage device, etc.) provided by an administrator, and storing the collected source electronic documents SM in a processing buffer 103, as shown in FIG. 3.
- the OE information reading module 120 controlled by the electronic document transformation control module 101, communicates with the processing buffer 103, accesses the source electronic documents SM, as shown in FIG.
- detailed information about a source electronic document: a document file name, a source document type, a width and height, the existence of an outline, the thickness of an outline, background information, and color information
- detailed information about a page OE (an OE representing the entire page): coordinates, a width and height, a page number, a page size, margin information, the presence of an outline, the thickness of an outline, background information, and color information
- an image OE (an OE representing an image)
- coordinates, a width and height, the presence of rotation, reduction and enlargement ratios, created layout information, the presence of an outline, the thickness of an outline, background information, color information, and transparency information
- an OE representing a figure represented by a vector
- the source layout information creation module 130, controlled by the electronic document transformation control module 101, performs a procedure of accessing the processing buffer 103, checking the detailed information of respective OEs O constituting each page P of the source electronic document SM, and creating the layout information of each page P of the source electronic document SM by combining together
- FIG. 21 shows an example of the construction of layout information that is extracted from the source electronic document.
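As a rough illustration of what per-page layout information might look like once the detailed information of the OEs has been combined, the following sketch collects a few of the attributes listed above into plain records. The class names, field names, and sample values are assumptions made for illustration; the actual construction shown in FIG. 21 is not reproduced here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class OEInfo:
    """Hypothetical subset of the detailed information read for one OE."""
    kind: str            # e.g. "text", "image", "vector"
    x: float             # coordinates of the upper-left corner
    y: float
    width: float
    height: float
    rotated: bool = False
    text: str = ""       # character string, if any


@dataclass
class PageLayout:
    """Hypothetical layout information for one page P."""
    page_number: int
    page_width: float
    page_height: float
    oes: List[OEInfo] = field(default_factory=list)


def build_layout(page_number: int, page_width: float, page_height: float,
                 oes: List[OEInfo]) -> PageLayout:
    # Combine the per-OE details into a single layout record for the page.
    return PageLayout(page_number, page_width, page_height, list(oes))


layout = build_layout(
    1, 595.0, 842.0,
    [OEInfo("text", 40, 50, 300, 20, text="Headline"),
     OEInfo("image", 40, 80, 200, 150)],
)
print(json.dumps(asdict(layout), indent=2))
```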
- When the source layout information of each page P constituting the source electronic document SM has been acquired through the performance of the function by the above-described source layout information creation module 130, the OE grouping module 140, controlled by the electronic document transformation control module 101, accesses the processing buffer 103, detects the position characteristics (for example, position characteristics, coordinate characteristics, widths, heights, the presence of rotation, and information about various figures, such as a rectangle, a circle and a triangle, that are formed by the layouts of respective OEs, etc.) and information details (for example, details represented by each character string) of respective OEs O, and divides the OEs into a plurality of groups based on the positional relationships between the OEs O (for example, information on whether specific OEs have a close relationship with other OEs, whether each OE includes some other OE, and whether respective OEs have an overlapping relationship with other OEs) and the information details, as shown in FIG.
- respective OEs O included in a specific page P of the source electronic document SM are divided into a group of adjacent OEs, a group of inclusive OEs, and a group of overlapping OEs, as shown in FIG. 6.
- An example of performing grouping on layout information having the construction of FIG. 21 is shown in FIG. 22.
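The three positional relationships (adjacent, inclusive, overlapping) can be tested on bounding boxes. The sketch below is one possible reading, not the specification's own test: the `Box` type, the `relation` function, and the `gap` threshold used to decide adjacency are all assumptions.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: float  # upper-left x
    y: float  # upper-left y
    w: float  # width
    h: float  # height


def contains(outer: Box, inner: Box) -> bool:
    # True when inner lies entirely within outer.
    return (outer.x <= inner.x and outer.y <= inner.y
            and outer.x + outer.w >= inner.x + inner.w
            and outer.y + outer.h >= inner.y + inner.h)


def relation(a: Box, b: Box, gap: float = 10.0) -> str:
    """Classify two OE boxes; `gap` is an assumed adjacency threshold."""
    if contains(a, b) or contains(b, a):
        return "inclusive"
    # Interiors intersect on both axes.
    if (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h):
        return "overlapping"
    # Distance between the boxes along each axis (0 if they share that range).
    dx = max(b.x - (a.x + a.w), a.x - (b.x + b.w), 0.0)
    dy = max(b.y - (a.y + a.h), a.y - (b.y + b.h), 0.0)
    if max(dx, dy) <= gap:
        return "adjacent"
    return "separate"


print(relation(Box(0, 0, 100, 100), Box(10, 10, 20, 20)))
print(relation(Box(0, 0, 50, 50), Box(40, 40, 50, 50)))
print(relation(Box(0, 0, 50, 50), Box(55, 0, 50, 50)))
```

Inclusion is checked before overlap because a contained box also intersects its container.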
- the OE grouping module 140 has a construction in which an OE grouping control unit 141 for generally controlling a procedure for detecting the position characteristics of OEs O and a grouping procedure, and a source layout information extraction unit 143, an OE position information reading unit 144, an OE content reading unit 145, an adjacent OE collection unit 146, an inclusive OE collection unit 147, an overlapping OE collection unit 148, and a grouping result information output unit 149, which are generally controlled by the OE grouping control unit 141, are closely combined with each other.
- the source layout information extraction unit 143 accesses the processing buffer 103 via the information exchange unit 140a, extracts the corresponding source layout information, and stably stores the extracted source layout information in the processing buffer 142.
- the OE position information reading unit 144 communicates with the processing buffer 142, and reads and detects the position characteristics of OEs O constituting corresponding pages P, for example, information about position characteristics, coordinate characteristics, widths, heights, the presence of rotation, and the type of figure, such as a rectangle, a circle or a triangle, formed by the layouts of respective OEs.
- the OE content reading unit 145 communicates with the processing buffer 142, and reads and detects information details of OEs O constituting corresponding pages P, for example, the details of the content of respective character strings.
- the adjacent OE collection unit 146 groups specific OEs O, which belong to the OEs O constituting each page P of the source electronic document SM, have a correlation in content, and are arranged adjacent to each other, into a single group. For example, in the case where the specific OEs O each include a specific number of identical words, have the same width/size/height, have the same color, or use the same font, the corresponding OEs O are determined to have a correlation in content.
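The content-correlation test described above can be sketched as a simple predicate. The `TextOE` record, the `correlated` function, and the default of two shared words are assumptions for illustration; the specification only says "a specific number of identical words" and does not fix the value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextOE:
    """Hypothetical text OE with the attributes used by the test."""
    text: str
    width: float
    height: float
    color: str
    font: str


def correlated(a: TextOE, b: TextOE, min_shared_words: int = 2) -> bool:
    # Words that appear in both character strings (case-insensitive).
    shared = set(a.text.lower().split()) & set(b.text.lower().split())
    return (
        len(shared) >= min_shared_words                       # identical words
        or (a.width, a.height) == (b.width, b.height)         # same width/height
        or a.color == b.color                                 # same color
        or a.font == b.font                                   # same font
    )


a = TextOE("Spring sale starts today", 300, 20, "#000", "Serif")
b = TextOE("The spring sale ends soon", 280, 18, "#333", "Sans")
c = TextOE("Weather report", 100, 12, "#f00", "Mono")
print(correlated(a, b), correlated(a, c))
```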
- the inclusive OE collection unit 147 groups various OEs O, which belong to OEs O constituting each page P of the source electronic document SM and are included in an OE, into a single group. In this case, it is possible to detect the content of respective OEs and group only OEs, the content of which has a correlation, into a single group.
- the corresponding OEs O are determined to have a correlation therebetween in content.
- the overlapping OE collection unit 148 groups specific OEs O, which belong to OEs constituting each page P of the source electronic document SM and overlap each other, into a single group.
- the content of respective OEs may be examined and then only OEs, the content of which has a correlation therebetween, may be configured to form a group.
- the grouping result information output unit 149 controlled by the OE grouping control unit 141, immediately communicates with the processing buffer 103 via the information exchange unit 140a, and stores the results of the above-described grouping in the processing buffer 103, thereby assisting in normally carrying out the following procedure without hindrance.
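One way to turn pairwise relationships into grouping results is a union-find pass, in which any two OEs judged to be related end up in the same group. This is a swapped-in general technique, not something the specification names; `group_related` and the `near` predicate below are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def group_related(items: Sequence[T],
                  related: Callable[[T, T], bool]) -> List[List[T]]:
    """Merge items into groups using union-find over a pairwise predicate."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if related(items[i], items[j]):
                parent[find(j)] = find(i)   # union the two sets

    groups: Dict[int, List[T]] = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


# Boxes as (x, y, w, h); `near` is an assumed stand-in for the adjacency test.
def near(a, b):
    return abs(a[0] - b[0]) <= 60 and abs(a[1] - b[1]) <= 60


boxes = [(0, 0, 50, 50), (55, 0, 50, 50), (300, 300, 50, 50), (110, 0, 50, 50)]
print(group_related(boxes, near))
```

Because the merge is transitive, the first and fourth boxes share a group even though they are only related through the second one.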
- the OE sequence designation module 150, controlled by the electronic document transformation control module 101, performs a function of accessing the processing buffer 103, detecting the position characteristics of respective OEs O, for example, information about position characteristics, coordinate characteristics, widths, heights, the presence of rotation, and the type of figure, such as a rectangle, circle or triangle, formed by the layouts of respective OEs, and setting the sequence of arrangement of the OEs based on the figure characteristics formed by the OEs grouped by the OE grouping module 140, for example, whether each of the OEs forms a figure that extends from the left side to the right side or a figure that extends from the upper side to the lower side, as shown in FIG. 8.
- a group disposed to the left has priority over a group disposed to the right
- a group disposed in the upper direction has priority over a group disposed in the lower direction.
- the sequence of arrangement may be determined so that OEs having a figure characteristic in which the figures thereof extend from the left to the right have priority over OEs having a figure characteristic in which the figures thereof extend from the upper side to the lower side.
- the OE sequence designation module 150 may have a construction in which an OE sequence designation control unit 151 for generally controlling a procedure of detecting the position characteristics of OEs O and a sequence designation procedure, and a grouping result information extraction unit 152, an OE figure characteristic reading unit 153, an OE sequence designation execution unit 154 and an OE sequence designation result output unit 156, generally controlled by the OE sequence designation control unit 151, are combined with each other.
- the grouping result information extraction unit 152 accesses the processing buffer 103 via the information exchange unit 150a, extracts information about corresponding grouping results, and stores the extracted grouping result information in the processing buffer 157.
- the OE figure characteristic reading unit 153 communicates with the processing buffer 157, reads and detects the position characteristics of corresponding OEs O, for example, information about position characteristics, coordinate characteristics, widths, heights, the presence of rotation, and the types of figures, such as a rectangle, a circle and a triangle, formed by the layouts of respective OEs, and determines, based on the position characteristics, whether the respective grouped OEs O have a figure characteristic in which the figures thereof extend from the left to the right or a figure characteristic in which the figures thereof extend from the upper side to the lower side.
- the OE sequence designation execution unit 154 communicates with the OE sequence designation reference information storage unit 155 and sets the sequence of arrangement of the OEs O so that OEs O having a figure characteristic in which the figures thereof extend from the left to the right have priority over OEs O having a figure characteristic in which the figures thereof extend from the upper side to the lower side. Of course, when the details of the reference data stored in the OE sequence designation reference information storage unit 155 are changed, the priority results produced by the OE sequence designation execution unit 154 change accordingly.
- the respective grouped OEs O are flexibly assigned the sequence of arrangement thereof based on the figure characteristics, for example, the condition in which the figures thereof extend from the left to the right or from the upper side to the lower side, as shown in FIG. 9.
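The figure-characteristic rule can be sketched as follows. The sketch assumes that "extends from the left to the right" can be approximated by a group's bounding box being at least as wide as it is tall; that interpretation, the `Box` type, and the `PRIORITY` table (standing in for the reference data held by the storage unit 155) are assumptions, not details given in the specification.

```python
from typing import List, NamedTuple, Sequence


class Box(NamedTuple):
    x: float
    y: float
    w: float
    h: float


def figure_characteristic(group: Sequence[Box]) -> str:
    # Bounding box of all OEs in the group.
    left = min(b.x for b in group)
    top = min(b.y for b in group)
    right = max(b.x + b.w for b in group)
    bottom = max(b.y + b.h for b in group)
    # Assumed rule: wider than tall means the figure extends left to right.
    return "left-to-right" if (right - left) >= (bottom - top) else "top-to-bottom"


# Assumed reference data; changing this table changes the resulting order.
PRIORITY = {"left-to-right": 0, "top-to-bottom": 1}


def designate_sequence(groups: Sequence[Sequence[Box]]) -> List[Sequence[Box]]:
    # sorted() is stable, so groups with the same characteristic keep their order.
    return sorted(groups, key=lambda g: PRIORITY[figure_characteristic(g)])


wide = [Box(0, 0, 100, 20), Box(110, 0, 100, 20)]
tall = [Box(0, 0, 20, 100), Box(0, 110, 20, 100)]
print([figure_characteristic(g) for g in designate_sequence([tall, wide])])
```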
- the OE sequence designation result output unit 156 communicates with the processing buffer 103 via the information exchange unit 150a, and stores the results of the above-described sequence designation in the processing buffer 103.
- the OE web pattern transformation module 160 transforms the respective OEs O into web pattern format entities WPs capable of constructing web pages WP, for example, HTML format entities, in the sequence of arrangement of the OEs, set by the OE sequence designation module, using web transformation tags, as shown in FIG. 11.
- the respective OEs O included in a specific page P constituting the source electronic document SM are transformed into web pattern format entities WE1, WE2, and WE3, as shown in FIG. 12.
- the OE web pattern transformation module 160 includes an OE web pattern transformation unit 161 for generally controlling a procedure of transforming the OEs into a web pattern, and an OE sequence designation result extraction unit 162, an OE information copying unit 163, an OE corresponding web tag extraction unit 164, an OE information pattern transformation engine 166, and a transformed web entity output unit 167, which are controlled by the OE web pattern transformation unit 161.
- the OE sequence designation result extraction unit 162 accesses the processing buffer 103 via the information exchange unit 160a, extracts the corresponding information about the results of the designation of the sequence, and stores the extracted information about the results of the designation of the sequence in the processing buffer 108.
- the OE information copying unit 163 reads and copies the details of the information of the OEs O in the sequence of arrangement set by the OE sequence designation module 150, and transfers the details of the information to the OE corresponding web tag extraction unit 164.
- the OE corresponding web tag extraction unit 164 communicates with the OE corresponding web tag storage unit 165, and loads web tags corresponding to the details of the information of the OEs O copied by the OE information copying unit 163.
- the OE information pattern transformation engine 166 transforms the respective OEs O into a web pattern format by replacing the details of the information of the OEs O with a web tag format.
- respective OEs O included in the specific page P of the source electronic document SM are transformed into web document language (for example, HTML) format entities WE1, WE2, and WE3, as shown in FIG. 12.
- the transformed web entity output unit 167 communicates with the processing buffer 103 via the information exchange unit 160a, and stores the transformed/created web entities WEs in the processing buffer 103.
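The tag lookup and replacement performed by units 164 to 166 can be illustrated with a small table of templates. The `WEB_TAGS` table, its two entries, and the function name are assumptions; the specification does not list which web tags correspond to which kinds of OE.

```python
from html import escape

# Assumed stand-in for the OE corresponding web tag storage unit 165.
WEB_TAGS = {
    "text": "<p>{content}</p>",
    "image": '<img src="{content}" alt="">',
}


def to_web_entity(kind: str, content: str) -> str:
    """Replace the details of an OE with the web tag registered for its kind."""
    template = WEB_TAGS[kind]                      # load the corresponding tag
    # Escape the content so characters such as & and < stay valid in HTML.
    return template.format(content=escape(content, quote=True))


print(to_web_entity("text", "Tom & Jerry"))
print(to_web_entity("image", "img/a.png"))
```

An unknown kind raises `KeyError` in this sketch; a fuller version would need a fallback tag.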
- When the web entities WEs corresponding to respective OEs O have been stored in the processing buffer 103, the web document creation module 170 combines the corresponding web entities WEs (that is, the OEs transformed into a web document language) into a single file, creates a new-format web page WP (for example, an HTML format web page) corresponding to the source electronic document SM, and stores it in the processing buffer 103, as shown in FIGS. 14 and 15.
- the web page WP of the present invention is a product that is created through the above-described grouping procedure and sequence designation procedure, so that it can naturally have the pattern that most closely conforms to a format for web services, regardless of the format of the previous electronic document.
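Combining web entities into a single web page can be sketched as simple concatenation inside an HTML skeleton. The function name and the skeleton are assumptions made for illustration; the specification does not give the exact markup of the created page.

```python
from html import escape
from typing import Sequence


def build_web_page(title: str, web_entities: Sequence[str]) -> str:
    """Join already-transformed web entities, in their given order, into one page."""
    body = "\n".join(web_entities)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        '<head><meta charset="utf-8"><title>' + escape(title) + "</title></head>\n"
        "<body>\n" + body + "\n</body>\n"
        "</html>\n"
    )


page_html = build_web_page("Page 1", ["<p>A</p>", "<p>B</p>"])
print(page_html)
```

The entities are assumed to arrive already sorted, so the page preserves the sequence set in the preceding step.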
- An example of performing grouping on an electronic document and then transforming the electronic document into a web document according to the above-described procedures will be described with reference to FIGS. 16 to 20.
- FIG. 16 shows an example of the output of one page of an electronic document.
- various OEs, including text OEs and image OEs, are indicated by dotted line or solid line boxes.
- Dotted lines indicate that the OEs are included in other OEs.
- FIG. 17 shows an example of performing grouping on the electronic document page of FIG. 16 based on the positional relationship between respective OEs.
- Group 1 G1 is made up of OEs that have an adjacent relationship or an overlapping relationship
- groups 2 and 3 G2 and G3 are made up of single OEs.
- Groups 4 and 5 G4 and G5 are made up of OEs that have an inclusive, adjacent, and overlapping relationship
- group 6 G6 is made up of OEs that have an adjacent and overlapping relationship.
- If groups G2 and G3, which have an adjacent relationship, are associated with each other in content, it is possible to combine the two groups G2 and G3 into a single group.
- FIG. 18 shows an example of dividing the electronic document page of FIG. 16 into respective groups through grouping.
- FIG. 19 shows an example of sorting the groups of FIG. 18.
- a group disposed to the left has priority over another group to the right along the same horizontal line
- a group in the upper direction has priority over another group in the lower direction along the same vertical line.
- Group 1 G1 has the highest priority since there are no OEs to the left of group 1 G1 and group 1 G1 is disposed at the uppermost position
- group 6 G6 has the lowest priority since group 6 G6 is disposed to the right of the other groups along the same horizontal line. That is, groups G3, G4, and G5, having upper left coordinates below the upper left coordinates of a box of group 6 G6, have higher priorities than that of group 6 G6 since they are disposed to the left of group 6 G6.
- FIG. 19 shows group 5 G5, which is divided into two groups.
- In the case where OEs have different background colors, fonts and styles of type, they may be determined not to be associated with each other and be divided into different groups (G5-1 and G5-2), even though they are disposed adjacent to each other.
- FIG. 20 shows an example of transforming respective groups into a web document language (HTML) and combining the groups into a single web document (INDEX.HTML) in the sequence of being sorted.
- When the web document WP corresponding to the source electronic document SM has been created through the above-described procedure, the web document output module 180, controlled by the electronic document transformation control module 101, accesses the processing buffer 103, extracts the corresponding web document WP, communicates with the electronic document service server 16 via the interface module 102, and uploads the extracted web document WP to the electronic document service server 16, so that the web document WP is managed in the service information DB 17 in the form of a DB.
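The sorting rule described for FIG. 19 and the combination step described for FIG. 20 can be sketched as follows. This is only an illustrative reading, not the patented implementation: it assumes that each group is reduced to the upper left coordinates of its box, that "left before right, then upper before lower" can be approximated by sorting on (left, top), and that each group becomes one `div`. The names `Group`, `sort_groups`, and `combine_to_html` are hypothetical.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Group:
    """Hypothetical stand-in for one group of OEs (assumption, not from the source)."""
    name: str
    left: float   # x coordinate of the upper left corner of the group box
    top: float    # y coordinate of the upper left corner of the group box
    text: str     # flattened content of the group, for illustration only


def sort_groups(groups):
    # Assumed reading of the rule: groups further to the left come first;
    # among groups with the same left coordinate, the upper one comes first.
    return sorted(groups, key=lambda g: (g.left, g.top))


def combine_to_html(groups):
    # Transform each group into a web entity and join them in sorted order.
    parts = [
        f'<div id="{escape(g.name)}">{escape(g.text)}</div>'
        for g in sort_groups(groups)
    ]
    return "<html><body>\n" + "\n".join(parts) + "\n</body></html>"


if __name__ == "__main__":
    page = [
        Group("G6", 200, 50, "sidebar"),
        Group("G3", 0, 100, "body text"),
        Group("G1", 0, 0, "headline"),
    ]
    print(combine_to_html(page))
```

With these sample coordinates, G1 is emitted first and G6 last, which matches the priorities stated for FIG. 19. A real page would need a tolerance for groups that are only approximately aligned.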
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Document Processing Apparatus (AREA)
- Processing Or Creating Images (AREA)
Abstract
A method and system for transforming electronic documents into hyperlink-type web documents are provided. Layout information is extracted from the characteristics of object entities constituting each page of each electronic document. The layout information may include the position information of object entities, the layout figure information of object entities, and the character strings of each OE. The object entities are grouped based on the location relations therebetween. In this case, correlations in content between respective object entities may be taken into consideration. Respective groups are sorted according to predetermined rules, and are transformed into a web document in the sequence of being sorted.
Description
[DESCRIPTION]
[Invention Title]
SYSTEM AND METHOD FOR TRANSFORMING ELECTRONIC DOCUMENT
[Technical Field]
The present invention relates, in general, to an electronic document transformation system and method, and, more particularly, to an electronic document transformation system and method, which transform service-target electronic documents into a new web document format regardless of the current format, and thus cause the electronic documents to be organized into a database (DB), so that an electronic document service system is prompted to flexibly provide various types of services, which can be implemented over the web, to users, unlike conventional service systems.
[Background Art]
Recently, as electrical/electronic technology infrastructure has widely expanded, online network-related technology, such as Internet technology, has also rapidly developed, with the result that the scale of distribution of electronic documents (for example, magazines, theses, etc.) over the Internet is steeply increasing.
As shown in FIG. 1, under a prior art electronic document service system 5, a user generally connects to the system 5 via a user client 1 and requests the provision of a desired electronic document, and an electronic document service server 6 accesses a service information DB 7, selectively extracts the corresponding electronic document, and provides the extracted electronic document 8 over the Internet 4. The user can read the desired electronic document 8 via a dedicated electronic document viewer 3 (an add-on viewer) that is installed in the user client 1.
However, under this prior art electronic document service system, because not every electronic document 8 is in a web document format (for example, an HTML document format) appropriate for web services, a user must suffer the inconvenience of additionally installing the separate dedicated viewer 3 for outputting the electronic document 8 in the client 1, in addition to a dedicated web document browser 2, as long as no separate measure is taken. Of course, each electronic document 8 may be provided on the web without using the above-described dedicated viewer 3, in which case the corresponding electronic document 8 must have very poor readability, with the result that the user must suffer inconvenience resulting from the poor readability.
Furthermore, since respective electronic documents 8 have individual independent file formats having independently designated extensions (for example, indesign, quark, pdf, etc.), unlike open web documents (for example, HTML documents) interconnected in a hyperlink form, the system 5 cannot meet users' desires, even though the users desire to separately utilize specific content, for example, specific images, included in the electronic documents 8 (for example, users desire to utilize specific images in order to enhance/improve their personal websites or blogs). Furthermore, under the situation in which the conventional respective electronic documents 8 have individual independent file formats having independently designated extensions, a search engine installed in the electronic document service server 6 can limitedly search only the index information of corresponding electronic documents (for example, the title information of the electronic documents, the summary information of the electronic documents, etc.) designated by a system administrator, but cannot search actual content (characters, images, etc.) included in the respective electronic documents 8. Accordingly, a detailed search procedure corresponding to the user's desire cannot be carried out, and the system 5 cannot provide high-quality search results to the user. As a result, the user must suffer many difficulties in effectively accessing and flexibly utilizing desired electronic documents.
If respective electronic documents 8 have an open web document format (for example, an HTML document format), in which the electronic documents are interconnected in a hyperlink form, the system 5 can exploit, for example, an advantage of acquiring a separate source of profits by providing a service of posting web document-format advertising entities (for example, a popup window, a banner, etc.) in a matching advertisement fashion along with respective electronic documents while posting and providing the electronic documents on and via the web, an advantage of naturally establishing a marketplace having a specific or larger scale by providing a service of linking respective electronic documents to associated shopping mall servers over the web, and an advantage of increasing search quality and thus maximizing the number of system users, thereby maximizing the probability of increasing total profits. However, under the conventional situation, in which respective electronic documents have individual independent file formats having independently designated extensions, these advantages cannot be exploited.
[Disclosure] [Technical Problem]
Accordingly, an object of the present invention is to provide a system in which a software module capable of reading the position characteristics (for example, position coordinates, widths, heights, etc.) of Object Entities (hereinafter referred to as "OEs") constituting each page of each source electronic document, and dividing and grouping the OEs into a plurality of groups based on the figure relationship (for example, the OEs are arranged adjacent to each other, are included, overlap each other, etc.) between the OEs, a software module capable of setting the sequence of arrangement of the OEs based on the figure characteristics of the grouped respective OEs (whether the OEs form a figure that extends from a left side to a right side or a figure that extends from an upper side to a lower side), a software module capable of transforming OEs, the sequence of arrangement of which is determined, into a web pattern format, and thus creating new format web documents corresponding to the source electronic documents, and a software module capable of uploading the created web documents to an electronic document service system are arranged to operate in conjunction with each other, through which service-target electronic documents are transformed into a new web document format regardless of the current format, and thus the electronic documents are caused to be organized into a DB, so that the electronic document service system is prompted to flexibly provide various types of services, which can be implemented over the web, to users, unlike a conventional service system.
[Technical Solution]
In order to accomplish the above objects, the present invention provides an electronic document transformation system, including an electronic document transformation control module for generally controlling a procedure of collecting source electronic documents held by an electronic document service system, and transforming the source electronic documents into web documents; an Object Entity (OE) grouping module controlled by the electronic document transformation control module, and configured to detect position characteristics and information details of Object Entities (OEs) constituting each page of each of the source electronic documents, and divide and group the OEs into a plurality of groups based on positional relationships between the OEs and the information details of the OEs; an OE sequence designation module controlled by the electronic document transformation control module, and configured to set the sequence of arrangement of the OEs based on figure characteristics of the respective OEs grouped by the OE grouping module; an OE web pattern transformation module controlled by the electronic document transformation control module, and configured to transform the OEs into a web pattern format in the sequence of arrangement set by the OE sequence designation module; and a web document creation module controlled by the electronic document transformation control module, and configured to create a new format web document, corresponding to the source electronic document, by combining together the OEs transformed into a web pattern format by the OE web pattern transformation module.
In addition, the present invention provides an electronic document transformation method, including extracting layout information from position characteristics of OEs constituting each page of each electronic document; grouping the OEs based on positional relationships between the OEs using the layout information; sorting groups of the grouped OEs according to predetermined rules; and transforming the groups into a web document in the sequence of being sorted.
[Advantageous Effects]
According to the present invention, the position characteristics (for example, position coordinates, widths, heights, etc.) of OEs constituting each page of each source electronic document are read, the OEs are divided and grouped into a plurality of groups based on the figure relationship between the OEs, the sequence of arrangement of the OEs is set based on the figure characteristics of the grouped respective OEs, the OEs, the sequence of arrangement of which is determined, are transformed into a web pattern format, and thus web documents that have a new format and correspond to the source electronic documents are created, so that service-target electronic documents are transformed into a new web document format regardless of the current format, and thus the electronic documents are caused to be organized into a DB, with the result that the electronic document service system is prompted to flexibly provide various types of services, which can be implemented over the web, to users, unlike a conventional service system. Furthermore, although electronic documents are automatically transformed into web documents, the positional relationship between the source OEs of the electronic documents and the configuration thereof can be maintained as closely as possible, so that there is an advantage in that the layout of each web document can remain as close as possible to the layout of each source electronic document.
Under the implementation environment of the present invention, electronic documents have a web document format, so that users can easily view desired electronic documents without requiring separate dedicated viewers to be installed in clients. Since such electronic documents have been optimized for a web format, the electronic documents can exhibit very excellent readability over the web.
Furthermore, since the respective electronic documents of the present invention have an open web document format (for example, an HTML document format) in which documents are interconnected in a hyperlink fashion, users can easily meet their desires only through a simple hyperlink setting procedure when the users desire to utilize specific images, included in electronic documents, so as to enhance/improve their personal websites, blogs, etc., which are operated via other web service systems.
Furthermore, under the implementation environment of the present invention, in which respective electronic documents are organized into a DB in a web document format, a search engine installed in an electronic document service server can flexibly search actual content (characters, images, etc.) included in respective electronic documents, so that detailed search procedures meeting users' demands can be easily carried out, with the result that the system can provide high-quality search results to the users, and thus the users can effectively access and flexibly utilize their desired electronic documents.
Furthermore, under the implementation environment of the present invention, in which respective electronic documents have an open web document format (for example, an HTML document format) in which documents are interconnected to each other in a hyperlink form, the system can utilize, for example, an advantage of acquiring a separate source of profits by providing the service of presenting web document format advertising entities (for example, a popup window, a banner, etc.) in a matching advertisement form while posting and providing respective electronic documents on and to the web, an advantage of naturally establishing a marketplace having a specific or larger scale by providing the service of linking respective electronic documents to associated shopping mall servers over the web, and an advantage of maximizing the increase in possible profits by increasing search quality and thus maximizing the number of users of the system.
[Description of Drawings]
FIG. 1 is a diagram conceptually showing an example of the construction of a prior art electronic document service system;
FIG. 2 is a conceptual diagram showing an example of the construction of an electronic document transformation system according to the present invention;
FIG. 3 is a conceptual diagram showing an example of the performance of the functions of a source electronic document collection module and an OE information reading module according to the present invention;
FIG. 4 is a conceptual diagram showing an example of the performance of the function of a source layout information creation module according to the present invention;
FIGS. 5 and 6 are conceptual diagrams showing the performance of the function of an OE grouping module according to the present invention;
FIG. 7 is a conceptual diagram showing an example of the detailed construction of an OE grouping module according to the present invention;
FIGS. 8 and 9 are conceptual diagrams showing an example of the performance of the function of an OE sequence designation module;
FIG. 10 is a conceptual diagram showing an example of the detailed construction of an OE sequence designation module according to the present invention;
FIGS. 11 and 12 are conceptual diagrams showing an example of the performance of the function of an OE web pattern transformation module according to the present invention;
FIG. 13 is a conceptual diagram showing an example of the detailed construction of an OE web pattern transformation module according to the present invention;
FIGS. 14 and 15 are conceptual diagrams showing an example of the performance of the function of a web document creation module according to the present invention;
FIG. 16 shows an example of the output of an electronic document;
FIG. 17 shows an example of performing grouping on the electronic document of FIG. 16 based on the positional relationship between respective OEs;
FIG. 18 shows an example of dividing the electronic document page of FIG. 16 into respective groups through grouping;
FIG. 19 shows an example of sorting the groups of FIG. 18;
FIG. 20 shows an example of transforming respective groups into an HTML format and combining the groups into a single web document in the sequence of being sorted;
FIG. 21 shows an example of extracting OEs from an electronic document; and
FIG. 22 shows an example of grouping respective OEs based on the positional relationship between the OEs.
[Mode for Invention]
As illustrated in FIG. 2, an electronic document transformation system 100 according to the present invention is installed in information processing equipment (not shown), such as a computer, equipped with an operating system, and has a construction in which an electronic document transformation control module 101, a source electronic document collection module 110, an OE information reading module 120, a source layout information creation module 130, an OE grouping module 140, an OE sequence designation module 150, an OE web pattern transformation module 160, a web document creation module 170, and a web document output module 180 are closely combined with each other.
In this case, the electronic document transformation control module 101 establishes a communication relationship with the electronic document service system 15 via the interface module 102, and generally controls a procedure of collecting the source electronic documents SM (shown in FIG. 3) held by the electronic document service system 15, a procedure of reading the detailed information of OEs O constituting each page P of each of the source electronic documents SM, a procedure of creating the source layout information of the source electronic documents SM by combining together the detailed information of the respective OEs O, a procedure of grouping respective OEs O, a procedure of designating the sequence of arrangement for the grouped OEs O, a procedure of transforming respective OEs O so that they can have a web pattern, a procedure of creating a web document WP corresponding to the source electronic document SM by combining respective OEs O, transformed into the web pattern, into one, and a procedure of outputting and providing the created web document WP to the electronic document service system 15.
Here, the source electronic document collection module 110, controlled by the electronic document transformation control module 101, performs a function of collecting source electronic documents SM, held by the electronic document service system 15, through various paths, for example, a path for communicating with the electronic document service server 16 of the electronic document service system 15, and a path for communicating with an information storage medium (for example, an external compact disk, a hard disk installed in an information processing device, an external information storage device, etc.) provided by an administrator, and storing the collected source electronic documents SM in a processing buffer 103, as shown in FIG. 3.
When the source electronic documents SM have been acquired in the processing buffer 103 through the performance of the function by the above-described source electronic document collection module 110, the OE information reading module 120, controlled by the electronic document transformation control module 101, communicates with the processing buffer 103, accesses the source electronic documents SM, as shown in FIG. 3, and performs a function of reading the detailed information of OEs O constituting each page P of each source electronic document SM, for example, the position information (positions, coordinate values, widths, heights, rotations, etc.) of respective OEs O, information about layout figures formed by respective OEs O (for example, information about the types of figures, such as a rectangle, a circle and a triangle, that are formed by the layouts of respective OEs), the font characteristics of the character strings of respective OEs O, font sizes, information about character content, etc., the background information of respective OEs O, the color information of respective OEs O, the name information of respective OEs O, and the characteristic information of a page to which respective OEs O belong (page number information, page size information, etc.), and storing the results of the reading in another storage area of the processing buffer 103. Examples of the detailed information of various OEs are as follows:
- detailed information about a source electronic document: a document file name, a source document type, width and height, the existence of an outline, the thickness of an outline, background information, and color information
- detailed information about a page OE (an OE representing the entire page): coordinates, a width and height, a page number, a page size, margin information, the presence of an outline, the thickness of an outline, background information, and color information
- detailed information about an image OE (an OE representing an image): coordinates, width and height, the presence of rotation, reduction and enlargement ratios, created layout information, the presence of an outline, the thickness of an outline, background information, color information, and transparency information
- detailed information about a vector graphic OE (an OE representing a figure represented by a vector): width and height, the presence of rotation, vector information, background information, color information, thickness, a solid line, a dotted line, a double dotted line, etc., filling color information, and transparency information
- detailed information about a text OE (an OE representing a character string): coordinates, width and height, text, the inclination of text, the thickness of text, font family information, a text arrangement method, reduction and enlargement ratios, OE construction layout information, the presence of rotation, background information, and color information
When the detailed information of respective OEs O has been read through the performance of the function by the OE information reading module 120, the source layout information creation module 130, controlled by the electronic document transformation control module 101, performs a procedure of accessing the processing buffer 103, checking the detailed information of respective OEs O constituting each page P of the source electronic document SM, and creating the layout information of each page P of the source electronic document SM by combining together the detailed information, as shown in FIG. 4.
Referring to the layout information, OEs O constituting each page of the source electronic document SM and the characteristics of the corresponding OEs O can be easily checked and detected. FIG. 21 shows an example of the construction of layout information that is extracted from the source electronic document.
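As a rough illustration of how the detailed information listed above might be held in memory before it is combined into per-page layout information, a minimal sketch follows. The class and field names (`ObjectEntity`, `PageLayout`, `build_page_layout`) are assumptions chosen for this example; the source does not specify any data structure, and only a subset of the listed attributes is modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ObjectEntity:
    """One OE with a subset of the attributes listed in the description (names assumed)."""
    kind: str                 # e.g. "text", "image", "vector"
    x: float                  # upper left x coordinate
    y: float                  # upper left y coordinate
    width: float
    height: float
    rotation: float = 0.0
    text: str = ""            # character string, for text OEs
    font_family: str = ""
    color: str = ""
    background: str = ""


@dataclass
class PageLayout:
    """Layout information of one page: page characteristics plus its OEs."""
    page_number: int
    page_width: float
    page_height: float
    entities: List[ObjectEntity] = field(default_factory=list)


def build_page_layout(page_number, page_width, page_height, entities):
    # Combine the detailed information of the individual OEs into one record per page.
    layout = PageLayout(page_number, page_width, page_height)
    layout.entities.extend(entities)
    return layout


if __name__ == "__main__":
    title = ObjectEntity("text", 10, 20, 200, 24, text="Title", font_family="Serif")
    photo = ObjectEntity("image", 10, 60, 300, 200)
    print(build_page_layout(1, 595, 842, [title, photo]))
```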
When the source layout information of each page P constituting the source electronic document SM has been acquired through the performance of the function by the above-described source layout information creation module 130, the OE grouping module 140, controlled by the electronic document transformation control module 101, accesses the processing buffer 103, detects the position characteristics (for example, position characteristics, coordinate characteristics, widths, heights, the presence of rotation, information about various figures, such as a rectangle, a circle and a triangle, that are formed by the layouts of respective OEs, etc.) and information details (for example, details represented by each character string) of respective OEs O, and divides the OEs into a plurality of groups based on the positional relationships between the OEs O (for example, information on whether specific OEs have a close relationship with other OEs, whether each OE includes some other OE, and whether respective OEs have an overlapping relationship with other OEs) and information details, as shown in FIG. 5.
When the function of the OE grouping module 140 has been performed, respective OEs O included in a specific page P of the source electronic document SM are divided into a group of adjacent OEs, a group of inclusive OEs, and a group of overlapping OEs, as shown in FIG. 6. An example of performing grouping on layout information having the construction of FIG. 21 is shown in FIG. 22.
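The three positional relationships named above (adjacent, inclusive, overlapping) can be illustrated with axis-aligned bounding boxes. The sketch below is an assumption-laden example, not the patented procedure: the `Box` type, the `relationship` function, and in particular the `gap` threshold used to decide adjacency are hypothetical, since the source does not state how close two OEs must be to count as adjacent.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box of an OE (hypothetical helper type)."""
    x: float  # upper left x
    y: float  # upper left y
    w: float  # width
    h: float  # height


def _contains(outer: Box, inner: Box) -> bool:
    return (outer.x <= inner.x and outer.y <= inner.y
            and outer.x + outer.w >= inner.x + inner.w
            and outer.y + outer.h >= inner.y + inner.h)


def relationship(a: Box, b: Box, gap: float = 5.0) -> str:
    """Classify two boxes as 'inclusive', 'overlapping', 'adjacent', or 'separate'.

    `gap` is an assumed tolerance: boxes whose edges are at most `gap` apart
    on both axes are treated as adjacent.
    """
    if _contains(a, b) or _contains(b, a):
        return "inclusive"
    # Signed distances between the boxes; negative means the projections overlap.
    dx = max(a.x, b.x) - min(a.x + a.w, b.x + b.w)
    dy = max(a.y, b.y) - min(a.y + a.h, b.y + b.h)
    if dx < 0 and dy < 0:
        return "overlapping"
    if max(dx, 0) <= gap and max(dy, 0) <= gap:
        return "adjacent"
    return "separate"


if __name__ == "__main__":
    frame = Box(0, 0, 10, 10)
    print(relationship(frame, Box(2, 2, 3, 3)))     # one box inside the other
    print(relationship(frame, Box(5, 5, 10, 10)))   # partial overlap
    print(relationship(frame, Box(12, 0, 10, 10)))  # small horizontal gap
    print(relationship(frame, Box(100, 0, 10, 10))) # far apart
```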
The OE grouping module 140, as shown in FIG. 7, has a construction in which an OE grouping control unit 141 for generally controlling a procedure for detecting the position characteristics of OEs O and a grouping procedure, and a source layout information extraction unit 143, an OE position information reading unit 144, an OE content reading unit 145, an adjacent OE collection unit 146, an inclusive OE collection unit 147, an overlapping OE collection unit 148, and a grouping result information output unit 149, which are generally controlled by the OE grouping control unit 141, are closely combined with each other.
When the layout information of each page P constituting the source electronic document SM has been stored in the processing buffer 103 through the performance of the function of the above-described source layout information creation module 130, the source layout information extraction unit 143 accesses the processing buffer 103 via the information exchange unit 140a, extracts the corresponding source layout information, and stably stores the extracted source layout information in the processing buffer 142.
When the source layout information of each page P, constituting the source electronic document SM, has been acquired by the source layout information extraction unit 143, the OE position information reading unit 144 communicates with the processing buffer 142, and reads and detects the position characteristics of OEs O, constituting corresponding pages P, for example, information about position characteristics, coordinate characteristics, widths, heights, the presence of rotation, and the type of figure, such as a rectangle, a circle or a triangle, that is formed by the layouts of respective OEs.
When the source layout information of each page P, constituting the source electronic document SM, has been acquired by the source layout information extraction unit 143, the OE content reading unit 145 communicates with the processing buffer 142, and reads and detects information details of OEs O constituting corresponding pages P, for example, the details of the content of respective character strings.
When the position characteristics of the OEs O and the details of the content of the character strings of the OEs O have been detected through the performance of the functions of the OE position information reading unit 144 and the OE content reading unit 145, the adjacent OE collection unit 146 groups specific OEs O, which belong to OEs O constituting each page P of the source electronic document SM, have a correlation in content, and are arranged adjacent to each other, into a single group. For example, in the case where the specific OEs O each include a specific number of identical words, have the same width/size/height, have the same color, or use the same font, the corresponding OEs O are determined to have a correlation in content.
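The content-correlation test described above can be sketched as a simple predicate. This is a minimal example under stated assumptions: the `TextOE` type, the `correlated` function, and the `min_shared_words` threshold are hypothetical (the source only says "a specific number of identical words"), and words are compared case-insensitively after whitespace splitting, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class TextOE:
    """Minimal text OE carrying only the attributes the rule refers to (names assumed)."""
    text: str
    width: float
    height: float
    color: str
    font: str


def correlated(a: TextOE, b: TextOE, min_shared_words: int = 2) -> bool:
    """Return True if two OEs are considered correlated in content.

    Any one of the conditions named in the description is sufficient:
    enough identical words, same width and height, same color, or same font.
    """
    shared = set(a.text.lower().split()) & set(b.text.lower().split())
    return (
        len(shared) >= min_shared_words
        or (a.width, a.height) == (b.width, b.height)
        or a.color == b.color
        or a.font == b.font
    )


if __name__ == "__main__":
    headline = TextOE("Spring fashion trends", 300, 40, "black", "Serif")
    caption = TextOE("More fashion trends inside", 120, 12, "gray", "Sans")
    advert = TextOE("Buy now", 80, 30, "red", "Display")
    print(correlated(headline, caption))  # shares "fashion" and "trends"
    print(correlated(headline, advert))   # no condition is met
```

Because the conditions are joined with "or", two OEs that merely share a common font would be treated as correlated; a practical system would likely weight or combine the conditions, which the source leaves open.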
When this procedure is completed, specific OEs O, which belong to OEs O constituting each page P of the source electronic document SM, have a correlation in content therebetween and are arranged adjacent to each other, are collected and managed in a single group that is distinguished from the other groups, as shown in FIG. 6.
When the position characteristics of the OEs O and the details of the content of the character strings of the OEs O have been detected by the performance of the functions of the OE position information reading unit 144 and the OE content reading unit 145, the inclusive OE collection unit 147 groups various OEs O, which belong to OEs O constituting each page P of the source electronic document SM and are included in an OE, into a single group. In this case, it is possible to detect the content of respective OEs and group only OEs, the content of which has a correlation, into a single group. For example, in the case where specific OEs O include the same word a specific number of times, have the same width/size/height, have the same color, or use the same font, the corresponding OEs O are determined to have a correlation therebetween in content.
When the position characteristics of OEs O and the details of the content of the character strings of the OEs O have been detected by the performance of the functions of the OE position information reading unit 144 and the OE content reading unit 145, the overlapping OE collection unit 148 groups specific OEs O, which belong to OEs constituting each page P of the source electronic document SM and overlap each other, into a single group. In this case, the content of respective OEs may be examined and then only OEs, the content of which has a correlation therebetween, may be configured to form a group. For example, in the case where specific OEs O each include a specific number or more of the same words, have the same width/size/height, have the same color, or use the same font, the corresponding OEs are determined to have a correlation in content therebetween.
When this procedure is completed, specific OEs O, which belong to OEs O constituting each page P of the source electronic document SM, have a correlation in content therebetween, and overlap other OEs O, are collected and managed in a single group that is divided from other groups.
Meanwhile, for example, in the case where a group of inclusive OEs has an adjacent relationship with other groups or other OEs, it is possible to group them into a single group.
Through the above-described procedure, three kinds of OEs among the OEs O constituting each page P of the source electronic document SM are divided and grouped: OEs that have a correlation in content and are arranged adjacent to each other, OEs that include other OEs, and OEs that are arranged to overlap each other. When this grouping has been completed, the grouping result information output unit 149, controlled by the OE grouping control unit 141, immediately communicates with the processing buffer 103 via the information exchange unit 140a and stores the results of the above-described grouping in the processing buffer 103, so that the following procedure can be carried out normally without hindrance.
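One way to turn pairwise relationships into the divided groups described above is a union-find (disjoint-set) pass, which also merges a group with any further group or OE related to one of its members. Union-find is a technique chosen for this sketch, not one named in the text, and the `group_related` function and its `related` predicate are hypothetical.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def group_related(items: List[T], related: Callable[[T, T], bool]) -> List[List[T]]:
    """Partition items so that related items (directly or transitively) share a group."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the sets of every related pair.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if related(items[i], items[j]):
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Collect the members of each set, keeping the input order.
    groups: Dict[int, List[T]] = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())
```

In practice the `related` predicate would combine a positional test (adjacent, inclusive, or overlapping) with the content correlation test; here it is left as a parameter so the sketch stays independent of any particular OE representation.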
Meanwhile, when the OEs O constituting each page P of the source electronic document SM have been grouped through the previously mentioned procedure, the OE sequence designation module 150, controlled by the electronic document transformation control module 101, accesses the processing buffer 103 and detects the position characteristics of the respective OEs O, for example, information about position characteristics, coordinate characteristics, widths, heights, the presence of rotation, and the type of figure, such as a rectangle, circle or triangle, formed by the layouts of the respective OEs. It then sets the sequence of arrangement of the OEs based on the figure characteristics formed by the OEs grouped by the OE grouping module 140, for example, whether each of the OEs forms a figure that extends from the left side to the right side or a figure that extends from the upper side to the lower side, as shown in FIG. 8.
For example, of groups disposed along the same horizontal line, a group disposed to the left has priority over a group disposed to the right, and, of groups disposed along the same vertical line, a group disposed in the upper direction has priority over a group disposed in the lower direction. Furthermore, the sequence of arrangement may be determined so that OEs having a figure characteristic in which the figures thereof extend from the left to the right have priority over OEs having a figure characteristic in which the figures thereof extend from the upper side to the lower side. As a result, when the function of the OE sequence designation module 150 has been performed, the respective OEs, which are grouped into groups of adjacent OEs, inclusive OEs, and overlapping OEs, as shown in FIG. 9, are assigned the sequence of arrangement based on the figure characteristics thereof.
As shown in FIG. 10, the OE sequence designation module 150 may have a construction that combines an OE sequence designation control unit 151, which generally controls the procedure of detecting the position characteristics of the OEs O and the sequence designation procedure, with a grouping result information extraction unit 152, an OE figure characteristic reading unit 153, an OE sequence designation execution unit 154 and an OE sequence designation result output unit 156, all of which are controlled by the OE sequence designation control unit 151.
When the OEs O of each page P constituting each source electronic document SM have been grouped by the above-described OE grouping module 140 and information about the results has been stored in the processing buffer 103, the grouping result information extraction unit 152 accesses the processing buffer 103 via the information exchange unit 150a, extracts the corresponding grouping result information, and stores the extracted grouping result information in the processing buffer 157.
When the grouping result information has been acquired by the grouping result information extraction unit 152, the OE figure characteristic reading unit 153 communicates with the processing buffer 157 and reads and detects the position characteristics of the corresponding OEs O, for example, information about position characteristics, coordinate characteristics, widths, heights, the presence of rotation, and the types of figures, such as a rectangle, a circle and a triangle, formed by the layouts of the respective OEs. Based on the position characteristics, it determines whether the respective grouped OEs O have a figure characteristic in which the figures thereof extend from the left to the right or a figure characteristic in which the figures thereof extend from the upper side to the lower side.
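The text does not state how the figure characteristic is derived from the position characteristics. As one possible reading, the sketch below classifies a group by the aspect ratio of its overall bounding box; this criterion, the tuple layout `(left, top, width, height)`, and all names are assumptions made for illustration.

```python
from enum import Enum
from typing import Iterable, Tuple

# Hypothetical box layout: (left, top, width, height), with y growing downward.
Box = Tuple[float, float, float, float]


class FigureCharacteristic(Enum):
    LEFT_TO_RIGHT = "left_to_right"
    TOP_TO_BOTTOM = "top_to_bottom"


def bounding_box(boxes: Iterable[Box]) -> Box:
    """Return the smallest box enclosing every box of a group."""
    boxes = list(boxes)
    left = min(b[0] for b in boxes)
    top = min(b[1] for b in boxes)
    right = max(b[0] + b[2] for b in boxes)
    bottom = max(b[1] + b[3] for b in boxes)
    return (left, top, right - left, bottom - top)


def read_figure_characteristic(group: Iterable[Box]) -> FigureCharacteristic:
    """Assumed rule: a group at least as wide as it is tall extends left to right."""
    _, _, width, height = bounding_box(group)
    if width >= height:
        return FigureCharacteristic.LEFT_TO_RIGHT
    return FigureCharacteristic.TOP_TO_BOTTOM
```

Rotation and figure type, which the text also lists among the position characteristics, are not modeled here.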
When the figure characteristics of the respective grouped OEs O have been detected by the OE figure characteristic reading unit 153, the OE sequence designation execution unit 154 communicates with the OE sequence designation reference information storage unit 155 and sets the sequence of arrangement of the OEs O so that OEs O having a figure characteristic in which the figures thereof extend from the left to the right have priority over OEs O having a figure characteristic in which the figures thereof extend from the upper side to the lower side. (Of course, when the reference data stored in the OE sequence designation reference information storage unit 155 is changed, the sequence of priority produced by the OE sequence designation execution unit 154 changes accordingly.)
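The dependence of the resulting sequence on the stored reference data can be sketched with a priority table passed to a stable sort. The table contents, the dictionary-based group records, and the function name are hypothetical; they stand in for whatever the reference information storage unit actually holds.

```python
from typing import Dict, List

# Hypothetical reference data: a lower number means a higher priority.
DEFAULT_REFERENCE: Dict[str, int] = {"left_to_right": 0, "top_to_bottom": 1}


def designate_sequence(groups: List[dict],
                       reference: Dict[str, int] = DEFAULT_REFERENCE) -> List[dict]:
    """Order groups by the priority of their figure characteristic.

    sorted() is stable, so groups with equal priority keep their input order.
    """
    return sorted(groups, key=lambda group: reference[group["figure"]])


groups = [
    {"name": "A", "figure": "top_to_bottom"},
    {"name": "B", "figure": "left_to_right"},
    {"name": "C", "figure": "top_to_bottom"},
]
default_order = [g["name"] for g in designate_sequence(groups)]
swapped_order = [g["name"] for g in designate_sequence(
    groups, {"left_to_right": 1, "top_to_bottom": 0})]
```

With the default table the order is B, A, C; swapping the two priorities yields A, C, B, which mirrors the statement that changing the reference data changes the resulting sequence.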
When the procedure is completed, the respective grouped OEs O are flexibly assigned the sequence of arrangement thereof based on the figure characteristics, for example, the condition in which the figures thereof extend from the left to the right or from the upper side to the lower side, as shown in FIG. 9. When the grouped OEs O have been assigned the sequence of arrangement based on the figure characteristics thereof, the OE sequence designation result output unit 156 communicates with the processing buffer 103 via the information exchange unit 150a, and stores the results of the above-described sequence designation in the processing buffer 103.
Meanwhile, when the OEs O of each page P constituting each source electronic document SM have been assigned the sequence of arrangement and the results have been stored in the processing buffer 103, the OE web pattern transformation module 160 uses web transformation tags to transform the respective OEs O, in the sequence of arrangement set by the OE sequence designation module 150, into web pattern format entities WEs capable of constructing web pages WP, for example, HTML format entities, as shown in FIG. 11.
When the function of the OE web pattern transformation module 160 has been performed, the respective OEs O included in a specific page P constituting the source electronic document SM are transformed into web pattern format entities WE1, WE2, and WE3, as shown in FIG. 12.
The OE web pattern transformation module 160, as shown in FIG. 13, includes an OE web pattern transformation unit 161 for generally controlling a procedure of transforming the OEs into a web pattern, and an OE sequence designation result extraction unit 162, an OE information copying unit 163, an OE corresponding web tag extraction unit 164, an OE information pattern transformation engine 166, and a transformed web entity output unit 167, which are controlled by the OE web pattern transformation unit 161.
When the sequence of arrangement for the OEs O of each page P of the source electronic document SM has been designated, and information about the results of the sequence designation has been stored in the processing buffer 103, the OE sequence designation result extraction unit 162 accesses the processing buffer 103 via the information exchange unit 160a, extracts the corresponding information about the results of the sequence designation, and stores the extracted information in the processing buffer 168.
When the information about the results of the sequence designation for the respective OEs O has been stored in the processing buffer 168, the OE information copying unit 163 reads and copies the details of the information of the OEs O in the sequence of arrangement set by the OE sequence designation module 150, and transfers the details of the information to the OE corresponding web tag extraction unit 164. The OE corresponding web tag extraction unit 164 communicates with the OE corresponding web tag storage unit 165, and loads the web tags corresponding to the details of the information of the OEs O copied by the OE information copying unit 163. When the web tags corresponding to the details of the respective OEs O have been acquired, the OE information pattern transformation engine 166 transforms the respective OEs O into a web pattern format by replacing the details of the information of the OEs O with a web tag format. Through this procedure, the respective OEs O included in the specific page P of the source electronic document SM are transformed into web document language (for example, HTML) format entities WE1, WE2, and WE3, as shown in FIG. 12.
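The tag lookup and replacement step can be sketched as a table of templates keyed by OE type. The `WEB_TAGS` table, the OE kinds, and the `transform_oe` function are hypothetical stand-ins for the web tag storage unit and the transformation engine; the source does not specify which tags are stored.

```python
from html import escape
from typing import Dict

# Hypothetical contents of a web tag store: one HTML template per OE kind.
WEB_TAGS: Dict[str, str] = {
    "text": "<p>{content}</p>",
    "image": '<img src="{content}" alt="">',
}


def transform_oe(kind: str, content: str) -> str:
    """Load the tag template for an OE kind and wrap the escaped content in it."""
    template = WEB_TAGS[kind]
    # Escape the content so that characters such as & or < cannot break the markup.
    return template.format(content=escape(content, quote=True))


text_entity = transform_oe("text", "Tom & Jerry")
image_entity = transform_oe("image", "logo.png")
```

Keeping the templates in a table rather than in code matches the idea of a separate tag storage unit: supporting a new OE type only requires a new table entry.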
The transformed web entity output unit 167 communicates with the processing buffer 103 via the information exchange unit 160a, and stores the transformed/created web entities WEs in the processing buffer 103.
When the web entities WEs corresponding to the respective OEs O have been stored in the processing buffer 103, the web document creation module 170 combines the corresponding web entities WEs (that is, the OEs transformed into a web document language) into a single file, creates a new-format web page WP (for example, an HTML format web page) corresponding to the source electronic document SM, and stores it in the processing buffer 103, as shown in FIGS. 14 and 15.
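Combining the web entities into a single page can be sketched as a simple concatenation inside a minimal HTML skeleton. The function name, the skeleton, and the sample entities are assumptions for illustration; the source does not describe the exact structure of the generated file.

```python
from html import escape
from typing import List


def create_web_page(title: str, web_entities: List[str]) -> str:
    """Join already-transformed web entities, in their given order, into one HTML page."""
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        '<meta charset="utf-8">',
        f"<title>{escape(title)}</title>",
        "</head>",
        "<body>",
        *web_entities,  # the order of the list is the designated sequence
        "</body>",
        "</html>",
    ]
    return "\n".join(lines) + "\n"


page = create_web_page("INDEX", ["<p>WE1</p>", "<p>WE2</p>", "<p>WE3</p>"])
```

The returned string could then be written to a file or stored in a buffer; the sketch leaves storage out so that it stays free of side effects.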
As is easily understood through the comparison of FIG. 3 with FIG. 15, the web page WP of the present invention is a product that is created through the above-described grouping procedure and sequence designation procedure, so that it can naturally have the pattern that most closely conforms to a format for web services, regardless of the format of the previous electronic document.
An example of performing grouping on an electronic document and then transforming the electronic document into a web document according to the above-described procedures will be described with reference to FIGS. 16 to 20.
FIG. 16 shows an example of the output of one page of an electronic document. In FIG. 16, various OEs, including text OEs and image OEs, are indicated by dotted line or solid line boxes. OEs indicated by dotted lines are OEs that are included in other OEs.
FIG. 17 shows an example of performing grouping on the electronic document page of FIG. 16 based on the positional relationship between respective OEs. Group 1 G1 is made up of OEs that have an adjacent relationship or an overlapping relationship, and groups 2 and 3 G2 and G3 are made up of single OEs. Groups 4 and 5 G4 and G5 are made up of OEs that have an inclusive, adjacent, and overlapping relationship, and group 6 G6 is made up of OEs that have an adjacent and overlapping relationship. In FIG. 17, in the case where groups G2 and G3, which have an adjacent relationship, are associated with each other in content, it is possible to combine the two groups G2 and G3 into a single group.
FIG. 18 shows an example of dividing the electronic document page of FIG. 16 into respective groups through grouping, and FIG. 19 shows an example of sorting the groups of FIG. 18. As described above, a group disposed to the left has priority over another group to the right along the same horizontal line, and a group in the upper direction has priority over another group in the lower direction along the same vertical line. Group 1 G1 has the highest priority since there are no OEs to the left of group 1 G1 and group 1 G1 is disposed at the uppermost position, and group 6 G6 has the lowest priority since group 6 G6 is disposed to the right of the other groups along the same horizontal line. That is, groups G3, G4, and G5, whose upper left coordinates are below the upper left coordinates of the box of group 6 G6, have higher priorities than group 6 G6 since they are disposed to the left of group 6 G6.
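The two positional rules can be sketched as a pairwise comparison. The coordinates below are invented for illustration and are not taken from the figures; the `Group` type, the fallback rule for groups that share neither a horizontal nor a vertical line, and the function names are likewise assumptions. Pairwise rules of this kind do not always define a consistent total order, so a real implementation may need additional tie-breaking.

```python
from functools import cmp_to_key
from typing import NamedTuple


class Group(NamedTuple):
    """Hypothetical group bounding box; y grows downward."""
    name: str
    left: float
    top: float
    width: float
    height: float


def ranges_overlap(a0: float, a1: float, b0: float, b1: float) -> bool:
    return a0 < b1 and b0 < a1


def compare_groups(a: Group, b: Group) -> int:
    """Return a negative number if a has priority over b, positive if b has priority."""
    same_horizontal = ranges_overlap(a.top, a.top + a.height, b.top, b.top + b.height)
    same_vertical = ranges_overlap(a.left, a.left + a.width, b.left, b.left + b.width)
    if same_horizontal and a.left != b.left:
        return -1 if a.left < b.left else 1  # left has priority over right
    if same_vertical and a.top != b.top:
        return -1 if a.top < b.top else 1  # upper has priority over lower
    # Assumed fallback: compare left edges first, then top edges.
    return (a.left > b.left) - (a.left < b.left) or (a.top > b.top) - (a.top < b.top)


groups = [
    Group("G6", 70, 30, 30, 70),
    Group("G3", 0, 40, 60, 10),
    Group("G1", 0, 0, 60, 20),
    Group("G5", 0, 80, 60, 20),
    Group("G2", 0, 25, 60, 10),
    Group("G4", 0, 55, 60, 20),
]
ordered = [g.name for g in sorted(groups, key=cmp_to_key(compare_groups))]
```

For this invented layout, a left column of five stacked groups and one tall group to their right, the comparison yields G1 through G5 from top to bottom followed by G6, which is consistent with the priorities described for FIG. 19.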
Meanwhile, FIG. 19 shows group 5 G5, which is divided into two groups. As in G5-1 and G5-2, in the case where OEs have different background colors, fonts and styles of type, they may be determined not to be associated with each other and be divided into different groups, even though they are disposed adjacent to each other.
FIG. 20 shows an example of transforming respective groups into a web document language (HTML) and combining the groups into a single web document (INDEX.HTML) in the sequence of being sorted.
When the web document WP corresponding to the source electronic document SM has been created through the above-described procedure, the web document output module 180, controlled by the electronic document transformation control module 101, accesses the processing buffer 103, extracts the corresponding web document WP, communicates with the electronic document service server 16 via the interface module 102, and uploads the extracted web document WP to the electronic document service server 16, thereby causing the web document WP to be managed in the service information DB 17 in the form of a DB.
Although, in the above description, the specific embodiments of the present invention are described and illustrated, it will be apparent that the present invention can be varied and worked in various ways by those skilled in the art.
It should be noted that the varied embodiments must not be understood independently of the technical spirit and viewpoint of the present invention, and that the varied embodiments must be considered to fall within the scope of the attached claims of the present invention.
Claims
[Claim 1]
An electronic document transformation system, comprising: an electronic document transformation control module for generally controlling a procedure of collecting source electronic documents held by an electronic document service system, and transforming the source electronic documents into web documents; an Object Entity (OE) grouping module controlled by the electronic document transformation control module, and configured to detect position characteristics and information details of Object Entities (OEs) constituting each page of each of the source electronic documents, and divide and group the OEs into a plurality of groups based on positional relationships between the OEs and the information details of the OEs; an OE sequence designation module controlled by the electronic document transformation control module, and configured to set a sequence of arrangement of the OEs based on figure characteristics of the respective OEs grouped by the OE grouping module; an OE web pattern transformation module controlled by the electronic document transformation control module, and configured to transform the OEs into a web pattern format in the sequence of arrangement set by the OE sequence designation module; and a web document creation module controlled by the electronic document transformation control module, and configured to create a new format web document, corresponding to the source electronic document, by combining together the OEs transformed into a web pattern format by the OE web pattern transformation module.
[Claim 2]
The electronic document transformation system as set forth in claim 1, further comprising a web document output module that is controlled by the electronic document transformation control module, and uploads the web documents, created by the web document creation module, to the electronic document service system, thereby causing the web documents to be organized and managed in a DB.
[Claim 3]
The electronic document transformation system as set forth in claim 1, wherein the OE grouping module comprises: an OE grouping control unit for generally controlling a procedure of detecting position characteristics of the OEs and a grouping procedure; an OE position information reading unit controlled by the OE grouping control unit and configured to read and detect position characteristics of the OEs; an OE content reading unit controlled by the OE grouping control unit and configured to read and detect information details of the OEs; an adjacent OE collection unit controlled by the OE grouping control unit, and configured to combine specific OEs of the OEs, which have a correlation in content and are disposed adjacent to each other, into a single group; an inclusive OE collection unit controlled by the OE grouping control unit, and configured to combine specific OEs of the OEs, which have a correlation in content and each include other OEs, into a single group; and an overlapping OE collection unit controlled by the OE grouping control unit, and configured to combine specific OEs of the OEs, which have a correlation in content and overlap each other, into a single group.
[Claim 4]
The electronic document transformation system as set forth in claim 1, wherein the OE sequence designation module comprises: an OE sequence designation control unit for generally controlling a procedure of detecting figure characteristics of the grouped OEs, and setting a sequence of arrangement of corresponding OEs;
an OE figure characteristic reading unit controlled by the OE sequence designation control unit, and configured to determine whether the grouped OEs have figure characteristics in which figures thereof extend from a left side to a right side or from an upper side to a lower side; and an OE sequence designation execution unit, controlled by the OE sequence designation control unit, and setting the sequence of arrangement of the OEs so that OEs having figure characteristics in which figures thereof extend from a left side to a right side have priority over OEs having figure characteristics in which figures thereof extend from an upper side to a lower side, based on results of the reading of the OE figure characteristic reading unit.
[Claim 5]
The electronic document transformation system as set forth in claim 1, wherein the OE web pattern transformation module comprises: an OE web pattern transformation unit for generally controlling a procedure of transforming a web pattern of the OEs; an OE information copy unit controlled by the OE web pattern transformation unit, and configured to read and copy information details of the OEs in the sequence of arrangement thereof set by the OE sequence designation module; an OE corresponding web tag extraction unit controlled by the OE web pattern transformation unit, and configured to load web tags corresponding to the information details of the OEs copied by the OE information copy unit; and an OE information pattern transformation engine controlled by the OE web pattern transformation unit, and transforming the OEs into a web pattern by replacing the information details of the OEs with the web tag format.
[Claim 6]
An electronic document transformation method, comprising: extracting layout information from position characteristics of OEs constituting each page of each electronic document; grouping the OEs based on positional relationships between the OEs using the layout information; sorting groups of the grouped OEs according to predetermined rules; and transforming the groups into a web document in sequence of being sorted.
[Claim 7]
The electronic document transformation method as set forth in claim 6, wherein the layout information includes position information of the respective OEs and character string information of the respective OEs.
[Claim 8]
The electronic document transformation method as set forth in claim 7, wherein the layout information includes layout figure information of the respective OEs, background information of the respective OEs, and character information of a page to which the respective OEs belong.
[Claim 9]
The electronic document transformation method as set forth in claim 6, wherein the OEs include page OEs, image OEs, text OEs, and vector graphic OEs.
[Claim 10]
The electronic document transformation method as set forth in claim 6, wherein the group of the OEs includes one or a combination of adjacent OEs including OEs disposed adjacent to each other, overlapping OEs including OEs disposed to overlap each other, and inclusive OEs including an OE including other OEs and OEs included in the OE.
[Claim 11]
The electronic document transformation method as set forth in claim 10, wherein the group comprises OEs that belong to OEs having an adjacent, overlapping, or inclusive relationship with respect to position, and that have a correlation in content.
[Claim 12]
The electronic document transformation method as set forth in claim 11, wherein the OEs are determined to have a correlation in content if each of the OEs includes more than a predetermined number of identical words, has an identical width, size or height, has an identical color, or has an identical font.
[Claim 13]
The electronic document transformation method as set forth in claim 6, wherein the predetermined rules comprise: a rule in which an OE to a left has priority over an OE to a right along an identical horizontal line; and a rule in which an OE in an upper direction has priority over an OE in a lower direction along an identical vertical line.
[Claim 14]
The electronic document transformation method as set forth in claim 6, wherein the transforming into the web document comprises: transforming the groups into web document language; and constructing a single web document file by combining the groups, transformed into the web document language, in sequence of being sorted.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20060087279 | 2006-09-11 | | |
KR10-2006-0087279 | 2006-09-11 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008032962A1 (en) | 2008-03-20 |
Family
ID=39183974
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2007/004363 WO2008032962A1 (en) | 2006-09-11 | 2007-09-10 | System and method for transforming electronic document |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR100955077B1 (en) |
WO (1) | WO2008032962A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101145895B1 (en) * | 2011-11-22 | 2012-05-16 | 이형로 | System, method and computer readable recording medium for transmitting a home correspondence from a teacher to parents by parsing the particular area of a document |
KR102087247B1 (en) * | 2018-06-27 | 2020-03-10 | 주식회사 한글과컴퓨터 | Web electric document editing apparatus for rendering drawing object and operating method thereof |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6088708A (en) * | 1997-01-31 | 2000-07-11 | Microsoft Corporation | System and method for creating an online table from a layout of objects |
US20020194227A1 (en) * | 2000-12-18 | 2002-12-19 | Siemens Corporate Research, Inc. | System for multimedia document and file processing and format conversion |
KR20030042601A (en) * | 2001-11-23 | 2003-06-02 | 스타브리지커뮤니케이션 주식회사 | Method for manufacturing electronic yellow page for web service |
KR20030075594A (en) * | 2002-03-19 | 2003-09-26 | 주식회사 인터유져 | The Web Document Transform System based on Unicode involving Korean Ancient Writings and Chinese Characters |
KR20030095026A (en) * | 2002-06-11 | 2003-12-18 | 하상호 | Apparatus for transforming source XML document into taget XML document and computer readable recording medium having XML document transformation software stored therein |
KR100522355B1 (en) * | 2005-01-24 | 2005-10-18 | 이종민 | Apparatus and method for composing examination questions |
KR20060010277A (en) * | 2004-07-27 | 2006-02-02 | 최태헌 | System and method for providing page retrieval information integrating electronic document splitting technology and specialized retrieval technology |
US7069506B2 (en) * | 2001-08-08 | 2006-06-27 | Xerox Corporation | Methods and systems for generating enhanced thumbnails |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3948512B2 (en) * | 2001-10-29 | 2007-07-25 | 日鉄鉱業株式会社 | Heat resistant filter element and manufacturing method thereof |
2007
- 2007-09-10 WO PCT/KR2007/004363 patent/WO2008032962A1/en (active, Application Filing)
- 2007-09-11 KR KR1020070092154A patent/KR100955077B1/en (not active, Expired - Fee Related)
Also Published As
Publication number | Publication date |
---|---|
KR20080023663A (en) | 2008-03-14 |
KR100955077B1 (en) | 2010-04-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 07808155; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 07808155; Country of ref document: EP; Kind code of ref document: A1 |