Detailed Description of Embodiments
The present invention is described more fully below with reference to the accompanying drawings, in which preferred embodiments of the invention are shown. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Like reference numerals refer to like elements throughout the drawings.
Referring to Fig. 1, an illustration is provided of one type of terminal and system that would benefit from the present invention. The system, terminal, method and computer program product of embodiments of the invention are described primarily in conjunction with mobile communications applications. It should be understood, however, that they can be used in conjunction with a variety of other applications, both within and outside the mobile communications industry. For example, the system, terminal, method and computer program product of embodiments of the invention can be used in conjunction with wireline and/or wireless network (e.g., Internet) applications.
As shown, the terminal 10 can include an antenna 12 for transmitting signals to and receiving signals from a base station (BS) 14. The base station is part of a cellular network that includes the elements required to operate the network, such as a mobile switching center (MSC) 16. As is well known to those skilled in the art, the cellular network may also be referred to as a Base Station/MSC/Interworking function (BMI). In operation, the MSC can route calls to and from the terminal when the terminal is making and receiving calls. The MSC can also provide a connection to landline trunks when the terminal is involved in a call. In addition, the MSC can control the forwarding of messages to and from the terminal, and can also control the forwarding of messages for the terminal to and from a messaging center, such as Short Message Service (SMS) messages to and from an SMS center (SMSC) 17.
The MSC 16 can be coupled to a data network, such as a local area network (LAN), a metropolitan area network (MAN) and/or a wide area network (WAN). The MSC can be coupled to the data network directly. In one typical embodiment, however, the MSC is coupled to a proxy 18, a gateway (GTW) or the like, and the proxy is coupled to a WAN such as the Internet 20. For example, the MSC can be coupled to a Wireless Application Protocol (WAP) GTW that is connected to the Internet. In turn, devices such as processing elements (e.g., personal computers, server computers and the like) can be coupled to the terminal 10 via the Internet. For example, as explained below, the processing elements can include one or more processing elements associated with an originating server 22 or the like, one of which is illustrated in Fig. 1.
The BS 14 can also be coupled to a serving GPRS (General Packet Radio Service) support node (SGSN) 24. As is known to those skilled in the art, the SGSN typically performs functions similar to those of the MSC 16 for packet-switched services. Like the MSC, the SGSN can be coupled to a data network such as the Internet 20. The SGSN can be coupled to the data network directly. In a more typical embodiment, however, the SGSN is coupled to a packet-switched core network, such as a GPRS core network 26. The packet-switched core network is in turn coupled to another GTW, such as a gateway GPRS support node (GGSN) 28, and the GGSN is coupled to the Internet. In addition, the GGSN can be coupled to a messaging center, such as a Multimedia Messaging Service (MMS) center 30. In this regard, the GGSN and the SGSN, like the MSC, can control the forwarding of messages, such as MMS messages. The GGSN and SGSN can also control the forwarding of messages for the terminal to and from the messaging center.
In addition, by coupling the SGSN 24 to the GPRS core network 26 and the GGSN 28, devices such as the originating server 22 can be coupled to the terminal 10 via the Internet 20, the SGSN and the GGSN. In this regard, devices such as the originating server can communicate with the terminal across the SGSN, GPRS core network and GGSN. For example, the originating server can provide content to the terminal in accordance with, for example, the Multimedia Broadcast Multicast Service (MBMS). For more information on MBMS, see Third Generation Partnership Project (3GPP) technical specification 3GPP TS 22.146, entitled: Multimedia Broadcast Multicast Service (MBMS), the contents of which are incorporated herein by reference in their entirety.
The terminal 10 can also be coupled to one or more wireless access points (APs) 32. The APs can comprise access points configured to communicate with the terminal in accordance with techniques such as radio frequency (RF), Bluetooth (BT), infrared (IrDA) or any of a number of different wireless networking techniques, including WLAN techniques. Additionally or alternatively, the terminal can be coupled to one or more user processors 34. Each user processor can comprise a computing system such as a personal computer, laptop computer or the like. In this regard, the user processors can be configured to communicate with the terminal in accordance with techniques such as RF, BT, IrDA or any of a number of different wireline or wireless communication techniques, including LAN and/or WLAN techniques. In addition, one or more of the user processors can include a removable memory capable of storing content, which can thereafter be transferred to the terminal.
The APs 32 and the user processors 34 can be coupled to the Internet 20. Like the MSC 16, the APs and user processors can be coupled to the Internet directly. In a preferred embodiment, however, the APs are coupled to the Internet indirectly via the proxy 18. As will be appreciated, by directly or indirectly connecting the terminals and the originating server 22, as well as any of a number of other devices, to the Internet, the terminals can communicate with one another, with the originating server and so on, and thereby carry out various functions of the terminal, such as transmitting data, content or the like to the originating server and/or receiving content, data or the like from the originating server. As used herein, the terms "data," "content," "information" and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the invention. Use of any of these terms should therefore not be taken to limit the spirit and scope of the invention.
Referring now to Fig. 2, a block diagram is shown of a network entity capable of operating as a terminal 10, originating server 22, proxy 18, SMSC 17, MMSC 30, GGSN 28 and/or user processor 34, in accordance with one embodiment of the invention. Although shown as separate entities, in some embodiments a single entity may support one or more of a terminal, proxy, originating server and/or user processor that are logically separated but co-located within that entity. For example, a single entity may support a logically separate, but co-located, terminal and proxy. As another example, a single entity may support a logically separate, but co-located, originating server and user processor.
As shown, the network entity can generally include a processor 36 connected to a memory 38. The processor can also be connected to at least one interface 40 or other means for transmitting and/or receiving data, content or the like. The memory can comprise volatile and/or non-volatile memory, and typically stores content, data or the like. For example, the memory typically stores software applications, instructions or the like for the processor to perform steps associated with operation of the entity in accordance with embodiments of the invention. In addition, as explained below, the memory can store a file convertor capable of transforming table data for presentation by the terminal 10. The memory can also store content transmitted from the network entity or received by the network entity, for example from another network entity.
Fig. 3 illustrates a functional diagram of a mobile station that may operate as the terminal 10, according to embodiments of the invention. It should be understood that the mobile station illustrated and described below is merely illustrative of one type of terminal that would benefit from the present invention and should therefore not be taken to limit the scope of the invention. While several embodiments of the mobile station are illustrated and described below for purposes of example, other types of mobile stations, such as portable digital assistants (PDAs), pagers, laptop computers and other types of voice and text communication systems, can readily employ the present invention.
The mobile station includes a transmitter 42, a receiver 44 and a processor, such as a controller 46, that provides signals to the transmitter and receives signals from the receiver. These signals include signaling information in accordance with the air interface standard of the applicable cellular system, as well as user speech and/or user-generated data. In this regard, the mobile station can operate with one or more air interface standards, communication protocols, modulation types and access types. More particularly, the mobile station can operate in accordance with any of a number of first-generation (1G), second-generation (2G), 2.5G and/or third-generation (3G) communication protocols or the like. For example, the mobile station can operate in accordance with the 2G wireless communication protocols IS-136 (TDMA), GSM and IS-95 (CDMA). The mobile station can also operate in accordance with 2.5G wireless communication protocols such as GPRS, Enhanced Data GSM Environment (EDGE) or the like. In addition, the mobile station can operate in accordance with any of a number of different digital broadcasting techniques, such as the DVB technique (e.g., DVB-T, ETSI standard EN 300 744). The mobile station can also operate in accordance with any of a number of different broadcast and/or multicast techniques, such as the MBMS technique (e.g., 3GPP TS 22.146). Further, the mobile station can operate in accordance with ISDB-T, DAB, ATSC techniques or the like. Some narrow-band AMPS (NAMPS) and TACS mobile stations may also benefit from embodiments of the invention, as should dual-mode or higher-mode mobile stations (e.g., digital/analog or TDMA/CDMA/analog phones).
It is understood that the controller 46 includes the circuitry required for implementing the audio and logic functions of the mobile station. For example, the controller may comprise a digital signal processor device, a microprocessor device, various analog-to-digital converters, digital-to-analog converters and other support circuits. The control and signal processing functions of the mobile station are allocated among these devices according to their respective capabilities. The controller thus also includes the functionality to convolutionally encode and interleave messages and data prior to modulation and transmission. The controller can additionally include an internal voice coder (VC) 46A and may include an internal data modem (DM) 46B. Further, the controller may include the functionality to operate one or more software applications stored in memory.
The mobile station also comprises a user interface including a conventional earphone or speaker 48, a ringer 50, a microphone 52, a display 54 and a user input interface, all of which are coupled to the controller 46. The user input interface, which allows the mobile station to receive data, can comprise any of a number of devices allowing the mobile station to receive data, such as a keypad 56, a touch display (not shown) or another input device. In embodiments including a keypad, the keypad includes the conventional numeric keys (0-9) and related keys (#, *), as well as other keys used for operating the mobile station.
The mobile station can also include one or more means for sharing data with and/or obtaining data from electronic devices, such as another terminal 10, a proxy 18, an originating server 22, an AP 32, a user processor 34 or the like, in accordance with any of a number of different wireline and/or wireless techniques. For example, the mobile station can include a radio frequency (RF) transceiver 58 and/or an infrared (IR) transceiver 60 so that the mobile station can share and/or obtain data in accordance with radio frequency and/or infrared techniques. Also, for example, the mobile station can include a Bluetooth (BT) transceiver 62 so that the mobile station can share and/or obtain data in accordance with Bluetooth transfer techniques. Although not shown, the mobile station may additionally or alternatively be capable of transmitting data to and/or receiving data from electronic devices according to a number of different wireline and/or wireless networking techniques, including LAN and/or WLAN techniques.
The mobile station can further include memory, such as a subscriber identity module (SIM) 64, a removable user identity module (R-UIM) or the like, which typically stores information elements related to a mobile subscriber. In addition to the SIM, the mobile station can include other memory. In this regard, the mobile station can include volatile memory 66, as well as other non-volatile memory 68, which can be embedded and/or removable. For example, the other non-volatile memory can comprise embedded or removable multimedia memory cards (MMCs), memory sticks, EEPROM, flash memory, a hard disk or the like.
The memories 64, 66, 68 can store any of a number of pieces of information and data used by the mobile station to implement its functions. For example, the memories can store an identifier capable of uniquely identifying the mobile station, for example to the MSC 16, such as an international mobile equipment identification (IMEI) code, an international mobile subscriber identification (IMSI) code, a mobile station integrated services digital network (MSISDN) code or the like. The memories can also store content, such as content received from the originating server 22 and/or a user processor 34. Also, for example, the memories can store one or more presentation applications, such as a conventional text viewer, audio player, video player, multimedia viewer or the like. In addition, as explained below, the memories can store a file convertor capable of transforming table data for presentation by the mobile station.
As explained in the background section, because of its limited display area, resolution and presentation capabilities, the terminal 10 typically cannot display an electronic document as the document was originally designed. For example, many terminals can present on a display (such as the display 54) only a few lines of text, together with grayscale or thumbnail-sized images or no images at all. In this regard, a terminal usually cannot present the table data of an electronic document as that table data was originally designed. Some conventional devices, for example, simply ignore the table information. Other devices truncate selected columns of the table, or linearize the table and present each cell individually, generally without regard to the natural ordering of the content of the cells in the table. Still other devices display only a small viewport onto the entire table, requiring the user of the device to move the viewport around the table, for example from left to right, to display its different portions.
Embodiments of the invention are therefore capable of transforming an electronic document for presentation by a terminal whose display has a limited display area, resolution and/or presentation capability. More particularly, embodiments of the invention can transform the table data of the electronic document for presentation by such a terminal. In contrast to conventional techniques for presenting tables on such terminals, embodiments of the invention can present the entire content of a table without truncating one or more selected columns of the table, without mechanically linearizing the table by presenting each cell without regard to the natural ordering of the content of the cells, and without displaying only a small viewport onto the entire table that the user of the device must move.
In accordance with embodiments of the invention, the table data of an electronic document generally comprises one or more tables, each of which encloses a two-dimensional matrix of content such as data, information or the like. As will be appreciated by those skilled in the art, such a two-dimensional table generally includes at least one row and at least one column. Such a table can, but typically does not, include complex structures (e.g., nested tables or images). In addition, there are typically high-level syntactic and semantic correlations between the rows and columns of such a table.
Referring now to Figs. 4 and 5, a functional block diagram of the proxy 18 and a flowchart of a method of transforming table data in an electronic document are illustrated, respectively, in accordance with one embodiment of the invention. More particularly, Fig. 4 illustrates a functional block diagram of a proxy that, in accordance with a preferred embodiment of the invention, receives an electronic document from a content source 100, such as a terminal 10, originating server 22, SMSC 17, MMSC 30, GGSN 28, user processor 34 or the like, and thereafter forwards the electronic document. Before forwarding the electronic document, however, the proxy can operate a file convertor 102, which can receive the electronic document and then transform the table data of the electronic document for presentation by the terminal 10. The proxy can subsequently forward the electronic document, including the transformed table data, to the terminal for presentation by the terminal or, more particularly, by a display of the terminal (e.g., the display 54).
Although the proxy 18 is shown and described herein as operating the file convertor 102, it should be understood that the file convertor can be operated by any of a number of different network entities within the system, including, for example, the content source 100 or the terminal 10 itself. In this regard, the file convertor can be implemented on a single network entity, or portions of the file convertor can be implemented on more than one network entity. In addition, as described herein, the file convertor generally comprises software capable of being stored in memory (e.g., the memory 38) and operated by a processor (e.g., the processor 36). It should be understood, however, that the file convertor can alternatively comprise firmware or hardware without departing from the spirit and scope of the invention. Further, in addition to transforming tables so that the terminal can present the electronic document, the file convertor can perform other operations on the electronic document. For more information on various such operations, see, for example, U.S. Patent Application No. 09/851,404, entitled: Reorganizing Content of an Electronic Document, filed May 8, 2001 and published March 6, 2003 as U.S. Patent Application Publication No. 2003/0046318, the contents of which are incorporated herein by reference in their entirety.
As shown in block 104 of Fig. 5, the method of transforming table data comprising at least one table includes the file convertor 102 receiving an electronic document from the content source 100. In accordance with embodiments of the invention, the electronic document includes one or more tables, each table including at least one row of content and at least one column of content. The electronic document can have any of a number of different formats. For example, the electronic document can comprise a HyperText Markup Language (HTML) document, an e-mail document, a Portable Document Format (PDF) document, a PostScript document, an ASCII text format (TXT) document, an Extensible Markup Language (XML) document, a Microsoft Word (DOC) document, a Microsoft Excel (XLS) document or the like.
After receiving the electronic document, the file convertor 102 can convert the electronic document into an intermediate data structure representing the electronic document, as shown in block 106. The file convertor can convert the electronic document in any of a number of different manners, and can thereby produce any of a number of different intermediate data structures capable of representing the electronic document. In this regard, the file convertor can convert the electronic document into an intermediate data structure comprising a common, internal tree-based representation of the electronic document. For example, the file convertor can convert the electronic document into an Extensible HyperText Markup Language (XHTML) Document Object Model (DOM) representation of the electronic document.
The file convertor 102 can convert the electronic document into an XHTML DOM data structure in any of a number of different manners. In one embodiment, for example, the file convertor converts the electronic document by converting the electronic document into an HTML document (if it is not already an HTML document), converting the HTML document into an XHTML document, and thereafter converting the XHTML document into an XHTML DOM data structure. To convert the electronic document into each of these formats and so produce the XHTML DOM data structure, the file convertor can perform any of a number of known conversion routines.
Alternatively, the file convertor 102 can communicate with another software, hardware or firmware module operating on the proxy 18 or on another network entity, where that other module can perform one or more conversion routines to help the file convertor convert the electronic document into the XHTML DOM data structure. For example, the file convertor can communicate with the wvWare software package, developed under the wvWare project, to convert a DOC document into an HTML document. Also, for example, the file convertor can communicate with the xlHtml software package, developed under the Chicago project, to convert an XLS document into an HTML document. Further, for example, the file convertor can communicate with the HTML Tidy software package, developed under the Tidy project, to convert an HTML document into an XHTML document.
As yet another example, the file convertor 102 can communicate with an XML parser, such as the Xerces XML parser software package (developed under the Apache XML project), to convert an XHTML document into an XHTML DOM data structure. Irrespective of how the file convertor converts the electronic document into a data structure representing the electronic document, the data structure can, but need not, be stored in volatile memory, as opposed to the non-volatile memory in which the file convertor and the electronic document typically reside. As will be appreciated, the data structure generally includes a number of nodes corresponding to the tags of the converted XHTML document, and the tags of the converted XHTML document can be based upon the tags of the original HTML document. For example, consider the following source code of an HTML document:
<html>
<body>
  <img src="http://www.domain.com/img/image.gif"/>
  <table>
    <tr>
      <td>This is the first cell</td>
      <td><img src="http://www.domain.com/img/image2.gif"/></td>
      <td>This is the third cell</td>
    </tr>
    <tr>
      <td>This is the fourth cell</td>
      <td><img src="http://www.domain.com/img/image3.gif"/></td>
      <td>This is the sixth cell</td>
    </tr>
  </table>
  <small>
    A text block
  </small>
</body>
</html>
As shown in the above source code, the HTML document includes one table of two rows and three columns, where the second column of each row includes an image. The HTML document also includes a third image and a text block in a small font. From such an HTML document, the file convertor can produce an XHTML DOM data structure such as that shown in Fig. 6.
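The conversion of such a document into a tree of nodes can be sketched in a few lines of Python. This is only an illustrative sketch, not the converter described here: it uses the standard-library `xml.dom.minidom` parser rather than Xerces, and the helper name `outline` is hypothetical.

```python
from xml.dom import minidom

# Well-formed XHTML equivalent of the example document above.
XHTML = """<html>
<body>
  <img src="http://www.domain.com/img/image.gif"/>
  <table>
    <tr>
      <td>This is the first cell</td>
      <td><img src="http://www.domain.com/img/image2.gif"/></td>
      <td>This is the third cell</td>
    </tr>
    <tr>
      <td>This is the fourth cell</td>
      <td><img src="http://www.domain.com/img/image3.gif"/></td>
      <td>This is the sixth cell</td>
    </tr>
  </table>
  <small>A text block</small>
</body>
</html>"""


def outline(node, depth=0):
    """Return the element nodes of the DOM as indented tag names (hypothetical helper)."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        depth += 1
    for child in node.childNodes:
        lines.extend(outline(child, depth))
    return lines


dom = minidom.parseString(XHTML)           # build the DOM tree
tree = outline(dom.documentElement)        # one entry per element node
rows = dom.getElementsByTagName("tr")      # the table's row nodes
cells_per_row = [len(r.getElementsByTagName("td")) for r in rows]
image_count = len(dom.getElementsByTagName("img"))
```

Each tag of the XHTML document becomes one node of the tree, so the table appears as a `table` node with two `tr` children, each holding three `td` children.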
Irrespective of how the file convertor 102 converts the electronic document into the intermediate data structure representing the electronic document, the file convertor can identify any "implicit" tables in the intermediate data structure, either while or after converting the electronic document, as shown in block 108 of Fig. 5. As will be appreciated, when the formatting of the electronic document otherwise identifies a table, such as by tags in an HTML document (i.e., "<table></table>"), the file convertor typically does not need to specifically identify that table. In other instances, however, an electronic document may be designed to present information in tabular form without including formatting that designates the information as a table. The file convertor can identify such "implicit" tables, which typically are not represented by table nodes in the data structure after the file convertor converts the electronic document into the data structure. Thereafter, the file convertor can re-form the data structure to include table nodes for such tables.
For example, consider the following ASCII-formatted text within a TXT document:
Activity          Calories/hour
Aerobic exercise  660
Basketball        550
Running           925
Tennis            450
Visually, the above ASCII-formatted text clearly represents a table. The TXT format, however, does not include a mechanism (such as tags) for explicitly indicating that the text is a table. Depending upon the technique the file convertor employs to convert the TXT document into the data structure, the file convertor may recognize such a text block as a table. In various other instances, however, the file convertor 102 may not recognize the text block as a table. In such instances, the file convertor can produce a data structure such as that shown in Fig. 7A. As shown in Fig. 7A, the data structure identifies the text as a single block of unstructured text. Thus, where the electronic document includes an implicit table that is not identified by the file convertor during conversion of the electronic document, or that is otherwise not represented as a table node in the data structure, the file convertor can also identify the implicit table.
The file convertor 102 can identify implicit tables in any of a number of known manners, including special-purpose parsers and machine learning techniques. For a number of known techniques for identifying implicit tables, see, for example, Jianying Hu et al., A System for Understanding and Reformulating Tables, FOURTH IAPR WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS (2000); Matthew Hurst & Shona Douglas, Layout and Language: Preliminary Investigations in Recognizing the Structure of Tables, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR) 1043-1047 (1997); and Hwee Tou Ng et al., Learning to Recognize Tables in Free Text, PROCEEDINGS OF THE 37TH ANNUAL MEETING OF THE ACL 443-450 (1999), the contents of all of which are incorporated herein by reference in their entirety.
Irrespective of how the file convertor 102 identifies implicit tables, if the file convertor identifies one or more implicit tables in the electronic document, the file convertor can produce or re-form the data structure so that it identifies the text as a table. For example, the file convertor can identify a text block as a table and, while converting the electronic document into an HTML document, produce an HTML document formatted to identify the text block as a table, such as by enclosing the text block in the appropriate HTML tags (e.g., <table>, <tr>, <td>, etc.). Continuing the above example of the text block in the TXT document, the file convertor can identify the text block as an implicit table, and then produce or re-form the data structure to include the implicit table in the form of an explicit table, as shown in Fig. 7B.
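As a rough illustration of making an implicit table explicit, the following Python sketch treats a text block as a table when every non-empty line splits into the same number of columns. This is a deliberately naive heuristic chosen for the example, not one of the techniques cited above; the function names and the rule that columns are separated by a tab or by two or more spaces are assumptions.

```python
import re
from html import escape


def split_columns(line):
    """Split a line on tabs or runs of two or more whitespace characters (assumed delimiter)."""
    return [c for c in re.split(r"\s{2,}|\t", line.strip()) if c]


def detect_implicit_table(text, min_rows=2, min_cols=2):
    """Return the rows of an implicit table, or None if the text does not look tabular."""
    rows = [split_columns(line) for line in text.splitlines() if line.strip()]
    if len(rows) < min_rows:
        return None
    width = len(rows[0])
    if width < min_cols or any(len(r) != width for r in rows):
        return None
    return rows


def to_html_table(rows):
    """Wrap the detected rows in explicit <table>, <tr> and <td> tags."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in r) + "</tr>" for r in rows
    )
    return f"<table>{body}</table>"


SAMPLE = """Activity          Calories/hour
Aerobic exercise  660
Basketball        550
Running           925
Tennis            450"""

rows = detect_implicit_table(SAMPLE)
html_table = to_html_table(rows)
```

Ordinary prose yields lines with a single column, so `detect_implicit_table` returns `None` for it and the text is left as an unstructured block.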
After the file convertor 102 produces the data structure representation of the electronic document, including any re-formation of implicit tables as explicit tables, the file convertor can rearrange the tables of the electronic document so that they have a predetermined ordering, as shown in block 110. In this regard, the file convertor can identify the natural ordering of the cells in each table of the electronic document, that is, whether the table is ordered primarily from left to right or primarily from top to bottom. As will be appreciated by those skilled in the art, a table whose content is arranged by rows typically has a row-major ordering. Conversely, a table whose content is arranged by columns typically has a column-major ordering.
After identifying the natural ordering of the cells in the tables of the electronic document, the file convertor 102 can rearrange the tables so that all of them have the predetermined ordering. In one typical embodiment, for example, the file convertor rearranges the tables so that all tables have a row-major ordering. In such instances, for those tables having a column-major ordering, the file convertor can transpose the table so that its natural ordering becomes row-major. It should be understood, however, that the file convertor can alternatively rearrange the tables so that they have a column-major ordering, with the other operations of the file convertor adjusted accordingly.
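The transposition step can be sketched as follows, assuming a table is held as a list of rows and that its natural ordering has already been identified. The function names and the ordering labels "row-major" and "column-major" are illustrative assumptions.

```python
def transpose(table):
    """Swap rows and columns of a rectangular table given as a list of rows."""
    return [list(column) for column in zip(*table)]


def normalize_to_row_major(table, ordering):
    """Return a row-major copy of the table; `ordering` is its identified natural ordering."""
    if ordering == "column-major":
        return transpose(table)
    return [list(row) for row in table]


# A column-major table: each column is one record (activity, calories per hour).
column_major = [
    ["Aerobic exercise", "Basketball", "Running", "Tennis"],
    ["660", "550", "925", "450"],
]
row_major = normalize_to_row_major(column_major, "column-major")
```

After normalization each row holds one record, so later steps can assume a single ordering for every table.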
The file convertor 102 can identify the natural ordering of a table in any of a number of different manners. For example, the file convertor can be configured to search each table for patterns of similarity between adjacent cells. In this regard, consider a column-major table having a second row that includes an image in each cell. The fact that all of the cells in this row are similar to one another suggests (typically, but not always) that the table has a column-major ordering. Alternatively, consider a row-major table having a second column that includes an image in each cell and a fifth column that includes a number followed by "%" in every row. The fact that the cells within a column are more similar to one another than the cells within a row likewise suggests (although not conclusively) that the table has a row-major ordering.
In one typical embodiment, the file convertor 102 identifies the natural ordering of a table by representing the information in each cell as a feature vector. For example, each cell can be represented by a binary feature vector of a specified length N, where each bit in the vector corresponds to a decision (a yes/no question) about a feature of the information in the respective cell. As will be appreciated, the feature vector representation is a way of encoding the structural characteristics of each cell of the table. Cells that look similar therefore typically have similar feature vector representations.
The feature vector can include any number (N) of bits, each of which can correspond to any of a number of different decisions. For example, the feature vector can include 13 bits (i.e., N=13), each bit corresponding to a decision as follows:
Bit 1 is "1" if and only if the cell contains an image
Bit 2 is "1" if and only if the cell contains a number
Bit 3 is "1" if and only if the cell contains a hyperlink
Bit 4 is "1" if and only if the cell contains bold text
Bit 5 is "1" if and only if the cell contains italic text
Bit 6 is "1" if and only if the cell contains punctuation
Bit 7 is "1" if and only if the cell contains 0 to 5 characters
Bit 8 is "1" if and only if the cell contains 6 to 10 characters
Bit 9 is "1" if and only if the cell contains 11 to 15 characters
Bit 10 is "1" if and only if the cell contains 16 to 20 characters
Bit 11 is "1" if and only if the cell contains 21 to 25 characters
Bit 12 is "1" if and only if the cell contains 26 to 30 characters
Bit 13 is "1" if and only if the cell contains 31 or more characters
Referring briefly now to Figs. 8A and 8B, these figures illustrate a table and the feature vectors of the cells of that table, respectively. As shown, for this table, the cell in the first row, first column contains a space. That cell can therefore be represented by the feature vector 0000001000000, where the only "1" bit (bit 7) corresponds to a cell containing 0 to 5 characters. In contrast, the cell in the second row, first column contains the phrase "equity reserve SMLine", which can be represented by the feature vector 0000000000100. Here the only "1" bit (bit 11) corresponds to a cell containing 21 to 25 characters (the phrase comprises 21 characters, including spaces).
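The 13 decisions listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Cell` class, its field names and the use of `string.punctuation` to decide what counts as punctuation are hypothetical and are not taken from the described embodiment.

```python
import string
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical, simplified cell model (field names are assumptions)."""
    text: str = ""
    has_image: bool = False
    has_link: bool = False
    has_bold: bool = False
    has_italic: bool = False


def feature_vector(cell: Cell) -> str:
    """Return the 13-bit feature vector of a cell as a string of '0'/'1'."""
    n = len(cell.text)
    decisions = [
        cell.has_image,                                      # bit 1: image
        any(ch.isdigit() for ch in cell.text),               # bit 2: number
        cell.has_link,                                       # bit 3: hyperlink
        cell.has_bold,                                       # bit 4: bold text
        cell.has_italic,                                     # bit 5: italic text
        any(ch in string.punctuation for ch in cell.text),   # bit 6: punctuation
        0 <= n <= 5,                                         # bit 7
        6 <= n <= 10,                                        # bit 8
        11 <= n <= 15,                                       # bit 9
        16 <= n <= 20,                                       # bit 10
        21 <= n <= 25,                                       # bit 11
        26 <= n <= 30,                                       # bit 12
        n >= 31,                                             # bit 13
    ]
    return "".join("1" if d else "0" for d in decisions)


# A cell holding a single space sets only bit 7 (0 to 5 characters).
print(feature_vector(Cell(text=" ")))  # 0000001000000
```

Exactly one of bits 7 to 13 is set for any cell, because the length ranges do not overlap and together cover every length.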
After the information in each cell has been represented as a feature vector, the file convertor 102 can determine the similarity between cells in the table from the feature vectors. The file convertor can determine the similarity between adjacent cells in any of a number of different ways, for example by determining the "Manhattan" distance between adjacent cells. In this regard, the distance between two adjacent cells can be determined as the number of bits that differ between the feature vectors of the respective cells. For example, the distance between 0111 and 1110 is 2. It follows that the smaller the distance between adjacent cells, the more similar the respective cells are. For the distance measures between the cells of the table of Fig. 8A, determined in accordance with embodiments of the present invention, see Fig. 8B.
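For binary vectors, the distance just described is a count of differing bit positions. A minimal sketch, assuming the feature vectors are held as equal-length strings of '0' and '1' (the function name `cell_distance` is hypothetical):

```python
def cell_distance(a: str, b: str) -> int:
    """Number of bit positions in which two feature vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(cell_distance("0111", "1110"))  # 2, the example given in the text
```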
It will also be appreciated that, in various cases, the decisions corresponding to one or more bits of the feature vector may be more indicative of cell similarity than others. For example, bit 1 of the 13-bit feature vector above may be more indicative of cell similarity than bit 10. Thus, in accordance with embodiments of the present invention, one or more bits can have an associated weight, so that when the respective bits of adjacent cells differ, the associated weight is included in the sum that forms the distance between the two feature vectors.
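The weighted variant can be sketched by summing a per-bit weight wherever the two vectors differ. The function name and the weight values in the usage line are assumptions chosen only for illustration; the text does not specify any particular weights.

```python
from typing import Sequence


def weighted_distance(a: str, b: str, weights: Sequence[float]) -> float:
    """Sum the weights of the bit positions in which two vectors differ."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


# Hypothetical weights: the first bit counts twice as much as the others.
print(weighted_distance("0111", "1110", [2.0, 1.0, 1.0, 1.0]))  # 3.0
```

With every weight equal to 1, this reduces to the unweighted bit-difference count.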
Irrespective of exactly how the similarity measure, or distance, between adjacent cells of the table is determined, after determining the distances the file convertor 102 can determine whether the table has a row-major ordering or a column-major ordering. More particularly, the file convertor can determine the ordering of the table from the distances along the rows and along the columns of the table. For example, the file convertor can determine the average distance S_H between pairs of adjacent cells in the same row of the table, and the average distance S_V between pairs of adjacent cells in the same column of the table. A smaller average distance between adjacent cells in the same row generally indicates that the rows of the table contain similar cells, which suggests that the table may have a column-major ordering. Conversely, a smaller average distance between adjacent cells in the same column generally indicates that the columns of the table contain similar cells, which suggests that the table may have a row-major ordering. Expressed symbolically, the file convertor can determine S_H and S_V according to the following formulas:
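Formulas (1) and (2) are not reproduced in this text. The following is a reconstruction, not the original notation. It assumes that i indexes rows, j indexes columns, Δ_c is the distance between two cells, and each average is taken over all pairs of adjacent cells in the respective direction.

```latex
S_H = \frac{1}{R\,(C-1)} \sum_{i=1}^{R} \sum_{j=1}^{C-1} \Delta_c\left(x_{i,j},\, x_{i,j+1}\right) \tag{1}

S_V = \frac{1}{(R-1)\,C} \sum_{i=1}^{R-1} \sum_{j=1}^{C} \Delta_c\left(x_{i,j},\, x_{i+1,j}\right) \tag{2}
```

Under this reading, a table of R = 5 rows and C = 3 columns has 10 horizontally adjacent pairs and 12 vertically adjacent pairs, which agrees with the denominators of the values 24/10 and 7/12 quoted for Fig. 8A.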
As shown in formulas (1) and (2) above, i = 1, ..., R indexes the rows of the table and j = 1, ..., C indexes the columns. x_{i,j} and x_{i+1,j} denote adjacent cells of the table, and Δ_c denotes the distance between the respective adjacent cells. For the table of Fig. 8A, and based on the distances shown in Fig. 8B, S_H = 2.4 (i.e., 24/10) and S_V = 0.583 (i.e., 7/12).
Once the file convertor 102 has determined the average distances S_H and S_V between pairs of adjacent cells in the same row and in the same column, respectively, the file convertor can determine whether the table has a row-major or a column-major ordering. For example, the file convertor can determine the ordering by comparing the average distances. If the average distance S_V between pairs of adjacent cells in the same column is greater than the average distance S_H between pairs of adjacent cells in the same row (i.e., S_V > S_H), the file convertor can determine that the table has a column-major ordering; otherwise, the table has a row-major ordering. To rearrange the tables of the electronic document so that they have a predetermined ordering, for example a row-major ordering, the file convertor can then transpose each table having a column-major ordering, so that all tables have a row-major ordering. Expressed symbolically, a table of the electronic document can be transposed by reorganizing the DOM tree so that each cell x_{i,j} is exchanged with the cell x_{j,i}.
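The comparison and the transposition can be sketched together as follows. The sketch makes several assumptions: the table is a rectangular grid held as nested lists (no spanning cells), the feature vectors are strings of '0' and '1', and all function names are hypothetical. The text describes transposing by reorganizing the DOM tree; the nested-list transpose here only stands in for that step.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of differing bit positions between two feature vectors."""
    return sum(x != y for x, y in zip(a, b))


def average_distances(vectors: List[List[str]]) -> Tuple[float, float]:
    """Return (S_H, S_V) for a rectangular grid of feature vectors."""
    rows, cols = len(vectors), len(vectors[0])
    horizontal = [hamming(vectors[i][j], vectors[i][j + 1])
                  for i in range(rows) for j in range(cols - 1)]
    vertical = [hamming(vectors[i][j], vectors[i + 1][j])
                for i in range(rows - 1) for j in range(cols)]
    s_h = sum(horizontal) / len(horizontal) if horizontal else 0.0
    s_v = sum(vertical) / len(vertical) if vertical else 0.0
    return s_h, s_v


def is_column_major(vectors: List[List[str]]) -> bool:
    """Apply the rule S_V > S_H: True means column-major ordering."""
    s_h, s_v = average_distances(vectors)
    return s_v > s_h


def transpose(table: List[list]) -> List[list]:
    """Exchange cell (i, j) with cell (j, i)."""
    return [list(column) for column in zip(*table)]
```

A grid whose columns are internally uniform gives S_V = 0 and is left alone, whereas a grid whose rows are internally uniform gives S_H = 0 and would be transposed.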
It will be appreciated that, in various cases, S_H and S_V may be very similar, which can result in a table being transposed into an ordering opposite to the one intended. Thus, in the comparison, the file convertor 102 can weight S_H or S_V so that the file convertor is more or less likely to transpose a given table. For example, the file convertor can multiply S_H by a bias value C (distinct from the number of columns C above) that is equal to or greater than 1. Then, when S_H and S_V are compared (i.e., S_V > C*S_H), the file convertor, which transposes column-major tables, becomes less likely to transpose a table as the bias value increases when S_H and S_V have similar values.
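The biased comparison amounts to a single inequality. In the sketch below the function name is hypothetical and the bias value used in the second call is only an illustrative assumption.

```python
def should_transpose(s_h: float, s_v: float, bias: float = 1.0) -> bool:
    """Transpose only when S_V exceeds S_H scaled by a bias of at least 1."""
    if bias < 1.0:
        raise ValueError("bias must be equal to or greater than 1")
    return s_v > bias * s_h


# With nearly equal averages, a larger bias suppresses the transposition.
print(should_transpose(1.0, 1.1))            # True
print(should_transpose(1.0, 1.1, bias=1.5))  # False
```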
After rearranging the table to have the predetermined ordering, the file convertor 102 can linearize the table into a one-dimensional sequence of the contents of the cells of the table, where the linearized content is no longer contained in a table. Advantageously, however, before linearizing the table, the file convertor can localize any labels contained in the table, as shown in block 112 of Fig. 5. It will be appreciated that many tables contain cells that serve as labels for one or more other cells. In this regard, when a label relates to a single row or a single column of cells, the label is commonly referred to as a "direct" label. In contrast, when a label relates to more than one row or column, the label is commonly referred to as a "spanning" label. For example, as shown in the table of Fig. 9, labels such as "YR", "TM", "GP" and "G" are direct labels, whereas the labels "Regular Season" and "Post Season" are spanning labels.
It will be appreciated that linearizing a row-major table having column labels, such as that shown in Fig. 9, can separate the column labels from the cells to which they relate. As a result, the rendered table is difficult to understand on a terminal 10 with a limited display area. In this regard, see Figs. 10A and 10B, which illustrate portions of the table of Fig. 9 on a terminal display, where the table has been subjected to direct linearization. As can be seen, the column labels are separated from the cells to which they relate, making the table difficult to understand. More particularly, as shown, the row containing the direct labels (e.g., beginning with "YR", "TM", "GP", "G", etc.) is visually separated from its associated data, that is, from the cells in the respective columns beneath the respective labels.
To avoid separating column labels from the data to which they relate, the file convertor 102 of a preferred embodiment localizes the labels (if any) within the tables of the electronic document. More particularly, the file convertor rearranges the labels so that each label is placed near the data it describes. In this way, the file convertor can help the user understand the subsequently rendered table.
The file convertor 102 can localize the labels in a table of an electronic document in any of a number of different ways. One particularly advantageous technique for localizing labels is described below with reference to Fig. 11. As described below, the file convertor localizes the labels of a row-major table having at least one row that contains at least one label. It should be understood, however, that the file convertor can localize labels in a number of other situations. For example, the file convertor can localize the labels of a column-major table having at least one column that contains at least one label. In that case, the process of localizing labels can proceed as described here, with row-based operations generally performed on columns, and vice versa. It should also be understood that the file convertor need not localize labels in every table, particularly in tables that do not contain any labels. Typically, when the file convertor has rearranged a table into a row-major table and the table does not contain any row having labels, the file convertor does not localize labels either.
According to one technique, the file convertor 102 can localize labels by first identifying each row that contains a spanning label, i.e., a label that spans more than one column, as shown in block 118 of Fig. 11. More particularly, the file convertor can examine the cells of a predetermined number of rows (e.g., three rows), identified by the tags for table data cells (e.g., <td>) or table header cells (e.g., <th>), for cells that also have a multi-column attribute (e.g., "colspan"). It will be appreciated that this attribute generally indicates a cell that spans more than one column of the table. For example, in the table of Fig. 9, the first row contains two cells (i.e., "Regular Season" and "Post Season") that each span more than one column.
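The search for rows with multi-column cells can be sketched with the `html.parser` module of the Python standard library. The class and function names are hypothetical, the sketch assumes a single table without nested tables, and the markup in the usage example is invented rather than taken from Fig. 9.

```python
from html.parser import HTMLParser
from typing import List


class SpanningRowFinder(HTMLParser):
    """Collect 0-based indices of rows holding a cell with colspan > 1."""

    def __init__(self, max_rows: int = 3) -> None:
        super().__init__()
        self.max_rows = max_rows          # only the first rows are examined
        self.row = -1                     # index of the current <tr>
        self.spanning_rows: List[int] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
        elif tag in ("td", "th") and 0 <= self.row < self.max_rows:
            colspan = dict(attrs).get("colspan") or "1"
            if (colspan.isdigit() and int(colspan) > 1
                    and self.row not in self.spanning_rows):
                self.spanning_rows.append(self.row)


def find_spanning_rows(html: str, max_rows: int = 3) -> List[int]:
    finder = SpanningRowFinder(max_rows)
    finder.feed(html)
    return finder.spanning_rows


sample = (
    "<table>"
    "<tr><th colspan='2'>Regular Season</th><th colspan='2'>Post Season</th></tr>"
    "<tr><th>GP</th><th>G</th><th>GP</th><th>G</th></tr>"
    "<tr><td>80</td><td>43</td><td>3</td><td>1</td></tr>"
    "</table>"
)
print(find_spanning_rows(sample))  # [0]
```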
After identifying the rows that contain spanning labels, the file convertor 102 can identify any rows that contain direct labels, i.e., labels that each relate to only one column, as shown in block 120. As before, the file convertor can examine the cells of a predetermined number of rows (e.g., three rows) for any direct labels. The file convertor can locate direct labels in any of a number of different ways. For example, in one embodiment, the file convertor locates direct labels based on a similarity measure (e.g., distance) between adjacent rows, where the similarity measure between adjacent rows is based on the similarity measures between the cells of the adjacent rows, as described above. In this regard, those skilled in the art will recognize that a row of direct column labels typically differs from the remaining rows of the table in a number of respects. For example, the second row of Fig. 9, which consists of direct column labels, is made up entirely of text, whereas each row beneath the second row is made up almost entirely of numbers.
To locate the direct labels of a table, the file convertor 102 can determine the similarity, or distance, between adjacent rows of the table, where the similarity is determined from the similarity of the vertically adjacent cells in the respective columns. Expressed symbolically, the file convertor can determine the distance Δ_R between adjacent rows of the table according to equation (3):
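Equation (3) is not reproduced in this text. The following is a reconstruction, not the original notation. It assumes that the row distance is the sum, over all columns, of the distances between vertically adjacent cells; the original may instead normalize this sum.

```latex
\Delta_R(i,\, i+1) = \sum_{j=1}^{C} \Delta_c\left(x_{i,j},\, x_{i+1,j}\right), \qquad i = m, \ldots, R-1 \tag{3}
```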
In equation (3), m denotes the first row of the table that does not contain a spanning label.
After determining the distances between adjacent rows, the file convertor 102 can identify any pair of adjacent rows that is significantly more dissimilar than any subsequent pair of adjacent rows, where the significant dissimilarity indicates that the first row of the respective pair contains direct labels. In other words, the file convertor can determine whether the distance between any pair of adjacent rows is significantly greater than the distance between any subsequent pair of adjacent rows, and then identify the first row of any such pair as containing direct labels. More particularly, for example, the file convertor can compare each Δ_R(i, i+1) with the distances between each subsequent pair of adjacent rows, beginning with Δ_R(i+1, i+2). Thereafter, for the first Δ_R(i, i+1) that is at least a specified percentage (e.g., 50%) greater than the distance between any subsequent pair of adjacent rows, row i can be identified as the row containing direct labels.
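The threshold test can be sketched as a scan over the list of row distances. This sketch reads "any subsequent pair" as "every subsequent pair", which is an interpretation rather than something the text settles. The function name and the distance values in the usage line are hypothetical.

```python
from typing import List, Optional


def find_direct_label_row(row_distances: List[float],
                          percentage: float = 50.0) -> Optional[int]:
    """Return the offset k of the first row pair (k, k+1) whose distance
    exceeds every later pair's distance by at least `percentage` percent.

    row_distances[k] is the distance between rows k and k+1, counted from
    the first row that holds no spanning label. Returns None if no pair
    stands out.
    """
    factor = 1.0 + percentage / 100.0
    for k, distance in enumerate(row_distances[:-1]):
        later = row_distances[k + 1:]
        if distance > 0 and all(distance >= factor * other for other in later):
            return k
    return None


# Hypothetical distances: the first pair (label row vs. first data row)
# is far more dissimilar than the pairs of data rows that follow.
print(find_direct_label_row([9, 2, 1, 2]))  # 0
```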
Once the file convertor 102 has identified the rows containing spanning labels and/or direct labels, the file convertor can generate a label tree representation of the table, for example according to any of a number of known techniques, as shown in block 122 of Fig. 11. More particularly, for example, the file convertor can generate another data structure for use in localizing labels, where this data structure reflects the association of the spanning labels and direct column labels with the individual cells of the table. This data structure, which can be referred to as a "label tree", captures the structure of the previously identified spanning labels and direct column labels. In this regard, Fig. 12 illustrates a label tree representation of the table of Fig. 9 according to one exemplary embodiment.
After the file convertor 102 has generated the label tree representation of the table, the file convertor can localize the labels of the table. As shown in block 124, the file convertor can localize the labels by traversing the label tree and the data structure representation to determine the placement of the labels in the linearized representation of the table. In this way, the table can subsequently be linearized so as to include the labels near the respective table cells. It will be appreciated that the label tree representation and the data structure can be traversed in any of a number of different ways. For example, in one embodiment, the file convertor examines each cell of each column (j = 1, ..., C) of each row (i = 1, ..., R), excluding any cell without content (i.e., a blank cell) and any cell containing a spanning label or a direct label.
For each cell of each column of each row, excluding blank cells and label cells, the file convertor defines a label path x_k = {x_1, x_2, ..., x_L} from the root node of the label tree representation of the table (see Fig. 12) to the leaf L of a branch, where each leaf is associated with a respective column of the table. In addition, the file convertor can define a label string s_{i,j}, which is initialized to a zero (empty) value for each cell. For each node (k = 1, ..., L) of each label path in the label tree representation of the table, if x_k contains a spanning label and x_{k+1} is the first node beneath x_k, the file convertor 102 can add a new line, followed by the spanning label, another new line and at least one whitespace character (e.g., a tab), to the label string s_{i,j}. If, on the other hand, x_k contains a direct label, the file convertor can add the direct label, followed by a separator such as a colon, to the label string s_{i,j}. Then, after examining the table cells and generating the label strings s_{i,j}, the file convertor can attach each label string to the corresponding cell x_{i,j}, for example during linearization of the table, as described below.
For example, consider the table of Fig. 9 and the label tree representation of Fig. 12, where the table contains a number of spanning labels and direct labels. In this case, examination of the cells of the table can begin with cell (3, 1), which contains the value "84-85". Following the label tree representation from the root node to the leaf associated with that cell, the file convertor 102 can define the label path x_k = {x_1 = "YR"}. In addition, the file convertor can define the label string s_{3,1}, which is initialized to zero. Then, because x_1 contains a direct label, the file convertor can add the direct label, followed by a colon, to the label string, so that s_{3,1} = {"YR:"}. The same process can then be repeated for cell (3, 2), where s_{3,2} = {"TM:"}. For cell (3, 3), on the other hand, the file convertor can define the label path x_k = {x_1 = "Regular Season"; x_2 = "GP"}. After the label string s_{3,3} is initialized to zero, because x_1 contains a spanning label and x_2 (i.e., x_{k+1}) is the first node beneath x_1, the file convertor can add a new line, followed by the spanning label, another new line and a tab, to the label string s_{3,3}. Thereafter, for x_2, the file convertor can further add the direct label and a colon to the label string s_{3,3}. Thus, when the terminal 10 renders the label string, s_{3,3} can appear as follows: s_{3,3} = {\n; "Regular Season"; \n; \t; "GP:"}, where "\n" and "\t" denote a new line and a tab, respectively.
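The construction of a label string from a label path can be sketched as follows. The `PathNode` class and its `first_child` flag are hypothetical: they stand in for whatever the label tree records about a node being the first node beneath its parent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathNode:
    """One node on a label path (hypothetical representation)."""
    text: str
    spanning: bool = False     # True for a spanning label, False for a direct label
    first_child: bool = False  # True if this is the first node beneath its parent


def build_label_string(path: List[PathNode]) -> str:
    """Build the label string for one cell from its label path."""
    label = ""  # the label string starts out empty
    for k, node in enumerate(path):
        if node.spanning:
            following = path[k + 1] if k + 1 < len(path) else None
            # Emit the spanning label only above the first column it covers.
            if following is not None and following.first_child:
                label += "\n" + node.text + "\n" + "\t"
        else:
            label += node.text + ":"
    return label


print(repr(build_label_string([PathNode("YR")])))  # 'YR:'
print(repr(build_label_string([
    PathNode("Regular Season", spanning=True),
    PathNode("GP", first_child=True),
])))  # '\nRegular Season\n\tGP:'
```

For a later column under the same spanning label (for example a path ending in "G" that is not the first node beneath "Regular Season"), only the direct label and colon are emitted, so the spanning label appears once per group.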
Referring again to Fig. 5, once the file convertor 102 has determined the label strings s_{i,j}, and thereby localized the labels, the file convertor can linearize the table into a one-dimensional sequence of cells, as shown in block 114. The file convertor can linearize the table in any of a number of different ways. For example, in one embodiment, the file convertor can linearize the table by replacing the table with one or more separate paragraphs (e.g., <p>), where each paragraph contains one or more cells of the original table. In this regard, the file convertor can traverse a table having a row-major ordering, typically starting at the upper-left cell and ending at the lower-right cell, and transfer the content of each cell into a corresponding paragraph of the linearized representation of the table, particularly when the file convertor has arranged the tables of the electronic document to have a row-major ordering.
In one embodiment, the file convertor 102 can linearize a table by examining the data structure representation of the table. For example, consider the data structure representation, shown in Fig. 13A, of an electronic document containing a table. For each table node, the file convertor can create a new temporary node (denoted node "X" in Fig. 13B) in the data structure at the same level as the respective table node. Then, for each data node (i.e., <td>) of each row node (i.e., <tr>) that represents a cell of the table (typically excluding blank cells and label cells), the file convertor can add a paragraph node (e.g., <p>) as a child of node "X". As shown in Fig. 13B, the file convertor can thereafter move the content node (i.e., C1, C2, etc.) of each data node, or more particularly the content x_{i,j} of the respective cell, beneath the corresponding paragraph node; in Fig. 13B, only the first and second content nodes C1 and C2 are shown as having been moved. In addition to moving each content node beneath the corresponding paragraph node, if the respective cell has a corresponding label string, the file convertor can attach the corresponding label string s_{i,j} to the content of the respective cell.
After moving the content node of each data node of each row node beneath the corresponding paragraph node, the file convertor 102 can delete the table node together with the associated row nodes, data nodes and content nodes beneath the table node. Although the file convertor can delete the table node and its associated nodes after moving the content node of each data node of each row node, the file convertor can alternatively delete each node after all of the children (i.e., dependent nodes) of the respective node have been moved or deleted. Thus, for example, after moving the respective content nodes (i.e., C1, C2, C3, etc.) beneath the corresponding paragraph nodes, the file convertor can delete each table data node (i.e., <td>). Likewise, for example, after deleting the respective table data nodes (i.e., <td>), the file convertor can delete each table row node (i.e., <tr>).
Irrespective of when the file convertor 102 deletes the table node, however, after moving the content node of each data node of each row node beneath the corresponding paragraph node, the file convertor can eliminate node "X" from the data structure and, in the data structure representation of the electronic document, move up each node beneath node "X", as shown in Fig. 13C. Thus, as shown, each content node (i.e., C1, C2, etc.) representing a cell x_{i,j} of the table (excluding blank cells and label cells) can be contained in a corresponding paragraph node of the data structure representation of the electronic document. It will be appreciated that, in this case, each cell of the original table, including any attached label string s_{i,j}, can be contained beneath a separate paragraph node. Alternatively, however, the file convertor can linearize the table so that more than one cell is contained beneath a single paragraph node. For example, the file convertor can linearize the table so that the cells of each row of the table are contained beneath a respective paragraph node. For an example of this, see Fig. 14, which shows a linearized representation of the table of Fig. 9 in which one or more cells of the table include attached label strings.
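The replacement of a table node by paragraph nodes can be sketched with `xml.etree.ElementTree` from the Python standard library. The sketch makes these assumptions: the document is well-formed XML, tables are not nested, label cells are marked up as <th> (so only <td> cells are carried over), and the label strings are supplied in a dictionary keyed by 1-based (row, column) position. It builds the paragraphs directly in place of the table rather than through a temporary node "X". The function name and sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple


def linearize_tables(root: ET.Element,
                     label_strings: Optional[Dict[Tuple[int, int], str]] = None) -> None:
    """Replace each <table> under root with one <p> per non-blank <td>."""
    label_strings = label_strings or {}
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for table in list(root.iter("table")):
        parent = parent_of.get(table)
        if parent is None:  # the table is the root itself; nothing to replace
            continue
        paragraphs = []
        for i, row in enumerate(table.iter("tr"), start=1):
            for j, cell in enumerate(row.findall("td"), start=1):
                text = "".join(cell.itertext()).strip()
                if not text:  # skip blank cells
                    continue
                paragraph = ET.Element("p")
                # Prefix the cell content with its label string, if any.
                paragraph.text = label_strings.get((i, j), "") + text
                paragraphs.append(paragraph)
        index = list(parent).index(table)
        parent.remove(table)  # drops the table with its row and data nodes
        for offset, paragraph in enumerate(paragraphs):
            parent.insert(index + offset, paragraph)


doc = ET.fromstring(
    "<body><h1>Stats</h1><table>"
    "<tr><th>YR</th><th>TM</th></tr>"
    "<tr><td>84-85</td><td>CHI</td></tr>"
    "<tr><td>85-86</td><td></td></tr>"
    "</table><p>end</p></body>"
)
linearize_tables(doc, {(2, 1): "YR:", (2, 2): "TM:"})
print(ET.tostring(doc, encoding="unicode"))
```

Placing one cell per paragraph is only one of the layouts the text allows; grouping the cells of a row under a single paragraph would need a different inner loop.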
Once the file convertor 102 has linearized the table, the file convertor can convert the data structure representation of the electronic document (including the linearized representation of the table) back into an electronic document, as shown in block 116 of Fig. 5. It will be appreciated that the file convertor can convert the data structure in a number of different ways, for example in the reverse of the manner in which the file convertor converted the electronic document into the intermediate data structure (see block 106). Subsequently, once the file convertor 102 has converted the data structure back into an electronic document, where the electronic document now contains a linear representation of each table contained in the original electronic document, the proxy 18 can forward the electronic document to the terminal 10, as shown in block 117. Thereafter, the terminal can render the electronic document, for example on a display (e.g., display 54). In this regard, for an example of the table of Fig. 9 rendered on such a display, see Fig. 15.
As shown and described herein, the file convertor 102 can receive an electronic document containing at least one table and thereafter convert the table information of the document for rendering by the terminal 10. The file convertor can then forward the electronic document, including the converted table information, to the terminal for rendering by the terminal or, more particularly, by a display of the terminal (e.g., display 56). It should be noted that, in various cases, the file convertor can forward the electronic document without converting the table information of the document. In this regard, it will be appreciated that in many cases the terminal supports left-right panning within a display window that is wider than the terminal display. Thus, before converting the table information, the file convertor can determine whether the terminal receiving the electronic document supports left-right panning. If so, the file convertor can forward the electronic document without converting the table information. Alternatively, the file convertor can request, for example from the user of the terminal, a selection of whether to convert the table information of the electronic document or to forward the electronic document without converting the table information.
The file convertor 102 can determine the ability of the terminal 10 to pan an electronic document in a number of different ways. For example, the file convertor can maintain an internal "device capability" lookup table that lists one or more receiving devices and indicates whether each receiving device supports panning of electronic documents. In this case, the file convertor can identify the receiving terminal, for example during establishment of a communication session between the terminal and the content source 100 (e.g., an HTTP GET), and thereafter look up the terminal in the "device capability" table. Subsequently, after requesting a selection of whether to convert the table information, the file convertor can receive the selection and operate accordingly. In this regard, if the user has selected to forward the electronic document without converting the table information, the file convertor can operate accordingly. Otherwise, if the user selects to convert the table information, or if the terminal does not support panning, the file convertor can convert the table information, for example according to the methods described herein.
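The decision just described can be sketched as a lookup followed by a simple rule. The device identifiers, the table contents and the function name are hypothetical. Treating an unknown device as one that cannot pan is an assumption of this sketch, as is letting an explicit user selection take precedence over the capability lookup.

```python
from typing import Dict, Optional

# Hypothetical "device capability" table: device identifier -> supports panning.
DEVICE_CAPABILITIES: Dict[str, bool] = {
    "example-phone-a": True,
    "example-phone-b": False,
}


def should_convert_tables(device_id: str,
                          user_choice: Optional[bool] = None) -> bool:
    """Decide whether table information should be converted.

    user_choice is True to convert, False to forward unconverted, or None
    when no selection was requested or received.
    """
    if user_choice is not None:
        return user_choice  # an explicit selection is followed
    supports_panning = DEVICE_CAPABILITIES.get(device_id, False)
    return not supports_panning  # convert only when the terminal cannot pan


print(should_convert_tables("example-phone-a"))  # False
print(should_convert_tables("example-phone-b"))  # True
```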
As shown and described herein, the file convertor 102 can be operated from any of a number of different network entities, such as the terminal 10, the proxy 18, or a content source such as the originating server 22, SMSC 17, MMSC 30, GGSN 28, user processor 34, or the like. It will be appreciated, however, that the various operations of the file convertor, particularly those that convert the table information of at least one table of an electronic document, can be performed independently of any connection to a network. Accordingly, it should be understood that the network entity operating the file convertor can comprise an entity that is not connected to a network, such as a terminal or a content source (e.g., originating server, SMSC, MMSC, user processor, etc.). In this case, for example, the electronic document can be provided to the network entity in any of a number of different ways independent of a network, after which the file convertor can convert the tables of the electronic document in the manner described above.
According to one aspect of the present invention, all or a portion of the system of the present invention, such as all or portions of the terminal 10, proxy 18, originating server 22, SMSC 17, MMSC 30, GGSN 28 and/or user processor 34, generally operates under the control of a computer program product (e.g., the file convertor 102). The computer program product for performing the methods of embodiments of the present invention includes a computer-readable storage medium, such as a non-volatile storage medium, and computer-readable program code portions, such as a series of computer instructions, embodied in the computer-readable storage medium.
In this regard, Figs. 5 and 11 are flowcharts of methods, systems and program products according to the invention. It will be understood that each block or step of the flowcharts, and combinations of blocks in the flowcharts, can be implemented by computer program instructions. These computer program instructions may be loaded onto a computer or other programmable apparatus to produce a machine, such that the instructions which execute on the computer or other programmable apparatus create means for implementing the functions specified in the flowchart blocks or steps. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in the flowchart blocks or steps. The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart blocks or steps.
Accordingly, blocks or steps of the flowcharts support combinations of means for performing the specified functions, combinations of steps for performing the specified functions, and program instruction means for performing the specified functions. It will also be understood that each block or step of the flowcharts, and combinations of blocks or steps in the flowcharts, can be implemented by special-purpose hardware-based computer systems that perform the specified functions or steps, or by combinations of special-purpose hardware and computer instructions.
Many modifications and other embodiments of the invention will come to mind to one skilled in the art to which this invention pertains having the benefit of the teachings presented in the foregoing description and the associated drawings. For example, as described above, in one implementation the file convertor arranges or rearranges a table so that the table has a row-major ordering, and thereafter processes the table according to that row-major ordering. In an alternative implementation, however, the file convertor can arrange or rearrange a table so that the table has a column-major ordering, with the other operations of the file convertor adjusted accordingly. For example, the file convertor can localize the labels of a column-major table having at least one column that contains at least one label, using a localization process performed in a manner similar to that described herein, with row-based operations generally performed on columns, and vice versa. Therefore, it is to be understood that the invention is not limited to the specific embodiments disclosed, and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.