
WO2006052390A2 - System and method for managing the communication and/or storage of image data - Google Patents

System and method for managing the communication and/or storage of image data

Info

Publication number
WO2006052390A2
Authority
WO
WIPO (PCT)
Prior art keywords
image
tiles
lods
resolution
images
Prior art date
Application number
PCT/US2005/037226
Other languages
English (en)
Other versions
WO2006052390A3 (fr)
WO2006052390A9 (fr)
Inventor
Blaise Aguera Y Arcas
Julian Walker
Ian Gilman
Original Assignee
Seadragon Software, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/141,958 external-priority patent/US7546419B2/en
Application filed by Seadragon Software, Inc. filed Critical Seadragon Software, Inc.
Priority to EP05851225.2A priority Critical patent/EP1810249A4/fr
Priority to JP2007536990A priority patent/JP4831071B2/ja
Priority to CN2005800430579A priority patent/CN101147174B/zh
Publication of WO2006052390A2 publication Critical patent/WO2006052390A2/fr
Publication of WO2006052390A3 publication Critical patent/WO2006052390A3/fr
Publication of WO2006052390A9 publication Critical patent/WO2006052390A9/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/20Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/957Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577Optimising the visualization of content, e.g. distillation of HTML documents
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0483Interaction with page-structured environments, e.g. book metaphor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/63Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • H04N19/64Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets characterised by ordering of coefficients or of bits for transmission
    • H04N19/647Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets characterised by ordering of coefficients or of bits for transmission using significance based coding, e.g. Embedded Zerotrees of Wavelets [EZW] or Set Partitioning in Hierarchical Trees [SPIHT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/08Bandwidth reduction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/36Level of detail
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20Indexing scheme for editing of 3D models
    • G06T2219/2016Rotation, translation, scaling

Definitions

  • the present invention provides a method that may include establishing communication between a first computer and a second computer over a communication link, the second computer having an image collection stored therein in the form of compressed image data; selecting a plurality of images in the collection for communication to said first computer; and transmitting low-resolution image data for all of the selected images from the second computer to the first computer before transmitting full-resolution image data for any of the selected images.
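The claimed ordering amounts to a breadth-first traversal over resolution levels: every image's lowest-resolution data is sent before any image's next level. A minimal sketch follows; the names (`transmission_order`, `num_lods`) are illustrative, not taken from the patent.

```python
def transmission_order(images, num_lods):
    """Yield (image_id, lod) pairs in send order: coarsest level of
    detail (LOD 0) for every image precedes any finer level."""
    for lod in range(num_lods):      # LOD 0 = lowest resolution
        for image_id in images:
            yield (image_id, lod)

order = list(transmission_order(["a", "b", "c"], num_lods=3))
# All LOD-0 chunks for every image arrive before any LOD-2 chunk.
```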
  • FIG. 1 is a block diagram of a system that may be connected to enable communication of image data between a plurality of computers in accordance with one or more embodiments of the present invention
  • FIG. 2 is a block diagram of an image having at least two regions of interest therein in accordance with one or more embodiments of the present invention
  • FIG. 3 is a block diagram of a "virtual book” that employs aspects of the technology disclosed herein in accordance with one or more embodiments of the present invention
  • FIG. 4 is an illustration of a three-dimensional version of the virtual book of FIG. 3 in accordance with one or more embodiments of the present invention
  • FIG. 5 is a block diagram of a system for managing image data communication between one or more portable devices and one or more other computers in accordance with one or more embodiments of the present invention
  • FIG. 6A illustrates the results of an incomplete image data download employing an existing approach
  • FIG. 6B illustrates the results of an incomplete image data download in accordance with one or more embodiments of the present invention
  • FIG. 7 is a block diagram of a "common space" that may include a physical display (screen) and two virtual displays in accordance with one or more embodiments of the present invention
  • FIG. 8 illustrates a collection of over one thousand images (a collection of digitized maps of various sizes) packed into a montage in accordance with one or more embodiments of the present invention
  • FIG. 9 illustrates a snapshot of about three thousand images that have been dynamically re-arranged into a random configuration in accordance with one or more embodiments of the present invention.
  • FIG. 10 is a block diagram of a computer system that may be adaptable for use with one or more embodiments of the present invention.
  • FIG. 1 is a block diagram of a system 100 that may be connected to enable communication of image data between a plurality of computers in accordance with one or more embodiments of the present invention.
  • System 100 preferably includes client computer 102 which is connected to display 104 and data storage device 106.
  • System 100 preferably also includes server computer 108 which may be connected to data storage device 110.
  • Server computer 108 may also be connected to the Internet 112.
  • image data may be communicated between a plurality of computers 102, 108 so as to enable viewing of large collections of potentially large images using a relatively low-bandwidth connection therebetween.
  • desirable viewing and navigation of images stored at server computer 108 may be accomplished by transmitting selected portions of the image data stored at server computer 108 at controllable levels of resolution.
  • the selectivity of image data 114 may be such as to select a particular image at high resolution, or even a selected portion of a particular image at high resolution.
  • various embodiments are discussed that include varying the types of devices used as client computer 102 and server 108, the types of image data 114 transmitted therebetween, and various applications of the ability to transmit selected image data at specified levels of resolution.
  • FIG. 2 is a block diagram of an image 200 having at least two regions of interest 202, 204 therein in accordance with one or more embodiments of the present invention.
  • Image 200 could be a subset of image data 114.
  • image data 114 could represent a subset of image 200, depending upon what image data is requested by client computer 102.
  • image 200 may be stored in compressed form on server computer 108, or within storage device 110.
  • data for a plurality of resolution levels for various regions of image 200 may be stored and may be requested for downloading by client computer 102.
  • the resolution level at which a particular image or region of an image is stored on client computer 102 may be readily increased or decreased. Where a prior download results in storage of a region or image at a first resolution level (which may be less than full resolution), this first resolution level may be increased by adding data representing the next higher level of resolution, preferably without having to discard the data representing the first resolution, thereby avoiding redundancy and increasing the efficiency of the image data communication contemplated herein. Conversely, the resolution level of a region or image stored at client 102 may be decreased by discarding the highest level of resolution stored therein, without losing data corresponding to the lower levels of resolution for the same region or image. Such resolution reduction may be practiced at client 102 to clear data storage space needed for one or more regions or images other than the one for which data is being discarded.
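The refine/reduce behavior described above can be sketched as a stack of incremental detail levels: refining appends only the new level's data, and reducing pops only the topmost level, so the lower levels are never re-downloaded or lost. The class and method names below are hypothetical, introduced only for illustration.

```python
class RegionLOD:
    """Illustrative multiscale store for one image region: each stack
    entry holds the incremental data for one resolution level."""

    def __init__(self):
        self.levels = []   # levels[i] = detail data for resolution level i

    def refine(self, next_level_data):
        # Increase resolution by adding only the next level's detail data;
        # existing lower-level data is kept, avoiding redundant transfer.
        self.levels.append(next_level_data)

    def reduce(self):
        # Decrease resolution by discarding only the highest stored level.
        if self.levels:
            self.levels.pop()

    @property
    def resolution(self):
        return len(self.levels)
```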
  • the pertinent image compression may be provided by, for instance, the use of JPEG2000 or another discrete wavelet transform-based image compression scheme.
  • the present invention is not limited to the use of any particular compression format or image data representation.
  • Other formats may be employed, including image formats whose sizes in bytes are not substantially smaller than the uncompressed image data. It is merely preferable that the selected image format be susceptible to multiscale representation and storage of image data.
  • client computer 102 may seek to download one or more regions of image 200, where such regions may be portions of image 200.
  • the one or more regions of interest 202, 204 may be the only ones that client computer 102 seeks to download.
  • client computer (client) 102 may merely seek to download one or more selected regions at higher resolution than the resolution at which the remainder of image 200 is downloaded. In either case, client 102 may request a download by identifying both a specified region of image 200 to download and a resolution level at which this specified region will be provided by server computer (server) 108.
  • client 102 preferably requests a download of all of image 200 at low resolution. (The exact resolution level at which the bulk of image 200 is downloaded is not pertinent to this discussion.) However, client 102 seeks to download region of interest 1 202 at a higher resolution, or even at full resolution. Accordingly, client 102 preferably specifies the coordinates and the desired resolution level of region of interest 1 202 to server 108. Thus, in addition to downloading the bulk (including that portion external to region of interest 1 202) of image 200 at low resolution, client 102 preferably downloads region of interest 1 202 at the specified higher resolution. In other situations, client 102 could seek to download only the region(s) of interest and omit a download of the remainder of image 200.
  • a user of client computer 102 may view region of interest 1 202 at high resolution without having to download the entirety of image 200 at this high resolution.
  • a relatively low-bandwidth data communication link between client 102 and server 108 could nevertheless transmit the entirety of image 200, while providing a region of particular interest (in this case, region of interest 1 202) at high resolution, thereby providing the viewer with the same viewing experience with respect to the region of interest that would have occurred had client 102 downloaded the entirety of image 200 at the high resolution, this latter option however demanding considerably more download time and data storage space at client computer 102 or data storage device 106.
  • a user of client computer 102 may wish to pan across image 200. Normally, panning from one region of interest 202 to another 204 would involve having both regions downloaded at client 102 at the level of resolution at which the regions will be viewed. Moreover, generally, all image territory in between region of interest 1 202 and region of interest 2 204 would be stored at client computer 102 to enable the described panning to occur. As described in the following, in one or more embodiments of the present invention, viewing of such regions of interest 202, 204 may be accomplished by downloading much less data and using less storage space at client computer 102 than in the approach described above.
  • client 102 may shift from a high resolution view of region of interest 1 202 to region of interest 2 204.
  • image data corresponding to a low-resolution representation of region of interest 2 204 is already present in client computer 102 from the download of image 200, discussed above. In this case, all that is needed is to supplement the existing image data for region of interest 2 204 with additional image data describing the pertinent higher levels of resolution to arrive at a high-resolution rendition of region of interest 2 204 at client computer 102.
  • image data representing the higher resolution levels of region of interest 1 202 may be discarded or overwritten to make space in data storage device 106 or other data storage space for the additional image data to be downloaded for region of interest 2 204.
  • the shift in view from region of interest 1 202 to region of interest 2 204 may be accomplished gradually, to provide a viewer of display 104 with a viewing experience that may closely simulate that available on a computer that has the entirety of image 200 downloaded at high resolution.
  • the level of resolution at which region of interest 1 202 is displayed may be reduced gradually to the resolution level at which most of image 200 is represented.
  • the view on display 104 may present a gradual pan across the low-resolution territory in between region of interest 1 202 and region of interest 2 204.
  • FIG. 3 is a block diagram of a "virtual book" 300 that employs aspects of the technology disclosed herein in accordance with one or more embodiments of the present invention.
  • Virtual book 300 may include display 302, backward cache 304, and forward cache 306. While the caches 304, 306 are each shown having two pages stored therein, any number of pages may be stored in either of caches 304 and 306.
  • virtual book 300 employs the ability to present selected image data at controllable levels of resolution, applied to the particular case of displaying pages of a book.
  • each image may be a page within display 302 of virtual book 300.
  • Display 302 may correspond to display 104 of FIG. 1 or may be a special purpose display that accommodates the specific features of virtual book 300.
  • Virtual book 300 may correspond to client computer 102 of FIG. 1, or may be a special purpose computer that is substantially limited to communicating, storing, and displaying pages of books.
  • virtual book 300 may include only one page that is stored and/or displayed at full resolution, with other pages, both earlier and later in the sequence of pages displayed, at a variety of other resolutions.
  • the page currently displayed on display 302 (i.e. the active page) is preferably stored and displayed at full resolution.
  • other pages may be displayed at progressively lower resolutions with increasing distance in pages from the active page.
  • the resolution at which each page is stored may equal the resolution of the active page being displayed in display 302 divided by 2 raised to a power equal to the number of pages between each stored page and the active page.
  • page 11 (in forward cache 306) and page 9 (in backward cache 304) may each occupy one half the amount of data storage space occupied by the active page in display 302.
  • page 12 (in forward cache 306) and page 8 (in backward cache 304) may each occupy one quarter the amount of data storage space occupied by the active page in display 302.
  • a new active page may be selected in place of page 10 which is shown displayed in FIG. 3.
  • the new selected page may, but need not, be a page immediately adjacent page 10 (either page 9 or page 11). That is, any page from 1 to the last page in the pertinent book (or any other type of publication with discrete pages) may be the new active page.
  • a transition between the currently active page and the new active page is preferably conducted.
  • This transition to a new active page may include acquiring additional image data for the new active page to enable the new active page to be stored and/or displayed at full resolution. If the new active page is "page 11", and the "factor-of-two" embodiment discussed above is employed, the amount of data storage space allocated to page 11 will preferably double. Continuing with an application of the "factor-of-two" embodiment, the data storage space allocated to page 10 will preferably be halved as part of the transition away from page 10 and toward page 11 as the active page.
  • the data for the active version of page 10 that is not included in the post-transition page 10 may be discarded (which may include overwriting thereof).
  • this "surplus" data for page 10 may be stored in another cache. Such caching of the page-10 surplus data may provide efficiency if a transition to page 10 occurs soon after (i.e. within a reasonable number of page transitions) the transition away therefrom.
  • the transition from page 10 to page 11 may include a gradual fade-out from page 10 and gradual fade-in of page 11, to provide an experience that is visually pleasing and/or reminiscent of a physical page transition to the user of virtual book 300.
  • a sequence of images showing the folding and turning of the old active page may be provided to make the virtual page transition look still more reminiscent of a physical turn of a page.
  • FIG. 4 is an illustration of a three-dimensional version of the virtual book of FIG. 3 in accordance with one or more embodiments of the present invention.
  • the embodiment of FIG. 4 illustrates a method in which an alpha channel, for partial transparency (the rough edges) may be stored as image information in addition to the red, green and blue color components.
  • One or more embodiments of the present invention may provide a method which may include providing a collection of digital images or other visual objects on a server; establishing communication between a client and said server; and enabling efficient multi-scale navigation by the client of collections of visual objects residing on the server.
  • digital image data may include digital photographs, digital images, visual documents or other forms of visual content.
  • image generally corresponds to the term “digital image,” and either of these terms may correspond to a "digital photograph.”
  • client generally corresponds to the term “client side” and to the term “client device”.
  • the term “portable device” generally refers to digital image capturing devices and/or digital image storage devices.
  • a “digital image capturing device” may include but is not limited to a digital camera, a camera-enabled mobile phone (which may be referred to as a camera-enabled cell phone), a personal digital assistant, and/or a digital video recorder able to record digital still images.
  • a “digital image capturing device” may include devices that are capable of receiving image data by directly optically receiving and recording such data (such as with a standard digital camera) and may also include devices that are able to receive image data via a wired or wireless Internet or other network connection.
  • One or more embodiments of the methods described herein may use a multi-resolution approach to address the problems of storing, synchronizing, browsing, and organizing collections of digital image data, which may be visual documents.
  • One or more of the methods described herein may also apply to visual data objects other than images, such as the roadmap or other vector data of Applicant reference document
  • Digital photographs or other digital image data stored in a camera or other portable device are generally downloaded periodically to a desktop or notebook computer, cleared from the camera's memory to allow more pictures to be taken, and organized and/or viewed on the desktop or notebook computer. Thereafter, digital photographs may be shared with friends by posting a selection of digital photographs to one or more Internet sites.
  • a mobile device, which may be a digital camera or other digital image data capturing device, takes pictures. Then, potentially after some culling of the pictures, the pictures are downloaded to the camera user's PC.
  • the camera device's local storage may be limited and, in this conventional approach, only holds images transiently, until they are safely stored on the PC.
  • the PC may permanently retain in its memory (e.g. hard disk drive or other non-volatile storage) any subset of the digital photos.
  • the user may in turn upload some further culled subset of those images to a web server which may be owned by a web photo publishing service, typically at reduced resolution.
  • the images uploaded may be made publicly viewable by any third party using a web browser on a PC or other device, or by some subset of those users with restricted access.
  • existing camera devices impose other limitations. In existing camera devices, navigation of images stored on the camera device is generally awkward and difficult. In existing camera devices, there is a lack of a unified visual interface to image collections which would give users a consistent experience either on the camera device or on a PC. Existing camera devices tend to impose very restrictive limits on the number of photos that can be stored thereon before downloading becomes necessary. Thus, when employing existing approaches, a lengthy series of steps is generally involved in making images available to a third party.
  • FIG. 5 is a block diagram of a system 500 for managing image data communication between one or more portable devices 512, 522 and one or more other computers in accordance with one or more embodiments of the present invention.
  • System 500 may include a client side 510 and a server side 520.
  • client and server statuses of the groupings of devices shown in FIG. 5 may be reversed.
  • system 500 may include portable device 1 512, portable device 2 522, personal computer 102 (which may be substantially the same as client computer 102 of FIG. 1), server 108 (which may be substantially the same as server computer 108 of FIG. 1) and/or additional computers 524.
  • Preferably, each of devices 512 and 522 and computers 102, 108, and 524 has memory and one or more displays included therewith.
  • the devices and computers of FIG. 5 could be in communication with memories and/or displays.
  • FIG. 5 illustrates various possible data paths useable in accordance with one or more embodiments of the present invention.
  • One or more embodiments may use less than all of the data paths shown in FIG. 5.
  • the available data paths shown in FIG. 5 may have one or more of the following features in common: 1) The data paths may involve server side 520 (the originator of the image data) and a client side 510 (the recipient of the image data); 2) bi-directional data paths (which are illustrated with lines having arrows at both ends) indicate that the devices pointed to by these arrows are capable of serving in either a client or a server capacity; 3) the connections may employ a hard-wired network (e.g.
  • both the client side 510 and the server side 520 may include one or more digital computing and/or storage devices including but not limited to: camera devices, personal computers, and personal digital assistants.
  • a client device may have one or more displays. The client may browse a collection of documents residing on the server using one or more of the efficient multi-resolution browsing methods described in Applicant reference document 489/15P (U.S. Provisional Application Serial No.
  • one of the properties of this navigation method is that the display contents may gradually come into focus as information is sent from the server to the client.
  • the rate at which this information comes into focus may be governed by the ratio of connection bandwidth to display pixels.
  • a client's "display” need not necessarily be physical, or visible to an end-user.
  • this display can be a "virtual display", i.e. an abstract model of a display with a specified resolution.
  • Such a “virtual display” might be represented as an array of pixel values in the client's memory, irrespective of whether those pixel values are rendered to a screen.
  • a virtual display may include wavelet data that at least partially describes one or more images. The wavelet data is preferably able to represent an image at a range of possible resolutions. In one or more embodiments, the wavelet data may correspond to that employed using JPEG2000.
  • a virtual display may include enough wavelet data to completely describe one or more images. For example, if it were desirable for a device to acquire thumbnails of all of the images in a collection at a specified resolution, then this device could create a "virtual display" of the appropriate size, establish a connection with a server, and request a view of the entire collection. The full set of thumbnails could then be transmitted to and rendered on this "virtual display". If transmission were interrupted before all of the relevant data were sent from the server to the client, then the client's virtual display would not yet have all of the thumbnail images in perfectly focused condition. However, all of the requested thumbnail images would preferably be stored within the client's virtual display with sufficient resolution to enable rendering visible versions of these images on a screen. The images rendered in the manner described would generally be of lower visual quality than if the transmission of the image had concluded without interruption. Thus, some image degradation may be present in the images rendered using data from an incomplete, interrupted transmission.
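The interrupted-transmission behavior described above can be simulated: because data arrives breadth-first across resolution levels, an interruption leaves every thumbnail at some viewable resolution rather than leaving a few complete and the rest missing. The sketch below is illustrative only; its names and the chunk-budget model are assumptions, not the patent's wavelet encoding.

```python
def receive(num_thumbnails, num_lods, budget):
    """Simulate breadth-first progressive receipt into a virtual display.
    'budget' is how many equal-size data chunks arrive before the
    transmission is interrupted; returns the LOD held per thumbnail."""
    received = [0] * num_thumbnails   # resolution level held so far
    for lod in range(num_lods):
        for i in range(num_thumbnails):
            if budget == 0:
                return received       # interrupted mid-transmission
            received[i] = lod + 1
            budget -= 1
    return received

# Interrupting after 15 of 36 chunks: every one of the 12 thumbnails
# is still held at level 1 or better, so all remain renderable.
levels = receive(num_thumbnails=12, num_lods=3, budget=15)
```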
  • FIG. 6A illustrates the results of an incomplete image data download employing an existing approach
  • FIG. 6B illustrates the results of an incomplete image data download in accordance with one or more embodiments of the present invention.
  • FIG. 6A shows a prior art scenario in which all of the data for three thumbnails (shown with square shapes) have been received, and in which the remaining nine thumbnails (shown with X's) have not been received at all.
  • FIG. 6B illustrates a situation that may arise employing one or more embodiments of the present invention, in which all twelve thumbnails (shown as cross-hatched square shapes) have been received at some level of resolution, which is preferably acceptable for viewing, but which is likely below the resolution that would be obtained after conclusion of a complete and uninterrupted data transmission.
  • a client may have a client- side cache that caches recently viewed visual content.
  • a standard MRU (most-recently-used) cache may be employed for the caching needs for one or more embodiments of the present invention.
  • a cache disclosed in U.S. Patent Application Serial No. 11/141,958 (client reference document 489/10NP), entitled “Efficient Data Cache”, which is incorporated herein by reference, may be beneficially employed to enable more sophisticated client-side caching. In either case, a given amount of client-side memory may be devoted to the cache.
  • navigation back to a recently viewed image may permit using image data stored in the cache, rather than requiring that this image data be re-sent from the server.
  • a client may have multiple displays.
  • a given display may be physical or virtual.
  • a given display may be driven directly by user input, or it may be driven programmatically by software within a client computer such as computer 102.
  • the total size in pixels of all of the displays may be fixed or bounded by some limit, and this limit may define a minimum amount of client-side memory needed for visual content.
  • This client-side memory is preferably separate from storage space allocated to the cache memory.
  • a physical display within a client device is visible to the user and allows zooming and panning navigation through, and rearrangement of, a collection of digitally stored images.
  • the user may also select one or more images from the collection and send them to a "holding pen" which may serve as a place for storing user-selected images.
  • the holding pen may be visualized in some way on the physical display. Adding an image to the holding pen preferably causes the image to be placed on the virtual display, which may be invisible to the user. As images are added to the holding pen, the virtual display representing the holding pen gradually fills up.
  • This virtual display may increase in size (as measured in number of pixels) up to some limit, after which its size may remain fixed at this limit.
  • the virtual display may be too small to display all of the images in the holding pen at full resolution.
  • the data storage space needed for the images resident in the virtual display is preferably reduced as needed to fit the images into the virtual display.
  • an off-screen view (the virtual display) preferably gets supplemented with images as the user puts viewable images into the holding pen. This supplementing of the off-screen view may occur invisibly to the user.
  • a method for browsing is disclosed in U.S. Patent Application Serial No. 10/790,253 (Applicant reference document 489/2NP), entitled "System and Method for Exact Rendering in a Zooming User Interface", which is incorporated by reference herein.
  • the method disclosed in that document for determining the order in which information is sent from the server to the client based on the client's view may be modified for a multiple display scenario.
  • the 489/2NP document discloses that visual information may be broken up into tiles, with each tile covering a region in space at a given resolution. Low-resolution tiles may then occupy large physical areas, while high-resolution tiles may occupy smaller physical areas, such that the amount of information in each tile may be substantially the same.
  • the 489/2NP document discloses methods for ordering tiles using criteria discussed in the following.
  • One criterion may be tile resolution and tile position on the display. Sorting of tiles could be lexicographic, such that lower-resolution tiles always precede higher-resolution tiles, with spatial position only playing a role in resolving order within a resolution.
  • Lexicographic sorting is referenced here in the generalized tuple sense — for example, the lexicographic sort of the set of triplets ⁇ (1,2,3), (0,3,1), (4,0,0), (0,0,1), (0,3,2) ⁇ would be (0,0,1), (0,3,1), (0,3,2), (1,2,3), (4,0,0) .
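As a sanity check on the ordering just described, native tuple comparison in a language such as Python reproduces exactly this generalized lexicographic sort; the tile-related key mentioned in the closing comment is an illustrative assumption, not part of the disclosure:

```python
# Generalized lexicographic ordering of sort-key tuples: Python compares
# tuples element by element, matching the triplet example in the text.
triplets = [(1, 2, 3), (0, 3, 1), (4, 0, 0), (0, 0, 1), (0, 3, 2)]
ordered = sorted(triplets)
# ordered is [(0, 0, 1), (0, 3, 1), (0, 3, 2), (1, 2, 3), (4, 0, 0)]

# For tiles, a key such as (level_of_detail, distance_from_center) makes
# every lower-resolution tile precede every higher-resolution one, with
# spatial position only breaking ties within a resolution level.
```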
  • non-lexicographic sorting criteria may be employed.
  • For instance, a linear combination of a plurality of properties could be used to sort tiles.
  • properties may include but are not limited to: resolution (which could be expressed in logarithmic units) and distance of the tile from the center of the display.
  • The term “sort key” as used herein corresponds to the term “sorting criterion.”
  • lower-resolution tiles may be sent in preference to higher-resolution tiles, and tiles near the center of the display may be sent in preference to tiles near the periphery, but these properties can trade off against each other.
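A minimal sketch of such a non-lexicographic criterion, with hypothetical weights chosen so that proximity to the display center can outweigh a one-level resolution difference:

```python
def tile_priority(level, dist_to_center, w_res=1.0, w_dist=0.5):
    # level: resolution expressed in logarithmic units (the LOD index);
    # dist_to_center: distance of the tile from the display center.
    # Lower priority values are sent first. The weights are illustrative.
    return w_res * level + w_dist * dist_to_center

tiles = [
    {"level": 0, "dist": 8.0},  # coarse tile at the periphery
    {"level": 2, "dist": 0.0},  # fine tile at the center
    {"level": 1, "dist": 1.0},  # mid-resolution tile near the center
]
send_order = sorted(tiles, key=lambda t: tile_priority(t["level"], t["dist"]))
# send_order levels: [1, 2, 0] - the central fine tiles overtake the
# peripheral coarse tile, unlike a purely lexicographic sort.
```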
  • display number can be added as an extra lexicographic sort key.
  • a first display might refine completely (in accordance with the other sort keys) before any tiles are sent relevant to a second display.
  • display number can be an additional variable for inclusion in a linear combination, allowing display number to trade off in some fashion against resolution and proximity to the center of the display.
  • the displays can coexist in an imaginary "common space", and the resolution and proximity-to-center sort keys can be used as before.
  • the "common space” is a notional space establishing an imaginary spatial relationship between multiple displays, as if they were regions of a single, larger display. Defining this imaginary spatial relationship determines all parameters needed for prioritizing tiles in the multiple displays.
  • FIG. 7 is a block diagram of a "common space” 700 that may include a physical display (screen) 702 and two virtual displays 704, 706 in accordance with one or more embodiments of the present invention.
  • the physical display 702 is preferably in the center of "common space” 700 at normal size.
  • Virtual displays V1 704 and V2 706 are preferably off to the side, and V2 is preferably scaled down, so that its pixels are preferably half the linear size of the physical display's pixels. This means that, assuming purely lexicographic tile sort order, the contents of each resolution level in V1 704 will preferably be sent from the server to the client after the corresponding resolution for the physical display (since V1 is farther from the center of the space than any point on the physical display).
  • Resolutions in V2 706 may be sent after all the tiles at a resolution twice as fine have been sent for both the physical display 702 and V1 704. It is noted that it isn't necessary for the "common space" 700 to correspond to any real larger display or memory address space.
  • the "common space” 700 is merely a conceptual convenience for establishing the relationships among tile priorities across different displays. Clearly many tradeoffs are possible. These tradeoffs can have the consequence, as in the lexicographic example above, of giving refinement of the physical display 702 the highest priority, while using any excess time and bandwidth not required for bringing the physical display into focus to continue refining the virtual display(s) 704, 706.
  • the tradeoffs may alternatively begin refining the virtual display(s) after the physical display has largely, but not completely, come into focus. After the physical display 702 has largely come into focus, the physical and virtual displays 704, 706 can share bandwidth resources to refine in concert.
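The common-space construction above can be sketched as follows; the offsets and scales mirror FIG. 7, but the exact numbers are assumptions for illustration:

```python
import math

# Each display occupies a region of an imaginary "common space": the
# physical display at the center, V1 off to the side, and V2 scaled to
# half pixel size. The usual sort keys (resolution, then distance to the
# common-space center) are then applied across all displays at once.
displays = {
    "physical": {"offset": (0.0, 0.0), "scale": 1.0},
    "V1":       {"offset": (2.0, 0.0), "scale": 1.0},
    "V2":       {"offset": (4.0, 0.0), "scale": 0.5},
}

def common_space_key(name, level, pos):
    d = displays[name]
    # Halving the pixel size makes V2's tiles count as one LOD finer, so
    # they follow the corresponding level on the physical display and V1.
    eff_level = level - math.log2(d["scale"])
    cx = d["offset"][0] + pos[0] * d["scale"]
    cy = d["offset"][1] + pos[1] * d["scale"]
    return (eff_level, math.hypot(cx, cy))

keys = sorted([common_space_key("V2", 0, (0, 0)),
               common_space_key("physical", 0, (0, 0)),
               common_space_key("V1", 0, (0, 0))])
# Physical level 0 precedes V1 level 0, which precedes V2 level 0.
```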
  • any subset of the data for a given image can itself comprise a JPEG2000 image file.
  • the client may progressively download image data from the server, thereby supplementing the quality of the client's subset of the image and giving the client the ability to create a JPEG2000 file that is an increasingly accurate approximation of the full image.
  • This approach can be extended beyond JPEG2000 to other multi-resolution document types as well. If the client never zoomed in beyond a given resolution, then no information would be available regarding the image content beyond that given resolution. In this case, the version of the JPEG2000 image which may be created and/or stored by the client may have a lower overall resolution than the original version of that image.
  • the camera or camera-enabled mobile device may operate as the server, and a PC may operate as the client.
  • the PC can rapidly browse through the complete set of images available on the camera. During navigation, a group of images can be selected and put in the holding pen. Note that if all images on the camera are to be downloaded to the PC in their entirety, then the total time needed to accomplish the transfer remains the same as in the prior art.
  • this method can provide a number of advantages over the conventional serial download of images which are listed and discussed below. The present invention is not limited to the features listed below.
  • Image download and user navigation of the full image set on the camera or other mobile device may be concurrent and cooperative in their use of bandwidth (in effect, navigation merely influences the order in which tiles are sent from server to client) .
  • If the PC's display is larger than the mobile device's display, then better choices can be made about which images to download, which to leave on the mobile device, and which to discard, without incurring the delay of downloading the entire set before deciding.
  • the experiences of browsing on the PC and on the mobile device (assuming that it also has a display) , respectively, are preferably simple and experientially similar, thereby increasing usability.
  • If lower-resolution versions of the images in the holding pen are desired, it is preferably straightforward to suitably limit the detail of downloaded data by reducing the size of the item on the virtual display. It is noted that reducing the image size in this manner may both speed up downloading by a large factor (i.e., by a factor of 4 per resolution level discarded) and require less space on the PC.
  • the amount of memory allocated to photos on the PC can be bounded. Also, different constraints can be placed on different photos, and hence space can be allocated based on recency or one or more other criteria.
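One way such a bound could be enforced, sketched under assumed numbers: discarding a resolution level cuts a photo's storage by roughly a factor of 4, so older photos can be capped at coarser resolutions until the budget is met. The function name, the newest-first ordering, and the four-level floor are all illustrative assumptions:

```python
def plan_storage(photos, budget_bytes):
    # photos: list of (name, full_size_bytes), ordered newest first so
    # that recency determines which photos keep the most detail.
    plan, remaining = {}, budget_bytes
    for name, full in photos:
        size = full
        # Drop resolution levels (4x savings each) until the photo fits,
        # but keep at least a coarse version (here, at most 4 levels cut).
        while size > remaining and size > full / 256:
            size /= 4
        plan[name] = size
        remaining = max(0, remaining - size)
    return plan

plan = plan_storage([("new.jpg", 4_000_000), ("old.jpg", 4_000_000)],
                    budget_bytes=5_000_000)
# new.jpg keeps all 4 MB; old.jpg is cut one level, to about 1 MB.
```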
  • premature loss of connectivity results in a degradation in the quality of some or all of the images to be downloaded, instead of completely removing some images from the download operation.
  • the bulk of the data volume for an image is very high-resolution detail, some of which is camera noise, and all of which is less critical for ordinary viewing than the coarser image structure.
  • Hybrid prioritization of the image data is also possible, for example, favoring the complete download of a subset of the photos before proceeding to refine a second set beyond thumbnail detail.
  • one or more methods disclosed herein are resilient to intermittent connectivity, since any JPEG2000 object can continue to be augmented at any time with additional information while still allowing browsing and interaction with whatever visual data has already been received.
  • typical home users may not want to discard any of their images (after an initial culling of such images) . If such users continue to add sufficient storage to their PC, then of course it should not be necessary to discard any content.
  • the addition of storage can in itself increase the virtual display maximum size.
  • Features of (a) and (b) above can therefore be omitted if a sufficiently large virtual display size can be created (i.e., if there is enough available client-side storage) .
  • checkmarks or green dots can appear next to images as they finish downloading. When all images in the "holding pen” include green dots, the connection can be broken without loss.
  • Operations such as requesting that the camera discard some of its images using the client computer may benefit from some additional communication from the client to the server beyond that contemplated in Applicant reference document 489/15P.
  • the client side could also instruct the server side (which may be a mobile device such as a digital camera or mobile phone) to launch its own client side, and create its own view to receive content from the PC.
  • the PC can render the camera/mobile phone's "view” of content on the PC, thus (for example) displaying the green completion dots described above for images uploaded from the PC to the camera.
  • Each of the reciprocal arrows of FIG. 5 can be implemented using either a “push” or “pull” arrangement. Specifically, the viewport setting, the arrangement, and other navigation settings may be controlled from either the client side 510 ("pull") or from the server side 520 (“push”) .
  • a user interacting with one device can be connected reciprocally to another device, thereby enabling both "pulling” and “pushing” to occur simultaneously.
  • a mobile device 512, which may be a camera or camera-enabled mobile phone, may serve content to a user's PC (personal computer) 102.
  • This connection might typically take place over a USB cable or a Bluetooth ad-hoc wireless network.
  • the benefits are described above.
  • the PC 102 may serve content back to the mobile device 512. This can be useful for the following applications, among others.
  • Wallet photos can be sent from the PC to the camera or mobile phone, even if those photos were not taken by the mobile device.
  • the PC may be a home appliance without a display, and the mobile device may then be used as a primary visual interface to the archived visual material.
  • the mobile device in this context may be a digital camera, a camera-enabled cell phone, a PDA, or a mobile tablet PC with a display.
  • a first mobile device can be connected directly to, or form an ad-hoc network with, another mobile device (the "guest”) .
  • the two mobile devices can then view and share each other's photos.
  • the PC could upload images (via push) to a remote server.
  • the server may be a photo sharing service, and may therefore implement the kind of space constraints envisioned in the above processes of reducing the size of the item on the physical display and bounding the amount of memory allocated to photos on the PC.
  • the remote server could then serve its collection to one or more additional PCs. Typically this would be a broadband connection. However, other connection types could be employed.
  • the remote server could also serve collections to mobile device users. Typically this would be a mobile wireless wide- area network.
  • Mobile devices could upload their images via "push" (that is, under control of the mobile devices) to a remote server.
  • the upload may be automatic, allowing the mobile device to transparently extend its apparent storage space by transferring content freely to a server and deleting it locally when transfers are complete.
  • local caching on the mobile device 512 may allow the mobile device 512 to support browsing through very large thumbnail collections using only local storage, even if the local storage is limited. Zooming in on details of recently viewed images may also be possible, if the relevant information is still in the mobile device's local cache.
  • zooming in on images whose details are only available on a remote server could result in a blurry and un-detailed image. If the mobile device is on a network that includes the remote server 108, however, the blurry image can become progressively more refined as more and more detailed image data is downloaded to the mobile device 512. If the mobile device is not connected to a network that can supply additional image data, the image may not be presented with any greater detail than is available in the initial thumbnail image.
  • One or more embodiments of the present invention may define precomputed steps and interactive rendering algorithms which can be used in a variety of configurations to implement downloading of selected images and/or image regions at controllable levels of resolution for various applications. Many of these applications (such as focusing on regions of interest, a virtual book, etc.) may involve user interaction with a "universe" of images.
  • the starting point for precomputation may therefore be a list of the filenames, URLs, or other strings referencing the individual images.
  • a montage may be a mosaic or collage of all of the images, rendered at low resolution and packed efficiently into a rectangular area, as shown in FIG. 8.
  • Auxiliary metadata which can be embedded in the montage image file or stored separately, may identify rectangular regions on the montage image with a particular image file.
  • the montage image itself can be navigated using a zooming and panning interface. When the user zooms in far enough to exhaust the resolution available in the montage version of one or more images within the montage, the metadata for that image may refer the client to one or more individual image files, and the client may use imagery from these image files to render the images at higher resolution.
  • the overall size of the montage in pixels may be chosen such that its resolution is only exhausted when zooming in to a stage where only a small number of images, which may be referred to herein as a "set" of images, are visible simultaneously. Therefore, access to more than this small number of images at high resolution is preferably not needed at any given time.
  • image streams may be opened and closed as needed to limit the number of high resolution images that are open at any given time.
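A sketch of how the montage metadata and the open-stream limit might interact; the field names, file names, and the limit of four open streams are assumptions for illustration:

```python
# Auxiliary metadata: each rectangle on the montage maps to an
# individual image file (illustrative layout, not an on-disk format).
montage_meta = [
    {"file": "a.jp2", "rect": (0, 0, 256, 192)},
    {"file": "b.jp2", "rect": (256, 0, 128, 128)},
]

def source_for(image, zoom):
    # zoom: on-screen pixels drawn per montage pixel. Beyond 1.0 the
    # montage's resolution for this image is exhausted, and the client
    # is referred to the individual image file for further detail.
    return image["file"] if zoom > 1.0 else "montage.jp2"

open_streams = []

def ensure_open(path, limit=4):
    # Keep at most `limit` high-resolution image streams open at once,
    # closing the oldest-opened one when a new stream is needed.
    if path in open_streams:
        return
    if len(open_streams) >= limit:
        open_streams.pop(0)
    open_streams.append(path)
```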
  • the montage layout is preferably designed for packing efficiency, but the user may want a different arrangement of the images onscreen. Moreover, the user may want to be able to dynamically rearrange the layout of images on the screen.
  • Texture mapping may be implemented in software but is, in general, hardware-accelerated on modern personal computers. Texture mapping allows a portion of a "texture", or source image, to be drawn on the display, optionally rescaling the image, rotating it, and/or performing a three-dimensional perspective transform. Other hardware-accelerated transformations are often supported, including color correction or alteration, full or partial transparency, lighting, occlusion, and coordinate remapping.
  • a low-resolution version of the montage can be used as a "texture", so that when the user is zoomed out, the individual images within the montage can be dynamically remapped in any way, as in FIG. 9.
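The remapping amounts to pairing each image's source rectangle in the montage texture with an arbitrary destination rectangle on screen; a minimal coordinate-mapping sketch (the rectangle values are illustrative):

```python
def remap(src_rect, dst_rect, point):
    # Map a point in montage (texture) coordinates to screen
    # coordinates; the actual rescale would be performed by
    # texture-mapping hardware or a software compositor.
    sx, sy, sw, sh = src_rect
    dx, dy, dw, dh = dst_rect
    u = (point[0] - sx) / sw
    v = (point[1] - sy) / sh
    return (dx + u * dw, dy + v * dh)

# An image packed at (0, 0, 256, 192) in the montage, drawn rearranged
# at (100, 50, 512, 384) on screen: its center maps to the new center.
center = remap((0, 0, 256, 192), (100, 50, 512, 384), (128, 96))
# center is (356.0, 242.0)
```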
  • each texture map may be a montage containing a subset of the images. Transitions between arrangements may or may not be animated. It is noted that rearrangement can take place while the user is zoomed in, but because the rearrangement might result in a new zoomed-in view of an image which was previously off-screen, the new image may initially be very blurry.
  • the texture mapping technique may be used only during dynamic rearrangement of images.
  • software compositing can be used to assemble all or part of a higher-definition rearranged montage on-screen.
  • This software compositing method is especially valuable in combination with the multiresolution rendering techniques described in U.S. Patent Application Serial No. 10/790,253 (Applicant reference document 489/2NP), identified in detail earlier in this disclosure. This method may in effect create a new "display montage" by rearranging the imagery of the original montage.
  • Texture mapping may also be used to display high resolution images, but in this case, rather than using textures containing montages of multiple images, textures are used that contain tiles of individual images. This technique is also described in U.S. Patent Application Serial No. 10/790,253 (Applicant reference document 489/2NP).
  • montage rearrangement may be used to support reorganization of the images without recourse to texture mapping.
  • FIG. 10 is a block diagram of a computing system 1000 adaptable for use with one or more embodiments of the present invention.
  • central processing unit (CPU) 1002 may be coupled to bus 1004.
  • bus 1004 may be coupled to random access memory (RAM) 1006, read only memory (ROM) 1008, input/output (I/O) adapter 1010, communications adapter 1022, user interface adapter 1016, and display adapter 1018.
  • RAM 1006 and/or ROM 1008 may hold user data, system data, and/or programs.
  • I/O adapter 1010 may connect storage devices, such as hard drive 1012, a CD-ROM (not shown) , or other mass storage device to computing system 1000.
  • Communications adapter 1022 may couple computing system 1000 to a local, wide-area, or Internet network 1024.
  • User interface adapter 1016 may couple user input devices, such as keyboard 1026 and/or pointing device 1014, to computing system 1000.
  • display adapter 1018 may be driven by CPU 1002 to control the display on display device 1020.
  • CPU 1002 may be any general purpose CPU.
  • the present invention relates generally to graphical zooming user interfaces for computers. More specifically, the invention is a system and method for progressively rendering zoomable visual content in a manner which is both computationally efficient, resulting in good user responsiveness and high frame rates, and exact, in the sense that the displayed image ultimately converges to a rendition of perfect quality.
  • visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out.
  • the desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets.
  • Many current applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc.
  • these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally.
  • continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent.
  • any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning.
  • Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s 1 ; recent movies continue the trend 2 .
  • a number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present. 3 In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability").
  • the prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++ have undergone some development since 4 .
  • However, no major application based on a full ZUI (Zooming User Interface) has yet seen widespread deployment.
  • Voss is a zooming user interface framework.
  • This patent is specifically about one of the innovations in Voss's approach to object tiling and rendition for non-photographic content.
  • a multiresolution visual object is normally rendered from a discrete set of sampled images at different resolutions or levels of detail (an "image pyramid").
  • When rendering from an image pyramid, two adjacent levels of detail which bracket the desired level of detail are blended together to render each frame, because it is not normally the case that the desired level of detail is exactly one of those represented by the discrete set.
  • Such techniques are sometimes referred to as trilinear filtering or mipmapping.
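A sketch of the bracketing computation for a factor-of-two pyramid; taking LOD 0 as the full-resolution level is an assumed convention:

```python
import math

def bracketing_lods(scale):
    # scale: displayed pixels per source pixel (e.g. 0.3 when zoomed
    # out). Returns the two pyramid levels bracketing the desired level
    # of detail, plus the blend weight of the coarser level.
    lod = -math.log2(scale)            # fractional level of detail
    lo, hi = math.floor(lod), math.ceil(lod)
    w_coarse = lod - lo
    return lo, hi, w_coarse

lo, hi, w = bracketing_lods(0.3)
# The frame is rendered as (1 - w) * level_lo + w * level_hi; when the
# scale is an exact power of two, lo == hi and no blending is needed.
```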
  • the resulting interpolated renditions are usually satisfactory for photographic content, but not satisfactory for content defined in terms of geometric primitives, such as text, graphs, drawings, and in short most of the visual content with which users interact outside gaming or multimedia applications. This is because blending levels of detail necessarily introduces blurring and aliasing effects.
  • the present invention involves a hybrid strategy, in which an image pyramid- based approach with a discrete number of levels of detail is typically used during rapid zooming and panning, but when the view stabilizes sufficiently, an "exact view” is rendered and blended in over several frames. Because the human visual system is insensitive to fine detail in the visual content while it is still in motion, this hybrid strategy can produce the illusion of continuous "perfect rendering” with a fraction of the computational burden.
  • An objective of the present invention is to allow text, plots, charts, drawings, maps, and any other vector-based content (also referred to here as vectorial content) to be rendered in a zooming user interface without degradation in ultimate image quality relative to the highest possible quality rendition in an ordinary GUI.
  • a further objective of the present invention is to allow arbitrarily large or complex vector-based content to be viewed in a zooming user interface.
  • a further objective of the present invention is to enable near-immediate viewing of arbitrarily complex vector-based visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far in on vectorial content while maintaining a crisp, unblurred view of the content and maintaining interactive frame rates.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex vectorial content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
  • a further objective of the present invention is to minimize the user's perception of transitions between levels of detail or rendition qualities during interaction.
  • a further objective of the present invention is to allow the graceful degradation of image quality by blurring when exact renditions of certain parts of the vectorial content cannot yet be made either because the information needed to render them is unavailable, or because an exact rendition is still in progress.
  • a further objective of the present invention is to gracefully increase image quality by sharpening when exact renditions of certain parts of the vectorial content first become available.
  • zooming user interfaces are a generalization of the usual concepts underlying visual computing, allowing a number of limitations inherent in the classical user/computer/document interaction model to be overcome.
  • One such limitation is on the size of a document that can be "opened” from a computer application, as traditionally the entirety of such a document must be “loaded” before viewing or editing can begin.
  • this limitation is felt because all of the document information must be transferred to short-term memory (e.g. RAM) from some repository (e.g. from a hard disk, or across a network) during opening; limited bandwidth can thus make the delay between issuing an "open" command and being able to begin viewing or editing unacceptably long.
  • Still digital images both provide an excellent example of this problem, and an illustration of how the computer science community has moved beyond the standard model for visual computing in overcoming the problem.
  • Table 1 shows download times at different bandwidths for typical compressed sizes of a variety of different image types, from the smallest useful images (thumbnails, which are sometimes used as icons) to the largest in common use today. Shaded boxes indicate image sizes for which interactive browsing is difficult or impossible at a particular connection speed.
  • Modern image compression standards, such as JPEG2000 5 , are designed to address precisely this problem. Rather than storing the image contents in a linear fashion (that is, in a single pass over the pixels, normally from top to bottom and left to right), they are based on a multiresolution decomposition.
  • the image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1. Obviously the fine details are only captured at the higher resolutions, while the broad strokes are captured — using a much smaller amount of information — at the low resolutions.
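The factor-of-two hierarchy for the 512x512 example can be generated directly:

```python
def pyramid_sizes(size):
    # Repeatedly halve the (square) image dimension down to one pixel,
    # as in the multiresolution decomposition described above.
    sizes = [size]
    while size > 1:
        size //= 2
        sizes.append(size)
    return sizes

levels = pyramid_sizes(512)
# levels is [512, 256, 128, 64, 32, 16, 8, 4, 2, 1]
```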
  • Previous models of document access are by nature serial, meaning that the entirety of an information object is transmitted in linear order.
  • This model, by contrast, is random-access, meaning that only selected parts of the information object are requested, and these requests may be made in any order and over an extended period of time, i.e. over the course of a viewing session.
  • the computer and the repository now engage in an extended dialogue, paralleling the user's "dialogue" with the document as viewed on the display.
  • each level of detail may be subdivided into tiles; a tile at a given level of detail is the basic unit of transmission.
  • the size in pixels of each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile.
  • the resulting tiled image pyramid is shown in Figure 2. Note that the "tip" of the pyramid, where the downscaled image is smaller than a single tile, looks like the untiled image pyramid of Figure 1.
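The tile counts per level follow directly; a sketch with an assumed 256-pixel nominal tile size and an assumed 2048x1536 image:

```python
import math

def tile_grid(width, height, tile=256):
    # Number of tile columns and rows at one level; partial tiles appear
    # at the right and bottom edges when the dimensions are not exact
    # multiples of the nominal tile size.
    return (math.ceil(width / tile), math.ceil(height / tile))

levels = [(2048, 1536), (1024, 768), (512, 384), (256, 192)]
grids = [tile_grid(w, h) for w, h in levels]
# grids is [(8, 6), (4, 3), (2, 2), (1, 1)]: roughly 4x fewer tiles at
# each coarser level, ending in a single tile at the pyramid's tip.
```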
  • the JPEG2000 image format includes all of the features just described for representing tiled, multiresolution and random-access images.
  • Hence we will refer to the production of a finished, fully drawn tile in response to a "tile drawing request" as tile rendition, with the understanding that this may be a slow process. Whether it is slow because the required data are substantial and must be downloaded over a slow connection or because the rendition process is itself computationally intensive is irrelevant.
  • a complete zooming user interface combines these ideas in such a way that the user is able to view a large and possibly dynamic composite document, whose sub- documents are usually spatially non-overlapping. These sub-documents may in turn contain (usually non-overlapping) sub-sub-documents, and so on.
  • documents form a tree, a structure in which each document has pointers to a collection of sub- documents, or children, each of which is contained within the spatial boundary of the parent document.
  • We call each such document a node, borrowing from programming terminology for trees.
  • Some nodes may be static images which can be edited using painting-like commands, while other nodes may be editable text, while still other nodes may be Web pages designed for viewing and clicking. All of these can coexist within a common large spatial environment — a "supernode" — which can be navigated by zooming and panning.
  • zooming is an intrinsic aspect of navigation, content of any kind can be viewed at an appropriate spatial scale.
  • The variable f refers to the sampling density of a tile relative to the display.
  • Tiling granularity which we will write as the variable g, is defined as the ratio of the linear tiling grid size at a higher LOD to the linear tiling grid size at the next lower LOD.
  • Granularity 2 is by far the most common in similar applications, but in the present context g may take other values.
  • the level of detail scheme described thus far involves a fixed, discrete set of LODs at different scales separated by factors of the granularity g.
  • the image drawn on any region of the display is then usually a weighted blend between two levels of detail, one of which is somewhat finer than the display resolution (f > 1) and one of which is somewhat coarser (f < 1) (although more generally, the present invention also applies if an image region is transiently a single undersampled LOD, or a blend between more than two LODs).
  • This scheme, unmodified, produces visually compelling results for content defined by sampled images, such as digital photographs or video.
  • Figure 3 (a) is an example of text rendered in pure black and white (without antialiasing); (b) is the same text rendered with antialiasing; (c) is a pattern of closely spaced lines; (d) is a checkerboard fill pattern of alternating black and white pixels.
  • the bottom row of images shows the LOD blending effects on the exact images in the top row.
  • the edge blurring of the blended text in (a) is inferior to the result of pixel-precise antialiasing of the top image in (b).
  • LOD blending further blurs it, again resulting in a suboptimal image.
  • (a) and (b) do not produce terrible results, but the exact version is clearly better in each case.
  • the other two images produce more serious errors, including spurious Moire patterns, shimmer, speckling, and blurring.
  • Simply calling this method at every frame refresh during a zoom or pan would in general be far too slow; the exact drawing method may easily take several frames to execute, and in some cases could be much slower still.
  • the targets of the "ordinary” drawing method are tiles which remain relevant over an entire range of pans and zooms, making it possible to implement queueing and resolution fallback schemes which allow smooth navigation and graceful image quality degradation even if tile rendition is slow and asynchronous.
  • exact tiles are of perfect quality, but are specific to a particular view.
  • exact tiles are a final stage of display refinement.
  • Exact tiles are "throwaways", in the sense that they become unusable when the user pans or zooms, since it is unlikely that the user will pan or zoom back to precisely the old view.
  • the overall effect of the present invention is that navigation performance remains unchanged when panning or zooming over large volumes of text or other vector graphics; during such navigation, the rendered image is less than ideal, but because the image is in motion, in the vast majority of cases the degradation is not noticeable.
  • exact tiles are requested and blended in foveally as they arrive, resulting in a sharpening of the image, beginning near the center of the display and spreading outward.
  • Spatial and temporal blending can generally make the progress of this sharpening difficult to detect by eye, but the resulting image is of perfect quality, i.e. unaffected by the blurring and blending operations that allow a ZUI architecture based on the continuous interpolation between discrete levels of detail to operate.
  • the present invention relates generally to zooming user interfaces (ZUIs) for computers. More specifically, the invention is a system and method for progressively rendering arbitrarily large or complex visual content in a zooming environment while maintaining good user responsiveness and high frame rates. Although it is necessary in some situations to temporarily degrade the quality of the rendition to meet these goals, the present invention largely masks this degradation by exploiting well-known properties of the human visual system.
• In graphical computer user interfaces (GUIs), visual components can be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out.
  • the desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets.
• Many popular applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc.
  • these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally.
  • continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent.
  • any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning.
• Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s; recent movies continue the trend.
• a number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present. In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability").
• the prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++, have undergone some development since.
  • This patent is specifically about Voss's approach to object tiling, level-of-detail blending, and render queueing.
  • a multiresolution visual object is normally rendered from a discrete set of sampled images at different resolutions or levels of detail (an image pyramid).
• In some technological contexts where continuous zooming is used, such as 3D gaming, two adjacent levels of detail which bracket the desired level of detail are blended together to render each frame, because the desired level of detail is not normally exactly one of those represented by the discrete set (the image pyramid).
  • Such techniques are sometimes referred to as trilinear filtering or mipmapping.
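As a minimal illustrative sketch (not part of the disclosure), the weights of the two bracketing levels of detail in such a trilinear/mipmap blend can be computed as follows; the function name and the fractional-LOD representation are assumptions chosen for illustration:

```python
import math

def lod_blend_weights(desired_lod: float) -> dict:
    """Given a fractional desired level of detail (e.g. 3.25), return the
    two bracketing integer LODs and the linear weight of each, as used in
    trilinear filtering / mipmapping."""
    coarse = math.floor(desired_lod)   # somewhat coarser LOD
    fine = coarse + 1                  # somewhat finer LOD
    t = desired_lod - coarse           # blend factor in [0, 1)
    return {coarse: 1.0 - t, fine: t}
```

A desired LOD of 3.25 thus blends 75% of level 3 with 25% of level 4.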
  • mipmapped image pyramids are premade, and kept in short-term memory (i.e. RAM) continuously during the zooming operation; thus any required level of detail is always available.
• In some advanced 3D rendering scenarios, the image pyramid must itself be rendered within an
• the present invention involves both strategies for prioritizing the (potentially slow) rendition of the parts of the image pyramid relevant to the current display, and strategies for presenting the user with a smooth, continuous perception of the rendered content based on partial information, i.e. only the currently available subset of the image pyramid.
  • these strategies make near-optimal use of the available computing power or bandwidth, while masking, to the extent possible, any image degradation resulting from incomplete image pyramids. Spatial and temporal blending are exploited to avoid discontinuities or sudden changes in image sharpness.
  • An objective of the present invention is to allow sampled (i.e. "pixellated") visual content to be rendered in a zooming user interface without degradation in ultimate image quality relative to conventional trilinear interpolation.
  • a further objective of the present invention is to allow arbitrarily large or complex visual content to be viewed in a zooming user interface.
  • a further objective of the present invention is to enable near-immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
  • a further objective of the present invention is to minimize the user's perception of transitions between levels of detail or rendition qualities during interaction.
  • a further objective of the present invention is to allow the graceful degradation of image quality by continuous blurring when detailed visual content is as yet unavailable, either because the information needed to render it is unavailable, or because rendition is still in progress.
  • a further objective of the present invention is to gracefully increase image quality by gradual sharpening when renditions of certain parts of the visual content first become available.
  • zooming user interfaces are a generalization of the usual concepts underlying visual computing, allowing a number of limitations inherent in the classical user/computer/document interaction model to be overcome.
  • One such limitation is on the size of a document that can be "opened” from a computer application, as traditionally the entirety of such a document must be “loaded” before viewing or editing can begin.
• this limitation is felt because all of the document information must be transferred to short-term memory (i.e. RAM) from some repository (e.g. from a hard disk, or across a network) during opening; limited bandwidth can thus make the delay between issuing an "open" command and being able to begin viewing or editing unacceptably long.
• Still digital images provide both an excellent example of this problem and an illustration of how the computer science community has moved beyond the standard model for visual computing in overcoming the problem.
• Table 1 shows download times at different bandwidths for typical compressed sizes of a variety of different image types, from the smallest useful images (thumbnails, which are sometimes used as icons) to the largest in common use today. Shaded boxes indicate image sizes for which interactive browsing is difficult or impossible at a particular connection speed.
• Modern image compression standards, such as JPEG2000, are designed to address precisely this problem. Rather than storing the image contents in a linear fashion (that is, in a single pass over the pixels, normally from top to bottom and left to right), they are based on a multiresolution decomposition.
  • the image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1.
  • the fine details are only captured at the higher resolutions, while the broad strokes are captured — using a much smaller amount of information — at the low resolutions. This is why the differently-sized images are often called levels of detail, or LODs for short.
  • each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile.
  • the resulting tiled image pyramid is shown in Figure 2. Note that the "tip" of the pyramid, where the downscaled image is smaller than a single tile, looks like the untiled image pyramid of Figure 1.
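The tile counts of such a tiled image pyramid can be sketched as follows (a hypothetical helper, not taken from the disclosure); each finer level of detail holds roughly four times as many tiles, and the coarsest "tip" levels fit within a single tile:

```python
import math

def pyramid_tile_counts(width, height, tile_size=256):
    """Return (width, height, tile_count) for each level of detail of a
    tiled image pyramid, halving the image dimensions per level down to
    1x1. tile_size is the nominal tile edge; edge tiles may be smaller."""
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        levels.append((w, h, cols * rows))
        if w == 1 and h == 1:
            break
        w, h = max(1, w // 2), max(1, h // 2)
    levels.reverse()  # coarsest (the "tip" of the pyramid) first
    return levels
```

For a 1024x1024 image with 256-pixel tiles, the finest level has 16 tiles, the next 4, and every coarser level a single tile.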
  • the JPEG2000 image format includes all of the features just described for representing tiled, multiresolution and random-access images.
• We will refer to the production of a finished, fully drawn tile in response to a tile drawing request as tile rendition, with the understanding that this may be a slow process. Whether it is slow because the required data are substantial and must be downloaded over a slow connection, or because the rendition process is itself computationally intensive, is irrelevant.
• a complete zooming user interface combines these ideas in such a way that the user is able to view a large and possibly dynamic composite document, whose sub-documents are usually spatially non-overlapping. These sub-documents may in turn contain (usually non-overlapping) sub-sub-documents, and so on.
• documents form a tree, a structure in which each document has pointers to a collection of sub-documents, or children, each of which is contained within the spatial boundary of the parent document.
• We call each such document a node, borrowing from programming terminology for trees.
• Some nodes may be static images which can be edited using painting-like commands, while other nodes may be editable text, and still others may be Web pages designed for viewing and clicking. All of these can coexist within a common large spatial environment — a "supernode" — which can be navigated by zooming and panning.
  • zooming is an intrinsic aspect of navigation, content of any kind can be viewed at an appropriate spatial scale.
  • the vision impaired can easily navigate the same content as normally sighted people, simply by zooming in farther.
• the variable f refers to the sampling density of a tile relative to the display, defined in #1.
• Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at some LOD to the linear tiling grid size at the next lower LOD. This is in general presumed to be constant.
  • a zooming user interface must address the problem of how to request tiles during navigation. In many situations, it is unrealistic to assume that all such requests will be met in a timely manner, or even that they will be met at all during the period when the information is relevant (i.e. before the user has zoomed or panned elsewhere.) It is therefore desirable to prioritize tile requests intelligently.
  • the "outermost” rule for tile request queuing is increasing level of detail relative to the display.
• This rule ensures that, if a region of the display is undersampled (i.e. f < 1), the client's first priority will be to fill in this "resolution hole". If more than one level of detail is missing in the hole, then requests for all levels of detail with f < 1, plus the next higher level of detail (to allow LOD blending — see #5), are queued in increasing order. At first glance, one might suppose that this introduces unnecessary overhead, because only the finest of these levels of detail is strictly required to render the current view; the coarser levels of detail are redundant, in that they define a lower-resolution image on the display. However, these coarser levels cover a larger area — in general, an area considerably larger than the display.
  • the coarsest level of detail for any node in fact includes only a single tile by construction, so a client rendering any view of a node will invariably queue this "outermost" tile first.
• Foveated tile request queuing: within a relative level of detail, tile requests are queued by increasing distance to the center of the screen, as shown in Figure 3.
• This technique is inspired by the human eye, which has a central region — the fovea — specialized for high resolution. Because zooming is usually associated with interest in the central region of the display, foveated tile request queuing usually reflects the user's implicit prioritization for visual information during inward zooms. Furthermore, because the user's eye generally spends more time looking at regions near the center of the display than the edge, residual blurriness at the display edge is less noticeable than near the center.
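The two queuing rules above (coarsest relative level of detail first, then foveal distance) can be sketched as a single sort key; the tile representation used here is a hypothetical one chosen for illustration:

```python
def queue_tile_requests(pending_tiles, center):
    """Order tile requests: coarsest relative LOD first (to fill any
    "resolution hole"), then foveally, by squared distance of each
    tile's center from the center of the display.

    pending_tiles: iterable of dicts with 'lod' (int, lower = coarser)
    and 'center' ((x, y) in screen coordinates)."""
    def priority(tile):
        dx = tile['center'][0] - center[0]
        dy = tile['center'][1] - center[1]
        return (tile['lod'], dx * dx + dy * dy)
    return sorted(pending_tiles, key=priority)
```

Because the LOD is the first element of the sort key, every tile of a coarser level is requested before any tile of a finer one, matching the "outermost" rule.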
  • the figure shows two alternate "navigation paths": in the top row, the user remains stationary while viewing a single document (or node) occupying about two thirds of the display, which we assume can be displayed at very high resolution.
• the blending function may be linear (i.e. the opacity of the new tile is a linear function of time since the tile became available, so that halfway through the fixed blend-in interval the new tile is 50% opaque), exponential, or follow any other interpolating function.
• every small constant interval of time corresponds to a constant percentage change in the remaining transparency; for example, the new tile may close 20% of its remaining transparency at every frame, which results in the sequence of opacities over consecutive frames 20%, 36%, 49%, 59%, 67%, 74%, 79%, 83%, 87%, 89%, 91%, 93%, etc.
  • the exponential never reaches 100%, but in practice, the opacity becomes indistinguishable from 100% after a short interval.
  • An exponential blend has the advantage that the greatest increase in opacity occurs near the beginning of the blending-in, which makes the new information visible to the user quickly while still preserving acceptable temporal continuity.
  • the illusion created is that regions of the display come smoothly into focus as the necessary information becomes available.
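The exponential blend described above can be sketched as follows; this hypothetical helper reproduces the quoted opacity sequence by closing a fixed fraction of the remaining transparency at each frame:

```python
def exponential_blend_opacities(rate=0.20, frames=12):
    """Opacity of a newly available tile over consecutive frames when,
    at each frame, it closes a fixed fraction `rate` of its remaining
    transparency: opacity_n = 1 - (1 - rate)**n. The sequence never
    reaches 100%, but quickly becomes indistinguishable from it."""
    opacity = 0.0
    series = []
    for _ in range(frames):
        opacity += rate * (1.0 - opacity)
        series.append(opacity)
    return series
```

With rate 20%, the first twelve frames yield 20%, 36%, 49%, 59%, 67%, 74%, 79%, 83%, 87%, 89%, 91% and 93%, as in the text: the largest opacity increases occur at the start of the blend-in.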
  • Figure 5 shows our simplest reference implementation for how each tile can be decomposed into rectangles and triangles, called tile shards, such that opacity changes continuously over each tile shard.
  • Tile X bounded by the square aceg, has neighboring tiles L, R, T and B on the left, right, top and bottom, each sharing an edge. It also has neighbors TL, TR, BL and BR sharing a single corner. Assume that tile X is present. Its "inner square", iiii, is then fully opaque. (Note that repeated lowercase letters indicate identical vertex opacity values.) However, the opacity of the surrounding rectangular frame is determined by whether the neighboring tiles are present (and fully opaque). Hence if tile TL is absent, then point g will be fully transparent; if L is absent, then points h will be fully transparent, etc. We term the border region of the tile (X outside iiii) the blending flaps.
  • Figure 6 illustrates the reference method used to interpolate opacity over a shard.
  • Part (a) shows a constant opacity rectangle.
  • Part (b) is a rectangle in which the opacities of two opposing edges are different; then the opacity over the interior is simply a linear interpolation based on the shortest distance of each interior point from the two edges.
  • Part (c) shows a bilinear method for interpolating opacity over a triangle, when the opacities of all three corners abc may be different.
• every interior point p subdivides the triangle into three sub-triangles as shown, with areas A, B and C. The opacity at p is then the average of the three corner opacities, each weighted by the area of the sub-triangle opposite the corresponding corner (i.e. barycentric interpolation).
  • this method ensures that opacity will vary smoothly over the entire tiled surface.
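The area-weighted (barycentric) interpolation over a tile-shard triangle can be sketched as follows; the function and data layout are illustrative assumptions, not the patent's implementation:

```python
def triangle_opacity_at(p, tri):
    """Interpolate opacity at point p inside a triangle whose corners
    carry opacities, weighting each corner opacity by the area of the
    sub-triangle opposite it (barycentric interpolation).

    tri: list of ((x, y), opacity) for corners a, b, c."""
    def area(p0, p1, p2):
        # half the absolute cross product of two edge vectors
        return abs((p1[0] - p0[0]) * (p2[1] - p0[1])
                   - (p2[0] - p0[0]) * (p1[1] - p0[1])) / 2.0
    (a, oa), (b, ob), (c, oc) = tri
    A = area(p, b, c)   # sub-triangle opposite corner a
    B = area(p, a, c)   # sub-triangle opposite corner b
    C = area(p, a, b)   # sub-triangle opposite corner c
    return (A * oa + B * ob + C * oc) / (A + B + C)
```

At a corner the interpolated value equals that corner's opacity, and at the centroid it is the plain average, so opacity varies continuously across shard boundaries.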
  • this strategy causes the relative level of detail visible to the user to be a continuous function, both over the display area and in time. Both spatial seams and temporal discontinuities are thereby avoided, presenting the user with a visual experience reminiscent of an optical instrument bringing a scene continuously into focus.
  • the speed with which the scene comes into focus is a function of the bandwidth of the connection to the repository, or the speed of tile rendition, whichever is slower.
  • the continuous level of detail is biased in such a way that the central area of the display is brought into focus first.
  • the target opacity is decreased linearly (or using any other monotonic function) such that it goes to zero if the oversampling is g-fold.
  • this causes continuous blending over a zoom operation, ensuring that the perceived level of detail never changes suddenly.
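A sketch of this opacity ramp for the finest rendered LOD, under the assumption of a linear falloff between 1-fold and g-fold oversampling (the function name is hypothetical):

```python
def top_lod_target_opacity(oversampling, g=2.0):
    """Target opacity of the finest rendered LOD as a function of how
    much it oversamples the display: 100% at 1x oversampling, falling
    linearly to 0% at g-fold oversampling (g = tiling granularity)."""
    if oversampling <= 1.0:
        return 1.0
    if oversampling >= g:
        return 0.0
    return (g - oversampling) / (g - 1.0)
```

As the user zooms out and a level of detail becomes g-fold oversampled, its opacity reaches zero at exactly the moment the next coarser level becomes the finest one drawn, so the perceived level of detail never changes suddenly.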
  • the number of blended levels of detail in this scheme can be one, two, or more.
  • a number larger than two is transient, and caused by tiles at more than one level of detail not having been fully blended in temporally yet.
  • a single level is also usually transient, in that it normally occurs when a lower-than-ideal LOD is "standing in” at 100% opacity for higher LODs which have yet to be downloaded or constructed and blended in.
• the simplest reference implementation for rendering the set of tile shards for a node is to use the so-called "painter's algorithm": all tile shards are rendered in back-to-front order, that is, from coarsest (lowest LOD) to finest (highest LOD which oversamples the display less than g-fold).
  • the target opacities of all but the highest LOD are 100%, though they may transiently be rendered at lower opacity if their temporal blending is incomplete.
  • the highest LOD has variable opacity, depending on how much it oversamples the display, as discussed above.
  • this reference implementation is not optimal, in that it may render shards which are then fully obscured by subsequently rendered shards. More optimal implementations are possible through the use of data structures and algorithms analogous to those used for hidden surface removal in 3D graphics.
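The reference painter's-algorithm ordering can be sketched as follows; the shard representation and draw callback are illustrative assumptions:

```python
def render_node_painter(tile_shards, draw):
    """Reference "painter's algorithm": draw all tile shards from the
    coarsest LOD to the finest, so finer (later) shards paint over
    coarser ones. All but the highest LOD target 100% opacity; the
    highest LOD's opacity depends on its oversampling.

    tile_shards: iterable of dicts with 'lod' and 'opacity';
    draw: callback invoked once per shard, in back-to-front order."""
    for shard in sorted(tile_shards, key=lambda s: s['lod']):
        draw(shard)
```

As the text notes, this is suboptimal precisely because `draw` may be called for shards that finer shards then fully obscure.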
  • FIGURE 1 Title: SYSTEM AND METHOD FOR THE EFFICIENT, DYNAMIC AND
  • the present invention relates generally to multiresolution imagery. More specifically, the invention is a system and method for efficiently blending together visual representations of content at different resolutions or levels of detail in real time. The method ensures perceptual continuity even in highly dynamic contexts, in which the data being visualized may be changing, and only partial data may be available at any given time.
  • the invention has applications in a number of fields, including (but not limited to) zooming user interfaces (ZUIs) for computers.
  • Multiresolution methods are also used in mathematical and physical simulations, in situations where a possibly lengthy calculation can be performed more "coarsely” or more “finely”; this invention also applies to such simulations, and to other situations in which multiresolution visual data may be generated interactively. Further, the invention applies in situations in which visual data can be obtained “on the fly” at different levels of detail, for example, from a camera with machine-controllable pan and zoom.
  • the present invention is a general approach to the dynamic display of such multiresolution visual data on one or more 2D displays (such as CRTs or LCD screens).
• An example of such multiresolution visual data is the wavelet decomposition of a large digital image (e.g. as used in the JPEG2000 image format).
  • This decomposition takes as its starting point the original pixel data, normally an array of samples on a regular rectangular grid. Each sample usually represents a color or luminance measured at a point in space corresponding to its grid coordinates.
  • the grid may be very large, e.g. tens of thousands of samples (pixels) on a side, or more. This large size can present considerable difficulties for interactive display, especially when such images are to be browsed remotely, in environments where the server (where the image is stored) is connected to the client (where the image is to be viewed) by a low-bandwidth connection.
• Modern image compression standards, such as JPEG2000, are designed to address precisely this problem. Rather than storing the image contents in a linear fashion (that is, in a single pass over the pixels, normally from top to bottom and left to right), they are based on a multiresolution decomposition.
  • the image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1.
• The ratio between successive scales is the granularity, which we represent by the variable g. The granularity may change at different scales, but here, for example and without limitation, we will assume that g is constant over the "image pyramid".
  • the JPEG2000 image format includes the features just described for representing tiled, multiresolution and random-access images.
• If a detail of a large, tiled JPEG2000 image is being viewed interactively by a client on a 2D display of limited size and resolution, then some particular set of adjacent tiles, at a certain level of detail, are needed to produce an accurate rendition. In a dynamic context, however, these may not all be available. Tiles at coarser levels of detail often will be available, however, particularly if the user began with a broad overview of the image. Since tiles at coarser levels of detail span a much wider area spatially, it is likely that the entire area of interest is covered by some combination of available tiles. This implies that the image resolution available will not be constant over the display area.
  • the present invention resolves these issues, while preserving all the advantages of the painter's algorithm.
  • One of these advantages is the ability to deal with any kind of LOD tiling, including non-rectangular or irregular tilings, as well as irrational grid tilings, for which I am filing a separate provisional patent application.
• Tilings generally consist of a subdivision, or tessellation, of the area containing the visual content into polygons.
  • the multiplicative factor by which their sizes differ is the granularity g, which we will assume (but without limitation) to be a constant.
  • an irrational but rectangular tiling grid will be used to describe the improved algorithm. Generalizations to other tiling schemes should be evident to anyone skilled in the art.
  • the improved algorithm consists of four stages.
  • a composite grid is constructed in the image's reference frame from the superposition of the visible parts of all of the tile grids in all of the levels of detail to be drawn.
  • this results in an irregular composite grid, shown schematically in Figure 1.
• the grid is further augmented by grid lines corresponding to the x- and y-values which would be needed to draw the tile "blending flaps" at each level of detail (not shown in Figure 1, because the resulting grid would be too dense and visually confusing).
• Let this composite grid have n grid lines parallel to the x-axis and m grid lines parallel to the y-axis. We then construct a two-dimensional n × m table, with entries corresponding to the squares of the grid. Each grid entry has two fields: an opacity, which is initialized to zero, and a list of references to specific tiles, which is initially empty.
  • the second stage is to walk through the tiles, sorted by decreasing level of detail (opposite to the naive implementation). Each tile covers an integral number of composite grid squares. For each of these squares, we check to see if its table entry has an opacity less than 100%, and if so, we add the current tile to its list and increase the opacity accordingly.
  • the per-tile opacity used in this step is stored in the tile data structure.
  • the composite grid will contain entries corresponding to the correct pieces of tiles to draw in each grid square, along with the opacities with which to draw these "tile shards". Normally these opacities will sum to one. Low-resolution tiles which are entirely obscured will not be referenced anywhere in this table, while partly obscured tiles will be referenced only in tile shards where they are partly visible.
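The second stage described above can be sketched as follows (a simplified, hypothetical rendering of the bookkeeping; the actual implementation operates on the n × m composite grid table directly):

```python
def assign_shards(tiles, grid_squares_of, target_opacity=1.0):
    """Walk tiles from finest to coarsest LOD; for each composite-grid
    square a tile covers, add the tile to that square's shard list and
    accumulate opacity until the square reaches the target. Coarse
    tiles fully obscured by finer ones are never referenced, and so
    are never drawn.

    tiles: dicts with 'lod' and 'opacity' (the tile's own opacity);
    grid_squares_of: maps a tile to the grid squares it covers."""
    table = {}  # square -> {'opacity': float, 'shards': [(lod, opacity)]}
    for tile in sorted(tiles, key=lambda t: t['lod'], reverse=True):
        for sq in grid_squares_of(tile):
            entry = table.setdefault(sq, {'opacity': 0.0, 'shards': []})
            room = target_opacity - entry['opacity']
            if room <= 0.0:
                continue  # square already fully painted by finer LODs
            contribution = min(tile['opacity'], room)
            entry['shards'].append((tile['lod'], contribution))
            entry['opacity'] += contribution
    return table
```

Setting `target_opacity` below 1.0 gives the partially transparent nodes mentioned later, which the naïve back-to-front algorithm cannot produce.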
• the third stage of the algorithm is a traversal of the composite grid in which tile shard opacities at the composite grid vertices are adjusted by averaging with neighboring vertices at the same level of detail, followed by readjustment of the vertex opacities to preserve the summed opacity at each vertex (normally 100%). This implements a refined version of the blending flaps of innovation #4.
  • the composite grid is in general denser than the 3x3 grid per tile defined in innovation #4, especially for low-resolution tiles. (At the highest LOD, by construction, the composite gridding will be at least as fine as necessary.) This allows the averaging technique to achieve greater smoothness in apparent level of detail, in effect by creating smoother blending flaps consisting of a larger number of tile shards.
  • the composite grid is again traversed, and the tile shards are actually drawn.
• Although this algorithm involves multiple passes over the data and a certain amount of bookkeeping, it results in far better performance than the naïve algorithm, because much less drawing must take place in the end; every tile shard rendered is visible to the user, though sometimes at low opacity. Some tiles may not be drawn at all. This contrasts with the naïve algorithm, which draws every tile intersecting with the displayed area in its entirety.
• An additional advantage of this algorithm is that it allows partially transparent nodes to be drawn, simply by changing the total opacity target from 100% to some lower value. This is not possible with the naïve algorithm, because every level of detail except the most detailed must be drawn at full opacity in order to completely "paint over" any underlying, still lower resolution tiles.
  • the composite grid can be constructed in the usual manner; it may be larger than the grid would have been for the unrotated case, as larger coordinate ranges are visible along a diagonal.
  • the composite grid squares outside the viewing area need not be updated during the traversal in the second or third stages, or drawn in the fourth stage.
  • each level of detail can be drawn immediately as it is completed, with the correct opacity, thus requiring only the storage of a single tile identity per shard at any one time.
  • Another exemplary optimization is that the total opacity rendering left to do, expressed in terms of (area) x (remaining opacity), can be kept track of, so that the algorithm can quit early if everything has already been drawn; then low levels of detail need not be "visited” at all if they are not needed.
  • the algorithm can be generalized to arbitrary polygonal tiling patterns by using a constrained Delaunay triangulation instead of a grid to store vertex opacities and tile shard identifiers.
• This data structure efficiently creates a triangulation whose edges contain every edge in all of the original LOD grids; accessing a particular triangle or vertex is an efficient operation, which can take place in of order n·log(n) time (where n is the number of vertices or triangles added).
• the present invention relates generally to zooming user interfaces (ZUIs) for computers. More specifically, the invention is a system and method for efficiently representing and navigating through zoomable content using hierarchical data structures which allow the content to have effectively infinite precision spatial positioning and size. This enables zoomable environments of unlimited scale or depth.
• In graphical computer user interfaces (GUIs), visual components can be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out.
  • the desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets.
• Many popular applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc.
  • these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally.
  • continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent.
  • any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning.
• Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s; recent movies continue the trend.
• a number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present. In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability").
• the prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++, have undergone some development since.
  • This patent is specifically about Voss's approach to coordinate systems and navigation.
• A 2D coordinate system defines a point in two-dimensional (2D) space as a pair of numbers, usually termed the x- and y-coordinates (x,y), representing horizontal and vertical displacements from the origin, (0,0).
  • 2D points are also represented using non-Cartesian coordinate systems, such as polar coordinates; the substantive aspects of the following discussion apply equally to any such coordinate systems.
• In three dimensions, a coordinate system defined by a triplet of numbers (x,y,z) is normally used to represent points in space; again, these may or may not be Cartesian coordinates.
  • displays are normally two dimensional, view-dependent mathematical transformations are required to reduce three dimensional world coordinates to two dimensional screen coordinates.
  • the coordinates being manipulated are normally represented using a numerical data type native to the computer, either integer or floating-point.
  • Such data types normally use between 16 and 64 bits (binary digits) of memory. Because of their limited representational size, these numbers have limited precision — that is, their decimal expansions are only defined out to some limited number of significant places. In the case of 64-bit floating point numbers, this is about 15 decimal places.
• when each 2D coordinate pair (x,y) corresponds to a fixed point on the display surface, this precision is more than adequate.
  • the user is easily able to zoom in, causing the area which previously covered a single pixel to fill the entire display, or zoom out, causing the contents of the entire display to shrink to the size of a single pixel.
  • Each such zoom may effectively multiply or divide the (x,y) coordinates by a factor of approximately 1,000.
• Several such zooms in or out therefore exhaust the precision of any standard internal floating-point representation. (Five such zooming operations, for example, would completely exhaust the precision of a 64-bit floating point number.)
  • the present invention solves this problem by dispensing entirely with world coordinates. Instead, all zooming and panning operations are conducted in a tree (or more generally, a directed graph) of local coordinate systems which collectively define the zoomable "universe" of content.
• Content comprises a collection of nodes, which are themselves defined using a local 2D coordinate system of machine-precision coordinates. If standard 64-bit floating point numbers are used, a single node is thus limited to having about 15 decimal places of precision per coordinate, or, in terms of pixels, being at most about 10^14 pixels on a side. However, a node may be the parent of a number of child nodes, each of which is geometrically contained within the boundary of the parent.
• the child's size and position relative to the parent can be specified in the parent's local coordinate system, and thus fit into machine-precision numbers; the child may have its own local coordinate system, however, which allows it in turn to have (for example) resolution up to 10^14 pixels on a side.
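The nesting arithmetic above can be sketched concretely. This is an illustrative sketch, not the document's pseudocode: the class and function names, the [0,1) local extent, and the 1e-3 child scale are all assumptions.

```python
# Illustrative sketch of nested local coordinate systems. Each child occupies
# a small rectangle inside its parent's local [0,1) x [0,1) extent, and a deep
# point is addressed by a *path* of such placements rather than by one global
# coordinate, so no single number ever needs more than machine precision.

class Placement:
    """A child's rectangle, expressed in its parent's local coordinates."""
    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h

def to_parent(p, point):
    """Map a point in the child's local coordinates into parent coordinates."""
    px, py = point
    return (p.x + px * p.w, p.y + py * p.h)

def to_child(p, point):
    """Inverse mapping: parent coordinates -> child local coordinates."""
    px, py = point
    return ((px - p.x) / p.w, (py - p.y) / p.h)

# Six nested levels, each scaled by 1e-3: a flat "world coordinate" for the
# innermost point would need ~18 significant digits, beyond a 64-bit double,
# yet every per-level conversion below is a well-conditioned calculation.
path = [Placement(0.4005, 0.2007, 1e-3, 1e-3) for _ in range(6)]
local = (0.5, 0.5)
for p in path:           # walk child -> ... -> root, one level at a time
    local = to_parent(p, local)
```

Round-tripping a point through to_parent and to_child at any single level loses essentially nothing, which is why precision only ever has to be maintained one level at a time.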
  • An objective of the present invention is to allow a pannable and zoomable 2D space of finite "physical size", but arbitrarily high complexity or resolution, to be embedded into a well-defined area of a larger pannable and zoomable 2D space.
  • a further objective of the present invention is to allow geometric trees or directed graphs of visual objects to be constructed by the above embedding procedure, allowing such trees or graphs to become arbitrarily large and complex while retaining the ability to pan and zoom through the resulting space.
  • An objective of the present invention is therefore to allow fluid zooming and panning in a virtual 2D universe of potentially unlimited visual complexity and detail on ordinary, present-day computer architectures.
  • a further objective of the present invention is to mimic the behavior of infinite- precision arithmetic on the coordinates of a pannable and zoomable 2D space while still retaining the computational speed of coordinate calculations performed on native machine precision numbers.
• the software package Mathematica™ (© Wolfram Research) provides exemplary implementations of data structures and algorithms for infinite-precision arithmetic (which, however, do not satisfy these same criteria).
• a further objective of the present invention is to mimic the behavior of infinite-precision arithmetic on the coordinates of a pannable and zoomable 2D space while avoiding the large memory consumption of infinite-precision numbers.
  • a further objective of the present invention is to allow the embedding of reusable visual content into a zoomable and pannable 2D space by reference, i.e. without needing to update any coordinates or other data structures in the content to be embedded. Because this allows a 2D space to be embedded within another without a traversal of the new child's coordinate system tree, this capability enables the embedding of any 2D space, regardless of complexity.
  • a further objective of the present invention is to allow infinite nesting in zoomable and pannable content due to circular references: a node with content B may be a child of a node with content A (i.e. B appears geometrically in the interior of A), and node B may in turn contain a node with content A as a child.
  • this type of recurrence can occur very easily. This generalizes the concept of a tree of nodes with associated coordinate systems to that of a directed graph of nodes with associated coordinate systems.
  • a further objective of the present invention is to allow zooming out after a deep zoom-in to behave like the "back" button of a web browser, letting the user retrace his or her steps through a visual navigation.
  • a further objective of the present invention is to allow zooming in immediately after zooming out to behave analogously to the "forward" button of a web browser, letting the user precisely undo the effects of an arbitrarily long zoom-out.
• a further objective of the present invention is to allow a node to have a very large number of child nodes (for example, up to 10^28).
  • a further objective of the present invention is to allow a node to generate its own children programmatically on the fly, enabling content to be defined, created or modified dynamically during navigation.
  • a further objective of the present invention is to enable near-immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
• Data structures (sometimes known as abstract data types, or ADTs) will be introduced using the word Structure followed by material in curly braces { ... }.
• within the curly braces will be listed the fields, or data elements, that comprise the structure, in the format DataType variableName, where DataType is either a previously defined structure or a primitive type, and variableName is the field's name. Note that data types and functions will always begin with a capital letter, and variable names or field names will always begin with lowercase.
  • the primitive types used will be Boolean (which can take on values true or false), Double (a 64-bit floating point number corresponding to C language's double type), Integer (a 64-bit signed integer data type), and String (a character string).
  • the names of structures and variables, as well as details of the data types and formats used, are exemplary; alternate implementations of the present invention may alter any of these details, include any number of additional fields, or use different structures or internal representations.
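For concreteness, the conventions above can be transliterated into Python (an illustrative rendering only: Double maps to Python's 64-bit float, and the lo/hi corner fields follow the view.lo and view.hi references that appear later in this text).

```python
# Point2D and Rectangle, rendered per the document's naming conventions:
# type names capitalized, field names lowercase.
from dataclasses import dataclass

@dataclass
class Point2D:
    x: float   # Double in the document's pseudocode
    y: float

@dataclass
class Rectangle:
    lo: Point2D   # one corner (e.g. top-left)
    hi: Point2D   # opposite corner (e.g. bottom-right)

    def width(self) -> float:
        return self.hi.x - self.lo.x

    def height(self) -> float:
        return self.hi.y - self.lo.y
```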
• a generic container type, Collection<T>, which stores an arbitrary number of elements of type T, is also assumed.
• For a node to be visible it must be associated with a rendering method, which is able to draw it in whole or in part on some area of the display.
• Each node is endowed with a local coordinate system of finite precision. For illustrative purposes we will assume that a node is rectangular and represents its local coordinates using the Point2D and Rectangle data structures as defined above. Thus a Rectangle will define the boundaries of the local coordinate system.
• in alternate embodiments, a node may be non-rectangular and/or use a different coordinate system.
• Structure Node { Rectangle coordSystem; ... }
• a node's rendering method is invoked with a rectangle in the node's coordinate system and a rectangle onDisplay in display or "screen" coordinates.
• onDisplay should be within a rectangle defining the boundaries of the display in display coordinates.
  • Each node may have zero or more child nodes, which it addresses by reference. This means that a node need not, and generally does not, contain all the information of each child node, but instead only an address providing the information necessary to obtain the child node.
• a URL is an example of such an address, but addresses may take other forms, e.g. a pointer in memory, a globally unique identifier, a hardware port, etc.
• a child reference therefore includes an Address field (Address address), together with a function, Function Node Dereference(Address), which retrieves the node an address refers to.
• In addition to the child node's address, a reference to a child must specify the child's size and location in the parent node's local coordinate system.
  • Different nodes may share some or all of their children, but perhaps in a different spatial arrangement, allowing the possibility of different views of the same information.
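A minimal sketch of children addressed by reference. The dictionary standing in for a node store, and all names here, are hypothetical; in practice the address could be a URL, memory pointer, or GUID as described above.

```python
# Children are held by reference: a parent stores only an address plus a
# placement rectangle, and dereferencing fetches the child when needed.

NODE_STORE = {}   # hypothetical stand-in for a server, file system, or heap

class Node:
    def __init__(self, name):
        self.name = name
        self.children = []   # list of ChildReference, never of Node itself

class ChildReference:
    def __init__(self, address, placement):
        self.address = address      # how to obtain the child node
        self.placement = placement  # child's rectangle in parent coordinates

    def dereference(self):
        return NODE_STORE[self.address]

# Two parents share one child at different placements: two spatial
# arrangements ("views") of the same information, as noted above.
NODE_STORE["shared"] = Node("shared content")
a, b = Node("A"), Node("B")
a.children.append(ChildReference("shared", (0.0, 0.0, 0.5, 0.5)))
b.children.append(ChildReference("shared", (0.25, 0.25, 0.5, 0.5)))
```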
• the visual error of such a truncated rendition may become negligible, because any nodes not drawn are too small to affect the overall appearance of the display.
  • a node may be its own descendant.
• the directed graph of nodes defined by the "is a child of" relationship may have cycles (making it no longer a tree in the graph-theoretic sense). If children occupy a substantial portion of their parents' area, and a graph cycle is small (i.e. A->B->A or A->B->C->A), then this results in a hall-of-mirrors effect.
• User interaction with a node, such as typing text into it, normally requires that the node be visible.
  • a number of different models might be used for selecting the node with which an interaction is to take place: for example, the tab key may cycle through nodes, or the node under the mouse pointer may be the target.
  • the number of nodes that are candidates for user interaction is of the same order as the number of nodes rendered, and thus finite.
  • Methods analogous to the rendering function described above can be used to pass user interaction messages to nodes, which may affect their future behavior or appearance. This architecture is therefore sufficient to enable a node to be a complete software application, and not merely a static visual object.
  • zooming in means progressively enlarging a part of the content visible on the display so that it occupies more area on the display; a smaller physical area will then be visible, but in greater detail.
  • Zooming out is the converse operation. Because we have assumed that the physical dimensions of the universe are bounded, zooming out is a bounded operation: once the entire universe is visible, further zooming out cannot bring any new content into view, but simply shrinks the universe to an area smaller than the entire display. It is therefore natural to define a root node as encompassing the entire universe; it has child nodes which are visible when fully zoomed out, these children have their own child nodes, etc.
  • Child nodes are in general smaller than parent nodes.
• Each node, however, has its own local coordinate system, so this construction allows for a cascade of ever-finer coordinate systems, and thus for a universe of potentially infinite spatial resolution. This means that zooming in is not a bounded operation: if the node graph has cycles, one may zoom forever in a "content loop"; or, more interestingly, the node graph may have a very large or even infinite number of nodes, allowing one to zoom in indefinitely while viewing new content all the time.
• the expanded Node definition is: Structure Node { Rectangle coordSystem; Collection<ChildReference> children; Rectangle view; }
• the new view field represents, in node coordinates, the visible area of the node, that is, the portion of the node currently presented on the display.
• This rectangle may only partially overlap the node's area as defined by coordSystem, as when the node is partially off-screen.
• the stack structure is defined thus: Stack<Address> viewStack; where this stack is a global variable of the client (the computer connected to the display). For exemplary purposes we assume that navigation begins with an overview of a universe of content, defined by a root node; then this root node is pushed onto the viewStack:
• rootNode.view = rootNode.coordSystem; Push(viewStack, rootNode); the rootNode's view field is thus initialized to cover its entire coordinate system.
• minimumArea: some minimum size below which nodes need not be rendered or tracked.
• the last element's view field does not, however, specify the user's viewpoint on its own.
• the view field of the root node does specify where in the universe the user is looking, although due to roundoff and discretization error it is probable that the root node's view.lo and view.hi have collapsed to a point, and this point will only be a finite-precision approximation to the real view position. Nodes closer to the "fine end" of the viewStack thus specify the view position with increasing precision, but relative to progressively smaller local coordinate systems.
• the first step is to alter the view field of this last node; this altered view is taken to be the correct, authoritative view position.
• the second step is to propagate the new view "upward" toward the root node, which entails making progressively smaller and smaller changes to the view fields of nodes earlier in the viewStack.
• the alteration to the view may be so small that it ceases to be accurately representable; upward propagation stops at this node.
• the change is also propagated downward to other visible nodes, using the approach of the unmodified RenderNode pseudo-code. Hence, first, the last node's parent's view is updated, and the update then percolates to the parent's other visible descendants.
  • a panning operation may move the last node far enough away that it no longer belongs in the viewStack.
• zooming in might enlarge a child of the last node until that child itself belongs at the fine end of the viewStack.
• Truncating the viewStack is only necessary if the user then pans. Although a long outward zoom will cause the view fields of deeply zoomed nodes to grow very large (and therefore numerically inaccurate), a viewCenter field recording the center of the view is unaffected:
• zooming without panning therefore does not alter the viewCenter field of any node.
• This construction allows zooming far outward to be followed immediately by zooming back in. Because the viewStack has been left intact, the user can then return to precisely the deep view from which the outward zoom began.
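The viewStack bookkeeping described above might be condensed as follows. Everything here is an illustrative assumption rather than the document's pseudocode: one spatial dimension, a [0,1] local extent per node, a 1e-6 precision threshold, and a fixed child extent.

```python
# Each stack entry records the view (center, width) within one node. When the
# view becomes too narrow to track precisely at a coarse level, a finer node
# is pushed and the same view, re-expressed in child coordinates, is wide
# again; the coarse entry's view collapses to a point, as described above.

CHILD_EXTENT = 2e-3   # assumed size of a child node in parent coordinates

view_stack = [{"center": 0.5, "width": 1.0}]   # root node: fully zoomed out

def zoom_in(stack, factor):
    top = stack[-1]
    top["width"] /= factor
    if top["width"] < 1e-6:
        child_width = top["width"] / CHILD_EXTENT   # restore precision
        top["width"] = 0.0    # coarse view collapses to a point
        stack.append({"center": 0.5, "width": child_width})

# A cumulative zoom of 2^40 never strains any single entry's precision.
for _ in range(40):
    zoom_in(view_stack, 2.0)
```

Zooming back out pops or widens these same entries in reverse, which is what lets a deep zoom-out followed by a zoom-in retrace its steps exactly.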
• The Zeno cache is a system for increasing the effectiveness of most-recently-used (MRU) caching for variably compressible data objects.
• MRU caching, where MRU stands for "most recently used", is a well-known concept for implementing a client-side memory in a client-server system. It is assumed that the server has access to, and can serve to a client, a large number of data objects, which in the aggregate may occupy a large amount of memory. The available bandwidth between client and server is limited, however, so client requests for data objects to be sent from the server take time. If access to data objects is reasonably "coherent", meaning that objects which the client needed recently are likely to be needed again in the near future, then MRU caching is a way to greatly increase the efficiency of the client-server system.
• "LRU" stands for "least recently used"; objects at the LRU end of the cache are the first to be evicted.
  • the linear LRU-MRU arrangement is merely a conceptual convenience.
  • the cache is first examined to see if the object is cached. If it is, then the cached representation is used, obviating the need for an expensive server request; usually, making use of a cached representation also "promotes" that object to the MRU end of the cache.
  • the performance advantages of this scheme are obvious.
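A conventional MRU/LRU cache of the kind described, the baseline that the Zeno cache below generalizes, can be sketched as (illustrative names):

```python
# Minimal MRU cache: dictionary order runs from the LRU end to the MRU end.
from collections import OrderedDict

class MRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key, fetch_from_server):
        if key in self.items:
            self.items.move_to_end(key)      # hit: promote to the MRU end
            return self.items[key]
        value = fetch_from_server(key)       # miss: expensive channel request
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict from the LRU end
        return value

# Usage: a tiny server stub that records what actually crossed the channel.
fetches = []
def server(key):
    fetches.append(key)
    return key.upper()

cache = MRUCache(capacity=2)
cache.get("a", server)
cache.get("b", server)
cache.get("a", server)   # hit: no new fetch, "a" promoted
cache.get("c", server)   # full: evicts "b" from the LRU end
```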
  • Caching can confer performance advantages in any situation in which a client request for one object affects the probability distribution of the likelihoods of requesting other objects in the near future.
• Straightforward MRU caching is optimized for the case in which this alteration is simply modeled as an increased likelihood of requesting the same object again; but generalizations exist, and the present invention can be extended to them.
• The Zeno caching concept:
• the data objects being served are compressible, which for our purposes will mean amenable to lossy data compression techniques.
  • Lossy data compression is characterized by the ability to represent a data object with fewer bytes than the full representation; higher compression ratios (meaning more compression) correspond to higher distortion, or lower quality.
  • the nature of the data and associated compression algorithm should have the following characteristics:
  • compressed versions of the data should be suitable as stand-ins for the uncompressed data. Below a certain level of distortion, the compressed representations may be fully adequate, and above a certain level of distortion, they may be adequate as a stopgap while the client waits for uncompressed, lossless or higher- quality versions of the data.
• lower-quality representations are subsets of higher-quality representations, meaning that improving the representation quality at the client side using additional information available on the server side only requires sending the missing data or difference, not re-sending an entirely new version of the data. This avoids redundancy and hence substantially increases efficiency.
• the enhanced embodiment above also usually implies that lowering the representation quality of an object merely involves throwing away some of the data, not re-compressing. This property is also important for efficiency.
  • the compression technique scales from lossy to lossless (i.e. a perfect, or zero distortion representation). Combined with the above enhanced embodiments, this allows a perfect representation of a data object to be built up in steps, from highly lossy to lossless, at little or no extra total cost relative to sending across a lossless version initially.
• Memory is discrete, so that, for example, it is usually meaningless in practice to compress an object to a representation smaller than one bit.
• the number of objects in the cache will in practice be finite, but by using the Zeno cache concept this number may be much larger than would be possible with conventional MRU caching. Further, cached objects will have the property that if recently used, they will be represented in the cache at high quality, and this quality will deteriorate progressively if the object has not been accessed recently. In this sense, Zeno caching behaves much like human memory.
  • cached representations will be subject to a maximum compression ratio.
  • the absolute maximum number of objects that could be stored in the cache is thus the cache size divided by this minimum compressed size, assuming that the objects are all of equal size (of course, they need not be).
• Random algorithms can be used to improve the basic algorithm in a number of ways. The chief disadvantage of the 2*1/4, 8*1/16, etc. series above arises from its strength: its small number of assumed values. Random choice can also be used to "squeeze" a random subset of the cache elements in a weighted fashion until some target amount of space is freed. This works because exact position in the cache becomes decreasingly important far from the MRU end. The amount of squeezing can also be (somewhat) random. Using random approaches like these can eliminate obvious discontinuities or thresholds in object quality.
  • caching can also involve intelligent guessing about which objects might be needed next; thus objects less likely to be needed can be "squeezed" before objects with a higher likelihood of being needed in the future. This can be combined with a random algorithm.
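One reading of the 2*1/4, 8*1/16, etc. series above: after the newest object (kept at full size), the next 2 objects are squeezed to 1/4 size, the next 8 to 1/16, and so on, so each band costs the same 1/2 unit of memory while the object count grows geometrically with cache size. The band sizes below encode this interpretation, which is an assumption, not a figure from the text.

```python
# Zeno-style quality schedule: compression factor as a function of MRU rank.

def quality_for_rank(rank):
    """Fraction of full size kept for the object at MRU rank (0 = newest)."""
    if rank == 0:
        return 1.0
    band, band_start = 1, 1
    while True:
        band_size = 2 * 4 ** (band - 1)      # bands of 2, 8, 32, ... objects
        if rank < band_start + band_size:
            return 1.0 / 4 ** band           # squeezed to 1/4, 1/16, 1/64, ...
        band_start += band_size
        band += 1

def cache_cost(n_objects):
    """Total memory, in full-object units, to hold n objects this way."""
    return sum(quality_for_rank(r) for r in range(n_objects))
```

Each full band adds only 1/2 unit of memory but quadruples the number of cached objects, which is how a Zeno cache can hold far more objects than a conventional MRU cache of the same size.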
  • An MRU/LRU caching system substantially as described.
• image compression standards such as JPEG2000/JPIP have been introduced to meet a demanding engineering goal: to enable very large images (i.e. gigapixels in size) to be delivered incrementally or selectively from a server to a client over a low-bandwidth communication channel.
• when images are being viewed at full resolution, only a limited region can fit on a client's graphical display at any given time; the new standards are geared toward selectively accessing such regions and sending across the communication channel only data relevant to the region. If this "region of interest" or ROI changes continuously, then a continuous dialogue between client and server over a low-bandwidth channel can keep the client's representation of the area inside the ROI accurate.
  • the present invention relates to an extension of these selectively decompressable image compression and transmission technologies to textual or other non-image data.
• Consider a large text, e.g. the book Ulysses, by James Joyce.
• We can format this text by putting each chapter in its own column, with columns for sequential chapters arranged left-to-right. Columns are assumed to have a maximum width in characters, e.g. 100.
  • Figure 2 shows the entire text of Ulysses encoded as an image in this fashion, with each textual character corresponding to a single pixel.
• the pixel intensity value in Figure 2 is simply the ASCII code of the corresponding character.
  • JPEG2000 is used as a lossy compression format, meaning that the decoded image bytes are not necessarily identical to the original bytes. Clearly if the image bytes represent text, lossy compression is not acceptable.
  • One of the design goals of JPEG2000 was, however, to support lossless compression efficiently, as this is important in certain sectors of the imaging community (e.g. medical and scientific). Lossless compression ratios for photographic images are typically only around 2:1, as compared with visually acceptable lossy images, which can usually easily be compressed by 24:1.
• Image compression, both lossy and lossless, operates best on images that have good spatial continuity, meaning that the differences between the intensity values of adjacent pixels are minimized.
• the raw ASCII encoding is clearly not optimal from this perspective.
• One very simple way to improve the encoding is to reorder characters by frequency in the text or simply in the English language, from highest to lowest: code 0 remains empty space, code 1 becomes the space character, and codes 2 onward are e, t, a, o, i, n, s, r, h, l, etc.
  • Figures 2 and 3 compare text-images with ASCII encoding and with this kind of character frequency encoding.
  • the file size is 1.6MB, barely larger than the raw ASCII text file (1.5MB) and 37% smaller than the ASCII encoded text-image.
  • the compressed file size can drop well below the ASCII text file size.
• further optimizations can include, but are not limited to: using letter transition probabilities (Markov-1) to develop the encoding, instead of just frequencies (Markov-0); and encoding as pixels the delta or difference between one character and the next, rather than the characters themselves.
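The frequency-reordering step can be sketched as follows; the tie-breaking order and the tiny sample text are illustrative.

```python
# Map characters to pixel intensities by descending frequency, so common
# characters receive small, nearby codes and the resulting text-image has
# better spatial continuity than raw ASCII codes.
from collections import Counter

def frequency_encoding(text):
    """Build char -> intensity; code 0 stays reserved for empty space."""
    ranked = [c for c, _ in Counter(text).most_common()]
    return {c: i + 1 for i, c in enumerate(ranked)}

def encode(text, table):
    """Turn text into a row of pixel intensity values."""
    return [table[c] for c in text]

text = "the theory of the thing"
table = frequency_encoding(text)
pixels = encode(text, table)
```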
  • the new invention is discussed here in the context of JPEG2000/JPIP as a selective image decompression technology, but nothing about the invention limits it to that particular format or protocol.
  • LizardTech's MrSID format and protocol has similar properties, and would also work.
• a method of spatially encoding large texts and the like, comprising encoding the ASCII value of each of a plurality of characters into an intensity level.
  • This invention relates to a novel method for accessing visual data, usually images, by computer. It is applicable to any situation in which visual content consists of a series of objects viewed one or a few at a time, in some established order.
  • Some popular image browsing applications e.g. ACDSeeTM by ACD Systems, implement "read-ahead” and “read-behind” strategies to avoid flicker or lack of responsiveness during virtual image slideshows. This involves loading and decompressing the next and previous image files in a presentation in addition to the current picture.
• when a timer expires or some other event signals a change of image, the image being displayed is immediately replaced with the next image, which has been "waiting in the wings", and the following image is read and decoded to prepare for the next transition.
• the old previous image, now two images behind, is normally erased from memory, keeping the number of images in memory at 3.
  • the present invention extends the concept of read-ahead/read-behind in conjunction with multiresolution imagery.
  • Multiresolution imagery can be decompressed at a ladder of resolutions, e.g. full size, half size (on each side), quarter size, eighth size, etc.
  • the time required to decompress an image at 1/8 size should be about 1/8 of the time required to decompress it at full resolution; and, of course, 1/8 of the memory is required to hold the image at 1/8 size.
  • This multiscale representation of images must be coupled with a multiresolution rendering scheme to allow the client or viewer to respond instantly to user requests to switch images.
• such a rendering scheme simply interpolates or "upsamples" lower-resolution representations of an image for display on a high-resolution screen.
  • the display must then refine dynamically to reflect the new higher-quality data. This refinement may be instantaneous, or it may be accomplished using gradual blending or other techniques that mask the transition from low to high visual quality.
• interpolating from very low-resolution image data results in a blurry appearance. If high-resolution imagery replaces lower-resolution interpolated imagery, blending in over time, the perceptual effect is for the image to appear to "come into focus".
• Upon transition to a different image, the viewer or client must request additional data from the server, or load additional data from files, to improve the quality both of the new current image and of surrounding images (normally the new next images, if advancing, or the new previous images, if stepping backward). Unneeded high-resolution data may also be discarded when the current image changes, to preserve the total memory footprint.
• This scheme has many advantages over traditional look-ahead/look-behind, including: the user can skip any number of images forward or backward at a time, with larger skips simply resulting in a blurrier initial appearance of the new image; and the memory footprint may be no bigger than with traditional methods, and may even be made smaller (if deemed necessary) by making the resolution function r(i), the fraction of full resolution at which the image i steps away from the current image is kept, drop off more sharply, e.g. ..., 1/64, 1/16, 1/4, 1, 1/4, 1/16, 1/64, ....
  • the invention as described above applies to linear sequences of images, but it may be extended to "graphs" of images, i.e. collections of images in which one image may (potentially by user choice) be followed by or preceded by more than one possible next or previous image.
  • the original function r(i) may be applied to all images which might follow the current image via one transition, two transitions, etc.; or particular "paths" through the set of images may be weighted preferentially; or constraints can be used to allocate some fixed amount of memory among all possible preceding or successive images according to some allocation algorithm.
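The read-ahead schedule can be made concrete with the resolution function r(i). The sharper series quoted above corresponds to r(i) = 1/4^|i| in linear size, which is the assumption used here; a gentler default (e.g. 1/2^|i|) would work identically.

```python
# r(i): linear-size fraction at which the image i steps away from the
# current image is kept in memory; memory per image scales as r(i)**2.

def r(i):
    return 1.0 / 4 ** abs(i)

def memory_footprint(window):
    """Relative memory to cache all images within +/- window of current."""
    return sum(r(i) ** 2 for i in range(-window, window + 1))
```

Because r(i)**2 decays geometrically, the total footprint converges: even caching everything within ten steps costs only about 13% more memory than the current image alone, while any skip still lands on at least a blurry stand-in.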
  • a method comprising caching an image, said step of caching being nonlinear.
• Scenario #1a A user keeps her entire collection of digital photos (5 megapixels each) on the hard drive of her notebook computer. She is an avid photographer, and after several years, this collection totals 25,000 images. She uses the present invention to organize the entire collection, and is able to rearrange the photos dynamically to sort them by date, size, color, or other properties, and extract subsets. When viewing the entire collection, she can zoom out smoothly and continuously until all of the photos are in view, zoom in to view a detail of a single photo, or zoom to any intermediate view.
• Scenario #1b The user of scenario #1a can configure her home computer as a server, and then navigate the entire photo collection from a remote client computer just as in scenario #1a.
  • Scenario #2a An art museum has invested in high-resolution scanning of all of its paintings (100 megapixels and up), and puts together an online exhibit featuring dozens or hundreds of these, organized spatially with descriptive captions. Using the present invention, not
• Scenario #2b The art museum creates a virtual three-dimensional space representing the museum building, with high-resolution scans of all of the artworks in their "geographically correct" positions within the 3D model. Alternatively, three-dimensional virtual museum spaces can be created with no physical counterpart. These 3D models can be navigated in a manner similar to the 2D version of scenario #2a, either locally or remotely.
• the analogue to the two-dimensional operation of zooming in is moving closer to an image surface, and zooming out is analogous to moving away from the image surface.
• Scenario #2c The museum also scans its 14th-century Book of Hours at very high resolution, yielding hundreds of images of >100 megapixels each. These are assembled into a "virtual book", a high-quality surrogate for the original, available online. The book can be navigated locally or remotely, with turnable pages in three dimensions.
• the features of JPEG2000/JPIP relevant to enabling the current invention are: a multiscale image representation, making it efficient to decompress an image file at a ladder of resolutions lower than full resolution.
• these resolutions are downsampled from the original by powers of two, e.g. if the original is 512x512 pixels, then 256x256, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2 and 1x1 representations will also be available.
• the 1x1 version is just a single pixel value corresponding to the average color of the entire image; progressively higher resolutions add progressively more detail. For some images, the lowest resolutions (e.g. 4x4, 2x2 and 1x1) may not be available.
  • the ability to selectively decompress only a portion of an image (called a "region of interest" or ROI) at a given resolution, e.g., a 32x32 pixel section from the 256x256 resolution of a 512x512 image.
  • the amount of information sent should be approximately proportional to the size of the ROI.
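The power-of-two resolution ladder and the ROI proportionality can be made concrete with a little arithmetic (illustrative helper names):

```python
# Resolution ladder for a multiscale file, and the fraction of one level's
# data that a square region of interest (ROI) represents.

def resolution_ladder(full_size):
    """Sizes available: full, half, quarter, ..., down to 1x1."""
    sizes, s = [], full_size
    while s >= 1:
        sizes.append(s)
        s //= 2
    return sizes

def roi_cost_fraction(roi_side, level_side):
    """Fraction of a level's pixels covered by a square ROI at that level."""
    return (roi_side / level_side) ** 2
```

For the 512x512 example above, a 32x32 ROI at the 256x256 level touches only about 1.6% of that level's data, which is the proportionality the protocol relies on.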
• any image compression format/protocol satisfying these requirements would be equally suitable.
• we will refer to the image format simply as "multiscale", with the understanding that it could be wavelet-based, like JPEG2000, or based on some other technology.
  • the present invention defines precomputed steps and interactive rendering algorithms which can be used in a variety of configurations to implement the scenarios listed above. All of these scenarios involve user interaction with a "universe" of images; the starting point for precomputation is therefore a list of the filenames, URLs, or other strings referencing the individual images. When the user is zoomed out far enough to view all of these images at once, it is impractical for either the client or the server to traverse all of the image files, as there may be a very large number of them.
• a montage is a mosaic or collage of all of the images, rendered at low resolution and packed efficiently into a rectangular area, as shown in Figure 1.
• Auxiliary metadata, which can be embedded in the montage image file or stored separately, associates rectangular regions of the montage image with particular image files.
• Figure 1: More than 1000 images (a collection of digitized maps of various sizes) packed into a montage.
  • the montage image itself can be navigated using a zooming and panning interface.
  • the metadata refers the client to individual image files, and the client uses imagery from these to render at higher resolution.
  • the overall montage size in pixels is chosen such that its resolution is only exhausted during a zoom-in at a stage where only a few images are visible simultaneously; therefore it is never necessary to access more than a few images at a time.
  • image streams are opened and closed as necessary to limit the number open at any given time.
  • montage layout is designed for packing efficiency, but the user may want a different visual arrangement onscreen. Moreover, the user may want to be able to dynamically rearrange the layout of images on the screen.
  • the invention makes use of a graphics rendering technique known as "texture mapping", which may be implemented in software but is in general hardware-accelerated on modern personal computers. Texture mapping allows a portion of a "texture", or source image, to be drawn on the display, optionally rescaling the image, rotating it, and performing a three-dimensional perspective transform.
  • a low-resolution version of the montage can be used as a "texture", so that when the user is zoomed out, the individual images within the montage can be dynamically remapped in any way, as in Figure 2. More than one texture map may be used, in which case each texture map may be a montage containing a subset of the images.
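In the simplest case, the dynamic remapping described above reduces to computing normalized texture coordinates for the rectangle each image occupies within the montage texture. A minimal sketch in Python, assuming the layout metadata gives each image's pixel rectangle; the function name `uv_rect` and the montage dimensions are illustrative, not from the disclosure:

```python
# Sketch: mapping a sub-image of a montage texture to an arbitrary
# on-screen quad. Only the texture-coordinate arithmetic is shown;
# the quad itself would be drawn by whatever graphics API is in use.

def uv_rect(montage_w, montage_h, x, y, w, h):
    """Normalized texture coordinates (u0, v0, u1, v1) of the
    rectangular montage region occupied by one image."""
    return (x / montage_w, y / montage_h,
            (x + w) / montage_w, (y + h) / montage_h)

# Example entry: an image occupying a 128x96 region at (256, 0)
# of a 1024x1024 montage texture.
u0, v0, u1, v1 = uv_rect(1024, 1024, 256, 0, 128, 96)
```

These four coordinates can then be bound to the corners of any screen-space quad, which is what allows the images to be rearranged freely without touching the montage pixels themselves.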
  • the texture mapping technique (which is generally only applicable to low-resolution renditions of the montage image or images) can be used only during dynamic rearrangement; when the image arrangement is static, software compositing can be used to assemble all or part of a higher-definition rearranged montage on-screen.
  • This software compositing method is especially valuable in combination with the lazy multiresolution rendering techniques described in US Patent application number 10/790,253, a copy of which is provided herewith as Ex. A. This method in effect creates a new "display montage" by rearranging the imagery of the original montage.
  • Texture mapping, software rendering, or any combination of the two can be used to render imagery in three dimensions instead of on the plane. Dynamic rearrangement in three dimensions is also possible. Three-dimensional applications include virtual galleries or other walk-through environments as well as virtual books, especially when used in combination with the invention described in a copending provisional application filed by the applicant concurrently herewith, and attached hereto as Exhibit B.
  • the virtual book application is illustrated in Figure 3. This example also illustrates an extension of the method in which an alpha channel, for partial transparency (the rough edges) is stored as image information in addition to the red, green and blue color components. Most implementations of hardware-accelerated texture mapping support an alpha channel.
  • Another extension, applicable in either 2D or 3D, is dynamic deformation of images, e.g. bending the pages of the book as they turn.
  • the invention can also be extended to support visual objects other than static images, such as the output of a visual calculation, or an application or applet.
  • a method comprising performing texture mapping during dynamic rearrangement and ceasing to do so when such dynamic rearrangement ceases.
  • image compression standards such as JPEG2000/JPIP have been introduced to meet a demanding engineering goal: to enable very large images (i.e. gigapixels in size) to be delivered incrementally or selectively from a server to a client over a low-bandwidth communication channel.
  • images are being viewed at full resolution, only a limited region can fit on a client's graphical display at any given time; the new standards are geared toward selectively accessing such regions and sending across the communication channel only data relevant to the region. If this "region of interest" or ROI changes continuously, then a continuous dialogue between a client and server over a low-bandwidth channel can continue to keep the client's representation of the area inside the ROI accurate.
  • the present invention relates to an extension of these selectively decompressible image compression and transmission technologies to geospatial or schematic data. It combines and extends methods introduced in the previous applications (1) "Method for spatially encoding large texts, metadata, and other coherently accessed non-image data", attached as Exhibit A, and (2) "METHODS AND APPARATUS FOR NAVIGATING AN IMAGE", attached as Exhibit B.
  • in (2), the concept of continuous multiscale roadmap rendering was introduced.
  • the basis for the invention of (2) is a pre-rendered "stack" of images of a roadmap or other vector-based diagram at different resolutions, in which categories of visual elements (e.g. classes of road, including national highway, state highway, and local road) are rendered with different visual weights at different resolutions.
  • corresponding areas of more than one of these images are downloaded, and the client's display shows a blended combination of these areas; the blending coefficients and the choice of image resolutions are determined by the current zoom.
  • Pre-rendering at higher detail is not desirable for several reasons: first, because the file sizes on the server side become prohibitive (a single Universal Transverse Mercator zone image at 15 meters/pixel may already be in the gigapixel range); second, because a pre-rendered image is an inefficient representation for the kind of very sparse black-and-white data normally associated with high-resolution map rendering; and third, because the client may require the "real" vector data for performing computational tasks beyond static visual presentation. For example, a route guidance system may highlight a road or change its color; this can be done on the client side only if the client has access to vector data, as opposed to a pre-rendered image alone.
  • Vector data may also include street names, addresses, and other information which the client must have the flexibility to lay out and render selectively. Pre-rendering street name labels into the map image stack is clearly undesirable, as these labels must be drawn in different places and sizes depending on the precise location and scale of the client view; different label renditions should not blend into one another as the user zooms. Pre-rendering such data would also eliminate any flexibility with regard to font.
  • vector data (where we use the term generically to refer both to geometric and other information, such as place names) is both important to the client in its own right, and a more efficient representation of the information than pre-rendered imagery, when the desired rendering resolution is high. Note, however, that if a large area is to be rendered at low resolution, the vector data may become prohibitively large and complex, making the pre-rendered image the more efficient representation. Even at low resolution, however, some subset of the vector data is necessary, such as the names of major highways.
  • the present invention extends the methods introduced in (1) to allow spatial vector data to be encoded and transmitted selectively and incrementally to the client, possibly in conjunction with the pre-rendered imagery of (2).
  • this would be accomplished using a geospatial database.
  • the database would need to include all relevant vector data, indexed spatially.
  • Such databases present many implementation challenges.
  • the prerendered layer is a precomputed literal rendition of the roadmap, as per (2);
  • the pointer layer consists of 2x2 pixel blocks positioned at or very near the roadmap features to which they refer, typically intersections;
  • the data layer consists of n x m pixel blocks centered on or positioned near the 2x2 pointers which refer to them.
  • Figures 2-3 show the prerendered layer alone, for comparison and orientation. The region shown is King County, in Washington state, which includes Seattle and many of its suburbs.
  • Figures 3 a and 3b are closeups from suburban and urban areas of the map, respectively.
  • FIG. 3a Closeup of suburban area of King County.
  • FIG. 3b Closeup of urban area of King County.
  • the client will request from the server the relevant portions of all three image layers, as shown.
  • the prerendered layer (shown in yellow) is the only one of the three displayed on the screen as is.
  • the other two specify the vector data.
  • the pointer image consists of 2x2 pixel blocks aligned on a 2x2 pixel grid, each of which specifies an (x,y) vector offset (with the x and y components of the vector each comprising a 16-bit integer, hence two pixels each) from its own location to the beginning (top left corner) of a corresponding data block in the data layer.
  • the corresponding data block begins with two 16-bit values (four pixels) specifying the data block width and height.
  • the width is specified first, and is constrained to be at least 2, hence avoiding ambiguities in reading the width and height.
  • the remainder of the data block can be treated as binary data which may contain any combination of vectors, text, or other information.
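The pointer-following step just described can be sketched in a few lines. This is a hedged illustration only: images are modeled as 2-D lists of one-byte pixels, offsets are treated as unsigned, and the big-endian, row-major byte layout within blocks is our assumption; the disclosure fixes only the 2x2 pointer format, the width-first header, and the width >= 2 constraint.

```python
def read_u16(img, x, y):
    """Two horizontally adjacent pixels -> a 16-bit value (big-endian assumed)."""
    return (img[y][x] << 8) | img[y][x + 1]

def block_pixel(img, bx, by, w, i):
    """i-th pixel of a width-w block at (bx, by), in row-major order."""
    return img[by + i // w][bx + i % w]

def follow_pointer(pointer_img, data_img, px, py):
    """Resolve a 2x2 pointer block at (px, py) to its data block payload."""
    # Top row of the pointer block = x offset, bottom row = y offset,
    # each from the pointer's own location to the data block's corner.
    dx = read_u16(pointer_img, px, py)
    dy = read_u16(pointer_img, px, py + 1)
    bx, by = px + dx, py + dy
    # Width comes first; since width >= 2, its two pixels are always the
    # first two of the top row, so it can be read before w is known.
    w = read_u16(data_img, bx, by)
    h = (block_pixel(data_img, bx, by, w, 2) << 8) | block_pixel(data_img, bx, by, w, 3)
    # Pixels 0-3 are the header; the rest is arbitrary binary payload.
    return bytes(block_pixel(data_img, bx, by, w, i) for i in range(4, w * h))
```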
  • data blocks contain streetmap information including street names, address ranges, and vector representations.
  • the pointer and data layers are precomputed, just as the prerendered layer is.
  • Precomputation for the pointer and data layers consists of encoding all of the relevant vector data into data blocks, and packing both the pointers and data blocks as efficiently as possible into their respective images.
  • features tend to be well-separated, resulting in large empty areas in the pointer and data images. Where pointers do occur, they fall precisely on the feature to which they refer, and their corresponding data blocks are in turn often centered precisely on the pointer.
  • in dense urban areas, however (see Figure 3b), features are often too close together for the pointers and data blocks all to fit.
  • Efficient rectangle packing is a computationally difficult problem; however, there are numerous approximate algorithms for solving it in the computational geometry literature, and the present invention does not stipulate any particular one of these.
  • This algorithm always succeeds in ultimately placing a rectangle provided that somewhere in the image an empty space of at least the right dimensions exists. It is "greedy" in the sense that it places a single rectangle at a time; it does not attempt to solve the holistic problem of packing n rectangles as efficiently as possible. (A holistic algorithm would require defining an explicit measure of packing efficiency, specifying the desired tradeoff between minimizing wasted space and minimizing distance between rectangles and their "target points".)
  • Figure 4 demonstrates the output of the basic packing algorithm for three cases. In each case, the algorithm sequentially placed a number of rectangles as near as possible to a common point. This solution to the rectangle packing problem is provided by way of example only.
  • Figure 4 Test output of the greedy rectangle packing algorithm. On the left, predominantly small, skinny rectangles; in the center, large, square rectangles; and on the right, a mixture.
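The greedy behavior described above can be sketched with a brute-force occupancy grid: each rectangle is placed at the free position nearest its target point, one at a time. This is an illustrative toy, not the patent's algorithm; the exhaustive scan is far slower than the approximate packers in the computational-geometry literature, but it exhibits the same one-rectangle-at-a-time, nearest-to-target policy.

```python
def pack(grid_w, grid_h, rects):
    """Greedily place rectangles. rects: list of (w, h, tx, ty), where
    (tx, ty) is the target point. Returns a placement (x, y) per
    rectangle, or None when no empty space of the right size remains."""
    occupied = [[False] * grid_w for _ in range(grid_h)]

    def fits(x, y, w, h):
        return not any(occupied[y + j][x + i]
                       for j in range(h) for i in range(w))

    out = []
    for w, h, tx, ty in rects:
        best = None
        for y in range(grid_h - h + 1):          # scan all legal positions
            for x in range(grid_w - w + 1):
                if fits(x, y, w, h):
                    d = (x - tx) ** 2 + (y - ty) ** 2
                    if best is None or d < best[0]:
                        best = (d, x, y)
        if best is None:
            out.append(None)                     # no free space anywhere
        else:
            _, x, y = best
            for j in range(h):                   # mark the cells occupied
                for i in range(w):
                    occupied[y + j][x + i] = True
            out.append((x, y))
    return out
```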
  • Pointers are always 2x2 (our notation is rows x columns); however, for data blocks, there is freedom in selecting an aspect ratio: the required block area in square pixels is determined by the amount of data which must fit in the block, but this area can fit into rectangles of many different shapes.
  • a 24 byte data block (including 4 bytes of width and height information, and 20 bytes of arbitrary data) can be represented exactly as 1x24, 2x12, 3x8, 4x6, 6x4, 8x3, or 12x2. (24x1 is disqualified, as the block width must be at least 2 for the 2-byte width to be decoded before the block dimensions are known on the client side, as described above.)
  • the block can also be represented, with one byte left over, as 5x5.
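The shape-selection freedom above lends itself to a small search: among all widths of at least 2, pick the block whose area wastes the least space over the payload, breaking ties toward the most nearly square aspect. The tie-breaking policy here is our assumption; the disclosure leaves the choice of aspect ratio open.

```python
def block_shape(n_bytes, max_dim=1024):
    """Choose (width, height) for a data block holding n_bytes
    (header included). Width must be >= 2 so the width field can be
    decoded before the block dimensions are known."""
    best = None
    for w in range(2, max_dim + 1):
        h = -(-n_bytes // w)            # ceil(n_bytes / w)
        waste = w * h - n_bytes         # leftover pixels in the block
        key = (waste, abs(w - h))       # least waste, then most square
        if best is None or key < best[0]:
            best = (key, (w, h))
    return best[1]
```

For the 24-byte example above this picks the square-ish 4x6 option; for 25 bytes it picks 5x5 with no waste at all.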
  • Each of the three map layers — prerendered roads, pointers and data — is stored as a JPEG2000 or similar spatially-accessible representation.
  • the prerendered road layer need not be lossless; it is only necessary for it to have reasonable perceptual accuracy when displayed.
  • the pointer and data layers must be represented losslessly, as they contain data which the client must be able to reconstruct exactly. Lossless compression is not normally very efficient; typical digital imagery, for example, is not usually compressible losslessly by more than a factor of about two at best.
  • 16-bit unsigned values, such as the width or height of a data block, would normally be encoded using a high-order byte and a low-order byte.
  • for most values encountered, the high-order byte would be zero. Frequent zero high-order bytes followed by significant low-order bytes account for much of the 2-pixel periodicity apparent in parts of Figure 5a.
  • the left eight columns represent the first pixel of the pair, previously the high-order byte; the rightmost eight columns represent the second pixel, previously the low-order byte.
  • Similar techniques apply to 32-bit or larger integer values. These techniques are also extensible to signed quantities. For variables in which the sign changes frequently, as occurs for differential coding of a road vector, a sign bit can be assigned to position 0, and the absolute value encoded in alternating bytes as above. Note that to be drawn convincingly, road vector data must often be represented at greater than pixel precision. Arbitrary units smaller than a pixel can instead be used, or equivalently, subpixel precision can be implemented using fixed point in combination with the above techniques. In our exemplary embodiment, 4 subpixel bits are used, for 1/16 pixel precision.
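The sign-bit and fixed-point choices above can be illustrated concretely. In this hedged sketch, a signed coordinate delta with 1/16-pixel precision is folded into an unsigned integer (sign in bit 0, magnitude above it) and split into a high and a low byte, one per pixel; the exact bit packing is our assumption, but it shows why small, frequently sign-changing deltas leave the high-order byte zero and hence compressible.

```python
FRAC_BITS = 4  # 4 subpixel bits -> 1/16 pixel precision, as in the text

def encode_delta(delta_pixels):
    """Signed fixed-point delta -> (high byte, low byte)."""
    q = round(abs(delta_pixels) * (1 << FRAC_BITS))  # fixed-point magnitude
    sign = 1 if delta_pixels < 0 else 0
    v = (q << 1) | sign                              # sign bit in position 0
    return (v >> 8) & 0xFF, v & 0xFF

def decode_delta(hi, lo):
    """Inverse of encode_delta."""
    v = (hi << 8) | lo
    sign = -1 if (v & 1) else 1
    return sign * (v >> 1) / (1 << FRAC_BITS)
```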
  • the JPEG2000 representation (including lossy pre-rendered roadmap image, lossless pointer layer, and lossless data layer) is actually smaller than the compressed ZIP file representing the original data as tabulated text.
  • This file is part of the United States Census Bureau's 2002 TIGER/Line database.
  • the new representation is ready to serve interactively to a client, with efficient support for continuously pannable and zoomable spatial access.
  • the original prerendered multiscale map invention introduced in (2) included not a single prerendered image, but a stack of such images, rendered at progressively coarser resolutions and with rescaled weights for lines (or other visible features). Although no features are omitted from any of these prerenditions, some features are de-emphasized enough to be clearly visible only in an aggregate sense, e.g. the local roads of a city become a faint grey blur at the nationwide level.
  • the present invention can be extended to include pointer and data images corresponding to the coarser prerendered roadmap images, in which only a subset of the original vector objects are represented.
  • statewide pointer and data images which are at much lower resolution than those of Figures 1-3, might only include data for state and national highways, excluding all local roads. These coarser data may also be "abstracts", for example specifying only road names, not vectors. Images at different resolutions might include varying mixtures or subsets of the original data, or abstracted versions. This technique both allows all of the relevant data to fit into the smaller coarse images, and provides the client with the subset of the vector information relevant for navigation at that scale.
  • the prerendered images may also be in color. Further, the prerendered images may be displayed by the client in color even if they are single-channel images, since the vector data can be used to draw important roads in different colors than the prerendered material. Finally, the prerendered images may omit certain features or roads present in the vectorial data, relying on the client to composite the image and vectorial material appropriately.
  • CLAIM:
  • A method of encoding images using rectangle packing and a JPEG representation.
  • the present invention relates generally to graphical zooming user interfaces (ZUI) for computers. More specifically, the invention is a system and method for progressively rendering zoomable visual content in a manner that is both computationally efficient, resulting in good user responsiveness and interactive frame rates, and exact, in the sense that vector drawings, text, and other non-photographic content is ultimately drawn without the resampling which would normally lead to degradation in image quality, and without interpolation of other images, which would also lead to degradation.
  • visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out.
  • the desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets.
  • Even when viewing ordinary documents, such as spreadsheets and reports, it is often useful to be able to glance at a document overview, and then zoom in on an area of interest.
  • many applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc.
  • these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally.
  • while continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom and pan continuously in a user-friendly manner is absent from prior art systems.
  • a display is the device or devices used to output rendered imagery to the user.
  • a frame buffer is used to dynamically represent the contents of at least a portion of the display.
  • Display refresh rate is the rate at which the physical display, or portion thereof, is refreshed using the contents of the frame buffer.
  • a frame buffer's frame rate is the rate at which the frame buffer is updated.
  • the display refresh rate is 60-90 Hz.
  • Most digital video, for example, has a frame rate of 24-30 Hz.
  • each frame of digital video will actually be displayed at least twice as the display is refreshed.
  • Plural frame buffers may be utilized at different frame rates and thus be displayed substantially simultaneously on the same display. This would occur, for example, when two digital videos with different frame rates were being played on the same display, in different windows.
  • LOD pyramid The complete set of LODs, organized conceptually as a stack of images of decreasing resolution, is termed the LOD pyramid — see Fig. 1.
  • the system interpolates between the LODs and displays a resulting image at a desired resolution. While this approach solves the computational issue, it displays a final compromised image that is often blurred and unrealistic, and often involves loss of information due to the fact that it represents interpolation of different LODs. These interpolation errors are especially noticeable when the user stops zooming and has the opportunity to view a still image at a chosen resolution which does not precisely match the resolution of any of the LODs.
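The interpolation step just described reduces to two choices: which pair of LODs "surrounds" the desired resolution, and how heavily to weight each. A hedged sketch, assuming linear sizes halve at each level of the pyramid; interpolating the weight in the log of the resolution (so a 2x zoom feels uniform) is our assumption, not fixed by the disclosure:

```python
import math

def surrounding_lods(base_size, levels, desired_size):
    """LOD k has linear size base_size / 2**k (LOD 0 is the base).
    Returns (finer_lod, coarser_lod, blend weight of the finer LOD)."""
    sizes = [base_size / 2 ** k for k in range(levels)]
    for k in range(levels - 1):
        if sizes[k] >= desired_size >= sizes[k + 1]:
            # 0 at the finer LOD's own resolution, 1 at the coarser's
            t = math.log2(sizes[k]) - math.log2(desired_size)
            return k, k + 1, 1.0 - t
    raise ValueError("desired size outside pyramid range")
```

At a resolution exactly matching a predefined LOD the weight collapses to that LOD alone; anywhere in between, the displayed image is the compromised blend the text describes.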
  • a further object of the present invention is to allow the user to zoom arbitrarily far in on vector content while maintaining a crisp, unblurred view of the content and maintaining interactive frame rates.
  • a further object of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex vectorial content, while both preserving the overall appearance of the content and maintaining interactive frame rates.
  • a further object of the present invention is to diminish the user's perception of transitions between LODs or rendition qualities during interaction.
  • a further object of the present invention is to allow the graceful degradation of image quality by blurring when information ordinarily needed to render portions of the image is as yet incomplete.
  • a further object of the present invention is to gradually increase image quality by bringing it into sharper focus as more complete information needed to render portions of the image becomes available.
  • the final image is then displayed by preferably first displaying an intermediate final image.
  • the intermediate final image is the first image displayed at the desired resolution before that image is refined as described hereafter.
  • the intermediate final image may correspond to the image that would be displayed at the desired resolution using the prior art.
  • the transition from the intermediate final image to the final image may be gradual, as explained in more detail below.
  • the present invention allows LODs to be spaced in any resolution increments, including irrational increments (i.e. magnification or minification factors between consecutive LODs which cannot be expressed as the ratio of two integers), as explained in more detail below.
  • irrational increments i.e. magnification or minification factors between consecutive LODs which cannot be expressed as the ratio of two integers
  • portions of the image at each different LOD are denoted tiles, and such tiles are rendered in an order that minimizes any perceived imperfections to a viewer.
  • the displayed visual content is made up of plural LODs (potentially a superset of the surrounding LODs as described above), each of which is displayed in the proper proportion and location in order to cause the display to gradually fade into the final image in a manner that conceals imperfections.
  • the present invention involves a hybrid strategy, in which an image is displayed using predefined LODs during rapid zooming and panning, but when the view stabilizes sufficiently, an exact LOD is rendered and displayed.
  • the exact LOD is rendered and displayed at the precise resolution chosen by the user, which is normally different from the predefined LODs. Because the human visual system is insensitive to fine detail in the visual content while it is still in motion, this hybrid strategy can produce the illusion of continuous "perfect rendering" with far less computation.
  • Figure 1 depicts an LOD pyramid (in this case the base of the pyramid, representing the highest-resolution representation, is a 512x512 sample image, and successive minifications of this image are shown in factors of 2);
  • Figure 2 depicts a flow chart for use in an exemplary embodiment of the invention
  • Figure 3 is another flow chart that shows how the system displays the final image after zooming
  • Figure 4 is the LOD pyramid of Figure 1 with grid lines added showing the subdivision of each LOD into rectangular tiles of equal size in samples;
  • Figure 5 is another flow chart, for use in connection with the present invention, and it depicts a process for displaying rendered tiles on a display
  • Figure 6 shows a concept termed irrational tiling, explained in more detail herein;
  • Figure 7 depicts a composite tile and the tiles that make up the composite tile, as explained more fully below.
  • Figure 2 shows a flowchart of a basic technique for implementation of the present invention.
  • the flowchart of Figure 2 represents an exemplary embodiment of the invention and would begin executing when an image is displayed at an initial resolution.
  • the invention may be used in the client/server model, but that the client and server may be on the same or different machines.
  • the actual hardware platform and system utilized are not critical to the present invention.
  • the flowchart is entered at start block 201 with an initial view of an image at a particular resolution.
  • the image is taken to be static.
  • the image is displayed at block 202.
  • a user may navigate that image by moving, for example, a computer mouse.
  • the initial view displayed at block 202 will change when the user navigates the image.
  • the underlying image may itself be dynamic, such as in the case of motion video, however, for purposes of this example, the image itself is treated as static.
  • any image to be displayed may also have textual or other vector data and/or nonvector data such as photographs and other images.
  • the present invention, and the entire discussion below, is applicable regardless of whether the image comprises vector or nonvector data, or both.
  • the method transfers control to decision point 203 at which navigation input may be detected. If such input is not detected, the method loops back to block 202 and continues displaying the stationary visual content. If a navigation input is detected, control will be transferred to block 204 as shown.
  • Decision point 203 may be implemented by a continuous loop in software looking for a particular signal that detects movement, an interrupt system in hardware, or any other desired methodology.
  • the particular technique utilized to detect and analyze the navigation request is not critical to the present invention. Regardless of the methodology used, the system can detect the request, thus indicating a desire to navigate the image.
  • the techniques are applicable to zooming, panning, or otherwise navigating. Indeed, the techniques described herein are applicable to any type of dynamic transformation or change in perspective on the image. Such transformations may include, for example, three dimensional translation and rotation, application of an image filter, local stretching, dynamic spatial distortion applied to selected areas of the image, or any other kind of distortion that might reveal more information.
  • block 204 will then render and display a new view of the image, which may be, for example, at a different resolution from the prior displayed view.
  • One straightforward prior art technique of displaying the new view is based upon interpolating LODs as the user zooms in or out.
  • the selected LODs may be the two LODs that "surround" the desired resolution, i.e., the resolution of the new view.
  • the interpolation in prior systems constantly occurs as the user zooms, and is thus often implemented directly in the hardware to achieve speed.
  • the combination of detection of movement in decision point 205 and a substantially immediate display of an appropriate interpolated image at block 204 results in the image appearing to zoom continuously as the user navigates.
  • an interpolated image is sufficient to look realistic and clear. Any interpolation error is only minimally detectable by the human visual system, as such errors are disguised by the constantly changing view of the image.
  • the system tests whether or not the movement has substantially ceased. This can be accomplished using a variety of techniques, including, for example, measuring the rate of change of one or more parameters of the view. That is, the methodology ascertains whether or not the user has arrived at the point where he has finished zooming. Upon such stabilization at decision point 205, control is transferred to block 206, where an exact image is rendered, after which control returns to block 203. Thus, at any desired resolution, the system will eventually display an exact LOD.
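One way to realize the stabilization test at decision point 205 is to track the view parameters frame to frame and declare navigation "substantially ceased" when their rate of change stays below a threshold. The parameterization and threshold values below are illustrative assumptions, not from the disclosure:

```python
class ViewStabilizer:
    """Detects when pan/zoom motion has substantially ceased."""

    def __init__(self, pan_eps=0.5, zoom_eps=0.01):
        self.pan_eps = pan_eps      # max pan speed, pixels per frame
        self.zoom_eps = zoom_eps    # max zoom speed, log-zoom units per frame
        self.prev = None

    def update(self, x, y, log_zoom):
        """Feed the current view each frame; True => render the exact LOD."""
        stable = False
        if self.prev is not None:
            px, py, pz = self.prev
            stable = (abs(x - px) < self.pan_eps and
                      abs(y - py) < self.pan_eps and
                      abs(log_zoom - pz) < self.zoom_eps)
        self.prev = (x, y, log_zoom)
        return stable
```

Measuring rate of change this way lets the same test cover panning, zooming, or any combination, matching the "one or more parameters of the view" phrasing above.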
  • the display is not simply rendered and displayed by an interpolation of two predefined LODs, but may be rendered and displayed by re-rendering vector data using the original algorithm used to render the text or other vector data when the initial view was displayed at block 202.
  • Nonvector data may also be resampled for rendering and displayed at the exact required LOD.
  • the required re-rendering or resampling may be performed not only at the precise resolution required for display at the desired resolution, but also on a sampling grid corresponding precisely to the correct positions of the display pixels relative to the underlying content, as calculated based on the desired view.
  • for example, a pan of 1.61 pixels in the display plane does not change the required resolution, but it does alter the sampling grid, and therefore requires re-rendering or resampling of the exact LOD.
  • the foregoing system of Fig. 2 represents a hybrid approach in which interpolation based upon predefined LODs is utilized while the view is changing (e.g. navigation is occurring) but an exact view is rendered and displayed when the view becomes substantially stationary.
  • the term render refers to the generation by the computer of a tile at a specific LOD based upon vector or nonvector data. With respect to nonvector data, these may be rerendered at an arbitrary resolution by resampling an original image at higher or lower resolution.
  • once the intermediate image is displayed, block 304 is entered, which causes the image to begin to gradually fade towards an exact rendition of the image, which we term the final image.
  • the final image differs from the intermediate image in that the final image may not involve interpolation of any predefined LODs. Instead, the final image, or portions thereof, may comprise newly rendered tiles. In the case of photographic data, the newly rendered tiles may result from resampling the original data, and in the case of vector data, the newly rendered tiles may result from rasterization at the desired resolution.
  • step 304 is executed so the changeover from the intermediate final image to the final image is done gradually and smoothly.
  • This gradual fading, sometimes called blending, causes the image to come into focus gradually when navigation ceases, producing an effect similar to automatic focusing in cameras or other optical instruments.
  • the illusion of physicality created by this effect is an important aspect of the present invention.
  • a first LOD may take a 1 inch by 1 inch area of a viewable object and generate a single 32 by 32 sample tile.
  • the information may also be rendered by taking the same 1 inch by 1 inch area and representing it as a tile that is 64 by 64 samples, and therefore at a higher resolution.
  • Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at a higher-resolution LOD to the linear tiling grid size at the next lower-resolution LOD.
  • each LOD is subdivided into a grid of square or rectangular tiles containing a constant number of samples (except, as required, at the edges of the visual content).
  • in an exemplary embodiment, g = 2.
  • Figure 6(b) illustrates the advantage gained by irrational tiling granularity.
  • Figure 6 shows cross-sections through several LODs of the visual content; each bar represents a cross-section of a rectangular tile.
  • the curves 601, drawn from top to bottom represent the bounds of the visible area of the visual content at the relevant LOD during a zooming operation: as the resolution is increased (zooming in to reveal more detail), the area under examination decreases.
  • Darker bars (e.g., 602) represent tiles which have already been rendered over the course of the zoom. Lighter bars have not yet been rendered, and so cannot be displayed.
  • Another benefit of irrational tiling granularity is that it allows finer control of g, since there are a great many more irrational numbers than integers, particularly over the useful range where g is not too large. This additional freedom can be useful for tuning the zooming performance of certain applications. If g is set to the irrational square root of an integer (such as sqrt(2), sqrt(5) or sqrt(8)), then in the embodiment described above the grid lines of alternate LODs would align exactly; if g is an irrational cube root, then every third LOD would align exactly; and so on. This confers an additional benefit with respect to limiting the complexity of a composite tiling, as defined below.
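The alignment property for square-root granularities can be checked numerically. A small sketch (the base grid size of 256 is arbitrary): with g = sqrt(2), the linear grid sizes of alternate LODs differ by a factor of exactly 2 and thus align, while adjacent LODs differ by an irrational factor and share essentially no grid lines beyond the origin.

```python
import math

def grid_size(base, g, lod):
    """Linear tiling grid size at a given LOD (LOD 0 coarsest)."""
    return base * g ** lod

g = math.sqrt(2)
s0, s1, s2 = (grid_size(256, g, k) for k in range(3))

ratio_alternate = s2 / s0   # alternate LODs: exactly 2, grids align
ratio_adjacent = s1 / s0    # adjacent LODs: sqrt(2), irrational
```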
  • An important aspect of the invention is the order in which the tiles are rendered. More particularly, the various tiles of the various LODs are optimally rendered such that all visible tiles are rendered first. Nonvisible tiles may not be rendered at all. Within the set of visible tiles, rendition proceeds in order of increasing resolution, so that tiles within low-resolution LODs are rendered first. Within any particular LOD, tiles are rendered in order of increasing distance from the center of the display, which we refer to as foveated rendering. To sort such tiles in the described order, numerous sorting algorithms such as heapsort, quicksort, or others may be used.
  • a lexicographic key may be used for sorting "requests" to render tiles, such that the outer subkey is visibility, the middle subkey is resolution in samples per physical unit, and the inner subkey is distance to the center of the display.
  • Other methods for ordering tile rendering requests may also be used.
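The lexicographic ordering described above can be sketched as an ordinary sort key. In this Python sketch the field names (`visible`, `resolution`, `dist_to_center`) and the sample requests are illustrative assumptions, not identifiers from the patent; a heap (e.g. Python's `heapq`) over the same tuples would serve equally well.

```python
def request_key(tile):
    """Lexicographic priority: visible tiles first, then lower resolution
    (samples per physical unit) first, then nearest the display center
    first -- i.e. foveated rendering within each LOD."""
    return (0 if tile["visible"] else 1,
            tile["resolution"],
            tile["dist_to_center"])

requests = [
    {"id": "hi-far",  "visible": True,  "resolution": 4.0, "dist_to_center": 9.0},
    {"id": "lo-near", "visible": True,  "resolution": 1.0, "dist_to_center": 2.0},
    {"id": "hidden",  "visible": False, "resolution": 1.0, "dist_to_center": 0.0},
    {"id": "hi-near", "visible": True,  "resolution": 4.0, "dist_to_center": 1.0},
]
order = [t["id"] for t in sorted(requests, key=request_key)]
# visible low-resolution first, then visible high-resolution by distance,
# then nonvisible tiles last (which may never be rendered at all)
```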
  • the actual rendering of the tiles optimally takes place as a parallel process with the navigation and display described herein. When rendering and navigation/display proceed as parallel processes, user responsiveness may remain high even when tiles are slow to render.
  • a tile represents vector data, such as alphabetic typography in a stroke based font
  • rendering of the tile would involve running the algorithm to rasterize the alphabetic data and possibly transmitting that data to a client from a server.
  • the data fed to the rasterization algorithm could be sent to the client, and the client could run the algorithm to rasterize the tile.
  • rendering of a tile involving digitally sampled photographic data could involve resampling of that data to generate the tile at the appropriate LOD. For discrete LODs that are prestored, rendering may involve no more than simply transmitting the tile to a client computer for subsequent display.
  • the actual display may comprise different mixes of different tiles from different LODs.
  • any portion of the display could contain for example, 20% from LOD 1, 40% from LOD 2, and 40% from LOD 3.
  • the algorithm attempts to render tiles from the various LODs in a priority order best suited to supply the rendered tiles for display as they are most needed.
  • the actual display of the rendered tiles will be explained in more detail later with reference to Figure 5.
  • Such a region is termed a composite tile area, or simply a composite tile.
  • To define a composite tile we consider all of the LODs stacked on top of each other. Each LOD has its own tile grid. The composite grid is then formed by the projection of all of the grids from all of the LODs onto a single plane. The composite grid is then made up of various composite tiles of different sizes, defined by the boundaries of tiles from all of the different LODs. This is shown conceptually in Fig. 7. Fig. 7 depicts the tiles from three different LODs, 701 through 703, all representing the same image. One can imagine the LODs 701 through 703 being stacked up on top of each other.
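Along one axis, forming the composite grid amounts to merging the grid-line positions of every LOD. The following Python sketch (function name and parameters are illustrative assumptions) shows this for a rational granularity of g = 2, where coarser grids nest inside finer ones; consecutive merged lines bound the composite tiles along that axis.

```python
def composite_grid_lines(extent, base_tile, g, num_lods, decimals=6):
    """Project the tile-grid lines of every LOD onto one axis and merge
    them; consecutive pairs of merged lines bound composite tiles."""
    lines = set()
    tile = float(base_tile)
    for _ in range(num_lods):
        i = 0
        while i * tile <= extent + 1e-9:
            lines.add(round(i * tile, decimals))  # dedupe coincident lines
            i += 1
        tile /= g  # move to the next, higher-resolution LOD
    return sorted(lines)

# With g = 2 the LOD grids nest, so the composite grid along this axis is
# simply the finest LOD's grid: lines every 4 units across a 32-unit extent.
merged = composite_grid_lines(32, 16, 2, 3)
```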
  • Fig. 5 depicts a flow chart of an algorithm for updating the frame buffer as tiles are rendered.
  • the arrangement of Fig. 5 is intended to operate on every composite tile in the displayed image each time the frame buffer is updated. Thus, for example, if a frame duration is 1/20 of a second, each of the composite tiles on the entire screen would preferably be examined and updated during each 1/20 of a second.
  • the composite tile may lack the relevant tiles in one or more LODs.
  • the process attempts to display each composite tile as a weighted average of all the available superimposed tiles within which the composite tile lies. Note that composite tiles are defined in such a way that they fall within exactly one tile at any given LOD; hence the weighted average can be expressed as a relative proportion of each LOD. The process attempts to determine the appropriate weights for each LOD within the composite tile, and to vary those weights gradually over space and time to cause the image to gradually fade towards the final image discussed above.
  • the composite grid includes plural vertices, defined as any intersection or corner of gridlines in the composite grid. These are termed composite grid vertices.
  • the opacity can be expressed as a weight between 0.0 and 1.0, and the sum of all the LOD weights at each vertex should therefore be 1.0 if the desired result is for the image to be totally opaque.
  • the algorithm walks through the composite tiling once for each relevant LOD, beginning with the highest-resolution LOD.
  • the algorithm maintains the following variables: levelOpacityGrid and opacityGrid. Both of these variables are again numbers between 0.0 and 1.0, and are maintained for each vertex in the composite tiling.
  • the algorithm walks through each LOD in turn, in order from highest- resolution to lowest, performing the following operations. First 0.0 is assigned to levelOpacityGrid at all vertices. Then, for each rendered tile at that LOD (which may be a subset of the set of tiles at that LOD, if some have not yet been rendered), the algorithm updates the parts of the levelOpacityGrid touching that tile based on the tile's centerOpacity, cornerOpacity and edgeOpacity values:
  • If the vertex is entirely in the interior of the tile, then it gets updated using centerOpacity.
  • If the vertex is, e.g., on the tile's left edge, it gets updated with the left edgeOpacity.
  • If the vertex is, e.g., on the top right corner, it gets updated with the top right cornerOpacity.
  • Updating means the following: if the pre-existing levelOpacityGrid value is greater than 0.0, then set the new value to the minimum of the present value, or the value it's being updated with. If the pre-existing value is zero (i.e. this vertex hasn't been touched yet) then just set the levelOpacityGrid value to the value it's being updated with. The end result is that the levelOpacityGrid at each vertex position gets set to the minimum nonzero value with which it gets updated.
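The minimum-nonzero update rule just described can be captured in a few lines of Python (the function and variable names are illustrative, not from the patent):

```python
def update_vertex(level_opacity, vertex, value):
    """Minimum-nonzero update: an untouched vertex (still 0.0) simply
    takes the new value; a touched vertex keeps the smaller of the
    pre-existing value and the value it is being updated with."""
    old = level_opacity.get(vertex, 0.0)
    level_opacity[vertex] = value if old == 0.0 else min(old, value)

grid = {}
update_vertex(grid, (0, 0), 1.0)   # untouched vertex: takes 1.0
update_vertex(grid, (0, 0), 0.5)   # min(1.0, 0.5) -> 0.5
update_vertex(grid, (0, 0), 0.8)   # min(0.5, 0.8) -> stays 0.5
```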
  • the algorithm then walks through the levelOpacityGrid and sets to 0.0 any vertices that touch a tile which has not yet been rendered, termed a hole. This ensures spatial continuity of blending: wherever a composite tile falls within a hole, at the current LOD, drawing opacity should fade to zero at all vertices abutting that hole.
  • the algorithm can then relax all levelOpacityGrid values to further improve spatial continuity of LOD blending.
  • the situation as described thus far can be visualized as follows: every vertex is like a tentpole, and the levelOpacityGrid value at that point is the tentpole's height.
  • the algorithm has thus far ensured that at all points bordering on a hole, the tentpoles have zero height; and in the interior of tiles that have been rendered, the tentpoles are set to some (probably) nonzero value. In the extreme case, perhaps all the values inside a rendered tile are set to 1.0. Assume for purposes of illustration that the rendered tile has no rendered neighbors yet, so the border values are 0.0. We have not specified how narrow the "margin" is between a 0.0 border tentpole and one of the 1.0 internal tentpoles. If this margin is too small, then even though the blending is technically continuous, the transition may be too sharp when measured as an opacity derivative over space.
  • the relax operation smoothes out the tent, always preserving values of 0.0, but possibly lowering other tentpoles to make the function defined by the tent surface smoother, i.e. limiting its maximum spatial derivative. It is immaterial to the invention which of a variety of methods are used to implement this operation; one approach, for example, is to use selective low-pass filtering, locally replacing every nonzero value with a weighted average of its neighbors while leaving zeroes intact. Other methods will also be apparent to those skilled in the art.
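One possible realization of the selective low-pass filtering mentioned above is sketched below in Python for a one-dimensional row of "tentpoles". This is only an assumed illustration of the idea, not the patent's implementation: each nonzero value is replaced by the average of itself and its neighbors, while zeros (vertices abutting holes) are preserved exactly.

```python
def relax(poles):
    """One selective low-pass pass: smooth nonzero tentpole heights by
    local averaging, leaving zero-height poles (hole borders) intact so
    opacity still fades to zero at holes."""
    out = []
    for i, v in enumerate(poles):
        if v == 0.0:
            out.append(0.0)  # preserve zeros required for hole blending
            continue
        window = [v] + [poles[j] for j in (i - 1, i + 1)
                        if 0 <= j < len(poles)]
        out.append(sum(window) / len(window))
    return out

# A fully opaque rendered tile bordered by holes: the interior poles next
# to the zero borders are pulled down, limiting the spatial derivative.
smoothed = relax([0.0, 1.0, 1.0, 1.0, 0.0])
```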
  • the algorithm then walks over all composite grid vertices, considering corresponding values of levelOpacityGrid and opacityGrid at each vertex: if levelOpacityGrid is greater than 1.0-opacityGrid, then levelOpacityGrid is set to 1.0- opacityGrid. Then, again for each vertex, corresponding values of levelOpacityGrid are added to opacityGrid. Due to the previous step, this can never bring opacityGrid above 1.0. These steps in the algorithm ensure that as much opacity as possible is contributed by higher-resolution LODs when they are available, allowing lower-resolution LODs to "show through" only where there are holes.
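The clamp-and-accumulate step above can be sketched as follows (a minimal Python illustration with assumed variable names; the grids are flattened to lists of per-vertex values):

```python
def blend_level(level_opacity, opacity):
    """Per vertex: clamp this LOD's contribution so accumulated opacity
    never exceeds 1.0, then fold it into the running total in place.
    Because traversal runs from highest resolution to lowest, higher-
    resolution LODs claim opacity first and lower ones only fill holes."""
    for i, total in enumerate(opacity):
        level_opacity[i] = min(level_opacity[i], 1.0 - total)
        opacity[i] += level_opacity[i]

level = [0.9, 0.5]   # this LOD's levelOpacityGrid at two vertices
total = [0.8, 0.3]   # opacityGrid accumulated from higher-resolution LODs
blend_level(level, total)
# level is clamped to about [0.2, 0.5]; total rises to about [1.0, 0.8]
```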
  • the final step in the traversal of the current LOD is to actually draw the composite tiles at the current LOD, using levelOpacityGrid as the per-vertex opacity values.
  • levelOpacityGrid can be multiplied by a scalar overallOpacity variable in the range 0.0 to 1.0 just before drawing; this allows the entire image to be drawn with partial transparency given by the overallOpacity.
  • drawing an image-containing polygon, such as a rectangle, with different opacities at each vertex is a standard procedure. It can be accomplished, for example, using industry-standard texture mapping functions in the OpenGL or Direct3D graphics libraries.
  • the drawn opacity within the interior of each such polygon is spatially interpolated, resulting in a smooth change in opacity over the polygon.
  • tiles maintain not only their current values of centerOpacity, cornerOpacity and edgeOpacity (called the current values), but also a parallel set of values called targetCenterOpacity, targetCornerOpacity and targetEdgeOpacity (called the target values).
  • the current values are all set to 0.0 when a tile is first rendered, but the target values are all set to 1.0. Then, after each frame, the current values are adjusted to new values closer to the target values.
  • newValue = oldValue*(1-b) + targetValue*b, where b is a rate greater than 0.0 and less than 1.0.
  • a value of b close to 0.0 will result in a very slow transition toward the target value, and a value of b close to 1.0 will result in a very rapid transition toward the target value.
  • This method of updating opacities results in exponential convergence toward the target, and results in a visually pleasing impression of temporal continuity.
  • Other formulae can achieve the same result.
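The per-frame update formula above can be sketched directly (the function name is illustrative):

```python
def step_toward(value, target, b):
    """One frame of the update newValue = oldValue*(1-b) + targetValue*b;
    b in (0.0, 1.0) sets the rate: small b fades slowly, large b quickly."""
    return value * (1.0 - b) + target * b

# A freshly rendered tile fading in: current opacity 0.0, target 1.0.
v = 0.0
for _ in range(3):
    v = step_toward(v, 1.0, 0.5)
# with b = 0.5 each frame closes half the remaining gap: 0.5, 0.75, 0.875
```

The geometric shrinking of the remaining gap is what produces the exponential convergence, and hence the visually smooth fade, described above.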

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Graphics (AREA)
  • Architecture (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)
  • Facsimiles In General (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)
  • Information Transfer Between Computers (AREA)
  • Facsimile Transmission Control (AREA)

Abstract

The invention concerns a method and apparatus for facilitating realistic navigation of visual content, by displaying an interpolated image during navigation and a more exact image once navigation ends. Methods are described for rendering and displaying "tiles", i.e., portions of the visual content at different levels of detail, which aim to minimize perceptible discontinuities.
PCT/US2005/037226 2004-10-15 2005-10-17 Systeme et procede pour la gestion de la communication et/ou du stockage de donnees d'image WO2006052390A2 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP05851225.2A EP1810249A4 (fr) 2004-10-15 2005-10-17 Systeme et procede pour la gestion de la communication et/ou du stockage de donnees d'image
JP2007536990A JP4831071B2 (ja) 2004-10-15 2005-10-17 イメージ・データの通信および/または格納を管理するシステムおよび方法
CN2005800430579A CN101147174B (zh) 2004-10-15 2005-10-17 用于管理图像数据的传送和/或存储的系统和方法

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US61905304P 2004-10-15 2004-10-15
US61911804P 2004-10-15 2004-10-15
US61907004P 2004-10-15 2004-10-15
US60/619,118 2004-10-15
US60/619,070 2004-10-15
US60/619,053 2004-10-15
US11/141,958 US7546419B2 (en) 2004-06-01 2005-06-01 Efficient data cache
US11/141,958 2005-06-01

Publications (3)

Publication Number Publication Date
WO2006052390A2 true WO2006052390A2 (fr) 2006-05-18
WO2006052390A3 WO2006052390A3 (fr) 2007-03-01
WO2006052390A9 WO2006052390A9 (fr) 2007-07-12

Family

ID=36336931

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/037226 WO2006052390A2 (fr) 2004-10-15 2005-10-17 Systeme et procede pour la gestion de la communication et/ou du stockage de donnees d'image

Country Status (4)

Country Link
EP (1) EP1810249A4 (fr)
JP (1) JP4831071B2 (fr)
CN (1) CN101147174B (fr)
WO (1) WO2006052390A2 (fr)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1901189A2 (fr) * 2006-09-14 2008-03-19 Ricoh Company, Ltd. Système de gestion de pièces, procédé de gestion de pièces, programme informatique et support de stockage lisible par un ordinateur correspondants
WO2014137379A1 (fr) * 2013-03-06 2014-09-12 Dell Products, L.P. Système et procédé de gestion des instantanés d'un système de stockage
EP3005343A4 (fr) * 2013-05-31 2017-02-01 Freedom Scientific Inc. Repères de pointage personnalisables basés sur un vecteur
US9569400B2 (en) 2012-11-21 2017-02-14 International Business Machines Corporation RDMA-optimized high-performance distributed cache
US9628747B2 (en) 2014-05-09 2017-04-18 Lyve Minds, Inc. Image scrolling on a photo sharing device display
FR3048524A1 (fr) * 2016-03-07 2017-09-08 Datexim Systeme d'affichage a distance d'une image medicale
WO2020007460A1 (fr) * 2018-07-04 2020-01-09 Telefonaktiebolaget Lm Ericsson (Publ) Dispositif sans fil, nœud serveur informatique, et procédés associés
CN112445727A (zh) * 2020-11-27 2021-03-05 鹏城实验室 基于视口特征的边缘缓存置换方法及装置
CN113031896A (zh) * 2021-03-31 2021-06-25 卡莱特云科技股份有限公司 文本循环滚动播放方法、播放控制装置及计算机设备
CN113568996A (zh) * 2021-07-29 2021-10-29 西安恒歌数码科技有限责任公司 一种基于osgEarth的多图层掉帧优化方法及系统
WO2022026966A1 (fr) * 2020-07-29 2022-02-03 Google Llc Surlissage d'images progressives
US11544818B2 (en) * 2018-03-01 2023-01-03 Nvidia Corporation Enhancing high-resolution images with data from low-resolution images
US20230005109A1 (en) * 2019-09-10 2023-01-05 Aerospace Information Research Institute, Chinese Academy Of Sciences Remote Sensing Image Geometric Normalization Method and Apparatus

Families Citing this family (24)

Publication number Priority date Publication date Assignee Title
GB2462589B (en) * 2008-08-04 2013-02-20 Sony Comp Entertainment Europe Apparatus and method of viewing electronic documents
CN102411763A (zh) * 2010-09-20 2012-04-11 湖南科创信息技术股份有限公司 基于3g网络的移动车险查勘方法及系统
EP2636025B1 (fr) * 2010-11-05 2019-12-18 Koninklijke Philips N.V. Prediction basé sur le contenue des images et contrôleur de cache d'image
US20130166390A1 (en) * 2011-12-27 2013-06-27 Anthony T. BLOW Crowd-determined file size uploading methods, devices and systems
CN103337090B (zh) * 2013-06-17 2016-07-13 清华大学 月球模型远程交互浏览可视化方法、客户端及系统
US9247379B2 (en) * 2014-01-27 2016-01-26 Qualcomm Incorporated Method and apparatus for hierarchical map tiling
KR102344096B1 (ko) * 2014-02-21 2021-12-29 소니그룹주식회사 송신 장치, 송신 방법, 수신 장치 및 수신 방법
US9978126B2 (en) 2014-04-30 2018-05-22 Empire Technology Development Llc Image resolution modification
US9710893B2 (en) * 2014-04-30 2017-07-18 Empire Technology Development Llc Image resolution modification
CN106716386B (zh) * 2014-10-07 2020-05-29 谷歌有限责任公司 使用页面过滤器和系统mmu的硬件辅助存储器压缩管理
CN107004135A (zh) * 2014-10-15 2017-08-01 核心健康有限责任公司 大图像文件的远程查看
CN105653496B (zh) * 2016-03-18 2018-08-31 联想(北京)有限公司 电子设备及其数据传输方法
US10460704B2 (en) * 2016-04-01 2019-10-29 Movidius Limited Systems and methods for head-mounted display adapted to human visual mechanism
CN107331222B (zh) * 2016-04-29 2019-11-15 北京学而思教育科技有限公司 一种图像数据处理方法及装置
CN106060382A (zh) * 2016-05-27 2016-10-26 北京金山安全软件有限公司 一种图像处理方法、装置及电子设备
US10235612B2 (en) * 2016-07-29 2019-03-19 Canon Kabushiki Kaisha Information processing apparatus, information processing method, storage medium, and image forming apparatus for converting drawing data of a transparent object that does not overlap another drawing object into drawing data of a drawing object that does not have an alpha channel as color information
EP3282588B1 (fr) * 2016-08-09 2019-09-25 Siemens Aktiengesellschaft Procédé, système et produit de programmation pour la transmission de données avec une quantité réduite de données
CN108573513B (zh) * 2017-03-14 2021-08-03 腾讯科技(深圳)有限公司 随机元素生成方法及随机元素生成装置
US10949947B2 (en) 2017-12-29 2021-03-16 Intel Corporation Foveated image rendering for head-mounted display devices
CN109345629A (zh) * 2018-08-08 2019-02-15 安徽慧软科技有限公司 一种三维医学图像模糊凸显显示方法
CN112416347B (zh) * 2020-11-25 2022-07-12 中睿信数字技术有限公司 一种基于栅格和动态磁贴的网页布局方法
CN113673405B (zh) * 2021-08-14 2024-03-29 深圳市快易典教育科技有限公司 基于题目识别的习题批改方法、系统及智能家教学习机
CN117627176B (zh) * 2024-01-25 2024-03-26 华南理工大学 一种大尺度三维晶格结构的3d空间打印方法
TWI863841B (zh) * 2024-03-13 2024-11-21 國立雲林科技大學 圖面之文字及圖元辨識系統及方法

Citations (5)

Publication number Priority date Publication date Assignee Title
US6182114B1 (en) 1998-01-09 2001-01-30 New York University Apparatus and method for realtime visualization using user-defined dynamic, multi-foveated images
WO2002095608A1 (fr) 2001-05-23 2002-11-28 New York University Procede et systeme de distribution de donnees foveales dans un reseau
US7075535B2 (en) 2003-03-05 2006-07-11 Sand Codex System and method for exact rendering in a zooming user interface
US7254271B2 (en) 2003-03-05 2007-08-07 Seadragon Software, Inc. Method for encoding and serving geospatial or other vector data as images
US7546419B2 (en) 2004-06-01 2009-06-09 Aguera Y Arcas Blaise Efficient data cache

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US5341466A (en) * 1991-05-09 1994-08-23 New York University Fractal computer user centerface with zooming capability
US6348921B1 (en) * 1996-04-12 2002-02-19 Ze Hong Zhao System and method for displaying different portions of an object in different levels of detail
US6496607B1 (en) * 1998-06-26 2002-12-17 Sarnoff Corporation Method and apparatus for region-based allocation of processing resources and control of input image formation
SE513353C2 (sv) * 1998-10-21 2000-08-28 Ericsson Telefon Ab L M Partiell hämtning av bilder i den komprimerade domänen
GB9926131D0 (en) * 1999-11-05 2000-01-12 Superscape Limited Image enhancement
US6453330B1 (en) * 1999-11-24 2002-09-17 Ati International Srl High-precision bilinear interpolation
JP2002330951A (ja) * 2001-05-11 2002-11-19 Canon Inc 画像符号化装置及び復号装置及び方法及びコンピュータプログラム及び記憶媒体
DE10300048B4 (de) * 2002-01-05 2005-05-12 Samsung Electronics Co., Ltd., Suwon Verfahren und Vorrichtung zur Bildcodierung und -decodierung
US6885939B2 (en) * 2002-12-31 2005-04-26 Robert Bosch Gmbh System and method for advanced 3D visualization for mobile navigation units


Non-Patent Citations (1)

Title
See also references of EP1810249A4

Cited By (21)

Publication number Priority date Publication date Assignee Title
EP1901189A2 (fr) * 2006-09-14 2008-03-19 Ricoh Company, Ltd. Système de gestion de pièces, procédé de gestion de pièces, programme informatique et support de stockage lisible par un ordinateur correspondants
EP1901189A3 (fr) * 2006-09-14 2015-04-22 Ricoh Company, Ltd. Système de gestion de pièces, procédé de gestion de pièces, programme informatique et support de stockage lisible par un ordinateur correspondants
US9569400B2 (en) 2012-11-21 2017-02-14 International Business Machines Corporation RDMA-optimized high-performance distributed cache
US9575927B2 (en) 2012-11-21 2017-02-21 International Business Machines Corporation RDMA-optimized high-performance distributed cache
WO2014137379A1 (fr) * 2013-03-06 2014-09-12 Dell Products, L.P. Système et procédé de gestion des instantanés d'un système de stockage
US10346079B2 (en) 2013-03-06 2019-07-09 Dell Products, L.P. System and method for managing storage system snapshots
EP3005343A4 (fr) * 2013-05-31 2017-02-01 Freedom Scientific Inc. Repères de pointage personnalisables basés sur un vecteur
US9628747B2 (en) 2014-05-09 2017-04-18 Lyve Minds, Inc. Image scrolling on a photo sharing device display
FR3048524A1 (fr) * 2016-03-07 2017-09-08 Datexim Systeme d'affichage a distance d'une image medicale
WO2017153642A1 (fr) * 2016-03-07 2017-09-14 Datexim Sas Système d'affichage à distance d'une image médicale
US11544818B2 (en) * 2018-03-01 2023-01-03 Nvidia Corporation Enhancing high-resolution images with data from low-resolution images
WO2020007460A1 (fr) * 2018-07-04 2020-01-09 Telefonaktiebolaget Lm Ericsson (Publ) Dispositif sans fil, nœud serveur informatique, et procédés associés
US20230005109A1 (en) * 2019-09-10 2023-01-05 Aerospace Information Research Institute, Chinese Academy Of Sciences Remote Sensing Image Geometric Normalization Method and Apparatus
WO2022026966A1 (fr) * 2020-07-29 2022-02-03 Google Llc Surlissage d'images progressives
US12223627B2 (en) 2020-07-29 2025-02-11 Google Llc Oversmoothing progressive images
CN112445727A (zh) * 2020-11-27 2021-03-05 鹏城实验室 基于视口特征的边缘缓存置换方法及装置
CN112445727B (zh) * 2020-11-27 2023-08-25 鹏城实验室 基于视口特征的边缘缓存置换方法及装置
CN113031896A (zh) * 2021-03-31 2021-06-25 卡莱特云科技股份有限公司 文本循环滚动播放方法、播放控制装置及计算机设备
CN113031896B (zh) * 2021-03-31 2023-01-20 卡莱特云科技股份有限公司 文本循环滚动播放方法、播放控制装置及计算机设备
CN113568996A (zh) * 2021-07-29 2021-10-29 西安恒歌数码科技有限责任公司 一种基于osgEarth的多图层掉帧优化方法及系统
CN113568996B (zh) * 2021-07-29 2023-05-16 西安恒歌数码科技有限责任公司 一种基于osgEarth的多图层掉帧优化方法及系统

Also Published As

Publication number Publication date
JP2008517540A (ja) 2008-05-22
WO2006052390A3 (fr) 2007-03-01
JP4831071B2 (ja) 2011-12-07
EP1810249A2 (fr) 2007-07-25
WO2006052390A9 (fr) 2007-07-12
CN101147174A (zh) 2008-03-19
EP1810249A4 (fr) 2014-09-10
CN101147174B (zh) 2011-06-08

Similar Documents

Publication Publication Date Title
WO2006052390A2 (fr) Systeme et procede pour la gestion de la communication et/ou du stockage de donnees d'image
AU2006230233B2 (en) System and method for transferring web page data
CA2812008C (fr) Procedes et appareil de navigation d'une image
US11023094B2 (en) Collaborative, multi-user system for viewing, rendering, and editing 3D assets
US6631240B1 (en) Multiresolution video
US7554543B2 (en) System and method for exact rendering in a zooming user interface
CN108647336B (zh) 一种利用关键比例尺以及类瓦片技术处理矢量图的方法
Rusinkiewicz et al. Streaming QSplat: A viewer for networked visualization of large, dense models
Sample et al. Tile-based geospatial information systems: principles and practices
CN108664619B (zh) 一种类瓦片技术的海量线划地形图本原存储与调度方法
EP1756521A2 (fr) Procede de codage et de service de donnees geospatiales ou autres donnees vectorielles sous forme d'images
US6618053B1 (en) Asynchronous multilevel texture pipeline
US7224361B2 (en) System and method for multiple node display
US7930434B2 (en) System and method for managing communication and/or storage of image data
Potmesil Maps alive: viewing geospatial information on the WWW
US20110148886A1 (en) Method and system for receiving an indexed look-up table and viewing a vector animation sequence
US8234558B2 (en) Adaptive artwork for bandwidth- and/or memory-limited devices
CN101501664A (zh) 用于传送网页数据的系统和方法
JP2008535098A (ja) ウェブページデータを転送するシステムおよび方法
CA2558833C (fr) Procedes et appareil de navigation d'une image
Sohn et al. Volumetric video compression for interactive playback
Zhang et al. A web-mapping system for real-time visualization of the global terrain
Marschallinger et al. Presenting 3-D models of geological materials on the World Wide Web
Rodríguez et al. Scalable exploration of highly detailed and annotated 3D models.
Triantafyllos et al. A VRML Terrain Visualization Approach

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200580043057.9

Country of ref document: CN

AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2005851225

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2007536990

Country of ref document: JP

NENP Non-entry into the national phase in:

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 3251/DELNP/2007

Country of ref document: IN

WWP Wipo information: published in national office

Ref document number: 2005851225

Country of ref document: EP
