
HK1123865A - Use of image-derived information as search criteria for internet and other search engines - Google Patents

Use of image-derived information as search criteria for internet and other search engines

Info

Publication number
HK1123865A
HK1123865A (application No. HK09103248.5A)
Authority
HK
Hong Kong
Prior art keywords
image
search
server
information
search term
Prior art date
Application number
HK09103248.5A
Other languages
Chinese (zh)
Inventor
Wayne C. Boncyk
Ronald H. Cohen
Original Assignee
Evryx Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Evryx Technologies, Inc.
Publication of HK1123865A

Description

Using image-derived information as search criteria for internet and other search engines
This application is a continuation-in-part of application No. 09/992,942, filed November 5, 2001, which claims priority to provisional application No. 60/317,521, filed September 5, 2001, and provisional application No. 60/246,295, filed November 6, 2000. This application also claims priority to provisional application Nos. 60/630,524 and 60/625,526, filed November 22, 2004 and November 4, 2004, respectively, and to utility application No. 10/492,243, filed May 20, 2004, which claims priority to PCT/US02/35407, filed November 5, 2002, which in turn claims priority to utility application No. 09/992,942, filed November 5, 2001. These and all other referenced patents and applications are incorporated herein by reference in their entirety. Where a definition or use of a term in a reference that is incorporated by reference is inconsistent with or contrary to the definition of that term provided herein, the definition of the term provided herein is deemed to control.
Technical Field
The field of the invention is digital imaging.
Background
Several years ago, the present inventors first proposed the idea of using digitally captured images to identify objects within the images, and then using such identification to retrieve information from various databases. Examples include:
● use a local device (cell phone, digital camera, PDA or other device) to capture an image of an object in the art gallery, identify the object from the image data, and then provide information about the object (i.e., related to the object) to the user;
● use a local device (cell phone, digital camera, PDA or other device) to capture an image of a car travelling along a road, identify make and model from the image data, and then provide the user with a link to a website associated with that particular make and model;
● use a local device (cell phone, digital camera, PDA or other device) to capture an image of a bar code, logo or other mark in a magazine, identify a product using the information contained in the mark, and provide a phone number or other contact information associated with that product;
● use a local device (cell phone, digital camera, PDA or other device) to capture an image of a billboard at a restaurant, identify the restaurant from a bar code, special object, written language or other information contained in the photograph, and use that information to access a database that provides the user with the restaurant's location, menu or phone number; and
● use a local device (cell phone, digital camera, PDA or other device) to capture an image of signage at a stadium, use information extracted from the image to automatically purchase tickets for the user, and provide the user with an entry code that can be used to bypass the long queue of ordinary ticket purchasers.
In such embodiments it is specifically contemplated that the analysis of the images may be performed locally (i.e., on the cell phone, PDA, or other device that captures the image), remotely on a server, or, more preferably, using some combination of the two. It is also contemplated that any available database, including publicly accessible databases on the Internet, may be accessed to provide the returned information. What has not been appreciated, however, is that these concepts can be combined with the search capabilities of standard search engines.
In the 1990s, Yahoo!™ introduced the idea of indexing web pages accessible on the Internet and providing a search engine to access the index. Since then, many other search systems have been developed that use all manner of search methods, algorithms, hardware, and/or software. All such systems and methods that accept key information as input from a user, and then use that key information to provide information of interest to the user, are referred to herein as search engines. Of course, the user may be a natural person, but may also be a device (computing or otherwise), an algorithm, a system, an organization, or any other entity. In searching for information, a search engine may use any suitable search domain, including, for example:
● databases (including, for example, relational databases, object databases, or XML databases);
● network resources including, for example, web pages accessible within the Internet; and
● public or private files or collections of information (e.g., files, information, and/or messages of a company or other organization), such as the collections maintained by LEXIS™.
In a typical search, key information is provided to a search engine in the form of keywords comprising text, numbers, strings, or other machine-readable types of information. The search engine then searches the index of web pages for matches and returns to the user a list of hyperlinks to internet uniform resource locators ("URLs") along with some brief display of the context in which the keyword was used. Information of interest can sometimes be found in a list of hyperlinks, but more commonly by linking directly to the listed web pages.
Providing key information to a search engine in the form of text strings has inherent difficulties. These include strategy in selecting the text to be entered, and even in the format of the keywords (e.g., the use of wildcards). Another difficulty is that small computing and/or telephony devices (e.g., telephones, both mobile and non-mobile) have small and/or limited keyboards, making text entry difficult.
Disclosure of Invention
The present invention provides apparatus, systems, and methods in which: (a) a digital photograph, video, MPEG, AVI, or other image is captured using a camera-equipped cell phone, PDA, or other image capture device; (b) keywords or other search criteria are automatically extracted or derived from the image; (c) the search criteria are submitted to a search engine to obtain information of interest; and (d) at least a portion of the resulting information is transmitted back to, or near, the device that captured the image.
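As a rough illustration of steps (a) through (d), the following Python sketch shows the client-side flow under the assumption of a generic search engine that accepts a "q" query parameter; the engine URL and both helper functions are hypothetical and are not part of the disclosure.

```python
import urllib.parse
import urllib.request

def derive_search_terms(image_bytes):
    """Step (b): extract or derive search criteria from the image.

    Placeholder only; a real implementation would run the object/symbol
    recognition described later in this specification.
    """
    return ["SUV"]  # e.g. a best-guess term when make/model cannot be resolved

def submit_and_return(image_bytes, engine="https://searchengine.example/search"):
    """Steps (c) and (d): submit terms to a search engine, return results."""
    terms = derive_search_terms(image_bytes)
    url = engine + "?" + urllib.parse.urlencode({"q": " ".join(terms)})
    with urllib.request.urlopen(url) as response:  # result set sent back to device
        return response.read()
```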
Some images so used will include symbolic content that is, in and of itself, sufficiently unambiguous. Such symbolic content may be, for example, a telephone number or a website address. In such cases the symbolic content may advantageously be used directly as text in the search criteria. In other cases, considerable additional processing may be required. For example, an image of a car may need to be processed to determine make and model, and this information (e.g., Mercedes™ S500™) may then be sent to a search engine as keywords for a search. It is also contemplated that the processing of some images will produce only best guesses. A side view of a car, for instance, might not be resolvable to a particular make and model, and in such cases the system may provide more general terms such as SUV or car.
In general, the present invention provides techniques and processes by which objects or images can be linked to information via a network, such as the Internet, without requiring any modification of the linked object. Conventional methods for linking objects to digital information, such as applying a barcode, a radio or optical transceiver or transmitter, or some other means of identification to the object, or modifying an image or object to encode detectable information in it, are not required, because the image or object can be identified solely by its appearance. A user or device may even "link" to an object in order to interact with it. For example, a user may link to a vending machine by "pointing and clicking" on it. The user's device would be connected over the Internet to the company that owns the vending machine. The company would in turn establish a connection to the vending machine, giving the user a communication channel with the vending machine through which to interact with it.
The present invention contemplates any suitable decomposition algorithm. It is clear that faster and more accurate algorithms are preferred over slower and less accurate algorithms. It is particularly preferred that the algorithm is chosen such that at least some of the processing can take place locally at the device that captured the image. Such processing may in many cases eliminate the need to wirelessly transmit detailed images, and may eliminate reliance on remote servers that may be oversubscribed. Thus, some or all of the image processing, including image/object detection and/or decoding of symbols detected in the image, may be arbitrarily distributed between the mobile (client) device and the server. In other words, some processes may be performed in the client device and some processes in the server without specifying which specific process is performed in each, or all processes may be performed on one platform or another, or the platforms may be combined so that there is only one platform. Image processing can be implemented in a parallel computational manner, thus facilitating scaling of the system with respect to database size and incoming traffic load.
It is also contemplated that suitable algorithms will take into account the position and orientation of the object relative to the user at the time the image was captured, as determined from the appearance of the object in the image. Uses include determining the location and/or identity of people scanned by multiple cameras in a security system, a passive locator system more accurate than GPS or usable in areas where GPS signals are unavailable, determining the location of a particular car without requiring a transmission from the car, and many others.
It is therefore an object of the present invention to provide a system and method for identifying a digitally captured image without modifying the object.
Another object is to use the digital capture device in a manner never considered by the manufacturer of the digital capture device.
Another object is to allow identification of objects from partial views of the objects.
Another object is to provide a means of communication with an operative device without requiring a public connection thereto.
Various other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of preferred embodiments of the invention, along with the accompanying drawings in which like reference numerals refer to like components.
Drawings
FIG. 1 is a schematic block diagram top-level algorithm flow diagram;
FIG. 2 is an idealized view of image capture;
FIGS. 3A and 3B are schematic block diagrams of processing details of the present invention;
FIG. 4 is a schematic block diagram of a different interpretation of the present invention;
FIG. 5 is a schematic block diagram similar to FIG. 4 for cellular telephone and Personal Data Assistant (PDA) applications; and
FIG. 6 is a schematic block diagram of a spacecraft application.
FIG. 7 is a schematic diagram of a system in which a local device captures an image, automatically derives search terms from the image, submits the search terms to a search engine to produce a result set, and information from the result set is sent back to the device.
Detailed Description
FIGS. 1-6 are reproduced from priority application PCT/US02/35407, filed November 5, 2002. A discussion of those figures is provided below in this application.
Search engine related embodiments
In fig. 7, system 400 generally includes a portable imaging device 410, a remote server 420, an electronic communications network 425, and a search engine 430.
Generally, the portable device 410 captures an image 412 of an object 415; and sends information 413 about the image to the server 420. At least one of the device 410 and the server 420 derives the search terms 421A, 421B from at least one of the image 412 and the transmitted information 413, respectively. At least one of the device 410 and the server 420 causes the search terms 421A, 421B to be submitted to the search engine 430 via the network 425, the search engine 430 using an index 432 of web pages or other information. The search engine then uses the search terms 421A, 421B to generate a result set 434, and causes at least a portion of the result set 434 to be transmitted back to the portable device 410. In the discussion above, it should be appreciated that information about an image may include the entire image, one or more subsets of the image, and names or other information derived from but not contained within the image. It should also be understood that one may use a proxy server between his/her portable device and the server, and in short, the present application contemplates any complexity of using indirect communication (not necessarily a direct connection) between the mobile client and the server.
Device 410 may be a cellular telephone, PDA, notebook computer, or any other portable device that optically captures an image. "Optically captures" implies some kind of light-sensitive array, the output of which can be processed into a visually perceptible image. Viewed from another perspective, the device 410 can be any camera having telephone capability, particularly cellular telephone capability. With current technology, the device 410 will typically have a lens or other optical focusing mechanism, although it is contemplated that advances in electronics may eliminate the need for any physical focusing mechanism. Devices that have no optical components, and can merely download images from the Internet or another source, do not satisfy the term "optically captures".
It is of course contemplated that a cellular telephone or other device providing the services discussed herein will run software that allows it to do so. That software may reside on the device, reside on external memory (e.g., a memory card), or be paged in as needed.
Object 415 (referred to as "thing of interest" in one or more priority applications) can be any visually perceptible object regardless of dimension. "two-dimensional objects" are considered to include objects in which relevant information is substantially in a two-dimensional format, including advertisements and articles in magazines or other printed media, as well as photographs or designs on billboards, street signs, restaurants or other commercial signs, user manuals, and drawings at museums, to name a few.
Contemplated three-dimensional objects include substantially all physical objects in which relevant information is derived from the shape of the object and/or the appearance of the object's surface. Thus, an automobile is considered herein to have three relevant dimensions, with shape and other dimensions conveying information about make and model. Similarly, a window in a building may be considered to have three relevant dimensions, where the identity of the manufacturer or distributor may be gleaned from the overall physical dimensions, details, and so forth. As another example, a beverage container may be considered to have three dimensions; information may be derived from the shape of the container, but further information may also be derived from labels, printing, logos, text, or other such visual markings on the container (deriving information from the visual markings on a container makes it possible to distinguish among different containers having the same physical shape). Contemplated four-dimensional objects include substantially all physical objects in which relevant information is derived from changes over time. For example, the speed of a bird or its flight pattern, or the gait of a person, may be captured in multiple images over a period of time, may constitute relevant information, and may be reduced to search terms for submission to a search engine (referred to as key information in one or more of the priority documents). Of course, many objects considered herein have two, three, and four relevant dimensions. Thus, relevant information about a car may be provided by each of the two-dimensional markings on the side of the car, the three-dimensional shape of the car, and its four-dimensional acceleration or handling characteristics.
It is specifically contemplated that objects may include both animate and inanimate objects. Animate objects include a person's face and biometric information, such as the fingerprint pattern on a person's finger, the person's iris, and so forth.
Image 412 is contemplated to be any array of pixels. In most cases the pixels will be regularly arranged, but that is not absolutely necessary. In most cases the number of pixels will also be greater than 19,200 (160 × 120), such as 76,800 (320 × 240), but the number may be smaller. More preferred images have higher pixel counts, including, for example, 256,000 (640 × 400), more preferably at least 2 million, and even more preferably at least 4 million. It is not necessary for the image to actually be constructed in the portable device. Thus, the statement that "a portable device captures an image of an object" includes the situation in which the device receives and derives data from light emanating from or reflected by the object, even if that data is never provided to the user as a visually perceptible image, and even if the data is sent to a remote server without ever being assembled into an image by the device.
The information sent to the server may include any relevant information regarding the contents of the image. Thus, information 413 may include the entire image or a portion of the image. For example, when a user takes a picture of a barcode (whether in a two-dimensional, three-dimensional, or any other configuration), the device 410 may process the image 412 to remove color and all background except the barcode itself, and then send only the portion of the image containing the barcode as the transmitted information 413. In other cases, it is contemplated that the device 410 may process the image 412 sufficiently to derive one or more keywords, and then transmit only the keywords as the transmitted information 413. All possible combinations are also contemplated. Thus, a user could photograph a Gucci™ handbag, from which the device 410 could derive the word "Gucci", subtract the background except for the handbag, and then send both: (a) the word "Gucci"; and (b) the image of the handbag, as the transmitted information 413. In such instances the process may be iterative. Thus, the device may begin by sending the word "Gucci" as the first transmitted information, receive from the search engine a result set indicating clothing accessories, then subtract the background except for the handbag and send the handbag image as the second transmitted information. As described above, it is specifically contemplated that the device 410 may send to the server 420 numerical/digital data mathematically derived from the image. For example, the device may send image features and characteristics that the server 420 can use in the recognition process, without sending the raw image.
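A minimal sketch of the iterative exchange just described, assuming hypothetical helpers for keyword derivation, background subtraction, and engine queries (none of these names come from the disclosure):

```python
def iterative_search(image, derive_keyword, crop_to_object, query_engine):
    # First transmitted information: a keyword derived on the device,
    # e.g. "Gucci" read from the logo in the image.
    keyword = derive_keyword(image)
    results = query_engine(text=keyword)
    # If the result set only narrows things to a broad category (e.g.
    # clothing accessories), refine: subtract the background except for
    # the handbag and send the object image as the second transmission.
    if results.get("too_broad"):
        cropped = crop_to_object(image)
        results = query_engine(text=keyword, image=cropped)
    return results
```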
It should now be clear that the information being transmitted need not be limited to image information. Scenes, sounds, text and other information of all kinds may be included in the transmitted information, some of which may be derived directly from the image and some of which may be derived indirectly from the image. Additionally, the device 410 may also capture non-visual information, such as sound, and the information may also be transmitted. Thus, it is contemplated that the device may capture the sound of a frog, capture an image of a lake or forest, and send both as search terms (or further analyzed as search terms).
The remote server 420 is remote in the sense that it has no hardwired link to the device 410. The server 420 may be a single device or any number of devices coupled together, as in a server farm (server farm), for example. All manner of appropriate servers are contemplated. Thus, the server may operate using any reasonable hardware, using any reasonable software and communication protocols, and so forth.
The various analysis tasks described above may be distributed between the server 420 and the device 410 in any suitable manner. For example, in a variation of the iterative Gucci™ handbag operation described above, it is contemplated that the device 410 may analyze the image sufficiently to send the term "Gucci" to the search engine 430 as an initial search term, and that the server 420 may then undertake the tasks of subtracting the background of the image other than the handbag and sending the handbag image as the second search term.
In another example, the server 420 may determine that the original image provides insufficient information, and send a message to the user via the device 410 directing the user to take another image (for example, from another angle, from closer up, or of more detail). Indeed, the server 420 may direct the user to take an image of an entirely different object to help establish the identity of the first object. Thus, a user might take a first image of a pay-for-play display at a ball game and provide that image to a server for identification, and then be instructed to take an image of the credit card with which the user intends to pay for entry to the ball game. The server could then process the payment against the credit card and provide an access code that the user could print in order to pass through an electronically controlled gate.
In another example, a user may use his cell phone to capture an image of a screwdriver set at a hardware store, and the cell phone may send information derived from the image to Google™ or some other search engine to find comparison prices. The server 420 may then instruct the user to turn the package over and take another image of the set, this time from the back of the package. In this manner there are repeated interactions among the user's device, the server, and the search engine.
It should also be appreciated that there are embodiments where the search engine never communicates with the portable device. For example, the server may conduct search queries, get results, and provide them to the portable device, or even to a television or other device in addition to the portable device.
The term "search engine" is contemplated herein to include any system that is dedicated to indexing, searching, and retrieving information. Such as GoogleTM、Yahoo!TM、MSNTMAnd Alta VistaTMMost of the familiar search engines of (a) focus primarily or exclusively on indexing web pages from the world wide web portion of the internet. Such as Lexis/NexisTMOther search engines of (2) focus on indexing a proprietary collection of data, which may include links to internet web pages. The term "search term" is considered herein to include an indexing system used by search engines to access themAny keywords or other information. In the case of most web-based search engines, the keywords are currently text. In this case, the user typically enters one or more keywords, where the term "keyword" is used in a broad sense to include: (a) words that may be found in a dictionary; (b) proper names, strings of numbers, and other terms not found in any dictionary; and (c) characters that are interpreted as wildcards, truncated words, etc. Such search engines have begun to be tested using non-text keywords including, for example, images and/or sounds. All such possible keywords fall within the scope of the search term under consideration.
Thus, the search term under consideration includes a keyword, a portion of an image, and a logo, barcode, or other symbol. It is specifically contemplated that in some cases, the image will contain text for the search term (e.g., the name of the movie on the movie poster), and in some cases, the image will not contain such text (e.g., a picture of a tree or other plant, where the search term is the name of the plant). In either case, the device and/or server in any combination may perform one or more tasks of deriving and submitting search terms to one or more search engines.
Network 425 may be any operational electronic network including public and private access networks and combinations of both. Preferred networks include the internet, the upcoming internet II, and cellular telephone networks, among others. Although not explicitly shown, the communication lines in fig. 7 are all considered to be either unidirectional or bidirectional communications, as appropriate. Also, it is contemplated that multiple networks are typically included. Thus, for example, communication between device 410 and server 420 would likely occur over some combination of a cellular telephone (not shown) and an internet network (e.g., 425), while communication between the server and the search engine would likely occur over some combination of the internet and a local server farm network.
The result set 434 may be of any size and composition, but it will likely be customized to suit the device 410. There is little benefit, for example, in sending dozens of web pages to a cell phone that lacks sufficient display area to view them properly. Thus, it is contemplated that the result set 434 may be pruned or otherwise processed by a server before being transmitted to the device 410 (the server is of course represented generically by the numeral 420, and need not be the same physical box as that used earlier in the transmission of the transmitted information 413). Thus, server 420 or some other processor may process the results before providing them to device 410, such as when the search terms are submitted to the search engine by server 420 rather than by device 410. However, the device 410 may also directly access the search engine using search information provided by the server. The four search modes contemplated include the following:
1. The server 420 constructs a search URL (consisting of a search engine address and keywords) and transmits it to the portable device 410. The portable device then executes the search engine query by sending the search URL to the search engine, and the search engine sends one or more web pages back to the portable device (see the sketch following this list).
2. The server 420 sends the keywords and optionally the search engine address to the portable device 410. The portable device constructs a search URL, sends a search query to a search engine, and receives one or more web pages in response.
3. Server 420 sends a search query to a search engine and receives a response. The server optionally processes the search response (which may be in any form) and provides some result to the portable device. The result may for example comprise a file sent to the portable device or a web page on some server and the URL of that web page is sent to the portable device.
4. In any of the above modes, or in a "direct link" mode, the result may not be a search results page but some other type of information or action. For example, the server may identify the object and thereupon send code to another server that causes an action to occur. An example is using a cell phone to click on a vending machine in order to purchase something from that machine. Another example is clicking on a TV listing in a newspaper, causing the server to change the channel of the TV set in front of the user.
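The following sketch illustrates mode 1; modes 2 and 3 merely move the URL construction or the query itself between server and device. The engine address and the "q" parameter are assumptions, not the interface of any particular search engine.

```python
from urllib.parse import urlencode

def build_search_url(engine_address, keywords):
    # Server 420: compose a search URL from the engine address plus keywords.
    return engine_address.rstrip("/") + "/search?" + urlencode({"q": " ".join(keywords)})

# The URL is transmitted to portable device 410, which executes the query
# with an ordinary HTTP GET and renders the page(s) the engine returns.
url = build_search_url("https://searchengine.example", ["Gucci", "handbag"])
```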
Thus, the statement "the search engine causes at least a portion of the result set 434 to be sent back to the portable device 410" should be interpreted herein as indicating that at least some of the information related to the result set (which may or may not be verbatim included in the result set) is sent back to the device, whether directly or indirectly through the search engine. It is specifically contemplated that the result set may include at least one hyperlinked address.
It is specifically contemplated that the result set may include the following types of information: a Uniform Resource Locator (URL); a Uniform Resource Identifier (URI); an Internet Protocol (IP) address; a telephone number; a radio frequency or a channel; a television frequency or channel; and physical location or address. The results displayed to the user may be interactive. In this case, the user may take further action by directly interacting with the object, by linking to the referenced web page, or some combination of the two. Alternatively, as described above, the results may cause another server/computer or machine to perform some action, such as distributing a product or changing a channel.
From a method point of view, a method of obtaining information using a search engine is contemplated to include: capturing an image of an object using a cellular telephone-enabled portable device; running computer software that automatically derives a first search term from at least a portion of the image; submitting the first search term to a search engine; and transmitting information to the device. Some preferred methods further include: capturing a second image of the object using the device; running computer software to derive a second search term from at least a portion of the second image; and submitting the second search term to the search engine along with the first search term. In other preferred methods, the step of submitting the first search term may advantageously include transmitting at least a portion of the image to a remote server, running software on the server, and having the server send the search term to the search engine. Still other preferred methods include a remote server providing one or more search terms to the device, with the device submitting the one or more search terms to the search engine.
Analysis of the data (whether visual or otherwise) to generate search terms may be accomplished in any suitable manner. Advantageous techniques include, for example, signal analysis, fourier analysis, pattern matching, pattern recognition, image recognition, object recognition, wavelet analysis, component analysis, and the like.
Examples of the invention
Search terms may advantageously be derived from one or more attributes, including name, type, size, color, position, and location, with the derivation performed by an algorithm, a table/database lookup, a hardware device, or other suitable means. For example, consider a case in which the object being imaged is a poster for a colorized version of the Charlie Chaplin movie "Modern Times". The device 410 and/or server 420 may identify the text "Modern Times movie poster" and "color version" as attributes, and may determine search terms from them, such as "Modern Times", "colorized", "Charlie Chaplin", and "classic movies". The attributes and search terms in this case may be determined by a human user, by a machine algorithm, or by some combination of the two.
In another example, a user takes an image of a notebook computer. The algorithm detects the notebook computer in the image and identifies it as a Model 5 manufactured by ZZZ Corporation. The algorithm then determines the attribute "ZZZ Model 5" and the corresponding search terms "online shopping", "ZZZ", "notebook", and "5".
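A sketch of the attribute-to-search-term derivation by table lookup, using the two examples above; the table contents and function name are illustrative assumptions only:

```python
ATTRIBUTE_TO_TERMS = {
    "modern times movie poster": ["Modern Times", "Charlie Chaplin", "classic movies"],
    "color version":             ["colorized"],
    "zzz model 5":               ["online shopping", "ZZZ", "notebook", "5"],
}

def derive_terms(attributes):
    """Map identified attributes to the search terms to submit."""
    terms = []
    for attribute in attributes:
        terms.extend(ATTRIBUTE_TO_TERMS.get(attribute.lower(), []))
    return terms

# derive_terms(["Modern Times movie poster", "color version"])
# -> ['Modern Times', 'Charlie Chaplin', 'classic movies', 'colorized']
```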
One embodiment of particular interest includes searching using image and/or video input. The device captures one or more of a single image, multiple images, moving images, and/or video (each and all of these types of information are referred to hereinafter as "images"). In fact, an image may be captured by more than one electronic imaging device, such as a digital camera, a camera-equipped mobile phone, or a security camera, or a plurality of such devices. One or more objects are identified in the image via image/object identification techniques (software and/or hardware). The identity of the one or more objects is used to look up a set of text keyword search terms in a table/database, which is then provided to a search engine. The search engine returns (e.g., in the form of a web page with hyperlinks) information addresses associated with the objects identified in the image. The user then accesses the information and/or computing resources based on at least one of the information addresses.
Another contemplated embodiment includes searching using sign language input. An image of a person gesturing in sign language is captured. Image/motion recognition techniques are used to translate the sign language into machine-understandable data, such as text. The machine-understandable data is sent directly to a search engine, or is used to determine search terms, which are then sent to the search engine. The search engine returns addresses of information related to the meaning of the sign language, or a portion thereof.
Another embodiment includes searching using speech input. Here, a person's speech is captured by a sound capture and/or recording device. A speech recognition process is then used to recognize the speech and translate it into machine-understandable data, such as text. The machine-understandable data is sent directly to the search engine, or is used to determine search terms, which are then sent to the search engine. The search engine returns addresses of information related to the meaning of the person's speech, or a portion thereof.
A particularly preferred embodiment of the invention includes searching using a camera-equipped portable device. Here, the image is captured by a portable device (e.g., a cellular telephone) having a network connection. An image recognition process is then used to identify at least one object in the image. The recognition process may be performed in the portable device, in a remote server, or distributed and/or otherwise shared and partially performed in each. Text keywords corresponding to the one or more objects are retrieved from a database based on the identity of the one or more objects. As with image recognition, this process preferably occurs on a remote server, although it may be performed on the portable device or on a combination of the portable device and the server. The text keywords are then sent to a search engine. This may be accomplished by sending the keywords to an Internet search engine web site as an HTTP transaction, with the search keywords embedded in the URL sent to the search engine web site. Preferably, the HTTP transaction is initiated from the portable device, so that the search results are returned directly to the portable device. In this case, the search keywords must generally first be available on the portable device; if the keywords are determined at a remote server, they are first sent from the server to the portable device. The search engine results are returned to the portable device as a web page, which may then be displayed in the web browser of the portable device. If the HTTP transaction is initiated by the server, the resulting web page is made viewable on the portable device by one or more of various means (the address of the resulting web page may be sent to the portable device, or the entire web page may be sent to the portable device, or the web page may be stored or converted to another form on the server, after which the portable device is directed to the address of the stored or converted page, etc.).
Image analysis
The preferred image analysis technique is described below, wherein FIG. 1 shows the overall process flow and steps. These steps are described in more detail in the following sections.
In fig. 2, for image capture 10, a user 12 uses a computer, mobile phone, personal digital assistant, or other similar device 14 equipped with an image sensor (such as a CCD or CMOS digital camera). The user 12 aligns the sensors of the image capture device 14 with the object of interest 16. The linking process is then initiated by suitable means including: user 12 presses a button on device 14 or sensor; automatically identifying, by software in the device 14, the image to be acquired; through user voice commands; or by any other suitable means. The device 14 captures a digital image 18 of the scene at which it is pointed. This image 18 is represented as three separate two-dimensional pixel matrices, which correspond to the original RGB (red, green, blue) representation of the input image. To standardize the analysis process in this embodiment, if the device 14 provides images in a format other than the RGB format, a transformation to RGB is carried out. These analyses can be performed in any standard color format, if desired.
If the server 20 is physically separate from the device 14, the image captured by the user is transmitted from the device 14 to the image processor/server 20 using conventional digital or wireless network means. If the image 18 has been compressed (e.g., via a lossy JPEG discrete cosine transform) in a manner that introduces compression artifacts into the reconstructed image 18, these artifacts may be partially removed, for example, by applying a conventional de-speckle filter to the reconstructed image prior to additional processing.
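A sketch of this capture-side preprocessing using the Pillow library; the median filter is one conventional stand-in for a de-speckle filter, and the choice of filter and kernel size here is an assumption:

```python
from PIL import Image, ImageFilter

def preprocess(path):
    image = Image.open(path)
    # Standardize on the RGB working format used by the analysis steps.
    if image.mode != "RGB":
        image = image.convert("RGB")
    # Partially remove JPEG compression artifacts before further processing.
    return image.filter(ImageFilter.MedianFilter(size=3))
```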
The image type determination 26 may be implemented using a discriminator algorithm that operates on the input image 18 and determines whether the input image contains recognizable symbols, such as bar codes, matrix codes, or alphanumeric characters. If such a symbol is found, the image 18 is sent to a decoded symbol 28 for processing. Depending on the confidence with which the discriminator algorithm finds the symbol, the image 18 may also or alternatively contain the object of interest and thus can also or alternatively be sent to the object image branch of the processing stream. For example, if the input image 18 contains a barcode and an object, the image may be analyzed by object image and symbol image branching, depending on the clarity of detecting the barcode, and the branch with the highest success rate in recognition will be used to identify and link from the object.
The image may then be analyzed in the decoded symbol 28 to determine the location, size, and characteristics of the symbol. Preferably, the symbols are analyzed according to their type and their content information is extracted. For example, bar codes and alphanumeric characters will produce numeric and/or textual information.
For object images, we can advantageously perform a "decomposition" of the high resolution input image into several different types of quantifiable salient (salient) parameters in an input image decomposition step 34. This allows multiple independent convergent search processes of the database to occur in parallel, which greatly improves image matching speed and matching robustness in database matching 36. Then, a best match 38 from the decoded symbol 28 or the image database match 36 or both is determined. If a particular URL (or other online address) is associated with the image, a URL lookup 40 is performed and an Internet address is returned via a URL return 42. Examples of code are given in the priority document, along with other details, including segmentation, segment group generation, bounding box generation, geometric normalization, wavelet decomposition, color cube decomposition, shape decomposition, low resolution gray scale image generation, gray scale comparison, wavelet comparison, color cube comparison, and computation of a combined match score.
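The priority document gives the actual comparison and scoring details; purely as an illustration of how the independent parallel comparisons might be fused, a weighted combination could look like the following (the channel names and equal default weights are assumptions):

```python
def combined_match_score(scores, weights=None):
    """scores: per-channel similarity in [0, 1] for one candidate database
    entry, e.g. keys 'grayscale', 'wavelet', 'color_cube', 'shape'."""
    weights = weights or {channel: 1.0 for channel in scores}
    total = sum(weights[channel] for channel in scores)
    return sum(weights[channel] * scores[channel] for channel in scores) / total

# combined_match_score({"grayscale": 0.8, "wavelet": 0.7,
#                       "color_cube": 0.9, "shape": 0.6})  # -> 0.75
```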
Fig. 3A and 3B illustrate a preferred process flow that may occur within the database matching operation. Here the algorithm is shown as containing four nested loops, with four parallel processes inside the innermost loop. This structure is for presentation and explanation only. Any practical implementation (although it would likely perform the same operations at the innermost layer) may have a different structure in order to gain maximum benefit from processing-speed-enhancement techniques such as parallel computing and data indexing. It is also important to note that the loop structure may be implemented independently for each inner comparison, rather than in the shared approach shown in Fig. 3A and 3B.
Preferably, parallel processing is used to divide tasks among multiple CPUs (central processing units) and/or computers. The entire algorithm can be partitioned in several ways, such as:
Sharing outer loops: In this technique, all CPUs run the entire algorithm, including the outer loop, but one CPU runs the loop for the first N cycles, another CPU for the second N cycles, and so on, all simultaneously.
Sharing comparisons: In this technique, one CPU performs the loop functions. When the comparisons are performed, they are each passed to a separate CPU to be performed in parallel.
Sharing the database: This technique entails dividing the database search among the CPUs, so that each CPU is responsible for searching one part of the database, and the parts are searched in parallel by multiple CPUs. This is, in essence, a form of the "sharing outer loops" technique described above.
An actual implementation may be some combination of the above techniques that optimizes processing on available hardware.
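As an illustration of the "sharing the database" partitioning, the sketch below shards an in-memory record list across worker processes; the match() stub and the use of Python's multiprocessing module are assumptions standing in for whatever comparison function and CPU/computer farm a real deployment would use:

```python
from multiprocessing import Pool

def match(record, query):
    # Placeholder similarity measure; a real system would compare the
    # decomposed image parameters described above.
    return float(record == query)

def best_in_shard(args):
    shard, query = args
    return max(((match(record, query), record) for record in shard),
               key=lambda pair: pair[0], default=(0.0, None))

def parallel_best_match(database, query, n_workers=4):
    # Divide the database into one shard per worker; search shards in parallel.
    size = (len(database) + n_workers - 1) // n_workers
    shards = [database[i:i + size] for i in range(0, len(database), size)]
    with Pool(n_workers) as pool:
        results = pool.map(best_in_shard, [(shard, query) for shard in shards])
    return max(results, key=lambda pair: pair[0])
```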
Another technique for maximizing speed is data indexing. This technique involves using a priori knowledge of where data is stored to search only those parts of the database that contain potential matches. Various forms of indexing may be used, such as hash tables, data partitioning (i.e., data within a particular range of values stored in a particular location), data sorting, and database table indexing. An example of such a technique in a shape comparison algorithm: if the database is to be searched for an entry with an area value of A, the algorithm knows which database entries or data areas have approximately this value, and need not search the entire database.
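A sketch of such range partitioning for the shape-comparison example: records are bucketed by a quantized area value so that a query scans only the buckets that could hold a match. The bucket width and tolerance are assumptions.

```python
from collections import defaultdict

BUCKET = 10.0  # width of each area range

def build_index(records):
    """records: iterable of (area, payload) pairs, bucketed by quantized area."""
    index = defaultdict(list)
    for area, payload in records:
        index[int(area // BUCKET)].append((area, payload))
    return index

def candidates(index, query_area, tolerance=5.0):
    """Yield only the records whose bucket could contain a match for A."""
    lo = int((query_area - tolerance) // BUCKET)
    hi = int((query_area + tolerance) // BUCKET)
    for bucket in range(lo, hi + 1):
        yield from index.get(bucket, [])
```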
Figure 4 shows a simplified arrangement of an alternative analysis technique. The blocks with solid lines represent processes, software, physical objects or devices. The boxes with dashed lines represent information. The process starts with an object of interest: the target object 100. In the case of a consumer application, target object 100 may be, for example, a beverage can, a music CD box, a DVD video box, a magazine advertisement, a poster, a movie theater, a store, a building, a car, or any other object that a user is interested in or desires to interact with. In security applications, target object 100 may be, for example, a person, a passport, or a driver's license, etc. In industrial applications, the target object 100 may be, for example, a part in a machine, a part on an assembly line, a box in a warehouse, or a spacecraft on a track, etc.
The terminal 102 is a computing device having an "image" capture device, such as a digital camera 103, a video camera, or any other device that converts a physical object into a digital representation of the object. The image may be a single image, a series of images, or a continuous video stream. For simplicity of explanation, this document describes the digital image generically as a single image; however, the invention and this system can use all of the image types described above.
After the camera 103 captures a digital image of the target object 100, the image pre-processing 104 software converts the digital image into image data 105 for transmission to and analysis by the recognition server 106. Typically, a network connection is provided that is capable of providing communication with the recognition server 106. The image data 105 is data extracted or converted from an original image of the target object 100, and has information content suitable for identifying the target object 100 by object recognition 107, which object recognition 107 may be software or hardware. Image data 105 may take many forms depending on the particular embodiment of the invention. Specific examples are given in the priority file.
The image data 105 is transmitted from the terminal 102 to the recognition server 106. The recognition server 106 receives the image data 105 and passes it to the object recognition 107.
The recognition server 106 is a set of functions that typically reside on a computing platform separate from the terminal 102, but may reside on the same computing platform. If the recognition server 106 resides on a separate computing device (such as a computer in a data center), the transmission of the image data 105 to the recognition server 106 is accomplished via a network or combination of networks, such as a cellular telephone network, wireless Internet, the Internet, and a wired network. If the recognition server 106 resides on the same computing device as the terminal 102, the transmission consists simply of a transfer of data from one software component or process to another.
Placing the recognition server 106 on a computing platform separate from the terminal 102 enables the use of powerful computing resources for the object recognition 107 and database 108 functions, thus providing the terminal 102 with the powerful capabilities of these computing resources via a network connection. For example, embodiments that identify objects from a database of millions of known objects will be facilitated by the processing power, large memory and storage capacity available in the data center; it is difficult to have such computing power and storage in portable devices. Whether the terminal 102 and the recognition server 106 are on the same computing platform or separate units is an architectural decision that depends on system response time, the number of database records, image recognition algorithm computing power and storage available on the terminal 102, etc., and this decision must be made for each embodiment of the invention. In accordance with current technology, in most embodiments, these functions will be on separate computing platforms.
The overall function of the recognition server 106 is to determine and provide target object information 109 corresponding to the target object 100 from the image data 105.
The object recognition 107 and database 108 work together to:
1. symbols in the image, such as barcodes or text, are detected, identified and decoded.
2. An object (target object 100) in the image is identified.
3. Target object information 109 corresponding to the target object 100 is provided. The target object information 109 typically (according to the embodiment) includes an information address corresponding to the target object 100.
The object recognition 107 detects and decodes symbols, such as barcodes or text, in the input image. This is done via algorithms, software and/or hardware components adapted to this task. Such components are commercially available (the HALCON software package from MVTec is one example). The object recognition 107 also detects and recognizes an image of the target object 100 or a part thereof. This is accomplished by: the image data 105 is analyzed, the results are compared with other data stored in the database 108 for images representing a plurality of known objects, and the target object 100 is identified if a representation of the target object 100 is stored in the database 108.
In some embodiments, terminal 102 includes software, such as a web browser (browser 110), that receives an information address, connects to the information address via one or more networks, such as the Internet, and exchanges information with another computing device at the information address. In a consumer application, the terminal 102 may be a portable cellular phone or a personal digital assistant equipped with a camera 103 and a wireless internet connection. In security and industrial applications, the terminal 102 may be a similar portable handheld device, or may be fixed in position and/or orientation, and may have a wireless or wired network connection.
Other object recognition techniques also exist, including methods that store three-dimensional models (rather than two-dimensional images) of objects in a database and correlate input images with these models of target objects. Many such techniques are commercially available and known in the prior art. Such object recognition techniques generally consist of comparing a new input image with a plurality of known images and detecting correspondences between the new input image and one or more of the known images. The known images are views of known objects from multiple viewpoints, thereby allowing the recognition of two- and three-dimensional objects in any orientation relative to the camera 103.
Fig. 4 shows the object recognition 107 and the database 108 as separate functions for simplicity. However, in many embodiments the object recognition 107 and the database 108 are so closely interdependent that they may be considered a single process.
It is generally desirable for the database 108 to be scalable, to enable the identification of target objects 100 from a very large number (e.g., millions) of known objects in the database 108. The algorithms, software, and computing hardware must be designed to work together to perform such searches rapidly. One example software technique for performing such searches rapidly is to use metric distance comparisons for comparing the image data 105 with data stored in the database 108, together with database clustering and multi-resolution distance comparisons. One such technique is described in "Fast Exhaustive Multi-Resolution Search Algorithm for Efficient Image Retrieval" by Song, Kim, and Ra (2000).
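Purely as an illustration of the coarse-to-fine idea behind such multi-resolution search (a simplified stand-in, not the cited algorithm itself), the sketch below prunes candidates with a cheap low-resolution distance before running the full-resolution comparison; the vector layout and threshold are assumptions:

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def multires_search(query_coarse, query_fine, db, coarse_threshold=0.5):
    """db: list of (coarse_vector, fine_vector, record_id) entries."""
    # Cheap pass: keep only entries whose coarse signature is close enough.
    survivors = [(fine, rid) for coarse, fine, rid in db
                 if euclidean(query_coarse, coarse) <= coarse_threshold]
    # Expensive pass: full-resolution comparison on the survivors only.
    return min(survivors, key=lambda item: euclidean(query_fine, item[0]),
               default=None)
```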
In addition to such software techniques, parallel processing computing architectures may be used to enable fast searching of large databases. Parallel processing is particularly important where non-metric distances are used in object recognition 107, since such techniques of database clustering and multi-resolution searching are not possible, and thus the entire database must be searched by distributing the database over multiple CPUs.
As described above, the object recognition 107 may also detect identifying marks on the target object 100. For example, the target object 100 may include an identifying number or a barcode. This information may be decoded and used to identify, or to help identify, the target object 100 in the database 108. This information may also be passed on as part of the target object information 109. If included in the target object information 109, it can be used by the terminal 102 or the content server 111 to identify a specific target object 100 among many such objects that have similar appearances and differ only in their identifying marks. This technique is useful, for example, where the target object 100 is an active device with a network connection (such as a vending machine) and the content server establishes communication with the target object 100. Integration with a global positioning system can also be used to identify objects by their location.
The object recognition 107 may be implemented in hardware, software, or a combination of both. Examples and additional details of each category are given in one or more of the priority files.
In most embodiments, the browser 110 will be a web browser embedded in the terminal 102, capable of accessing web sites and communicating with them via one or more networks, such as the Internet. However, in some embodiments, such as those that only display the identity, location, orientation, or status of the target object 100, the browser 110 may be a software component or application that displays or provides the target object information 109 to a human user or to another software component or application.
In embodiments where the browser 110 is a web browser, the browser 110 connects to a content server 111 located at an information address (typically an internet URL) included in the target object information 109. This connection is made through the cooperating terminal 102 and browser 110. The content server 111 is an information server and a computing system. The connection and information exchanged between terminal 102 and content server 111 is typically accomplished via standard internet and wireless network software, protocols (e.g., HTTP, WAP, etc.) and networks, although any information exchange technique may be used. The physical network connection depends on the system architecture of the particular embodiment, but in most embodiments it will involve a wireless network and the internet. This physical network will most likely be the same network used to connect the terminal 102 and the identification server 106.
The content server 111 transmits content information to the terminal 102 and the browser 110. This content information is typically associated with target object 100 and may be text, audio, video, graphics, or any form of information that may be used by browser 110 and terminal 102. The terminal 102 and browser 110 in some embodiments send additional information to the content server 111. This additional information may be information such as the identity of the user of terminal 102 or the location of the user of terminal 102 (as determined from a GPS system or a radio frequency ranging system). In some embodiments, such information is provided to the content server over a wireless network carrier.
The user may perform ongoing interactions with the content server 111. For example, depending on the embodiment and application of the invention, a user may:
if the target object 100 is an audio recording (e.g., a compact audio disc), a streaming audio sample is listened to.
The target object 100 is purchased via an online transaction and the purchase amount is billed to an account linked to the terminal 102, an independent user, a bank account, or a credit card.
In some embodiments, the content server 111 may reside within the terminal 102. In such embodiments, communication between the terminal 102 and the content server 111 does not occur via a network, but rather occurs within the terminal 102.
In embodiments where target object 100 comprises or is a device capable of communicating with other devices or computers via one or more networks, such as the internet, and where target object information 109 comprises sufficient identification (such as a tag, number, or barcode) of a particular target object 100, content server 111 connects to and exchanges information with target object 100 via one or more networks, such as the internet. In this type of embodiment, the terminal 102 is connected to the content server 111, and the content server 111 is connected to the target object 100. Accordingly, the terminal 102 and the target object 100 may communicate via the content server 111. Which enables a user to interact with target object 100 despite the lack of a direct connection between target object 100 and terminal 102.
Fig. 5 shows an embodiment using a cellular phone, a PDA or a portable device equipped with computing power, a digital camera and a wireless network connection as a terminal 202 corresponding to the terminal 102 in fig. 4. In this embodiment, terminal 202 communicates with identification server 206 and content server 211 via a network such as a cellular telephone network and the internet.
This embodiment may be used for applications such as the following (here "user" refers to the person operating the terminal 202, where the terminal 202 is a cellular telephone, PDA, or similar device, and "point and click" refers to the operation of capturing an image of the target object 200 and initiating the transfer of the image data 205 to the recognition server 206).
The user "points and clicks" the terminal 202 on a Compact Disc (CD) containing recorded music or a Digital Video Disc (DVD) containing recorded video. The terminal 202 browser connects to the URL corresponding to the CD or DVD and displays a menu from which the user can select options. From this menu, the user can listen to a streaming audio sample of a CD or a streaming video sample of a DVD, or can purchase a CD or DVD.
The user "points and clicks" the terminal 202 at a print media advertisement, poster, or billboard promoting a movie, music recording, video, or other entertainment. The browser 210 connects to the URL corresponding to the advertised item, and the user can listen to streaming audio samples, view streaming video samples, obtain show times, or purchase the item or tickets.
The user "points and clicks" the terminal 202 at a television screen to interact with a television program in real time. For example, the program may include a product promotion offering a price reduction for a limited time; users who "point and click" on this television program during that time are linked to a website at which they can purchase the product at the promotional price. Another example is interactive television programming, in which users "point and click" on the television screen at particular times, based on the screen content, to register a vote, indicate a choice, or connect to a website through which they interact with the on-screen program in real time.
The user "points and clicks" at an object such as a consumer product, an advertisement for a product, or a poster; the terminal 202 places a telephone call to the company that sells the product, and the user discusses the company's products or services directly with a company representative. In this case, the company's telephone number is included in the target object information 209. If the target object information 209 also includes a company URL, the user may interact with the company via both voice and the internet (via the browser 210).
The user "points and clicks" the terminal 202 at a vending machine (target object 200) that is equipped with a connection to a network, such as the internet, and that bears a unique identification mark, such as a number. The terminal 202 connects to the content server 211 of the company that operates the vending machine. The identification server identifies the particular vending machine by detecting and decoding the unique identification mark. The identity of the specific machine is included in the target object information 209 and is sent from the terminal 202 to the content server 211. The content server 211, having the identity of the particular vending machine (target object 200), initiates communication with the vending machine. The user then conducts a transaction with the vending machine, such as purchasing a product, via the terminal 202, which communicates with the vending machine through the content server 211.
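The vending-machine transaction can be summarized as three steps: identify the machine, route its identity to the operator's content server, and relay the transaction. The Python sketch below is purely illustrative; every method and field name in it is an assumption.

    def purchase_from_vending_machine(image_data: bytes, identification_server,
                                      content_server, item_id: str) -> bytes:
        # 1. The identification server decodes the unique identification mark.
        target_object_info = identification_server.identify(image_data)
        machine_id = target_object_info["machine_id"]  # hypothetical field name
        # 2. The machine identity is forwarded to the operator's content server,
        #    which opens a session with that specific machine.
        session = content_server.open_session(machine_id)
        # 3. The purchase itself is relayed through the content server.
        return session.send(("PURCHASE " + item_id).encode())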
The user "points and clicks" at a part of a machine, such as an aircraft part. The terminal 202 then displays information related to that part, such as maintenance instructions or repair history.
The user "points and clicks" at a magazine or newspaper article and is linked to streaming audio or video content or to further information.
The user "points and clicks" at an automobile. The location of the terminal 202 is determined by a GPS receiver in the terminal 202, by wireless ranging over the cellular network, or by another technique, and is sent to the content server 211. The content server provides the user with information about the automobile, such as price and features, and, based on the location information, also provides the user with the location of a nearby automobile dealer that sells the car. The same technique can be used to direct a user to a nearby retail store selling an item that appears in a magazine advertisement the user "points and clicks".
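Finding the nearest dealer from the terminal's reported position is an ordinary nearest-neighbor lookup over geographic coordinates. A minimal Python sketch follows, assuming a flat list of (name, latitude, longitude) dealer records; the record layout is an assumption, not part of the disclosure.

    import math

    def nearest_dealer(dealers, lat, lon):
        """Return the dealer record closest to the terminal's position.
        `dealers` is a hypothetical list of (name, lat, lon) tuples."""
        def haversine_km(lat1, lon1, lat2, lon2):
            r = 6371.0  # mean Earth radius in km
            p1, p2 = math.radians(lat1), math.radians(lat2)
            dp = math.radians(lat2 - lat1)
            dl = math.radians(lon2 - lon1)
            a = (math.sin(dp / 2) ** 2
                 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
            return 2 * r * math.asin(math.sqrt(a))
        return min(dealers, key=lambda d: haversine_km(lat, lon, d[1], d[2]))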
For people with impaired vision:
"Point and click" on any item in a store, and the device speaks the name and price of the item to you (the item must be in the database).
"Point and click" on a newspaper or magazine article, and the device reads the article to you.
"Point and click" on a sign (building sign, street sign, etc.), and the device reads the sign to you and provides any additional relevant information (the sign must be in the database).
Figure 6 illustrates an embodiment of the present invention for use in spacecraft applications. In this embodiment, all components of the system (except the target object 300) are on board a spacecraft. The target object 300 is another spacecraft or other object. This embodiment is used to determine the position and orientation of the target object 300 relative to the spacecraft so that this information may be used in navigating, guiding, and maneuvering the spacecraft relative to the target object 300. One example use of this embodiment is autonomous spacecraft rendezvous and docking.
This embodiment determines the position and orientation of the target object 300 relative to the spacecraft from the position, orientation, and size of the target object 300 in the image, by comparing the image captured by the camera 303 with views of the target object 300 from different orientations stored in the database 308. The relative position and orientation of the target object 300 are output in the target object information, so that the spacecraft data system 310 may use this information in trajectory planning and maneuvering.
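One way to realize this comparison, shown below as a hedged Python sketch, is to score the captured image against each stored reference view with normalized cross-correlation and to recover range from apparent size via the pinhole-camera relation. The dictionary keys and the assumption of same-size grayscale images are illustrative; the disclosure does not commit to any particular matching algorithm.

    import numpy as np

    def best_matching_view(captured: np.ndarray, reference_views: list) -> dict:
        """Pick the stored view (of known orientation) that best matches the
        captured image. NCC is a stand-in for whatever similarity measure an
        implementation actually uses; all arrays must share one shape."""
        def ncc(a: np.ndarray, b: np.ndarray) -> float:
            a = (a - a.mean()) / (a.std() + 1e-9)
            b = (b - b.mean()) / (b.std() + 1e-9)
            return float((a * b).mean())
        return max(reference_views, key=lambda v: ncc(captured, v["image"]))

    def range_from_size(true_size_m: float, apparent_size_px: float,
                        focal_length_px: float) -> float:
        # Pinhole-camera relation: apparent size scales inversely with range.
        return focal_length_px * true_size_m / apparent_size_px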
Thus, particular embodiments and applications have been disclosed for using information derived from images as search criteria for the internet and other search engines. It will be apparent, however, to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the spirit of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms "comprises" and "comprising" should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification and claims refer to at least one of something selected from the group consisting of A, B, C, ..., and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.

Claims (22)

1. A system, comprising:
a portable device that optically captures an image and transmits information about the image to a remote server;
at least one of the device and the server deriving a search term from at least one of the image and the transmitted information;
at least one of the device and the server causing the search term to be submitted to a search engine that indexes publicly accessible information; and
the search engine generating a result set using the search term and causing at least a portion of the result set to be sent back to the portable device.
2. The system of claim 1, wherein said portable device includes cellular telephony capability.
3. The system of claim 1, wherein the search term is included as text in the image.
4. The system of claim 1, wherein the search term is not included as text in the image.
5. The system of claim 1, wherein said device derives said search term.
6. The system of claim 1, wherein the device submits the search term to the search engine through the server.
7. The system of claim 1, wherein the device submits the search term directly to the search engine.
8. The system of claim 1, wherein said server derives said search term.
9. The system of claim 1, wherein the server submits the search term to the search engine.
10. The system of claim 1, wherein the search term comprises a keyword.
11. The system of claim 1, wherein the search term comprises a portion of an image.
12. The system of claim 1, wherein the search term comprises a symbol.
13. The system of claim 12, wherein the symbol comprises a logo.
14. The system of claim 12, wherein the symbol comprises a bar code.
15. The system of claim 11, wherein the result set includes at least one hyperlink address.
16. The system of claim 11, wherein said search engine is selected from the group consisting of Google™, Yahoo!™, MSN™, and Alta Vista™.
17. A cellular telephone running software that allows the cellular telephone to operate as the device in a system according to claim 1.
18. A method of obtaining information using a search engine, comprising:
capturing an image of a subject using a cellular telephone-enabled portable device;
running computer software that automatically derives a first search term from at least a portion of the image;
submitting the first search term to the search engine; and
submitting the information to the device.
19. The method of claim 18, further comprising: capturing a second image of a subject using the device; running computer software to derive a second search term from at least a portion of the second image; and submitting the second search term to the search engine along with the first search term.
20. The method of claim 18, wherein said step of submitting said first search term comprises: transmitting the at least a portion of the image to a remote server; running the software on the server; and the server sending the search term to the search engine.
21. A portable device, comprising:
a portable power source;
a camera portion that optically captures an image; and
a processor that executes instructions to derive a search term from the image and to send the search term to a search engine.
22. A method of deriving a result set, comprising:
capturing an image using a portable device;
the portable device transmitting information about the image to a remote server;
at least one of the device and the server automatically deriving a search term from at least one of the image and the transmitted information;
at least one of the device and the server causing the search term to be submitted to a search engine that indexes publicly accessible information; and
the search engine using the search term to produce the result set.
HK09103248.5A 2005-08-15 2006-08-10 Use of image-derived information as search criteria for internet and other search engines HK1123865A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/204,901 2005-08-15

Publications (1)

Publication Number Publication Date
HK1123865A true HK1123865A (en) 2009-06-26


Similar Documents

Publication Publication Date Title
US10509820B2 (en) Object information derived from object images
US8494271B2 (en) Object information derived from object images
US9310892B2 (en) Object information derived from object images
HK1123865A (en) Use of image-derived information as search criteria for internet and other search engines