US20110047163A1 - Relevance-Based Image Selection - Google Patents
Relevance-Based Image Selection
- Publication number
- US20110047163A1 (U.S. application Ser. No. 12/546,436)
- Authority
- US
- United States
- Prior art keywords
- video
- keyword
- feature
- keywords
- content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/738—Presentation of query results
- G06F16/743—Browsing or visualisation of a collection of video files or sequences
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval using metadata automatically derived from the content
- G06F16/7844—Retrieval using original textual content or text extracted from visual content or a transcript of audio data
- G06F16/7867—Retrieval using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
- G06N20/00—Machine learning
Definitions
- the invention relates generally to identifying videos or their parts that are relevant to search terms.
- embodiments of the invention are directed to selecting one or more representative thumbnail images based on the audio-visual content of a video.
- Searchable metadata may include, for example, titles of the media files or descriptive summaries of the media content.
- textual metadata often is not representative of the entire content of the video, particularly when a video is very long and has a variety of scenes. In other words, if a video has a large number of scenes and variety of content, it is likely that some of those scenes are not described in the textual metadata, and as a result, that video would not be returned in response to searching on keywords that would likely describe such scenes.
- conventional search engines often fail to return the media content most relevant to the user's search.
- a second problem with conventional media hosting websites is that due to the large amount of hosted media content, a search query may return hundreds or even thousands of media files responsive to the user query. Consequently, the user may have difficulties assessing which of the hundreds or thousands of search results are most relevant.
- the website may present each search result together with a thumbnail image.
- the thumbnail image used to represent a video is a predetermined frame from the video file (e.g., the first frame, center frame, or last frame).
- a thumbnail selected in this manner is often not representative of the actual content of the video, since there is no relationship between the ordinal position of the thumbnail and the content of a video.
- the thumbnail may not be relevant to the user's search query. Thus, the user may have difficulty assessing which of the hundreds or thousands of search results are most relevant.
- a system, computer readable storage medium, and computer-implemented method finds and presents video search results responsive to a user keyword query.
- a video hosting system receives a keyword search query from a user and selects a video having content relevant to the keyword query. The video hosting system selects a frame from the video as representative of the video's content using a video index that stores keyword association scores between frames of a plurality of videos and keywords associated with the frames. The video hosting system presents the selected frame as a thumbnail for the video.
- a computer system generates the searchable video index using a machine-learned model of the relationships between features of video frames, and keywords descriptive of video content.
- the video hosting system receives a labeled training dataset that includes a set of media items (e.g., images or audio clips) together with one or more keywords descriptive of the content of the media items.
- the video hosting system extracts features characterizing the content of the media items.
- a machine-learned model is trained to learn correlations between particular features and the keywords descriptive of the content.
- the video index is then generated that maps frames of videos in a video database to keywords based on features of the videos and the machine-learned model.
- the video hosting system finds and presents search results based on the actual content of the videos instead of relying solely on textual metadata.
- the video hosting system enables the user to better assess the relevance of videos in the set of search results.
- FIG. 1 is a high-level block diagram of a video hosting system 100 according to one embodiment.
- FIG. 2 is a high-level block diagram illustrating a learning engine 140 according to one embodiment.
- FIG. 3 is a flowchart illustrating steps performed by the learning engine 140 to generate a learned feature-keyword model according to one embodiment.
- FIG. 4 is a flowchart illustrating steps performed by the learning engine 140 to generate a feature dataset 255 according to one embodiment.
- FIG. 5 is a flowchart illustrating steps performed by the learning engine 140 to generate a feature-keyword matrix according to one embodiment.
- FIG. 6 is a block diagram illustrating a detailed view of an image annotation engine 160 according to one embodiment.
- FIG. 7 is a flowchart illustrating steps performed by the video hosting system 100 to find and present video search results according to one embodiment.
- FIG. 8 is a flowchart illustrating steps performed by the video hosting system 100 to select a thumbnail for a video based on video metadata according to one embodiment.
- FIG. 9 is a flowchart illustrating steps performed by the video hosting system 100 to select a thumbnail for a video based on keywords in a user search query according to one embodiment.
- FIG. 10 is a flowchart illustrating steps performed by the image annotation engine 160 to identify specific events or scenes within videos based on a user keyword query according to one embodiment.
- FIG. 1 illustrates an embodiment of a video hosting system 100 .
- the video hosting system 100 finds and presents a set of video search results responsive to a user keyword query. Rather than relying solely on textual metadata associated with the videos, the video hosting system 100 presents search results based on the actual audio-visual content of the videos. Each search result is presented together with a thumbnail representative of the audio-visual content of the video that assists the user in assessing the relevance of the results.
- the video hosting system 100 comprises a front end server 110 , a video search engine 120 , a video annotation engine 130 , a learning engine 140 , a video database 175 , a video annotation index 185 , and a feature-keyword model 195 .
- the video hosting system 100 represents any system that allows users of client devices 150 to access video content via searching and/or browsing interfaces.
- the sources of videos can be uploads of videos by users, searches or crawls by the system of other websites or databases of videos, or the like, or any combination thereof.
- a video hosting system 100 can be configured to allow upload of content by users.
- a video hosting system 100 can be configured to only obtain videos from other sources by crawling such sources or searching such sources, either offline to build a database of videos, or at query time.
- Each of the various components, e.g., the front end server 110, video search engine 120, video annotation engine 130, learning engine 140, video database 175, video annotation index 185, and feature-keyword model 195, is implemented as part of a server-class computer system with one or more computers comprising a CPU, memory, network interface, peripheral interfaces, and other well-known components.
- the computers themselves preferably run an operating system (e.g., LINUX), have generally high performance CPUs, 1 GB or more of memory, and 100 GB or more of disk storage.
- modules are stored on a computer readable storage device (e.g., hard disk), loaded into the memory, and executed by one or more processors included as part of the system 100 .
- a general purpose computer becomes a particular computer, as understood by those of skill in the art, as the particular functions and data being stored by such a computer configure it in a manner different from its native capabilities as may be provided by its underlying operating system and hardware logic.
- a suitable video hosting system 100 for implementation of the system is the YOUTUBE™ website; other video hosting systems are known as well, and can be adapted to operate according to the teachings disclosed herein. It will be understood that the named components of the video hosting system 100 described herein represent one embodiment of the present invention, and other embodiments may include other components. In addition, other embodiments may lack components described herein and/or distribute the described functionality among the modules in a different manner. Additionally, the functionalities attributed to more than one component can be incorporated into a single component.
- FIG. 1 also illustrates three client devices 150 communicatively coupled to the video hosting system 100 over a network 160 .
- the client devices 150 can be any type of communication device that is capable of supporting a communications interface to the system 100. Suitable devices may include, but are not limited to, personal computers, mobile computers (e.g., notebook computers), personal digital assistants (PDAs), smartphones, mobile phones, gaming consoles and devices, and network-enabled viewing devices (e.g., set-top boxes, televisions, and receivers). Only three clients 150 are shown in FIG. 1 in order to simplify and clarify the description. In practice, thousands or millions of clients 150 can connect to the video hosting system 100 via the network 160.
- the network 160 may be a wired or wireless network.
- Examples of the network 160 include the Internet, an intranet, a WiFi network, a WiMAX network, a mobile telephone network, or a combination thereof.
- Those of skill in the art will recognize that other embodiments can have different modules than the ones described here, and that the functionalities can be distributed among the modules in a different manner.
- the method of communication between the client devices and the system 100 is not limited to any particular user interface or network protocol, but in a typical embodiment a user interacts with the video hosting system 100 via a conventional web browser of the client device 150 , which employs standard Internet protocols.
- the clients 150 interact with the video hosting system 100 via the front end server 110 to search for video content stored in the video database 175 .
- the front end server 110 provides controls and elements that allow a user to input search queries (e.g., keywords). Responsive to a query, the front end server 110 provides a set of search results relevant to the query. In one embodiment, the search results include a list of links to the relevant video content in the video database 175 .
- the front end server 110 may present the links together with information associated with the video content such as, for example, thumbnail images, titles, and/or textual summaries.
- the front end server 110 additionally provides controls and elements that allow the user to select a video from the search results for viewing on the client 150 .
- the video search engine 120 processes user queries received via the front end server 110 , and generates a result set comprising links to videos or portions of videos in the video database 175 that are relevant to the query, and is one means for performing this function.
- the video search engine 120 may additionally perform search functions such as ranking search results and/or scoring search results according to their relevance.
- the video search engine 120 finds relevant videos based on the textual metadata associated with the videos using various textual querying techniques.
- the video search engine 120 searches for videos or portions of videos based on their actual audio-visual content rather than relying on textual metadata.
- the video search engine 120 can find and return a car racing scene from a movie, even though the scene may only be a short portion of the movie that is not described in the textual metadata.
- a process for using the video search engine to locate particular scenes of video based on their audio-visual content is described in more detail below with reference to FIG. 10 .
- the video search engine 120 also selects a thumbnail image or a set of thumbnail images to display with each retrieved search result.
- Each thumbnail image comprises an image frame representative of the video's audio-visual content and responsive to the user's query, and assists the user in determining the relevance of the search result. Methods for selecting the one or more representative thumbnail images are described in more detail below with reference to FIGS. 8-9 .
- the video annotation engine 130 annotates frames or scenes of video from the video database 175 with keywords relevant to the audio-visual content of the frames or scenes and stores these annotations to the video annotation index 185 , and is one means for performing this function.
- the video annotation engine 130 generates feature vectors from sampled portions of video (e.g., frames of video or short audio clips) from the video database 175 .
- the video annotation engine 130 then applies a learned feature-keyword model 195 to the extracted feature vectors to generate a set of keyword scores.
- Each keyword score represents the relative strength of a learned association between a keyword and one or more features. Thus, the score can be understood to describe a relative likelihood that the keyword is descriptive of the frame's content.
- the video annotation engine 130 also ranks the frames of each video according to their keyword scores, which facilitates scoring and ranking the videos at query time.
- the video annotation engine 130 stores the keyword scores for each frame to the video annotation index 185 .
- the video search engine 120 may use these keyword scores to determine videos or portions of videos most relevant to a user query and to determine thumbnail images representative of the video content.
- the video annotation engine 130 is described in more detail below with reference to FIG. 6 .
- the learning engine 140 uses machine learning to train the feature-keyword model 195 that associates features of images or short audio clips with keywords descriptive of their visual or audio content, and is one means for performing this function.
- the learning engine 140 processes a set of labeled training images, video, and/or audio clips ("media items") that are labeled with one or more keywords representative of the media item's audio and/or visual content. For example, an image of a dolphin swimming in the ocean may be labeled with keywords such as "dolphin," "swimming," "ocean," and so on.
- the learning engine 140 extracts a set of features from the labeled training data (images, video, or audio) and analyzes the extracted features to determine statistical associations between particular features and the labeled keywords.
- the learning engine 140 generates a matrix of weights, frequency values, or discriminative functions indicating the relative strength of the associations between the keywords that have been used to label a media item and the features that are derived from the content of the media item.
- the learning engine 140 stores the derived relationships between keywords and features to the feature-keyword model 195 .
- the learning engine 140 is described in more detail below with reference to FIG. 2 .
- FIG. 2 is a block diagram illustrating a detailed view of the learning engine 140 according to one embodiment.
- the learning engine comprises a click-through module 210 , a feature extraction module 220 , a keyword learning module 240 , an association learning module 230 , a labeled training dataset 245 , a feature dataset 255 , and a keyword dataset 265 .
- Those of skill in the art will recognize that other embodiments can have different modules than the ones described here, and that the functionalities can be distributed among the modules in a different manner.
- the functions ascribed to the various modules can be performed by multiple engines.
- the click-through module 210 provides an automated mechanism for acquiring a labeled training dataset 245 , and is one means for performing this function.
- the click-through module 210 tracks user search queries on the video hosting system 100 or on one or more external media search websites. When a user performs a search query and selects a media item from the search results, the click-through module 210 stores a positive association between keywords in the user query and the user-selected media item. The click-through module 210 may also store negative associations between the keywords and unselected search results. For example, a user searches for "dolphin" and receives a set of image results. The image that the user selects from the list is likely to actually contain an image of a dolphin and therefore provides a good label for the image.
- the click-through module 210 determines one or more keywords to attach to each image. For example, in one embodiment, the click-through module 210 stores a keyword for a media item after a threshold number of positive associations between the image and the keyword are observed (e.g., after 5 users searching for "dolphin" select the same image from the result set). Thus, the click-through module 210 can statistically identify relationships between keywords and images, based on monitoring user searches and the resulting user actions in selecting search results. This approach takes advantage of individual users' knowledge of what counts as a relevant image for a given keyword in the ordinary course of their search behavior.
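- As a rough sketch of the click-through labeling just described, the following Python fragment counts positive associations between query keywords and clicked items and promotes a keyword to a training label once a threshold is crossed. The function names, data structures, and the threshold of five are illustrative assumptions, not details taken from the patent.

```python
from collections import defaultdict

POSITIVE_THRESHOLD = 5            # e.g., five independent selections, per the example above

click_counts = defaultdict(int)   # (keyword, item_id) -> number of positive selections
labels = defaultdict(set)         # item_id -> keywords promoted to training labels

def record_click(query_keywords, clicked_item_id):
    """Record one user selecting one result for a keyword query."""
    for keyword in query_keywords:
        click_counts[(keyword, clicked_item_id)] += 1
        if click_counts[(keyword, clicked_item_id)] >= POSITIVE_THRESHOLD:
            labels[clicked_item_id].add(keyword)

# Example: five users search "dolphin" and select the same image.
for _ in range(5):
    record_click(["dolphin"], "img_123")
print(labels["img_123"])          # {'dolphin'}
```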
- the keyword learning module 240 may use natural language techniques such as stemming and filtering to pre-process search query data in order to identify and extract keywords.
- the click-through module 210 stores the labeled media items and their associated keywords to the labeled training dataset 245 .
- the labeled training dataset 245 may instead store training data from external sources 291 such as, for example, a database of labeled stock images or audio clips.
- keywords are extracted from metadata associated with images or audio clips such as file names, titles, or textual summaries.
- the labeled training dataset 245 may also store data acquired from a combination of the sources discussed above (e.g., using data derived from both the click-through module 210 and from one or more external databases 291 ).
- the feature extraction module 220 extracts a set of features from the labeled training data 245 , and is one means for performing this function.
- the features characterize different aspects of the media in such a way that images of similar objects will have similar features and audio clips of similar sounds will have similar features.
- the feature extraction module 220 may apply texture algorithms, edge detection algorithms, or color identification algorithms to extract image features.
- the feature extraction module 220 may apply various transforms to the sound wave, such as generating a spectrogram or applying a set of band-pass filters or autocorrelations, and then apply vector quantization algorithms to extract audio features.
- the feature extraction module 220 segments training images into “patches” and extracts features for each patch.
- the patches can range in height and width (e.g., 64×64 pixels).
- the patches may be overlapping or non-overlapping.
- the feature extraction module 220 applies an unsupervised learning algorithm to the feature data to identify a subset of the features that most effectively characterize a majority of the image patches.
- the feature extraction module 220 may apply a clustering algorithm (e.g., K-means clustering) to identify clusters or groups of features that are similar to each other or co-occur in images.
- the feature extraction module 220 segments training audio clips into short “sounds” and extracts features for the sounds. As with the training images, the feature extraction module 220 applies unsupervised learning to identify a subset of audio features most effectively characterizing the training audio clips.
- the association learning module 230 determines statistical associations between the features in the feature dataset 255 and the keywords in the keyword dataset 265 , and is one means for performing this function.
- the association learning module 230 represents the associations in the form of a feature-keyword matrix.
- each entry of the feature-keyword matrix comprises a weight or score indicating the relative strength of the correlation between a feature and a keyword in the training dataset.
- an entry in the matrix may indicate the relative likelihood that an image labeled with the keyword "dolphin" will exhibit a particular feature vector Y.
- the association learning module 230 stores the learned feature-keyword matrix to the learned feature-keyword model 195 .
- different association functions and representations may be used, such as, for example, a nonlinear function that relates keywords to the visual and/or audio features.
- FIG. 3 is a flowchart illustrating an embodiment of a method for generating the feature-keyword model 195 .
- the learning engine 140 receives 302 a set of labeled training data 245, for example, from an external source 291 or from the click-through module 210 as described above.
- the keyword learning module 240 determines 304 the most frequently appearing keywords in the labeled training data 245 (e.g., the top 20,000 keywords).
- the feature extraction module 220 then generates 306 features for the training data 245 and stores the representative features to the feature dataset 255 .
- the association learning module 230 generates 308 a feature-keyword matrix mapping the keywords to features and stores the mappings to the feature-keyword model 195 .
- FIG. 4 illustrates an example embodiment of a process for generating 306 the features from the labeled training images 245 .
- the feature extraction module 220 generates 402 color features by determining color histograms that represent the color data associated with the image patches.
- a color histogram for a given patch stores the number of pixels of each color within the patch.
- the feature extraction module 220 also generates 404 texture features.
- the feature extraction module 220 uses local binary patterns (LBPs) to represent the edge and texture data within each patch.
- LBPs for a pixel represents the relative pixel intensity values of neighboring pixels.
- the LBP for a given pixel may be an 8-bit code (corresponding to the 8 neighboring pixels in a circle of radius of 1 pixel) with a 1 indicating that the neighboring pixel has a higher intensity value and a 0 indicating that neighboring pixel has a lower intensity value.
- the feature extraction module determines a histogram for each patch that stores a count of LBP values within a given patch.
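- The local binary pattern computation described above can be sketched as follows; the 8-neighbor code, the 256-bin histogram, and the function names are a minimal illustration assuming a grayscale patch, not details specified by the patent.

```python
import numpy as np

def lbp_code(patch, y, x):
    """8-bit LBP code for pixel (y, x): a bit is 1 where the neighbor is brighter."""
    center = patch[y, x]
    neighbors = [patch[y - 1, x - 1], patch[y - 1, x], patch[y - 1, x + 1],
                 patch[y, x + 1], patch[y + 1, x + 1], patch[y + 1, x],
                 patch[y + 1, x - 1], patch[y, x - 1]]
    code = 0
    for bit, value in enumerate(neighbors):
        if value > center:          # neighbor has higher intensity -> 1
            code |= 1 << bit
    return code

def lbp_histogram(patch):
    """Histogram of LBP codes over the interior pixels of a grayscale patch."""
    h, w = patch.shape
    codes = [lbp_code(patch, y, x) for y in range(1, h - 1) for x in range(1, w - 1)]
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / max(hist.sum(), 1)   # normalized so patches of any size are comparable

texture_feature = lbp_histogram(np.random.randint(0, 256, (64, 64)))  # a 64x64 patch
```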
- the feature extraction module 220 applies 406 clustering to the color features and texture features. For example, in one embodiment, the feature extraction module 220 applies K-means clustering to the color histograms to identify a plurality of clusters (e.g. 20) that best represent the patches. For each cluster, a centroid (feature vector) of the cluster is determined, which is representative of the dominant color of the cluster, thus creating a set of dominant color features for all the patches. The feature extraction module 220 separately clusters the LBP histograms to identify a subset of texture histograms (i.e. texture features) that best characterizes the texture of the patches, and thus identifies the set of dominant texture features for the patches as well.
- the feature extraction module 220 then generates 408 a feature vector for each patch.
- texture and color histograms for a patch are concatenated to form the single feature vector for the patch.
- the feature extraction module 220 applies an unsupervised learning algorithm (e.g., clustering) to the set of feature vectors for the patches to generate 410 a subset of feature vectors representing a majority of the patches (e.g., the 10,000 most representative feature vectors).
- the feature extraction module 220 stores the subset of feature vectors to the feature dataset 255 .
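- A minimal sketch of steps 402-410, assuming scikit-learn's KMeans for the clustering and using a smaller codebook (1,000 centroids) than the 10,000 feature vectors mentioned above; the helper names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def color_histogram(patch_rgb, bins_per_channel=8):
    """Step 402: fraction of the patch's pixels falling in each quantized (R, G, B) bin."""
    pixels = patch_rgb.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins_per_channel,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / len(pixels)

def patch_feature_vector(color_hist, texture_hist):
    """Step 408: concatenate a patch's color and texture histograms into one vector."""
    return np.concatenate([color_hist, texture_hist])

def build_codebook(patch_vectors, n_codewords=1000):
    """Step 410: cluster all patch vectors; the centroids become the representative features."""
    km = KMeans(n_clusters=n_codewords, n_init=3).fit(np.asarray(patch_vectors))
    return km.cluster_centers_   # stored, roughly, as the feature dataset 255
```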
- the feature extraction module 220 may generate audio feature vectors by computing Mel-frequency cepstral coefficients (MFCCs). These coefficients represent the short-term power spectrum of a sound based on a linear cosine transform of a log power spectrum on a nonlinear frequency scale. Audio feature vectors are then stored to the feature dataset 255 and can be processed similarly to the image feature vectors.
- the feature extraction module 220 generates audio feature vectors by using stabilized auditory images (SAI).
- one or more band-pass filters are applied to the audio data and features are derived based on correlations within and among the channels.
- spectrograms are used as audio features.
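- The audio path can be illustrated with MFCCs as described above. This sketch assumes the librosa library purely as one convenient way to compute MFCCs; the patent does not prescribe any particular library, and the per-coefficient summary statistics are an assumption.

```python
import numpy as np
import librosa  # assumed here only as one convenient MFCC implementation

def audio_feature_vector(audio_path, n_mfcc=13):
    """Fixed-size clip descriptor: per-coefficient mean and standard deviation of MFCCs."""
    y, sr = librosa.load(audio_path, sr=None)                 # decode the clip
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # shape (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```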
- FIG. 5 illustrates an example process for iteratively learning a feature-keyword matrix from the feature dataset 255 and the keyword dataset 265 .
- the association learning module 230 initializes 502 the feature-keyword matrix by populating the entries with initial weights. For example, in one embodiment, the initial weights are all set to zero.
- the association learning module 230 randomly selects 504 a positive training item p+ (i.e., a training item labeled with the keyword K) and randomly selects a negative training item p− (i.e., a training item not labeled with the keyword K).
- the feature extraction module 220 determines 506 feature vectors for both the positive training item and the negative training item as described above.
- the association learning engine 230 generates 508 keyword scores for each of the positive and negative training items by using the feature-keyword matrix to transform the feature vectors from the feature space to the keyword space (e.g., by multiplying the feature vector and the feature-keyword matrix to yield a keyword vector).
- the association learning module 230 determines 510 the difference between the keyword scores. If the difference is greater than a predefined threshold value (i.e., the positive and negative training items are correctly ordered), then the matrix is not changed 512 . Otherwise, the matrix entries are set 514 such that the difference is greater than the threshold.
- the association learning module 230 determines 516 whether or not a stopping criterion is met. If the stopping criterion is not met, the matrix learning performs another iteration 520 with new positive and negative training items to further refine the matrix. If the stopping criterion is met, then the learning process stops 518 .
- the stopping criterion is met when, on average over a sliding window of previously selected positive and negative training pairs, the number of pairs correctly ordered exceeds a predefined threshold.
- the performance of the learned matrix can be measured by applying the learned matrix to a separate set of validation data, and the stopping criterion is met when the performance exceeds a predefined threshold.
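- The iteration of FIG. 5 can be sketched as a margin-based update over randomly drawn positive/negative pairs. The specific update rule (a simple additive step when a pair is mis-ordered), learning rate, window size, and stopping target are all illustrative assumptions; the patent only requires adjusting the matrix until the positive item outscores the negative item by the threshold.

```python
import random
import numpy as np

def train_feature_keyword_matrix(items, keyword_index, n_features, margin=1.0,
                                 lr=0.01, window=1000, target=0.95, max_iters=100000):
    """items: list of (feature_vector, keyword_set). keyword_index: keyword -> column."""
    W = np.zeros((n_features, len(keyword_index)))              # step 502: initialize weights
    recent = []                                                  # sliding window of orderings
    keywords = list(keyword_index)
    for _ in range(max_iters):
        kw = random.choice(keywords)
        col = keyword_index[kw]
        pos = [f for f, kws in items if kw in kws]               # step 504: pick p+ and p-
        neg = [f for f, kws in items if kw not in kws]
        if not pos or not neg:
            continue
        f_pos, f_neg = random.choice(pos), random.choice(neg)    # step 506: feature vectors
        s_pos, s_neg = f_pos @ W[:, col], f_neg @ W[:, col]      # step 508: keyword scores
        correct = (s_pos - s_neg) > margin                       # step 510: compare
        if not correct:
            W[:, col] += lr * (f_pos - f_neg)                    # step 514: adjust entries
        recent = (recent + [correct])[-window:]
        if len(recent) == window and np.mean(recent) >= target:  # step 516: stopping criterion
            break
    return W
```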
- keyword scores are computed and compared for different keywords, rather than the same keyword K, in each iteration of the learning process.
- the positive training item p+ is selected as a training item labeled with a first keyword K1.
- the negative training item p− is selected as a training item that is not labeled with a second, different keyword K2.
- the association learning module 230 generates keyword scores for each training item/keyword pair (i.e., a positive pair and a negative pair). The association learning module 230 then compares the keyword scores in the same manner as described above, even though the keyword scores relate to different keywords.
- the association learning module 230 learns a different type of feature-keyword model 195 such as, for example, a generative model or a discriminative model.
- the association learning module 230 derives discriminative functions (i.e. classifiers) that can be applied to a set of features to obtain one or more keywords associated with those features.
- the association learning module 230 applies clustering algorithms to specific types of features or all features that are associated with an image patch or audio segment.
- the association learning module 230 generates a classifier for each keyword in the keyword dataset 265 .
- the classifier comprises a discriminative function (e.g., a hyperplane function such as one learned by a support vector machine) that maps a set of features to one or more associated keywords.
- the association learning module 230 stores the learned classifiers to the learned feature-keyword model 195 .
- the feature extraction module 220 and the association learning module 230 iteratively generate sets of features for new training data 245 and re-train a classifier until the classifier converges.
- the classifier converges when the discriminative function and the weights associated with the sets of features are substantially unchanged by the addition of new training sets of features.
- an on-line support vector machine algorithm is used to iteratively re-calculate a hyperplane function based on features values associated with new training data 245 until the hyperplane function converges.
- the association learning module 230 re-trains the classifier on a periodic basis.
- the association learning module 230 retrains the classifier on a continuous basis, for example, whenever new search query data is added to the labeled training dataset 245 (e.g., from new click-through data).
- the resulting feature-keyword matrix represents a model of the relationship between keywords (as have been applied to images/audio files) and feature vectors derived from the image/audio files.
- the model may be understood to express the underlying physical relationship in terms of the co-occurrences of keywords, and the physical characteristics representing the images/audio files (e.g., color, texture, frequency information).
- FIG. 6 illustrates a detailed view of the video annotation engine 130 .
- the video annotation engine 130 includes a video sampling module 610, a feature extraction module 620, and a frame annotation module 630.
- Those of skill in the art will recognize that other embodiments can have different modules than the ones described here, and that the functionalities can be distributed among the modules in a different manner.
- the functions ascribed to the various modules can be performed by multiple engines.
- the video sampling module 610 samples frames of video content from videos in the video database 175 .
- the video sampling module 610 samples video content from individual videos in the video database 175 .
- the sampling module 610 can sample a video at a fixed periodic rate (e.g., 1 frame every 10 seconds), a rate dependent on intrinsic factors (e.g. length of the video), or a rate based on extrinsic factors such as the popularity of the video (e.g., more popular videos, based on number of views, would be sampled at a higher frequency than less popular videos).
- the video sampling module 610 uses scene segmentation to sample frames based on the scene boundaries. For example, the video sampling module 610 may sample at least one frame from each scene to ensure that the sampled frames are representative of the whole content of the video. In another alternative embodiment, the video sampling module 610 samples entire scenes of videos rather than individual frames.
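- The sampling policies mentioned above (fixed rate, length-dependent, popularity-dependent) might be expressed as follows; the specific intervals and thresholds are assumptions for illustration.

```python
def sampling_interval_seconds(duration_s, view_count, policy="fixed"):
    """Seconds between sampled frames under the three policies mentioned above."""
    if policy == "fixed":
        return 10.0                              # e.g., one frame every 10 seconds
    if policy == "length":                       # longer videos sampled more sparsely
        return max(2.0, duration_s / 500.0)
    if policy == "popularity":                   # popular videos sampled more densely
        return 2.0 if view_count >= 10000 else 10.0
    raise ValueError("unknown policy: " + policy)

def sample_timestamps(duration_s, view_count, policy="fixed"):
    """Timestamps (in seconds) at which frames would be sampled."""
    step = sampling_interval_seconds(duration_s, view_count, policy)
    return [t * step for t in range(int(duration_s // step) + 1)]
```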
- the feature extraction module 620 uses the same methodology as the feature extraction module 220 described above with respect to the learning engine 140 .
- the feature extraction module 620 generates a feature vector for each sampled frame or scene.
- each feature vector may comprise 10,000 entries, each entry representing a particular feature obtained through vector quantization.
- the frame annotation module 630 generates keyword association scores for each sampled frame of a video.
- the frame annotation module 630 applies the learned feature-keyword model 195 to the feature vector for a sample frame to determine the keyword association scores for the frame.
- the frame annotation module 630 may perform a matrix multiplication using the feature-keyword matrix to transform the feature vector to the keyword space.
- the frame annotation module 630 thus generates a vector of keyword association scores for each frame (“keyword score vector”), where each keyword association score in the keyword score vector specifies the likelihood that the frame is relevant to a keyword of the set of frequently-used keywords in the keyword dataset 265 .
- the frame annotation module 630 stores the keyword score vector for the frame in association with indicia of the frame (e.g., the frame's index or time offset) in the video annotation index 185.
- each sampled frame is associated with a keyword score vector that describes the relationship between each of the keywords and the frame, based on the features derived from the frame.
- each video in the database is thus associated with one or more sampled frames (which can be used for thumbnails) and these sampled frames are associated with keywords, as described.
- the video annotation engine 130 generates keyword scores for a group of frames (e.g., scenes) rather than for each individual sampled frame. For example, keyword scores may be stored for a particular scene of video. For audio features, keyword scores may be stored in association with a group of frames spanning a particular audio clip, such as, for example, speech from a particular individual.
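- Annotating a sampled frame then reduces to the matrix transform described above. This sketch assumes a simple nested-dictionary layout for the video annotation index 185; the actual index structure is not specified by the patent.

```python
import numpy as np

# Hypothetical index layout: video_annotation_index[video_id][frame_idx] -> {keyword: score}
video_annotation_index = {}

def annotate_frame(frame_features, W, keywords):
    """Transform a frame's feature vector into its keyword score vector."""
    scores = frame_features @ W                 # feature space -> keyword space
    return dict(zip(keywords, scores.tolist()))

def index_frame(video_id, frame_idx, frame_features, W, keywords):
    """Store the frame's keyword scores under indicia of the frame (here, its index)."""
    video_annotation_index.setdefault(video_id, {})[frame_idx] = annotate_frame(
        frame_features, W, keywords)
```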
- the search engine 120 accesses the video annotation index 185 to find and present a result set of relevant videos (e.g., by performing a lookup in the index 185 ).
- the search engine 120 uses keyword scores in the video annotation index 185 for the input query words that match the selected keywords, to find videos relevant to the search query and rank the relevant videos in the result set.
- the video search engine 120 may also provide a relevance score for each search result indicating the perceived relevance to the search query.
- the search engine 120 may also access a conventional index that includes textual metadata associated with the videos in order to find, rank, and score search results.
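- One plausible way the search engine 120 could combine per-frame keyword scores into a per-video relevance score is sketched below (best single-frame score summed over the query keywords); the aggregation rule is an assumption, since the patent leaves the ranking function open.

```python
def video_relevance(video_id, query_keywords, index):
    """Best single-frame score for each query keyword, summed (an assumed rule)."""
    frames = index.get(video_id, {})
    if not frames:
        return 0.0
    return sum(max(scores.get(kw, 0.0) for scores in frames.values())
               for kw in query_keywords)

def rank_results(candidate_ids, query_keywords, index):
    """Order candidate videos by their relevance score, highest first."""
    return sorted(candidate_ids, reverse=True,
                  key=lambda vid: video_relevance(vid, query_keywords, index))
```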
- FIG. 7 is a flowchart illustrating a general process performed by the video hosting system 100 for finding and presenting video search results.
- the front end server 110 receives 702 a search query comprising one or more query terms from a user.
- the search engine 120 determines 704 a result set satisfying the keyword search query; this result set can be selected using any type of search algorithm and index structure.
- the result set includes a link to one or more videos having content relevant to the query terms.
- the search engine 120 selects 706 a frame (or several frames) from each of the videos in the result set that is representative of the video's content based on the keywords scores. For each search result, the front end server 110 presents 708 the selected frames as a set of one or more representative thumbnails together with the link to the video.
- FIGS. 8 and 9 illustrate two different embodiments by which a frame can be selected 706 based on keyword scores.
- the video search engine 120 selects a thumbnail representative of a video based on textual metadata stored in association with the video in the video database 175 .
- the video search engine 120 selects 802 a video from the video database for thumbnail selection.
- the video search engine 120 then extracts 804 keywords from metadata stored in association with the video in the video database 175 .
- Metadata may include, for example, the video title or a textual summary of the video provided by the author or other user.
- the video search engine 120 then accesses the video annotation index 185 and uses the extracted keywords to choose 806 one or more representative frames of video (e.g., by selecting the frame or set of frames having the highest-ranked keyword score(s) for the extracted keywords).
- the front end server 110 displays 808 the chosen frames as a thumbnail for the video in the search results.
- This embodiment beneficially ensures that the selected thumbnails will actually be representative of the video content. For example, consider a video entitled "Dolphin Swim" that includes some scenes of a swimming dolphin but other scenes that are just empty ocean. Rather than arbitrarily selecting a thumbnail frame (e.g., the first frame or center frame), the video search engine 120 will select one or more frames that actually depict a dolphin. Thus, the user is better able to assess the relevance of the search results to the query.
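- The metadata-driven selection of FIG. 8 might look like the following sketch, which reuses the hypothetical nested-dictionary index layout from the annotation sketch; reducing keyword extraction to title tokenization is a simplification, not the patent's method.

```python
def select_thumbnail_frames(video_id, keywords, index, n=1):
    """Frame indices whose summed keyword scores are highest for the given keywords."""
    frames = index.get(video_id, {})
    ranked = sorted(frames.items(), reverse=True,
                    key=lambda item: sum(item[1].get(kw, 0.0) for kw in keywords))
    return [frame_idx for frame_idx, _ in ranked[:n]]

def thumbnail_from_metadata(video_id, title, index):
    """FIG. 8: keywords come from the video's own metadata (here, naively, its title)."""
    keywords = [word.lower() for word in title.split()]
    return select_thumbnail_frames(video_id, keywords, index)
```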
- FIG. 9 is a flowchart illustrating a second embodiment of a process for selecting a thumbnail to present with a video in a set of search results.
- the one or more selected thumbnails are dependent on the keywords provided in the user search query.
- the search engine 120 identifies 902 a set of video search results based on the user search query.
- the search engine 120 extracts 904 keywords from the user's search query to use in selecting the representative thumbnail frames for each of the search results.
- the video search engine 120 accesses the video annotation index 185 and uses the extracted keywords to choose 906 one or more representative frames of video (e.g., by selecting the one or more frames having the highest-ranked keyword score(s) for the extracted keywords).
- the front end server 110 displays 908 the chosen frames as thumbnails for the video in the search results.
- This embodiment beneficially ensures that the video thumbnail is actually related to the user's search query. For example, suppose the user enters the query “dog on a skateboard.” A video entitled “Animals Doing Tricks” includes a relevant scene featuring a dog on a skateboard, but also includes several other scenes without dogs or skateboards. The method of FIG. 9 beneficially ensures that the presented thumbnail is representative of the scene that the user searched for (i.e., the dog on the skateboard). Thus, the user can easily assess the relevance of the search results to the keyword query.
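- FIG. 9 differs only in where the keywords come from. As a usage sketch of the select_thumbnail_frames helper defined above, the same selection is driven by the user's query terms and applied to each result; the helper and index layout remain assumptions.

```python
def thumbnails_for_results(result_video_ids, query, index):
    """FIG. 9: keywords come from the user's query and drive selection for every result."""
    keywords = [word.lower() for word in query.split()]
    return {vid: select_thumbnail_frames(vid, keywords, index)
            for vid in result_video_ids}

# e.g., thumbnails_for_results(["vid_1", "vid_2"], "dog on a skateboard", video_annotation_index)
```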
- FIG. 10 illustrates an example embodiment of a process for finding scenes or events relevant to a keyword query.
- the search engine 120 receives 1002 a search query from a user and identifies 1004 keywords from the search string.
- the search engine 120 accesses the video annotation index 185 (e.g., by performing a lookup function) to retrieve 1006 a number of frames (e.g., the top 10) having the highest keyword scores for the extracted keywords.
- the search engine determines 1008 boundaries for the relevant scenes within the video. For example, the search engine 120 may use scene segmentation techniques to find the boundaries of the scene including the highly relevant frame. Alternatively, the search engine 120 may analyze the keyword scores of surrounding frames to determine the boundaries. For example, the search engine 120 may return a video clip in which all sampled frames have keyword scores above a threshold.
- the search engine 120 selects 1010 a thumbnail image for each video in the result set based on the keyword scores.
- the front end server 110 displays 1012 a ranked set of videos represented by the selected thumbnails.
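- One way to realize the boundary analysis of step 1008 is to expand a clip outward from a highly relevant frame while the sampled frames' keyword scores stay above a threshold, as sketched below; the threshold value and the expansion rule are assumptions.

```python
def scene_bounds(frame_scores, peak_pos, keyword, threshold=0.5):
    """frame_scores: list of (frame_idx, score_dict) ordered by time; peak_pos: list position
    of a highly relevant frame. Expand while sampled frames stay relevant to the keyword."""
    start = end = peak_pos
    while start > 0 and frame_scores[start - 1][1].get(keyword, 0.0) >= threshold:
        start -= 1
    while end + 1 < len(frame_scores) and \
            frame_scores[end + 1][1].get(keyword, 0.0) >= threshold:
        end += 1
    return frame_scores[start][0], frame_scores[end][0]   # first and last frame index of the clip
```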
- Another feature of the video hosting system 100 is the ability to select a set of “related videos” that may be displayed before, during, or after playback of a user-selected video based on the video annotation index 185 .
- the video hosting system 100 extracts keywords from the title or other metadata associated with the playback of the selected video.
- the video hosting system 100 uses the extracted keywords to query the video annotation index 185 for videos relevant to the keywords; this identifies other videos that are likely to be similar to the user selected video in terms of their actual image/audio content, rather than just having the same keywords in their metadata.
- the video hosting system 100 then chooses thumbnails for the related videos as described above, and presents the thumbnails in a “related videos” portion of the user interface display. This embodiment beneficially provides a user with other videos that may be of interest based on the content of the playback video.
- Another feature of the video hosting system 100 is the ability to find and present advertisements that may be displayed before, during, or after playback of a selected video, based on the use of the video annotation index 185 .
- the video hosting system 100 retrieves keywords associated with frames of video in real-time as the user views the video (i.e., by performing a lookup in the annotation index 185 using the current frame index).
- the video hosting system 100 may then query an advertisement database using the retrieved keywords for advertisements relevant to the keywords.
- the video hosting system 100 may then display advertisements related to the current frames in real-time as the video plays back.
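- A minimal sketch of that playback-time lookup: take the top-scoring keywords for the sampled frame nearest the current playback position and pass them to an (assumed) advertisement query function.

```python
def keywords_for_playback(video_id, current_frame_idx, index, top_k=3):
    """Top keywords for the sampled frame nearest the current playback position."""
    frames = index.get(video_id, {})
    if not frames:
        return []
    nearest = min(frames, key=lambda idx: abs(idx - current_frame_idx))
    scores = frames[nearest]
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# ads = advertisement_db.query(keywords_for_playback("vid_1", 4200, video_annotation_index))  # hypothetical
```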
- the above described embodiments beneficially allow a media host to provide video content items and representative thumbnail images that are most relevant to a user's search query.
- the video hosting system provides improved search results over systems that rely solely on textual metadata.
- the present invention has been described in particular detail with respect to a limited number of embodiments. Those of skill in the art will appreciate that the invention may additionally be practiced in other embodiments.
- the particular naming of the components, capitalization of terms, the attributes, data structures, or any other programming or structural aspect is not mandatory or significant, and the mechanisms that implement the invention or its features may have different names, formats, or protocols.
- the system may be implemented via a combination of hardware and software, as described, or entirely in hardware elements.
- the particular division of functionality between the various system components described herein is merely exemplary, and not mandatory; functions performed by a single system component may instead be performed by multiple components, and functions performed by multiple components may instead be performed by a single component.
- the particular functions of the media host service may be provided in many or one module.
- Certain aspects of the present invention include process steps and instructions described herein in the form of an algorithm. All such process steps, instructions or algorithms are executed by computing devices that include some form of processing unit (e.g., a microprocessor, microcontroller, dedicated logic circuit or the like) as well as a memory (RAM, ROM, or the like), and input/output devices as appropriate for receiving or providing data.
- the present invention also relates to an apparatus for performing the operations herein.
- This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer, in which event the general-purpose computer is structurally and functionally equivalent to a specific computer dedicated to performing the functions and operations described herein.
- a computer program that embodies computer executable data (e.g., program code and data) is stored in a tangible computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for persistently storing electronically coded instructions.
- Such computer programs by nature of their existence as data stored in a physical medium by alterations of such medium, such as alterations or variations in the physical structure and/or properties (e.g., electrical, optical, mechanical, magnetic, chemical properties) of the medium, are not abstract ideas or concepts or representations per se, but instead are physical artifacts produced by physical processes that transform a physical medium from one state to another state (e.g., a change in the electrical charge, or a change in magnetic polarity) in order to persistently store the computer program in the medium.
- the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Library & Information Science (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
A system, computer readable storage medium, and computer-implemented method presents video search results responsive to a user keyword query. The video hosting system uses a machine learning process to learn a feature-keyword model associating features of media content from a labeled training dataset with keywords descriptive of their content. The system uses the learned model to provide video search results relevant to a keyword query based on features found in the videos. Furthermore, the system determines and presents one or more thumbnail images representative of the video using the learned model.
Description
- 1. Field of the Art
- The invention relates generally to identifying videos or their parts that are relevant to search terms. In particular, embodiments of the invention are directed to selecting one or more representative thumbnail images based on the audio-visual content of a video.
- 2. Background
- Users of media hosting websites typically browse or search the hosted media content by inputting keywords or search terms to query textual metadata describing the media content. Searchable metadata may include, for example, titles of the media files or descriptive summaries of the media content. Such textual metadata often is not representative of the entire content of the video, particularly when a video is very long and has a variety of scenes. In other words, if a video has a large number of scenes and variety of content, it is likely that some of those scenes are not described in the textual metadata, and as a result, that video would not be returned in response to searching on keywords that would likely describe such scenes. Thus, conventional search engines often fail to return the media content most relevant to the user's search.
- A second problem with conventional media hosting websites is that due to the large amount of hosted media content, a search query may return hundreds or even thousands of media files responsive to the user query. Consequently, the user may have difficulties assessing which of the hundreds or thousands of search results are most relevant. In order to assist the user in assessing which search results are most relevant, the website may present each search result together with a thumbnail image. Conventionally, the thumbnail image used to represent a video is a predetermined frame from the video file (e.g., the first frame, center frame, or last frame). However, a thumbnail selected in this manner is often not representative of the actual content of the video, since there is no relationship between the ordinal position of the thumbnail and the content of a video. Furthermore, the thumbnail may not be relevant to the user's search query. Thus, the user may have difficulty assessing which of the hundreds or thousands of search results are most relevant.
- Accordingly, improved methods of finding and presenting media search results that will allow a user to easily assess their relevance are needed.
- A system, computer readable storage medium, and computer-implemented method finds and presents video search results responsive to a user keyword query. A video hosting system receives a keyword search query from a user and selects a video having content relevant to the keyword query. The video hosting system selects a frame from the video as representative of the video's content using a video index that stores keyword association scores between frames of a plurality of videos and keywords associated with the frames. The video hosting system presents the selected frame as a thumbnail for the video.
- In one aspect, a computer system generates the searchable video index using a machine-learned model of the relationships between features of video frames, and keywords descriptive of video content. The video hosting system receives a labeled training dataset that includes a set of media items (e.g., images or audio clips) together with one or more keywords descriptive of the content of the media items. The video hosting system extracts features characterizing the content of the media items. A machine-learned model is trained to learn correlations between particular features and the keywords descriptive of the content. The video index is then generated that maps frames of videos in a video database to keywords based on features of the videos and the machine-learned model.
- Advantageously, the video hosting system finds and presents search results based on the actual content of the videos instead of relying solely on textual metadata. Thus, the video hosting system enables the user to better assess the relevance of videos in the set of search results.
- The features and advantages described in this summary and the following detailed description are not all-inclusive. Many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims hereof.
- FIG. 1 is a high-level block diagram of a video hosting system 100 according to one embodiment.
- FIG. 2 is a high-level block diagram illustrating a learning engine 140 according to one embodiment.
- FIG. 3 is a flowchart illustrating steps performed by the learning engine 140 to generate a learned feature-keyword model according to one embodiment.
- FIG. 4 is a flowchart illustrating steps performed by the learning engine 140 to generate a feature dataset 255 according to one embodiment.
- FIG. 5 is a flowchart illustrating steps performed by the learning engine 140 to generate a feature-keyword matrix according to one embodiment.
- FIG. 6 is a block diagram illustrating a detailed view of an image annotation engine 160 according to one embodiment.
- FIG. 7 is a flowchart illustrating steps performed by the video hosting system 100 to find and present video search results according to one embodiment.
- FIG. 8 is a flowchart illustrating steps performed by the video hosting system 100 to select a thumbnail for a video based on video metadata according to one embodiment.
- FIG. 9 is a flowchart illustrating steps performed by the video hosting system 100 to select a thumbnail for a video based on keywords in a user search query according to one embodiment.
- FIG. 10 is a flowchart illustrating steps performed by the image annotation engine 160 to identify specific events or scenes within videos based on a user keyword query according to one embodiment.
- The figures depict preferred embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
- FIG. 1 illustrates an embodiment of a video hosting system 100. The video hosting system 100 finds and presents a set of video search results responsive to a user keyword query. Rather than relying solely on textual metadata associated with the videos, the video hosting system 100 presents search results based on the actual audio-visual content of the videos. Each search result is presented together with a thumbnail representative of the audio-visual content of the video that assists the user in assessing the relevance of the results.
- In one embodiment, the video hosting system 100 comprises a front end server 110, a video search engine 120, a video annotation engine 130, a learning engine 140, a video database 175, a video annotation index 185, and a feature-keyword model 195. The video hosting system 100 represents any system that allows users of client devices 150 to access video content via searching and/or browsing interfaces. The sources of videos can be from uploads of videos by users, searches or crawls by the system of other websites or databases of videos, or the like, or any combination thereof. For example, in one embodiment, a video hosting system 100 can be configured to allow upload of content by users. In another embodiment, a video hosting system 100 can be configured to only obtain videos from other sources by crawling such sources or searching such sources, either offline to build a database of videos, or at query time.
- Each of the various components (alternatively, modules), e.g., the front end server 110, video search engine 120, video annotation engine 130, learning engine 140, video database 175, video annotation index 185, and feature-keyword model 195, is implemented as part of a server-class computer system with one or more computers comprising a CPU, memory, network interface, peripheral interfaces, and other well known components. The computers themselves preferably run an operating system (e.g., LINUX), have generally high-performance CPUs, 1 GB or more of memory, and 100 GB or more of disk storage. Of course, other types of computers can be used, and it is expected that as more powerful computers are developed in the future, they can be configured in accordance with the teachings here. In this embodiment, the modules are stored on a computer readable storage device (e.g., a hard disk), loaded into the memory, and executed by one or more processors included as part of the system 100. Alternatively, hardware or software modules may be stored elsewhere within the system 100. When configured to execute the various operations described herein, a general purpose computer becomes a particular computer, as understood by those of skill in the art, as the particular functions and data being stored by such a computer configure it in a manner different from its native capabilities as may be provided by its underlying operating system and hardware logic. A suitable video hosting system 100 for implementation of the system is the YOUTUBE™ website; other video hosting systems are known as well, and can be adapted to operate according to the teachings disclosed herein. It will be understood that the named components of the video hosting system 100 described herein represent one embodiment of the present invention, and other embodiments may include other components. In addition, other embodiments may lack components described herein and/or distribute the described functionality among the modules in a different manner. Additionally, the functionalities attributed to more than one component can be incorporated into a single component.
- FIG. 1 also illustrates three client devices 150 communicatively coupled to the video hosting system 100 over a network 160. The client devices 150 can be any type of communication device that is capable of supporting a communications interface to the system 100. Suitable devices may include, but are not limited to, personal computers, mobile computers (e.g., notebook computers), personal digital assistants (PDAs), smartphones, mobile phones, gaming consoles and devices, and network-enabled viewing devices (e.g., set-top boxes, televisions, and receivers). Only three clients 150 are shown in FIG. 1 in order to simplify and clarify the description. In practice, thousands or millions of clients 150 can connect to the video hosting system 100 via the network 160.
- The network 160 may be a wired or wireless network. Examples of the network 160 include the Internet, an intranet, a WiFi network, a WiMAX network, a mobile telephone network, or a combination thereof. Those of skill in the art will recognize that other embodiments can have different modules than the ones described here, and that the functionalities can be distributed among the modules in a different manner. The method of communication between the client devices and the system 100 is not limited to any particular user interface or network protocol, but in a typical embodiment a user interacts with the video hosting system 100 via a conventional web browser of the client device 150, which employs standard Internet protocols.
- The clients 150 interact with the video hosting system 100 via the front end server 110 to search for video content stored in the video database 175. The front end server 110 provides controls and elements that allow a user to input search queries (e.g., keywords). Responsive to a query, the front end server 110 provides a set of search results relevant to the query. In one embodiment, the search results include a list of links to the relevant video content in the video database 175. The front end server 110 may present the links together with information associated with the video content such as, for example, thumbnail images, titles, and/or textual summaries. The front end server 110 additionally provides controls and elements that allow the user to select a video from the search results for viewing on the client 150.
- The video search engine 120 processes user queries received via the front end server 110, and generates a result set comprising links to videos or portions of videos in the video database 175 that are relevant to the query, and is one means for performing this function. The video search engine 120 may additionally perform search functions such as ranking search results and/or scoring search results according to their relevance. In one embodiment, the video search engine 120 finds relevant videos based on the textual metadata associated with the videos using various textual querying techniques. In another embodiment, the video search engine 120 searches for videos or portions of videos based on their actual audio-visual content rather than relying on textual metadata. For example, if the user enters the search query "car race," the video search engine 120 can find and return a car racing scene from a movie, even though the scene may only be a short portion of the movie that is not described in the textual metadata. A process for using the video search engine to locate particular scenes of video based on their audio-visual content is described in more detail below with reference to FIG. 10.
- In one embodiment, the video search engine 120 also selects a thumbnail image or a set of thumbnail images to display with each retrieved search result. Each thumbnail image comprises an image frame representative of the video's audio-visual content and responsive to the user's query, and assists the user in determining the relevance of the search result. Methods for selecting the one or more representative thumbnail images are described in more detail below with reference to FIGS. 8-9.
- The video annotation engine 130 annotates frames or scenes of video from the video database 175 with keywords relevant to the audio-visual content of the frames or scenes and stores these annotations to the video annotation index 185, and is one means for performing this function. In one embodiment, the video annotation engine 130 generates feature vectors from sampled portions of video (e.g., frames of video or short audio clips) from the video database 175. The video annotation engine 130 then applies a learned feature-keyword model 195 to the extracted feature vectors to generate a set of keyword scores. Each keyword score represents the relative strength of a learned association between a keyword and one or more features. Thus, the score can be understood to describe a relative likelihood that the keyword is descriptive of the frame's content. In one embodiment, the video annotation engine 130 also ranks the frames of each video according to their keyword scores, which facilitates scoring and ranking the videos at query time. The video annotation engine 130 stores the keyword scores for each frame to the video annotation index 185. The video search engine 120 may use these keyword scores to determine videos or portions of videos most relevant to a user query and to determine thumbnail images representative of the video content. The video annotation engine 130 is described in more detail below with reference to FIG. 6.
- The learning engine 140 uses machine learning to train the feature-keyword model 195 that associates features of images or short audio clips with keywords descriptive of their visual or audio content, and is one means for performing this function. The learning engine 140 processes a set of labeled training images, video, and/or audio clips ("media items") that are labeled with one or more keywords representative of the media item's audio and/or visual content. For example, an image of a dolphin swimming in the ocean may be labeled with keywords such as "dolphin," "swimming," "ocean," and so on. The learning engine 140 extracts a set of features from the labeled training data (images, video, or audio) and analyzes the extracted features to determine statistical associations between particular features and the labeled keywords. For example, in one embodiment, the learning engine 140 generates a matrix of weights, frequency values, or discriminative functions indicating the relative strength of the associations between the keywords that have been used to label a media item and the features that are derived from the content of the media item. The learning engine 140 stores the derived relationships between keywords and features to the feature-keyword model 195. The learning engine 140 is described in more detail below with reference to FIG. 2.
- FIG. 2 is a block diagram illustrating a detailed view of the learning engine 140 according to one embodiment. In the illustrated embodiment, the learning engine comprises a click-through module 210, a feature extraction module 220, a keyword learning module 240, an association learning module 230, a labeled training dataset 245, a feature dataset 255, and a keyword dataset 265. Those of skill in the art will recognize that other embodiments can have different modules than the ones described here, and that the functionalities can be distributed among the modules in a different manner. In addition, the functions ascribed to the various modules can be performed by multiple engines.
- The click-through module 210 provides an automated mechanism for acquiring a labeled training dataset 245, and is one means for performing this function. The click-through module 210 tracks user search queries on the video hosting system 100 or on one or more external media search websites. When a user performs a search query and selects a media item from the search results, the click-through module 210 stores a positive association between keywords in the user query and the user-selected media item. The click-through module 210 may also store negative associations between the keywords and unselected search results. For example, a user searches for "dolphin" and receives a set of image results. The image that the user selects from the list is likely to actually contain an image of a dolphin and therefore provides a good label for the image. Based on the learned positive and/or negative associations, the click-through module 210 determines one or more keywords to attach to each image. For example, in one embodiment, the click-through module 210 stores a keyword for a media item after a threshold number of positive associations between the image and the keyword are observed (e.g., after 5 users searching for "dolphin" select the same image from the result set). Thus, the click-through module 210 can statistically identify relationships between keywords and images, based on monitoring user searches and the resulting user actions in selecting search results. This approach takes advantage of the individual user's knowledge of what counts as a relevant image for a given keyword in the ordinary course of their search behavior. In some embodiments, the keyword identification module 240 may use natural language techniques such as stemming and filtering to pre-process search query data in order to identify and extract keywords. The click-through module 210 stores the labeled media items and their associated keywords to the labeled training dataset 245.
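By way of illustration only, the following is a minimal sketch of such click-through labeling, assuming a log of (query, selected item) pairs; the helper names, the stopword list, and the reuse of the five-selection threshold from the example above are assumptions for the sketch rather than a definitive implementation.

```python
from collections import defaultdict

# Hypothetical threshold taken from the example above (5 users selecting the
# same item for the same query term); not a required value.
POSITIVE_THRESHOLD = 5

def build_labeled_dataset(click_log, stopwords=frozenset({"the", "a", "of", "and"})):
    """Derive item -> keyword labels from observed (query, clicked_item_id) pairs."""
    counts = defaultdict(int)
    for query, item_id in click_log:
        for keyword in query.lower().split():
            if keyword not in stopwords:
                counts[(item_id, keyword)] += 1
    labels = defaultdict(set)
    for (item_id, keyword), n in counts.items():
        if n >= POSITIVE_THRESHOLD:
            labels[item_id].add(keyword)   # treat as a positive label for the item
    return labels
```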
- In an alternative embodiment, the labeled training dataset 245 may instead store training data from external sources 291 such as, for example, a database of labeled stock images or audio clips. In one embodiment, keywords are extracted from metadata associated with images or audio clips such as file names, titles, or textual summaries. The labeled training dataset 245 may also store data acquired from a combination of the sources discussed above (e.g., using data derived from both the click-through module 210 and from one or more external databases 291).
- The feature extraction module 220 extracts a set of features from the labeled training data 245, and is one means for performing this function. The features characterize different aspects of the media in such a way that images of similar objects will have similar features and audio clips of similar sounds will have similar features. To extract features from images, the feature extraction module 220 may apply texture algorithms, edge detection algorithms, or color identification algorithms to extract image features. For audio clips, the feature extraction module 220 may apply various transforms to the sound wave, such as generating a spectrogram or applying a set of band-pass filters or autocorrelations, and then apply vector quantization algorithms to extract audio features.
- In one embodiment, the feature extraction module 220 segments training images into "patches" and extracts features for each patch. The patches can range in height and width (e.g., 64×64 pixels). The patches may be overlapping or non-overlapping. The feature extraction module 220 applies an unsupervised learning algorithm to the feature data to identify a subset of the features that most effectively characterize a majority of the image patches. For example, the feature extraction module 220 may apply a clustering algorithm (e.g., K-means clustering) to identify clusters or groups of features that are similar to each other or co-occur in images. Thus, for example, the feature extraction module 220 can identify the 10,000 most representative feature patterns and associated patches.
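A minimal sketch of this patch-and-cluster step is shown below, assuming images are numpy arrays and that K-means (here scikit-learn's MiniBatchKMeans) stands in for the unsupervised learning algorithm; descriptor_matrix is assumed to hold one row per patch, built from the color and texture descriptors discussed with FIG. 4.

```python
from sklearn.cluster import MiniBatchKMeans

def extract_patches(image, size=64, stride=64):
    """Cut an image array into size x size patches (non-overlapping with this stride)."""
    h, w = image.shape[:2]
    return [image[y:y + size, x:x + size]
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]

def build_codebook(descriptor_matrix, n_codewords=10000):
    """Cluster patch descriptors; the centroids act as the representative features."""
    kmeans = MiniBatchKMeans(n_clusters=n_codewords, batch_size=1024)
    kmeans.fit(descriptor_matrix)          # one row per patch descriptor
    return kmeans.cluster_centers_         # n_codewords representative feature vectors
```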
- Similarly, the feature extraction module 220 segments training audio clips into short "sounds" and extracts features for the sounds. As with the training images, the feature extraction module 220 applies unsupervised learning to identify a subset of audio features most effectively characterizing the training audio clips.
- The keyword identification module 240 identifies a set of frequently occurring keywords based on the labeled training dataset 245, and is one means for performing this function. For example, in one embodiment, the keyword identification module 240 determines the N most common keywords in the labeled training dataset (e.g., N=20,000). The keyword identification module 240 stores the set of frequently occurring keywords in the keyword dataset 265.
- The association learning module 230 determines statistical associations between the features in the feature dataset 255 and the keywords in the keyword dataset 265, and is one means for performing this function. For example, in one embodiment, the association learning module 230 represents the associations in the form of a feature-keyword matrix. The feature-keyword matrix comprises a matrix with m rows and n columns, where each of the m rows corresponds to a different feature vector from the feature dataset 255 and each of the n columns corresponds to a different keyword from the keyword dataset 265 (e.g., m=10,000 and n=20,000). In one embodiment, each entry of the feature-keyword matrix comprises a weight or score indicating the relative strength of the correlation between a feature and a keyword in the training dataset. For example, an entry in the matrix may indicate the relative likelihood that an image labeled with the keyword "dolphin" will exhibit a particular feature vector Y. The association learning module 230 stores the learned feature-keyword matrix to the learned feature-keyword model 195. In other alternative embodiments, different association functions and representations may be used, such as, for example, a nonlinear function that relates keywords to the visual and/or audio features.
- FIG. 3 is a flowchart illustrating an embodiment of a method for generating the feature-keyword model 195. First, the learning engine 140 receives 302 a set of labeled training data 245, for example, from an external source 291 or from the click-through module 210 as described above. The keyword learning module 240 determines 304 the most frequently appearing keywords in the labeled training data 245 (e.g., the top 20,000 keywords). The feature extraction module 220 then generates 306 features for the training data 245 and stores the representative features to the feature dataset 255. The association learning module 230 generates 308 a feature-keyword matrix mapping the keywords to features and stores the mappings to the feature-keyword model 195.
- FIG. 4 illustrates an example embodiment of a process for generating 306 the features from the labeled training images 245. In the example embodiment, the feature extraction module 220 generates 402 color features by determining color histograms that represent the color data associated with the image patches. A color histogram for a given patch stores the number of pixels of each color within the patch.
- The feature extraction module 220 also generates 404 texture features. In one embodiment, the feature extraction module 220 uses local binary patterns (LBPs) to represent the edge and texture data within each patch. The LBP for a pixel represents the relative pixel intensity values of neighboring pixels. For example, the LBP for a given pixel may be an 8-bit code (corresponding to the 8 neighboring pixels in a circle of radius of 1 pixel), with a 1 indicating that the neighboring pixel has a higher intensity value and a 0 indicating that the neighboring pixel has a lower intensity value. The feature extraction module then determines a histogram for each patch that stores a count of LBP values within the patch.
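The 8-neighbor pattern described above can be sketched as follows, assuming a grayscale patch stored as a 2-D numpy array; skipping border pixels is one of several reasonable conventions and is an assumption of this sketch.

```python
import numpy as np

def lbp_histogram(patch):
    """Return a 256-bin histogram of 8-bit local binary pattern codes for one patch."""
    h, w = patch.shape
    codes = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = patch[y, x]
            neighbors = (patch[y - 1, x - 1], patch[y - 1, x], patch[y - 1, x + 1],
                         patch[y, x + 1], patch[y + 1, x + 1], patch[y + 1, x],
                         patch[y + 1, x - 1], patch[y, x - 1])
            code = 0
            for bit, value in enumerate(neighbors):
                if value > center:        # 1 if the neighbor is brighter than the center
                    code |= 1 << bit
            codes.append(code)
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist
```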
- The feature extraction module 220 applies 406 clustering to the color features and texture features. For example, in one embodiment, the feature extraction module 220 applies K-means clustering to the color histograms to identify a plurality of clusters (e.g., 20) that best represent the patches. For each cluster, a centroid (feature vector) of the cluster is determined, which is representative of the dominant color of the cluster, thus creating a set of dominant color features for all the patches. The feature extraction module 220 separately clusters the LBP histograms to identify a subset of texture histograms (i.e., texture features) that best characterizes the texture of the patches, and thus identifies the set of dominant texture features for the patches as well. The feature extraction module 220 then generates 408 a feature vector for each patch. In one embodiment, the texture and color histograms for a patch are concatenated to form the single feature vector for the patch. The feature extraction module 220 applies an unsupervised learning algorithm (e.g., clustering) to the set of feature vectors for the patches to generate 410 a subset of feature vectors representing a majority of the patches (e.g., the 10,000 most representative feature vectors). The feature extraction module 220 stores the subset of feature vectors to the feature dataset 255.
- For audio training data, the feature extraction module 220 may generate audio feature vectors by computing Mel-frequency cepstral coefficients (MFCCs). These coefficients represent the short-term power spectrum of a sound based on a linear cosine transform of a log power spectrum on a nonlinear frequency scale. Audio feature vectors are then stored to the feature dataset 255 and can be processed similarly to the image feature vectors. In another embodiment, the feature extraction module 220 generates audio feature vectors by using stabilized auditory images (SAI). In yet another embodiment, one or more band-pass filters are applied to the audio data and features are derived based on correlations within and among the channels. In yet another embodiment, spectrograms are used as audio features.
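As one possible illustration, MFCC features of the kind mentioned above could be computed with the librosa library roughly as follows; the sample rate and number of coefficients are illustrative choices, not values taken from the patent.

```python
import librosa

def audio_feature_vectors(path, n_mfcc=13):
    """Load an audio clip and return one MFCC vector per analysis frame."""
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T   # shape: (frames, n_mfcc)
```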
- FIG. 5 illustrates an example process for iteratively learning a feature-keyword matrix from the feature dataset 255 and the keyword dataset 265. In one embodiment, the association learning module 230 initializes 502 the feature-keyword matrix by populating the entries with initial weights. For example, in one embodiment, the initial weights are all set to zero. For a given keyword K from the keyword dataset 265, the association learning module 230 randomly selects 504 a positive training item p+ (i.e., a training item labeled with the keyword K) and randomly selects a negative training item p− (i.e., a training item not labeled with the keyword K). The feature extraction module 220 determines 506 feature vectors for both the positive training item and the negative training item as described above. The association learning module 230 generates 508 keyword scores for each of the positive and negative training items by using the feature-keyword matrix to transform the feature vectors from the feature space to the keyword space (e.g., by multiplying the feature vector and the feature-keyword matrix to yield a keyword vector). The association learning module 230 then determines 510 the difference between the keyword scores. If the difference is greater than a predefined threshold value (i.e., the positive and negative training items are correctly ordered), then the matrix is not changed 512. Otherwise, the matrix entries are set 514 such that the difference is greater than the threshold. The association learning module 230 then determines 516 whether or not a stopping criterion is met. If the stopping criterion is not met, the matrix learning performs another iteration 520 with new positive and negative training items to further refine the matrix. If the stopping criterion is met, then the learning process stops 518.
- In one embodiment, the stopping criterion is met when, over a sliding window of previously selected positive and negative training pairs, the fraction of pairs that are correctly ordered exceeds a predefined threshold. Alternatively, the performance of the learned matrix can be measured by applying the learned matrix to a separate set of validation data, and the stopping criterion is met when the performance exceeds a predefined threshold.
- In an alternative embodiment, in order for the scores to be compatible between keywords, keyword scores are computed and compared for different keywords rather than the same keyword K in each iteration of the learning process. Thus, in this embodiment, the positive training item p+ is selected as a training item labeled with a first keyword K1 and the negative training item p− is selected as a training item that is not labeled with a second keyword K2. In this embodiment, the association learning module 230 generates keyword scores for each training item/keyword pair (i.e., a positive pair and a negative pair). The association learning module 230 then compares the keyword scores in the same manner as described above even though the keyword scores are related to different keywords.
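The per-keyword update of FIG. 5 can be sketched as below, assuming m-dimensional feature vectors and callables that sample positive and negative items for the keyword; the specific perceptron-style step on violated pairs and the sliding-window accuracy test are assumptions, since the text only states that the entries are adjusted until the margin and stopping criterion are satisfied.

```python
import numpy as np

def learn_keyword_column(sample_pos, sample_neg, m, margin=1.0, step=0.1,
                         max_iters=100000, window=500, target_accuracy=0.95):
    """Learn one column of the feature-keyword matrix for a single keyword."""
    w = np.zeros(m)                       # initial weights (502)
    recent = []
    for _ in range(max_iters):
        f_pos = sample_pos()              # feature vector of a random positive item (504, 506)
        f_neg = sample_neg()              # feature vector of a random negative item
        ordered = (w @ f_pos) - (w @ f_neg) > margin   # compare keyword scores (508, 510)
        recent = (recent + [ordered])[-window:]
        if not ordered:
            w += step * (f_pos - f_neg)   # adjust entries so the pair is separated (514)
        if len(recent) == window and np.mean(recent) >= target_accuracy:
            break                         # sliding-window stopping criterion (516, 518)
    return w
```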
- In alternative embodiments, the association learning module 230 learns a different type of feature-keyword model 195 such as, for example, a generative model or a discriminative model. For example, in one alternative embodiment, the association learning module 230 derives discriminative functions (i.e., classifiers) that can be applied to a set of features to obtain one or more keywords associated with those features. In this embodiment, the association learning module 230 applies clustering algorithms to specific types of features or all features that are associated with an image patch or audio segment. The association learning module 230 generates a classifier for each keyword in the keyword dataset 265. The classifier comprises a discriminative function (e.g., a hyperplane) and a set of weights or other values, where the weights or values specify the discriminative ability of the feature in distinguishing a class of media items from another class of media items. The association learning module 230 stores the learned classifiers to the learned feature-keyword model 195.
- In some embodiments, the feature extraction module 220 and the association learning module 230 iteratively generate sets of features for new training data 245 and re-train a classifier until the classifier converges. The classifier converges when the discriminative function and the weights associated with the sets of features are substantially unchanged by the addition of new training sets of features. In a specific embodiment, an on-line support vector machine algorithm is used to iteratively re-calculate a hyperplane function based on feature values associated with new training data 245 until the hyperplane function converges. In other embodiments, the association learning module 230 re-trains the classifier on a periodic basis. In some embodiments, the association learning module 230 re-trains the classifier on a continuous basis, for example, whenever new search query data is added to the labeled training dataset 245 (e.g., from new click-through data).
- In any of the foregoing embodiments, the resulting feature-keyword matrix represents a model of the relationship between keywords (as have been applied to images/audio files) and feature vectors derived from the image/audio files. The model may be understood to express the underlying physical relationship in terms of the co-occurrences of keywords and the physical characteristics representing the images/audio files (e.g., color, texture, frequency information).
- FIG. 6 illustrates a detailed view of the video annotation engine 130. In one embodiment, the video annotation engine 130 includes a video sampling module 610, a feature extraction module 620, and a thumbnail selection module 630. Those of skill in the art will recognize that other embodiments can have different modules than the ones described here, and that the functionalities can be distributed among the modules in a different manner. In addition, the functions ascribed to the various modules can be performed by multiple engines.
- The video sampling module 610 samples frames of video content from videos in the video database 175. In one embodiment, the video sampling module 610 samples video content from individual videos in the video database 175. The sampling module 610 can sample a video at a fixed periodic rate (e.g., 1 frame every 10 seconds), a rate dependent on intrinsic factors (e.g., length of the video), or a rate based on extrinsic factors such as the popularity of the video (e.g., more popular videos, based on number of views, would be sampled at a higher frequency than less popular videos). Alternatively, the video sampling module 610 uses scene segmentation to sample frames based on the scene boundaries. For example, the video sampling module 610 may sample at least one frame from each scene to ensure that the sampled frames are representative of the whole content of the video. In another alternative embodiment, the video sampling module 610 samples entire scenes of videos rather than individual frames.
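A small sketch of a popularity-dependent sampling rate is shown below; the 10-second baseline comes from the example above, while the view-count tiers are purely illustrative assumptions.

```python
def sampling_interval_seconds(view_count, base_interval=10.0):
    """Choose how often to sample frames from a video, in seconds between samples."""
    if view_count > 1_000_000:
        return base_interval / 4    # very popular videos are sampled more densely
    if view_count > 10_000:
        return base_interval / 2
    return base_interval
```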
- The feature extraction module 620 uses the same methodology as the feature extraction module 220 described above with respect to the learning engine 140. The feature extraction module 620 generates a feature vector for each sampled frame or scene. For example, as described above, each feature vector may comprise 10,000 entries, each representative of a particular feature obtained through vector quantization.
- The frame annotation module 630 generates keyword association scores for each sampled frame of a video. The frame annotation module 630 applies the learned feature-keyword model 195 to the feature vector for a sampled frame to determine the keyword association scores for the frame. For example, the frame annotation module 630 may perform a matrix multiplication using the feature-keyword matrix to transform the feature vector to the keyword space. The frame annotation module 630 thus generates a vector of keyword association scores for each frame (a "keyword score vector"), where each keyword association score in the keyword score vector specifies the likelihood that the frame is relevant to a keyword of the set of frequently-used keywords in the keyword dataset 265. The frame annotation module 630 stores the keyword score vector for the frame in association with indicia of the frame (e.g., the offset of the frame in the video the frame is part of) and indicia of the video in the video annotation index 185. Thus, each sampled frame is associated with a keyword score vector that describes the relationship between each of the keywords and the frame, based on the feature vectors derived from the frame. Further, each video in the database is thus associated with one or more sampled frames (which can be used for thumbnails), and these sampled frames are associated with keywords, as described.
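A minimal sketch of this annotation step follows, assuming W is the learned m-by-n feature-keyword matrix, frame_feature() produces the m-dimensional feature vector for a frame, and the index is keyed by (video id, frame offset); these names are assumptions for the sketch.

```python
import numpy as np

def annotate_video(video_id, sampled_frames, W, frame_feature, index):
    """Store one keyword score vector per sampled (offset, frame) pair in the index."""
    for offset, frame in sampled_frames:
        f = frame_feature(frame)             # m-dimensional feature vector for the frame
        keyword_scores = f @ W               # n keyword association scores
        index[(video_id, offset)] = keyword_scores
    return index
```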
- In alternative embodiments, the video annotation engine 130 generates keyword scores for a group of frames (e.g., a scene) rather than for each individual sampled frame. For example, keyword scores may be stored for a particular scene of video. For audio features, keyword scores may be stored in association with a group of frames spanning a particular audio clip, such as, for example, speech from a particular individual.
- When a user inputs a search query of one or more words, the search engine 120 accesses the video annotation index 185 to find and present a result set of relevant videos (e.g., by performing a lookup in the index 185). In one embodiment, the search engine 120 uses keyword scores in the video annotation index 185 for the input query words that match the selected keywords to find videos relevant to the search query and rank the relevant videos in the result set. The video search engine 120 may also provide a relevance score for each search result indicating the perceived relevance to the search query. In addition to or instead of the keyword scores in the video annotation index 185, the search engine 120 may also access a conventional index that includes textual metadata associated with the videos in order to find, rank, and score search results.
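One way such an index lookup could rank videos is sketched below, assuming the (video id, offset) to score-vector index from the previous sketch and a keyword-to-column mapping; scoring a video by its best-matching sampled frame is an assumption about the ranking, not required behavior.

```python
import numpy as np

def search(query_terms, index, keyword_to_column, top_k=10):
    """Rank videos by the keyword scores of their best-matching sampled frame."""
    columns = [keyword_to_column[t] for t in query_terms if t in keyword_to_column]
    best = {}
    for (video_id, offset), scores in index.items():
        score = float(np.sum(scores[columns])) if columns else 0.0
        if video_id not in best or score > best[video_id][0]:
            best[video_id] = (score, offset)     # remember the best frame per video
    ranked = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)
    return ranked[:top_k]   # [(video_id, (relevance_score, best_frame_offset)), ...]
```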
- FIG. 7 is a flowchart illustrating a general process performed by the video hosting system 100 for finding and presenting video search results. The front end server 110 receives 702 a search query comprising one or more query terms from a user. The search engine 120 determines 704 a result set satisfying the keyword search query; this result set can be selected using any type of search algorithm and index structure. The result set includes a link to one or more videos having content relevant to the query terms.
- The search engine 120 then selects 706 a frame (or several frames) from each of the videos in the result set that is representative of the video's content based on the keyword scores. For each search result, the front end server 110 presents 708 the selected frames as a set of one or more representative thumbnails together with the link to the video.
- FIGS. 8 and 9 illustrate two different embodiments by which a frame can be selected 706 based on keyword scores. In the embodiment of FIG. 8, the video search engine 120 selects a thumbnail representative of a video based on textual metadata stored in association with the video in the video database 175. The video search engine 120 selects 802 a video from the video database for thumbnail selection. The video search engine 120 then extracts 804 keywords from metadata stored in association with the video in the video database 175. Metadata may include, for example, the video title or a textual summary of the video provided by the author or another user. The video search engine 120 then accesses the video annotation index 185 and uses the extracted keywords to choose 806 one or more representative frames of video (e.g., by selecting the frame or set of frames having the highest ranked keyword score(s) for the extracted keywords). The front end server 110 then displays 808 the chosen frames as a thumbnail for the video in the search results. This embodiment beneficially ensures that the selected thumbnails will actually be representative of the video content. For example, consider a video entitled "Dolphin Swim" that includes some scenes of a swimming dolphin but other scenes that are just empty ocean. Rather than arbitrarily selecting a thumbnail frame (e.g., the first frame or center frame), the video search engine 120 will select one or more frames that actually depict a dolphin. Thus, the user is better able to assess the relevance of the search results to the query.
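A sketch of this metadata-driven selection, using the same hypothetical index and keyword-to-column mapping as above, might look as follows; splitting the title into candidate keywords is an assumption about how the keywords are extracted.

```python
def thumbnail_from_metadata(video_id, title, index, keyword_to_column):
    """Pick the frame offset whose keyword scores best match the video's title words."""
    columns = [keyword_to_column[w] for w in title.lower().split()
               if w in keyword_to_column]
    best_offset, best_score = None, float("-inf")
    for (vid, offset), scores in index.items():
        if vid != video_id or not columns:
            continue
        score = float(scores[columns].sum())
        if score > best_score:
            best_score, best_offset = score, offset
    return best_offset    # frame to render as the thumbnail
```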
- FIG. 9 is a flowchart illustrating a second embodiment of a process for selecting a thumbnail to present with a video in a set of search results. In this embodiment, the one or more selected thumbnails are dependent on the keywords provided in the user search query. First, the search engine 120 identifies 902 a set of video search results based on the user search query. The search engine 120 extracts 904 keywords from the user's search query to use in selecting the representative thumbnail frames for each of the search results. For each video in the result set, the video search engine 120 then accesses the video annotation index 185 and uses the extracted keywords to choose 906 one or more representative frames of video (e.g., by selecting the one or more frames having the highest ranked keyword score(s) for the extracted keywords). The front end server 110 then displays 908 the chosen frames as thumbnails for the video in the search results.
- This embodiment beneficially ensures that the video thumbnail is actually related to the user's search query. For example, suppose the user enters the query "dog on a skateboard." A video entitled "Animals Doing Tricks" includes a relevant scene featuring a dog on a skateboard, but also includes several other scenes without dogs or skateboards.
The method of FIG. 9 beneficially ensures that the presented thumbnail is representative of the scene that the user searched for (i.e., the dog on the skateboard). Thus, the user can easily assess the relevance of the search results to the keyword query.
- Another feature of the video hosting system 100 allows a user to search for specific scenes or events within a video using the video annotation index 185. For example, in a long action movie, a user may want to search for fighting scenes or car racing scenes, using query terms such as "car race" or "fight." The video hosting system 100 then retrieves only the particular scene or scenes (rather than the entire video) relevant to the query. FIG. 10 illustrates an example embodiment of a process for finding scenes or events relevant to a keyword query. The search engine 120 receives 1002 a search query from a user and identifies 1004 keywords from the search string. Using the keywords, the search engine 120 accesses the video annotation index 185 (e.g., by performing a lookup function) to retrieve 1006 a number of frames (e.g., the top 10) having the highest keyword scores for the extracted keywords. The search engine then determines 1008 boundaries for the relevant scenes within the video. For example, the search engine 120 may use scene segmentation techniques to find the boundaries of the scene including the highly relevant frame. Alternatively, the search engine 120 may analyze the keyword scores of surrounding frames to determine the boundaries. For example, the search engine 120 may return a video clip in which all sampled frames have keyword scores above a threshold. The search engine 120 selects 1010 a thumbnail image for each video in the result set based on the keyword scores. The front end server 110 then displays 1012 a ranked set of videos represented by the selected thumbnails.
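The threshold-based boundary rule mentioned above can be sketched as follows, assuming frame_scores is a list of (offset, score) pairs for one keyword, in ascending offset order, and peak_index points at a highly relevant frame; the greedy expansion is an assumption consistent with, but not required by, the description.

```python
def clip_around(frame_scores, peak_index, threshold):
    """Grow a clip around a high-scoring frame while sampled scores stay above threshold."""
    start = peak_index
    while start > 0 and frame_scores[start - 1][1] >= threshold:
        start -= 1
    end = peak_index
    while end + 1 < len(frame_scores) and frame_scores[end + 1][1] >= threshold:
        end += 1
    return frame_scores[start][0], frame_scores[end][0]   # (clip start offset, clip end offset)
```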
- Another feature of the video hosting system 100 is the ability to select a set of "related videos" that may be displayed before, during, or after playback of a user-selected video based on the video annotation index 185. In this embodiment, the video hosting system 100 extracts keywords from the title or other metadata associated with the playback of the selected video. The video hosting system 100 uses the extracted keywords to query the video annotation index 185 for videos relevant to the keywords; this identifies other videos that are likely to be similar to the user-selected video in terms of their actual image/audio content, rather than just having the same keywords in their metadata. The video hosting system 100 then chooses thumbnails for the related videos as described above, and presents the thumbnails in a "related videos" portion of the user interface display. This embodiment beneficially provides a user with other videos that may be of interest based on the content of the playback video.
- Another feature of the video hosting system 100 is the ability to find and present advertisements that may be displayed before, during, or after playback of a selected video, based on the use of the video annotation index 185. In one embodiment, the video hosting system 100 retrieves keywords associated with frames of video in real-time as the user views the video (i.e., by performing a lookup in the annotation index 185 using the current frame index). The video hosting system 100 may then query an advertisement database using the retrieved keywords for advertisements relevant to the keywords. The video hosting system 100 may then display advertisements related to the current frames in real-time as the video plays back.
- The above-described embodiments beneficially allow a media host to provide video content items and representative thumbnail images that are most relevant to a user's search query. By learning associations between textual queries and non-textual media content, the video hosting system provides improved search results over systems that rely solely on textual metadata.
- The present invention has been described in particular detail with respect to a limited number of embodiments. Those of skill in the art will appreciate that the invention may additionally be practiced in other embodiments. First, the particular naming of the components, capitalization of terms, the attributes, data structures, or any other programming or structural aspect is not mandatory or significant, and the mechanisms that implement the invention or its features may have different names, formats, or protocols. Further, the system may be implemented via a combination of hardware and software, as described, or entirely in hardware elements. Also, the particular division of functionality between the various system components described herein is merely exemplary, and not mandatory; functions performed by a single system component may instead be performed by multiple components, and functions performed by multiple components may instead be performed by a single component. For example, the particular functions of the media host service may be provided in one or more modules.
- Some portions of the above description present the features of the present invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the art to most effectively convey the substance of their work to others skilled in the art. These operations, while described functionally or logically, are understood to be implemented by computer programs. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules or code devices, without loss of generality.
- It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the present discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
- Certain aspects of the present invention include process steps and instructions described herein in the form of an algorithm. All such process steps, instructions, or algorithms are executed by computing devices that include some form of processing unit (e.g., a microprocessor, microcontroller, dedicated logic circuit, or the like) as well as a memory (RAM, ROM, or the like), and input/output devices as appropriate for receiving or providing data.
- The present invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer, in which event the general-purpose computer is structurally and functionally equivalent to a specific computer dedicated to performing the functions and operations described herein. A computer program that embodies computer executable data (e.g. program code and data) is stored in a tangible computer readable storage medium, such as, but is not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for persistently storing electronically coded instructions. It should be further noted that such computer programs by nature of their existence as data stored in a physical medium by alterations of such medium, such as alterations or variations in the physical structure and/or properties (e.g., electrical, optical, mechanical, magnetic, chemical properties) of the medium, are not abstract ideas or concepts or representations per se, but instead are physical artifacts produced by physical processes that transform a physical medium from one state to another state (e.g., a change in the electrical charge, or a change in magnetic polarity) in order to persistently store the computer program in the medium. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
- Finally, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention.
Claims (51)
1. A computer-implemented method for creating a searchable video index, the method executed by a computer system, and comprising:
receiving a labeled training dataset comprising a set of media items together with one or more keywords descriptive of content of the media items;
extracting features characterizing the content of the media items;
training a machine-learned model to learn correlations between the extracted features of the media items and the keywords descriptive of the content; and
generating the video index mapping frames of videos in a video database to keywords based on features of the videos in the video database and the machine-learned model.
2. The method of claim 1 , wherein the media items comprise images.
3. The method of claim 1 , wherein the media items comprise audio clips.
4. The method of claim 1 , wherein extracting the features characterizing the content of the media items comprises:
segmenting each image into a plurality of patches;
generating a feature vector for each of the patches; and
applying a clustering algorithm to determine a plurality of most representative feature vectors in the labeled training data.
5. The method of claim 4 , wherein the patches are at least partially overlapping.
6. The method of claim 4 , further comprising:
determining a plurality of most commonly found keywords in the labeled training dataset.
7. The method of claim 6 , further comprising:
storing associations between the most commonly found keywords and the most representative feature vectors.
8. The method of claim 6, wherein storing associations between the most commonly found keywords and the most representative feature vectors comprises:
generating a set of association functions, each association function representative of an association strength between one of the most representative feature vectors and one of the most commonly found keywords.
9. The method of claim 7, wherein storing associations between the most commonly found keywords and the most representative feature vectors comprises:
generating a feature-keyword matrix, wherein entries in a first dimension of the feature-keyword matrix each correspond to a different one of the most representative feature vectors, and wherein entries in a second dimension of the feature-keyword matrix each correspond to a different one of the most commonly found keywords.
10. The method of claim 9 , wherein generating the feature-keyword matrix comprises:
initializing the feature-keyword matrix by populating the entries with initial weights;
selecting a positive training media item associated with a first keyword and a negative training media item not associated with a second keyword;
extracting features for the positive and negative training media items to obtain a positive feature vector and a negative feature vector;
applying a transformation to the positive feature vector using the feature-keyword matrix to obtain a first keyword score for the positive training media item;
applying a transformation to the negative feature vector using the feature-keyword matrix to obtain a second keyword score for the negative training media item;
determining if the keyword score for the positive training media item is at least a threshold value higher than the keyword score for the negative training media item; and
responsive to the keyword score for the positive training media item not being at least a threshold value higher than the keyword score for the negative training media item, adjusting the weights in the feature-keyword matrix.
11. The method of claim 1 , wherein generating the video index comprises:
sampling frames of a video in the video database;
computing a first feature vector for a first sampled frame of the video representative of content of the first sampled frame;
applying the machine-learned model to the first feature vector to generate a keyword association score between the first sampled frame and the selected keyword; and
storing the keyword association score in association with the first sampled frame in the video index.
12. The method of claim 1 , wherein generating the video index comprises:
sampling scenes of a video in the video database;
computing a first feature vector for a first sampled scene of the video representative of content of the first sampled scene;
applying the machine-learned model to the first feature vector to generate a keyword association score between the first sampled scene and the selected keyword; and
storing the keyword association score in association with the first sampled scene in the video index.
13. A computer-implemented method for presenting video search results, the method executed by a computer system, and comprising:
receiving a video;
selecting a frame from the video as representative of content of the video using a video annotation index that stores keyword association scores between frames of a plurality of videos and keywords associated with the frames of the plurality of videos; and
providing the selected frame as a thumbnail for the video.
14. The method of claim 13 , wherein selecting the frame from the video as representative of the video's content comprises:
selecting a keyword representative of desired video content;
accessing the video annotation index to determine keyword association scores between frames of the video and the selected keyword; and
selecting the frame having a highest ranked keyword association score with the selected keyword according to the video annotation index.
15. The method of claim 14 , wherein selecting the keyword representative of the desired video content comprises using a title of the video as the selected keyword.
16. The method of claim 14 , wherein selecting the keyword representative of the desired video content comprises using the keyword query as the selected keyword.
17. The method of claim 13 , wherein receiving the video comprises:
receiving a keyword query from a user; and
selecting the video from a database of videos as having content relevant to the keyword query.
18. The method of claim 17, wherein selecting the video having content relevant to the keyword query comprises:
determining a frame of video having a high keyword association score with a keyword from the keyword query;
determining scene boundaries of a scene relevant to the keyword query, the scene of video including the frame having the high keyword association score; and
selecting the scene as the selected video.
19. The method of claim 18 , further comprising:
ranking the selected video among a plurality of videos in a result set based on the keyword association scores between frames of videos in the result set and keywords in the keyword query.
20. The method of claim 18 , further comprising:
presenting a relevance score for the selected video based on the keyword association scores between frames of the video and keywords in the keyword query.
21. A computer readable storage medium storing computer executable code for creating a searchable video index, the computer executable program code when executed causes an application to perform the steps of:
receiving a labeled training dataset comprising a set of media items together with one or more keywords descriptive of content of the media items;
extracting features characterizing the content of the media items;
training a machine-learned model to learn correlations between the extracted features of the media items and the keywords descriptive of the content; and
generating the video index mapping frames of videos in a video database to keywords based on features of the videos in the video database and the machine-learned model.
22. The computer readable storage medium of claim 21 , wherein the media items comprise images.
23. The computer readable storage medium of claim 21 , wherein the media items comprise audio clips.
24. The computer readable storage medium of claim 21 , wherein extracting the features characterizing the content of the media items comprises:
segmenting each image into a plurality of patches;
generating a feature vector for each of the patches; and
applying a clustering algorithm to determine a plurality of most representative feature vectors in the labeled training data.
25. The computer readable storage medium of claim 24 , wherein the patches are at least partially overlapping.
26. The computer readable storage medium of claim 24 , further comprising:
determining a plurality of most commonly found keywords in the labeled training dataset.
27. The computer readable storage medium of claim 26 , further comprising:
storing associations between the most commonly found keywords and the most representative feature vectors.
28. The computer readable storage medium of claim 26, wherein storing associations between the most commonly found keywords and the most representative feature vectors comprises:
generating a set of association functions, each association function representative of an association strength between one of the most representative feature vectors and one of the most commonly found keywords.
29. The computer readable storage medium of claim 27 , wherein storing associations between the most commonly found keywords and the most representative feature vectors comprises:
generating a feature-keyword matrix, wherein entries in a first dimension of the feature-keyword matrix each correspond to a different one of the most representative feature vectors, and wherein entries in a second dimension of the feature-keyword matrix each correspond to a different one of the most commonly found keywords.
30. The computer readable storage medium of claim 29 , wherein generating the feature-keyword matrix comprises:
initializing the feature-keyword matrix by populating the entries with initial weights;
selecting a positive training media item associated with a first keyword and a negative training media item not associated with a second keyword;
extracting features for the positive and negative training media items to obtain a positive feature vector and a negative feature vector;
applying a transformation to the positive feature vector using the feature-keyword matrix to obtain a first keyword score for the positive training media item;
applying a transformation to the negative feature vector using the feature-keyword matrix to obtain a second keyword score for the negative training media item;
determining if the keyword score for the positive training media item is at least a threshold value higher than the keyword score for the negative training media item; and
responsive to the keyword score for the positive training media item not being at least the threshold value higher than the keyword score for the negative training media item, adjusting the weights in the feature-keyword matrix.
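One plausible reading of the training loop in claim 30 is a pairwise, margin-based update: if the positive item's keyword score is not at least a threshold above the negative item's, nudge the weights toward the positive feature vector. The perceptron-style update rule, learning rate, and margin below are assumptions; the claim only requires that the weights be adjusted when the margin test fails.

```python
import numpy as np

def train_feature_keyword_matrix(pairs, num_features, num_keywords,
                                 margin=1.0, lr=0.1, epochs=5, seed=0):
    """Pairwise, margin-based training of a feature-keyword matrix (a sketch).

    `pairs` is an iterable of (keyword_id, positive_feature_vec, negative_feature_vec),
    each feature vector having length num_features.  The update rule is an
    illustrative assumption, not the patent's prescribed algorithm.
    """
    pairs = list(pairs)
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(num_keywords, num_features))   # initial weights
    for _ in range(epochs):
        for keyword_id, pos_vec, neg_vec in pairs:
            pos_vec, neg_vec = np.asarray(pos_vec), np.asarray(neg_vec)
            pos_score = W[keyword_id] @ pos_vec    # "transformation" = row dot product
            neg_score = W[keyword_id] @ neg_vec
            if pos_score < neg_score + margin:     # margin test failed
                W[keyword_id] += lr * (pos_vec - neg_vec)
    return W
```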
31. The computer readable storage medium of claim 21 , wherein generating the video index comprises:
sampling frames of a video in the video database;
computing a first feature vector for a first sampled frame of the video representative of content of the first sampled frame;
applying the machine-learned model to the first feature vector to generate a keyword association score between the first sampled frame and the selected keyword; and
storing the keyword association score in association with the first sampled frame in the video index.
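A minimal sketch of the index-generation step in claim 31, assuming the learned model is the feature-keyword matrix from the training sketch above and that frames are sampled periodically; the `feature_fn` callback and the nested-dictionary index layout are illustrative, not specified by the claim.

```python
import numpy as np

def build_frame_index(videos, W, keywords, feature_fn, sample_rate=30):
    """Sample frames, score each against every keyword, and store the scores.

    videos: {video_id: [frame, ...]}; W: learned feature-keyword matrix
    (num_keywords x num_features); feature_fn: frame -> feature vector.
    """
    index = {}                                                # {video_id: {frame_no: {keyword: score}}}
    for video_id, frames in videos.items():
        index[video_id] = {}
        for frame_no in range(0, len(frames), sample_rate):   # periodic frame sampling
            feat = np.asarray(feature_fn(frames[frame_no]))
            scores = W @ feat                                 # one association score per keyword
            index[video_id][frame_no] = dict(zip(keywords, scores.tolist()))
    return index
```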
32. The computer readable storage medium of claim 21 , wherein generating the video index comprises:
sampling scenes of a video in the video database;
computing a first feature vector for a first sampled scene of the video representative of content of the first sampled scene;
applying the machine-learned model to the first feature vector to generate a keyword association score between the first sampled scene and the selected keyword; and
storing the keyword association score in association with the first sampled scene in the video index.
33. A computer readable storage medium storing computer executable code for presenting video search results, the computer executable program code when executed causes an application to perform the steps of:
receiving a video;
selecting a frame from the video as representative content of the video using a video annotation index that stores keyword association scores between frames of a plurality of videos and keywords associated with the frames of the plurality of videos; and
providing the selected frame as a thumbnail for the video.
34. The computer readable storage medium of claim 33 , wherein selecting the frame from the video as representative of the video's content comprises:
selecting a keyword representative of desired video content;
accessing the video annotation index to determine keyword association scores between frames of the video and the selected keyword; and
selecting the frame having a highest ranked keyword association score with the selected keyword according to the video annotation index.
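Claims 33-35 together suggest a thumbnail selector along these lines; the crude choice of the first title word as the selected keyword and the index layout are assumptions for the example.

```python
def select_thumbnail(video_index, video_id, title):
    """Return the frame number whose keyword association score for the chosen
    keyword is highest in the video annotation index."""
    keyword = title.lower().split()[0] if title else None   # crude title-based keyword choice
    frame_scores = video_index[video_id]                    # {frame_no: {keyword: score}}
    return max(frame_scores,
               key=lambda f: frame_scores[f].get(keyword, 0.0))
```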
35. The computer readable storage medium of claim 34 , wherein selecting the keyword representative of the desired video content comprises using a title of the video as the selected keyword.
37. The computer readable storage medium of claim 34 , wherein selecting the keyword representative of the desired video content comprises using the keyword query as the selected keyword.
38. The computer readable storage medium of claim 33 , wherein receiving the video comprises:
receiving a keyword query from a user; and
selecting the video from a database of videos as having content relevant to the keyword query.
39. The computer readable storage medium of claim 38 , wherein selecting the video having content relevant to the keyword query comprises:
determining a frame of video having a high keyword association score with a keyword from the keyword query;
determining scene boundaries of a scene relevant to the keyword query, the scene of video including the frame having the high keyword association score; and
selecting the scene as the selected video.
40. The computer readable storage medium of claim 38 , further comprising:
ranking the selected video among a plurality of videos in a result set using the keyword association scores between frames of videos in the result set and keywords in the keyword query.
41. The computer readable storage medium of claim 38 , further comprising:
presenting a relevance score for the selected video based on the keyword association scores between frames of the video and keywords in the keyword query.
42. A video hosting system for finding and presenting videos relevant to a keyword query, the system comprising:
a front end server configured to receive a keyword query from a user and present a result set comprising a video having content relevant to the keyword query and a thumbnail image representative of the content of the video;
a video annotation index comprising a mapping between keywords and frames of video, the mapping derived from a machine-learned model; and
a video search engine configured to access the video annotation index to determine the video having content relevant to the keyword and to determine the thumbnail image representative of the content of the video.
43. The system of claim 42 , further comprising:
a video database storing videos searchable by the video search engine, wherein frames of the stored videos are indexed in the video annotation index to map the frames to keywords descriptive of their content.
44. The system of claim 42 , further comprising:
a video annotation engine configured to determine a mapping between frames of videos in a video database and keywords descriptive of their content using a learned feature-keyword model obtained through machine learning.
45. The system of claim 44 , wherein the video annotation engine comprises:
a video sampling module configured to sample frames of video from a video database;
a feature extraction module configured to generate a feature vector representative of each of the sampled frames of video; and
a frame annotation module configured to apply the learned feature-keyword model to the feature vectors in order to determine keyword scores for each of the sampled frames of video, the keyword scores indexed to the video annotation index in association with the relevant sampled frames.
46. The system of claim 42 , further comprising:
a learning engine configured to learn a feature-keyword model mapping features of images or audio clips in a labeled training dataset to keywords descriptive of their content.
47. The system of claim 46 , wherein the learning engine comprises:
a feature extraction module configured to generate a feature dataset comprising a plurality of most representative feature vectors for the labeled training dataset;
a keyword learning module configured to generate a keyword dataset comprising a plurality of most commonly occurring keywords in the labeled training dataset; and
an association learning module adapted to generate the feature-keyword model mapping associations between the feature vectors in the feature dataset and the keywords in the keyword dataset.
48. The system of claim 47 , wherein the learning engine further comprises:
a click-through module configured to automatically acquire labels for the labeled training data by tracking user search queries on a media search web site, and learning labels for media items by observing search results selected by a user and search results not selected by the user.
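The click-through module of claim 48 could derive weak labels roughly as follows; the log format and the rule "clicked result = positive, shown-but-unclicked result = negative" are assumptions used only to make the idea concrete.

```python
def labels_from_click_logs(click_logs):
    """Derive weak training labels from search click-through data (a sketch).

    `click_logs` is assumed to be an iterable of
    (query, shown_media_ids, clicked_media_ids).  Clicked items become positive
    examples for the query keywords; shown-but-unclicked items become negatives.
    """
    positives, negatives = [], []
    for query, shown, clicked in click_logs:
        keywords = query.lower().split()
        for media_id in shown:
            target = positives if media_id in clicked else negatives
            for keyword in keywords:
                target.append((keyword, media_id))
    return positives, negatives
```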
49. A method for presenting advertisements, the method executed by a computer and comprising:
playing a selected video using a web-based video player;
monitoring a current frame of video during playback of the selected video;
accessing a video annotation index using the current frame of video to determine one or more keywords associated with the current frame;
accessing an advertising database using the one or more keywords to select an advertisement associated with the one or more keywords; and
providing the advertisement for display during playback of the current frame.
50. The method of claim 49 , wherein the video annotation index maps frames of video to one or more keywords according to a machine-learned model.
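A sketch of the advertisement flow in claims 49-50, assuming the frame-to-keyword index built earlier and a simple keyword-keyed advertising database; the score threshold and lookup order are illustrative assumptions.

```python
def ad_for_current_frame(video_index, ad_database, video_id, current_frame,
                         score_threshold=0.5):
    """Map the frame currently being played to its strongest keywords, then
    look up an advertisement keyed to one of them."""
    frame_scores = video_index[video_id].get(current_frame, {})
    keywords = [k for k, s in sorted(frame_scores.items(),
                                     key=lambda kv: kv[1], reverse=True)
                if s >= score_threshold]
    for keyword in keywords:
        if keyword in ad_database:          # ad_database: {keyword: advertisement}
            return ad_database[keyword]
    return None
```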
51. A method for presenting a set of related videos, the method executed by a computer and comprising:
playing a selected video using a web-based video player;
extracting metadata associated with the selected video, the metadata including one or more keywords descriptive of the selected video;
accessing a video annotation index using the one or more keywords to determine one or more related videos; and
providing the one or more related videos for display, each related video represented by a thumbnail image representative of its content.
52. The method of claim 51 , wherein the video annotation index maps keywords to videos in a video database according to a machine-learned model.
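Finally, the related-videos flow of claims 51-52 could be sketched as a lookup that scores every other indexed video against the playing video's metadata keywords; the metadata layout and the max-over-frames scoring are assumptions for the example.

```python
def related_videos(video_index, metadata, video_id, top_n=5):
    """Use the playing video's metadata keywords to find other videos whose
    indexed frames score highly for those keywords."""
    keywords = metadata[video_id].get("keywords", [])
    scores = {}
    for other_id, frames in video_index.items():
        if other_id == video_id:
            continue
        scores[other_id] = max(
            (frames[f].get(k, 0.0) for f in frames for k in keywords),
            default=0.0,
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```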
Priority Applications (13)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/546,436 US20110047163A1 (en) | 2009-08-24 | 2009-08-24 | Relevance-Based Image Selection |
CA2771593A CA2771593C (en) | 2009-08-24 | 2010-08-18 | Relevance-based image selection |
EP18161198.9A EP3352104A1 (en) | 2009-08-24 | 2010-08-18 | Relevance-based image selection |
PCT/US2010/045909 WO2011025701A1 (en) | 2009-08-24 | 2010-08-18 | Relevance-based image selection |
EP10812505.5A EP2471026B1 (en) | 2009-08-24 | 2010-08-18 | Relevance-based image selection |
AU2010286797A AU2010286797A1 (en) | 2009-08-24 | 2010-08-18 | Relevance-based image selection |
CN201080042760.9A CN102549603B (en) | 2009-08-24 | 2010-08-18 | Relevance-based image selection |
US14/687,116 US10614124B2 (en) | 2009-08-24 | 2015-04-15 | Relevance-based image selection |
AU2016202074A AU2016202074B2 (en) | 2009-08-24 | 2016-04-04 | Relevance-based image selection |
AU2018201624A AU2018201624B2 (en) | 2009-08-24 | 2018-03-06 | Relevance-based image selection |
US16/100,414 US11017025B2 (en) | 2009-08-24 | 2018-08-10 | Relevance-based image selection |
US17/328,442 US11693902B2 (en) | 2009-08-24 | 2021-05-24 | Relevance-based image selection |
US18/321,225 US20230306057A1 (en) | 2009-08-24 | 2023-05-22 | Relevance-Based Image Selection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/546,436 US20110047163A1 (en) | 2009-08-24 | 2009-08-24 | Relevance-Based Image Selection |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/687,116 Continuation US10614124B2 (en) | 2009-08-24 | 2015-04-15 | Relevance-based image selection |
Publications (1)
Publication Number | Publication Date |
---|---|
US20110047163A1 true US20110047163A1 (en) | 2011-02-24 |
Family
ID=43606147
Family Applications (5)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/546,436 Abandoned US20110047163A1 (en) | 2009-08-24 | 2009-08-24 | Relevance-Based Image Selection |
US14/687,116 Active 2031-06-02 US10614124B2 (en) | 2009-08-24 | 2015-04-15 | Relevance-based image selection |
US16/100,414 Active 2030-07-19 US11017025B2 (en) | 2009-08-24 | 2018-08-10 | Relevance-based image selection |
US17/328,442 Active 2030-03-26 US11693902B2 (en) | 2009-08-24 | 2021-05-24 | Relevance-based image selection |
US18/321,225 Pending US20230306057A1 (en) | 2009-08-24 | 2023-05-22 | Relevance-Based Image Selection |
Family Applications After (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/687,116 Active 2031-06-02 US10614124B2 (en) | 2009-08-24 | 2015-04-15 | Relevance-based image selection |
US16/100,414 Active 2030-07-19 US11017025B2 (en) | 2009-08-24 | 2018-08-10 | Relevance-based image selection |
US17/328,442 Active 2030-03-26 US11693902B2 (en) | 2009-08-24 | 2021-05-24 | Relevance-based image selection |
US18/321,225 Pending US20230306057A1 (en) | 2009-08-24 | 2023-05-22 | Relevance-Based Image Selection |
Country Status (6)
Country | Link |
---|---|
US (5) | US20110047163A1 (en) |
EP (2) | EP2471026B1 (en) |
CN (1) | CN102549603B (en) |
AU (3) | AU2010286797A1 (en) |
CA (1) | CA2771593C (en) |
WO (1) | WO2011025701A1 (en) |
Cited By (116)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100040285A1 (en) * | 2008-08-14 | 2010-02-18 | Xerox Corporation | System and method for object class localization and semantic class based image segmentation |
US20110173214A1 (en) * | 2010-01-14 | 2011-07-14 | Mobdub, Llc | Crowdsourced multi-media data relationships |
US20110218994A1 (en) * | 2010-03-05 | 2011-09-08 | International Business Machines Corporation | Keyword automation of video content |
US20110225133A1 (en) * | 2010-03-09 | 2011-09-15 | Microsoft Corporation | Metadata-aware search engine |
US20110229017A1 (en) * | 2010-03-18 | 2011-09-22 | Yuan Liu | Annotation addition method, annotation addition system using the same, and machine-readable medium |
US20120117046A1 (en) * | 2010-11-08 | 2012-05-10 | Sony Corporation | Videolens media system for feature selection |
CN102542066A (en) * | 2011-11-11 | 2012-07-04 | 冉阳 | Video clustering method, ordering method, video searching method and corresponding devices |
US20130051663A1 (en) * | 2011-08-26 | 2013-02-28 | Aravind Krishnaswamy | Fast Adaptive Edge-Aware Matting |
US20130073961A1 (en) * | 2011-09-20 | 2013-03-21 | Giovanni Agnoli | Media Editing Application for Assigning Roles to Media Content |
US20130080426A1 (en) * | 2011-09-26 | 2013-03-28 | Xue-wen Chen | System and methods of integrating visual features and textual features for image searching |
US20130086105A1 (en) * | 2011-10-03 | 2013-04-04 | Microsoft Corporation | Voice directed context sensitive visual search |
US20130163679A1 (en) * | 2010-09-10 | 2013-06-27 | Dong-Qing Zhang | Video decoding using example-based data pruning |
US20130226930A1 (en) * | 2012-02-29 | 2013-08-29 | Telefonaktiebolaget L M Ericsson (Publ) | Apparatus and Methods For Indexing Multimedia Content |
US20140032544A1 (en) * | 2011-03-23 | 2014-01-30 | Xilopix | Method for refining the results of a search within a database |
US8649613B1 (en) * | 2011-11-03 | 2014-02-11 | Google Inc. | Multiple-instance-learning-based video classification |
US20140089799A1 (en) * | 2011-01-03 | 2014-03-27 | Curt Evans | Methods and system for remote control for multimedia seeking |
US20140095346A1 (en) * | 2012-09-28 | 2014-04-03 | International Business Machines Corporation | Data analysis method and system thereof |
US8838680B1 (en) | 2011-02-08 | 2014-09-16 | Google Inc. | Buffer objects for web-based configurable pipeline media processing |
US20150019203A1 (en) * | 2011-12-28 | 2015-01-15 | Elliot Smith | Real-time natural language processing of datastreams |
US8938393B2 (en) | 2011-06-28 | 2015-01-20 | Sony Corporation | Extended videolens media engine for audio recognition |
US20150058358A1 (en) * | 2013-08-21 | 2015-02-26 | Google Inc. | Providing contextual data for selected link units |
US20150120726A1 (en) * | 2013-10-30 | 2015-04-30 | Texas Instruments Incorporated | Using Audio Cues to Improve Object Retrieval in Video |
US20150139557A1 (en) * | 2013-11-20 | 2015-05-21 | Adobe Systems Incorporated | Fast dense patch search and quantization |
US20150193528A1 (en) * | 2012-08-08 | 2015-07-09 | Google Inc. | Identifying Textual Terms in Response to a Visual Query |
US20150235672A1 (en) * | 2014-02-20 | 2015-08-20 | International Business Machines Corporation | Techniques to Bias Video Thumbnail Selection Using Frequently Viewed Segments |
WO2015127385A1 (en) * | 2014-02-24 | 2015-08-27 | Lyve Minds, Inc. | Automatic generation of compilation videos |
US9172740B1 (en) | 2013-01-15 | 2015-10-27 | Google Inc. | Adjustable buffer remote access |
US9189834B2 (en) | 2013-11-14 | 2015-11-17 | Adobe Systems Incorporated | Adaptive denoising with internal and external patches |
US9225979B1 (en) | 2013-01-30 | 2015-12-29 | Google Inc. | Remote access encoding |
US9240215B2 (en) | 2011-09-20 | 2016-01-19 | Apple Inc. | Editing operations facilitated by metadata |
US9292552B2 (en) * | 2012-07-26 | 2016-03-22 | Telefonaktiebolaget L M Ericsson (Publ) | Apparatus, methods, and computer program products for adaptive multimedia content indexing |
US9311692B1 (en) | 2013-01-25 | 2016-04-12 | Google Inc. | Scalable buffer remote access |
CN105488183A (en) * | 2015-12-01 | 2016-04-13 | 北京邮电大学世纪学院 | Method and apparatus for mining temporal-spatial correlation relationship among grotto frescoes in grotto fresco group |
US9338477B2 (en) | 2010-09-10 | 2016-05-10 | Thomson Licensing | Recovering a pruned version of a picture in a video sequence for example-based data pruning using intra-frame patch similarity |
US20160147760A1 (en) * | 2014-11-26 | 2016-05-26 | Adobe Systems Incorporated | Providing alternate words to aid in drafting effective social media posts |
US20160180881A1 (en) * | 2014-12-19 | 2016-06-23 | Oracle International Corporation | Video storytelling based on conditions determined from a business object |
US20160299968A1 (en) * | 2015-04-09 | 2016-10-13 | Yahoo! Inc. | Topical based media content summarization system and method |
US20160335362A1 (en) * | 2008-05-26 | 2016-11-17 | Kenshoo Ltd. | System for finding website invitation cueing keywords and for atrribute-based generation of invitation-cueing instructions |
WO2016210268A1 (en) * | 2015-06-24 | 2016-12-29 | Google Inc. | Selecting representative video frames for videos |
US9536564B2 (en) | 2011-09-20 | 2017-01-03 | Apple Inc. | Role-facilitated editing operations |
US9544598B2 (en) | 2010-09-10 | 2017-01-10 | Thomson Licensing | Methods and apparatus for pruning decision optimization in example-based data pruning compression |
US20170011068A1 (en) * | 2015-07-07 | 2017-01-12 | Adobe Systems Incorporated | Extrapolative Search Techniques |
US20170011643A1 (en) * | 2015-07-10 | 2017-01-12 | Fujitsu Limited | Ranking of segments of learning materials |
US9602814B2 (en) | 2010-01-22 | 2017-03-21 | Thomson Licensing | Methods and apparatus for sampling-based super resolution video encoding and decoding |
US9633015B2 (en) | 2012-07-26 | 2017-04-25 | Telefonaktiebolaget Lm Ericsson (Publ) | Apparatus and methods for user generated content indexing |
US9715641B1 (en) * | 2010-12-08 | 2017-07-25 | Google Inc. | Learning highlights using event detection |
US20170243065A1 (en) * | 2016-02-19 | 2017-08-24 | Samsung Electronics Co., Ltd. | Electronic device and video recording method thereof |
US9767540B2 (en) | 2014-05-16 | 2017-09-19 | Adobe Systems Incorporated | Patch partitions and image processing |
US9779775B2 (en) | 2014-02-24 | 2017-10-03 | Lyve Minds, Inc. | Automatic generation of compilation videos from an original video based on metadata associated with the original video |
US9813707B2 (en) | 2010-01-22 | 2017-11-07 | Thomson Licensing Dtv | Data pruning for video compression using example-based super-resolution |
CN107463592A (en) * | 2016-06-06 | 2017-12-12 | 百度(美国)有限责任公司 | For by the method, equipment and data handling system of content item and images match |
US9858340B1 (en) | 2016-04-11 | 2018-01-02 | Digital Reasoning Systems, Inc. | Systems and methods for queryable graph representations of videos |
US9870802B2 (en) | 2011-01-28 | 2018-01-16 | Apple Inc. | Media clip management |
US20180084023A1 (en) * | 2016-09-20 | 2018-03-22 | Facebook, Inc. | Video Keyframes Display on Online Social Networks |
CN107870959A (en) * | 2016-09-23 | 2018-04-03 | 奥多比公司 | Inquired about in response to video search and associated video scene is provided |
EP3327590A1 (en) * | 2016-11-29 | 2018-05-30 | Beijing Xiaomi Mobile Software Co., Ltd. | Method and device for adjusting video playback position |
US9997196B2 (en) | 2011-02-16 | 2018-06-12 | Apple Inc. | Retiming media presentations |
US10008218B2 (en) | 2016-08-03 | 2018-06-26 | Dolby Laboratories Licensing Corporation | Blind bandwidth extension using K-means and a support vector machine |
CN108205581A (en) * | 2016-12-20 | 2018-06-26 | 奥多比公司 | The compact video features generated in digital media environment represent |
US10062015B2 (en) | 2015-06-25 | 2018-08-28 | The Nielsen Company (Us), Llc | Methods and apparatus for identifying objects depicted in a video using extracted video frames in combination with a reverse image search engine |
US10115433B2 (en) * | 2015-09-09 | 2018-10-30 | A9.Com, Inc. | Section identification in video content |
US20180365216A1 (en) * | 2017-06-20 | 2018-12-20 | The Boeing Company | Text mining a dataset of electronic documents to discover terms of interest |
CN109089133A (en) * | 2018-08-07 | 2018-12-25 | 北京市商汤科技开发有限公司 | Method for processing video frequency and device, electronic equipment and storage medium |
US20190026367A1 (en) * | 2017-07-24 | 2019-01-24 | International Business Machines Corporation | Navigating video scenes using cognitive insights |
CN109376145A (en) * | 2018-11-19 | 2019-02-22 | 深圳Tcl新技术有限公司 | The method for building up of movie dialogue database establishes device and storage medium |
CN109598527A (en) * | 2017-09-30 | 2019-04-09 | 北京国双科技有限公司 | Analysis of advertising results method and device |
US10289810B2 (en) | 2013-08-29 | 2019-05-14 | Telefonaktiebolaget Lm Ericsson (Publ) | Method, content owner device, computer program, and computer program product for distributing content items to authorized users |
US10311112B2 (en) | 2016-08-09 | 2019-06-04 | Zorroa Corporation | Linearized search of visual media |
US10311038B2 (en) | 2013-08-29 | 2019-06-04 | Telefonaktiebolaget Lm Ericsson (Publ) | Methods, computer program, computer program product and indexing systems for indexing or updating index |
US10318575B2 (en) * | 2014-11-14 | 2019-06-11 | Zorroa Corporation | Systems and methods of building and using an image catalog |
US10324605B2 (en) | 2011-02-16 | 2019-06-18 | Apple Inc. | Media-editing application with novel editing tools |
US10372991B1 (en) * | 2018-04-03 | 2019-08-06 | Google Llc | Systems and methods that leverage deep learning to selectively store audiovisual content |
US10381022B1 (en) * | 2015-12-23 | 2019-08-13 | Google Llc | Audio classifier |
US10445367B2 (en) | 2013-05-14 | 2019-10-15 | Telefonaktiebolaget Lm Ericsson (Publ) | Search engine for textual content and non-textual content |
CN110381368A (en) * | 2019-07-11 | 2019-10-25 | 北京字节跳动网络技术有限公司 | Video cover generation method, device and electronic equipment |
US10467257B2 (en) | 2016-08-09 | 2019-11-05 | Zorroa Corporation | Hierarchical search folders for a document repository |
US20190354608A1 (en) * | 2018-05-21 | 2019-11-21 | Qingdao Hisense Electronics Co., Ltd. | Display apparatus with intelligent user interface |
US20200012969A1 (en) * | 2017-07-19 | 2020-01-09 | Alibaba Group Holding Limited | Model training method, apparatus, and device, and data similarity determining method, apparatus, and device |
US10592750B1 (en) * | 2015-12-21 | 2020-03-17 | Amazon Technlogies, Inc. | Video rule engine |
WO2020060538A1 (en) * | 2018-09-18 | 2020-03-26 | Google Llc | Methods and systems for processing imagery |
US10628677B2 (en) * | 2016-03-14 | 2020-04-21 | Tencent Technology (Shenzhen) Company Limited | Partner matching method in costarring video, terminal, and computer readable storage medium |
US10664514B2 (en) | 2016-09-06 | 2020-05-26 | Zorroa Corporation | Media search processing using partial schemas |
US10678853B2 (en) * | 2015-12-30 | 2020-06-09 | International Business Machines Corporation | Aligning visual content to search term queries |
CN111432282A (en) * | 2020-04-01 | 2020-07-17 | 腾讯科技(深圳)有限公司 | Video recommendation method and device |
US20200322570A1 (en) * | 2019-04-08 | 2020-10-08 | Baidu.Com Times Technology (Beijing) Co., Ltd. | Method and apparatus for aligning paragraph and video |
US10867183B2 (en) | 2014-09-08 | 2020-12-15 | Google Llc | Selecting and presenting representative frames for video previews |
US10891019B2 (en) * | 2016-02-29 | 2021-01-12 | Huawei Technologies Co., Ltd. | Dynamic thumbnail selection for search results |
CN112399262A (en) * | 2020-10-30 | 2021-02-23 | 深圳Tcl新技术有限公司 | Video searching method, television and storage medium |
US20210224321A1 (en) * | 2018-11-20 | 2021-07-22 | Google Llc | Methods, systems, and media for modifying search results based on search query risk |
WO2021173219A1 (en) * | 2020-02-27 | 2021-09-02 | Rovi Guides, Inc. | Systems and methods for generating dynamic annotations |
CN113378781A (en) * | 2021-06-30 | 2021-09-10 | 北京百度网讯科技有限公司 | Training method and device of video feature extraction model and electronic equipment |
US11128910B1 (en) | 2020-02-27 | 2021-09-21 | Rovi Guides, Inc. | Systems and methods for generating dynamic annotations |
CN113821657A (en) * | 2021-06-10 | 2021-12-21 | 腾讯科技(深圳)有限公司 | Image processing model training method and image processing method based on artificial intelligence |
CN113901263A (en) * | 2021-09-30 | 2022-01-07 | 宿迁硅基智能科技有限公司 | Label generating method and device for video material |
US11250039B1 (en) * | 2018-12-06 | 2022-02-15 | A9.Com, Inc. | Extreme multi-label classification |
US20220114361A1 (en) * | 2020-10-14 | 2022-04-14 | Adobe Inc. | Multi-word concept tagging for images using short text decoder |
US20220157300A1 (en) * | 2020-06-09 | 2022-05-19 | Google Llc | Generation of interactive audio tracks from visual content |
US11442820B2 (en) | 2005-12-19 | 2022-09-13 | Commvault Systems, Inc. | Systems and methods of unified reconstruction in storage systems |
US11445007B2 (en) | 2014-01-25 | 2022-09-13 | Q Technologies, Inc. | Systems and methods for content sharing using uniquely generated identifiers |
US11468105B1 (en) | 2016-12-08 | 2022-10-11 | Okta, Inc. | System for routing of requests |
US11500927B2 (en) * | 2019-10-03 | 2022-11-15 | Adobe Inc. | Adaptive search results for multimedia search queries |
US11507619B2 (en) | 2018-05-21 | 2022-11-22 | Hisense Visual Technology Co., Ltd. | Display apparatus with intelligent user interface |
US11509957B2 (en) | 2018-05-21 | 2022-11-22 | Hisense Visual Technology Co., Ltd. | Display apparatus with intelligent user interface |
US11532111B1 (en) * | 2021-06-10 | 2022-12-20 | Amazon Technologies, Inc. | Systems and methods for generating comic books from video and images |
US11531707B1 (en) | 2019-09-26 | 2022-12-20 | Okta, Inc. | Personalized search based on account attributes |
US11606613B2 (en) | 2020-02-27 | 2023-03-14 | Rovi Guides, Inc. | Systems and methods for generating dynamic annotations |
CN116150428A (en) * | 2021-11-16 | 2023-05-23 | 腾讯科技(深圳)有限公司 | Video tag acquisition method and device, electronic equipment and storage medium |
US20230222129A1 (en) * | 2022-01-11 | 2023-07-13 | International Business Machines Corporation | Measuring relevance of datasets to a data science model |
US11709889B1 (en) * | 2012-03-16 | 2023-07-25 | Google Llc | Content keyword identification |
US11747972B2 (en) | 2011-02-16 | 2023-09-05 | Apple Inc. | Media-editing application with novel editing tools |
US11803556B1 (en) * | 2018-12-10 | 2023-10-31 | Townsend Street Labs, Inc. | System for handling workplace queries using online learning to rank |
EP3872652B1 (en) * | 2020-12-17 | 2023-12-20 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for processing video, electronic device, medium and product |
CN118568279A (en) * | 2024-07-31 | 2024-08-30 | 深圳市泰迅数码有限公司 | Intelligent storage system for photographing and movie creation and display terminal |
US12131503B1 (en) * | 2020-12-21 | 2024-10-29 | Aurora Operations, Inc | Virtual creation and visualization of physically based virtual materials |
WO2024249138A1 (en) * | 2023-05-31 | 2024-12-05 | Microsoft Technology Licensing, Llc | Historical data-based video categorizer |
US12185019B2 (en) | 2017-12-20 | 2024-12-31 | Hisense Visual Technology Co., Ltd. | Smart television and method for displaying graphical user interface of television screen shot |
Families Citing this family (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10594763B2 (en) | 2013-03-15 | 2020-03-17 | adRise, Inc. | Platform-independent content generation for thin client applications |
US10356461B2 (en) | 2013-03-15 | 2019-07-16 | adRise, Inc. | Adaptive multi-device content generation based on associated internet protocol addressing |
US10887421B2 (en) | 2013-03-15 | 2021-01-05 | Tubi, Inc. | Relevant secondary-device content generation based on associated internet protocol addressing |
WO2015061979A1 (en) * | 2013-10-30 | 2015-05-07 | 宇龙计算机通信科技(深圳)有限公司 | Terminal and method for managing video file |
US9842390B2 (en) * | 2015-02-06 | 2017-12-12 | International Business Machines Corporation | Automatic ground truth generation for medical image collections |
CN104881798A (en) * | 2015-06-05 | 2015-09-02 | 北京京东尚科信息技术有限公司 | Device and method for personalized search based on commodity image features |
EP3326083A4 (en) * | 2015-07-23 | 2019-02-06 | Wizr | Video processing |
US9779304B2 (en) | 2015-08-11 | 2017-10-03 | Google Inc. | Feature-based video annotation |
CN106708876B (en) * | 2015-11-16 | 2020-04-21 | 任子行网络技术股份有限公司 | Similar video retrieval method and system based on Lucene |
US11580589B2 (en) * | 2016-10-11 | 2023-02-14 | Ebay Inc. | System, method, and medium to select a product title |
AU2016432315B2 (en) | 2016-12-13 | 2020-05-07 | Google Llc | Compensation pulses for qubit readout |
US10606814B2 (en) * | 2017-01-18 | 2020-03-31 | Microsoft Technology Licensing, Llc | Computer-aided tracking of physical entities |
US10216766B2 (en) * | 2017-03-20 | 2019-02-26 | Adobe Inc. | Large-scale image tagging using image-to-topic embedding |
CN107025275B (en) * | 2017-03-21 | 2019-11-15 | 腾讯科技(深圳)有限公司 | Video searching method and device |
US10268897B2 (en) | 2017-03-24 | 2019-04-23 | International Business Machines Corporation | Determining most representative still image of a video for specific user |
US10740394B2 (en) * | 2018-01-18 | 2020-08-11 | Oath Inc. | Machine-in-the-loop, image-to-video computer vision bootstrapping |
CN110795597A (en) * | 2018-07-17 | 2020-02-14 | 上海智臻智能网络科技股份有限公司 | Video keyword determination method, video retrieval method, video keyword determination device, video retrieval device, storage medium and terminal |
CA3111455C (en) * | 2018-09-12 | 2023-05-09 | Avigilon Coporation | System and method for improving speed of similarity based searches |
WO2020065839A1 (en) * | 2018-09-27 | 2020-04-02 | 株式会社オプティム | Object situation assessment system, object situation assessment method, and program |
CN109933688A (en) * | 2019-02-13 | 2019-06-25 | 北京百度网讯科技有限公司 | Determine the method, apparatus, equipment and computer storage medium of video labeling information |
CN110110140A (en) * | 2019-04-19 | 2019-08-09 | 天津大学 | Video summarization method based on attention expansion coding and decoding network |
CN110362694A (en) * | 2019-07-05 | 2019-10-22 | 武汉莱博信息技术有限公司 | Data in literature search method, equipment and readable storage medium storing program for executing based on artificial intelligence |
WO2021171099A2 (en) * | 2020-02-28 | 2021-09-02 | Lomotif Private Limited | Method for atomically tracking and storing video segments in multi-segment audio-video compositions |
US11645733B2 (en) | 2020-06-16 | 2023-05-09 | Bank Of America Corporation | System and method for providing artificial intelligence architectures to people with disabilities |
CN112015926B (en) * | 2020-08-27 | 2022-03-04 | 北京字节跳动网络技术有限公司 | Search result display method and device, readable medium and electronic equipment |
US11829413B1 (en) * | 2020-09-23 | 2023-11-28 | Amazon Technologies, Inc. | Temporal localization of mature content in long-form videos using only video-level labels |
CN112733779B (en) | 2021-01-19 | 2023-04-07 | 三星电子(中国)研发中心 | Video poster display method and system based on artificial intelligence |
US12022138B2 (en) | 2021-06-21 | 2024-06-25 | Tubi, Inc. | Model serving for advanced frequency management |
CN117643061A (en) * | 2021-07-23 | 2024-03-01 | 聚好看科技股份有限公司 | Display equipment and media asset content recommendation method |
US12105755B1 (en) * | 2022-06-28 | 2024-10-01 | Amazon Technologies, Inc. | Automated content filtering using image retrieval models |
JP7645312B2 (en) | 2023-07-07 | 2025-03-13 | 株式会社Zozo | Information processing device, information processing method, and information processing program |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020164070A1 (en) * | 2001-03-14 | 2002-11-07 | Kuhner Mark B. | Automatic algorithm generation |
US20030097301A1 (en) * | 2001-11-21 | 2003-05-22 | Masahiro Kageyama | Method for exchange information based on computer network |
US6574378B1 (en) * | 1999-01-22 | 2003-06-03 | Kent Ridge Digital Labs | Method and apparatus for indexing and retrieving images using visual keywords |
US20030103565A1 (en) * | 2001-12-05 | 2003-06-05 | Lexing Xie | Structural analysis of videos with hidden markov models and dynamic programming |
US20050267879A1 (en) * | 1999-01-29 | 2005-12-01 | Shunichi Sekiguchi | Method of image feature coding and method of image search |
US20060179051A1 (en) * | 2005-02-09 | 2006-08-10 | Battelle Memorial Institute | Methods and apparatus for steering the analyses of collections of documents |
US20060179454A1 (en) * | 2002-04-15 | 2006-08-10 | Shusman Chad W | Method and apparatus for internet-based interactive programming |
US20070067724A1 (en) * | 1998-12-28 | 2007-03-22 | Yasushi Takahashi | Video information editing method and editing device |
US20070094251A1 (en) * | 2005-10-21 | 2007-04-26 | Microsoft Corporation | Automated rich presentation of a semantic topic |
Family Cites Families (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7263659B2 (en) * | 1998-09-09 | 2007-08-28 | Ricoh Company, Ltd. | Paper-based interface for multimedia information |
US20040125877A1 (en) * | 2000-07-17 | 2004-07-01 | Shin-Fu Chang | Method and system for indexing and content-based adaptive streaming of digital video content |
CN1647070A (en) * | 2001-06-22 | 2005-07-27 | 诺萨·欧莫贵 | System and method for knowledge retrieval, management, delivery and presentation |
US8682097B2 (en) * | 2006-02-14 | 2014-03-25 | DigitalOptics Corporation Europe Limited | Digital image enhancement with reference images |
WO2005076594A1 (en) * | 2004-02-06 | 2005-08-18 | Agency For Science, Technology And Research | Automatic video event detection and indexing |
US8156427B2 (en) * | 2005-08-23 | 2012-04-10 | Ricoh Co. Ltd. | User interface for mixed media reality |
US7639387B2 (en) * | 2005-08-23 | 2009-12-29 | Ricoh Co., Ltd. | Authoring tools using a mixed media environment |
US8156176B2 (en) * | 2005-04-20 | 2012-04-10 | Say Media, Inc. | Browser based multi-clip video editing |
US7680853B2 (en) * | 2006-04-10 | 2010-03-16 | Microsoft Corporation | Clickable snippets in audio/video search results |
US20070255755A1 (en) * | 2006-05-01 | 2007-11-01 | Yahoo! Inc. | Video search engine using joint categorization of video clips and queries based on multiple modalities |
EP2049983A2 (en) * | 2006-08-07 | 2009-04-22 | Yeda Research And Development Co. Ltd. | Data similarity and importance using local and global evidence scores |
US20080120291A1 (en) * | 2006-11-20 | 2008-05-22 | Rexee, Inc. | Computer Program Implementing A Weight-Based Search |
US7840076B2 (en) * | 2006-11-22 | 2010-11-23 | Intel Corporation | Methods and apparatus for retrieving images from a large collection of images |
US20080154889A1 (en) * | 2006-12-22 | 2008-06-26 | Pfeiffer Silvia | Video searching engine and methods |
KR100856027B1 (en) * | 2007-01-09 | 2008-09-03 | 주식회사 태그스토리 | Copyrighted video data service system and method |
CN100461182C (en) * | 2007-05-24 | 2009-02-11 | 北京交通大学 | An Interactive Video Search Method Based on Multi-view |
US8358840B2 (en) * | 2007-07-16 | 2013-01-22 | Alexander Bronstein | Methods and systems for representation and matching of video content |
US8806320B1 (en) * | 2008-07-28 | 2014-08-12 | Cut2It, Inc. | System and method for dynamic and automatic synchronization and manipulation of real-time and on-line streaming media |
US20090263014A1 (en) * | 2008-04-17 | 2009-10-22 | Yahoo! Inc. | Content fingerprinting for video and/or image |
US20090327236A1 (en) * | 2008-06-27 | 2009-12-31 | Microsoft Corporation | Visual query suggestions |
US9390169B2 (en) * | 2008-06-28 | 2016-07-12 | Apple Inc. | Annotation of movies |
US20100191689A1 (en) * | 2009-01-27 | 2010-07-29 | Google Inc. | Video content analysis for automatic demographics recognition of users and videos |
US8559720B2 (en) * | 2009-03-30 | 2013-10-15 | Thomson Licensing S.A. | Using a video processing and text extraction method to identify video segments of interest |
US8873813B2 (en) * | 2012-09-17 | 2014-10-28 | Z Advanced Computing, Inc. | Application of Z-webs and Z-factors to analytics, search engine, learning, recognition, natural language, and other utilities |
US9916538B2 (en) * | 2012-09-15 | 2018-03-13 | Z Advanced Computing, Inc. | Method and system for feature detection |
US8983192B2 (en) * | 2011-11-04 | 2015-03-17 | Google Inc. | High-confidence labeling of video volumes in a video sharing service |
US9070046B2 (en) * | 2012-10-17 | 2015-06-30 | Microsoft Technology Licensing, Llc | Learning-based image webpage index selection |
US9779304B2 (en) * | 2015-08-11 | 2017-10-03 | Google Inc. | Feature-based video annotation |
- 2009-08-24 US US12/546,436 patent/US20110047163A1/en not_active Abandoned
- 2010-08-18 AU AU2010286797A patent/AU2010286797A1/en not_active Abandoned
- 2010-08-18 CA CA2771593A patent/CA2771593C/en active Active
- 2010-08-18 EP EP10812505.5A patent/EP2471026B1/en active Active
- 2010-08-18 WO PCT/US2010/045909 patent/WO2011025701A1/en active Application Filing
- 2010-08-18 CN CN201080042760.9A patent/CN102549603B/en active Active
- 2010-08-18 EP EP18161198.9A patent/EP3352104A1/en active Pending
- 2015-04-15 US US14/687,116 patent/US10614124B2/en active Active
- 2016-04-04 AU AU2016202074A patent/AU2016202074B2/en active Active
- 2018-03-06 AU AU2018201624A patent/AU2018201624B2/en active Active
- 2018-08-10 US US16/100,414 patent/US11017025B2/en active Active
- 2021-05-24 US US17/328,442 patent/US11693902B2/en active Active
- 2023-05-22 US US18/321,225 patent/US20230306057A1/en active Pending
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070067724A1 (en) * | 1998-12-28 | 2007-03-22 | Yasushi Takahashi | Video information editing method and editing device |
US6574378B1 (en) * | 1999-01-22 | 2003-06-03 | Kent Ridge Digital Labs | Method and apparatus for indexing and retrieving images using visual keywords |
US20050267879A1 (en) * | 1999-01-29 | 2005-12-01 | Shunichi Sekiguchi | Method of image feature coding and method of image search |
US20020164070A1 (en) * | 2001-03-14 | 2002-11-07 | Kuhner Mark B. | Automatic algorithm generation |
US20030097301A1 (en) * | 2001-11-21 | 2003-05-22 | Masahiro Kageyama | Method for exchange information based on computer network |
US20030103565A1 (en) * | 2001-12-05 | 2003-06-05 | Lexing Xie | Structural analysis of videos with hidden markov models and dynamic programming |
US20060179454A1 (en) * | 2002-04-15 | 2006-08-10 | Shusman Chad W | Method and apparatus for internet-based interactive programming |
US20060179051A1 (en) * | 2005-02-09 | 2006-08-10 | Battelle Memorial Institute | Methods and apparatus for steering the analyses of collections of documents |
US20070094251A1 (en) * | 2005-10-21 | 2007-04-26 | Microsoft Corporation | Automated rich presentation of a semantic topic |
Cited By (185)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11442820B2 (en) | 2005-12-19 | 2022-09-13 | Commvault Systems, Inc. | Systems and methods of unified reconstruction in storage systems |
US20160335362A1 (en) * | 2008-05-26 | 2016-11-17 | Kenshoo Ltd. | System for finding website invitation cueing keywords and for atrribute-based generation of invitation-cueing instructions |
US8111923B2 (en) * | 2008-08-14 | 2012-02-07 | Xerox Corporation | System and method for object class localization and semantic class based image segmentation |
US20100040285A1 (en) * | 2008-08-14 | 2010-02-18 | Xerox Corporation | System and method for object class localization and semantic class based image segmentation |
US9477667B2 (en) * | 2010-01-14 | 2016-10-25 | Mobdub, Llc | Crowdsourced multi-media data relationships |
US20110173214A1 (en) * | 2010-01-14 | 2011-07-14 | Mobdub, Llc | Crowdsourced multi-media data relationships |
US9813707B2 (en) | 2010-01-22 | 2017-11-07 | Thomson Licensing Dtv | Data pruning for video compression using example-based super-resolution |
US9602814B2 (en) | 2010-01-22 | 2017-03-21 | Thomson Licensing | Methods and apparatus for sampling-based super resolution video encoding and decoding |
US20110218994A1 (en) * | 2010-03-05 | 2011-09-08 | International Business Machines Corporation | Keyword automation of video content |
US20110225133A1 (en) * | 2010-03-09 | 2011-09-15 | Microsoft Corporation | Metadata-aware search engine |
US20110229017A1 (en) * | 2010-03-18 | 2011-09-22 | Yuan Liu | Annotation addition method, annotation addition system using the same, and machine-readable medium |
US8737771B2 (en) * | 2010-03-18 | 2014-05-27 | Ricoh Company, Ltd. | Annotation addition method, annotation addition system using the same, and machine-readable medium |
US9544598B2 (en) | 2010-09-10 | 2017-01-10 | Thomson Licensing | Methods and apparatus for pruning decision optimization in example-based data pruning compression |
US20130163679A1 (en) * | 2010-09-10 | 2013-06-27 | Dong-Qing Zhang | Video decoding using example-based data pruning |
US9338477B2 (en) | 2010-09-10 | 2016-05-10 | Thomson Licensing | Recovering a pruned version of a picture in a video sequence for example-based data pruning using intra-frame patch similarity |
US8966515B2 (en) | 2010-11-08 | 2015-02-24 | Sony Corporation | Adaptable videolens media engine |
US9734407B2 (en) | 2010-11-08 | 2017-08-15 | Sony Corporation | Videolens media engine |
US8959071B2 (en) * | 2010-11-08 | 2015-02-17 | Sony Corporation | Videolens media system for feature selection |
US8971651B2 (en) | 2010-11-08 | 2015-03-03 | Sony Corporation | Videolens media engine |
US9594959B2 (en) | 2010-11-08 | 2017-03-14 | Sony Corporation | Videolens media engine |
US20120117046A1 (en) * | 2010-11-08 | 2012-05-10 | Sony Corporation | Videolens media system for feature selection |
US9715641B1 (en) * | 2010-12-08 | 2017-07-25 | Google Inc. | Learning highlights using event detection |
US20170323178A1 (en) * | 2010-12-08 | 2017-11-09 | Google Inc. | Learning highlights using event detection |
US11556743B2 (en) * | 2010-12-08 | 2023-01-17 | Google Llc | Learning highlights using event detection |
US10867212B2 (en) * | 2010-12-08 | 2020-12-15 | Google Llc | Learning highlights using event detection |
US8856638B2 (en) * | 2011-01-03 | 2014-10-07 | Curt Evans | Methods and system for remote control for multimedia seeking |
US20140089799A1 (en) * | 2011-01-03 | 2014-03-27 | Curt Evans | Methods and system for remote control for multimedia seeking |
US11017488B2 (en) | 2011-01-03 | 2021-05-25 | Curtis Evans | Systems, methods, and user interface for navigating media playback using scrollable text |
US9870802B2 (en) | 2011-01-28 | 2018-01-16 | Apple Inc. | Media clip management |
US8838680B1 (en) | 2011-02-08 | 2014-09-16 | Google Inc. | Buffer objects for web-based configurable pipeline media processing |
US10324605B2 (en) | 2011-02-16 | 2019-06-18 | Apple Inc. | Media-editing application with novel editing tools |
US11157154B2 (en) | 2011-02-16 | 2021-10-26 | Apple Inc. | Media-editing application with novel editing tools |
US11747972B2 (en) | 2011-02-16 | 2023-09-05 | Apple Inc. | Media-editing application with novel editing tools |
US9997196B2 (en) | 2011-02-16 | 2018-06-12 | Apple Inc. | Retiming media presentations |
US20140032544A1 (en) * | 2011-03-23 | 2014-01-30 | Xilopix | Method for refining the results of a search within a database |
US8938393B2 (en) | 2011-06-28 | 2015-01-20 | Sony Corporation | Extended videolens media engine for audio recognition |
US8879835B2 (en) * | 2011-08-26 | 2014-11-04 | Adobe Systems Incorporated | Fast adaptive edge-aware matting |
US20130051663A1 (en) * | 2011-08-26 | 2013-02-28 | Aravind Krishnaswamy | Fast Adaptive Edge-Aware Matting |
US9536564B2 (en) | 2011-09-20 | 2017-01-03 | Apple Inc. | Role-facilitated editing operations |
US20130073961A1 (en) * | 2011-09-20 | 2013-03-21 | Giovanni Agnoli | Media Editing Application for Assigning Roles to Media Content |
US9240215B2 (en) | 2011-09-20 | 2016-01-19 | Apple Inc. | Editing operations facilitated by metadata |
US9075825B2 (en) * | 2011-09-26 | 2015-07-07 | The University Of Kansas | System and methods of integrating visual features with textual features for image searching |
US20130080426A1 (en) * | 2011-09-26 | 2013-03-28 | Xue-wen Chen | System and methods of integrating visual features and textual features for image searching |
US9098533B2 (en) * | 2011-10-03 | 2015-08-04 | Microsoft Technology Licensing, Llc | Voice directed context sensitive visual search |
US20130086105A1 (en) * | 2011-10-03 | 2013-04-04 | Microsoft Corporation | Voice directed context sensitive visual search |
US8649613B1 (en) * | 2011-11-03 | 2014-02-11 | Google Inc. | Multiple-instance-learning-based video classification |
CN102542066A (en) * | 2011-11-11 | 2012-07-04 | 冉阳 | Video clustering method, ordering method, video searching method and corresponding devices |
US10366169B2 (en) * | 2011-12-28 | 2019-07-30 | Intel Corporation | Real-time natural language processing of datastreams |
US20150019203A1 (en) * | 2011-12-28 | 2015-01-15 | Elliot Smith | Real-time natural language processing of datastreams |
US9710461B2 (en) * | 2011-12-28 | 2017-07-18 | Intel Corporation | Real-time natural language processing of datastreams |
US9846696B2 (en) * | 2012-02-29 | 2017-12-19 | Telefonaktiebolaget Lm Ericsson (Publ) | Apparatus and methods for indexing multimedia content |
US20130226930A1 (en) * | 2012-02-29 | 2013-08-29 | Telefonaktiebolaget L M Ericsson (Publ) | Apparatus and Methods For Indexing Multimedia Content |
US11709889B1 (en) * | 2012-03-16 | 2023-07-25 | Google Llc | Content keyword identification |
US12147480B2 (en) | 2012-03-16 | 2024-11-19 | Google Inc. | Content keyword identification |
US9633015B2 (en) | 2012-07-26 | 2017-04-25 | Telefonaktiebolaget Lm Ericsson (Publ) | Apparatus and methods for user generated content indexing |
US9292552B2 (en) * | 2012-07-26 | 2016-03-22 | Telefonaktiebolaget L M Ericsson (Publ) | Apparatus, methods, and computer program products for adaptive multimedia content indexing |
US20150193528A1 (en) * | 2012-08-08 | 2015-07-09 | Google Inc. | Identifying Textual Terms in Response to a Visual Query |
US9372920B2 (en) * | 2012-08-08 | 2016-06-21 | Google Inc. | Identifying textual terms in response to a visual query |
US20140095346A1 (en) * | 2012-09-28 | 2014-04-03 | International Business Machines Corporation | Data analysis method and system thereof |
US11222375B2 (en) * | 2012-09-28 | 2022-01-11 | International Business Machines Corporation | Data analysis method and system thereof |
US11176586B2 (en) * | 2012-09-28 | 2021-11-16 | International Business Machines Corporation | Data analysis method and system thereof |
US20140095345A1 (en) * | 2012-09-28 | 2014-04-03 | International Business Machines Corporation | Data analysis method and system thereof |
US9172740B1 (en) | 2013-01-15 | 2015-10-27 | Google Inc. | Adjustable buffer remote access |
US9311692B1 (en) | 2013-01-25 | 2016-04-12 | Google Inc. | Scalable buffer remote access |
US9225979B1 (en) | 2013-01-30 | 2015-12-29 | Google Inc. | Remote access encoding |
US10445367B2 (en) | 2013-05-14 | 2019-10-15 | Telefonaktiebolaget Lm Ericsson (Publ) | Search engine for textual content and non-textual content |
US9521189B2 (en) * | 2013-08-21 | 2016-12-13 | Google Inc. | Providing contextual data for selected link units |
US20150058358A1 (en) * | 2013-08-21 | 2015-02-26 | Google Inc. | Providing contextual data for selected link units |
US10289810B2 (en) | 2013-08-29 | 2019-05-14 | Telefonaktiebolaget Lm Ericsson (Publ) | Method, content owner device, computer program, and computer program product for distributing content items to authorized users |
US10311038B2 (en) | 2013-08-29 | 2019-06-04 | Telefonaktiebolaget Lm Ericsson (Publ) | Methods, computer program, computer program product and indexing systems for indexing or updating index |
US20150120726A1 (en) * | 2013-10-30 | 2015-04-30 | Texas Instruments Incorporated | Using Audio Cues to Improve Object Retrieval in Video |
US10108617B2 (en) * | 2013-10-30 | 2018-10-23 | Texas Instruments Incorporated | Using audio cues to improve object retrieval in video |
US9189834B2 (en) | 2013-11-14 | 2015-11-17 | Adobe Systems Incorporated | Adaptive denoising with internal and external patches |
US9286540B2 (en) * | 2013-11-20 | 2016-03-15 | Adobe Systems Incorporated | Fast dense patch search and quantization |
US20150139557A1 (en) * | 2013-11-20 | 2015-05-21 | Adobe Systems Incorporated | Fast dense patch search and quantization |
US11445007B2 (en) | 2014-01-25 | 2022-09-13 | Q Technologies, Inc. | Systems and methods for content sharing using uniquely generated identifiers |
US11991239B2 (en) | 2014-01-25 | 2024-05-21 | Q Technologies, Inc. | Systems and methods for authorized, proximal device to device communication without prior pairing within a controlled computing system |
US20150235672A1 (en) * | 2014-02-20 | 2015-08-20 | International Business Machines Corporation | Techniques to Bias Video Thumbnail Selection Using Frequently Viewed Segments |
US9728230B2 (en) * | 2014-02-20 | 2017-08-08 | International Business Machines Corporation | Techniques to bias video thumbnail selection using frequently viewed segments |
US9779775B2 (en) | 2014-02-24 | 2017-10-03 | Lyve Minds, Inc. | Automatic generation of compilation videos from an original video based on metadata associated with the original video |
WO2015127385A1 (en) * | 2014-02-24 | 2015-08-27 | Lyve Minds, Inc. | Automatic generation of compilation videos |
US9767540B2 (en) | 2014-05-16 | 2017-09-19 | Adobe Systems Incorporated | Patch partitions and image processing |
US9978129B2 (en) | 2014-05-16 | 2018-05-22 | Adobe Systems Incorporated | Patch partitions and image processing |
US20210166035A1 (en) * | 2014-09-08 | 2021-06-03 | Google Llc | Selecting and presenting representative frames for video previews |
US12014542B2 (en) * | 2014-09-08 | 2024-06-18 | Google Llc | Selecting and presenting representative frames for video previews |
US10867183B2 (en) | 2014-09-08 | 2020-12-15 | Google Llc | Selecting and presenting representative frames for video previews |
US10318575B2 (en) * | 2014-11-14 | 2019-06-11 | Zorroa Corporation | Systems and methods of building and using an image catalog |
US11017018B2 (en) * | 2014-11-14 | 2021-05-25 | Zorroa Corporation | Systems and methods of building and using an image catalog |
US10074102B2 (en) * | 2014-11-26 | 2018-09-11 | Adobe Systems Incorporated | Providing alternate words to aid in drafting effective social media posts |
US20160147760A1 (en) * | 2014-11-26 | 2016-05-26 | Adobe Systems Incorporated | Providing alternate words to aid in drafting effective social media posts |
US20160180881A1 (en) * | 2014-12-19 | 2016-06-23 | Oracle International Corporation | Video storytelling based on conditions determined from a business object |
US20180096707A1 (en) * | 2014-12-19 | 2018-04-05 | Oracle International Corporation | Video storytelling based on conditions determined from a business object |
US9847101B2 (en) * | 2014-12-19 | 2017-12-19 | Oracle International Corporation | Video storytelling based on conditions determined from a business object |
US10347291B2 (en) * | 2014-12-19 | 2019-07-09 | Oracle International Corporation | Video storytelling based on conditions determined from a business object |
US10095786B2 (en) * | 2015-04-09 | 2018-10-09 | Oath Inc. | Topical based media content summarization system and method |
US10769208B2 (en) | 2015-04-09 | 2020-09-08 | Oath Inc. | Topical-based media content summarization system and method |
US20160299968A1 (en) * | 2015-04-09 | 2016-10-13 | Yahoo! Inc. | Topical based media content summarization system and method |
WO2016210268A1 (en) * | 2015-06-24 | 2016-12-29 | Google Inc. | Selecting representative video frames for videos |
US10331984B2 (en) | 2015-06-25 | 2019-06-25 | The Nielsen Company (Us), Llc | Methods and apparatus for identifying objects depicted in a video using extracted video frames in combination with a reverse image search engine |
US11417074B2 (en) | 2015-06-25 | 2022-08-16 | The Nielsen Company (Us), Llc | Methods and apparatus for identifying objects depicted in a video using extracted video frames in combination with a reverse image search engine |
US10062015B2 (en) | 2015-06-25 | 2018-08-28 | The Nielsen Company (Us), Llc | Methods and apparatus for identifying objects depicted in a video using extracted video frames in combination with a reverse image search engine |
US10984296B2 (en) | 2015-06-25 | 2021-04-20 | The Nielsen Company (Us), Llc | Methods and apparatus for identifying objects depicted in a video using extracted video frames in combination with a reverse image search engine |
US20170011068A1 (en) * | 2015-07-07 | 2017-01-12 | Adobe Systems Incorporated | Extrapolative Search Techniques |
US10242033B2 (en) * | 2015-07-07 | 2019-03-26 | Adobe Inc. | Extrapolative search techniques |
US20170011643A1 (en) * | 2015-07-10 | 2017-01-12 | Fujitsu Limited | Ranking of segments of learning materials |
US10140880B2 (en) * | 2015-07-10 | 2018-11-27 | Fujitsu Limited | Ranking of segments of learning materials |
US10115433B2 (en) * | 2015-09-09 | 2018-10-30 | A9.Com, Inc. | Section identification in video content |
CN105488183A (en) * | 2015-12-01 | 2016-04-13 | 北京邮电大学世纪学院 | Method and apparatus for mining temporal-spatial correlation relationship among grotto frescoes in grotto fresco group |
US10592750B1 (en) * | 2015-12-21 | 2020-03-17 | Amazon Technlogies, Inc. | Video rule engine |
US10566009B1 (en) | 2015-12-23 | 2020-02-18 | Google Llc | Audio classifier |
US10381022B1 (en) * | 2015-12-23 | 2019-08-13 | Google Llc | Audio classifier |
US10678853B2 (en) * | 2015-12-30 | 2020-06-09 | International Business Machines Corporation | Aligning visual content to search term queries |
US20170243065A1 (en) * | 2016-02-19 | 2017-08-24 | Samsung Electronics Co., Ltd. | Electronic device and video recording method thereof |
US10891019B2 (en) * | 2016-02-29 | 2021-01-12 | Huawei Technologies Co., Ltd. | Dynamic thumbnail selection for search results |
US10628677B2 (en) * | 2016-03-14 | 2020-04-21 | Tencent Technology (Shenzhen) Company Limited | Partner matching method in costarring video, terminal, and computer readable storage medium |
US10108709B1 (en) | 2016-04-11 | 2018-10-23 | Digital Reasoning Systems, Inc. | Systems and methods for queryable graph representations of videos |
US9858340B1 (en) | 2016-04-11 | 2018-01-02 | Digital Reasoning Systems, Inc. | Systems and methods for queryable graph representations of videos |
CN107463592A (en) * | 2016-06-06 | 2017-12-12 | 百度(美国)有限责任公司 | For by the method, equipment and data handling system of content item and images match |
US10008218B2 (en) | 2016-08-03 | 2018-06-26 | Dolby Laboratories Licensing Corporation | Blind bandwidth extension using K-means and a support vector machine |
US11151168B2 (en) | 2016-08-09 | 2021-10-19 | Zorroa Corporation | Hierarchical search folders for a document repository |
US10311112B2 (en) | 2016-08-09 | 2019-06-04 | Zorroa Corporation | Linearized search of visual media |
US10467257B2 (en) | 2016-08-09 | 2019-11-05 | Zorroa Corporation | Hierarchical search folders for a document repository |
US10664514B2 (en) | 2016-09-06 | 2020-05-26 | Zorroa Corporation | Media search processing using partial schemas |
US10645142B2 (en) * | 2016-09-20 | 2020-05-05 | Facebook, Inc. | Video keyframes display on online social networks |
US20180084023A1 (en) * | 2016-09-20 | 2018-03-22 | Facebook, Inc. | Video Keyframes Display on Online Social Networks |
CN107870959A (en) * | 2016-09-23 | 2018-04-03 | Adobe Inc. | Providing relevant video scenes in response to a video search query |
EP3327590A1 (en) * | 2016-11-29 | 2018-05-30 | Beijing Xiaomi Mobile Software Co., Ltd. | Method and device for adjusting video playback position |
US10141025B2 (en) | 2016-11-29 | 2018-11-27 | Beijing Xiaomi Mobile Software Co., Ltd. | Method, device and computer-readable medium for adjusting video playing progress |
US11468105B1 (en) | 2016-12-08 | 2022-10-11 | Okta, Inc. | System for routing of requests |
US11928139B2 (en) | 2016-12-08 | 2024-03-12 | Townsend Street Labs, Inc. | System for routing of requests |
CN108205581A (en) * | 2016-12-20 | 2018-06-26 | Adobe Inc. | Generating compact video feature representations in a digital media environment |
US10540444B2 (en) * | 2017-06-20 | 2020-01-21 | The Boeing Company | Text mining a dataset of electronic documents to discover terms of interest |
US20180365216A1 (en) * | 2017-06-20 | 2018-12-20 | The Boeing Company | Text mining a dataset of electronic documents to discover terms of interest |
US20200012969A1 (en) * | 2017-07-19 | 2020-01-09 | Alibaba Group Holding Limited | Model training method, apparatus, and device, and data similarity determining method, apparatus, and device |
US20190026367A1 (en) * | 2017-07-24 | 2019-01-24 | International Business Machines Corporation | Navigating video scenes using cognitive insights |
US10970334B2 (en) * | 2017-07-24 | 2021-04-06 | International Business Machines Corporation | Navigating video scenes using cognitive insights |
JP2020528705A (en) * | 2017-07-24 | 2020-09-24 | International Business Machines Corporation | Navigating video scenes using cognitive insights |
JP7123122B2 | 2017-07-24 | 2022-08-22 | Kyndryl, Inc. | Navigating Video Scenes Using Cognitive Insights |
CN109598527A (en) * | 2017-09-30 | 2019-04-09 | Beijing Gridsum Technology Co., Ltd. | Advertising effectiveness analysis method and device |
US12185019B2 (en) | 2017-12-20 | 2024-12-31 | Hisense Visual Technology Co., Ltd. | Smart television and method for displaying graphical user interface of television screen shot |
US10372991B1 (en) * | 2018-04-03 | 2019-08-06 | Google Llc | Systems and methods that leverage deep learning to selectively store audiovisual content |
US11509957B2 (en) | 2018-05-21 | 2022-11-22 | Hisense Visual Technology Co., Ltd. | Display apparatus with intelligent user interface |
US12126866B2 (en) | 2018-05-21 | 2024-10-22 | Hisense Visual Technology Co., Ltd. | Display apparatus with intelligent user interface |
US20190354608A1 (en) * | 2018-05-21 | 2019-11-21 | Qingdao Hisense Electronics Co., Ltd. | Display apparatus with intelligent user interface |
US11507619B2 (en) | 2018-05-21 | 2022-11-22 | Hisense Visual Technology Co., Ltd. | Display apparatus with intelligent user interface |
US11706489B2 (en) | 2018-05-21 | 2023-07-18 | Hisense Visual Technology Co., Ltd. | Display apparatus with intelligent user interface |
CN109089133A (en) * | 2018-08-07 | 2018-12-25 | Beijing Sensetime Technology Development Co., Ltd. | Video processing method and device, electronic device, and storage medium |
US11120078B2 (en) | 2018-08-07 | 2021-09-14 | Beijing Sensetime Technology Development Co., Ltd. | Method and device for video processing, electronic device, and storage medium |
US11947591B2 (en) | 2018-09-18 | 2024-04-02 | Google Llc | Methods and systems for processing imagery |
WO2020060538A1 (en) * | 2018-09-18 | 2020-03-26 | Google Llc | Methods and systems for processing imagery |
EP4002160A1 (en) * | 2018-09-18 | 2022-05-25 | Google LLC | Methods and systems for processing imagery |
JP2021516832A (en) * | 2018-09-18 | 2021-07-08 | Google LLC | Methods and systems for processing images |
CN109376145A (en) * | 2018-11-19 | 2019-02-22 | Shenzhen TCL New Technology Co., Ltd. | Method and device for building a movie dialogue database, and storage medium |
US20210224321A1 (en) * | 2018-11-20 | 2021-07-22 | Google Llc | Methods, systems, and media for modifying search results based on search query risk |
US11609949B2 (en) * | 2018-11-20 | 2023-03-21 | Google Llc | Methods, systems, and media for modifying search results based on search query risk |
US11250039B1 (en) * | 2018-12-06 | 2022-02-15 | A9.Com, Inc. | Extreme multi-label classification |
US11803556B1 (en) * | 2018-12-10 | 2023-10-31 | Townsend Street Labs, Inc. | System for handling workplace queries using online learning to rank |
US11758088B2 (en) * | 2019-04-08 | 2023-09-12 | Baidu.Com Times Technology (Beijing) Co., Ltd. | Method and apparatus for aligning paragraph and video |
US20200322570A1 (en) * | 2019-04-08 | 2020-10-08 | Baidu.Com Times Technology (Beijing) Co., Ltd. | Method and apparatus for aligning paragraph and video |
CN110381368A (en) * | 2019-07-11 | 2019-10-25 | Beijing ByteDance Network Technology Co., Ltd. | Video cover generation method and device, and electronic device |
US11531707B1 (en) | 2019-09-26 | 2022-12-20 | Okta, Inc. | Personalized search based on account attributes |
US11500927B2 (en) * | 2019-10-03 | 2022-11-15 | Adobe Inc. | Adaptive search results for multimedia search queries |
US11941049B2 (en) | 2019-10-03 | 2024-03-26 | Adobe Inc. | Adaptive search results for multimedia search queries |
US11128910B1 (en) | 2020-02-27 | 2021-09-21 | Rovi Guides, Inc. | Systems and methods for generating dynamic annotations |
US11606613B2 (en) | 2020-02-27 | 2023-03-14 | Rovi Guides, Inc. | Systems and methods for generating dynamic annotations |
WO2021173219A1 (en) * | 2020-02-27 | 2021-09-02 | Rovi Guides, Inc. | Systems and methods for generating dynamic annotations |
US12108114B2 (en) | 2020-02-27 | 2024-10-01 | Rovi Guides, Inc. | Systems and methods for generating dynamic annotations |
CN111432282A (en) * | 2020-04-01 | 2020-07-17 | Tencent Technology (Shenzhen) Company Limited | Video recommendation method and device |
US12230252B2 (en) * | 2020-06-09 | 2025-02-18 | Google Llc | Generation of interactive audio tracks from visual content |
US20220157300A1 (en) * | 2020-06-09 | 2022-05-19 | Google Llc | Generation of interactive audio tracks from visual content |
US20220114361A1 (en) * | 2020-10-14 | 2022-04-14 | Adobe Inc. | Multi-word concept tagging for images using short text decoder |
CN112399262A (en) * | 2020-10-30 | 2021-02-23 | Shenzhen TCL New Technology Co., Ltd. | Video search method, television, and storage medium |
EP3872652B1 (en) * | 2020-12-17 | 2023-12-20 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for processing video, electronic device, medium and product |
US11856277B2 (en) | 2020-12-17 | 2023-12-26 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for processing video, electronic device, medium and product |
US12131503B1 (en) * | 2020-12-21 | 2024-10-29 | Aurora Operations, Inc | Virtual creation and visualization of physically based virtual materials |
CN113821657A (en) * | 2021-06-10 | 2021-12-21 | Tencent Technology (Shenzhen) Company Limited | Image processing model training method and image processing method based on artificial intelligence |
US11532111B1 (en) * | 2021-06-10 | 2022-12-20 | Amazon Technologies, Inc. | Systems and methods for generating comic books from video and images |
CN113378781A (en) * | 2021-06-30 | 2021-09-10 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Training method and device for a video feature extraction model, and electronic device |
CN113901263A (en) * | 2021-09-30 | 2022-01-07 | Suqian Silicon Intelligence Technology Co., Ltd. | Label generation method and device for video material |
CN116150428A (en) * | 2021-11-16 | 2023-05-23 | Tencent Technology (Shenzhen) Company Limited | Video tag acquisition method and device, electronic device, and storage medium |
US20230222129A1 (en) * | 2022-01-11 | 2023-07-13 | International Business Machines Corporation | Measuring relevance of datasets to a data science model |
US11893032B2 (en) * | 2022-01-11 | 2024-02-06 | International Business Machines Corporation | Measuring relevance of datasets to a data science model |
WO2024249138A1 (en) * | 2023-05-31 | 2024-12-05 | Microsoft Technology Licensing, Llc | Historical data-based video categorizer |
US12197501B2 (en) | 2023-05-31 | 2025-01-14 | Microsoft Technology Licensing, Llc | Historical data-based video categorizer |
CN118568279A (en) * | 2024-07-31 | 2024-08-30 | Shenzhen Taixun Digital Co., Ltd. | Intelligent storage system and display terminal for photography and film creation |
Also Published As
Publication number | Publication date |
---|---|
US20180349391A1 (en) | 2018-12-06 |
AU2018201624B2 (en) | 2019-11-21 |
US10614124B2 (en) | 2020-04-07 |
US11017025B2 (en) | 2021-05-25 |
EP3352104A1 (en) | 2018-07-25 |
AU2010286797A1 (en) | 2012-03-15 |
CA2771593C (en) | 2018-10-30 |
CN102549603B (en) | 2015-05-06 |
CN102549603A (en) | 2012-07-04 |
AU2016202074A1 (en) | 2016-04-28 |
EP2471026A1 (en) | 2012-07-04 |
AU2018201624A1 (en) | 2018-03-29 |
US20150220543A1 (en) | 2015-08-06 |
US20230306057A1 (en) | 2023-09-28 |
WO2011025701A1 (en) | 2011-03-03 |
US11693902B2 (en) | 2023-07-04 |
EP2471026A4 (en) | 2014-03-12 |
AU2016202074B2 (en) | 2017-12-07 |
US20210349944A1 (en) | 2021-11-11 |
EP2471026B1 (en) | 2018-04-11 |
CA2771593A1 (en) | 2011-03-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11693902B2 (en) | | Relevance-based image selection |
US10922350B2 (en) | | Associating still images and videos |
CA2817103C (en) | | Learning tags for video annotation using latent subtags |
US8396286B1 (en) | | Learning concepts for video annotation |
US20180293313A1 (en) | | Video content retrieval system |
US20220237247A1 (en) | | Selecting content objects for recommendation based on content object collections |
US8706655B1 (en) | | Machine learned classifiers for rating the content quality in videos using panels of human viewers |
US20140212106A1 (en) | | Music soundtrack recommendation engine for videos |
US20140029801A1 (en) | | In-Video Product Annotation with Web Information Mining |
WO2016038522A1 (en) | | Selecting and presenting representative frames for video previews |
Ulges et al. | | A system that learns to tag videos by watching youtube |
US8880534B1 (en) | | Video classification boosting |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
 | AS | Assignment | Owner name: GOOGLE LLC, CALIFORNIA; Free format text: CHANGE OF NAME; ASSIGNOR: GOOGLE INC.; REEL/FRAME: 044142/0357; Effective date: 20170929 |