Summary of the invention
To address the low efficiency of content-based video retrieval, an embodiment of the invention provides a content-based video retrieval method. The method is applied to a content-addressable network in which a plurality of servers are deployed on the nodes of the network's Cartesian coordinate space, and comprises:
An acquisition module obtains the feature vector of a video frame to be detected;
A lookup module, according to the index number of the sensitive video frame feature vector to be compared in the video fingerprint library and a predefined transformation rule, finds, among the Cartesian coordinates of the servers in the content-addressable network, the Cartesian coordinates of the server that stores the sensitive video frame feature vector to be compared;
A sending module sends the feature vector of the video frame to be detected to the retrieval module of the corresponding server, according to the Cartesian coordinates found;
A retrieval module judges the similarity between the feature vector of the video frame to be detected and the sensitive video frame feature vector to be compared, and determines the matching sensitive video feature vector.
An embodiment of the invention also provides a content-based video retrieval system, comprising:
A video fingerprint library: used to store sensitive video frame feature vectors with index numbers; the video fingerprint library is evenly distributed over a plurality of servers on the nodes of the Cartesian coordinate space of a content-addressable network;
An acquisition module: used to obtain the feature vector of a video frame to be detected;
A lookup module: used to find, among the Cartesian coordinates of the servers, the Cartesian coordinates of the server that stores the sensitive video frame feature vector to be compared, according to the index number of that feature vector in the video fingerprint library and a predefined transformation rule;
A sending module: used to send the feature vector of the video frame to be detected to the retrieval module of the corresponding server, according to the Cartesian coordinates found;
A retrieval module: used to judge the similarity between the feature vector of the video frame to be detected and the sensitive video frame feature vector to be compared, and to determine the matching sensitive video feature vector.
As can be seen from the specific embodiments provided by the invention, the video fingerprint library is organized with a content-addressable network and indexed. Once a video sample to be detected is obtained, a search algorithm finds the best-matching sensitive video feature vectors in the fingerprint library, which improves the efficiency of video retrieval.
Embodiment
To address the low efficiency of content-based video retrieval, an embodiment of the invention provides a content-based video retrieval method that improves retrieval efficiency and meets the needs of large-scale network video retrieval. When a user submits a video sample to be tested, a search algorithm finds the matching sensitive video frames and the corresponding sensitive video segments in the fingerprint library, and the similar sensitive video segments are returned to the user. A video fingerprint here means a video frame feature vector extracted from the original video data, which can represent the content of that video.
Among indexing methods, the vector approximation method can solve the exact nearest-neighbor retrieval problem, whereas the other methods target only approximate nearest-neighbor retrieval. Because a video fingerprint is itself an approximate representation of video content, the nearest neighbor of a feature vector in feature space is not necessarily the nearest neighbor in terms of video content, so even exact nearest-neighbor retrieval does not guarantee the most accurate query result. Moreover, in many cases a suitable approximate query algorithm returns the same results as an exact search algorithm with higher efficiency. Video retrieval needs a balance between precision and efficiency. Particularly where the data scale is large and the response time requirement is strict, approximate nearest-neighbor retrieval plays a more important role, so the LSH (Locality Sensitive Hashing) algorithm is adopted as the preferred scheme.
The LSH algorithm was first proposed by Indyk and Motwani. Using statistical theory, it can answer k-nearest-neighbor queries quickly while guaranteeing a certain accuracy (in a probabilistic sense). The paper "Similarity Search in High Dimensions via Hashing" gives the concrete implementation steps of the algorithm. Its basic idea is as follows: for a set of data points, a group of hash functions satisfying certain constraints is used to build multiple hash tables, such that under a given similarity measure similar points collide with relatively high probability and dissimilar points collide with relatively low probability.
The first embodiment provided by the invention is a content-based video retrieval method. In this embodiment the LSH algorithm is used to build the index. An LSH function family is defined as follows: given a set of hash functions $H = \{h_1, \ldots, h_m\}$, where m is a positive integer, for any data points p and q, if the distance $D(p, q) < R$ then $P[h_i(q) = h_i(p)] > P_1$, and if $D(p, q) > cR$ then $P[h_i(q) = h_i(p)] < P_2$. Here $P(\cdot)$ is a probability function, $P_1$ and $P_2$ are given probabilities with $P_1 > P_2$, and i is chosen at random from $\{1, \ldots, m\}$. Such a set of hash functions is called an LSH function family with parameters $(R, cR, P_1, P_2)$. The data points p and q correspond to different sensitive video frame feature vectors in this embodiment.
The LSH function is:
$$h_{a,b}(v) = \left\lfloor \frac{a \cdot v + b}{w} \right\rfloor$$
where the entries of the vector $a$ follow a normal (Gaussian) distribution, w is a real number, and b is a real number in [0, w].
The method flow, shown in Figure 1, comprises:
Step 101: Map the n sensitive video frame feature vectors into L hash tables g.
The video fingerprint library contains n sensitive video frame feature vectors. The LSH algorithm is used to build the index: L hash functions g(.) map the n sensitive video frame feature vectors into L hash tables g. For example, suppose there are 10 sensitive video frame feature vectors, of which five are mapped into hash table g1 and the remaining five into hash table g2. With different hash functions g(.), the feature vectors mapped into each hash table may differ: a different five may be mapped into hash table g1 and the remaining five into hash table g2.
Step 102: The sensitive video frame feature vectors in each hash table g are hashed by the LSH functions, and the result is hashed a second time, so that the sensitive video frame feature vectors in each hash table are mapped into a plurality of hash buckets.
Here hash table $g_j = [h_1^{(j)}, \ldots, h_k^{(j)}]$ ($j = 1, 2, \ldots, L$), with $h_i(\cdot) \in H$ ($i = 1, 2, \ldots, k$), where H is the LSH function family, that is, the set of hash functions $H = \{h_1, \ldots, h_m\}$. For example, the sensitive video frame feature vectors in hash table g1 undergo the secondary hash and are mapped into 7 hash buckets. Hash table $g_1 = [h_1^{(1)}, h_2^{(1)}, h_3^{(1)}, h_4^{(1)}]$ means that the functions $h_1^{(1)}, h_2^{(1)}, h_3^{(1)}, h_4^{(1)}$ are used to map these feature vectors into the 7 hash buckets: $h_1^{(1)}$ maps feature vectors to the first and second hash buckets, $h_2^{(1)}$ to the third hash bucket, $h_3^{(1)}$ to the fourth and fifth hash buckets, and $h_4^{(1)}$ to the sixth and seventh hash buckets.
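Steps 101 and 102 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the embodiment's implementation: it assumes the common Gaussian-projection form h(v) = floor((a . v + b) / w), which is consistent with the description of a Gaussian vector, a real w, and b in [0, w]; it follows the standard LSH scheme in which every vector is inserted into each of the L tables (the example above, which splits ten vectors between g1 and g2, may intend a different assignment); and it uses Python's built-in tuple hash modulo a bucket count as a stand-in for the unspecified secondary hash. The names `make_h`, `make_g`, `build_index`, and `num_buckets` are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_h(dim, w, rng):
    """Return one LSH function h(v) = floor((a . v + b) / w) (assumed form)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Gaussian projection vector
    b = rng.uniform(0.0, w)                        # offset drawn from [0, w]

    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

    return h


def make_g(dim, k, w, rng):
    """Return g(v) = (h_1(v), ..., h_k(v)), the composite key of one table."""
    hs = [make_h(dim, w, rng) for _ in range(k)]
    return lambda v: tuple(h(v) for h in hs)


def build_index(vectors, L, k, w, num_buckets, seed=0):
    """Build L hash tables; each maps a bucket id to the ids of stored vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    gs = [make_g(dim, k, w, rng) for _ in range(L)]
    tables = []
    for g in gs:
        buckets = defaultdict(list)
        for vid, v in enumerate(vectors):
            # Secondary hash (assumption): fold the k-tuple into a bucket id.
            buckets[hash(g(v)) % num_buckets].append(vid)
        tables.append(buckets)
    return gs, tables


# Three toy "feature vectors"; the first two are close to each other.
vectors = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.31], [5.0, -3.0, 2.0]]
gs, tables = build_index(vectors, L=2, k=4, w=4.0, num_buckets=7)
```

With k = 4 and 7 buckets the shapes mirror the g1 example above; the actual bucket contents depend on the random draws of a and b.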
Step 103: According to the numbers of the L hash tables and a predefined transformation rule, determine the Cartesian coordinates in the content-addressable network that correspond to each of the L hash table numbers.
The number of hash table g1 is expressed as a binary sequence x, for example 0100001010. If [condition relating m and d; the formula is not reproduced in the source text], where d is the dimension of the virtual space (for example, d = 2) and m is the number of bits in the binary sequence x (here, 10), then x is divided into groups from the low-order bits to the high-order bits, 8 bits per group, giving 2 groups (the last group may have fewer than 8 bits). The first group is 00001010 and the second group is 01, corresponding to the 2 dimensions of the virtual space. The decimal value $x_i$ ($i = 1, \ldots, d$) of each group is then calculated: the first group is 10 and the second is 1. $x_i \bmod 2^d$ is the i-th coordinate of the corresponding node, so the Cartesian coordinates corresponding to 0100001010 are (2, 1).
In this embodiment, the number of hash table g is used as the index number of the sensitive video frame feature vectors.
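The transformation rule of Step 103 can be sketched as below. The function name `table_number_to_coordinates` and the `group_size` parameter are hypothetical, and the check that the number of groups equals d is an assumption standing in for the condition whose formula is missing from the text.

```python
def table_number_to_coordinates(bits, d, group_size=8):
    """Convert a binary hash-table number into d Cartesian coordinates.

    The bit string is split into groups of `group_size` bits starting from
    the low-order end; the i-th coordinate is (value of group i) mod 2**d.
    """
    groups = []
    remaining = bits
    while remaining:
        groups.append(remaining[-group_size:])   # take the low-order bits
        remaining = remaining[:-group_size]      # keep the higher-order bits
    if len(groups) != d:
        # Assumed requirement: one group per dimension of the virtual space.
        raise ValueError("number of bit groups must equal the dimension d")
    return tuple(int(g, 2) % (2 ** d) for g in groups)


coords = table_number_to_coordinates("0100001010", d=2)
```

For the table number 0100001010 the groups are 00001010 (decimal 10) and 01 (decimal 1), which gives the coordinates (2, 1) stated in the text.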
Step 104: According to the correspondence between the numbers of the hash tables g and the Cartesian coordinates, the L hash tables (including the sensitive video frame feature vectors mapped into them) are distributed over and stored in N servers of the content-addressable network, where N ≤ L. Each server has corresponding Cartesian coordinates in the content-addressable network.
For example, server No. 8 (hash table number 0100001010) has Cartesian coordinates (2, 1) and stores hash table g1. The content-addressable network, with the Cartesian coordinates of each server, is shown in Figure 2.
Step 105: The acquisition module of any server in the content-addressable network obtains the feature vector of the video frame to be detected.
For example, a video to be detected arrives from the network, and server No. 4 in the content-addressable network obtains from it the feature vector of a video frame to be detected.
Step 106: The lookup module of the server that obtained the feature vector of the video frame to be detected determines, in turn, the Cartesian coordinates of the server holding each hash table, according to the number of the hash table that contains the sensitive video frame feature vectors to be compared.
For example, if the sensitive video frame feature vectors to be compared are in hash table g1, the lookup module of server No. 4 first determines the Cartesian coordinates of server No. 8, where g1 resides. From the number of hash table g1, the Cartesian coordinates of server No. 8 are determined to be (2, 1).
Step 107: The sending module of the server that obtained the feature vector of the video frame to be detected sends that feature vector to the retrieval module of the server storing the hash table that contains the sensitive video frame feature vectors to be compared.
The Cartesian coordinates of server No. 4 are (1, 1). Because the coordinates (1, 1) and (2, 1) are adjacent, the sending module of server No. 4 sends the feature vector directly to the retrieval module of server No. 8.
If the server whose acquisition module obtained the feature vector of the video frame to be detected is not server No. 4 but server No. 8 itself, the sending module of server No. 8 sends the feature vector directly to the retrieval module of the same server.
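The delivery decision in Step 107 can be sketched as follows. The text only covers two cases, local delivery and delivery to an adjacent node; the adjacency rule used here (coordinates differing by exactly 1 in a single dimension) and the greedy forwarding used for non-adjacent nodes are assumptions added for illustration, and the names `is_adjacent` and `next_hop` are hypothetical.

```python
def is_adjacent(a, b):
    """Assumed rule: nodes are adjacent if they differ by 1 in one dimension."""
    return sum(abs(x - y) for x, y in zip(a, b)) == 1


def next_hop(current, target):
    """Return the coordinates the query feature vector is handed to next."""
    if current == target:
        return current            # deliver to the local retrieval module
    if is_adjacent(current, target):
        return target             # send directly to the neighbouring server
    # Assumed fallback: move one step along the first differing dimension.
    step = list(current)
    for i, (c, t) in enumerate(zip(current, target)):
        if c != t:
            step[i] += 1 if t > c else -1
            break
    return tuple(step)


hop = next_hop((1, 1), (2, 1))    # server No. 4 sending towards server No. 8
```

In the example of the text, server No. 4 at (1, 1) is adjacent to server No. 8 at (2, 1), so the next hop is the target itself.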
Step 108: The retrieval module of the server storing the hash table that contains the sensitive video frame feature vectors to be compared applies the secondary hash to the feature vector of the video frame to be detected, mapping it into a hash bucket of the hash table.
For example, the retrieval module applies the secondary hash to the feature vector of the video frame to be detected and maps it into the first hash bucket of hash table g1.
Step 109: The sensitive video frame feature vectors that fall into the same hash bucket as the feature vector of the video frame to be detected are taken out, the Euclidean distance between each of them and the feature vector of the video frame to be detected is calculated, the similarity is judged, and the matching sensitive video feature vector is determined.
For example, the two sensitive video frame feature vectors in the first hash bucket of hash table g1 are taken out, their Euclidean distances to the feature vector of the video frame to be detected are calculated, the similarity is judged, and one of them is determined to be the matching (similar) sensitive video feature vector. This continues until enough similar sensitive video frame feature vectors have been obtained, or until the comparison with all sensitive video frame feature vectors is finished.
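The comparison in Step 109 can be sketched as below. The text does not state how the Euclidean distance is turned into a similarity decision, so the `threshold` and `max_results` parameters are assumptions, as are the function names and the representation of a bucket as a list of (frame id, vector) pairs.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def match_in_bucket(query, bucket, threshold, max_results=None):
    """Return (frame_id, distance) pairs within `threshold`, nearest first.

    `bucket` holds the sensitive frame vectors that fell into the same hash
    bucket as `query`; `max_results` models "enough similar vectors".
    """
    scored = [(euclidean(query, vec), fid) for fid, vec in bucket]
    hits = sorted((d, fid) for d, fid in scored if d <= threshold)
    if max_results is not None:
        hits = hits[:max_results]
    return [(fid, d) for d, fid in hits]


bucket = [("a", [0.0, 0.0]), ("b", [3.0, 4.0]), ("c", [1.0, 0.0])]
matches = match_in_bucket([0.0, 0.0], bucket, threshold=2.0)
```

Only the vectors sharing the query's bucket are compared, which is where the LSH index saves work relative to scanning the whole fingerprint library.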
To guarantee the performance of the LSH algorithm in Step 102, two important parameters must be considered: the number L of hash tables g and the number k of LSH functions h(.) in each hash table. The values of L and k directly affect the performance of the algorithm. Consider the following performance indicators:
Index build time: $O(nLkt)$, where t is the time required to compute one h(.);
Space: $O(nL)$ plus the space required to store the data points;
Query time: $O(L(kt + dnP_2^k))$.
L and k should satisfy the following relation: [formula not reproduced in the source text]
where $P_1$ is the given probability in the LSH function definition, as described above.
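The relation between L and k referred to above does not survive in this text. As an illustration only, and not necessarily the relation intended here, the standard LSH analysis reasons as follows. Two vectors within distance R collide under one composite function $g_j$ (k independent functions from H) with probability at least $P_1^k$, so the probability that they collide in at least one of the L tables is at least

$$1 - \left(1 - P_1^k\right)^L .$$

Requiring this to be at least $1 - \delta$, where $\delta$ is an assumed failure probability not named in the text, gives

$$L \ge \frac{\ln \delta}{\ln\left(1 - P_1^k\right)} .$$

Increasing k lowers the chance that dissimilar vectors share a bucket (the $P_2^k$ term in the query time) but requires more tables L to keep similar vectors findable, which is the trade-off between the two parameters.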
Multi-dimensional indexing techniques may also be used in this embodiment, such as the grid file, k-d-B tree, quadtree, hB tree, R tree and its variants the R+ tree and R* tree. These are all partitioning methods based on space or on data distribution. An index of each sensitive video frame feature vector in the video fingerprint library is generated by one of these methods, and the sensitive video frame feature vectors, which have different index numbers (for example, 1 to 10000), are distributed over and stored in 10 servers of the content-addressable network, each of which has corresponding Cartesian coordinates in the network. During content-based video retrieval, after the acquisition module of server 3# obtains the feature vector of the video frame to be detected, the lookup module of server 3# uses the index number 1000 of the sensitive video frame feature vector to be compared, together with the predefined transformation rule, to find among the Cartesian coordinates of the 10 servers that the server storing the sensitive video frame feature vector to be compared has Cartesian coordinates (0, 0). According to the coordinates (0, 0) found, the sending module of server 3# sends the feature vector of the video frame to be detected to the retrieval module of server 2# (whose Cartesian coordinates are (0, 0)).
The retrieval module of server 2# judges the similarity between the feature vector of the video frame to be detected and the sensitive video frame feature vector to be compared.
The second embodiment provided by the invention is a content-based video retrieval system. Its structure, shown in Figure 3, comprises:
Video fingerprint library 201: used to store sensitive video frame feature vectors with index numbers; the video fingerprint library is evenly distributed over a plurality of servers on the nodes of the Cartesian coordinate space of the content-addressable network;
Acquisition module 202: used to obtain the feature vector of the video frame to be detected;
Lookup module 203: used to find, among the Cartesian coordinates of the servers, the Cartesian coordinates of the server that stores the sensitive video frame feature vector to be compared, according to the index number of that feature vector in the video fingerprint library and the predefined transformation rule;
Sending module 204: used to send the feature vector of the video frame to be detected to the retrieval module of the corresponding server, according to the Cartesian coordinates found;
Retrieval module 205: used to judge the similarity between the feature vector of the video frame to be detected and the sensitive video frame feature vector to be compared, and to determine the matching sensitive video feature vector.
Further, the video fingerprint library 201 of each server comprises a hash table 2011, used to store sensitive video frame feature vectors; the number of the hash table serves as the index number of the sensitive video frame feature vectors.
The system also comprises:
Secondary hash module 206: used to hash, with the LSH algorithm, each sensitive video frame feature vector stored in the hash table of each server, and then to hash the result a second time, obtaining a plurality of hash buckets;
Hash table 2011 comprises a plurality of hash buckets 20111, used to store each sensitive video frame feature vector of the hash table after it has been hashed twice;
Retrieval submodule 2051: used to hash the feature vector of the video frame to be detected twice, obtaining the hash bucket into which it is mapped, and to judge the similarity between the feature vector of the video frame to be detected and the sensitive video frame feature vectors to be compared in that hash bucket.
Further, the secondary hash module 206 is also used, after the L hash functions g(.) have mapped each sensitive video frame feature vector into the L hash tables $g_j$ stored on the servers, to hash each sensitive video frame feature vector in hash table $g_j$ with k LSH functions $h_i(\cdot)$, that is, $g_j = [h_1^{(j)}, \ldots, h_k^{(j)}]$ ($j = 1, 2, \ldots, L$), with $h_i(\cdot) \in H$ ($i = 1, 2, \ldots, k$), where H is the LSH function family.
Further, the retrieval module 205 is also used to judge the similarity by calculating the Euclidean distance between the feature vector of the video frame to be detected and the sensitive video frame feature vector to be compared.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.