Long-query image retrieval reranking method combining semantic and visual information
Technical field
The invention belongs to the technical field of information retrieval, and specifically relates to a long-query image retrieval reranking method that combines semantic and visual information.
Background technology
The 21st century is the information age. With the development of Internet technology and network sharing services, image data on the web grows geometrically, and image retrieval has become an indispensable activity in people's daily lives. As users' retrieval behavior becomes more precise, query terms grow increasingly complex, and a complex long query can express more specific and accurate information than a simple query. However, the results that existing web search engines return for long queries are often poorly ranked. There are two main reasons for this. First, a long query is composed of multiple concepts, which further widens the semantic gap between the text query terms and the visual content. Second, positive samples for long queries are rare, so model-based learning performs poorly. To improve retrieval performance and thereby user experience and satisfaction, reranking the initial search results using image feature information has become a popular research topic.
Generally speaking, the feature information of an image comprises its text information and its visual information. Existing web image search engines depend on text matching between the query statement and textual descriptions, and the returned results easily leave users dissatisfied. At present, most image reranking algorithms rely on visual features and can be summarized into two classes: reranking based on pseudo-relevance feedback and reranking based on graphs. Both classes of reranking methods rely on visual features alone. However, much research points out that reranking using only image visual information cannot achieve satisfactory results. Meanwhile, when a long query is used for retrieval, the initial retrieval results are usually unreliable, and the images ranked at the top of the initial results have very low relevance to the query terms.
Summary of the invention
In order to overcome the deficiencies in the prior art, the present invention proposes a long-query image retrieval reranking algorithm that combines semantic and visual information, which can make full use of image feature information and thereby effectively improve the accuracy of image retrieval reranking.
The present invention adopts the following technical scheme to solve the technical problem:
A long-query image retrieval reranking algorithm of the present invention combining semantic and visual information is characterized by proceeding as follows:
Step 1: input a long query statement Q on a search engine and perform image retrieval, returning a number of long-query images; choose the top-N ranked long-query images and form from them the initial return list X = {x_1, x_2, ..., x_u, ..., x_N}, where x_u denotes the u-th long-query image in the initial return list, u indicates that the position of x_u in the initial return list is u, and u = 1, 2, ..., N;
Step 2: use a crawler tool to obtain question-answer pairs, use a part-of-speech tagger to collect the verbs and nouns in those question-answer pairs, and remove the stop words among the verbs and nouns, thereby building a visual dictionary;
Step 3: use a segmentation tool to split the long query statement Q, obtaining several statement blocks; compare each statement block against the visual dictionary and choose as visual concepts the statement blocks that contain a verb or noun from the visual dictionary; form from the τ visual concepts the visual concept set C = {q_0, q_1, ..., q_c, ..., q_{τ-1}}, where q_c denotes the c-th visual concept in the visual concept set C, c = 0, 1, ..., τ-1;
Step 4: on the search engine, perform image retrieval for each visual concept in the visual concept set C, returning a number of visual concept images corresponding to each visual concept; choose the top-L ranked visual concept images for each concept, and form from them the sample set D = {(X_0; q_0), (X_1; q_1), ..., (X_c; q_c), ..., (X_{τ-1}; q_{τ-1})}, where X_0 = (x_{N+1}, x_{N+2}, ..., x_{N+L}), X_1 = (x_{N+L+1}, x_{N+L+2}, ..., x_{N+2L}), X_c = (x_{N+cL+1}, x_{N+cL+2}, ..., x_{N+cL+ζ}, ..., x_{N+(c+1)L}), and X_{τ-1} = (x_{N+(τ-1)L+1}, x_{N+(τ-1)L+2}, ..., x_{N+τL}); X_c denotes the visual concept image set corresponding to the c-th visual concept q_c, and x_{N+cL+ζ} denotes the ζ-th visual concept image returned when image retrieval is performed with the c-th visual concept q_c;
Step 5: extract text features and visual features from each of the N long-query images, obtaining the long-query text feature set T = {T_1, T_2, ..., T_u, ..., T_N} and the long-query visual feature set F = {f_1, f_2, ..., f_u, ..., f_N}; T_u denotes the tag list of the u-th long-query image x_u and consists of n tags, with t_μ denoting the μ-th tag; f_u denotes the visual feature of the u-th long-query image x_u.
Extract visual features from the sample set D, obtaining the image visual features corresponding to the top-L visual concept images of each concept; these image visual features form the feature set F_c = {f_{N+cL+1}, f_{N+cL+2}, ..., f_{N+(c+1)L}}, the visual features extracted from the visual concept image set X_c corresponding to the c-th visual concept q_c; f_{N+cL+ζ} denotes the image visual feature corresponding to the ζ-th visual concept image x_{N+cL+ζ} returned when image retrieval is performed with q_c;
Step 6: establish the probability model Score(Q, x_u) by formula (1):
Score(Q, x_u) = Σ_{c=0}^{τ-1} P(q_c|Q) · P(q_c|x_u) (1)
In formula (1), P(q_c|Q) denotes the importance of the c-th visual concept q_c to the long query statement Q, and P(q_c|x_u) denotes the relevance between q_c and the u-th long-query image x_u;
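As a minimal sketch, assuming the per-concept scores of steps 7 to 9 have already been computed, formula (1) can be evaluated as follows (Python, with illustrative names; the source prescribes no implementation):
import numpy as np

def score(p_concept_given_query, p_concept_given_image):
    # Formula (1): Score(Q, x_u) = Σ_c P(q_c|Q) · P(q_c|x_u).
    # p_concept_given_query: (τ,) vector of P(q_c|Q), from formula (15);
    # p_concept_given_image: (τ, N) matrix of P(q_c|x_u), from formula (16).
    # Returns an (N,) vector with one relevance score per long-query image.
    return p_concept_given_query @ p_concept_given_image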
Step 7: semantic relevance estimation:
Step 7.1: estimate the semantic relevance between any two visual concepts by formula (2):
Sim(q_i, q_j) = Sim_co(q_i, q_j) × Sim_wd(q_i, q_j) × Sim_wiki(q_i, q_j) (2)
In formula (2), Sim_co(q_i, q_j) denotes the co-occurrence frequency similarity between any two visual concepts q_i and q_j, i, j ∈ {0, 1, ..., τ-1}, computed by formula (3); in formula (3), I denotes the total number of images on the search engine; f(q_i) and f(q_j) denote the numbers of images returned after inputting the visual concepts q_i and q_j respectively on the search engine; f(q_i, q_j) denotes the number of images returned after inputting q_i and q_j simultaneously on the search engine;
In formula (2), Sim_wd(q_i, q_j) denotes the similarity between any two visual concepts q_i and q_j obtained through the WordNet dictionary tool, computed by formula (4); in formula (4), #(q_i) denotes the number of times the visual concept q_i occurs in the query result returned after querying the WordNet dictionary with the visual concept q_j, and #(q_j) denotes the number of times q_j occurs in the query result returned after querying the WordNet dictionary with q_i; formula (4) also uses the total number of words in the query result returned after querying the WordNet dictionary with q_j, and the total number of words in the query result returned after querying with q_i;
In formula (2), Sim_wiki(q_i, q_j) denotes the similarity between any two visual concepts q_i and q_j obtained through Wikipedia, computed by formula (5); in formula (5), #(q_i) denotes the number of times the visual concept q_i occurs in the query result returned after querying Wikipedia with the visual concept q_j, and #(q_j) denotes the number of times q_j occurs in the query result returned after querying Wikipedia with q_i; formula (5) likewise uses the total number of words in the query result returned after querying Wikipedia with q_j, and the total number of words in the query result returned after querying with q_i;
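Formula (2) is a plain product of the three similarity sources. In the sketch below the product is exact to the text, while the co-occurrence helper is only one plausible PMI-style reading of the counts defined for formula (3), whose body is absent from this text:
import numpy as np

def concept_similarity(sim_co, sim_wd, sim_wiki):
    # Formula (2): Sim(q_i, q_j) as the elementwise product of the
    # co-occurrence, WordNet and Wikipedia similarities, each given as a
    # (τ, τ) matrix over the visual concepts.
    return sim_co * sim_wd * sim_wiki

def cooccurrence_similarity(f_i, f_j, f_ij, total_images):
    # One plausible PMI-style normalization of the counts the text defines
    # for formula (3): f_i, f_j are the image counts returned for q_i and
    # q_j alone, f_ij the count when both are input, total_images is I.
    # The exact body of formula (3) is not in the source; this form is an
    # assumption.
    return (f_ij * total_images) / (f_i * f_j)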
Step 7.2: obtain the semantic relevance G(q_c, Q) between the long query statement Q and the c-th visual concept q_c by formula (6);
Step 7.3: obtain the relevance G(q_c, x_u) between the c-th visual concept q_c and the u-th long-query image x_u by formula (7); in formula (7), |T_u| denotes the cardinality of the tag list T_u of the u-th long-query image x_u;
Step 8: visual relevance estimation:
Step 8.1: obtain the visual relevance V(q_c, Q) between the long query statement Q and the c-th visual concept q_c by formula (8); in formula (8), |X| denotes the cardinality of the initial return list X, |X_c| denotes the cardinality of the visual concept image set X_c corresponding to the c-th visual concept q_c, and K(f_{N+cL+ζ}, f_u) denotes the Gaussian similarity function:
K(f_{N+cL+ζ}, f_u) = exp(−||f_{N+cL+ζ} − f_u||² / δ²) (9)
In formula (9), δ is a scale parameter;
Step 8.2: use formula (10) to further decompose the visual relevance V(q_c, x_u) between the c-th visual concept q_c and the u-th long-query image x_u; in formula (10), x_ω denotes any visual concept image in the sample set D;
Step 8.3: based on the Markov random walk algorithm, regard the N long-query images and the τL visual concept images as nodes and build a symmetric κ-nearest-neighbor graph; obtain the connection weight W_{φψ} between the φ-th node and the ψ-th node through formula (11); in formula (11), N_κ(φ) denotes the index set of the symmetric κ-nearest-neighbor graph of the φ-th node computed by Euclidean distance, N_κ(ψ) denotes the index set of the symmetric κ-nearest-neighbor graph of the ψ-th node computed by Euclidean distance, and φ, ψ ∈ {1, 2, ..., N+τL};
Let A denote the one-step transition probability matrix; the element A_{ωu} of the one-step transition probability matrix A denotes the probability of transferring from the ω-th node to the u-th node, with A_{ωu} = W_{ωu} / Σ_ψ W_{ωψ}; use formula (12) to obtain the probability P_{s|0}(x_u|x_ω) of reaching the u-th node from the ω-th node after s steps:
P_{s|0}(x_u|x_ω) = [A^s]_{ωu} (12)
Use formula (13) to obtain the conditional probability P_{0|s}(x_ω|x_u) that a walk starting from any visual concept image x_ω stops at the u-th long-query image x_u after s steps:
P_{0|s}(x_ω|x_u) = P_{s|0}(x_u|x_ω) · P_0(x_ω) / Σ_ψ P_{s|0}(x_u|x_ψ) · P_0(x_ψ) (13)
Using P_0(x_ω) = P_0(x_ψ), formula (13) is rewritten as:
P_{0|s}(x_ω|x_u) = P_{s|0}(x_u|x_ω) / Σ_ψ P_{s|0}(x_u|x_ψ)
Step 8.4: traverse each visual concept image in the sample set D and obtain the relevance score P(q_c|x_ω) between any visual concept image x_ω and the c-th visual concept q_c by formula (14); in formula (14), Z is a normalization factor;
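Steps 8.3 and 8.4 are fully determined by formulas (12) and (13) once the weight matrix W of formula (11) is given. A sketch under that assumption (Python, illustrative names):
import numpy as np

def backward_probabilities(W, s):
    # W: (M, M) symmetric κ-NN connection weights from formula (11),
    # M = N + τL nodes; assumes every node has at least one neighbor.
    A = W / W.sum(axis=1, keepdims=True)   # A_ωu = W_ωu / Σ_ψ W_ωψ
    P_s = np.linalg.matrix_power(A, s)     # formula (12): [A^s]_ωu
    # Formula (13) with the uniform starting prior P_0(x_ω) = P_0(x_ψ):
    # P_{0|s}(x_ω|x_u) = P_{s|0}(x_u|x_ω) / Σ_ψ P_{s|0}(x_u|x_ψ).
    return P_s / P_s.sum(axis=0, keepdims=True)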
Step 9: relevance estimation combining semantics and vision:
Step 9.1: using formula (6) and formula (8), obtain the final relevance score P(q_c|Q) between the c-th visual concept q_c and the long query statement Q:
P(q_c|Q) = α·V(q_c, Q) + (1−α)·G(q_c, Q) (15)
In formula (15), α is a parameter that balances the importance of semantics and vision to the final relevance score P(q_c|Q), α ∈ (0, 1);
Step 9.2: using formula (7) and formula (10), obtain the final relevance score P(q_c|x_u) between the c-th visual concept q_c and the u-th long-query image x_u:
P(q_c|x_u) = β·V(q_c, x_u) + (1−β)·G(q_c, x_u) (16)
In formula (16), β is a parameter that balances the importance of semantics and vision to the final relevance score P(q_c|x_u), β ∈ (0, 1);
Step 10: rerank the set X of N long-query images according to the probability model Score(Q, x_u) obtained by formula (1), thereby obtaining the reranked result of the N long-query images.
Compared with the prior art, the beneficial effects of the present invention are embodied as follows:
1. The present invention treats a long query statement as multiple visual concepts. The retrieval results of the visual concepts not only express partial features of the retrieval results of the long query they compose, but also have high retrieval accuracy, which in turn improves the accuracy of relevance estimation between the long query and the images;
2. By building a probability model, the present invention analyzes the relevance between the visual concepts and the long query and its initial retrieval results. This method is not affected by the image ranking in the initial retrieval results, overcoming the defect that traditional image reranking methods rely on the initial ranking results, and can effectively improve reranking performance;
3. In computing the relevance scores, the present invention uses text features and visual features in combination, adopting a linear combination to merge the computed semantic and visual relevance scores, so as to overcome the insufficient use of image features in image retrieval reranking;
4. In computing the relevance between concepts, the present invention combines three different resources, namely the co-occurrence frequency of concepts, WordNet, and Wikipedia, and can therefore estimate the semantic relevance between concepts more accurately.
Embodiment
In the present embodiment, a long-query image retrieval reranking algorithm combining semantic and visual information resequences the search results returned by an image search engine, and proceeds as follows:
Step 1: input a long query statement Q on a search engine and perform image retrieval. A long query statement is a natural-language query statement composed of several closely connected concepts; for example, the long query "people dancing on the wedding" comprises three concepts, "people", "dancing", and "wedding", and these three concepts are closely connected. From the several returned long-query images, choose the top-N ranked long-query images and form from them the initial return list X = {x_1, x_2, ..., x_u, ..., x_N}, where x_u denotes the u-th long-query image in the initial return list, u indicates that the position of x_u in the initial return list is u, and u = 1, 2, ..., N;
Step 2: use a crawler tool to obtain question-answer pairs from the WikiAnswers question-and-answer website, use a part-of-speech tagger (a Part-Of-Speech Tagger) to collect the verbs and nouns in those question-answer pairs, and remove the stop words among the verbs and nouns, thereby building a visual dictionary;
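A minimal sketch of this dictionary-building step; the patent names a part-of-speech tagger but no particular library, so NLTK and its English stop-word list are illustrative stand-ins:
import nltk
from nltk.corpus import stopwords
# One-time resource downloads, e.g.:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger'); nltk.download('stopwords')

def build_visual_dictionary(qa_texts):
    # Collect the verbs and nouns of the crawled question-answer pairs,
    # drop stop words, and return the remainder as the visual dictionary.
    stops = set(stopwords.words('english'))
    dictionary = set()
    for text in qa_texts:
        for word, tag in nltk.pos_tag(nltk.word_tokenize(text)):
            if tag.startswith(('NN', 'VB')) and word.lower() not in stops:
                dictionary.add(word.lower())
    return dictionary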
Step 3: use a segmentation tool, such as the openNLP toolkit, to split the long query statement Q, obtaining several statement blocks; compare each statement block against the visual dictionary and choose as visual concepts the statement blocks that contain a verb or noun from the visual dictionary; form from the τ visual concepts the visual concept set C = {q_0, q_1, ..., q_c, ..., q_{τ-1}}, where q_c denotes the c-th visual concept in the visual concept set C, c = 0, 1, ..., τ-1, as sketched below;
Step 4: on the search engine, perform image retrieval for each visual concept in the visual concept set C, returning a number of visual concept images corresponding to each visual concept; choose the top-L ranked visual concept images for each concept, and form from them the sample set D = {(X_0; q_0), (X_1; q_1), ..., (X_c; q_c), ..., (X_{τ-1}; q_{τ-1})}, where X_0 = (x_{N+1}, x_{N+2}, ..., x_{N+L}), X_1 = (x_{N+L+1}, x_{N+L+2}, ..., x_{N+2L}), X_c = (x_{N+cL+1}, x_{N+cL+2}, ..., x_{N+cL+ζ}, ..., x_{N+(c+1)L}), and X_{τ-1} = (x_{N+(τ-1)L+1}, x_{N+(τ-1)L+2}, ..., x_{N+τL}); X_c denotes the visual concept image set corresponding to the c-th visual concept q_c, and x_{N+cL+ζ} denotes the ζ-th visual concept image returned when image retrieval is performed with the c-th visual concept q_c;
Step 5: extract text features and visual features from each of the N long-query images. Text features are extracted from the image tag text. Visual features use two kinds of features, global and local. The 428-dimensional global features comprise: 225-dimensional color moments, a 75-dimensional edge distribution histogram, and 128-dimensional wavelet texture. The local features are obtained as follows: detect the keypoints of every image with the DOG (difference of Gaussians) function, then extract SIFT features from the local regions of these keypoints; cluster the extracted feature vectors with the K-means clustering method to obtain a 1000-word codebook, finally obtaining a 1000-dimensional word-bag histogram representation of every image (see the sketch after this step). This yields the long-query text feature set T = {T_1, T_2, ..., T_u, ..., T_N} and the long-query visual feature set F = {f_1, f_2, ..., f_u, ..., f_N}; T_u denotes the tag list of the u-th long-query image x_u and consists of n tags, with t_μ denoting the μ-th tag; f_u denotes the visual feature of the u-th long-query image x_u.
Extract visual features from the sample set D, obtaining the image visual features corresponding to the top-L visual concept images of each concept; these image visual features form the feature set F_c = {f_{N+cL+1}, f_{N+cL+2}, ..., f_{N+(c+1)L}}, the visual features extracted from the visual concept image set X_c corresponding to the c-th visual concept q_c; f_{N+cL+ζ} denotes the image visual feature corresponding to the ζ-th visual concept image x_{N+cL+ζ} returned when image retrieval is performed with q_c;
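A sketch of the local-feature half of this step (DOG/SIFT keypoints, K-means codebook, 1000-dimensional word-bag histograms); OpenCV and scikit-learn are illustrative stand-ins, since the patent names the techniques but no library:
import cv2
import numpy as np
from sklearn.cluster import KMeans

def bag_of_words_histograms(image_paths, n_words=1000):
    # SIFT descriptors at DOG keypoints (SIFT's detector uses DOG internally),
    # clustered with K-means into an n_words codebook; each image becomes an
    # n_words-dimensional normalized word-bag histogram.
    sift = cv2.SIFT_create()
    per_image = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(gray, None)
        per_image.append(desc if desc is not None else np.empty((0, 128), np.float32))
    # For illustration only; real runs would subsample descriptors before clustering.
    codebook = KMeans(n_clusters=n_words, n_init=4).fit(np.vstack(per_image))
    hists = []
    for desc in per_image:
        words = codebook.predict(desc) if len(desc) else []
        hist, _ = np.histogram(words, bins=np.arange(n_words + 1))
        hists.append(hist / max(hist.sum(), 1))
    return np.array(hists)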
Step 6: adopting the relevance-score approach, establish the probability model Score(Q, x_u) by formula (1):
Score(Q, x_u) = Σ_{c=0}^{τ-1} P(q_c|Q) · P(q_c|x_u) (1)
In formula (1), P(q_c|Q) denotes the importance of the c-th visual concept q_c to the long query statement Q, and P(q_c|x_u) denotes the relevance between q_c and the u-th long-query image x_u;
Step 7: semantic relevance estimation:
Step 7.1: estimate the semantic relevance between any two visual concepts by formula (2):
Sim(q_i, q_j) = Sim_co(q_i, q_j) × Sim_wd(q_i, q_j) × Sim_wiki(q_i, q_j) (2)
In formula (2), Sim_co(q_i, q_j) denotes the co-occurrence frequency similarity between any two visual concepts q_i and q_j, i, j ∈ {0, 1, ..., τ-1}, computed by formula (3); in formula (3), I denotes the total number of images on the search engine; f(q_i) and f(q_j) denote the numbers of images returned after inputting the visual concepts q_i and q_j respectively on the search engine; f(q_i, q_j) denotes the number of images returned after inputting q_i and q_j simultaneously on the search engine;
In formula (2), Sim_wd(q_i, q_j) denotes the similarity between any two visual concepts q_i and q_j obtained through the WordNet dictionary tool, computed by formula (4); in formula (4), #(q_i) denotes the number of times the visual concept q_i occurs in the query result returned after querying the WordNet dictionary with the visual concept q_j, and #(q_j) denotes the number of times q_j occurs in the query result returned after querying the WordNet dictionary with q_i; formula (4) also uses the total number of words in the query result returned after querying the WordNet dictionary with q_j, and the total number of words in the query result returned after querying with q_i;
In formula (2), Sim_wiki(q_i, q_j) denotes the similarity between any two visual concepts q_i and q_j obtained through Wikipedia, computed by formula (5); in formula (5), #(q_i) denotes the number of times the visual concept q_i occurs in the query result returned after querying Wikipedia with the visual concept q_j, and #(q_j) denotes the number of times q_j occurs in the query result returned after querying Wikipedia with q_i; formula (5) likewise uses the total number of words in the query result returned after querying Wikipedia with q_j, and the total number of words in the query result returned after querying with q_i;
Step 7.2: obtain the semantic relevance G(q_c, Q) between the long query statement Q and the c-th visual concept q_c by formula (6);
Step 7.3: adopt a simple linear fusion method. So-called linear fusion merges and arranges different data values to obtain a new data value, and the new data value thus obtained is more definite and more general. Obtain the relevance G(q_c, x_u) between the c-th visual concept q_c and the u-th long-query image x_u by formula (7); in formula (7), |T_u| denotes the cardinality of the tag list T_u of the u-th long-query image x_u;
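The body of formula (7) is absent from this text; one plausible reading consistent with it, averaging concept-tag similarities over the tag list T_u and normalizing by its cardinality |T_u|, is sketched below as an assumption:
def concept_image_semantic_relevance(concept, tags, sim):
    # Hedged reading of formula (7): average the semantic similarity
    # between the visual concept q_c and each tag in the image's tag
    # list T_u, normalized by |T_u|. 'sim' is any concept-tag semantic
    # similarity function, e.g. the Sim of formula (2) applied to tag
    # terms; both the form and that choice are assumptions.
    if not tags:
        return 0.0
    return sum(sim(concept, t) for t in tags) / len(tags)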
Step 8: visual relevance estimation:
Step 8.1: obtain the visual relevance V(q_c, Q) between the long query statement Q and the c-th visual concept q_c by formula (8); in formula (8), |X| denotes the cardinality of the initial return list X, |X_c| denotes the cardinality of the visual concept image set X_c corresponding to the c-th visual concept q_c, and K(f_{N+cL+ζ}, f_u) denotes the Gaussian similarity function:
K(f_{N+cL+ζ}, f_u) = exp(−||f_{N+cL+ζ} − f_u||² / δ²) (9)
In formula (9), δ is a scale parameter, set to the median of the Euclidean distances over all image pairs;
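Formula (9) is fully specified, and the embodiment fixes δ to the median pairwise Euclidean distance. The sketch below implements the kernel and, as a hedged reading of formula (8) whose body is absent from this text, averages the kernel over X_c and X with the |X_c| and |X| normalizers the text defines; names are illustrative:
import numpy as np

def gaussian_similarity(F_concept, F_query):
    # Formula (9): K(f, f') = exp(-||f - f'||^2 / δ^2).
    # F_concept: (L, d) features of the concept images X_c;
    # F_query: (N, d) features of the initial return list X.
    d2 = ((F_concept[:, None, :] - F_query[None, :, :]) ** 2).sum(-1)
    # δ: median Euclidean distance per the embodiment; estimated here from
    # the cross pairs only, a simplification of "all image pairs".
    delta = np.median(np.sqrt(d2))
    return np.exp(-d2 / delta ** 2)

def visual_relevance_to_query(F_concept, F_query):
    # Hedged reading of formula (8): kernel values averaged over X_c and X,
    # normalized by the cardinalities |X_c| and |X|; exact form assumed.
    K = gaussian_similarity(F_concept, F_query)
    return K.sum() / (len(F_concept) * len(F_query))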
Step 8.2: use formula (10) to further decompose the visual relevance V(q_c, x_u) between the c-th visual concept q_c and the u-th long-query image x_u; in formula (10), x_ω denotes any visual concept image in the sample set D;
Step 8.3: based on the Markov random walk algorithm, regard the N long-query images and the τL visual concept images as nodes and build a symmetric κ-nearest-neighbor graph; obtain the connection weight W_{φψ} between the φ-th node and the ψ-th node through formula (11); in formula (11), N_κ(φ) denotes the index set of the symmetric κ-nearest-neighbor graph of the φ-th node computed by Euclidean distance, N_κ(ψ) denotes the index set of the symmetric κ-nearest-neighbor graph of the ψ-th node computed by Euclidean distance, and φ, ψ ∈ {1, 2, ..., N+τL};
Let A denote the one-step transition probability matrix; the element A_{ωu} of the one-step transition probability matrix A denotes the probability of transferring from the ω-th node to the u-th node, with A_{ωu} = W_{ωu} / Σ_ψ W_{ωψ}; use formula (12) to obtain the probability P_{s|0}(x_u|x_ω) of reaching the u-th node from the ω-th node after s steps:
P_{s|0}(x_u|x_ω) = [A^s]_{ωu} (12)
Use formula (13) to obtain the conditional probability P_{0|s}(x_ω|x_u) that a walk starting from any visual concept image x_ω stops at the u-th long-query image x_u after s steps:
P_{0|s}(x_ω|x_u) = P_{s|0}(x_u|x_ω) · P_0(x_ω) / Σ_ψ P_{s|0}(x_u|x_ψ) · P_0(x_ψ) (13)
The starting point of the Markov random walk is uniformly random, so using P_0(x_ω) = P_0(x_ψ), formula (13) is rewritten as:
P_{0|s}(x_ω|x_u) = P_{s|0}(x_u|x_ω) / Σ_ψ P_{s|0}(x_u|x_ψ)
Step 8.4: based on the Normalizing Relatedness Cross Concepts method, proposed in the 2002 article "Partially labeled classification with Markov random walks", which can simultaneously consider the connections between the visual concepts within the same long query: traverse each visual concept image in the sample set D and obtain the relevance score P(q_c|x_ω) between any visual concept image x_ω and the c-th visual concept q_c by formula (14); in formula (14), Z is a normalization factor;
Step 9: relevance estimation combining semantics and vision:
Step 9.1: adopting a linear combination and using formula (6) and formula (8), obtain the final relevance score P(q_c|Q) between the c-th visual concept q_c and the long query statement Q:
P(q_c|Q) = α·V(q_c, Q) + (1−α)·G(q_c, Q) (15)
In formula (15), α is a parameter that balances the importance of semantics and vision to the final relevance score P(q_c|Q), α ∈ (0, 1); considering that the visual relevance score plays the major role, α = 0.8 in the present embodiment;
Step 9.2: using formula (7) and formula (10), obtain the final relevance score P(q_c|x_u) between the c-th visual concept q_c and the u-th long-query image x_u:
P(q_c|x_u) = β·V(q_c, x_u) + (1−β)·G(q_c, x_u) (16)
In formula (16), β is a parameter that balances the importance of semantics and vision to the final relevance score P(q_c|x_u), β ∈ (0, 1); considering that the visual relevance score plays the major role, β = 0.8 in the present embodiment;
Step 10: rerank the set X of N long-query images according to the probability model Score(Q, x_u) obtained by formula (1), sorting the scores in descending order and generating a new sorted list, thereby obtaining the reranked result of the N long-query images.
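The final step reduces to a descending sort of the formula (1) scores; a sketch with illustrative names:
import numpy as np

def rerank(initial_list, p_concept_given_query, p_concept_given_image):
    # Step 10: score every long-query image with formula (1) and return
    # the initial list X sorted by descending Score(Q, x_u).
    # initial_list: the N images of X in their initial order;
    # p_concept_given_query: (τ,) vector of P(q_c|Q) from formula (15);
    # p_concept_given_image: (τ, N) matrix of P(q_c|x_u) from formula (16).
    scores = p_concept_given_query @ p_concept_given_image  # formula (1)
    order = np.argsort(-scores)                             # descending sort
    return [initial_list[u] for u in order]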