Long-query image retrieval reranking method combining semantic and visual information
Technical field
The invention belongs to the technical field of information retrieval, and specifically relates to a long-query image retrieval reranking method that combines semantic and visual information.
Background technology
The 21st century is the information age. With the development of Internet technology and network sharing services, image data on the web grows geometrically, and image retrieval has become an indispensable activity in people's daily lives. As users' retrieval behavior becomes more precise, query terms grow increasingly complex, and a complex long query can express more specific and accurate information than a simple query. However, the results that existing web search engines return for long queries are often poorly ranked. There are two main reasons for this. First, a long query is composed of multiple concepts, which further widens the semantic gap between the text query terms and the visual content. Second, positive samples for long queries are rare, so model-based learning performs poorly. To improve retrieval performance and thereby user experience and satisfaction, reranking the initial search results using image feature information has become a popular research topic.
Generally speaking, the feature information of an image comprises its text information and its visual information. Existing web image search engines depend on text matching between the query statement and textual descriptions, and the returned results easily leave users dissatisfied. At present, most image reranking algorithms rely on visual features and can be summarized into two classes: reranking based on pseudo-relevance feedback and reranking based on graphs. Both classes of reranking methods rely on visual features alone. However, much research points out that reranking using only image visual information cannot achieve satisfactory results. Meanwhile, when a long query is used for retrieval, the initial retrieval results are usually unreliable, and the images ranked at the top of the initial results have very low relevance to the query terms.
Summary of the invention
In order to overcome the deficiencies in the prior art, the present invention proposes a long-query image retrieval reranking algorithm that combines semantic and visual information, which can make full use of image feature information and thereby effectively improve the accuracy of image retrieval reranking.
The present invention adopts the following technical scheme to solve the technical problem:
A long-query image retrieval reranking algorithm of the present invention combining semantic and visual information is characterized by proceeding as follows:
Step 1: input a long query statement Q on a search engine and perform image retrieval, returning a number of long-query images; choose the top-N ranked long-query images and form from them the initial return list X = {x_1, x_2, ..., x_u, ..., x_N}, where x_u denotes the u-th long-query image in the initial return list, u indicates that the position of x_u in the initial return list is u, and u = 1, 2, ..., N;
Step 2: use a crawler tool to obtain question-answer pairs, use a part-of-speech tagger to collect the verbs and nouns in those question-answer pairs, and remove the stop words among the verbs and nouns, thereby building a visual dictionary;
Step 3: use a segmentation tool to split the long query statement Q, obtaining several statement blocks; compare each statement block against the visual dictionary and choose as visual concepts the statement blocks that contain a verb or noun from the visual dictionary; form from the τ visual concepts the visual concept set C = {q_0, q_1, ..., q_c, ..., q_{τ-1}}, where q_c denotes the c-th visual concept in the visual concept set C, c = 0, 1, ..., τ-1;
Step 4: on the search engine, perform image retrieval for each visual concept in the visual concept set C, returning a number of visual concept images corresponding to each visual concept; choose the top-L ranked visual concept images for each concept, and form from them the sample set D = {(X_0; q_0), (X_1; q_1), ..., (X_c; q_c), ..., (X_{τ-1}; q_{τ-1})}, where X_0 = (x_{N+1}, x_{N+2}, ..., x_{N+L}), X_1 = (x_{N+L+1}, x_{N+L+2}, ..., x_{N+2L}), X_c = (x_{N+cL+1}, x_{N+cL+2}, ..., x_{N+cL+ζ}, ..., x_{N+(c+1)L}), and X_{τ-1} = (x_{N+(τ-1)L+1}, x_{N+(τ-1)L+2}, ..., x_{N+τL}); X_c denotes the visual concept image set corresponding to the c-th visual concept q_c, and x_{N+cL+ζ} denotes the ζ-th visual concept image returned when image retrieval is performed with the c-th visual concept q_c;
Step 5: extract text features and visual features from each of the N long-query images, obtaining the long-query text feature set T = {T_1, T_2, ..., T_u, ..., T_N} and the long-query visual feature set F = {f_1, f_2, ..., f_u, ..., f_N}; T_u denotes the tag list of the u-th long-query image x_u and consists of n tags, with t_μ denoting the μ-th tag; f_u denotes the visual feature of the u-th long-query image x_u.
Extract visual features from the sample set D, obtaining the image visual features corresponding to the top-L visual concept images of each concept; these image visual features form the feature set F_c = {f_{N+cL+1}, f_{N+cL+2}, ..., f_{N+(c+1)L}}, the visual features extracted from the visual concept image set X_c corresponding to the c-th visual concept q_c; f_{N+cL+ζ} denotes the image visual feature corresponding to the ζ-th visual concept image x_{N+cL+ζ} returned when image retrieval is performed with q_c;
Step 6: establish the probability model Score(Q, x_u) by formula (1):
Score(Q, x_u) = Σ_{c=0}^{τ-1} P(q_c|Q) · P(q_c|x_u) (1)
In formula (1), P(q_c|Q) denotes the importance of the c-th visual concept q_c to the long query statement Q, and P(q_c|x_u) denotes the relevance between q_c and the u-th long-query image x_u;
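As a minimal sketch, assuming the per-concept scores of steps 7 to 9 have already been computed, formula (1) can be evaluated as follows (Python, with illustrative names; the source prescribes no implementation):
import numpy as np

def score(p_concept_given_query, p_concept_given_image):
    # Formula (1): Score(Q, x_u) = Σ_c P(q_c|Q) · P(q_c|x_u).
    # p_concept_given_query: (τ,) vector of P(q_c|Q), from formula (15);
    # p_concept_given_image: (τ, N) matrix of P(q_c|x_u), from formula (16).
    # Returns an (N,) vector with one relevance score per long-query image.
    return p_concept_given_query @ p_concept_given_image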
Step 7: semantic relevance estimation:
Step 7.1: estimate the semantic relevance between any two visual concepts by formula (2):
Sim(q_i, q_j) = Sim_co(q_i, q_j) × Sim_wd(q_i, q_j) × Sim_wiki(q_i, q_j) (2)
In formula (2), Sim_co(q_i, q_j) denotes the co-occurrence frequency similarity between any two visual concepts q_i and q_j, i, j ∈ {0, 1, ..., τ-1}, computed by formula (3); in formula (3), I denotes the total number of images on the search engine; f(q_i) and f(q_j) denote the numbers of images returned after inputting the visual concepts q_i and q_j respectively on the search engine; f(q_i, q_j) denotes the number of images returned after inputting q_i and q_j simultaneously on the search engine;
In formula (2), Sim_wd(q_i, q_j) denotes the similarity between any two visual concepts q_i and q_j obtained through the WordNet dictionary tool, computed by formula (4); in formula (4), #(q_i) denotes the number of times the visual concept q_i occurs in the query result returned after querying the WordNet dictionary with the visual concept q_j, and #(q_j) denotes the number of times q_j occurs in the query result returned after querying the WordNet dictionary with q_i; formula (4) also uses the total number of words in the query result returned after querying the WordNet dictionary with q_j, and the total number of words in the query result returned after querying with q_i;
In formula (2), Sim_wiki(q_i, q_j) denotes the similarity between any two visual concepts q_i and q_j obtained through Wikipedia, computed by formula (5); in formula (5), #(q_i) denotes the number of times the visual concept q_i occurs in the query result returned after querying Wikipedia with the visual concept q_j, and #(q_j) denotes the number of times q_j occurs in the query result returned after querying Wikipedia with q_i; formula (5) likewise uses the total number of words in the query result returned after querying Wikipedia with q_j, and the total number of words in the query result returned after querying with q_i;
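Formula (2) is a plain product of the three similarity sources. In the sketch below the product is exact to the text, while the co-occurrence helper is only one plausible PMI-style reading of the counts defined for formula (3), whose body is absent from this text:
import numpy as np

def concept_similarity(sim_co, sim_wd, sim_wiki):
    # Formula (2): Sim(q_i, q_j) as the elementwise product of the
    # co-occurrence, WordNet and Wikipedia similarities, each given as a
    # (τ, τ) matrix over the visual concepts.
    return sim_co * sim_wd * sim_wiki

def cooccurrence_similarity(f_i, f_j, f_ij, total_images):
    # One plausible PMI-style normalization of the counts the text defines
    # for formula (3): f_i, f_j are the image counts returned for q_i and
    # q_j alone, f_ij the count when both are input, total_images is I.
    # The exact body of formula (3) is not in the source; this form is an
    # assumption.
    return (f_ij * total_images) / (f_i * f_j)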
Step 7.2: obtain the semantic relevance G(q_c, Q) between the long query statement Q and the c-th visual concept q_c by formula (6);
Step 7.3: obtain the relevance G(q_c, x_u) between the c-th visual concept q_c and the u-th long-query image x_u by formula (7); in formula (7), |T_u| denotes the cardinality of the tag list T_u of the u-th long-query image x_u;
Step 8: visual relevance estimation:
Step 8.1: obtain the visual relevance V(q_c, Q) between the long query statement Q and the c-th visual concept q_c by formula (8); in formula (8), |X| denotes the cardinality of the initial return list X, |X_c| denotes the cardinality of the visual concept image set X_c corresponding to the c-th visual concept q_c, and K(f_{N+cL+ζ}, f_u) denotes the Gaussian similarity function:
K(f_{N+cL+ζ}, f_u) = exp(−||f_{N+cL+ζ} − f_u||² / δ²) (9)
In formula (9), δ is a scale parameter;
Step 8.2: use formula (10) to further decompose the visual relevance V(q_c, x_u) between the c-th visual concept q_c and the u-th long-query image x_u; in formula (10), x_ω denotes any visual concept image in the sample set D;
Step 8.3: based on the Markov random walk algorithm, regard the N long-query images and the τL visual concept images as nodes and build a symmetric κ-nearest-neighbor graph; obtain the connection weight W_{φψ} between the φ-th node and the ψ-th node through formula (11); in formula (11), N_κ(φ) denotes the index set of the symmetric κ-nearest-neighbor graph of the φ-th node computed by Euclidean distance, N_κ(ψ) denotes the index set of the symmetric κ-nearest-neighbor graph of the ψ-th node computed by Euclidean distance, and φ, ψ ∈ {1, 2, ..., N+τL};
Let A denote the one-step transition probability matrix; the element A_{ωu} of the one-step transition probability matrix A denotes the probability of transferring from the ω-th node to the u-th node, with A_{ωu} = W_{ωu} / Σ_ψ W_{ωψ}; use formula (12) to obtain the probability P_{s|0}(x_u|x_ω) of reaching the u-th node from the ω-th node after s steps:
P_{s|0}(x_u|x_ω) = [A^s]_{ωu} (12)
Use formula (13) to obtain the conditional probability P_{0|s}(x_ω|x_u) that a walk starting from any visual concept image x_ω stops at the u-th long-query image x_u after s steps:
P_{0|s}(x_ω|x_u) = P_{s|0}(x_u|x_ω) · P_0(x_ω) / Σ_ψ P_{s|0}(x_u|x_ψ) · P_0(x_ψ) (13)
Using P_0(x_ω) = P_0(x_ψ), formula (13) is rewritten as:
P_{0|s}(x_ω|x_u) = P_{s|0}(x_u|x_ω) / Σ_ψ P_{s|0}(x_u|x_ψ)
Step 8.4: traverse each visual concept image in the sample set D and obtain the relevance score P(q_c|x_ω) between any visual concept image x_ω and the c-th visual concept q_c by formula (14); in formula (14), Z is a normalization factor;
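Steps 8.3 and 8.4 are fully determined by formulas (12) and (13) once the weight matrix W of formula (11) is given. A sketch under that assumption (Python, illustrative names):
import numpy as np

def backward_probabilities(W, s):
    # W: (M, M) symmetric κ-NN connection weights from formula (11),
    # M = N + τL nodes; assumes every node has at least one neighbor.
    A = W / W.sum(axis=1, keepdims=True)   # A_ωu = W_ωu / Σ_ψ W_ωψ
    P_s = np.linalg.matrix_power(A, s)     # formula (12): [A^s]_ωu
    # Formula (13) with the uniform starting prior P_0(x_ω) = P_0(x_ψ):
    # P_{0|s}(x_ω|x_u) = P_{s|0}(x_u|x_ω) / Σ_ψ P_{s|0}(x_u|x_ψ).
    return P_s / P_s.sum(axis=0, keepdims=True)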
Step 9: relevance estimation combining semantics and vision:
Step 9.1: using formula (6) and formula (8), obtain the final relevance score P(q_c|Q) between the c-th visual concept q_c and the long query statement Q:
P(q_c|Q) = α·V(q_c, Q) + (1−α)·G(q_c, Q) (15)
In formula (15), α is a parameter that balances the importance of semantics and vision to the final relevance score P(q_c|Q), α ∈ (0, 1);
Step 9.2: using formula (7) and formula (10), obtain the final relevance score P(q_c|x_u) between the c-th visual concept q_c and the u-th long-query image x_u:
P(q_c|x_u) = β·V(q_c, x_u) + (1−β)·G(q_c, x_u) (16)
In formula (16), β is a parameter that balances the importance of semantics and vision to the final relevance score P(q_c|x_u), β ∈ (0, 1);
Step 10: rerank the set X of N long-query images according to the probability model Score(Q, x_u) obtained by formula (1), thereby obtaining the reranked result of the N long-query images.
Compared with the prior art, the beneficial effects of the present invention are embodied as follows:
1. The present invention treats a long query statement as multiple visual concepts. The retrieval results of the visual concepts not only express partial features of the retrieval results of the long query they compose, but also have high retrieval accuracy, which in turn improves the accuracy of relevance estimation between the long query and the images;
2. By building a probability model, the present invention analyzes the relevance between the visual concepts and the long query and its initial retrieval results. This method is not affected by the image ranking in the initial retrieval results, overcoming the defect that traditional image reranking methods rely on the initial ranking results, and can effectively improve reranking performance;
3. In computing the relevance scores, the present invention uses text features and visual features in combination, adopting a linear combination to merge the computed semantic and visual relevance scores, so as to overcome the insufficient use of image features in image retrieval reranking;
4. In computing the relevance between concepts, the present invention combines three different resources, namely the co-occurrence frequency of concepts, WordNet, and Wikipedia, and can therefore estimate the semantic relevance between concepts more accurately.
Embodiment
In the present embodiment, a long-query image retrieval reranking algorithm combining semantic and visual information resequences the search results returned by an image search engine, and proceeds as follows:
Step 1: input a long query statement Q on a search engine and perform image retrieval. A long query statement is a natural-language query statement composed of several closely connected concepts; for example, the long query "people dancing on the wedding" comprises three concepts, "people", "dancing", and "wedding", and these three concepts are closely connected. From the several returned long-query images, choose the top-N ranked long-query images and form from them the initial return list X = {x_1, x_2, ..., x_u, ..., x_N}, where x_u denotes the u-th long-query image in the initial return list, u indicates that the position of x_u in the initial return list is u, and u = 1, 2, ..., N;
Step 2: use a crawler tool to obtain question-answer pairs from the WikiAnswers question-and-answer website, use a part-of-speech tagger (a Part-Of-Speech Tagger) to collect the verbs and nouns in those question-answer pairs, and remove the stop words among the verbs and nouns, thereby building a visual dictionary;
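A minimal sketch of this dictionary-building step; the patent names a part-of-speech tagger but no particular library, so NLTK and its English stop-word list are illustrative stand-ins:
import nltk
from nltk.corpus import stopwords
# One-time resource downloads, e.g.:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger'); nltk.download('stopwords')

def build_visual_dictionary(qa_texts):
    # Collect the verbs and nouns of the crawled question-answer pairs,
    # drop stop words, and return the remainder as the visual dictionary.
    stops = set(stopwords.words('english'))
    dictionary = set()
    for text in qa_texts:
        for word, tag in nltk.pos_tag(nltk.word_tokenize(text)):
            if tag.startswith(('NN', 'VB')) and word.lower() not in stops:
                dictionary.add(word.lower())
    return dictionary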
Step 3: use a segmentation tool, such as the openNLP toolkit, to split the long query statement Q, obtaining several statement blocks; compare each statement block against the visual dictionary and choose as visual concepts the statement blocks that contain a verb or noun from the visual dictionary; form from the τ visual concepts the visual concept set C = {q_0, q_1, ..., q_c, ..., q_{τ-1}}, where q_c denotes the c-th visual concept in the visual concept set C, c = 0, 1, ..., τ-1, as sketched below;
Step 4: on the search engine, perform image retrieval for each visual concept in the visual concept set C, returning a number of visual concept images corresponding to each visual concept; choose the top-L ranked visual concept images for each concept, and form from them the sample set D = {(X_0; q_0), (X_1; q_1), ..., (X_c; q_c), ..., (X_{τ-1}; q_{τ-1})}, where X_0 = (x_{N+1}, x_{N+2}, ..., x_{N+L}), X_1 = (x_{N+L+1}, x_{N+L+2}, ..., x_{N+2L}), X_c = (x_{N+cL+1}, x_{N+cL+2}, ..., x_{N+cL+ζ}, ..., x_{N+(c+1)L}), and X_{τ-1} = (x_{N+(τ-1)L+1}, x_{N+(τ-1)L+2}, ..., x_{N+τL}); X_c denotes the visual concept image set corresponding to the c-th visual concept q_c, and x_{N+cL+ζ} denotes the ζ-th visual concept image returned when image retrieval is performed with the c-th visual concept q_c;
Step 5: extract text features and visual features from each of the N long-query images. Text features are extracted from the image tag text. Visual features use two kinds of features, global and local. The 428-dimensional global features comprise: 225-dimensional color moments, a 75-dimensional edge distribution histogram, and 128-dimensional wavelet texture. The local features are obtained as follows: detect the keypoints of every image with the DOG (difference of Gaussians) function, then extract SIFT features from the local regions of these keypoints; cluster the extracted feature vectors with the K-means clustering method to obtain a 1000-word codebook, finally obtaining a 1000-dimensional word-bag histogram representation of every image (see the sketch after this step). This yields the long-query text feature set T = {T_1, T_2, ..., T_u, ..., T_N} and the long-query visual feature set F = {f_1, f_2, ..., f_u, ..., f_N}; T_u denotes the tag list of the u-th long-query image x_u and consists of n tags, with t_μ denoting the μ-th tag; f_u denotes the visual feature of the u-th long-query image x_u.
Extract visual features from the sample set D, obtaining the image visual features corresponding to the top-L visual concept images of each concept; these image visual features form the feature set F_c = {f_{N+cL+1}, f_{N+cL+2}, ..., f_{N+(c+1)L}}, the visual features extracted from the visual concept image set X_c corresponding to the c-th visual concept q_c; f_{N+cL+ζ} denotes the image visual feature corresponding to the ζ-th visual concept image x_{N+cL+ζ} returned when image retrieval is performed with q_c;
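A sketch of the local-feature half of this step (DOG/SIFT keypoints, K-means codebook, 1000-dimensional word-bag histograms); OpenCV and scikit-learn are illustrative stand-ins, since the patent names the techniques but no library:
import cv2
import numpy as np
from sklearn.cluster import KMeans

def bag_of_words_histograms(image_paths, n_words=1000):
    # SIFT descriptors at DOG keypoints (SIFT's detector uses DOG internally),
    # clustered with K-means into an n_words codebook; each image becomes an
    # n_words-dimensional normalized word-bag histogram.
    sift = cv2.SIFT_create()
    per_image = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(gray, None)
        per_image.append(desc if desc is not None else np.empty((0, 128), np.float32))
    # For illustration only; real runs would subsample descriptors before clustering.
    codebook = KMeans(n_clusters=n_words, n_init=4).fit(np.vstack(per_image))
    hists = []
    for desc in per_image:
        words = codebook.predict(desc) if len(desc) else []
        hist, _ = np.histogram(words, bins=np.arange(n_words + 1))
        hists.append(hist / max(hist.sum(), 1))
    return np.array(hists)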
Step 6: adopting the relevance-score approach, establish the probability model Score(Q, x_u) by formula (1):
Score(Q, x_u) = Σ_{c=0}^{τ-1} P(q_c|Q) · P(q_c|x_u) (1)
In formula (1), P(q_c|Q) denotes the importance of the c-th visual concept q_c to the long query statement Q, and P(q_c|x_u) denotes the relevance between q_c and the u-th long-query image x_u;
Step 7: semantic relevance estimation:
Step 7.1: estimate the semantic relevance between any two visual concepts by formula (2):
Sim(q_i, q_j) = Sim_co(q_i, q_j) × Sim_wd(q_i, q_j) × Sim_wiki(q_i, q_j) (2)
In formula (2), Sim_co(q_i, q_j) denotes the co-occurrence frequency similarity between any two visual concepts q_i and q_j, i, j ∈ {0, 1, ..., τ-1}, computed by formula (3); in formula (3), I denotes the total number of images on the search engine; f(q_i) and f(q_j) denote the numbers of images returned after inputting the visual concepts q_i and q_j respectively on the search engine; f(q_i, q_j) denotes the number of images returned after inputting q_i and q_j simultaneously on the search engine;
In formula (2), Sim_wd(q_i, q_j) denotes the similarity between any two visual concepts q_i and q_j obtained through the WordNet dictionary tool, computed by formula (4); in formula (4), #(q_i) denotes the number of times the visual concept q_i occurs in the query result returned after querying the WordNet dictionary with the visual concept q_j, and #(q_j) denotes the number of times q_j occurs in the query result returned after querying the WordNet dictionary with q_i; formula (4) also uses the total number of words in the query result returned after querying the WordNet dictionary with q_j, and the total number of words in the query result returned after querying with q_i;
In formula (2), Sim_wiki(q_i, q_j) denotes the similarity between any two visual concepts q_i and q_j obtained through Wikipedia, computed by formula (5); in formula (5), #(q_i) denotes the number of times the visual concept q_i occurs in the query result returned after querying Wikipedia with the visual concept q_j, and #(q_j) denotes the number of times q_j occurs in the query result returned after querying Wikipedia with q_i; formula (5) likewise uses the total number of words in the query result returned after querying Wikipedia with q_j, and the total number of words in the query result returned after querying with q_i;
Step 7.2: obtain the semantic relevance G(q_c, Q) between the long query statement Q and the c-th visual concept q_c by formula (6);
Step 7.3: adopt a simple linear fusion method. So-called linear fusion merges and arranges different data values to obtain a new data value, and the new data value thus obtained is more definite and more general. Obtain the relevance G(q_c, x_u) between the c-th visual concept q_c and the u-th long-query image x_u by formula (7); in formula (7), |T_u| denotes the cardinality of the tag list T_u of the u-th long-query image x_u;
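The body of formula (7) is absent from this text; one plausible reading consistent with it, averaging concept-tag similarities over the tag list T_u and normalizing by its cardinality |T_u|, is sketched below as an assumption:
def concept_image_semantic_relevance(concept, tags, sim):
    # Hedged reading of formula (7): average the semantic similarity
    # between the visual concept q_c and each tag in the image's tag
    # list T_u, normalized by |T_u|. 'sim' is any concept-tag semantic
    # similarity function, e.g. the Sim of formula (2) applied to tag
    # terms; both the form and that choice are assumptions.
    if not tags:
        return 0.0
    return sum(sim(concept, t) for t in tags) / len(tags)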
Step 8: visual relevance estimation:
Step 8.1: obtain the visual relevance V(q_c, Q) between the long query statement Q and the c-th visual concept q_c by formula (8); in formula (8), |X| denotes the cardinality of the initial return list X, |X_c| denotes the cardinality of the visual concept image set X_c corresponding to the c-th visual concept q_c, and K(f_{N+cL+ζ}, f_u) denotes the Gaussian similarity function:
K(f_{N+cL+ζ}, f_u) = exp(−||f_{N+cL+ζ} − f_u||² / δ²) (9)
In formula (9), δ is a scale parameter, set to the median of the Euclidean distances over all image pairs;
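Formula (9) is fully specified, and the embodiment fixes δ to the median pairwise Euclidean distance. The sketch below implements the kernel and, as a hedged reading of formula (8) whose body is absent from this text, averages the kernel over X_c and X with the |X_c| and |X| normalizers the text defines; names are illustrative:
import numpy as np

def gaussian_similarity(F_concept, F_query):
    # Formula (9): K(f, f') = exp(-||f - f'||^2 / δ^2).
    # F_concept: (L, d) features of the concept images X_c;
    # F_query: (N, d) features of the initial return list X.
    d2 = ((F_concept[:, None, :] - F_query[None, :, :]) ** 2).sum(-1)
    # δ: median Euclidean distance per the embodiment; estimated here from
    # the cross pairs only, a simplification of "all image pairs".
    delta = np.median(np.sqrt(d2))
    return np.exp(-d2 / delta ** 2)

def visual_relevance_to_query(F_concept, F_query):
    # Hedged reading of formula (8): kernel values averaged over X_c and X,
    # normalized by the cardinalities |X_c| and |X|; exact form assumed.
    K = gaussian_similarity(F_concept, F_query)
    return K.sum() / (len(F_concept) * len(F_query))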
Step 8.2: use formula (10) to further decompose the visual relevance V(q_c, x_u) between the c-th visual concept q_c and the u-th long-query image x_u; in formula (10), x_ω denotes any visual concept image in the sample set D;
Step 8.3: based on the Markov random walk algorithm, regard the N long-query images and the τL visual concept images as nodes and build a symmetric κ-nearest-neighbor graph; obtain the connection weight W_{φψ} between the φ-th node and the ψ-th node through formula (11); in formula (11), N_κ(φ) denotes the index set of the symmetric κ-nearest-neighbor graph of the φ-th node computed by Euclidean distance, N_κ(ψ) denotes the index set of the symmetric κ-nearest-neighbor graph of the ψ-th node computed by Euclidean distance, and φ, ψ ∈ {1, 2, ..., N+τL};
Let A denote the one-step transition probability matrix; the element A_{ωu} of the one-step transition probability matrix A denotes the probability of transferring from the ω-th node to the u-th node, with A_{ωu} = W_{ωu} / Σ_ψ W_{ωψ}; use formula (12) to obtain the probability P_{s|0}(x_u|x_ω) of reaching the u-th node from the ω-th node after s steps:
P_{s|0}(x_u|x_ω) = [A^s]_{ωu} (12)
Use formula (13) to obtain the conditional probability P_{0|s}(x_ω|x_u) that a walk starting from any visual concept image x_ω stops at the u-th long-query image x_u after s steps:
P_{0|s}(x_ω|x_u) = P_{s|0}(x_u|x_ω) · P_0(x_ω) / Σ_ψ P_{s|0}(x_u|x_ψ) · P_0(x_ψ) (13)
The starting point of the Markov random walk is uniformly random, so using P_0(x_ω) = P_0(x_ψ), formula (13) is rewritten as:
P_{0|s}(x_ω|x_u) = P_{s|0}(x_u|x_ω) / Σ_ψ P_{s|0}(x_u|x_ψ)
Step 8.4: based on the Normalizing Relatedness Cross Concepts method, proposed in the 2002 article "Partially labeled classification with Markov random walks", which can simultaneously consider the connections between the visual concepts within the same long query: traverse each visual concept image in the sample set D and obtain the relevance score P(q_c|x_ω) between any visual concept image x_ω and the c-th visual concept q_c by formula (14); in formula (14), Z is a normalization factor;
Step 9: relevance estimation combining semantics and vision:
Step 9.1: adopting a linear combination and using formula (6) and formula (8), obtain the final relevance score P(q_c|Q) between the c-th visual concept q_c and the long query statement Q:
P(q_c|Q) = α·V(q_c, Q) + (1−α)·G(q_c, Q) (15)
In formula (15), α is a parameter that balances the importance of semantics and vision to the final relevance score P(q_c|Q), α ∈ (0, 1); considering that the visual relevance score plays the major role, α = 0.8 in the present embodiment;
Step 9.2: using formula (7) and formula (10), obtain the final relevance score P(q_c|x_u) between the c-th visual concept q_c and the u-th long-query image x_u:
P(q_c|x_u) = β·V(q_c, x_u) + (1−β)·G(q_c, x_u) (16)
In formula (16), β is a parameter that balances the importance of semantics and vision to the final relevance score P(q_c|x_u), β ∈ (0, 1); considering that the visual relevance score plays the major role, β = 0.8 in the present embodiment;
Step 10: rerank the set X of N long-query images according to the probability model Score(Q, x_u) obtained by formula (1), sorting the scores in descending order and generating a new sorted list, thereby obtaining the reranked result of the N long-query images.
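The final step reduces to a descending sort of the formula (1) scores; a sketch with illustrative names:
import numpy as np

def rerank(initial_list, p_concept_given_query, p_concept_given_image):
    # Step 10: score every long-query image with formula (1) and return
    # the initial list X sorted by descending Score(Q, x_u).
    # initial_list: the N images of X in their initial order;
    # p_concept_given_query: (τ,) vector of P(q_c|Q) from formula (15);
    # p_concept_given_image: (τ, N) matrix of P(q_c|x_u) from formula (16).
    scores = p_concept_given_query @ p_concept_given_image  # formula (1)
    order = np.argsort(-scores)                             # descending sort
    return [initial_list[u] for u in order]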