WO2017168125A1 - Procédés de recherche basés sur un croquis - Google Patents
Procédés de recherche basés sur un croquis (Sketch-based search methods)
- Publication number
- WO2017168125A1 WO2017168125A1 PCT/GB2017/050825 GB2017050825W WO2017168125A1 WO 2017168125 A1 WO2017168125 A1 WO 2017168125A1 GB 2017050825 W GB2017050825 W GB 2017050825W WO 2017168125 A1 WO2017168125 A1 WO 2017168125A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sketch
- images
- image
- sketches
- model
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 60
- 230000003190 augmentative effect Effects 0.000 claims abstract description 23
- 238000012549 training Methods 0.000 claims description 66
- 238000013527 convolutional neural network Methods 0.000 claims description 20
- 238000013528 artificial neural network Methods 0.000 claims description 15
- 230000002452 interceptive effect Effects 0.000 abstract description 26
- 238000004891 communication Methods 0.000 abstract description 2
- 238000005070 sampling Methods 0.000 description 16
- 238000013434 data augmentation Methods 0.000 description 12
- 241000282412 Homo Species 0.000 description 8
- 238000013459 approach Methods 0.000 description 8
- 238000011176 pooling Methods 0.000 description 8
- 239000013598 vector Substances 0.000 description 8
- 230000000007 visual effect Effects 0.000 description 7
- 238000010586 diagram Methods 0.000 description 6
- 230000008901 benefit Effects 0.000 description 5
- 238000002474 experimental method Methods 0.000 description 5
- 230000006870 function Effects 0.000 description 5
- 238000000354 decomposition reaction Methods 0.000 description 4
- 238000001514 detection method Methods 0.000 description 4
- 238000011156 evaluation Methods 0.000 description 4
- 238000000605 extraction Methods 0.000 description 4
- 238000010606 normalization Methods 0.000 description 4
- 230000008569 process Effects 0.000 description 4
- 230000000694 effects Effects 0.000 description 3
- 230000003993 interaction Effects 0.000 description 3
- 238000012360 testing method Methods 0.000 description 3
- 230000000295 complement effect Effects 0.000 description 2
- 230000001419 dependent effect Effects 0.000 description 2
- 230000004927 fusion Effects 0.000 description 2
- 230000006872 improvement Effects 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 238000007781 pre-processing Methods 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 239000000523 sample Substances 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 230000004913 activation Effects 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 238000013145 classification model Methods 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 238000005034 decoration Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000008030 elimination Effects 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000012804 iterative process Methods 0.000 description 1
- 230000023886 lateral inhibition Effects 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 210000002569 neuron Anatomy 0.000 description 1
- 238000007500 overflow downdraw method Methods 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000035755 proliferation Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000012216 screening Methods 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000012706 support-vector machine Methods 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5854—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using shape and object relationship
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Definitions
- the present invention relates to sketch-based search methods, i.e. methods in which a hand-drawn sketch is the basis for a search amongst a library of images, and in particular to fine-grained searches, that is, methods which return specific matching images rather than image categories.
- Most image searching methods currently known take either text or another image as the input query. If the input query is text, the search method looks for images associated with the query text, e.g. by looking in the metadata of the images or in the context around the images. Image metadata may derive from pre-classification of the searched images. If the input query is an image, then the search method looks for similar images, e.g. using pattern recognition algorithms.
- Text-based image searches work well for images of specific people, places or things if the user knows the proper name for the target. If the target is well known enough then a suitably labelled image can usually be found quite easily. However, if the user does not know the proper name of the target but tries to search by a textual description, for example "red high-heeled shoes with a bow on the toe", then the results are dependent on the user's textual description highlighting the same features and using the same terms as any description applied to the searched images. Thus the accuracy of the search is likely to be dependent on the user's use of common terminology for important features of the search target.
- Image-based image searches require the user to have an image similar to that being searched for, which the user may not have.
- image-based search methods mostly consider the whole image and therefore, if the user desires to find other images of a foreground object in the search query image, images with that foreground object in a different orientation or against a different background may not be found.
- An aspect of the present invention provides a method of searching for images of a target object instance, the method comprising:
- the deep triplet ranking model is a convolutional neural network.
- the convolutional neural network is a Siamese network.
- the deep triplet ranking model is a multi-task model.
- the multi-task model has been trained to learn an auxiliary task comprising an attribute prediction task and/or an attribute learning task.
- the sketch data includes information representing the order of strokes in the sketch.
- An embodiment further comprises:
- An embodiment further comprises:
- the second and third ranked lists of images are provided to the user as a merged updated ranked list of images.
- An aspect of the invention also provides a method of training a neural network to perform fine-grained sketch-based image retrieval, the method comprising:
- a training image gallery comprising a plurality of images of objects and attribute data relating to the objects
- a training sketch gallery comprising a plurality of sketches of objects and attribute data relating to the objects
- each triplet comprising a sketch of a target object, a positive image representing an object similar to the target object and a negative image representing an object dissimilar to the target object;
- generating a plurality of triplets includes selecting positive images and/or negative images by extracting features using a category-trained ranking model.
- An embodiment further comprises generating a plurality of additional sketches by modifying sketches of the training sketch gallery.
- selectively removing strokes comprises randomly removing strokes with a probability based on stroke length and stroke order.
- modifying sketches comprises deforming strokes of sketches individually.
- modifying sketches comprises deforming sketches as a whole.
- An embodiment further comprises pre-training the neural network using images to recognise a plurality of categories of object.
- An embodiment further comprises fine-tuning the pre-trained model using sketches to recognise a plurality of categories of object.
- the neural network is a three-branch triplet network.
- the training objective is a triplet ranking objective.
- the neural network is trained to perform an auxiliary task.
- the auxiliary task comprises an attribute prediction task and/or an attribute ranking task.
- the training is performed with a plurality of hard triplets selected from the automatically generated triplets.
- the present invention can provide an interactive method of searching using a sketch, e.g. done on a touchscreen device, to input the search query.
- the present invention can provide fine-grained instance-level retrieval of images, where at each stage of the interactive iteration the input sketch is augmented with more details through directly sketching on the retrieved images, resulting in increasingly fine-grained matches that are closer to the originally intended object.
- By performing instance-level (rather than category-level) retrieval, the invention provides a practical user interface for searching, particularly with the wide and increasing availability of touchscreens.
- Figure 1 is a diagram of an interactive fine-grained sketch-based image retrieval system
- Figure 2 is a diagram of a selective interactive module
- Figure 3 is a diagram of an augmenting interactive module
- Figure 4 is a diagram illustrating a method of training a model
- Figure 5 is an example of a selective sketch on a retrieved image in a method of the invention.
- Figure 6 is an example of an augmenting sketch on a retrieved image in a method of the invention.
- Figure 7 depicts parts of a shoe to which attributes can be applied;
- Figure 8 depicts examples of photos and corresponding sketches used for training the model of an embodiment of the invention.
- Figure 9 depicts a training network in an embodiment of the invention.
- Figure 10 depicts an example of a query sketch and positive and negative edge-extracted photos
- Figure 11 depicts examples of original sketches and generated sketches after removing 10%, 30% and 50% of strokes
- Figure 12 depicts an example process of data augmentation by stroke removal and deformation
- Figure 13 depicts examples of local deformation of sketches in an embodiment of the invention.
- Figure 14 depicts examples of global deformation of sketches in an embodiment of the invention.
- Figure 15 depicts examples of combined local and global deformation of sketches in an embodiment of the invention.
- Figure 16 depicts the network architecture of a model according to an embodiment of the invention.
- Figure 17 depicts ranked lists generated automatically and by humans.
- the present invention provides methods and systems that can accept user-created sketches as the input query.
- Sketches are intuitive and descriptive. They are one of the few means for non-experts to create visual content. As a query modality, they offer a more natural way to provide detailed visual cues than pure text. With the proliferation of touch-screen devices, sketch-based image retrieval (SBIR) has gained tremendous application potential.
- Fine-grained SBIR is challenging because: (i) free-hand sketches are highly abstract and iconic, e.g., sketched objects do not accurately depict their real-world image counterparts; (ii) sketches and photos come from inherently heterogeneous domains, e.g., sparse black line drawings on a white background versus dense colour pixels, potentially with background clutter; and (iii) fine-grained correspondence between sketches and images is difficult to establish, especially given the abstract and cross-domain nature of the problem.
- An embodiment of the invention brings together attribute and part-centric modelling to decorrelate and better predict attributes, as well as provide two complementary views of the data to enhance matching.
- the present invention can provide a part-aware SBIR framework that addresses the finegrained SBIR challenge by identifying discriminative attributes and parts.
- an off-the-shelf strongly-supervised deformable part-based model (SS-DPM) is first trained to obtain semantic localized regions, followed by low-level feature (e.g. Histogram of Oriented Gradients, abbreviated herein as "HOG") extraction within each part region to train part-level attribute detectors (e.g., using conventional Support Vector Machine classifiers).
- HOG Histogram of Oriented Gradients
- the overall system 1 of an embodiment of the invention has three main parts: a fine-grained retrieval engine 2, a selective sketch interactive module 3 and an augmenting sketch interactive module 4, as shown in Figure 1.
- An interface 5 handles communication with a user.
- the fine-grained retrieval engine 2 comprises a ranking model, e.g. a deep ranking model, trained from a database of sketches and photos. It can then be used non-interactively to retrieve photos similar to an input sketch, or interactively via one of the two interactive modules 3, 4.
- the interactive interface 5 takes the user's feedback and refines the returned results based on it; the feedback can be of a selective type or an augmenting type.
- the fine-grained retrieval engine 2, selective sketch interactive module 3, and augmenting sketch interactive module 4 are described in more detail below.
- Both interactive modules can be iteratively called upon to further refine the retrieval results with multiple rounds of feedback.
- the selective sketch interactive module 3 enables the user to add detail to the original sketch in order to select from an image gallery returned from a previous search.
- the augmenting sketch interactive module 4 enables the user to sketch on an image from the image gallery returned by a previous search in order to retrieve a new set of results. The main differences between the two interactive modules are described below.
- the augmenting sketch interactive module 4 outputs two sets of data: the actual sketches the user draws and the particular image the user sketched on.
- the actual sketch, once combined with the original sketch, is provided to the sketch retrieval engine to produce a ranking list; and the selected image is provided to a separate image-level retrieval engine to produce another ranking list.
- the two ranking lists are then merged to generate a final rank list.
- Triplet training data is generated using part-level attributes from the part decomposition and part-level attribute detection component and image features. Both augmented sketch data and triplet training data are fed to a triplet ranking network to train it. There are three key points that contribute to solving the problem of training the model to provide improved results. Each provides advantages individually and they synergistically combine to provide greatly improved ranking accuracy.
- the invention employs sketch-specific data augmentation to solve the problem of sketch data scarcity: far fewer sketches than photos are available.
- Data augmentation of sketches can be temporal, spatial or both.
- the present invention provides cross-domain triplet ranking, part decomposition and part-level attribute detection and automates triplet annotation using attributes.
- the interactive sketching framework described above allows the user to refine the search results in an iterative process which can involve two types of interactive sketches.
- Selective sketches indicate parts of interest on retrieved images, e.g., scribbling around a particular decoration on a pair of shoes the user particularly liked (see shoe 6 in Figure 5).
- Augmenting sketches allow the user to sketch details otherwise not in the images, e.g., sketching a higher heel on top of a retrieved shoe to indicate the desire for a shoe like one sketched on but with a higher heel (see shoe 2 in Figure 6).
- the present invention has been applied, by way of an example, with a fine-grained SBIR dataset comprising images and free-hand human sketches of shoes and chairs. Each image has three sketches corresponding to various drawing styles.
- This dataset provides a solid basis for learning tasks.
- the images in the dataset cover most subcategories of shoes commonly encountered in daily life.
- the shoes themselves are unique enough and provide enough visual cues to be differentiated from others.
- the sketches are drawn by non-experts using their fingers on a touch screen, which resembles the real- world situations when sketches are practically used.
- the shoes in the dataset are tagged with a list of fine-grained attributes for shoes, including words most frequently used to describe a shoe, such as “front platform”, “sandal style round”, “running shoe”, “clogs”, “high heel”, “great”, “feminine” and “appeal”.
- a dataset used with an embodiment uses 13 fine-grained shoe attributes, each of which can be clustered to one of the four parts of a shoe it is semantically attached to, as shown in Fig. 7.
- Selective sketch interactive module
- the function of the selective module is to make a refined retrieval based on a user's preference for a particular region of a particular retrieved image.
- the user draws on a chosen retrieved image to indicate that s/he likes the particular highlighted part of that particular image (e.g., the style of a shoe's heel or toe, or the style of a chair's back, in a particular retrieved image).
- the ranked list is updated to move examples with parts like the selected one higher up the list.
- a system diagram representing a single loop of user interaction is given in Figure 2. In Figure 2, data nodes are illustrated by ellipses and steps effected by technical components are illustrated by rectangles.
- the input sketch query D1 is applied to the fine-grained ranking model C1 which has been trained by a triplet-ranking network.
- the fine-grained ranking model C1 computes the similarity of the input sketch to every photo in an image library and outputs a ranked list of images D2.
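A minimal sketch of this retrieval step, assuming the trained model has already produced embeddings for the sketch and for every gallery photo (NumPy; the function name is illustrative):

```python
import numpy as np

def rank_gallery(sketch_embedding, gallery_embeddings):
    """Rank gallery photos by Euclidean distance to a sketch embedding.

    sketch_embedding: (d,) vector from the sketch branch of the model.
    gallery_embeddings: (n, d) array, one row per library photo.
    Returns gallery indices sorted from most to least similar.
    """
    dists = np.linalg.norm(gallery_embeddings - sketch_embedding, axis=1)
    return np.argsort(dists)  # smallest distance first
```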
- the user then provides user feedback C2 of selective type by sketching on a specific part of an image. For example, the user draws a circle on an accessory of a particular pair of shoes he/she likes.
- the part selected by the selective sketch is segmented into a segment image D3.
- the corresponding segments from the entire image library are compared with the segment image D3.
- the process to establish part-level correspondences between image-image, and sketch-image is described below.
- SS-DPM strongly-supervised deformable part-based model
- the part from the selected image is matched C3 to the corresponding part in each other image in the dataset.
- Any suitable image-domain matching method e.g., nearest neighbour based on HOG feature
- a new ranking list D4 is generated by the image-domain matching method (e.g., sorted by nearest neighbour distances).
- the original rank list D2 and the new rank list from the part selection D4 are fused C4 to compute a final ranked list that reflects both similarity to the user's initial sketch D1 and the selected part D3.
- the list fusion can be done with any existing fusion method, for example by averaging the distances produced by the two methods C2 and C3.
- the fusion C4 generates the final rank list D5.
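A minimal sketch of such a fusion by averaging distances over the same gallery; the min-max normalisation and the weight parameter are added assumptions to make the two distance scales comparable:

```python
import numpy as np

def fuse_rankings(sketch_dists, part_dists, weight=0.5):
    """Fuse two per-photo distance lists by weighted averaging.

    sketch_dists: (n,) distances from the sketch-based ranking model (C1).
    part_dists: (n,) distances from part-level image matching (C3).
    Returns the fused gallery ranking (indices, most similar first).
    """
    def norm(d):  # map each list to [0, 1] so the scales are comparable
        d = np.asarray(d, dtype=float)
        return (d - d.min()) / (d.max() - d.min() + 1e-12)

    fused = weight * norm(sketch_dists) + (1 - weight) * norm(part_dists)
    return np.argsort(fused)
```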
- the process can be iterated. The user can go back to step C2, giving another selective feedback, thus updating the final ranked list.
- Augmenting sketch interactive module
- the interactive module combines an augmenting sketch, drawn on a retrieved image, with the original sketch to generate a new, more fine-grained sketch, which gets fed back to the sketch-based fine-grained ranking model. It is a way for the user to say "I like this image, but with this <sketched> additional fine-grained detail it's missing".
- a separate image-level ranking model is also used to produce another rank list with the image the user sketched on as input.
- the two rank lists, one from the sketch-based retrieval engine, the other from the image-based retrieval engine are merged to produce the final ranked list.
- a system diagram representing a single loop of user interaction is given in the Fig. 3 and explained below.
- the input sketch query D11 is input to the fine-grained ranking model C11 to obtain a rank list of photos D12.
- the rank list of photos D12 is updated in each loop.
- the user provides feedback C12 of augmenting type. This means that the user adds some detail on part of the image he or she prefers. For instance, the user adds a high heel on the boot image in the initial retrieval result. This provides two pieces of information: the augmenting sketch D13, representing the detailed part, and the image D14 which the user sketches on, indicating that the user likes this style.
- image-domain matching C14 e.g., Nearest Neighbour (NN) with HOG
- NN Nearest Neighbour
- the rank list of images returned by the sketch-based retrieval engine is updated (D12 is updated).
- the new rank list and the one generated from the NN method are then fused to generate the final rank list D17. If further feedback is received, this rank list is updated until the loop converges.
- the photo images used cover the variability of the corresponding object category.
- 419 representative images were selected from UT-Zap50K (A. Yu and K. Grauman, "Fine-grained visual comparisons with local learning", CVPR 2014).
- each triplet consists of one query sketch and two candidate photos; the task is to determine which one of the two candidate photos is more similar to the query sketch.
- Exhaustively annotating all possible triplets (O(N^3)) is also out of the question due to the extremely large number of possible triplets.
- the inventors have found it to be sufficient to use only a selected subset of the triplets and to obtain the annotations through the following three steps:
- Attribute Annotation: First, an ontology of attributes for shoes and chairs is defined based on existing UT-Zap50K attributes and product tags on online shopping websites. 21 and 15 binary attributes were selected for shoes and chairs respectively, and all 1,432 images were annotated with ground-truth attribute vectors.
- Generating Candidate Photos for each Sketch: Next, the most-similar candidate images, e.g. 10, are selected for each sketch in order to make best use of a limited amount of gold-standard fine-grained annotation effort. In particular, each image was represented by its annotated attribute vector, concatenated with a data-driven representation obtained by feeding the image into an existing well-trained deep neural network, such as the recognition network Sketch-a-Net [Sketch-a-Net].
- the present invention provides a deep triplet ranking model learnt using a domain-invariant representation f_θ(·) which enables the similarity between a sketch s and a photo p ∈ P to be measured for retrieval with the Euclidean distance D(s, p) = ||f_θ(s) − f_θ(p)||².
- each triplet consists of a query sketch s and two photos p+ and p−, namely a positive photo and a negative photo, such that the positive one is more similar to the query sketch than the negative one.
- the goal is to learn a feature mapping f_θ(·) that maps photos and sketches to a common feature embedding space, R^d, in which photos similar to particular sketches are closer than dissimilar ones, i.e., the distance between query s and positive p+ is always smaller than the distance between query s and negative p−: D(f_θ(s), f_θ(p+)) < D(f_θ(s), f_θ(p−)). This is enforced by minimising a hinge loss over the training triplets, plus a regulariser, where:
- T is the training set of triplets
- θ are the parameters of the deep model, which define the mapping f_θ(·) from the input space to the embedding space
- R(·) is a regulariser on the model parameters, e.g. a norm penalty ||θ||²
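A minimal PyTorch sketch of this triplet ranking objective; the margin value is an assumption, and the regulariser R(θ) is left to the optimiser's weight decay:

```python
import torch.nn.functional as F

def triplet_ranking_loss(f_s, f_pos, f_neg, margin=1.0):
    """Hinge loss on squared Euclidean distances for a batch of triplets.

    f_s, f_pos, f_neg: (batch, d) embeddings of the query sketch and the
    positive/negative photos from the three network branches.
    """
    d_pos = F.pairwise_distance(f_s, f_pos).pow(2)  # D(f(s), f(p+))
    d_neg = F.pairwise_distance(f_s, f_neg).pow(2)  # D(f(s), f(p-))
    return F.relu(margin + d_pos - d_neg).mean()    # max(0, margin + d+ - d-)
```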
- each branch in the network of the invention corresponds to one of the atoms in the triplet: query sketch s, positive photo p+ and negative photo p− (see Fig. 9).
- the weights of the two photo branches should always be shared, while the weights of the photo branch and the sketch branch can either be shared or not depending on whether a Siamese network or a heterogeneous network is used.
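A minimal PyTorch sketch of the three-branch arrangement in its Siamese configuration, where one shared trunk embeds both domains; a heterogeneous variant would hold a separate sketch trunk while the two photo branches still share one:

```python
import torch.nn as nn

class TripletNet(nn.Module):
    """Three-branch triplet network (Siamese configuration): the sketch
    branch and both photo branches share weights by reusing one trunk."""

    def __init__(self, trunk: nn.Module):
        super().__init__()
        self.trunk = trunk  # e.g. a Sketch-a-Net-style CNN branch

    def forward(self, sketch, photo_pos, photo_neg):
        # Weight sharing is implicit: the same module embeds all three inputs.
        return self.trunk(sketch), self.trunk(photo_pos), self.trunk(photo_neg)
```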
- the first step is to train the ranking model from scratch to classify a large number, e.g. 1,000, of categories using categorised image data with the edge maps.
- the edge maps are extracted from bounding box areas.
- Category Fine-tuning: The pre-trained ranking model is fine-tuned to classify a smaller number of categories using free-hand sketch images, so that it also represents well the free-hand sketch inputs.
- a novel form of data augmentation is used to improve performance. This data augmentation strategy is discussed below.
- the result is a set of weights for a single branch of the three-branch ranking network architecture that represent well both free-hand sketch data and photo edge-map data.
- Auxiliary sketch/photo category-paired data can be obtained from independent sketch and photo datasets by selecting categories which exist in both datasets, and collecting sketches and photos from each. For sketches, outliers can be excluded by selecting the 60% most representative images in each category (measured by their scores from the category-trained ranking model of the invention for that category). Edge extraction is performed on the photos using the same strategy as used for the pre-training. This can produce a large number, many thousands, of sketches and photos, paired at the category level.
- In-class hard negatives: photos drawn from the bottom 20% of most-similar samples to the probe within the same category. Overall these are drawn in a 3:1:1 ratio. Some examples of sampled positive and negative photos can be seen in Fig. 10.
- Fig. 11 shows some examples of original sketches and generated sketches after removing 10%, 30% and 50% of strokes. Clearly they capture different levels of abstraction for the same object (category) which are likely to be present in free-hand sketches.
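The text specifies only that removal probability depends on stroke length and stroke order; the inverse-length, later-strokes-first weighting in the sketch below is an assumption:

```python
import random

def remove_strokes(strokes, target_fraction=0.3):
    """Randomly drop roughly `target_fraction` of the strokes, with removal
    probability growing for shorter and later strokes.

    strokes: strokes in drawing order, each a list of (x, y) points.
    """
    def length(stroke):
        return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
                   for (x1, y1), (x2, y2) in zip(stroke, stroke[1:]))

    n = len(strokes)
    # Later strokes (higher index) and shorter strokes get larger weights.
    weights = [(i + 1) / (length(s) + 1e-6) for i, s in enumerate(strokes)]
    total = sum(weights)
    # Scale so the expected number of removals is target_fraction * n.
    probs = [min(1.0, target_fraction * n * w / total) for w in weights]
    return [s for s, p in zip(strokes, probs) if random.random() >= p]
```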
- Stroke Deformation: Different styles of sketching can also be captured by stroke deformations, e.g. by using a Moving Least Squares algorithm for stroke deformation. In the same spirit as stroke removal, the deformation degree should differ across strokes. It can be controlled by the length and curvature of a stroke, so that strokes with shorter length and smaller curvature are probabilistically deformed more.
- Local Deformation: Another approach to data augmentation is local deformation, i.e. perturbing the individual points p of each stroke with Gaussian noise:
- p̂ = p + e, s.t. e ∼ N(0, rI) (7)
- the standard deviation r of the Gaussian noise is the ratio between the linear distance between a stroke's endpoints and the actual length of the stroke. This means that strokes with shorter length and smaller curvature are probabilistically deformed more, while long and curly strokes are deformed less.
- MLS Moving Least Squares
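A minimal sketch of the local deformation of Eq. (7), using the straightness ratio described above as the noise standard deviation (the global `scale` multiplier is an added knob, not from the text):

```python
import numpy as np

def deform_stroke(points, scale=1.0):
    """Perturb a stroke's points with Gaussian noise whose standard
    deviation is the chord-to-arc-length ratio, so short, straight strokes
    deform more than long, curly ones.

    points: (n, 2) array of (x, y) stroke coordinates.
    """
    points = np.asarray(points, dtype=float)
    seg = np.diff(points, axis=0)
    arc_len = np.sum(np.linalg.norm(seg, axis=1)) + 1e-6
    chord = np.linalg.norm(points[-1] - points[0])
    sigma = scale * chord / arc_len  # in [0, 1]; 1 = perfectly straight
    return points + np.random.normal(0.0, sigma, size=points.shape)
```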
- both sketches and photos are subjected to pre-processing to alleviate misalignment due to scale, aspect ratio, and centering.
- the heights of the bounding boxes of both sketches and images are downscaled to a fixed pixel value while retaining their original aspect ratios. The downscaled sketches and images are then placed at the centre of a blank canvas, with the rest padded with background pixels.
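A minimal sketch of this normalisation with Pillow, assuming grayscale inputs; the 256-pixel canvas and 200-pixel target height are illustrative values, since the text fixes only the procedure:

```python
from PIL import Image

def normalise(img, canvas=256, height=200):
    """Downscale to a fixed height, keep the aspect ratio, and paste the
    result centred on a blank canvas padded with background pixels."""
    img = img.convert("L")  # treat sketches and edge maps as grayscale
    w, h = img.size
    new_w = max(1, round(w * height / h))
    img = img.resize((new_w, height), Image.BILINEAR)
    out = Image.new("L", (canvas, canvas), 255)  # white background canvas
    out.paste(img, ((canvas - new_w) // 2, (canvas - height) // 2))
    return out
```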
- the architecture of a convolutional neural network that can be used in an embodiment of the invention comprises: five convolutional layers, each with rectifier (ReLU) units, with the first, second and fifth layers followed by max pooling (Maxpool).
- the filter size of the sixth convolutional layer (index 14 in Table 1) is 7 × 7, which is the same as the output from the previous pooling layer, thus it is precisely a fully-connected layer. Then two more fully connected layers are appended. Dropout regularisation is applied on the first two fully connected layers.
- the final layer has 250 output units corresponding to 250 categories (the number of unique classes in the TU-Berlin sketch dataset), upon which we place a softmax loss.
- the details of an example CNN are summarised in Table 1. Note that for simplicity of presentation, fully connected layers are not explicitly distinguished from their convolutional equivalents.
- a commonality between the above CNN and some known convolutional neural networks for photograph matching is that the number of filters increases with depth. Specifically, the first layer has 64 filters, and this is doubled after every pooling layer (indices 3→4, 6→7 and 13→14) until 512. Also, the stride of convolutional layers after the first is set to one. This keeps as much information as possible. Furthermore, zero-padding is used only in L3-5 (indices 7, 9 and 11). This is to ensure that the output size is an integer.
- CNNs used in embodiments of the invention may also differ from conventional neural networks in lacking Local Response Normalisation: Local Response Normalisation (LRN) implements a form of lateral inhibition, which is found in real neurons. This is used pervasively in contemporary CNN recognition architectures (Krizhevsky et al, 2012; Chatfield et al, 2014; Simonyan and Zisserman, 2015). However, in practice LRN's benefit is due to providing "brightness normalisation". This is not necessary in sketches since brightness is not an issue in line-drawings. Thus removing LRN layers makes learning faster without sacrificing performance.
- LRN Local Response Normalisation
- CNNs used in embodiments of the present invention may also have a larger pooling size. Many recent CNNs use 2 × 2 max pooling with stride 2. This approach efficiently reduces the size of the layer by 75% while bringing some spatial invariance. However, a CNN used in an embodiment of the present invention may use a 3 × 3 pooling size with stride 2, thus generating overlapping pooling areas. This can provide a ~1% improvement without much additional computation.
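A PyTorch sketch of a single branch matching this description: five conv layers with ReLU, 3 × 3/stride-2 max pooling after the first, second and fifth, a 7 × 7 conv acting as the first FC layer, dropout on the first two FC layers, and a 250-way output. The exact filter sizes and strides follow the published Sketch-a-Net design and should be treated as assumptions; the input is assumed to be a 1 × 225 × 225 grayscale sketch or edge map:

```python
import torch.nn as nn

def sketch_branch(num_classes=250):
    return nn.Sequential(
        nn.Conv2d(1, 64, 15, stride=3), nn.ReLU(inplace=True),    # L1
        nn.MaxPool2d(3, stride=2),                                 # 3x3, stride 2
        nn.Conv2d(64, 128, 5), nn.ReLU(inplace=True),              # L2, stride 1
        nn.MaxPool2d(3, stride=2),
        nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),  # L3 (padded)
        nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),  # L4 (padded)
        nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),  # L5 (padded)
        nn.MaxPool2d(3, stride=2),
        nn.Conv2d(256, 512, 7), nn.ReLU(inplace=True),  # FC6 as a 7x7 conv
        nn.Dropout(0.5),
        nn.Conv2d(512, 512, 1), nn.ReLU(inplace=True),  # FC7
        nn.Dropout(0.5),
        nn.Conv2d(512, num_classes, 1),  # FC8: 250-way, softmax loss applied outside
        nn.Flatten(),
    )
```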
- Deep Multi-Task Embodiment
- Another embodiment of the present invention provides a fine-grained SBIR model that exploits semantic attributes and deep feature learning in a complementary way. Specifically, it performs multi-task deep learning with three objectives: retrieval by fine-grained ranking on a learned representation, attribute prediction, and attribute-level ranking. Simultaneously predicting semantic attributes and using such predictions in the ranking procedure helps retrieval results to be more semantically relevant. Importantly, the introduction of semantic attribute learning in the model allows for the elimination of the cost of human annotations required for training a fine-grained deep ranking model. Experimental results demonstrate that this embodiment outperforms the state-of-the-art on challenging fine-grained SBIR benchmarks while requiring considerably less annotation.
- This embodiment takes advantage of a DNN's strength as a representation learner, but also combines this with semantic attribute learning, resulting in a deep multi-task attribute-based ranking model for FG-SBIR.
- this embodiment includes a multi-task DNN model, where the main task is a retrieval task with triplet-ranking objective as described above, and attributes are detected and exploited in two side tasks, which are also referred to herein as auxiliary tasks.
- the first side task is to predict the attributes of the input sketch and photo images. By optimising this task at training time, the learned representation is encouraged to encode the semantic properties of the photo/sketch more meaningfully.
- the second side-task is to perform retrieval ranking based on the attribute predictions themselves.
- An embodiment of the invention may have only one auxiliary task rather than two as described below.
- An embodiment of the invention may have more than two auxiliary tasks, such as prediction of other attributes such as material, style, product price and/or brand.
- predicted attributes of the retrieved images can be displayed to the user.
- Retrieved images can be sorted and/or filtered by one or more predicted attributes.
- a multistage search can be performed by receiving a user selection of attributes from the first search results and then performing a second sketch-based search within images having the user-selected attributes.
- This novel deep multi-task attribute-based ranking network architecture has a number of advantages over existing methods:
- the proposed network is a three branch network.
- Each input tuple consists of three images corresponding to the query sketch (passed through the middle branch), positive photo image (top branch) and negative photo image (bottom branch) respectively.
- the positive photo has been annotated as more visually similar to the query than the negative photo.
- the learned deep model aims to enforce this ranking in the model output.
- the architecture of the task-shared part consists of five convolution layers with max pooling as well as a fully-connected (FC) layer, to learn a better representation of the original data via feature maps. After these shared layers, different tasks evolve along separate branches: in the main task, one more FC layer with dropout and rectified linear unit (ReLU) is added to represent the learned fine-grained feature vectors.
- FC fully-connected
- in each attribute side task, an FC layer (with dropout and ReLU) extracts fine-grained attribute representations, followed by a score layer to make predictions.
- Main Triplet Ranking Task: the main task is sketch-photo ranking, and in this respect the network of this embodiment is similar to the embodiment of Figure 1, except for the additional dropout to reduce overfitting.
- the main task is trained by supervision in the form of triplet tuples, with each training tuple containing an anchor sketch s, a positive photo p+ and a negative photo p−.
- the network has three branches and the goal is to learn a representation such that the positive photo p+ is ranked above the negative photo p− in terms of its similarity to the query sketch s.
- the main task loss function is the triplet ranking loss: L(s, p+, p−) = max(0, Δ + D(f_θ(s), f_θ(p+)) − D(f_θ(s), f_θ(p−)))
- D(·, ·) denotes the squared Euclidean distance
- Δ is the required margin of ranking for the hinge loss
- Attribute Prediction Task: In order to encourage the learned network representation to encode semantically salient properties of objects (and thus help the main task to make better predictions), a side task of predicting the attributes of the input sketch and photo images is introduced.
- This attribute prediction task can then be trained simultaneously with the main sketch-photo ranking task
- Attribute Ranking Task: The attribute-prediction task above ensures that the learned representation of the network encodes semantically salient features that support attribute prediction. Since retrieval ranking is the main task, the attribute prediction would not be used during test time. This task's effect on the main task is thus implicit rather than direct. However, as a semantic representation, attributes are domain-invariant and thus intrinsically useful for matching a photo with a query sketch. To this end, a third task of attribute-level sketch-photo matching is added, which matches based on the predicted attributes of the sketch and photo inputs rather than on an internally generated feature representation.
- H(·, ·) is the cross-entropy between the attribute prediction vectors of the corresponding branches
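A PyTorch sketch of the combined objective (the overall loss referred to below as Eq. (12)). The relative task weights are assumptions: the text states only that the two ranking tasks share the higher weight and the attribute-prediction tasks a lower one:

```python
import torch
import torch.nn.functional as F

def multitask_loss(emb, attr_logits, attr_true, margin=1.0, w_side=0.5):
    """emb / attr_logits / attr_true: dicts keyed by branch ('s', 'pos',
    'neg') holding (batch, d) embeddings, (batch, a) attribute logits and
    (batch, a) binary ground-truth attribute vectors respectively."""
    # Main task: triplet ranking on the learned embedding.
    d_pos = (emb['s'] - emb['pos']).pow(2).sum(1)
    d_neg = (emb['s'] - emb['neg']).pow(2).sum(1)
    main_rank = F.relu(margin + d_pos - d_neg).mean()

    # Side task 1: per-branch multi-label attribute prediction.
    attr_pred = sum(
        F.binary_cross_entropy_with_logits(attr_logits[k], attr_true[k])
        for k in ('s', 'pos', 'neg'))

    # Side task 2: attribute-level ranking, using the cross-entropy H
    # between predicted attribute vectors as the sketch-photo mismatch.
    def xent(p, q):
        return -(p * (q + 1e-7).log() + (1 - p) * (1 - q + 1e-7).log()).sum(1)

    a_s = torch.sigmoid(attr_logits['s'])
    h_pos = xent(a_s, torch.sigmoid(attr_logits['pos']))
    h_neg = xent(a_s, torch.sigmoid(attr_logits['neg']))
    attr_rank = F.relu(margin + h_pos - h_neg).mean()

    return main_rank + attr_rank + w_side * attr_pred
```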
- Multi-Task Testing: At run-time the main and attribute-ranking tasks are used together to generate an overall similarity score for a given sketch/photo pair. All sketch/photo pairs are ranked, and the retrieval result for a given sketch is the similarity-sorted list of photos. Specifically, for a given query sketch s, the similarity to each image p in the gallery set is calculated as a combination of the main-task embedding distance and the attribute-level ranking distance.
- a staged pre-training strategy is adopted similar to that of the embodiment of Figure 1. Specifically, first a single-branch classification model with the same feature extraction layers as the proposed full model is pre-trained to classify ImageNet-1K data (encoded as edge maps). This model is very similar to the Sketch-a-Net model designed for sketch classification. This is followed by fine-tuning on the 250-class TU-Berlin sketch recognition task. After that, this single-branch network is extended to form a three-branch Siamese triplet ranking network. Each branch is initialised as the pre-trained single-branch model, and the model is then fine-tuned on a category-level photo-sketch dataset re-purposed for fine-grained SBIR as described above. After these three stages of pre-training, the full model with the two added side tasks and the overall loss in Eq. (12) is then initialised and fine-tuned with the fine-grained SBIR dataset for within-category sketch-based photo retrieval.
- Triplet Generation: Instead of choosing the top-10 most similar photos and asking humans to annotate them, this embodiment automatically generates triplets based on a strict top-10 ranking induced by attribute and feature similarity. More specifically, attribute similarity is used first to construct a top-10 candidate list of most similar photos given a query sketch. ImageNet CNN features are then used to further rank these photos by similarity with respect to the ground-truth match. Intuitively, this strategy can be seen as using semantic attribute properties to generate a meaningful short list, but otherwise driving the cross-domain ranking objective by the more subtle photo-photo similarity encoded by a well-trained ImageNet CNN.
- Triplet Sampling: A further novel feature of this embodiment is that instead of using all triplets, a plurality, e.g. 9, of the hardest ones are selected for model training, each consisting of the anchor and two photos of neighbouring ranks (e.g., anchor-R1-R2 or anchor-R4-R5). It can be shown empirically that this choice of learning curriculum significantly boosts model performance compared to alternatives ranging from exhaustive sampling to easy and medium curricula.
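A minimal sketch of this hard-triplet selection from a strict top-10 list (names are illustrative):

```python
def hard_triplets(sketch_id, top10):
    """Build the 9 hardest triplets for one sketch: each pairs the anchor
    with two photos of neighbouring ranks (anchor-R1-R2, ..., anchor-R9-R10).

    top10: list of 10 photo ids ordered most- to least-similar.
    """
    return [(sketch_id, top10[i], top10[i + 1]) for i in range(9)]
```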
- Training and Evaluation Data: We use the same shoe and chair FG-SBIR datasets described above. For training, 304 sketch-photo pairs of shoes and 200 pairs of chairs were used. Each sketch/photo comes with attribute annotations, which are used to obtain the top-10 photo rank list and additionally to learn the attribute-based tasks in the multi-task model. Data augmentation such as flipping and cropping is applied.
- the main and attribute-level ranking tasks have equivalent weight, and the attribute-prediction tasks all have the same lower weight.
- the batch size is 128, and the network was trained with a maximum of 25000 iterations.
- the base learning rate was 0.001 and the weight decay (λ) was set to 0.0005.
- An alternative scenario is where the user just wants to see items similar to the sketch, and in this case the overall ordering is the salient metric. For this, the percentage of correctly ranked triplets is used, which reflects how well the predicted triplet ranking agrees with that of humans.
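A minimal sketch of this metric; the distance callable is an assumed interface:

```python
def triplet_accuracy(model_dist, triplets):
    """Percentage of human-annotated triplets whose ordering the model
    reproduces, i.e. d(s, p+) < d(s, p-).

    model_dist: callable (sketch_id, photo_id) -> distance.
    triplets: iterable of (sketch, positive, negative) ids.
    """
    triplets = list(triplets)
    correct = sum(model_dist(s, p) < model_dist(s, n) for s, p, n in triplets)
    return 100.0 * correct / max(1, len(triplets))
```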
- the original human annotation can be noisy, thus human annotations are cleaned by inferring a globally optimised rank list from the annotated pairs using the generalised Bradley-Terry model [F. Caron and A. Doucet, "Efficient Bayesian inference for generalized Bradley-Terry models", Journal of Computational and Graphical Statistics, 21(1):174-196, 2012].
- Sampling options include: (i) Exhaustive: use all 45 triplets with no sampling, or (ii) Hard: sample the 9 hardest triplets as proposed.
- a network is also trained using the same human-annotated triplets used above as a baseline.
- the present embodiment provides a deep multi-task attribute-based model for fine-grained SBIR.
- by adding attribute-prediction and attribute-based ranking side-tasks alongside the main sketch-based image retrieval task, the main task representation is enhanced by being required to encode semantic attributes of sketches and photos, and moreover the attribute predictions can be exploited to help make similarity predictions at test time.
- the combined result is that performance is significantly improved compared to models using a deep triplet ranking task alone. Beyond this it is shown that somewhat surprisingly the human subjective triplet annotation is not critical for obtaining good performance. This means that it is relatively easy to extend the method to new categories and larger datasets, since attribute annotation grows only linearly rather than cubically in the amount of data.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
In one embodiment, the invention provides an overall system (1) that comprises three main parts: a fine-grained retrieval engine (2), a selective sketch interactive module (3) and an augmenting sketch interactive module (4). An interface (5) handles communication with a user. The fine-grained retrieval engine (2) is a model trained from a database of sketches and photos. It can then be used non-interactively to retrieve photos similar to an input sketch, or interactively via one of the two interactive modules (3, 4). The interactive interface can be of a selective type or an augmenting type.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1605481.9 | 2016-03-31 | ||
GB201605481 | 2016-03-31 | ||
GB201613525 | 2016-08-05 | ||
GB1613525.3 | 2016-08-05 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017168125A1 true WO2017168125A1 (fr) | 2017-10-05 |
Family
ID=58609589
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB2017/050825 WO2017168125A1 (fr) | 2016-03-31 | 2017-03-23 | Procédés de recherche basés sur un croquis |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2017168125A1 (fr) |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107748798A (zh) * | 2017-11-07 | 2018-03-02 | 中国石油大学(华东) | 一种基于多层视觉表达和深度网络的手绘图像检索方法 |
CN108154155A (zh) * | 2017-11-13 | 2018-06-12 | 合肥阿巴赛信息科技有限公司 | 一种基于草图的珠宝检索方法和系统 |
CN108416780A (zh) * | 2018-03-27 | 2018-08-17 | 福州大学 | 一种基于孪生-感兴趣区域池化模型的物体检测与匹配方法 |
CN108960258A (zh) * | 2018-07-06 | 2018-12-07 | 江苏迪伦智能科技有限公司 | 一种基于自学习深度特征的模板匹配方法 |
CN109215123A (zh) * | 2018-09-20 | 2019-01-15 | 电子科技大学 | 基于cGAN的无限地形生成方法、系统、存储介质和终端 |
CN109543559A (zh) * | 2018-10-31 | 2019-03-29 | 东南大学 | 基于孪生网络和动作选择机制的目标跟踪方法及系统 |
US10248664B1 (en) | 2018-07-02 | 2019-04-02 | Inception Institute Of Artificial Intelligence | Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval |
WO2019084419A1 (fr) * | 2017-10-27 | 2019-05-02 | Google Llc | Apprentissage non supervisé de représentations audio sémantiques |
WO2019093819A1 (fr) * | 2017-11-10 | 2019-05-16 | 삼성전자주식회사 | Dispositif électronique et procédé de fonctionnement associé |
CN110188228A (zh) * | 2019-05-28 | 2019-08-30 | 北方民族大学 | 基于草图检索三维模型的跨模态检索方法 |
CN110222217A (zh) * | 2019-04-18 | 2019-09-10 | 北京邮电大学 | 一种基于分段加权的鞋印图像检索方法 |
WO2019178155A1 (fr) * | 2018-03-13 | 2019-09-19 | Pinterest, Inc. | Réseau de convolution efficace pour systèmes de recommandation |
CN110472088A (zh) * | 2019-08-13 | 2019-11-19 | 南京大学 | 一种基于草图的图像检索方法 |
CN110570490A (zh) * | 2019-09-06 | 2019-12-13 | 北京航空航天大学 | 显著性图像生成方法及设备 |
CN110598018A (zh) * | 2019-08-13 | 2019-12-20 | 天津大学 | 一种基于协同注意力的草图图像检索方法 |
CN110633745A (zh) * | 2017-12-12 | 2019-12-31 | 腾讯科技(深圳)有限公司 | 一种基于人工智能的图像分类训练方法、装置及存储介质 |
CN111966849A (zh) * | 2020-08-17 | 2020-11-20 | 深圳市前海小萌科技有限公司 | 一种基于深度学习和度量学习的草图检索方法 |
CN112257812A (zh) * | 2020-11-12 | 2021-01-22 | 四川云从天府人工智能科技有限公司 | 一种标注样本确定方法、装置、机器可读介质及设备 |
US10902053B2 (en) * | 2017-12-21 | 2021-01-26 | Adobe Inc. | Shape-based graphics search |
CN112395442A (zh) * | 2020-10-12 | 2021-02-23 | 杭州电子科技大学 | 移动互联网上的低俗图片自动识别与内容过滤方法 |
CN112800267A (zh) * | 2021-02-03 | 2021-05-14 | 大连海事大学 | 一种细粒度鞋印图像检索方法 |
CN112862920A (zh) * | 2021-02-18 | 2021-05-28 | 清华大学 | 基于手绘草图的人体图像生成方法及系统 |
CN113129447A (zh) * | 2021-04-12 | 2021-07-16 | 清华大学 | 基于单张手绘草图的三维模型生成方法、装置和电子设备 |
CN113673635A (zh) * | 2020-05-15 | 2021-11-19 | 复旦大学 | 一种基于自监督学习任务的手绘草图理解深度学习方法 |
US11182980B1 (en) | 2020-03-26 | 2021-11-23 | Apple Inc. | Procedural generation of computer objects |
CN114373127A (zh) * | 2021-12-24 | 2022-04-19 | 复旦大学 | 一种基于手绘草图的目标物可抓取点检测方法及系统 |
EP4009196A1 (fr) * | 2020-12-01 | 2022-06-08 | Accenture Global Solutions Limited | Systèmes et procédés de recherche visuelle à base de fractales |
CN114782586A (zh) * | 2022-05-09 | 2022-07-22 | 重庆邮电大学 | 一种素描绘画序列集自动生成方法 |
CN114860980A (zh) * | 2022-05-26 | 2022-08-05 | 重庆邮电大学 | 一种基于草图局部特征和全局特征匹配的图像检索方法 |
CN117709210A (zh) * | 2024-02-18 | 2024-03-15 | 粤港澳大湾区数字经济研究院(福田) | 约束推断模型训练、约束推断方法、组件、终端及介质 |
US11954881B2 (en) | 2018-08-28 | 2024-04-09 | Apple Inc. | Semi-supervised learning using clustering as an additional constraint |
CN118072332A (zh) * | 2024-04-19 | 2024-05-24 | 西北工业大学 | 基于草图和文本双重提示的自进化零样本目标识别方法 |
CN118227821A (zh) * | 2024-05-24 | 2024-06-21 | 济南大学 | 一种基于抗噪声网络的草图检索三维模型的方法 |
US12067045B2 (en) | 2021-01-25 | 2024-08-20 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method thereof |
-
2017
- 2017-03-23 WO PCT/GB2017/050825 patent/WO2017168125A1/fr active Application Filing
Non-Patent Citations (4)
Title |
---|
HOFFER ELAD ET AL: "Deep Metric Learning Using Triplet Network", 25 November 2015, NETWORK AND PARALLEL COMPUTING; [LECTURE NOTES IN COMPUTER SCIENCE; LECT.NOTES COMPUTER], SPRINGER INTERNATIONAL PUBLISHING, CHAM, PAGE(S) 84 - 92, ISBN: 978-3-642-38347-2, ISSN: 0302-9743, XP047327039 * |
JIANG WANG ET AL: "Learning Fine-Grained Image Similarity with Deep Ranking", 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 17 April 2014 (2014-04-17), pages 1386 - 1393, XP055263324, ISBN: 978-1-4799-5118-5, DOI: 10.1109/CVPR.2014.180 * |
QIAN YU ET AL: "Sketch-a-Net that Beats Humans", PROCEDINGS OF THE BRITISH MACHINE VISION CONFERENCE 2015, 21 August 2015 (2015-08-21), pages 7.1 - 7.12, XP055374782, ISBN: 978-1-901725-53-7, DOI: 10.5244/C.29.7 * |
Y LI ET AL: "Fine-grained sketch-based image retrieval by matching deformable part models", PROCEEDINGS OF THE BRITISH MACHINE VISION CONFERENCE 2014, NOTTINGHAM, 1 September 2014 (2014-09-01), pages 1 - 12, XP055375000 * |
Cited By (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11335328B2 (en) | 2017-10-27 | 2022-05-17 | Google Llc | Unsupervised learning of semantic audio representations |
CN111433843B (zh) * | 2017-10-27 | 2024-05-28 | 谷歌有限责任公司 | 语义音频表示的无监督学习 |
CN111433843A (zh) * | 2017-10-27 | 2020-07-17 | 谷歌有限责任公司 | 语义音频表示的无监督学习 |
WO2019084419A1 (fr) * | 2017-10-27 | 2019-05-02 | Google Llc | Apprentissage non supervisé de représentations audio sémantiques |
CN107748798A (zh) * | 2017-11-07 | 2018-03-02 | 中国石油大学(华东) | 一种基于多层视觉表达和深度网络的手绘图像检索方法 |
WO2019093819A1 (fr) * | 2017-11-10 | 2019-05-16 | 삼성전자주식회사 | Dispositif électronique et procédé de fonctionnement associé |
CN111344671A (zh) * | 2017-11-10 | 2020-06-26 | 三星电子株式会社 | 电子设备及其操作方法 |
CN108154155A (zh) * | 2017-11-13 | 2018-06-12 | 合肥阿巴赛信息科技有限公司 | 一种基于草图的珠宝检索方法和系统 |
CN110633745A (zh) * | 2017-12-12 | 2019-12-31 | 腾讯科技(深圳)有限公司 | 一种基于人工智能的图像分类训练方法、装置及存储介质 |
CN110633745B (zh) * | 2017-12-12 | 2022-11-29 | 腾讯科技(深圳)有限公司 | 一种基于人工智能的图像分类训练方法、装置及存储介质 |
US11704357B2 (en) | 2017-12-21 | 2023-07-18 | Adobe Inc. | Shape-based graphics search |
US10902053B2 (en) * | 2017-12-21 | 2021-01-26 | Adobe Inc. | Shape-based graphics search |
US11232152B2 (en) | 2018-03-13 | 2022-01-25 | Amazon Technologies, Inc. | Efficient processing of neighborhood data |
US11227014B2 (en) | 2018-03-13 | 2022-01-18 | Amazon Technologies, Inc. | Generating neighborhood convolutions according to relative importance |
US11227013B2 (en) | 2018-03-13 | 2022-01-18 | Amazon Technologies, Inc. | Generating neighborhood convolutions within a large network |
US11227012B2 (en) | 2018-03-13 | 2022-01-18 | Amazon Technologies, Inc. | Efficient generation of embedding vectors of nodes in a corpus graph |
WO2019178155A1 (fr) * | 2018-03-13 | 2019-09-19 | Pinterest, Inc. | Réseau de convolution efficace pour systèmes de recommandation |
US11783175B2 (en) | 2018-03-13 | 2023-10-10 | Pinterest, Inc. | Machine learning model training |
US11797838B2 (en) | 2018-03-13 | 2023-10-24 | Pinterest, Inc. | Efficient convolutional network for recommender systems |
US11922308B2 (en) | 2018-03-13 | 2024-03-05 | Pinterest, Inc. | Generating neighborhood convolutions within a large network |
CN108416780B (zh) * | 2018-03-27 | 2021-08-31 | 福州大学 | 一种基于孪生-感兴趣区域池化模型的物体检测与匹配方法 |
CN108416780A (zh) * | 2018-03-27 | 2018-08-17 | 福州大学 | 一种基于孪生-感兴趣区域池化模型的物体检测与匹配方法 |
US10248664B1 (en) | 2018-07-02 | 2019-04-02 | Inception Institute Of Artificial Intelligence | Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval |
CN108960258A (zh) * | 2018-07-06 | 2018-12-07 | 江苏迪伦智能科技有限公司 | 一种基于自学习深度特征的模板匹配方法 |
US11954881B2 (en) | 2018-08-28 | 2024-04-09 | Apple Inc. | Semi-supervised learning using clustering as an additional constraint |
CN109215123A (zh) * | 2018-09-20 | 2019-01-15 | 电子科技大学 | 基于cGAN的无限地形生成方法、系统、存储介质和终端 |
CN109215123B (zh) * | 2018-09-20 | 2022-07-29 | 电子科技大学 | 基于cGAN的无限地形生成方法、系统、存储介质和终端 |
CN109543559A (zh) * | 2018-10-31 | 2019-03-29 | 东南大学 | 基于孪生网络和动作选择机制的目标跟踪方法及系统 |
CN109543559B (zh) * | 2018-10-31 | 2021-12-28 | 东南大学 | 基于孪生网络和动作选择机制的目标跟踪方法及系统 |
CN110222217B (zh) * | 2019-04-18 | 2021-03-09 | 北京邮电大学 | 一种基于分段加权的鞋印图像检索方法 |
CN110222217A (zh) * | 2019-04-18 | 2019-09-10 | 北京邮电大学 | 一种基于分段加权的鞋印图像检索方法 |
CN110188228A (zh) * | 2019-05-28 | 2019-08-30 | 北方民族大学 | 基于草图检索三维模型的跨模态检索方法 |
CN110188228B (zh) * | 2019-05-28 | 2021-07-02 | 北方民族大学 | 基于草图检索三维模型的跨模态检索方法 |
CN110472088B (zh) * | 2019-08-13 | 2023-06-27 | 南京大学 | 一种基于草图的图像检索方法 |
CN110598018A (zh) * | 2019-08-13 | 2019-12-20 | 天津大学 | 一种基于协同注意力的草图图像检索方法 |
CN110472088A (zh) * | 2019-08-13 | 2019-11-19 | 南京大学 | 一种基于草图的图像检索方法 |
CN110570490B (zh) * | 2019-09-06 | 2021-07-30 | 北京航空航天大学 | 显著性图像生成方法及设备 |
CN110570490A (zh) * | 2019-09-06 | 2019-12-13 | 北京航空航天大学 | 显著性图像生成方法及设备 |
US11182980B1 (en) | 2020-03-26 | 2021-11-23 | Apple Inc. | Procedural generation of computer objects |
CN113673635A (zh) * | 2020-05-15 | 2021-11-19 | 复旦大学 | 一种基于自监督学习任务的手绘草图理解深度学习方法 |
CN113673635B (zh) * | 2020-05-15 | 2023-09-01 | 复旦大学 | 一种基于自监督学习任务的手绘草图理解深度学习方法 |
CN111966849A (zh) * | 2020-08-17 | 2020-11-20 | 深圳市前海小萌科技有限公司 | 一种基于深度学习和度量学习的草图检索方法 |
CN111966849B (zh) * | 2020-08-17 | 2023-07-28 | 深圳市前海小萌科技有限公司 | 一种基于深度学习和度量学习的草图检索方法 |
CN112395442A (zh) * | 2020-10-12 | 2021-02-23 | 杭州电子科技大学 | 移动互联网上的低俗图片自动识别与内容过滤方法 |
CN112257812B (zh) * | 2020-11-12 | 2024-03-29 | 四川云从天府人工智能科技有限公司 | 一种标注样本确定方法、装置、机器可读介质及设备 |
CN112257812A (zh) * | 2020-11-12 | 2021-01-22 | 四川云从天府人工智能科技有限公司 | 一种标注样本确定方法、装置、机器可读介质及设备 |
EP4009196A1 (fr) * | 2020-12-01 | 2022-06-08 | Accenture Global Solutions Limited | Systèmes et procédés de recherche visuelle à base de fractales |
US12067045B2 (en) | 2021-01-25 | 2024-08-20 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method thereof |
CN112800267A (zh) * | 2021-02-03 | 2021-05-14 | 大连海事大学 | 一种细粒度鞋印图像检索方法 |
CN112800267B (zh) * | 2021-02-03 | 2024-06-11 | 大连海事大学 | 一种细粒度鞋印图像检索方法 |
CN112862920A (zh) * | 2021-02-18 | 2021-05-28 | 清华大学 | 基于手绘草图的人体图像生成方法及系统 |
CN113129447A (zh) * | 2021-04-12 | 2021-07-16 | 清华大学 | 基于单张手绘草图的三维模型生成方法、装置和电子设备 |
CN114373127A (zh) * | 2021-12-24 | 2022-04-19 | 复旦大学 | 一种基于手绘草图的目标物可抓取点检测方法及系统 |
CN114782586A (zh) * | 2022-05-09 | 2022-07-22 | 重庆邮电大学 | 一种素描绘画序列集自动生成方法 |
CN114860980A (zh) * | 2022-05-26 | 2022-08-05 | 重庆邮电大学 | 一种基于草图局部特征和全局特征匹配的图像检索方法 |
CN117709210A (zh) * | 2024-02-18 | 2024-03-15 | 粤港澳大湾区数字经济研究院(福田) | 约束推断模型训练、约束推断方法、组件、终端及介质 |
CN117709210B (zh) * | 2024-02-18 | 2024-06-04 | 粤港澳大湾区数字经济研究院(福田) | 约束推断模型训练、约束推断方法、组件、终端及介质 |
CN118072332A (zh) * | 2024-04-19 | 2024-05-24 | 西北工业大学 | 基于草图和文本双重提示的自进化零样本目标识别方法 |
CN118072332B (zh) * | 2024-04-19 | 2024-08-23 | 西北工业大学 | 基于草图和文本双重提示的自进化零样本目标识别方法 |
CN118227821A (zh) * | 2024-05-24 | 2024-06-21 | 济南大学 | 一种基于抗噪声网络的草图检索三维模型的方法 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017168125A1 (fr) | Procédés de recherche basés sur un croquis | |
US10025950B1 (en) | Systems and methods for image recognition | |
US9171013B2 (en) | System and method for providing objectified image renderings using recognition information from images | |
Yu et al. | Fine-grained instance-level sketch-based image retrieval | |
US20160350336A1 (en) | Automated image searching, exploration and discovery | |
Chen et al. | Structure-aware deep learning for product image classification | |
Song et al. | Deep multi-task attribute-driven ranking for fine-grained sketch-based image retrieval | |
US20060253491A1 (en) | System and method for enabling search and retrieval from image files based on recognized information | |
Lee et al. | Feature selection in multimedia: the state-of-the-art review | |
JP2005535952A (ja) | 画像内容検索法 | |
Zhan et al. | DeepShoe: An improved Multi-Task View-invariant CNN for street-to-shop shoe retrieval | |
Mohanan et al. | A survey on different relevance feedback techniques in content based image retrieval | |
Fan et al. | Structured max-margin learning for inter-related classifier training and multilabel image annotation | |
Khodaskar et al. | Image mining: an overview of current research | |
Huang et al. | Modeling multiple aesthetic views for series photo selection | |
Papapanagiotou et al. | Improving concept-based image retrieval with training weights computed from tags | |
Shah et al. | Random patterns clothing image retrieval using convolutional neural network | |
John et al. | A novel deep learning based cbir model using convolutional siamese neural networks | |
Cornia et al. | Matching faces and attributes between the artistic and the real domain: the personart approach | |
Ma’Rufah et al. | A novel approach to visual search in e-commerce fashion using siamese neural network and multi-scale cnn | |
US20240355090A1 (en) | Conditional similarity-based image identification and matching with reduced labels | |
Hashmi | A Hybrid Approach to Fashion product Recommendation and Classification: Leveraging Transfer Learning and ANNOY | |
Yang et al. | Deep high-order asymmetric supervised hashing for image retrieval | |
Thanikachalam et al. | T2T-ViT: A Novel Semantic Image Mining Approach for Improving CBIR Using Vision Transformer | |
Markatopoulou | Machine Learning Architectures for Video Annotation and Retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17718974 Country of ref document: EP Kind code of ref document: A1 |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17718974 Country of ref document: EP Kind code of ref document: A1 |