
WO2018170663A1 - Image annotation method, apparatus, and electronic device - Google Patents

Image annotation method, apparatus, and electronic device

Info

Publication number
WO2018170663A1
WO2018170663A1 (PCT/CN2017/077253)
Authority
WO
WIPO (PCT)
Prior art keywords
image
labeling
voice information
sub
result
Prior art date
Application number
PCT/CN2017/077253
Other languages
English (en)
French (fr)
Inventor
廉士国
刘兆祥
王宁
南一冰
Original Assignee
深圳前海达闼云端智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海达闼云端智能科技有限公司
Priority to PCT/CN2017/077253 priority Critical patent/WO2018170663A1/zh
Priority to JP2019547989A priority patent/JP6893606B2/ja
Priority to CN201780000661.6A priority patent/CN107223246B/zh
Publication of WO2018170663A1 publication Critical patent/WO2018170663A1/zh
Priority to US16/576,234 priority patent/US11321583B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/2163 Partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2178 Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/40 Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/34 Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting

Definitions

  • The present application relates to the field of image management and image recognition technologies, and in particular to an image annotation method, apparatus, and electronic device.
  • In image recognition, a key step is to annotate the data samples. For example, in order to train an intelligent recognizer to identify a dog, a large number of already-annotated data samples of dogs are required, including pictures of dogs and the text label "dog".
  • The currently used data annotation method is to annotate, manually and with computer equipment, a large number of images and texts that have already been collected. After the annotated data samples are obtained, the corresponding image recognition training is performed according to them.
  • However, this implementation has the problems of long annotation time, low efficiency, and high labor cost.
  • Embodiments of the present application provide an image annotation method, apparatus, and electronic device, mainly used to solve the problems of low efficiency and insufficient convenience when performing image annotation.
  • One technical solution adopted by embodiments of the present application is to provide an image annotation method, including: acquiring an image collected at a terminal; acquiring voice information associated with the image; and annotating the image according to the voice information and storing the annotation result of the image.
  • Another technical solution is an image annotation apparatus, including: a first acquiring module, configured to acquire an image collected at the terminal; a second acquiring module, configured to acquire voice information associated with the image; and a first annotation module, configured to annotate the image according to the voice information and store the annotation result of the image.
  • Another technical solution is an electronic device, including: at least one processor; and a memory communicably connected to the at least one processor; wherein the memory stores an instruction program executable by the at least one processor, the instruction program being executed by the at least one processor to cause the at least one processor to perform the method described above.
  • Another technical solution adopted by embodiments of the present application is to provide a computer program product, comprising: a non-volatile computer-readable storage medium and computer program instructions embedded in the non-volatile computer-readable storage medium; the computer program instructions include instructions to cause a processor to perform the method described above.
  • Yet another technical solution adopted by embodiments of the present application is to provide a non-transitory computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being for causing a computer to perform the method described above.
  • In embodiments of the present application, the acquired voice information is analyzed, and the image is annotated according to the analysis result of the voice information.
  • This implementation can annotate received images in real time, shortening the time period of image annotation and improving the efficiency of image recognition.
  • FIG. 1 is a schematic diagram of the operating environment of an image annotation method provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of an image annotation method provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an image annotation method according to another embodiment of the present application.
  • FIG. 4 is a schematic flowchart of an image annotation method according to still another embodiment of the present application.
  • FIGS. 5(a)-(d) are schematic diagrams of an example of an image annotation method provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of an image annotation method according to yet another embodiment of the present application.
  • FIG. 7 is a schematic flowchart of an image annotation method according to a further embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an image annotation apparatus according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an image annotation apparatus according to another embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an image annotation apparatus according to still another embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an image annotation apparatus according to yet another embodiment of the present application.
  • FIG. 12 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
  • FIG. 1 is a schematic diagram of the operating environment of an image annotation method according to an embodiment of the present application.
  • The application environment includes: a user 10, a terminal 20, and a cloud 30.
  • User 10 may be any group of any size whose members have the same or similar operational behaviors, such as a group of robot users, mobile phone users, AR glasses users, or blind-guidance helmet users. User 10 can also be a single individual. Different users 10 have different personalization requirements, usage habits, usage needs, and so on, so each user has its own specific user data.
  • The terminal 20 can be any suitable type of electronic device that has a certain logical computing capability and provides one or more functions capable of satisfying the user's intent, with functions such as image capture, sound capture, speech recognition, and display.
  • The terminal 20 includes various intelligent terminal devices such as robots, smartphones, AR glasses, and smart helmets.
  • The user 10 can interact with the smart terminal 20 through one or more user interaction devices of any suitable type (such as a mouse, keyboard, remote control, touch screen, somatosensory camera, or audio capture device), inputting commands or controlling the smart terminal 20 to perform one or more operations.
  • Cloud 30 has data storage and data processing functions and can carry out data communication with terminal 20, including receiving data and sending data.
  • The cloud 30 receives the data sent by the terminal 20 and processes that data.
  • Specifically, images are annotated according to the received image data and voice data; the cloud 30 can also store the data.
  • The image annotation method provided by embodiments of the present application may be further extended to other suitable application environments and is not limited to the application environment shown in FIG. 1.
  • Although only three users, three terminals, and one cloud server are shown in FIG. 1, those skilled in the art will understand that in actual application the environment may include more or fewer users, terminals, and cloud servers.
  • FIG. 2 is a schematic flowchart of an image annotation method according to an embodiment of the present application. The method is applied to the cloud and, as shown in FIG. 2, includes:
  • Step 101: acquire an image collected at a terminal.
  • Step 102: acquire voice information associated with the image.
  • Step 103: annotate the image according to the voice information, and store the annotation result of the image.
  • The terminal collects, in real time through a device such as a camera, frame images of the objects to be annotated within a preset range; the terminal may be stationary or moving while acquiring the image.
  • The terminal sends the image to the cloud; the sent image may be an image compressed by the terminal, which increases the speed at which the image is uploaded to the cloud.
  • The voice information associated with the image is also uploaded to the cloud.
  • When the cloud cannot automatically annotate the acquired image, the cloud can annotate the image based on the voice information and store the image annotation result.
  • The user can tell the terminal by voice input that the collected image is a "dog"; the terminal collects the user's voice information and sends it to the cloud.
  • The cloud extracts the key features in the voice information through a speech recognition module and annotates the image according to those key features.
  • The user can also tell the terminal by text input that the collected image is a "dog".
  • The terminal includes mobile phones, AR glasses, smart helmets, robots, and the like.
  • The phone, AR glasses, smart helmet, and robot can all capture images and collect sound.
  • The user can annotate multiple images with only one voice input, for example by inputting voice in sequence according to the order of image uploading so that the voice corresponds to the images. Users can also annotate a continuous image sequence, such as a video of a dog from different viewing angles, with a single voice input.
  • The terminal can use the cloud to intelligently recognize the objects in the image frame, saving the manual annotation process on the terminal side.
  • If an unrecognizable object is encountered, the terminal can emit the voice prompt "What is this?" through its own voice module to notify a person nearby or an operator in the cloud background; the annotation can then be done manually, either locally through the terminal's own smart device (for example by voice annotation or via the touch screen) or in the cloud through a background control device (such as a computer).
  • An embodiment of the present application provides an image annotation method that uses intelligent tools in the cloud to analyze the acquired voice information, annotates the acquired image containing the object to be annotated according to the analysis result, and stores the annotation result in the cloud.
  • The method can perform real-time interactive annotation on acquired images, improving the efficiency and convenience of image annotation.
  • When the collected image contains multiple objects to be annotated, for example a picture containing both a dog and a cat, the following embodiment provides an image annotation method so that the objects in the image can be annotated more accurately.
  • FIG. 3 is a schematic flowchart of an image annotation method according to another embodiment of the present application. As shown in FIG. 3, the method includes:
  • Step 201: acquire an image collected at the terminal.
  • Step 202: extract region information of the objects to be annotated in the image using a region extraction algorithm.
  • Obtaining a target region from an image is an important step in many image-processing applications; the region extraction algorithm is used to extract the regions of the objects to be annotated in the image.
  • Related applications include content-based image retrieval, region-of-interest-based image compression and coding, content-based image authentication, adaptive image display, and the like.
  • A region extraction algorithm is used to extract the region information of the objects to be annotated in the image.
  • For example, the objects to be annotated in the image are a dog and a cat.
  • The region information of the "dog" and the "cat" is extracted, that is, the extents occupied by the images of the "dog" and the "cat" within the frame image.
  • Each object to be annotated in the image has its corresponding region information, and the extracted region information can be represented by a mathematical expression, for example [a1, a2] representing the region information of the "dog" and the "cat" respectively.
  • Region extraction algorithms include: feature-point-based extraction methods, and extraction methods based on visual attention mechanisms (such as the Itti saliency map model and the spectral residual model).
  • Step 203: divide the objects to be annotated in the image into sub-regions according to the region information.
  • The objects to be annotated in the image are divided according to the region information, yielding multiple sub-regions; the sub-region division process is, in effect, distinguishing the region range corresponding to each object to be annotated.
  • The sub-region of each object to be annotated may be identified using boxes of different colors; for example, the sub-region corresponding to the "dog" is shown as the "green box region" and the sub-region corresponding to the "cat" as the "red box region".
  • Step 204: send the result of the sub-region division, or the image after sub-region division.
  • The cloud may send the sub-region division result to the terminal, and the terminal superimposes the division result on the collected image so as to show the end user the image with the sub-regions divided.
  • The cloud can also directly send the sub-region-divided image to the terminal, in which case the terminal only needs to display the divided image.
  • Step 206: acquire voice information associated with a sub-region in the image.
  • After the terminal receives the sub-region division result or the sub-region-divided image sent by the cloud, the terminal has an image containing the sub-regions. At this point, for each sub-region on the image, the terminal acquires the key information related to that sub-region and then sends the key information to the cloud.
  • For example, the user selects a sub-region of the image displayed by the terminal via the touch screen or similar means and inputs "this is a dog" by voice; the key information of that sub-region is then this voice information, and the terminal sends the voice information to the cloud.
  • Alternatively, the user directly inputs through the terminal the voice information "the red region is a dog" and "the green region is a cat"; the key information is then these two pieces of voice information, and the terminal sends the collected voice information to the cloud.
  • Step 207: annotate the image according to the voice information, and store the annotation result of the image.
  • The voice information is the voice information corresponding to the sub-regions in the image.
  • The cloud may extract the keywords in the voice information through a speech recognition module and establish a mapping between the keywords and the sub-regions.
  • For the case of multiple objects to be annotated, the image is first divided into sub-regions, and then, based on the divided sub-regions, the voice information of each sub-region is obtained by interaction with the terminal.
  • The voice information is sent to the cloud, and the cloud annotates the sub-regions in the image according to the voice information.
  • FIG. 4 is a schematic flowchart of an image annotation method according to still another embodiment of the present application.
  • The main difference between FIG. 4 and FIG. 3 is that, after the cloud sends the sub-region division result or the sub-region-divided image to the terminal, the method further includes:
  • Step 205: acquire the image after an adjustment operation has been performed at the terminal on the sub-region division result or on the sub-region-divided image.
  • The terminal may perform an adjustment operation on the image to confirm that the sub-regions divided in the cloud are accurate and suitable.
  • The terminal can let the user fine-tune the position and size of a colored box to fit the object to be annotated, let the user delete redundant boxes in the image (for example a box containing no object to be annotated), and let the user add boxes missing from the image, and so on.
  • After the terminal performs the adjustment operation on the divided sub-regions, the voice information is collected based on the sub-regions of the adjusted image, and the cloud annotates the adjusted image according to that voice information.
  • The divided sub-regions are adjusted at the terminal and the adjusted image is sent to the cloud; the cloud annotates the image's sub-regions according to the confirmed image and the voice information of the sub-regions of the confirmed image. This ensures the accuracy and completeness of the objects to be annotated in the image when they are annotated.
  • The image collected at the terminal contains multiple objects to be annotated; the image may be as shown in FIG. 5(a) and contains two objects to be annotated: a "dog" and a "cat".
  • The region extraction algorithm divides the objects to be annotated in the image into sub-regions, with the division result shown in FIG. 5(b) or FIG. 5(c); on the user terminal side it may be found that in FIG. 5(b) or FIG. 5(c) the sub-region division result for the objects to be annotated is incomplete or erroneous.
  • In that case, the end user may adjust the sub-region division result or the sub-region-divided image, with the adjusted image shown in FIG. 5(d).
  • The terminal sends the adjusted image to the cloud together with the voice information associated with the sub-regions of the adjusted image, so that the cloud can annotate the sub-region-adjusted image according to the received voice information.
  • FIG. 6 is a schematic flowchart of an image annotation method according to yet another embodiment of the present application. As shown in FIG. 6, the method includes:
  • Step 301: acquire an image collected at the terminal.
  • Step 302: automatically annotate the image through image recognition.
  • Step 303: after the image has been automatically annotated, display the result of the automatic annotation at the terminal.
  • Step 304: acquire voice information associated with the image.
  • Step 305: when the voice information indicates that the result of the automatic annotation is correct, store the result of the automatic annotation; and/or, when the voice information indicates that the result of the automatic annotation is incorrect, annotate the image according to the voice information.
  • The image annotation in this embodiment of the present application can be completed automatically by the cloud, without needing to receive voice information collected on the terminal side.
  • The image is automatically annotated based on an image recognition method. For example, the cloud first divides the received image into sub-regions and then automatically annotates each sub-region using an object recognition method; this includes annotating a single object to be annotated in the image as well as multiple objects to be annotated, thereby completing the annotation of the image.
  • The cloud may use a region extraction algorithm to divide the image into sub-regions; for the specific process, refer to the description in the foregoing embodiments.
  • The object recognition method comes from the field of computer vision and is mainly used to detect and recognize objects accurately; this includes selecting effective image feature points to reduce the influence of occlusion and image noise appearing during object recognition and to achieve better object recognition accuracy.
  • In addition to recognizing objects in an image, the object recognition method can also recognize text, i.e., the text on an object can be recognized and used as a candidate annotation item for that object, for example recognizing "milk" on a box.
  • The annotation result of the image may also be sent to the terminal and displayed there.
  • The end user can confirm whether there is an incorrect annotation result; if the result of the automatic annotation contains an error, the annotation result can be modified.
  • The automatic annotation result can be modified by voice, for example deleting the label "pig" corresponding to the red region via the touch screen and generating the label "dog" for the red region with the voice input "this is a dog".
  • Missing labels can also be added by voice or by inputting text, and redundant labels in the automatic annotation result can be deleted by voice, and so on.
  • If the result of the automatic annotation is correct, the result of the automatic annotation is stored.
  • The embodiment of the present application provides an image annotation method in which the cloud automatically annotates the acquired image and the terminal is used to judge whether the result of the automatic annotation is correct; if the annotation is correct the annotation result is stored, and if there is an erroneous annotation the annotation result is adjusted according to the voice information.
  • This embodiment can not only shorten the time period of image annotation but also significantly improve the correctness of image annotation results and the accuracy of image recognition.
  • FIG. 7 is a schematic flowchart of an image annotation method according to a further embodiment of the present application. As shown in FIG. 7, the method includes:
  • Step 401: acquire an image collected at the terminal.
  • Step 402: automatically annotate the image through image recognition.
  • Step 403: acquire voice information associated with the image.
  • Step 404: when the automatic annotation fails, annotate the image according to the voice information.
  • The image annotation method of this embodiment of the present application addresses the case where the cloud's automatic image annotation fails; in that case, the image is annotated again according to the acquired voice information.
  • Whether the automatic annotation succeeded may be judged by the cloud, fed back by the terminal, or determined in other ways; this is not limited here.
  • The image annotation method provided by this embodiment of the present application can automatically annotate images in the cloud and, when the automatic annotation fails, annotate the image using the acquired voice information. This embodiment can ensure that the image is successfully annotated, shortens the annotation time, and makes annotation more convenient.
  • The methods of the foregoing embodiments may be performed by corresponding functional modules in the cloud server alone, or jointly by a system comprising functional modules in the cloud and functional modules in the terminal.
  • The acquisition in steps 101 and 102 may mean receiving the image and the voice information sent by the terminal; displaying the annotation result at the terminal may mean sending the annotation result to the terminal for display there.
  • Alternatively, the foregoing acquisition may mean that the functional modules at the terminal invoke the terminal's hardware to collect images and voice, and the corresponding content is displayed at the terminal.
  • FIG. 8 is a schematic structural diagram of an image annotation apparatus according to an embodiment of the present application.
  • The apparatus 40 includes a first acquiring module 41, a second acquiring module 42, and a first annotation module 43.
  • The first acquiring module 41 is configured to acquire an image collected at the terminal; the second acquiring module 42 is configured to acquire voice information associated with the image; and the first annotation module 43 is configured to annotate the image according to the voice information and store the annotation result of the image.
  • The first acquiring module 41 and the second acquiring module 42 are each connected to the first annotation module 43, and the first annotation module 43 annotates the image according to the received image and voice information.
  • The embodiment of the present application provides an image annotation apparatus that uses intelligent tools in the cloud to analyze the acquired voice information, annotates the acquired image containing the object to be annotated according to the analysis result, and stores the annotation result in the cloud.
  • The apparatus can annotate acquired images in real time, improving the efficiency of image annotation.
  • FIG. 9 is a schematic structural diagram of an image annotation apparatus according to another embodiment of the present application.
  • This embodiment of the present application is directed to an implementation in which the received image contains multiple objects to be annotated.
  • The apparatus 50 includes a first acquiring module 51, a first extraction module 52, a first division module 53, a first sending module 54, a second acquiring module 55, and a first annotation module 56.
  • The first acquiring module 51 is configured to acquire an image collected at the terminal; the first extraction module 52 is configured to extract region information of the objects to be annotated in the image using a region extraction algorithm; the first division module 53 is configured to divide the objects to be annotated in the image into sub-regions according to the region information;
  • the first sending module 54 is configured to send the sub-region division result or the sub-region-divided image;
  • the second acquiring module 55 is configured to acquire the voice information associated with the sub-regions in the image; and the first annotation module 56 is configured to annotate the image according to the voice information and store the annotation result of the image.
  • The first annotation module 56 includes an extraction unit 561 and an annotation unit 562.
  • The extraction unit 561 is configured to extract keywords in the voice information based on speech recognition, the keywords corresponding to the sub-regions; the annotation unit 562 is configured to establish a mapping table between the keywords and the sub-regions, annotate the sub-regions according to the mapping table, and store the annotation result.
  • The apparatus further includes a third acquiring module configured to acquire the image after an adjustment operation has been performed at the terminal on the sub-region division result or on the sub-region-divided image.
  • In that case, the first annotation module 56 is specifically configured to annotate the adjusted image according to the voice information and store the annotation result of the image.
  • FIG. 10 is a schematic structural diagram of an image annotation apparatus according to still another embodiment of the present application.
  • The apparatus 60 includes a first acquiring module 61, a second annotation module 62, a display module 63, a second acquiring module 64, and a first annotation module 65.
  • The first acquiring module 61 is configured to acquire an image collected at the terminal; the second annotation module 62 is configured to automatically annotate the image through image recognition; the display module 63 is configured to display the result of the automatic annotation at the terminal after the image has been automatically annotated; the second acquiring module 64 is configured to acquire the voice information associated with the image; and the first annotation module 65 is configured to store the result of the automatic annotation when the voice information indicates that the result is correct, and/or to annotate the image according to the voice information when the voice information indicates that the result of the automatic annotation is incorrect.
  • The embodiment of the present application provides an image annotation apparatus in which the cloud automatically annotates the acquired image and the terminal judges whether the result of the automatic annotation is correct; if the annotation is correct the annotation result is stored, and if there is an erroneous annotation the annotation result is adjusted according to the voice information.
  • This embodiment can not only shorten the time period of image annotation but also significantly improve the correctness of image annotation results and the accuracy of image recognition.
  • FIG. 11 is a schematic structural diagram of an image annotation apparatus according to yet another embodiment of the present application.
  • The apparatus 70 includes a first acquiring module 71, a third annotation module 72, a second acquiring module 73, and a first annotation module 74.
  • The first acquiring module 71 is configured to acquire an image collected at the terminal; the third annotation module 72 is configured to automatically annotate the image through image recognition; the second acquiring module 73 is configured to acquire the voice information associated with the image; and the first annotation module 74 is configured to annotate the image according to the voice information when the automatic annotation fails.
  • The image annotation apparatus provided by this embodiment of the present application can automatically annotate images in the cloud and, when the automatic annotation fails, annotate the image using the acquired voice information. This embodiment can ensure that the image is successfully annotated, shortens the annotation time, and makes annotation more convenient.
  • FIG. 12 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
  • The electronic device 80 is capable of performing the image annotation method described above.
  • The electronic device can be a cloud server or a system including a terminal and a cloud server.
  • The electronic device 80 includes one or more processors 81 and a memory 82; one processor 81 is taken as an example in FIG. 12.
  • The processor 81 and the memory 82 can be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 12.
  • The memory 82, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the image annotation method in the embodiments of the present application (for example, the first acquiring module 41, the second acquiring module 42, and the first annotation module 43 shown in FIG. 8).
  • The processor 81 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 82, that is, implements the image annotation method of the above method embodiments.
  • The memory 82 may include a program storage area and a data storage area, where the program storage area may store the operating system and the applications required for at least one function, and the data storage area may store data created according to the use of the image annotation apparatus, and the like.
  • The memory 82 can include high-speed random access memory and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
  • In some embodiments, the memory 82 can include memory remotely located relative to the processor 81; such remote memory can be connected to the image annotation apparatus via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The one or more modules are stored in the memory 82 and, when executed by the one or more processors 81, perform the image annotation method in any of the above method embodiments.
  • Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer-executable instructions which, when executed by an electronic device, perform the image annotation method in any of the above method embodiments, for example, performing method steps 101 to 103 in FIG. 2, method steps 201 to 204 and steps 206 and 207 in FIG. 3, method steps 201 to 207 in FIG. 4, method steps 301 to 305 in FIG. 6, and method steps 401 to 404 in FIG. 7 described above, implementing the functions of modules 41-43 in FIG. 8, modules 51-56 and units 561-562 in FIG. 9, modules 61-65 in FIG. 10, and modules 71-74 in FIG. 11.
  • The embodiment of the present application further provides a computer program product, including a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the image annotation method in any of the above method embodiments, for example, performing method steps 101 to 103 in FIG. 2, method steps 201 to 204 and steps 206 and 207 in FIG. 3, method steps 201 to 207 in FIG. 4, method steps 301 to 305 in FIG. 6, and method steps 401 to 404 in FIG. 7 described above, implementing the functions of modules 41-43 in FIG. 8, modules 51-56 and units 561-562 in FIG. 9, modules 61-65 in FIG. 10, and modules 71-74 in FIG. 11.
  • The computer software can be stored in a computer-readable storage medium; the program, when executed, can include the flows of the embodiments of the methods described above.
  • The storage medium may be a magnetic disk, an optical disk, a read-only memory, or a random access memory.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Acoustics & Sound (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present application disclose an image annotation method, apparatus, and electronic device. The method comprises: acquiring an image collected at a terminal; acquiring voice information associated with the image; and annotating the image according to the voice information, and storing the annotation result of the image. Implementations of the embodiments of the present application can perform real-time interactive annotation on received images, shortening the time period of image annotation and improving the convenience of annotating images in a mobile state, thereby improving the work efficiency and convenience of image annotation.

Description

Image annotation method, apparatus, and electronic device
Technical Field
The present application relates to the technical field of image management and image recognition, and in particular to an image annotation method, apparatus, and electronic device.
Background Art
In the process of image recognition, a key step is to annotate the data samples. For example, in order to train an intelligent recognizer to recognize dogs, a large number of already-annotated data samples of dogs are required, including pictures of dogs and the text label "dog".
The currently common data annotation method is to annotate, manually and with computer equipment, a large number of images and texts that have already been collected. After the annotated data samples are obtained, the corresponding image recognition training is performed according to them. However, this implementation has the problems of long annotation time, low efficiency, and high labor cost.
In daily life, there are situations in which image samples need to be collected in real time in a mobile state (even through wearable devices), for example through mobile phones, AR glasses, blind-guidance helmets, robots, and the like. If data annotation could be done while the samples are being collected, the complexity of subsequent offline annotation would be reduced. How to annotate in real time in a mobile state, however, is a problem that needs to be solved. For example, with mobile/wearable devices it is difficult to input text labels or to select image sub-regions. Most previous annotation tools targeted desktop computers and did not consider mobile/wearable devices; they are not suitable for real-time data annotation in a mobile state, i.e., their annotation convenience is insufficient.
Summary of the Invention
Embodiments of the present application provide an image annotation method, apparatus, and electronic device, mainly used to solve the problems of low efficiency and insufficient convenience when performing image annotation.
To solve the above technical problem, one technical solution adopted by embodiments of the present application is to provide an image annotation method, comprising: acquiring an image collected at a terminal; acquiring voice information associated with the image; and annotating the image according to the voice information, and storing the annotation result of the image.
To solve the above technical problem, another technical solution adopted by embodiments of the present application is to provide an image annotation apparatus, comprising: a first acquiring module, configured to acquire an image collected at a terminal; a second acquiring module, configured to acquire voice information associated with the image; and a first annotation module, configured to annotate the image according to the voice information and store the annotation result of the image.
To solve the above technical problem, yet another technical solution adopted by embodiments of the present application is to provide an electronic device, comprising: at least one processor; and a memory communicably connected to the at least one processor; wherein the memory stores an instruction program executable by the at least one processor, the instruction program being executed by the at least one processor to cause the at least one processor to perform the method described above.
To solve the above technical problem, a further technical solution adopted by embodiments of the present application is to provide a computer program product, comprising: a non-volatile computer-readable storage medium and computer program instructions embedded in the non-volatile computer-readable storage medium; the computer program instructions comprise instructions for causing a processor to perform the method described above.
To solve the above technical problem, a still further technical solution adopted by embodiments of the present application is to provide a non-volatile computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being for causing a computer to perform the method described above.
In embodiments of the present application, the acquired voice information is analyzed, and the acquired image is annotated according to the analysis result of the voice information. Implementations of the embodiments of the present application can annotate received images in real time, shortening the time period of image annotation and thereby improving work efficiency during image recognition.
Brief Description of the Drawings
One or more embodiments are exemplarily illustrated by the figures in the corresponding drawings; these exemplary illustrations do not constitute a limitation on the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures in the drawings are not drawn to scale.
FIG. 1 is a schematic diagram of the operating environment of an image annotation method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of an image annotation method provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of an image annotation method provided by another embodiment of the present application;
FIG. 4 is a schematic flowchart of an image annotation method provided by still another embodiment of the present application;
FIGS. 5(a)-(d) are schematic diagrams of an example of an image annotation method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of an image annotation method provided by yet another embodiment of the present application;
FIG. 7 is a schematic flowchart of an image annotation method provided by a further embodiment of the present application;
FIG. 8 is a schematic structural diagram of an image annotation apparatus provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an image annotation apparatus provided by another embodiment of the present application;
FIG. 10 is a schematic structural diagram of an image annotation apparatus provided by still another embodiment of the present application;
FIG. 11 is a schematic structural diagram of an image annotation apparatus provided by yet another embodiment of the present application;
FIG. 12 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
Detailed Description of the Embodiments
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
Please refer to FIG. 1, which is a schematic diagram of the operating environment of the image annotation method provided by an embodiment of the present application. As shown in FIG. 1, the application environment includes a user 10, a terminal 20, and a cloud 30.
User 10 may be any group of any size whose members have the same or similar operational behaviors, such as a group of robot users, mobile phone users, AR glasses users, or blind-guidance helmet users. User 10 may also be a single individual. Different users 10 have different personalization requirements, usage habits, usage needs, and so on, so each user has its own specific user data.
Terminal 20 may be any suitable type of electronic device that has a certain logical computing capability and provides one or more functions capable of satisfying the user's intent; it has functions such as image capture, sound capture, speech recognition, and display/playback. Terminal 20 includes various intelligent terminal devices such as robots, smartphones, AR glasses, and smart helmets. User 10 can interact with the smart terminal 20 through one or more user interaction devices of any suitable type (such as a mouse, keyboard, remote control, touch screen, somatosensory camera, or audio capture device), inputting commands or controlling the smart terminal 20 to perform one or more operations.
Cloud 30 has data storage and data processing functions and can perform data communication with terminal 20, including receiving data and sending data. Cloud 30 receives the data sent by terminal 20 and processes it; specifically, it annotates images according to the received image data and voice data. Cloud 30 can also store data.
It should be noted that the image annotation method provided by embodiments of the present application can be further extended to other suitable application environments and is not limited to the application environment shown in FIG. 1. Although only three users, three terminals, and one cloud server are shown in FIG. 1, those skilled in the art will understand that in actual application the application environment may include more or fewer users, terminals, and cloud servers.
In combination with the above operating environment, the following describes specific implementations of image annotation based on terminal 20 and cloud 30.
Please refer to FIG. 2, which is a schematic flowchart of an image annotation method provided by an embodiment of the present application. The method is applied to the cloud and, as shown in FIG. 2, includes:
Step 101: acquire an image collected at a terminal;
Step 102: acquire voice information associated with the image;
Step 103: annotate the image according to the voice information, and store the annotation result of the image.
In this embodiment of the present application, according to the actual scene in which the terminal is located, the terminal collects, in real time through a device such as a camera, frame images of the objects to be annotated within a preset range; during image collection the terminal may be stationary or moving. After collecting the image of the object to be annotated, the terminal sends the image to the cloud; the sent image may be an image compressed by the terminal, which increases the speed at which the image is uploaded to the cloud.
While sending the image to the cloud, the terminal also uploads the voice information associated with the image. When the cloud cannot automatically annotate the acquired image, the cloud can annotate the image based on this voice information and store the annotation result of the image.
For example, when there is only one kind of object to be annotated in the image collected by the terminal, the user can tell the terminal by voice input that the collected image is a "dog"; the terminal collects the user's voice information and sends it to the cloud, and the cloud extracts the key features in the voice information through a speech recognition module and annotates the image according to those key features. In addition, the user can also tell the terminal by text input that the collected image is a "dog".
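Purely as an illustration of steps 101 to 103 (the patent itself discloses no source code), the sketch below shows one way the cloud side could turn an already-recognized transcript into a stored label. The tiny vocabulary match standing in for the speech recognition module, and all names such as AnnotationStore and annotate_image, are assumptions of this sketch rather than anything specified by the application.

```python
# A minimal sketch of the cloud-side flow of steps 101-103, assuming the
# terminal has already uploaded an image and an ASR transcript of the
# associated voice. The vocabulary match below merely stands in for the
# speech recognition module described in the embodiment.
from dataclasses import dataclass, field
from typing import Dict, List

LABEL_VOCABULARY = {"dog", "cat", "milk"}  # known label keywords (assumed)

@dataclass
class AnnotationStore:
    """In-memory stand-in for the cloud's annotation storage."""
    records: Dict[str, List[str]] = field(default_factory=dict)

    def save(self, image_id: str, labels: List[str]) -> None:
        self.records.setdefault(image_id, []).extend(labels)

def extract_keywords(transcript: str) -> List[str]:
    """Pull label keywords out of the recognized speech (step 103)."""
    words = transcript.lower().replace(",", " ").split()
    return [w for w in words if w in LABEL_VOCABULARY]

def annotate_image(store: AnnotationStore, image_id: str, transcript: str) -> List[str]:
    """Annotate an uploaded image from its associated voice information."""
    labels = extract_keywords(transcript)
    if labels:  # the voice carried a usable label
        store.save(image_id, labels)
    return labels

store = AnnotationStore()
print(annotate_image(store, "img_0001.jpg", "this is a dog"))  # ['dog']
print(store.records)  # {'img_0001.jpg': ['dog']}
```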
The terminal includes mobile phones, AR glasses, smart helmets, robots, and the like. The mobile phone, AR glasses, smart helmet, and robot can all capture images and collect sound.
The user may annotate multiple images with only one voice input, for example by inputting voice in sequence according to the order in which the images were uploaded, so that the voice corresponds to the images. The user may also annotate a continuous image sequence with a single voice input, for example a video of a dog from different viewing angles.
After the cloud has stored a large number of image annotation files, when the terminal again collects an image frame of the same object to be annotated, the cloud can be used to intelligently recognize the object in the image frame, saving the manual annotation process on the terminal side.
It should be noted that if an unrecognizable object is encountered, the terminal can emit the voice prompt "What is this?" through its own voice module to notify a person nearby or an operator in the cloud background. The annotation can then be done manually, either locally through the terminal's own smart device (for example by voice annotation or via the touch screen) or in the cloud through a background control device (such as a computer), and the annotation result is stored in the cloud.
An embodiment of the present application provides an image annotation method that uses intelligent tools in the cloud to analyze the acquired voice information, annotates the acquired image containing the object to be annotated according to the analysis result, and stores the annotation result in the cloud. The method can perform real-time interactive annotation on acquired images, improving the efficiency and convenience of image annotation.
When the collected image contains multiple objects to be annotated, for example a picture containing both a dog and a cat, the following embodiment provides an image annotation method so that the objects in the image can be annotated more accurately.
Please refer to FIG. 3, which is a schematic flowchart of an image annotation method provided by another embodiment of the present application. As shown in FIG. 3, the method includes:
Step 201: acquire an image collected at the terminal.
Step 202: extract region information of the objects to be annotated in the image using a region extraction algorithm.
Obtaining a target region from an image is an important step in many image-processing applications; the region extraction algorithm is used to extract the regions of the objects to be annotated in an image. There is much related research in the prior art, for example content-based image retrieval, region-of-interest-based image compression and coding, content-based image authentication, and adaptive image display.
In this embodiment of the present application, a region extraction algorithm is used to extract the region information of the objects to be annotated in the image. For example, if the objects to be annotated in the image are a dog and a cat, the region information of the "dog" and the "cat" is extracted, i.e., the extents occupied by the images of the "dog" and the "cat" within the frame image. Each object to be annotated in the image has its corresponding region information, and the extracted region information can be represented by a mathematical expression, for example [a1, a2] representing the region information of the "dog" and the "cat" respectively. Region extraction algorithms include feature-point-based extraction methods and extraction methods based on visual attention mechanisms (such as the Itti saliency map model and the spectral residual model). These region extraction algorithms are described in detail in the related art and are not repeated here.
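To make the visual-attention branch concrete, the following sketch implements the spectral residual saliency model mentioned above with NumPy/SciPy and then thresholds the saliency map into candidate bounding boxes. The threshold, the smoothing parameters, and the synthetic test image are assumptions of this illustration; the application does not prescribe this particular region extraction algorithm.

```python
# A sketch of spectral-residual saliency (one of the visual attention
# mechanisms named above) followed by a crude thresholding step that turns
# the saliency map into candidate region boxes such as [a1, a2].
import numpy as np
from scipy import ndimage

def spectral_residual_regions(gray: np.ndarray, thresh: float = 3.0):
    """Return bounding-box slices of salient regions in a grayscale image."""
    f = np.fft.fft2(gray.astype(float))
    log_amp = np.log(np.abs(f) + 1e-8)
    # Spectral residual: log amplitude minus its local average.
    residual = log_amp - ndimage.uniform_filter(log_amp, size=3)
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * np.angle(f)))) ** 2
    saliency = ndimage.gaussian_filter(saliency, sigma=3)
    # Keep pixels well above mean saliency, then group them into regions.
    mask = saliency > saliency.mean() * thresh
    labeled, _ = ndimage.label(mask)
    return ndimage.find_objects(labeled)  # one (row slice, col slice) per region

# Synthetic frame: two bright blobs stand in for the "dog" and "cat" regions.
img = np.zeros((128, 128))
img[20:50, 20:50] = 1.0
img[80:110, 70:120] = 1.0
for box in spectral_residual_regions(img):
    print(box)  # candidate region information for one object to be annotated
```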
Step 203: divide the objects to be annotated in the image into sub-regions according to the region information.
After the region information of the objects to be annotated in the image has been extracted, in this step the objects to be annotated are divided into regions based on that region information, yielding multiple sub-regions. The sub-region division process is, in effect, after obtaining the region information corresponding to each sub-region, distinguishing the region range corresponding to each object to be annotated. For example, boxes of different colors can be used to identify the sub-region of each object to be annotated, e.g., the sub-region corresponding to the "dog" shown as the "green box region" and the sub-region corresponding to the "cat" as the "red box region". Different colors can also be used to identify each object's sub-region, e.g., the "dog" sub-region shown in gray and the "cat" sub-region in black; the regions of the objects to be annotated can also be distinguished in other ways. It should be noted that the more kinds of objects to be annotated an image contains, the more an accurate sub-region division can effectively improve the accuracy of annotating that image.
Step 204: send the result of the sub-region division, or the image after sub-region division.
After the cloud has completed the sub-region division of the image, the cloud may send the division result to the terminal, and the terminal superimposes the division result on the already-collected image so as to show the end user the image with the sub-regions divided. The cloud may also send the sub-region-divided image directly to the terminal, in which case the terminal only needs to display the divided image.
Step 206: acquire voice information associated with a sub-region in the image.
After the terminal receives the sub-region division result or the sub-region-divided image sent by the cloud, the terminal has an image containing the sub-regions. At this point, for each sub-region on the image, the terminal acquires the key information related to that sub-region and then sends the key information to the cloud.
For example, the user selects one sub-region of the image displayed by the terminal via the touch screen or similar means and inputs "this is a dog" by voice; the key information of that sub-region is then this piece of voice information, and the terminal sends the voice information to the cloud.
As another example, the user directly inputs through the terminal the voice information "the red region is a dog" and "the green region is a cat"; the key information is then these two pieces of voice information, and the terminal sends the collected voice information to the cloud.
Step 207: annotate the image according to the voice information, and store the annotation result of the image.
It can be understood that this voice information is the voice information corresponding to the sub-regions in the image. The cloud can, through a speech recognition module, extract the keywords in the voice information based on speech recognition and establish a mapping table between keywords and sub-regions, for example <a1, t1>, <a2, t2>, <a3, t3>, .... The sub-regions are then annotated according to this mapping table and the annotation result is stored, for example <a1, t1> = <red region, "dog">; <a2, t2> = <green region, "cat">. Here each keyword corresponds to a sub-region, and each sub-region may contain one or more keywords; when a sub-region contains multiple keywords, the sub-region can be annotated with all of them, for example <a1, t1> = <red region, "dog" "Samoyed" "white">.
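The mapping table <a_i, t_i> just described can be pictured as follows. This fragment parses utterances of the assumed pattern "the <color> region is a <label>" and accumulates one or more keywords per sub-region; the regular expression and the color-to-region lookup are illustrative assumptions on top of what the embodiment states.

```python
# A sketch of building the <sub-region, keyword> mapping table of step 207.
import re
from typing import Dict, List

SUBREGIONS = {"red": "a1", "green": "a2"}  # sub-regions from step 203 (assumed)
UTTERANCE = re.compile(r"the (\w+) region is an? (\w+)")

def build_mapping(utterances: List[str]) -> Dict[str, List[str]]:
    """Map each sub-region to the keywords spoken for it."""
    table: Dict[str, List[str]] = {}
    for u in utterances:
        m = UTTERANCE.match(u.lower())
        if not m:
            continue  # utterance did not name a sub-region; ignored here
        color, keyword = m.groups()
        region = SUBREGIONS.get(color)
        if region:
            table.setdefault(region, []).append(keyword)
    return table

# One sub-region may accumulate several keywords, as in <a1, "dog" "Samoyed" "white">.
print(build_mapping(["The red region is a dog",
                     "The red region is a samoyed",
                     "The green region is a cat"]))
# {'a1': ['dog', 'samoyed'], 'a2': ['cat']}
```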
In this embodiment of the present application, for the case where an image contains multiple objects to be annotated, the image is first divided into sub-regions; then, based on the divided sub-regions, the voice information of each sub-region is obtained through interaction between the person and the terminal, and the voice information is sent to the cloud, which annotates the sub-regions in the image according to the voice information. This implementation can improve the accuracy of image annotation, and annotating after dividing into sub-regions improves the efficiency of image annotation.
It can be understood that during sub-region division in the cloud, image noise and other causes may make the division of the image's sub-regions erroneous, for example two objects to be annotated divided into one sub-region, a region containing no object to be annotated divided out as a sub-region, or some objects to be annotated not included in any sub-region. To avoid affecting the accuracy and completeness of image annotation, the following embodiment provides an image annotation method.
Please refer to FIG. 4, which is a schematic flowchart of an image annotation method provided by still another embodiment of the present application. The main difference between FIG. 4 and FIG. 3 is that, after the cloud sends the sub-region division result or the sub-region-divided image to the terminal, the method further includes:
Step 205: acquire the image after an adjustment operation has been performed at the terminal on the sub-region division result or on the sub-region-divided image.
In this embodiment of the present application, after the cloud sends the sub-region division result or the sub-region-divided image to the terminal, the terminal may perform an adjustment operation on the image to confirm that the sub-regions divided in the cloud are accurate and suitable. For example, the terminal may let the user fine-tune the position and size of a colored box via the touch screen to fit the object to be annotated within it, let the user delete redundant boxes in the image (for example a box containing no object to be annotated), and let the user add boxes missing from the image, and so on.
It should be noted that after the terminal performs the adjustment operation on the divided sub-regions, when voice information is collected based on the sub-regions, it is collected based on the sub-regions of the adjusted image, and the cloud annotates the adjusted image according to that voice information.
In this embodiment of the present application, the divided sub-regions are adjusted at the terminal, and the adjusted image is sent to the cloud; the cloud annotates the image's sub-regions according to the confirmed image and the voice information for the sub-regions of the confirmed image. This ensures the accuracy and completeness of the objects to be annotated in the image when they are annotated.
Based on the above embodiment, as an example, the image collected at the terminal contains multiple objects to be annotated; the image may be as shown in FIG. 5(a) and contains two objects to be annotated, a "dog" and a "cat". The objects to be annotated in the image are divided into sub-regions by the above region extraction algorithm, with the division result shown in FIG. 5(b) or FIG. 5(c). On the user terminal side it may be found that the sub-region division result for the objects to be annotated in FIG. 5(b) or FIG. 5(c) is incomplete or erroneous; in this case the end user can adjust the sub-region division result or the sub-region-divided image, with the adjusted image shown in FIG. 5(d). The terminal sends the adjusted image to the cloud together with the voice information associated with the sub-regions of the adjusted image, so that the cloud can annotate the sub-region-adjusted image according to the received voice information.
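The step-205 adjustment operations can likewise be sketched as edits to the cloud's proposed box list, as below. The (x, y, width, height) box format and the operation names resize/delete/add are assumptions made only for this illustration.

```python
# A sketch of the terminal-side adjustment of step 205: the user fine-tunes,
# deletes, or adds boxes before voice labels are attached to the sub-regions.
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)

def apply_adjustments(boxes: Dict[str, Box], ops: List[dict]) -> Dict[str, Box]:
    """Apply user adjustment operations to the proposed sub-region boxes."""
    boxes = dict(boxes)
    for op in ops:
        if op["op"] == "resize":      # fine-tune position/size of a box
            boxes[op["region"]] = op["box"]
        elif op["op"] == "delete":    # remove a box containing no object
            boxes.pop(op["region"], None)
        elif op["op"] == "add":       # add a box the cloud missed
            boxes[op["region"]] = op["box"]
    return boxes

proposed = {"a1": (10, 10, 40, 30), "a3": (0, 0, 5, 5)}      # a3 is spurious
adjusted = apply_adjustments(proposed, [
    {"op": "resize", "region": "a1", "box": (12, 8, 44, 34)},
    {"op": "delete", "region": "a3"},
    {"op": "add", "region": "a2", "box": (60, 40, 35, 30)},  # the missed cat
])
print(adjusted)  # {'a1': (12, 8, 44, 34), 'a2': (60, 40, 35, 30)}
```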
Please refer to FIG. 6, which is a schematic flowchart of an image annotation method provided by yet another embodiment of the present application. As shown in FIG. 6, the method includes:
Step 301: acquire an image collected at the terminal;
Step 302: automatically annotate the image through image recognition;
Step 303: after the image has been automatically annotated, display the result of the automatic annotation at the terminal;
Step 304: acquire voice information associated with the image;
Step 305: when the voice information indicates that the result of the automatic annotation is correct, store the result of the automatic annotation; and/or, when the voice information indicates that the result of the automatic annotation is incorrect, annotate the image according to the voice information.
The image annotation method of this embodiment of the present application can be completed automatically by the cloud, without needing to receive voice information collected on the terminal side.
Specifically, after acquiring the image, the cloud automatically annotates it based on an image recognition method. For example, the cloud first divides the received image into sub-regions and then automatically annotates each sub-region using an object recognition method; this includes annotating a single object to be annotated in the image as well as multiple objects to be annotated, thereby completing the annotation of the image. The cloud may use a region extraction algorithm to divide the image into sub-regions; for the specific process, refer to the description in the above embodiments.
The object recognition method is based on the field of computer vision and is mainly used to solve the problem of accurately detecting and recognizing objects; this includes selecting effective image feature points to reduce the influence of occlusion and image noise that appear during object recognition, achieving good object recognition accuracy, and so on.
It should be noted that in addition to recognizing objects in an image, the object recognition method can also recognize text, i.e., the text on an object can be recognized and used as a candidate annotation item for that object. For example, the word "milk" is recognized on a box, and the box's annotation items then include "milk".
Further, after the cloud automatically annotates the image based on the object recognition method, it may also send the annotation result of the image to the terminal for display there, where the end user can confirm whether any annotation result is wrong; if the automatic annotation result contains an error, the annotation result can be modified. For example, the automatic annotation result can be modified by voice, e.g., deleting the label "pig" corresponding to the red region via the touch screen and then generating the label "dog" for the red region with the voice input "this is a dog". Labels missing from the automatic annotation result can also be added by voice, e.g., selecting the object "cat" on the touch screen and then inputting the voice "this is a cat" to generate a new label; this can also be done by inputting text. Redundant labels in the automatic annotation result can likewise be deleted by voice, and so on.
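The confirm-or-correct logic of step 305 reduces to a small decision, sketched below: keep the automatic result when the user confirms it, otherwise apply the voice-driven corrections (such as deleting "pig" and adding "dog" for the red region). The correction record format is an assumption of this sketch.

```python
# A sketch of step 305: store the automatic annotation if confirmed,
# otherwise fix it according to the user's voice corrections.
from typing import Dict, List

def review_annotations(auto: Dict[str, List[str]],
                       confirmed: bool,
                       corrections: List[dict]) -> Dict[str, List[str]]:
    """Return the annotations to store after user review of auto-labeling."""
    result = {region: list(labels) for region, labels in auto.items()}
    if confirmed:
        return result                      # automatic result correct: keep as-is
    for c in corrections:                  # automatic result wrong: fix per voice
        labels = result.setdefault(c["region"], [])
        if c["op"] == "delete" and c["label"] in labels:
            labels.remove(c["label"])      # e.g. drop the wrong "pig"
        elif c["op"] == "add":
            labels.append(c["label"])      # e.g. add "dog" from "this is a dog"
    return result

auto = {"red": ["pig"], "green": ["cat"]}
fixed = review_annotations(auto, confirmed=False, corrections=[
    {"op": "delete", "region": "red", "label": "pig"},
    {"op": "add", "region": "red", "label": "dog"},
])
print(fixed)  # {'red': ['dog'], 'green': ['cat']}
```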
If the result of the automatic annotation is correct, the result of the automatic annotation is stored.
This embodiment of the present application provides an image annotation method in which the cloud automatically annotates the acquired image and the terminal is used to judge whether the result of the automatic annotation is correct; if the annotation is correct the annotation result is stored, and if there is an erroneous annotation the annotation result is adjusted according to the voice information. This implementation can not only shorten the time period of image annotation but also significantly improve the correctness of image annotation results and the accuracy of image recognition.
Please refer to FIG. 7, which is a schematic flowchart of an image annotation method provided by a further embodiment of the present application. As shown in FIG. 7, the method includes:
Step 401: acquire an image collected at the terminal;
Step 402: automatically annotate the image through image recognition;
Step 403: acquire voice information associated with the image;
Step 404: when the automatic annotation fails, annotate the image according to the voice information.
The image annotation method of this embodiment of the present application addresses the case where the cloud's automatic image annotation fails; in that case, the image is annotated again according to the acquired voice information.
For the process by which the cloud automatically annotates the image and the process of annotating the image again according to the voice information, refer to the descriptions in the above embodiments, which are not repeated here.
Whether the automatic annotation succeeded may be judged by the cloud, fed back by the terminal, or determined in other ways; this is not limited here.
The image annotation method provided by this embodiment of the present application can automatically annotate images in the cloud and, when the automatic annotation fails, annotate the image using the acquired voice information. This implementation can ensure that the image is successfully annotated, shortens the annotation time, and makes annotation more convenient.
It should be noted that the methods of the above embodiments may be performed independently by corresponding functional modules in the cloud server, or jointly by a system comprising functional modules in the cloud and functional modules in the terminal. When the annotation is performed by the cloud's functional modules alone, the acquisition in steps 101 and 102 may mean receiving the image and voice information sent by the terminal, and displaying the annotation result at the terminal may mean sending the annotation result to the terminal for display there. When the method is performed jointly by a system composed of the cloud and the terminal, the above acquisition may mean that the functional modules at the terminal invoke the terminal's hardware to collect images and voice, and the corresponding content is displayed at the terminal. It can be understood that either way achieves the purpose of the present application and accordingly falls within its scope of protection.
Please refer to FIG. 8, which is a schematic structural diagram of an image annotation apparatus provided by an embodiment of the present application. As shown in FIG. 8, the apparatus 40 includes a first acquiring module 41, a second acquiring module 42, and a first annotation module 43.
The first acquiring module 41 is configured to acquire an image collected at the terminal; the second acquiring module 42 is configured to acquire voice information associated with the image; and the first annotation module 43 is configured to annotate the image according to the voice information and store the annotation result of the image.
In this embodiment of the present application, the first acquiring module 41 and the second acquiring module 42 are each connected to the first annotation module 43, and the first annotation module 43 annotates the image according to the received image and voice information.
It is worth noting that, since the information interaction and execution processes between the modules in the above apparatus are based on the same concept as the method embodiments of the present application, their specific content can be found in the descriptions of the method embodiments and is not repeated here.
This embodiment of the present application provides an image annotation apparatus that uses intelligent tools in the cloud to analyze the acquired voice information, annotates the acquired image containing the object to be annotated according to the analysis result, and stores the annotation result in the cloud. The apparatus can annotate acquired images in real time, improving the efficiency of image annotation.
Please refer to FIG. 9, which is a schematic structural diagram of an image annotation apparatus provided by another embodiment of the present application. This embodiment of the present application is directed to an implementation in which the received image contains multiple objects to be annotated. As shown in FIG. 9, the apparatus 50 includes a first acquiring module 51, a first extraction module 52, a first division module 53, a first sending module 54, a second acquiring module 55, and a first annotation module 56.
The first acquiring module 51 is configured to acquire an image collected at the terminal; the first extraction module 52 is configured to extract region information of the objects to be annotated in the image using a region extraction algorithm; the first division module 53 is configured to divide the objects to be annotated in the image into sub-regions according to the region information; the first sending module 54 is configured to send the sub-region division result or the sub-region-divided image; the second acquiring module 55 is configured to acquire the voice information associated with the sub-regions in the image; and the first annotation module 56 is configured to annotate the image according to the voice information and store the annotation result of the image.
The first annotation module 56 includes an extraction unit 561 and an annotation unit 562. The extraction unit 561 is configured to extract keywords in the voice information based on speech recognition, the keywords corresponding to the sub-regions; the annotation unit 562 is configured to establish a mapping table between the keywords and the sub-regions, annotate the sub-regions according to the mapping table, and store the annotation result.
In some embodiments, the apparatus further includes a third acquiring module configured to acquire the image after an adjustment operation has been performed at the terminal on the sub-region division result or on the sub-region-divided image. In this case, the first annotation module 56 is specifically configured to annotate the adjusted image according to the voice information and store the annotation result of the image.
Please refer to FIG. 10, which is a schematic structural diagram of an image annotation apparatus provided by still another embodiment of the present application. As shown in FIG. 10, the apparatus 60 includes a first acquiring module 61, a second annotation module 62, a display module 63, a second acquiring module 64, and a first annotation module 65.
The first acquiring module 61 is configured to acquire an image collected at the terminal; the second annotation module 62 is configured to automatically annotate the image through image recognition; the display module 63 is configured to display the result of the automatic annotation at the terminal after the image has been automatically annotated; the second acquiring module 64 is configured to acquire the voice information associated with the image; and the first annotation module 65 is configured to store the result of the automatic annotation when the voice information indicates that the result is correct, and/or to annotate the image according to the voice information when the voice information indicates that the result of the automatic annotation is incorrect.
It is worth noting that, since the information interaction and execution processes between the modules in the above apparatus are based on the same concept as the method embodiments of the present application, their specific content can be found in the descriptions of the method embodiments and is not repeated here.
This embodiment of the present application provides an image annotation apparatus in which the cloud automatically annotates the acquired image and the terminal judges whether the result of the automatic annotation is correct; if the annotation is correct the annotation result is stored, and if there is an erroneous annotation the annotation result is adjusted according to the voice information. This implementation can not only shorten the time period of image annotation but also significantly improve the correctness of image annotation results and the accuracy of image recognition.
Please refer to FIG. 11, which is a schematic structural diagram of an image annotation apparatus provided by yet another embodiment of the present application. As shown in FIG. 11, the apparatus 70 includes a first acquiring module 71, a third annotation module 72, a second acquiring module 73, and a first annotation module 74.
The first acquiring module 71 is configured to acquire an image collected at the terminal; the third annotation module 72 is configured to automatically annotate the image through image recognition; the second acquiring module 73 is configured to acquire the voice information associated with the image; and the first annotation module 74 is configured to annotate the image according to the voice information when the automatic annotation fails.
It is worth noting that, since the information interaction and execution processes between the modules in the above apparatus are based on the same concept as the method embodiments of the present application, their specific content can be found in the descriptions of the method embodiments and is not repeated here.
The image annotation apparatus provided by this embodiment of the present application can automatically annotate images in the cloud and, when the automatic annotation fails, annotate the image using the acquired voice information. This implementation can ensure that the image is successfully annotated, shortens the annotation time, and makes annotation more convenient.
Please refer to FIG. 12, which is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application; the electronic device 80 is capable of performing the image annotation method described above. The electronic device may be a cloud server, or a system comprising a terminal and a cloud server.
As shown in FIG. 12, the electronic device 80 includes one or more processors 81 and a memory 82; one processor 81 is taken as an example in FIG. 12.
The processor 81 and the memory 82 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 12.
The memory 82, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the image annotation method in the embodiments of the present application (for example, the first acquiring module 41, the second acquiring module 42, and the first annotation module 43 shown in FIG. 8). By running the non-volatile software programs, instructions, and modules stored in the memory 82, the processor 81 executes the various functional applications and data processing of the server, i.e., implements the image annotation method of the above method embodiments.
The memory 82 may include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required for at least one function, and the data storage area may store data created according to the use of the image annotation apparatus, and so on. In addition, the memory 82 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 82 may include memory remotely located relative to the processor 81, and such remote memory may be connected to the image annotation apparatus via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 82 and, when executed by the one or more processors 81, perform the image annotation method in any of the above method embodiments.
An embodiment of the present invention provides a non-volatile computer-readable storage medium storing computer-executable instructions which, when executed by an electronic device, perform the image annotation method in any of the above method embodiments, for example, performing method steps 101 to 103 in FIG. 2, method steps 201 to 204 and steps 206 and 207 in FIG. 3, method steps 201 to 207 in FIG. 4, method steps 301 to 305 in FIG. 6, and method steps 401 to 404 in FIG. 7 described above, implementing the functions of modules 41-43 in FIG. 8, modules 51-56 and units 561-562 in FIG. 9, modules 61-65 in FIG. 10, and modules 71-74 in FIG. 11.
An embodiment of the present application also provides a computer program product, comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the image annotation method in any of the above method embodiments, for example, performing method steps 101 to 103 in FIG. 2, method steps 201 to 204 and steps 206 and 207 in FIG. 3, method steps 201 to 207 in FIG. 4, method steps 301 to 305 in FIG. 6, and method steps 401 to 404 in FIG. 7 described above, implementing the functions of modules 41-43 in FIG. 8, modules 51-56 and units 561-562 in FIG. 9, modules 61-65 in FIG. 10, and modules 71-74 in FIG. 11.
Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application. The computer software may be stored in a computer-readable storage medium; the program, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
The above is only an embodiment of the present invention and does not therefore limit the patent scope of the present invention. Any equivalent structural or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (15)

  1. An image annotation method, characterized by comprising:
    acquiring an image collected at a terminal;
    acquiring voice information associated with the image;
    annotating the image according to the voice information, and storing the annotation result of the image.
  2. The method according to claim 1, characterized in that the image contains multiple objects to be annotated, and before the acquiring of the voice information associated with the image, the method further comprises:
    extracting region information of the objects to be annotated in the image using a region extraction algorithm;
    dividing the objects to be annotated in the image into sub-regions according to the region information;
    sending the result of the sub-region division or the sub-region-divided image;
    the acquiring of the voice information associated with the image comprises: acquiring voice information associated with a sub-region in the image.
  3. The method according to claim 2, characterized in that after the sending of the result of the sub-region division or of the sub-region-divided image, the method further comprises:
    acquiring the image after an adjustment operation has been performed at the terminal on the result of the sub-region division or on the sub-region-divided image;
    the annotating of the image according to the voice information specifically comprises: annotating the adjusted image according to the voice information.
  4. The method according to any one of claims 1 to 3, wherein the annotating of the image according to the voice information and storing of the annotation result of the image comprises:
    extracting keywords in the voice information based on speech recognition, the keywords corresponding to the sub-regions;
    establishing a mapping table between the keywords and the sub-regions, annotating the sub-regions according to the mapping table, and storing the annotation result.
  5. The method according to claim 1, characterized in that before the acquiring of the voice information associated with the image, the method further comprises:
    automatically annotating the image through image recognition;
    after the image has been automatically annotated, displaying the result of the automatic annotation at the terminal;
    the annotating of the image according to the voice information comprises:
    when the voice information indicates that the result of the automatic annotation is correct, storing the result of the automatic annotation; and/or, when the voice information indicates that the result of the automatic annotation is incorrect, annotating the image according to the voice information.
  6. The method according to claim 1, characterized in that before the acquiring of the voice information associated with the image, the method further comprises:
    automatically annotating the image through image recognition;
    the annotating of the image according to the voice information comprises:
    when the automatic annotation fails, annotating the image according to the voice information.
  7. An image annotation apparatus, characterized by comprising:
    a first acquiring module, configured to acquire an image collected at a terminal;
    a second acquiring module, configured to acquire voice information associated with the image;
    a first annotation module, configured to annotate the image according to the voice information and store the annotation result of the image.
  8. The apparatus according to claim 7, characterized in that the image contains multiple objects to be annotated, and the apparatus further comprises:
    a first extraction module, configured to extract region information of the objects to be annotated in the image using a region extraction algorithm;
    a first division module, configured to divide the objects to be annotated in the image into sub-regions according to the region information;
    a first sending module, configured to send the result of the sub-region division or the sub-region-divided image;
    the second acquiring module being specifically configured to acquire voice information associated with a sub-region in the image.
  9. The apparatus according to claim 8, characterized in that the apparatus further comprises:
    a third acquiring module, configured to acquire the image after an adjustment operation has been performed at the terminal on the result of the sub-region division or on the sub-region-divided image;
    the first annotation module being specifically configured to annotate the adjusted image according to the voice information and store the annotation result of the image.
  10. The apparatus according to any one of claims 7 to 9, wherein the first annotation module comprises:
    an extraction unit, configured to extract keywords in the voice information based on speech recognition, the keywords corresponding to the sub-regions;
    an annotation unit, configured to establish a mapping table between the keywords and the sub-regions, annotate the sub-regions according to the mapping table, and store the annotation result.
  11. The apparatus according to claim 7, characterized in that the apparatus further comprises:
    a second annotation module, configured to automatically annotate the image through image recognition;
    a display module, configured to display the result of the automatic annotation at the terminal after the image has been automatically annotated;
    the first annotation module being specifically configured to store the result of the automatic annotation when the voice information indicates that the result of the automatic annotation is correct, and/or to annotate the image according to the voice information when the voice information indicates that the result of the automatic annotation is incorrect.
  12. The apparatus according to claim 7, characterized in that the apparatus further comprises:
    a third annotation module, configured to automatically annotate the image through image recognition;
    the first annotation module being specifically configured to annotate the image according to the voice information when the automatic annotation fails.
  13. An electronic device, characterized by comprising: at least one processor; and
    a memory communicably connected to the at least one processor; wherein
    the memory stores an instruction program executable by the at least one processor, the instruction program being executed by the at least one processor to cause the at least one processor to perform the method according to any one of claims 1 to 6.
  14. A computer program product, characterized in that the computer program product comprises: a non-volatile computer-readable storage medium and computer program instructions embedded in the non-volatile computer-readable storage medium; the computer program instructions comprise instructions for causing a processor to perform the method according to any one of claims 1 to 6.
  15. A non-volatile computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions for causing a computer to perform the method according to any one of claims 1 to 6.
PCT/CN2017/077253 2017-03-20 2017-03-20 Image annotation method, apparatus and electronic device WO2018170663A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2017/077253 WO2018170663A1 (zh) 2017-03-20 2017-03-20 Image annotation method, apparatus and electronic device
JP2019547989A JP6893606B2 (ja) 2017-03-20 2017-03-20 Image tagging method, apparatus and electronic device
CN201780000661.6A CN107223246B (zh) 2017-03-20 2017-03-20 Image annotation method, apparatus and electronic device
US16/576,234 US11321583B2 (en) 2017-03-20 2019-09-19 Image annotating method and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/077253 WO2018170663A1 (zh) 2017-03-20 2017-03-20 Image annotation method, apparatus and electronic device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/576,234 Continuation US11321583B2 (en) 2017-03-20 2019-09-19 Image annotating method and electronic device

Publications (1)

Publication Number Publication Date
WO2018170663A1 true WO2018170663A1 (zh) 2018-09-27

Family

ID=59953858

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/077253 WO2018170663A1 (zh) 2017-03-20 2017-03-20 图像标注方法、装置及电子设备

Country Status (4)

Country Link
US (1) US11321583B2 (zh)
JP (1) JP6893606B2 (zh)
CN (1) CN107223246B (zh)
WO (1) WO2018170663A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113035325A (zh) * 2019-12-25 2021-06-25 无锡祥生医疗科技股份有限公司 Ultrasound image annotation method, storage medium and ultrasound device

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704884B (zh) * 2017-10-16 2022-01-07 Oppo广东移动通信有限公司 Image tag processing method, image tag processing apparatus and electronic terminal
CN107835362B (zh) * 2017-10-30 2019-12-31 Oppo广东移动通信有限公司 Image storage method, image display method and electronic device
CN108124479A (zh) * 2017-12-29 2018-06-05 深圳前海达闼云端智能科技有限公司 Map annotation method and apparatus, cloud server, terminal and application program
CN110688509A (zh) * 2018-06-19 2020-01-14 新智数字科技有限公司 Sample data storage method and apparatus
CN109344757B (zh) * 2018-09-25 2021-05-28 北京旷视科技有限公司 Data collection method and apparatus, and electronic device
CN109379538B (zh) * 2018-10-26 2021-06-22 创新先进技术有限公司 Image capture device, system and method
CN109543661A (zh) * 2018-12-28 2019-03-29 北京隆恩智慧科技有限公司 Voice-assisted system and method for automatically acquiring point-of-interest information
CN110084289B (zh) * 2019-04-11 2021-07-27 北京百度网讯科技有限公司 Image annotation method and apparatus, electronic device and storage medium
JP2021086441A (ja) * 2019-11-28 2021-06-03 シャープ株式会社 Learning device, individual identification device, animal monitoring device, learning method, and control program
CN111143925B (zh) * 2019-12-18 2023-05-26 万翼科技有限公司 Drawing annotation method and related products
CN111104523A (zh) * 2019-12-20 2020-05-05 西南交通大学 Voice-assisted audiovisual collaborative learning robot and learning method
CN111355912A (zh) * 2020-02-17 2020-06-30 江苏济楚信息技术有限公司 Law-enforcement recording method and system
CN113449548A (zh) * 2020-03-24 2021-09-28 华为技术有限公司 Method and apparatus for updating an object recognition model
US11989890B2 (en) * 2020-03-31 2024-05-21 Hcl Technologies Limited Method and system for generating and labelling reference images
CN111914822B (zh) * 2020-07-23 2023-11-17 腾讯科技(深圳)有限公司 Text image annotation method and apparatus, computer-readable storage medium and device
US11869319B2 (en) * 2020-12-31 2024-01-09 Datalogic Usa, Inc. Fixed retail scanner with annotated video and related methods
CN115101057A (zh) * 2021-03-03 2022-09-23 Oppo广东移动通信有限公司 Voice annotation of images, method and apparatus for its use, electronic apparatus and storage medium
CN113642416A (zh) * 2021-07-20 2021-11-12 武汉光庭信息技术股份有限公司 Test cloud platform for AI annotation and AI annotation testing method
CN115131543A (zh) * 2022-06-28 2022-09-30 北京钢铁侠科技有限公司 Image annotation processing method and apparatus, and electronic device
CN118298427A (zh) * 2024-03-20 2024-07-05 广东奥普特科技股份有限公司 Image annotation method and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289468A (zh) * 2011-07-22 2011-12-21 北京航空航天大学 Method for acquiring and recording photo information in a camera
US20140108963A1 (en) * 2012-10-17 2014-04-17 Ponga Tools, Inc. System and method for managing tagged images
CN105095919A (zh) * 2015-09-08 2015-11-25 北京百度网讯科技有限公司 Image recognition method and apparatus
CN105094760A (zh) * 2014-04-28 2015-11-25 小米科技有限责任公司 Picture tagging method and apparatus
CN106095764A (zh) * 2016-03-31 2016-11-09 乐视控股(北京)有限公司 Dynamic picture processing method and system

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003069925A (ja) * 2001-08-24 2003-03-07 Fuji Photo Film Co Ltd Method, apparatus and program for inputting supplementary information
JP2004086124A (ja) * 2002-06-24 2004-03-18 Matsushita Electric Ind Co Ltd Metadata production apparatus and production method
JP2005065191A (ja) * 2003-08-20 2005-03-10 Ntt Comware Corp Automatic video metadata creation apparatus and automatic video metadata creation program
US20050192808A1 (en) * 2004-02-26 2005-09-01 Sharp Laboratories Of America, Inc. Use of speech recognition for identification and classification of images in a camera-equipped mobile handset
JP4659681B2 (ja) * 2005-06-13 2011-03-30 パナソニック株式会社 Content tagging support apparatus and content tagging support method
JP2007079416A (ja) * 2005-09-16 2007-03-29 Matsushita Electric Ind Co Ltd Image data creation apparatus, image data creation method and program
JP2009272816A (ja) * 2008-05-02 2009-11-19 Visionere Corp Server, information processing system and information processing method
JP5436559B2 (ja) * 2008-09-02 2014-03-05 エコール・ポリテクニーク・フェデラル・ドゥ・ローザンヌ(エーペーエフエル) Image annotation on portable devices
JP4930564B2 (ja) * 2009-09-24 2012-05-16 カシオ計算機株式会社 Image display apparatus and method, and program
US10706128B2 (en) * 2010-05-12 2020-07-07 Zipongo System and method for automated personalized and community-specific eating and activity planning, linked to tracking system with automated multimodal item identification and size estimation system
US8625887B2 (en) * 2011-07-13 2014-01-07 Google Inc. Systems and methods for matching visual object components
JP5611155B2 (ja) * 2011-09-01 2014-10-22 Kddi株式会社 Content tagging program, server and terminal
KR20140035713A (ko) * 2012-09-14 2014-03-24 한국전자통신연구원 Method and apparatus for authoring immersive media, and portable terminal device using the same
CN104504150B (zh) * 2015-01-09 2017-09-29 成都布林特信息技术有限公司 News public-opinion monitoring system
CN106033418B (zh) * 2015-03-10 2020-01-31 阿里巴巴集团控股有限公司 Voice adding and playing method and apparatus, and picture classification and retrieval method and apparatus
CN106210615A (zh) * 2015-04-30 2016-12-07 北京文安智能技术股份有限公司 Automatic urban management monitoring method, apparatus and system
CN106156799B (zh) * 2016-07-25 2021-05-07 北京光年无限科技有限公司 Object recognition method and apparatus for an intelligent robot

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289468A (zh) * 2011-07-22 2011-12-21 北京航空航天大学 Method for acquiring and recording photo information in a camera
US20140108963A1 (en) * 2012-10-17 2014-04-17 Ponga Tools, Inc. System and method for managing tagged images
CN105094760A (zh) * 2014-04-28 2015-11-25 小米科技有限责任公司 Picture tagging method and apparatus
CN105095919A (zh) * 2015-09-08 2015-11-25 北京百度网讯科技有限公司 Image recognition method and apparatus
CN106095764A (zh) * 2016-03-31 2016-11-09 乐视控股(北京)有限公司 Dynamic picture processing method and system

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113035325A (zh) * 2019-12-25 2021-06-25 无锡祥生医疗科技股份有限公司 Ultrasound image annotation method, storage medium and ultrasound device

Also Published As

Publication number Publication date
JP2020509504A (ja) 2020-03-26
US20200012888A1 (en) 2020-01-09
US11321583B2 (en) 2022-05-03
CN107223246B (zh) 2021-08-03
CN107223246A (zh) 2017-09-29
JP6893606B2 (ja) 2021-06-23

Similar Documents

Publication Publication Date Title
WO2018170663A1 (zh) Image annotation method, apparatus and electronic device
CN110085224B (zh) Intelligent terminal full-process voice control processing method, intelligent terminal and storage medium
CN107491174B (zh) Method, apparatus and system for remote assistance, and electronic device
KR101810578B1 (ko) Automatic media sharing via shutter click
CA3017647C (en) Optical character recognition in structured documents
CN113011403B (zh) Gesture recognition method, system, medium and device
CN111626126B (zh) Facial emotion recognition method, apparatus, medium and electronic device
CN108416003A (zh) Picture classification method and apparatus, terminal and storage medium
WO2022116545A1 (zh) Interaction method and apparatus based on multi-feature recognition, and computer device
KR102440198B1 (ko) Video search method and apparatus, computer device, and storage medium
CN107153838A (zh) Automatic photo grading method and apparatus
Shah et al. Efficient portable camera based text to speech converter for blind person
CN117671499A (zh) Deep-learning-based system for automatic flower grading and sorting and for pest and disease monitoring
CN112270297A (zh) Method and computer system for displaying recognition results
JP7502570B2 (ja) Liquor product positioning method, liquor product information management method, and apparatus, device and storage medium therefor
CN111629267B (zh) Audio annotation method, apparatus, device and computer-readable storage medium
CN108197563B (zh) Method and apparatus for acquiring information
CN118426581A (zh) Metaverse data processing method and system
CN106778449B (zh) Object recognition method for dynamic images and interactive video creation method with automatic capture of target images
CN115661548A (zh) Object recognition training method, apparatus and storage medium
CN115474040A (zh) Testing method and apparatus for a network video recording device, electronic device and storage medium
CN114202719A (zh) Video sample annotation method and apparatus, computer device and storage medium
CN111259182A (zh) Screenshot image search method and apparatus
CN206236111U (zh) Automatic plant identification device based on leaf images with voice interaction
CN117522644B (zh) Substation maintenance operation training system with autonomous editing and substation maintenance operation execution method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17901718

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019547989

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 25.11.2019)

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 13.01.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17901718

Country of ref document: EP

Kind code of ref document: A1

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载