WO2019076867A1 - Semantic segmentation of an object in an image - Google Patents
- Publication number
- WO2019076867A1 (PCT/EP2018/078192, EP2018078192W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image frame
- neuronal network
- predefined
- high priority
- convolutional neuronal
- Prior art date
Classifications
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06F18/24—Classification techniques
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
Abstract
The present invention relates to a method for semantic segmentation of an object (5, 8) in an image, comprising the following method steps: - consecutively acquiring image frames (6, 7, 11), - inputting a first image frame (6) of the consecutively acquired image frames (6, 7, 11) into a convolutional neuronal network in real time, - examining by the convolutional neuronal network whether any object (5, 8) can be detected in the first image frame (6), - semantically classifying the detected objects (5, 8) by the convolutional neuronal network by assigning each detected object (5, 8) to one of a list of predefined object classes, - providing a lookup-table with a priority list which comprises a priority level for each of the predefined object classes, respectively, - determining a respective priority level of the detected objects (5, 8) by comparison with the lookup-table, - determining one or more object(s) (5) which have a predefined priority level, - determining a high priority area (9) of the image frame (6) which relates to the or an object (5) with the predefined priority level, - inputting a next image frame (7) of the consecutively acquired image frames (6, 7, 11) into the convolutional neuronal network in real time, - analyzing only the high priority area (9) in the next image frame (7) by the convolutional neuronal network. In this way, the invention provides an efficient CNN architecture design that can be applied to an automotive camera (3) with a large field of view, taking advantage of that large field of view.
Description
Semantic Segmentation of an Object in an Image
The invention relates to a method for semantic segmentation of an object in an image, comprising the following method steps:
consecutively acquiring image frames,
inputting a first image frame of the consecutively acquired image frames into a convolutional neuronal network in real time, and
examining by the convolutional neuronal network whether any object can be detected in the first image frame for semantic segmentation.
One of the most fundamental problems in automotive computer vision is the semantic segmentation of objects in an image. Semantic segmentation refers to the problem of associating every pixel with its corresponding object class. In recent times, there has been a surge of convolutional neural network (CNN) research and design, aided by increases in computational power in computer architectures and the availability of large annotated datasets.
CNNs are highly successful at classification and categorization tasks, but much of the research is on standard photometric RGB images and is not focused on embedded automotive devices. Automotive hardware devices need to meet low power consumption requirements and thus offer only limited computational power.
In machine learning, a convolutional neural network is a class of deep, feed-forward artificial neural networks that has successfully been applied to analyzing visual imagery. CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.
CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design is a major advantage. CNNs have applications in image and video recognition, recommender systems and natural language processing.
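To make the preceding description concrete, the following minimal sketch shows a tiny fully convolutional network that emits one class score per pixel, which is the output form semantic segmentation requires. It is purely illustrative: the patent discloses no architecture or framework, so the use of PyTorch, the layer sizes and the four example classes are all assumptions.

```python
# Minimal fully convolutional network for per-pixel classification (illustrative
# only; the patent does not disclose a concrete architecture). Assumes PyTorch.
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                          # halve spatial resolution
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # a 1x1 convolution maps features to per-pixel class scores
        self.classifier = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.classifier(self.encoder(x))
        # upsample back to the input resolution: one class score per pixel
        return nn.functional.interpolate(scores, size=x.shape[2:],
                                         mode="bilinear", align_corners=False)

net = TinySegNet(num_classes=4)            # e.g. person, car, wall, tree
frame = torch.randn(1, 3, 240, 320)        # dummy RGB image frame
class_map = net(frame).argmax(dim=1)       # (1, 240, 320): a label per pixel
```

A production automotive network would of course be deeper and optimized for embedded hardware, in line with the low-power constraints noted above.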
In this respect, US 2017/0200063 A1 teaches applying a set of sections spanning a down-sampled version of an image of a road-scene to a low-fidelity classifier to determine a set of candidate sections for depicting one or more objects in a set of classes. The set of candidate sections of the down-sampled version may be mapped to a set of potential sectors in a high-fidelity version of the image. A high-fidelity classifier may be used to vet the set of potential sectors, determining the presence of one or more objects from the set of classes. The low-fidelity classifier may include a first convolutional neural network trained on a first training set of down-sampled versions of cropped images of objects in the set of classes. Similarly, the high-fidelity classifier may include a second CNN trained on a second training set of high-fidelity versions of cropped images of objects in the set of classes.
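The mechanics of such a two-stage cascade can be sketched as follows. This is a hedged reconstruction, not code from US 2017/0200063 A1: the classifier callables `low_fi` and `high_fi` and the fixed-grid sectioning are hypothetical simplifications.

```python
import numpy as np

def cascade_classify(image: np.ndarray, low_fi, high_fi,
                     scale: int = 4, section: int = 32):
    # Stage 1 runs a cheap classifier over sections of a down-sampled copy;
    # stage 2 vets only the flagged sectors at full resolution.
    # low_fi(patch) -> bool and high_fi(patch) -> label-or-None are assumed
    # callables, not APIs defined by the cited patent.
    small = image[::scale, ::scale]                   # crude down-sampling
    step = section // scale
    detections = []
    for y in range(0, small.shape[0] - step + 1, step):
        for x in range(0, small.shape[1] - step + 1, step):
            if not low_fi(small[y:y + step, x:x + step]):
                continue                              # cheap classifier says "empty"
            Y, X = y * scale, x * scale               # map section -> full-res sector
            label = high_fi(image[Y:Y + section, X:X + section])
            if label is not None:                     # expensive classifier confirms
                detections.append(((X, Y, section, section), label))
    return detections
```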
From US 2017/0099200 A1 it is known that data is received characterizing a request for agent computation of sensor data. The request includes a required confidence and required latency for completion of the agent computation. Agents to query are determined based on the required confidence. Data is transmitted to query the determined agents to provide analysis of the sensor data.
US 9,704,054 B1 describes that image classification and related imaging tasks performed using machine learning tools may be accelerated by using such tools to associate an image with a cluster of labels or categories, and then to select one of the labels or categories of the cluster as associated with the image. The clusters of labels or categories may comprise labels that are mutually confused for one another, e.g. two or more labels or categories that have been identified as associated with a single image. By defining clusters of labels or categories, and configuring a machine learning tool to associate an image with one of the clusters, processes for identifying labels or categories associated with images may be accelerated because computations associated with labels or categories not included in the cluster may be omitted.
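The acceleration described there amounts to restricting the expensive per-label computation to one cluster of mutually confusable labels. A minimal, assumed sketch follows; the cluster table, `coarse_model` and `fine_score` are hypothetical stand-ins, not APIs from US 9,704,054 B1.

```python
# Hedged sketch of cluster-based label selection. `coarse_model` maps an image
# to a cluster id; `fine_score` scores an image against a single label.
CLUSTERS = {0: ["wolf", "husky", "malamute"], 1: ["car", "van", "truck"]}

def classify(image, coarse_model, fine_score):
    cluster = CLUSTERS[coarse_model(image)]     # cluster of confusable labels
    # labels outside the cluster are never scored, which saves computation
    return max(cluster, key=lambda label: fine_score(image, label))
```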
It is an objective of the present invention to provide an efficient CNN architecture design that can be applied to an automotive camera with a large field of view while taking advantage of that large field of view.
This object is addressed by the subject matter of the independent claims. Preferred embodiments are described in the sub claims.
Therefore, the invention provides a method for semantic segmentation of an object in an image, comprising the following method steps:
consecutively acquiring image frames,
inputting a first image frame of the consecutively acquired image frames into a convolutional neuronal network in real time,
examining by the convolutional neuronal network whether any object can be detected in the first image frame,
semantically classifying the detected objects by the convolutional neuronal network by assigning each detected object to one of a list of predefined object classes,
providing a lookup-table with a priority list which comprises a priority level for each of the predefined object classes, respectively,
determining a respective priority level of the detected objects by comparison with the lookup-table,
determining one or more object(s) which have a predefined priority level,
determining a high priority area of the image frame which relates to the or an object with the predefined priority level,
inputting a next image frame of the consecutively acquired image frames into the convolutional neuronal network in real time,
analyzing only the high priority area in the next image frame by the convolutional neuronal network.
Thus, it is an essential idea of the invention that, instead of regularly processing whole images, only a section of the image may be processed with higher resolution for semantic segmentation of objects in the image. Especially, instead of always analyzing the complete image, a high priority area of the image is determined in a first image frame based on the priority levels of the objects detected in the image. Then, in a next image frame, only the high priority area of the image is processed, which makes the method considerably more efficient. Preferably, the priority levels of the different object classes are defined based on an order of safety, e.g. objects belonging to the object class "person" might be more important than objects belonging to the object class "curbside".
Preferably, at the beginning of this method, the high priority area would be defined by the object(s) with the highest priority level, i.e. the predefined priority level would be the highest priority level. If these objects have been classified in a trustworthy way, areas of the image with objects having lower priority levels may be processed.
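A minimal sketch of this control flow is given below. Everything in it is an assumption layered on the patent's prose: `detect_and_classify` stands in for the CNN's combined detection and semantic classification step, the lookup-table is the example table used later in the description, and a "high priority area" is realized as the union of the bounding boxes of the top-priority objects. Coordinate offsets between a cropped region and the full frame are omitted for brevity.

```python
# Illustrative control flow of the claimed method; detect_and_classify is a
# hypothetical stand-in returning [((x0, y0, x1, y1), object_class), ...].
PRIORITY = {"person": 1, "car": 2, "wall": 3, "tree": 4}  # lower = more urgent

def union_box(boxes):
    # smallest axis-aligned box enclosing all (x0, y0, x1, y1) boxes
    xs0, ys0, xs1, ys1 = zip(*boxes)
    return min(xs0), min(ys0), max(xs1), max(ys1)

def crop(frame, box):
    x0, y0, x1, y1 = box
    return frame[y0:y1, x0:x1]

def process_stream(frames, detect_and_classify):
    roi = None                                 # current high priority area
    for frame in frames:                       # consecutively acquired frames
        view = frame if roi is None else crop(frame, roi)
        objects = detect_and_classify(view)
        if not objects:
            roi = None                         # nothing found: analyze full frames again
            continue
        # predefined priority level = highest priority among detected classes
        top = min(PRIORITY[cls] for _, cls in objects)
        roi = union_box([box for box, cls in objects if PRIORITY[cls] == top])
        yield objects, roi
```

Resetting `roi` when nothing is detected corresponds to falling back to analyzing the complete next frame.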
The step of analyzing only the high priority area in the next image frame by the convolutional neuronal network may be performed in different ways as set out in the following. According to a preferred embodiment of the invention, analyzing only the high priority area in the next image frame by the convolutional neuronal network is performed by
examining by the convolutional neuronal network whether any object can be detected in the high priority area,
semantically classifying the detected objects by the convolutional neuronal network by assigning each detected object to one of the list of predefined object classes,
determining a respective priority of the detected objects by comparison with the lookup-table,
determining the one or more object(s) with the predefined priority level,
determining a new high priority area of the image frame which relates to the or an object with the predefined priority level,
inputting a next image frame of the consecutively acquired image frames into the convolutional neuronal network in real time, and
analyzing only the new high priority area in the next image frame by the convolutional neuronal network.
Preferably, the step of analyzing only the high priority area in the next image frame by the convolutional neuronal network by
examining by the convolutional neuronal network whether any object can be detected in the high priority area,
semantically classifying the detected objects by the convolutional neuronal network by assigning each detected object to one of the list of predefined object classes,
determining a respective priority of the detected objects by comparison with the lookup-table,
determining the one or more object(s) with the predefined priority level,
determining a new high priority area of the image frame which relates to the or an object with the predefined priority level,
inputting a next image frame of the consecutively acquired image frames into the convolutional neuronal network in real time, and
analyzing only the new high priority area in the next image frame by the convolutional neuronal network, is repeated at least once.
In this way, a high priority area with objects which should be classified may be defined in a multi-step process. However, according to another preferred embodiment of the invention, such classification may also be performed directly after the first definition of the high priority area. Therefore, according to a preferred embodiment of the invention analyzing only the high priority area in the next image frame by the convolutional neuronal network is performed by semantically classifying the object by assigning the object to one of the list of predefined object classes. In this respect, preferably the following step is performed:
accepting the object class the object has been assigned to when analyzing only the high priority area in the next image frame as a trustworthy object class. If such a trustworthy classification of objects with a certain priority level has been achieved, preferably areas with objects of the next lower priority level are processed.
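The patent leaves open how a classification comes to be trustworthy. One plausible, purely assumed realization thresholds the network's confidence and only then descends to the next lower priority level:

```python
CONFIDENCE_THRESHOLD = 0.9   # assumed value; the patent names no threshold

def next_priority_level(results, current_level):
    # results: [(object_class, confidence, priority_level), ...]
    pending = [r for r in results if r[2] == current_level]
    if pending and all(conf >= CONFIDENCE_THRESHOLD for _, conf, _ in pending):
        return current_level + 1   # all accepted: process next lower priority
    return current_level           # keep refining the current level
```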
In general, inputting a next image frame of the consecutively acquired image frames into the convolutional neuronal network in real time may be performed by inputting the complete image frame. However, according to a preferred embodiment of the invention, inputting a next image frame of the consecutively acquired image frames into the convolutional neuronal network in real time is performed by inputting only the high priority area of the next image frame into the convolutional neuronal network.
Further, according to a preferred embodiment of the invention, the step of consecutively acquiring image frames is performed by a camera with a field of view of more than 150°, yielding respective image frames covering an image angle of more than 150°. More preferably, the camera has a field of view of more than 180°, yielding respective image frames covering an image angle of more than 180°. In this way, a large field of view may be monitored while the mere amount of pixels of the images acquired by such a camera does not slow down processing speed appreciably, since not the complete images have to be processed for all image frames.
The invention also relates to the use of a method as described above in an automotive vehicle.
The invention further relates to a sensor arrangement for an automotive vehicle configured for performing a method as described above.
The invention also relates to a non-transitory computer-readable medium, comprising instructions stored thereon, that when executed on a processor, induce a sensor arrangement of an automotive vehicle to perform a method as described above.
These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter. Individual features disclosed in the embodiments can constitute, alone or in combination, an aspect of the present invention. Features of the different embodiments can be carried over from one embodiment to another embodiment.
In the drawings:
Fig. 1 schematically depicts a vehicle with a sensor arrangement according to a preferred embodiment of the invention,
Figs. 2a, b schematically depict the processing of image frames according to a
preferred embodiment of the invention, and
Figs. 3a - d schematically depict a further aspect of the processing of image frames according to a preferred embodiment of the invention.
Fig. 1 schematically depicts an automotive vehicle 1 with a sensor arrangement 2 which is comprised of a camera 3 and an evaluation unit 4. The sensor arrangement 2 is adapted for semantic segmentation of images of objects 5 captured by camera 3. The evaluation unit 4 may be part of an advanced driver-assistance system for helping a
driver of the automotive vehicle 1 in the driving process. The camera 3 is a large field-of- view camera 3 and may have a viewing angle which is larger than 180°.
The method performed by the sensor arrangement 2 according to the preferred embodiment of the invention is as described in the following:
The camera 3 consecutively acquires image frames. The frequency of acquiring image frames may be as high as 30 frames/second. However, for effectively processing the image frames, a processing frequency of 5 frames/second has been shown to be sufficient. For processing the image frames, a first image frame 6 of the consecutively acquired image frames is input into a convolutional neural network in real time. The convolutional neural network is provided in the evaluation unit 4, to which the image frames of the camera 3 are transmitted.
In the convolutional neural network it is examined whether any object 5 which is not part of the ground area the automotive vehicle 1 is driving on can be detected in the first image frame 6. If such objects 5 can be detected in the first image frame 6, these objects are semantically classified by the convolutional neural network by assigning each detected object to one of a list of predefined object classes.
According to the preferred embodiment described here, these object classes may be "person", "car", "wall", "tree", etc. Such semantic classification of objects by a convolutional neural network is well-known to the person skilled in the art and does not require any further explanation here.
However, differently from conventional methods, according to the preferred embodiment of the invention, a lookup-table with a priority list which comprises a priority level for each of the predefined object classes, respectively, is provided. In the present case, this priority list looks as follows:

| Object class | Priority level |
|---|---|
| person | 1 |
| car | 2 |
| wall | 3 |
| tree | 4 |
This priority list may have further object classes which are related to respective priorities. A respective priority level is determined for each object which has been detected in the first image frame 6 by comparison with the lookup table.
A respective image frame 6 can be seen from Fig. 2a. In this image frame 6 two persons are detected as one object 5, and further a wall is detected as another object 8. Since the object class "person'' has a higher priority than the object class "wall" a high priority area 9 is determined which relates to the object 5 which belong to the object class "person ".
Then, a next image frame 7 of the consecutively acquired image frames is input into the convolutional neural network in real time, wherein only the high priority area 9 in the next image frame 7 is analyzed by the convolutional neural network. This is shown in Fig. 2b, in which the image frame 7 which is processed by the convolutional neural network for semantic segmentation of the objects 5 relates to the high priority area determined in the previous method step in image frame 6. In this way, the objects 5 can be processed at much higher resolution, which makes semantic segmentation of the objects 5, i.e.
assigning the objects 5 to one of the list of predefined object classes, easier and, thus, more trustworthy.
However, according to a preferred embodiment of the invention, a high priority area with objects which should be classified may also be defined in a multi-step process as described in the following with respect to Figs. 3a to d.
In Fig. 3a it is shown that a high priority area 9 is defined which comprises two objects 5, 8 which belong to different object classes, i.e. "person" and "wall". Instead of directly focusing on object 5, which is the object with the higher priority, a high priority area 9 is defined which comprises both objects 5, 8, which then, in the next image frame 7 shown in Fig. 3b, can be analyzed with higher resolution.
This analysis with higher resolution makes it possible to clearly distinguish between the two objects 5, 8, and to define a new high priority area 10 which only relates to the object 5 which belongs to the object class with the highest priority, i.e. "person", as shown in Fig. 3c.
Then, in a further image frame 11 shown in Fig. 3d, only this new high priority area 10 is examined, i.e. semantic segmentation is only performed for object 5 in order to verify that the object 5 detected here does actually belong to the object class "person".
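Under the same assumptions as the earlier sketch (and reusing its `union_box` and `crop` helpers, with coordinate offsets between crops and full frames again omitted), the multi-step refinement of Figs. 3a to 3d could look like this:

```python
def refine_multistep(frames, detect_and_classify, priority):
    # Fig. 3a: area 9 is first drawn around *both* objects 5 and 8
    objs = detect_and_classify(next(frames))
    roi = union_box([box for box, _ in objs])
    # Fig. 3b: the cropped area is re-analyzed at higher resolution
    objs = detect_and_classify(crop(next(frames), roi))
    # Fig. 3c: new area 10 keeps only the top-priority object ("person")
    top = min(priority[cls] for _, cls in objs)
    roi = union_box([box for box, cls in objs if priority[cls] == top])
    # Fig. 3d: the further frame 11 is examined only inside area 10,
    # verifying that object 5 indeed belongs to the class "person"
    return detect_and_classify(crop(next(frames), roi))
```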
Reference signs list
1 automotive vehicle
2 sensor arrangement
3 camera
4 evaluation unit
5 persons
6 first image frame
7 next image frame
8 wall
9 high priority area
10 new high priority area
11 further next image frame
Claims
1. Method for semantic segmentation of an object (5, 8) in an image, comprising the following method steps:
consecutively acquiring image frames (6, 7, 11),
inputting a first image frame (6) of the consecutively acquired image frames (6, 7, 11) into a convolutional neuronal network in real time,
examining by the convolutional neuronal network whether any object (5, 8) can be detected in the first image frame (6),
semantically classifying the detected objects (5, 8) by the convolutional neuronal network by assigning each detected object (5, 8) to one of a list of predefined object classes,
providing a lookup-table with a priority list which comprises a priority level for each of the predefined object classes, respectively,
determining a respective priority level of the detected objects (5, 8) by comparison with the lookup-table,
determining one or more object(s) (5) which have a predefined priority level,
determining a high priority area (9) of the image frame (6) which relates to the or an object (5) with the predefined priority level,
inputting a next image frame (7) of the consecutively acquired image frames (6, 7, 11) into the convolutional neuronal network in real time,
analyzing only the high priority area (9) in the next image frame (7) by the convolutional neuronal network.
2. Method according to claim 1, wherein analyzing only the high priority area (9) in the next image frame (7) by the convolutional neuronal network is performed by
examining by the convolutional neuronal network whether any object (5, 8) can be detected in the high priority area (9),
semantically classifying the detected objects (5, 8) by the convolutional neuronal network by assigning each detected object (5, 8) to one of the list of predefined object classes,
determining a respective priority of the detected objects (5, 8) by comparison with the lookup-table,
determining the one or more object(s) (5) with the predefined priority level,
determining a new high priority area (10) of the image frame which relates to the or an object (5) with the predefined priority level,
inputting a next image frame (11) of the consecutively acquired image frames (6, 7, 11) into the convolutional neuronal network in real time, and
analyzing only the new high priority area (10) in the next image frame (11) by the convolutional neuronal network.
3. Method according to claim 2, by repeating at least once the step of analyzing only the high priority area in a further next image frame by the convolutional neuronal network by
examining by the convolutional neuronal network whether any object (5, 8) can be detected in the high priority area,
semantically classifying the detected objects (5, 8) by the convolutional neuronal network by assigning each detected object to one of the list of predefined object classes,
determining a respective priority of the detected objects (5, 8) by comparison with the lookup-table,
determining the one or more object(s) (5) with the predefined priority level,
determining a new high priority area of the image frame which relates to the or an object (5) with the predefined priority level,
inputting a further next image frame of the consecutively acquired image frames into the convolutional neuronal network in real time, and
analyzing only the new high priority area in the next image frame by the
convolutional neuronal network.
4. Method according to any of claims 1 to 3, wherein analyzing only the high priority area (9, 10) in the next image frame (7, 11) by the convolutional neuronal network is performed by semantically classifying the object (5, 8) by assigning the object (5, 8) to one of the list of predefined object classes.
5. Method according to claim 4 comprising the following method step:
accepting the object class the object (5, 8) has been assigned to when analyzing only the high priority area in the next image frame (7, 11) as a trustworthy object class.
6. Method according to any of the previous claims, wherein inputting a next image frame (7, 11) of the consecutively acquired image frames (6, 7, 11) into the convolutional neuronal network in real time is performed by inputting only the high priority area (9, 10) of the next image frame (7, 11) into the convolutional neuronal network.
7. Method according to any of the previous claims, wherein the step of consecutively acquiring image frames (6, 7, 11) is performed by a camera (3) with a field of view of more than 150°, yielding respective image frames (6, 7, 11) covering an image angle of more than 150°.
8. Use of the method according to any of the previous claims in an automotive vehicle (1).
9. Sensor arrangement (2) for an automotive vehicle (1) configured for performing the method according to any of claims 1 to 8.
10. Non-transitory computer-readable medium, comprising instructions stored thereon, that when executed on a processor, induce a sensor arrangement (2) of an automotive vehicle (1) to perform the method of any of claims 1 to 8.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE102017124600.2 | 2017-10-20 | ||
DE102017124600.2A DE102017124600A1 (en) | 2017-10-20 | 2017-10-20 | Semantic segmentation of an object in an image |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019076867A1 true WO2019076867A1 (en) | 2019-04-25 |
Family
ID=63896158
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2018/078192 WO2019076867A1 (en) | 2017-10-20 | 2018-10-16 | Semantic segmentation of an object in an image |
Country Status (2)
Country | Link |
---|---|
DE (1) | DE102017124600A1 (en) |
WO (1) | WO2019076867A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113392837A (en) * | 2021-07-09 | 2021-09-14 | 超级视线科技有限公司 | License plate recognition method and device based on deep learning |
GB2607420A (en) * | 2021-04-06 | 2022-12-07 | Canon Kk | Image processing apparatus and method for controlling the same |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102021003439A1 (en) | 2021-07-02 | 2021-08-19 | Daimler Ag | Method for drawing attention to at least one occupant in a vehicle |
DE102021004931A1 (en) | 2021-10-01 | 2021-12-09 | Daimler Ag | Method for processing environmental data in a vehicle |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170099200A1 (en) | 2015-10-06 | 2017-04-06 | Evolv Technologies, Inc. | Platform for Gathering Real-Time Analysis |
US9704054B1 (en) | 2015-09-30 | 2017-07-11 | Amazon Technologies, Inc. | Cluster-trained machine learning for image processing |
US20170200063A1 (en) | 2016-01-13 | 2017-07-13 | Ford Global Technologies, Llc | Low- and high-fidelity classifiers applied to road-scene images |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6400831B2 (en) * | 1998-04-02 | 2002-06-04 | Microsoft Corporation | Semantic video object segmentation and tracking |
US6697502B2 (en) * | 2000-12-14 | 2004-02-24 | Eastman Kodak Company | Image processing method for detecting human figures in a digital image |
US9607224B2 (en) * | 2015-05-14 | 2017-03-28 | Google Inc. | Entity based temporal segmentation of video streams |
- 2017-10-20: DE application DE102017124600.2A filed (patent DE102017124600A1, active, pending)
- 2018-10-16: PCT application PCT/EP2018/078192 filed (WO2019076867A1, active, application filing)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9704054B1 (en) | 2015-09-30 | 2017-07-11 | Amazon Technologies, Inc. | Cluster-trained machine learning for image processing |
US20170099200A1 (en) | 2015-10-06 | 2017-04-06 | Evolv Technologies, Inc. | Platform for Gathering Real-Time Analysis |
US20170200063A1 (en) | 2016-01-13 | 2017-07-13 | Ford Global Technologies, Llc | Low- and high-fidelity classifiers applied to road-scene images |
Non-Patent Citations (1)
Title |
---|
SERGI CAELLES ET AL: "Semantically-Guided Video Object Segmentation", 6 April 2017 (2017-04-06), XP055543131, Retrieved from the Internet <URL:https://arxiv.org/pdf/1704.01926v1.pdf> [retrieved on 20190116] * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2607420A (en) * | 2021-04-06 | 2022-12-07 | Canon Kk | Image processing apparatus and method for controlling the same |
GB2607420B (en) * | 2021-04-06 | 2024-08-21 | Canon Kk | Image processing apparatus and method for controlling the same |
GB2629706A (en) * | 2021-04-06 | 2024-11-06 | Canon Kk | Image processing apparatus and method for controlling the same |
CN113392837A (en) * | 2021-07-09 | 2021-09-14 | 超级视线科技有限公司 | License plate recognition method and device based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
DE102017124600A1 (en) | 2019-04-25 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18789091; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18789091; Country of ref document: EP; Kind code of ref document: A1 |