WO2025185158A1

WO2025185158A1 - Target detection method based on three-dimensional data, device, and medium

Info

Publication number: WO2025185158A1
Application number: PCT/CN2024/124474
Authority: WO
Inventors: 甘俊英; 陈真; 庄圳鑫; 许文超; 陈汉添; 刘建强; 熊俊玲; 黎慧聪
Original assignee: Wuyi University Fujian
Current assignee: Wuyi University Fujian
Priority date: 2024-03-07
Filing date: 2024-10-12
Publication date: 2025-09-12
Anticipated expiration: 2026-09-07
Also published as: WO2025185158A8; CN118351287A

Abstract

The present application provides a target detection method based on three-dimensional data, a device, and a medium. The method comprises: constructing a three-dimensional model of a target to be detected; acquiring target images of the three-dimensional model under different rendering parameters; inputting noise and a real image of said target into a generator to generate a first image, and inputting the first image and the target images into a discriminator for training to obtain a trained detection model; and inputting an image under detection into the trained detection model for detection to obtain a target detection result. Virtual cameras are adjusted to perform multi-angle rendering on a target three-dimensional model, thereby ensuring that a target image database has balanced and sufficient data, solving the problem of data imbalance, and improving the detection effect of the detection model.

Description

Target detection method, device and medium based on three-dimensional data

Technical Field

本申请实施例涉及图像识别领域，尤其涉及基于三维数据的目标检测方法、设备及介质。The embodiments of the present application relate to the field of image recognition, and in particular to target detection methods, devices, and media based on three-dimensional data.

Background Art

在计算机视觉领域中，目标检测是指识别图像或视频中的特定物体的类型，并确定它们的位置。但是，目标检测模型的训练依赖于丰富的训练数据；由于取证拍摄困难，诸多领域存在训练图像数据不足的问题。当训练图像数据不足的时候，会导致模型训练度差，进而导致使用训练后的模型进行目标检测的检测效果差。In the field of computer vision, object detection refers to identifying the types of specific objects in images or videos and determining their locations. However, training object detection models relies on abundant training data. Due to the difficulties in obtaining forensic images, many fields face a shortage of training image data. Insufficient training image data leads to poor model training, which in turn results in poor object detection performance using the trained model.

发明内容Summary of the Invention

以下是对本文详细描述的主题的概述。本概述并非是为了限制权利要求的保护范围。The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.

本申请的目的在于至少一定程度上解决相关技术中存在的技术问题之一，本申请实施例提供了基于三维数据的目标检测方法、设备及介质，能够保证平衡且充分的训练数据。The purpose of this application is to solve one of the technical problems existing in the related art to at least a certain extent. The embodiments of this application provide a target detection method, device and medium based on three-dimensional data, which can ensure balanced and sufficient training data.

本申请的第一方面的实施例，一种基于三维数据的目标检测方法，包括：An embodiment of the first aspect of the present application is a method for object detection based on three-dimensional data, comprising:

构建待检测目标的三维模型；Construct a three-dimensional model of the target to be detected;

获取待检测目标的三维模型在不同的渲染参数下的目标图像；Obtain target images of the three-dimensional model of the target to be detected under different rendering parameters;

将噪声和待检测目标的真实图像输入至检测模型的生成器生成第一图像，将所述第一图像和目标图像输入至检测模型的判别器进行训练，得到训练后的检测模型；Inputting noise and a real image of a target to be detected into a generator of a detection model to generate a first image, inputting the first image and the target image into a discriminator of the detection model for training, and obtaining a trained detection model;

将待检测图像输入至训练后的检测模型进行检测，得到目标检测结果。The image to be detected is input into the trained detection model for detection to obtain the target detection result.

根据本申请的第一方面的某些实施例，所述获取待检测目标的三维模型在不同的渲染参数下的目标图像，包括：According to certain embodiments of the first aspect of the present application, obtaining target images of the three-dimensional model of the target to be detected under different rendering parameters includes:

设置渲染参数；Set rendering parameters;

设置虚拟相机群，所述虚拟相机群的多个虚拟相机围绕所述三维模型部署；Setting a virtual camera group, wherein a plurality of virtual cameras of the virtual camera group are deployed around the three-dimensional model;

通过所述虚拟相机群获取待检测目标的三维模型在渲染参数下的目标图像。The target image of the three-dimensional model of the target to be detected under the rendering parameters is obtained through the virtual camera group.

根据本申请的第一方面的某些实施例，所述通过所述虚拟相机群获取待检测目标的三维模型在渲染参数下的目标图像，包括：According to certain embodiments of the first aspect of the present application, obtaining a target image of the three-dimensional model of the target to be detected under rendering parameters by the virtual camera group includes:

执行以下步骤，直至虚拟相机群的旋转角度达到角度阈值或者目标图像的数量达到数量阈值：使所述虚拟相机群旋转预设单位角度，通过旋转后的虚拟相机群获取待检测目标的三维模型在渲染参数下的目标图像。The following steps are performed until the rotation angle of the virtual camera group reaches an angle threshold or the number of target images reaches a number threshold: the virtual camera group is rotated by a preset unit angle, and a target image of the three-dimensional model of the target to be detected under rendering parameters is obtained through the rotated virtual camera group.

根据本申请的第一方面的某些实施例，所述虚拟相机的坐标为(x_c，y_c，z_c)；其中，z_c＝rcosθ＝0；r为虚拟相机到三维模型的距离，θ为预设参数，为预设参数。According to some embodiments of the first aspect of the present application, the coordinates of the virtual camera are (x _c , y _c , z _c ); wherein, z _c = rcosθ = 0; r is the distance from the virtual camera to the 3D model, θ is the preset parameter, are preset parameters.

根据本申请的第一方面的某些实施例，当所述待检测目标与真实图像的大小比例小于第一比例阈值，增加虚拟相机到三维模型的距离。According to certain embodiments of the first aspect of the present application, when the size ratio of the target to be detected to the real image is less than a first ratio threshold, the distance from the virtual camera to the three-dimensional model is increased.

根据本申请的第一方面的某些实施例，所述获取待检测目标的三维模型在不同的渲染参数下的目标图像，包括： According to certain embodiments of the first aspect of the present application, obtaining target images of the three-dimensional model of the target to be detected under different rendering parameters includes:

当所述待检测目标与真实图像的大小比例大于第二比例阈值，将所述三维模型进行分割得到分割单元，获取所述分割单元在不同的渲染参数下的目标图像。When the size ratio of the target to be detected to the real image is greater than a second ratio threshold, the three-dimensional model is segmented to obtain segmentation units, and target images of the segmentation units under different rendering parameters are obtained.

根据本申请的第一方面的某些实施例，所述将所述第一图像和目标图像输入至检测模型的判别器进行训练，包括：According to certain embodiments of the first aspect of the present application, inputting the first image and the target image into a discriminator of a detection model for training includes:

当所述判别器将所述目标图像判定为真实图像，去除所述目标图像中的纯色背景并标记目标图像中的目标的轮廓线，得到标记图像；When the discriminator determines that the target image is a real image, the pure color background in the target image is removed and the contour line of the target in the target image is marked to obtain a marked image;

将所述标记图像的格式转换为真实图像的格式；Converting the format of the labeled image to the format of the real image;

将转换后的标记图像与所述真实图像叠加，得到叠加图像。The converted labeled image is superimposed on the real image to obtain a superimposed image.

根据本申请的第一方面的某些实施例，所述检测模型的损失函数为：L_total＝L_D+L_G，L_D＝-log(D(x))-log(1-D(G(z)))，L_G＝-log(D(G(z)))；其中，L_total为检测模型的损失函数，L_D为判别器的损失函数，L_G为生成器的损失函数，D为判别器，G为生成器，x为目标图像，z为噪声。According to certain embodiments of the first aspect of the present application, the loss function of the detection model is: L _total = _LD + _LG , _LD = -log(D(x))-log(1-D(G(z))), _LG = -log(D(G(z))); wherein, L _total is the loss function of the detection model, _LD is the loss function of the discriminator, _LG is the loss function of the generator, D is the discriminator, G is the generator, x is the target image, and z is the noise.

本申请的第二方面的实施例，一种电子设备，包括：存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，所述处理器执行所述计算机程序时实现如上所述的基于三维数据的目标检测方法。An embodiment of the second aspect of the present application is an electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the target detection method based on three-dimensional data as described above is implemented.

本申请的第三方面的实施例，一种计算机存储介质，存储有计算机可执行指令，所述计算机可执行指令用于执行如上所述的基于三维数据的目标检测方法。An embodiment of the third aspect of the present application is a computer storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to execute the target detection method based on three-dimensional data as described above.

上述方案至少具有以下的有益效果：通过调整三维渲染引擎的虚拟相机，对目标三维模型进行无死角的摄像渲染，可得到待检测目标的三维模型的任意角度任意距离的图像，保证目标图像数据库具有平衡且充分的数据，解决数据不平衡问题，有利于后续的目标检测训练，提升检测模型的检测效果。The above scheme has at least the following beneficial effects: by adjusting the virtual camera of the 3D rendering engine, the target 3D model can be rendered without blind spots, and images of the 3D model of the target to be detected at any angle and any distance can be obtained, ensuring that the target image database has balanced and sufficient data, solving the data imbalance problem, which is beneficial to subsequent target detection training and improving the detection effect of the detection model.

BRIEF DESCRIPTION OF THE DRAWINGS

附图用来提供对本申请技术方案的进一步理解，并且构成说明书的一部分，与本申请的实施例一起用于解释本申请的技术方案，并不构成对本申请技术方案的限制。The accompanying drawings are used to provide a further understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments of the present application, they are used to explain the technical solution of the present application and do not constitute a limitation on the technical solution of the present application.

图1是基于三维数据的目标检测方法的步骤图；FIG1 is a step diagram of a target detection method based on three-dimensional data;

图2是步骤S200的子步骤图；FIG2 is a sub-step diagram of step S200;

图3是虚拟相机群与三维模型的位置关系图。FIG3 is a diagram showing the positional relationship between the virtual camera group and the three-dimensional model.

DETAILED DESCRIPTION

为了使本申请的目的、技术方案及优点更加清楚明白，以下结合附图及实施例，对本申请进行进一步详细说明。应当理解，此处所描述的具体实施例仅用以解释本申请，并不用于限定本申请。In order to make the purpose, technical solutions and advantages of this application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain this application and are not intended to limit this application.

需要说明的是，虽然在装置示意图中进行了功能模块划分，在流程图中示出了逻辑顺序，但是在某些情况下，可以以不同于装置中的模块划分，或流程图中的顺序执行所示出或描述的步骤。说明书、权利要求书或上述附图中的术语“第一”、“第二”等是用于区别类似的对象，而不必用于描述特定的顺序或先后次序。It should be noted that although the device schematics illustrate functional module divisions and the flowcharts illustrate logical sequences, in certain circumstances, the steps shown or described may be performed in a sequence that differs from the module divisions in the device or the sequence in the flowcharts. The terms "first," "second," and the like in the specification, claims, or accompanying drawings are used to distinguish similar items and are not necessarily used to describe a specific sequence or precedence.

下面结合附图，对本申请实施例作进一步阐述。The embodiments of the present application are further described below with reference to the accompanying drawings.

本申请的实施例，提供了目标检测网络，目标检测网络应用于如下的基于三维数据的目标检测方法。The embodiment of the present application provides a target detection network, which is applied to the following target detection based on three-dimensional data: Standard detection method.

参照图1，基于三维数据的目标检测方法，包括以下步骤：1 , the target detection method based on three-dimensional data includes the following steps:

步骤S100，构建待检测目标的三维模型；Step S100, constructing a three-dimensional model of the target to be detected;

步骤S200，获取待检测目标的三维模型在不同的渲染参数下的目标图像；Step S200, obtaining target images of a three-dimensional model of a target to be detected under different rendering parameters;

步骤S300，将噪声和待检测目标的真实图像输入至检测模型的生成器生成第一图像，将第一图像和目标图像输入至检测模型的判别器进行训练，得到训练后的检测模型；Step S300: Input the noise and the real image of the target to be detected into the generator of the detection model to generate a first image, and input the first image and the target image into the discriminator of the detection model for training to obtain a trained detection model;

步骤S400，将待检测图像输入至训练后的检测模型进行检测，得到目标检测结果。Step S400: Input the image to be detected into the trained detection model for detection to obtain the target detection result.

在该实施例中，通过调整三维渲染引擎的虚拟相机，对目标三维模型进行无死角的摄像渲染，可得到待检测目标的三维模型的任意角度任意距离的图像，保证目标图像数据库具有平衡且充分的数据，解决数据不平衡问题，有利于后续的目标检测训练，提升检测模型的检测效果。In this embodiment, by adjusting the virtual camera of the three-dimensional rendering engine, the target three-dimensional model is rendered without blind spots, and images of the three-dimensional model of the target to be detected at any angle and any distance can be obtained, ensuring that the target image database has balanced and sufficient data, solving the data imbalance problem, and facilitating subsequent target detection training and improving the detection effect of the detection model.

对于步骤S100，构建待检测目标的三维模型。In step S100 , a three-dimensional model of the target to be detected is constructed.

一方面，可以通过三维建模软件对待检测目标进行三维建模，构建待检测目标的三维模型。另一方面，可以从PASCAL3D+等数据库中获取待检测目标的三维模型。On the one hand, the target to be detected can be modeled in three dimensions using three-dimensional modeling software to construct a three-dimensional model of the target to be detected. On the other hand, the three-dimensional model of the target to be detected can be obtained from databases such as PASCAL3D+.

对于步骤S200，获取待检测目标的三维模型在不同的渲染参数下的目标图像。In step S200 , target images of the three-dimensional model of the target to be detected under different rendering parameters are obtained.

参照图2，获取待检测目标的三维模型在不同的渲染参数下的目标图像，包括以下步骤：2 , obtaining target images of a 3D model of a target to be detected under different rendering parameters includes the following steps:

步骤S210，设置渲染参数；Step S210, setting rendering parameters;

步骤S220，设置虚拟相机群；Step S220, setting a virtual camera group;

步骤S230，通过虚拟相机群获取待检测目标的三维模型在渲染参数下的目标图像。Step S230 : obtaining a target image of the three-dimensional model of the target to be detected under rendering parameters through the virtual camera group.

对于步骤S210，设置渲染参数；例如，渲染背景为与目标模型具有高对比度的纯色背景，通过天空盒模拟环境光照。In step S210 , rendering parameters are set; for example, the rendering background is a solid color background with high contrast to the target model, and the ambient lighting is simulated by a sky box.

可以设置不同的渲染参数，来模拟三维模型在不同环境下的状态。Different rendering parameters can be set to simulate the state of the 3D model in different environments.

通过设置不同的渲染参数可以丰富训练样本；在3D渲染阶段，有丰富的可调节参数，包括天气、太阳光线强度、目标的材质等，每调节一个参数，可以产生大量训练样本，具有成本低和效率高的优点。Training samples can be enriched by setting different rendering parameters; in the 3D rendering stage, there are a variety of adjustable parameters, including weather, sunlight intensity, target material, etc. Each time a parameter is adjusted, a large number of training samples can be generated, which has the advantages of low cost and high efficiency.

对于步骤S220，设置虚拟相机群。In step S220 , a virtual camera group is set.

参照图3，虚拟相机群的多个虚拟相机围绕三维模型部署。中间的方块表示三维模型；外围的方块表示相机。虚拟相机与三维模型之间的距离r大于三维模型的长宽高中值最大的一个，即r>max(L,W,H)。As shown in Figure 3, multiple virtual cameras in a virtual camera cluster are deployed around a 3D model. The center block represents the 3D model; the outer blocks represent the cameras. The distance r between the virtual cameras and the 3D model is greater than the maximum of the 3D model's length, width, and height, i.e., r > max(L, W, H).

为了解决数据不平衡问题和加速渲染速度，设计多个相机同时渲染，假设相机的数量为n(n≤360)，则相机群(c₁,c₂,c₃…c_n)的初始坐标可用如下公式表示：其中，θ和为虚拟相机的角度。In order to solve the data imbalance problem and speed up the rendering speed, multiple cameras are designed to render simultaneously. Assuming that the number of cameras is n (n≤360), the initial coordinates of the camera group (c ₁ ,c ₂ ,c ₃ … _cn ) can be expressed by the following formula: Among them, θ and is the angle of the virtual camera.

对于步骤S230，通过虚拟相机群获取待检测目标的三维模型在渲染参数下的目标图像。In step S230 , a target image of the three-dimensional model of the target to be detected under rendering parameters is obtained through the virtual camera group.

执行以下步骤，直至虚拟相机群的旋转角度达到角度阈值或者目标图像的数量达到数量阈值：使虚拟相机群旋转预设单位角度，通过旋转后的虚拟相机群获取待检测目标的三维模型在渲染参数下的目标图像。The following steps are performed until the rotation angle of the virtual camera group reaches an angle threshold or the number of target images reaches a number threshold: the virtual camera group is rotated by a preset unit angle, and a target image of the three-dimensional model of the target to be detected under rendering parameters is obtained through the rotated virtual camera group.

具体地，将预设单位角度设置为1度。相机群均匀分布在圆上，坐标(0，r，0)和(0，-r，0)必然存在相机，在初始坐标渲染完第一批图像后，坐标(0，r，0)和(0，-r，0)的相机停用，剩余相机绕坐标轴旋转渲染，每旋转1度可获得一批新的目标图像，剩余相机回归原坐标后停止渲染。Specifically, the preset unit angle is set to 1 degree. The camera group is evenly distributed on the circle. There must be a camera at the coordinates (0, r, 0) and (0, -r, 0). After the first batch of images are rendered at the initial coordinates, the coordinates (0, r, 0) and (0, -r, 0) are The camera is deactivated, and the remaining cameras are rotated around the coordinate axis for rendering. A new batch of target images can be obtained every time the camera rotates 1 degree. The rendering stops after the remaining cameras return to the original coordinates.

其中，对于小目标，可以通过增加相机与目标模型的距离以获取小目标样本。当待检测目标与真实图像的大小比例小于第一比例阈值，增加虚拟相机到三维模型的距离。For small objects, the distance between the camera and the object model can be increased to obtain small object samples. When the size ratio of the object to be detected to the real image is less than a first ratio threshold, the distance between the virtual camera and the 3D model is increased.

对于近距离目标，可以将目标模型进行多段分割，对分割后的每段模型建立图像样本数据库。当待检测目标与真实图像的大小比例大于第二比例阈值，将三维模型进行分割得到分割单元，获取分割单元在不同的渲染参数下的目标图像。例如，待检测目标为人形目标，可以先使用火柴人模型建立目标图像数据进行多次训练，使对抗式生成网络能够生成较小的目标图像。对于残缺的或受遮掩的目标图像，可以将人体模型从上到下进行三段式分割，分割为头部，上半身，下半身，然后建立目标图像数据库用于训练。为了实现近距离目标检测，也可以增加一些目标独有的特征训练样本，如人脸，手部，五官的图片，进行混合训练。For close-range targets, the target model can be segmented into multiple segments, and an image sample database can be established for each segmented model. When the size ratio between the target to be detected and the real image is greater than the second ratio threshold, the three-dimensional model is segmented to obtain segmentation units, and the target image of the segmentation unit under different rendering parameters is obtained. For example, if the target to be detected is a humanoid target, a stickman model can be used to establish target image data for multiple trainings, so that the adversarial generative network can generate smaller target images. For incomplete or obscured target images, the human body model can be segmented into three segments from top to bottom, namely the head, upper body, and lower body, and then a target image database can be established for training. In order to achieve close-range target detection, some target-specific feature training samples can also be added, such as pictures of faces, hands, and facial features, for mixed training.

通过小目标模糊检测、大目标特征检测，能够有效降低目标检测的失误的概率。By performing small target fuzzy detection and large target feature detection, the probability of target detection errors can be effectively reduced.

对于步骤S300，将噪声和待检测目标的真实图像输入至检测模型的生成器生成第一图像，将第一图像和目标图像输入至检测模型的判别器进行训练，得到训练后的检测模型。For step S300, the noise and the real image of the target to be detected are input into the generator of the detection model to generate a first image, and the first image and the target image are input into the discriminator of the detection model for training to obtain a trained detection model.

生成器接收一个随机生成的高斯噪声和待检测目标的真实图片，生成新的第一图像。将第一图像和目标图像数据库中的目标图像一起输入判别器进行训练，判别器试图将真实数据判别为真实，将生成的假数据判别为假。The generator receives a randomly generated Gaussian noise and a real image of the target to be detected, generating a new first image. This first image, along with target images from a target image database, is then fed into the discriminator for training. The discriminator attempts to classify real data as real and generated fake data as fake.

设有n个噪声样本{z⁽¹⁾,z⁽²⁾…z⁽ⁿ⁾}，n个来自目标图像数据库的样本{x⁽¹⁾,x⁽²⁾…x⁽ⁿ⁾}。计算判别器和生成器的的损失函数，有检测模型的损失函数为：L_total＝L_D+L_G，L_D＝-log(D(x))-log(1-D(G(z)))，L_G＝-log(D(G(z)))；其中，L_total为检测模型的损失函数，L_D为判别器的损失函数，L_G为生成器的损失函数，D为判别器，G为生成器，x为目标图像，z为噪声。通过梯度下降法依次更新判别器和生成器的参数，直到生成器产生逼真的数据，判别器难以区分真实数据和生成数据。Suppose there are n noise samples {z ⁽¹⁾ , z ⁽²⁾ …z ⁽ⁿ⁾ } and n samples from the target image database {x ⁽¹⁾ , x ⁽²⁾ …x ⁽ⁿ⁾ }. Calculate the loss functions of the discriminator and generator. The loss function of the detection model is: L _total = _LD + _LG , _LD = -log(D(x))-log(1-D(G(z))), _LG = -log(D(G(z))); where L _total is the loss function of the detection model, _LD is the loss function of the discriminator, _LG is the loss function of the generator, D is the discriminator, G is the generator, x is the target image, and z is the noise. The parameters of the discriminator and generator are updated sequentially using gradient descent until the generator produces realistic data and the discriminator has difficulty distinguishing between real data and generated data.

通过优化算法不断更新判别器和生成器的参数，使生成器生成的图像能够骗过判别器，即判别器判别生成的图像为真实的。最终得到训练后的检测模型。The parameters of the discriminator and generator are continuously updated through an optimization algorithm, so that the images generated by the generator can deceive the discriminator, that is, the discriminator judges the generated images as real. Finally, a trained detection model is obtained.

当判别器将目标图像判定为真实图像，使用基于阈值和轮廓线提取的方法去除目标图像中的纯色背景并用红色线条标记目标图像中的目标的轮廓线，得到标记图像。将标记图像的格式转换为真实图像的格式；例如，转换为带Alpha通道的PNG图片，适当压缩PNG图片，使其分辨率和图片质量与原真实图像保持一致。将压缩后的PNG图片叠加在原图片上合成用红色轮廓线标记出目标的图片，得到叠加图像。When the discriminator identifies the target image as a real image, it uses a thresholding and contour extraction method to remove the solid background from the target image and mark the outline of the target in the target image with a red line, resulting in a marked image. The marked image is then converted to the format of the real image; for example, to a PNG image with an alpha channel. The PNG image is then compressed appropriately to maintain the same resolution and image quality as the real image. The compressed PNG image is then overlaid on the original image to create a composite image with the target outline marked with a red line, resulting in the overlaid image.

通过目标轮廓线来对目标进行标记，可以将目标与环境更加充分的分隔开，预测目标的空间体积。By marking the target with its contour line, the target can be more fully separated from the environment and the spatial volume of the target can be predicted.

本申请的实施例，提供一种电子设备。电子设备包括：存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，处理器执行计算机程序时实现如上的基于三维数据的目标检测方法。An embodiment of the present application provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above-mentioned object detection method based on three-dimensional data when executing the computer program.

该电子设备可以为包括电脑等任意智能终端。The electronic device may be any intelligent terminal including a computer.

总体而言，对于电子设备的硬件结构，处理器可以采用通用的CPU(Central Processing Unit，中央处理器)、微处理器、应用专用集成电路(Application Specific Integrated Circuit，ASIC)、或者一个或多个集成电路等方式实现，用于执行相关程序，以实现本申请实施例所提供的技术方案。In general, for the hardware structure of the electronic device, the processor can be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, etc., for executing relevant programs to implement the technology provided in the embodiments of the present application. plan.

存储器可以采用只读存储器(Read Only Memory，ROM)、静态存储设备、动态存储设备或者随机存取存储器(Random Access Memory，RAM)等形式实现。存储器可以存储操作系统和其他应用程序，在通过软件或者固件来实现本说明书实施例所提供的技术方案时，相关的程序代码保存在存储器中，并由处理器来调用执行本申请实施例的方法。The memory can be implemented in the form of a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory can store an operating system and other application programs. When the technical solutions provided in the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory and is called by the processor to execute the methods of the embodiments of this application.

输入/输出接口用于实现信息输入及输出。The input/output interface is used to realize information input and output.

通信接口用于实现本设备与其他设备的通信交互，可以通过有线方式(例如USB、网线等)实现通信，也可以通过无线方式(例如移动网络、WIFI、蓝牙等)实现通信。The communication interface is used to realize the communication interaction between this device and other devices. Communication can be achieved through wired means (such as USB, network cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).

总线在设备的各个组件(例如处理器、存储器、输入/输出接口和通信接口)之间传输信息。处理器、存储器、输入/输出接口和通信接口通过总线实现彼此之间在设备内部的通信连接。The bus transmits information between the various components of the device (such as the processor, memory, input/output interface, and communication interface). The processor, memory, input/output interface, and communication interface communicate with each other within the device through the bus.

本申请的实施例，提供了一种计算机可读存储介质。计算机可读存储介质存储有计算机可执行指令，计算机可执行指令用于执行如上的基于三维数据的目标检测方法。An embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions for executing the above-mentioned object detection method based on three-dimensional data.

本领域普通技术人员可以理解，上文中所公开方法中的全部或某些步骤、系统可以被实施为软件、固件、硬件及其适当的组合。某些物理组件或所有物理组件可以被实施为由处理器，如中央处理器、数字信号处理器或微处理器执行的软件，或者被实施为硬件，或者被实施为集成电路，如专用集成电路。这样的软件可以分布在计算机可读介质上，计算机可读介质可以包括计算机存储介质(或非暂时性介质)和通信介质(或暂时性介质)。如本领域普通技术人员公知的，术语计算机存储介质包括在用于存储信息(诸如计算机可读指令、数据结构、程序模块或其他数据)的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其他存储器技术、CD-ROM、数字多功能盘(DVD)或其他光盘存储、磁盒、磁带、磁盘存储或其他磁存储装置、或者可以用于存储期望的信息并且可以被计算机访问的任何其他的介质。此外，本领域普通技术人员公知的是，通信介质通常包含计算机可读指令、数据结构、程序模块或者诸如载波或其他传输机制之类的调制数据信号中的其他数据，并且可包括任何信息递送介质。在本说明书的上述描述中，参考术语“一个实施方式/实施例”、“另一实施方式/实施例”或“某些实施方式/实施例”等的描述意指结合实施方式或示例描述的具体特征、结构、材料或者特点包含于本申请的至少一个实施方式或示例中。在本说明书中，对上述术语的示意性表述不一定指的是相同的实施方式或示例。而且，描述的具体特征、结构、材料或者特点可以在任何的一个或多个实施方式或示例中以合适的方式结合。Those skilled in the art will appreciate that all or some of the steps and systems in the method disclosed above can be implemented as software, firmware, hardware, and appropriate combinations thereof. Some physical components or all physical components can be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or implemented as hardware, or implemented as an integrated circuit, such as an application-specific integrated circuit. Such software can be distributed on a computer-readable medium, and the computer-readable medium can include computer storage media (or non-transitory media) and communication media (or temporary media). As known to those skilled in the art, the term computer storage media is included in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data) and is volatile and non-volatile, removable, and non-removable. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory, or other memory technology, CD-ROM, digital versatile disks (DVD), or other optical disk storage, magnetic cassettes, magnetic tapes, disk storage, or other magnetic storage devices, or any other medium that can be used to store desired information and can be accessed by a computer. In addition, it is well known to those skilled in the art that communication media generally contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium. In the above description of this specification, the reference terms "one embodiment/example", "another embodiment/example" or "certain embodiments/examples" and the like are intended to mean that the specific features, structures, materials or characteristics described in conjunction with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, the schematic representation of the above terms does not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any one or more embodiments or examples in a suitable manner.

本领域普通技术人员可以理解，上文中所公开方法中的全部或某些步骤、系统、设备中的功能模块/单元可以被实施为软件、固件、硬件及其适当的组合。Those skilled in the art will appreciate that all or some of the steps in the methods, systems, and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, or appropriate combinations thereof.

上述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of these units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

另外，在本申请各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。In addition, the functional units in the various embodiments of the present application may be integrated into a single processing unit, or each unit may exist physically separately, or two or more units may be integrated into a single unit. The aforementioned integrated units may be implemented in the form of hardware or software functional units.

集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括多指令用以使得一台计算机设备(可以是个人计算机，服务器，或者网络设备等)执行本申请各个实施例的方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器(Read-Only Memory，简称ROM)、随机存取存储器(Random Access Memory，简称RAM)、磁碟或者光盘等各种可以存储程序的介质。If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can store In a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part that contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product, which is stored in a storage medium and includes multiple instructions for enabling a computer device (which can be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of various embodiments of the present application. The aforementioned storage medium includes: various media that can store programs, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

在本申请所提供的几个实施例中，应该理解到，所揭露的装置和方法，可以通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的，例如，上述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。尽管已经示出和描述了本申请的实施方式，本领域的普通技术人员可以理解：在不脱离本申请的原理和宗旨的情况下可以对这些实施方式进行多种变化、修改、替换和变型，本申请的范围由权利要求及其等同物限定。In the several embodiments provided in the present application, it should be understood that the disclosed devices and methods can be implemented in other ways. For example, the device embodiments described above are merely schematic. For example, the division of the above-mentioned units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms. Although the embodiments of the present application have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and purpose of the present application, and the scope of the present application is defined by the claims and their equivalents.

以上是对本申请的较佳实施进行了具体说明，但本申请并不限于实施例，熟悉本领域的技术人员在不违背本申请精神的前提下可做作出种种的等同变形或替换，这些等同的变形或替换均包含在本申请权利要求所限定的范围内。 The above is a specific description of the preferred implementation of the present application, but the present application is not limited to the embodiments. Those skilled in the art may make various equivalent modifications or substitutions without violating the spirit of the present application, and these equivalent modifications or substitutions are all included in the scope defined by the claims of the present application.

Claims

A target detection method based on three-dimensional data, characterized by comprising:

Construct a three-dimensional model of the target to be detected;

Obtain target images of the three-dimensional model of the target to be detected under different rendering parameters;

Inputting noise and a real image of a target to be detected into a generator of a detection model to generate a first image, inputting the first image and the target image into a discriminator of the detection model for training, and obtaining a trained detection model;

The image to be detected is input into the trained detection model for detection to obtain the target detection result.

The target detection method based on three-dimensional data according to claim 1 is characterized in that obtaining target images of the three-dimensional model of the target to be detected under different rendering parameters includes:

Set rendering parameters;

Setting a virtual camera group, wherein a plurality of virtual cameras of the virtual camera group are deployed around the three-dimensional model;

The target image of the three-dimensional model of the target to be detected under the rendering parameters is obtained through the virtual camera group.

The target detection method based on three-dimensional data according to claim 2 is characterized in that the step of obtaining a target image of the three-dimensional model of the target to be detected under rendering parameters by the virtual camera group comprises:

The following steps are performed until the rotation angle of the virtual camera group reaches an angle threshold or the number of target images reaches a number threshold: the virtual camera group is rotated by a preset unit angle, and a target image of the three-dimensional model of the target to be detected under rendering parameters is obtained through the rotated virtual camera group.

The target detection method based on three-dimensional data according to claim 2, characterized in that the coordinates of the virtual camera are (x _C , y _C , z _C ); wherein, Z _C = r cosθ = 0; r is the distance from the virtual camera to the 3D model, θ is the preset parameter, are preset parameters.

The object detection method based on three-dimensional data according to claim 4 is characterized in that when the size ratio of the target to be detected to the real image is less than a first ratio threshold, the distance between the virtual camera and the three-dimensional model is increased.

When the size ratio of the target to be detected to the real image is greater than a second ratio threshold, the three-dimensional model is segmented to obtain segmentation units, and target images of the segmentation units under different rendering parameters are obtained.

The object detection method based on three-dimensional data according to claim 1, characterized in that the step of inputting the first image and the target image into a discriminator of a detection model for training comprises:

When the discriminator determines that the target image is a real image, the pure color background in the target image is removed and the contour line of the target in the target image is marked to obtain a marked image;

Converting the format of the labeled image to the format of the real image;

The converted labeled image is superimposed on the real image to obtain a superimposed image.

The target detection method based on three-dimensional data according to claim 1 is characterized in that the loss function of the detection model is: L _total = _LD + _LG , _LD = -log(D(x))-log(1-D(G(z))), _LG = -log(D(G(z))); wherein L _total is the loss function of the detection model, _LD is the loss function of the discriminator, _LG is the loss function of the generator, D is the discriminator, G is the generator, x is the target image, and z is the noise.

An electronic device, characterized in that it comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the device according to any one of claims 1 to 8 is implemented. The target detection method based on three-dimensional data is described.

A computer storage medium, characterized in that it stores computer-executable instructions, wherein the computer-executable instructions are used to execute the target detection method based on three-dimensional data as described in any one of claims 1 to 8.