CN111860493B

CN111860493B - Target detection method and device based on point cloud data

Info

Publication number: CN111860493B
Application number: CN202010535697.8A
Authority: CN
Inventors: 李智超; 王乃岩
Original assignee: Beijing Tusimple Technology Co Ltd
Current assignee: Beijing Original Generation Technology Co.,Ltd.
Priority date: 2020-06-12
Filing date: 2020-06-12
Publication date: 2024-02-09
Anticipated expiration: 2040-06-12
Also published as: CN111860493A

Abstract

This application provides a target detection method and device based on point cloud data, and relates to the technical field of target detection. The method includes: obtaining original point cloud data and each initial target detection frame output by the initial target detection network, and obtaining information on each initial target detection frame. Extract point clouds within the preset frame range around each initial target detection frame from the original point cloud data; based on the points in the original point cloud data within each initial target detection frame and the point clouds within the preset frame range outside the initial target detection frame The points in and the information of each initial target detection frame are generated to generate the input data of the neural network; the input data is input into the pre-trained neural network for processing, and the detection frame corresponding to each initial target detection frame is obtained based on the output of the pre-trained neural network. results and target category results. The embodiments of the present application can improve the performance of target detection.

Description

A target detection method and device based on point cloud data

技术领域Technical field

本申请涉及目标检测技术领域，尤其涉及一种基于点云数据的目标检测方法及装置。The present application relates to the field of target detection technology, and in particular, to a target detection method and device based on point cloud data.

背景技术Background technique

目前，随着自动驾驶技术和移动机器人技术的发展，激光雷达设备在自动驾驶车辆和移动机器人上已经得到了广泛的应用。为了保证自动驾驶车辆和移动机器人的正常运行，一般需要通过激光雷达采集周围环境的点云数据，以帮助自动驾驶车辆和移动机器人感知自身周围环境。目前的自动驾驶车辆和移动机器人为了更好的感知自身周围环境，一般需要对自身周围环境的各种物体进行目标检测。其中一种方式是通过激光雷达采集的点云数据来实现。Currently, with the development of autonomous driving technology and mobile robot technology, lidar equipment has been widely used in autonomous vehicles and mobile robots. In order to ensure the normal operation of autonomous vehicles and mobile robots, it is generally necessary to collect point cloud data of the surrounding environment through lidar to help autonomous vehicles and mobile robots perceive their surrounding environment. In order to better perceive their surrounding environment, current self-driving vehicles and mobile robots generally need to perform target detection on various objects in their surrounding environment. One way is to use point cloud data collected by lidar.

当前，现有技术中有通过将点云数据输入到目标检测网络中来进行目标检测的，例如采用稀疏嵌入卷积目标检测网络(Sparsely Embedded Convolutional Detection，简称SECOND)、新型点云编码器和网络(PointPillars)、多视图三维网络(Multi-View3D，简称MV3D)、点体素集成网络(Point Voxel-RCNN，简称PV-RCNN)或者一种新型的三维物体检测网络(Point RCNN)。然而通过上述这些现有的目标检测网络进行的目标检测均存在检测性能较差的问题。为了提高目标检测的性能，本申请将提供一种基于点云数据的目标检测方案。Currently, existing technologies include inputting point cloud data into a target detection network to perform target detection, such as using Sparsely Embedded Convolutional Detection (SECOND), new point cloud encoders and networks. (PointPillars), Multi-View3D (MV3D for short), Point Voxel-RCNN (PV-RCNN for short) or a new three-dimensional object detection network (Point RCNN). However, the target detection performed by the above-mentioned existing target detection networks has the problem of poor detection performance. In order to improve the performance of target detection, this application will provide a target detection solution based on point cloud data.

发明内容Contents of the invention

本申请的实施例提供一种基于点云数据的目标检测方法及装置，以提高目标检测的性能。Embodiments of the present application provide a target detection method and device based on point cloud data to improve the performance of target detection.

为达到上述目的，本申请的实施例采用如下技术方案：In order to achieve the above objectives, the embodiments of the present application adopt the following technical solutions:

本申请实施例的第一方面，提供一种基于点云数据的目标检测方法，包括：A first aspect of the embodiments of this application provides a target detection method based on point cloud data, including:

获得原始点云数据和初始目标检测网络输出的各初始目标检测框，并获得各初始目标检测框信息；Obtain the original point cloud data and each initial target detection frame output by the initial target detection network, and obtain the information of each initial target detection frame;

从所述原始点云数据中提取各初始目标检测框周围预设框范围内的点云；Extract point clouds within the preset frame range around each initial target detection frame from the original point cloud data;

根据各初始目标检测框内原始点云数据中的点、初始目标检测框外预设框范围内的点云中的点以及各初始目标检测框信息，生成神经网络的输入数据；Generate the input data of the neural network based on the points in the original point cloud data within each initial target detection frame, the points in the point cloud within the preset frame outside the initial target detection frame, and the information on each initial target detection frame;

将所述输入数据输入到预先训练的神经网络中进行处理，根据预先训练的神经网络的输出得到各初始目标检测框对应的检测框结果和目标类别结果。The input data is input into a pre-trained neural network for processing, and the detection frame results and target category results corresponding to each initial target detection frame are obtained according to the output of the pre-trained neural network.

本申请实施例的第二方面，提供一种基于点云数据的目标检测装置，包括：A second aspect of the embodiment of the present application provides a target detection device based on point cloud data, including:

初始信息获得单元，用于获得原始点云数据和初始目标检测网络输出的各初始目标检测框，并获得各初始目标检测框信息；The initial information acquisition unit is used to obtain the original point cloud data and each initial target detection frame output by the initial target detection network, and obtain the information of each initial target detection frame;

点云提取单元，用于从所述原始点云数据中提取各初始目标检测框周围预设框范围内的点云；A point cloud extraction unit configured to extract point clouds within the preset frame range around each initial target detection frame from the original point cloud data;

输入数据生成单元，用于根据各初始目标检测框内原始点云数据中的点、初始目标检测框外预设框范围内的点云中的点以及各初始目标检测框信息，生成神经网络的输入数据；The input data generation unit is used to generate the neural network based on the points in the original point cloud data within each initial target detection frame, the points in the point cloud within the preset frame outside the initial target detection frame, and the information on each initial target detection frame. Input data;

结果生成单元，用于将所述输入数据输入到预先训练的神经网络中进行处理，根据预先训练的神经网络的输出得到各初始目标检测框对应的检测框结果和目标类别结果。The result generation unit is used to input the input data into a pre-trained neural network for processing, and obtain the detection frame results and target category results corresponding to each initial target detection frame based on the output of the pre-trained neural network.

本申请实施例的第三方面，提供一种计算机可读存储介质，包括程序或指令，当所述程序或指令在计算机上运行时，实现上述第一方面所述的方法。A third aspect of the embodiments of the present application provides a computer-readable storage medium, including a program or instructions. When the program or instructions are run on a computer, the method described in the first aspect is implemented.

本申请实施例的第四方面，提供一种包含指令的计算机程序产品，当所述计算机程序产品在计算机上运行时，使得所述计算机执行如上述第一方面所述的方法。A fourth aspect of the embodiments of the present application provides a computer program product containing instructions, which when the computer program product is run on a computer, causes the computer to execute the method described in the first aspect.

本申请实施例的第五方面，提供一种芯片系统，包括处理器，所述处理器与存储器的耦合，所述存储器存储有程序指令，当所述存储器存储的程序指令被所述处理器执行时实现上述第一方面所述的方法。A fifth aspect of the embodiment of the present application provides a chip system, including a processor, the processor is coupled to a memory, and the memory stores program instructions. When the program instructions stored in the memory are executed by the processor When implementing the method described in the first aspect above.

本申请实施例的第六方面，提供一种计算机服务器，包括存储器，以及与所述存储器通信连接的一个或多个处理器；A sixth aspect of the embodiments of the present application provides a computer server, including a memory, and one or more processors communicatively connected to the memory;

所述存储器中存储有可被所述一个或多个处理器执行的指令，所述指令被所述一个或多个处理器执行，以使所述一个或多个处理器实现上述第一方面所述的方法。The memory stores instructions that can be executed by the one or more processors, and the instructions are executed by the one or more processors, so that the one or more processors implement the first aspect described above. method described.

本申请实施例提供一种基于点云数据的目标检测方法及装置，首先获得原始点云数据和初始目标检测网络输出的各初始目标检测框，并获得各初始目标检测框信息；为了让后续处理的神经网络获得初始目标检测框的信息，本申请从原始点云数据中提取各初始目标检测框周围预设框范围内的点云，并根据各初始目标检测框内原始点云数据中的点、初始目标检测框外预设框范围内的点云中的点以及各初始目标检测框信息，生成神经网络的输入数据。该输入数据即为考虑到初始目标检测框信息的数据，从而将输入数据输入到预先训练的神经网络中进行处理，根据预先训练的神经网络的输出得到各初始目标检测框对应的检测框结果和目标类别结果。这样，得到的检测框结果和目标类别结果更为准确，可以提高目标检测的性能。Embodiments of the present application provide a target detection method and device based on point cloud data. First, the original point cloud data and each initial target detection frame output by the initial target detection network are obtained, and each initial target detection frame information is obtained; in order to facilitate subsequent processing The neural network obtains the information of the initial target detection frame. This application extracts the point cloud within the preset frame range around each initial target detection frame from the original point cloud data, and based on the points in the original point cloud data within each initial target detection frame , the points in the point cloud within the preset frame outside the initial target detection frame and the information of each initial target detection frame generate the input data of the neural network. The input data is the data that takes into account the initial target detection frame information, so that the input data is input into the pre-trained neural network for processing, and the detection frame results corresponding to each initial target detection frame are obtained based on the output of the pre-trained neural network. Target category results. In this way, the obtained detection frame results and target category results are more accurate, which can improve the performance of target detection.

附图说明Description of drawings

为了更清楚地说明本申请实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本申请的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其他的附图。In order to explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings in the following description are only These are some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without exerting any creative effort.

图1为本申请实施例提供的一种基于点云数据的目标检测方法的流程图一；Figure 1 is a flow chart 1 of a target detection method based on point cloud data provided by an embodiment of the present application;

图2为本申请实施例提供的一种基于点云数据的目标检测方法的流程图二；Figure 2 is a flow chart 2 of a target detection method based on point cloud data provided by an embodiment of the present application;

图3为本申请实施例中初始目标检测框位置不准确的情况示意图；Figure 3 is a schematic diagram of the situation where the position of the initial target detection frame is inaccurate in the embodiment of the present application;

图4为本申请实施例中初始目标检测框尺寸不精准的情况示意图；Figure 4 is a schematic diagram of the situation where the size of the initial target detection frame is inaccurate in the embodiment of the present application;

图5为本申请实施例中的初始目标检测框、预设框范围及虚拟点的示意图；Figure 5 is a schematic diagram of the initial target detection frame, preset frame range and virtual points in the embodiment of the present application;

图6为本申请实施例提供的一种基于点云数据的目标检测装置的结构示意图。Figure 6 is a schematic structural diagram of a target detection device based on point cloud data provided by an embodiment of the present application.

具体实施方式Detailed ways

下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of this application.

需要说明的是，本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象，而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换，以便这里描述的本申请的实施例。此外，术语“包括”和“具有”以及他们的任何变形，意图在于覆盖不排他的包含，例如，包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元，而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。It should be noted that the terms "first", "second", etc. in the description and claims of this application and the above-mentioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate for the embodiments of the application described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusions, e.g., a process, method, system, product, or apparatus that encompasses a series of steps or units and need not be limited to those explicitly listed. Those steps or elements may instead include other steps or elements not expressly listed or inherent to the process, method, product or apparatus.

为了使本领域的技术人员更好的了解本申请，下面先对本申请实施例中出现的部分技术术语进行解释如下：In order to enable those skilled in the art to better understand this application, some technical terms appearing in the embodiments of this application are explained as follows:

可移动物体：是指车辆、移动机器人、飞行器等可移动运行的物体，可移动物体上可以搭载各类型传感器，如激光雷达、相机等。Movable objects: refers to vehicles, mobile robots, aircraft and other movable objects. Movable objects can be equipped with various types of sensors, such as lidar, cameras, etc.

点云：通过激光雷达采集的周围环境的数据，用稀疏的三维空间点标示。Point cloud: Data of the surrounding environment collected through lidar, marked with sparse three-dimensional space points.

帧(Frame)：传感器完成一次观测所接收到的测量数据，如相机的一帧数据为一张图片，激光雷达的一帧数据为一组激光点云。Frame: The measurement data received by the sensor after completing an observation. For example, a frame of data from a camera is a picture, and a frame of data from a lidar is a set of laser point clouds.

PointNet：现有一种新型的处理点云数据的深度学习模型，可以不对原始点云数据进行编码，即可完成原始点云数据的处理。PointNet: There is a new deep learning model for processing point cloud data, which can complete the processing of the original point cloud data without encoding the original point cloud data.

PointNet++：现有的PointNet网络的一种改进版神经网络模型。PointNet++: an improved neural network model of the existing PointNet network.

SECOND：Sparsely Embedded Convolutional Detection，现有的一种采用稀疏嵌入卷积目标检测网络。SECOND: Sparsely Embedded Convolutional Detection, an existing sparsely embedded convolutional target detection network.

PointPillars：现有的一种新型点云编码器和网络。PointPillars: A new existing point cloud encoder and network.

MV3D：Multi-View 3D，现有的一种多视图三维网络。MV3D: Multi-View 3D, an existing multi-view three-dimensional network.

PV-RCNN：Point Voxel-RCNN，现有的一种点体素集成网络。PV-RCNN: Point Voxel-RCNN, an existing point voxel integration network.

PointRCNN：现有的一种新型的三维物体检测网络。PointRCNN: A new existing 3D object detection network.

DGCNN：Dynamic Graph Convolutional Neural Network，现有的一种动态图卷积神经网络。DGCNN: Dynamic Graph Convolutional Neural Network, an existing dynamic graph convolutional neural network.

RSCNN：Relation-Shape CNN，现有的一种基于几何关系的卷积神经网络。RSCNN: Relation-Shape CNN, an existing convolutional neural network based on geometric relationships.

NMS：Non Maximum Suppression，非极大值抑制，顾名思义就是抑制不是极大值的元素，搜索局部的极大值。NMS: Non Maximum Suppression, non-maximum suppression, as the name suggests, suppresses elements that are not maximum values and searches for local maximum values.

在本申请的一些实施例中，术语“车辆”广泛地解释为包括任何可移动物体，包括例如飞行器、船只、航天器、汽车、卡车、厢式货车、半挂车、摩托车、高尔夫球车、越野车辆、仓库运输车辆或农用车以及行驶在轨道上的运输工具，例如电车或火车以及其它有轨车辆。本申请中的“车辆”通常可以包括：动力系统、传感器系统、控制系统、外围设备和计算机系统。在其它实施例中，车辆可以包括更多、更少或者不同的系统。In some embodiments of the present application, the term "vehicle" is interpreted broadly to include any movable object, including, for example, aircraft, boats, spacecraft, cars, trucks, vans, semi-trailers, motorcycles, golf carts, Off-road vehicles, warehouse transport vehicles or agricultural vehicles as well as transport vehicles running on rails, such as trams or trains and other rail vehicles. "Vehicle" in this application may generally include: power systems, sensor systems, control systems, peripherals, and computer systems. In other embodiments, the vehicle may include more, fewer, or different systems.

其中，动力系统是为车辆提供动力运动的系统，包括：引擎/马达、变速器和车轮/轮胎、能源单元。Among them, the power system is the system that provides power for the vehicle, including: engine/motor, transmission and wheels/tires, and energy unit.

控制系统可以包括控制车辆及其组件的装置的组合，例如转向单元、节气门、制动单元。The control system may include a combination of devices that control the vehicle and its components, such as steering units, throttles, braking units.

外围设备可以是允许车辆与外部传感器、其它车辆、外部计算设备和/或用户进行交互的设备，例如无线通信系统、触摸屏、麦克风和/或扬声器。Peripheral devices may be devices that allow the vehicle to interact with external sensors, other vehicles, external computing devices, and/or users, such as wireless communication systems, touch screens, microphones, and/or speakers.

基于上述描述的车辆，自动驾驶车辆中还配置有传感器系统和自动驾驶控制装置。Based on the vehicle described above, the autonomous vehicle is also equipped with a sensor system and an autonomous driving control device.

传感器系统可以包括用于感测车辆所处环境的信息的多个传感器，以及改变传感器的位置和/或方向的一个或多个致动器。传感器系统可以包括全球定位系统传感器、惯性测量单元、无线电检测和测距(RADAR)单元、相机、激光测距仪、光检测和测距(LIDAR)单元和/或声学传感器等传感器的任何组合；传感器系统还可以包括监视车辆内部系统的传感器(例如O₂监视器、燃油表、引擎温度计等)。The sensor system may include a plurality of sensors for sensing information about the environment in which the vehicle is located, and one or more actuators that change the position and/or orientation of the sensors. The sensor system may include any combination of sensors such as GPS sensors, inertial measurement units, radio detection and ranging (RADAR) units, cameras, laser rangefinders, light detection and ranging (LIDAR) units, and/or acoustic sensors; The sensor system may also include sensors that monitor the vehicle's internal systems (e.g., _O2 monitor, fuel gauge, engine temperature gauge, etc.).

自动驾驶控制装置可以包括一个处理器和存储器，存储器中存储有至少一条机器可执行指令，处理器执行至少一条机器可执行指令实现包括地图引擎、定位模块、感知模块、导航或路径模块、以及自动控制模块等的功能。地图引擎和定位模块用于提供地图信息和定位信息。感知模块用于根据传感器系统获取到的信息和地图引擎提供的地图信息感知车辆所处环境中的事物。导航或路径模块用于根据地图引擎、定位模块和感知模块的处理结果，为车辆规划行驶路径。自动控制模块将导航或路径模块等模块的决策信息输入解析转换成对车辆控制系统的控制命令输出，并通过车载网(例如通过CAN总线、局域互联网络、多媒体定向系统传输等方式实现的车辆内部电子网络系统)将控制命令发送给车辆控制系统中的对应部件，实现对车辆的自动控制；自动控制模块还可以通过车载网来获取车辆中各部件的信息。The automatic driving control device may include a processor and a memory. At least one machine-executable instruction is stored in the memory. The processor executes at least one machine-executable instruction to implement a map engine, a positioning module, a perception module, a navigation or path module, and an automatic driving control device. Functions of control modules, etc. The map engine and positioning module are used to provide map information and positioning information. The perception module is used to perceive things in the environment of the vehicle based on the information obtained by the sensor system and the map information provided by the map engine. The navigation or path module is used to plan a driving path for the vehicle based on the processing results of the map engine, positioning module and perception module. The automatic control module parses and converts the decision-making information input of modules such as navigation or path modules into control command output for the vehicle control system, and implements the vehicle through the vehicle network (such as CAN bus, local area Internet network, multimedia directional system transmission, etc. The internal electronic network system) sends control commands to the corresponding components in the vehicle control system to realize automatic control of the vehicle; the automatic control module can also obtain information about each component in the vehicle through the vehicle network.

当前，现有技术中有通过将点云数据输入到各种目标检测网络中来进行目标检测的，例如采用稀疏嵌入卷积目标检测网络(Sparsely Embedded ConvolutionalDetection，简称SECOND)、新型点云编码器和网络(PointPillars)、多视图三维网络(Multi-View 3D，简称MV3D)、点体素集成网络(Point Voxel-RCNN，简称PV-RCNN)或者一种新型的三维物体检测网络(Point RCNN)。然而通过上述这些现有的目标检测网络进行的目标检测均存在检测性能较差的问题，其原因主要有以下两种：Currently, existing technologies include inputting point cloud data into various target detection networks for target detection, such as using Sparsely Embedded Convolutional Detection (SECOND), new point cloud encoders and Network (PointPillars), Multi-View 3D (MV3D for short), Point Voxel-RCNN (PV-RCNN for short) or a new type of three-dimensional object detection network (Point RCNN). However, target detection through the above-mentioned existing target detection networks has the problem of poor detection performance. There are two main reasons:

其一，在采用SECOND、PointPillars、MV3D或PV-RCNN进行目标检测时，均需要对点云数据做各种编码(例如栅格化或鸟瞰图投影等方式)，以使得杂乱无序的点云变得有序，以降低目标检测过程中的计算量。然而在进行各种编码的过程中，也造成了原始点云数据的位置信息的损失，导致得到的目标检测框位置不准确，整体检测性能下降。First, when using SECOND, PointPillars, MV3D or PV-RCNN for target detection, various encodings of point cloud data (such as rasterization or bird's-eye view projection, etc.) are required to make the point cloud messy and disordered. Become orderly to reduce the amount of calculation in the target detection process. However, in the process of various encodings, the position information of the original point cloud data is also lost, resulting in inaccurate position of the target detection frame and a decrease in overall detection performance.

其二，在采用Point RCNN进行目标检测时，其所预测的目标检测框尺寸在一些点云稀疏的位置可大可小(尺寸大一些，小一些均可以将相应的点云框进去)，这样后续处理的网络(如PointNet)在无法清楚知悉目标检测框尺寸的情况下，对各目标检测框所划分的正负样本不准确，使得后续的非极大值抑制(Non-Maximum Suppression，简称NMS)处理将不够准确，目标检测性能较差。Secondly, when using Point RCNN for target detection, the size of the target detection frame predicted by it can be large or small in some sparse point cloud locations (either a larger size or a smaller size can fit the corresponding point cloud into the frame), so When the subsequent processing network (such as PointNet) cannot clearly know the size of the target detection frame, the positive and negative samples divided by each target detection frame are inaccurate, making the subsequent Non-Maximum Suppression (NMS) ) processing will not be accurate enough and target detection performance will be poor.

本申请实施例旨在提出一种基于点云数据的目标检测方案，以提高目标检测的性能。The embodiments of this application aim to propose a target detection solution based on point cloud data to improve the performance of target detection.

如图1所示，本申请实施例提供一种基于点云数据的目标检测方法，包括：As shown in Figure 1, this embodiment of the present application provides a target detection method based on point cloud data, including:

步骤101、获得原始点云数据和初始目标检测网络输出的各初始目标检测框，并获得各初始目标检测框信息。Step 101: Obtain the original point cloud data and each initial target detection frame output by the initial target detection network, and obtain the information of each initial target detection frame.

步骤102、从原始点云数据中提取各初始目标检测框周围预设框范围内的点云。Step 102: Extract point clouds within the preset frame range around each initial target detection frame from the original point cloud data.

步骤103、根据各初始目标检测框内原始点云数据中的点、初始目标检测框外预设框范围内的点云中的点以及各初始目标检测框信息，生成神经网络的输入数据。Step 103: Generate the input data of the neural network based on the points in the original point cloud data within each initial target detection frame, the points in the point cloud within the preset frame outside the initial target detection frame, and the information on each initial target detection frame.

步骤104、将输入数据输入到预先训练的神经网络中进行处理，根据预先训练的神经网络的输出得到各初始目标检测框对应的检测框结果和目标类别结果。Step 104: Input the input data into the pre-trained neural network for processing, and obtain the detection frame results and target category results corresponding to each initial target detection frame based on the output of the pre-trained neural network.

为了使本领域的技术人员更好的了解本申请，下面结合附图、实例等对本申请实施例做更为详细的阐述。值得说明的是，本申请实施例中的目标检测可以是指安装有雷达的自动驾驶车辆、智能机器人、无人机等对周围环境进行感知并检测识别所关注的物体(如车辆、行人、障碍物等，但不仅局限于此)，还可以是指安装有雷达的安防设施对所监控范围进行感知并检测识别所关注的物体(人、车辆、货箱等，但不仅局限于此)。当然，应用到目标检测的场景还有很多，应该知道的是，能够应用目标检测的场景均可能应用到本申请实施例，本申请实施例在此不再一一列举。In order to enable those skilled in the art to better understand the present application, the embodiments of the present application will be described in more detail below with reference to the accompanying drawings, examples, etc. It is worth noting that target detection in the embodiments of the present application may refer to autonomous vehicles, intelligent robots, drones, etc. equipped with radars perceiving the surrounding environment and detecting and identifying objects of concern (such as vehicles, pedestrians, obstacles, etc.) objects, etc., but not limited to this), it can also refer to security facilities equipped with radar to sense the monitored range and detect and identify objects of concern (people, vehicles, containers, etc., but not limited to this). Of course, there are many scenarios that are applied to target detection. It should be noted that all scenarios in which target detection can be applied may be applied to the embodiments of this application, and the embodiments of this application will not be listed one by one here.

在本申请的一实施例中，如图2所示，提供一种基于点云数据的目标检测方法，包括：In an embodiment of the present application, as shown in Figure 2, a target detection method based on point cloud data is provided, including:

步骤201、获得原始点云数据和初始目标检测网络输出的各初始目标检测框，并获得各初始目标检测框信息。Step 201: Obtain the original point cloud data and each initial target detection frame output by the initial target detection network, and obtain the information of each initial target detection frame.

此处，在本申请的一实施例中，获得的原始点云数据是指由雷达(如激光雷达)采集得到的原始点云数据。本申请实施例中后续会应用原始点云数据来进行目标检测，而无需对这些原始点云数据进行各种编码(例如栅格化或鸟瞰图投影等方式)，以避免在进行各种编码的过程中，造成原始点云数据的位置信息的损失，导致得到的初始目标检测框位置不准确，整体检测性能下降的问题。Here, in an embodiment of the present application, the original point cloud data obtained refers to the original point cloud data collected by radar (such as lidar). In the embodiment of the present application, the original point cloud data will be used for target detection later, without the need to perform various encodings (such as rasterization or bird's-eye view projection, etc.) on these original point cloud data to avoid the need for various encodings. During the process, the position information of the original point cloud data is lost, resulting in inaccurate position of the initial target detection frame and a decrease in overall detection performance.

此处，在本申请的一实施例中，该初始目标检测网络可以是目前现有的一些目标检测网络，例如一些需要对原始点云数据进行各种编码(例如栅格化或鸟瞰图投影等方式)的目标检测网络，如SECOND、PointPillars、MV3D或PV-RCNN等目标检测网络，这些目标检测网络的具体工作过程均属于现有技术，此处不再赘述。需要知道的是，这些目标检测网络所输出的初始目标检测框可能存在位置不准确的问题。例如在图3中，一组点云31(如车辆的外部轮廓点云)理想情况下应被准确的初始目标检测框框住，基本没有点落在初始目标检测框外，然而上述这些目标检测网络所输出的初始目标检测框32可能如图3中所示，其可能存在位置不准确，造成点云31中的部分点在初始目标检测框32外面的情况。本申请实施例提供的基于点云数据的目标检测方法的其中一个目的就是为了克服如图3所示的问题，使检测框结果位置更精准。Here, in an embodiment of the present application, the initial target detection network can be some currently existing target detection networks, such as some that require various encodings of original point cloud data (such as rasterization or bird's-eye view projection, etc. Method) target detection network, such as SECOND, PointPillars, MV3D or PV-RCNN and other target detection networks. The specific working processes of these target detection networks belong to existing technologies and will not be described again here. What needs to be known is that the initial target detection frames output by these target detection networks may have inaccurate positions. For example, in Figure 3, a set of point clouds 31 (such as the external contour point cloud of a vehicle) should ideally be framed by an accurate initial target detection frame, and basically no points fall outside the initial target detection frame. However, the above target detection networks The output initial target detection frame 32 may be as shown in FIG. 3 , and its position may be inaccurate, causing some points in the point cloud 31 to be outside the initial target detection frame 32 . One of the purposes of the target detection method based on point cloud data provided by the embodiment of the present application is to overcome the problem shown in Figure 3 and make the result position of the detection frame more accurate.

另外，在本申请的一实施例中，该初始目标检测网络可以是目前现有的一些目标检测网络，例如一些直接对原始点云数据进行处理的目标检测网络，如PointRCNN等目标检测网络，这些目标检测网络的具体工作过程均属于现有技术，此处不再赘述。需要知道的是，这些目标检测网络所输出的初始目标检测框可能存在尺寸大小不清楚的问题，初始目标检测框尺寸大一些或尺寸小一些均可以将相应的点云框进去，这样后续处理的网络(如PointNet)在无法清楚知悉初始目标检测框尺寸的情况下，对各初始目标检测框所划分的正负样本不准确，使得后续的非极大值抑制(Non-Maximum Suppression，简称NMS)处理将不够准确，目标检测性能较差。例如在图4中，一组点云41(如车辆的外部轮廓点云)理想情况下应被初始目标检测框框住，且初始目标检测框的尺寸应该是刚好满足将这些点云41框住。然而上述这些目标检测网络所输出的初始目标检测框42可能如图4中所示，其可能尺寸过大，造成点云41虽然被框住，但是初始目标检测框42中空白部分过多。本申请实施例提供的基于点云数据的目标检测方法的其中又一个目的就是为了克服如图4所示的问题，使检测框结果更精准地框住目标，且检测框尺寸大小合适。In addition, in an embodiment of the present application, the initial target detection network can be some currently existing target detection networks, such as some target detection networks that directly process original point cloud data, such as PointRCNN and other target detection networks. The specific working process of the target detection network belongs to the existing technology and will not be described again here. What needs to be known is that the initial target detection frame output by these target detection networks may have unclear size. If the initial target detection frame is larger or smaller, the corresponding point cloud can be framed, so that subsequent processing can When the network (such as PointNet) cannot clearly know the size of the initial target detection frame, the positive and negative samples divided by each initial target detection frame are inaccurate, making the subsequent Non-Maximum Suppression (NMS) Processing will be less accurate and target detection performance will be poorer. For example, in Figure 4, a set of point clouds 41 (such as the vehicle's outer contour point cloud) should ideally be framed by the initial target detection frame, and the size of the initial target detection frame should be just enough to frame these point clouds 41. However, the initial target detection frame 42 output by the above-mentioned target detection networks may be as shown in FIG. 4 , and may be too large in size, resulting in that although the point cloud 41 is framed, there are too many blank parts in the initial target detection frame 42 . Another purpose of the target detection method based on point cloud data provided by the embodiment of the present application is to overcome the problem shown in Figure 4, so that the detection frame result can more accurately frame the target, and the detection frame size is appropriate.

另外，由于目标检测网络可以同时检测多个物体，且即使对于同一物体的目标检测，也可能存在多个初始目标检测框。为了保证后续的计算处理过程，本申请的一实施例中需要获得各初始目标检测框信息，其中该初始目标检测框信息可以包括初始目标检测框的尺寸范围信息，例如以(l、w、h)形式表示，但不仅局限于此；其中，l、w和h分别表示初始目标检测框的长度、宽度和高度。另外，该初始目标检测框信息还可以包括初始目标检测框中心点信息和朝向信息，例如初始目标检测框中心点信息以(Cx、Cy、Cz)表示，Cx、Cy、Cz分别表示初始目标检测框的中心点坐标，而该朝向信息可以记为heading，但不仅局限于此。In addition, since the target detection network can detect multiple objects at the same time, and even for target detection of the same object, there may be multiple initial target detection frames. In order to ensure the subsequent calculation process, in an embodiment of the present application, it is necessary to obtain each initial target detection frame information, where the initial target detection frame information may include the size range information of the initial target detection frame, for example, in the form of (l, w, h ) form, but is not limited to this; among them, l, w and h respectively represent the length, width and height of the initial target detection frame. In addition, the initial target detection frame information may also include initial target detection frame center point information and orientation information. For example, the initial target detection frame center point information is represented by (Cx, Cy, Cz), and Cx, Cy, Cz respectively represent initial target detection. The coordinates of the center point of the box, and the orientation information can be recorded as heading, but it is not limited to this.

步骤202、从原始点云数据中提取各初始目标检测框周围预设框范围内的点云。Step 202: Extract point clouds within the preset frame range around each initial target detection frame from the original point cloud data.

在本申请的一实施例中，由于初始目标检测框可能存在如图3所示的点云31中的部分点落在初始目标检测框32外面的情况，因此为了满足原始点云数据的处理需求，避免落下点云中的关键点，该步骤202需要从原始点云数据中提取各初始目标检测框周围预设框范围内的点云，其具体过程可以采用如下两种方式(但不仅局限于此)：In an embodiment of the present application, because the initial target detection frame may have some points in the point cloud 31 falling outside the initial target detection frame 32 as shown in Figure 3, in order to meet the processing requirements of the original point cloud data , to avoid missing key points in the point cloud. This step 202 requires extracting the point cloud within the preset frame range around each initial target detection frame from the original point cloud data. The specific process can be used in the following two ways (but is not limited to this):

方式一：获得预先设置的框倍率，即如图5所示，该框倍率是指所需要的扩大到的预设范围(例如框范围51)与初始目标检测框52的大小的比值(如可以为两者体积的比值、两者框长度的比值或者两者表面积的比值，但不仅局限于此。)。一般情况下框倍率大于1，例如可以选用1.5倍，但不仅局限于此。Method 1: Obtain the preset frame magnification, that is, as shown in Figure 5. The frame magnification refers to the ratio of the preset range that needs to be expanded to (for example, the frame range 51) and the size of the initial target detection frame 52 (such as It is the ratio of the two volumes, the ratio of the two frame lengths, or the ratio of the two surface areas, but is not limited to this.). Generally, the frame magnification is greater than 1, for example, 1.5 times can be selected, but it is not limited to this.

根据各初始目标检测框和框倍率，确定各预设框范围(即在已知初始目标检测框52和框倍率的情况下，是可以计算出预设框范围51的)，并从原始点云数据中提取各预设框范围内的点云，即在初始目标检测框之外，且在预设框范围之内的点云将被提取出来。According to each initial target detection frame and frame magnification, determine the range of each preset frame (that is, when the initial target detection frame 52 and frame magnification are known, the preset frame range 51 can be calculated), and from the original point cloud Point clouds within each preset frame are extracted from the data, that is, point clouds outside the initial target detection frame and within the preset frame will be extracted.

方式二：获得预先设置的框扩大量；其中，该框扩大量可以为预先设置的扩大长度、扩大面积或扩大体积；例如如图5所示，初始目标检测框52具有框长度、表面积、框体积等特征，因此在获得相应的框扩大量后，可以确定预设框范围51，并从原始点云数据中提取各预设框范围内的点云。即在初始目标检测框之外，且在预设框范围之内的点云将被提取出来。Method 2: Obtain a preset frame expansion amount; wherein, the frame expansion amount can be a preset expansion length, expansion area, or expansion volume; for example, as shown in Figure 5, the initial target detection frame 52 has a frame length, a surface area, a frame length Volume and other characteristics, so after obtaining the corresponding frame enlargement, the preset frame range 51 can be determined, and the point clouds within each preset frame range can be extracted from the original point cloud data. That is, point clouds outside the initial target detection frame and within the preset frame will be extracted.

值得说明的是，上述步骤103中，本申请实施例中的神经网络的输入数据可以包括待处理的点数据和各初始目标检测框的表达信息；初始目标检测框信息和该表达信息至少包括初始目标检测框的尺寸范围信息，另外，该初始目标检测框信息和表达信息还可以包括初始目标检测框中心点信息和朝向信息；待处理的点数据包括待处理的点位置(例如可以采用坐标的方式)。It is worth noting that in the above step 103, the input data of the neural network in the embodiment of the present application may include the point data to be processed and the expression information of each initial target detection frame; the initial target detection frame information and the expression information at least include the initial The size range information of the target detection frame. In addition, the initial target detection frame information and expression information can also include the center point information and orientation information of the initial target detection frame; the point data to be processed includes the point position to be processed (for example, the coordinates can be used Way).

此处该表达信息可以是指能够表达初始目标检测框的任何方式，本申请对此不作限定。为了理解方便，后续本申请实施例会提供几种常见的方式。The expression information here may refer to any method that can express the initial target detection frame, which is not limited in this application. For ease of understanding, several common methods will be provided in subsequent embodiments of this application.

则在步骤202之后，继续执行下述步骤203至步骤204。该步骤203至步骤204可以作为上述步骤103的具体实现方式。After step 202, continue to perform the following steps 203 to 204. This step 203 to step 204 can be used as a specific implementation manner of the above step 103.

步骤203、根据各初始目标检测框内原始点云数据中的点、初始目标检测框外预设框范围内的点云中的点以及各初始目标检测框的尺寸范围信息，确定各初始目标检测框的表达信息。Step 203: Determine each initial target detection based on the points in the original point cloud data within each initial target detection frame, the points in the point cloud within the preset frame range outside the initial target detection frame, and the size range information of each initial target detection frame. The expression information of the box.

此处，该步骤203可以采用多种方式，本申请实施例仅列举其中部分实施例，应该知道的是，凡是能够表达初始目标检测框的表达信息，均应被理解为本申请实施例中的表达信息。Here, step 203 can be implemented in a variety of ways. The embodiments of this application only list some of the embodiments. It should be noted that any expression information that can express the initial target detection frame should be understood as the expression information in the embodiments of this application. express information.

其中，方式A、可以采用在初始目标检测框中设置虚拟点的方式：Among them, method A can adopt the method of setting virtual points in the initial target detection frame:

例如，根据各初始目标检测框的尺寸范围信息，在各初始目标检测框中生成均匀填充于各初始目标检测框的虚拟点作为各初始目标检测框的表达信息。该虚拟点的分布即表达了初始目标检测框的范围。For example, according to the size range information of each initial target detection frame, virtual points uniformly filled in each initial target detection frame are generated in each initial target detection frame as expression information of each initial target detection frame. The distribution of this virtual point expresses the range of the initial target detection frame.

具体的，可以根据各初始目标检测框的尺寸范围信息，获得各初始目标检测框对应的虚拟点的间隔；然后根据各初始目标检测框对应的虚拟点的间隔，在各初始目标检测框中生成均匀填充于初始目标检测框的虚拟点作为各初始目标检测框的表达信息。如图5所示，初始目标检测框52内即均匀分布了虚拟点53。其中，该虚拟点53的间隔可以是根据初始目标检测框的尺寸范围来确定，如初始目标检测框的尺寸范围为一个长宽高分别为5m、3m、2m的框，那么可以在该框范围内确定每立方分米的小格中心或八个顶角设置虚拟点53，但不仅局限于此，本领域技术人员还可以采用其他的方式来获得虚拟点的间隔，如直接人为设置该间隔。Specifically, the interval of virtual points corresponding to each initial target detection frame can be obtained based on the size range information of each initial target detection frame; and then based on the interval of virtual points corresponding to each initial target detection frame, generate in each initial target detection frame The virtual points uniformly filled in the initial target detection frame serve as the expression information of each initial target detection frame. As shown in FIG. 5 , virtual points 53 are evenly distributed within the initial target detection frame 52 . Among them, the interval of the virtual points 53 can be determined according to the size range of the initial target detection frame. For example, the size range of the initial target detection frame is a frame with a length, width and height of 5m, 3m, and 2m respectively, then it can be within the frame range. The center of each cubic decimeter or the eight vertices of each cubic decimeter is determined to set the virtual point 53, but it is not limited to this. Those skilled in the art can also use other methods to obtain the interval of the virtual points, such as directly setting the interval manually.

为了便于后续神经网络能够识别出虚拟点、初始目标检测框内原始点云数据中的点，以及初始目标检测框外预设框范围内的点云中的点，需要为这些点进行赋值，例如：如图5所示，分别为各初始目标检测框52内原始点云数据中的点521、虚拟点53以及初始目标检测框52外预设框范围51内的点云中的点511分配预设的点类型值，如初始目标检测框内原始点云数据中的点对应的点类型值为1、虚拟点对应的点类型值为2、初始目标检测框外预设框范围内的点云中的点对应的点类型值为0，但不仅局限于此。In order to facilitate the subsequent neural network to identify virtual points, points in the original point cloud data within the initial target detection frame, and points in the point cloud within the preset frame outside the initial target detection frame, these points need to be assigned values, such as : As shown in Figure 5, pre-allocated points are respectively allocated to the points 521 and virtual points 53 in the original point cloud data within each initial target detection frame 52 and the points 511 in the point cloud within the preset frame range 51 outside the initial target detection frame 52. Set the point type value, for example, the point type value corresponding to the point in the original point cloud data within the initial target detection frame is 1, the point type value corresponding to the virtual point is 2, and the point cloud within the preset frame range outside the initial target detection frame The point type value corresponding to the point in is 0, but it is not limited to this.

另外，方式B、将尺寸范围信息增加到点云中的点的特征中的方式：In addition, method B is a method of adding size range information to the features of points in the point cloud:

例如可以将各初始目标检测框的尺寸范围信息增加到对应的初始目标检测框内原始点云数据中的点和初始目标检测框外预设框范围内的点云中的点的特征中，以使得所述特征中携带有所述表达信息，从而使得这些特征输入到后续的神经网络中后，后续的神经网络将能够获得到能够表达初始目标检测框的表达信息。For example, the size range information of each initial target detection frame can be added to the characteristics of points in the original point cloud data within the corresponding initial target detection frame and points in the point cloud within the preset frame range outside the initial target detection frame, so as to The expression information is carried in the features, so that after these features are input into the subsequent neural network, the subsequent neural network will be able to obtain expression information that can express the initial target detection frame.

此处，具体如何将初始目标检测框的尺寸范围信息增加到对应的初始目标检测框内原始点云数据中的点和初始目标检测框外预设框范围内的点云中的点的特征中有多种方式，此处仅列举其中两种，但不仅局限于此两种方式。Here, specifically how to add the size range information of the initial target detection frame to the characteristics of the points in the original point cloud data within the corresponding initial target detection frame and the points in the point cloud within the preset frame range outside the initial target detection frame There are many ways, only two of them are listed here, but they are not limited to these two ways.

其中一种：可以将各初始目标检测框的尺寸范围信息(l、w、h)增加到对应的初始目标检测框内原始点云数据中的点和初始目标检测框外预设框范围内的点云中的点的坐标(xi、yi、zi)中，生成各点带有所述表达信息的特征(xi、yi、zi、l、w、h)；其中，l、w和h分别表示初始目标检测框的长度、宽度和高度。One of them: you can add the size range information (l, w, h) of each initial target detection frame to the points in the original point cloud data within the corresponding initial target detection frame and the points within the preset frame range outside the initial target detection frame. From the coordinates (xi, yi, zi) of the points in the point cloud, the characteristics (xi, yi, zi, l, w, h) of each point with the expression information are generated; where l, w and h represent respectively The length, width and height of the initial object detection frame.

另一种：可以根据各初始目标检测框的尺寸范围信息(l、w、h)对对应的初始目标检测框内原始点云数据中的点和初始目标检测框外预设框范围内的点云中的点(xi、yi、zi)进行归一化处理，生成各点带有所述表达信息的特征其中，l、w和h分别表示初始目标检测框的长度、宽度和高度。Another method: based on the size range information (l, w, h) of each initial target detection frame, the points in the original point cloud data within the corresponding initial target detection frame and the points within the preset frame range outside the initial target detection frame can be compared The points (xi, yi, zi) in the cloud are normalized to generate features with the expression information for each point. Among them, l, w and h represent the length, width and height of the initial target detection frame respectively.

步骤204、生成至少包含各待处理的点位置以及各初始目标检测框的表达信息的神经网络的输入数据。Step 204: Generate input data of the neural network that at least contains the position of each point to be processed and the expression information of each initial target detection frame.

其中，各待处理的点位置可以采用坐标的形式，待处理的点一般是指初始目标检测框内原始点云数据中的点和初始目标检测框外预设框范围内的点云中的点。Among them, the position of each point to be processed can be in the form of coordinates. The points to be processed generally refer to the points in the original point cloud data within the initial target detection frame and the points in the point cloud within the preset frame range outside the initial target detection frame. .

步骤205、将输入数据输入到预先训练的神经网络中进行处理，根据预先训练的神经网络的输出得到各初始目标检测框对应的检测框结果和目标类别结果。Step 205: Input the input data into the pre-trained neural network for processing, and obtain the detection frame results and target category results corresponding to each initial target detection frame based on the output of the pre-trained neural network.

其中，预先训练的神经网络可以采用目前主流的能够对原始点云数据进行处理的目标检测网络，如PointNet网络、PointNet++网络、动态图卷积神经网络DGCNN或基于几何关系的卷积神经网络RSCNN等，但不仅局限于此。由于这些神经网络属于现有技术的目标检测网络，因此此处对这些神经网络的结构不再赘述。本申请实施例仅为这些神经网络提供输入数据，并获得这些神经网络的输出。Among them, the pre-trained neural network can use the current mainstream target detection network that can process original point cloud data, such as PointNet network, PointNet++ network, dynamic graph convolutional neural network DGCNN or geometric relationship-based convolutional neural network RSCNN, etc. , but not limited to this. Since these neural networks belong to target detection networks in the prior art, the structures of these neural networks will not be described again here. The embodiments of this application only provide input data to these neural networks and obtain the outputs of these neural networks.

具体的，当上述步骤203采用的是方式A时，此处神经网络的预先训练过程为：Specifically, when the above step 203 adopts method A, the pre-training process of the neural network here is:

获得训练样本数据集；训练样本数据集包括若干组训练样本数据；其中，每组训练样本数据包括初始目标检测框内原始点云数据中的点、初始目标检测框内虚拟点、初始目标检测框外预设框范围内的点云中的点、上述各点对应的点类型值、预先标注的初始目标检测框对应的检测框结果和目标类别结果。Obtain a training sample data set; the training sample data set includes several sets of training sample data; each set of training sample data includes points in the original point cloud data within the initial target detection frame, virtual points within the initial target detection frame, and the initial target detection frame Points in the point cloud within the outer preset frame, point type values corresponding to each of the above points, detection frame results and target category results corresponding to the pre-marked initial target detection frame.

将初始目标检测框内原始点云数据中的点、初始目标检测框内虚拟点、初始目标检测框外预设框范围内的点云中的点，以及上述各点对应的点类型值作为输入，将预先标注的初始目标检测框对应的检测框结果和目标类别结果作为输出，对所述神经网络进行训练。The points in the original point cloud data within the initial target detection frame, the virtual points within the initial target detection frame, the points in the point cloud within the preset frame outside the initial target detection frame, and the point type values corresponding to the above points are used as input , using the detection frame results and target category results corresponding to the pre-marked initial target detection frame as output to train the neural network.

具体的，当上述步骤203采用的是方式B时，此处神经网络的预先训练过程为：Specifically, when the above step 203 adopts method B, the pre-training process of the neural network here is:

获得训练样本数据集；训练样本数据集包括若干组训练样本数据；其中，每组训练样本数据包括初始目标检测框内待处理的点带有所述表达信息的特征、初始目标检测框外预设框范围内的待处理的点带有所述表达信息的特征、预先标注的初始目标检测框对应的检测框结果和目标类别结果；Obtain a training sample data set; the training sample data set includes several sets of training sample data; wherein each set of training sample data includes features of points to be processed within the initial target detection frame with the expression information, and presets outside the initial target detection frame. The points to be processed within the frame range have the characteristics of the expression information, the detection frame results and target category results corresponding to the pre-marked initial target detection frame;

将初始目标检测框内待处理的点带有所述表达信息的特征、初始目标检测框外预设框范围内的待处理的点带有所述表达信息的特征作为输入，将预先标注的初始目标检测框对应的检测框结果和目标类别结果作为输出，对所述神经网络进行训练。The points to be processed within the initial target detection frame with the characteristics of the expression information, and the points to be processed within the preset frame range outside the initial target detection frame with the characteristics of the expression information are used as input, and the pre-marked initial The detection frame results and target category results corresponding to the target detection frame are used as output to train the neural network.

此处，具体的神经网络训练方法可以有多种，如BGD(Batch Gradient Descent，批量梯度下降法)、SGD(Stochastic Gradient Descent，随机梯度下降法)、Adam优化算法(Adam optimization algorithm)、RMSprop(Root Mean Square prop，均方根传递法)等，但不仅局限于此。Here, there are many specific neural network training methods, such as BGD (Batch Gradient Descent, batch gradient descent method), SGD (Stochastic Gradient Descent, stochastic gradient descent method), Adam optimization algorithm (Adam optimization algorithm), RMSprop ( Root Mean Square prop, root mean square prop, etc., but not limited to this.

相应的，当上述步骤203采用的是方式A时，此处步骤205可以采用如下方式实现：Correspondingly, when the above step 203 adopts mode A, step 205 here can be implemented in the following way:

将各初始目标检测框内原始点云数据中的点、虚拟点以及初始目标检测框外预设框范围内的点云中的点和各自对应的点类型值输入到预先训练的神经网络中进行处理，根据预先训练的神经网络的输出得到各初始目标检测框对应的检测框结果和目标类别结果。The points and virtual points in the original point cloud data within each initial target detection frame, as well as the points in the point cloud within the preset frame outside the initial target detection frame and their corresponding point type values are input into the pre-trained neural network. Process, and obtain the detection frame results and target category results corresponding to each initial target detection frame based on the output of the pre-trained neural network.

相应的，当上述步骤203采用的是方式B时，此处步骤205可以采用如下方式实现：Correspondingly, when the above step 203 adopts mode B, step 205 here can be implemented in the following way:

将各初始目标检测框内待处理的点带有所述表达信息的特征、初始目标检测框外预设框范围内的待处理的点带有所述表达信息的特征输入到预先训练的神经网络中进行处理，根据预先训练的神经网络的输出得到各初始目标检测框对应的检测框结果和目标类别结果。The points to be processed within each initial target detection frame are characterized by the expression information, and the points to be processed within the preset frame outside the initial target detection frame are characterized by the expression information and are input into the pre-trained neural network. Processing is performed in the method, and the detection frame results and target category results corresponding to each initial target detection frame are obtained based on the output of the pre-trained neural network.

经过上述步骤201至步骤205，使得预先训练的神经网络处理的是原始点云数据，没有造成原始点云数据的位置信息的损失，另外预先训练的神经网络由于能够得到表征初始目标检测框的表达信息，因此其输出的检测框结果也可以避免存在过大的情况。最终形成的检测框结果无论从位置上还是尺寸上，将更贴近于目标的真实情况。After the above steps 201 to 205, the pre-trained neural network processes the original point cloud data without causing any loss of position information of the original point cloud data. In addition, the pre-trained neural network can obtain an expression that represents the initial target detection frame. information, so its output detection frame results can also avoid being too large. The final detection frame result will be closer to the real situation of the target in terms of position and size.

具体的，预先训练的神经网络输出的检测框结果可以包括神经网络输出的检测框的中心点坐标、检测框尺寸信息以及检测框所对应的朝向信息。Specifically, the detection frame result output by the pre-trained neural network may include the center point coordinates of the detection frame output by the neural network, the detection frame size information, and the orientation information corresponding to the detection frame.

具体的目标类别结果一般可以根据所关注的类别种类以数字的形式表示，对于同一目标物体，其对应的目标类别结果中表示各类别的数字之和为1。例如，整帧点云数据中，所关注的类别种类分别为行人和车辆，则目标类别结果可以表示为score的方式，即：(识别为行人的数值，识别为车辆的数值)，其中对于该行人可能输出了5个检测框，其目标类别结果可能分别为(0.90，0.10)、(0.87，0.13)、(0.78，0.22)、(0.96，0.04)、(0.89，0.11)，对于该车辆可能输出了4个检测框，其目标类别结果可能分别为(0.05，0.95)、(0.08，0.92)、(0.20，0.80)、(0.15，0.85)。一般情况下在目标检测领域中，还可以采用其他方式来表达检测框结果和目标类别结果，此处不再一一列举。Specific target category results can generally be expressed in the form of numbers according to the category of concern. For the same target object, the sum of the numbers representing each category in the corresponding target category result is 1. For example, in the whole frame point cloud data, the categories of interest are pedestrians and vehicles respectively, then the target category results can be expressed as scores, that is: (values identified as pedestrians, values identified as vehicles), where for the Pedestrians may output 5 detection frames, and their target category results may be (0.90, 0.10), (0.87, 0.13), (0.78, 0.22), (0.96, 0.04), (0.89, 0.11) respectively. For this vehicle, it may be Four detection frames are output, and their target category results may be (0.05, 0.95), (0.08, 0.92), (0.20, 0.80), (0.15, 0.85) respectively. Generally speaking, in the field of target detection, other methods can be used to express the detection frame results and target category results, which will not be listed here.

在步骤205之后，还可以继续执行步骤206：After step 205, step 206 can also be continued:

步骤206、根据非极大值抑制NMS算法，对各初始目标检测框对应的检测框结果和目标类别结果进行处理，以生成非极大值抑制后的最终检测框结果。Step 206: Process the detection frame results and target category results corresponding to each initial target detection frame according to the non-maximum suppression NMS algorithm to generate a final detection frame result after non-maximum suppression.

由于上述的检测框结果和对应的目标类别结果有若干个，特别是对于同一目标，可能有多个检测框，多个检测框存在重合区域，因此要进行非极大值抑制，以筛选出最优选的最终检测框结果。例如上述行人对应有5个检测框，其目标类别结果分别为(0.90，0.10)、(0.87，0.13)、(0.78，0.22)、(0.96，0.04)、(0.89，0.11)，上述车辆对应有4个检测框，其目标类别结果分别为(0.05，0.95)、(0.08，0.92)、(0.20，0.80)、(0.15，0.85)。通过上述非极大值抑制处理，最终检测框结果可能仅剩下了(0.96，0.04)对应的检测框和(0.05，0.95)对应的检测框。具体的非极大值抑制算法已经较为成熟，此处不再赘述。Since there are several above-mentioned detection frame results and corresponding target category results, especially for the same target, there may be multiple detection frames, and multiple detection frames have overlapping areas. Therefore, non-maximum suppression is required to filter out the best Preferred final detection box results. For example, the above-mentioned pedestrians correspond to 5 detection frames, and the target category results are (0.90, 0.10), (0.87, 0.13), (0.78, 0.22), (0.96, 0.04), (0.89, 0.11). The above-mentioned vehicles correspond to The target category results of the 4 detection frames are (0.05, 0.95), (0.08, 0.92), (0.20, 0.80), (0.15, 0.85) respectively. Through the above non-maximum suppression processing, the final detection frame result may only be the detection frame corresponding to (0.96, 0.04) and the detection frame corresponding to (0.05, 0.95). The specific non-maximum suppression algorithm is relatively mature and will not be described again here.

另外，如图6所示，本申请实施例还提供一种基于点云数据的目标检测装置，包括：In addition, as shown in Figure 6, this embodiment of the present application also provides a target detection device based on point cloud data, including:

初始信息获得单元61，用于获得原始点云数据和初始目标检测网络输出的各初始目标检测框，并获得各初始目标检测框信息。The initial information obtaining unit 61 is used to obtain the original point cloud data and each initial target detection frame output by the initial target detection network, and obtain the information of each initial target detection frame.

点云提取单元62，用于从所述原始点云数据中提取各初始目标检测框周围预设框范围内的点云。The point cloud extraction unit 62 is configured to extract point clouds within the preset frame range around each initial target detection frame from the original point cloud data.

输入数据生成单元63，用于根据各初始目标检测框内原始点云数据中的点、初始目标检测框外预设框范围内的点云中的点以及各初始目标检测框信息，生成神经网络的输入数据。The input data generation unit 63 is used to generate a neural network based on the points in the original point cloud data within each initial target detection frame, the points in the point cloud within the preset frame range outside the initial target detection frame, and the information on each initial target detection frame. input data.

结果生成单元64，用于将所述输入数据输入到预先训练的神经网络中进行处理，根据预先训练的神经网络的输出得到各初始目标检测框对应的检测框结果和目标类别结果。The result generation unit 64 is used to input the input data into a pre-trained neural network for processing, and obtain the detection frame results and target category results corresponding to each initial target detection frame based on the output of the pre-trained neural network.

值得说明的是，本申请实施例提供的基于点云数据的目标检测装置的具体实现方式可以参见上述图1和图5对应的方法实施例，此处不再赘述。It is worth noting that the specific implementation of the target detection device based on point cloud data provided by the embodiments of the present application can be referred to the method embodiments corresponding to the above-mentioned Figures 1 and 5, and will not be described again here.

另外，本申请实施例还提供一种计算机可读存储介质，包括程序或指令，当所述程序或指令在计算机上运行时，实现上述图1和图5所对应的方法。In addition, embodiments of the present application also provide a computer-readable storage medium, including a program or instructions. When the program or instructions are run on a computer, the methods corresponding to the above-mentioned Figures 1 and 5 are implemented.

另外，本申请实施例还提供一种包含指令的计算机程序产品，当所述计算机程序产品在计算机上运行时，使得所述计算机执行如上述图1和图5所对应的方法。In addition, embodiments of the present application also provide a computer program product containing instructions, which when the computer program product is run on a computer, causes the computer to execute the method corresponding to the above-mentioned FIG. 1 and FIG. 5 .

另外，本申请实施例还提供一种芯片系统，包括处理器，所述处理器与存储器的耦合，所述存储器存储有程序指令，当所述存储器存储的程序指令被所述处理器执行时实现上述图1和图5所对应的方法。In addition, embodiments of the present application also provide a chip system, including a processor, the processor is coupled to a memory, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the implementation The methods corresponding to the above figures 1 and 5.

另外，本申请实施例还提供一种计算机服务器，包括存储器，以及与所述存储器通信连接的一个或多个处理器；In addition, embodiments of the present application also provide a computer server, including a memory, and one or more processors communicatively connected to the memory;

所述存储器中存储有可被所述一个或多个处理器执行的指令，所述指令被所述一个或多个处理器执行，以使所述一个或多个处理器实现如上述图1和图5所对应的方法。The memory stores instructions that can be executed by the one or more processors, and the instructions are executed by the one or more processors, so that the one or more processors implement the above-mentioned Figure 1 and The method corresponding to Figure 5.

本领域内的技术人员应明白，本申请的实施例可提供为方法、系统、或计算机程序产品。因此，本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且，本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will understand that embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each process and/or block in the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a use A device for realizing the functions specified in one process or multiple processes of the flowchart and/or one block or multiple blocks of the block diagram.

这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中，使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品，该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory that causes a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction means, the instructions The device implements the functions specified in a process or processes of the flowchart and/or a block or blocks of the block diagram.

这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上，使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, thereby executing on the computer or other programmable device. Instructions provide steps for implementing the functions specified in a process or processes of a flowchart diagram and/or a block or blocks of a block diagram.

本申请中应用了具体实施例对本申请的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本申请的方法及其核心思想；同时，对于本领域的一般技术人员，依据本申请的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本申请的限制。Specific embodiments are used in this application to illustrate the principles and implementation methods of this application. The description of the above embodiments is only used to help understand the method and its core idea of this application; at the same time, for those of ordinary skill in the art, based on this application The idea of the application may change in the specific implementation mode and application scope. In summary, the contents of this specification should not be understood as limiting the application.

Claims

1. The target detection method based on the point cloud data is characterized by comprising the following steps of:

obtaining original point cloud data and initial target detection frames output by an initial target detection network, and obtaining information of the initial target detection frames, wherein the initial target detection frames are obtained by inputting the original point cloud data into the initial target detection network, one part of points of the original point cloud data fall inside the initial target detection frames, and the other part of points fall outside the initial target detection frames;

extracting point clouds in a preset frame range around each initial target detection frame from the original point cloud data, wherein each preset frame range is larger than and surrounds the corresponding initial target detection frame;

generating input data of a neural network according to points in original point cloud data in each initial target detection frame, points in point cloud within a preset frame range outside the initial target detection frame and information of each initial target detection frame;

and inputting the input data into a pre-trained neural network for processing, and obtaining a detection frame result and a target class result corresponding to each initial target detection frame according to the output of the pre-trained neural network.

2. The method of claim 1, wherein extracting a point cloud within a predetermined frame range around each initial target detection frame from the raw point cloud data comprises:

obtaining a preset frame multiplying power, wherein the frame multiplying power is larger than 1;

and determining each preset frame range according to each initial target detection frame and the frame multiplying power, and extracting point clouds in each preset frame range from the original point cloud data.

3. The method of claim 1, wherein extracting a point cloud within a predetermined frame range around each initial target detection frame from the raw point cloud data comprises:

obtaining a preset frame expansion amount; the frame expansion amount is preset expansion length, expansion area or expansion volume;

and determining each preset frame range according to each initial target detection frame and the frame expansion amount, and extracting point clouds in each preset frame range from the original point cloud data.

4. The method according to claim 1, wherein the input data of the neural network includes point data to be processed and expression information of each initial target detection frame; the initial target detection frame information and the expression information at least comprise size range information of an initial target detection frame; the point data to be processed comprises point positions to be processed;

The generating the input data of the neural network according to the points in the original point cloud data in each initial target detection frame, the points in the point cloud within the preset frame range outside the initial target detection frame and the information of each initial target detection frame comprises the following steps:

determining the expression information of each initial target detection frame according to the points in the original point cloud data in each initial target detection frame, the points in the point cloud within the range of the preset frame outside the initial target detection frame and the size range information of each initial target detection frame;

and generating input data of the neural network at least comprising the expression information of each point position to be processed and each initial target detection frame.

5. The method of claim 4, wherein the initial target detection frame information and the expression information further comprise initial target detection frame center point information and orientation information.

6. The method according to claim 4, wherein determining the expression information of each initial target detection frame based on the points in the original point cloud data in each initial target detection frame, the points in the point cloud within the preset frame range outside the initial target detection frame, and the size range information of each initial target detection frame includes:

Generating virtual points uniformly filled in each initial target detection frame as expression information of each initial target detection frame according to the size range information of each initial target detection frame;

the method further comprises the steps of:

and respectively distributing preset point type values to points in original point cloud data in each initial target detection frame, the virtual points and points in point clouds within a preset frame range outside the initial target detection frame.

7. The method of claim 6, wherein generating virtual points uniformly filled in each initial target detection frame as expression information of each initial target detection frame according to the size range information of each initial target detection frame, comprises:

obtaining the intervals of virtual points corresponding to the initial target detection frames according to the size range information of the initial target detection frames;

and generating virtual points uniformly filled in the initial target detection frames in each initial target detection frame as expression information of each target detection frame according to the intervals of the virtual points corresponding to each initial target detection frame.

8. The method of claim 7, wherein inputting the input data into a pre-trained neural network for processing, obtaining a detection frame result and a target class result corresponding to each initial target detection frame according to an output of the pre-trained neural network, comprises:

And inputting points and virtual points in original point cloud data in each initial target detection frame, points in point clouds within a preset frame range outside the initial target detection frame and respective corresponding point type values into a pre-trained neural network for processing, and obtaining detection frame results and target category results corresponding to each initial target detection frame according to output of the pre-trained neural network.

9. The method of claim 8, further comprising a pre-training process of the neural network:

obtaining a training sample data set; the training sample data set comprises a plurality of groups of training sample data; each group of training sample data comprises points in original point cloud data in an initial target detection frame, virtual points in the initial target detection frame, points in point cloud in a preset frame range outside the initial target detection frame, point type values corresponding to the points, detection frame results corresponding to the initial target detection frame and target type results which are marked in advance;

and taking points in the original point cloud data in the initial target detection frame, virtual points in the initial target detection frame, points in the point cloud within a preset frame range outside the initial target detection frame and point type values corresponding to the points as inputs, taking a detection frame result corresponding to the initial target detection frame and a target class result which are marked in advance as outputs, and training the neural network.

10. The method according to claim 4, wherein determining the expression information of each initial target detection frame based on the points in the original point cloud data in each initial target detection frame, the points in the point cloud within the preset frame range outside the initial target detection frame, and the size range information of each initial target detection frame includes:

and adding the size range information of each initial target detection frame into the corresponding characteristics of the points in the original point cloud data in the initial target detection frame and the points in the point cloud within the range of the preset frame outside the initial target detection frame, so that the characteristics carry the expression information.

11. The method according to claim 10, wherein adding the size range information of each initial target detection frame to the corresponding feature of the point in the original point cloud data in the initial target detection frame and the point in the point cloud within the preset frame range outside the initial target detection frame, so that the feature carries the expression information, includes:

size range information of each initial target detection framel、w、h) Coordinates of points in original point cloud data added to corresponding initial target detection frame and points in point cloud within a preset frame range outside the initial target detection frame xi、yi、zi) In the process, the characteristics that each point has the expression information are generatedxi、yi、zi、l、w、h) The method comprises the steps of carrying out a first treatment on the surface of the Wherein,l、wandhrepresenting the length, width and height of the initial target detection frame, respectively.

12. The method according to claim 10, wherein adding the size range information of each initial target detection frame to the corresponding feature of the point in the original point cloud data in the initial target detection frame and the point in the point cloud within the preset frame range outside the initial target detection frame, so that the feature carries the expression information, includes:

according to the size range information of each initial target detection framel、w、h) The corresponding points in the original point cloud data in the initial target detection frame and the points in the point cloud within the range of the preset frame outside the initial target detection frame are treated with the methodxi、yi、zi) Normalized processing is carried out to generate the characteristics that each point has the expression information) The method comprises the steps of carrying out a first treatment on the surface of the Wherein,l、wandhrepresenting the length, width and height of the initial target detection frame, respectively.

13. The method according to claim 11 or 12, wherein inputting the input data into a pre-trained neural network for processing, obtaining a detection frame result and a target class result corresponding to each initial target detection frame according to an output of the pre-trained neural network, comprises:

And inputting the characteristics of the expression information of the points to be processed in each initial target detection frame and the characteristics of the expression information of the points to be processed in a range of a preset frame outside the initial target detection frame into a pre-trained neural network for processing, and obtaining detection frame results and target class results corresponding to each initial target detection frame according to the output of the pre-trained neural network.

14. The method of claim 13, further comprising a pre-training process of the neural network:

obtaining a training sample data set; the training sample data set comprises a plurality of groups of training sample data; each group of training sample data comprises characteristics of points to be processed in an initial target detection frame with the expression information, characteristics of points to be processed in a preset frame range outside the initial target detection frame with the expression information, detection frame results corresponding to the initial target detection frame and target category results, which are marked in advance;

and taking the characteristics of the expression information of the points to be processed in the initial target detection frame and the characteristics of the expression information of the points to be processed in the range of the preset frame outside the initial target detection frame as inputs, taking the detection frame result and the target class result corresponding to the initial target detection frame which are marked in advance as outputs, and training the neural network.

15. The method of claim 1, wherein the detection frame result includes center point coordinates of a detection frame output by the neural network, detection frame size information, and orientation information corresponding to the detection frame.

16. The method of claim 1, further comprising, after obtaining the detection frame result and the target class result corresponding to each initial target detection frame:

and processing the detection frame results and the target class results corresponding to the initial target detection frames according to a non-maximum suppression NMS algorithm to generate final detection frame results after non-maximum suppression.

17. A point cloud data-based object detection apparatus, comprising:

an initial information obtaining unit configured to obtain initial point cloud data and initial target detection frames output by an initial target detection network, and obtain initial target detection frame information, where each initial target detection frame is obtained by inputting the initial point cloud data to the initial target detection network, a part of points of the initial point cloud data fall within each initial target detection frame, and another part of points of the initial point cloud data fall outside each initial target detection frame;

The point cloud extraction unit is used for extracting point clouds in a preset frame range around each initial target detection frame from the original point cloud data, wherein each preset frame range is larger than and surrounds the corresponding initial target detection frame;

the input data generating unit is used for generating input data of the neural network according to points in original point cloud data in each initial target detection frame, points in point cloud in a preset frame range outside the initial target detection frame and information of each initial target detection frame;

and the result generating unit is used for inputting the input data into a pre-trained neural network for processing, and obtaining a detection frame result and a target class result corresponding to each initial target detection frame according to the output of the pre-trained neural network.

18. A computer readable storage medium comprising a program or instructions which, when run on a computer, implement the method of any one of claims 1 to 16.

19. A system on a chip comprising a processor coupled to a memory, the memory storing program instructions that when executed by the processor implement the method of any one of claims 1 to 16.

20. A computer server comprising a memory, and one or more processors communicatively coupled to the memory;

stored in the memory are instructions executable by the one or more processors to cause the one or more processors to implement the method of any one of claims 1 to 16.