

Intelligent UAV real-time spatial perception and target precision detection system

Info

Publication number: CN120495616A
Application number: CN202510396604.0A
Authority: CN (China)
Inventors: 赵海山, 何强俊
Assignee: Individual
Legal status: Pending

Abstract

In the intelligent unmanned aerial vehicle real-time spatial perception and target precision detection system, the spatial perception module of the unmanned aerial vehicle is completed by an integrated three-dimensional visual RGB space extraction method, an optimized target detection method adds real-time precise target positioning and precise target recognition capability to the unmanned aerial vehicle and completes the content extraction module, the fusion of the two methods provides a basis for navigation and path planning, and an intelligent unmanned aerial vehicle system comprising both software and hardware is built. With the unmanned aerial vehicle and a Jetson TX2 as the carrying and development platform, visual real-time spatial perception modeling is combined with target detection for the unmanned aerial vehicle in a specific scene. For unmanned aerial vehicles whose flight speed is limited by the scene, spatial perception is efficient and target detection is fast and accurate, and obstacle avoidance and path planning are completed successfully even when the environment is unknown and the unmanned aerial vehicle must fly around obstacles and explore.

Description

Intelligent unmanned aerial vehicle real-time space sensing and target fine detection system
Technical Field
The application relates to an unmanned aerial vehicle space perception target detection system, in particular to an intelligent unmanned aerial vehicle real-time space perception and target precision detection system, and belongs to the technical field of unmanned aerial vehicle target detection.
Background
As the application value of unmanned aerial vehicles keeps rising, their performance is also continuously improving: unmanned aerial vehicles no longer carry only a simple camera for recording video, but also special equipment such as depth cameras and lidar to complete specific tasks. Current application directions include vegetation protection, street-view shooting, power-line inspection and post-disaster rescue, but these are basically elementary applications. Obstacle avoidance and path planning have long been problems in unmanned aerial vehicle applications; they involve the spatial perception of the surrounding environment and content extraction based on target recognition, and at present they are mostly solved, when the environment is known, by manually planning and controlling the flight. Successfully completing obstacle avoidance and path planning when the environment is unknown and the unmanned aerial vehicle must perform obstacle-avoidance flight and exploration is still being explored, and the appearance of content extraction methods based on convolutional neural network target detection offers an opportunity to solve this problem. A real-time three-dimensional map helps the unmanned aerial vehicle localize itself accurately, while the content extraction method based on target detection can identify the targets in the scene and give their positions; combining the two provides meaningful reference data for obstacle avoidance and path planning, so that the unmanned aerial vehicle can fly with little or no human intervention. More importantly, if both methods run in real time, an onboard intelligent processing system can be established on an embedded microcomputer that combines the spatial scene perception method with the content extraction method based on target detection. In addition, onboard real-time spatial perception and content extraction based on target recognition are not limited to solving obstacle avoidance and path planning: the collected data can be used efficiently for other application directions, which is a deep mining of unmanned aerial vehicle applications and therefore of great application value.
In terms of spatial scene perception, with the improving hardware performance of vision cameras, lidar and similar sensors, scene perception can not only obtain reliable data sources but can also realize information complementation through the cooperative work of multiple devices. Lidar and real-time spatial perception modeling each have their own characteristics; used alone each has certain limitations, while fusion lets their advantages and disadvantages complement one another. For example, vision works relatively stably in a dynamic environment with rich texture and can provide very accurate point cloud matching for the lidar, while the relatively accurate direction and distance information provided by the lidar can in turn assist in correcting the point cloud map. In an environment with poor lighting or an obvious lack of texture, the advantages of the lidar can be used to help real-time spatial perception modeling record the scene from a small amount of information. Moreover, neither a lidar system nor a real-time spatial perception modeling system is structurally limited to a single solution: essentially all solutions can be configured with auxiliary positioning tools such as inertial elements, satellite positioning systems and indoor base-station positioning systems to form a complementary setup. This is the research trend of recent years, namely fusing the data of the radar system and the other sensors so that they work cooperatively. Compared with earlier loosely coupled fusion methods based on Kalman filtering, the current trend is tightly coupled fusion based on nonlinear global optimization. For example, fusing real-time spatial perception modeling with an IMU (inertial measurement unit) enables real-time mutual calibration, so the vision module keeps a certain positioning accuracy during sudden acceleration, deceleration or rotation, tracking loss is prevented, and positioning and mapping errors are greatly reduced.
In terms of content scene perception, i.e. target detection, the current trend is to balance accuracy and speed. The key is to start from target detection based on candidate boxes: specifically, to share as much computation as possible among different ROIs, remove redundant computation and efficiently reuse the features obtained by the CNN, thereby increasing the speed of the whole detection. At the same time, candidate-box target detection still produces a certain rate of false detections and missed detections, and these two problems are the key issues to be solved in specific applications of target detection.
The problems to be solved by the unmanned aerial vehicle space perception target detection in the prior art and the key technical difficulties of the application include:
(1) Obstacle avoidance and path planning are problems in current unmanned aerial vehicle applications; they involve the spatial perception of the surrounding environment and content extraction based on target recognition, and at present they are mostly solved, when the environment is known, by manually planning and controlling the flight. The appearance of content extraction methods based on convolutional neural network target detection offers a certain opportunity to complete obstacle avoidance and path planning successfully when the environment is unknown and the unmanned aerial vehicle must perform obstacle-avoidance flight and exploration, but the method is not yet mature: a real-time three-dimensional map cannot be provided, the unmanned aerial vehicle cannot be positioned accurately, and content extraction by target detection cannot fully identify the targets in the scene and give their positions. Combining the two can provide some meaningful reference data for obstacle avoidance and path planning, but the combination still has many problems and the technology is not mature enough, so the unmanned aerial vehicle cannot fly without intervention or with only partial intervention. Moreover, the two methods cannot be processed in real time, so an onboard intelligent processing system combining a spatial scene perception method on an embedded microcomputer with a content extraction method based on target detection cannot be established. The obstacle avoidance and path planning problems therefore cannot be solved effectively, which restricts the application of unmanned aerial vehicles.
(2) In the prior art, the spatial perception module of the unmanned aerial vehicle is completed by an integrated three-dimensional visual RGB space extraction method, but there is no target detection method that adds real-time precise target positioning and precise target recognition capability to complete the content extraction module; the two methods cannot be fused to provide a navigation and path-planning basis for the unmanned aerial vehicle, and no intelligent unmanned aerial vehicle system including software and hardware has been established, so the unmanned aerial vehicle cannot complete real-time spatial perception and precise target detection. The prior art lacks a fast deep convolutional neural network feature extractor that can learn and extract the features of all targets of interest for specific target detection and recognition, lacks a method for fast mapping of specific scenes with monocular ORB visual real-time spatial perception modeling, and lacks an integrated target detection module that marks the targets of interest in the visual real-time spatial perception model and gives their specific position information. As a result, real-time spatial perception capability is poor, target detection accuracy is low, obstacle avoidance capability is weak and application safety is poor.
(3) The prior art lacks a sample library and a feature library of indoor targets of interest, cannot realize real-time detection and matching of targets with an end-to-end fast neural network, and cannot perform real-time mapping with a three-dimensional visual RGB space extraction system while targets are being detected, so the recognized targets of interest cannot be marked in a simulation map and their specific position information cannot be given. It lacks a visual real-time spatial perception modeling method, cannot provide a three-dimensional point cloud map for the unmanned aerial vehicle, cannot give the spatial positioning of the unmanned aerial vehicle and cannot provide data support for navigation; it cannot provide target detection content extraction based on deep learning, cannot give the unmanned aerial vehicle the spatial positions of the various targets, and cannot use these spatial relations for obstacle avoidance and path planning. It lacks a real-time perception and target detection system for the unmanned aerial vehicle, and lacks an intelligent system, including software and hardware, suitable for real-time perception and target detection, so the obstacle avoidance and path planning problems of the unmanned aerial vehicle cannot be solved well and intelligent unmanned aerial vehicle flight cannot be realized.
Disclosure of Invention
The application constructs a fast deep convolutional neural network feature extractor that learns and extracts the features of all targets of interest for specific target detection and recognition, uses monocular ORB visual real-time spatial perception modeling to realize fast mapping of specific scenes, and integrates a target detection module that marks the targets of interest in the visual real-time spatial perception model and gives their specific position information while guaranteeing a certain accuracy. The machine vision target detection method is applied to content-based scene perception and fused with the space-based scene perception method to establish an intelligent unmanned aerial vehicle system; with the unmanned aerial vehicle and a Jetson TX2 as the carrying and development platform, visual real-time spatial perception modeling of the unmanned aerial vehicle in a specific scene is combined with the target detection method.
In order to achieve the technical effects, the technical scheme adopted by the application is as follows:
In the intelligent unmanned aerial vehicle real-time spatial perception and target precision detection system, the spatial perception module of the unmanned aerial vehicle is completed by an integrated three-dimensional visual RGB space extraction method, the target detection method is optimized to add real-time precise target positioning and precise target recognition capability to the unmanned aerial vehicle and complete the content extraction module, the fusion of the two methods provides a navigation and path-planning basis for the unmanned aerial vehicle, and an intelligent unmanned aerial vehicle system comprising software and hardware is built;
The application establishes a sample library and a feature library of indoor targets of interest, realizes real-time detection and matching of targets with an end-to-end fast neural network, performs real-time mapping with the three-dimensional visual RGB space extraction system while targets are being detected, and marks the recognized targets of interest in a simulation map together with their specific position information. The core method comprises the following steps:
(1) The method is based on a visual real-time space perception modeling method, which comprises the steps of carrying a depth camera on an unmanned aerial vehicle, providing an RGB acquisition chart and a depth acquisition chart for real-time space perception modeling, adopting real-time space perception modeling, namely three-dimensional visual RGB space extraction to finish the perception of a space scene, finally providing a three-dimensional point cloud map for the unmanned aerial vehicle, providing the space positioning of the unmanned aerial vehicle, and providing data support for unmanned aerial vehicle navigation;
(2) Target detection content extraction based on deep learning, which optimizes the real-time processing capability of the target detection network according to the flight speed of the unmanned aerial vehicle, converts the detected target boxes into three-dimensional space by combining the conversion relations obtained from scene perception, and provides the unmanned aerial vehicle with the spatial positions of the various targets so that it can use these spatial relations for obstacle avoidance and path planning;
(3) And establishing a real-time sensing and target detection system of the unmanned aerial vehicle, namely based on optimization of real-time scene sensing and real-time target detection, embedding the realization of the two modules into a microcomputer, and establishing an intelligent system comprising software and hardware and suitable for the real-time sensing and target detection of the unmanned aerial vehicle.
Preferably, the data format of the real-time precise target detection network is defined as follows. A training data sample comprises a large acquired image containing a plurality of objects; for each object in the image, the training label comprises not only the class of the object but also the coordinates of every corner point of its bounding box. Because the number of objects differs between training images, label formats of different lengths and dimensions would make the loss function difficult to define; this is solved by introducing a fixed three-dimensional label format, and the defined format accepts acquired images of any size containing any number of objects;
The acquired image is divided by a regular grid whose cell size is slightly smaller than the smallest object to be detected. Each grid cell carries two pieces of key information: the class of the object and the corner coordinates of the object covering the cell. In addition, when a cell contains no object, a special custom class, the 'dontcare' class, is used so that the data representation keeps a uniform fixed size, and an object coverage value of 0 or 1 indicates whether the cell contains an object. When several objects fall in the same cell, the object occupying the most pixels of the cell is chosen, and when objects overlap, the object whose bounding box has the smallest Y value is used.
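As a concrete illustration of this grid label format, the following Python sketch (not taken from the application; the cell size, array layout and the simplified tie-breaking rule are assumptions) rasterizes bounding-box labels onto a coverage grid with a 'dontcare' default:

import numpy as np

def encode_labels(boxes, classes, img_w, img_h, cell=16):
    """boxes: list of (x1, y1, x2, y2) corners; classes: list of integer class ids."""
    gw, gh = img_w // cell, img_h // cell
    coverage = np.zeros((gh, gw), dtype=np.float32)    # 0/1 object-coverage value per cell
    cls_map = np.full((gh, gw), -1, dtype=np.int32)    # -1 plays the role of the 'dontcare' class
    corners = np.zeros((gh, gw, 4), dtype=np.float32)  # x1, y1, x2, y2 of the owning object

    # Simplified overlap rule: boxes are written in descending order of their top edge,
    # so on overlap the box with the smallest Y value is written last and wins the cell
    # (the "most pixels in the cell" rule from the text is not reproduced here).
    for i in sorted(range(len(boxes)), key=lambda j: boxes[j][1], reverse=True):
        x1, y1, x2, y2 = boxes[i]
        for gy in range(int(y1) // cell, min(int(y2) // cell + 1, gh)):
            for gx in range(int(x1) // cell, min(int(x2) // cell + 1, gw)):
                coverage[gy, gx] = 1.0
                cls_map[gy, gx] = classes[i]
                corners[gy, gx] = (x1, y1, x2, y2)
    return coverage, cls_map, corners

cov, cls, box = encode_labels([(40, 32, 120, 96)], [0], img_w=640, img_h=480)
print(cov.sum(), cls.max())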
Preferably, the real-time accurate target detection network training is divided into three steps:
the first step, a data layer acquires a training acquisition graph and a label, and a conversion layer carries out online data enhancement;
The second step, the full convolution network performs feature extraction and prediction on the object class and the boundary frame of each grid;
The third step, the object class and the target bounding box of each grid cell are predicted separately, and the errors of the two prediction tasks are then computed simultaneously with a loss function;
the prediction process includes two points: a clustering function generates the final set of boxes during verification, and a simplified mAP (mean average precision) value measures the performance of the model on the verification data set;
The network accepts input images of different sizes and efficiently applies the CNN in a strided sliding-window fashion, outputting a multi-dimensional array that is overlaid on the image; it uses a GoogLeNet with the final pooling layer removed, so that the CNN acts as a sliding window with a receptive field of up to 555 × 555 pixels and a stride of 16 pixels;
The final optimized loss function is a linear combination of two independent loss functions: the sum of squared differences between the true and predicted object coverage over all grid cells of the training samples, and the mean absolute difference between the true and predicted bounding-box corners of the object covered at each grid cell.
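The combined loss just described can be written out directly. The NumPy sketch below is illustrative only; the weights w_cov and w_bbox and the dummy tensors are assumptions, not values from the application.

import numpy as np

def detection_loss(cov_true, cov_pred, box_true, box_pred, w_cov=1.0, w_bbox=2.0):
    """cov_*: (H, W) coverage maps; box_*: (H, W, 4) corner maps; weights are illustrative."""
    coverage_loss = np.sum((cov_true - cov_pred) ** 2)   # squared coverage error over all cells
    mask = cov_true[..., None]                           # corner error only where an object exists
    n_covered = max(mask.sum(), 1.0)
    bbox_loss = np.sum(np.abs(box_true - box_pred) * mask) / n_covered
    return w_cov * coverage_loss + w_bbox * bbox_loss

cov_t = np.zeros((30, 40)); cov_t[2:7, 2:8] = 1.0
cov_p = np.full((30, 40), 0.1)
box_t = np.zeros((30, 40, 4)); box_p = box_t + 3.0
print(detection_loss(cov_t, cov_p, box_t, box_p))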
Preferably, the flow of real-time spatial perception modeling comprises:
Reading sensor acquisition graph data, namely reading and preprocessing acquisition graph information of an unmanned aerial vehicle camera in real-time space perception modeling, wherein the data of a depth camera comprises an RGB acquisition graph and a depth graph corresponding to the RGB acquisition graph;
Modeling a visual odometer, namely calculating the attitude change of a camera and a local map by estimating the rotation and translation relation between every two adjacent acquisition graphs by the visual odometer, wherein the key of the step is feature point extraction and acquisition graph matching;
the back end uses a nonlinear global optimization algorithm to optimize the camera positions and attitudes coming from the front end together with the loop closure detection results coming from another thread, and corrects them into a globally consistent trajectory and point cloud map;
loop closure detection judges whether the sensor, or the unmanned aerial vehicle carrying it, has returned to a previously visited place; if a place has been passed before, this information is given to the back end so that the positions and attitudes can be corrected again;
and fourthly, constructing a cruise map of the unmanned aerial vehicle which meets the task requirements according to the estimated camera track.
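Viewed as code, the four stages above fit into a single processing loop. The Python outline below is only a structural sketch: every object and method name (camera, odometry, backend, loop_detector, mapper) is a placeholder standing for the corresponding module, not an interface defined by the application.

def spatial_perception_loop(camera, odometry, backend, loop_detector, mapper):
    """Structural outline of the four-stage flow; all arguments are placeholder modules."""
    while camera.is_open():
        rgb, depth = camera.read_frame()                # step 1: read RGB + depth data
        pose, local_map = odometry.track(rgb, depth)    # step 2: visual odometry between frames
        if loop_detector.revisited(rgb):                # loop closure: seen this place before?
            backend.add_loop_constraint(loop_detector.last_match())
        trajectory = backend.optimize(pose, local_map)  # step 3: nonlinear global optimization
        mapper.update(trajectory, rgb, depth)           # step 4: build the cruise map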
Preferably, the real-time spatial perception modeling framework:
1) Visual odometer modeling
The front end is in charge of receiving a video stream of a camera, namely an acquisition graph sequence, estimating the motion of the camera between adjacent frames by a feature matching method, and preliminarily obtaining mileage information with certain error accumulation, wherein the visual odometer modeling comprises four parts:
firstly, collecting a frame, wherein the carried information comprises the pose of an unmanned aerial vehicle camera, an RGB (red green blue) collection chart and a depth chart when the frame collection chart is shot;
secondly, a camera model is corresponding to a camera in actual shooting and only comprises internal parameters;
Thirdly, the local map comprises key frames and landmark information points, wherein the key frames and the landmark information points conforming to the matching rule are added into the map, the map is only the local map but not the global map, and only the landmark information points near the current position are included, and the more distant landmark information points are deleted;
fourthly, the landmark information points are map points with known information in the map, wherein the known information included in the landmark information points is feature description corresponding to the landmark information points, and the obtaining mode is to apply a feature matching algorithm to extract the landmark information points in batches;
2) Backend global optimization
The back end analyzes the data of the whole process to handle noise, using either a linear or a nonlinear global optimization algorithm. Linear global optimization assumes a linear relation between consecutive frames and performs state estimation with a Kalman filter; when the relation between consecutive frames is not linear, state estimation is completed with an extended Kalman filter. The quantity computed is the difference between the observed value and the value estimated by the algorithm, i.e. the reprojection error between a pixel coordinate and the pixel coordinate obtained by projecting the corresponding 3D point onto the two-dimensional plane through the camera pose. Linear global optimization assumes a causal relation between the camera pose and the spatial points: the camera pose is solved first, and the positions of the spatial points are then further solved from the camera pose;
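The reprojection error mentioned above can be computed directly. The sketch below is illustrative only; the intrinsic matrix, pose and point values are made-up numbers, not data from the application.

import numpy as np

def reprojection_error(K, R, t, point_world, pixel_observed):
    """Pixel distance between an observed feature and a 3D point projected through the pose."""
    p_cam = R @ point_world + t                      # world -> camera coordinates
    u = K[0, 0] * p_cam[0] / p_cam[2] + K[0, 2]      # pinhole projection onto the image plane
    v = K[1, 1] * p_cam[1] / p_cam[2] + K[1, 2]
    return np.linalg.norm(pixel_observed - np.array([u, v]))

K = np.array([[520.9, 0.0, 325.1], [0.0, 521.0, 249.7], [0.0, 0.0, 1.0]])  # example intrinsics
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])                                # example pose
print(reprojection_error(K, R, t, np.array([1.0, 0.5, 4.0]), np.array([460.0, 315.0])))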
3) Loop closure detection
The key point is to build a bag-of-words model, which abstracts features into words; the detection process matches the words appearing in two images to judge whether they describe the same scene. To classify features into words, a dictionary containing all possible words must be trained, which requires massive data; building the dictionary is a clustering process. Suppose 100 million features are extracted from all the images; the K-means clustering method gathers them into one hundred thousand words. During training the dictionary is organized as a tree with k branches and depth d: the upper nodes of the tree give a coarse classification and the lower nodes a fine classification, extending down to the leaf nodes. Using the tree reduces the time complexity to a logarithmic level and speeds up feature matching;
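A toy version of such a k-branch vocabulary tree can be built with k-means at each level. The sketch below assumes scikit-learn is available and uses tiny random data purely for illustration; the branch factor, depth and descriptor size are arbitrary choices, not values from the application.

import numpy as np
from sklearn.cluster import KMeans

def build_vocab_tree(descriptors, k=10, depth=3):
    """Recursively cluster descriptors into a k-ary tree; the leaves act as 'words'."""
    if depth == 0 or len(descriptors) < k:
        return {"center": descriptors.mean(axis=0), "children": []}
    km = KMeans(n_clusters=k, n_init=4).fit(descriptors)
    children = [build_vocab_tree(descriptors[km.labels_ == i], k, depth - 1) for i in range(k)]
    return {"center": descriptors.mean(axis=0), "children": children, "centers": km.cluster_centers_}

def lookup_word(tree, desc):
    """Descend greedily: coarse clusters near the root, fine words at the leaves."""
    path = []
    while tree["children"]:
        i = int(np.argmin(np.linalg.norm(tree["centers"] - desc, axis=1)))
        path.append(i)
        tree = tree["children"][i]
    return tuple(path)                                 # word id = sequence of branch choices

rng = np.random.default_rng(0)
tree = build_vocab_tree(rng.normal(size=(2000, 32)).astype(np.float32), k=5, depth=2)
print(lookup_word(tree, rng.normal(size=32).astype(np.float32)))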
4) Mapping
After the camera poses have been optimized and corrected, the data collected by the camera are used to convert the two-dimensional plane points into three-dimensional space, forming three-dimensional point cloud information. Besides the point cloud map, the camera pose optimization process can be displayed with the g2o tool to form a pose graph, and the map can be defined and described according to the specific situation.
Preferably, the three-dimensional visual RGB space extraction method comprises the following steps:
The sensor is a monocular depth camera that acquires depth information, and the data source comprises an RGB image and a depth image.
A depth camera provides a color image and a depth image, and a geometric model transfers the 2D plane data into 3D space. The origin of the pixel coordinate system and that of the image coordinate system differ only by an offset, and the axes of the image coordinate system and the camera coordinate system are parallel and differ only by a scaling, so the conversion from pixel coordinates to the camera coordinate system is expressed as:

$x_c = (u - u_0)\,d_x, \quad y_c = (v - v_0)\,d_y, \quad z_c = d$

where $u_0, v_0$ are the offsets between the origins of the coordinate systems and the center of the imaging plane, $d_x, d_y$ are the scaling factors between pixel coordinates and the actual imaging plane, $d_x = z_c/f_x$, $d_y = z_c/f_y$, and $f_x, f_y$ are the focal lengths of the camera along the x and y axes; written in matrix form:

$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix}$

Because the camera moves, the camera coordinate system and the world coordinate system are not parallel but are related by a rotation and a translation; the relation between consecutive frames computed later by the visual odometer has the same form, and the matrix relation is:

$\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = R \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} + t$
Converting the points of the two-dimensional plane into three-dimensional space yields a series of point cloud data, and assigning RGB color attributes gives a preliminary colored three-dimensional map.
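For illustration, the back-projection just described can be carried out on a whole depth image at once. The NumPy sketch below is an assumption-laden example: the intrinsic values, the millimetre depth scale and the array shapes are placeholders rather than parameters from the application.

import numpy as np

def depth_to_point_cloud(rgb, depth, fx, fy, u0, v0, depth_scale=1000.0):
    """rgb: (H, W, 3) uint8; depth: (H, W) uint16 in millimetres (assumed)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32) / depth_scale       # z_c in metres
    x = (u - u0) * z / fx                            # x_c = (u - u0) * d_x, with d_x = z_c / fx
    y = (v - v0) * z / fy
    valid = z > 0
    points = np.stack([x[valid], y[valid], z[valid]], axis=1)
    colors = rgb[valid]                              # RGB attribute attached to each 3D point
    return points, colors

rgb = np.zeros((480, 640, 3), dtype=np.uint8)
depth = np.full((480, 640), 1500, dtype=np.uint16)   # 1.5 m everywhere, just a demo
pts, cols = depth_to_point_cloud(rgb, depth, fx=525.0, fy=525.0, u0=319.5, v0=239.5)
print(pts.shape, cols.shape)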
Preferably, three-dimensional visual RGB space extraction is implemented:
(1) Front-end visual odometer
Initialization starts the search for key frames with the first acquired image as reference. Matching between pairs of images uses the ORB algorithm to extract key points, BRIEF descriptors are then computed for each key point, and finally fast matching is performed with a fast approximate nearest-neighbour algorithm. The ORB corner extraction algorithm adds scale and rotation descriptions to the FAST corner extraction algorithm, enriching the feature description, so matching accuracy is high and the resulting map is more accurate and reliable; the BRIEF descriptor is a binary descriptor computed by comparing randomly selected point pairs;
After matching, the 2D points are projected into 3D space according to the depth image, giving the 2D coordinates and corresponding 3D coordinates of a series of points; the pose of the camera is estimated by solving a PnP problem, the actual result being the rotation and translation matrix between the two frames. All the data are matched pairwise in sequence and the camera poses are computed, finally giving a complete visual odometer;
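The ORB matching and PnP step can be prototyped with OpenCV. The sketch below is a hedged example rather than the application's implementation: it assumes the cv2 Python bindings, uses a brute-force Hamming matcher as a stand-in for the fast approximate nearest-neighbour search, and takes K as a 3 × 3 NumPy intrinsic matrix.

import cv2
import numpy as np

def estimate_motion(rgb1, depth1, rgb2, K, depth_scale=1000.0):
    """Estimate R, t between two RGB-D frames via ORB matching and PnP."""
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(rgb1, None)
    kp2, des2 = orb.detectAndCompute(rgb2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts3d, pts2d = [], []
    for m in matches:
        u, v = kp1[m.queryIdx].pt
        z = depth1[int(v), int(u)] / depth_scale       # depth assumed in millimetres
        if z <= 0:
            continue
        pts3d.append([(u - cx) * z / fx, (v - cy) * z / fy, z])   # 2D -> 3D via depth
        pts2d.append(kp2[m.trainIdx].pt)

    ok, rvec, tvec, _ = cv2.solvePnPRansac(np.array(pts3d, dtype=np.float32),
                                           np.array(pts2d, dtype=np.float32), K, None)
    R, _ = cv2.Rodrigues(rvec)                         # rotation matrix and translation vector
    return R, tvec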
(2) Backend nonlinear global optimization
The three-dimensional visual RGB space extraction expresses the camera poses computed by the visual odometer as a pose graph containing nodes and edges: the nodes represent each camera pose and the edges represent the transformations between camera poses. The pose graph not only describes the visual odometer intuitively but also makes the changes of the camera pose easy to understand. The nonlinear global optimization is expressed as graph optimization; since the same scene cannot appear at many different positions, the pose graph is sparse, and a sparse BA algorithm is used to solve it and correct the camera poses.
The unmanned aerial vehicle hardware system comprises an onboard computer, an onboard module assembly, a camera and gimbal, and an M100 quadrotor unmanned aerial vehicle. The M100 quadrotor provides the flight platform and the basic flight functions, the camera and gimbal are the image acquisition components of the onboard hardware system, and the onboard module assembly is the hardware part located between the camera gimbal and the M100: 1) it sends the video data collected by the camera to the onboard computer; 2) the onboard computer controls the gimbal through the onboard module assembly; 3) the onboard computer controls the flight of the M100 through the onboard module assembly; 4) the video data of the camera can be fed through the onboard module assembly into the image transmission system of the M100; 5) voltage conversion: the onboard module assembly converts the 24 V obtained from the M100 battery into 12 V to power the gimbal and the onboard computer;
(1) Airborne computer
The onboard computer is an NVIDIA Jetson TX2 RTS-ASG003 microcomputer with a total weight of 170 g;
(2) Airborne module assembly
The onboard module assembly is the intermediate execution and processing unit of the whole onboard hardware system: 1) the video data output by the high-definition camera are split into two paths by an HDMI splitter, one path entering the video capture device and being output to the onboard computer, the other being output through the N1 encoder to the wireless image transmission system of the M100 unmanned aerial vehicle; 2) a USB-to-UART and PWM module lets the onboard computer control the flight of the M100 and the gimbal; 3) a vision sensor realizes autonomous obstacle avoidance of the M100; 4) an RC receiver receives the control signals of the ground remote controller to control the gimbal; 5) a wireless data transmission module provides a low-bandwidth data link between the onboard hardware system and the ground system; and 6) power is supplied to all components of the onboard computer, the gimbal and the onboard module assembly;
The video acquisition part in the airborne module assembly consists of an HDMI distributor, a video acquisition device and an N1 encoder, wherein the HDMI distributor divides a video stream from a high-definition camera into two paths, and the two paths of video streams respectively enter the video acquisition device and the N1 encoder through HDMI interfaces;
The wireless data transmission module and the wireless data transmission module at the ground end form a low-bandwidth wireless data link, providing a path for bidirectional data transfer between the sky end and the ground end. A Guidance unit forms a relatively independent vision sensing system with five groups of combined vision-ultrasonic sensors that monitor environmental information in multiple directions in real time and sense obstacles; working with the flight controller, it lets the aircraft avoid possible collisions in time during high-speed flight. An RC wireless receiver R7008SB receives the control signals of the ground remote controller, processes them internally and outputs PWM waveforms to control the motion state of the gimbal; the receiver provides 16 receiving channels. The USB-to-UART and PWM module realizes two functions: it converts USB to UART (TTL level), and its UART interface is connected to the UART interface of the M100 so that the onboard computer can control the flight of the M100; the onboard computer can also output PWM control signals through the module, whose PWM outputs are connected to the heading and pitch control lines of the gimbal, so the onboard computer can control the motion state of the gimbal;
(3) Camera and cradle head
A GoPro Hero4 high-definition camera collects the video data, and a MiNi3DPro gimbal carries the GoPro Hero4 camera to control the camera viewing angle. The gimbal is a three-axis gimbal realizing motion control in pitch, roll and heading, and two control modes are provided: in one, the onboard computer outputs PWM waveforms through the USB-to-UART and PWM module to control the gimbal; in the other, the ground remote controller drives the onboard gimbal receiver R7008SB, which outputs PWM waveforms to control the gimbal;
The control signals of the gimbal heading axis and pitch axis are PWM waveforms at 50 Hz, and position control of the heading and pitch axes is realized by adjusting the duty cycle of the control signal: a duty cycle of 5.1% corresponds to the minimum position, 7.6% to the neutral position and 10.1% to the maximum position;
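As a worked example of this mapping (the axis angle range used here is an assumption for illustration, not a value from the application), a commanded angle can be converted linearly into a duty cycle and a pulse width:

def angle_to_duty(angle, angle_min=-90.0, angle_max=90.0):
    """Linearly map an axis angle (degrees) to a PWM duty cycle in percent (5.1% .. 10.1%)."""
    span = angle_max - angle_min
    ratio = (min(max(angle, angle_min), angle_max) - angle_min) / span
    return 5.1 + ratio * (10.1 - 5.1)

def duty_to_pulse_ms(duty, period_ms=20.0):
    """50 Hz PWM has a 20 ms period, so the high time is period * duty / 100."""
    return period_ms * duty / 100.0

print(angle_to_duty(0.0))       # 7.6 -> neutral position
print(duty_to_pulse_ms(7.6))    # 1.52 ms high time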
The mode control signal of the gimbal selects among a lock mode, a heading-and-pitch-follow mode and a heading-follow mode. When the signal on the mode control line is a PWM signal with a 50 Hz period and a duty cycle between 5% and 6%, the gimbal enters lock mode: heading, pitch and roll are locked, and heading and pitch are controlled by the remote controller or the onboard computer. When the duty cycle is between 6% and 9%, the gimbal enters heading-and-pitch-follow mode: roll is locked, heading rotates smoothly with the nose direction and pitch rotates smoothly with the elevation of the aircraft. When the duty cycle is between 9% and 10%, the MiNi3DPro enters heading-follow mode: pitch and roll are locked, heading rotates smoothly with the nose direction, and pitch is controlled by the remote controller or the onboard computer.
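A small decision function makes the three duty-cycle bands above explicit; the band edges follow the text and the returned labels are just illustrative names, not identifiers from the application.

def gimbal_mode(duty_percent):
    """Classify the mode-control duty cycle (percent, 50 Hz PWM) into the mode bands above."""
    if 5.0 <= duty_percent < 6.0:
        return "lock"                    # heading, pitch and roll locked
    if 6.0 <= duty_percent < 9.0:
        return "heading+pitch follow"    # roll locked, heading/pitch follow the aircraft
    if 9.0 <= duty_percent <= 10.1:
        return "heading follow"          # heading follows the nose, pitch/roll locked
    return "undefined"

print(gimbal_mode(7.0))                  # -> heading+pitch follow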
Integration of the unmanned aerial vehicle hardware system sets up each unit module of the onboard module assembly and designs the gimbal power supply: the 25 V output by the unmanned aerial vehicle battery passes through three DC-DC voltage conversion modules of the onboard module assembly and is output as 19 V, 12 V and 5 V respectively; the 19 V supply powers the onboard computer, the 12 V supply powers the gimbal, and the 5 V supply powers the RC receiver and the HDMI splitter. The internal power system of the unmanned aerial vehicle platform powers the Guidance vision sensors, the N1 encoder, the onboard line sensor and the wireless data transmission respectively; the HDMI video capture device and the onboard wireless data transmission draw power from the USB interfaces of the onboard computer, and the camera is powered by its own battery.
Preferably, the integration of the unmanned aerial vehicle software system is completed on the ROS platform; communication between modules uses the ROS message mechanism, which keeps the two modules loosely coupled;
1) Target detection flow
The target detection system uses the Jetson-INFERENCE framework, which includes classification, detection and segmentation; the detection module covers image detection, video detection and real-time camera detection, and since a video stream is ultimately decomposed into image frames, detection is in essence image detection;
The target detection system converts the video stream into an acquisition image frame, then detects the acquisition image by using a trained network model, finally obtains the category of the target and the pixel coordinates of the target frame, and the output of the target detection system is transmitted to the visual real-time space perception modeling system for positioning and navigation;
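A minimal detection loop of this kind can be written with the jetson-inference Python bindings. The sketch below is an assumption: the model name, the camera URI and the stub that forwards results to the spatial perception module are placeholders, not the application's actual configuration.

import jetson.inference
import jetson.utils

def publish_to_slam(detections):
    # Stub: in the full system the class id and box corners would be handed to the
    # visual real-time spatial perception module (e.g. over a ROS topic).
    for d in detections:
        print(d.ClassID, d.Left, d.Top, d.Right, d.Bottom, d.Confidence)

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)   # example model
camera = jetson.utils.videoSource("csi://0")                          # example camera URI
while camera.IsStreaming():
    img = camera.Capture()
    detections = net.Detect(img)        # class + pixel coordinates of each target box
    publish_to_slam(detections)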
2) Three-dimensional visual RGB space extraction process
The three-dimensional visual RGB space extraction system receives the RGB image and the depth image, first matches the RGB images to obtain key frames, then matches the depth images to build the point cloud map, applies nonlinear global optimization and loop closure detection to correct the point cloud map, and finally receives the target detection results to complete target positioning in three-dimensional space;
3) Unmanned aerial vehicle airborne processing method integration
The onboard processing process of the software system is that after TX2 receives collected image data from a camera, a target detection module detects collected image content to obtain a position coordinate of a target, and then the position coordinate is transmitted to a visual real-time space perception modeling system in real time, and at the moment, the visual real-time space perception modeling system reconstructs a target detection result corresponding to a key frame in a three-dimensional space according to the key frame composition to realize space content extraction.
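The real-time hand-off from the detection module to the visual real-time spatial perception module can be illustrated with a rospy node. Everything below is a hedged sketch: the topic name, the flat Float32MultiArray layout and the dummy detection are assumptions used only to show the loose coupling over ROS messages.

import rospy
from std_msgs.msg import Float32MultiArray

def make_detection_msg(class_id, x1, y1, x2, y2):
    """Pack one detection (class id + box corners in pixels) into a flat message."""
    msg = Float32MultiArray()
    msg.data = [float(class_id), x1, y1, x2, y2]
    return msg

def slam_callback(msg):
    class_id, x1, y1, x2, y2 = msg.data
    rospy.loginfo("target %d at (%.0f, %.0f)-(%.0f, %.0f)", int(class_id), x1, y1, x2, y2)
    # Here the spatial perception side would project the box into the current key frame.

if __name__ == "__main__":
    rospy.init_node("detection_bridge")
    pub = rospy.Publisher("/uav/detections", Float32MultiArray, queue_size=10)
    rospy.Subscriber("/uav/detections", Float32MultiArray, slam_callback)
    rate = rospy.Rate(10)
    while not rospy.is_shutdown():
        pub.publish(make_detection_msg(1, 100.0, 80.0, 220.0, 200.0))   # dummy detection
        rate.sleep()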
Compared with the prior art, the application has the innovation points and advantages that:
(1) In the application, the spatial perception module of the unmanned aerial vehicle is completed by integrating the three-dimensional visual RGB space extraction method, the optimized target detection method adds real-time precise target positioning and precise target recognition capability and completes the content extraction module, and the fusion of the two methods finally provides a navigation and path-planning basis for the unmanned aerial vehicle, with an intelligent unmanned aerial vehicle system comprising software and hardware being established. The content extraction method based on convolutional neural network target detection, together with the real-time three-dimensional map provided to the unmanned aerial vehicle, helps it to position itself accurately; the content extraction method based on target detection identifies the targets in the scene and gives their positions; and the combination of the two provides meaningful reference data for obstacle avoidance and path planning, so that the unmanned aerial vehicle can fly with little or no human intervention. By embedding the spatial scene perception method and the content extraction method based on target detection in a microcomputer, an onboard intelligent processing system is established and both methods run in real time. Onboard real-time spatial perception and content extraction based on target recognition are not limited to solving obstacle avoidance and path planning: the collected data can be used efficiently, obstacle avoidance and path planning can be completed successfully even when the environment is unknown and the unmanned aerial vehicle must perform obstacle-avoidance flight and exploration, and the work can be extended to other application directions, a deep mining and expansion of unmanned aerial vehicle applications with great application value.
(2) The application constructs a fast deep convolutional neural network feature extractor that learns and extracts the features of all targets of interest for specific target detection and recognition, uses monocular ORB visual real-time spatial perception modeling to realize fast mapping of specific scenes, integrates a target detection module that marks the targets of interest in the visual real-time spatial perception model and gives their specific position information while guaranteeing a certain accuracy, applies the machine vision target detection method to content-based scene perception, fuses it with the space-based scene perception method, and establishes an intelligent unmanned aerial vehicle system; with the unmanned aerial vehicle and a Jetson TX2 as the carrying and development platform, visual real-time spatial perception modeling of the unmanned aerial vehicle in a specific scene is combined with the target detection method.
(3) The application comprises: 1) carrying a depth camera on the unmanned aerial vehicle to provide the RGB image and depth image for real-time spatial perception modeling, finally providing a three-dimensional point cloud map, giving the spatial positioning of the unmanned aerial vehicle and supplying data support for navigation; 2) target detection content extraction based on deep learning, optimizing real-time processing capability according to the flight speed of the unmanned aerial vehicle, converting the detected target boxes into three-dimensional space by combining the conversion relations obtained from scene perception, and providing the unmanned aerial vehicle with the spatial positions of the various targets so that it can use these spatial relations for obstacle avoidance and path planning; 3) establishing a real-time perception and target detection system for the unmanned aerial vehicle: based on the optimization of real-time scene perception and real-time target detection, both modules are embedded in a microcomputer, giving an intelligent system, including software and hardware, for real-time perception and target detection with high spatial perception efficiency and high target detection speed and accuracy.
Drawings
Fig. 1 is a general block diagram of the on-board hardware of the unmanned aerial vehicle hardware system.
Fig. 2 is a system block diagram of an unmanned aerial vehicle hardware on-board module assembly.
FIG. 3 is a schematic diagram illustrating the connection between the USB to UART and PWM module output pins and the cradle head control signal line.
Fig. 4 is a schematic diagram of control signals for setting the pan/tilt axis and the tilt axis.
Fig. 5 is a schematic diagram of a mode control signal for setting a pan/tilt head.
Fig. 6 is a schematic diagram of each unit module of the unmanned aerial vehicle on-board module assembly.
Fig. 7 is a schematic diagram of the overall framework of the unmanned aerial vehicle on-board processing method.
Fig. 8 is a schematic diagram of an outdoor application case of the unmanned aerial vehicle of the present application.
Fig. 9 is a schematic diagram of an indoor application case of the unmanned aerial vehicle of the present application.
Detailed Description
The technical scheme of the intelligent unmanned aerial vehicle real-time space sensing and target fine detection system provided by the application is further described below with reference to the accompanying drawings, so that the application can be better understood and implemented by those skilled in the art.
With the great improvement in software and hardware performance, machine vision has improved in both accuracy and real-time capability, and content-based and space-based scene perception methods have grown rapidly in recent years. The target detection method achieves real-time operation while guaranteeing a certain accuracy; the machine vision target detection method is applied to content-based scene perception and fused with the space-based scene perception method to build an intelligent unmanned aerial vehicle system. With the unmanned aerial vehicle and a Jetson TX2 as the carrying and development platform, visual real-time spatial perception modeling is combined with the target detection method for the unmanned aerial vehicle in a specific scene, for unmanned aerial vehicles whose flight speed is limited by the scene;
(1) Content-based scene perception. The content-based scene perception task is completed with a target detection method based on deep learning. With the unmanned aerial vehicle as the carrying platform, this differs from ordinary target detection: first, the unmanned aerial vehicle has a certain flight speed, which imposes a real-time requirement on target detection; second, while flying past an object the unmanned aerial vehicle moves from far to near, and the relative position of the vehicle and the object makes the shooting angle frontal, oblique or overhead, so the target detection method must be rotation- and scale-invariant. A neural network is optimized and built to complete the real-time content extraction task of the unmanned aerial vehicle, and with sufficient data it is trained to accurately recognize the targets in the specific scene.
(2) The space-based scene perception method comprises the steps of completing matching and realizing rapid composition by adopting a fast ORB corner detection method, and integrating a visual real-time space perception modeling method into the real-time space perception of a specific scene of the unmanned aerial vehicle.
(3) The intelligent unmanned aerial vehicle system is formed by embedding a space perception module based on a visual real-time space perception modeling method and a target detection content extraction method based on a convolutional neural network into a microcomputer and combining the space perception module and the target detection content extraction method with the unmanned aerial vehicle system. Through experiments, the problem of carrying two kinds of perception modules in the actual flight process is solved.
The spatial perception module of the unmanned aerial vehicle is completed by the integrated three-dimensional visual RGB space extraction method, the target detection method is improved to add real-time precise target positioning and precise target recognition capability and complete the content extraction module, the fusion of the two methods finally provides a navigation and path-planning basis for the unmanned aerial vehicle, and an intelligent unmanned aerial vehicle system comprising software and hardware is established;
The application establishes a sample library and an interesting target feature library of indoor interesting targets, realizes real-time detection and matching of targets by using an end-to-end based rapid neural network, performs real-time composition by using a three-dimensional visual RGB space extraction system while detecting the targets, marks the identified interesting targets in a simulation graph and gives specific position information, and the core method comprises the following steps:
(1) The method is based on a visual real-time space perception modeling method, which comprises the steps of carrying a depth camera on an unmanned aerial vehicle, providing an RGB acquisition chart and a depth acquisition chart for real-time space perception modeling, adopting real-time space perception modeling, namely three-dimensional visual RGB space extraction to finish the perception of a space scene, finally providing a three-dimensional point cloud map for the unmanned aerial vehicle, providing the space positioning of the unmanned aerial vehicle, and providing data support for unmanned aerial vehicle navigation;
(2) The method comprises the steps of extracting target detection content based on deep learning, optimizing real-time processing capacity in terms of target detection by a network based on the speed of unmanned aerial vehicle flight, converting a target detection frame target into a three-dimensional space by combining a conversion relation obtained by scene perception, providing the spatial positions of various targets for the unmanned aerial vehicle, and utilizing the spatial relation unmanned aerial vehicle to perform obstacle avoidance and path planning;
(3) And establishing a real-time sensing and target detection system of the unmanned aerial vehicle, namely based on optimization of real-time scene sensing and real-time target detection, embedding the realization of the two modules into a microcomputer, and establishing an intelligent system comprising software and hardware and suitable for the real-time sensing and target detection of the unmanned aerial vehicle.
1. Content extraction method for real-time accurate target detection
Target real-time accurate detection network data format
A training data sample comprises a large acquired image containing a plurality of objects; for each object in the image, the training label comprises not only the class of the object but also the coordinates of every corner point of its bounding box. Because the number of objects differs between training images, label formats of different lengths and dimensions would make the loss function difficult to define; this is solved by introducing a fixed three-dimensional label format, and the defined format accepts acquired images of any size containing any number of objects.
The acquired image is divided by a regular grid whose cell size is slightly smaller than the smallest object to be detected. Each grid cell carries two pieces of key information: the class of the object and the corner coordinates of the object covering the cell. In addition, when a cell contains no object, a special custom class, the 'dontcare' class, is used so that the data representation keeps a uniform fixed size, and an object coverage value of 0 or 1 indicates whether the cell contains an object. When several objects fall in the same cell, the object occupying the most pixels of the cell is chosen, and when objects overlap, the object whose bounding box has the smallest Y value is used.
(II) real-time accurate detection network framework for targets
The real-time accurate target detection network training is divided into three steps:
the first step, a data layer acquires a training acquisition graph and a label, and a conversion layer carries out online data enhancement;
The second step, the full convolution network performs feature extraction and prediction on the object class and the boundary frame of each grid;
The third step, the object class and the target bounding box of each grid cell are predicted separately, and the errors of the two prediction tasks are then computed simultaneously with a loss function;
the prediction process includes two points: a clustering function generates the final set of boxes during verification, and a simplified mAP (mean average precision) value measures the performance of the model on the verification data set;
The network accepts input images of different sizes and efficiently applies the CNN in a strided sliding-window fashion, outputting a multi-dimensional array that is overlaid on the image; it uses a GoogLeNet with the final pooling layer removed, so that the CNN acts as a sliding window with a receptive field of up to 555 × 555 pixels and a stride of 16 pixels;
The final optimized loss function is a linear combination of two independent loss functions: the sum of squared differences between the true and predicted object coverage over all grid cells of the training samples, and the mean absolute difference between the true and predicted bounding-box corners of the object covered at each grid cell.
2. Visual real-time space perception modeling method
First, real-time space perception modeling framework
The real-time space perception modeling process comprises the following steps:
Reading sensor acquisition graph data, namely reading and preprocessing acquisition graph information of an unmanned aerial vehicle camera in real-time space perception modeling, wherein the data of a depth camera comprises an RGB acquisition graph and a depth graph corresponding to the RGB acquisition graph;
Modeling a visual odometer, namely calculating the attitude change of a camera and a local map by estimating the rotation and translation relation between every two adjacent acquisition graphs by the visual odometer, wherein the key of the step is feature point extraction and acquisition graph matching;
the back end uses a nonlinear global optimization algorithm to optimize the camera positions and attitudes coming from the front end together with the loop closure detection results coming from another thread, and corrects them into a globally consistent trajectory and point cloud map;
loop closure detection judges whether the sensor, or the unmanned aerial vehicle carrying it, has returned to a previously visited place; if a place has been passed before, this information is given to the back end so that the positions and attitudes can be corrected again;
and fourthly, constructing a cruise map of the unmanned aerial vehicle which meets the task requirements according to the estimated camera track.
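Read together, the five flows form one processing loop per incoming frame; the following control-flow sketch is only an illustration, with the frame source, odometry, back end, loop detector and mapper left as placeholders rather than the application's actual implementation:

```python
def slam_pipeline(frame_source, visual_odometry, backend, loop_detector, mapper):
    """Minimal control flow for the real-time space perception modeling loop."""
    trajectory, point_cloud = [], []
    for rgb, depth in frame_source:                  # flow one: read RGB + depth
        pose = visual_odometry.track(rgb, depth)     # flow two: frame-to-frame pose estimate
        loop = loop_detector.check(rgb)              # flow four: have we been here before?
        pose, point_cloud = backend.optimize(pose, loop, point_cloud)  # flow three: correction
        trajectory.append(pose)
        mapper.update(point_cloud, pose)             # flow five: build the cruise map
    return trajectory, point_cloud
```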
1. Visual odometry modeling
The front end is responsible for receiving the camera video stream, i.e. the acquisition image sequence, estimating the camera motion between adjacent frames by feature matching, and preliminarily obtaining odometry information that carries some accumulated error. Visual odometry modeling consists of four parts:
first, the acquisition image frame, which carries the UAV camera pose at the moment the frame was captured, the RGB acquisition image, and the depth map;
second, the camera model, which corresponds to the camera actually used for shooting and contains only the intrinsic parameters;
third, the local map, to which keyframes and landmark points satisfying the matching rule are added; the map is only a local map rather than a global map, contains only the landmark points near the current position, and deletes the more distant landmark points;
fourth, the landmark points, which are map points with known information; the known information is the feature description corresponding to each point, obtained by extracting them in batches with a feature matching algorithm.
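A minimal sketch of these four data structures (field names and the pruning radius are illustrative assumptions):

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Camera:
    fx: float            # intrinsic parameters only
    fy: float
    cx: float
    cy: float

@dataclass
class Frame:
    pose: np.ndarray     # 4x4 camera pose when the frame was captured
    rgb: np.ndarray      # RGB acquisition image
    depth: np.ndarray    # corresponding depth map

@dataclass
class Landmark:
    position: np.ndarray    # 3D map point
    descriptor: np.ndarray  # feature description used for matching

@dataclass
class LocalMap:
    keyframes: list = field(default_factory=list)
    landmarks: list = field(default_factory=list)

    def prune(self, current_position, radius):
        """Keep only landmarks near the current position, deleting distant ones."""
        self.landmarks = [l for l in self.landmarks
                          if np.linalg.norm(l.position - current_position) < radius]
```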
2. Backend global optimization
The global process analyzes the data to handle noise, using either a linear or a nonlinear global optimization algorithm.
Linear global optimization assumes a linear relation between the frames captured during shooting and performs state estimation with a Kalman filter; if the linear relation is assumed to hold only between the previous frame and the next frame, the state estimation is completed with an extended Kalman filter. The quantity being minimized is the difference between the observed pixel coordinates and the pixel coordinates obtained by projecting the corresponding 3D point onto the image plane through the estimated camera pose. Linear global optimization assumes a causal relation between camera-pose error and spatial-point error: the camera position and attitude are solved first, and the positions of the spatial points are then derived from the camera pose. Nonlinear global optimization instead puts all the data into one model and optimizes it jointly, de-emphasizing the ordering between the data.
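Both formulations minimize the reprojection error described above; a hedged sketch of that residual under a standard pinhole model (notation is assumed, not quoted from the application):

```python
import numpy as np

def reprojection_error(point_3d, pixel_observed, R, t, fx, fy, cx, cy):
    """Difference between an observed pixel and the projection of its 3D point
    through the current camera pose estimate (R, t)."""
    p_cam = R @ point_3d + t                     # world -> camera coordinates
    u = fx * p_cam[0] / p_cam[2] + cx            # camera -> pixel coordinates
    v = fy * p_cam[1] / p_cam[2] + cy
    return np.array([pixel_observed[0] - u, pixel_observed[1] - v])
```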
3. Loop-closure detection
Loop-closure detection corrects the accumulated error of the visual odometry. The key is to build a bag-of-words model, which abstracts features into individual words; detection then matches the words appearing in two images to judge whether they describe the same scene. To classify features into words, a dictionary covering all possible words must be trained, which requires massive amounts of data. Building the dictionary is a clustering process: assuming 100 million features are extracted from all images, they are clustered into one hundred thousand words with the K-means method, and during training a tree with k branches and depth d is built for the dictionary. The upper nodes of the tree give a coarse classification and the lower nodes a fine classification, extending down to the leaf nodes; this tree reduces the lookup time complexity to logarithmic level and accelerates feature matching.
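A minimal sketch of building such a k-branch vocabulary tree by recursive K-means clustering (using scikit-learn; the branching factor and depth are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary_tree(descriptors, k=10, depth=4):
    """Recursively cluster feature descriptors into a k-ary tree of visual words.

    Upper levels give a coarse classification, lower levels a fine one, so looking
    a descriptor up costs O(k * depth) instead of scanning every word.
    """
    node = {"center": descriptors.mean(axis=0), "children": []}
    if depth == 0 or len(descriptors) <= k:
        return node                                   # leaf node = one visual word
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(descriptors)
    for i in range(k):
        subset = descriptors[labels == i]
        if len(subset):
            node["children"].append(build_vocabulary_tree(subset, k, depth - 1))
    return node
```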
4. Mapping
After optimization and correction of the camera pose, the data collected by the camera are used to convert two-dimensional image points into three-dimensional space, forming 3D point cloud information. Besides the point cloud map, the camera-pose optimization process is represented in the g2o tool as a pose graph, and the map can also be custom-defined and described according to the specific situation.
(II) three-dimensional visual RGB space extraction method
The sensor is a monocular camera that obtains depth information (a depth camera); the data source comprises RGB acquisition images and depth maps.
The depth camera yields a color acquisition image and a depth acquisition image, and a geometric model transfers the 2D plane data into 3D space. From pixel coordinates to acquisition image coordinates, the coordinate-system center points differ and only an offset relation exists; from acquisition image coordinates to the camera coordinate system, the coordinate axes are parallel and only a scaling relation exists. In the standard pinhole form, a pixel (u_p, v_p) with depth d maps to camera coordinates as x_c = (u_p − u)·d_x, y_c = (v_p − v)·d_y, z_c = d,
where u, v are the offsets between the coordinate-system origins, i.e. the center of the acquisition image plane, and d_x, d_y are the scaling between pixel coordinates and the actual imaging plane, d_x = z_c/f_x, d_y = z_c/f_y, with f_x, f_y the focal lengths of the camera along the x and y axes; the same relation can be written in matrix form using the intrinsic matrix built from f_x, f_y, u, v.
When the camera moves, the camera coordinate system and the world coordinate system are not parallel but related by a rotation R and a translation t, so that P_c = R·P_w + t; in the subsequent visual odometry calculation, the relation between two successive frames takes the same rotation-plus-translation matrix form.
Converting the points of the two-dimensional plane into three-dimensional space finally yields a series of point cloud data; assigning RGB color attributes preliminarily gives a colored three-dimensional map.
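A hedged sketch of this 2D-to-3D back-projection with color attached (the depth scale factor and intrinsics are assumptions):

```python
import numpy as np

def rgbd_to_pointcloud(rgb, depth, fx, fy, cx, cy, depth_scale=1000.0):
    """Back-project every pixel with valid depth into camera coordinates and
    attach its RGB color, yielding an (N, 6) array [x, y, z, r, g, b]."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32) / depth_scale        # raw depth units -> metres (assumed)
    valid = z > 0
    x = (u - cx) * z / fx                             # x_c = (u_p - u) * z_c / f_x
    y = (v - cy) * z / fy                             # y_c = (v_p - v) * z_c / f_y
    points = np.stack([x[valid], y[valid], z[valid]], axis=1)
    colors = rgb[valid].astype(np.float32)            # one RGB triple per valid pixel
    return np.hstack([points, colors])
```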
1. Three-dimensional visual RGB space extraction implementation
(1) Front-end visual odometry
Initialization takes the first acquisition image frame as the reference and begins searching for keyframes. Matching between pairs of acquisition images uses the ORB algorithm to extract keypoints, a BRIEF descriptor is then computed for each keypoint, and fast matching is finally performed with a fast approximate nearest-neighbor algorithm. The ORB corner extractor adds scale and rotation descriptions to the FAST corner extractor, enriching the feature information, so the feature description is richer, the matching precision is higher, and the resulting map is more accurate and reliable; the BRIEF descriptor is a binary descriptor compared on randomly selected point pairs.
After matching is completed, the 2D points are projected into 3D space according to the depth acquisition image, giving the 2D coordinates and the corresponding 3D coordinates of a series of points. The camera pose is then estimated by solving a PnP problem; the actual result of the computation is the rotation and translation matrix between the two acquisition image frames. All data are matched pairwise in order and the camera poses are computed, finally giving a complete visual odometry.
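A minimal OpenCV sketch of this front end is given below; for brevity it uses brute-force Hamming matching of the binary descriptors instead of the fast approximate nearest-neighbor matcher mentioned above, and the intrinsic matrix and depth scale are assumptions:

```python
import cv2
import numpy as np

def estimate_relative_pose(rgb1, depth1, rgb2, K, depth_scale=1000.0):
    """Estimate rotation/translation between two RGB-D frames (a sketch)."""
    g1 = cv2.cvtColor(rgb1, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(rgb2, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(g1, None)
    kp2, des2 = orb.detectAndCompute(g2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)   # binary descriptors
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:300]

    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts3d, pts2d = [], []
    for m in matches:
        u, v = kp1[m.queryIdx].pt
        z = depth1[int(v), int(u)] / depth_scale
        if z <= 0:
            continue                                  # skip pixels without valid depth
        pts3d.append([(u - cx) * z / fx, (v - cy) * z / fy, z])   # 2D -> 3D in frame 1
        pts2d.append(kp2[m.trainIdx].pt)

    ok, rvec, tvec, _ = cv2.solvePnPRansac(np.array(pts3d, np.float32),
                                           np.array(pts2d, np.float32), K, None)
    R, _ = cv2.Rodrigues(rvec)                        # rotation vector -> matrix
    return ok, R, tvec
```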
(2) Backend nonlinear global optimization
In establishing the front-end visual odometry, only adjacent pairs of acquisition image frames are matched and the corresponding camera poses solved, so error accumulation cannot be avoided and the camera poses need to be corrected.
The three-dimensional visual RGB space extraction expresses the poses computed by the visual odometry as a pose graph consisting of nodes and edges: the nodes represent the individual camera poses and the edges the transformations between them. The pose graph not only describes the visual odometry intuitively but also makes the changes in camera pose easy to understand. Nonlinear global optimization is expressed as graph optimization; because the same scene does not appear at many positions, the pose graph is sparse, and a sparse BA algorithm is used to solve the pose graph and correct the camera poses.
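As a simplified illustration of graph optimization (2D poses only and a dense solver instead of sparse BA, so a toy version of the idea rather than the application's implementation):

```python
import numpy as np
from scipy.optimize import least_squares

def optimize_pose_graph(initial_poses, edges):
    """Toy 2D pose-graph optimization (x, y, yaw per node).

    edges: list of (i, j, dx, dy, dyaw) relative-pose measurements, including
    loop-closure edges; each residual touches only two poses, which is the
    sparsity the text refers to.
    """
    n = len(initial_poses)

    def residuals(flat):
        poses = flat.reshape(n, 3)
        res = []
        for i, j, dx, dy, dyaw in edges:
            xi, yi, ti = poses[i]
            xj, yj, tj = poses[j]
            c, s = np.cos(ti), np.sin(ti)
            # predicted relative motion expressed in the frame of pose i
            pred = [c * (xj - xi) + s * (yj - yi),
                    -s * (xj - xi) + c * (yj - yi),
                    tj - ti]
            res.extend([pred[0] - dx, pred[1] - dy, pred[2] - dyaw])
        return np.array(res)

    result = least_squares(residuals, np.asarray(initial_poses, float).ravel())
    return result.x.reshape(n, 3)
```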
(3) Loop-closure detection
Although the back-end global optimization corrects the camera pose to some extent, a problem remains: after a period of time, when the UAV returns to the starting point or to a place it has visited, the system must be able to tell whether this is the starting point or a previously visited location. This is exactly the problem loop-closure detection solves. If acquisition image matching identifies the same scene, the back end gains an additional piece of optimization information and can adjust the trajectory and the map to the loop-closure result, completing the global optimization. Detecting whether a place has been visited requires comparing the similarity of the current frame with all previous frames, but the longer the run, the larger the data volume, which greatly reduces real-time performance. To achieve a better effect, the three-dimensional visual RGB space extraction replaces exhaustive traversal with a combination of close-range loops and random loops: the close-range loop matches the current frame against the previous n frames, with n chosen according to the situation, and the random loop matches the current frame against randomly selected earlier frames; after a loop is detected, the camera pose is corrected accordingly.
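A minimal sketch of the close-range-plus-random candidate selection (the counts are illustrative); each candidate would then be scored with the bag-of-words similarity described earlier:

```python
import random

def loop_closure_candidates(current_index, n_recent=5, n_random=5):
    """Pick frames to compare against instead of traversing the whole history:
    the previous n frames (close-range loop) plus a few randomly chosen earlier
    frames (random loop)."""
    recent = list(range(max(0, current_index - n_recent), current_index))
    earlier = list(range(0, max(0, current_index - n_recent)))
    randoms = random.sample(earlier, min(n_random, len(earlier)))
    return recent + randoms
```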
3. Unmanned aerial vehicle scene perception and target precision detection system
(I) Unmanned aerial vehicle hardware system
The overall block diagram of the onboard hardware is shown in Figure 1. It comprises an onboard computer, an onboard module assembly, a camera and gimbal, and an M100 quad-rotor UAV. The M100 quad-rotor UAV provides the flight platform and the basic flight functions; the camera and gimbal are the acquisition image capture components of the onboard hardware system; and the onboard module assembly is the hardware assembly located between the camera gimbal, the M100 quad-rotor UAV and the onboard computer. Its roles are: 1) collecting the camera video data and sending it to the onboard computer; 2) letting the onboard computer control the gimbal; 3) letting the onboard computer achieve flight control of the M100 quad-rotor UAV; 4) feeding the camera video acquisition image data into the image transmission system of the M100 quad-rotor UAV; and 5) voltage conversion, whereby the onboard module assembly converts the 24 V voltage obtained from the M100 battery into 12 V to power the gimbal and the onboard computer.
1. Unmanned aerial vehicle hardware system composition
(1) Airborne computer
The onboard computer is an NVIDIA JETSON TX2 RTS-ASG003 microcomputer with a total weight of 170 g; it is the size of a bank card and very light.
(2) Airborne module assembly
The onboard module assembly is the intermediate execution and processing unit of the whole onboard hardware system; its system block diagram is shown in Fig. 2. It works as follows: 1) the video data output by the high-definition camera is split into two channels by an HDMI distributor, one channel entering the video capture device and being output by it to the onboard computer, the other being output through the N1 encoder to the wireless image transmission system of the M100 UAV; 2) the USB-to-UART and PWM module is used by the onboard computer for the flight control of the M100 and the control of the gimbal; 3) the vision sensor provides autonomous obstacle avoidance for the M100 UAV; 4) the RC receiver receives the control signals of the ground remote controller to control the motion of the gimbal; 5) the wireless data transmission module provides a low-bandwidth data link between the onboard hardware system and the ground system; and 6) power is supplied to the onboard computer, the gimbal and all components inside the onboard module assembly.
The video acquisition part in the airborne module assembly consists of an HDMI distributor, a video acquisition device and an N1 encoder, wherein the HDMI distributor divides a video stream from a high-definition camera into two paths, and the two paths of video streams respectively enter the video acquisition device and the N1 encoder through HDMI interfaces.
Within the onboard module assembly, the onboard wireless data transmission module and the ground-side wireless data transmission module form a low-bandwidth wireless data link, with the sky end and the ground end providing a bidirectional data path. Guidance forms a relatively independent vision sensing system equipped with five groups of combined vision-ultrasonic sensors that monitor environmental information in multiple directions in real time and perceive obstacles; working with the UAV flight controller, it lets the aircraft avoid possible collisions in time during high-speed flight. The RC wireless receiver R7008SB receives the control signals of the ground remote controller and, after internal processing, outputs PWM waveforms to control the motion state of the gimbal; the receiver provides 16 receiving channels. The USB-to-UART and PWM module performs two functions: it completes the USB-to-UART (TTL level) conversion, connecting its UART interface to the UART interface of the M100 UAV so that the onboard computer can control the flight of the M100 and the UAV can fly autonomously; and its USB-to-PWM function lets the onboard computer output PWM control signals through the module, which are connected to the heading and pitch control signal lines of the gimbal so that the onboard computer controls the motion state of the gimbal. Fig. 3 illustrates the connection of the USB-to-UART and PWM module output pins to the gimbal control signal lines.
(3) Camera and gimbal
A GoPro Hero4 high-definition camera is used to collect the video data, and a MiNi3DPro gimbal carries the GoPro Hero4 camera to control the camera viewing angle. The gimbal is a three-axis gimbal providing motion control in pitch, roll and heading. Two control modes are provided: in one, the onboard computer outputs PWM waveforms through the USB-to-UART and PWM module to control the gimbal; in the other, the ground remote controller controls the onboard gimbal receiver R7008SB to output PWM waveforms to control the gimbal.
Fig. 4 shows the control signals of the gimbal heading axis and pitch axis. The control signals of both axes are PWM waveforms with a frequency of 50 Hz, and position control of the heading and pitch axes is achieved by adjusting the duty cycle of the control signal: a duty cycle of 5.1% corresponds to the minimum position, 7.6% to the balance position, and 10.1% to the maximum position.
The mode control signal of the gimbal selects among a lock mode, a heading-and-pitch-follow mode, and a heading-follow mode. When the signal input to the mode control lead is a 50 Hz PWM signal with a duty cycle between 5% and 6%, the gimbal enters the lock mode: heading, pitch and roll are locked, and heading and pitch are controlled through the remote controller or the onboard computer. When the duty cycle is between 6% and 9%, the gimbal enters the heading-and-pitch-follow mode: roll is locked, the heading rotates smoothly with the direction of the nose, and the pitch follows the elevation of the nose. When the duty cycle is between 9% and 100%, the MiNi3DPro gimbal enters the heading-follow mode: the heading, pitch and roll rotate smoothly with the direction of the nose, and the pitch is controlled through the remote controller or the onboard computer.
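A hedged sketch of these duty-cycle mappings (the axis angle range is an assumption; the 5.1%/7.6%/10.1% positions and the mode thresholds are the values stated above):

```python
def axis_duty_cycle(angle_deg, min_angle=-90.0, max_angle=90.0):
    """Map a desired axis angle to the 50 Hz PWM duty cycle:
    5.1% = minimum position, 7.6% = balance position, 10.1% = maximum."""
    angle = max(min_angle, min(max_angle, angle_deg))
    fraction = (angle - min_angle) / (max_angle - min_angle)
    return 5.1 + fraction * (10.1 - 5.1)              # duty cycle in percent

def gimbal_mode(duty_percent):
    """Interpret the mode-control duty-cycle ranges described above."""
    if 5.0 <= duty_percent < 6.0:
        return "lock"
    if 6.0 <= duty_percent < 9.0:
        return "heading-and-pitch follow"
    return "heading follow"

# Example: at 50 Hz the period is 20 ms, so a 7.6% duty cycle is a 1.52 ms pulse.
```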
(4) Unmanned aerial vehicle interface
The battery power output interface of the UAV is connected to the power input interface of the onboard module assembly to supply the onboard equipment. The input and output of the vision obstacle avoidance system are connected to the CAN-Bus of the UAV, and the system cooperates with the UAV flight controller to achieve autonomous obstacle avoidance; the power and video interfaces of the N1 encoder are connected to a dedicated interface on the UAV.
2. Unmanned aerial vehicle hardware system integration
Figure 6 shows the arrangement of the unit modules of the UAV onboard module assembly and the design of the gimbal power supply system. The 25 V voltage output by the UAV battery is converted by the three DC-DC voltage conversion modules of the onboard module assembly into 19 V, 12 V and 5 V: the 19 V output powers the onboard computer, the 12 V output powers the gimbal, and the 5 V output powers the RC receiver and the HDMI distributor. The internal power system of the UAV platform powers the Guidance vision sensors, the N1 encoder, the onboard wireless image transmission and the wireless data transmission; the HDMI video capture device and the onboard wireless data transmission draw power from the USB interface of the onboard computer, and the camera is powered by its own battery.
(II) Unmanned aerial vehicle software system design
The integration of the UAV software system is completed on the ROS platform, with the ROS message mechanism used for communication between modules; because the two modules are only loosely coupled, they can continue to be improved independently later.
1. Target detection flow
The target detection system uses the Jetson-Inference framework, which includes classification, detection and segmentation. The detection module covers acquisition image detection, video detection and real-time camera detection; since a video stream is ultimately decomposed into acquisition image frames, all of these reduce in essence to acquisition image detection.
The target detection system converts the video stream into acquisition image frames, detects them with the trained network model, and finally obtains the target class and the pixel coordinates of the target box. The output of the target detection system is passed to the visual real-time space perception modeling system for positioning and navigation.
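A minimal ROS sketch of this flow (topic names, message types and the detector callable are illustrative assumptions, not the application's actual interfaces):

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import String
from cv_bridge import CvBridge

class DetectionNode:
    """Decode camera frames, run a detector, and publish results for the
    space perception module."""
    def __init__(self, detector):
        self.bridge = CvBridge()
        self.detector = detector                      # any callable: image -> detections
        self.pub = rospy.Publisher("/detections", String, queue_size=10)
        rospy.Subscriber("/camera/image_raw", Image, self.callback, queue_size=1)

    def callback(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        detections = self.detector(frame)             # [(class_name, x1, y1, x2, y2), ...]
        self.pub.publish(String(data=str(detections)))

if __name__ == "__main__":
    rospy.init_node("target_detection")
    DetectionNode(detector=lambda img: [])            # plug in the trained model here
    rospy.spin()
```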
2. Three-dimensional visual RGB space extraction process
The three-dimensional visual RGB space extraction system receives the RGB acquisition images and depth maps. It first matches the RGB acquisition images to obtain keyframes, then uses the depth acquisition images to build the point cloud map, then applies nonlinear global optimization and loop-closure detection to correct the point cloud map, and finally receives the target detection results to complete target positioning in three-dimensional space.
3. Unmanned aerial vehicle airborne processing method integration
The overall framework of the UAV onboard processing method is shown in Fig. 7. The onboard processing flow of the software system is as follows: after the TX2 receives the acquisition image data from the camera, the target detection module first detects the image content to obtain the position coordinates of the target, which are then passed in real time to the visual real-time space perception modeling system; at that moment the modeling system is building the map from keyframes. Experiments show that, because the video stream is decomposed into acquisition image frames, the continuous images include the keyframes and the keyframes necessarily also carry target detection results, so the detection results corresponding to the keyframes are reconstructed directly in three-dimensional space to achieve spatial content extraction.
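A hedged sketch of lifting a keyframe detection into the 3D map using the keyframe's depth image and optimized pose (intrinsics and depth scale are assumptions):

```python
import numpy as np

def detection_to_world(bbox, depth, pose, fx, fy, cx, cy, depth_scale=1000.0):
    """Lift the centre of a detected bounding box into the 3D map.

    bbox: (x1, y1, x2, y2) pixel coordinates from the detector on a keyframe.
    pose: 4x4 camera-to-world transform of that keyframe from the SLAM back end.
    """
    u = 0.5 * (bbox[0] + bbox[2])
    v = 0.5 * (bbox[1] + bbox[3])
    z = depth[int(v), int(u)] / depth_scale           # depth at the box centre
    p_cam = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z, 1.0])
    return (pose @ p_cam)[:3]                         # world coordinates of the target
```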
4. Unmanned aerial vehicle outdoor application case
The experimental site is a playground with many pedestrians, which fully tests the speed and accuracy of target detection on the microcomputer and its adaptability to target detection in flight.
As shown in Fig. 8, the experimental results show that the UAV carrying the microcomputer and DetectNet detects dense crowds very well, with essentially no missed detections. The effect differs little in a slow-moving state, and the task is still completed well in a fast-moving state; the only issue is that the detection box drifts briefly at the instant the viewing angle rotates, but it returns to the correct position very quickly. The experiments show that the application meets the requirements of real-time and accurate target detection in practical use.
5. Unmanned aerial vehicle indoor application case
The experimental site is a room with a relatively complex scene; the indoor environment is characterized by narrow space and many obstacles. When the UAV flies indoors and the GPS navigation system fails, navigation is completed by combining inertial navigation with visual perception. Inertial navigation is accurate when the carrier changes direction abruptly but accumulates a large error over long runs, so scene perception based on real-time space perception modeling is used to localize the UAV, and the carrier attitude detected by inertial navigation at the instants of abrupt direction change is added to the back-end global optimization stage of the real-time space perception modeling to correct the UAV attitude; at the same time, the specific spatial positions of obstacles obtained by target detection provide a basis for UAV navigation and path planning. The indoor mapping and detection results are shown in Fig. 9:
The output data of the system are the spatial position of the detected target and the attitude of the UAV, so the spatial perception effect can be seen to be accurate, and after normalizing the target box an accurate target position can be provided to the UAV. It follows that accurate indoor positioning of the UAV can also be realized by combining real-time space perception modeling with target detection.
6. Summary
On the basis of integrating real-time space perception modeling, target detection and the UAV system, the UAV also carries the flight control, positioning and navigation, path planning and other functions of the UAV platform itself, so when further applications are integrated there is still considerable work to do on cooperation between the modules and on improving their combined effect. The real-time perception method and the real-time target detection method are integrated into the microcomputer through the UAV software platform ROS and, together with the UAV platform, form the intelligent UAV system.
(1) The method realizes SLAM-based spatial scene perception of the UAV, establishing a three-dimensional point cloud map and positioning through real-time space perception modeling and providing spatial position information for the UAV.
(2) Content scene perception based on the DetectNet target detection method is realized, providing the UAV with the positions of specific targets in three-dimensional space, which the UAV can use for intelligent navigation and path planning.
(3) Because a bare UAV carries no equipment such as a camera or a microcomputer, the research also completes the hardware design of the intelligent UAV and the framework design for the cooperative operation of the UAV and its dedicated onboard equipment.

Claims (10)

1. An intelligent UAV real-time spatial perception and target precision detection system, characterized in that a three-dimensional visual RGB space extraction method is integrated to improve the UAV's spatial perception module, the target detection method is optimized to give the UAV real-time target precision positioning and recognition capabilities and to improve the content extraction module, and the two methods are finally fused to provide the UAV with a basis for navigation and path planning, establishing an intelligent UAV system including software and hardware; first, a fast deep convolutional neural network feature extractor is constructed to learn and extract the features of all targets of interest for specific target detection and recognition; then, monocular ORB-based visual real-time spatial perception modeling is used to achieve rapid mapping of specific scenes; finally, the integrated target detection module marks the targets of interest in the visual real-time spatial perception modeling map and gives their specific location information;
the application establishes a sample library and a feature library of indoor targets of interest, uses an end-to-end fast neural network to achieve real-time target detection and matching, and uses the three-dimensional visual RGB space extraction system to perform real-time mapping while detecting targets; the identified targets of interest are marked in the model map and their specific location information is given; the core methods include:
(1) visual real-time spatial perception modeling: a depth camera is mounted on the UAV to provide RGB acquisition images and depth images for real-time spatial perception modeling; real-time spatial perception modeling, i.e. three-dimensional visual RGB space extraction, completes the perception of the spatial scene, finally providing the UAV with a three-dimensional point cloud map and its spatial positioning and supplying data support for UAV navigation;
(2) target detection content extraction based on deep learning: given the flight speed of the UAV, the target detection network is optimized for real-time processing; combined with the transformation relations obtained from scene perception, the detected target boxes are transformed into three-dimensional space, providing the UAV with the spatial positions of various targets, which the UAV uses for obstacle avoidance and path planning;
(3) establishing a UAV real-time perception and target detection system: based on the optimization of real-time scene perception and real-time target detection, the two modules are embedded in a microcomputer to build an intelligent real-time perception and target detection system for UAVs that includes both software and hardware.
2. The intelligent UAV real-time spatial perception and target precision detection system according to claim 1, characterized in that the data format of the real-time target precision detection network is as follows: a training data sample is a large acquisition image containing multiple objects; for each object in the image, the training label includes not only the object class but also the coordinates of the corner points of its bounding box; because the number of objects differs between training images, a fixed three-dimensional label format is introduced to solve the difficulty of defining a loss function over labels of different lengths and dimensions, and the defined format accepts acquisition images of any size containing any number of objects;
the acquisition image is divided into a regular grid whose cell size is slightly smaller than the smallest object to be detected; each cell carries two key pieces of information, the object class and the corner coordinates of the bounding box of the object covering the cell; when a cell contains no object, a special custom class, the "dontcare" class, keeps the data representation at a fixed size, and an object coverage value of 0 or 1 indicates whether the cell contains an object; when several objects fall in the same cell, the object occupying the most pixels in the cell is selected, and when objects overlap, the object whose bounding box has the smallest Y value is used.
3. The intelligent UAV real-time spatial perception and target precision detection system according to claim 1, characterized in that the training of the real-time target precision detection network is divided into three steps:
step 1: the data layer obtains the training acquisition images and labels, and the transformation layer performs online data augmentation;
step 2: the fully convolutional network extracts features and predicts the object class and bounding box of each grid cell;
step 3: the object class and the target bounding box of each cell are predicted separately, and a loss function then computes the errors of the two prediction tasks simultaneously;
the prediction process comprises two points: first, during validation a clustering function generates the final set of bounding boxes; second, model performance is measured on the validation data set with a simplified mean Average Precision (mAP) value;
the network accepts input acquisition images of different sizes and efficiently applies the CNN in a strided sliding-window fashion, outputting a multi-dimensional array overlaid on the acquisition image; using GoogLeNet with the final pooling layer removed allows the CNN to be applied as a sliding window of up to 555×555 pixels with a stride of 16 pixels;
the final optimized loss function is a linear combination of two independent loss functions: the sum of squared differences between the true and predicted object coverage of all grid cells in the training data samples, and the mean absolute difference between the true and predicted corner points of the bounding box of the object covered by each cell.
4. The intelligent UAV real-time spatial perception and target precision detection system according to claim 1, characterized in that the real-time spatial perception modeling process includes:
flow one, sensor acquisition image data reading: reading and preprocessing the acquisition image information of the UAV camera in real-time spatial perception modeling, the depth camera data including an RGB acquisition image and its corresponding depth map;
flow two, visual odometry modeling: the visual odometry estimates the rotation and translation between each pair of adjacent acquisition images to compute the change in camera attitude and a local map, the key of this step being feature point extraction and acquisition image matching;
flow three, back-end global optimization: the back end applies a nonlinear global optimization algorithm to the camera position and attitude from the front end and to the loop-closure detection result obtained by another thread, correcting them into a globally consistent trajectory map and point cloud map;
flow four, loop-closure detection: judging whether the sensor, or the UAV carrying it, has already visited the current scene; if a revisit is detected, the information is supplied to the back end to correct position and attitude again;
flow five, mapping: building, from the estimated camera trajectory, a UAV cruise map that meets the task requirements.
5. The intelligent UAV real-time spatial perception and target precision detection system according to claim 1, characterized in that the real-time spatial perception modeling framework comprises:
1) visual odometry modeling: the front end receives the camera video stream, i.e. the acquisition image sequence, estimates the camera motion between adjacent frames by feature matching, and preliminarily obtains odometry information with a certain accumulated error; visual odometry modeling consists of four parts:
first, the acquisition image frame, carrying the UAV camera pose at the time the frame was captured, the RGB acquisition image and the depth map;
second, the camera model, corresponding to the camera actually used for shooting and containing only the intrinsic parameters;
third, the local map, comprising keyframes and landmark points; keyframes and landmark points satisfying the matching rule are added to the map, which is only a local map rather than a global map, contains only landmark points near the current position, and deletes the more distant landmark points;
fourth, the landmark points, which are map points with known information, the known information being the feature description corresponding to each point, obtained by extracting them in batches with a feature matching algorithm;
2) back-end global optimization: the global process analyzes the data to handle noise, including linear and nonlinear global optimization algorithms; linear global optimization assumes a linear relation between the frames captured during shooting and uses a Kalman filter for state estimation; if the linear relation is assumed to hold only between the previous frame and the next frame, the state estimation is completed with an extended Kalman filter; the error is the difference between the observed pixel coordinates and the pixel coordinates obtained by projecting the corresponding 3D point onto the two-dimensional plane through the camera pose; linear global optimization assumes a causal relation between camera-pose error and spatial-point error, solving the camera position and attitude first and then deriving the spatial point positions from the camera pose; nonlinear global optimization puts all the data into one model and optimizes it jointly, de-emphasizing the ordering between the data;
3) loop-closure detection: loop-closure detection corrects the accumulated error of the visual odometry; the key is to build a bag-of-words model that abstracts features into words, and detection matches the words appearing in two images to judge whether they describe the same scene; to classify features into words, a dictionary covering all possible words must be trained on massive data; building the dictionary is a clustering process: assuming 100 million features are extracted from all images, they are clustered into one hundred thousand words with the K-means method, and during training a tree with k branches and depth d is built for the dictionary, the upper nodes giving a coarse classification and the lower nodes a fine classification down to the leaf nodes; this tree reduces the time complexity to logarithmic level and accelerates feature matching;
4) mapping: after optimization and camera-pose correction, the data collected by the camera are used to convert two-dimensional plane points into three-dimensional space, forming 3D point cloud information; besides the point cloud map, the camera-pose optimization process is represented in the g2o tool as a pose graph, and the map can also be custom-defined as the situation requires.
6. The intelligent UAV real-time spatial perception and target precision detection system according to claim 1, characterized in that the three-dimensional visual RGB space extraction method is as follows: the sensor is a monocular camera that obtains depth information, and the data source includes RGB acquisition images and depth maps; the depth camera yields a color acquisition image and a depth acquisition image, and a geometric model transfers the 2D plane data into 3D space; from pixel coordinates to acquisition image coordinates the coordinate-system centers differ and only an offset relation exists, and from acquisition image coordinates to the camera coordinate system the axes are parallel and only a scaling relation exists; the conversion from the pixel coordinate system to the camera coordinate system follows the standard pinhole model, where u, v are the offsets between the coordinate-system origins, i.e. the center of the acquisition image plane, d_x, d_y are the scaling between pixel coordinates and the actual imaging plane, d_x = z_c/f_x, d_y = z_c/f_y, and f_x, f_y are the focal lengths of the camera along the x and y axes, and the relation can be written in matrix form; when the camera moves, the camera coordinate system and the world coordinate system are not parallel but related by a rotation and a translation, and in the subsequent visual odometry calculation the relation between two successive frames takes the same matrix form; converting the points of the two-dimensional plane into three-dimensional space finally yields a series of point cloud data, and assigning RGB color attributes preliminarily gives a colored three-dimensional map.
7. The intelligent UAV real-time spatial perception and target precision detection system according to claim 1, characterized in that the three-dimensional visual RGB space extraction is implemented as follows:
(1) front-end visual odometry: initialization takes the first acquisition image frame as the reference and starts searching for keyframes; matching between pairs of acquisition images uses the ORB algorithm to extract keypoints, a BRIEF descriptor is then computed for each keypoint, and fast matching is finally performed with a fast approximate nearest-neighbor algorithm; the ORB corner extractor adds scale and rotation descriptions to the FAST corner extractor, enriching the feature information so that the feature description is richer, matching precision is higher and mapping is more accurate and reliable; the BRIEF descriptor is a binary descriptor compared on randomly selected point pairs; after matching, the 2D points are projected into 3D space according to the depth acquisition image, giving the 2D coordinates and corresponding 3D coordinates of a series of points; the camera pose is estimated by solving the PnP problem, the actual result being the rotation and translation matrix between the two acquisition image frames; all data are matched pairwise in order and the camera poses computed, finally giving a complete visual odometry;
(2) back-end nonlinear global optimization: the three-dimensional visual RGB space extraction expresses the poses computed by the visual odometry as a pose graph comprising nodes and edges, the nodes representing the camera poses and the edges the transformations between them; the pose graph not only describes the visual odometry intuitively but also makes the changes in camera pose easy to understand; nonlinear global optimization is expressed as graph optimization; since the same scene does not appear in many positions, the pose graph is sparse, and a sparse BA algorithm is used to solve the pose graph and correct the camera poses.
8. The intelligent UAV real-time spatial perception and target precision detection system according to claim 1, characterized in that the UAV hardware system comprises four parts: an onboard computer; an onboard module assembly; a camera and gimbal; and an M100 quad-rotor UAV, the M100 quad-rotor UAV providing the flight platform and basic flight functions, the camera and gimbal being the acquisition image capture components of the onboard hardware system, and the onboard module assembly being the hardware assembly between the camera gimbal, the M100 quad-rotor UAV and the onboard computer: 1) the camera video data is collected and sent to the onboard computer; 2) the onboard computer controls the gimbal through the onboard module assembly; 3) through the onboard module assembly, the onboard computer achieves flight control of the M100 quad-rotor UAV; 4) the camera video acquisition image data can be fed through the onboard module assembly into the image transmission system of the M100 quad-rotor UAV; 5) voltage conversion: the onboard module assembly converts the 24 V voltage obtained from the M100 battery into 12 V to power the gimbal and the onboard computer;
(1) onboard computer: an NVIDIA JETSON TX2 RTS-ASG003 microcomputer with a total weight of 170 g;
(2) onboard module assembly: an intermediate execution and processing unit of the whole onboard hardware system, composed and implemented as follows: 1) the video data output by the high-definition camera is split into two channels by an HDMI distributor, one entering the video capture device and being output to the onboard computer, the other being output through the N1 encoder to the wireless image transmission system of the M100 UAV; 2) the USB-to-UART and PWM module is used by the onboard computer for the flight control of the M100 and the control of the gimbal; 3) the vision sensor provides autonomous obstacle avoidance for the M100 UAV; 4) the RC receiver receives the control signals of the ground remote controller to control the gimbal; 5) the wireless data transmission module provides a low-bandwidth data link between the onboard hardware system and the ground system; 6) power is supplied to the onboard computer, the gimbal and the components inside the onboard module assembly;
the video acquisition part of the onboard module assembly consists of the HDMI distributor, the video capture device and the N1 encoder; the HDMI distributor splits the video stream from the high-definition camera into two channels, which enter the video capture device and the N1 encoder through HDMI interfaces;
in the onboard module assembly, the onboard and ground wireless data transmission modules form a low-bandwidth wireless data link, the sky end and the ground end providing a bidirectional data path; Guidance forms a relatively independent vision sensing system equipped with five groups of combined vision-ultrasonic sensors that monitor environmental information in multiple directions in real time and perceive obstacles, and, in cooperation with the UAV flight controller, lets the aircraft avoid possible collisions in time during high-speed flight; the RC wireless receiver R7008SB receives the control signals of the ground remote controller and, after internal processing, outputs PWM waveforms to control the motion state of the gimbal, the receiver having 16 receiving channels; the USB-to-UART and PWM module performs two functions: it completes the USB-to-UART (TTL level) conversion and connects its UART interface to the UART interface of the M100 UAV so that the onboard computer controls the flight of the M100 and the UAV flies autonomously, and the USB-to-PWM function lets the onboard computer output PWM control signals through the module, which are connected to the heading and pitch control signals of the gimbal so that the onboard computer controls the motion state of the gimbal;
(3) camera and gimbal: a GoPro Hero4 high-definition camera collects the video data, and a MiNi3DPro gimbal carries the GoPro Hero4 camera to control the camera viewing angle; the gimbal is a three-axis gimbal providing motion control in pitch, roll and heading; two control modes are provided: either the onboard computer outputs PWM waveforms through the USB-to-UART and PWM module, or the ground remote controller controls the onboard gimbal receiver R7008SB to output PWM waveforms;
the control signals of the gimbal heading axis and pitch axis are both 50 Hz PWM waveforms, and position control of the heading and pitch axes is achieved by adjusting the duty cycle of the control signal, a duty cycle of 5.1% corresponding to the minimum position, 7.6% to the balance position and 10.1% to the maximum position;
the mode control signal of the gimbal selects among a lock mode, a heading-and-pitch-follow mode and a heading-follow mode: when the signal input to the mode control lead is a 50 Hz PWM signal with a duty cycle between 5% and 6%, the gimbal enters the lock mode, heading, pitch and roll are locked, and heading and pitch are controlled through the remote controller or the onboard computer; when the duty cycle is between 6% and 9%, the gimbal enters the heading-and-pitch-follow mode, roll is locked, the heading rotates smoothly with the direction of the nose and the pitch follows the elevation of the aircraft; when the duty cycle is between 9% and 100%, the MiNi3DPro gimbal enters the heading-follow mode, the heading rotates smoothly with the direction of the nose and the pitch is controlled through the remote controller or the onboard computer.
9. The intelligent UAV real-time spatial perception and target precision detection system according to claim 1, characterized in that the UAV hardware system is integrated as follows: the unit modules of the onboard module assembly are arranged and the gimbal power supply system is designed; the 25 V voltage output by the UAV battery is converted by the three DC-DC voltage conversion modules of the onboard module assembly into 19 V, 12 V and 5 V, the 19 V output powering the onboard computer, the 12 V output powering the gimbal and the 5 V output powering the RC receiver and the HDMI distributor; the internal power system of the UAV platform powers the Guidance vision sensors, the N1 encoder, the onboard wireless image transmission and the wireless data transmission; the HDMI video capture device and the onboard wireless data transmission draw power from the USB interface of the onboard computer, and the camera is powered by its own battery.
10. The intelligent UAV real-time spatial perception and target precision detection system according to claim 1, characterized in that the integration of the UAV software system is completed on the ROS platform, the ROS message mechanism is used for communication between modules, and the two modules are loosely coupled;
1) target detection flow: the target detection system uses the Jetson-Inference framework, including classification, detection and segmentation; the detection module includes acquisition image detection, video detection and real-time camera detection, and because the video stream is ultimately decomposed into acquisition image frames, the essence of all of these is acquisition image detection; the target detection system converts the video stream into acquisition image frames, detects them with the trained network model, and finally obtains the target class and the pixel coordinates of the target box; the output of the target detection system is passed to the visual real-time spatial perception modeling system for positioning and navigation;
2) three-dimensional visual RGB space extraction flow: the three-dimensional visual RGB space extraction system receives the RGB acquisition images and depth maps, first matches the RGB acquisition images to obtain keyframes, then builds the point cloud map with the depth acquisition images, then corrects the point cloud map through nonlinear global optimization and loop-closure detection, and finally receives the target detection results to complete target positioning in three-dimensional space;
3) UAV onboard processing method integration: in onboard processing, after the TX2 receives the acquisition image data from the camera, the target detection module first detects the image content to obtain the position coordinates of the target, which are then passed in real time to the visual real-time spatial perception modeling system; at that moment the modeling system is building the map from keyframes, and the target detection results corresponding to the keyframes are reconstructed in three-dimensional space to achieve spatial content extraction.
CN202510396604.0A 2025-04-01 2025-04-01 Intelligent UAV real-time spatial perception and target precision detection system Pending CN120495616A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510396604.0A CN120495616A (en) 2025-04-01 2025-04-01 Intelligent UAV real-time spatial perception and target precision detection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510396604.0A CN120495616A (en) 2025-04-01 2025-04-01 Intelligent UAV real-time spatial perception and target precision detection system

Publications (1)

Publication Number Publication Date
CN120495616A true CN120495616A (en) 2025-08-15

Family

ID=96662469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510396604.0A Pending CN120495616A (en) 2025-04-01 2025-04-01 Intelligent UAV real-time spatial perception and target precision detection system

Country Status (1)

Country Link
CN (1) CN120495616A (en)


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication