CN120495616A - Intelligent UAV real-time spatial perception and target precision detection system - Google Patents
Intelligent UAV real-time spatial perception and target precision detection system
- Publication number
- CN120495616A (application CN202510396604.0A)
- Authority
- CN
- China
- Prior art keywords
- camera
- real-time
- unmanned aerial vehicle
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
Abstract
In the intelligent unmanned aerial vehicle real-time spatial perception and target fine detection system, the spatial perception module of the unmanned aerial vehicle is perfected through an integrated three-dimensional visual RGB space extraction method, a content extraction module with real-time target fine positioning and fine recognition capability is added through an optimized target detection method, the two methods are fused to provide a basis for navigation and path planning, and an intelligent unmanned aerial vehicle system comprising both software and hardware is built. With the unmanned aerial vehicle and a Jetson TX2 as the carrying and development platform, visual real-time spatial perception modeling and target detection are combined for the unmanned aerial vehicle in a specific scene whose flight speed is limited by that scene; the spatial perception efficiency is high, the target detection is fast and accurate, and obstacle avoidance and path planning are successfully completed when the environment is unknown and the unmanned aerial vehicle must perform obstacle-avoidance flight and exploration.
Description
Technical Field
The application relates to an unmanned aerial vehicle space perception target detection system, in particular to an intelligent unmanned aerial vehicle real-time space perception and target precision detection system, and belongs to the technical field of unmanned aerial vehicle target detection.
Background
Along with the continuous increase in the application value of unmanned aerial vehicles, their performance also keeps improving; they no longer carry only a simple camera for recording video but have begun to carry special equipment such as depth cameras and laser radars to complete specific tasks. Current application directions include vegetation protection, street-view shooting, electric power inspection, post-disaster rescue and the like, but these are basically elementary applications. Obstacle avoidance and path planning have long been problems in unmanned aerial vehicle applications; they involve the spatial perception of the surrounding environment by the unmanned aerial vehicle and content extraction based on target recognition, and at present most solutions require the environment to be known, with the flight manually planned and controlled. Successfully completing obstacle avoidance and path planning when the environment is unknown and the unmanned aerial vehicle must perform obstacle-avoidance flight and exploration is still being explored, and the appearance of content extraction methods based on convolutional neural network target detection provides an opportunity for solving this problem. A real-time three-dimensional map provided to the unmanned aerial vehicle can help it position itself accurately, while a content extraction method based on target detection can identify targets in the scene and give their positions; combining the two provides meaningful reference data for obstacle avoidance and path planning, allowing the unmanned aerial vehicle to fly with no or only partial human intervention. More importantly, if both methods achieve real-time processing, an unmanned aerial vehicle-mounted intelligent processing system can be established by embedding the spatial scene perception method and the target-detection-based content extraction method in a microcomputer. In addition, realizing onboard real-time spatial perception and target-recognition-based content extraction is not limited to solving obstacle avoidance and path planning; the collected data can be used efficiently and extended to other application directions, which amounts to a deep exploitation of unmanned aerial vehicle applications and therefore has large application value.
In the aspect of spatial scene perception, with the improvement of the hardware performance of visual cameras, laser radar cameras and the like, scene perception can not only obtain reliable data sources but also realize information complementation through the cooperative work of multiple devices. Laser radar and real-time spatial perception modeling each have their own characteristics, and using either alone has certain limitations, while fusing them lets their advantages and disadvantages complement each other. For example, vision works relatively stably in dynamic environments with rich textures and can provide very accurate point cloud matching for the laser radar, while the relatively accurate direction and distance information provided by the laser radar can in turn help correct the point cloud images. In environments with dim light or an obvious lack of texture, the advantages of the laser radar can assist real-time spatial perception modeling in recording the scene from a small amount of information. In addition, neither laser radar systems nor real-time spatial perception modeling systems are structurally limited to a single solution; basically all solutions can be configured with auxiliary positioning tools such as inertial elements, satellite positioning systems and indoor base-station positioning systems to form a complementary setup, which has been the research trend in recent years, namely data fusion and cooperative work between the radar system and other sensors. Compared with earlier loose-coupling fusion methods based on Kalman filtering, the current trend is tight-coupling fusion based on nonlinear global optimization. For example, fusing real-time spatial perception modeling with an IMU (inertial measurement unit) enables real-time mutual calibration, so that the vision module maintains a certain positioning precision during sudden acceleration, deceleration or rotation, tracking loss is prevented, and positioning and map construction errors are greatly reduced.
In the aspect of content scene perception, namely target detection, the current trend is to balance accuracy and speed. The key is to start from target detection based on candidate frames; specifically, the method shares as much computation as possible among different ROIs, removes redundant computation, and efficiently reuses the features obtained by the CNN, thereby improving the speed of the whole detection. Meanwhile, candidate-frame target detection still produces a certain number of false detections and missed detections, and these two problems are the key problems to be solved in specific applications of target detection.
The problems to be solved by the unmanned aerial vehicle space perception target detection in the prior art and the key technical difficulties of the application include:
(1) Obstacle avoidance and path planning are problems in current unmanned aerial vehicle applications; they involve the spatial perception of the surrounding environment by the unmanned aerial vehicle and content extraction based on target identification, and current solutions mostly require the environment to be known, with the flight manually planned and controlled. The appearance of content extraction methods based on convolutional neural network target detection provides an opportunity to successfully complete obstacle avoidance and path planning when the environment is unknown and the unmanned aerial vehicle must perform obstacle-avoidance flight and exploration, but the methods are not mature enough: a real-time three-dimensional map cannot be provided, the unmanned aerial vehicle cannot be accurately positioned, and content extraction by target detection cannot fully identify the targets in the scene or give their positions. Combining the two could provide meaningful reference data for obstacle avoidance and path planning, but the combination has many problems and the technology is immature, so the unmanned aerial vehicle cannot fly with no or only partial human intervention. Moreover, the two methods cannot achieve real-time processing, so an unmanned aerial vehicle-mounted intelligent processing system embedding the spatial scene perception method and the target-detection-based content extraction method in a microcomputer cannot be established. The problems of obstacle avoidance and path planning cannot be effectively solved, which restricts the application of unmanned aerial vehicles.
(2) The prior art lacks an integrated three-dimensional visual RGB space extraction method to perfect the spatial perception module of the unmanned aerial vehicle, and lacks a target detection method to add a content extraction module with real-time target fine positioning and fine recognition capability; the two methods cannot be fused to provide a basis for navigation and path planning, and no intelligent unmanned aerial vehicle system including software and hardware has been established, so the unmanned aerial vehicle cannot complete real-time spatial perception and fine target detection. The prior art lacks a fast deep convolutional neural network feature extractor capable of learning and extracting the features of all targets of interest for specific target detection and identification, lacks a method to realize fast composition of specific scenes with monocular ORB visual real-time spatial perception modeling, and lacks an integrated target detection module to mark the targets of interest in the visual real-time spatial perception modeling map and give their specific position information, resulting in poor real-time spatial perception capability, low target detection accuracy, poor obstacle avoidance capability and poor application safety.
(3) The prior art lacks a sample library and a feature library of indoor targets of interest, cannot realize real-time detection and matching of targets with an end-to-end fast neural network, cannot compose in real time with a three-dimensional visual RGB space extraction system while detecting targets, and cannot mark the identified targets of interest in a simulation map or give their specific position information. It lacks a visual real-time spatial perception modeling method and therefore cannot provide a three-dimensional point cloud map, the spatial positioning of the unmanned aerial vehicle, or data support for navigation; it cannot provide deep-learning-based target detection content extraction or the spatial positions of the various targets, so the unmanned aerial vehicle cannot use the spatial relations for obstacle avoidance and path planning. A real-time perception and target detection system for the unmanned aerial vehicle, that is, an intelligent system including software and hardware suitable for real-time perception and target detection, is missing, so the problems of obstacle avoidance and path planning cannot be well solved and intelligent unmanned aerial vehicle flight cannot be realized.
Disclosure of Invention
The method comprises the steps of constructing a fast deep convolutional neural network feature extractor, learning and extracting the features of all targets of interest for specific target detection and identification, using monocular ORB visual real-time spatial perception modeling to realize fast composition of specific scenes, marking the targets of interest in the visual real-time spatial perception modeling map with an integrated target detection module and giving their specific position information while ensuring a certain accuracy, applying the machine vision target detection method to content-based scene perception, fusing it with the space-based scene perception method, establishing an intelligent unmanned aerial vehicle system, and, taking the unmanned aerial vehicle and a Jetson TX2 as the carrying and development platform, combining visual real-time spatial perception modeling and target detection for the unmanned aerial vehicle in specific scenes.
In order to achieve the technical effects, the technical scheme adopted by the application is as follows:
The intelligent unmanned aerial vehicle real-time space perception and target fine detection system is characterized in that a space perception module of the unmanned aerial vehicle is perfected through an integrated three-dimensional visual RGB space extraction method, a target detection method is optimized to add real-time target fine positioning and target fine recognition capability to the unmanned aerial vehicle to perfect a content extraction module, navigation and path planning basis are provided for the unmanned aerial vehicle through fusion of the two methods, and an intelligent unmanned aerial vehicle system comprising software and hardware is built;
The application establishes a sample library and an interesting target feature library of indoor interesting targets, realizes real-time detection and matching of targets by using an end-to-end based rapid neural network, performs real-time composition by using a three-dimensional visual RGB space extraction system while detecting the targets, marks the identified interesting targets in a simulation graph and gives specific position information, and the core method comprises the following steps:
(1) The method is based on a visual real-time space perception modeling method, which comprises the steps of carrying a depth camera on an unmanned aerial vehicle, providing an RGB acquisition chart and a depth acquisition chart for real-time space perception modeling, adopting real-time space perception modeling, namely three-dimensional visual RGB space extraction to finish the perception of a space scene, finally providing a three-dimensional point cloud map for the unmanned aerial vehicle, providing the space positioning of the unmanned aerial vehicle, and providing data support for unmanned aerial vehicle navigation;
(2) The method comprises the steps of extracting target detection content based on deep learning, optimizing real-time processing capacity of a target detection network based on the flight speed of an unmanned aerial vehicle, converting a target detection frame target into a three-dimensional space by combining a conversion relation obtained by scene perception, providing the spatial positions of various targets for the unmanned aerial vehicle, and utilizing the spatial relation unmanned aerial vehicle to perform obstacle avoidance and path planning;
(3) And establishing a real-time sensing and target detection system of the unmanned aerial vehicle, namely based on optimization of real-time scene sensing and real-time target detection, embedding the realization of the two modules into a microcomputer, and establishing an intelligent system comprising software and hardware and suitable for the real-time sensing and target detection of the unmanned aerial vehicle.
Preferably, in the target real-time fine detection network data format, each training data sample comprises a large acquisition graph containing a plurality of objects; for each object in the acquisition graph, the training label includes not only the class of the object but also the coordinates of each corner point of its bounding box. Because the number of objects differs between training acquisition graphs, label formats of different lengths and dimensions would make the loss function difficult to define; this problem is solved by introducing a fixed three-dimensional label format, and the defined format can accept an acquisition graph of any size containing any number of objects;
The acquisition graph is segmented with a regular grid whose cell size is slightly smaller than the smallest object to be detected. Each grid cell carries two pieces of key information: the class of the object and the corner coordinates of the bounding box of the object covering that cell. In addition, when a cell contains no object, a special custom class, the 'dontcare' class, is used so that the data representation keeps a uniform fixed size, and an object coverage value of 0 or 1 is also set to indicate whether the cell contains an object. When several objects fall in the same cell, the object occupying the most pixels in the cell is selected, and when objects overlap, the object whose bounding box has the minimum Y value is used.
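A minimal sketch of how such a fixed-size grid label could be encoded, assuming an illustrative cell size, class-id convention and array layout (none of these values are taken from the patent):

```python
import numpy as np

# Hypothetical encoder for the fixed-size grid label described above: each cell
# stores a 0/1 coverage flag, a class id ("dontcare" when empty) and the corner
# coordinates of the bounding box that covers it.
DONTCARE = 0  # assumed id for the special "dontcare" class

def encode_grid_labels(image_hw, boxes, classes, cell=16):
    """boxes: list of (x1, y1, x2, y2); classes: list of int class ids (> 0)."""
    H, W = image_hw
    gh, gw = H // cell, W // cell
    coverage = np.zeros((gh, gw), dtype=np.float32)         # 0/1 object coverage
    cls_map  = np.full((gh, gw), DONTCARE, dtype=np.int32)  # class per cell
    corners  = np.zeros((gh, gw, 4), dtype=np.float32)      # x1, y1, x2, y2 per cell
    best_px  = np.zeros((gh, gw), dtype=np.float32)         # pixels of current winner

    for (x1, y1, x2, y2), c in zip(boxes, classes):
        for gy in range(int(y1) // cell, min(int(y2) // cell + 1, gh)):
            for gx in range(int(x1) // cell, min(int(x2) // cell + 1, gw)):
                # pixel area of the box inside this cell (tie-break: most pixels,
                # then the box with the smaller top Y, as the text specifies)
                ix = max(0.0, min(x2, (gx + 1) * cell) - max(x1, gx * cell))
                iy = max(0.0, min(y2, (gy + 1) * cell) - max(y1, gy * cell))
                area = ix * iy
                if area > best_px[gy, gx] or (area == best_px[gy, gx] and
                                              coverage[gy, gx] and y1 < corners[gy, gx, 1]):
                    best_px[gy, gx] = area
                    coverage[gy, gx] = 1.0
                    cls_map[gy, gx] = c
                    corners[gy, gx] = (x1, y1, x2, y2)
    return coverage, cls_map, corners
```

Because the arrays always have shape (H/cell, W/cell, ...), labels for images with any number of objects keep the same dimensions, which is exactly what makes the loss straightforward to define.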
Preferably, the real-time accurate target detection network training is divided into three steps:
the first step, a data layer acquires a training acquisition graph and a label, and a conversion layer carries out online data enhancement;
The second step, the full convolution network performs feature extraction and prediction on the object class and the boundary frame of each grid;
Predicting the object category and the target boundary box of each grid respectively, and then simultaneously calculating errors of two prediction tasks by using a loss function;
the prediction process comprises two points: a clustering function generates the final set of boxes during verification, and a simplified mAP (mean average precision) value measures the performance of the model on the verification data set;
The network receives input acquisition graphs of different sizes and efficiently applies the CNN in a strided sliding-window manner, outputting a multi-dimensional array overlaid on the acquisition graph; GoogLeNet with the final pooling layer removed is used, giving the CNN a sliding-window receptive field of 555 × 555 pixels with a stride of 16 pixels;
A final optimized loss function is generated as a linear combination of two independent loss functions: over the training data samples, the sum of the squares of the differences between the true and predicted object coverage of all grid cells, and the mean absolute difference between the true and predicted bounding-box corner points of the object covered at each cell.
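A minimal sketch of the combined objective described above, assuming illustrative weights and array shapes (the patent does not give concrete values):

```python
import numpy as np

# Sketch of the combined loss: an L2 term on the per-cell object-coverage map
# plus an L1 term on the bounding-box corners of cells that contain an object.
def detection_loss(cov_true, cov_pred, box_true, box_pred, w_cov=1.0, w_box=1.0):
    # sum of squared differences between true and predicted coverage of all cells
    cov_loss = np.sum((cov_true - cov_pred) ** 2)
    # mean absolute difference of the box corners, only where an object is present
    mask = cov_true > 0.5
    box_loss = np.mean(np.abs(box_true[mask] - box_pred[mask])) if mask.any() else 0.0
    # final objective is a linear combination of the two independent losses
    return w_cov * cov_loss + w_box * box_loss
```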
Preferably, the flow of real-time spatial perception modeling comprises:
Reading sensor acquisition graph data, namely reading and preprocessing acquisition graph information of an unmanned aerial vehicle camera in real-time space perception modeling, wherein the data of a depth camera comprises an RGB acquisition graph and a depth graph corresponding to the RGB acquisition graph;
Modeling a visual odometer, namely calculating the attitude change of a camera and a local map by estimating the rotation and translation relation between every two adjacent acquisition graphs by the visual odometer, wherein the key of the step is feature point extraction and acquisition graph matching;
the back end adopts a nonlinear global optimization algorithm to optimize the camera positions and attitudes from the front end together with the loop-closure detection results from another thread, and corrects the globally consistent trajectory map and point cloud map;
Loop-closure detection judges whether the sensor, or the unmanned aerial vehicle carrying it, has previously passed through the current scene; if a place has been visited before, this information is provided to the back end to correct the position and attitude again;
and fourthly, constructing a cruise map of the unmanned aerial vehicle which meets the task requirements according to the estimated camera track.
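A schematic sketch of this flow; every module class below is a hypothetical placeholder used only to show how the steps chain together:

```python
# Schematic main loop of the modeling flow above (rgbd_camera, odometry, backend,
# loop_detector and mapper are hypothetical placeholders, not an existing API).
def run_realtime_mapping(rgbd_camera, odometry, backend, loop_detector, mapper):
    while rgbd_camera.is_open():
        rgb, depth = rgbd_camera.read()              # step 1: read RGB + depth frame
        pose, keyframe = odometry.track(rgb, depth)  # step 2: front-end visual odometry
        if keyframe is not None:
            loop = loop_detector.query(keyframe)     # loop closure: have we been here?
            backend.optimize(keyframe, loop)         # step 3: nonlinear global optimization
            mapper.insert(keyframe, pose)            # step 4: update the cruise/point-cloud map
    return mapper.export_map()
```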
Preferably, the real-time spatial perception modeling framework:
1) Visual odometer modeling
The front end is in charge of receiving a video stream of a camera, namely an acquisition graph sequence, estimating the motion of the camera between adjacent frames by a feature matching method, and preliminarily obtaining mileage information with certain error accumulation, wherein the visual odometer modeling comprises four parts:
firstly, collecting a frame, wherein the carried information comprises the pose of an unmanned aerial vehicle camera, an RGB (red green blue) collection chart and a depth chart when the frame collection chart is shot;
secondly, the camera model, which corresponds to the camera used in actual shooting and comprises only the intrinsic parameters;
Thirdly, the local map comprises key frames and landmark information points, wherein the key frames and the landmark information points conforming to the matching rule are added into the map, the map is only the local map but not the global map, and only the landmark information points near the current position are included, and the more distant landmark information points are deleted;
fourthly, the landmark information points are map points with known information in the map, wherein the known information included in the landmark information points is feature description corresponding to the landmark information points, and the obtaining mode is to apply a feature matching algorithm to extract the landmark information points in batches;
2) Backend global optimization
The back end analyzes and processes noise in the data over the whole process, using linear or nonlinear global optimization algorithms. Linear global optimization assumes that the acquisition graphs of successive frames in the shooting process are linearly related and uses a Kalman filtering algorithm for state estimation; if the linear relation holds only between a previous frame and the next frame, state estimation is completed with extended Kalman filtering. The difference between the observed value and the estimated value is computed, namely the error between a pixel coordinate and the pixel coordinate of the corresponding 3D point projected onto the two-dimensional plane through the camera pose. Linear global optimization assumes a causal relation between the camera pose error and the spatial point error: the camera pose is solved first, and the positions of the spatial points are then further computed from the camera pose;
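Under the standard pinhole model, the reprojection error described here can be written as (a standard formulation, not quoted from the patent):

```latex
e_{ij} = u_{ij} - \pi\!\left( K \left( R_i P_j + t_i \right) \right),
\qquad
\pi\!\left([x,\; y,\; z]^{\mathsf T}\right) = \left[ \tfrac{x}{z},\; \tfrac{y}{z} \right]^{\mathsf T}
```

where u_ij is the observed pixel of spatial point P_j in frame i, K is the camera intrinsic matrix and (R_i, t_i) the camera pose; the back end minimizes the sum of squared errors over all such observations.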
3) Loop closure detection
The key point is that a word bag model is established, the word bag model abstracts the features into words, the detection process is to match the words appearing in the two images to judge whether the two images describe the same scene, the features are classified into words, a dictionary comprising all possible word sets needs to be trained, massive data are needed to be established for training the dictionary, the dictionary is established as a clustering process, 1 hundred million features are assumed to be extracted from all the images, the K-means clustering method is used for gathering the features into hundred thousand words, a tree with K branches and depth d is constructed for the dictionary in the training process, coarse classification is provided for the upper node of the tree, fine classification is provided for the lower node of the tree, the tree extends to leaf nodes, the time complexity is reduced to logarithmic level by using the tree, and the feature matching speed is accelerated;
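A minimal sketch of such a k-branch, depth-d vocabulary tree built by recursive K-means; the branching factor, depth and use of scikit-learn are illustrative assumptions, and the descriptors are treated as plain float vectors:

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of a bag-of-words dictionary built as a k-branch tree of depth d by
# recursive K-means over an (N, D) array of feature descriptors.
def build_vocab_tree(descriptors, k=10, depth=3):
    if depth == 0 or len(descriptors) <= k:
        return {"words": descriptors}                 # leaf node: the "words"
    km = KMeans(n_clusters=k, n_init=4).fit(descriptors)
    children = []
    for c in range(k):
        sub = descriptors[km.labels_ == c]            # coarse split at this level...
        children.append(build_vocab_tree(sub, k, depth - 1))  # ...finer split below
    return {"centers": km.cluster_centers_, "children": children}

def quantize(tree, desc):
    """Descend the tree in O(k * depth) steps instead of scanning every word."""
    path = []
    while "centers" in tree:
        c = int(np.argmin(np.linalg.norm(tree["centers"] - desc, axis=1)))
        path.append(c)
        tree = tree["children"][c]
    return tuple(path)                                # word id = path to the leaf
```

Two images can then be compared by the words their descriptors quantize to, which is what turns loop-closure detection into a logarithmic-time lookup rather than exhaustive feature matching.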
4) Mapping
After the camera pose has been optimized and corrected, the data collected by the camera are used to convert the two-dimensional plane points into the three-dimensional space, forming three-dimensional point cloud information. Besides the point cloud map, the optimization process of the camera pose is displayed in the g2o tool as a pose graph, and the map can be defined and described according to the specific situation.
Preferably, the three-dimensional visual RGB space extraction method comprises the following steps:
The sensor is a monocular camera that also acquires depth information (a depth camera), and the data source comprises an RGB acquisition graph and a depth map.
The depth camera provides a color acquisition graph and a depth acquisition graph, and a geometric model transfers the 2D plane data into 3D space. The origin of the pixel coordinate system and the origin of the acquisition-graph coordinate system differ only by an offset, and the axes of the acquisition-graph coordinate system and the camera coordinate system are parallel and differ only by a scaling, so the conversion from pixel coordinates to the camera coordinate system can be written out directly,
where u, v denote the pixel coordinates, the coordinate-system origins differ from the center of the acquisition plane only by an offset, d_x, d_y are the scaling factors between pixel coordinates and the physical imaging plane (d_x = Z_c/f_x, d_y = Z_c/f_y), and f_x, f_y are the focal lengths of the camera along the x and y axes; the relation can also be written in matrix form.
During camera motion the camera coordinate system and the world coordinate system are not parallel and are related by a rotation and a translation; when the visual odometer is computed later, the relation between the previous and next frames has the same form, and the corresponding matrix relations are given below.
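The conversion relations referred to above, written out under the standard pinhole camera model (a reconstruction of the standard equations, not the patent's own display formulas):

```latex
% Pixel <-> camera coordinates (pinhole model, depth Z_c from the depth map):
Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
    \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}
\quad\Longleftrightarrow\quad
X_c = \frac{(u - c_x)\, Z_c}{f_x}, \qquad
Y_c = \frac{(v - c_y)\, Z_c}{f_y}

% Camera <-> world (and, for the visual odometer, previous <-> next frame):
P_c = R\, P_w + t
```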
Converting the points of the two-dimensional plane into the three-dimensional space finally yields a series of point cloud data; assigning the RGB color attributes gives a preliminary colored three-dimensional map.
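A minimal sketch of this back-projection step; the depth scale and the assumption that the depth map is pixel-aligned with the RGB image are illustrative:

```python
import numpy as np

# Turn one RGB image plus an aligned depth map into a colored point cloud.
# fx, fy, cx, cy are camera intrinsics; depth_scale (metres per raw unit) is
# an assumption and depends on the sensor.
def rgbd_to_pointcloud(rgb, depth, fx, fy, cx, cy, depth_scale=0.001):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32) * depth_scale
    valid = z > 0                                  # skip pixels with no depth reading
    x = (u - cx) * z / fx                          # pixel -> camera coordinates
    y = (v - cy) * z / fy
    points = np.stack([x[valid], y[valid], z[valid]], axis=1)
    colors = rgb[valid]                            # attach the RGB attribute per point
    return points, colors
```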
Preferably, three-dimensional visual RGB space extraction is implemented:
(1) Front-end visual odometer
Initialization starts the search for key frames with the first frame acquisition graph as the reference. Matching between every two acquisition graphs uses the ORB algorithm to extract key points, then computes a BRIEF descriptor for each key point, and finally performs fast matching with a fast approximate nearest-neighbour algorithm. The ORB corner extraction algorithm adds scale and rotation descriptions to the FAST corner extraction algorithm, adding feature information, so the feature description is richer, the matching precision is high, and the composition is more accurate and reliable; the BRIEF descriptor is a binary descriptor built by comparing randomly selected point pairs;
After matching is finished, 2D points are projected to a 3D space according to the depth acquisition graph, 2D coordinates and corresponding 3D coordinates of a series of points are obtained, the position information of a camera is estimated by solving a PnP problem, the actual calculation result is a rotation and translation matrix between the front frame acquisition graph and the rear frame acquisition graph, all data are matched in sequence in pairs, and the pose of the camera is calculated, so that a complete visual odometer is finally obtained;
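A minimal sketch of one front-end step using OpenCV; a brute-force Hamming matcher stands in for the fast approximate nearest-neighbour search mentioned above, and the intrinsic matrix K, the aligned depth map and the depth scale are assumed inputs:

```python
import cv2
import numpy as np

# ORB + BRIEF matching between two frames, then pose recovery by solving PnP.
def estimate_relative_pose(img1, img2, depth1, K, depth_scale=0.001):
    orb = cv2.ORB_create(1000)                      # FAST corners + BRIEF descriptors
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

    obj_pts, img_pts = [], []
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    for m in matches:
        u, v = kp1[m.queryIdx].pt
        z = depth1[int(v), int(u)] * depth_scale
        if z <= 0:
            continue                                # no depth -> cannot lift to 3D
        obj_pts.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
        img_pts.append(kp2[m.trainIdx].pt)

    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        np.array(obj_pts, np.float32), np.array(img_pts, np.float32), K, None)
    R, _ = cv2.Rodrigues(rvec)                      # rotation matrix + translation
    return ok, R, tvec
```

Chaining the recovered (R, t) pairs over consecutive frames gives the visual odometer trajectory described in the text.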
(2) Backend nonlinear global optimization
In three-dimensional visual RGB space extraction, the pose computed by the visual odometer is expressed through a pose graph consisting of nodes and edges: the nodes represent the individual camera poses and the edges represent the transformations between camera poses. The pose graph not only describes the visual odometer intuitively but also makes the change of the camera pose easy to understand. The nonlinear global optimization is expressed as graph optimization; since the same scene cannot appear at many positions, the pose graph is sparse, and a sparse BA algorithm is used to solve the pose graph and correct the camera poses.
The unmanned aerial vehicle hardware system comprises an onboard computer, an onboard module assembly, a camera and gimbal, and an M100 quad-rotor unmanned aerial vehicle. The M100 quad-rotor unmanned aerial vehicle provides the flight platform and realizes the basic flight functions, the camera and gimbal are the acquisition-graph acquisition components of the onboard hardware system, and the onboard module assembly is the hardware assembly located between the camera gimbal and the M100 quad-rotor unmanned aerial vehicle: 1) it sends the video data collected by the camera to the onboard computer; 2) the onboard computer controls the gimbal through the onboard module assembly; 3) the onboard computer realizes flight control of the M100 quad-rotor unmanned aerial vehicle through the onboard module assembly; 4) the video acquisition-graph data of the camera can be input to the image transmission system of the M100 quad-rotor unmanned aerial vehicle through the onboard module assembly; 5) voltage conversion: the onboard module assembly converts the 24 V voltage obtained from the battery of the M100 unmanned aerial vehicle into 12 V to supply power to the gimbal and the onboard computer;
(1) Airborne computer
The onboard computer adopts an NVIDIA Jetson TX2 RTS-ASG003 microcomputer with a total weight of 170 g;
(2) Airborne module assembly
The onboard module assembly is the intermediate execution and processing unit of the whole onboard hardware system: 1) the video data output by the high-definition camera are split into two channels by an HDMI distributor, one channel entering the video capture device, which outputs it to the onboard computer, and the other being output to the wireless image transmission system of the M100 unmanned aerial vehicle through the N1 encoder; 2) the USB-to-UART and PWM module lets the onboard computer control the flight of the M100 unmanned aerial vehicle and the gimbal; 3) the vision sensor realizes autonomous obstacle avoidance of the M100 unmanned aerial vehicle; 4) the RC receiver receives the control signals of the ground remote controller to control the gimbal; 5) the wireless data transmission module provides a low-bandwidth data link between the onboard hardware system and the ground system; and 6) power is supplied to all components of the onboard computer, the gimbal and the onboard module assembly;
The video acquisition part in the airborne module assembly consists of an HDMI distributor, a video acquisition device and an N1 encoder, wherein the HDMI distributor divides a video stream from a high-definition camera into two paths, and the two paths of video streams respectively enter the video acquisition device and the N1 encoder through HDMI interfaces;
The onboard wireless data transmission module and the ground-side wireless data transmission module realize a low-bandwidth wireless data link, providing a bidirectional data path between the sky side and the ground side. A relatively independent vision sensing system is built with the Guidance, which provides five groups of combined vision-ultrasonic sensors, monitors environmental information in multiple directions in real time, senses obstacles and cooperates with the unmanned aerial vehicle flight controller so that the aircraft can avoid possible collisions in time during high-speed flight. The RC wireless receiver R7008SB receives the control signals of the ground remote controller, processes them internally and outputs PWM waveforms to control the motion state of the gimbal; the receiver provides 16 receiving channels. The USB-to-UART and PWM module realizes two functions: it completes the conversion from USB to UART (TTL level), with the UART interface of the module connected to the UART interface of the M100 unmanned aerial vehicle so that the onboard computer can realize flight control of the M100 through it, and it lets the onboard computer output PWM control signals, which are connected to the heading and pitch control signal lines of the gimbal so that the onboard computer can control the motion state of the gimbal;
(3) Camera and gimbal
A GoPro Hero4 high-definition camera collects the video data, and a MiNi3DPro gimbal carries the GoPro Hero4 high-definition camera to control the camera viewing angle. The gimbal is a three-axis gimbal realizing motion control in the pitch, roll and heading directions. Two control modes are provided: in one, the onboard computer outputs PWM waveforms through the USB-to-UART and PWM module to control the gimbal; in the other, the ground remote controller controls the onboard gimbal receiver R7008SB, which outputs PWM waveforms to control the gimbal;
The control signals of the gimbal heading axis and pitch axis are PWM waveforms with a frequency of 50 Hz, and position control of the heading and pitch axes is realized by adjusting the duty cycle of the control signal: a duty cycle of 5.1 % corresponds to the minimum position, 7.6 % to the neutral position and 10.1 % to the maximum position;
The mode control signal of the gimbal selects among a lock mode, a heading-and-pitch follow mode and a heading follow mode. When the signal on the mode control line is a 50 Hz PWM signal with a duty cycle between 5 % and 6 %, the gimbal enters the lock mode: heading, pitch and roll are locked, and heading and pitch are controlled by the remote controller or the onboard computer. When the duty cycle is between 6 % and 9 %, the gimbal enters the heading-and-pitch follow mode: roll is locked, heading smoothly follows the nose direction, and pitch smoothly follows the elevation angle of the aircraft. When the duty cycle is between 9 % and 10 %, the MiNi3DPro enters the heading follow mode: pitch and roll are locked, heading smoothly follows the nose direction, and pitch is controlled by the remote controller or the onboard computer.
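A small worked sketch of this duty-cycle mapping; the angle range is an assumed example, while the 5.1 % / 7.6 % / 10.1 % anchor points and the 50 Hz signal come from the description above:

```python
# Map a desired gimbal axis position to the PWM duty cycle of the 50 Hz
# heading/pitch control signal (5.1 % = minimum, 7.6 % = centre, 10.1 % = maximum).
def angle_to_duty(angle_deg, angle_min=-90.0, angle_max=90.0,
                  duty_min=5.1, duty_max=10.1):
    angle_deg = max(angle_min, min(angle_max, angle_deg))       # clamp the command
    frac = (angle_deg - angle_min) / (angle_max - angle_min)
    return duty_min + frac * (duty_max - duty_min)              # percent duty cycle

def duty_to_pulse_us(duty_percent, freq_hz=50):
    return 1e6 / freq_hz * duty_percent / 100.0                 # pulse width in microseconds

# e.g. the centre position: angle_to_duty(0.0) -> 7.6 %, i.e. about 1520 us at 50 Hz
```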
The unmanned aerial vehicle hardware system is integrated by arranging each unit module of the onboard module assembly and designing the gimbal power supply system. The 25 V voltage output by the unmanned aerial vehicle battery is converted by the three DC-DC voltage conversion modules of the onboard module assembly into 19 V, 12 V and 5 V respectively: the 19 V supply powers the onboard computer, the 12 V supply powers the gimbal, and the 5 V supply powers the RC receiver and the HDMI distributor. The internal power supply system of the unmanned aerial vehicle platform powers the Guidance vision sensors, the N1 encoder, the onboard sensors and the wireless data transmission respectively; the HDMI video capture device and the onboard wireless data transmission draw power from the USB interfaces of the onboard computer, and the camera is powered by its own battery.
Preferably, the integration of the unmanned aerial vehicle software system is completed on the ROS platform, with communication between the modules carried out through the ROS message mechanism, which keeps the two modules loosely coupled;
1) Target detection flow
The target detection system adopts the jetson-inference framework, which comprises classification, detection and segmentation; the detection module covers acquisition-graph detection, video detection and real-time camera detection, and since the video stream is ultimately decomposed into acquisition-graph frames, detection is in essence acquisition-graph detection;
The target detection system converts the video stream into an acquisition image frame, then detects the acquisition image by using a trained network model, finally obtains the category of the target and the pixel coordinates of the target frame, and the output of the target detection system is transmitted to the visual real-time space perception modeling system for positioning and navigation;
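A minimal sketch of such a detection loop, assuming the jetson-inference Python bindings; the network name, camera URI and the publish() hook are placeholders rather than the system's actual trained model and message channel:

```python
import jetson.inference
import jetson.utils

def publish(class_id, box, conf):
    # placeholder: hand the result to the SLAM/positioning module (e.g. a ROS topic)
    print(class_id, box, conf)

# Detection loop on the onboard computer: video stream -> frames -> class + box.
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
camera = jetson.utils.videoSource("csi://0")        # or the HDMI capture device

while True:
    frame = camera.Capture()                        # decompose the stream into frames
    detections = net.Detect(frame)                  # class + pixel box per target
    for d in detections:
        publish(d.ClassID, (d.Left, d.Top, d.Right, d.Bottom), d.Confidence)
```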
2) Three-dimensional visual RGB space extraction process
The three-dimensional visual RGB space extraction system receives the RGB acquisition graph and the depth graph, first matches the RGB acquisition graphs to obtain key frames, then matches the depth acquisition graphs to construct the point cloud map, next corrects the point cloud map through nonlinear global optimization and loop-closure detection, and finally receives the target detection results to complete target positioning in three-dimensional space;
3) Unmanned aerial vehicle airborne processing method integration
The onboard processing process of the software system is that after TX2 receives collected image data from a camera, a target detection module detects collected image content to obtain a position coordinate of a target, and then the position coordinate is transmitted to a visual real-time space perception modeling system in real time, and at the moment, the visual real-time space perception modeling system reconstructs a target detection result corresponding to a key frame in a three-dimensional space according to the key frame composition to realize space content extraction.
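A minimal sketch of this loose ROS coupling; the topic name, the message layout and the slam object with its annotate_current_keyframe() method are hypothetical placeholders for the system's own interfaces:

```python
import rospy
from std_msgs.msg import Float32MultiArray

# The detection node publishes [class_id, x1, y1, x2, y2] arrays; the SLAM side
# subscribes and re-projects each box into the 3D map via the current key frame.
def on_detection(msg, slam):
    class_id, x1, y1, x2, y2 = msg.data
    slam.annotate_current_keyframe(int(class_id), (x1, y1, x2, y2))

def main(slam):
    rospy.init_node("target_to_map_bridge")
    rospy.Subscriber("/detector/boxes", Float32MultiArray,
                     lambda m: on_detection(m, slam))
    rospy.spin()
```

Using a plain message topic is what keeps the two modules loosely coupled: either side can be restarted or replaced without modifying the other.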
Compared with the prior art, the application has the innovation points and advantages that:
(1) In the application, the spatial perception module of the unmanned aerial vehicle is perfected by integrating the three-dimensional visual RGB space extraction method, a content extraction module with real-time target fine positioning and fine recognition capability is added by optimizing the target detection method, and finally the two methods are fused to provide a basis for navigation and path planning, establishing an intelligent unmanned aerial vehicle system comprising software and hardware. The content extraction method based on convolutional neural network target detection, together with a real-time three-dimensional map, helps the unmanned aerial vehicle position itself accurately; the target-detection-based content extraction identifies the targets in the scene and gives their positions, and combining the two provides meaningful reference data for obstacle avoidance and path planning, so that the unmanned aerial vehicle can fly with no or only partial human intervention. The spatial scene perception method and the target-detection-based content extraction method are embedded in a microcomputer to establish the unmanned aerial vehicle-mounted intelligent processing system, and both methods achieve real-time processing. Realizing onboard real-time spatial perception and target-recognition-based content extraction is not limited to solving obstacle avoidance and path planning: the collected data can be used efficiently, obstacle avoidance and path planning are successfully completed when the environment is unknown and the unmanned aerial vehicle must perform obstacle-avoidance flight and exploration, and the approach can be extended to other application directions, which is a deep exploitation and expansion of unmanned aerial vehicle applications and therefore has large application value.
(2) The application constructs a fast deep convolutional neural network feature extractor, learns and extracts the features of all targets of interest for specific target detection and identification, uses monocular ORB visual real-time spatial perception modeling to realize fast composition of specific scenes, marks the targets of interest in the visual real-time spatial perception modeling map with the integrated target detection module and gives their specific position information while ensuring a certain accuracy, applies the machine vision target detection method to content-based scene perception, fuses it with the space-based scene perception method, establishes an intelligent unmanned aerial vehicle system, and, with the unmanned aerial vehicle and a Jetson TX2 as the carrying and development platform, combines visual real-time spatial perception modeling and target detection for the unmanned aerial vehicle in specific scenes.
(3) The application 1) carries a depth camera on the unmanned aerial vehicle, provides RGB and depth acquisition graphs for real-time spatial perception modeling, and finally provides a three-dimensional point cloud map, the spatial positioning of the unmanned aerial vehicle and data support for navigation; 2) extracts target detection content based on deep learning, optimizes the real-time processing capability of the detection network according to the flight speed of the unmanned aerial vehicle, converts the detected targets into three-dimensional space using the conversion relation obtained from scene perception, and provides the spatial positions of the various targets so that the unmanned aerial vehicle can use the spatial relations for obstacle avoidance and path planning; 3) establishes the real-time perception and target detection system of the unmanned aerial vehicle, embedding the two optimized modules into a microcomputer and building an intelligent system, comprising software and hardware, suitable for real-time perception and target detection of the unmanned aerial vehicle, with high spatial perception efficiency and fast, accurate target detection.
Drawings
Fig. 1 is a general block diagram of the on-board hardware of the unmanned aerial vehicle hardware system.
Fig. 2 is a system block diagram of an unmanned aerial vehicle hardware on-board module assembly.
FIG. 3 is a schematic diagram illustrating the connection between the output pins of the USB-to-UART and PWM module and the gimbal control signal lines.
Fig. 4 is a schematic diagram of the control signals of the gimbal heading axis and pitch axis.
Fig. 5 is a schematic diagram of the mode control signal of the gimbal.
Fig. 6 is a schematic diagram of each unit module of the unmanned aerial vehicle on-board module assembly.
Fig. 7 is a schematic diagram of the overall framework of the unmanned aerial vehicle on-board processing method.
Fig. 8 is a schematic diagram of an outdoor application case of the unmanned aerial vehicle of the present application.
Fig. 9 is a schematic diagram of an indoor application case of the unmanned aerial vehicle of the present application.
Detailed Description
The technical scheme of the intelligent unmanned aerial vehicle real-time space sensing and target fine detection system provided by the application is further described below with reference to the accompanying drawings, so that the application can be better understood and implemented by those skilled in the art.
With the great improvement of software and hardware performance, machine vision has improved in both accuracy and real-time performance, and content-based and space-based scene perception methods have multiplied in recent years. The target detection method achieves real-time performance while ensuring a certain accuracy; the machine vision target detection method is applied to content-based scene perception and fused with the space-based scene perception method to build an intelligent unmanned aerial vehicle system. With the unmanned aerial vehicle and a Jetson TX2 as the carrying and development platform, visual real-time spatial perception modeling and target detection are combined for the unmanned aerial vehicle in a specific scene whose flight speed is limited by that scene;
(1) The content-based scene perception method completes the content-based scene perception task with a deep-learning target detection method. Using the unmanned aerial vehicle as the carrying platform differs from ordinary target detection: first, the unmanned aerial vehicle has a certain flight speed, which imposes a real-time requirement on target detection; second, as the unmanned aerial vehicle flies past an object it moves from far to near, and the shooting angle may be a front view, an oblique view or a top-down view depending on the relative position of the unmanned aerial vehicle and the object, so the target detection method must be rotation- and scale-invariant. A neural network is optimized and constructed to complete the real-time content extraction task of the unmanned aerial vehicle, and with a sufficient amount of data it is trained to accurately identify the targets in the specific scene.
(2) The space-based scene perception method comprises the steps of completing matching and realizing rapid composition by adopting a fast ORB corner detection method, and integrating a visual real-time space perception modeling method into the real-time space perception of a specific scene of the unmanned aerial vehicle.
(3) The intelligent unmanned aerial vehicle system is formed by embedding a space perception module based on a visual real-time space perception modeling method and a target detection content extraction method based on a convolutional neural network into a microcomputer and combining the space perception module and the target detection content extraction method with the unmanned aerial vehicle system. Through experiments, the problem of carrying two kinds of perception modules in the actual flight process is solved.
The method comprises the steps of improving a space perception module of an unmanned aerial vehicle through an integrated three-dimensional visual RGB space extraction method, improving a target detection method, adding a real-time target fine positioning and target fine recognition capability improvement content extraction module to the unmanned aerial vehicle, finally providing navigation and path planning basis for the unmanned aerial vehicle through fusion of the two methods, and establishing an intelligent unmanned aerial vehicle system comprising software and hardware;
The application establishes a sample library and an interesting target feature library of indoor interesting targets, realizes real-time detection and matching of targets by using an end-to-end based rapid neural network, performs real-time composition by using a three-dimensional visual RGB space extraction system while detecting the targets, marks the identified interesting targets in a simulation graph and gives specific position information, and the core method comprises the following steps:
(1) The method is based on a visual real-time space perception modeling method, which comprises the steps of carrying a depth camera on an unmanned aerial vehicle, providing an RGB acquisition chart and a depth acquisition chart for real-time space perception modeling, adopting real-time space perception modeling, namely three-dimensional visual RGB space extraction to finish the perception of a space scene, finally providing a three-dimensional point cloud map for the unmanned aerial vehicle, providing the space positioning of the unmanned aerial vehicle, and providing data support for unmanned aerial vehicle navigation;
(2) The method comprises the steps of extracting target detection content based on deep learning, optimizing real-time processing capacity in terms of target detection by a network based on the speed of unmanned aerial vehicle flight, converting a target detection frame target into a three-dimensional space by combining a conversion relation obtained by scene perception, providing the spatial positions of various targets for the unmanned aerial vehicle, and utilizing the spatial relation unmanned aerial vehicle to perform obstacle avoidance and path planning;
(3) And establishing a real-time sensing and target detection system of the unmanned aerial vehicle, namely based on optimization of real-time scene sensing and real-time target detection, embedding the realization of the two modules into a microcomputer, and establishing an intelligent system comprising software and hardware and suitable for the real-time sensing and target detection of the unmanned aerial vehicle.
1. Content extraction method for real-time accurate target detection
(I) Target real-time accurate detection network data format
Each training data sample comprises a large acquisition graph containing a plurality of objects; for each object in the acquisition graph, the training label includes not only the class of the object but also the coordinates of each corner point of its bounding box. Because the number of objects differs between training acquisition graphs, label formats of different lengths and dimensions would make the loss function difficult to define; this problem is solved by introducing a fixed three-dimensional label format, and the defined format can accept an acquisition graph of any size containing any number of objects.
The acquisition graph is segmented with a regular grid whose cell size is slightly smaller than the smallest object to be detected. Each grid cell carries two pieces of key information: the class of the object and the corner coordinates of the bounding box of the object covering that cell. In addition, when a cell contains no object, a special custom class, the 'dontcare' class, is used so that the data representation keeps a uniform fixed size, and an object coverage value of 0 or 1 is also set to indicate whether the cell contains an object. When several objects fall in the same cell, the object occupying the most pixels in the cell is selected, and when objects overlap, the object whose bounding box has the minimum Y value is used.
(II) real-time accurate detection network framework for targets
The real-time accurate target detection network training is divided into three steps:
the first step, a data layer acquires a training acquisition graph and a label, and a conversion layer carries out online data enhancement;
The second step, the full convolution network performs feature extraction and prediction on the object class and the boundary frame of each grid;
Predicting the object category and the target boundary box of each grid respectively, and then simultaneously calculating errors of two prediction tasks by using a loss function;
the prediction process comprises two points: a clustering function generates the final set of boxes during verification, and a simplified mAP (mean average precision) value measures the performance of the model on the verification data set;
The network receives input acquisition graphs of different sizes and efficiently applies the CNN in a strided sliding-window manner, outputting a multi-dimensional array overlaid on the acquisition graph; GoogLeNet with the final pooling layer removed is used, giving the CNN a sliding-window receptive field of 555 × 555 pixels with a stride of 16 pixels;
A final optimized loss function is generated as a linear combination of two independent loss functions: over the training data samples, the sum of the squares of the differences between the true and predicted object coverage of all grid cells, and the mean absolute difference between the true and predicted bounding-box corner points of the object covered at each cell.
2. Visual real-time space perception modeling method
First, real-time space perception modeling framework
The real-time space perception modeling process comprises the following steps:
Reading sensor acquisition graph data, namely reading and preprocessing acquisition graph information of an unmanned aerial vehicle camera in real-time space perception modeling, wherein the data of a depth camera comprises an RGB acquisition graph and a depth graph corresponding to the RGB acquisition graph;
Modeling a visual odometer, namely calculating the attitude change of a camera and a local map by estimating the rotation and translation relation between every two adjacent acquisition graphs by the visual odometer, wherein the key of the step is feature point extraction and acquisition graph matching;
the back end adopts a nonlinear global optimization algorithm to optimize the camera positions and attitudes from the front end together with the loop-closure detection results from another thread, and corrects the globally consistent trajectory map and point cloud map;
Loop-closure detection judges whether the sensor, or the unmanned aerial vehicle carrying it, has previously passed through the current scene; if a place has been visited before, this information is provided to the back end to correct the position and attitude again;
and fourthly, constructing a cruise map of the unmanned aerial vehicle which meets the task requirements according to the estimated camera track.
1. Visual odometer modeling
The front end receives the video stream of the camera, i.e., the image sequence, estimates the camera motion between adjacent frames by feature matching, and preliminarily obtains odometry information with some accumulated error. Visual odometry modeling comprises four parts (a minimal data-structure sketch follows the list):
First, the frame: the information it carries comprises the pose of the unmanned aerial vehicle camera at the moment the frame was captured, the RGB image, and the depth map;
Second, the camera model: it corresponds to the camera actually used for shooting and contains only the intrinsic parameters;
Third, the map: key frames and landmark points satisfying the matching rule are added to the map; this map is only a local map rather than a global one, containing only landmark points near the current position, while more distant landmark points are deleted.
Fourth, the landmark points: these are map points whose information is known, the known information being the feature descriptor associated with each point; they are obtained by extracting features in batches with a feature matching algorithm.
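A minimal data-structure sketch of these four parts; the field names are illustrative assumptions rather than definitions from this description.

```python
# Hedged sketch of the visual-odometry entities: frame, camera model (intrinsics
# only), local map, and landmark/map point with its feature descriptor.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Camera:                      # intrinsics only, matching the actual camera
    fx: float; fy: float; u0: float; v0: float

@dataclass
class Frame:                       # one captured frame with its pose and images
    pose: np.ndarray               # 4x4 camera pose at capture time
    rgb: np.ndarray                # RGB image
    depth: np.ndarray              # aligned depth map

@dataclass
class MapPoint:                    # landmark: known 3D position plus descriptor
    xyz: np.ndarray
    descriptor: np.ndarray         # feature description from batch extraction

@dataclass
class LocalMap:                    # local map only: keyframes and nearby landmarks
    keyframes: list = field(default_factory=list)
    points: list = field(default_factory=list)

    def prune_far(self, center: np.ndarray, radius: float):
        # drop landmarks far from the current position, as described above
        self.points = [p for p in self.points
                       if np.linalg.norm(p.xyz - center) <= radius]
```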
2. Backend global optimization
The back end analyzes the data to handle noise, using either a linear or a nonlinear global optimization algorithm;
Linear global optimization assumes a linear relation between the image frames captured during shooting and performs state estimation with a Kalman filter; if only adjacent frames are assumed to be linearly related, state estimation is completed with an extended Kalman filter. The quantity optimized is the difference between the observed value and the estimated value, i.e., the error between the observed pixel coordinates and the pixel coordinates of the corresponding 3D point projected onto the two-dimensional plane through the camera pose. Linear global optimization assumes a causal relation between the errors of the camera pose and of the spatial points: the camera pose is solved first, and the positions of the spatial points are then solved from that pose. Nonlinear global optimization instead puts all the data into the same model and solves them jointly, weakening the sequential relation between the data (a sketch of the reprojection error term follows).
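The error term described above can be sketched as follows; the function and parameter names are illustrative, and the projection uses the standard pinhole model with the intrinsics defined later in this description.

```python
# Hedged sketch: difference between an observed pixel and the projection of the
# corresponding 3D point through the current camera pose estimate.
import numpy as np

def reprojection_error(p_world, uv_observed, R, t, fx, fy, u0, v0):
    p_cam = R @ p_world + t                      # world point into camera frame
    u = fx * p_cam[0] / p_cam[2] + u0            # pinhole projection
    v = fy * p_cam[1] / p_cam[2] + v0
    return np.array([uv_observed[0] - u, uv_observed[1] - v])

# Nonlinear global optimization minimizes the sum of squared reprojection errors
# over all poses and points jointly, instead of filtering frame by frame.
```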
3. Loop closure detection
Loop closure detection is implemented with a bag-of-words model, which abstracts features into words; detection then matches the words appearing in two images to judge whether they describe the same scene. Classifying features into words requires training a dictionary that covers all possible words, which in turn needs a massive amount of data; building the dictionary is a clustering process. Assuming 100 million features are extracted from all images, the K-means method clusters them into 100 thousand words. During training, the dictionary is organized as a tree with k branches and depth d, the upper nodes of the tree providing coarse classification and the lower nodes fine classification, extending down to the leaf nodes. Using the tree reduces the time complexity of word lookup to a logarithmic level and accelerates feature matching (see the dictionary-tree sketch below).
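A minimal sketch of such a dictionary tree, assuming the descriptors are stored as a NumPy array and using scikit-learn's KMeans for the clustering; k, the depth, and all function names are illustrative.

```python
# Hedged sketch: recursive k-means with k branches and depth d, then
# logarithmic-time word lookup by descending one branch per level.
import numpy as np
from sklearn.cluster import KMeans

def build_vocab_tree(descriptors, k=10, depth=3):
    if depth == 0 or len(descriptors) < k:
        return {"leaf": True}                            # word ids assigned afterwards
    km = KMeans(n_clusters=k, n_init=4).fit(descriptors)
    children = []
    for i in range(k):
        subset = descriptors[km.labels_ == i]
        children.append(build_vocab_tree(subset, k, depth - 1))
    return {"leaf": False, "kmeans": km, "children": children}

def lookup(tree, descriptor):
    # O(k * d) instead of scanning all words
    node, path = tree, []
    while not node["leaf"]:
        i = int(node["kmeans"].predict(descriptor.reshape(1, -1))[0])
        path.append(i)
        node = node["children"][i]
    return tuple(path)   # the leaf path identifies the visual word
```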
4. Mapping
After the camera poses have been optimized and corrected, the data collected by the camera are used to convert two-dimensional plane points into three-dimensional space, forming three-dimensional point cloud information. Besides the point cloud map, the camera pose optimization process can be displayed in the g2o tool as a pose graph, and the map can be defined and described according to the specific situation.
(II) three-dimensional visual RGB space extraction method
The sensor is a monocular camera equipped to acquire depth information (a depth camera), and the data source comprises an RGB image and a depth map.
A color image and a depth image are obtained with the depth camera, and a geometric model transfers the 2D plane data into 3D space. The origins of the pixel coordinate system and the image coordinate system differ only by an offset, and the axes of the image coordinate system and the camera coordinate system are parallel, differing only by a scaling. The conversion from pixel coordinates to the camera coordinate system is expressed as follows:
x_c = (u - u_0) * d_x, y_c = (v - v_0) * d_y, z_c = d, where u_0, v_0 are the offsets between the origin of the pixel coordinate system and the center of the imaging plane, d_x, d_y are the scaling factors between the pixel coordinates and the actual imaging plane, with d_x = z_c / f_x and d_y = z_c / f_y, and f_x, f_y are the focal lengths of the camera on the x and y axes; written in matrix form: z_c * [u, v, 1]^T = [[f_x, 0, u_0], [0, f_y, v_0], [0, 0, 1]] * [x_c, y_c, z_c]^T.
Because of camera motion, the camera coordinate system and the world coordinate system are not parallel but related by a rotation and a translation; in the subsequent visual odometry calculation the relation between consecutive frames has the same form. The matrix relation is: [x_c, y_c, z_c, 1]^T = [[R, t], [0, 1]] * [x_w, y_w, z_w, 1]^T, where R is the 3x3 rotation matrix and t the translation vector.
Converting the points of the two-dimensional plane into three-dimensional space finally yields a series of point cloud data, and assigning RGB color attributes to these points gives a preliminary colored three-dimensional map (a back-projection sketch follows).
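A minimal back-projection sketch under the model above, assuming a depth map stored in millimeters (the depth_scale value is therefore an assumption) and NumPy arrays for the images.

```python
# Hedged sketch: each pixel (u, v) with depth d is back-projected into camera
# coordinates with the intrinsics fx, fy, u0, v0 and colored from the RGB image.
import numpy as np

def backproject(rgb, depth, fx, fy, u0, v0, depth_scale=1000.0):
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.astype(np.float32) / depth_scale      # raw depth to meters (assumed)
    x = (u - u0) * z / fx                           # x_c = (u - u0) * z / fx
    y = (v - v0) * z / fy                           # y_c = (v - v0) * z / fy
    valid = z > 0
    points = np.stack([x[valid], y[valid], z[valid]], axis=1)
    colors = rgb[valid]                             # attach RGB attributes
    return points, colors

# A world-frame cloud is obtained by applying the estimated pose: P_w = R @ P_c + t.
```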
1. Three-dimensional visual RGB space extraction implementation
(1) Front-end visual odometry
Initialization takes the first frame as the reference for key frame search. Matching between pairs of images uses the ORB algorithm to extract key points, then computes a BRIEF descriptor for each key point, and finally performs fast matching with a fast approximate nearest neighbor algorithm. The ORB corner extraction algorithm adds scale and rotation description to the FAST corner extractor, enriching the feature information, so the feature description is richer, the matching accuracy is higher, and the mapping is more accurate and reliable; the BRIEF descriptor is a binary descriptor built by comparing randomly selected point pairs;
After matching is completed, the 2D points are projected into 3D space according to the depth image, giving the 2D coordinates and corresponding 3D coordinates of a series of points; the camera pose is then estimated by solving a PnP problem, the actual result being the rotation and translation matrix between the two frames. All frames are matched pairwise in sequence and the camera poses computed, finally yielding a complete visual odometry (a sketch of this front-end step follows).
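The front-end step can be sketched with OpenCV as follows. A brute-force Hamming matcher stands in here for the fast approximate nearest neighbor (FLANN-type) matcher named above, purely to keep the sketch short; the camera matrix K, the depth scale, and the function name are assumptions.

```python
# Hedged sketch: ORB keypoints, descriptor matching, back-projection of matched
# points to 3D via the depth map, then solvePnPRansac for the frame-to-frame motion.
import cv2
import numpy as np

def estimate_motion(rgb1, depth1, rgb2, K, depth_scale=1000.0):
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(rgb1, None)
    kp2, des2 = orb.detectAndCompute(rgb2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)  # binary descriptors
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    fx, fy, u0, v0 = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts3d, pts2d = [], []
    for m in matches:
        u, v = kp1[m.queryIdx].pt
        z = depth1[int(v), int(u)] / depth_scale
        if z <= 0:
            continue
        pts3d.append([(u - u0) * z / fx, (v - v0) * z / fy, z])  # 2D -> 3D
        pts2d.append(kp2[m.trainIdx].pt)

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.array(pts3d, np.float32), np.array(pts2d, np.float32), K, None)
    R, _ = cv2.Rodrigues(rvec)          # rotation matrix between the two frames
    return R, tvec
```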
(2) Backend nonlinear global optimization
In building the front-end visual odometry, only adjacent pairs of frames are matched and the corresponding camera poses solved, so error accumulation cannot be avoided and the camera poses need to be corrected.
Three-dimensional visual RGB space extraction expresses the poses computed by the visual odometry as a pose graph consisting of nodes and edges: the nodes represent the individual camera poses and the edges represent the transformations between them. The pose graph not only describes the visual odometry intuitively but also makes the changes of camera pose easy to understand. The nonlinear global optimization is expressed as graph optimization; since the same scene cannot appear in many positions, the pose graph is sparse, and a sparse BA algorithm is used to solve it and correct the camera poses (a sketch of the pose graph structure follows).
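A minimal sketch of the pose graph structure only; actually solving it with sparse BA (for example through g2o) is left to the optimization library, and the class and method names are illustrative.

```python
# Hedged sketch: nodes hold camera poses, edges hold relative transforms from
# odometry or loop closures; loop closure edges are what make correction possible.
import numpy as np

class PoseGraph:
    def __init__(self):
        self.nodes = []      # 4x4 camera poses
        self.edges = []      # (i, j, T_ij): relative transform between poses i and j

    def add_pose(self, T: np.ndarray) -> int:
        self.nodes.append(T)
        return len(self.nodes) - 1

    def add_edge(self, i: int, j: int, T_ij: np.ndarray):
        # odometry edges connect consecutive poses; loop closure edges connect
        # distant poses, which is why the graph stays sparse
        self.edges.append((i, j, T_ij))
```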
(3) Loop closure detection
Although back-end global optimization corrects the camera poses to some extent, a problem remains: after some time, when the unmanned aerial vehicle returns to the starting point or to the same place, the system must be able to tell whether this is the starting point or a place it has already visited. This is the problem loop closure detection solves. If image matching identifies the same scene, the back end obtains an additional piece of optimization information and adjusts the trajectory and the map according to the loop closure result, completing the global optimization. Detecting whether a place has been visited requires comparing the similarity of the current frame with all previous frames, but the longer the flight, the larger the data volume, which greatly reduces real-time performance. To achieve a better effect, three-dimensional visual RGB space extraction replaces exhaustive traversal with a combination of close-range loops and random loops: a close-range loop matches the current frame against the previous n frames, with n chosen according to the situation, while a random loop matches the current frame against n randomly selected earlier frames. A loop closure is reported once such a match is detected (see the candidate-selection sketch below).
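The candidate selection can be sketched as follows; the function name and the default n are illustrative.

```python
# Hedged sketch: compare the current frame against the last n frames (close-range
# loop) plus n randomly chosen earlier frames (random loop) instead of all frames.
import random

def loop_candidates(num_frames_so_far, current_idx, n=5):
    recent = list(range(max(0, current_idx - n), current_idx))   # close-range loop
    older = list(range(0, max(0, current_idx - n)))
    randomly = random.sample(older, min(n, len(older)))          # random loop
    return recent + randomly
```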
3. Unmanned aerial vehicle scene perception and target precision detection system
(I) Unmanned aerial vehicle hardware system
The overall block diagram of the onboard hardware is shown in FIG. 1 and comprises an onboard computer, an onboard module assembly, a camera with a gimbal, and an M100 quadrotor unmanned aerial vehicle. The M100 quadrotor provides the flight platform and the basic flight functions; the camera and gimbal are the image acquisition components of the onboard hardware system; the onboard module assembly is the hardware located between the camera gimbal, the M100 quadrotor, and the onboard computer. Its functions are: 1) sending the video data captured by the camera to the onboard computer; 2) letting the onboard computer control the gimbal through the onboard module assembly; 3) letting the onboard computer control the flight of the M100 quadrotor through the onboard module assembly; 4) feeding the camera's video data into the image transmission system of the M100 quadrotor through the onboard module assembly; and 5) voltage conversion, in which the 24 V obtained from the battery of the M100 unmanned aerial vehicle is converted to 12 V to power the gimbal and the onboard computer.
1. Unmanned aerial vehicle hardware system composition
(1) Airborne computer
The onboard computer is an NVIDIA Jetson TX2 (RTS-ASG003) microcomputer with a total weight of 170 g and the size of a bank card, making it very light.
(2) Airborne module assembly
The onboard module assembly is the intermediate execution and processing unit of the whole onboard hardware system; its block diagram is shown in FIG. 2. Its functions are: 1) the video output of the high-definition camera is split into two paths by an HDMI distributor, one path entering the video capture device and being output to the onboard computer, the other being output through the N1 encoder to the wireless image transmission system of the M100 unmanned aerial vehicle; 2) the USB-to-UART (universal asynchronous receiver/transmitter) and PWM module lets the onboard computer control the flight of the M100 and the gimbal; 3) the vision sensor enables autonomous obstacle avoidance of the M100 unmanned aerial vehicle; 4) the RC receiver receives control signals from the ground remote controller to control the gimbal; 5) the wireless data transmission module provides a low-bandwidth data link between the onboard hardware system and the ground system; and 6) power is supplied to the onboard computer, the gimbal, and all components of the onboard module assembly.
The video acquisition part of the onboard module assembly consists of the HDMI distributor, the video capture device, and the N1 encoder; the HDMI distributor splits the video stream from the high-definition camera into two paths, which enter the video capture device and the N1 encoder respectively through HDMI interfaces.
The wireless data transmission module of the onboard module assembly and the one at the ground end form a low-bandwidth wireless data link, with the sky end and the ground end providing a path for bidirectional data transmission. The Guidance unit is a relatively independent vision sensing system equipped with five sets of combined vision-ultrasonic sensors that monitor environmental information in multiple directions in real time and perceive obstacles; working with the unmanned aerial vehicle's flight controller, it lets the aircraft avoid possible collisions in time during high-speed flight. The RC wireless receiver R7008SB receives the control signals of the ground remote controller and, after internal processing, outputs PWM waveforms to control the motion state of the gimbal; the receiver provides 16 receiving channels. The USB-to-UART and PWM module performs USB to UART (TTL level) conversion; its UART interface is connected to the UART interface of the M100 unmanned aerial vehicle, so the onboard computer can autonomously control the flight of the unmanned aerial vehicle, and the module can also output PWM waveforms to control the motion state of the gimbal. FIG. 3 shows the connection of the output pins of the USB-to-UART and PWM module to the gimbal control signal lines.
(3) Camera and gimbal
A GoPro Hero4 high-definition camera collects the video data, and a MiNi3D Pro gimbal carries the GoPro Hero4 camera to control the viewing angle. The gimbal is a three-axis gimbal realizing motion control in pitch, roll, and heading. Two control modes are provided: in one, the onboard computer outputs PWM waveforms through the USB-to-UART and PWM module to control the gimbal; in the other, the ground remote controller drives the onboard gimbal receiver R7008SB to output PWM waveforms that control the gimbal.
FIG. 4 shows the control signals of the gimbal heading (yaw) axis and pitch axis. Both are 50 Hz PWM waveforms, and position control of the heading and pitch axes is achieved by adjusting the duty cycle of the control signal: a duty cycle of 5.1% corresponds to the minimum position, 7.6% to the neutral position, and 10.1% to the maximum position.
A mode control signal of the gimbal selects among a lock mode, a heading-and-pitch follow mode, and a heading follow mode. When the signal on the mode control line is a 50 Hz PWM signal with a duty cycle between 5% and 6%, the gimbal enters the lock mode: heading, pitch, and roll are locked, and heading and pitch can be controlled by the remote controller or the onboard computer. When the duty cycle is between 6% and 9%, the gimbal enters the heading-and-pitch follow mode: roll is locked, heading smoothly follows the direction of the aircraft nose, and pitch follows the elevation of the nose. When the duty cycle is between 9% and 10%, the MiNi3D Pro gimbal enters the heading follow mode: heading smoothly follows the direction of the nose, while pitch can be controlled by the remote controller or the onboard computer (a small duty-cycle conversion sketch follows).
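A small conversion sketch based on the figures above, mapping a normalized position command to a duty cycle and the corresponding pulse width; the normalized command interface is an assumption for illustration.

```python
# Hedged sketch: map a command in [-1, 1] to a duty cycle between 5.1% (minimum),
# 7.6% (neutral) and 10.1% (maximum) on a 50 Hz PWM signal.
def gimbal_duty_cycle(command: float) -> float:
    command = max(-1.0, min(1.0, command))
    mid, half_range = 7.6, 2.5          # 7.6% center, +/-2.5% to the end stops
    return mid + command * half_range   # command=0 -> 7.6%, command=1 -> 10.1%

def duty_to_pulse_us(duty_pct: float, freq_hz: float = 50.0) -> float:
    period_us = 1e6 / freq_hz           # 20 ms period at 50 Hz
    return period_us * duty_pct / 100.0 # e.g. 7.6% -> 1520 us pulse width
```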
(4) Unmanned aerial vehicle interface
The battery power output interface of the unmanned aerial vehicle feeds the power input interface of the onboard module assembly, so the onboard equipment is powered. The input and output of the vision obstacle avoidance system are connected to the CAN-Bus of the unmanned aerial vehicle, and together with the unmanned aerial vehicle's flight controller it realizes autonomous obstacle avoidance; the power and video interfaces of the N1 encoder are connected to dedicated interfaces on the unmanned aerial vehicle.
2. Unmanned aerial vehicle hardware system integration
The layout of the unmanned aerial vehicle onboard module assembly is shown in FIG. 6. A gimbal power supply system is designed: the 25 V output of the unmanned aerial vehicle battery passes through three DC-DC voltage conversion modules of the onboard module assembly and is converted to 19 V, 12 V, and 5 V respectively, where the 19 V rail powers the onboard computer, the 12 V rail powers the gimbal, and the 5 V rail powers the RC receiver and the HDMI distributor. The internal power supply system of the unmanned aerial vehicle platform powers the Guidance vision sensors, the N1 encoder, the onboard wireless image transmission, and the wireless data transmission; the HDMI video capture device and the onboard wireless data transmission draw power from the USB interface of the onboard computer, and the camera is powered by its own battery.
(II) Unmanned aerial vehicle software system design
Integration of the unmanned aerial vehicle software system is completed on the ROS platform; the modules communicate through the ROS message mechanism, and because the two modules are only loosely coupled, they can continue to be improved separately later (a minimal messaging sketch follows).
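A minimal sketch of this loose message-based coupling using rospy; the topic name and the flat [class_id, x1, y1, x2, y2] message layout are assumptions rather than this system's interface, and the two functions would run as separate ROS nodes.

```python
# Hedged sketch: the detection module publishes target boxes on a topic and the
# space perception/mapping module subscribes to them.
import rospy
from std_msgs.msg import Float32MultiArray

def detection_publisher():
    rospy.init_node('target_detection')
    pub = rospy.Publisher('/target_detection/boxes', Float32MultiArray, queue_size=10)
    rate = rospy.Rate(30)
    while not rospy.is_shutdown():
        msg = Float32MultiArray()
        msg.data = [0.0, 120.0, 80.0, 260.0, 200.0]   # [class_id, x1, y1, x2, y2]
        pub.publish(msg)
        rate.sleep()

def mapping_subscriber():
    rospy.init_node('space_perception')
    rospy.Subscriber('/target_detection/boxes', Float32MultiArray,
                     lambda msg: rospy.loginfo('box: %s', msg.data))
    rospy.spin()
```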
1. Target detection flow
The target detection system adopts the jetson-inference framework and comprises classification, detection, and segmentation; the detection module covers image detection, video detection, and real-time camera detection. Since a video stream is ultimately decomposed into image frames, detection is in essence image detection.
The target detection system converts the video stream into image frames, detects each image with the trained network model, and finally obtains the category of the target and the pixel coordinates of the target box. The output of the target detection system is passed to the visual real-time space perception modeling system for positioning and navigation (a sketch of real-time camera detection follows).
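A sketch of real-time camera detection with the jetson-inference Python API; the pretrained model name and the video source URI are illustrative, and the system described here would instead load its own trained DetectNet-style weights.

```python
# Hedged sketch: capture frames from a camera and run detection on each one,
# printing class, confidence, and box coordinates for every detected target.
import jetson.inference
import jetson.utils

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)  # placeholder model
camera = jetson.utils.videoSource("csi://0")                         # or a file / stream
display = jetson.utils.videoOutput("display://0")

while display.IsStreaming():
    img = camera.Capture()                        # one frame from the video stream
    detections = net.Detect(img)                  # class + box for each target
    for det in detections:
        print(net.GetClassDesc(det.ClassID), det.Confidence,
              det.Left, det.Top, det.Right, det.Bottom)
    display.Render(img)
```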
2. Three-dimensional visual RGB space extraction process
The three-dimensional visual RGB space extraction system receives the RGB image and the depth image. It first matches the RGB images to obtain key frames, then uses the depth images to construct the point cloud map, then performs nonlinear global optimization and loop closure detection to correct the point cloud map, and finally receives the target detection result to complete target positioning in three-dimensional space.
3. Unmanned aerial vehicle airborne processing method integration
The overall framework of the unmanned aerial vehicle onboard processing method is shown in FIG. 7. In the onboard software processing flow, the TX2 receives image data from the camera, the target detection module detects the image content to obtain the position coordinates of the target, and these are transmitted in real time to the visual real-time space perception modeling system, which at that moment is mapping based on key frames. Experiments show that, because the video stream is decomposed into image frames, the continuous images include the key frames, and the key frames therefore also carry target detection results; the detection results corresponding to the key frames are directly reconstructed in the three-dimensional space to realize spatial content extraction (a sketch of this step follows).
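A minimal sketch of attaching a detection result to the three-dimensional map: the center pixel of the detected box is read from the key frame's depth map, back-projected with the intrinsics, and transformed by the key frame pose. All names and the depth scale are illustrative assumptions.

```python
# Hedged sketch: project a 2D detection box into the 3D world frame.
import numpy as np

def locate_target(box, depth, pose, fx, fy, u0, v0, depth_scale=1000.0):
    x1, y1, x2, y2 = box
    u, v = int((x1 + x2) / 2), int((y1 + y2) / 2)      # box center pixel
    z = depth[v, u] / depth_scale
    p_cam = np.array([(u - u0) * z / fx, (v - v0) * z / fy, z, 1.0])
    return (pose @ p_cam)[:3]                          # 4x4 keyframe pose -> world point
```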
4. Unmanned aerial vehicle outdoor application case
The experimental site is a playground with many pedestrians, which fully tests the speed and accuracy of target detection on the microcomputer and the adaptability of target detection in flight.
As shown in FIG. 8, the experimental results show that the unmanned aerial vehicle carrying the microcomputer detects dense crowds very well with DetectNet, with essentially no missed detections. The effect differs little in a slow-moving state, and the task is also completed well in a fast-moving state; the only issue is that at the instant the viewing angle is rotated, the detection box can drift briefly but quickly returns to the correct position. The experiments verify that the application can meet the goal of real-time and accurate target detection in practical use.
5. Unmanned aerial vehicle indoor application case
The experimental site is a room with a relatively complex scene; the indoor environment is characterized by narrow space and many obstacles. When the unmanned aerial vehicle flies indoors and the GPS navigation system fails, navigation is completed by combining inertial navigation with visual perception. Inertial navigation is accurate when the carrier changes direction suddenly but accumulates large errors over long runs, so the unmanned aerial vehicle is positioned by scene perception based on real-time space perception modeling, while the attitude detected by inertial navigation during sudden direction changes is added to the back-end global optimization stage of the real-time space perception modeling to correct the unmanned aerial vehicle's attitude; at the same time, the specific spatial positions of obstacles found by target detection provide a basis for unmanned aerial vehicle navigation and path planning. The indoor mapping and detection effect is shown in FIG. 9:
The output data of the system are the spatial position of the detected target and the attitude of the unmanned aerial vehicle. The spatial perception effect is accurate, and after the target box is normalized an accurate target position can be provided to the unmanned aerial vehicle. It follows that accurate indoor positioning of the unmanned aerial vehicle can also be achieved by combining real-time space perception modeling with target detection.
6. Summary
Beyond integrating real-time space perception modeling, target detection, and the unmanned aerial vehicle system, the unmanned aerial vehicle also carries subsystems for control, positioning and navigation, path planning, and other functions, so when further applications are integrated there is still much work to do on cooperation between the modules and on improving their combined effect. The real-time perception method and the real-time target detection method are integrated on the microcomputer using the unmanned aerial vehicle software platform ROS and, together with the unmanned aerial vehicle system, form an intelligent unmanned aerial vehicle system.
(1) The method realizes SLAM-based spatial scene perception of the unmanned aerial vehicle, establishes a three-dimensional point cloud picture and positioning by utilizing real-time spatial perception modeling, and provides spatial position information for the unmanned aerial vehicle.
(2) Content scene perception based on the DetectNet target detection method is realized, providing the unmanned aerial vehicle with the position of a specific target in three-dimensional space for intelligent navigation and path planning.
(3) A bare unmanned aerial vehicle carries no equipment such as a camera or a microcomputer; this research therefore also completed the hardware design of the intelligent unmanned aerial vehicle and the framework design for the cooperative operation of the unmanned aerial vehicle carrying this specific equipment.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510396604.0A CN120495616A (en) | 2025-04-01 | 2025-04-01 | Intelligent UAV real-time spatial perception and target precision detection system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510396604.0A CN120495616A (en) | 2025-04-01 | 2025-04-01 | Intelligent UAV real-time spatial perception and target precision detection system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN120495616A true CN120495616A (en) | 2025-08-15 |
Family
ID=96662469
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510396604.0A Pending CN120495616A (en) | 2025-04-01 | 2025-04-01 | Intelligent UAV real-time spatial perception and target precision detection system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120495616A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |