
CN114964268B - Unmanned aerial vehicle navigation method and device - Google Patents

Unmanned aerial vehicle navigation method and device

Info

Publication number
CN114964268B
CN114964268B
Authority
CN
China
Prior art keywords
navigation
simulation
model
unmanned aerial vehicle
Prior art date
Legal status
Active
Application number
CN202210902202.XA
Other languages
Chinese (zh)
Other versions
CN114964268A (en)
Inventor
李唯
张宁远
曹一丁
郭伟
杨雷
Current Assignee
Baiyang Times Beijing Technology Co ltd
Original Assignee
Baiyang Times Beijing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Baiyang Times Beijing Technology Co ltd
Priority to CN202210902202.XA
Publication of CN114964268A
Application granted
Publication of CN114964268B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Navigation (AREA)

Abstract

The application discloses an unmanned aerial vehicle navigation method and device. The method comprises the following steps: constructing a simulation environment corresponding to a simulation target unmanned aerial vehicle based on the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments; constructing a deep reinforcement learning model for unmanned aerial vehicle navigation based on the simulation running information of the simulation target unmanned aerial vehicle in the simulation environment; and, when the target unmanned aerial vehicle operates, navigating with the deep reinforcement learning model according to the real operation information of the target unmanned aerial vehicle. When the target unmanned aerial vehicle actually operates, a real operation scheme can be obtained using the model and the real operation information of the target unmanned aerial vehicle, so that navigation of the target unmanned aerial vehicle is achieved without requiring prior familiarity with the environment, which improves the efficiency and accuracy of unmanned aerial vehicle navigation and allows the unmanned aerial vehicle to be used to its full potential.

Description

Unmanned aerial vehicle navigation method and device
Technical Field
The application relates to the technical field of unmanned aerial vehicles, in particular to an unmanned aerial vehicle navigation method and device.
Background
In recent years, with the continuous development of intelligent control, robotics and related technologies, unmanned aerial vehicle autonomous control technology has made great progress. As a flight platform capable of carrying various sensing and computing devices, the unmanned aerial vehicle has the advantages of small size, low manufacturing cost and high flexibility, and can be widely applied to tasks such as area reconnaissance and disaster search and rescue.
At present, in the existing unmanned aerial vehicle navigation method, an environment sensing device carried by the unmanned aerial vehicle is used to become familiar with a known environment, an environment model corresponding to the known environment is built in advance, and autonomous navigation is then realized based on the environment model. Therefore, in the existing unmanned aerial vehicle navigation method, in order to realize a navigation scheme with higher accuracy, higher precision is required of the environment model. In this case, if the known environment changes or the unmanned aerial vehicle enters an unknown environment, it is difficult to achieve accurate autonomous navigation based on the environment model generated in advance.
Disclosure of Invention
The embodiment of the application provides an unmanned aerial vehicle navigation method and device, so as to solve the problem that the traditional unmanned aerial vehicle navigation method is difficult to meet the requirement of accurate navigation.
In a first aspect, an embodiment of the present application provides a method for navigating an unmanned aerial vehicle, including:
constructing a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments;
based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment, constructing a deep reinforcement learning model for unmanned aerial vehicle navigation;
and when the target unmanned aerial vehicle runs, navigating by utilizing the deep reinforcement learning model according to the real running information of the target unmanned aerial vehicle.
Optionally, the constructing a deep reinforcement learning model for unmanned aerial vehicle navigation based on the simulation running information of the simulation target unmanned aerial vehicle in the simulation environment includes:
based on the simulation running information, constructing a navigation strategy model for planning navigation information by using a deep learning algorithm;
constructing a navigation evaluation model for evaluating the navigation information based on the navigation strategy model by using a reinforcement learning algorithm;
and optimizing the navigation strategy model based on the navigation evaluation model until the navigation strategy model is converged, and taking the converged navigation strategy model as the deep reinforcement learning model.
Optionally, the constructing a navigation strategy model for planning navigation information based on the simulation running information and by using a deep learning algorithm includes:
taking the simulation visual information of the simulation target unmanned aerial vehicle in the simulation environment as the input of a navigation prediction model, and taking the simulation navigation information of the simulation target unmanned aerial vehicle in the simulation environment as the output of the navigation prediction model, and constructing the navigation prediction model;
the simulation task information of the simulation target unmanned aerial vehicle in the simulation environment and the output of the navigation prediction model are taken as the input of a navigation matching model together, and the matching degree between the output of the navigation prediction model and the simulation task information is taken as the output of the navigation matching model, so that the navigation matching model is constructed;
And constructing the navigation strategy model based on the navigation prediction model and the navigation matching model.
Optionally, the constructing a navigation evaluation model for evaluating the navigation information based on the navigation strategy model and using a reinforcement learning algorithm includes:
obtaining navigation information matched with the simulation task information from simulation navigation information output by the navigation prediction model as target navigation information;
and taking the output of the navigation strategy model and the target navigation information together as the input of the navigation evaluation model, and taking the reward evaluation value corresponding to the target navigation information as the output of the navigation evaluation model to construct the navigation evaluation model.
Optionally, when the target unmanned aerial vehicle operates, before navigating by using the deep reinforcement learning model according to the real operation information of the target unmanned aerial vehicle, the method further includes:
constructing a navigation test environment of the target unmanned aerial vehicle, and performing navigation test on the deep reinforcement learning model in the navigation test environment to obtain a test result;
determining test random information based on the simulation environment and the navigation test environment;
And updating the deep reinforcement learning model according to the test result and the test random information.
Optionally, when the target unmanned aerial vehicle operates, navigating according to the real operation information of the target unmanned aerial vehicle and by using the deep reinforcement learning model, including:
acquiring real visual information and real task information when the target unmanned aerial vehicle runs;
inputting the real visual information and the real task information into the deep reinforcement learning model;
obtaining predicted navigation information output by the deep reinforcement learning model; the predicted navigation information is navigation information matched with the real task information;
and controlling the target unmanned aerial vehicle to operate based on the predicted navigation information.
Optionally, the constructing a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the device parameters of the target unmanned aerial vehicle and the environment parameters of different known environments includes:
according to the equipment parameters of the target unmanned aerial vehicle, constructing a digital twin model corresponding to the target unmanned aerial vehicle as the simulation target unmanned aerial vehicle;
building a simulation environment set corresponding to different known environments according to the environment parameters of the different known environments;
And constructing the simulation environment based on the digital twin model and the simulation environment set.
Optionally, the constructing a digital twin model corresponding to the target unmanned aerial vehicle as the simulation target unmanned aerial vehicle according to the device parameters of the target unmanned aerial vehicle includes:
based on the control and state estimation system test parameters in the equipment parameters, constructing a control and state estimation system simulation model of the simulation target unmanned aerial vehicle;
based on simulation control parameters output by the control and state estimation system simulation model and power system test parameters in the equipment parameters, constructing a power system simulation model of the simulation target unmanned aerial vehicle;
based on the simulation power system parameters output by the power system simulation model and the dynamic model test parameters in the equipment parameters, constructing a dynamic simulation model of the simulation target unmanned aerial vehicle;
based on simulation dynamics parameters output by the dynamics simulation model and rigid motion model test parameters in the equipment parameters, constructing a rigid motion simulation model of the simulation target unmanned aerial vehicle;
and constructing the digital twin model according to the control and state estimation system simulation model, the power system simulation model, the dynamics simulation model and the rigid body motion simulation model.
Optionally, the unmanned aerial vehicle navigation method further includes:
obtaining simulation motion parameters output by the rigid motion simulation model;
and updating the control and state estimation system simulation model and/or the dynamics simulation model according to the simulation motion parameters.
In a second aspect, an embodiment of the present application provides an unmanned aerial vehicle navigation device, including:
the simulation environment construction module is used for constructing a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments;
the model construction module is used for constructing a deep reinforcement learning model for unmanned aerial vehicle navigation based on simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment;
and the navigation module is used for navigating according to the real operation information of the target unmanned aerial vehicle and by utilizing the deep reinforcement learning model when the target unmanned aerial vehicle operates.
From the above technical solutions, the embodiments of the present application have the following advantages:
According to the method and the device, a simulation environment corresponding to the simulation target unmanned aerial vehicle can be built from the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments, and a deep reinforcement learning model for unmanned aerial vehicle navigation can then be built based on the simulation running information of the simulation target unmanned aerial vehicle in the simulation environment. Therefore, when the target unmanned aerial vehicle operates, navigation can be performed according to the real operation information of the target unmanned aerial vehicle and by using the deep reinforcement learning model. Because the deep reinforcement learning model is constructed based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment, the model can provide a simulation operation scheme of the simulation target unmanned aerial vehicle in the simulation environment. Therefore, when the target unmanned aerial vehicle actually operates, a real operation scheme can be obtained using the model and the real operation information of the target unmanned aerial vehicle, so that navigation of the target unmanned aerial vehicle is achieved without requiring prior familiarity with the environment, which improves the efficiency and accuracy of unmanned aerial vehicle navigation and allows the unmanned aerial vehicle to be used to its full potential.
Drawings
Fig. 1 is a flowchart of a method for navigating an unmanned aerial vehicle according to an embodiment of the present application;
FIG. 2 is a flowchart of an implementation of constructing a deep reinforcement learning model according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an unmanned aerial vehicle navigation device according to an embodiment of the present application.
Detailed Description
As described above, the inventors found in their study of unmanned aerial vehicle navigation methods that: the existing unmanned aerial vehicle navigation method generally uses an environment sensing device carried by the unmanned aerial vehicle to become familiar with a known environment, builds an environment model corresponding to the known environment in advance, and realizes autonomous navigation based on the environment model. Therefore, in the existing unmanned aerial vehicle navigation method, in order to realize a navigation scheme with higher accuracy, higher precision is required of the environment model. In this case, if the known environment changes or the unmanned aerial vehicle enters an unknown environment, it is difficult to achieve accurate autonomous navigation based on the environment model generated in advance.
In order to solve the above problems, an embodiment of the present application provides a method for navigating an unmanned aerial vehicle. The method may include: through the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments, a simulation environment corresponding to the simulation target unmanned aerial vehicle can be constructed, and then a deep reinforcement learning model for unmanned aerial vehicle navigation is constructed based on simulation running information of the simulation target unmanned aerial vehicle in the simulation environment. Therefore, when the target unmanned aerial vehicle operates, navigation can be performed according to the real operation information of the target unmanned aerial vehicle and by using the deep reinforcement learning model.
Because the deep reinforcement learning model is constructed based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment, the model can provide a simulation operation scheme of the simulation target unmanned aerial vehicle in the simulation environment. Therefore, when the target unmanned aerial vehicle actually operates, a real operation scheme can be formed using the model and the real operation information of the target unmanned aerial vehicle, so that navigation of the target unmanned aerial vehicle is achieved without requiring prior familiarity with the environment, which improves the efficiency and accuracy of unmanned aerial vehicle navigation and allows the unmanned aerial vehicle to be used to its full potential.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Fig. 1 is a flowchart of a method for unmanned aerial vehicle navigation according to an embodiment of the present application. Referring to fig. 1, the unmanned aerial vehicle navigation method provided in the embodiment of the present application may include:
S101: and constructing a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments.
Because different types of unmanned aerial vehicles, such as multi-rotor, single-rotor and fixed-wing unmanned aerial vehicles, are configured with different equipment, their influence on environmental parameters such as air flow and air pressure also differs. On this basis, the simulation environment of the simulation target unmanned aerial vehicle can be constructed by comprehensively considering the equipment parameters of the target unmanned aerial vehicle and the parameters of different known environments, so that a higher-precision simulation environment is obtained. The method for acquiring the device parameters of the target unmanned aerial vehicle may not be specifically limited. For example, the device parameters of the target unmanned aerial vehicle may be obtained from the manufacturer associated with the target unmanned aerial vehicle, or, if the control system of the target unmanned aerial vehicle is configured with an unmanned aerial vehicle database, the device parameters may be obtained directly from that database. In addition, the method for acquiring the environmental parameters of different known environments is not limited in particular. For example, the environmental parameters of the corresponding environment may be determined based on an existing environment model, or the environmental parameters of the operating environment may be obtained by using the environment sensing device carried by the target unmanned aerial vehicle each time the target unmanned aerial vehicle operates.
In addition, the embodiment of the present application is not limited to a specific manner of constructing the simulation environment corresponding to the simulation target unmanned aerial vehicle, and for convenience of understanding, the following description is made with reference to a possible implementation manner.
In one possible implementation manner, S101 may specifically include: according to the equipment parameters of the target unmanned aerial vehicle, constructing a digital twin model corresponding to the target unmanned aerial vehicle as a simulation target unmanned aerial vehicle; constructing simulation environment sets corresponding to different known environments according to environment parameters of the different known environments; and constructing an unmanned aerial vehicle simulation environment based on the digital twin model and the simulation environment set. Therefore, high-precision simulation of the target unmanned aerial vehicle is realized through a digital twin technology, and the simulation environment set is combined, so that a deep reinforcement learning model with higher accuracy can be constructed later, and accurate unmanned aerial vehicle navigation is realized.
Specifically, the construction process of the digital twin model corresponding to the target unmanned aerial vehicle may include: based on the control and state estimation system test parameters in the equipment parameters, constructing a control and state estimation system simulation model corresponding to the simulation target unmanned aerial vehicle; based on simulation control parameters output by the control and state estimation system simulation model and power system test parameters in equipment parameters, constructing a power system simulation model corresponding to the simulation target unmanned aerial vehicle; based on simulation power system parameters output by the power system simulation model and dynamic model test parameters in equipment parameters, constructing a dynamic simulation model corresponding to the simulation target unmanned aerial vehicle; based on simulation dynamics parameters output by the dynamics simulation model and rigid motion model test parameters in equipment parameters, constructing a rigid motion simulation model corresponding to the simulation target unmanned aerial vehicle; and constructing a digital twin model according to the control and state estimation system simulation model, the power system simulation model, the dynamics simulation model and the rigid body motion simulation model.
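To make the chaining of these sub-models concrete, the following Python sketch wires placeholder versions of the four simulation models together in the order described above; all class names, parameter names and the pass-through behaviour are illustrative assumptions rather than the patent's actual implementation.

```python
# Minimal, hypothetical sketch of the chained digital-twin sub-models described above.
class SubModel:
    """Placeholder for one sub-model; a real version would hold its test parameters."""
    def __init__(self, test_params):
        self.params = test_params

    def update(self, upstream_output, dt):
        # A real model would integrate physics here; we just tag the stage.
        return {"from": self.params["name"], "input": upstream_output, "dt": dt}

class DigitalTwin:
    def __init__(self, device_params):
        # Construction order mirrors the description: control/state estimation ->
        # power system -> dynamics -> rigid-body motion.
        self.ctrl = SubModel({"name": "control_state_estimation", **device_params})
        self.power = SubModel({"name": "power_system", **device_params})
        self.dyn = SubModel({"name": "dynamics", **device_params})
        self.rigid = SubModel({"name": "rigid_motion", **device_params})

    def step(self, setpoint, dt=0.01):
        control = self.ctrl.update(setpoint, dt)
        power = self.power.update(control, dt)
        dynamics = self.dyn.update(power, dt)
        motion = self.rigid.update(dynamics, dt)
        return motion  # simulated motion parameters, e.g. airflow around the UAV

twin = DigitalTwin({"mass_kg": 1.2})
print(twin.step(setpoint={"target_attitude": [0.0, 0.0, 0.0]}))
```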
In practical application, taking a quad-rotor unmanned aerial vehicle as an example of the target unmanned aerial vehicle, for the control and state estimation system model, a cascaded (series) PID (Proportional Integral Derivative) control algorithm may be adopted to determine the flight control simulation model, a sensor model with added sensor noise may be adopted as the state estimation simulation model, and the control and state estimation system simulation model is then constructed based on the flight control simulation model and the state estimation simulation model.
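As a concrete illustration of the cascaded PID idea mentioned above, the following sketch runs an outer position loop feeding an inner velocity loop on a single axis against a toy plant; the gains, loop structure and plant model are assumptions for illustration only.

```python
# Hypothetical cascaded (series) PID sketch: outer position loop -> inner velocity loop.
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_err) / dt
        self.prev_err = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

outer = PID(kp=1.2, ki=0.0, kd=0.1)   # position error -> velocity setpoint
inner = PID(kp=4.0, ki=0.5, kd=0.05)  # velocity error -> thrust/attitude command

dt, position, velocity, target = 0.01, 0.0, 0.0, 1.0
for _ in range(300):
    vel_setpoint = outer.step(target - position, dt)
    command = inner.step(vel_setpoint - velocity, dt)
    velocity += command * dt          # toy plant, stands in for the power/dynamics models
    position += velocity * dt
print(round(position, 3))
```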
In addition, in order to improve the accuracy of the simulation target unmanned aerial vehicle obtained based on the digital twin technology, in the embodiment of the application, the simulation target unmanned aerial vehicle can be optimized by using the simulation motion parameters output by the rigid motion simulation model. Specifically, the simulation motion parameters output by the rigid motion simulation model can be obtained; and updating the control and state estimation system simulation model and/or the dynamics simulation model according to the simulation motion parameters. Here, the simulation motion parameter may be a simulation air flow rate in a simulation environment in which the simulation target unmanned aerial vehicle is located.
S102: based on simulation running information of a simulation target unmanned aerial vehicle in a simulation environment, a deep reinforcement learning model for unmanned aerial vehicle navigation is constructed.
Here, the simulation running information may include simulation visual information of the simulation target unmanned aerial vehicle in the simulation environment, simulation navigation information of the simulation target unmanned aerial vehicle in the simulation environment, and simulation task information of the simulation target unmanned aerial vehicle in the simulation environment. In addition, for the construction process of the deep reinforcement learning model, reference is made to the description below for technical details.
In addition, in the embodiment of the application, in order to improve the accuracy of the deep reinforcement learning model, the deep reinforcement learning model can be optimized through a virtual-real migration technology. Specifically, a navigation test environment of the target unmanned aerial vehicle can be built, a navigation test of the deep reinforcement learning model is conducted in the navigation test environment to obtain a test result, test random information is then determined based on the simulation environment and the navigation test environment, and the deep reinforcement learning model can then be updated according to the test result and the test random information. The navigation test environment is a real environment, so the test random information can characterize the gap between the simulation environment and the real environment. In practical application, the test random information may be random environment information, such as random illumination information and random wind speed information, and may also be a dynamic blur error model affected by the angular speed of the unmanned aerial vehicle camera, a randomized unmanned aerial vehicle dynamic response model, and the like. Thus, by simulating the gap between the virtual and the real and optimizing the deep reinforcement learning model in this way, the gap can be reduced and the accuracy of the deep reinforcement learning model can be improved.
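The test random information described above resembles domain randomization. A hedged sketch of one way to draw such factors is shown below; the factor names and ranges are assumptions, not values from the patent.

```python
import random

def sample_test_random_info():
    """One draw of randomized environment/sensor/dynamics factors (illustrative only)."""
    return {
        "illumination_scale": random.uniform(0.6, 1.4),       # random lighting
        "wind_speed_mps": random.uniform(0.0, 8.0),            # random wind
        "motion_blur_px": random.uniform(0.0, 3.0),            # blur driven by camera angular rate
        "dynamics_response_scale": random.gauss(1.0, 0.05),    # randomized UAV dynamic response
    }

# During the model update step, each simulated training episode would be replayed under
# a fresh draw, so the policy is exposed to the variability that separates the simulation
# environment from the navigation test environment.
print(sample_test_random_info())
```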
S103: when the target unmanned aerial vehicle operates, navigation is performed by utilizing the deep reinforcement learning model according to the real operation information of the target unmanned aerial vehicle.
Here, the real operation information may include real visual information and real task information when the target unmanned aerial vehicle is operated.
The real visual information may reflect the environment information captured by the target unmanned aerial vehicle during operation. In particular, if the target unmanned aerial vehicle is configured with an RGB (Red Green Blue) camera and a depth camera, the real visual information can be obtained in the following manner: capturing images of the operating environment with the RGB camera and the depth camera respectively; performing feature processing and image recognition on the respectively captured images to obtain the respectively processed images and image recognition results; and taking the respectively processed images and the image recognition results as the real visual information. In addition, the real visual information may be acquired in real time or according to a preset acquisition frequency, for example, 60 frames per second, which is not particularly limited in this embodiment of the present application.
The real task information may represent the destination of the target unmanned aerial vehicle in its environment at runtime. Specifically, the real task information can be obtained through an instruction issued by the unmanned aerial vehicle operator: the operator can directly use an information input module configured on the target unmanned aerial vehicle to issue an instruction containing the real task information to the target unmanned aerial vehicle. For example, the information input module may be embodied as a keyboard, and the unmanned aerial vehicle operator manually inputs the instruction containing the real task information by operating the keyboard, so that the target unmanned aerial vehicle obtains the instruction. Alternatively, the information input module may be embodied as a voice acquisition module, the unmanned aerial vehicle operator inputs the instruction containing the real task information by voice, and the target unmanned aerial vehicle performs voice recognition to determine the real task information.
In addition, the implementation process of performing actual navigation of the target unmanned aerial vehicle by using the deep reinforcement learning model may not be specifically limited. For ease of understanding, the following description is provided in connection with one possible embodiment.
In one possible implementation, S103 may specifically include: acquiring the real visual information and the real task information when the target unmanned aerial vehicle runs; inputting the real visual information and the real task information into the deep reinforcement learning model; obtaining the predicted navigation information output by the deep reinforcement learning model; and controlling the operation of the target unmanned aerial vehicle based on the predicted navigation information. The predicted navigation information is navigation information matched with the real task information. Therefore, the deep reinforcement learning model constructed from the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment can, once provided with the real visual information and the real task information, produce a real operation scheme to realize the navigation of the target unmanned aerial vehicle without requiring prior familiarity with the environment, which improves the efficiency and accuracy of unmanned aerial vehicle navigation and allows the unmanned aerial vehicle to be used to its full potential.
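The runtime steps of S103 can be summarized in a short sketch like the following; the drone interface methods (capture, read instruction, execute) and the stub model are hypothetical stand-ins rather than the patent's API.

```python
# Hypothetical runtime loop for S103; the StubDrone interface is illustrative only.
class StubDrone:
    def __init__(self, steps=3):
        self.steps = steps

    def capture_visual_information(self):   # processed RGB/depth images + recognition result
        return {"rgb": "frame", "depth": "frame", "recognition": []}

    def read_task_instruction(self):        # keyboard or voice instruction
        return "find the boy with the yellow hat"

    def task_finished(self):
        self.steps -= 1
        return self.steps < 0

    def execute(self, nav):
        print("executing", nav)

def navigate(drone, model_predict):
    task = drone.read_task_instruction()
    while not drone.task_finished():
        visual = drone.capture_visual_information()
        predicted_nav = model_predict(visual, task)   # deep reinforcement learning model inference
        drone.execute(predicted_nav)

navigate(StubDrone(), model_predict=lambda visual, task: {"waypoint": (1.0, 0.0, 2.0)})
```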
It can be appreciated that in the unmanned aerial vehicle navigation process, related operations can be performed not only for the target unmanned aerial vehicle, such as building an unmanned aerial vehicle simulation environment, building a deep reinforcement learning model for unmanned aerial vehicle navigation, and the like, but also for different unmanned aerial vehicles, so as to realize autonomous navigation for various unmanned aerial vehicles. In order to facilitate understanding of a navigation method for a specific unmanned aerial vehicle, in the embodiment of the present application, a target unmanned aerial vehicle is taken as an example to make a detailed description.
Based on the above relevant content of S101-S103, in the embodiment of the present application, a simulation environment corresponding to the simulation target unmanned aerial vehicle can first be constructed from the device parameters of the target unmanned aerial vehicle and the environmental parameters of different known environments, and a deep reinforcement learning model for unmanned aerial vehicle navigation is then constructed based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment. Therefore, when the target unmanned aerial vehicle operates, navigation can be performed according to the real operation information of the target unmanned aerial vehicle and by using the deep reinforcement learning model. Because the deep reinforcement learning model is constructed based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment, the model can provide a simulation operation scheme of the simulation target unmanned aerial vehicle in the simulation environment. Therefore, when the target unmanned aerial vehicle actually operates, a real operation scheme can be obtained using the model and the real operation information of the target unmanned aerial vehicle, so that navigation of the target unmanned aerial vehicle is achieved without requiring prior familiarity with the environment, which improves the efficiency and accuracy of unmanned aerial vehicle navigation and allows the unmanned aerial vehicle to be used to its full potential.
In order to achieve accurate autonomous navigation of the unmanned aerial vehicle, the embodiment of the application can adopt deep reinforcement learning to navigate the target unmanned aerial vehicle. Based on this, embodiments of the present application may provide one possible implementation of constructing a deep reinforcement learning model. Which may specifically include S201-S203. S201 to S203 will be described below with reference to the embodiments and drawings, respectively.
Fig. 2 is a flowchart of an implementation manner of constructing a deep reinforcement learning model according to an embodiment of the present application. As shown in connection with fig. 2, S201 to S203 may specifically include:
s201: based on the simulation running information, a navigation strategy model for planning navigation information is constructed by utilizing a deep learning algorithm.
The embodiment of the present application may not be limited to a specific process for constructing the navigation policy model, and for convenience of understanding, a possible implementation will be described below.
In one possible implementation, S201 may specifically include: taking simulation visual information of the simulation target unmanned aerial vehicle in a simulation environment as input of a navigation prediction model, and taking simulation navigation information of the simulation target unmanned aerial vehicle in the simulation environment as output of the navigation prediction model to construct the navigation prediction model; the simulation task information of the simulation target unmanned aerial vehicle in the simulation environment and the output of the navigation prediction model are taken as the input of the navigation matching model together, and the matching degree between the output of the navigation prediction model and the simulation task information is taken as the output of the navigation matching model, so that the navigation matching model is constructed; and constructing a navigation strategy model based on the navigation prediction model and the navigation matching model. Here, the navigation prediction model can predict the navigation path based on the simulation visual information of the simulation target unmanned aerial vehicle in the simulation environment, and the navigation matching model can judge the matching degree between the navigation path predicted by the navigation prediction model and the simulation task information, so as to facilitate execution of the task.
The navigation prediction model may be composed of a multi-layered network structure. Specifically, taking as examples of the simulated visual information an RGB image captured by the RGB camera and subjected to the feature processing, a depth image captured by the depth camera and subjected to the feature processing, and an image recognition result of the depth image, the deep learning models corresponding to these three kinds of simulated visual information jointly form the navigation prediction model. The network structure of the deep learning model corresponding to the RGB image can be embodied as a ResNet50 network as the first layer and a fully-connected layer as the second layer; the network structures of the deep learning models corresponding to the depth image and to the image recognition result of the depth image can each be embodied as a CNN (Convolutional Neural Network) as the first layer and a fully-connected layer as the second layer. Further, processing the deep learning networks corresponding to the three kinds of simulated visual information may include performing joint embedding training on the three network structures, embedding them into the same vector space for information fusion, and storing them through a memory network.
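A hedged PyTorch sketch of a three-branch navigation prediction network of the kind described above (ResNet50 plus a fully-connected layer for the RGB image, small CNN branches for the depth image and its recognition result, joint embedding into one vector space, and a simple recurrent memory) is given below; the layer sizes, the GRU memory and the action-logit head are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class NavigationPredictionNet(nn.Module):
    def __init__(self, embed_dim=256, num_actions=8):
        super().__init__()
        rgb_backbone = resnet50(weights=None)
        rgb_backbone.fc = nn.Linear(rgb_backbone.fc.in_features, embed_dim)  # ResNet50 + FC branch
        self.rgb_branch = rgb_backbone
        self.depth_branch = nn.Sequential(                  # CNN + FC branch for the depth image
            nn.Conv2d(1, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, embed_dim))
        self.seg_branch = nn.Sequential(                    # CNN + FC branch for the recognition result
            nn.Conv2d(1, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, embed_dim))
        self.memory = nn.GRU(embed_dim, embed_dim, batch_first=True)  # simple memory over fused embeddings
        self.head = nn.Linear(embed_dim, num_actions)        # predicted navigation action logits

    def forward(self, rgb, depth, seg):
        fused = self.rgb_branch(rgb) + self.depth_branch(depth) + self.seg_branch(seg)  # shared vector space
        out, _ = self.memory(fused.unsqueeze(1))
        return self.head(out[:, -1])

net = NavigationPredictionNet()
logits = net(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224), torch.randn(1, 1, 224, 224))
print(logits.shape)  # torch.Size([1, 8])
```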
In addition, the navigation matching model may be a classification model, such as a Transformer model. For example, if the simulated task information is embodied as searching for a boy with a yellow hat, then when the simulated navigation information is embodied as at least one boy with a yellow hat appearing in the simulated visual information of the target unmanned aerial vehicle, the matching degree output by the navigation matching model can be expressed as a match; when the simulated navigation information is embodied as no boy with a yellow hat appearing in the simulated visual information of the target unmanned aerial vehicle, the matching degree output by the navigation matching model can be expressed as a mismatch.
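A minimal sketch of the navigation matching model as a binary match/mismatch classifier is shown below; the tiny Transformer encoder configuration and the two-token input arrangement are assumptions for illustration.

```python
import torch
import torch.nn as nn

class NavigationMatchingModel(nn.Module):
    def __init__(self, embed_dim=256):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(embed_dim, 2)   # 0 = mismatch, 1 = match

    def forward(self, nav_embedding, task_embedding):
        # Treat the predicted navigation output and the task description as a two-token
        # sequence and classify from the first token's encoding.
        tokens = torch.stack([nav_embedding, task_embedding], dim=1)
        encoded = self.encoder(tokens)
        return self.classifier(encoded[:, 0])

matcher = NavigationMatchingModel()
print(matcher(torch.randn(1, 256), torch.randn(1, 256)).shape)  # torch.Size([1, 2])
```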
S202: based on the navigation strategy model, constructing a navigation evaluation model for evaluating navigation information by using a reinforcement learning algorithm.
The process of constructing the navigation evaluation model may not be specifically limited, and for convenience of understanding, the following description will be made with reference to one possible implementation.
In one possible implementation, S202 may specifically include: obtaining navigation information matched with the simulation task information from the simulation navigation information output by the navigation prediction model as target navigation information; and taking the output of the navigation strategy model and the target navigation information together as the input of the navigation evaluation model, and taking the reward evaluation value corresponding to the target navigation information as the output of the navigation evaluation model, to construct the navigation evaluation model. Here, the reward evaluation value may be determined based on the criteria required in the actual application, for example, the completion time of the simulation task, or the like. In this way, the accuracy of the navigation prediction model is judged by evaluating the target navigation information, which facilitates optimizing the navigation prediction model based on the navigation evaluation model.
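The navigation evaluation model can be sketched as a small critic-style network that scores the policy output together with the target navigation information and emits a reward estimate; the layer sizes and input dimensions below are assumptions.

```python
import torch
import torch.nn as nn

class NavigationEvaluationModel(nn.Module):
    def __init__(self, embed_dim=256):
        super().__init__()
        self.value_head = nn.Sequential(
            nn.Linear(embed_dim * 2, 128), nn.ReLU(),
            nn.Linear(128, 1))   # scalar reward evaluation value

    def forward(self, policy_output, target_navigation):
        return self.value_head(torch.cat([policy_output, target_navigation], dim=-1))

critic = NavigationEvaluationModel()
print(critic(torch.randn(1, 256), torch.randn(1, 256)).shape)  # torch.Size([1, 1])
```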
S203: and optimizing the navigation strategy model based on the navigation evaluation model until the navigation strategy model is converged, and taking the converged navigation strategy model as a deep reinforcement learning model.
The navigation evaluation model is used to optimize the parameters of the navigation strategy model, so that the accuracy of the navigation prediction model is further improved, the efficiency and accuracy of unmanned aerial vehicle navigation are improved, and the unmanned aerial vehicle can be used to its full potential.
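The optimization in S203 can be illustrated with an actor-critic style loop in which the evaluation model's score drives updates of the strategy model; the toy networks, synthetic batch and loss forms below are assumptions, not the patent's training procedure.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))      # stands in for the navigation strategy model
critic = nn.Sequential(nn.Linear(16 + 4, 64), nn.ReLU(), nn.Linear(64, 1))  # stands in for the navigation evaluation model
policy_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

obs = torch.randn(32, 16)      # simulated visual/task features
reward = torch.randn(32, 1)    # reward evaluation values (e.g. based on task completion time)

for step in range(200):
    # Critic: regress the reward evaluation value for the current policy's navigation output.
    nav = policy(obs).detach()
    critic_loss = nn.functional.mse_loss(critic(torch.cat([obs, nav], dim=-1)), reward)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Policy: maximize the critic's score of its own navigation output until it converges.
    policy_loss = -critic(torch.cat([obs, policy(obs)], dim=-1)).mean()
    policy_opt.zero_grad(); policy_loss.backward(); policy_opt.step()

print(round(policy_loss.item(), 3))
```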
Based on the above-mentioned related content of S201-S203, in the embodiment of the present application, navigation information is planned by constructing a navigation strategy model, and a navigation evaluation model is constructed to evaluate the navigation information and to update and optimize the navigation strategy model. Therefore, when the target unmanned aerial vehicle actually operates, the finally obtained deep reinforcement learning model and the real operation information of the target unmanned aerial vehicle can be used to obtain a real operation scheme to realize the navigation of the target unmanned aerial vehicle, without requiring prior familiarity with the environment, thereby improving the efficiency and accuracy of unmanned aerial vehicle navigation and allowing the unmanned aerial vehicle to be used to its full potential.
Based on the unmanned aerial vehicle navigation method provided by the embodiment, the embodiment of the application also provides an unmanned aerial vehicle navigation device. The unmanned aerial vehicle navigation device is described below with reference to the embodiments and drawings, respectively.
Fig. 3 is a schematic structural diagram of an unmanned aerial vehicle navigation device according to an embodiment of the present application. Referring to fig. 3, the unmanned aerial vehicle navigation device 300 provided in the embodiment of the present application may include:
The simulation environment construction module 301 is configured to construct a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the device parameter of the target unmanned aerial vehicle and the environment parameters of different known environments;
the model construction module 302 is configured to construct a deep reinforcement learning model for unmanned aerial vehicle navigation based on simulation operation information of the simulation target unmanned aerial vehicle in a simulation environment;
the navigation module 303 is configured to navigate by using the deep reinforcement learning model according to real operation information of the target unmanned aerial vehicle when the target unmanned aerial vehicle is operating.
In the embodiment of the application, through the cooperation of the simulation environment construction module 301, the model construction module 302 and the navigation module 303, when the target unmanned aerial vehicle actually operates, the model and the real operation information of the target unmanned aerial vehicle can be used to obtain a real operation scheme to realize the navigation of the target unmanned aerial vehicle, without requiring prior familiarity with the environment, so that the efficiency and accuracy of unmanned aerial vehicle navigation are improved and the unmanned aerial vehicle can be used to its full potential.
In one embodiment, to improve the efficiency and accuracy of unmanned aerial vehicle navigation, the model building module 302 may specifically include:
the navigation strategy model construction module is used for constructing a navigation strategy model for planning navigation information based on simulation operation information by utilizing a deep learning algorithm;
The navigation evaluation model construction module is used for constructing a navigation evaluation model for evaluating navigation information based on the navigation strategy model by utilizing a reinforcement learning algorithm;
and the model optimization module is used for optimizing the navigation strategy model based on the navigation evaluation model until the navigation strategy model is converged, and taking the converged navigation strategy model as a deep reinforcement learning model.
As an implementation manner, in order to improve the efficiency and accuracy of unmanned aerial vehicle navigation, the navigation policy model building module specifically may include:
the first construction module is used for taking simulation visual information of the simulation target unmanned aerial vehicle in a simulation environment as input of a navigation prediction model, and taking simulation navigation information of the simulation target unmanned aerial vehicle in the simulation environment as output of the navigation prediction model to construct the navigation prediction model;
the second construction module is used for taking the simulation task information of the simulation target unmanned aerial vehicle in the simulation environment and the output of the navigation prediction model together as the input of the navigation matching model, and taking the matching degree between the output of the navigation prediction model and the simulation task information as the output of the navigation matching model to construct the navigation matching model;
and the third construction module is used for constructing a navigation strategy model based on the navigation prediction model and the navigation matching model.
As an implementation manner, in order to improve the efficiency and accuracy of unmanned aerial vehicle navigation, the navigation evaluation model building module specifically may include:
the fourth construction module is used for acquiring navigation information matched with the simulation task information from the simulation navigation information output by the navigation prediction model as target navigation information;
and the fifth construction module is used for constructing a navigation evaluation model by taking the output of the navigation strategy model and the target navigation information together as the input of the navigation evaluation model and taking the reward evaluation value corresponding to the target navigation information as the output of the navigation evaluation model.
As an embodiment, in order to improve the efficiency and accuracy of unmanned aerial vehicle navigation, the unmanned aerial vehicle navigation device 300 further includes:
the navigation test module is used for constructing a navigation test environment of the target unmanned aerial vehicle, and performing navigation test on the deep reinforcement learning model in the navigation test environment to obtain a test result;
the test random information determining module is used for determining test random information based on the simulation environment and the navigation test environment;
and the first model updating module is used for updating the deep reinforcement learning model according to the test result and the test random information.
As an embodiment, in order to improve the efficiency and accuracy of unmanned aerial vehicle navigation, the navigation module 303 specifically includes:
the first navigation module is used for acquiring real visual information and real task information when the target unmanned aerial vehicle runs;
the second navigation module is used for inputting real visual information and real task information into the deep reinforcement learning model;
the third navigation module is used for acquiring predicted navigation information output by the deep reinforcement learning model; the predicted navigation information is the navigation information matched with the real task information;
and the fourth navigation module is used for controlling the operation of the target unmanned aerial vehicle based on the predicted navigation information.
As an embodiment, to improve the efficiency and accuracy of unmanned aerial vehicle navigation, the simulation environment construction module 301 may specifically include:
the digital twin model construction module is used for constructing a digital twin model corresponding to the target unmanned aerial vehicle as a simulation target unmanned aerial vehicle according to the equipment parameters of the target unmanned aerial vehicle;
the simulation environment set building module is used for building simulation environment sets corresponding to different known environments according to the environment parameters of the different known environments;
the simulation environment construction submodule is used for constructing a simulation environment based on the digital twin model and the simulation environment set.
As an implementation manner, in order to improve the efficiency and accuracy of unmanned aerial vehicle navigation, the digital twin model building module specifically may include:
the first simulation model construction module is used for constructing a control and state estimation system simulation model of the simulation target unmanned aerial vehicle based on control and state estimation system test parameters in the equipment parameters;
the second simulation model construction module is used for constructing a power system simulation model of the simulation target unmanned aerial vehicle based on simulation control parameters output by the control and state estimation system simulation model and power system test parameters in the equipment parameters;
the third simulation model construction module is used for constructing a dynamic simulation model of the simulation target unmanned aerial vehicle based on the simulation power system parameters output by the power system simulation model and the dynamic model test parameters in the equipment parameters;
the fourth simulation model construction module is used for constructing a rigid motion simulation model of the simulation target unmanned aerial vehicle based on simulation dynamic parameters output by the dynamic simulation model and rigid motion model test parameters in equipment parameters;
and the fifth simulation model construction module is used for constructing a digital twin model according to the control and state estimation system simulation model, the power system simulation model, the dynamics simulation model and the rigid body motion simulation model.
As an embodiment, in order to improve the efficiency and accuracy of unmanned aerial vehicle navigation, the unmanned aerial vehicle navigation device 300 further includes:
the simulation motion parameter acquisition module is used for acquiring simulation motion parameters output by the rigid motion simulation model;
and the second model updating module is used for updating the control and state estimation system simulation model and/or the dynamic simulation model according to the simulation motion parameters.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (7)

1. A method of unmanned aerial vehicle navigation, comprising:
constructing a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments; the simulation target unmanned aerial vehicle is a digital twin model of the target unmanned aerial vehicle constructed based on the equipment parameters;
Based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment, constructing a deep reinforcement learning model for unmanned aerial vehicle navigation;
when the target unmanned aerial vehicle runs, navigation is carried out by utilizing the deep reinforcement learning model according to the real running information of the target unmanned aerial vehicle;
based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment, constructing a deep reinforcement learning model for unmanned aerial vehicle navigation, comprising:
based on the simulation running information, constructing a navigation strategy model for planning navigation information by using a deep learning algorithm;
constructing a navigation evaluation model for evaluating the navigation information based on the navigation strategy model by using a reinforcement learning algorithm;
optimizing the navigation strategy model based on the navigation evaluation model until the navigation strategy model converges, and taking the converged navigation strategy model as the deep reinforcement learning model;
based on the simulation operation information, and by using a deep learning algorithm, constructing a navigation strategy model for planning navigation information, including:
taking the simulation visual information of the simulation target unmanned aerial vehicle in the simulation environment as the input of a navigation prediction model, and taking the simulation navigation information of the simulation target unmanned aerial vehicle in the simulation environment as the output of the navigation prediction model, and constructing the navigation prediction model;
The simulation task information of the simulation target unmanned aerial vehicle in the simulation environment and the output of the navigation prediction model are taken as the input of a navigation matching model together, and the matching degree between the output of the navigation prediction model and the simulation task information is taken as the output of the navigation matching model, so that the navigation matching model is constructed;
constructing the navigation strategy model based on the navigation prediction model and the navigation matching model;
when the target unmanned aerial vehicle operates, according to the real operation information of the target unmanned aerial vehicle and before the navigation is performed by using the deep reinforcement learning model, the method further comprises the following steps:
constructing a navigation test environment of the target unmanned aerial vehicle, and performing navigation test on the deep reinforcement learning model in the navigation test environment to obtain a test result;
determining test random information based on the simulation environment and the navigation test environment;
and updating the deep reinforcement learning model according to the test result and the test random information.
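By way of illustration only (this sketch is not part of the claims), the navigation strategy model of claim 1 can be read as a navigation prediction sub-network plus a navigation matching sub-network. The claims do not specify any network architecture, so every module name, layer size and tensor dimension below is an assumption, written here in Python/PyTorch.

# Illustrative sketch only: all architectures, dimensions and names are assumptions.
import torch
import torch.nn as nn

class NavigationPredictionModel(nn.Module):
    """Maps simulated visual observations to candidate navigation information."""
    def __init__(self, nav_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(              # simple CNN image encoder
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(32, nav_dim)         # e.g. velocity + yaw command

    def forward(self, image):
        return self.head(self.encoder(image))

class NavigationMatchingModel(nn.Module):
    """Scores how well the predicted navigation information matches the task."""
    def __init__(self, nav_dim=4, task_dim=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(nav_dim + task_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid())        # matching degree in [0, 1]

    def forward(self, nav_info, task_info):
        return self.mlp(torch.cat([nav_info, task_info], dim=-1))

class NavigationStrategyModel(nn.Module):
    """Policy model combining the prediction and matching sub-models."""
    def __init__(self):
        super().__init__()
        self.predict = NavigationPredictionModel()
        self.match = NavigationMatchingModel()

    def forward(self, image, task_info):
        nav_info = self.predict(image)
        return nav_info, self.match(nav_info, task_info)

Keeping the two heads separate mirrors the claim's decomposition: navigation information is first proposed from visual input and only then scored against the task information.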
2. The method of claim 1, wherein the constructing a navigation evaluation model for evaluating the navigation information based on the navigation strategy model and using a reinforcement learning algorithm comprises:
obtaining navigation information matched with the simulation task information from the simulation navigation information output by the navigation prediction model as target navigation information;
and taking the output of the navigation strategy model and the target navigation information together as the input of the navigation evaluation model, and taking the reward evaluation value corresponding to the target navigation information as the output of the navigation evaluation model, to construct the navigation evaluation model.
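As a hedged illustration of claim 2 (not the patent's actual implementation), the navigation evaluation model can be sketched as a critic that consumes the strategy model's output together with the target navigation information and emits a scalar reward evaluation value; the dimensions and the loss below are assumptions.

# Illustrative only: a minimal critic in the spirit of claim 2.
import torch
import torch.nn as nn

class NavigationEvaluationModel(nn.Module):
    """Estimates a reward evaluation value for the target navigation information."""
    def __init__(self, nav_dim=4):
        super().__init__()
        self.critic = nn.Sequential(
            nn.Linear(nav_dim * 2, 64), nn.ReLU(),
            nn.Linear(64, 1))                      # scalar reward estimate

    def forward(self, policy_output, target_nav_info):
        x = torch.cat([policy_output, target_nav_info], dim=-1)
        return self.critic(x)

# Actor-critic style sketch: the evaluation model's value guides the strategy
# model toward navigation outputs with higher estimated reward.
def policy_loss(evaluation_model, policy_output, target_nav_info):
    return -evaluation_model(policy_output, target_nav_info).mean()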
3. The method according to claim 1 or 2, wherein the navigating with the deep reinforcement learning model according to the real operation information of the target unmanned aerial vehicle when the target unmanned aerial vehicle is operating comprises:
acquiring real visual information and real task information when the target unmanned aerial vehicle runs;
inputting the real visual information and the real task information into the deep reinforcement learning model;
obtaining predicted navigation information output by the deep reinforcement learning model; the predicted navigation information is navigation information matched with the real task information;
and controlling the target unmanned aerial vehicle to operate based on the predicted navigation information.
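A minimal sketch of the run-time use described in claim 3, assuming the strategy model interface from the earlier sketch; the 0.5 matching threshold and the send_command callback are illustrative placeholders, not part of the claimed method.

# Illustrative inference loop for claim 3; flight-control integration is assumed.
import torch

def navigate_step(strategy_model, camera_image, task_info, send_command):
    """One control step: real visual and task information in, navigation command out."""
    strategy_model.eval()
    with torch.no_grad():
        nav_info, match_score = strategy_model(camera_image, task_info)
    # Only act on predictions that sufficiently match the current task.
    if match_score.item() > 0.5:          # threshold is an assumption
        send_command(nav_info.squeeze(0).tolist())
    return nav_info, match_score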
4. The method according to claim 1 or 2, wherein the constructing a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments includes:
building a simulation environment set corresponding to the different known environments according to the environment parameters of the different known environments;
and constructing the simulation environment based on the digital twin model and the simulation environment set.
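Purely as an illustration of claim 4, a simulation environment set can be represented as one environment record per known environment parameter set, each paired with the digital twin; the parameter fields named below (wind speed, obstacle density, GPS noise) are assumptions, since the claims leave the environment parameters unspecified.

# Illustrative only: environment parameter fields are assumed for the example.
from dataclasses import dataclass
from typing import List

@dataclass
class EnvironmentParameters:
    name: str
    wind_speed_mps: float
    obstacle_density: float
    gps_noise_std_m: float

@dataclass
class SimulationEnvironment:
    params: EnvironmentParameters
    twin: object                      # the digital twin model of the UAV

def build_simulation_environment_set(param_list: List[EnvironmentParameters],
                                     digital_twin) -> List[SimulationEnvironment]:
    """One simulated environment per known environment parameter set."""
    return [SimulationEnvironment(params=p, twin=digital_twin) for p in param_list]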
5. The method according to claim 4, wherein the constructing a digital twin model corresponding to the target unmanned aerial vehicle as the simulation target unmanned aerial vehicle according to the equipment parameters of the target unmanned aerial vehicle includes:
based on the control and state estimation system test parameters in the equipment parameters, constructing a control and state estimation system simulation model of the simulation target unmanned aerial vehicle;
based on simulation control parameters output by the control and state estimation system simulation model and power system test parameters in the equipment parameters, constructing a power system simulation model of the simulation target unmanned aerial vehicle;
based on the simulation power system parameters output by the power system simulation model and the dynamic model test parameters in the equipment parameters, constructing a dynamic simulation model of the simulation target unmanned aerial vehicle;
based on simulation dynamics parameters output by the dynamics simulation model and rigid body motion model test parameters in the equipment parameters, constructing a rigid body motion simulation model of the simulation target unmanned aerial vehicle;
and constructing the digital twin model according to the control and state estimation system simulation model, the power system simulation model, the dynamics simulation model and the rigid body motion simulation model.
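As a hedged sketch of the chain described in claim 5 (not the patent's actual models), the digital twin can be assembled by feeding each sub-model's output into the next: control and state estimation, power system, dynamics, rigid body motion. All update equations and gains below are placeholders.

# Illustrative only: each sub-model is reduced to a trivial placeholder.
class ControlStateEstimationModel:
    def step(self, reference, measured_state):
        # PD-style correction toward the reference (assumed for illustration)
        return [0.5 * (r - m) for r, m in zip(reference, measured_state)]

class PowerSystemModel:
    def step(self, control):
        # map control commands to motor thrusts (placeholder gain)
        return [2.0 * c for c in control]

class DynamicsModel:
    def step(self, thrusts):
        # net force from motor thrusts (placeholder aggregation)
        return sum(thrusts)

class RigidBodyMotionModel:
    def __init__(self):
        self.velocity, self.position = 0.0, 0.0
    def step(self, net_force, mass=1.5, dt=0.01):
        self.velocity += (net_force / mass) * dt
        self.position += self.velocity * dt
        return self.position, self.velocity

class DigitalTwin:
    """Chains the four simulation models in the order given in claim 5."""
    def __init__(self):
        self.ctrl = ControlStateEstimationModel()
        self.power = PowerSystemModel()
        self.dyn = DynamicsModel()
        self.body = RigidBodyMotionModel()

    def step(self, reference, measured_state):
        control = self.ctrl.step(reference, measured_state)
        thrusts = self.power.step(control)
        force = self.dyn.step(thrusts)
        return self.body.step(force)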
6. The method of claim 5, wherein the method further comprises:
obtaining simulation motion parameters output by the rigid body motion simulation model;
and updating the control and state estimation system simulation model and/or the dynamics simulation model according to the simulation motion parameters.
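Illustrative only: claim 6 feeds the motion parameters produced by the rigid body motion simulation model back to update the control and state estimation and/or dynamics simulation models. The scalar gain correction below is an assumed stand-in for that update, not the patent's actual procedure.

# Assumed feedback update: nudge a hypothetical dynamics gain on the twin so
# that its simulated motion tracks the observed motion parameters.
def update_twin_from_motion(twin, observed_position, simulated_position, lr=0.05):
    error = observed_position - simulated_position
    twin.dynamics_gain = getattr(twin, "dynamics_gain", 1.0) + lr * error
    return twin.dynamics_gain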
7. An unmanned aerial vehicle navigation device, comprising:
the simulation environment construction module is used for constructing a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments; the simulation target unmanned aerial vehicle is a digital twin model of the target unmanned aerial vehicle constructed based on the equipment parameters;
the model construction module is used for constructing a deep reinforcement learning model for unmanned aerial vehicle navigation based on simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment;
the navigation module is used for navigating according to the real operation information of the target unmanned aerial vehicle and by utilizing the deep reinforcement learning model when the target unmanned aerial vehicle operates;
the model construction module specifically comprises:
the navigation strategy model construction module is used for constructing a navigation strategy model for planning navigation information based on the simulation operation information by utilizing a deep learning algorithm;
the navigation evaluation model construction module is used for constructing a navigation evaluation model for evaluating the navigation information based on the navigation strategy model by utilizing a reinforcement learning algorithm;
the model optimization module is used for optimizing the navigation strategy model based on the navigation evaluation model until the navigation strategy model is converged, and taking the converged navigation strategy model as the deep reinforcement learning model;
the navigation strategy model construction module specifically comprises:
the first construction module is used for taking the simulation visual information of the simulation target unmanned aerial vehicle in the simulation environment as the input of a navigation prediction model, and taking the simulation navigation information of the simulation target unmanned aerial vehicle in the simulation environment as the output of the navigation prediction model to construct the navigation prediction model;
the second construction module is used for taking the simulation task information of the simulation target unmanned aerial vehicle in the simulation environment and the output of the navigation prediction model together as the input of a navigation matching model, and taking the matching degree between the output of the navigation prediction model and the simulation task information as the output of the navigation matching model to construct the navigation matching model;
the third construction module is used for constructing the navigation strategy model based on the navigation prediction model and the navigation matching model;
the unmanned aerial vehicle navigation device further includes:
the navigation test module is used for constructing a navigation test environment of the target unmanned aerial vehicle, and performing navigation test on the deep reinforcement learning model in the navigation test environment to obtain a test result;
the test random information determining module is used for determining test random information based on the simulation environment and the navigation test environment;
and the first model updating module is used for updating the deep reinforcement learning model according to the test result and the test random information.
CN202210902202.XA 2022-07-29 2022-07-29 Unmanned aerial vehicle navigation method and device Active CN114964268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210902202.XA CN114964268B (en) 2022-07-29 2022-07-29 Unmanned aerial vehicle navigation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210902202.XA CN114964268B (en) 2022-07-29 2022-07-29 Unmanned aerial vehicle navigation method and device

Publications (2)

Publication Number Publication Date
CN114964268A CN114964268A (en) 2022-08-30
CN114964268B true CN114964268B (en) 2023-05-02

Family

ID=82968688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210902202.XA Active CN114964268B (en) 2022-07-29 2022-07-29 Unmanned aerial vehicle navigation method and device

Country Status (1)

Country Link
CN (1) CN114964268B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116308006B (en) * 2023-05-19 2023-08-01 安徽省赛达科技有限责任公司 A digital rural comprehensive service cloud platform
CN119129298A (en) * 2024-11-14 2024-12-13 中国电子科技集团公司第二十八研究所 A virtual-real combined unmanned autonomous algorithm verification method and system
CN119131643B (en) * 2024-11-15 2025-03-04 深圳天鹰兄弟无人机创新有限公司 UAV visual navigation method and device based on deep learning
CN119148554A (en) * 2024-11-20 2024-12-17 西湖大学 Digital twin-based variable pitch unmanned aerial vehicle simulation control method and unmanned aerial vehicle

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160188675A1 (en) * 2014-12-29 2016-06-30 Ge Aviation Systems Llc Network for digital emulation and repository
CN107450593B (en) * 2017-08-30 2020-06-12 清华大学 Unmanned aerial vehicle autonomous navigation method and system
US10767997B1 (en) * 2019-02-25 2020-09-08 Qualcomm Incorporated Systems and methods for providing immersive extended reality experiences on moving platforms
CN110333739B (en) * 2019-08-21 2020-07-31 哈尔滨工程大学 AUV (autonomous Underwater vehicle) behavior planning and action control method based on reinforcement learning
WO2021086532A1 (en) * 2019-10-29 2021-05-06 Loon Llc Navigating aerial vehicles using deep reinforcement learning
CN111856965A (en) * 2020-06-22 2020-10-30 拓攻(南京)机器人有限公司 Unmanned aerial vehicle visual semi-physical simulation system and simulation method thereof
CN111694365B (en) * 2020-07-01 2021-04-20 武汉理工大学 A Deep Reinforcement Learning Based Path Tracking Method for Unmanned Vessel Formation
CN112179367B (en) * 2020-09-25 2023-07-04 广东海洋大学 A method for autonomous navigation of agents based on deep reinforcement learning
CN112130472A (en) * 2020-10-14 2020-12-25 广州小鹏自动驾驶科技有限公司 A simulation test system and method for autonomous driving
CN112965396A (en) * 2021-02-08 2021-06-15 大连大学 Hardware-in-the-loop visualization simulation method for quad-rotor unmanned aerial vehicle
CN113406957B (en) * 2021-05-19 2022-07-08 成都理工大学 Mobile robot autonomous navigation method based on immune deep reinforcement learning
CN113495578B (en) * 2021-09-07 2021-12-10 南京航空航天大学 A Reinforcement Learning Method for Cluster Track Planning Based on Digital Twin Training
CN114329766A (en) * 2021-09-22 2022-04-12 中国人民解放军空军工程大学 Credibility evaluation method of flight dynamics model for deep reinforcement learning
CN113886953B (en) * 2021-09-27 2022-07-19 中国人民解放军军事科学院国防科技创新研究院 Unmanned aerial vehicle intelligent simulation training method and device based on distributed reinforcement learning
CN113935373B (en) * 2021-10-11 2024-11-19 南京邮电大学 Human action recognition method based on phase information and signal strength
CN114488848B (en) * 2021-12-30 2024-08-09 北京理工大学 Unmanned aerial vehicle autonomous flight system and simulation experiment platform for indoor building space
CN114662656A (en) * 2022-03-04 2022-06-24 深圳大学 Deep neural network model training method, autonomous navigation method and system

Also Published As

Publication number Publication date
CN114964268A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN114964268B (en) Unmanned aerial vehicle navigation method and device
Zhang et al. 2D Lidar‐based SLAM and path planning for indoor rescue using mobile robots
CN114859910B (en) Unmanned ship path following system and method based on deep reinforcement learning
WO2021103834A1 (en) Method for generating lane changing decision model, lane changing decision method for driverless vehicle, and device
CN109948642A (en) A Multi-Agent Cross-Modality Deep Deterministic Policy Gradient Training Method Based on Image Input
CN112231489A (en) Knowledge learning and transferring method and system for epidemic prevention robot
CN109782600A (en) A method for establishing autonomous mobile robot navigation system through virtual environment
CN112034887A (en) Optimal path training method for UAV to avoid columnar obstacles and reach the target point
CN107450593A Unmanned aerial vehicle autonomous navigation method and system
Li et al. Oil: Observational imitation learning
CN112200319A (en) A rule reasoning method and system for realizing unmanned vehicle navigation and obstacle avoidance
CN116300909A (en) Robot obstacle avoidance navigation method based on information preprocessing and reinforcement learning
KR20190041831A (en) Controlling mobile robot based on reinforcement learning using game environment abstraction
CN117490000A (en) Gas pipeline leakage detection method and device
KR101974448B1 (en) Controlling mobile robot based on asynchronous target classification
CN116817909A (en) Unmanned aerial vehicle relay type navigation method based on deep reinforcement learning
Lin et al. Airvista: Empowering uavs with 3d spatial reasoning abilities through a multimodal large language model agent
CN119692470A (en) Environmental spatial relationship reasoning method, medium and device based on multi-agent debate
CN118379563B (en) Navigation model training method and device, electronic equipment and storage medium
CN119669952A (en) A Sim2Real model construction method and device based on reinforcement learning
Zhou et al. Deep reinforcement learning with long-time memory capability for robot mapless navigation
CN116972853A (en) Planning method for local obstacle avoidance path of mobile robot in unknown environment
CN119698607A (en) Controlling Agents Using Reporter Neural Networks
Mohajerin Modeling Dynamic Systems for Multi-Step Prediction with Recurrent Neural Networks.
Temsamani et al. A multimodal AI approach for intuitively instructable autonomous systems: A case study of an autonomous off-highway vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant