CN108594858B - Unmanned aerial vehicle searching method and device for Markov moving target

Publication number: CN108594858B (grant of application CN201810779927.8A; first published as CN108594858A)

Authority: CN (China)

Legal status: Active (granted)

Inventors: 陈立家, 王赞, 汪晓群, 薛政钢, 管禹, 赵瑞杰, 冯帅栋, 冯子凯, 王敬飞, 赵成伟, 袁蒙恩

Assignees: Henan Zhouhe Network Technology Co ltd; Henan University (application filed jointly by both; priority to CN201810779927.8A)

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/101: Simultaneous control of position or course in three dimensions specially adapted for aircraft

Abstract

The invention discloses a UAV search method and device for a Markov moving target. After a search task is received, all states that can occur during the motion of the Markov moving target and during the UAV search process, together with their probability distributions, are obtained; a Markov model for predicting UAV behavior under the search task is built, and a multi-stage heuristic policy iteration algorithm based on Markov decision processes is established; the search behavior policy with the greatest reward is obtained, and the optimal search track of the UAV is planned. The invention overcomes the drawback of traditional search algorithms, whose lack of a rigorous mathematical definition of the search target's motion law leads to high search cost: according to the UAV's current flight state and the current existence-probability distribution of the Markov moving target over the search area, the virtual target position that the UAV will search at the next moment is determined, and the search behavior policy with the greatest reward is obtained. The method can be applied to searching for moving targets, and the target can be found successfully at a relatively low search cost.

Description

Unmanned aerial vehicle search method and device for a Markov moving target

Technical field

The invention relates to the field of unmanned aerial vehicle (UAV) control, and in particular to a UAV search method and device for Markov moving targets.

Background

Today, the power of technology permeates every industry. As a new technological force, UAVs have attracted wide attention for their application prospects in all walks of life. In particular, the search-and-rescue application of UAVs in military target search systems has drawn the attention of countries around the world. Track planning is an important problem that must be considered when UAVs are applied to a target search system.

Track planning for UAV target search systems falls into two categories: track planning for stationary-target search systems and track planning for moving-target search systems. Research on track planning for stationary-target search is relatively mature and has been widely applied to practical problems, whereas research on track planning for moving-target search is comparatively scarce. Current research on the moving-target track-planning problem focuses mainly on the design and application of the search algorithm itself and lacks a rigorous mathematical definition of the target's motion law during the search. On the other hand, the ability of a UAV to execute a task independently is weak, and in many cases its level of intelligence falls short of expectations. During task execution a UAV generally follows a pre-loaded program and adapts poorly to changing conditions. Cooperative participation of multiple UAVs in a task is therefore a more reliable choice, and it brings the search applications of UAVs ever closer to practical use.

A Markov decision process (MDP) is a Markov reward process endowed with decision-making ability. In terms of application, it is the theory used to solve sequential decision-making problems in uncertain environments, that is, problems in which decisions are made at a series of continuous or discrete moments (called decision moments). When making a decision, one must consider not only its immediate effect but also its impact on long-term returns. The MDP is a very widely used decision-making framework.

Traditional UAV search algorithms, such as the scan-line search algorithm, focus only on the design of the algorithm itself and lack a rigorous mathematical definition of the search target's motion law, which leads to the major drawback of high search cost.

Summary of the invention

To overcome the deficiencies of the prior art, the purpose of the present invention is to provide a UAV search method and device for Markov moving targets, aiming to solve the problem that existing traditional algorithms, when applied to UAV search tasks, either cannot meet the track-planning requirements of moving-target search or incur high search cost.

The purpose of the present invention is achieved by the following technical solution.

A UAV search method for a Markov moving target comprises:

a target step: after a search task is received, building a probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution;

a UAV step: obtaining all states that can occur during the UAV search process, together with their probability distribution;

a construction step: according to all states that can occur during the UAV search process and during the motion of the Markov moving target, together with their probability distributions, building a Markov model for predicting UAV behavior under the search task, and establishing a multi-stage heuristic policy iteration algorithm based on Markov decision processes;

a planning step: using the multi-stage heuristic policy iteration algorithm based on Markov decision processes to obtain the search behavior policy with the greatest reward, thereby planning the optimal search track of the UAV.

On the basis of the above embodiment, preferably, the target step is specifically:

after the search task is received, dividing the motion area of the target into a grid, modeling the Markov moving target with probability theory, and obtaining the probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution; the motion area of the target is identical to the search area of the UAV.
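
To make the target step concrete, the sketch below propagates a target existence-probability grid one Markov move at a time. It is a minimal illustration in Python: the grid size, the stay probability, and the uniform 4-neighbour transition rule are assumptions for demonstration, since the patent does not fix a particular transition matrix.

```python
import numpy as np

def step_belief(belief: np.ndarray, p_stay: float = 0.2) -> np.ndarray:
    """One Markov step of the target's existence-probability distribution.

    Assumed motion law: the target stays in its cell with probability
    p_stay and otherwise moves to one of its 4-neighbours uniformly.
    """
    rows, cols = belief.shape
    nxt = np.zeros_like(belief)
    for r in range(rows):
        for c in range(cols):
            p = belief[r, c]
            if p == 0.0:
                continue
            neigh = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neigh = [(i, j) for i, j in neigh if 0 <= i < rows and 0 <= j < cols]
            nxt[r, c] += p_stay * p
            for i, j in neigh:
                nxt[i, j] += (1.0 - p_stay) * p / len(neigh)
    return nxt

# Example: 20x20 grid, target known to start at cell (10, 10).
belief = np.zeros((20, 20))
belief[10, 10] = 1.0
for _ in range(10):                 # distribution after 10 moves (cf. Fig. 2d)
    belief = step_belief(belief)
print(np.unravel_index(belief.argmax(), belief.shape))  # most probable cell
```

Iterating this propagation reproduces the kind of spreading distribution shown in Figures 2a to 2d.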

On the basis of any of the above embodiments, preferably, the UAV step is specifically:

encoding and describing the flight behavior of the UAV, and describing the states the UAV can be in while executing the search task, thereby obtaining all states that can occur during the UAV search process, together with their probability distribution.
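
As one possible encoding of the flight behavior described above, assuming the UAV moves over the rasterized search area one cell per decision moment, a nine-action coding can be used; the scheme and names below are illustrative assumptions, not the patent's fixed coding.

```python
from enum import IntEnum

class Action(IntEnum):
    """Assumed action space A = {a1, ..., aq}: 8 headings plus hover."""
    N = 0; NE = 1; E = 2; SE = 3; S = 4; SW = 5; W = 6; NW = 7; HOVER = 8

# Cell offset produced by each action (row, col).
OFFSET = {
    Action.N: (-1, 0), Action.NE: (-1, 1), Action.E: (0, 1), Action.SE: (1, 1),
    Action.S: (1, 0),  Action.SW: (1, -1), Action.W: (0, -1), Action.NW: (-1, -1),
    Action.HOVER: (0, 0),
}

def feasible_actions(state, rows, cols):
    """A(s_n): actions that keep the UAV inside the search area."""
    r, c = state
    return [a for a, (dr, dc) in OFFSET.items()
            if 0 <= r + dr < rows and 0 <= c + dc < cols]
```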

On the basis of any of the above embodiments, preferably, building the Markov model for predicting UAV behavior under the search task is specifically:

let $T = \{1, 2, 3, \dots\}$ be the set of moments at which the UAV search task is in progress;

let $S = (s_1, s_2, s_3, \dots)$ be the discrete state space of the UAV, containing all states that can occur during the UAV search process and during the motion of the Markov moving target;

let $A = \{a_1, a_2, \dots, a_x, \dots, a_q\}$ be the action space of the UAV, representing all possible state-changing actions of the UAV, where element $a_x$ is the $x$-th action and $q$ is the number of elements in the action space;

let $A(s_n) = \{a_1(s_n), a_2(s_n), a_3(s_n), \dots\}$ be the set of feasible actions of the UAV in state $s_n$, i.e. the set of all actions the UAV can take in that state;

let $T(s_n, a_x(s_n), s_m)$ denote the set of all state transition probabilities of the UAV, where any element $p(s_m \mid s_n, a_x(s_n))$ is the probability that the state changes to $s_m$ after the available action $a_x(s_n)$ is executed in state $s_n$, with

$$\sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n)) = 1;$$

let any element $r(s_n, a_x(s_n))$ of the reward set $R(s_n)$ be the reward for executing action $a_x(s_n)$ in state $s_n$;

then the Markov model for predicting the search behavior of the UAV under any search task is:

$$\mathrm{MDP} = \{S, A, T(s_n, a_x(s_n), s_m), R(s_n)\} \rightarrow \pi(s_n),$$

where $\pi$ is a policy, i.e. a mapping from the state set to the action set, $\pi(s_n)$ is the mapping of the UAV from state $s_n$ to the action set, and $\rightarrow$ denotes outputting the optimal policy.
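
For reference, the model $\mathrm{MDP} = \{S, A, T, R\} \rightarrow \pi$ defined above can be held in a small tabular container; a sketch under the assumption of finite, enumerable states and actions, with hypothetical names:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[int, int]          # grid cell occupied by the UAV
ActionId = int

@dataclass
class SearchMDP:
    """MDP = {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} -> pi(s_n)."""
    states: List[State]
    actions: Callable[[State], List[ActionId]]              # A(s_n)
    transition: Callable[[State, ActionId, State], float]   # p(s_m | s_n, a_x(s_n))
    reward: Callable[[State, ActionId], float]              # r(s_n, a_x(s_n))
    gamma: float                                            # discount factor, 0 < gamma < 1
```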

On the basis of the above embodiment, preferably, the planning step includes:

a calculation step: computing the reward utility function with the MDP discount model, where the discount factor $\gamma$ satisfies $0 < \gamma < 1$; the reward function of the discount model,

$$V^{\pi}(s_n) = E_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s_n\right],$$

represents the discounted expected total reward after the UAV applies policy $\pi$ from state $s_n$ starting at time $t = 0$;

according to the optimality equation of the MDP discount model, establishing the optimal state value function equation for the reward of the UAV's search operations in the search task in state $s_n$,

$$V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \left[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \right],$$

as well as the optimal action value function equation

$$Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),$$

and, from these two optimal function equations, establishing the optimal search policy function equation

$$\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n));$$
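
The three equations of the calculation step translate directly into tabular backups. The sketch below computes $V^{*}$ and $\pi^{*}$ by standard value iteration over the `SearchMDP` container sketched earlier; it illustrates the fixed-point computation only and is not necessarily the exact iteration scheme of the claimed algorithm.

```python
from typing import Dict

def optimal_values(mdp: SearchMDP, eps: float = 1e-6) -> Dict[State, float]:
    """Iterate V*(s_n) = max_a [ r(s_n,a) + gamma * sum_m p(s_m|s_n,a) V*(s_m) ]."""
    V = {s: 0.0 for s in mdp.states}
    while True:
        delta = 0.0
        for s in mdp.states:
            best = max(
                mdp.reward(s, a)
                + mdp.gamma * sum(mdp.transition(s, a, s2) * V[s2] for s2 in mdp.states)
                for a in mdp.actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:          # matches the stopping rule ||V_{b+1} - V_b|| < eps
            return V

def optimal_policy(mdp: SearchMDP, V: Dict[State, float]) -> Dict[State, ActionId]:
    """pi*(s_n) = argmax_a Q*(s_n, a), with Q* built from V*."""
    def q(s, a):
        return mdp.reward(s, a) + mdp.gamma * sum(
            mdp.transition(s, a, s2) * V[s2] for s2 in mdp.states)
    return {s: max(mdp.actions(s), key=lambda a: q(s, a)) for s in mdp.states}
```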

a given step: dividing the cooperative search area into a grid and determining the discrete state space $S$ of the MDP model; giving the number $g$ of UAVs participating in the search task, where $g$ and $i$ are positive integers, $s_{UAV(i)}$ is the current state of the $i$-th UAV, $s_{UAV(i)} \in S$, $A(s_{UAV(i)})$ is the action set of the $i$-th UAV in state $s_{UAV(i)}$, and $K_i$ is the maximum search step count of the $i$-th UAV; giving the discount factor $\gamma$ and the stopping condition $\varepsilon$ of the policy iteration, and setting the iteration count $b = 0$;

an initial step: determining the initial position of the target's motion and the position at which each UAV starts searching; each UAV obtains the existence-probability distribution of the target over the whole area from the initial position of the target's motion and the heuristic information about the target's motion, thereby determining the virtual target position that it will search at the next moment;
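
One plausible reading of the initial step, reusing the belief grid from the earlier target-model sketch: each UAV takes the most probable target cell inside its assigned sub-region as the virtual target it will head toward next. The region partitioning and the argmax rule are assumptions for illustration.

```python
import numpy as np

def virtual_target(belief: np.ndarray, region_mask: np.ndarray) -> tuple:
    """Pick the most probable target cell inside this UAV's assigned region.

    belief      -- target existence-probability grid (from the motion model)
    region_mask -- boolean grid, True where this UAV is allowed to search
    """
    masked = np.where(region_mask, belief, -1.0)   # exclude other regions
    return np.unravel_index(masked.argmax(), masked.shape)
```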

an iteration step: each UAV, according to its own virtual target position and its current state $s_{UAV(i)}$, iteratively computes its state value function $V_{b+1}(s_{UAV(i)})$ and sets the iteration count $b = b + 1$;

a judgment step: if $\|V_{b+1}(s_{UAV(i)}) - V_b(s_{UAV(i)})\| < \varepsilon$, ending the iteration and entering the traversal step; otherwise, going to the iteration step;

a traversal step: each UAV traverses $A(s_{UAV(i)})$ according to the finally obtained state value function $V_{b+1}(s_{UAV(i)})$ to obtain $Q(s_{UAV(i)}, a_i)$, and finally obtains the search behavior policy with the greatest reward, $\pi_i(t+1)^{*}$;

a transition step: executing action $a_i$ according to the obtained optimal policy $\pi_i(t+1)^{*}$, so that the state transfers from $s_{UAV(i)}^{t}$ to $s_{UAV(i)}^{t+1}$; at the same time the UAV obtains the immediate reward $r_i(s_{UAV(i)}, a_i)$; then $t = t + 1$ and the search step count of the $i$-th UAV is updated as $k_i = k_i + 1$;

an ending step: if at some moment $t$ the position $s_{UAV(i)}$ of the $i$-th UAV is the same as the current simulated position $s_{target}$ of the target, the $i$-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step count satisfies $k_i > K_i$ for all $i = 1, 2, \dots, g$, the search task fails and the algorithm ends; the current simulated position of the target is the most probable current position of the target, obtained from all states that can occur during the motion of the Markov moving target and their probability distribution.

A UAV search device for a Markov moving target comprises:

a target module, configured to build, after a search task is received, a probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution;

a UAV module, configured to obtain all states that can occur during the UAV search process, together with their probability distribution;

a construction module, configured to build, according to all states that can occur during the UAV search process and during the motion of the Markov moving target, together with their probability distributions, a Markov model for predicting UAV behavior under the search task, and to establish a multi-stage heuristic policy iteration algorithm based on Markov decision processes;

a planning module, configured to use the multi-stage heuristic policy iteration algorithm based on Markov decision processes to obtain the search behavior policy with the greatest reward, thereby planning the optimal search track of the UAV.

On the basis of the above embodiment, preferably, the target module is configured to:

divide, after the search task is received, the motion area of the target into a grid, model the Markov moving target with probability theory, and obtain the probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution; the motion area of the target is identical to the search area of the UAV.

On the basis of any of the above embodiments, preferably, the UAV module is configured to:

encode and describe the flight behavior of the UAV, and describe the states the UAV can be in while executing the search task, thereby obtaining all states that can occur during the UAV search process, together with their probability distribution.

On the basis of any of the above embodiments, preferably, building the Markov model for predicting UAV behavior under the search task is specifically:

let $T = \{1, 2, 3, \dots\}$ be the set of moments at which the UAV search task is in progress;

let $S = (s_1, s_2, s_3, \dots)$ be the discrete state space of the UAV, containing all states that can occur during the UAV search process and during the motion of the Markov moving target;

let $A = \{a_1, a_2, \dots, a_x, \dots, a_q\}$ be the action space of the UAV, representing all possible state-changing actions of the UAV, where element $a_x$ is the $x$-th action and $q$ is the number of elements in the action space;

let $A(s_n) = \{a_1(s_n), a_2(s_n), a_3(s_n), \dots\}$ be the set of feasible actions of the UAV in state $s_n$, i.e. the set of all actions the UAV can take in that state;

let $T(s_n, a_x(s_n), s_m)$ denote the set of all state transition probabilities of the UAV, where any element $p(s_m \mid s_n, a_x(s_n))$ is the probability that the state changes to $s_m$ after the available action $a_x(s_n)$ is executed in state $s_n$, with $n, m = 1, 2, 3, \dots$ and

$$\sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n)) = 1;$$

let any element $r(s_n, a_x(s_n))$ of the reward set $R(s_n)$ be the reward for executing action $a_x(s_n)$ in state $s_n$;

then the Markov model for predicting the search behavior of the UAV under any search task is:

$$\mathrm{MDP} = \{S, A, T(s_n, a_x(s_n), s_m), R(s_n)\} \rightarrow \pi(s_n),$$

where $\pi$ is a policy, i.e. a mapping from the state set to the action set, $\pi(s_n)$ is the mapping of the UAV from state $s_n$ to the action set, and $\rightarrow$ denotes outputting the optimal policy.

On the basis of the above embodiment, preferably, the planning module includes:

a calculation unit, configured to: compute the reward utility function with the MDP discount model, where the discount factor $\gamma$ satisfies $0 < \gamma < 1$; the reward function of the discount model,

$$V^{\pi}(s_n) = E_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s_n\right],$$

represents the discounted expected total reward after the UAV applies policy $\pi$ from state $s_n$ starting at time $t = 0$;

according to the optimality equation of the MDP discount model, establish the optimal state value function equation for the reward of the UAV's search operations in the search task in state $s_n$,

$$V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \left[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \right],$$

as well as the optimal action value function equation

$$Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),$$

and, from these two optimal function equations, establish the optimal search policy function equation

$$\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n));$$

a given unit, configured to: divide the cooperative search area into a grid and determine the discrete state space $S$ of the MDP model; give the number $g$ of UAVs participating in the search task, where $s_{UAV(i)}$ is the current state of the $i$-th UAV, $s_{UAV(i)} \in S$, $A(s_{UAV(i)})$ is the action set of the $i$-th UAV in state $s_{UAV(i)}$, and $K_i$ is the maximum search step count of the $i$-th UAV; give the discount factor $\gamma$ and the stopping condition $\varepsilon$ of the policy iteration, and set the iteration count $b = 0$;

an initial unit, configured to: determine the initial position of the target's motion and the position at which each UAV starts searching; each UAV obtains the existence-probability distribution of the target over the whole area from the initial position of the target's motion and the heuristic information about the target's motion, thereby determining the virtual target position that it will search at the next moment;

an iterative unit, configured to: have each UAV, according to its own virtual target position and its current state $s_{UAV(i)}$, iteratively compute its state value function $V_{b+1}(s_{UAV(i)})$ and set the iteration count $b = b + 1$;

a judgment unit, configured to: if $\|V_{b+1}(s_{UAV(i)}) - V_b(s_{UAV(i)})\| < \varepsilon$, end the iteration and call the traversal unit; otherwise, call the iterative unit;

a traversal unit, configured to: have each UAV traverse $A(s_{UAV(i)})$ according to the finally obtained state value function $V_{b+1}(s_{UAV(i)})$ to obtain $Q(s_{UAV(i)}, a_i)$, and finally obtain the search behavior policy with the greatest reward, $\pi_i(t+1)^{*}$;

a transfer unit, configured to: execute action $a_i$ according to the obtained optimal policy $\pi_i(t+1)^{*}$, so that the state transfers from $s_{UAV(i)}^{t}$ to $s_{UAV(i)}^{t+1}$; at the same time the UAV obtains the immediate reward $r_i(s_{UAV(i)}, a_i)$; then $t = t + 1$ and the search step count of the $i$-th UAV is updated as $k_i = k_i + 1$;

an ending unit, configured to: if at some moment $t$ the position $s_{UAV(i)}$ of the $i$-th UAV is the same as the current simulated position $s_{target}$ of the target, conclude that the $i$-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step count satisfies $k_i > K_i$ for all $i = 1, 2, \dots, g$, the search task fails and the algorithm ends; the current simulated position of the target is the most probable current position of the target, obtained from all states that can occur during the motion of the Markov moving target and their probability distribution.

Compared with the prior art, the beneficial effects of the present invention are as follows.

The invention discloses a UAV search method and device for a Markov moving target. After a search task is received, a probability model of the Markov moving target is built, yielding all states that can occur during the motion of the Markov moving target together with their probability distribution; all states that can occur during the UAV search process, together with their probability distribution, are obtained; a Markov model for predicting UAV behavior under the search task is built, and a multi-stage heuristic policy iteration algorithm based on Markov decision processes is established; this algorithm is used to obtain the search behavior policy with the greatest reward, thereby planning the optimal search track of the UAV.

The invention can use the MDP discount model to determine the set of state transition probabilities of the UAV under the search task and the set of rewards for the flight-behavior operations during the transitions, compute the reward utility function, establish the optimality equation for the reward of UAV behavior under the search task, and, through iterative computation and judgment, obtain the search behavior policy with the greatest reward and plan the optimal search track of the UAV.

The invention overcomes the major drawback of traditional search algorithms, such as the scan-line search algorithm, which focus only on the design of the algorithm itself and lack a rigorous mathematical definition of the search target's motion law, leading to high search cost. According to the UAV's current flight state and the current existence-probability distribution of the Markov moving target over the search area, the position of the virtual target that the UAV will search at the next moment is determined, and a sequence of UAV search-behavior operations is obtained; this sequence is the search behavior policy with the greatest behavioral reward. The method can be applied to searching for moving targets, and the target can be found successfully at a relatively low search cost.

Description of the drawings

The present invention is further described below with reference to the accompanying drawings and embodiments.

Figure 1 is a schematic diagram of the grid division of a search area provided by an embodiment of the present invention;

Figures 2a to 2d are existence-probability distribution diagrams, at different moments, of the Markov moving target provided by an embodiment of the present invention;

Figure 3 is a schematic diagram of the theoretical framework of the MDP-MSHPI algorithm provided by an embodiment of the present invention;

Figure 4 is a flowchart of a single UAV searching for a Markov target with the MSHPI algorithm according to an embodiment of the present invention;

Figure 5 is a flowchart of multiple UAVs searching for a Markov target with the MSHPI algorithm according to an embodiment of the present invention;

Figure 6 is a schematic structural diagram of a UAV search device for a Markov moving target provided by an embodiment of the present invention.

Detailed description

The present invention is further described below with reference to the accompanying drawings and specific embodiments. It should be noted that, provided there is no conflict, the embodiments and technical features described below can be combined arbitrarily to form new embodiments.

Specific embodiment one

An embodiment of the present invention provides a UAV search method for a Markov moving target, comprising:

a target step: after a search task is received, building a probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution;

a UAV step: obtaining all states that can occur during the UAV search process, together with their probability distribution;

a construction step: according to all states that can occur during the UAV search process and during the motion of the Markov moving target, together with their probability distributions, building a Markov model for predicting UAV behavior under the search task, and establishing a multi-stage heuristic policy iteration algorithm based on Markov decision processes (MDP-MSHPI);

a planning step: using the MDP-MSHPI algorithm to obtain the search behavior policy with the greatest reward, thereby planning the optimal search track of the UAV. In this embodiment of the present invention, the number of UAVs may be one, two or more.

The invention overcomes the major drawback of traditional search algorithms, such as the scan-line search algorithm, which focus only on the design of the algorithm itself and lack a rigorous mathematical definition of the search target's motion law, leading to high search cost. According to the UAV's current flight state and the current existence-probability distribution of the Markov moving target over the search area, the position of the virtual target that the UAV will search at the next moment is determined, and a sequence of UAV search-behavior operations is obtained; this sequence is the search behavior policy with the greatest behavioral reward. The method can be applied to searching for moving targets, and the target can be found successfully at a relatively low search cost.

As shown in Figures 1 and 2, preferably, the target step may specifically be: after the search task is received, dividing the motion area of the target into a grid, modeling the Markov moving target with probability theory, and obtaining the probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution; the motion area of the target is identical to the search area of the UAV. The benefit of doing so is that the motion law of the moving target can be simulated and all states that can occur during its motion, together with their probability distribution, can be obtained; for example, the most probable position of the moving target at a given moment can be derived.

Preferably, the UAV step may specifically be: encoding and describing the flight behavior of the UAV, and describing the states the UAV can be in while executing the search task, thereby obtaining all states that can occur during the UAV search process, together with their probability distribution. The benefit of doing so is that the states the UAV may be in during the search, together with their probability distribution, can be obtained.

Preferably, the step of building the Markov model for predicting UAV behavior under the search task may specifically be:

let $T = \{1, 2, 3, \dots\}$ be the set of moments at which the UAV search task is in progress;

let $S = (s_1, s_2, s_3, \dots)$ be the discrete state space of the UAV, containing all states that can occur during the UAV search process and during the motion of the Markov moving target;

let $A = \{a_1, a_2, \dots, a_x, \dots, a_q\}$ be the action space of the UAV, representing all possible state-changing actions of the UAV, where element $a_x$ is the $x$-th action and $q$ is the number of elements in the action space;

let $A(s_n) = \{a_1(s_n), a_2(s_n), a_3(s_n), \dots\}$ be the set of feasible actions of the UAV in state $s_n$, i.e. the set of all actions the UAV can take in that state;

let $T(s_n, a_x(s_n), s_m)$ denote the set of all state transition probabilities of the UAV, where any element $p(s_m \mid s_n, a_x(s_n))$ is the probability that the state changes to $s_m$ after the available action $a_x(s_n)$ is executed in state $s_n$, with

$$\sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n)) = 1;$$

let any element $r(s_n, a_x(s_n))$ of the reward set $R(s_n)$ be the reward for executing action $a_x(s_n)$ in state $s_n$;

then the Markov model for predicting the search behavior of the UAV under any search task is:

$$\mathrm{MDP} = \{S, A, T(s_n, a_x(s_n), s_m), R(s_n)\} \rightarrow \pi(s_n),$$

where $\pi$ is a policy, i.e. a mapping from the state set to the action set, $\pi(s_n)$ is the mapping of the UAV from state $s_n$ to the action set, and $\rightarrow$ denotes outputting the optimal policy.

Preferably, the planning step may include:

a calculation step: computing the reward utility function with the MDP discount model, where the discount factor $\gamma$ satisfies $0 < \gamma < 1$; the reward function of the discount model,

$$V^{\pi}(s_n) = E_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s_n\right],$$

represents the discounted expected total reward after the UAV applies policy $\pi$ from state $s_n$ starting at time $t = 0$;

according to the optimality equation of the MDP discount model, establishing the optimal state value function equation for the reward of the UAV's search operations in the search task in state $s_n$,

$$V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \left[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \right],$$

as well as the optimal action value function equation

$$Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),$$

and, from these two optimal function equations, establishing the optimal search policy function equation

$$\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n));$$

a given step: dividing the cooperative search area into a grid and determining the discrete state space $S$ of the MDP model; giving the number $g$ of UAVs participating in the search task, where $g$ and $i$ are positive integers, $s_{UAV(i)}$ is the current state of the $i$-th UAV, $s_{UAV(i)} \in S$, $A(s_{UAV(i)})$ is the action set of the $i$-th UAV in state $s_{UAV(i)}$, and $K_i$ is the maximum search step count of the $i$-th UAV; giving the discount factor $\gamma$ and the stopping condition $\varepsilon$ of the policy iteration, and setting the iteration count $b = 0$;

an initial step: determining the initial position of the target's motion and the position at which each UAV starts searching; each UAV obtains the existence-probability distribution of the target over the whole area from the initial position of the target's motion and the heuristic information about the target's motion, thereby determining the virtual target position that it will search at the next moment;

an iteration step: each UAV, according to its own virtual target position and its current state $s_{UAV(i)}$, iteratively computes its state value function $V_{b+1}(s_{UAV(i)})$ and sets the iteration count $b = b + 1$;

a judgment step: if $\|V_{b+1}(s_{UAV(i)}) - V_b(s_{UAV(i)})\| < \varepsilon$, ending the iteration and entering the traversal step; otherwise, going to the iteration step;

a traversal step: each UAV traverses $A(s_{UAV(i)})$ according to the finally obtained state value function $V_{b+1}(s_{UAV(i)})$ to obtain $Q(s_{UAV(i)}, a_i)$, and finally obtains the search behavior policy with the greatest reward, $\pi_i(t+1)^{*}$;

a transition step: executing action $a_i$ according to the obtained optimal policy $\pi_i(t+1)^{*}$, so that the state transfers from $s_{UAV(i)}^{t}$ to $s_{UAV(i)}^{t+1}$; at the same time the UAV obtains the immediate reward $r_i(s_{UAV(i)}, a_i)$; then $t = t + 1$ and the search step count of the $i$-th UAV is updated as $k_i = k_i + 1$;

an ending step: if at some moment $t$ the position $s_{UAV(i)}$ of the $i$-th UAV is the same as the current simulated position $s_{target}$ of the target, the $i$-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step count satisfies $k_i > K_i$ for all $i = 1, 2, \dots, g$, the search task fails and the algorithm ends; the current simulated position of the target is the most probable position the target currently occupies, obtained from all states that can occur during the motion of the Markov moving target and their probability distribution.

The benefit of doing so is that the MDP discount model can be used to determine the set of state transition probabilities of the UAV under the search task and the set of rewards for the flight-behavior operations during the transitions, compute the reward utility function, establish the optimality equation for the reward of UAV behavior under the search task, and, through iterative computation and judgment, obtain the search behavior policy with the greatest reward and plan the optimal search track of the UAV.

One application scenario of the present invention may be as follows.

1. Divide the motion area of the target (which is also the search area of the UAV) into a grid; the divided area model is shown in Figure 1. Model the Markov moving target with probability theory to obtain its probability model. The existence probability of the Markov moving target over the motion area after a certain number of moves is shown in Figures 2a to 2d, where Figure 2a is the initial moment, Figure 2b is after one move, Figure 2c is after 5 moves, and Figure 2d is after 10 moves.

The theoretical framework of the MDP-MSHPI algorithm is shown in Figure 3.

2. Encode and describe the flight behavior of the UAV, and describe the states the UAV can be in while executing the search task.

3. Build the Markov model for predicting UAV behavior under the search task. In this step the following need to be defined in advance:

Definition 1: let $T = \{1, 2, 3, \dots\}$ denote the set of moments;

Definition 2: let $S = (s_1, s_2, s_3, \dots)$; this state space contains all states that can occur during the UAV search process and during the motion of the Markov moving target;

Definition 3: let $A(s_n) = \{a_1(s_n), a_2(s_n), a_3(s_n), \dots\}$ denote the set of all actions the UAV can take in state $s_n$;

Definition 4: let $T(s_n, a_x(s_n), s_m)$ denote the set of all state transition probabilities of the UAV; its element $p(s_m \mid s_n, a_x(s_n))$ is the probability that the UAV's state changes to $s_m$ after the available action $a_x(s_n)$ is executed in state $s_n$, and it is assumed that

$$\sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n)) = 1;$$

Definition 5: let $R(s_n)$ denote the reward set, where any element $r(s_n, a_x(s_n))$ is the reward for executing action $a_x(s_n)$ in state $s_n$;

Definition 6: let $A = \{a_1, a_2, \dots, a_q\}$; $A$ is the action space of the UAV, containing all possible actions that change the UAV's state, where element $a_x$ is the $x$-th action and $q$ is the number of elements in the action space, $x = 1, 2, 3, \dots, q$.

The Markov model for predicting the search behavior of the UAV under any search task is given as:

$$\mathrm{MDP} = \{S, A, T(s_n, a_x(s_n), s_m), R(s_n)\} \rightarrow \pi(s_n),$$

where $\pi$ is a policy, i.e. a mapping from the state set to the action set, and $\pi(s_n)$ is the mapping of the UAV from state $s_n$ to the action set.

4. Determine the state transition probability set $T(s_n, a_x(s_n), s_m)$ of the UAV under this search task, and determine the reward set $R(s_n)$ for the state transition process according to the requirements of the search task.

According to the state $s_n$ the UAV is currently in, establish the optimal state value function equation for the reward of the search operations under the UAV search task.

The present invention uses the MDP discount model when computing the reward utility function: after a policy is determined and executed, the decision maker receives a certain reward at each moment of $T$ with a certain probability, and the accumulated rewards form the utility function of the system. The discount factor of the discount model is denoted by $\gamma$, with $0 < \gamma < 1$, and the utility function of the MDP discount model is expressed as

$$V^{\pi}(s_n) = E_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s_n\right],$$

which represents the discounted expected total reward of the UAV after the system applies policy $\pi$ from state $s_n$ at time $t = 0$; the utility function is a bounded function.

According to the optimality equation of the MDP discount model, establish the optimal state value function equation for the reward of the search operations while the UAV executes this search task in state $s_n$,

$$V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \left[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \right],$$

as well as the optimal action value function equation

$$Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),$$

and, from these two optimal function models, establish the optimal search policy function equation

$$\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n)).$$

5. Initialize the parameters: give the number $g$ of UAVs participating in the search task, where $s_{UAV(i)}$ is the current state of the $i$-th UAV, $s_{UAV(i)} \in S$, $A(s_{UAV(i)})$ is the action set of the $i$-th UAV in state $s_{UAV(i)}$, and $K_i$ is the maximum search step count of the $i$-th UAV; give the discount factor $\gamma$ and the stopping condition $\varepsilon$ of the policy iteration, and set the iteration count $b = 0$, where $i$ is a positive integer.

6. Determine the initial position of the target's motion and the position at which each UAV starts searching. Each UAV obtains the existence-probability distribution of the target over the whole area from the initial position of the target's motion and the heuristic information about the target's motion, thereby determining the virtual target position that it will search at the next moment.

7. Each UAV, according to its own virtual target position and its current state $s_{UAV(i)}$, obtains $V_{b+1}(s_{UAV(i)})$ by computing the optimal state value function equation for the search operation reward, and sets the iteration count $b = b + 1$.

8. If $\|V_{b+1}(s_{UAV(i)}) - V_b(s_{UAV(i)})\| < \varepsilon$, end the iteration and proceed to the next step; otherwise, go to step 7.

9. Each UAV traverses $A(s_{UAV(i)})$ according to the finally obtained state value function $V_{b+1}(s_{UAV(i)})$ to obtain $Q(s_{UAV(i)}, a_i)$, and finally obtains the behavior policy with the greatest reward, $\pi_i(t+1)^{*}$. Following the above steps, each UAV can find the optimal search behavior policy for the next moment, as shown in Figure 4.

10. Execute action $a_i$ according to the obtained optimal policy $\pi_i(t+1)^{*}$, so that the state transfers from $s_{UAV(i)}^{t}$ to $s_{UAV(i)}^{t+1}$; at the same time the UAV obtains the immediate reward $r_i(s_{UAV(i)}, a_i)$, $t = t + 1$, and the search step count of the $i$-th UAV is updated as $k_i = k_i + 1$.

11. If at some moment $t$ the position $s_{UAV(i)}$ of the $i$-th UAV is the same as the real position $s_{target}$ of the target, the $i$-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step count satisfies $k_i > K_i$ for all $i = 1, 2, \dots, g$, the search fails and the algorithm ends. The block diagram of multiple UAVs cooperatively searching for a Markov moving target is shown in Figure 5.

Specific embodiment one above provides a UAV search method for a Markov moving target; correspondingly, the present application also provides a UAV search device for a Markov moving target. Since the device embodiment is basically similar to the method embodiment, it is described relatively simply; for the relevant parts, refer to the description of the method embodiment. The device embodiment described below is merely illustrative.

Specific embodiment two

As shown in Figure 6, an embodiment of the present invention provides a UAV search device for a Markov moving target, comprising:

a target module 201, configured to build, after a search task is received, a probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution;

a UAV module 202, configured to obtain all states that can occur during the UAV search process, together with their probability distribution;

a construction module 203, configured to build, according to all states that can occur during the UAV search process and during the motion of the Markov moving target, together with their probability distributions, a Markov model for predicting UAV behavior under the search task, and to establish a multi-stage heuristic policy iteration algorithm based on Markov decision processes;

a planning module 204, configured to use the multi-stage heuristic policy iteration algorithm based on Markov decision processes to obtain the search behavior policy with the greatest reward, thereby planning the optimal search track of the UAV.

This embodiment of the invention overcomes the major drawback of traditional search algorithms, such as the scan-line search algorithm, which focus only on the design of the algorithm itself and lack a rigorous mathematical definition of the search target's motion law, leading to high search cost. According to the UAV's current flight state and the current existence-probability distribution of the Markov moving target over the search area, the position of the virtual target that the UAV will search at the next moment is determined, and a sequence of UAV search-behavior operations is obtained; this sequence is the search behavior policy with the greatest behavioral reward. The device can be applied to searching for moving targets, and the target can be found successfully at a relatively low search cost.

Preferably, the target module 201 may be configured to: divide, after the search task is received, the motion area of the target into a grid, model the Markov moving target with probability theory, and obtain the probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution; the motion area of the target is identical to the search area of the UAV.

Preferably, the UAV module 202 may be configured to: encode and describe the flight behavior of the UAV, and describe the states the UAV can be in while executing the search task, thereby obtaining all states that can occur during the UAV search process, together with their probability distribution.

优选的,所述构建搜索任务下无人机行为预测的马尔科夫模型,可以具体为:Preferably, the construction of the Markov model for predicting the behavior of the UAV under the search task may be specifically:

设无人机搜索任务进行中的时刻集合T={1,2,3,…};Set the time set T={1,2,3,…} when the UAV search task is in progress;

设无人机的离散状态空间S=(s1,s2,s3,…),该状态空间包含了无人机搜索过程中和马尔科夫运动目标运动过程中所有可能出现的状态;Set the discrete state space S=(s 1 , s 2 , s 3 ,...) of the UAV, which contains all possible states during the UAV search process and the Markov moving target movement process;

设无人机的动作空间A={a1,a2,...,ax,...,aq},表示无人机所有可能的改变状态的动作,其中元素ax表示第x个动作,q为动作空间中的元素个数;Let the action space A of the UAV = {a 1 ,a 2 ,...,a x ,...,a q }, which represents all possible actions of the UAV to change the state, where the element a x represents the xth actions, q is the number of elements in the action space;

设无人机处于状态sn下的可行动作集合A(sn)={a1(sn),a2(sn),a3(sn),...},表示无人机位于某个状态下可以采取的所有动作集合;Suppose the feasible action set A(s n ) of the UAV in the state sn = {a 1 (s n ), a 2 (s n ), a 3 (s n ),...}, representing the UAV A collection of all actions that can be taken in a state;

设T(sn,ax(sn),sm)表示无人机所有状态转移概率集合,其中的任意元素p(sm|sn,ax(sn))表示在状态sn下,执行可用动作ax(sn)之后,状态变化到sm的概率,其中

Figure BDA0001732310350000151
Let T(s n , a x (s n ), s m ) represent the set of all state transition probabilities of the UAV, and any element p(s m |s n , a x (s n )) in the state s n , the probability that the state changes to s m after performing the available actions a x (s n ), where
Figure BDA0001732310350000151

设报酬集合R(sn)的任意元素r(sn,ax(sn))表示在状态sn执行动作ax(sn)的报酬;Let any element r(s n , a x (s n )) of the reward set R(s n ) represent the reward for performing the action a x (s n ) in the state sn ;

Then the Markov model for predicting the search behavior of the UAV under any search task is:

MDP = {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} → π(s_n);

where π is a policy, i.e., a mapping from the state set to the action set, π(s_n) denotes the mapping of the UAV from state s_n to the action set, and → denotes outputting the optimal policy.
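For concreteness, the sketch below shows one possible in-code representation of this MDP tuple, building on the grid model sketched above; the 8-direction flight-action encoding and all identifiers (SearchMDP, ACTIONS, feasible_actions) are hypothetical choices, not the patent's prescribed data layout.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical encoding of UAV flight actions as eight compass moves on the grid.
ACTIONS = {0: (-1, 0), 1: (-1, 1), 2: (0, 1), 3: (1, 1),
           4: (1, 0), 5: (1, -1), 6: (0, -1), 7: (-1, -1)}

@dataclass
class SearchMDP:
    states: List[int]                                    # discrete state space S
    transition: Callable[[int, int], Dict[int, float]]  # p(s_m | s_n, a)
    reward: Callable[[int, int], float]                 # r(s_n, a_x(s_n))
    gamma: float = 0.9                                  # discount factor, 0 < gamma < 1
    actions: Dict[int, tuple] = field(default_factory=lambda: dict(ACTIONS))

    def feasible_actions(self, s: int) -> List[int]:
        """A(s_n): actions with at least one reachable successor state."""
        return [a for a in self.actions if self.transition(s, a)]
```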

Preferably, the planning module 204 may include:

a calculation unit, used to: compute the reward utility function with the MDP discount model, where the discount factor γ satisfies 0 < γ < 1; the reward function of the discount model,

V^{\pi}(s_n) = E^{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s^{0} = s_n\right],

denotes the discounted expected total reward after the UAV follows policy π from state s_n starting at time t = 0;

according to the optimality equation of the MDP discount model, establish the optimal state value function equation for the UAV's search-operation reward in the search task in state s_n,

V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \Big[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \Big],

the optimal action value function equation,

Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),

and, from these two optimal function equations, the optimal search strategy function equation

\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n));
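The following sketch implements the value backup and stopping test these equations describe, reusing the hypothetical SearchMDP container from the earlier sketch; the tolerance eps plays the role of the iteration-ending condition ε used below.

```python
def bellman_backup(mdp: SearchMDP, V: dict):
    """One sweep of the optimal value update V_{b+1} and the action values Q."""
    V_new, Q = {}, {}
    for s in mdp.states:
        # Assumes every state has at least one feasible action.
        Q[s] = {a: mdp.reward(s, a)
                   + mdp.gamma * sum(p * V[sm]
                                     for sm, p in mdp.transition(s, a).items())
                for a in mdp.feasible_actions(s)}
        V_new[s] = max(Q[s].values())
    return V_new, Q

def solve(mdp: SearchMDP, eps: float = 1e-4):
    """Iterate until ||V_{b+1} - V_b|| < eps, then return the greedy policy
    pi*(s) = argmax_a Q*(s, a) together with the converged value function."""
    V = {s: 0.0 for s in mdp.states}
    while True:
        V_new, Q = bellman_backup(mdp, V)
        if max(abs(V_new[s] - V[s]) for s in mdp.states) < eps:
            return {s: max(Q[s], key=Q[s].get) for s in mdp.states}, V_new
        V = V_new
```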

a given unit, used to: divide the cooperative search area into a grid and determine the discrete state space S of the MDP model; given the number g of UAVs participating in the search task, let s_UAV(i) be the current state of the i-th UAV, s_UAV(i) ∈ S, A(s_UAV(i)) the action set of the i-th UAV in state s_UAV(i), and K_i the maximum search step length of the i-th UAV; given the discount factor γ and the ending condition ε of the policy iteration, set the iteration count b = 0;

an initial unit, used to: determine the initial position of the target's motion and the position where each UAV starts to search; each UAV obtains the presence probability distribution of the target over the whole area from the target's initial position and the heuristic information about the target's motion, thereby determining the virtual target position that the UAV will search at the next moment;

an iteration unit, used to: have each UAV, based on its own virtual target position and its current state s_UAV(i), iteratively compute its state value function V_{b+1}(s_UAV(i)), and set the iteration count b = b + 1;

a judgment unit, used to: if ||V_{b+1}(s_UAV(i)) − V_b(s_UAV(i))|| < ε, end the iteration and call the traversal unit; otherwise, call the iteration unit;

a traversal unit, used to: have each UAV traverse A(s_UAV(i)) with the finally obtained state value function V_{b+1}(s_UAV(i)) to obtain Q(s_UAV(i), a_i), and finally obtain the search behavior strategy with the greatest reward, π_i(t+1)*;

a transfer unit, used to: execute action a_i according to the obtained optimal strategy π_i(t+1)*, whereupon the state transfers from s_UAV(i)^t to s_UAV(i)^{t+1}; at the same time, the UAV obtains the immediate reward r_i(s_UAV(i), a_i); then set t = t + 1 and the search step length of the i-th UAV k_i = k_i + 1;

an ending unit, used to: if at some time t the position s_UAV(i) of the i-th UAV is the same as the current simulated position s_target of the target, the i-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step length satisfies k_i > K_i for i = 1, 2, …, n, the search task fails and the algorithm ends; the current simulated position of the target is the most likely current position of the target obtained from all possible states and their probability distribution during the motion of the Markov moving target.
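Tying the given, initial, iteration, judgment, traversal, transfer, and ending units together, one possible shape of the overall multi-UAV search loop is sketched below, reusing solve() and the target belief propagation from the earlier sketches; the belief update, the choice of the most probable cell as the simulated target position, and every identifier are assumptions layered on those sketches rather than the patented procedure itself.

```python
import numpy as np

def multi_uav_search(mdps, starts, target_belief, T_target, K, max_t=500):
    """One possible outer loop: mdps[i] is the SearchMDP of UAV i (its reward
    assumed shaped toward that UAV's virtual target); starts[i] its initial
    state; target_belief the target presence distribution over grid cells;
    T_target the target's Markov transition matrix; K[i] the maximum search
    step length of UAV i."""
    states, k = list(starts), [0] * len(starts)
    for t in range(max_t):
        target_belief = target_belief @ T_target     # the Markov target moves
        s_target = int(np.argmax(target_belief))     # current simulated position
        for i, mdp in enumerate(mdps):
            if k[i] > K[i]:
                continue                             # step budget exhausted
            # In a fuller version mdp.reward would be rebuilt here from
            # target_belief so the virtual target tracks the belief.
            pi, _ = solve(mdp)                       # policy iteration to eps
            a = pi[states[i]]
            nxt = mdp.transition(states[i], a)
            states[i] = max(nxt, key=nxt.get)        # most likely next state
            k[i] += 1
            if states[i] == s_target:
                return "found", i, t                 # search task completed
        if all(k[i] > K[i] for i in range(len(starts))):
            return "failed", None, t                 # all budgets exceeded
    return "failed", None, max_t
```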

The present invention has been described above from the standpoints of purpose of use, efficiency, improvement, and novelty; its practical improvements satisfy the requirements of functional enhancement and usability emphasized by the patent law. The above description and drawings are only preferred embodiments of the present invention and do not limit it; accordingly, anything similar or identical to the structure, device, or features of the present invention, that is, any equivalent replacement or modification made within the scope of the patent application of the present invention, shall fall within the scope of protection of this patent application.

It should be noted that, where no conflict arises, the embodiments of the present invention and the features within them may be combined with one another. Although the invention has been described to a certain extent, it is apparent that appropriate changes in various conditions may be made without departing from its spirit and scope. It is to be understood that the invention is not limited to the described embodiments but extends to the scope of the claims, including equivalent replacements of each element described. Those skilled in the art may make various other corresponding changes and variations according to the technical solutions and concepts described above, and all such changes and variations shall fall within the protection scope of the claims of the present invention.

Claims (6)

1. An unmanned aerial vehicle searching method for a Markov moving target is characterized by comprising the following steps:
a target step, namely after receiving a search task, constructing a probability model of the Markov moving target so as to obtain all possible states and probability distribution thereof in the moving process of the Markov moving target;
an unmanned aerial vehicle step, obtaining all possible states and probability distribution thereof in the unmanned aerial vehicle searching process;
a construction step, wherein a Markov model for predicting unmanned aerial vehicle behaviors under a search task is constructed according to all possible states and probability distribution thereof in the unmanned aerial vehicle search process and the Markov moving target movement process, and a multi-stage heuristic strategy iterative algorithm based on Markov decision is established;
a planning step, namely acquiring a search behavior strategy with the maximum profit by using a multi-stage heuristic strategy iterative algorithm based on Markov decision, so as to plan the optimal search track of the unmanned aerial vehicle;
the establishment of the Markov model for unmanned aerial vehicle behavior prediction under the search task specifically comprises the following steps:
setting a time set T = {1, 2, 3, …} for the period during which the unmanned aerial vehicle search task is in progress;
letting the discrete state space of the unmanned aerial vehicle be S = (s_1, s_2, s_3, …), which contains all possible states during the unmanned aerial vehicle search and during the Markov moving target motion;
letting the action space of the unmanned aerial vehicle be A = {a_1, a_2, …, a_x, …, a_q}, representing all possible state-changing actions of the unmanned aerial vehicle, where the element a_x represents the x-th action and q is the number of elements in the action space;
letting the set of feasible actions of the unmanned aerial vehicle in state s_n be A(s_n) = {a_1(s_n), a_2(s_n), a_3(s_n), …}, representing the set of all actions that the unmanned aerial vehicle can take in that state;
letting T(s_n, a_x(s_n), s_m) represent the set of all state transition probabilities of the unmanned aerial vehicle, with any element p(s_m | s_n, a_x(s_n)) being the probability that, in state s_n, the state changes to s_m after the available action a_x(s_n) is performed, where
\sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n)) = 1;
letting any element r(s_n, a_x(s_n)) of the reward set R(s_n) represent the reward for performing action a_x(s_n) in state s_n;
the Markov model for predicting the search behavior of the unmanned aerial vehicle executing any search task is as follows:
MDP = {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} → π(s_n);
where π is the policy, representing the mapping from the state set to the action set, π(s_n) is the mapping of the unmanned aerial vehicle from state s_n to the action set, and → represents outputting the optimal strategy;
the planning step comprises:
a calculation step: calculating the reward utility function with the MDP discount model, wherein the discount factor γ satisfies 0 < γ < 1; the reward function of the discount model,
V^{\pi}(s_n) = E^{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s^{0} = s_n\right],
represents the discounted expected total reward after the unmanned aerial vehicle uses the strategy π from state s_n starting at time t = 0;
according to the optimal equation of the MDP discount model, establishing the optimal state value function equation of the search operation reward of the unmanned aerial vehicle in the search task in state s_n,
V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \Big[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \Big],
and the optimal action value function equation,
Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),
and establishing, according to the two optimal function equations, the optimal search strategy function equation
\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n));
a given step: rasterizing the collaborative search area and determining the discrete state space S of the MDP model; giving the number g of unmanned aerial vehicles participating in the search task, g and i being positive integers; s_UAV(i) is the current state of the i-th unmanned aerial vehicle, s_UAV(i) ∈ S, A(s_UAV(i)) is the action set of the i-th unmanned aerial vehicle in state s_UAV(i), and K_i is the maximum search step length of the i-th unmanned aerial vehicle; giving the discount factor γ and the strategy iteration ending condition ε, and setting the iteration number b = 0;
an initial step: determining the initial position of the target motion and the position where each unmanned aerial vehicle starts to search; each unmanned aerial vehicle obtains the presence probability distribution of the target in the whole area according to the initial position where the target starts to move and the heuristic information of the target motion, thereby determining the virtual target position to be searched by each unmanned aerial vehicle at the next moment;
an iteration step: each unmanned aerial vehicle, according to its virtual target position and its current state s_UAV(i), iteratively calculates its state value function V_{b+1}(s_UAV(i)), and the iteration number is set to b = b + 1;
a judging step: if ||V_{b+1}(s_UAV(i)) − V_b(s_UAV(i))|| < ε, ending the iteration and entering the traversal step; otherwise, returning to the iteration step;
a traversal step: each unmanned aerial vehicle traverses A(s_UAV(i)) according to the finally obtained state value function V_{b+1}(s_UAV(i)) to obtain Q(s_UAV(i), a_i), and finally the search behavior strategy with the maximum reward, π_i(t+1)*, is obtained;
a transfer step: performing action a_i according to the optimal strategy π_i(t+1)*, the state being transferred from s_UAV(i)^t to s_UAV(i)^{t+1}; at the same time, the unmanned aerial vehicle obtains an immediate reward r_i(s_UAV(i), a_i); then t = t + 1, and the search step length of the i-th unmanned aerial vehicle is set to k_i = k_i + 1;
an ending step: if at a certain moment t the position s_UAV(i) of the i-th unmanned aerial vehicle is the same as the current simulated position s_target of the target, the i-th unmanned aerial vehicle has successfully found the target, the search task is completed, and the algorithm ends; if the search step length satisfies k_i > K_i for i = 1, 2, …, n, the search task fails and the algorithm ends; the current simulated position of the target is the current most likely position of the target obtained according to all possible states and their probability distribution during the motion of the Markov moving target.
2. The unmanned aerial vehicle searching method for a Markov moving target according to claim 1, wherein the target step specifically comprises:
after receiving a search task, rasterizing a motion area of a target, modeling a Markov motion target by using probability theory, and acquiring a probability model of the Markov motion target so as to obtain all possible states and probability distribution of the Markov motion target in the motion process; the motion area of the target is equal to the search area of the unmanned aerial vehicle.
3. The unmanned aerial vehicle searching method for a Markov moving target according to claim 1 or 2, wherein the unmanned aerial vehicle step specifically comprises:
encoding and describing the flight behavior of the unmanned aerial vehicle, and describing the states of the unmanned aerial vehicle in the process of executing a search task, so that all possible states and their probability distribution in the unmanned aerial vehicle search process are obtained.
4. An unmanned aerial vehicle searching device for Markov moving targets, comprising:
the target module is used for constructing a probability model of the Markov moving target after receiving the search task so as to obtain all possible states and probability distribution of the Markov moving target in the moving process;
the unmanned aerial vehicle module is used for acquiring all possible states and probability distribution thereof in the searching process of the unmanned aerial vehicle;
the building module is used for building a Markov model for predicting the behavior of the unmanned aerial vehicle under a search task according to all possible states and probability distribution thereof in the unmanned aerial vehicle search process and the Markov moving target motion process, and building a multi-stage heuristic strategy iterative algorithm based on Markov decision;
the planning module is used for acquiring a search behavior strategy with the maximum profit by utilizing a multi-stage heuristic strategy iterative algorithm based on Markov decision, so as to plan the optimal search track of the unmanned aerial vehicle;
the establishment of the Markov model for unmanned aerial vehicle behavior prediction under the search task specifically comprises the following steps:
setting a time set T = {1, 2, 3, …} for the period during which the unmanned aerial vehicle search task is in progress;
letting the discrete state space of the unmanned aerial vehicle be S = (s_1, s_2, s_3, …), which contains all possible states during the unmanned aerial vehicle search and during the Markov moving target motion;
letting the action space of the unmanned aerial vehicle be A = {a_1, a_2, …, a_x, …, a_q}, representing all possible state-changing actions of the unmanned aerial vehicle, where the element a_x represents the x-th action and q is the number of elements in the action space;
letting the set of feasible actions of the unmanned aerial vehicle in state s_n be A(s_n) = {a_1(s_n), a_2(s_n), a_3(s_n), …}, representing the set of all actions that the unmanned aerial vehicle can take in that state;
letting T(s_n, a_x(s_n), s_m) represent the set of all state transition probabilities of the unmanned aerial vehicle, with any element p(s_m | s_n, a_x(s_n)) being the probability that, in state s_n, the state changes to s_m after the available action a_x(s_n) is performed, where
\sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n)) = 1;
letting any element r(s_n, a_x(s_n)) of the reward set R(s_n) represent the reward for performing action a_x(s_n) in state s_n;
the Markov model for predicting the search behavior of the unmanned aerial vehicle executing any search task is as follows:
MDP = {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} → π(s_n);
where π is the policy, representing the mapping from the state set to the action set, π(s_n) is the mapping of the unmanned aerial vehicle from state s_n to the action set, and → represents outputting the optimal strategy;
the planning module comprises:
a calculation unit, configured to: calculate the reward utility function with the MDP discount model, wherein the discount factor γ satisfies 0 < γ < 1; the reward function of the discount model,
V^{\pi}(s_n) = E^{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s^{0} = s_n\right],
represents the discounted expected total reward after the unmanned aerial vehicle uses the strategy π from state s_n starting at time t = 0;
establish, according to the optimal equation of the MDP discount model, the optimal state value function equation of the search operation reward of the unmanned aerial vehicle in the search task in state s_n,
V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \Big[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \Big],
and the optimal action value function equation,
Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),
and establish, according to the two optimal function equations, the optimal search strategy function equation
\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n));
a given unit, configured to: rasterize the collaborative search area and determine the discrete state space S of the MDP model; given the number g of unmanned aerial vehicles participating in the search task, s_UAV(i) is the current state of the i-th unmanned aerial vehicle, s_UAV(i) ∈ S, A(s_UAV(i)) is the action set of the i-th unmanned aerial vehicle in state s_UAV(i), and K_i is the maximum search step length of the i-th unmanned aerial vehicle; given the discount factor γ and the strategy iteration ending condition ε, set the iteration number b = 0;
an initial unit, configured to: determine the initial position of the target motion and the position where each unmanned aerial vehicle starts to search; each unmanned aerial vehicle obtains the presence probability distribution of the target in the whole area according to the initial position where the target starts to move and the heuristic information of the target motion, thereby determining the virtual target position to be searched by each unmanned aerial vehicle at the next moment;
an iteration unit, configured to: have each unmanned aerial vehicle, according to its virtual target position and its current state s_UAV(i), iteratively calculate its state value function V_{b+1}(s_UAV(i)), and set the iteration number b = b + 1;
a determination unit, configured to: if ||V_{b+1}(s_UAV(i)) − V_b(s_UAV(i))|| < ε, end the iteration and call the traversal unit; otherwise, call the iteration unit;
a traversal unit, configured to: have each unmanned aerial vehicle traverse A(s_UAV(i)) according to the finally obtained state value function V_{b+1}(s_UAV(i)) to obtain Q(s_UAV(i), a_i), and finally obtain the search behavior strategy with the maximum reward, π_i(t+1)*;
a transfer unit, configured to: perform action a_i according to the optimal strategy π_i(t+1)*, the state being transferred from s_UAV(i)^t to s_UAV(i)^{t+1}; at the same time, the unmanned aerial vehicle obtains an immediate reward r_i(s_UAV(i), a_i); then t = t + 1, and the search step length of the i-th unmanned aerial vehicle is set to k_i = k_i + 1;
an ending unit, configured to: if at a certain moment t the position s_UAV(i) of the i-th unmanned aerial vehicle is the same as the current simulated position s_target of the target, the i-th unmanned aerial vehicle has successfully found the target, the search task is completed, and the algorithm ends; if the search step length satisfies k_i > K_i for i = 1, 2, …, n, the search task fails and the algorithm ends; the current simulated position of the target is the current most likely position of the target obtained according to all possible states and their probability distribution during the motion of the Markov moving target.
5. The unmanned aerial vehicle searching device for a Markov moving target according to claim 4, wherein the target module is configured to:
after receiving a search task, rasterizing a motion area of a target, modeling a Markov motion target by using probability theory, and acquiring a probability model of the Markov motion target so as to obtain all possible states and probability distribution of the Markov motion target in the motion process; the motion area of the target is equal to the search area of the unmanned aerial vehicle.
6. The unmanned aerial vehicle searching device for a Markov moving target according to claim 4 or 5, wherein the unmanned aerial vehicle module is configured to:
encode and describe the flight behavior of the unmanned aerial vehicle, and describe the states of the unmanned aerial vehicle in the process of executing a search task, so that all possible states and their probability distribution in the unmanned aerial vehicle search process are obtained.
CN201810779927.8A 2018-07-16 2018-07-16 Unmanned aerial vehicle searching method and device for Markov moving target Active CN108594858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810779927.8A CN108594858B (en) 2018-07-16 2018-07-16 Unmanned aerial vehicle searching method and device for Markov moving target

Publications (2)

Publication Number Publication Date
CN108594858A CN108594858A (en) 2018-09-28
CN108594858B true CN108594858B (en) 2020-10-27

Family

ID=63617662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810779927.8A Active CN108594858B (en) 2018-07-16 2018-07-16 Unmanned aerial vehicle searching method and device for Markov moving target

Country Status (1)

Country Link
CN (1) CN108594858B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11579611B1 (en) 2020-03-30 2023-02-14 Amazon Technologies, Inc. Predicting localized population densities for generating flight routes
US11640764B1 (en) * 2020-06-01 2023-05-02 Amazon Technologies, Inc. Optimal occupancy distribution for route or path planning
US11868145B1 (en) 2019-09-27 2024-01-09 Amazon Technologies, Inc. Selecting safe flight routes based on localized population densities and ground conditions

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409592B (en) * 2018-10-15 2021-08-24 浙江工业大学 Optimal strategy solution for mobile robots in dynamic environments
CN109270833A (en) * 2018-10-23 2019-01-25 大连海事大学 Variable-discourse-domain fuzzy control method based on brushless direct current motor Q learning
CN109521447B (en) * 2018-11-16 2022-10-14 福州大学 Missing target searching method based on greedy strategy
WO2021043387A1 (en) * 2019-09-03 2021-03-11 Huawei Technologies Co., Ltd. Large-scale policy evaluation in multi-agent systems
CN112824998A (en) * 2019-11-20 2021-05-21 南京航空航天大学 Multi-unmanned aerial vehicle collaborative route planning method and device in Markov decision process
CN110883776B (en) * 2019-11-29 2021-04-23 河南大学 A Robot Path Planning Algorithm Based on Improved DQN Based on Fast Search Mechanism
CN111107602B (en) * 2019-12-24 2021-07-27 杭州电子科技大学 A Secure Routing Method with Minimum Energy Consumption and Delay Weighting for Wireless Body Area Networks
CN112382134B (en) * 2020-04-26 2021-07-30 北京三快在线科技有限公司 Method, apparatus, storage medium and electronic device for generating flight path
CN113242556B (en) * 2021-06-04 2022-08-23 重庆邮电大学 Unmanned aerial vehicle resource dynamic deployment method based on differentiated services
CN114115285B (en) * 2021-11-29 2024-07-19 大连海事大学 Multi-agent emotion target path planning method and device
CN114722946B (en) * 2022-04-12 2022-12-20 中国人民解放军国防科技大学 Synthesis of Asynchronous Action and Cooperative Strategy for Unmanned Aerial Vehicle Based on Probabilistic Model Checking
CN117333759A (en) * 2023-09-14 2024-01-02 成都飞机工业(集团)有限责任公司 Target searching method based on probability prediction

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101299134B1 (en) * 2012-03-19 2013-08-22 한국과학기술연구원 Searching system and method for moving targets in dynamic environment by cooperation of plural robots
CN102799179A (en) * 2012-07-06 2012-11-28 山东大学 Path Planning Algorithm for Mobile Robots Based on Single Chain Sequential Backtracking Q-Learning
CN103381826A (en) * 2013-07-31 2013-11-06 中国人民解放军国防科学技术大学 Adaptive Cruise Control Method Based on Approximate Policy Iteration
CN105425820A (en) * 2016-01-05 2016-03-23 合肥工业大学 Unmanned aerial vehicle cooperative search method for moving object with perception capability
CN106919181A (en) * 2016-10-20 2017-07-04 湖南大学 A kind of unmanned plane barrier-avoiding method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on Sequential Planning Problems Based on Partially Observable Markov Decision Processes; Liu Feng; China Doctoral Dissertations Full-text Database, Basic Sciences; 2015-11-15; pp. 11-15 *
Heuristic Search for Underwater Moving Targets Based on Markov Processes; Wu Fang et al.; Journal of Electronics & Information Technology; 2010-05-15; pp. 1088-1093 *

Also Published As

Publication number Publication date
CN108594858A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
CN108594858B (en) Unmanned aerial vehicle searching method and device for Markov moving target
CN113110592B (en) Unmanned aerial vehicle obstacle avoidance and path planning method
Wu et al. Multi UAV cluster control method based on virtual core in improved artificial potential field
Konev et al. MotionCNN: A strong baseline for motion prediction in autonomous driving
Zhu et al. Deep reinforcement learning for real-time assembly planning in robot-based prefabricated construction
CN111047087A (en) Intelligent optimization method and device for path under cooperation of unmanned aerial vehicle and vehicle
Addanki et al. Placeto: Efficient progressive device placement optimization
CN118393900B (en) Automatic driving decision control method, device, system, equipment and storage medium
CN118393973B (en) Automatic driving control method, device, system, equipment and storage medium
Guillen-Perez et al. Learning from oracle demonstrations—a new approach to develop autonomous intersection management control algorithms based on multiagent deep reinforcement learning
CN115034459A (en) Pedestrian trajectory time sequence prediction method
CN117406762A (en) A UAV remote control algorithm based on segmented reinforcement learning
CN119399570A (en) A method for predicting intelligent agent actions based on multi-scale spatial perception
Xiang et al. Socialcvae: Predicting pedestrian trajectory via interaction conditioned latents
CN116362109A (en) Intelligent unmanned system and method based on digital twinning
CN114818124A (en) Virtual-real fusion grid rudder model parameter optimization method based on DPPO
Guan et al. Ab-mapper: Attention and bicnet based multi-agent path finding for dynamic crowded environment
Soliman et al. Aireye: Uav-based intelligent drl mobile target visitation
Nguyen et al. Apprenticeship bootstrapping
CN117522078A (en) Migrant mission planning method and system under unmanned system cluster environment coupling
CN117192998A (en) Unmanned aerial vehicle autonomous decision-making method and device based on state prediction of Transformer neural network
CN115293334A (en) Model-based unmanned equipment control method for high sample rate deep reinforcement learning
Tian et al. The application of path planning algorithm based on deep reinforcement learning for mobile robots
CN114676909A (en) A charging path planning method for unmanned vehicles based on deep reinforcement learning
Sandström On the efficiency of transfer learning in a fighter pilot behavior modelling context

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant