CN108594858B - Unmanned aerial vehicle searching method and device for Markov moving target

Publication number: CN108594858B (grant of application CN201810779927.8A; first published as CN108594858A)

Authority: CN (China)

Legal status: Active (granted)

Inventors: 陈立家, 王赞, 汪晓群, 薛政钢, 管禹, 赵瑞杰, 冯帅栋, 冯子凯, 王敬飞, 赵成伟, 袁蒙恩

Assignees: Henan Zhouhe Network Technology Co ltd; Henan University (application filed jointly by both; priority to CN201810779927.8A)

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/101: Simultaneous control of position or course in three dimensions specially adapted for aircraft

Abstract

The invention discloses a UAV search method and device for a Markov moving target. After a search task is received, all states that can occur during the motion of the Markov moving target and during the UAV search process, together with their probability distributions, are obtained; a Markov model for predicting UAV behavior under the search task is built, and a multi-stage heuristic policy iteration algorithm based on Markov decision processes is established; the search behavior policy with the greatest reward is obtained, and the optimal search track of the UAV is planned. The invention overcomes the drawback of traditional search algorithms, whose lack of a rigorous mathematical definition of the search target's motion law leads to high search cost: according to the UAV's current flight state and the current existence-probability distribution of the Markov moving target over the search area, the virtual target position that the UAV will search at the next moment is determined, and the search behavior policy with the greatest reward is obtained. The method can be applied to searching for moving targets, and the target can be found successfully at a relatively low search cost.

Description

Unmanned aerial vehicle search method and device for a Markov moving target

Technical field

The invention relates to the field of unmanned aerial vehicle (UAV) control, and in particular to a UAV search method and device for Markov moving targets.

Background

Today, the power of technology permeates every industry. As a new technological force, UAVs have attracted wide attention for their application prospects in all walks of life. In particular, the search-and-rescue application of UAVs in military target search systems has drawn the attention of countries around the world. Track planning is an important problem that must be considered when UAVs are applied to a target search system.

Track planning for UAV target search systems falls into two categories: track planning for stationary-target search systems and track planning for moving-target search systems. Research on track planning for stationary-target search is relatively mature and has been widely applied to practical problems, whereas research on track planning for moving-target search is comparatively scarce. Current research on the moving-target track-planning problem focuses mainly on the design and application of the search algorithm itself and lacks a rigorous mathematical definition of the target's motion law during the search. On the other hand, the ability of a UAV to execute a task independently is weak, and in many cases its level of intelligence falls short of expectations. During task execution a UAV generally follows a pre-loaded program and adapts poorly to changing conditions. Cooperative participation of multiple UAVs in a task is therefore a more reliable choice, and it brings the search applications of UAVs ever closer to practical use.

A Markov decision process (MDP) is a Markov reward process endowed with decision-making ability. In terms of application, it is the theory used to solve sequential decision-making problems in uncertain environments, that is, problems in which decisions are made at a series of continuous or discrete moments (called decision moments). When making a decision, one must consider not only its immediate effect but also its impact on long-term returns. The MDP is a very widely used decision-making framework.

Traditional UAV search algorithms, such as the scan-line search algorithm, focus only on the design of the algorithm itself and lack a rigorous mathematical definition of the search target's motion law, which leads to the major drawback of high search cost.

Summary of the invention

To overcome the deficiencies of the prior art, the purpose of the present invention is to provide a UAV search method and device for Markov moving targets, aiming to solve the problem that existing traditional algorithms, when applied to UAV search tasks, either cannot meet the track-planning requirements of moving-target search or incur high search cost.

The purpose of the present invention is achieved by the following technical solution.

A UAV search method for a Markov moving target comprises:

a target step: after a search task is received, building a probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution;

a UAV step: obtaining all states that can occur during the UAV search process, together with their probability distribution;

a construction step: according to all states that can occur during the UAV search process and during the motion of the Markov moving target, together with their probability distributions, building a Markov model for predicting UAV behavior under the search task, and establishing a multi-stage heuristic policy iteration algorithm based on Markov decision processes;

a planning step: using the multi-stage heuristic policy iteration algorithm based on Markov decision processes to obtain the search behavior policy with the greatest reward, thereby planning the optimal search track of the UAV.

On the basis of the above embodiment, preferably, the target step is specifically:

after the search task is received, dividing the motion area of the target into a grid, modeling the Markov moving target with probability theory, and obtaining the probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution; the motion area of the target is identical to the search area of the UAV.
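
To make the target step concrete, the sketch below propagates a target existence-probability grid one Markov move at a time. It is a minimal illustration in Python: the grid size, the stay probability, and the uniform 4-neighbour transition rule are assumptions for demonstration, since the patent does not fix a particular transition matrix.

```python
import numpy as np

def step_belief(belief: np.ndarray, p_stay: float = 0.2) -> np.ndarray:
    """One Markov step of the target's existence-probability distribution.

    Assumed motion law: the target stays in its cell with probability
    p_stay and otherwise moves to one of its 4-neighbours uniformly.
    """
    rows, cols = belief.shape
    nxt = np.zeros_like(belief)
    for r in range(rows):
        for c in range(cols):
            p = belief[r, c]
            if p == 0.0:
                continue
            neigh = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neigh = [(i, j) for i, j in neigh if 0 <= i < rows and 0 <= j < cols]
            nxt[r, c] += p_stay * p
            for i, j in neigh:
                nxt[i, j] += (1.0 - p_stay) * p / len(neigh)
    return nxt

# Example: 20x20 grid, target known to start at cell (10, 10).
belief = np.zeros((20, 20))
belief[10, 10] = 1.0
for _ in range(10):                 # distribution after 10 moves (cf. Fig. 2d)
    belief = step_belief(belief)
print(np.unravel_index(belief.argmax(), belief.shape))  # most probable cell
```

Iterating this propagation reproduces the kind of spreading distribution shown in Figures 2a to 2d.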

On the basis of any of the above embodiments, preferably, the UAV step is specifically:

encoding and describing the flight behavior of the UAV, and describing the states the UAV can be in while executing the search task, thereby obtaining all states that can occur during the UAV search process, together with their probability distribution.
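
As one possible encoding of the flight behavior described above, assuming the UAV moves over the rasterized search area one cell per decision moment, a nine-action coding can be used; the scheme and names below are illustrative assumptions, not the patent's fixed coding.

```python
from enum import IntEnum

class Action(IntEnum):
    """Assumed action space A = {a1, ..., aq}: 8 headings plus hover."""
    N = 0; NE = 1; E = 2; SE = 3; S = 4; SW = 5; W = 6; NW = 7; HOVER = 8

# Cell offset produced by each action (row, col).
OFFSET = {
    Action.N: (-1, 0), Action.NE: (-1, 1), Action.E: (0, 1), Action.SE: (1, 1),
    Action.S: (1, 0),  Action.SW: (1, -1), Action.W: (0, -1), Action.NW: (-1, -1),
    Action.HOVER: (0, 0),
}

def feasible_actions(state, rows, cols):
    """A(s_n): actions that keep the UAV inside the search area."""
    r, c = state
    return [a for a, (dr, dc) in OFFSET.items()
            if 0 <= r + dr < rows and 0 <= c + dc < cols]
```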

On the basis of any of the above embodiments, preferably, building the Markov model for predicting UAV behavior under the search task is specifically:

let $T = \{1, 2, 3, \dots\}$ be the set of moments at which the UAV search task is in progress;

let $S = (s_1, s_2, s_3, \dots)$ be the discrete state space of the UAV, containing all states that can occur during the UAV search process and during the motion of the Markov moving target;

let $A = \{a_1, a_2, \dots, a_x, \dots, a_q\}$ be the action space of the UAV, representing all possible state-changing actions of the UAV, where element $a_x$ is the $x$-th action and $q$ is the number of elements in the action space;

let $A(s_n) = \{a_1(s_n), a_2(s_n), a_3(s_n), \dots\}$ be the set of feasible actions of the UAV in state $s_n$, i.e. the set of all actions the UAV can take in that state;

let $T(s_n, a_x(s_n), s_m)$ denote the set of all state transition probabilities of the UAV, where any element $p(s_m \mid s_n, a_x(s_n))$ is the probability that the state changes to $s_m$ after the available action $a_x(s_n)$ is executed in state $s_n$, with

$$\sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n)) = 1;$$

let any element $r(s_n, a_x(s_n))$ of the reward set $R(s_n)$ be the reward for executing action $a_x(s_n)$ in state $s_n$;

then the Markov model for predicting the search behavior of the UAV under any search task is:

$$\mathrm{MDP} = \{S, A, T(s_n, a_x(s_n), s_m), R(s_n)\} \rightarrow \pi(s_n),$$

where $\pi$ is a policy, i.e. a mapping from the state set to the action set, $\pi(s_n)$ is the mapping of the UAV from state $s_n$ to the action set, and $\rightarrow$ denotes outputting the optimal policy.
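
For reference, the model $\mathrm{MDP} = \{S, A, T, R\} \rightarrow \pi$ defined above can be held in a small tabular container; a sketch under the assumption of finite, enumerable states and actions, with hypothetical names:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[int, int]          # grid cell occupied by the UAV
ActionId = int

@dataclass
class SearchMDP:
    """MDP = {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} -> pi(s_n)."""
    states: List[State]
    actions: Callable[[State], List[ActionId]]              # A(s_n)
    transition: Callable[[State, ActionId, State], float]   # p(s_m | s_n, a_x(s_n))
    reward: Callable[[State, ActionId], float]              # r(s_n, a_x(s_n))
    gamma: float                                            # discount factor, 0 < gamma < 1
```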

On the basis of the above embodiment, preferably, the planning step includes:

a calculation step: computing the reward utility function with the MDP discount model, where the discount factor $\gamma$ satisfies $0 < \gamma < 1$; the reward function of the discount model,

$$V^{\pi}(s_n) = E_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s_n\right],$$

represents the discounted expected total reward after the UAV applies policy $\pi$ from state $s_n$ starting at time $t = 0$;

according to the optimality equation of the MDP discount model, establishing the optimal state value function equation for the reward of the UAV's search operations in the search task in state $s_n$,

$$V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \left[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \right],$$

as well as the optimal action value function equation

$$Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),$$

and, from these two optimal function equations, establishing the optimal search policy function equation

$$\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n));$$
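
The three equations of the calculation step translate directly into tabular backups. The sketch below computes $V^{*}$ and $\pi^{*}$ by standard value iteration over the `SearchMDP` container sketched earlier; it illustrates the fixed-point computation only and is not necessarily the exact iteration scheme of the claimed algorithm.

```python
from typing import Dict

def optimal_values(mdp: SearchMDP, eps: float = 1e-6) -> Dict[State, float]:
    """Iterate V*(s_n) = max_a [ r(s_n,a) + gamma * sum_m p(s_m|s_n,a) V*(s_m) ]."""
    V = {s: 0.0 for s in mdp.states}
    while True:
        delta = 0.0
        for s in mdp.states:
            best = max(
                mdp.reward(s, a)
                + mdp.gamma * sum(mdp.transition(s, a, s2) * V[s2] for s2 in mdp.states)
                for a in mdp.actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:          # matches the stopping rule ||V_{b+1} - V_b|| < eps
            return V

def optimal_policy(mdp: SearchMDP, V: Dict[State, float]) -> Dict[State, ActionId]:
    """pi*(s_n) = argmax_a Q*(s_n, a), with Q* built from V*."""
    def q(s, a):
        return mdp.reward(s, a) + mdp.gamma * sum(
            mdp.transition(s, a, s2) * V[s2] for s2 in mdp.states)
    return {s: max(mdp.actions(s), key=lambda a: q(s, a)) for s in mdp.states}
```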

a given step: dividing the cooperative search area into a grid and determining the discrete state space $S$ of the MDP model; giving the number $g$ of UAVs participating in the search task, where $g$ and $i$ are positive integers, $s_{UAV(i)}$ is the current state of the $i$-th UAV, $s_{UAV(i)} \in S$, $A(s_{UAV(i)})$ is the action set of the $i$-th UAV in state $s_{UAV(i)}$, and $K_i$ is the maximum search step count of the $i$-th UAV; giving the discount factor $\gamma$ and the stopping condition $\varepsilon$ of the policy iteration, and setting the iteration count $b = 0$;

an initial step: determining the initial position of the target's motion and the position at which each UAV starts searching; each UAV obtains the existence-probability distribution of the target over the whole area from the initial position of the target's motion and the heuristic information about the target's motion, thereby determining the virtual target position that it will search at the next moment;
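
One plausible reading of the initial step, reusing the belief grid from the earlier target-model sketch: each UAV takes the most probable target cell inside its assigned sub-region as the virtual target it will head toward next. The region partitioning and the argmax rule are assumptions for illustration.

```python
import numpy as np

def virtual_target(belief: np.ndarray, region_mask: np.ndarray) -> tuple:
    """Pick the most probable target cell inside this UAV's assigned region.

    belief      -- target existence-probability grid (from the motion model)
    region_mask -- boolean grid, True where this UAV is allowed to search
    """
    masked = np.where(region_mask, belief, -1.0)   # exclude other regions
    return np.unravel_index(masked.argmax(), masked.shape)
```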

an iteration step: each UAV, according to its own virtual target position and its current state $s_{UAV(i)}$, iteratively computes its state value function $V_{b+1}(s_{UAV(i)})$ and sets the iteration count $b = b + 1$;

a judgment step: if $\|V_{b+1}(s_{UAV(i)}) - V_b(s_{UAV(i)})\| < \varepsilon$, ending the iteration and entering the traversal step; otherwise, going to the iteration step;

a traversal step: each UAV traverses $A(s_{UAV(i)})$ according to the finally obtained state value function $V_{b+1}(s_{UAV(i)})$ to obtain $Q(s_{UAV(i)}, a_i)$, and finally obtains the search behavior policy with the greatest reward, $\pi_i(t+1)^{*}$;

a transition step: executing action $a_i$ according to the obtained optimal policy $\pi_i(t+1)^{*}$, so that the state transfers from $s_{UAV(i)}^{t}$ to $s_{UAV(i)}^{t+1}$; at the same time the UAV obtains the immediate reward $r_i(s_{UAV(i)}, a_i)$; then $t = t + 1$ and the search step count of the $i$-th UAV is updated as $k_i = k_i + 1$;

an ending step: if at some moment $t$ the position $s_{UAV(i)}$ of the $i$-th UAV is the same as the current simulated position $s_{target}$ of the target, the $i$-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step count satisfies $k_i > K_i$ for all $i = 1, 2, \dots, g$, the search task fails and the algorithm ends; the current simulated position of the target is the most probable current position of the target, obtained from all states that can occur during the motion of the Markov moving target and their probability distribution.

A UAV search device for a Markov moving target comprises:

a target module, configured to build, after a search task is received, a probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution;

a UAV module, configured to obtain all states that can occur during the UAV search process, together with their probability distribution;

a construction module, configured to build, according to all states that can occur during the UAV search process and during the motion of the Markov moving target, together with their probability distributions, a Markov model for predicting UAV behavior under the search task, and to establish a multi-stage heuristic policy iteration algorithm based on Markov decision processes;

a planning module, configured to use the multi-stage heuristic policy iteration algorithm based on Markov decision processes to obtain the search behavior policy with the greatest reward, thereby planning the optimal search track of the UAV.

On the basis of the above embodiment, preferably, the target module is configured to:

divide, after the search task is received, the motion area of the target into a grid, model the Markov moving target with probability theory, and obtain the probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution; the motion area of the target is identical to the search area of the UAV.

On the basis of any of the above embodiments, preferably, the UAV module is configured to:

encode and describe the flight behavior of the UAV, and describe the states the UAV can be in while executing the search task, thereby obtaining all states that can occur during the UAV search process, together with their probability distribution.

On the basis of any of the above embodiments, preferably, building the Markov model for predicting UAV behavior under the search task is specifically:

let $T = \{1, 2, 3, \dots\}$ be the set of moments at which the UAV search task is in progress;

let $S = (s_1, s_2, s_3, \dots)$ be the discrete state space of the UAV, containing all states that can occur during the UAV search process and during the motion of the Markov moving target;

let $A = \{a_1, a_2, \dots, a_x, \dots, a_q\}$ be the action space of the UAV, representing all possible state-changing actions of the UAV, where element $a_x$ is the $x$-th action and $q$ is the number of elements in the action space;

let $A(s_n) = \{a_1(s_n), a_2(s_n), a_3(s_n), \dots\}$ be the set of feasible actions of the UAV in state $s_n$, i.e. the set of all actions the UAV can take in that state;

let $T(s_n, a_x(s_n), s_m)$ denote the set of all state transition probabilities of the UAV, where any element $p(s_m \mid s_n, a_x(s_n))$ is the probability that the state changes to $s_m$ after the available action $a_x(s_n)$ is executed in state $s_n$, with $n, m = 1, 2, 3, \dots$ and

$$\sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n)) = 1;$$

let any element $r(s_n, a_x(s_n))$ of the reward set $R(s_n)$ be the reward for executing action $a_x(s_n)$ in state $s_n$;

then the Markov model for predicting the search behavior of the UAV under any search task is:

$$\mathrm{MDP} = \{S, A, T(s_n, a_x(s_n), s_m), R(s_n)\} \rightarrow \pi(s_n),$$

where $\pi$ is a policy, i.e. a mapping from the state set to the action set, $\pi(s_n)$ is the mapping of the UAV from state $s_n$ to the action set, and $\rightarrow$ denotes outputting the optimal policy.

On the basis of the above embodiment, preferably, the planning module includes:

a calculation unit, configured to: compute the reward utility function with the MDP discount model, where the discount factor $\gamma$ satisfies $0 < \gamma < 1$; the reward function of the discount model,

$$V^{\pi}(s_n) = E_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s_n\right],$$

represents the discounted expected total reward after the UAV applies policy $\pi$ from state $s_n$ starting at time $t = 0$;

according to the optimality equation of the MDP discount model, establish the optimal state value function equation for the reward of the UAV's search operations in the search task in state $s_n$,

$$V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \left[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \right],$$

as well as the optimal action value function equation

$$Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),$$

and, from these two optimal function equations, establish the optimal search policy function equation

$$\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n));$$

a given unit, configured to: divide the cooperative search area into a grid and determine the discrete state space $S$ of the MDP model; give the number $g$ of UAVs participating in the search task, where $s_{UAV(i)}$ is the current state of the $i$-th UAV, $s_{UAV(i)} \in S$, $A(s_{UAV(i)})$ is the action set of the $i$-th UAV in state $s_{UAV(i)}$, and $K_i$ is the maximum search step count of the $i$-th UAV; give the discount factor $\gamma$ and the stopping condition $\varepsilon$ of the policy iteration, and set the iteration count $b = 0$;

an initial unit, configured to: determine the initial position of the target's motion and the position at which each UAV starts searching; each UAV obtains the existence-probability distribution of the target over the whole area from the initial position of the target's motion and the heuristic information about the target's motion, thereby determining the virtual target position that it will search at the next moment;

an iterative unit, configured to: have each UAV, according to its own virtual target position and its current state $s_{UAV(i)}$, iteratively compute its state value function $V_{b+1}(s_{UAV(i)})$ and set the iteration count $b = b + 1$;

a judgment unit, configured to: if $\|V_{b+1}(s_{UAV(i)}) - V_b(s_{UAV(i)})\| < \varepsilon$, end the iteration and call the traversal unit; otherwise, call the iterative unit;

a traversal unit, configured to: have each UAV traverse $A(s_{UAV(i)})$ according to the finally obtained state value function $V_{b+1}(s_{UAV(i)})$ to obtain $Q(s_{UAV(i)}, a_i)$, and finally obtain the search behavior policy with the greatest reward, $\pi_i(t+1)^{*}$;

a transfer unit, configured to: execute action $a_i$ according to the obtained optimal policy $\pi_i(t+1)^{*}$, so that the state transfers from $s_{UAV(i)}^{t}$ to $s_{UAV(i)}^{t+1}$; at the same time the UAV obtains the immediate reward $r_i(s_{UAV(i)}, a_i)$; then $t = t + 1$ and the search step count of the $i$-th UAV is updated as $k_i = k_i + 1$;

an ending unit, configured to: if at some moment $t$ the position $s_{UAV(i)}$ of the $i$-th UAV is the same as the current simulated position $s_{target}$ of the target, conclude that the $i$-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step count satisfies $k_i > K_i$ for all $i = 1, 2, \dots, g$, the search task fails and the algorithm ends; the current simulated position of the target is the most probable current position of the target, obtained from all states that can occur during the motion of the Markov moving target and their probability distribution.

Compared with the prior art, the beneficial effects of the present invention are as follows.

The invention discloses a UAV search method and device for a Markov moving target. After a search task is received, a probability model of the Markov moving target is built, yielding all states that can occur during the motion of the Markov moving target together with their probability distribution; all states that can occur during the UAV search process, together with their probability distribution, are obtained; a Markov model for predicting UAV behavior under the search task is built, and a multi-stage heuristic policy iteration algorithm based on Markov decision processes is established; this algorithm is used to obtain the search behavior policy with the greatest reward, thereby planning the optimal search track of the UAV.

The invention can use the MDP discount model to determine the set of state transition probabilities of the UAV under the search task and the set of rewards for the flight-behavior operations during the transitions, compute the reward utility function, establish the optimality equation for the reward of UAV behavior under the search task, and, through iterative computation and judgment, obtain the search behavior policy with the greatest reward and plan the optimal search track of the UAV.

The invention overcomes the major drawback of traditional search algorithms, such as the scan-line search algorithm, which focus only on the design of the algorithm itself and lack a rigorous mathematical definition of the search target's motion law, leading to high search cost. According to the UAV's current flight state and the current existence-probability distribution of the Markov moving target over the search area, the position of the virtual target that the UAV will search at the next moment is determined, and a sequence of UAV search-behavior operations is obtained; this sequence is the search behavior policy with the greatest behavioral reward. The method can be applied to searching for moving targets, and the target can be found successfully at a relatively low search cost.

Description of the drawings

The present invention is further described below with reference to the accompanying drawings and embodiments.

Figure 1 is a schematic diagram of the grid division of a search area provided by an embodiment of the present invention;

Figures 2a to 2d are existence-probability distribution diagrams, at different moments, of the Markov moving target provided by an embodiment of the present invention;

Figure 3 is a schematic diagram of the theoretical framework of the MDP-MSHPI algorithm provided by an embodiment of the present invention;

Figure 4 is a flowchart of a single UAV searching for a Markov target with the MSHPI algorithm according to an embodiment of the present invention;

Figure 5 is a flowchart of multiple UAVs searching for a Markov target with the MSHPI algorithm according to an embodiment of the present invention;

Figure 6 is a schematic structural diagram of a UAV search device for a Markov moving target provided by an embodiment of the present invention.

Detailed description

The present invention is further described below with reference to the accompanying drawings and specific embodiments. It should be noted that, provided there is no conflict, the embodiments and technical features described below can be combined arbitrarily to form new embodiments.

Specific embodiment one

An embodiment of the present invention provides a UAV search method for a Markov moving target, comprising:

a target step: after a search task is received, building a probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution;

a UAV step: obtaining all states that can occur during the UAV search process, together with their probability distribution;

a construction step: according to all states that can occur during the UAV search process and during the motion of the Markov moving target, together with their probability distributions, building a Markov model for predicting UAV behavior under the search task, and establishing a multi-stage heuristic policy iteration algorithm based on Markov decision processes (MDP-MSHPI);

a planning step: using the MDP-MSHPI algorithm to obtain the search behavior policy with the greatest reward, thereby planning the optimal search track of the UAV. In this embodiment of the present invention, the number of UAVs may be one, two or more.

The invention overcomes the major drawback of traditional search algorithms, such as the scan-line search algorithm, which focus only on the design of the algorithm itself and lack a rigorous mathematical definition of the search target's motion law, leading to high search cost. According to the UAV's current flight state and the current existence-probability distribution of the Markov moving target over the search area, the position of the virtual target that the UAV will search at the next moment is determined, and a sequence of UAV search-behavior operations is obtained; this sequence is the search behavior policy with the greatest behavioral reward. The method can be applied to searching for moving targets, and the target can be found successfully at a relatively low search cost.

As shown in Figures 1 and 2, preferably, the target step may specifically be: after the search task is received, dividing the motion area of the target into a grid, modeling the Markov moving target with probability theory, and obtaining the probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution; the motion area of the target is identical to the search area of the UAV. The benefit of doing so is that the motion law of the moving target can be simulated and all states that can occur during its motion, together with their probability distribution, can be obtained; for example, the most probable position of the moving target at a given moment can be derived.

Preferably, the UAV step may specifically be: encoding and describing the flight behavior of the UAV, and describing the states the UAV can be in while executing the search task, thereby obtaining all states that can occur during the UAV search process, together with their probability distribution. The benefit of doing so is that the states the UAV may be in during the search, together with their probability distribution, can be obtained.

Preferably, the step of building the Markov model for predicting UAV behavior under the search task may specifically be:

let $T = \{1, 2, 3, \dots\}$ be the set of moments at which the UAV search task is in progress;

let $S = (s_1, s_2, s_3, \dots)$ be the discrete state space of the UAV, containing all states that can occur during the UAV search process and during the motion of the Markov moving target;

let $A = \{a_1, a_2, \dots, a_x, \dots, a_q\}$ be the action space of the UAV, representing all possible state-changing actions of the UAV, where element $a_x$ is the $x$-th action and $q$ is the number of elements in the action space;

let $A(s_n) = \{a_1(s_n), a_2(s_n), a_3(s_n), \dots\}$ be the set of feasible actions of the UAV in state $s_n$, i.e. the set of all actions the UAV can take in that state;

let $T(s_n, a_x(s_n), s_m)$ denote the set of all state transition probabilities of the UAV, where any element $p(s_m \mid s_n, a_x(s_n))$ is the probability that the state changes to $s_m$ after the available action $a_x(s_n)$ is executed in state $s_n$, with

$$\sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n)) = 1;$$

let any element $r(s_n, a_x(s_n))$ of the reward set $R(s_n)$ be the reward for executing action $a_x(s_n)$ in state $s_n$;

then the Markov model for predicting the search behavior of the UAV under any search task is:

$$\mathrm{MDP} = \{S, A, T(s_n, a_x(s_n), s_m), R(s_n)\} \rightarrow \pi(s_n),$$

where $\pi$ is a policy, i.e. a mapping from the state set to the action set, $\pi(s_n)$ is the mapping of the UAV from state $s_n$ to the action set, and $\rightarrow$ denotes outputting the optimal policy.

Preferably, the planning step may include:

a calculation step: computing the reward utility function with the MDP discount model, where the discount factor $\gamma$ satisfies $0 < \gamma < 1$; the reward function of the discount model,

$$V^{\pi}(s_n) = E_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s_n\right],$$

represents the discounted expected total reward after the UAV applies policy $\pi$ from state $s_n$ starting at time $t = 0$;

according to the optimality equation of the MDP discount model, establishing the optimal state value function equation for the reward of the UAV's search operations in the search task in state $s_n$,

$$V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \left[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \right],$$

as well as the optimal action value function equation

$$Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),$$

and, from these two optimal function equations, establishing the optimal search policy function equation

$$\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n));$$

a given step: dividing the cooperative search area into a grid and determining the discrete state space $S$ of the MDP model; giving the number $g$ of UAVs participating in the search task, where $g$ and $i$ are positive integers, $s_{UAV(i)}$ is the current state of the $i$-th UAV, $s_{UAV(i)} \in S$, $A(s_{UAV(i)})$ is the action set of the $i$-th UAV in state $s_{UAV(i)}$, and $K_i$ is the maximum search step count of the $i$-th UAV; giving the discount factor $\gamma$ and the stopping condition $\varepsilon$ of the policy iteration, and setting the iteration count $b = 0$;

an initial step: determining the initial position of the target's motion and the position at which each UAV starts searching; each UAV obtains the existence-probability distribution of the target over the whole area from the initial position of the target's motion and the heuristic information about the target's motion, thereby determining the virtual target position that it will search at the next moment;

an iteration step: each UAV, according to its own virtual target position and its current state $s_{UAV(i)}$, iteratively computes its state value function $V_{b+1}(s_{UAV(i)})$ and sets the iteration count $b = b + 1$;

a judgment step: if $\|V_{b+1}(s_{UAV(i)}) - V_b(s_{UAV(i)})\| < \varepsilon$, ending the iteration and entering the traversal step; otherwise, going to the iteration step;

a traversal step: each UAV traverses $A(s_{UAV(i)})$ according to the finally obtained state value function $V_{b+1}(s_{UAV(i)})$ to obtain $Q(s_{UAV(i)}, a_i)$, and finally obtains the search behavior policy with the greatest reward, $\pi_i(t+1)^{*}$;

a transition step: executing action $a_i$ according to the obtained optimal policy $\pi_i(t+1)^{*}$, so that the state transfers from $s_{UAV(i)}^{t}$ to $s_{UAV(i)}^{t+1}$; at the same time the UAV obtains the immediate reward $r_i(s_{UAV(i)}, a_i)$; then $t = t + 1$ and the search step count of the $i$-th UAV is updated as $k_i = k_i + 1$;

an ending step: if at some moment $t$ the position $s_{UAV(i)}$ of the $i$-th UAV is the same as the current simulated position $s_{target}$ of the target, the $i$-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step count satisfies $k_i > K_i$ for all $i = 1, 2, \dots, g$, the search task fails and the algorithm ends; the current simulated position of the target is the most probable position the target currently occupies, obtained from all states that can occur during the motion of the Markov moving target and their probability distribution.

The benefit of doing so is that the MDP discount model can be used to determine the set of state transition probabilities of the UAV under the search task and the set of rewards for the flight-behavior operations during the transitions, compute the reward utility function, establish the optimality equation for the reward of UAV behavior under the search task, and, through iterative computation and judgment, obtain the search behavior policy with the greatest reward and plan the optimal search track of the UAV.

One application scenario of the present invention may be as follows.

1. Divide the motion area of the target (which is also the search area of the UAV) into a grid; the divided area model is shown in Figure 1. Model the Markov moving target with probability theory to obtain its probability model. The existence probability of the Markov moving target over the motion area after a certain number of moves is shown in Figures 2a to 2d, where Figure 2a is the initial moment, Figure 2b is after one move, Figure 2c is after 5 moves, and Figure 2d is after 10 moves.

The theoretical framework of the MDP-MSHPI algorithm is shown in Figure 3.

2. Encode and describe the flight behavior of the UAV, and describe the states the UAV can be in while executing the search task.

3. Build the Markov model for predicting UAV behavior under the search task. In this step the following need to be defined in advance:

Definition 1: let $T = \{1, 2, 3, \dots\}$ denote the set of moments;

Definition 2: let $S = (s_1, s_2, s_3, \dots)$; this state space contains all states that can occur during the UAV search process and during the motion of the Markov moving target;

Definition 3: let $A(s_n) = \{a_1(s_n), a_2(s_n), a_3(s_n), \dots\}$ denote the set of all actions the UAV can take in state $s_n$;

Definition 4: let $T(s_n, a_x(s_n), s_m)$ denote the set of all state transition probabilities of the UAV; its element $p(s_m \mid s_n, a_x(s_n))$ is the probability that the UAV's state changes to $s_m$ after the available action $a_x(s_n)$ is executed in state $s_n$, and it is assumed that

$$\sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n)) = 1;$$

Definition 5: let $R(s_n)$ denote the reward set, where any element $r(s_n, a_x(s_n))$ is the reward for executing action $a_x(s_n)$ in state $s_n$;

Definition 6: let $A = \{a_1, a_2, \dots, a_q\}$; $A$ is the action space of the UAV, containing all possible actions that change the UAV's state, where element $a_x$ is the $x$-th action and $q$ is the number of elements in the action space, $x = 1, 2, 3, \dots, q$.

The Markov model for predicting the search behavior of the UAV under any search task is given as:

$$\mathrm{MDP} = \{S, A, T(s_n, a_x(s_n), s_m), R(s_n)\} \rightarrow \pi(s_n),$$

where $\pi$ is a policy, i.e. a mapping from the state set to the action set, and $\pi(s_n)$ is the mapping of the UAV from state $s_n$ to the action set.

4. Determine the state transition probability set $T(s_n, a_x(s_n), s_m)$ of the UAV under this search task, and determine the reward set $R(s_n)$ for the state transition process according to the requirements of the search task.

According to the state $s_n$ the UAV is currently in, establish the optimal state value function equation for the reward of the search operations under the UAV search task.

The present invention uses the MDP discount model when computing the reward utility function: after a policy is determined and executed, the decision maker receives a certain reward at each moment of $T$ with a certain probability, and the accumulated rewards form the utility function of the system. The discount factor of the discount model is denoted by $\gamma$, with $0 < \gamma < 1$, and the utility function of the MDP discount model is expressed as

$$V^{\pi}(s_n) = E_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s_n\right],$$

which represents the discounted expected total reward of the UAV after the system applies policy $\pi$ from state $s_n$ at time $t = 0$; the utility function is a bounded function.

According to the optimality equation of the MDP discount model, establish the optimal state value function equation for the reward of the search operations while the UAV executes this search task in state $s_n$,

$$V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \left[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \right],$$

as well as the optimal action value function equation

$$Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),$$

and, from these two optimal function models, establish the optimal search policy function equation

$$\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n)).$$

5. Initialize the parameters: give the number $g$ of UAVs participating in the search task, where $s_{UAV(i)}$ is the current state of the $i$-th UAV, $s_{UAV(i)} \in S$, $A(s_{UAV(i)})$ is the action set of the $i$-th UAV in state $s_{UAV(i)}$, and $K_i$ is the maximum search step count of the $i$-th UAV; give the discount factor $\gamma$ and the stopping condition $\varepsilon$ of the policy iteration, and set the iteration count $b = 0$, where $i$ is a positive integer.

6. Determine the initial position of the target's motion and the position at which each UAV starts searching. Each UAV obtains the existence-probability distribution of the target over the whole area from the initial position of the target's motion and the heuristic information about the target's motion, thereby determining the virtual target position that it will search at the next moment.

7. Each UAV, according to its own virtual target position and its current state $s_{UAV(i)}$, obtains $V_{b+1}(s_{UAV(i)})$ by computing the optimal state value function equation for the search operation reward, and sets the iteration count $b = b + 1$.

8. If $\|V_{b+1}(s_{UAV(i)}) - V_b(s_{UAV(i)})\| < \varepsilon$, end the iteration and proceed to the next step; otherwise, go to step 7.

9. Each UAV traverses $A(s_{UAV(i)})$ according to the finally obtained state value function $V_{b+1}(s_{UAV(i)})$ to obtain $Q(s_{UAV(i)}, a_i)$, and finally obtains the behavior policy with the greatest reward, $\pi_i(t+1)^{*}$. Following the above steps, each UAV can find the optimal search behavior policy for the next moment, as shown in Figure 4.

10. Execute action $a_i$ according to the obtained optimal policy $\pi_i(t+1)^{*}$, so that the state transfers from $s_{UAV(i)}^{t}$ to $s_{UAV(i)}^{t+1}$; at the same time the UAV obtains the immediate reward $r_i(s_{UAV(i)}, a_i)$, $t = t + 1$, and the search step count of the $i$-th UAV is updated as $k_i = k_i + 1$.

11. If at some moment $t$ the position $s_{UAV(i)}$ of the $i$-th UAV is the same as the real position $s_{target}$ of the target, the $i$-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step count satisfies $k_i > K_i$ for all $i = 1, 2, \dots, g$, the search fails and the algorithm ends. The block diagram of multiple UAVs cooperatively searching for a Markov moving target is shown in Figure 5.

Specific embodiment one above provides a UAV search method for a Markov moving target; correspondingly, the present application also provides a UAV search device for a Markov moving target. Since the device embodiment is basically similar to the method embodiment, it is described relatively simply; for the relevant parts, refer to the description of the method embodiment. The device embodiment described below is merely illustrative.

Specific embodiment two

As shown in Figure 6, an embodiment of the present invention provides a UAV search device for a Markov moving target, comprising:

a target module 201, configured to build, after a search task is received, a probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution;

a UAV module 202, configured to obtain all states that can occur during the UAV search process, together with their probability distribution;

a construction module 203, configured to build, according to all states that can occur during the UAV search process and during the motion of the Markov moving target, together with their probability distributions, a Markov model for predicting UAV behavior under the search task, and to establish a multi-stage heuristic policy iteration algorithm based on Markov decision processes;

a planning module 204, configured to use the multi-stage heuristic policy iteration algorithm based on Markov decision processes to obtain the search behavior policy with the greatest reward, thereby planning the optimal search track of the UAV.

This embodiment of the invention overcomes the major drawback of traditional search algorithms, such as the scan-line search algorithm, which focus only on the design of the algorithm itself and lack a rigorous mathematical definition of the search target's motion law, leading to high search cost. According to the UAV's current flight state and the current existence-probability distribution of the Markov moving target over the search area, the position of the virtual target that the UAV will search at the next moment is determined, and a sequence of UAV search-behavior operations is obtained; this sequence is the search behavior policy with the greatest behavioral reward. The device can be applied to searching for moving targets, and the target can be found successfully at a relatively low search cost.

Preferably, the target module 201 may be configured to: divide, after the search task is received, the motion area of the target into a grid, model the Markov moving target with probability theory, and obtain the probability model of the Markov moving target, thereby obtaining all states that can occur during the motion of the Markov moving target, together with their probability distribution; the motion area of the target is identical to the search area of the UAV.

Preferably, the UAV module 202 may be configured to: encode and describe the flight behavior of the UAV, and describe the states the UAV can be in while executing the search task, thereby obtaining all states that can occur during the UAV search process, together with their probability distribution.

优选的,所述构建搜索任务下无人机行为预测的马尔科夫模型,可以具体为:Preferably, the construction of the Markov model for predicting the behavior of the UAV under the search task may be specifically:

设无人机搜索任务进行中的时刻集合T={1,2,3,…};Set the time set T={1,2,3,…} when the UAV search task is in progress;

设无人机的离散状态空间S=(s1,s2,s3,…),该状态空间包含了无人机搜索过程中和马尔科夫运动目标运动过程中所有可能出现的状态;Set the discrete state space S=(s 1 , s 2 , s 3 ,...) of the UAV, which contains all possible states during the UAV search process and the Markov moving target movement process;

设无人机的动作空间A={a1,a2,...,ax,...,aq},表示无人机所有可能的改变状态的动作,其中元素ax表示第x个动作,q为动作空间中的元素个数;Let the action space A of the UAV = {a 1 ,a 2 ,...,a x ,...,a q }, which represents all possible actions of the UAV to change the state, where the element a x represents the xth actions, q is the number of elements in the action space;

设无人机处于状态sn下的可行动作集合A(sn)={a1(sn),a2(sn),a3(sn),...},表示无人机位于某个状态下可以采取的所有动作集合;Suppose the feasible action set A(s n ) of the UAV in the state sn = {a 1 (s n ), a 2 (s n ), a 3 (s n ),...}, representing the UAV A collection of all actions that can be taken in a state;

设T(sn,ax(sn),sm)表示无人机所有状态转移概率集合,其中的任意元素p(sm|sn,ax(sn))表示在状态sn下,执行可用动作ax(sn)之后,状态变化到sm的概率,其中

Figure BDA0001732310350000151
Let T(s n , a x (s n ), s m ) represent the set of all state transition probabilities of the UAV, and any element p(s m |s n , a x (s n )) in the state s n , the probability that the state changes to s m after performing the available actions a x (s n ), where
Figure BDA0001732310350000151

设报酬集合R(sn)的任意元素r(sn,ax(sn))表示在状态sn执行动作ax(sn)的报酬;Let any element r(s n , a x (s n )) of the reward set R(s n ) represent the reward for performing the action a x (s n ) in the state sn ;

Then the Markov model for predicting the search behavior of the UAV under any search task is:

MDP = {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} → π(s_n);

where π is a policy, i.e., a mapping from the state set to the action set, π(s_n) denotes the mapping of the UAV from state s_n to the action set, and → denotes outputting the optimal policy.
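For concreteness, the sketch below shows one possible in-code representation of this MDP tuple, building on the grid model sketched above; the 8-direction flight-action encoding and all identifiers (SearchMDP, ACTIONS, feasible_actions) are hypothetical choices, not the patent's prescribed data layout.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical encoding of UAV flight actions as eight compass moves on the grid.
ACTIONS = {0: (-1, 0), 1: (-1, 1), 2: (0, 1), 3: (1, 1),
           4: (1, 0), 5: (1, -1), 6: (0, -1), 7: (-1, -1)}

@dataclass
class SearchMDP:
    states: List[int]                                    # discrete state space S
    transition: Callable[[int, int], Dict[int, float]]  # p(s_m | s_n, a)
    reward: Callable[[int, int], float]                 # r(s_n, a_x(s_n))
    gamma: float = 0.9                                  # discount factor, 0 < gamma < 1
    actions: Dict[int, tuple] = field(default_factory=lambda: dict(ACTIONS))

    def feasible_actions(self, s: int) -> List[int]:
        """A(s_n): actions with at least one reachable successor state."""
        return [a for a in self.actions if self.transition(s, a)]
```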

Preferably, the planning module 204 may include:

a calculation unit, used to: compute the reward utility function with the MDP discount model, where the discount factor γ satisfies 0 < γ < 1; the reward function of the discount model,

V^{\pi}(s_n) = E^{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s^{0} = s_n\right],

denotes the discounted expected total reward after the UAV follows policy π from state s_n starting at time t = 0;

according to the optimality equation of the MDP discount model, establish the optimal state value function equation for the UAV's search-operation reward in the search task in state s_n,

V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \Big[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \Big],

the optimal action value function equation,

Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),

and, from these two optimal function equations, the optimal search strategy function equation

\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n));
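The following sketch implements the value backup and stopping test these equations describe, reusing the hypothetical SearchMDP container from the earlier sketch; the tolerance eps plays the role of the iteration-ending condition ε used below.

```python
def bellman_backup(mdp: SearchMDP, V: dict):
    """One sweep of the optimal value update V_{b+1} and the action values Q."""
    V_new, Q = {}, {}
    for s in mdp.states:
        # Assumes every state has at least one feasible action.
        Q[s] = {a: mdp.reward(s, a)
                   + mdp.gamma * sum(p * V[sm]
                                     for sm, p in mdp.transition(s, a).items())
                for a in mdp.feasible_actions(s)}
        V_new[s] = max(Q[s].values())
    return V_new, Q

def solve(mdp: SearchMDP, eps: float = 1e-4):
    """Iterate until ||V_{b+1} - V_b|| < eps, then return the greedy policy
    pi*(s) = argmax_a Q*(s, a) together with the converged value function."""
    V = {s: 0.0 for s in mdp.states}
    while True:
        V_new, Q = bellman_backup(mdp, V)
        if max(abs(V_new[s] - V[s]) for s in mdp.states) < eps:
            return {s: max(Q[s], key=Q[s].get) for s in mdp.states}, V_new
        V = V_new
```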

a given unit, used to: divide the cooperative search area into a grid and determine the discrete state space S of the MDP model; given the number g of UAVs participating in the search task, let s_UAV(i) be the current state of the i-th UAV, s_UAV(i) ∈ S, A(s_UAV(i)) the action set of the i-th UAV in state s_UAV(i), and K_i the maximum search step length of the i-th UAV; given the discount factor γ and the ending condition ε of the policy iteration, set the iteration count b = 0;

an initial unit, used to: determine the initial position of the target's motion and the position where each UAV starts to search; each UAV obtains the presence probability distribution of the target over the whole area from the target's initial position and the heuristic information about the target's motion, thereby determining the virtual target position that the UAV will search at the next moment;

an iteration unit, used to: have each UAV, based on its own virtual target position and its current state s_UAV(i), iteratively compute its state value function V_{b+1}(s_UAV(i)), and set the iteration count b = b + 1;

a judgment unit, used to: if ||V_{b+1}(s_UAV(i)) − V_b(s_UAV(i))|| < ε, end the iteration and call the traversal unit; otherwise, call the iteration unit;

a traversal unit, used to: have each UAV traverse A(s_UAV(i)) with the finally obtained state value function V_{b+1}(s_UAV(i)) to obtain Q(s_UAV(i), a_i), and finally obtain the search behavior strategy with the greatest reward, π_i(t+1)*;

a transfer unit, used to: execute action a_i according to the obtained optimal strategy π_i(t+1)*, whereupon the state transfers from s_UAV(i)^t to s_UAV(i)^{t+1}; at the same time, the UAV obtains the immediate reward r_i(s_UAV(i), a_i); then set t = t + 1 and the search step length of the i-th UAV k_i = k_i + 1;

an ending unit, used to: if at some time t the position s_UAV(i) of the i-th UAV is the same as the current simulated position s_target of the target, the i-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step length satisfies k_i > K_i for i = 1, 2, …, n, the search task fails and the algorithm ends; the current simulated position of the target is the most likely current position of the target obtained from all possible states and their probability distribution during the motion of the Markov moving target.
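Tying the given, initial, iteration, judgment, traversal, transfer, and ending units together, one possible shape of the overall multi-UAV search loop is sketched below, reusing solve() and the target belief propagation from the earlier sketches; the belief update, the choice of the most probable cell as the simulated target position, and every identifier are assumptions layered on those sketches rather than the patented procedure itself.

```python
import numpy as np

def multi_uav_search(mdps, starts, target_belief, T_target, K, max_t=500):
    """One possible outer loop: mdps[i] is the SearchMDP of UAV i (its reward
    assumed shaped toward that UAV's virtual target); starts[i] its initial
    state; target_belief the target presence distribution over grid cells;
    T_target the target's Markov transition matrix; K[i] the maximum search
    step length of UAV i."""
    states, k = list(starts), [0] * len(starts)
    for t in range(max_t):
        target_belief = target_belief @ T_target     # the Markov target moves
        s_target = int(np.argmax(target_belief))     # current simulated position
        for i, mdp in enumerate(mdps):
            if k[i] > K[i]:
                continue                             # step budget exhausted
            # In a fuller version mdp.reward would be rebuilt here from
            # target_belief so the virtual target tracks the belief.
            pi, _ = solve(mdp)                       # policy iteration to eps
            a = pi[states[i]]
            nxt = mdp.transition(states[i], a)
            states[i] = max(nxt, key=nxt.get)        # most likely next state
            k[i] += 1
            if states[i] == s_target:
                return "found", i, t                 # search task completed
        if all(k[i] > K[i] for i in range(len(starts))):
            return "failed", None, t                 # all budgets exceeded
    return "failed", None, max_t
```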

The present invention has been described above from the standpoints of purpose of use, efficiency, improvement, and novelty; its practical improvements satisfy the requirements of functional enhancement and usability emphasized by the patent law. The above description and drawings are only preferred embodiments of the present invention and do not limit it; accordingly, anything similar or identical to the structure, device, or features of the present invention, that is, any equivalent replacement or modification made within the scope of the patent application of the present invention, shall fall within the scope of protection of this patent application.

It should be noted that, where no conflict arises, the embodiments of the present invention and the features within them may be combined with one another. Although the invention has been described to a certain extent, it is apparent that appropriate changes in various conditions may be made without departing from its spirit and scope. It is to be understood that the invention is not limited to the described embodiments but extends to the scope of the claims, including equivalent replacements of each element described. Those skilled in the art may make various other corresponding changes and variations according to the technical solutions and concepts described above, and all such changes and variations shall fall within the protection scope of the claims of the present invention.

Claims (6)

1. An unmanned aerial vehicle searching method for a Markov moving target is characterized by comprising the following steps:
a target step, namely after receiving a search task, constructing a probability model of the Markov moving target so as to obtain all possible states and probability distribution thereof in the moving process of the Markov moving target;
an unmanned aerial vehicle step, obtaining all possible states and probability distribution thereof in the unmanned aerial vehicle searching process;
a construction step, wherein a Markov model for predicting unmanned aerial vehicle behaviors under a search task is constructed according to all possible states and probability distribution thereof in the unmanned aerial vehicle search process and the Markov moving target movement process, and a multi-stage heuristic strategy iterative algorithm based on Markov decision is established;
a planning step, namely acquiring a search behavior strategy with the maximum profit by using a multi-stage heuristic strategy iterative algorithm based on Markov decision, so as to plan the optimal search track of the unmanned aerial vehicle;
the establishment of the Markov model for unmanned aerial vehicle behavior prediction under the search task specifically comprises the following steps:
setting a time set T = {1, 2, 3, …} for the period during which the unmanned aerial vehicle search task is in progress;
letting the discrete state space of the unmanned aerial vehicle be S = (s_1, s_2, s_3, …), which contains all possible states during the unmanned aerial vehicle search and during the Markov moving target motion;
letting the action space of the unmanned aerial vehicle be A = {a_1, a_2, …, a_x, …, a_q}, representing all possible state-changing actions of the unmanned aerial vehicle, where the element a_x represents the x-th action and q is the number of elements in the action space;
letting the set of feasible actions of the unmanned aerial vehicle in state s_n be A(s_n) = {a_1(s_n), a_2(s_n), a_3(s_n), …}, representing the set of all actions that the unmanned aerial vehicle can take in that state;
letting T(s_n, a_x(s_n), s_m) represent the set of all state transition probabilities of the unmanned aerial vehicle, with any element p(s_m | s_n, a_x(s_n)) being the probability that, in state s_n, the state changes to s_m after the available action a_x(s_n) is performed, where
\sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n)) = 1;
letting any element r(s_n, a_x(s_n)) of the reward set R(s_n) represent the reward for performing action a_x(s_n) in state s_n;
the Markov model for predicting the search behavior of the unmanned aerial vehicle executing any search task is as follows:
MDP = {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} → π(s_n);
where π is the policy, representing the mapping from the state set to the action set, π(s_n) is the mapping of the unmanned aerial vehicle from state s_n to the action set, and → represents outputting the optimal strategy;
the planning step comprises:
a calculation step: calculating the reward utility function with the MDP discount model, wherein the discount factor γ satisfies 0 < γ < 1; the reward function of the discount model,
V^{\pi}(s_n) = E^{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s^{0} = s_n\right],
represents the discounted expected total reward after the unmanned aerial vehicle uses the strategy π from state s_n starting at time t = 0;
according to the optimal equation of the MDP discount model, establishing the optimal state value function equation of the search operation reward of the unmanned aerial vehicle in the search task in state s_n,
V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \Big[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \Big],
and the optimal action value function equation,
Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),
and establishing, according to the two optimal function equations, the optimal search strategy function equation
\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n));
a given step: rasterizing the collaborative search area and determining the discrete state space S of the MDP model; giving the number g of unmanned aerial vehicles participating in the search task, g and i being positive integers; s_UAV(i) is the current state of the i-th unmanned aerial vehicle, s_UAV(i) ∈ S, A(s_UAV(i)) is the action set of the i-th unmanned aerial vehicle in state s_UAV(i), and K_i is the maximum search step length of the i-th unmanned aerial vehicle; giving the discount factor γ and the strategy iteration ending condition ε, and setting the iteration number b = 0;
an initial step: determining the initial position of the target motion and the position where each unmanned aerial vehicle starts to search; each unmanned aerial vehicle obtains the presence probability distribution of the target in the whole area according to the initial position where the target starts to move and the heuristic information of the target motion, thereby determining the virtual target position to be searched by each unmanned aerial vehicle at the next moment;
an iteration step: each unmanned aerial vehicle, according to its virtual target position and its current state s_UAV(i), iteratively calculates its state value function V_{b+1}(s_UAV(i)), and the iteration number is set to b = b + 1;
a judging step: if ||V_{b+1}(s_UAV(i)) − V_b(s_UAV(i))|| < ε, ending the iteration and entering the traversal step; otherwise, returning to the iteration step;
a traversal step: each unmanned aerial vehicle traverses A(s_UAV(i)) according to the finally obtained state value function V_{b+1}(s_UAV(i)) to obtain Q(s_UAV(i), a_i), and finally the search behavior strategy with the maximum reward, π_i(t+1)*, is obtained;
a transfer step: performing action a_i according to the optimal strategy π_i(t+1)*, the state being transferred from s_UAV(i)^t to s_UAV(i)^{t+1}; at the same time, the unmanned aerial vehicle obtains an immediate reward r_i(s_UAV(i), a_i); then t = t + 1, and the search step length of the i-th unmanned aerial vehicle is set to k_i = k_i + 1;
an ending step: if at a certain moment t the position s_UAV(i) of the i-th unmanned aerial vehicle is the same as the current simulated position s_target of the target, the i-th unmanned aerial vehicle has successfully found the target, the search task is completed, and the algorithm ends; if the search step length satisfies k_i > K_i for i = 1, 2, …, n, the search task fails and the algorithm ends; the current simulated position of the target is the current most likely position of the target obtained according to all possible states and their probability distribution during the motion of the Markov moving target.
2. The unmanned aerial vehicle searching method for a Markov moving target according to claim 1, wherein the target step specifically comprises:
after receiving a search task, rasterizing a motion area of a target, modeling a Markov motion target by using probability theory, and acquiring a probability model of the Markov motion target so as to obtain all possible states and probability distribution of the Markov motion target in the motion process; the motion area of the target is equal to the search area of the unmanned aerial vehicle.
3. The unmanned aerial vehicle searching method for a Markov moving target according to claim 1 or 2, wherein the unmanned aerial vehicle step specifically comprises:
encoding and describing the flight behavior of the unmanned aerial vehicle, and describing the states of the unmanned aerial vehicle in the process of executing a search task, so that all possible states and their probability distribution in the unmanned aerial vehicle search process are obtained.
4. An unmanned aerial vehicle searching device for Markov moving targets, comprising:
the target module is used for constructing a probability model of the Markov moving target after receiving the search task so as to obtain all possible states and probability distribution of the Markov moving target in the moving process;
the unmanned aerial vehicle module is used for acquiring all possible states and probability distribution thereof in the searching process of the unmanned aerial vehicle;
the building module is used for building a Markov model for predicting the behavior of the unmanned aerial vehicle under a search task according to all possible states and probability distribution thereof in the unmanned aerial vehicle search process and the Markov moving target motion process, and building a multi-stage heuristic strategy iterative algorithm based on Markov decision;
the planning module is used for acquiring a search behavior strategy with the maximum profit by utilizing a multi-stage heuristic strategy iterative algorithm based on Markov decision, so as to plan the optimal search track of the unmanned aerial vehicle;
the establishment of the Markov model for unmanned aerial vehicle behavior prediction under the search task specifically comprises the following steps:
setting a time set T = {1, 2, 3, …} for the period during which the unmanned aerial vehicle search task is in progress;
letting the discrete state space of the unmanned aerial vehicle be S = (s_1, s_2, s_3, …), which contains all possible states during the unmanned aerial vehicle search and during the Markov moving target motion;
letting the action space of the unmanned aerial vehicle be A = {a_1, a_2, …, a_x, …, a_q}, representing all possible state-changing actions of the unmanned aerial vehicle, where the element a_x represents the x-th action and q is the number of elements in the action space;
letting the set of feasible actions of the unmanned aerial vehicle in state s_n be A(s_n) = {a_1(s_n), a_2(s_n), a_3(s_n), …}, representing the set of all actions that the unmanned aerial vehicle can take in that state;
letting T(s_n, a_x(s_n), s_m) represent the set of all state transition probabilities of the unmanned aerial vehicle, with any element p(s_m | s_n, a_x(s_n)) being the probability that, in state s_n, the state changes to s_m after the available action a_x(s_n) is performed, where
\sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n)) = 1;
letting any element r(s_n, a_x(s_n)) of the reward set R(s_n) represent the reward for performing action a_x(s_n) in state s_n;
the Markov model for predicting the search behavior of the unmanned aerial vehicle executing any search task is as follows:
MDP = {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} → π(s_n);
where π is the policy, representing the mapping from the state set to the action set, π(s_n) is the mapping of the unmanned aerial vehicle from state s_n to the action set, and → represents outputting the optimal strategy;
the planning module comprises:
a calculation unit, configured to: calculate the reward utility function with the MDP discount model, wherein the discount factor γ satisfies 0 < γ < 1; the reward function of the discount model,
V^{\pi}(s_n) = E^{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s^{0} = s_n\right],
represents the discounted expected total reward after the unmanned aerial vehicle uses the strategy π from state s_n starting at time t = 0;
establish, according to the optimal equation of the MDP discount model, the optimal state value function equation of the search operation reward of the unmanned aerial vehicle in the search task in state s_n,
V^{*}(s_n) = \max_{a_x(s_n) \in A(s_n)} \Big[ r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m) \Big],
and the optimal action value function equation,
Q^{*}(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + \gamma \sum_{s_m \in S} p(s_m \mid s_n, a_x(s_n))\, V^{*}(s_m),
and establish, according to the two optimal function equations, the optimal search strategy function equation
\pi^{*}(s_n) = \arg\max_{a_x(s_n) \in A(s_n)} Q^{*}(s_n, a_x(s_n));
a given unit, configured to: rasterize the collaborative search area and determine the discrete state space S of the MDP model; given the number g of unmanned aerial vehicles participating in the search task, s_UAV(i) is the current state of the i-th unmanned aerial vehicle, s_UAV(i) ∈ S, A(s_UAV(i)) is the action set of the i-th unmanned aerial vehicle in state s_UAV(i), and K_i is the maximum search step length of the i-th unmanned aerial vehicle; given the discount factor γ and the strategy iteration ending condition ε, set the iteration number b = 0;
an initial unit, configured to: determine the initial position of the target motion and the position where each unmanned aerial vehicle starts to search; each unmanned aerial vehicle obtains the presence probability distribution of the target in the whole area according to the initial position where the target starts to move and the heuristic information of the target motion, thereby determining the virtual target position to be searched by each unmanned aerial vehicle at the next moment;
an iteration unit, configured to: have each unmanned aerial vehicle, according to its virtual target position and its current state s_UAV(i), iteratively calculate its state value function V_{b+1}(s_UAV(i)), and set the iteration number b = b + 1;
a determination unit, configured to: if ||V_{b+1}(s_UAV(i)) − V_b(s_UAV(i))|| < ε, end the iteration and call the traversal unit; otherwise, call the iteration unit;
a traversal unit, configured to: have each unmanned aerial vehicle traverse A(s_UAV(i)) according to the finally obtained state value function V_{b+1}(s_UAV(i)) to obtain Q(s_UAV(i), a_i), and finally obtain the search behavior strategy with the maximum reward, π_i(t+1)*;
a transfer unit, configured to: perform action a_i according to the optimal strategy π_i(t+1)*, the state being transferred from s_UAV(i)^t to s_UAV(i)^{t+1}; at the same time, the unmanned aerial vehicle obtains an immediate reward r_i(s_UAV(i), a_i); then t = t + 1, and the search step length of the i-th unmanned aerial vehicle is set to k_i = k_i + 1;
an ending unit, configured to: if at a certain moment t the position s_UAV(i) of the i-th unmanned aerial vehicle is the same as the current simulated position s_target of the target, the i-th unmanned aerial vehicle has successfully found the target, the search task is completed, and the algorithm ends; if the search step length satisfies k_i > K_i for i = 1, 2, …, n, the search task fails and the algorithm ends; the current simulated position of the target is the current most likely position of the target obtained according to all possible states and their probability distribution during the motion of the Markov moving target.
5. The unmanned aerial vehicle searching device for a Markov moving target according to claim 4, wherein the target module is configured to:
after receiving a search task, rasterizing a motion area of a target, modeling a Markov motion target by using probability theory, and acquiring a probability model of the Markov motion target so as to obtain all possible states and probability distribution of the Markov motion target in the motion process; the motion area of the target is equal to the search area of the unmanned aerial vehicle.
6. The unmanned aerial vehicle searching device for a Markov moving target according to claim 4 or 5, wherein the unmanned aerial vehicle module is configured to:
encode and describe the flight behavior of the unmanned aerial vehicle, and describe the states of the unmanned aerial vehicle in the process of executing a search task, so that all possible states and their probability distribution in the unmanned aerial vehicle search process are obtained.
CN201810779927.8A 2018-07-16 2018-07-16 Unmanned aerial vehicle searching method and device for Markov moving target Active CN108594858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810779927.8A CN108594858B (en) 2018-07-16 2018-07-16 Unmanned aerial vehicle searching method and device for Markov moving target

Publications (2)

Publication Number Publication Date
CN108594858A CN108594858A (en) 2018-09-28
CN108594858B true CN108594858B (en) 2020-10-27

Family

ID=63617662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810779927.8A Active CN108594858B (en) 2018-07-16 2018-07-16 Unmanned aerial vehicle searching method and device for Markov moving target

Country Status (1)

Country Link
CN (1) CN108594858B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11579611B1 (en) 2020-03-30 2023-02-14 Amazon Technologies, Inc. Predicting localized population densities for generating flight routes
US11640764B1 (en) * 2020-06-01 2023-05-02 Amazon Technologies, Inc. Optimal occupancy distribution for route or path planning
US11868145B1 (en) 2019-09-27 2024-01-09 Amazon Technologies, Inc. Selecting safe flight routes based on localized population densities and ground conditions

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409592B (en) * 2018-10-15 2021-08-24 浙江工业大学 Optimal strategy solution for mobile robots in dynamic environments
CN109270833A (en) * 2018-10-23 2019-01-25 大连海事大学 Variable-discourse-domain fuzzy control method based on brushless direct current motor Q learning
CN109521447B (en) * 2018-11-16 2022-10-14 福州大学 Missing target searching method based on greedy strategy
WO2021043387A1 (en) * 2019-09-03 2021-03-11 Huawei Technologies Co., Ltd. Large-scale policy evaluation in multi-agent systems
CN112824998A (en) * 2019-11-20 2021-05-21 南京航空航天大学 Multi-unmanned aerial vehicle collaborative route planning method and device in Markov decision process
CN110883776B (en) * 2019-11-29 2021-04-23 河南大学 A Robot Path Planning Algorithm Based on Improved DQN Based on Fast Search Mechanism
CN111107602B (en) * 2019-12-24 2021-07-27 杭州电子科技大学 A Secure Routing Method with Minimum Energy Consumption and Delay Weighting for Wireless Body Area Networks
CN112382134B (en) * 2020-04-26 2021-07-30 北京三快在线科技有限公司 Method, apparatus, storage medium and electronic device for generating flight path
CN113242556B (en) * 2021-06-04 2022-08-23 重庆邮电大学 Unmanned aerial vehicle resource dynamic deployment method based on differentiated services
CN114115285B (en) * 2021-11-29 2024-07-19 大连海事大学 Multi-agent emotion target path planning method and device
CN114722946B (en) * 2022-04-12 2022-12-20 中国人民解放军国防科技大学 Synthesis of Asynchronous Action and Cooperative Strategy for Unmanned Aerial Vehicle Based on Probabilistic Model Checking
CN117333759A (en) * 2023-09-14 2024-01-02 成都飞机工业(集团)有限责任公司 Target searching method based on probability prediction

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101299134B1 (en) * 2012-03-19 2013-08-22 한국과학기술연구원 Searching system and method for moving targets in dynamic environment by cooperation of plural robots
CN102799179A (en) * 2012-07-06 2012-11-28 山东大学 Path Planning Algorithm for Mobile Robots Based on Single Chain Sequential Backtracking Q-Learning
CN103381826A (en) * 2013-07-31 2013-11-06 中国人民解放军国防科学技术大学 Adaptive Cruise Control Method Based on Approximate Policy Iteration
CN105425820A (en) * 2016-01-05 2016-03-23 合肥工业大学 Unmanned aerial vehicle cooperative search method for moving object with perception capability
CN106919181A (en) * 2016-10-20 2017-07-04 湖南大学 A kind of unmanned plane barrier-avoiding method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on Sequential Planning Problems Based on Partially Observable Markov Decision Processes; Liu Feng; China Doctoral Dissertations Full-text Database, Basic Sciences; 2015-11-15; pp. 11-15 *
Heuristic Search for Underwater Moving Targets Based on Markov Processes; Wu Fang et al.; Journal of Electronics & Information Technology; 2010-05-15; pp. 1088-1093 *

Also Published As

Publication number Publication date
CN108594858A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
CN108594858B (en) Unmanned aerial vehicle searching method and device for Markov moving target
CN113110592B (en) Unmanned aerial vehicle obstacle avoidance and path planning method
Wu et al. Multi UAV cluster control method based on virtual core in improved artificial potential field
Konev et al. MotionCNN: A strong baseline for motion prediction in autonomous driving
Zhu et al. Deep reinforcement learning for real-time assembly planning in robot-based prefabricated construction
CN111047087A (en) Intelligent optimization method and device for path under cooperation of unmanned aerial vehicle and vehicle
Addanki et al. Placeto: Efficient progressive device placement optimization
CN118393900B (en) Automatic driving decision control method, device, system, equipment and storage medium
CN118393973B (en) Automatic driving control method, device, system, equipment and storage medium
Guillen-Perez et al. Learning from oracle demonstrations—a new approach to develop autonomous intersection management control algorithms based on multiagent deep reinforcement learning
CN115034459A (en) Pedestrian trajectory time sequence prediction method
CN117406762A (en) A UAV remote control algorithm based on segmented reinforcement learning
CN119399570A (en) A method for predicting intelligent agent actions based on multi-scale spatial perception
Xiang et al. Socialcvae: Predicting pedestrian trajectory via interaction conditioned latents
CN116362109A (en) Intelligent unmanned system and method based on digital twinning
CN114818124A (en) Virtual-real fusion grid rudder model parameter optimization method based on DPPO
Guan et al. Ab-mapper: Attention and bicnet based multi-agent path finding for dynamic crowded environment
Soliman et al. Aireye: Uav-based intelligent drl mobile target visitation
Nguyen et al. Apprenticeship bootstrapping
CN117522078A (en) Migrant mission planning method and system under unmanned system cluster environment coupling
CN117192998A (en) Unmanned aerial vehicle autonomous decision-making method and device based on state prediction of Transformer neural network
CN115293334A (en) Model-based unmanned equipment control method for high sample rate deep reinforcement learning
Tian et al. The application of path planning algorithm based on deep reinforcement learning for mobile robots
CN114676909A (en) A charging path planning method for unmanned vehicles based on deep reinforcement learning
Sandström On the efficiency of transfer learning in a fighter pilot behavior modelling context

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant