CN108594858B - Unmanned aerial vehicle searching method and device for Markov moving target
- Publication number: CN108594858B (application CN201810779927.8A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
Abstract
The invention discloses a UAV search method and device for a Markov moving target. After a search task is received, all possible states, and their probability distributions, arising during the motion of the Markov moving target and during the UAV search process are obtained; a Markov model for predicting UAV behavior under the search task is constructed, and a multi-stage heuristic policy iteration algorithm based on Markov decision processes is established; the search behavior policy with the greatest payoff is obtained, and the optimal search track of the UAV is planned. The invention overcomes the drawback of traditional search algorithms, whose lack of a rigorous mathematical definition of the search target's motion law leads to high search cost: from the UAV's current flight state and the current existence probability distribution of the Markov moving target over the search area, the virtual target position the UAV will search at the next moment is determined and the search behavior policy with the greatest payoff is obtained. The method can be applied to searching for moving targets and can find the target at a comparatively low search cost.
Description
Technical field

The invention relates to the field of unmanned aerial vehicle (UAV) control, and in particular to a UAV search method and device for Markov moving targets.

Background art

Today, the power of technology permeates every industry. As a new technological force, UAVs have attracted wide attention for their application prospects across industries; in particular, search-and-rescue applications of UAVs in military target search systems have drawn attention worldwide. Track planning is an important issue that must be considered when applying UAVs to target search systems.

Track planning for UAV target search systems falls into two categories: track planning for stationary-target search systems and track planning for moving-target search systems. Research results on track planning for stationary-target search are relatively mature and have been widely applied to practical problems, whereas research on track planning for moving-target search is comparatively scarce. Current work on the moving-target track planning problem concentrates on the design and application of the search algorithm itself and lacks a rigorous mathematical definition of the target's motion law during the search. On the other hand, a UAV's ability to execute tasks independently is weak, and in many cases its level of intelligence falls short of expectations: during task execution a UAV generally follows a pre-loaded program and adapts poorly to change. Cooperative participation of multiple UAVs in a task is therefore a comparatively reliable choice, which also brings the search applications of UAVs ever closer to practical use.

A Markov decision process (MDP) is a Markov reward process endowed with decision-making capability. In terms of application, it is the theory used to solve sequential decision problems in uncertain environments, i.e., problems in which decisions must be made at a series of continuous or discrete moments (called decision epochs). When making a decision, one must consider not only its immediate effect but also its impact on long-term returns. The MDP is a very widely applied decision process.

Traditional UAV search algorithms, such as the scan-line search algorithm, focus only on the design of the algorithm itself and lack a rigorous mathematical definition of the search target's motion law, which carries the major drawback of high search cost.
Summary of the invention

To overcome the deficiencies of the prior art, the purpose of the present invention is to provide a UAV search method and device for Markov moving targets, aiming to solve the problem that existing traditional algorithms, when applied to UAV search tasks, either cannot meet the track planning requirements of moving-target search or incur a high search cost.

The purpose of the present invention is achieved by the following technical solutions:

A UAV search method for a Markov moving target, comprising:

a target step: after a search task is received, constructing a probability model of the Markov moving target, thereby obtaining all possible states, and their probability distributions, arising during the motion of the Markov moving target;

a UAV step: obtaining all possible states, and their probability distributions, arising during the UAV search process;

a construction step: according to all possible states and their probability distributions during the UAV search process and during the motion of the Markov moving target, constructing a Markov model for predicting UAV behavior under the search task and establishing a multi-stage heuristic policy iteration algorithm based on Markov decision processes;

a planning step: using the multi-stage heuristic policy iteration algorithm based on Markov decision processes to obtain the search behavior policy with the greatest payoff, thereby planning the optimal search track of the UAV.
On the basis of the above embodiment, preferably, the target step is specifically:

after the search task is received, rasterizing the target's motion area into a grid, modeling the Markov moving target with probability theory, and obtaining the probability model of the Markov moving target, thereby obtaining all possible states and their probability distributions during the motion of the Markov moving target; the target's motion area coincides with the search area of the UAV.
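As an illustration of how such a gridded Markov target model can be propagated, the sketch below is a minimal example assuming a 4-connected grid and a hypothetical stay-or-move random walk (the grid size and the p_stay value are illustrative, not taken from the patent). Stepping the belief vector through the transition matrix yields existence probability distributions of the kind shown in Figs. 2a-2d.

```python
import numpy as np

def make_transition_matrix(rows, cols, p_stay=0.2):
    """Hypothetical Markov target: stays in its cell with probability p_stay,
    otherwise moves uniformly to one of its 4-connected neighbor cells."""
    n = rows * cols
    P = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            P[i, i] = p_stay
            for nr, nc in nbrs:
                P[i, nr * cols + nc] = (1.0 - p_stay) / len(nbrs)
    return P

def propagate(belief, P, steps=1):
    """Existence probability distribution after `steps` moves: b_{t+1} = b_t P."""
    for _ in range(steps):
        belief = belief @ P
    return belief

# Target starts in the center cell of a 10x10 grid (cf. Fig. 2a);
# after 5 moves the belief spreads out as in Fig. 2c.
rows = cols = 10
b0 = np.zeros(rows * cols)
b0[(rows // 2) * cols + cols // 2] = 1.0
b5 = propagate(b0, make_transition_matrix(rows, cols), steps=5)
```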
On the basis of any of the above embodiments, preferably, the UAV step is specifically:

encoding and describing the flight behavior of the UAV and describing the states the UAV can occupy while executing the search task, thereby obtaining all possible states and their probability distributions during the UAV search process.
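The patent does not fix a concrete encoding; one plausible minimal encoding of the flight behavior and search state, given purely for illustration, is:

```python
from enum import Enum
from typing import NamedTuple

class Action(Enum):
    """Illustrative encoding of the UAV's state-changing flight maneuvers."""
    FORWARD = 0
    TURN_LEFT = 1
    TURN_RIGHT = 2
    HOVER = 3

class UAVState(NamedTuple):
    """Illustrative UAV search state: grid cell plus heading (0=N, 1=E, 2=S, 3=W)."""
    row: int
    col: int
    heading: int
```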
On the basis of any of the above embodiments, preferably, constructing the Markov model for predicting UAV behavior under the search task is specifically:

let T = {1, 2, 3, ...} be the set of epochs at which the UAV search task is in progress;

let S = (s_1, s_2, s_3, ...) be the discrete state space of the UAV, containing all possible states during the UAV search process and during the motion of the Markov moving target;

let A = {a_1, a_2, ..., a_x, ..., a_q} be the action space of the UAV, representing all possible state-changing actions of the UAV, where the element a_x denotes the x-th action and q is the number of elements in the action space;

let A(s_n) = {a_1(s_n), a_2(s_n), a_3(s_n), ...} be the set of feasible actions of the UAV in state s_n, i.e., the set of all actions the UAV may take in that state;

let T(s_n, a_x(s_n), s_m) denote the set of all state transition probabilities of the UAV, any element p(s_m | s_n, a_x(s_n)) of which is the probability that the state changes to s_m after the available action a_x(s_n) is executed in state s_n, with Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) = 1;

let any element r(s_n, a_x(s_n)) of the reward set R(s_n) denote the reward for executing action a_x(s_n) in state s_n;

then the Markov model for predicting the UAV's search behavior under any search task is:

MDP = {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} → π(s_n);

where π is the policy, a mapping from the state set to the action set, π(s_n) denotes the UAV's mapping from state s_n to the action set, and → denotes outputting the optimal policy.
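Collecting these definitions, the tuple {S, A, T, R} can be carried around as a plain container; the sketch below is one possible layout (all names are illustrative, not from the patent), with the feasible-action set, transition probability, and reward supplied as callables:

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List

State = Hashable    # e.g. a (row, col, heading) tuple
ActionT = Hashable  # e.g. a member of a small maneuver enum

@dataclass
class SearchMDP:
    """The model {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} as a data container."""
    states: List[State]
    actions: Callable[[State], List[ActionT]]             # A(s_n): feasible actions
    transition: Callable[[State, ActionT, State], float]  # p(s_m | s_n, a_x(s_n))
    reward: Callable[[State, ActionT], float]             # r(s_n, a_x(s_n))
    gamma: float = 0.9                                    # discount factor, 0 < gamma < 1
```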
On the basis of the above embodiment, preferably, the planning step comprises:

a calculation step: calculating the reward utility function with the MDP discount model, where the discount factor γ satisfies 0 < γ < 1; the reward function of the discount model represents the discounted expected total reward of the UAV after following policy π from state s_n starting at time t = 0;

according to the optimality equations of the MDP discount model, establishing the optimal state-value function equation for the payoff of the UAV's search operations in the search task in state s_n,

V*(s_n) = max_{a_x ∈ A(s_n)} [ r(s_n, a_x(s_n)) + γ Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) V*(s_m) ],

the optimal action-value function equation,

Q*(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + γ Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) V*(s_m),

and, from these two optimal function equations, the optimal search policy function equation

π*(s_n) = argmax_{a_x ∈ A(s_n)} Q*(s_n, a_x(s_n));

a given step: rasterizing the cooperative search area into a grid and determining the discrete state space S of the MDP model; given the number g of UAVs participating in the search task (g and i are positive integers), s_UAV(i) ∈ S is the current state of the i-th UAV, A(s_UAV(i)) is the action set of the i-th UAV in state s_UAV(i), and K_i is the maximum search step count of the i-th UAV; given the discount factor γ and the termination condition ε of the policy iteration, set the iteration count b = 0;

an initial step: determining the initial position of the target's motion and the position at which each UAV begins its search; each UAV obtains the existence probability distribution of the target over the whole area from the target's initial position and the heuristic information about the target's motion, thereby determining the virtual target position it will search at the next moment;

an iteration step: each UAV, according to its own virtual target position and its current state s_UAV(i), iteratively computes its state-value function V_{b+1}(s_UAV(i)) and sets the iteration count b = b + 1;

a judgment step: if ||V_{b+1}(s_UAV(i)) − V_b(s_UAV(i))|| < ε, ending the iteration and entering the traversal step; otherwise, returning to the iteration step;

a traversal step: each UAV, using the finally obtained state-value function V_{b+1}(s_UAV(i)), traverses A(s_UAV(i)) to obtain Q(s_UAV(i), a_i) and finally obtains the search behavior policy with the greatest payoff, π_i(t+1)*;

a transfer step: executing action a_i according to the obtained optimal policy π_i(t+1)*, whereby the state transfers from s_UAV(i) to a new state; at the same time the UAV obtains the immediate reward r_i(s_UAV(i), a_i); then set t = t + 1 and set the search step count of the i-th UAV to k_i = k_i + 1;

an end step: if at some time t the position s_UAV(i) of the i-th UAV coincides with the target's current simulated position s_target, the i-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step count k_i exceeds K_i for every UAV i = 1, 2, ..., g, the search task fails and the algorithm ends; the target's current simulated position is the position at which the target is currently most likely to be, obtained from all possible states and their probability distributions during the motion of the Markov moving target. A code sketch of the iteration, judgment, and traversal steps follows.
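A minimal sketch of those three steps, assuming the hypothetical SearchMDP container from the earlier sketch: value iteration runs until the sup-norm test ||V_{b+1} − V_b|| < ε passes, then A(s) is traversed greedily to extract the action of maximal Q.

```python
def plan_next_action(mdp, s_now, eps=1e-4):
    """Value-iterate V until ||V_{b+1} - V_b|| < eps (iteration and judgment
    steps), then traverse A(s_now) greedily (traversal step)."""
    V = {s: 0.0 for s in mdp.states}
    while True:
        V_new = {
            s: max(
                mdp.reward(s, a)
                + mdp.gamma * sum(mdp.transition(s, a, s2) * V[s2]
                                  for s2 in mdp.states)
                for a in mdp.actions(s)
            )
            for s in mdp.states
        }
        converged = max(abs(V_new[s] - V[s]) for s in mdp.states) < eps
        V = V_new
        if converged:
            break
    def q(a):  # optimal action value Q(s_now, a) under the converged V
        return mdp.reward(s_now, a) + mdp.gamma * sum(
            mdp.transition(s_now, a, s2) * V[s2] for s2 in mdp.states)
    return max(mdp.actions(s_now), key=q)
```

In the full method this routine would be re-run by each UAV at every decision epoch against its current virtual target position, which matches the per-epoch re-planning described in the steps above.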
A UAV search device for a Markov moving target, comprising:

a target module, configured to construct, after a search task is received, a probability model of the Markov moving target, thereby obtaining all possible states, and their probability distributions, arising during the motion of the Markov moving target;

a UAV module, configured to obtain all possible states, and their probability distributions, arising during the UAV search process;

a construction module, configured to construct, according to all possible states and their probability distributions during the UAV search process and during the motion of the Markov moving target, a Markov model for predicting UAV behavior under the search task, and to establish a multi-stage heuristic policy iteration algorithm based on Markov decision processes;

a planning module, configured to use the multi-stage heuristic policy iteration algorithm based on Markov decision processes to obtain the search behavior policy with the greatest payoff, thereby planning the optimal search track of the UAV.

On the basis of the above embodiment, preferably, the target module is configured to:

after the search task is received, rasterize the target's motion area into a grid, model the Markov moving target with probability theory, and obtain the probability model of the Markov moving target, thereby obtaining all possible states and their probability distributions during the motion of the Markov moving target; the target's motion area coincides with the search area of the UAV.

On the basis of any of the above embodiments, preferably, the UAV module is configured to:

encode and describe the flight behavior of the UAV and describe the states the UAV can occupy while executing the search task, thereby obtaining all possible states and their probability distributions during the UAV search process.
On the basis of any of the above embodiments, preferably, constructing the Markov model for predicting UAV behavior under the search task is specifically:

let T = {1, 2, 3, ...} be the set of epochs at which the UAV search task is in progress;

let S = (s_1, s_2, s_3, ...) be the discrete state space of the UAV, containing all possible states during the UAV search process and during the motion of the Markov moving target;

let A = {a_1, a_2, ..., a_x, ..., a_q} be the action space of the UAV, representing all possible state-changing actions of the UAV, where the element a_x denotes the x-th action and q is the number of elements in the action space;

let A(s_n) = {a_1(s_n), a_2(s_n), a_3(s_n), ...} be the set of feasible actions of the UAV in state s_n, i.e., the set of all actions the UAV may take in that state;

let T(s_n, a_x(s_n), s_m) denote the set of all state transition probabilities of the UAV, any element p(s_m | s_n, a_x(s_n)) of which is the probability that the state changes to s_m after the available action a_x(s_n) is executed in state s_n, where n, m = 1, 2, 3, ... and Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) = 1;

let any element r(s_n, a_x(s_n)) of the reward set R(s_n) denote the reward for executing action a_x(s_n) in state s_n;

then the Markov model for predicting the UAV's search behavior under any search task is:

MDP = {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} → π(s_n);

where π is the policy, a mapping from the state set to the action set, π(s_n) denotes the UAV's mapping from state s_n to the action set, and → denotes outputting the optimal policy.
On the basis of the above embodiment, preferably, the planning module comprises:

a calculation unit, configured to: calculate the reward utility function with the MDP discount model, where the discount factor γ satisfies 0 < γ < 1; the reward function of the discount model represents the discounted expected total reward of the UAV after following policy π from state s_n starting at time t = 0;

according to the optimality equations of the MDP discount model, establish the optimal state-value function equation for the payoff of the UAV's search operations in the search task in state s_n,

V*(s_n) = max_{a_x ∈ A(s_n)} [ r(s_n, a_x(s_n)) + γ Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) V*(s_m) ],

the optimal action-value function equation,

Q*(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + γ Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) V*(s_m),

and, from these two optimal function equations, the optimal search policy function equation

π*(s_n) = argmax_{a_x ∈ A(s_n)} Q*(s_n, a_x(s_n));

a given unit, configured to: rasterize the cooperative search area into a grid and determine the discrete state space S of the MDP model; given the number g of UAVs participating in the search task, s_UAV(i) ∈ S is the current state of the i-th UAV, A(s_UAV(i)) is the action set of the i-th UAV in state s_UAV(i), and K_i is the maximum search step count of the i-th UAV; given the discount factor γ and the termination condition ε of the policy iteration, set the iteration count b = 0;

an initial unit, configured to: determine the initial position of the target's motion and the position at which each UAV begins its search; each UAV obtains the existence probability distribution of the target over the whole area from the target's initial position and the heuristic information about the target's motion, thereby determining the virtual target position it will search at the next moment;

an iteration unit, configured to: have each UAV, according to its own virtual target position and its current state s_UAV(i), iteratively compute its state-value function V_{b+1}(s_UAV(i)) and set the iteration count b = b + 1;

a judgment unit, configured to: if ||V_{b+1}(s_UAV(i)) − V_b(s_UAV(i))|| < ε, end the iteration and invoke the traversal unit; otherwise, invoke the iteration unit;

a traversal unit, configured to: have each UAV, using the finally obtained state-value function V_{b+1}(s_UAV(i)), traverse A(s_UAV(i)) to obtain Q(s_UAV(i), a_i) and finally obtain the search behavior policy with the greatest payoff, π_i(t+1)*;

a transfer unit, configured to: execute action a_i according to the obtained optimal policy π_i(t+1)*, whereby the state transfers from s_UAV(i) to a new state; at the same time the UAV obtains the immediate reward r_i(s_UAV(i), a_i); then set t = t + 1 and set the search step count of the i-th UAV to k_i = k_i + 1;

an end unit, configured to: if at some time t the position s_UAV(i) of the i-th UAV coincides with the target's current simulated position s_target, determine that the i-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step count k_i exceeds K_i for every UAV i = 1, 2, ..., g, the search task fails and the algorithm ends; the target's current simulated position is the position at which the target is currently most likely to be, obtained from all possible states and their probability distributions during the motion of the Markov moving target.
Compared with the prior art, the beneficial effects of the present invention are:

The invention discloses a UAV search method and device for a Markov moving target. After a search task is received, a probability model of the Markov moving target is constructed to obtain all possible states, and their probability distributions, arising during the motion of the Markov moving target; all possible states and their probability distributions during the UAV search process are obtained; a Markov model for predicting UAV behavior under the search task is constructed and a multi-stage heuristic policy iteration algorithm based on Markov decision processes is established; this algorithm is then used to obtain the search behavior policy with the greatest payoff, thereby planning the optimal search track of the UAV.

The invention can use the MDP discount model to determine the set of state transition probabilities of the UAV under the search task and the set of rewards for flight behavior operations during the transitions, calculate the reward utility function, establish the optimality equations for the UAV's behavior payoff under the search task, and perform iterative computation and judgment to obtain the search behavior policy with the greatest payoff and plan the optimal search track of the UAV.

The invention overcomes the major drawback of traditional search algorithms, such as the scan-line search algorithm, which focus only on the design of the algorithm itself, lack a rigorous mathematical definition of the search target's motion law, and therefore incur a high search cost. From the UAV's current flight state and the current existence probability distribution of the Markov moving target over the search area, the position of the virtual target the UAV will search at the next moment is determined and a sequence of UAV search behavior operations is obtained; this sequence is the search behavior policy with the greatest behavioral payoff. The method can be applied to searching for moving targets and can find the target at a comparatively low search cost.
Brief description of the drawings

The present invention is further described below with reference to the accompanying drawings and embodiments.

Fig. 1 shows a schematic diagram of the grid division of a search area according to an embodiment of the present invention;

Figs. 2a to 2d show the existence probability distributions of a Markov moving target at different times according to an embodiment of the present invention;

Fig. 3 shows a schematic diagram of the theoretical framework of the MDP-MSHPI algorithm according to an embodiment of the present invention;

Fig. 4 shows a flowchart of a single UAV searching for a Markov target with the MSHPI algorithm according to an embodiment of the present invention;

Fig. 5 shows a flowchart of multiple UAVs searching for a Markov target with the MSHPI algorithm according to an embodiment of the present invention;

Fig. 6 shows a schematic structural diagram of a UAV search device for a Markov moving target according to an embodiment of the present invention.
Detailed description of the embodiments

The present invention is further described below with reference to the accompanying drawings and specific embodiments. It should be noted that, provided they do not conflict, the embodiments and technical features described below may be combined arbitrarily to form new embodiments.

Embodiment 1

An embodiment of the present invention provides a UAV search method for a Markov moving target, comprising:

a target step: after a search task is received, constructing a probability model of the Markov moving target, thereby obtaining all possible states, and their probability distributions, arising during the motion of the Markov moving target;

a UAV step: obtaining all possible states, and their probability distributions, arising during the UAV search process;

a construction step: according to all possible states and their probability distributions during the UAV search process and during the motion of the Markov moving target, constructing a Markov model for predicting UAV behavior under the search task and establishing a multi-stage heuristic policy iteration algorithm based on Markov decision processes (Multi-Stage Heuristic Policy Iteration algorithm, MDP-MSHPI);

a planning step: using the multi-stage heuristic policy iteration algorithm based on Markov decision processes to obtain the search behavior policy with the greatest payoff, thereby planning the optimal search track of the UAV. In the embodiment of the present invention, the number of UAVs may be one, two, or more.

The invention overcomes the major drawback of traditional search algorithms, such as the scan-line search algorithm, which focus only on the design of the algorithm itself, lack a rigorous mathematical definition of the search target's motion law, and therefore incur a high search cost. From the UAV's current flight state and the current existence probability distribution of the Markov moving target over the search area, the position of the virtual target the UAV will search at the next moment is determined and a sequence of UAV search behavior operations is obtained; this sequence is the search behavior policy with the greatest behavioral payoff. The method can be applied to searching for moving targets and can find the target at a comparatively low search cost.
As shown in Figs. 1 and 2, preferably, the target step may be specifically: after the search task is received, rasterizing the target's motion area into a grid, modeling the Markov moving target with probability theory, and obtaining the probability model of the Markov moving target, thereby obtaining all possible states and their probability distributions during the motion of the Markov moving target; the target's motion area coincides with the search area of the UAV. The benefit of this is that the motion law of the moving target can be simulated and all possible states, and their probability distributions, during the motion of the Markov moving target can be obtained; for example, the most likely position of the moving target at a given moment can be determined.

Preferably, the UAV step may be specifically: encoding and describing the flight behavior of the UAV and describing the states the UAV can occupy while executing the search task, thereby obtaining all possible states and their probability distributions during the UAV search process. The benefit of this is that the states the UAV may occupy during the search, and their probability distributions, can be obtained.
Preferably, the step of constructing the Markov model for predicting UAV behavior under the search task may be specifically:

let T = {1, 2, 3, ...} be the set of epochs at which the UAV search task is in progress;

let S = (s_1, s_2, s_3, ...) be the discrete state space of the UAV, containing all possible states during the UAV search process and during the motion of the Markov moving target;

let A = {a_1, a_2, ..., a_x, ..., a_q} be the action space of the UAV, representing all possible state-changing actions of the UAV, where the element a_x denotes the x-th action and q is the number of elements in the action space;

let A(s_n) = {a_1(s_n), a_2(s_n), a_3(s_n), ...} be the set of feasible actions of the UAV in state s_n, i.e., the set of all actions the UAV may take in that state;

let T(s_n, a_x(s_n), s_m) denote the set of all state transition probabilities of the UAV, any element p(s_m | s_n, a_x(s_n)) of which is the probability that the state changes to s_m after the available action a_x(s_n) is executed in state s_n, with Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) = 1;

let any element r(s_n, a_x(s_n)) of the reward set R(s_n) denote the reward for executing action a_x(s_n) in state s_n;

then the Markov model for predicting the UAV's search behavior under any search task is:

MDP = {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} → π(s_n);

where π is the policy, a mapping from the state set to the action set, π(s_n) denotes the UAV's mapping from state s_n to the action set, and → denotes outputting the optimal policy.
Preferably, the planning step may comprise:

a calculation step: calculating the reward utility function with the MDP discount model, where the discount factor γ satisfies 0 < γ < 1; the reward function of the discount model represents the discounted expected total reward of the UAV after following policy π from state s_n starting at time t = 0;

according to the optimality equations of the MDP discount model, establishing the optimal state-value function equation for the payoff of the UAV's search operations in the search task in state s_n,

V*(s_n) = max_{a_x ∈ A(s_n)} [ r(s_n, a_x(s_n)) + γ Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) V*(s_m) ],

the optimal action-value function equation,

Q*(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + γ Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) V*(s_m),

and, from these two optimal function equations, the optimal search policy function equation

π*(s_n) = argmax_{a_x ∈ A(s_n)} Q*(s_n, a_x(s_n));

a given step: rasterizing the cooperative search area into a grid and determining the discrete state space S of the MDP model; given the number g of UAVs participating in the search task (g and i are positive integers), s_UAV(i) ∈ S is the current state of the i-th UAV, A(s_UAV(i)) is the action set of the i-th UAV in state s_UAV(i), and K_i is the maximum search step count of the i-th UAV; given the discount factor γ and the termination condition ε of the policy iteration, set the iteration count b = 0;

an initial step: determining the initial position of the target's motion and the position at which each UAV begins its search; each UAV obtains the existence probability distribution of the target over the whole area from the target's initial position and the heuristic information about the target's motion, thereby determining the virtual target position it will search at the next moment;

an iteration step: each UAV, according to its own virtual target position and its current state s_UAV(i), iteratively computes its state-value function V_{b+1}(s_UAV(i)) and sets the iteration count b = b + 1;

a judgment step: if ||V_{b+1}(s_UAV(i)) − V_b(s_UAV(i))|| < ε, ending the iteration and entering the traversal step; otherwise, returning to the iteration step;

a traversal step: each UAV, using the finally obtained state-value function V_{b+1}(s_UAV(i)), traverses A(s_UAV(i)) to obtain Q(s_UAV(i), a_i) and finally obtains the search behavior policy with the greatest payoff, π_i(t+1)*;

a transfer step: executing action a_i according to the obtained optimal policy π_i(t+1)*, whereby the state transfers from s_UAV(i) to a new state; at the same time the UAV obtains the immediate reward r_i(s_UAV(i), a_i); then set t = t + 1 and set the search step count of the i-th UAV to k_i = k_i + 1;

an end step: if at some time t the position s_UAV(i) of the i-th UAV coincides with the target's current simulated position s_target, the i-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step count k_i exceeds K_i for every UAV i = 1, 2, ..., g, the search task fails and the algorithm ends; the target's current simulated position is the position at which the target is currently most likely to be, obtained from all possible states and their probability distributions during the motion of the Markov moving target.

The benefit of this is that the MDP discount model can be used to determine the set of state transition probabilities of the UAV under the search task and the set of rewards for flight behavior operations during the transitions, to calculate the reward utility function, to establish the optimality equations for the UAV's behavior payoff under the search task, and to perform iterative computation and judgment so as to obtain the search behavior policy with the greatest payoff and plan the optimal search track of the UAV.
An application scenario of the present invention may be as follows:

1. Rasterize the target's motion area (which is also the UAV's search area) into a grid; the divided area model is shown in Fig. 1. Model the Markov moving target with probability theory to obtain its probability model; the existence probability of the Markov moving target over the motion area after a certain number of moves is shown in Figs. 2a to 2d, where Fig. 2a is the initial moment, Fig. 2b is after one move, Fig. 2c is after 5 moves, and Fig. 2d is after 10 moves.

The theoretical framework of the MDP-MSHPI algorithm is shown in Fig. 3.

2. Encode and describe the flight behavior of the UAV, and describe the states the UAV can occupy while executing the search task.
3. Construct the Markov model for predicting UAV behavior under the search task; in this step the following need to be defined in advance:

Definition 1: let T = {1, 2, 3, ...} denote the set of epochs;

Definition 2: let S = (s_1, s_2, s_3, ...); this state space contains all possible states during the UAV search process and during the motion of the Markov moving target;

Definition 3: let A(s_n) = {a_1(s_n), a_2(s_n), a_3(s_n), ...} denote the set of all actions the UAV may take in state s_n;

Definition 4: let T(s_n, a_x(s_n), s_m) denote the set of all state transition probabilities of the UAV, whose element p(s_m | s_n, a_x(s_n)) is the probability that the UAV's state changes to s_m after the available action a_x(s_n) is executed in state s_n, and assume Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) = 1;

Definition 5: let R(s_n) denote the reward set, in which any element r(s_n, a_x(s_n)) is the reward for executing action a_x(s_n) in state s_n;

Definition 6: let A = {a_1, a_2, ..., a_q}, where A denotes the action space of the UAV and contains all possible actions that change the UAV's state; the element a_x denotes the x-th action, q is the number of elements in the action space, and x = 1, 2, 3, ..., q.

The Markov model for predicting the UAV's search behavior under any search task is given as follows:

MDP = {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} → π(s_n);

where π is the policy, a mapping from the state set to the action set, and π(s_n) denotes the UAV's mapping from state s_n to the action set.
4. Determine the UAV's set of state transition probabilities T(s_n, a_x(s_n), s_m) under this search task, and determine the reward set R(s_n) for the state transition process according to the requirements of the search task.

According to the current state s_n of the UAV, establish the optimal state-value function equation for the payoff of search operations under the UAV search task.

The present invention uses the MDP discount model when calculating the reward utility function: once a policy is determined and executed, the decision maker obtains a certain reward at each epoch in T with a certain probability, and accumulating these rewards gives the utility function of the system. The discount factor of the discount model is denoted by γ, with 0 < γ < 1, and the utility function of the MDP discount model is expressed as

V^π(s_n) = E^π [ Σ_{t=0}^{∞} γ^t r(s_t, a_t) | s_0 = s_n ],

which represents the discounted expected total reward of the UAV after following policy π from state s_n starting at time t = 0; this utility function is a bounded function.

According to the optimality equations of the MDP discount model, establish the optimal state-value function equation for the payoff of search operations while the UAV executes this search task in state s_n,

V*(s_n) = max_{a_x ∈ A(s_n)} [ r(s_n, a_x(s_n)) + γ Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) V*(s_m) ],

the optimal action-value function equation,

Q*(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + γ Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) V*(s_m),

and, from these two optimal function models, the optimal search policy function equation

π*(s_n) = argmax_{a_x ∈ A(s_n)} Q*(s_n, a_x(s_n)).
5. Initialize the parameters: given the number g of UAVs participating in the search task, s_UAV(i) ∈ S is the current state of the i-th UAV, A(s_UAV(i)) is the action set of the i-th UAV in state s_UAV(i), and K_i is the maximum search step count of the i-th UAV; given the discount factor γ and the termination condition ε of the policy iteration, set the iteration count b = 0, where i is a positive integer.

6. Determine the initial position of the target's motion and the position at which each UAV begins its search; each UAV obtains the existence probability distribution of the target over the whole area from the target's initial position and the heuristic information about the target's motion, thereby determining the virtual target position it will search at the next moment.

7. Each UAV, according to its own virtual target position and its current state s_UAV(i), obtains V_{b+1}(s_UAV(i)) by evaluating the optimal state-value function equation for the payoff of the search operation, and sets the iteration count b = b + 1.

8. If ||V_{b+1}(s_UAV(i)) − V_b(s_UAV(i))|| < ε, end the iteration and proceed to the next step; otherwise, return to step 7.

9. Each UAV, using the finally obtained state-value function V_{b+1}(s_UAV(i)), traverses A(s_UAV(i)) to obtain Q(s_UAV(i), a_i) and finally obtains the behavior policy with the greatest payoff, π_i(t+1)*. Following the above steps, each UAV can find the optimal search behavior policy for the next moment, as shown in Fig. 4.

10. Execute action a_i according to the obtained optimal policy π_i(t+1)*, whereby the state transfers from s_UAV(i) to a new state; at the same time the UAV obtains the immediate reward r_i(s_UAV(i), a_i); set t = t + 1 and the search step count of the i-th UAV to k_i = k_i + 1.

11. If at some time t the position s_UAV(i) of the i-th UAV coincides with the true position s_target of the target, the i-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step count k_i exceeds K_i for every UAV i = 1, 2, ..., g, the search fails and the algorithm ends. The block diagram of multiple UAVs cooperatively searching for a Markov moving target is shown in Fig. 5, and a schematic code sketch of this cooperative loop is given below.
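The sketch below strings steps 5 to 11 together for g UAVs, mirroring Fig. 5. Everything here is an assumption layered on the earlier sketches: `planner` is the hypothetical plan_next_action routine, `step_fn(s, a)` applies an action to a UAV state, `cell_fn(s)` maps a UAV state to its grid-cell index, and the target belief is a NumPy row vector stepped through the transition matrix P.

```python
def cooperative_search(mdps, start_states, belief, P, K, planner, step_fn, cell_fn):
    """Schematic multi-UAV search loop (cf. Fig. 5). Assumed helpers:
    planner(mdp, s) returns the greedy next action (e.g. plan_next_action),
    step_fn(s, a) applies an action to a UAV state, and cell_fn(s) maps a
    UAV state to its grid-cell index; belief is a NumPy row vector."""
    states = list(start_states)      # s_UAV(i) for i = 1..g
    steps = [0] * len(states)        # search step counters k_i
    t = 0
    while any(steps[i] <= K[i] for i in range(len(states))):
        belief = belief @ P                  # target belief advances one move
        s_target = int(belief.argmax())      # target's current simulated cell
        for i, mdp in enumerate(mdps):
            if steps[i] > K[i]:
                continue                     # this UAV's search budget is spent
            a = planner(mdp, states[i])      # pi_i(t+1)*: best next action
            states[i] = step_fn(states[i], a)   # transfer step: execute a_i
            steps[i] += 1                    # k_i = k_i + 1
            if cell_fn(states[i]) == s_target:  # end step: target found
                return i, t
        t += 1
    return None, t                           # all budgets exhausted: failure
```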
Embodiment 1 above provides a UAV search method for a Markov moving target; correspondingly, the present application also provides a UAV search device for a Markov moving target. Since the device embodiment is essentially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of the method embodiment. The device embodiment described below is merely illustrative.

Embodiment 2

As shown in Fig. 6, an embodiment of the present invention provides a UAV search device for a Markov moving target, comprising:

a target module 201, configured to construct, after a search task is received, a probability model of the Markov moving target, thereby obtaining all possible states, and their probability distributions, arising during the motion of the Markov moving target;

a UAV module 202, configured to obtain all possible states, and their probability distributions, arising during the UAV search process;

a construction module 203, configured to construct, according to all possible states and their probability distributions during the UAV search process and during the motion of the Markov moving target, a Markov model for predicting UAV behavior under the search task, and to establish a multi-stage heuristic policy iteration algorithm based on Markov decision processes;

a planning module 204, configured to use the multi-stage heuristic policy iteration algorithm based on Markov decision processes to obtain the search behavior policy with the greatest payoff, thereby planning the optimal search track of the UAV.

The embodiment of the present invention overcomes the major drawback of traditional search algorithms, such as the scan-line search algorithm, which focus only on the design of the algorithm itself, lack a rigorous mathematical definition of the search target's motion law, and therefore incur a high search cost. From the UAV's current flight state and the current existence probability distribution of the Markov moving target over the search area, the position of the virtual target the UAV will search at the next moment is determined and a sequence of UAV search behavior operations is obtained; this sequence is the search behavior policy with the greatest behavioral payoff. The device can be applied to searching for moving targets and can find the target at a comparatively low search cost.
Preferably, the target module 201 may be configured to: after the search task is received, rasterize the target's motion area into a grid, model the Markov moving target with probability theory, and obtain the probability model of the Markov moving target, thereby obtaining all possible states and their probability distributions during the motion of the Markov moving target; the target's motion area coincides with the search area of the UAV.

Preferably, the UAV module 202 may be configured to: encode and describe the flight behavior of the UAV and describe the states the UAV can occupy while executing the search task, thereby obtaining all possible states and their probability distributions during the UAV search process.

Preferably, constructing the Markov model for predicting UAV behavior under the search task may be specifically:

let T = {1, 2, 3, ...} be the set of epochs at which the UAV search task is in progress;

let S = (s_1, s_2, s_3, ...) be the discrete state space of the UAV, containing all possible states during the UAV search process and during the motion of the Markov moving target;

let A = {a_1, a_2, ..., a_x, ..., a_q} be the action space of the UAV, representing all possible state-changing actions of the UAV, where the element a_x denotes the x-th action and q is the number of elements in the action space;

let A(s_n) = {a_1(s_n), a_2(s_n), a_3(s_n), ...} be the set of feasible actions of the UAV in state s_n, i.e., the set of all actions the UAV may take in that state;

let T(s_n, a_x(s_n), s_m) denote the set of all state transition probabilities of the UAV, any element p(s_m | s_n, a_x(s_n)) of which is the probability that the state changes to s_m after the available action a_x(s_n) is executed in state s_n, with Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) = 1;

let any element r(s_n, a_x(s_n)) of the reward set R(s_n) denote the reward for executing action a_x(s_n) in state s_n;

then the Markov model for predicting the UAV's search behavior under any search task is:

MDP = {S, A, T(s_n, a_x(s_n), s_m), R(s_n)} → π(s_n);

where π is the policy, a mapping from the state set to the action set, π(s_n) denotes the UAV's mapping from state s_n to the action set, and → denotes outputting the optimal policy.
Preferably, the planning module 204 may comprise:

a calculation unit, configured to: calculate the reward utility function with the MDP discount model, where the discount factor γ satisfies 0 < γ < 1; the reward function of the discount model represents the discounted expected total reward of the UAV after following policy π from state s_n starting at time t = 0;

according to the optimality equations of the MDP discount model, establish the optimal state-value function equation for the payoff of the UAV's search operations in the search task in state s_n,

V*(s_n) = max_{a_x ∈ A(s_n)} [ r(s_n, a_x(s_n)) + γ Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) V*(s_m) ],

the optimal action-value function equation,

Q*(s_n, a_x(s_n)) = r(s_n, a_x(s_n)) + γ Σ_{s_m ∈ S} p(s_m | s_n, a_x(s_n)) V*(s_m),

and, from these two optimal function equations, the optimal search policy function equation

π*(s_n) = argmax_{a_x ∈ A(s_n)} Q*(s_n, a_x(s_n));

a given unit, configured to: rasterize the cooperative search area into a grid and determine the discrete state space S of the MDP model; given the number g of UAVs participating in the search task, s_UAV(i) ∈ S is the current state of the i-th UAV, A(s_UAV(i)) is the action set of the i-th UAV in state s_UAV(i), and K_i is the maximum search step count of the i-th UAV; given the discount factor γ and the termination condition ε of the policy iteration, set the iteration count b = 0;

an initial unit, configured to: determine the initial position of the target's motion and the position at which each UAV begins its search; each UAV obtains the existence probability distribution of the target over the whole area from the target's initial position and the heuristic information about the target's motion, thereby determining the virtual target position it will search at the next moment;

an iteration unit, configured to: have each UAV, according to its own virtual target position and its current state s_UAV(i), iteratively compute its state-value function V_{b+1}(s_UAV(i)) and set the iteration count b = b + 1;

a judgment unit, configured to: if ||V_{b+1}(s_UAV(i)) − V_b(s_UAV(i))|| < ε, end the iteration and invoke the traversal unit; otherwise, invoke the iteration unit;

a traversal unit, configured to: have each UAV, using the finally obtained state-value function V_{b+1}(s_UAV(i)), traverse A(s_UAV(i)) to obtain Q(s_UAV(i), a_i) and finally obtain the search behavior policy with the greatest payoff, π_i(t+1)*;

a transfer unit, configured to: execute action a_i according to the obtained optimal policy π_i(t+1)*, whereby the state transfers from s_UAV(i) to a new state; at the same time the UAV obtains the immediate reward r_i(s_UAV(i), a_i); then set t = t + 1 and set the search step count of the i-th UAV to k_i = k_i + 1;

an end unit, configured to: if at some time t the position s_UAV(i) of the i-th UAV coincides with the target's current simulated position s_target, determine that the i-th UAV has successfully found the target, the search task is complete, and the algorithm ends; if the search step count k_i exceeds K_i for every UAV i = 1, 2, ..., g, the search task fails and the algorithm ends; the target's current simulated position is the position at which the target is currently most likely to be, obtained from all possible states and their probability distributions during the motion of the Markov moving target.
The present invention has been described above from the standpoints of purpose of use, effectiveness, improvement, and novelty; its practical improvements satisfy the functional-enhancement and usability requirements emphasized by patent law. The foregoing description and drawings are merely preferred embodiments of the invention and do not limit it. Accordingly, any construction, device, or feature approximating or identical to those of the invention, that is, any equivalent replacement or modification made within the scope of this patent application, shall fall within the scope of protection of this patent application.
It should be noted that, where no conflict arises, the embodiments of the present invention and the features within them may be combined with one another. Although the invention has been described to a certain extent, it is apparent that appropriate changes to the various conditions may be made without departing from its spirit and scope. It is to be understood that the invention is not limited to the described embodiments but is defined by the scope of the claims, which includes equivalents of each element described. Those skilled in the art may make various other corresponding changes and variations according to the technical solutions and concepts described above, and all such changes and variations shall fall within the protection scope of the claims of the invention.
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810779927.8A CN108594858B (en) | 2018-07-16 | 2018-07-16 | Unmanned aerial vehicle searching method and device for Markov moving target |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810779927.8A CN108594858B (en) | 2018-07-16 | 2018-07-16 | Unmanned aerial vehicle searching method and device for Markov moving target |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN108594858A CN108594858A (en) | 2018-09-28 |
| CN108594858B true CN108594858B (en) | 2020-10-27 |
Family ID=63617662
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810779927.8A Active CN108594858B (en) | 2018-07-16 | 2018-07-16 | Unmanned aerial vehicle searching method and device for Markov moving target |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN108594858B (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11579611B1 (en) | 2020-03-30 | 2023-02-14 | Amazon Technologies, Inc. | Predicting localized population densities for generating flight routes |
| US11640764B1 (en) * | 2020-06-01 | 2023-05-02 | Amazon Technologies, Inc. | Optimal occupancy distribution for route or path planning |
| US11868145B1 (en) | 2019-09-27 | 2024-01-09 | Amazon Technologies, Inc. | Selecting safe flight routes based on localized population densities and ground conditions |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109409592B (en) * | 2018-10-15 | 2021-08-24 | 浙江工业大学 | Optimal strategy solution for mobile robots in dynamic environments |
| CN109270833A (en) * | 2018-10-23 | 2019-01-25 | 大连海事大学 | Variable-discourse-domain fuzzy control method based on brushless direct current motor Q learning |
| CN109521447B (en) * | 2018-11-16 | 2022-10-14 | 福州大学 | Missing target searching method based on greedy strategy |
| WO2021043387A1 (en) * | 2019-09-03 | 2021-03-11 | Huawei Technologies Co., Ltd. | Large-scale policy evaluation in multi-agent systems |
| CN112824998A (en) * | 2019-11-20 | 2021-05-21 | 南京航空航天大学 | Multi-unmanned aerial vehicle collaborative route planning method and device in Markov decision process |
| CN110883776B (en) * | 2019-11-29 | 2021-04-23 | 河南大学 | A Robot Path Planning Algorithm Based on Improved DQN Based on Fast Search Mechanism |
| CN111107602B (en) * | 2019-12-24 | 2021-07-27 | 杭州电子科技大学 | A Secure Routing Method with Minimum Energy Consumption and Delay Weighting for Wireless Body Area Networks |
| CN112382134B (en) * | 2020-04-26 | 2021-07-30 | 北京三快在线科技有限公司 | Method, apparatus, storage medium and electronic device for generating flight path |
| CN113242556B (en) * | 2021-06-04 | 2022-08-23 | 重庆邮电大学 | Unmanned aerial vehicle resource dynamic deployment method based on differentiated services |
| CN114115285B (en) * | 2021-11-29 | 2024-07-19 | 大连海事大学 | Multi-agent emotion target path planning method and device |
| CN114722946B (en) * | 2022-04-12 | 2022-12-20 | 中国人民解放军国防科技大学 | Synthesis of Asynchronous Action and Cooperative Strategy for Unmanned Aerial Vehicle Based on Probabilistic Model Checking |
| CN117333759A (en) * | 2023-09-14 | 2024-01-02 | 成都飞机工业(集团)有限责任公司 | Target searching method based on probability prediction |
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR101299134B1 (en) * | 2012-03-19 | 2013-08-22 | 한국과학기술연구원 | Searching system and method for moving targets in dynamic environment by cooperation of plural robots |
| CN102799179A (en) * | 2012-07-06 | 2012-11-28 | 山东大学 | Path Planning Algorithm for Mobile Robots Based on Single Chain Sequential Backtracking Q-Learning |
| CN103381826A (en) * | 2013-07-31 | 2013-11-06 | 中国人民解放军国防科学技术大学 | Adaptive Cruise Control Method Based on Approximate Policy Iteration |
| CN105425820A (en) * | 2016-01-05 | 2016-03-23 | 合肥工业大学 | Unmanned aerial vehicle cooperative search method for moving object with perception capability |
| CN106919181A (en) * | 2016-10-20 | 2017-07-04 | 湖南大学 | A kind of unmanned plane barrier-avoiding method |
Non-Patent Citations (2)
| Title |
|---|
| Research on sequential planning problems based on partially observable Markov decision processes; Liu Feng; China Doctoral Dissertations Full-text Database, Basic Sciences; 2015-11-15; pp. 11-15 * |
| Heuristic search for underwater moving targets based on Markov processes; Wu Fang et al.; Journal of Electronics & Information Technology; 2010-05-15; pp. 1088-1093 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN108594858A (en) | 2018-09-28 |
Similar Documents
| Publication | Title |
|---|---|
| CN108594858B (en) | Unmanned aerial vehicle searching method and device for Markov moving target |
| CN113110592B (en) | Unmanned aerial vehicle obstacle avoidance and path planning method |
| Wu et al. | Multi UAV cluster control method based on virtual core in improved artificial potential field |
| Konev et al. | MotionCNN: A strong baseline for motion prediction in autonomous driving |
| Zhu et al. | Deep reinforcement learning for real-time assembly planning in robot-based prefabricated construction |
| CN111047087A (en) | Intelligent optimization method and device for path under cooperation of unmanned aerial vehicle and vehicle |
| Addanki et al. | Placeto: Efficient progressive device placement optimization |
| CN118393900B (en) | Automatic driving decision control method, device, system, equipment and storage medium |
| CN118393973B (en) | Automatic driving control method, device, system, equipment and storage medium |
| Guillen-Perez et al. | Learning from oracle demonstrations—a new approach to develop autonomous intersection management control algorithms based on multiagent deep reinforcement learning |
| CN115034459A (en) | Pedestrian trajectory time sequence prediction method |
| CN117406762A (en) | A UAV remote control algorithm based on segmented reinforcement learning |
| CN119399570A (en) | A method for predicting intelligent agent actions based on multi-scale spatial perception |
| Xiang et al. | Socialcvae: Predicting pedestrian trajectory via interaction conditioned latents |
| CN116362109A (en) | Intelligent unmanned system and method based on digital twinning |
| CN114818124A (en) | Virtual-real fusion grid rudder model parameter optimization method based on DPPO |
| Guan et al. | Ab-mapper: Attention and bicnet based multi-agent path finding for dynamic crowded environment |
| Soliman et al. | Aireye: Uav-based intelligent drl mobile target visitation |
| Nguyen et al. | Apprenticeship bootstrapping |
| CN117522078A (en) | Migrant mission planning method and system under unmanned system cluster environment coupling |
| CN117192998A (en) | Unmanned aerial vehicle autonomous decision-making method and device based on state prediction of Transformer neural network |
| CN115293334A (en) | Model-based unmanned equipment control method for high sample rate deep reinforcement learning |
| Tian et al. | The application of path planning algorithm based on deep reinforcement learning for mobile robots |
| CN114676909A (en) | A charging path planning method for unmanned vehicles based on deep reinforcement learning |
| Sandström | On the efficiency of transfer learning in a fighter pilot behavior modelling context |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |