CN114676909A - A charging path planning method for unmanned vehicles based on deep reinforcement learning - Google Patents
A charging path planning method for unmanned vehicles based on deep reinforcement learning
- Publication number
- CN114676909A (application CN202210302308.6A)
- Authority
- CN
- China
- Prior art keywords
- network
- node
- model
- gapn
- charging
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q10/047—Optimisation of routes or paths, e.g. travelling salesman problem
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Human Resources & Organizations (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- Data Mining & Analysis (AREA)
- Strategic Management (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Business, Economics & Management (AREA)
- General Engineering & Computer Science (AREA)
- Tourism & Hospitality (AREA)
- Artificial Intelligence (AREA)
- Marketing (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Primary Health Care (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Operations Research (AREA)
- Water Supply & Treatment (AREA)
- Public Health (AREA)
- Quality & Reliability (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Development Economics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
Description
Technical Field
The invention belongs to the field of machine-learning applications and specifically relates to model building and training, with the quality of the model reflected in how well it solves the practical problem; in particular, it relates to a charging path planning method for unmanned charging vehicles based on deep reinforcement learning.
Background Art
In most wireless sensor networks, each sensor node is deployed at a fixed location and is powered by a battery or by ambient energy sources such as wind or solar power. Such energy supplies are uncontrollable and unstable, so the working efficiency and stability of the wireless sensor network suffer greatly and the lifetime of the network is sharply reduced.
To solve the energy-supply problem of traditional wireless sensor networks, schemes that use mobile devices to replenish the power of sensor nodes have become a research hotspot. This charging approach is based on the idea of mobile nodes: introducing mobile nodes improves the flexibility of the network and offers new approaches to many tasks in sensor networks, such as node power replenishment, data collection, target tracking, and node localization, which greatly improves the fault tolerance of the network and prolongs its lifetime. For a charging vehicle to select the sensor nodes that need energy sensibly, balance the energy consumption across nodes, carry out charging tasks efficiently, and prevent nodes in the network from "dying" of power exhaustion, the charging vehicle must be scheduled effectively and its route planned reasonably.
Path planning is one of the classic combinatorial optimization problems. In recent years, research has shown that deep learning and reinforcement learning have unique advantages in solving such problems. Models based on deep reinforcement learning typically map the problem into a high-dimensional space, where they discover and learn the relationships among the variables, and, combined with a graph neural network's handling of graph structure, obtain high-quality solutions in ways that traditional operations-research methods cannot. How to design the network architecture and loss function and how to optimize the training process are therefore the main technical difficulties in applying deep reinforcement learning to combinatorial optimization.
Summary of the Invention
The technical problem to be solved by the present invention is to design a reasonable node-charging route plan for the charging vehicle under constraints such as node positions, power-consumption rates, charging rate, and node battery capacity, so as to prevent the "death" of nodes in the wireless sensor network caused by power exhaustion while minimizing the path cost incurred by the unmanned vehicle, thereby extending the lifetime of the wireless sensor network and improving its availability.
The technical scheme of the present invention is as follows:
A charging path planning method for unmanned charging vehicles based on deep reinforcement learning comprises the following steps:
Step (1): collect the information of the nodes S = {s_0, s_1, s_2, ..., s_n} in the wireless sensor network, including each node's position and urgency;
Step (2): build the deep-reinforcement-learning model GAPN, comprising a point encoder, a graph encoder, a decoder, and an attention mechanism module;
Step (3): based on the collected node information, generate a training dataset and a test dataset for training the GAPN model;
Step (4): set the training objectives to minimizing the node death rate and minimizing the charging vehicle's path cost, define the loss function, and train GAPN with the reinforcement-learning actor-critic algorithm to obtain the final model;
Step (5): input the node information S = {s_0, s_1, s_2, ..., s_n} of the wireless sensor network into the GAPN model; the model returns a plan π = {s_0, s'_1, s'_2, ..., s'_n} in which the charging vehicle visits all nodes, where π is a reordering of S;
Step (6): combine a heuristic clustering method with GAPN to solve path planning in scenarios with multiple charging vehicles, i.e., a scheme for dispatching the charging vehicles and planning their routes when the wireless sensor network has several of them;
Step (7): modify the graph-topology embedding module of the model and exploit the model's generalization ability to solve large-scale traveling salesman problems, i.e., train the model on small-scale traveling-salesman datasets and then apply it to large-scale instances.
As a further preference of the present invention, the actor-critic algorithm used in step (4) is specifically:
Step 4.1: build an actor network and a critic network on the basis of the GAPN architecture, where θ denotes the actor network's parameters and θ_v the critic network's. The two networks differ in decoding: the actor samples from the output probability distribution, while the critic decodes greedily;
Step 4.2: given a training batch X of B instances, apply the current actor network to each instance x_i to obtain the final path cost L(π_i | x_i), and apply the critic network to obtain its baseline estimate b_{θ_v}(x_i);
Step 4.3: use the values obtained in step 4.2 to compute the batch policy gradients of the two networks. The actor's policy gradient is $g_\theta = \frac{1}{B}\sum_{i=1}^{B}\big(L(\pi_i|x_i)-b_{\theta_v}(x_i)\big)\nabla_\theta \log p_\theta(\pi_i|x_i)$, and the critic's gradient is $g_{\theta_v} = \frac{1}{B}\sum_{i=1}^{B}\nabla_{\theta_v}\big(b_{\theta_v}(x_i)-L(\pi_i|x_i)\big)^2$;
Step 4.4: update the parameters of the two networks by gradient descent, with update rules θ ← θ − lr·g_θ and θ_v ← θ_v − lr·g_{θ_v}, where lr is the learning rate;
Step 4.5: repeat steps 4.2 to 4.4 for T iterations, where T is the number of training epochs set before training (a code sketch follows this list).
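A minimal PyTorch sketch of steps 4.1 to 4.5 is given below. It assumes actor(x) returns the cost L(π|x) and the summed log-probability of a sampled tour, and critic(x) returns the baseline estimate b_{θ_v}(x); these signatures, and the use of plain SGD for the gradient-descent updates, are illustrative assumptions rather than details taken from the patent.

```python
import torch

def train(actor, critic, dataset, T, lr=1e-4):
    opt_a = torch.optim.SGD(actor.parameters(), lr=lr)   # theta updates
    opt_c = torch.optim.SGD(critic.parameters(), lr=lr)  # theta_v updates
    for epoch in range(T):                               # step 4.5: T epochs
        for x in dataset:                                # a batch of B instances
            cost, log_p = actor(x)                       # L(pi_i|x_i), sampled decoding
            baseline = critic(x)                         # b_{theta_v}(x_i), greedy
            advantage = (cost - baseline).detach()       # step 4.2
            actor_loss = (advantage * log_p).mean()      # step 4.3: g_theta
            critic_loss = (baseline - cost.detach()).pow(2).mean()  # g_theta_v
            opt_a.zero_grad(); actor_loss.backward(); opt_a.step()  # step 4.4
            opt_c.zero_grad(); critic_loss.backward(); opt_c.step()
```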
In addition, the present invention proposes a scheme that exploits model generalization to solve large-scale traveling salesman problems. For large-scale instances, say 500, 1000, or more nodes, training a model on datasets of the same size is very time-consuming. The invention therefore exploits the generalization ability of graph neural networks and modifies the way the whole point set is encoded: the adjacency matrix of the point set becomes the input of the graph neural network, so that the model can be trained on datasets of small-scale instances and then applied to large-scale scenarios.
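As an illustration of this modified input, the sketch below builds a row-normalised adjacency matrix from node coordinates; the k-nearest-neighbour sparsification is an assumption, since the text only specifies that the adjacency matrix of the point set is fed to the graph neural network.

```python
import torch

def knn_adjacency(coords: torch.Tensor, k: int = 10) -> torch.Tensor:
    """coords: (n, 2) node positions -> (n, n) row-normalised adjacency."""
    dist = torch.cdist(coords, coords)                      # pairwise distances
    idx = dist.topk(k + 1, largest=False).indices[:, 1:]    # k nearest, skip self
    adj = torch.zeros_like(dist)
    adj.scatter_(1, idx, 1.0)                               # mark k-NN edges
    return adj / k                                          # row-normalise

```

Because such a layer operates on an adjacency matrix of whatever size it is given, a model trained on, say, 50-node instances can be evaluated directly on 500- or 1000-node instances.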
As a further preference, the present invention proposes a path-planning solution for scenarios with multiple charging vehicles, as shown in Fig. 2, i.e., a scheme for dispatching the charging vehicles and planning their routes when the wireless sensor network has several of them. Suppose m charging vehicles are available to handle the charging of the node set S; the steps are as follows.
The heuristic clustering algorithm used in step (6) is specifically: (1) use the heuristic clustering algorithm to divide the node set S into m subsets, each assigned to one of the m charging vehicles for charging.
(2) Process each of the m task subsets with the trained GAPN model to obtain the charging route of each of the m vehicles.
Step 6.1: define m subsets R = {S_1, S_2, ..., S_m} to hold the nodes each charging vehicle will visit, and define e(S_j) as the sum of the urgency of all nodes in S_j;
Step 6.2: sort the input node set S by the nodes' initial urgency e_i^0 in descending order;
Step 6.3: place the first m nodes of S into the m subsets S_1, S_2, ..., S_m, one per subset;
Step 6.4: sort the collection R in increasing order of e(S_j) and traverse the remaining nodes of S. For the currently visited node s_i, if S_j contains no node s_k whose opening-time difference R_{i,k} with s_i is greater than the travel time Γ_{i,k}, or whose deadline difference D_{i,k} is smaller than the travel time Γ_{i,k}, place s_i into S_j; otherwise try another subset until the condition is satisfied;
Step 6.5: obtain the final m subsets R = {S_1, S_2, ..., S_m}, to be executed by the m charging vehicles (see the sketch after this list).
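A minimal sketch of steps 6.1 to 6.5 follows. It assumes each node object carries an initial urgency e0 and window bounds r and d, and that travel_time(a, b) returns Γ between two nodes; these names, and the fallback used when no subset qualifies, are illustrative assumptions.

```python
def partition(nodes, m, travel_time):
    nodes = sorted(nodes, key=lambda s: s.e0, reverse=True)      # step 6.2
    subsets = [[s] for s in nodes[:m]]                           # step 6.3
    for s in nodes[m:]:                                          # step 6.4
        # try subsets in increasing order of total urgency e(S_j)
        ordered = sorted(subsets, key=lambda S: sum(n.e0 for n in S))
        for sub in ordered:
            # accept s only if no member k violates the window/travel test
            if all(abs(s.r - k.r) <= travel_time(s, k) and
                   abs(s.d - k.d) >= travel_time(s, k) for k in sub):
                sub.append(s)
                break
        else:
            ordered[0].append(s)     # no subset qualified (assumed fallback)
    return subsets                                               # step 6.5
```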
The beneficial effects of the present invention are:
In the scenario of charging the nodes of a wireless sensor network, a good charging schedule is crucial for the mobile charging vehicle that replenishes node power. Taking into account constraints such as the nodes' power-consumption rates, the vehicle's charging rate, and the path cost, the present invention provides a decision model based on deep reinforcement learning that produces better charging routes for mobile charging vehicles, thereby extending the lifetime of the wireless sensor network.
Description of the Drawings
Fig. 1 is a schematic diagram of the proposed deep learning model solving the path planning problem.
Fig. 2 is a schematic diagram of the proposed solution to the multi-vehicle path planning problem.
Fig. 3 is a schematic diagram of the framework when the invention is applied to a real system.
Fig. 4 shows the mobile-agent charging vehicle to which the invention is applied in practice.
Fig. 5 is a line drawing of the mobile-agent charging vehicle designed in Fig. 4.
Fig. 6 is a flow chart of the steps of the invention.
Detailed Description
The present invention is further clarified below with reference to the accompanying drawings and specific embodiments. It should be understood that the following embodiments only illustrate the invention and are not intended to limit its scope. The words "front", "rear", "left", "right", "upper" and "lower" used in the following description refer to directions in the drawings, and the words "inner" and "outer" refer to directions toward and away from the geometric center of a particular part, respectively.
As shown in Figs. 1 and 6, this embodiment applies the deep learning model to the path planning problem: the input contains the data of all nodes, and the model's encoding and decoding process returns the final route plan. The embodiment specifically comprises the following steps:
(1) Define the node set of the wireless sensor network as S = {s_0, s_1, s_2, ..., s_n}, where s_0 is the base station, i.e., the point from which the charging vehicle departs and to which it finally returns. Every node s_i other than the base station is represented by a four-dimensional vector s_i = {c_i, [r_i, d_i], T_i(t), e_i(t)}, where c_i = [x_i, y_i] is the two-dimensional coordinate of s_i; [r_i, d_i] is the time window of s_i, with r_i the opening time and d_i the deadline, and s_i can only be charged within this window; e_i(t) and T_i(t) denote the urgency of s_i at time t and the charging time it requires, respectively; the larger e_i(t), the lower the battery of s_i and the longer it takes to charge.
(2) The charging vehicle moves through the network at a constant speed v, and the travel time from node s_i to s_j is Γ_{i,j} = L_{i,j} / v, where L_{i,j} is the distance between s_i and s_j.
(3) The urgency e_i(t) of node s_i varies with time t as e_i(t) = e_i^0 + f(t), where e_i^0 is the initial urgency of s_i and f(t) is an increasing term; e_i(t) has an upper bound e_max, and e_i(t) ≤ e_max always holds. The time window [r_i, d_i] of node s_i depends on the node's initial urgency and power-consumption rate; the deadline d_i of s_i is then the time at which e_i(t) would reach e_max (see the sketch below).
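A sketch of the node model in (1) to (3) is given below. The linear growth f(t) = α·t, the value of e_max, and the deadline formula are assumptions; the text only requires f(t) to be increasing and e_i(t) to stay below e_max.

```python
import math
from dataclasses import dataclass

E_MAX = 10.0                      # urgency cap e_max (illustrative value)

@dataclass
class Node:
    x: float                      # coordinate c_i = [x_i, y_i]
    y: float
    r: float                      # time-window opening r_i
    d: float                      # time-window deadline d_i
    e0: float                     # initial urgency e_i^0
    alpha: float                  # growth rate of f(t) (power-consumption rate)

def urgency(node: Node, t: float) -> float:
    """e_i(t) = e_i^0 + f(t), capped at e_max."""
    return min(node.e0 + node.alpha * t, E_MAX)

def travel_time(a: Node, b: Node, v: float = 1.0) -> float:
    """Gamma_{i,j} = L_{i,j} / v for a vehicle moving at constant speed v."""
    return math.hypot(a.x - b.x, a.y - b.y) / v

def deadline(node: Node) -> float:
    """d_i: the time at which e_i(t) would reach e_max (assumed form)."""
    return (E_MAX - node.e0) / node.alpha
```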
(4) The charging vehicle incurs a penalty if it visits node s_i outside its time window. The penalty is a function of the arrival time τ_i, the time at which the charging vehicle reaches s_i, scaled by a constant σ that depends on the scale of the problem.
(5) A route plan π produced by the model is evaluated by a function that trades the total path cost off against the time-window penalties, where β is a negative constant whose magnitude depends on the relative sizes of the path cost and the node-death cost. One plausible form of both the penalty and the score is sketched below.
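The sketch below gives one plausible form of the window penalty in (4) and the plan score in (5), reusing the Node helpers above. The linear penalty shape and the way β combines the two cost terms are assumptions; only τ_i, σ, and β (a negative constant) come from the text.

```python
def window_penalty(node: Node, tau: float, sigma: float) -> float:
    """Penalty for reaching node s_i at time tau outside [r_i, d_i]."""
    return sigma * max(node.r - tau, 0.0, tau - node.d)

def plan_score(route: list, beta: float, sigma: float, v: float = 1.0) -> float:
    """Score of a plan pi: beta-weighted path cost minus window penalties."""
    t, dist, pen = 0.0, 0.0, 0.0
    for a, b in zip(route, route[1:]):
        leg = travel_time(a, b, v)
        t += leg                            # arrival time at b
        dist += leg * v                     # distance travelled on this leg
        pen += window_penalty(b, t, sigma)
    return beta * dist - pen                # beta < 0: longer paths score lower
```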
(6) Build the deep reinforcement learning model GAPN. GAPN uses an encoder-decoder framework. The encoder encodes every node s_i, splitting s_i = {c_i, [r_i, d_i], T_i(t), e_i(t)} into a static part g_i = {c_i, [r_i, d_i]} and a dynamic part δ_i(t) = {T_i(t), e_i(t)}, each of which is encoded separately to obtain the encoded vectors.
(7) The two parts are embedded by relu-activated linear transformations, where relu is a nonlinear activation function and W_1, W_2 and b are trainable parameters that are learned during training.
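A PyTorch sketch of the point encoder in (6) and (7): the static and dynamic parts are embedded separately by relu-activated linear layers. The hidden size and the exact arrangement of W_1, W_2 and b are assumptions.

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.static = nn.Linear(4, hidden)    # W_1: g_i = (x_i, y_i, r_i, d_i)
        self.dynamic = nn.Linear(2, hidden)   # W_2: delta_i(t) = (T_i(t), e_i(t))

    def forward(self, g: torch.Tensor, delta: torch.Tensor):
        # returns the encoded vectors of the static and dynamic features
        return torch.relu(self.static(g)), torch.relu(self.dynamic(delta))
```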
(8) The graph neural network in GAPN encodes the whole node set S of the wireless sensor network. The encoding network has N layers, each of which updates every node's embedding by combining it with an aggregate of its neighbours' embeddings (one such layer is sketched below).
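The layer equation itself did not survive in the text, so the sketch below shows one plausible form of such a layer, in the spirit of graph pointer networks: each layer mixes a node's own transformed embedding with a relu-activated aggregate of its neighbours. The mixing weight γ, the aggregation, and the dimensions are assumptions.

```python
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    def __init__(self, hidden: int = 128, gamma: float = 0.5):
        super().__init__()
        self.theta = nn.Linear(hidden, hidden)   # transform of the node itself
        self.phi = nn.Linear(hidden, hidden)     # transform of the aggregate
        self.gamma = gamma

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (n, hidden) node embeddings; adj: (n, n) row-normalised adjacency
        neighbour = torch.relu(self.phi(adj @ h))           # aggregate neighbours
        return self.gamma * self.theta(h) + (1 - self.gamma) * neighbour
```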
(9) Based on the attention mechanism, the decoder computes a weight for each hidden-layer vector produced by the encoder, and the node with the highest weight is the next point to be visited. In the weight computation, o_i denotes the context encoding on which the decoding is conditioned, which contains the information encoded by the graph neural network, and q_i denotes the encoding vector of node s_i (a sketch follows).
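Since the weight formula was lost, the sketch below uses the standard additive (tanh) scoring of pointer networks, which matches the described inputs q_i and o_i; the exact scoring function in the patent is an assumption.

```python
import torch
import torch.nn as nn

class Pointer(nn.Module):
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.wq = nn.Linear(hidden, hidden, bias=False)   # projects node encodings
        self.wo = nn.Linear(hidden, hidden, bias=False)   # projects the context
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, q: torch.Tensor, o: torch.Tensor, visited: torch.Tensor):
        # q: (n, hidden) node encodings q_i; o: (hidden,) context o_i;
        # visited: (n,) bool mask of already-visited nodes
        scores = self.v(torch.tanh(self.wq(q) + self.wo(o))).squeeze(-1)
        scores = scores.masked_fill(visited, float('-inf'))
        return torch.softmax(scores, dim=-1)              # visiting probabilities
```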
(10) After the complete neural network model has been obtained, train it with the actor-critic method from reinforcement learning to obtain the final GAPN model.
(11) Input the node set of the problem to be solved into the GAPN model in the form of four-dimensional features; the model returns the order in which all nodes are finally visited, i.e., the route plan of the charging vehicle.
Fig. 3 shows the system structure when the model system is deployed to a cluster of agents (mobile unmanned vehicles). The central base station processes the collected node information through the GAPN module, returns the corresponding task instructions, and sends them to the mobile agents; each mobile agent uses the control instructions to drive its motion controller and execute the corresponding task. Specifically, each agent, shown in Figs. 4 and 5, runs an agent program. The agent program is responsible for communicating with the agent's motion controller, sending it control instructions and collecting and pre-processing the data sensed by the agent; it also makes task decisions and performs the corresponding operations according to the task instructions sent from the base-station center.
The technical means disclosed by the present invention are not limited to those disclosed in the above embodiments, but also include technical solutions formed by any combination of the above technical features.
Claims (3)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210302308.6A CN114676909B (en) | 2022-03-25 | 2022-03-25 | Unmanned vehicle charging path planning method based on deep reinforcement learning |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210302308.6A CN114676909B (en) | 2022-03-25 | 2022-03-25 | Unmanned vehicle charging path planning method based on deep reinforcement learning |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114676909A true CN114676909A (en) | 2022-06-28 |
| CN114676909B CN114676909B (en) | 2024-04-09 |
Family
ID=82076143
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210302308.6A Active CN114676909B (en) | 2022-03-25 | 2022-03-25 | Unmanned vehicle charging path planning method based on deep reinforcement learning |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114676909B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115759915A (en) * | 2022-10-21 | 2023-03-07 | 东南大学 | Multi-constraint Vehicle Path Planning Method Based on Attention Mechanism and Deep Reinforcement Learning |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2021007812A1 (en) * | 2019-07-17 | 2021-01-21 | 深圳大学 | Deep neural network hyperparameter optimization method, electronic device and storage medium |
| CN112788560A (en) * | 2020-12-18 | 2021-05-11 | 昆明理工大学 | Space-time charging scheduling method based on deep reinforcement learning |
| CN113159432A (en) * | 2021-04-28 | 2021-07-23 | 杭州电子科技大学 | Multi-agent path planning method based on deep reinforcement learning |
| WO2021248607A1 (en) * | 2020-06-10 | 2021-12-16 | 深圳大学 | Deep reinforcement learning-based taxi dispatching method and system |
- 2022-03-25: application CN202210302308.6A filed in China; granted as CN114676909B (status: Active)
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2021007812A1 (en) * | 2019-07-17 | 2021-01-21 | 深圳大学 | Deep neural network hyperparameter optimization method, electronic device and storage medium |
| WO2021248607A1 (en) * | 2020-06-10 | 2021-12-16 | 深圳大学 | Deep reinforcement learning-based taxi dispatching method and system |
| CN112788560A (en) * | 2020-12-18 | 2021-05-11 | 昆明理工大学 | Space-time charging scheduling method based on deep reinforcement learning |
| CN113159432A (en) * | 2021-04-28 | 2021-07-23 | 杭州电子科技大学 | Multi-agent path planning method based on deep reinforcement learning |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115759915A (en) * | 2022-10-21 | 2023-03-07 | 东南大学 | Multi-constraint Vehicle Path Planning Method Based on Attention Mechanism and Deep Reinforcement Learning |
Also Published As
| Publication number | Publication date |
|---|---|
| CN114676909B (en) | 2024-04-09 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Mao et al. | A comparison of deep reinforcement learning models for isolated traffic signal control | |
| CN112902969B (en) | A path planning method for unmanned aerial vehicles in the process of data collection | |
| CN108594858B (en) | Unmanned aerial vehicle searching method and device for Markov moving target | |
| CN113223305B (en) | Multi-intersection traffic light control method, system and storage medium based on reinforcement learning | |
| CN110341690A (en) | A PHEV Energy Management Method Based on Deterministic Policy Gradient Learning | |
| CN118759846A (en) | Multi-agent deep reinforcement learning path planning method based on improved A* heuristic | |
| CN111596668B (en) | Mobile robot anthropomorphic path planning method based on reverse reinforcement learning | |
| CN110488842A (en) | A kind of track of vehicle prediction technique based on two-way kernel ridge regression | |
| CN111752304B (en) | UAV data collection method and related equipment | |
| CN117744994B (en) | Patrol unmanned aerial vehicle-aircraft nest distribution scheduling method based on goblet sea squirt algorithm | |
| CN109862532A (en) | Layout optimization method and system of multi-sensor nodes for rail transit condition monitoring | |
| CN115187056A (en) | A Multi-Agent Cooperative Resource Allocation Method Considering the Principle of Fairness | |
| Dong et al. | Double ant colony algorithm based on dynamic feedback for energy-saving route planning for ships | |
| Wang et al. | Cooperative motion planning for persistent 3d visual coverage with multiple quadrotor uavs | |
| CN119047673A (en) | Heterogeneous system cooperative delivery path optimization method under constraint of road network and energy consumption | |
| CN115759199A (en) | Multi-robot environment exploration method and system based on hierarchical graph neural network | |
| CN115907248A (en) | Multi-robot unknown environment path planning method based on geometric neural network | |
| Sun et al. | Leveraging digital twin and DRL for collaborative context offloading in C-V2X autonomous driving | |
| CN117610763A (en) | Electric automobile charging planning method, device, computer equipment and storage medium | |
| Li et al. | Adaptive multi-agent deep mixed reinforcement learning for traffic light control | |
| CN118444646A (en) | AGV scheduling method on topological graph based on self-attention mechanism reinforcement learning | |
| CN116739466A (en) | Vehicle path planning method for distribution center based on multi-agent deep reinforcement learning | |
| Han et al. | Dynamic collaborative charging algorithm for mobile and static nodes in Industrial Internet of Things | |
| Zhu et al. | Adaptive Broad Deep Reinforcement Learning for Intelligent Traffic Light Control | |
| CN114676909B (en) | Unmanned vehicle charging path planning method based on deep reinforcement learning |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |