
CN115618497B - A method for airfoil optimization design based on deep reinforcement learning - Google Patents

A method for airfoil optimization design based on deep reinforcement learning

Info

Publication number
CN115618497B
CN115618497B CN202211374735.1A
Authority
CN
China
Prior art keywords
airfoil
reward
value
model
optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211374735.1A
Other languages
Chinese (zh)
Other versions
CN115618497A (en)
Inventor
屈峰
段少凯
孙迪
惠心雨
白俊强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202211374735.1A priority Critical patent/CN115618497B/en
Publication of CN115618497A publication Critical patent/CN115618497A/en
Application granted granted Critical
Publication of CN115618497B publication Critical patent/CN115618497B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/10Geometric CAD
    • G06F30/15Vehicle, aircraft or watercraft design
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/28Design optimisation, verification or simulation using fluid dynamics, e.g. using Navier-Stokes equations or computational fluid dynamics [CFD]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00Details relating to CAD techniques
    • G06F2111/04Constraint-based CAD
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00Details relating to CAD techniques
    • G06F2111/10Numerical modelling
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00Details relating to the application field
    • G06F2113/08Fluids
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14Force analysis or force optimisation, e.g. static or dynamic forces
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Fluid Mechanics (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Algebra (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Automation & Control Theory (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Computational Mathematics (AREA)
  • Structures Of Non-Positive Displacement Pumps (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides an airfoil optimization design method based on deep reinforcement learning. Unlike supervised learning, the method has the characteristics of autonomously learning a strategy and maximizing the long-term reward; it is a more intelligent optimization approach, and its strategy is transferable and general. If the design conditions, such as the freestream Mach number and the Reynolds number, change within a certain range, the original optimization strategy can still provide an initial optimization direction, and the optimization objective can be noticeably improved within a small number of steps.

Description

Airfoil profile optimization design method based on deep reinforcement learning
Technical Field
The invention belongs to the field of aircraft design, and provides an airfoil optimal design method based on deep reinforcement learning.
Background
As a main component of an aircraft, the airfoil not only provides the lift required for flight but also ensures the stability and controllability of the aircraft. As understanding of airfoil aerodynamic performance has deepened, numerous airfoil libraries have been built. At present there is no single "super airfoil" that satisfies all design states; instead, a reference airfoil that meets the requirements is first selected from an airfoil library according to the design purpose, operating conditions and other factors, and is then continuously improved in combination with the design requirements. Early improvement of a reference airfoil relied on repeated wind-tunnel experiments, which consume enormous manpower and resources.
Since the beginning of the 21st century, advances in science and technology have injected new vitality into computational fluid dynamics (CFD), which quickly became the primary means of analyzing and solving fluid problems. Using CFD greatly shortens the optimization cycle and saves a large amount of manpower and material resources, allowing a designer to perform repeated computations until the desired airfoil is obtained; evolutionary and genetic algorithms, for example, are widely applied to aerodynamic optimization problems. However, evolutionary and genetic algorithms make poor use of the large amount of computational data generated during the optimization process.
The application of machine learning in fluid dynamics has developed rapidly in recent years. The most common machine-learning approach for fast performance prediction in optimization design is currently based on response surfaces: existing data are used to construct a mapping between inputs and outputs in order to accelerate the optimization process. This partially replaces the role of CFD in the optimization loop and belongs to supervised learning. Unlike supervised learning, reinforcement learning builds a mapping from the state parameters of the environment to action parameters, and its purpose is to iteratively update a strategy that interacts with the environment so as to obtain the maximum cumulative reward. What is learned is not the actual output to be predicted, as in supervised learning, but the reward corresponding to a given action. The airfoil design problem is not the pursuit of the optimum of a single performance metric in isolation; it is a complex systems-engineering task in which overall performance is obtained after repeated trade-off comparisons, and manual modification is often needed in combination with the experience and understanding of engineers. The training process of reinforcement learning is quite similar to the process by which engineers accumulate experience. Analogous to traditional trial-and-error methods, the agent responsible for acting in reinforcement learning updates its action-selection strategy by continuously taking different "actions" during training and observing the return of the design results over a period of time in the future (or after taking a series of actions). As the return increases, the agent can be considered to have acquired, to some extent, the same design experience as an engineer. In airfoil optimization, Li Runze used reinforcement learning to optimize the airfoil pressure distribution and reduce the drag of a transonic airfoil, and Viquerat performed optimization attempts with and without constraints. Reinforcement learning is still at an early stage for airfoil optimization, and there is a possibility that the agent cannot find the right direction and falls into a local optimum. In general, applying reinforcement learning to airfoil optimization in the aerodynamic optimization design of aircraft can effectively improve optimization efficiency and has broad application prospects.
Disclosure of Invention
Because the optimization efficiency of current CFD-based methods is low, the invention provides an airfoil optimization design method based on deep reinforcement learning. Unlike supervised learning, the method autonomously learns a strategy that maximizes the long-term reward; it is a more intelligent optimization method, and its strategy is transferable and general.
The technical scheme of the invention is as follows:
The airfoil optimization design method based on deep reinforcement learning comprises the following steps:
Step 1: perform the geometric parameterization of the airfoil with the free-form deformation (FFD) method, i.e. establish a free-form deformation control frame around the reference airfoil, establish the mapping relationship between the control frame and the airfoil, and obtain a new airfoil by changing the positions of the control frame points.
Step 2: establish the optimization design model and, according to the flight requirements, confirm a single design objective and the constraint conditions. The design objective and constraints are aerodynamic parameters of the airfoil, such as the lift coefficient, drag coefficient and airfoil thickness, and are expressed by mathematical expressions. A general single-objective optimization problem can be written in the following mathematical form:
Minimize: f(x)
subject to: g_w(x) ≥ 0, w = 1, 2, ···, W
h_r(x) = 0, r = 1, 2, ···, R
where x is the design variable, f(x) is the objective function, g_w(x) are the inequality constraints (W in total) and h_r(x) are the equality constraints (R in total).
Step 3: establish the reward function according to the optimization objective and the constraint conditions. The total reward value is the linear sum of the reward values of the individual aerodynamic parameters: achieving the objective increases the reward value, satisfying a constraint does not increase the reward value, and violating a constraint decreases the reward value. The objective and constraint reward values are multiplied by different coefficients to balance the difference in magnitude between the objective and the constraints.
Step 4: establish the agent, which comprises a policy model π and a value function model. The policy model outputs the action policy, and the value function model outputs the advantage estimate and the value function. Both the policy model and the value function model are artificial neural networks with two hidden layers of 64 nodes each. Initialize the policy model parameters θ_0 and the value function model parameters. The aerodynamic parameters of the airfoil related to the design objective and constraints are taken as the state, including the lift coefficient, drag coefficient, maximum thickness and moment coefficient of the airfoil, and the aerodynamic parameters of the reference airfoil are taken as the initial state s_0.
Step 5: the policy model of the current agent gives an action a, i.e. a new set of design variables, according to the state and the reward value.
Step 6: apply the action to the airfoil to obtain a new airfoil.
Step 7: build a structured grid around the new airfoil and perform a numerical simulation of the flow over the airfoil with the open-source solver CFL3D; the governing equations are the Reynolds-averaged N-S equations and the turbulence model is the k-ω SST model. The lift coefficient, drag coefficient, maximum thickness and moment coefficient of the airfoil are obtained from the computation, and these aerodynamic parameters of the new airfoil are taken as the new state.
Step 8: compute the reward value from the calculated aerodynamic parameters according to the reward function in step 3.
Step 9: with the current policy model, repeat steps 5-8 a total of e−1 times to obtain a trajectory containing the state and action of each cycle and the reward values {r_e}. The trajectory is τ = {s_0, a_0, ···, s_{e−1}, a_{e−1}, s_e}, where s_0 and a_0 are the initial state and action, s_e is the state at step e, s_{e−1} is the state at step e−1, and a_{e−1} is the action at step e−1.
Step 10: based on the current policy model, repeat steps 5-9 a total of n−1 times to obtain n trajectories and their reward values.
Step 11: from the obtained n trajectories and reward values, compute the advantage estimate based on the value function model of the current agent, i.e. the difference between the expected reward of each action a and the average expected reward of all possible actions in that state.
Step 12: construct the loss function from the advantage estimates, the trajectories and the reward values, and optimize the policy model parameters θ and the value function model parameters with a stochastic gradient descent algorithm, the optimization objective being the minimization of the loss function; update the policy model and the value function model with the optimized parameters to obtain a new policy model and a new value function model.
Step 13: repeat steps 5-12 in a loop until the loss function no longer decreases; training is then complete.
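To make the interaction loop of steps 5 to 10 concrete, a minimal Python sketch of the trajectory collection is given below. The environment object, which would wrap the FFD deformation, grid generation and CFD evaluation, and the policy call are assumed interfaces for illustration only; they are not part of the method's actual implementation.

```python
import numpy as np

def collect_trajectories(env, policy, n_traj, e_steps):
    """Roll out the current policy to gather n trajectories of e steps each
    (steps 5-10). env and policy are assumed interfaces: env.reset() returns
    the aerodynamic state of the reference airfoil, and env.step(action)
    applies the FFD perturbation, runs the CFD evaluation and returns
    (new_state, reward)."""
    trajectories, rewards = [], []
    for _ in range(n_traj):
        states, actions, rews = [env.reset()], [], []
        for _ in range(e_steps):
            a = policy(states[-1])       # step 5: new design variables
            s_new, r = env.step(a)       # steps 6-8: new airfoil, CFD, reward
            actions.append(a)
            rews.append(r)
            states.append(s_new)
        trajectories.append({"states": np.array(states),
                             "actions": np.array(actions)})
        rewards.append(np.array(rews))
    return trajectories, rewards
```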
Advantageous effects
1. Compared with genetic and similar algorithms, which use the large amount of computed data only to evaluate the optimization objective and the constraints, deep reinforcement learning also learns from the experience accumulated while attempting the optimization. This improves the utilization efficiency of the data, reduces the amount of computation and improves the optimization efficiency.
2. In the airfoil optimization design method based on deep reinforcement learning provided by the invention, the policy model obtained through deep reinforcement learning has a certain degree of generality, or transferability: if the design conditions, such as the freestream Mach number and Reynolds number, change within a certain range, the original optimization strategy can still provide an initial optimization direction, and the optimization objective can be noticeably improved within a small number of steps. This is a characteristic and advantage that traditional optimization methods do not possess.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is an overall flow chart of an airfoil optimization method of the present invention.
Fig. 2 is a free-form surface deformation control frame.
FIG. 3 is a schematic view of an airfoil design space.
Fig. 4 is a schematic diagram of a fully connected network.
FIG. 5 is a reinforcement learning optimization flow.
FIG. 6 is a grid overview and partial view of an airfoil.
FIG. 7 is the airfoil drag convergence curve optimized with the proximal policy optimization (PPO) strategy used herein.
FIG. 8 is the airfoil drag convergence curve optimized with the comparison algorithm, the non-dominated sorting genetic algorithm (NSGA-II).
FIG. 9 is the airfoil drag convergence curve optimized with a pre-trained PPO strategy.
Detailed Description
The following detailed description of embodiments of the invention is exemplary and intended to be illustrative of the invention and not to be construed as limiting the invention.
Step 1: RAE2822 is used as the reference airfoil, which is parameterized with the free-form deformation (FFD) method. The FFD control frame is shown in FIG. 2; the design variables consist of 14 points that control the upper and lower airfoil surfaces, the perturbation range of each design point in the y direction is ±0.02, and the design space of the airfoil is shown in FIG. 3.
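By way of illustration, the following is a minimal Python sketch of a two-dimensional free-form deformation of the kind used in this step: a control lattice with a Bernstein basis in the chordwise direction is wrapped around the airfoil, and y-displacements of the control points (here assumed to be arranged as 7 points on each of the upper and lower frame edges, within ±0.02) deform the section. The function name, the lattice arrangement and the placeholder baseline shape are assumptions, not the patent's actual code.

```python
import numpy as np
from math import comb

def ffd_deform(airfoil_xy, dy_ctrl, y_margin=0.05):
    """Deform airfoil points with a 2-D FFD box (Bernstein basis in x).

    airfoil_xy : (N, 2) baseline coordinates, x normalized to [0, 1]
    dy_ctrl    : (2, n_ctrl) y-displacements of the control points,
                 row 0 = lower frame edge, row 1 = upper frame edge
    """
    x, y = airfoil_xy[:, 0], airfoil_xy[:, 1]
    y_min, y_max = y.min() - y_margin, y.max() + y_margin
    s = x                                  # chordwise lattice coordinate
    t = (y - y_min) / (y_max - y_min)      # vertical lattice coordinate
    n = dy_ctrl.shape[1] - 1
    # Bernstein blending of the control-point displacements along the chord
    bern_x = np.array([comb(n, i) * s**i * (1 - s)**(n - i) for i in range(n + 1)])
    bern_y = np.stack([1 - t, t])          # linear blending between frame edges
    dy = np.einsum('jk,ij,ik->k', bern_x, dy_ctrl, bern_y)
    return np.column_stack([x, y + dy])

# Usage sketch: 14 design variables (assumed 7 per frame edge), range ±0.02
rng = np.random.default_rng(0)
x_base = np.linspace(0.0, 1.0, 101)
baseline = np.column_stack([x_base, 0.06 * np.sin(np.pi * x_base)])  # placeholder shape, not RAE2822
dv = rng.uniform(-0.02, 0.02, size=(2, 7))
new_airfoil = ffd_deform(baseline, dv)
```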
Step 2: the design objective is to minimize the drag of the airfoil in the design state while simultaneously satisfying three constraints: (1) the lift coefficient does not decrease, (2) the absolute value of the moment coefficient of the airfoil does not increase, and (3) the maximum thickness of the airfoil does not decrease. The mathematical expression of the optimization design problem is as follows:
Minimize: C_d
Subject to: C_l ≥ C_l0
|C_m| ≤ |C_m0|
t ≥ t_0
where C_d is the airfoil drag coefficient, C_l is the airfoil lift coefficient, C_l0 is the lift coefficient of the reference airfoil, C_m is the moment coefficient of the airfoil, C_m0 is the moment coefficient of the reference airfoil, t is the thickness of the airfoil and t_0 is the thickness of the reference airfoil.
Step 3: construct the reward function from the design objective and the constraint conditions in step 2, as follows:
If C_d − C_d0 < 0 then reward_cd = 100×(C_d − C_d0).
If C_l − C_l0 < 0 then reward_cl = 10×(C_l − C_l0), otherwise reward_cl = 0.
If C_m − C_m0 < 0 then reward_cm = 100×(C_m − C_m0), otherwise reward_cm = 0.
If t − t_0 < 0 then reward_t = 50×(t − t_0), otherwise reward_t = 0.
The final reward value is reward = reward_cd + reward_cl + reward_cm + 100·reward_t,
where reward_cd is the drag coefficient reward value, reward_cl is the lift coefficient reward value, reward_cm is the moment coefficient reward value, reward_t is the airfoil thickness reward value, and the coefficients 10, 50 and 100 are used to balance the magnitude difference between the objective and the constraints.
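A direct transcription of this reward construction into Python might look as follows; the formulas and the 10/50/100 coefficients follow the construction above, while the function name and the zero reward for a non-improving drag coefficient (a case the text leaves unspecified) are assumptions.

```python
def airfoil_reward(cd, cl, cm, t, cd0, cl0, cm0, t0):
    """Reward of step 3: drag objective plus lift, moment and thickness
    constraint terms, combined linearly with the balancing coefficients."""
    reward_cd = 100.0 * (cd - cd0) if cd - cd0 < 0 else 0.0  # else-branch assumed
    reward_cl = 10.0 * (cl - cl0) if cl - cl0 < 0 else 0.0
    reward_cm = 100.0 * (cm - cm0) if cm - cm0 < 0 else 0.0
    reward_t = 50.0 * (t - t0) if t - t0 < 0 else 0.0
    return reward_cd + reward_cl + reward_cm + 100.0 * reward_t
```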
Step 4: build the agent, which comprises a policy model π and a value function model. The policy model outputs the action policy and the value function model outputs the advantage estimate and the value function. Both models are artificial neural networks with two hidden layers of 64 nodes each, written based on PyTorch.
The artificial neural network is a feedforward network in which many neuron nodes are connected together; the output of each layer is used as the input of the next layer, and a hierarchically organized computational structure is formed through weighted calculations. A schematic diagram of the fully connected network is shown in FIG. 4. Initialize the policy model parameters θ_0 and the value function model parameters. The lift coefficient, drag coefficient, maximum thickness and moment coefficient of the airfoil are taken as the state, and the lift coefficient, drag coefficient, maximum thickness and moment coefficient of the reference airfoil are taken as the initial state s_0. The relationship between the parts of the reinforcement learning optimization flow is shown in FIG. 5.
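For illustration, a minimal PyTorch sketch of two such fully connected networks, each with two hidden layers of 64 nodes as stated above, is given below. The Gaussian action head, the tanh activations and the state/action dimensions (4 aerodynamic state parameters, 14 design variables) are assumptions, since the patent does not specify these details.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Policy model: maps the aerodynamic state to a Gaussian distribution
    over design-variable perturbations (the action head is an assumption)."""
    def __init__(self, state_dim=4, action_dim=14, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        h = self.body(state)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())

class ValueNet(nn.Module):
    """Value function model: maps the state to a scalar value estimate."""
    def __init__(self, state_dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state):
        return self.net(state).squeeze(-1)
```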
Step 5: the policy model of the current agent gives an action, i.e. new airfoil design variables, according to the state and the reward value.
Step 6: apply the action to the airfoil to obtain a new airfoil.
Step 7: build a C-type structured grid around the new two-dimensional airfoil; the grid overview and a partial view are shown in FIG. 6. The computed operating condition is Ma = 0.734, Re = 6.5×10^6 and α = 2.79°, where Ma is the freestream Mach number, Re is the freestream Reynolds number and α is the angle of attack. Neglecting the influence of body forces and external heating, the flow over the airfoil is simulated numerically with the open-source solver CFL3D; the governing equations are the Reynolds-averaged N-S equations and the turbulence model is the k-ω SST model. The drag coefficient c_d, moment coefficient c_m, lift coefficient c_l and airfoil thickness t of the new airfoil are obtained from the computation, and these aerodynamic parameters of the new airfoil are taken as the new state.
Step 8: using the computed c_d, c_m, c_l and t, compute the reward value according to the reward function in step 3.
Step 9: with the current policy model, repeat steps 5-8 a total of e−1 times to obtain a trajectory containing the state and action of each cycle and the reward values {r_e}. The trajectory is τ = {s_0, a_0, ···, s_{e−1}, a_{e−1}, s_e}, where s_0 and a_0 are the initial state and action, s_e is the state at step e, s_{e−1} is the state at step e−1, and a_{e−1} is the action at step e−1.
Step 10: based on the current policy model, repeat steps 5-9 to obtain n trajectories and reward values.
Step 11: from the obtained trajectory parameters, compute the advantage estimate based on the current value function model. The advantage estimate is the difference between the expected reward of each action a and the average expected reward of all actions in that state; if the advantage estimate is greater than 0, action a is better than average, otherwise it is worse than average.
Step 12: construct the loss function from the advantage estimates, the trajectories and the reward values, and optimize the policy model parameters θ and the value function model parameters with a stochastic gradient descent algorithm, the optimization objective being the minimization of the loss function; update the policy model and the value function model with the optimized parameters to obtain a new policy model and a new value function model.
Step 13: repeat steps 5-12 in a loop until the loss function no longer decreases; training is then complete.
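Steps 11 and 12 correspond to the advantage estimation and clipped surrogate update of proximal policy optimization; a condensed PyTorch sketch is given below. The discount factor, clipping range, value-loss weight and the use of discounted returns minus the value baseline as the advantage estimate are illustrative assumptions, since the patent does not state these details; the policy and value_fn arguments follow the network sketch above.

```python
import torch

def ppo_update(policy, value_fn, optimizer, states, actions, rewards,
               old_log_probs, gamma=0.99, clip_eps=0.2, value_coef=0.5):
    """One PPO-style update (steps 11-12) on a single trajectory:
    advantage = discounted return - value baseline, clipped policy loss
    plus value loss, minimized by stochastic gradient descent."""
    # step 11: discounted returns and advantage estimates
    returns = torch.zeros_like(rewards)
    running = 0.0
    for i in reversed(range(len(rewards))):      # rewards of one trajectory
        running = rewards[i] + gamma * running
        returns[i] = running
    values = value_fn(states)
    advantages = (returns - values).detach()

    # step 12: clipped surrogate loss and value-function loss
    dist = policy(states)
    log_probs = dist.log_prob(actions).sum(-1)
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    loss = policy_loss + value_coef * value_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```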
Proximal policy optimization (PPO) is a deep reinforcement learning algorithm. FIG. 7 shows the drag convergence curve obtained with the PPO strategy used herein. It can be seen that the airfoil drag improves significantly, by about 30% relative to the initial airfoil RAE2822, corresponding to a reduction of about 50 drag counts. To examine the optimization effect of PPO in single-objective constrained aerodynamic optimization design, the non-dominated sorting genetic algorithm (NSGA-II) is adopted as the comparison algorithm, with a population size of 100 and 100 generations of evolution; its convergence curve is shown in FIG. 8. There is still a gap between the optimization effect of PPO and that of NSGA-II, so a pre-training approach is then adopted to improve the PPO results. Pre-training starts from a model that has already been trained on similar problems; it provides a better initial point for the optimization, helps to avoid falling into local optima, and yields better optimization results.
After the network is pre-trained, the convergence curve of the pre-trained PPO is shown in FIG. 9, and a comparison of the optimization results and the required amount of computation for the two methods is given in Table 1. The convergence of the pre-trained PPO is clearly more stable than that of the genetic algorithm, the two converge to a similar level, and, for a similar drag-reduction effect, the amount of computation of the pre-trained PPO strategy is 0.05 times that of the genetic algorithm, a significant reduction. As shown in Table 2, when the amount of computation is fixed at 500, the PPO strategy used herein and the pre-trained PPO strategy also perform much better than the genetic algorithm in drag reduction. Compared with a genetic algorithm, the airfoil optimization design method based on deep reinforcement learning can significantly reduce the amount of computation and, after pre-training, is less likely to fall into a local optimum.
TABLE 1
TABLE 2
The design conditions were then changed to demonstrate that the strategy obtained by reinforcement learning has a certain degree of generality. The reference airfoil was kept unchanged and the design state was changed from Ma = 0.734, Re = 6.5×10^6, α = 2.79° to 1#: Ma = 0.72, Re = 1×10^7, α = 2.79°; 2#: Ma = 0.76, Re = 8×10^6, α = 2.79°; and 3#: Ma = 0.6, Re = 6.5×10^6, α = 2.79°. To verify the validity of the strategy, the existing strategy was used directly to optimize the three states for 5, 10, 15 and 20 steps, and the change in drag was observed. The results are shown in Table 3: in the three design states, after 20 optimization steps the drag of 1# is reduced by 7.4%, that of 2# by 3.7% and that of 3# by 6.8%, and the constraints are satisfied in all three design states. In summary, reinforcement learning learns a strategy with a certain degree of generality and can provide a direction and preliminary optimization results for the optimization.
TABLE 3
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention; changes, modifications, substitutions and variations may be made to the above embodiments by those skilled in the art without departing from the spirit and principles of the invention.

Claims (7)

1. An airfoil optimization design method based on deep reinforcement learning, characterized by comprising the following steps:
Step 1: perform the geometric parameterization of the airfoil with the free-form deformation method: establish a free-form deformation control frame around the reference airfoil, establish the mapping relationship between the control frame and the airfoil, and obtain a new airfoil by changing the positions of the control frame points;
Step 2: establish the optimization design model and, according to the flight requirements, confirm a single design objective and the constraint conditions;
Step 3: establish the reward function according to the optimization objective and the constraint conditions;
Step 4: establish the agent, which comprises a policy model π and a value function model; the policy model outputs the action policy and the value function model outputs the advantage estimate and the value function; initialize the policy model parameters θ_0 and the value function model parameters; take the aerodynamic parameters of the airfoil design objective and constraints as the state, with the aerodynamic parameters of the reference airfoil as the initial state s_0;
Step 5: the policy model of the current agent gives an action according to the state and the reward value, obtaining new airfoil design variables;
Step 6: apply the action to the airfoil to obtain a new airfoil;
Step 7: build a structured grid for the obtained new airfoil, perform a numerical simulation of the flow around the new airfoil, and compute the aerodynamic parameters of the new airfoil as the new state;
Step 8: using the aerodynamic parameters computed in step 7, compute the reward value according to the reward function in step 3;
Step 9: with the current policy model, repeat steps 5-8 a total of e−1 times to obtain a trajectory containing the state and action of each cycle and the reward values {r_e}; the trajectory is τ = {s_0, a_0, ···, s_{e−1}, a_{e−1}, s_e}, where s_0 and a_0 are the initial state and action, s_e is the state at step e, s_{e−1} is the state at step e−1, and a_{e−1} is the action at step e−1;
Step 10: based on the current policy model, repeat steps 5-9 a total of n−1 times to obtain n trajectories and reward values;
Step 11: according to the obtained n trajectory parameters and reward values, compute the advantage estimate based on the value function model of the current agent;
Step 12: construct the loss function from the advantage estimates, the trajectories and the reward values, and optimize the policy model parameters θ and the value function model parameters, the optimization objective being the minimization of the loss function; update the policy model and the value function model with the optimized parameters to obtain a new policy model and a new value function model;
Step 13: repeat steps 5-12 in a loop until the loss function no longer decreases, completing the training.
2. The airfoil optimization design method based on deep reinforcement learning according to claim 1, characterized in that: in step 2, the design objective and the constraint conditions are aerodynamic parameters of the airfoil.
3. The airfoil optimization design method based on deep reinforcement learning according to claim 2, characterized in that: in step 2, the design objective is to minimize the drag of the airfoil in the design state while satisfying three constraints: (1) the lift coefficient does not decrease, (2) the absolute value of the moment coefficient of the airfoil does not increase, and (3) the maximum thickness of the airfoil does not decrease;
the mathematical expression of the optimization design problem is as follows:
Minimize: C_d
Subject to: C_l ≥ C_l0
|C_m| ≤ |C_m0|
t ≥ t_0
where C_d is the airfoil drag coefficient, C_l is the airfoil lift coefficient, C_l0 is the lift coefficient of the reference airfoil, C_m is the moment coefficient of the airfoil, C_m0 is the moment coefficient of the reference airfoil, t is the thickness of the airfoil, and t_0 is the thickness of the reference airfoil.
4. The airfoil optimization design method based on deep reinforcement learning according to claim 3, characterized in that: in step 3, the total reward value is obtained as the linear sum of the reward values of the individual aerodynamic parameters, wherein achieving the objective increases the reward value, satisfying a constraint does not increase the reward value, and violating a constraint decreases the reward value; the objective reward value and the constraint reward values are multiplied by different coefficients to balance the magnitude difference between the objective and the constraints.
5. The airfoil optimization design method based on deep reinforcement learning according to claim 4, characterized in that: the reward function is constructed from the design objective and the constraint conditions in step 2 as follows:
if C_d − C_d0 < 0 then reward_cd = 100×(C_d − C_d0);
if C_l − C_l0 < 0 then reward_cl = 10×(C_l − C_l0), otherwise reward_cl = 0;
if C_m − C_m0 < 0 then reward_cm = 100×(C_m − C_m0), otherwise reward_cm = 0;
if t − t_0 < 0 then reward_t = 50×(t − t_0), otherwise reward_t = 0;
the final reward value is reward = reward_cd + reward_cl + reward_cm + 100·reward_t;
where reward_cd is the drag coefficient reward value, reward_cl is the lift coefficient reward value, reward_cm is the moment coefficient reward value, and reward_t is the airfoil thickness reward value.
6. The airfoil optimization design method based on deep reinforcement learning according to claim 1, characterized in that: in step 4, both the policy model and the value function model use an artificial neural network model containing two hidden layers, the number of hidden layer nodes being 64.
7. The airfoil optimization design method based on deep reinforcement learning according to claim 1, characterized in that: in step 7, the open-source solver CFL3D is used to perform the numerical simulation of the flow around the new airfoil, the governing equations are the Reynolds-averaged N-S equations, the turbulence model is the k-ω SST model, and the aerodynamic parameters of the new airfoil are computed.
CN202211374735.1A 2022-11-04 2022-11-04 A method for airfoil optimization design based on deep reinforcement learning Active CN115618497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211374735.1A CN115618497B (en) 2022-11-04 2022-11-04 A method for airfoil optimization design based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211374735.1A CN115618497B (en) 2022-11-04 2022-11-04 A method for airfoil optimization design based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN115618497A CN115618497A (en) 2023-01-17
CN115618497B true CN115618497B (en) 2025-03-21

Family

ID=84876923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211374735.1A Active CN115618497B (en) 2022-11-04 2022-11-04 A method for airfoil optimization design based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115618497B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117150680A (en) * 2023-09-15 2023-12-01 上海师范大学 Airfoil profile optimization design method based on deep learning and reinforcement learning
CN117634334B (en) * 2023-10-27 2024-07-23 西北工业大学 Optimization design method of fighter airfoil considering both aerodynamics and stealth and airfoil family with wide speed range
WO2025100881A1 (en) * 2023-11-09 2025-05-15 포항공과대학교 산학협력단 Cfd automation method for optimal airfoil flow analysis based on reinforcement training, cfd airfoil flow analysis method, and cfd airfoil flow analysis device
CN119249911B (en) * 2024-12-03 2025-04-04 西北工业大学 Flow active control synergy design method based on transfer learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN114626277A (en) * 2022-04-02 2022-06-14 浙江大学 Active flow control method based on reinforcement learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614631B (en) * 2018-10-18 2022-10-14 清华大学 Fully automatic aerodynamic optimization method for aircraft based on reinforcement learning and transfer learning
CN111694365B (en) * 2020-07-01 2021-04-20 武汉理工大学 A Deep Reinforcement Learning Based Path Tracking Method for Unmanned Vessel Formation
CN114861368B (en) * 2022-06-13 2023-09-12 中南大学 A construction method of railway longitudinal section design learning model based on proximal strategy

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN114626277A (en) * 2022-04-02 2022-06-14 浙江大学 Active flow control method based on reinforcement learning

Also Published As

Publication number Publication date
CN115618497A (en) 2023-01-17

Similar Documents

Publication Publication Date Title
CN115618497B (en) A method for airfoil optimization design based on deep reinforcement learning
CN112084727A (en) A Transition Prediction Method Based on Neural Network
CN113093526B (en) Overshoot-free PID controller parameter setting method based on reinforcement learning
CN109409614A (en) A kind of Methods of electric load forecasting based on BR neural network
CN114862237B (en) Power system economic load scheduling optimization method based on improved eagle optimization algorithm
CN109800517B (en) Improved reverse modeling method for magnetorheological damper
CN116628854B (en) Wing section aerodynamic characteristic prediction method, system, electronic equipment and storage medium
CN114676639A (en) Aircraft aerodynamic shape optimization method, device and medium based on neural network
CN111142383B (en) Online learning method for optimal controller of nonlinear system
CN111007724A (en) A Quantitative Tracking Control Method for Specified Performance of Hypersonic Aircraft Based on Interval Type II Fuzzy Neural Network
CN114818487A (en) Natural gas and wet gas pipeline liquid holdup prediction model method based on PSO-BP neural network
CN112990674A (en) Multi-target operation scheduling method for offshore floating wind power plant
CN116702292A (en) Pneumatic optimization method for wind nozzle of flat steel box girder based on deep reinforcement learning
CN112632728A (en) Turbine mechanical blade profile design and performance prediction method based on deep learning
CN119129367A (en) Power load model parameter identification method based on improved deep reinforcement learning
Niu et al. Artificial intelligence-based response surface progressive optimality algorithm for operation optimization of multiple hydropower reservoirs
CN119270909A (en) A hypersonic vehicle adaptive attitude control method and system based on proximal strategy optimization
CN116880191A (en) Intelligent control method of process industrial production system based on time sequence prediction
CN111325308A (en) A Nonlinear System Identification Method
CN119578702A (en) Real-time evaluation method and device for distributed photovoltaic carrying capacity of distribution network
CN115167150B (en) Batch process two-dimensional off-orbit strategy staggered Q learning optimal tracking control method with unknown system dynamics
CN117454705B (en) Wing structure/material multi-scale aeroelastic optimization method, device and medium
CN118034030A (en) A method for optimizing PID parameters based on LSTM algorithm in combination with diesel engine
Mukesh et al. Influence of search algorithms on aerodynamic design optimisation of aircraft wings
CN112507604B (en) A Data-Driven Voltage-Frequency Response Modeling Method for Renewable Power Sources

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant