
CN115618497B - A method for airfoil optimization design based on deep reinforcement learning - Google Patents

A method for airfoil optimization design based on deep reinforcement learning

Info

Publication number
CN115618497B
CN115618497B CN202211374735.1A
Authority
CN
China
Prior art keywords
airfoil
reward
value
model
optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211374735.1A
Other languages
Chinese (zh)
Other versions
CN115618497A (en)
Inventor
屈峰
段少凯
孙迪
惠心雨
白俊强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202211374735.1A priority Critical patent/CN115618497B/en
Publication of CN115618497A publication Critical patent/CN115618497A/en
Application granted granted Critical
Publication of CN115618497B publication Critical patent/CN115618497B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/10Geometric CAD
    • G06F30/15Vehicle, aircraft or watercraft design
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/28Design optimisation, verification or simulation using fluid dynamics, e.g. using Navier-Stokes equations or computational fluid dynamics [CFD]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00Details relating to CAD techniques
    • G06F2111/04Constraint-based CAD
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00Details relating to CAD techniques
    • G06F2111/10Numerical modelling
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00Details relating to the application field
    • G06F2113/08Fluids
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14Force analysis or force optimisation, e.g. static or dynamic forces
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Fluid Mechanics (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Algebra (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Automation & Control Theory (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Computational Mathematics (AREA)
  • Structures Of Non-Positive Displacement Pumps (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides an airfoil optimization design method based on deep reinforcement learning. Unlike supervised learning, the method has the characteristics of autonomously learning a strategy and maximizing the long-term reward; it is a more intelligent optimization approach, and its strategy is transferable and general. If the design conditions, such as the freestream Mach number and the Reynolds number, change within a certain range, the original optimization strategy can still provide an initial optimization direction, and the optimization objective can be noticeably improved within a small number of steps.

Description

Airfoil profile optimization design method based on deep reinforcement learning
Technical Field
The invention belongs to the field of aircraft design, and provides an airfoil optimal design method based on deep reinforcement learning.
Background
As a main component of an aircraft, the airfoil not only provides the lift required for flight but also ensures the stability and controllability of the aircraft. As understanding of airfoil aerodynamic performance has deepened, numerous airfoil libraries have been built. At present there is no single "super airfoil" that satisfies all design states; instead, a reference airfoil that meets the requirements is first selected from an airfoil library according to the design purpose, operating conditions and other factors, and is then continuously improved in combination with the design requirements. Early improvement of a reference airfoil relied on repeated wind-tunnel experiments, which consume enormous manpower and resources.
Since the beginning of the 21st century, advances in science and technology have injected new vitality into computational fluid dynamics (CFD), which quickly became the primary means of analyzing and solving fluid problems. Using CFD greatly shortens the optimization cycle and saves a large amount of manpower and material resources, allowing a designer to perform repeated computations until the desired airfoil is obtained; evolutionary and genetic algorithms, for example, are widely applied to aerodynamic optimization problems. However, evolutionary and genetic algorithms make poor use of the large amount of computational data generated during the optimization process.
The application of machine learning in fluid dynamics has developed rapidly in recent years. The most common machine-learning approach for fast performance prediction in optimization design is currently based on response surfaces: existing data are used to construct a mapping between inputs and outputs in order to accelerate the optimization process. This partially replaces the role of CFD in the optimization loop and belongs to supervised learning. Unlike supervised learning, reinforcement learning builds a mapping from the state parameters of the environment to action parameters, and its purpose is to iteratively update a strategy that interacts with the environment so as to obtain the maximum cumulative reward. What is learned is not the actual output to be predicted, as in supervised learning, but the reward corresponding to a given action. The airfoil design problem is not the pursuit of the optimum of a single performance metric in isolation; it is a complex systems-engineering task in which overall performance is obtained after repeated trade-off comparisons, and manual modification is often needed in combination with the experience and understanding of engineers. The training process of reinforcement learning is quite similar to the process by which engineers accumulate experience. Analogous to traditional trial-and-error methods, the agent responsible for acting in reinforcement learning updates its action-selection strategy by continuously taking different "actions" during training and observing the return of the design results over a period of time in the future (or after taking a series of actions). As the return increases, the agent can be considered to have acquired, to some extent, the same design experience as an engineer. In airfoil optimization, Li Runze used reinforcement learning to optimize the airfoil pressure distribution and reduce the drag of a transonic airfoil, and Viquerat performed optimization attempts with and without constraints. Reinforcement learning is still at an early stage for airfoil optimization, and there is a possibility that the agent cannot find the right direction and falls into a local optimum. In general, applying reinforcement learning to airfoil optimization in the aerodynamic optimization design of aircraft can effectively improve optimization efficiency and has broad application prospects.
Disclosure of Invention
Because the optimization efficiency of current CFD-based methods is low, the invention provides an airfoil optimization design method based on deep reinforcement learning. Unlike supervised learning, the method autonomously learns a strategy that maximizes the long-term reward; it is a more intelligent optimization method, and its strategy is transferable and general.
The technical scheme of the invention is as follows:
The airfoil optimization design method based on deep reinforcement learning comprises the following steps:
Step 1: perform the geometric parameterization of the airfoil with the free-form deformation (FFD) method, i.e. establish a free-form deformation control frame around the reference airfoil, establish the mapping relationship between the control frame and the airfoil, and obtain a new airfoil by changing the positions of the control frame points.
Step 2: establish the optimization design model and, according to the flight requirements, confirm a single design objective and the constraint conditions. The design objective and constraints are aerodynamic parameters of the airfoil, such as the lift coefficient, drag coefficient and airfoil thickness, and are expressed by mathematical expressions. A general single-objective optimization problem can be written in the following mathematical form:
Minimize: f(x)
subject to: g_w(x) ≥ 0, w = 1, 2, ···, W
h_r(x) = 0, r = 1, 2, ···, R
where x is the design variable, f(x) is the objective function, g_w(x) are the inequality constraints (W in total) and h_r(x) are the equality constraints (R in total).
Step 3: establish the reward function according to the optimization objective and the constraint conditions. The total reward value is the linear sum of the reward values of the individual aerodynamic parameters: achieving the objective increases the reward value, satisfying a constraint does not increase the reward value, and violating a constraint decreases the reward value. The objective and constraint reward values are multiplied by different coefficients to balance the difference in magnitude between the objective and the constraints.
Step 4: establish the agent, which comprises a policy model π and a value function model. The policy model outputs the action policy, and the value function model outputs the advantage estimate and the value function. Both the policy model and the value function model are artificial neural networks with two hidden layers of 64 nodes each. Initialize the policy model parameters θ_0 and the value function model parameters. The aerodynamic parameters of the airfoil related to the design objective and constraints are taken as the state, including the lift coefficient, drag coefficient, maximum thickness and moment coefficient of the airfoil, and the aerodynamic parameters of the reference airfoil are taken as the initial state s_0.
Step 5: the policy model of the current agent gives an action a, i.e. a new set of design variables, according to the state and the reward value.
Step 6: apply the action to the airfoil to obtain a new airfoil.
Step 7: build a structured grid around the new airfoil and perform a numerical simulation of the flow over the airfoil with the open-source solver CFL3D; the governing equations are the Reynolds-averaged N-S equations and the turbulence model is the k-ω SST model. The lift coefficient, drag coefficient, maximum thickness and moment coefficient of the airfoil are obtained from the computation, and these aerodynamic parameters of the new airfoil are taken as the new state.
Step 8: compute the reward value from the calculated aerodynamic parameters according to the reward function in step 3.
Step 9: with the current policy model, repeat steps 5-8 a total of e−1 times to obtain a trajectory containing the state and action of each cycle and the reward values {r_e}. The trajectory is τ = {s_0, a_0, ···, s_{e−1}, a_{e−1}, s_e}, where s_0 and a_0 are the initial state and action, s_e is the state at step e, s_{e−1} is the state at step e−1, and a_{e−1} is the action at step e−1.
Step 10: based on the current policy model, repeat steps 5-9 a total of n−1 times to obtain n trajectories and their reward values.
Step 11: from the obtained n trajectories and reward values, compute the advantage estimate based on the value function model of the current agent, i.e. the difference between the expected reward of each action a and the average expected reward of all possible actions in that state.
Step 12: construct the loss function from the advantage estimates, the trajectories and the reward values, and optimize the policy model parameters θ and the value function model parameters with a stochastic gradient descent algorithm, the optimization objective being the minimization of the loss function; update the policy model and the value function model with the optimized parameters to obtain a new policy model and a new value function model.
Step 13: repeat steps 5-12 in a loop until the loss function no longer decreases; training is then complete.
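To make the interaction loop of steps 5 to 10 concrete, a minimal Python sketch of the trajectory collection is given below. The environment object, which would wrap the FFD deformation, grid generation and CFD evaluation, and the policy call are assumed interfaces for illustration only; they are not part of the method's actual implementation.

```python
import numpy as np

def collect_trajectories(env, policy, n_traj, e_steps):
    """Roll out the current policy to gather n trajectories of e steps each
    (steps 5-10). env and policy are assumed interfaces: env.reset() returns
    the aerodynamic state of the reference airfoil, and env.step(action)
    applies the FFD perturbation, runs the CFD evaluation and returns
    (new_state, reward)."""
    trajectories, rewards = [], []
    for _ in range(n_traj):
        states, actions, rews = [env.reset()], [], []
        for _ in range(e_steps):
            a = policy(states[-1])       # step 5: new design variables
            s_new, r = env.step(a)       # steps 6-8: new airfoil, CFD, reward
            actions.append(a)
            rews.append(r)
            states.append(s_new)
        trajectories.append({"states": np.array(states),
                             "actions": np.array(actions)})
        rewards.append(np.array(rews))
    return trajectories, rewards
```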
Advantageous effects
1. Compared with genetic and similar algorithms, which use the large amount of computed data only to evaluate the optimization objective and the constraints, deep reinforcement learning also learns from the experience accumulated while attempting the optimization. This improves the utilization efficiency of the data, reduces the amount of computation and improves the optimization efficiency.
2. In the airfoil optimization design method based on deep reinforcement learning provided by the invention, the policy model obtained through deep reinforcement learning has a certain degree of generality, or transferability: if the design conditions, such as the freestream Mach number and Reynolds number, change within a certain range, the original optimization strategy can still provide an initial optimization direction, and the optimization objective can be noticeably improved within a small number of steps. This is a characteristic and advantage that traditional optimization methods do not possess.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is an overall flow chart of an airfoil optimization method of the present invention.
Fig. 2 is a free-form surface deformation control frame.
FIG. 3 is a schematic view of an airfoil design space.
Fig. 4 is a schematic diagram of a fully connected network.
FIG. 5 is a reinforcement learning optimization flow.
FIG. 6 is a grid overview and partial view of an airfoil.
FIG. 7 is the airfoil drag convergence curve optimized with the proximal policy optimization (PPO) strategy used herein.
FIG. 8 is the airfoil drag convergence curve optimized with the comparison algorithm, the non-dominated sorting genetic algorithm (NSGA-II).
FIG. 9 is the airfoil drag convergence curve optimized with a pre-trained PPO strategy.
Detailed Description
The following detailed description of embodiments of the invention is exemplary and intended to be illustrative of the invention and not to be construed as limiting the invention.
Step 1: RAE2822 is used as the reference airfoil, which is parameterized with the free-form deformation (FFD) method. The FFD control frame is shown in FIG. 2; the design variables consist of 14 points that control the upper and lower airfoil surfaces, the perturbation range of each design point in the y direction is ±0.02, and the design space of the airfoil is shown in FIG. 3.
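By way of illustration, the following is a minimal Python sketch of a two-dimensional free-form deformation of the kind used in this step: a control lattice with a Bernstein basis in the chordwise direction is wrapped around the airfoil, and y-displacements of the control points (here assumed to be arranged as 7 points on each of the upper and lower frame edges, within ±0.02) deform the section. The function name, the lattice arrangement and the placeholder baseline shape are assumptions, not the patent's actual code.

```python
import numpy as np
from math import comb

def ffd_deform(airfoil_xy, dy_ctrl, y_margin=0.05):
    """Deform airfoil points with a 2-D FFD box (Bernstein basis in x).

    airfoil_xy : (N, 2) baseline coordinates, x normalized to [0, 1]
    dy_ctrl    : (2, n_ctrl) y-displacements of the control points,
                 row 0 = lower frame edge, row 1 = upper frame edge
    """
    x, y = airfoil_xy[:, 0], airfoil_xy[:, 1]
    y_min, y_max = y.min() - y_margin, y.max() + y_margin
    s = x                                  # chordwise lattice coordinate
    t = (y - y_min) / (y_max - y_min)      # vertical lattice coordinate
    n = dy_ctrl.shape[1] - 1
    # Bernstein blending of the control-point displacements along the chord
    bern_x = np.array([comb(n, i) * s**i * (1 - s)**(n - i) for i in range(n + 1)])
    bern_y = np.stack([1 - t, t])          # linear blending between frame edges
    dy = np.einsum('jk,ij,ik->k', bern_x, dy_ctrl, bern_y)
    return np.column_stack([x, y + dy])

# Usage sketch: 14 design variables (assumed 7 per frame edge), range ±0.02
rng = np.random.default_rng(0)
x_base = np.linspace(0.0, 1.0, 101)
baseline = np.column_stack([x_base, 0.06 * np.sin(np.pi * x_base)])  # placeholder shape, not RAE2822
dv = rng.uniform(-0.02, 0.02, size=(2, 7))
new_airfoil = ffd_deform(baseline, dv)
```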
Step 2: the design objective is to minimize the drag of the airfoil in the design state while simultaneously satisfying three constraints: (1) the lift coefficient does not decrease, (2) the absolute value of the moment coefficient of the airfoil does not increase, and (3) the maximum thickness of the airfoil does not decrease. The mathematical expression of the optimization design problem is as follows:
Minimize: C_d
Subject to: C_l ≥ C_l0
|C_m| ≤ |C_m0|
t ≥ t_0
where C_d is the airfoil drag coefficient, C_l is the airfoil lift coefficient, C_l0 is the lift coefficient of the reference airfoil, C_m is the moment coefficient of the airfoil, C_m0 is the moment coefficient of the reference airfoil, t is the thickness of the airfoil and t_0 is the thickness of the reference airfoil.
Step 3: construct the reward function from the design objective and the constraint conditions in step 2, as follows:
If C_d − C_d0 < 0 then reward_cd = 100×(C_d − C_d0).
If C_l − C_l0 < 0 then reward_cl = 10×(C_l − C_l0), otherwise reward_cl = 0.
If C_m − C_m0 < 0 then reward_cm = 100×(C_m − C_m0), otherwise reward_cm = 0.
If t − t_0 < 0 then reward_t = 50×(t − t_0), otherwise reward_t = 0.
The final reward value is reward = reward_cd + reward_cl + reward_cm + 100·reward_t,
where reward_cd is the drag coefficient reward value, reward_cl is the lift coefficient reward value, reward_cm is the moment coefficient reward value, reward_t is the airfoil thickness reward value, and the coefficients 10, 50 and 100 are used to balance the magnitude difference between the objective and the constraints.
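A direct transcription of this reward construction into Python might look as follows; the formulas and the 10/50/100 coefficients follow the construction above, while the function name and the zero reward for a non-improving drag coefficient (a case the text leaves unspecified) are assumptions.

```python
def airfoil_reward(cd, cl, cm, t, cd0, cl0, cm0, t0):
    """Reward of step 3: drag objective plus lift, moment and thickness
    constraint terms, combined linearly with the balancing coefficients."""
    reward_cd = 100.0 * (cd - cd0) if cd - cd0 < 0 else 0.0  # else-branch assumed
    reward_cl = 10.0 * (cl - cl0) if cl - cl0 < 0 else 0.0
    reward_cm = 100.0 * (cm - cm0) if cm - cm0 < 0 else 0.0
    reward_t = 50.0 * (t - t0) if t - t0 < 0 else 0.0
    return reward_cd + reward_cl + reward_cm + 100.0 * reward_t
```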
Step 4: build the agent, which comprises a policy model π and a value function model. The policy model outputs the action policy and the value function model outputs the advantage estimate and the value function. Both models are artificial neural networks with two hidden layers of 64 nodes each, written based on PyTorch.
The artificial neural network is a feedforward network in which many neuron nodes are connected together; the output of each layer is used as the input of the next layer, and a hierarchically organized computational structure is formed through weighted calculations. A schematic diagram of the fully connected network is shown in FIG. 4. Initialize the policy model parameters θ_0 and the value function model parameters. The lift coefficient, drag coefficient, maximum thickness and moment coefficient of the airfoil are taken as the state, and the lift coefficient, drag coefficient, maximum thickness and moment coefficient of the reference airfoil are taken as the initial state s_0. The relationship between the parts of the reinforcement learning optimization flow is shown in FIG. 5.
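For illustration, a minimal PyTorch sketch of two such fully connected networks, each with two hidden layers of 64 nodes as stated above, is given below. The Gaussian action head, the tanh activations and the state/action dimensions (4 aerodynamic state parameters, 14 design variables) are assumptions, since the patent does not specify these details.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Policy model: maps the aerodynamic state to a Gaussian distribution
    over design-variable perturbations (the action head is an assumption)."""
    def __init__(self, state_dim=4, action_dim=14, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        h = self.body(state)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())

class ValueNet(nn.Module):
    """Value function model: maps the state to a scalar value estimate."""
    def __init__(self, state_dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state):
        return self.net(state).squeeze(-1)
```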
Step 5: the policy model of the current agent gives an action, i.e. new airfoil design variables, according to the state and the reward value.
Step 6: apply the action to the airfoil to obtain a new airfoil.
Step 7: build a C-type structured grid around the new two-dimensional airfoil; the grid overview and a partial view are shown in FIG. 6. The computed operating condition is Ma = 0.734, Re = 6.5×10^6 and α = 2.79°, where Ma is the freestream Mach number, Re is the freestream Reynolds number and α is the angle of attack. Neglecting the influence of body forces and external heating, the flow over the airfoil is simulated numerically with the open-source solver CFL3D; the governing equations are the Reynolds-averaged N-S equations and the turbulence model is the k-ω SST model. The drag coefficient c_d, moment coefficient c_m, lift coefficient c_l and airfoil thickness t of the new airfoil are obtained from the computation, and these aerodynamic parameters of the new airfoil are taken as the new state.
Step 8: using the computed c_d, c_m, c_l and t, compute the reward value according to the reward function in step 3.
Step 9: with the current policy model, repeat steps 5-8 a total of e−1 times to obtain a trajectory containing the state and action of each cycle and the reward values {r_e}. The trajectory is τ = {s_0, a_0, ···, s_{e−1}, a_{e−1}, s_e}, where s_0 and a_0 are the initial state and action, s_e is the state at step e, s_{e−1} is the state at step e−1, and a_{e−1} is the action at step e−1.
Step 10: based on the current policy model, repeat steps 5-9 to obtain n trajectories and reward values.
Step 11: from the obtained trajectory parameters, compute the advantage estimate based on the current value function model. The advantage estimate is the difference between the expected reward of each action a and the average expected reward of all actions in that state; if the advantage estimate is greater than 0, action a is better than average, otherwise it is worse than average.
Step 12: construct the loss function from the advantage estimates, the trajectories and the reward values, and optimize the policy model parameters θ and the value function model parameters with a stochastic gradient descent algorithm, the optimization objective being the minimization of the loss function; update the policy model and the value function model with the optimized parameters to obtain a new policy model and a new value function model.
Step 13: repeat steps 5-12 in a loop until the loss function no longer decreases; training is then complete.
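Steps 11 and 12 correspond to the advantage estimation and clipped surrogate update of proximal policy optimization; a condensed PyTorch sketch is given below. The discount factor, clipping range, value-loss weight and the use of discounted returns minus the value baseline as the advantage estimate are illustrative assumptions, since the patent does not state these details; the policy and value_fn arguments follow the network sketch above.

```python
import torch

def ppo_update(policy, value_fn, optimizer, states, actions, rewards,
               old_log_probs, gamma=0.99, clip_eps=0.2, value_coef=0.5):
    """One PPO-style update (steps 11-12) on a single trajectory:
    advantage = discounted return - value baseline, clipped policy loss
    plus value loss, minimized by stochastic gradient descent."""
    # step 11: discounted returns and advantage estimates
    returns = torch.zeros_like(rewards)
    running = 0.0
    for i in reversed(range(len(rewards))):      # rewards of one trajectory
        running = rewards[i] + gamma * running
        returns[i] = running
    values = value_fn(states)
    advantages = (returns - values).detach()

    # step 12: clipped surrogate loss and value-function loss
    dist = policy(states)
    log_probs = dist.log_prob(actions).sum(-1)
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    loss = policy_loss + value_coef * value_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```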
Proximal policy optimization (PPO) is a deep reinforcement learning algorithm. FIG. 7 shows the drag convergence curve obtained with the PPO strategy used herein. It can be seen that the airfoil drag improves significantly, by about 30% relative to the initial airfoil RAE2822, corresponding to a reduction of about 50 drag counts. To examine the optimization effect of PPO in single-objective constrained aerodynamic optimization design, the non-dominated sorting genetic algorithm (NSGA-II) is adopted as the comparison algorithm, with a population size of 100 and 100 generations of evolution; its convergence curve is shown in FIG. 8. There is still a gap between the optimization effect of PPO and that of NSGA-II, so a pre-training approach is then adopted to improve the PPO results. Pre-training starts from a model that has already been trained on similar problems; it provides a better initial point for the optimization, helps to avoid falling into local optima, and yields better optimization results.
After the network is pre-trained, the convergence curve of the pre-trained PPO is shown in FIG. 9, and a comparison of the optimization results and the required amount of computation for the two methods is given in Table 1. The convergence of the pre-trained PPO is clearly more stable than that of the genetic algorithm, the two converge to a similar level, and, for a similar drag-reduction effect, the amount of computation of the pre-trained PPO strategy is 0.05 times that of the genetic algorithm, a significant reduction. As shown in Table 2, when the amount of computation is fixed at 500, the PPO strategy used herein and the pre-trained PPO strategy also perform much better than the genetic algorithm in drag reduction. Compared with a genetic algorithm, the airfoil optimization design method based on deep reinforcement learning can significantly reduce the amount of computation and, after pre-training, is less likely to fall into a local optimum.
TABLE 1
TABLE 2
The design conditions were then changed to demonstrate that the strategy obtained by reinforcement learning has a certain degree of generality. The reference airfoil was kept unchanged and the design state was changed from Ma = 0.734, Re = 6.5×10^6, α = 2.79° to 1#: Ma = 0.72, Re = 1×10^7, α = 2.79°; 2#: Ma = 0.76, Re = 8×10^6, α = 2.79°; and 3#: Ma = 0.6, Re = 6.5×10^6, α = 2.79°. To verify the validity of the strategy, the existing strategy was used directly to optimize the three states for 5, 10, 15 and 20 steps, and the change in drag was observed. The results are shown in Table 3: in the three design states, after 20 optimization steps the drag of 1# is reduced by 7.4%, that of 2# by 3.7% and that of 3# by 6.8%, and the constraints are satisfied in all three design states. In summary, reinforcement learning learns a strategy with a certain degree of generality and can provide a direction and preliminary optimization results for the optimization.
TABLE 3
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention; changes, modifications, substitutions and variations may be made to the above embodiments by those skilled in the art without departing from the spirit and principles of the invention.

Claims (7)

1. An airfoil optimization design method based on deep reinforcement learning, characterized by comprising the following steps:
Step 1: perform the geometric parameterization of the airfoil with the free-form deformation method: establish a free-form deformation control frame around the reference airfoil, establish the mapping relationship between the control frame and the airfoil, and obtain a new airfoil by changing the positions of the control frame points;
Step 2: establish the optimization design model and, according to the flight requirements, confirm a single design objective and the constraint conditions;
Step 3: establish the reward function according to the optimization objective and the constraint conditions;
Step 4: establish the agent, which comprises a policy model π and a value function model; the policy model outputs the action policy and the value function model outputs the advantage estimate and the value function; initialize the policy model parameters θ_0 and the value function model parameters; take the aerodynamic parameters of the airfoil design objective and constraints as the state, with the aerodynamic parameters of the reference airfoil as the initial state s_0;
Step 5: the policy model of the current agent gives an action according to the state and the reward value, obtaining new airfoil design variables;
Step 6: apply the action to the airfoil to obtain a new airfoil;
Step 7: build a structured grid for the obtained new airfoil, perform a numerical simulation of the flow around the new airfoil, and compute the aerodynamic parameters of the new airfoil as the new state;
Step 8: using the aerodynamic parameters computed in step 7, compute the reward value according to the reward function in step 3;
Step 9: with the current policy model, repeat steps 5-8 a total of e−1 times to obtain a trajectory containing the state and action of each cycle and the reward values {r_e}; the trajectory is τ = {s_0, a_0, ···, s_{e−1}, a_{e−1}, s_e}, where s_0 and a_0 are the initial state and action, s_e is the state at step e, s_{e−1} is the state at step e−1, and a_{e−1} is the action at step e−1;
Step 10: based on the current policy model, repeat steps 5-9 a total of n−1 times to obtain n trajectories and reward values;
Step 11: according to the obtained n trajectory parameters and reward values, compute the advantage estimate based on the value function model of the current agent;
Step 12: construct the loss function from the advantage estimates, the trajectories and the reward values, and optimize the policy model parameters θ and the value function model parameters, the optimization objective being the minimization of the loss function; update the policy model and the value function model with the optimized parameters to obtain a new policy model and a new value function model;
Step 13: repeat steps 5-12 in a loop until the loss function no longer decreases, completing the training.
2. The airfoil optimization design method based on deep reinforcement learning according to claim 1, characterized in that: in step 2, the design objective and the constraint conditions are aerodynamic parameters of the airfoil.
3. The airfoil optimization design method based on deep reinforcement learning according to claim 2, characterized in that: in step 2, the design objective is to minimize the drag of the airfoil in the design state while satisfying three constraints: (1) the lift coefficient does not decrease, (2) the absolute value of the moment coefficient of the airfoil does not increase, and (3) the maximum thickness of the airfoil does not decrease;
the mathematical expression of the optimization design problem is as follows:
Minimize: C_d
Subject to: C_l ≥ C_l0
|C_m| ≤ |C_m0|
t ≥ t_0
where C_d is the airfoil drag coefficient, C_l is the airfoil lift coefficient, C_l0 is the lift coefficient of the reference airfoil, C_m is the moment coefficient of the airfoil, C_m0 is the moment coefficient of the reference airfoil, t is the thickness of the airfoil, and t_0 is the thickness of the reference airfoil.
4. The airfoil optimization design method based on deep reinforcement learning according to claim 3, characterized in that: in step 3, the total reward value is obtained as the linear sum of the reward values of the individual aerodynamic parameters, wherein achieving the objective increases the reward value, satisfying a constraint does not increase the reward value, and violating a constraint decreases the reward value; the objective reward value and the constraint reward values are multiplied by different coefficients to balance the magnitude difference between the objective and the constraints.
5. The airfoil optimization design method based on deep reinforcement learning according to claim 4, characterized in that: the reward function is constructed from the design objective and the constraint conditions in step 2 as follows:
if C_d − C_d0 < 0 then reward_cd = 100×(C_d − C_d0);
if C_l − C_l0 < 0 then reward_cl = 10×(C_l − C_l0), otherwise reward_cl = 0;
if C_m − C_m0 < 0 then reward_cm = 100×(C_m − C_m0), otherwise reward_cm = 0;
if t − t_0 < 0 then reward_t = 50×(t − t_0), otherwise reward_t = 0;
the final reward value is reward = reward_cd + reward_cl + reward_cm + 100·reward_t;
where reward_cd is the drag coefficient reward value, reward_cl is the lift coefficient reward value, reward_cm is the moment coefficient reward value, and reward_t is the airfoil thickness reward value.
6. The airfoil optimization design method based on deep reinforcement learning according to claim 1, characterized in that: in step 4, both the policy model and the value function model use an artificial neural network model containing two hidden layers, the number of hidden layer nodes being 64.
7. The airfoil optimization design method based on deep reinforcement learning according to claim 1, characterized in that: in step 7, the open-source solver CFL3D is used to perform the numerical simulation of the flow around the new airfoil, the governing equations are the Reynolds-averaged N-S equations, the turbulence model is the k-ω SST model, and the aerodynamic parameters of the new airfoil are computed.
CN202211374735.1A 2022-11-04 2022-11-04 A method for airfoil optimization design based on deep reinforcement learning Active CN115618497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211374735.1A CN115618497B (en) 2022-11-04 2022-11-04 A method for airfoil optimization design based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211374735.1A CN115618497B (en) 2022-11-04 2022-11-04 A method for airfoil optimization design based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN115618497A CN115618497A (en) 2023-01-17
CN115618497B true CN115618497B (en) 2025-03-21

Family

ID=84876923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211374735.1A Active CN115618497B (en) 2022-11-04 2022-11-04 A method for airfoil optimization design based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115618497B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117150680A (en) * 2023-09-15 2023-12-01 上海师范大学 Airfoil profile optimization design method based on deep learning and reinforcement learning
CN117634334B (en) * 2023-10-27 2024-07-23 西北工业大学 Optimization design method of fighter airfoil considering both aerodynamics and stealth and airfoil family with wide speed range
WO2025100881A1 (en) * 2023-11-09 2025-05-15 포항공과대학교 산학협력단 Cfd automation method for optimal airfoil flow analysis based on reinforcement training, cfd airfoil flow analysis method, and cfd airfoil flow analysis device
CN119249911B (en) * 2024-12-03 2025-04-04 西北工业大学 Flow active control synergy design method based on transfer learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN114626277A (en) * 2022-04-02 2022-06-14 浙江大学 Active flow control method based on reinforcement learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614631B (en) * 2018-10-18 2022-10-14 清华大学 Fully automatic aerodynamic optimization method for aircraft based on reinforcement learning and transfer learning
CN111694365B (en) * 2020-07-01 2021-04-20 武汉理工大学 A Deep Reinforcement Learning Based Path Tracking Method for Unmanned Vessel Formation
CN114861368B (en) * 2022-06-13 2023-09-12 中南大学 A construction method of railway longitudinal section design learning model based on proximal strategy

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN114626277A (en) * 2022-04-02 2022-06-14 浙江大学 Active flow control method based on reinforcement learning

Also Published As

Publication number Publication date
CN115618497A (en) 2023-01-17

Similar Documents

Publication Publication Date Title
CN115618497B (en) A method for airfoil optimization design based on deep reinforcement learning
CN112084727A (en) A Transition Prediction Method Based on Neural Network
CN113093526B (en) Overshoot-free PID controller parameter setting method based on reinforcement learning
CN109409614A (en) A kind of Methods of electric load forecasting based on BR neural network
CN114862237B (en) Power system economic load scheduling optimization method based on improved eagle optimization algorithm
CN109800517B (en) Improved reverse modeling method for magnetorheological damper
CN116628854B (en) Wing section aerodynamic characteristic prediction method, system, electronic equipment and storage medium
CN114676639A (en) Aircraft aerodynamic shape optimization method, device and medium based on neural network
CN111142383B (en) Online learning method for optimal controller of nonlinear system
CN111007724A (en) A Quantitative Tracking Control Method for Specified Performance of Hypersonic Aircraft Based on Interval Type II Fuzzy Neural Network
CN114818487A (en) Natural gas and wet gas pipeline liquid holdup prediction model method based on PSO-BP neural network
CN112990674A (en) Multi-target operation scheduling method for offshore floating wind power plant
CN116702292A (en) Pneumatic optimization method for wind nozzle of flat steel box girder based on deep reinforcement learning
CN112632728A (en) Turbine mechanical blade profile design and performance prediction method based on deep learning
CN119129367A (en) Power load model parameter identification method based on improved deep reinforcement learning
Niu et al. Artificial intelligence-based response surface progressive optimality algorithm for operation optimization of multiple hydropower reservoirs
CN119270909A (en) A hypersonic vehicle adaptive attitude control method and system based on proximal strategy optimization
CN116880191A (en) Intelligent control method of process industrial production system based on time sequence prediction
CN111325308A (en) A Nonlinear System Identification Method
CN119578702A (en) Real-time evaluation method and device for distributed photovoltaic carrying capacity of distribution network
CN115167150B (en) Batch process two-dimensional off-orbit strategy staggered Q learning optimal tracking control method with unknown system dynamics
CN117454705B (en) Wing structure/material multi-scale aeroelastic optimization method, device and medium
CN118034030A (en) A method for optimizing PID parameters based on LSTM algorithm in combination with diesel engine
Mukesh et al. Influence of search algorithms on aerodynamic design optimisation of aircraft wings
CN112507604B (en) A Data-Driven Voltage-Frequency Response Modeling Method for Renewable Power Sources

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant