CN111857081A

CN111857081A - Performance control method of chip packaging and testing production line based on Q-learning reinforcement learning

Info

Publication number: CN111857081A
Application number: CN202010797879.2A
Authority: CN
Inventors: 李波; 冯益铭; 钱鑫森
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2020-08-10
Filing date: 2020-08-10
Publication date: 2020-10-30
Anticipated expiration: 2040-08-10
Also published as: CN111857081B

Abstract

The invention relates to the field of performance control and optimization of semiconductor chip packaging and testing production lines, and in particular to a performance control method for a chip packaging and testing production line based on Q-learning reinforcement learning. The invention establishes a more accurate performance prediction model for a series-parallel semiconductor packaging and testing production line, and combines the Morris screening method with Arena simulation to carry out a quantitative global sensitivity analysis, identifying the factors that most strongly affect production line performance and how they act. This avoids the situation in which the Markov state space of the equipment is so large that traditional mathematical model analysis is not applicable. On the basis of performance prediction and sensitivity analysis, the invention controls the variability factors of the production line and improves the way the parameter ε is chosen, so that the algorithm converges faster and avoids local optima, and the performance control method has better flexibility and real-time performance.

Description

Performance control method for a chip packaging and testing production line based on Q-learning reinforcement learning

Technical Field

The invention relates to the field of performance control and optimization of semiconductor chip packaging and testing production lines, and in particular to a performance control method for such production lines that combines sensitivity analysis with the Q-learning reinforcement learning algorithm.

Background

The semiconductor manufacturing industry has great strategic value for the development of the national economy. To keep the domestic semiconductor manufacturing industry developing well, it is necessary not only to expand production scale but also to pay attention to the production efficiency of the manufacturing system and to strengthen production management and control technology. Semiconductor manufacturing systems are characterized by highly re-entrant process routes, highly complex production processes, long manufacturing cycles, large system scale and high uncertainty, which makes performance control of the production line difficult. Variability factors such as buffer capacity, sudden equipment failures, preventive equipment maintenance and product rework also strongly affect the production performance of the manufacturing system, reducing production efficiency, lengthening the production cycle and disrupting the normal execution of production plans.

At present there is little research on intelligent, comprehensive and dynamic control of production line performance. Most existing work is limited to a single aspect of production line variability and does not examine the multiple variability factors on the line as a whole. The performance prediction models for series-parallel semiconductor production lines established in current research deviate from actual production conditions and lack accuracy. Traditional performance control and optimization methods have difficulty responding in real time to changes in the variability factors of the production line and are not flexible enough.

Summary of the Invention

To address the shortcomings of existing performance control models and strategies for semiconductor chip packaging and testing production lines, the invention proposes a performance control method for a chip packaging and testing production line based on Q-learning reinforcement learning. Targeting the problems of slow response to variability factors, incomplete consideration of variability factors and conflicting control strategies, the method combines sensitivity analysis with the Q-learning reinforcement learning algorithm to intelligently control the manufacturing performance of the production line.

A performance control method for a chip packaging and testing production line based on Q-learning reinforcement learning comprises the following steps:

Step 1: Build an abstract model of the series-parallel production line for semiconductor chip packaging and testing;

Step 2: Based on the abstract production line model built in Step 1, establish a prediction model for the performance of the series-parallel production line;

Step 3: Based on the abstract production line model built in Step 1, use qualitative analysis by the Morris screening method and quantitative analysis by Arena simulation to obtain the mechanism by which the key variability factors affect production line performance;

Step 4: Based on the performance prediction model established in Step 2 and the key variability analysis obtained in Step 3, establish a performance control model based on the Q-learning reinforcement learning algorithm, and solve it iteratively with the optimal production line benefit index as the performance control objective to obtain the globally optimal performance control strategy.

Step 1 is specifically:

Model abstraction of the semiconductor chip packaging and testing production line: the back-end process of the semiconductor manufacturing line, that is, the chip packaging and testing production line, is taken as the research object. Assuming that finite buffers exist between stations and that the queuing discipline is first come, first served, the line is abstracted as a multi-station series-parallel queuing production line model with re-entry (rework).

Step 2 is specifically:

Step 2.1: Variability calculation: calculate the arrival variability c_a and the processing time variability c_e.

Step 2.2: Determine the basic performance prediction indicators.

The average time CT that a workpiece spends at a station (the production cycle) is obtained from the average time CT_q that the workpiece spends in the queue and its effective processing time t_e. The average work-in-process level WIP at the station is then calculated. The workpiece production rate TH, the production cycle CT and the work-in-process level WIP are used as the basic indicators for production line performance prediction.

CT = CT_q + t_e

WIP = CT × TH
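By way of illustration, the two relations above can be written as a short Python sketch. The function names and the numeric values are hypothetical and are not taken from the source.

```python
def station_cycle_time(ct_q: float, t_e: float) -> float:
    """Production cycle at a station: time in queue plus effective processing time."""
    return ct_q + t_e


def wip_level(ct: float, th: float) -> float:
    """Average work-in-process level: WIP = CT x TH."""
    return ct * th


# Hypothetical numbers: 12 min in queue, 3 min of processing, 0.5 pieces/min.
ct = station_cycle_time(ct_q=12.0, t_e=3.0)
wip = wip_level(ct, th=0.5)
print(ct, wip)  # 15.0 7.5
```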

Step 2.3: Build the production line performance prediction model.

Step 2.3.1: Calculate the queuing time of product j at station i:

where c_a^ij and c_e^ij are the arrival variability and the processing time variability of product j at station i, u^ij is the utilization of station i, m^ij is the number of parallel devices at station i, and t_e^ij is the effective processing time of product j at station i.
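The queuing-time formula itself is not reproduced in this text. As a hedged sketch only, the variables listed above match the widely used G/G/m queue-time approximation from Factory Physics (variability term × utilization term × effective processing time), so the Python below assumes that form. It should not be read as the exact expression of the invention; the function name is hypothetical.

```python
import math


def queue_time_ggm(c_a: float, c_e: float, u: float, m: int, t_e: float) -> float:
    """Approximate queue time at a station with m parallel devices.

    Assumed form (standard G/G/m approximation, not confirmed by the source):
        CT_q = ((c_a^2 + c_e^2) / 2) * (u^(sqrt(2(m+1)) - 1) / (m (1 - u))) * t_e
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("utilization u must satisfy 0 <= u < 1")
    if m < 1:
        raise ValueError("m must be at least 1")
    variability = (c_a ** 2 + c_e ** 2) / 2.0
    utilization = u ** (math.sqrt(2.0 * (m + 1)) - 1.0) / (m * (1.0 - u))
    return variability * utilization * t_e


# With c_a = c_e = 1 and m = 1 the expression reduces to u / (1 - u) * t_e.
print(queue_time_ggm(c_a=1.0, c_e=1.0, u=0.5, m=1, t_e=2.0))
```

Under this assumed form, adding parallel devices at the same utilization shortens the queue time, which is consistent with the number of parallel devices being treated as a control variable later in the method.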

Step 2.3.2: Calculate the workpiece production rate TH.

Assume that station i has m^ij parallel devices (b > m > 1), where b is the capacity of the buffer in front of station i and k is the number of workpieces being processed at station i. If 0 ≤ k ≤ b, the probability p₀ that no workpiece j (0 < j < r, where r is the total number of products processed on the production line) is waiting for processing in front of station i is:

The blocking probability of workpiece j when the buffer capacity is b is:

Let q_hj be the defect rate of workpiece j at station h and Q_ij the defect rate detected at station i, with 0 < h < i ≤ s, where s is the number of stations in the series-parallel production line. The probability Q_ij that a defective workpiece j is detected and removed at station i is:

The index set in this formula contains the numbers of all stations on the production line that perform defect inspection.

The production rate TH_ij of workpiece j at station i is then:

The station with the highest utilization is the bottleneck: station I is recorded as the bottleneck station of product J, and its production rate is denoted r_b^IJ = max(u^ij).
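The bottleneck selection described above amounts to picking the station with the highest utilization. A minimal Python sketch follows; the function name and the utilization values are hypothetical.

```python
def bottleneck_station(utilization: dict) -> int:
    """Return the station number with the highest utilization for one product."""
    if not utilization:
        raise ValueError("utilization must not be empty")
    return max(utilization, key=utilization.get)


# Hypothetical utilizations u^ij of product j at stations 1 to 3.
u_j = {1: 0.62, 2: 0.91, 3: 0.78}
print(bottleneck_station(u_j))  # 2
```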

Step 2.3.3: Calculate the production cycle (logical production cycle) CT_j and the work-in-process level WIP_j of the production line.

Calculate the average wait-to-batch time WTBT of a workpiece:

where r_a is the rate at which workpieces arrive at the station and k_ij is the processing batch size of product j at station i. In this case

and therefore

The formula for CT_q^ij is rewritten as:

Calculate the production cycle CT_j and the work-in-process level WIP_j of product j at station i:

This gives the production cycle (logical production cycle) CT_j and the work-in-process level WIP_j of product j over the entire series-parallel production line:

Step 2.4: Evaluate the performance of the production line performance prediction model.

Step 2.4.1: Calculate the production line performance index F.

As shown in Figure 3, the WIP-CT and WIP-TH curves of the best case, the worst case and the practical worst case of the production line are used as benchmarks to delimit the "good region" and the "bad region" of the performance quadrant, forming the performance evaluation chart of the production line.

The performance evaluation index, denoted F, is the ratio of the distance of the actual performance point to the distance between the best-case and practical-worst-case benchmarks:

where w is the given actual work-in-process level, t is the actual production cycle, and T₀ is the theoretical processing time of the production line, here T₀ = CT; r_b is the bottleneck rate of the production line, here r_b = TH_ij if and only if u_ij = u_max.
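The expression for F is not reproduced in this text, so the sketch below does not attempt it. It only computes the three Factory Physics benchmark curves that the evaluation chart is built on, using the standard textbook formulas for the best case, worst case and practical worst case, and checks whether an actual performance point lies in the good region. The function names and numbers are hypothetical.

```python
def benchmarks(w: float, t0: float, r_b: float) -> dict:
    """Benchmark cycle time (CT) and throughput (TH) at WIP level w.

    t0 is the theoretical processing time, r_b the bottleneck rate.
    """
    w0 = r_b * t0  # critical WIP level
    return {
        "best_ct": t0 if w <= w0 else w / r_b,
        "best_th": w / t0 if w <= w0 else r_b,
        "worst_ct": w * t0,
        "worst_th": 1.0 / t0,
        "pwc_ct": t0 + (w - 1) / r_b,        # practical worst case
        "pwc_th": w / (w0 + w - 1) * r_b,
    }


def in_good_region(w: float, t: float, t0: float, r_b: float) -> bool:
    """True if the actual cycle time t is no worse than the practical worst case."""
    return t <= benchmarks(w, t0, r_b)["pwc_ct"]


# Hypothetical line: w = 10 pieces, T0 = 8 min, r_b = 0.5 pieces/min.
b = benchmarks(10, 8.0, 0.5)
print(b["best_ct"], b["pwc_ct"], b["worst_ct"])  # 20.0 26.0 80.0
print(in_good_region(10, 24.0, 8.0, 0.5))        # True
```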

Step 2.4.2: Calculate the production line benefit index Bf.

Taking production cost into account, the production line performance index F is rewritten as the benefit index Bf:

Bf = C * F

where C is the cost factor, c₁ is the unit equipment cost, c₂ is the unit buffer capacity cost, c₃ is the remaining fixed cost, m₁ and b₁ are the current number of parallel devices and the current buffer capacity, and m₀ and b₀ are the initial number of parallel devices and the initial buffer capacity.

Step 3 is specifically:

Step 3.1: Qualitative sensitivity analysis with the Morris screening method.

Select a random parameter x of the production line performance prediction model and preset a fixed step size C and a maximum variation range M. Perturb the parameter x in steps of C, and take the average rate of change of the performance evaluation index F as the sensitivity coefficient S:

where Y₀ is the value of the performance evaluation index F at the initial value of parameter x; Y_g and Y_g+1 are the values of F after the g-th and (g+1)-th perturbations of parameter x; P_g and P_g+1 are the rates of change of the parameter value relative to its initial value after the g-th and (g+1)-th perturbations; and n is the number of runs.
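The formula for S is not reproduced in this text. The Python sketch below assumes the commonly used modified Morris form, in which each step contributes the relative change of F divided by the change in the parameter's relative perturbation, and the contributions are averaged. The averaging over the number of perturbation steps and the use of fractional (not percentage) change rates are assumptions, as is the function name.

```python
def morris_sensitivity(y: list, p: list) -> float:
    """Average rate of change of F under stepwise perturbation of one parameter.

    y[0] is F at the initial parameter value (Y_0); y[g] is F after the g-th
    perturbation. p[g] is the fractional change of the parameter relative to
    its initial value after the g-th perturbation (p[0] = 0).
    """
    if len(y) != len(p) or len(y) < 2:
        raise ValueError("y and p need the same length, at least 2")
    y0 = y[0]
    steps = len(y) - 1  # assumption: average over the number of perturbation steps
    total = sum(
        ((y[g + 1] - y[g]) / y0) / (p[g + 1] - p[g]) for g in range(steps)
    )
    return total / steps


# Hypothetical run: F rises by 5 % of Y_0 for every 10 % increase of x.
print(round(morris_sensitivity([1.0, 1.05, 1.10], [0.0, 0.1, 0.2]), 6))  # 0.5
```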

According to the sensitivity grading standard in Table 1, parameters whose sensitivity coefficients fall in the sensitive and highly sensitive classes are identified as the factors with a large influence on the performance of the semiconductor packaging and testing production line.

Table 1 Sensitivity grading standard

Absolute value of sensitivity coefficient    Sensitivity class
0.00 ≤ |S| < 0.05                            Insensitive
0.05 ≤ |S| < 0.20                            Moderately sensitive
0.20 ≤ |S| < 1.00                            Sensitive
|S| ≥ 1.00                                   Highly sensitive
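The grading in Table 1 can be expressed as a small lookup. The sketch below is illustrative; the function name and the example coefficients are hypothetical.

```python
def sensitivity_class(s: float) -> str:
    """Map a sensitivity coefficient S to its class according to Table 1."""
    a = abs(s)
    if a < 0.05:
        return "insensitive"
    if a < 0.20:
        return "moderately sensitive"
    if a < 1.00:
        return "sensitive"
    return "highly sensitive"


def is_key_factor(s: float) -> bool:
    """Key factors are those graded sensitive or highly sensitive."""
    return abs(s) >= 0.20


for s in (0.03, -0.15, 0.45, -1.2):
    print(s, sensitivity_class(s), is_key_factor(s))
```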

Step 3.2: Quantitative sensitivity analysis with Arena simulation.

A model of the series-parallel production line for semiconductor chip packaging and testing is built in the Arena software. Each device has its own independent random processing time, failure time and repair time.

The workpiece arrival rate, the station processing rate, the mean time to failure m_f and the mean time to repair m_p on the production line follow negative exponential and normal distributions, respectively. The processing batch size k, the buffer capacity b and the number of parallel devices m are all fixed positive integers with b > m > 1. The warm-up time, the total run time and the number of replications of the simulation experiment are also set.

The experiment yields curves showing how the overall production line performance, the production cycle CT, the production rate TH and the work-in-process level WIP vary with the key factors that affect production line performance.

Step 4 is specifically:

Step 4.1: With the production line performance prediction model as the external environment for reinforcement learning and changes in production line variability as the trigger condition, a dynamic control method combining an event-triggered strategy with a periodically triggered strategy is used to establish the reinforcement-learning-based performance control model of the semiconductor chip packaging and testing production line shown in Figure 5.

Step 4.2: Initialize Q(s, a) for all s ∈ S and a ∈ A(s), where the Q value reflects the long-term reward, S is the set of system states, and A(s) is the set of action strategies for the key factors obtained in Step 3. Given the learning rate factor α and the discount factor γ, determine the reward function r.

Step 4.3: Given an initial state s, select action a in state s according to the ε-greedy strategy. The way ε is chosen is improved by defining it as a function:

where p is the number of steps the algorithm has executed so far and M is the total number of iteration steps, so that the value of ε gradually decreases from its initial value of 0.2 as the number of executed steps increases.
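The exact function for ε is not reproduced in this text. The sketch below only illustrates the stated behaviour (ε starts at 0.2 and shrinks as the step count p approaches the total M) with a linear decay, which is an assumption and may differ from the function used by the invention. The helper choose_action and all names are hypothetical.

```python
import random

EPSILON_0 = 0.2  # initial value stated in the text


def epsilon(p: int, total_steps: int) -> float:
    """Hypothetical linear decay of epsilon from 0.2 towards 0 (assumed form)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return EPSILON_0 * max(0.0, 1.0 - p / total_steps)


def choose_action(q_row: dict, eps: float, rng=random) -> str:
    """Epsilon-greedy: explore with probability eps, otherwise take the best action."""
    if rng.random() < eps:
        return rng.choice(list(q_row))
    return max(q_row, key=q_row.get)


print(epsilon(0, 100), epsilon(50, 100), epsilon(100, 100))
print(choose_action({"a1": 0.0, "a2": 1.0}, eps=0.0))  # a2
```

A decaying ε favours exploration early in training and exploitation later, which is the usual motivation for schedules of this kind.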

Step 4.4: Select action a in state s according to the ε-greedy strategy, where b is the selection index of a; obtain the reward r and the next state s_next, with a_next denoting the next action, and update the Q value:

s = s_next, a = a_next
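The update formula is not reproduced in this text. As a hedged illustration, the sketch below uses the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max Q(s_next, ·) − Q(s, a)], which may differ in detail from the expression used by the invention. The state and action labels and the numeric values are hypothetical.

```python
from collections import defaultdict


def q_update(q, s, a, r, s_next, actions, alpha, gamma):
    """Apply one standard Q-learning update in place and return the new Q(s, a)."""
    best_next = max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q[(s, a)]


q = defaultdict(float)   # Q table, every entry starts at 0
actions = ["a1", "a2"]   # hypothetical action labels
first = q_update(q, "s3", "a1", r=1.0, s_next="s4",
                 actions=actions, alpha=0.1, gamma=0.9)
print(round(first, 6))   # 0.1
```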

Step 4.5: Return to Step 4.4 until the system approaches a steady state, that is, until it converges.

Step 4.6: Repeat Steps 4.2 to 4.5 until the learning period (the preset number of times Steps 4.2 to 4.5 are repeated) ends, then stop iterating.

Step 4.7: Output the final strategy

and obtain the resulting optimization of the production line performance indicators.
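The expression for the final strategy is not reproduced in this text. A common way to read a strategy off a learned Q table is to take, for each state, the action with the largest Q value; the sketch below assumes that greedy rule. The function name and the table contents are hypothetical.

```python
def greedy_policy(q: dict, states: list, actions: list) -> dict:
    """For each state, pick the action with the highest learned Q value."""
    return {s: max(actions, key=lambda a: q.get((s, a), 0.0)) for s in states}


# Hypothetical learned values.
q_table = {("s1", "a1"): 0.2, ("s1", "a3"): 0.7, ("s2", "a1"): 0.4}
print(greedy_policy(q_table, ["s1", "s2"], ["a1", "a3"]))
# {'s1': 'a3', 's2': 'a1'}
```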

The invention establishes a more accurate performance prediction model for a series-parallel semiconductor packaging and testing production line, and combines the Morris screening method with Arena simulation to carry out a quantitative global sensitivity analysis, identifying the factors that most strongly affect production line performance and how they act. This avoids the situation in which the Markov state space of the equipment is so large that traditional mathematical model analysis is not applicable. The invention further proposes a production line performance control model based on the Q-learning algorithm, which controls the variability factors of the production line on the basis of performance prediction and sensitivity analysis and improves the way the parameter ε is chosen, so that the algorithm converges faster and avoids local optima, while the performance control method gains better flexibility and real-time performance.

Description of the Drawings

Figure 1 is a flow chart of the invention;

Figure 2 is the abstract model of the semiconductor chip packaging and testing production line;

Figure 3 is a diagram of the existing Factory Physics three-benchmark performance evaluation method;

Figure 4 is a schematic diagram of the logical structure of the production line simulation model;

Figure 5 is the reinforcement-learning-based production line performance control model of the embodiment;

Figure 6 shows how production line performance varies with the variabilities c_a and c_e;

Figure 7 shows the changes in production line performance indicators before and after performance control at different levels of variability CV1;

Figure 8 shows the changes in production line performance indicators before and after performance control at different levels of variability CV2.

Detailed Description

The invention is described in further detail below with reference to the drawings and an embodiment. The embodiment is implemented on the premise of the technical solution of the invention, and a detailed implementation and a specific operating procedure are given (Figure 1), but the scope of protection of the invention is not limited to the embodiment below.

The embodiment can be divided into the following main steps:

Step 1: Model abstraction of the semiconductor chip packaging and testing production line: the chip packaging and testing production line is taken as the research object. Assuming that buffers of finite size exist between stations and that the queuing discipline is first come, first served, the line is abstracted as a multi-station series-parallel queuing production line model with re-entry (rework) (Figure 2).

Step 2:

Step 2.1: Variability calculation.

Calculate the arrival variability c_a and the processing time variability c_e.

CT = CT_q + t_e

WIP = CT × TH

The loss rate of workpiece j at station i is:

Let q_hj be the defect rate of workpiece j at station h and Q_ij the defect rate detected at station i, with 0 < h < i ≤ s, where s is the number of stations in the series-parallel production line. The probability Q_ij that a defective workpiece j is detected and removed at station i is:

The index set in this formula contains the numbers of all stations on the production line that perform defect inspection.

The production rate of the bottleneck station I of product J is denoted r_b^IJ = max(u^ij).

where r_a is the rate at which workpieces arrive at the station and k_ij is the processing batch size of product j at station i. In this case

and therefore

The formula for CT_q^ij is rewritten as:

Bf = C * F

Step 3:

Select one parameter x of the production line performance prediction model and preset a fixed step size C and a maximum variation range M. Perturb the parameter x in steps of C, and take the average rate of change of the performance evaluation index F as the sensitivity coefficient S:

where Y₀ is the value of the performance evaluation index F at the initial value of parameter x; Y_g and Y_g+1 are the values of F after the g-th and (g+1)-th perturbations of parameter x; P_g and P_g+1 are the rates of change of the parameter value relative to its initial value after the g-th and (g+1)-th perturbations; and n is the number of runs.

Table 1 lists the sensitivity coefficients of the performance evaluation index F with respect to different parameters, obtained with the Morris screening method.

Table 1 Sensitivity coefficient S of index F

Parameter   Unit          Meaning                               Sensitivity coefficient S
u           %             Utilization                           1.242
r0          pieces/min    Feed rate                             -0.163
ra          pieces/min    Production rate                       0.622
k           pieces        Processing batch size                 0.478
ca          /             Workpiece arrival time variability    0.350
ce          /             Processing variability                0.457
m           units         Number of parallel devices            -1.134
A           %             Equipment availability                -0.104
b           pieces        Buffer capacity                       0.581
Q           %             Workpiece defect rate                 -0.029

According to the sensitivity grading standard in Table 2 and the relationships among the parameters, the number of parallel devices m, the processing batch size k, the workpiece arrival time variability c_a, the processing variability c_e and the buffer capacity b are identified as the factors with a large influence on the performance of the semiconductor packaging and testing production line.

Table 2 Sensitivity grading standard

A model of the series-parallel production line for semiconductor chip packaging and testing is built in the Arena software, as shown in Figure 4. Each device has its own independent random processing time, failure time and repair time.

The workpiece arrival rate, the station processing rate, the mean time to failure m_f and the mean time to repair m_p on the production line follow negative exponential and normal distributions, respectively. The processing batch size k, the buffer capacity b and the number of parallel devices m are all fixed positive integers with b > m > 1. The warm-up time of the simulation experiment is set to 600 minutes, the total run time to 1200 minutes, and the experiment is replicated 3 times.

The experiment yields curves showing how the overall production line performance, the production cycle CT, the production rate TH and the work-in-process level WIP vary with the key factors that affect production line performance. Figure 6 shows how production line performance varies with the arrival time variability c_a and the processing variability c_e.

Step 4:

Step 4.1: With the production line performance prediction model as the external environment for reinforcement learning and changes in production line variability as the trigger condition, a dynamic control method combining an event-triggered strategy with a periodically triggered strategy is used to establish the reinforcement-learning-based performance control model of the semiconductor chip packaging and testing production line shown in Figure 5.

Step 4.2: Initialize Q(s, a) for all s ∈ S and a ∈ A(s), where the Q value reflects the long-term reward and S is the set of system states, divided as shown in Table 3:

Table 3 Division of the system state set S

System state    Division criterion
s1              0 ≤ Bf ≤ 0.1
s2              0.1 < Bf ≤ 0.2
s3              0.2 < Bf ≤ 0.3
s4              0.3 < Bf ≤ 0.4
s5              0.4 < Bf ≤ 0.5
s6              0.5 < Bf ≤ 0.6
s7              0.6 < Bf ≤ 0.7
s8              0.7 < Bf ≤ 0.8
s9              0.8 < Bf ≤ 0.9
s10             0.9 < Bf ≤ 1.0
s11             Bf ≥ 1.0
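The state division in Table 3 can be sketched as a lookup over the upper bounds of the ten bounded intervals. Table 3 lists Bf = 1.0 under both s10 and s11; this sketch assumes it belongs to s10. The function name is hypothetical.

```python
from bisect import bisect_left

# Upper bounds of states s1 to s10: 0.1, 0.2, ..., 1.0.
UPPER_BOUNDS = [i / 10 for i in range(1, 11)]


def state_of(bf: float) -> str:
    """Map a benefit index Bf to its system state label according to Table 3."""
    if bf < 0:
        raise ValueError("Bf must be non-negative")
    # bisect_left finds the first interval whose upper bound is >= bf;
    # anything above 1.0 falls through to s11.
    return f"s{bisect_left(UPPER_BOUNDS, bf) + 1}"


for bf in (0.05, 0.1, 0.45, 1.0, 1.3):
    print(bf, state_of(bf))
```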

A(s) is the set of action strategies, A(s): {a1: number of parallel devices at station i +1, a2: number of parallel devices at station i -1, a3: buffer capacity of station i +1, a4: buffer capacity of station i -1, a5: processing batch size of product j +1, a6: processing batch size of product j -1}. The learning rate factor α is set to 0.1 and the discount factor γ to 0.9, and the reward function r is defined as follows, where Bf_pre is the benefit index of the production line after the previous optimization:
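The six actions in A(s) each raise or lower one of three integer quantities by one. The sketch below shows one way to apply them to a line configuration. The LineConfig type, its field names and the rule that an action which would push a value below 1 is ignored are assumptions made for illustration; the reward function is not sketched because its formula is not reproduced in this text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LineConfig:
    m: int  # number of parallel devices at station i
    b: int  # buffer capacity of station i
    k: int  # processing batch size of product j


# Action label -> (field to change, increment), following A(s) in the text.
ACTIONS = {
    "a1": ("m", +1), "a2": ("m", -1),
    "a3": ("b", +1), "a4": ("b", -1),
    "a5": ("k", +1), "a6": ("k", -1),
}


def apply_action(cfg: LineConfig, action: str) -> LineConfig:
    """Return the configuration after one action; values stay positive integers."""
    field_name, delta = ACTIONS[action]
    new_value = getattr(cfg, field_name) + delta
    if new_value < 1:  # assumption: ignore actions that would leave the valid range
        return cfg
    return replace(cfg, **{field_name: new_value})


print(apply_action(LineConfig(m=2, b=5, k=4), "a1"))
# LineConfig(m=3, b=5, k=4)
```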

Step 4.3: Given an initial state s, select action a in state s according to the ε-greedy strategy.

s = s_next, a = a_next

Step 4.7: Output the final strategy

and obtain the resulting optimization of the production line performance indicators. Figures 7 and 8 show the changes in production line performance indicators before and after performance control at different levels of variability CV1 and CV2, respectively.

In summary, the invention establishes a more accurate performance prediction model for a series-parallel semiconductor packaging and testing production line, and combines the Morris screening method with Arena simulation to carry out a quantitative global sensitivity analysis, identifying the factors that most strongly affect production line performance and how they act. This avoids the situation in which the Markov state space of the equipment is so large that traditional mathematical model analysis is not applicable. The invention also improves the way the parameter ε is chosen, so that the algorithm converges faster and avoids local optima, while offering better flexibility and real-time performance.

Claims

1. A performance control method for a chip packaging and testing production line based on Q-learning reinforcement learning, comprising the following steps:

Step 1: constructing an abstract model of a series-parallel production line for semiconductor chip packaging and testing;

Step 2: establishing a prediction model of the performance of the series-parallel production line for semiconductor chip packaging and testing, based on the abstract production line model constructed in Step 1;

Step 3: based on the abstract production line model constructed in Step 1, obtaining the mechanism by which key variability factors affect production line performance, using qualitative analysis by the Morris screening method and quantitative analysis by Arena simulation;

Step 4: establishing a performance control model based on the prediction model established in Step 2 and the key variability analysis obtained in Step 3, and solving it iteratively with the optimal production line benefit index as the performance control target, to obtain a globally optimal performance control strategy.

2. The performance control method for a chip packaging and testing production line based on Q-learning according to claim 1, wherein:

Step 1 specifically comprises: taking the back-end process of a semiconductor manufacturing line, namely the chip packaging and testing production line, as the research object; assuming that finite buffers exist between stations and that the queuing discipline is first come, first served; and abstracting the line as a multi-station series-parallel queuing production line model with re-entry.

3. The method as claimed in claim 1, wherein the step 2 is specifically as follows:

step 2.1: calculating the mobility: calculating the arrival variability c_aAnd variability during processing c_e；

Step 2.2: determining basic performance prediction indexes;

from the mean processing time CT of the workpieces at the queue_qAnd effective processing time t_eObtaining the average time CT (computed tomography) of residence in a work station, namely a production period; further calculating to obtain average work-in-process level WIP at a work station, and taking the work piece production rate TH, the production period CT and the work-in-process level WIP as basic indexes of production line performance prediction;

CT＝CT_q+t_e

WIP＝CT×TH
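The two relations above can be illustrated with a minimal sketch. The function name `station_metrics` and the numeric inputs are hypothetical and serve only to show the arithmetic; they are not taken from the claims.

```python
def station_metrics(ct_q, t_e, th):
    """Return (CT, WIP) for one station.

    ct_q: mean queuing time CT_q
    t_e:  effective processing time t_e
    th:   production rate TH
    """
    ct = ct_q + t_e   # CT = CT_q + t_e
    wip = ct * th     # WIP = CT x TH (Little's law)
    return ct, wip


# Illustrative values only (hypothetical): 1.5 h in queue, 0.5 h processing, 4 parts/h.
ct, wip = station_metrics(ct_q=1.5, t_e=0.5, th=4.0)
print(ct, wip)
```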

step 2.3: establishing a production line performance prediction model;

step 2.3.1: calculating the queuing time of the product j at the station i:

wherein c_a^ij and c_e^ij are the arrival variability and the processing-time variability of the product j at the station i, u^ij is the utilization of the station i, m^ij is the number of parallel devices of the station i, and t_e^ij is the effective processing time of the product j at the station i;
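The queuing-time formula of step 2.3.1 is not reproduced in this text. As a hedged illustration only, the sketch below uses the widely known variability-utilization-time approximation for a station with m parallel devices, which takes exactly the quantities defined above as inputs. Whether the claimed formula has this form is an assumption, and the function name `queue_time` is hypothetical.

```python
import math


def queue_time(c_a, c_e, u, m, t_e):
    """Approximate mean queuing time at a station with m parallel devices.

    ASSUMPTION: this is the standard G/G/m approximation
        CT_q ~ ((c_a^2 + c_e^2) / 2) * (u^(sqrt(2(m+1)) - 1) / (m (1 - u))) * t_e
    and is not confirmed to be the formula used in the claim.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if not 0.0 <= u < 1.0:
        raise ValueError("utilization u must satisfy 0 <= u < 1")
    variability = (c_a ** 2 + c_e ** 2) / 2.0
    utilization = u ** (math.sqrt(2.0 * (m + 1)) - 1.0) / (m * (1.0 - u))
    return variability * utilization * t_e


# Illustrative call with hypothetical values: two parallel devices at 80 % utilization.
print(queue_time(c_a=1.0, c_e=0.5, u=0.8, m=2, t_e=2.0))
```

For m = 1 and c_a = c_e = 1 the expression reduces to the M/M/1 queuing time u/(1 − u) · t_e, which is a convenient sanity check.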

step 2.3.2: calculating the production rate TH of the workpiece;

suppose the station i has m^ij parallel devices, b is the capacity of the buffer in front of the station i, k is the number of workpieces being processed by the station i, and b > m > 1; if 0 ≤ k ≤ b, the probability p₀ that the workpiece j is processed at the station i without waiting, where 0 < j < r and r denotes the number of products processed together in the production line, is:

the blocking probability of the workpiece j when the buffer capacity is b is:

let q_hj be the defective rate of the workpiece j at the station h and Q_ij be the defective rate detected at the station i, with 0 < h < i ≤ s, where s is the number of stations in the series-parallel production line; the probability Q_ij that a defective workpiece j is detected and removed at the station i is:

where the set in the above formula contains the numbers of all defective-product detection stations in the production line;

the production rate TH_ij of the workpiece j at the station i is:

the station with the maximum utilization, denoted station I, is the bottleneck station of the product J, and its production rate is recorded as r_b^IJ＝max(u^ij);

Step 2.3.3: calculating the production cycle CT_j and the work-in-process level WIP_j of the production line;

calculating the mean wait-to-batch time WTBT of the workpieces:

wherein r_a is the rate at which workpieces arrive at the station and k_ij is the processing batch size of the product j at the station i; at this point

Then

rewriting the formula for CT_q^ij:

calculating the production cycle CT_j and the work-in-process level WIP_j of the product j at the station i:

thereby obtaining the production cycle CT_j and the work-in-process level WIP_j of the product j over the whole series-parallel production line:

Step 2.4: evaluating the performance of the production line performance prediction model;

step 2.4.1: calculating a production line performance index F;

taking the WIP-CT and WIP-TH curves of the production line under the best case, the worst case and the practical worst case as benchmarks, defining a good region and a bad region in the performance quadrant to form a performance evaluation chart of the production line;

and taking as the performance evaluation index the ratio of the distance of the actual performance point to the distance between the best-case and practical-worst-case benchmarks, recorded as F:

where w is the given actual work-in-process level, T is the actual production cycle, T₀ is the theoretical processing time of the production line, here T₀＝CT; and r_b is the bottleneck rate of the production line, where r_b＝TH_ij if and only if u_ij＝u_max;
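The formula for F is not reproduced in this text, so it is not reconstructed here. As a hedged aid to reading step 2.4.1, the sketch below only computes the three benchmark points that the step refers to, using the textbook best-case, worst-case and practical-worst-case relations for a line with raw processing time T₀ and bottleneck rate r_b. That the claim uses these exact benchmark definitions is an assumption, and the function name `benchmark_points` is hypothetical.

```python
def benchmark_points(w, t0, r_b):
    """Return (cycle time, production rate) benchmarks at WIP level w.

    ASSUMPTION: textbook benchmark relations, not confirmed by the claim text.
    w:   work-in-process level
    t0:  theoretical processing time T0 of the line
    r_b: bottleneck rate
    """
    w0 = r_b * t0  # critical WIP level
    return {
        # Best case: no variability; the line is either WIP-starved or bottleneck-bound.
        "best": (max(t0, w / r_b), min(w / t0, r_b)),
        # Worst case: every workpiece waits behind all the others.
        "worst": (w * t0, 1.0 / t0),
        # Practical worst case: maximum randomness in a balanced line.
        "practical_worst": (t0 + (w - 1) / r_b, w / (w0 + w - 1) * r_b),
    }


# Illustrative values only (hypothetical): T0 = 8 h, r_b = 0.5 parts/h, w = 4 parts.
for name, (ct, th) in benchmark_points(4, 8.0, 0.5).items():
    print(name, ct, th)
```

Each benchmark pair satisfies WIP = CT × TH, so an actual operating point (w, T) can be plotted against these curves on the same WIP-CT and WIP-TH axes.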

Step 2.4.2: calculating a production line benefit index Bf;

taking the production cost into account, rewriting the production line performance index F as a benefit index Bf:

Bf＝C*F

wherein C is the cost factor, c₁ is the unit cost of equipment, c₂ is the cost per unit of buffer capacity, c₃ is the remaining fixed cost, m₁ and b₁ are respectively the current number of parallel devices and the current buffer capacity, and m₀ and b₀ are respectively the initial number of parallel devices and the initial buffer capacity.

4. The method for controlling the performance of the chip packaging test production line based on the Q-learning reinforcement learning of claim 1, wherein the step 3 specifically comprises:

step 3.1: carrying out sensitivity qualitative analysis by a Morris screening method;

selecting an arbitrary parameter x in the production line performance prediction model, presetting a fixed step length C and a maximum amplitude M, perturbing the parameter x in steps of C, and taking the mean rate of change of the performance evaluation index F as the sensitivity coefficient S:

wherein Y₀ is the performance evaluation index F corresponding to the initial value of the parameter x; Y_g and Y_(g+1) are the values of the performance evaluation index F after the g-th and the (g+1)-th perturbation of the parameter x; P_g and P_(g+1) are respectively the rates of change of the parameter value relative to its initial value after the g-th and the (g+1)-th perturbation; and n is the number of runs;

according to the sensitivity grading standard, determining the parameters graded as sensitive or highly sensitive as the factors having a larger influence on the performance of the semiconductor packaging and testing production line; the sensitivity grading standard, based on the absolute value of the sensitivity coefficient, is: insensitive for 0.00 ≤ |S| < 0.05, moderately sensitive for 0.05 ≤ |S| < 0.20, sensitive for 0.20 ≤ |S| < 1.00, and highly sensitive for |S| ≥ 1.00;
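Step 3.1 can be sketched as follows. The formula for S is not reproduced in this text, so the sketch assumes the commonly used modified Morris form, in which S is the mean over successive perturbations of the relative change in F divided by the change in the parameter's relative rate. That form, the function names `morris_sensitivity` and `grade`, and the sample values are assumptions; only the grading thresholds come from the claim.

```python
def morris_sensitivity(y, p):
    """Mean rate of change of the performance index F under stepwise perturbation.

    ASSUMPTION: S = mean over g of ((Y_{g+1} - Y_g) / Y_0) / (P_{g+1} - P_g).
    y[g]: index F after the g-th perturbation, with y[0] = Y_0 (initial value)
    p[g]: change of the parameter relative to its initial value, as a fraction
    """
    if len(y) != len(p) or len(y) < 2:
        raise ValueError("y and p need equal length of at least 2")
    y0 = y[0]
    terms = [
        ((y[g + 1] - y[g]) / y0) / (p[g + 1] - p[g])
        for g in range(len(y) - 1)
    ]
    return sum(terms) / len(terms)


def grade(s):
    """Map a sensitivity coefficient to the grading standard of step 3.1."""
    a = abs(s)
    if a < 0.05:
        return "insensitive"
    if a < 0.20:
        return "moderately sensitive"
    if a < 1.00:
        return "sensitive"
    return "highly sensitive"


# Hypothetical perturbation results: F rises by 0.1 for every +10 % step in x.
s = morris_sensitivity(y=[2.0, 2.1, 2.2], p=[0.0, 0.1, 0.2])
print(s, grade(s))
```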

step 3.2: carrying out quantitative sensitivity analysis by Arena simulation;

establishing a semiconductor chip packaging test series-parallel production line model in Arena software, wherein each device has independent random processing time, failure time and maintenance time;

the arrival rate of the workpieces on the production line, the processing rate of the station equipment, the mean time to failure m_f and the mean time to repair m_p respectively obey negative exponential and normal distributions; the processing batch size k, the buffer capacity b and the number of parallel devices m are fixed positive integers with b > m > 1; and the warm-up time, the total run time and the number of replications of the simulation experiment are set;

the experiments yield the curves of the overall performance of the production line, the production cycle CT, the production rate TH and the work-in-process level WIP as functions of the key factors influencing production line performance.

5. The method for controlling the performance of the chip packaging test production line based on the Q-learning reinforcement learning of claim 1, wherein the step 4 specifically comprises:

step 4.1: based on a dynamic control method combining an event-triggered strategy and a periodically triggered strategy, establishing a reinforcement-learning-based performance control model of the semiconductor chip packaging and testing production line, with the production line performance prediction model as the external environment of the reinforcement learning and a change in production line variability as the trigger condition;

step 4.2: initializing Q(s, a) for all s ∈ S and a ∈ A(s), wherein the Q value reflects the long-term return, S is the set of system states, and A(s) is the set of action strategies for the key factors obtained in the step 4.2; giving the learning rate factor α and the discount factor γ, and determining the return function r;

step 4.3: giving an initial state s, and selecting an action a in the state s according to the ε-greedy strategy; the improved way of setting the value of ε is given by the function:

wherein p is the current step number of the algorithm, and M is the total number of iteration steps of the algorithm;

step 4.4: executing the action a selected in the state s according to the ε-greedy strategy to obtain the return r and the next state s_next, with a_next denoting the next action, and updating the Q value:

s＝s_next，a＝a_next

step 4.5: returning to step 4.4 until the system reaches a steady state, i.e. a convergence state;

step 4.6: repeating step 4.2 to step 4.5 until the learning period ends, i.e. until the number of repetitions of step 4.2 to step 4.5 preset by the algorithm is reached, and then stopping the iteration;

step 4.7: outputting the final strategy

and obtaining the optimized values of the production line performance indexes.
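The loop of steps 4.2 to 4.7 can be sketched with tabular Q-learning on a toy problem. Several parts are assumptions and are not taken from the claims: the ε schedule (a linear decay from 1 to 0 over the M episodes, standing in for the function not reproduced above), the standard Q-learning update rule, and the five-state chain environment, which replaces the production line performance prediction model purely for illustration.

```python
import random


def epsilon(p, total):
    """HYPOTHETICAL schedule: linear decay of the exploration rate from 1 to 0."""
    return max(0.0, 1.0 - p / total)


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy chain; reaching the last state pays a return of 1."""
    rng = random.Random(seed)
    actions = (0, 1)  # 0 = move left, 1 = move right (toy actions only)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # step 4.2: initialize Q(s, a)
    for p in range(episodes):                  # step 4.6: repeat for the learning period
        s = 0                                  # step 4.3: initial state
        for _ in range(100):                   # step 4.5: bounded inner loop
            if rng.random() < epsilon(p, episodes):
                a = rng.choice(actions)        # explore
            else:
                a = max(actions, key=lambda act: q[s][act])  # exploit
            s_next = min(goal, s + 1) if a == 1 else max(0, s - 1)
            r = 1.0 if s_next == goal else 0.0
            # Step 4.4: standard Q-learning target (assumed form of the update).
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if s == goal:
                break
    # Step 4.7: greedy policy read off the learned Q table.
    policy = [max(actions, key=lambda act: q[s][act]) for s in range(goal)]
    return q, policy


q_table, policy = q_learning()
print(policy)
```

A decaying ε favours exploration early and exploitation late, which matches the stated aim of faster convergence while avoiding local optima; the particular decay shape used here is only a placeholder.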