
CN111857081A - Performance control method of chip packaging and testing production line based on Q-learning reinforcement learning - Google Patents


Info

Publication number
CN111857081A
Authority
CN
China
Prior art keywords
production line
performance
station
production
rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010797879.2A
Other languages
Chinese (zh)
Other versions
CN111857081B (en)
Inventor
李波
冯益铭
钱鑫森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202010797879.2A
Publication of CN111857081A
Application granted
Publication of CN111857081B
Current legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/418 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • G05B19/41885 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM] characterised by modeling, simulation of the manufacturing system
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/32 Operator till task planning
    • G05B2219/32339 Object oriented modeling, design, analysis, implementation, simulation language
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Manufacturing & Machinery (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • General Factory Administration (AREA)

Abstract

The invention relates to the field of performance control and optimization for semiconductor chip packaging and testing production lines, and specifically to a performance control method for a chip packaging and testing production line based on Q-learning reinforcement learning. The invention establishes a more accurate performance prediction model for the series-parallel semiconductor packaging and testing production line, and combines the Morris screening method with Arena simulation to carry out a quantitative global sensitivity analysis. This identifies the factors that most strongly affect production line performance, together with the way they act, and avoids the situation in which the equipment's Markov state space is so large that traditional mathematical model analysis is not applicable. On the basis of the performance prediction and sensitivity analysis, the invention controls the variability factors of the production line and improves the way the parameter ε is set, so that the algorithm converges faster and avoids local optima, while the performance control method gains flexibility and real-time responsiveness.

Figure 202010797879

Description

Performance control method for a chip packaging and testing production line based on Q-learning reinforcement learning

Technical field

The invention relates to the field of performance control and optimization for semiconductor chip packaging and testing production lines. It is aimed specifically at such production lines and concerns a performance control method that combines sensitivity analysis with the Q-learning reinforcement learning algorithm.

Background

The semiconductor manufacturing industry has great strategic value for the development of the national economy. To keep China's semiconductor manufacturing industry developing well, it is necessary not only to expand production scale but also to pay attention to the production efficiency of the manufacturing system and to strengthen production management and control technology. Semiconductor manufacturing systems are characterized by highly re-entrant process routes, highly complex production processes, long manufacturing cycles, large system scale and high uncertainty, which makes performance control of the production line difficult. Variability factors such as buffer capacity, sudden equipment failure, preventive equipment maintenance and product rework also strongly affect the production performance of the manufacturing system, reducing production efficiency, lengthening the production cycle and disrupting the normal execution of production plans.

There is currently little research on intelligent, comprehensive and dynamic control of production line performance. Most of it is limited to a single aspect of production line variability and does not examine the many variability factors on the line as a whole. The performance prediction models for series-parallel semiconductor production lines established in current research deviate from actual production and lack accuracy. Traditional performance control and optimization methods have difficulty responding in real time to changes in the variability factors of the production line and are not flexible enough.

Summary of the invention

To address the shortcomings of existing performance control models and strategies for semiconductor chip packaging and testing production lines, the invention proposes a performance control method for a chip packaging and testing production line based on Q-learning reinforcement learning. The method targets existing problems such as slow response to variability factors, incomplete consideration of variability factors and conflicting control strategies, and combines sensitivity analysis with the Q-learning reinforcement learning algorithm to control the manufacturing performance of the production line intelligently.

A performance control method for a chip packaging and testing production line based on Q-learning reinforcement learning comprises the following steps:

Step 1: Build an abstract model of the series-parallel semiconductor chip packaging and testing production line.

Step 2: Based on the abstract model built in Step 1, establish a prediction model for the performance of the series-parallel semiconductor chip packaging and testing production line.

Step 3: Based on the abstract model built in Step 1, use qualitative analysis with the Morris screening method and quantitative analysis with Arena simulation to obtain the mechanism by which the key variability factors affect production line performance.

Step 4: Based on the performance prediction model established in Step 2 and the key variability analysis obtained in Step 3, establish a performance control model based on the Q-learning reinforcement learning algorithm, and solve it iteratively with the optimal production line benefit index as the control objective to obtain the globally optimal performance control strategy.

Step 1 is specifically as follows:

Abstraction of the semiconductor chip packaging and testing production line model: the back-end process of the semiconductor manufacturing line, that is, the chip packaging and testing production line, is taken as the object of study. Assuming finite buffers between stations and a first-come, first-served queueing rule, the line is abstracted as a multi-station series-parallel queueing production line model with re-entry (rework).

Step 2 is specifically as follows:

Step 2.1: Variability calculation. Calculate the arrival variability $c_a$ and the processing time variability $c_e$.

Step 2.2: Determine the basic performance prediction indicators.

The average queueing time of a workpiece, $CT_q$, and the effective processing time, $t_e$, give the average time spent at the station, $CT$ (the production cycle time). From this the average work-in-process level at the station, $WIP$, is calculated. The workpiece production rate $TH$, the production cycle time $CT$ and the work-in-process level $WIP$ are taken as the basic indicators for production line performance prediction.

$CT = CT_q + t_e$

$WIP = CT \times TH$
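The two relations above can be checked with a few lines of Python. This is only an illustrative sketch: the function name and the sample numbers are hypothetical, and time and rate are assumed to use consistent units.

```python
from typing import Tuple


def station_metrics(ct_q: float, t_e: float, th: float) -> Tuple[float, float]:
    """Return (CT, WIP) for a single station.

    ct_q: average queueing time at the station
    t_e:  effective processing time
    th:   production rate (workpieces per unit time)
    """
    ct = ct_q + t_e   # CT = CT_q + t_e
    wip = ct * th     # WIP = CT x TH
    return ct, wip


# Hypothetical numbers: 1.5 h in queue, 0.5 h processing, 4 workpieces per hour.
ct, wip = station_metrics(ct_q=1.5, t_e=0.5, th=4.0)
print(ct, wip)
```

The second relation is Little's law, so it holds for long-run averages rather than for any single workpiece.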

Step 2.3: Establish the production line performance prediction model.

Step 2.3.1: Calculate the queueing time of product $j$ at station $i$:

Figure BDA0002626324590000021

where $c_a^{ij}$ and $c_e^{ij}$ are the arrival variability and processing time variability of product $j$ at station $i$, $u_{ij}$ is the utilization of station $i$, $m_{ij}$ is the number of parallel machines at station $i$, and $t_e^{ij}$ is the effective processing time of product $j$ at station $i$.
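The queueing-time formula of Step 2.3.1 survives only as an image reference, so its exact form cannot be confirmed from the text. The symbols it lists (arrival and processing variability, utilization, number of parallel machines, effective processing time) are the inputs of the well-known Factory Physics approximation for a station with $m$ parallel machines, so the sketch below assumes that form. The function name and the reading of $c_a$ and $c_e$ as coefficients of variation are assumptions, not taken from the patent.

```python
import math


def queue_time_ggm(c_a: float, c_e: float, u: float, m: int, t_e: float) -> float:
    """Approximate queueing time at a station with m parallel machines.

    Assumed form (Factory Physics G/G/m approximation):
        CT_q = ((c_a^2 + c_e^2) / 2) * (u^(sqrt(2(m+1)) - 1) / (m (1 - u))) * t_e
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("utilization u must satisfy 0 <= u < 1")
    if m < 1:
        raise ValueError("m must be at least 1")
    variability = (c_a ** 2 + c_e ** 2) / 2.0
    utilization = u ** (math.sqrt(2.0 * (m + 1)) - 1.0) / (m * (1.0 - u))
    return variability * utilization * t_e


# With m = 1 and c_a = c_e = 1 the expression reduces to the M/M/1 result u / (1 - u) * t_e.
print(queue_time_ggm(c_a=1.0, c_e=1.0, u=0.5, m=1, t_e=2.0))
```

Under this assumed form, queueing time grows without bound as utilization approaches 1, which is why the bottleneck station dominates the line's cycle time.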

Step 2.3.2: Calculate the workpiece production rate $TH$.

Assume station $i$ has $m_{ij}$ parallel machines ($b > m > 1$), where $b$ is the capacity of the buffer in front of station $i$ and $k$ is the number of workpieces being processed at station $i$. For $0 \le k \le b$, the probability $p_0$ that no workpiece $j$ is waiting in front of station $i$ during processing ($0 < j < r$, where $r$ is the total number of products processed on the line) is:

Figure BDA0002626324590000022

The blocking probability of workpiece $j$ for a buffer of capacity $b$, whose symbol is given in Figure BDA0002626324590000023, is:

Figure BDA0002626324590000024

Let $q_{hj}$ be the defect rate of workpiece $j$ at station $h$ and $Q_{ij}$ the defect rate detected at station $i$, with $0 < h < i \le s$, where $s$ is the number of stations in the series-parallel production line. The probability $Q_{ij}$ that a defective workpiece $j$ is detected and removed at station $i$ is:

Figure BDA0002626324590000031

Here the symbol in Figure BDA0002626324590000039 denotes the set of numbers of all stations on the production line that perform defect inspection.

The production rate $TH_{ij}$ of workpiece $j$ at station $i$ is then:

Figure BDA0002626324590000032

The station with the highest utilization is the bottleneck: station $I$ is recorded as the bottleneck station of product $J$, and its production rate is written $r_b^{IJ} = \max(u_{ij})$.

Step 2.3.3: Calculate the production cycle time (logical production cycle) $CT_j$ and the work-in-process level $WIP_j$ of the production line.

Calculate the average wait-to-batch time $WTBT$ of a workpiece:

Figure BDA0002626324590000033

where $r_a$ is the rate at which workpieces arrive at the station and $k_{ij}$ is the processing batch size of product $j$ at station $i$. In this case the relation in Figure BDA0002626324590000034 holds, which gives Figure BDA0002626324590000035, and the formula for $CT_q^{ij}$ is rewritten as:

Figure BDA0002626324590000036

Calculate the production cycle time $CT_j$ and the work-in-process level $WIP_j$ of product $j$ at station $i$:

Figure BDA0002626324590000037

Figure BDA0002626324590000038

This yields the production cycle time (logical production cycle) $CT_j$ and the work-in-process level $WIP_j$ of product $j$ over the entire series-parallel production line:

Figure BDA0002626324590000041

Figure BDA0002626324590000042

Step 2.4: Evaluate the performance of the production line using the performance prediction model.

Step 2.4.1: Calculate the production line performance index $F$.

As shown in Figure 3, the WIP-CT and WIP-TH curves for the best case, the worst case and the practical worst case of the production line serve as benchmarks that delimit the "good region" and the "bad region" of the performance quadrant, forming the performance evaluation chart of the production line.

The ratio of the distance of the actual performance point to the distance between the best-case and practical-worst-case benchmarks is taken as the performance evaluation index, denoted $F$:

Figure BDA0002626324590000043

where $w$ is the given actual work-in-process level, $t$ is the actual production cycle time, and $T_0$ is the theoretical processing time of the production line, here $T_0 = CT$. $r_b$ is the bottleneck rate of the production line, here $r_b = TH_{ij}$ if and only if $u_{ij} = u_{max}$.

Step 2.4.2: Calculate the production line benefit index $Bf$.

Taking production cost into account, the performance index $F$ is rewritten as the benefit index $Bf$:

$Bf = C \cdot F$

Figure BDA0002626324590000044

where $C$ is the cost factor, $c_1$ is the unit equipment cost, $c_2$ is the unit buffer capacity cost, $c_3$ is the remaining fixed cost, $m_1$ and $b_1$ are the current number of parallel machines and buffer capacity, and $m_0$ and $b_0$ are the initial number of parallel machines and buffer capacity.

Step 3 is specifically as follows:

Step 3.1: Qualitative sensitivity analysis with the Morris screening method.

Select a random parameter $x$ in the production line performance prediction model and preset a fixed step size $C$ and a maximum variation range $M$. Perturb $x$ in steps of $C$ and take the average rate of change of the performance evaluation index $F$ as the sensitivity coefficient $S$:

Figure BDA0002626324590000045

where $Y_0$ is the performance evaluation index $F$ at the initial value of parameter $x$; $Y_g$ and $Y_{g+1}$ are the values of $F$ after the $g$-th and $(g+1)$-th perturbation of the parameter; $P_g$ and $P_{g+1}$ are the rates of change of the parameter value relative to its initial value after the $g$-th and $(g+1)$-th perturbation; and $n$ is the number of runs.
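The formula for $S$ is available only as an image reference. A minimal sketch, assuming the commonly used modified Morris form in which $S$ is the mean, over successive perturbations, of the relative change in $F$ divided by the change in the parameter's relative deviation. The function name, the choice to express $P$ as a fraction rather than a percentage, and averaging over the number of increments are all assumptions.

```python
from typing import Sequence


def morris_sensitivity(y0: float, ys: Sequence[float], ps: Sequence[float]) -> float:
    """Average relative change of the index F per unit relative change of the parameter.

    y0: index F at the parameter's initial value (Y_0)
    ys: index F after each perturbation (Y_g)
    ps: fractional change of the parameter relative to its initial value (P_g)
    """
    if len(ys) != len(ps) or len(ys) < 2:
        raise ValueError("ys and ps need the same length, at least 2")
    terms = [
        ((ys[g + 1] - ys[g]) / y0) / (ps[g + 1] - ps[g])
        for g in range(len(ys) - 1)
    ]
    return sum(terms) / len(terms)


# Hypothetical example: F responds linearly, F = 1 + 0.5 * p, so S should be about 0.5.
print(morris_sensitivity(1.0, [0.95, 1.0, 1.05], [-0.1, 0.0, 0.1]))
```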

According to the sensitivity grading standard in Table 1, parameters whose coefficients fall in the "sensitive" or "highly sensitive" grades are identified as the factors with the greatest influence on the performance of the semiconductor packaging and testing production line.

Table 1. Sensitivity grading standard

Absolute value of sensitivity coefficient    Sensitivity grade
0.00 ≤ |S| < 0.05                            Insensitive
0.05 ≤ |S| < 0.20                            Moderately sensitive
0.20 ≤ |S| < 1.00                            Sensitive
|S| ≥ 1.00                                   Highly sensitive
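Table 1 maps directly onto a small lookup. The sketch below is illustrative only: the function names are hypothetical, and the grade labels are English renderings of the table's four grades.

```python
def classify_sensitivity(s: float) -> str:
    """Grade a sensitivity coefficient S by its absolute value, following Table 1."""
    a = abs(s)
    if a < 0.05:
        return "insensitive"
    if a < 0.20:
        return "moderately sensitive"
    if a < 1.00:
        return "sensitive"
    return "highly sensitive"


def is_key_factor(s: float) -> bool:
    """Parameters in the two upper grades (|S| >= 0.20) are treated as key factors."""
    return abs(s) >= 0.20


print(classify_sensitivity(-0.3), is_key_factor(-0.3))
```

Using the absolute value means a parameter that strongly degrades performance is graded the same way as one that strongly improves it.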

Step 3.2: Quantitative sensitivity analysis with Arena simulation.

A model of the series-parallel semiconductor chip packaging and testing production line is built in the Arena software. Each machine has its own independent random processing time, failure time and repair time.

The workpiece arrival rate on the line, the processing rate of the station equipment, the mean time to failure $m_f$ and the mean time to repair $m_p$ are set to follow negative exponential and normal distributions respectively. The processing batch size $k$, the buffer capacity $b$ and the number of parallel machines $m$ are fixed positive integers with $b > m > 1$. The warm-up time, total run time and number of replications of the simulation experiment are also set.

The experiments yield curves showing how the overall performance of the production line, the production cycle time $CT$, the production rate $TH$ and the work-in-process level $WIP$ vary with the key factors that affect production line performance.

Step 4 is specifically as follows:

Step 4.1: Take the production line performance prediction model as the external environment for reinforcement learning and changes in production line variability as the trigger condition. Using a dynamic control method that combines an event-triggered strategy with a periodically triggered strategy, establish the reinforcement-learning-based performance control model for the semiconductor chip packaging and testing production line shown in Figure 5.

Step 4.2: Initialize $Q(s, a)$ with the state condition shown in Figure BDA0002626324590000051 and $a \in A(s)$, where the Q value reflects the long-term reward, $S$ is the set of system states, and $A(s)$ is the set of action policies for the key factors obtained in Step 4.2. Given the learning rate factor $\alpha$ and the discount factor $\gamma$, determine the reward function $r$.

Step 4.3: Given an initial state $s$, select an action $a$ in state $s$ according to the ε-greedy policy. The way ε is set is improved by defining it as the function:

Figure BDA0002626324590000052

where $p$ is the number of steps the algorithm has executed so far and $M$ is the total number of iteration steps, so that the value of ε gradually decreases from its initial value of 0.2 as the number of executed steps increases.
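The function that defines ε is available only as an image reference. The text states that ε depends on the current step $p$ and the total number of steps $M$ and falls gradually from 0.2. The sketch below uses a linear decay purely as a hypothetical stand-in with those properties; the patent's actual function may differ. The function names and the dictionary representation of Q values are also assumptions.

```python
import random
from typing import Dict, Hashable


def epsilon_schedule(p: int, total_steps: int, eps0: float = 0.2) -> float:
    """Hypothetical linear decay: eps0 at p = 0, falling to 0 at p = total_steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(p / total_steps, 0.0), 1.0)
    return eps0 * (1.0 - frac)


def select_action(q_row: Dict[Hashable, float], eps: float, rng: random.Random) -> Hashable:
    """Epsilon-greedy choice over the actions available in the current state."""
    if rng.random() < eps:
        return rng.choice(list(q_row))      # explore: uniformly random action
    return max(q_row, key=q_row.get)        # exploit: action with the highest Q value


rng = random.Random(0)
eps = epsilon_schedule(p=50, total_steps=100)
print(eps, select_action({"add_machine": 0.4, "keep": 0.1}, eps, rng))
```

A decaying ε explores widely in early steps and exploits the learned Q values later, which is the behaviour the text credits for faster convergence.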

Step 4.4: Select an action $a$ in state $s$ according to the ε-greedy policy, where $b$ is the selection index of $a$. Obtain the reward $r$ and the next state $s_{next}$, with $a_{next}$ denoting the next action, and update the Q value:

Figure BDA0002626324590000061

$s = s_{next}$, $a = a_{next}$
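The update formula is referenced only as an image. Assuming it is the standard tabular Q-learning rule, $Q(s,a) \leftarrow Q(s,a) + \alpha \, [\, r + \gamma \max_{a'} Q(s_{next}, a') - Q(s,a) \,]$, a single update can be sketched as follows. The function name, the dictionary-based table and the example states and actions are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, s: Hashable, a: Hashable, r: float, s_next: Hashable,
             next_actions: Iterable[Hashable], alpha: float, gamma: float) -> float:
    """Apply one tabular Q-learning update in place and return the new Q(s, a)."""
    # Value of the best action in the next state (0.0 if no actions are available).
    best_next = max((q[(s_next, a2)] for a2 in next_actions), default=0.0)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q[(s, a)]


q: QTable = defaultdict(float)      # Q(s, a) initialised to 0 for every pair
q[("s1", "x")] = 1.0
new_value = q_update(q, "s0", "x", r=1.0, s_next="s1",
                     next_actions=["x", "y"], alpha=0.5, gamma=0.9)
print(new_value)
```

The learning rate $\alpha$ controls how far each update moves toward the new estimate, and $\gamma$ weights future reward against the immediate reward $r$.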

Step 4.5: Return to Step 4.4 until the system approaches a steady state, that is, a converged state.

Step 4.6: Repeat Steps 4.2 to 4.5 until the learning period (the preset number of repetitions of Steps 4.2 to 4.5) ends, then stop iterating.

Step 4.7: Output the final policy shown in Figure BDA0002626324590000062 and obtain the resulting optimization of the production line performance indices.
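Steps 4.2 to 4.7 can be put together in a toy training loop. Everything specific to the environment here is a stand-in and not the patent's model: the state is a hypothetical number of parallel machines, the actions add or remove one machine, the reward is an invented function that peaks at three machines, and ε follows an assumed linear decay from 0.2.

```python
import random
from collections import defaultdict

STATES = (1, 2, 3, 4)      # hypothetical state: number of parallel machines
ACTIONS = (-1, 0, 1)       # remove one machine, keep, add one machine


def step(state, action):
    """Toy stand-in for the prediction model: reward is highest at 3 machines."""
    nxt = min(max(state + action, STATES[0]), STATES[-1])
    return nxt, float(-((nxt - 3) ** 2))


def train(episodes=200, steps_per_episode=10, alpha=0.5, gamma=0.9,
          eps0=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)                       # Step 4.2: Q(s, a) = 0
    total = episodes * steps_per_episode
    p = 0                                        # steps executed so far
    for ep in range(episodes):                   # Step 4.6: learning period
        s = STATES[ep % len(STATES)]             # Step 4.3: starting state
        for _ in range(steps_per_episode):       # Steps 4.4 and 4.5
            eps = eps0 * (1.0 - p / total)       # assumed linear decay
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(s, x)])
            s_next, r = step(s, a)
            best_next = max(q[(s_next, x)] for x in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
            p += 1
    # Step 4.7: greedy policy with respect to the learned Q values
    return {s: max(ACTIONS, key=lambda x: q[(s, x)]) for s in STATES}


policy = train()
print(policy)
```

In a real application the `step` function would be replaced by a call to the performance prediction model of Step 2, with the benefit index $Bf$ supplying the reward.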

The invention establishes a more accurate performance prediction model for the series-parallel semiconductor packaging and testing production line, and combines the Morris screening method with Arena simulation to carry out a quantitative global sensitivity analysis. This identifies the factors that most strongly affect production line performance, together with the way they act, and avoids the situation in which the equipment's Markov state space is so large that traditional mathematical model analysis is not applicable. The invention proposes a production line performance control model based on the Q-learning algorithm that controls the variability factors of the production line on the basis of performance prediction and sensitivity analysis, and improves the way the parameter ε is set, so that the algorithm converges faster and avoids local optima, while the performance control method gains flexibility and real-time responsiveness.

Description of drawings

Fig. 1 is the flow chart of the invention;

Fig. 2 is the abstract model of the semiconductor chip packaging and testing production line;

Fig. 3 is a diagram of the existing Factory Physics three-benchmark performance evaluation method;

Fig. 4 is a schematic diagram of the logical structure of the production line simulation model;

Fig. 5 is the reinforcement-learning-based production line performance control model of the embodiment;

Fig. 6 shows how production line performance varies with the variabilities $c_a$ and $c_e$;

Fig. 7 shows the change in production line performance indices before and after performance control at different levels of variability CV1;

Fig. 8 shows the change in production line performance indices before and after performance control at different levels of variability CV2.

Detailed description

The invention is described in further detail below with reference to the drawings and an embodiment. The embodiment is implemented on the premise of the technical solution of the invention, and a detailed implementation and specific operating procedure are given (Fig. 1), but the scope of protection of the invention is not limited to the embodiment below.

The embodiment can be divided into the following main steps:

Step 1: Abstraction of the semiconductor chip packaging and testing production line model: the chip packaging and testing production line is taken as the object of study. Assuming buffers of finite size between stations and a first-come, first-served queueing rule, the line is abstracted as a multi-station series-parallel queueing production line model with re-entry (rework) (Fig. 2).

Step 2:

Step 2.1: Variability calculation.

Calculate the arrival variability $c_a$ and the processing time variability $c_e$.

Step 2.2: Determine the basic performance prediction indicators.

The average queueing time of a workpiece, $CT_q$, and the effective processing time, $t_e$, give the average time spent at the station, $CT$ (the production cycle time). From this the average work-in-process level at the station, $WIP$, is calculated. The workpiece production rate $TH$, the production cycle time $CT$ and the work-in-process level $WIP$ are taken as the basic indicators for production line performance prediction.

$CT = CT_q + t_e$

$WIP = CT \times TH$

Step 2.3: Establish the production line performance prediction model.

Step 2.3.1: Calculate the queueing time of product $j$ at station $i$:

Figure BDA0002626324590000071

where $c_a^{ij}$ and $c_e^{ij}$ are the arrival variability and processing time variability of product $j$ at station $i$, $u_{ij}$ is the utilization of station $i$, $m_{ij}$ is the number of parallel machines at station $i$, and $t_e^{ij}$ is the effective processing time of product $j$ at station $i$.

Step 2.3.2: Calculate the workpiece production rate $TH$.

Assume station $i$ has $m_{ij}$ parallel machines ($b > m > 1$), where $b$ is the capacity of the buffer in front of station $i$ and $k$ is the number of workpieces being processed at station $i$. For $0 \le k \le b$, the probability $p_0$ that no workpiece $j$ is waiting in front of station $i$ during processing ($0 < j < r$, where $r$ is the total number of products processed on the line) is:

Figure BDA0002626324590000072

The loss rate of workpiece $j$ at station $i$, whose symbol is given in Figure BDA0002626324590000073, is:

Figure BDA0002626324590000074

Let $q_{hj}$ be the defect rate of workpiece $j$ at station $h$ and $Q_{ij}$ the defect rate detected at station $i$, with $0 < h < i \le s$, where $s$ is the number of stations in the series-parallel production line. The probability $Q_{ij}$ that a defective workpiece $j$ is detected and removed at station $i$ is:

Figure BDA0002626324590000081

Here the symbol in Figure BDA0002626324590000082 denotes the set of numbers of all stations on the production line that perform defect inspection.

The production rate $TH_{ij}$ of workpiece $j$ at station $i$ is then:

Figure BDA0002626324590000083

The production rate of the bottleneck station $I$ of product $J$ is written $r_b^{IJ} = \max(u_{ij})$.
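Locating the bottleneck station amounts to finding the station with the highest utilization for a given product. A minimal sketch, with a hypothetical function name and made-up utilization values:

```python
from typing import Dict, Tuple


def bottleneck_station(utilization: Dict[int, float]) -> Tuple[int, float]:
    """Return (station number, utilization) of the most utilized station for one product."""
    if not utilization:
        raise ValueError("utilization must not be empty")
    station = max(utilization, key=utilization.get)
    return station, utilization[station]


# Hypothetical utilizations u_ij of product j at stations 1 to 3.
print(bottleneck_station({1: 0.62, 2: 0.91, 3: 0.75}))
```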

Step 2.3.3: Calculate the production cycle time (logical production cycle) $CT_j$ and the work-in-process level $WIP_j$ of the production line.

Calculate the average wait-to-batch time $WTBT$ of a workpiece:

Figure BDA0002626324590000084

where $r_a$ is the rate at which workpieces arrive at the station and $k_{ij}$ is the processing batch size of product $j$ at station $i$. In this case the relation in Figure BDA0002626324590000085 holds, which gives Figure BDA0002626324590000086, and the formula for $CT_q^{ij}$ is rewritten as:

Figure BDA0002626324590000087

Calculate the production cycle time $CT_j$ and the work-in-process level $WIP_j$ of product $j$ at station $i$:

Figure BDA0002626324590000088

Figure BDA0002626324590000089

This yields the production cycle time (logical production cycle) $CT_j$ and the work-in-process level $WIP_j$ of product $j$ over the entire series-parallel production line:

Figure BDA00026263245900000810

Figure BDA00026263245900000811
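The line-level formulas are available only as image references. One plausible reading, shown here strictly as an assumption, is that the line's cycle time and work-in-process level for product $j$ are the sums of the station-level values, with each station's WIP obtained from $WIP = CT \times TH$. The function name and the sample values are hypothetical.

```python
from typing import Iterable, Tuple


def line_metrics(stations: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Aggregate station-level values for one product into line-level (CT_j, WIP_j).

    stations: one (CT_ij, TH_ij) pair per station visited by product j.
    Assumes line-level values are plain sums over stations.
    """
    ct_total = 0.0
    wip_total = 0.0
    for ct, th in stations:
        ct_total += ct            # station cycle times add along the route
        wip_total += ct * th      # station WIP from WIP = CT x TH
    return ct_total, wip_total


# Hypothetical two-station route, both stations running at 4 workpieces per hour.
print(line_metrics([(2.0, 4.0), (1.0, 4.0)]))
```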

Step 2.4: Evaluate the performance of the production line performance prediction model.

Step 2.4.1: Calculate the production line performance index F.

As shown in Figure 3, the WIP-CT and WIP-TH curves of the production line in the best case, the worst case and the practical worst case serve as benchmarks that delimit the "good region" and the "bad region" of the performance quadrant, forming the performance evaluation chart of the production line.

The distance of the actual performance point divided by the distance between the best-case and practical-worst-case benchmarks is taken as the performance evaluation index, denoted F:

Figure BDA0002626324590000091

where w is the given actual work-in-process level, t is the actual production cycle, and T_0 is the theoretical processing time of the production line, here T_0 = CT; r_b is the bottleneck rate of the production line, here r_b = TH_ij if and only if u_ij = u_max.
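To make the benchmark curves concrete, the sketch below computes the standard best-case, practical-worst-case and worst-case values of CT and TH for a given WIP level w, using the usual Factory Physics benchmark expressions. The function names, and in particular the normalized `position_between_benchmarks` score, are illustrative assumptions; the score is not the patent's formula for F, which is given only in the figure.

```python
def benchmarks(w: float, t0: float, rb: float) -> dict[str, float]:
    """Benchmark cycle times and throughputs at WIP level w.

    t0: theoretical (raw) processing time of the line
    rb: bottleneck rate of the line
    """
    w0 = rb * t0  # critical WIP
    return {
        "ct_best": t0 if w <= w0 else w / rb,
        "th_best": w / t0 if w <= w0 else rb,
        "ct_pwc": t0 + (w - 1) / rb,        # practical worst case
        "th_pwc": w / (w0 + w - 1) * rb,
        "ct_worst": w * t0,
        "th_worst": 1 / t0,
    }


def position_between_benchmarks(t: float, w: float, t0: float, rb: float) -> float:
    """Illustrative score: 0 on the best-case curve, 1 on the practical worst case."""
    b = benchmarks(w, t0, rb)
    span = b["ct_pwc"] - b["ct_best"]
    if span == 0:
        raise ValueError("best case and practical worst case coincide at this WIP level")
    return (t - b["ct_best"]) / span


# Hypothetical line: T0 = 10 min, rb = 0.5 pieces/min, WIP = 8, observed CT = 20 min.
b = benchmarks(8, 10.0, 0.5)
score = position_between_benchmarks(20.0, 8, 10.0, 0.5)
```

In this example the observed cycle time lies halfway between the best-case value (16 min) and the practical-worst-case value (24 min).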

Step 2.4.2: Calculate the production line benefit index Bf.

Taking production cost into account, rewrite the production line performance index F as the benefit index Bf:

Bf = C * F

Figure BDA0002626324590000092

where C is the cost factor, c_1 is the unit equipment cost, c_2 is the unit buffer capacity cost, c_3 is the remaining fixed cost, m_1 and b_1 are the current number of parallel machines and the current buffer capacity, and m_0 and b_0 are the initial number of parallel machines and the initial buffer capacity.

Step 3:

Step 3.1: Qualitative sensitivity analysis using the Morris screening method.

Select a parameter x in the production line performance prediction model and preset a fixed step size C and a maximum variation range M. Perturb x in steps of C and take the average rate of change of the performance evaluation index F as the sensitivity coefficient S:

Figure BDA0002626324590000093

where Y_0 is the performance evaluation index F at the initial value of x; Y_g and Y_{g+1} are the values of F after the g-th and (g+1)-th perturbations of x; P_g and P_{g+1} are the rates of change of the parameter value relative to its initial value after the g-th and (g+1)-th perturbations; and n is the number of runs.
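A minimal sketch of this calculation follows. It assumes the commonly used modified Morris form, in which each term is the relative change in F divided by the change in the parameter's percentage deviation, averaged over the consecutive pairs of runs. The exact normalization in the patent's figure may differ, and the function name and the percentage convention for P are assumptions.

```python
def morris_sensitivity(y: list[float], p: list[float]) -> float:
    """Average relative sensitivity of F to a parameter x.

    y[g]: value of F after the g-th perturbation (y[0] is the initial value Y_0)
    p[g]: percentage change of x relative to its initial value (p[0] == 0)
    """
    if len(y) != len(p) or len(y) < 2:
        raise ValueError("need matching sequences with at least two runs")
    pairs = len(y) - 1
    total = 0.0
    for g in range(pairs):
        rel_dy = (y[g + 1] - y[g]) / y[0]       # relative change in F
        rel_dp = (p[g + 1] - p[g]) / 100.0      # change in x as a fraction
        total += rel_dy / rel_dp
    return total / pairs


# Hypothetical runs: x raised by 10% and 20%, F rising from 1.0 to 1.1 to 1.2.
s = morris_sensitivity([1.0, 1.1, 1.2], [0.0, 10.0, 20.0])
```

Here F changes by 10% for every 10% change in x, so the coefficient is approximately 1.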

Table 1 lists the sensitivity coefficients of the performance evaluation index F with respect to different parameters, as obtained by the Morris screening method.

Table 1. Sensitivity coefficient S of index F

Parameter   Unit         Meaning                               Sensitivity coefficient S
u           %            Utilization                           1.242
r_0         pieces/min   Feed rate                             -0.163
r_a         pieces/min   Production rate                       0.622
k           pieces       Processing batch size                 0.478
c_a         /            Workpiece arrival time variability    0.350
c_e         /            Processing variability                0.457
m           machines     Number of parallel machines           -1.134
A           %            Equipment availability                -0.104
b           pieces       Buffer capacity                       0.581
Q           %            Workpiece defect rate                 -0.029

According to the sensitivity grading standard in Table 2 and the relationships among the parameters, the number of parallel machines m, the processing batch size k, the workpiece arrival time variability c_a, the processing variability c_e and the buffer capacity b are identified as the factors with a relatively large influence on the performance of the semiconductor packaging and testing production line.

Table 2. Sensitivity grading standard

Absolute value of sensitivity coefficient   Sensitivity grade
0.00 ≤ |S| < 0.05                           Insensitive
0.05 ≤ |S| < 0.20                           Moderately sensitive
0.20 ≤ |S| < 1.00                           Sensitive
|S| ≥ 1.00                                  Highly sensitive
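The grading can be applied mechanically to the coefficients of Table 1, as in the sketch below. The function name and the grade labels are illustrative choices; the thresholds and coefficient values are taken from the two tables.

```python
def classify_sensitivity(s: float) -> str:
    """Grade a sensitivity coefficient by its absolute value (Table 2 thresholds)."""
    a = abs(s)
    if a < 0.05:
        return "insensitive"
    if a < 0.20:
        return "moderately sensitive"
    if a < 1.00:
        return "sensitive"
    return "highly sensitive"


# Coefficients from Table 1.
TABLE_1 = {
    "u": 1.242, "r0": -0.163, "ra": 0.622, "k": 0.478, "ca": 0.350,
    "ce": 0.457, "m": -1.134, "A": -0.104, "b": 0.581, "Q": -0.029,
}

grades = {name: classify_sensitivity(s) for name, s in TABLE_1.items()}
```

All five factors named in the text (m, k, c_a, c_e, b) grade as sensitive or highly sensitive. The parameters u and r_a do as well; the text narrows the list further by considering the relationships among the parameters.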

Step 3.2: Quantitative sensitivity analysis using Arena simulation.

A model of the series-parallel production line for semiconductor chip packaging and testing is built in the Arena software, as shown in Figure 4. Each machine has its own independent random processing time, failure time and repair time.

The workpiece arrival rate on the line, the processing rate of the station equipment, the mean time to failure m_f and the mean time to repair m_p respectively follow negative exponential and normal distributions. The processing batch size k, the buffer capacity b and the number of parallel machines m are fixed positive integers with b > m > 1. The warm-up time of the simulation is set to 600 minutes and the total run time to 1200 minutes, and the experiment is repeated 3 times.

The experiments yield curves of the overall line performance, the production cycle CT, the production rate TH and the work-in-process level WIP as functions of the key factors affecting line performance. Figure 6 shows how line performance varies with the arrival time variability c_a and the processing variability c_e.

Step 4:

Step 4.1: With the production line performance prediction model as the external environment for reinforcement learning and changes in line variability as the trigger condition, use a dynamic control method that combines an event-triggered strategy with a periodically triggered strategy to establish the reinforcement-learning-based performance control model of the semiconductor chip packaging and testing production line shown in Figure 5.

Step 4.2: Initialize Q(s, a),

Figure BDA0002626324590000113

a ∈ A(s), where the Q value reflects the long-term return and S is the set of system states, partitioned as shown in Table 3:

Table 3. Partition of the system state set S

State   Criterion
s1      0 ≤ Bf ≤ 0.1
s2      0.1 < Bf ≤ 0.2
s3      0.2 < Bf ≤ 0.3
s4      0.3 < Bf ≤ 0.4
s5      0.4 < Bf ≤ 0.5
s6      0.5 < Bf ≤ 0.6
s7      0.6 < Bf ≤ 0.7
s8      0.7 < Bf ≤ 0.8
s9      0.8 < Bf ≤ 0.9
s10     0.9 < Bf ≤ 1.0
s11     Bf ≥ 1.0
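The partition in Table 3 amounts to a lookup from the benefit index Bf to a state label, which might be sketched as follows. The function name is hypothetical, and because the table assigns Bf = 1.0 to both s10 and s11, this sketch resolves the tie in favor of s10 as an assumption.

```python
from bisect import bisect_left

# Upper bounds of states s1..s10; anything above 1.0 falls into s11.
UPPER_BOUNDS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]


def state_of(bf: float) -> str:
    """Map a benefit index Bf to its state label according to Table 3."""
    if bf < 0:
        raise ValueError("Bf is expected to be non-negative")
    # bisect_left returns the index of the first bound >= bf, which matches
    # intervals that are open on the left and closed on the right.
    return f"s{bisect_left(UPPER_BOUNDS, bf) + 1}"
```

For example, `state_of(0.45)` gives `"s5"` and `state_of(1.3)` gives `"s11"`.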

A(s) is the action set, A(s) = {a1: number of parallel machines at station i +1; a2: number of parallel machines at station i -1; a3: buffer capacity at station i +1; a4: buffer capacity at station i -1; a5: processing batch size of product j +1; a6: processing batch size of product j -1}. Set the learning rate α to 0.1 and the discount factor γ to 0.9, and define the reward function r as follows, where Bf_pre is the benefit index after the previous optimization of the production line:

Figure BDA0002626324590000111

Step 4.3: Given an initial state s, select an action a in state s according to the ε-greedy strategy.
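A minimal ε-greedy selection might look like the sketch below: with probability ε a random action is explored, otherwise the action with the highest Q value is exploited. The function name and the dictionary representation of Q values are assumptions. Claim 5 states that ε follows an improved schedule depending on the current step p and the total number of steps M; that schedule is given only in a figure and is not reproduced here, so ε is simply passed in.

```python
import random


def epsilon_greedy(q_row: dict[str, float], epsilon: float, rng=random) -> str:
    """Pick an action from q_row (action -> Q value for the current state)."""
    actions = list(q_row)
    if rng.random() < epsilon:
        return rng.choice(actions)          # explore
    return max(actions, key=q_row.get)      # exploit


# Hypothetical Q values for one state.
q_row = {"a1": 0.2, "a3": 0.7, "a5": 0.1}
greedy_choice = epsilon_greedy(q_row, epsilon=0.0, rng=random.Random(0))
```

With ε = 0 the choice is always the greedy action, here `"a3"`.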

Step 4.4: Select an action a in state s according to the ε-greedy strategy, where b is the selection index of a. Obtain the reward r and the next state s_next, with a_next denoting the next action, and update the Q value:

Figure BDA0002626324590000112

s = s_next, a = a_next
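For reference, the standard tabular Q-learning update is sketched below with the stated values α = 0.1 and γ = 0.9. It is assumed here that the update in the figure has this textbook form, Q(s, a) ← Q(s, a) + α [r + γ max Q(s_next, ·) − Q(s, a)]; the function name and the table layout are illustrative.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate from the text
GAMMA = 0.9  # discount factor from the text


def q_update(q, s, a, r, s_next, actions, alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update in place and return the new Q(s, a)."""
    best_next = max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q[(s, a)]


actions = ["a1", "a2", "a3", "a4", "a5", "a6"]
q = defaultdict(float)          # unseen (state, action) pairs start at 0
q[("s2", "a1")] = 1.0
new_value = q_update(q, "s1", "a1", r=1.0, s_next="s2", actions=actions)
```

Starting from Q(s1, a1) = 0 with reward 1 and a best next value of 1, the updated value is 0.1 × (1 + 0.9 × 1) = 0.19.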

Step 4.5: Return to Step 4.4 until the system reaches a steady, that is, converged, state.

Step 4.6: Repeat Steps 4.2 to 4.5 until the learning period (the preset number of repetitions of Steps 4.2 to 4.5) is over, then stop iterating.

Step 4.7: Output the final policy

Figure BDA0002626324590000121

and obtain the optimized performance indexes of the production line. Figures 7 and 8 show the production line performance indexes before and after performance control under the variability levels CV1 and CV2, respectively.
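Once the Q table has converged, a policy is usually read off by taking the highest-valued action in each state. The sketch below assumes the final policy in the figure has this greedy form; the function name and the table layout are hypothetical.

```python
def greedy_policy(q: dict, states: list[str], actions: list[str]) -> dict[str, str]:
    """For each state, pick the action with the largest Q value (missing entries count as 0)."""
    return {s: max(actions, key=lambda a: q.get((s, a), 0.0)) for s in states}


# Hypothetical learned values.
q_table = {("s1", "a1"): 0.3, ("s1", "a3"): 0.5, ("s2", "a2"): 0.1}
policy = greedy_policy(q_table, ["s1", "s2"], ["a1", "a2", "a3"])
```

For these values the policy selects a3 in state s1 and a2 in state s2.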

In summary, the invention establishes a more accurate performance prediction model for the series-parallel production line of semiconductor packaging and testing. It combines the Morris screening method with Arena simulation to carry out a global sensitivity analysis and identifies the factors that most affect production line performance and the way they act, which avoids the situation in which the Markov state space of the equipment is too large for traditional mathematical model analysis. It also improves the way the parameter ε is chosen, so that the algorithm converges faster and avoids local optima while offering better flexibility and real-time behavior.

Claims (5)

1. A performance control method for a chip packaging and testing production line based on Q-learning reinforcement learning, comprising the following steps:
Step 1: construct an abstract model of the series-parallel production line for semiconductor chip packaging and testing.
Step 2: based on the abstract model constructed in Step 1, establish a performance prediction model of the series-parallel production line for semiconductor chip packaging and testing.
Step 3: based on the abstract model constructed in Step 1, obtain the mechanism by which the key variability factors affect production line performance through qualitative analysis with the Morris screening method and quantitative analysis with Arena simulation.
Step 4: based on the prediction model established in Step 2 and the key variability analysis obtained in Step 3, establish a performance control model and solve it iteratively with the optimal production line benefit index as the performance control objective, obtaining a globally optimal performance control strategy.
2. The performance control method for a chip packaging and testing production line based on Q-learning reinforcement learning according to claim 1, wherein:
Step 1 specifically comprises: taking the back-end process of a semiconductor manufacturing line, namely the chip packaging and testing production line, as the object of study, assuming that finite buffers exist between stations and that the queuing discipline is first come, first served, and abstracting the line into a multi-station series-parallel queuing production line model with re-entry.
3. The method according to claim 1, wherein Step 2 specifically comprises:
Step 2.1: calculate variability: calculate the arrival variability c_a and the processing variability c_e;
Step 2.2: determine the basic performance prediction indexes;
from the mean time a workpiece spends in queue CT_q and the effective processing time t_e, obtain the mean time spent at a station, namely the production cycle CT; then calculate the mean work-in-process level WIP at the station, and take the workpiece production rate TH, the production cycle CT and the work-in-process level WIP as the basic indexes for production line performance prediction;
CT = CT_q + t_e
WIP = CT × TH
Step 2.3: establish the production line performance prediction model;
Step 2.3.1: calculate the queuing time of product j at station i:
Figure FDA0002626324580000011
where c_a^ij and c_e^ij are the arrival variability and the processing time variability of product j at station i, u_ij is the utilization of station i, m_ij is the number of parallel machines at station i, and t_e^ij is the effective processing time of product j at station i;
Step 2.3.2: calculate the workpiece production rate TH;
suppose station i has m_ij parallel machines, b is the capacity of the buffer in front of station i, k is the number of workpieces being processed at station i, and b > m > 1; if 0 ≤ k ≤ b, the probability p_0 that workpiece j does not have to wait before being processed at station i is as follows, where 0 < j < r and r denotes the number of products processed together on the production line:
Figure FDA0002626324580000021
the blocking probability of workpiece j when the buffer capacity is b,
Figure FDA00026263245800000210
is:
Figure FDA0002626324580000022
let q_hj be the defect rate of workpiece j at station h and Q_ij the defect rate monitored at station i, with 0 < h < i ≤ s, where s is the number of stations in the series-parallel production line; the probability Q_ij that a defective workpiece j is detected and removed at station i is:
Figure FDA0002626324580000023
Figure FDA0002626324580000029
denotes the set of indexes of all defect inspection stations in the production line;
the production rate TH_ij of workpiece j at station i is:
Figure FDA0002626324580000024
the station with the highest utilization, denoted station I, is the bottleneck station of product J, and its production rate is denoted r_b^IJ = max(u_ij);
Step 2.3.3: calculate the production cycle CT_j and the work-in-process level WIP_j of the production line;
calculate the average wait-to-batch time WTBT of a workpiece:
Figure FDA0002626324580000025
where r_a is the rate at which workpieces arrive at the station and k_ij is the processing batch size of product j at station i; in this case
Figure FDA0002626324580000026
and therefore
Figure FDA0002626324580000027
the formula for CT_q^ij is rewritten as:
Figure FDA0002626324580000028
calculate the production cycle CT_j and the work-in-process level WIP_j of product j at station i:
Figure FDA0002626324580000031
Figure FDA0002626324580000032
thereby obtaining the production cycle CT_j and the work-in-process level WIP_j of product j over the entire series-parallel production line:
Figure FDA0002626324580000033
Figure FDA0002626324580000034
Step 2.4: evaluate the performance of the production line performance prediction model;
Step 2.4.1: calculate the production line performance index F;
the WIP-CT and WIP-TH curves of the production line in the best case, the worst case and the practical worst case serve as benchmarks that delimit the good region and the bad region of the performance quadrant, forming the performance evaluation chart of the production line;
the distance of the actual performance point divided by the distance between the best-case and practical-worst-case benchmarks is taken as the performance evaluation index, denoted F:
Figure FDA0002626324580000035
where w is the given actual work-in-process level, t is the actual production cycle, and T_0 is the theoretical processing time of the production line, here T_0 = CT; r_b is the bottleneck rate of the production line, here r_b = TH_ij if and only if u_ij = u_max;
Step 2.4.2: calculate the production line benefit index Bf;
taking production cost into account, rewrite the production line performance index F as the benefit index Bf:
Bf = C * F
Figure FDA0002626324580000036
where C is the cost factor, c_1 is the unit equipment cost, c_2 is the unit buffer capacity cost, c_3 is the remaining fixed cost, m_1 and b_1 are the current number of parallel machines and the current buffer capacity, and m_0 and b_0 are the initial number of parallel machines and the initial buffer capacity.
4. The performance control method for a chip packaging and testing production line based on Q-learning reinforcement learning according to claim 1, wherein Step 3 specifically comprises:
Step 3.1: qualitative sensitivity analysis using the Morris screening method;
select a parameter x in the production line performance prediction model, preset a fixed step size C and a maximum variation range M, perturb x in steps of C, and take the average rate of change of the performance evaluation index F as the sensitivity coefficient S:
Figure FDA0002626324580000041
where Y_0 is the performance evaluation index F at the initial value of x; Y_g and Y_{g+1} are the values of F after the g-th and (g+1)-th perturbations of x; P_g and P_{g+1} are the rates of change of the parameter value relative to its initial value after the g-th and (g+1)-th perturbations; and n is the number of runs;
according to the sensitivity grading standard, parameters graded as sensitive or highly sensitive are identified as the factors with a relatively large influence on the performance of the semiconductor packaging and testing production line; the grading by absolute value of the sensitivity coefficient is: insensitive for 0.00 ≤ |S| < 0.05, moderately sensitive for 0.05 ≤ |S| < 0.20, sensitive for 0.20 ≤ |S| < 1.00, and highly sensitive for |S| ≥ 1.00;
Step 3.2: quantitative sensitivity analysis using Arena simulation;
a model of the series-parallel production line for semiconductor chip packaging and testing is built in the Arena software, in which each machine has its own independent random processing time, failure time and repair time;
the workpiece arrival rate on the line, the processing rate of the station equipment, the mean time to failure m_f and the mean time to repair m_p respectively follow negative exponential and normal distributions; the processing batch size k, the buffer capacity b and the number of parallel machines m are fixed positive integers with b > m > 1; the warm-up time, the total run time and the number of replications of the simulation experiment are set;
the experiments yield curves of the overall line performance, the production cycle CT, the production rate TH and the work-in-process level WIP as functions of the key factors affecting production line performance.
5. The performance control method for a chip packaging and testing production line based on Q-learning reinforcement learning according to claim 1, wherein Step 4 specifically comprises:
Step 4.1: with the production line performance prediction model as the external environment for reinforcement learning and changes in line variability as the trigger condition, use a dynamic control method that combines an event-triggered strategy with a periodically triggered strategy to establish a reinforcement-learning-based performance control model of the semiconductor chip packaging and testing production line;
Step 4.2: initialize Q(s, a),
Figure FDA0002626324580000042
a ∈ A(s), where the Q value reflects the long-term return, S is the set of system states, and A(s) is the action set over the key factors obtained in step 4.2; give the learning rate α and the discount factor γ, and define the reward function r;
Step 4.3: given an initial state s, select an action a in state s according to the ε-greedy strategy; the improved way of choosing ε is the function:
Figure FDA0002626324580000051
where p is the current step count of the algorithm and M is the total number of iteration steps of the algorithm;
Step 4.4: select an action a in state s according to the ε-greedy strategy, where b is the selection index of a; obtain the reward r and the next state s_next, with a_next denoting the next action, and update the Q value:
Figure FDA0002626324580000052
s = s_next, a = a_next
Step 4.5: return to Step 4.4 until the system reaches a steady, that is, converged, state;
Step 4.6: repeat Steps 4.2 to 4.5 until the learning period, namely the preset number of repetitions of Steps 4.2 to 4.5, is over, then stop iterating;
Step 4.7: output the final policy
Figure FDA0002626324580000053
and obtain the optimized performance indexes of the production line.
CN202010797879.2A 2020-08-10 2020-08-10 Performance control method of chip packaging and testing production line based on Q-learning reinforcement learning Expired - Fee Related CN111857081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010797879.2A CN111857081B (en) 2020-08-10 2020-08-10 Performance control method of chip packaging and testing production line based on Q-learning reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010797879.2A CN111857081B (en) 2020-08-10 2020-08-10 Performance control method of chip packaging and testing production line based on Q-learning reinforcement learning

Publications (2)

Publication Number Publication Date
CN111857081A true CN111857081A (en) 2020-10-30
CN111857081B CN111857081B (en) 2023-05-05

Family

ID=72971238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010797879.2A Expired - Fee Related CN111857081B (en) 2020-08-10 2020-08-10 Performance control method of chip packaging and testing production line based on Q-learning reinforcement learning

Country Status (1)

Country Link
CN (1) CN111857081B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004127170A (en) * 2002-10-07 2004-04-22 Matsushita Electric Ind Co Ltd Production plan creation method
CN103676881A (en) * 2013-12-16 2014-03-26 北京化工大学 Dynamic bottleneck analytical method of semiconductor production line
CN108646684A (en) * 2018-05-30 2018-10-12 电子科技大学 A kind of multi-product production line production cycle prediction technique based on mobility measurement
CN109270904A (en) * 2018-10-22 2019-01-25 中车青岛四方机车车辆股份有限公司 A kind of flexible job shop batch dynamic dispatching optimization method
CN110378439A (en) * 2019-08-09 2019-10-25 重庆理工大学 Single robot path planning method based on Q-Learning algorithm
CN110517002A (en) * 2019-08-29 2019-11-29 烟台大学 Production Control Method Based on Reinforcement Learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张树林: "一种机器人搬运生产线的调度优化方法及实验平台设计" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112631216A (en) * 2020-12-11 2021-04-09 江苏晶度半导体科技有限公司 Semiconductor test packaging production line performance prediction control system based on DQN and DNN twin neural network algorithm
CN113033815A (en) * 2021-02-07 2021-06-25 广州杰赛科技股份有限公司 Intelligent valve cooperation control method, device, equipment and storage medium
CN113962470A (en) * 2021-10-29 2022-01-21 上海新科乾物联技术有限公司 Optimized scheduling method and system based on disturbance prediction
CN113962470B (en) * 2021-10-29 2022-06-24 上海新科乾物联技术有限公司 Optimized scheduling method and system based on disturbance prediction
CN115933412A (en) * 2023-01-12 2023-04-07 中国航发湖南动力机械研究所 Aero-engine control method and device based on event-triggered predictive control
CN120631674A (en) * 2025-08-12 2025-09-12 弘润半导体(苏州)有限公司 Chip packaging test production linear energy control method based on reinforcement learning

Also Published As

Publication number Publication date
CN111857081B (en) 2023-05-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20230505)