Disclosure of Invention
The present invention has been made in view of the above-described problems occurring in the prior art.
Therefore, the invention provides a reinforcement-learning-based performance control method for a chip packaging test production line, which addresses the problem that existing approaches cannot achieve rapid intervention and closed-loop regulation.
To solve the above technical problems, the invention provides the following technical solution:
In a first aspect, the invention provides a reinforcement-learning-based performance control method for a chip packaging test production line, comprising: scanning occupancy rate data of each radio frequency band to generate a joint state vector, wherein the joint state vector comprises an electrostatic grade value, a frequency band number sequence, a frequency band occupancy rate sequence and packaging parameters;
inputting the joint state vector into a pre-trained reinforcement learning strategy model, activating a causal intervention mark when a high-risk state is detected, and generating a frequency band reservation instruction, a forced disabling instruction set and a quantum regulation instruction set;
transmitting the frequency band reservation instruction, the forced disabling instruction set and the quantum regulation instruction set to a radio frequency tester and chip packaging equipment through a communication interface, allocating low-load test channels, disabling high-risk channels, adjusting temperature and pressure, and starting a chip functional test after the adjustment; and
collecting test data of the chip functional test, verifying the compliance of the test data, generating a qualified report, and optimizing the reinforcement learning strategy model using abnormal features, thereby completing performance control of the chip packaging test production line.
The joint state vector is input into the pre-trained reinforcement learning strategy model to generate an operation decision vector that controls the performance of the chip packaging test production line, and the node association strengths of a causal rule graph are updated through incremental training on packaging conflict logs;
the operation decision vector comprises the frequency band reservation instruction, the forced disabling instruction set and the quantum regulation instruction set.
As a preferred solution of the reinforcement-learning-based performance control method for a chip packaging test production line, the node association strengths of the causal rule graph are updated through incremental training on the packaging conflict log as follows: after the causal intervention mark and the packaging intervention mark are activated, the joint state vector, the operation decision vector and the quantum regulation instruction set recorded in the packaging conflict log are arranged as input data for the reinforcement learning strategy model, and the association strengths among the current electrostatic grade value node, the current frequency band occupancy rate node, the current temperature node and the current pressure node of the causal rule graph are updated.
As a preferred solution of the reinforcement-learning-based performance control method for a chip packaging test production line, the packaging intervention mark is a mark that is activated when the causal rule graph, after detecting the states of the current temperature node and the current pressure node, determines that either of these nodes has reached a high-risk packaging state;
the packaging conflict log is the data set recorded by the reinforcement learning strategy model after the packaging intervention mark is activated.
As a preferred solution of the reinforcement-learning-based performance control method for a chip packaging test production line, the causal intervention mark is a mark that is activated when the causal rule graph, after detecting the states of the current electrostatic grade value node and the current frequency band occupancy rate node, confirms that they have reached the predefined electrostatic grade threshold and the frequency band over-limit criterion, respectively.
The low-load test channel is a radio frequency test channel whose frequency band occupancy rate, as identified by the causal rule graph from the state of the frequency band occupancy rate node, is below a load threshold;
the high-risk channel is a radio frequency test channel for which the causal rule graph, after detecting the states of the current frequency band occupancy rate node and the current electrostatic grade value node, identifies that either the frequency band occupancy rate or the electrostatic grade value exceeds its limit.
As a preferred solution of the reinforcement-learning-based performance control method for a chip packaging test production line, the temperature and pressure adjustment comprises transmitting the quantum regulation instruction set to the chip packaging equipment through the communication interface, detecting the temperature and pressure state of the packaging equipment in real time with a gallium nitride quantum dot sensor array, and automatically adjusting the bonding temperature and the patch pressure of the chip packaging equipment into the standard range according to the temperature regulation instruction and the pressure regulation instruction in the quantum regulation instruction set.
As a preferred solution of the reinforcement-learning-based performance control method for a chip packaging test production line, collecting the test data of the chip functional test and verifying its compliance comprises: collecting the event counts and functional results of the chip functional test with the radio frequency tester, analyzing whether the event counts meet preset standards, verifying whether the functional results comply with JEDEC standards, and thereby determining whether the chip test is qualified.
In a second aspect, the invention provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the steps of the reinforcement-learning-based performance control method for a chip packaging test production line according to the first aspect of the invention.
In a third aspect, the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the reinforcement-learning-based performance control method for a chip packaging test production line according to the first aspect of the invention.
The beneficial effects of the invention are as follows. The reinforcement learning strategy model and the causal rule graph together provide intelligent perception of, and causal reasoning about, the operating state of the chip packaging test production line. The reinforcement learning strategy model identifies high-risk states in real time in complex and changing test environments and, following causal intervention logic, generates the frequency band reservation instruction, the forced disabling instruction set and the quantum regulation instruction set, thereby achieving dynamic scheduling and closed-loop regulation of the radio frequency tester and the chip packaging equipment. An incremental training mechanism based on packaging conflict logs continuously optimizes the decision logic and the node association strengths of the causal rule graph, which improves adaptability to practical conditions such as process fluctuation and equipment aging, enables fast response to and precise intervention in abnormal states, and improves the stability, efficiency and level of intelligence of the test process.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention; however, the invention may be practiced in ways other than those described herein, and persons skilled in the art will readily appreciate that the invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Referring to Figs. 1 to 4, an embodiment of the present invention provides a reinforcement-learning-based performance control method for a chip packaging test production line, comprising the following steps:
S1, scanning occupancy rate data of each radio frequency band to generate a joint state vector.
Deploying a quantum-entangled frequency band probe array, arranging a gallium nitride quantum dot sensor array on the surface of the production line equipment, and initializing the gallium nitride quantum dot sensor array to a quantum-entangled state;
Transmitting a swept-frequency detection signal to the quantum-entangled frequency band probe array, monitoring quantum correlation collapse events with a single-photon detector, recording the collapse timing and amplitude, and outputting a frequency band response matrix containing frequency band numbers and quantum correlation amplitudes;
Loading the frequency band response matrix into a three-dimensional coordinate space: the frequency band numbers of the matrix are assigned to a frequency band number axis, a second-scale decay factor is assigned to a time decay axis, and the quantum correlation amplitudes of the matrix are assigned to a quantum correlation intensity axis, the three-dimensional coordinate space being constructed through this axis assignment. The second-scale decay factor is an attribute of the frequency band response matrix: in the quantum sensing scenario, the signal of a correlation collapse event decays over time, and this decay characteristic, recorded on a time scale of seconds, forms the second-scale decay factor. The factor supplies the time-decay characteristic of the frequency band topological field and is integrated into the frequency band occupancy rate sequence of the joint state vector, so that the operation decision vector can control frequency band allocation and the disabling of risky channels based on dynamic time information, avoiding the judgment ambiguity that a missing time dimension would cause in subsequent steps.
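The text does not say how the second-scale decay factor is computed. As a minimal sketch, the code below assumes (purely for illustration) that each band's collapse amplitudes decay exponentially, A(t) = A0 · d^t with t in seconds, and estimates the per-second factor d with a log-linear least-squares fit. It then places each band as a point (band number, decay factor, amplitude) in the three-dimensional coordinate space. The function names, the exponential model and the use of the first sample as the amplitude coordinate are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

def per_second_decay_factor(times_s: Sequence[float], amplitudes: Sequence[float]) -> float:
    """Fit ln A = ln A0 + t * ln d by least squares and return d (assumes exponential decay)."""
    if len(times_s) != len(amplitudes) or len(times_s) < 2:
        raise ValueError("need at least two (time, amplitude) samples")
    logs = [math.log(a) for a in amplitudes]  # amplitudes must be positive
    n = len(times_s)
    t_mean = sum(times_s) / n
    l_mean = sum(logs) / n
    num = sum((t - t_mean) * (l - l_mean) for t, l in zip(times_s, logs))
    den = sum((t - t_mean) ** 2 for t in times_s)
    if den == 0:
        raise ValueError("sample times must not all be equal")
    return math.exp(num / den)  # the fitted slope is ln d

def build_band_space(
    records: Dict[int, Tuple[Sequence[float], Sequence[float]]]
) -> List[Tuple[int, float, float]]:
    """Map each band to a (band number, decay factor, amplitude) point, ascending by band number."""
    points = []
    for band in sorted(records):
        times_s, amplitudes = records[band]
        # Hypothetical choice: the first recorded amplitude is used as the intensity coordinate.
        points.append((band, per_second_decay_factor(times_s, amplitudes), amplitudes[0]))
    return points
```

For a band whose amplitude halves every second, the fitted factor is about 0.5; a band with a constant amplitude gets a factor of 1.0.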
Measuring the surface charge density of the production line equipment with an electric field sensor, which outputs a voltage signal that analog-to-digital conversion turns into an electrostatic grade value; determining a predefined weight according to JEDEC standards (the semiconductor-device specifications that define electrostatic discharge sensitivity test methods and device classifications); and superposing the electrostatic grade value on the quantum correlation intensity axis of the frequency band response matrix to generate a frequency band topological field, wherein the predefined weight is 0.1 if the electrostatic grade value is below 500 V (low risk), 0.5 if it is between 500 V and 2000 V (medium risk), and 1.0 if it is above 2000 V (high risk);
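The weight table above maps directly to a small function. The thresholds come from the text; how the weight is "superposed" on the intensity axis is not specified, so the additive form in `band_topological_field` is only one hypothetical reading, and both names are illustrative.

```python
from typing import List, Sequence

def esd_weight(esd_level_v: float) -> float:
    """Predefined weight for an electrostatic grade value in volts (thresholds from the text)."""
    if esd_level_v < 500:
        return 0.1  # low risk
    if esd_level_v <= 2000:
        return 0.5  # medium risk, 500 V to 2000 V
    return 1.0      # high risk, above 2000 V

def band_topological_field(amplitudes: Sequence[float], esd_level_v: float) -> List[float]:
    """Hypothetical superposition: add the weight to every amplitude on the intensity axis."""
    w = esd_weight(esd_level_v)
    return [a + w for a in amplitudes]
```

With an electrostatic grade value of 2500 V, amplitudes [0.25, 0.5] become [1.25, 1.5].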
Arranging the gallium nitride quantum dot sensor array on the surface of the chip packaging equipment to monitor the bonding temperature and pressure during packaging and generate packaging parameters comprising temperature and pressure; and loading the packaging parameters into a three-dimensional coordinate space, with temperature on a temperature axis, pressure on a pressure axis, and the quantum correlation amplitude of the packaging parameters on a quantum correlation intensity axis, to output a packaging topology field;
Applying a one-dimensional convolution kernel (size 3, stride 1, covering 3 frequency band numbers) to the frequency band topological field along the frequency band number axis and sliding it to extract the distribution features over frequency band numbers, generating frequency band convolution features; processing the frequency band convolution features by mean pooling (windows of 2 adjacent positions along the frequency band number axis, averaged within each window to compress the frequency band number dimension) to generate a frequency band scalar set; applying a two-dimensional convolution kernel (size 3 × 3, stride 1, covering a 3 × 3 temperature-pressure region) to the packaging topology field along the temperature axis and the pressure axis and sliding it to extract the temperature and pressure distribution features, generating temperature-pressure convolution features; and processing the temperature-pressure convolution features by mean pooling (2 × 2 adjacent windows along the temperature and pressure axes, averaging the convolution feature values within each window to compress the temperature and pressure axis dimensions) to generate the packaging parameters;
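The convolution and pooling sizes above can be sketched in plain Python. This assumes the frequency band topological field is reduced to one intensity value per band and the packaging topology field to a 2D grid over temperature × pressure; the kernel weights are not given in the text, so callers supply them. As in common deep-learning libraries, "convolution" here is computed as cross-correlation (no kernel flip), and pooling windows are assumed non-overlapping (stride equal to the window size), with any remainder dropped.

```python
from typing import List, Sequence

def conv1d(signal: Sequence[float], kernel: Sequence[float], stride: int = 1) -> List[float]:
    """Slide a 1D kernel (e.g. size 3) along the band-number axis; valid positions only."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

def mean_pool1d(values: Sequence[float], window: int = 2) -> List[float]:
    """Average non-overlapping windows of adjacent bands to compress the band dimension."""
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, window)]

def conv2d(grid: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]],
           stride: int = 1) -> List[List[float]]:
    """Slide a 2D kernel (e.g. 3 x 3) over the temperature x pressure grid; valid positions only."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(grid), len(grid[0])
    return [[sum(grid[r + a][c + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for c in range(0, cols - kw + 1, stride)]
            for r in range(0, rows - kh + 1, stride)]

def mean_pool2d(grid: Sequence[Sequence[float]], window: int = 2) -> List[List[float]]:
    """Average non-overlapping window x window blocks to compress both axes."""
    rows, cols = len(grid), len(grid[0])
    return [[sum(grid[r + a][c + b] for a in range(window) for b in range(window)) / (window * window)
             for c in range(0, cols - window + 1, window)]
            for r in range(0, rows - window + 1, window)]
```

For example, six band intensities [1, 2, 3, 4, 5, 6] with an all-ones kernel give [6, 9, 12, 15], which mean pooling compresses to [7.5, 13.5].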
Arranging the frequency band number sequence in ascending order of frequency band number, associating the frequency band scalar set with the corresponding frequency band numbers, appending the electrostatic grade value after the frequency band number sequence, the frequency band occupancy rate sequence and the packaging parameters, and outputting the joint state vector formed by the frequency band number sequence, the frequency band occupancy rate sequence, the packaging parameters and the electrostatic grade value.
The frequency band occupancy rate sequence is generated from the frequency band occupancy rates arranged in ascending order of frequency band number.
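The assembly order described above (ascending band numbers, occupancy rates in the same order, packaging parameters, then the electrostatic grade value at the end) can be sketched as a flat list. The flat-list layout and the function name are assumptions; the text fixes only the ordering.

```python
from typing import Dict, List, Sequence

def build_joint_state_vector(occupancy_by_band: Dict[int, float],
                             packaging_params: Sequence[float],
                             esd_level_v: float) -> List[float]:
    """Concatenate [band numbers | occupancy rates | packaging parameters | electrostatic grade value]."""
    band_numbers = sorted(occupancy_by_band)                  # ascending band numbers
    occupancy = [occupancy_by_band[b] for b in band_numbers]  # occupancy in the same band order
    return [*band_numbers, *occupancy, *packaging_params, esd_level_v]
```

For bands {3: 0.5, 1: 0.85, 2: 0.4}, packaging parameters [25.0, 1.0] and 2500 V, the vector is [1, 2, 3, 0.85, 0.4, 0.5, 25.0, 1.0, 2500.0].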
S2, inputting the joint state vector into the pre-trained reinforcement learning strategy model, and activating a causal intervention mark when a high-risk state is detected, to generate a frequency band reservation instruction, a forced disabling instruction set and a quantum regulation instruction set.
Constructing a causal rule graph and setting an electrostatic grade threshold and a frequency band over-limit criterion; analyzing the causal relations among the historical nodes of the causal rule graph according to historical test data (comprising electrostatic grade value records, frequency band occupancy rate records, temperature records, pressure records and test results) to generate a causal rule graph containing a time dimension, wherein the historical test data are acquired from production line operation records by the electric field sensor, the quantum-entangled frequency band probe array and the gallium nitride quantum dot sensor array;
Constructing the causal rule graph and setting the electrostatic grade threshold and the frequency band over-limit criterion proceed as follows. The electrostatic grade value, the frequency band occupancy rate sequence and the packaging parameters are first obtained from the joint state vector as input variables of the causal rule graph; the electrostatic grade value, the frequency band occupancy rate sequence, the temperature and the pressure are then defined as the current electrostatic grade value node, the current frequency band occupancy rate node, the current temperature node and the current pressure node of the causal rule graph, each current node representing a key variable. According to the predefined weights, the electrostatic grade threshold is set to 2000 V, so an electrostatic grade value of, for example, 2500 V is judged to be a high-risk state; according to the frequency band occupancy rate sequence, the frequency band over-limit criterion is set to an occupancy rate of 80%, so an occupancy rate of, for example, 85% is judged to be an over-limit state. The electrostatic grade threshold and the frequency band over-limit criterion are stored in the current electrostatic grade value node and the current frequency band occupancy rate node, respectively;
First, electrostatic grade value records, frequency band occupancy rate records, temperature records, pressure records and test results (success or failure) are extracted from the historical test data and defined as the historical electrostatic grade value node, the historical frequency band occupancy rate node, the historical temperature node, the historical pressure node and the historical test result node of the causal rule graph, each historical node representing a key variable. Correlation analysis is then performed on the historical test data: the Pearson correlation coefficient measures how strongly each of the electrostatic grade value, frequency band occupancy rate, temperature and pressure records correlates with the test results, which determines the correlation strength between each of these historical nodes and the historical test result node. Time-series analysis is also performed: by observing how these records change over time, their before-and-after dependencies are identified, which determines the temporal correlation between each historical node and the historical test result node. Based on the correlation strengths and temporal correlations, directed edges are constructed from the historical electrostatic grade value node, the historical frequency band occupancy rate node, the historical temperature node or the historical pressure node to the historical test result node, reflecting the causal effect of each historical variable on the test result. A time label is added to each directed edge recording the causal-effect delay time from the corresponding historical node to the historical test result node, which yields a causal rule graph containing a time dimension.
Finally, the causal rule graph detects the states of the current nodes (the current electrostatic grade value node, the current frequency band occupancy rate node, the current temperature node and the current pressure node) to decide whether to activate the intervention marks and whether an operation decision vector needs to be generated, and then generates the corresponding instructions; the time-dimensioned causal rule graph thus guides real-time decisions on the current nodes. Triggering of the operation decision vector is based on the electrostatic grade threshold, the frequency band over-limit criterion, the temperature threshold and the pressure threshold, and is tied to the causal rule graph through decision rules; the operation decision vector itself is output by the reinforcement learning strategy model from the joint state vector and comprises the frequency band reservation instruction, the forced disabling instruction set and the quantum regulation instruction set.
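The edge-building procedure above can be sketched as follows, under stated assumptions: the historical records are equally spaced samples, the "causal-effect delay time" is taken to be the lag (in samples) at which a variable's Pearson correlation with the test result is strongest, and an edge is kept only when the absolute correlation reaches a cut-off. The lag search, the `max_lag` and `min_strength` values and all function names are hypothetical; the text does not specify them. Multiply the lag by the sampling interval to obtain a time label in seconds.

```python
import math
from typing import Dict, List, Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant (no usable signal)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def strongest_lag(cause: Sequence[float], effect: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Find the delay (in samples) at which cause[t] best correlates with effect[t + lag]."""
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        x = cause[:len(cause) - lag]
        y = effect[lag:]
        if len(x) < 2:
            break
        r = pearson(x, y)
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r

def build_causal_edges(history: Dict[str, Sequence[float]], result_key: str = "test_result",
                       max_lag: int = 3, min_strength: float = 0.3) -> List[Tuple[str, str, float, int]]:
    """Directed edges (cause node, result node, strength, delay label) for correlated variables."""
    effect = history[result_key]
    edges = []
    for name, series in history.items():
        if name == result_key:
            continue
        lag, r = strongest_lag(series, effect, max_lag)
        if abs(r) >= min_strength:
            edges.append((name, result_key, round(r, 3), lag))
    return edges
```

In this sketch a constant variable (for example, a temperature that never changes) has no usable correlation and produces no edge, while a variable whose spikes precede failures by one sample gets an edge labelled with a delay of 1.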
Inputting the joint state vector into the pre-trained reinforcement learning strategy model to generate the operation decision vector specifically comprises the following:
If the current electrostatic grade value node reaches the electrostatic grade threshold and the current frequency band occupancy rate node exceeds the frequency band over-limit criterion, the causal intervention mark is activated;
If the current temperature node or the current pressure node reaches a high-risk packaging state (temperature above 150 ℃ or pressure above 1 MPa, based on statistics of the historical test data), the packaging intervention mark is activated;
If the causal intervention mark is not activated, the frequency band reservation instruction (comprising frequency band allocation, bandwidth configuration, priority marking and time scheduling) of the operation decision vector is output;
When both the causal intervention mark and the packaging intervention mark are activated, operation instructions for frequency bands whose occupancy rate exceeds the limit are masked based on the causal relations of the causal rule graph, and a forced disabling instruction set is generated that prohibits the radio frequency tester from using the high-risk frequency bands;
When the packaging intervention mark is activated, the temperature and pressure of the packaging equipment are detected and compared with preset standard values to judge whether they fall outside the standard ranges, and a quantum regulation instruction set (comprising heating, cooling, pressurizing, depressurizing, state-monitoring and calibration instructions) is generated to optimize the temperature or pressure, wherein the standard temperature range is 20 ℃ to 30 ℃, the standard pressure range is 0.8 atm to 1.2 atm (standard atmospheres), and the quantum regulation instruction set is configured according to the operating-precision requirements of the packaging equipment;
When the packaging intervention mark is activated, a packaging conflict log is generated by recording the electrostatic grade value, the frequency band occupancy rate sequence, the temperature and the pressure from the joint state vector, the decision type and adjustment parameters from the operation decision vector, and the temperature or pressure adjustment instructions from the quantum regulation instruction set. The decision type is the specific optimization action selected when the packaging intervention mark is activated: temperature optimization, pressure optimization, joint optimization (adjusting temperature and pressure simultaneously) or maintaining the current state. The adjustment parameters are the specific adjustment amounts or target values for each decision type: for temperature optimization, a target temperature (e.g., 25 ℃), an adjustment amplitude (e.g., raising the temperature by 2 ℃) and an adjustment duration (e.g., 10 seconds); for pressure optimization, a target pressure (e.g., 1.0 atm), an adjustment amplitude (e.g., increasing the pressure by 0.1 atm) and an adjustment duration (e.g., 5 seconds); and for joint optimization, target values and adjustment amplitudes for both temperature and pressure (e.g., raising the temperature by 1 ℃ and increasing the pressure by 0.05 atm);
When the packaging intervention mark is activated, the joint state vector, the operation decision vector and the quantum regulation instruction set are recorded to generate the packaging conflict log;
Incremental training is started based on the packaging conflict log to improve the accuracy and real-time performance with which the causal rule graph guides the operation decision vector and the quantum regulation instruction set: when the packaging conflict log has accumulated a specified number of entries (e.g., 100), the joint state vectors, operation decision vectors and quantum regulation instruction sets it contains are processed as input data for the incremental training;
The causal rule graph guides the real-time decisions of the chip packaging equipment through the current electrostatic grade value node, the current frequency band occupancy rate node, the current temperature node, the current pressure node and the association strengths of the directed edges, and generates the operation decision vector that controls the operation of the chip packaging equipment;
The frequency bands that triggered the causal intervention mark are looked up in the causal rule graph, and the forced disabling instruction set is sent to the radio frequency tester, generating an extended operation decision vector (comprising frequency band disabling instructions, the decision type and the adjustment parameters);
The temperature and pressure are obtained from the gallium nitride quantum dot sensor array and the frequency band occupancy rate sequence from the quantum-entangled frequency band probe array, the association strengths of the causal rule graph are updated, and the final operation decision vector (comprising the frequency band reservation instruction, the forced disabling instruction set and the quantum regulation instruction set) is generated.
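The rule-based part of the steps above (the two intervention marks, which instructions they unlock, and batching the packaging conflict log for incremental training) can be sketched as below. This covers only the rule layer, not the reinforcement learning strategy model itself. Assumptions, none of which the text fixes: the causal mark fires if any band exceeds 80% while the electrostatic grade value reaches 2000 V; the 1 MPa limit is converted to atm so that all pressures share one unit; instruction names such as "cool" are illustrative labels; and the log is cleared after each batch is handed to a hypothetical training callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

ESD_THRESHOLD_V = 2000.0                     # electrostatic grade threshold
OCCUPANCY_LIMIT = 0.80                       # frequency band over-limit criterion
TEMP_HIGH_RISK_C = 150.0                     # high-risk packaging temperature
PRESSURE_HIGH_RISK_ATM = 1.0e6 / 101325.0    # 1 MPa expressed in atm
TEMP_RANGE_C = (20.0, 30.0)                  # standard temperature range
PRESSURE_RANGE_ATM = (0.8, 1.2)              # standard pressure range

@dataclass
class Decision:
    causal_mark: bool
    packaging_mark: bool
    reserved_bands: Optional[List[int]] = None      # frequency band reservation
    disabled_bands: List[int] = field(default_factory=list)  # forced disabling set
    regulation: List[str] = field(default_factory=list)      # quantum regulation set

def decide(esd_v: float, occupancy: Dict[int, float], temp_c: float, pressure_atm: float) -> Decision:
    over = sorted(b for b, o in occupancy.items() if o > OCCUPANCY_LIMIT)
    causal = esd_v >= ESD_THRESHOLD_V and bool(over)  # "reaches" the threshold
    packaging = temp_c > TEMP_HIGH_RISK_C or pressure_atm > PRESSURE_HIGH_RISK_ATM
    d = Decision(causal_mark=causal, packaging_mark=packaging)
    if not causal:
        d.reserved_bands = sorted(b for b, o in occupancy.items() if o < OCCUPANCY_LIMIT)
    if causal and packaging:
        d.disabled_bands = over
    if packaging:
        if temp_c > TEMP_RANGE_C[1]:
            d.regulation.append("cool")
        elif temp_c < TEMP_RANGE_C[0]:
            d.regulation.append("heat")
        if pressure_atm > PRESSURE_RANGE_ATM[1]:
            d.regulation.append("depressurize")
        elif pressure_atm < PRESSURE_RANGE_ATM[0]:
            d.regulation.append("pressurize")
        d.regulation.append("monitor")
    return d

class PackagingConflictLog:
    """Records entries while the packaging mark is active; hands full batches to a training callback."""

    def __init__(self, batch_size: int = 100, on_batch: Optional[Callable[[list], None]] = None):
        self.batch_size = batch_size
        self.on_batch = on_batch
        self.entries: list = []

    def record(self, joint_state: list, decision: Decision) -> bool:
        """Return True when this entry completed a batch and triggered incremental training."""
        if not decision.packaging_mark:
            return False
        self.entries.append((joint_state, decision))
        if len(self.entries) >= self.batch_size:
            batch, self.entries = self.entries, []
            if self.on_batch is not None:
                self.on_batch(batch)
            return True
        return False
```

For example, 2500 V with band 1 at 85% and a bonding temperature of 160 ℃ activates both marks, disables band 1 and emits ["cool", "monitor"]; 1000 V with the same bands activates neither mark and reserves the bands below 80%.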
S3, transmitting the frequency band reservation instruction, the forced disabling instruction set and the quantum regulation instruction set to the radio frequency tester and the chip packaging equipment through the communication interface, allocating low-load test channels, disabling high-risk channels, adjusting temperature and pressure, and starting the chip functional test after the adjustment.
The frequency band reservation instruction and the forced disabling instruction set are transmitted to the radio frequency tester through the communication interface. The frequency band reservation instruction allocates low-load test channels, namely channels in the frequency band occupancy rate sequence whose occupancy rate is below the load threshold (e.g., an occupancy rate within 0-80%), so that the radio frequency tester operates in a stable test environment. The forced disabling instruction set configures the radio frequency tester through the communication interface to prohibit the use of high-risk test channels, namely those identified by the causal rule graph as having an over-limit frequency band occupancy rate or an electrostatic grade value above 2000 V, so as to prevent electrical interference or equipment failure during testing. The load threshold is set from the state of the current frequency band occupancy rate node of the causal rule graph and is applied to the frequency band occupancy rate sequence to identify the low-load test channels;
The quantum regulation instruction set, comprising a temperature regulation instruction and a pressure regulation instruction, is transmitted to the chip packaging equipment through the communication interface, whereby the gallium nitride quantum dot sensor array is driven to regulate the bonding temperature and the patch pressure of the chip packaging equipment in real time.
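The channel split described above can be sketched for a single radio frequency tester as follows. Assumptions: one electrostatic grade value applies to the whole tester (so exceeding 2000 V marks every channel high-risk), occupancy rates are fractions, and a channel sitting exactly at the 80% threshold is neither reserved nor disabled because the text uses strict "below" and "exceeding" comparisons. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

def partition_channels(occupancy: Dict[int, float], esd_v: float,
                       load_threshold: float = 0.80,
                       esd_limit_v: float = 2000.0) -> Tuple[List[int], List[int]]:
    """Split one tester's channels into (low-load, high-risk) lists, ascending by channel number."""
    if esd_v > esd_limit_v:
        return [], sorted(occupancy)  # electrostatic limit exceeded: every channel is high-risk
    low_load = sorted(ch for ch, o in occupancy.items() if o < load_threshold)
    high_risk = sorted(ch for ch, o in occupancy.items() if o > load_threshold)
    return low_load, high_risk
```

With occupancies {1: 0.5, 2: 0.85, 3: 0.8} at 1500 V, channel 1 is reserved, channel 2 is disabled and channel 3 is left unassigned.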
The causal rule graph identifies radio frequency testers in a low-risk state through the current frequency band occupancy rate node (occupancy rate below 80%) and the current electrostatic grade value node (below 2000 V), selects such a radio frequency tester as the target radio frequency tester, activates the communication port of the target radio frequency tester through the communication interface and connects it to a low-load test channel, while restricting non-target equipment from accessing high-load test channels to avoid resource contention and scheduling conflicts;
A test start signal is sent to the target radio frequency tester through the communication interface, and a preset chip functional test comprising a logic function test, a timing test and a power consumption test is loaded to start the chip test; the preset chip functional test is configured according to the design specification and test requirements of the chip. The logic function test verifies that the logic functions of the chip are correct and that its outputs meet the design specification, for example by testing the logic gates (e.g., AND and OR) of a digital circuit or the instruction execution of a microcontroller embedded in the chip to ensure that there are no logic errors. The timing test checks whether the timing of the chip's test signals at the specified clock frequency meets requirements by measuring signal propagation delay, setup time and hold time, for example to verify that the chip stays correctly synchronized at high speed and to avoid timing violations (e.g., data loss). The power consumption test measures the power consumption of the chip in different operating modes (e.g., idle and full load) to verify that it stays within the designed power range, for example by testing static power consumption (in standby) and dynamic power consumption (in operation) to ensure that the chip is power-efficient and does not draw excessive power;
During the test, the gallium nitride quantum dot sensor array samples the packaging parameters of the chip packaging equipment every second. If the temperature or pressure deviates from the target range of the quantum regulation instruction set, for example if the temperature rises to 32 ℃ or the pressure drops to 0.7 atm, the bonding temperature and the patch pressure of the chip packaging equipment are automatically regulated according to the temperature and pressure regulation instructions of the quantum regulation instruction set, for example by increasing the cooling power of the chip packaging equipment through the temperature regulation instruction, or by raising its pressure to the target value of the quantum regulation instruction set through the pressure regulation instruction. This keeps the process stable, and a regulation log containing the temperature and pressure adjustment records is written to a data record memory.
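The once-per-second correction loop above can be sketched as follows. It reuses the standard ranges (20 ℃ to 30 ℃, 0.8 atm to 1.2 atm) and the example targets (25 ℃, 1.0 atm) quoted in the text; the instruction names, the "amount toward target" convention and the in-memory log are illustrative assumptions rather than the equipment's actual command set.

```python
from typing import Iterable, List, Tuple

TARGET_TEMP_C = 25.0        # example target temperature from the text
TARGET_PRESSURE_ATM = 1.0   # example target pressure from the text
TEMP_RANGE_C = (20.0, 30.0)
PRESSURE_RANGE_ATM = (0.8, 1.2)

Action = Tuple[str, float]

def regulation_step(temp_c: float, pressure_atm: float) -> List[Action]:
    """Return (instruction, amount toward target) pairs for one sample; empty if both are in range."""
    out: List[Action] = []
    if temp_c > TEMP_RANGE_C[1]:
        out.append(("increase_cooling", round(temp_c - TARGET_TEMP_C, 3)))
    elif temp_c < TEMP_RANGE_C[0]:
        out.append(("increase_heating", round(TARGET_TEMP_C - temp_c, 3)))
    if pressure_atm < PRESSURE_RANGE_ATM[0]:
        out.append(("pressurize", round(TARGET_PRESSURE_ATM - pressure_atm, 3)))
    elif pressure_atm > PRESSURE_RANGE_ATM[1]:
        out.append(("depressurize", round(pressure_atm - TARGET_PRESSURE_ATM, 3)))
    return out

def monitor(samples: Iterable[Tuple[float, float]]) -> List[Tuple[int, float, float, List[Action]]]:
    """Process one (temperature, pressure) sample per second and log every corrective action."""
    log = []
    for second, (temp_c, pressure_atm) in enumerate(samples):
        actions = regulation_step(temp_c, pressure_atm)
        if actions:
            log.append((second, temp_c, pressure_atm, actions))
    return log
```

Using the text's examples, a reading of 32 ℃ and 0.7 atm yields [("increase_cooling", 7.0), ("pressurize", 0.3)].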
S4, collecting test data of the chip functional test, verifying the compliance of the test data, generating a qualified report, and optimizing the reinforcement learning strategy model using abnormal features, thereby completing performance control of the chip packaging test production line.
After the completed chip function test, collecting test data by a radio frequency tester, wherein the test data comprises event counts (the number of errors in the test process, such as the abnormal triggering number of test signals) and functional results (whether the test passes or not, such as a logic function verification result);
the radio frequency tester analyzes the compliance of the event count (for example, an error count of 5, which is below the maximum allowable event count derived from the test results recorded in the historical test data) and verifies whether the functional result complies with the JEDEC standard;
If the event count is compliant and the functional result passes, a qualified report is generated that records the batch yield (e.g., 98%), the test efficiency (e.g., 1000 chips tested per hour) and the mean packaging parameters of the chip packaging equipment (e.g., a temperature of 25 ℃ and a pressure of 1.0 standard atmosphere), and the report is stored in the data record memory for subsequent performance analysis;
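The compliance check and report generation can be sketched as follows. The description does not say how the maximum allowable event count is derived from historical data; this minimal Python sketch assumes a hypothetical mean plus three standard deviations rule, and the JEDEC functional check is assumed to have already produced the boolean functional_pass.

```python
import statistics


def max_allowed_events(historical_counts, k=3.0):
    """Hypothetical limit rule: mean + k * standard deviation of historical event counts."""
    return statistics.mean(historical_counts) + k * statistics.pstdev(historical_counts)


def build_report(event_count, functional_pass, historical_counts,
                 chips_tested, chips_passed, hours, temps_c, pressures_atm):
    """Return a qualified report if both checks pass, otherwise a defect report stub."""
    limit = max_allowed_events(historical_counts)
    compliant = event_count <= limit
    if compliant and functional_pass:
        return {
            "type": "qualified",
            "batch_yield": chips_passed / chips_tested,
            "chips_per_hour": chips_tested / hours,
            "mean_temp_c": statistics.mean(temps_c),
            "mean_pressure_atm": statistics.mean(pressures_atm),
        }
    # Abnormal case: hand over to defect analysis.
    return {"type": "defect",
            "event_count_over_limit": not compliant,
            "functional_fail": not functional_pass}


report = build_report(5, True, [4, 6, 5, 5], 1000, 980, 1.0,
                      [24.0, 26.0], [0.9, 1.1])
print(report)
```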
If an abnormality (an event count exceeding the limit or a failed functional result) is detected, a defect report is generated and the abnormal features are extracted (for example, the temperature exceeding the limit at 35 ℃, the pressure fluctuating to 1.5 standard atmospheres, or an abnormal electrostatic grade value). The abnormal features are associated with the current temperature node and the current pressure node of the causal rule map, the causal relationship between the abnormal features and the over-limit event count or failed functional result is analyzed through the causal rule map, and the influence of the process parameters (temperature, pressure and electrostatic grade) on the chip test failure is determined;
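Abnormal feature extraction and cause ranking can be illustrated with a minimal Python sketch. The risk limits reuse example values from this description (30 ℃, 2000 V) plus an assumed 0.9 to 1.1 atm pressure band, and the causal rule map is modeled, purely as an assumption, as a dictionary of association strengths between parameter nodes and failure modes with made-up weights.

```python
# Hypothetical risk limits (lower, upper); None means unbounded on that side.
RISK_LIMITS = {
    "temperature_c": (None, 30.0),
    "pressure_atm": (0.9, 1.1),
    "electrostatic_v": (None, 2000.0),
}

# Hypothetical causal rule map: parameter node -> failure mode -> association strength.
CAUSAL_MAP = {
    "temperature_c": {"event_count_over_limit": 0.7, "functional_fail": 0.4},
    "pressure_atm": {"event_count_over_limit": 0.5, "functional_fail": 0.6},
    "electrostatic_v": {"event_count_over_limit": 0.3, "functional_fail": 0.8},
}


def extract_abnormal_features(sample):
    """Keep only the parameters that fall outside their risk limits."""
    features = {}
    for name, (low, high) in RISK_LIMITS.items():
        value = sample[name]
        if (low is not None and value < low) or (high is not None and value > high):
            features[name] = value
    return features


def rank_causes(features, failure_mode, causal_map=CAUSAL_MAP):
    """Order abnormal parameters by their association strength with the failure mode."""
    scored = [(causal_map.get(name, {}).get(failure_mode, 0.0), name) for name in features]
    return [name for strength, name in sorted(scored, reverse=True) if strength > 0]


sample = {"temperature_c": 35.0, "pressure_atm": 1.5, "electrostatic_v": 500.0}
features = extract_abnormal_features(sample)
print(features, rank_causes(features, "event_count_over_limit"))
```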
The Pearson correlation coefficient is calculated as:
$$r=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}};$$
where $r$ is the Pearson correlation coefficient, with range [-1, 1], in which 1 indicates a perfect positive correlation, -1 a perfect negative correlation and 0 no linear correlation; $n$ is the amount of test data, i.e., the total number of tests in the analysis; $x_i$ is the abnormal feature value of the $i$-th test (e.g., a temperature of 35 ℃); $\bar{x}$ is the mean of the abnormal feature values; $y_i$ is the test failure indicator of the $i$-th test (e.g., 0/1 for the event count or functional result); $\bar{y}$ is the mean of the test failure indicators; and $i$ is the test index, denoting the $i$-th test and used to traverse the data points of each test;
It should be noted that this formula is a standard statistical method introduced by Karl Pearson at the end of the 19th century; it is widely used in data analysis and is a well-known mathematical formula;
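The formula translates directly into code. This minimal Python sketch (the function name pearson_r and the sample data are illustrative) computes $r$ between an abnormal feature and a 0/1 failure indicator; with a binary $y$ the result is also known as the point-biserial correlation, which is numerically the same quantity.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative data: temperature per test (℃) and failure indicator (1 = failed).
temps = [25.0, 26.0, 35.0, 34.0, 25.0]
fails = [0, 0, 1, 1, 0]
print(round(pearson_r(temps, fails), 4))
```

Here $\sum(x_i-\bar{x})(y_i-\bar{y}) = 11$, $\sum(x_i-\bar{x})^2 = 102$ and $\sum(y_i-\bar{y})^2 = 1.2$, so $r = 11/\sqrt{122.4} \approx 0.994$, a strong positive association between high temperature and failure in this toy data.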
The abnormal features in the defect report are input through the communication interface and combined with the current electrostatic grade value node, the current temperature node and the current pressure node of the causal rule map. Using the test results recorded in the historical test data as the reward function, the parameters of the reinforcement learning strategy model are adjusted, and the electrostatic grade threshold (for example, from 2000 V to 1800 V) and the temperature and pressure risk intervals set by the causal rule map (for example, the high-risk temperature interval is changed from >30 ℃ to >28 ℃) are optimized. This improves the adaptability of the reinforcement learning strategy model to environmental changes on the chip packaging test production line, yields a more accurate operation decision vector and quantum regulation instruction set, and ensures accurate production line performance control;
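The description does not define the reward function or the threshold update rule. The following minimal Python sketch is a heuristic stand-in, not the reinforcement learning update itself: it assumes a hypothetical reward of +1 for a pass and -1 for a failure, minus a penalty per excess event, and a hypothetical rule that lowers an upper risk threshold to just below the smallest value at which failures slipped under it. With assumed margins it reproduces the example changes of 2000 V to 1800 V and 30 ℃ to 28 ℃.

```python
def reward(functional_pass, event_count, event_limit, penalty=0.1):
    """Hypothetical shaping: +1 for pass, -1 for fail, minus penalty per excess event."""
    base = 1.0 if functional_pass else -1.0
    return base - penalty * max(0, event_count - event_limit)


def tighten_threshold(current, failing_values, margin):
    """Lower an upper risk threshold below the smallest failing value it failed to flag."""
    missed = [v for v in failing_values if v < current]  # failures below the threshold
    if not missed:
        return current
    return min(missed) - margin


print(tighten_threshold(2000.0, [1850.0, 2100.0], margin=50.0))  # electrostatic, V
print(tighten_threshold(30.0, [28.5, 31.0], margin=0.5))         # temperature, ℃
print(reward(False, 10, 7.0))
```

In an actual implementation the reward would feed a policy update rather than a direct threshold rule; the sketch only shows how test outcomes could be turned into a scalar signal and a tightened interval.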
The reinforcement learning strategy model is updated through incremental training: its parameters are adjusted using the packaging parameters of the chip packaging equipment and the test data, improving its adaptability to environmental changes on the chip packaging test production line (such as equipment aging or batch-to-batch variation);
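One way to keep node association strengths current without reprocessing the whole history is to update the Pearson correlation incrementally as each new log record arrives. This minimal Python sketch assumes, as a hypothetical interpretation, that an association strength equals the running correlation between a process parameter and the failure indicator; it uses Welford-style running means and co-moments, and the class name OnlineCorrelation is illustrative.

```python
import math


class OnlineCorrelation:
    """Running Pearson correlation, updated one (x, y) record at a time (Welford-style)."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.c_xy = 0.0  # running co-moment of x and y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x            # deviation from the old mean of x
        self.mean_x += dx / self.n
        dy = y - self.mean_y            # deviation from the old mean of y
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    @property
    def r(self):
        """Current correlation; 0.0 until it is defined."""
        if self.n < 2 or self.m2_x == 0 or self.m2_y == 0:
            return 0.0
        return self.c_xy / math.sqrt(self.m2_x * self.m2_y)


strength = OnlineCorrelation()  # e.g. temperature node -> failure node
for temp_c, failed in [(25.0, 0), (26.0, 0), (35.0, 1), (34.0, 1), (25.0, 0)]:
    strength.update(temp_c, failed)
print(round(strength.r, 4))
```

The running result matches the batch formula on the same data, so the strength can be refreshed after every new conflict log entry at constant cost.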
The updated reinforcement learning strategy model is further calibrated according to the current electrostatic grade value node, the current frequency band occupancy node, the current temperature node and the current pressure node of the causal rule map, and an optimized operation decision vector and quantum regulation instruction set are generated to ensure control accuracy.
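How the four node values could be turned into the instructions named in this method (frequency band reservation, forced forbidden set and quantum regulation set) can be sketched with simple rules. This minimal Python sketch is a rule-based stand-in for the trained policy, not the policy itself; the thresholds (1800 V and 28 ℃ taken from the optimized examples above, plus an assumed 0.9 to 1.1 atm band and an assumed 70% occupancy ceiling) and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical calibrated thresholds."""
    electrostatic_v: float = 1800.0
    temp_high_c: float = 28.0
    pressure_low_atm: float = 0.9
    pressure_high_atm: float = 1.1
    occupancy_max: float = 0.7
    temp_target_c: float = 25.0
    pressure_target_atm: float = 1.0


def decide(electrostatic_v, band_occupancy, temp_c, pressure_atm, th=None):
    """Map node values to a causal-intervention flag and the three instruction groups."""
    th = th or Thresholds()
    temp_risk = temp_c > th.temp_high_c
    pressure_risk = not (th.pressure_low_atm <= pressure_atm <= th.pressure_high_atm)
    high_risk = electrostatic_v > th.electrostatic_v or temp_risk or pressure_risk
    regulation = {}
    if temp_risk:
        regulation["temperature_target_c"] = th.temp_target_c
    if pressure_risk:
        regulation["pressure_target_atm"] = th.pressure_target_atm
    return {
        "causal_intervention": high_risk,
        "reserve_band": min(band_occupancy, key=band_occupancy.get),  # lowest load
        "forbidden_bands": sorted(b for b, occ in band_occupancy.items()
                                  if occ > th.occupancy_max),
        "quantum_regulation": regulation,
    }


print(decide(1500.0, {"B1": 0.8, "B2": 0.2, "B3": 0.95}, 32.0, 1.0))
```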
The embodiment also provides a computer device suitable for the reinforcement learning-based chip packaging test production line performance control method. The computer device comprises a memory and a processor, wherein the memory stores computer-executable instructions and the processor executes the computer-executable instructions to implement the reinforcement learning-based chip packaging test production line performance control method.
The computer device may be a terminal comprising a processor, a memory, a communication interface, a display screen and an input device connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; wireless communication may be implemented through Wi-Fi, a carrier network, NFC (near field communication) or other technologies. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, trackball or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse or the like.
The embodiment also provides a storage medium storing a computer program which, when executed by a processor, implements the reinforcement learning-based chip packaging test production line performance control method described in the above embodiment. The storage medium may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.
In conclusion, intelligent sensing of and causal reasoning about the operating state of the chip packaging test production line are realized through the reinforcement learning strategy model and the causal rule map. The reinforcement learning strategy model can identify high-risk states in real time in a complex and changeable test environment and generate a frequency band reservation instruction, a forced forbidden instruction set and a quantum regulation instruction set according to causal intervention logic, thereby realizing dynamic scheduling and closed-loop regulation of the radio frequency tester and the chip packaging equipment. It further supports an incremental training mechanism based on packaging conflict logs that continuously optimizes the decision logic and the node association strengths in the causal rule map. This improves adaptability in practical scenarios such as process fluctuation and equipment aging, enables quick response to and accurate intervention in abnormal states, and enhances the stability, efficiency and intelligence of the test process.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.