CN117390517A

CN117390517A - Method for establishing prejudgment model based on classification model to knowledge degree of information

Info

Publication number: CN117390517A
Application number: CN202311460605.4A
Authority: CN
Inventors: 王利刚; 徐颖
Original assignee: Institute of Psychology of CAS
Current assignee: Institute of Psychology of CAS
Priority date: 2023-11-03
Filing date: 2023-11-03
Publication date: 2024-01-12

Abstract

The invention discloses a method for establishing a pre-judging model based on the knowledge degree of a classification model on information, which comprises the steps of presenting a to-be-read language material on a display; the tested eye movement data tested in the process of first reading the material of the language is collected; after the tested reading carefully is determined, the read material of the language is presented again on the display for re-reading, and the eyeball movement data of the tested person in the process of re-reading the material of the language is collected; extracting key characteristic indexes in the tested eyeball movement data in the process of reading the material twice; taking key characteristic indexes in eyeball motion data as input, constructing an eye motion prediction model based on a neural network, and performing classification judgment on whether to reread or not; judging whether the tested material is reread on the new language according to the trained eye movement prediction model; according to the invention, through establishing the reading text, 20 key features related to eye movement information are extracted to establish the prediction model, and whether the reading is first or second reading can be predicted with the accuracy of 78%.

Description

A method for establishing a predictive model based on the classification model’s knowledge of information

技术领域Technical field

本发明涉及人工智能技术领域，具体涉及一种基于分类模型对信息知晓程度的预判模型建立方法。The present invention relates to the field of artificial intelligence technology, and specifically relates to a method for establishing a predictive model based on the degree of knowledge of information by a classification model.

背景技术Background technique

重读策略是一种公认使用最广泛的阅读理解策略，可以有效帮助读者快速阅读，提高对信息内容的熟悉度，也能更好地记忆细节，加深对语篇内容的理解，提高阅读效率。The rereading strategy is recognized as the most widely used reading comprehension strategy. It can effectively help readers read quickly, improve their familiarity with information content, remember details better, deepen their understanding of text content, and improve reading efficiency.

中国专利文献CN 112545451 A中公开了一种阅读眼动记录方法及装置，包括：S1登记被试的年龄、文化水平，测量被试者的视力水平，从文字库中选取适宜文体、字体、字号的阅读材料；S2控制装置通过连接装置将所述阅读材料发送至显示器，并以设定的时间进行显示；S3显示器上显示材料时，要求被试者按照自己的阅读习惯进行阅读，在阅读结束后闭上眼睛；S4眼动仪将在阅读过程中实时记录眼动数据，通过连接装置将眼动数据传回至控制装置进行储存，其中的眼动数据包括注视、凝视、眼跳、回视、阅读、平稳视觉跟踪、线性眼动行为和快速眼动行为数据；S5控制装置进行眼动数据分析，得出被试的阅读习惯以及阅读策略。上述专利文献是通过无创方式实时记录阅读过程中的眼动坐标点，分析得眼跳、注视等眼动数据，进而量化评价阅读能力的指标，是对被试阅读能力的评价，用于动态观察疾病对于眼睛的影响，但是上述研究所采用的眼动指标较少，且笼统不精细，没有构建整体的阅读水平评估模型。此外，现有技术中未见通过文本阅读并结合眼动对信息知晓程度做出的判断方法，大都将视觉图像特征和文本特征进行融合，如中国专利文献CN 116522212公开的基于图像文本融合的谎言检测方法，通过融合视觉图像特征和文本特征采用分类模型输出谎言检测结果。Chinese patent document CN 112545451 A discloses a reading eye movement recording method and device, including: S1 registering the subject's age and education level, measuring the subject's vision level, and selecting appropriate style, font, and font size from the character library reading materials; the S2 control device sends the reading materials to the display through the connecting device and displays them at a set time; when the materials are displayed on the S3 display, the subjects are required to read according to their own reading habits, and at the end of reading and then close your eyes; the S4 eye tracker will record eye movement data in real time during the reading process, and transmit the eye movement data back to the control device through the connecting device for storage. The eye movement data includes gaze, gaze, saccades, and return glances. , reading, smooth visual tracking, linear eye movement behavior and rapid eye movement behavior data; the S5 control device conducts eye movement data analysis to obtain the subject's reading habits and reading strategies. The above-mentioned patent document uses a non-invasive method to record eye movement coordinate points in the reading process in real time, analyzes eye movement data such as saccades and fixations, and then quantitatively evaluates reading ability indicators. It is an evaluation of the subject's reading ability and is used for dynamic observation. The impact of diseases on the eyes, however, the eye movement indicators used in the above studies are few, general and not precise, and no overall reading level assessment model has been constructed. In addition, there is no method in the existing technology to judge the degree of information awareness through text reading combined with eye movements. Most of them fuse visual image features and text features. For example, Chinese patent document CN 116522212 discloses a lie based on image and text fusion. The detection method uses a classification model to output lie detection results by fusing visual image features and text features.

发明内容Contents of the invention

为解决上述所存在的技术问题，本发明通过阅读语篇即可获得被试对文本信息的知晓程度，使对信息的知晓程度检测更为简单、方便，为此，本发明提供了一种基于分类模型对信息知晓程度的预判模型建立方法。In order to solve the above-mentioned technical problems, the present invention can obtain the subject's awareness of the text information by reading the text, making the detection of the awareness of the information simpler and more convenient. To this end, the present invention provides a method based on A method for establishing a prediction model for the classification model’s knowledge of information.

本发明采用如下技术方案：The present invention adopts the following technical solutions:

一方面，本发明提供了一种基于分类模型对信息知晓程度的预判模型建立方法，在显示器上呈现待阅读的语篇材料；被试针对语篇材料完成首次阅读，采集首次阅读语篇材料过程中被试的眼球运动数据；确定被试认真阅读后，再次在显示器上呈现被试已阅读过的语篇材料供其重读，采集重读语篇材料过程中被试的眼球运动数据；提取两次阅读语篇材料过程中被试眼球运动数据中的关键特征指标；以眼球运动数据中的关键特征指标作为输入，构建基于神经网络的眼动预测模型，进行是否重读的分类判断；结合训练好的眼动预测模型，判断被试对新语篇材料是否重读。On the one hand, the present invention provides a method for establishing a predictive model based on the classification model's knowledge of information, and presents the text material to be read on the display; the subject completes the first reading of the text material and collects the first reading text material. The subject's eye movement data during the process; after confirming that the subject has read carefully, the text material that the subject has read is presented on the monitor again for re-reading, and the subject's eye movement data during the process of re-reading the text material is collected; two The key feature indicators in the subject's eye movement data during the second reading of the text material; using the key feature indicators in the eye movement data as input, an eye movement prediction model based on the neural network is constructed to classify and judge whether to reread; combined with the training The eye movement prediction model can determine whether the subject stresses the new discourse material.

进一步地，被试完成首次阅读后，在显示器上呈现阅读题供被试作答，当被试达到所设定的做题正确率后确定认真阅读了语篇材料；被试完成语篇重读后，在显示器上通过改变首次阅读题的呈现顺序再次供被试作答。这里的阅读题优选单项选择题。Further, after the subject completes the first reading, reading questions are presented on the monitor for the subject to answer. When the subject reaches the set accuracy rate, it is confirmed that the subject has read the text material carefully; after the subject completes the re-reading of the text, Change the order of presentation of the first reading questions on the monitor for subjects to answer again. The reading questions here are preferably single-choice questions.

进一步地，所提取的阅读眼球运动数据中的关键特征指标为首次阅读和重读时差异显著的眼动指标。Furthermore, the key feature indicators in the extracted reading eye movement data are eye movement indicators that are significantly different between first reading and rereading.

优选地，所提取的关键特征指标包括整体指标和局部兴趣区指标；Preferably, the extracted key feature indicators include overall indicators and local interest area indicators;

所述整体指标包括：总阅读时间、注视时间比例、总注视次数、平均注视时间、向前眼跳次数、平均瞳孔大小、平均回视距离、总回视次数、行内平均回视时间、行内平均回视距离、行内回视次数、跨行回视次数、跨行平均回视时间和跨行平均回视距离；The overall indicators include: total reading time, fixation time ratio, total number of fixations, average fixation time, number of forward saccades, average pupil size, average lookback distance, total number of lookbacks, average lookback time within a line, average within a line Looking back distance, number of looking back within a line, number of looking back across lines, average looking back time across lines, and average looking back distance across lines;

所述局部兴趣区指标包括：IA_总注视时间、IA_总注视次数、IA_平均回视出次数、IA_选择性回视路径阅读时间、IA_平均首次注视时间、IA_平均第一遍阅读时间、IA_平均第一遍注视次数、IA_平均第二次注视时间、IA_平均第二遍阅读时间和IA_平均第二遍注视次数。The local area of interest indicators include: IA_total fixation time, IA_total number of fixations, IA_average number of look-backs, IA_selective look-back path reading time, IA_average first fixation time, IA_average first fixation time Reading time, IA_Average number of first fixations, IA_Average second fixation time, IA_Average second reading time, and IA_Average number of second fixations.

进一步地，所述行内回视、跨行回视和向前眼跳的判断方法是：Further, the method for judging intra-line glances, cross-line glances and forward saccades is:

步骤1、确定语篇材料的行间距A_pix，第一行的顶部Y轴坐标B_pix，眼跳事件的起点坐标current_sac_start_Y和终点坐标current_sac_end_Y；Step 1. Determine the line spacing A_pix of the discourse material, the top Y-axis coordinate B_pix of the first line, the starting point coordinate current_sac_start_Y and the end point coordinate current_sac_end_Y of the saccade event;

步骤2、计算眼跳事件起点所在行数并取整：Step 2. Calculate the number of rows where the saccade event starts and round it up:

start_row＝(“current_sac_start_Y”-B)/Astart_row＝(“current_sac_start_Y”-B)/A

步骤3、计算眼跳事件终点所在行数并取整：Step 3. Calculate the number of rows where the end point of the saccade event is and round it up:

end_row＝(“current_sac_end_Y”-B)/Aend_row＝(“current_sac_end_Y”-B)/A

步骤4、计算跳行数值row_skip＝“end_row”–“start_row”；Step 4. Calculate the row skip value row_skip="end_row"-"start_row";

步骤5、计算眼跳同行内跳动x_skip，Step 5. Calculate x_skip within the same saccade,

x_skip＝“current_sac_end_X”-“current_sac_start_X”x_skip="current_sac_end_X"-"current_sac_start_X"

若row_skip＜0时，则为跨行回视；若row_skip＝0且x_skip＜0时，则为行内回视；若row_skip＞0时，则为向前眼跳。If row_skip<0, it is a cross-row glance; if row_skip=0 and x_skip<0, it is an intra-row glance; if row_skip>0, it is a forward saccade.

更进一步地，所述的行内回视距离和跨行回视距离的计算方法是：Furthermore, the calculation method of the intra-line look-back distance and the cross-line look-back distance is:

行内回视距离(个)：characters＝x_skip/C；In-line lookback distance (number): characters=x_skip/C;

跨行回视距离(个)：characters＝(D*row skip-x_skip)/C；Looking back distance across rows (characters): characters=(D*row skip-x_skip)/C;

其中C为已知每个字母的像素大小，D为矩形语篇材料的长度尺寸，单位pix。Among them, C is the known pixel size of each letter, and D is the length dimension of the rectangular discourse material, in pix.

本发明技术方案具有如下优点：The technical solution of the present invention has the following advantages:

A.本发明所提供的基于神经网络分类模型对信息知晓程度的判断方法，通过建立阅读文本，监测被试在阅读文本时的眼动信息，提取出涉及眼动信息的20个关键特征，建立基于神经网络的眼动预测模型，准确预测出读者对语篇材料的知晓程度是首次阅读还是重读；本发明可以以77％的准确率检测出个体是否熟悉文本材料，在教学领域有助于评估学生的知识掌握程度，在测谎领域有助于辅助检测是否说谎。A. The method for judging the degree of information awareness based on the neural network classification model provided by the present invention, by establishing a reading text, monitoring the subject's eye movement information when reading the text, extracting 20 key features involving eye movement information, and establishing The eye movement prediction model based on the neural network can accurately predict whether the reader's awareness of the text material is reading it for the first time or re-reading it; this invention can detect whether an individual is familiar with the text material with an accuracy of 77%, which is helpful for evaluation in the teaching field. Students' knowledge mastery can help detect whether they are lying in the field of polygraph detection.

B.目前市面上眼动数据分析软件所导出的眼跳系列指标中基本只有总体的眼跳距离、眼跳时间等，为了满足语篇材料的眼动特点需求，本发明将眼跳进一步划分为向前眼跳、行内回视以及跨行回视的距离、时间等，显著提高了是否重读的检测准确率。B. Currently, the saccade series indicators derived by eye movement data analysis software on the market basically only include the overall saccade distance, saccade time, etc. In order to meet the needs of eye movement characteristics of discourse materials, the present invention further divides saccades into The distance and time of forward saccades, intra-line glances, and cross-line glances significantly improve the detection accuracy of rereading.

附图说明Description of the drawings

为了更清楚地说明本发明具体实施方式，下面将对具体实施方式中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图是本发明的一些实施方式，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to explain the specific embodiments of the present invention more clearly, the drawings needed to be used in the specific implementations will be briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present invention. For ordinary people in the art, For technical personnel, other drawings can also be obtained based on these drawings without exerting creative work.

图1为本发明所提供的预判模型建立方法框图；Figure 1 is a block diagram of a method for establishing a predictive model provided by the present invention;

图2为本发明提供的测试流程图；Figure 2 is a test flow chart provided by the present invention;

图3为本发明所提供的基于神经网络的语篇重读眼动预测模型构建流程图。Figure 3 is a flow chart for building a text stress eye movement prediction model based on neural networks provided by the present invention.

具体实施方式Detailed ways

下面将结合附图对本发明的技术方案进行清楚、完整地描述，显然，所描述的实施例是本发明一部分实施例，而不是全部的实施例。基于本发明的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。The technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of the present invention.

如图1和图2所示，本发明提供了一种基于分类模型对信息知晓程度的预判模型建立方法，其包括电脑主机、显示器、眼动仪等，其在电脑主机中配置有语篇重读的软件系统。首先要对被试读者的眼镜进行九点校准，屏幕左上角显示数字“3”，引导被试的注视点到屏幕左上角，数字开始倒数变化“3-2-1”。As shown in Figures 1 and 2, the present invention provides a method for establishing a predictive model based on the classification model's awareness of information, which includes a computer host, a monitor, an eye tracker, etc., and the computer host is configured with a discourse Reread software system. First, a nine-point calibration is performed on the subject's reader's glasses. The number "3" is displayed in the upper left corner of the screen. Guide the subject's gaze to the upper left corner of the screen, and the number begins to count down and changes to "3-2-1".

预判模型建立的具体方法如下：The specific methods for establishing the prediction model are as follows:

【S01】在显示器上呈现待阅读的语篇材料。其中的语篇材料为从电脑主机中调取的文本材料，可以结合不同被试者的实际情况调取相应的语篇材料，所调取的语篇材料可以是被试感兴趣的文本，当然可以是不感兴趣的文本。[S01] Present the discourse material to be read on the display. The discourse materials are text materials retrieved from the computer host. Corresponding discourse materials can be retrieved based on the actual situation of different subjects. The retrieved discourse materials can be texts that the subjects are interested in. Of course, Can be uninteresting text.

【S02】被试针对语篇材料完成首次阅读，采集首次阅读语篇材料过程中被试的眼球运动数据。[S02] The subject completed the first reading of the text material, and collected the subject’s eye movement data during the first reading of the text material.

被试按日常阅读的速度和习惯从头仔细阅读。阅读不限时间，以理解文章内容为准，过程中注意眼睛始终看着屏幕。当阅读完成后，可以通过操作显示器进入答题阶段，比如按空格进入后面的四道阅读理解题目，其为四选一的选择题目，根据被试阅读的语篇材料进行作答。The subjects read carefully from the beginning according to their daily reading speed and habits. There is no time limit for reading, and it is subject to understanding the content of the article. Please pay attention to keeping your eyes on the screen during the process. After reading is completed, you can enter the question-answering stage by operating the monitor, for example, pressing the space button to enter the next four reading comprehension questions. These are four-choice questions, and the subjects can answer according to the text materials they read.

【S03】确定被试认真阅读后，再次在显示器上呈现被试已阅读过的语篇材料供其重读，采集重读语篇材料过程中被试的眼球运动数据。当被试达到设定的做题正确率时，则认为被试对语篇材料进行了认真阅读；然后再次在显示器上呈现首次阅读的语篇材料供被试重读，同步采集被试阅读时的眼球运动数据。在进行重读前，可以请被试休息几分钟(比如3分钟)，询问被试是否休息好，得到肯定回答后进行第二遍阅读。在重读开始前需再次进行九点眼动校准，并给出提醒，要求被试“第二篇文章和第一篇的内容一致，但是题目设置有所不同，所以还是请你当成一篇新的来看待”，以此防止被试重读时不认真阅读的情况。当然了，在重读语篇材料后，还可以在显示器上呈现阅读单选题，可以采用与首次阅读后所呈现的阅读单选题完全相同的题目或者改变阅读题的呈现顺序，当然还可以重新设置阅读题目，根据被试答题正确率，确定被试重读时是否认真。若两次阅读都确定为认真时，则确定眼球运动数据是可以采纳并用于后续预判模型建立。[S03] After confirming that the subject has read carefully, the text material that the subject has read is presented on the monitor again for re-reading, and the eye movement data of the subject during the process of re-reading the text material is collected. When the subject reaches the set correct rate of answering questions, it is considered that the subject has carefully read the text material; then the text material read for the first time is presented on the display again for the subject to reread, and the subject's reading data is collected simultaneously. Eye movement data. Before re-reading, you can ask the subject to rest for a few minutes (for example, 3 minutes), ask the subject if he or she has rested well, and read the subject a second time after receiving an affirmative answer. Before re-reading begins, the nine-point eye movement calibration needs to be performed again, and a reminder is given, asking the subject: "The content of the second article is the same as the first article, but the question settings are different, so please treat it as a new one." Treat it" to prevent subjects from not reading carefully when re-reading. Of course, after re-reading the text material, you can also present multiple-choice reading questions on the monitor. You can use the same questions as those presented after the first reading or change the order of presentation of the reading questions. Of course, you can also re- Set reading questions, and determine whether the subjects are serious when rereading based on the correct rate of their answers. If both readings are determined to be serious, it is determined that the eye movement data can be adopted and used for subsequent prediction model building.

【S04】提取两次阅读语篇材料过程中被试眼球运动数据中的关键特征指标。[S04] Extract key feature indicators from the subjects’ eye movement data during the two readings of discourse materials.

本发明优选采用眼动仪全程采集两次阅读语篇材料过程中被试的眼球运动数据，提取眼球运动数据中的若干个关键特征指标。本发明提取的阅读眼球运动数据的关键特征指标为首次阅读和重读时差异显著的眼动指标。The present invention preferably uses an eye tracker to collect the subject's eye movement data during two readings of text materials, and extracts several key characteristic indicators in the eye movement data. The key characteristic indicators of the reading eye movement data extracted by the present invention are the eye movement indicators with significant differences between first reading and rereading.

本发明中创造性地引入了向前眼跳、行内回视和跨行回视参数，通过如下计算方法进一步梳理出被试的眼动特点，使本发明对于被试是否重读具有更高的区分度，成为建立预判重读模型的关键。具体的行内回视、跨行回视和向前眼跳的计算与判断方法如下：The present invention creatively introduces parameters of forward saccade, intra-line glancing and cross-line glancing, and further sorts out the subject's eye movement characteristics through the following calculation method, so that the present invention has a higher degree of discrimination on whether the subject is rereading, Becomes the key to establishing a predictive rereading model. The specific calculation and judgment methods for intra-line gazing, cross-line gazing and forward saccades are as follows:

步骤1、确定语篇材料的行间距A_pix，第一行的顶部Y轴坐标B_pix，眼跳事件的起点坐标current_sac_start_Y和终点坐标current_sac_end_Y；比如行间距A_pix为62pix，第一行的顶部Y轴坐标B_pix为177pix。Step 1. Determine the line spacing A_pix of the discourse material, the top Y-axis coordinate B_pix of the first line, the starting point coordinate current_sac_start_Y and the end coordinate current_sac_end_Y of the saccade event; for example, the line spacing A_pix is 62pix, and the top Y-axis coordinate B_pix of the first line is 177pix.

start_row＝(“current_sac_start_Y”-177)/62start_row＝(“current_sac_start_Y”-177)/62

end_row＝(“current_sac_end_Y”-177)/62end_row＝(“current_sac_end_Y”-177)/62

步骤4、计算跳行数值row_skip＝“end_row”-“start_row”；Step 4. Calculate the row skip value row_skip="end_row"-"start_row";

步骤6、计算行内回视距离和跨行回视距离，单位为字母个数，具体方法是：Step 6. Calculate the intra-line looking back distance and the cross-line looking back distance. The unit is the number of letters. The specific method is:

跨行回视距离(个)：characters＝(D*row_skip-x_skip)/C；Looking back distance across rows (characters): characters=(D*row_skip-x_skip)/C;

其中C为已知每个字母的像素大小，D为矩形语篇材料的长度尺寸，单位pix，比如每个字母的像素大小可以通过电脑进行设置，比如设置为C＝21pix，整个矩形文本中每行的长度也可以通过电脑进行设置，比如设置为D＝1574pix，这里不再赘述。Among them, C is the known pixel size of each letter, and D is the length size of the rectangular discourse material, in pix. For example, the pixel size of each letter can be set through the computer, for example, set to C=21pix, and each letter in the entire rectangular text The length of the line can also be set through the computer, for example, set to D=1574pix, which will not be described here.

由于目前的眼动仪无法直接获取本发明优选的关键特征指标，通过眼动数据分析软件导出眼跳指标中的眼跳信息后，再结合本发明所提供的上述计算与判断方法，得出是否为向前眼跳、行内回视、跨行回视及回视距离的关键特征指标。Since the current eye tracker cannot directly obtain the key characteristic indicators preferred by the present invention, after deriving the saccade information in the saccade indicators through the eye movement data analysis software, and then combined with the above calculation and judgment method provided by the present invention, it is concluded whether It is the key characteristic index of forward saccades, intra-line retrogazes, cross-line retrogazes and retrogaze distances.

下面通过测试各被试阅读语篇材料所得眼动数据，分别对整体指标和局部兴趣区指标进行分析，提取差异显著的眼动指标。Next, by testing the eye movement data obtained by each subject reading text materials, the overall indicators and local interest area indicators are analyzed respectively, and eye movement indicators with significant differences are extracted.

(1)整体指标分析(1) Overall indicator analysis

表1不同阅读水平的读者重读时整体眼动指标描述(n₁＝63,n₂＝40)Table 1 Description of overall eye movement indicators during rereading by readers with different reading levels (n ₁ = 63, n ₂ = 40)

表2不同阅读水平的读者重读时整体眼动指标重复测量结果(N＝103)Table 2 Repeated measurement results of overall eye movement indicators when readers with different reading levels reread (N=103)

从表1、表2中可以得出，除了注视时间比例和平均瞳孔大小，在总阅读时间(F_(1,101)＝107.25，p<0.001)、总注视次数(F_(1,101)＝103.83，p<0.001)、平均注视时间(F_(1,101)＝17.22，p<0.001)、向前眼跳次数(F_(1,101)＝476.69，p<0.001)上的重读主效应显著，重读后的总阅读时间、平均注视时间显著短于首次阅读，重读后的总注视次数、向前眼跳次数显著少于首次阅读；阅读水平的主效应及阅读次数和阅读水平之间的交互效应均不显著(p>0.05)。It can be concluded from Table 1 and Table 2 that, in addition to the proportion of fixation time and average pupil size, the total reading time (F _(1,101) =107.25, p<0.001) and the total number of fixations (F _(1,101) =103.83, p< 0.001), average fixation time (F _(1,101) = 17.22, p < 0.001), and number of forward saccades ( F _(1,101) = 476.69, p < 0.001). The main effect of rereading was significant. The total reading time after rereading, The average fixation time is significantly shorter than that of the first reading, and the total number of fixations and the number of forward saccades after rereading are significantly shorter than that of the first reading; the main effect of reading level and the interaction effect between the number of readings and reading level are not significant (p>0.05 ).

表3不同阅读水平的读者重读时回视指标描述(n₁＝63,n₂＝40)Table 3 Description of retrospection indicators when readers with different reading levels reread (n ₁ = 63, n ₂ = 40)

表4不同阅读水平的读者重读时回视指标重复测量结果(N＝103)Table 4 Repeated measurement results of retrospection indicators during rereading by readers with different reading levels (N=103)

从表3、表4中可以看出，除了行内平均回视时间和跨行平均回视时间，在平均回视距离(F_(1,101)＝7.77，p<0.01)、总回视次数(F_(1,101)＝82.74，p<0.001)、行内平均回视距离(F_(1,101)＝9.27，p<0.01)、行内回视次数(F_(1,101)＝90.97，p<0.001)、跨行回视次数(F_(1,101)＝703.79，p<0.001)、跨行平均回视距离(F_(1,101)＝11.54，p<0.001)上的主效应显著，重读后的平均回视距离和行内平均回视距离显著长于首次阅读，重读后的跨行回视距离显著短于首次阅读，总回视次数、行内回视次数、跨行回视次数显著少于首次阅读；除了跨行回视次数(F_(1,101)＝7.58，p<0.01)，在其他整体回视指标上阅读水平的主效应均不显著(p>0.05)，阅读水平低的读者跨行回视次数显著多于阅读水平高的读者；除了行内回视次数(F_(1,101)＝4.13，p<0.05)，在其他整体回视指标上，阅读次数和阅读水平之间的交互效应均不显著(p>0.05)。As can be seen from Table 3 and Table 4, in addition to the average look-back time within a row and the average look-back time across rows, the average look-back distance (F _(1,101) = 7.77, p<0.01), the total number of looks back (F _{(1,101) )} ＝82.74, p<0.001), average intra-row gaze distance (F _(1,101) ＝9.27, p<0.01), intra-line gaze times (F _(1,101) ＝90.97, p<0.001), cross-line gaze times (F _(1,101) ＝703.79, p<0.001) and the average gaze distance across lines (F _(1,101) ＝11.54, p<0.001) were significant. The average gaze distance after rereading and the average gaze distance within a line were significantly longer than those for the first time. Reading, the distance of cross-line glances after rereading is significantly shorter than that of first reading, and the number of total glances, the number of within-line glances, and the number of cross-line glances are significantly less than those of first reading; except for the number of cross-line glances (F _(1,101) = 7.58, p< 0.01), the main effect of reading level is not significant in other overall looking back indicators (p>0.05), readers with low reading level look back across lines significantly more times than readers with high reading level; except for the number of looking back within a line (F _{( 1,101)} =4.13, p<0.05). On other overall retrospection indicators, the interaction effect between reading times and reading level is not significant (p>0.05).

(2)局部分析(2)Local analysis

表5不同阅读水平的读者重读时局部眼动指标描述(n₁＝63,n₂＝40)Table 5 Description of local eye movement indicators during rereading by readers with different reading levels (n ₁ = 63, n ₂ = 40)

表6不同阅读水平的读者重读时局部眼动指标重复测量方差分析(N＝103)Table 6 Repeated measures variance analysis of local eye movement indicators when readers with different reading levels reread (N=103)

从表5、表6中可以看出，在所有局部眼动指标上阅读次数的主效应均显著，重读后的IA_总注视时间、IA_选择性回视路径阅读时间显著短于首次阅读，IA_总注视次数、IA_平均回视出次数显著少于首次阅读；除了IA_平均回视出次数(F_(1,101)＝4.34，p<0.05)和IA_选择性回视路径阅读时间(F_(1,101)＝6.41，p<0.05)，在其他局部指标上阅读水平的主效应均不显著(p>0.05)，阅读水平低的读者IA_平均回视出次数显著多于阅读水平高的读者，IA_选择性回视路径阅读时间显著短于阅读水平高的读者；除了IA_平均回视出次数(F_(1,101)＝4.60，p<0.05)、IA_选择性回视路径阅读时间(F_(1,101)＝4.38，p<0.05)，在其他局部眼动指标上，阅读次数和阅读水平之间的交互效应均不显著(p>0.05)。It can be seen from Table 5 and Table 6 that the main effect of reading times is significant on all local eye movement indicators. The IA_total fixation time and IA_selective retrospection path reading time after rereading are significantly shorter than those for the first reading. IA_total fixation times and IA_average look-out times were significantly less than the first reading; except for IA_average look-out times (F _(1,101) = 4.34, p<0.05) and IA_selective look-back path reading time ( F _(1,101) =6.41, p<0.05), the main effect of reading level on other local indicators is not significant (p>0.05), readers with low reading level IA_ average number of look backs are significantly more than those with high reading level For readers, the reading time of IA_selective retrospection path is significantly shorter than that of readers with high reading level; except for IA_average number of retrospections (F _(1,101) = 4.60, p<0.05), IA_selective retrospection path reading time (F _(1,101) =4.38, p<0.05). On other local eye movement indicators, the interaction effect between reading times and reading level was not significant (p>0.05).

表7不同阅读水平的读者重读时第一遍和第二遍局部眼动指标描述(n₁＝63,n₂＝40)Table 7 Description of local eye movement indicators for the first and second times when readers with different reading levels reread (n ₁ = 63, n ₂ = 40)

表8不同阅读水平的读者重读时第一遍和第二遍局部眼动指标重复测量方差分析(N＝103)Table 8 Repeated measures variance analysis of local eye movement indicators for the first and second times when readers with different reading levels reread (N=103)

从表7、表8中可以看出，在所有第一遍和第二遍局部眼动指标上阅读次数的主效应均显著，重读后的IA_平均首次注视时间、IA_平均第一遍阅读时间、IA_平均第二次注视时间、IA_平均第二遍阅读时间显著短于首次阅读，重读后的IA_平均第一遍注视次数、IA_平均第二遍注视次数显著少于首次阅读；除了IA_平均第一遍阅读时间(F_(1,101)＝5.11，p<0.05)和IA_平均第一遍注视次数(F_(1,101)＝4.90，p<0.05)，在其他第一遍和第二遍局部指标上阅读水平的主效应均不显著(p>0.05)，阅读水平低的读者IA_平均第一遍阅读时间显著长于阅读水平高的读者，IA_平均第一遍注视次数显著多于阅读水平高的读者；除了IA_平均第一遍注视次数(F_(1,101)＝4.44，p<0.05)、IA_均第二次注视时间(F_(1,101)＝6.99，p<0.05)，在其他第一遍和第二遍局部指标上，阅读次数和阅读水平之间的交互效应均不显著(p>0.05)。It can be seen from Table 7 and Table 8 that the main effect of the number of readings is significant on all local eye movement indicators of the first and second passes. IA_average first fixation time after rereading, IA_average first reading Time, IA_average second fixation time, IA_average second reading time are significantly shorter than first reading time, IA_average first fixation number, IA_average second fixation number after rereading are significantly shorter than first reading time ;Except for IA_average first-pass reading time (F _(1,101) =5.11, p<0.05) and IA_average first-pass fixation times (F _(1,101) =4.90, p<0.05), in other first-pass and The main effect of reading level on the local indicators of the second pass is not significant (p>0.05). The average first-pass reading time of readers with low reading level IA_ is significantly longer than that of readers with high reading level. The average number of IA_ first-pass fixations is significant. More than readers with high reading level; except IA_average first fixation times (F _(1,101) =4.44, p<0.05), IA_average second fixation time (F _(1,101) =6.99, p<0.05) , on other local indicators of the first and second passes, the interaction effect between the number of readings and the reading level is not significant (p>0.05).

通过上述实验分析，选取以上结果中对于首次阅读和重读差异显著的眼动指标，共20个关键特征指标，具体如下表所示。Through the above experimental analysis, a total of 20 key feature indicators were selected for the eye movement indicators that have significant differences between first reading and rereading in the above results, as shown in the table below.

【S05】以阅读眼球运动数据中的关键特征指标作为输入，构建基于神经网络的眼动预测模型，进行是否重读的分类判断。如图3所示，输入的数据为提取的20个关键特征指标，本发明初步选定前馈神经网络模型(FNN)，采用两层网络，中间神经元优选10个，输出变量为是否为首次阅读和是否为重读2个神经元；采用5折交叉验证，1份作为测试集，剩余4份作为训练集用于模型训练，重复第二步4次，同时为每个样本标注首次阅读和重读的标签。通过调整网络结构、激活函数、学习率、正则化等参数，对模型进行优化和调参，以提高模型的性能和泛化能力；使用测试集的数据对模型进行评估和测试，计算模型的准确率、精确率、召回率、F1值、loss等指标，评估模型的性能和预测能力。[S05] Taking the key feature indicators in reading eye movement data as input, construct an eye movement prediction model based on neural networks to classify whether to reread or not. As shown in Figure 3, the input data is the extracted 20 key feature indicators. The present invention initially selects the feedforward neural network model (FNN), using a two-layer network, with 10 intermediate neurons preferably, and the output variable is whether it is the first time. There are 2 neurons for reading and rereading; 5-fold cross-validation is used, 1 set is used as a test set, and the remaining 4 sets are used as training sets for model training. Repeat the second step 4 times, and mark the first reading and rereading for each sample. Tag of. Optimize and adjust parameters of the model by adjusting network structure, activation function, learning rate, regularization and other parameters to improve the performance and generalization ability of the model; use the data of the test set to evaluate and test the model, and calculate the accuracy of the model Rate, precision, recall, F1 value, loss and other indicators to evaluate the performance and prediction ability of the model.

下表是所得到的模型预测结果：The following table is the obtained model prediction results:

5折交叉验证结果(acc)如下:The 5-fold cross-validation results (acc) are as follows:

11 22 33 44 55 MEANMEAN 0.7140.714 0.7320.732 0.7560.756 0.7070.707 0.8060.806 0.740.74

由上表可以看出，其最终的平均准确率可达到74％。As can be seen from the table above, the final average accuracy can reach 74%.

全集结果如下表所示：The full set results are shown in the table below:

LOSSLOSS ACCACC PRECISIONPRECISION RECALLRECALL F1F1 0.5480.548 0.7690.769 0.7740.774 0.7880.788 0.7810.781

【S06】结合训练好的眼动预测模型，判断被试对新语篇材料是否重读。[S06] Combined with the trained eye movement prediction model, determine whether the subject stresses the new discourse material.

本发明根据被试阅读语篇时的20项眼动特征指标，可以以77％的准确率检测出个体是否熟悉文本材料，在教学领域有助于评估学生的知识掌握程度，在测谎领域有助于辅助检测是否说谎，在测谎领域有着重要的应用价值。Based on 20 eye movement characteristic indicators when subjects read text, this invention can detect whether an individual is familiar with text materials with an accuracy of 77%. It is helpful in the teaching field to evaluate students' knowledge mastery and is useful in the field of lie detection. It helps to assist in detecting lies and has important application value in the field of lie detection.

显然，上述实施例仅仅是为清楚地说明所作的举例，而并非对实施方式的限定。对于所属领域的普通技术人员来说，在上述说明的基础上还可以做出其它不同形式的变化或变动。这里无需也无法对所有的实施方式予以穷举。而由此所引伸出的显而易见的变化或变动仍处于本发明的保护范围之中。Obviously, the above-mentioned embodiments are only examples for clear explanation and are not intended to limit the implementation. For those of ordinary skill in the art, other different forms of changes or modifications can be made based on the above description. An exhaustive list of all implementations is neither necessary nor possible. The obvious changes or modifications derived therefrom are still within the protection scope of the present invention.

Claims

1. A method for establishing a prediction model based on the classification model's awareness of information, which is characterized by presenting the text material to be read on the display; the subject completes the first reading of the text material and collects the process of reading the text material for the first time After confirming that the subject has read carefully, the text material that the subject has read is presented on the monitor again for re-reading, and the eye movement data of the subject during the process of re-reading the text material is collected; extracted twice Key feature indicators in the subject's eye movement data during reading text materials; using the key feature indicators in the eye movement data as input, a neural network-based eye movement prediction model is constructed to classify and judge whether to reread; combined with the trained The eye movement prediction model determines whether the subject stresses the new discourse material.

2. The method for establishing a predictive model based on the classification model's awareness of information according to claim 1, characterized in that after the subject completes the first reading, reading questions are presented on the display for the subject to answer. When the subject reaches After setting the correct rate of answering questions, it is confirmed that the subject has carefully read the text material; after the subject completes the re-reading of the text, the subject will be asked to answer again by changing the order of presentation of the first reading questions on the monitor.

3. The method for establishing a prediction model based on the classification model's awareness of information according to claim 1, characterized in that the key feature indicators in the extracted reading eye movement data are eye movements with significant differences between first reading and rereading. index.

4. The method for establishing a predictive model based on the classification model's knowledge of information according to claim 3, characterized in that the extracted key feature indicators include overall indicators and local interest area indicators;

The overall indicators include: total reading time, fixation time ratio, total number of fixations, average fixation time, number of forward saccades, average pupil size, average lookback distance, total number of lookbacks, average lookback time within a line, average within a line Looking back distance, number of looking back within a line, number of looking back across lines, average looking back time across lines, and average looking back distance across lines;

The local area of interest indicators include: IA_total fixation time, IA_total number of fixations, IA_average number of look-backs, IA_selective look-back path reading time, IA_average first fixation time, IA_average first fixation time Reading time, IA_Average number of first fixations, IA_Average second fixation time, IA_Average second reading time, and IA_Average number of second fixations.

5. The method for establishing a predictive model based on the classification model's knowledge of information according to claim 4, characterized in that the method for judging within-line glances, cross-line glances and forward saccades is:

Step 1. Determine the line spacing A pix of the discourse material, the top Y-axis coordinate B pix of the first line, the starting point coordinate current_sac_start_Y and the end point coordinate current_sac_end_Y of the saccade event;

Step 2. Calculate the number of rows where the saccade event starts and round it up:

start_row＝(“current_sac_start_Y”-B)/A

Step 3. Calculate the number of rows where the end point of the saccade event is and round it up:

end_row＝(“current_sac_end_Y”-B)/A

Step 4. Calculate the row skip value row_skip="end_row"-"start_row";

Step 5. Calculate x_skip within the same saccade,

x_skip="current_sac_end_X"-"current_sac_start_X"

If row_skip<0, it is a cross-row saccade; if row_skip=0 and x_skip<0, it is an intra-row saccade; if row_skip>0, it is a forward saccade.

6. The prediction model establishment method based on the classification model's awareness of information according to claim 5, characterized in that the calculation method of the intra-line look-back distance and the cross-line look-back distance is:

In-line lookback distance (number): characters=x_skip/C;

Looking back distance across rows (characters): characters=(D*row skip-x_skip)/C;

Among them, C is the known pixel size of each letter, and D is the length dimension of the rectangular discourse material, in pix.