
CN119888762A - Handwriting mathematical formula recognition method and system


Info

Publication number
CN119888762A
Authority
CN
China
Prior art keywords
model
mathematical formula
sequence
tree structure
recognition
Prior art date
Legal status
Pending
Application number
CN202411790409.8A
Other languages
Chinese (zh)
Inventor
高良才
赵文祺
颜钦钦
朱建华
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
Filing date
Publication date
Application filed by Peking University
Priority to CN202411790409.8A
Publication of CN119888762A
Legal status: Pending (current)


Classifications

    • G06V30/226 Character recognition characterised by the type of writing of cursive writing
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V30/19007 Matching; Proximity measures
    • G06V30/19173 Classification techniques
    • G06V30/1918 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Discrimination (AREA)

Abstract


The present invention belongs to the fields of information technology and image recognition, and relates to a method and system for handwritten mathematical formula recognition. The method comprises: constructing annotated data of handwritten mathematical formulas; constructing an initial handwritten mathematical formula recognition model and training it with the annotated data by jointly optimizing a sequence decoding task and a tree structure prediction task, thereby obtaining a handwritten mathematical formula recognition model; and recognizing handwritten mathematical formulas with the trained model. By combining the advantages of sequence decoding models and tree decoding models, the invention improves the model's understanding of mathematical formula structure and its generalization ability, and significantly increases recognition accuracy while retaining the efficient parallel training of existing encoder-decoder methods.

Description

Handwriting mathematical formula recognition method and system
Technical Field
The invention belongs to the technical fields of information technology and image recognition, and in particular relates to a handwritten mathematical formula recognition method and system.
Background
Handwritten mathematical formula recognition identifies and parses mathematical formulas in handwritten material and has important applications in education, scientific research, engineering design, data analysis, and other fields. It allows complex mathematical formulas to be converted and processed quickly and accurately, reducing repetitive work. In education, for example, it can help teachers and students convert formulas written on a blackboard or in notes into electronic documents, improving the efficiency of teaching and learning. In scientific research, researchers can use it to extract and analyze formulas from manuscripts quickly, accelerating their work. Engineers and data analysts can likewise use it to convert and apply complex formulas in design and data processing, improving both efficiency and accuracy. Efficient recognition and digitization of handwritten formulas therefore lets users integrate and reuse this information more effectively and promotes development in the related fields.
Handwritten mathematical formula recognition is divided into online recognition and offline recognition, which differ in their input: online input is dynamic, whereas offline input is static. Online input is a trajectory of handwritten strokes that records properties such as stroke order and curvature, while offline input is an image containing a handwritten mathematical formula. The recognition method of the present invention is an offline handwritten mathematical formula recognition method.
A mathematical formula consists of symbols and the relations between them. The symbols include Arabic numerals, English letters, Greek letters, operators, and other special characters. Unlike the linear order of ordinary text, the relations between symbols are multi-level and highly structured: constructs such as fractions, radicals, and integrals involve not only basic characters but also complex structures such as superscripts, subscripts, and nested symbols, which greatly increases the difficulty of recognition. A core challenge in handwritten mathematical formula recognition is therefore how to model both the symbols and the structural relations of a formula efficiently. A recognition method must not only recognize the characters in the image accurately but, more importantly, reconstruct the complex hierarchical relations among them. Recognition is made harder still by the diversity of handwriting, the visual similarity of handwritten symbols, differences in writing style, the complex structure of mathematical formulas, and their dependence on context.
Traditional handwritten mathematical formula recognition uses rule-based methods. With the continued progress of deep learning, encoder-decoder recognition methods tightly couple the input image with the recognized content, so that the model can fully understand the image content and transcribe it into an accurate character sequence. These methods have stronger modeling capability and perform markedly better than traditional methods.
In computer systems, mathematical formulas are mainly represented either as sequences or as tree structures. Correspondingly, encoder-decoder methods for handwritten mathematical formula recognition have developed two kinds of decoders: sequence decoding models and tree decoding models. Sequence decoding methods use the LaTeX expression as the decoding target. Their advantage is that LaTeX is a widely adopted representation of mathematical formulas, so these methods can readily build on existing results in natural language processing, optical character recognition, and related fields, and are more general. Tree decoding methods treat a mathematical formula as a tree structure and decode it by prediction. Their design fully incorporates the properties of the tree structure, which in theory strengthens the model's understanding of mathematical formulas and guarantees that the model always decodes a syntactically valid formula during inference.
Methods based on sequence decoding models have certain limitations. First, during training the model does not fully take into account the tree structure of mathematical formulas, which may leave it with an insufficient understanding of formula structure and thus weaken its generalization to formulas with complex structures. Second, when the LaTeX expression is used as the target of sequence prediction, the grammatical validity of the decoded result cannot be guaranteed. For example, when processing complex mathematical formulas, the model may produce unmatched left and right brackets, yielding predictions that violate LaTeX grammar.
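To make the second limitation concrete, the following Python sketch (an illustration, not part of the patent) checks one grammar constraint that free-form sequence decoding can violate: LaTeX grouping braces must be balanced. Only `{` and `}` are checked, because mismatched-looking parentheses and brackets, as in the half-open interval `[0, 1)`, are legal LaTeX. Splitting the prediction into space-separated tokens is an assumption.

```python
# Sketch: check that a tokenized LaTeX prediction has balanced grouping braces.
# Assumes space-separated LaTeX tokens; escaped literal braces (\{ and \}) are
# separate tokens and are deliberately not counted.
def braces_balanced(tokens):
    """Return True if every '{' token is closed by a later '}' token."""
    depth = 0
    for tok in tokens:
        if tok == "{":
            depth += 1
        elif tok == "}":
            depth -= 1
            if depth < 0:  # a '}' with no open group
                return False
    return depth == 0  # every group that was opened must also be closed


print(braces_balanced(r"\frac { a } { b }".split()))  # True
print(braces_balanced("x ^ { 2".split()))             # False
```

A tree decoder cannot emit the second output, because it never produces grouping tokens independently of the structure they enclose.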
Existing methods based on tree decoding models also have drawbacks. First, these models are built on recurrent neural networks and therefore cannot be trained efficiently in parallel. In addition, their decoding procedures are relatively complex and lack the generality and wide applicability of LaTeX expressions, and their performance is generally inferior to that of sequence decoding methods. As a result, tree decoding methods have not been widely adopted.
Disclosure of Invention
In view of the respective advantages and disadvantages of the two kinds of decoding models, the invention combines the strengths of the sequence decoding model and the tree decoding model and provides a handwritten mathematical formula recognition method and system based on formula tree structure perception.
The technical solution adopted by the invention is as follows:
A handwritten mathematical formula recognition method, comprising the steps of:
constructing handwritten mathematical formula annotation data;
constructing an initial handwritten mathematical formula recognition model, training the initial model with the annotation data, and jointly optimizing a sequence decoding task and a tree structure prediction task to obtain a handwritten mathematical formula recognition model;
and recognizing handwritten mathematical formulas with the trained handwritten mathematical formula recognition model.
Further, constructing the handwritten mathematical formula annotation data includes converting each mathematical formula into a tree structure, which is represented as a series of tuples, where each tuple (c, p) contains a child node c of a subtree and its parent node p.
Further, the structure of the initial model includes:
a visual encoding model for extracting high-level visual features from the input handwritten mathematical formula image;
an image position encoding module and a character position encoding module for providing explicit position information for the visual features and the word embedding vectors, respectively;
a decoding model for outputting a sequence of character semantic features;
a formula tree structure perception module for constructing a formula structure tree from the feature vectors output by the decoding model;
and a projection layer for autoregressively decoding the mathematical formula sequence from the feature vectors output by the decoding model.
Further, the decoding model is a Transformer decoder that incorporates a coverage attention mechanism.
Further, the processing of the formula tree structure perception module comprises:
taking as input the character semantic feature sequence $X$ extracted by the decoding model;
extracting features from $X$ with a Transformer encoder to obtain $X'$;
mapping the feature vectors $X'$ output by the Transformer encoder into child node vectors and parent node vectors through two linear projection functions $F_c(\cdot)$ and $F_p(\cdot)$, respectively;
adding the child node vectors and parent node vectors pairwise to obtain a relation feature matrix $M$, where the vector $M_{i,j}$ at each position of $M$ is the relation feature vector for the character with index $i$ as the child node and the character with index $j$ as the parent node;
converting the relation feature matrix $M$ into a relation score matrix $S$ with a function $F_{score}(\cdot)$ consisting of a ReLU activation and a vector dot product.
Each element $S_{i,j}$ of the relation score matrix $S$ output by the formula tree structure perception module is the relation score for the $i$-th character as the child node and the $j$-th character as the parent node; the higher the score, the more likely the parent-child relation holds. For the highest-scoring element $S_{a,b}$ in each row, the model is considered to predict a parent-child relation with symbol $a$ as the child and symbol $b$ as the parent; establishing these relations among the characters constructs the entire mathematical formula tree.
Further, the training strategy of the initial model comprises training the model with the combination of the loss function of the sequence decoding task and the loss function of the tree structure prediction task.
The loss function of the sequence decoding task is a cross-entropy loss over the predicted character probabilities, where $X_t$ denotes the feature vector output by the decoding model at time step $t$, $W_o$ denotes the linear projection parameter matrix, and $b_o$ denotes the bias vector.
The loss function of the tree structure prediction task is a cross-entropy loss over the relation score matrix $S$ output by the formula tree structure perception module, which is used to estimate the probability of a parent-child relation between each character $t$ and the other characters.
Further, recognizing handwritten mathematical formulas with the trained model comprises:
acquiring a handwritten mathematical formula image to be recognized;
preprocessing the image;
inputting the preprocessed image into the trained handwritten mathematical formula recognition model, which predicts each symbol in the formula image and generates an initial recognition sequence;
applying a beam search strategy to the initial recognition sequence to generate a set of candidate sequences, and computing a sequence decoding score for each candidate;
computing the relation score matrix of each candidate sequence with the formula tree structure perception module, and from it a tree structure prediction score for each candidate;
and adding the sequence decoding score and the tree structure prediction score, and selecting the candidate sequence with the highest combined score as the final output sequence.
A handwritten mathematical formula recognition system, comprising:
an annotation data construction module for constructing handwritten mathematical formula annotation data;
a model training module for constructing an initial handwritten mathematical formula recognition model, training the initial model with the annotation data, and jointly optimizing a sequence decoding task and a tree structure prediction task to obtain a handwritten mathematical formula recognition model;
and a formula recognition module for recognizing handwritten mathematical formulas with the trained model.
The beneficial effects of the invention are as follows:
By fusing sequence decoding and tree decoding methods, the invention effectively improves the model's understanding of mathematical formula structure and its generalization ability, and significantly improves the accuracy of handwritten mathematical formula recognition while retaining the efficient parallel training of existing encoder-decoder methods. Specifically, during training the invention jointly optimizes the two tasks, training the model's understanding of both the sequential arrangement and the tree structure of mathematical formulas. During inference, the invention integrates the tree structure prediction score into the beam search algorithm, so that when generating a LaTeX expression the model considers both the sequence decoding of the formula and the plausibility of its tree structure, which further improves recognition accuracy. In summary, the proposed method improves the ability of sequence decoding models to recognize handwritten mathematical formulas with complex structures.
Drawings
Fig. 1 is a schematic diagram of the structure of the model of the present invention.
Fig. 2 is a schematic diagram of the formula tree structure perception module.
Fig. 3 shows a sample image of a handwritten formula to be recognized.
Detailed Description
The present invention will be further described in detail with reference to the following examples and drawings, so that the above objects, features and advantages of the present invention can be more clearly understood.
The invention combines the advantages of sequence decoding models and tree decoding models and provides a handwritten mathematical formula recognition method based on formula tree structure perception. Building on the flexibility and efficient training of the sequence decoding model, it introduces a formula tree structure perception module that improves the understanding and generalization of complex formula structures. By jointly optimizing the two tasks of sequence prediction and tree structure prediction, the method decodes the LaTeX sequence while taking the plausibility of the formula tree structure into account, so that the recognition result is more complete and accurate in its grammatical structure. Experimental results on multiple datasets show that the proposed method achieves better recognition performance than conventional sequence decoding and tree decoding models, demonstrating the potential of combining sequence and tree decoding for handwritten mathematical formula recognition.
The invention is an encoder-decoder-based handwritten mathematical formula recognition method that performs recognition with a deep neural network model. First, handwritten mathematical formula annotation data are constructed and an initial handwritten mathematical formula recognition model is built; the initial model is trained with the annotation data so that it automatically learns the visual features of handwritten mathematical formulas and the semantic structure of LaTeX expressions, yielding the final model. The trained final model is then used to recognize handwritten mathematical formulas.
Step 1: Construct handwritten mathematical formula annotation data by converting each mathematical formula, given as a LaTeX expression, into a tree structure.
Specifically, the invention represents the tree structure of a mathematical formula as a series of tuples, where each tuple (c, p) contains the child node c of a subtree and its parent node p. Notably, to disambiguate repeated symbols and to remain compatible with the LaTeX labels used by the sequence decoding model, the invention uses the index of each symbol in the LaTeX expression, rather than the symbol itself, as its node identifier in the formula tree. For example, the mathematical formula $3^2-1=8$ in Fig. 1 has the LaTeX label "3 ^ { 2 } - 1 = 8", and its formula tree label can be written as "(0, -1), (1, -1), (2, -1), (3, 0), (4, -1), (5, 0), (6, 5), (7, 6), (8, 7)", where -1 indicates that the node has no parent, i.e., it does not participate in building the tree structure. Here the base 3 (index 0) is the root, the superscript 2 (index 3) attaches to it, the remaining baseline symbols form a chain (- to 3, 1 to -, = to 1, 8 to =), and the structural tokens ^, {, and } (indices 1, 2, 4) have no parent. In this way the sequence labels and the tree structure labels of a formula are unified, which allows the model to jointly optimize the sequence decoding and tree structure prediction tasks end to end.
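The patent gives the tree labels for one example but not a general conversion rule. The hypothetical Python rule below is one way to produce such labels, and it reproduces the $3^2-1=8$ example exactly: baseline symbols chain to the previous symbol on the same level, the first symbol inside a braced `^` or `_` group attaches to the script's base, and the structural tokens `^`, `_`, `{`, `}` receive -1. It assumes well-formed, space-separated tokens and does not handle constructs such as `\frac` or unbraced scripts.

```python
# Hypothetical labeling rule (an assumption, not the patent's published rule):
# baseline symbols chain to the previous symbol on the same level, the first
# symbol in a braced ^/_ group attaches to the script's base, and structural
# tokens (^, _, {, }) get -1 ("no parent").
def tree_labels(tokens):
    labels = []
    prev_stack = [-1]    # index of the last content token on each nesting level
    script_base = None   # base symbol of a pending ^ or _ script
    for i, tok in enumerate(tokens):
        if tok in ("^", "_"):
            script_base = prev_stack[-1]
            labels.append((i, -1))
        elif tok == "{":
            # Open a nested level whose first symbol hangs off the base.
            base = script_base if script_base is not None else prev_stack[-1]
            prev_stack.append(base)
            script_base = None
            labels.append((i, -1))
        elif tok == "}":
            prev_stack.pop()  # back to the outer level; its last symbol is unchanged
            labels.append((i, -1))
        else:
            labels.append((i, prev_stack[-1]))
            prev_stack[-1] = i
    return labels


tokens = "3 ^ { 2 } - 1 = 8".split()
print(tree_labels(tokens))
# [(0, -1), (1, -1), (2, -1), (3, 0), (4, -1), (5, 0), (6, 5), (7, 6), (8, 7)]
```

Because the labels are token indices, the same list lines up position by position with the LaTeX target sequence, which is what makes joint training on both tasks straightforward.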
Step 2: Construct the initial handwritten mathematical formula recognition model.
The model structure of the invention is shown in Fig. 1 and mainly comprises five parts. 1) A visual encoding model, which extracts semantically rich, high-level visual features from the input handwritten mathematical formula image; it uses DenseNet, which is widely used in handwritten mathematical formula recognition, as the backbone network. 2) An image position encoding module and a character position encoding module, which provide explicit position information for the visual features and the word embedding vectors, respectively; both use the same sine-cosine positional encoding. 3) A decoding model; the invention adopts a Transformer decoder that incorporates a coverage attention mechanism. 4) A formula tree structure perception module, which constructs a formula structure tree from the feature vectors output by the decoding model, helping the model better capture the structural information of mathematical formulas. 5) A projection layer, which autoregressively decodes the LaTeX sequence from the feature vectors output by the decoding model.
The visual encoding model can be implemented with DenseNet or a similar model.
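Both position encoding modules use the same sine-cosine scheme. A minimal sketch, assuming the standard Transformer formulation $PE(pos, 2k) = \sin(pos / 10000^{2k/d})$ and $PE(pos, 2k+1) = \cos(pos / 10000^{2k/d})$, is shown below. The 2D helper, which spends half of the channels on the row index and half on the column index of a visual feature map, is a hypothetical extension; the patent does not say how the image position encoding handles two dimensions.

```python
import math


def sinusoidal_encoding(position, d_model):
    """Sine/cosine positional encoding for one position (d_model must be even)."""
    pe = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        pe.append(math.sin(angle))  # even dimension 2k
        pe.append(math.cos(angle))  # odd dimension 2k + 1
    return pe


def encode_2d(row, col, d_model):
    """Hypothetical 2D variant: first half of channels encodes row, second half column."""
    half = d_model // 2  # assumes half is itself even
    return sinusoidal_encoding(row, half) + sinusoidal_encoding(col, half)


print(sinusoidal_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```

For the character position encoding the 1D function is applied to the token index; for visual features it would be applied to each feature-map cell before the decoder attends to it.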
The decoding model, shown in Fig. 1, comprises a self-attention module, a coverage attention module, and a feed-forward neural network. The self-attention module models the interaction between the current decoding position and the preceding positions. The coverage attention module captures the interaction between the decoder and the encoder, guiding the decoder to generate the final output sequence from the features produced by the encoder. The feed-forward network applies a nonlinear transformation to the representations produced by the attention layers, enhancing the model's expressive power and yielding more accurate feature representations.
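The description names coverage attention but gives no formula for it. The sketch below is a deliberately simplified, hypothetical variant that shows only the core idea: a coverage vector accumulates the attention weights of earlier decoding steps, and positions that have already received attention are penalized so that the decoder is less likely to transcribe the same image region twice. The linear penalty `lam * coverage` is an assumption; learned coverage models usually pass the coverage vector through trainable layers instead.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def coverage_attention(step_scores, lam=1.0):
    """Simplified coverage-style attention over encoder positions.

    step_scores[t][i] is a raw decoder-encoder relevance score at decoding step t.
    The coverage vector sums the attention weights of all earlier steps, and
    lam * coverage (a hypothetical linear penalty) is subtracted from the logits.
    """
    coverage = [0.0] * len(step_scores[0])
    weights_per_step = []
    for scores in step_scores:
        logits = [s - lam * c for s, c in zip(scores, coverage)]
        weights = softmax(logits)
        weights_per_step.append(weights)
        coverage = [c + w for c, w in zip(coverage, weights)]
    return weights_per_step


w = coverage_attention([[2.0, 0.0, 0.0], [2.0, 0.0, 0.0]], lam=5.0)
print(w[0][0] > w[1][0])  # True: position 0 is attended less once it is covered
```

With identical raw scores at both steps, the second step shifts attention away from position 0, which is the behavior coverage attention is meant to encourage.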
An overview of the architecture of the proposed formula tree structure perception module is shown in Fig. 2, where the "SOS" token marks the start of the sequence. The input of the module is the character semantic feature sequence $X \in \mathbb{R}^{T\times d}$ extracted by the decoding model, where $T$ is the length of the LaTeX sequence and $d$ is the feature dimension of each character. To map the semantic features of the characters into the semantic space of the formula tree structure, a Transformer encoder extracts features from $X$ to obtain $X' \in \mathbb{R}^{T\times d}$, in which the tree structure relations are subsequently built:
$$X' = \mathrm{TransformerEncoder}(X)$$
Then, the feature vectors $X'$ output by the Transformer encoder are mapped into child node vectors $X_c \in \mathbb{R}^{T\times d}$ and parent node vectors $X_p \in \mathbb{R}^{T\times d}$ by two linear projection functions $F_c(\cdot)$ and $F_p(\cdot)$, respectively:
$$X_c = F_c(X') = X' W_c,$$
$$X_p = F_p(X') = X' W_p,$$
where $W_c \in \mathbb{R}^{d\times d}$ and $W_p \in \mathbb{R}^{d\times d}$ are trainable linear projection matrices and $d$ is the feature dimension of each character. To build the formula tree relations between characters, the invention adds the child node vectors and parent node vectors pairwise to obtain a relation feature matrix $M$ with $M_{i,j} = X_{c,i} + X_{p,j}$, where $X_{c,i}$ is the $i$-th row of $X_c$ and $X_{p,j}$ is the $j$-th row of $X_p$. Each vector $M_{i,j} \in \mathbb{R}^d$ is the relation feature vector for the character with index $i$ in the LaTeX expression as the child node and the character with index $j$ as the parent node.
Finally, the invention uses a function $F_{score}(\cdot)$, consisting of a ReLU activation and a vector dot product, to convert the relation feature matrix $M$ into a relation score matrix $S \in \mathbb{R}^{T\times T}$:
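$$S = F_{score}(M) = \max(0, M)\, v_s,$$
where $v_s \in \mathbb{R}^d$ is a weight vector and the dot product is taken for each position, i.e., $S_{i,j} = \max(0, M_{i,j}) \cdot v_s$.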
Each element $S_{i,j}$ of the relation score matrix $S$ output by the formula tree structure perception module is the relation score for the $i$-th character as the child node and the $j$-th character as the parent node; a higher score means that the parent-child relation is more likely. In a mathematical formula tree, every node except the root has exactly one parent. Therefore, for the highest-scoring element $S_{a,b}$ in each row, the model can be considered to predict a parent-child relation with symbol $a$ as the child and symbol $b$ as the parent; establishing these relations among all characters constructs the entire mathematical formula tree.
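The score computation of the perception module can be sketched in plain Python, taking the Transformer encoder output $X'$ as given. The sketch implements $X_c = X'W_c$, $X_p = X'W_p$, $M_{i,j} = X_{c,i} + X_{p,j}$, and $S_{i,j} = \max(0, M_{i,j}) \cdot v_s$, then picks the highest-scoring parent in each row. The helper names and toy values are hypothetical, and the sketch neither masks the diagonal nor treats the "SOS" position specially, since the text does not say whether the model does.

```python
def matmul(a, b):
    """Multiply a (n x k) by b (k x m), both given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relation_scores(x_prime, w_c, w_p, v_s):
    """S[i][j] = v_s . ReLU(Xc[i] + Xp[j]): score for 'character i is a child of j'."""
    xc = matmul(x_prime, w_c)  # child-node vectors, T x d
    xp = matmul(x_prime, w_p)  # parent-node vectors, T x d
    t = len(x_prime)
    return [[sum(max(0.0, ci + pj) * v for ci, pj, v in zip(xc[i], xp[j], v_s))
             for j in range(t)]
            for i in range(t)]


def predict_parents(scores):
    """For each row (child), return the column index (parent) with the highest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy values, chosen only so the ReLU and the projections visibly matter.
x = [[1.0, -3.0], [0.0, 0.5]]
w_c = [[1.0, 0.0], [0.0, 1.0]]   # identity projection for children
w_p = [[0.0, 1.0], [1.0, 0.0]]   # channel-swapping projection for parents
S = relation_scores(x, w_c, w_p, [1.0, 1.0])
print(S)                   # [[0.0, 1.5], [1.5, 1.0]]
print(predict_parents(S))  # [1, 0]
```

In a tensor library the pairwise sum is usually written with broadcasting over a child axis and a parent axis instead of explicit loops.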
Step 3: Train the initial model with the annotation data to obtain the final model.
1) Training strategy
To combine the sequence decoding and tree decoding tasks so that the model can be optimized end to end on both, the invention trains the model with the combination of the sequence decoding loss and the tree structure prediction loss.
Specifically, for a LaTeX expression sequence $y_1, \dots, y_T$, in the sequence decoding task the feature vectors $X \in \mathbb{R}^{T\times d}$ output by the decoding model are used to compute the probability of each character at time step $t$, and the loss $\mathcal{L}_{seq}$ of the sequence decoding task is then computed with a cross-entropy loss function, where $X_t$ is the feature vector output by the decoding model at time step $t$, $W_o$ is the linear projection parameter matrix, and $b_o$ is the bias vector.
In the tree structure prediction task, the relation score matrix $S \in \mathbb{R}^{T\times T}$ output by the formula tree structure perception module is used to estimate the probability of a parent-child relation between each character $t$ and the other characters, and the loss $\mathcal{L}_{struct}$ of the tree structure prediction task is likewise computed with a cross-entropy loss function.
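The formulas for the two losses did not survive extraction. Assuming standard token-level softmax cross entropy, which is what the text describes, they can plausibly be written as below; the symbol $I$ for the input image, the parent label $p_t$ of character $t$, and the set $\mathcal{P}$ of characters whose parent label is not $-1$ are notation introduced here as assumptions, not taken from the patent.

$$p(y_t \mid y_{<t}, I) = \mathrm{softmax}(W_o X_t + b_o)_{y_t}, \qquad \mathcal{L}_{seq} = -\sum_{t=1}^{T} \log p(y_t \mid y_{<t}, I)$$

$$\mathcal{L}_{struct} = -\sum_{t \in \mathcal{P}} \log \frac{\exp(S_{t,p_t})}{\sum_{j=1}^{T} \exp(S_{t,j})}$$

Whether the patent sums or averages over time steps, and how nodes labeled $-1$ enter the structure loss, is not stated.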
Finally, the losses of the sequence decoding and tree structure prediction tasks are added to obtain the training loss $\mathcal{L}$ of the method:
$$\mathcal{L} = \mathcal{L}_{seq} + \mathcal{L}_{struct}$$
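As a hedged illustration of the combined objective, the following plain-Python sketch computes $\mathcal{L} = \mathcal{L}_{seq} + \mathcal{L}_{struct}$ from precomputed logits. It assumes summed cross entropy and skips characters whose parent label is -1; both choices are assumptions, and the function names are hypothetical. In PyTorch, which the description names as the training framework, `torch.nn.functional.cross_entropy` with `reduction="sum"` and `ignore_index=-1` performs the same computation on tensors.

```python
import math


def log_softmax(row):
    m = max(row)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def cross_entropy(rows, targets, ignore_index=-1):
    """Summed cross entropy; rows whose target equals ignore_index are skipped."""
    return -sum(log_softmax(row)[tgt]
                for row, tgt in zip(rows, targets) if tgt != ignore_index)


def joint_loss(token_logits, tokens, relation_scores, parents):
    """L = L_seq + L_struct, both as token-level cross entropy (sketch)."""
    l_seq = cross_entropy(token_logits, tokens)          # rows are W_o X_t + b_o
    l_struct = cross_entropy(relation_scores, parents)   # rows of S, target = parent index
    return l_seq + l_struct


# One decoding step over a 2-symbol vocabulary, and a 2-character tree whose
# first character is the root (label -1, skipped).
print(joint_loss([[0.0, 0.0]], [1], [[0.0, 0.0], [0.0, 0.0]], [-1, 0]))  # 2 * ln 2
```

Because both terms are ordinary cross entropies over rows of logits, the structure loss adds little cost and trains in parallel with the sequence loss, which is the training-efficiency argument the description makes.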
2) Model training
The initial model can be built with the PyTorch framework and trained in a supervised manner on the annotation data, with the optimization hyperparameters tuned according to experimental conditions, to obtain the final model.
Step 4: Recognize handwritten mathematical formulas with the trained final model.
1) Acquire the formula image to be recognized, for example by scanning or by photographing with a mobile device. A sample is shown in Fig. 3.
2) Preprocess the handwritten formula image: apply image enhancement and noise removal, then perform skew correction and scale normalization so that the image to be recognized has the same scale and orientation as the images in the training dataset.
3) Input the preprocessed formula image into the trained final model, which predicts each symbol in the image and generates an initial recognition sequence.
4) Apply a beam search strategy to the initial recognition sequence to generate a set of candidate sequences, and compute a sequence decoding score $S_{seq}(y)$ for each candidate sequence $y$ from the beam search scores of forward and reverse sequence decoding.
5) Compute the relation score matrix of each candidate sequence with the formula tree structure perception module, construct the corresponding tree structure labels from the candidate sequence $y$, compute the cross-entropy loss between the relation score matrix and these labels, and take the negative cross-entropy loss as the tree structure prediction score $S_{struct}(y)$ of the candidate.
6) Add the sequence decoding score and the tree structure prediction score, $S_{seq}(y) + S_{struct}(y)$, and select the candidate sequence with the highest combined score as the final output sequence.
By considering the sequence decoding score and the tree structure prediction score together, the method favors output sequences that are both accurate as sequences and consistent with a plausible formula tree structure.
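The re-ranking in steps 4) to 6) can be sketched as follows. This is a minimal illustration under assumptions: the beam search itself and the construction of tree labels from $y$ are not shown (each candidate carries precomputed scores and labels), the `Candidate` fields are hypothetical names, and $S_{seq}(y)$ is assumed to be the sum of the forward and reverse decoding log-scores, since the exact combination is not stated.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def log_softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(z)
    log_norm = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - log_norm for v in z]


@dataclass
class Candidate:
    tokens: List[str]                   # candidate LaTeX token sequence y
    forward_score: float                # left-to-right beam-search log-score
    reverse_score: float                # right-to-left beam-search log-score
    relation_scores: List[List[float]]  # T x T relation score matrix S for this candidate
    parents: List[int]                  # tree labels built from y (parent index per token)


def seq_score(c: Candidate) -> float:
    # Assumption: S_seq(y) is the sum of the forward and reverse decoding scores.
    return c.forward_score + c.reverse_score


def struct_score(c: Candidate) -> float:
    # S_struct(y): negative cross-entropy between S and the tree labels of y.
    return sum(log_softmax(row)[p] for row, p in zip(c.relation_scores, c.parents))


def rerank(candidates: Sequence[Candidate]) -> Candidate:
    """Return the candidate with the highest S_seq(y) + S_struct(y)."""
    return max(candidates, key=lambda c: seq_score(c) + struct_score(c))
```

With this scoring, a candidate with a slightly lower sequence score can still win if its predicted tree structure agrees much better with its own tree labels.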
The key component of the invention is the formula tree structure perception module. By combining sequence-based decoding with tree-based decoding, the module improves the model's understanding of mathematical formula structure and its generalization ability. Specifically, in the training stage the two tasks are optimized jointly, so that the model learns both the sequential arrangement and the tree structure of mathematical formulas. In the inference stage, the invention integrates the tree structure prediction score into the beam search algorithm, so that when generating a LaTeX expression the model considers both the sequence decoding score and the plausibility of the formula's tree structure, further improving recognition accuracy.
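The module's computation as recited in claim 5 (child and parent projections, pairwise addition into $M$, a ReLU followed by a dot product to obtain $S$, and a per-row argmax to pick each character's parent) can be sketched in pure Python. Assumptions: the Transformer encoder that produces $X'$ is omitted and its output is passed in directly, $F_c$ and $F_p$ are plain linear layers, and the "vector dot product" in $F_{score}$ is taken as a dot product with a learnable weight vector `w`, a hypothetical parameter name.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def linear(W: Sequence[Sequence[float]], x: Sequence[float], b: Sequence[float]) -> List[float]:
    """Return W x + b (a plain linear projection)."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def relation_scores(X_enc, W_c, b_c, W_p, b_p, w) -> Matrix:
    """Compute S[i][j] = w . ReLU(F_c(x_i) + F_p(x_j)) for encoded features X_enc (= X')."""
    child = [linear(W_c, x, b_c) for x in X_enc]   # F_c: child-node vectors
    parent = [linear(W_p, x, b_p) for x in X_enc]  # F_p: parent-node vectors
    S: Matrix = []
    for c in child:
        row = []
        for p in parent:
            m_ij = [ck + pk for ck, pk in zip(c, p)]  # M[i][j] = child_i + parent_j
            row.append(sum(wk * max(0.0, mk) for wk, mk in zip(w, m_ij)))  # ReLU, then dot with w
        S.append(row)
    return S


def predict_parents(S: Matrix) -> List[int]:
    """For each row (child i), take the column j with the highest score as i's parent."""
    return [max(range(len(row)), key=row.__getitem__) for row in S]
```

Building the full tree then amounts to linking each character to its predicted parent; ties are resolved here in favor of the lowest index, which is an arbitrary choice of this sketch.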
The above embodiment uses a mathematical formula expressed in LaTeX format as an example to describe the handwritten mathematical formula recognition method of the present invention. Besides LaTeX expressions, the method of the present invention is also applicable to other mathematical formula representations, such as MathML.
On the CROHME 2014/2016/2019 test sets commonly used for handwritten mathematical formula recognition, the formula recognition accuracy (ExpRate) of the model of the present invention is compared with that of existing state-of-the-art models. To ensure a fair comparison, none of the methods uses data augmentation. As shown in Table 1, the method of the present invention outperforms the existing methods on all metrics.
Table 1: Comparison of formula recognition accuracy between the invention and existing methods
Another embodiment of the present invention provides a handwritten mathematical formula recognition system, comprising:
the annotation data construction module, used to construct handwritten mathematical formula annotation data;
the model training module, used to construct an initial model for handwritten mathematical formula recognition, train the initial model with the labeled data, and jointly optimize the sequence decoding task and the tree structure prediction task to obtain the handwritten mathematical formula recognition model;
and the formula recognition module, used to perform handwritten mathematical formula recognition with the trained handwritten mathematical formula recognition model.
The above division into modules is only illustrative; in practical applications, the above functions may be allocated to different functional modules as needed to complete all or part of the functions described in the above method. For the specific working process of each module, refer to the corresponding process in the foregoing method embodiment, which is not repeated here.
Another embodiment of the invention provides a computer device (computer, server, smartphone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of the invention.
Another embodiment of the invention provides a computer readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program which, when executed by a computer, performs the steps of the method of the invention.
The embodiments disclosed above are intended to aid in understanding the present invention and to enable it to be put into practice. Those of ordinary skill in the art will understand that various alternatives, variations, and modifications are possible without departing from the spirit and scope of the invention. The invention should not be limited to what is disclosed in the embodiments of the specification; its scope of protection is defined by the claims.

Claims (10)

1. A handwritten mathematical formula recognition method, comprising the following steps: constructing handwritten mathematical formula annotation data; constructing an initial model for handwritten mathematical formula recognition, training the initial model with the annotation data, and jointly optimizing a sequence decoding task and a tree structure prediction task to obtain a handwritten mathematical formula recognition model; and performing handwritten mathematical formula recognition with the trained handwritten mathematical formula recognition model.
2. The method according to claim 1, wherein constructing the handwritten mathematical formula annotation data comprises converting a mathematical formula into a tree structure, which comprises representing the tree structure of the mathematical formula as a series of tuples, each tuple (c, p) containing the child node c of a subtree and its parent node p.
3. The method according to claim 1, wherein the model structure of the initial model comprises:
a visual encoding model, used to extract high-level visual features from the input handwritten mathematical formula image;
an image position encoding module and a character position encoding module, used to provide explicit position information for the visual features and the word embedding vectors, respectively;
a decoding model, used to output a sequence of character semantic features;
a formula tree structure perception module, used to construct a formula structure tree from the feature vectors output by the decoding model;
a projection layer, used to decode the mathematical formula sequence autoregressively from the feature vectors output by the decoding model.
4. The method according to claim 3, wherein the decoding model is a Transformer decoder incorporating a coverage attention mechanism.
5. The method according to claim 3, wherein the processing of the formula tree structure perception module comprises:
taking as input the character semantic feature sequence $X$ extracted by the decoding model;
extracting features from $X$ with a Transformer encoder to obtain $X'$;
mapping the feature vectors $X'$ output by the Transformer encoder to child node vectors and parent node vectors through two linear projection functions $F_c(\cdot)$ and $F_p(\cdot)$, respectively;
combining the child node vectors and parent node vectors pairwise and adding them to obtain a relation feature matrix $M$, in which the vector $M_{i,j}$ at each position is the relation feature vector for the character with index $i$ as the child node and the character with index $j$ as the parent node;
converting the relation feature matrix $M$ into a relation score matrix $S$ with a function $F_{score}(\cdot)$, which consists of a ReLU activation function and a vector dot product operation;
wherein each element $S_{i,j}$ of the relation score matrix $S$ output by the formula tree structure perception module is the relation score for the $i$-th character as the child node and the $j$-th character as the parent node, a higher score indicating a greater likelihood that the parent-child relation holds; for the highest-scoring element $S_{a,b}$ in each row, the model is considered to predict a parent-child relation with the $a$-th symbol as the child node and the $b$-th symbol as the parent node, and the entire mathematical formula tree structure is constructed by establishing these relations between characters.
6. The method according to claim 1, wherein the training strategy of the initial model comprises training the model with a combination of the loss function of the sequence decoding task and the loss function of the tree structure prediction task;
the loss function of the sequence decoding task is a cross-entropy loss over the per-time-step character probabilities, where $X_t$ denotes the feature vector of the $t$-th time step output by the decoding model, $W_o$ denotes the linear projection parameter matrix, and $b_o$ denotes the bias vector;
the loss function of the tree structure prediction task is a cross-entropy loss over $S$, where $S$ is the relation score matrix output by the formula tree structure perception module, used to estimate the probability that each character $t$ has a parent-child relation with the other characters.
7. The method according to claim 1, wherein performing handwritten mathematical formula recognition with the trained handwritten mathematical formula recognition model comprises:
obtaining a handwritten mathematical formula image to be recognized;
preprocessing the handwritten mathematical formula image to be recognized;
inputting the preprocessed handwritten mathematical formula image into the trained handwritten mathematical formula recognition model, which predicts each symbol in the formula image and generates an initial recognition sequence;
applying a beam search strategy to the initial recognition sequence to generate a set of candidate sequences, and computing a sequence decoding score for each candidate sequence;
computing the relation score matrix of each candidate sequence with the formula tree structure perception module, and then computing a tree structure prediction score for each candidate sequence;
adding the sequence decoding score and the tree structure prediction score, and selecting the candidate sequence with the highest combined score as the final output sequence.
8. A handwritten mathematical formula recognition system, comprising:
an annotation data construction module, used to construct handwritten mathematical formula annotation data;
a model training module, used to construct an initial model for handwritten mathematical formula recognition, train the initial model with the annotation data, and jointly optimize the sequence decoding task and the tree structure prediction task to obtain a handwritten mathematical formula recognition model;
a formula recognition module, used to perform handwritten mathematical formula recognition with the trained handwritten mathematical formula recognition model.
9. A computer device, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for executing the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program which, when executed by a computer, implements the method according to any one of claims 1 to 7.
CN202411790409.8A 2024-12-06 2024-12-06 Handwriting mathematical formula recognition method and system Pending CN119888762A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411790409.8A CN119888762A (en) 2024-12-06 2024-12-06 Handwriting mathematical formula recognition method and system

Publications (1)

Publication Number Publication Date
CN119888762A true CN119888762A (en) 2025-04-25

Family

ID=95432541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411790409.8A Pending CN119888762A (en) 2024-12-06 2024-12-06 Handwriting mathematical formula recognition method and system

Country Status (1)

Country Link
CN (1) CN119888762A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120544225A (en) * 2025-07-29 2025-08-26 iFLYTEK Co., Ltd. Formula recognition and model training methods, devices, related equipment and program products

Similar Documents

Publication Publication Date Title
CN111160343B (en) Off-line mathematical formula symbol identification method based on Self-Attention
CN113792177B (en) Scene text visual question answering method based on knowledge-guided deep attention network
CN110609891A (en) A Visual Dialogue Generation Method Based on Context-Aware Graph Neural Network
Tong et al. MA-CRNN: a multi-scale attention CRNN for Chinese text line recognition in natural scenes
CN113449801B (en) Image character behavior description generation method based on multi-level image context coding and decoding
CN113780059A (en) Continuous sign language identification method based on multiple feature points
CN112329767A (en) System and method for extracting key information from contract text images based on joint pre-training
CN114239574A (en) A Knowledge Extraction Method for Miner Irregularities Based on Entity and Relation Joint Learning
CN117132997B (en) Handwriting form recognition method based on multi-head attention mechanism and knowledge graph
CN119888762A (en) Handwriting mathematical formula recognition method and system
CN113468891A (en) Text processing method and device
Xiao et al. An extended attention mechanism for scene text recognition
CN114020900A (en) Chart English abstract generation method based on fusion space position attention mechanism
CN112651225A (en) Multi-item selection machine reading understanding method based on multi-stage maximum attention
CN113806551A (en) Domain knowledge extraction method based on multi-text structure data
CN115525777A (en) Knowledge graph triple significance evaluation method based on natural language question-answering
CN118296000A (en) A construction method based on deep fusion model and cross-modal data hash retrieval
CN114818739B (en) A visual question answering method optimized using position information
CN119314164A (en) OCR image description generation method and system based on heterogeneous representation
CN119131826A (en) A method and system for extracting content from electric power knowledge pictures
CN115359486A (en) Method and system for determining custom information in document image
Zhou et al. Generative External Knowledge for Zero-shot Action Recognition
CN119445260B (en) Image emotion analysis method and device, storage medium and electronic equipment
Deepa et al. Synthetic Data Generation for Document Text Recognition
CN118093866B (en) Consistency regularization-based semi-supervised text classification method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination