CN119888762A

CN119888762A - Handwriting mathematical formula recognition method and system

Info

Publication number: CN119888762A
Application number: CN202411790409.8A
Authority: CN
Inventors: 高良才; 赵文祺; 颜钦钦; 朱建华
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2024-12-06
Filing date: 2024-12-06
Publication date: 2025-04-25

Abstract

The present invention belongs to the fields of information technology and image recognition technology, and relates to a method and system for handwritten mathematical formula recognition. The method comprises: constructing handwritten mathematical formula annotated data; constructing an initial model for handwritten mathematical formula recognition, training the initial model using the annotated data, performing joint optimization on the sequence decoding task and the tree structure prediction task, and obtaining a handwritten mathematical formula recognition model; and using the trained handwritten mathematical formula recognition model to perform handwritten mathematical formula recognition. The present invention combines the advantages of the sequence decoding model and the tree decoding model, effectively improving the model's understanding of the mathematical formula structure and its generalization ability, and significantly improving the accuracy of handwritten mathematical formula recognition while maintaining the advantages of efficient parallel training of the existing encoder-decoder model method.

Description

Handwriting mathematical formula recognition method and system

Technical Field

The invention belongs to the technical fields of information technology and image recognition, and particularly relates to a handwriting mathematical formula recognition method and system.

Background

The handwritten mathematical formula identification can identify and analyze the mathematical formula from the handwritten material, and has important application in many fields such as education, scientific research, engineering design, data analysis and the like. The technology can help us to quickly and accurately convert and process complex mathematical formulas, reduce repeated labor and provide convenience for life and work of people. For example, in the educational field, handwriting recognition can help teachers and students to quickly convert formulas in a blackboard or note into electronic documents, improving teaching and learning efficiency. In the scientific research field, researchers can utilize the technology to extract and analyze mathematical formulas from manuscripts rapidly, and the scientific research process is accelerated. In addition, handwriting formula recognition can also help engineers and data analysts to quickly convert and apply various complex formulas in design and data processing, thereby improving work efficiency and accuracy. Through carrying out efficient recognition and digital processing on the handwriting formula, a user can integrate and utilize the information better, and the development of the related field is promoted.

Handwriting mathematical formula recognition is classified into online recognition and offline recognition, which differ in the manner of input, in that online input is dynamic in nature and offline input is static. The on-line input is a handwriting stroke track, the track comprises the sequence, curvature and the like of the strokes, and the off-line input is an image containing a handwriting mathematical formula. The recognition method of the invention belongs to an offline handwriting mathematical formula recognition method.

The mathematical formula consists of symbols and symbol relations, wherein the symbols comprise Arabic numerals, english letters, greek letters, operators, other special characters and the like, the symbol relations are different from the character sequence relations of the conventional linear arrangement, the characteristics of multilevel and high structuring are presented, and mathematical symbols such as scores, root numbers, integrals and the like not only comprise basic characters, but also involve complex structures such as upper and lower marks, nested symbols and the like, and the recognition difficulty is greatly increased. Therefore, in the task of recognition of handwritten mathematical formulas, a core challenge is how to efficiently model the sign and structural relationships of the mathematical formulas. The recognition method is not only required to accurately recognize characters in the image, but also is more important to construct complex hierarchical relationships among the characters, and in addition, the recognition difficulty is greatly increased due to the diversity of handwriting, the similarity of handwriting symbols, the difference of handwriting styles, the complex structure of mathematical formulas and the dependency of the context.

The handwriting mathematical formula recognition technology uses a rule-based method, and with the continuous progress of the deep learning technology, the encoder-decoder-based recognition method can tightly combine an input image with recognition contents, so that a model can fully understand the contents in the image and transcribe into an accurate character sequence according to the image contents, has stronger modeling capability, and shows more excellent performance compared with the traditional method.

In a computer system, the representation method of the mathematical formula is mainly divided into two types of representation based on a sequence and representation based on a tree structure, and two corresponding decoding models, namely a model based on sequence decoding and a model based on tree decoding, are also developed by the identification method based on an encoder-decoder in handwriting mathematical formula identification. The method based on the sequence decoding model takes the LaTeX expression as a decoding target, has the advantages that the LaTeX expression is a widely adopted mathematical formula expression, can be well adapted to the existing research results in the fields of natural language processing, optical character recognition and the like, and has stronger universality. The method based on the tree decoding model regards a mathematical formula as a tree structure to carry out predictive decoding. The model design fully integrates the characteristic of the tree structure, which enhances the ability of the model to understand the mathematical formula in theory, and ensures that the model can always decode the legal mathematical formula in the predictive reasoning process.

Methods based on sequence decoding models have certain limitations. First, the model does not fully take into account the tree-like structure characteristics of the mathematical formulas during the training process, which may lead to insufficient understanding of the formula structure by the model, thereby affecting its generalization ability when processing complex structural formulas. Secondly, as a target of sequence prediction, the LaTeX expression cannot guarantee the grammar normalization of the model decoding result. For example, in processing complex mathematical formulas, the model may produce left and right brackets that do not match, resulting in predictions that do not conform to LaTeX's grammar specification.

Meanwhile, the prior method based on the tree-shaped decoding model has the defects. First, these models need to be built depending on the recurrent neural network, and cannot be trained in parallel with high efficiency. In addition, these models are relatively complex in decoding manner, lack of versatility and wide applicability as compared with LaTeX expression, and their performance is generally inferior to the sequence decoding model-based method, resulting in failure of the tree decoding method to find wide application.

Disclosure of Invention

Aiming at the respective advantages and disadvantages of the two decoding models, the invention combines the advantages of the sequence decoding model and the tree-shaped decoding model, and provides a handwritten mathematical formula recognition method and a handwritten mathematical formula recognition system based on formula tree structure perception.

The technical scheme adopted by the invention is as follows:

a handwritten mathematical formula recognition method comprising the steps of:

constructing handwritten mathematical formula labeling data;

Constructing a handwritten mathematical formula recognition initial model, training the initial model by using labeling data, and performing joint optimization on a sequence decoding task and a tree structure prediction task to obtain the handwritten mathematical formula recognition model;

And carrying out handwriting mathematical formula recognition by adopting a handwriting mathematical formula recognition model obtained through training.

Further, the constructing handwritten mathematical formula annotation data includes converting the mathematical formula into a tree structure, the converting the mathematical formula into a tree structure including representing the tree structure of the mathematical formula as a series of tuples, wherein each tuple (c, p) includes a child node c of each subtree and its parent node p.

Further, the model structure of the initial model includes:

The visual coding model is used for extracting high-level visual characteristics from the input handwritten mathematical formula sheet;

the image position coding module and the character position coding module are used for respectively providing explicit position information for the visual characteristics and the word embedding vectors;

The decoding model is used for outputting a character semantic feature sequence;

the formula tree structure perception module is used for constructing the feature vector output by the decoding model into a formula structure tree;

and the projection layer is used for decoding the mathematical formula sequence in an autoregressive mode by utilizing the feature vector output by the decoding model.

Further, the decoding model employs a transducer decoding model that introduces an overlay attention mechanism.

Further, the processing procedure of the formula tree structure sensing module comprises the following steps:

Inputting a character semantic feature sequence X extracted by a decoding model;

Performing feature extraction on the character semantic feature sequence X by using a transducer encoder to obtain X';

The feature vector X' output by the transducer encoder is mapped into a child node vector and a father node vector through two linear projection functions F _c (-) and F _p (-), respectively;

Combining and adding the child node vectors and the father node vectors in pairs to obtain a relation feature matrix M, wherein a vector M _i,j at each position in the matrix M represents a relation feature vector of which the character with the index of i is used as a child node and the character with the index of j is used as a father node;

Using function F _score (·) to convert the relationship feature matrix M to a relationship score matrix S, function F _score (·) consisting of a ReLU activation function and a vector dot product operation;

Each element S _i,j in the relation score matrix S output by the formula tree structure perception module represents relation scores with the ith character as a child node and the jth character as a father node, wherein the higher the score is, the greater the possibility of establishing father-son relation is, and for the element S _a,b with the highest score in each row, the model prediction is considered to have father-son relation with the a symbol as a child node and the b symbol as a father node, so that the whole mathematical formula tree structure is constructed by establishing the relation among the characters.

Further, the training strategy of the initial model comprises the steps of combining a loss function of a sequence decoding task and a loss function of a tree structure prediction task to train the model;

The loss function of the sequence decoding task is:

Wherein, X _t represents the feature vector of the t time step output by the decoding model, W _o represents the linear projection parameter matrix, and b _o represents the bias vector;

The loss function of the tree structure prediction task is as follows:

s is a relation score matrix output by the formula tree structure sensing module and used for estimating the probability of parent-child relation between each character t and other characters.

Further, the handwriting mathematical formula recognition by using the handwriting mathematical formula recognition model obtained by training comprises:

Acquiring a handwritten mathematical formula image to be identified;

Preprocessing a handwritten mathematical formula image to be recognized;

Inputting the preprocessed handwritten mathematical formula image into a handwritten mathematical formula recognition model obtained through training, predicting each symbol in the formula image by the model, and generating an initial recognition sequence;

generating a group of candidate sequences by adopting a beam search strategy for the initial identification sequence, and calculating a sequence decoding score for each candidate sequence;

Calculating a relation score matrix of the candidate sequences by using a formula tree structure perception module, and further calculating a tree structure prediction score for each candidate sequence;

and adding the sequence decoding score and the tree structure prediction score, and selecting a candidate sequence with the highest comprehensive score as a final output sequence.

A handwritten mathematical formula recognition system, comprising:

the annotation data construction module is used for constructing annotation data of the handwriting mathematical formula;

the model training module is used for constructing a handwritten mathematical formula recognition initial model, training the initial model by using the labeling data, and carrying out joint optimization on a sequence decoding task and a tree structure prediction task to obtain the handwritten mathematical formula recognition model;

And the formula recognition module is used for carrying out handwriting mathematical formula recognition by adopting the handwriting mathematical formula recognition model obtained through training.

The beneficial effects of the invention are as follows:

The invention effectively improves the understanding and generalization capability of the model to the mathematical formula structure by fusing the methods based on the sequence decoding and the tree decoding, and obviously improves the accuracy of handwriting mathematical formula identification on the basis of maintaining the high-efficiency parallel training advantages of the existing encoder-decoder model method. Specifically, the invention can perform joint optimization on the two tasks in the training stage, and simultaneously trains the understanding capability of the model on the two aspects of the sequence arrangement and the tree structure of the mathematical formula. In the reasoning prediction stage, the invention provides that the tree structure prediction scoring mechanism is integrated into the beam search algorithm, so that the model considers the sequence decoding of a mathematical formula and the rationality of the tree structure when generating the LaTeX expression, and the identification accuracy is further improved. In summary, the recognition method provided by the invention improves the recognition capability of the sequence decoding model on the handwriting mathematical formula of the complex structure.

Drawings

Fig. 1 is a schematic diagram of the structure of the model of the present invention.

FIG. 2 is a schematic diagram of a formula tree structure awareness module.

Fig. 3 is an image sample diagram of a handwritten formula to be recognized.

Detailed Description

The present invention will be further described in detail with reference to the following examples and drawings, so that the above objects, features and advantages of the present invention can be more clearly understood.

The invention combines the advantages of a sequence decoding model and a tree-shaped decoding model, provides a handwritten mathematical formula recognition method based on formula tree structure perception, introduces a formula tree structure perception module on the basis of the flexibility and high-efficiency training efficiency of the sequence decoding model, and improves the understanding and generalization capability of complex formula structures. By carrying out joint optimization on two tasks of sequence prediction and tree structure prediction, the method can decode the LaTeX sequence and consider the rationality of the formula tree structure, so that the recognition result is more complete and accurate in grammar structure. Experimental results on a plurality of data sets show that compared with the traditional sequence decoding model and tree-shaped decoding model, the method provided by the invention has better recognition performance, and the potential of a method for combining sequence and tree-shaped decoding in the field of handwriting mathematical formula recognition is also demonstrated.

The invention relates to a handwritten mathematical formula recognition method based on an encoder-decoder, which completes a recognition flow by constructing a deep neural network model. Firstly, constructing handwritten mathematical formula labeling data, constructing a handwritten mathematical formula identification initial model, training the initial model by using the labeling data, and automatically learning visual characteristics of the handwritten mathematical formula and semantic structures of LaTeX expressions from the data to obtain a final model. And then, carrying out handwriting mathematical formula recognition by adopting a final model obtained through training.

And 1, constructing handwritten mathematical formula labeling data, and converting a mathematical formula based on a LaTeX expression into a tree structure.

In particular, the present invention represents the tree structure of the mathematical formula as a series of tuples, wherein each tuple (c, p) contains the child node c of each subtree and its parent node p. Notably, in order to disambiguate the repeated symbols that may be introduced and ensure compatibility with the LaTeX expression labels used by the sequence decoding model, the present invention chooses to use the index of the mathematical symbol in the LaTeX expression as the node identification in the formula tree, rather than directly using the mathematical symbol itself. For example, for mathematical formula 3 ² -1=8 in fig. 1, its corresponding LaTeX expression is labeled "3 {2} -1=8", and its formula tree structure label may be expressed as "(0, -1), (1, -1), (2, -1), (3, 0), (4, -1), (5, 0), (6, 5), (7, 6), (8, 7)", where-1 indicates that the node has no parent node, i.e., the node does not participate in the tree structure building process. By the method, the sequence labeling and tree structure labeling data of the mathematical formula are unified, and further, the model is allowed to perform end-to-end joint optimization on sequence decoding and tree structure prediction tasks.

And 2, constructing a handwritten mathematical formula to identify an initial model.

The model structure of the invention is shown in figure 1, and mainly comprises four parts, namely 1) a visual coding model for extracting high-level visual features with rich meaning from an input handwritten mathematical formula. The module adopts DenseNet model widely applied in the field of handwriting mathematical formula recognition as backbone network. 2) The image position coding module and the character position coding module respectively provide explicit position information for the visual features and the word embedding vectors. The image position coding module and the character position coding module use the same sine and cosine position coding. 3) The invention adopts a transform decoding model which introduces an overlay attention mechanism. 4) And the formula tree structure perception module is used for constructing the feature vector output by the decoding model into a formula structure tree to help the model to better capture the structural information of the mathematical formula. 5) And the projection layer is used for decoding the LaTeX sequence in an autoregressive mode by utilizing the feature vector output by the decoding model.

The visual coding model can be implemented by adopting DenseNet models and the like.

The decoding model is shown in fig. 1 and comprises a self-attention module, an overlay attention module and a feedforward neural network. The self-attention module is responsible for modeling the information interaction between the current decoding position and the previous position, the overlay attention module is responsible for capturing the information interaction between the decoder and the encoder, guiding the decoder to generate a final output sequence by utilizing the characteristic information generated by the encoder, and the feedforward neural network is responsible for carrying out nonlinear transformation on the representation after attention calculation, so that the expressive power of the model is enhanced and more accurate characteristic characterization is generated.

An architectural overview of the proposed formula tree structure aware module is shown in fig. 2, where the "SOS" flag is used to indicate the start of a sequence. The input of the module is a character semantic feature sequence X epsilon R ^T×d extracted by a decoding model, wherein T represents the LaTeX sequence length, and d represents the feature vector dimension of the character. In order to map the semantic features of the characters into the semantic space of the formula tree structure, a transducer encoder is used for extracting features of the character semantic feature sequence X to obtain X' E R ^T×d, so that the tree structure relationship can be conveniently built in the semantic space later.

X′=TransformerEncoder(X),

Then, the feature vector X' output by the transducer encoder is mapped into a child node vector X ^c∈R^T×d and a parent node vector X ^p∈R^T×d, respectively, by two linear projection functions F _c (·) and F _p (·).

X^c＝F_c(X′)＝X′W_c,

X^p＝F_p(X′)＝X′W_p,

Where W _c∈R^d×d and W _p∈R^d×d are both trainable linear projection parameter matrices, where d represents the feature vector dimensions of the character. In order to construct the formula tree structure relation between characters, the invention combines and adds the child node vector and the father node vector pairwise to obtain the relation characteristic matrix M. In this matrix, the vector M _i,j∈R^d for each position represents a relationship feature vector in which the character with index i in the LaTeX expression is used as a child node and the character with index j is used as a parent node.

Finally, the present invention uses a function F _score (·) consisting of a ReLU activation function and a vector dot product operation to convert the relationship feature matrix M into a relationship score matrix S ε R ^T ^×T:

S=F_score(M)＝max(0,M)v_s.

wherein v _s is a weight vector.

Each element S _i,j in the relationship score matrix S output by the formula tree structure sensing module represents a relationship score with the ith character as a child node and the jth character as a parent node, where a higher score means a higher likelihood that a parent-child relationship is established. In the mathematical formula tree structure, each node except the root node has only one parent node. Therefore, for the element S _a,b with the highest score in each row, the model prediction can be considered to have a parent-child relationship with the a symbol as a child node and the b symbol as a parent node, so that the whole mathematical formula tree structure is constructed by establishing the relationship between characters.

And step 3, training the initial model by using the labeling data to obtain a final model.

1) Training strategy

In order to combine the tasks of sequence decoding and tree decoding and thus enable the model to perform end-to-end joint optimization on both tasks, the present invention combines the sequence decoding and tree structure prediction loss functions to train the model.

Specifically, for the LaTeX expression sequence y ₁,…,y_T, in the sequence decoding task, the feature vector X e R ^T×L output by the decoding model can be used to calculate the probability that each character appears at time step t, and then the cross entropy loss function is used to calculate the loss L _seq of the sequence decoding task.

Where X _t represents the eigenvector of the t-th time step of the decoding model output, W _o represents the linear projection parameter matrix, and b _o represents the bias vector.

In the tree structure prediction task, the relation score matrix S e R ^T×T output by the formula tree structure perception module is used for estimating the probability of the parent-child relation between each character t and other characters, and then the cross entropy loss function is also used for calculating the loss L _struct of the tree structure prediction task.

And finally, adding the loss of the sequence decoding and tree structure prediction tasks to obtain a loss function L for training the method.

L=L_seq+L_struct

2) Model training

According to the invention, the PyTorch framework can be used for constructing an initial model, the labeling data is used for performing supervised training, and the model optimization super parameters are adjusted according to experimental conditions to obtain a final model.

And 4, carrying out handwriting mathematical formula recognition by adopting a final model obtained through training.

1) And acquiring a formula image to be identified, and acquiring a handwriting formula image to be identified by using a scanning or mobile equipment photographing method. The sample is shown in fig. 3, for example.

2) Preprocessing a handwriting formula to be identified, performing image enhancement and noise removal on the formula image, and then performing inclination correction and scale normalization operation on the formula image, so that the formula image to be identified and the image in the training image data set have the same scale and angle.

3) And inputting the preprocessed formula image into a final model obtained by training, predicting each symbol in the formula image by the model, and generating an initial recognition sequence.

4) A bundle search strategy is applied to the initial recognition sequence to generate a set of candidate sequences, and a sequence decoding score S _seq (y) is calculated for each candidate sequence y using the forward and reverse sequence decoded bundle search scores.

5) And calculating a relation score matrix of the candidate sequences by using a formula tree structure perception module, constructing corresponding tree structure labels by using the candidate sequences y, calculating cross entropy loss of the relation score matrix and the tree structure labels, and calculating a tree structure prediction score S _struct (y) for each candidate sequence by using negative cross entropy loss.

6) And adding the sequence decoding score and the tree structure prediction score, namely S _seq(y)+S_struct (y), and selecting the candidate sequence with the highest comprehensive score as a final output sequence.

By considering the scores on sequence decoding and tree structure prediction at the same time, the method ensures the accuracy of the final generated sequence and also considers the rationality of the formula tree structure.

The key point of the invention is a formula tree structure sensing module. The module effectively improves the understanding of the model to the mathematical formula structure and the generalization capability thereof by fusing the methods based on the sequence decoding and the tree decoding. Specifically, the module can perform joint optimization on the two tasks in a training stage, and simultaneously trains the understanding capability of the model on the two aspects of the sequence arrangement and the tree structure of the mathematical formula. In the reasoning prediction stage, the invention provides that the tree structure prediction scoring mechanism is integrated into the beam search algorithm, so that the model considers the sequence decoding of a mathematical formula and the rationality of the tree structure when generating the LaTeX expression, and the identification accuracy is further improved.

The above embodiment takes a mathematical formula expression in a LaTeX format as an example to describe the handwriting mathematical formula recognition method of the present invention. In addition to LaTeX expressions, the method of the present invention is also applicable to other forms of mathematical formula expressions, such as MathML, etc.

The performance of the model of the present invention on formula recognition accuracy (ExpRate) is compared with the existing most advanced model on CROHME 2014/2016/2019 test datasets common to handwritten mathematical formula recognition tasks. None of the methods uses data enhancement to ensure fair comparison. As shown in Table 1, the process of the present invention is always superior to the existing process in all indexes.

TABLE 1 formula identification accuracy comparison of the invention with the prior art

Another embodiment of the present invention provides a handwritten mathematical formula recognition system, comprising:

The above-mentioned division of the modules is only illustrative, and the above-mentioned functions can be distributed by different functional modules according to the needs in practical application to complete all or part of the functions described in the above-mentioned method. The specific working process of each module may refer to the corresponding process in the foregoing method embodiment, which is not described herein.

Another embodiment of the invention provides a computer device (computer, server, smart phone, etc.) comprising a memory storing a computer program configured to be executed by the processor and a processor, the computer program comprising instructions for performing the steps of the method of the invention.

Another embodiment of the invention provides a computer readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program which, when executed by a computer, performs the steps of the method of the invention.

The above-disclosed embodiments of the present invention are intended to aid in understanding the contents of the present invention and to enable the same to be carried into practice, and it will be understood by those of ordinary skill in the art that various alternatives, variations and modifications are possible without departing from the spirit and scope of the invention. The invention should not be limited to what has been disclosed in the examples of the specification, but rather by the scope of the invention as defined in the claims.

Claims

1. A method for recognizing handwritten mathematical formulas, comprising the following steps:

Construct handwritten mathematical formula annotation data;

Construct an initial model for handwritten mathematical formula recognition, use the labeled data to train the initial model, and perform joint optimization on the sequence decoding task and the tree structure prediction task to obtain a handwritten mathematical formula recognition model;

The trained handwritten mathematical formula recognition model is used to perform handwritten mathematical formula recognition.

2. The method according to claim 1 is characterized in that the constructing of handwritten mathematical formula annotation data includes converting the mathematical formula into a tree structure; the converting of the mathematical formula into a tree structure includes: representing the tree structure of the mathematical formula as a series of tuples, wherein each tuple (c, p) contains the child node c of each subtree and its parent node p.

3. The method according to claim 1, characterized in that the model structure of the initial model comprises:

A visual encoding model that extracts high-level visual features from the input handwritten math formula slices;

The image position encoding module and the character position encoding module are used to provide explicit position information for visual features and word embedding vectors respectively;

A decoding model, used to output a sequence of character semantic features;

A formula tree structure perception module is used to construct the feature vector output by the decoding model into a formula structure tree;

The projection layer is used to decode the mathematical formula sequence in an autoregressive manner using the feature vector output by the decoding model.

4. The method according to claim 3 is characterized in that the decoding model adopts a Transformer decoding model that introduces an overlay attention mechanism.

5. The method according to claim 3, characterized in that the processing process of the formula tree structure perception module includes:

Input the character semantic feature sequence X extracted by the decoding model;

Use the Transformer encoder to extract the character semantic feature sequence X to obtain X ^′ ;

Through two linear projection functions F _c (·) and F _p (·), the feature vector X′ output by the Transformer encoder is mapped to the child node vector and the parent node vector respectively;

The child node vectors and parent node vectors are combined and added in pairs to obtain the relationship feature matrix M. The vector Mi _,j at each position in the matrix M represents the relationship feature vector between the character with subscript i as the child node and the character with subscript j as the parent node;

Use the function F _score (·) to convert the relationship feature matrix M into the relationship score matrix S. The function F _score (·) consists of a ReLU activation function and a vector dot product operation;

Each element Si _,j in the relationship score matrix S output by the formula tree structure perception module represents the relationship score with the i-th character as the child node and the j-th character as the parent node, where the higher the score means the greater the possibility of the parent-child relationship. For the element Sa _,b with the highest score in each row, it is considered that the model predicts the existence of a parent-child relationship with the a-th symbol as the child node and the b-th symbol as the parent node, and then the entire mathematical formula tree structure is constructed by establishing the relationship between the characters.

6. The method according to claim 1, characterized in that the training strategy of the initial model comprises: combining the loss function of the sequence decoding task and the loss function of the tree structure prediction task to train the model;

The loss function of the sequence decoding task is:

Among them, _Xt represents the feature vector of the tth time step output by the decoding model, _Wo represents the linear projection parameter matrix, and _bo represents the bias vector;

The loss function of the tree structure prediction task is:

Among them, S is the relationship score matrix output by the formula tree structure perception module, which is used to estimate the probability that each character t has a parent-child relationship with other characters.

7. The method according to claim 1, characterized in that the step of using the trained handwritten mathematical formula recognition model to perform handwritten mathematical formula recognition comprises:

Obtain a handwritten mathematical formula image to be recognized;

Preprocess the handwritten mathematical formula image to be recognized;

The preprocessed handwritten mathematical formula image is input into the trained handwritten mathematical formula recognition model, which predicts each symbol in the formula image and generates an initial recognition sequence;

A beam search strategy is used for the initial recognition sequence to generate a set of candidate sequences, and a sequence decoding score is calculated for each candidate sequence;

The formula tree structure perception module is used to calculate the relationship score matrix of the candidate sequence, and then the tree structure prediction score is calculated for each candidate sequence;

The sequence decoding score is added to the tree structure prediction score, and the candidate sequence with the highest comprehensive score is selected as the final output sequence.

8. A handwritten mathematical formula recognition system, characterized by comprising:

Annotation data construction module, used to construct annotation data of handwritten mathematical formulas;

The model training module is used to construct an initial model for handwritten mathematical formula recognition, use labeled data to train the initial model, and perform joint optimization on the sequence decoding task and the tree structure prediction task to obtain a handwritten mathematical formula recognition model; the formula recognition module is used to use the trained handwritten mathematical formula recognition model to perform handwritten mathematical formula recognition.

9. A computer device, characterized in that it comprises a memory and a processor, the memory stores a computer program, the computer program is configured to be executed by the processor, and the computer program comprises instructions for executing the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and when the computer program is executed by a computer, the method according to any one of claims 1 to 7 is implemented.