Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, the present invention aims to provide a system and a method for analyzing the semantic emotion of a financial expert language, so as to solve the problem that it is difficult to obtain effective and reliable financial information from the financial expert language.
In order to solve the problems, the invention adopts the following technical scheme:
In one aspect, the invention provides a financial expert language semantic emotion analysis system, which comprises a natural language processing module, an emotion analysis module and a market emotion assessment module;
The natural language processing module is used for carrying out word segmentation, part-of-speech tagging, entity identification and professional term extraction on financial expert language data to form a word embedding matrix;
The emotion analysis module is used for carrying out emotion polarity analysis on each text vector in the word embedding matrix, wherein the emotion polarities comprise positive, negative and neutral, and carrying out optimistic, cautious or pessimistic category classification on financial expert speaking data, and combining the emotion polarities and the category classification to obtain the emotion intensity of the word embedding matrix;
The market emotion assessment module is used for constructing a market emotion assessment model based on historical financial expert speech data and corresponding historical market feedback data, assessing the market emotion of the word embedding matrix by adopting the market emotion assessment model, and carrying out correlation analysis on the market emotion and the market key indexes to obtain the influence of the financial expert speech data on the market key indexes.
As an implementation manner, the word segmentation, part-of-speech tagging, entity identification and term extraction are performed on financial expert speech data to form a word embedding matrix, which includes:
word segmentation is carried out on financial expert speaking data by adopting a word segmentation model trained by a corpus in the financial field;
Labeling the parts of speech of the segmented financial expert speaking data;
Identifying financial entities in the part-of-speech-tagged financial expert speech data by adopting a long short-term memory (LSTM) network or a Transformer model;
extracting technical terms with a deep learning model combined with a financial-domain dictionary to form a financial technical term library, and extracting the technical terms of each identified financial entity by calculating the cosine similarity between the entity and the technical terms in the financial technical term library;
A word embedding matrix of financial expert speech data is constructed based on the extracted technical terms of the financial entity.
As an implementation manner, the emotion polarity analysis on each text vector in the word embedding matrix includes:
analyzing each text vector in the word embedding matrix by using the trained deep neural network model, and calculating the emotion polarity score of each word;
The optimistic, cautious or pessimistic class classification of financial expert speech data includes:
Converting the context information of the financial expert language data into a feature vector by using TF-IDF or word2vec, and integrating the feature vector with the score of the emotion polarity to obtain a comprehensive feature vector;
classifying the financial expert speaking data by a trained softmax regression model;
The emotion intensity of the word embedding matrix obtained by combining emotion polarity and category classification comprises the following steps:
and calculating the emotion intensity of the word embedding matrix according to the weight and polarity scores of different emotion categories.
As an implementation manner, the building the market emotion assessment model based on the historical financial expert language data and the corresponding historical market feedback data includes:
collecting historical language data and related market feedback data of financial experts, and preprocessing the data;
Extracting key features from the preprocessed data to construct key feature vectors, wherein the key features comprise scores of emotion polarities of historical language data of financial experts, category classification and market key indexes of related market feedback data;
Constructing time-series data based on the constructed key feature vectors, and training a long short-term memory (LSTM) network on the time-series data to obtain the market emotion assessment model.
As an implementation manner, the estimating the market emotion of the word embedding matrix by using the market emotion estimation model, and performing correlation analysis on the market emotion and the market key index to obtain the influence of the financial expert speaking data on the market key index includes:
Obtaining the influence of the financial expert speech data on the market key indexes according to the assessed market emotion of the word embedding matrix and the change curves of the market key indexes in the market feedback data matched to the emotion polarity scores and the category classification.
In another aspect, the present invention provides a financial expert language semantic emotion analysis method, including:
word segmentation, part-of-speech tagging, entity identification and professional term extraction are carried out on financial expert speaking data to form a word embedding matrix;
carrying out emotion polarity analysis on each text vector in the word embedding matrix, wherein the emotion polarities comprise positive, negative and neutral, and carrying out optimistic, cautious or pessimistic category classification on financial expert speaking data, and combining the emotion polarities and the category classification to obtain the emotion intensity of the word embedding matrix;
constructing a market emotion assessment model based on historical financial expert speech data and corresponding historical market feedback data, assessing the market emotion of the word embedding matrix by adopting the market emotion assessment model, and carrying out correlation analysis on the market emotion and the market key indexes to obtain the influence of the financial expert speech data on the market key indexes.
As an implementation manner, the word segmentation, part-of-speech tagging, entity identification and term extraction are performed on financial expert speech data to form a word embedding matrix, which includes:
word segmentation is carried out on financial expert speaking data by adopting a word segmentation model trained by a corpus in the financial field;
Labeling the parts of speech of the segmented financial expert speaking data;
Identifying financial entities in the part-of-speech-tagged financial expert speech data by adopting a long short-term memory (LSTM) network or a Transformer model;
extracting technical terms with a deep learning model combined with a financial-domain dictionary to form a financial technical term library, and extracting the technical terms of each identified financial entity by calculating the cosine similarity between the entity and the technical terms in the financial technical term library;
A word embedding matrix of financial expert speech data is constructed based on the extracted technical terms of the financial entity.
As an implementation manner, the emotion polarity analysis on each text vector in the word embedding matrix includes:
analyzing each text vector in the word embedding matrix by using the trained deep neural network model, and calculating the emotion polarity score of each word;
The optimistic, cautious or pessimistic class classification of financial expert speech data includes:
Converting the context information of the financial expert language data into a feature vector by using TF-IDF or word2vec, and integrating the feature vector with the score of the emotion polarity to obtain a comprehensive feature vector;
classifying the financial expert speaking data by a trained softmax regression model;
The emotion intensity of the word embedding matrix obtained by combining emotion polarity and category classification comprises the following steps:
and calculating the emotion intensity of the word embedding matrix according to the weight and polarity scores of different emotion categories.
As an implementation manner, the building the market emotion assessment model based on the historical financial expert language data and the corresponding historical market feedback data includes:
collecting historical language data and related market feedback data of financial experts, and preprocessing the data;
Extracting key features from the preprocessed data to construct key feature vectors, wherein the key features comprise scores of emotion polarities of historical language data of financial experts, category classification and market key indexes of related market feedback data;
Constructing time-series data based on the constructed key feature vectors, and training a long short-term memory (LSTM) network on the time-series data to obtain the market emotion assessment model.
As an implementation manner, the estimating the market emotion of the word embedding matrix by using the market emotion estimation model, and performing correlation analysis on the market emotion and the market key index to obtain the influence of the financial expert speaking data on the market key index includes:
Obtaining the influence of the financial expert speech data on the market key indexes according to the assessed market emotion of the word embedding matrix and the change curves of the market key indexes in the market feedback data matched to the emotion polarity scores and the category classification.
The financial expert language semantic emotion analysis system and the financial expert language semantic emotion analysis method have the beneficial effects that emotion polarity analysis and category classification are carried out on financial expert language data, emotion analysis on the financial expert language data is achieved, correlation is carried out on the basis of market feedback data, the relation between the financial expert language data and market key indexes is obtained, and therefore influence analysis of the financial expert language data on the market key indexes is achieved.
Detailed Description
The present invention will be described in further detail with reference to specific examples.
It should be noted that these examples are only for illustrating the present invention, and not for limiting the present invention, and simple modifications of the method under the premise of the inventive concept are all within the scope of the claimed invention.
Referring to fig. 1, a financial expert language semantic emotion analysis system includes a natural language processing module 100, an emotion analysis module 200, and a market emotion assessment module 300.
The natural language processing module 100 is used for word segmentation, part-of-speech tagging, entity identification and term extraction on financial expert speech data to form a word embedding matrix.
The word segmentation comprises segmenting the financial expert speech data with a word segmentation model trained on a financial-domain corpus. The word segmentation model may combine rule-based and statistical methods, so that it both recognizes common words and segments financial technical terms accurately.
The part-of-speech tagging comprises tagging the segmented financial expert speech data, assigning each word a part-of-speech tag such as noun, verb or adjective, so as to provide more accurate semantic information for the subsequent emotion analysis.
The entity identification comprises identifying financial entities in the part-of-speech-tagged financial expert speech data by adopting a long short-term memory (LSTM) network or a Transformer model. Financial entities in the text, such as person names, institution names and stock codes, are identified to facilitate the extraction of key information.
The technical term extraction comprises extracting technical terms with a deep learning model combined with a financial-domain dictionary to form a financial technical term library, and extracting the technical terms of each identified financial entity by calculating the cosine similarity between the entity and the technical terms in the library.
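The cosine-similarity matching described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the toy three-dimensional vectors, the sample term library and the 0.8 threshold are all assumptions introduced here, since the text does not specify how vectors are produced or what similarity cutoff is used.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def match_terms(entity_vectors, term_library, threshold=0.8):
    """Map each entity to its most similar library term, if similar enough.

    entity_vectors / term_library: dicts of name -> vector (hypothetical inputs).
    threshold: assumed cutoff; the source does not state one.
    """
    matches = {}
    for entity, vec in entity_vectors.items():
        best_term, best_score = None, -1.0
        for term, term_vec in term_library.items():
            score = cosine_similarity(vec, term_vec)
            if score > best_score:
                best_term, best_score = term, score
        if best_term is not None and best_score >= threshold:
            matches[entity] = (best_term, best_score)
    return matches


# Toy vectors standing in for learned embeddings (illustrative only).
term_library = {
    "blue-chip stock": [0.9, 0.1, 0.0],
    "market volatility": [0.0, 0.2, 0.9],
}
entity_vectors = {
    "high-quality stocks": [0.8, 0.2, 0.1],
    "recent": [0.1, 0.9, 0.1],
}
matches = match_terms(entity_vectors, term_library)
```

With these toy vectors, "high-quality stocks" is matched to "blue-chip stock", while "recent" falls below the assumed threshold and is left unmatched.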
Forming the word embedding matrix includes constructing the word embedding matrix of financial expert speech data based on the extracted terminology of the financial entity.
That is, $E = f_{\text{embed}}(W)$, where $E$ denotes the word embedding matrix, $W$ denotes the input word sequence, and $f_{\text{embed}}$ is a word embedding function that maps each word to a dense vector of fixed dimension capturing the semantic information of the word.
For example, consider a financial expert comment stating that "recent market fluctuations are large, but the investment value of high-quality stocks is still significant". In the implementation process, the sentence is first divided by word segmentation into word units such as "recent", "market", "fluctuation" and "large". Part-of-speech tagging is then performed, for example "recent" is tagged as a temporal noun and "large" as an adjective. The entity identification and technical term extraction steps then identify financial entities and terms such as "market" and "high-quality stocks". Finally, through word embedding, each word is converted into a vector that expresses its deep semantic meaning, and the vectors are input into a deep learning model for emotion analysis to judge the emotional tendency expressed by the comment.
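The last step of this example, turning a token sequence into a word embedding matrix, might look like the sketch below. The lookup table, its three-dimensional vectors and the choice of a zero vector for unknown words are assumptions made for illustration; in practice the vectors would come from a trained embedding model.

```python
import numpy as np


def build_embedding_matrix(tokens, embedding_table, dim):
    """Stack one embedding row per token; unknown tokens get a zero row.

    embedding_table: dict of word -> vector (hypothetical pretrained lookup).
    dim: embedding dimension, used for the zero fallback.
    """
    rows = [
        np.asarray(embedding_table.get(tok, np.zeros(dim)), dtype=float)
        for tok in tokens
    ]
    return np.vstack(rows) if rows else np.zeros((0, dim))


# Toy lookup table with made-up values.
embedding_table = {
    "recent": [0.1, 0.9, 0.1],
    "market": [0.4, 0.3, 0.6],
    "fluctuation": [0.0, 0.2, 0.9],
}
tokens = ["recent", "market", "fluctuation", "large"]
E = build_embedding_matrix(tokens, embedding_table, dim=3)
```

Each row of `E` corresponds to one word of the input sequence, so a four-token sentence with three-dimensional embeddings yields a 4 x 3 matrix.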
The emotion analysis module 200 is configured to analyze emotion polarity of each text vector in the word embedding matrix, where the emotion polarity includes positive, negative and neutral, and performs optimistic, cautious or pessimistic category classification on financial expert speech data, and combines the emotion polarity and the category classification to obtain emotion intensity of the word embedding matrix.
The process specifically comprises the following steps:
And analyzing each text vector in the word embedding matrix by using the trained deep neural network model, and calculating the emotion polarity score of each word. The deep neural network may be a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN).
That is, $S = f_{\text{polarity}}(E)$, where $S$ denotes the emotion polarity score, $E$ is the word embedding matrix, and $f_{\text{polarity}}$ is the emotion polarity analysis function.
The optimistic, cautious or pessimistic classification of category for financial expert speech data includes:
And converting the context information of the financial expert language data into a feature vector by using TF-IDF or word2vec, and integrating the feature vector with the score of the emotion polarity to obtain a comprehensive feature vector. The context information includes:
With respect to context, the context information is not divided into fixed levels; it can be obtained and expressed in several ways, of which the following are common:
Lexical features, such as part of speech, word frequency and word embeddings.
Sentence features, such as sentence length, sentence structure and grammatical analysis.
Discourse features, such as discourse structure, paragraph relations and topic consistency.
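As a rough illustration of converting context information into a feature vector and integrating it with the emotion polarity score, the sketch below computes TF-IDF weights over tokenized comments and appends a polarity score. The smoothed IDF variant, the function names and the sample polarity value 0.6 are assumptions; the text names TF-IDF or word2vec without fixing a formula.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, which is one common
    variant and an assumption here.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def combine_features(context_vector, polarity_score):
    """Append the polarity score to the context vector (simple concatenation)."""
    return list(context_vector) + [polarity_score]


docs = [
    ["market", "fluctuation", "large"],
    ["stocks", "value", "significant", "market"],
]
vocab, vectors = tfidf_vectors(docs)
combined = combine_features(vectors[0], polarity_score=0.6)  # 0.6 is made up
```

Concatenation is only one way to integrate the two kinds of feature; weighting or learned fusion would fit the description equally well.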
The financial expert speech data are classified by category using a trained softmax regression model.
The softmax regression model is trained on comprehensive feature vectors built from the emotion features and context features extracted from financial expert speech data, and outputs the probabilities that the financial expert speech data belong to the optimistic, cautious or pessimistic category.
That is, $P = f_{\text{classify}}(C, S)$, where $P$ denotes the probability distribution of the text classification, $C$ denotes the context information, $S$ denotes the emotion polarity score, and $f_{\text{classify}}$ is the text classification function.
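The classification step can be illustrated with the inference side of a softmax regression model. The weight matrix and bias below are hand-picked placeholders standing in for trained parameters, and the three-element feature vector (two context features plus a polarity score) is likewise an assumption.

```python
import numpy as np

LABELS = ("optimistic", "cautious", "pessimistic")


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))
    return e / e.sum()


def classify(features, weights, bias):
    """Return (label, probabilities) for one comprehensive feature vector."""
    logits = np.asarray(weights, dtype=float) @ np.asarray(features, dtype=float)
    probs = softmax(logits + np.asarray(bias, dtype=float))
    return LABELS[int(np.argmax(probs))], probs


# Placeholder parameters: only the last feature (polarity score) matters here.
weights = [
    [0.0, 0.0, 2.0],   # optimistic
    [0.0, 0.0, 0.0],   # cautious
    [0.0, 0.0, -2.0],  # pessimistic
]
bias = [0.0, 0.0, 0.0]
features = [0.2, 0.1, 0.7]  # [context feature, context feature, polarity score]
label, probs = classify(features, weights, bias)
```

With these placeholder parameters a positive polarity score pushes the probability mass toward the "optimistic" category; a real model would learn the weights from labeled comments.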
The method for obtaining the emotion strength of the word embedding matrix by combining emotion polarity and category classification comprises the following steps:
and calculating the emotion intensity of the word embedding matrix according to the weight and polarity scores of different emotion categories.
Here, $I$ denotes the emotion intensity value, $w_i$ is the weight of the i-th emotion category, $s_i$ is the emotion polarity score, and $N$ is the total number of words.
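Because the intensity formula itself is not reproduced in the text, the sketch below shows only one plausible reading of the symbol definitions: a category-weighted average of the per-word polarity scores over the N words. Both this form and the sample weights are assumptions and should not be taken as the formula of the invention.

```python
def emotion_intensity(word_scores, class_weights):
    """Category-weighted mean of per-word polarity scores (assumed form).

    word_scores: list of (polarity_class, polarity_score) pairs, one per word.
    class_weights: dict of polarity_class -> weight (hypothetical values).
    """
    if not word_scores:
        return 0.0
    total = sum(class_weights[cls] * score for cls, score in word_scores)
    return total / len(word_scores)


class_weights = {"positive": 1.0, "neutral": 0.5, "negative": 1.0}
word_scores = [
    ("neutral", 0.0),    # e.g. "fluctuation"
    ("positive", 0.8),   # e.g. "high-quality stocks"
    ("positive", 0.6),
    ("negative", -0.2),
]
intensity = emotion_intensity(word_scores, class_weights)
```

Under this assumed form a positive result indicates net positive emotion, and its magnitude grows with the weights and scores of the contributing words.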
For example, the word "fluctuation" may be judged neutral, while "high-quality stocks" may be judged to carry positive emotion. In the category classification stage, the entire comment may be assigned to the "optimistic" category. In the emotion intensity assessment, the comment may receive a relatively high emotion intensity value from the model, indicating that it has a larger positive influence on market emotion. Through this series of analyses, the present invention can provide investors with a quantitative assessment of market emotion, thereby assisting investment decisions.
Through the steps, the emotion tendencies of financial language can be accurately judged, emotion intensity evaluation can be carried out by combining with context factors, and an efficient and reliable solution is provided for market emotion analysis.
The market emotion assessment module 300 is configured to construct a market emotion assessment model based on historical financial expert speech data and corresponding historical market feedback data, assess the market emotion of the word embedding matrix by adopting the market emotion assessment model, and perform correlation analysis on the market emotion and the market key indexes to obtain the influence of the financial expert speech data on the market key indexes.
Wherein, the constructing of the market emotion estimation model includes:
historical language data and related market feedback data of financial experts are collected and data preprocessing is performed.
The preprocessing comprises the following steps:
Historical speech data and market feedback data for financial professionals are collected and consolidated, including but not limited to stock price fluctuations, trading volume changes, social media moods, and the like. And cleaning and normalizing the data to eliminate noise and abnormal values in the data.
The normalization is $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$, where $x'$ is the normalized data, $x$ is the original data, and $x_{\min}$ and $x_{\max}$ are respectively the minimum and maximum values in the dataset.
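Min-max normalization as defined by the minimum and maximum of the dataset can be sketched in a few lines. The handling of a constant series (returning zeros to avoid division by zero) and the sample prices are assumptions added for the example.

```python
import numpy as np


def min_max_normalize(values):
    """Scale values to [0, 1] using (x - min) / (max - min)."""
    x = np.asarray(values, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Constant series: the formula is undefined, so return zeros (assumed).
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)


prices = [10.0, 12.0, 11.0, 15.0]  # made-up closing prices
normalized = min_max_normalize(prices)
```

Here the smallest price maps to 0 and the largest to 1, with the others placed proportionally in between.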
And extracting key features from the preprocessed data to construct key feature vectors, wherein the key features comprise scores of emotion polarities of historical language data of financial experts, category classification and market key indexes of related market feedback data.
That is, $F = [f_1, f_2, \ldots, f_n]$, where $F$ is the feature vector and $f_i$ is the i-th feature.
Time-series data are constructed based on the constructed key feature vectors, and a long short-term memory (LSTM) network is trained on the time-series data to obtain the market emotion assessment model.
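That is, $M = \mathrm{LSTM}(F; \theta)$, where $M$ is the market emotion assessment result, $F$ is the key feature vector sequence supplied as input, and $\theta$ denotes the model parameters.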
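Before an LSTM can be trained, the key feature vectors have to be arranged as sequences. The sketch below shows one common way to do this with a sliding window; the window length, the use of the next time step as the training target and the toy data are assumptions, as the text does not describe how the sequences are built. The LSTM training itself is omitted.

```python
import numpy as np


def make_sequences(features, targets, window):
    """Build (samples, window, n_features) inputs and next-step targets.

    features: array of shape (T, n_features), one key feature vector per step.
    targets: array of shape (T,), e.g. a market key index per step.
    window: number of consecutive steps fed to the model (assumed parameter).
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    X, y = [], []
    for start in range(len(features) - window):
        X.append(features[start:start + window])
        y.append(targets[start + window])
    return np.array(X), np.array(y)


# Toy data: 6 time steps, 3 features each (polarity score, category, index).
features = np.arange(18).reshape(6, 3)
targets = np.arange(6)
X, y = make_sequences(features, targets, window=3)
```

The resulting `X` has the (samples, time steps, features) layout that LSTM implementations in common deep learning libraries generally expect.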
Evaluating the market emotion of the word embedding matrix by adopting a market emotion evaluation model, and performing correlation analysis on the market emotion and the market key index, wherein the obtaining of the influence of financial expert speaking data on the market key index comprises the following steps:
The influence of the financial expert speech data on the market key indexes is obtained according to the assessed market emotion of the word embedding matrix and the change curves of the market key indexes (such as index rise and fall and trading volume change) in the market feedback data matched to the emotion polarity scores and the category classification.
For example, historical speech data of experts and the related market feedback data are first collected and preprocessed. In the feature extraction stage, the emotion polarity score of each comment, its category classification result and the like are taken as features. The feature vectors are then processed by the constructed LSTM-based market emotion assessment model to obtain a market emotion assessment result. If the quantitative emotion analysis shows a significant positive correlation between the expert's speech and market trading volume, this indicates that the expert's speech promotes market trading activity and/or that the expert's speech correctly anticipates the market. Such analysis can give investors insight into market dynamics and help them make more reasonable investment decisions.
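The correlation analysis in this example can be illustrated with a Pearson correlation coefficient between an assessed market emotion series and a market key index series. Pearson correlation is one possible choice and is an assumption here, as are the two toy series; the text does not name a specific correlation measure.

```python
import numpy as np


def pearson_correlation(a, b):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])


# Made-up series: assessed market emotion and trading volume change per day.
market_emotion = [0.1, 0.4, 0.3, 0.8, 0.6]
volume_change = [0.0, 0.3, 0.2, 0.9, 0.5]
r = pearson_correlation(market_emotion, volume_change)
```

A coefficient close to 1 on real data would correspond to the strong positive correlation described above, although correlation alone does not establish that the speech caused the change in trading activity.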
Referring to fig. 2, a method for analyzing semantic emotion of financial expert language includes:
S100, word segmentation, part-of-speech tagging, entity identification and professional term extraction are carried out on financial expert speaking data to form a word embedding matrix;
S200, carrying out emotion polarity analysis on each text vector in the word embedding matrix, wherein the emotion polarities comprise positive, negative and neutral, and carrying out optimistic, cautious or pessimistic category classification on financial expert speaking data, and obtaining the emotion strength of the word embedding matrix by combining the emotion polarities and the category classification;
S300, constructing a market emotion assessment model based on historical financial expert speech data and corresponding historical market feedback data, assessing the market emotion of the word embedding matrix by adopting the market emotion assessment model, and carrying out correlation analysis on the market emotion and the market key indexes to obtain the influence of the financial expert speech data on the market key indexes.
The word embedding matrix is formed by word segmentation, part-of-speech tagging, entity identification and professional term extraction of financial expert speaking data, and comprises the following steps:
word segmentation is carried out on financial expert speaking data by adopting a word segmentation model trained by a corpus in the financial field;
Labeling the parts of speech of the segmented financial expert speaking data;
Identifying financial entities in the part-of-speech-tagged financial expert speech data by adopting a long short-term memory (LSTM) network or a Transformer model;
extracting technical terms with a deep learning model combined with a financial-domain dictionary to form a financial technical term library, and extracting the technical terms of each identified financial entity by calculating the cosine similarity between the entity and the technical terms in the financial technical term library;
A word embedding matrix of financial expert speech data is constructed based on the extracted technical terms of the financial entity.
The emotion polarity analysis for each text vector in the word embedding matrix comprises the following steps:
analyzing each text vector in the word embedding matrix by using the trained deep neural network model, and calculating the emotion polarity score of each word;
The optimistic, cautious or pessimistic class classification of financial expert speech data includes:
Converting the context information of the financial expert language data into a feature vector by using TF-IDF or word2vec, and integrating the feature vector with the score of the emotion polarity to obtain a comprehensive feature vector;
classifying the financial expert speaking data by a trained softmax regression model;
The emotion intensity of the word embedding matrix obtained by combining emotion polarity and category classification comprises the following steps:
and calculating the emotion intensity of the word embedding matrix according to the weight and polarity scores of different emotion categories.
Wherein the constructing the market emotion assessment model based on the historical financial expert speech data and the corresponding historical market feedback data comprises:
collecting historical language data and related market feedback data of financial experts, and preprocessing the data;
Extracting key features from the preprocessed data to construct key feature vectors, wherein the key features comprise scores of emotion polarities of historical language data of financial experts, category classification and market key indexes of related market feedback data;
Constructing time-series data based on the constructed key feature vectors, and training a long short-term memory (LSTM) network on the time-series data to obtain the market emotion assessment model.
The method for estimating the market emotion of the word embedding matrix by adopting the market emotion estimation model, and carrying out correlation analysis on the market emotion and the market key index, wherein the step of obtaining the influence of financial expert speaking data on the market key index comprises the following steps:
Obtaining the influence of the financial expert speech data on the market key indexes according to the assessed market emotion of the word embedding matrix and the change curves of the market key indexes in the market feedback data matched to the emotion polarity scores and the category classification.
Finally, it is noted that the above-mentioned embodiments illustrate rather than limit the invention, and those skilled in the art will understand that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.