Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a financial emotion analysis system which solves the problems in the background art.
To solve the above technical problems, the invention adopts the following technical scheme:
A financial emotion analysis system includes a text data receiving module for receiving text data containing financial related content;
The emotion analysis module is connected with the text data receiving module, and is used for identifying emotion segmentation words and trend expression words of the received financial text data based on a preset large language model and calculating initial semantic vectors of the emotion segmentation words and trend expression words;
The rewarding function construction module is connected with the emotion analysis module and is used for generating a corresponding rewarding function model according to preset emotion scores and trend scores, and reinforcing initial semantic vectors of the emotion segmentation words and trend expression words based on the rewarding function model to obtain reinforced semantic vectors;
the reinforcement learning optimization module is connected with the reward function construction module and is used for optimizing the large language model based on the reward function model and a preset reinforcement learning model to generate an optimized language model;
The financial institution public opinion evaluation module is connected with the reinforcement learning optimization module, and is used for carrying out regression analysis on the financial text data processed by the emotion analysis module based on the optimized language model, calculating emotion scores of the financial text, and generating financial institution public opinion reports according to the emotion scores and the financial institution public opinion data of the target area;
and the public opinion coping scheme generating module is used for acquiring the financial institution public opinion report and extracting keywords according to the financial institution public opinion report to generate a corresponding scheme.
Optionally, the steps of building and training the large language model are as follows:
Collecting text data related to the financial field and dividing the collected data into a training set, a verification set and a fine-tuning set; setting the number of layers, the hidden unit size and the number of attention heads of the large language model, and training the language model with the training set;
Designing a self-supervised learning task suited to the characteristics of the financial field and, starting from the pre-trained model, fine-tuning on the financial field data with the fine-tuning set;
Evaluating the fine-tuned large language model with the verification set, and deploying the trained large language model to a trading platform for use.
Optionally, the steps of identifying emotion segmentation words and trend expression words based on the preset large language model and calculating initial semantic vectors of the emotion segmentation words and trend expression words are as follows:
Firstly, preprocessing the received financial text data, and identifying emotion segmentation words and trend expression words in the preprocessed text data by using the preset large language model;
Calculating the initial semantic vector of each identified emotion segmentation word and trend expression word through a formula, wherein $v_i$ represents the initial semantic vector of the emotion segmentation word or trend expression word $i$; the context words $j$ are drawn from the context window of word $i$; $w_j$ represents the weight of context word $j$; $\mathrm{freq}(j)$ represents the frequency of context word $j$ in the whole corpus; $\epsilon$ is a smoothing parameter that prevents calculation problems when the frequency is too low or zero; $E_j$ represents the embedding vector of context word $j$; $Z$ is the number of context words; $\alpha$ is the weighting coefficient of the context embedding vectors and controls the degree of influence of the context on the initial semantic vector; $P_i$ represents the emotion embedding vector of word $i$ in the emotion segmentation word set; $T_i$ represents the trend embedding vector of word $i$ in the trend expression word set; $\beta$ is the weighting coefficient of the emotion and trend embedding vectors; and $\gamma$ and $\delta$ are the weighting coefficients of the emotion embedding vector $P_i$ and the trend embedding vector $T_i$, respectively.
Optionally, the step of identifying the emotion segmentation words and the trend expression words of the received financial text data by using the preset large language model and calculating initial semantic vectors of the emotion segmentation words and the trend expression words comprises the following steps:
identifying emotion segmentation words in the financial text by using the large language model and giving emotion scores and trend scores according to the use condition of the emotion segmentation words in the context and the matching degree of the emotion segmentation words and emotion labels preset in the large language model;
Constructing a reward function model from the emotion score and trend score obtained in the analysis by using the formula $\omega_i = \alpha \cdot \psi_i + \beta \cdot \tau_i \cdot v_i$, and evaluating the initial semantic vectors of the emotion segmentation words and the trend expression words with the reward function model according to the emotion score and the trend score, wherein $\omega_i$ represents the reward function value of the emotion segmentation word or trend expression word $i$; $\alpha$ and $\beta$ are adjustment coefficients that balance the contributions of the emotion score and the trend score in the final reward function; $\psi_i$ represents the emotion score of the emotion segmentation word $i$; and $\tau_i$ represents the trend score of the trend expression word $i$;
Adjusting the initial semantic vectors of the emotion segmentation words and trend expression words based on the feedback signal of the constructed reward function model, and gradually optimizing the semantic vectors so that they better match the emotion and trend directions represented by the emotion scores and trend scores;
Updating the relevant parameters of the large language model according to the adjustment result of the semantic vectors, and gradually adjusting the internal structure and weights of the model through repeated iteration and optimization.
Optionally, optimizing the large language model based on the reward function model and a preset reinforcement learning model, and generating the optimized language model includes the following steps:
setting a reinforcement learning frame comprising states, actions and rewards on the basis of the large language model;
Calculating, with the previously generated reward function model and by a formula, a reward value corresponding to the performance of the model in the current state, wherein $R$ represents the final calculated reward value, which reflects the accuracy of the current model output and its degree of matching with the expected target; $\alpha$ is an adjustment factor that controls the overall magnitude of the reward value; $N$ represents the number of samples, typically the number of samples used in evaluating the model performance; the formula further uses the expected target reward value of sample $i$, which is a target value set according to the desired output, and the absolute error between the actual output of the $i$-th sample and the expected target, which measures the gap between the model output and the target value;
Feeding the reward signal back to the reinforcement learning model to guide it to select the optimal action in subsequent training so as to maximize the cumulative reward;
Updating the policy of the reinforcement learning model based on the reward signal and, after the policy update, selecting actions based on the current policy and adjusting the parameters of the large language model.
Optionally, based on the optimized language model, performing regression analysis on the financial text data processed by the emotion analysis module, calculating emotion scores of the financial text, and according to the emotion scores, counting financial institution public opinion data of a target area, predicting future financial fluctuation indexes, and further evaluating financial risk levels, wherein the steps include:
Taking the financial text data processed by the emotion analysis module as input, using the reinforced semantic vectors of the extracted emotion segmentation words and trend expression words as the feature input of the regression analysis, and setting the target emotion score of the financial text as the target variable of the regression model;
Building and training the regression model with a labeled financial text data set and, after training, applying the regression model to calculate emotion scores of new financial text data, wherein each piece of input text data yields an emotion score that reflects its emotional tendency and strength;
summarizing and counting emotion scores of all the processed financial text data, classifying and analyzing the data according to time, geographic position and text source dimension, and obtaining financial institution public opinion data in a target area;
Analyzing the collected public opinion data, identifying the variation trend of emotion scores and constructing a prediction model to predict future public opinion fluctuation indexes;
setting a risk threshold of a financial institution public opinion fluctuation index according to historical data and financial institution researches, and indicating that the financial institution is likely to face higher risk when the predicted public opinion fluctuation index exceeds the threshold;
Comparing the predicted future public opinion fluctuation index with the set risk threshold, judging that the public opinion fluctuation risk of the financial institution is high if the predicted index is higher than the threshold and low if the predicted index is lower than or equal to the threshold, and generating a financial institution public opinion evaluation report based on the prediction result.
Optionally, the steps of obtaining the public opinion report of the financial institution and extracting keywords according to the public opinion report of the financial institution to generate a corresponding scheme are as follows:
carrying out text analysis on the obtained public opinion report, splitting the public opinion report into paragraph, sentence or finer-granularity word levels, and carrying out grammar analysis and semantic analysis on the analyzed text;
Extracting keywords with potential meaning and predictive value by the large language model through context and semantic understanding, and clustering and classifying the extracted keywords;
Analyzing the correlation between the extracted keywords and each paragraph or event in the public opinion report, determining the corresponding context and importance of each keyword in the public opinion report through semantic similarity calculation or dependency analysis, and finally determining the type of the scheme to be generated according to the category and importance of the keywords;
The financial institution verifies the generated scheme by simulation or backtesting, observing its performance on historical data and in a simulated environment to verify the validity of the strategy, and adjusts the scheme according to the verification result so as to remove unreasonable parts;
And integrating the results generated by keyword extraction, correlation analysis and response schemes into a complete public opinion response report.
The financial emotion analysis system also comprises a database, wherein the database is used for receiving the data sent by the text data receiving module, the financial institution public opinion evaluation module and the public opinion response scheme generating module through communication signals, storing the data, and encrypting the data by adopting a blockchain technology in the process of data receiving and transmitting.
After the technical scheme is adopted, compared with the prior art, the invention has the following beneficial effects, and of course, any product for implementing the invention does not necessarily need to achieve all the following advantages at the same time:
Through the accurate recognition of the emotion analysis module, the dynamic optimization of the rewarding function and the reinforcement learning and the accurate report generation of the public opinion evaluation module, the application can comprehensively and efficiently process complex financial text data. The emotion analysis module ensures that the system accurately recognizes emotion segmentation words and trend expression words in a large amount of unstructured data, and provides key emotion and trend information. And the combination of the reward function and reinforcement learning enables the system to automatically optimize semantic vectors of emotion and trend in a continuously changing market environment, and continuously improves the prediction capability of the model.
The following describes the embodiments of the present invention in further detail with reference to the accompanying drawings.
Detailed Description
The invention will now be described in further detail with reference to the accompanying drawings.
Referring to fig. 1, in this embodiment, a financial emotion analysis system is provided, which includes a text data receiving module for receiving text data including financial related content;
The emotion analysis module is connected with the text data receiving module, and is used for identifying emotion segmentation words and trend expression words of the received financial text data based on a preset large language model and calculating initial semantic vectors of the emotion segmentation words and trend expression words;
The rewarding function construction module is connected with the emotion analysis module and is used for generating a corresponding rewarding function model according to preset emotion scores and trend scores, and reinforcing initial semantic vectors of the emotion segmentation words and trend expression words based on the rewarding function model to obtain reinforced semantic vectors;
the reinforcement learning optimization module is connected with the reward function construction module and is used for optimizing the large language model based on the reward function model and a preset reinforcement learning model to generate an optimized language model;
The financial institution public opinion evaluation module is connected with the reinforcement learning optimization module, and is used for carrying out regression analysis on the financial text data processed by the emotion analysis module based on the optimized language model, calculating emotion scores of the financial text, and generating financial institution public opinion reports according to the emotion scores and the financial institution public opinion data of the target area;
and the public opinion coping scheme generating module is used for acquiring the financial institution public opinion report and extracting keywords according to the financial institution public opinion report to generate a corresponding scheme.
The building and training steps of the large language model in this embodiment are as follows:
Text data related to the financial field are collected and divided into a training set, a verification set and a fine-tuning set, and the language model is trained with the training set after the number of layers, the hidden unit size and the number of attention heads of the large language model are set. The training set accounts for 70% of the total data and is mainly used for training the model. The verification set accounts for 15% of the total data and is used to evaluate the performance of the model during training. The fine-tuning set accounts for 15% of the total data and is dedicated to fine-tuning on specific financial tasks. The large language model is a 12-layer Transformer encoder with a hidden size of 768 and 12 attention heads.
A self-supervised learning task suited to the characteristics of the financial field is designed and, starting from the pre-trained model, the model is fine-tuned on the financial field data with the fine-tuning set;
The fine-tuned large language model is evaluated with the verification set, and the trained large language model is deployed to a trading platform for use. Configuring the large language model in this way can improve its accuracy and practicability in financial text analysis, so that transaction decisions and financial institution prediction are better supported.
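The data split and model dimensions of this embodiment can be sketched as follows. This is only an illustration: the names `EncoderConfig` and `split_corpus`, the shuffling step and the fixed seed are assumptions and are not specified in the text.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    # Values stated in this embodiment; the class itself is hypothetical.
    num_layers: int = 12
    hidden_size: int = 768
    num_attention_heads: int = 12


def split_corpus(texts, seed=0):
    """Shuffle and split into 70% training, 15% verification, 15% fine-tuning."""
    items = list(texts)
    random.Random(seed).shuffle(items)  # shuffling is an assumption
    n_train = len(items) * 70 // 100
    n_val = len(items) * 15 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    fine_tune = items[n_train + n_val:]  # remainder, about 15%
    return train, val, fine_tune
```

With 768 hidden units and 12 attention heads, each head works on 64 dimensions, which is why the hidden size is chosen as a multiple of the head count.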
In this embodiment, the steps of identifying the emotion segmentation words and the trend expression words based on the preset large language model and calculating the initial semantic vectors of the emotion segmentation words and the trend expression words are as follows:
Firstly, preprocessing the received financial text data, and identifying emotion segmentation words and trend expression words in the preprocessed text data by using the preset large language model;
Calculating the initial semantic vector of each identified emotion segmentation word and trend expression word through a formula, wherein $v_i$ represents the initial semantic vector of the emotion segmentation word or trend expression word $i$; the context words $j$ are drawn from the context window of word $i$; $w_j$ represents the weight of context word $j$; $\mathrm{freq}(j)$ represents the frequency of context word $j$ in the whole corpus; $\epsilon$ is a smoothing parameter that prevents calculation problems when the frequency is too low or zero; $E_j$ represents the embedding vector of context word $j$; $Z$ is the number of context words; $\alpha$ is the weighting coefficient of the context embedding vectors and controls the degree of influence of the context on the initial semantic vector; $P_i$ represents the emotion embedding vector of word $i$ in the emotion segmentation word set; $T_i$ represents the trend embedding vector of word $i$ in the trend expression word set; $\beta$ is the weighting coefficient of the emotion and trend embedding vectors; and $\gamma$ and $\delta$ are the weighting coefficients of the emotion embedding vector $P_i$ and the trend embedding vector $T_i$, respectively. Introducing the smoothing parameter and the weighting coefficients enhances the stability and robustness of the model and improves the expressive power of the semantic vectors in emotion and trend analysis, thereby improving the accuracy and practical effect of the model in financial text analysis.
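Only the symbol definitions of the formula are available here, so the following sketch assumes one form that is consistent with them: $v_i = \alpha \cdot \frac{1}{Z} \sum_j w_j E_j + \beta (\gamma P_i + \delta T_i)$ with $w_j = 1 / (\mathrm{freq}(j) + \epsilon)$. Both the overall form and the weight definition are assumptions made for illustration, and the function name is hypothetical.

```python
def initial_semantic_vector(context_embeddings, context_freqs, p_i, t_i,
                            alpha=0.5, beta=0.5, gamma=0.5, delta=0.5, eps=1e-6):
    """ASSUMED form, not the formula of the specification:
    v_i = alpha * (1/Z) * sum_j w_j * E_j + beta * (gamma * P_i + delta * T_i),
    with w_j = 1 / (freq(j) + eps)."""
    dim = len(p_i)
    z = len(context_embeddings)  # Z: number of context words
    context_sum = [0.0] * dim
    for e_j, freq_j in zip(context_embeddings, context_freqs):
        w_j = 1.0 / (freq_j + eps)  # eps keeps the weight finite when freq is 0
        for k in range(dim):
            context_sum[k] += w_j * e_j[k]
    return [
        alpha * (context_sum[k] / z if z else 0.0)
        + beta * (gamma * p_i[k] + delta * t_i[k])
        for k in range(dim)
    ]
```

Under this assumed weighting, rare context words contribute more than frequent ones, and the smoothing parameter prevents a division by zero for words that never occur in the corpus.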
In this embodiment, the steps of identifying the emotion segmentation words and the trend expression words of the received financial text data by the preset large language model and calculating the initial semantic vectors of the emotion segmentation words and the trend expression words are as follows:
identifying emotion segmentation words in the financial text by using the large language model and giving emotion scores and trend scores according to the use condition of the emotion segmentation words in the context and the matching degree of the emotion segmentation words and emotion labels preset in the large language model;
Constructing a reward function model from the emotion score and trend score obtained in the analysis by using the formula $\omega_i = \alpha \cdot \psi_i + \beta \cdot \tau_i \cdot v_i$, and evaluating the initial semantic vectors of the emotion segmentation words and the trend expression words with the reward function model according to the emotion score and the trend score, wherein $\omega_i$ represents the reward function value of the emotion segmentation word or trend expression word $i$; $\alpha$ and $\beta$ are adjustment coefficients that balance the contributions of the emotion score and the trend score in the final reward function; $\psi_i$ represents the emotion score of the emotion segmentation word $i$; and $\tau_i$ represents the trend score of the trend expression word $i$;
Adjusting the initial semantic vectors of the emotion segmentation words and trend expression words based on the feedback signal of the constructed reward function model, and gradually optimizing the semantic vectors so that they better match the emotion and trend directions represented by the emotion scores and trend scores;
Updating the relevant parameters of the large language model according to the adjustment result of the semantic vectors, and gradually adjusting the internal structure and weights of the model through repeated iteration and optimization. This process helps ensure the accuracy of the model in emotion and trend analysis tasks and improves its precision and application value in financial emotion analysis.
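The reward function value of a single word can be computed as in the sketch below. The formula combines a scalar emotion score with a term containing the semantic vector $v_i$; so that the result is a single value, the sketch assumes that $v_i$ contributes through its Euclidean norm. That reading, the default coefficients and the function name are assumptions.

```python
import math


def reward_value(psi_i, tau_i, v_i, alpha=0.6, beta=0.4):
    """omega_i = alpha * psi_i + beta * tau_i * v_i.

    ASSUMPTION: the vector v_i enters through its Euclidean norm so that
    omega_i is a scalar; the specification does not state this.
    """
    v_norm = math.sqrt(sum(x * x for x in v_i))
    return alpha * psi_i + beta * tau_i * v_norm
```

For example, with an emotion score of 0.5, a trend score of 1.0, a vector of norm 5, $\alpha = 1.0$ and $\beta = 0.5$, the value is 0.5 + 2.5 = 3.0.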
In this embodiment, the step of optimizing the large language model based on the reward function model and the preset reinforcement learning model to generate an optimized language model includes:
setting a reinforcement learning frame comprising states, actions and rewards on the basis of the large language model;
Calculating, with the previously generated reward function model and by a formula, a reward value corresponding to the performance of the model in the current state, wherein $R$ represents the final calculated reward value, which reflects the accuracy of the current model output and its degree of matching with the expected target; $\alpha$ is an adjustment factor that controls the overall magnitude of the reward value; $N$ represents the number of samples, typically the number of samples used in evaluating the model performance; the formula further uses the expected target reward value of sample $i$, which is a target value set according to the desired output, and the absolute error between the actual output of the $i$-th sample and the expected target, which measures the gap between the model output and the target value;
Feeding the reward signal back to the reinforcement learning model to guide it to select the optimal action in subsequent training so as to maximize the cumulative reward;
Updating the policy of the reinforcement learning model based on the reward signal and, after the policy update, selecting actions based on the current policy and adjusting the parameters of the large language model. The reinforcement learning model can be updated with an existing method such as policy gradient or Q-learning. Through reinforcement learning, the model can dynamically adjust its parameters and policy, so that it better optimizes its output when processing complex financial text tasks and gradually approaches an optimal solution. This process not only allows the model to perform better in initial training, but also lets it keep improving in practical application and adapt to changing financial institution environments, which ultimately improves the application value and accuracy of the model in the financial field.
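As an illustration of the Q-learning option mentioned above, a minimal tabular update is sketched below. The state and action labels, the learning rate and the discount factor are hypothetical; an actual system would define states and actions over the language model rather than over a small table.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      lr=0.1, discount=0.9):
    """One standard tabular Q-learning step:
    Q(s, a) += lr * (reward + discount * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + discount * best_next
    q[(state, action)] += lr * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical action set: keep the current strategy or adjust it.
ACTIONS = ["keep_strategy", "adjust_strategy"]
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
```

A high reward raises the value of the action just taken, so the same action becomes more likely to be selected again in that state; a low reward has the opposite effect.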
In actual use, the state of the reinforcement learning model is first set to its current prediction state, for example the model's prediction of the future trend of a stock. The actions available to the model are adjustment operations, such as changing the prediction time window or adjusting the weights of the technical indicators used for prediction. The correctness of the model's prediction is then judged against feedback data from the actual financial institution, and a reward value is given.
Using the previously generated reward function model, the accuracy of the model in predicting stock rises and falls is calculated. For example, if the model predicts that stock A will rise by 10% and the actual result is a rise of 8%, a reward value R is calculated from the size of the error, with a higher reward value indicating a more accurate prediction. If the model's prediction closely matches the actual result, the reward value is high, and the reward signal fed back to the reinforcement learning model guides it to keep its current strategy. If the reward value is low, the model can adjust its strategy, for example by changing the weights of the technical indicators or changing the prediction method.
Based on the fed-back reward signal, the reinforcement learning model updates its strategy. For example, the model may find that certain technical indicators are more effective under certain financial institution conditions and therefore give those indicators priority in the next prediction. At the same time, the model can adjust the parameters of the large language model to fit the new strategy.
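The stock example can be worked through with a small sketch. Because only the symbol definitions of the reward formula are given, the code assumes the form $R = \alpha \cdot \frac{1}{N} \sum_i (r_i^* - |\hat{y}_i - y_i|)$, where $r_i^*$ is the expected target reward value of sample $i$. The form, the symbol names and the function name are assumptions.

```python
def reward_from_errors(predicted, actual, target_rewards, alpha=1.0):
    """ASSUMED form: R = alpha * (1/N) * sum_i (r*_i - |y_hat_i - y_i|).

    predicted, actual: model outputs and observed values per sample.
    target_rewards: expected target reward value r*_i per sample.
    """
    n = len(predicted)
    if n == 0:
        raise ValueError("at least one sample is required")
    total = 0.0
    for y_hat, y, r_star in zip(predicted, actual, target_rewards):
        total += r_star - abs(y_hat - y)  # smaller error gives larger reward
    return alpha * total / n


# Predicted rise of 10% against an actual rise of 8%, target reward 1.0.
example_reward = reward_from_errors([0.10], [0.08], [1.0])
```

Under these assumptions the absolute error of 0.02 reduces the reward from the target of 1.0 to about 0.98, and an exact prediction would receive the full target reward.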
In this embodiment, based on the optimized language model, regression analysis is performed on the financial text data processed by the emotion analysis module, the emotion score of the financial text is calculated, and according to the emotion score, the public opinion data of the financial institution in the target area is counted, the future financial fluctuation index is predicted, and the step of evaluating the financial risk level is as follows:
Taking the financial text data processed by the emotion analysis module as input, using the reinforced semantic vectors of the extracted emotion segmentation words and trend expression words as the feature input of the regression analysis, and setting the target emotion score of the financial text as the target variable of the regression model;
Building and training the regression model with a labeled financial text data set and, after training, applying the regression model to calculate emotion scores of new financial text data, wherein each piece of input text data yields an emotion score that reflects its emotional tendency and strength;
summarizing and counting emotion scores of all the processed financial text data, classifying and analyzing the data according to time, geographic position and text source dimension, and obtaining financial institution public opinion data in a target area;
Analyzing the collected public opinion data, identifying the variation trend of emotion scores and constructing a prediction model to predict future public opinion fluctuation indexes;
setting a risk threshold of a financial institution public opinion fluctuation index according to historical data and financial institution researches, and indicating that the financial institution is at high risk when the predicted public opinion fluctuation index exceeds the threshold;
Comparing the predicted future public opinion fluctuation index with the set risk threshold, judging that the public opinion fluctuation risk of the financial institution is high if the predicted index is higher than the threshold and low if the predicted index is lower than or equal to the threshold, and generating a financial institution public opinion evaluation report based on the prediction result. Performing regression analysis with the reinforced semantic vectors of the emotion segmentation words and trend expression words as feature inputs allows the emotion score to be calculated and the future public opinion fluctuation index to be predicted. In this way, potential financial institution risks can be identified and assessed against the set risk threshold. Finally, a financial institution public opinion evaluation report is generated from the prediction result to support the decision maker's analysis.
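The regression step can be illustrated with a small linear model fitted by gradient descent. The text does not name a regression algorithm, so the linear form, the learning rate, the epoch count and the function names are all assumptions; the feature rows stand in for reinforced semantic vectors.

```python
def fit_linear_regression(features, targets, lr=0.1, epochs=2000):
    """Fit score = bias + sum_k weights[k] * x[k] by batch gradient descent
    on the mean squared error. A linear model is an assumption."""
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, targets):
            err = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_score(weights, bias, x):
    """Emotion score predicted for one feature vector."""
    return bias + sum(w * xi for w, xi in zip(weights, x))
```

After fitting on labeled feature vectors, `predict_score` returns one emotion score per new input, which corresponds to the per-text score described in the steps above.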
In actual use, news about a company may contain emotion segmentation words and trend expression words such as 'profit drop' and 'financial institution turbulence'. After model analysis, the reinforced semantic vectors of these words are used as input features of the regression analysis. The regression model is trained on a historical financial data set and learns to generate emotion scores that reflect emotional tendency from the features of the input emotion segmentation words and trend expression words. After training, when new financial data are entered, the model generates an emotion score for each piece of data, for example "financial institution optimism is 0.7" or "financial institution panic index is 0.8".
The generated emotion scores are summarized and classified by time, geographic position or source (such as news and social media). For example, the analysis may show that, in a particular region or over a certain period, financial institution emotion gradually shifts from "optimistic" to "pessimistic", which may signal potential financial institution fluctuations.
By analyzing the variation trend of the emotion scores, a prediction model is established to predict the future public opinion fluctuation index. A risk threshold is set based on financial institution research and historical data, for example "financial institutions may be at higher risk when the financial institution panic index exceeds 0.6". If the predicted public opinion fluctuation index is above this threshold, the financial institution risk is judged to have increased.
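The aggregation, trend prediction and threshold comparison described above can be sketched as follows. The threshold of 0.6 is the example value from the text; the record layout, the function names and the naive linear extrapolation used as a prediction model are assumptions.

```python
from collections import defaultdict
from statistics import mean

RISK_THRESHOLD = 0.6  # example threshold taken from the text


def aggregate_by(records, key):
    """Average the scores of all records that share the same value of `key`
    (for example a time period, a region or a text source)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec["score"])
    return {k: mean(v) for k, v in groups.items()}


def forecast_next(values):
    """ASSUMPTION: naive linear extrapolation from the last two points."""
    if len(values) < 2:
        return values[-1]
    return values[-1] + (values[-1] - values[-2])


def risk_level(predicted_index, threshold=RISK_THRESHOLD):
    """High risk only when the index is strictly above the threshold."""
    return "high" if predicted_index > threshold else "low"
```

An index exactly equal to the threshold is classified as low risk, matching the rule that only values higher than the threshold indicate high risk.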
In this embodiment, the steps of obtaining a public opinion report of a financial institution and extracting keywords according to the public opinion report of the financial institution to generate a corresponding scheme are as follows:
carrying out text analysis on the obtained public opinion report, splitting the public opinion report into paragraph, sentence or finer-granularity word levels, and carrying out grammar analysis and semantic analysis on the analyzed text;
Extracting keywords with potential meaning and predictive value by the large language model through context and semantic understanding, and clustering and classifying the extracted keywords;
Analyzing the correlation between the extracted keywords and each paragraph or event in the public opinion report, determining the corresponding context and importance of each keyword in the public opinion report through semantic similarity calculation or dependency analysis, and finally determining the type of the scheme to be generated according to the category and importance of the keywords;
The financial institution verifies the generated scheme by simulation or backtesting, observing its performance on historical data and in a simulated environment to verify the validity of the strategy, and adjusts the scheme according to the verification result so as to remove unreasonable parts;
Integrating the results of keyword extraction, correlation analysis and coping scheme generation into a complete public opinion response report. Through deep grammatical and semantic analysis of the public opinion report by the large language model, keywords with potential significance and predictive value can be extracted, and a coping scheme can be generated according to the relevance of the keywords. The financial institution can then verify the validity of the scheme by simulation or backtesting to check that the generated strategy has practical application value. Finally, all the results are summarized into a complete public opinion coping report that provides decision makers with financial institution analysis and coping strategies, with the aim of improving the accuracy of financial institution prediction and the success rate of strategy implementation.
For example, suppose a public opinion report regarding the financial crisis of a company is received. The system parses the report and splits the text into paragraphs, sentences and individual words such as "bankruptcy" and "debt reorganization". The large language model then performs grammatical and semantic analysis on the text to extract keywords with potential significance, such as "crisis management" and "financial institution reaction".
The large language model extracts key terms from the report based on contextual relationships and semantic understanding. For example, during analysis, the model extracts keywords such as "financial institution confidence", "investor panic", etc., and clusters them into relevant risk factor categories.
Next, the system analyzes the relevance of these keywords to specific paragraphs or events in the public opinion report, and determines the importance of each keyword in the report through semantic similarity and dependency analysis. For example, the model may recognize that "financial institution confidence" is repeatedly mentioned in a number of key paragraphs in the report, indicating that this is a problem that requires priority. Based on this, a response scheme for "financial institution confidence restoration" is generated, such as "reinforcing financial institution communication policy".
And finally, integrating all results of keyword extraction, relevance analysis and coping schemes into a complete public opinion coping report. Not only is the current situation of financial institution confidence problems specified in the report, but also optimized coping strategies are provided, suggesting how to implement these strategies to minimize risk and restore financial institution confidence.
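The relevance analysis in this example can be approximated by counting how many paragraphs mention each keyword. The sketch below uses plain substring matching as a stand-in for the semantic similarity or dependency analysis performed by the large language model; the function names and the keyword-to-scheme table are hypothetical.

```python
def keyword_importance(paragraphs, keywords):
    """Count how many paragraphs mention each keyword (case-insensitive).
    Substring matching is a simplification of semantic similarity."""
    return {
        kw: sum(1 for p in paragraphs if kw.lower() in p.lower())
        for kw in keywords
    }


def choose_scheme(importance, scheme_by_keyword):
    """Pick the most frequently mentioned keyword and look up its scheme."""
    top = max(importance, key=importance.get)
    return top, scheme_by_keyword.get(top, "no predefined scheme")


# Hypothetical mapping from keyword to coping scheme.
SCHEMES = {
    "financial institution confidence":
        "reinforce financial institution communication policy",
}
```

A keyword that recurs across several key paragraphs receives the highest count and therefore determines which type of coping scheme is generated first.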
The financial emotion analysis system also comprises a database, wherein the database is used for receiving the data sent by the text data receiving module, the financial institution public opinion evaluation module and the public opinion response scheme generating module through communication signals, storing the data, and encrypting the data by adopting a blockchain technology in the process of data receiving and transmitting.
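The text does not specify which blockchain mechanism is used. As one possible illustration, stored records can be linked in a hash chain so that later modification of any record is detectable. Note that a hash chain provides tamper evidence rather than confidentiality; the record layout and function names below are assumptions.

```python
import hashlib
import json


def _digest(prev_hash, payload):
    # Hash the previous hash together with the payload in a stable order.
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a record whose hash covers the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    record = {"prev": prev_hash, "payload": payload,
              "hash": _digest(prev_hash, payload)}
    chain.append(record)
    return record


def verify_chain(chain):
    """Return True only if every record still matches its stored hash
    and links to the record before it."""
    prev_hash = "0" * 64
    for rec in chain:
        if rec["prev"] != prev_hash or rec["hash"] != _digest(rec["prev"], rec["payload"]):
            return False
        prev_hash = rec["hash"]
    return True
```

Changing the payload of an earlier record invalidates its hash and breaks the link to every later record, which is what makes the stored data tamper-evident.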
The present invention is not limited to the above embodiments. Any structural change made under the teaching of the present invention falls within the scope of protection of the present invention if its technical scheme is the same as or similar to that of the present invention. The techniques, shapes and structural parts of the present invention that are not described in detail are known in the art.