Disclosure of Invention
      The technical problem to be solved by the invention is that the prior art for matching science and technology service supply and demand has clear shortcomings in domain adaptability, training efficiency, matching precision and system extensibility. The invention therefore provides a deep-learning-based intelligent matching method, device and medium for science and technology service supply and demand. By constructing a domain-specific data set and combining multi-task joint training, adaptive weight optimization, knowledge-enhanced contrastive learning and a two-stage retrieval and re-ranking strategy, the invention matches technical requirements to services accurately and efficiently, and improves matching precision, system efficiency and user experience.
      The invention is realized by the following technical scheme:
       The first aspect of the invention provides a deep-learning-based intelligent matching method for science and technology service supply and demand, comprising the following steps:
       acquiring the user demand and science and technology service interaction data accumulated historically by a science and technology service platform, and constructing a science and technology service supply and demand matching data set;
       constructing a pre-training language model, and performing post-pretraining on the pre-training language model based on multi-task learning to obtain a first optimization model;
       using the path distance between entities in a knowledge graph as the measure of sample difficulty, and classifying knowledge-graph-assisted sample pairs;
       performing contrastive learning on the first optimization model based on the knowledge-graph-assisted sample pairs to obtain a second optimization model;
       collecting historical interaction data of real users and using it to optimize the second optimization model to obtain a re-ranking model;
       performing multidimensional relevance scoring on candidate services based on the re-ranking model, and obtaining a final recommendation result based on weighted ranking.
      Further, acquiring the user demand and science and technology service interaction data accumulated historically by the science and technology service platform and constructing the science and technology service supply and demand matching data set specifically includes:
       obtaining the user demand and science and technology service interaction data accumulated historically by the platform, and extracting a technical requirement text, a science and technology service text and a relevance score as the science and technology service supply and demand matching data set, wherein:
       the technical requirement text is the description of a technical service requirement submitted by a user;
       the science and technology service text is the description of science and technology service content provided by the platform;
       the relevance score is generated from the user's historical interaction behavior by combining manual annotation with the prediction of a scoring model.
      Further, in the multi-task learning, the tasks specifically comprise a masked language modeling task, a term prediction task, a regression task and a ranking task.
      Further, performing post-pretraining on the pre-training language model based on multi-task learning specifically comprises:
       based on the masked language modeling task, randomly masking words in the input text of the pre-training language model, understanding the text semantics from the remaining visible text, predicting the masked words from that understanding, comparing the words predicted by the model with the actually masked words to obtain the prediction accuracy, and optimizing the model parameters with a cross-entropy loss function;
       based on the term prediction task, extracting key technical terms from the input text using a domain dictionary, randomly selecting some of the extracted terms and replacing them with mask tokens in the text, predicting the replaced terms from the context, computing the difference between the predicted and actual key technical terms with a cross-entropy loss function, and optimizing the model parameters according to the difference;
       based on the regression task, obtaining demand text and service text pairs from the input text, annotating each pair with a relevance score, encoding each pair with the pre-training language model, extracting the semantic feature vector of the encoded pair, predicting the relevance score from the extracted feature vector, constructing a loss function from the predicted relevance score, and optimizing the model parameters;
       based on the ranking task, constructing a loss function from the relevance scores of the demand text and service text pairs to optimize the ranking capability of the model.
      Further, training the pre-training language model also comprises dynamically adjusting the task weights during multi-task learning, the dynamic adjustment comprising the following steps:
       setting the total number of training rounds and obtaining the current training round;
       obtaining a progress ratio from the total number of training rounds and the current training round;
       obtaining a transition factor from the progress ratio;
       setting the weights of the masked language modeling task, the term prediction task, the regression task and the ranking task according to the transition factor.
      Further, constructing the knowledge-graph-assisted sample pairs specifically includes:
       generating a knowledge graph of the science and technology service domain from the science and technology service supply and demand matching data set;
       in the knowledge graph of the science and technology service domain, defining demand-service pairs that belong to the same technical sub-domain or have a strong semantic association as positive samples, and defining demand-service pairs that span different technical domains or are semantically unrelated as negative samples.
      Further, performing contrastive learning on the first optimization model based on the knowledge-graph-assisted sample pairs to obtain the second optimization model specifically includes:
       acquiring the nodes of the knowledge graph of the science and technology service domain, and extracting the path distances between nodes;
       extracting the node pairs whose path distance is higher than a set threshold, and defining them as hard samples;
       constructing a contrastive learning loss function, weighting the hard samples, and optimizing the sample representations to obtain the second optimization model.
      Further, collecting the historical interaction data of real users to optimize the second optimization model and obtain the re-ranking model specifically includes:
       obtaining the historical interaction data of real users;
       training the second optimization model on the historical interaction data of real users to obtain the re-ranking model.
      The second aspect of the invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the deep-learning-based intelligent matching method for science and technology service supply and demand.
      The third aspect of the invention provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the deep-learning-based intelligent matching method for science and technology service supply and demand.
      Compared with the prior art, the invention has the following advantages and beneficial effects:
      1. Improved domain adaptability: the construction of the domain-specific data set and the pre-training tasks are designed for the specialized and complex nature of the science and technology service domain. Semantic enhancement and knowledge graph integration improve the model's understanding of industry terms, technical background and demand context, which addresses the limitations of general-purpose pre-trained models in specialized domains.
      2. Optimized training efficiency and performance: an adaptive multi-task weight optimization mechanism dynamically adjusts the weights of tasks such as regression and ranking, which reduces interference between tasks, improves training efficiency and overall model performance, and keeps the system stable and reliable in complex matching tasks.
      3. Enhanced semantic matching precision: a knowledge-enhanced contrastive learning method constructs high-quality sample pairs from a domain knowledge graph and improves the model's ability to discriminate fine-grained semantic relations between technical requirements and services, yielding more accurate supply and demand matching.
      4. Two-stage retrieval and re-ranking: candidate services are first screened by efficient semantic retrieval and then scored on multiple relevance dimensions by a fine-grained re-ranking model, which preserves matching precision while improving system response speed and meeting real-time requirements at large data scale.
      5. Cross-scenario generality: a modular and extensible technical framework allows the task design and optimization strategy to be adjusted for different industries and application scenarios, and dynamic updating of user interaction data and the knowledge graph keeps the model adaptable and continuously optimizable across domains and scenarios.
    
    
      Detailed Description
      For the purpose of making apparent the objects, technical solutions and advantages of the present invention, the present invention will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present invention and the descriptions thereof are for illustrating the present invention only and are not to be construed as limiting the present invention.
      As a possible implementation manner, as shown in FIG. 1, this embodiment provides a deep-learning-based intelligent matching method for science and technology service supply and demand, comprising the following steps: acquiring the user demand and science and technology service interaction data accumulated historically by a science and technology service platform, and constructing a science and technology service supply and demand matching data set; constructing a pre-training language model, and performing post-pretraining on it based on multi-task learning to obtain a first optimization model; using the path distance between entities in a knowledge graph as the measure of sample difficulty, and classifying knowledge-graph-assisted sample pairs; performing contrastive learning on the first optimization model based on the knowledge-graph-assisted sample pairs to obtain a second optimization model; collecting historical interaction data of real users to optimize the second optimization model and obtain a re-ranking model; and performing multidimensional relevance scoring on candidate services based on the re-ranking model, and obtaining a final recommendation result based on weighted ranking.
      In this embodiment, a multidimensional data set covering the science and technology service domain is constructed, and technical terms, industry background and semantic relations are enhanced with a domain knowledge graph. On this basis, a domain-specific Technical Term Prediction (TTP) task is designed, which improves the pre-training language model's understanding of complex semantics in technical fields. An adaptive multi-task weight optimization mechanism is introduced: through dynamic weighted averaging and a reinforcement-learning-assisted optimization strategy, the weights of tasks such as regression and ranking are adjusted in real time according to task convergence speed and difficulty, which reduces inter-task interference and improves training efficiency and model performance. High-quality positive and negative sample pairs are constructed from the knowledge graph of the science and technology service domain, and a weighted InfoNCE loss function prioritizes hard samples, which strengthens the model's ability to discriminate fine-grained semantic relations between technical requirements and services. A two-stage retrieval and re-ranking strategy is adopted: the first stage retrieves and screens candidate services efficiently using semantic vectors, and the second stage scores the candidates on multiple relevance dimensions with a fine-grained re-ranking model and outputs the final matching result using a weighted ranking algorithm. This strategy preserves matching precision while improving system efficiency. Model generality is maintained by dynamically updating the user historical interaction data and the domain knowledge graph.
      By accurately matching technical requirements with science and technology services, the embodiment reduces users' search cost and improves user satisfaction; through efficient retrieval and intelligent recommendation, it also optimizes the resource allocation efficiency of the science and technology service platform and strengthens the platform's service capability and market competitiveness. Compared with the prior art, the method has advantages in domain adaptability, semantic understanding, training efficiency, matching precision, system performance and application extensibility. The technical scheme addresses a gap in the intelligent matching of science and technology service supply and demand, performs well in practical application, and has theoretical value and commercial prospects.
      The specific implementation steps comprise:
       1. Construction of the science and technology service supply and demand matching data set:
       This embodiment constructs a high-quality, domain-specific training data set from the user demand and science and technology service interaction data accumulated by the science and technology service platform. The basic unit of the data set is the triple (technical requirement text, science and technology service text, relevance score), where the technical requirement text is the description of a technical service requirement submitted by a user, the science and technology service text is the description of science and technology service content provided by the platform, and the relevance score is generated from the user's historical interaction behavior (such as clicks, consultations and completed transactions) by combining manual annotation with the prediction of a scoring model. The relevance score quantifies the matching degree as a number in the range 0 to 1. To improve data quality, the embodiment uses a domain knowledge graph to semantically enhance the data: key entities in the technical requirement and service texts (such as technical fields, application scenarios and core technical points) and the relations between them are annotated, which keeps the data accurate and representative at the semantic level. In addition, data cleaning, denoising and diversity sampling are used to build training data covering multiple industries and scenarios, which provides the basis for subsequent model training.
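The data set unit described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the names `MatchSample`, `combine_scores` and `build_dataset` are hypothetical, and the linear blend with `manual_weight = 0.7` is an assumption, since the text only states that manual annotation and scoring-model prediction are combined. Cleaning is reduced here to dropping empty texts and duplicate pairs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class MatchSample:
    """One (technical requirement text, service text, relevance score) unit."""
    demand_text: str
    service_text: str
    relevance: float  # matching degree in the range 0 to 1


def combine_scores(manual: Optional[float], predicted: float,
                   manual_weight: float = 0.7) -> float:
    """Blend a manual label with a scoring-model prediction.

    ASSUMPTION: a linear blend; the source does not specify the combination rule.
    The result is clamped to [0, 1].
    """
    if manual is None:
        score = predicted
    else:
        score = manual_weight * manual + (1.0 - manual_weight) * predicted
    return min(1.0, max(0.0, score))


# (demand text, service text, manual score or None, model-predicted score)
Row = Tuple[str, str, Optional[float], float]


def build_dataset(rows: Iterable[Row]) -> List[MatchSample]:
    """Basic cleaning: skip empty texts and duplicate (demand, service) pairs."""
    seen = set()
    samples: List[MatchSample] = []
    for demand, service, manual, predicted in rows:
        demand, service = demand.strip(), service.strip()
        if not demand or not service or (demand, service) in seen:
            continue
        seen.add((demand, service))
        samples.append(
            MatchSample(demand, service, combine_scores(manual, predicted)))
    return samples
```

In practice the blend rule and the cleaning steps would be chosen per platform; the point of the sketch is only the shape of the triple and the 0 to 1 range of the score.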
      2. Training of the pre-training language model based on multi-task learning (multi-task joint training and domain-specific task design):
       Post-pretraining is performed on a pre-trained language model (such as BERT) to narrow the gap between general pre-training and domain fine-tuning. In the post-pretraining stage, multi-task joint optimization and domain-specific task design improve the model's ability to understand and represent the complex semantic relations between science and technology services and technical requirements. The tasks are designed as follows:
       2.1 Masked language modeling (MLM) task:
       Some of the words in the input text are masked, and the model is required to predict the masked content from the context. The task is optimized with a cross-entropy loss function and further strengthens the language understanding capability of the model.
      Masked language modeling loss function:
       $$L_{MLM} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{t \in M_i} \log p(w_t \mid \mathrm{context}_i);$$
      $N$ is the number of samples;
       $M_i$ is the set of masked words in sample $i$;
       $w_t$ is the true word at masked position $t$;
       $p(w_t \mid \mathrm{context}_i)$ is the probability that the model predicts word $w_t$ given the context of sample $i$, computed by the softmax of the output layer of the pre-training language model (such as BERT);
       by minimizing the cross entropy between the predicted words and the true words, the masked language modeling loss improves the model's understanding of contextual semantics.
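The masking step and its cross-entropy loss can be sketched as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the 15% masking probability is borrowed from common BERT practice rather than from this document, and the loss is averaged over samples. A real system would obtain the probabilities from the model's softmax output; here they are passed in directly.

```python
import math
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Randomly replace tokens with MASK.

    Returns the masked sequence and a dict {position: true word}.
    ASSUMPTION: mask_prob=0.15 follows common BERT practice.
    """
    rng = rng or random.Random()
    masked, targets = [], {}
    for pos, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[pos] = tok
        else:
            masked.append(tok)
    return masked, targets


def mlm_loss(true_word_probs):
    """Cross entropy over masked positions, averaged over samples.

    true_word_probs[i] lists p(w_t | context_i) for each masked position t
    of sample i, i.e. the probability the model assigns to the true word.
    """
    n = len(true_word_probs)
    return -sum(math.log(p) for sample in true_word_probs for p in sample) / n
```

For example, `mlm_loss([[0.5]])` equals `log 2`, and the loss falls to zero as the model assigns probability 1 to every true word.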
      2.2 Domain-specific Technical Term Prediction (TTP) task:
       This task targets the technical terms and contextual semantic features of the science and technology service domain. Key technical terms (extracted with a domain dictionary or TF-IDF) are randomly masked in the input science and technology service text, and the model is required to predict them. A term relation prediction task is also designed, in which the model predicts the hypernym-hyponym relation or association between terms in the input text (for example, the hierarchical relation between "artificial intelligence" and "deep learning"). Both are optimized with a cross-entropy loss function, which increases the model's sensitivity to semantics specific to the science and technology domain.
      Term prediction task loss function:
       $$L_{TP} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{t \in T_i} \log p(w_t \mid \mathrm{context}_i);$$
      $N$ is the number of samples;
       $T_i$ is the set of masked key technical terms in sample $i$;
       $w_t$ is the true term at masked position $t$;
       $p(w_t \mid \mathrm{context}_i)$ is the probability that the model predicts term $w_t$ given the context of sample $i$, computed by the softmax of the output layer of the pre-training language model (e.g., BERT).
      Relation classification task loss function:
       $$L_{RC} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{r \in R_i} y_{i,r} \log \hat{y}_{i,r};$$
      $N$ is the number of samples;
       $R_i$ is the set of relation classes (e.g., hypernym-hyponym relation, association relation) of the term pair in sample $i$;
       $y_{i,r}$ is the true label of the term pair for relation class $r$ (0 or 1, one-hot encoded);
       $\hat{y}_{i,r}$ is the probability predicted by the model that the term pair belongs to relation class $r$, computed from the feature vector through a fully connected layer and softmax.
      Total loss function of the TTP task:
       $$L_{TTP} = \lambda_1 L_{TP} + \lambda_2 L_{RC};$$
       where $L_{TP}$ is the term prediction loss, $L_{RC}$ is the relation classification loss, and $\lambda_1$ and $\lambda_2$ are weight coefficients that balance the two subtasks and can be adjusted according to task importance or convergence speed.
      By jointly optimizing term prediction and relation prediction, this loss function strengthens the model's understanding of semantics specific to the technical domain. Term selection and relation annotation can draw on a domain knowledge graph or an expert dictionary to ensure the quality and domain suitability of the training data.
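The term-masking step of the TTP task might look like the sketch below. It is a simplification under assumptions: terms are treated as single tokens matched case-insensitively against a domain dictionary (multi-token terms and TF-IDF extraction are omitted), and the names `find_terms`, `mask_terms`, `ttp_loss`, `lambda_term` and `lambda_rel` are hypothetical.

```python
import random


def find_terms(tokens, domain_terms):
    """Positions of tokens that appear in the domain dictionary."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in domain_terms]


def mask_terms(tokens, domain_terms, n_mask=1, rng=None, mask="[MASK]"):
    """Randomly mask up to n_mask key technical terms.

    Returns the masked sequence and a dict {position: true term}.
    ASSUMPTION: each term is a single token.
    """
    rng = rng or random.Random()
    positions = find_terms(tokens, domain_terms)
    chosen = sorted(rng.sample(positions, min(n_mask, len(positions))))
    masked = list(tokens)
    targets = {}
    for pos in chosen:
        targets[pos] = tokens[pos]
        masked[pos] = mask
    return masked, targets


def ttp_loss(term_loss, relation_loss, lambda_term=1.0, lambda_rel=1.0):
    """Weighted sum of the term prediction and relation classification losses."""
    return lambda_term * term_loss + lambda_rel * relation_loss
```

Unlike random word masking, only dictionary terms are candidates for masking here, which is what focuses the model on domain vocabulary.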
      2.3 Regression task:
       The demand text and the service text are taken as input and their relevance score is predicted. The model is optimized with a mean squared error (MSE) loss function so that it can quantify the matching degree accurately.
      Regression loss function:
       $$L_{reg} = \frac{1}{N}\sum_{i=1}^{N}\left(s_i - \hat{s}_i\right)^2;$$
      $N$ is the number of samples;
       $s_i$ is the true relevance score of sample pair $i$ (generated by manual annotation or rules);
       $\hat{s}_i$ is the relevance score predicted by the model, usually obtained by mapping the feature vector encoded by the pre-training language model through a fully connected layer.
      By minimizing the squared difference between the predicted score and the true score, this loss function optimizes the model's ability to predict the continuous value of supply and demand matching relevance.
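As a small worked illustration of the regression task, the sketch below maps a feature vector to a score with a single linear layer and computes the mean squared error. The names are hypothetical, and squashing the output with a sigmoid to keep it in the range 0 to 1 is an assumption; the text only states that a fully connected layer is used.

```python
import math


def predict_score(features, weights, bias):
    """Single fully connected unit followed by a sigmoid.

    ASSUMPTION: the sigmoid is added here so the score stays in (0, 1).
    """
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mse_loss(true_scores, predicted_scores):
    """Mean squared error between true and predicted relevance scores."""
    if not true_scores or len(true_scores) != len(predicted_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    return sum((s - p) ** 2
               for s, p in zip(true_scores, predicted_scores)) / len(true_scores)
```

For true scores `[1.0, 0.0]` and predictions `[0.5, 0.5]`, each squared error is 0.25, so the loss is 0.25.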
      2.4 Ranking task:
       Different services under the same requirement are ranked by relevance so that services highly relevant to the requirement are placed first. The task is optimized with a ranking loss function and improves the ranking capability of the model in recommendation scenarios.
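      Ranking loss function, which may for example be written in the pairwise hinge form:
       $$L_{rank} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} \mathbb{I}(s_i > s_j)\,\max\left(0,\; 1 - (\hat{s}_i - \hat{s}_j)\right);$$
      $N$ is the number of samples;
       $\hat{s}_i$ and $\hat{s}_j$ are the model-predicted scores of samples $i$ and $j$, respectively;
       $s_i$ and $s_j$ are the true relevance scores of samples $i$ and $j$, respectively;
       $\mathbb{I}(s_i > s_j)$ is an indicator function whose value is 1 when $s_i > s_j$ and 0 otherwise.
      This loss function optimizes the ranking result and ensures that services with higher relevance obtain higher predicted scores.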
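One way to realize such a ranking objective is a pairwise loss over all service pairs of the same requirement. The sketch below uses a hinge form with a margin of 1 and normalizes by the number of samples; both choices are assumptions made for illustration, since the text only specifies an indicator over pairs whose true scores are ordered. The function name is hypothetical.

```python
def pairwise_rank_loss(true_scores, predicted_scores, margin=1.0):
    """Pairwise hinge ranking loss.

    For every ordered pair (i, j) with true_scores[i] > true_scores[j],
    penalize the model unless predicted_scores[i] exceeds
    predicted_scores[j] by at least `margin`.
    ASSUMPTION: hinge form, margin=1, normalization by the sample count.
    """
    n = len(true_scores)
    if n == 0 or n != len(predicted_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    total = 0.0
    for i in range(n):
        for j in range(n):
            if true_scores[i] > true_scores[j]:  # indicator function
                gap = predicted_scores[i] - predicted_scores[j]
                total += max(0.0, margin - gap)
    return total / n
```

The loss is zero when every more relevant service is scored at least one margin above every less relevant one, and it grows as predicted orderings are inverted.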
      The loss functions of the above tasks are jointly optimized by weighted summation to form a multi-task learning framework. Compared with the general-purpose task design of conventional pre-trained models, the TTP task of this embodiment focuses on technical-domain terms and their relations, which strengthens the model's ability to represent the deep semantics of technical content. The result is a preliminarily optimized model (the first optimization model).
      Total loss function:
       $$L_{total} = \alpha L_{MLM} + \beta L_{TTP} + \gamma L_{reg} + \delta L_{rank};$$
      where $\alpha$, $\beta$, $\gamma$ and $\delta$ are hyperparameters that balance the task weights and are tuned experimentally, $L_{MLM}$ is the masked language modeling loss, $L_{TTP}$ is the total loss of the technical term prediction task (term prediction and relation classification), $L_{reg}$ is the regression loss, and $L_{rank}$ is the ranking loss.
      3. Adaptive multi-task weight optimization mechanism:
       In multi-task training, the loss weights ($\alpha$, $\beta$, $\gamma$, $\delta$) normally depend on manual tuning, which leads to low training efficiency or interference between tasks. To solve this problem, this embodiment provides an adaptive weight adjustment mechanism based on task convergence speed and difficulty.
      The weights are adjusted dynamically as a function of training progress (epoch or step). Let the total number of training rounds be $T$ (in epochs) and the current training round be $t$, and define the progress ratio $p = t / T$, whose range is $[0, 1]$. A transition factor is then defined as:
       $$f(p) = \frac{1}{1 + e^{-k(p - p_0)}};$$
      where:
       $k$ controls the steepness of the function (i.e., the transition speed) and usually takes a value of 10 to 20;
       $p_0$ is the midpoint of the function, representing the middle of the transition, and is usually 0.5;
       $e$ is the natural constant (approximately 2.71828), used as the base of the exponential in the transition factor so that the transition is smooth. The purpose of the transition factor is to control the change of the task weights smoothly and to remove the need for manual weight tuning;
       the weights of the masked language modeling task and the term prediction task decrease gradually as training progresses and use the factor $1 - f(p)$;
       the weights of the regression task and the ranking task increase gradually as training progresses and use the factor $f(p)$.
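      During model training, the dynamic weights of the task loss functions are as follows, where $f(p)$ denotes the transition factor at progress ratio $p$:
       $$\alpha_t = \alpha\,(1 - f(p));$$
       $$\beta_t = \beta\,(1 - f(p));$$
       $$\gamma_t = \gamma\,f(p);$$
       $$\delta_t = \delta\,f(p);$$
       $$\Omega = \alpha_t + \beta_t + \gamma_t + \delta_t;$$
       $$w_{MLM} = \alpha_t / \Omega;$$
       $$w_{TTP} = \beta_t / \Omega;$$
       $$w_{reg} = \gamma_t / \Omega;$$
       $$w_{rank} = \delta_t / \Omega;$$
       where $\alpha$, $\beta$, $\gamma$ and $\delta$ are hyperparameters, and $\Omega$ is the sum of the adjusted weights, i.e., the total weight used for normalization.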
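The weight schedule can be sketched as follows, assuming a logistic (sigmoid) transition factor, which is what the steepness k, the midpoint 0.5 and the base e described above suggest. The function names are hypothetical, and normalizing the four weights by their sum is an assumption about how the total weight is used.

```python
import math


def transition_factor(epoch, total_epochs, k=10.0, midpoint=0.5):
    """Logistic transition factor of the progress ratio p = epoch / total_epochs."""
    p = epoch / total_epochs
    return 1.0 / (1.0 + math.exp(-k * (p - midpoint)))


def task_weights(epoch, total_epochs, alpha=1.0, beta=1.0, gamma=1.0,
                 delta=1.0, k=10.0, midpoint=0.5):
    """Dynamic task weights: MLM and TTP decay, regression and ranking grow.

    ASSUMPTION: weights are normalized by their sum so they add up to 1.
    """
    f = transition_factor(epoch, total_epochs, k, midpoint)
    raw = {
        "mlm": alpha * (1.0 - f),
        "ttp": beta * (1.0 - f),
        "reg": gamma * f,
        "rank": delta * f,
    }
    omega = sum(raw.values())  # total weight
    return {name: w / omega for name, w in raw.items()}
```

With equal hyperparameters, the language-modeling tasks dominate early in training, all four weights are 0.25 at the midpoint, and the regression and ranking tasks dominate near the end, without any manual re-tuning between epochs.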
      4. Knowledge-enhanced contrastive learning to improve model representation capability:
       On the basis of the first optimization model, this embodiment further provides a Knowledge-Enhanced Contrastive Learning (KECL) method. By integrating the domain knowledge graph with the contrastive learning framework, it improves the model's ability to discriminate semantic relations and to make fine-grained distinctions in matching. The method is as follows:
       4.1 Construction of knowledge-graph-assisted sample pairs:
      KECL introduces the knowledge graph of the science and technology service domain as an auxiliary tool to guide the construction of sample pairs. The knowledge graph contains structured information such as the classification of technical fields, the hierarchical and association relations among terms, and expert resources. Based on this information, KECL ties the construction of sample pairs closely to domain knowledge:
       Positive sample pair: a demand-service pair that belongs to the same technical sub-domain or has a strong semantic association is defined as a positive sample. For example, "artificial intelligence algorithm development requirement" and "machine learning model optimization service", which lie in the same technical sub-domain, are treated as a positive sample because they share similar technical classifications or term paths in the knowledge graph.
      Negative sample pair: a demand-service pair that spans different technical domains or is semantically unrelated is defined as a negative sample. For example, "biomedical technology requirement" and "cloud computing service" are treated as a negative sample because they belong to different technology branches in the knowledge graph.
      With the guidance of the knowledge graph, the construction of sample pairs is no longer limited to surface features of the text but takes into account the semantic hierarchy and entity relations of the domain, so that the model can learn more domain-specific representations.
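The labelling rule for sample pairs can be sketched as below. The sketch assumes each demand and service has already been linked to a (domain, sub-domain) position in the knowledge graph; that linking, the taxonomy values and the function names are hypothetical. Pairs in the same domain but different sub-domains are left unlabelled here, which is a design choice of this sketch rather than something the text prescribes.

```python
def label_pair(demand_path, service_path):
    """Label a demand-service pair from (domain, sub_domain) tuples.

    Returns 1 for a positive pair (same sub-domain), 0 for a negative pair
    (different domains), and None when the pair is ambiguous.
    """
    if demand_path == service_path:
        return 1
    if demand_path[0] != service_path[0]:
        return 0
    return None  # same domain, different sub-domain: not used in this sketch


def build_pairs(demands, services):
    """Build positive and negative pair lists from name -> path mappings."""
    positives, negatives = [], []
    for d_name, d_path in demands.items():
        for s_name, s_path in services.items():
            label = label_pair(d_path, s_path)
            if label == 1:
                positives.append((d_name, s_name))
            elif label == 0:
                negatives.append((d_name, s_name))
    return positives, negatives
```

Applied to the examples above, an artificial intelligence algorithm development requirement paired with a machine learning model optimization service is positive, and a biomedical technology requirement paired with a cloud computing service is negative.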
      4.2 Knowledge-weighted contrastive loss function optimization:
       KECL uses the path distance between entities in the knowledge graph as the measure of sample difficulty. The path distance reflects the degree of semantic association between two entities in the knowledge graph: entities separated by a shorter path have a stronger semantic association and are easier to judge, while entities separated by a longer path have a weaker semantic association and are "hard samples". On this basis, KECL uses a weighted InfoNCE loss function that gives higher weight to hard samples (i.e., positive sample pairs with longer path distances or negative sample pairs with shorter path distances), so that the representations of these samples are optimized first.
      The direct effect of the weighting mechanism is that the model pulls positive sample pairs closer together in the semantic space and pushes negative sample pairs further apart, which improves its ability to distinguish fine-grained semantic relations.
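       $$w_{ij} = \frac{d_{ij}}{\sum_{(k,l)} d_{kl}};$$
       $$L_{KECL} = -\sum_{(i,j) \in P} w_{ij} \log \frac{\exp\left(\mathrm{sim}(z_i, z_j^{+})/\tau\right)}{\sum_{k} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)};$$
      where $w_{ij}$ is the weight between sample $i$ and sample $j$, $d_{ij}$ is the path distance between sample $i$ and sample $j$, $\sum_{(k,l)} d_{kl}$ is the sum of the distances between all sample pairs, $z_i$ is the feature representation of sample $i$, $z_j^{+}$ is the feature representation of positive sample $j$, $\tau$ is the temperature that controls the sensitivity of the loss function, $\mathrm{sim}(\cdot,\cdot)$ is the sample similarity, $P$ is the set of positive sample pairs, the sum in the denominator runs over the candidate samples $k$ compared with sample $i$, and $L_{KECL}$ is the knowledge-enhanced contrastive loss function.
      After this optimization, a further optimized model, namely the second optimization model, is obtained.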
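The two ingredients of this step, path distance in the knowledge graph and a distance-weighted InfoNCE loss, can be sketched as follows. This is a sketch under assumptions, not the embodiment's implementation: the graph is an undirected adjacency dict, the other positives in the batch serve as negatives (in-batch negatives), similarity is cosine, and each positive pair's weight is its path distance divided by the sum of distances. All names are hypothetical.

```python
import math
from collections import deque


def path_distance(graph, start, goal):
    """Shortest path length (in edges) via breadth-first search; None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def weighted_info_nce(anchors, positives, distances, tau=0.1):
    """Distance-weighted InfoNCE over aligned (anchor, positive) pairs.

    distances[i] is the knowledge-graph path distance of pair i; pairs with
    longer paths (harder positives) receive a larger weight.
    ASSUMPTION: positives of other pairs in the batch act as negatives.
    """
    total_distance = sum(distances)
    loss = 0.0
    for i, z in enumerate(anchors):
        logits = [cosine(z, c) / tau for c in positives]
        log_denominator = math.log(sum(math.exp(l) for l in logits))
        weight = distances[i] / total_distance
        loss += -weight * (logits[i] - log_denominator)
    return loss
```

The loss is small when each anchor is most similar to its own positive and large when it is more similar to another pair's positive; the weights shift that penalty toward pairs that are far apart in the knowledge graph.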
      5. Model fine-tuning based on real user history data:
       To make the model fit actual application scenarios more closely, this embodiment fine-tunes the second optimization model with the historical interaction data of real users. Behavior data such as real clicks, consultations and completed transactions on the platform are collected and combined with user feedback, and the model parameters are further optimized by supervised fine-tuning. Fine-tuning at this stage improves the model's adaptation to the real needs of users and reduces the cold-start and generalization risks of the model in practical application. The result is the final re-ranking model.
      6. Two-stage retrieval and re-ranking mechanism for science and technology service supply and demand matching:
       In practical application, this embodiment adopts a two-stage retrieval and re-ranking strategy that balances matching efficiency and accuracy:
       First stage, efficient semantic vector retrieval: the second optimization model is used as the semantic vector retrieval model. The technical requirement text entered by the user and the service texts in the science and technology service library are vectorized, and candidate science and technology services are screened quickly by efficient vector similarity computation (such as cosine similarity or approximate nearest neighbor (ANN) search), which provides coarse ranking over a large-scale service library.
      Second stage, fine-grained re-ranking: the re-ranking model scores the candidate services on multiple relevance dimensions (including semantic relevance, user preference matching degree, domain suitability and the like), and the final recommendation result is output using a weighted ranking algorithm. This mechanism improves the retrieval efficiency and recommendation accuracy of the system and meets the practical requirements of large-scale science and technology service supply and demand scenarios.
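The two-stage flow can be sketched as below. This is a toy illustration with hypothetical names: vectors are plain lists, the first stage is an exhaustive cosine-similarity scan (a production system would use an ANN index), and the per-dimension scores that the re-ranking model would produce are passed in as a precomputed dictionary. The dimension names and weights are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve(query_vec, service_vecs, top_k):
    """Stage 1: coarse ranking by vector similarity; returns the top_k service names."""
    ranked = sorted(service_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]


def rerank(candidates, dimension_scores, weights):
    """Stage 2: weighted ranking over multidimensional relevance scores.

    dimension_scores[name][dim] stands in for the re-ranking model's output.
    """
    def total(name):
        return sum(weights[dim] * dimension_scores[name][dim] for dim in weights)
    return sorted(candidates, key=total, reverse=True)
```

A usage example: with `weights = {"semantic": 0.5, "preference": 0.3, "domain": 0.2}`, a candidate that is slightly less similar in the first stage can still rank first after re-ranking if its user preference and domain suitability scores are higher.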
      In summary, this embodiment raises the level of intelligence of supply and demand matching on science and technology service platforms by constructing a high-quality domain-specific data set, designing the Technical Term Prediction (TTP) task and multi-task transitional training, introducing an adaptive multi-task weight optimization mechanism, providing the Knowledge-Enhanced Contrastive Learning (KECL) method, and combining these with a two-stage retrieval and re-ranking strategy. The technical scheme improves matching accuracy and efficiency and has advantages in domain adaptability, technical innovation and training efficiency. It can be widely applied to intelligent recommendation and resource allocation scenarios of various science and technology service platforms, and has practical application value and prospects for wider adoption.
      As a possible implementation manner, this embodiment provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the deep-learning-based intelligent matching method for science and technology service supply and demand.
      As a possible implementation manner, this embodiment provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the deep-learning-based intelligent matching method for science and technology service supply and demand.
      The foregoing embodiments further describe the objects, technical solutions and beneficial effects of the invention in detail. They are provided to illustrate the general principles of the invention and are not intended to limit its scope of protection to the particular embodiments. Any modifications, equivalent replacements, improvements and the like made within the spirit and principles of the invention are intended to be included within the scope of protection of the invention.