
CN120561315A - Scientific and technological service supply and demand intelligent matching method, equipment and medium based on deep learning - Google Patents

Info

Publication number
CN120561315A
Authority
CN
China
Prior art keywords
model
demand
task
text
technology
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202511052777.7A
Other languages
Chinese (zh)
Inventor
裴贵军
宋立锵
戢翔
赵永义
杜良辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Comservice Enrising Information Technology Co Ltd
Original Assignee
China Comservice Enrising Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Comservice Enrising Information Technology Co Ltd
Priority to CN202511052777.7A
Publication of CN120561315A
Legal status: Pending

Abstract

The present invention discloses a deep-learning-based method, device and medium for intelligent matching of science and technology service supply and demand, and relates to the field of deep learning. A multi-dimensional data set covering the science and technology service field is constructed; a pre-trained language model is post-pre-trained with multi-task learning to obtain a first optimization model; knowledge-graph auxiliary sample pairs are constructed from a knowledge graph; contrastive learning is performed on the first optimization model using those sample pairs to obtain a second optimization model; historical interaction data of real users is collected to optimize the second optimization model into a re-ranking model; the candidate services are then scored for multi-dimensional relevance, and the final recommendation result is obtained by weighted ranking. By constructing a domain-specific data set, designing multi-task transition training and domain-specific tasks, introducing an adaptive multi-task weight optimization mechanism, proposing a knowledge-enhanced contrastive learning method, and combining a two-stage retrieval and re-ranking strategy, the invention achieves efficient, intelligent and accurate matching of science and technology service supply and demand data.

Description

Scientific and technological service supply and demand intelligent matching method, equipment and medium based on deep learning
Technical Field
The invention relates to the technical field of deep learning, in particular to a scientific and technological service supply and demand intelligent matching method, equipment and medium based on deep learning.
Background
With the rapid development of the science and technology service industry, the information asymmetry between technology demanders and service providers has become increasingly prominent. Traditional science and technology service matching relies mainly on manual retrieval or simple keyword matching, which suffers from low matching precision and low efficiency and cannot fully mine the deep semantic association between demands and services. In recent years, progress in deep learning has opened new possibilities for intelligent matching; in particular, the wide application of pre-trained language models in natural language processing has markedly improved semantic understanding and text matching. However, existing matching methods based on pre-trained language models are designed for the general domain and are not adapted to the specific semantics, industry terminology and complex demand scenarios of the science and technology service field, so model performance is limited on supply and demand matching tasks that are highly specialized and context-dependent.
In addition, multi-task joint training in the prior art often uses a static weight allocation strategy, which makes it difficult to balance the convergence speeds and optimization objectives of different tasks and leads to low training efficiency or poor model performance. Existing matching systems also generally struggle to achieve both retrieval efficiency and matching precision on large-scale data; in application scenarios with strict real-time requirements, system response speed becomes a key bottleneck for user experience. Furthermore, knowledge in the science and technology service field is updated quickly, and existing models lack dynamic adaptation and knowledge enhancement mechanisms, so they cope poorly with diverse cross-industry and cross-scenario demands.
In summary, the prior art has significant shortcomings in domain adaptability, training efficiency, matching precision and system scalability for science and technology service supply and demand matching. An efficient, intelligent matching method designed around the characteristics of the science and technology service field is needed to solve these problems and improve the service capability of the platform.
Disclosure of Invention
The technical problem to be solved by the invention is that the prior art has significant shortcomings in domain adaptability, training efficiency, matching precision and system scalability for science and technology service supply and demand matching. The invention aims to provide a deep-learning-based intelligent matching method, device and medium for science and technology service supply and demand. By combining a domain-specific data set, multi-task joint training, adaptive weight optimization, knowledge-enhanced contrastive learning and a two-stage retrieval and re-ranking strategy, the invention connects technical demands with services accurately and efficiently, and improves matching precision, system efficiency and user experience.
The invention is realized by the following technical scheme:
A first aspect of the invention provides a deep-learning-based intelligent matching method for science and technology service supply and demand, comprising the following steps:
acquiring the user demand and science and technology service interaction data historically accumulated by a science and technology service platform, and constructing a science and technology service supply and demand matching data set;
constructing a pre-trained language model, and post-pre-training the pre-trained language model based on multi-task learning to obtain a first optimization model;
using the path distance between entities in a knowledge graph as the measure of sample difficulty, and classifying the knowledge-graph auxiliary sample pairs;
performing contrastive learning on the first optimization model based on the knowledge-graph auxiliary sample pairs to obtain a second optimization model;
collecting historical interaction data of real users to optimize the second optimization model and obtain a re-ranking model;
performing multi-dimensional relevance scoring on the candidate services based on the re-ranking model, and obtaining the final recommendation result by weighted ranking.
Further, acquiring the user demand and science and technology service interaction data historically accumulated by the science and technology service platform and constructing the supply and demand matching data set specifically includes:
obtaining the historically accumulated user demand and science and technology service interaction data, and extracting the technical demand text, the science and technology service text and the relevance score as the supply and demand matching data set, wherein:
the technical demand text is the description of a technical service demand submitted by a user;
the science and technology service text is the description of a science and technology service offered on the platform;
the relevance score is generated from the user's historical interaction behavior by combining manual annotation with the prediction of a scoring model.
Further, in the multi-task learning, the tasks specifically comprise a masking task, a term prediction task, a regression task and a ranking task.
Further, post-pre-training the pre-trained language model based on multi-task learning specifically comprises:
for the masking task, randomly masking words in the input text of the pre-trained language model, forming a semantic understanding of the text from the remaining visible words, predicting the masked words from that understanding, comparing the predicted words with the words actually masked, and optimizing the model parameters with a cross-entropy loss function;
for the term prediction task, extracting key technical terms from the input text using a domain dictionary, randomly selecting some of the extracted terms and replacing them with mask tokens in the text, predicting the replaced terms from the context, computing the difference between the predicted and actual terms with a cross-entropy loss function, and optimizing the model parameters according to that difference;
for the regression task, obtaining pairs of demand text and service text from the input text, labeling each pair with a relevance score, encoding each pair with the pre-trained language model, extracting the semantic feature vector of the encoded pair, predicting the relevance score from the extracted feature vector, constructing a loss function from the predicted relevance score, and optimizing the model parameters;
for the ranking task, constructing a loss function from the relevance scores of the demand text and service text pairs to optimize the ranking capability of the model.
Further, training the pre-trained language model also includes dynamically adjusting the task weights during multi-task learning, and the dynamic adjustment comprises the following steps:
setting the total number of training rounds and obtaining the current training round;
obtaining a progress ratio from the total number of training rounds and the current training round;
obtaining a transition factor from the progress ratio;
setting the weights of the masking task, the term prediction task, the regression task and the ranking task according to the transition factor.
Further, acquiring the knowledge graph of the science and technology service field and constructing the knowledge-graph auxiliary sample pairs specifically includes:
generating a knowledge graph of the science and technology service field from the supply and demand matching data set;
in that knowledge graph, defining demand-service pairs that belong to the same technical sub-domain or have a strong semantic association as positive samples, and defining demand-service pairs that span technical domains or are semantically unrelated as negative samples.
Further, performing contrastive learning on the first optimization model based on the knowledge-graph auxiliary sample pairs to obtain the second optimization model specifically includes:
acquiring the nodes of the knowledge graph of the science and technology service field, and extracting the path distances between nodes;
extracting the nodes whose path distance is higher than a set threshold, and defining them as hard samples;
constructing a contrastive learning loss function, weighting the hard samples, and optimizing the sample representations to obtain the second optimization model.
Further, collecting historical interaction data of real users to optimize the second optimization model and obtain the re-ranking model specifically includes:
obtaining historical interaction data of real users;
training the second optimization model on that historical interaction data to obtain the re-ranking model.
The second aspect of the invention provides an electronic device, comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes a scientific and technological service supply and demand intelligent matching method based on deep learning when executing the program.
A third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a deep learning based intelligent matching method for scientific and technological service supply and demand.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Improved domain adaptability. A domain-specific data set construction method and pre-training tasks are designed for the specialized and complex science and technology service field. Semantic enhancement and knowledge graph integration improve the model's understanding of industry terminology, technical background and demand context, overcoming the limitations of general pre-trained models in specialized domains.
2. Optimized training efficiency and performance. An adaptive multi-task weight optimization mechanism dynamically adjusts the weights of tasks such as regression and ranking, reducing interference between tasks, improving training efficiency and overall model performance, and keeping the system stable and reliable on complex matching tasks.
3. Enhanced semantic matching precision. A knowledge-enhanced contrastive learning method builds high-quality sample pairs from a domain knowledge graph and sharpens the model's discrimination of fine-grained semantic relations between technical demands and services, yielding more accurate supply and demand matching.
4. Two-stage retrieval and re-ranking. Candidate services are first screened by efficient semantic retrieval and then scored for multi-dimensional relevance by a refined re-ranking model. This preserves matching precision while improving system response speed, meeting real-time requirements on large-scale data.
5. Cross-scenario generality. A modular, extensible technical framework allows task design and optimization strategies to be adjusted for different industries and application scenarios, while dynamic updates of user interaction data and the knowledge graph keep the model adaptable and continuously improving across domains and scenarios.
Drawings
In order to more clearly illustrate the technical solutions of the exemplary embodiments of the present invention, the drawings that are needed in the examples will be briefly described below, it being understood that the following drawings only illustrate some examples of the present invention and therefore should not be considered as limiting the scope, and that other related drawings may be obtained from these drawings without inventive effort for a person skilled in the art. In the drawings:
Fig. 1 is a flow chart of a matching method in an embodiment of the invention.
Detailed Description
For the purpose of making apparent the objects, technical solutions and advantages of the present invention, the present invention will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present invention and the descriptions thereof are for illustrating the present invention only and are not to be construed as limiting the present invention.
As a possible implementation, as shown in FIG. 1, this embodiment provides a deep-learning-based intelligent matching method for science and technology service supply and demand, comprising the following steps: acquiring the user demand and science and technology service interaction data historically accumulated by a science and technology service platform, and constructing a supply and demand matching data set; constructing a pre-trained language model and post-pre-training it based on multi-task learning to obtain a first optimization model; using the path distance between entities in a knowledge graph as the measure of sample difficulty and classifying the knowledge-graph auxiliary sample pairs; performing contrastive learning on the first optimization model based on the knowledge-graph auxiliary sample pairs to obtain a second optimization model; collecting historical interaction data of real users to optimize the second optimization model and obtain a re-ranking model; and performing multi-dimensional relevance scoring on the candidate services based on the re-ranking model and obtaining the final recommendation result by weighted ranking.
In this embodiment, a multi-dimensional data set covering the science and technology service field is constructed, and technical terms, industry background and semantic relations are enhanced with a domain knowledge graph. A domain-specific Technical Term Prediction (TTP) task is designed to improve the pre-trained language model's understanding of the complex semantics of the technical field. An adaptive multi-task weight optimization mechanism is introduced: through dynamic weighted averaging and a reinforcement-learning-assisted optimization strategy, the weights of tasks such as regression and ranking are adjusted in real time according to task convergence speed and difficulty, which reduces inter-task interference and improves training efficiency and model performance. High-quality positive and negative sample pairs are constructed from the knowledge graph of the science and technology service field, and a weighted InfoNCE loss function gives priority to hard samples, strengthening the model's discrimination of fine-grained semantic relations between technical demands and services. A two-stage retrieval and re-ranking strategy is used: the first stage retrieves and screens candidate services efficiently by semantic vectors, the second stage applies a refined re-ranking model for multi-dimensional relevance scoring, and a weighted ranking algorithm outputs the final matching result. This strategy preserves matching precision while improving system efficiency. Dynamic updates of the user historical interaction data and the domain knowledge graph keep the model general.
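The two-stage strategy described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that stage one ranks by cosine similarity between precomputed embedding vectors and that stage two is a weighted sum of per-dimension relevance scores. The function names and the dimension names (`semantic`, `history`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, service_vecs, top_k):
    """Stage 1 (assumed form): keep the top_k services by cosine similarity."""
    ranked = sorted(
        service_vecs,
        key=lambda sid: cosine(query_vec, service_vecs[sid]),
        reverse=True,
    )
    return ranked[:top_k]


def rerank(candidates, dimension_scores, weights):
    """Stage 2 (assumed form): order candidates by a weighted sum of
    per-dimension relevance scores produced by the re-ranking model."""
    def final_score(sid):
        return sum(weights[d] * dimension_scores[sid][d] for d in weights)
    return sorted(candidates, key=final_score, reverse=True)


# Hypothetical embeddings and re-ranking scores for three services.
service_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
candidates = retrieve([1.0, 0.0], service_vecs, top_k=2)
scores = {
    "a": {"semantic": 0.6, "history": 0.1},
    "c": {"semantic": 0.5, "history": 0.9},
}
result = rerank(candidates, scores, {"semantic": 0.5, "history": 0.5})
```

Only the small candidate set returned by `retrieve` reaches the more expensive scoring step, which is how the strategy trades off response speed against matching precision.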
By matching technical demands with science and technology services accurately, the method reduces user search cost and improves user satisfaction; through efficient retrieval and intelligent recommendation, it also optimizes the resource allocation efficiency of the science and technology service platform and strengthens the platform's service capability and market competitiveness. Compared with the prior art, the method has advantages in domain adaptability, semantic understanding, training efficiency, matching precision, system performance and application scalability.
The specific implementation steps comprise:
1. Construction of the science and technology service supply and demand matching data set:
This embodiment constructs a high-quality, domain-specific training data set from the user demand and science and technology service interaction data accumulated by the science and technology service platform. The basic unit of the data set is the triple (technical demand text, science and technology service text, relevance score). The technical demand text is the description of a technical service demand submitted by a user; the science and technology service text is the description of a service offered on the platform; the relevance score is generated from the user's historical interaction behavior (such as clicks, consultations and completed transactions) by combining manual annotation with the prediction of a scoring model, and quantifies the degree of matching as a value in the range 0 to 1. To improve data quality, the embodiment uses a domain knowledge graph to semantically enhance the data, annotating the key entities in the demand and service texts (such as technical fields, application scenarios and core technical points) and the relations between them, which ensures the accuracy and representativeness of the data at the semantic level. Data cleaning, denoising and diversity sampling are then used to build training data covering multiple industries and scenarios as the basis for subsequent model training.
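The basic data unit can be sketched as a small record type. The text does not say how the manual annotation and the scoring-model prediction are combined, so the linear blend below (and its `manual_weight` parameter) is purely an assumption for illustration; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchRecord:
    """One (demand text, service text, relevance score) unit of the data set."""
    demand_text: str
    service_text: str
    relevance: float  # matching degree, expected to lie in [0, 1]

    def __post_init__(self):
        if not 0.0 <= self.relevance <= 1.0:
            raise ValueError("relevance must lie in [0, 1]")


def blend_relevance(manual, predicted, manual_weight=0.5):
    """Assumed combination rule: a linear blend of the manual label and the
    scoring-model prediction, clipped to [0, 1]."""
    score = manual_weight * manual + (1.0 - manual_weight) * predicted
    return min(1.0, max(0.0, score))


record = MatchRecord(
    demand_text="Need an optimized machine learning model for defect detection",
    service_text="Machine learning model optimization service",
    relevance=blend_relevance(manual=1.0, predicted=0.5),
)
```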
2. Post-pre-training of the pre-trained language model based on multi-task learning: multi-task joint training and domain-specific task design:
Post-pre-training is performed on top of a pre-trained language model (such as BERT) to narrow the gap between general pre-training and domain fine-tuning. In this stage, multi-task joint optimization and domain-specific task design improve the model's understanding and representation of the complex semantic relations between science and technology services and technical demands. The tasks are designed as follows:
2.1 Masking task (masked language modeling, MLM):
Part of the words in the input text are masked, and the model is required to predict the masked content from the context. The task is optimized with a cross-entropy loss function to further strengthen the model's language understanding.
Masked language modeling loss function:
$$\mathcal{L}_{\mathrm{MLM}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{t \in M_i} \log p(w_t \mid \mathrm{context}_i)$$
where:
$N$ is the number of samples;
$M_i$ is the set of masked words in sample $i$;
$w_t$ is the true word at masked position $t$;
$p(w_t \mid \mathrm{context}_i)$ is the probability that the model predicts word $w_t$ given the context of sample $i$, computed by a softmax over the output layer of the pre-trained language model (such as BERT).
The masked language modeling loss improves the model's understanding of contextual semantics by minimizing the difference between the predicted word distribution and the true words.
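A minimal sketch of this cross-entropy computation follows. It assumes the model's output logits for each masked position are already available as plain lists, so no particular model library is involved; the function names and the batch layout are illustrative choices.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_lm_loss(batch):
    """Cross-entropy over masked positions, averaged over the N samples.

    batch: list of samples; each sample is a list of (logits, true_index)
    pairs, one pair per masked position in that sample.
    """
    n_samples = len(batch)
    total = 0.0
    for sample in batch:
        for logits, true_index in sample:
            total -= math.log(softmax(logits)[true_index])
    return total / n_samples


# One sample, one masked position, uniform prediction over a 4-word vocabulary.
uniform_loss = masked_lm_loss([[([0.0, 0.0, 0.0, 0.0], 2)]])
```

With a uniform prediction over four words the loss equals log 4; it falls toward zero as the model puts more probability on the true word.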
2.2 Domain-specific Technical Term Prediction task (TTP):
This task targets the technical terminology and contextual semantics of the science and technology service field. Key technical terms in the input science and technology service text (extracted with a domain dictionary or TF-IDF) are randomly masked, and the model is required to predict them. A term relation prediction task is designed alongside it, which predicts the hypernym-hyponym relation or the relatedness between terms in the input text (for example, the hierarchical relation between "artificial intelligence" and "deep learning"). Both are optimized with cross-entropy loss functions to make the model more sensitive to the specific semantics of the technical field.
Term prediction task loss function:
$$\mathcal{L}_{\mathrm{TP}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{t \in T_i} \log p(w_t \mid \mathrm{context}_i)$$
where:
$N$ is the number of samples;
$T_i$ is the set of masked key technical terms in sample $i$;
$w_t$ is the true term at masked position $t$;
$p(w_t \mid \mathrm{context}_i)$ is the probability that the model predicts term $w_t$ given the context of sample $i$, computed by a softmax over the output layer of the pre-trained language model (such as BERT).
Relation classification task loss function:
$$\mathcal{L}_{\mathrm{RC}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{r \in R_i} y_{i,r} \log \hat{y}_{i,r}$$
where:
$N$ is the number of samples;
$R_i$ is the set of relation classes (for example, hypernym-hyponym relation, association relation) of the term pairs in sample $i$;
$y_{i,r}$ is the true label of the term pair on relation class $r$ (0 or 1, one-hot encoded);
$\hat{y}_{i,r}$ is the probability predicted by the model that the term pair belongs to relation class $r$, computed from the feature vector through a fully connected layer and a softmax.
Total loss function of the TTP task:
$$\mathcal{L}_{\mathrm{TTP}} = \lambda_1 \mathcal{L}_{\mathrm{TP}} + \lambda_2 \mathcal{L}_{\mathrm{RC}}$$
where $\lambda_1$ and $\lambda_2$ are weight coefficients that balance the two subtasks and can be adjusted according to task importance or convergence speed.
By jointly optimizing term prediction and relation prediction, this loss function deepens the model's understanding of the specific semantics of the technical field. Term selection and relation annotation can draw on a domain knowledge graph or an expert dictionary to ensure the quality and domain suitability of the training data.
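The term-masking step of the TTP task can be sketched as below. This is a simplified illustration under stated assumptions: the text is already tokenized, each domain term is a single token, and the `[MASK]` token, `mask_prob` and `seed` parameters are hypothetical choices rather than values given in the text.

```python
import random


def mask_terms(tokens, domain_terms, mask_token="[MASK]", mask_prob=0.5, seed=0):
    """Randomly replace tokens found in the domain dictionary with mask_token.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original term the model must predict.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if token in domain_terms and rng.random() < mask_prob:
            masked.append(mask_token)
            targets[position] = token
        else:
            masked.append(token)
    return masked, targets


tokens = ["optimize", "a", "transformer", "for", "radar", "imaging"]
domain_terms = {"transformer", "radar"}  # hypothetical domain dictionary
masked, targets = mask_terms(tokens, domain_terms, mask_prob=1.0)
```

Unlike generic masked language modeling, only dictionary terms are eligible for masking, so every prediction target is a piece of domain vocabulary.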
2.3 Regression task:
The demand text and the service text are taken as input, and the model predicts their relevance score. The model is optimized with a mean squared error (MSE) loss function so that it can quantify the degree of matching accurately.
Regression loss function:
$$\mathcal{L}_{\mathrm{reg}} = \frac{1}{N}\sum_{i=1}^{N}\left(s_i - \hat{s}_i\right)^2$$
where:
$N$ is the number of samples;
$s_i$ is the true relevance score of sample pair $i$ (generated by manual labeling or rules);
$\hat{s}_i$ is the relevance score predicted by the model, usually obtained by mapping the feature vector encoded by the pre-trained language model through a fully connected layer.
This loss function optimizes the model's continuous-value prediction of supply and demand relevance by minimizing the squared difference between the predicted score and the true score.
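As a small worked example of the mean squared error described above, the following sketch computes the loss for plain lists of true and predicted scores; the function name is illustrative.

```python
def mse_loss(true_scores, predicted_scores):
    """Mean squared error between true and predicted relevance scores."""
    if len(true_scores) != len(predicted_scores):
        raise ValueError("score lists must have the same length")
    n = len(true_scores)
    return sum((s - p) ** 2 for s, p in zip(true_scores, predicted_scores)) / n


# Two demand-service pairs with true scores 1.0 and 0.0, both predicted as 0.5.
loss = mse_loss([1.0, 0.0], [0.5, 0.5])
```

Each pair contributes a squared error of 0.25, so the average is 0.25.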
2.4 Ranking task:
Different services are ranked by relevance under the same demand, so that services highly relevant to the demand appear near the top. The task is optimized with a ranking loss function to improve the model's ranking capability in recommendation scenarios.
Ranking loss function: a pairwise loss over pairs of samples $(i, j)$, expressed in terms of the following quantities:
$N$ is the number of samples;
$\hat{s}_i$ and $\hat{s}_j$ are the model-predicted scores for samples $i$ and $j$, respectively;
$s_i$ and $s_j$ are the true relevance scores for samples $i$ and $j$, respectively;
$\mathbb{1}(\cdot)$ is an indicator function that takes the value 1 when its condition holds and 0 otherwise.
This loss function optimizes the ranking result, ensuring that services with higher relevance obtain higher predicted scores.
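The exact form of the ranking loss is not recoverable from this text, so the sketch below substitutes a common pairwise logistic (RankNet-style) loss as an assumption. It also assumes the indicator selects pairs in which sample i has the higher true score, and it averages over the selected pairs rather than over N.

```python
import math


def pairwise_rank_loss(true_scores, predicted_scores):
    """Assumed pairwise logistic loss: for every pair (i, j) whose true
    scores satisfy s_i > s_j, penalize log(1 + exp(-(pred_i - pred_j)))."""
    total, pair_count = 0.0, 0
    n = len(true_scores)
    for i in range(n):
        for j in range(n):
            if true_scores[i] > true_scores[j]:  # indicator (assumed condition)
                gap = predicted_scores[i] - predicted_scores[j]
                total += math.log1p(math.exp(-gap))
                pair_count += 1
    return total / pair_count if pair_count else 0.0


true_scores = [1.0, 0.0]
well_ordered = pairwise_rank_loss(true_scores, [2.0, 0.0])
mis_ordered = pairwise_rank_loss(true_scores, [0.0, 2.0])
```

A prediction that preserves the true order (`well_ordered`) yields a smaller loss than one that inverts it (`mis_ordered`), which is the behavior the ranking task requires regardless of the precise loss form.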
The loss functions of the tasks are jointly optimized by weighted summation to form a multi-task learning framework. Compared with the general task design of traditional pre-trained models, the TTP task of this embodiment focuses on the terms and relations of the technical field, which strengthens the model's ability to represent the deep semantics of technical content and yields a preliminarily optimized model (the first optimization model).
Total loss function:
$$\mathcal{L}_{\mathrm{total}} = \alpha\,\mathcal{L}_{\mathrm{MLM}} + \beta\,\mathcal{L}_{\mathrm{TTP}} + \gamma\,\mathcal{L}_{\mathrm{reg}} + \delta\,\mathcal{L}_{\mathrm{rank}}$$
where $\alpha$, $\beta$, $\gamma$ and $\delta$ are hyperparameters that balance the task weights and are tuned experimentally, $\mathcal{L}_{\mathrm{MLM}}$ is the masked language modeling loss function, $\mathcal{L}_{\mathrm{TTP}}$ is the total loss function of the technical term prediction task (term prediction and relation classification), $\mathcal{L}_{\mathrm{reg}}$ is the regression loss function, and $\mathcal{L}_{\mathrm{rank}}$ is the ranking loss function.
3. Adaptive multi-task weight optimization mechanism:
In multi-task training, the loss weights ($\alpha$, $\beta$, $\gamma$, $\delta$) normally depend on manual tuning, which causes low training efficiency or inter-task interference. To solve this, the embodiment provides an adaptive weight adjustment mechanism based on task convergence speed and difficulty.
The weights are adjusted dynamically as a function of training progress (epoch or step). Let the total number of training rounds be $T$ (in epochs) and the current training round be $t$, and define the progress ratio $p = t/T$, with range [0, 1]. The transition factor is then defined as:
$$f(p) = \frac{1}{1 + e^{-k\,(p - p_0)}}$$
where:
$k$ controls the steepness of the function (that is, the transition speed), and is usually set between 10 and 20;
$p_0$ is the midpoint of the function, representing the middle position of the transition, and is usually set to 0.5;
$e$ is the natural constant (approximately 2.71828), used as the base in the transition factor formula so that the transition is smooth. The core purpose of the transition factor is to control the dynamic change of the task weights smoothly and remove the need for manual weight tuning;
the weights of the masking task and the term prediction task decrease gradually as training progresses, using $1 - f(p)$;
the weights of the regression task and the ranking task increase gradually as training progresses, using $f(p)$.
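In model training, the dynamic weight of each task's loss function is obtained from its hyperparameter and the transition factor: the weights of the masking task and the term prediction task are derived from $\alpha$ and $\beta$ and decrease as training progresses, and the weights of the regression task and the ranking task are derived from $\gamma$ and $\delta$ and increase as training progresses. Here $\alpha$, $\beta$, $\gamma$ and $\delta$ are hyperparameters, and $\Omega$, the sum of the hyperparameters, is the total weight.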
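The weight schedule can be sketched as follows. The sigmoid form of the transition factor follows from the description of its steepness k, midpoint and base e. How the dynamic weights are normalized is not recoverable from this text, so dividing by the sum of the scaled weights is an assumption made here for illustration, and the function and key names are hypothetical.

```python
import math


def transition_factor(epoch, total_epochs, k=10.0, midpoint=0.5):
    """Sigmoid of the training progress ratio p = epoch / total_epochs."""
    progress = epoch / total_epochs
    return 1.0 / (1.0 + math.exp(-k * (progress - midpoint)))


def dynamic_weights(epoch, total_epochs, alpha=1.0, beta=1.0,
                    gamma=1.0, delta=1.0, k=10.0):
    """Scale the base hyperparameters by the transition factor.

    Masking and term prediction use (1 - f); regression and ranking use f.
    Normalizing by the sum of the scaled weights is an assumption.
    """
    f = transition_factor(epoch, total_epochs, k)
    raw = {
        "mlm": alpha * (1.0 - f),
        "ttp": beta * (1.0 - f),
        "reg": gamma * f,
        "rank": delta * f,
    }
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


def total_loss(losses, weights):
    """Weighted sum of the per-task losses."""
    return sum(weights[name] * losses[name] for name in weights)


early = dynamic_weights(epoch=0, total_epochs=10)
middle = dynamic_weights(epoch=5, total_epochs=10)
late = dynamic_weights(epoch=10, total_epochs=10)
```

Early in training the language-modeling tasks dominate, at the midpoint all four tasks are weighted equally (given equal base hyperparameters), and late in training the regression and ranking tasks dominate.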
4. Knowledge-enhanced contrastive learning to improve model representation:
On the basis of the first optimization model, this embodiment further provides a Knowledge-Enhanced Contrastive Learning (KECL) method. By integrating the domain knowledge graph into the contrastive learning framework, it improves the model's discrimination of semantic relations and the fine-grained resolution of matching. The method is as follows:
4.1 Knowledge-graph-assisted sample pair construction:
KECL introduces a knowledge graph of the scientific and technological service field as an auxiliary tool to guide the construction of sample pairs. The knowledge graph contains structured information such as the classification of technical fields, the hierarchical and association relations among terms, and expert resources. Based on this information, KECL ties the construction of sample pairs closely to domain knowledge:
Positive sample pairs: a demand-service pair that belongs to the same technical sub-field or has a strong semantic association is defined as a positive sample. For example, "artificial intelligence algorithm development demand" and "machine learning model optimization service" within the same technical sub-field are treated as a positive sample, because they share similar technical classifications or term paths in the knowledge graph.
Negative sample pairs: a demand-service pair that crosses technical fields or is semantically unrelated is defined as a negative sample. For example, "biomedical technology demand" and "cloud computing service" are treated as a negative sample, because they belong to different technical branches in the knowledge graph.
Guided by the knowledge graph, the construction of sample pairs is no longer limited to surface features of the text; it also takes into account the semantic hierarchy and entity relations of the field, so that the model learns more domain-specific representations.
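The pair construction rule can be sketched as follows. The toy taxonomy, the field and sub-field names, and the decision to leave same-field, different-sub-field pairs unlabeled are all assumptions made for illustration; a real system would query the knowledge graph instead of a hard-coded dictionary.

```python
from itertools import product

# Hypothetical toy taxonomy: item text -> (technical field, technical sub-field).
TAXONOMY = {
    "artificial intelligence algorithm development demand":
        ("information technology", "artificial intelligence"),
    "machine learning model optimization service":
        ("information technology", "artificial intelligence"),
    "cloud computing service":
        ("information technology", "cloud computing"),
    "biomedical technology demand":
        ("life sciences", "biomedicine"),
}


def label_pair(demand: str, service: str):
    """Label a demand-service pair from its position in the taxonomy."""
    d_field, d_sub = TAXONOMY[demand]
    s_field, s_sub = TAXONOMY[service]
    if (d_field, d_sub) == (s_field, s_sub):
        return "positive"        # same technical sub-field
    if d_field != s_field:
        return "negative"        # different technical fields
    return None                  # same field, different sub-field: unlabeled here


def build_pairs(demands, services):
    """Collect positive and negative pairs over all demand-service combinations."""
    pairs = {"positive": [], "negative": []}
    for demand, service in product(demands, services):
        label = label_pair(demand, service)
        if label is not None:
            pairs[label].append((demand, service))
    return pairs


if __name__ == "__main__":
    demands = ["artificial intelligence algorithm development demand",
               "biomedical technology demand"]
    services = ["machine learning model optimization service",
                "cloud computing service"]
    print(build_pairs(demands, services))
```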
4.2 Knowledge-weighted contrastive loss function optimization:
Knowledge-Enhanced Contrastive Learning (KECL) uses the path distance between entities in the knowledge graph as a measure of sample difficulty. The path distance reflects the degree of semantic association between two entities in the knowledge graph: entities separated by a shorter path are more strongly associated and easier to distinguish, whereas entities separated by a longer path are more weakly associated and are treated as "hard samples". On this basis, KECL uses a weighted InfoNCE loss function that assigns higher weights to hard samples (i.e., positive pairs with longer path distances or negative pairs with shorter path distances), so that the representations of these samples are optimized preferentially.
The direct effect of the weighting mechanism is that the model pulls positive sample pairs closer together in the semantic space while pushing negative sample pairs further apart, which improves its ability to distinguish fine-grained semantic relations.
[Formulas not reproduced in the extracted text.]
The quantities in these formulas denote, in order: the weight between sample i and sample j; the distance between sample i and sample j; the sum of the distances between all sample pairs; the feature representation of sample i; the feature representation of the positive sample; the parameter controlling the sensitivity of the loss function; the similarity between samples; the set of positive sample pairs; and the knowledge-enhanced contrastive loss function.
After this optimization, a further optimized model is obtained, namely the second optimized model.
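A weighted InfoNCE loss of the kind described in section 4.2 can be sketched in plain Python. The sketch assumes that the weight of a positive pair is its path distance divided by the sum of the path distances of all positive pairs, that similarity is cosine similarity, and that the sensitivity parameter acts as a temperature `tau`. These choices, and all names in the code, are assumptions for illustration rather than the formulas of the publication.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def weighted_info_nce(anchors, candidates, positives, distances, tau=0.1):
    """Distance-weighted InfoNCE loss.

    anchors, candidates: lists of feature vectors.
    positives: list of (i, j) pairs, anchor i matched with candidate j.
    distances: dict mapping (i, j) to the knowledge-graph path distance.
    """
    total_distance = sum(distances[pair] for pair in positives)
    loss = 0.0
    for i, j in positives:
        weight = distances[(i, j)] / total_distance   # harder pairs weigh more
        logits = [cosine(anchors[i], c) / tau for c in candidates]
        peak = max(logits)                            # log-sum-exp stabilisation
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        loss += -weight * (logits[j] - log_denominator)
    return loss


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [0.0, 1.0]]
    pairs = [(0, 0), (1, 1)]
    print(weighted_info_nce(vectors, vectors, pairs, {(0, 0): 1, (1, 1): 3}))
```

All other candidates in the list act as negatives for a given anchor, so a positive pair with a long path distance that the model still scores poorly contributes the most to the loss.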
5. Model fine-tuning based on real user history data:
To make the model better fit the actual application scenario, in this embodiment the second optimized model is fine-tuned using the historical interaction data of real users. Behavior data such as real clicks, consultations and transactions made by users on the platform is collected and, combined with user feedback, used to further optimize the model parameters through supervised fine-tuning. Fine-tuning at this stage not only improves the model's adaptability to the real needs of users but also reduces cold-start and generalization risks in practical application, and yields the final re-ranking model (rearrangement model).
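One way to turn such behavior logs into supervised targets is sketched below. The grading of the three actions (click below consultation below transaction), the normalization to the range 0-1, and the event tuple layout are hypothetical choices made for illustration; the publication does not specify how the interaction data is converted into labels.

```python
from collections import defaultdict

# Hypothetical grades: a stronger action is taken as stronger evidence of relevance.
ACTION_GRADE = {"click": 1, "consult": 2, "transaction": 3}


def build_training_labels(events):
    """Map (demand_id, service_id, action) events to relevance labels in [0, 1].

    Each demand-service pair keeps the grade of its strongest observed action;
    unknown actions contribute grade 0.
    """
    grades = defaultdict(int)
    for demand_id, service_id, action in events:
        key = (demand_id, service_id)
        grades[key] = max(grades[key], ACTION_GRADE.get(action, 0))
    max_grade = max(ACTION_GRADE.values())
    return {key: grade / max_grade for key, grade in grades.items()}


if __name__ == "__main__":
    log = [("d1", "s1", "click"), ("d1", "s1", "transaction"), ("d1", "s2", "consult")]
    print(build_training_labels(log))
```

The resulting labels could then serve as targets for supervised fine-tuning of the second optimized model.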
6. Two-stage retrieval and re-ranking mechanism for matching supply and demand of scientific and technological services:
In practical application, this embodiment adopts a two-stage retrieval and re-ranking strategy that balances matching efficiency and accuracy:
First stage, efficient semantic vector retrieval: the second optimized model is used as the semantic vector retrieval model. The technical demand text entered by the user and the service texts in the scientific and technological service library are vectorized, and candidate services are quickly screened through efficient vector similarity computation (such as cosine similarity or approximate nearest neighbor (ANN) search), which provides coarse ranking over a large-scale service library.
Second stage, re-ranking: the re-ranking model scores each candidate service for relevance along multiple dimensions (including semantic relevance, degree of match with user preferences, domain suitability, and so on), and the final recommendation result is output using a weighted ranking algorithm. This mechanism improves the retrieval efficiency and recommendation accuracy of the system and meets the practical requirements of large-scale supply and demand scenarios for scientific and technological services.
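The two stages can be sketched end to end as follows. The sketch assumes that embeddings are already available as plain lists, uses exhaustive cosine similarity in place of an ANN index, and takes the per-dimension scores of the second stage as given inputs rather than computing them with a re-ranking model. The dimension names and the weights are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve(query_vec, service_vecs, top_k):
    """Stage 1: coarse ranking of all services by cosine similarity."""
    scored = [(sid, cosine(query_vec, vec)) for sid, vec in service_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def rerank(candidates, dimension_scores, weights):
    """Stage 2: weighted sum of per-dimension relevance scores for each candidate."""
    results = []
    for sid, _ in candidates:
        scores = dimension_scores[sid]
        total = sum(weights[name] * scores[name] for name in weights)
        results.append((sid, total))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    services = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
    scores = {
        "a": {"semantic": 0.9, "preference": 0.1, "domain": 0.5},
        "b": {"semantic": 0.8, "preference": 0.9, "domain": 0.6},
    }
    weights = {"semantic": 0.5, "preference": 0.3, "domain": 0.2}
    shortlist = retrieve([1.0, 0.0], services, top_k=2)
    print(rerank(shortlist, scores, weights))
```

Keeping the first stage cheap lets the more expensive multi-dimensional scoring run only on the short candidate list.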
In summary, this embodiment raises the level of intelligence of supply and demand matching on scientific and technological service platforms by constructing a high-quality domain-specific dataset, designing the technical term prediction (TTP) task and multi-task transitional training, introducing an adaptive multi-task weight optimization mechanism, providing the Knowledge-Enhanced Contrastive Learning (KECL) method, and combining these with a two-stage retrieval and re-ranking strategy. The technical solution improves matching accuracy and efficiency, offers advantages in domain suitability and training efficiency, and can be applied to intelligent recommendation and resource allocation scenarios on various scientific and technological service platforms.
As one possible implementation, this embodiment provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the deep learning-based method for intelligently matching supply and demand of scientific and technological services.
As one possible implementation, this embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the deep learning-based method for intelligently matching supply and demand of scientific and technological services.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the scope of protection of the invention to the particular embodiments; any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (10)

1. A deep learning-based method for intelligently matching supply and demand of scientific and technological services, characterized by comprising the following specific steps:
obtaining the user demand and scientific and technological service interaction data historically accumulated by a scientific and technological service platform, and constructing a supply and demand matching dataset;
constructing a pre-trained language model, and performing post-pretraining on the pre-trained language model based on multi-task learning to obtain a first optimized model;
obtaining a knowledge graph of the scientific and technological service field, and constructing knowledge-graph-assisted sample pairs;
performing contrastive learning on the first optimized model based on the knowledge-graph-assisted sample pairs to obtain a second optimized model;
collecting historical interaction data of real users to optimize the second optimized model, obtaining a re-ranking model;
scoring candidate services for multi-dimensional relevance based on the re-ranking model, and obtaining the final recommendation result based on weighted ranking.

2. The method according to claim 1, characterized in that obtaining the user demand and scientific and technological service interaction data historically accumulated by the platform and constructing the supply and demand matching dataset specifically comprises:
obtaining the user demand and scientific and technological service interaction data historically accumulated by the platform, and extracting technical demand texts, scientific and technological service texts and relevance scores as the supply and demand matching dataset; wherein
the technical demand text is the description of the technical service demand submitted by a user;
the scientific and technological service text is the description of the service content provided by the platform;
the relevance score is generated by combining manual annotation with the predictions of a scoring model based on users' historical interaction behavior.

3. The method according to claim 1, characterized in that, in the multi-task learning, the tasks specifically comprise a masking task, a term prediction task, a regression task and a ranking task.

4. The method according to claim 3, characterized in that performing post-pretraining on the pre-trained language model based on multi-task learning specifically comprises:
for the masking task, randomly masking the input text of the pre-trained language model; the pre-trained language model performing semantic understanding of the text from the remaining visible text; obtaining the masked words from that semantic understanding; calculating the correlation between the words predicted by the model and the words actually masked to obtain the prediction accuracy; and, based on the prediction accuracy, optimizing the model parameters with a cross-entropy loss function;
for the term prediction task, extracting key technical terms from the input text using a domain dictionary; randomly selecting extracted key technical terms and replacing them with tokens in the text; predicting the replaced key technical terms from the context; and using a cross-entropy loss function to calculate the difference between the predicted and the actual key technical terms, and optimizing the model parameters according to the difference;
for the regression task, extracting demand text and service text pairs from the input text; annotating each pair with a relevance score; encoding the demand text and service text pairs with the pre-trained language model and extracting the semantic feature vectors of the encoded pairs; predicting the relevance score from the extracted feature vectors; and constructing a loss function from the relevance score prediction to optimize the model parameters;
for the ranking task, constructing a loss function from the relevance scores of the demand text and service text pairs to optimize the ranking ability of the model.

5. The method according to claim 4, characterized in that training the pre-trained model further comprises dynamically adjusting the task weights during multi-task learning, the dynamic adjustment comprising:
setting the number of training epochs and obtaining the current epoch;
obtaining the progress ratio from the number of training epochs and the current epoch;
obtaining the transition factor from the progress ratio;
setting the weights of the masking task, the term prediction task, the regression task and the ranking task according to the transition factor.

6. The method according to claim 1, characterized in that obtaining the knowledge graph of the scientific and technological service field and constructing knowledge-graph-assisted sample pairs specifically comprises:
generating the knowledge graph of the scientific and technological service field from the supply and demand matching dataset;
in the knowledge graph, defining demand-service pairs that belong to the same technical sub-field or have a strong semantic association as positive samples, and defining demand-service pairs that cross technical fields or are semantically unrelated as negative samples.

7. The method according to claim 1, characterized in that performing contrastive learning on the first optimized model based on the knowledge-graph-assisted sample pairs to obtain the second optimized model specifically comprises:
obtaining the nodes of the knowledge graph of the scientific and technological service field and extracting the node path distances;
extracting the nodes whose path distance is higher than a set threshold and defining them as hard samples;
constructing a contrastive learning loss function, weighting the hard samples and optimizing the sample representations to obtain the second optimized model.

8. The method according to claim 1, characterized in that collecting historical interaction data of real users to optimize the second optimized model and obtain the re-ranking model specifically comprises:
obtaining the historical interaction data of real users;
training the second optimized model on the historical interaction data of real users to obtain the re-ranking model.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the deep learning-based method for intelligently matching supply and demand of scientific and technological services according to any one of claims 1 to 8.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the deep learning-based method for intelligently matching supply and demand of scientific and technological services according to any one of claims 1 to 8.
CN202511052777.7A 2025-07-30 2025-07-30 Scientific and technological service supply and demand intelligent matching method, equipment and medium based on deep learning Pending CN120561315A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511052777.7A CN120561315A (en) 2025-07-30 2025-07-30 Scientific and technological service supply and demand intelligent matching method, equipment and medium based on deep learning

Publications (1)

Publication Number Publication Date
CN120561315A true CN120561315A (en) 2025-08-29

Family

ID=96821706

Country Status (1)

Country Link
CN (1) CN120561315A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination