CN118691767A - Image processing method, system, electronic device and computer program product - Google Patents
- Publication number: CN118691767A
- Application number: CN202410721016.5A
- Authority
- CN
- China
- Prior art keywords
- image
- product
- model
- displayed
- self
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/006—Mixed reality
- G06F18/253—Fusion techniques of extracted features
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F40/30—Semantic analysis
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06Q30/0643—Electronic shopping [e-shopping] utilising user interfaces specially adapted for shopping graphically representing goods, e.g. 3D product representation
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/56—Extraction of image or video features relating to colour
- G06V10/806—Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82—Image or video recognition or understanding using neural networks
- G06V10/86—Image or video recognition or understanding using syntactic or structural representations, e.g. graph matching
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Abstract
The present application discloses an image processing method, system, electronic device and computer program product, relating to large-model technology and the field of image processing. The method may include: identifying a product image to be processed, the image content of which includes at least one product to be displayed; obtaining text description information corresponding to the product to be displayed in the product image, the text description information being used to describe at least the wearing-performance result presented when the product to be displayed is worn by a carrier; and using the text description information to guide an image processing model to analyze the product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing-performance result. The present application solves the technical problem of the poor quality of wearing-performance results simulated in images.
Description
Technical Field
The present application relates to large-model technology and the field of image processing, and in particular to an image processing method, system, electronic device and computer program product.
Background Art
At present, with the rapid development of e-commerce, users' demands on the online shopping experience keep growing; for example, they want a more intuitive sense of how a product will look when worn before purchasing it. Technology that uses computer vision and image processing to merge product images with model images, thereby simulating the actual wearing effect, has therefore emerged, becoming an important means of improving the shopping experience and customer satisfaction, and an important trend in the e-commerce industry.
In the related art, simulating the wearing effect of a product on a particular model usually requires a specific model picture, which poses challenges for some e-commerce merchants. For example, using existing model pictures may raise copyright issues, the wearing effect is limited by the model's original body shape, and items or clothes already worn by the model may interfere with the fitting result. In general, related-art services for displaying the wearing effect of a product in an image are heavily constrained and cannot offer customized selection of wearing effects. There therefore remains the technical problem that wearing-performance results simulated in images are of poor quality.
No effective solution to the above problems has yet been proposed.
Summary of the Invention
The embodiments of the present application provide an image processing method, system, electronic device and computer program product, so as to at least solve the technical problem of the poor quality of wearing-performance results simulated in images.
According to one aspect of the embodiments of the present application, an image processing method is provided. The method may include: identifying a product image to be processed, the image content of which includes at least one product to be displayed; obtaining text description information corresponding to the product to be displayed in the product image, the text description information being used to describe at least the wearing-performance result presented when the product to be displayed is worn by a carrier; and using the text description information to guide an image processing model to analyze the product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing-performance result.
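The claimed flow (identify a product image, pair it with a wearing-effect description, and let a model trained from a text-to-image model produce the target image) can be sketched roughly as follows. Every function name below is hypothetical and stands in for components the patent does not specify:

```python
# Illustrative sketch of the claimed pipeline; all names are hypothetical,
# not taken from the patent, and the "model" is a stand-in dictionary result.

def identify_product_images(images):
    """Keep images whose content contains at least one product to display."""
    return [img for img in images if img.get("has_product")]

def get_text_description(product_image):
    """Text describing the wearing-performance result on a carrier."""
    return f"a model wearing the {product_image['product_name']}, studio lighting"

def run_image_processing_model(product_image, prompt):
    """Stand-in for the image processing model trained from a text-to-image model."""
    return {"source": product_image["id"], "prompt": prompt, "kind": "target_image"}

def process(images):
    results = []
    for img in identify_product_images(images):
        prompt = get_text_description(img)
        results.append(run_image_processing_model(img, prompt))
    return results
```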
According to another aspect of the embodiments of the present application, a method for processing clothing images is provided. The method may include: identifying a virtual clothing store deployed on an e-commerce platform, and identifying, from the virtual clothing store, a clothing product image to be processed, the image content of which includes at least one clothing product to be displayed; obtaining text description information corresponding to the clothing product to be displayed in the clothing product image, the text description information being used to describe at least the wearing-performance result presented when the clothing product to be displayed is worn by a carrier; using the text description information to guide an image processing model to analyze the clothing product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing-performance result; and sending the target image to the e-commerce platform.
According to another aspect of the embodiments of the present application, another image processing method is provided. The method may include: in response to an image input operation on an operation interface, displaying on the operation interface a product image to be processed, the image content of which includes at least one product to be displayed; in response to a text input operation on the operation interface, displaying on the operation interface text description information corresponding to the product to be displayed in the product image, the text description information being used to describe at least the wearing-performance result presented when the product to be displayed is worn by a carrier; and, in response to an image generation operation on the operation interface, displaying on the operation interface a target image matching the product image and the text description information, wherein the target image is obtained by using the text description information to guide an image processing model to analyze the product image, the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing-performance result.
According to another aspect of the embodiments of the present application, another image processing method is provided. The method may include: displaying a product image to be processed on the presentation screen of a virtual reality (VR) device or an augmented reality (AR) device, the image content of which includes at least one product to be displayed; obtaining text description information corresponding to the product to be displayed in the product image, the text description information being used to describe at least the wearing-performance result presented when the product to be displayed is worn by a carrier; driving the VR or AR device to use the text description information to guide an image processing model to analyze the product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing-performance result; and displaying the target image on the presentation screen of the VR or AR device.
According to another aspect of this embodiment of the present application, an image processing system is provided. The system may include: a client, configured to upload a product image to be processed, the image content of which includes at least one product to be displayed, and to upload text description information corresponding to the product to be displayed in the product image, the text description information being used to describe at least the wearing-performance result presented when the product to be displayed is worn by a carrier; and a server, configured to use the text description information to guide an image processing model to analyze the product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model and the image content of the target image is used to simulate the wearing-performance result, and to send the target image to the client.
According to another aspect of the embodiments of the present application, an electronic device is also provided. The electronic device may include a memory and a processor: the memory is used to store computer-executable instructions, and the processor is used to execute them; when the computer-executable instructions are executed by the processor, any one of the above image processing methods is implemented.
According to another aspect of the embodiments of the present application, a processor is further provided. The processor is used to run a program, wherein any one of the above image processing methods is executed when the program runs.
According to another aspect of the embodiments of the present application, a computer-readable storage medium is also provided. The computer-readable storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute any one of the above image processing methods.
According to another aspect of the embodiments of the present application, a computer program product is also provided. The computer program product includes a computer program which, when executed by a processor, implements the image processing method of the above embodiments of the present application.
In the embodiments of the present application, if the wearing performance of a certain product on a carrier needs to be determined and displayed, the product image to be processed for that product can be identified, and text description information for displaying the product on the carrier can be obtained. The text description information can then be used to guide an image processing model, pre-trained from a text-to-image model, to analyze the product image and obtain a target image that simulates the wearing performance of the product on the carrier. Taking the problems in the related art into account, the embodiments of the present application propose a virtual fitting technology built on a text-to-image framework: the desired virtual fitting effect of a product is expressed in a simple text description, and an image processing model trained from a text-to-image model analyzes it to generate a highly realistic model fitting image. This eliminates the dependence on specific model pictures, provides users with product display effects offering greater freedom, flexibility and customization, and reduces the limitations of simulating a product's wearing-performance result through images, thereby achieving the technical effect of improving the quality of wearing-performance results simulated in images and solving the technical problem that such simulated results are of poor quality.
It should be noted that the above general description and the following detailed description are merely intended to exemplify and explain the present application, and do not limit it.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present application and constitute a part of it; the illustrative embodiments of the present application and their descriptions explain the present application and do not improperly limit it. In the drawings:
FIG. 1 is a schematic diagram of an application scenario of an image processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of an image processing method according to an embodiment of the present application;
FIG. 3 is a flow chart of another image processing method according to an embodiment of the present application;
FIG. 4 is a flow chart of another image processing method according to an embodiment of the present application;
FIG. 5 is a flow chart of another image processing method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an image processing system according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a streamlined virtual try-on system based on a text-to-image framework according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a knowledge injection position according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a knowledge injection method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of image processing by a computer device according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of an image processing device according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a device for processing clothing images according to an embodiment of the present application;
FIG. 13 is a schematic diagram of another image processing device according to an embodiment of the present application;
FIG. 14 is a schematic diagram of another image processing device according to an embodiment of the present application;
FIG. 15 is a structural block diagram of a computer terminal according to an embodiment of the present application;
FIG. 16 is a block diagram of an electronic device for an image processing method according to an embodiment of the present application;
FIG. 17 is a block diagram of the hardware structure of a computer terminal (or mobile device) for implementing an image processing method according to an embodiment of the present application;
FIG. 18 is a structural block diagram of a computing environment for an image processing method according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the scope of protection of the present application.
The technical solution provided in the present application is mainly implemented with large-model technology. A large model here refers to a deep learning model with large-scale model parameters, usually containing hundreds of millions, tens of billions, hundreds of billions, trillions or even more than ten trillion parameters. A large model may also be called a foundation model: it is pre-trained on a large-scale unlabeled corpus to produce a pre-trained model with more than a hundred million parameters. Such a model can adapt to a wide range of downstream tasks and generalizes well; examples include large language models (LLMs) and multi-modal pre-training models.
It should be noted that, in actual application, a pre-trained large model can be fine-tuned on a small number of samples so that it can be applied to different tasks. For example, large models are widely used in natural language processing (NLP), computer vision, speech processing and other fields; specifically, they can be applied to computer vision tasks such as visual question answering (VQA), image captioning (IC) and image generation, as well as to natural language processing tasks such as text-based sentiment classification, text summarization and machine translation. The main application scenarios of large models therefore include, but are not limited to, digital assistants, intelligent robots, search, online education, office software, e-commerce and intelligent design. In the embodiments of the present application, application generation by a target language processing model obtained by training a generative large model in a dialogue scenario is taken as an example for explanation.
It should be noted that the terms "first", "second", etc. in the specification, claims and accompanying drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application described here can be implemented in an order other than that illustrated or described. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product or device.
First, some of the terms appearing in the description of the embodiments of the present application are explained as follows:
The Stable Diffusion (SD) model is a diffusion model and belongs to the class of generative models. Diffusion models are generally used to describe propagation processes, such as the spread of information or disease, through a network; generative models are a class of machine learning models used to model data. The Stable Diffusion model can be used to model the propagation and diffusion of certain information or phenomena in a network;
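As a concrete illustration of the diffusion idea behind models such as Stable Diffusion (not part of the patent text), the sketch below implements the standard forward noising step that a diffusion model is trained to invert; the schedule constants are common defaults, not values from the patent:

```python
import numpy as np

# Forward "noising" of diffusion models:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)
# where alpha_bar_t is the cumulative product of (1 - beta_t) over the schedule.

def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    betas = np.linspace(beta_start, beta_end, T)   # linear noise schedule
    return np.cumprod(1.0 - betas)                 # alpha_bar_t for t = 0..T-1

def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t given a clean image x0 at timestep t."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
```

At large t, alpha_bar is close to 0, so x_t is almost pure noise; the generative direction learns to reverse these steps.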
A Variational Autoencoder (VAE) is an image encoding network used to extract image features. A VAE is a generative model, usually used to learn latent representations of data. In this description it is one of the basic building blocks of the Stable Diffusion model, handling the extraction of image features, which illustrates the importance and application value of VAEs in generative models;
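Two VAE ingredients alluded to above can be sketched numerically: the reparameterized sampling of a latent and the KL regularizer that shapes the latent space (illustrative only; the VAE inside Stable Diffusion is a deep convolutional network):

```python
import numpy as np

# Reparameterization trick: the encoder outputs (mu, log_var), and a latent is
# sampled as z = mu + exp(0.5 * log_var) * eps, keeping sampling differentiable.

def reparameterize(mu, log_var, rng):
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    # KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
```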
U-Net (Unet for short) is a convolutional neural network architecture commonly used for image segmentation tasks. It has a downsampling path and an upsampling path, which allow it to effectively capture both local and global information in an image. The Unet network is one of the basic building blocks of the Stable Diffusion model, which shows that the Unet network can be applied in the construction of the Stable Diffusion model to handle feature extraction and segmentation of image data. This demonstrates the importance of the Unet network in image processing and in generative models;
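As an illustrative aside, the iterative denoising that a diffusion model performs can be sketched in miniature (a toy loop in which shrinking values stand in for the U-Net's noise prediction; the function names and the linear schedule are illustrative assumptions, not part of the embodiment):

```python
import random

def toy_denoise_step(latent, step, total_steps):
    """One reverse-diffusion step: remove part of the remaining noise.

    A real U-Net would predict the noise from the latent, the timestep,
    and the text condition; here each value is simply shrunk toward 0.0,
    which stands in for the clean latent.
    """
    keep = 1.0 - 1.0 / (total_steps - step)  # fraction of noise surviving this step
    return [v * keep for v in latent]

def toy_generate(seed=0, dim=4, total_steps=10):
    """Start from pure noise and iteratively denoise, as a diffusion model does."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for step in range(total_steps):
        latent = toy_denoise_step(latent, step, total_steps)
    return latent
```

With this toy schedule the final step removes all remaining noise, so the loop converges exactly to the stand-in clean latent.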
Virtual try-on (TryOn) technology takes as input an image containing a portrait and a flat-lay image of the target garment. The goal of this technology is to remove the portrait's original clothing and fuse the target garment image onto the portrait in a natural way. Such technology may involve computer vision and graphics techniques such as image segmentation, garment synthesis, and image fusion to achieve a virtual display of the portrait trying on the clothing;
Contrastive Language-Image Pre-Training (CLIP) is an image-text relevance matching model. The model is designed to match images and text for relevance, so that it can understand the semantic relationship between them. The CLIP model is designed to process image and text data simultaneously, thereby achieving more comprehensive semantic understanding and relevance matching.
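The image-text relevance matching performed by CLIP can be reduced, in sketch form, to comparing embedding vectors by cosine similarity (the 4-dimensional embeddings below are made up for illustration; real CLIP embeddings are produced by trained encoders and have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Relevance score between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings standing in for CLIP encoder outputs.
text_embedding = [0.9, 0.1, 0.0, 0.4]   # e.g. the text "a blue dress"
matching_image = [0.8, 0.2, 0.1, 0.5]   # image of a blue dress
unrelated_image = [0.0, 0.9, 0.8, 0.0]  # image of something else

# The matching image scores higher against the text than the unrelated one.
assert cosine_similarity(text_embedding, matching_image) > \
       cosine_similarity(text_embedding, unrelated_image)
```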
Embodiment 1
According to an embodiment of the present application, a method for processing an image is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system such as one executing a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from that given here.
Considering that a large model has a huge number of parameters while the computing resources of a mobile terminal are limited, the image processing method provided in the embodiment of the present application can be applied to, but is not limited to, the application scenario shown in Figure 1. In the application scenario shown in Figure 1, the large model is deployed on the server 30, which may be a cloud server. The server 30 may be connected to one or more terminal devices 10 through a local area network connection, a wide area network connection, an Internet connection, or another type of data network. The terminal device 10 may be a client device, which may include, but is not limited to, a smartphone, a tablet computer, a laptop computer, a PDA, a personal computer, a smart home device, a vehicle-mounted device, and the like; together, the client devices constitute the client side relative to the server. An operation interface for obtaining product images may be deployed on the graphical user interface of the client device, and this operation interface may be the operation interface of an e-commerce platform. The terminal device 10 may interact with the user through the graphical user interface to invoke the large model and thereby carry out the image processing method provided in the embodiment of the present application. Information may be exchanged between the server 30 and the terminal device 10 through the network 20. A large model may be deployed in the database 40 on the server; in the embodiment of the present application, the database 40 stores an image processing model obtained by training a text-to-image model, which can be invoked through the server 30 to simulate wearing performance results.
In the embodiment of the present application, the system composed of the terminal device, the network, and the server can perform the following steps: if a user needs to be shown the wearing performance result of a product, the user can perform input or selection operations on the operation interface of the corresponding terminal device 10 to obtain the product image corresponding to the product whose wearing performance result is to be displayed. The terminal device 10 can send the product image to the server 30 through the network 20. The user can also input, on the operation interface, the text description information corresponding to the wearing performance result to be displayed and to the carrier. The terminal device 10 can send the text description information to the server 30 through the network 20. After receiving the product image and the text description information, the server 30 may execute the following steps: Step S102, identifying the product image to be processed, wherein the image content of the product image includes at least one product to be displayed; Step S104, obtaining text description information corresponding to the product to be displayed in the product image, wherein the text description information is used at least to describe the wearing performance result displayed when the product to be displayed is worn by the carrier; Step S106, using the text description information to guide the image processing model to analyze the product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result.
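Steps S102 to S106 can be sketched as a small server-side pipeline (the function names are hypothetical, and the trained text-to-image model is replaced by a stub for illustration):

```python
def identify_product_image(image):
    """Step S102: check that the image to be processed contains a product to display."""
    if not image.get("products"):
        raise ValueError("product image must contain at least one product to display")
    return image

def get_text_description(request):
    """Step S104: obtain the text describing the desired wearing performance and carrier."""
    return request["text_description"]

def run_image_processing_model(image, text_description):
    """Step S106: a stub standing in for the trained text-to-image model.

    The real model would analyze the product image guided by the extracted
    text features; here we merely assemble a descriptive result.
    """
    return {
        "content": f"{image['products'][0]} worn as described: {text_description}",
        "simulates_wearing_performance": True,
    }

def process(request):
    image = identify_product_image(request["product_image"])
    text = get_text_description(request)
    return run_image_processing_model(image, text)
```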
The target image showing the wearing performance result can be output to the terminal device 10 through the network 20 to display, for the user to view, the wearing performance result of the product on the carrier.
To address the technical problem that wearing performance results simulated in images are of poor quality, the present application proposes a virtual fitting technology based on a text-to-image framework. A simple text description expresses the desired virtual fitting effect of the product, and an image processing model trained from a text-to-image model analyzes it to generate a highly realistic model fitting image. This eliminates the dependence on specific model pictures and provides users with product display effects that offer greater freedom, flexibility, and customization, thereby reducing the limitations of simulating a product's wearing performance through images, achieving the technical effect of improving the quality of the wearing performance results simulated in images, and solving the technical problem of poor simulated wearing performance results.
It should be noted that, where the operating resources of the client device can meet the deployment and operating requirements of the large model, the embodiments of the present application can be carried out on the client device.
In the above operating environment, the present application provides a method for processing an image as shown in FIG. 2. It should be noted that the image processing method of this embodiment can be executed by the mobile terminal of the embodiment shown in FIG. 1. FIG. 2 is a flowchart of a method for processing an image according to an embodiment of the present application. As shown in FIG. 2, the method may include the following steps:
Step S202: identifying the product image to be processed.
In the technical solution provided in the above step S202 of the present application, the products to be displayed can be products whose wearing effects need to be displayed on an e-commerce platform, for example, clothing, accessories, shoes, and bags. Clothing can be tops, bottoms, skirts, and the like; accessories can be hair accessories, jewelry, and the like; shoes and bags can be shoes, backpacks, and the like. It should be noted that the above products to be displayed are only examples and no specific restriction is made here; any product whose wearing effect needs to be displayed falls within the protection scope of the embodiments of the present application.
Optionally, the image content shown in the product image may include at least one product to be displayed. If the product whose wearing effect is to be displayed is a garment, the product image is a garment image, which may also be called a garment flat-lay image. The product images here correspond one-to-one with the products to be displayed, and no further details are given here.
In this embodiment, the product image to be processed may be identified.
Optionally, after the product image to be processed is acquired, the product image may be identified to obtain the product to be displayed therein.
For example, if a user or merchant of an e-commerce platform needs to obtain the wearing effect of a certain product, they can, on the operation interface of the corresponding terminal device, upload the product image of the product to be displayed, or select, from the many product images shown on the operation interface, the image of the product whose wearing effect is to be displayed as the product image. After the product image is determined on the operation interface of the terminal device, it can be transmitted to the server through the network. The server can identify the received product image to determine the product whose wearing effect needs to be displayed, where the user can be a customer or a merchant of the e-commerce platform.
It should be noted that the above process and manner of obtaining product images are only examples and are not specifically limited here; any manner capable of capturing a product image containing the product to be displayed falls within the protection scope of the embodiments of the present application.
Step S204: obtaining text description information corresponding to the product to be displayed in the product image.
In the technical solution provided in the above step S204 of the present application, the text description information can be used at least to describe the wearing performance result displayed when the product to be displayed is worn by the carrier. The carrier can be a person who presents the wearing effect of the product, for example, a model, or a customer intending to purchase the product; this is only an example, and no specific limitation is placed on the carrier presenting the wearing effect. The wearing performance result can be the wearing effect of the product on the carrier, which can also be called the virtual try-on effect or on-body effect; for example, if the carrier is a model, it can be the model's on-body effect, also called the model try-on effect. This is only an example; no specific limitation is placed on the wearing performance result, which can correspond to the carrier actually wearing the product, and no further details are given here.
Optionally, the text description information can be a textual description of the appearance of the carrier wearing the product, the pose adopted when displaying the wearing effect, and the way the product is worn. For example, the text description information can describe the carrier's appearance features such as gender, weight, accessories, height, body shape, and hairstyle; it can describe pose features such as the carrier standing facing forward, standing sideways, or sitting; and it can describe how the product to be displayed is worn, for example, the manner of wearing a garment, such as whether it is worn open, with rolled-up sleeves, or layered. If the text description information is input by the user according to their own requirements for displaying the wearing effect, the carrier thus constructed is a customized model, the corresponding appearance features are customized model attributes, and the wearing attributes constructed from the user's own display requirements are customized garment attributes.
It should be noted that the content and feature types described by the above text description information are only examples and are not specifically limited here; they can be determined according to the actual requirements for the wearing effect of the product to be displayed. Any process and text description information that can describe, in textual form, how the carrier displays the wearing effect, as well as the carrier itself, falls within the protection scope supported by the embodiments of the present application.
In this embodiment, after the product image to be processed is identified, the text description information corresponding to the product to be displayed in the product image may be obtained.
For example, if a customer or merchant of an e-commerce platform needs to obtain the wearing effect of a certain product, then after uploading the product image on the operation interface, they can enter, in the corresponding text input box on the operation interface, the appearance features of the carrier that will wear and display the product, the pose features of the carrier when displaying the product, and the way in which the carrier wears the product. The entered text covering appearance features, wearing manner, and pose features can serve as the text description information of the product to be displayed.
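Assembling the text description information from these inputs can be illustrated with a simple helper (the field names are illustrative assumptions, not part of the embodiment):

```python
def build_text_description(carrier_features, pose, wearing_style):
    """Combine carrier appearance, pose, and wearing-style inputs into one description."""
    parts = [
        "carrier: " + ", ".join(carrier_features),
        "pose: " + pose,
        "wearing style: " + wearing_style,
    ]
    return "; ".join(parts)

prompt = build_text_description(
    carrier_features=["female", "long hair", "tall"],
    pose="standing, front view",
    wearing_style="jacket worn open, sleeves rolled up",
)
```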
As another example, the text description information can be transmitted to the server through the network, so that when simulating the wearing performance result, the server can understand the user's requirements for displaying the wearing effect.
It should be noted that the above process and manner of obtaining text description information are only examples and are not specifically limited here; any operation and process capable of constructing a virtual try-on effect through a text description falls within the protection scope of the embodiments of the present application.
In related virtual fitting technologies, a specific model picture is usually required, which may pose a challenge to some merchants. This may limit the flexibility of merchants and customers when trying on clothing, especially for the customers of merchants that wish to provide a personalized try-on experience. Casually using existing model photos may raise copyright issues, which may limit merchants' freedom to use specific model pictures. Being restricted by the model's original body shape and clothing may also affect the fitting effect, subjecting the application of virtual fitting technology to certain limitations.
In general, related virtual fitting technologies still have room for further development in overcoming these challenges, for example by developing virtual fitting technologies with greater flexibility and freedom to better meet the needs of merchants and consumers. Therefore, there remains the technical problem that displaying a product's wearing performance is highly limited.
In the embodiment of the present application, however, the user is asked to input a text description, which is used to specify attributes such as the background of the generated image and the model's gender and hairstyle. The text description provided by the user may include specific requirements for the background, the model attributes, and so on of the virtual try-on image. This provides users with a convenient way to customize and specify the attributes of the virtual try-on image, thereby achieving a more personalized try-on effect for the product to be displayed. In general, the above method allows users to specify the various attributes of the virtual try-on image through a simple text description, which gives users greater flexibility and customization options.
Step S206: using the text description information to guide the image processing model to analyze the product image and obtain the target image.
In the technical solution provided in the above step S206 of the present application, the image processing model can be obtained by training a text-to-image model and can be used to generate an image in which the product is worn and displayed as described by the text description information. It can be the virtual try-on model of this embodiment of the present application, also called the TryOn model, and the model can have a dual-Unet architecture. The text-to-image model can be a diffusion model. The image content of the target image can be used to simulate the wearing performance result; the target image can be an image generated from text by the image processing model based on the text description information and can be used to display the virtual fitting result. If the product to be displayed is a garment, the target image can be an on-body image of the garment on the carrier. The text-to-image model can be a text-to-image framework.
In this embodiment, after the text description information corresponding to the product to be displayed in the product image is obtained, the text description information can be used to guide the image processing model to analyze the product image and obtain the target image.
Optionally, after the product image and the corresponding text description information are obtained, the image processing model can be used to analyze the text description information and the product image, so as to construct a target image that meets the user's requirements, as expressed in the text description information, for the wearing effect of the product to be displayed.
For example, after the server receives the product image and the text description information, it can invoke the image processing model deployed in the server's database to analyze the product image and the text description information, thereby simulating the wearing performance result of the product to be displayed when worn by the carrier in accordance with the text description information, and generating the corresponding target image.
In the embodiment of the present application, a virtual try-on method based on a text-to-image framework is proposed, which can solve the industry pain point of virtual try-on technology, namely the dependence on and limitations of specific model pictures. This method not only ensures the high realism of the simulated try-on, making the try-on effect more lifelike, but also makes it possible to construct a model try-on effect from a simple text description by the user, greatly improving convenience of operation. This method supports language descriptions to personalize the background of the picture and the model's gender, body shape, and even skin color, thereby achieving the technical effect of improving the flexibility of personalized customization.
Through the above steps S202 to S206 of the present application, if the wearing performance of a certain product on a carrier needs to be determined and displayed, the product image of that product can be identified, and the text description information for displaying the product on the carrier can be obtained. The text description information can then be used to guide an image processing model, pre-trained from a text-to-image model, to analyze the product image and obtain a target image capable of simulating the wearing performance of the product on the carrier. Since the embodiment of the present application takes into account the problems in the related art, it proposes a virtual fitting technology based on a text-to-image framework: the desired virtual fitting effect of the product is expressed through a simple text description, and the image processing model trained from a text-to-image model analyzes it to generate a highly realistic model fitting image. This eliminates the dependence on specific model pictures, provides users with product display effects offering greater freedom, flexibility, and customization, reduces the limitations of simulating a product's wearing performance through images, achieves the technical effect of improving the quality of the simulated wearing performance results in images, and solves the technical problem of poor simulated wearing performance results in images.
The above method of this embodiment is further described below.
As an optional implementation, step S206, using the text description information to guide the image processing model to analyze the product image and obtain the target image, includes: using the image processing model to extract text features from the text description information; and using the text features to guide the image processing model to analyze the product image and obtain the target image.
In this embodiment, in the process of using the text description information to guide the image processing model to analyze the product image, the image processing model can be used to extract text features from the text description information, and the text features can be used to guide the image processing model to analyze the product image, so as to obtain a target image matching the wearing effect corresponding to the text description information. Here, a text feature can be a text field in the text description information that reflects the user's requirements for the product's wearing effect; the text field can be a description of the model's features, a description of the model's pose, or a description of the background of the target image, among others. A model feature description can be, for example, "short" or "long hair".
It should be noted that the above text features and the corresponding feature descriptions are only examples and are not specifically limited here; any text feature in the text description information that reflects the user's requirements for the wearing effect, and for the target image displaying the wearing effect, falls within the protection scope of the embodiments of the present application.
Optionally, the process of extracting text features from the text description information and using those text features to guide the TryOn model to determine the target image involves the complex tasks of natural language processing and image processing.
Optionally, natural language processing is used to process the text description information in order to extract its semantic and contextual features. For example, the text description information can be encoded into high-dimensional vector representations, and these vectors are the text features. It should be noted that the above process and method of extracting text features from the text description information are only examples and are not specifically limited here.
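For illustration, encoding a description into a fixed-length vector can be shown with a bag-of-words toy (real text encoders produce dense learned embeddings rather than word counts; the vocabulary here is made up):

```python
def toy_text_encoder(text, vocabulary):
    """Map a description to a fixed-length vector: one count per vocabulary term."""
    words = [w.strip(",.") for w in text.lower().split()]
    return [words.count(term) for term in vocabulary]

vocab = ["blue", "dress", "short", "long", "hair"]
vec = toy_text_encoder("Blue dress, long hair", vocab)  # -> [1, 1, 0, 1, 1]
```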
Optionally, the text features can include descriptions related to the garment, for example, its color, style, and material, and can also include descriptions of the model, for example, gender, body shape, and hairstyle. These text features can help guide the TryOn model to generate a virtual try-on effect that matches the text description. For example, if the text description information mentions the text feature "blue dress", the image processing model can use this text feature to ensure that the generated virtual try-on image shows the wearing effect of a blue dress. This is only an example, and no specific limitation is placed on the text description information or the displayed wearing effect.
Optionally, the above text features can be used to guide the TryOn model to generate a virtual try-on effect that matches them; that is, the TryOn model can generate a target image of the corresponding wearing effect according to these text features, ensuring that the generated target image matches the user's text description.
As an optional implementation, using the image processing model to extract text features from the text description information includes: using the text feature extraction model in the image processing model to extract text features from the text description information. Using the text features to guide the image processing model to analyze the product image and obtain the target image includes: using the text features to guide the attention processing model in the image processing model to perform attention processing on the product image and obtain the target image.
In this embodiment, in the process of using the image processing model to extract text features from the text description information, the text feature extraction model in the image processing model can be used to extract the text features. In the process of using the text features to guide the image processing model to analyze the product image, the text features can be used to guide the attention processing model in the image processing model to perform attention processing on the product image, thereby obtaining the target image. Here, the text feature extraction model can be a text encoder, and the attention processing model can include a cross-attention (Cross Attention) module.
Optionally, the dual U-Net TryOn model obtains the features of the text description through a text encoder and injects these features into the Cross Attention modules of the dual U-Net TryOn model. Such a setup ensures that the text description entered by the user can directly influence the final virtual try-on effect, so that the user's personalized needs and customization requirements are fully taken into account.
可选地,text encoder通常指的是一种能够将文本信息转换为向量或者其他数学表示的模型或算法。这些向量或数学表示可以被用于文本分类、语义理解、语言生成等任务中。可选地,该实施例的文本编码器包括诸如BERT、GPT等基于深度学习的模型,它们能够将文本转换为高维度的语义向量表示,以便计算机能够更好地理解和处理文本信息。Optionally, a text encoder generally refers to a model or algorithm that can convert text information into vectors or other mathematical representations. These vectors or mathematical representations can be used in tasks such as text classification, semantic understanding, and language generation. Optionally, the text encoder of this embodiment includes deep learning-based models such as BERT and GPT, which can convert text into high-dimensional semantic vector representations so that computers can better understand and process text information.
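As a hedged illustration of the paragraph above, the following sketch mimics a text encoder by mapping tokens to vectors and pooling them into one representation. A real encoder such as BERT or GPT learns these representations; the vocabulary size, dimension, and hashing choice here are all hypothetical.

```python
import zlib

import numpy as np

def encode_text(text, dim=16, vocab=4096, seed=0):
    """Toy text encoder: a fixed random embedding table indexed by a
    deterministic token hash, mean-pooled into a single feature vector."""
    rng = np.random.default_rng(seed)
    table = rng.normal(size=(vocab, dim))                       # embedding table
    ids = [zlib.crc32(tok.encode()) % vocab for tok in text.lower().split()]
    return table[ids].mean(axis=0)                              # pooled text feature

# Hypothetical user description of the garment to try on.
vec = encode_text("red dress with floral print")
```

The resulting vector plays the role of the "high-dimensional semantic vector representation" described above, ready to be injected into an attention module.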
可选地,Cross Attention是指模型在处理多个输入序列时,通过计算序列之间的关联性,以便更好地捕捉序列之间的关联信息。交叉注意力机制通常用于序列到序列的任务,例如机器翻译、文本摘要等,以帮助模型更好地理解输入序列之间的关系,并生成更准确的输出。Optionally, Cross Attention refers to the model calculating the correlation between sequences when processing multiple input sequences in order to better capture the correlation information between sequences. The cross attention mechanism is often used in sequence-to-sequence tasks such as machine translation, text summarization, etc. to help the model better understand the relationship between input sequences and generate more accurate outputs.
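The cross-attention computation described above can be sketched in plain NumPy. Learned projection matrices and multi-head splitting are omitted for brevity, and all shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    """Scaled dot-product cross-attention: one sequence (queries) attends
    over a second sequence (used as both keys and values)."""
    scores = queries @ context.T / np.sqrt(queries.shape[-1])  # (Nq, Nc)
    weights = softmax(scores, axis=-1)   # each query's distribution over context
    return weights @ context             # (Nq, d)

# Example: 4 image tokens attend over 3 encoded text tokens, feature dim 8.
rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(4, 8))
text_tokens = rng.normal(size=(3, 8))
fused = cross_attention(image_tokens, text_tokens)
```

Each output row is a convex combination of the text tokens, which is how the text description can steer the image branch.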
作为一种可选的实施方式,该方法还包括:对产品图像进行检测,得到产品图像中待展示产品的属性信息;利用文本特征,引导图像处理模型中的注意力处理模型,对产品图像进行注意力处理,得到目标图像,包括:利用文本特征,引导注意力处理模型,对待展示产品的属性信息进行注意力处理,得到目标图像。As an optional implementation, the method also includes: detecting the product image to obtain attribute information of the product to be displayed in the product image; using text features to guide the attention processing model in the image processing model to perform attention processing on the product image to obtain a target image, including: using text features to guide the attention processing model to perform attention processing on the attribute information of the product to be displayed to obtain a target image.
在该实施例中,可以对产品图像进行检测,得到产品图像中待展示产品的属性信息。在利用文本特征,引导图像处理模型中的注意力处理模型,对产品图像进行注意力处理的过程中,可以利用文本特征,引导注意力处理模型,来对待展示产品的属性信息进行注意力处理,得到目标图像,其中,属性信息可以为产品的外观信息和材质信息等,比如,若待展示产品为服饰,则服饰的属性信息可以为服饰的颜色、细节纹理以及商标(logo)。属性信息也可以称为知识信息或知识注入信息。需要说明的是,上述的属性信息仅为举例说明,此处不做具体限制,可以根据实际的待展示产品的外观等进行确定。In this embodiment, the product image can be detected to obtain the attribute information of the product to be displayed in the product image. In the process of using text features to guide the attention processing model in the image processing model to perform attention processing on the product image, the text features can be used to guide the attention processing model to perform attention processing on the attribute information of the product to be displayed to obtain the target image, wherein the attribute information can be the appearance information and material information of the product, etc. For example, if the product to be displayed is clothing, the attribute information of the clothing can be the color, detailed texture and trademark (logo) of the clothing. Attribute information can also be called knowledge information or knowledge injection information. It should be noted that the above-mentioned attribute information is only for example purposes and is not specifically limited here. It can be determined based on the actual appearance of the product to be displayed.
可选地,本申请实施例中还可以设置服饰检测模块。可以利用服饰检测模块来对产品图像中待展示产品进行识别和分析,确定相应的属性信息。需要说明的是,上述服饰检测模块名称中的"服饰"仅为一种命名方式,并非将该模块限制为只能确定服饰的属性信息,该模块还可以对其他待展示产品进行分析。Optionally, a clothing detection module may be provided in the embodiment of the present application. The clothing detection module may be used to identify and analyze the product to be displayed in the product image and determine the corresponding attribute information. It should be noted that the word "clothing" in the name of the clothing detection module is merely a naming convention; the module is not limited to determining attribute information of clothing, and may also analyze other products to be displayed.
可选地,用户上传的服饰图经过一个服饰检测模块,该模块的作用是识别并裁剪出服饰主体区域,以便后续的处理。这一步是为了确保后续的模型能够专注于服饰本身,而不受到背景或其他无关因素的干扰。裁剪出的服饰主体区域被送入一个VAE模型。VAE是一种生成模型,可以学习服饰的潜在表示,提供服饰的颜色、细节纹理等信息。通过VAE模型,可以对服饰进行特征提取和编码,以便后续的处理和注入知识。Optionally, the clothing images uploaded by users are passed through a clothing detection module, which is responsible for identifying and cropping out the main clothing area for subsequent processing. This step ensures that subsequent models can focus on the clothing itself without being disturbed by the background or other irrelevant factors. The cropped main clothing area is fed into a VAE model. A VAE is a generative model that can learn the latent representation of clothing and provide information such as its color and detailed texture. Through the VAE model, clothing features can be extracted and encoded for subsequent processing and knowledge injection.
可选地,VAE模型的输出再送入后续的U-Net模型。U-Net是一种用于图像分割的卷积神经网络,常用于图像处理任务中。在这里,U-Net可能用于进一步处理服饰图像,提取更多的细节信息,并注入服饰的logo等知识信息,以便最终的虚拟试穿效果。Optionally, the output of the VAE model is fed into a subsequent U-Net model. U-Net is a convolutional neural network used for image segmentation and is commonly used in image processing tasks. Here, U-Net may be used to further process the clothing image, extract more detailed information, and inject knowledge information such as the clothing logo for the final virtual try-on effect.
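A minimal sketch of the wiring described above (detection, then VAE encoding, ahead of the U-Net stage). Both components are stubs standing in for real models; the central-crop heuristic, latent dimension, and function names are assumptions for illustration only.

```python
import numpy as np

def detect_and_crop(image):
    """Stand-in for the clothing-detection module: crop the central region
    so downstream models focus on the garment, not the background."""
    h, w = image.shape
    return image[h // 4: 3 * h // 4, w // 4: 3 * w // 4]

def vae_encode(patch, dim=32, seed=0):
    """Stand-in for the VAE encoder: flatten and project to a latent vector
    (a real VAE would predict a mean/variance and sample from it)."""
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(patch.size, dim)) / np.sqrt(patch.size)
    return patch.reshape(-1) @ proj

garment_image = np.ones((64, 64), dtype=np.float32)
latent = vae_encode(detect_and_crop(garment_image))  # then fed to the U-Net
```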
上述整个流程涉及了图像检测、特征提取和知识注入等多个步骤,结合了多种深度学习模型,以实现服饰图像的处理和虚拟试穿效果的生成。这一流程在虚拟试穿技术中扮演重要角色,为用户提供了更加逼真和个性化的试穿体验。The entire process mentioned above involves multiple steps such as image detection, feature extraction and knowledge injection, and combines multiple deep learning models to achieve clothing image processing and virtual try-on effect generation. This process plays an important role in virtual try-on technology, providing users with a more realistic and personalized try-on experience.
作为一种可选的实施方式,注意力处理模型包括:第一自注意力处理模型和第一交叉注意力处理模型,其中,利用文本特征,引导注意力处理模型,对待展示产品的属性信息进行注意力处理,得到目标图像,包括:利用第一自注意力处理模型,对待展示产品的属性信息进行自注意力处理,得到第一自注意力处理结果;利用第一交叉注意力处理模型,对第一自注意力处理结果和文本特征进行交叉注意力处理,得到第一交叉注意力处理结果;基于第一交叉注意力处理结果,生成目标图像。As an optional implementation, the attention processing model includes: a first self-attention processing model and a first cross-attention processing model, wherein the attention processing model is guided by using text features to perform attention processing on the attribute information of the product to be displayed to obtain a target image, including: using the first self-attention processing model to perform self-attention processing on the attribute information of the product to be displayed to obtain a first self-attention processing result; using the first cross-attention processing model to perform cross-attention processing on the first self-attention processing result and text features to obtain a first cross-attention processing result; generating a target image based on the first cross-attention processing result.
在该实施例中,在利用文本特征,引导注意力处理模型对待展示产品的属性信息进行注意力处理,得到目标图像的过程中,可以利用注意力处理模型中的第一自注意力处理模型,来对待展示产品的属性信息进行自注意力处理,得到第一自注意力处理结果。还可以利用注意力处理模型中的第一交叉注意力处理模型,来对第一自注意力处理结果和文本特征进行交叉注意力处理,得到第一交叉注意力处理结果。可以基于第一交叉注意力处理结果,生成目标图像,其中,注意力处理模型可以包括第一自注意力处理模型和第一交叉注意力处理模型。第一自注意力处理模型可以为自注意力(Self Attention)模块。第一交叉注意力处理模型可以为Cross Attention模块。In this embodiment, in the process of using text features to guide the attention processing model to perform attention processing on the attribute information of the product to be displayed to obtain the target image, the first self-attention processing model in the attention processing model can be used to perform self-attention processing on the attribute information of the product to be displayed to obtain a first self-attention processing result. The first cross-attention processing model in the attention processing model can also be used to perform cross-attention processing on the first self-attention processing result and the text features to obtain a first cross-attention processing result. A target image can then be generated based on the first cross-attention processing result, wherein the attention processing model can include a first self-attention processing model and a first cross-attention processing model. The first self-attention processing model can be a self-attention (Self Attention) module. The first cross-attention processing model can be a Cross Attention module.
可选地,TryOn模型基于上述的产品图像、文本描述信息,也即,服饰图和文字描述,通过双Unet架构实现知识注入,最终输出服饰上身图。其中,双Unet架构通常指的是一种神经网络结构,它由两个相互关联的Unet网络组成,分别用于编码和解码输入的信息。通过这种结构,TryOn模型能够将来自服饰图、骨架图和文字描述的信息进行整合,以生成最终的服饰上身图。这种结构允许模型同时处理来自不同输入的信息,并在生成服饰上身图时注入相关的知识,比如姿态和服饰细节。这种方法有助于实现对用户需求的个性化定制,提供更加准确和符合期望的虚拟试穿效果。整体而言,这种双Unet架构的TryOn模型结合了多种输入信息,通过知识注入实现了对服饰上身图的生成,从而提供更加个性化和精确的虚拟试穿效果。Optionally, the TryOn model implements knowledge injection through a dual Unet architecture based on the above-mentioned product image and text description information, that is, the clothing image and text description, and finally outputs the clothing upper body image. Among them, the dual Unet architecture generally refers to a neural network structure, which consists of two interrelated Unet networks, which are used to encode and decode the input information respectively. Through this structure, the TryOn model can integrate information from the clothing image, skeleton image and text description to generate the final clothing upper body image. This structure allows the model to process information from different inputs at the same time and inject relevant knowledge, such as posture and clothing details, when generating the clothing upper body image. This method helps to achieve personalized customization for user needs and provide a more accurate and expected virtual try-on effect. Overall, this dual Unet architecture TryOn model combines multiple input information and realizes the generation of clothing upper body images through knowledge injection, thereby providing a more personalized and accurate virtual try-on effect.
在本申请实施例中,TryOn模型的架构为双U-Net,用于实现知识注入,比如,注入姿态、服饰细节等,最终输出服饰上身图。U-Net是一种常用于图像分割的卷积神经网络。这种双U-Net架构能够有效地整合服饰图和模特图的信息,并根据用户输入的文字描述,实现了知识注入,以生成最终的虚拟试穿效果。这种架构在虚拟试穿技术中发挥着重要的作用,为用户提供更加贴近实际和个性化的试穿体验。In the embodiment of the present application, the architecture of the TryOn model is a dual U-Net, which is used to realize knowledge injection, such as injecting posture, clothing details, etc., and finally outputting a clothing upper body picture. U-Net is a convolutional neural network commonly used for image segmentation. This dual U-Net architecture can effectively integrate the information of clothing pictures and model pictures, and realizes knowledge injection based on the text description entered by the user to generate the final virtual try-on effect. This architecture plays an important role in virtual try-on technology, providing users with a more realistic and personalized try-on experience.
可选地,TryOn模型通过一个text encoder来得到文本描述的特征,并将这些特征分别注入到双U-Net的Cross Attention模块中。这样的设计可以确保用户输入的文字描述能够直接影响到最终的虚拟试穿效果,使得用户的个性化需求和定制化要求能够被充分考虑。Optionally, the TryOn model obtains the features of the text description through a text encoder and injects these features into the Cross Attention module of the dual U-Net. This design ensures that the text description entered by the user can directly affect the final virtual try-on effect, so that the user's personalized needs and customization requirements can be fully considered.
作为一种可选的实施方式,该方法还包括:对载体进行姿态检测,得到载体的姿态信息,其中,姿态信息用于表示载体在模拟的穿戴表现结果中展示出的姿态;利用第一自注意力处理模型,对待展示产品的属性信息进行自注意力处理,得到第一自注意力处理结果,包括:利用第一自注意力处理模型,对待展示产品的属性信息和姿态信息进行自注意力处理,得到第一自注意力处理结果。As an optional implementation, the method also includes: performing posture detection on the carrier to obtain posture information of the carrier, wherein the posture information is used to represent the posture displayed by the carrier in the simulated wearing performance result; using a first self-attention processing model to perform self-attention processing on the attribute information of the product to be displayed to obtain a first self-attention processing result, including: using the first self-attention processing model to perform self-attention processing on the attribute information and posture information of the product to be displayed to obtain a first self-attention processing result.
在该实施例中,可以对载体进行姿态检测,得到载体的姿态信息。在利用第一自注意力处理模型,对待展示产品的属性信息进行自注意力处理,得到第一自注意力处理结果的过程中,可以利用第一自注意力处理模型,对待展示产品的属性信息和姿态信息进行自注意力处理,得到第一自注意力处理结果,其中,姿态检测可以为骨架检测。姿态信息可以为骨架检测得到的骨架(pose)图。In this embodiment, the carrier may be subjected to posture detection to obtain posture information of the carrier. In the process of using the first self-attention processing model to perform self-attention processing on the attribute information of the displayed product to obtain the first self-attention processing result, the first self-attention processing model may be used to perform self-attention processing on the attribute information and posture information of the displayed product to obtain the first self-attention processing result, wherein the posture detection may be skeleton detection. The posture information may be a skeleton (pose) diagram obtained by skeleton detection.
可选地,用户上传的任意姿态的模特图可以经过骨架检测。Optionally, images of models in any pose uploaded by users can be subjected to skeleton detection.
可选地,用户上传了任意姿态的模特图,这可以是实际的照片或者是模拟的图像。通过开源的实时多人姿态估计库(Openpose)等人体姿势估计的算法,对上传的模特图进行骨架检测,得到相应的骨架图(pose)。这个过程会识别模特身体的关键关节和姿势,生成一个包含姿势信息的骨架图。其中,Openpose可以检测人体的骨骼关键点,并估计出人体的姿势。这个库使用深度学习技术,基于卷积神经网络和计算机视觉算法,能够从图像或视频中精确地检测出人体的关键部位,例如头部、肩膀、手肘、手腕、臀部、膝盖、脚踝等关键点。Openpose还能够进行多人姿态估计,即使在有多人重叠的情况下,也能够准确地分割出每个人的姿势。这使得Openpose在诸如人体动作分析、人体姿势识别、虚拟试衣等领域具有重要的应用价值。Optionally, the user uploads a model image in any pose, which can be an actual photo or a simulated image. Through the open source real-time multi-person pose estimation library (Openpose) and other human pose estimation algorithms, the uploaded model image is subjected to skeleton detection to obtain the corresponding skeleton image (pose). This process identifies the key joints and postures of the model's body and generates a skeleton image containing posture information. Among them, Openpose can detect the key points of the human skeleton and estimate the human posture. This library uses deep learning technology, based on convolutional neural networks and computer vision algorithms, and can accurately detect key parts of the human body from images or videos, such as key points such as the head, shoulders, elbows, wrists, hips, knees, ankles, etc. Openpose can also perform multi-person pose estimation, and can accurately segment each person's posture even when there are multiple people overlapping. This makes Openpose have important application value in fields such as human motion analysis, human posture recognition, and virtual fitting.
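For illustration, the skeleton-detection output described above can be represented as named keypoints rasterized into a pose map. The keypoint names follow the body parts listed in the text (head, shoulders, elbows, wrists, hips, knees, ankles); the exact output format of a real detector such as OpenPose differs and includes per-joint confidences.

```python
import numpy as np

# Hypothetical keypoint set mirroring the joints named above.
KEYPOINTS = ["head", "shoulder_l", "shoulder_r", "elbow_l", "elbow_r",
             "wrist_l", "wrist_r", "hip_l", "hip_r",
             "knee_l", "knee_r", "ankle_l", "ankle_r"]

def to_pose_map(keypoints, size=64):
    """Rasterize normalized (x, y) keypoints into a one-channel pose map,
    a simple stand-in for the skeleton image fed to the TryOn model."""
    pose = np.zeros((size, size), dtype=np.float32)
    for x, y in keypoints.values():
        row = min(int(y * size), size - 1)
        col = min(int(x * size), size - 1)
        pose[row, col] = 1.0
    return pose

# A synthetic pose: all joints on the vertical mid-line, spread top to bottom.
joints = {name: (0.5, i / len(KEYPOINTS)) for i, name in enumerate(KEYPOINTS)}
pose_map = to_pose_map(joints)
```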
可选地,骨架图可以被作为Tryon模型的第二个标准输入,用于控制虚拟试穿模型中模特的姿势。通过结合这些骨架信息,虚拟试穿模型能够确保服装在模特身上呈现出合适的姿势和姿态。这个步骤的作用是确保虚拟试穿模型能够根据模特图中的姿势进行适当的调整和展示,从而提供更加真实和贴近实际的试穿效果。整个流程结合了图像处理和姿势控制技术,为用户提供了更加个性化和逼真的虚拟试穿体验。Optionally, the skeleton image can be used as the second standard input of the Tryon model to control the pose of the model in the virtual try-on model. By combining this skeleton information, the virtual try-on model can ensure that the clothing presents the right pose and posture on the model. The purpose of this step is to ensure that the virtual try-on model can be properly adjusted and displayed according to the pose in the model image, thereby providing a more realistic and practical try-on effect. The whole process combines image processing and posture control technology to provide users with a more personalized and realistic virtual try-on experience.
作为一种可选的实施方式,对载体进行姿态检测,得到载体的姿态信息,包括:调用姿态编码模型,对载体进行姿态编码,得到载体的骨架特征,其中,姿态信息包括骨架特征。As an optional implementation, performing posture detection on the carrier to obtain posture information of the carrier includes: calling a posture coding model, performing posture coding on the carrier, and obtaining skeleton features of the carrier, wherein the posture information includes skeleton features.
在该实施例中,在对载体进行姿态检测,得到载体的姿态信息的过程中,可以调用姿态编码模型,对载体进行姿态编码,得到载体的骨架特征。In this embodiment, in the process of performing posture detection on the carrier to obtain the posture information of the carrier, the posture coding model can be called to perform posture coding on the carrier to obtain the skeleton features of the carrier.
可选地,姿态信息可以包括骨架特征。姿态编码模型中可以包括姿态编码器(pose encoder),以及上述的Openpose。Pose encoder可以将人体姿势的信息编码成一个向量或者特征表示,以便计算机能够更好地理解和处理姿势信息。pose encoder常用于姿势估计、动作识别、人体姿态分析等任务中。Optionally, the pose information may include skeleton features. The pose encoding model may include a pose encoder, as well as the above-mentioned Openpose. A pose encoder can encode the information of a human pose into a vector or feature representation so that the computer can better understand and process the pose information. Pose encoders are commonly used in tasks such as pose estimation, action recognition, and human pose analysis.
可选地,通过pose encoder可以对所接收的模特图进行分析,得到骨架的编码特征。可以将这些编码特征与主要U-Net的原本输入特征一起作为共同的输入。这样的设置可以确保模特的姿态信息能够直接影响到最终的虚拟试穿效果,使得模特的姿态和动作能够被准确地反映在生成的服饰上身图中。Optionally, the received model image can be analyzed by a pose encoder to obtain skeleton encoding features. These encoding features can be used as common inputs together with the original input features of the main U-Net. Such a setting can ensure that the model's posture information can directly affect the final virtual fitting effect, so that the model's posture and movement can be accurately reflected in the generated clothing upper body image.
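The joint input described above, pose-encoder features alongside the main U-Net's original input features, can be sketched as a channel-wise concatenation. The channel counts and spatial size are hypothetical.

```python
import numpy as np

# Illustrative feature maps in (channels, height, width) layout.
rng = np.random.default_rng(2)
image_latent = rng.normal(size=(4, 64, 64))  # main U-Net's original input features
pose_feats = rng.normal(size=(2, 64, 64))    # encoded skeleton features

# Combine both as a common input along the channel axis, so the pose
# information accompanies the image features through the network.
unet_input = np.concatenate([image_latent, pose_feats], axis=0)
```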
作为一种可选的实施方式,利用姿态编码模型,对载体进行姿态编码,得到载体的骨架特征,包括:获取对象图像,其中,对象图像的图像内容包括允许穿戴待展示产品的任意姿态的对象;利用姿态编码模型识别对象图像中的对象,且将识别出的对象确定为载体,对载体进行姿态编码,得到载体的骨架特征。As an optional implementation, using a posture coding model to perform posture coding on a carrier to obtain skeleton features of the carrier includes: acquiring an object image, wherein the image content of the object image includes an object, in any posture, that is allowed to wear the product to be displayed; using the posture coding model to identify the object in the object image, determining the identified object as the carrier, and performing posture coding on the carrier to obtain the skeleton features of the carrier.
在该实施例中,在利用姿态编码模型,对载体进行姿态编码,得到骨架特征的过程中,可以获取对象图像。利用姿态编码模型可以对对象图像中的对象进行识别,且将识别出的对象确定为载体,对载体进行姿态编码,得到载体的骨架特征,其中,对象图像的图像内容中可以包括允许穿戴待展示产品的任意姿态的对象,对象图像可以为模特图。骨架特征可以用于表示载体在展示正穿戴的产品时的姿势,可以为通过pose encoder所得到的骨架的编码特征。In this embodiment, an object image can be obtained in the process of using a posture coding model to perform posture coding on a carrier to obtain skeleton features. The posture coding model can be used to identify an object in the object image, determine the identified object as the carrier, and perform posture coding on the carrier to obtain the skeleton features of the carrier, wherein the image content of the object image may include an object, in any posture, that is allowed to wear the product to be displayed, and the object image may be a model image. The skeleton features can be used to represent the posture of the carrier when displaying the product being worn, and can be the skeleton encoding features obtained by the pose encoder.
可选地,从用户上传的模特图中进行骨架检测,并将检测得到的骨架图作为Tryon模型的标准输入,以控制生成图中模特的姿势。Optionally, skeleton detection is performed on the model image uploaded by the user, and the detected skeleton image is used as the standard input of the Tryon model to control the pose of the model in the generated image.
可选地,用户可以上传任意姿态的模特图,这可能是模特在不同动作或姿势下的照片或者图像。骨架检测可以通过OpenPose,由于OpenPose是一种用于人体姿势估计的开源库,能够在图像中检测出人体的骨架关键点,包括头部、肩膀、手肘、手腕、臀部、膝盖、脚踝等关键点。因此,通过OpenPose,可以对用户上传的模特图进行骨架检测,得到相应的骨架图。上述过程所得到的骨架图将作为Tryon模型的输入之一,用于控制生成图中模特的姿势。这样的设置可以确保生成的虚拟试穿图像中的模特姿势与用户上传的模特图保持一致,从而提供更加真实和贴近实际的试穿效果。Optionally, users can upload pictures of models in any pose, which may be photos or images of models in different actions or poses. Skeleton detection can be performed through OpenPose. Since OpenPose is an open source library for human pose estimation, it can detect the key points of the human skeleton in the image, including the head, shoulders, elbows, wrists, hips, knees, ankles and other key points. Therefore, through OpenPose, skeleton detection can be performed on the model picture uploaded by the user to obtain the corresponding skeleton map. The skeleton map obtained by the above process will be used as one of the inputs of the Tryon model to control the pose of the model in the generated image. Such a setting can ensure that the pose of the model in the generated virtual try-on image is consistent with the model picture uploaded by the user, thereby providing a more realistic and practical try-on effect.
可选地,通过上述方法,Tryon模型可以根据用户上传的模特图中的姿势信息,生成与之对应的虚拟试穿效果。这将为用户提供更加个性化和逼真的试穿体验,同时也有助于提高产品展示的效果和吸引力。Optionally, through the above method, the Tryon model can generate a virtual try-on effect corresponding to the pose information in the model picture uploaded by the user. This provides users with a more personalized and realistic try-on experience, and also helps to improve the effect and attractiveness of product display.
作为一种可选的实施方式,该方法还包括:利用图像处理模型中的第一残差网络模型,对骨架特征进行残差学习;利用第一自注意力处理模型,对待展示产品的属性信息和姿态信息进行自注意力处理,得到第一自注意力处理结果,包括:利用第一自注意力处理模型,对待展示产品的属性信息和学习后的骨架特征进行自注意力处理,得到第一自注意力处理结果。As an optional implementation, the method also includes: using a first residual network model in the image processing model to perform residual learning on skeleton features; using a first self-attention processing model to perform self-attention processing on attribute information and posture information of the product to be displayed, and obtaining a first self-attention processing result, including: using the first self-attention processing model to perform self-attention processing on the attribute information of the product to be displayed and the learned skeleton features, and obtaining a first self-attention processing result.
在该实施例中,可以利用图像处理模型中的第一残差网络模型,来对骨架特征进行残差学习。在利用第一自注意力处理模型,对待展示产品的属性信息和姿态信息进行自注意力处理的过程中,可以利用第一自注意力处理模型,对待展示产品的属性信息,和学习后的骨架特征进行自注意力处理,得到第一自注意力处理结果,其中,第一残差网络模型可以为残差网络(Residual Network,简称为ResNet)模块。In this embodiment, the first residual network model in the image processing model can be used to perform residual learning on the skeleton features. In the process of using the first self-attention processing model to perform self-attention processing on the attribute information and posture information of the displayed product, the first self-attention processing model can be used to perform self-attention processing on the attribute information of the displayed product and the learned skeleton features to obtain the first self-attention processing result, wherein the first residual network model can be a residual network (ResNet) module.
可选地,在利用ResNet对pose encoder所得到的骨架的编码特征进行残差学习的过程中,可以定义一个残差块,包括一个或多个卷积层和激活函数。这些层可以用于学习骨架特征中的残差信息。将pose encoder得到的骨架编码特征作为输入,经过残差块的处理后,将学习到的残差特征与原始输入特征相加。这样的设置允许网络学习输入特征和期望输出之间的差异,从而提高模型的性能和训练效果。可以将带有残差连接的残差块堆叠成深层网络,以进一步学习和提取骨架的编码特征。这样的设计可以帮助网络更好地捕捉和利用输入的骨架信息,提高模型的表征能力。Optionally, in the process of residual learning of the encoded features of the skeleton obtained by the pose encoder using ResNet, a residual block can be defined, including one or more convolutional layers and activation functions. These layers can be used to learn residual information in the skeleton features. The skeleton encoded features obtained by the pose encoder are taken as input, and after being processed by the residual block, the learned residual features are added to the original input features. Such a setting allows the network to learn the difference between the input features and the expected output, thereby improving the performance and training effect of the model. Residual blocks with residual connections can be stacked into a deep network to further learn and extract the encoded features of the skeleton. Such a design can help the network better capture and utilize the input skeleton information and improve the representation ability of the model.
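The residual block described above reduces to y = x + f(x), where f learns the residual between input and expected output. The sketch below uses a linear layer plus ReLU as a minimal stand-in for the convolution-and-activation stack; all dimensions are illustrative.

```python
import numpy as np

def residual_block(x, weight, bias):
    """One residual block over pose features: y = x + f(x). Here f is a
    linear map followed by ReLU, standing in for conv + activation."""
    f = np.maximum(x @ weight + bias, 0.0)  # learned residual branch
    return x + f                            # skip connection adds it back

rng = np.random.default_rng(3)
x = rng.normal(size=(5, 8))                 # 5 skeleton feature tokens, dim 8
w = rng.normal(size=(8, 8)) * 0.1
b = np.zeros(8)
y = residual_block(x, w, b)
```

Stacking such blocks yields the deep network described above; because the skip path is the identity, gradients can flow through even when the residual branch contributes little.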
可选地,通过上述过程,可以利用ResNet对pose encoder所得到的骨架的编码特征进行残差学习,从而提高模型对骨架信息的学习能力和表征能力。这将有助于改善模型在姿态估计和相关任务中的性能。Optionally, through the above process, ResNet can be used to perform residual learning on the encoded features of the skeleton obtained by the pose encoder, thereby improving the model's learning and representation capabilities for skeleton information. This will help improve the performance of the model in pose estimation and related tasks.
作为一种可选的实施方式,对产品图像进行检测,得到产品图像中待展示产品的属性信息,包括:调用图像特征提取模型,从产品图像中提取出原始图像特征;利用图像处理模型中的第二残差网络模型,对原始图像特征进行残差学习;利用图像处理模型中的第二自注意力处理模型,对学习后的原始图像特征进行自注意力处理,得到第二自注意力处理结果,其中,第二自注意力处理结果用于表示尺寸大于尺寸阈值的图像特征,第二自注意力处理结果用于表示待展示产品的属性信息;利用第一自注意力处理模型,对待展示产品的属性信息和学习后的骨架特征进行自注意力处理,得到第一自注意力处理结果,包括:利用第一自注意力处理模型,接收第二自注意力处理模型输出的第二自注意力处理结果,且对第二自注意力处理结果和学习后的骨架特征进行自注意力处理,得到第一自注意力处理结果。As an optional implementation, a product image is detected to obtain attribute information of a product to be displayed in the product image, including: calling an image feature extraction model to extract original image features from the product image; using a second residual network model in the image processing model to perform residual learning on the original image features; using a second self-attention processing model in the image processing model to perform self-attention processing on the learned original image features to obtain a second self-attention processing result, wherein the second self-attention processing result is used to represent image features whose size is greater than a size threshold, and the second self-attention processing result is used to represent the attribute information of the product to be displayed; using a first self-attention processing model to perform self-attention processing on the attribute information of the product to be displayed and the learned skeleton features to obtain a first self-attention processing result, including: using the first self-attention processing model to receive the second self-attention processing result output by the second self-attention processing model, and performing self-attention processing on the second self-attention processing result and the learned skeleton features to obtain the first self-attention processing result.
在该实施例中,在对产品图像进行检测,得到产品图像中待展示产品的属性信息的过程中,可以调用图像特征提取模型,从产品图像中提取出原始图像特征。可以利用图像处理模型中的第二残差网络模型,对原始图像特征进行残差学习。可以利用图像处理模型中的第二自注意力处理模型,来对学习后的原始图像特征进行自注意力处理,得到第二自注意力处理结果。可以利用第一自注意力处理模型,接收第二自注意力处理模型输出的第二自注意力处理结果,且对第二自注意力处理结果和学习后的骨架特征进行自注意力处理,得到第一自注意力处理结果。In this embodiment, in the process of detecting the product image and obtaining the attribute information of the product to be displayed in the product image, the image feature extraction model can be called to extract the original image features from the product image. The second residual network model in the image processing model can be used to perform residual learning on the original image features. The second self-attention processing model in the image processing model can be used to perform self-attention processing on the learned original image features to obtain a second self-attention processing result. The first self-attention processing model can be used to receive the second self-attention processing result output by the second self-attention processing model, and perform self-attention processing on the second self-attention processing result and the learned skeleton features to obtain the first self-attention processing result.
可选地,第二残差网络模型与第一残差网络模型对应,可以为ResNet。图像特征提取模块可以为图像编码网络模块,比如,可以为VAE。原始图像特征可以为产品图像中原本的特征。第二自注意力处理模型可以与第一自注意力处理模型对应,可以包括Self Attention以及Cross Attention。第二自注意力处理结果可以用于表示尺寸大于尺寸阈值的图像特征。第二自注意力处理结果可以用于表示待展示产品的属性信息。Optionally, the second residual network model corresponds to the first residual network model and may be a ResNet. The image feature extraction module may be an image encoding network module, for example, a VAE. The original image features may be the original features in the product image. The second self-attention processing model may correspond to the first self-attention processing model and may include Self Attention and Cross Attention. The second self-attention processing result may be used to represent image features whose size is greater than a size threshold. The second self-attention processing result may be used to represent attribute information of the product to be displayed.
可选地,尺寸阈值可以为预先设置的尺寸大小,也可以为根据实际情况自行设置的尺寸大小,比如,可以预先设置为1024*1024。若尺寸大于尺寸阈值,则可以说明图像特征为大尺寸。需要说明的是,上述尺寸阈值的大小和设置方式仅为举例说明,此处不做具体限制。Optionally, the size threshold may be a preset size, or a size set according to actual conditions, for example, it may be preset to 1024*1024. If the size is greater than the size threshold, it may be indicated that the image feature is a large size. It should be noted that the size and setting method of the above-mentioned size threshold are only for example description and are not specifically limited here.
可选地,在本申请实施例中,对于服饰的知识注入,在TryOn模型中可以使用参考(Reference)U-Net网络来提取大尺寸(比如,1024*1024)的服饰图像特征。在每一个Self Attention模块中,将提取的服饰图像特征直接注入到主要的U-Net中,与原本的特征直接串联。这样的设置可以确保服饰图像的详细特征能够有效地融入到主要的U-Net中,从而影响最终的虚拟试穿效果,其中,Unet网络提取大尺寸(1024*1024)的图像特征是指Unet网络在处理图像时,能够捕获到较高分辨率的图像特征。通常情况下,Unet网络常用于图像分割任务,它的结构设计使得可以从输入的大尺寸图像中提取详细和丰富的特征信息,其中,Reference U-Net也可以称为ReferenceNet或者Reference unet。Optionally, in an embodiment of the present application, for knowledge injection of clothing, a reference U-Net network may be used in the TryOn model to extract large-size (e.g., 1024*1024) clothing image features. In each Self Attention module, the extracted clothing image features are directly injected into the main U-Net and directly connected in series with the original features. Such a setting can ensure that the detailed features of the clothing image can be effectively integrated into the main U-Net, thereby affecting the final virtual try-on effect, wherein the Unet network extracts large-size (1024*1024) image features, which means that the Unet network can capture higher-resolution image features when processing images. Typically, the Unet network is often used for image segmentation tasks, and its structural design enables detailed and rich feature information to be extracted from large-size input images, wherein the Reference U-Net may also be referred to as ReferenceNet or Reference unet.
可选地,对于虚拟试穿技术而言,Unet网络提取大尺寸图像特征可能意味着模型能够更好地理解服饰图像中的细节、纹理和其他重要信息,使得最终的虚拟试穿效果更加逼真和符合期望。这也说明了该技术对于处理高分辨率图像有着较强的适应性,能够在处理图像时保持较高的细节和信息丰富度。因此,Unet网络提取大尺寸的图像特征表示模型能够更好地理解和利用高分辨率图像中的信息,以提高虚拟试穿效果的真实感和质量。Alternatively, for virtual try-on technology, the Unet network extracting large-scale image features may mean that the model can better understand the details, textures, and other important information in the clothing image, making the final virtual try-on effect more realistic and in line with expectations. This also shows that the technology has a strong adaptability to processing high-resolution images and can maintain high details and information richness when processing images. Therefore, the Unet network extracting large-scale image features indicates that the model can better understand and utilize the information in high-resolution images to improve the realism and quality of the virtual try-on effect.
举例而言,服饰的知识注入依赖于一支Reference Unet网络提取大尺寸的图像特征,在每一个Self Attention模块中向主要U-Net模型(main Unet)直接注入,与原本的特征直接串联。For example, the knowledge injection of clothing relies on a Reference Unet network to extract large-scale image features, which are directly injected into the main U-Net model (main Unet) in each Self Attention module and directly connected in series with the original features.
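The injection just described, where reference-network features are concatenated in series with the main U-Net's own tokens inside each Self Attention module, can be sketched as follows. Projection matrices are omitted and the token counts are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_with_reference(main_tokens, ref_tokens):
    """Main-U-Net tokens attend over the series concatenation of themselves
    and the garment features extracted by the Reference U-Net, so garment
    detail can flow into the main branch."""
    kv = np.concatenate([main_tokens, ref_tokens], axis=0)  # serial concat
    scores = main_tokens @ kv.T / np.sqrt(main_tokens.shape[-1])
    return softmax(scores, axis=-1) @ kv

rng = np.random.default_rng(4)
main = rng.normal(size=(16, 8))  # main U-Net tokens at one attention layer
ref = rng.normal(size=(16, 8))   # garment features from the Reference U-Net
out = self_attention_with_reference(main, ref)
```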
可选地,本申请实施例中的U-Net模型中可以包括Reference Unet以及main Unet。Optionally, the U-Net model in the embodiment of the present application may include a Reference Unet and a main Unet.
As an optional implementation, the method further includes: using the second cross-attention processing model in the image processing model to perform cross-attention processing on the second self-attention processing result and the text feature, to obtain a second cross-attention processing result; determining the second cross-attention processing result as the original image feature, determining the next residual network model of the second residual network model in the image processing model as the second residual network model, determining the next self-attention processing model of the second self-attention processing model in the image processing model as the second self-attention processing model, and determining the next cross-attention processing model of the second cross-attention processing model in the image processing model as the second cross-attention processing model; and executing from the following step until the second residual network model has no next residual network model in the image processing model, the second self-attention processing model has no next self-attention processing model, and the second cross-attention processing model has no next cross-attention processing model: using the second residual network model in the image processing model to perform residual learning on the original image feature.
In this embodiment, the second cross-attention processing model in the image processing model can be used to perform cross-attention processing on the second self-attention processing result and the text feature to obtain a second cross-attention processing result. The second cross-attention processing result can be determined as the original image feature, and the next residual network model of the second residual network model in the image processing model can be determined as the second residual network model. The next self-attention processing model of the second self-attention processing model in the image processing model can be determined as the second self-attention processing model, and the next cross-attention processing model of the second cross-attention processing model in the image processing model can be determined as the second cross-attention processing model. Until the second residual network model has no next residual network model in the image processing model, the second self-attention processing model has no next self-attention processing model, and the second cross-attention processing model has no next cross-attention processing model, the second residual network model in the image processing model can be used to perform residual learning on the original image feature.
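The iteration described in the two paragraphs above amounts to walking a chain of (residual, self-attention, cross-attention) blocks until no "next" model remains, with each block's cross-attention output becoming the next block's input feature. A minimal sketch under that reading (the callables here are toy stand-ins for illustration, not the actual sub-models):

```python
def run_block_chain(blocks, image_feature, text_feature):
    """blocks: list of (residual_fn, self_attn_fn, cross_attn_fn) triples,
    ordered as they appear in the image processing model."""
    feature = image_feature
    for residual_fn, self_attn_fn, cross_attn_fn in blocks:
        feature = residual_fn(feature)                  # residual learning
        feature = self_attn_fn(feature)                 # self-attention
        feature = cross_attn_fn(feature, text_feature)  # fuse with the text feature
    return feature  # output after the last block: no next model remains

# toy stand-ins for the three sub-models of each of three blocks
blocks = [(lambda f: f + 1,       # "residual" step
           lambda f: f * 2,       # "self-attention" step
           lambda f, t: f + t)    # "cross-attention" with text
          for _ in range(3)]
print(run_block_chain(blocks, 0, 10))  # 84
```

The loop terminates exactly when the list of blocks is exhausted, which corresponds to the condition that the second residual, self-attention, and cross-attention models have no next model in the image processing model.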
Optionally, the embodiments of the present application describe a TryOn model with a multi-level neural network structure, including multiple layers of cross-attention processing, self-attention processing, residual networks, and other modules. Performing cross-attention processing on the second self-attention processing result and the text feature means associating and integrating the self-attention result from the image processing with the text feature. The second cross-attention processing result, that is, the result obtained after the cross-attention processing, can be a new feature representation combining the image processing result and the text feature. Determining the second cross-attention processing result as the original image feature emphasizes that this result will serve as the input of the next step. This structure involves a complex neural network architecture and a variety of deep learning techniques for fusing and learning image and text features.
In the embodiments of the present application, the multi-level structure allows image and text features to be fused at different levels, which helps the TryOn model better understand and exploit the relationships between different types of features, thereby improving its representation and generalization capabilities. It also allows abstract representations of features to be learned level by level: through layer-by-layer learning and extraction, the TryOn model can gradually capture more abstract and complex features, improving its ability to represent and understand the data. By attending to and integrating information at different levels, the model can understand the input data more comprehensively and analyze it from multiple angles. As a result, the TryOn model makes better use of the rich information in the data, improving its performance on complex tasks, and gains a better grasp of the inherent structure and characteristics of the input data, which strengthens its generalization to new data.
In summary, the multi-level structure offers benefits in feature fusion, multi-level learning, attention to information at different levels, model performance, and generalization, helping to improve the representation ability and task performance of the TryOn model.
As an optional implementation, the method further includes: determining the first cross-attention processing result as a skeleton feature; determining the next residual network model of the first residual network model in the image processing model as the first residual network model, determining the next self-attention processing model of the first self-attention processing model in the image processing model as the first self-attention processing model, and determining the next cross-attention processing model of the first cross-attention processing model in the image processing model as the first cross-attention processing model; and executing from the following step until the first residual network model has no next residual network model in the image processing model, the first self-attention processing model has no next self-attention processing model, and the first cross-attention processing model has no next cross-attention processing model: using the first residual network model in the image processing model to perform residual learning on the skeleton feature, wherein the second self-attention processing models correspond one-to-one to the first self-attention processing models.
In this embodiment, the first cross-attention processing result can be determined as a skeleton feature. The next residual network model of the first residual network model in the image processing model can be determined as the first residual network model, the next self-attention processing model of the first self-attention processing model can be determined as the first self-attention processing model, and the next cross-attention processing model of the first cross-attention processing model can be determined as the first cross-attention processing model. Until the first residual network model has no next residual network model in the image processing model, the first self-attention processing model has no next self-attention processing model, and the first cross-attention processing model has no next cross-attention processing model, the first residual network model in the image processing model can be used to perform residual learning on the skeleton feature, wherein the second self-attention processing models correspond one-to-one to the first self-attention processing models.
Optionally, the TryOn model in the embodiments of the present application has a complex multi-level neural network structure. Determining the first cross-attention processing result as a skeleton feature emphasizes that this result will serve as the input of the next step. Determining the next residual network model of the first residual network model as the first residual network model means that the iteration advances, so that the next block's residual network model handles the next step. Using the first residual network model in the image processing model to perform residual learning on the skeleton feature means learning and extracting features from the skeleton feature through the first residual network model. This loop structure enables deep learning and extraction of the skeleton features in the image; the multi-level structure helps the TryOn model understand and learn the skeleton information in the image more deeply, improving its understanding of posture and movement.
In the embodiments of the present application, the TryOn model architecture fully takes into account the clothing image features, the model's posture information, and the text description entered by the user; through a detailed knowledge-injection design, it ensures that this important information can effectively influence the final virtual try-on result. Such an architecture is of great significance in virtual try-on technology, providing users with a more realistic and personalized try-on experience.
As an optional implementation, the method further includes: performing image enhancement on the target image, and outputting the enhanced target image.
In this embodiment, the target image may additionally be enhanced, and the enhanced target image may be output.
Optionally, the target image generated by the image processing model is processed with an image enhancement strategy to further improve its quality and realism, thereby obtaining the final target image; the image enhancement strategy may include adjusting sharpness, contrast, saturation, brightness, and the like. It should be noted that the above image enhancement strategy is only an example and is not specifically limited here.
In the embodiments of the present application, the above image enhancement strategy helps improve the visual effect of the virtual try-on result, so that the virtual try-on image finally presented is clearer, more vivid, and meets the aesthetic requirements of the user. Such processing can effectively enhance the image, providing a more realistic virtual try-on effect and better meeting the user's expectations.
Optionally, the clothing try-on image output by the TryOn model is enhanced to obtain the final virtual try-on result. Image enhancement is an image processing technique that improves the visual quality of an image by adjusting its various attributes; these adjustments can include sharpening, contrast enhancement, saturation adjustment, and brightness adjustment.
Optionally, sharpening can enhance the details and edges in the image, making the try-on image clearer and more realistic. Increasing the contrast makes the colors and the light and dark areas more distinct, making the try-on image more vivid and attractive. Adjusting the saturation increases or decreases the intensity and vividness of the colors, making the colors of the try-on image more lifelike. Adjusting the brightness changes the overall brightness of the image, so that the try-on image presents the desired effect under different lighting conditions. Through the above image enhancement strategy, the final virtual try-on result can achieve the desired visual effect and better meet the needs and expectations of users; such enhancement improves the realism and appeal of the virtual try-on and provides a more lifelike and personalized try-on experience.
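The enhancement strategy above can be sketched with the Pillow library's `ImageEnhance` module (the adjustment factors below are arbitrary illustrative values, not values specified by the embodiments):

```python
from PIL import Image, ImageEnhance

def enhance_try_on_image(image: Image.Image) -> Image.Image:
    """Apply the strategy described above: sharpness, contrast,
    saturation (color) and brightness adjustments, in that order."""
    for enhancer_cls, factor in [
        (ImageEnhance.Sharpness, 1.5),    # sharpen details and edges
        (ImageEnhance.Contrast, 1.2),     # make light/dark areas more distinct
        (ImageEnhance.Color, 1.1),        # slightly boost saturation
        (ImageEnhance.Brightness, 1.05),  # slightly brighten
    ]:
        image = enhancer_cls(image).enhance(factor)
    return image

# a solid-color stand-in for the TryOn model's output image
result = enhance_try_on_image(Image.new("RGB", (64, 64), (120, 80, 200)))
print(result.size)  # (64, 64)
```

A factor of 1.0 leaves each attribute unchanged, so the pipeline degrades gracefully to the identity when no enhancement is desired.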
As an optional implementation, the text description information includes at least one of the following: display strategy information of the product to be displayed during the display process, background content of the target image to be generated, and attribute information of the carrier.
In this embodiment, the text description information may include at least one of: display strategy information of the product to be displayed during the display process, background content of the target image to be generated, and attribute information of the carrier. The display strategy information may indicate the way the product is displayed; for example, if the product to be displayed is clothing, it may specify whether the garment is worn open, with rolled-up sleeves, or layered. The background content may be the background of the picture in which the product is displayed, for example, a white wall or a landscape. The attribute information may be model attributes, for example, gender, hairstyle, accessories, height, weight, and body shape.
It should be noted that the above display strategy information, background content, and attribute information are only examples and are not specifically limited here; any information that can reflect the user's requirements for the wearing effect of the displayed product falls within the protection scope of the embodiments of the present application.
Optionally, the display strategy information may include the way the product is displayed, for example, whether the garment is worn open, with rolled-up sleeves, or layered. This information helps determine the specific way the product is presented, so that the generated target image better matches the expected effect and the user's personalized needs. The background content can describe the background of the picture in which the product is displayed, helping determine the background environment so that the generated target image has a richer and more realistic visual effect and the product is displayed more effectively. The attribute information of the carrier may include the model's gender, hairstyle, accessories, height, weight, and body shape; this helps determine a suitable model and carrier, making the generated target image more personalized and targeted and better able to show the product's effect.
In the embodiments of the present application, the above information helps determine the way the product is displayed, the background environment, and a suitable model, so that the generated target image is more personalized and targeted and better shows the product's characteristics and effects. A target image generated from the text description better matches the user's expectations, improving the user's understanding of and satisfaction with the product and encouraging purchase intent. The information also helps optimize the product's display effect, improving its attractiveness and competitiveness. In summary, including display strategy information, background content, and carrier attribute information in the text description enables personalized customization, improves user satisfaction, and optimizes the product display effect, which are important beneficial effects.
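The three optional kinds of text description information could be assembled into a single prompt for the text-to-image-based processing model. A minimal sketch (the field names and the prompt template are illustrative assumptions, not a format specified by the embodiments):

```python
def build_text_description(display_strategy=None, background=None,
                           carrier_attributes=None):
    """Assemble the optional description fields into one prompt string."""
    parts = []
    if carrier_attributes:  # e.g. gender, hairstyle, height, body shape
        attrs = ", ".join(f"{k}: {v}" for k, v in carrier_attributes.items())
        parts.append(f"model ({attrs})")
    if display_strategy:    # e.g. worn open, sleeves rolled up, layered
        parts.append(f"wearing the product {display_strategy}")
    if background:          # e.g. a white wall, a landscape
        parts.append(f"against {background}")
    return ", ".join(parts)

prompt = build_text_description(
    display_strategy="worn open, sleeves rolled up",
    background="a plain white wall",
    carrier_attributes={"gender": "female", "hairstyle": "short"},
)
print(prompt)
# model (gender: female, hairstyle: short), wearing the product worn open,
# sleeves rolled up, against a plain white wall
```

Because every field is optional, any subset of the three kinds of information yields a well-formed prompt, matching the "at least one of" phrasing above.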
From the clothing scenario, the embodiments of the present application further provide an image processing method. FIG. 3 is a flowchart of an image processing method according to an embodiment of the present application. As shown in FIG. 3, the method may include the following steps:
Step S302: identify a virtual clothing store deployed on an e-commerce platform, and identify, from the virtual clothing store, a clothing product image to be processed, where the image content of the clothing product image includes at least one clothing product to be displayed.
In the technical solution provided in step S302 above, the virtual clothing store on the e-commerce platform can be identified, and the clothing product image to be processed can be identified from the virtual clothing store, where the image content of the clothing product image may include the clothing product to be displayed.
Optionally, while a user browses the clothing products of a virtual clothing store on the e-commerce platform through the operation interface of a terminal device, if the user needs to determine and display the wearing performance of a certain clothing product on a carrier, the clothing product whose wearing performance is to be displayed can first be determined in the corresponding virtual clothing store; for example, a click or similar operation can be performed on the operation interface to select the clothing product to be displayed from a large number of clothing products.
Step S304: obtain text description information corresponding to the clothing product to be displayed in the clothing product image, where the text description information is used to describe at least the wearing performance result displayed when the clothing product to be displayed is worn by a carrier.
In the technical solution provided in step S304 above, after the clothing product image to be processed is identified from the virtual clothing store, the text description information corresponding to the clothing product to be displayed in the clothing product image can be obtained.
Optionally, if a user of the e-commerce platform needs to obtain the wearing effect of a certain clothing product, then after determining, on the operation interface, the clothing product material corresponding to the clothing product to be displayed, the user can enter, in the corresponding text input box on the operation interface, the appearance characteristics of the carrier that will wear and display the product, the posture characteristics of the carrier when displaying the product, and the way the carrier wears the product. The entered text describing the appearance characteristics, wearing method, and posture characteristics can serve as the text description information of the product to be displayed.
Step S306: use the text description information to guide an image processing model to analyze the clothing product image to obtain a target image, where the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result.
In the technical solution provided in step S306 above, after the text description information corresponding to the clothing to be displayed in the clothing product image is obtained, the text description information can be used to guide the image processing model to analyze the clothing product material to obtain the target image.
Optionally, after the clothing product image and the corresponding text description information are obtained, the image processing model can be used to analyze the text description information and the clothing product image, so as to construct a target image that satisfies the user's requirements, expressed in the text description information, for the wearing effect of the clothing product to be displayed.
For example, after the server receives the clothing product image and the text description information, it can invoke the image processing model deployed in the server's database to analyze them, so as to simulate the wearing performance result of the clothing product, when worn by the carrier, that conforms to the text description information, and generate the corresponding target image.
Step S308: send the target image to the e-commerce platform.
In the technical solution provided in step S308 above, after the text description information is used to guide the image processing model to analyze the clothing product image and the target image is obtained, the target image can be sent to the e-commerce platform.
Optionally, after the target image is determined by the server, it can be transmitted over the network to the operation interface on the corresponding user's terminal device and displayed on the e-commerce platform in the operation interface.
In the embodiments of the present application, a virtual clothing store deployed on an e-commerce platform is identified, and a clothing product image to be processed is identified from the virtual clothing store, where the image content of the clothing product image includes at least one clothing product to be displayed; text description information corresponding to the clothing product to be displayed in the clothing product image is obtained, where the text description information is used to describe at least the wearing performance result displayed when the clothing product to be displayed is worn by a carrier; the text description information is used to guide an image processing model to analyze the clothing product image to obtain a target image, where the image processing model is obtained by training a text-to-image model and the image content of the target image is used to simulate the wearing performance result; and the target image is sent to the e-commerce platform. This achieves the technical effect of improving the quality of the wearing performance result simulated in the image and solves the technical problem of the poor quality of the simulated wearing performance result.
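Steps S302 to S308 above can be summarized as a short server-side sketch (all class and method names here are hypothetical stand-ins, not an actual platform API):

```python
class FakeStore:
    """Stand-in for a virtual clothing store on the platform."""
    def pick_product_image(self):
        return "clothing.png"

class FakePlatform:
    """Stand-in for the e-commerce platform."""
    def __init__(self):
        self.delivered = None
    def find_store(self, store_id):
        return FakeStore()
    def deliver(self, image):
        self.delivered = image

class FakeModel:
    """Stand-in for the image processing model trained from a text-to-image model."""
    def generate(self, image, prompt):
        return f"try-on({image}, '{prompt}')"

def process_clothing_image(platform, store_id, text_description, model):
    store = platform.find_store(store_id)          # S302: identify the store
    product_image = store.pick_product_image()     # S302: image to be processed
    target_image = model.generate(                 # S306: text-guided analysis
        image=product_image, prompt=text_description)
    platform.deliver(target_image)                 # S308: send back to the platform
    return target_image

platform = FakePlatform()
result = process_clothing_image(platform, "store-1",
                                "model wearing it open", FakeModel())
print(result)  # try-on(clothing.png, 'model wearing it open')
```

The sketch makes the data flow explicit: the product image and text description are the only inputs the model needs, and the platform receives exactly what the model generates.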
本申请实施例还从人机交互侧提供了一种图像的处理方法,图4是根据本申请实施例的一种图像的处理方法的流程图,如图4所示,该方法可以包括以下步骤:The embodiment of the present application also provides an image processing method from the human-computer interaction side. FIG4 is a flow chart of an image processing method according to the embodiment of the present application. As shown in FIG4, the method may include the following steps:
步骤S402,响应作用于操作界面上的图像输入操作,在操作界面上显示待处理的产品图像,其中,产品图像的图像内容包括至少一待展示产品。Step S402, in response to an image input operation on the operation interface, displaying a product image to be processed on the operation interface, wherein the image content of the product image includes at least one product to be displayed.
在本申请上述步骤S402提供的技术方案中,在检测到操作界面上的图像输入操作之后,可以在操作界面上显示待处理的产品图像。In the technical solution provided in the above step S402 of the present application, after the image input operation on the operation interface is detected, the product image to be processed can be displayed on the operation interface.
可选地,若电商平台的用户或商家需要获取某一产品的穿戴效果,则可以在其所对应的终端设备的操作界面上执行相应的输入操作,比如,可以在操作界面上,上传待展示产品对应的产品图像,或者从操作界面上展示的大量产品的图像中筛选出所需展示穿戴效果的产品的图像,来作为产品图像。在从终端设备的操作界面上,确定出产品图像之后,可以将产品图像通过网络传输给服务器。在服务器中可以对所接收的产品图像进行识别,从而确定出需要进行穿戴效果展示的产品。Optionally, if users or merchants of the e-commerce platform need to obtain the wearing effect of a certain product, they can perform corresponding input operations on the operation interface of the corresponding terminal device. For example, they can upload the product image corresponding to the product to be displayed on the operation interface, or filter out the image of the product that needs to display the wearing effect from the images of a large number of products displayed on the operation interface as the product image. After determining the product image on the operation interface of the terminal device, the product image can be transmitted to the server through the network. The received product image can be identified in the server to determine the product that needs to be displayed for the wearing effect.
需要说明的是,上述的图像输入操作仅为举例说明,此处不做具体限制。It should be noted that the above-mentioned image input operation is only for illustration and no specific limitation is made here.
步骤S404,响应作用于操作界面上的文本输入操作,在操作界面上显示与产品图像中待展示产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示产品在由载体穿戴的情况下,所展示出的穿戴表现结果。Step S404, in response to the text input operation on the operation interface, displaying text description information corresponding to the product to be displayed in the product image on the operation interface, wherein the text description information is used to at least describe the wearing performance result displayed by the product to be displayed when it is worn by the carrier.
在本申请上述步骤S404提供的技术方案中,在检测到操作界面上的文本输入操作之后,可以在操作界面上显示出与产品图像中待展示产品对应的文本描述信息。In the technical solution provided in the above step S404 of the present application, after a text input operation on the operation interface is detected, text description information corresponding to the product to be displayed in the product image can be displayed on the operation interface.
可选地,若电商平台的用户需要获取某一产品的穿戴效果,则在操作界面上,上传了产品图像之后,可以在操作界面上执行文本输入操作,比如,可以在操作界面上相应的文本输入框中,输入能够描述对该产品进行穿戴展示的载体的外形特征,也可以输入载体对产品进行穿戴展示时的姿势特征等,还可以输入载体如何对产品进行穿戴的方式。可以将上述所输入的外形特征、穿戴方式和姿势特征的文本,作为该待展示产品的文本描述信息。需要说明的是,上述的文本输入操作仅为举例说明,此处不做具体限制。Optionally, if the user of the e-commerce platform needs to obtain the wearing effect of a certain product, then after uploading the product image on the operation interface, the text input operation can be performed on the operation interface. For example, the appearance characteristics of the carrier that can describe the wearable display of the product can be entered in the corresponding text input box on the operation interface, and the posture characteristics of the carrier when the product is worn and displayed can also be entered. The way in which the carrier wears the product can also be entered. The text of the above-mentioned input appearance characteristics, wearing methods and posture characteristics can be used as the text description information of the product to be displayed. It should be noted that the above-mentioned text input operation is only for example and is not specifically limited here.
步骤S406,响应作用于操作界面上的图像生成操作,在操作界面上显示与产品图像和文本描述信息匹配的目标图像,其中,目标图像为利用文本描述信息,引导图像处理模型对产品图像进行分析得到,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果。Step S406, in response to the image generation operation on the operation interface, a target image matching the product image and the text description information is displayed on the operation interface, wherein the target image is obtained by analyzing the product image by guiding the image processing model using the text description information, the image processing model is obtained by training the text image model, and the image content of the target image is used to simulate the wearing performance result.
在本申请上述步骤S406提供的技术方案中,在检测到操作界面上图像生成操作之后,可以在操作界面上显示出与产品图像和文本描述信息相匹配的目标图像。In the technical solution provided in the above step S406 of the present application, after the image generation operation on the operation interface is detected, a target image matching the product image and text description information can be displayed on the operation interface.
可选地,在操作界面上输入了产品图像和文本描述信息之后,可以在操作界面上执行图像生成操作,比如,可以点击操作界面上的“确定”控件,来触发服务器对所接收到产品图像和文本描述信息进行处理。需要说明的是,上述的图像生成操作仅为举例说明,此次不做具体限制。Optionally, after inputting the product image and text description information on the operation interface, an image generation operation can be performed on the operation interface, for example, the "OK" control on the operation interface can be clicked to trigger the server to process the received product image and text description information. It should be noted that the above-mentioned image generation operation is only for illustration and is not specifically limited at this time.
可选地,在服务器接收到产品图像和文本描述信息之后,可以调用服务器的数据库中所部署的图像处理模型,来对产品图像和文本描述信息进行分析,从而模拟出符合文本描述信息的,待展示产品在载体穿戴情况下的穿戴表现结果,并生成相应的目标图像。Optionally, after the server receives the product image and text description information, the image processing model deployed in the server's database can be called to analyze the product image and text description information, so as to simulate, in accordance with the text description information, the wearing performance result of the product to be displayed when worn by the carrier, and to generate the corresponding target image.
可选地,在服务器确定出目标图像之后,可以通过网络将目标图像传输至终端设备。可以在用户的终端设备的操作界面上对目标图像进行显示。Optionally, after the server determines the target image, the target image can be transmitted to the terminal device via the network, and the target image can be displayed on the operation interface of the user's terminal device.
在本申请实施例中,响应作用于操作界面上的图像输入操作,在操作界面上显示待处理的产品图像,其中,产品图像的图像内容包括至少一待展示产品;响应作用于操作界面上的文本输入操作,在操作界面上显示与产品图像中待展示产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示产品在由载体穿戴的情况下,所展示出的穿戴表现结果;响应作用于操作界面上的图像生成操作,在操作界面上显示与产品图像和文本描述信息匹配的目标图像,其中,目标图像为利用文本描述信息,引导图像处理模型对产品图像进行分析得到,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果,从而实现了提高图像中模拟的穿戴表现结果效果的技术效果,解决了图像中模拟的穿戴表现结果效果差的技术问题。In an embodiment of the present application, in response to an image input operation on an operation interface, a product image to be processed is displayed on the operation interface, wherein the image content of the product image includes at least one product to be displayed; in response to a text input operation on the operation interface, text description information corresponding to the product to be displayed in the product image is displayed on the operation interface, wherein the text description information is used to at least describe the wearing performance result displayed by the product to be displayed when it is worn by a carrier; in response to an image generation operation on the operation interface, a target image matching the product image and the text description information is displayed on the operation interface, wherein the target image is obtained by using the text description information to guide the image processing model to analyze the product image, the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result, thereby achieving the technical effect of improving the effect of the simulated wearing performance result in the image, and solving the technical problem of poor effect of the simulated wearing performance result in the image.
本申请实施例还提供了一种图像的处理方法,图5是根据本申请实施例的一种图像的处理方法的流程图,如图5所示,该方法可以包括以下步骤:The embodiment of the present application further provides an image processing method. FIG5 is a flow chart of an image processing method according to the embodiment of the present application. As shown in FIG5, the method may include the following steps:
步骤S502,在虚拟现实VR设备或增强现实AR设备的呈现画面上,展示待处理的产品图像,其中,产品图像的图像内容包括至少一待展示产品。Step S502: displaying a product image to be processed on a presentation screen of a virtual reality (VR) device or an augmented reality (AR) device, wherein the image content of the product image includes at least one product to be displayed.
在本申请上述步骤S502提供的技术方案中,在虚拟现实(Virtual Reality,简称为VR)设备或者增强现实(Augmented Reality,简称为AR)设备的呈现画面上,对待处理的产品图像进行展示。In the technical solution provided in the above step S502 of the present application, the product image to be processed is displayed on a presentation screen of a virtual reality (VR) device or an augmented reality (AR) device.
可选地,利用VR设备或AR设备展示产品图像可以带来更加直观和生动的展示效果。通过VR设备,用户可以沉浸在一个虚拟的环境中,将产品图像呈现在三维空间中,用户可以360度自由观察产品的外观和细节,从不同角度全方位了解产品的样式和特性。而AR设备则可以将产品图像叠加在现实场景中,让用户通过AR眼镜或手机屏幕,直接在现实中观察产品的虚拟呈现,可以在实际的环境中观察产品的大小、比例和适应性,为用户提供更加真实的体验。通过VR和AR设备展示产品图像,可以让用户更加直观地了解产品的外观和特性,帮助用户更好地进行产品选购和决策。同时,也可以提升产品展示的吸引力和趣味性,为用户带来全新的体验感受。Optionally, using a VR device or an AR device to display product images can bring a more intuitive and vivid display effect. Through a VR device, users can immerse themselves in a virtual environment where product images are presented in three-dimensional space; users can freely observe the appearance and details of the product from 360 degrees and fully understand the style and characteristics of the product from different angles. An AR device can superimpose product images on real scenes, allowing users to observe the virtual presentation of products directly in reality through AR glasses or a mobile phone screen, and to observe the size, proportion and fit of products in the actual environment, providing users with a more realistic experience. Displaying product images through VR and AR devices allows users to understand the appearance and characteristics of products more intuitively, helping users make better product selections and purchasing decisions. At the same time, it can also enhance the attractiveness and appeal of product displays, bringing users a brand-new experience.
步骤S504,获取与产品图像中待展示产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示产品在由载体穿戴的情况下,所展示出的穿戴表现结果。Step S504, obtaining text description information corresponding to the product to be displayed in the product image, wherein the text description information is used to at least describe the wearing performance result displayed when the product to be displayed is worn by the carrier.
在本申请上述步骤S504提供的技术方案中,可以获取与产品图像中待展示产品对应的文本描述信息。In the technical solution provided in the above step S504 of the present application, text description information corresponding to the product to be displayed in the product image can be obtained.
可选地,若电商平台的顾客或商家需要获取某一产品的穿戴效果,则在操作界面上,上传了产品图像之后,可以在操作界面上相应的文本输入框中,输入能够描述对该产品进行穿戴展示的文本描述信息。Optionally, if a customer or merchant of an e-commerce platform needs to obtain the wearing effect of a certain product, after uploading the product image on the operation interface, they can enter text description information that can describe the wearing display of the product in the corresponding text input box on the operation interface.
步骤S506,驱动VR设备或AR设备利用文本描述信息,引导图像处理模型对产品图像进行分析,得到目标图像,其中,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果。Step S506, driving the VR device or AR device to use the text description information to guide the image processing model to analyze the product image to obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result.
在本申请上述步骤S506提供的技术方案中,可以驱动VR设备或AR设备利用文本描述信息,来引导图像处理模型对产品图像进行分析,得到目标图像。In the technical solution provided in the above step S506 of the present application, the VR device or AR device can be driven to use the text description information to guide the image processing model to analyze the product image and obtain the target image.
可选地,连接并启动VR设备或AR设备,例如头戴式显示设备或增强现实眼镜。这些设备可以通过传感器和摄像头捕获用户的视野,并将虚拟内容叠加到现实世界中。用户可以通过语音输入或者手动输入文本描述信息,来描述他们希望分析的产品图像。VR或AR设备可以将用户输入的文本描述信息传送给图像处理模型。图像处理模型可以是一个深度学习模型,用于识别和分析图像。图像处理模型将根据用户输入的文本描述信息,从数据库或者互联网上获取相关的产品图像。然后,模型会对这些图像进行分析,识别出与用户描述信息相匹配的目标图像。一旦图像处理模型完成对产品图像的分析,它会返回一个或多个与用户描述信息相匹配的目标图像。这些目标图像可以通过VR或AR设备显示给用户,以便用户查看和确认是否符合其要求。总之,利用文本描述信息驱动VR或AR设备来引导图像处理模型对产品图像进行分析,可以帮助用户快速获得所需的目标图像,从而提高工作效率和用户体验。Optionally, a VR device or AR device, such as a head-mounted display device or augmented reality glasses, is connected and started. These devices can capture the user's field of view through sensors and cameras and overlay virtual content on the real world. Users can describe the product images they want to analyze by voice input or manually inputting text description information. The VR or AR device can transmit the text description information entered by the user to the image processing model. The image processing model can be a deep learning model for identifying and analyzing images. The image processing model will obtain relevant product images from a database or the Internet based on the text description information entered by the user. The model will then analyze these images and identify target images that match the user's description information. Once the image processing model completes the analysis of the product image, it will return one or more target images that match the user's description information. These target images can be displayed to the user through the VR or AR device so that the user can view and confirm whether they meet their requirements. In short, using text description information to drive the VR or AR device to guide the image processing model to analyze the product image can help users quickly obtain the desired target images, thereby improving work efficiency and user experience.
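The "identify target images that match the user's description" step above can be sketched with a deliberately naive keyword-overlap scoring rule; a real system would use the trained image processing model instead, and the catalog and tags below are invented for illustration:

```python
def match_image(description: str, catalog: dict) -> str:
    """Return the catalog entry whose tag list shares the most words
    with the user's text description (toy scoring rule, not the
    patent's actual model-based analysis)."""
    words = set(description.lower().split())
    # Score = number of shared words; ties broken by name for determinism.
    return max(catalog, key=lambda name: (len(words & set(catalog[name])), name))

if __name__ == "__main__":
    catalog = {
        "red_dress.png": ["red", "dress", "female"],
        "blue_jacket.png": ["blue", "jacket", "male"],
    }
    print(match_image("a red dress on a female model", catalog))
```

This only illustrates the retrieval-then-match control flow; the actual matching described in the application is performed by the trained image processing model, not by keyword counting.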
步骤S508,在VR设备或AR设备的呈现画面上,展示目标图像。Step S508: Display the target image on the presentation screen of the VR device or the AR device.
在本申请上述步骤S508提供的技术方案中,在VR设备或AR设备上可以对目标图像进行展示。In the technical solution provided in the above step S508 of the present application, the target image can be displayed on a VR device or an AR device.
在本申请实施例中,在虚拟现实VR设备或增强现实AR设备的呈现画面上,展示待处理的产品图像,其中,产品图像的图像内容包括至少一待展示产品;获取与产品图像中待展示产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示产品在由载体穿戴的情况下,所展示出的穿戴表现结果;驱动VR设备或AR设备利用文本描述信息,引导图像处理模型对产品图像进行分析,得到目标图像,其中,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果;在VR设备或AR设备的呈现画面上,展示目标图像,从而实现了提高图像中模拟的穿戴表现结果效果的技术效果,解决了图像中模拟的穿戴表现结果效果差的技术问题。In an embodiment of the present application, a product image to be processed is displayed on a presentation screen of a virtual reality VR device or an augmented reality AR device, wherein the image content of the product image includes at least one product to be displayed; text description information corresponding to the product to be displayed in the product image is obtained, wherein the text description information is used to at least describe the wearing performance result displayed by the product to be displayed when it is worn by a carrier; the VR device or AR device is driven to use the text description information to guide the image processing model to analyze the product image to obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result; the target image is displayed on the presentation screen of the VR device or the AR device, thereby achieving the technical effect of improving the effect of the simulated wearing performance result in the image, and solving the technical problem of poor effect of the simulated wearing performance result in the image.
实施例2Example 2
根据本申请实施例,还提供了一种图像的处理系统的实施例,图6是根据本申请实施例的一种图像的处理系统的示意图,如图6所示,图像的处理系统600可以包括:客户端601和服务器602。According to an embodiment of the present application, an embodiment of an image processing system is also provided. FIG. 6 is a schematic diagram of an image processing system according to an embodiment of the present application. As shown in FIG. 6 , the image processing system 600 may include: a client 601 and a server 602.
客户端601,用于上传待处理的产品图像,其中,产品图像的图像内容包括至少一待展示产品;上传与产品图像中待展示产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示产品在由载体穿戴的情况下,所展示出的穿戴表现结果。Client 601 is used to upload a product image to be processed, wherein the image content of the product image includes at least one product to be displayed; and upload text description information corresponding to the product to be displayed in the product image, wherein the text description information is used to at least describe the wearing performance result displayed by the product to be displayed when it is worn by the carrier.
在该实施例中,可以通过客户端601上传待处理的产品图像。还可以在客户端601上,上传与产品图像中待展示产品对应的文本描述信息。In this embodiment, the product image to be processed can be uploaded through the client 601. The text description information corresponding to the product to be displayed in the product image can also be uploaded on the client 601.
可选地,若电商平台的用户或商家需要获取某一产品的穿戴效果,则可以在其所对应的客户端601的操作界面上,上传待展示产品对应的产品图像,或者从操作界面上展示的大量产品的图像中筛选出所需展示穿戴效果的产品的图像,作为产品图像。在从终端设备的操作界面上,确定出产品图像之后,可以将产品图像通过网络传输给服务器602。Optionally, if a user or merchant of an e-commerce platform needs to obtain the wearing effect of a certain product, they can upload a product image corresponding to the product to be displayed on the operation interface of the corresponding client 601, or filter out the image of the product to be displayed from the images of a large number of products displayed on the operation interface as the product image. After the product image is determined on the operation interface of the terminal device, the product image can be transmitted to the server 602 through the network.
可选地,若电商平台的用户需要获取某一产品的穿戴效果,则在操作界面上,上传了产品图像之后,可以在操作界面上相应的文本输入框中,输入能够描述对该产品进行穿戴展示的文本描述信息。可以将文本描述信息通过网络传输给服务器602。Optionally, if a user of the e-commerce platform needs to obtain the wearing effect of a certain product, after uploading the product image on the operation interface, the user can enter text description information that can describe the wearing display of the product in the corresponding text input box on the operation interface. The text description information can be transmitted to the server 602 via the network.
服务器602,用于利用文本描述信息,引导图像处理模型对产品图像进行分析,得到目标图像,其中,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果;将目标图像下发至客户端。Server 602 is used to use the text description information to guide the image processing model to analyze the product image to obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result; and the target image is sent to the client.
在该实施例中,可以通过服务器602利用文本描述信息,引导图像处理模型来对产品图像进行分析,得到目标图像。In this embodiment, the server 602 may use the text description information to guide the image processing model to analyze the product image to obtain the target image.
可选地,在服务器602接收到来自客户端601的文本描述信息和产品图像之后,可以利用图像处理模型,来分析文本描述信息以及产品图像,从而构造出符合文本描述信息中用户对待展示产品的穿搭效果的需求的目标图像。Optionally, after the server 602 receives the text description information and product image from the client 601, an image processing model can be used to analyze the text description information and the product image, thereby constructing a target image that meets the user's requirements for the wearing effect of the displayed product in the text description information.
可选地,在确定出目标图像之后,可以将目标图像通过网络传输给客户端601,可以在客户端601上对目标图像进行显示。Optionally, after the target image is determined, the target image may be transmitted to the client 601 through a network, and the target image may be displayed on the client 601 .
在本申请实施例中,提供了一种图像的处理系统600。通过客户端601上传待处理的产品图像,其中,产品图像的图像内容包括至少一待展示产品;上传与产品图像中待展示产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示产品在由载体穿戴的情况下,所展示出的穿戴表现结果;通过服务器602利用文本描述信息,引导图像处理模型对产品图像进行分析,得到目标图像,其中,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果;将目标图像下发至客户端,从而实现了提高图像中模拟的穿戴表现结果效果的技术效果,解决了图像中模拟的穿戴表现结果效果差的技术问题。In an embodiment of the present application, an image processing system 600 is provided. A product image to be processed is uploaded through a client 601, wherein the image content of the product image includes at least one product to be displayed; text description information corresponding to the product to be displayed in the product image is uploaded, wherein the text description information is used to at least describe the wearing performance result displayed when the product to be displayed is worn by a carrier; the server 602 uses the text description information to guide the image processing model to analyze the product image to obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result; the target image is sent to the client, thereby achieving the technical effect of improving the effect of the simulated wearing performance result in the image, and solving the technical problem of poor effect of the simulated wearing performance result in the image.
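The client 601 / server 602 exchange can be sketched as a JSON request/response pair; the field names, transport format and the stub generator below are assumptions for illustration, not the application's actual protocol:

```python
import json

def make_request(product_image_b64: str, text_description: str) -> str:
    """Client 601 side: package the product image (e.g. base64) and the
    text description information into one request payload."""
    return json.dumps({"image": product_image_b64, "text": text_description})

def handle_request(payload: str, generate) -> str:
    """Server 602 side: parse the request, invoke the image processing
    model (here an injected stub), and return the target image."""
    req = json.loads(payload)
    target = generate(req["image"], req["text"])
    return json.dumps({"target_image": target})

if __name__ == "__main__":
    # Stub standing in for the trained text-to-image-based model.
    stub = lambda img, txt: f"tryon({img}, {txt})"
    reply = handle_request(make_request("IMG0", "worn by a tall model"), stub)
    print(json.loads(reply)["target_image"])
```

Passing the generator in as a parameter keeps the transport sketch independent of any particular model implementation.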
需要说明的是,本申请中所涉及的用户信息(包括但不限于用户设备信息、用户个人信息等)和数据(包括但不限于用于分析的数据、存储的数据、展示的数据等),比如,对进行校验的数据,均为经用户授权或者经过各方充分授权的信息和数据,并且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准,并提供有相应的操作入口,供用户选择授权或者拒绝。It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in this application, for example, the data for verification, are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of relevant data must comply with the relevant laws, regulations and standards of relevant countries and regions, and provide corresponding operation entrances for users to choose to authorize or refuse.
实施例3Example 3
目前,虚拟试衣技术能够减少商家推出新产品的成本,因为它可以减少实际试衣和拍摄的需要,从而提高推出新产品的效率。虚拟试衣技术利用计算机视觉和图像处理技术,将商品图片与模特图形融合,以模拟实际穿戴效果。相关的虚拟试衣技术通常需要特定的模特图片,这可能对一些商家构成挑战。Currently, virtual fitting technology can reduce the cost of launching new products by merchants because it can reduce the need for actual fitting and photo shooting, thereby improving the efficiency of launching new products. Virtual fitting technology uses computer vision and image processing technology to merge product images with model graphics to simulate actual wearing effects. Related virtual fitting technology usually requires specific model images, which may pose a challenge to some merchants.
传统情况下,商家上架新款服饰需要耗费大量资源进行模特试穿拍摄,包括模特费用、拍摄成本以及造型、背景和光线的策划。而虚拟试衣技术的出现为这一问题提供了解决方案。通过结合先进的计算机视觉和图像处理技术,虚拟试衣技术能够将商品图片与模特图形融合,模拟实际穿戴的效果。商家可以利用这项技术快速生成大量模特试穿新品的高质量图片,从中挑选出高质量的样张,大大节约时间和成本,提高工作效率。这表明虚拟试衣技术在服装行业中有着显著的商业和实际价值。Traditionally, merchants need to spend a lot of resources on model try-on photos before putting new clothing on the shelves, including model fees, shooting costs, and planning of styling, background, and lighting. The emergence of virtual fitting technology provides a solution to this problem. By combining advanced computer vision and image processing technology, virtual fitting technology can merge product images with model graphics to simulate the effect of actual wearing. Merchants can use this technology to quickly generate a large number of high-quality photos of models trying on new products, and select high-quality samples from them, greatly saving time and cost and improving work efficiency. This shows that virtual fitting technology has significant commercial and practical value in the clothing industry.
然而,相关的虚拟试衣技术通常需要一张特定的模特图片,这可能对一些商家构成挑战。这可能限制了商家在试穿服装时的灵活性,尤其是对于那些希望提供个性化试穿体验的商家而言。任意使用现有模特照片可能会触及版权问题,这可能限制了商家使用特定模特图片的自由度。受限于模特原有的体型和穿着的服装,这可能会影响试衣效果,使得虚拟试衣技术的应用受到一定的限制。总的来说,虚拟试衣技术在克服这些挑战方面仍有进一步发展的空间,例如开发更加灵活和自由度更大的虚拟试衣技术,以更好地满足商家和消费者的需求。因此,仍存在图像中模拟的穿戴表现结果效果差的技术问题。However, related virtual fitting technologies usually require a specific model picture, which may pose a challenge to some merchants. This may limit the flexibility of merchants when trying on clothes, especially for those who want to provide a personalized fitting experience. The arbitrary use of existing model photos may touch on copyright issues, which may limit the freedom of merchants to use specific model pictures. Restricted by the model's original body shape and the clothes they wear, this may affect the fitting effect, which makes the application of virtual fitting technology subject to certain restrictions. In general, there is still room for further development of virtual fitting technology in overcoming these challenges, such as developing more flexible and freer virtual fitting technology to better meet the needs of merchants and consumers. Therefore, there is still a technical problem that the simulated wearing performance results in the image are poor.
进一步地,本申请提供了一种基于文生图框架的虚拟试穿流程化方法,该方法旨在从根本上解决虚拟试衣技术目前所面临的挑战。这套方法不仅保证了模拟试穿的高度真实感,还允许商家仅通过简单的文字描述就能构建模特试穿效果,大大提高了操作的便捷性。还支持使用语义描述来个性化定制图片的背景、模特的性别和身形等特征。这意味着商家可以更轻松地进行个性化定制,而且该方法避开了版权问题,极大地提升了商家的服务体验。这种方法的创新性和灵活性将为虚拟试衣技术带来新的发展和应用前景。上述虚拟试穿流程化方法的出现填补了虚拟试穿技术的一些空白,并为商家提供了更加灵活、便捷且个性化的虚拟试穿体验,有效解决了现有虚拟试衣技术所面临的挑战。从而实现了提高图像中模拟的穿戴表现结果效果的技术效果,解决了图像中模拟的穿戴表现结果效果差的技术问题。Furthermore, the present application provides a virtual try-on workflow method based on a text-to-image framework, which aims to fundamentally solve the challenges currently faced by virtual fitting technology. This method not only ensures a high degree of realism in the simulated try-on, but also allows merchants to construct the model try-on effect through a simple text description alone, greatly improving the convenience of operation. It also supports the use of semantic descriptions to personalize features such as the background of the picture and the gender and body shape of the model. This means that merchants can customize more easily, and the method avoids copyright issues, greatly improving the service experience of merchants. The innovation and flexibility of this method will bring new development and application prospects to virtual fitting technology. The emergence of the above virtual try-on workflow method fills some gaps in virtual try-on technology, provides merchants with a more flexible, convenient and personalized virtual try-on experience, and effectively solves the challenges faced by existing virtual fitting technology, thereby achieving the technical effect of improving the effect of the simulated wearing performance result in the image and solving the technical problem of poor effect of the simulated wearing performance result in the image.
下面对该实施例的上述方法进行进一步的介绍。The above method of this embodiment is further introduced below.
在本申请实施例中,图7是根据本申请实施例的一种基于文生图框架的虚拟试穿流程化系统的示意图,如图7所示,该系统中可以包括产品图像701、变分自动编码器702,也即,VAE 702、残差网络模块703、自注意力模块704、交叉注意力模块705、残差网络模块706、自注意力模块707、交叉注意力模块708、文本描述信息709、文本编码器710、原始图像特征711、模特图712、姿态编码器713、残差网络模块714、自注意力模块715、交叉注意力模块716、残差网络模块717、自注意力模块718、交叉注意力模块719和目标图像720。In an embodiment of the present application, FIG. 7 is a schematic diagram of a virtual try-on workflow system based on a text-to-image framework according to an embodiment of the present application. As shown in FIG. 7, the system may include a product image 701, a variational autoencoder 702, that is, a VAE 702, a residual network module 703, a self-attention module 704, a cross-attention module 705, a residual network module 706, a self-attention module 707, a cross-attention module 708, text description information 709, a text encoder 710, original image features 711, a model image 712, a pose encoder 713, a residual network module 714, a self-attention module 715, a cross-attention module 716, a residual network module 717, a self-attention module 718, a cross-attention module 719 and a target image 720.
可选地,如图7所示,可以采用通常被用于虚拟试穿模型的双Unet架构。这种架构允许模型同时处理两个任务:一个用于提取图像的特征表示,另一个用于根据这些特征表示生成最终的虚拟试穿效果图。这种结构有助于模型更好地理解服饰图像,并生成与用户期望相符的虚拟试穿效果。因此,双Unet架构通常被用于处理虚拟试穿任务,帮助模型更好地理解服饰图像的特征和细节,从而生成逼真的虚拟试穿效果。Optionally, as shown in FIG. 7, a dual-UNet architecture, which is commonly used for virtual try-on models, can be adopted. This architecture allows the model to handle two tasks at the same time: one for extracting feature representations of the image, and the other for generating the final virtual try-on rendering based on these feature representations. This structure helps the model better understand clothing images and generate virtual try-on effects that meet user expectations. Therefore, the dual-UNet architecture is commonly used for virtual try-on tasks, helping the model better understand the features and details of clothing images and thereby generate realistic virtual try-on effects.
可选地,如图7所示,用户上传的产品图像通过一个服饰检测模块,裁剪出其中的服饰主体区域,作为Tryon模型的第一个标准输入,依次送入VAE 702、以及后续的unet模型,用于提供服饰的颜色、细节纹理以及logo等知识注入信息。Optionally, as shown in FIG. 7, the product image uploaded by the user passes through a clothing detection module, which crops out the main clothing region as the first standard input of the Tryon model; this input is fed in turn into the VAE 702 and the subsequent unet model to provide knowledge-injection information such as the clothing's color, detailed texture and logo.
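The "detect, then crop the garment region" step can be sketched on a NumPy image array; the bounding box below is hard-coded where, in the described system, the clothing detection module would supply it:

```python
import numpy as np

def crop_garment(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop the garment region (x0, y0, x1, y1) out of an H x W x C
    image array; the box would normally come from a garment-detection
    model rather than being hard-coded as it is here."""
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1]

if __name__ == "__main__":
    img = np.zeros((100, 80, 3), dtype=np.uint8)   # placeholder product image
    crop = crop_garment(img, (10, 20, 50, 90))      # assumed detector output
    print(crop.shape)  # (70, 40, 3)
```

The cropped array is what would then be encoded by the VAE before entering the UNet.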
可选地,如图7所示,用户上传任意姿态的模特图712,通过openpose进行骨架检测,得到相应的骨架图,作为Tryon模型的第二个标准输入,用于控制生成图中模特姿势。Optionally, as shown in FIG. 7, the user uploads a model image 712 in any pose; skeleton detection is performed through openpose to obtain the corresponding skeleton image, which serves as the second standard input of the Tryon model and is used to control the model's pose in the generated image.
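The skeleton image can be sketched as rasterising 2-D keypoints (such as those OpenPose returns) into a single-channel map; the keypoints below are invented, and a real pipeline would also draw the limbs connecting them:

```python
import numpy as np

def keypoints_to_skeleton_map(keypoints, height: int, width: int) -> np.ndarray:
    """Mark each (x, y) keypoint on a zero map, skipping points that
    fall outside the image bounds."""
    m = np.zeros((height, width), dtype=np.float32)
    for x, y in keypoints:
        if 0 <= x < width and 0 <= y < height:
            m[y, x] = 1.0
    return m

if __name__ == "__main__":
    pts = [(5, 2), (7, 9), (99, 99)]   # last point falls outside the map
    skel = keypoints_to_skeleton_map(pts, 10, 10)
    print(int(skel.sum()))  # 2
```

A map like this (or a rendered limb drawing built from it) is what the pose encoder would consume as the second standard input.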
可选地,如图7所示,用户输入文字描述信息709,该属性可以包括并不局限于用于指定生成图的背景、模特的性别、发型、配饰、身高、体重、身形等定制化模特属性,亦或是服饰是否开衫、挽袖、叠穿等定制化服饰属性,作为Tryon模型的第三个标准输入。Optionally, as shown in FIG. 7, the user inputs text description information 709. These attributes may include, but are not limited to, customized model attributes such as the background of the generated image and the model's gender, hairstyle, accessories, height, weight and body shape, or customized clothing attributes such as whether the garment is worn open, has rolled-up sleeves or is layered, which serve as the third standard input of the Tryon model.
可选地,如图7所示,Tryon模型基于上述的三种标准输入,通过一种双unet的架构实现知识注入(姿态、服饰细节),最终输出服饰上身图,也即,目标图像720。Optionally, as shown in FIG. 7 , the Tryon model implements knowledge injection (posture, clothing details) through a dual-unet architecture based on the above three standard inputs, and finally outputs a clothing upper body image, that is, a target image 720 .
可选地,如图7所示,姿态的知识注入通过姿态编码器713得到骨架的编码特征,与main unet的原本输入特征一起作为共同输入。文字描述信息709的注入通过一个文本编码器710得到文本特征,并分别注入到双unet的Cross Attention模块中。将输出的目标图像720通过图像增强策略(调整锐化、对比度、饱和度、亮度),最终得到虚拟试衣结果。Optionally, as shown in FIG. 7, for the knowledge injection of the pose, the encoded features of the skeleton are obtained through the pose encoder 713 and serve, together with the original input features of the main unet, as a joint input. For the injection of the text description information 709, text features are obtained through a text encoder 710 and injected into the Cross Attention modules of the two unets respectively. The output target image 720 is processed by an image enhancement strategy (adjusting sharpness, contrast, saturation and brightness) to finally obtain the virtual fitting result.
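The final enhancement step (sharpness, contrast, saturation, brightness) can be sketched with simple per-pixel brightness and contrast adjustments on a NumPy array; a real system would more likely use a library such as Pillow's `ImageEnhance`, and the factors below are illustrative, not values from the application:

```python
import numpy as np

def enhance(img: np.ndarray, brightness: float = 1.0,
            contrast: float = 1.0) -> np.ndarray:
    """Scale pixel values for brightness, then stretch them around the
    global mean for contrast, clipping back to the valid uint8 range."""
    out = img.astype(np.float32) * brightness
    mean = out.mean()
    out = (out - mean) * contrast + mean
    return np.clip(out, 0, 255).astype(np.uint8)

if __name__ == "__main__":
    flat = np.full((4, 4, 3), 100, dtype=np.uint8)
    print(enhance(flat, brightness=1.5)[0, 0, 0])  # 150
```

Sharpness and saturation adjustments follow the same pattern (a filter kernel and a per-channel blend toward the luminance, respectively) and are omitted here for brevity.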
可选地,图8是根据本申请实施例的一种知识注入位置的示意图,如图8所示,ReferenceNet可以包括残差网络模块801、自注意力模块802、交叉注意力模块803和自注意力模块804。服饰的知识注入依赖于一支Reference unet网络提取大尺寸(1024*1024)的图像特征。自注意力模块804可以将其所处理的结果,比如,Key和Value传输至自注意力模块802中,从而可以结合两个自注意力模块中的结果,来确定最终的目标图像。Optionally, FIG. 8 is a schematic diagram of a knowledge injection position according to an embodiment of the present application. As shown in FIG. 8, the ReferenceNet may include a residual network module 801, a self-attention module 802, a cross-attention module 803 and a self-attention module 804. The knowledge injection of clothing relies on a Reference unet network to extract large-size (1024*1024) image features. The self-attention module 804 can transfer its processed results, such as the Key and Value, to the self-attention module 802, so that the results of the two self-attention modules can be combined to determine the final target image.
可选地,图9是根据本申请实施例的一种知识注入方式的示意图,如图9所示,在每一个Self Attention模块中向main unet直接注入,与原本的特征直接串联,也即,参考网络中每个Self Attention模块可以向主要U-Net模型直接注入,比如,可以将ReferenceNet的Value和main unet中的Value进行串联,二者中的Key也可以进行串联。Optionally, FIG. 9 is a schematic diagram of a knowledge injection method according to an embodiment of the present application. As shown in FIG. 9, each Self Attention module injects directly into the main unet, where the injected features are concatenated with the original features; that is, each Self Attention module in the reference network can inject directly into the main U-Net model. For example, the Value of the ReferenceNet can be concatenated with the Value of the main unet, and the Keys of the two can also be concatenated.
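The key/value concatenation described above can be sketched in NumPy for a single attention head; the shapes and inputs are illustrative only, and a production model would of course do this in a deep-learning framework with learned projections:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reference_self_attention(q, k_main, v_main, k_ref, v_ref):
    """Concatenate the ReferenceNet keys/values with the main UNet's
    own keys/values along the token axis, then attend as usual, so the
    main branch can read garment features from the reference branch."""
    k = np.concatenate([k_main, k_ref], axis=0)
    v = np.concatenate([v_main, v_ref], axis=0)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product
    return softmax(scores) @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((4, 8))                       # 4 query tokens, dim 8
    k_m, v_m = (rng.standard_normal((6, 8)) for _ in range(2))
    k_r, v_r = (rng.standard_normal((5, 8)) for _ in range(2))
    print(reference_self_attention(q, k_m, v_m, k_r, v_r).shape)  # (4, 8)
```

Note the output keeps the main branch's token count (4 here); only the attended-over sequence grows, which is what lets the injection happen without changing the main UNet's tensor shapes.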
在本申请实施例中,上述方法巧妙地克服了相关虚拟试衣技术的限制,提供了一个不依赖具体模特图像的创新虚拟试穿方法。通过文到图的图像生成算法,允许完全自由地通过文字描述控制生成模特上身的效果图,无需处理复杂的掩码图制作或模特原有服饰的移除问题。在提高服饰上身的灵活性的同时,也杜绝了版权担忧。通过上述方法,用户能享受到一个无缝、自定义且高效率的虚拟试衣体验,将个性化选项提升至全新的水平。In the embodiments of the present application, the above method cleverly overcomes the limitations of related virtual fitting technologies and provides an innovative virtual try-on method that does not rely on a specific model image. Through a text-to-image generation algorithm, it allows the rendering of the garment on the model's body to be controlled entirely through text descriptions, without having to deal with complex mask-image production or the removal of the model's original clothing. While improving the flexibility of putting clothing on a model, it also eliminates copyright concerns. Through the above method, users can enjoy a seamless, customizable and efficient virtual fitting experience, raising personalization options to a whole new level.
在该实施例中,无需依赖于特定的模特图像,克服了传统方法需要基于实际模特拍摄照片的限制。用户可以通过文字描述来控制生成模特的上身效果,这提供了更高的自由度和定制化选项的同时,避免了可能涉及的版权问题,确保了商家的使用安心。传统的去除模特原有服饰所需的复杂掩模图生成策略在上述方法中被省略,简化了图像处理流程,能够灵活应对各种款式服饰上身的需求。In this embodiment, there is no need to rely on specific model images, overcoming the limitation of traditional methods that require taking photos based on actual models. Users can control the generated model's upper body effect through text descriptions, which provides higher degrees of freedom and customization options while avoiding possible copyright issues, ensuring that merchants can use it with peace of mind. The traditional complex mask image generation strategy required to remove the model's original clothing is omitted in the above method, simplifying the image processing process and being able to flexibly respond to the needs of wearing various styles of clothing.
在本申请实施例中,若需要对某一产品在载体上进行穿戴的穿戴表现进行确定和展示,则可以识别出该产品的待处理的产品图像。获取在载体上对该产品进行展示的文本描述信息。可以利用文本描述信息,来引导预先通过文生图模型训练的图像处理模型,对产品图像进行分析,得到能够模拟该产品在载体上的穿戴表现的目标图像。由于本申请实施例考虑到相关技术中的问题,提出了一种利用文生图框架的虚拟试衣技术,通过简单的文字描述来对产品的虚拟试衣效果进行文字表达,利用文生图模型训练出来的图像处理模型对其进行分析,即可生成高度真实的模特试衣效果图,从而消除了对特定模特图片的依赖,为用户提供具有更大自由度、灵活度和定制化选择的产品展示效果,达到降低了通过图像来模拟产品的穿戴表现结果的局限性的目的,进而实现了提高图像中模拟的穿戴表现结果效果的技术效果,解决了图像中模拟的穿戴表现结果效果差的技术问题。In an embodiment of the present application, if it is necessary to determine and display the wearing performance of a certain product on a carrier, the product image to be processed of the product can be identified, and the text description information for displaying the product on the carrier can be obtained. The text description information can be used to guide an image processing model pre-trained from a text-to-image model to analyze the product image and obtain a target image that simulates the wearing performance of the product on the carrier. Since the embodiments of the present application take the problems in the related art into account, a virtual fitting technology using a text-to-image framework is proposed: the desired virtual fitting effect of the product is expressed through a simple text description, and the image processing model trained from the text-to-image model analyzes it to generate a highly realistic model fitting rendering, thereby eliminating the dependence on specific model pictures, providing users with a product display effect with greater freedom, flexibility and customized selection, and achieving the purpose of reducing the limitations of simulating the wearing performance result of a product through images, thereby achieving the technical effect of improving the effect of the simulated wearing performance result in the image, and solving the technical problem of poor effect of the simulated wearing performance result in the image.
图10是根据本发明实施例的一种计算机设备对图像处理的示意图,如图10所示,可以通过调用第一接口获取客户端设备上用户所发起的模拟某一产品的穿戴效果的请求,也即,图像处理请求,计算机设备从网络获取客户端设备上所接收到的产品图像以及文本描述信息;基于客户端设备上所接收的产品图像和文本描述信息,利用文本描述信息,调用并引导图像处理模型对产品图像进行分析,得到目标图像。Figure 10 is a schematic diagram of image processing by a computer device according to an embodiment of the present invention. As shown in Figure 10, a request initiated by a user on a client device to simulate the wearing effect of a certain product, that is, an image processing request, can be obtained by calling the first interface. The computer device obtains the product image and text description information received on the client device from the network; based on the product image and text description information received on the client device, the text description information is used to call and guide the image processing model to analyze the product image to obtain the target image.
本发明实施例,通过调用第一接口获取客户端设备上所接收到的产品图像以及文本描述信息;基于客户端设备上所接收的产品图像和文本描述信息,利用文本描述信息,调用并引导图像处理模型对产品图像进行分析,得到目标图像,实现了提高图像中模拟的穿戴表现结果效果的技术效果,解决了图像中模拟的穿戴表现结果效果差的技术问题。In an embodiment of the present invention, a product image and text description information received on a client device are obtained by calling a first interface; based on the product image and text description information received on the client device, the text description information is used to call and guide an image processing model to analyze the product image to obtain a target image, thereby achieving a technical effect of improving the effect of a simulated wearing performance result in the image and solving the technical problem of a poor effect of a simulated wearing performance result in the image.
实施例4Embodiment 4
根据本申请实施例,还提供了一种用于实施上述图2所示的图像的处理方法的图像的处理装置。According to an embodiment of the present application, there is also provided an image processing device for implementing the image processing method shown in FIG. 2 .
图11是根据本申请实施例的一种图像的处理装置的示意图,如图11所示,该图像的处理装置1100可以包括:第一识别单元1102、第一获取单元1104和第一引导单元1106。FIG11 is a schematic diagram of an image processing device according to an embodiment of the present application. As shown in FIG11 , the image processing device 1100 may include: a first recognition unit 1102 , a first acquisition unit 1104 , and a first guiding unit 1106 .
第一识别单元1102,用于识别出待处理的产品图像,其中,产品图像的图像内容包括至少一待展示产品。The first recognition unit 1102 is configured to recognize a product image to be processed, wherein the image content of the product image includes at least one product to be displayed.
第一获取单元1104,用于获取与产品图像中待展示产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示产品在由载体穿戴的情况下,所展示出的穿戴表现结果。The first acquisition unit 1104 is used to acquire text description information corresponding to the product to be displayed in the product image, wherein the text description information is used to at least describe the wearing performance result displayed when the product to be displayed is worn by the carrier.
第一引导单元1106,用于利用文本描述信息,引导图像处理模型对产品图像进行分析,得到目标图像,其中,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果。The first guiding unit 1106 is configured to use the text description information to guide the image processing model to analyze the product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result.
此处上述第一识别单元1102、第一获取单元1104和第一引导单元1106对应于实施例1中的步骤S202至步骤S206,三个单元与对应的步骤所实现的实例和应用场景相同,但不限于上述实施例1所公开的内容。需要说明的是,上述单元可以是存储在存储器(例如,存储器1504)中并由一个或多个处理器(例如,处理器1502a,1502b……,1502n)处理的硬件组件或软件组件,上述单元也可以作为装置的一部分运行在实施例5提供的计算机终端A中。Here, the first recognition unit 1102, the first acquisition unit 1104 and the first guiding unit 1106 correspond to steps S202 to S206 in Embodiment 1. The three units implement the same examples and application scenarios as the corresponding steps, but are not limited to the contents disclosed in Embodiment 1. It should be noted that the above units may be hardware components or software components stored in a memory (e.g., memory 1504) and processed by one or more processors (e.g., processors 1502a, 1502b, ..., 1502n), and may also run, as part of the device, in the computer terminal A provided in Embodiment 5.
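The three-unit flow of device 1100 (recognize the product image, obtain the text description, use it to guide a trained text-to-image model) can be sketched as below. This is a minimal illustration only: every function name, field name and the stubbed generation step are hypothetical, not the patent's actual implementation.

```python
# Illustrative sketch of the three-unit pipeline (Embodiment 4).
# All names are placeholders; the generative step is stubbed.

def recognize_product_image(raw_input):
    """First recognition unit: pick out the product image to be processed."""
    # Assume the input already carries the product image and its products.
    return {"pixels": raw_input, "products": ["dress"]}

def get_text_description(product_image):
    """First acquisition unit: text describing the desired wearing result."""
    return f"a model wearing the {product_image['products'][0]}, studio lighting"

def guide_model(product_image, text_description):
    """First guiding unit: the text guides a trained text-to-image model.
    A real system would run a generative model; we return a stub target."""
    return {"source": product_image["pixels"], "prompt": text_description}

image = recognize_product_image("raw_pixels")
prompt = get_text_description(image)
target = guide_model(image, prompt)
```

The point of the sketch is only the data flow: the target image depends on both the product image and the text description that guides the model.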
根据本申请实施例,还提供了一种用于实施上述图3所示的服饰图像的处理方法的服饰图像的处理装置。According to an embodiment of the present application, a clothing image processing device for implementing the clothing image processing method shown in FIG. 3 is also provided.
图12是根据本申请实施例的一种服饰图像的处理装置的示意图,如图12所示,该服饰图像的处理装置1200可以包括:第二识别单元1202、第二获取单元1204、第二引导单元1206和下发单元1208。Figure 12 is a schematic diagram of a clothing image processing device according to an embodiment of the present application. As shown in Figure 12, the clothing image processing device 1200 may include: a second recognition unit 1202, a second acquisition unit 1204, a second guiding unit 1206 and a sending unit 1208.
第二识别单元1202,用于识别部署在电子商务平台上的虚拟服饰店铺,且从虚拟服饰店铺中识别出待处理的服饰产品图像,其中,服饰产品图像的图像内容包括至少一待展示服饰产品。The second identification unit 1202 is used to identify a virtual clothing store deployed on the e-commerce platform, and identify clothing product images to be processed from the virtual clothing store, wherein the image content of the clothing product image includes at least one clothing product to be displayed.
第二获取单元1204,用于获取与服饰产品图像中待展示服饰产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示服饰产品在由载体穿戴的情况下,所展示出的穿戴表现结果。The second acquisition unit 1204 is used to acquire text description information corresponding to the clothing product to be displayed in the clothing product image, wherein the text description information is used to at least describe the wearing performance result displayed when the clothing product to be displayed is worn by the carrier.
第二引导单元1206,用于利用文本描述信息,引导图像处理模型对服饰产品图像进行分析,得到目标图像,其中,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果。The second guiding unit 1206 is configured to use the text description information to guide the image processing model to analyze the clothing product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result.
下发单元1208,用于将目标图像下发至电子商务平台。The sending unit 1208 is used to send the target image to the e-commerce platform.
此处需要说明的是,上述第二识别单元1202、第二获取单元1204、第二引导单元1206和下发单元1208对应于实施例1中的步骤S302至步骤S308,四个单元与对应的步骤所实现的实例和应用场景相同,但不限于上述实施例1所公开的内容。需要说明的是,上述单元可以是存储在存储器(例如,存储器1504)中并由一个或多个处理器(例如,处理器1502a,1502b……,1502n)处理的硬件组件或软件组件,上述单元也可以作为装置的一部分运行在实施例5提供的计算机终端A中。It should be noted that the second identification unit 1202, the second acquisition unit 1204, the second guiding unit 1206 and the sending unit 1208 correspond to steps S302 to S308 in Embodiment 1. The four units implement the same examples and application scenarios as the corresponding steps, but are not limited to the contents disclosed in Embodiment 1. It should be noted that the above units may be hardware components or software components stored in a memory (e.g., memory 1504) and processed by one or more processors (e.g., processors 1502a, 1502b, ..., 1502n), and may also run, as part of the device, in the computer terminal A provided in Embodiment 5.
根据本申请实施例,还提供了一种用于实施上述图4所示的图像的处理方法的图像的处理装置。According to an embodiment of the present application, there is also provided an image processing device for implementing the image processing method shown in FIG. 4 above.
图13是根据本申请实施例的另一种图像的处理装置的示意图,如图13所示,该图像的处理装置1300可以包括:第一显示单元1302、第二显示单元1304和第三显示单元1306。FIG13 is a schematic diagram of another image processing device according to an embodiment of the present application. As shown in FIG13 , the image processing device 1300 may include: a first display unit 1302 , a second display unit 1304 , and a third display unit 1306 .
第一显示单元1302,用于响应作用于操作界面上的图像输入操作,在操作界面上显示待处理的产品图像,其中,产品图像的图像内容包括至少一待展示产品。The first display unit 1302 is used to respond to the image input operation on the operation interface and display the product image to be processed on the operation interface, wherein the image content of the product image includes at least one product to be displayed.
第二显示单元1304用于响应作用于操作界面上的文本输入操作,在操作界面上显示与产品图像中待展示产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示产品在由载体穿戴的情况下,所展示出的穿戴表现结果。The second display unit 1304 is used to respond to a text input operation on the operation interface, and display text description information corresponding to the product to be displayed in the product image on the operation interface, wherein the text description information is used to at least describe the wearing performance results displayed by the product to be displayed when it is worn by the carrier.
第三显示单元1306,用于响应作用于操作界面上的图像生成操作,在操作界面上显示与产品图像和文本描述信息匹配的目标图像,其中,目标图像为利用文本描述信息,引导图像处理模型对产品图像进行分析得到,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果。The third display unit 1306 is configured to display, in response to an image generation operation on the operation interface, a target image matching the product image and the text description information on the operation interface, wherein the target image is obtained by using the text description information to guide the image processing model to analyze the product image, the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result.
此处上述第一显示单元1302、第二显示单元1304和第三显示单元1306对应于实施例1中的步骤S402至步骤S406,三个单元与对应的步骤所实现的实例和应用场景相同,但不限于上述实施例1所公开的内容。需要说明的是,上述单元可以是存储在存储器(例如,存储器1504)中并由一个或多个处理器(例如,处理器1502a,1502b……,1502n)处理的硬件组件或软件组件,上述单元也可以作为装置的一部分运行在实施例5提供的计算机终端A中。Here, the first display unit 1302, the second display unit 1304 and the third display unit 1306 correspond to steps S402 to S406 in Embodiment 1. The three units implement the same examples and application scenarios as the corresponding steps, but are not limited to the contents disclosed in Embodiment 1. It should be noted that the above units may be hardware components or software components stored in a memory (e.g., memory 1504) and processed by one or more processors (e.g., processors 1502a, 1502b, ..., 1502n), and may also run, as part of the device, in the computer terminal A provided in Embodiment 5.
根据本申请实施例,还提供了一种用于实施上述图5所示的图像的处理方法的图像的处理装置。According to an embodiment of the present application, there is also provided an image processing device for implementing the image processing method shown in FIG. 5 .
图14是根据本申请实施例的另一种图像的处理装置的示意图,如图14所示,该图像的处理装置1400可以包括:第一展示单元1402、第三获取单元1404、第三引导单元1406和第二展示单元1408。FIG14 is a schematic diagram of another image processing device according to an embodiment of the present application. As shown in FIG14 , the image processing device 1400 may include: a first display unit 1402 , a third acquisition unit 1404 , a third guiding unit 1406 and a second display unit 1408 .
第一展示单元1402,用于在虚拟现实VR设备或增强现实AR设备的呈现画面上,展示待处理的产品图像,其中,产品图像的图像内容包括至少一待展示产品。The first display unit 1402 is used to display the product image to be processed on the presentation screen of the virtual reality VR device or the augmented reality AR device, wherein the image content of the product image includes at least one product to be displayed.
第三获取单元1404,用于获取与产品图像中待展示产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示产品在由载体穿戴的情况下,所展示出的穿戴表现结果。The third acquisition unit 1404 is used to acquire text description information corresponding to the product to be displayed in the product image, wherein the text description information is used to at least describe the wearing performance result displayed when the product to be displayed is worn by the carrier.
第三引导单元1406,用于驱动VR设备或AR设备利用文本描述信息,引导图像处理模型对产品图像进行分析,得到目标图像,其中,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果。The third guiding unit 1406 is configured to drive the VR device or AR device to use the text description information to guide the image processing model to analyze the product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result.
第二展示单元1408,用于在VR设备或AR设备的呈现画面上,展示目标图像。The second display unit 1408 is used to display the target image on the presentation screen of the VR device or the AR device.
此处上述第一展示单元1402、第三获取单元1404、第三引导单元1406和第二展示单元1408对应于实施例1中的步骤S502至步骤S508,四个单元与对应的步骤所实现的实例和应用场景相同,但不限于上述实施例1所公开的内容。需要说明的是,上述单元可以是存储在存储器(例如,存储器1504)中并由一个或多个处理器(例如,处理器1502a,1502b……,1502n)处理的硬件组件或软件组件,上述单元也可以作为装置的一部分运行在实施例5提供的计算机终端A中。Here, the first display unit 1402, the third acquisition unit 1404, the third guiding unit 1406 and the second display unit 1408 correspond to steps S502 to S508 in Embodiment 1. The four units implement the same examples and application scenarios as the corresponding steps, but are not limited to the contents disclosed in Embodiment 1. It should be noted that the above units may be hardware components or software components stored in a memory (e.g., memory 1504) and processed by one or more processors (e.g., processors 1502a, 1502b, ..., 1502n), and may also run, as part of the device, in the computer terminal A provided in Embodiment 5.
在该图像的处理装置中,若需要对某一产品在载体上进行穿戴的穿戴表现进行确定和展示,则可以识别出该产品的待处理的产品图像。获取在载体上对该产品进行展示的文本描述信息。可以利用文本描述信息,来引导预先通过文生图模型训练的图像处理模型,对产品图像进行分析,得到能够模拟该产品在载体上的穿戴表现的目标图像。由于本申请实施例考虑到相关技术中的问题,提出了一种利用文生图框架的虚拟试衣技术,通过简单的文字描述来对产品的虚拟试衣效果进行文字表达,利用文生图模型训练出来的图像处理模型对其进行分析,即可生成高度真实的模特试衣效果图,从而消除了对特定模特图片的依赖,为用户提供具有更大自由度、灵活度和定制化选择的产品展示效果,达到降低了通过图像来模拟产品的穿戴表现结果的局限性的目的,进而实现了提高图像中模拟的穿戴表现结果效果的技术效果,解决了图像中模拟的穿戴表现结果效果差的技术问题。In this image processing device, if the wearing performance of a product on a carrier needs to be determined and displayed, the product image of that product to be processed can be identified, and text description information for displaying the product on the carrier is obtained. The text description information can then be used to guide an image processing model, pre-trained from a text-to-image model, to analyze the product image and obtain a target image that simulates the wearing performance of the product on the carrier. In view of the problems in the related art, the embodiments of the present application propose a virtual try-on technique based on a text-to-image framework: the desired virtual try-on effect of a product is expressed in a simple text description, and an image processing model trained from a text-to-image model analyzes it to generate a highly realistic model try-on image. This removes the dependence on pictures of specific models, gives users a product display with greater freedom, flexibility and customization, and reduces the limitations of simulating a product's wearing performance through images, thereby achieving the technical effect of improving the quality of the simulated wearing performance in the image and solving the technical problem that the simulated wearing performance in the image is of poor quality.
实施例5Embodiment 5
本申请的实施例可以提供一种计算机终端,该计算机终端可以是计算机终端群中的任意一个计算机终端设备。可选地,在本实施例中,上述计算机终端也可以替换为移动终端等终端设备。The embodiment of the present application may provide a computer terminal, which may be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced by a terminal device such as a mobile terminal.
可选地,在本实施例中,上述计算机终端可以位于计算机网络的多个网络设备中的至少一个网络设备。Optionally, in this embodiment, the computer terminal may be located in at least one network device among a plurality of network devices of the computer network.
在本实施例中,上述计算机终端可以执行图像的处理方法中以下步骤的程序代码:识别出待处理的产品图像,其中,产品图像的图像内容包括至少一待展示产品;获取与产品图像中待展示产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示产品在由载体穿戴的情况下,所展示出的穿戴表现结果;利用文本描述信息,引导图像处理模型对产品图像进行分析,得到目标图像,其中,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果。In this embodiment, the above computer terminal can execute the program code of the following steps in the image processing method: identifying a product image to be processed, wherein the image content of the product image includes at least one product to be displayed; obtaining text description information corresponding to the product to be displayed in the product image, wherein the text description information is used to at least describe the wearing performance result displayed when the product to be displayed is worn by the carrier; using the text description information to guide the image processing model to analyze the product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result.
可选地,图15是根据本申请实施例的一种计算机终端的结构框图。如图15所示,该计算机终端A可以包括:一个或多个(图中仅示出一个)处理器1502、存储器1504以及传输装置1506。Optionally, Figure 15 is a block diagram of a computer terminal according to an embodiment of the present application. As shown in Figure 15, the computer terminal A may include: one or more (only one is shown in the figure) processors 1502, a memory 1504 and a transmission device 1506.
其中,存储器可用于存储软件程序以及模块,如本申请实施例中的图像的处理方法和装置对应的程序指令/模块,处理器通过运行存储在存储器内的软件程序以及模块,从而执行各种功能应用以及数据处理,即实现上述的图像的处理方法。存储器可包括高速随机存储器,还可以包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其它非易失性固态存储器。在一些实例中,存储器可进一步包括相对于处理器远程设置的存储器,这些远程存储器可以通过网络连接至计算机终端A。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。Among them, the memory can be used to store software programs and modules, such as the program instructions/modules corresponding to the image processing method and device in the embodiment of the present application. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, realizing the above-mentioned image processing method. The memory may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory may further include a memory remotely arranged relative to the processor, and these remote memories may be connected to the computer terminal A via a network. Examples of the above-mentioned network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
可选地,上述处理器还可以执行如下步骤的程序代码:利用图像处理模型,从文本描述信息中提取出文本特征;利用文本特征,引导图像处理模型对产品图像进行分析,得到目标图像。Optionally, the processor may also execute program codes of the following steps: using an image processing model to extract text features from text description information; using the text features to guide the image processing model to analyze the product image to obtain a target image.
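A minimal sketch of these two steps: extract text features from the description, then use them to condition the analysis of the product image. The hashing "encoder" is a deliberately toy stand-in for whatever text feature extraction model the patent assumes, and all names are illustrative.

```python
# Toy text-feature extraction and text-guided conditioning.
# A real pipeline would use a learned text encoder, not token hashing.

def extract_text_features(text, dim=8):
    """Hash each token of the description into a fixed-size feature vector."""
    feats = [0.0] * dim
    for token in text.lower().split():
        feats[hash(token) % dim] += 1.0
    return feats

def analyze_with_text_guidance(image_features, text_features):
    """Condition image features on the text features (element-wise here)."""
    return [i + t for i, t in zip(image_features, text_features)]

text_feats = extract_text_features("red dress on model")
guided = analyze_with_text_guidance([1.0] * 8, text_feats)
```

Each token contributes exactly one unit of mass to the feature vector, so the vector's sum equals the token count; the conditioning step then mixes those features into the image features.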
可选地,上述处理器还可以执行如下步骤的程序代码:利用图像处理模型中的文本特征提取模型,从文本描述信息中提取出文本特征;利用文本特征,引导图像处理模型中的注意力处理模型,对产品图像进行注意力处理,得到目标图像。Optionally, the processor may also execute the following program code: using the text feature extraction model in the image processing model to extract text features from the text description information; using the text features to guide the attention processing model in the image processing model to perform attention processing on the product image to obtain the target image.
可选地,上述处理器还可以执行如下步骤的程序代码:对产品图像进行检测,得到产品图像中待展示产品的属性信息;利用文本特征,引导注意力处理模型,对待展示产品的属性信息进行注意力处理,得到目标图像。Optionally, the processor may also execute the following program code: detecting the product image to obtain the attribute information of the product to be displayed in the product image; using text features to guide the attention processing model to perform attention processing on the attribute information of the product to be displayed to obtain the target image.
可选地,上述处理器还可以执行如下步骤的程序代码:利用第一自注意力处理模型,对待展示产品的属性信息进行自注意力处理,得到第一自注意力处理结果;利用第一交叉注意力处理模型,对第一自注意力处理结果和文本特征进行交叉注意力处理,得到第一交叉注意力处理结果;基于第一交叉注意力处理结果,生成目标图像。Optionally, the processor may also execute the program code of the following steps: using a first self-attention processing model to perform self-attention processing on the attribute information of the product to be displayed to obtain a first self-attention processing result; using a first cross-attention processing model to perform cross-attention processing on the first self-attention processing result and text features to obtain a first cross-attention processing result; and generating a target image based on the first cross-attention processing result.
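The self-attention-then-cross-attention sequence described here can be illustrated with plain scaled dot-product attention: the product's attribute features first attend to themselves, and the result then attends over the text features. The vectors below are toy values; a real model would operate on learned high-dimensional features.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

def self_attention(x):
    # First self-attention step: queries, keys and values all come from
    # the product's attribute features.
    return attention(x, x, x)

def cross_attention(x, text_feats):
    # First cross-attention step: image-side queries attend over text features.
    return attention(x, text_feats, text_feats)

attrs = [[1.0, 0.0], [0.0, 1.0]]   # toy attribute features of the product
text = [[0.5, 0.5]]                # toy text features
sa = self_attention(attrs)
target_feats = cross_attention(sa, text)
```

With a single text feature vector the cross-attention weights collapse to 1, so every query row returns that vector; with several text vectors each row would be a softmax-weighted blend of them.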
可选地,上述处理器还可以执行如下步骤的程序代码:对载体进行姿态检测,得到载体的姿态信息,其中,姿态信息用于表示载体在模拟的穿戴表现结果中展示出的姿态;利用第一自注意力处理模型,对待展示产品的属性信息和姿态信息进行自注意力处理,得到第一自注意力处理结果。Optionally, the processor may also execute the program code of the following steps: performing posture detection on the carrier to obtain posture information of the carrier, wherein the posture information is used to represent the posture displayed by the carrier in the simulated wearing performance result; performing self-attention processing on the attribute information and posture information of the displayed product using the first self-attention processing model to obtain a first self-attention processing result.
可选地,上述处理器还可以执行如下步骤的程序代码:调用姿态编码模型,对载体进行姿态编码,得到载体的骨架特征,其中,姿态信息包括骨架特征。Optionally, the processor may also execute program codes of the following steps: calling a posture coding model, performing posture coding on the carrier, and obtaining skeleton features of the carrier, wherein the posture information includes the skeleton features.
可选地,上述处理器还可以执行如下步骤的程序代码:获取对象图像,其中,对象图像的图像内容包括允许穿戴待展示产品的任意姿态的对象;利用姿态编码模型识别对象图像中的对象,且将识别出的对象确定为载体,对载体进行姿态编码,得到载体的骨架特征。Optionally, the processor may also execute program code of the following steps: obtaining an object image, wherein the image content of the object image includes an object in any posture that allows the product to be worn; identifying the object in the object image using a posture coding model, and determining the identified object as a carrier, performing posture coding on the carrier, and obtaining skeleton features of the carrier.
可选地,上述处理器还可以执行如下步骤的程序代码:利用图像处理模型中的第一残差网络模型,对骨架特征进行残差学习;利用第一自注意力处理模型,对待展示产品的属性信息和学习后的骨架特征进行自注意力处理,得到第一自注意力处理结果。Optionally, the processor may also execute the program code of the following steps: using the first residual network model in the image processing model to perform residual learning on the skeleton features; using the first self-attention processing model to perform self-attention processing on the attribute information of the displayed product and the learned skeleton features to obtain a first self-attention processing result.
可选地,上述处理器还可以执行如下步骤的程序代码:调用图像特征提取模型,从产品图像中提取出原始图像特征;利用图像处理模型中的第二残差网络模型,对原始图像特征进行残差学习;利用图像处理模型中的第二自注意力处理模型,对学习后的原始图像特征进行自注意力处理,得到第二自注意力处理结果,其中,第二自注意力处理结果用于表示尺寸大于尺寸阈值的图像特征,第二自注意力处理结果用于表示待展示产品的属性信息;利用第一自注意力处理模型,接收第二自注意力处理模型输出的第二自注意力处理结果,且对第二自注意力处理结果和学习后的骨架特征进行自注意力处理,得到第一自注意力处理结果。Optionally, the processor may also execute the following program code: calling the image feature extraction model to extract original image features from the product image; performing residual learning on the original image features using the second residual network model in the image processing model; performing self-attention processing on the learned original image features using the second self-attention processing model in the image processing model to obtain a second self-attention processing result, wherein the second self-attention processing result is used to represent image features whose size is greater than a size threshold, and the second self-attention processing result is used to represent attribute information of the product to be displayed; using the first self-attention processing model, receiving the second self-attention processing result output by the second self-attention processing model, and performing self-attention processing on the second self-attention processing result and the learned skeleton features to obtain the first self-attention processing result.
可选地,上述处理器还可以执行如下步骤的程序代码:利用图像处理模型中的第二交叉注意力处理模型,对第二自注意力处理结果和文本特征进行交叉注意力处理,得到第二交叉注意力处理结果;将第二交叉注意力处理结果,确定为原始图像特征,将第二残差网络模型在图像处理模型中的下一残差网络模型,确定为第二残差网络模型,将第二自注意力处理模型在图像处理模型中的下一自注意力处理模型,确定为第二自注意力处理模型,将第二交叉注意力处理模型在图像处理模型中的下一交叉注意力处理模型,确定为第二交叉注意力处理模型,且从以下步骤开始执行,直至第二残差网络模型在图像处理模型中未有下一残差网络模型,第二自注意力处理模型在图像处理模型中未有下一自注意力处理模型,第二交叉注意力处理模型在图像处理模型中未有下一交叉注意力处理模型:利用图像处理模型中的第二残差网络模型,对原始图像特征进行残差学习。Optionally, the processor may also execute the following program code: using the second cross-attention processing model in the image processing model to perform cross-attention processing on the second self-attention processing result and the text feature to obtain a second cross-attention processing result; determining the second cross-attention processing result as the original image feature, determining the next residual network model of the second residual network model in the image processing model as the second residual network model, determining the next self-attention processing model of the second self-attention processing model in the image processing model as the second self-attention processing model, determining the next cross-attention processing model of the second cross-attention processing model in the image processing model as the second cross-attention processing model, and starting from the following steps until the second residual network model has no next residual network model in the image processing model, the second self-attention processing model has no next self-attention processing model in the image processing model, and the second cross-attention processing model has no next cross-attention processing model in the image processing model: using the second residual network model in the image processing model to perform residual learning on the original image feature.
可选地,上述处理器还可以执行如下步骤的程序代码:将第一交叉注意力处理结果,确定为骨架特征;将第一残差网络模型在图像处理模型中的下一残差网络模型,确定为第一残差网络模型,将第一自注意力处理模型在图像处理模型中的下一自注意力处理模型,确定为第一自注意力处理模型,将第一交叉注意力处理模型在图像处理模型中的下一交叉注意力处理模型,确定为第一交叉注意力处理模型,且从以下步骤开始执行,直至第一残差网络模型在图像处理模型中未有下一残差网络模型,第一自注意力处理模型在图像处理模型中未有下一自注意力处理模型,第一交叉注意力处理模型在图像处理模型中未有下一交叉注意力处理模型:利用图像处理模型中的第一残差网络模型,对骨架特征进行残差学习,其中,第二自注意力处理模型与第一自注意力处理模型一一对应。Optionally, the processor may also execute the following program code: determining the first cross-attention processing result as a skeleton feature; determining the next residual network model of the first residual network model in the image processing model as the first residual network model; determining the next self-attention processing model of the first self-attention processing model in the image processing model as the first self-attention processing model; determining the next cross-attention processing model of the first cross-attention processing model in the image processing model as the first cross-attention processing model; and starting from the following steps until the first residual network model has no next residual network model in the image processing model, the first self-attention processing model has no next self-attention processing model in the image processing model, and the first cross-attention processing model has no next cross-attention processing model in the image processing model: using the first residual network model in the image processing model to perform residual learning on the skeleton features, wherein the second self-attention processing model corresponds one-to-one to the first self-attention processing model.
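The two iteration procedures above both walk a chain of paired residual / self-attention / cross-attention models until no "next" model exists. Only that control flow is mirrored below; each block is stubbed with simple arithmetic so the chain stays traceable.

```python
# Control-flow sketch: step through residual / self-attention /
# cross-attention block triples until the chain is exhausted.

def run_block_chain(features, blocks):
    """blocks: list of (residual_fn, self_attn_fn, cross_attn_fn) triples."""
    for residual, self_attn, cross_attn in blocks:
        features = residual(features)    # residual learning on the features
        features = self_attn(features)   # self-attention processing
        features = cross_attn(features)  # cross-attention with text features
    return features

# Stub blocks: each stage transforms a number so the chain is easy to follow.
blocks = [(lambda x: x + 1, lambda x: x * 2, lambda x: x - 1)] * 3
out = run_block_chain(0, blocks)
```

Per the text, the output of one triple becomes the input features (or skeleton features) of the next, which is exactly what the loop variable carries forward.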
可选地,上述处理器还可以执行如下步骤的程序代码:对目标图像进行图像增强;输出增强后的目标图像。Optionally, the processor may also execute program codes of the following steps: performing image enhancement on the target image; and outputting the enhanced target image.
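As a toy illustration of this final step, the sketch below stretches pixel intensities to the full [0, 255] range before the target image is output. This particular enhancement is purely an assumption for illustration; the patent does not specify which enhancement method is applied.

```python
# Toy image enhancement: linear contrast stretch of a 1-D pixel list.

def enhance(pixels):
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return pixels[:]  # flat image: nothing to stretch
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]

enhanced = enhance([50, 100, 150])
```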
可选地,处理器可以通过传输装置调用存储器存储的信息及应用程序,以执行下述步骤:识别部署在电子商务平台上的虚拟服饰店铺,且从虚拟服饰店铺中识别出待处理的服饰产品图像,其中,服饰产品图像的图像内容包括至少一待展示服饰产品;获取与服饰产品图像中待展示服饰产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示服饰产品在由载体穿戴的情况下,所展示出的穿戴表现结果;利用文本描述信息,引导图像处理模型对服饰产品图像进行分析,得到目标图像,其中,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果;将目标图像下发至电子商务平台。Optionally, the processor can call the information and applications stored in the memory through the transmission device to perform the following steps: identifying a virtual clothing store deployed on an e-commerce platform, and identifying a clothing product image to be processed from the virtual clothing store, wherein the image content of the clothing product image includes at least one clothing product to be displayed; obtaining text description information corresponding to the clothing product to be displayed in the clothing product image, wherein the text description information is used to at least describe the wearing performance result displayed when the clothing product to be displayed is worn by the carrier; using the text description information to guide the image processing model to analyze the clothing product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result; and sending the target image to the e-commerce platform.
可选地,处理器可以通过传输装置调用存储器存储的信息及应用程序,以执行下述步骤:响应作用于操作界面上的图像输入操作,在操作界面上显示待处理的产品图像,其中,产品图像的图像内容包括至少一待展示产品;响应作用于操作界面上的文本输入操作,在操作界面上显示与产品图像中待展示产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示产品在由载体穿戴的情况下,所展示出的穿戴表现结果;响应作用于操作界面上的图像生成操作,在操作界面上显示与产品图像和文本描述信息匹配的目标图像,其中,目标图像为利用文本描述信息,引导图像处理模型对产品图像进行分析得到,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果。Optionally, the processor can call the information and applications stored in the memory through the transmission device to perform the following steps: in response to an image input operation on the operation interface, displaying the product image to be processed on the operation interface, wherein the image content of the product image includes at least one product to be displayed; in response to a text input operation on the operation interface, displaying text description information corresponding to the product to be displayed in the product image on the operation interface, wherein the text description information is used to at least describe the wearing performance result displayed when the product to be displayed is worn by the carrier; and in response to an image generation operation on the operation interface, displaying a target image matching the product image and the text description information on the operation interface, wherein the target image is obtained by using the text description information to guide the image processing model to analyze the product image, the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result.
可选地,处理器可以通过传输装置调用存储器存储的信息及应用程序,以执行下述步骤:在虚拟现实VR设备或增强现实AR设备的呈现画面上,展示待处理的产品图像,其中,产品图像的图像内容包括至少一待展示产品;获取与产品图像中待展示产品对应的文本描述信息,其中,文本描述信息用于至少描述待展示产品在由载体穿戴的情况下,所展示出的穿戴表现结果;驱动VR设备或AR设备利用文本描述信息,引导图像处理模型对产品图像进行分析,得到目标图像,其中,图像处理模型为对文生图模型训练得到,目标图像的图像内容用于模拟穿戴表现结果;在VR设备或AR设备的呈现画面上,展示目标图像。Optionally, the processor can call the information and applications stored in the memory through the transmission device to perform the following steps: displaying the product image to be processed on the presentation screen of a virtual reality (VR) device or augmented reality (AR) device, wherein the image content of the product image includes at least one product to be displayed; obtaining text description information corresponding to the product to be displayed in the product image, wherein the text description information is used to at least describe the wearing performance result displayed when the product to be displayed is worn by the carrier; driving the VR device or AR device to use the text description information to guide the image processing model to analyze the product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image is used to simulate the wearing performance result; and displaying the target image on the presentation screen of the VR device or AR device.
采用本申请实施例,提供了一种图像的处理方法。在本申请实施例中,若需要对某一产品在载体上进行穿戴的穿戴表现进行确定和展示,则可以识别出该产品的待处理的产品图像。获取在载体上对该产品进行展示的文本描述信息。可以利用文本描述信息,来引导预先通过文生图模型训练的图像处理模型,对产品图像进行分析,得到能够模拟该产品在载体上的穿戴表现的目标图像。由于本申请实施例考虑到相关技术中的问题,提出了一种利用文生图框架的虚拟试衣技术,通过简单的文字描述来对产品的虚拟试衣效果进行文字表达,利用文生图模型训练出来的图像处理模型对其进行分析,即可生成高度真实的模特试衣效果图,从而消除了对特定模特图片的依赖,为用户提供具有更大自由度、灵活度和定制化选择的产品展示效果,达到降低了通过图像来模拟产品的穿戴表现结果的局限性的目的,进而实现了提高图像中模拟的穿戴表现结果效果的技术效果,解决了图像中模拟的穿戴表现结果效果差的技术问题。According to the embodiments of the present application, an image processing method is provided. In an embodiment of the present application, if the wearing performance of a product on a carrier needs to be determined and displayed, the product image of that product to be processed can be identified, and text description information for displaying the product on the carrier is obtained. The text description information can then be used to guide an image processing model, pre-trained from a text-to-image model, to analyze the product image and obtain a target image that simulates the wearing performance of the product on the carrier. In view of the problems in the related art, the embodiments of the present application propose a virtual try-on technique based on a text-to-image framework: the desired virtual try-on effect of a product is expressed in a simple text description, and an image processing model trained from a text-to-image model analyzes it to generate a highly realistic model try-on image. This removes the dependence on pictures of specific models, gives users a product display with greater freedom, flexibility and customization, and reduces the limitations of simulating a product's wearing performance through images, thereby achieving the technical effect of improving the quality of the simulated wearing performance in the image and solving the technical problem that the simulated wearing performance in the image is of poor quality.
本领域普通技术人员可以理解,图15所示的结构仅为示意,计算机终端A也可以是智能手机(如Android手机、iOS手机等)、平板电脑、掌上电脑以及移动互联网设备(Mobile Internet Devices,简称为MID)、PAD等终端设备。图15并不对上述计算机终端A的结构造成限定。例如,计算机终端A还可包括比图15中所示更多或者更少的组件(如网络接口、显示装置等),或者具有与图15所示不同的配置。Those of ordinary skill in the art will appreciate that the structure shown in FIG. 15 is for illustration only, and the computer terminal A may also be a terminal device such as a smart phone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD. FIG. 15 does not limit the structure of the computer terminal A. For example, the computer terminal A may also include more or fewer components (such as a network interface or a display device) than shown in FIG. 15, or have a configuration different from that shown in FIG. 15.
Those of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments can be completed by a program instructing hardware related to the terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
Example 6
An embodiment of the present application further provides a computer-readable storage medium. Optionally, in this embodiment, the computer-readable storage medium may be used to store the program code executed by the image processing method provided in the first embodiment above.
Optionally, in this embodiment, the computer-readable storage medium may be located in any computer terminal in a computer terminal group in a computer network, or in any mobile terminal in a mobile terminal group.
Optionally, in this embodiment, the computer-readable storage medium is configured to store program code for executing the following steps: identifying a product image to be processed, wherein the image content of the product image includes at least one product to be displayed; obtaining text description information corresponding to the product to be displayed in the product image, wherein the text description information at least describes the wearing performance result displayed when the product to be displayed is worn by a carrier; and using the text description information to guide an image processing model to analyze the product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image simulates the wearing performance result.
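The three steps above (identify the product image, obtain the text description, guide a model trained from a text-to-image model) can be sketched as a toy pipeline. Everything here is illustrative: `embed_text` and `generate_target_image` are hypothetical stand-ins, not the patent's trained model, which would be a diffusion-style text-to-image network.

```python
import hashlib
import numpy as np

def embed_text(description: str, dim: int = 8) -> np.ndarray:
    """Toy text encoder: hash each word into a fixed-size embedding vector."""
    vec = np.zeros(dim)
    for word in description.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    return vec / max(1.0, np.linalg.norm(vec))

def generate_target_image(product_image: np.ndarray, description: str) -> np.ndarray:
    """Placeholder for the text-guided image processing model: here we just
    modulate the product image by the text embedding's mean, standing in
    for 'guiding the model with text features'."""
    text_feat = embed_text(description)
    return np.clip(product_image * (0.5 + text_feat.mean()), 0.0, 1.0)

product_image = np.random.default_rng(0).random((64, 64, 3))  # product image to be processed
target = generate_target_image(product_image, "a model wearing the red dress, studio lighting")
print(target.shape)  # same spatial size as the input
```

A real system would replace both placeholders with a trained text encoder and a generative image model.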
Optionally, the computer-readable storage medium may also execute program code for the following steps: using the image processing model to extract text features from the text description information; and using the text features to guide the image processing model to analyze the product image and obtain the target image.
Optionally, the computer-readable storage medium may also execute program code for the following steps: using a text feature extraction model in the image processing model to extract text features from the text description information; and using the text features to guide an attention processing model in the image processing model to perform attention processing on the product image and obtain the target image.
Optionally, the computer-readable storage medium may also execute program code for the following steps: detecting the product image to obtain attribute information of the product to be displayed in the product image; and using the text features to guide the attention processing model to perform attention processing on the attribute information of the product to be displayed and obtain the target image.
Optionally, the computer-readable storage medium may also execute program code for the following steps: using a first self-attention processing model to perform self-attention processing on the attribute information of the product to be displayed to obtain a first self-attention processing result; using a first cross-attention processing model to perform cross-attention processing on the first self-attention processing result and the text features to obtain a first cross-attention processing result; and generating the target image based on the first cross-attention processing result.
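Under stated simplifying assumptions (a single head, no learned Q/K/V projections), the self-attention → cross-attention chain described above can be illustrated with a minimal NumPy scaled dot-product attention; a real model would use learned projections and multiple heads.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention (single head, no projections, for brevity)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(1)
attrs = rng.standard_normal((16, 32))  # product-attribute tokens (illustrative)
text = rng.standard_normal((8, 32))    # text features (illustrative)

self_out = attention(attrs, attrs, attrs)    # first self-attention processing result
cross_out = attention(self_out, text, text)  # first cross-attention processing result
print(cross_out.shape)
```

In the patent's flow, the cross-attention output would then feed the generation of the target image.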
Optionally, the computer-readable storage medium may also execute program code for the following steps: performing posture detection on the carrier to obtain posture information of the carrier, wherein the posture information represents the posture displayed by the carrier in the simulated wearing performance result; and using the first self-attention processing model to perform self-attention processing on the attribute information of the product to be displayed and the posture information to obtain the first self-attention processing result.
Optionally, the computer-readable storage medium may also execute program code for the following steps: calling a posture encoding model to perform posture encoding on the carrier and obtain skeleton features of the carrier, wherein the posture information includes the skeleton features.
Optionally, the computer-readable storage medium may also execute program code for the following steps: obtaining an object image, wherein the image content of the object image includes an object, in any posture, that is allowed to wear the product to be displayed; using the posture encoding model to identify the object in the object image, determining the identified object as the carrier, and performing posture encoding on the carrier to obtain the skeleton features of the carrier.
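A minimal sketch of posture encoding, assuming 2D keypoints have already been detected on the object: the skeleton feature here is simply the keypoints centred on a root joint and rescaled, which is one simple translation- and scale-invariant encoding and is not necessarily the patent's posture encoding model (which could, for instance, be an OpenPose-style network).

```python
import numpy as np

def encode_pose(keypoints: np.ndarray, root: int = 0) -> np.ndarray:
    """Toy skeleton feature: keypoints centred on a root joint and scaled
    so the encoding is invariant to translation and overall body size."""
    centred = keypoints - keypoints[root]
    scale = np.linalg.norm(centred, axis=1).max()
    return (centred / max(scale, 1e-8)).ravel()

# Hypothetical 2D keypoints, e.g. head, left shoulder, right shoulder, hip
keypoints = np.array([[0.5, 0.1], [0.4, 0.3], [0.6, 0.3], [0.5, 0.6]])
skeleton_feature = encode_pose(keypoints)
print(skeleton_feature.shape)  # (8,) - 4 joints x 2 coordinates
```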
Optionally, the computer-readable storage medium may also execute program code for the following steps: using a first residual network model in the image processing model to perform residual learning on the skeleton features; and using the first self-attention processing model to perform self-attention processing on the attribute information of the product to be displayed and the learned skeleton features to obtain the first self-attention processing result.
Optionally, the computer-readable storage medium may also execute program code for the following steps: calling an image feature extraction model to extract original image features from the product image; using a second residual network model in the image processing model to perform residual learning on the original image features; using a second self-attention processing model in the image processing model to perform self-attention processing on the learned original image features to obtain a second self-attention processing result, wherein the second self-attention processing result represents image features whose size is greater than a size threshold and represents the attribute information of the product to be displayed; and using the first self-attention processing model to receive the second self-attention processing result output by the second self-attention processing model, and performing self-attention processing on the second self-attention processing result and the learned skeleton features to obtain the first self-attention processing result.
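The residual-learning and joint self-attention step can be sketched as follows. `residual_block` (output = input + transformation(input)) and the joint attention over image and skeleton tokens are illustrative simplifications of the first/second residual network and self-attention models, with randomly initialized toy weights.

```python
import numpy as np

rng = np.random.default_rng(2)

def residual_block(x, w):
    """Residual learning: output = input + transformation(input)."""
    return x + np.tanh(x @ w)

def softmax_rows(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

img_feats = rng.standard_normal((16, 32))  # original image features (illustrative)
skel_feats = rng.standard_normal((4, 32))  # skeleton features (illustrative)
w = rng.standard_normal((32, 32)) * 0.1    # toy shared transform weights

img_learned = residual_block(img_feats, w)    # residual learning on image features
skel_learned = residual_block(skel_feats, w)  # residual learning on skeleton features
tokens = np.concatenate([img_learned, skel_learned])
attn = softmax_rows(tokens @ tokens.T / np.sqrt(32)) @ tokens  # joint self-attention
print(attn.shape)  # (20, 32)
```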
Optionally, the computer-readable storage medium may also execute program code for the following steps: using a second cross-attention processing model in the image processing model to perform cross-attention processing on the second self-attention processing result and the text features to obtain a second cross-attention processing result; determining the second cross-attention processing result as the original image features; determining the next residual network model after the second residual network model in the image processing model as the second residual network model, determining the next self-attention processing model after the second self-attention processing model in the image processing model as the second self-attention processing model, and determining the next cross-attention processing model after the second cross-attention processing model in the image processing model as the second cross-attention processing model; and executing again from the following step until the second residual network model has no next residual network model in the image processing model, the second self-attention processing model has no next self-attention processing model, and the second cross-attention processing model has no next cross-attention processing model: using the second residual network model in the image processing model to perform residual learning on the original image features.
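The "determine the next residual / self-attention / cross-attention model and repeat until there is no next model" wording describes iterating through a stack of transformer-style blocks. A hedged sketch, with toy weights standing in for the trained models:

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax_rows(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def make_block(dim):
    """One transformer-style block: residual learning, self-attention,
    then cross-attention against the text features."""
    w = rng.standard_normal((dim, dim)) * 0.1
    def block(x, text):
        x = x + np.tanh(x @ w)                                 # residual learning
        x = softmax_rows(x @ x.T / np.sqrt(dim)) @ x           # self-attention
        return softmax_rows(x @ text.T / np.sqrt(dim)) @ text  # cross-attention
    return block

features = rng.standard_normal((16, 32))     # original image features (illustrative)
text = rng.standard_normal((8, 32))          # text features (illustrative)
blocks = [make_block(32) for _ in range(4)]  # the chain of "next" models

for block in blocks:  # repeat until there is no next model
    features = block(features, text)
print(features.shape)
```

Each iteration's cross-attention output becomes the next block's input, mirroring "determining the second cross-attention processing result as the original image features".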
Optionally, the computer-readable storage medium may also execute program code for the following steps: determining the first cross-attention processing result as the skeleton features; determining the next residual network model after the first residual network model in the image processing model as the first residual network model, determining the next self-attention processing model after the first self-attention processing model in the image processing model as the first self-attention processing model, and determining the next cross-attention processing model after the first cross-attention processing model in the image processing model as the first cross-attention processing model; and executing again from the following step until the first residual network model has no next residual network model in the image processing model, the first self-attention processing model has no next self-attention processing model, and the first cross-attention processing model has no next cross-attention processing model: using the first residual network model in the image processing model to perform residual learning on the skeleton features, wherein the second self-attention processing models correspond one-to-one to the first self-attention processing models.
Optionally, the computer-readable storage medium may also execute program code for the following steps: performing image enhancement on the target image; and outputting the enhanced target image.
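The enhancement operation is left open in the text; a minimal example, assuming a simple linear contrast stretch as the chosen enhancement:

```python
import numpy as np

def enhance(image: np.ndarray) -> np.ndarray:
    """Toy enhancement: linear contrast stretch to the full [0, 1] range."""
    lo, hi = image.min(), image.max()
    if hi - lo < 1e-8:
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

# Low-contrast stand-in for the generated target image
target = np.random.default_rng(4).random((32, 32, 3)) * 0.5 + 0.25
enhanced = enhance(target)
print(float(enhanced.min()), float(enhanced.max()))  # 0.0 1.0
```

Other choices (sharpening, super-resolution, denoising) would fit the same "enhance then output" step.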
Optionally, the computer-readable storage medium may also execute program code for the following steps: identifying a virtual clothing store deployed on an e-commerce platform, and identifying a clothing product image to be processed from the virtual clothing store, wherein the image content of the clothing product image includes at least one clothing product to be displayed; obtaining text description information corresponding to the clothing product to be displayed in the clothing product image, wherein the text description information at least describes the wearing performance result displayed when the clothing product to be displayed is worn by a carrier; using the text description information to guide the image processing model to analyze the clothing product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image simulates the wearing performance result; and delivering the target image to the e-commerce platform.
Optionally, the computer-readable storage medium may also execute program code for the following steps: in response to an image input operation on an operation interface, displaying the product image to be processed on the operation interface, wherein the image content of the product image includes at least one product to be displayed; in response to a text input operation on the operation interface, displaying on the operation interface text description information corresponding to the product to be displayed in the product image, wherein the text description information at least describes the wearing performance result displayed when the product to be displayed is worn by a carrier; and in response to an image generation operation on the operation interface, displaying on the operation interface a target image matching the product image and the text description information, wherein the target image is obtained by using the text description information to guide the image processing model to analyze the product image, the image processing model is obtained by training a text-to-image model, and the image content of the target image simulates the wearing performance result.
Optionally, the computer-readable storage medium may also execute program code for the following steps: displaying the product image to be processed on the presentation screen of a virtual reality (VR) device or an augmented reality (AR) device, wherein the image content of the product image includes at least one product to be displayed; obtaining text description information corresponding to the product to be displayed in the product image, wherein the text description information at least describes the wearing performance result displayed when the product to be displayed is worn by a carrier; driving the VR or AR device to use the text description information to guide the image processing model to analyze the product image and obtain a target image, wherein the image processing model is obtained by training a text-to-image model, and the image content of the target image simulates the wearing performance result; and displaying the target image on the presentation screen of the VR or AR device.
Example 7
An embodiment of the present application may provide an electronic device, which may include a memory and a processor.
FIG. 16 is a block diagram of an electronic device for an image processing method according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present application described and/or claimed herein.
As shown in FIG. 16, the device 1600 includes a computing unit 1601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1602 or a computer program loaded from a storage unit 1608 into a random access memory (RAM) 1603. The RAM 1603 can also store various programs and data required for the operation of the device 1600. The computing unit 1601, the ROM 1602, and the RAM 1603 are connected to one another via a bus 1604. An input/output (I/O) interface 1605 is also connected to the bus 1604.
A number of components in the device 1600 are connected to the I/O interface 1605, including: an input unit 1606, such as a keyboard, a mouse, etc.; an output unit 1607, such as various types of displays, speakers, etc.; the storage unit 1608, such as a magnetic disk, an optical disk, etc.; and a communication unit 1609, such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 1609 allows the device 1600 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 1601 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial-intelligence computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc. The computing unit 1601 performs the various methods and processes described above, such as the image processing method. For example, in some embodiments, the image processing method may be implemented as a computer software program tangibly contained in a machine-readable medium, such as the storage unit 1608. In some embodiments, part or all of the computer program may be loaded and/or installed on the device 1600 via the ROM 1602 and/or the communication unit 1609. When the computer program is loaded into the RAM 1603 and executed by the computing unit 1601, one or more steps of the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 1601 may be configured to perform the image processing method in any other appropriate manner (e.g., by means of firmware).
Example 8
An embodiment of the present application further provides a computer program product. Optionally, in this embodiment, the computer program product may include a computer program, and when the computer program is executed by a processor, the image processing method of the embodiments of the present application described above is implemented.
According to an embodiment of the present application, an image processing method is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from that shown here.
The method embodiment provided in Example 8 of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. FIG. 17 is a hardware block diagram of a computer terminal (or mobile device) for implementing an image processing method according to an embodiment of the present application. As shown in FIG. 17, the computer terminal 170 (or mobile device) may include one or more processors 1702 (shown as 1702a, 1702b, ..., 1702n in the figure; the processor 1702 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)), a memory 1704 for storing data, and a transmission device 1706 for communication functions. In addition, it may also include a display, an input/output (I/O) interface, a universal serial bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power supply, and/or a camera. Those of ordinary skill in the art will understand that the structure shown in FIG. 17 is only illustrative and does not limit the structure of the above electronic device. For example, the computer terminal 170 may include more or fewer components than shown in FIG. 17, or have a configuration different from that shown in FIG. 17.
The hardware block diagram shown in FIG. 17 can serve not only as an exemplary block diagram of the computer terminal 170 (or mobile device) described above, but also as an exemplary block diagram of the server described above. In an optional embodiment, FIG. 18 shows, in block-diagram form, an embodiment that uses the computer terminal 170 (or mobile device) shown in FIG. 17 as a computing node in a computing environment 1801.
FIG. 18 is a block diagram of a computing environment for an image processing method according to an embodiment of the present application. As shown in FIG. 18, the computing environment 1801 includes multiple computing nodes (such as servers, shown as 1810-1, 1810-2, ... in the figure) running on a distributed network. Each computing node contains local processing and memory resources, and an end user 1802 can remotely run applications or store data in the computing environment 1801. An application can be provided as multiple services 1820-1, 1820-2, 1820-3, and 1820-4 in the computing environment 1801, representing services "F", "G", "I", and "H" respectively.
The end user 1802 can provide and access the services through a web browser or other software application on a client. In some embodiments, the end user 1802's provisioning and/or requests can be provided to an ingress gateway 1830. The ingress gateway 1830 may include a corresponding agent to handle provisioning and/or requests for the services (one or more services provided in the computing environment 1801).
The services are provided or deployed according to the various virtualization technologies supported by the computing environment 1801. In some embodiments, services may be provided according to virtual machine (VM)-based virtualization, container-based virtualization, and/or the like. In virtual machine-based virtualization, a virtual machine is initialized to simulate a real computer, executing programs and applications without directly touching any actual hardware resources. Whereas a virtual machine virtualizes a machine, under container-based virtualization a container can be started to virtualize an entire operating system, so that multiple workloads can run on a single operating-system instance.
In one embodiment based on container virtualization, several containers of a service can be assembled into a Pod (for example, a Kubernetes Pod). For example, as shown in FIG. 18, service 1820-2 can be equipped with one or more Pods 1840-1, 1840-2, ..., 1840-N (collectively referred to as Pods). A Pod may include an agent 1845 and one or more containers 1842-1, 1842-2, ..., 1842-M (collectively referred to as containers). One or more containers in the Pod handle requests related to one or more corresponding functions of the service, while the agent 1845 generally controls network functions related to the service, such as routing and load balancing. Other services may likewise be equipped with similar Pods.
During operation, executing a user request from the end user 1802 may require invoking one or more services in the computing environment 1801, and executing one or more functions of one service may require invoking one or more functions of another service. As shown in FIG. 18, service "F" 1820-1 receives the end user 1802's user request from the ingress gateway 1830; service "F" 1820-1 may call service "G" 1820-2, and service "G" 1820-2 may request service "I" 1820-3 to execute one or more functions.
The computing environment described above may be a cloud computing environment, in which resource allocation is managed by the cloud service provider, allowing functions to be developed without concern for implementing, tuning, or scaling servers. The computing environment allows developers to execute code that responds to events without building or maintaining complex infrastructure. Instead of scaling a single hardware device to handle the potential load, a service can be split into a set of functions that scale automatically and independently.
The various implementations of the systems and techniques described herein above can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard parts (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
The program code for implementing the methods of the present application can be written in any combination of one or more programming languages. The program code can be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, so that when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code can execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a standalone software package, or entirely on a remote machine or server.
本文中以上描述的系统和技术的各种实施方式可以在数字电子电路系统、集成电路系统、场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、芯片上系统的系统(SOC)、复杂可编程逻辑设备(CPLD)、计算机硬件、固件、软件、和/或它们的组合中实现。这些各种实施方式可以包括:实施在一个或者多个计算机程序中,该一个或者多个计算机程序可在包括至少一个可编程处理器的可编程系统上执行和/或解释,该可编程处理器可以是专用或者通用可编程处理器,可以从存储系统、至少一个输入装置、和至少一个输出装置接收数据和指令,并且将数据和指令传输至该存储系统、该至少一个输入装置、和该至少一个输出装置。Various implementations of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include: being implemented in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which can be a special purpose or general purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
In the context of the present application, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described herein), or any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
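The client-server arrangement described above can be sketched with Python's standard library. This is a minimal illustration, not part of the disclosed system: the echo-style protocol, the loopback host, and the `query` helper are assumptions made here for demonstration only.

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.BaseRequestHandler):
    """Server side: receives data over the network and sends a reply."""
    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(b"ack:" + data)

def query(host, port, payload):
    """Client side: connects to the server, sends data, returns the reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)
        return sock.recv(1024)

# Client and server are normally on separate machines joined by a LAN,
# WAN, or the Internet; here both run on the local machine for brevity.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address
reply = query(host, port, b"hello")
server.shutdown()
print(reply)  # b'ack:hello'
```

The client-server relationship here arises purely from the two programs: `EchoHandler` plays the server role and `query` the client role, exchanging data over a communication network exactly as the paragraph above describes.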
It should be noted that the serial numbers of the above embodiments of the present application are for description only and do not imply any ranking of the embodiments.
In the above embodiments of the present application, each embodiment is described with its own emphasis; for parts not detailed in a given embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, units, or modules, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of a given embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the various embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory, a random-access memory, a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred implementations of the present application. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present application, and such improvements and refinements shall also fall within the scope of protection of the present application.
Claims (21)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410721016.5A CN118691767A (en) | 2024-06-04 | 2024-06-04 | Image processing method, system, electronic device and computer program product |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN118691767A true CN118691767A (en) | 2024-09-24 |
Family
ID=92763797
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410721016.5A Pending CN118691767A (en) | 2024-06-04 | 2024-06-04 | Image processing method, system, electronic device and computer program product |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118691767A (en) |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12417598B2 (en) | Using machine learning models to generate a mirror representing an image of virtual try-on and styling of an actual user | |
| US12026843B2 (en) | Systems and methods for using machine learning models to effect virtual try-on and styling on actual users | |
| CN118298127B (en) | Three-dimensional model reconstruction and image generation method, device, storage medium and program product | |
| US12056743B2 (en) | Image and data processing methods and apparatuses | |
| CN104641400A (en) | User-controlled 3D simulation technology, providing enhanced realistic digital object viewing and interaction experience | |
| CN104504576A (en) | Method for achieving three-dimensional display interaction of clothes and platform | |
| CN118470218B (en) | A personalized customized drawing method and system | |
| CN118052907A (en) | Text map generation method and related device | |
| CN117710581A (en) | Virtual human clothing generation method, device, equipment and medium | |
| CN118658200A (en) | Image processing method, electronic device, storage medium and computer program product | |
| CN116668733A (en) | Virtual anchor live broadcast system and method and related device | |
| CN117332105A (en) | Method and electronic device for providing product matching information | |
| CN116739699A (en) | Method for providing commodity recommendation information and electronic equipment | |
| CN114779948B (en) | Method, device and equipment for controlling instant interaction of animation characters based on facial recognition | |
| CN114359471A (en) | Face image processing method, device and system | |
| CN111461807A (en) | AR automobile marketing system and method | |
| CN118691767A (en) | Image processing method, system, electronic device and computer program product | |
| CN117008763A (en) | Virtual clothes matching method and image generation model training method | |
| CN118097082B (en) | Virtual object image generation method, device, computer equipment and storage medium | |
| Peng et al. | Virtual fitting technology for smart e-commerce: a review and outlook of deep learning approaches | |
| CN120318355A (en) | Image processing method, computing device, electronic device and storage medium | |
| CN119342246A (en) | Video generation method, device, electronic device and readable storage medium | |
| Zhang | Virtual fitting technology for smart e-commerce: a review and outlook of deep learning approaches | |
| CN118313898A (en) | Product collocation result generation method, system, electronic equipment and storage medium | |
| Ashok et al. | Augmented Reality in Consumer Decision-Making: Enhancing Purchase Intent Through Immersive Experiences |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |