CN117008763A - Virtual clothes matching method and image generation model training method

Info

Publication number: CN117008763A
Application number: CN202310800449.5A
Authority: CN
Inventors: 田林睿; 王琪; 张昕荻; 卓力安; 张邦; 孙可; 曹健
Original assignee: Alibaba Damo Institute Hangzhou Technology Co Ltd
Current assignee: Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority date: 2023-06-30
Filing date: 2023-06-30
Publication date: 2023-11-07

Abstract

This application discloses a virtual clothing matching method and an image generation model training method. The method includes: in response to an input instruction acting on an operation interface, displaying clothing images of at least two types of virtual clothing on the operation interface; and in response to a clothing matching instruction acting on the operation interface, displaying a matching image of the at least two types of virtual clothing on the operation interface. The matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object. It is generated by an image generation model corresponding to the at least two types of virtual clothing, based on clothing text corresponding to the at least two types of virtual clothing, and the clothing text describes the at least two types of virtual clothing. This application solves the technical problem in the related art that virtual clothing matching through virtual fitting technology produces poor results and has limited application scenarios.

Description

Virtual clothing matching method, image generation model training method

Technical field

The present application relates to the field of data generation, and in particular to a virtual clothing matching method and an image generation model training method.

Background

Traditional virtual fitting is mostly based on 2D clothing pictures: the 2D clothing picture is warped to fit a virtual object in order to obtain the fitting effect. However, the fitting effect produced this way is mediocre, and garments with special styles cannot be simulated.

Summary of the invention

Embodiments of the present application provide a virtual clothing matching method and an image generation model training method, to at least solve the technical problem in the related art that virtual clothing matching through virtual fitting technology produces poor results and has limited application scenarios.

According to one aspect of the embodiments of the present application, a virtual clothing matching method is provided, including: in response to an input instruction acting on an operation interface, displaying clothing images of at least two types of virtual clothing on the operation interface; and in response to a clothing matching instruction acting on the operation interface, displaying a matching image of the at least two types of virtual clothing on the operation interface, where the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object, the matching image is generated by an image generation model corresponding to the at least two types of virtual clothing based on clothing text corresponding to the at least two types of virtual clothing, and the clothing text describes the at least two types of virtual clothing.

According to another aspect of the embodiments of the present application, a virtual clothing matching method is also provided, including: obtaining clothing images of at least two types of virtual clothing; determining an image generation model and clothing text corresponding to the at least two types of virtual clothing, where the clothing text describes the at least two types of virtual clothing; and using the image generation model to generate, based on the clothing text, a matching image of the at least two types of virtual clothing, where the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object.
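The flow described in this aspect can be sketched as follows. This is only an illustrative outline, not the application's implementation: the names `Garment`, `build_clothing_text`, `select_model` and `match_clothing`, the way the clothing text is assembled from per-garment descriptions, the registry keyed by garment types, and the stand-in model are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


# Hypothetical record for one virtual garment; the field names are assumptions.
@dataclass(frozen=True)
class Garment:
    garment_type: str   # e.g. "top" or "bottom"
    description: str    # short text describing the garment
    image_path: str     # location of the clothing image


# A "model" here is any callable that maps clothing text to a matching image.
ImageModel = Callable[[str], bytes]


def build_clothing_text(garments: List[Garment]) -> str:
    """Assemble one clothing text that describes every selected garment."""
    return "a model wearing " + " and ".join(g.description for g in garments)


def select_model(garments: List[Garment],
                 registry: Dict[FrozenSet[str], ImageModel]) -> ImageModel:
    """Pick the model registered for this combination of garment types."""
    key = frozenset(g.garment_type for g in garments)
    if len(key) < 2:
        raise ValueError("at least two types of virtual clothing are required")
    return registry[key]


def match_clothing(garments: List[Garment],
                   registry: Dict[FrozenSet[str], ImageModel]) -> bytes:
    """Determine the model and clothing text, then generate the matching image."""
    model = select_model(garments, registry)
    text = build_clothing_text(garments)
    return model(text)


def fake_model(text: str) -> bytes:
    """Stand-in for a text-to-image model: returns the prompt as bytes."""
    return text.encode("utf-8")
```

In a real system `fake_model` would be replaced by a call into a text-to-image model; the sketch only shows the order of the three steps.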

According to another aspect of the embodiments of the present application, a method for training an image generation model is also provided, including: constructing training images containing at least two types of virtual clothing; generating training text corresponding to the training images, where the training text describes the at least two types of virtual clothing; and training an initial generation model with the training images and the training text to obtain the image generation model, where the image generation model is used to generate a matching image of at least two types of virtual clothing based on clothing text corresponding to the at least two types of virtual clothing, the clothing text describes the at least two types of virtual clothing, and the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object.

According to another aspect of the embodiments of the present application, a virtual clothing matching method is also provided, including: displaying clothing images of at least two types of virtual clothing on a presentation screen of a virtual reality (VR) device or an augmented reality (AR) device; determining an image generation model and clothing text corresponding to the at least two types of virtual clothing, where the clothing text describes the at least two types of virtual clothing; using the image generation model to generate, based on the clothing text, a matching image of the at least two types of virtual clothing, where the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object; and driving the VR device or AR device to render and display the matching image.

According to another aspect of the embodiments of the present application, a virtual clothing matching method is also provided, including: obtaining clothing images of at least two types of virtual clothing by calling a first interface, where the first interface includes a first parameter whose value is the clothing images; determining an image generation model and clothing text corresponding to the at least two types of virtual clothing, where the clothing text describes the at least two types of virtual clothing; using the image generation model to generate, based on the clothing text, a matching image of the at least two types of virtual clothing, where the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object; and outputting the matching image in an operation interface by calling a second interface, where the second interface includes a second parameter whose value is the matching image.
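A minimal sketch of the two interface calls, assuming each interface is an ordinary function and that the "first parameter" and "second parameter" are the keyword arguments `clothing_images` and `matching_image`. The function names, the parameter names, the `screen` list that stands in for the operation interface, and the injected `generate` callable are hypothetical and are not defined by the application.

```python
from typing import Callable, List


def first_interface(clothing_images: List[str]) -> List[str]:
    """First interface: `clothing_images` plays the role of the first parameter."""
    if len(clothing_images) < 2:
        raise ValueError("expected clothing images of at least two types")
    return list(clothing_images)


def second_interface(matching_image: bytes, screen: List[bytes]) -> None:
    """Second interface: `matching_image` plays the role of the second parameter.

    `screen` is a plain list standing in for the operation interface.
    """
    screen.append(matching_image)


def run(clothing_images: List[str],
        generate: Callable[[List[str]], bytes],
        screen: List[bytes]) -> bytes:
    """Call the first interface, generate the image, call the second interface."""
    images = first_interface(clothing_images=clothing_images)
    matching = generate(images)  # model selection and text generation happen here
    second_interface(matching_image=matching, screen=screen)
    return matching
```

Passing the generator in as a callable keeps the interface layer independent of whichever model produces the matching image.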

According to another aspect of the embodiments of the present application, an electronic device is also provided, including: a memory storing an executable program; and a processor configured to run the program, where the program, when run, performs the method of any one of the above embodiments.

According to another aspect of the embodiments of the present application, a computer-readable storage medium is also provided. The computer-readable storage medium includes a stored executable program, where, when the executable program runs, the device on which the computer-readable storage medium resides is controlled to perform the method of any one of the above embodiments.

In the embodiments of the present application, in response to a text input instruction acting on the operation interface, clothing images of at least two types of virtual clothing can be displayed on the operation interface. In response to a clothing matching instruction acting on the operation interface, the image generation model and clothing text corresponding to the at least two types of virtual clothing can be determined; the image generation model then generates a matching image of the at least two types of virtual clothing based on the clothing text, and the matching image is displayed on the operation interface, thereby achieving virtual fitting. Notably, because the matching image is obtained by matching at least two types of virtual clothing to a virtual object, garments of specific styles can be simulated and different types of virtual clothing can be displayed together. Moreover, because the matching image is generated by the image generation model from the clothing text, text-to-image generation directly produces model pictures with higher realism and more image elements. This improves the virtual clothing matching effect and expands its application scenarios, and thus solves the technical problem in the related art that virtual clothing matching through virtual fitting technology produces poor results and has limited application scenarios.

It should be noted that the general description above and the detailed description below merely illustrate and explain the present application and do not limit it.

Brief description of the drawings

The drawings described here are provided for a further understanding of the present application and constitute a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:

Figure 1 is a schematic diagram of the hardware environment of a virtual reality device for a virtual clothing matching method according to an embodiment of the present application;

Figure 2 is a structural block diagram of a computing environment for a virtual clothing matching method according to an embodiment of the present application;

Figure 3 is a flow chart of a virtual clothing matching method according to Embodiment 1 of the present application;

Figure 4 is a schematic diagram of an optional operation interface according to an embodiment of the present application;

Figure 5 is a flow chart of an optional virtual clothing matching method according to an embodiment of the present application;

Figure 6 is a flow chart of a virtual clothing matching method according to Embodiment 2 of the present application;

Figure 7 is a flow chart of a model training method according to Embodiment 3 of the present application;

Figure 8 is a flow chart of a virtual clothing matching method according to Embodiment 4 of the present application;

Figure 9 is a schematic diagram of a virtual clothing matching result according to an embodiment of the present application;

Figure 10 is a flow chart of a virtual clothing matching method according to Embodiment 5 of the present application;

Figure 11 is a schematic diagram of a virtual clothing matching apparatus according to Embodiment 6 of the present application;

Figure 12 is a schematic diagram of a virtual clothing matching apparatus according to Embodiment 7 of the present application;

Figure 13 is a schematic diagram of a model training apparatus according to Embodiment 8 of the present application;

Figure 14 is a schematic diagram of a virtual clothing matching apparatus according to Embodiment 9 of the present application;

Figure 15 is a schematic diagram of a virtual clothing matching apparatus according to Embodiment 10 of the present application;

Figure 16 is a structural block diagram of a computer terminal according to an embodiment of the present application.

Detailed description

To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort shall fall within the scope of protection of this application.

It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, so that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.

First, some nouns or terms that appear in the description of the embodiments of this application are explained as follows:

AIGC: AI Generated Content, also called generative AI. In this application it mainly refers to generating matching images of virtual clothing with a diffusion model.

SD: Stable Diffusion, a large generative network model architecture.

LORA: Low Rank Adaptation, a small network model used to reduce the cost of fine-tuning large models.

BLIP2: Bootstrapping Language-Image Pre-training, a general and compute-efficient vision-language pre-training model that bootstraps vision-language pre-training from a frozen pre-trained image encoder and a large language model.
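To illustrate why a low-rank adapter reduces fine-tuning cost, the sketch below compares trainable parameter counts under the standard LoRA formulation, in which a frozen weight matrix W of shape d x k is adjusted by the product of two small matrices, B of shape d x r and A of shape r x k. The layer size and rank are arbitrary example values, not figures from this application.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters when only B (d x r) and A (r x k) are updated."""
    return r * (d + k)


# Example values only (assumed for illustration).
d, k, r = 768, 768, 4
full = full_finetune_params(d, k)
lora = lora_params(d, k, r)
print(f"full fine-tuning: {full} parameters")
print(f"rank-{r} adapter:  {lora} parameters ({lora / full:.2%} of full)")
```

With these example values the adapter trains 6144 parameters instead of 589824, a factor of 96 fewer for that layer.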

Embodiment 1

According to an embodiment of the present application, a virtual clothing matching method is provided. It should be noted that the steps shown in the flow charts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in a different order.

Figure 1 is a schematic diagram of the hardware environment of a virtual reality device for a virtual clothing matching method according to an embodiment of the present application. As shown in Figure 1, the virtual reality device 104 is connected to the terminal 106, and the terminal 106 is connected to the server 102 through a network. The virtual reality device 104 includes but is not limited to a virtual reality helmet, virtual reality glasses, or an all-in-one virtual reality headset. The terminal 106 is not limited to a PC, mobile phone, tablet, or the like. The server 102 may be a server of a media file operator. The network includes but is not limited to a wide area network, a metropolitan area network, or a local area network.

Optionally, the virtual reality device 104 of this embodiment includes a memory, a processor, and a transmission device. The memory stores an application program that can be used to perform the following: in response to an input instruction acting on an operation interface, displaying clothing images of at least two types of virtual clothing on the operation interface; and in response to a clothing matching instruction acting on the operation interface, displaying a matching image of the at least two types of virtual clothing on the operation interface, where the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object, the matching image is generated by an image generation model corresponding to the at least two types of virtual clothing based on clothing text corresponding to the at least two types of virtual clothing, and the clothing text describes the at least two types of virtual clothing. This solves the technical problem in the related art that virtual clothing matching through virtual fitting technology produces poor results and has limited application scenarios, and achieves the technical effect of improving the virtual clothing matching effect and expanding its application scenarios.

The terminal of this embodiment can be used to: display clothing images of at least two types of virtual clothing on a presentation screen of a virtual reality (VR) device or an augmented reality (AR) device; determine an image generation model and clothing text corresponding to the at least two types of virtual clothing, where the clothing text describes the at least two types of virtual clothing; use the image generation model to generate, based on the clothing text, a matching image of the at least two types of virtual clothing, where the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object; and send the matching image to the virtual reality device 104, which displays it at the target placement position after receiving it.

Optionally, the eye-tracking HMD (Head Mounted Display) and the eye tracking module of the virtual reality device 104 in this embodiment function as in the above embodiment. That is, the screen in the HMD displays real-time images, and the eye tracking module in the HMD obtains the real-time movement trajectory of the user's eyes. The terminal of this embodiment obtains the user's position and motion information in real three-dimensional space through a tracking system, and calculates the three-dimensional coordinates of the user's head in the virtual three-dimensional space as well as the user's viewing direction in the virtual three-dimensional space.

The hardware structure block diagram shown in Figure 1 can serve as an exemplary block diagram not only of the above AR/VR device (or mobile device) but also of the above server. In an optional embodiment, Figure 2 shows, as a block diagram, an embodiment that uses the AR/VR device (or mobile device) shown in Figure 1 as a computing node in a computing environment 201. Figure 2 is a structural block diagram of a computing environment for a virtual clothing matching method according to an embodiment of the present application. As shown in Figure 2, the computing environment 201 includes multiple computing nodes (such as servers), shown in the figure as 210-1, 210-2, ..., running on a distributed network. Each computing node contains local processing and memory resources, and an end user 202 can remotely run applications or store data in the computing environment 201. An application can be provided as multiple services 220-1, 220-2, 220-3 and 220-4 in the computing environment 201, representing services "A", "D", "E" and "H" respectively.

The end user 202 can provision and access services through a web browser or other software application on a client. In some embodiments, provisioning and/or requests from the end user 202 can be submitted to an ingress gateway 230. The ingress gateway 230 may include a corresponding proxy to handle provisioning and/or requests for services (one or more services provided in the computing environment 201).

Services are provided or deployed according to the various virtualization technologies supported by the computing environment 201. In some embodiments, services may be provided through virtual machine (VM) based virtualization, container-based virtualization, and/or similar approaches. VM-based virtualization simulates a real computer by initializing a virtual machine, and executes programs and applications without directly touching any actual hardware resources. Whereas a virtual machine virtualizes a machine, under container-based virtualization a container can be launched to virtualize the entire operating system (OS), so that multiple workloads can run on a single operating system instance.

In one embodiment based on container virtualization, several containers of a service can be assembled into a Pod (for example, a Kubernetes Pod). For example, as shown in Figure 2, the service 220-2 may be equipped with one or more Pods 240-1, 240-2, ..., 240-N (collectively referred to as Pods). A Pod may include a proxy 245 and one or more containers 242-1, 242-2, ..., 242-M (collectively referred to as containers). One or more containers in a Pod handle requests related to one or more corresponding functions of the service, and the proxy 245 typically controls network functions related to the service, such as routing and load balancing. Other services can be equipped with similar Pods.

During operation, executing a user request from the end user 202 may require invoking one or more services in the computing environment 201, and executing one or more functions of one service may require invoking one or more functions of another service. As shown in Figure 2, service "A" 220-1 receives the user request of the end user 202 from the ingress gateway 230, service "A" 220-1 can call service "D" 220-2, and service "D" 220-2 can request service "E" 220-3 to perform one or more functions.
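The call chain in this example can be sketched as plain function calls. This is an illustration under stated assumptions: the function names and string payloads are invented here, and in a real deployment each call would travel over the network through the service's proxy rather than being a local call.

```python
from typing import List


def service_e(request: str, trace: List[str]) -> str:
    """Service "E": performs the function requested by service "D"."""
    trace.append("E")
    return f"E({request})"


def service_d(request: str, trace: List[str]) -> str:
    """Service "D": delegates part of its work to service "E"."""
    trace.append("D")
    return f"D({service_e(request, trace)})"


def service_a(request: str, trace: List[str]) -> str:
    """Service "A": receives the user request and calls service "D"."""
    trace.append("A")
    return f"A({service_d(request, trace)})"


def ingress_gateway(request: str, trace: List[str]) -> str:
    """Entry point: forwards the end user's request to service "A"."""
    trace.append("gateway")
    return service_a(request, trace)


if __name__ == "__main__":
    calls: List[str] = []
    print(ingress_gateway("req", calls), calls)
```

The `trace` list records the order in which the gateway and services are reached, which mirrors the A, D, E sequence described above.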

The above computing environment may be a cloud computing environment in which resource allocation is managed by the cloud service provider, so that functions can be developed without having to provision, tune or scale servers. The computing environment allows developers to execute code in response to events without building or maintaining complex infrastructure. Instead of scaling a single hardware device to handle the potential load, a service can be split into a set of functions that scale automatically and independently.

In the above operating environment, this application provides the virtual clothing matching method shown in Figure 3. It should be noted that the virtual clothing matching method of this embodiment can be executed by the mobile terminal of the embodiment shown in Figure 1. Figure 3 is a flow chart of a virtual clothing matching method according to Embodiment 1 of the present application. As shown in Figure 3, the method may include the following steps:

Step S302: in response to an input instruction acting on the operation interface, display clothing images of at least two types of virtual clothing on the operation interface.

The operation interface may be an interface that the mobile terminal provides to the user for human-computer interaction. For example, it may be the operation interface shown in Figure 4, but it is not limited to this and may be another operation interface set by the user according to actual needs. The input instruction may be an instruction generated when the user selects clothing images on the operation interface. Taking the operation interface shown in Figure 4 as an example, the input instruction may be generated when the user selects, in the "clothing display area", the multiple virtual garments to be displayed together.

The at least two types of virtual clothing may be virtual garments that the user wants to display together. The types may include, but are not limited to, tops, bottoms, shoes, hats, gloves, accessories, bags and umbrellas. The embodiments of the present application are described using two types of virtual clothing, tops and bottoms, as an example, but are not limited to these.

In an optional embodiment, the user can operate the operation interface provided by the mobile terminal to select the multiple virtual garments to be displayed together, thereby generating an input instruction. After receiving the input instruction, the mobile terminal can mark the clothing images of the selected virtual garments on the operation interface, so that the user can confirm whether the selected clothing images are correct.

For example, consider the seller-side generation needs in an e-commerce scenario. To generate model pictures of their products that rival professional photography, the clothing products sold by the seller can be shown in the "clothing display area" of the operation interface shown in Figure 4, and the seller can select multiple clothing products there. If the seller wants to display a green shirt together with blue jeans, the seller can select the image of the green shirt and the image of the blue jeans in the "clothing display area". The mobile terminal then generates an input instruction and marks the selected images, for example by adding a "√" mark to each selected image in the "clothing display area" for the seller to review.

As another example, consider the buyer side of an e-commerce scenario. While shopping for clothing, a buyer may want to see how several garments under consideration look when worn together. The buyer may select those garments in the "clothing display area" of the operation interface shown in Figure 4. For instance, a buyer who wants to see a green shirt combined with blue jeans may select the image of the green shirt and the image of the blue jeans in the "clothing display area". The mobile terminal may then generate the input instruction and mark the selected images, for example by adding a "√" mark to each selected image in the "clothing display area" for the buyer to review.
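The selection-and-marking behaviour described in the two examples above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the application's implementation: the function names `toggle_selection` and `render_display_area` and the garment identifiers are hypothetical.

```python
from typing import FrozenSet, List


def toggle_selection(selected: FrozenSet[str], garment_id: str) -> FrozenSet[str]:
    """Return a new selection with garment_id added, or removed if it was already selected."""
    if garment_id in selected:
        return selected - {garment_id}
    return selected | {garment_id}


def render_display_area(garment_ids: List[str], selected: FrozenSet[str]) -> List[str]:
    """Label every garment in the display area, prefixing selected ones with a check mark."""
    return [("√ " + g) if g in selected else g for g in garment_ids]


# Hypothetical catalogue shown in the "clothing display area".
catalog = ["green_shirt", "blue_jeans", "red_hat"]

selection: FrozenSet[str] = frozenset()
for clicked in ["green_shirt", "blue_jeans"]:
    selection = toggle_selection(selection, clicked)

labels = render_display_area(catalog, selection)
```

Keeping the selection as an immutable set makes each click produce a new state, which is one simple way to let the interface redraw the marks after every input.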

Step S304: in response to a clothing matching instruction acting on the operation interface, display a matching image of the at least two types of virtual clothing on the operation interface. The matching image represents the image obtained by matching the at least two types of virtual clothing to a virtual object. It is generated by the image generation model corresponding to the at least two types of virtual clothing, based on the clothing text corresponding to those types, and the clothing text describes the at least two types of virtual clothing.

The clothing matching instruction may be an instruction to display the at least two types of virtual clothing together. Optionally, it may be generated when the user clicks a preset control on the operation interface. Taking the operation interface shown in Figure 4 as an example, the clothing matching instruction may be generated when the user clicks the "virtual fitting" button.

The matching image may be a highly realistic model image generated by dressing a virtual object in the at least two types of virtual clothing. The virtual object may be a preset virtual model or a virtual model matched to the user, but is not limited to these. The image generation model may be a large model, for example an SD model, but is not limited to this. Because the image generation model has a large number of parameters, training the large model directly is difficult and costly. To avoid this, a low-rank model can be trained in advance and run in parallel with the large model, so that the two models perform inference jointly. The low-rank model may be a model with a smaller network structure, for example a LORA model, but is not limited to this.

In an optional embodiment, once the user has confirmed that the clothing images are correct, the user may click the preset control on the operation interface to generate the clothing matching instruction. The mobile terminal may then send the confirmed clothing images to the server. The server determines an image generation model capable of displaying the at least two types of virtual clothing together, as well as the clothing text describing those types, uses the image generation model to generate the matching image corresponding to the clothing text, and returns it to the mobile terminal. The mobile terminal may then display the matching image on the operation interface for the user to view.

For example, on the seller side of an e-commerce scenario, once the seller has confirmed that the selected clothing products are correct, the seller may click the "virtual fitting" button in the operation interface shown in Figure 4. The mobile terminal then receives the clothing matching instruction and sends the images of the selected clothing products to the server. The server determines an image generation model capable of displaying those products together, as well as the description information of the products (that is, the clothing text), uses the image generation model to generate the model image corresponding to the description information (that is, the matching image), and returns it to the mobile terminal. The mobile terminal displays it in the "clothing matching display area" of the operation interface, so the seller can see the model image generated for their own clothing products. This greatly improves the quality of product promotion while keeping costs low.

As another example, on the buyer side of an e-commerce scenario, once the buyer has confirmed that the garments under consideration are correct, the buyer may click the "virtual fitting" button in the operation interface shown in Figure 4. The mobile terminal then receives the clothing matching instruction and sends the images of those garments to the server. The server determines an image generation model capable of displaying the garments together, as well as their description information (that is, the clothing text), uses the image generation model to generate the model image corresponding to the description information (that is, the matching image), and returns it to the mobile terminal. The mobile terminal displays it in the "clothing matching display area" of the operation interface, so the buyer can see how the garments look when worn together, which provides a reference for the purchase decision.

With the solution provided by the above embodiments of the present application, clothing images of at least two types of virtual clothing can be displayed on the operation interface in response to a text input instruction acting on the operation interface. In response to a clothing matching instruction acting on the operation interface, the image generation model and the clothing text corresponding to the at least two types of virtual clothing can be determined. The image generation model then generates the matching image of the at least two types of virtual clothing based on the clothing text, and the matching image is displayed on the operation interface, which achieves virtual fitting. Notably, because the matching image is obtained by matching at least two types of virtual clothing to a virtual object, garments of specific styles can be simulated and different types of virtual clothing can be displayed together. Moreover, because the matching image is generated by the image generation model from the clothing text, text-to-image generation directly produces model images with higher realism and more image elements. This improves the virtual clothing matching effect and extends its application scenarios, and thereby solves the technical problem in the related art that virtual clothing matching through virtual fitting technology performs poorly and has limited application scenarios.

In the above embodiments of the present application, the method further includes: using the feature extraction module in the image generation model to extract features from the clothing text, obtaining clothing features; and using the generation module in the image generation model to generate the matching image based on the clothing features and object features, where the object features represent the virtual object.

The feature extraction module may be the Encoder of the image generation model, and the generation module may include the UNet (a U-shaped convolutional neural network) and the Decoder of the image generation model, but they are not limited to these.

The object features may be the features of the virtual object contained in the matching image. If the user has not entered description information for the virtual object, the features of a preset virtual object may be used as the object features. If the user has entered description information for the virtual object, features may be extracted from that description information and used as the object features.

In an optional embodiment, the feature extraction module may be used to extract clothing features, which may include, but are not limited to, clothing color, clothing material, and clothing style. The generation module may then process the clothing features and the object features to perform text-to-image generation and produce the matching image.

In the above embodiments of the present application, object text describing the virtual object is also displayed on the operation interface, and the method further includes: using the feature extraction module to extract features from the object text, obtaining the object features.

In an optional embodiment, to ensure that the matching image meets the user's virtual fitting requirements, the user may specify the virtual object in the matching image. That is, while selecting multiple virtual garments, the user may also enter object text, for example "1 standing girl with short hair". Features are then extracted from the object text to obtain the object features, and the generation module can generate a matching image containing a virtual object wearing the selected virtual garments.

In the above embodiments of the present application, environment text describing the virtual environment in which the virtual object is located is also displayed on the operation interface. Using the generation module in the image generation model to generate the matching image based on the clothing features and the object features includes: using the feature extraction module to extract features from the environment text, obtaining environment features; and using the generation module to generate the matching image based on the clothing features, the object features, and the environment features.

The virtual environment may be the background to be shown in the matching image, such as walls, floors, or windows, but is not limited to these.

In an optional embodiment, to better highlight the virtual fitting result, the user may specify the background elements of the matching image. That is, while selecting multiple virtual garments, the user may also enter environment text, for example "simple background". Features are then extracted from the environment text to obtain the environment features, and the generation module can generate a matching image containing both the background and a virtual object wearing the selected virtual garments.
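The way the clothing text, the optional object text, and the optional environment text feed the generation module can be sketched as below. This is a hedged illustration only: `extract_features` is a hypothetical stand-in for the Encoder (a real encoder would return embedding vectors, not strings), and `DEFAULT_OBJECT_TEXT` and `build_conditioning` are assumed names that do not come from the application.

```python
from typing import Dict, List, Optional

# Hypothetical description of the preset virtual object used when no object text is entered.
DEFAULT_OBJECT_TEXT = "a preset virtual model"


def extract_features(text: str) -> List[str]:
    """Stand-in for the feature extraction module: split the text into comma-separated phrases."""
    return [part.strip() for part in text.split(",") if part.strip()]


def build_conditioning(
    clothing_text: str,
    object_text: Optional[str] = None,
    environment_text: Optional[str] = None,
) -> Dict[str, List[str]]:
    """Collect the feature groups that would be passed to the generation module."""
    conditioning = {
        "clothing": extract_features(clothing_text),
        # Fall back to the preset virtual object when the user entered no object text.
        "object": extract_features(object_text or DEFAULT_OBJECT_TEXT),
    }
    if environment_text:
        conditioning["environment"] = extract_features(environment_text)
    return conditioning


conditioning = build_conditioning(
    clothing_text="green shirt, blue jeans",
    object_text="1 standing girl with short hair",
    environment_text="simple background",
)
```

Keeping the three groups separate mirrors the description, where clothing, object, and environment features are extracted individually before generation.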

In the above embodiments of the present application, the image generation model includes a trained generation model and a large model. The large model contains at least one first module, connected in sequence. The trained generation model contains at least one second module, and each second module matches a first module. The method further includes: using the current first module among the at least one first module to generate first matching data based on the clothing text; using the current second module among the at least one second module to generate second matching data based on the clothing text, where the current second module is the second module corresponding to the current first module; and merging the first matching data and the second matching data to obtain merged data. If the current first module is the last of the at least one first module, the merged data is the matching image. If the current first module is not the last of the at least one first module, the merged data is input both to the next first module among the at least one first module and to the next second module among the at least one second module.

The large model consists of multiple network modules connected in sequence (the first modules). To avoid the high cost of adjusting the parameters of the large model, a low-rank model (the trained generation model) can be trained in advance, and the network modules of the low-rank model correspond to the network modules of the large model. The first matching data and the second matching data may be feature vectors produced during model inference.

In an optional embodiment, based on the clothing text, a single network module of the large model generates the first matching data and the corresponding network module of the low-rank model generates the second matching data. The first matching data and the second matching data are then merged, and the merged data is passed on to the next sub-network module. The final result is the matching image of the at least two types of virtual clothing.
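The module-by-module flow described above can be sketched as a loop over paired modules. The application does not state how the two outputs are merged; the sketch below assumes element-wise addition, which is the usual choice for LoRA-style low-rank adapters. The name `run_paired_modules` and the toy modules are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Module = Callable[[Vector], Vector]


def run_paired_modules(
    first_modules: Sequence[Module],
    second_modules: Sequence[Module],
    features: Vector,
) -> Vector:
    """Run each first module and its matching second module on the same input and merge the outputs."""
    if len(first_modules) != len(second_modules):
        raise ValueError("every first module needs a matching second module")
    data = list(features)
    for first, second in zip(first_modules, second_modules):
        first_out = first(data)    # first matching data (large model)
        second_out = second(data)  # second matching data (low-rank model)
        # Assumed merge rule: element-wise sum. The merged data feeds the next pair of modules.
        data = [a + b for a, b in zip(first_out, second_out)]
    return data  # after the last pair, this stands in for the matching image


# Toy modules: the large model doubles, then shifts; the low-rank model adds a small correction.
large = [lambda v: [2.0 * e for e in v], lambda v: [e + 1.0 for e in v]]
low_rank = [lambda v: [0.5 * e for e in v], lambda v: [0.0 for _ in v]]

result = run_paired_modules(large, low_rank, [1.0, 2.0])
```

In this toy run the first pair turns `[1.0, 2.0]` into `[2.5, 5.0]`, and the second pair turns that into `[3.5, 6.0]`.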

In the above embodiments of the present application, the method further includes: constructing training images containing at least two types of virtual clothing; generating training text corresponding to the training images, where the training text describes the at least two types of virtual clothing; and training an initial generation model with the training images and the training text to obtain the trained generation model.

In an optional embodiment, the matching image usually needs to show not just a single garment on a model but several types of garments worn together. Images in which each of the different types of virtual clothing to be displayed together is worn by a model can therefore be constructed in advance; these are the training images. The corresponding training text may be written manually from the training images and used as their labels, or it may be generated by an image-text inference model that performs image-to-text generation. The training images and their labels then form the final training data used to train the initial generation model.

It should be noted that, to reduce the cost of generating matching images, a small number of training images may be provided for training. The number of training images may be 6, for example, but is not limited to this.

In the above embodiments of the present application, constructing training images containing at least two types of virtual clothing includes: obtaining an original image whose displayed content contains at least two types of virtual clothing and at least one object; segmenting the region of the original image where the at least one object is located to obtain a segmented image; and superimposing the segmented image on a preset image to generate the training image.

The original image may be a professionally photographed image of models wearing different types of virtual clothing, and it may contain one or more models (the at least one object).

The preset image may be an image with a preset background. To prevent background elements from affecting matching image generation, the background of the preset image may be a solid color that contains no background elements, but it is not limited to this.

In an optional embodiment, so that the background of the matching image can later be generated from background elements entered by the user, an image used as a training image should have a clean background and a clearly defined subject. A clothing image specified by the user, or a preset clothing image, may be obtained as the original image. The original image usually contains multiple virtual garments worn by a model, so the region where the model is located can be segmented, and the different types of virtual clothing can also be segmented from one another, which prevents the outfit combination already present in the original image from affecting the matching image. After the segmented image has been cropped to a specified size, it can be pasted onto a solid-color background to construct the training image.
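The "paste the segmented region onto a solid-color background" step can be sketched with plain nested lists. This is an illustration under assumptions, not the application's pipeline: the mask is taken as given (a real system would obtain it from a segmentation model), cropping is omitted, and `composite_on_background` is a hypothetical name.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]


def composite_on_background(
    image: Image, mask: Mask, background: Pixel = (255, 255, 255)
) -> Image:
    """Keep pixels where the mask is True and replace all others with a solid background color."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [
        [pixel if keep else background for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# 2x2 toy image: the left column belongs to the model, the right column is original background.
original: Image = [
    [(10, 120, 30), (200, 180, 160)],
    [(20, 40, 200), (190, 170, 150)],
]
person_mask: Mask = [
    [True, False],
    [True, False],
]

training_image = composite_on_background(original, person_mask)
```

The default white background is only an example of a solid color; any uniform color would serve the same purpose of removing background elements.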

In the above embodiments of the present application, obtaining the original image includes: displaying at least one image set in the operation interface, where different image sets contain different combinations of the at least two types of virtual clothing; and, in response to a selection instruction acting on the at least one image set, obtaining the image set corresponding to the selection instruction to obtain the original image.

The at least one image set may consist of product model images from different sellers in an e-commerce scenario, or of product model images of different brands obtained from the Internet, but is not limited to these.

The selection instruction may be generated when the user clicks on one of the at least one image set, but is not limited to this.

In an optional embodiment, to ensure that the generated matching image better meets the user's requirements, the user may specify the product model images of a particular brand or of a particular seller as the basis for the final training images. Optionally, the at least one image set may be displayed in the operation interface of the mobile terminal. The user selects one or more image sets according to the matching image to be generated, and clicking a selected image set generates the selection instruction. The mobile terminal can then determine the selected image set from the selection instruction and obtain the original image.

In the above embodiments of the present application, generating the training text corresponding to the training images includes: using an image-text inference model to perform text prediction on the training images, generating the training text.

The image-text inference model may be any model that can generate text from an image, for example BLIP2, but is not limited to this.

In an optional embodiment, to reduce the cost of labeling the training images by hand, a pre-trained image-text inference model may be used to perform text prediction on the training images and obtain the training text.
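Pairing each training image with either a manual label or an automatically predicted caption can be sketched as follows. The captioner here is a hypothetical callable standing in for an image-text inference model such as BLIP2; no real model API is invoked, and `TrainingPair`, `build_training_pairs`, `stub_captioner`, and the file names are assumed names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class TrainingPair:
    image_path: str  # path of a training image
    text: str        # training text used as the label of that image


def build_training_pairs(
    image_paths: List[str],
    captioner: Callable[[str], str],
    manual_labels: Optional[Dict[str, str]] = None,
) -> List[TrainingPair]:
    """Use a manual label when one exists; otherwise ask the captioner for a predicted caption."""
    manual_labels = manual_labels or {}
    pairs = []
    for path in image_paths:
        text = manual_labels.get(path) or captioner(path)
        pairs.append(TrainingPair(image_path=path, text=text))
    return pairs


def stub_captioner(image_path: str) -> str:
    """Hypothetical placeholder; a real system would run an image-to-text model on the image."""
    return f"caption predicted for {image_path}"


pairs = build_training_pairs(
    ["top_01.png", "bottom_01.png"],
    captioner=stub_captioner,
    manual_labels={"top_01.png": "the upper body of 1 girl, wearing a top, simple background"},
)
```

Passing the captioner as a parameter keeps the sketch independent of any particular model, so manual labeling and automatic labeling share one code path.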

The following describes the solution in detail with reference to Figure 5, taking the combined display of a top and a bottom in an e-commerce scenario as the example. As shown in Figure 5, the solution consists of three processes: clothing data preparation, LORA model training, and model inference.

In the clothing data preparation process, the training data for the LORA model is prepared in advance. The training data mainly consists of clothing images and text labels, that is, training images and training text. The original image of the specified clothing is input, a human body detection model performs human body segmentation, and the segmented human body region is pasted onto a solid-color background to construct the training image. An image-text inference model then labels the training image automatically and generates the corresponding label description. For example, the label description generated for a top may be "the upper body of 1 girl, wearing a top, simple background", and the label description generated for a bottom may be "the lower body of 1 girl, wearing jeans, simple background".

In the LORA model training process, a pre-trained large SD model is obtained. This model is able to generate high-quality clothing model images, that is, matching images. The training data is used to train a LORA model, and the LORA model is run in parallel with the SD model so that the specified matching image can be generated.

In the model inference process, the text data corresponding to the top and bottom (including the clothing text, the object text, and the environment text), for example "1 standing girl with short hair, wearing a green shirt and blue jeans, full body, simple background", guides the SD model, together with the trained LORA model, to generate the matching image, that is, an image of a model wearing the specified top and bottom.

It should be noted that the user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in the present application are all information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of the relevant data must comply with the applicable laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided so that the user can choose to grant or refuse authorization.

It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the described order of actions, because according to the present application certain steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.

From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software together with the necessary general-purpose hardware platform, and of course also by hardware. On this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.

Embodiment 2

According to an embodiment of the present application, a virtual clothing matching method is also provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

Figure 6 is a flowchart of the virtual clothing matching method according to Embodiment 2 of the present application. As shown in Figure 6, the method may include the following steps:

Step S602: obtain clothing images of at least two types of virtual clothing.

Step S604: determine the image generation model and the clothing text corresponding to the at least two types of virtual clothing, where the clothing text describes the at least two types of virtual clothing.

Step S606: use the image generation model to generate, based on the clothing text, a matching image of the at least two types of virtual clothing, where the matching image represents the image obtained by matching the at least two types of virtual clothing to a virtual object.

In an optional embodiment, the user may select clothing images of at least two types of virtual clothing on the mobile terminal, and the mobile terminal sends the clothing images to the server. The server obtains the clothing images sent by the mobile terminal, determines an image generation model capable of displaying the at least two types of virtual clothing together, as well as the clothing text describing those types, and uses the image generation model to generate the matching image of the at least two types of virtual clothing based on the clothing text. The server then returns the matching image to the mobile terminal, which displays it for the user to view.
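Steps S602 to S606 can be sketched as a single server-side function. Everything named here is an assumption made for illustration: `Garment`, `determine_clothing_text`, and `match_garments` are hypothetical, the model is represented by any callable that maps text to an image placeholder, and the check for at least two garment types is one possible reading of the claim language.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Garment:
    garment_id: str
    garment_type: str  # for example "top" or "bottom"
    description: str   # short text describing the garment


Model = Callable[[str], str]


def determine_clothing_text(garments: List[Garment]) -> str:
    """S604 (text part): build one clothing text describing all selected garments."""
    return "wearing " + " and ".join(g.description for g in garments)


def match_garments(
    garments: List[Garment], select_model: Callable[[FrozenSet[str]], Model]
) -> str:
    """S602 to S606: take the selected garments, pick a model and text, and generate the matching image."""
    garment_types = frozenset(g.garment_type for g in garments)
    if len(garment_types) < 2:
        raise ValueError("at least two types of virtual clothing are required")
    model = select_model(garment_types)              # S604 (model part)
    clothing_text = determine_clothing_text(garments)
    return model(clothing_text)                      # S606


def select_stub_model(garment_types: FrozenSet[str]) -> Model:
    """Hypothetical lookup; returns a stub that describes the image instead of rendering it."""
    return lambda text: f"<matching image for: {text}>"


outfit = [
    Garment("g1", "top", "a green shirt"),
    Garment("g2", "bottom", "blue jeans"),
]
matching_image = match_garments(outfit, select_stub_model)
```

In a deployment along the lines described above, the returned value would be image data sent back to the mobile terminal rather than a placeholder string.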

It should be noted that the preferred implementations involved in the above embodiment are the same as the solutions, application scenarios, and implementation processes provided in Embodiment 1, but are not limited to the solutions provided in Embodiment 1.

Embodiment 3

According to an embodiment of the present application, a model training method is also provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

图7是根据本申请实施例3的模型训练方法的流程图。如图7所示，该方法可以包括如下步骤：Figure 7 is a flow chart of a model training method according to Embodiment 3 of the present application. As shown in Figure 7, the method may include the following steps:

Step S702: Construct a training image containing at least two types of virtual clothing.

Step S704: Generate training text corresponding to the training image, where the training text describes the at least two types of virtual clothing.

Step S706: Train an initial generation model using the training image and the training text to obtain an image generation model, where the image generation model is used to generate a matching image of at least two types of virtual clothing based on clothing text corresponding to the at least two types of virtual clothing, the clothing text describes the at least two types of virtual clothing, and the matching image represents the image obtained by matching the at least two types of virtual clothing to a virtual object.

In an optional embodiment, the server may train an image generation model in advance. Specifically, images in which the different types of virtual clothing to be displayed together are each worn on a model may be constructed in advance; these are the training images described above. Further, the corresponding training text may be written manually from the training images to serve as their labels, or an image-text inference model may be used to perform image-to-text generation and produce the training text as the labels. Finally, the training images and their labels are used as the training data to train the image generation model.
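The pairing of training images with labels can be sketched as follows. This is a minimal sketch under stated assumptions: `TrainingSample`, `build_training_set`, and the `captioner` callable are hypothetical names, and the captioner stands in for the image-text inference model without implementing one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TrainingSample:
    image_path: str   # training image showing the clothing worn on a model
    caption: str      # training text used as the label


def build_training_set(
    image_paths: List[str],
    manual_captions: Dict[str, str],
    captioner: Optional[Callable[[str], str]] = None,
) -> List[TrainingSample]:
    """Pair each training image with a manual or model-generated caption."""
    samples: List[TrainingSample] = []
    for path in image_paths:
        caption = manual_captions.get(path)
        if caption is None and captioner is not None:
            # Fall back to the (hypothetical) image-to-text model.
            caption = captioner(path)
        if caption is None:
            continue  # no label available, so the image is not used
        samples.append(TrainingSample(path, caption))
    return samples
```

Manual labels take priority here; that ordering is a design choice of the sketch, not something the application prescribes.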

Embodiment 4

According to an embodiment of the present application, a virtual clothing matching method is also provided that can be applied in virtual reality scenarios, such as on virtual reality (VR) devices and augmented reality (AR) devices. It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.

Figure 8 is a flowchart of the virtual clothing matching method according to Embodiment 4 of the present application. As shown in Figure 8, the method may include the following steps:

Step S802: Display clothing images of at least two types of virtual clothing on the presentation screen of a VR device or an AR device.

Step S804: Determine the image generation model and the clothing text corresponding to the at least two types of virtual clothing, where the clothing text describes the at least two types of virtual clothing.

Step S806: Using the image generation model, generate a matching image of the at least two types of virtual clothing based on the clothing text, where the matching image represents the image obtained by matching the at least two types of virtual clothing to a virtual object.

Step S808: Drive the VR device or AR device to render and display the matching image.

Optionally, in this embodiment, the above virtual clothing matching method may be applied in a hardware environment composed of a server and a virtual reality device. The matching image of the virtual clothing is displayed on the presentation screen of the VR device or AR device. The server may be a server corresponding to a media file operator. The network includes but is not limited to a wide area network, a metropolitan area network, or a local area network. The virtual reality device is not limited to a virtual reality helmet, virtual reality glasses, a virtual reality all-in-one machine, and the like.

Optionally, the virtual reality device includes a memory, a processor, and a transmission device. The memory is used to store an application program, which can be used to: display clothing images of at least two types of virtual clothing on the presentation screen of the VR device or AR device; determine the image generation model and the clothing text corresponding to the at least two types of virtual clothing, where the clothing text describes the at least two types of virtual clothing; use the image generation model to generate a matching image of the at least two types of virtual clothing based on the clothing text, where the matching image represents the image obtained by matching the at least two types of virtual clothing to a virtual object; and drive the VR device or AR device to render and display the matching image.

It should be noted that the virtual clothing matching method applied in a VR device or AR device in this embodiment may include the method of the embodiment shown in Figure 3, so that the VR device or AR device is driven to present the virtual clothing matching.

Optionally, the processor of this embodiment may call the application program stored in the memory through the transmission device to perform the above steps. The transmission device may receive media files sent by the server over the network, and may also be used for data transmission between the processor and the memory.

Optionally, the virtual reality device includes a head-mounted display (HMD) with eye tracking. The screen in the HMD displays the presented video picture; the eye-tracking module in the HMD obtains the real-time movement trajectory of the user's eyes; the tracking system tracks the user's position and motion information in real three-dimensional space; and the computing unit obtains the user's real-time position and motion information from the tracking system and calculates the three-dimensional coordinates of the user's head in the virtual three-dimensional space, the user's field-of-view orientation in the virtual three-dimensional space, and so on.

In the embodiments of the present application, the virtual reality device may be connected to a terminal, and the terminal is connected to the server through a network. The virtual reality device is not limited to a virtual reality helmet, virtual reality glasses, a virtual reality all-in-one machine, and the like. The terminal is not limited to a PC, a mobile phone, a tablet computer, and the like. The server may be a server corresponding to a media file operator, and the network includes but is not limited to a wide area network, a metropolitan area network, or a local area network.

Figure 9 is a schematic diagram of a virtual clothing matching result according to an embodiment of the present application. As shown in Figure 9, clothing images of at least two types of virtual clothing are displayed on the presentation screen of the VR device or AR device. The image generation model corresponding to the at least two types of virtual clothing can then generate a matching image of the at least two types of virtual clothing based on the corresponding clothing text, for example "1 standing girl with short hair, full body wearing a green shirt and blue jeans, simple background", and the VR device or AR device is driven to render and display the matching image, such as the model picture shown in Figure 9.
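One simple way to assemble a text of the kind quoted above is sketched below. The function `compose_prompt` and its parameters are hypothetical, and joining the object, clothing, and environment descriptions into a single comma-separated string is an assumption of this sketch; the application does not prescribe how the text is assembled.

```python
from typing import List, Optional


def compose_prompt(
    object_text: str,
    clothing_items: List[str],
    environment_text: Optional[str] = None,
) -> str:
    """Join object, clothing, and optional environment descriptions into one text."""
    if len(clothing_items) < 2:
        raise ValueError("at least two types of virtual clothing are required")
    clothing_text = "wearing a " + " and ".join(clothing_items)
    parts = [object_text, clothing_text]
    if environment_text:
        parts.append(environment_text)
    return ", ".join(parts)


prompt = compose_prompt(
    "1 standing girl with short hair",
    ["green shirt", "blue jeans"],
    "simple background",
)
print(prompt)
```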

Embodiment 5

Figure 10 is a flowchart of the virtual clothing matching method according to Embodiment 5 of the present application. As shown in Figure 10, the method may include the following steps:

Step S1002: Obtain clothing images of at least two types of virtual clothing by calling a first interface, where the first interface includes a first parameter, and the value of the first parameter is the clothing images.

The first interface may be a physical or virtual interface connecting the mobile terminal and the server, through which the clothing images can be transmitted from the mobile terminal to the server.

Step S1004: Determine the image generation model and the clothing text corresponding to the at least two types of virtual clothing, where the clothing text describes the at least two types of virtual clothing.

Step S1006: Using the image generation model, generate a matching image of the at least two types of virtual clothing based on the clothing text, where the matching image represents the image obtained by matching the at least two types of virtual clothing to a virtual object.

Step S1008: Output the matching image in the operation interface by calling a second interface, where the second interface includes a second parameter, and the value of the second parameter is the matching image.

The second interface may be a physical or virtual interface connecting the mobile terminal and the server, through which the matching image can be transmitted from the server to the mobile terminal.
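The two interface calls of this embodiment can be sketched as plain functions. All names here (`first_interface`, `second_interface`, `match_clothing`) are hypothetical; the `generate` callable stands in for steps S1004 and S1006, and images are represented as raw bytes purely for illustration.

```python
from typing import Callable, List


def first_interface(clothing_images: List[bytes]) -> List[bytes]:
    """Hypothetical first interface: its parameter carries the clothing images."""
    if len(clothing_images) < 2:
        # Proxy check: two clothing types need at least two clothing images.
        raise ValueError("expected clothing images for at least two clothing types")
    return list(clothing_images)


def second_interface(matching_image: bytes, show: Callable[[bytes], None]) -> None:
    """Hypothetical second interface: its parameter carries the matching image."""
    show(matching_image)


def match_clothing(
    clothing_images: List[bytes],
    generate: Callable[[List[bytes]], bytes],
    show: Callable[[bytes], None],
) -> None:
    received = first_interface(clothing_images)   # S1002
    matching_image = generate(received)           # stand-in for S1004 and S1006
    second_interface(matching_image, show)        # S1008
```

Passing `generate` and `show` as callables keeps the sketch independent of any particular model or display technology.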

Embodiment 6

According to an embodiment of the present application, a virtual clothing matching device for implementing the above virtual clothing matching method is also provided. As shown in Figure 11, the device 1100 includes a first display module 1102 and a second display module 1104.

The first display module 1102 is configured to display clothing images of at least two types of virtual clothing on the operation interface in response to a text input instruction acting on the operation interface. The second display module 1104 is configured to display a matching image of the at least two types of virtual clothing on the operation interface in response to a clothing matching instruction acting on the operation interface, where the matching image represents the image obtained by matching the at least two types of virtual clothing to a virtual object, the matching image is generated by the image generation model corresponding to the at least two types of virtual clothing based on the clothing text corresponding to the at least two types of virtual clothing, and the clothing text describes the at least two types of virtual clothing.

It should be noted here that the first display module 1102 and the second display module 1104 correspond to steps S302 to S304 in Embodiment 1. The two modules share the examples and application scenarios of the corresponding steps, but are not limited to the content disclosed in Embodiment 1. It should also be noted that the above modules or units may be hardware components, or software components stored in a memory and processed by one or more processors, and the above modules may also run, as part of the device, in the AR/VR device provided in Embodiment 1.

In the above embodiment of the present application, the device further includes a feature extraction module and an image generation module.

The feature extraction module of the device is configured to use the feature extraction module in the image generation model to extract features from the clothing text to obtain clothing features. The image generation module is configured to use the generation module in the image generation model to generate the matching image based on the clothing features and object features, where the object features characterize the virtual object.

In the above embodiment of the present application, object text describing the virtual object is also displayed on the operation interface, and the feature extraction module is further configured to use the feature extraction module in the image generation model to extract features from the object text to obtain the object features.

In the above embodiment of the present application, environment text describing the virtual environment in which the virtual object is located is also displayed on the operation interface. The feature extraction module is further configured to use the feature extraction module in the image generation model to extract features from the environment text to obtain environment features, and the image generation module is further configured to use the generation module to generate the matching image based on the clothing features, the object features, and the environment features.

In the above embodiment of the present application, the image generation model includes a trained generation model and a large model. The large model contains at least one first module connected in sequence, the trained generation model contains at least one second module, and each second module matches a first module. The device further includes a first generation module, a second generation module, a merging module, and an input module.

The first generation module is configured to generate first matching data based on the clothing text using the current first module among the at least one first module, in the case where the current first module is the first module among the at least one first module. The second generation module is configured to generate second matching data based on the clothing text using the current second module among the at least one second module, where the current second module is the second module corresponding to the current first module. The merging module is configured to merge the first matching data and the second matching data to obtain merged data, where the merged data is the matching image in the case where the current first module is the last module among the at least one first module. The input module is configured to, in the case where the current first module is not the last module among the at least one first module, input the merged data into the next first module among the at least one first module and into the next second module among the at least one second module, respectively.
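The stage-by-stage cooperation between the first modules and the second modules can be illustrated with a small sketch. The names `merge` and `run_paired_modules` are hypothetical, the modules are modeled as plain functions on lists of floats, and merging by element-wise sum is an assumption made only for illustration; the passage above does not specify the merge operation.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Module = Callable[[Vector], Vector]


def merge(first: Vector, second: Vector) -> Vector:
    """Assumed merge rule for this sketch: element-wise sum."""
    return [a + b for a, b in zip(first, second)]


def run_paired_modules(
    first_modules: Sequence[Module],
    second_modules: Sequence[Module],
    text_features: Vector,
) -> Vector:
    """Run matched first/second modules stage by stage, merging after each stage."""
    if not first_modules or len(first_modules) != len(second_modules):
        raise ValueError("each first module needs exactly one matching second module")
    current = text_features  # the first stage works from the clothing text features
    for first, second in zip(first_modules, second_modules):
        first_out = first(current)     # first matching data
        second_out = second(current)   # second matching data
        # Merged data feeds both next modules; after the last stage it is the result.
        current = merge(first_out, second_out)
    return current


large_model = [lambda v: [2 * x for x in v], lambda v: [x + 1 for x in v]]
trained_model = [lambda v: [0.5 * x for x in v], lambda v: [0.0 for _ in v]]
print(run_paired_modules(large_model, trained_model, [1.0, 2.0]))  # [3.5, 6.0]
```

In the toy run, the first stage yields `[2.0, 4.0]` and `[0.5, 1.0]`, merged to `[2.5, 5.0]`; the second stage yields `[3.5, 6.0]` and `[0.0, 0.0]`, merged to `[3.5, 6.0]`.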

In the above embodiment of the present application, the device further includes a construction module, a text generation module, and a training module.

The construction module is configured to construct a training image containing at least two types of virtual clothing. The text generation module is configured to generate training text corresponding to the training image, where the training text describes the at least two types of virtual clothing. The training module is configured to train the initial generation model using the training image and the training text to obtain the trained generation model.

In the above embodiment of the present application, the construction module includes an acquisition unit, a segmentation unit, and an overlay unit.

The acquisition unit is configured to acquire an original image, where the displayed content of the original image contains at least two types of virtual clothing and at least one object. The segmentation unit is configured to segment the region of the original image in which the at least one object is located to obtain a segmented image. The overlay unit is configured to superimpose the segmented image onto a preset image to generate the training image.
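The overlay step can be sketched with plain Python lists. This is a minimal sketch under stated assumptions: images are small grayscale grids, the boolean mask is taken as given (in practice it would come from a segmentation model, which is not shown), and `overlay_segment` is a hypothetical name.

```python
from typing import List

Image = List[List[int]]    # grayscale pixel grid
Mask = List[List[bool]]    # True where the object (with its clothing) is located


def overlay_segment(original: Image, mask: Mask, preset: Image) -> Image:
    """Copy the masked object region of the original image onto the preset image."""
    if not (len(original) == len(mask) == len(preset)):
        raise ValueError("original, mask, and preset must have the same height")
    result: Image = []
    for orig_row, mask_row, preset_row in zip(original, mask, preset):
        if not (len(orig_row) == len(mask_row) == len(preset_row)):
            raise ValueError("original, mask, and preset must have the same width")
        # Keep the original pixel inside the object region, the preset pixel elsewhere.
        result.append([o if m else p for o, m, p in zip(orig_row, mask_row, preset_row)])
    return result


original = [[1, 2], [3, 4]]
mask = [[True, False], [False, True]]
preset = [[0, 0], [0, 0]]
print(overlay_segment(original, mask, preset))  # [[1, 0], [0, 4]]
```

Using a uniform preset image as the background gives every training image the same simple backdrop, which is one plausible reason for the overlay step.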

In the above embodiment of the present application, the acquisition unit is further configured to display at least one image set in the operation interface, where different image sets contain different combinations of at least two types of virtual clothing, and, in response to a selection instruction acting on the at least one image set, to acquire the image set corresponding to the selection instruction to obtain the original image.

In the above embodiment of the present application, the text generation module includes a generation unit.

The generation unit is configured to use an image-text inference model to perform text prediction on the training image and generate the training text.

Embodiment 7

According to an embodiment of the present application, a virtual clothing matching device for implementing the above virtual clothing matching method is also provided. As shown in Figure 12, the device 1200 includes an acquisition module 1202, a determination module 1204, and a generation module 1206.

The acquisition module 1202 is configured to acquire clothing images of at least two types of virtual clothing. The determination module 1204 is configured to determine the image generation model and the clothing text corresponding to the at least two types of virtual clothing, where the clothing text describes the at least two types of virtual clothing. The generation module 1206 is configured to use the image generation model to generate a matching image of the at least two types of virtual clothing based on the clothing text, where the matching image represents the image obtained by matching the at least two types of virtual clothing to a virtual object.

It should be noted here that the acquisition module 1202, the determination module 1204, and the generation module 1206 correspond to steps S602 to S606 in Embodiment 2. The three modules share the examples and application scenarios of the corresponding steps, but are not limited to the content disclosed in Embodiment 2. It should also be noted that the above modules or units may be hardware components, or software components stored in a memory and processed by one or more processors, and the above modules may also run, as part of the device, in the AR/VR device provided in Embodiment 1.

Embodiment 8

According to an embodiment of the present application, a model training device for implementing the above model training method is also provided. As shown in Figure 13, the device 1300 includes a construction module 1302, a generation module 1304, and a training module 1306.

The construction module 1302 is configured to construct a training image containing at least two types of virtual clothing. The generation module 1304 is configured to generate training text corresponding to the training image, where the training text describes the at least two types of virtual clothing. The training module 1306 is configured to train an initial generation model using the training image and the training text to obtain an image generation model, where the image generation model is used to generate a matching image of at least two types of virtual clothing based on clothing text corresponding to the at least two types of virtual clothing, the clothing text describes the at least two types of virtual clothing, and the matching image represents the image obtained by matching the at least two types of virtual clothing to a virtual object.

It should be noted here that the construction module 1302, the generation module 1304, and the training module 1306 correspond to steps S702 to S706 in Embodiment 3. The three modules share the examples and application scenarios of the corresponding steps, but are not limited to the content disclosed in Embodiment 3. It should also be noted that the above modules or units may be hardware components, or software components stored in a memory and processed by one or more processors, and the above modules may also run, as part of the device, in the AR/VR device provided in Embodiment 1.

Embodiment 9

According to an embodiment of the present application, a virtual clothing matching device for implementing the above virtual clothing matching method is also provided. As shown in Figure 14, the device 1400 includes a first presentation module 1402, a determination module 1404, a generation module 1406, and a second presentation module 1408.

The first presentation module 1402 is configured to display clothing images of at least two types of virtual clothing on the presentation screen of a VR device or an AR device. The determination module 1404 is configured to determine the image generation model and the clothing text corresponding to the at least two types of virtual clothing, where the clothing text describes the at least two types of virtual clothing. The generation module 1406 is configured to use the image generation model to generate a matching image of the at least two types of virtual clothing based on the clothing text, where the matching image represents the image obtained by matching the at least two types of virtual clothing to a virtual object. The second presentation module 1408 is configured to drive the VR device or AR device to render and display the matching image.

It should be noted here that the first presentation module 1402, the determination module 1404, the generation module 1406, and the second presentation module 1408 correspond to steps S802 to S808 in Embodiment 4. The four modules share the examples and application scenarios of the corresponding steps, but are not limited to the content disclosed in Embodiment 4. It should also be noted that the above modules or units may be hardware components, or software components stored in a memory and processed by one or more processors, and the above modules may also run, as part of the device, in the AR/VR device provided in Embodiment 1.

Embodiment 10

According to an embodiment of the present application, a virtual clothing matching device for implementing the above virtual clothing matching method is also provided. As shown in Figure 15, the device 1500 includes a first calling module 1502, a determination module 1504, a generation module 1506, and a second calling module 1508.

The first calling module 1502 is configured to obtain clothing images of at least two types of virtual clothing by calling a first interface, where the first interface includes a first parameter, and the value of the first parameter is the clothing images. The determination module 1504 is configured to determine the image generation model and the clothing text corresponding to the at least two types of virtual clothing, where the clothing text describes the at least two types of virtual clothing. The generation module 1506 is configured to use the image generation model to generate a matching image of the at least two types of virtual clothing based on the clothing text, where the matching image represents the image obtained by matching the at least two types of virtual clothing to a virtual object. The second calling module 1508 is configured to output the matching image in the operation interface by calling a second interface, where the second interface includes a second parameter, and the value of the second parameter is the matching image.

It should be noted here that the first calling module 1502, the determination module 1504, the generation module 1506, and the second calling module 1508 correspond to steps S1002 to S1008 in Embodiment 5. The four modules share the examples and application scenarios of the corresponding steps, but are not limited to the content disclosed in Embodiment 5. It should also be noted that the above modules or units may be hardware components, or software components stored in a memory and processed by one or more processors, and the above modules may also run, as part of the device, in the AR/VR device provided in Embodiment 1.

Embodiment 11

Embodiments of the present application may provide an AR/VR device, which may be any AR/VR device in a group of AR/VR devices. Optionally, in this embodiment, the AR/VR device may also be replaced by a terminal device such as a mobile terminal.

Optionally, in this embodiment, the AR/VR device may be located in at least one of multiple network devices of a computer network.

In this embodiment, the AR/VR device may execute program code for the following steps of the virtual clothing matching method: in response to an input instruction acting on the operation interface, displaying clothing images of at least two types of virtual clothing on the operation interface; and in response to a clothing matching instruction acting on the operation interface, displaying a matching image of the at least two types of virtual clothing on the operation interface, where the matching image represents the image obtained by matching the at least two types of virtual clothing to a virtual object, the matching image is generated by the image generation model corresponding to the at least two types of virtual clothing based on the clothing text corresponding to the at least two types of virtual clothing, and the clothing text describes the at least two types of virtual clothing.

可选地，图16是根据本申请实施例的一种计算机终端的结构框图。如图16所示，该计算机终端A可以包括：一个或多个(图中仅示出一个)处理器1602、存储器1604、存储控制器、以及外设接口，其中，外设接口与射频模块、音频模块和显示器连接。Optionally, FIG. 16 is a structural block diagram of a computer terminal according to an embodiment of the present application. As shown in Figure 16, the computer terminal A may include: one or more (only one is shown in the figure) processors 1602, a memory 1604, a storage controller, and a peripheral interface, wherein the peripheral interface is connected with a radio frequency module, Audio module and monitor connections.

The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the virtual clothing matching method and apparatus in the embodiments of the present application. By running the software programs and modules stored in the memory, the processor executes various functional applications and data processing, that is, implements the virtual clothing matching method described above. The memory may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to terminal A through a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The processor may call, through a transmission device, the information and application programs stored in the memory to perform the following steps: in response to an input instruction acting on an operation interface, displaying clothing images of at least two types of virtual clothing on the operation interface; and in response to a clothing matching instruction acting on the operation interface, displaying a matching image of the at least two types of virtual clothing on the operation interface, wherein the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object, the matching image is generated by an image generation model corresponding to the at least two types of virtual clothing based on clothing text corresponding to the at least two types of virtual clothing, and the clothing text describes the at least two types of virtual clothing.

Optionally, the processor may further execute program code for the following steps: extracting features from the clothing text using a feature extraction module in the image generation model to obtain clothing features; and generating the matching image based on the clothing features and object features using a generation module in the image generation model, wherein the object features characterize the virtual object.

Optionally, object text describing the virtual object is also displayed on the operation interface, and the processor may further execute program code for the following step: extracting features from the object text using the feature extraction module to obtain the object features.

Optionally, environment text describing the virtual environment in which the virtual object is located is also displayed on the operation interface, and the processor may further execute program code for the following steps: extracting features from the environment text using the feature extraction module to obtain environment features; and generating the matching image based on the clothing features, the object features, and the environment features using the generation module.
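The feature extraction and generation steps above can be illustrated with a minimal Python sketch. It is not the model of the application: `extract_features`, `MatchingImage`, and `generate_matching_image` are hypothetical names, the character-hashing encoder is a toy stand-in for the feature extraction module, and combining the features by concatenation is an assumption, since the text does not say how the generation module consumes them.

```python
from dataclasses import dataclass
from typing import List, Optional


def extract_features(text: str, dim: int = 8) -> List[float]:
    """Toy stand-in for the feature extraction module (not a real text encoder)."""
    features = [0.0] * dim
    for index, char in enumerate(text):
        features[index % dim] += ord(char) / 1000.0
    return features


@dataclass
class MatchingImage:
    """Placeholder result; a real generation module would return pixel data."""
    conditioning: List[float]
    texts_used: List[str]


def generate_matching_image(
    clothing_text: str,
    object_text: str,
    environment_text: Optional[str] = None,
    dim: int = 8,
) -> MatchingImage:
    """Extract one feature vector per text and condition generation on all of them."""
    texts = [clothing_text, object_text]
    if environment_text is not None:
        texts.append(environment_text)
    conditioning: List[float] = []
    for text in texts:
        # Assumption: features are combined by simple concatenation.
        conditioning.extend(extract_features(text, dim))
    return MatchingImage(conditioning=conditioning, texts_used=texts)
```

Calling `generate_matching_image("red jacket, blue jeans", "a young woman", "city street")` conditions on three feature vectors; omitting the environment text falls back to clothing and object features only, mirroring the optional step.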

Optionally, the image generation model includes a trained generation model and a large model, the large model includes at least one first module connected in sequence, the trained generation model includes at least one second module, and each second module matches a first module. The processor may further execute program code for the following steps: generating first matching data based on the clothing text using a current first module of the at least one first module; generating second matching data based on the clothing text using a current second module of the at least one second module, wherein the current second module is the second module corresponding to the current first module; merging the first matching data and the second matching data to obtain merged data, wherein, when the current first module is the last of the at least one first module, the merged data is the matching image; and, when the current first module is not the last of the at least one first module, inputting the merged data to both the next first module of the at least one first module and the next second module of the at least one second module.
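The paired-module flow described above can be sketched as a simple loop. This is an illustrative sketch under stated assumptions, not the implementation of the application: the names `run_paired_modules`, `merge`, `large_model_block`, and `trained_block` are hypothetical, the merge rule (element-wise sum) is assumed because the text only says the two outputs are "merged", and the `initial` input to the first pair of modules is likewise an assumption.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Module = Callable[[Vector, str], Vector]


def merge(first: Vector, second: Vector) -> Vector:
    """Assumed merge rule: element-wise sum of the two branch outputs."""
    return [a + b for a, b in zip(first, second)]


def run_paired_modules(
    first_modules: Sequence[Module],
    second_modules: Sequence[Module],
    clothing_text: str,
    initial: Vector,
) -> Vector:
    """Run each first module with its matching second module and merge per stage."""
    if not first_modules or len(first_modules) != len(second_modules):
        raise ValueError("each first module needs exactly one matching second module")
    data = initial
    for first, second in zip(first_modules, second_modules):
        first_out = first(data, clothing_text)    # first matching data
        second_out = second(data, clothing_text)  # second matching data
        # The merged data feeds both next modules; after the last pair it is the result.
        data = merge(first_out, second_out)
    return data


def large_model_block(data: Vector, text: str) -> Vector:
    """Toy first module: scales its input (stands in for a large-model block)."""
    return [2.0 * x for x in data]


def trained_block(data: Vector, text: str) -> Vector:
    """Toy second module: shifts its input by the number of words in the text."""
    shift = float(len(text.split()))
    return [x + shift for x in data]
```

With two stages of the toy blocks, `run_paired_modules([large_model_block] * 2, [trained_block] * 2, "red jacket", [1.0, 2.0])` merges once per stage and returns the final merged data, which plays the role of the matching image.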

Optionally, the processor may further execute program code for the following steps: constructing a training image containing at least two types of virtual clothing; generating training text corresponding to the training image, wherein the training text describes the at least two types of virtual clothing; and training an initial generation model using the training image and the training text to obtain the trained generation model.

Optionally, the processor may further execute program code for the following steps: obtaining an original image, wherein the display content of the original image includes at least two types of virtual clothing and at least one object; segmenting the region of the original image in which the at least one object is located to obtain a segmented image; and superimposing the segmented image onto a preset image to generate the training image.
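The segment-and-superimpose construction of a training image can be sketched on tiny grayscale grids. This is a toy illustration only: `segment_object` and `overlay_on_preset` are hypothetical names, and the "segmentation" simply treats every pixel that differs from an assumed background value as part of the object, whereas a real system would use a learned segmentation model.

```python
from typing import List

Image = List[List[int]]   # grayscale pixel grid
Mask = List[List[bool]]   # True where the object (with its clothing) is located


def segment_object(original: Image, background_value: int = 0) -> Mask:
    """Toy segmentation: a pixel belongs to the object if it is not background."""
    return [[pixel != background_value for pixel in row] for row in original]


def overlay_on_preset(original: Image, mask: Mask, preset: Image) -> Image:
    """Superimpose the segmented region of the original onto the preset image."""
    same_shape = len(original) == len(preset) and all(
        len(o_row) == len(p_row) for o_row, p_row in zip(original, preset)
    )
    if not same_shape:
        raise ValueError("original and preset images must have the same shape")
    return [
        [o if keep else p for o, keep, p in zip(o_row, m_row, p_row)]
        for o_row, m_row, p_row in zip(original, mask, preset)
    ]
```

Replacing the background with a fixed preset image in this way keeps the object and clothing pixels unchanged while giving every training image the same surroundings.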

Optionally, the processor may further execute program code for the following steps: displaying at least one image set on the operation interface, wherein the at least two types of virtual clothing contained in different image sets are different; and in response to a selection instruction acting on the at least one image set, obtaining the image set corresponding to the selection instruction to obtain the original image.

Optionally, the processor may further execute program code for the following step: performing text prediction on the training image using an image-text inference model to generate the training text.
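Pairing each training image with predicted text can be sketched as follows. The sketch is hypothetical throughout: `TrainingSample`, `build_training_set`, and `toy_captioner` are illustrative names, and the captioner derives text from an image identifier purely so the example runs, whereas the application uses an image-text inference model on the image itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class TrainingSample:
    """One (training image, training text) pair for training the generation model."""
    image_id: str
    text: str


def build_training_set(
    image_ids: Iterable[str],
    caption_fn: Callable[[str], str],
) -> List[TrainingSample]:
    """Predict a training text for every training image and pair them up."""
    return [TrainingSample(image_id, caption_fn(image_id)) for image_id in image_ids]


def toy_captioner(image_id: str) -> str:
    """Hypothetical stand-in for the image-text inference model."""
    return "a model wearing " + image_id.replace("_", " ")
```

The resulting list of pairs is what a training loop for the initial generation model would consume; the loop itself is outside the scope of this sketch.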

The processor may call, through the transmission device, the information and application programs stored in the memory to perform the following steps: obtaining clothing images of at least two types of virtual clothing; determining an image generation model and clothing text corresponding to the at least two types of virtual clothing, wherein the clothing text describes the at least two types of virtual clothing; and generating, using the image generation model and based on the clothing text, a matching image of the at least two types of virtual clothing, wherein the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object.

The processor may call, through the transmission device, the information and application programs stored in the memory to perform the following steps: constructing a training image containing at least two types of virtual clothing; generating training text corresponding to the training image, wherein the training text describes the at least two types of virtual clothing; and training an initial generation model using the training image and the training text to obtain an image generation model, wherein the image generation model is used to generate, based on clothing text corresponding to the at least two types of virtual clothing, a matching image of the at least two types of virtual clothing, the clothing text describes the at least two types of virtual clothing, and the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object.

The processor may call, through the transmission device, the information and application programs stored in the memory to perform the following steps: displaying clothing images of at least two types of virtual clothing on a presentation screen of a virtual reality (VR) device or an augmented reality (AR) device; determining an image generation model and clothing text corresponding to the at least two types of virtual clothing, wherein the clothing text describes the at least two types of virtual clothing; generating, using the image generation model and based on the clothing text, a matching image of the at least two types of virtual clothing, wherein the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object; and driving the VR device or the AR device to render and display the matching image.

The processor may call, through the transmission device, the information and application programs stored in the memory to perform the following steps: calling a first interface to obtain clothing images of at least two types of virtual clothing, wherein the first interface includes a first parameter whose value is the clothing images; determining an image generation model and clothing text corresponding to the at least two types of virtual clothing, wherein the clothing text describes the at least two types of virtual clothing; generating, using the image generation model and based on the clothing text, a matching image of the at least two types of virtual clothing, wherein the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object; and outputting the matching image on the operation interface by calling a second interface, wherein the second interface includes a second parameter whose value is the matching image.
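The interface-based flow above can be sketched with two small functions standing in for the first and second interfaces. All names here (`first_interface`, `second_interface`, `match_via_interfaces`) are hypothetical, images are represented by plain strings, and the text lookup and generator are passed in as callables because the application does not specify how they are exposed.

```python
from typing import Callable, List


def first_interface(clothing_images: List[str]) -> List[str]:
    """First interface: its parameter value is the clothing images."""
    if len(clothing_images) < 2:
        raise ValueError("at least two types of virtual clothing are required")
    return list(clothing_images)


def second_interface(matching_image: str, screen: List[str]) -> None:
    """Second interface: its parameter value is the matching image to output."""
    screen.append(matching_image)  # 'screen' stands in for the operation interface


def match_via_interfaces(
    clothing_images: List[str],
    text_for: Callable[[str], str],
    generate: Callable[[str], str],
    screen: List[str],
) -> str:
    """Obtain images, determine the clothing text, generate, then output the result."""
    images = first_interface(clothing_images)
    # Assumption: the clothing text is the joined description of each clothing image.
    clothing_text = ", ".join(text_for(image) for image in images)
    matching_image = generate(clothing_text)
    second_interface(matching_image, screen)
    return matching_image
```

Keeping the two interfaces as thin wrappers around a single parameter each mirrors the description: the caller passes clothing images in through the first interface and receives the matching image through the second.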

The embodiments of the present application provide a virtual clothing matching solution. Because the matching image is obtained by matching at least two types of virtual clothing to a virtual object, clothing of specific styles can be simulated and different types of virtual clothing can be displayed together as an outfit. Moreover, because the matching image is generated by the image generation model based on the clothing text, text-to-image generation directly produces model images with higher realism and more image elements. This improves the virtual clothing matching effect and broadens the application scenarios of virtual clothing matching, thereby solving the technical problem in the related art that virtual clothing matching through virtual fitting technology performs poorly and has limited application scenarios.

Those of ordinary skill in the art will understand that the structure shown in FIG. 16 is only illustrative. The computer terminal may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a handheld computer, a mobile Internet device (Mobile Internet Device, MID), or a PAD. FIG. 16 does not limit the structure of the electronic device. For example, the computer terminal A may include more or fewer components than shown in FIG. 16 (such as a network interface or a display device), or may have a configuration different from that shown in FIG. 16.

Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing hardware related to the terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.

Embodiment 12

Embodiments of the present application further provide a computer-readable storage medium. Optionally, in this embodiment, the computer-readable storage medium may be used to store the program code executed by the virtual clothing matching method provided in Embodiment 1.

Optionally, in this embodiment, the computer-readable storage medium may be located in any computer terminal of an AR/VR device terminal group in an AR/VR device network, or in any mobile terminal of a mobile terminal group.

Optionally, in this embodiment, the computer-readable storage medium is configured to store program code for performing the following steps: in response to an input instruction acting on an operation interface, displaying clothing images of at least two types of virtual clothing on the operation interface; and in response to a clothing matching instruction acting on the operation interface, displaying a matching image of the at least two types of virtual clothing on the operation interface, wherein the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object, the matching image is generated by an image generation model corresponding to the at least two types of virtual clothing based on clothing text corresponding to the at least two types of virtual clothing, and the clothing text describes the at least two types of virtual clothing.

Optionally, the computer-readable storage medium is further configured to store program code for performing the following steps: extracting features from the clothing text using a feature extraction module in the image generation model to obtain clothing features; and generating the matching image based on the clothing features and object features using a generation module in the image generation model, wherein the object features characterize the virtual object.

Optionally, object text describing the virtual object is also displayed on the operation interface, and the computer-readable storage medium is further configured to store program code for performing the following step: extracting features from the object text using the feature extraction module to obtain the object features.

Optionally, environment text describing the virtual environment in which the virtual object is located is also displayed on the operation interface, and the computer-readable storage medium is further configured to store program code for performing the following steps: extracting features from the environment text using the feature extraction module to obtain environment features; and generating the matching image based on the clothing features, the object features, and the environment features using the generation module.

Optionally, the image generation model includes a trained generation model and a large model, the large model includes at least one first module connected in sequence, the trained generation model includes at least one second module, and each second module matches a first module. The computer-readable storage medium is further configured to store program code for performing the following steps: generating first matching data based on the clothing text using a current first module of the at least one first module; generating second matching data based on the clothing text using a current second module of the at least one second module, wherein the current second module is the second module corresponding to the current first module; merging the first matching data and the second matching data to obtain merged data, wherein, when the current first module is the last of the at least one first module, the merged data is the matching image; and, when the current first module is not the last of the at least one first module, inputting the merged data to both the next first module of the at least one first module and the next second module of the at least one second module.

Optionally, the computer-readable storage medium is further configured to store program code for performing the following steps: constructing a training image containing at least two types of virtual clothing; generating training text corresponding to the training image, wherein the training text describes the at least two types of virtual clothing; and training an initial generation model using the training image and the training text to obtain the trained generation model.

Optionally, the computer-readable storage medium is further configured to store program code for performing the following steps: obtaining an original image, wherein the display content of the original image includes at least two types of virtual clothing and at least one object; segmenting the region of the original image in which the at least one object is located to obtain a segmented image; and superimposing the segmented image onto a preset image to generate the training image.

Optionally, the computer-readable storage medium is further configured to store program code for performing the following steps: displaying at least one image set on the operation interface, wherein the at least two types of virtual clothing contained in different image sets are different; and in response to a selection instruction acting on the at least one image set, obtaining the image set corresponding to the selection instruction to obtain the original image.

Optionally, the computer-readable storage medium is further configured to store program code for performing the following step: performing text prediction on the training image using an image-text inference model to generate the training text.

Optionally, in this embodiment, the computer-readable storage medium is configured to store program code for performing the following steps: obtaining clothing images of at least two types of virtual clothing; determining an image generation model and clothing text corresponding to the at least two types of virtual clothing, wherein the clothing text describes the at least two types of virtual clothing; and generating, using the image generation model and based on the clothing text, a matching image of the at least two types of virtual clothing, wherein the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object.

Optionally, in this embodiment, the computer-readable storage medium is configured to store program code for performing the following steps: constructing a training image containing at least two types of virtual clothing; generating training text corresponding to the training image, wherein the training text describes the at least two types of virtual clothing; and training an initial generation model using the training image and the training text to obtain an image generation model, wherein the image generation model is used to generate, based on clothing text corresponding to the at least two types of virtual clothing, a matching image of the at least two types of virtual clothing, the clothing text describes the at least two types of virtual clothing, and the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object.

Optionally, in this embodiment, the computer-readable storage medium is configured to store program code for performing the following steps: displaying clothing images of at least two types of virtual clothing on a presentation screen of a virtual reality (VR) device or an augmented reality (AR) device; determining an image generation model and clothing text corresponding to the at least two types of virtual clothing, wherein the clothing text describes the at least two types of virtual clothing; generating, using the image generation model and based on the clothing text, a matching image of the at least two types of virtual clothing, wherein the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object; and driving the VR device or the AR device to render and display the matching image.

Optionally, in this embodiment, the computer-readable storage medium is configured to store program code for performing the following steps: calling a first interface to obtain clothing images of at least two types of virtual clothing, wherein the first interface includes a first parameter whose value is the clothing images; determining an image generation model and clothing text corresponding to the at least two types of virtual clothing, wherein the clothing text describes the at least two types of virtual clothing; generating, using the image generation model and based on the clothing text, a matching image of the at least two types of virtual clothing, wherein the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object; and outputting the matching image on the operation interface by calling a second interface, wherein the second interface includes a second parameter whose value is the matching image.

The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.

In the above embodiments of the present application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the relevant descriptions of other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

The above are only preferred embodiments of the present application. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present application, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present application.

Claims

1. A virtual clothing matching method, characterized by including:

In response to an input instruction acting on the operation interface, display clothing images of at least two types of virtual clothing on the operation interface;

In response to a clothing matching instruction acting on the operation interface, display a matching image of the at least two types of virtual clothing on the operation interface, wherein the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object, the matching image is generated, based on clothing text corresponding to the at least two types of virtual clothing, using an image generation model corresponding to the at least two types of virtual clothing, and the clothing text is used to describe the at least two types of virtual clothing.

2. The method according to claim 1, characterized in that, the method further comprises:

Use the feature extraction module in the image generation model to extract features from the clothing text to obtain clothing features;

The matching image is generated based on the clothing features and object features using a generation module in the image generation model, where the object features are used to characterize the virtual object.

3. The method according to claim 2, wherein object text for describing the virtual object is also displayed on the operation interface, and the method further includes:

The feature extraction module is used to extract features from the object text to obtain the object features.

4. The method according to claim 2, characterized in that environment text for describing the virtual environment in which the virtual object is located is also displayed on the operation interface, and generating the matching image based on the clothing features and the object features using the generation module in the image generation model includes:

Use the feature extraction module to extract features from the environment text to obtain environment features;

The matching image is generated based on the clothing features, the object features and the environment features using the generating module.
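The flow of claims 2 to 4 (extract features from each text, then generate from the combined features) can be sketched as follows. This is a minimal illustration only: `extract_features`, `generate_matching_image`, `FEATURE_DIM`, the hashing scheme and the example texts are all hypothetical stand-ins, not the feature extraction module or generation module of the application, which would be learned networks.

```python
import hashlib
from typing import List, Optional

FEATURE_DIM = 8  # hypothetical feature size, chosen only for illustration


def extract_features(text: str, dim: int = FEATURE_DIM) -> List[float]:
    """Toy stand-in for the feature extraction module: count tokens per hash bucket."""
    features = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        features[digest[0] % dim] += 1.0
    return features


def generate_matching_image(clothing: List[float],
                            obj: List[float],
                            env: Optional[List[float]] = None,
                            size: int = 4) -> List[List[float]]:
    """Toy stand-in for the generation module: fill a size x size grid from the features."""
    # Assumption: the features are combined by simple concatenation.
    condition = list(clothing) + list(obj) + (list(env) if env is not None else [])
    return [[condition[(r * size + c) % len(condition)] for c in range(size)]
            for r in range(size)]


# Clothing text, object text and environment text as they might appear on the interface.
clothing_features = extract_features("red wool coat, blue denim jeans")
object_features = extract_features("adult standing figure")
environment_features = extract_features("snowy street at dusk")
image = generate_matching_image(clothing_features, object_features, environment_features)
```

Passing `env=None` corresponds to claim 2, where only clothing features and object features condition the generation; supplying environment features corresponds to claim 4.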

5. The method according to claim 1, characterized in that the image generation model includes a trained generation model and a large model, the large model includes at least one first module connected in sequence, the trained generation model includes at least one second module matching the at least one first module, and the method further includes:

Utilize the current first module in the at least one first module to generate first matching data based on the clothing text;

Utilize the current second module in the at least one second module to generate second matching data based on the clothing text, wherein the current second module is the second module corresponding to the current first module;

Merge the first matching data and the second matching data to obtain merged data, wherein, when the current first module is the last module among the at least one first module, the merged data is the matching image;

When the current first module is not the last module among the at least one first module, input the merged data to both the next first module among the at least one first module and the next second module among the at least one second module.
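The paired-module procedure of claim 5 can be sketched as a loop over matching module pairs. The sketch below rests on several assumptions that the claim does not fix: the merge is an element-wise sum, the data are plain lists of floats, an initial data vector is supplied alongside the text features, and `make_module` builds toy modules in place of real network blocks.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A module maps (current data, clothing-text features) to matching data.
Module = Callable[[Vector, Vector], Vector]


def merge(first: Vector, second: Vector) -> Vector:
    """Assumed merge rule: element-wise sum of the two outputs."""
    return [a + b for a, b in zip(first, second)]


def run_paired_modules(first_modules: Sequence[Module],
                       second_modules: Sequence[Module],
                       initial: Vector,
                       text_features: Vector) -> Vector:
    """Run each first/second module pair, merge, and feed the result to the next pair."""
    if len(first_modules) != len(second_modules):
        raise ValueError("each first module needs a matching second module")
    data = initial
    for first_module, second_module in zip(first_modules, second_modules):
        first_matching = first_module(data, text_features)    # large-model branch
        second_matching = second_module(data, text_features)  # trained-model branch
        data = merge(first_matching, second_matching)
    return data  # merged output of the last pair plays the role of the matching image


def make_module(factor: float) -> Module:
    """Hypothetical toy module: scale the data and add the text features."""
    def module(data: Vector, text: Vector) -> Vector:
        return [factor * d + t for d, t in zip(data, text)]
    return module


large_model = [make_module(1.0), make_module(2.0)]      # first modules
trained_model = [make_module(0.5), make_module(0.25)]   # second modules
result = run_paired_modules(large_model, trained_model, [1.0, 2.0], [0.0, 1.0])
```

Because both branches receive the same merged data at every stage, the trained generation model can adjust the large model's intermediate results step by step instead of only correcting its final output.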

6. The method according to claim 5, characterized in that, the method further comprises:

Constructing training images containing the at least two types of virtual clothing;

Generate training text corresponding to the training image, wherein the training text is used to describe the at least two types of virtual clothing;

The initial generation model is trained using the training image and the training text to obtain the trained generation model.

7. The method of claim 6, wherein constructing a training image containing the at least two types of virtual clothing includes:

Obtaining an original image, wherein the display content of the original image includes the at least two types of virtual clothing and at least one object;

Segment the area where the at least one object is located in the original image to obtain a segmented image;

The segmented image is superimposed on a preset image to generate the training image.
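The segment-then-superimpose construction of claim 7 can be illustrated with a toy example. The thresholding in `segment_object` is an assumption made purely for illustration (a real system would presumably use a segmentation model), and the nested-list image representation and all names are hypothetical.

```python
from typing import List

Image = List[List[int]]   # grayscale pixels, row-major
Mask = List[List[bool]]   # True where the object is located


def segment_object(original: Image, background_value: int = 0) -> Mask:
    """Toy segmentation: any pixel that differs from the background value is the object."""
    return [[pixel != background_value for pixel in row] for row in original]


def superimpose(original: Image, mask: Mask, preset: Image) -> Image:
    """Copy the masked pixels of the original image over the preset image."""
    same_shape = len(original) == len(preset) and all(
        len(o) == len(p) for o, p in zip(original, preset))
    if not same_shape:
        raise ValueError("original and preset images must have the same shape")
    return [[o if m else p for o, m, p in zip(orow, mrow, prow)]
            for orow, mrow, prow in zip(original, mask, preset)]


original = [[0, 7, 0],
            [0, 9, 8],
            [0, 0, 0]]
preset = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]
training_image = superimpose(original, segment_object(original), preset)
```

Here the object pixels (7, 9, 8) are kept and every other pixel is taken from the preset image, so the training image shows the segmented object on a uniform preset background.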

8. The method of claim 7, wherein obtaining the original image includes:

Display at least one image set in the operation interface, wherein the at least two types of virtual clothing contained in different image sets are different;

In response to a selection instruction acting on the at least one image set, the image set corresponding to the selection instruction is obtained to obtain the original image.

9. A virtual clothing matching method, characterized by including:

Obtain clothing images of at least two types of virtual clothing;

Determine the image generation model and clothing text corresponding to the at least two types of virtual clothing, wherein the clothing text is used to describe the at least two types of virtual clothing;

Use the image generation model to generate a matching image of the at least two types of virtual clothing based on the clothing text, wherein the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object.

10. A training method for an image generation model, characterized by comprising:

Constructing training images containing at least two types of virtual clothing;

Train an initial generation model using the training image and the training text to obtain an image generation model, wherein the image generation model is used to generate a matching image of the at least two types of virtual clothing based on clothing text corresponding to the at least two types of virtual clothing, the clothing text is used to describe the at least two types of virtual clothing, and the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object.

11. A virtual clothing matching method, characterized by including:

Display clothing images of at least two types of virtual clothing on the presentation screen of the virtual reality VR device or the augmented reality AR device;

Use the image generation model to generate a matching image of the at least two types of virtual clothing based on the clothing text, wherein the matching image represents an image obtained by matching the at least two types of virtual clothing to a virtual object;

Driving the VR device or the AR device to render and display the matching image.

12. A virtual clothing matching method, characterized by including:

Obtain clothing images of at least two types of virtual clothing by calling a first interface, wherein the first interface includes a first parameter, and the parameter value of the first parameter is the clothing image;

The matching image is output in the operation interface by calling a second interface, wherein the second interface includes a second parameter, and the parameter value of the second parameter is the matching image.
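The two interface calls of claim 12 might look like the sketch below. Every name here (`OperationInterface`, `MatchingClient`, `first_interface`, `second_interface`) is hypothetical, images are represented as raw bytes for simplicity, and the matching image is a placeholder because the claim as written does not state how it is produced between the two calls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OperationInterface:
    """Hypothetical stand-in for the operation interface that shows results."""
    displayed: List[bytes] = field(default_factory=list)


@dataclass
class MatchingClient:
    ui: OperationInterface
    clothing_images: List[bytes] = field(default_factory=list)

    def first_interface(self, clothing_images: List[bytes]) -> int:
        """First interface: the argument plays the role of the claim's first parameter."""
        if len(clothing_images) < 2:
            raise ValueError("at least two types of virtual clothing are required")
        self.clothing_images = list(clothing_images)
        return len(self.clothing_images)

    def second_interface(self, matching_image: bytes) -> None:
        """Second interface: the argument plays the role of the claim's second parameter."""
        self.ui.displayed.append(matching_image)


ui = OperationInterface()
client = MatchingClient(ui)
received = client.first_interface([b"coat-image", b"jeans-image"])
# Placeholder result; how the matching image is generated is outside this sketch.
client.second_interface(b"matching-image")
```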

13. An electronic device, characterized in that it includes:

A memory storing an executable program;

A processor, configured to run the program, wherein the method according to any one of claims 1 to 12 is executed when the program is run.

14. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a stored executable program, wherein, when the executable program runs, the device where the computer-readable storage medium is located is controlled to execute the method according to any one of claims 1 to 12.