CN110225285B

CN110225285B - Audio and video communication method and device, computer device and readable storage medium

Info

Publication number: CN110225285B
Application number: CN201910305621.3A
Authority: CN
Inventors: 齐燕
Original assignee: OneConnect Financial Technology Co Ltd Shanghai
Current assignee: OneConnect Financial Technology Co Ltd Shanghai
Priority date: 2019-04-16
Filing date: 2019-04-16
Publication date: 2022-09-02
Anticipated expiration: 2039-04-16
Also published as: CN110225285A

Abstract

The present invention provides an audio and video communication method, comprising: when performing audio and video communication with an external device, acquiring audio and video data to be transmitted, and extracting audio and video related parameters from the audio and video data to be transmitted; The scene recognition model, identifies the current scene of the user according to the acquired audio and video related parameters; determines the processing method for the audio and video data to be transmitted according to the current scene of the user; The audio and video data to be transmitted is processed, and the processed audio and video data is transmitted to the external device. The present invention also provides a device, a computer device, and a readable storage medium for implementing the audio and video communication method. The present invention can solve the technical problem of poor user experience in audio and video communication.

Description

Audio and video communication method, device, computer device, and readable storage medium

技术领域technical field

本发明涉及计算机技术领域，具体涉及一种音视频通信方法、装置、计算机装置、及可读存储介质。The present invention relates to the field of computer technology, and in particular, to an audio and video communication method, device, computer device, and readable storage medium.

背景技术Background technique

在音视频通信过程中，用户当前所处环境对音视频通信体验的影响很大。例如，嘈杂的环境会使得通信对方听不清用户说话。In the process of audio and video communication, the current environment of the user has a great influence on the audio and video communication experience. For example, a noisy environment can make it difficult for the communicating party to hear the user's speech.

发明内容SUMMARY OF THE INVENTION

鉴于以上内容，有必要提出一种音视频通信方法、装置、计算机装置、及可读存储介质，用以解决用户音视频通信体验不佳的技术问题。In view of the above content, it is necessary to propose an audio and video communication method, device, computer device, and readable storage medium, so as to solve the technical problem of poor user experience in audio and video communication.

本发明的第一方面提供一种音视频通信方法，所述方法包括：A first aspect of the present invention provides an audio and video communication method, the method comprising:

当与外部设备进行音视频通信时，获取待传输的音视频数据，从所述待传输的音视频数据中提取音视频相关参数；When performing audio and video communication with an external device, obtain audio and video data to be transmitted, and extract audio and video related parameters from the audio and video data to be transmitted;

调用预先训练生成的场景识别模型，根据所获取的音视频相关参数识别用户当前所处场景；Call the scene recognition model generated by pre-training, and identify the current scene of the user according to the acquired audio and video related parameters;

根据用户当前所处场景确定对所述待传输的音视频数据的处理方式；及Determine the processing method for the audio and video data to be transmitted according to the current scene of the user; and

根据所述确定的处理方式对所述待传输的音视频数据进行处理，并将处理后的所述音视频数据传输给所述外部设备。The audio and video data to be transmitted is processed according to the determined processing mode, and the processed audio and video data is transmitted to the external device.

优选地，训练所述场景识别模型的方法包括：Preferably, the method for training the scene recognition model comprises:

获取预设数量的与不同的场景分别对应的音视频相关参数，并对与每种场景所对应的音视频相关参数标注类别，使得与所述每种场景所对应的音视频相关参数携带类别标签；Acquire a preset number of audio and video-related parameters corresponding to different scenarios, and label the audio and video-related parameters corresponding to each scene with categories, so that the audio and video-related parameters corresponding to each of the scenarios carry category labels ;

分别将与所述不同场景对应的音视频相关参数随机分成第一预设比例的训练集和第二预设比例的验证集，利用所述训练集训练所述场景识别模型，并利用所述验证集验证训练后的所述场景识别模型的准确率；及The audio and video related parameters corresponding to the different scenes are randomly divided into a training set of a first preset ratio and a verification set of a second preset ratio, and the scene recognition model is trained by using the training set, and the verification set is used to train the scene recognition model. set the accuracy of the scene recognition model after training; and

若所述准确率大于或者等于预设准确率时，则结束训练；若所述准确率小于所述预设准确率时，则增加样本数量以重新训练所述场景识别模型直至所述准确率大于或者等于预设准确率。If the accuracy rate is greater than or equal to the preset accuracy rate, end the training; if the accuracy rate is less than the preset accuracy rate, increase the number of samples to retrain the scene recognition model until the accuracy rate is greater than Or equal to the preset accuracy.

优选地，所述根据用户当前所处场景确定对所述待传输的音视频数据的处理方式包括：Preferably, the determining of the processing method for the audio and video data to be transmitted according to the current scene of the user includes:

当用户当前所处场景为室外时，确定对所述待传输的音视频数据的处理方式为第一方式，其中，所述第一方式是指对所述待传输的音视频数据的处理至少包括降噪处理；When the current scene where the user is located is outdoors, it is determined that the processing mode for the audio and video data to be transmitted is the first mode, wherein the first mode means that the processing of the audio and video data to be transmitted at least includes: Noise reduction processing;

当用户当前所处场景为室内时，确定对所述待传输的音视频数据的处理方式为第二方式，其中，所述第二方式是指根据室内面积和室内墙壁的材质处理所述待传输的音视频数据。When the scene where the user is currently located is indoors, it is determined that the processing method for the audio and video data to be transmitted is the second method, wherein the second method refers to processing the to-be-transmitted data according to the indoor area and the material of the indoor walls audio and video data.

优选地，所述根据室内面积和室内墙壁的材质处理所述待传输的音视频数据包括步骤：Preferably, the processing of the audio and video data to be transmitted according to the indoor area and the material of the indoor walls includes the steps of:

估算室内面积的大小；Estimate the size of the indoor area;

从所述待传输的音视频数据中截取一帧包括墙壁的图像；Cut out a frame of an image including a wall from the audio and video data to be transmitted;

利用图像识别算法将所截取的墙壁的图像与预先存储的多张不同材质的图像进行匹配以确定所述墙壁的材质；根据所述墙壁的材质确定吸声系数；Use an image recognition algorithm to match the captured image of the wall with a plurality of pre-stored images of different materials to determine the material of the wall; determine the sound absorption coefficient according to the material of the wall;

将所述室内面积乘以所确定的吸声系数估算吸声量；及multiplying the indoor area by the determined sound absorption coefficient to estimate the sound absorption; and

根据所估算获得的吸声量处理所述待传输的音视频数据，其中，当所估算获得的吸声量大于一个预设的吸声量值时，对所述待传输的音视频数据的处理至少包括去混响处理，当所估算获得的吸声量小于或者等于所述预设的吸声量值时，对所述待传输的音视频数据的处理不包括去混响处理。The audio and video data to be transmitted is processed according to the estimated sound absorption, wherein, when the estimated sound absorption is greater than a preset sound absorption value, the audio and video data to be transmitted is processed at least De-reverberation processing is included. When the estimated sound absorption amount is less than or equal to the preset sound absorption amount value, the processing of the audio and video data to be transmitted does not include de-reverberation processing.

优选地，所述估算室内面积的大小包括：Preferably, the estimating the size of the indoor area includes:

从所述音视频数据中截取一帧包括用户的头像的图像；intercepting a frame of an image including the user's avatar from the audio and video data;

计算所述用户的头像所包括的第一像素点总数，以及计算所述截取的图像所包括的第二像素点总数；calculating the total number of first pixels included in the avatar of the user, and calculating the total number of second pixels included in the captured image;

根据所述第一像素点总数和第二像素点总数之间的比值估算所述室内面积的大小，其中，所述室内面积的大小等于一个预设值除以所述比值。The size of the indoor area is estimated according to the ratio between the total number of the first pixel points and the total number of the second pixel points, wherein the size of the indoor area is equal to a preset value divided by the ratio.

优选地，所述根据所述确定的处理方式对所述待传输的音视频数据进行处理后，所述方法还包括：Preferably, after the audio and video data to be transmitted is processed according to the determined processing method, the method further includes:

确定所述待传输的音视频数据所包括的视频图像中是否存在多个人像；Determine whether there are multiple portraits in the video images included in the audio and video data to be transmitted;

当确定所述视频图像中存在多个人像时，识别所述视频图像中正对镜头的人像，当所述视频图像中不存在多个人像时，则不对所述视频图像中正对镜头的人像进行识别；及When it is determined that there are multiple portraits in the video image, identify the portrait in the video image facing the camera, and when there are no multiple portraits in the video image, do not identify the portrait in the video image facing the camera ;and

将所述视频图像中，除所述正对镜头的人像之外的其他人像作虚化处理。In the video image, other portraits except the portrait facing the lens are blurred.

获取所述待传输的音视频数据中的视频图像的平均亮度；Obtain the average brightness of the video image in the audio-video data to be transmitted;

判断视频图像的平均亮度是否小于预设的亮度阈值；及determining whether the average brightness of the video image is less than a preset brightness threshold; and

当视频图像的平均亮度小于所述预设的亮度阈值，对视频图像进行亮度增强，当视频图像的平均亮度大于或者等于所述预设的亮度阈值，则不对视频图像进行亮度增强处理。When the average brightness of the video image is less than the preset brightness threshold, the brightness enhancement is performed on the video image, and when the average brightness of the video image is greater than or equal to the preset brightness threshold, the brightness enhancement process is not performed on the video image.

本发明第二方面提供一种计算机装置，所述计算机装置包括存储器和处理器，所述存储器用于存储至少一个指令，所述处理器用于执行所述至少一个指令时实现所述音视频通信方法。A second aspect of the present invention provides a computer device, the computer device includes a memory and a processor, the memory is configured to store at least one instruction, and the processor is configured to implement the audio-video communication method when executing the at least one instruction .

本发明第三方面提供一种计算机可读存储介质，所述计算机可读存储介质存储有至少一个指令，所述至少一个指令被处理器执行时实现所述音视频通信方法。A third aspect of the present invention provides a computer-readable storage medium, where the computer-readable storage medium stores at least one instruction, and when the at least one instruction is executed by a processor, implements the audio-video communication method.

本发明第四方面提供一种音视频通信装置，所述装置包括：A fourth aspect of the present invention provides an audio and video communication device, the device comprising:

获取模块，用于当与外部设备进行音视频通信时，获取待传输的音视频数据，从所述待传输的音视频数据中提取音视频相关参数；an acquisition module for acquiring audio and video data to be transmitted when performing audio and video communication with an external device, and extracting audio and video related parameters from the audio and video data to be transmitted;

执行模块，用于调用预先训练生成的场景识别模型，根据所获取的音视频相关参数识别用户当前所处场景；The execution module is used to call the scene recognition model generated by pre-training, and identify the current scene of the user according to the acquired audio and video related parameters;

所述执行模块，还用于根据用户当前所处场景确定对所述待传输的音视频数据的处理方式；及The execution module is further configured to determine a processing method for the audio and video data to be transmitted according to the current scene of the user; and

所述执行模块，还用于根据所述确定的处理方式对所述待传输的音视频数据进行处理，并将处理后的所述音视频数据传输给所述外部设备。The execution module is further configured to process the audio and video data to be transmitted according to the determined processing mode, and transmit the processed audio and video data to the external device.

本发明实施例中所述的音视频通信方法、装置、计算机装置、及可读存储介质，通过于计算机装置与外部设备进行音视频通信时，获取待传输的音视频数据，从所述待传输的音视频数据中提取音视频相关参数；调用预先训练生成的场景识别模型，根据所获取的音视频相关参数识别用户当前所处场景；根据用户当前所处场景确定对所述待传输的音视频数据的处理方式；及根据所述确定的处理方式对所述待传输的音视频数据进行处理，并将处理后的所述音视频数据传输给所述外部设备，可改善用户音视频通信体验。The audio and video communication method, device, computer device, and readable storage medium described in the embodiments of the present invention obtain audio and video data to be transmitted when the computer device performs audio and video communication with an external device, and obtain audio and video data to be transmitted from the to-be-transmitted audio and video data. The audio and video related parameters are extracted from the audio and video data; the scene recognition model generated by pre-training is called, and the user's current scene is identified according to the acquired audio and video related parameters; and processing the audio and video data to be transmitted according to the determined processing mode, and transmitting the processed audio and video data to the external device, which can improve the user's audio and video communication experience.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据提供的附图获得其他的附图。In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only It is an embodiment of the present invention. For those of ordinary skill in the art, other drawings can also be obtained according to the provided drawings without creative work.

图1是本发明实施例一提供的音视频通信方法的流程图。FIG. 1 is a flowchart of an audio and video communication method provided by Embodiment 1 of the present invention.

图2是本发明实施例二提供的音视频通信装置的结构图。FIG. 2 is a structural diagram of an audio and video communication apparatus according to Embodiment 2 of the present invention.

图3是本发明实施例三提供的计算机装置的示意图。FIG. 3 is a schematic diagram of a computer apparatus according to Embodiment 3 of the present invention.

如下具体实施方式将结合上述附图进一步说明本发明。The following specific embodiments will further illustrate the present invention in conjunction with the above drawings.

具体实施方式Detailed ways

为了能够更清楚地理解本发明的上述目的、特征和优点，下面结合附图和具体实施例对本发明进行详细描述。需要说明的是，在不冲突的情况下，本发明的实施例及实施例中的特征可以相互组合。In order to more clearly understand the above objects, features and advantages of the present invention, the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and the features in the embodiments may be combined with each other under the condition of no conflict.

在下面的描述中阐述了很多具体细节以便于充分理解本发明，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。In the following description, many specific details are set forth in order to facilitate a full understanding of the present invention, and the described embodiments are only some, but not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

除非另有定义，本文所使用的所有的技术和科学术语与属于本发明的技术领域的技术人员通常理解的含义相同。本文中在本发明的说明书中所使用的术语只是为了描述具体的实施例的目的，不是旨在于限制本发明。Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terms used herein in the description of the present invention are for the purpose of describing specific embodiments only, and are not intended to limit the present invention.

实施例一Example 1

在本实施例中，所述音视频通信方法可以应用于计算机装置中，对于需要进行音视频通信的计算机装置，可以直接在计算机装置上集成本发明的方法所提供的用于音视频通信的功能，或者以软件开发工具包(Software Development Kit，SDK)的形式运行在计算机装置上。In this embodiment, the audio and video communication method can be applied to a computer device. For a computer device that needs to perform audio and video communication, the function for audio and video communication provided by the method of the present invention can be directly integrated on the computer device. , or run on a computer device in the form of a software development kit (Software Development Kit, SDK).

如图1所示，所述音视频通信方法具体包括以下步骤，根据不同的需求，该流程图中步骤的顺序可以改变，某些步骤可以省略。As shown in FIG. 1 , the audio and video communication method specifically includes the following steps. According to different requirements, the order of the steps in the flowchart can be changed, and some steps can be omitted.

步骤S1、于计算机装置与外部设备进行音视频通信时，获取待传输的音视频数据，从所述待传输的音视频数据中提取音视频相关参数。Step S1, when the computer device performs audio and video communication with an external device, acquire audio and video data to be transmitted, and extract audio and video related parameters from the audio and video data to be transmitted.

本实施例中，所述音视频相关参数包括，但不限于，音频频谱特征、音量大小、频率分布，视频图像所包括人像以及人像的数目、地面、背景。In this embodiment, the audio and video related parameters include, but are not limited to, audio frequency spectrum features, volume, frequency distribution, portraits included in the video image and the number of portraits, ground, and background.

在一个实施例中，所述音视频数据是指利用麦克风所采集的音频数据和利用摄像头同步所捕捉的视频数据。In one embodiment, the audio and video data refers to audio data collected by a microphone and video data captured by a camera synchronization.

在一个实施例中，可以首先对音频数据进行加窗分帧。例如，可以采用汉宁窗将音频数据分为多个帧长例如为10-30ms(毫秒)的帧，帧移可以取10ms，从而可以将音频数据分为多帧。对音频数据进行加窗分帧后，然后对加窗分帧后的音频数据进行快速傅里叶转换，由此得到音频数据的频谱。接着根据音频数据的频谱提取出音频数据对应的频谱特征。In one embodiment, the audio data may be firstly windowed and framed. For example, the audio data can be divided into multiple frames with a frame length of 10-30ms (milliseconds) by using a Hanning window, and the frame shift can be 10ms, so that the audio data can be divided into multiple frames. After windowing and framing the audio data, fast Fourier transform is then performed on the audio data after windowing and framing, thereby obtaining the frequency spectrum of the audio data. Then, the spectral features corresponding to the audio data are extracted according to the frequency spectrum of the audio data.

在一个实施例中，所述音视频相关参数所包括音量大小可以是指音量的平均值。In one embodiment, the volume level included in the audio and video related parameters may refer to the average volume of the volume.

在一个实施例中，可以利用图像识别算法从音视频数据中识别视频图像所包括人像以及人像的数目、地面、背景。In one embodiment, an image recognition algorithm can be used to identify the human figures included in the video image and the number, ground, and background of the human figures from the audio and video data.

在一个实施例中，所述麦克风和摄像头可以内置于所述计算机装置，或者以有线/无线的方式外接于所述计算机装置。In one embodiment, the microphone and the camera may be built into the computer device, or externally connected to the computer device in a wired/wireless manner.

举例而言，可以利用USB数据线将所述麦克风和摄像头与所述计算机装置建立通信连接。For example, a USB data cable can be used to establish a communication connection between the microphone and the camera and the computer device.

在一个实施例中，计算机装置和外部设备可以是智能手机、平板电脑、笔记本电脑、台式电脑、智能电视等设备。In one embodiment, the computer device and the external device may be a smartphone, a tablet computer, a laptop computer, a desktop computer, a smart TV, or the like.

在一个实施例中，所述计算机装置和外部设备可以通过任何传统的有线网络及/或无线网络通信连接。所述有线网络可以为传统有线通信的任何类型，例如因特网、局域网。所述无线网络可以为传统无线通信的任何类型，例如无线电、无线保真(WirelessFidelity,WIFI)、蜂窝、卫星、广播等。无线通信技术可以包括，但不限于，全球移动通信系统(Global System for Mobile Communications，GSM)、通用分组无线业务(GeneralPacket Radio Service，GPRS)、码分多址(Code Division Multiple Access，CDMA)，宽带码分多址(W-CDMA)、CDMA2000、IMT单载波(IMT Single Carrier)、增强型数据速率GSM演进(Enhanced Data Rates for GSM Evolution，EDGE)、长期演进技术(Long-TermEvolution，LTE)、高级长期演进技术、时分长期演进技术(Time-Division LTE，TD-LTE)、高性能无线电局域网(High Performance Radio Local Area Network，HiperLAN)、高性能无线电广域网(High Performance Radio Wide Area Network，HiperWAN)、本地多点派发业务(Local Multipoint Distribution Service，LMDS)、全微波存取全球互通(WorldwideInteroperability for Microwave Access，WiMAX)、紫蜂协议(ZigBee)、蓝牙、正交频分复用技术(Flash Orthogonal Frequency-Division Multiplexing，Flash-OFDM)、大容量空分多路存取(High Capacity Spatial Division Multiple Access，HC-SDMA)、通用移动电信系统(Universal Mobile Telecommunications System，UMTS)、通用移动电信系统时分双工(UMTS Time-Division Duplexing，UMTS-TDD)、演进式高速分组接入(Evolved HighSpeed Packet Access，HSPA+)、时分同步码分多址(Time Division Synchronous CodeDivision Multiple Access，TD-SCDMA)、演进数据最优化(Evolution-Data Optimized，EV-DO)、数字增强无绳通信(Digital Enhanced Cordless Telecommunications，DECT)及其他。In one embodiment, the computer apparatus and external device may be communicatively connected via any conventional wired network and/or wireless network. The wired network may be any type of traditional wired communication, such as the Internet, a local area network. The wireless network may be any type of conventional wireless communication, such as radio, Wireless Fidelity (WIFI), cellular, satellite, broadcast, and the like. Wireless communication technologies may include, but are not limited to, Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), broadband Code Division Multiple Access (W-CDMA), CDMA2000, IMT Single Carrier, Enhanced Data Rates for GSM Evolution (EDGE), Long-Term Evolution (LTE), Advanced Long Term Evolution, Time-Division LTE (TD-LTE), High Performance Radio Local Area Network (HiperLAN), High Performance Radio Wide Area Network (HiperWAN), local Local Multipoint Distribution Service (LMDS), Worldwide Interoperability for Microwave Access (WiMAX), ZigBee, Bluetooth, Flash Orthogonal Frequency-Division Multiplexing, Flash-OFDM), High Capacity Spatial Division Multiple Access (HC-SDMA), Universal Mobile Telecommunications System (UMTS), Universal Mobile Telecommunications System Time Division Duplex (UMTS) Time-Division Duplexing, UMTS-TDD), Evolved High Speed Packet Access (HSPA+), Time Division Synchronous Code Division Multiple Access (TD-SCDMA), Evolution Data Optimization (Evolution -Data Optimized, EV-DO ), Digital Enhanced Cordless Telecommunications (DECT) and others.

步骤S2、调用预先训练生成的场景识别模型，根据所获取的音视频相关参数识别用户当前所处场景。Step S2, calling the scene recognition model generated by pre-training, and identifying the scene where the user is currently located according to the acquired audio and video related parameters.

具体地，将所获取的音视频相关参数输入至所述预先训练生成的场景识别模型，得到用户当前所处场景。Specifically, the acquired audio and video related parameters are input into the scene recognition model generated by the pre-training to obtain the scene where the user is currently located.

本实施例中，所述场景可以分为室内和室外。不同的场景对应不同的音视频相关参数。In this embodiment, the scene can be divided into indoor and outdoor. Different scenarios correspond to different audio and video related parameters.

优先的，训练所述场景识别模型的方法包括：Preferably, the method for training the scene recognition model includes:

1)获取预设数量的与所述不同的场景分别对应的音视频相关参数，并对与每种场景所对应的音视频相关参数标注类别，使得与所述每种场景所对应的音视频相关参数携带类别标签。1) Acquire a preset number of audio and video related parameters corresponding to the different scenes, and label the audio and video related parameters corresponding to each kind of scene, so that the audio and video related parameters corresponding to each kind of scene are related. Parameters carry category labels.

例如，选取与室内所对应的音视频相关参数共1000笔，并对该1000笔记录分别标注为“1”，即以“1”作为标签。类似地，选取与室外所对应的音视频相关参数1000笔，并对该1000笔记录分别标注为“2”，即以“2”作为标签。For example, a total of 1000 records of audio and video related parameters corresponding to the room are selected, and the 1000 records are marked as "1" respectively, that is, "1" is used as the label. Similarly, select 1000 audio and video related parameters corresponding to the outdoors, and mark the 1000 records as "2", that is, use "2" as the label.

2)分别将与所述不同场景对应的音视频相关参数随机分成第一预设比例的训练集和第二预设比例的验证集，利用所述训练集训练所述场景识别模型，并利用所述验证集验证训练后的所述场景识别模型的准确率。2) The audio and video related parameters corresponding to the different scenes are respectively randomly divided into a training set of a first preset ratio and a verification set of a second preset ratio, and the scene recognition model is trained by using the training set, and the The validation set verifies the accuracy of the trained scene recognition model.

举例而言，可以首先将与不同场景对应的训练样本(即音视频相关参数)分发到不同的文件夹里。例如，将与室内所对应的训练样本分发到第一文件夹里，将与室外所对应的训练样本分发到第二文件夹里。然后从不同的文件夹里分别提取第一预设比例(例如，70％)的训练样本作为总的训练样本进行场景识别模型的训练，从不同的文件夹里分别取剩余第二预设比例(例如，30％)的训练样本作为总的测试样本对训练完成的所述场景识别模型进行准确性验证。For example, training samples corresponding to different scenes (that is, audio and video related parameters) may be distributed to different folders first. For example, the training samples corresponding to the indoor are distributed to the first folder, and the training samples corresponding to the outdoor are distributed to the second folder. Then, the training samples of the first preset ratio (for example, 70%) are respectively extracted from different folders as the total training samples to train the scene recognition model, and the remaining second preset ratios ( For example, 30%) of the training samples are used as the total test samples to verify the accuracy of the trained scene recognition model.

3)若所述准确率大于或者等于预设准确率时，则结束训练，以训练后的所述场景识别模型作为分类器识别所述用户当前所处环境；若所述准确率小于所述预设准确率时，则增加样本数量以重新训练所述场景识别模型直至所述准确率大于或者等于预设准确率。3) If the accuracy rate is greater than or equal to the preset accuracy rate, end the training, and use the trained scene recognition model as a classifier to identify the current environment of the user; if the accuracy rate is less than the preset accuracy rate When the accuracy rate is set, the number of samples is increased to retrain the scene recognition model until the accuracy rate is greater than or equal to the preset accuracy rate.

步骤S3、根据用户当前所处场景确定对所述待传输的音视频数据的处理方式，其中，不同的场景对应不同的处理方式。Step S3: Determine a processing method for the audio and video data to be transmitted according to the scene where the user is currently located, wherein different scenes correspond to different processing methods.

本实施例中，所述根据用户当前所处场景确定对所述待传输的音视频数据的处理方式包括：In this embodiment, the determining of the processing method for the audio and video data to be transmitted according to the current scene of the user includes:

当用户当前所处场景为室外时，确定对所述待传输的音视频数据的处理方式为第一方式；及When the scene where the user is currently located is outdoors, determining that the processing mode for the audio and video data to be transmitted is the first mode; and

当用户当前所处场景为室内时，确定对所述待传输的音视频数据的处理方式为第二方式。When the scene where the user is currently located is indoors, it is determined that the processing mode for the audio and video data to be transmitted is the second mode.

在一个实施例中，所述第一方式是指对所述待传输的音视频数据的处理至少包括降噪(noise reduction)处理。在一个实施例中，还可以进一步包括语音增强。In one embodiment, the first manner means that the processing of the audio and video data to be transmitted at least includes noise reduction processing. In one embodiment, speech enhancement may also be further included.

在一个实施例中，所述第二方式是指根据室内面积和室内墙壁的材质处理所述待传输的音视频数据。In one embodiment, the second manner refers to processing the audio and video data to be transmitted according to the indoor area and the material of the indoor wall.

在一个实施例中，所述根据室内面积和室内墙壁的材质处理所述待传输的音视频数据包括步骤(a1)-(a4)：In one embodiment, the processing of the audio and video data to be transmitted according to the indoor area and the material of the indoor walls includes steps (a1)-(a4):

(a1)估算室内面积的大小。(a1) Estimate the size of the indoor area.

在一个实施例中，所述估算室内面积的大小包括步骤(a11)-(a13)：In one embodiment, the estimating the size of the indoor area includes steps (a11)-(a13):

(a11)从所述音视频数据中截取一帧包括用户的头像的图像；(a11) intercepting a frame of an image including the user's avatar from the audio and video data;

(a12)计算所述用户的头像所包括的像素点总数(为方便描述，简称为“第一像素点总数”)，以及计算所述截取的图像所包括的像素点总数(为方便描述，简称为“第二像素点总数”)；(a12) Calculate the total number of pixels included in the user's avatar (for convenience of description, referred to as "the first total number of pixels"), and calculate the total number of pixels included in the captured image (for convenience of description, referred to as "the total number of pixels") is "the total number of second pixels");

(a13)根据所述第一像素点总数和第二像素点总数之间的比值估算所述室内面积的大小。(a13) Estimate the size of the indoor area according to the ratio between the total number of the first pixel points and the total number of the second pixel points.

在一个实施例中，所述室内面积的大小等于一个预设值除以所述比值。In one embodiment, the size of the indoor area is equal to a predetermined value divided by the ratio.

(a2)确定室内墙壁的材质，并根据墙壁的材质确定吸声系数。(a2) Determine the material of the indoor wall, and determine the sound absorption coefficient according to the material of the wall.

具体地，所述确定室内墙壁的材质包括步骤(a21)-(a22)：Specifically, the determining the material of the indoor wall includes steps (a21)-(a22):

(a21)从所述音视频数据中截取一帧包括墙壁的图像。(a21) Cut out a frame of an image including a wall from the audio and video data.

在一个实施例中，可根据用户的操作从所述音视频数据中截取所述包括墙壁的图像。In one embodiment, the image including the wall can be intercepted from the audio and video data according to a user's operation.

(a22)利用图像识别算法将所截取的图像与预先存储的多张不同材质的图像进行匹配以确定所述墙壁的材质。(a22) Using an image recognition algorithm to match the captured image with a plurality of pre-stored images of different materials to determine the material of the wall.

具体地，当所截取的图像与预先存储的其中某种材质的图像的相似度大于预设的相似度值时，则确定所述墙壁的材质为所述某种材质。Specifically, when the similarity between the captured image and the pre-stored image of a certain material is greater than a preset similarity value, it is determined that the material of the wall is the certain material.

不同的材质对应不同的吸声系数。因此，一旦确定了墙壁的材质，则可确定吸声系数。Different materials correspond to different sound absorption coefficients. Therefore, once the material of the wall is determined, the sound absorption coefficient can be determined.

(a3)将所述室内面积乘以所确定的吸声系数估算吸声量。(a3) The amount of sound absorption is estimated by multiplying the indoor area by the determined sound absorption coefficient.

(a4)根据所估算获得的吸声量处理所述待传输的音视频数据。(a4) Process the audio and video data to be transmitted according to the estimated sound absorption.

在一个实施例中，当所估算获得的吸声量大于一个预设的吸声量值时，对所述待传输的音视频数据的处理至少包括去混响(dereverberation)处理。当所估算获得的吸声量小于或者等于所述预设的吸声量值时，对所述待传输的音视频数据的处理可以不包括去混响处理。In one embodiment, when the estimated sound absorption amount is greater than a preset sound absorption amount value, the processing of the audio and video data to be transmitted at least includes de-reverberation processing. When the estimated sound absorption amount is less than or equal to the preset sound absorption amount value, the processing of the audio and video data to be transmitted may not include de-reverberation processing.

在一个实施例中，所述根据所估算获得的吸声量处理所述待传输的音视频数据还可以进一步包括回声消除、语音增强。In one embodiment, the processing of the audio and video data to be transmitted according to the estimated sound absorption may further include echo cancellation and speech enhancement.

步骤S4、根据所述确定的处理方式对所述待传输的音视频数据进行处理，并将处理后的所述音视频数据传输给所述外部设备。Step S4: Process the audio and video data to be transmitted according to the determined processing mode, and transmit the processed audio and video data to the external device.

举例而言，假设所确定的处理方式为所述第一方式，则对所述待传输的音视频数据至少作降噪处理。For example, assuming that the determined processing method is the first method, at least noise reduction processing is performed on the audio and video data to be transmitted.

在一个实施例中，无论采用所述第一方式还是所述第二方式处理所述待传输的音视频数据，均还进一步包括对所述待传输的音视频数据作如下处理，包括步骤(b1)-(b3)：In one embodiment, regardless of whether the audio and video data to be transmitted is processed in the first manner or the second manner, the process further includes processing the audio and video data to be transmitted as follows, including step (b1 )-(b3):

(b1)确定所述待传输的音视频数据所包括的视频图像中是否存在多个人像(例如人像的数目大于或等于2)；(b1) determine whether there are multiple portraits in the video images included in the audio and video data to be transmitted (for example, the number of portraits is greater than or equal to 2);

(b2)当确定所述视频图像中存在多个人像时，识别所述视频图像中正对镜头的人像，当所述视频图像中不存在多个人像时，则不对所述视频图像中正对镜头的人像进行识别；(b2) When it is determined that there are multiple portraits in the video image, identify the portraits in the video image that are facing the lens, and when there are no multiple portraits in the video image, do not face recognition;

(b3)将所述视频图像中，除所述正对镜头的人像之外的其他人像作虚化处理，从而突显所述正对镜头的人像。(b3) Perform bokeh processing on other portraits in the video image except the portrait facing the lens, so as to highlight the portrait facing the lens.

在一个实施例中，无论采用所述第一方式还是所述第二方式处理所述待传输的音视频数据，均还进一步包括对所述待传输的音视频数据作如下处理，包括步骤(c1)-(c3)：In one embodiment, whether the audio and video data to be transmitted is processed in the first manner or the second manner, the process further includes processing the audio and video data to be transmitted as follows, including step (c1 )-(c3):

(c1)获取所述待传输的音视频数据中的视频图像的平均亮度。(c1) Obtain the average brightness of video images in the audio and video data to be transmitted.

具体地，可通过图像亮度检测算法获取视频图像的平均亮度。Specifically, the average brightness of the video image can be obtained through an image brightness detection algorithm.

具体而言，在本申请的实施例中，获取视频图像的平均亮度的具体实现过程可包括：获取视频图像的分辨率，并根据分辨率确定对应的采样间隔，以及根据采样间隔对上述视频图像中的像素点的亮度进行采样以生成平均亮度。Specifically, in the embodiment of the present application, the specific implementation process of acquiring the average brightness of the video image may include: acquiring the resolution of the video image, determining a corresponding sampling interval according to the resolution, and performing a sampling interval on the video image according to the sampling interval. The brightness of the pixels in is sampled to generate the average brightness.

在一个实施例中，图像亮度检测算法可包括平均算法、直方图算法等。In one embodiment, the image brightness detection algorithm may include an averaging algorithm, a histogram algorithm, and the like.

在一个实施例中，可根据用户当前所处场景选择相应的亮度检测算法以获取视频图像的平均亮度。In one embodiment, a corresponding brightness detection algorithm may be selected according to the scene where the user is currently located to obtain the average brightness of the video image.

在一个实施例中，以平均算法为例，可根据视频图像的分辨率，进行采样计算。In one embodiment, taking the averaging algorithm as an example, the sampling calculation can be performed according to the resolution of the video image.

举例而言，可首先获取视频图像的分辨率，然后根据该视频图像的分辨率的大小确定对应的采样间隔。例如当该视频图像的分辨率小于预设的分辨率时，确定采样间隔为1，即计算整个视频图像；当视频图像的分辨率为所述预设的分辨率的1～4倍时，确定水平与垂直方向的采样间隔为2，即在视频图像中每两个像素点选取一个像素点；当视频图像的分辨率为所述预设的分辨率的4～8倍时，确定水平与垂直方向的采样间隔为4，即在视频图像中每四个像素点选取一个像素点；当视频图像的分辨率大于所述预设的分辨率的8倍时，确定水平与垂直方向的采样间隔为8，即在视频图像中每八个像素点选取一个像素点。依次类推，对更大分辨率的视频图像确定采样间隔。在确定采样间隔之后，对视频图像中根据该采样间隔所采样的像素点进行亮度值计算，并将所采样的所有像素点的亮度值相加后求平均，将计算得到的值作为整个视频图像的平均亮度。For example, the resolution of the video image may be acquired first, and then the corresponding sampling interval may be determined according to the size of the resolution of the video image. For example, when the resolution of the video image is smaller than the preset resolution, the sampling interval is determined to be 1, that is, the entire video image is calculated; when the resolution of the video image is 1 to 4 times the preset resolution, the sampling interval is determined to be 1. The sampling interval between the horizontal and vertical directions is 2, that is, one pixel is selected for every two pixels in the video image; when the resolution of the video image is 4 to 8 times the preset resolution, determine the horizontal and vertical The sampling interval in the direction is 4, that is, one pixel is selected for every four pixels in the video image; when the resolution of the video image is greater than 8 times the preset resolution, the sampling interval in the horizontal and vertical directions is determined as 8, that is, select one pixel for every eight pixels in the video image. By analogy, the sampling interval is determined for larger resolution video images. After the sampling interval is determined, calculate the luminance value of the pixel points sampled according to the sampling interval in the video image, add the luminance values of all the sampled pixel points and average them, and use the calculated value as the entire video image. average brightness.

(c2)判断视频图像的平均亮度是否小于预设的亮度阈值。(c2) Determine whether the average brightness of the video image is less than a preset brightness threshold.

其中，预设亮度阈值可根据用户当前所处场景来选择合适的亮度阈值，即不同的场景需使用不同的阈值。Wherein, the preset brightness threshold can be selected according to the scene where the user is currently located, that is, different thresholds need to be used for different scenes.

在一个实施例中，用户当前所处场景为室外时所对应的亮度阀值大于用户当前所处场景为室内时所对应的亮度阀值。In one embodiment, the brightness threshold value corresponding to the scene where the user is currently located is outdoor is greater than the brightness threshold value corresponding to when the scene where the user is currently located is indoors.

(c3)如果视频图像的平均亮度小于所述预设的亮度阈值，则对视频图像进行亮度增强。如果视频图像的平均亮度大于或者等于所述预设的亮度阈值，则不对视频图像进行亮度增强处理。(c3) If the average brightness of the video image is less than the preset brightness threshold, perform brightness enhancement on the video image. If the average brightness of the video image is greater than or equal to the preset brightness threshold, no brightness enhancement processing is performed on the video image.

在一个实施例中，当视频图像的平均亮度小于所述预设的亮度阈值时，可采用线性亮度增强算法对视频图像进行亮度增强。In one embodiment, when the average brightness of the video image is less than the preset brightness threshold, a linear brightness enhancement algorithm may be used to enhance the brightness of the video image.

综上所述，本发明实施例中所述的音视频通信方法，通过于计算机装置与外部设备进行音视频通信时，获取待传输的音视频数据，从所述待传输的音视频数据中提取音视频相关参数；调用预先训练生成的场景识别模型，根据所获取的音视频相关参数识别用户当前所处场景；根据用户当前所处场景确定对所述待传输的音视频数据的处理方式；及根据所述确定的处理方式对所述待传输的音视频数据进行处理，并将处理后的所述音视频数据传输给所述外部设备，可改善用户音视频通信体验。To sum up, the audio and video communication method described in the embodiments of the present invention obtains audio and video data to be transmitted when a computer device performs audio and video communication with an external device, and extracts audio and video data from the audio and video data to be transmitted. Audio and video related parameters; call a scene recognition model generated by pre-training, identify the current scene of the user according to the acquired audio and video related parameters; determine the processing method for the audio and video data to be transmitted according to the current scene of the user; and The audio and video data to be transmitted is processed according to the determined processing method, and the processed audio and video data is transmitted to the external device, which can improve the user's audio and video communication experience.

上述图1详细介绍了本发明的音视频通信方法，下面结合图2和图3，对实现所述音视频通信方法的软件装置的功能模块以及实现所述音视频通信方法的硬件装置架构进行介绍。The above-mentioned Fig. 1 introduces the audio-video communication method of the present invention in detail, and below in conjunction with Fig. 2 and Fig. 3, the functional modules of the software device for realizing the audio-video communication method and the hardware device architecture for realizing the audio-video communication method are introduced. .

应该了解，所述实施例仅为说明之用，在专利申请范围上并不受此结构的限制。It should be understood that the embodiments are only used for illustration, and are not limited by this structure in the scope of the patent application.

实施例二Embodiment 2

参阅图2所示，是本发明实施例二提供的音视频通信装置的结构图。Referring to FIG. 2 , it is a structural diagram of an audio and video communication apparatus provided by Embodiment 2 of the present invention.

在一些实施例中，所述音视频通信装置30运行于计算机装置中。所述计算机装置通过网络连接了外部设备。所述音视频通信装置30可以包括多个由程序代码段所组成的功能模块。所述音视频通信装置30中的各个程序段的程序代码可以存储于计算机装置的存储器中，并由所述至少一个处理器所执行，以实现(详见图2描述)音视频通信功能。In some embodiments, the audiovisual communication device 30 runs on a computer device. The computer apparatus is connected to an external device through a network. The audio and video communication device 30 may include a plurality of functional modules composed of program code segments. The program codes of each program segment in the audio-video communication device 30 may be stored in the memory of the computer device and executed by the at least one processor to realize (see description in FIG. 2 for details) audio-video communication functions.

本实施例中，所述音视频通信装置30根据其所执行的功能，可以被划分为多个功能模块。所述功能模块可以包括：获取模块301、执行模块302。本发明所称的模块是指一种能够被至少一个处理器所执行并且能够完成固定功能的一系列计算机程序段，其存储在存储器中。在本实施例中，关于各模块的功能将在后续的实施例中详述。In this embodiment, the audio and video communication apparatus 30 may be divided into multiple functional modules according to the functions performed by the audio and video communication apparatus 30 . The functional modules may include: an acquisition module 301 and an execution module 302 . The modules referred to in the present invention refer to a series of computer program segments that can be executed by at least one processor and can perform fixed functions, and are stored in a memory. In this embodiment, the functions of each module will be described in detail in subsequent embodiments.

获取模块301于计算机装置与外部设备进行音视频通信时，获取待传输的音视频数据，从所述待传输的音视频数据中提取音视频相关参数。The acquisition module 301 acquires audio and video data to be transmitted when the computer device performs audio and video communication with an external device, and extracts audio and video related parameters from the audio and video data to be transmitted.

执行模块302调用预先训练生成的场景识别模型，根据所获取的音视频相关参数识别用户当前所处场景。The execution module 302 invokes the scene recognition model generated by the pre-training, and identifies the scene where the user is currently located according to the acquired audio and video related parameters.

具体地，执行模块302将所获取的音视频相关参数输入至所述预先训练生成的场景识别模型，得到用户当前所处场景。Specifically, the execution module 302 inputs the acquired audio and video related parameters into the scene recognition model generated by the pre-training to obtain the scene where the user is currently located.

执行模块302根据用户当前所处场景确定对所述待传输的音视频数据的处理方式，其中，不同的场景对应不同的处理方式。The execution module 302 determines a processing method for the audio and video data to be transmitted according to the scene where the user is currently located, wherein different scenarios correspond to different processing methods.

(a1)估算室内面积的大小。(a1) Estimate the size of the indoor area.

执行模块302根据所述确定的处理方式对所述待传输的音视频数据进行处理，并将处理后的所述音视频数据传输给所述外部设备。The execution module 302 processes the audio and video data to be transmitted according to the determined processing mode, and transmits the processed audio and video data to the external device.

举例而言，可首先获取视频图像的分辨率，然后根据该视频图像的分辨率的大小确定对应的采样间隔。例如当该视频图像的分辨率小于预设的分辨率时，确定采样间隔为1，即计算整个视频图像；当视频图像的分辨率为所述预设的分辨率的1～4倍时，确定水平与垂直方向的采样间隔为2，即在视频图像中每两个像素点选取一个像素点；当视频图像的分辨率为所述预设的分辨率的4～8倍时，确定水平与垂直方向的采样间隔为4，即在视频图像中每四个像素点选取一个像素点；当视频图像的分辨率大于所述预设的分辨率的8倍时，确定水平与垂直方向的采样间隔为8，即在视频图像中每八个像素点选取一个像素点。依次类推，对更大分辨率的视频图像确定采样间隔。在确定采样间隔之后，对视频图像中根据该采样间隔所采样的像素点进行亮度值计算，并将所采样的所有像素点的亮度值相加后求平均，将计算得到的值作为整个视频图像的平均亮度。For example, the resolution of the video image may be acquired first, and then the corresponding sampling interval may be determined according to the size of the resolution of the video image. For example, when the resolution of the video image is smaller than the preset resolution, the sampling interval is determined to be 1, that is, the entire video image is calculated; when the resolution of the video image is 1 to 4 times the preset resolution, the sampling interval is determined to be 1. The sampling interval between the horizontal and vertical directions is 2, that is, one pixel is selected for every two pixels in the video image; when the resolution of the video image is 4 to 8 times the preset resolution, determine the horizontal and vertical The sampling interval in the direction is 4, that is, one pixel is selected for every four pixels in the video image; when the resolution of the video image is greater than 8 times the preset resolution, the sampling interval in the horizontal and vertical directions is determined as 8, that is, select one pixel for every eight pixels in the video image. By analogy, the sampling interval is determined for larger resolution video images. After the sampling interval is determined, calculate the brightness value of the pixel points sampled according to the sampling interval in the video image, add the brightness values of all the sampled pixel points and average them, and use the calculated value as the entire video image. average brightness.

综上所述，本发明实施例中所述的音视频通信装置，通过于计算机装置与外部设备进行音视频通信时，获取待传输的音视频数据，从所述待传输的音视频数据中提取音视频相关参数；调用预先训练生成的场景识别模型，根据所获取的音视频相关参数识别用户当前所处场景；根据用户当前所处场景确定对所述待传输的音视频数据的处理方式，其中，不同的场景对应不同的处理方式；及根据所述确定的处理方式对所述待传输的音视频数据进行处理，并将处理后的所述音视频数据传输给所述外部设备，可改善用户音视频通信体验。To sum up, the audio and video communication device described in the embodiment of the present invention acquires audio and video data to be transmitted when a computer device performs audio and video communication with an external device, and extracts audio and video data from the audio and video data to be transmitted. Audio and video related parameters; call the scene recognition model generated by pre-training, identify the current scene of the user according to the acquired audio and video related parameters; determine the processing method for the audio and video data to be transmitted according to the current scene of the user, wherein , different scenarios correspond to different processing methods; and the audio and video data to be transmitted is processed according to the determined processing method, and the processed audio and video data is transmitted to the external device, which can improve the user experience. Audio and video communication experience.

实施例三Embodiment 3

参阅图3所示，为本发明实施例三提供的计算机装置的结构示意图。在本发明较佳实施例中，所述计算机装置3包括存储器31、至少一个处理器32、至少一条通信总线33。本领域技术人员应该了解，图3示出的计算机装置的结构并不构成本发明实施例的限定，既可以是总线型结构，也可以是星形结构，所述计算机装置3还可以包括比图示更多或更少的其他硬件或者软件，或者不同的部件布置。Referring to FIG. 3 , it is a schematic structural diagram of a computer apparatus according to Embodiment 3 of the present invention. In a preferred embodiment of the present invention, the computer device 3 includes a memory 31 , at least one processor 32 , and at least one communication bus 33 . Those skilled in the art should understand that the structure of the computer device shown in FIG. 3 does not constitute a limitation of the embodiments of the present invention, and may be a bus-type structure or a star-shaped structure, and the computer device 3 may also include a ratio diagram more or less other hardware or software, or a different arrangement of components is shown.

在一些实施例中，所述计算机装置3包括一种能够按照事先设定或存储的指令，自动进行数值计算和/或信息处理的终端，其硬件包括但不限于微处理器、专用集成电路、可编程门阵列、数字处理器及嵌入式设备等。In some embodiments, the computer device 3 includes a terminal capable of automatically performing numerical calculations and/or information processing according to pre-set or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits, Programmable gate arrays, digital processors and embedded devices, etc.

需要说明的是，所述计算机装置3仅为举例，其他现有的或今后可能出现的电子产品如可适应于本发明，也应包含在本发明的保护范围以内，并以引用方式包含于此。It should be noted that the computer device 3 is only an example. If other existing or future electronic products can be adapted to the present invention, they should also be included in the protection scope of the present invention, and are incorporated herein by reference. .

在一些实施例中，所述存储器31用于存储程序代码和各种数据，例如安装在所述计算机装置3中的音视频通信装置30，并在计算机装置3的运行过程中实现高速、自动地完成程序或数据的存取。所述存储器31包括只读存储器(Read-Only Memory，ROM)、随机存储器(Random Access Memory，RAM)、可编程只读存储器(Programmable Read-Only Memory，PROM)、可擦除可编程只读存储器(Erasable Programmable Read-Only Memory，EPROM)、一次可编程只读存储器(One-time Programmable Read-Only Memory，OTPROM)、电子擦除式可复写只读存储器(Electrically-Erasable Programmable Read-Only Memory，EEPROM)、只读光盘(Compact Disc Read-Only Memory，CD-ROM)或其他光盘存储器、磁盘存储器、磁带存储器、或者能够用于携带或存储数据的计算机可读的任何其他存储介质。In some embodiments, the memory 31 is used for storing program codes and various data, such as the audio and video communication device 30 installed in the computer device 3 , and realizes high-speed and automatic operation during the operation of the computer device 3 . Complete program or data access. The memory 31 includes a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory. (Erasable Programmable Read-Only Memory, EPROM), one-time programmable read-only memory (One-time Programmable Read-Only Memory, OTPROM), electronically erasable rewritable read-only memory (Electrically-Erasable Programmable Read-Only Memory, EEPROM) ), Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable storage medium that can be used to carry or store data.

在一些实施例中，所述至少一个处理器32可以由集成电路组成，例如可以由单个封装的集成电路所组成，也可以是由多个相同功能或不同功能封装的集成电路所组成，包括一个或者多个中央处理器(Central Processing unit，CPU)、微处理器、数字处理芯片、图形处理器及各种控制芯片的组合等。所述至少一个处理器32是所述计算机装置3的控制核心(Control Unit)，利用各种接口和线路连接整个计算机装置3的各个部件，通过运行或执行存储在所述存储器31内的程序或者模块，以及调用存储在所述存储器31内的数据，以执行计算机装置3的各种功能和处理数据，例如执行音视频通信的功能。In some embodiments, the at least one processor 32 may be composed of integrated circuits, for example, may be composed of a single packaged integrated circuit, or may be composed of multiple integrated circuits packaged with the same function or different functions, including one Or a combination of multiple central processing units (Central Processing Units, CPUs), microprocessors, digital processing chips, graphics processors, and various control chips. The at least one processor 32 is the control core (Control Unit) of the computer device 3, and uses various interfaces and lines to connect various components of the entire computer device 3, by running or executing the program stored in the memory 31 or modules, and call the data stored in the memory 31 to perform various functions of the computer device 3 and process data, such as the function of performing audio and video communication.

在一些实施例中，所述至少一条通信总线33被设置为实现所述存储器31以及所述至少一个处理器32等之间的连接通信。In some embodiments, the at least one communication bus 33 is configured to enable connection communication between the memory 31 and the at least one processor 32 and the like.

尽管未示出，所述计算机装置3还可以包括给各个部件供电的电源(比如电池)，优选的，电源可以通过电源管理装置与所述至少一个处理器32逻辑相连，从而通过电源管理装置实现管理充电、放电、以及功耗管理等功能。电源还可以包括一个或一个以上的直流或交流电源、再充电装置、电源故障检测电路、电源转换器或者逆变器、电源状态指示器等任意组件。所述计算机装置3还可以包括多种传感器、蓝牙模块、Wi-Fi模块等，在此不再赘述。Although not shown, the computer device 3 may also include a power source (such as a battery) for supplying power to various components. Preferably, the power source may be logically connected to the at least one processor 32 through a power management device, so as to be implemented by the power management device Manage charging, discharging, and power management functions. The power source may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components. The computer device 3 may further include a variety of sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.

上述以软件功能模块的形式实现的集成的单元，可以存储在一个计算机可读取存储介质中。上述软件功能模块存储在一个存储介质中，包括若干指令用以使得一台计算机装置(可以是服务器、个人电脑等)或处理器(processor)执行本发明各个实施例所述方法的部分。The above-mentioned integrated units implemented in the form of software functional modules may be stored in a computer-readable storage medium. The above-mentioned software function modules are stored in a storage medium, and include several instructions for causing a computer device (which may be a server, a personal computer, etc.) or a processor (processor) to execute parts of the methods described in the various embodiments of the present invention.

在进一步的实施例中，结合图2，所述至少一个处理器32可执行所述计算机装置3的操作装置以及安装的各类应用程序(如所述的音视频通信装置30)、程序代码等，例如，上述的各个模块。In a further embodiment, with reference to FIG. 2 , the at least one processor 32 can execute the operating device of the computer device 3 and various installed applications (such as the audio and video communication device 30 ), program codes, etc. , for example, the various modules above.

所述存储器31中存储有程序代码，且所述至少一个处理器32可调用所述存储器31中存储的程序代码以执行相关的功能。例如，图2中所述的各个模块是存储在所述存储器31中的程序代码，并由所述至少一个处理器32所执行，从而实现所述各个模块的功能以达到音视频通信的目的。Program codes are stored in the memory 31, and the at least one processor 32 can call the program codes stored in the memory 31 to perform related functions. For example, each module described in FIG. 2 is a program code stored in the memory 31 and executed by the at least one processor 32, so as to realize the function of each module and achieve the purpose of audio and video communication.

在本发明的一个实施例中，所述存储器31存储一个或多个指令(即至少一个指令)，所述一个或多个指令被所述至少一个处理器32所执行以实现音视频通信的目的。In one embodiment of the present invention, the memory 31 stores one or more instructions (ie, at least one instruction), and the one or more instructions are executed by the at least one processor 32 to achieve the purpose of audio and video communication .

具体地，所述至少一个处理器32对上述指令的具体实现方法可参考图1对应实施例中相关步骤的描述，在此不赘述。Specifically, for the specific implementation method of the above instruction by the at least one processor 32, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1 , and details are not described herein.

在本发明所提供的几个实施例中，应该理解到，所揭露的装置和方法，可以通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的，例如，所述模块的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式。In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative. For example, the division of the modules is only a logical function division, and there may be other division manners in actual implementation.

所述作为分离部件说明的模块可以是或者也可以不是物理上分开的，作为模块显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。The modules described as separate components may or may not be physically separated, and components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

另外，在本发明各个实施例中的各功能模块可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用硬件加软件功能模块的形式实现。In addition, each functional module in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.

对于本领域技术人员而言，显然本发明不限于上述示范性实施例的细节，而且在不背离本发明的精神或基本特征的情况下，能够以其他的具体形式实现本发明。因此，无论从哪一点来看，均应将实施例看作是示范性的，而且是非限制性的，本发明的范围由所附权利要求而不是上述说明限定，因此旨在将落在权利要求的等同要件的含义和范围内的所有变化涵括在本发明内。不应将权利要求中的任何附图标记视为限制所涉及的权利要求。此外，显然“包括”一词不排除其他单元或，单数不排除复数。装置权利要求中陈述的多个单元或装置也可以由一个单元或装置通过软件或者硬件来实现。第一，第二等词语用来表示名称，而并不表示任何特定的顺序。It will be apparent to those skilled in the art that the present invention is not limited to the details of the above-described exemplary embodiments, but that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics of the invention. Therefore, the embodiments are to be regarded in all respects as illustrative and not restrictive, and the scope of the invention is to be defined by the appended claims rather than the foregoing description, which are therefore intended to fall within the scope of the claims. All changes within the meaning and range of the equivalents of , are included in the present invention. Any reference signs in the claims shall not be construed as limiting the involved claim. Furthermore, it is clear that the word "comprising" does not exclude other units or, and the singular does not exclude the plural. Several units or means recited in the device claims can also be realized by one unit or means by means of software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.

最后应说明的是，以上实施例仅用以说明本发明的技术方案而非限制，尽管参照较佳实施例对本发明进行了详细说明，本领域的普通技术人员应当理解，可以对本发明的技术方案进行修改或等同替换，而不脱离本发明技术方案的精神和范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention can be Modifications or equivalent substitutions can be made without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. an audio-video communication method, is characterized in that, described method comprises:

When performing audio and video communication with an external device, obtain audio and video data to be transmitted, and extract audio and video related parameters from the audio and video data to be transmitted;

Call the scene recognition model generated by pre-training, and identify the current scene of the user according to the acquired audio and video related parameters;

Determine the processing method for the audio and video data to be transmitted according to the current scene where the user is located, and when the current scene where the user is located is outdoors, determine that the processing method for the audio and video data to be transmitted is the first method, wherein, The first method means that the processing of the audio and video data to be transmitted at least includes noise reduction processing; when the current scene where the user is located is indoors, it is determined that the processing method of the audio and video data to be transmitted is the second. method, wherein the second method refers to processing the audio and video data to be transmitted according to the indoor area and the material of the indoor wall; and

The audio and video data to be transmitted is processed according to the determined processing mode, and the processed audio and video data is transmitted to the external device.

2. The audio-video communication method according to claim 1, wherein the method for training the scene recognition model comprises:

Acquire a preset number of audio and video-related parameters corresponding to different scenarios, and label the audio and video-related parameters corresponding to each scene with categories, so that the audio and video-related parameters corresponding to each of the scenarios carry category labels ;

The audio and video related parameters corresponding to the different scenes are randomly divided into a training set of a first preset ratio and a verification set of a second preset ratio, and the scene recognition model is trained by using the training set, and the verification set is used to train the scene recognition model. set the accuracy of the scene recognition model after training; and

If the accuracy rate is greater than or equal to the preset accuracy rate, end the training; if the accuracy rate is less than the preset accuracy rate, increase the number of samples to retrain the scene recognition model until the accuracy rate is greater than Or equal to the preset accuracy.

3. The audio and video communication method according to claim 1, wherein the processing of the audio and video data to be transmitted according to the indoor area and the material of the indoor wall comprises the steps:

Estimate the size of the indoor area;

Cut out a frame of an image including a wall from the audio and video data to be transmitted;

Use an image recognition algorithm to match the captured image of the wall with a plurality of pre-stored images of different materials to determine the material of the wall; determine the sound absorption coefficient according to the material of the wall;

multiplying the indoor area by the determined sound absorption coefficient to estimate the sound absorption; and

The audio and video data to be transmitted is processed according to the estimated sound absorption, wherein, when the estimated sound absorption is greater than a preset sound absorption value, the audio and video data to be transmitted is processed at least De-reverberation processing is included. When the estimated sound absorption amount is less than or equal to the preset sound absorption amount value, the processing of the audio and video data to be transmitted does not include de-reverberation processing.

4. The audio-video communication method according to claim 3, wherein the estimating the size of the indoor area comprises:

intercepting a frame of an image including the user's avatar from the audio and video data;

calculating the total number of first pixels included in the avatar of the user, and calculating the total number of second pixels included in the captured image;

The size of the indoor area is estimated according to the ratio between the total number of the first pixel points and the total number of the second pixel points, wherein the size of the indoor area is equal to a preset value divided by the ratio.

5. The audio and video communication method according to claim 1, wherein after the audio and video data to be transmitted is processed according to the determined processing mode, the method further comprises:

Determine whether there are multiple portraits in the video images included in the audio and video data to be transmitted;

When it is determined that there are multiple portraits in the video image, identify the portrait in the video image facing the camera, and when there are no multiple portraits in the video image, do not identify the portrait in the video image facing the camera ;and

In the video image, other portraits except the portrait facing the lens are blurred.

6. The audio and video communication method according to claim 1, wherein after the audio and video data to be transmitted is processed according to the determined processing mode, the method further comprises:

Obtain the average brightness of the video image in the audio-video data to be transmitted;

determining whether the average brightness of the video image is less than a preset brightness threshold; and

When the average brightness of the video image is less than the preset brightness threshold, the brightness enhancement is performed on the video image, and when the average brightness of the video image is greater than or equal to the preset brightness threshold, the brightness enhancement process is not performed on the video image.

7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory is used to store at least one instruction, and the processor is used to implement the at least one instruction when executing the at least one instruction as claimed in claims 1 to 6 The audio and video communication method described in any one of the above.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction, and when the at least one instruction is executed by a processor, the implementation of any one of claims 1 to 6 is implemented Audio and video communication methods.

9. An audio and video communication device, wherein the device comprises:

an acquisition module for acquiring audio and video data to be transmitted when performing audio and video communication with an external device, and extracting audio and video related parameters from the audio and video data to be transmitted;

The execution module is used to call the scene recognition model generated by pre-training, and identify the current scene of the user according to the acquired audio and video related parameters;

The execution module is further configured to determine the processing mode of the audio and video data to be transmitted according to the current scene of the user, and determine the processing of the audio and video data to be transmitted when the current scene of the user is outdoors The method is the first method, wherein the first method means that the processing of the audio and video data to be transmitted at least includes noise reduction processing; when the scene where the user is currently located is indoors, it is determined to The processing method of the video data is the second method, wherein the second method refers to processing the audio and video data to be transmitted according to the indoor area and the material of the indoor wall; and

The execution module is further configured to process the audio and video data to be transmitted according to the determined processing mode, and transmit the processed audio and video data to the external device.