CN115396784B - Remote tuning method and system - Google Patents

Info

Publication number
CN115396784B
Authority
CN
China
Prior art keywords
tuned
data
sound
equipment
simulated
Prior art date
Legal status
Active
Application number
CN202211017602.9A
Other languages
Chinese (zh)
Other versions
CN115396784A (en)
Inventor
马敏
陈洋
陈玮
Current Assignee
Hansang Nanjing Technology Co ltd
Original Assignee
Hansang Nanjing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hansang Nanjing Technology Co ltd filed Critical Hansang Nanjing Technology Co ltd
Priority to CN202211017602.9A
Publication of CN115396784A
Application granted
Publication of CN115396784B
Legal status: Active
Anticipated expiration


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R2430/00: Signal processing covered by H04R, not provided for in its groups
    • H04R2430/01: Aspects of volume control, not necessarily automatic, in sound systems

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Stereophonic System (AREA)

Abstract

Embodiments of the present disclosure provide a remote tuning method and system, the method comprising: predicting simulated sound data based on at least one of a user's input, environmental data of the device to be tuned, and distribution data of the device to be tuned; and transmitting the simulated sound data to a remote tuning terminal so that the remote tuning terminal plays audio based on the simulated sound data.

Description

Remote tuning method and system
Technical Field
The specification relates to the field of information technology, in particular to a remote tuning method and system.
Background
Playback devices (e.g., speakers) are increasingly popular with consumers, and tuning the playback device (e.g., adjusting its volume, sound effects, etc.) is essential to give listeners a better audiovisual experience. In some scenarios, however, the tuning technician cannot listen on site to the sound played by the playback device, and can therefore typically only tune it from experience.
Accordingly, there is a need to provide a method and system for remote tuning to better remotely tune a playback device.
Disclosure of Invention
One or more embodiments of the present specification provide a method of remote tuning. The remote tuning method comprises the following steps: predicting simulated sound data based on at least one of user input, environmental data of a device to be tuned, and distribution data of the device to be tuned; and sending the simulated sound data to a remote tuning terminal so that the remote tuning terminal plays audio based on the simulated sound data.
One or more embodiments of the present specification provide a system for remote tuning, comprising: the prediction module is used for predicting the simulated sound data based on at least one of the input of a user, the environment data of the equipment to be tuned and the distribution data of the equipment to be tuned; and the simulation module is used for sending the simulated sound data to a remote tuning terminal so that the remote tuning terminal plays audio based on the simulated sound data.
One or more embodiments of the present description provide a computer-readable storage medium storing computer instructions that, when executed by a processor, implement a method of remote tuning.
One or more embodiments of the present specification provide a remote tuning terminal, including: a speaker array; the speaker array plays audio based on simulated sound data, wherein the simulated sound data is determined based on at least one of user input, environmental data of the device to be tuned, and distribution data of the device to be tuned.
Drawings
The present specification will be further elucidated by way of example embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are not limiting; in the drawings, like numerals represent like structures:
FIG. 1 is a schematic illustration of a remotely tuned application scenario according to some embodiments of the present description;
FIG. 2 is an exemplary block diagram of a remote tuning system according to some embodiments of the present description;
FIG. 3 is an exemplary flow chart of remote tuning shown in accordance with some embodiments of the present description;
FIG. 4 is an exemplary schematic diagram illustrating a top view of an environment in which a device to be tuned is located, according to some embodiments of the present description;
FIG. 5 is an exemplary diagram illustrating determining simulated sound effects based on a first predictive model in accordance with some embodiments of the present description;
FIG. 6 is an exemplary architectural diagram of a first predictive model, shown in accordance with some embodiments of the present description;
FIG. 7 is a schematic diagram of an exemplary architecture for determining simulated volume based on a second predictive model, according to some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.
It will be appreciated that "system," "apparatus," "unit" and/or "module" as used herein is one method for distinguishing between different components, elements, parts, portions or assemblies at different levels. However, if other words can achieve the same purpose, the words can be replaced by other expressions.
As used in this specification and the claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
A flowchart is used in this specification to describe the operations performed by the system according to embodiments of the present specification. It should be appreciated that these operations are not necessarily performed precisely in the order shown. Rather, steps may be processed in reverse order or simultaneously, and other operations may be added to or removed from these processes.
Fig. 1 is a schematic illustration of a remotely tuned application scenario according to some embodiments of the present description. As shown in fig. 1, the remotely-tuned application scenario 100 may include a device to be tuned 110, a remote tuning terminal 120, a network 130, and a processing device 140, the processing device 140 being configured to perform the method of remotely tuning shown in some embodiments of the present description.
The device to be tuned 110 is a device that needs tuning. For more on the device to be tuned see fig. 3 and its related description.
The remote tuning terminal 120 is a device that plays the simulated sound to the user so that the user can audition the tuning effect. In some embodiments, the remote tuning terminal 120 may include a speaker array of a plurality of speakers for playing audio. In some embodiments, the remote tuning terminal 120 may be a remote tuning helmet 150. As illustrated in fig. 1, the remote tuning helmet 150 may include a speaker array 150-1, a noise reducer 150-2, and a microphone 150-3. The speaker array 150-1 plays the audio. The microphone 150-3 collects sound from the environment in which the wearer of the remote tuning helmet 150 is located. The noise reducer 150-2 removes that environmental sound, based on the sound collected by the microphone 150-3.
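As a minimal sketch of the noise reducer 150-2's role, the following Python assumes an idealized feed-forward cancellation scheme: the helmet emits the tuning audio plus an inverted copy of the microphone's ambient signal, so that the real ambient sound cancels at the wearer's ear. The function name, the perfect alignment, and the unit acoustic path are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def anti_noise_output(audio, ambient, strength=1.0):
    # Emit the tuning audio plus an inverted copy of the ambient sound picked
    # up by the microphone; at the wearer's ear the inverted copy cancels the
    # real ambient sound (idealized: perfect alignment, unit acoustic path).
    n = min(len(audio), len(ambient))
    return audio[:n] - strength * ambient[:n]

t = np.linspace(0.0, 1.0, 8000, endpoint=False)
audio = np.sin(2 * np.pi * 440 * t)           # tone from the speaker array 150-1
ambient = 0.3 * np.sin(2 * np.pi * 60 * t)    # hum around the helmet wearer
heard = anti_noise_output(audio, ambient) + ambient  # speaker output + real noise
```

With these assumptions, what the wearer hears is just the tuning audio.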
The network 130 may connect the components of the system and/or connect the system with external resource components. The network 130 enables communication between the various components and with other components outside the system to facilitate the exchange of data and/or information. For example, the processing device 140 may receive the environmental data and the distribution data of the device to be tuned 110 over the network 130. As another example, the processing device 140 may receive the user's input from the remote tuning terminal 120 over the network 130, or transmit the simulated sound data to the remote tuning terminal 120 via the network 130. The network may be implemented in various ways, such as a local area network, a USB connection, etc.
The processing device 140 may be used to process data and/or information from at least one component of the application scenario 100 or an external data source. For example, the processing device 140 may predict the simulated sound data based on at least one of the user's input, the environmental data of the device to be tuned 110, and the distribution data of the device to be tuned 110. For another example, the processing device 140 may transmit the simulated sound data to the remote tuning terminal 120 to cause the remote tuning terminal 120 to play audio based on the simulated sound data. The processing device 140 may be a stand-alone device or may be built into the remote tuning terminal 120.
Fig. 2 is an exemplary block diagram of a remote tuning system according to some embodiments of the present description. In some embodiments, the remote tuning system 200 may include a prediction module 210 and a simulation module 220.
The prediction module 210 may be used to predict the simulated sound data based on at least one of the user's input, the environmental data of the device to be tuned, and the distribution data of the device to be tuned. In some embodiments, the environmental data of the device to be tuned may include at least one of the temperature, humidity, people flow, and spatial data of the environment in which the device to be tuned is located, and the spatial data may include one or more of the following features: the type and size of the environment, and the parameters of its sound transmission barriers. In some embodiments, the simulated sound data may include a simulated volume and/or simulated sound effects, wherein the simulated sound effects may include at least one of a simulated surround mode, a simulated gain, and a simulated ambient sound.
In some embodiments, the prediction module 210 may be further configured to process the environmental data of the device to be tuned and/or the distribution data of the device to be tuned based on the simulated sound effect determination algorithm to determine the simulated sound effect.
In some embodiments, the remote tuning terminal may include a speaker array of a plurality of speakers, and the prediction module 210 may be further configured to determine a target speaker location in the speaker array based on the environmental data of the device to be tuned and/or the distribution data of the device to be tuned to generate the simulated sound effect.
In some embodiments, the prediction module 210 may be further configured to process the user's input and the environmental data of the device to be tuned based on a simulated volume determination algorithm to determine a simulated volume.
The simulation module 220 may be used to send the simulated sound data to the remote tuning terminal to cause the remote tuning terminal to play audio based on the simulated sound data.
It should be noted that the above description of the remote tuning system and its modules is for descriptive convenience only and is not intended to limit the present description to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given the principles of the system, various modules may be combined arbitrarily or a subsystem may be constructed in connection with other modules without departing from such principles. In some embodiments, the prediction module and the simulation module disclosed in fig. 2 may be different modules in one system, or may be one module to implement the functions of two or more modules described above. For example, each module may share one memory module, or each module may have a respective memory module. Such variations are within the scope of the present description.
Fig. 3 is an exemplary flow chart of remote tuning shown in accordance with some embodiments of the present description. As shown in fig. 3, the process 300 includes the following steps. In some embodiments, the process 300 may be performed by the processing device 140.
In step 310, the simulated sound data is predicted based on at least one of the user's input, the environmental data of the device to be tuned, and the distribution data of the device to be tuned. In some embodiments, step 310 may be performed by prediction module 210.
A user refers to a person or thing that participates in tuning. For example, the user may include a person listening to audio played by the remote tuning terminal (e.g., wearing a remote tuning helmet).
In some embodiments, the user's input may include parameter adjustment values for tuning the device to be tuned. The parameter types include volume and sound effects. The user's input may also include the play content, the type of music played (e.g., the track), the play duration, and the like.
The user may enter the input through a remote tuning terminal. For example, a button capable of adjusting the volume is provided on the remote tuning terminal, and the user can input the volume through the button, so that the volume of the audio which the user tries to hear can be controlled.
The adjustable parameter types on the remote tuning terminal can be designed to match those on the device to be tuned, so that the terminal can simulate and test different adjustment values. The user can then determine the adjustment value of each adjustable parameter based on the audio played by the remote tuning terminal, and finally determine how to tune the device to be tuned.
In some embodiments, an adjustable parameter (e.g., volume) on the remote tuning terminal may set a default value. For example, when the user does not input the adjustment value of the volume, the volume adjustment of the device to be tuned may be a default value.
The device to be tuned is the device which needs tuning. For example, the device to be tuned may include a speaker, a microphone, a loudspeaker, and the like.
The environment data of the device to be tuned refers to environment-related data of the environment in which the device to be tuned is located.
In some embodiments, the environmental data of the device to be tuned may include at least one of the temperature, humidity, people flow, and spatial data of the environment in which the device to be tuned is located, and the spatial data may include one or more of the following features: the type and size of the environment, and the parameters of its sound transmission barriers.
The temperature and humidity of the environment in which the device to be tuned is located can be obtained by acquiring stored or entered data. For example, the temperature of the environment may be detected by a temperature sensor disposed in the environment, and the humidity by a humidity sensor disposed in the environment; the processing device may obtain these values by communicating with the respective sensors.
The people flow may be used to represent the concentration of people. In some embodiments, the people flow in the environment in which the device to be tuned is located may be the number of people in that environment at the current time. For example, if 15 people are present at the current time, the people flow may be 15.
In some embodiments, the processing device may determine the people flow through an image recognition algorithm or model based on images acquired by cameras deployed in the environment. In some embodiments, the processing device may also determine the people flow by other means (e.g., gate counting, etc.).
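A minimal sketch of the image-based counting step: the detection format (label, confidence pairs), the threshold, and the function name are all illustrative assumptions about what an off-the-shelf object detector might return for one camera frame, not the patent's specific recognition model.

```python
def people_flow_from_detections(detections, confidence_threshold=0.5):
    # `detections` is assumed to be (label, confidence) pairs produced by
    # any off-the-shelf object detector run on one camera frame; the format
    # and threshold are illustrative assumptions.
    return sum(1 for label, conf in detections
               if label == "person" and conf >= confidence_threshold)

frame = [("person", 0.92), ("person", 0.81), ("chair", 0.88), ("person", 0.41)]
flow = people_flow_from_detections(frame)  # counts the two confident detections
```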
Spatial data refers to data related to the space, structure, etc. that may affect sound transmission. The spatial data may include the type and size of the environment in which the device is located and the parameters of its sound transmission barriers.
The type of environment may be differentiated according to the function, use, etc. of the environment. For example, the type of environment may include a lobby area, a clothing area, a venue, an office, etc. Different types of environments may affect sound differently. For example, when the environment is a hall, there are typically fewer items placed in it, the space is more open, and the sound may be enhanced. For another example, when the environment is a clothing area, there are usually more objects in it, the space is more complex, and the sound may be attenuated.
The size of the environment refers to the size of the three-dimensional space of the environment. For example, the environment in which the device to be tuned is located may be 50 m³ in size.
The parameters of the sound transmission barrier in the environment refer to parameters related to the barrier affecting the sound transmission in the environment.
In some embodiments, the parameters of the sound transmission barrier of the environment may include wall parameters.
Wall parameters refer to parameters associated with a wall. In some embodiments, the wall parameters may include the number of walls, the size of the walls in the environment. In some embodiments, the wall parameters may also include other information including, but not limited to, wall type (e.g., lime wall, wood wall, brick wall, etc.), wall thickness, wall location, etc.
Spatial data may be acquired in a variety of ways. In some embodiments, the spatial data of the environment in which the device is located may be pre-stored in a storage device, and the processing device may read it directly from the storage device. In some embodiments, the spatial data may be obtained based on a floor plan stored in a storage device or uploaded to the remote tuning terminal. The floor plan is a diagram of the spatial structure and of the installation positions of the devices to be tuned within it. The information in the floor plan can be represented through various feature extraction methods, and the spatial data (e.g., wall parameters) of the environment can be determined from it. For example, the floor plan may be input into an image recognition model, which outputs the wall parameters.
In some embodiments, the parameters of the sound transfer barrier of the environment may also include a propagation parameter matrix.
The propagation parameter matrix refers to a matrix of parameters related to sound propagation when sound propagates in an environment where the device to be tuned is located. In some embodiments, each device to be tuned may correspond to a propagation parameter matrix.
Different rows or columns of the propagation parameter matrix represent at least one propagation parameter at different first angles. In some embodiments, the propagation parameters may include a first angle, a second angle, a first distance, a second distance, an obstacle material at the intersection point, and the like.
The first angle refers to the angle of a first ray generated with the sounding position as the origin. Different first rays correspond to different first angles. In some embodiments, the first angle may be represented in a three-dimensional space coordinate system in a variety of ways, for example, as the angle between the first ray and the ground plane. The first angle may be determined in a number of ways. For example, it may be preset. As another example, a plurality of points may be selected on a sphere centered on the sounding position; the line connecting each point and the center of the sphere is taken as a ray, and the angle between that ray and the ground plane is taken as a first angle.
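The preset-angle variant above can be sketched as follows. The grid sizes and the (elevation, azimuth) parameterization are illustrative assumptions; the text only requires a set of preset ray directions, each with a first angle measured against the ground plane.

```python
def preset_first_angles(n_elevation=4, n_azimuth=8):
    # Enumerate preset ray directions from the sounding position; each ray's
    # elevation above the ground plane is its "first angle". The grid sizes
    # here are illustrative assumptions.
    rays = []
    for i in range(n_elevation):
        elevation = -90.0 + (i + 0.5) * 180.0 / n_elevation  # degrees
        for j in range(n_azimuth):
            azimuth = j * 360.0 / n_azimuth                  # degrees
            rays.append((elevation, azimuth))
    return rays

rays = preset_first_angles()  # 4 x 8 = 32 preset first rays
```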
The sounding position refers to the position of the device to be tuned corresponding to the propagation parameter matrix.
The second angle refers to the angle of a ray that starts at the listening position and passes through the target intersection point. The target intersection point is the intersection of the first ray and the obstacle it strikes. The second angle is represented similarly to the first angle and will not be described again.
The listening position refers to a position where a user may listen to audio in an environment where the device to be tuned is located. The listening position may be preset based on task requirements.
The first distance refers to the distance between the sound producing position and the target intersection point.
The second distance refers to the distance between the listening position and the target intersection point.
The obstacle material at the intersection point refers to the material of the obstacle that the first ray irradiates. For example, the barrier material at the intersection may include lime, tile, rosewood, and the like.
By way of example, fig. 4 is an exemplary schematic diagram of a top view of an environment in which a device to be tuned is located. As shown in fig. 4, the sound producing location 410 is the location of the device to be tuned in its environment. The listening position 420 is a position preset based on task requirements. Taking the sounding position as the origin, a ray is emitted outwards at a preset angle; this ray is the first ray 430, and the preset angle is the first angle corresponding to the first ray 430. The intersection of the first ray 430 and an obstacle (e.g., a wall) is the target intersection point 440. Among the rays with the listening position 420 as the origin, the angle corresponding to the ray passing through the target intersection point 440 is the second angle. The distance between the sounding position 410 and the target intersection point 440 is the first distance 450, the distance between the listening position 420 and the target intersection point 440 is the second distance 460, and the obstacle material at the intersection point is the material of the obstacle at the target intersection point 440.
In some embodiments, the propagation parameter matrix may be constructed based on a variety of possible methods, for example, the propagation parameters may be acquired by way of field mapping, real-time image recognition, or the like to construct the propagation parameter matrix.
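A toy construction of the propagation parameter matrix in the two-dimensional top view of fig. 4, under strong simplifying assumptions: a rectangular room with its corner at the origin, one uniform wall material, and preset first angles swept in the plane. The row layout (first angle, second angle, first distance, second distance, material) follows the propagation parameters listed above; everything else is a sketch, not the patent's mapping method.

```python
import math

def wall_hit(origin, angle_deg, width, height):
    # Intersect a 2-D ray (top view, as in fig. 4) with the walls of a
    # width x height rectangular room whose corner sits at (0, 0).
    ox, oy = origin
    dx = math.cos(math.radians(angle_deg))
    dy = math.sin(math.radians(angle_deg))
    candidates = []
    if dx > 0: candidates.append((width - ox) / dx)
    if dx < 0: candidates.append(-ox / dx)
    if dy > 0: candidates.append((height - oy) / dy)
    if dy < 0: candidates.append(-oy / dy)
    t = min(c for c in candidates if c > 0)  # nearest wall along the ray
    return (ox + t * dx, oy + t * dy)

def propagation_matrix(sounding, listening, width, height, n_rays=8,
                       material="lime"):
    # One row per first angle: (first angle, second angle, first distance,
    # second distance, obstacle material at the target intersection point).
    rows = []
    for i in range(n_rays):
        first_angle = i * 360.0 / n_rays
        hit = wall_hit(sounding, first_angle, width, height)
        first_distance = math.dist(sounding, hit)
        second_distance = math.dist(listening, hit)
        second_angle = math.degrees(math.atan2(hit[1] - listening[1],
                                               hit[0] - listening[0]))
        rows.append((first_angle, second_angle, first_distance,
                     second_distance, material))
    return rows

matrix = propagation_matrix((2.0, 2.0), (4.0, 3.0), width=8.0, height=6.0)
```

For the ray at 0 degrees, the first ray from (2, 2) hits the right wall at (8, 2), giving a first distance of 6 m.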
In some embodiments of the present disclosure, by introducing a propagation parameter matrix into the parameters of the environment's sound transmission barriers, finer-grained barrier material information can be obtained point by point along the propagation route of the sound. This more fully describes the distribution of the environment's sound transmission barriers, so that more accurate results are obtained when these parameters are used in an algorithm or model.
In some embodiments of the present specification, by introducing information such as a flow rate of people, spatial data, and the like, environmental data can be more comprehensively represented, so that when the environmental data is used for simulating sound data, more accurate simulation data can be obtained.
The distribution data of the device to be tuned refers to data related to the position and distribution of the device to be tuned in space. In some embodiments, the distribution data of the device to be tuned may include location coordinate information of the device to be tuned in space. In some embodiments, the distribution data of the devices to be tuned may also include other information, such as the number of devices to be tuned, the distance between the devices to be tuned, and the like.
The distribution information of the device to be tuned can be obtained in various ways. For example, the processing device may determine the distribution information of the device to be tuned based on the floor plan. As another example, the processing device may capture an image of the device to be tuned via a camera in the environment and identify the device in the image.
The simulated sound data refers to data for causing the sound producing device to produce a sound similar to the actual effect. The simulated sound data may be represented in the form of a sound waveform or other data.
In some embodiments, the simulated sound data may include a simulated volume.
The simulated volume refers to data for causing the sound producing device to emit a volume similar to the actual effect. In some embodiments, the simulated volume may correspond to the waveform amplitude of the sound waveform.
In some embodiments, the simulated sound data may include simulated sound effects, wherein the simulated sound effects may include at least one of a simulated surround mode, a simulated gain, and a simulated ambient sound.
The simulated sound effect refers to data for causing the sound producing device to produce a sound effect similar to the actual effect. In some embodiments, the simulated sound effect may correspond to the waveform shape of the sound waveform.
The simulated surround mode refers to data for causing the sound producing device to reproduce a surround mode similar to the actual effect. Surround mode refers to a speaker placement scheme that creates a more realistic listening effect by placing additional speakers at suitable positions.
The simulated gain refers to data for causing the sound producing device to reproduce a gain similar to the actual effect. Gain refers to the degree to which the volume is increased or decreased; for example, the gain may be an amplification or reduction factor applied to the volume.
The simulated ambient sound refers to data for causing the sound producing device to reproduce ambient sound similar to the actual effect. Ambient sound refers to the audible sound of the surrounding environment, for example, a background sound in which various surrounding noises (e.g., noisy voices, noise from some activity, etc.) are mixed.
The waveform amplitude corresponding to the simulated volume and the waveform shape corresponding to the simulated sound effects together form the sound waveform corresponding to the simulated sound data.
In some embodiments of the present disclosure, by introducing the simulated volume and the simulated sound effects, and by defining the simulated sound effects to include at least one of a simulated surround mode, a simulated gain, and a simulated ambient sound, the simulated sound data is subdivided into several components, so that a finer and more accurate sound waveform is generated during simulation and the simulation effect is better.
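The amplitude/shape decomposition above can be sketched as follows. The normalization of the effect's shape to unit peak before scaling by the simulated volume is an assumption about how the two components are combined; the text only states that amplitude and shape jointly form the waveform.

```python
import numpy as np

def simulated_waveform(volume_amplitude, effect_shape):
    # Normalize the effect's waveform shape to unit peak, then scale it by
    # the amplitude given by the simulated volume. The normalization step is
    # an illustrative assumption.
    shape = np.asarray(effect_shape, dtype=float)
    peak = np.max(np.abs(shape))
    unit = shape / peak if peak > 0 else shape
    return volume_amplitude * unit

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
# A toy "effect shape": a base tone plus a small higher-frequency component.
shape = np.sin(2 * np.pi * 5 * t) + 0.2 * np.sin(2 * np.pi * 50 * t)
wave = simulated_waveform(0.5, shape)  # simulated volume fixes the peak at 0.5
```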
In some embodiments, the simulated sound data may be predicted by a simulated sound determination algorithm based on at least one of the user's input, the environmental data of the device to be tuned, and the distribution data of the device to be tuned.
In some embodiments, the simulated sound determination algorithm includes a simulated volume determination algorithm and a simulated sound effect determination algorithm. The simulated volume determination algorithm determines the volume of the audio played at the remote tuning terminal; the simulated sound effect determination algorithm determines its sound effect.
In some embodiments, the simulated sound effect determination algorithm may include a simulated surround mode determination sub-algorithm that determines a simulated surround mode based on the distribution data. In some embodiments, the input of this sub-algorithm may include the distribution data and the output may include the simulated surround mode. Different surround modes correspond to different distribution data, and the correspondence can be preset; for example, reference distribution data and their corresponding reference surround modes may be stored in a database. The sub-algorithm may then search the database for the reference distribution data closest to the actual distribution data and use the corresponding reference surround mode as the simulated surround mode. Any other feasible algorithm may also be used.
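The closest-reference lookup can be sketched as a nearest-neighbor search. The database contents, the two-element feature vector (speaker count, mean spacing), and the Euclidean distance metric are all illustrative assumptions.

```python
import math

# Hypothetical reference database mapping a distribution feature vector
# (speaker count, mean spacing in metres) to a reference surround mode.
REFERENCE_DB = [
    ((2.0, 1.0), "stereo"),
    ((5.0, 1.5), "5.1 surround"),
    ((7.0, 1.2), "7.1 surround"),
]

def simulated_surround_mode(distribution):
    # Find the reference distribution closest to the actual one and return
    # its surround mode, per the sub-algorithm described above.
    closest = min(REFERENCE_DB, key=lambda ref: math.dist(ref[0], distribution))
    return closest[1]

mode = simulated_surround_mode((5.0, 1.4))  # nearest reference is (5.0, 1.5)
```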
In some embodiments, the simulated sound effect determination algorithm may include a simulated gain determination sub-algorithm that determines a simulated gain based on the environmental data of the device to be tuned. In some embodiments, the input of this sub-algorithm may include the environmental data of the device to be tuned and the output may include the simulated gain. For example, the sub-algorithm may reflect a correspondence between the degree of openness of the environment in which the device to be tuned is located and the gain: the type of the environment may be determined, and different types correspond to different degrees of openness; the more open the environment, the more the final sound is amplified, and the larger the simulated gain output by the algorithm. Any other feasible algorithm may also be used.
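One way to realize the openness-to-gain correspondence is a lookup table plus a monotone law. The openness scores, the linear law, and its constants are illustrative assumptions; the text only fixes that a more open environment yields a larger simulated gain.

```python
# Hypothetical openness scores per environment type (higher = more open).
OPENNESS = {"hall": 0.9, "venue": 0.7, "office": 0.4, "clothing area": 0.2}

def simulated_gain(environment_type, base_gain=1.0, span=0.5):
    # A more open environment amplifies the final sound more, so the
    # simulated gain grows monotonically with the openness score.
    return base_gain + span * OPENNESS[environment_type]
```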
In some embodiments, the analog gain determination sub-algorithm may further include a first gain determination sub-algorithm, a second gain determination sub-algorithm, a third gain determination sub-algorithm, and a gain fusion sub-algorithm.
The first gain determination sub-algorithm refers to a related algorithm for determining the first gain. The first gain may refer to the gain applied to the sound based on the size of the space. In some embodiments, the input of the first gain determination sub-algorithm may be the size of the environment in which the device to be tuned is located, and the output may be the first gain. For example, the first gain determination sub-algorithm may reflect a correspondence between the size of the environment in which the device to be tuned is located and the gain: the larger the space of the environment, the more the final sound needs to be amplified, and the larger the first gain output by the algorithm. Any other feasible algorithm can be used as the first gain determination sub-algorithm.
The second gain determination sub-algorithm refers to a related algorithm for determining the second gain. The second gain may refer to the gain applied to the sound based on the type of the space. In some embodiments, the input of the second gain determination sub-algorithm may be the type of the environment in which the device to be tuned is located, and the output may be the second gain. For example, the second gain determination sub-algorithm may reflect a correspondence between the type of the environment in which the device to be tuned is located and the gain. When the environment type is a hall area (objects in such environments are usually sparsely placed), sound is more easily reflected and produces an echo effect, so the sound is perceived as louder, and the second gain output by the algorithm is larger. The second gain determination sub-algorithm may also be any other feasible algorithm.
The third gain determination sub-algorithm refers to a related algorithm for determining the third gain. The third gain may refer to the gain applied to the sound based on the obstructions in the space. In some embodiments, the input of the third gain determination sub-algorithm may be the parameters of the sound transmission barriers of the environment in which the device to be tuned is located, and the output may be the third gain. For example, the third gain determination sub-algorithm may reflect a correspondence between the parameters of the sound transmission barriers of the environment and the gain. The sub-algorithm may determine the third gain based on wall parameters among the parameters of the sound transmission barriers: within a suitable range, the thicker the wall, the more easily sound is reflected rather than transmitted or absorbed, the stronger the echo effect, and the larger the third gain; likewise, the smaller the sound absorption coefficient of the wall type (for example, a marble wall has a smaller sound absorption coefficient than a concrete wall at each frequency), the larger the third gain output by the algorithm. Any other feasible algorithm can be used as the third gain determination sub-algorithm.
The gain fusion sub-algorithm refers to a related algorithm that fuses at least one gain. In some embodiments, the inputs of the gain fusion sub-algorithm may be the first gain, the second gain, and the third gain, and the output may be the analog gain. In some embodiments, the gain fusion sub-algorithm may perform weighted fusion (e.g., weighted summation) of the first gain, the second gain, and the third gain based on a gain weight vector to obtain the analog gain. The gain weight vector may include a weight of the first gain, a weight of the second gain, and a weight of the third gain.
The weight of each gain may be determined in a number of ways. For example, the weight of the first gain, the weight of the second gain, and the weight of the third gain may be preset. As another example, the gain weight vector may be determined based on the play characteristics of the remote tuning device, which may include at least a play duration characteristic and a play content characteristic: when different tracks are played, different gain weight vectors may be selected, according to empirically established preset rules, based on the play duration and play content of the tracks. As yet another example, the gain weight vector may be determined by a fusion model whose input includes the obstacle parameters and the obstacle distribution data of the environment in which the device to be tuned is located and whose output is the gain weight vector. The obstacle distribution data may include, for example, the proportion of obstacles with strong sound absorption at low, medium, and high frequencies. The training samples for the fusion model may be obtained from historical tuning data.
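The three gains and their weighted fusion described above can be sketched as follows. Every mapping, threshold, and weight in this sketch is an illustrative assumption, not a formula from this disclosure.

```python
# Assumed mapping from environment type to the second gain.
SPACE_TYPE_GAIN = {"hall": 1.3, "office": 1.0, "corridor": 1.1}

def first_gain(space_size_m3):
    # Larger space -> more amplification (assumed monotone, capped form).
    return 1.0 + min(space_size_m3 / 1000.0, 0.5)

def second_gain(space_type):
    return SPACE_TYPE_GAIN.get(space_type, 1.0)

def third_gain(wall_thickness_m, absorption_coeff):
    # Thicker walls (within a sensible range) and lower absorption ->
    # stronger echo -> larger gain. The formula is an assumption.
    return 1.0 + 0.5 * min(wall_thickness_m, 0.4) * (1.0 - absorption_coeff)

def fuse_gains(g1, g2, g3, weights=(0.4, 0.3, 0.3)):
    # Weighted summation, one possible gain fusion sub-algorithm; the
    # gain weight vector could equally come from a preset or a model.
    w1, w2, w3 = weights
    return w1 * g1 + w2 * g2 + w3 * g3
```

For a 200 m³ hall with 0.3 m walls of absorption coefficient 0.2, the fused analog gain under these assumed weights comes out to roughly 1.21.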
In some embodiments, the simulated sound effect determination algorithm may include a simulated ambient sound determination sub-algorithm that determines a simulated ambient sound based on the environmental data of the device to be tuned. In some embodiments, the inputs of the simulated ambient sound determination sub-algorithm may include the temperature, humidity, and people flow of the environment in which the device to be tuned is located, and the outputs may include the simulated ambient sound. The simulated ambient sound determination sub-algorithm may be any feasible algorithm. For example, it may select the simulated ambient sound from a plurality of preset ambient sounds according to a preset matching rule based on the temperature, humidity, and people flow of the environment in which the device to be tuned is located. The matching rule may be: each preset ambient sound corresponds to a group of temperature, humidity, and people flow values; the temperature, humidity, and people flow of the environment in which the device to be tuned is located are compared with those corresponding to each preset ambient sound; and the preset ambient sound with the greatest similarity is output as the simulated ambient sound. As another example, an ambient sound may be preset, and the sub-algorithm may adjust it (e.g., adjust its volume) based on the temperature, humidity, and people flow of the environment in which the device to be tuned is located and then output the adjusted sound as the simulated ambient sound.
In some embodiments, the simulated ambient sound determination sub-algorithm may include a comfort level determination sub-algorithm and an ambient sound matching sub-algorithm.
The comfort level determination sub-algorithm refers to a related algorithm for determining comfort level. Comfort level may refer to how comfortable a person feels in a particular environment. In some embodiments, the inputs of the comfort level determination sub-algorithm may be the temperature, humidity, and people flow of the environment in which the device to be tuned is located, and the output may be the comfort level. The comfort level determination sub-algorithm may be any feasible algorithm. For example, it may calculate a comfort level from the temperature and humidity through a preset formula and then adjust it according to the people flow (for example, when the people flow is large, the surrounding environment is noisy and comfort is poor, so the comfort level needs to be reduced) to obtain the final comfort level.
The ambient sound matching sub-algorithm refers to a related algorithm that matches an ambient sound based on the comfort level and the people flow. In some embodiments, the inputs of the ambient sound matching sub-algorithm may be the people flow and the comfort level, and the output may be the simulated ambient sound of the environment in which the device to be tuned is located. The ambient sound matching sub-algorithm may be any feasible algorithm. For example, it may determine the simulated ambient sound as follows: first, match a corresponding preset ambient sound according to the magnitude of the people flow; then, appropriately strengthen or weaken the matched ambient sound according to the comfort level to obtain the final simulated ambient sound.
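The two-step procedure described above (comfort level from temperature, humidity, and people flow, then preset-sound matching by people flow and comfort) can be sketched as follows. The comfort formula, the people-flow bands, and the preset sound names are all assumptions for illustration.

```python
def comfort_level(temperature_c, humidity_pct, people_flow):
    # Assumed preset formula: comfort peaks near 22 C / 50% humidity,
    # reduced by a crowd penalty when people flow (hence noise) is high.
    base = 1.0 - abs(temperature_c - 22) / 30.0 - abs(humidity_pct - 50) / 100.0
    crowd_penalty = min(people_flow / 500.0, 0.3)
    return max(0.0, base - crowd_penalty)

# Hypothetical preset ambient sounds, keyed by a people-flow band.
PRESET_AMBIENT = {"low": "quiet_room", "mid": "cafe_murmur", "high": "crowd_noise"}

def match_ambient_sound(people_flow, comfort):
    # Step 1: match a preset by people-flow magnitude.
    band = "low" if people_flow < 50 else "mid" if people_flow < 200 else "high"
    # Step 2: strengthen/weaken by comfort (lower comfort -> stronger sound).
    intensity = round(1.5 - comfort, 2)
    return PRESET_AMBIENT[band], intensity
```

The returned pair stands in for "the final simulated ambient sound": a preset selection plus an intensity adjustment.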
In some embodiments, the analog volume determination algorithm may determine the analog volume based on the user's input and the environmental data of the device to be tuned. In some embodiments, the inputs of the analog volume determination algorithm may include the user's input and the temperature, humidity, and people flow of the environment in which the device to be tuned is located, and the output may include the analog volume. For example, the analog volume determination algorithm may increase or decrease the volume input by the user according to the temperature, humidity, and people flow (for example, when the current temperature, humidity, and people flow correspond to a low comfort level, the environment feels loud and noisy, and the volume value needs to be increased) to obtain the output analog volume. The analog volume determination algorithm may also be any other feasible algorithm.
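One possible form of the comfort-based volume adjustment described above, as a sketch with an assumed linear mapping (the 0.3 boost factor is illustrative, not from this disclosure):

```python
def simulated_volume(user_volume, comfort):
    # Low comfort implies a noisy-feeling environment, so the user's
    # input volume is raised; at full comfort the input passes through.
    return user_volume * (1.0 + 0.3 * (1.0 - comfort))
```

Here `comfort` is assumed to be a value in [0, 1], e.g. as produced by a comfort level determination step from temperature, humidity, and people flow.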
In some embodiments, the simulated sound effect determination algorithm may include a first predictive model.
In some embodiments, the simulated sound effects may be determined based on processing environmental data of the device to be tuned and/or distribution data of the device to be tuned based on a first predictive model, which is a machine learning model. For more on the first predictive model and determining the simulated sound effect see fig. 5 and its associated description.
In some embodiments, the remote tuning terminal may include a speaker array of a plurality of speakers, and the processing device may determine a target speaker location in the speaker array based on the environmental data of the device to be tuned and/or the distribution data of the device to be tuned to generate the simulated sound effect.
The target speaker location may refer to the location of a speaker in the speaker array that needs to be active (i.e., that needs to play audio). In some embodiments, the target speaker location may be represented in a variety of ways (e.g., numerical numbers, location coordinates, etc.). Taking numbering as an example, suppose the speaker array includes 10 speakers numbered 1-10 in sequence; if speakers No. 5 and No. 6 are finally determined to be the speakers that need to work, the target speaker locations are 5 and 6.
In some embodiments, the processing device may determine a simulated surround pattern based on the environmental data of the device to be tuned and/or the distribution data of the device to be tuned, and determine the target speaker locations in the speaker array based on the simulated surround pattern, so as to generate the simulated sound effect. For more on determining the simulated surround mode, refer to the rest of this description, e.g., the descriptions of the simulated surround pattern determination sub-algorithm, the first predictive model, etc. In some embodiments, the target speaker locations may be included in the simulated surround mode, and the processing device may obtain them directly from the simulated surround mode. The processing device may control the speakers at the target speaker locations in the speaker array to play, so as to generate the simulated sound effect. For example, as shown in FIG. 1, the processing device may, based on the obtained target speaker locations, turn on the corresponding speakers in the speaker array 150-1 of the remote tuning helmet 150 and turn off the remaining speakers.
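The target-speaker control described above (turn on the speakers at the target positions, turn off the rest) can be sketched as a minimal class; the class and method names are assumptions for illustration.

```python
class SpeakerArray:
    """Minimal sketch: toggle speakers by target position numbers."""

    def __init__(self, n_speakers):
        # Speakers numbered 1..n, all initially off.
        self.active = {i: False for i in range(1, n_speakers + 1)}

    def apply_target_positions(self, target_positions):
        # Turn on speakers at target positions, turn off the remaining ones.
        for i in self.active:
            self.active[i] = i in target_positions
        return sorted(i for i, on in self.active.items() if on)
```

For the 10-speaker example above, applying target positions {5, 6} leaves exactly speakers No. 5 and No. 6 active.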
In some embodiments of the present disclosure, by determining the target speaker positions in the speaker array based on the environmental data of the device to be tuned and/or the distribution data of the device to be tuned and generating the simulated sound effect accordingly, the remote tuning device can also realize the simulated surround mode through its physical structure when playing the audio, so that the audio corresponding to the simulated sound data heard by the end user is closer to the audio in the real environment.
In some embodiments, the analog volume determination algorithm may include a second predictive model.
In some embodiments, the simulated volume may be determined based on processing the user's input and the environmental data of the device to be tuned based on a second predictive model, which is a machine learning model. For more on the second predictive model and determining the simulated volume see fig. 7 and its associated description.
Step 320, the simulated sound data is sent to the remote tuning terminal to cause the remote tuning terminal to play audio based on the simulated sound data.
In some embodiments, the processing device may also obtain user feedback based on the remote tuning terminal.
Feedback from the user refers to the user's impression after listening to the audio. In some embodiments, the user's feedback may be binary (e.g., "acceptable" or "unacceptable"). The remote tuning terminal may be provided with a button, a switch, or a similar structure for user feedback, and the user may give feedback by touching the button, toggling the switch, etc. In some embodiments, the user's feedback may also be a direct adjustment to the simulated sound effect or the analog volume, such as increasing or decreasing the volume or switching the surround mode. It will be appreciated that making no adjustment may be taken to represent "acceptable", and otherwise "unacceptable".
In some embodiments, the processing device may control the device to be tuned in response to feedback from the user. For example, when the feedback indicates "acceptable," the processing device may use the current analog sound data as sound data for the device to be tuned to play audio in the environment. For another example, when the feedback indicates "unacceptable", a new round of simulation is performed based on the user's adjustment and remotely played to the user to obtain new feedback until the user feedback is "acceptable".
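The feedback loop described above (repeat simulation rounds until the user reports "acceptable") can be sketched as follows. The simulation step and the terminal's feedback channel are injected as placeholder callables; their signatures are assumptions for this sketch.

```python
def tune_until_acceptable(initial_sound, simulate, get_feedback, max_rounds=10):
    """Iterate simulation rounds until the user's feedback is 'acceptable'.

    `simulate(sound, feedback)` performs a new round of simulation based
    on the user's adjustment; `get_feedback(sound)` stands in for remote
    playback plus the terminal's feedback. Both are hypothetical hooks.
    """
    sound = initial_sound
    for _ in range(max_rounds):
        feedback = get_feedback(sound)
        if feedback == "acceptable":
            return sound  # use as sound data for the device to be tuned
        sound = simulate(sound, feedback)  # new round based on adjustment
    return sound
```

A stub feedback function that accepts any volume of at least 5, paired with a simulate step that raises the volume by 1, converges to the first acceptable setting.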
In some embodiments of the present description, by introducing user feedback, the remote tuning method can establish a baseline reference based on the user's feedback, thereby tuning the device to be tuned more effectively.
In some embodiments of the present disclosure, the simulated sound data is predicted based on at least one of the input of the user, the environmental data of the device to be tuned, and the distribution data of the device to be tuned, so that the accuracy of sound simulation can be greatly improved, and the audio corresponding to the simulated sound data can be efficiently played to the user.
FIG. 5 is an exemplary architectural diagram illustrating determining simulated sound effects based on a first predictive model in accordance with some embodiments of the present description.
As shown in fig. 5, the input of the first predictive model 530 may include environmental data 510 of the device to be tuned and/or distribution data 520 of the device to be tuned, and the output may include simulated sound effects 540; the environment data 510 of the device to be tuned may include at least one of a temperature 510-1, a humidity 510-2, a flow rate of people 510-3, and space data 510-4 of an environment in which the device to be tuned is located. The first predictive model may be a deep neural network (Deep Neural Network, DNN) or the like.
In some embodiments, the first predictive model 530 may be composed of a surround mode determination model, a gain determination model, and an ambient sound determination model, which may be used to determine the simulated surround mode, the analog gain, and the simulated ambient sound, respectively. In some embodiments, the simulated surround pattern determination sub-algorithm may include the surround mode determination model. In some embodiments, the analog gain determination sub-algorithm may include the gain determination model. In some embodiments, the simulated ambient sound determination sub-algorithm may include the ambient sound determination model. For more on the surround mode determination model, the gain determination model, and the ambient sound determination model, see FIG. 6 and the related description thereof.
In some embodiments, the first predictive model 530 may be derived by training. For example, training samples may be input into the initial first predictive model 550, a loss function constructed based on the output of the initial first predictive model 550, and parameters of the initial first predictive model 550 iteratively updated based on the loss function until preset conditions are met and training is complete.
In some embodiments, the first training samples 560 may include environmental data of a sample device to be tuned and/or distribution data of the sample device to be tuned, and the label of a first training sample 560 is the simulated sound effect corresponding to that environmental data and/or distribution data. The first training samples and labels may be obtained from historical data; for example, historical data with higher user satisfaction may be selected as the first training samples and labels.
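The training procedure described above (iterative parameter updates driven by a loss function over historical samples) can be illustrated in miniature. A simple linear model fitted by gradient descent stands in for the DNN, and the synthetic "historical" data and all numeric choices here are assumptions purely for illustration.

```python
import random

random.seed(0)

# Synthetic stand-in for historical high-satisfaction samples:
# two environment features -> a numeric sound-effect value (assumed mapping).
features = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(50)]
data = [((t, h), 0.5 * t - 0.3 * h + 0.2) for t, h in features]

w = [0.0, 0.0]
b = 0.0
lr = 0.1
for _ in range(300):  # iteratively update parameters to reduce squared loss
    gw0 = gw1 = gb = 0.0
    for (t, h), label in data:
        err = (w[0] * t + w[1] * h + b) - label
        gw0 += 2 * err * t / len(data)
        gw1 += 2 * err * h / len(data)
        gb += 2 * err / len(data)
    w[0] -= lr * gw0
    w[1] -= lr * gw1
    b -= lr * gb

# Mean squared loss after training; it approaches zero on this noiseless data.
loss = sum(((w[0] * t + w[1] * h + b) - lab) ** 2 for (t, h), lab in data) / len(data)
```

The loop mirrors the description: construct a loss from the model's output, update parameters from the loss, and stop once a preset condition (here, a fixed iteration count) is met.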
In some embodiments of the present disclosure, the environmental data of the device to be tuned and/or the distribution data of the device to be tuned are processed by the first predictive model to determine the simulated sound effect, so that the first predictive model can learn, from a large amount of historical data, the intrinsic relationship between the environmental and distribution data and the corresponding simulated sound effect, thereby determining the simulated sound effect more accurately.
FIG. 6 is an exemplary architectural diagram of a first predictive model, shown in accordance with some embodiments of the present description.
As shown in FIG. 6, the first predictive model 530 may be composed of a surround mode determination model 630-1, a gain determination model 630-2, and an ambient sound determination model 630-3.
The surround pattern determination model 630-1 may be used to determine a simulated surround pattern. As shown in FIG. 6, the input of the surround pattern determination model 630-1 may include distribution data 520 of the device to be tuned and the output may include a simulated surround pattern 640-1. In some embodiments, the surround mode determination model may be a machine learning model. For example, the surround pattern determination model may be DNN or the like.
The surround mode determination model may be trained in the same training manner as the first predictive model or in another manner.
Gain determination model 630-2 may be used to determine analog gain. As shown in FIG. 6, the input of the gain determination model 630-2 may include spatial data 510-4 of the environment in which the device to be tuned is located and the output may include analog gain 640-2. In some embodiments, the gain determination model may be a machine learning model. For example, the gain determination model may be DNN or the like.
The gain determination model may be trained in the same training manner as the first predictive model or in another manner.
The ambient sound determination model 630-3 may be used to determine simulated ambient sound. As shown in FIG. 6, the inputs to the ambient sound determination model 630-3 may include the temperature 510-1, humidity 510-2, and flow 510-3 of the environment in which the device to be tuned is located, and the outputs may include simulated ambient sound 640-3. In some embodiments, the ambient sound determination model may be a machine learning model. For example, the ambient sound determination model may be DNN or the like.
The ambient sound determination model may be trained in the same training manner as the first predictive model or in another manner.
In some embodiments, an amplified analog gain 640-4 may be determined based on the analog gain 640-2 output by the gain determination model 630-2 and the first amplification factor 650, and the analog gain 640-2 may be replaced with the amplified analog gain 640-4 as the gain ultimately used for the audio played by the remote tuning terminal.
The first amplification factor may be used to amplify the analog gain output by the gain determination model. Amplifying the gain may include increasing or decreasing its value. For example, if the analog gain output by the gain determination model is X and the first amplification factor is 1.2, the amplified analog gain is 1.2X.
In some embodiments, an amplified simulated ambient sound 640-5 may be determined based on the simulated ambient sound 640-3 output by the ambient sound determination model and the second amplification factor 660, and the simulated ambient sound 640-3 may be replaced with the amplified simulated ambient sound 640-5 as the ambient sound ultimately used for the audio played by the remote tuning terminal.
The second amplification factor may be used to amplify the simulated ambient sound output by the ambient sound determination model. Amplifying the ambient sound may include strengthening or weakening its intensity. For example, if the simulated ambient sound output by the ambient sound determination model is a sound waveform X and the second amplification factor is 1.2, the amplified simulated ambient sound may be the waveform obtained by stretching the amplitude of X at each time point to 1.2 times its original value.
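The two amplification steps described above can be sketched as follows, with the ambient sound represented as a plain list of amplitude samples; both helpers are illustrative assumptions rather than the disclosed implementation.

```python
def amplify_gain(analog_gain, first_factor):
    # Amplified analog gain replaces the gain model's raw output
    # (the factor may be above or below 1, i.e. increase or decrease).
    return analog_gain * first_factor

def amplify_ambient_waveform(waveform, second_factor):
    # Stretch the amplitude at each time point by the second factor.
    return [sample * second_factor for sample in waveform]
```

With a factor of 1.2, a gain X becomes 1.2X and every waveform sample is stretched to 1.2 times its original amplitude, matching the worked examples above.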
In some embodiments, the first amplification factor may be determined based on the simulated ambient sound output by the ambient sound determination model. For example, the first amplification factor may be determined by a preset formula based on the high-frequency ratio of the simulated ambient sound, i.e., the proportion of the high-frequency components in the entire sound waveform of the simulated ambient sound. In some embodiments, the first amplification factor may also be related to the play characteristics of the device to be tuned in the user's input, and may be determined by various feasible methods based on those play characteristics. For example, when different tracks are played, different first amplification factors may be selected, according to empirically established preset rules, based on the play duration and play content of the tracks; as another example, the first amplification factor may be determined based on the similarity between the high-frequency ratio of the played content and the high-frequency ratio of the simulated ambient sound.
In some embodiments, the second amplification factor may be determined based on the analog gain output by the gain determination model. For example, the second amplification factor may be determined according to the magnitude of the analog gain: the larger the analog gain, the smaller the second amplification factor may be.
In some embodiments of the present disclosure, introducing the first amplification factor and the second amplification factor can effectively reflect the influence of different ambient sound frequencies, different played contents, and the like on the analog gain, as well as the suppression effect on the ambient sound when the analog gain is applied to the played content.
In some embodiments of the present description, dividing the first predictive model into three separately predicting models allows each portion of the simulated sound effect to be predicted by a separately trained model, which can improve the prediction accuracy of each portion and thereby the prediction accuracy of the final simulated sound effect.
FIG. 7 is a schematic diagram of an exemplary architecture for determining simulated volume based on a second predictive model, according to some embodiments of the present disclosure.
As shown in fig. 7, the inputs of the second predictive model 730 may include user inputs 710 and environmental data 510 of the device to be tuned, and the outputs may include simulated volume 740; the environmental data 510 of the device to be tuned input into the second predictive model 730 may include at least one of a temperature 510-1, a humidity 510-2, and a flow rate 510-3 of the environment in which the device to be tuned is located. The second predictive model may be DNN or the like.
In some embodiments, the second predictive model 730 may be obtained through training. For example, training samples may be input into the initial second predictive model 750, a loss function constructed based on the output of the initial second predictive model 750, and parameters of the initial second predictive model 750 iteratively updated based on the loss function until the preset conditions are met and training is completed.
In some embodiments, the second training samples 760 may include a sample user's input and environmental data of a sample device to be tuned, and the label of a second training sample is the simulated volume corresponding to that input and environmental data. The training samples and labels may be obtained from historical data; for example, historical data with higher user satisfaction may be selected as the second training samples and labels.
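The selection of high-satisfaction historical records as training samples and labels, mentioned above for both predictive models, might look like the following; the record format and the satisfaction threshold are assumptions for this sketch.

```python
def build_second_training_set(history, satisfaction_threshold=0.8):
    """Select high-satisfaction historical records as samples and labels.

    Each record's assumed shape: {"user_input", "env", "volume",
    "satisfaction"} — a format invented for illustration.
    """
    samples, labels = [], []
    for record in history:
        if record["satisfaction"] >= satisfaction_threshold:
            samples.append((record["user_input"], record["env"]))
            labels.append(record["volume"])  # simulated volume as label
    return samples, labels
```

Filtering by a satisfaction threshold is one plain reading of "historical data with higher user satisfaction is selected as training samples and labels".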
In some embodiments of the present disclosure, the simulated volume is determined by the second predictive model processing the user's input and the environmental data of the device to be tuned, so that the second predictive model can learn, from a large amount of historical data, the intrinsic relationship between the user's input and environmental data and the corresponding simulated volume, thereby determining the simulated volume more accurately.
While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing detailed disclosure is by way of example only and is not intended to be limiting. Although not explicitly stated herein, various modifications, improvements, and adaptations of the present disclosure may occur to those skilled in the art. Such modifications, improvements, and adaptations are suggested by this specification and are intended to fall within the spirit and scope of the exemplary embodiments of this specification.
Meanwhile, the specification uses specific words to describe the embodiments of the specification. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the present description. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the present description may be combined as suitable.
Furthermore, the order in which the elements and sequences are processed, the use of numerical letters, or other designations in the description are not intended to limit the order in which the processes and methods of the description are performed unless explicitly recited in the claims. While certain presently useful inventive embodiments have been discussed in the foregoing disclosure, by way of various examples, it is to be understood that such details are merely illustrative and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements included within the spirit and scope of the embodiments of the present disclosure. For example, while the system components described above may be implemented by hardware devices, they may also be implemented solely by software solutions, such as installing the described system on an existing server or mobile device.
Likewise, it should be noted that, in order to simplify the presentation of this disclosure and thereby aid in the understanding of one or more inventive embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as implying that the claimed subject matter requires more features than are recited in the claims. Indeed, claimed subject matter may lie in less than all of the features of a single embodiment disclosed above.
In some embodiments, numbers describing quantities of components or attributes are used; it should be understood that such numbers used in the description of embodiments are in some instances modified by the terms "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the number allows a variation of 20%. Accordingly, in some embodiments, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties of individual embodiments. In some embodiments, the numerical parameters should take into account the specified significant digits and employ a general method of preserving digits. Although the numerical ranges and parameters used to confirm the breadth of ranges in some embodiments of this specification are approximations, in specific embodiments such numerical values are set as precisely as practicable.
Each patent, patent application publication, and other material, such as articles, books, specifications, publications, and documents, referred to in this specification is hereby incorporated by reference in its entirety, except for any application history document that is inconsistent with or conflicts with the content of this specification, and except for any document (whether currently or later appended to this specification) that limits the broadest scope of the claims of this specification. It is noted that, if the description, definition, and/or use of a term in material appended to this specification is inconsistent with or conflicts with what is described in this specification, the description, definition, and/or use of the term in this specification controls.
Finally, it should be understood that the embodiments described in this specification are merely illustrative of the principles of the embodiments of this specification. Other variations are possible within the scope of this description. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present specification may be considered as consistent with the teachings of the present specification. Accordingly, the embodiments of the present specification are not limited to only the embodiments explicitly described and depicted in the present specification.

Claims (9)

1. A method of remote tuning, comprising:
predicting simulated sound data based on an input of a user, environment data of equipment to be tuned, and distribution data of the equipment to be tuned, wherein the simulated sound data comprises a simulated sound effect; wherein predicting the simulated sound data comprises: predicting the simulated sound data by a simulated sound determination algorithm; the simulated sound determination algorithm comprises a simulated sound effect determination algorithm, the simulated sound effect determination algorithm comprises a first prediction model, the first prediction model is a machine learning model, an input of the first prediction model comprises the environment data of the equipment to be tuned and the distribution data of the equipment to be tuned, and an output of the first prediction model comprises the simulated sound effect;
transmitting the simulated sound data to a remote tuning terminal, so that the remote tuning terminal plays audio based on the simulated sound data; wherein,
the environment data of the equipment to be tuned comprises temperature, humidity, people flow, and space data of the environment where the equipment to be tuned is located; the space data comprises the type and size of the environment and parameters of sound transmission barriers; the parameters of the sound transmission barriers comprise a propagation parameter matrix; the propagation parameter matrix refers to a matrix formed by parameters related to sound propagation when sound propagates in the environment where the equipment to be tuned is located;
the distribution data of the equipment to be tuned refers to data related to the position and distribution of the equipment to be tuned in space; the distribution data of the equipment to be tuned comprises position coordinate information of the equipment to be tuned in space.
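The claim characterizes the first prediction model only as a machine learning model whose input is the environment data and distribution data and whose output is the simulated sound effect. As a non-authoritative sketch of that interface (the feature layout, the toy linear model, and every identifier below are assumptions for illustration, not the patented implementation), the data could be wired together as follows:

```python
import random

def build_features(env, distribution):
    """Flatten environment data and equipment distribution data into one feature vector."""
    features = [env["temperature"], env["humidity"], env["people_flow"]]
    for row in env["propagation_matrix"]:   # sound-transmission-barrier parameters
        features.extend(row)
    for x, y, z in distribution:            # position coordinates of equipment to be tuned
        features.extend((x, y, z))
    return features

class LinearSoundEffectModel:
    """Toy linear stand-in for the patent's (unspecified) first prediction model."""
    def __init__(self, n_features, n_outputs, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-1, 1) for _ in range(n_features)]
                        for _ in range(n_outputs)]

    def predict(self, features):
        # One output per sound-effect parameter, e.g. surround mode, gain, ambient level.
        return [sum(w * f for w, f in zip(row, features)) for row in self.weights]

env = {"temperature": 22.0, "humidity": 0.45, "people_flow": 30,
       "propagation_matrix": [[0.9, 0.1], [0.2, 0.8]]}
distribution = [(0.0, 0.0, 1.0), (3.0, 0.0, 1.0)]

features = build_features(env, distribution)      # 3 + 4 + 6 = 13 features
model = LinearSoundEffectModel(len(features), 3)  # 3 illustrative effect parameters
simulated_effect = model.predict(features)
print(len(simulated_effect))
```

In practice the model would be trained on historical tuning sessions; the linear predictor here only demonstrates the input/output contract the claim describes.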
2. The method of claim 1, wherein the simulated sound data further comprises a simulated volume, and wherein the simulated sound effect comprises at least one of a simulated surround mode, a simulated gain, and a simulated ambient sound.
3. The method of claim 2, wherein the remote tuning terminal comprises a speaker array of a plurality of speakers, and
wherein predicting the simulated sound data based on the input of the user, the environment data of the equipment to be tuned, and the distribution data of the equipment to be tuned comprises:
determining a target speaker position in the speaker array based on the environment data of the equipment to be tuned and/or the distribution data of the equipment to be tuned, so as to generate the simulated sound effect.
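Claim 3 leaves the rule for selecting the target speaker position unspecified. One plausible reading, sketched here purely for illustration (the nearest-speaker heuristic and all identifiers are assumptions), is to map each piece of equipment to be tuned onto the closest speaker in the terminal's array:

```python
import math

def choose_target_speakers(speaker_positions, device_positions):
    """For each device to be tuned, pick the speaker in the array whose position
    is closest to the device's coordinates, so that playback from that speaker
    approximates the device's contribution to the simulated sound field."""
    targets = []
    for dev in device_positions:
        best = min(speaker_positions, key=lambda spk: math.dist(spk, dev))
        targets.append(best)
    return targets

# Illustrative 3-speaker array and two devices to be tuned (x, y, z in meters).
speaker_array = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (4.0, 0.0, 1.0)]
devices = [(0.5, 0.0, 1.0), (3.8, 0.0, 1.0)]
print(choose_target_speakers(speaker_array, devices))
# → [(0.0, 0.0, 1.0), (4.0, 0.0, 1.0)]
```

A real selection rule could also weight the environment data (e.g. the propagation parameter matrix), which this distance-only sketch ignores.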
4. The method of claim 2, wherein predicting the simulated sound data based on the input of the user, the environment data of the equipment to be tuned, and the distribution data of the equipment to be tuned comprises:
processing the input of the user and the environment data of the equipment to be tuned based on a simulated volume determination algorithm to determine the simulated volume.
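Claim 4's simulated volume determination algorithm is likewise not spelled out. A minimal sketch, assuming a user-requested level corrected by illustrative environment terms (the coefficients and names are invented for demonstration only, not the patented algorithm):

```python
def simulate_volume(user_volume_db, env):
    """Toy simulated-volume rule: start from the user's requested level and
    subtract invented corrections for crowd absorption and air attenuation."""
    crowd_loss_db = 0.05 * env["people_flow"]  # denser crowds absorb more sound
    air_loss_db = 2.0 * env["humidity"]        # humid air attenuates high frequencies
    return max(user_volume_db - crowd_loss_db - air_loss_db, 0.0)

env = {"temperature": 22.0, "humidity": 0.45, "people_flow": 30}
print(simulate_volume(60.0, env))  # 60 - 1.5 - 0.9 → 57.6
```

The point is only the data flow the claim requires: user input plus environment data in, a single simulated volume out.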
5. A system for remote tuning, comprising:
a prediction module configured to predict simulated sound data based on an input of a user, environment data of equipment to be tuned, and distribution data of the equipment to be tuned, wherein the simulated sound data comprises a simulated sound effect; wherein predicting the simulated sound data comprises: predicting the simulated sound data by a simulated sound determination algorithm; the simulated sound determination algorithm comprises a simulated sound effect determination algorithm, the simulated sound effect determination algorithm comprises a first prediction model, the first prediction model is a machine learning model, an input of the first prediction model comprises the environment data of the equipment to be tuned and the distribution data of the equipment to be tuned, and an output of the first prediction model comprises the simulated sound effect; the environment data of the equipment to be tuned comprises temperature, humidity, people flow, and space data of the environment where the equipment to be tuned is located; the space data comprises the type and size of the environment and parameters of sound transmission barriers; the parameters of the sound transmission barriers comprise a propagation parameter matrix; the propagation parameter matrix refers to a matrix formed by parameters related to sound propagation when sound propagates in the environment where the equipment to be tuned is located;
the distribution data of the equipment to be tuned refers to data related to the position and distribution of the equipment to be tuned in space; the distribution data of the equipment to be tuned comprises position coordinate information of the equipment to be tuned in space; and
a simulation module configured to send the simulated sound data to a remote tuning terminal, so that the remote tuning terminal plays audio based on the simulated sound data.
6. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method of any one of claims 1-4.
7. A remote tuning terminal, comprising: a speaker array;
wherein the speaker array plays audio based on simulated sound data, the simulated sound data being determined based on an input of a user, environment data of equipment to be tuned, and distribution data of the equipment to be tuned, and the simulated sound data comprising a simulated sound effect; wherein determining the simulated sound data comprises: predicting the simulated sound data by a simulated sound determination algorithm; the simulated sound determination algorithm comprises a simulated sound effect determination algorithm, the simulated sound effect determination algorithm comprises a first prediction model, the first prediction model is a machine learning model, an input of the first prediction model comprises the environment data of the equipment to be tuned and the distribution data of the equipment to be tuned, and an output of the first prediction model comprises the simulated sound effect;
the environment data of the equipment to be tuned comprises temperature, humidity, people flow, and space data of the environment where the equipment to be tuned is located; the space data comprises the type and size of the environment and parameters of sound transmission barriers; the parameters of the sound transmission barriers comprise a propagation parameter matrix; the propagation parameter matrix refers to a matrix formed by parameters related to sound propagation when sound propagates in the environment where the equipment to be tuned is located;
the distribution data of the equipment to be tuned refers to data related to the position and distribution of the equipment to be tuned in space; the distribution data of the equipment to be tuned comprises position coordinate information of the equipment to be tuned in space.
8. The remote tuning terminal of claim 7, wherein the simulated sound data further comprises a simulated volume, and wherein the simulated sound effect comprises at least one of a simulated surround mode, a simulated gain, and a simulated ambient sound.
9. The remote tuning terminal of claim 8, wherein speakers at a target speaker position in the speaker array play audio to generate the simulated sound effect.
CN202211017602.9A 2022-08-23 2022-08-23 Remote tuning method and system Active CN115396784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211017602.9A CN115396784B (en) 2022-08-23 2022-08-23 Remote tuning method and system


Publications (2)

Publication Number Publication Date
CN115396784A CN115396784A (en) 2022-11-25
CN115396784B CN115396784B (en) 2023-12-08

Family

ID=84119790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211017602.9A Active CN115396784B (en) 2022-08-23 2022-08-23 Remote tuning method and system

Country Status (1)

Country Link
CN (1) CN115396784B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116405836B (en) * 2023-06-08 2023-09-08 安徽声讯信息技术有限公司 Microphone tuning method and system based on Internet

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011250243A (en) * 2010-05-28 2011-12-08 Panasonic Corp Sound volume adjustment system and sound volume adjustment method
CN104966522A (en) * 2015-06-30 2015-10-07 广州酷狗计算机科技有限公司 Sound effect regulation method, cloud server, stereo device and system
CN108628963A (en) * 2018-04-18 2018-10-09 芜湖乐锐思信息咨询有限公司 Audio-video system based on big data technology
CN108873987A (en) * 2018-06-02 2018-11-23 熊冠 A kind of intelligence control system and method for stereo of stage
WO2018236006A1 (en) * 2017-06-19 2018-12-27 이재호 AOIP-based on-site acoustic center system that can adjust the sound according to site characteristics
CN210225734U (en) * 2019-09-23 2020-03-31 宁波中荣声学科技有限公司 Regulating and controlling system for stereo playing of sound box

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008533643A (en) * 2005-03-18 2008-08-21 トニウム アーベー Portable computer device with built-in disc jockey function
US11165399B2 (en) * 2013-12-12 2021-11-02 Jawbone Innovations, Llc Compensation for ambient sound signals to facilitate adjustment of an audio volume



Similar Documents

Publication Publication Date Title
CN110288997B (en) Device wake-up method and system for acoustic networking
CN111161752B (en) Echo cancellation method and device
CN111930336A (en) Volume adjusting method and device of audio device and storage medium
CN108346433A (en) A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing
US20160371051A1 (en) Audio mixer system
US11869493B2 (en) Method and apparatus for audio data processing
CN103366756A (en) Sound signal reception method and device
CN116830605A (en) Apparatus, method and computer program for implementing audio rendering
CN115396784B (en) Remote tuning method and system
Küçük et al. Real-time convolutional neural network-based speech source localization on smartphone
Gao et al. Sonicverse: A multisensory simulation platform for embodied household agents that see and hear
CN118841022A (en) Audio processing method, processing system, medium and program product
CN107484069A (en) The determination method and device of loudspeaker present position, loudspeaker
CN109670623A (en) Neural net prediction method and device
CN109800724A (en) A kind of loudspeaker position determines method, apparatus, terminal and storage medium
KR102065030B1 (en) Control method, apparatus and program of audio tuning system using artificial intelligence model
Chavdar et al. Scarrie: A real-time system for sound event detection for assisted living
Falcon Perez Machine-learning-based estimation of room acoustic parameters
US20230116061A1 (en) System and method of active noise cancellation in open field
Liu et al. Robust speech recognition in reverberant environments by using an optimal synthetic room impulse response model
CN113707166A (en) Voice signal processing method, apparatus, computer device and storage medium
CN110930991B (en) Far-field speech recognition model training method and device
CN117857972A (en) Audio equipment setting method and device
EP4430861A1 (en) Distributed audio device ducking
Dziwis et al. Machine learning-based room classification for selecting binaural room impulse responses in augmented reality applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant