WO2019066348A1

WO2019066348A1 - Audio signal processing method and device

Info

Publication number: WO2019066348A1
Application number: PCT/KR2018/010926
Authority: WO
Inventors: 정현주; 전상배; 전세운; 백용현; 문현기
Original assignee: 가우디오디오랩 주식회사
Priority date: 2017-09-28
Filing date: 2018-09-17
Publication date: 2019-04-04

Abstract

An audio signal processing device comprises a processor for outputting an output audio signal generated on the basis of an input audio signal. The processor can acquire information related to an input audio signal and a virtual space in which the input audio signal is simulated, can determine whether a blocking object, which performs blocking between a sound source and a listener, exists among a plurality of objects, on the basis of the position of each of the plurality of objects included in the virtual space and the position of the sound source corresponding to the input audio signal, with respect to the listener in the virtual space, and can binaurally render the input audio signal on the basis of the determination result so as to generate an output audio signal.

Description

Method and apparatus for processing audio signal

The present disclosure relates to an audio signal processing method and apparatus, and more particularly, to an audio signal processing method and apparatus for providing an immersive sound for a portable device including an HMD (Head Mounted Display) device.

3D audio technology refers to signal processing, transmission, encoding, and rendering technologies that provide sound in a three-dimensional space. 3D audio can reproduce a sound scene in which a height direction is added to the horizontal-plane (2D) sound scene reproduced by conventional surround audio. To provide 3D audio, an audio device can use a larger number of speakers than a conventional one. Alternatively, when using the same number of speakers as in the prior art or fewer, the audio device requires a rendering technique that forms a sound image at a virtual position where no speaker is present. 3D audio rendering technology is particularly needed in a virtual reality (VR) or augmented reality (AR) space reproduced using an HMD device or the like, where the sense of presence is more important.

Binaural rendering, the most typical 3D audio rendering technology, models the 3D audio signal as an audio signal delivered to the user's two ears. The user can perceive a stereoscopic effect from the binaurally rendered two-channel output audio signal played through headphones or earphones. Specifically, the user can recognize the position and direction of the sound source from the sound heard through both ears. The audio signal processing apparatus can reproduce the three-dimensional impression of 3D audio by modeling the 3D audio signal as a two-channel audio signal delivered to both ears of the user.

In particular, an acoustic space that realistically simulates an environment in which a plurality of objects and a user interact, such as a virtual reality or game environment, can be an important factor in increasing the user's sense of immersion. In such an environment, the acoustic characteristics of each of the plurality of objects may be reflected in a complex manner. The interaction between a single object and a user can be defined relatively simply, by making the sound heard at a specific position according to the relative position of the sound source. However, when multiple objects are added, the acoustic space may vary depending on the relative position and size of an object between the user and the sound source, because various acoustic phenomena may occur through the interaction between the objects. For example, when an object is positioned between a sound source and a listener, reflections from the object may be added, or the sound may be blocked by the object. Accordingly, an object positioned between the sound source and the listener can attenuate the level of the sound reaching the listener. To reproduce these interactions, both sound objects (also called active objects, audio objects, or audio elements) and non-sound objects (also called passive objects, non-audio objects, scene objects, or acoustic elements) should be considered, because the user frequently interacts with various terrain and objects in a virtual reality space.

The audio signal processing apparatus can simulate the interaction between the sound source and the user by using the position of the sound source relative to the listener in the virtual space. However, when another sound source or a non-sound object is located between the sound source and the listener, the acoustic space may vary depending on the position and size of that object. For example, when an object that blocks sound, such as a wall, lies between a listener and a sound source in a virtual space, the audio signal corresponding to the sound source may reach the listener at a lower level than when the object is absent. The sound corresponding to the sound source can also be reflected by the wall surface. Furthermore, even if the object obstructs the path between the sound source and the user, the sound corresponding to the sound source may be diffracted at a specific point of the object, depending on the size of the object. A technique for simulating these acoustic characteristics is required because a user often interacts with various terrain and objects in a virtual reality space, particularly in fields such as games.

One embodiment of the present disclosure aims to reproduce a more realistic spatial sound to the user. In particular, the present disclosure aims to efficiently simulate a spatial sound including an occlusion effect caused by an obstacle between a sound source and a listener.

In addition, one embodiment of the present disclosure aims to simulate the occlusion effect on an input audio signal in which audio signals of various formats coexist. One embodiment of the present disclosure also aims to simulate the interaction between audio signals in various formats and non-sound objects that do not produce sound.

An apparatus for processing an audio signal according to an embodiment of the present disclosure may include a processor for outputting an output audio signal generated based on an input audio signal. The processor may obtain the input audio signal and information about a virtual space in which the input audio signal is simulated; determine, based on the position of each of at least one object included in the virtual space and the position of the sound source corresponding to the input audio signal, both relative to the listener in the virtual space, whether there is a blocking object that blocks the direct acoustic path between the sound source and the listener; and binaurally render the input audio signal based on the determination result to generate the output audio signal.

The output audio signal may include a transmission audio signal that simulates the sound corresponding to the input audio signal passing through the blocking object to reach the listener. In this case, if the blocking object is present, the processor may generate the transmission audio signal from the input audio signal based on the acoustic transmittance of the blocking object and the length of the section in which the direct acoustic path between the sound source and the listener overlaps the blocking object.

In addition, the acoustic transmittance of the blocking object may have different values depending on the frequency bin.
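The dependence of the transmission audio signal on overlap length and per-bin transmittance can be sketched as follows. This is only an illustration under stated assumptions: the disclosure does not give a formula here, so the exponential model (transmittance treated as the amplitude fraction surviving one `unit_length` of material) and the names `transmission_gains`, `apply_transmission`, and `unit_length` are hypothetical.

```python
def transmission_gains(transmittance_per_bin, overlap_length, unit_length=1.0):
    """Return one linear gain per frequency bin.

    Assumption (not from the source): each transmittance value is the
    amplitude fraction that survives `unit_length` of the blocking object,
    so the gain decays exponentially with the overlapped path length.
    """
    gains = []
    for t in transmittance_per_bin:
        t = min(max(t, 0.0), 1.0)  # keep transmittance in [0, 1]
        gains.append(t ** (overlap_length / unit_length))
    return gains


def apply_transmission(spectrum, transmittance_per_bin, overlap_length):
    """Scale each frequency-bin value of the input by its transmission gain."""
    gains = transmission_gains(transmittance_per_bin, overlap_length)
    return [x * g for x, g in zip(spectrum, gains)]
```

With this model, a longer overlapped section or a lower transmittance in a given bin both reduce that bin's contribution to the transmission audio signal.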

The output audio signal may include a diffracted audio signal that simulates sound diffracted by the blocking object on its way to the listener. In this case, the processor may determine, based on the shape of the blocking object, at least one diffraction point on the surface of the blocking object at which the sound corresponding to the input audio signal is diffracted, and may binaurally render the input audio signal based on the at least one diffraction point to generate the diffracted audio signal.

The processor may obtain a first HRTF corresponding to the at least one diffraction point with respect to the head direction of the listener, and may binaurally render the input audio signal using the first HRTF to generate the diffracted audio signal.

The processor may determine, as the at least one diffraction point, a point on the surface of the blocking object at which the sum of the distance of a first path from the point to the listener and the distance of a second path from the point to the sound source is smallest. In this case, the first path and the second path may each be a shortest path that does not cross the object.
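A minimal sketch of selecting a diffraction point by path-length sum is given below. It assumes the candidate surface points are supplied by the caller and that the straight segments from each candidate to the sound source and to the listener do not cross the object; the function name `diffraction_point` and the tuple-based coordinates are hypothetical, not part of the disclosure.

```python
import math


def diffraction_point(source, listener, candidates):
    """Pick the candidate with the smallest source-point-listener path length.

    Assumption: every candidate lies on the object's surface and has
    unobstructed straight paths to both the source and the listener.
    Returns (point, diffraction_distance).
    """
    def path_length(point):
        # second path (point to source) plus first path (point to listener)
        return math.dist(point, source) + math.dist(point, listener)

    best = min(candidates, key=path_length)
    return best, path_length(best)
```

The returned distance corresponds to the sum of the first-path and second-path lengths for the selected point.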

The processor may binaurally render the input audio signal based on the first HRTF and a diffraction distance, which is the sum of the distance of the first path and the distance of the second path through the at least one diffraction point, to generate the diffracted audio signal.

The processor may determine, based on the diffraction distance, an attenuation gain that adjusts the magnitude of the diffracted audio signal, and may binaurally render the input audio signal based on the first HRTF and the attenuation gain to generate the diffracted audio signal. The attenuation gain may take different values for different frequency bins of the audio signal.

The processor may mix the diffracted audio signal and the transmission audio signal to generate the output audio signal.

The output audio signal may include a two-channel output audio signal corresponding to each of the two ears of the listener. In this case, the processor may determine, based on the position of each of the two ears of the listener, whether the blocking object exists for each of the listener's left and right ears, and may generate the output audio signal for each channel based on the determination result.

The blocking object may include a first blocking object that blocks only one of the listener's left and right ears. The two-channel output audio signal may also include a reflected audio signal that simulates the sound corresponding to the input audio signal being reflected by the blocking object toward the listener. In this case, the processor may determine, based on the position of the listener's other ear and the shape of the first blocking object, a reflection point on the surface of the first blocking object at which the sound corresponding to the input audio signal is reflected, and may binaurally render the input audio signal based on the position of the reflection point to generate a first reflected audio signal corresponding to the first blocking object.

The processor may obtain a second HRTF corresponding to the reflection point with respect to the head direction of the listener, and may binaurally render the input audio signal using the second HRTF to generate the first reflected audio signal.

The processor may determine, based on the position of the first blocking object, which channel of the two-channel output audio signal includes the first reflected audio signal, and may generate the two-channel output audio signal accordingly. Here, of the two channels, the channel audio signal corresponding to the other ear includes the first reflected audio signal, and the channel audio signal corresponding to the blocked ear may not include the first reflected audio signal.

The processor may determine a position of each of the ears of the listener based on the head size of the listener.
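One way to estimate the ear positions from the head size can be sketched as follows. The sketch assumes a horizontal head rotation only (yaw), a coordinate convention in which z points up and yaw 0 faces +x with positive yaw turning toward +y, and ears placed half a head width to either side of the head center; `ear_positions` and its parameters are hypothetical names, and the disclosure does not prescribe this geometry.

```python
import math


def ear_positions(head_center, yaw_rad, head_width):
    """Return (left_ear, right_ear) positions in world coordinates.

    Assumptions: x/y span the horizontal plane, z is up, the listener at
    yaw 0 faces +x, and positive yaw rotates the head toward +y.
    """
    cx, cy, cz = head_center
    half = head_width / 2.0
    # Unit vector pointing to the listener's left: forward rotated by +90 deg.
    left_x, left_y = -math.sin(yaw_rad), math.cos(yaw_rad)
    left_ear = (cx + half * left_x, cy + half * left_y, cz)
    right_ear = (cx - half * left_x, cy - half * left_y, cz)
    return left_ear, right_ear
```

Each ear position can then be used as the endpoint of its own acoustic path when checking for a blocking object per ear.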

The processor may obtain, based on the position of each ear of the listener, an ipsilateral HRTF and a contralateral HRTF corresponding to the ipsilateral ear and the contralateral ear, respectively, from an HRTF set comprising a plurality of HRTFs for azimuth and elevation angles measured at a reference distance from the listener, and may binaurally render the input audio signal based on the ipsilateral HRTF and the contralateral HRTF. In this case, the ipsilateral HRTF and the contralateral HRTF may be HRTFs corresponding to different positions among the plurality of HRTFs.

The virtual space may include a plurality of subspaces that have different reverberation filters. When the listener's two ears are located in different subspaces, the processor may filter the input audio signal with a different reverberation filter for each of the left and right ears to generate reverberation audio signals corresponding to the left and right ears, respectively.

The blocking object may be a non-sound object having no sound output from the blocking object in the virtual space.

In addition, the processor may receive metadata indicating information about a non-sound object included in the virtual space together with the input audio signal.

An operation method of an audio signal processing apparatus for rendering an input audio signal according to another aspect of the present disclosure includes: obtaining an input audio signal and information about a virtual space in which the input audio signal is simulated; determining, based on the position of each of at least one object included in the virtual space and the position of the sound source corresponding to the input audio signal, both relative to the listener in the virtual space, whether there is a blocking object among the at least one object that blocks between the sound source and the listener; binaurally rendering the input audio signal based on the determination result to generate an output audio signal; and outputting the output audio signal.

The audio signal processing apparatus according to the embodiment of the present disclosure can provide an immersive three-dimensional audio signal. In addition, the audio signal processing apparatus according to the embodiment of the present disclosure can efficiently simulate a spatial sound including an occlusion effect caused by an obstacle between a sound source and a listener.

In addition, the audio signal processing apparatus according to the embodiment of the present disclosure can simulate the occlusion effect on an input audio signal in which audio signals of various formats coexist. Further, the audio signal processing apparatus according to the embodiment of the present disclosure can simulate an interaction between audio signals in various formats and a non-sound object that does not produce sound.

FIG. 1 is a diagram showing that characteristics of an audio signal are changed by an acoustic occlusion effect according to an embodiment of the present disclosure.

FIG. 2 is a block diagram showing a configuration of an audio signal processing apparatus according to an embodiment of the present disclosure.

FIG. 3 is a diagram illustrating a method by which an audio signal processing apparatus according to an embodiment of the present disclosure generates a transmission audio signal based on an input audio signal.

FIGS. 4 and 5 are diagrams illustrating a method by which an audio signal processing apparatus according to an embodiment of the present disclosure generates a diffracted audio signal based on an input audio signal.

FIG. 6 is a diagram showing HRTFs determined based on the listener's head direction and sound source position with respect to the head center of the listener.

FIGS. 7 and 8 are diagrams showing HRTF pairs obtained when the sound source is located closer to or farther from the listener than the reference distance at which the HRTF set was generated.

FIG. 9 is a diagram showing the operation of the audio signal processing apparatus when the presence or absence of an object differs between the acoustic paths from the sound source to each of the listener's ears.

FIG. 10 is a diagram illustrating an example in which an output audio signal according to an embodiment of the present disclosure is configured differently for each ear of the listener.

FIG. 11 is a diagram illustrating a method by which an audio signal processing apparatus according to an embodiment of the present disclosure generates a reflected audio signal.

FIG. 12 is a diagram showing a method of generating a reverberation audio signal corresponding to each of the two ears of the listener.

FIG. 13 is a block diagram illustrating a process of processing an input audio signal by an audio signal processing apparatus according to an embodiment of the present disclosure.

FIG. 14 is a block diagram showing the preprocessing operation of the audio signal processing apparatus in more detail.

FIG. 15 is a block diagram showing the audio signal preprocessing operation of the audio signal processing apparatus in more detail.

FIG. 16 is a view showing the binaural rendering process described in FIG. 13 in more detail.

FIG. 17 is a block diagram showing in detail the configuration of an audio signal processing apparatus according to an embodiment of the present disclosure.

FIG. 18 is a block diagram showing in detail the configuration of an audio signal processing apparatus according to an embodiment of the present disclosure.

FIG. 19 is a block diagram showing in detail the configuration of an audio signal processing apparatus according to an embodiment of the present disclosure.

FIG. 20 is a block diagram specifically illustrating an object renderer according to an embodiment of the present disclosure.

FIG. 21 is a diagram showing an object renderer further including a coordinate transformation processing unit according to an embodiment of the present disclosure.

FIG. 22 is a block diagram specifically illustrating an ambisonic renderer according to an embodiment of the present disclosure.

FIG. 23 is a block diagram specifically illustrating a channel renderer according to an embodiment of the present disclosure.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily carry out the present invention. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. In order to clearly illustrate the present invention, parts not related to the description are omitted, and similar parts are denoted by like reference characters throughout the specification.

Also, when a part is said to "comprise" an element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise.

In a virtual space in which an audio signal is simulated, there may be an object that blocks between the sound source and the listener. In this case, the audio signal processing device can simulate the acoustic occlusion effect caused by the object or objects blocking the direct acoustic path between the sound source and the listener in the virtual space. In this way, the audio signal processing device can provide the user with a realistic output audio signal. In the present disclosure, the term direct acoustic path, or simply acoustic path, denotes the path of the direct sound between a sound source and a listener. The present disclosure relates to an audio signal processing apparatus that binaurally renders an input audio signal based on information about objects included in a virtual space and thereby simulates the acoustic occlusion effect. In the present disclosure, an object that blocks between a sound source and a listener may be referred to as a blocking object. Also, in this disclosure, a listener refers to a listener in a virtual space unless otherwise noted.

FIG. 1 is a diagram showing that characteristics of an audio signal are changed by an acoustic occlusion effect according to an embodiment of the present disclosure. In FIG. 1, the acoustic path of the direct sound, along which the sound output from the sound source O is transmitted directly to the listener, can be modeled as the shortest path connecting the sound source O to the head center of the listener A. When the object W is positioned on the direct acoustic path between the sound source O and the listener A, the characteristics of the audio signal corresponding to the sound source O may change. For example, the direct sound output from the sound source O may be attenuated according to the acoustic transmittance, which indicates the degree to which sound passes through the object W. The audio signal processing apparatus can simulate the direct sound attenuated by the object by attenuating the audio signal corresponding to the sound source O. The audio signal processing apparatus can set the degree of attenuation differently for each frequency component. A method by which the audio signal processing apparatus simulates the direct sound attenuated by an object is described in detail later. Further, the sound output from the sound source O may be diffracted at a specific point (for example, 'a' in FIG. 1) on the surface of the object W and transmitted to the listener A. A method by which the audio signal processing apparatus simulates the sound that is diffracted on the surface of the object W and transmitted to the listener A is also described in detail later.

On the other hand, in the case of a binaural audio signal, the acoustic path may include a first acoustic path and a second acoustic path, one for each of the ears of the listener A. The first acoustic path and the second acoustic path may be different from each other. Each may be modeled as the shortest path connecting the sound source O to the corresponding ear of the listener A. Accordingly, the audio signal processing apparatus can simulate the acoustic occlusion effect for each of the first acoustic path and the second acoustic path, rather than for a single acoustic path based on the head center of the listener A. This is because the occlusion effect of the blocking object may differ between the first acoustic path and the second acoustic path. Specifically, a blocking object may exist on only one of the first acoustic path and the second acoustic path. Alternatively, the object on the first acoustic path and the object on the second acoustic path may be different. A method by which the audio signal processing apparatus distinguishes the first acoustic path from the second acoustic path to simulate the acoustic occlusion effect will be described in detail with reference to FIG. 6 and subsequent figures.

Hereinafter, a configuration of an audio signal processing apparatus according to an embodiment of the present disclosure will be described with reference to FIG. 2. FIG. 2 is a block diagram showing a configuration of an audio signal processing apparatus 10 according to an embodiment of the present disclosure. Some of the components shown in FIG. 2 may be omitted, and the audio signal processing apparatus 10 may further include components not shown therein. In addition, the audio signal processing apparatus 10 may integrate two or more different components into a single unit. The audio signal processing apparatus 10 may be implemented as one semiconductor chip. For example, each component may be implemented through a hardware component, such as a separate circuit.

Referring to FIG. 2, the audio signal processing apparatus 10 may include a receiving unit 11, a processor 12, and an output unit 13. The receiving unit 11 may receive an input audio signal to be processed by the processor 12. The output unit 13 may transmit the output audio signal generated by the processor 12. Here, the input audio signal may include at least one of an object signal, an ambisonic signal, and a channel signal. Also, the output audio signal may be an audio signal rendered from the input audio signal.

The receiving unit 11 may receive an input audio signal input to the audio signal processing apparatus 10, that is, an input audio signal to be processed by the processor 12. According to one embodiment, the receiving unit 11 may include receiving means for receiving an audio signal. For example, the receiving unit 11 may include an audio signal input/output terminal for receiving an audio signal transmitted through a wire. The receiving unit 11 may include a wireless audio receiving module for receiving an audio signal transmitted wirelessly. In this case, the receiving unit 11 can receive an audio signal transmitted wirelessly using a Bluetooth or Wi-Fi communication method. According to an embodiment, when the audio signal processing apparatus 10 includes a separate decoder, the receiving unit 11 may receive a bitstream in which the input audio signal is encoded. The decoder may be implemented through the processor 12, which will be described later. The receiving unit 11 may receive information related to the input audio signal together with the input audio signal. In this case, the bitstream may include information related to the input audio signal in addition to the input audio signal itself. This will be described in detail with reference to FIG. 17 and subsequent figures. The receiving unit 11 may include one or more components that communicate with devices outside the audio signal processing apparatus 10. Also, the receiving unit 11 may include at least one antenna for receiving the bitstream, or hardware for wired communication for receiving the bitstream.

The processor 12 can control the overall operation of the audio signal processing apparatus 10. The processor 12 can control each component of the audio signal processing apparatus 10. The processor 12 may perform computation and processing of various data and signals. The processor 12 may be implemented in hardware in the form of a semiconductor chip or an electronic circuit, or may be implemented in software that controls hardware. The processor 12 may be implemented as a combination of hardware and software. For example, the processor 12 can control the operations of the receiving unit 11 and the output unit 13 by executing at least one program included in the software. In addition, the processor 12 may execute at least one program to perform the operations of the audio signal processing apparatus 10 described with reference to FIGS. 3 to 23 below.

The processor 12 may render the input audio signal based on the spatial information and the listener information to generate an output audio signal. A method by which the processor 12 generates the output audio signal will be described later with reference to FIG. 3 and subsequent figures. The spatial information may include information about a plurality of objects included in a virtual space in which the input audio signal is simulated. Further, the information on the plurality of objects may include at least one of the position, the structural characteristics, or the physical characteristics of each of the plurality of objects. The structural characteristics of an object may include at least one of the size or the shape of the object. The physical characteristics of an object may include at least one of information indicating the material of the object or the transmittance of the object.

The listener information may include information associated with the listener in the virtual space. Specifically, the listener information may include listener position information indicating the position of the listener in the virtual space. In addition, the listener information may include head direction information indicating the head direction of the listener according to the head movement of the listener. Head direction information can be acquired in real time via a head-mounted display and sensors attached to the hardware. The listener's position and head direction information may also be obtained based on a user's input. In this case, the user may be a user who controls the movement of the listener in a game environment provided by a device such as a PC or a mobile device. The listener information may include head size information indicating the head size of the listener. The processor 12 may estimate the position of both ears of the listener based on the listener's position information and head size information. Alternatively, the processor 12 may obtain the position of both ears from listener information that includes the ear positions directly. For example, the processor 12 may receive at least one of spatial information or listener information through the receiving unit 11 described above. The processor 12 may receive the spatial information corresponding to the input audio signal together with the input audio signal through the receiving unit 11. The way in which the processor 12 receives the spatial information will be described later with reference to FIGS. 17 to 19. In addition, the processor 12 may further perform post-processing on the output audio signal. Post-processing may include at least one of crosstalk removal, dynamic range control (DRC), volume normalization, and peak limiting. The audio signal processing apparatus 10 may include a separate post-processing unit for performing post-processing, or, according to another embodiment, the post-processing unit may be included in the processor 12.

The output unit 13 can output the output audio signal. For example, the output unit 13 may output the output audio signal generated by the processor 12. The output unit 13 may include at least one output channel. Here, the output audio signal may be a two-channel output audio signal in which each channel corresponds to one of the listener's two ears. Also, the output audio signal may be a binaural two-channel output audio signal. The output unit 13 can output the 3D audio headphone signal generated by the processor 12.

According to one embodiment, the output unit 13 may comprise output means for outputting an output audio signal. For example, the output unit 13 may include an output terminal for outputting the output audio signal to the outside. At this time, the audio signal processing apparatus 10 can output an output audio signal to an external device connected to the output terminal. Or the output unit 13 may include a wireless audio transmission module for outputting an output audio signal to the outside. In this case, the output unit 13 can output an output audio signal to an external device using a wireless communication method such as Bluetooth or Wi-Fi. Or the output unit 13 may include a speaker. At this time, the audio signal processing apparatus 10 can output the output audio signal through the speaker. Specifically, the output unit 13 may include a plurality of speakers arranged according to a predetermined channel layout. The output unit 13 may further include a converter (e.g., a digital-to-analog converter (DAC)) for converting the digital audio signal into an analog audio signal.

The apparatus for processing an audio signal according to an embodiment of the present disclosure can determine whether there is an object blocking between a sound source and a listener based on information about a virtual space. At this time, the information about the virtual space may include position information indicating the position of the sound source based on the listener and the position of each of the plurality of objects included in the virtual space. Next, the audio signal processing apparatus can binaurally render the input audio signal based on the determination result to generate an output audio signal. For example, if there is no blocking object, then the audio signal processing device may not use the information associated with the blocking object in filtering the input audio signal. On the other hand, if there is a blocking object, the audio signal processing device can filter the input audio signal based on the information associated with the blocking object. In this case, the audio signal processing apparatus can binaurally render the input audio signal using the HRTF corresponding to the additional position in addition to the head related transfer function (HRTF) corresponding to the sound source.
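The determination of whether an object blocks the direct acoustic path can be sketched as a segment-intersection test. The sketch below assumes, purely for illustration, that each object is approximated by a bounding sphere given as `(name, center, radius)`; the disclosure does not specify this representation, and `segment_hits_sphere` and `find_blocking_objects` are hypothetical names.

```python
import math


def segment_hits_sphere(p0, p1, center, radius):
    """Return True if the segment p0-p1 passes within `radius` of `center`."""
    direction = [b - a for a, b in zip(p0, p1)]
    to_start = [a - c for a, c in zip(p0, center)]
    length_sq = sum(d * d for d in direction)
    if length_sq == 0.0:
        # Source and listener coincide: test the single point.
        return sum(f * f for f in to_start) <= radius * radius
    # Parameter of the closest point on the infinite line, clamped to the segment.
    t = -sum(f * d for f, d in zip(to_start, direction)) / length_sq
    t = min(max(t, 0.0), 1.0)
    closest = [a + t * d for a, d in zip(p0, direction)]
    return math.dist(closest, center) <= radius


def find_blocking_objects(source, listener, objects):
    """List the names of objects whose bounding sphere crosses the direct path.

    Assumption: `objects` is an iterable of (name, center, radius) tuples.
    """
    return [name for name, center, radius in objects
            if segment_hits_sphere(source, listener, center, radius)]
```

An empty result corresponds to the case with no blocking object, in which the input audio signal is rendered without object-related filtering.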

According to one embodiment of the present disclosure, in a virtual space in which an input audio signal is simulated, there may be an object W in the acoustic path between the sound source and the listener. The object W may be an object other than the listener and the sound source. For example, the sound source of the input audio signal to be processed by the audio signal processing apparatus may be a sound source O that is occluded by the object W as seen from the position of the listener A. In this case, the audio signal processing apparatus can simulate the occlusion effect caused by the object W. The occlusion effect caused by the object W can be modeled as a transmitted sound, which represents the direct sound attenuated while passing through the object W, a diffracted sound, and a reflected sound. The audio signal processing apparatus can generate a transmitted audio signal, a diffracted audio signal, and a reflected audio signal corresponding to the transmitted sound, the diffracted sound, and the reflected sound, respectively, based on the input audio signal. In addition, the output audio signal described in this disclosure may include at least one of a transmitted audio signal, a diffracted audio signal, or a reflected audio signal. Hereinafter, a method by which an audio signal processing apparatus according to an embodiment of the present disclosure generates a transmitted audio signal based on an input audio signal will be described with reference to FIG. 3.

FIG. 3 is a diagram illustrating a method by which an audio signal processing apparatus according to an embodiment of the present disclosure generates a transmitted audio signal based on an input audio signal. According to one embodiment, when the transmittance of the object W is equal to or greater than a reference transmittance, the audio signal processing apparatus can generate a transmitted audio signal based on the transmission attenuation gain. If the transmittance of the object is less than the reference transmittance, the audio signal processing apparatus may not generate the transmitted audio signal, because this case is similar to the case in which no transmitted sound passes through the object to reach the listener.

According to one embodiment, an audio signal processing apparatus may binaurally render an input audio signal based on a transmission attenuation gain to produce a transmitted audio signal. The audio signal processing apparatus can generate the transmitted audio signal by scaling the magnitude of the input audio signal with the transmission attenuation gain. Here, the transmission attenuation gain may indicate the ratio of the magnitude of the transmitted audio signal to the magnitude of the input audio signal. The transmission attenuation gain may be a filter coefficient that models the proportion of sound lost as it passes through the object W. For example, the audio signal processing apparatus may multiply an input audio signal by a transmission attenuation gain to generate a transmitted audio signal.

According to one embodiment, the audio signal processing apparatus may filter the input audio signal corresponding to the sound source O based on the length x of the section in which the direct acoustic path overlaps with the object W. Specifically, the audio signal processing apparatus can determine the attenuation gain based on the length x. The attenuation gain may become smaller as the length x of the section in which the acoustic path overlaps with the object W becomes longer. This is because the longer the section in which the original sound output from the sound source passes through the object, the more the transmitted sound delivered to the listener is attenuated. Specifically, the attenuation gain may be inversely proportional to the length x. The audio signal processing apparatus can calculate the length x based on the position of the sound source and the position of the object W with respect to the listener. Further, the audio signal processing apparatus can calculate the length x based on the shape of the object W.
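The relationship between the overlap length x and the attenuation gain can be illustrated with a minimal sketch. It assumes, purely for illustration, that the object W is a sphere, that positions are given as Cartesian coordinates, and that the gain is `ref_length / x` clamped to 1. The function names, the spherical shape, and the `ref_length` parameter are hypothetical and are not taken from the disclosure.

```python
import math

def overlap_length_sphere(source, listener, center, radius):
    """Length x of the part of the segment source->listener inside a sphere."""
    d = [l - s for s, l in zip(source, listener)]
    seg_len = math.sqrt(sum(c * c for c in d))
    if seg_len == 0.0:
        return 0.0
    f = [s - c for s, c in zip(source, center)]
    # Solve |source + t*d - center|^2 = radius^2 for t (quadratic in t).
    a = seg_len ** 2
    b = 2.0 * sum(fi * di for fi, di in zip(f, d))
    c = sum(fi * fi for fi in f) - radius ** 2
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:
        return 0.0  # the line misses (or only touches) the sphere
    root = math.sqrt(disc)
    t0 = max(0.0, (-b - root) / (2.0 * a))  # clamp entry to the segment
    t1 = min(1.0, (-b + root) / (2.0 * a))  # clamp exit to the segment
    return max(0.0, t1 - t0) * seg_len

def transmission_gain(x, ref_length=1.0):
    """Assumed model: gain inversely proportional to x, never above 1."""
    if x <= 0.0:
        return 1.0
    return min(1.0, ref_length / x)

def apply_gain(samples, gain):
    """Multiply the input audio signal by the attenuation gain."""
    return [s * gain for s in samples]
```

For a source at (-2, 0, 0), a listener at (2, 0, 0), and a unit sphere at the origin, the overlap length is 2 and the assumed gain is 0.5.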

Further, the audio signal processing apparatus can filter the input audio signal based on the acoustic transmittance of the object W. Specifically, the audio signal processing apparatus can determine the transmission attenuation gain based on the acoustic transmittance of the object W. Here, the acoustic transmittance may indicate the degree to which the object W passes sound. Specifically, the acoustic transmittance of the object W may vary depending on the material constituting the object W. In addition, the acoustic transmittance may vary according to the frequency component of the audio signal. In the present disclosure, a frequency component may represent a frequency bin of a predetermined size. The audio signal processing apparatus can determine the acoustic transmittance based on information about the material constituting the object W.

Specifically, the first material may transmit an audio signal relatively more than the second material. When the material constituting the object W is the first material, the acoustic transmittance of the object W may be higher than other objects constituted of the second material. In addition, the transmittance of the third material may be different from that of the fourth material. Specifically, the third material may transmit the first frequency component relatively more than the second frequency component. When the material constituting the object W is a third material, the acoustic transmittance of the object W may be relatively high in the first frequency component as compared to the second frequency component. Also, the first frequency component and the second frequency component may be frequency bands differentiated based on a predetermined frequency in the entire frequency domain. The first frequency component may be a frequency band lower than a predetermined frequency. The second frequency component may be a frequency band higher than a predetermined frequency.
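A frequency-dependent transmittance of this kind can be sketched as one gain per frequency bin, with one value below the predetermined frequency and another above it. The function names, the example cutoff, and the gain values below are assumptions for illustration only; the disclosure does not specify numerical transmittances.

```python
def band_transmission_gains(bin_freqs_hz, cutoff_hz, low_gain, high_gain):
    """Return one transmission gain per frequency bin.

    Bins below cutoff_hz form the first frequency component and use
    low_gain; the remaining bins form the second and use high_gain.
    """
    return [low_gain if f < cutoff_hz else high_gain for f in bin_freqs_hz]

def apply_gains(bin_magnitudes, gains):
    """Scale each frequency-bin magnitude by its transmission gain."""
    return [m * g for m, g in zip(bin_magnitudes, gains)]
```

For a material like the third material described above, `low_gain` would be chosen larger than `high_gain`.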

Next, the audio signal processing apparatus can binaurally render the input audio signal based on the HRTF corresponding to the sound source and the attenuation gain to generate a transmitted audio signal. The audio signal processing apparatus can obtain the HRTF corresponding to the sound source based on the head direction of the listener and the position of the sound source. Here, the HRTF may include a pair of an ipsilateral HRTF and a contralateral HRTF. In the present disclosure, the transfer function may include a Head Related Transfer Function (HRTF), an Interaural Transfer Function (ITF), a Modified ITF (MITF), a Binaural Room Transfer Function (BRTF), a Room Impulse Response (RIR), a Binaural Room Impulse Response (BRIR), a Head Related Impulse Response (HRIR), and modified and edited data thereof, and the present disclosure is not limited thereto. The audio signal processing apparatus can acquire the transfer function from a separate database. In the present disclosure, it is assumed that the transfer function is the Fast Fourier Transform (FFT) of an impulse response (IR), but the method of conversion is not limited thereto. For example, the transform method may include at least one of a Quadrature Mirror Filterbank (QMF), a Discrete Cosine Transform (DCT), a Discrete Sine Transform (DST), or a wavelet transform. In FIG. 3, the audio signal processing apparatus can obtain H0, which is the HRTF corresponding to the sound source. Further, the audio signal processing apparatus can generate a transmitted audio signal based on H0 and the above-described attenuation gain.

Hereinafter, a method by which an audio signal processing apparatus according to an embodiment of the present disclosure generates a diffracted audio signal based on an input audio signal will be described with reference to FIGS. 4 and 5. If there is a blocking object between the sound source and the listener, the sound output from the sound source can be diffracted at the surface of the blocking object and delivered to the listener. When the sound is diffracted, the magnitude of the sound reaching the listener may be attenuated relative to the magnitude of the original sound. The extent of the attenuation may vary depending on the frequency component of the sound. FIGS. 4 and 5 are diagrams illustrating a method by which an audio signal processing apparatus according to an embodiment of the present disclosure generates a diffracted audio signal based on an input audio signal.

According to one embodiment, the audio signal processing apparatus may determine a diffraction point at which the sound corresponding to the input audio signal is diffracted at the surface of the blocking object. The sound output from the sound source can be diffracted at the diffraction point on the surface of the blocking object to reach the listener. For example, the audio signal processing apparatus can determine at least one diffraction point based on the shape of the blocking object. The audio signal processing apparatus can also determine at least one diffraction point based on the diffraction distance at the surface of the blocking object. For example, the audio signal processing apparatus can determine, as the diffraction point, the point at which the diffraction distance is smallest among the points on the surface of the blocking object. Here, the diffraction distance may represent the sum of the length of a first path from the sound source to a first point on the surface of the blocking object and the length of a second path from the first point to the listener. The first path and the second path may each be the shortest path that does not cross the blocking object. The longer the distance over which the sound is diffracted, the smaller the magnitude of the sound reaching the listener, and the smaller its influence on the characteristics of the audio signal. Because attenuation increases with the diffraction distance, reproducing the occlusion effect for long diffraction distances may be inefficient relative to the required computation. The audio signal processing apparatus can therefore model the diffracted sound efficiently based on the diffraction distance.

On the other hand, the shortest first path or second path that does not intersect the blocking object may pass through a plurality of points on the surface of the blocking object. In this case, the audio signal processing apparatus can determine, as the diffraction point, the last point at which the diffraction path of the sound output from the sound source meets the blocking object. Here, the diffraction path represents the first path and the second path taken together. The audio signal processing apparatus can binaurally render the input audio signal based on the last point at which the path of the diffracted sound touches the blocking object to produce a diffracted audio signal.

Referring to FIG. 4, the diffraction distance with respect to the point a on the surface of the blocking object W can be the sum of a first distance from the position O of the sound source to the point a and a second distance from the point a to the listener A. The audio signal processing apparatus can determine the point a, which has the smallest diffraction distance, as the diffraction point. The point a may be the one point with the shortest diffraction distance among the plurality of points on the surface of the blocking object. Although FIG. 4 takes the case of one diffraction point as an example, the audio signal processing apparatus may generate a diffracted audio signal based on a plurality of diffraction points.
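The selection of the point with the smallest diffraction distance can be sketched as follows. The sketch assumes that a finite list of candidate surface points is already available and that the straight segments from the sound source to each candidate and from each candidate to the listener do not cross the blocking object, so that the path lengths reduce to Euclidean distances. The function names are hypothetical.

```python
import math

def diffraction_distance(source, point, listener):
    """Sum of the first distance (source to point) and the second
    distance (point to listener), assuming unobstructed straight paths."""
    return math.dist(source, point) + math.dist(point, listener)

def pick_diffraction_point(source, listener, candidates):
    """Return the candidate surface point with the smallest diffraction distance."""
    return min(candidates,
               key=lambda p: diffraction_distance(source, p, listener))
```

With a source at (0, 0), a listener at (4, 0), and candidates (2, 1) and (2, 3), the point (2, 1) is selected.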

Further, according to one embodiment, the audio signal processing apparatus can divide the blocking object into a plurality of regions to determine a diffraction point for each region. The audio signal processing apparatus can determine a point corresponding to the shortest diffraction distance for each divided region as a diffraction point for each region. For example, the audio signal processing apparatus can divide the blocking object based on at least one of the size and the shape of the object. Specifically, the audio signal processing apparatus can divide the blocking object into a plurality of regions by referring to a coordinate axis representing a blocking object in a virtual space. At this time, the blocking object may be a two-dimensional or a three-dimensional object.

Referring to FIG. 5, the audio signal processing apparatus can divide a blocking object into a first region including the point a and a second region including the points b and c, with reference to the side including the point a and the point c. Next, the audio signal processing apparatus can determine the point a, which has the shortest diffraction distance in the first region, as the diffraction point of the first region. Further, the audio signal processing apparatus can determine the point c, which has the shortest diffraction distance in the second region, as the diffraction point of the second region. Here, the diffraction distance corresponding to the point c may be the sum of the distance from the sound source O to the point b, the distance from the point b to the point c, and the distance from the point c to the listener. As shown in FIG. 5, the diffraction path of the diffracted sound in the second region can pass through a plurality of points on the surface of the blocking object. In this case, as described above, the audio signal processing apparatus can determine the point c, which is the last point at which the diffraction path of the sound output from the sound source meets the blocking object, as the diffraction point. Further, the audio signal processing apparatus can binaurally render the input audio signal based on the point c. This will be described later.

Further, according to one embodiment, the audio signal processing apparatus can limit the number of diffraction points. For example, the audio signal processing apparatus can determine a maximum number of diffraction points. In addition, the audio signal processing apparatus can generate a diffracted audio signal based on a number of diffraction points that is equal to or less than the maximum number. For example, among the diffraction points determined for each region, the audio signal processing apparatus can generate a diffracted audio signal for each of at most the maximum number of diffraction points. Here, the audio signal processing apparatus can select diffraction points, up to the maximum number, in ascending order of diffraction distance, starting from the diffraction point with the shortest diffraction distance. For example, when the maximum number of diffraction points is two and the blocking object is divided into three regions, the audio signal processing apparatus generates a first diffracted audio signal based on the first diffraction point, which corresponds to the shortest diffraction distance. If only one point corresponds to the shortest diffraction distance, the audio signal processing apparatus can generate a second diffracted audio signal based on the second diffraction point, which corresponds to the second shortest diffraction distance.

On the other hand, among the diffraction points that have not yet been selected as the basis for generating a diffracted audio signal, the number of distinct diffraction points having the same diffraction distance may be larger than the number of diffraction points that remain to be selected. In this case, the audio signal processing apparatus can arbitrarily select as many points as remain to be selected from among the points having the same diffraction distance. Here, the audio signal processing apparatus can choose the diffraction points so that the distance between the selected diffraction points is maximized. Further, according to one embodiment, the audio signal processing apparatus can determine the maximum number of diffraction points based on the processing performance of the audio signal processing apparatus. The processing performance may include the processing speed of the processor included in the audio signal processing apparatus, because the resources that can be allocated to the operation for generating the diffracted audio signal may be limited by the processing speed of the processor. In addition, the processing performance of the audio signal processing apparatus may include the computing capability of the memory or GPU included in the audio signal processing apparatus.
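Limiting the number of diffraction points can be sketched as a sort followed by truncation. The sketch below assumes each region has already produced one (point, diffraction distance) pair; the function name and data layout are hypothetical. It does not implement the tie-breaking rule that maximizes the distance between selected points; ties are simply kept in input order.

```python
def select_diffraction_points(region_points, max_points):
    """Keep at most max_points diffraction points, shortest distance first.

    region_points: list of (point, diffraction_distance) pairs, one per region.
    Ties keep their input order because Python's sort is stable.
    """
    ranked = sorted(region_points, key=lambda item: item[1])
    return ranked[:max_points]
```

With three regions and a maximum of two diffraction points, only the two points with the shortest diffraction distances are used to generate diffracted audio signals.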

Further, according to one embodiment, the audio signal processing apparatus may determine, as a diffraction point, a point on the surface of the blocking object whose diffraction distance is shorter than a predetermined distance. The longer the diffraction distance becomes, the greater the attenuation becomes, and reproducing the occlusion effect may be inefficient relative to the required amount of computation. In the embodiment of FIG. 4, when the diffraction distance based on the point a is longer than the predetermined distance, the audio signal processing apparatus may not generate the diffracted audio signal. On the other hand, in the embodiment of FIG. 5, the diffraction distance based on the point c may be longer than the predetermined distance while the diffraction distance based on the point a is shorter than the predetermined distance. In this case, the audio signal processing apparatus may determine only the point a as the diffraction point. If there is no point having a diffraction distance shorter than the predetermined distance, the audio signal processing apparatus may not determine any diffraction point. On the other hand, when there are a plurality of points having a diffraction distance shorter than the predetermined distance, the audio signal processing apparatus can select some of the plurality of points. According to one embodiment, the predetermined distance may be a value set based on the distance from the sound source to the listener. For example, the predetermined distance may be set to a larger value as the distance from the sound source to the listener becomes longer, because the diffraction distance grows with the distance from the sound source to the listener. That is, the audio signal processing apparatus may determine, as at least one diffraction point, a point among the plurality of points on the surface of the blocking object through which the sound source and the listener are connected over a distance shorter than the predetermined distance.
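The distance threshold can be sketched as a filter over candidate points. The sketch assumes that the predetermined distance is a fixed multiple of the direct distance from the sound source to the listener; the `factor` parameter and its default value are hypothetical, and straight, unobstructed path segments are assumed as a simplification.

```python
import math

def points_within_threshold(source, listener, candidates, factor=1.5):
    """Keep candidates whose diffraction distance is below the threshold.

    The threshold grows with the source-listener distance (assumed here to
    be factor * direct distance). An empty result means that no diffracted
    audio signal would be generated.
    """
    limit = factor * math.dist(source, listener)
    return [p for p in candidates
            if math.dist(source, p) + math.dist(p, listener) < limit]
```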

Next, the audio signal processing apparatus can obtain the HRTF corresponding to the diffraction point based on the head direction of the listener and the position of the diffraction point. This HRTF may correspond to a different position from the HRTF used to generate the transmitted audio signal. Specifically, the audio signal processing apparatus can obtain the HRTF corresponding to the diffraction point with respect to the head direction of the listener. In FIG. 4, the audio signal processing apparatus can obtain H1, which is different from H0, the HRTF corresponding to the position of the sound source. Here, H1 may be the HRTF corresponding to the diffraction point a with respect to the head direction of the listener. When there are a plurality of diffraction points, the audio signal processing apparatus can obtain the HRTF corresponding to each of the plurality of diffraction points based on the position of the listener. Further, the audio signal processing apparatus can binaurally render the input audio signal using the HRTF corresponding to the diffraction point to generate a diffracted audio signal. As described above, the diffraction path can pass through a plurality of points on the surface of the blocking object. In this case, the audio signal processing apparatus can binaurally render the input audio signal to generate a diffracted audio signal based on the HRTF corresponding to the last point at which the diffraction path of the sound output from the sound source meets the blocking object.

Further, the audio signal processing apparatus can generate the diffracted audio signal based on the diffraction distance. The audio signal processing apparatus can generate the diffracted audio signal by attenuating the input audio signal based on the diffraction distance corresponding to the diffraction point, because the magnitude of the sound output from the sound source is attenuated according to the diffraction distance. Specifically, the audio signal processing apparatus can determine a diffraction attenuation gain based on the diffraction distance through the diffraction point. The audio signal processing apparatus may multiply the input audio signal by the diffraction attenuation gain. Here, the audio signal processing apparatus can determine the diffraction attenuation gain differently for each frequency component. In the case of diffracted sound, the lower the frequency, the higher the proportion of the original sound that reaches the listener through diffraction. Accordingly, the audio signal processing apparatus can set the attenuation gain so that the degree of attenuation becomes smaller as the frequency becomes lower. Also, the diffracted sound can be delayed compared to the direct sound, because the path along which the sound output from the sound source travels to the listener becomes longer. Accordingly, the audio signal processing apparatus can generate the diffracted audio signal by delaying the input audio signal based on the diffraction distance. Next, the audio signal processing apparatus may mix the diffracted audio signal with the transmitted audio signal generated by the method described with reference to FIG. 3 to generate an output audio signal. For example, the audio signal processing apparatus may mix the binaurally rendered transmitted audio signal and diffracted audio signal for each ear of the listener.
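The delay and frequency-dependent attenuation of the diffracted sound can be sketched as follows. All numerical choices are assumptions for illustration: a speed of sound of 343 m/s, a delay measured relative to the direct path, a 1/distance attenuation term, and a first-order roll-off above a hypothetical `corner_hz`. The disclosure only states that attenuation grows with diffraction distance and shrinks as frequency decreases.

```python
SPEED_OF_SOUND = 343.0  # metres per second, assumed value

def diffraction_delay_samples(diffraction_distance, direct_distance, sample_rate):
    """Extra delay of the diffracted sound relative to the direct sound."""
    extra = max(0.0, diffraction_distance - direct_distance)
    return round(extra / SPEED_OF_SOUND * sample_rate)

def diffraction_gain(freq_hz, diffraction_distance, corner_hz=1000.0):
    """Assumed gain model: smaller for longer paths and higher frequencies."""
    distance_term = 1.0 / max(1.0, diffraction_distance)
    frequency_term = 1.0 / (1.0 + freq_hz / corner_hz)
    return distance_term * frequency_term

def delay_signal(samples, delay):
    """Delay a signal by prepending `delay` zero-valued samples."""
    return [0.0] * delay + list(samples)
```

Under these assumptions, a diffraction path 3.43 m longer than the direct path corresponds to a delay of 480 samples at a 48 kHz sampling rate.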

On the other hand, as described above, the acoustic path from the sound source to the head center of the listener and the acoustic paths from the sound source to each ear of the listener may be different from each other. Accordingly, the influence of the object W on each acoustic path from the sound source to the ears of the listener may differ. For example, the object W may be located on the second acoustic path from the sound source to the listener's right ear while not being located on the first acoustic path from the sound source to the listener's left ear. Alternatively, different objects may be located on the first acoustic path and the second acoustic path. The audio signal processing apparatus may model different transfer functions for the first acoustic path and the second acoustic path, respectively.

FIG. 6 is a diagram showing HRTFs determined based on the listener's head direction and the sound source position with respect to the head center of the listener. For example, the audio signal processing apparatus can determine an azimuth angle and an elevation angle corresponding to the position O of a sound source as seen from the center of the listener's head in a virtual space. Next, the audio signal processing apparatus can binaurally render the input audio signal corresponding to the sound source using the transfer function H0 corresponding to the determined azimuth and elevation angles. The transfer function H0 may be part of an HRTF set measured at a reference distance R. Here, the reference distance may represent the distance from the listener at which the HRTF set including the HRTF was measured. For example, the HRTF set may be a set of transfer functions representing properties measured at points on a sphere that is centered at the listener's head center and has the reference distance R as its radius.
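The azimuth and elevation used to look up H0 can be sketched under an assumed coordinate convention: the source position is expressed in a head-centered frame with x pointing forward along the head direction, y to the listener's left, and z upward. This convention and the function name are assumptions and are not taken from the disclosure.

```python
import math

def azimuth_elevation(source_in_head_frame):
    """Return (azimuth, elevation) in degrees for a head-relative position.

    Assumed frame: x forward, y left, z up, origin at the head center.
    Azimuth is positive toward the listener's left.
    """
    x, y, z = source_in_head_frame
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation
```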

When the distance from the center of the listener's head to the position O of the sound source is equal or similar to the reference distance R on which the HRTF set is based, the audio signal processing apparatus can use the transfer function H0 obtained by the above-described method. However, if the head size of the listener (or the distance between the ears) is greater than or equal to a threshold distance set with respect to the HRTF measurement distance R, binaural rendering performed by the audio signal processing apparatus using the transfer function H0 may exhibit degraded performance. As shown in FIGS. 7 and 8, the HRTF obtained based on the position of each ear of the listener and the HRTF obtained based on the position of the head center of the listener are different. However, if HRTF sets for various reference distances are constructed to improve the performance of the binaural rendering, the number of points to be measured by the apparatus that generates the HRTF sets may increase. Also, if the storage capacity of the database storing the HRTF sets is limited, it may be difficult for the database to store all of the HRTF sets measured at the various reference distances.

FIGS. 7 and 8 are diagrams showing HRTF pairs obtained when the sound source is located closer to or farther from the listener than the reference distance at which the HRTF set was generated. In FIG. 7, the angle from the left ear of the listener to the sound source with respect to the head direction of the listener is theta_c. Also, the angle from the right ear of the listener to the sound source with respect to the head direction of the listener is theta_i. Here, theta_c and theta_i may be different from each other. Also, theta_c and theta_i are different from the angle theta_O from the center of the listener's head to the sound source with respect to the listener's head direction. As such, the acoustic paths from the sound source to the ears of the listener may be different from the acoustic path from the sound source to the listener's head center. Also, the transfer function reference positions Hi and Hc, mapped onto the spherical surface whose radius is the reference distance R at which the HRTF set was measured, may be different from the position of the transfer function H0 obtained on the basis of the head center of the listener. The audio signal processing apparatus can obtain HRTFs corresponding to different positions for each of the listener's ears based on the reference distance at which the HRTF set was generated and the distance between the sound source and the listener. Specifically, the audio signal processing apparatus can obtain the ipsilateral HRTF and the contralateral HRTF corresponding to the ipsilateral side and the contralateral side of the listener, respectively, based on the reference distance, the position of each of the listener's ears, and the position of the sound source. In FIGS. 7 and 8, the ipsilateral (left) HRTF may be the HRTF corresponding to the left ear of the listener in the transfer function pair corresponding to the position of Hc. In FIGS. 7 and 8, the contralateral (right) HRTF may be the HRTF corresponding to the right ear of the listener in the transfer function pair corresponding to the position of Hi. Next, the audio signal processing apparatus can binaurally render the input audio signal based on the obtained ipsilateral HRTF and contralateral HRTF to generate an output audio signal.
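One possible reading of the per-ear reference positions is that the ray from each ear through the sound source is extended (or shortened) until it meets the sphere of radius R centered at the head center, and the HRTF is looked up at the direction of that intersection point. The two-dimensional sketch below follows this reading, which is an interpretation and not a statement of the disclosed method. The coordinate convention (x forward, y left, origin at the head center), the function name, and the requirement that the ear lies inside the reference sphere are assumptions.

```python
import math

def per_ear_lookup_azimuth(ear, source, ref_distance):
    """Azimuth (degrees) of the point where the ray ear->source meets the
    circle of radius ref_distance centered at the head center (origin).

    Assumes the ear lies strictly inside that circle and ear != source.
    """
    dx, dy = source[0] - ear[0], source[1] - ear[1]
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm          # unit direction ear -> source
    e_dot_u = ear[0] * ux + ear[1] * uy
    ear_sq = ear[0] ** 2 + ear[1] ** 2
    # Positive root of |ear + t*u|^2 = ref_distance^2.
    t = -e_dot_u + math.sqrt(e_dot_u ** 2 - (ear_sq - ref_distance ** 2))
    px, py = ear[0] + t * ux, ear[1] + t * uy
    return math.degrees(math.atan2(py, px))
```

For a source straight ahead but closer than R, the two ears obtain lookup directions on opposite sides of the frontal direction, whereas a source located exactly on the reference sphere yields the same direction as the head-center lookup.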

Hereinafter, a method by which the audio signal processing apparatus simulates the occlusion effect for each of the ipsilateral side and the contralateral side of the listener will be described with reference to FIG. 9 and the following figures. According to one embodiment of the present disclosure, the audio signal processing apparatus can determine, based on the positions of the ears of the listener, whether a blocking object exists between the sound source and each of the ipsilateral side and the contralateral side of the listener independently. This is because the influence of the object can vary according to the positional relationship between each ear of the listener and the sound source corresponding to the input audio signal. Specifically, the audio signal processing apparatus can determine whether there is a blocking object between the sound source corresponding to the input audio signal and the ipsilateral side of the listener based on the information about the virtual space. Further, the audio signal processing apparatus can determine whether there is a blocking object between the sound source corresponding to the input audio signal and the contralateral side of the listener based on the information about the virtual space.
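The independent determination for each ear can be sketched as two segment-versus-object intersection tests. The sketch assumes a spherical blocking object and Cartesian positions for the sound source and both ears; the function names and the spherical shape are hypothetical simplifications.

```python
import math

def segment_hits_sphere(p0, p1, center, radius):
    """True if the segment p0->p1 passes through the sphere's interior."""
    d = [b - a for a, b in zip(p0, p1)]
    dd = sum(x * x for x in d)
    if dd == 0.0:
        return math.dist(p0, center) < radius
    f = [a - c for a, c in zip(p0, center)]
    # Parameter of the point on the segment closest to the sphere center.
    t = max(0.0, min(1.0, -sum(fi * di for fi, di in zip(f, d)) / dd))
    closest = [a + t * x for a, x in zip(p0, d)]
    return math.dist(closest, center) < radius

def occlusion_per_ear(source, left_ear, right_ear, center, radius):
    """Return (left_blocked, right_blocked) for one spherical object."""
    return (segment_hits_sphere(source, left_ear, center, radius),
            segment_hits_sphere(source, right_ear, center, radius))
```

In a configuration where the object lies only on the path to the right ear, the result is `(False, True)`, so only the right output audio signal would receive the occlusion effect.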

FIG. 9 is a diagram showing the operation of the audio signal processing apparatus when the presence or absence of an object differs between the acoustic paths from the sound source to each of the ears of the listener. In FIG. 9, the audio signal processing apparatus can generate an output audio signal using HRTFs obtained at different positions for the ipsilateral side and the contralateral side corresponding to each of the ears of the listener, as in FIGS. 7 and 8. Here, the output audio signal may include an ipsilateral output audio signal and a contralateral output audio signal. In FIG. 9, no blocking object may be located on the first acoustic path between the sound source and the left ear L of the listener. In this case, the audio signal processing apparatus may not apply the occlusion effect of the object W to the left output audio signal for the left ear L of the listener, because the output audio signal for the left ear L may be closer to the actual sound when the occlusion effect of the object W is not applied. On the other hand, the audio signal processing apparatus can apply the occlusion effect of the object W to the right output audio signal for the right ear R of the listener. Specifically, the audio signal processing apparatus can generate the left output audio signal based on the left transfer function Hi. Further, the audio signal processing apparatus can generate a right output audio signal in which the transmitted audio signal and the diffracted audio signal are mixed, based on the right transfer function Hc and the information on the blocking object.

On the other hand, according to one embodiment of the present disclosure, the audio signal processing apparatus can generate an indirectly diffracted audio signal. The indirectly diffracted audio signal simulates sound that, after being diffracted at the diffraction point determined by the method described above with reference to FIGS. 4 and 5, is diffracted again at another point on the surface of the blocking object including that diffraction point and is then delivered to the listener. For example, the blocking object may block only one of the acoustic paths corresponding to each of the ears of the listener. In this case, the output audio signal corresponding to the other ear may not include the diffracted audio signal. However, the blocking object that blocks one side may still have an indirect effect on the other side. Through the indirectly diffracted audio signal, the audio signal processing apparatus can provide a realistic audio signal to the user.

In the present disclosure, the indirect diffraction point may represent a diffraction point determined by treating another diffraction point as a virtual sound source. Further, the indirectly diffracted audio signal may be an audio signal simulating an indirectly diffracted sound. The indirectly diffracted sound may be sound that is output from the sound source, diffracted at the surface of the blocking object, diffracted again at another point on the surface of the same blocking object, and then delivered to the listener. In the present embodiment, for convenience of explanation, a diffraction point that is not an indirect diffraction point is referred to as a direct diffraction point.

Specifically, the indirect diffraction point may be a diffraction point determined by using the direct diffraction point as a virtual sound source. The audio signal processing apparatus can determine, as the indirect diffraction point, the point that minimizes the sum of the lengths of a first path from the sound source to the direct diffraction point, a third path from the direct diffraction point to the indirect diffraction point, and a fourth path from the indirect diffraction point to the listener. At this time, each path may be a shortest path that does not traverse the blocking object, like the first path and the second path described in Fig.
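
The minimum-distance criterion above can be illustrated with a minimal sketch. It assumes that the object surface has already been sampled into candidate points and that each straight segment is a valid path that does not cross the object; the function names and the candidate list are hypothetical and do not appear in the disclosure.

```python
import math

def path_length(*points):
    """Total Euclidean length of the polyline through the given 3D points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def pick_indirect_diffraction_point(source, direct_point, listener, candidates):
    """Return the candidate surface point that minimizes the total length of
    source -> direct diffraction point -> candidate -> listener.

    Assumption: straight segments between these points do not cross the object.
    """
    return min(
        candidates,
        key=lambda p: path_length(source, direct_point, p, listener),
    )
```

Because the first path (source to direct diffraction point) is the same for every candidate, the choice is effectively driven by the third and fourth paths.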

FIG. 10 is a diagram illustrating an example in which an output audio signal according to an embodiment of the present disclosure is configured differently for each ear of the listener. In FIG. 10, no blocking object is located in the first acoustic path between the sound source and the left ear (L) of the listener, and the blocking object may be located in the second acoustic path between the sound source and the right ear (R) of the listener. In FIG. 10, the diffraction points for the second acoustic path may be D1 and D3. Accordingly, the audio signal processing apparatus can generate the right diffracted audio signal based on the diffraction points D1 and D3 on the surface of the blocking object. Specifically, the audio signal processing apparatus can binaurally render the input audio signal based on the transfer function HD1 corresponding to the diffraction point D1 to generate a first right diffracted audio signal. Further, the audio signal processing apparatus can binaurally render the input audio signal based on the transfer function HD3 corresponding to the diffraction point D3 to generate a second right diffracted audio signal. In addition, the audio signal processing apparatus can binaurally render the input audio signal based on the right transfer function Hi to generate a transmitted audio signal. Next, the audio signal processing apparatus may mix the transmitted audio signal, the first right diffracted audio signal, and the second right diffracted audio signal to generate a right output audio signal.

In FIG. 10, the left output audio signal corresponding to the first acoustic path may not include a diffracted audio signal, because the first acoustic path does not overlap the blocking object. However, the audio signal processing apparatus may generate a left output audio signal that includes an indirectly diffracted audio signal. Referring to FIG. 10, the audio signal processing apparatus may generate the indirectly diffracted audio signal by using the diffraction points D1 and D3 for the second acoustic path as virtual sound sources. First, in the case of D1, the shortest path from D1 to the left ear (L) of the listener may not pass through another point on the blocking object, so no indirect diffraction sound may exist. Next, in the case of D3, the third acoustic path from D3 to the left ear (L) of the listener may pass through the point D2. At this time, the audio signal processing apparatus can determine the point D2 as the indirect diffraction point. Further, the audio signal processing apparatus can render the input audio signal based on the indirect diffraction point D2. For example, the audio signal processing apparatus can binaurally render the input audio signal based on the transfer function HD2 corresponding to the diffraction point D2 to generate an indirectly diffracted audio signal.

The method for determining the direct diffraction point and the method for generating the diffracted audio signal described above can be applied in the same or a corresponding manner to determining the indirect diffraction point and generating the indirectly diffracted audio signal. For example, the audio signal processing apparatus may attenuate the input audio signal based on an indirect diffraction distance, which is the distance over which the sound output from the sound source reaches the listener through the direct diffraction point and the indirect diffraction point. Further, the audio signal processing apparatus can generate the indirectly diffracted audio signal by delaying the input audio signal based on the indirect diffraction distance. Further, the audio signal processing apparatus can generate a direct audio signal based on the left transfer function Hc. Finally, the audio signal processing apparatus can mix the direct audio signal and the left indirectly diffracted audio signal to generate a left output audio signal.
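
As an illustration of distance-based attenuation and delay, the following minimal sketch applies a gain and a sample delay derived from a path length. The inverse-distance gain law, the reference distance, and the speed of sound of 343 m/s are assumptions made for this example; the disclosure only states that the signal is attenuated and delayed based on the distance.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature

def delay_and_attenuate(signal, distance_m, sample_rate, ref_distance_m=1.0):
    """Delay and attenuate a mono signal according to a path length.

    Assumptions: inverse-distance gain relative to ref_distance_m (clamped so
    the gain never exceeds 1) and a delay rounded to whole samples.
    """
    delay_samples = round(distance_m / SPEED_OF_SOUND * sample_rate)
    gain = ref_distance_m / max(distance_m, ref_distance_m)
    return [0.0] * delay_samples + [gain * x for x in signal]
```

For an indirectly diffracted signal, `distance_m` would be the indirect diffraction distance, that is, the total length of the path through the direct and indirect diffraction points.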

On the other hand, the audio signal processing apparatus can determine whether to generate an indirectly diffracted audio signal based on the size of the blocking object. For example, if the size of the blocking object is smaller than the head size of the listener, the audio signal processing apparatus can generate an indirectly diffracted audio signal. In this case, modeling the indirect diffraction sound by the audio signal processing apparatus can help provide a sense of reality to the user. On the other hand, when the size of the blocking object is larger than the head size of the listener, the audio signal processing apparatus may not generate the indirectly diffracted audio signal. Further, the audio signal processing apparatus can determine whether to generate the indirectly diffracted audio signal based on at least one of the position and the shape of the blocking object.
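
The size-based decision can be sketched as a simple comparison. The default head size of 0.18 m and the use of a single scalar size for the object are assumptions made for illustration; the disclosure does not specify how size is measured.

```python
def should_generate_indirect_diffraction(object_size_m, head_size_m=0.18):
    """Return True when the blocking object is smaller than the listener's head.

    Assumption: both sizes are single scalar extents in meters.
    """
    return object_size_m < head_size_m
```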

Meanwhile, if a blocking object is absent from one of the acoustic paths corresponding to the two ears of the listener, a reflected sound caused by the blocking object may arise in that non-occluded path. The audio signal processing apparatus can generate a reflected audio signal based on the blocking object that blocks the other path. FIG. 11 is a diagram illustrating a method by which an audio signal processing apparatus according to an embodiment of the present disclosure generates a reflected audio signal. In FIG. 11, no blocking object is located in the first acoustic path between the sound source and the left ear (L) of the listener, and the blocking object may be located in the second acoustic path between the sound source and the right ear (R) of the listener.

According to one embodiment, the audio signal processing apparatus can determine the reflection point at which the sound corresponding to the input audio signal is reflected at the surface of the blocking object. Specifically, the audio signal processing apparatus can determine the reflection point based on the position, size, and shape of the blocking object. Based on the position of the sound source and the position of the listener, the audio signal processing apparatus can determine, as the reflection point, a point on the surface of the blocking object at which the angle of incidence and the angle of reflection are equal. For example, the audio signal processing apparatus can determine, as the reflection point, a point at which the angle of incidence of sound arriving from the sound source equals the angle of reflection toward the listener.
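
One common way to find a point where the angle of incidence equals the angle of reflection is the image-source construction, sketched below. It assumes that the reflecting surface is locally a plane given by a point and a unit normal, and that the source and the listener are on the same side of it; the disclosure does not prescribe this construction, and the function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def reflection_point(source, listener, plane_point, unit_normal):
    """Specular reflection point on a plane (image-source construction)."""
    # Mirror the source across the plane.
    offset = dot([s - p for s, p in zip(source, plane_point)], unit_normal)
    image = [s - 2.0 * offset * n for s, n in zip(source, unit_normal)]
    # Intersect the line from the listener to the mirrored source with the plane.
    direction = [i - l for i, l in zip(image, listener)]
    denom = dot(direction, unit_normal)
    if denom == 0.0:
        raise ValueError("listener-to-image line is parallel to the plane")
    t = dot([p - l for p, l in zip(plane_point, listener)], unit_normal) / denom
    return tuple(l + t * d for l, d in zip(listener, direction))
```

A real blocking object has finite extent, so a caller would still need to check that the returned point lies on the object's surface.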

Next, the audio signal processing apparatus can binaurally render the input audio signal based on the listener's head direction and the position of the reflection point to generate a reflected audio signal. The audio signal processing apparatus can obtain the HRTF corresponding to the reflection point based on the listener's head direction and the position of the reflection point. In FIG. 11, the audio signal processing apparatus can obtain the transfer function HR corresponding to the reflection point R'. In addition, the audio signal processing apparatus can binaurally render the input audio signal based on the transfer function HR to generate a reflected audio signal.

Further, the audio signal processing apparatus can generate the reflected audio signal based on the information about the blocking object. At this time, the information about the blocking object may include the acoustic reflectance of the object. The acoustic reflectance can indicate the ratio of the magnitude of the sound reflected by the object to the magnitude of the sound before reflection. Specifically, the audio signal processing apparatus can determine the reflection attenuation gain based on at least one of information indicating the material constituting the blocking object or the reflectance of the blocking object. This is because the reflection attenuation gain may vary depending on the material constituting the blocking object.

Further, the audio signal processing apparatus can generate the reflected audio signal based on the reflection distance indicating the length of the reflection path. Specifically, the audio signal processing apparatus can generate a reflected audio signal by attenuating the input audio signal based on the reflection distance corresponding to the reflection point. The level of the reflected sound delivered to the listener is attenuated, relative to the sound output from the sound source, according to the reflection distance. Specifically, the audio signal processing apparatus can determine the reflection attenuation gain based on the reflection distance via the reflection point. Also, the reflected sound can be delayed compared to the direct sound. This is because the path along which the sound output from the sound source travels becomes longer. Accordingly, the audio signal processing apparatus can generate the reflected audio signal by delaying the input audio signal based on the reflection distance. Next, the audio signal processing apparatus may mix the direct audio signal, the indirectly diffracted audio signal, and the reflected audio signal generated by the above-described method to generate a left output audio signal. Further, the audio signal processing apparatus can mix the transmitted audio signal and the diffracted audio signal generated by the above-described method to generate a right output audio signal.
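
One possible way to write the reflection attenuation and delay is given below as an illustration. It rests on assumptions that are not stated in the disclosure: an inverse-distance law with reference distance d_0, a frequency-independent reflectance \rho, a sampling rate f_s, and a speed of sound c. Here x[n] is the input audio signal and d_R is the reflection distance, that is, the length of the path from the sound source to the reflection point plus the length of the path from the reflection point to the listener.

```latex
y_R[n] = g_R \, x[n - \Delta_R], \qquad
g_R = \rho \cdot \frac{d_0}{d_R}, \qquad
\Delta_R = \operatorname{round}\!\left( \frac{f_s \, d_R}{c} \right)
```

Binaural rendering with the transfer function HR corresponding to the reflection point would then be applied to y_R[n].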

Meanwhile, according to another embodiment of the present disclosure, the audio signal processing apparatus can generate a reverberant audio signal corresponding to the room reverberation that the virtual space imparts to the sound output from the sound source. At this time, the reverberation may be applied in the post-processing performed by the processor 12 described above. FIG. 12 is a diagram showing a method of generating a reverberant audio signal corresponding to each of the two ears of the listener. According to one embodiment, the listener in the virtual space may be located at the boundary between divided spaces having different reverberation characteristics, as shown in FIG. In this case, the two ears of the listener can receive sound through spaces having different reverberation characteristics. Thus, the audio signal processing apparatus can generate a reverberant audio signal for each of the listener's ears based on the reverberation filter corresponding to the divided space in which that ear is located.

For example, if the listener's ears are located in different divided spaces, the audio signal processing apparatus may filter the input audio signal based on different reverberation filters for the right and left sides of the listener. Specifically, the audio signal processing apparatus can determine the right reverberation filter and the left reverberation filter corresponding to the right and left sides of the listener, respectively, based on the position of each of the listener's ears. Next, the audio signal processing apparatus binaurally renders the input audio signal based on the right reverberation filter and the left reverberation filter, thereby generating a reverberant audio signal corresponding to each of the right and left sides of the listener.

In Fig. 12, the audio signal processing apparatus can generate a reverberant audio signal for the left ear based on the first reverberation filter generated based on the characteristics of the space R_A. In addition, the audio signal processing apparatus can generate a reverberant audio signal for the right ear based on the second reverberation filter generated based on the characteristics of the space R_B. In this case, the first and second reverberation filters may be filters having different values of at least one filter coefficient. According to another embodiment, the audio signal processing device may combine the first and second reverberation filters to generate one representative reverberation filter. The audio signal processing apparatus may generate reverberant audio signals for left and right using the representative reverberation filter.
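
The disclosure does not state how the first and second reverberation filters are combined. As one possibility, the sketch below forms a representative filter as a weighted average of two FIR coefficient lists, zero-padding the shorter one; the weighting scheme and the function name are assumptions made for illustration.

```python
def combine_reverb_filters(filter_a, filter_b, weight_a=0.5):
    """Weighted average of two FIR reverberation filters.

    Assumption: both filters are coefficient lists at the same sample rate;
    the shorter one is zero-padded to the length of the longer one.
    """
    n = max(len(filter_a), len(filter_b))
    a = list(filter_a) + [0.0] * (n - len(filter_a))
    b = list(filter_b) + [0.0] * (n - len(filter_b))
    return [weight_a * x + (1.0 - weight_a) * y for x, y in zip(a, b)]
```

With `weight_a=0.5`, the spaces R_A and R_B contribute equally; a weight derived from how far the listener's head extends into each space would be another plausible choice.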

Hereinafter, a process in which the audio signal processing apparatus processes the input audio signal to generate the above-described output audio signal will be described. The process described below may be a software component that is executed by a hardware configuration such as a processor. For example, the processor 12 described above with reference to FIG. 2 may perform the processing described in FIGS.

FIG. 13 is a block diagram illustrating a process of processing an input audio signal by the audio signal processing apparatus 10 according to an embodiment of the present disclosure. According to an embodiment, the audio signal processing apparatus may preprocess the input audio signal based on the spatial information and the listener information (S100). The input audio signal may include a plurality of object signals. Further, the input audio signal may include at least one of an object signal, an ambisonic signal, and a channel signal. The audio signal processing apparatus can generate an intermediate audio signal in which the acoustic occlusion effect caused by the plurality of objects included in the virtual space is simulated. At this time, the intermediate audio signal may be one object signal or a monaural signal. Alternatively, the intermediate audio signal may be a multi-channel signal. Further, the audio signal processing apparatus can acquire the HRTF used for binaural rendering based on the spatial information and the listener information. At this time, the HRTF may include a pair consisting of an ipsilateral HRTF and a contralateral HRTF.

In addition, the audio signal processing apparatus may binaurally render the preprocessed audio signal to generate an output audio signal (S200). The output audio signal may be a binaural signal. For example, the output audio signal may be a 3D audio headphone signal (i.e., a 3D audio 2-channel signal). The audio signal processing apparatus can binaurally render the intermediate audio signal using the HRTF pair obtained in the preprocessing step (S100). The binaural rendering may be performed in the time domain or the frequency domain. According to one embodiment, the intermediate audio signal may be a two-channel audio signal for each of the listener's ears. Further, the audio signal processing apparatus may additionally perform post-processing on the output audio signal. Post-processing may include cross-talk cancellation, dynamic range control (DRC), volume normalization, a peak limiter, and a reverberator. Like the binaural rendering, the post-processing can be performed in the time domain or the frequency domain. Accordingly, the audio signal processing apparatus may perform frequency/time domain conversion of the output audio signal in the post-processing. The audio signal processing apparatus may include a separate post-processing block for performing the post-processing. Alternatively, post-processing may be performed through the processor 12 of FIG.
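
As a minimal sketch of time-domain binaural rendering, a monaural intermediate signal can be convolved with a pair of head-related impulse responses (the time-domain counterpart of an HRTF pair). The direct-form convolution and the function names are illustrative assumptions; a practical renderer would more likely use FFT-based block convolution.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out

def binaural_render(mono, hrir_left, hrir_right):
    """Render a mono signal to a (left, right) pair of ear signals."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```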

FIG. 14 is a block diagram showing in more detail the operation of the pre-processing (S100) of the audio signal processing apparatus. Referring to FIG. 1, an audio signal processing apparatus may analyze an acoustic space (S110). The audio signal processing apparatus can analyze the acoustic path from the sound source to both ears of the listener based on the position of the listener. The audio signal processing apparatus can determine whether there is a blocking object between the listener and the sound source based on the acoustic path. The audio signal processing apparatus can determine whether a blocking object exists based on the position of the sound source, the position of the listener, and the position of each of a plurality of objects included in the virtual space. When a blocking object exists, the audio signal processing apparatus can determine at least one blocking object from among the plurality of objects. The audio signal processing apparatus can generate the modeling information based on the object-related information of each of the determined blocking objects. At this time, the object-related information may be in the form of metadata for the input audio signal. The object-related information may include positional information of the object. In addition, the object-related information may include attribute information indicating whether the object is a sound object or a non-sound object. According to one embodiment, the blocking object may be a non-sound object. The blocking object may also be a passive object, a non-audio object, a scene object, a visual object, an acoustic object, an acoustic element, an occluder, a reflector, or an absorber.

In addition, the object-related information may include information on the material constituting the object. At this time, the information about the material may include at least one of a sound absorption rate, a reflectance, a transmittance, a diffraction rate, and a scattering rate for each frequency component of the material constituting the object. Alternatively, the object-related information may include a frequency response characteristic that reflects the information about the material constituting the object. The audio signal processing apparatus may select the audio signals on which binaural rendering is performed based on the object-related information of each object. Specifically, when the transmittance of a first blocking object is less than a reference transmittance, the audio signal processing apparatus may not select the first audio signal corresponding to each of the at least one sound source blocked by the first blocking object. In this case, the audio signal processing apparatus may binaurally render the input audio signal excluding the first audio signal to generate an output audio signal.
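
The transmittance-based selection can be sketched as a filter over the sound sources. The dictionary layout, the key name `blocker_transmittance`, and the use of `None` for an unblocked source are hypothetical choices made for this example.

```python
def select_for_rendering(sources, reference_transmittance):
    """Keep sources that are unblocked or whose blocking object transmits enough.

    Assumption: each source is a dict whose 'blocker_transmittance' entry is
    None when no blocking object lies between the source and the listener.
    """
    selected = []
    for source in sources:
        transmittance = source["blocker_transmittance"]
        if transmittance is None or transmittance >= reference_transmittance:
            selected.append(source)
    return selected
```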

In addition, the audio signal processing apparatus can generate binaural information necessary for binaural rendering of the intermediate audio signal. Herein, the binaural information may include a binaural filter used to binaurally render an audio signal. Alternatively, the binaural information may include horizontal angle and elevation angle information of a specific point relative to the listener. The audio signal processing apparatus can generate the binaural information based on the listener's position information and head direction information. For example, the audio signal processing apparatus can obtain the horizontal angle and the elevation angle corresponding to the position of the sound source relative to the listener. Further, the audio signal processing apparatus may acquire the HRTF corresponding to the position of the sound source relative to the listener. Further, the audio signal processing apparatus can generate a binaural filter based on the position, size, and shape of the object. Thereby, the audio signal processing apparatus can model the diffracted sound or the reflected sound. For example, the audio signal processing apparatus may obtain a horizontal angle and an elevation angle representing a specific point on the surface of the object. The audio signal processing apparatus may then obtain the binaural filter used to generate the output audio signal based on the horizontal angle and the elevation angle representing that point.
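
The following minimal sketch computes a horizontal angle and an elevation angle for a point relative to the listener. The coordinate convention (x forward, y to the left, z up), the use of degrees, and the restriction to yaw-only head rotation are assumptions made for this example.

```python
import math

def azimuth_elevation(listener_pos, listener_yaw_deg, point):
    """Horizontal and elevation angle (degrees) of a point relative to the listener.

    Assumptions: x forward, y left, z up; only head yaw is taken into account.
    """
    dx, dy, dz = (p - l for p, l in zip(point, listener_pos))
    yaw = math.radians(listener_yaw_deg)
    # Rotate the world-frame offset into the head frame (rotation by -yaw about z).
    fx = dx * math.cos(yaw) + dy * math.sin(yaw)
    fy = -dx * math.sin(yaw) + dy * math.cos(yaw)
    azimuth = math.degrees(math.atan2(fy, fx))
    elevation = math.degrees(math.atan2(dz, math.hypot(fx, fy)))
    return azimuth, elevation
```

The resulting angle pair could serve as a lookup key into an HRTF set, for the sound source itself or for a diffraction or reflection point on an object surface.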

Specifically, the binaural information may include ipsilateral binaural information and contralateral binaural information. In FIG. 16 to be described later, the first binaural information and the second binaural information may represent the ipsilateral binaural information and the contralateral binaural information, respectively. The ipsilateral binaural information may include at least one binaural filter for modeling the ipsilateral sound. The contralateral binaural information may include at least one binaural filter for modeling the contralateral sound. Further, in accordance with one embodiment of the present disclosure, the audio signal processing apparatus may use the binaural information to simulate the acoustic occlusion effect caused by blocking objects. In this case, the audio signal processing apparatus can acquire binaural information including a plurality of binaural filter pairs through the above-described acoustic space analysis (S110). Alternatively, the audio signal processing apparatus may obtain binaural information including a plurality of sets of horizontal angles and elevation angles.

Next, the audio signal processing apparatus can generate the intermediate audio signal using the binaural information. For example, the audio signal processing apparatus may generate one representative binaural filter pair based on a plurality of binaural filter pairs. At this time, the audio signal processing apparatus can generate a plurality of intermediate audio signals for each of the ipsilateral side and the contralateral side based on the plurality of binaural filter pairs. This is because the binaural filter pair used may vary depending on the type of sound being modeled (for example, transmitted sound, diffracted sound, or reflected sound). Further, when a plurality of blocking objects are located between the listener and one sound source, the binaural filter pair may vary depending on the blocking object. Further, the audio signal processing apparatus may mix the plurality of intermediate audio signals to generate a final intermediate audio signal. Alternatively, the audio signal processing apparatus may generate a representative binaural filter pair by averaging, weighting, or otherwise combining the plurality of binaural filter pairs. In this case, the audio signal processing apparatus can binaurally render the intermediate audio signal based on the generated representative binaural filter pair.

In addition, the audio signal processing apparatus may generate the intermediate audio signal (S120) based on the modeling information obtained in the acoustic space analysis step (S110). The audio signal processing apparatus can generate the intermediate audio signal by filtering the input audio signal based on the modeling information. At this time, the intermediate audio signal may include a plurality of audio signals, each obtained by processing the input audio signal in a different manner. For example, the intermediate audio signal may include an audio signal that models the sound transmitted through the blocking object. The intermediate audio signal may also include audio signals that model the sound diffracted at the surface of the blocking object and the sound reflected at the surface of the blocking object. Hereinafter, a method of generating an intermediate audio signal by the audio signal processing apparatus will be described in detail with reference to FIG.

FIG. 15 is a block diagram showing the audio signal preprocessing operation (S120) of the audio signal processing apparatus in more detail. Referring to FIG. 15, the audio signal processing apparatus may model the transmitted sound by preprocessing the input audio signal (S121). For example, the audio signal processing apparatus can filter the input audio signal based on the transmittance of the blocking object to generate a transmitted audio signal. At this time, a different transmittance value may be applied to each frequency bin of the input audio signal.
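
Applying a different transmittance to each frequency bin can be sketched as a frequency-domain gain. The example below uses a naive DFT for self-containment and mirrors the per-bin gains onto the negative-frequency bins so that the output stays real; the function names and the layout of the transmittance list (one gain for each bin from 0 to n/2) are assumptions made for illustration.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def transmit_through_object(signal, transmittance_per_bin):
    """Scale each frequency bin of the signal by a transmittance value.

    transmittance_per_bin needs len(signal) // 2 + 1 entries (bins 0..n/2).
    """
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        band = min(k, n - k)  # fold negative-frequency bins onto positive ones
        spectrum[k] *= transmittance_per_bin[band]
    return idft_real(spectrum)
```

A production implementation would use an FFT with overlap-add processing rather than a whole-signal naive DFT.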

In addition, the audio signal processing apparatus can preprocess the input audio signal to model at least one of the diffracted sound or the reflected sound (S122). The audio signal processing apparatus can model the diffracted sound and the reflected sound based on the time delay and the attenuation caused by the change in the acoustic path. For example, the audio signal processing apparatus may filter the input audio signal based on the diffraction point on the surface of the blocking object to generate a diffracted audio signal. The method described with reference to FIG. 4 can be applied to the generation of the diffracted audio signal. In addition, the audio signal processing apparatus can generate a reflected audio signal from the input audio signal based on the reflection point on the surface of the blocking object. The method described with reference to FIG. 11 can be applied to the generation of the reflected audio signal.

Next, the audio signal processing apparatus may generate the intermediate audio signal by mixing at least one of the input audio signal for which modeling is bypassed, the transmitted audio signal, the diffracted audio signal, and the reflected audio signal (S123). The audio signal processing apparatus can determine the mixing ratio of the input audio signal, the transmitted audio signal, the diffracted audio signal, and the reflected audio signal based on the modeling information. Further, the audio signal processing apparatus can mix the input audio signal, the transmitted audio signal, the diffracted audio signal, and the reflected audio signal based on the determined mixing ratio. For example, if a blocking object is present, the audio signal processing apparatus may not include the bypassed input audio signal in the mix. On the other hand, when there is no blocking object, there may be no transmitted audio signal, diffracted audio signal, or reflected audio signal. Also, even if a blocking object exists, at least one of the transmitted audio signal, the diffracted audio signal, or the reflected audio signal may not exist. In this case, the audio signal processing apparatus may omit some processing steps based on the modeling information obtained in the acoustic space analysis (S110).

In addition, when the presence or absence of a blocking object differs between the acoustic paths corresponding to the ipsilateral side and the contralateral side, the audio signal processing apparatus can mix, for each side, the audio signals required for modeling that side. Specifically, when a blocking object exists only in the acoustic path corresponding to the ipsilateral side, the audio signal processing apparatus can mix the transmitted audio signal and the diffracted audio signal to generate the ipsilateral intermediate audio signal. Further, the audio signal processing apparatus can mix the input audio signal for which modeling is bypassed and the reflected audio signal to generate the contralateral intermediate audio signal. Thus, the audio signal processing apparatus can provide more realistic spatial sound to the user.
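
The per-side mixing rule can be sketched as follows. Equal mixing weights and the function names are assumptions made for this example; in the disclosure the mixing ratio is determined from the modeling information.

```python
def mix(*signals):
    """Sum sample lists of possibly different lengths (shorter ones are zero-padded)."""
    length = max((len(s) for s in signals), default=0)
    return [sum(s[i] for s in signals if i < len(s)) for i in range(length)]

def mix_for_ear(path_blocked, bypassed, transmitted, diffracted, reflected):
    """Build the intermediate signal for one ear.

    Blocked path: transmitted + diffracted sound.
    Unblocked path: bypassed (unmodeled) input + reflected sound.
    """
    if path_blocked:
        return mix(transmitted, diffracted)
    return mix(bypassed, reflected)
```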

In FIG. 15, the intermediate audio signal may be a two-channel audio signal corresponding to each of the two ears of the listener. For example, the intermediate audio signal may comprise a first intermediate audio signal and a second intermediate audio signal. Specifically, the audio signal processing apparatus can analyze the acoustic space by dividing the acoustic path into left and right (or ipsilateral and contralateral) acoustic paths, one for each ear of the listener. In this case, the audio signal processing apparatus can process the audio signal according to the divided acoustic paths. For example, in the acoustic space analysis process (S110), the audio signal processing apparatus can generate the ipsilateral and contralateral binaural filters, respectively. Further, in the audio signal preprocessing step (S120), the audio signal processing apparatus can generate the ipsilateral intermediate audio signal and the contralateral intermediate audio signal. In this case, the audio signal processing apparatus can independently process the first intermediate audio signal and the second intermediate audio signal.

FIG. 16 is a diagram specifically illustrating the binaural rendering process (S200) illustrated in FIG. According to one embodiment, the audio signal processing apparatus can independently generate an output audio signal corresponding to each of the ipsilateral side and the contralateral side. For example, the audio signal processing apparatus may binaurally render the first intermediate audio signal based on the first binaural information obtained in the acoustic space analysis step (S110) to generate a first output audio signal (S210). In addition, the audio signal processing apparatus may binaurally render the second intermediate audio signal based on the second binaural information obtained in the acoustic space analysis step (S110) to generate a second output audio signal (S220).

Hereinafter, a configuration of an audio signal processing apparatus according to an embodiment of the present disclosure will be described with reference to FIGS. 17 to 23. FIGS. 17 to 23 are block diagrams according to an embodiment of the present disclosure. The audio signal processing apparatus according to the embodiment of FIGS. 17 to 22 may be the same as or equivalent to the audio signal processing apparatus 10 of FIG. The separately displayed blocks logically distinguish the elements of the audio signal processing apparatus according to their operations. Further, each unit may be a software component that is executed by a hardware configuration such as a processor. Thus, the operation of each block illustrated in FIGS. 17 through 23 may be performed through an integrated processor including at least one processor. For example, the operation of each block may be performed by the processor 12 of FIG. Accordingly, portions of the embodiment of FIGS. 17 to 23 that are the same as or correspond to those of the embodiment of FIG. 2 are not described again.

FIG. 17 is a block diagram showing in detail the configuration of an audio signal processing apparatus 160 according to an embodiment of the present disclosure. Referring to FIG. 17, the audio signal processing apparatus 160 may include a decoder 100, an object renderer 200, an ambisonic renderer 300, a channel renderer 400, and a mixer 500. According to one embodiment, the audio signal processing apparatus 160 may receive a bitstream in which the input audio signal has been encoded by an apparatus other than the audio signal processing apparatus 160. The decoder 100 may decode the received bitstream to obtain the input audio signal. Specifically, the decoder 100 may decode the bitstream using the MPEG-H 3DA standard codec. According to one embodiment, the input audio signal may comprise a plurality of audio signals classified into at least one format. For example, the input audio signal may include at least one of an object signal, an ambisonic signal, or a channel signal. In this case, the decoder 100 may classify the plurality of audio signals of different formats included in the input audio signal by format. In addition, the decoder 100 may decode the bitstream to obtain side information corresponding to each of the audio signals classified according to the format. In FIG. 17, the decoder 100 can acquire additional information corresponding to each of an object signal, an ambisonic signal, and a channel signal. In addition, the decoder 100 may decode the bitstream to obtain non-sound object side information for a non-sound object that does not make a sound.

According to one embodiment, the virtual space in which the input audio signal is simulated may comprise a non-sound object. The non-sound object may represent various objects involved in interaction between objects in a virtual space in which the input audio signal is simulated. The non-sound object may be an object having no audio signal corresponding to the object. For example, the non-sound object may include at least one of a passive object, a non-audio object, a scene object, a visual object, an acoustic object, an acoustic element, an occluder, a reflector, or an absorber.

In addition, the non-sound object side information may be included in an acoustic element. Here, an acoustic element may represent a physical object that affects an audio element according to the position and head direction of the listener in a virtual space. The audio element constitutes an audio scene and may be one or more audio signals described by metadata. For example, the audio element may include at least one of the above-described object signal, ambisonic signal, or channel signal and the additional information corresponding thereto. Further, the audio signal processing apparatus can receive the acoustic element together with the metadata included in an audio object. The audio object may include an audio signal and the metadata necessary for simulating a sound source corresponding to the audio signal. The metadata required to simulate the sound source may include location information. In addition, the audio object may be an audio object defined by the ISO/IEC 23008-3 standard. In this disclosure, the case where the input audio signal includes an object signal, an ambisonic signal, and a channel signal is described as an example, but the present disclosure is not limited thereto.

In FIG. 17, the audio signal of each format classified by the decoder 100 can be rendered by a format-specific renderer. The additional information corresponding to each of the audio signals classified by format may include information on the real acoustic environment in which the input audio signal was recorded, or 6-DOF (six degrees of freedom) coordinates of the speaker layout that reproduces the output audio signal. The 6-DOF coordinates may include azimuth, elevation, distance, yaw, pitch, and roll information. The azimuth, elevation, and distance may be information indicating the position of the listener, and the yaw, pitch, and roll may be information indicating the head direction of the listener. Specifically, the object side information corresponding to the object signal may include directional information such as a directivity pattern of the object. In addition, the non-sound object side information may include information for handling the influence of the non-sound object on sound output from a sound source other than the non-sound object. For example, the non-sound object side information may include at least one of a sound absorption ratio, a reflectance, a transmittance, a diffraction rate, or a scattering rate, for each frequency component, of a material constituting the non-sound object.

In FIG. 17, the user interaction information may include the above-described listener information. For example, the user interaction information may include the listener's head direction and the listener's position. The head direction and the position of the listener can be controlled by user input. The user interaction information may also include UI (user interface) information such as moving a sound object, playback, and stop. Here, a sound object is an object for which a corresponding sound exists, as opposed to a non-sound object. For example, the sound object may include at least one of an active object, an audio object, an audio element, or a sound source.

In addition, in FIG. 17, the renderer corresponding to each format-specific audio signal can generate an intermediate audio signal according to the format of the output audio signal. For example, the output audio signal may be a loudspeaker audio signal in a channel layout such as 5.1, 7.1, 5.1.2, 10.2, or 22.2. Alternatively, the output audio signal may be a 2-channel binaural signal output via headphones or earphones. Or the output audio signal may be a combination of a speaker output signal and a headphone or earphone output signal. For example, the output audio signal may be an audio signal corresponding to a virtual space simulated while the user wears earphones or headphones in a space where a loudspeaker layout is installed. Next, the mixer 500 mixes the plurality of intermediate audio signals generated through the object renderer 200, the ambisonic renderer 300, and the channel renderer 400 to generate the output audio signal. A method of generating an intermediate audio signal in each of the renderers will be described in detail with reference to FIGS. 20 to 23. Hereinafter, additional information transmitted in various manners will be described.
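As a rough illustration of the mixing step, the following Python sketch sums intermediate audio signals sample by sample. The disclosure does not state the mixing rule; plain summation, the function name `mix_intermediate_signals`, and the `signal[channel][sample]` list layout are assumptions made only for this example.

```python
from typing import List, Sequence

Signal = List[List[float]]  # hypothetical layout: signal[channel][sample]


def mix_intermediate_signals(signals: Sequence[Signal]) -> Signal:
    """Sum intermediate signals that share one channel count and length."""
    if not signals:
        raise ValueError("at least one intermediate signal is required")
    n_ch = len(signals[0])
    n_smp = len(signals[0][0]) if n_ch else 0
    for sig in signals:
        if len(sig) != n_ch or any(len(ch) != n_smp for ch in sig):
            raise ValueError("all intermediate signals must share one shape")
    # Assumed mixing rule: sample-wise sum, with no gain normalization.
    return [
        [sum(sig[c][n] for sig in signals) for n in range(n_smp)]
        for c in range(n_ch)
    ]
```

In this sketch each renderer is expected to have already produced a signal in the output layout (for example, two channels for binaural output), so the mixer itself needs no knowledge of the input formats.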

On the other hand, according to another embodiment of the present disclosure, the additional information may be obtained through an interface separate from the input audio signal, unlike the example described above. In the embodiments of FIGS. 18 and 19, descriptions of parts that are the same as or correspond to those of the embodiment of FIG. 17 are omitted. FIG. 18 is a block diagram showing in detail the configuration of an audio signal processing apparatus 170 according to an embodiment of the present disclosure. In FIG. 18, the audio signal processing apparatus 170 may include a first parser 171 and a second parser 172. In FIG. 18, the first parser 171 and the second parser 172 are shown as replacing the decoder 100 of FIG. 17, but each parser may include a decoder internally. Or the audio signal processing apparatus 170 may include a separate decoder.

According to an embodiment, the audio signal processing apparatus can receive metadata transmitted separately from an input audio signal. For example, the audio signal processing apparatus can receive an input audio signal in the form of pulse-code modulation (PCM) audio. Or the audio signal processing apparatus may receive the input audio signal through a separate audio codec for processing the audio signal. In this case, the additional information corresponding to the input audio signal may be parsed by the second parser 172, separately from the first parser 171 that processes the input audio signal. Referring to FIG. 18, the first parser 171 can classify the input audio signal into an object signal, an ambisonic signal, and a channel signal. The first parser 171 can classify the input audio signal by format by referring to track index information on the input audio signal. The second parser 172 may parse the additional information corresponding to each of the object signal, the ambisonic signal, and the channel signal. Also, the second parser 172 can parse the above-described non-sound object side information.
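The classification by track index could look roughly like the Python sketch below. The disclosure does not define the layout of the track index information; the `track_formats` dictionary (track index to format name), the format labels, and the function name `classify_tracks` are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from typing import Dict, List

ALLOWED_FORMATS = {"object", "ambisonic", "channel"}


def classify_tracks(
    tracks: List[List[float]], track_formats: Dict[int, str]
) -> Dict[str, List[List[float]]]:
    """Group PCM tracks by the format that the track index table assigns."""
    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    for index, samples in enumerate(tracks):
        fmt = track_formats.get(index)
        if fmt not in ALLOWED_FORMATS:
            raise ValueError(f"track {index} has no known format: {fmt!r}")
        grouped[fmt].append(samples)
    return dict(grouped)
```

Each group would then be passed to the renderer for its format, while the separately parsed additional information is matched to the same groups.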

FIG. 19 is a block diagram showing in detail the configuration of an audio signal processing apparatus 180 according to an embodiment of the present disclosure. According to one embodiment, there may be a second object signal received via a separate interface without a decoding process. When a plurality of users coexist in one virtual space, each of the users can input a voice signal through a voice input interface (for example, a microphone or a headset). Voice communication is one example of such a situation. In this case, the audio signal of each of the plurality of users may be a second input audio signal other than the predetermined first input audio signal. The audio signal processing apparatus can process the second object signal as a separate object signal through the object renderer 200. This is because, for a second object signal such as the voice signal of each of a plurality of users, processing the second object signal separately can reduce latency compared with reclassifying it through the decoder 100. Also, the object renderer 200 may render the second object signal based on second object side information.

FIG. 20 is a block diagram specifically illustrating an object renderer 200 according to an embodiment of the present disclosure. Referring to FIG. 20, the object renderer 200 may generate an object intermediate audio signal based on an object signal, object side information, non-sound object side information, and user interaction information. The object renderer 200 may include a sound source directivity processing unit 210, an object-to-object (O2O) interaction processing unit 220, and a sound localization processing unit 230.

The sound source directivity processing unit 210 may filter the object signal output from the object based on the direction information of the object. The sound source directivity processing unit 210 can model the directivity characteristic of the object signal. The position and direction of the sound source relative to the listener vary with the listener's position and head direction in the virtual space.
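To make the idea of directivity filtering concrete, the Python sketch below computes a single broadband gain from the angle between the direction the source faces and the direction toward the listener. The first-order pattern `alpha + (1 - alpha) * cos(angle)` (a cardioid for `alpha = 0.5`) is an assumption chosen for illustration; the disclosure only states that a directivity pattern may be part of the object side information, and a real pattern would typically be frequency dependent.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def directivity_gain(
    source_pos: Vec3, source_facing: Vec3, listener_pos: Vec3, alpha: float = 0.5
) -> float:
    """Gain of an assumed first-order pattern toward the listener.

    alpha = 1.0 is omnidirectional, alpha = 0.5 is a cardioid.
    """
    to_listener = tuple(l - s for s, l in zip(source_pos, listener_pos))
    dist = math.sqrt(sum(c * c for c in to_listener))
    face_len = math.sqrt(sum(c * c for c in source_facing))
    if dist == 0.0 or face_len == 0.0:
        return 1.0  # direction undefined: treat the source as omnidirectional
    cos_angle = sum(f * t for f, t in zip(source_facing, to_listener)) / (
        dist * face_len
    )
    return alpha + (1.0 - alpha) * cos_angle
```

Because the gain depends on the listener position, it has to be re-evaluated whenever the listener or the source moves or turns.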

The O2O interaction processing unit 220 can process the above-described occlusion effect. For example, the O2O interaction processing unit 220 may perform the occlusion-related operations of the audio signal processing apparatus described above. Specifically, the O2O interaction processing unit 220 may generate at least one of a transmitted audio signal, a diffracted audio signal, or a reflected audio signal based on additional information on at least one blocking object. The additional information on the blocking object may include at least one of object side information corresponding to a sound object or non-sound object side information.
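The following Python sketch illustrates one way the blocking determination and a transmission gain could be computed. It assumes a spherical blocking object and an exponential attenuation law (a per-metre transmittance raised to the power of the overlap length); neither the shape nor the law is given in the disclosure, and the names `overlap_length` and `transmission_gains` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def overlap_length(source: Vec3, listener: Vec3, center: Vec3, radius: float) -> float:
    """Length of the source-listener segment that lies inside a sphere."""
    d = tuple(l - s for s, l in zip(source, listener))
    seg_len = math.sqrt(sum(c * c for c in d))
    if seg_len == 0.0:
        return 0.0
    u = tuple(c / seg_len for c in d)  # unit direction of the direct path
    m = tuple(s - c for s, c in zip(source, center))
    b = sum(mi * ui for mi, ui in zip(m, u))
    c = sum(mi * mi for mi in m) - radius * radius
    disc = b * b - c
    if disc <= 0.0:
        return 0.0  # the line misses (or only grazes) the sphere
    root = math.sqrt(disc)
    t_in = max(-b - root, 0.0)  # clamp entry point to the segment
    t_out = min(-b + root, seg_len)  # clamp exit point to the segment
    return max(t_out - t_in, 0.0)


def transmission_gains(length_m: float, transmittance_per_m: Sequence[float]) -> List[float]:
    """Assumed law: one gain per frequency band, transmittance ** length."""
    return [t ** length_m for t in transmittance_per_m]
```

An overlap length greater than zero means the object blocks the direct acoustic path; the per-band gains could then scale the spectrum of the transmitted audio signal before binaural rendering.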

The sound localization processing unit 230 can localize the sound image of the object signal. The sound localization processing unit 230 can filter the object signal based on the layout through which the output audio signal is output. For example, when the output audio signal is output through a loudspeaker layout, the sound localization processing unit 230 can generate an object intermediate audio signal using 3D panning such as Vector-Base Amplitude Panning (VBAP). Or the sound localization processing unit 230 may binaurally render the object signal to generate an object intermediate audio signal. According to one embodiment, the object side information may include an azimuth and an elevation angle of the object corresponding to the object signal. In this case, the sound localization processing unit 230 can binaurally render the object signal using an HRTF determined based on the object side information.
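As a simplified illustration of amplitude panning, the Python sketch below computes VBAP gains for a single pair of loudspeakers in the horizontal plane. This is the two-dimensional special case, not the full 3D panning over loudspeaker triplets mentioned above, and the function name `vbap_pair_gains` and the azimuth convention (degrees, counter-clockwise from the front) are assumptions for this example.

```python
import math
from typing import Tuple


def vbap_pair_gains(
    source_az_deg: float, spk1_az_deg: float, spk2_az_deg: float
) -> Tuple[float, float]:
    """Solve g1 * l1 + g2 * l2 = p for unit vectors, then power-normalize."""

    def unit(az_deg: float) -> Tuple[float, float]:
        a = math.radians(az_deg)
        return (math.cos(a), math.sin(a))

    p, l1, l2 = unit(source_az_deg), unit(spk1_az_deg), unit(spk2_az_deg)
    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    # Cramer's rule for the 2x2 system whose columns are l1 and l2.
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    if g1 < -1e-9 or g2 < -1e-9:
        raise ValueError("source lies outside the arc spanned by the pair")
    g1, g2 = max(g1, 0.0), max(g2, 0.0)
    norm = math.hypot(g1, g2)
    return (g1 / norm, g2 / norm)
```

The object signal multiplied by each gain would feed the corresponding loudspeaker; a source midway between the two loudspeakers receives equal gains, and a source at a loudspeaker direction is reproduced by that loudspeaker alone.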

At this time, the HRTF can be determined based on the position and head direction of the listener. FIG. 21 is a diagram showing an object renderer 201, which further includes a coordinate transformation processing unit 240 according to an embodiment of the present disclosure. In FIG. 21, the coordinate transformation processing unit 240 can adjust the position information included in the object side information and non-sound object side information based on the user interaction information. In addition, the user interaction information may include information indicating the position and head direction of the listener. For example, the coordinate transformation processing unit 240 may convert coordinates indicating the position of the sound object and the position of the non-sound object based on the position and the head direction of the listener. Specifically, the coordinate transformation processing unit 240 can calculate the relative coordinates indicating the position of the object on the basis of the coordinate indicating the position of the listener in the virtual space.
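The relative-coordinate calculation can be sketched as follows in Python. For brevity the example handles only the yaw component of the head direction and assumes a right-handed frame with x forward, y to the left, and z up; pitch and roll, the axis convention, and the function name `to_listener_frame` are not taken from the disclosure.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def to_listener_frame(
    obj_pos: Vec3, listener_pos: Vec3, yaw_deg: float
) -> Tuple[float, float, float]:
    """Return (azimuth_deg, elevation_deg, distance) of an object as seen
    by a listener at listener_pos whose head is turned by yaw_deg."""
    dx, dy, dz = (o - l for o, l in zip(obj_pos, listener_pos))
    yaw = math.radians(yaw_deg)
    # Rotate the offset by -yaw so that the head's forward axis becomes +x.
    x = math.cos(yaw) * dx + math.sin(yaw) * dy
    y = -math.sin(yaw) * dx + math.cos(yaw) * dy
    dist = math.sqrt(x * x + y * y + dz * dz)
    if dist == 0.0:
        return (0.0, 0.0, 0.0)
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.asin(dz / dist))
    return (azimuth, elevation, dist)
```

The resulting azimuth and elevation could be used to look up an HRTF, and the same transform would apply to the positions of non-sound objects before occlusion is evaluated.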

FIG. 22 is a block diagram specifically illustrating an ambisonic renderer 300 according to one embodiment of the present disclosure. Referring to FIG. 22, the ambisonic renderer 300 may generate an ambisonic intermediate audio signal by rendering an ambisonic signal based on the ambisonic signal, ambisonic side information, object side information, non-sound object side information, and user interaction information. The ambisonic renderer 300 may include an ambisonic-to-ambisonic (A2A) interpolation processing unit 310, an ambisonic-to-object (A2O) interaction processing unit 320, and a rotation processing unit 330.

The A2A interpolation processing unit 310 may perform interpolation for reproducing an acoustic space based on a plurality of ambisonic spatial samples. Each of the ambisonic spatial samples can represent an ambisonic signal obtained at one of a plurality of locations. The A2A interpolation processing unit 310 may generate, based on the ambisonic spatial samples, an interpolated ambisonic signal corresponding to a point at which no ambisonic signal was acquired. Specifically, the A2A interpolation processing unit 310 may interpolate a plurality of ambisonic spatial samples to generate the interpolated ambisonic signal.
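One simple way to picture such interpolation is an inverse-distance weighted blend of the ambisonic signals captured at the surrounding positions, as in the Python sketch below. The weighting law is an assumption; the disclosure does not say how the spatial samples are combined, and the function name `interpolate_ambisonic` and the `signal[channel][sample]` layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Signal = List[List[float]]  # hypothetical layout: signal[channel][sample]


def interpolate_ambisonic(
    samples: Sequence[Tuple[Vec3, Signal]], listener_pos: Vec3, eps: float = 1e-9
) -> Signal:
    """Blend ambisonic spatial samples with weights proportional to 1/distance."""
    if not samples:
        raise ValueError("at least one ambisonic spatial sample is required")
    weights = []
    for pos, sig in samples:
        d = math.dist(pos, listener_pos)
        if d < eps:
            # The listener stands at a capture point: use that sample as is.
            return [list(ch) for ch in sig]
        weights.append(1.0 / d)
    total = sum(weights)
    n_ch = len(samples[0][1])
    n_smp = len(samples[0][1][0])
    return [
        [
            sum(w * sig[c][n] for w, (_, sig) in zip(weights, samples)) / total
            for n in range(n_smp)
        ]
        for c in range(n_ch)
    ]
```

All spatial samples are assumed here to share the same ambisonic order, channel ordering, and orientation, so that corresponding channels can be blended directly.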

The A2O interaction processing unit 320 can process the occlusion effect on the ambisonic signal. For example, the A2O interaction processing unit 320 may filter the ambisonic signal based on the additional information on at least one blocking object. For example, the A2O interaction processing unit 320 can determine a transmission attenuation gain for each direction component of the ambisonic signal based on the additional information on the blocking object. The direction components of the ambisonic signal can be specified on the basis of the ambisonic order, which indicates the highest order among the components of the ambisonic signal. In addition, the A2O interaction processing unit 320 can determine the transmission attenuation gain for each frequency component of the ambisonic signal based on the additional information on the blocking object. The rotation processing unit 330 may rotate the ambisonic signal based on the user interaction information to generate a binaurally rendered ambisonic intermediate audio signal.
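The rotation step can be illustrated for a first-order ambisonic signal and a head rotation about the vertical axis only. The Python sketch below assumes four components W, X, Y, Z with X pointing forward, Y to the left, and Z up; higher orders, pitch and roll, and the subsequent binaural decoding are outside the scope of the example, and the function name `rotate_foa_yaw` is hypothetical.

```python
import math
from typing import List, Tuple

Channel = List[float]


def rotate_foa_yaw(
    w: Channel, x: Channel, y: Channel, z: Channel, head_yaw_deg: float
) -> Tuple[Channel, Channel, Channel, Channel]:
    """Counter-rotate a first-order sound field by the listener's head yaw."""
    t = math.radians(head_yaw_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    # A source at azimuth phi must appear at phi - yaw in the head frame.
    x_rot = [cos_t * xi + sin_t * yi for xi, yi in zip(x, y)]
    y_rot = [-sin_t * xi + cos_t * yi for xi, yi in zip(x, y)]
    # W (omnidirectional) and Z (vertical) are unaffected by a yaw rotation.
    return (list(w), x_rot, y_rot, list(z))
```

For instance, a sound encoded on the listener's left (energy in Y) moves to the front (energy in X) once the listener turns the head 90 degrees to the left.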

FIG. 23 is a block diagram specifically illustrating a channel renderer 400 according to one embodiment of the present disclosure. Referring to FIG. 23, the channel renderer 400 may generate a channel intermediate audio signal by rendering a channel signal based on the channel signal, channel additional information, object additional information, non-sound object additional information, and user interaction information. The channel renderer 400 may include a channel-to-channel (C2C) interpolation processing unit 410, a channel-to-object (C2O) interaction processing unit 420, and a rotation processing unit 430.

The C2C interpolation processing unit 410 may perform interpolation for reproducing acoustic space based on a plurality of channel space samples. Each of the channel space samples may be a channel signal obtained at a plurality of locations. Alternatively, the channel space sample may be a pre-rendered channel signal based on a particular location. The C2C interpolation processing unit 410 may generate an interpolation channel signal corresponding to a point where the channel signal is not acquired based on the channel space sample. Specifically, the C2C interpolation processing unit 410 may interpolate a plurality of channel space samples to generate an interpolation channel signal.

The C2O interaction processing unit 420 can process the occlusion effect on the channel signal. For example, the C2O interaction processing unit 420 may filter the channel signal based on the additional information on at least one blocking object. For example, the C2O interaction processing unit 420 may determine a panning gain for each channel of the channel signal based on the additional information on the blocking object. In addition, the C2O interaction processing unit 420 may filter the channel signal based on the channel-specific panning gain. The rotation processing unit 430 may rotate the channel signal based on the user interaction information to generate a binaurally rendered channel intermediate audio signal.

Some embodiments may also be implemented in the form of a recording medium including instructions executable by a computer, such as program modules executed by a computer. Computer-readable media can be any available media that can be accessed by a computer, and can include both volatile and nonvolatile media and both removable and non-removable media. The computer-readable medium may also include computer storage media. Computer storage media may include both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.

While the present disclosure has been described with reference to specific embodiments, those skilled in the art will appreciate that various modifications, additions, and substitutions are possible without departing from the scope and spirit of the present disclosure. That is, while the present disclosure has been described with respect to embodiments of binaural rendering of audio signals, the present disclosure is equally applicable and extendable to various multimedia signals, including video signals as well as audio signals. Therefore, anything that those skilled in the art to which the present disclosure belongs can readily infer from the description and the embodiments is to be construed as falling within the scope of the present disclosure.

Claims

1. An audio signal processing apparatus comprising:

a processor configured to output an output audio signal generated based on an input audio signal,

wherein the processor is configured to:

obtain information about an input audio signal and a virtual space in which the input audio signal is simulated,

determine, based on the position of each of at least one object included in the virtual space and the position of a sound source corresponding to the input audio signal, each with respect to a listener in the virtual space, whether a blocking object that blocks a direct acoustic path between the sound source and the listener exists among the at least one object, and

binaurally render the input audio signal based on the determination result to generate the output audio signal.
2. The apparatus of claim 1,

wherein the output audio signal includes a transmitted audio signal simulating sound that corresponds to the input audio signal, passes through the blocking object, and is delivered to the listener, and

wherein the processor is configured to,

when the blocking object exists,

binaurally render the input audio signal, based on an acoustic transmittance of the blocking object and the length of a section in which the direct acoustic path between the sound source and the listener overlaps the blocking object, to generate the transmitted audio signal.
3. The apparatus of claim 2,

wherein the acoustic transmittance of the blocking object has a different value for each frequency bin.
4. The apparatus of claim 2,

wherein the output audio signal comprises a diffracted audio signal simulating sound that corresponds to the input audio signal and is diffracted by the blocking object to reach the listener, and

wherein the processor is configured to:

determine, based on the shape of the blocking object, at least one diffraction point at which the sound corresponding to the input audio signal is diffracted at the surface of the blocking object, and

binaurally render the input audio signal based on the position of the at least one diffraction point to generate the diffracted audio signal.
5. The apparatus of claim 4,

wherein the processor is configured to:

acquire a first head-related transfer function (HRTF) corresponding to the at least one diffraction point with respect to the head direction of the listener, and

binaurally render the input audio signal using the first HRTF to generate the diffracted audio signal.
6. The apparatus of claim 5,

wherein the processor is configured to

determine, as the at least one diffraction point, a point on the surface of the blocking object at which the sum of the length of a first path from the point to the listener and the length of a second path from the point to the sound source is smallest,

wherein the first path and the second path are shortest paths that do not traverse the blocking object.
7. The apparatus of claim 6,

wherein the processor is configured to

binaurally render the input audio signal, based on the first HRTF and a diffraction distance representing the sum of the length of the first path and the length of the second path through the at least one diffraction point, to generate the diffracted audio signal.
8. The apparatus of claim 7,

wherein the processor is configured to:

determine, based on the diffraction distance, an attenuation gain for adjusting the magnitude of the diffracted audio signal, and

binaurally render the input audio signal based on the first HRTF and the attenuation gain to generate the diffracted audio signal,

wherein the attenuation gain has a different value for each frequency bin of the audio signal.
9. The apparatus of claim 6,

wherein the processor is configured to

mix the diffracted audio signal and the transmitted audio signal to generate the output audio signal.
10. The apparatus of claim 2,

wherein the output audio signal comprises a two-channel output audio signal whose channels correspond to the respective ears of the listener, and

wherein the processor is configured to:

determine, for each of the left and right ears of the listener, whether the blocking object exists, based on the position of each of the ears of the listener, and

generate the two-channel output audio signal channel by channel based on the determination result.
11. The apparatus of claim 10,

wherein the blocking object includes a first blocking object that blocks only one of the left and right ears of the listener,

wherein the two-channel output audio signal comprises a reflected audio signal simulating sound that corresponds to the input audio signal and is reflected by the blocking object and delivered to the listener, and

wherein the processor is configured to:

determine, based on the position of the other ear of the listener and the shape of the first blocking object, a reflection point at which the sound corresponding to the input audio signal is reflected at the surface of the first blocking object, and

binaurally render the input audio signal based on the position of the reflection point to generate a first reflected audio signal corresponding to the first blocking object.
12. The apparatus of claim 11,

wherein the processor is configured to:

acquire a second HRTF corresponding to the reflection point with respect to the head direction of the listener, and

binaurally render the input audio signal using the second HRTF to generate the first reflected audio signal.
13. The apparatus of claim 11,

wherein the processor is configured to:

determine, based on the position of the first blocking object, which channel of the two-channel output audio signal includes the first reflected audio signal, and

generate the two-channel output audio signal based on the determination.
14. The apparatus of claim 13,

wherein, of the two-channel output audio signal, the channel audio signal corresponding to the other ear includes the first reflected audio signal,

and the channel audio signal corresponding to the one ear does not include the first reflected audio signal.
15. The apparatus of claim 10,

wherein the processor is configured to

determine the position of each of the ears of the listener based on the head size of the listener.
16. The apparatus of claim 10,

wherein the processor is configured to:

acquire an ipsilateral HRTF and a contralateral HRTF corresponding to the sound source, based on a reference distance at which an HRTF set was measured, the position of each ear of the listener, and the position of the sound source, the HRTF set including a plurality of HRTFs according to azimuth and elevation angles with respect to the position of the listener, and

binaurally render the input audio signal based on the ipsilateral HRTF and the contralateral HRTF,

wherein the ipsilateral HRTF and the contralateral HRTF are HRTFs corresponding to different positions among the plurality of HRTFs.
17. The apparatus of claim 10,

wherein the virtual space includes a plurality of divided spaces having reverberation filters different from each other, and

wherein the processor is configured to,

when the ears of the listener are located in different divided spaces, filter the input audio signal based on a different reverberation filter for each of the left and right ears of the listener to generate a reverberant audio signal corresponding to each ear.
18. The apparatus of claim 1,

wherein the blocking object is a non-sound object from which no sound is output in the virtual space.
19. The apparatus of claim 18,

wherein the processor is configured to

receive, together with the input audio signal, metadata representing information about a non-sound object included in the virtual space.
20. A method of operating an audio signal processing apparatus for rendering an input audio signal, the method comprising:

obtaining information about an input audio signal and a virtual space in which the input audio signal is simulated;

determining, based on the position of each of at least one object included in the virtual space and the position of a sound source corresponding to the input audio signal, each with respect to a listener in the virtual space, whether a blocking object that blocks between the sound source and the listener exists among the at least one object;

binaurally rendering the input audio signal based on the determination result to generate an output audio signal; and

outputting the output audio signal.