WO2009042367A1 - Method and system for matching media content

Info

Publication number: WO2009042367A1
Application number: PCT/US2008/075331
Authority: WO
Inventors: Raghavan Subramaniyan; Nanda Kishore A.S.; Vikrant Oak Vasant; Shailesh Ramamurthy
Original assignee: Motorola, Inc.
Priority date: 2007-09-26
Filing date: 2008-09-05
Publication date: 2009-04-02

Abstract

A method for matching test media content with a plurality of reference media content is provided. The method includes passing (404) the test media content and the plurality of reference media content through a set of filters to obtain a set of filtered components, dividing (406) each filtered component into a plurality of blocks of data points, computing (408) a plurality of representative signal-levels corresponding to the plurality of blocks of data points, computing (410) the correlation values of the plurality of representative signal-levels corresponding to the set of filtered components of the test media content and a plurality of offset versions of at least one reference media content, and determining (412) whether the test media content matches at least one reference media content, based on the computed correlation values.

Description

METHOD AND SYSTEM FOR MATCHING MEDIA CONTENT

FIELD OF THE INVENTION

[001] The present invention relates in general to the field of media content, and more specifically, to a method and system for matching media content.

BACKGROUND OF THE INVENTION

[002] Satellite television channels have increased exponentially with the advent of communication technology. The content being shown on these channels has also increased proportionately. In most cases, a person watching a particular channel does not know much about the content being shown on the channel. For example, a person may not know the name of the movie he or she is watching or the name of the singer singing a particular song. In this event, the person may be interested in knowing the details of the content he or she is viewing. For example, a person watching a music channel may be interested in knowing the name of the rock band playing a particular song. In this case, the particular detail the person is interested in is the name of the rock band. Another example could be a person watching an advertisement of a product and wanting to know the details about the product, such as its price, its unique features, the names of the stores that sell the product, etc.

[003] To obtain details of the content, the person watching the channel can record a small segment of the content being played on the channel and transmit the recorded segment to a server, which has access to all the channels being shown on television at that particular time. For example, if the person is interested in knowing the name of the rock band playing a particular song 'X' on a television channel 'Y', he can record a small segment of the song X and can transmit it to a server, which has access to all the satellite channels that can be viewed on the television. After receiving the segment of the song, the server can use a matching algorithm to match that segment of the song to all the songs being played on all the channels. When a match is identified for the segment, the server can send the required details of the song to the person. For example, if the server is able to identify the song corresponding to the segment the person has sent, it can search its database for the name of the rock band playing the song. After the name of the rock band is determined, the server can send a message to the person, "The song X, being played on channel Y, is played by the rock band 'ABC'". In another case, the server can even help the person buy the particular song X. For example, the server can send a message to the person, saying "If you are interested in buying the song X, send us a confirmation message". In this case, the server can charge the person and send the song X.

[004] Another example of the use of a matching algorithm in a server can be synchronization of media streams. For example, a person may be watching a streamed broadcast program on his computer. At the same time, he can be a part of a chat session through the computer or through a phone with his friends who are also watching the same program in their homes. Due to variable network delays, the audio may be playing at different times in the user's home and his friend's home. This can lead to inconvenience when the friends need to discuss the program, since they are not synchronized. To synchronize the programs, a segment of the audio from each of the friends can be sent to the server, where the matching algorithm determines which program all the friends are watching as well as the match location, and hence the relative delay can be determined. This delay value can be used by the application on the person's computer to synchronize with the rest of the group.

[005] Conventionally, the server using the matching algorithm is able to serve only a few requests to match content at a particular time. For example, a server may not be able to serve more than eight requests at a time. This may be due to the complex nature of the matching algorithm or the limited infrastructure available that corresponds to the server. Today, with fast-growing satellite channels and public awareness, a server should be able to serve more requests than what it typically does. However, to upgrade or add new infrastructure to a server entails significant costs, and is therefore impractical. Therefore, the only practical way to make a server more efficient is to reduce the complexity of the matching algorithm.

[006] In light of the foregoing, there is a need for a method and system for matching content, which uses a matching algorithm that is not only accurate but is also less complex than conventional matching algorithms.

BRIEF DESCRIPTION OF THE FIGURES

[007] The accompanying figures, where like reference numerals refer to identical or functionally similar elements throughout the separate views, and which, together with the detailed description below, are incorporated in and form part of the specification, serve to further illustrate various embodiments and explain various principles and advantages, all in accordance with the present invention.

[008] FIG. 1 illustrates an exemplary communication device and an exemplary server for receiving a media content sent by the exemplary communication device, in accordance with an embodiment of the present invention;

[009] FIG. 2 illustrates an exemplary media content, in accordance with an embodiment of the present invention;

[010] FIG. 3 illustrates a block diagram of an exemplary filter and an exemplary representative signal-level calculator, in accordance with an embodiment of the present invention;

[011] FIG. 4 is a flow diagram illustrating a method for matching test media content with a plurality of reference media content, in accordance with a first embodiment of the present invention;

[012] FIG. 5 is a flow diagram illustrating a method for matching test media content with a plurality of reference media content, in accordance with a second embodiment of the present invention;

[013] FIG. 6 is a flow diagram illustrating a method for matching test media content with a plurality of reference media content, in accordance with a third embodiment of the present invention;

[014] FIGs. 7 and 8 are a flow diagram illustrating a method for matching test media content with a plurality of reference media content, in accordance with a fourth embodiment of the present invention;

[015] FIGs. 9 and 10 are a flow diagram illustrating a method for matching test media content with a plurality of reference media content, in accordance with a fifth embodiment of the present invention;

[016] FIGs. 11 and 12 are a flow diagram illustrating a method for matching test media content with a plurality of reference media content, in accordance with a sixth embodiment of the present invention;

[017] FIG. 13 is a flow diagram illustrating a method for choosing a subset of filters from a set of filters, in accordance with an embodiment of the present invention;

[018] FIG. 14 is a flow diagram illustrating a method for matching test media content with a plurality of reference media content, in accordance with a seventh embodiment of the present invention;

[019] FIG. 15 illustrates the modification of a block size of varying sampling frequencies, in accordance with an embodiment of the present invention;

[020] FIG. 16 illustrates the overlapping and spacing-out of blocks of varying sampling frequencies, in accordance with an embodiment of the present invention;

[021] FIG. 17 illustrates a block diagram of an exemplary server, in accordance with an embodiment of the present invention;

[022] FIG. 18 illustrates a block diagram of an exemplary server, in accordance with another embodiment of the present invention;

[023] FIG. 19 illustrates the frequency response of a plurality of filters, in accordance with an embodiment of the present invention;

[024] FIG. 20 illustrates a percentage of correct matches of varying block sizes, in accordance with an embodiment of the present invention; and

[025] FIG. 21 illustrates the percentage of wrong matches of varying block sizes, in accordance with an embodiment of the present invention.

[026] Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated, relative to other elements, to help in improving an understanding of the embodiments of the present invention.

DETAILED DESCRIPTION

[027] Before describing in detail the particular method and system for matching media content, in accordance with various embodiments of the present invention, it should be observed that the present invention resides primarily in combinations of method steps related to a method for matching media content. Accordingly, the system components and method steps have been represented, where appropriate, by conventional symbols in the drawings, showing only those specific details that are pertinent for an understanding of the present invention, so as not to obscure the disclosure with details that will be readily apparent to those with ordinary skill in the art, having the benefit of the description herein.

[028] In this document, the terms 'comprises,' 'comprising' or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements does not include only those elements, but may include other elements that are not expressly listed or inherent in such a process, method, article or apparatus. An element preceded by 'comprises ... a' does not, without more constraints, preclude the existence of additional identical elements in the process, method, article or apparatus that comprises the element. The term 'another', as used in this document, is defined as at least a second or more. The terms 'includes' and/or 'having', as used herein, are defined as comprising.

[029] For a first embodiment, a method for matching a test media content with a plurality of reference media content is provided. The media content is defined by a plurality of data points, and each data point of the plurality of data points represents a signal-level. The method includes passing the test media content and the plurality of reference media content through a set of filters to obtain a set of filtered components of the test media content and a set of filtered components of each reference media content of the plurality of reference media content. Further, the method includes dividing each filtered component of the set of filtered components of the test media content and each filtered component of the set of filtered components of each reference media content to obtain a plurality of blocks of data points for each filtered component of the set of filtered components of the test media content and each filtered component of the set of filtered components of each reference media content. Furthermore, the method includes computing a plurality of representative signal-levels corresponding to the plurality of blocks of data points for each filtered component of the set of filtered components of the test media content and the plurality of blocks of data points for each filtered component of the set of filtered components of each reference media content. The method also includes computing the correlation values of the plurality of representative signal-levels corresponding to the plurality of blocks of data points of each filtered component of the set of filtered components of the test media content and the plurality of representative signal-levels corresponding to a plurality of blocks of data points of a plurality of offset versions of at least one filtered component of at least one reference media content. 
The method also includes determining whether the test media content matches at least one reference media content of the plurality of reference media content, based on the computed correlation values.
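The correlation and matching steps of the first embodiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the representative signal-levels are taken as already computed (one list of block levels per filter), the correlation is a Pearson correlation averaged across filters, offsets are whole blocks, and the 0.9 threshold is arbitrary. The names `pearson`, `best_offset` and `find_matches` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


def best_offset(test_levels, ref_levels):
    """Slide the test levels over every block offset of one reference.

    test_levels and ref_levels hold one list of block levels per filter.
    Returns (offset, score), where score is the correlation averaged over
    the filters (an assumed way of combining the per-filter values).
    """
    n_test = len(test_levels[0])
    n_ref = len(ref_levels[0])
    best_off, best_score = None, float("-inf")
    for offset in range(n_ref - n_test + 1):
        per_filter = [
            pearson(t, r[offset:offset + n_test])
            for t, r in zip(test_levels, ref_levels)
        ]
        score = sum(per_filter) / len(per_filter)
        if score > best_score:
            best_off, best_score = offset, score
    return best_off, best_score


def find_matches(test_levels, references, threshold=0.9):
    """Return {reference name: (offset, score)} for references above the assumed threshold."""
    matches = {}
    for name, ref_levels in references.items():
        offset, score = best_offset(test_levels, ref_levels)
        if offset is not None and score >= threshold:
            matches[name] = (offset, score)
    return matches


# Two references, two filters each; the test clip equals blocks 2..5 of reference "a".
references = {
    "a": [[1, 5, 2, 8, 3, 9, 4], [2, 2, 7, 1, 6, 3, 5]],
    "b": [[4, 4, 1, 9, 2, 2, 6], [1, 8, 3, 3, 9, 2, 4]],
}
test_levels = [[2, 8, 3, 9], [7, 1, 6, 3]]
result = find_matches(test_levels, references)
```

In this toy data only reference "a" passes the threshold, at a block offset of 2; a real system would choose the correlation measure, the way per-filter values are combined, and the threshold to suit its own data.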

[030] For a second embodiment, a method for matching a test media content with a plurality of reference media content is provided. Media content is defined by a plurality of data points, with each data point of the plurality of data points representing a signal-level. The method includes passing the test media content and the plurality of reference media content through a set of filters to obtain a set of filtered components of the test media content and a set of filtered components of each reference media content of the plurality of reference media content. Further, the method includes computing the correlation values of the set of filtered components of the test media content and the set of filtered components of each reference media content of the plurality of reference media content, based on the block size value chosen in at least one previous stage. Furthermore, the method includes selecting a subset of the plurality of reference media content, based on the computed correlation values. The method also includes choosing a block size value from a predetermined set of block size values. The chosen block size value is less than the block size value chosen in at least one previous stage. Moreover, the method includes dividing each filtered component of the set of filtered components of the test media content and the set of filtered components of each reference media content of the selected subset of the plurality of reference media content to obtain a plurality of blocks of data points, based on the chosen block size value. Further, the method includes computing a plurality of representative signal-levels corresponding to the plurality of blocks of data points of each filtered component of the set of filtered components of the test media content and the plurality of blocks of data points of each filtered component of the set of filtered components of each reference media content of the selected subset of the plurality of reference media content. 
The method also includes calculating a matching metric between the test media content and each reference media content of the selected subset of the plurality of reference media content, based on the computed plurality of representative signal-levels. Further, the method includes determining whether the test media content matches at least one reference media content of the selected subset of the plurality of reference media content, based on the calculated matching metric. Furthermore, the method includes iteratively computing, selecting, choosing, dividing, computing, calculating and determining until a predefined criterion is fulfilled. The predefined criterion is that at least one of the chosen block size values is the lowest block size value of the predetermined set of block size values and a match is found between the test media content and the plurality of reference media content.
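The staged, coarse-to-fine search of the second embodiment might look like the sketch below. Several simplifications are assumed and are not taken from the source: each media content is represented by a single filtered component, the representative signal-level of a block is its sum of squared samples, the matching metric is the best Pearson correlation over block-aligned offsets, the `keep` and `threshold` parameters are arbitrary, and the stopping criterion is read as "the smallest block size has been reached or a match has been found". All function names are hypothetical.

```python
import math
import random


def block_levels(samples, block_len):
    """Assumed representative level: sum of squares of each whole block."""
    count = len(samples) // block_len
    return [
        sum(x * x for x in samples[i * block_len:(i + 1) * block_len])
        for i in range(count)
    ]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    return 0.0 if denom == 0.0 else sum(x * y for x, y in zip(dev_a, dev_b)) / denom


def best_score(test_levels, ref_levels):
    """Highest correlation of the test levels over all block-aligned offsets."""
    n = len(test_levels)
    if n < 2 or len(ref_levels) < n:
        return -1.0
    return max(
        pearson(test_levels, ref_levels[o:o + n])
        for o in range(len(ref_levels) - n + 1)
    )


def coarse_to_fine(test, references, block_sizes, keep=2, threshold=0.95):
    """Try block sizes from largest to smallest, pruning references at each stage.

    Returns (name, block_len) for the first reference whose score reaches the
    threshold, or (None, smallest block size) if no stage produces a match.
    """
    candidates = dict(references)
    sizes = sorted(block_sizes, reverse=True)
    for block_len in sizes:
        test_levels = block_levels(test, block_len)
        scores = {
            name: best_score(test_levels, block_levels(ref, block_len))
            for name, ref in candidates.items()
        }
        ranked = sorted(scores, key=scores.get, reverse=True)
        if scores[ranked[0]] >= threshold:
            return ranked[0], block_len
        # Keep only the most promising references for the finer, costlier stage.
        candidates = {name: candidates[name] for name in ranked[:keep]}
    return None, sizes[-1]


# Synthetic data: three noise-like references; the test clip is cut from "b"
# at a sample index that lies on a boundary of the largest block size.
rng = random.Random(0)
references = {
    name: [rng.random() - 0.5 for _ in range(2048)] for name in ("a", "b", "c")
}
test_clip = references["b"][512:1024]
match, matched_block_len = coarse_to_fine(test_clip, references, block_sizes=[8, 16, 32])
```

Because the clip here is aligned to the largest block size, the match is already found at the coarsest stage. A clip that is not aligned to a coarse block boundary would rely on the pruning step and the finer stages, or on sample-level offset versions of the reference.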

[031] For a third embodiment, a method for matching a test media content with a plurality of reference media content is provided. The media content is defined by a plurality of data points, with each data point of the plurality of data points representing a signal-level. The method includes passing the test media content and the plurality of reference media content through a set of filters to obtain a set of filtered components of the test media content and a set of filtered components of each reference media content of the plurality of reference media content. Further, the method includes choosing a subset of filters from the set of filters. The chosen subset of filters is different from a subset chosen at any previous stage. Furthermore, the method includes computing the correlation values of a subset of filtered components of the test media content and a subset of filtered components of each reference media content of the plurality of reference media content. A subset of filtered components corresponds to the chosen subset of filters. The method also includes selecting a subset of the plurality of reference media content, based on the computed correlation values. Moreover, the method includes identifying a matching metric between the test media content and each reference media content of the selected subset of the plurality of reference media content, based on the chosen subset of filters. Further, the method includes determining whether the test media content matches at least one reference media content of the selected subset of the plurality of reference media content, based on the identified matching metric. The method also includes iteratively choosing, computing, selecting, identifying and determining until a predefined criterion is fulfilled. 
The predefined criterion is that at least one of the total numbers of distinct filters in the chosen subset of filters is greater than a predetermined number, and a match is found between the test media content and the plurality of reference media content.

[032] For a fourth embodiment, a method for matching a test media content with a plurality of reference media content is provided. The media content is defined by a plurality of data points, with each data point of the plurality of data points representing a signal-level. The method includes dividing the test media content and the plurality of reference media content into a plurality of blocks of data points corresponding to a block-size value. Further, the method includes comparing a sampling frequency of the test media content with a representative sampling frequency of a plurality of sampling frequencies corresponding to the plurality of reference media content. Furthermore, the method includes modifying the block-size value of at least one of the test media content and the plurality of reference media content when the sampling frequency of the test media content is different from the representative sampling frequency. The block-size value is modified, based on the difference between the sampling frequency of the test media content and the representative sampling frequency. The method also includes determining whether the test media content matches at least one reference media content of the plurality of reference media content, based on the modified block-size value.

[033] For a fifth embodiment, a server is provided. The server includes a set of filters, which are configured to generate a set of filtered components of specific test media content and a set of filtered components of each of a plurality of reference media content. The server also includes a processor. The processor includes a calculator that is configured to compute a plurality of representative signal-levels corresponding to a plurality of blocks of data points for each filtered component of a set of filtered components of a media content. 
The calculator is also configured to compute the correlation values of a plurality of representative signal-levels corresponding to the filtered components of the test media content and each reference media content. Moreover, the processor includes a matching engine that is configured to determine whether the test media content matches at least one reference media content of the plurality of reference media content.

[034] For a sixth embodiment, a system is provided. The system includes a comparator that is configured to compare the sampling frequency of specific test media content with the representative sampling frequency of a plurality of sampling frequencies corresponding to a plurality of reference media content. Further, the system includes a processor that is configured to modify the block-size value of at least one of the test media content and the plurality of reference media content when the sampling frequency of the test media content is different from the representative sampling frequency. The block-size value is modified, based on the difference between the sampling frequency of the test media content and the representative sampling frequency. Furthermore, the system includes a matching engine that is configured to determine whether the test media content matches at least one reference media content of the plurality of reference media content, based on the modified block-size value.
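One plausible way to modify the block-size value when the sampling frequencies differ is sketched below. The rule used here is an assumption, since the passage above only states that the modification is based on the difference between the two frequencies: the block size, measured in samples, is scaled by the ratio of the sampling frequencies so that a block spans the same time duration in both contents. The function names and the example frequencies of 8040 Hz and 16000 Hz are hypothetical.

```python
def block_len_samples(block_ms, fs_hz):
    """Number of samples in a block of block_ms milliseconds at fs_hz."""
    return round(block_ms * fs_hz / 1000)


def adjusted_block_len(ref_block_len, ref_fs_hz, test_fs_hz):
    """Assumed rule: scale the block size (in samples) by the frequency ratio.

    The block size is left unchanged when the sampling frequencies agree.
    """
    if test_fs_hz == ref_fs_hz:
        return ref_block_len
    return round(ref_block_len * test_fs_hz / ref_fs_hz)


# Reference content sampled at 8000 Hz with 480 ms blocks.
ref_len = block_len_samples(480, 8000)
# A test clip whose recorder clock runs slightly fast (assumed 8040 Hz).
test_len = adjusted_block_len(ref_len, 8000, 8040)
# A test clip sampled at twice the reference frequency.
test_len_16k = adjusted_block_len(ref_len, 8000, 16000)
```

With these numbers a 480 ms block is 3840 samples at 8000 Hz, 3859 samples at 8040 Hz after rounding, and 7680 samples at 16000 Hz, so the representative signal-levels of corresponding blocks describe the same stretch of time.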

[035] FIG. 1 illustrates an exemplary communication device 102 and an exemplary server 104 for receiving a media content sent by the communication device 102, in accordance with an embodiment of the present invention. Examples of the communication device 102 include, but are not limited to, a cordless phone, a mobile phone, a Personal Digital Assistant (PDA) and a personal computer. Typically, the communication device 102 can provide a plurality of operational features to the user of the device. These operational features can be, for example, recording an audio or video clip, storing the recorded clip, viewing the stored clip, and sending or receiving the recorded clip. The communication device 102 can send the recorded clip to a plurality of communication devices or to a server by using the conventional Multimedia Messaging Service (MMS) or by any other means of transmission.

[036] For one embodiment, the communication device 102 can include a recorder (not shown) that is configured to record a media content 108 being played on a display unit 106. The media content 108 can be an audio clip being played on the display unit 106. The display unit 106 can be, for example, a television (TV). Typically, the display unit 106 can display a plurality of satellite channels, with each satellite channel playing a different audio clip. However, at a particular time, the display unit 106 can display only one satellite channel. Therefore, only a single audio clip can be played on the display unit 106 at a time. Consequently, the communication device 102 can record only a single audio clip that is playing on the display unit 106 at that particular time.

[037] In an exemplary scenario, a user of the communication device 102, viewing the media content 108 on the display unit 106, may want to get additional information about the media content 108. For example, the user may want to know the name of the rock band playing the media content 108 or he or she may be interested in knowing the name of the movie from which the media content 108 is derived. In this case, the user can record a small segment of the media content 108 by using the communication device 102. The length of the small recorded segment of the media content 108 may vary from a few seconds to a few minutes. For example, the user can record a 15-second clip of the media content 108 that is playing on the display unit 106. To get additional information about the media content 108, the user can send the recorded clip to the server 104, along with the additional information he or she wants by using the communication device 102. For example, the user can send the recorded clip to the server 104, along with the question, "What is the name of the rock band?". Once the server 104 receives the media content 108, it can send the information about the media content 108 to the user.

[038] For one embodiment, the server 104 can have access to all the satellite channels that can be displayed on the display unit 106. For example, the display unit 106 can be a television and the server 104 can have access to all the audio clips playing on all the satellite channels on television at that time. There can be, for example, more than 100 satellite channels that can be displayed on television at a particular time. The server 104 can have access to all the audio clips playing on all of these 100 satellite channels.

[039] On receiving the recorded clip of the media content 108, the server 104 matches the recorded clip of the media content 108 with all the audio clips playing on all the satellite channels at that time. When a match is found, the server 104 locates the additional information about the media content 108 and sends the information back to the communication device 102. Continuing with the earlier example, the server 104 can determine the name of the rock band playing the media content 108 to be 'XYZ' and send a message to the communication device 102, "The name of the rock band is XYZ."

[040] For one embodiment, the recorded clip of the media content 108 may take a few minutes to reach the server 104 after it is sent by the communication device 102. This time delay of a few minutes may be due to communication network congestion or network delay. To account for this time delay, the server 104 should have access to all the audio clips that were being played on all the satellite channels during the last few minutes. For example, the server 104 can have access to all the audio clips that were being played on all the satellite channels during the last two minutes. In this case, the server 104 matches the recorded clip of the media content 108 with all the audio clips displayed during the last two minutes. After a match for the media content 108 is found, the server 104 can send the additional information, corresponding to the media content 108, to the communication device 102.

[041] FIG. 2 illustrates the exemplary media content 108, in accordance with an embodiment of the present invention. The media content 108 can be, for example, an audio or a video clip. The length of the media content 108 can vary from a few seconds to a few minutes. As shown in FIG. 2, the media content 108 can have a length 'L', where L denotes the time duration of the media content 108. The length L is also known as the time duration of the 'window of the media content 108'.

[042] The media content 108 can be defined by a plurality of data points, where each data point of the plurality of data points represents a signal-level. For example, a data point 202 can represent a signal-level 'X₁', as shown on the x-axis in FIG. 2. A collection of a plurality of single data points, such as the data point 202, forms the media content 108. The intensity of the signal of the media content 108 at a particular time is denoted by the signal-level of the data point at that time. For example, the intensity of the signal of the media content 108 at a time 204 is denoted by X₁, i.e., the signal-level corresponding to the data point 202.

[043] For one embodiment, the media content 108 can be divided into a plurality of blocks of data points. As shown in FIG. 2, each block of data points can be of a length 206. For one embodiment, the length 206 is known as the 'block-size value' of the media content 108. The length 206 or the block-size value of the media content 108 can vary from a few milliseconds to a few seconds. For example, the length 206 can be 480 milliseconds. The length 206 never exceeds the length L of the media content 108. For example, the length L of the media content 108 can be 120 seconds and the length 206 can be 480 milliseconds. In this case, the length 206 can never exceed 120 seconds. The number of blocks of the media content 108, obtained by dividing the media content 108 into equal blocks of length 206, is equal to the value obtained by dividing the length L of the media content 108 by the length 206. Continuing with the example mentioned above, the number of blocks of the media content 108 is equal to 250, i.e., 120 seconds / 480 milliseconds. As the length 206 increases, the number of blocks of data points decreases. Similarly, the number of blocks of data points increases as the length 206 decreases.
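The block-count arithmetic in the preceding paragraph can be checked with a few lines of code. The function name `num_blocks`, the use of integer milliseconds, and the choice to drop a trailing partial block are illustrative assumptions.

```python
def num_blocks(window_ms, block_ms):
    """Number of whole blocks of block_ms that fit in a window of window_ms."""
    if block_ms <= 0 or block_ms > window_ms:
        raise ValueError("block length must be positive and no longer than the window")
    return window_ms // block_ms


# 120-second window with 480 ms blocks, as in the example above.
blocks_480 = num_blocks(120_000, 480)
# A longer block length yields fewer blocks over the same window.
blocks_1440 = num_blocks(120_000, 1440)
```

Here `blocks_480` is 250, matching the figure in the text, while 1440 ms blocks give 83 whole blocks with a partial block left over.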

[044] FIG. 3 illustrates a block diagram of an exemplary filter 302 and an exemplary representative signal-level calculator 304, in accordance with an embodiment of the present invention. The filter 302 and the representative signal-level calculator 304 are used to calculate a representative signal-level of a filtered component of the media content 108. The representative signal-level of a filtered component of a media content is also known as the 'block energy' of the filtered component of the media content. To describe FIG. 3, reference will be made to FIGs. 1 and 2, although it should be understood that the filter 302 and the representative signal-level calculator 304 can be used in any other suitable environment as well.

[045] The filter 302 is selected, based on the expected sampling frequency of the media content 108. For one embodiment, the media content 108 can be an audio clip with a sampling frequency of 8000 Hertz (Hz). In this case, the filter 302 can be designed as a broad filter with zeroes at 0 Hz and 4000 Hz. In other words, in this case, the filter 302 passes most of the band between these two frequencies and can act approximately as an all-pass filter. For another embodiment, the filter 302 can have a frequency coverage range of 400 Hz to 3400 Hz. This frequency coverage range corresponds to the frequency range where the media content 108 is expected to have the maximum signal-levels. For example, if the media content 108 has a sampling frequency of 8000 Hz, it can be expected to have maximum signal-levels between the frequencies 400 Hz and 3400 Hz. The 400 Hz frequency corresponds to the lowest frequency or the noise frequency of the media content 108. The lowest frequency can even be lower than 400 Hz if the media content 108 is recorded in a reduced noise environment. For example, the lowest frequency can be 200 Hz if the media content 108 is recorded in a noise-free environment. In this case, the frequency coverage of the filter 302 can be 200 to 3400 Hz.
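As an illustration of a broad filter with zeroes at 0 Hz and 4000 Hz for content sampled at 8000 Hz, the sketch below uses the three-tap FIR filter y[n] = x[n] - x[n-2], whose transfer function H(z) = 1 - z^-2 has zeroes at z = 1 (0 Hz) and z = -1 (half the sampling frequency). These coefficients are an assumption chosen for simplicity; the source does not give the coefficients of the filter 302. The helper names are hypothetical.

```python
import cmath
import math

FS_HZ = 8000  # sampling frequency from the example above

# Assumed coefficients: H(z) = 1 - z^-2, zeroes at 0 Hz and FS_HZ / 2.
BROAD_FILTER = [1.0, 0.0, -1.0]


def apply_fir(samples, coeffs):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n - k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def magnitude_response(coeffs, freq_hz, fs_hz=FS_HZ):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    h = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(coeffs))
    return abs(h)


# Gain at the band edges and inside the band.
gains = {f: magnitude_response(BROAD_FILTER, f) for f in (0, 400, 2000, 3400, 4000)}

# A constant (0 Hz) input is removed once the filter's two-sample memory fills.
dc_output = apply_fir([1.0] * 6, BROAD_FILTER)
```

The gain of this particular filter is zero at 0 Hz and 4000 Hz and peaks at 2000 Hz, so it is broad rather than flat; a narrower band such as 400 Hz to 3400 Hz would need a different design.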

[046] The process of calculating the representative signal-level of a filtered component of the media content 108 begins by dividing the media content 108 into a number of blocks of data points. The division of the media content 108 into a number of blocks of data points has already been explained in conjunction with FIG. 2. Further, when the media content 108 is divided into a number of blocks of data points, it is passed through the filter 302. The filter 302 then generates a filtered component, X(m), of the media content 108. For one embodiment, the step of dividing the media content 108 into a number of blocks is performed after the media content 108 is passed through the filter 302. In this case, the filtered component X(m) of the media content 108 is divided into a number of blocks of data points rather than the media content 108 being divided.

[047] When the filtered component X(m) is generated, it is passed through the representative signal-level calculator 304. The representative signal-level calculator 304 then calculates the representative signal-level, Ex(m), of the filtered component X(m). Typically, the representative signal-level Ex(m) of the filtered component X(m) indicates the relative signal strength of the filtered component X(m). For example, if a first filtered component of a media content has a representative signal-level of nine units, and a second filtered component has a representative signal-level of five units, the first filtered component is expected to have higher signal strength than the second filtered component. In other words, the media content has higher signal strength in the frequency corresponding to the first filtered component than in the frequency corresponding to the second filtered component.
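The calculation of a representative signal-level per block can be sketched as below. The definition used is an assumption suggested by the term 'block energy': the level of a block is the sum of the squares of its samples. The text does not state the exact formula, and the function names are hypothetical.

```python
def split_into_blocks(samples, block_len):
    """Split samples into consecutive non-overlapping blocks.

    A trailing partial block, if any, is dropped.
    """
    count = len(samples) // block_len
    return [samples[i * block_len:(i + 1) * block_len] for i in range(count)]


def block_energies(filtered, block_len):
    """Assumed representative signal-level: sum of squared samples in each block."""
    return [sum(x * x for x in block) for block in split_into_blocks(filtered, block_len)]


# A short filtered component X(m) and its block energies for 2-sample blocks.
filtered_component = [1, -1, 2, 2, 0, 3, 1]
levels = block_energies(filtered_component, 2)
```

For this input the three whole blocks are [1, -1], [2, 2] and [0, 3], giving levels of 2, 8 and 9; the last sample is left out because it does not fill a block.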

[048] For one embodiment, the representative signal-level E_x(m) of the filtered component of the media content 108 is calculated based on the length 206 of the block of data points of the media content 108. Typically, the signal strength of a filtered component is better estimated by choosing a shorter block length than a longer one. For example, the representative signal-level E_x(m) corresponding to a shorter length 206, e.g., 480 milliseconds, provides a better estimate of the signal strength of the filtered component X(m) than a longer length 206, e.g., 1440 milliseconds. However, the calculation of the representative signal-level E_x(m) is simpler if a longer length 206 is chosen rather than a shorter length 206. Therefore, while programming the algorithm of the representative signal-level calculator 304, the programmer should decide whether more accuracy or more speed of calculation is required. For more accuracy, the shorter length of the block of data points should be chosen, and for more speed of calculation, the longer length of the block of data points should be chosen.
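The trade-off between block length and the number of representative signal-levels can be sketched as follows. This is a minimal illustration that assumes the representative signal-level of a block is its mean squared amplitude; the description only states that the level indicates signal strength, so the exact measure, the function name `block_energies`, and the handling of a trailing partial block (it is dropped here) are assumptions.

```python
def block_energies(samples, sample_rate_hz, block_ms):
    """Return one representative signal-level per full block of `samples`.

    Assumption: the level of a block is its mean squared amplitude.
    """
    block_len = int(sample_rate_hz * block_ms / 1000)
    if block_len <= 0:
        raise ValueError("block length must cover at least one sample")
    n_blocks = len(samples) // block_len  # a trailing partial block is dropped
    energies = []
    for b in range(n_blocks):
        block = samples[b * block_len:(b + 1) * block_len]
        energies.append(sum(s * s for s in block) / block_len)
    return energies
```

At a sampling frequency of 8000 Hz, a 480 millisecond block holds 3840 data points and a 1440 millisecond block holds 11520, so the shorter length yields three times as many levels for the same content.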

[049] FIG. 4 is a flow diagram illustrating a method for matching a test media content with a plurality of reference media content, in accordance with a first embodiment of the present invention. To describe FIG. 4, reference is made to previous figures, although it should be understood that the described method can be implemented in any other suitable environment as well. Moreover, the invention is not limited to the order in which the steps are listed in the described method.

[050] At step 402, the method for matching the test media content with the plurality of reference media content is initiated. The test media content and each of the reference media content of the plurality of reference media content are defined by a plurality of data points, with each data point of the plurality of data points representing a signal-level. The concepts of 'data point' and 'signal-level' have already been explained in conjunction with FIG. 2. At step 404, the test media content and the plurality of reference media content are passed through a set of filters. For one embodiment, the number of filters in the set of filters is greater than four. For example, the number of filters in the set of filters can be 16. The selection of the frequency coverage or the frequency range of each filter of the set of filters is based on the expected sampling frequency of the test media content. The selection of the frequency range of a filter, based on the sampling frequency of the test media content, has already been explained in conjunction with FIG. 3.

[051] When the test media content is passed through the set of filters, a set of filtered components of the test media content is obtained. For example, if there are 'M' filters in the set of filters, M filtered components of the test media content are obtained to form the set of filtered components of the test media content. Similarly, a set of filtered components of each of the reference media content of the plurality of reference media content is obtained when the reference media content is passed through the set of filters.

[052] At step 406, each filtered component of the set of filtered components of the test media content and the plurality of reference media content is divided to obtain a plurality of blocks of data points for each filtered component of the set of filtered components of the test media content and each filtered component of the set of filtered components of each reference media content. This is better understood with the help of the following example.

[053] Consider an exemplary scenario where there are 16 filters in a set of filters. The test media content is passed through each of these 16 filters to obtain 16 filtered components of the test media content. Thereafter, each of these 16 filtered components of the test media content is divided into a number of blocks of data points to obtain a plurality of blocks of data points. The same process is repeated for all the filtered components of all the reference media content, to obtain a plurality of blocks of data points of each filtered component. For one embodiment, the filtered components of the test media content and each reference media content are divided, based on the length of the block of data points or 'a block-size value'. The concept of dividing a filtered component of a media content, based on the length of the block of data points, has already been explained in conjunction with FIGs. 2 and 3.

[054] At step 408, a plurality of representative signal-levels corresponding to the plurality of blocks of data points for each filtered component of the set of filtered components of the test media content and each reference media content is computed. The process of computing a representative signal-level of a filtered component of a media content has already been explained in conjunction with FIG. 3. For one embodiment, the representative signal-level of the filtered component of a media content indicates the signal-strength value or 'energy value' of the filtered component of the media content. In other words, computing the representative signal-levels of a filtered component is the same as computing the signal-strength value or energy value of the filtered component. Further, as explained in conjunction with FIG. 3, the higher the representative signal-level of the filtered component, the greater is the signal strength of the filtered component. In other words, if the representative signal-level of the filtered component of the media content is high, the media content is expected to have greater signal strength in the frequency range corresponding to the filtered component.

[055] At step 410, the correlation values of the plurality of representative signal-levels of the test media content and a plurality of offset versions of each of the reference media content are computed. As has already been explained in the last paragraph, the plurality of representative signal-levels of the test media content corresponds to the plurality of blocks of data points of each filtered component of the set of filtered components of the test media content. For one embodiment, an offset version of the filtered component of a media content corresponds to the 'lag' value of the filtered component. This is better understood with the help of the following example. Consider a test media content with a length 'L' of 120 seconds and a plurality of reference media content, each with a length of, for example, 1200 seconds. The test media content and the plurality of reference media content are passed through a filter 'm' to obtain a filtered component X(m) of the test media content and a filtered component of each reference media content of the plurality of reference media content. To compute the correlation values of X(m) and a filtered component, for example, Y(m), of a reference media content 'Y', an initial lag value of two seconds may be chosen. Typically, the lag value is taken to be in the range of milliseconds, but for the ease of discussion, the lag value of two seconds is assumed here. In this case, X(m) is correlated with Y(m) every two seconds. For example, at the first stage, a first correlation value is calculated for X(m) and Y(m) for a time period of 0 to 120 seconds of Y(m). At the second stage, a second correlation value is calculated for a time period of 2 to 122 seconds of Y(m). Similarly, in subsequent stages, correlation values corresponding to intervals of 4 to 124 seconds, 6 to 126 seconds, etc., are calculated. Consequently, different correlation values are obtained, which correspond to the lag values of zero seconds, two seconds, four seconds, etc., when X(m) is correlated with Y(m).
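The sequence of offset windows in this example can be enumerated with a short sketch. The function name `lag_windows` is hypothetical, and the sketch assumes that only windows lying entirely inside the reference media content are used.

```python
def lag_windows(test_len_s, ref_len_s, lag_step_s):
    """List the (start, end) intervals of the reference, in seconds, that the
    test content is correlated against, one interval per lag value."""
    windows = []
    start = 0
    while start + test_len_s <= ref_len_s:
        windows.append((start, start + test_len_s))
        start += lag_step_s
    return windows


# 120-second test content, 1200-second reference, lag step of two seconds.
windows = lag_windows(120, 1200, 2)
print(windows[:3])  # [(0, 120), (2, 122), (4, 124)]
```

With these values the list holds 541 windows, the last one covering 1080 to 1200 seconds of the reference.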

[056] Those with ordinary skill in the art would appreciate that although the method mentioned above is described for a plurality of reference media content of the same length of 1200 seconds, the invention can also work if each reference media content of the plurality of reference media content is of a different length. For example, the invention would work as effectively even if the lengths of the reference media content are 1200 seconds, 1000 seconds, 2000 seconds, and the like. In another case, the reference media content could also continue indefinitely, such as in a broadcast program, in which case a moving time window of a fixed size would be used to effectively represent the media content.

[057] The process of computing the correlation values of the filtered components of the test media content and the reference media content is better understood with the help of the following example. Consider a case when there are a total of 'Y' reference media content and test media content 'X'. They are passed through a filter 'm' of a total of 'M' filters. Further, the correlation values are computed for reference media content 'k' for a particular lag value of 'l'. Furthermore, the number of blocks of X can be assumed to be N_x, and the number of blocks of reference media content can be assumed to be N_Y. For one embodiment, if the representative signal-levels are denoted by 'E', the correlation value of the filtered component of X and k can be, for example,

$$\rho(k,m,l) = \frac{\sum_{n=l}^{l+N_x-1} \left( E_x(m,n-l) \cdot E_k(m,n) - M_x(m) \cdot M_k(m,l) \right)}{\sqrt{\sum_{n=l}^{l+N_x-1} \left( E_x(m,n-l) - M_x(m) \right)^2} \, \sqrt{\sum_{n=l}^{l+N_x-1} \left( E_k(m,n) - M_k(m,l) \right)^2}}$$

Here,

$$M_x(m) = \frac{1}{N_x} \sum_{n=0}^{N_x-1} E_x(m,n)$$

$$M_k(m,l) = \frac{1}{N_x} \sum_{n=l}^{l+N_x-1} E_k(m,n)$$

$$k = 0, 1, 2, \ldots, (Y-1)$$

$$l = 0, 1, 2, \ldots, (N_Y - N_x + 1)$$
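As a hedged illustration of this embodiment, the sketch below computes a normalized correlation between the representative signal-levels of one filtered component of the test media content and a lagged window of the corresponding levels of one reference media content. The function names are hypothetical, the lag is counted in blocks, and the sketch evaluates only lags for which the window fits entirely inside the reference; returning zero when either sequence is constant is also an assumption.

```python
import math


def correlation_at_lag(e_x, e_k, lag):
    """Normalized correlation of test levels e_x with e_k[lag : lag + len(e_x)]."""
    n_x = len(e_x)
    window = e_k[lag:lag + n_x]
    if len(window) != n_x:
        raise ValueError("lag window does not fit inside the reference")
    m_x = sum(e_x) / n_x        # mean level of the test component
    m_k = sum(window) / n_x     # mean level of the lagged reference window
    num = sum(a * b - m_x * m_k for a, b in zip(e_x, window))
    den = (math.sqrt(sum((a - m_x) ** 2 for a in e_x))
           * math.sqrt(sum((b - m_k) ** 2 for b in window)))
    return num / den if den > 0 else 0.0  # assumption: flat input scores zero


def correlations(e_x, e_k):
    """Correlation value for every lag at which the window fits."""
    return [correlation_at_lag(e_x, e_k, lag)
            for lag in range(len(e_k) - len(e_x) + 1)]
```

When the test levels appear unchanged inside the reference, the value at the corresponding lag is 1 up to rounding, and it is the largest value in the list returned by `correlations`.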

[058] For another embodiment, the correlation value of the filtered component of X and k is calculated by first computing the correlation values of the small segments of the filtered component, and then by averaging all the calculated correlation values to obtain the final correlation value. For example, the correlation value of the filtered component of X and k can be,

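$$\rho(k,m,l) = \frac{1}{N_{SEG}} \sum_{j=0}^{N_{SEG}-1} \rho_s(k,m,l,j)$$

$$k = 0, 1, 2, \ldots, (Y-1)$$

$$l = 0, 1, 2, \ldots, (N_Y - N_x + 1)$$

$$N_{SEG} = \frac{N_x}{N_s}$$

$$\rho_s(k,m,l,j) = \frac{\sum_{n=n_s(l,j)}^{n_s(l,j+1)-1} \left( E_x(m,n-l) \cdot E_k(m,n) - M_x(m) \cdot M_k(m,l) \right)}{\sqrt{\sum_{n=n_s(l,j)}^{n_s(l,j+1)-1} \left( E_x(m,n-l) - M_x(m) \right)^2} \, \sqrt{\sum_{n=n_s(l,j)}^{n_s(l,j+1)-1} \left( E_k(m,n) - M_k(m,l) \right)^2}}$$

$$n_s(l,j) = l + j \cdot N_s$$

$$M_x(m) = \frac{1}{N_x} \sum_{n=0}^{N_x-1} E_x(m,n)$$

$$M_k(m,l) = \frac{1}{N_x} \sum_{n=l}^{l+N_x-1} E_k(m,n)$$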

[059] At step 412, it is determined whether the test media content matches at least one reference media content of the plurality of reference media content, based on the computed correlation values. The match can be any of the plurality of reference media content. In a particular case, the match can also be more than one reference media content or it can be no reference media content at all. The matching of the test media content with the plurality of reference media content, based on the computed correlation values, has been described in detail in the subsequent figures. At step 414, the method for matching the test media content with the plurality of reference media content is terminated.

[060] FIG. 5 is a flow diagram illustrating a method for matching a test media content with a plurality of reference media content, in accordance with a second embodiment of the present invention. To describe FIG. 5, reference is made to previous figures, although it should be understood that the described method can be implemented in any other suitable environment as well. Moreover, the invention is not limited to the order in which the steps are listed in the described method.

[061] At step 502, the method for matching the test media content with the plurality of reference media content is initiated. As already explained in conjunction with FIG. 4, the method for matching the test media content with the plurality of reference media content begins by first passing the test media content and the plurality of reference media content through a set of filters to obtain a set of filtered components of the test media content and each of the reference media content. Thereafter, a plurality of representative signal-levels is calculated that corresponds to the plurality of filtered components of the test media content and each of the reference media content.

[062] At step 504, a set of correlation values is computed of the plurality of representative signal-levels of the test media content and a plurality of offset versions of each of the reference media content. The concept of an 'offset version' of a media content and the process of computing correlation values have already been explained in conjunction with FIG. 4. At step 506, the offset value of a reference media content is identified, based on the computed correlation values of the filtered components of the reference media content and the filtered components of the test media content. For one embodiment, the identified offset value corresponds to that offset version of the filtered component of the reference media content, which yields the maximum correlation value from the computed correlation values. This is better understood with the help of the following example.

[063] Consider a system with a total of 'M' filters, where M can be any value between 1 and 20. Test media content 'X' and reference media content 'Y' are passed through a first filter 'm', and the obtained filtered components are X(m) and Y(m), respectively. The length of X(m) can be, for example, 120 seconds, and the length of Y(m) can be 140 seconds. As explained in conjunction with FIG. 4, the initial offset value or 'lag value' can be assumed to be two seconds. Thereafter, the correlation values of X(m) and Y(m) are computed by using the initial lag value of 2 seconds, i.e., for time periods of 0 to 120 seconds, 2 to 122 seconds, 4 to 124 seconds, etc., of Y(m). Consider an exemplary scenario where the correlation value corresponding to the lag value of 4 seconds is the highest correlation value among all the correlation values computed for all the lag values. Consequently, in the present case, the lag value of 4 seconds is the identified lag value or the identified offset value corresponding to the first filter m.

[064] X and Y are passed through a second filter 'm+1', and the entire process of identifying the lag value corresponding to the maximum correlation value, as explained in the last paragraph, is repeated for the second filter m+1. On similar lines, the method for identifying the lag value, which yields the maximum correlation value, is extended to all the M filters. Thereafter, all the identified lag values corresponding to each of the filters are collated, and the corresponding correlation values are compared to find the maximum correlation value. This is better understood with the help of the following example. Consider a case where there are a total of four filters, and the corresponding identified lag values are 4 seconds for the first filter, 16 seconds for the second filter, 6 seconds for the third filter, and 8 seconds for the fourth filter. The corresponding correlation values can be, for example, 0.5, 0.5, 0.7 and 0.9. Clearly, the identified lag value of 8 seconds yields the maximum correlation value (0.9) of X and Y. Therefore, in this case, the final identified lag value, or the identified offset value for Y, is 8 seconds.
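The two-stage selection in this example, first the best lag per filter and then the best of those across filters, can be sketched as follows. The function names and the data layout (a dictionary of lag to correlation value per filter) are assumptions made for illustration.

```python
def best_lag_for_filter(corr_by_lag):
    """Return (lag, correlation) for the lag with the highest correlation."""
    lag = max(corr_by_lag, key=corr_by_lag.get)
    return lag, corr_by_lag[lag]


def best_lag_across_filters(best_per_filter):
    """Given one (lag, correlation) pair per filter, return the pair with the
    highest correlation value."""
    return max(best_per_filter, key=lambda pair: pair[1])


# Identified lags (seconds) and correlation values for the four filters.
per_filter = [(4, 0.5), (16, 0.5), (6, 0.7), (8, 0.9)]
print(best_lag_across_filters(per_filter))  # (8, 0.9)
```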

[065] Similarly, the offset value and the corresponding maximum correlation value are identified for each reference media content of the plurality of reference media content. For one embodiment, the maximum correlation value corresponding to each filter of all the filters is also determined. Continuing with the example mentioned above, and assuming the total number of reference media content to be three, the corresponding best correlation values of the first filter 'm' can be, for example, 0.5 for the first reference media content, 0.9 for the second reference media content, and 0.6 for the third reference media content. In this case, the maximum correlation value of the first filter is 0.9, which is the correlation value corresponding to the second reference media content. For one embodiment, a set of these maximum correlation values, corresponding to each reference media content and each filter, forms a set of 'best' correlation values. For example, the set of best correlation values of the example mentioned above, with one row per reference media content and one column per filter, can be

$$\rho(y,m) = \begin{bmatrix} 0.5 & 0.4 & 0.5 & 0.4 \\ 0.9 & 0.7 & 0.1 & 0.2 \\ 0.6 & 0.8 & 0.6 & 0.1 \end{bmatrix}$$
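The set of best correlation values can be held as a small table and scanned per filter. The sketch below assumes the layout of three reference media content by four filters implied by the example, with rows indexed by reference and columns by filter; the function name is hypothetical.

```python
# Rows: first, second and third reference media content.
# Columns: the four filters of the running example.
BEST = [
    [0.5, 0.4, 0.5, 0.4],
    [0.9, 0.7, 0.1, 0.2],
    [0.6, 0.8, 0.6, 0.1],
]


def best_reference_per_filter(matrix):
    """For each filter, return (reference index, maximum correlation value)."""
    result = []
    for m in range(len(matrix[0])):
        column = [row[m] for row in matrix]
        top = max(column)
        result.append((column.index(top), top))
    return result


print(best_reference_per_filter(BEST)[0])  # (1, 0.9)
```

The first entry reproduces the statement above that the maximum correlation value of the first filter is 0.9 and belongs to the second reference media content (index 1 when counting from zero).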

[066] At step 508, a matching metric for a reference media content, e.g., Y, is determined, based on the formed set of correlation values. For one embodiment, the matching metric of Y can also be known as the probability metric of Y, i.e., the matching metric indicates the probability of the reference media content Y being a match for the test media content X. Given the set of correlation values, the conditional probability that Y is the match for the test media content can be, for example, $P(\text{Match} = Y \mid \rho(Y,m))$. This would be appreciated by the person ordinarily skilled in the art. Those with ordinary skill in the art would also appreciate that calculating $P(\text{Match} = Y \mid \rho(Y,m))$ is essentially the same as calculating $P(\text{Match} \neq Y \mid \rho(Y,m))$, since

$$P(\text{Match} = Y \mid \rho(Y,m)) + P(\text{Match} \neq Y \mid \rho(Y,m)) = 1 \qquad (1)$$

[067] Further, using Bayes' theorem, it can be shown that:

$$P(\text{Match} = Y \mid \rho(Y,m)) = \frac{P(\{\rho(Y,m)\} \mid \text{Match} = Y) \cdot P(\text{Match} = Y)}{P(\{\rho(Y,m)\})} \qquad (2)$$

$$P(\text{Match} \neq Y \mid \rho(Y,m)) = \frac{P(\{\rho(Y,m)\} \mid \text{Match} \neq Y) \cdot P(\text{Match} \neq Y)}{P(\{\rho(Y,m)\})} \qquad (3)$$

If the total number of reference media content is 'K', then it can be shown that:

$$P(\text{Match} = Y) = \frac{1}{K} \qquad (4)$$

and

$$P(\text{Match} \neq Y) = \frac{K-1}{K} \qquad (5)$$

Using equations (1), (2), (3), (4) and (5), it can be shown that:

$$P(\{\rho(Y,m)\}) = \frac{1}{K} \cdot P(\{\rho(Y,m)\} \mid \text{Match} = Y) + \frac{K-1}{K} \cdot P(\{\rho(Y,m)\} \mid \text{Match} \neq Y) \qquad (6)$$

Using equations (2), (4), (5) and (6), it can be shown that:

$$P(\text{Match} = Y \mid \rho(Y,m)) = \frac{P(\{\rho(Y,m)\} \mid \text{Match} = Y)}{P(\{\rho(Y,m)\} \mid \text{Match} = Y) + (K-1) \cdot P(\{\rho(Y,m)\} \mid \text{Match} \neq Y)}$$

For one embodiment,

$$\pi(Y) = \frac{P(\{\rho(Y,m)\} \mid \text{Match} = Y)}{P(\{\rho(Y,m)\} \mid \text{Match} \neq Y)}$$

is the matching metric for the reference media content Y.
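The relationship between the two conditional likelihoods, the matching metric and the posterior probability of a match can be sketched numerically. The likelihood values used here are purely hypothetical inputs; the description does not state how they are estimated, and the function names are likewise assumptions.

```python
def matching_metric(lik_match, lik_nonmatch):
    """Ratio of the likelihood of the observed correlation values given a
    match to their likelihood given a non-match."""
    return lik_match / lik_nonmatch


def posterior_match(lik_match, lik_nonmatch, k):
    """Posterior probability of a match among k equally likely references,
    obtained by Bayes' theorem with prior 1/k for a match."""
    return lik_match / (lik_match + (k - 1) * lik_nonmatch)


# Hypothetical likelihoods for one reference among K = 5.
pi = matching_metric(0.5, 0.125)         # 4.0
post = posterior_match(0.5, 0.125, 5)    # 0.5
```

Dividing the numerator and denominator of the posterior by the non-match likelihood gives pi / (pi + K - 1), so for a fixed K the posterior increases with the matching metric and ranking references by either quantity gives the same order.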

[068] Similarly, matching metrics can be calculated for each reference media content of the 'K' reference media content. Those with ordinary skill in the art would appreciate that the process of calculating the matching metric of a reference media content would not change for different values of K. For example, the process of calculating the matching metric would remain the same for K equal to 4 or for K equal to 20.

[069] At step 510, a match is identified between the test media content and the plurality of reference media content, based on the matching metrics calculated for each reference media content. The identified match can be one or more than one reference media content. There may also be a case when no match is identified, and consequently, the identified match can be no reference media content. The method for identifying a match between the test media content and the plurality of reference media content, based on the calculated matching metric, is explained in detail in subsequent figures. At step 512, the method for matching the test media content with the plurality of reference media content is terminated.

[070] FIG. 6 is a flow diagram illustrating a method for matching a test media content with a plurality of reference media content, in accordance with a third embodiment of the present invention. To describe FIG. 6, reference is made to previous figures, although it should be understood that the described method can be implemented in any other suitable environment as well. Moreover, the invention is not limited to the order in which the steps are listed in the described method.

[071] At step 602, the method for matching the test media content with the plurality of reference media content is initiated. As already explained in previous figures, the process of matching the test media content with the plurality of reference media content begins by passing the test media content and the plurality of reference media content through a set of filters. The filtered components thus obtained are correlated to obtain a plurality of correlation values. Thereafter, matching metrics are computed for each reference media content of the plurality of reference media content. The process of computing a matching metric for a reference media content has already been explained in conjunction with FIG. 5.

[072] At step 604, the reference media content corresponding to the highest matching metric and the second-highest matching metric are identified. The highest matching metric and the second-highest matching metric are selected from the matching metrics determined for the plurality of reference media content. This is better understood with the help of the following example. Consider a case when there are a total of five reference media content and the corresponding matching metrics are 0.4, 0.5, 0.3, 0.7 and 0.9. In this case, the highest and the second-highest matching metrics are 0.9 and 0.7, respectively. Therefore, the identified reference media content, in this case, are the fifth and the fourth reference media content of the five reference media content.

[073] At step 606, it is determined whether the highest identified matching metric is greater than a first threshold value. For example, the first threshold value can be 0.8 and the identified highest matching metric can be 0.9. In this case, the highest matching metric is determined to be greater than the first threshold value. However, if the first threshold value is 0.95, the highest matching metric is determined to be less than the first threshold value. For one embodiment, the first threshold value is preset by the programmer of the matching algorithm, who has designed the system for matching the test media content with the plurality of reference media content.

[074] At step 608, it is determined whether the difference between the highest matching metric and the second-highest matching metric is greater than a second threshold value, when the highest matching metric identified is greater than the first threshold value. Continuing with the example mentioned earlier, the highest and the second-highest matching metrics can be 0.9 and 0.7, respectively. If the second threshold value is 0.4, it is determined that the difference between the highest and the second-highest matching metrics, i.e., 0.2, is less than the second threshold value. However, if the second threshold value is 0.15, it is determined that the difference between the highest and the second-highest matching metrics is greater than the second threshold value. Like the first threshold value, the second threshold value can also be preset by the programmer of the matching algorithm.

[075] At step 610, the match between the test media content and the plurality of reference media content is declared when the highest matching metric is greater than the first threshold value and the difference between the highest matching metric and the second-highest matching metric is greater than the second threshold value. The declared match is the reference media content corresponding to the highest matching metric. Continuing with the earlier example, the highest and the second-highest matching metrics are 0.9 and 0.7, respectively, and the first and second threshold values can be, for example, 0.8 and 0.15, respectively. In this case, the highest matching metric, i.e., 0.9, is greater than the first threshold value, and the difference between the highest and the second-highest matching metrics, i.e., 0.2, is greater than the second threshold value. Consequently, the match between the test media content and the plurality of reference media content is the reference media content corresponding to the matching metric of 0.9.

[076] At step 612, no match is declared if even one of the two conditions mentioned above is not met. For example, if the highest matching metric is less than the first threshold value, the declaration for the match may be, 'No match is available for the test media content'. Similarly, if the difference between the highest matching metric and the second-highest matching metric is less than the second threshold value, no match is declared between the test media content and the plurality of reference media content. Preset results or statements can be declared when no match is found for the test media content. These preset statements can be, for example, 'No match is available for the audio clip' if the test media content is an audio clip. For one embodiment, the preset statement is designed by the programmer of the matching algorithm.
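Steps 604 to 612 amount to a two-condition test on the sorted matching metrics, which can be sketched as follows. The function name `declare_match` and the use of `None` to represent 'no match' are assumptions made for this illustration.

```python
def declare_match(metrics, first_threshold, second_threshold):
    """Return the index of the matching reference media content, or None.

    A match is declared only if the highest metric exceeds first_threshold
    and its margin over the second-highest metric exceeds second_threshold.
    """
    if len(metrics) < 2:
        raise ValueError("at least two matching metrics are required")
    order = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    best, runner_up = order[0], order[1]
    if (metrics[best] > first_threshold
            and metrics[best] - metrics[runner_up] > second_threshold):
        return best
    return None


metrics = [0.4, 0.5, 0.3, 0.7, 0.9]
print(declare_match(metrics, 0.8, 0.15))   # 4, i.e. the fifth reference
print(declare_match(metrics, 0.95, 0.15))  # None: first condition fails
print(declare_match(metrics, 0.8, 0.4))    # None: margin of 0.2 is too small
```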

[077] Those with ordinary skill in the art would appreciate that although the method mentioned above is described by using two threshold conditions, the invention can also be functional if only one threshold condition is used by the programmer of the matching algorithm. For example, the following method can also be used to match the test media content with the plurality of reference media content, based on the matching metrics computed. The reference media content corresponding to the highest matching metric can be identified or selected from the plurality of reference media content at the first step. At the second step, it can be determined whether the highest matching metric is greater than a matching threshold value. At the third step, the match between the test media content and the plurality of reference media content can be declared when the highest matching metric is greater than the matching threshold value. Like the first and the second threshold values, the matching threshold value can also be preset by the programmer of the matching algorithm.

[078] As is evident in the example mentioned above, the present invention can function as effectively with a single threshold condition as with a double threshold condition. Those with ordinary skill in the art would appreciate that there can be other threshold conditions or embodiments where the invention could work in a similar manner. Some of those embodiments are described in subsequent figures. At step 614, the method for matching the test media content with the plurality of reference media content is terminated.

[079] FIGs. 7 and 8 together form a flow diagram illustrating a method for matching specific test media content with a plurality of reference media content, in accordance with a fourth embodiment of the present invention. To describe FIGs. 7 and 8, reference will be made to previous figures, although it should be understood that the described method can be implemented in any other suitable environment as well. Moreover, the invention is not limited to the order in which the steps are listed in the described method.

[080] At step 702, the method for matching the test media content with the plurality of reference media content is initiated. At step 704, the highest matching metric from the matching metrics computed for each reference media content is determined. The process of computing a matching metric for a reference media content has already been explained in conjunction with FIG. 5. Further, the concept of the 'highest matching metric' has also been explained with the help of an example in conjunction with FIG. 6.

[081] At step 706, a first set of reference media content is formed, based on the highest matching metric determined. The first set of reference media content is a subset of the plurality of reference media content. For one embodiment, the first set of reference media content is formed by selecting the reference media content that have matching metrics greater than a matching-metric threshold value. The matching-metric threshold value is calculated, based on the highest matching metric determined. This is better understood with the help of the following example. Consider a case where there are five reference media content with matching metrics 0.4, 0.8, 0.77, 0.5 and 0.85, respectively. Clearly, the highest matching metric has a value of 0.85 and the corresponding reference media content is the 'fifth' reference media content. The matching-metric threshold value, in this case, can be, for example, 0.9 * 0.85 (= 0.765). The value of '0.9' can be preset by the programmer of the matching algorithm, who designs the system for matching the test media content with the plurality of reference media content. Further, reference media content with matching metrics that are greater than 0.765 are selected to form the first set of reference media content. Consequently, the 'second', 'third' and 'fifth' reference media content of the five reference media content form the first set of reference media content.

[082] For another embodiment, the first set of reference media content is formed by determining the set of difference values of the highest matching metric and the matching metric corresponding to each of the plurality of reference media content. Continuing with the earlier example, the highest matching metric is 0.85 and the set of difference values is {0.45, 0.05, 0.08, 0.35, 0}. These difference values are obtained by subtracting the value of each matching metric from the highest matching metric, 0.85. Further, the difference values that are less than a set threshold are selected from this set of difference values. For example, the set threshold can be preset to be 0.1, and thus, the difference values selected are 0.05, 0.08 and 0. The corresponding matching metrics are identified as 0.8, 0.77 and 0.85. Furthermore, the reference media content corresponding to the identified matching metrics are selected to form the first set of reference media content. In the present example, the first set includes the 'second', 'third' and 'fifth' reference media content of the five reference media content.

[083] Those with ordinary skill in the art would appreciate that although the two methods mentioned above for forming the first set of reference media content are described separately, the invention can also function if the two methods are combined in the same matching algorithm. For example, the process of forming the first set of reference media content can include the matching-metric threshold condition as well as the set threshold condition in a single method for forming the first set of reference media content. In such a case, only the reference media content that satisfy both conditions simultaneously are selected to form the first set.
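The two ways of forming the first set, and their combination, can be sketched as follows. The function names are hypothetical, indices count from zero, and the set-threshold values used in the usage lines (0.1 and 0.06) are illustrative assumptions.

```python
def first_set_by_ratio(metrics, ratio):
    """Indices whose metric exceeds ratio times the highest metric."""
    threshold = ratio * max(metrics)
    return [i for i, v in enumerate(metrics) if v > threshold]


def first_set_by_difference(metrics, set_threshold):
    """Indices whose shortfall from the highest metric is below set_threshold."""
    top = max(metrics)
    return [i for i, v in enumerate(metrics) if top - v < set_threshold]


def first_set_combined(metrics, ratio, set_threshold):
    """Indices that satisfy both conditions simultaneously."""
    by_difference = set(first_set_by_difference(metrics, set_threshold))
    return [i for i in first_set_by_ratio(metrics, ratio) if i in by_difference]


metrics = [0.4, 0.8, 0.77, 0.5, 0.85]
print(first_set_by_ratio(metrics, 0.9))        # [1, 2, 4]
print(first_set_by_difference(metrics, 0.1))   # [1, 2, 4]
print(first_set_combined(metrics, 0.9, 0.06))  # [1, 4]
```

Indices 1, 2 and 4 correspond to the 'second', 'third' and 'fifth' reference media content. With a tighter set threshold of 0.06, the third reference (a shortfall of 0.08) no longer qualifies, which shows how the combined condition can narrow the first set.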

[084] At step 708, a second set of reference media content is formed from the plurality of reference media content. The second set of reference media content includes the reference media content that is not included in the first set of reference media content. Continuing with the earlier example, the five reference media content have matching metric values of 0.4, 0.8, 0.77, 0.5 and 0.85. As described in previous paragraphs, the first set can include the 'second', 'third' and 'fifth' reference media content of the five reference media content. Consequently, in this case, the second set can include the 'first' and 'fourth' reference media content.

[085] At step 710, a first matching metric and a second matching metric are selected that correspond to the first set and second set of reference media content, respectively. For one embodiment, the first matching metric is the highest matching metric of the matching metrics corresponding to the first set of reference media content. Continuing with the earlier example, the first set of reference media content has matching metric values of 0.8, 0.77 and 0.85. The highest matching metric, in this case the first matching metric, can be 0.85. For another embodiment, the first matching metric is the mean of all the matching metrics corresponding to the first set of reference media content. In the example mentioned above, the first matching metric can be, for example, (0.8 + 0.77 + 0.85)/3, i.e., 0.8067. For yet another embodiment, the first matching metric is the median value of the matching metrics corresponding to the first set of reference media content. For example, the first matching metric can be 0.8 for the set of {0.77, 0.8, 0.85}.

[086] Like the first matching metric, the second matching metric can be the highest matching metric of the matching metrics corresponding to the second set of reference media content. In the example described earlier, the second set of reference media content has matching metric values of 0.4 and 0.5. Consequently, in this case, the second matching metric can be, for example, 0.5.
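Steps 708 and 710 can be illustrated with a short sketch that splits the matching metrics into the two sets and reduces each set to a single value. The function names and the `how` parameter are hypothetical; the three reductions correspond to the highest, mean and median embodiments described above.

```python
from statistics import mean, median


def split_sets(metrics, first_indices):
    """Return (first-set metrics, second-set metrics) for the given indices."""
    first = [metrics[i] for i in first_indices]
    second = [v for i, v in enumerate(metrics) if i not in first_indices]
    return first, second


def set_metric(values, how="max"):
    """Reduce the metrics of one set to a single representative metric."""
    if how == "max":
        return max(values)
    if how == "mean":
        return mean(values)
    if how == "median":
        return median(values)
    raise ValueError("how must be 'max', 'mean' or 'median'")


metrics = [0.4, 0.8, 0.77, 0.5, 0.85]
first, second = split_sets(metrics, [1, 2, 4])
print(set_metric(first))            # 0.85
print(set_metric(first, "median"))  # 0.8
print(set_metric(second))           # 0.5
```

The mean embodiment, `set_metric(first, "mean")`, evaluates to approximately 0.8067 for this example.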

[087] At step 712, it is determined whether the first matching metric is greater than an upper threshold value. For example, if the upper threshold value is 0.8 and the first matching metric is 0.85, it is determined that the first matching metric is greater than the upper threshold value. On the other hand, if the upper threshold value is 0.9, it is determined that the first matching metric is less than the upper threshold value.

[088] In FIG. 8, it is determined at step 802 whether the difference between the first matching metric and the second matching metric is greater than a difference threshold value. This step is carried out after it has been established in the step 712 that the first matching metric is greater than the upper threshold value. For example, if the upper threshold value is 0.8, the difference threshold value is 0.2, the first matching metric is 0.85 and the second matching metric is 0.5, the difference is 0.35, and it is determined that the difference between the first matching metric and the second matching metric is greater than the difference threshold value. However, if the difference threshold value is 0.5, it is determined that the difference between the first matching metric and the second matching metric is less than the difference threshold value.

[089] At step 804, the match between the test media content and the plurality of reference media content is declared when the first matching metric is greater than the upper threshold value, and the difference between the first and the second matching metrics is greater than the difference threshold value. The match is the reference media content corresponding to the first matching metric. Continuing with the earlier example, the match is the reference media content corresponding to the matching metric of 0.85 if the upper threshold value is 0.8 and the difference threshold value is 0.2.

[090] At step 806, no match is declared if either of the two conditions mentioned above is not met. For example, if the first matching metric is less than the upper threshold value, the declaration may be 'No match is available for the audio clip' if the test media content is an audio clip. Similarly, if the difference between the first matching metric and the second matching metric is less than the difference threshold value, no match is declared between the test media content and the plurality of reference media content. At step 808, the method for matching the test media content with the plurality of reference media content is terminated.
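
The two-condition test of steps 712, 802, 804 and 806 can be sketched as follows. This is an illustrative reading of the steps, not the claimed implementation; the function name `declare_match` and its parameter names are assumptions.

```python
def declare_match(first_metric, second_metric, upper_threshold, difference_threshold):
    """Return True only when both conditions of steps 712 and 802 hold."""
    above_upper = first_metric > upper_threshold
    well_separated = (first_metric - second_metric) > difference_threshold
    return above_upper and well_separated

# Running example: first matching metric 0.85, second matching metric 0.5.
match_found = declare_match(0.85, 0.5, upper_threshold=0.8, difference_threshold=0.2)
```

Raising either threshold in the way described in paragraphs [087] and [088] (an upper threshold of 0.9, or a difference threshold of 0.5) makes the function return False, which corresponds to the 'no match' declaration of step 806.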

[091] FIGs. 9 and 10 show a flow diagram illustrating a method for matching a test media content with a plurality of reference media content, in accordance with a fifth embodiment of the present invention. To describe FIGs. 9 and 10, reference is made to previous figures, although it should be understood that the described method can be implemented in any other suitable environment as well. Moreover, the invention is not limited to the order in which the steps are listed in the described method.

[092] At step 902, the method for matching the test media content with the plurality of reference media content is initiated. As mentioned in previous figures, the test media content and each of the reference media content of the plurality of reference media content are defined by a plurality of data points, and each data point of the plurality of data points represents a signal-level. The concepts of 'data point' and 'signal-level' have been explained in conjunction with FIG. 2. At step 904, the test media content and the plurality of reference media content are passed through a set of filters to obtain a set of filtered components of the test media content and each of the reference media content. The process of obtaining the set of filtered components of the test media content and the reference media content, by passing them through the set of filters, has already been explained in conjunction with FIG. 4.

[093] At step 906, the correlation values of the set of filtered components of the test media content and each reference media content are computed, based on the block-size value chosen during at least one previous stage. The concept of 'block-size value' is the same as the concept of the 'length 206' of data points of the media content 108, as described in FIG. 2. Further, the term 'previous stage' is used because the method for matching the test media content with the plurality of reference media content, as described in FIGs. 9 and 10, is iterative in nature and is repeated a number of times until a predefined criterion is met. This predefined criterion is explained later, in detail, in the description of the present figures. Further, if the step 906 is executed for the first time, with no chosen block size from the previous stages, no correlation values can be computed until a block size is chosen. In this case, a block size is first chosen from a predetermined set of block-size values, and then the correlation values are computed, based on the chosen block-size value. For example, the predetermined set of block-size values can be 60 milliseconds, 120 milliseconds, 240 milliseconds and 480 milliseconds, and the chosen block-size value can be, for example, 240 milliseconds. In this case, the computed correlation values correspond to the block-size value of 240 milliseconds.

[094] The process of computing the correlation values of the filtered components of the test media content and each of the reference media content begins by dividing each filtered component into blocks of data points. The filtered components are divided, based on a chosen block-size value or a 'length of data points', as explained in conjunction with FIG. 4. Thereafter, a representative signal-level is calculated for each filtered component corresponding to the number of blocks of data points. Finally, the correlation values of these calculated representative signal-levels are computed, to obtain the correlation values of the filtered components of the test media content and each of the reference media content. The entire process of computing the correlation values of the filtered components of the test media content, and each of the reference media content, has already been explained in conjunction with FIG. 4.

[095] At step 908, a subset of the plurality of reference media content is selected, based on the computed correlation values. To select the subset of reference media content, matching metrics are first calculated for each reference media content, based on the computed correlation values. The process of calculating matching metrics, based on correlation values, has already been explained in conjunction with FIG. 5.

[096] For one embodiment, only those reference media content are selected to form the subset that have a corresponding matching metric that is greater than a threshold value. For example, if there are six reference media content with matching metrics of 0.62, 0.54, 0.76, 0.86, 0.23 and 0.45, and the threshold value is 0.6, only the 'first', 'third' and 'fourth' reference media content is selected to form the subset of reference media content.

[097] For another embodiment, a predefined number of reference media content are eliminated from the plurality of reference media content, based on the calculated matching metrics, and the remaining reference media content of the plurality of reference media content are selected to form the subset. This embodiment is better understood with the help of the following example. Consider a case where there are six reference media content with matching metrics of 0.62, 0.54, 0.76, 0.86, 0.23 and 0.45. A condition may be set that 1/3rd of the reference media content is eliminated after the calculation of the matching metrics. In this case, the two reference media content corresponding to the lowest matching metrics are eliminated from the six reference media content. Therefore, the reference media content corresponding to the matching metrics of 0.23 and 0.45 are eliminated, and the remaining four are selected to form the subset of reference media content. For one embodiment, the value of '1/3rd' is preset by the programmer of the matching algorithm.
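
The two selection embodiments of paragraphs [096] and [097] can be sketched in Python as follows. The function names and the use of zero-based list indices to identify reference media content are assumptions made for illustration only.

```python
def select_by_threshold(metrics, threshold):
    """Keep the indices of references whose matching metric exceeds the threshold."""
    return [i for i, m in enumerate(metrics) if m > threshold]

def select_by_elimination(metrics, fraction):
    """Drop the given fraction of references with the lowest matching metrics."""
    n_drop = round(len(metrics) * fraction)
    ranked = sorted(range(len(metrics)), key=lambda i: metrics[i])
    dropped = set(ranked[:n_drop])
    return [i for i in range(len(metrics)) if i not in dropped]

metrics = [0.62, 0.54, 0.76, 0.86, 0.23, 0.45]   # six reference media content

kept_by_threshold = select_by_threshold(metrics, 0.6)
kept_by_elimination = select_by_elimination(metrics, 1 / 3)
```

Index 0 corresponds to the 'first' reference media content, so the threshold embodiment keeps the first, third and fourth references, and the elimination embodiment drops the two references with metrics 0.23 and 0.45.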

[098] At step 910, a block size is chosen from the predetermined set of block-size values. For one embodiment, the chosen block-size value is less than the block-size value chosen in at least one previous stage. As explained earlier, the present method for matching the test media content with the plurality of reference media content is iterative in nature and the steps are performed repeatedly. Therefore, the block-size values of previous stages are determined before a block-size value is chosen from the predetermined set of block-size values. The new block-size value is chosen, based on the 'previous' values. The step 910 is better understood with the help of the following example. Consider a case where the predetermined set of block-size values is 60 milliseconds, 120 milliseconds, 240 milliseconds, and 480 milliseconds. If block-size values of '240 milliseconds' and '120 milliseconds' have already been chosen at previous stages, only the block-size value of '60 milliseconds' can be chosen at the next stage. The value of '480 milliseconds' cannot be chosen for the next stage since it is greater than 120 and 240 milliseconds.
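
The constraint of step 910 can be sketched as a small helper. The sketch assumes, consistent with the example above, that a new block-size value must be smaller than every value chosen so far; the function name is hypothetical.

```python
def candidate_block_sizes(all_sizes_ms, previously_chosen_ms):
    """Return the block sizes still eligible for the next stage, largest first."""
    if not previously_chosen_ms:
        # First stage: any value of the predetermined set may be chosen.
        return sorted(all_sizes_ms, reverse=True)
    smallest_so_far = min(previously_chosen_ms)
    return sorted((s for s in all_sizes_ms if s < smallest_so_far), reverse=True)

eligible = candidate_block_sizes([60, 120, 240, 480], [240, 120])
```

An empty result indicates that the lowest block-size value has already been used, which is the condition tested later at step 1006.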

[099] At step 912, each filtered component of the set of filtered components of the test media content and the set of filtered components of each reference media content of the selected subset of reference media content is divided to obtain a plurality of blocks of data points. The filtered components are divided, based on the chosen block-size value. This is better understood with the help of the following example. Consider a case when the selected subset of reference media content has four reference media content and the chosen block-size value is 60 milliseconds. The filtered components of the test media content and the filtered components of the four reference media content are divided, based on the block-size value or 'the length of the block of data points' of 60 milliseconds, to obtain a number of blocks of data points. The division of the filtered components into a number of blocks of data points, based on the length of blocks of data points or the block-size value, has already been explained in conjunction with FIG. 4.

[0100] At step 914, a plurality of representative signal-levels corresponding to the plurality of blocks of data points obtained is computed. As mentioned earlier, the plurality of blocks of data points corresponds to the filtered components of the test media content and each reference media content of the selected subset of reference media content. The process of computing representative signal-levels corresponding to a plurality of blocks of data points has already been explained in conjunction with FIG. 4.
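
Steps 912 and 914 can be illustrated with the following sketch. The exact definition of a representative signal-level is given in conjunction with FIG. 4 and is not repeated here, so the sketch assumes the mean absolute amplitude of a block as a stand-in. The function names, the `sample_rate_hz` parameter, and the choice to drop an incomplete trailing block are likewise assumptions.

```python
def split_into_blocks(samples, sample_rate_hz, block_size_ms):
    """Cut a filtered component into consecutive blocks of data points."""
    block_len = max(1, int(sample_rate_hz * block_size_ms / 1000))
    # Trailing samples that do not fill a whole block are dropped (an assumption).
    n_blocks = len(samples) // block_len
    return [samples[i * block_len:(i + 1) * block_len] for i in range(n_blocks)]

def representative_levels(blocks):
    """One level per block; mean absolute amplitude is an assumed stand-in."""
    return [sum(abs(x) for x in block) / len(block) for block in blocks]

component = [1, -1, 2, -2, 3, -3, 4, -4, 5]   # toy filtered component
blocks = split_into_blocks(component, sample_rate_hz=1000, block_size_ms=2)
levels = representative_levels(blocks)
```

In this toy case a 2-millisecond block at 1000 samples per second holds two data points, so the nine samples yield four complete blocks.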

[0101] In FIG. 10, a matching metric is calculated between the test media content and each reference media content of the selected subset of reference media content at step 1002, based on the computed plurality of representative signal-levels. The process of calculating the matching metrics begins by first computing the correlation values of the filtered components of the test media content and each reference media content corresponding to the plurality of representative signal-levels. Thereafter, the matching metrics are calculated, based on the computed correlation values. The method for computing the correlation values and the matching metrics has already been explained in conjunction with FIGs. 4 and 5.

[0102] At step 1004, it is determined whether the test media content matches at least one reference media content of the selected subset of the plurality of reference media content, based on the matching metrics calculated. The process of determining a match begins by identifying the highest matching metric from the matching metrics calculated for each reference media content of the selected subset of reference media content. Thereafter, a set of reference media content is formed, based on the highest matching metric. The process of forming a set of reference media content, based on the highest matching metric, has already been explained in conjunction with FIGs. 7 and 8. Further, after the set is formed, it is determined whether the highest matching metric is greater than the upper threshold value. For one embodiment, the match is identified as the reference media content corresponding to the highest matching metric if the highest matching metric is greater than the upper threshold value. For another embodiment, before identifying the match, it is determined whether the difference between the highest matching metric and the second-highest matching metric is greater than a difference threshold value. If the difference is greater than the difference threshold value, and the highest matching metric is greater than the upper threshold value, the match is identified as the reference media content corresponding to the highest matching metric. The method mentioned above of determining whether the test media content matches at least one reference media content has been explained in detail in conjunction with FIGs. 7 and 8.

[0103] At step 1006, it is determined whether the block-size value chosen at the step 910 is the lowest block-size value of the predetermined set of block-size values. For example, if the predetermined set of block-size values is 480 milliseconds, 240 milliseconds, 120 milliseconds and 60 milliseconds, and the chosen block-size value is 60 milliseconds, it is determined that the chosen block-size value is the lowest block-size value in the predetermined set of block-size values. However, if the chosen block-size value is 120 milliseconds, it is determined that the block-size value chosen at the step 910 is not the lowest block-size value of the predetermined set of block-size values.

[0104] At step 1008, it is determined whether a match has been found between the test media content and at least one reference media content at the step 1004. This step is performed when it is established at the step 1006 that the chosen block-size value of the step 910 is not the lowest block-size value of the predetermined set of block-size values. For example, if the predetermined set of block-size values is 480 milliseconds, 240 milliseconds, 120 milliseconds and 60 milliseconds, and the chosen block-size value is 240 milliseconds, it is determined whether the test media content matches any reference media content. For one embodiment, if it is determined that no match has been found for the test media content at the step 1004, and the chosen block-size value is not the lowest block-size value of the predetermined set, the steps 906, 908, 910, 912, 914, 1002 and 1004 are performed again. In other words, the steps of computing, selecting, choosing, dividing, computing, calculating and determining are performed iteratively until a predefined criterion has been fulfilled. In the case mentioned above, the predefined criterion is either the step 1006 or the step 1008. Therefore, even if one of the steps 1006 or 1008 is fulfilled, the steps 906, 908, 910, 912, 914, 1002 and 1004 are not performed iteratively.

[0105] If the step 1006 is fulfilled, i.e., if the chosen block-size value is the lowest block-size value among the predetermined set of block-size values, and a match is determined between the test media content and the reference media content at the step 1004, the match determined at the step 1004 is declared the 'final' match for the test media content. However, if the step 1006 is fulfilled and no match is determined at the step 1004, 'no match' is declared for the test media content. The declaration, in this case, can be 'No match available for the audio clip', if the test media content is an audio clip.

[0106] Similarly, if the step 1006 is not fulfilled and the step 1008 is, the match corresponding to the step 1004 is declared the 'final' match between the test media content and the plurality of reference media content. In other words, if the chosen block size is not the lowest block-size value of the predetermined set of block-size values, but a match is found between the test media content and the plurality of reference media content, this match is declared to be the 'final' match. At step 1010, the method for matching the test media content with the plurality of reference media content is terminated.
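
The iteration of FIGs. 9 and 10 can be summarized in a simplified Python sketch. It is one possible reading of the steps and not the claimed implementation: the `score` callable (returning a matching metric for a reference at a given block size), the `keep_above` pruning threshold, the `decide` helper and all numeric values are hypothetical placeholders.

```python
def decide(metrics, upper=0.8, gap=0.2):
    """Return the best reference if it clears both thresholds, else None."""
    ranked = sorted(metrics.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    best_ref, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_ref if best > upper and best - runner_up > gap else None

def coarse_to_fine_match(references, block_sizes_ms, score, keep_above=0.6):
    """Iterate over decreasing block sizes until a match or the smallest size."""
    sizes = sorted(block_sizes_ms, reverse=True)
    candidates = list(references)
    current = sizes[0]
    for finer in sizes[1:]:
        coarse = {r: score(r, current) for r in candidates}              # step 906
        candidates = [r for r, m in coarse.items() if m > keep_above]    # step 908
        fine = {r: score(r, finer) for r in candidates}                  # steps 910 to 1002
        match = decide(fine)                                             # step 1004
        if match is not None:
            return match, finer                                          # step 1008
        current = finer
    return None, sizes[-1]                                               # step 1006

# Toy metrics per (reference, block size in ms); purely illustrative numbers.
table = {("a", 480): 0.70, ("b", 480): 0.65, ("c", 480): 0.20,
         ("a", 240): 0.75, ("b", 240): 0.70,
         ("a", 120): 0.90, ("b", 120): 0.50}
result = coarse_to_fine_match(["a", "b", "c"], [60, 120, 240, 480],
                              lambda r, ms: table.get((r, ms), 0.0))
```

In the toy run, reference 'c' is pruned after the 480-millisecond stage, no match is declared at 240 milliseconds, and reference 'a' is declared the match at 120 milliseconds, so the 60-millisecond stage is never executed.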

[0107] FIGs. 11 and 12 show a flow diagram illustrating a method for matching a test media content with a plurality of reference media content, in accordance with a sixth embodiment of the present invention. Before the method for matching the test media content with the plurality of reference media content is initiated, the test media content and each reference media content of the plurality of reference media content need to be stored in a memory. Only when all the media content is stored can the method for matching the test media content begin.

[0108] To describe the method of FIGs. 11 and 12, reference is made to previous figures, although it should be understood that the described method can be implemented in any other suitable environment as well. Moreover, the invention is not limited to the order in which the steps are listed in the described method.

[0109] At step 1102, the method for matching the test media content with the plurality of reference media content is initiated. As explained in earlier figures, a media content is defined by a plurality of data points, with each data point of the plurality of data points representing a signal-level. The concepts of 'data point' and 'signal-level' have been explained in detail in conjunction with FIG. 2. At step 1104, the test media content and the plurality of reference media content are passed through a set of filters to obtain a set of filtered components of the test media content and each reference media content. The process of passing the test media content and each of the reference media content through a set of filters has already been explained in conjunction with FIG. 4.

[0110] At step 1106, a subset of filters is chosen from the set of filters mentioned above. The chosen subset of filters is different from the subset of filters chosen at any previous stage. For one embodiment, the chosen subset of filters is a superset of a previous subset of filters, chosen during at least one previous stage. Like the method explained in FIGs. 9 and 10, the method, as explained in FIGs. 11 and 12, is iterative in nature, and the steps are performed repeatedly until a predefined criterion is fulfilled. Hence, the term 'previous stage' is also used in this case. The predefined criterion and the steps included in the iteration are described later in the description.

[0111] To better understand the step 1106, consider a case when the set of filters includes six filters, for example, the set of filters can be 1, 2, 3, 4, 5 and 6. For one embodiment, the frequency range corresponding to these six filters is different, and at least one of these six filters is an all-pass filter. Further, if at a previous stage of the method, the set of filters 2, 3 and 4 was chosen to be the subset of filters, then at the present stage, the chosen subset should be a superset of the set 2, 3 and 4. For example, the chosen subset can be 2, 3, 4 and 5. However, if the step 1106 is performed for the first time, the chosen subset of filters can include any filter of the set of filters. For example, if the step 1106 is performed for the first time in the case mentioned above, the chosen subset of filters can be 2. In one embodiment, the subset of filters is chosen, based on the filter-activity value of each filter. The concept and calculation of the filter-activity value is explained in detail in FIG. 13.

[0112] At step 1108, the correlation values of a subset of filtered components of the test media content and a subset of filtered components of each reference media content of the plurality of reference media content are computed. For one embodiment, the subset of filtered components corresponds to the chosen subset of filters. This is better understood with the help of the following example. Consider a case when the set of filters includes six filters, 1, 2, 3, 4, 5 and 6. The test media content and the plurality of reference media content are passed through these six filters to obtain six filtered components of the test media content and each reference media content. The chosen subset of filters can be, for example, 2, 3, 4 and 5. In this case, only the correlation values of the filtered components corresponding to these four filters are computed. For example, only the filtered components of the test media content and the plurality of reference media content corresponding to the second, third, fourth and fifth filters are used to compute the correlation values. The filtered components corresponding to the first and sixth filters are not used to compute the correlation values in this case. The process of computing the correlation values of the filtered components of the test media content and the plurality of reference media content has already been explained in conjunction with FIG. 4.

[0113] At step 1110, a subset of the plurality of reference media content is selected, based on the computed correlation values. As explained for step 908 in conjunction with FIGs. 9 and 10, the selection of the subset of the plurality of reference media content, based on the computed correlation values, begins by first calculating the matching metrics corresponding to each reference media content. Thereafter, only the reference media content with a corresponding matching metric that is greater than a threshold value is selected to form the subset. In another method for selecting the subset, a predefined number of reference media content is eliminated from the plurality of reference media content, based on the calculated matching metrics, and the remaining reference media content of the plurality of reference media content is selected to form the subset. The selection of the subset of the plurality of reference media content, based on the calculated matching metrics, has already been explained in detail in FIGs. 9 and 10.

[0114] At step 1112, the matching metrics corresponding to the reference media content of the selected subset of reference media content are identified. For example, if there are six reference media content in the selected subset of reference media content, and the corresponding matching metrics are 0.4, 0.5, 0.7, 0.2, 0.8 and 0.45, the values of the matching metrics are identified at the step 1112. At step 1114, it is determined whether the test media content matches at least one reference media content, based on the matching metrics identified. The process of determining a match for the test media content, based on identified matching metrics, has already been explained in detail in conjunction with FIGs. 7 and 8 and FIGs. 9 and 10.

[0115] In FIG. 12, it is determined at step 1202 whether the total number of distinct filters in the chosen subset of filters is greater than a predetermined number. This is better understood with the help of the following example. Consider a case where the set of filters is 1, 2, 3, 4, 5 and 6 and the predetermined number is five. A predetermined number of five means that the maximum number of total distinct filters chosen can be five. For example, if the chosen subset of filters is 2 and 4 at the first stage, 2, 4, and 5 at the second stage, and 2, 4, 5 and 6 at the third stage, then at the fourth stage there can be only one more filter that can be added to this set. Further, after the fourth stage no other filter can be added to this set, as the number of filters will be five, i.e. the maximum number of distinct filters that a chosen subset can have. In one embodiment, the predetermined number is set by the programmer of the matching algorithm designing a system for matching the test media content with the plurality of reference media content.

[0116] At step 1204, it is determined whether a match has been found between the test media content and at least one reference media content at the step 1114. This step is performed when it is established at the step 1202 that the total number of distinct filters in the chosen subset of filters is less than the predetermined number. For one embodiment, the steps 1106, 1108, 1110, 1112 and 1114 are performed again if it is determined that no match has been found for the test media content at the step 1114 and the total number of distinct filters in the chosen subset of filters is less than the predetermined number. In other words, the steps of choosing, computing, selecting, identifying and determining are performed iteratively until a predefined criterion is fulfilled. In the case mentioned above, the predefined criterion is either the step 1202 or the step 1204. Therefore, even if one of the steps 1202 or 1204 is fulfilled, the steps 1106, 1108, 1110, 1112 and 1114 are not performed iteratively.

[0117] If the step 1202 is fulfilled, i.e., if the total number of distinct filters in the chosen subset of filters is greater than the predetermined number, and a match is determined between the test media content and the reference media content in the step 1114, the determined match of the step 1114 is declared the 'final' match for the test media content. However, if no match is determined for the test media content at the step 1114, no match is declared for the test media content.

[0118] Similarly, if the step 1204 is fulfilled and the step 1202 is not, the match corresponding to the step 1114 is declared the 'final' match between the test media content and the plurality of reference media content. In other words, if a match is found for the test media content at the step 1114, the match is declared, irrespective of the fact that the total number of distinct filters in the chosen subset of filters is greater or less than the predetermined number. At step 1206, the method for matching the test media content with the plurality of reference media content is terminated.
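
The iteration of FIGs. 11 and 12 can be sketched along the same lines. The sketch assumes that the filters are added one per stage in a fixed order, so that each chosen subset is a superset of the previous one, and that the predetermined number acts as a cap on the number of distinct filters. The `metrics_for` callable, the `decide` helper and all numeric values are hypothetical.

```python
def decide(metrics, upper=0.8, gap=0.2):
    """Return the best reference if it clears both thresholds, else None."""
    ranked = sorted(metrics.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    best_ref, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_ref if best > upper and best - runner_up > gap else None

def match_with_growing_filters(filter_order, first_count, max_filters, metrics_for):
    """Add one filter per stage until a match is found or the cap is reached."""
    count = first_count
    limit = min(max_filters, len(filter_order))
    while True:
        chosen = filter_order[:count]            # superset of earlier subsets (step 1106)
        match = decide(metrics_for(chosen))      # steps 1108 to 1114
        if match is not None:
            return match, chosen                 # step 1204
        if count >= limit:
            return None, chosen                  # step 1202: no further filter may be added
        count += 1

# Toy metric for reference 'a' that improves as more filters are used.
best_by_count = {2: 0.6, 3: 0.7, 4: 0.9}

def toy_metrics(chosen):
    return {"a": best_by_count.get(len(chosen), 0.0), "b": 0.4}

result = match_with_growing_filters([2, 4, 5, 6, 1, 3], 2, 5, toy_metrics)
```

The toy run follows the example of paragraph [0115]: the subsets 2 and 4, then 2, 4 and 5, then 2, 4, 5 and 6 are tried in turn, and a match is declared at the third stage.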

[0119] FIG. 13 is a flow diagram illustrating a method for choosing a subset of filters from a set of filters, in accordance with an embodiment of the present invention. To describe FIG. 13, reference is made to previous figures, although it is understood that the described method can be implemented in any other suitable environment as well. Moreover, the invention is not limited to the order in which the steps are listed in the described method.

[0120] At step 1302, the method for choosing the subset of filters from the set of filters is initiated. For one embodiment, the number of filters in the set of filters is greater than four. For example, the number of filters in the set of filters can be 16. Further, at least one filter of the set of filters is an all-pass filter, and the frequency coverage or frequency range of each filter of the set of filters is based on the sampling frequency of a test media content. The concept of the 'all pass filter' and the selection of the frequency range of filters have already been explained in conjunction with FIG. 3.

[0121] At step 1304, the test media content and a plurality of reference media content are passed through the set of filters to obtain a set of filtered components of the test media content and each reference media content. The test media content and each of the reference media content passing through the set of filters has already been explained in detail in FIG. 4.

[0122] At step 1306, a set of representative signal-levels is calculated for each filtered component of the set of filtered components of the test media content. As explained in FIG. 4, the process of calculating the representative signal-level of a filtered component of the test media content begins by first dividing the filtered component into a predetermined number of blocks of data points. This predetermined number of blocks is based on the length of a block of data points or a block-size value. For one embodiment, the block-size value is preset by the programmer of the matching algorithm. The process of calculating the representative signal-level of the filtered component also includes estimating the representative signal-level corresponding to each of the predetermined number of blocks of data points. After the representative signal-level corresponding to each block of data points is estimated, the representative level corresponding to the entire filtered component is computed. For one embodiment, the representative signal-level corresponding to the entire filtered component is computed by averaging the representative signal-levels corresponding to all the blocks of data points.

[0123] At step 1308, the filter-activity values of each filter of the set of filters are calculated, based on the representative signal-levels of the filtered components of the test media content. For one embodiment, a filter-activity value of a filter signifies the relative signal strength of the test media content in the frequency range corresponding to the filter. This is better understood with the help of the following example. Consider a case where there are six filters in the set of filters and the corresponding filter activity values are 2, 7, 4, 6, 8 and 5 units. In this case, the test media content is expected to have the highest signal strength in frequency range corresponding to the fifth filter of the set of six filters.

[0124] To calculate the filter-activity value of a filter, the average representative signal-level of the test media content, above a 'noise level', is estimated first. The average representative signal-level corresponding to the noise level is estimated by identifying the representative signal-level of the 'quiet' periods of the test media content. For example, for a 120-second-long test media content, there is bound to be a period of at least two or three seconds when the test media content does not have any audible sounds. The only sound during this time period is noise. The representative signal-level of this two- or three-second period denotes the representative signal-level corresponding to the noise level. Typically, the representative signal-level corresponding to the noise level is the lowest representative signal-level of the test media content.

[0125] For one embodiment, after the representative signal-level corresponding to the noise level is estimated, the filter activity value of a filter can be calculated as,

The equation set (1) denotes the formula for calculating the filter-activity value, A(m), of a filter 'm' of 'M' total filters. Here, N_x denotes the number of blocks of data points of the test media content X, S(m) denotes the mean of the representative signal-levels corresponding to the blocks of data points of a filtered component X(m), and Q(m) denotes the minimum representative signal-level of the representative signal-levels corresponding to X(m). The term 'a · P(m)' denotes the peak factor value of the filter 'm' and is used to estimate the 'relative' representative signal strength of X corresponding to the frequency range of the filter 'm', as compared to the other filters of the set of filters. For example, in the formula shown above, the filter 'm' is compared with the filters 'm-1' and 'm+1'. The term P(m) denotes this comparison of 'm' with 'm-1' and 'm+1'. For one embodiment, the value of 'a' is 0.5 in the peak factor value a · P(m).
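
The two per-filter quantities defined above, the mean S(m) and the minimum Q(m) of the representative signal-levels, can be computed as in the following sketch. It deliberately stops short of combining them with the peak factor into A(m), since that combination is given by equation set (1). The function name and the dictionary layout (filter number mapped to per-block levels) are assumptions.

```python
from statistics import mean

def mean_and_floor(levels_per_filter):
    """For each filter m, return (S(m), Q(m)) from its per-block levels."""
    return {m: (mean(levels), min(levels)) for m, levels in levels_per_filter.items()}

# Toy representative signal-levels for two filters, three blocks each.
levels_per_filter = {1: [2.0, 4.0, 6.0], 2: [1.0, 3.0, 2.0]}
stats = mean_and_floor(levels_per_filter)
```

Here Q(m) plays the role of the noise-level estimate described in paragraph [0124], namely the lowest representative signal-level of the filtered component.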

[0126] Those with ordinary skill in the art would appreciate that the formula for calculating the filter-activity value, as described in the last paragraph, is exemplary in nature, and the functioning of the present invention would not change even if a different formula is used to calculate the filter-activity value.

[0127] At step 1310, a subset of filters is chosen from the set of filters, based on the filter-activity values calculated. For one embodiment, only those filters that have a filter-activity value greater than a threshold value are chosen to form the subset of filters. For example, if six filters have filter-activity values of 2, 7, 4, 6, 8 and 5.5 units and the threshold value is five units, only the 'second', 'fourth', 'fifth' and 'sixth' filters are chosen to form the subset of filters. For another embodiment, a predetermined number of filters are eliminated from the set of filters, and the remaining filters are chosen to form the subset of filters. For example, it can be set that '1/3rd' of the six filters are eliminated after the calculation of filter-activity values. In such a case, the two filters corresponding to the lowest two filter-activity values are eliminated from the set of filters. Consequently, the 'first' and 'third' filters are eliminated from the set of filters and the remaining four filters form the subset of filters. At step 1312, the method for choosing the subset of filters from the set of filters is terminated.
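The two selection rules described for step 1310 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function names `select_by_threshold` and `select_by_elimination` are hypothetical, and filters are numbered from 1 so that the result lines up with the 'first', 'second', ... wording of the example.

```python
from typing import List


def select_by_threshold(activity: List[float], threshold: float) -> List[int]:
    """Return 1-based positions of filters whose activity exceeds the threshold."""
    return [i + 1 for i, a in enumerate(activity) if a > threshold]


def select_by_elimination(activity: List[float], n_drop: int) -> List[int]:
    """Drop the n_drop filters with the lowest activity; return the rest (1-based)."""
    # Sort filter indices by ascending activity and mark the lowest n_drop for removal.
    order = sorted(range(len(activity)), key=lambda i: activity[i])
    dropped = set(order[:n_drop])
    return [i + 1 for i in range(len(activity)) if i not in dropped]


# Values taken from the example in the text.
activity_values = [2, 7, 4, 6, 8, 5.5]
kept_by_threshold = select_by_threshold(activity_values, threshold=5)
# One third of six filters is two filters.
kept_by_elimination = select_by_elimination(activity_values, n_drop=len(activity_values) // 3)
```

With the six values from the text, both rules happen to keep the same four filters, which is why the example reaches the same subset either way.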

[0128] FIG. 14 is a flow diagram illustrating a method for matching a test media content with a plurality of reference media content, in accordance with a seventh embodiment of the present invention. To describe FIG. 14, reference is made to previous figures, although it should be understood that the described method can be implemented in any other suitable environment as well. Moreover, the invention is not limited to the order in which the steps are listed in the described method.

[0129] At step 1402, the method for matching the test media content with the plurality of reference media content is initiated. The test media content and each reference media content is defined by a plurality of data points, and each data point of the plurality of data points represents a signal-level. The concept of the 'data point' and 'signal-level' of media content has already been explained in conjunction with FIG. 2.

[0130] At step 1404, the test media content and the plurality of reference media content are divided into a plurality of blocks of data points. The media content are divided, based on the block-size value or the length of a block of data points. The block-size value can be, for example, 120 milliseconds. The concept of dividing specific media content, based on block-size value, has already been explained in conjunction with FIG. 4. At step 1406, the sampling frequency of the test media content is compared with a representative sampling frequency. For one embodiment, the representative sampling frequency is based on the sampling frequencies of the plurality of reference media content. The representative sampling frequency can be a mean of all the sampling frequencies corresponding to the plurality of reference media content. The step 1406 is better understood with the help of the following example.

[0131] Consider a case where there are four reference media content with corresponding sampling frequencies of 8 Kilohertz (KHz), 8.2 KHz, 8.1 KHz and 8.5 KHz. In this case, the representative sampling frequency can be, for example, (8 + 8.2 + 8.1 + 8.5)/4, i.e., 8.2 KHz. After the calculation of the representative sampling frequency, the sampling frequency of the test media content is compared with the calculated value, and it is determined whether the sampling frequency of the test media content is greater than or less than the representative sampling frequency. For example, if the sampling frequency of the test media content is 8.1 KHz, it is determined to be less than the representative sampling frequency.
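The comparison at step 1406 can be illustrated with a short sketch. It assumes the representative sampling frequency is the arithmetic mean, which the text offers as one option; the function name `compare_sampling_frequency` is hypothetical, and frequencies are expressed in whole Hertz to keep the arithmetic exact.

```python
from statistics import mean
from typing import Sequence, Tuple


def compare_sampling_frequency(test_hz: int, reference_hz: Sequence[int]) -> Tuple[float, float, str]:
    """Compare the test sampling frequency with the mean reference frequency.

    Returns (representative frequency, signed difference, relation).
    """
    representative = mean(reference_hz)
    difference = test_hz - representative
    if difference > 0:
        relation = "greater"
    elif difference < 0:
        relation = "less"
    else:
        relation = "equal"
    return representative, difference, relation


# The four reference frequencies from the example, in Hz (8, 8.2, 8.1 and 8.5 KHz).
result = compare_sampling_frequency(8100, [8000, 8200, 8100, 8500])
```

The signed difference is returned as well as the relation because the later steps base the block-size change, the overlap, or the spacing on that difference.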

[0132] At step 1408, at least one of modifying the block-size value, overlapping the blocks and spacing-out of blocks of the test media content or the reference media content is performed. This is done when it is determined at the step 1406 that the sampling frequency of the test media content is different from the representative sampling frequency. For the purpose of discussion, all three processes will be explained one by one. To begin with modifying the block-size value, either the block-size value of the test media content is modified or the block-size value of the reference media content is modified. For one embodiment, the block-size value is modified, based on the difference between the sampling frequency of the test media content and the representative sampling frequency. This is better understood with the help of the following example.

[0133] Consider a case when the test media content has a sampling frequency of F_TEST and the reference media content has a sampling frequency of F_REF. An error in the sampling frequency can be denoted by Δ_F. For one embodiment, the error can be, for example,

$$\Delta_F = \frac{F_{TEST} - F_{REF}}{F_{REF}}$$

If the earlier block size value is B, the modified block size value can be, for example,

$$B' = B \cdot (1 + \Delta_F)$$

For one embodiment, a block size is always chosen as an integral value. Therefore, the term 'B · Δ_F' is always modified according to the greatest integer function. For example, if the value of B · Δ_F is 1.4, this value is taken as '1'.
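As a worked illustration of this adjustment, the sketch below computes B' = B · (1 + Δ_F), with Δ_F = (F_TEST − F_REF) / F_REF and the term B · Δ_F rounded down to an integer. The function name `modified_block_size` is hypothetical, the block size is assumed to be counted in data points, and frequencies are given in whole Hertz so that integer division yields the greatest-integer result without floating-point error.

```python
def modified_block_size(block_size: int, f_test_hz: int, f_ref_hz: int) -> int:
    """Return B' = B + floor(B * (F_TEST - F_REF) / F_REF)."""
    # Python's // operator rounds toward negative infinity, which is the
    # greatest integer function. How the source treats a negative B * delta_F
    # is not stated, so that case is an assumption of this sketch.
    correction = (block_size * (f_test_hz - f_ref_hz)) // f_ref_hz
    return block_size + correction


# B * delta_F = 224 * 50 / 8000 = 1.4, which is taken as 1, as in the text.
small_example = modified_block_size(224, 8050, 8000)

# A 120 ms block at 8000 Hz holds 960 data points; B * delta_F = 6 exactly.
typical_example = modified_block_size(960, 8050, 8000)
```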

[0134] In one embodiment, when it is determined at step 1406 that the sampling frequency of the test media content is lower than the representative sampling frequency, the blocks of the test media content or the plurality of reference media content are overlapped. The amount of overlapping is based on the difference between the sampling frequency of the test media content and the representative sampling frequency. Similarly, when it is determined at step 1406 that the sampling frequency of the test media content is greater than the representative sampling frequency, the blocks of the test media content or the plurality of reference media content are spaced out. As in the case of overlapping, the spacing-out of blocks is based on the difference between the sampling frequency of the test media content and the representative sampling frequency. The concept of 'overlapping' and 'spacing-out' is made clear by the illustration shown in FIG. 16.

[0135] At step 1410, it is determined whether the test media content matches at least one reference media content, based on the modified block-size value. The method followed for determining a match between the test media content and the plurality of reference media content for the modified block-size value is the same as the method followed for the initial block-size value. Therefore, after the block-size value is modified, the method, as described in FIG. 4, is followed to match the test media content with the plurality of reference media content. At step 1412, the method for matching the test media content with the plurality of reference media content is terminated.

[0136] FIG. 15 illustrates the modification of the block size with varying sampling frequencies, in accordance with an embodiment of the present invention. Typically, when test media content is an audio clip, its sampling frequency is 8 KHz. Any audio media content deviating from this value of sampling frequency is assumed to have an error, Δ_F, as explained in conjunction with FIG. 14. This error of sampling frequency is also reflected in the block size of the media content. In other words, the block size of media content differs from its 'normal' size when the sampling frequency of the media content deviates from the value of 8 KHz.

[0137] As shown in FIG. 15, 'case 1' depicts 'normal' media content with a sampling frequency of 8 KHz. The corresponding blocks are shown as 'block 0', 'block 1' and 'block 3'. However, if the sampling frequency of the media content is less than 8 KHz, the corresponding block size is reduced from its normal size. This is shown as 'case 2' in FIG. 15, with the corresponding 'faulty' blocks being 'block′ 1', 'block′ 2' and 'block′ 3'. To rectify this error in the block size, the faulty block size must be modified according to the method explained in FIG. 14.

[0138] If the sampling frequency of the media content is greater than 8 KHz, the corresponding block size is increased from the normal block size. This is shown as 'case 3' in FIG. 15, with the corresponding blocks being 'block″ 1', 'block″ 2' and 'block″ 3'. As in 'case 2', to rectify the error in block size, the 'faulty' block size must be modified according to the method explained in FIG. 14.

[0139] FIG. 16 illustrates the overlapping and spacing-out of blocks of varying sampling frequencies, in accordance with an embodiment of the present invention. As explained in conjunction with FIG. 15, 'normal' media content has a sampling frequency of 8 KHz. This is shown as 'case 1' in FIG. 16. The corresponding blocks are shown as 'Block 0', 'Block 1' and 'Block 2'.

[0140] If the sampling frequency of a media content, especially an audio media content, is different from 8 KHz, the media content is said to have an error in sampling frequency. As illustrated in 'Case 2' and 'Case 3' of FIG. 16, the sampling frequency can be less than or greater than 8 KHz. When the sampling frequency is less than 8 KHz, as shown in 'Case 2', the blocks of the media content are overlapped over each other. The amount of overlapping is based on the error in sampling frequency. For example, if a media content has a sampling frequency of 7.9 KHz, it is said to have an error of 0.1 KHz. The corresponding amount of overlapping will be based on the value of 0.1 KHz.

[0141] On the other hand, when the sampling frequency of the media content is greater than 8 KHz, as shown in 'Case 3', the blocks of the media content are spaced out from each other. Similar to overlapping, the amount of spacing-out is based on the error in the sampling frequency of the media content. For example, if a media content has a sampling frequency of 8.2 KHz, the amount of spacing-out will be based on the value of 0.2 KHz.
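The text states only that the amount of overlapping or spacing-out is 'based on' the sampling-frequency error and does not give a formula. The sketch below therefore makes an explicit assumption: the block length stays fixed, and the distance between consecutive block starts (the hop) is scaled by the relative error, so a frequency below the nominal 8000 Hz yields overlapping blocks and a frequency above it yields gaps. The function name `block_starts` and the scaling rule are illustrative only.

```python
from typing import List


def block_starts(n_points: int, block_size: int, f_hz: int, f_nominal_hz: int = 8000) -> List[int]:
    """Return start indices of fixed-length blocks whose spacing follows the frequency error."""
    # Assumed rule: hop = B + floor(B * (f - f_nominal) / f_nominal).
    hop = block_size + (block_size * (f_hz - f_nominal_hz)) // f_nominal_hz
    if hop <= 0:
        raise ValueError("frequency error too large for this block size")
    # Only blocks that fit entirely within the data are produced.
    return list(range(0, n_points - block_size + 1, hop))


normal = block_starts(3000, 960, 8000)      # hop equals the block size
overlapped = block_starts(3000, 960, 7900)  # hop is 12 points shorter than a block
spaced_out = block_starts(3000, 960, 8200)  # hop is 24 points longer than a block
```

Under this assumed rule, the 7.9 KHz case overlaps neighbouring blocks by 12 data points and the 8.2 KHz case leaves a 24-point gap between them.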

[0142] FIG. 17 illustrates a block diagram of an exemplary server 1700, in accordance with an embodiment of the present invention. Those skilled in the art would appreciate that the server 1700 may include all or some of the components shown in FIG. 17. Further, those ordinarily skilled in the art would understand that the server 1700 may include additional components that are not shown here since they are not germane to the operation of the server 1700, in accordance with the inventive arrangements. To describe the server 1700, reference is made to the previous figures, although it is understood that the server 1700 can be used in any other suitable environment or network.

[0143] The server 1700 can include a set of filters 1702 and a processor 1704. For one embodiment, the number of filters in the set of filters 1702 can be greater than four. For example, the number of filters in the set of filters 1702 can be 16. Further, at least one filter of the set of filters 1702 has zeroes at 0 Hertz (Hz) and a predetermined frequency value. For one embodiment, the predetermined frequency value is half the sampling frequency of a test media content. The test media content can be, for example, a recorded audio clip or an audio message. The concept of the 'at least one filter' mentioned above is better understood with the help of the following example. Consider a case where the test media content is an audio clip and has a sampling frequency of 8000 Hertz (Hz). In this case, the predetermined frequency value can be, for example, 4000 Hz, i.e., 8000/2. Therefore, corresponding to the frequency value of 4000 Hz, at least one filter of the set of filters 1702 has zeroes at 0 Hz and 4000 Hz. Practically, this filter acts as an all-pass filter. Furthermore, the frequency range corresponding to the rest of the filters is also selected, based on the sampling frequency of the test media content. The selection of the frequency range of a filter, based on the sampling frequency of the media content, has already been explained in FIG. 3.

[0144] The set of filters 1702 can be configured to generate a set of filtered components of the test media content and a set of filtered components of each of the plurality of reference media content. The set of filtered components is obtained when the test media content and the plurality of reference media content are passed through the set of filters 1702. This is better understood with the help of the following example. Consider a case when the set of filters 1702 has 16 filters and the test media content and each reference media content is passed through them. After the media content is passed through the filters, a set of 16 filtered components of the test media content and each reference media content is obtained. The entire process of passing media content through the set of filters 1702, to obtain the filtered components of the media content, has already been explained in detail in FIG. 4.

[0145] When the test media content and each reference media content are passed through the set of filters 1702, the filtered components thus obtained are sent to the processor 1704. For one embodiment, the processor 1704 can include a calculator 1706, a matching engine 1708 and a selector 1710. The calculator 1706 can be configured to compute a plurality of representative signal-levels corresponding to a plurality of blocks of data points of a filtered component of a media content. The process for calculating the representative signal-levels corresponding to a filtered component has already been explained in conjunction with FIG. 3. For one embodiment, the functioning of the calculator 1706 is the same as the functioning of the representative signal-level calculator 304, as explained in FIG. 3.
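As an illustration of what the calculator 1706 computes, the sketch below divides one filtered component into blocks and takes the energy of each block as its representative signal-level, which is the option recited in claim 2. The function name `block_energies` is hypothetical, and dropping an incomplete trailing block is an assumption of this sketch.

```python
from typing import List, Sequence


def block_energies(data_points: Sequence[float], block_size: int) -> List[float]:
    """Return one energy value (sum of squares) per complete block of data points."""
    n_blocks = len(data_points) // block_size  # an incomplete final block is ignored
    return [
        sum(x * x for x in data_points[b * block_size:(b + 1) * block_size])
        for b in range(n_blocks)
    ]


# Seven data points with a block size of three give two complete blocks.
levels = block_energies([1, 2, 3, 4, 5, 6, 7], block_size=3)
```

Working with one value per block instead of one value per data point is what shortens the sequences that are later correlated.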

[0146] The calculator 1706 can be configured to compute the correlation values of the computed plurality of representative signal-levels of the filtered components of the test media content and each reference media content. The process followed by the calculator 1706 to compute the correlation values, based on the computed representative signal-levels, is the same as that described at the step 410 of FIG. 4.

[0147] For one embodiment, the calculator 1706 can be configured to compute a matching metric for each reference media content of the plurality of reference media content. The matching metrics are computed, based on the computed correlation values of the filtered components of the test media content and the plurality of reference media content. The process of calculating the matching metrics, based on correlation values, has already been explained in conjunction with FIG. 5.

[0148] The calculator 1706 can be configured to compute the filter-activity value of each filter of the set of filters 1702. As explained in conjunction with FIG. 13, the filter-activity value of a filter denotes the signal strength of the test media content in a frequency range corresponding to the filter. For example, a filter-activity value of 'five', corresponding to a filter 'm', signifies more signal strength, as compared to a filter-activity value of 'three'. The concept of filter-activity value and the process of computing it, followed by the calculator 1706, have already been explained in conjunction with FIG. 13.

[0149] As mentioned earlier, the processor 1704 also includes the matching engine 1708. The matching engine 1708 can be configured to determine whether the test media content matches at least one reference media content of the plurality of reference media content. For one embodiment, the matching engine 1708 determines a match for the test media content, based on the computed matching metrics for the plurality of reference media content. The process followed by the matching engine 1708 to determine the match for the test media content is the same as that explained in FIGs. 6, 7 and 8.

[0150] The processor 1704 includes the selector 1710, which can be configured to perform a plurality of functions. One of these functions can be, for example, choosing a block-size value from a predetermined set of block-size values. For example, if the predetermined set of block-size values is 480 milliseconds, 120 milliseconds and 60 milliseconds, the selector 1710 can choose the block-size value of 120 milliseconds from the set. Another function of the selector 1710 can be, for example, selecting a subset of the reference media content from the plurality of reference media content. For one embodiment, the subset of the reference media content is selected, based on a plurality of correlation values computed by the calculator 1706. As explained at the step 908 of FIG. 9, to select the subset of reference media content, matching metrics are first calculated for each reference media content, based on the computed correlation values. Thereafter, only those reference media content that satisfy a predefined criterion are selected to form the subset. The predefined criterion can be, for example, a minimum threshold matching metric or the elimination of a certain number of reference media content, based on their matching metrics. The entire process of selecting the subset of reference media content, based on their matching metrics, has already been explained in conjunction with FIG. 9.

[0151] The selector 1710 can also be configured to choose a subset of filters from the set of filters 1702, based on the filter-activity values computed by the calculator 1706. For one embodiment, only those filters that have a filter-activity value greater than a threshold value are chosen to form the subset of filters. For example, consider a case where the set of filters 1702 includes six filters, and the corresponding filter-activity values are 3, 5, 1, 6, 7 and 8 units. The threshold value can be assumed to be, for example, four. In this case, only the 'second', 'fourth', 'fifth' and 'sixth' filters are chosen to form the subset of filters.

[0152] The selector 1710 can also be configured to identify the offset value of specific reference media content, based on a plurality of correlation values. As explained in FIG. 4 and FIG. 5, the offset value of a reference media content corresponds to a 'lag value', and the lag value yielding the maximum correlation value for the reference media content is the 'identified' offset value. The entire process followed by the selector 1710 to identify the offset value, based on the calculated correlation values, is the same as the method explained in FIG. 5.
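The offset identification described here can be sketched as a search over lag values. The text says only that correlation values are computed for offset versions of the reference and that the lag with the maximum correlation is the identified offset; the use of a normalized (Pearson) correlation and the names `normalized_correlation` and `identify_offset` are assumptions made for this illustration.

```python
import math
from typing import Sequence, Tuple


def normalized_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return num / den if den > 0 else 0.0


def identify_offset(test_levels: Sequence[float], ref_levels: Sequence[float]) -> Tuple[int, float]:
    """Return (lag, correlation) for the lag that maximizes the correlation."""
    n = len(test_levels)
    best_lag, best_corr = 0, -math.inf
    # Each lag selects one offset version of the reference, of the same length as the test.
    for lag in range(len(ref_levels) - n + 1):
        corr = normalized_correlation(test_levels, ref_levels[lag:lag + n])
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


# Hypothetical representative signal-levels: the test sequence is an excerpt of the reference.
reference = [1, 3, 2, 8, 5, 9, 4, 7, 6, 2]
test = reference[3:7]
lag, corr = identify_offset(test, reference)
```

In the full method this search would run per filtered component, with the resulting correlation values feeding the matching metric; the sketch shows a single component only.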

[0153] Apart from the components described above, the server 1700 can also include a memory 1712. The memory 1712 can be configured to store the test media content and the plurality of reference media content before the method for determining a match for the test media content is initiated. The memory 1712 can also be configured to store the filtered components of the test media content and the reference media content obtained when the media content are passed through the set of filters 1702. For one embodiment, the memory 1712 can also be configured to store the plurality of representative signal-levels, the correlation values and the matching metrics corresponding to each reference media content.

[0154] FIG. 18 illustrates a block diagram of an exemplary server 1800, in accordance with another embodiment of the present invention. Those skilled in the art would appreciate that the server 1800 may include all or some of the components shown in FIG. 18. Further, those ordinarily skilled in the art would understand that the server 1800 may include additional components that are not shown here since they are not germane to the operation of the server 1800, in accordance with the inventive arrangements. To describe the server 1800, reference is made to previous figures, although it should be understood that the server 1800 can be used in any other suitable environment or network.

[0155] The server 1800 can include a comparator 1802, a processor 1804 and a matching engine 1806. For one embodiment, all these components of the server 1800 are connected to each other, as shown in FIG. 18. For example, the comparator 1802 can be connected to the processor 1804 and the processor 1804 to the matching engine 1806.

[0156] For one embodiment, the comparator 1802 can be configured to compare the sampling frequency of the test media content with a representative sampling frequency. As explained in FIG. 14, the representative sampling frequency is based on the sampling frequencies of a plurality of reference media content. For example, the representative sampling frequency can be the mean of all the sampling frequencies corresponding to the plurality of reference media content. In this case, the comparator 1802 compares the sampling frequency of the test media content with the mean of the sampling frequencies of the plurality of reference media content. This is better understood with the help of the following example. Consider a case where there are five reference media content with corresponding sampling frequencies of 8030 Hertz (Hz), 8000 Hz, 7960 Hz, 8000 Hz and 8010 Hz. The representative sampling frequency, in this case, will be 8000 Hz. Further, if the test media content has a sampling frequency of 8050 Hertz (Hz), the comparator 1802 compares the value of 8050 Hz with the value of 8000 Hz. In this case, the comparator 1802 establishes that the sampling frequency of the test media content is greater than the representative sampling frequency by 50 Hz. This value of '50 Hz' is then passed to the processor 1804.

[0157] For one embodiment, the processor 1804 can be configured to modify the block-size value of either the test media content or the plurality of reference media content when it is established by the comparator 1802 that the sampling frequency of the test media content is different from the representative sampling frequency. As explained in FIG. 14, before comparing the sampling frequency of the test media content with the representative sampling frequency, the test media content and the plurality of reference media content are divided into a plurality of blocks of data points that correspond to the block-size value or a 'length of block of data points'. For one embodiment, this division of the test media content and the plurality of reference media content is performed by the processor 1804. However, if it is determined that the sampling frequency of the test media content is different from the representative sampling frequency, the processor 1804 modifies the block-size value of the test media content or the reference media content and divides the corresponding media content again, based on the modified block-size value.

[0158] For one embodiment, the block-size value is modified, based on the difference between the sampling frequency of the test media content and the representative sampling frequency. Continuing with the earlier example, the processor 1804 modifies the block-size value, based on the value of '50 Hz'. The entire process of modifying the block-size value of a media content, based on the difference in the sampling frequencies, has been explained in detail in conjunction with FIG. 14.

[0159] As mentioned earlier, apart from the comparator 1802 and the processor 1804, the server 1800 includes the matching engine 1806. This matching engine 1806 can be configured to determine whether the test media content matches at least one reference media content of the plurality of reference media content, based on the modified block-size value. As explained in FIG. 14, the method followed by the matching engine 1806 to determine a match between the test media content and the plurality of reference media content, based on the modified block-size value, is the same as the method described in FIG. 4. For one embodiment, the functionality of the matching engine 1806 is the same as the functionality of the matching engine 1708, as described in FIG. 17.

[0160] FIG. 19 illustrates the frequency response of a plurality of filters, in accordance with an embodiment of the present invention. In the figure, the frequency response of 16 filters is shown. The frequency range corresponding to the all-pass filter is 0 Hz to 4000 Hz. This is based on the assumption that the sampling frequency of the test media content is 8000 Hertz (Hz). As explained in FIG. 17, for one embodiment, the frequency range of the all-pass filter is half of the sampling frequency of the test media content. Therefore, the frequency range corresponding to the all-pass filter is 0 Hz to 4000 Hz, i.e., 8000/2.

[0161] As explained in FIG. 3, the frequency range of all the filters is selected, based on the expected sampling frequency of the test media content. Typically, the frequency range of the filters is selected to cover the entire frequency range of test media content. Further, the selected frequency range is above the noise or reverberation level of the test media content. Based on the simulation results for a sampling frequency of 8000 Hz, the noise level of the test media content is determined to be 400 Hz. Therefore, the frequency range selected for the filters is greater than 400 Hz. Furthermore, based on the simulation results, 16 filters are found to be the optimum number of filters to design a matching algorithm for the test media content and a plurality of reference media content.

[0162] All the 16 filters shown in FIG. 19 have common zeroes at 0 and π radians. Further, except for the all-pass filter, all the other filters have zeroes as well as poles. The all-pass filter only has zeroes and no poles. For one embodiment, the filters, H(z), can be represented as

$$H_0(z) = G_0 \cdot S(z), \qquad (1)$$

where

$$S(z) = 1 - z^{-2}$$

The equation (1) corresponds to the all-pass filter. The gains G_m, the normalized frequencies f_m and the pole magnitudes r_m are chosen to achieve unit peak gain, an adequate frequency coverage and an adequate overlap between the filters. Based on the simulation results for the sampling frequency of the test media content of 8000 Hz, the adequate frequency coverage is found to be 400 Hz to 3400 Hz and the adequate overlap to be -6 dB.
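The stated properties of this filter can be checked numerically. The sketch below assumes a zero-only filter of the form G_0 · (1 − z^{-2}), which places zeroes at 0 and π radians as the text describes; the gain value G_0 = 0.5 is not given in the source and is derived here from the stated goal of unit peak gain. The function names are hypothetical.

```python
import cmath
import math


def omega_from_hz(f_hz: float, fs_hz: float) -> float:
    """Convert a frequency in Hz to normalized angular frequency in radians per sample."""
    return 2 * math.pi * f_hz / fs_hz


def magnitude(omega: float, gain: float = 0.5) -> float:
    """Magnitude of gain * (1 - z^-2) evaluated on the unit circle, z = e^{j*omega}."""
    z_inv = cmath.exp(-1j * omega)
    return abs(gain * (1 - z_inv ** 2))


fs = 8000  # sampling frequency assumed in the text, in Hz
at_dc = magnitude(omega_from_hz(0, fs))          # zero at 0 Hz
at_nyquist = magnitude(omega_from_hz(4000, fs))  # zero at half the sampling frequency
at_peak = magnitude(omega_from_hz(2000, fs))     # peak at a quarter of the sampling frequency
```

Because |1 − e^{-j2ω}| = 2|sin ω|, the response peaks at ω = π/2 with a value of 2, so a gain of 0.5 normalizes the peak to 1.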

[0163] FIG. 20 illustrates a percentage of correct matches with varying block sizes, in accordance with an embodiment of the present invention. The selection of a block size has an overall effect on the matching algorithm designed to match a test media content with a plurality of reference media content. A lower block-size value gives more accurate results, and hence, results in an enhanced performance of the matching algorithm. However, in the simulation results, it was observed that below a certain block size, the performance of the matching algorithm actually decreases. This could be due to the fact that as the block size is reduced, the noise and reverberation begin affecting the performance of the matching algorithm.

[0164] Based on the experiments, it was observed that there is an improvement in the performance of the matching algorithm when the block size is reduced, but the performance saturates beyond a certain block size. FIG. 20 shows the results of the experiment. The figure shows the percentage of correct matches with decreasing block sizes. As is clearly visible, the percentage of correct matches saturates below a block size of 100 milliseconds.

[0165] FIG. 21 illustrates the percentage of wrong matches with varying block sizes, in accordance with an embodiment of the present invention. As already explained in FIG. 20, beyond a certain block size, the performance of the matching algorithm saturates. Experiments were conducted to determine the optimum block size for the matching algorithm. FIG. 20 shows the percentage of correct matches with decreasing block sizes, whereas FIG. 21 shows the percentage of wrong matches with decreasing block sizes. As is apparent, the performance of the matching algorithm saturates beyond a block size of 100 milliseconds.

[0166] Based on the graphs shown in FIG. 20 and FIG. 21, the optimum block-size value is determined to be 60 milliseconds to 80 milliseconds. Further, it can be observed from the graphs that the number of correct matches is small for larger block sizes. This signifies that although the performance is poor for larger block-size values, the complexity of the matching engine is considerably reduced. This fact forms the basis of the method for matching the test media content with the plurality of reference media content, as described in FIGs. 9 and 10.

[0167] Various embodiments, as described above, provide a method and system for matching media content. The present invention provides a matching algorithm, which provides enhanced accuracy while determining a match for a test media content, as compared to conventional matching algorithms. Further, the present matching algorithm is less complex than conventional matching algorithms, and can, therefore, be used to improve the efficiency of servers that implement it. With improved efficiency, more matching requests can be simultaneously served by the servers.

[0168] It will be appreciated that embodiments of the invention described herein may comprise one or more conventional processors and unique stored program instructions that control the one or more processors, to implement, in conjunction with certain non-processor circuits, some, most or all of the functions of the embodiments of the invention described herein. The non-processor circuits may include, but are not limited to, a radio receiver, a radio transmitter, signal drivers, clock circuits, power-source circuits and user-input devices. Some or all of the functions can be implemented by a state machine that has no stored program instructions, or in one or more application-specific integrated circuits (ASICs), in which each function or some combinations of certain of the functions are implemented as custom logic. Of course, a combination of these approaches could be used. Thus, means for these functions have been described herein. In situations where functions of the embodiments of the invention can be implemented by using a processor and stored program instructions, it will be appreciated that a means for implementing such functions is the media that stores the stored program instructions, be it magnetic storage or a signal conveying a file. Further, it is expected that one with ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology and economic considerations, when guided by the concepts and principles disclosed herein, will be readily capable of generating such stored program instructions and ICs with minimal experimentation.

[0169] In the foregoing specification, the invention and its benefits and advantages have been described with reference to specific embodiments. However, one with ordinary skill in the art would appreciate that various modifications and changes can be made, without departing from the scope of the present invention, as set forth in the claims. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present invention. The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage or solution to occur or become more pronounced are not to be construed as critical, required or essential features or elements of any or all the claims. The invention is defined solely by the appended claims, including any amendments made during the pendency of this application, and all equivalents of those claims, as issued.

[0170] The Abstract of the disclosure is provided to comply with 37 C.F.R. §1.72(b), which requires an abstract that would enable a reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, it can be seen in the foregoing Detailed Description that various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than those that are expressly recited in each claim. On the contrary, as the following claims reflect, the inventive subject matter lies in less than all the features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separately claimed subject matter.

Claims

CLAIMS

We claim:

1. A method for matching a test media content with a plurality of reference media content, wherein a media content is defined by a plurality of data points, and wherein each data point of the plurality of data points represents a signal-level, the method comprising: passing the test media content and the plurality of reference media content through a set of filters to obtain a set of filtered components of the test media content and a set of filtered components of each reference media content of the plurality of reference media content; dividing each filtered component of the set of filtered components of the test media content and each filtered component of the set of filtered components of each reference media content to obtain a plurality of blocks of data points for each filtered component of the set of filtered components of the test media content and each filtered component of the set of filtered components of each reference media content; computing a plurality of representative signal-levels corresponding to the plurality of blocks of data points for each filtered component of the set of filtered components of the test media content and the plurality of blocks of data points for each filtered component of the set of filtered components of each reference media content; computing correlation values between the plurality of representative signal-levels corresponding to the plurality of blocks of data points for each filtered component of the set of filtered components of the test media content and the plurality of representative signal-levels corresponding to a plurality of blocks of data points for a plurality of offset versions of at least one filtered component of at least one reference media content; and determining whether the test media content matches with at least one reference media content of the plurality of reference media content based on the computed correlation values.

2. The method as recited in claim 1 wherein the representative signal-level of a block of data points is obtained by computing an energy value for the block of data points.

3. The method as recited in claim 1 wherein determining whether the test media content matches with at least one reference media content comprises identifying an offset value for a reference media content, wherein the identified offset value is chosen based on the computed correlation values for the filtered components of the reference media content and the filtered components of the test media content.

4. The method as recited in claim 3 wherein the identified offset value corresponds to an offset version of a filtered component of the reference media content having a maximum correlation value from the correlation values computed for the filtered components of the reference media content and the filtered components of the test media content.

5. The method as recited in claim 3 wherein determining whether the test media content matches with at least one reference media content further comprises: determining a matching metric for the reference media content based on a set of correlation values computed for the identified offset value; and identifying the match between the test media content and the plurality of reference media content based on matching metrics computed for each reference media content.

6. The method as recited in claim 5 wherein the matching metric is a probability metric.

7. The method as recited in claim 5 wherein identifying the match between the test media content and the plurality of reference media content based on the computed matching metrics comprises selecting a reference media content from the plurality of reference media content corresponding to a highest matching metric selected from the computed matching metrics.

8. The method as recited in claim 5 wherein identifying the match between the test media content and the plurality of reference media content based on the computed matching metrics comprises: identifying a reference media content corresponding to each of a highest matching metric and a second-highest matching metric selected from the computed matching metrics; determining whether the highest matching metric is greater than a first threshold value; determining whether a difference between the highest matching metric and the second-highest matching metric is greater than a second threshold value; and declaring the match between the test media content and the plurality of reference media content when the highest matching metric is greater than the first threshold value and the difference between the highest matching metric and the second-highest matching metric is greater than the second threshold value, wherein the match is the reference media content corresponding to the highest matching metric.

9. The method as recited in claim 5 wherein identifying the match between the test media content and the plurality of reference media content based on the computed matching metrics comprises: identifying a reference media content corresponding to a highest matching metric selected from the computed matching metrics; determining whether the highest matching metric is greater than a matching threshold value; and declaring the match between the test media content and the plurality of reference media content when the highest matching metric is greater than the matching threshold value, wherein the match is the reference media content corresponding to the highest matching metric.

10. The method as recited in claim 5 wherein identifying the match between the test media content and the plurality of reference media content based on the computed matching metrics comprises: determining a highest matching metric from the matching metrics computed for each reference media content of the plurality of reference media content; forming a first set of reference media content from the plurality of reference media content based on the highest matching metric; and forming a second set of reference media content from the plurality of reference media content, wherein the second set comprises the reference media content not included in the first set of reference media content.

11. The method as recited in claim 10 wherein forming the first set of reference media content comprises: determining one or more matching metrics greater than a matching-metric threshold value, wherein the matching-metric threshold value is based on the highest matching metric; and forming the first set of reference media content corresponding to the determined one or more matching metrics.

12. The method as recited in claim 10 wherein forming the first set of reference media content comprises: determining a set of difference values between the highest matching metric and the matching metric for each of the reference media content of the plurality of reference media content; selecting one or more difference values less than a set-threshold from the determined set of difference values; identifying one or more matching metrics corresponding to the selected one or more difference values; and forming the first set of reference media content corresponding to the identified one or more matching metrics.

13. The method as recited in claim 10 wherein identifying the match between the test media content and the plurality of reference media content based on the computed matching metrics further comprises: selecting a first matching metric corresponding to the first set of reference media content; selecting a second matching metric corresponding to the second set of reference media content; determining whether the first matching metric is greater than an upper threshold value; determining whether a difference between the first matching metric and the second matching metric is greater than a difference threshold value; and declaring the match between the test media content and the plurality of reference media content when the first matching metric is greater than the upper threshold value and the determined difference is greater than the difference threshold value, wherein the match is the reference media content corresponding to the first matching metric.

14. The method as recited in claim 13 wherein the first matching metric is selected from the group consisting of: the highest matching metric of the one or more matching metrics corresponding to the first set of reference media content, a mean value of the one or more matching metrics corresponding to the first set of reference media content, and a median value of the one or more matching metrics corresponding to the first set of reference media content.

15. The method as recited in claim 13 wherein the second matching metric is a highest matching metric of one or more matching metrics corresponding to the second set of reference media content.