WO2007020544A2 - Method and apparatus for extracting feature information from a multimedia file - Google Patents
- Publication number
- WO2007020544A2 (PCT/IB2006/052588)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- multimedia file
- event
- analysis window
- data
- occurrence
- Prior art date
Classifications
- G06F16/40 — Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/634 — Querying; query formulation; query by example, e.g. query by humming
- G06F16/639 — Querying; presentation of query results using playlists
- G06F16/683 — Retrieval characterised by using metadata automatically derived from the content
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Feature information from a multimedia file is extracted from an analysis window. To ensure that feature information is extracted from a relevant portion of the file, the location of an analysis window is determined according to the occurrence of an event, steps 101 to 117.
Description
Method and apparatus for extracting feature information from a multimedia file
TECHNICAL FIELD
The present invention relates to a method and apparatus for extracting feature information from a multimedia file. The feature information extracted may be used to classify the multimedia file. In particular, but not exclusively, it relates to identifying audio files (songs) from a large collection of songs which are similar to a seed song to assist a user in compiling playlists that are similar to a given song.
BACKGROUND OF THE INVENTION
Many systems exist that assist users in compiling playlists. Many such systems compare all available songs to a seed song. To achieve this, all candidate songs in the collection are analysed and classified in accordance with a number of extracted features, which are stored in a database. The corresponding features of a seed song are extracted and compared to those stored in the database. The matching features identified in the database point to the matching/similar candidate songs in the collection and, if desired, these can be added to the playlist.
The features of the audio files (songs) are extracted by means of a feature extraction algorithm. These algorithms are generally very expensive in terms of required computational power, especially when integrated in consumer devices.
Invariably, these known algorithms analyse the whole audio file, extracting features (e.g. MFCC coefficients) from frames or chunks of data at a time. The extraction process is very time-consuming because complex operations must be performed for every frame. These algorithms process the audio file from beginning to end, extracting a feature vector in respect of each frame (known as a local feature vector). These local feature vectors are collated for the whole file and averaged; in this way, the average feature vector represents the song being analysed. The average feature vector f for M features has the form
f = [f1, f2, ..., fM]
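This frame-wise averaging can be sketched as follows (a minimal illustration; the extractor producing the local vectors is assumed, since the text does not prescribe one):

```python
def average_feature_vector(local_vectors):
    """Average per-frame (local) feature vectors into the single average
    feature vector f = [f1, ..., fM] that represents the whole song.

    `local_vectors` is assumed to be a non-empty list of equal-length
    per-frame vectors produced by some feature extractor (e.g. MFCCs).
    """
    n, m = len(local_vectors), len(local_vectors[0])
    # Element-wise mean over all frames for each of the M features.
    return [sum(vec[k] for vec in local_vectors) / n for k in range(m)]
```

For instance, averaging the local vectors [1, 3] and [3, 5] yields [2.0, 4.0].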
With a well-chosen set of features, each music track or audio file can be classified as belonging to a given music genre. The distance between two such average feature vectors is an indication of how similar the corresponding songs are. To be more specific, let f_i and f_j be the average feature vectors corresponding to the i-th and j-th song, respectively, and let N be the total number of items in the database. Given the M×M data covariance matrix C defined as having components
the distance between the i-th and the j-th song is given by
A distance of zero means that both songs are equal (actually, that they have the same average features); a small distance indicates that they are similar songs (actually, that they have similar average features) whereas a large distance indicates that the songs are not related.
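The covariance and distance equations themselves are not reproduced in this text. A standard covariance-weighted (Mahalanobis) formulation consistent with the surrounding description is sketched below; treating it as the patent's exact formula is an assumption:

```python
import numpy as np

def song_distance(f_i, f_j, database):
    """Covariance-weighted distance between two average feature vectors.

    `database` is the N x M matrix of all songs' average feature vectors;
    its M x M covariance C whitens the feature space so that no single
    feature dominates the distance. (This Mahalanobis form is assumed for
    illustration; the patent's equations are not reproduced above.)
    """
    C = np.cov(np.asarray(database), rowvar=False)   # M x M data covariance
    diff = np.asarray(f_i, dtype=float) - np.asarray(f_j, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(C) @ diff))
```

As the text states, identical average feature vectors give a distance of zero, and the measure is symmetric in the two songs.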
Furthermore, it has been recognised that songs belonging to a given genre will have local feature vectors that are normally distributed around the average feature vector. Therefore, there is no need for extracting the local feature vectors from the whole song but from a representative part of the song. However, it is desirable to extract local feature vectors from relevant parts of the song and not from irrelevant parts such as silence or background noise since these will lower the average values.
The problem here is to choose the appropriate part of the song, which is representative for the song as a whole.
Current implementations of extraction algorithms take one of a few approaches: they analyse the whole song, they analyse a fixed portion of audio from the middle of the song, or they analyse a fixed portion of audio after skipping a fixed portion. Of these, the first is very inefficient but has the largest probability of representing the song well in its average. However, including the intro and outro of the song, plus potential zero-valued samples, can lead to average feature vectors that are not the best choice for the given song.
The other two approaches tend to solve the above problem by choosing a region in the middle of the song or after a given time. However, they may fail in that they do not take the actual music content into consideration.
Furthermore, many audio files are stored in a compressed data format, for example MPEG-1 (MP3), MPEG-2 or MPEG-4 AAC, to maximise the available storage capacity of portable devices used for playback. Existing extraction algorithms have to decode the data of these files before extracting the feature information, which requires additional computational resources.
SUMMARY OF THE INVENTION
The present invention aims to reduce the amount of processing required to extract feature information from a multimedia file. This is achieved according to an aspect of the present invention by a method for extracting feature information from a multimedia file, the method comprising the steps of: determining the location of an analysis window of the multimedia file in accordance with occurrence of an event within the multimedia file; and extracting the feature information from data within the analysis window. Since the occurrence of an event is determined, such as, for example, the maximum energy of the data of the multimedia file or the first occurrence of the signal amplitude of the data exceeding a predetermined threshold, the chosen region of the file for analysis is more likely to comprise a relevant portion which takes the content of the file into consideration. Extracting the feature information in this way produces an average feature vector which is representative of the file and prevents analysis from taking place in regions with low amplitude.
In the case of compressed data format in which the multimedia file comprises a plurality of frames, each frame having a global gain associated therewith, the event is the maximum global gain. In utilising the existing global gain values of the frames, the creation of the analysis window can be easily established by merely parsing the header of the frames and reading the global gain values without decoding the whole file which speeds up the extraction of the feature information and reduces the computational resources required.
BRIEF DESCRIPTION OF DRAWINGS
For a more complete understanding of the present invention and by way of example, reference is made to the following description taken in conjunction with the accompanying drawings, in which: Figure 1a is a flow diagram of a first embodiment of the present invention;
Figure 1b is a flow diagram of a second embodiment of the present invention;
Figure 2 is a schematic diagram of an example of frame structure of compressed data of an audio file;
Figure 3 illustrates determination of an analysis window according to an embodiment of the present invention; and
Figure 4 is a schematic diagram of apparatus according to a further embodiment of the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
A method according to an embodiment of the present invention will now be described with reference to Fig. 1a. At step 101, the duration of an input multimedia file, t0, is tested. If the input multimedia file is shorter than a predetermined time interval, t1, for example 15 seconds, an error is generated, step 103, since the file is considered too short for feature extraction. If the input file is longer than t1, the method proceeds to step 105 for feature extraction.
In step 105, the duration of the input multimedia file, t0, is tested again. If the duration of the input file, t0, is less than the predetermined duration of an analysis window, t2, for example 90 seconds, the entire input file becomes the analysis window, step 107. If the duration of the input file is greater than the predetermined duration of an analysis window, t2, the input file is scanned at step 109 for the maximum signal amplitude or acoustic energy level of the acoustic signal of the data contained in the multimedia file. The location a1 of the maximum signal amplitude, energy level, etc., of the acoustic signal is determined at step 111.
The input file is then scanned again, step 113, to establish the extreme of the analysis window, a2. This is determined either as a time interval t2 subsequent to a1, that is, a2 = a1 + t2, step 115, or, alternatively, as the point when the signal amplitude or power of the acoustic signal of the multimedia file first reaches a level 6 dB below the maximum, step 125.
The location of the analysis window is then determined at step 117. The analysis window may be located between a1 and a2 or, alternatively, centred at a1.
Following this, the feature vectors are extracted from the data within the analysis window, step 119. The extracted feature vectors are averaged, step 121, and stored, step 123, in a feature database for later reference.
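Steps 101 to 117 of Fig. 1a can be sketched as follows. The default values of t1 and t2 (15 and 90 seconds) follow the description; the function name and sample-rate handling are illustrative assumptions:

```python
def locate_analysis_window(signal, sr, t1=15.0, t2=90.0):
    """Return (start, end) sample indices of the analysis window (Fig. 1a).

    t1: minimum file duration in seconds (step 101); t2: analysis-window
    duration in seconds (step 105). The window starts at the sample of
    maximum absolute amplitude, a1 (steps 109-111), and ends a time
    interval t2 later, a2 = a1 + t2 (step 115).
    """
    n = len(signal)
    if n < t1 * sr:
        raise ValueError("file too short for feature extraction (step 103)")
    if n <= t2 * sr:
        return 0, n                      # whole file is the window (step 107)
    a1 = max(range(n), key=lambda i: abs(signal[i]))  # peak location
    a2 = min(n, a1 + int(t2 * sr))       # clamp to end of file
    return a1, a2
```

Extraction and averaging (steps 119 to 121) would then be applied only to `signal[start:end]`.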
Fig. 1b illustrates an alternative, preferred embodiment of the present invention. The method according to this embodiment follows the steps in respect of the first embodiment of Fig. 1a, except that step 111 of Fig. 1a is replaced by steps 111-1, 111-2 and 111-3 shown in Fig. 1b.
In the case of a compressed audio file, for example the MP3 format shown in Fig. 2, the absolute value of the maximum signal amplitude or power cannot be determined without decoding the file. To overcome this, the global gain value of every granule of the left and right channels in each frame is read, step 111-1 of Fig. 1b. As illustrated in Fig. 2, a compressed MP3 audio file comprises a plurality of frames 200_1 to 200_n (only frames 200_1 to 200_4 are shown in Fig. 2). Each frame comprises a header portion 201_1 to 201_n, an error-check portion 203_1 to 203_n, a side-information portion 205_1 to 205_n, and a main-data portion 207_1 to 207_n. The side-information portions 205_1 to 205_n comprise a plurality of granules of the left and right channels 209L0_1, 209L1_1, 209R0_1 and 209R1_1. Each granule contains a global gain value.
To avoid decoding the entire frame, the global gain values provided in each frame are read, step 111-1. The global gain values are then filtered, step 111-2, for example using a moving average filter with a depth of 100 granules. The location of the maximum filtered global gain value is determined to provide a1 of the analysis window, step 111-3.
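Steps 111-1 to 111-3 can be sketched as follows, assuming the per-granule global gain values have already been parsed from the side information (the parsing itself is format-specific and omitted here):

```python
def locate_window_start_from_gains(global_gains, depth=100):
    """Find a1: the granule index of the maximum smoothed global gain.

    `global_gains` holds the per-granule global gain values read from the
    MP3 side information (step 111-1). A moving-average filter of the
    given depth smooths out isolated peaks (step 111-2) before the
    maximum is located (step 111-3).
    """
    n = len(global_gains)
    if n < depth:
        raise ValueError("need at least `depth` granules")
    # Slide a summing window; dividing by depth would not change argmax.
    window_sum = sum(global_gains[:depth])
    best_sum, best_start = window_sum, 0
    for start in range(1, n - depth + 1):
        window_sum += global_gains[start + depth - 1] - global_gains[start - 1]
        if window_sum > best_sum:
            best_sum, best_start = window_sum, start
    return best_start + depth // 2        # centre of the best window
```

The filtering matters: a single loud granule is suppressed, so a1 lands in a sustained loud passage rather than on a transient.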
Fig. 3 illustrates a plot of the amplitude of the acoustic signal of an uncompressed audio file over time. In this embodiment, a1 of the analysis window is determined as the first occurrence of the signal amplitude exceeding a predetermined threshold value S_T. The analysis window W is then determined as starting at a1 and lasting a subsequent time interval t2, say 90 seconds.
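A minimal sketch of this threshold variant (the threshold value S_T and the sample rate are assumptions, not values given in the text):

```python
def window_from_threshold(signal, sr, s_t=0.1, t2=90.0):
    """Start the analysis window W at the first sample whose absolute
    amplitude exceeds the threshold S_T, lasting t2 seconds (Fig. 3)."""
    for i, x in enumerate(signal):
        if abs(x) > s_t:                  # first threshold crossing -> a1
            return i, min(len(signal), i + int(t2 * sr))
    raise ValueError("no sample exceeds the threshold S_T")
```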
The apparatus according to an embodiment of the present invention will now be described with reference to Fig. 4. The apparatus 400 comprises a pre-processor 401 connected to an input terminal 402 of the apparatus 400. The pre-processor 401 is connected to a processor 403. The output of the processor 403 is connected to a comparator 409 and to a feature database 405. The outputs of the comparator 409 and the feature database 405 are connected to a multimedia file store 407. The output of the file store 407 is connected to the output terminal 408 of the apparatus 400.
In use, a multimedia file is input on the input terminal 402. The pre-processor 401 scans the input file to determine the location of an analysis window according to steps 101 to 117 (and 125) of Fig. 1a. The input file and the location of the analysis window are fed to the processor 403. The processor 403 executes the feature extraction algorithm in accordance with steps 119 to 123 of Fig. 1a. The feature vectors extracted from the data within the analysis window are averaged and stored in the feature database 405. The multimedia file is stored in the file store 407.
Upon input of a seed song (or multimedia file), the feature vectors are extracted and averaged as described above and are fed to the input of the comparator 409, whereupon the feature vectors of the seed song and the candidate songs stored in the feature database 405 are compared. The comparator 409 determines the distance between the seed song and each candidate song. The candidate songs considered "similar" to the seed song are then selected from the file store 407 and placed on the output terminal 408 of the apparatus 400 to be forwarded to a user interface device or playlist generator (not shown here) for consideration by the user.
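The comparator's ranking role can be sketched as follows; a plain Euclidean distance stands in here for the covariance-weighted measure described earlier, and the `top_k` parameter is an illustrative assumption:

```python
import math

def similar_songs(seed, database, top_k=5):
    """Rank candidate songs by the distance of their average feature
    vectors to the seed's; smaller distance means more similar.

    `database` is a list of average feature vectors (one per candidate).
    Returns the indices of the top_k nearest candidates.
    """
    def dist(f):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(seed, f)))
    order = sorted(range(len(database)), key=lambda i: dist(database[i]))
    return order[:top_k]
```

The returned indices would be used to pull the corresponding files from the file store 407 for the playlist.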
Although preferred embodiments of the present invention have been illustrated in the accompanying drawings and described in the foregoing detailed description, it will be understood that the invention is not limited to the embodiments disclosed, but is capable of numerous modifications without departing from the scope of the invention as set out in the following claims.
Claims
1. A method for extracting feature information from a multimedia file, the method comprising the steps of: determining the location of an analysis window of the multimedia file in accordance with occurrence of an event within the multimedia file; and extracting the feature information from data within the analysis window.
2. A method according to claim 1, wherein the occurrence of the event determines the starting point of the analysis window.
3. A method according to claim 1, wherein the occurrence of the event determines the centre of the analysis window.
4. A method according to any one of the preceding claims, wherein the event is the maximum energy of the data of the multimedia file.
5. A method according to any one of claims 1 to 3, wherein the event is the first occurrence of the signal amplitude of the data of the multimedia file exceeding a predetermined threshold.
6. A method according to any one of claims 1 to 4, wherein the multimedia file comprises compressed data, the compressed data comprising a plurality of frames, each frame having a global gain associated therewith, and the event is the maximum global gain.
7. A method according to any one of the preceding claims, wherein the step of extracting feature information includes the steps of: extracting a plurality of feature vectors from data within the analysis window; and averaging the plurality of feature vectors.
8. A method according to any one of the preceding claims, wherein the duration of the analysis window comprises a predetermined time interval.
9. A method according to any one of claims 1 to 7, wherein the duration of the analysis window is determined on the basis of a change in the event.
10. Apparatus for extracting feature information from a multimedia file comprising: a preprocessor for determining the location of an analysis window in the multimedia file in accordance with occurrence of an event within the multimedia file; and a processor for extracting feature information from data within the analysis window.
11. Apparatus according to claim 10, wherein the preprocessor further comprises scanning means for scanning the multimedia file to determine the occurrence of the event.
12. Apparatus according to claim 10 or 11, wherein the event is the maximum energy of the data of the multimedia file.
13. Apparatus according to claim 10 or 11, wherein the event is the first occurrence of the signal amplitude of the data of the multimedia file exceeding a predetermined threshold.
14. Apparatus according to any one of claims 10 to 12, wherein the multimedia file comprises compressed data, the compressed data comprising a plurality of frames, each frame having a global gain associated therewith and the event is the maximum global gain.
15. Apparatus according to claim 14, wherein the apparatus further comprises means for reading the global gain values for each frame; and a moving average filter for filtering the read global gain values to determine the maximum global gain value.
16. Apparatus according to any one of claims 10 to 15, wherein the processor comprises: extraction means for extracting the feature vectors from the data of the analysis window of the multimedia file; and averaging means for averaging the extracted feature vectors.
17. A computer program product comprising a plurality of program code portions for carrying out the method according to any one of claims 1 to 9.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05107419.3 | 2005-08-12 | ||
EP05107419 | 2005-08-12 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2007020544A2 true WO2007020544A2 (en) | 2007-02-22 |
WO2007020544A3 WO2007020544A3 (en) | 2007-05-31 |
Family
ID=37668113
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2006/052588 WO2007020544A2 (en) | 2005-08-12 | 2006-07-28 | Method and apparatus for extracting feature information from a multimedia file |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2007020544A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3575989A4 (en) * | 2017-02-28 | 2020-01-15 | Samsung Electronics Co., Ltd. | METHOD AND DEVICE FOR PROCESSING MULTIMEDIA DATA |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0529786A2 (en) * | 1991-08-30 | 1993-03-03 | Loral Aerospace Corporation | Apparatus and method for detecting vibration patterns |
- 2006-07-28: PCT/IB2006/052588 filed as WO2007020544A2 (active Application Filing)
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0529786A2 (en) * | 1991-08-30 | 1993-03-03 | Loral Aerospace Corporation | Apparatus and method for detecting vibration patterns |
Non-Patent Citations (4)
Title |
---|
ANON.: "Speech recognition with hidden markov models of speech waveforms" IBM TECHNICAL DISCLOSURE BULLETIN, vol. 34, no. 1, June 1991 (1991-06), pages 7-16, XP000210093 Armonk, NY, USA * |
LU, L. ET AL.: "Content analysis for audio classification and segmentation" IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, vol. 10, no. 7, October 2002 (2002-10), pages 504-516, XP002417102 USA * |
WOLD, E. ET AL.: "Content-based classification, search, and retrieval of audio" IEEE MULTIMEDIA, vol. 3, no. 3, 1996, pages 27-36, XP002417103 * |
XU, C. ET AL.: "Automatic music summarization based on temporal, spectral and cepstral features" PROC. 2002 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, vol. 1, 2002, pages 117-120, XP010604320 Piscataway NJ, USA * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3575989A4 (en) * | 2017-02-28 | 2020-01-15 | Samsung Electronics Co., Ltd. | METHOD AND DEVICE FOR PROCESSING MULTIMEDIA DATA |
US10819884B2 (en) | 2017-02-28 | 2020-10-27 | Samsung Electronics Co., Ltd. | Method and device for processing multimedia data |
Also Published As
Publication number | Publication date |
---|---|
WO2007020544A3 (en) | 2007-05-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10210884B2 (en) | Systems and methods facilitating selective removal of content from a mixed audio recording | |
CN100472515C (en) | System for managing audio information | |
US6990453B2 (en) | System and methods for recognizing sound and music signals in high noise and distortion | |
US20060155399A1 (en) | Method and system for generating acoustic fingerprints | |
US20050249080A1 (en) | Method and system for harvesting a media stream | |
US20060149533A1 (en) | Methods and Apparatus for Identifying Media Objects | |
KR100676863B1 (en) | System and method for providing music search service | |
WO2005122141A1 (en) | Effective audio segmentation and classification | |
Cotton et al. | Soundtrack classification by transient events | |
US20080235267A1 (en) | Method and Apparatus For Automatically Generating a Playlist By Segmental Feature Comparison | |
WO2006132596A1 (en) | Method and apparatus for audio clip classification | |
JP2005532763A (en) | How to segment compressed video | |
JP3757719B2 (en) | Acoustic data analysis method and apparatus | |
US8543228B2 (en) | Coded domain audio analysis | |
CN102214219B (en) | Audio/video content retrieval system and method | |
US7680654B2 (en) | Apparatus and method for segmentation of audio data into meta patterns | |
Kim et al. | Quick audio retrieval using multiple feature vectors | |
CN103294696A (en) | Audio and video content retrieval method and system | |
US7985915B2 (en) | Musical piece matching judging device, musical piece recording device, musical piece matching judging method, musical piece recording method, musical piece matching judging program, and musical piece recording program | |
US8341161B2 (en) | Index database creating apparatus and index database retrieving apparatus | |
WO2007020544A2 (en) | Method and apparatus for extracting feature information from a multimedia file | |
KR101002732B1 (en) | Online Digital Content Management System | |
KR100869643B1 (en) | Summary device, method, and program for realizing MP3 type of flexible sound using music structure | |
Petridis et al. | A multi-class method for detecting audio events in news broadcasts | |
KR101002731B1 (en) | Feature vector extraction method of audio data, computer readable recording medium recording the method and matching method of audio data using same |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 06780235; Country of ref document: EP; Kind code of ref document: A2 |