Hyoung-Gook Kim

Learn More
In this paper, we present an MPEG-7-based audio classification and retrieval technique targeted for analysis of film material. The technique consists of low-level descriptors and high-level description schemes. For low-level descriptors, low-dimensional features such as audio spectrum projection based on audio spectrum basis descriptors is produced in order(More)
In this paper, we present a hybrid speaker-based segmentation, which combines metric-based and modelbased techniques. Without a priori information about number of speakers and speaker identities, the speech stream is segmented by three stages: (1) The most likely speaker changes are detected. (2) To group segments of identical speakers, a two-level(More)
Our challenge is to analyze/classify video sound track content for indexing purposes. To this end we compare the performance of MPEG-7 Audio Spectrum Projection (ASP) features based on several basis decomposition algorithms vs. Mel-scale Frequency Cepstrum Coefficients (MFCC). For basis decomposition in the feature extraction we evaluate three approaches:(More)
In this paper, we present an automatic extraction of goal events in soccer videos by using audio track features alone without relying on expensive-to-compute video track features. The extracted goal events can be used for high-level indexing and selective browsing of soccer videos. The detection of soccer video highlights using audio contents comprises(More)
In this paper, an integrated music recommendation system is proposed, which contains the functions of automatic music genre classification, automatic music emotion classification, and music similarity query. A novel tempo feature, named as log-scale modulation frequency coefficients, is presented in this paper. With AdaBoost algorithm, the proposed tempo(More)
We evaluate the MPEG-7 audio spectrum projection (ASP) features for general sound recognition performance against the well established MFCC. The recognition tasks of interest are speaker recognition, sound classification, and segmentation of audio using sound/speaker identification. For sound classification we use three approaches: direct approach;(More)
This paper presents a phone-based approach of spoken document retrieval (SDR), developed in the framework of the emerging MPEG-7 standard. We describe an indexing and retrieval system that uses phonetic information only. The retrieval method is based on the vector space IR model, using phone N-grams as indexing terms. We propose a technique to expand the(More)
Our purpose is to evaluate the efficiency of MPEG-7 audio descriptors for speaker recognition. The upcoming MPEG-7 standard provides audio feature descriptors, which are useful for many applications. One example application is a speaker recognition system, in which reduced-dimension log-spectral features based on MPEG-7 descriptors are used to train hidden(More)