A Multimodal Approach for Audiovisual Data Segmentation and Annotation

Zhang, Tong; Shih, Hsuan-Huei; Kuo, C.C.

Title:	A Multimodal Approach for Audiovisual Data Segmentation and Annotation
Authors:	Zhang, Tong; Shih, Hsuan-Huei; Kuo, C.C.
Keywords:	audiovisual data segmentation and indexing; audio content analysis; visual content analysis; video database management; information filtering and retrieval
Journal/Conference:	1999 NCS Conference
Abstract:	While most approaches to video segmentation and indexing focus on the pictorial part, the accompanying audio stream also contains significant clues. Only by combining audio and visual information can a fully functional system for video content parsing be achieved. Based on an investigation of the data structures of different video types, we present in this paper a scheme for the segmentation and annotation of audiovisual sequences that includes tools for both audio and visual content analysis. In the proposed system, the video data is segmented into audio scenes and visual shots by detecting abrupt changes in the audio and visual features, respectively. Each audio scene is then indexed as one of the basic audio types, such as speech, music, song, environmental sound, or speech with music background, while each visual shot is represented by keyframes and associated image features. An index table is generated automatically for each video clip by integrating the outputs of the audio and visual analyses. Experimental results show that the proposed approach achieves more meaningful and robust indexing of video content than previous work.
Date:	2006-10-27T07:33:54Z
Category:	1999 NCS (National Computer Symposium)
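
The segmentation and index-table steps summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give their features, distance measure, or thresholds, so the L1 distance, the fixed threshold, the assumption that audio and visual features share one time index, and the function names `detect_boundaries` and `build_index_table` are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def detect_boundaries(features: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return each index i where the feature vector changes abruptly from i-1 to i."""
    boundaries = []
    for i in range(1, len(features)):
        # L1 distance between consecutive feature vectors (an assumed metric).
        dist = sum(abs(a - b) for a, b in zip(features[i - 1], features[i]))
        if dist > threshold:
            boundaries.append(i)
    return boundaries


def build_index_table(
    n_frames: int, audio_bounds: List[int], visual_bounds: List[int]
) -> List[Tuple[int, int, int, int]]:
    """Merge both boundary lists into rows (start, end, audio_scene_id, visual_shot_id)."""
    cuts = sorted(set(audio_bounds) | set(visual_bounds) | {0, n_frames})
    rows = []
    for start, end in zip(cuts, cuts[1:]):
        # A segment's scene/shot id is the number of boundaries at or before its start.
        audio_id = sum(1 for b in audio_bounds if b <= start)
        visual_id = sum(1 for b in visual_bounds if b <= start)
        rows.append((start, end, audio_id, visual_id))
    return rows


# Toy data: one audio feature and two visual features per time step (made up).
audio = [[0.1], [0.1], [0.9], [0.9], [0.9], [0.9]]
visual = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]

audio_bounds = detect_boundaries(audio, threshold=0.5)
visual_bounds = detect_boundaries(visual, threshold=0.5)
table = build_index_table(len(audio), audio_bounds, visual_bounds)
```

In this toy case the audio scene changes at step 2 and the visual shot changes at step 4, so the table holds three segments. A real system would also attach an audio type label to each scene and keyframes to each shot, as the abstract describes.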

Files in this item:

File	Description	Size	Format
ce07ncs001999000043.pdf		1.82 MB	Adobe PDF



Feng Chia University Institutional Repository (逢甲大學校園典藏知識庫)