題名: | 大型多模態模型實現人形機器人智能交互系統開發 |
其他題名: | Development of an Intelligent Interaction System for Humanoid Robots Based on Large Multimodal Models |
作者: | 李韋宏 吳令堯 |
關鍵字: | 人形機器人 大型多模態模型 Generative Pre-trained Transformer Whisper Humanoid Robot Large Multimodal Models |
系所/單位: | 自動控制工程學系, 資訊電機學院 |
摘要: | 中文摘要
隨著現代生活對智能化服務需求的提升,如何讓機器人具備更加自然且直觀的交互能力成為重要課題。本專題結合生成式人工智慧與機器人技術,希望可以構建一套高度智能化的交互系統,應用於教育輔助、醫療護理等高互動性場景,提升人機互動的自然性與效率。
在研究過程中,本系統以 Pepper 機器人作為交互平台,結合 OpenAI 的 Whisper 模型進行語音識別,並使用 GPT-4o 模型實現自然語言生成。影像數據由機器人內建的攝影機捕捉,經過數據編碼後與語音轉錄文本共同傳輸至伺服器進行處理。同時,透過 Prompt Engineering 技術,確保語言生成內容在邏輯性和格式上符合預期,並生成適應特定場景的回應。
本系統在語音識別準確率、語言生成的流暢性與邏輯性、以及機器人動作執行的合理性方面表現良好。特別是語音轉錄文本與影像數據的多模態結合,使交互過程更加自然流暢。本系統具備在日常服務、教育輔助以及醫療護理等場景中的應用潛力,為智能機器人技術的發展提供了新的可能性。 Abstract With the increasing demand for intelligent services in modern life, enhancing the naturalness and intuitiveness of human-robot interaction has become a critical challenge. This study integrates generative artificial intelligence with robotic technology to develop a highly intelligent interaction system, aimed at applications in highly interactive scenarios such as educational assistance and medical care, improving the naturalness and efficiency of human-robot interaction. In this research, the Pepper robot serves as the interaction platform, combining OpenAI's Whisper model for speech recognition and the GPT-4o model for natural language generation. Visual data is captured by the robot's built-in camera, encoded, and processed together with transcribed speech text on the server. Additionally, through Prompt Engineering, the system ensures that the generated language outputs are logical, well-formatted, and tailored to specific scenarios. The system demonstrates strong performance in speech recognition accuracy, the fluency and coherence of language generation, and the appropriateness of robot motion execution. Notably, the multimodal integration of transcribed speech text and visual data makes the interaction process more natural and seamless. This system has significant potential for applications in daily services, educational assistance, and medical care, offering new possibilities for the advancement of intelligent robotic technologies. |
學年度: | 113學年度第一學期 |
開課老師: | 黃, 清輝 |
課程名稱: | 機器人學 |
系所: | 自動控制工程學系, 資訊電機學院 |
分類: | 資電113學年度 |
文件中的檔案:
檔案 | 描述 | 大小 | 格式 | |
---|---|---|---|---|
1131-13.pdf | 1.03 MB | Adobe PDF | 檢視/開啟 |
在 DSpace 系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。