

Title
利用多尺度感興趣區域之細微關係提供圖片字幕
Parallel title
Image Captioning Based on Fine-grained Relationships with Multiscale Regions of Interest
Authors 林亮宇 (Liang-Yu Lin), 林朝興 (Chow-Sing Lin)
Chinese abstract (translated)

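With the rapid development of machine learning, image captioning techniques continue to improve. Recent image captioning work has introduced Region Proposal Networks (RPN) and attention mechanisms. By using an RPN to extract specific object regions from an image, a captioning model lowers the probability that noise is treated as visual features; the attention mechanism lets the model focus more on the mapping from objects to words. Current results still have a shortcoming, however: both RPN and attention mechanisms focus on single object regions and lack finer-grained visual features between objects. As a result, caption generators produce ambiguous relationship descriptions. To make the relationship descriptions generated by image captioning more fine-grained, this study proposes an image captioning model based on relationship features from multi-scale regions of interest (ROIs) between different objects. The architecture comprises an RPN, Fully Convolutional Neural Networks (FCNN), and Long Short-term Memory (LSTM) units. Compared with existing work, in addition to object regions, we further extract multi-scale ROIs between different objects as visual features. Because some multi-scale ROIs are noise, they are filtered using Intersection-over-Union (IoU). Each ROI first passes through the FCNN to extract visual features; a fusion mechanism and a sorting network then produce sorted fused features; finally, an LSTM learns the mapping from these features to a complete sentence. During training, additional auxiliary supervision from hierarchical attributes lets the caption generator learn how to generate fine-grained attributes. The proposed architecture can describe object actions with more precise verbs on dynamic images, and it achieves higher scores on n-gram-based metrics.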
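The IoU-based filtering mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which reference box the candidate ROIs are compared against or what threshold is used, so the choices below are assumptions made only for illustration: boxes are (x1, y1, x2, y2) tuples, the reference is the smallest box enclosing both objects (the hypothetical helper `union_box`), and `min_iou` is a hypothetical threshold.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def area(box: Box) -> float:
    """Area of an axis-aligned box."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def union_box(a: Box, b: Box) -> Box:
    """Smallest box enclosing both objects (assumed reference region)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def filter_rois(candidates: List[Box], reference: Box, min_iou: float) -> List[Box]:
    """Keep candidates whose IoU with the reference is at least min_iou (assumed rule)."""
    return [c for c in candidates if iou(c, reference) >= min_iou]


# Two object boxes and three candidate multi-scale ROIs (made-up coordinates).
person, ball = (0, 0, 4, 4), (2, 2, 6, 6)
candidates = [(0, 0, 6, 6), (1, 1, 5, 5), (10, 10, 12, 12)]
kept = filter_rois(candidates, union_box(person, ball), min_iou=0.4)
print(kept)
```

With these made-up boxes, the candidate far from both objects has zero overlap with the reference and is discarded, while the two candidates covering the region between the objects are kept.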

English abstract

With the rapid development of machine learning, Image Captioning techniques are becoming more and more advanced. Recent research on Image Captioning has introduced Region Proposal Networks (RPN) and the attention mechanism. Through an RPN, we can extract features of specific object regions in the image and reduce the probability of noise being treated as visual features. The attention mechanism makes models focus more on the mapping between objects and captions. However, current research results have deficiencies: both RPN and the attention mechanism focus only on single object regions rather than on fine-grained visual features between objects. These deficiencies cause caption generators to produce ambiguous relationship descriptions. In this paper, to improve the granularity of relationship descriptions in Image Captioning, we propose an Image Captioning model that generates sentences using multi-scale regions of interest (ROIs) between two different objects. Our proposed architecture includes Region Proposal Networks, Fully Convolutional Neural Networks (FCNN), and Long Short-term Memory (LSTM) cells. Compared with existing research, we extract as visual features not only object regions but also multi-scale ROIs between two different objects. Some of the multi-scale ROIs are noise, which can be screened out using Intersection-over-Union (IoU). Each ROI is passed through the FCNN to extract visual features; a fusion mechanism and a sorting network then yield sorted fusion features, and finally an LSTM learns the transformation from these features to a whole sentence. With hierarchical attribute supervision during training, the caption generator can focus on learning how to generate fine-grained attributes. The architecture proposed in this study can use more precise verbs to describe object actions in dynamic pictures. Furthermore, our architecture achieves higher scores on n-gram-based metrics.
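The abstract does not name the n-gram-based metrics it uses. As a rough illustration of how such metrics reward precise wording, the sketch below computes clipped n-gram precision, the building block of BLEU-style scores. The function names and the example sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: List[str], references: List[List[str]], n: int) -> float:
    """Fraction of candidate n-grams found in the references, with counts clipped
    to the maximum number of times each n-gram occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return matched / total


# Made-up captions: a generated sentence and one reference sentence.
candidate = "a man kicks a ball".split()
references = ["a man kicks the ball".split()]
p1 = clipped_precision(candidate, references, 1)
p2 = clipped_precision(candidate, references, 2)
print(p1, p2)
```

Under this kind of metric, a caption that uses the same verb as the reference ("kicks") matches more unigrams and bigrams than one with a vaguer verb, which is consistent with the abstract linking more precise verbs to higher n-gram scores.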

 

Pages 019-038
Keywords Image Captioning; Region Proposal Networks; Multi-scale ROIs; Long Short-term Memory cells
Journal 理工研究國際期刊
Issue October 2023 (Vol. 13, No. 2)
Publisher National University of Tainan (國立臺南大學)
 
