Journal: IAENG International Journal of Computer Science
Print ISSN: 1819-656X
Electronic ISSN: 1819-9224
Publication year: 2021
Volume: 48
Issue: 4
Language: English
Publisher: IAENG - International Association of Engineers
Abstract: Weakly-supervised temporal action localization aims to identify all action instances and their corresponding categories in untrimmed videos. Because training uses only video-level labels, the problem is more challenging. Existing attention-based action localization methods use an attention module to identify action segments and assign them to the appropriate action categories. However, such methods inevitably recognize as actions many background segments that resemble the target actions. To address this issue, we propose a new weakly-supervised temporal action localization network using background suppression (BS-WTAL). The network defines three modules: a filtering module, which suppresses the activation of background regions; a classification module, which identifies the activity categories; and a generative attention module, which learns to model a segment-wise representation. This enables BS-WTAL to accurately distinguish actions from the background. Furthermore, we conduct ablation studies from different perspectives. Extensive experiments are performed on two datasets: THUMOS14 and ActivityNet1.2. Our approach exhibits better performance on both datasets and achieves performance comparable to state-of-the-art fully-supervised methods.