期刊名称:International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN:2158-107X
电子版ISSN:2156-5570
出版年度:2020
卷号:11
期号:11
DOI:10.14569/IJACSA.2020.0111112
出版社:Science and Information Society (SAI)
摘要:Event detection of rare and complex events in large video datasets or in unconstrained user-uploaded videos on internet is a challenging task. The presence of irregular camera movement, viewpoint changes, illumination variations and significant changes in the background make extremely difficult to capture underlying motion in videos. In addition, extraction of features using different modalities (single streams) may offer computational complexities and cause abstraction of confusing and irrelevant spatial and semantic features. To address this problem, we present a single stream (RGB only) based on feature of spatial and semantic features extracted by modified 3D Residual Convulsion Network. We combine the spatial and semantic features based on this assumption that difference between both types of features can discover the accurate and relevant features. Moreover, introduction of temporal encoding builds the relationship in consecutive video frames to explore discriminative long-term motion patterns. We conduct extensive experiments on prominent publically available datasets. The obtained results demonstrate the great power of our proposed model and improved accuracy compared with existing state-of-the-art methods.