Abstract: This paper makes the VISTA database, composed of inertial and visual data, publicly available for gesture and activity recognition. The inertial data were acquired with the SensHand, which can capture the movement of the wrist, thumb, index and middle fingers, while the RGB-D visual data were acquired simultaneously from two different points of view, front and side. The VISTA database was acquired in two experimental phases: in the former, the participants were asked to perform 10 different actions; in the latter, they had to execute five scenes of daily living, each corresponding to a combination of the selected actions. In both phases, Pepper interacted with the participants. The two camera points of view mimic the different points of view of Pepper. Overall, the dataset includes 7682 action instances for the training phase and 3361 action instances for the testing phase. It can be seen as a framework for future studies on artificial intelligence techniques for activity recognition, using inertial-only data, visual-only data, or a sensor fusion approach.