The main aim of this paper is to introduce a multimodal annotation schema suitable for multimodal corpus analysis at the levels of speech and hand gestures; the schema also captures the semantic relation between speech and hand gestures. The first part of the paper delineates the annotation schema, which builds on McNeill's (1992) and Kendon's (2004) definitions of gesture and on the gesture semantic taxonomies of Alibali et al. (2001) and Kong et al. (2015). The transcription of speech follows the conversation-analytic conventions of Jefferson (2004). The second part of the paper presents a case study designed to test the applicability of the proposed annotation schema. The corpus analysed consists of political debate shows, and the case study focuses on verbal aggression.