Journal: International Journal of Hybrid Information Technology
Print ISSN: 1738-9968
Publication year: 2015
Volume: 8
Issue: 2
Pages: 247-256
DOI: 10.14257/ijhit.2015.8.2.23
Publisher: SERSC
Abstract: A semantic block is treated as the unit of analysis for a webpage. First, we implement the VTPS algorithm to partition a webpage into semantic blocks. We then propose an algorithm that extracts spatial and content features and constructs a feature vector for each block. Based on these vectors, the SVM learning algorithm is trained and used to classify the various theme-oriented webpage blocks. Finally, classification experiments show the efficiency of this method.