Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2013
Volume: 4
Issue: 2
Pages: 341-344
Publisher: TechScience Publications
Abstract: Because XML documents have an inherent structure, conventional text classification methods cannot be applied to them directly. In this paper, we address learning with XML documents from three major research areas. First, knowledge representation: we use a method based on a typed higher-order logic formalism, focusing on how to represent an XML document as higher-order logic terms that capture both its content and its structure. Second, symbolic machine learning: we propose a new decision-tree learning algorithm for XML document classification that is driven by the precision/recall breakeven point (PRDT). A precision/recall heuristic is used for XML document classification because XML documents are strongly connected to text documents. Finally, we present a semi-supervised learning algorithm based on the PRDT algorithm and the co-training framework. Preliminary results show that our framework achieves good performance with both machine learning techniques while producing comprehensible theories.
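The abstract does not spell out PRDT's split criterion. As a rough illustration only, the sketch below computes the precision/recall breakeven point of a ranked list of documents, using the common convention that precision and recall coincide once the top R documents (R = number of relevant documents) are predicted positive. The function name `pr_breakeven_point` is hypothetical, and how PRDT applies this measure when growing a tree is not shown here.

```python
from typing import Sequence


def pr_breakeven_point(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Precision/recall breakeven point of a ranking (hypothetical helper).

    Examples are sorted by descending score. With R relevant examples in total,
    predicting the top R as positive makes precision = recall = TP / R, so that
    shared value is returned. This is an illustrative sketch, not PRDT itself.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # R: total number of relevant (label 1) examples.
    num_pos = sum(1 for y in labels if y == 1)
    if num_pos == 0:
        return 0.0  # breakeven point is undefined; report 0 by convention

    # Rank example indices by descending score; Python's sort is stable,
    # so tied scores keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Count relevant examples among the top R: both precision and recall
    # at this cutoff equal true_pos / R.
    true_pos = sum(1 for i in order[:num_pos] if labels[i] == 1)
    return true_pos / num_pos


if __name__ == "__main__":
    print(pr_breakeven_point([0.9, 0.8, 0.7, 0.4, 0.2], [1, 0, 1, 1, 0]))
```

For example, ranking five documents with labels [1, 0, 1, 1, 0] by scores [0.9, 0.8, 0.7, 0.4, 0.2] gives a breakeven point of 2/3: two of the top three documents are relevant, so precision and recall are both 2/3 at that cutoff.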