期刊名称:International Journal of Computer Trends and Technology
电子版ISSN:2231-2803
出版年度:2013
卷号:4
期号:3-2
出版社:Seventh Sense Research Group
摘要:Extensible Markup Language (XML) has been used as standard format for a data representation over the internet. An XML document is usually organized by a set of textual data according to a predefined logical structure. Due to the presence of inherent structure in the XML documents, conventional text classification methods cannot be used to classify XML documents directly. In this paper, we propose the learning issues with XML documents from three major research areas. First, a knowledge representation method, which is based on typed higher order logic formalism. Here, the main focus is how to represent an XML document using higher order logic terms where both its contents and structures are captured. Secondsymbolic machine learning. Here, a new decisiontree learning algorithm determined by precision/recall breakeven point (PRDT) for the XML document classification problem. Precision/recall heuristic is considered in xml document classification is that the xml documents have strong connections with text documents. Finally, we had a semisupervised learning algorithm which is based on the PRDT algorithm and the cotraining framework. By producing comprehensible theories, the tentative results exhibit that our framework is capable to attain good performance in both the machine learning techniques.