Abstract: To make effective use of a large amount of unlabeled data together with a small amount of labeled data in document classification, this paper proposes a novel semi-supervised learning algorithm called optimal Laplacian regularized least square (OLapRLS). The algorithm first obtains data-adaptive edge weights by solving an ℓ1-norm optimization problem; it then derives the normalized graph Laplacian to reveal the intrinsic manifold structure of the documents; finally, it adopts a Nyström-based low-rank approximation to reduce the computational cost of manipulating the large kernel matrix. Experimental results on three well-known document datasets demonstrate the effectiveness and efficiency of the proposed OLapRLS algorithm.
Keywords: document classification; semi-supervised learning; optimal Laplacian regularized least square (OLapRLS); kernel low-rank approximation
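
As a rough illustration of the second step named in the abstract, the sketch below builds a normalized graph Laplacian, L = I - D^(-1/2) W D^(-1/2), from a symmetric edge-weight matrix. It is not the paper's implementation: the function name `normalized_laplacian` and the toy weights are assumptions made for this example, and in OLapRLS the weights would come from the ℓ1-norm optimization step rather than being set by hand.

```python
import math

def normalized_laplacian(W):
    """Return L = I - D^(-1/2) W D^(-1/2) for a symmetric, non-negative weight matrix W."""
    n = len(W)
    degrees = [sum(row) for row in W]
    # Nodes with zero degree get a scaling factor of 0 to avoid division by zero.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    return [
        [(1.0 if i == j else 0.0) - inv_sqrt[i] * W[i][j] * inv_sqrt[j]
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical 3-document chain graph; the weights are made up for illustration.
W = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
L = normalized_laplacian(W)
```

The normalization rescales each edge by the degrees of its endpoints, so documents with many strong neighbors do not dominate the regularizer.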