Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Document (text) classification is a common technique in e-business, helping users with tasks
such as document collection, analysis, categorization, and storage. Semantic analysis can help
improve the performance of document classification. Although semantics has been considered in
the design of earlier methods for automatic document classification, it deserves more attention
given the growing number of content-rich electronic documents, forum posts, and blogs online,
where automatic classification can reduce human workload by a large margin. This paper proposes a
novel semantic document classification approach that addresses two types of semantic
problems: (1) the polysemy problem, through a novel semantic similarity computing strategy (SSC),
and (2) the synonym problem, through a novel strong correlation analysis method (SCM).
Experiments show that these strategies can help improve the performance of the baseline
methods.