Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Electronic ISSN: 2320-9801
Year of publication: 2017
Volume: 5
Issue: 4
Page: 7388
DOI: 10.15680/IJIRCCE.2017.05040155
Publisher: S&S Publications
Abstract: The proposed approach produces a summary using an HMM, forming K-Means clusters of meaningful words and their relationships using TF-IDF, which gives more information related to the document. This provides a better summary than existing algorithms. The approach we have built is a cluster-based summarization system whose knowledge comes from the clustering. This knowledge consists not only of recognizing important phrases in the document, but also of recognizing the relationships, and the relationship types, that exist between them. The extracted knowledge is represented in hierarchical form. Even without the summary, just looking at the nodes and relationships in the thematic graph gives a rough idea of what the document is talking about; the summary then supplies the actual details. This makes the method an intuitive one. Improvements and further experimentation would make the existing system more reliable than it is now. The proposed approach can be extended with automatic generation of summarized data based on an aspect-oriented model, which will give more efficient document summarization and increase pre-processing speed and accuracy.
Keywords: Term Frequency – Inverse Document Frequency (TF-IDF); Machine Learning (ML); Web Mining; K-Means Clustering; Hidden Markov Model (HMM); Information Retrieval.
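
The abstract describes clustering TF-IDF-weighted text with K-Means and drawing the summary from the clusters. The following is a minimal sketch of that general idea, not the authors' implementation: it works at sentence level, omits the HMM and the thematic graph entirely, and uses a smoothed IDF formula, a deterministic centroid initialization, and function names (`tfidf_vectors`, `kmeans`, `summarize`) that are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_vectors(sentences):
    """Return one TF-IDF vector per sentence (assumed smoothed IDF)."""
    docs = [Counter(tokenize(s)) for s in sentences]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    # Document frequency: number of sentences containing each word.
    df = Counter(w for d in docs for w in d)
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for d in docs:
        total = sum(d.values()) or 1
        vectors.append([d[w] / total * idf[w] for w in vocab])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-Means; the first k vectors seed the centroids (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def summarize(sentences, k=2):
    """Pick, per cluster, the sentence closest to the cluster centroid."""
    k = min(k, len(sentences))
    vectors = tfidf_vectors(sentences)
    labels, centroids = kmeans(vectors, k)
    chosen = []
    for c in range(k):
        members = [i for i, l in enumerate(labels) if l == c]
        if members:
            chosen.append(
                min(members, key=lambda i: squared_distance(vectors[i], centroids[c]))
            )
    # Keep the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, k=3)` on a list of sentence strings returns at most three of them, in document order. Seeding the centroids with the first `k` vectors keeps the sketch deterministic; a practical system would more likely use randomized or k-means++ initialization.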