Journal: International Journal of Advanced Research in Computer Science and Software Engineering
Print ISSN: 2277-6451
Electronic ISSN: 2277-128X
Publication year: 2013
Volume: 3
Issue: 7
Publisher: S.S. Mishra
Abstract: A raw server log file contains much valuable information about internet user transactions. Extracting meaningful, hidden patterns from it requires mining techniques from the web-objects perspective. These techniques are divided into three main approaches: (1) classification, (2) association rule mining, and (3) clustering. Raw log file transactions cannot be assigned to predefined classes, so classification is not suitable for generating patterns from a raw log file. Research shows that association rule mining generates many unnecessary rules, which makes pattern analysis very difficult. The clustering approach to web mining is an ideal solution for generating meaningful patterns from a raw log file. Clustering techniques are divided into two groups: (1) hierarchical clustering and (2) partitioning relocation clustering. Hierarchical clustering is not efficient because most of its algorithms do not revisit clusters once they are constructed, whereas partitioning clustering improves its clusters through relocation. Clustering builds clusters based on a distance metric, and generating a distance metric from string sequences is a very complex task. This paper studies different approaches to building a distance metric. It uses well-known bioinformatics algorithms for protein sequence similarity to generate a distance metric and compares them with the well-known edit distance algorithm. It also provides a new approach to forming clusters in the context of a prediction model for web caching and prefetching.
Keywords: Web Caching; Web Prefetching; Data Mining; Clustering; Association Rule Mining; Distance Metric; Fuzzy C-Means
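
The abstract names the edit distance as the baseline for building a distance metric over string sequences. As a minimal sketch of that idea, and not the paper's own implementation, the code below computes the Levenshtein edit distance between two sessions and fills a pairwise distance matrix that a clustering step could consume. The representation of a session as a list of requested page paths, the function names `edit_distance` and `distance_matrix`, and the sample sessions are all assumptions made for illustration.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance with unit cost for insert, delete, and substitute."""
    # prev[j] is the distance between the prefix of `a` processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x from a
                curr[j - 1] + 1,     # insert y into a
                prev[j - 1] + cost,  # substitute x with y, or match
            ))
        prev = curr
    return prev[-1]


def distance_matrix(sessions: List[Sequence[str]]) -> List[List[int]]:
    """Symmetric matrix of pairwise edit distances between sessions."""
    n = len(sessions)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = edit_distance(sessions[i], sessions[j])
    return d


# Hypothetical sessions: each is the ordered list of pages one user requested.
sessions = [
    ["/home", "/news", "/sports"],
    ["/home", "/sports"],
    ["/login", "/cart"],
]
matrix = distance_matrix(sessions)
```

Treating each page request as one symbol keeps the sketch close to the string-sequence framing in the abstract. The protein sequence similarity algorithms the paper compares against would replace `edit_distance` while leaving `distance_matrix` unchanged.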