Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Electronic ISSN: 2320-9801
Year: 2013
Volume: 1
Issue: 4
Publisher: S&S Publications
Abstract: Web spam refers to techniques that manipulate the ranking algorithms of web search engines and cause them to rank pages higher than they deserve [1]. Spam web pages may pretend to provide assistance or facts about a particular subject, but the help is often meaningless and the information shallow. Recently, the amount of web spam has increased dramatically, leading to a degradation of search results. Today's search engines use variations of the fundamental ranking methods that feature some degree of spam resilience. PageRank is one of them: it not only counts the number of hyperlinks referring to a web page but also takes the PageRank of the referring page into account. This concept, however, has proven to be vulnerable to manipulation [12]. TrustRank overcomes the problems of PageRank but relies on human operators to judge a seed set of pages as spam or not spam. There are situations in which an operator cannot assign a crisp value to a page; in such cases, human sentiment is involved in deciding whether the page is spam. Our work reveals the human sentiment involved in the judgment of the seed set. We also propose a model that minimizes the involvement of human sentiment by employing fuzzy logic in the seed selection process.
Keywords: PageRank; TrustRank; Fuzzy Logic; Spam; Search Engine; Web Graph
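
Illustrative sketch (not taken from the paper): the abstract describes PageRank as propagating rank along hyperlinks, and TrustRank as a variant that starts from a human-judged seed set. The Python below is a minimal sketch of that idea under stated assumptions. The function name `biased_pagerank`, the toy graph, the damping factor 0.85, and the graded seed scores (0.9 and 0.6, standing in for an operator who cannot give a crisp spam/not-spam verdict) are all hypothetical; the paper's actual fuzzy seed-selection model is not given in this record and is not reproduced here.

```python
def biased_pagerank(links, teleport=None, damping=0.85, iterations=50):
    """Power-iteration PageRank over `links` (node -> list of link targets).

    `teleport` maps nodes to non-negative weights. Uniform weights give plain
    PageRank; weights concentrated on trusted seeds give a TrustRank-style score.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    if teleport is None:
        teleport = {node: 1.0 for node in nodes}
    total = sum(teleport.get(node, 0.0) for node in nodes)
    if total <= 0:
        raise ValueError("teleport weights must sum to a positive value")
    # Normalize so the teleport distribution sums to 1.
    teleport = {node: teleport.get(node, 0.0) / total for node in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) * teleport[node] for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the pages it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: hand its rank back via the teleport distribution.
                for target in nodes:
                    new_rank[target] += damping * rank[node] * teleport[target]
        rank = new_rank
    return rank


# Hypothetical toy web graph: two reputable pages and a small link farm.
links = {
    "good1": ["good2"],
    "good2": ["good1"],
    "farm1": ["spam"],
    "farm2": ["spam"],
    "farm3": ["spam"],
    "spam": ["farm1", "farm2", "farm3"],
}

# Plain PageRank: the link farm inflates the rank of "spam".
pr = biased_pagerank(links)

# TrustRank-style run with graded (non-crisp) seed scores, an assumption made
# here to mimic an operator who is only partly sure about "good2".
seed_scores = {"good1": 0.9, "good2": 0.6}
trust = biased_pagerank(links, teleport=seed_scores)
```

In this toy graph the link farm pushes "spam" above the reputable pages under plain PageRank, while the seeded run gives it no trust because no seed page links into the farm. Replacing the hand-picked seed scores with degrees produced by a fuzzy inference step would be one way to reduce reliance on a single operator's judgment, which is the direction the abstract points to.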