Publisher: Consejo Superior de Investigaciones Científicas
Abstract: This paper explores website link structure by treating websites as interconnected graphs and analyzing their features as social networks. For each root domain, two networks are extracted: the domain network and the page network. In each case, a series of indicators taken from social network analysis is evaluated to characterize the website structure. Factor analysis may provide an appropriate statistical methodology for extracting, in graphic form, the principal profile of the website in terms of its internal structure. However, with so many indicators available, an exploratory search over them would face a prohibitive number of possibilities. This work therefore proposes the use of genetic algorithms. Through a guided search over a given space of possible solutions, genetic algorithms can provide a subset of indicators that optimizes a fitness function. The results categorize corporate websites in terms of their link structure and highlight the potential of genetic algorithms as a tool for knowledge discovery.
Keywords: Link analysis; Website structure; Factor analysis; Genetic algorithms; Análisis de enlaces; Estructura de portales web; Análisis factorial; Algoritmos genéticos
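
Illustrative sketch: the abstract describes a genetic algorithm that searches for a subset of indicators optimizing a fitness function, without giving the encoding, operators, or fitness used. The Python sketch below shows one common way such a search can be set up, assuming a bit-mask encoding, tournament selection, one-point crossover, bit-flip mutation, and elitism. The names run_ga, toy_fitness, and TOY_SCORES, all parameter values, and the toy fitness itself are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical per-indicator usefulness scores; NOT data from the paper.
TOY_SCORES = [0.9, 0.1, 0.8, 0.2, 0.7, 0.1, 0.3, 0.6]


def toy_fitness(mask):
    """Toy stand-in for the paper's (unspecified) fitness function.

    Rewards the summed score of the kept indicators and charges a fixed
    cost of 0.4 per kept indicator, so small informative subsets win.
    """
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(s for s, keep in zip(TOY_SCORES, mask) if keep) - 0.4 * kept


def tournament(pop, fitness, rng):
    """Pick two distinct individuals at random and return the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def run_ga(n_indicators, fitness, pop_size=30, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Search for a bit mask over indicators (1 = keep, 0 = drop).

    Requires n_indicators >= 2 so that a one-point crossover cut exists.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_indicators)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_indicators)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


best_mask = run_ga(len(TOY_SCORES), toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

In a setting like the one the abstract describes, the toy fitness would be replaced by a measure of how well the selected indicators profile the websites, for example a quality score derived from the factor analysis; that choice is left open here because the abstract does not specify it.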