Parallel Web mining for link prediction in cluster server
ABSTRACT
Many Web mining methods have recently been used to model user navigational behavior from Web server log files. Cluster-based server architectures combine good performance with low cost and are widely used for Web service. In this paper, we propose a parallel Web mining (PWM) algorithm for link prediction in a Web cluster server consisting of several nodes that act as independent Web servers. In the PWM algorithm, a transition probability matrix is first obtained from the Web log files of each node using a Markov chain model, compressed under the constraints of the probability threshold c and the parallel threshold a, and then sent to the central node, which combines these independent results into an integrated result according to a set of rules. Depending on the accuracy requirement, the PWM algorithm takes one of two forms: the simple PWM algorithm (S-PWM), which is faster but less accurate, and the complex PWM algorithm (C-PWM), which is slower but more accurate. A related incremental parallel Web mining (I-PWM) algorithm is also put forward. The experimental results show that the PWM algorithm reduces communication cost by sending the mined transition probability matrices, reduces time complexity by processing the logs in parallel, and hardly affects the accuracy of the Web mining result.
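As an illustration of the pipeline described above, the following is a minimal Python sketch, not the PWM algorithm itself. It builds a first-order Markov transition matrix on each node, drops entries below a probability threshold, and merges the per-node results at a central node. The function names, the toy sessions, and the merge rule (a weighted average by each node's outgoing transition count) are assumptions made for this example; the abstract does not specify the combination rules, and the parallel threshold a is omitted because its definition is not given here.

```python
from collections import defaultdict


def mine_node(sessions, threshold):
    """Mine one node's sessions (hypothetical helper).

    Returns (compressed transition probabilities, outgoing counts per page).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # Count each consecutive page pair as one transition.
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1

    probs, totals = {}, {}
    for src, row in counts.items():
        total = sum(row.values())
        totals[src] = total
        # Keep only transitions whose probability reaches the threshold.
        probs[src] = {
            dst: n / total for dst, n in row.items() if n / total >= threshold
        }
    return probs, totals


def merge_nodes(node_results):
    """Combine per-node results at the central node.

    Assumed rule: weight each node's probability by its outgoing count.
    """
    weighted = defaultdict(lambda: defaultdict(float))
    grand_totals = defaultdict(int)
    for probs, totals in node_results:
        for src, total in totals.items():
            grand_totals[src] += total
            for dst, p in probs.get(src, {}).items():
                weighted[src][dst] += total * p
    return {
        src: {dst: w / grand_totals[src] for dst, w in row.items()}
        for src, row in weighted.items()
    }


def predict_next(matrix, page):
    """Return the most probable next page, or None if nothing is known."""
    row = matrix.get(page)
    return max(row, key=row.get) if row else None


# Toy sessions for two nodes (invented data, for illustration only).
node_a = [["home", "news", "sports"], ["home", "news"], ["home", "about"]]
node_b = [["home", "news"], ["home", "sports"]]

results = [mine_node(node_a, 0.4), mine_node(node_b, 0.4)]
merged = merge_nodes(results)
prediction = predict_next(merged, "home")
```

Because compression discards low-probability entries before they reach the central node, the merged rows need not sum to 1. This is the trade-off the abstract describes between communication cost and accuracy.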