A Web Mining Model for Real-time Webpage Personalization
ABSTRACT
Determining the size of the World Wide Web is extremely difficult. The Web can be viewed as the largest data source available, and it presents a challenging task for effective design and access. One proposed Web mining approach to the problem of effective design and access is personalization. With personalization, Web access or the contents of a Web page are modified to better fit the desires of the user. This may involve dynamically creating Web pages that are unique to each user, or using the desires of a user to determine which Web documents to retrieve. This paper presents a Web mining model based on dynamic clustering and a hidden Markov model. The output of the model is information used to dynamically create a Web page that best meets the user's desires. The assumption behind the dynamic clustering is that if a group of users share the same interest trend, the pages they have visited are probably related. We propose that humans should be the authority for judging the correlation between two pages. In the first phase, the model compiles statistics on a user's Web browsing records in the log file; finds a group of users who share the same interest trend as that user; collects all the pages in which this group of users is interested; calculates the correlation between pages; and clusters the pages into several categories according to a predetermined threshold. Each Web page category is treated as a stochastic state variable. In the second phase, a model based on the hidden Markov model is constructed to mine the latent desires of a user, given an observed sequence of Web pages that the user has browsed. To obtain the optimal parameters of the model (the transition probability matrix, the conditional probabilities, and the initial state distribution), we apply the Baum-Welch parameter estimation method, an instance of the EM algorithm, to train the model on the data set. Experimental results show that the model is practicable and efficient.
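The first phase described above can be illustrated with a minimal sketch. The abstract does not specify how the correlation between pages is computed or how the threshold is applied, so the code below makes two labeled assumptions: correlation is taken to be the Jaccard overlap of the sets of users who visited each page, and pages are merged into one category whenever their correlation reaches the threshold (single-link grouping). The function names, the sample log, and the threshold value are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def page_correlation(visitors_a, visitors_b):
    """Jaccard overlap of the user sets of two pages (an assumed measure)."""
    union = visitors_a | visitors_b
    return len(visitors_a & visitors_b) / len(union) if union else 0.0


def cluster_pages(visits, threshold):
    """Group pages whose pairwise correlation reaches the threshold.

    visits maps a user id to the set of pages that user viewed.
    Returns a sorted list of page categories, each a sorted list of pages.
    """
    # Invert the log: page -> set of users who visited it.
    visitors = {}
    for user, pages in visits.items():
        for page in pages:
            visitors.setdefault(page, set()).add(user)

    # Union-find structure: every page starts in its own category.
    parent = {page: page for page in visitors}

    def find(page):
        while parent[page] != page:
            parent[page] = parent[parent[page]]  # path halving
            page = parent[page]
        return page

    # Merge the categories of any two sufficiently correlated pages.
    for a, b in combinations(sorted(visitors), 2):
        if page_correlation(visitors[a], visitors[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for page in visitors:
        clusters.setdefault(find(page), set()).add(page)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical browsing log for a group of users with a shared interest trend.
visits = {
    "u1": {"/news", "/sports"},
    "u2": {"/news", "/sports"},
    "u3": {"/recipes", "/cooking"},
    "u4": {"/recipes", "/cooking", "/news"},
}
categories = cluster_pages(visits, threshold=0.5)
print(categories)
```

Under these assumptions, each resulting category would play the role of one hidden state in the second phase, with the browsed pages serving as the observed sequence.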