Due to the increasing amount of data available online, the World Wide Web has becoming one of the most valuable resources for information retrievals and knowledge discoveries. Web mining technologies are the right solutions for knowledge discovery on the Web. The knowledge extracted from the Web can be used to raise the performances for Web information retrievals, question answering, and Web based data warehousing. In this paper, we provide an introduction of Web mining as well as a review of the Web mining categories. Then we focus on one of these categories: the Web structure mining. Within this category, we introduce link mining and review two popular methods applied in Web structure mining: HITS and PageRank.
Applications of Web mining - from Web search engine to P2P fi-ltering
ABSTRACT
We have developed Japanese Web search engine "Mondou (RCAAU)", which was based on the emerging technologies of data mining. Our search engine provides associative keywords which are tightly related to focusing Web pages. We also implemented the visual interface based on the technology of information visualization. In order to improve the performance of various search strategies by using characteristics of Web systems, we try to implement the advanced Web information systems with data mining and information technologies. Firstly, we introduce various Web mining algorithm, which efficiently reduces the computing cost of Web search. We pay attention to a part of useful pages effectively and improve the performance of Web search by using our proposed algorithms. Secondly, for preserving huge volume of born-digital information in the Internet, we are focusing on technologies of Web archiving system like WARP. In order to handle monotonously increasing digital information, we have to resolve many difficult problems of long life data preservation by improving Web searching techniques. Our experiences of our Mondou Web search engine and cooperative distributed Web robots are very useful and effective. Finally, the technologies of P2P (Peer-to-Peer) distributed search systems are becoming important rapidly. For example, it is very hard to discover appropriate information resources by simple queries of Gnutella, Freenet and so on. Therefore, in order to realize the topic-driven search, we propose more intelligent search systems, which are based on the technologies of data mining.
A Web Mining Model for Real-time Webpage Personalization
ABSTRACT
Determining the size of the World Wide Web is extremely difficult. The Web can be viewed as the largest data source available and presents a challenging task for effective design and access. One proposed Web mining approach to handling the problem of effective design and access is personalization. With personalization, Web access or the contents of a Web page are modified to better fit the desires of the user. This may involve dynamically creating Web pages that are unique per user or using the desires of a user to determine what Web documents to retrieve. This paper presents a Web mining model based on dynamic clustering and hidden Markov model. The output of the model is some information for dynamically creating a Web page which can best meet the user's desires. The assumption of the dynamic clustering is that if a group of users who have the same interest trend, those pages they have visited are probably related. We propose that human should be the authority to judge the correlation of two pages. First, the model statistic a user's Web browsing records in the log file; find a group of users who have the same interest trend with the user; collect all the pages in which this group of users are interested; calculate the correlation between pages; and cluster the pages into several categories according to a predetermined threshold. Each Web page category is considered as a stochastic state variable. In the second phase, our model based on hidden Markov model is further constructed to mine the latent desires of a user given an observed sequence of Web pages that the user have browsed. In order to get the optimal parameters (transition probability matrix, the conditional probability and the initial state) in the model, we applied the Baum-Welch parameter estimation method in EM algorithm to train the model on the data set. Experimental results show that the model is practicable and efficient
With the advent of the World Wide Web, many business applications that utilize data mining and text mining techniques to extract useful business information on the Web have evolved from Web searching to Web mining. It is important for students to acquire knowledge and hands-on experience in Web mining during their education in information systems curricula. This paper reports on an experience using open Web application programming interfaces (APIs) that have been made available by major Internet companies (e.g., Google, Amazon, and eBay) in a class project to teach Web mining applications. The instructor's observations of the students' performance and a survey of the students' opinions show that the class project achieved its objectives and students acquired valuable experience in leveraging the APIs to build interesting Web mining applications.
با سلام و تشکر از مدیریت محترم سایت که مقالات خوبی را قرار داده اند
داده کاوی یکی از موضوعات بین رشته ای است که امروزه در بسیاری از سیستم های اطلاعاتی استفاده می شود و ارتباط تنگاتنگی با هوش مصنوعی و آمار دارد. در کشور ما هم 4-5سالی هست که حتی انجمنی تخصصی در این راستا ایجاد شده است و هرساله کنفرانسی را نیز برگزار می کند.
به آدرس ذیل: .: پنجمین کنفرانس داده کاوی ایران :.
درواقع در داده کاوی به دنبال یافتن دانش از دل انبوهی از داده های موجود هستیم که بتوانیم با این یافته ها مشکلات موجود را مرتفع کرده و به عنوان تصمیم یار مدیران از آنها بهره گرفت.
با آرزوی توفیق