Post #9 by Astaraki (Administrator), 03-8-1389 (Solar Hijri calendar), 05:57 PM

A Web Mining Architectural Model of Distributed Crawler for Internet Searches Using PageRank Algorithm

ABSTRACT

As the World Wide Web grows rapidly and data is increasingly stored in a distributed manner, people need a search-engine-based architectural model for searching the Web. Broad Web search engines, as well as many more specialized search tools, rely on Web crawlers to acquire large collections of pages for indexing and analysis. The crawler is therefore an important module of a Web search engine, and its quality directly affects the search quality of the engine. Such a crawler may interact with millions of hosts over weeks or months, so robustness, flexibility, and manageability are of major importance. Given a set of URLs, the crawler retrieves the corresponding Web pages, parses the HTML files, adds newly found URLs to its queue, and returns to the first phase of this cycle. While parsing the HTML files for new URLs, the crawler can also extract other information from them. In this paper, we describe the design of a Web crawler that uses the PageRank algorithm for distributed searches and can run on a network of workstations. The crawler scales to several hundred pages per second, is resilient against system crashes and other events, and can be adapted to various crawling applications. We present the Web mining architecture of the system and describe efficient techniques for achieving high performance.
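The crawl cycle described in the abstract (fetch pages, parse the HTML, enqueue new URLs, repeat) can be sketched as follows. This is a minimal single-process illustration, not the authors' implementation: it assumes breadth-first (FIFO) ordering, uses Python's standard `html.parser` for link extraction, and takes a hypothetical `fetch(url)` callable (returning HTML text or `None`) so the loop can run against an in-memory site as well as a real HTTP client. As one example of "other information" gathered while parsing, it records each page's `<title>`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects <a href> targets and the page <title> in a single parse."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL, drop #fragments.
                    absolute, _ = urldefrag(urljoin(self.base_url, value))
                    self.links.append(absolute)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl: fetch -> parse -> enqueue unseen URLs -> repeat.

    `fetch` is a hypothetical injected callable returning HTML or None.
    Stops once `max_pages` pages have been fetched and parsed.
    """
    queue = deque(seeds)
    seen = set(seeds)   # URLs already queued, so none is enqueued twice
    graph = {}          # url -> list of outgoing links found on that page
    titles = {}         # extra information extracted while parsing
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # fetch failed or page unavailable: skip it
            continue
        parser = LinkExtractor(url)
        parser.feed(html)
        graph[url] = parser.links
        titles[url] = parser.title.strip()
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return graph, titles
```

For example, `crawl(["http://a.example/"], site.get)` with `site` a dict mapping URLs to HTML strings crawls a toy site without network access. A real deployment would add politeness delays, robots.txt handling, persistent queues, and crash recovery, all of which are omitted from this sketch.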
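The abstract names the PageRank algorithm but does not give its formulation or say exactly how the crawler uses it. The standard power-iteration form repeatedly applies PR(v) = (1 - d)/N + d * (sum over pages u linking to v of PR(u)/L(u)), where d is the damping factor (commonly 0.85), N is the number of pages, and L(u) is the number of out-links of u. The sketch below applies this to a crawled link graph represented as a dict `{page: [outlinks]}` (an assumed representation). Ignoring links to uncrawled pages and spreading the rank of pages with no out-links evenly over all pages are common conventions, not details taken from the paper.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a {page: [outlinks]} dict.

    Links to pages not present as keys are ignored. Pages left with no
    out-links (dangling nodes) spread their rank evenly over all pages,
    so the ranks always sum to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    # Keep only edges to crawled pages; duplicate links are counted once.
    out = {u: sorted({v for v in graph[u] if v in graph}) for u in nodes}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not out[u])
        # Teleport term plus the evenly redistributed dangling mass.
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
        rank = new
    return rank
```

A fixed iteration count keeps the sketch simple; a production version would usually stop once the total change between iterations falls below a tolerance, and for a crawler the scores could be used to order the URL queue.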
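The abstract says the crawler runs on a network of workstations but does not say how the work is divided among them. One common scheme, shown here purely as an assumption, assigns each URL to a worker by hashing its host name: every page of a given host is then fetched by the same machine (which simplifies per-host politeness), and each worker forwards newly discovered URLs that it does not own to the responsible worker. The functions `owner` and `route` are hypothetical names for illustration.

```python
import zlib
from urllib.parse import urlsplit


def owner(url, num_workers):
    """Map a URL to a worker index by hashing its (lower-cased) host name."""
    host = urlsplit(url).hostname or ""
    # CRC32 is deterministic across processes, unlike Python's built-in
    # hash() on strings, so every workstation computes the same owner.
    return zlib.crc32(host.encode("utf-8")) % num_workers


def route(urls, worker_id, num_workers):
    """Split discovered URLs into those this worker crawls itself and
    those to forward to other workers, keyed by worker index."""
    local, remote = [], {}
    for url in urls:
        w = owner(url, num_workers)
        if w == worker_id:
            local.append(url)
        else:
            remote.setdefault(w, []).append(url)
    return local, remote
```

A worker would call `route` on the links extracted from each page, append `local` to its own queue, and send each list in `remote` to the corresponding workstation; how those messages are transported is left open here.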
Attachment: index27.pdf (PDF, 339.7 KB)