2019 Vol.2 Dec. No.6

  • Title: Research and Design of Distributed Multi-topic Web Crawler Based on Python
  • Author: Aiju Wang
  • Affiliation: Zhengzhou Institute of Technology
  • Abstract:

    To address the low crawl speed of traditional web crawlers, this paper proposes the design of a distributed multi-topic web crawler based on Python. First, the crawler's distributed physical architecture is built from a Python cluster and a Storm cluster, and its distributed logical architecture is built from an indicator calculation part and a processing part. Together, the two architectures form the overall architecture of the distributed multi-topic web crawler. Second, three crawling methods are designed: grabbing data through an API, downloading data through a GUI, and writing a crawler program to download data. Experiments show that the designed web crawler achieves a high crawling speed.

  • Keywords: Python; distributed; web crawler; Storm cluster
  • DOI: 10.12250/jpciams2019060632
  • Citation form: Aiju Wang. Research and Design of Distributed Multi-topic Web Crawler Based on Python [J]. Computer Informatization and Mechanical System, 2019, vol. 2, pp. 1-4.
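
The abstract describes several workers crawling multiple topics in parallel. The sketch below is one possible illustration of that idea and is not the paper's implementation. It uses threads in a single process as a stand-in for the cluster and does not use Storm. The names `FAKE_WEB`, `fake_fetch`, and `crawl` are hypothetical, and the fetch step is an in-memory stub standing in for an API or HTTP request.

```python
import queue
import threading
from collections import defaultdict

# Hypothetical in-memory "web": url -> (page text, outgoing links).
FAKE_WEB = {
    "a://sports/1": ("sports news", ["a://sports/2"]),
    "a://sports/2": ("more sports", ["a://sports/1"]),
    "a://tech/1": ("tech news", ["a://tech/2", "a://tech/3"]),
    "a://tech/2": ("gadgets", []),
    "a://tech/3": ("software", ["a://tech/1"]),
}


def fake_fetch(url):
    """Stand-in for an API or HTTP request; returns (text, links)."""
    return FAKE_WEB.get(url, ("", []))


def crawl(seeds, fetch=fake_fetch, num_workers=3):
    """Crawl from {topic: [seed urls]} with a pool of worker threads."""
    tasks = queue.Queue()
    seen = set()
    lock = threading.Lock()
    results = defaultdict(dict)  # topic -> {url: text}

    def enqueue(topic, url):
        # Deduplicate so each URL is fetched at most once.
        with lock:
            if url in seen:
                return
            seen.add(url)
        tasks.put((topic, url))

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work
                tasks.task_done()
                return
            topic, url = item
            try:
                text, links = fetch(url)
                with lock:
                    results[topic][url] = text
                # Discovered links inherit the topic of the page they came from.
                for link in links:
                    enqueue(topic, link)
            finally:
                tasks.task_done()

    for topic, urls in seeds.items():
        for url in urls:
            enqueue(topic, url)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    tasks.join()  # returns once every queued page has been processed
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return {topic: dict(pages) for topic, pages in results.items()}


if __name__ == "__main__":
    pages = crawl({"sports": ["a://sports/1"], "tech": ["a://tech/1"]})
    for topic in sorted(pages):
        print(topic, sorted(pages[topic]))
```

In a real deployment the shared queue and the `seen` set would live in a service that every node can reach, and `fetch` would issue actual requests. Both points are assumptions about how such a design is commonly built, not details taken from the paper.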
