Ceasea/baidubaike-scrapy

baidubaike-scrapy

spider_main.py consists of the following modules:

  1. spider_main.py
  2. url_Manager.py
  3. html_Downloader.py
  4. html_Parser.py
  5. html_Outputer.py
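
As a rough illustration of how modules like these usually fit together, here is a minimal sketch of a crawl loop. It is not the code from this repository: the names `UrlManager`, `LinkAndTitleParser`, `crawl`, and `render_html` are hypothetical, the downloader is passed in as a plain function so the sketch runs without network access, and parsing uses only the standard library.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin


class UrlManager:
    """Hypothetical URL manager: pending URLs and already-crawled URLs."""

    def __init__(self):
        self.new_urls = set()
        self.old_urls = set()

    def add_new_url(self, url):
        # Skip empty URLs and anything already queued or crawled.
        if url and url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def has_new_url(self):
        return bool(self.new_urls)

    def get_new_url(self):
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url


class LinkAndTitleParser(HTMLParser):
    """Hypothetical parser: collects the <title> text and every <a href>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.add(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(root_url, download, max_pages=10):
    """Crawl from root_url; `download` maps a URL to HTML text or None."""
    urls = UrlManager()
    urls.add_new_url(root_url)
    results = []
    while urls.has_new_url() and len(results) < max_pages:
        url = urls.get_new_url()
        page = download(url)
        if page is None:
            continue  # download failed; move on to the next URL
        parser = LinkAndTitleParser(url)
        parser.feed(page)
        for link in parser.links:
            urls.add_new_url(link)
        results.append({"url": url, "title": parser.title.strip()})
    return results


def render_html(results):
    """Hypothetical outputer: one table row per crawled entry."""
    rows = "".join(
        "<tr><td>%s</td><td>%s</td></tr>" % (escape(r["url"]), escape(r["title"]))
        for r in results
    )
    return "<html><body><table>%s</table></body></html>" % rows
```

To try it offline, pass a dictionary lookup as the downloader, for example `crawl("http://example.test/a", pages.get)` where `pages` maps URLs to HTML strings. A real downloader would fetch the page over HTTP and return `None` on failure.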

mypider.py is a rewrite of all the spider_main modules.

output.html is the result: about 3750 entries, after which the program made no further progress.

It may have hit a list length limit.
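
The list-length explanation is only a guess. Python lists are normally bounded by available memory rather than by a few thousand items, so it may be worth recording why the loop actually ended. The sketch below is one way to do that. The function `run_with_stop_reason` and its `process` callback are hypothetical and not part of this repository; `process` is assumed to take a URL and return the links found on that page.

```python
def run_with_stop_reason(start_urls, process, max_pages=None):
    """Drain the pending URLs and report why the loop ended.

    Returns (pages_processed, reason).
    """
    pending = list(start_urls)
    seen = set(pending)
    done = 0
    while True:
        if not pending:
            return done, "no new URLs left"
        if max_pages is not None and done >= max_pages:
            return done, "page limit reached"
        url = pending.pop()
        try:
            links = process(url)
        except Exception as exc:
            # Surface the failure instead of stopping silently.
            return done, "error on %s: %r" % (url, exc)
        for link in links:
            if link not in seen:
                seen.add(link)
                pending.append(link)
        done += 1
```

If the reason comes back as "no new URLs left", the crawl simply ran out of links to follow. An error message would instead point at the downloader or parser.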

About

small test of scrapy
