Scrapy – Crawling
Description
To execute your spider, run the following command within your first_scrapy directory –
scrapy crawl first
Here, first is the spider name specified while creating the spider.
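For reference, the spider created in the previous step is assumed to look roughly like the sketch below. The essential point is that the name attribute is what scrapy crawl matches against; the class name FirstSpider and the exact URL list are assumptions inferred from the output shown next.

import scrapy

class FirstSpider(scrapy.Spider):
   # "name" is the identifier passed to the "scrapy crawl" command.
   name = "first"
   allowed_domains = ["dmoz.org"]
   start_urls = [
      "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
      "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
   ]

Once the spider crawls, you can see the following output –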
2016-08-09 18:13:07-0400 [scrapy] INFO: Scrapy started (bot: tutorial)
2016-08-09 18:13:07-0400 [scrapy] INFO: Optional features available: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Overridden settings: {}
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled extensions: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled downloader middlewares: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled spider middlewares: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled item pipelines: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Spider opened
2016-08-09 18:13:08-0400 [scrapy] DEBUG: Crawled (200)
<GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2016-08-09 18:13:09-0400 [scrapy] DEBUG: Crawled (200)
<GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2016-08-09 18:13:09-0400 [scrapy] INFO: Closing spider (finished)
As you can see in the output, each URL has a log line (referer: None), which states that these URLs are start URLs and have no referrer. Next, you should see two new files named Books.html and Resources.html created in your first_scrapy directory.
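Those files would be produced by the spider's parse() callback. A minimal sketch of such a callback is shown below; it assumes the spider simply saves each response body under a file name taken from the URL (the second-to-last path segment, i.e. Books or Resources).

def parse(self, response):
   # The crawled URLs end with a slash, so the second-to-last
   # segment of the split URL gives "Books" or "Resources".
   filename = response.url.split("/")[-2] + ".html"
   # response.body is bytes, so the file is opened in binary mode.
   with open(filename, "wb") as f:
      f.write(response.body)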