Reading RSS Feeds with Python
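RSS (Rich Site Summary) is a format for delivering regularly changing web content. Many news sites, blogs, and other online publishers syndicate their content as an RSS feed for anyone who wants to subscribe. In Python, we use the following package to read and process these feeds.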
pip install feedparser
Feed Structure
In the example below, we inspect the structure of a feed entry so that we can decide which parts of the feed to process further.
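import feedparser
# Download and parse the feed
NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
# entries[1] is the second entry in the feed (indexing starts at 0)
entry = NewsFeed.entries[1]
# Python 3: print is a function; list() shows the keys as a plain list
print(list(entry.keys()))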
When we run the above program, we get output like the following:
['summary_detail', 'published_parsed', 'links', 'title', 'summary', 'guidislink', 'title_detail', 'link', 'published', 'id']
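The set of keys varies from feed to feed, and sometimes between entries of the same feed, so reading a key that is absent raises an error. The sketch below is not part of the original example: it collects a few fields with `.get()` and a default value. The helper names `pick_fields` and `first_entry_fields`, the chosen field names, and the `"n/a"` default are illustrative assumptions. `pick_fields` works on any mapping, including the dictionary-like entries that feedparser returns.

```python
def pick_fields(entry, wanted=("title", "link", "published"), default="n/a"):
    """Return a plain dict with the wanted fields, using a default for missing ones."""
    return {key: entry.get(key, default) for key in wanted}


def first_entry_fields(url):
    """Fetch a feed and summarize its first entry, or return None if it has no entries."""
    import feedparser  # third-party; imported here so pick_fields works without it

    feed = feedparser.parse(url)
    if not feed.entries:
        return None
    return pick_fields(feed.entries[0])


# A plain dict stands in for a feed entry here, so no network access is needed.
sample = {"title": "Example post", "link": "https://example.com/post"}
print(pick_fields(sample))
```

Calling `first_entry_fields` with the feed URL used above would apply the same lookup to a live entry.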
Feed Title and Posts
In the example below, we read the number of posts in the feed and the title of one post.
import feedparser
NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
# Number of entries (posts) currently in the feed
print('Number of RSS posts :', len(NewsFeed.entries))
# Title of the second entry
entry = NewsFeed.entries[1]
print('Post Title :', entry.title)
When we run the above program, we get output like the following:
Number of RSS posts : 5
Post Title : Cong-JD(S) in SC over choice of pro tem speaker
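The example above reads `entries[1]`, which is the second post and fails with an `IndexError` if the feed has fewer than two entries. A more defensive approach is to loop over whatever entries are present. The following is a minimal sketch under that assumption; `list_titles` and `fetch_titles` are hypothetical helper names, and the demo uses a stand-in object in place of a downloaded feed. The `bozo` flag is set by feedparser when the feed is not well-formed.

```python
from types import SimpleNamespace


def list_titles(feed, limit=None):
    """Return the titles of the feed's entries, skipping entries without a title."""
    titles = [entry.get("title") for entry in feed.entries if entry.get("title")]
    return titles if limit is None else titles[:limit]


def fetch_titles(url, limit=None):
    """Download a feed and return its post titles."""
    import feedparser  # third-party; imported here so list_titles works without it

    feed = feedparser.parse(url)
    if feed.bozo:
        # The feed was not well-formed; entries may still be usable.
        print("Warning: feed may be malformed:", feed.get("bozo_exception"))
    return list_titles(feed, limit)


# Stand-in for a parsed feed: an object with an `entries` list of dict-like items.
demo_feed = SimpleNamespace(entries=[{"title": "First"}, {"link": "x"}, {"title": "Second"}])
for number, title in enumerate(list_titles(demo_feed), start=1):
    print(number, title)
```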
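Feed Details
Based on the entry structure shown above, we can use a Python program to extract the details we need from the feed. Because each entry behaves like a dictionary, we use its keys (also available as attributes, such as entry.link) to retrieve the corresponding values.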
import feedparser
NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
entry = NewsFeed.entries[1]
# Publication date as it appears in the feed
print(entry.published)
print("******")
# Summary text of the post
print(entry.summary)
print("------News Link--------")
# URL of the full article
print(entry.link)
When we run the above program, we get output like the following:
Fri, 18 May 2018 20:13:13 GMT
******
Controversy erupted on Friday over the appointment of BJP MLA K G Bopaiah as pro tem speaker for the assembly, with Congress and JD(S) claiming the move went against convention that the post should go to the most senior member of the House. The combine approached the SC to challenge the appointment. Hearing is scheduled for 10:30 am today.
------News Link--------
https://timesofindia.indiatimes.com/india/congress-jds-in-sc-over-bjp-mla-made-pro-tem-speaker-hearing-at-1030-am/articleshow/64228740.cms
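The `published` value is a plain string, which is awkward to compare or sort. The entry keys listed earlier also include `published_parsed`, which feedparser provides as a time tuple in UTC. The sketch below shows one possible way to turn that tuple into a timezone-aware `datetime` and to sort entries by it. The helper names `published_datetime` and `newest_first` are assumptions made for illustration, and the sample entry is hand-written to match the date in the output above.

```python
from datetime import datetime, timezone


def published_datetime(entry):
    """Convert an entry's published_parsed time tuple (UTC) to an aware datetime."""
    parsed = entry.get("published_parsed")
    if parsed is None:
        return None
    # The first six fields are year, month, day, hour, minute, second.
    return datetime(*parsed[:6], tzinfo=timezone.utc)


def newest_first(entries):
    """Sort entries by publication time, newest first; undated entries go last."""
    oldest = datetime.min.replace(tzinfo=timezone.utc)
    return sorted(entries, key=lambda e: published_datetime(e) or oldest, reverse=True)


# Hand-written stand-in for a feed entry (Fri, 18 May 2018 20:13:13 GMT).
sample = {"published_parsed": (2018, 5, 18, 20, 13, 13, 4, 138, 0)}
print(published_datetime(sample).isoformat())
```

With a real feed, `newest_first(NewsFeed.entries)` would return the entries ordered from most recent to oldest.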