Python – Reading RSS Feeds

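RSS (Rich Site Summary) is a format for delivering frequently changing web content. Many news sites, blogs, and other online publishers syndicate their content as an RSS feed to anyone who wants it. In Python, we use the following package to read and process these feeds.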

pip install feedparser

Feed Structure

In the example below, we inspect the structure of a feed entry so that we can decide which parts of the feed to process.

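import feedparser

# Download and parse the feed
NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
# entries is a list of posts; index 1 is the second post
entry = NewsFeed.entries[1]

# List the keys available on this entry (Python 3 syntax)
print(list(entry.keys()))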

When we run the above program, we get the following output:

['summary_detail', 'published_parsed', 'links', 'title', 'summary', 'guidislink', 'title_detail', 'link', 'published', 'id']
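
The key list above comes from one particular feed, and other feeds may leave some of these keys out. The following is a minimal sketch of defensive key access. It uses a plain dict as a stand-in for a feed entry, which works because feedparser entries behave like dictionaries. The helper name `entry_field` and the sample data are hypothetical and are not part of feedparser.

```python
def entry_field(entry, key, default="(not provided)"):
    """Return entry[key] when the key exists, otherwise a default value."""
    return entry.get(key, default)

# Hypothetical sample data: a plain dict standing in for a parsed entry.
sample_entry = {
    "title": "Example post",
    "link": "https://example.com/post",
}

title = entry_field(sample_entry, "title")
author = entry_field(sample_entry, "author")  # key is absent in the sample

print(title)
print(author)
```

With a real feed, the same helper could be called as entry_field(NewsFeed.entries[1], "summary").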

Feed Title and Posts

In the example below, we read the number of posts in the RSS feed and the title of one post.

import feedparser

NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")

print('Number of RSS posts :', len(NewsFeed.entries))

# Index 1 selects the second post in the feed
entry = NewsFeed.entries[1]
print('Post Title :', entry.title)

When we run the above program, we get the following output:

Number of RSS posts : 5
Post Title : Cong-JD(S) in SC over choice of pro tem speaker
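
The example above reads a single post by index. To list every post, you can loop over the entries instead. This is a minimal sketch under stated assumptions: `list_titles` is a hypothetical helper, and the sample entries are made-up dicts standing in for parsed feed entries.

```python
def list_titles(entries):
    """Return one numbered line per entry, using each entry's 'title' key."""
    return [f"{i}. {e['title']}" for i, e in enumerate(entries, start=1)]

# Hypothetical sample data standing in for NewsFeed.entries.
sample_entries = [
    {"title": "First example post"},
    {"title": "Second example post"},
]

lines = list_titles(sample_entries)
for line in lines:
    print(line)
```

With a real feed, pass NewsFeed.entries in place of sample_entries. Looping this way also avoids an IndexError when a feed has fewer entries than a hard-coded index expects.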

Feed Details

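Based on the entry structure shown above, we can retrieve the details we need from the feed with a Python program, as shown below. Because entry is a dictionary-like object, we use its keys (also available as attributes) to get the required values.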

import feedparser

NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")

entry = NewsFeed.entries[1]

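print(entry.published)
print("******")
print(entry.summary)
print("------News Link--------")
print(entry.link)

When we run the above program, we get the following output:

Fri, 18 May 2018 20:13:13 GMT
******
A controversy erupted on Friday over the appointment of BJP MLA K G Bopaiah as pro tem speaker, with the Congress and JD(S) claiming that the move went against the convention that the post should go to the most senior member of the House. The initial hearing of the combine's challenge is scheduled for 10:30 am today.
------News Link--------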
https://timesofindia.indiatimes.com/india/congress-jds-in-sc-over-bjp-mla-made-pro-tem-speaker-hearing-at-1030-am/articleshow/64228740.cms
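
The published value is a plain string, so sorting or filtering posts by date needs a real date object. The sketch below is one way to convert it, using only the standard library and the exact string from the output above. It assumes the feed uses the RFC 2822 style date format seen here. feedparser also exposes a published_parsed key (listed in the entry structure earlier) that holds an already parsed time tuple.

```python
from email.utils import parsedate_to_datetime

# The published string exactly as it appears in the output above.
published = "Fri, 18 May 2018 20:13:13 GMT"

# Convert the RFC 2822 style date string into a datetime object.
published_dt = parsedate_to_datetime(published)

# Reformat the date for display or comparison.
date_only = published_dt.strftime("%Y-%m-%d")
print(date_only)
```

With a real feed, entry.published can be passed in place of the literal string, provided the feed uses the same date format.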