Getting feeds from FeedParser and importing them into a Pandas DataFrame

I'm learning Python. As an exercise, I built an RSS scraper with feedparser, put the output into a pandas DataFrame, and am trying to mine it with NLTK... but first I'm getting a list of articles from multiple RSS feeds.

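I used this post on passing multiple feeds and combined it with an answer I received earlier to another question about how to get the data into a Pandas dataframe.

The problem: I'd like to see data from all of the feeds in the dataframe. At the moment, I can only access the first item in the feed list.

FeedParser seems to be doing its job, but when the data goes into the pandas df, it only seems to grab the first RSS feed in the list.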

import feedparser 
import pandas as pd 

rawrss = [ 
    'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml', 
    'https://www.yahoo.com/news/rss/', 
    'http://www.huffingtonpost.co.uk/feeds/index.xml', 
    'http://feeds.feedburner.com/TechCrunch/', 
    ] 

feeds = [] 
for url in rawrss: 
    feeds.append(feedparser.parse(url)) 

for feed in feeds: 
    for post in feed.entries: 
        print(post.title, post.link, post.summary)

df = pd.DataFrame(columns=['title', 'link', 'summary']) 

for i, post in enumerate(feed.entries): 
    df.loc[i] = post.title, post.link, post.summary 

df.shape 

df


2017-08-15 Nick Duddy

The problem is that you only see data from the last feed in the DataFrame, right? And you want data from every feed in the DataFrame? – beenjaminnn

Yes. Sorry, I'll edit the question and clarify that.

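Your code loops over every post and prints its data. The part of the code that adds the post data to the dataframe is not part of that loop (indentation is meaningful in Python!), so you only see data from one feed in the dataframe.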
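
To see why only one feed ends up in the dataframe, it helps to look at what `feed` refers to once the first loop has finished. The sketch below is a minimal illustration using made-up stand-in data (`fake_feeds` and its titles are hypothetical, not real feed output): after a `for` loop ends, its loop variable is still bound to the last item, so code outside the loop that uses it only sees that one item.

```python
# Hypothetical stand-in for parsed feeds: each "feed" is just a list of titles.
fake_feeds = [
    ["bbc-1", "bbc-2"],
    ["yahoo-1"],
    ["techcrunch-1", "techcrunch-2"],
]

for feed in fake_feeds:
    pass  # the loop body runs once per feed

# Outside the loop, `feed` is still bound to the LAST item iterated over.
rows_outside = list(feed)

# Collecting inside the nested loops sees every feed instead.
rows_inside = []
for feed in fake_feeds:
    for title in feed:
        rows_inside.append(title)

print(rows_outside)      # ['techcrunch-1', 'techcrunch-2']
print(len(rows_inside))  # 5
```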

You can build up a list of posts as you iterate over the feeds, then create the dataframe at the end:

import feedparser 
import pandas as pd 

rawrss = [ 
    'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml', 
    'https://www.yahoo.com/news/rss/', 
    'http://www.huffingtonpost.co.uk/feeds/index.xml', 
    'http://feeds.feedburner.com/TechCrunch/', 
    ] 

feeds = [] # list of feed objects 
for url in rawrss: 
    feeds.append(feedparser.parse(url)) 

posts = [] # list of posts [(title1, link1, summary1), (title2, link2, summary2) ... ] 
for feed in feeds: 
    for post in feed.entries: 
        posts.append((post.title, post.link, post.summary))

df = pd.DataFrame(posts, columns=['title', 'link', 'summary']) # pass data to init

You can optimize this a little by combining the two for loops:

posts = [] 
for url in rawrss: 
    feed = feedparser.parse(url) 
    for post in feed.entries: 
        posts.append((post.title, post.link, post.summary))
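
Two things the combined loop does not cover are entries that lack a field such as `summary`, and keeping track of which feed a row came from. The following is only a sketch under assumptions: `collect_posts` and the `source` column are hypothetical additions that are not part of the original answer, and `SimpleNamespace` objects stand in for parsed feeds so the example runs without network access. It assumes entries behave like ordinary objects that raise `AttributeError` for a missing attribute, so `getattr` with a default yields an empty string instead of failing.

```python
from types import SimpleNamespace


def collect_posts(parsed_feeds):
    """Flatten (source, feed) pairs into a list of row tuples."""
    rows = []
    for source, feed in parsed_feeds:
        for post in feed.entries:
            rows.append((
                source,
                getattr(post, "title", ""),
                getattr(post, "link", ""),
                getattr(post, "summary", ""),  # empty string if the field is missing
            ))
    return rows


# Made-up stand-ins for parsed feeds (hypothetical data for illustration only).
feed_a = SimpleNamespace(entries=[
    SimpleNamespace(title="A1", link="http://example.com/a1", summary="first"),
])
feed_b = SimpleNamespace(entries=[
    SimpleNamespace(title="B1", link="http://example.com/b1"),  # no summary
])

rows = collect_posts([("feed_a", feed_a), ("feed_b", feed_b)])
print(rows)
```

The resulting list of tuples could then be passed to `pd.DataFrame(rows, columns=['source', 'title', 'link', 'summary'])` in the same way as `posts` above.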


2017-08-16 14:36:48 beenjaminnn

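Thanks, that works perfectly. It also showed me properly how indentation works and what effect it has. Thanks as well for the optimized version; I see and understand what you did there, and I'm using it.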
