2015-06-07

I am trying to parse HTML text from many web pages for sentiment analysis. With the community's help I can loop over many URLs and generate a sentiment score for each one with the textblob library, and I successfully print the score for every URL. But I have not been able to collect the many outputs produced by my return variable into a list, so that I can later compute an average from the stored numbers and chart the results to continue my analysis. How can I capture the output variable from each iteration into a list for analysis?

Code with the print function:

import requests
import json
import urllib
from bs4 import BeautifulSoup
from textblob import TextBlob


# you can add to this
urls = ["http://www.thestar.com/business/economy/2015/05/19/canadian-consumer-confidence-dips-but-continues-to-climb-in-us-report.html",
        "http://globalnews.ca/news/2012054/canada-ripe-for-an-invasion-of-u-s-dollar-stores-experts-say/",
        "http://www.cp24.com/news/tsx-flat-in-advance-of-fed-minutes-loonie-oil-prices-stabilize-1.2381931",
        "http://www.marketpulse.com/20150522/us-and-canadian-gdp-to-close-out-week-in-fx/",
        "http://www.theglobeandmail.com/report-on-business/canada-pension-plan-fund-sees-best-ever-annual-return/article24546796/",
        "http://www.marketpulse.com/20150522/canadas-april-inflation-slowest-in-two-years/"]


def parse_websites(list_of_urls):
    for url in list_of_urls:
        html = urllib.urlopen(url).read()
        soup = BeautifulSoup(html)

        # kill all script and style elements
        for script in soup(["script", "style"]):
            script.extract()  # rip it out

        # get text
        text = soup.get_text()

        # break into lines and remove leading and trailing space on each
        lines = (line.strip() for line in text.splitlines())
        # break multi-headlines into a line each
        chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
        # drop blank lines
        text = '\n'.join(chunk for chunk in chunks if chunk)

        #print(text)

        wiki = TextBlob(text)
        r = wiki.sentiment.polarity

        print r


parse_websites(urls)

Output:

>>> 
0.10863027172 
0.156074203574 
0.0766585497835 
0.0315555555556 
0.0752548359411 
0.0902824858757 
>>> 

But when I use a return statement to build a list of the values to work with, I get no results at all. The code:

import requests
import json
import urllib
from bs4 import BeautifulSoup
from textblob import TextBlob


# you can add to this
urls = ["http://www.thestar.com/business/economy/2015/05/19/canadian-consumer-confidence-dips-but-continues-to-climb-in-us-report.html",
        "http://globalnews.ca/news/2012054/canada-ripe-for-an-invasion-of-u-s-dollar-stores-experts-say/",
        "http://www.cp24.com/news/tsx-flat-in-advance-of-fed-minutes-loonie-oil-prices-stabilize-1.2381931",
        "http://www.marketpulse.com/20150522/us-and-canadian-gdp-to-close-out-week-in-fx/",
        "http://www.theglobeandmail.com/report-on-business/canada-pension-plan-fund-sees-best-ever-annual-return/article24546796/",
        "http://www.marketpulse.com/20150522/canadas-april-inflation-slowest-in-two-years/"]


def parse_websites(list_of_urls):
    for url in list_of_urls:
        html = urllib.urlopen(url).read()
        soup = BeautifulSoup(html)

        # kill all script and style elements
        for script in soup(["script", "style"]):
            script.extract()  # rip it out

        # get text
        text = soup.get_text()

        # break into lines and remove leading and trailing space on each
        lines = (line.strip() for line in text.splitlines())
        # break multi-headlines into a line each
        chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
        # drop blank lines
        text = '\n'.join(chunk for chunk in chunks if chunk)

        #print(text)

        wiki = TextBlob(text)
        r = wiki.sentiment.polarity
        r = []
        return [r]


parse_websites(urls)

Output:

Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32 
Type "copyright", "credits" or "license()" for more information. 
>>> ================================ RESTART ================================ 
>>> 
>>> 

How can I make it so that I can work with the numbers, adding and subtracting them from a list like [r1, r2, r3, ...]?

Thank you in advance.

Answer


In the code below you are asking Python to return an empty list: on the first pass through the loop you overwrite r with [] and return immediately, so the other URLs are never processed:

r = wiki.sentiment.polarity

r = []      # create empty list r, overwriting the polarity score
return [r]  # return a list containing the empty list
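The core issue is that a return inside a for loop ends the function on its first iteration. A minimal sketch of the difference (toy data, just to show the pattern):

```python
def first_only(items):
    for x in items:
        return x              # exits the function on the FIRST iteration


def collect_all(items):
    out = []
    for x in items:
        out.append(x)         # accumulate inside the loop
    return out                # return once, after the loop has finished


print(first_only([0.1, 0.2, 0.3]))   # 0.1
print(collect_all([0.1, 0.2, 0.3]))  # [0.1, 0.2, 0.3]
```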

If I understand your question correctly, all you need to do is:

my_list = []  # create empty list

for url in list_of_urls:
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html)

    for script in soup(["script", "style"]):
        script.extract()  # rip it out

    text = soup.get_text()

    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
    text = '\n'.join(chunk for chunk in chunks if chunk)

    wiki = TextBlob(text)
    r = wiki.sentiment.polarity

    my_list.append(r)  # add r to my_list

print my_list

[r1, r2, r3, ...]
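Put together, the pattern looks like this. The network fetch and TextBlob scoring are replaced by a hypothetical stand-in score function, just so the sketch runs on its own, and the returned list can then feed an average:

```python
def parse_websites(list_of_urls, score):
    """Return one sentiment score per URL, in order."""
    scores = []
    for url in list_of_urls:
        r = score(url)     # stand-in for fetch + TextBlob polarity
        scores.append(r)   # accumulate instead of returning early
    return scores          # single return, after the loop


# hypothetical scorer so the sketch is self-contained
fake_polarity = {"url1": 0.10, "url2": 0.15, "url3": 0.07}.get

results = parse_websites(["url1", "url2", "url3"], fake_polarity)
average = sum(results) / len(results)
print(results)  # [0.1, 0.15, 0.07]
```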

Alternatively, you can create a dictionary with the URLs as keys:

my_dictionary = {}

# ... inside the loop, instead of appending:
r = wiki.sentiment.polarity
my_dictionary[url] = r

print my_dictionary

{'url1': r1, 'url2': r2, ...}

print my_dictionary['url1'] 

r1

A dictionary may suit you better, since using the URL as the key makes it easier to retrieve, edit, and delete the "r" values.
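A minimal sketch of the dictionary variant, again with a hypothetical stand-in scorer in place of the fetch-and-TextBlob step:

```python
def score_by_url(list_of_urls, score):
    """Map each URL to its sentiment score."""
    results = {}
    for url in list_of_urls:
        results[url] = score(url)  # URL is the key, polarity the value
    return results


fake_polarity = {"url1": 0.10, "url2": 0.15}.get

scores = score_by_url(["url1", "url2"], fake_polarity)
print(scores["url1"])  # 0.1
del scores["url2"]     # easy to drop a single URL's score
```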

I'm fairly new to Python, so hopefully others will correct me if this doesn't make sense...

+1

Thanks for the quick reply! I just wanted the output in a list, and this works perfectly. – RustyShackleford
