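I am trying to parse the HTML text of many web pages for sentiment analysis. With help from the community, I can iterate over many URLs, generate a sentiment score for each one with the TextBlob library, and print the score for each URL. What I have not managed to do is collect the outputs produced by my return variable into a list, so that I can use the stored numbers to calculate an average and later show the results in a chart as I continue my analysis. How do I capture an iterated output variable in a list for analysis?
Code using print: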
import requests
import json
import urllib
from bs4 import BeautifulSoup
from textblob import TextBlob

#you can add to this
urls = ["http://www.thestar.com/business/economy/2015/05/19/canadian-consumer-confidence-dips-but-continues-to-climb-in-us-report.html",
        "http://globalnews.ca/news/2012054/canada-ripe-for-an-invasion-of-u-s-dollar-stores-experts-say/",
        "http://www.cp24.com/news/tsx-flat-in-advance-of-fed-minutes-loonie-oil-prices-stabilize-1.2381931",
        "http://www.marketpulse.com/20150522/us-and-canadian-gdp-to-close-out-week-in-fx/",
        "http://www.theglobeandmail.com/report-on-business/canada-pension-plan-fund-sees-best-ever-annual-return/article24546796/",
        "http://www.marketpulse.com/20150522/canadas-april-inflation-slowest-in-two-years/"]

def parse_websites(list_of_urls):
    for url in list_of_urls:
        html = urllib.urlopen(url).read()
        soup = BeautifulSoup(html)
        # kill all script and style elements
        for script in soup(["script", "style"]):
            script.extract() # rip it out
        # get text
        text = soup.get_text()
        # break into lines and remove leading and trailing space on each
        lines = (line.strip() for line in text.splitlines())
        # break multi-headlines into a line each
        chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
        # drop blank lines
        text = '\n'.join(chunk for chunk in chunks if chunk)
        #print(text)
        wiki = TextBlob(text)
        r = wiki.sentiment.polarity
        print r

parse_websites(urls)
Output:
>>>
0.10863027172
0.156074203574
0.0766585497835
0.0315555555556
0.0752548359411
0.0902824858757
>>>
However, when I use a return value to build a list of the values to work with, I get no results. Code:
import requests
import json
import urllib
from bs4 import BeautifulSoup
from textblob import TextBlob

#you can add to this
urls = ["http://www.thestar.com/business/economy/2015/05/19/canadian-consumer-confidence-dips-but-continues-to-climb-in-us-report.html",
        "http://globalnews.ca/news/2012054/canada-ripe-for-an-invasion-of-u-s-dollar-stores-experts-say/",
        "http://www.cp24.com/news/tsx-flat-in-advance-of-fed-minutes-loonie-oil-prices-stabilize-1.2381931",
        "http://www.marketpulse.com/20150522/us-and-canadian-gdp-to-close-out-week-in-fx/",
        "http://www.theglobeandmail.com/report-on-business/canada-pension-plan-fund-sees-best-ever-annual-return/article24546796/",
        "http://www.marketpulse.com/20150522/canadas-april-inflation-slowest-in-two-years/"]

def parse_websites(list_of_urls):
    for url in list_of_urls:
        html = urllib.urlopen(url).read()
        soup = BeautifulSoup(html)
        # kill all script and style elements
        for script in soup(["script", "style"]):
            script.extract() # rip it out
        # get text
        text = soup.get_text()
        # break into lines and remove leading and trailing space on each
        lines = (line.strip() for line in text.splitlines())
        # break multi-headlines into a line each
        chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
        # drop blank lines
        text = '\n'.join(chunk for chunk in chunks if chunk)
        #print(text)
        wiki = TextBlob(text)
        r = wiki.sentiment.polarity
        r = []
        return [r]

parse_websites(urls)
Output:
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
>>>
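A note on why nothing shows up here: `r = []` replaces the score that was just computed with an empty list, and the value returned by `parse_websites(urls)` is never printed or assigned to a name. If the `return` sits inside the loop, the function also stops after the first URL. The usual pattern is to create the list once before the loop, append to it on every pass, and return it after the loop. The sketch below shows only that pattern; `collect_scores` and `fake_polarity` are hypothetical names, and `fake_polarity` is a stand-in for the fetch-and-score step so the example runs without network access.

```python
def collect_scores(items, score_func):
    """Apply score_func to every item and return all results in one list."""
    scores = []                          # create the list once, before the loop
    for item in items:
        scores.append(score_func(item))  # add this item's score to the list
    return scores                        # return only after the loop has finished


# Hypothetical stand-in for "download the page and compute its polarity".
def fake_polarity(text):
    return len(text) / 100.0


# Assign the returned list to a name so it can be used afterwards.
results = collect_scores(["good news", "bad", "mixed report"], fake_polarity)
print(results)
```

In the real function, the body of the loop would presumably stay as it is, with `scores.append(wiki.sentiment.polarity)` taking the place of the `print` statement.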
How can I change this so that I can work with the numbers and add or subtract them from a list like this: [r1, r2, r3, ...]?
Thank you in advance.
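Once the scores are in a plain list, the arithmetic asked about is ordinary list handling. As a rough sketch, here are the six values from the printed output above placed in a list by hand; the variable names are only illustrative.

```python
# The six polarity scores printed by the first version of the script.
scores = [0.10863027172, 0.156074203574, 0.0766585497835,
          0.0315555555556, 0.0752548359411, 0.0902824858757]

# Average of all scores.
average = sum(scores) / len(scores)

# Subtracting two entries: gap between the highest and lowest score.
spread = max(scores) - min(scores)

# Each score's distance from the average, as a new list.
diff_from_avg = [s - average for s in scores]

print(average)
print(spread)
print(diff_from_avg)
```

A list like `scores` can also be passed directly to a plotting library later on, since most of them accept a plain sequence of numbers.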
Thanks for the quick reply! I just wanted the output in a list, and it works perfectly. – RustyShackleford