Trouble adding up integers extracted from an HTML file

I can't add up the numbers in the linked HTML file.

I currently get this error:

Line 26, b = sum(y): TypeError: unsupported operand type(s) for +: 'int' and 'str'

Here is my code:

import urllib 
from BeautifulSoup import * 
import re 

counter = 0 
added = 0 


url = "http://python-data.dr-chuck.net/comments_42.html" 
html = urllib.urlopen(url).read() 

soup = BeautifulSoup(html) 

# Retrieve all of the span tags 
spans = soup('span') 

for comments in spans: 
    print comments 
    counter +=1 
    #y = re.findall('(\d+)', comments) -- didnt work 
    #print y 
    #added += y 
y = re.findall('(\d+)', str(soup)) 
print y 
b = sum(y) 
print b 

print "Count", counter 
print "Sum", added

The output I expect is something like:

Count: 50 
Sum: 2482

As you can see, I commented out part of my code. That is how I originally tried to add the numbers up, and I don't know why it doesn't work.

#y = re.findall('(\d+)', comments) -- didnt work 
    #print y 
    #added += y

I also don't understand why this line does find the list of numbers:

y = re.findall('(\d+)', str(soup))

2015-12-31 Oscalation

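You are summing strings. Try 'b = sum(map(int, y))' – Pynchia

So what is wrong? Do you get an error? And does 'b = sum(y)' work? –

@zetysz: I know, but the OP would get an error at 'b = sum(y)'. Yet the OP only says the error occurs in the commented-out part. –

You are trying to sum strings. Convert the strings to integers before summing, as Pynchia said, and then print b as the Sum.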

... 
b = sum(map(int, y)) 
... 
print "Count", counter 
print "Sum", b

If you want to fix the commented-out part instead, use:

... 
y = re.findall('(\d+)', str(comments)) 
print y 
added += sum(map(int, y))  # += so the total accumulates across loop iterations
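To illustrate the per-tag approach, here is a minimal sketch that runs without fetching the page. FakeTag is a hypothetical stand-in for a parsed span tag (an object that is not itself a string), and the three sample values are made up rather than taken from the real page. It is written so it also runs under Python 3. It suggests why the commented-out attempt probably failed: re.findall only accepts a string, so the tag has to go through str() first, and the per-tag result has to be added to a running total.

```python
import re


class FakeTag(object):
    """Hypothetical stand-in for a parsed <span> tag: an object, not a string."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


# Made-up sample data, not the contents of the real page.
spans = [
    FakeTag('<span class="comments">97</span>'),
    FakeTag('<span class="comments">90</span>'),
    FakeTag('<span class="comments">3</span>'),
]

# Passing the tag object itself to re.findall raises TypeError,
# because the re module only works on strings.
try:
    re.findall(r'(\d+)', spans[0])
    raw_tag_rejected = False
except TypeError:
    raw_tag_rejected = True

counter = 0
added = 0
for comments in spans:
    counter += 1
    y = re.findall(r'(\d+)', str(comments))  # list of digit strings
    added += sum(map(int, y))                # convert, then accumulate

print("Count %d" % counter)
print("Sum %d" % added)
```

Note that the pattern picks up every run of digits in the tag's markup, so a digit inside an attribute value would be counted as well.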

2015-12-31 08:19:52 Zety

Quoting from the Python documentation:

re.findall(pattern, string, flags=0)

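Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found.

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.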
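The group behaviour described in that quote can be seen in a short sketch. The sample text and the variable names here are made up for illustration. In every case the individual items are strings, never integers.

```python
import re

text = "width=640 height=480"  # made-up sample input

# No group: each match is returned as a whole string.
no_group = re.findall(r"\d+", text)

# One group: only the text captured by the group is returned.
one_group = re.findall(r"=(\d+)", text)

# Two groups: each match becomes a tuple of strings.
two_groups = re.findall(r"(\w+)=(\d+)", text)

print(no_group)    # ['640', '480']
print(one_group)   # ['640', '480']
print(two_groups)  # [('width', '640'), ('height', '480')]
```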

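This expression:

y = re.findall('(\d+)', str(soup)) returns a list of every substring that matches your pattern (\d+), that is, every run of digits. So you have a list of strings.

Then,

b = sum(y) tries to add up strings rather than integers. sum starts from the integer 0, so the first addition is an int plus a str, which is why you got the error message.

Try instead:

b = sum(map(int, y)), which converts each digit string in y to an integer and then sums them.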

Demo:

>>> import re
>>> s = 'Today is 31st, December, Temperature is 18 degC'
>>> y = re.findall('(\d+)', s)
>>> y
['31', '18']
>>> b = sum(map(int, y))
>>> b
49
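The failure itself can be reproduced in a few lines, reusing the two values from the demo. This is only a sketch, written so it runs under Python 3, and error_message and total are names chosen for illustration. It captures the exception text and then shows a generator expression as an equivalent spelling of the map-based fix.

```python
y = ['31', '18']  # what re.findall returns: strings, not integers

try:
    sum(y)  # sum() starts from the integer 0, so it evaluates 0 + '31' first
except TypeError as exc:
    error_message = str(exc)

# Same result as sum(map(int, y)): convert each string, then add.
total = sum(int(n) for n in y)

print(error_message)
print(total)
```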

2015-12-31 08:21:39
