Why doesn't my link extraction work?

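I'm learning Beautiful Soup and trying to extract all the links from the page http://www.popsci.com ... but I get an error.

As far as I can tell this code should work, but it fails on every page I've tried it on. I'm trying to figure out why.

Here is my code: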

from BeautifulSoup import BeautifulSoup 
import urllib2 

url="http://www.popsci.com/" 

page=urllib2.urlopen(url) 
soup = BeautifulSoup(page.read()) 

sci=soup.findAll('a') 

for eachsci in sci: 
    print eachsci['href']+","+eachsci.string

...and this is the error I get:

Traceback (most recent call last): 
    File "/root/Desktop/3.py", line 12, in <module> 
    print eachsci['href']+","+eachsci.string 
TypeError: coercing to Unicode: need string or buffer, NoneType found 
[Finished in 1.3s with exit code 1]

Source

2013-08-17 Ninja2k

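When an a element contains no text (or contains something other than a single piece of text, such as only an image), eachsci.string is None, and you can't concatenate None with a string using the + operator, which is what your code is trying to do.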

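If you replace eachsci.string with eachsci.text, the error goes away, because eachsci.text is the empty string '' when the a element is empty, and an empty string concatenates with another string without any problem.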
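The difference can be seen without Beautiful Soup at all. The following is a minimal sketch in Python 3 (an assumption; the question uses Python 2), where string_value and text_value are hypothetical stand-ins for what eachsci.string and eachsci.text hold on an empty a element. The exact wording of the TypeError message varies between Python versions.

```python
# Hypothetical stand-ins for an empty <a> element:
string_value = None   # what eachsci.string would be
text_value = ""       # what eachsci.text would be

href = "http://www.popsci.com/"

# Concatenating a str with None raises TypeError.
try:
    broken_line = href + "," + string_value
except TypeError as exc:
    broken_line = None
    print("TypeError:", exc)

# Concatenating a str with an empty str is fine.
working_line = href + "," + text_value
print(working_line)
```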

However, you will then run into a second problem when you reach an a element that has no href attribute: in that case you get a KeyError.

You can solve this with dict.get(), which returns a default value when a key is not in the dictionary (the a element behaves like a dictionary, so this works).
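To illustrate the lookup behaviour on its own, here is a small sketch in Python 3 (an assumption; the question uses Python 2) with a plain dict. The attrs dictionary is a hypothetical stand-in for the attributes of an a element that has a name but no href.

```python
# Hypothetical attributes of an <a> element without an href.
attrs = {"name": "top"}

# Square-bracket lookup raises KeyError for a missing key.
try:
    href = attrs["href"]
except KeyError:
    href = None
    print("KeyError: no 'href' key")

# get() returns the supplied default instead of raising.
fallback = attrs.get("href", "[no href found]")
print(fallback)
```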

Putting it all together, here is a variation of your for loop that works:

for eachsci in sci: 
    print eachsci.get('href', '[no href found]') + "," + eachsci.text
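For readers on Python 3, roughly the same loop might look like the sketch below. It assumes the newer bs4 package rather than the BeautifulSoup 3 import used in the question, and it parses a small hypothetical SAMPLE_HTML string instead of fetching http://www.popsci.com/ so that it runs without network access. The extract_links helper is a name introduced here for illustration only.

```python
# Python 3 sketch; assumes the bs4 package is installed.
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<a href="/science">Science</a>
<a href="/empty"></a>
<a name="anchor-only">No href here</a>
"""

def extract_links(html):
    """Return one 'href,text' line per <a> element in html."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for link in soup.find_all("a"):
        # get() supplies a default instead of raising KeyError.
        href = link.get("href", "[no href found]")
        # get_text() returns '' for an empty element, never None.
        text = link.get_text()
        lines.append(href + "," + text)
    return lines

for line in extract_links(SAMPLE_HTML):
    print(line)
```

The second and third sample links exercise the two failure cases from the question: an a element with no text and an a element with no href.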

Source

2013-08-17 15:11:05

That works really nicely, thank you :) – Ninja2k
