2011-12-20

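Beautiful Soup error

I am using the Beautiful Soup module to scrape the titles of a list of web pages saved in a CSV file. The script seems to work fine, but once it reaches the 82nd domain it produces the following error: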

Traceback (most recent call last): 
    File "soup.py", line 31, in <module> 
    print soup.title.renderContents() # 'Google' 
AttributeError: 'NoneType' object has no attribute 'renderContents' 

I am fairly new to Python, so I am not sure I understand the error. Could someone clarify what is going wrong?

My code is:

import csv
import socket
from urllib2 import Request, urlopen, URLError, HTTPError
from BeautifulSoup import BeautifulSoup

debuglevel = 0

timeout = 5

socket.setdefaulttimeout(timeout)
domains = csv.reader(open('domainlist.csv'))
f = open('souput.txt', 'w')
for row in domains:
    domain = row[0]
    req = Request(domain)
    try:
        html = urlopen(req).read()
        print domain
    except HTTPError, e:
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    except URLError, e:
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    else:
        # everything is fine
        soup = BeautifulSoup(html)

        print soup.title # '<title>Google</title>'
        print soup.title.renderContents() # 'Google'
        f.writelines(domain)
        f.writelines(" ")
        f.writelines(soup.title.renderContents())
        f.writelines("\n")

Answers

1

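As maozet said, your problem is that the title is None: the page has no <title> element, so soup.title has no renderContents method to call. You can check the value first to avoid this kind of problem: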

soup = BeautifulSoup(html)

# soup.title is None when the page has no <title> element
if soup.title is not None:
    print soup.title # '<title>Google</title>'
    print soup.title.renderContents() # 'Google'
    f.writelines(domain)
    f.writelines(" ")
    f.writelines(soup.title.renderContents())
    f.writelines("\n")
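
For readers on Python 3 with the newer bs4 package, the same check could be wrapped in a small helper so that pages without a title still get a line in the output file. This is only a sketch under assumptions: format_title_line and placeholder are names made up for illustration, and soup is assumed to be a bs4 BeautifulSoup object (with BeautifulSoup 3, renderContents() would take the place of get_text()).

```python
def format_title_line(domain, soup, placeholder="(no title)"):
    """Return 'domain title', or 'domain placeholder' if <title> is missing.

    `soup` is assumed to be a parsed page whose `.title` attribute is
    either a tag object or None, as with bs4's BeautifulSoup.
    """
    title = soup.title  # None when the page has no <title> element
    if title is None:
        return "%s %s" % (domain, placeholder)
    # get_text() is the bs4 method that returns the text inside the tag
    return "%s %s" % (domain, title.get_text().strip())
```

In the loop, the four f.writelines calls would then shrink to something like f.write(format_title_line(domain, soup) + "\n").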
+0

Thanks! That seems to do the job. – 2011-12-20 13:29:29

1

What if a page has no title?
I ran into this problem once. Just wrap the code in a try/except, or check for the title first.
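
If you go the try/except route, the exception to catch is AttributeError, the type named in the traceback; None and NoneType are not exception classes. A minimal sketch, assuming Python 3 and a bs4-style soup object (title_or_none is a made-up name, and get_text() is the bs4 method):

```python
def title_or_none(soup):
    """Return the page title text, or None if the page has no <title>."""
    try:
        # Raises AttributeError when soup.title is None
        return soup.title.get_text().strip()
    except AttributeError:
        return None
```

Keep the try body this small: catching AttributeError around a larger block can hide unrelated mistakes, which is one reason the explicit None check is often the clearer choice.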

+0

That... makes sense! I'll try to figure out how to do that :) – 2011-12-20 11:46:34

+1

I think it would be better to check for None, e.g. 'if soup.title != None:', and then do your thing. – 2011-12-20 11:55:35

+0

I have tried adding except clauses for None and NoneType, but neither worked. I'm afraid I'm not very experienced with error handling. – 2011-12-20 12:36:32

0

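I ran into the same problem, and reading a few related questions plus some Googling got me through it. Here is how I suggest handling this kind of error, such as the NoneType one: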

import urllib2
from bs4 import BeautifulSoup  # assumption: get_text() below is a bs4 method

soup = BeautifulSoup(urllib2.urlopen('http://webpage.com').read())
scrapped = soup.find(id='whatweseekfor')

if scrapped is None:
    # find() returned None: the element is not on the page
    print 'none'
else:
    # the element exists, so calling its methods is safe
    print scrapped.get_text()

Good luck!