Why does Python think my variable is empty?

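Before I put the two if statements in the third code block, I got almost the same error: cannot concatenate 'str' and 'NoneType' objects. Why does Python think my variable is empty?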

However, when I uncomment the print statement in my third if statement, it prints a list of URLs with their paths.

I have also tried this on other websites, so it is not just this one that fails.

Here is my traceback:

Traceback (most recent call last): 
    File "linkcrawler.py", line 24, in <module> 
    newurl = "http://" + b1 + b2 
TypeError: cannot concatenate 'str' and 'NoneType' objects 
Traceback (most recent call last): 
    File "linkcrawler.py", line 24, in <module> 
    newurl = "http://" + b1 + b2 
TypeError: cannot concatenate 'str' and 'NoneType' objects

I get only two of these each time I run it.

import urllib
from bs4 import BeautifulSoup
import traceback
import urlparse
import mechanize

url = "http://www.dailymail.co.uk/home/index.html"
br = mechanize.Browser()
urls = [url]
visited = [url]

while len(urls) > 0:
    try:
        br.open(urls[0])
        urls.pop(0)
        for link in br.links():
            newurl = urlparse.urljoin(link.base_url, link.url)
            b1 = urlparse.urlparse(newurl).hostname
            b2 = urlparse.urlparse(newurl).path

            newurl = "http://" + b1 + b2

            if newurl not in visited and urlparse.urlparse(url).hostname in newurl:
                urls.append(newurl)
                visited.append(newurl)
                #print newurl
    except:
        traceback.print_exc()
        urls.pop(0)
print visited


2015-10-14 booberz

Either 'b1' or 'b2' (or both) is 'None'. You need to account for that somehow. – thebjorn

It probably just prints things until newurl = "http://" + b1 + b2 fails because one of b1 or b2 is None. – Julien

Evidently there is no hostname or path. – TigerhawkT3
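To see which kinds of links can trigger this, here is a minimal sketch. The href values are hypothetical examples, not taken from the actual page, and the import fallback is only there so the snippet runs on both Python 2 (as in the question) and Python 3. It shows that urlparse reports a hostname of None for links without a network location, such as mailto: or javascript: links, while the path is an empty or non-empty string but never None.

```python
try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse  # Python 2, as in the question

base = "http://www.dailymail.co.uk/home/index.html"

# Hypothetical href values of the kind a page might contain.
hrefs = ["/news/index.html", "mailto:someone@example.com", "javascript:void(0)"]

results = {}
for href in hrefs:
    parsed = urlparse(urljoin(base, href))
    # hostname is None when the URL has no network location part.
    results[href] = (parsed.hostname, parsed.path)
    print(href, "->", repr(parsed.hostname), repr(parsed.path))
```

Only the first link has a hostname; concatenating the hostname of either of the other two with a string raises the TypeError from the traceback.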

Either b1 or b2 is None. To fix this, check whether b1 and b2 are empty or None and adjust your code accordingly, for example:

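b1 = urlparse.urlparse(newurl).hostname
b2 = urlparse.urlparse(newurl).path

# Only build the URL when both parts are present (neither None nor empty).
if b1 and b2:
    newurl = "http://" + b1 + b2
    if newurl not in visited and urlparse.urlparse(url).hostname in newurl:
        urls.append(newurl)
        visited.append(newurl)
        #print newurl
else:
    # Skip this link. Calling urls.pop(0) here would drop an unrelated
    # queued URL, because the current page was already popped after br.open().
    continue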
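The same check can also be pulled into a small helper, sketched below under a few assumptions. The name normalize_link and the sample href values are hypothetical, and the import fallback is only there so the snippet runs on both Python 2 and Python 3. Unlike the snippet above, this variant keeps links whose path is empty by substituting "/", and it compares hostnames exactly instead of using a substring test.

```python
try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse  # Python 2, as in the question


def normalize_link(base_url, href):
    """Return "http://host/path" for href, or None if it has no hostname."""
    parsed = urlparse(urljoin(base_url, href))
    if parsed.hostname is None:
        return None  # e.g. mailto: or javascript: links
    # An empty path is valid (site root), so fall back to "/".
    return "http://" + parsed.hostname + (parsed.path or "/")


start = "http://www.dailymail.co.uk/home/index.html"
start_host = urlparse(start).hostname

# Hypothetical links, standing in for what br.links() would yield.
hrefs = [
    "/news/index.html",
    "mailto:someone@example.com",
    "http://www.dailymail.co.uk",
    "http://example.com/other",
    "/news/index.html",
]

queue = []
visited = {start}
for href in hrefs:
    newurl = normalize_link(start, href)
    if newurl is None:
        continue  # nothing to crawl for this link
    if urlparse(newurl).hostname == start_host and newurl not in visited:
        visited.add(newurl)
        queue.append(newurl)

print(queue)
```

Using a set for visited keeps the membership test cheap as the crawl grows; the question's list works too, just more slowly.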


2015-10-14 04:34:55
