2017-08-11 75 views
Web Crawler --- TypeError: coercing to Unicode: need string or buffer, NoneType found

I am new to Python. I wrote my own web crawler to practice on Yelp.


I keep getting this error and can't seem to get past the first page:

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "<stdin>", line 26, in yelpSpider 
    TypeError: coercing to Unicode: need string or buffer, NoneType found 

Here is my code:

import requests
from BeautifulSoup import BeautifulSoup
def yelpSpider(maxPages):
    page = 0
    listURL = []
    listRATE = []
    listAREA = []
    listADDRESS = []
    listType = []
    while page <= maxPages:
        url = 'https://www.yelp.com/search?find_desc=Restaurants&find_loc=Manhattan,+NY&start=0' + str(page)
        sourceCode = requests.get(url)
        plainText = sourceCode.text
        soup = BeautifulSoup(plainText)
        for bizName in soup.findAll('a',{'class':'biz-name js-analytics-click'}):
            href = 'https://www.yelp.com.com' + bizName.get('href')
            listURL.append(href)
        for rating in soup.findAll('img',{'class':'offscreen'}):
            stars = rating.get('alt')
            listRATE.append(stars)
        for area in soup.findAll('span',{'class':'neighborhood-str-list'}):
            listAREA.append(area.string)
        for type in soup.findAll('span',{'class':'category-str-list'}):
            listType.append(type)
        for tracker in range(int(page),int(page) + 10):
            print(listURL[tracker])
            print(' ')
            print(listAREA[tracker] + ' | ' + listRATE[tracker])
        page += 10

yelpSpider(20)

Thanks for your help!

Changing the last print to the following works around the None values in your listRATE: print('{} | {}'.format(listAREA[tracker], listRATE[tracker]))
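
To illustrate the suggestion in the comment, here is how str.format behaves on a missing rating. The sample values are invented, not real Yelp data. Unlike +, str.format converts each argument to text, so None comes out as the word None instead of raising an error.

```python
area = 'Midtown'   # hypothetical sample value
rating = None      # a business with no rating

# str.format converts each argument to text, so None is accepted.
line = '{} | {}'.format(area, rating)
print(line)
```

The output still contains the literal text None, so this avoids the crash but does not hide the missing value.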

Answer

The problem is in print(listAREA[tracker] + ' | ' + listRATE[tracker])

It happens because your listRATE comes out as

['4.5 star rating', 
'4.5 star rating', 
'4.5 star rating', 
'4.0 star rating', 
'4.0 star rating', 
'4.0 star rating', 
'4.0 star rating', 
'5.0 star rating', 
'4.5 star rating', 
'4.0 star rating', 
None, 
None, 
'4.0 star rating', 
'4.5 star rating', 
'4.0 star rating', 
'3.0 star rating', 
'4.0 star rating', 
'3.5 star rating', 
'4.5 star rating', 
'4.5 star rating', 
'5.0 star rating', 
'4.0 star rating', 
None, 
None] 

As you can see, the error occurs at tracker: 10, where the value at that index is None. None cannot be used in string concatenation.
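
A minimal reproduction of the failure, using made-up sample values rather than real Yelp data. The exact error message differs between Python versions, so this sketch only checks that a TypeError is raised.

```python
# Hypothetical sample data: the second rating is missing (None).
areas = ['Midtown', 'SoHo']
ratings = ['4.5 star rating', None]

def describe(index):
    # Same expression as the failing print call in the question.
    return areas[index] + ' | ' + ratings[index]

first = describe(0)  # works: both operands are strings

try:
    describe(1)      # fails: a string cannot be concatenated with None
    failed = False
except TypeError:
    failed = True
```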

So you have several options. One is to use an or condition and substitute '' for the missing value. Your code becomes

print((listAREA[tracker] or '') + ' | ' + (listRATE[tracker] or '')) 
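
The or fallback can be wrapped in a small helper so it is not repeated at every call site. This is only a sketch: format_row is a hypothetical name and the sample values are invented. Note that or also replaces other falsy values such as an empty string, which is harmless here.

```python
def format_row(area, rating):
    # `value or ''` yields '' when value is None (or any other falsy value).
    return (area or '') + ' | ' + (rating or '')

with_rating = format_row('Midtown', '4.5 star rating')
without_rating = format_row('SoHo', None)
```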

The next option is to replace each None with a placeholder such as 'N/A' before printing:

listRATE = list(map(lambda text: text if text is not None else 'N/A', listRATE)) 

After running the line above, your list will look like this:

['4.5 star rating', 
'4.5 star rating', 
'4.5 star rating', 
'4.0 star rating', 
'4.0 star rating', 
'4.0 star rating', 
'4.0 star rating', 
'5.0 star rating', 
'4.5 star rating', 
'4.0 star rating', 
'N/A', 
'N/A', 
'4.0 star rating', 
'4.5 star rating', 
'4.0 star rating', 
'3.0 star rating', 
'4.0 star rating', 
'3.5 star rating', 
'4.5 star rating', 
'4.5 star rating', 
'5.0 star rating', 
'4.0 star rating', 
'N/A', 
'N/A'] 
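
The same replacement can also be written as a list comprehension, which some readers find easier to follow than map with a lambda. This is a sketch on a shortened, made-up version of the list; fill_missing and its placeholder parameter are hypothetical names.

```python
def fill_missing(values, placeholder='N/A'):
    # Keep every non-None entry, substitute the placeholder for None.
    return [value if value is not None else placeholder for value in values]

listRATE = ['4.5 star rating', None, '4.0 star rating', None]
listRATE = fill_missing(listRATE)
```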