2016-05-31 84 views
2

我有以下代碼:獲取類型錯誤試圖串連 'STR' 時和 'NoneType' 對象

href = "http://www.hltv.org" + link.get('href') 
TypeError: cannot concatenate 'str' and 'NoneType' objects 
:從BS4進口BeautifulSoup

def hltvmatch_spider(max_offset): 
    offset = 0 
    while offset < max_offset: 
     url = 'http://www.hltv.org/?pageid=188&offset=' + str(offset) 
     source_code = requests.get(url) 
     plain_text = source_code.text 
     soup = BeautifulSoup(plain_text, 'html.parser') 
     for link in soup.findAll('a'): 
      href = "http://www.hltv.org" + link.get('href') 
      print(href) 
     offset += 50 

hltvmatch_spider(1) 

這是返回和錯誤 導入請求

任何想法,我需要改變修復?謝謝您的幫助!!

+0

鏈接是你想獲得什麼,拉每個錨點似乎都不是一個好主意 –

回答

1

您應該設置HREF =真所以你只能獲得HREF屬性錨,呼籲.get("href")返回None當錨沒有href屬性:

for link in soup.findAll('a', href=True): 

正如我評論讓所有主播是不是可能你想加入作爲對基本URL什麼是行不通的,這將讓來自主要內容的所有錨標籤:

cont = soup.select_one("div.covMainBoxContent") 

print([a["href"] for a in cont.select("a[href]")]) 

這將使你:

['/?pageid=188&matchid=31342', '/?pageid=179&teamid=4411', '/?pageid=179&teamid=5995', '/?pageid=188&eventid=2135', '/?pageid=188&matchid=31343', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6865', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31339', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6865', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31338', '/?pageid=179&teamid=5995', '/?pageid=179&teamid=6736', '/?pageid=188&eventid=2135', '/?pageid=188&matchid=31341', '/?pageid=179&teamid=6620', '/?pageid=179&teamid=6807', '/?pageid=188&eventid=2238', '/?pageid=188&matchid=31340', '/?pageid=179&teamid=6807', '/?pageid=179&teamid=6620', '/?pageid=188&eventid=2238', '/?pageid=188&matchid=31329', '/?pageid=179&teamid=5995', '/?pageid=179&teamid=6736', '/?pageid=188&eventid=2135', '/?pageid=188&matchid=31336', '/?pageid=179&teamid=6998', '/?pageid=179&teamid=4602', '/?pageid=188&eventid=2262', '/?pageid=188&matchid=31334', '/?pageid=179&teamid=4602', '/?pageid=179&teamid=6998', '/?pageid=188&eventid=2262', '/?pageid=188&matchid=31331', '/?pageid=179&teamid=4674', '/?pageid=179&teamid=6133', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31330', '/?pageid=179&teamid=6133', '/?pageid=179&teamid=4674', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31333', '/?pageid=179&teamid=4501', '/?pageid=179&teamid=6686', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31332', '/?pageid=179&teamid=4501', '/?pageid=179&teamid=6686', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31319', '/?pageid=179&teamid=4411', '/?pageid=179&teamid=6615', '/?pageid=188&eventid=2135', '/?pageid=188&matchid=31321', '/?pageid=179&teamid=6133', '/?pageid=179&teamid=5929', '/?pageid=188&eventid=2262', '/?pageid=188&matchid=31320', '/?pageid=179&teamid=5929', '/?pageid=179&teamid=6133', '/?pageid=188&eventid=2262', '/?pageid=188&matchid=31318', '/?pageid=179&teamid=6615', '/?pageid=179&teamid=4411', '/?pageid=188&eventid=2135', '/?pageid=188&matchid=31328', '/?pageid=179&teamid=6222', '/?pageid=179&teamid=6408', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31327', '/?pageid=179&teamid=6408', '/?pageid=179&teamid=6222', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31326', '/?pageid=179&teamid=6621', '/?pageid=179&teamid=6968', '/?pageid=188&eventid=2252', '/?pageid=188&matchid=31325', '/?pageid=179&teamid=6621', '/?pageid=179&teamid=6968', '/?pageid=188&eventid=2252', '/?pageid=188&matchid=31324', '/?pageid=179&teamid=6619', '/?pageid=179&teamid=6785', '/?pageid=188&eventid=2252', '/?pageid=188&matchid=31322', '/?pageid=179&teamid=6619', '/?pageid=179&teamid=6785', '/?pageid=188&eventid=2252', '/?pageid=188&matchid=31317', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6407', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31316', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6407', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31315', '/?pageid=179&teamid=6995', '/?pageid=179&teamid=7009', '/?pageid=188&eventid=2253', '/?pageid=188&matchid=31306', '/?pageid=179&teamid=6995', '/?pageid=179&teamid=7009', '/?pageid=188&eventid=2253', '/?pageid=188&matchid=31314', '/?pageid=179&teamid=4501', '/?pageid=179&teamid=6686', '/?pageid=188&eventid=2262', '/?pageid=188&matchid=31310', '/?pageid=179&teamid=6686', '/?pageid=179&teamid=4501', '/?pageid=188&eventid=2262', '/?pageid=188&matchid=31304', '/?pageid=179&teamid=6889', '/?pageid=179&teamid=6994', '/?pageid=188&eventid=2253', '/?pageid=188&matchid=31302', '/?pageid=179&teamid=6994', '/?pageid=179&teamid=6889', '/?pageid=188&eventid=2253', '/?pageid=188&matchid=31313', '/?pageid=179&teamid=6865', '/?pageid=179&teamid=4688', '/?pageid=188&eventid=2255', '/?pageid=188&matchid=31312', '/?pageid=179&teamid=4688', '/?pageid=179&teamid=6865', '/?pageid=188&eventid=2255', '/?pageid=188&matchid=31311', '/?pageid=179&teamid=4688', '/?pageid=179&teamid=6865', '/?pageid=188&eventid=2255', '/?pageid=188&matchid=31309', '/?pageid=179&teamid=4602', '/?pageid=179&teamid=6408', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31308', '/?pageid=179&teamid=4602', '/?pageid=179&teamid=6408', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31307', '/?pageid=179&teamid=4602', '/?pageid=179&teamid=6408', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31305', '/?pageid=179&teamid=6133', '/?pageid=179&teamid=4548', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31303', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6133', '/?pageid=188&eventid=2254', '/?pageid=188&matchid=31294', '/?pageid=179&teamid=4869', '/?pageid=179&teamid=6137', '/?pageid=188&eventid=2176', '/?pageid=188&matchid=31293', '/?pageid=179&teamid=6137', '/?pageid=179&teamid=4869', '/?pageid=188&eventid=2176', '/?pageid=188&matchid=31292', '/?pageid=179&teamid=4869', '/?pageid=179&teamid=6137', '/?pageid=188&eventid=2176', '/?pageid=188&matchid=31291', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6407', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31290', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6407', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31289', '/?pageid=179&teamid=4548', '/?pageid=179&teamid=6407', '/?pageid=188&eventid=2232', '/?pageid=188&matchid=31301', '/?pageid=179&teamid=5996', '/?pageid=179&teamid=6981', '/?pageid=188&eventid=2273', '/?pageid=188&matchid=31300', '/?pageid=179&teamid=5996', '/?pageid=179&teamid=6981', '/?pageid=188&eventid=2273', '/?pageid=188&matchid=31299', '/?pageid=179&teamid=5996', '/?pageid=179&teamid=6869', '/?pageid=188&eventid=2273', '/?pageid=188&matchid=31298', '/?pageid=179&teamid=5996', '/?pageid=179&teamid=6869', '/?pageid=188&eventid=2273', '/?pageid=188&matchid=31297', '/?pageid=179&teamid=6981', '/?pageid=179&teamid=6792', '/?pageid=188&eventid=2273'] 

如果你是不是更具體的,你會得到這樣的鏈接:

http://static.hltv.org//images/category/5.gif 

http://www.hltv.org是不會正常工作當+。

爲了得到第一個即日期專欄中,我們可以添加a[href*=matchid=]我們的選擇下的鏈接,HTML佈局不是很好看解析,因爲沒有什麼,我會叫一個可靠的方法來分析數據,但由於matchid=是該列中可用的hrefs所特有的。

soup = BeautifulSoup(requests.get("http://www.hltv.org/?pageid=188&offset=1").content) 
cont = soup.select("div.covMainBoxContent a[href*=matchid=]") 

print([a["href"] for a in cont]) 

要獲得完整的HTML,你需要的基本URL加入到HREF:

from urlparse import urljoin 

base = "http://www.hltv.org/" 
soup = BeautifulSoup(requests.get("http://www.hltv.org/?pageid=188&offset=1").content) 
cont = soup.select("div.covMainBoxContent a[href*=matchid=]") 

print([urljoin(base, a["href"]) for a in cont]) 

它給你:

['http://www.hltv.org/?pageid=188&matchid=31342', 'http://www.hltv.org/?pageid=188&matchid=31343', 'http://www.hltv.org/?pageid=188&matchid=31339', 'http://www.hltv.org/?pageid=188&matchid=31338', 'http://www.hltv.org/?pageid=188&matchid=31341', 'http://www.hltv.org/?pageid=188&matchid=31340', 'http://www.hltv.org/?pageid=188&matchid=31329', 'http://www.hltv.org/?pageid=188&matchid=31336', 'http://www.hltv.org/?pageid=188&matchid=31334', 'http://www.hltv.org/?pageid=188&matchid=31331', 'http://www.hltv.org/?pageid=188&matchid=31330', 'http://www.hltv.org/?pageid=188&matchid=31333', 'http://www.hltv.org/?pageid=188&matchid=31332', 'http://www.hltv.org/?pageid=188&matchid=31319', 'http://www.hltv.org/?pageid=188&matchid=31321', 'http://www.hltv.org/?pageid=188&matchid=31320', 'http://www.hltv.org/?pageid=188&matchid=31318', 'http://www.hltv.org/?pageid=188&matchid=31328', 'http://www.hltv.org/?pageid=188&matchid=31327', 'http://www.hltv.org/?pageid=188&matchid=31326', 'http://www.hltv.org/?pageid=188&matchid=31325', 'http://www.hltv.org/?pageid=188&matchid=31324', 'http://www.hltv.org/?pageid=188&matchid=31322', 'http://www.hltv.org/?pageid=188&matchid=31317', 'http://www.hltv.org/?pageid=188&matchid=31316', 'http://www.hltv.org/?pageid=188&matchid=31315', 'http://www.hltv.org/?pageid=188&matchid=31306', 'http://www.hltv.org/?pageid=188&matchid=31314', 'http://www.hltv.org/?pageid=188&matchid=31310', 'http://www.hltv.org/?pageid=188&matchid=31304', 'http://www.hltv.org/?pageid=188&matchid=31302', 'http://www.hltv.org/?pageid=188&matchid=31313', 'http://www.hltv.org/?pageid=188&matchid=31312', 'http://www.hltv.org/?pageid=188&matchid=31311', 'http://www.hltv.org/?pageid=188&matchid=31309', 'http://www.hltv.org/?pageid=188&matchid=31308', 'http://www.hltv.org/?pageid=188&matchid=31307', 'http://www.hltv.org/?pageid=188&matchid=31305', 'http://www.hltv.org/?pageid=188&matchid=31303', 'http://www.hltv.org/?pageid=188&matchid=31294', 'http://www.hltv.org/?pageid=188&matchid=31293', 'http://www.hltv.org/?pageid=188&matchid=31292', 'http://www.hltv.org/?pageid=188&matchid=31291', 'http://www.hltv.org/?pageid=188&matchid=31290', 'http://www.hltv.org/?pageid=188&matchid=31289', 'http://www.hltv.org/?pageid=188&matchid=31301', 'http://www.hltv.org/?pageid=188&matchid=31300', 'http://www.hltv.org/?pageid=188&matchid=31299', 'http://www.hltv.org/?pageid=188&matchid=31298', 'http://www.hltv.org/?pageid=188&matchid=31297'] 
+0

感謝您的幫助!這很好。 – SHUTTEHFACE

+0

@SHUTTEHFACE,不用擔心,你是不是希望頁面中有一些特定的錨定標記,或者你是否真的試圖通過href獲取每個鏈接,而不管它在哪裏? –

+0

我試圖得到更具體的,只從'covMainBoxContent''類'的第一列尋找href。 – SHUTTEHFACE

相關問題