I've recently been following thenewboston's videos while writing a web crawler in Python. For some reason, I'm getting an SSLError. I tried to fix it with line 6 of the code, but no luck. Any idea why it throws the error? The code is transcribed verbatim from thenewboston's "Python Web Crawler" tutorial.
import requests
from bs4 import BeautifulSoup

def creepy_crawly(max_pages):
    page = 1
    #requests.get('https://www.thenewboston.com/', verify = True)
    while page <= max_pages:
        url = "https://www.thenewboston.com/trade/search.php?pages=" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a', {'class' : 'item-name'}):
            href = "https://www.thenewboston.com" + link.get('href')
            print(href)
        page += 1

creepy_crawly(1)
The SSL error is due to the web certificate. It's probably because the URL you're trying to crawl is 'https'. Try another site that is only http. – Craicerjack 2014-11-24 19:24:02
Possible duplicate of http://stackoverflow.com/q/10667960/783219 – Prusse 2014-11-24 19:46:30
Thanks Craicerjack! I tried it on a site with just "http" and it worked! But how would I go about running a web crawler on domains that use "https"? – Steven 2014-11-24 20:10:12
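For HTTPS sites, a common fix (a sketch, not the tutorial's code; it assumes the `certifi` package is installed alongside `requests`) is to point `requests` at an up-to-date CA certificate bundle via the `verify` parameter, rather than passing `verify=True` in a throwaway call as on line 6:

```python
import certifi
import requests

# certifi.where() returns the path to an up-to-date CA bundle file;
# passing it to verify= lets requests validate the HTTPS certificate.
ca_bundle = certifi.where()
response = requests.get('https://www.thenewboston.com/', verify=ca_bundle)
print(response.status_code)

# Last resort only, and insecure: skip certificate verification entirely.
# This silences SSLError but leaves the connection open to MITM attacks.
insecure = requests.get('https://www.thenewboston.com/', verify=False)
```

Upgrading `requests` (and, on older Python 2.7 installs, adding the `requests[security]` extras) also often resolves these SSLErrors, since the bundled certificate store and SSL handling improve between releases.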