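I'm very new to programming and Python, and I'm trying to write this simple scraper to extract the profile URLs of all the therapists listed on this page. The problem is that it does not recognise the links.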
import requests
from bs4 import BeautifulSoup

def tru_crawler(max_pages):
    p = '&page='
    page = 1
    while page <= max_pages:
        url = 'http://www.therapy-directory.org.uk/search.php?search=Sheffield&distance=40&services[23]=on&services=23&business_type[individual]=on&uqs=626693' + p + str(page)
        code = requests.get(url)
        text = code.text
        soup = BeautifulSoup(text)
        for link in soup.findAll('a',{'member-summary':'h2'}):
            href = 'http://www.therapy-directory.org.uk' + link.get('href')
            yield href + '\n'
            print(href)
        page += 1
Now, when I run this code I get nothing, mainly because soup.findAll returns an empty result.
The HTML for a profile link looks like this:
<div class="member-summary">
<h2 class="">
<a href="/therapists/julia-church?uqs=626693">Julia Church</a>
</h2>
So I'm not sure what to pass to soup.findAll('a', ...) in order to get the profile URLs.
Please help: what additional arguments does it need?
Thanks
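
For context on why the result is empty: the dictionary passed as the second argument to findAll is matched against the attributes of the `<a>` tag itself, so `{'member-summary':'h2'}` looks for an `<a>` that has an attribute named `member-summary` with the value `h2`, and no such tag exists. One possible approach, sketched below, is to describe the nesting with a CSS selector instead. The sketch runs against the HTML fragment from the question rather than the live site; the closing `</div>`, the extra `/about` link, and the names `BASE` and `profile_urls` are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

# Markup from the question, plus a closing </div> and one unrelated link
# (both added here for illustration) to show what the selector skips.
html = '''
<div class="member-summary">
<h2 class="">
<a href="/therapists/julia-church?uqs=626693">Julia Church</a>
</h2>
</div>
<a href="/about">About</a>
'''

BASE = 'http://www.therapy-directory.org.uk'  # hypothetical constant name

def profile_urls(markup):
    """Return absolute URLs for links nested in div.member-summary > h2."""
    soup = BeautifulSoup(markup, 'html.parser')
    # Match <a> tags inside an <h2> inside <div class="member-summary">.
    links = soup.select('div.member-summary h2 a')
    return [BASE + a.get('href') for a in links]

urls = profile_urls(html)
print(urls)
```

Inside the crawler, the same selector could replace the findAll call, assuming the live pages use the same structure as the fragment above.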
Update -
I ran the revised code and, well, this time it scraped page 1 and then returned a bunch of errors:
Traceback (most recent call last):
File "C:/Users/PB/PycharmProjects/crawler/crawler-revised.py", line 19, in <module>
tru_crawler(3)
File "C:/Users/PB/PycharmProjects/crawler/crawler-revised.py", line 9, in tru_crawler
code = requests.get(url)
File "C:\Python27\lib\requests\api.py", line 68, in get
return request('get', url, **kwargs)
File "C:\Python27\lib\requests\api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\requests\sessions.py", line 464, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\requests\sessions.py", line 602, in send
history = [resp for resp in gen] if allow_redirects else []
File "C:\Python27\lib\requests\sessions.py", line 195, in resolve_redirects
allow_redirects=False,
File "C:\Python27\lib\requests\sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "C:\Python27\lib\requests\adapters.py", line 415, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))
What is going wrong here, and why does it return this string of errors?
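
Reading the traceback above: the exception is raised inside requests.get while it is following a redirect (resolve_redirects), and BadStatusLine("''") means the server closed the connection without sending a valid HTTP status line. The traceback alone does not say why the server did that. If the failure turns out to be transient, one option is to retry the request after a short pause, as in the sketch below. The names `get_with_retry`, `retries`, `delay` and `fetch` are hypothetical; `fetch` is a parameter only so the function can be exercised without a network connection.

```python
import time
import requests

def get_with_retry(url, retries=3, delay=1.0, fetch=requests.get):
    """Call fetch(url), retrying when the connection is dropped.

    Assumption: the dropped connection is transient. If the server is
    rejecting the request outright, retrying will not help.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except requests.exceptions.ConnectionError:
            if attempt == retries:
                raise              # out of attempts: surface the original error
            time.sleep(delay)      # pause briefly before the next attempt
```

In the crawler this would stand in for the direct `requests.get(url)` call. A pause between pages may also be worth trying, since the first page was fetched successfully.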
Thanks for this, but it still doesn't return anything :(
@pb_ng Hmm.. it works for me (a series of links gets printed). See the updated answer for how I tried it. – har07
Thanks, so removing "yield href + '\n'" made it work. If you don't mind me asking, why is that? Why didn't it return anything when yield was used?
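
On the yield question: any function whose body contains yield is a generator function. Calling it, as in a bare `tru_crawler(3)`, only creates a generator object. None of the body runs, including the print call, until something iterates over that object. The sketch below shows this with a hypothetical stand-in, `crawl_lazily`, which records its work in a list instead of making network requests.

```python
def crawl_lazily(log):
    # Hypothetical stand-in for tru_crawler: records its work, then yields.
    for page in (1, 2, 3):
        log.append('fetched page %d' % page)
        yield page

log = []
gen = crawl_lazily(log)            # creates a generator object; the body has not started
ran_before_iterating = list(log)   # snapshot of the log: still empty at this point

results = list(gen)                # iterating is what actually runs the body
print(ran_before_iterating, results, log)
```

Applied to the script in the question, that means either iterating over the result, for example `for href in tru_crawler(3): print(href)`, or dropping the yield and printing directly as was done in the revised code.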