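I want to write the URL links from this page to a file, but each row of the table has two 'td a' tags.
I only want one of them, the one with class="pagelink" href="/search" and so on.
How can I separate the 'td a' tags by class using Beautiful Soup?
I tried the code below, hoping to pick up only the tags where "class":"pagelink", but it produces this error:
AttributeError: 'Doctype' object has no attribute 'find_all'
Can anyone help?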
import csv
import requests
from bs4 import BeautifulSoup as soup

# 'results.csv' is a placeholder name; the original post does not show how `writer` was created
outfile = open('results.csv', 'w', newline='')
writer = csv.writer(outfile)
writer.writerow(['URL', 'Reference', 'Description', 'Address'])
url = 'https://www.saa.gov.uk/search/?SEARCHED=1&ST=&SEARCH_TERM=city+of+edinburgh%2C+EDINBURGH&ASSESSOR_ID=&SEARCH_TABLE=valuation_roll_cpsplit&PAGE=0&DISPLAY_COUNT=1000&TYPE_FLAG=CP&ORDER_BY=PROPERTY_ADDRESS&H_ORDER_BY=SET+DESC&ORIGINAL_SEARCH_TERM=city+of+edinburgh&DRILL_SEARCH_TERM=BOSWALL+PARKWAY%2C+EDINBURGH&DD_TOWN=EDINBURGH&DD_STREET=BOSWALL+PARKWAY#results'
session = requests.Session()
response = session.get(url)
html = soup(response.text, 'lxml')
# the AttributeError is raised inside this loop
for link in html:
    prop_link = link.find_all("td a", {"class":"pagelink"})
    writer.writerow([prop_link])
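A likely cause of the error, sketched on a small made-up page rather than the real site: iterating directly over the soup object walks its top-level children, and the first of those is the Doctype node, which has no `find_all`. Searching the whole document instead avoids the loop entirely. The markup in `SAMPLE_HTML` is an assumption about how the results table is laid out, and `html.parser` is used here only so the sketch runs without lxml installed.

```python
from bs4 import BeautifulSoup
from bs4.element import Doctype

# Hypothetical markup that mimics two result rows, each with two links
SAMPLE_HTML = """<!DOCTYPE html>
<html><body><table>
<tr>
  <td><a class="pagelink" href="/search/?id=1">1 Boswall Parkway</a></td>
  <td><a class="pagelink button small" href="/search/?id=1">View</a></td>
</tr>
<tr>
  <td><a class="pagelink" href="/search/?id=2">2 Boswall Parkway</a></td>
  <td><a class="pagelink button small" href="/search/?id=2">View</a></td>
</tr>
</table></body></html>"""

page = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Iterating over the soup yields its top-level children; the first is the Doctype
first_child = next(iter(page))
is_doctype = isinstance(first_child, Doctype)

# "td a" is a CSS selector, so it belongs in select(), not find_all().
# select() searches the whole document in one call.
hrefs = [a["href"] for a in page.select("td a.pagelink")]
```

Note that `td a.pagelink` still matches both links in each row of this sample, because both carry the `pagelink` class, so every href appears twice.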
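I keep getting the same result with this code. It prints each href twice (since each row has two links). Could that be because the second link has class="pagelink button small" and is still picked up because it contains the word pagelink?
Thanks for the reply, Zroq.
Sorry, my mistake. I've updated the code. Note the change to html.find_all("a", class_="pagelink button small"); it now gives the correct output. – Zroq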
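To illustrate the class-matching behaviour discussed in these comments, here is a minimal sketch on a single made-up row. The markup in `ROW_HTML` and the helper `has_only_pagelink` are assumptions for illustration, not part of the original answer. In Beautiful Soup a single class name matches any tag that carries that class, while a multi-word string is compared against the exact value of the class attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical row with the two kinds of link described in the comments
ROW_HTML = """<table><tr>
  <td><a class="pagelink" href="/search/?id=1">1 Boswall Parkway</a></td>
  <td><a class="pagelink button small" href="/search/?id=1">View</a></td>
</tr></table>"""

row = BeautifulSoup(ROW_HTML, "html.parser")

# A single class name matches every tag that has it among its classes,
# so both links are returned and each href shows up twice
any_pagelink = row.find_all("a", class_="pagelink")

# A multi-word string must equal the full class attribute value,
# so only the second link is returned
button_only = row.find_all("a", class_="pagelink button small")

# To keep only the links whose class is exactly "pagelink",
# compare the parsed class list in a filter function
def has_only_pagelink(tag):
    return tag.name == "a" and tag.get("class") == ["pagelink"]

plain_only = row.find_all(has_only_pagelink)
```

Either `button_only` or `plain_only` gives one link per row; which one to use depends on which of the two links is wanted. The exact-string form is sensitive to the order of the class names in the markup.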