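Incorrect img alt value output (Python 3, Beautiful Soup 4)

I've been working on a food hygiene scraper for restaurants. I've been able to get the scraper to scrape the name, address and hygiene rating of restaurants based on postcode. Because the food hygiene rating is displayed online as an image, I've set up the scraper to read the "alt=" attribute, which contains the numeric value of the food hygiene score.
The div containing the img alt attribute that I target for the food hygiene rating looks like this: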
<div class="rating-image" style="clear: right;">
<a href="/business/abbey-community-college-newtownabbey-antrim-992915.html" title="View Details">
<img src="https://images.scoresonthedoors.org.uk//schemes/735/on_small.png" alt="5 (Very Good)">
</a>
</div>
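I have been able to get the food hygiene score output next to each restaurant.
My problem, though, is that I've noticed some restaurants show an incorrect reading next to them, e.g. a 3 instead of a 4 for the food hygiene rating (which is stored in the img alt attribute).
The page the scraper initially connects to is the search URL built in hygieneratings() in the full code below.
I think it may have something to do with the position of the rating loop inside the "for item in g_data" loop.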
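To illustrate what I suspect is happening, here is a minimal sketch that uses plain lists instead of the real page. The names and alt texts are made-up stand-ins, and `pair_like_my_code` and `pair_by_position` are hypothetical helpers, not part of my script. The inner loop runs to completion for every item, so `bleh` always ends up as the last alt value on the page.

```python
# Stand-in data: hypothetical names and alt texts, not real scraped values.
names = ["Restaurant A", "Restaurant B", "Restaurant C"]
alts = ["5 (Very Good)", "4 (Good)", "3 (Generally Satisfactory)"]


def pair_like_my_code(names, alts):
    """Mimic the nested loop: the inner loop overwrites bleh each pass."""
    rows = []
    for name in names:
        for alt in alts:
            bleh = alt  # overwritten until only the last alt is left
        rows.append([name, bleh])
    return rows


def pair_by_position(names, alts):
    """Pair by position; assumes one rating per result, in the same order."""
    return [[name, alt] for name, alt in zip(names, alts)]


nested_rows = pair_like_my_code(names, alts)
zipped_rows = pair_by_position(names, alts)
print(nested_rows)
print(zipped_rows)
```

If this is right, a row would only be correct when its real rating happens to match the last rating on that results page, which would explain why only some restaurants look wrong.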
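I have found that if I move the

    appendhygiene(scrape=[name,address,bleh])

piece of code under the loop

    for rating in ratings:
        bleh = rating['alt']

the data is scraped with the correct hygiene ratings. The only problem is that not all of the records are scraped; in this case it only outputs the first 9 restaurants.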
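One thing that might be worth trying is to look up the rating inside each result rather than across the whole page. The sketch below assumes that every `div.search-result` contains its own `div.rating-image`, which I have not verified against the live site. The sample markup is hypothetical, modelled on the div shown earlier, and `scrape_results` is an illustrative name rather than a function in my script.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: nesting rating-image inside search-result is an assumption.
SAMPLE_HTML = """
<div class="search-result">
  <a class="name" href="#">Restaurant A</a>
  <span class="address">1 Example Street</span>
  <div class="rating-image"><a href="#"><img src="a.png" alt="5 (Very Good)"></a></div>
</div>
<div class="search-result">
  <a class="name" href="#">Restaurant B</a>
  <span class="address">2 Example Street</span>
  <div class="rating-image"><a href="#"><img src="b.png" alt="4 (Good)"></a></div>
</div>
"""


def scrape_results(soup):
    """Return [name, address, rating] rows, one per search result."""
    rows = []
    for item in soup.find_all("div", {"class": "search-result"}):
        name = item.find("a", {"class": "name"})
        address = item.find("span", {"class": "address"})
        # Search only within this result, so the rating belongs to it.
        img = item.select_one("div.rating-image img[alt]")
        rows.append([
            name.text.strip() if name else "",
            address.text.strip() if address else "",
            img["alt"] if img else "",
        ])
    return rows


rows = scrape_results(BeautifulSoup(SAMPLE_HTML, "html.parser"))
print(rows)
```

Missing fields fall back to an empty string here instead of silently reusing the value from the previous restaurant, which is what the bare `except: pass` blocks in my current code would do.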
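I'd appreciate it if anyone could take a look at my code and help me solve this.
PS: I used the postcode BT367NG to scrape the restaurants. If you test the script, you can use it to see the restaurants that don't show the correct hygiene value, e.g. Lins Garden is a 4 on the website, while the scraped data shows a 3.
My full code is below: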
import requests
import time
import csv
import sys
from bs4 import BeautifulSoup
hygiene = []
def deletelist():
    hygiene.clear()


def savefile():
    filename = input("Please input name of file to be saved")
    with open (filename + '.csv','w') as file:
        writer=csv.writer(file)
        writer.writerow(['Address','Town', 'Price', 'Period'])
        for row in hygiene:
            writer.writerow(row)
    print("File Saved Successfully")


def appendhygiene(scrape):
    hygiene.append(scrape)


def makesoup(url):
    page=requests.get(url)
    print(url + " scraped successfully")
    return BeautifulSoup(page.text,"lxml")


def hygienescrape(g_data, ratings):
    for item in g_data:
        try:
            name = (item.find_all("a", {"class": "name"})[0].text)
        except:
            pass
        try:
            address = (item.find_all("span", {"class": "address"})[0].text)
        except:
            pass
        try:
            for rating in ratings:
                bleh = rating['alt']
        except:
            pass

        appendhygiene(scrape=[name,address,bleh])


def hygieneratings():
    search = input("Please enter postcode")
    soup=makesoup(url = "https://www.scoresonthedoors.org.uk/search.php?name=&address=&postcode=" + search + "&distance=1&search.x=16&search.y=21&gbt_id=0")
    hygienescrape(g_data = soup.findAll("div", {"class": "search-result"}), ratings = soup.select('div.rating-image img[alt]'))

    button_next = soup.find("a", {"rel": "next"}, href=True)
    while button_next:
        time.sleep(2)  # delay time requests are sent so we don't get kicked by server
        soup=makesoup(url = "https://www.scoresonthedoors.org.uk/search.php{0}".format(button_next["href"]))
        hygienescrape(g_data = soup.findAll("div", {"class": "search-result"}), ratings = soup.select('div.rating-image img[alt]'))

        button_next = soup.find("a", {"rel" : "next"}, href=True)


def menu():
    strs = ('Enter 1 to search Food Hygiene ratings \n'
            'Enter 2 to Exit\n')
    choice = input(strs)
    return int(choice)


while True:  # use while True
    choice = menu()
    if choice == 1:
        hygieneratings()
        savefile()
        deletelist()
    elif choice == 2:
        break
    elif choice == 3:
        break
This works perfectly, thanks for the explanation as well. :)