I'm practicing my web-scraping skills on the website below and I'm losing data. So far I have Python/BeautifulSoup code that scrapes "http://web.californiacraftbeer.com/Brewery-Member" and writes to CSV; the code is below. I'm able to scrape the fields I want and write the information to CSV, but the information in each row doesn't match the actual company details. For example, Company A's row contains Company D's contact name and Company E's phone number.
Since some data doesn't exist for certain companies, how do I account for this when writing the rows, which should be separated by company, to CSV? What is the best way to make sure I get the correct information for the correct company when writing to CSV?
"""
Grabs brewery name, contact person, phone number, website address, and email address
for each brewery listed.
"""
import requests, csv
from bs4 import BeautifulSoup
url = "http://web.californiacraftbeer.com/Brewery-Member"
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
company_name = soup.find_all(itemprop="name")
contact_name = soup.find_all("div", {"class": "ListingResults_Level3_MAINCONTACT"})
phone_number = soup.find_all("div", {"class": "ListingResults_Level3_PHONE1"})
website = soup.find_all("span", {"class": "ListingResults_Level3_VISITSITE"})
def scraper():
    """Grabs information and writes to CSV"""
    print("Running...")
    results = []
    count = 0
    for company, name, number, site in zip(company_name, contact_name, phone_number, website):
        print("Grabbing {0} ({1})...".format(company.text, count))
        count += 1
        newrow = []
        try:
            newrow.append(company.text)
            newrow.append(name.text)
            newrow.append(number.text)
            newrow.append(site.find('a')['href'])
        except Exception as e:
            error_msg = "Error on {0}-{1}".format(number.text, e)
            newrow.append(error_msg)
        results.append(newrow)
    print("Done")
    outFile = open("brewery.csv", "w")
    out = csv.writer(outFile, delimiter=',', quoting=csv.QUOTE_ALL, lineterminator='\n')
    out.writerows(results)
    outFile.close()

def main():
    """Runs web scraper"""
    scraper()

if __name__ == '__main__':
    main()
Any help is greatly appreciated!
If some data doesn't exist for certain companies, store that data as an empty string ('') so it holds its place in the column when writing the CSV. –
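Building on that comment: the misalignment comes from building four independent flat lists and zipping them, because any listing that is missing a field shifts every later entry in that list. A more robust approach is to find each company's wrapper element first, then search for the fields *inside* that wrapper, falling back to '' when a field is absent. A minimal sketch of the idea, using the class names from the question on some inline sample markup (the wrapper class `ListingResults_All_CONTAINER` is an assumption here; inspect the live page to find the real per-listing container):

```python
from bs4 import BeautifulSoup

# Sample markup mirroring the page's structure. Field class names are taken
# from the question; the per-listing wrapper class is a placeholder.
html = """
<div class="ListingResults_All_CONTAINER">
  <span itemprop="name">Brewery A</span>
  <div class="ListingResults_Level3_MAINCONTACT">Alice</div>
  <div class="ListingResults_Level3_PHONE1">555-0100</div>
</div>
<div class="ListingResults_All_CONTAINER">
  <span itemprop="name">Brewery B</span>
  <!-- this company lists no contact or phone -->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

def text_or_blank(container, **kwargs):
    """Return the matching tag's text, or '' when the tag is missing."""
    tag = container.find(**kwargs)
    return tag.get_text(strip=True) if tag else ""

rows = []
for listing in soup.find_all("div", class_="ListingResults_All_CONTAINER"):
    # All lookups are scoped to this listing, so fields can never bleed
    # across companies; missing fields become empty strings.
    rows.append([
        text_or_blank(listing, itemprop="name"),
        text_or_blank(listing, class_="ListingResults_Level3_MAINCONTACT"),
        text_or_blank(listing, class_="ListingResults_Level3_PHONE1"),
    ])

print(rows)
# [['Brewery A', 'Alice', '555-0100'], ['Brewery B', '', '']]
```

Because each row is built from a single container, a company with no phone number simply gets an empty cell instead of stealing the next company's value, and the rows can be passed to `csv.writer.writerows()` unchanged.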