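Python BeautifulSoup HTML table scraping
I'm trying to learn how to scrape a page with BeautifulSoup and write the results to a CSV file. When I start appending fields to the keys in my dictionary, all of the values get appended to every key instead of one value going to each key.
I get the information I want: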
[<td class="column-2">655</td>]
[<td class="column-2">660</td>]
[<td class="column-2">54</td>]
[<td class="column-2">241</td>]
Later, when I try to assign each value to its key, I get:
{'date': ['14th November 2016'], 'total complaints': ['655', '660', '54', '241'], 'complaints': ['655', '660', '54', '241'], 'departures': ['655', '660', '54', '241'], 'arrivals': ['655', '660', '54', '241']}
The full code (the CSV writer is only there for testing right now):
import requests
from bs4 import BeautifulSoup as BS
import csv
operational_data_url = "http://heathrowoperationaldata.com/daily-operational-data/"
operational_data_page = requests.get(operational_data_url).text
print(operational_data_page)
soup = BS(operational_data_page, "html.parser")
data_div = soup.find_all("ul", class_="sub-menu")
list_items = data_div[0].find_all("li")
data_links = []
for menu in data_div:
    list_items = menu.find_all("li")
    for links in list_items:
        data_link = links.find("a")
        data_links.append(data_link.get("href"))
for page in data_links[:1]:
    data_page = requests.get(page).text
    soup = BS(data_page, "html.parser")
    date = soup.find("title")
    table = soup.find("tbody")
    data = {
        "date" : [],
        "arrivals" : [],
        "departures" : [],
        "complaints" : [],
        "total complaints" : [],
    }
    for day in date:
        data["date"].append(day)
    rows = table.find_all("tr", class_=["row-3", "row-4", "row-36", "row-37"])
    for row in rows:
        cols = row.find_all("td", class_="column-2")
        data["arrivals"].append(cols[0].get_text())
        data["departures"].append(cols[0].get_text())
        data["complaints"].append(cols[0].get_text())
        data["total complaints"].append(cols[0].get_text())
#test
with open('test.csv', 'w') as test_file:
    fields = ['date', 'arrivals', 'departures', 'complaints', 'total complaints']
    writer = csv.DictWriter(test_file, fields)
    writer.writeheader()
    row = {'date': day, 'arrivals': 655, 'departures': 660, 'complaints': 54, 'total complaints': 241 }
    writer.writerow(row)
Thanks for your help!
Inside the 'for row in rows:' loop, you are explicitly appending each row's value to the list associated with every key. – elethan
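A minimal sketch of what that comment describes, using plain strings in place of the scraped cells so it runs without network access. The values are the ones shown in the question's output; the assumption (taken from the question's hard-coded test row) is that the four selected rows come back in the order arrivals, departures, complaints, total complaints. The names `row_values`, `buggy` and `fixed` are illustrative only, and the `zip()` pairing is one possible fix, not necessarily the one in elethan's answer.

```python
# Text of the four selected cells, in the order the rows are returned
# (values copied from the output shown in the question).
row_values = ["655", "660", "54", "241"]

# Assumed mapping: first row -> arrivals, second -> departures, and so on.
keys = ["arrivals", "departures", "complaints", "total complaints"]

# What the original loop does: for each row, append its value to every key.
buggy = {key: [] for key in keys}
for value in row_values:
    for key in keys:
        buggy[key].append(value)

# One possible fix: zip() pairs each key with exactly one row's value.
fixed = {key: [] for key in keys}
for key, value in zip(keys, row_values):
    fixed[key].append(value)

print(buggy["arrivals"])  # every key ends up with all four values
print(fixed)              # each key holds only its own value
```

In the question's code the same pairing would look something like `for key, row in zip(keys, rows): data[key].append(row.find("td", class_="column-2").get_text())`, again assuming the rows arrive in that order.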
Thanks, I've already tried that; it appends the last number to –
Try replacing your for loop with the code from my updated answer. – elethan
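Once each key holds only its own value, the test CSV writer from the question could take the scraped values instead of hard-coded ones. The following is a rough sketch under a few assumptions: the `data` dict is filled in by hand here with the values from the question, `rows_from_columns` is a hypothetical helper name, and `io.StringIO` stands in for the real `test.csv` file.

```python
import csv
import io

# Hypothetical result of the scrape, shaped like the question's `data`
# dict after each key has received only its own value.
data = {
    "date": ["14th November 2016"],
    "arrivals": ["655"],
    "departures": ["660"],
    "complaints": ["54"],
    "total complaints": ["241"],
}

fields = ["date", "arrivals", "departures", "complaints", "total complaints"]


def rows_from_columns(columns, fieldnames):
    """Turn a dict of equal-length lists into a list of row dicts."""
    column_lists = [columns[name] for name in fieldnames]
    return [dict(zip(fieldnames, values)) for values in zip(*column_lists)]


# io.StringIO stands in for a file on disk. With a real file, the csv
# module documentation recommends open("test.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields)
writer.writeheader()
writer.writerows(rows_from_columns(data, fields))

csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the lists in `data` means that scraping more than one page would simply produce more rows, as long as every list grows by one entry per page.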