2016-11-30
1

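Python BeautifulSoup HTML table scraping

I am trying to learn how to scrape a page with BeautifulSoup and write the result to a CSV file. When I start appending fields to the keys in my dictionary, every value gets appended to every key instead of to just one key.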

I get the information I want:

[<td class="column-2">655</td>] 
[<td class="column-2">660</td>] 
[<td class="column-2">54</td>] 
[<td class="column-2">241</td>] 

Later, when I try to assign each value to its key, I get:

{'date': ['14th November 2016'], 'total complaints': ['655', '660', '54', '241'], 'complaints': ['655', '660', '54', '241'], 'departures': ['655', '660', '54', '241'], 'arrivals': ['655', '660', '54', '241']} 

The full code (the CSV writer is only there for testing at the moment):

import requests 
from bs4 import BeautifulSoup as BS 
import csv 

operational_data_url = "http://heathrowoperationaldata.com/daily-operational-data/" 
operational_data_page = requests.get(operational_data_url).text 
print(operational_data_page) 

soup = BS(operational_data_page, "html.parser") 

data_div = soup.find_all("ul", class_="sub-menu") 

list_items = data_div[0].find_all("li") 

data_links = [] 
for menu in data_div: 
    list_items = menu.find_all("li") 
    for links in list_items: 
        data_link = links.find("a") 
        data_links.append(data_link.get("href")) 

for page in data_links[:1]: 
    data_page = requests.get(page).text 

soup = BS(data_page, "html.parser") 
date = soup.find("title") 
table = soup.find("tbody") 

data = { 
    "date" : [], 
    "arrivals" : [], 
    "departures" : [], 
    "complaints" : [], 
    "total complaints" : [],  
} 

for day in date: 
    data["date"].append(day) 

rows = table.find_all("tr", class_=["row-3", "row-4", "row-36", "row-37"]) 
for row in rows: 
    cols = row.find_all("td", class_="column-2") 
    data["arrivals"].append(cols[0].get_text()) 
    data["departures"].append(cols[0].get_text()) 
    data["complaints"].append(cols[0].get_text()) 
    data["total complaints"].append(cols[0].get_text()) 

#test 
with open('test.csv', 'w') as test_file: 

    fields = ['date', 'arrivals', 'departures', 'complaints', 'total complaints'] 

    writer = csv.DictWriter(test_file, fields) 
    writer.writeheader() 

    row = {'date': day, 'arrivals': 655, 'departures': 660, 'complaints': 54, 'total complaints': 241 } 
    writer.writerow(row) 

Thanks for your help!

+0

In the 'for row in rows:' loop, you are explicitly appending the value to the list associated with every key. – elethan

+0

Thanks, I have already tried that; it appends the last number to –

+0

Try replacing your for loop with the code in my updated answer. – elethan

Answers

1

When I start appending fields to the keys in my dictionary, every value gets appended to every key instead of to just one key.

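Right now, your 'for row in rows:' loop does exactly that: on every pass it appends cols[0].get_text() to all four lists, so after four rows each key holds all four values.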
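To make the cause concrete, here is a minimal sketch without any scraping. It uses the four values from the question's output; the names values, keys, buggy and fixed are made up for illustration.

```python
# Values in the order the rows were scraped (taken from the question's output).
values = ["655", "660", "54", "241"]
keys = ["arrivals", "departures", "complaints", "total complaints"]

# Equivalent to the original loop: each value is appended under every key.
buggy = {key: [] for key in keys}
for value in values:
    for key in keys:
        buggy[key].append(value)

# Pairing each key with exactly one value instead.
fixed = {key: [] for key in keys}
for key, value in zip(keys, values):
    fixed[key].append(value)

print(buggy["arrivals"])  # ['655', '660', '54', '241']
print(fixed["arrivals"])  # ['655']
```

The zip pairing relies on the values arriving in the same order as the keys.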

I think you want to do something like this instead:

rows = table.find_all("tr", class_=["row-3", "row-4", "row-36", "row-37"]) 
cols = [row.find_all("td", class_="column-2")[0] for row in rows] 
data["arrivals"].append(cols[0].get_text()) 
data["departures"].append(cols[1].get_text()) 
data["complaints"].append(cols[2].get_text()) 
data["total complaints"].append(cols[3].get_text()) 

This gives you the following result for data:

{'date': [u'14th November 2016'], 'complaints': [u'54'], 'total complaints': [u'241'], 'departures': [u'660'], 'arrivals': [u'655']} 
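Since the stated goal is a CSV file, one way to write a dictionary of that shape is sketched below with the standard csv module. It assumes every list in data has the same length (one entry per scraped page); the data literal simply repeats the result shown above, and test.csv is the file name used in the question.

```python
import csv

# The scraped result, repeated here so the sketch runs on its own.
data = {
    "date": ["14th November 2016"],
    "arrivals": ["655"],
    "departures": ["660"],
    "complaints": ["54"],
    "total complaints": ["241"],
}

fields = ["date", "arrivals", "departures", "complaints", "total complaints"]

# Zipping the per-key lists together yields one tuple per scraped page,
# which is then turned back into a {field: value} dict for DictWriter.
rows = [dict(zip(fields, values)) for values in zip(*(data[f] for f in fields))]

# newline="" follows the csv module's recommendation for files given to a writer.
with open("test.csv", "w", newline="") as test_file:
    writer = csv.DictWriter(test_file, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
```

If more pages are scraped later, each list grows by one entry and the same code writes one row per page.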

Note that indexing cols[0] through cols[3] like this only works if your rows come back in the expected order.
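If you would rather not depend on row order, a possible alternative is to look each row up by its class. The sketch below is only an illustration: the mapping of row-3, row-4, row-36 and row-37 to the four keys is an assumption inferred from the output in the question, sample_html is a made-up stand-in for the real page (its column-1 labels are invented), and ROW_CLASS_TO_KEY and extract_values are hypothetical names.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real table; rows are deliberately out of order.
sample_html = """
<table><tbody>
<tr class="row-4"><td class="column-1">Departures</td><td class="column-2">660</td></tr>
<tr class="row-37"><td class="column-1">Total complaints</td><td class="column-2">241</td></tr>
<tr class="row-3"><td class="column-1">Arrivals</td><td class="column-2">655</td></tr>
<tr class="row-36"><td class="column-1">Complaints</td><td class="column-2">54</td></tr>
</tbody></table>
"""

# Assumed mapping from row class to dictionary key (not confirmed by the site).
ROW_CLASS_TO_KEY = {
    "row-3": "arrivals",
    "row-4": "departures",
    "row-36": "complaints",
    "row-37": "total complaints",
}


def extract_values(table):
    """Return {key: text} by finding each row through its class, not its position."""
    result = {}
    for row_class, key in ROW_CLASS_TO_KEY.items():
        row = table.find("tr", class_=row_class)
        cell = row.find("td", class_="column-2") if row else None
        # None marks a row or cell that was not found on the page.
        result[key] = cell.get_text(strip=True) if cell else None
    return result


soup = BeautifulSoup(sample_html, "html.parser")
values = extract_values(soup.find("tbody"))
print(values)
# {'arrivals': '655', 'departures': '660', 'complaints': '54', 'total complaints': '241'}
```

This trades the ordering assumption for an assumption about class names, so it still breaks if the site renumbers its rows.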

+1

Thanks for the explanation, @elethan! They are in order, and it works perfectly! –