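Storing values captured by Beautiful Soup in a dictionary, then accessing those values

I'm learning Beautiful Soup and dictionaries in Python. I'm following a short Beautiful Soup tutorial from Stanford University, found here: http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html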
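Since access to the website is forbidden, I stored the text provided in the tutorial in a string and then converted that string into a soup object. The printout is as follows: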
print(soup_string)
<html><body><div class="ec_statements"><div id="legalert_title"><a
href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-Senators-
Urging-Them-to-Support-Cloture-and-Final-Passage-of-the-Paycheck-
Fairness-Act-S.2199">'Letter to Senators Urging Them to Support Cloture
and Final Passage of the Paycheck Fairness Act (S.2199)
</a>
</div>
<div id="legalert_date">
September 10, 2014
</div>
</div>
<div class="ec_statements">
<div id="legalert_title">
<a href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-
Representatives-Urging-Them-to-Vote-on-the-Highway-Trust-Fund-Bill">
Letter to Representatives Urging Them to Vote on the Highway Trust Fund Bill
</a>
</div>
<div id="legalert_date">
July 30, 2014
</div>
</div>
<div class="ec_statements">
<div id="legalert_title">
<a href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-Representatives-Urging-Them-to-Vote-No-on-the-Legislation-Providing-Supplemental-Appropriations-for-the-Fiscal-Year-Ending-Sept.-30-2014">
Letter to Representatives Urging Them to Vote No on the Legislation Providing Supplemental Appropriations for the Fiscal Year Ending Sept. 30, 2014
</a>
</div>
<div id="legalert_date">
July 30, 2014
</div>
</div>
<div class="ec_statements">
<div id="legalert_title">
<a href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-Senators-Urging-Them-to-Vote-Yes-
on-the-Motion-to-Proceed-to-the-Emergency-Supplemental-Appropriations-Act-of-2014-S.2648"></a></div></div></body></html>
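A minimal Python 3 sketch of the setup described above (parsing a stored string instead of fetching the page), assuming the beautifulsoup4 package is installed and using the built-in "html.parser" backend. The HTML is a shortened, hypothetical stand-in for the tutorial text, and the href values are placeholders. Despite its name, soup_string in the question is a BeautifulSoup object, not a string, which is why find_all can be called on it.

```python
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

# Shortened, hypothetical stand-in for the tutorial text stored in a string.
html_text = """<html><body>
<div class="ec_statements"><div id="legalert_title">
<a href="/placeholder-1">
Letter to Representatives Urging Them to Vote on the Highway Trust Fund Bill
</a></div><div id="legalert_date">
July 30, 2014
</div></div>
<div class="ec_statements"><div id="legalert_title">
<a href="/placeholder-2"></a></div></div>
</body></html>"""

# Parse the string with Python's built-in parser backend.
soup = BeautifulSoup(html_text, "html.parser")

# Only the outer divs carry class="ec_statements"; the inner ones use id=.
letters = soup.find_all("div", class_="ec_statements")
print(len(letters))
```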
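At some point the tutorial captures all elements in the soup object with the tag "div" and class_="ec_statements":
letters = soup_string.find_all("div", class_="ec_statements")
Then the tutorial says: "We will go through all the items in our letters collection, and for each one, pull out the name and make it a key in our dictionary. The value will be another dictionary, but we haven't yet found the contents for the other items, so we'll just create an empty dictionary object."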
The code is as follows:
lobbying = {}
for element in letters:
    lobbying[element.a.get_text()] = {}
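To see what this loop actually stores, here is a sketch (Python 3, beautifulsoup4 assumed, hypothetical two-block HTML with placeholder hrefs) that prints each key with repr(), which makes surrounding whitespace and empty strings visible.

```python
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

# Hypothetical excerpt: one titled link, and one empty link like the last block.
html_text = """
<div class="ec_statements"><div id="legalert_title">
<a href="/placeholder-1">
Letter to Representatives Urging Them to Vote on the Highway Trust Fund Bill
</a></div></div>
<div class="ec_statements"><div id="legalert_title">
<a href="/placeholder-2"></a></div></div>
"""

soup = BeautifulSoup(html_text, "html.parser")
letters = soup.find_all("div", class_="ec_statements")

lobbying = {}
for element in letters:
    # get_text() returns the text exactly as written, including newlines.
    lobbying[element.a.get_text()] = {}

# repr() makes leading/trailing newlines and the empty key visible.
for key in lobbying:
    print(repr(key))
```

With this input, the first key is the title wrapped in '\n' characters (get_text() keeps the line breaks that sit inside the <a> tag), and the second key is '', produced by the empty <a></a>.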
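However, when I print the lobbying dictionary, I find that the key and value of the last element, "Letter-to-Senators-Urging-Them-to-Vote-Yes-on-the-Motion-to-Proceed-to-the-Emergency-Supplemental-Appropriations-Act-of-2014-S.2648", are missing. Instead, there is an empty dictionary with no key assigned.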
for key, value in lobbying.iteritems():
    print key, value
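dict.iteritems() and the print statement exist only in Python 2. A Python 3 equivalent of the printing loop, run on a hypothetical dictionary shaped like the one the question builds, shows why the output looks the way it does.

```python
# Hypothetical dictionary shaped like the one built in the question:
# keys keep the newlines from the HTML, and the empty <a> gave an "" key.
lobbying = {
    "": {},
    "\nLetter to Representatives Urging Them to Vote on the Highway Trust Fund Bill\n": {},
}

# Python 3 uses dict.items() (there is no iteritems()) and print().
for key, value in lobbying.items():
    print(key, value)
```

print("", {}) writes a space followed by {}, so the empty key shows up as a line that looks like a bare {}. The newlines inside the other keys are why each title and its {} end up on separate lines.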
{}
Letter to Representatives Urging Them to Vote No on the Legislation Providing Supplemental Appropriations for the Fiscal Year Ending Sept. 30, 2014
{}
Letter to Representatives Urging Them to Vote on the Highway Trust Fund Bill
{}
'Letter to Senators Urging Them to Support Cloture and Final Passage of the Paycheck Fairness Act (S.2199)
{}
How can this be explained? Any advice would be appreciated.
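The last "div" has no text (its <a> tag is empty), so the loop creates an element whose key is an empty string, and you read that as "an empty dictionary with no key assigned". – furas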
By the way: at least use print ">", key, "<" and you will see that your key is an empty string, or that it contains only spaces, tabs, and newlines. – furas
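One possible fix, following the comments: strip the whitespace around each title and skip blocks whose <a> is missing or empty, so no empty-string key is created. This is a Python 3 sketch assuming beautifulsoup4 and a hypothetical two-block excerpt with placeholder hrefs; it also prints each key between ">" and "<" as suggested above. Skipping is a design choice: if the untitled letter should be kept, storing link.get("href") as a fallback key would be an alternative.

```python
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

# Hypothetical excerpt: one titled link and one empty link, as in the last block.
html_text = """
<div class="ec_statements"><div id="legalert_title">
<a href="/placeholder-1">
Letter to Representatives Urging Them to Vote on the Highway Trust Fund Bill
</a></div></div>
<div class="ec_statements"><div id="legalert_title">
<a href="/placeholder-2"></a></div></div>
"""

soup = BeautifulSoup(html_text, "html.parser")
letters = soup.find_all("div", class_="ec_statements")

lobbying = {}
for element in letters:
    link = element.a
    if link is None:
        continue  # this block has no <a> tag at all
    # Trim the newlines/spaces that surround the title text in the HTML.
    title = link.get_text().strip()
    if not title:
        continue  # empty <a></a>: skip instead of storing an "" key
    lobbying[title] = {}

# Delimiters make any leftover whitespace in a key easy to spot.
for key, value in lobbying.items():
    print(">", key, "<", value)
```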