Extracting the DIV ID names found inside a specific containing DIV with Python

I have been using lxml to extract data from pages via XPath, and so far so good. But I have a new challenge.

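I need to extract the IDs of all the divs inside a containing DIV and put those ID names into a list. I guess I could use Beautiful Soup for this (or possibly lxml); I just don't know how to go about it: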

For example, from this markup I would need to extract "beacon" and "lentil":

<div id="live-events"> 

     <div class ="events" id="beacon"> 
      ....other things... 
     </div> 

     <div class="events" id ="lentil"> 
      ....other things... 
     </div> 

    </div>

Any suggestions?

Thanks!


2013-10-22 mishap_n

This is pretty simple:

>>> from bs4 import BeautifulSoup 
>>> soup = BeautifulSoup(""" 
...  <div id="live-events"> 
... 
...  <div class ="events" id="beacon"> 
...   ....other things... 
...  </div> 
... 
...  <div class="events" id ="lentil"> 
...   ....other things... 
...  </div> 
... 
...  </div> 
... """, "html.parser")
>>> live_events = soup.find(id="live-events") 
>>> ids = [div["id"] for div in live_events.find_all("div")] 
>>> ids 
['beacon', 'lentil']
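
One caveat with the snippet above: `find_all("div")` also descends into nested divs, and `div["id"]` raises a `KeyError` for any div that has no `id`. If the "other things" may contain such divs, a slightly more defensive version could look like the sketch below. The function name `child_div_ids` and the nested `inner` div are made up for illustration and are not part of the original markup.

```python
from bs4 import BeautifulSoup

# Sample markup; the nested <div class="inner"> is an assumption added
# to show that divs without an id are skipped.
HTML = """
<div id="live-events">
  <div class="events" id="beacon">
    <div class="inner">....other things...</div>
  </div>
  <div class="events" id="lentil">
    ....other things...
  </div>
</div>
"""


def child_div_ids(html, container_id):
    """Return the ids of the <div> elements directly inside the container."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:
        # No element with that id: nothing to extract.
        return []
    # recursive=False limits the search to direct children;
    # id=True keeps only tags that actually carry an id attribute.
    return [div["id"] for div in container.find_all("div", id=True, recursive=False)]


ids = child_div_ids(HTML, "live-events")
print(ids)
```

Drop `recursive=False` if you also want the ids of divs nested deeper inside the container.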


2013-10-22 19:01:00

Thanks! Works like a charm, and I learned something new.

Sorry, one last thing. How do I use this with a request instead of a variable holding raw HTML? I am currently following this guide: http://docs.python-guide.org/en/latest/scenarios/scrape/

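Surely that is obvious? The page you linked shows how to fetch the document content with `requests`, and the code above shows how to turn that content into a BS object. I can't see what trouble you could be running into...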
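
For anyone who wants the two pieces spelled out together, here is a minimal sketch of the combination described in the comment above: fetch the page with `requests`, then hand the response text to BeautifulSoup. The helper names `ids_from_html` and `ids_from_url`, the 10-second timeout, and the placeholder URL are assumptions for illustration, not something taken from the original thread.

```python
import requests
from bs4 import BeautifulSoup


def ids_from_html(html, container_id="live-events"):
    """Return the ids of all divs (that have one) inside the given container."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:
        return []
    # id=True skips divs that have no id attribute.
    return [div["id"] for div in container.find_all("div", id=True)]


def ids_from_url(url, container_id="live-events"):
    """Download the page at `url` and extract the div ids from it."""
    response = requests.get(url, timeout=10)
    # Fail loudly on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return ids_from_html(response.text, container_id)


# Hypothetical usage; replace the placeholder URL with the real page:
# ids = ids_from_url("https://example.com/events")
```

Keeping the parsing in its own function means it can be checked against a literal HTML string without touching the network.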
