Accessing nested elements with BeautifulSoup

I have the following HTML:

<div id="contentDiv"> 
    <!-- START FILER DIV --> 
    <div style="margin: 15px 0 10px 0; padding: 3px; overflow: hidden; background-color: #BCD6F8;"> 
    <div class="mailer">Mailing Address 
     <span class="mailerAddress">500 ORACLE PARKWAY</span> 
     <span class="mailerAddress">MAIL STOP 5 OP 7</span> 
     <span class="mailerAddress">REDWOOD CITY CA 94065</span> 
    </div>

I am trying to get at "500 ORACLE PARKWAY" and "MAIL STOP 5 OP 7", but I cannot find a way to do it. My attempt looks like this:

for item in soup.findAll("span", {"class" : "mailerAddress"}):
    if item.parent.name == 'div':
        return_list.append(item.contents)

Edit: I forgot to mention that later elements in the HTML use similar tags, so this captures all of them when I only want the first 2.

Edit: Link: https://www.sec.gov/cgi-bin/browse-edgar?CIK=orcl

Source

2017-10-15 big11mac

What kind of error did you run into? I tried your code, and I can see that it retrieves the text in each span element. – Ali

Can you post a link to the HTML? – Ali

Why are you trying to parse the HTML when that page offers a perfectly good XML document: https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001341439&CIK=0001341439&type=&dateb=&owner=include&start=0&count=40&output=atom. Beautiful Soup should only be the last possible option. –

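I am going to try to answer this with the little information we have. If you only want the first two elements of a given class on the page, you can use slicing.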

soup.findAll("span", {"class" : "mailerAddress"})[0:2]
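
As a minimal sketch of the slicing approach, the snippet below parses a trimmed copy of the HTML from the question and keeps only the first two matches. The `html.parser` backend and the names `html` and `first_two` are assumptions made for illustration, not part of the original post.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question
html = """
<div id="contentDiv">
  <div class="mailer">Mailing Address
    <span class="mailerAddress">500 ORACLE PARKWAY</span>
    <span class="mailerAddress">MAIL STOP 5 OP 7</span>
    <span class="mailerAddress">REDWOOD CITY CA 94065</span>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet, so ordinary slicing works on it
matches = soup.find_all("span", {"class": "mailerAddress"})[0:2]

# get_text(strip=True) drops the surrounding whitespace of each span
first_two = [span.get_text(strip=True) for span in matches]
print(first_two)
```

Slicing depends on document order, so it only gives the intended lines if the spans you want are the first ones with that class on the page.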

Source

2017-10-15 20:53:35

Try this:

from bs4 import BeautifulSoup 
import requests 

res = requests.get("https://www.sec.gov/cgi-bin/browse-edgar?CIK=orcl").text 
soup = BeautifulSoup(res,'lxml') 
for item in soup.find_all(class_="mailerAddress")[:2]: 
    print(item.text)

Result:

500 ORACLE PARKWAY 
MAIL STOP 5 OP 7
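
Since the question mentions that later elements reuse the same tags, an alternative is to narrow the search to the first `mailer` block before collecting its spans. The sketch below is only an illustration: the second `div` is hypothetical markup standing in for those later elements, and `html.parser`, `first_block`, and `street_lines` are names chosen here rather than anything from the original answers.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two "mailer" blocks that reuse the same span class
html = """
<div class="mailer">Mailing Address
  <span class="mailerAddress">500 ORACLE PARKWAY</span>
  <span class="mailerAddress">MAIL STOP 5 OP 7</span>
  <span class="mailerAddress">REDWOOD CITY CA 94065</span>
</div>
<div class="mailer">Business Address
  <span class="mailerAddress">500 ORACLE PARKWAY</span>
  <span class="mailerAddress">REDWOOD CITY CA 94065</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching element, or None if nothing matches
first_block = soup.find("div", class_="mailer")

street_lines = []
if first_block is not None:
    # Search inside the first block only, then keep its first two spans
    spans = first_block.find_all("span", class_="mailerAddress")
    street_lines = [span.get_text(strip=True) for span in spans[:2]]

print(street_lines)
```

Scoping the search to a parent element first makes the result less sensitive to whatever comes later in the page than a bare slice over every match.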

Source

2017-10-16 07:44:38 SIM
