Extracting the content of multiple span tags with BeautifulSoup

I am trying to extract the string content from multiple span tags. Here is a snippet of the HTML page:

<div class="secondary-attributes"> 
    <span class="neighborhood-str-list"> 
     Southeast 
    </span> 
    <address> 
     1234 Python Blvd S<br>Somewhere, NV 98765 
    </address> 
    <span class="biz-phone"> 
     (555) 123-4567 
    </span> 
</div>

Specifically, I want to extract the phone number that sits between the <span class="biz-phone"></span> tags. I tried to do so with the following code:

import requests 
from bs4 import BeautifulSoup 

res = requests.get(url) 
soup = BeautifulSoup(res.text, "html.parser") 

phone_number_results = [phone_numbers for phone_numbers in soup.find_all('span','biz-phone')]

The code runs without any syntax errors, but it does not quite give me the result I was hoping for:

['<span class="biz-phone">\n  (702) 476-5050\n </span>',
 '<span class="biz-phone">\n  (702) 253-7296\n </span>',
 '<span class="biz-phone">\n  (702) 385-7912\n </span>',
 '<span class="biz-phone">\n  (702) 776-7061\n </span>',
 '<span class="biz-phone">\n  (702) 221-7296\n </span>',
 '<span class="biz-phone">\n  (702) 252-7296\n </span>',
 '<span class="biz-phone">\n  (702) 659-9101\n </span>',
 '<span class="biz-phone">\n  (702) 355-9445\n </span>',
 '<span class="biz-phone">\n  (702) 396-3333\n </span>',
 '<span class="biz-phone">\n  (702) 643-9851\n </span>',
 '<span class="biz-phone">\n  (702) 222-1441\n </span>']

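My question has two parts:

Why do the span tags show up when I run the program?
How do I get rid of them? I could edit the strings by hand, but then I feel I would not be making full use of the BeautifulSoup package. Is there a more elegant way?

Note: the full page contains many more HTML fragments like the one shown above, so there are many more instances of <span class="biz-phone"> (555) 123-4567 </span> (that is, more phone numbers) to extract, which is why I am using find_all().

Thanks in advance.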

Source

2016-10-30 daOnlyBG

Use 'phone_numbers.text' or even 'phone_numbers.text.strip()' – furas

Thanks @furas, that did the trick! – daOnlyBG

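find_all() returns a list of tags (bs4.element.Tag objects), not a list of strings, which is why the span markup appears in your output.
As @furas pointed out, access the text attribute of each tag to extract the text inside it: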

phone_number_results = [phone_numbers.text.strip() for phone_numbers in soup.find_all('span', 'biz-phone')]

(You will probably also want to call strip() to remove the surrounding whitespace and newlines.)
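To make the difference concrete, here is a minimal, self-contained sketch. It assumes an inline HTML string (with made-up phone numbers) in place of the real page, so no network request is needed. It also uses get_text(strip=True), which should be equivalent to .text.strip() for a span that holds a single piece of text:

```python
from bs4 import BeautifulSoup

# Assumed sample markup mirroring the structure in the question;
# the phone numbers are placeholders, not data from the real page.
html = """
<div class="secondary-attributes">
    <span class="biz-phone">
     (555) 123-4567
    </span>
</div>
<div class="secondary-attributes">
    <span class="biz-phone">
     (555) 765-4321
    </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() yields Tag objects, which print as markup.
tags = soup.find_all("span", "biz-phone")
print(type(tags[0]).__name__)   # Tag

# Pull out only the text of each tag, with whitespace stripped.
phone_numbers = [tag.get_text(strip=True) for tag in tags]
print(phone_numbers)            # ['(555) 123-4567', '(555) 765-4321']
```

With the real page, the same list comprehension can be applied to the soup built from res.text.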

Source

2016-10-30 20:53:58 dmcc

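Thanks, '.text' did the trick! I did not know about that attribute. I had tried a few others (such as '.contents'), but they did not seem to help. Your solution works, though. – daOnlyBG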
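As a rough illustration of why '.contents' did not help here: it returns a list of the tag's child nodes rather than a single string, whereas '.text' returns the concatenated text. The sketch below assumes a one-line sample span rather than the real page:

```python
from bs4 import BeautifulSoup

# Assumed one-line sample; on the real page the span also contains newlines.
markup = '<span class="biz-phone"> (555) 123-4567 </span>'
tag = BeautifulSoup(markup, "html.parser").span

children = tag.contents   # a list of child nodes (here, one text node)
text = tag.text           # a single string with all the text inside the tag

print(type(children).__name__, len(children))   # list 1
print(repr(text.strip()))                       # '(555) 123-4567'
```

So '.contents' can still reach the text, but only by indexing into the list first, which is why '.text' is the more direct choice.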
