Extracting anchor tag values with BeautifulSoup

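I am trying to use BeautifulSoup to extract values from a site. The values are search results, in this case pharmacies in a particular area. The source of the page I am extracting from contains the following HTML: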

<a id="body_BusinessSearchResultSummaryList_repBusinessList_lnkBusinessProfile_1" class="sr-item-link" href="http://www.mocality.co.ke/b/applegene-pharmacy/applegene/brooklyn/health-and-beauty-medical/_/airtime-chemist-cosmetics-medicine/d42f7388-3f9b-4a34-8971-dc6ae9692586?skw=pharmacys&amp;rcnt=10">Applegene Pharmacy</a>

The anchor tag's id is incremented for each result, so the next one ends in 2:

<a id="body_BusinessSearchResultSummaryList_repBusinessList_lnkBusinessProfile_2" class="sr-item-link" href="http://www.mocality.co.ke/b/natros-pharmacy/natrosoh/innercore/medical-services/_/_/0cfe6a11-7bee-41f8-8d2e-6a472557201f?skw=pharmacys&amp;rcnt=10">Natros Pharmacy</a>

I used findAll('a'), but it gives me every anchor tag on the page. How can I use BeautifulSoup to parse this and extract the values of only these specific anchor tags?

Source

2011-07-23 jwesonga

# Written for BeautifulSoup 3 on Python 2 (note the import path and the print statement).
from BeautifulSoup import BeautifulSoup

txt = '''<a id="body_BusinessSearchResultSummaryList_repBusinessList_lnkBusinessProfile_1" class="sr-item-link" href="http://www.mocality.co.ke/b/natros-pharmacy/natrosoh/innercore/medical-services/_/_/0cfe6a11-7bee-41f8-8d2e-6a472557201f?skw=pharmacys&amp;rcnt=10">Natros Pharmacy</a>
<a id="body_BusinessSearchResultSummaryList_repBusinessList_lnkBusinessProfile_2" class="sr-item-link" href="http://www.mocality.co.ke/b/natros-pharmacy/natrosoh/innercore/medical-services/_/_/0cfe6a11-7bee-41f8-8d2e-6a472557201f?skw=pharmacys&amp;rcnt=10">Natros Pharmacy</a>'''

# Prefix shared by the id of every search-result link.
match = 'body_BusinessSearchResultSummaryList_repBusinessList_lnkBusinessProfile'

soup = BeautifulSoup(txt)
for a in soup.findAll('a'):
    # Skip anchors that have no id, then keep only the result links.
    if a.has_key('id') and a['id'].startswith(match):
        print a['href'], a.contents
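
For readers on Python 3, the same prefix filter might look roughly like the sketch below. It assumes the bs4 package (BeautifulSoup 4) is installed, which the original answer did not use. The surrounding page markup and the example.com URLs are hypothetical placeholders, added only to show that unrelated anchors elsewhere in the page are skipped.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (bs4) is installed

# Prefix shared by the id of every search-result link (taken from the question).
PREFIX = "body_BusinessSearchResultSummaryList_repBusinessList_lnkBusinessProfile"

# Hypothetical page: two result links surrounded by unrelated markup.
# The example.com URLs are placeholders, not the real listing URLs.
html = f"""
<html><body>
<div id="nav"><a href="/home">Home</a> <a id="login" href="/login">Log in</a></div>
<ul>
<li><a id="{PREFIX}_1" class="sr-item-link" href="http://example.com/applegene">Applegene Pharmacy</a></li>
<li><a id="{PREFIX}_2" class="sr-item-link" href="http://example.com/natros">Natros Pharmacy</a></li>
</ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only anchors whose id starts with the prefix; anchors without an id
# fall back to the empty string and are filtered out.
results = [
    (a["href"], a.get_text(strip=True))
    for a in soup.find_all("a")
    if a.get("id", "").startswith(PREFIX)
]

for href, name in results:
    print(href, name)
```

Because the filter looks only at each anchor's id, any amount of other HTML before or after the result links should not affect what is returned.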

Source

2011-07-23 20:25:53 lunixbochs

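I tried your code; no results are returned and no error is shown. Any idea why? – jwesonga

It works on exactly those links; try running my edited version. – lunixbochs

Tried it, and it works. But in my case I am parsing a whole web page, so the HTML returned contains far more than what I pasted. I only showed the part I want to parse and extract values from. Will the same code work when there is a lot of other HTML before the anchor tags? – jwesonga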

Use find's keyword arguments to restrict the match by attribute:

find("a", id="whatever_1")
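
As a rough illustration of attribute matching, the sketch below assumes BeautifulSoup 4 (the bs4 package) on Python 3, which is not what the thread itself used. It shows an exact id match with find and a compiled regular expression passed as the id value to find_all, so that every numbered result link is returned. The example.com URLs are placeholders.

```python
import re

from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (bs4) is installed

# Prefix shared by the id of every search-result link (taken from the question).
PREFIX = "body_BusinessSearchResultSummaryList_repBusinessList_lnkBusinessProfile"

# Hypothetical markup; the example.com URLs are placeholders.
html = f"""
<a href="/home">Home</a>
<a id="{PREFIX}_1" class="sr-item-link" href="http://example.com/applegene">Applegene Pharmacy</a>
<a id="{PREFIX}_2" class="sr-item-link" href="http://example.com/natros">Natros Pharmacy</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact match: returns the first <a> with this id, or None if there is none.
first = soup.find("a", id=PREFIX + "_1")
print(first["href"], first.get_text())

# Pattern match: a compiled regex as the attribute value selects every
# anchor whose id is the prefix followed by an underscore and digits.
id_pattern = re.compile(r"^" + re.escape(PREFIX) + r"_\d+$")
numbered = soup.find_all("a", id=id_pattern)
names = [a.get_text() for a in numbered]
print(names)
```

The exact match suits a single known result; the regex form avoids having to know in advance how many results the page contains.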

You can also call find with a (boolean) function:

def isRight(tag): 
    return ... 

findAll(isRight)
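
One possible way to write such a predicate is sketched below. The name is_result_link and its body are hypothetical, not what the answer's author had in mind for isRight, and the code assumes BeautifulSoup 4 (bs4) on Python 3. The example.com URLs are placeholders.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (bs4) is installed

# Prefix shared by the id of every search-result link (taken from the question).
PREFIX = "body_BusinessSearchResultSummaryList_repBusinessList_lnkBusinessProfile"


def is_result_link(tag):
    """Hypothetical predicate: True for <a> tags whose id starts with PREFIX."""
    return tag.name == "a" and tag.get("id", "").startswith(PREFIX)


# Hypothetical markup; the <div> shares the id prefix but is not an anchor.
html = f"""
<div id="{PREFIX}_container">
<a href="/home">Home</a>
<a id="{PREFIX}_1" href="http://example.com/applegene">Applegene Pharmacy</a>
<a id="{PREFIX}_2" href="http://example.com/natros">Natros Pharmacy</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all calls the function once per tag and keeps those returning True.
links = soup.find_all(is_result_link)
hrefs = [a["href"] for a in links]
print(hrefs)
```

A function filter is handy when the condition combines several checks, here the tag name and the id prefix, that a single keyword argument cannot express.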

Source

2011-07-23 20:22:10 katrielalex
