與BeautifulSoup不同的標識標籤迭代這可能是一個簡單的問題,但我想通過標籤ID爲迭代= dgrdAcquired_hyplnkacquired_0,dgrdAcquired_hyplnkacquired_1等我如何通過使用Python
是否有任何容易這樣做的方式比我在下面的代碼?麻煩的是這些標籤的數量對於我所提到的每個網頁都會有所不同。當每個網頁可能有不同數量的標籤時,我不確定如何獲取這些標籤中的文字。
html = """
<tr>
<td colspan="3"><table class="datagrid" cellspacing="0" cellpadding="3" rules="rows" id="dgrdAcquired" width="100%">
<tr class="datagridH">
<th scope="col"><font face="Arial" color="Blue" size="2"><b>Name (RSSD ID)</b></font></th><th scope="col"><font face="Arial" color="Blue" size="2"><b>Acquisition Date</b></font></th><th scope="col"><font face="Arial" color="Blue" size="2"><b>Description</b></font></th>
</tr><tr class="datagridI">
<td nowrap="nowrap"><font face="Arial" size="2">
<a id="dgrdAcquired_hyplnkacquired_0" href="InstitutionProfile.aspx?parID_RSSD=3557617&parDT_END=20110429">FIRST CHOICE COMMUNITY BANK (3557617)</a>
</font></td><td><font face="Arial" size="2">
<span id="dgrdAcquired_lbldtAcquired_0">2011-04-30</span>
</font></td><td><font face="Arial" size="2">
<span id="dgrdAcquired_lblAcquiredDescText_0">The acquired institution failed and disposition was arranged of by a regulatory agency. Assets were distributed to the acquiring institution.</span>
</font></td>
</tr><tr class="datagridAI">
<td nowrap="nowrap"><font face="Arial" size="2">
<a id="dgrdAcquired_hyplnkacquired_1" href="InstitutionProfile.aspx?parID_RSSD=104038&parDT_END=20110429">PARK AVENUE BANK, THE (104038)</a>
</font></td>
"""
soup = BeautifulSoup(html)
firm1 = soup.find('a', { "id" : "dgrdAcquired_hyplnkacquired_0"})
data1 = ''.join(firm1.findAll(text=True))
print data1
firm2 = soup.find('a', { "id" : "dgrdAcquired_hyplnkacquired_1"})
data2 = ''.join(firm2.findAll(text=True))
print data2
謝謝阿龍。這工作完美。 – myname 2012-03-27 18:37:11
+1爲一種新方法! – bernie 2012-03-27 18:37:38