Python + BeautifulSoup - extraction by search criteria

The text file I am extracting from contains HTML like the following (the words "Registration" and "Flying" are fixed labels in the snippets below):

<TR> 
<TD class=CAT2 width="10%">Registration</TD> 
<TD class=CAT1 width="20%">02 Mar 2006</TD></TR> 

<TR> 
<TD class=CAT2 width="10%">Flying</TD> 
<TD class=CAT1 width="20%">24 Jun 2005</TD></TR>

I want to extract them as:

Registration 02 Mar 2006

Flying 24 Jun 2005

I am using BeautifulSoup's find_next_sibling, but it returns nothing. What is wrong?

from bs4 import BeautifulSoup 

url = r"C:\example.html" 
page = open(url) 
soup = BeautifulSoup(page.read()) 

aa = soup.find_next_sibling(text='Registration') 

print aa

Source

2014-02-25 Mark K

Try this:

soup.find(text="Registration").findNext('td').contents[0]
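
To see the one-liner above in context, here is a minimal sketch that runs it against the markup from the question. The inline html string, the html.parser choice, and the value_after helper are assumptions made for illustration, and string= is used as the newer spelling of the text= argument.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; a real script would read it from the file.
html = """
<TR>
<TD class=CAT2 width="10%">Registration</TD>
<TD class=CAT1 width="20%">02 Mar 2006</TD></TR>

<TR>
<TD class=CAT2 width="10%">Flying</TD>
<TD class=CAT1 width="20%">24 Jun 2005</TD></TR>
"""

soup = BeautifulSoup(html, "html.parser")


def value_after(label):
    """Hypothetical helper: text of the first <td> that follows the label."""
    label_node = soup.find(string=label)  # exact match on the text node
    if label_node is None:
        return None  # the label does not occur in the document
    return label_node.find_next("td").get_text(strip=True)


registration = value_after("Registration")
flying = value_after("Flying")
print("Registration", registration)
print("Flying", flying)
```

Because the match on the label is exact, a label stored with extra characters (for example a trailing colon) would need to be searched for in that exact form.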

Source

2014-02-25 07:28:14 loki

This will work if you change 'Registration:' to 'Registration'. –

Thanks, loki and barak manos. –

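This line of code:

aa = soup.find_next_sibling(text='Registration')

does not return a node from the HTML as you expect. Instead, it returns None, because soup is the root of the parse tree and has no siblings to search. What you want to do is find the node with text='Registration', get its parent, and then take the parent's next sibling element.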

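aa = soup.find(text='Registration') 
par = aa.parent 
# .next_sibling would return the whitespace between the two cells, so ask for the next <td>
print(par.find_next_sibling('td').string)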
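
The following sketch may help show why the original call returns nothing and why the sibling lookup has to skip whitespace. It assumes the first row from the question as an inline string and the html.parser backend. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# First row from the question, inlined for the demonstration.
html = """<TR>
<TD class=CAT2 width="10%">Registration</TD>
<TD class=CAT1 width="20%">02 Mar 2006</TD></TR>"""

soup = BeautifulSoup(html, "html.parser")

# The soup object is the root of the tree, so it has no siblings to search.
root_result = soup.find_next_sibling(string="Registration")

# The <td> element that holds the label text.
label_td = soup.find(string="Registration").parent

# The node right after the label cell is the line break between the two cells.
raw_sibling = label_td.next_sibling

# Asking for the next <td> sibling skips that whitespace node.
value_td = label_td.find_next_sibling("td")

print(repr(root_result))
print(repr(raw_sibling))
print(value_td.string)
```

This is also the likely reason a plain .next_sibling lookup produces a blank result on this markup: the sibling it finds is a whitespace text node, not the second cell.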

You can also get your output like this:

soup = BeautifulSoup(page.read()) 

row_1 = soup.find('tr') 
td = row_1.find('td') 
# find_next_sibling skips the whitespace text nodes that sit between tags
string_1 = td.string + ' ' + td.find_next_sibling('td').string  # Registration 02 Mar 2006 

row_2 = row_1.find_next_sibling('tr') 
td = row_2.find('td') 
string_2 = td.string + ' ' + td.find_next_sibling('td').string  # Flying 24 Jun 2005
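
If the lookup needs to be driven by search criteria rather than by row position, one possible variation is sketched below. The labels list, the inline html string, and the html.parser choice are assumptions for illustration. It relies on each label being the only text inside its cell.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<TR>
<TD class=CAT2 width="10%">Registration</TD>
<TD class=CAT1 width="20%">02 Mar 2006</TD></TR>

<TR>
<TD class=CAT2 width="10%">Flying</TD>
<TD class=CAT1 width="20%">24 Jun 2005</TD></TR>
"""

soup = BeautifulSoup(html, "html.parser")

labels = ["Registration", "Flying"]  # the fixed search criteria

lines = []
for label in labels:
    # Find the <td> whose only text is exactly the label.
    label_td = soup.find("td", string=label)
    if label_td is None:
        continue  # label not present in this file
    # The value is in the next <td> of the same row.
    value_td = label_td.find_next_sibling("td")
    if value_td is not None:
        lines.append(label + " " + value_td.get_text(strip=True))

print("\n".join(lines))
```

Searching by label keeps the code independent of row order, so rows can be added or reordered in the file without changing the script.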

Source

2014-02-25 07:28:59 shaktimaan

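I replaced find_next_sibling with find. Did you make that change in your code? – shaktimaan

Thanks, warunsl. It gave no error message, but the result was blank. –

Thanks for your effort, warunsl. It is informative, but I need to use search criteria here, such as 'Registration'. –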
