BeautifulSoup findAll with name and text
I'm trying to match the TH tags in the following HTML (file.txt):
<TABLE WIDTH="71%" BORDER=0 CELLSPACING=0 CELLPADDING=0>
<TR VALIGN="BOTTOM">
<TH WIDTH="34%" ALIGN="LEFT"><FONT SIZE=1><B>Name<BR> </B></FONT><HR NOSHADE></TH>
<TH WIDTH="3%"><FONT SIZE=1> </FONT></TH>
<TH WIDTH="5%" ALIGN="CENTER"><FONT SIZE=1><B>Age</B></FONT><HR NOSHADE></TH>
<TH WIDTH="3%"><FONT SIZE=1> </FONT></TH>
<TH WIDTH="55%" ALIGN="CENTER"><FONT SIZE=1><B>Positions</B></FONT><HR NOSHADE></TH>
</TR>
<TR BGCOLOR="#CCEEFF" VALIGN="TOP">
<TD WIDTH="34%"><FONT SIZE=2>Stephen A. Wynn</FONT></TD>
<TD WIDTH="3%"><FONT SIZE=2> </FONT></TD>
<TD WIDTH="5%" ALIGN="CENTER"><FONT SIZE=2>60</FONT></TD>
<TD WIDTH="3%"><FONT SIZE=2> </FONT></TD>
<TD WIDTH="55%"><FONT SIZE=2>Chairman of the Board and Chief Executive Officer</FONT></TD>
</TR>
<TR BGCOLOR="White" VALIGN="TOP">
<TD WIDTH="34%"><FONT SIZE=2>Kazuo Okada</FONT></TD>
<TD WIDTH="3%"><FONT SIZE=2> </FONT></TD>
<TD WIDTH="5%" ALIGN="CENTER"><FONT SIZE=2>60</FONT></TD>
<TD WIDTH="3%"><FONT SIZE=2> </FONT></TD>
<TD WIDTH="55%"><FONT SIZE=2>Vice Chairman of the Board</FONT></TD>
</TR>
</TABLE>
I have tried the following, but it doesn't seem to work:
from bs4 import BeautifulSoup
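# Close the file automatically and name the parser explicitly (avoids bs4's "no parser specified" warning)
with open("file.txt") as infile:
    soup = BeautifulSoup(infile.read(), "html.parser")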
#this works
soup.findAll('th')
#this works but isn't particularly useful...
soup.findAll(text="Age")
#this is what I really want, but it returns an empty list
soup.findAll('th', text="Age")
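A minimal sketch of one workaround, assuming bs4 4.4 or later (where `string=` is the preferred spelling of `text=`) and Python's built-in `html.parser`. Searching for the text alone already works, so the idea is to find the string first and then climb to its enclosing `<th>` with `find_parent`. The `HTML` constant is a hypothetical, trimmed, inlined copy of the question's header row, used here instead of file.txt.

```python
from bs4 import BeautifulSoup

# Hypothetical inlined copy of part of the question's header row,
# so the example does not depend on file.txt.
HTML = """
<TABLE><TR VALIGN="BOTTOM">
<TH WIDTH="34%" ALIGN="LEFT"><FONT SIZE=1><B>Name<BR> </B></FONT><HR NOSHADE></TH>
<TH WIDTH="3%"><FONT SIZE=1> </FONT></TH>
<TH WIDTH="5%" ALIGN="CENTER"><FONT SIZE=1><B>Age</B></FONT><HR NOSHADE></TH>
</TR></TABLE>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Step 1: find the text node itself (the search that already works).
age_text = soup.find(string="Age")

# Step 2: walk up from that string to the nearest enclosing <th>.
age_th = age_text.find_parent("th")

print(age_th["width"])  # html.parser lowercases names but keeps values: 5%
```

To collect every matching header instead of the first, the same idea applies over `soup.find_all(string="Age")`, calling `find_parent("th")` on each result.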
Thanks for your help!
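That's what I'm trying to do. It seems like a reasonable approach, but I'm not clear on why the above doesn't work. For example, the following seems to work: soup.findAll('td', text=re.compile(r"Wynn")) but this does not: soup.findAll('th', text=re.compile(r"Age")) – 2012-07-21 00:40:22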
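A likely explanation, hedged: according to the Beautiful Soup documentation, when `text`/`string` is combined with a tag name, it matches tags whose `.string` matches. `.string` is only defined when a tag has a single child (it recurses into that child). Each `<TD>` here wraps exactly one `<FONT>` holding one string, but each `<TH>` has two children (`<FONT>` and `<HR>`), so its `.string` is `None` and there is nothing for the filter to match. The sketch below shows this with one header cell and one data cell copied from the question (the `HTML` constant and the `is_age_header` helper are hypothetical names), plus a function filter that checks the combined text instead.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one header cell and one data cell from the question.
HTML = """
<TABLE>
<TR><TH WIDTH="5%" ALIGN="CENTER"><FONT SIZE=1><B>Age</B></FONT><HR NOSHADE></TH></TR>
<TR><TD WIDTH="34%"><FONT SIZE=2>Stephen A. Wynn</FONT></TD></TR>
</TABLE>
"""

soup = BeautifulSoup(HTML, "html.parser")
th = soup.find("th")
td = soup.find("td")

# <td> has one child (<font>) holding one string, so .string recurses to it.
print(repr(td.string))  # 'Stephen A. Wynn'

# <th> has two children (<font> and <hr>), so .string is None.
print(repr(th.string))  # None

# Hypothetical helper: match on the tag name plus all of its text combined.
def is_age_header(tag):
    return tag.name == "th" and tag.get_text(strip=True) == "Age"

matches = soup.find_all(is_age_header)
print(len(matches))  # 1
```

A function filter trades the brevity of keyword arguments for full control over how the text is compared; `get_text(strip=True)` ignores the surrounding markup and whitespace.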