我是新來的beautifulsoup和python,我敢肯定這是一個簡單的問題,但我似乎無法解決它。python beautifulsoup循環遍歷表格行
我想循環通過一個html表的行,基於「標題」行按糖果類型分組表。我的表看起來像這樣:
我想循環獲取每個糖果標題下的日期。因此,迭代會得到這樣的數據:
第一循環迭代: candy_type:奇巧, 位置:商城1, 計劃:63, 實際:0, DIFF:25
第二迭代: candy_type:奇巧, 位置:購物中心2, 計劃:7, 實際:0, DIFF:6
......最後一次迭代: candy_type:彩虹糖, 位置:2號樓, 計劃:320, 實際:236, DIFF:0
這是表代碼:
<TABLE BORDER="1" WIDTH="100%">
<TR>
<TH COLSPAN=4>Candy</TH>
</TR>
<TR BGCOLOR=#CEE3F6>
<TD COLSPAN=4>
<FONT FACE=Arial>
<center><b>KitKat</b></center>
</FONT>
</TD>
</TR>
<TR BGCOLOR=#336699>
<TD><FONT COLOR=White FACE=Arial SIZE=-2>LOCATION</FONT></TD>
<TD><FONT COLOR=White FACE=Arial SIZE=-2>PLANNED</FONT></TD>
<TD><FONT COLOR=White FACE=Arial SIZE=-2>ACTUAL</FONT></TD>
<TD><FONT COLOR=White FACE=Arial SIZE=-2>DIFF</FONT></TD>
</TR>
<TR>
<TD>Mall 1</TD>
<TD>63</TD>
<TD>0</TD>
<TD>25</TD>
</TR>
<TR>
<TD>Mall 2</TD>
<TD>7</TD>
<TD>0</TD>
<TD>6</TD>
</TR>
<TR BGCOLOR=#CEE3F6>
<TD COLSPAN=4>
<FONT FACE=Arial>
<center><b>OH Henry</b></center>
</FONT>
</TD>
</TR>
<TR BGCOLOR=#336699>
<TD><FONT COLOR=White FACE=Arial SIZE=-2>LOCATION</FONT></TD>
<TD><FONT COLOR=White FACE=Arial SIZE=-2>PLANNED</FONT></TD>
<TD><FONT COLOR=White FACE=Arial SIZE=-2>ACTUAL</FONT></TD>
<TD><FONT COLOR=White FACE=Arial SIZE=-2>DIFF</FONT></TD>
</TR>
<TR>
<TD>Warehouse 1</TD>
<TD>195</TD>
<TD>122</TD>
<TD>30</TD>
</TR>
<TR>
<TD>Warehouse 2</TD>
<TD>96</TD>
<TD>76</TD>
<TD>6</TD>
</TR>
<TR BGCOLOR=#CEE3F6>
<TD COLSPAN=4>
<FONT FACE=Arial>
<center><b>Skittles</b></center>
</FONT>
</TD>
</TR>
<TR BGCOLOR=#336699>
<TD><FONT COLOR=White FACE=Arial SIZE=-2>LOCATION</FONT></TD>
<TD><FONT COLOR=White FACE=Arial SIZE=-2>PLANNED</FONT></TD>
<TD><FONT COLOR=White FACE=Arial SIZE=-2>ACTUAL</FONT></TD>
<TD><FONT COLOR=White FACE=Arial SIZE=-2>DIFF</FONT></TD>
</TR>
<TR>
<TD>Building 1</TD>
<TD>120</TD>
<TD>90</TD>
<TD>5</TD>
</TR>
<TR>
<TD>Building 2</TD>
<TD>320</TD>
<TD>236</TD>
<TD>0</TD>
</TR>
</TABLE>
所以我試圖
from bs4 import BeautifulSoup
import urllib
readUrl = urllib.urlopen('test.html').read()
soup = BeautifulSoup(readUrl)
candytype = soup.findAll('tr',{"bgcolor" : "#CEE3F6"})
for type in candytype:
print type
這會打印出了三種糖果類型是這樣的:
<tr bgcolor="#CEE3F6">
<td colspan="4">
<font face="Arial">
</font><center><b>KitKat</b></center>
</td>
</tr>
<tr bgcolor="#CEE3F6">
<td colspan="4">
<font face="Arial">
</font><center><b>OH Henry</b></center>
</td>
</tr>
<tr bgcolor="#CEE3F6">
<td colspan="4">
<font face="Arial">
</font><center><b>Skittles</b></center>
</td>
</tr>
我以爲我可以將糖果「標題」(即標題)分組。 tr元素的bgcolor
設置爲#CEE3F6
),然後在此基礎上迭代,但我無法弄清楚如何進一步查看數據。
任何想法?
你必須使用'beautifulsoup'嗎?我會推薦使用['parsel'](https://github.com/scrapy/parsel) – eLRuLL