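Python web scraper and parent name issue
I am trying to retrieve the contents of the div class="ipo-cell-height" elements, that is, the dates such as 2/21/2014 and the company names such as Sundance Energy Australia.
Here is the link to the website: http://www.nasdaq.com/markets/ipos/
Here is the HTML. This block contains the second div class="genTable thin floatL" style="width:315px" on the page: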
<div class="genTable thin floatL" style="width:315px">
<h3 class="table-headtag">Upcoming IPOs</h3>
<table><tbody>
<tr>
<td><div class="ipo-cell-height">2/21/2014</div></td>
<td><div class="ipo-cell-height"><a id="two_column_main_content_rpt_expected_company_0" href="http://www.nasdaq.com/markets/ipos/company/sundance-energy-australia-ltd-672724-74237">SUNDANCE ENERGY AUSTRALIA LTD</a></div></td>
</tr>
<tr>
<td><div class="ipo-cell-height">2/14/2014</div></td>
<td><div class="ipo-cell-height"><a id="two_column_main_content_rpt_expected_company_1" href="http://www.nasdaq.com/markets/ipos/company/inogen-inc-639597-74090">INOGEN INC</a></div></td>
</tr>
<tr>
<td><div class="ipo-cell-height">2/14/2014</div></td>
<td><div class="ipo-cell-height"><a id="two_column_main_content_rpt_expected_company_2" href="http://www.nasdaq.com/markets/ipos/company/semler-scientific-inc-920476-73980">SEMLER SCIENTIFIC, INC.</a></div></td>
</tr>
<tr>
<td><div class="ipo-cell-height">10/9/2013</div></td>
<td><div class="ipo-cell-height"><a id="two_column_main_content_rpt_expected_company_3" href="http://www.nasdaq.com/markets/ipos/company/sfx-entertainment-inc-885264-73081">SFX ENTERTAINMENT, INC</a></div></td>
</tr>
</tbody></table>
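The code I am using is based on BeautifulSoup, and I think it needs something with parent.name or .contents. As written, it only prints the first 10 td contents. I thought I could get something that uses the div class as the parent.name, but the "tbody" line does not work.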
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.nasdaq.com/markets/ipos/")
# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")

# Only the first 10 <td> elements are examined.
for data in soup.find_all('td')[0:10]:
    if data.parent.name == "tr":
    # if data.parent.name == "tbody":  # This line makes it not print anything
        print(data.text)
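One thing visible in the quoted markup is that the direct parent of every td is a tr, so data.parent.name == "tbody" can never be true; tbody would be data.parent.parent, and only if the parsed source actually contains a tbody tag. A possible way around the parent checks, sketched here under assumptions, is to locate the "Upcoming IPOs" heading and read the div class="ipo-cell-height" cells row by row. SAMPLE_HTML is a trimmed copy of the markup above (with a closing div added so it is well-formed), and extract_ipos is a hypothetical helper name, not part of BeautifulSoup. Whether the live page still has this structure is not verified.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup quoted in the question (two rows, ids omitted).
SAMPLE_HTML = """
<div class="genTable thin floatL" style="width:315px">
<h3 class="table-headtag">Upcoming IPOs</h3>
<table><tbody>
<tr>
<td><div class="ipo-cell-height">2/21/2014</div></td>
<td><div class="ipo-cell-height"><a href="http://www.nasdaq.com/markets/ipos/company/sundance-energy-australia-ltd-672724-74237">SUNDANCE ENERGY AUSTRALIA LTD</a></div></td>
</tr>
<tr>
<td><div class="ipo-cell-height">2/14/2014</div></td>
<td><div class="ipo-cell-height"><a href="http://www.nasdaq.com/markets/ipos/company/inogen-inc-639597-74090">INOGEN INC</a></div></td>
</tr>
</tbody></table>
</div>
"""


def extract_ipos(html, heading="Upcoming IPOs"):
    """Hypothetical helper: return (date, company) tuples from the table
    that follows the <h3 class="table-headtag"> with the given text."""
    soup = BeautifulSoup(html, "html.parser")

    # Find the heading by its visible text rather than by position on the page.
    header = None
    for h3 in soup.find_all("h3", class_="table-headtag"):
        if h3.get_text(strip=True) == heading:
            header = h3
            break
    if header is None:
        return []

    # The table quoted in the question comes right after the heading.
    table = header.find_next("table")
    if table is None:
        return []

    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("div", class_="ipo-cell-height")
        if len(cells) == 2:  # expect one date cell and one company cell
            date = cells[0].get_text(strip=True)
            company = cells[1].get_text(strip=True)
            rows.append((date, company))
    return rows


if __name__ == "__main__":
    for date, company in extract_ipos(SAMPLE_HTML):
        print(date, company)
```

To try this against the live site, the same function could be given the bytes returned by urlopen("http://www.nasdaq.com/markets/ipos/").read() in place of SAMPLE_HTML, assuming the page is served as static HTML with the same class names.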