Scraping a table with BeautifulSoup

I have a table structure that looks like this:

<tr><td> 
<td> 
<td bgcolor="#E6E6E6" valign="top" align="left">testtestestes</td> 
</tr> 
<tr nowrap="nowrap" valign="top" align="left"> 
<td nowrap="nowrap">8-K</td> 
<td class="small">Current report, items 1.01, 3.02, and 9.01 
<br>Accession Number: 0001283140-16-000129 &nbsp;Act: 34 &nbsp;Size:&nbsp;520 KB 
</td> 
<td nowrap="nowrap">2016-09-19<br>17:30:01</td> 
<td nowrap="nowrap">2016-09-19</td><td align="left" nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=001-03473&amp;owner=include&amp;count=100">001-03473</a> 
<br/>161891888</td></tr>

This represents one row of data. Here is my script using BeautifulSoup. I can get the <tr> and <td> elements just fine, but they end up in separate lists.

for tr in soup.find_all('tr'):
    tds = tr.find_all('td')
    print(tds)

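My question is how to get the data from two separate <tr> elements so that it looks like one row of data. I am trying to get the text between the <td> tags.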

2016-09-20 essramos

What are you trying to get? –

So you want to pair up every two trs? –

Yes, that's correct @PadraicCunningham – essramos

If you want to pair them up, create an iterator over soup.find_all('tr') and zip it with itself to form pairs:

it = iter(soup.find_all('tr')) 
for tr1, tr2 in zip(it, it): 
     tds = tr1.find_all('td') + tr2.find_all("td") 
     print(tds)

The slicing equivalent takes two slices that start at different positions and use a step of 2:

it = soup.find_all('tr') 
for tr1, tr2 in zip(it[::2], it[1::2]): 
     tds = tr1.find_all('td') + tr2.find_all("td") 
     print(tds)

Using iter means you don't need to make shallow copies of the list.
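To see why zip(it, it) produces pairs, here is a minimal sketch that uses a plain list of strings as a hypothetical stand-in for the result of soup.find_all('tr'). The names rows, pairs_from_iter and pairs_from_slices are made up for illustration.

```python
# Hypothetical stand-in for soup.find_all('tr'): any list behaves the same way.
rows = ["tr0", "tr1", "tr2", "tr3"]

# Both arguments to zip() are the *same* iterator, so building each tuple
# consumes two consecutive items from it.
it = iter(rows)
pairs_from_iter = list(zip(it, it))

# Slicing builds two new lists (even and odd positions) and zips those.
pairs_from_slices = list(zip(rows[::2], rows[1::2]))

print(pairs_from_iter)
print(pairs_from_slices)
```

Both approaches yield the same pairs. The iterator version just avoids creating the two intermediate lists.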

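I'm not sure how an odd number of trs would fit into the logic, since the last one would have nothing to pair with, but if that can happen you can use izip_longest (zip_longest on Python 3):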

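from itertools import zip_longest  # Python 3; on Python 2 use izip_longest

it = iter(soup.find_all('tr'))
for tr1, tr2 in zip_longest(it, it):
    # Parenthesize the conditional so tr1's cells are kept when tr2 is None
    tds = tr1.find_all('td') + (tr2.find_all('td') if tr2 is not None else [])
    print(tds)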
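Since the question asks for the text inside the cells, the pairing can be combined with get_text. The following is only a sketch under stated assumptions: the markup is a shortened, hypothetical version of the rows in the question, html.parser is an arbitrary parser choice, and cell_text and merged_rows are illustrative names, not part of BeautifulSoup.

```python
from itertools import zip_longest

from bs4 import BeautifulSoup

# Shortened, hypothetical markup modelled on the rows in the question.
html = """
<table>
<tr><td>testtestestes</td></tr>
<tr><td>8-K</td><td>2016-09-19</td></tr>
<tr><td>unpaired</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def cell_text(tr):
    """Return the stripped text of every <td> in a row, or [] if the row is missing."""
    if tr is None:
        return []
    return [td.get_text(" ", strip=True) for td in tr.find_all("td")]


# Pair consecutive rows; zip_longest pads the last pair with None
# when the number of rows is odd.
it = iter(soup.find_all("tr"))
merged_rows = [cell_text(tr1) + cell_text(tr2) for tr1, tr2 in zip_longest(it, it)]

for row in merged_rows:
    print(row)
```

Each entry in merged_rows holds the cell text of two consecutive rows as one flat list, and the unpaired third row is kept on its own rather than dropped.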

2016-09-20 00:58:49

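Does this assume the count is always even? – essramos

@wasp, if it isn't, your question doesn't make sense: how could you pair up a single row? Either way, you would use izip_longest/zip_longest; I'll edit. –

Whoops, sorry, you are right. I tried it and it works! Thanks. I used the first solution – essramos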
