from bs4 import BeautifulSoup 
import urllib2 
import re 
import json 
p = """ 
<thead> 
<tr> 
<th>Company Name</th> 
<th>Symbol</th> 
<th>Market</th> 
<th>Price</th> 
<th>Shares</th> 
<th>Offer Amount</th> 
<th>Date Priced</th> 
</tr> 
</thead> 
<tr> 
<td><a href="http://www.nasdaq.com" id="two">EXFO INC.</a></td> 
<td><a href="http://www.nasdaq.com" id="two">EXFO</a></td> 
<td><a href="http://www.nasdaq.com" id="two">NASDAQ</a></td> 
<td>$26</td> 
<td>7,000,000</td> 
<td>$182,000,000</td> 
<td>6/30/2000</td> 
</tr> 
<tr> 
<td><a href="http://www.nasdaq.com">IGO, INC.</a></td> 
<td><a href="http://www.nasdaq.com" id="two">MOBE</a></td> 
<td><a href="http://www.nasdaq.com" id="two">NASDAQ</a></td> 
<td>$12</td> 
<td>4,000,000</td> 
<td>$48,000,000</td> 
<td>6/30/2000</td> 
</tr>""" 
soup = BeautifulSoup(p, 'html.parser') 
for ana in soup.find_all('td'): 
    if ana.parent.name == 'tr': 
        print ana.string 

Hi! I'm trying to write some data from a site to a csv file. The desired result is to write the contents of the child tags to the csv file with BeautifulSoup, like this:

EXFO INC.,EXFO,NASDAQ,$26,7,000,000,$182,000,000,6/30/2000 
IGO, INC.,MOBE,NASDAQ, $12, 4,000,000,$48,000,000,6/30/2000 

What I'm getting printed right now, though, is the following:

EXFO INC. 
EXFO 
NASDAQ 
$26 
7,000,000 
$182,000,000 
6/30/2000 
IGO, INC. 
MOBE 
NASDAQ 
$12 
4,000,000 
$48,000,000 
6/30/2000 

Any idea how to do this? I just don't know how to put it all into a loop and, for each tr tag, extract all of the td tags.


What I mean is: extract all of the td tags for each tr tag.

Answer


Select the table, find the thead th tags and write them out as the header row, then extract every other row and write out the td text:

from bs4 import BeautifulSoup 
from csv import writer 

# "html" holds the full page markup (it must contain a <table> element). 
soup = BeautifulSoup(html, "html.parser") 
table = soup.select_one("table") 
with open("out.csv", "w") as f: 
    wr = writer(f) 
    # header row from the thead th tags 
    wr.writerow([th.text for th in table.select("thead th")]) 
    # data rows: pull the td text, skipping any tr with no td (the header row) 
    for row in table.find_all("tr"): 
        data = [td.text for td in row.find_all("td")] 
        if data: 
            wr.writerow(data) 
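
The snippets here follow the Python 2 style of the question; if you are on Python 3 instead, the only csv-specific change is opening the file with newline="" so the csv module controls the line endings itself (otherwise blank lines can appear between rows on Windows). A minimal sketch:

from csv import writer 

# Python 3 file handling: newline="" lets csv.writer manage line endings. 
with open("out.csv", "w", newline="") as f: 
    wr = writer(f) 
    wr.writerow(["Company Name", "Symbol", "Market", "Price"]) 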

That will give you:

Company Name,Symbol,Market,Price,Shares,Offer Amount,Date Priced 
EXFO INC.,EXFO,NASDAQ,$26,"7,000,000","$182,000,000",6/30/2000 
"IGO, INC.",MOBE,NASDAQ,$12,"4,000,000","$48,000,000",6/30/2000 

An alternative is to find all of the tr tags and index/slice:

from bs4 import BeautifulSoup 
from csv import writer 
# "html" is again the full page markup containing the table. 
soup = BeautifulSoup(html, "html.parser") 

rows = soup.select("table tr") 
with open("out.csv", "w") as f: 
    wr = writer(f) 
    # the first tr holds the th header cells 
    wr.writerow([th.text for th in rows[0].find_all("th")]) 
    # every following tr holds the td data cells 
    for row in rows[1:]: 
        data = [td.text for td in row.find_all("td")] 
        wr.writerow(data) 
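
Both versions assume html already holds the page markup; the question imports urllib2 but never uses it, so to run either approach against the live page you would fetch the source first. A minimal Python 2 sketch - the URL below is only a placeholder for whichever page actually contains the table:

import urllib2 

# Placeholder URL - substitute the page that holds the IPO pricing table. 
url = "http://www.nasdaq.com" 
html = urllib2.urlopen(url).read() 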

Whichever approach you take, you want to iterate over all the tr tags so that, inside each tr, you can extract the data from all of its td tags.


Thank you for the extensive answer. It helped me a lot!