2014-01-19 164 views
1

Export a scraped table to CSV

I used BeautifulSoup in Python to scrape these tables into a single table. The code is as follows:

import urllib2
from bs4 import BeautifulSoup
for i in range(0, 39):
    first = urllib2.urlopen("http://www.admision.unmsm.edu.pe/res20130914/A/011/" + str(i) + ".html").read()
    soup = BeautifulSoup(first)
    for tr in soup.find_all('tr')[2:]:
        tds = tr.find_all('td')
        print tds[0].text, tds[1].text, tds[2].text, tds[3].text

The result looks like this:

494560 ABAD SAAVEDRA, GERSON HORACIO 011 1116.8750 
455314 ABAD VALVERDE, MARIA ISABEL 011 1482.7500 
491005 ABREGU HUAMAN, MERCEDES LILIANA 011 503.4000 
457929 ACOSTA ABAD, ALEJANDRO FRANCISCO 011 413.0500 

So, how can I export this table to CSV?

Answers

2

Use the csv module:

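import csv
import urllib2
from bs4 import BeautifulSoup

# Python 2: the csv module expects the file to be opened in binary mode.
with open('listing.csv', 'wb') as f:
    writer = csv.writer(f)
    for i in range(39):
        url = "http://www.admision.unmsm.edu.pe/res20130914/A/011/{}.html".format(i)
        u = urllib2.urlopen(url)
        try:
            html = u.read()
        finally:
            u.close()  # always release the connection
        soup = BeautifulSoup(html)
        # Skip the first two rows, as in the question's code.
        for tr in soup.find_all('tr')[2:]:
            tds = tr.find_all('td')
            # Python 2's csv writer needs byte strings, so encode each cell.
            row = [elem.text.encode('utf-8') for elem in tds[:4]]
            writer.writerow(row)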
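
The code above targets Python 2. For readers on Python 3, the CSV-writing step could look roughly like the sketch below. It leaves out the download and BeautifulSoup parsing and assumes the rows have already been extracted as lists of strings. The names `rows_to_csv` and `sample_rows` are made up for illustration, and the sample data is copied from the output shown in the question.

```python
import csv
import io


def rows_to_csv(rows, out):
    """Write an iterable of rows (each a list of str) to a text file object."""
    writer = csv.writer(out)
    for row in rows:
        # Strip stray whitespace that scraped table cells often carry.
        writer.writerow([cell.strip() for cell in row])


# Sample rows taken from the output shown in the question.
sample_rows = [
    ["494560", "ABAD SAAVEDRA, GERSON HORACIO", "011", "1116.8750"],
    ["455314", "ABAD VALVERDE, MARIA ISABEL", "011", "1482.7500"],
]

# An in-memory buffer stands in for a real file in this sketch.
buffer = io.StringIO()
rows_to_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the name column contains a comma, `csv.writer` wraps that field in double quotes so the file still parses back into four columns. To write to disk on Python 3, the file would be opened in text mode with `newline=''`, for example `open('listing.csv', 'w', newline='', encoding='utf-8')`, and the `.encode('utf-8')` call on each cell would no longer be needed.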
+1

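A minor nitpick: it might be a little cleaner to avoid repeating yourself by encoding all the cells to UTF-8 in one expression rather than one at a time. Maybe `[elem.text.encode('utf-8') for elem in tds[:4]]`? – abarnert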

+0

@abarnert, thank you for the suggestion. I updated the code according to your comment. – falsetru

+0

Is there a good tutorial or topic for understanding the code that exports to CSV? – CreamStat