2016-11-12 34 views
0

如何將數組輸出保存爲csv文件? 我試過用csv模塊,但沒有給我正確的輸出。我想要像下面的圖片一樣的輸出。將數組數據放在csv文件中

output1.html

<div class="side-article txt-article"> 
    <p><strong></strong> <a href="http://batam.tribunnews.com/tag/polres/" title="Polres"></a> <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan"></a></p> 
    <p><br></p> 
    <p><a href="http://batam.tribunnews.com/tag/polres/" title="Polres"></a></p> 
    <p><a href="http://batam.tribunnews.com/tag" title="Polres"></a> <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan"></a></p> 
    <br> 

我有代碼:

import csv 
from bs4 import BeautifulSoup 
from HTMLParser import HTMLParser 

with open('output1.html', 'r') as f: 
    html = f.read() 
soup = BeautifulSoup(html.strip(), 'html.parser') 

for line in html.strip().split('\n'): 
    link_words = 0 

    line_soup = BeautifulSoup(line.strip(), 'html.parser') 
    for link in line_soup.findAll('a'): 
     link_words += len(link.text.split()) 

    # naive way to get words count 
    words_count = len(line_soup.text.split())- link_words 

    number_tag_p = len(line_soup.find_all('p')) 
    number_tag_br = len(line_soup.find_all('br')) 
    number_tag_break = number_tag_br + number_tag_p 

    #for line in html.strip().split('\n'): 
    number_of_starttags = 0 
    number_of_endtags = 0 


     # create a subclass and override the handler methods 
    class MyHTMLParser(HTMLParser): 
     def handle_starttag(self, tag, attrs): 
      global number_of_starttags 
      number_of_starttags += 1 

     def handle_endtag(self, tag): 
      global number_of_endtags 
      number_of_endtags += 1 

       # instantiate the parser and fed it some HTML 


    parser = MyHTMLParser() 
    parser.feed(line.lstrip()) 
    number_tag = number_of_starttags + number_of_endtags 
    #print(number_of_starttags + number_of_endtags) 
    CTTD = words_count + link_words + number_tag_break 


    if (words_count + link_words) == 0: 
     CTTD == 0 
    else: 
     CTTD 

    print ('TC : {0} LTC : {1} TG : {2} P : {3} CTTD : {4}' 
      .format(words_count, link_words, number_tag, number_tag_break, CTTD)) 



res = ('TC : {0} LTC : {1} TG : {2} P : {3} CTTD : {4}' 
      .format(words_count, link_words, number_tag, number_tag_break, CTTD)) 
csvfile = "./output1.csv" 

#Assuming res is a flat list 
with open(csvfile, "wb") as output: 
    writer = csv.writer(output, lineterminator='\n') 
    for val in res: 
     writer.writerow([val]) 

#Assuming res is a list of lists 
with open(csvfile, "wb") as output: 
    writer = csv.writer(output, lineterminator='\n') 
    writer.writerows(res) 

算法的輸出

TC : 0 LTC : 0 TG : 0 P : 0 CTTD : 0 
TC : 0 LTC : 0 TG : 0 P : 0 CTTD : 0 
TC : 0 LTC : 0 TG : 1 P : 0 CTTD : 0 
TC : 0 LTC : 0 TG : 1 P : 0 CTTD : 0 
TC : 15 LTC : 0 TG : 2 P : 0 CTTD : 15 

輸出CSV:

enter image description here

如何將打印保存爲csv? 任何python庫可以做到這一點?

我預計產量將

enter image description here

謝謝。

+0

在做了'csv'模塊失敗有什麼辦法名單?它是工作的工具。輸出中是否有空白或其他裝飾繁榮的問題?我所知道的任何工具都不會輸出您呈現的確切表格,因爲該表格由GUI呈現,根本不在文件中。如果您只需簡單地保存逗號分隔的csv,然後將其導入到電子表格中,就可以得到您展示的內容。 – tdelaney

+0

[將數組寫入列中的csv]可能的副本(http://stackoverflow.com/questions/33268347/writing-arrays-to-a-csv-in-columns) – wuno

+0

@tdelaney我更新了我的輸出。謝謝 –

回答

0

也許這就是你想要什麼:

import csv 
from bs4 import BeautifulSoup 
from html.parser import HTMLParser 

class MyHTMLParser(HTMLParser): 
    def handle_starttag(self, tag, attrs): 
     global number_of_starttags 
     number_of_starttags += 1 

    def handle_endtag(self, tag): 
     global number_of_endtags 
     number_of_endtags += 1 

with open('output1.html', 'r') as f: 
    html = f.read() 
soup = BeautifulSoup(html.strip(), 'html.parser') 

ress = [] 
for line in html.strip().split('\n'): 
    link_words = 0 

    line_soup = BeautifulSoup(line.strip(), 'html.parser') 
    for link in line_soup.findAll('a'): 
     link_words += len(link.text.split()) 

    words_count = len(line_soup.text.split())- link_words 
    number_tag_p = len(line_soup.find_all('p')) 
    number_tag_br = len(line_soup.find_all('br')) 
    number_tag_break = number_tag_br + number_tag_p 

    number_of_starttags = 0 
    number_of_endtags = 0 

    parser = MyHTMLParser() 
    parser.feed(line.lstrip()) 
    number_tag = number_of_starttags + number_of_endtags 
    CTTD = words_count + link_words + number_tag_break 


    if (words_count + link_words) == 0: 
     CTTD == 0 
    res = [words_count, link_words, number_tag, number_tag_break, CTTD] 
    ress.append(res) 

csvfile = "./output.csv" 
firstline = ["TC", "LTC", "TG", "P", "CTTD"] 
with open(csvfile, "w") as output: 
    writer = csv.writer(output, lineterminator='\n') 
    writer.writerow(firstline) 
    for val in ress: 
     writer.writerow(val) 

反正,我的輸出這是從你的不同....我得到這個CSV:

TC,LTC,TG,P,CTTD 
0,0,1,0,0 
0,0,8,1,1 
0,0,3,2,2 
0,0,4,1,1 
0,0,6,1,1 
0,0,1,1,1 

因爲你只得到了最後一行(格式在範圍之外)

+0

非常感謝你! –

0

writerow獲取形成特定行中單元格值的元素列表。

所以,當寫到csv它始終是最好來構建你的標題爲列表,並將所有值列表

header = ["TC", "LTC", "TG", "P", "CTTD"] 
val = [[1,2,3,4],[2,3,4,5]] 
with open(csvfile, "w") as output: 
    writer = csv.writer(output, lineterminator='\n') 
    writer.writerow(header) 
    for v in val: 
     writer.writerow(v)