2017-02-25

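How do I write multiple columns under a CSV header using Python?

I am currently using Python 2.7 to search the source code of websites for several keywords. I want to export the result for each keyword to its own column in the exported CSV file, like this: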

[Image: desired CSV layout]

However, with my code I get this:

[Image: actual CSV output]

My code:

import urllib2
import csv

fieldnames = ['Website', 'Sitemap', 'Viewport', '@media']

def csv_writerheader(path):
    with open(path, 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

def csv_writer(domainname, Sitemap, path):
    with open(path, 'a') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        # writer.writeheader()
        writer.writerow({'Website': domainname, 'Sitemap': Sitemap})

csv_output_file = 'exported_print_results.csv'
keyword1 = ['sitemap']
keyword2 = ['viewport']
keyword3 = ['@media']

csv_writerheader(csv_output_file)

f = open('top1m-edited.csv')
csv_f = csv.reader(f)
for line in f:
    strdomain = line.strip()
    if '.nl' in strdomain:
        try:
            req = urllib2.Request(strdomain.strip())
            response = urllib2.urlopen(req)
            html_content = response.read()

            # keyword 1
            for searchstring in keyword1:
                if searchstring.lower() in str(html_content).lower():
                    print (strdomain, keyword1, 'found')
                    csv_writer(strdomain, 'found', csv_output_file)

                else:
                    print (strdomain, keyword1, 'not found')
                    csv_writer(strdomain, 'not found', csv_output_file)

            # keyword 2
            for searchstring in keyword2:
                if searchstring.lower() in str(html_content).lower():
                    print (strdomain, keyword2, 'found')
                    csv_writer(strdomain, 'found', csv_output_file)

                else:
                    print (strdomain, keyword2, 'not found')
                    csv_writer(strdomain, 'not found', csv_output_file)

            # keyword 3
            for searchstring in keyword3:
                if searchstring.lower() in str(html_content).lower():
                    print (strdomain, keyword3, 'found')
                    csv_writer(strdomain, 'found', csv_output_file)

                else:
                    print (strdomain, keyword3, 'not found')
                    csv_writer(strdomain, 'not found', csv_output_file)

        except urllib2.HTTPError:
            print (strdomain, 'HTTP ERROR')

        except urllib2.URLError:
            print (strdomain, 'URL ERROR')

        except urllib2.socket.error:
            print (strdomain, 'SOCKET ERROR')

        except urllib2.ssl.CertificateError:
            print (strdomain, 'SSL Certificate ERROR')
f.close()
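
A minimal reproduction of why the results end up on several rows: each `csv_writer()` call above writes a brand-new row containing only `Website` and `Sitemap`, so the three keyword results for one domain become three separate rows, all under the `Sitemap` column. `DictWriter` fills the missing keys with its `restval`, which is an empty string by default. This sketch is written for Python 3 and uses an in-memory buffer; the domain and results are placeholders, not real scan output.

```python
import csv
import io

fieldnames = ['Website', 'Sitemap', 'Viewport', '@media']

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator='\n')
writer.writeheader()

# Mimic the question's code: one writerow() per keyword,
# and every result is stored under the 'Sitemap' key.
for result in ['not found', 'not found', 'found']:
    writer.writerow({'Website': 'http://www.example.nl', 'Sitemap': result})

# Prints the header plus three rows whose Viewport and @media cells are empty.
print(buf.getvalue())
```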

How can I change my code to make it work?

Answers

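Consider using a dictionary to store the found/not found value for each keyword, and pass that dictionary to your CSV writer method so that each website is written as a single row. Before that, note that one of your problems is that you don't specify a line terminator for `csv.DictWriter()`: the csv module ends every row with '\r\n' by default, and a file opened in text mode on Windows turns that into '\r\r\n', which shows up as blank rows. Also, iterate over one list of keywords in a single loop instead of repeating the same block for each keyword.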
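
A quick way to see the two line terminators side by side is to write one row into an in-memory buffer, where no text-mode newline translation happens. This is a Python 3 sketch; `render` is a hypothetical helper name used only for the illustration.

```python
import csv
import io

def render(**fmtparams):
    """Hypothetical helper: write one row to a buffer and return the raw text."""
    buf = io.StringIO()
    csv.writer(buf, **fmtparams).writerow(['Website', 'Sitemap'])
    return buf.getvalue()

print(repr(render()))                     # default terminator: '\r\n'
print(repr(render(lineterminator='\n')))  # terminator used in this answer: '\n'
```

The csv documentation's alternative to overriding `lineterminator` is to open the output file with `newline=''` in Python 3, or in binary mode (`'wb'`) in Python 2.7.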

import csv
import urllib2

fieldnames = ['Website', 'Sitemap', 'Viewport', '@media']

def csv_writerheader(path):
    with open(path, 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, lineterminator='\n')
        writer.writeheader()

def csv_writer(dictdata, path):
    with open(path, 'a') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, lineterminator='\n')
        writer.writerow(dictdata)

csv_output_file = 'exported_print_results.csv'
# LIST OF KEY WORDS (TITLE CASE TO MATCH FIELD NAMES)
keywords = ['Sitemap', 'Viewport', '@media']

csv_writerheader(csv_output_file)

with open('top1m-edited.csv', 'r') as f:
    for line in f:
        strdomain = line.strip()
        # INITIALIZE DICT
        data = {'Website': strdomain}

        if '.nl' in strdomain:
            try:
                req = urllib2.Request(strdomain)
                response = urllib2.urlopen(req)
                html_content = response.read()

                # ITERATE THROUGH EACH KEY AND UPDATE DICT
                for searchstring in keywords:
                    if searchstring.lower() in str(html_content).lower():
                        print (strdomain, searchstring, 'found')
                        data[searchstring] = 'found'
                    else:
                        print (strdomain, searchstring, 'not found')
                        data[searchstring] = 'not found'

                # CALL METHOD PASSING DICT AND OUTPUT FILE (ONE ROW PER WEBSITE)
                csv_writer(data, csv_output_file)

            except urllib2.HTTPError:
                print (strdomain, 'HTTP ERROR')

            except urllib2.URLError:
                print (strdomain, 'URL ERROR')

            except urllib2.socket.error:
                print (strdomain, 'SOCKET ERROR')

            except urllib2.ssl.CertificateError:
                print (strdomain, 'SSL Certificate ERROR')

CSV output:

Website                 Sitemap    Viewport   @media
http://www.google.nl    not found  not found  found
http://www.youtube.nl   not found  found      not found
http://www.facebook.nl  not found  found      not found
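
To check that the exported file really has one row per website and one column per keyword, it can be read back with `csv.DictReader`. This Python 3 sketch uses the output above written out as comma-separated text; with the real file you would open `exported_print_results.csv` instead.

```python
import csv
import io

# The output shown above, as comma-separated text.
sample = (
    'Website,Sitemap,Viewport,@media\n'
    'http://www.google.nl,not found,not found,found\n'
    'http://www.youtube.nl,not found,found,not found\n'
    'http://www.facebook.nl,not found,found,not found\n'
)

# With the real file: open('exported_print_results.csv', newline='') as f
rows = list(csv.DictReader(io.StringIO(sample)))

for row in rows:
    # Each row is a dict keyed by the header names.
    print(row['Website'], row['Viewport'])

# Count how many sites contain each keyword.
counts = {k: sum(r[k] == 'found' for r in rows) for k in ['Sitemap', 'Viewport', '@media']}
print(counts)
```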
Thank you, it works! – jakeT888
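
The keyword loop from the answer can also be separated from the network code, so it can be tested without fetching any pages. This is a Python 3 sketch under that assumption: `build_row` is a hypothetical helper name, and the HTML strings and domains are placeholders rather than real page content. It builds one dict per site and writes each dict as a single row, as the answer does.

```python
import csv
import io

fieldnames = ['Website', 'Sitemap', 'Viewport', '@media']
keywords = ['Sitemap', 'Viewport', '@media']  # title case to match fieldnames

def build_row(domain, html, keywords):
    """Hypothetical helper: one dict per site, one column per keyword."""
    row = {'Website': domain}
    page = html.lower()
    for keyword in keywords:
        # Case-insensitive substring test, same as the answer's loop.
        row[keyword] = 'found' if keyword.lower() in page else 'not found'
    return row

# Placeholder pages standing in for response.read().
pages = {
    'http://www.example.nl': '<meta name="viewport" content="width=device-width">',
    'http://www.example2.nl': '<style>@media print { body { color: black; } }</style>',
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator='\n')
writer.writeheader()
for domain, html in pages.items():
    writer.writerow(build_row(domain, html, keywords))

print(buf.getvalue())
```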

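The default delimiter in your spreadsheet program doesn't seem to be a comma; most likely it is a TAB. You can either change the delimiter to a comma when importing (there is usually an import dialog that lets you choose it), or write the output from Python using TAB as the field separator.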
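
Writing TAB-separated output only requires passing `delimiter='\t'` to the writer. A Python 3 sketch with the same field names; the row values are placeholders.

```python
import csv
import io

fieldnames = ['Website', 'Sitemap', 'Viewport', '@media']

buf = io.StringIO()
# delimiter='\t' separates the fields with TAB characters instead of commas.
writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter='\t', lineterminator='\n')
writer.writeheader()
writer.writerow({'Website': 'http://www.example.nl', 'Sitemap': 'not found',
                 'Viewport': 'found', '@media': 'not found'})
print(buf.getvalue())
```

The csv module also ships an `excel-tab` dialect (`dialect='excel-tab'`) that sets the TAB delimiter for you.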