2017-01-03 116 views
2

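Python csv cutting off part of a row

I've run into this strange problem.

I should also mention that this worked in the past, so I'm also considering that the .csv file or the specific row itself may be the problem.

Quick breakdown: I have a script that pulls data from a .csv file of CVE (vulnerability) data. It then uses the cvss module to rescore the results, and we use the output to prioritize patching and gauge urgency.

(This script is a temporary workaround until we implement a new tool.)

Here is where it gets messed up. This is what my input (ingest) file currently looks like.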

Vulnerability Title,Plugin ID,Original CVSS Score,Default Vector,Original Severity,AWS Score,AWS Vector,AWS Severity,Hosts,Host Type,Percentage Impacted 
Cisco IOS IKEv1 Packet Handling Remote Information Disclosure (cisco-sa-20160916-ikev1) (BENIGNCERTAIN),NES-93736,4.6,CVSS2#AV:N/AC:L/Au:N/C:P/I:N/A:N,,,AV:N/AC:L/Au:N/C:P/I:N/A:N,,26,26, 
Cisco IOS Software TCP Memory Leak DoS (cisco-sa-20150325-tcpleak),NES-82568,4.9,CVSS2#AV:N/AC:L/Au:N/C:N/I:N/A:C,,,AV:N/AC:L/Au:N/C:N/I:N/A:C,,30,26, 
RHEL 5/6/7 : nss and nss-util (RHSA-2016:2779),NES-94912,4.2,CVSS2#AV:N/AC:M/Au:N/C:C/I:C/A:C/E:F/RL:OF/RC:ND,,,AV:N/AC:M/Au:N/C:C/I:C/A:C/E:F/RL:OF/RC:ND,,5112,23, 

Here is the output after my script runs (the script is attached below).

Vulnerability Title,Plugin ID,Original CVSS Score,Default Vector,Original Severity,AWS Score,AWS Vector,AWS Severity,Hosts,Host Type,Percentage Impacted 
ium,4.6,AV:A/AC:H/Au:M/C:P/I:N/A:P/CDP:L/TD:H/CR:H/IR:H/AR:H,Medium,26,26,0.2524271844660194 
Cisco IOS Software TCP Memory Leak DoS (cisco-sa-20150325-tcpleak),NES-82568,4.9,CVSS2#AV:N/AC:L/Au:N/C:N/I:N/A:C,Medium,4.9,AV:A/AC:H/Au:M/C:N/I:N/A:C/CDP:L/TD:M/CR:H/IR:H/AR:H,Medium,30,26,0.2912621359223301 
RHEL 5/6/7 : nss and nss-util (RHSA-2016:2779),NES-94912,4.2,CVSS2#AV:N/AC:M/Au:N/C:C/I:C/A:C/E:F/RL:OF/RC:ND,Medium,4.2,AV:A/AC:H/Au:M/C:C/I:C/A:C/E:F/RL:OF/RC:ND/CDP:L/TD:M/CR:H/IR:H/AR:H,Medium,5112,23,0.615458704550927 

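To explain a little further: line one, which starts with "ium", is the cut-off tail of the word "Medium", which comes from the bottom of my script (line 128, the #ORIGINAL SCORE section). It should say Medium. So basically, if you look at line 2 of my input and compare it with the output, the script cuts out the whole row and keeps only half of the word it is trying to add. I thought it might be because of all the parentheses or something, but I'm not sure.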

Cisco IOS IKEv1 Packet Handling Remote Information Disclosure (cisco-sa-20160916-ikev1) (BENIGNCERTAIN),NES-93736,4.6,CVSS2#AV:N/AC:L/Au:N/C:P/I:N/A:N, 

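Here is the script that does this. I know it's a bit ugly, and suggestions for improvement are welcome, but finding out why it messes up my file is my top priority right now. I have considered switching to pandas, but that would take some time because I've never used it and don't know how to do this with it.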

def rescore_function(): 
#headers 
    print 'Starting Rescore' 
    csv_in = open('/tmp/rescore_test.csv', 'rb') 
    csv_out = open('/tmp/rescored_vulnerabilities.csv', 'wb') 
    writer = csv.writer(csv_out) 
    reader = csv.reader(csv_in) 
    headers = next(reader, None) 
    if headers: 
     writer.writerow(headers) 

    print 'Creating Target Distrobution' 
    for row in csv.reader(csv_in): 
    #This is a terrible way of setting up the percentage of hosts impacted for target distrobution. Its ugly and horrible. Host count defines the host impacted, host_type identifies what kind of host it is. Such as Alinux, Rhel5, or Cisco IOS 
     host_count = float(row[8]) 
     host_type = float(row[9]) 
     alinux_impact = host_count/ALINUX_HOST 
     cisco_impact = host_count/CISCO_COUNT 
     juniper_impact = host_count/JUNIPER_COUNT 
     citrix_impact = host_count/CITRIX_COUNT   
     all_linux= host_count/LINUX_TOTAL 
     print 'math set' 

#The reason for vul_id is 3 lists combined is simple. alinux_impact NEEDS to be 24, cisco NEEDs to be 26, juniper NEEDS to match 27, because vul_id is the softwares 'vulnerability ID type 
#range falls into all_linux. So fillvalue=vul_os[-1] means if its not 24,26,27, it is "all_linux" which means it compares it to the All linux number.  
     vul_id = [24, 26, 27, 25] + range(24) + range(28,101) 
     vul_os = [alinux_impact, cisco_impact, juniper_impact, all_linux] 

     append_file = open('/tmp/rescored_vulnerabilities.csv', 'ab') 
     append_write = csv.writer(append_file) 

#Does the for loop with the fillvalue as mentioned above. Basically Y is the host type (linux, Cisco IOS, etc) and X is the vulnerability type. So it runs through and figures out the TD and rescore methods. 
#X equals the percetange of impacted, so the Metric will be based on amount/percentage of X impacted and does a regex search and replace based on that using the CVSS calculations. 
     print vul_id 
     print vul_os 
     for x,y in izip_longest(vul_os, vul_id, fillvalue=vul_os[-1]): 
      print x,y 
      print host_type 
    #VECTOR REGEXP, host_type is which OS/Device type. 23 = RHEL5, 24 = Alinux, 26 = Cisco, 27 = Juniper 
      if host_type == y: 
       row[10] = x 
       if x <= 0.25: 
        AC_Metric = 'A:C/CDP:L/TD:L/CR:H/IR:H/AR:H' 
        AP_Metric = 'A:P/CDP:L/TD:L/CR:H/IR:H/AR:H' 
        AN_Metric = 'A:N/CDP:L/TD:L/CR:H/IR:H/AR:H' 
        RCUC_Metric = 'RC:UC/CDP:L/TD:L/CR:H/IR:H/AR:H' 
        RCUR_Metric = 'RC:UR/CDP:L/TD:L/CR:H/IR:H/AR:H' 
        RCC_Metric = 'RC:C/CDP:L/TD:L/CR:H/IR:H/AR:H' 
        RCND_Metric = 'RC:ND/CDP:L/TD:L/CR:H/IR:H/AR:H' 
       elif 0.26 <= x <= 0.75: 
        AC_Metric = 'A:C/CDP:L/TD:M/CR:H/IR:H/AR:H' 
        AP_Metric = 'A:P/CDP:L/TD:M/CR:H/IR:H/AR:H' 
        AN_Metric = 'A:N/CDP:L/TD:M/CR:H/IR:H/AR:H' 
        RCUC_Metric = 'RC:UC/CDP:L/TD:M/CR:H/IR:H/AR:H' 
        RCUR_Metric = 'RC:UR/CDP:L/TD:M/CR:H/IR:H/AR:H' 
        RCC_Metric = 'RC:C/CDP:L/TD:M/CR:H/IR:H/AR:H' 
        RCND_Metric = 'RC:ND/CDP:L/TD:M/CR:H/IR:H/AR:H' 
       else: 
        AC_Metric = 'A:C/CDP:L/TD:H/CR:H/IR:H/AR:H' 
        AP_Metric = 'A:P/CDP:L/TD:H/CR:H/IR:H/AR:H' 
        AN_Metric = 'A:N/CDP:L/TD:H/CR:H/IR:H/AR:H' 
        RCUC_Metric = 'RC:UC/CDP:L/TD:H/CR:H/IR:H/AR:H' 
        RCUR_Metric = 'RC:UR/CDP:L/TD:H/CR:H/IR:H/AR:H' 
        RCC_Metric = 'RC:C/CDP:L/TD:H/CR:H/IR:H/AR:H' 
        RCND_Metric = 'RC:ND/CDP:L/TD:H/CR:H/IR:H/AR:H' 


       text = row[6] 
       text = re.sub(r'AV:N','AV:A',text) 
       text = re.sub(r'AC:L','AC:H',text) 
       text = re.sub(r'AC:M','AC:H',text) 
       text = re.sub(r'Au:N','Au:M',text) 
       text = re.sub(r'Au:S','Au:M',text) 
       text = re.sub(r'A:C$',AC_Metric,text) 
       text = re.sub(r'A:P$',AP_Metric,text) 
       text = re.sub(r'A:N$',AP_Metric,text) 
       text = re.sub(r'RC:UC',RCUC_Metric,text) 
       text = re.sub(r'RC:UR',RCUR_Metric,text) 
       text = re.sub(r'RC:C',RCC_Metric,text) 
       text = re.sub(r'RC:ND',RCND_Metric,text) 
       row[6] = text 
    #NEW SCORE, uses CVSS module to take the previous vector and find out the the numbered score. It then uses that number to define the severity word. 
       try: 
        vector = row[6] 
        c = CVSS2(vector) 
        row[5] = c.scores()[2] 
        vul_score = row[5] 
        if 0 <= vul_score <= 3.9: 
         vuln_word = 'Low' 
        elif 4.0 <= vul_score <=6.9: 
         vuln_word = 'Medium' 
        elif 7.0 <= vul_score <= 9.9: 
         vuln_word = 'High' 
        else: 
         vuln_word = 'Critical' 
        row[7] = vuln_word 
       except CVSS2MalformedError: 
        rescored_success = False 
        pass 
    #ORIGINAL SCORE, does the same as above for the original vector since NESSUS does not provide the Severity "word". This only finds the word, not the number value. 
       default_score = float(row[2]) 
       if 0 <= default_score <= 3.9: 
        default_severity = 'Low' 
       elif 4.0 <= default_score <=6.9: 
        default_severity = 'Medium' 
       elif 7.0 <= default_score <= 9.9: 
        default_severity = 'High' 
       else: 
        default_severity = 'Critical' 
       row[4] = default_severity 
       append_write.writerow(row) 
+0

Why are you reading in 'rb' mode? It's not a binary file, is it? Try 'r'. – jbasko

+0

@jbasko 'rb' is the recommended mode for csv.reader in Python 2 (https://docs.python.org/2/library/csv.html#module-contents) – snakecharmerb

+0

Thanks @snakecharmerb, I didn't know that. – jbasko

Answers

2

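Your code is quite large and therefore hard to reproduce, but I suspect something fishy is going on with the write file handle and the buffering involved in write mode / parallel buffered access to the same file. It's quite a mess:

  1. First you open the file in truncate mode with csv_out = open('/tmp/rescored_vulnerabilities.csv', 'wb') and write the header.
  2. On each iteration, while the handle above is still open, you open the same file again in append mode: append_file = open('/tmp/rescored_vulnerabilities.csv', 'ab')
  3. You never close append_file either!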
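The effect described in the list above can be reproduced in a few lines. This is a minimal sketch in Python 3 (the question uses Python 2), and the function name, file name, and sample strings are made up for illustration. It assumes the usual buffered behaviour of open(): the header written through the truncating handle stays in memory, the appended row reaches the disk first, and the late flush of the header then lands at offset 0 and overwrites the start of that row.

```python
import os
import tempfile


def demo_shared_handle_overwrite():
    """Reproduce the clobbered first row with two handles on one file."""
    path = os.path.join(tempfile.mkdtemp(), "demo.csv")

    out = open(path, "w", newline="")   # truncating handle, position 0
    out.write("HEADER\n")               # small write: stays in the buffer

    app = open(path, "a", newline="")   # second handle, append mode
    app.write("first row of data\n")
    app.close()                         # flushed: the row is now at offset 0

    out.close()                         # header flushed at offset 0,
                                        # overwriting the start of the row

    with open(path, newline="") as f:
        return f.read()


print(repr(demo_shared_handle_overwrite()))
# 'HEADER\now of data\n'
```

The seven characters of "HEADER\n" replace the first seven characters of the row, which is the same pattern as the "ium,4.6,..." line in the question's output.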

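I would suggest this:

  • The first open in truncate mode is fine.
  • Remove append_file = open('/tmp/rescored_vulnerabilities.csv', 'ab')
  • Replace append_write with writer, which writes to the same file (it will work: writer points to csv_out, which is still open).
  • Don't forget to close csv_out at the end (or put all of the code inside a with open(...) as csv_out: block).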
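The suggestions above could be sketched as follows. This is only an outline in Python 3 (so newline="" takes the place of the 'rb'/'wb' modes used in the Python 2 script), and the function name rescore and the transform parameter are hypothetical stand-ins for the question's rescoring logic, not part of the original code.

```python
import csv


def rescore(in_path, out_path, transform=lambda row: row):
    """Copy the header, then write every row through one single writer.

    `transform` is a hypothetical placeholder for the per-row rescoring.
    """
    # Both files are closed (and flushed) automatically when the block ends.
    with open(in_path, newline="") as csv_in, \
            open(out_path, "w", newline="") as csv_out:
        reader = csv.reader(csv_in)
        writer = csv.writer(csv_out)

        headers = next(reader, None)
        if headers:
            writer.writerow(headers)

        # Reuse the same reader and the same writer for every row;
        # no second handle is ever opened on the output file.
        for row in reader:
            writer.writerow(transform(row))
```

Because only one handle ever writes to the output file, the header and the rows go through the same buffer and reach the disk in the order they were written.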

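Note that this problem is limited to Un*x. On a Windows file system it would throw an exception right away, because a file cannot be opened twice in write mode (which is sometimes just as well).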

+0

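Ah yes, it was the append file. I removed it and switched everything over to the original writer, and that fixed everything. Thank you so much for your help! I do close the output file later in the script once it is done writing, but I'll fix all of this. – Mallachar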

+0

Great! I couldn't see what else it could be. –