2016-08-02 77 views
0

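Parsing a txt file in Python: PEST output, jacobian.txt

We are trying to find a way to use Python to parse a tricky text file produced by a PEST analysis. It contains measurements of 63 different variables for more than 30,000 observations. Below is an example of the output (the first 4 of more than 30,000 observations are shown).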

      cmfa   cmfb   cmfc   cmfd   cmla   cmlb   cmlc   cmld 
          cmle   cgfa   cgfb   cgfc   cgfd   cgfe   dgfa   dgfb 
          dgfc   dgfd   icfa   icfb   icfc   icfd   vawa   vawb 
          vawc   vawd   vawe   vawf   vswa   vswb   vswc   vswd 
          vswe   chfa   chfb   chfc   chfd   chfe   cgwa   cgwb 
          cgwc   cgwd   cgwe   crta   crtb   crtc   crtd   crte 
          icha   ichb   ichc   ichd   iche   csea   cseb   csec 
          csed   csee   csef   caqa   caqb   crsa   crsb 

       0 -1.900000E-03 1.080000E-02 3.150000E-02 0.00000  0.00000  0.00000  0.00000  -3.020000E-02 
         0.00000  -1.870000E-02 0.00000  4.600000E-03 0.00000  0.00000  0.00000  4.510000E-02 
         0.00000  0.00000  3.650000E-02 -7.000000E-03 -2.100000E-03 -2.000000E-04 3.200000E-03 8.000000E-03 
        -7.000000E-04 -1.500000E-02 0.00000  4.800000E-03 1.900000E-03 4.000000E-04 2.500000E-03 2.500000E-03 
        -1.400000E-02 0.00000  0.00000  0.00000  0.00000  0.00000  -3.200000E-03 -8.060000E-02 
        -0.126500  0.298400  0.00000  0.00000  0.00000  0.00000  0.00000  8.000000E-04 
        -1.900000E-03 1.400000E-03 0.00000  0.00000  -3.200000E-03 0.00000  0.00000  0.00000  
         0.00000  0.00000  0.00000  0.00000  0.00000  -1.200000E-02 1.930000E-02 

       1 -1.800000E-03 1.140000E-02 1.850000E-02 0.00000  0.00000  0.00000  0.00000  -2.600000E-02 
         0.00000  -8.200000E-03 0.00000  1.200000E-03 0.00000  0.00000  0.00000  0.00000  
         0.00000  0.00000  2.560000E-02 -6.100000E-03 -1.100000E-03 0.00000  3.000000E-03 7.400000E-03 
        -7.000000E-04 -1.410000E-02 0.00000  5.000000E-03 1.900000E-03 3.000000E-04 2.300000E-03 2.300000E-03 
        -1.330000E-02 0.00000  0.00000  0.00000  0.00000  0.00000  -3.400000E-03 -8.410000E-02 
        -0.123500  0.301900  0.00000  0.00000  0.00000  0.00000  0.00000  1.200000E-03 
        -2.000000E-03 1.400000E-03 0.00000  0.00000  -3.200000E-03 0.00000  0.00000  0.00000  
         0.00000  0.00000  0.00000  0.00000  0.00000  -1.280000E-02 2.050000E-02 

       2 -3.300000E-03 6.500000E-03 4.040000E-02 0.00000  0.00000  0.00000  0.00000  -7.060000E-02 
        4.840000E-02 -0.112500  0.110300  0.00000  0.00000  0.00000  1.10330  0.00000  
         0.00000  0.00000  3.940000E-02 -8.500000E-03 -1.120000E-02 6.600000E-03 5.700000E-03 1.430000E-02 
        -1.300000E-03 -2.470000E-02 0.00000  3.700000E-03 2.200000E-03 5.000000E-04 4.300000E-03 4.500000E-03 
        -2.250000E-02 0.00000  0.00000  0.00000  0.00000  0.00000  -2.000000E-03 -5.840000E-02 
        -0.157300  0.292400  0.00000  0.00000  0.00000  0.00000  0.00000  -3.600000E-03 
        -1.700000E-03 1.200000E-03 0.00000  0.00000  -3.400000E-03 0.00000  0.00000  0.00000  
         0.00000  0.00000  0.00000  0.00000  0.00000  -7.400000E-03 1.180000E-02 

       3 -2.200000E-03 1.040000E-02 3.500000E-02 0.00000  0.00000  0.00000  0.00000  -4.390000E-02 
         0.00000  -3.170000E-02 2.590000E-02 0.00000  0.00000  0.00000  0.259400  0.00000  
         0.00000  0.00000  3.920000E-02 -1.030000E-02 -3.500000E-03 1.500000E-03 3.600000E-03 9.000000E-03 
        -9.000000E-04 -1.680000E-02 0.00000  4.700000E-03 2.000000E-03 3.000000E-04 2.700000E-03 2.800000E-03 
        -1.560000E-02 0.00000  0.00000  0.00000  0.00000  0.00000  -3.200000E-03 -7.920000E-02 
        -0.131600  0.302200  0.00000  0.00000  0.00000  0.00000  0.00000  3.000000E-04 
        -2.000000E-03 1.300000E-03 0.00000  0.00000  -3.300000E-03 0.00000  0.00000  0.00000  
         0.00000  0.00000  0.00000  0.00000  0.00000  -1.180000E-02 1.880000E-02 

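The letter codes (cmfa, cmfb, etc.) are the names of the 63 variables. Each letter code corresponds to the number at the same position in each of the blocks of numbers that follow.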

The first block of numbers is for observation 0, the next block is for observation 1, and so on for more than 30,000 observations.

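We would like to find a way to convert this into a text file (preferably .csv). For the example above, it would have 63 columns and 4 rows (observations 0 to 3), plus one row for the identifiers. Each column would be headed by its letter code (cmfa, etc.).

If possible, we would like the header to be included, and we would like this to work with any number of columns and any number of observations.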


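What have you tried so far? A simple solution: use a text editor with regular-expression support, such as vi (Unix) or Notepad++ (Windows), to replace every single newline with a space or tab, and then replace the runs of spaces or tabs with commas. –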
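The editor-style substitution suggested in this comment can also be sketched in Python with a few `re.sub` calls. This is only an illustration, assuming that records are separated by blank lines as in the sample; the function name `blocks_to_csv_text` is made up for this example, and the whole text is held in memory.

```python
import re

def blocks_to_csv_text(text):
    """Join each blank-line-separated block into one comma-separated line."""
    # Normalise line endings, then trim leading/trailing spaces on every line
    # (this also turns whitespace-only lines into truly empty lines)
    text = text.replace("\r\n", "\n")
    text = re.sub(r"^[ \t]+|[ \t]+$", "", text, flags=re.M).strip()
    # Step 1: replace each single newline (one that is not next to another
    # newline) with a space, so every block becomes one long line
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse the blank-line gaps between blocks into one newline per record
    text = re.sub(r"\n{2,}", "\n", text)
    # Step 2: replace runs of spaces or tabs with commas
    return re.sub(r"[ \t]+", ",", text)
```

Note that the header line produced this way has one field fewer than the data lines, because the observation number has no name in the original file.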

Answers

1

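Here is one way to parse the file you provided (independent of the number of lines in the file) with a simple Python script. A better implementation could be written with regular expressions, but I will leave that to you as a further exercise: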

# Import required libraries
import csv

# Open the input file (text mode, Python 3)
with open('input.txt') as f:
    lines = f.read().splitlines()

# Split each line on whitespace and keep only the values
rows = []
for line in lines:
    tokens = line.split()
    if len(tokens) == 9:
        # First line of a data block: drop the leading observation number
        rows.append(tokens[1:9])
    elif len(tokens) in (7, 8):
        rows.append(tokens)

# Group every 8 lines into one record and convert the data to float
# (assumes 63 variables laid out 8 per line, i.e. 8 lines per block)
records = []
for start in range(0, len(rows) - 7, 8):
    flat = [item for chunk in rows[start:start + 8] for item in chunk]
    if start == 0:
        # The first block holds the variable names
        records.append(flat)
    else:
        records.append([float(v) for v in flat])

# Write the output file
with open('output.csv', 'w', newline='') as f:
    csv.writer(f).writerows(records)

If you want to keep the index:

# Import required libraries
import csv

# Open the input file (text mode, Python 3)
with open('input.txt') as f:
    lines = f.read().splitlines()

# Split each line on whitespace and drop blank lines
rows = []
for line in lines:
    tokens = line.split()
    if tokens:
        rows.append(tokens)

# Group every 8 lines into one record and convert the data to numbers
# (assumes 63 variables laid out 8 per line, i.e. 8 lines per block)
records = []
for start in range(0, len(rows) - 7, 8):
    flat = [item for chunk in rows[start:start + 8] for item in chunk]
    if start == 0:
        # Header row: the observation-number column gets an empty name
        records.append([''] + flat)
    else:
        # Keep the observation number as an integer, the rest as floats
        records.append([int(flat[0])] + [float(v) for v in flat[1:]])

# Write the output file
with open('output.csv', 'w', newline='') as f:
    csv.writer(f).writerows(records)

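Thank you, Gaurav! This is exactly what I asked for. I forgot to ask whether the observation number could also be written as the first column of the csv (0, 1, 2, 3 in this example). Is that possible as well? – bigCow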


Edited to include the observation number. Cheers! If you think this is correct, please mark it as accepted. –


Again, you answered what I asked. My mistake: I forgot to ask whether the new ID field could also be named 'ID' or 'FID' or something similar. – bigCow
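One way to get a named first column is to write the label in front of the variable names in the header row. The sketch below is not part of the answer above. It assumes that blocks are separated by blank lines and that every data block starts with its observation number; the names `iter_blocks` and `jacobian_to_csv` and the default label `'ID'` are illustrative only. Because it reads line by line and infers the column count from the header, it should cope with any number of variables and observations.

```python
import csv

def iter_blocks(lines):
    """Yield the whitespace-separated tokens of each blank-line-delimited block."""
    tokens = []
    for line in lines:
        parts = line.split()
        if parts:
            tokens.extend(parts)
        elif tokens:
            # A blank line closes the current block
            yield tokens
            tokens = []
    if tokens:
        yield tokens

def jacobian_to_csv(path_in, path_out, id_name="ID"):
    """Write one CSV row per block; return the number of data rows written."""
    n_rows = 0
    with open(path_in) as fin, open(path_out, "w", newline="") as fout:
        writer = csv.writer(fout)
        blocks = iter_blocks(fin)
        header = next(blocks, None)
        if header is None:
            return 0  # empty input file
        # Name the observation-number column, then list the variable names
        writer.writerow([id_name] + header)
        for block in blocks:
            # Each data block should hold the observation number plus one value per variable
            if len(block) != len(header) + 1:
                raise ValueError("unexpected number of values in block %r" % block[0])
            writer.writerow([int(block[0])] + [float(v) for v in block[1:]])
            n_rows += 1
    return n_rows
```

A call such as `jacobian_to_csv('input.txt', 'output.csv', id_name='FID')` would then label the first column `FID` instead.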

0

You can use mmap and a regular expression to parse the file without reading the whole file into memory.

Something like:

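import re
import mmap

# Placeholder paths; adjust them to your files
fn_in = "input.txt"
fn_out = "output.csv"

# Blocks are separated by blank lines; the pattern is bytes because mmap yields bytes
block_re = re.compile(rb"(.*?)(?:(?:^\s*$)|\Z)", re.M | re.S)

with open(fn_in, "rb") as fin, open(fn_out, "w") as fout:
    # A length of 0 maps the whole file
    data = mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ)
    first = True
    for m in block_re.finditer(data):
        block = m.group(0).strip()
        if not block:
            continue
        fields = ",".join(block.decode().split())
        if first:
            # Header block: prepend a name for the observation-number column
            fout.write("O_N," + fields + "\n")
            first = False
        else:
            fout.write(fields + "\n")
    data.close()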