Python: reading and writing files with a complex, repeating format

First of all, sorry for my poor English. I have a file with a repeating format, like this:

 326           Iteration:  0 #Bonds:  10 
    1 6 7 14 54 70 77 0 0 0 0 0 1 0.693 0.632 0.847 0.750 0.644 0.000 0.000 0.000 0.000 0.000 3.566 0.000 0.028 
    2 6 3 6 15 55 0 0 0 0 0 0 1 0.925 0.920 0.909 0.892 0.000 0.000 0.000 0.000 0.000 0.000 3.645 0.000 -0.040 
    3 6 2 8 10 52 0 0 0 0 0 0 1 0.925 0.910 0.920 0.898 0.000 0.000 0.000 0.000 0.000 0.000 3.653 0.000 0.000 
... 
    324 8 323 0 0 0 0 0 0 0 0 0 100 0.871 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.871 3.000 -0.493 
    325 2 326 0 0 0 0 0 0 0 0 0 101 0.930 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.930 0.000 0.334 
    326 8 325 0 0 0 0 0 0 0 0 0 101 0.930 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.930 3.000 -0.611 
    637.916060425841  306.094529423257  1250.10511927236 
    6.782126993565285E-006 
     326 (repeating from here)     Iteration:  100 #Bonds:  10 
    1 6 7 14 54 64 70 77 0 0 0 0 1 0.885 0.580 0.819 0.335 0.784 0.709 0.000 0.000 0.000 0.000 4.111 0.000 0.025 
    2 6 3 6 15 55 0 0 0 0 0 0 1 0.812 0.992 0.869 0.966 0.000 0.000 0.000 0.000 0.000 0.000 3.639 0.000 -0.034 
    3 6 2 8 10 52 0 0 0 0 0 0 1 0.812 0.966 0.989 0.926 0.000 0.000 0.000 0.000 0.000 0.000 3.692 0.000 0.004

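As you can see here, the first line is a header, lines 2 to 327 hold the data I want to analyze, and lines 328 and 329 contain some numbers I don't want to use. The next "frame" starts at line 330 and has exactly the same format. This "frame" repeats more than 200,000 times.
I want to use columns 1 to 13 of the data in lines 2 to 327 of each frame. I also want to use the first number of the header.
For every repeated frame, I want to analyze the data in lines 2 to 327, columns 3 to 12, and print the number of zeros and the number of non-zero values. I also want to print the first, second, and thirteenth columns. So the expected output file looks like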
```
326 
    1 
1 6 5 5 1 
2 6 4 6 1 
... 
325 2 1 9 101 
326 8 1 9 101 
326 (Next frame starts from here) 
    2 
1 6 5 5 1 
2 6 4 6 1 
... 
326 
    3 
1 6 5 5 1 
2 6 4 6 1 
... 
```
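First line: the first number of the input header line.
Second line: the frame number.
Lines 3 to 328: column 1 of the input file, column 2 of the input file, the number of zeros in columns 3 to 12 of the input, the number of non-zero values in columns 3 to 12 of the input, and column 13 of the input.
From the 4th line on: the same format repeats, as above.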

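So the result file has 2 header lines and 326 lines of analyzed data, 328 lines per frame in total. The same format repeats for the next frame. Writing the result data in this format (fields separated by 5 spaces) is recommended so that the file can be used for other purposes.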
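
The per-row rule described above can be sketched in Python 3 as follows. This is only a minimal sketch, not the asker's code: the function name `summarize_row` is hypothetical, the fields are assumed to be separated by whitespace, and the counts are written non-zero first and zeros second because that is the order the sample output shows (for example `2 6 4 6 1`). The default separator is the five spaces requested above.

```python
def summarize_row(line, sep=" " * 5):
    """Reduce one data row to: index, type, non-zero count, zero count, column 13."""
    fields = line.split()        # split() ignores leading and repeated whitespace
    neighbours = fields[2:12]    # columns 3 to 12 (1-based)
    zeros = sum(1 for value in neighbours if int(value) == 0)
    non_zeros = len(neighbours) - zeros
    return sep.join([fields[0], fields[1], str(non_zeros), str(zeros), fields[12]])


# Second data row of the first frame, shortened after the columns that are used.
row = "    2 6 3 6 15 55 0 0 0 0 0 0 1 0.925 0.920 0.909"
print(summarize_row(row, sep=" "))   # prints: 2 6 4 6 1
```

Passing `sep=" "` here only keeps the printed line short; leaving the argument out gives the five-space layout.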

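My approach is to create 13 arrays for the 13 columns, then fill them with a double for loop over every frame, 328 lines each. But I don't know how to handle the output.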

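Below is my trial code (unfinished, it only reads the input), but it has many problems. linecache reads the whole line, not just the first number of each header line. Every frame has 326 + 3 = 329 lines, but it looks like my code does not handle the frames correctly. I welcome any help with analyzing this data. Thank you very much in advance.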

# Read the file 
filename = raw_input("Enter the file name \n") 
file = open(filename, 'r') 

# Read the number of atom from header 
import linecache 
nnn = linecache.getline(filename, 1) 
natoms = int(nnn) 
singleframe = natoms + 3 

# get number of frames 
nlines = 0 
for i1 in file: 
    nlines = nlines +1 
file.close() 

nframes = nlines/singleframe 

print 'no of lines are: ', nlines 
print 'no of frames are: ', nframes 
print 'no of atoms are:', natoms 

# Create 1d string array 
nrange = range(nlines) 
data_lines = [None]*(nlines) 

# Store whole input file into string array 
file = open(filename, 'r') 
i1=0 
for i1 in nrange: 
    data_lines[i1] = file.readline() 
file.close() 


# Create 1d array to store atomic data 
at_index = [None]*natoms 
at_type = [None]*natoms 
n1 = [None]*natoms 
n2 = [None]*natoms 
n3 = [None]*natoms 
n4 = [None]*natoms 
n5 = [None]*natoms 
n6 = [None]*natoms 
n7 = [None]*natoms 
n8 = [None]*natoms 
n9 = [None]*natoms 
n10 = [None]*natoms 
molnr = [None]*natoms 

nrange1= range(natoms) 
nframe = range(nframes) 

file = open('output_force','w') 
print data_lines[9] 
for j1 in nframe: 
    start = j1*(natoms + 3) + 3 
    for i1 in nrange1: 
     line = data_lines[i1+start].split() #Split each line based on spaces 
     at_index[i1] = int(line[0]) 
     at_type[i1] = int(line[1]) 
     n1[i1]= int(line[2]) 
     n2[i1]= int(line[3]) 
     n3[i1]= int(line[4]) 
     n4[i1]= int(line[5]) 
     n5[i1]= int(line[6]) 
     n6[i1]= int(line[7]) 
     n7[i1]= int(line[8]) 
     n8[i1]= int(line[9]) 
     n9[i1]= int(line[10]) 
     n10[i1]= int(line[11]) 
     molnr[i1]= int(line[12])
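
The two problems named before the code can be illustrated with a short Python 3 sketch. It assumes the layout described in the question (one header line, `natoms` data lines, and two trailing lines per frame), and the helper names `parse_natoms` and `frame_data_range` are hypothetical. `linecache.getline` returns the whole header line, so the atom count has to be taken from its first whitespace-separated field. Under the same assumption, the data rows of a frame start one line after its header, which suggests that the `+ 3` offset in the loop above should be `+ 1`.

```python
def parse_natoms(header_line):
    """Return the atom count, i.e. the first number on a frame's header line."""
    return int(header_line.split()[0])


def frame_data_range(frame_index, natoms):
    """Return (start, stop) 0-based line indices of the data rows of one frame.

    Each frame is assumed to hold 1 header line, natoms data lines and
    2 trailing lines, so it spans natoms + 3 lines in total.
    """
    frame_length = natoms + 3
    start = frame_index * frame_length + 1   # skip this frame's header line
    return start, start + natoms


natoms = parse_natoms(" 326           Iteration:  0 #Bonds:  10 ")
print(natoms)                         # 326
print(frame_data_range(0, natoms))    # (1, 327)
print(frame_data_range(1, natoms))    # (330, 656)
```

In Python 3 the frame count would also need integer division, `nlines // singleframe`, for `range(nframes)` to work.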


2013-11-27 exsonic01

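Since you are working with a CSV-like file, you should take a look at the csv module. I wrote some code that should do the job.

This code assumes "good data". If your data set may contain errors (for example fewer than 13 columns, or fewer than 326 data rows), some changes will be needed.

(Changed to be compatible with Python 2.6.6)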

import csv
with open('mydata.csv') as in_file:
    with open('outfile.csv', 'wb') as out_file:
        csv_reader = csv.reader(in_file, delimiter=' ', skipinitialspace=True)
        csv_writer = csv.writer(out_file, delimiter='\t')

        # Iterate over the file; each pass of this loop starts at a frame header
        for i, header in enumerate(csv_reader):
            # Get the header data
            num = header[0]
            csv_writer.writerow([num])

            # Write frame number, starting with 1 (hence the +1 part)
            csv_writer.writerow([i+1])

            # Iterate over all data rows
            for _ in xrange(326):

                # Call next(csv_reader) to get the next row
                # Put inside a try ... except to avoid StopIteration exception
                # if end of file is found before reaching 326 lines
                try:
                    row = next(csv_reader)
                except StopIteration:
                    break
                # Use list comprehension to extract number of zeros
                zeros = sum([1 for x in row[2:12] if x.strip() == '0'])
                not_zeros = 10 - zeros
                # Write the data to output file
                out = [row[0].strip(), row[1].strip(), not_zeros, zeros, row[12].strip()]
                csv_writer.writerow(out)
            # If the loop finished without reaching the end of the file
            else:
                # Skip the two trailing lines of the frame
                next(csv_reader)
                next(csv_reader)

For the first three lines, this produces:


2013-11-28 01:18:56
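
The answer's caveat about malformed data could be handled along the following lines. This is a sketch in Python 3 rather than the Python 2.6 code above, and it swaps the csv module for plain `str.split()`, which ignores leading and repeated spaces. The function name `summarize_frames`, the `trailer_lines` parameter, and the choice to raise `ValueError` on short rows or truncated frames are assumptions, not part of the answer. The sample input is a shortened, made-up two-atom version of the rows in the question.

```python
from itertools import islice


def summarize_frames(lines, trailer_lines=2, sep=" " * 5):
    """Yield the output lines for every frame in an iterable of input lines."""
    lines = iter(lines)
    frame_number = 0
    for header in lines:
        if not header.strip():
            continue                     # tolerate blank lines between frames
        frame_number += 1
        natoms = int(header.split()[0])  # first number of the header line
        yield str(natoms)
        yield str(frame_number)
        rows = list(islice(lines, natoms))
        if len(rows) != natoms:
            raise ValueError("frame %d: expected %d data rows, found %d"
                             % (frame_number, natoms, len(rows)))
        for row in rows:
            fields = row.split()
            if len(fields) < 13:
                raise ValueError("frame %d: row has fewer than 13 columns: %r"
                                 % (frame_number, row))
            zeros = sum(1 for value in fields[2:12] if int(value) == 0)
            yield sep.join([fields[0], fields[1], str(10 - zeros), str(zeros), fields[12]])
        # Skip the trailing lines; a missing trailer at end of input is tolerated.
        for _ in islice(lines, trailer_lines):
            pass


sample = [
    " 2   Iteration:  0 #Bonds:  10",
    " 1 6 7 14 54 70 77 0 0 0 0 0 1 0.693",
    " 2 6 3 6 15 55 0 0 0 0 0 0 1 0.925",
    " 637.9  306.0  1250.1",
    " 6.78E-006",
    " 2   Iteration:  100 #Bonds:  10",
    " 1 6 7 14 54 64 70 77 0 0 0 0 1 0.885",
    " 2 6 3 6 15 55 0 0 0 0 0 0 1 0.812",
]
for out in summarize_frames(sample, sep=" "):
    print(out)
```

Because the function is a generator that consumes its input line by line, an open file object can be passed in place of the list, so a file with 200,000 frames would not have to be held in memory at once.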

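Thanks. I didn't even know there was a csv module. This is great, thank you very much. The input file is not a CSV file, but it is written by a Fortran program, so it has a uniform format. No need to worry about errors. Thanks – exsonic01

The second line gives a syntax error at the comma. How can I get around this? This is a pretty basic question, but please forgive me, I have never used these modules before. – exsonic01

Which version of Python are you using? (Type 'import sys; sys.version' in the console to show it) –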
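
For context on the syntax error at the comma: opening several files in a single `with` statement, separated by commas, is only accepted from Python 2.7 and 3.1 onward, which is presumably why the answer was changed to nested `with` blocks for Python 2.6.6. The sketch below checks the running version and then uses the nested form, which is valid on older and newer interpreters alike. The variable name `supports_multi_with` and the file names `example_in.txt` and `example_out.txt` are hypothetical.

```python
import os
import sys
import tempfile

# sys.version_info is easier to compare than the sys.version string.
version = sys.version_info[:2]
supports_multi_with = (2, 7) <= version < (3, 0) or version >= (3, 1)
print(version, supports_multi_with)

# Hypothetical scratch files in a fresh temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "example_in.txt")
dst = os.path.join(workdir, "example_out.txt")
with open(src, "w") as handle:
    handle.write("326 Iteration: 0\n")

# Nested form: one file per with statement, no comma needed.
with open(src) as in_file:
    with open(dst, "w") as out_file:
        out_file.write(in_file.readline().split()[0] + "\n")

with open(dst) as handle:
    print(handle.read().strip())   # 326
```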
