2013-10-02 43 views
-4

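Python - Formatting specific data in a text file

I have a number of log files from which I need to extract and format data. Some of these log files are very large, over 10,000 lines.

Can anyone recommend a code example that would help me read a text file, remove the unwanted lines, and then edit the remaining lines into a specific format? I haven't been able to find a previous thread that covers what I'm after.

Below is an example of the data I need to edit: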

136: add student 50000000 35011/Y01T :Unknown id in field 3 - ignoring line 

137: add student 50000000 5031/Y01S :Unknown id in field 3 - ignoring line 

138: add student 50000000 881/Y01S :Unknown course idnumber in field 4 - ignoring line 

139: add student 50000000 5732/Y01S :Unknown id in field 3 - ignoring line 

134: add student 50000000 W250/Y02S :OK 

135: add student 50000000 35033/Y01T :OK 

I need to search the file and remove any line that ends with the suffix :OK. Then I need to edit the remaining lines into a CSV format such as:

add,student,50000000,1234/abcd 

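Any tips, code snippets, etc. would be a great help and I'd be very grateful. I wouldn't normally ask, but I don't have the time to teach myself Python file access and string formatting, so please forgive me for asking before trying it myself.

Answers

0

Apologies, this might be a solution: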

import sys

if len(sys.argv) != 2:
    print('Add an input file as parameter')
    sys.exit(1)

print('opening file: %s' % sys.argv[1])

with open(sys.argv[1]) as infile, open('output', 'w') as outfile:
    for line in infile:
        line = line.strip()
        # skip blank lines, lines whose status is OK, and lines without both colons
        if not line or line.endswith(':OK') or line.count(':') < 2:
            continue
        # keep the text between the "136:" prefix and the ":message" suffix
        record = line.split(':', 2)[1].split()
        # first three fields are comma-separated; the rest is the course id,
        # joined without spaces so "35011/Y01T" and "35011 / Y01T" both work
        csv_line = ','.join(record[:3] + [''.join(record[3:])])
        outfile.write(csv_line + '\n')
        # just for checking purposes let's print the lines
        print(csv_line)

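Watch out for the output file name: the script writes to a file called output in the current directory and overwrites it if it already exists.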
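One way to make the output file name less of a trap is to derive it from the input name and refuse to overwrite an existing file. This is only a minimal sketch, assuming Python 3.3 or later; `derive_output_name`, `open_new_file`, and the `.csv` suffix are hypothetical choices, not part of the answer above:

```python
import os


def derive_output_name(input_path, suffix='.csv'):
    """Return input_path with its extension replaced by suffix (hypothetical helper)."""
    base, _ext = os.path.splitext(input_path)
    return base + suffix


def open_new_file(path):
    """Open path for writing; raises FileExistsError if it already exists."""
    # mode 'x' means exclusive creation, so an existing file is never truncated
    return open(path, 'x')
```

With these helpers the script could call `open_new_file(derive_output_name(sys.argv[1]))` instead of hard-coding `'output'`.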

+0

I'll give the code a go and play with it. Many thanks for your reply. – Russ

0

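You can change the regular expression to suit your needs if your lines differ, and if you need a different delimiter you can also modify the csv.writer arguments: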

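import csv
import re

regex = re.compile(r"(\d+)\s*:\s*(\w+)\s+(\w+)\s+(\w+)\s+([\w/ ]+?):\s*(.+)")
# newline="" lets the csv module control the line endings (Python 3)
with open("out.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile, delimiter=',', quotechar='"')
    with open("log.txt") as f:
        for line in f:
            m = regex.match(line)
            # strip the status so a trailing space cannot hide an "OK"
            if m and m.group(6).strip() != "OK":
                # groups 2-5 are the CSV fields; strip the padding the pattern captures
                writer.writerow([g.strip() for g in m.groups()[1:-1]])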
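Before adapting the pattern, it can help to run it against a single line and look at what each capture group holds. The sketch below uses the same regular expression on one of the sample lines from the question. The group labels in the comment are guesses from the log messages, and the tab delimiter is only there to illustrate changing the `csv.writer` arguments:

```python
import csv
import io
import re

regex = re.compile(r"(\d+)\s*:\s*(\w+)\s+(\w+)\s+(\w+)\s+([\w/ ]+?):\s*(.+)")

line = "138: add student 50000000 881/Y01S :Unknown course idnumber in field 4 - ignoring line"
m = regex.match(line)

# groups 1-6 (labels are assumptions): line number, action, role,
# id (field 3), course idnumber (field 4), status message
fields = [g.strip() for g in m.groups()]
print(fields)

# write groups 2-5 to an in-memory buffer, tab-separated instead of comma-separated
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='\t', quotechar='"')
writer.writerow(fields[1:-1])
print(repr(buffer.getvalue()))
```

Writing to an `io.StringIO` buffer keeps the experiment away from any real output file.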
+0

Hello, thank you for your reply. These have been very helpful and I'm learning fast. Many thanks for the help. – Russ

0

Thanks for the help, guys. As a newbie, the code I ended up with isn't elegant, but it still gets the job done :).

# Open the file and create the CSV after filtering the input file.
def openFile(filename, keyword):  # the caller passes the log file name and the keyword to filter on
    matches = []

    with open(filename, 'r') as f:  # open for reading; the file is closed automatically
        for line in f:
            if keyword in line:  # keep only the lines that contain the keyword
                matches.append(line)

    print(matches)  # test

    rows = []
    for each in matches:  # filter and clean the info, then format it as CSV
        chunk = each.partition(': ')[2]  # drop the "136: " prefix
        chunk = chunk.partition(' :')[0]  # drop the " :message" suffix
        fields = chunk.split()  # split on whitespace
        # first three fields comma-separated; the remaining pieces stay together,
        # so lines with fewer than six pieces no longer raise IndexError
        rows.append(','.join(fields[:3] + [' '.join(fields[3:])]) + '\n')

    text = ''.join(rows)  # build the whole output as one string
    print(text)  # test

    with open(keyword + '.csv', 'w') as out:  # write the result to <keyword>.csv
        out.write(text)


openFile('russtest.txt', 'cat')
openFile('CRON ENROL LOG 200913.txt', 'field 4')
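The two `partition` calls above are easier to follow on a single line. Here is a short trace using a sample line from the question. The variable names are hypothetical, and the final join assumes the course id contains no spaces:

```python
line = "139: add student 50000000 5732/Y01S :Unknown id in field 3 - ignoring line\n"

# partition returns (before, separator, after), splitting at the first match
prefix, _sep, rest = line.partition(': ')
print(prefix)  # the line number

record, _sep, message = rest.partition(' :')
print(record)           # the part that becomes the CSV row
print(message.strip())  # the status text that is discarded

# assumes no spaces inside the course id, so every space is a field boundary
csv_row = ','.join(record.split())
print(csv_row)
```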

Thanks :).
