2013-01-18 34 views
1

Split lines in Python based on a character

Input:

!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1 
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000. 
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W 
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56 
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34 
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22. 

Expected output:

!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.

'!' is the starting character, and +0013 should be the end of each line (if present).

The problem is that the output I actually get looks like this:

!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/1 
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:14,000. 
0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W 

Any help would be greatly appreciated!

My code:

file_open = open('sample.txt', 'r')
file_read = file_open.read()
file_open2 = open('output.txt', 'w+')
counter = 0
for i in file_read:
    if '!' in i:
        if counter == 1:
            file_open2.write('\n')
            counter = counter - 1
        counter = counter + 1
    file_open2.write(i)
+0

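My code: file_open = open('sample.txt','r') file_read = file_open.read() file_open2 = open('output.txt','w+') counter = 0 for i in file_read: if '!' in i: if counter == 1: file_open2.write('\n') counter = counter - 1 counter = counter + 1 file_open2.write(i) – jags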

+0

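I've added your code to your question, but I'm not sure the indentation is correct. Feel free to fix it, and next time edit your question directly. – Wilduck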

Answers

1

Can't you just use str.split?

lines = file_read.split('!') 

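Now lines is a list holding the split data. These are almost the lines you want to write; the only differences are that they have no trailing newline and no '!' at the start. We can put both back with string formatting, e.g. '!{0}\n'.format(line). Then we can wrap the whole thing in a generator expression and pass it to file.writelines to write the data to a new file: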

file_open2.writelines('!{0}\n'.format(line) for line in lines) 

You may need:

file_open2.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines)

if you find that you are getting more newlines in the output than you wanted.

A couple of other points: when opening files, it's good practice to use a context manager, which makes sure the file is closed properly:

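with open('inputfile') as fin:
    # split the whole text at '!'; without the split, the loop below
    # would iterate over single characters
    lines = fin.read().split('!')
with open('outputfile','w') as fout:
    # skip empty pieces (e.g. the one before the first '!')
    fout.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines if line)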
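As a small self-contained illustration of the split-and-rejoin idea above, here is a minimal sketch. The helper name split_records and the shortened sample string are made up for this example and are not part of the original answer; it assumes the text starts with the separator.

```python
def split_records(text, sep="!"):
    """Return the records in text, each starting with sep, with line breaks removed."""
    # Remove the hard-wrapped line breaks first so each record is contiguous.
    flat = text.replace("\r", "").replace("\n", "")
    # str.split drops the separator, so put it back in front of each piece.
    # The piece before the first separator is empty and is skipped.
    return [sep + piece for piece in flat.split(sep) if piece]


# Shortened, hypothetical sample in the same shape as the question's data.
sample = "!,A,1,+0013!,A,2,+00\n13!,A,3"
records = split_records(sample)
print("\n".join(records))
# !,A,1,+0013
# !,A,2,+0013
# !,A,3
```

Filtering out empty pieces avoids writing a lone '!' line at the top of the output when the input begins with the separator.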
2

You could try something like this:

with open("abc.txt") as f:
    # remove the newlines; handles both "\r\n" and "\n" line endings
    data = f.read().replace("\r\n", "").replace("\n", "")

    # split the data at '!', then filter out empty strings
    ans = filter(None, data.split("!"))
    for x in ans:
        print("!" + x)  # or write to some other file

!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:19,000.0,0,37N22. 
0

Let's try adding a \n before every "!" and then let Python do the splitlines :-):

file_read.replace("!", "\n!").splitlines()
+0

Thanks for your answer, but this solution does not give the desired output. – jags

1

Another option is to use replace instead of split, since you know the starting and ending characters of each line:

In [14]: data = """!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1 
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000. 
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W 
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56 
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34 
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.""".replace('\n', '') 

In [15]: print(data.replace('+0013!', "+0013\n!"))
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013 
!,A,56281,12/12/19,19:34:19,000.0,0,37N22. 
+0

Thanks acjohnson, but I guess this won't be efficient for large files. – jags

+0

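Yes. You would probably want a chunk-based strategy for that, but from that standpoint I don't think the replace approach is necessarily any worse than the regex or split options. – acjay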
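One way the chunk-based strategy mentioned in the comment above might look is sketched below. The function name iter_records and the chunk_size default are assumptions made for this illustration, not something given in the thread, and the sketch assumes the input starts with the separator.

```python
import io


def iter_records(fileobj, sep="!", chunk_size=64 * 1024):
    """Yield records from fileobj, reading chunk_size characters at a time."""
    pending = ""  # tail of the data read so far that may be an incomplete record
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # Drop the hard-wrapped line breaks, then split what we have so far.
        pending += chunk.replace("\r", "").replace("\n", "")
        pieces = pending.split(sep)
        # Every piece except the last is complete; the last may continue
        # in the next chunk, so it is carried over.
        for piece in pieces[:-1]:
            if piece:
                yield sep + piece
        pending = pieces[-1]
    if pending:
        yield sep + pending


# Tiny chunk size to exercise records that straddle chunk boundaries.
sample = io.StringIO("!,A,1,+0013!,A,2,+00\n13!,A,3")
for record in iter_records(sample, chunk_size=5):
    print(record)
```

Only one chunk plus one partial record is held in memory at a time, so the same loop could be pointed at a large file opened with open() and the records written out one by one.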

1

Just for some variety, here is a regex answer:

import re 

outputFile = open('output.txt', 'w+') 
with open('sample.txt', 'r') as f: 
    for line in re.findall(r"!.+?(?=!|$)", f.read(), re.DOTALL):
        outputFile.write(line.replace("\n", "") + '\n')

outputFile.close() 

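It opens the output file, reads the contents of the input file, and loops over all the matches found using the regex !.+?(?=!|$) with the re.DOTALL flag. An explanation of the regex and what it matches can be found here: http://regex101.com/r/aK6aV4

Once we have a match, we strip the newlines from it and write it to the file.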
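To see what the pattern matches without touching any files, here is a minimal sketch on a shortened, made-up sample string. The names RECORD, sample and records are chosen for this illustration only.

```python
import re

# A '!' followed by as few characters as possible, up to (but not including)
# the next '!' or the end of the text. re.DOTALL lets '.' also match the
# line breaks that fall in the middle of a record.
RECORD = re.compile(r"!.+?(?=!|$)", re.DOTALL)

sample = "!,A,1,+0013!,A,2,+00\n13!,A,3"

# finditer yields one match object per record; remove the embedded
# line breaks from each match.
records = [m.group(0).replace("\n", "") for m in RECORD.finditer(sample)]
print(records)
# ['!,A,1,+0013', '!,A,2,+0013', '!,A,3']
```

Note that the whole text is still held in memory here, just as with f.read() in the answer.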

+0

Thanks Trevor. Your solution works best. – jags

0

I would actually implement this as a generator, so that you can work on the data stream rather than the entire contents of the file. This is quite memory-friendly when working with large files.

>>> def split_on_stream(it, sep="!"):
        prev = ""
        for line in it:
            line = (prev + line.strip()).split(sep)
            for parts in line[:-1]:
                yield parts
            prev = line[-1]
        yield prev


>>> with open("test.txt") as fin:
        for parts in split_on_stream(fin):
            print(parts)



,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013 
,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013 
,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013 
,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013 
,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013 
,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013 
,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013 
,A,56281,12/12/19,19:34:19,000.0,0,37N22. 
+0

Thanks Abhijit, but this output is clearly not what was expected! – jags