2017-02-19 94 views
-1

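I have a log file from which I want to remove some specific parts. How can I delete a substring that contains fixed text but whose inner content varies in length? Part of the log file is shown below: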

I0216 10:18:04.720626 31559 solver.cpp:273] Solving 
I0216 10:18:04.720630 31559 solver.cpp:274] Learning Rate Policy: step 
I0216 10:18:05.242708 31559 solver.cpp:219] Iteration 0 (0 iter/s, 0.522037s/50 iters), loss = 1.60944 
I0216 10:18:05.242750 31559 solver.cpp:238]  Train net output #0: accuracy = 0 
I0216 10:18:05.242763 31559 solver.cpp:238]  Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss) 
I0216 10:18:05.242785 31559 sgd_solver.cpp:105] Iteration 0, lr = 1e-10 
I0216 10:18:22.386440 31559 solver.cpp:219] Iteration 50 (2.91648 iter/s, 17.144s/50 iters), loss = 1.60944 
I0216 10:18:22.386497 31559 solver.cpp:238]  Train net output #0: accuracy = 0.643982 
I0216 10:18:22.386509 31559 solver.cpp:238]  Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss) 
I0216 10:18:22.386515 31559 sgd_solver.cpp:105] Iteration 50, lr = 1e-10 
I0216 10:18:39.549926 31559 solver.cpp:219] Iteration 100 (2.91313 iter/s, 17.1637s/50 iters), loss = 1.60944 
I0216 10:18:39.550071 31559 solver.cpp:238]  Train net output #0: accuracy = 1 
I0216 10:18:39.550087 31559 solver.cpp:238]  Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss) 
I0216 10:18:39.550093 31559 sgd_solver.cpp:105] Iteration 100, lr = 1e-10 
I0216 10:18:56.714752 31559 solver.cpp:219] Iteration 150 (2.91292 iter/s, 17.1649s/50 iters), loss = 1.60944 
I0216 10:18:56.714824 31559 solver.cpp:238]  Train net output #0: accuracy = 0.624222 
I0216 10:18:56.714838 31559 solver.cpp:238]  Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss) 
I0216 10:18:56.714845 31559 sgd_solver.cpp:105] Iteration 150, lr = 1e-10 
I0216 10:19:13.893241 31559 solver.cpp:219] Iteration 200 (2.91059 iter/s, 17.1787s/50 iters), loss = 1.60944 
I0216 10:19:13.893450 31559 solver.cpp:238]  Train net output #0: accuracy = 1 
I0216 10:19:13.893467 31559 solver.cpp:238]  Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss) 
I0216 10:19:13.893473 31559 sgd_solver.cpp:105] Iteration 200, lr = 1e-10 
I0216 10:19:31.094591 31559 solver.cpp:219] Iteration 250 (2.90674 iter/s, 17.2014s/50 iters), loss = 1.60944 
I0216 10:19:31.094650 31559 solver.cpp:238]  Train net output #0: accuracy = 0.61937 
I0216 10:19:31.094662 31559 solver.cpp:238]  Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss) 
I0216 10:19:31.094667 31559 sgd_solver.cpp:105] Iteration 250, lr = 1e-10 
I0216 10:19:48.290045 31559 solver.cpp:219] Iteration 300 (2.90772 iter/s, 17.1956s/50 iters), loss = 1.60944 
I0216 10:19:48.290187 31559 solver.cpp:238]  Train net output #0: accuracy = 0.959229 
I0216 10:19:48.290205 31559 solver.cpp:238]  Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss) 
I0216 10:19:48.290210 31559 sgd_solver.cpp:105] Iteration 300, lr = 1e-10 
I0216 10:20:05.504201 31559 solver.cpp:219] Iteration 350 (2.90457 iter/s, 17.2142s/50 iters), loss = 1.60944 
I0216 10:20:05.504257 31559 solver.cpp:238]  Train net output #0: accuracy = 0.772217 
I0216 10:20:05.504271 31559 solver.cpp:238]  Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss) 

As you can see, some of the lines contain 31559 solver.cpp:219] Iteration.

I want to change only these lines and leave every other line in the file untouched. For example, from:

... solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 

to:

... solver.cpp:219] Iteration 14750, loss = 1.60934 
. 
. 
. 

That is, I want to delete the substring (2.9004 iter/s, 17.239s/50 iters) from the lines that contain it, while all other lines stay unchanged. Thanks.

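I want to remove that part from every line containing something like (2.8995 iter/s, 17.2444s/50 iters); the length of this string may differ from line to line. It starts with (, continues with a number (which can differ from line to line), then iter/s, then another number, and ends with iters).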

As @delca85 suggested, the pattern looks like this:

p = "(\(\d*[.]?\d* iter/s\,\s\d*[.]?\d*)(s/[0-9]+)?(\siters\))" 

Does anyone have a suggestion? Thanks in advance.

Answers

1

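I have made an additional assumption about the second part of your string: that it has the form s/number. I hope I am not wrong; if I am, please tell me and I will be happy to find another solution for you.

Here is my suggestion: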

import re 

# Sample text: two log entries joined into one string
string = "I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238]  Train net output #0: accuracy = 1 " 

# Raw string so the backslashes reach the regex engine unchanged
p = r"\(\d*[.]?\d* iter/s\, \d*[.]?\d*s/[0-9]+ iters\)" 
pattern = re.compile(p) 
for l in pattern.findall(string): 
    print(l) 

I hope this helps!

s/50 optional
Here is a solution you can use when the s/50 part in the second half of the string is optional:

import re 

# Two sample entries: the first has "s/50", the second does not
string = "I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238]  Train net output #0: accuracy = 1 " 
string = string + "I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238]  Train net output #0: accuracy = 1 " 
p = r"(\(\d*[.]?\d* iter/s\,\s\d*[.]?\d*)(s/[0-9]+)?(\siters\))" 
pattern = re.compile(p) 
# findall returns a tuple of the three groups per match; join them to get the full text
for l in pattern.findall(string): 
    print(''.join(l)) 
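
As a possible simplification of the optional s/50 pattern above, the sketch below turns the optional part into a non-capturing group, so findall returns each full match as a single string and no join is needed. The name RATE and the sample lines are assumptions made for illustration; they are not taken from the original log.

```python
import re

# Same shape as the pattern above, but the optional "s/<number>" part is a
# non-capturing group, so findall() yields each full match as one string.
RATE = re.compile(r"\(\d*\.?\d* iter/s,\s\d*\.?\d*(?:s/\d+)?\siters\)")

# Hypothetical sample lines: one with "s/50" and one without it.
sample = (
    "solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934\n"
    "solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239 iters), loss = 1.60934\n"
)

matches = RATE.findall(sample)
for m in matches:
    print(m)
```

With no capturing groups in the pattern, findall returns a flat list of strings instead of a list of tuples.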

Open the file, read the lines, match the pattern, and replace the lines in the file

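import fileinput 
import re 
import sys 

p = r"(\(\d*[.]?\d* iter/s\,\s\d*[.]?\d*)(s/[0-9]+)?(\siters\))" 
pattern = re.compile(p) 
# inplace=1 redirects stdout into file.txt, so every line written replaces the original
for line in fileinput.input("file.txt", inplace=1): 
    for m in pattern.findall(line): 
        # findall returns a tuple of groups per match; join them to rebuild the matched text
        string = ''.join(m) 
        if string in line: 
            line = line.replace(string, "") 
    sys.stdout.write(line) 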
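
An alternative to the in-place loop above, offered as a minimal sketch rather than a definitive fix: re.sub can remove the match directly, and allowing optional whitespace before the opening parenthesis produces exactly the "Iteration 14750, loss" form asked for in the question. The function names strip_rate and strip_rate_in_file and the choice to write to a separate output file are assumptions made for this example.

```python
import re

# Optional leading whitespace, "(", a number, " iter/s, ", a number,
# an optional "s/<number>", " iters", ")".
RATE = re.compile(r"\s*\(\d*\.?\d* iter/s,\s\d*\.?\d*(?:s/\d+)?\siters\)")


def strip_rate(line):
    """Return the line with the "(... iter/s, ... iters)" part removed."""
    return RATE.sub("", line)


def strip_rate_in_file(src_path, dst_path):
    """Copy src_path to dst_path, cleaning every line on the way."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(strip_rate(line))
```

Because the pattern describes the shape of the text and not fixed numbers, it matches whatever the iteration rate and timing happen to be. Writing to a second file keeps the original log intact in case the pattern removes more than intended.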
+0

Thanks for your reply. How do I open the file, find 'p = "\([0-9\.]+ iter/s\, [0-9\.]+s/[0-9]+ iters\)"', and delete that string from the file? Shouldn't the program read the file? Thanks –

+0

@S.EB I have added the part that replaces the lines in the file. I hope this finally helps you, and that you will accept and upvote my answer. – delca85

+0

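Thanks for your answer. Unfortunately it does not work, because the 'string' you use is not the same in every line. The iteration number keeps changing –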

0

You can use the regular expression module for this (called 're'), which helps you quickly isolate substrings.

Here is the code:

import re 

# Read the whole log file into one string; "with" closes the file afterwards
with open('your_file_with_correct_path') as file: 
    file_content = file.read() 

#The string you provided 
#No need to do the below string definition as you will use the file_content 
#str = ' I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238] Train net output #0: accuracy = 1' 

# List of every substring that starts with "(" plus a digit and runs to the last ")" on its line
sub_string = re.findall(r'\(\d+.*\)', file_content) 

for element in sub_string: 
    #add element to the file you want 
    pass  # placeholder so the loop body is valid 

#save the file where you added the elements 

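sub_string will be a list of all the substrings that match the pattern passed as the first argument to the findall method.

I recommend looking into the various special characters used in regex, since they are very useful for cleaning up strings in general.

Thanks.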
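
To illustrate a few of those special characters, here is a hedged sketch that spells out one possible pattern for the (... iter/s, ... iters) substring piece by piece using re.VERBOSE. The name RATE and the exact number syntax are assumptions; the sample line is copied from the log in the question with its timestamp prefix left off.

```python
import re

RATE = re.compile(
    r"""
    \(             # a literal opening parenthesis
    \d+(?:\.\d+)?  # a number such as 0 or 2.9004
    [ ]iter/s,     # the fixed text " iter/s,"
    \s             # one whitespace character
    \d+(?:\.\d+)?  # a number such as 17.239
    (?:s/\d+)?     # optionally "s/" followed by digits, e.g. s/50
    [ ]iters       # the fixed text " iters"
    \)             # a literal closing parenthesis
    """,
    re.VERBOSE,
)

line = "solver.cpp:219] Iteration 300 (2.90772 iter/s, 17.1956s/50 iters), loss = 1.60944"
match = RATE.search(line)
found = match.group(0) if match else None
print(found)
```

In verbose mode, whitespace inside the pattern is ignored, which is why the literal spaces are written as [ ].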

+0

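Thanks for your response. This 'str' is only one line of the log file. How can the program be changed to read one line at a time, check whether the line contains something like '(2.9004 iter/s, 17.239s/50 iters)', and if so remove that part from the line and save it? –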

+0

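You can read your whole log file, so in your case str would be str = log_file.read(). Then you can create the sub_string variable I added in the code above. That gives you a list of everything in your log file that matches the pattern (for example, your (... iters)). To save it, you just loop over the sub_string list and add each element to the document you want. – RobinW2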

+0

@S.EB I have edited my reply so you can see the whole process behind my earlier comment; that should help you get what you need. – RobinW2