Formatting a text file as CSV using a bash script
I have a file (exOut.txt) consisting of thousands of lines of text in the following format:
[CV] solver=newton-cg, penalty=l2, multi_class=ovr, max_iter=187.637633813, C=0.778324314482
[CV] solver=newton-cg, penalty=l2, multi_class=ovr, max_iter=187.637633813, C=0.778324314482
[CV] solver=newton-cg, penalty=l2, multi_class=ovr, max_iter=187.637633813, C=0.778324314482
[CV] solver=sag, penalty=l2, multi_class=multinomial, max_iter=187.637633813, C=0.31181629405
[CV] solver=sag, penalty=l2, multi_class=multinomial, max_iter=187.637633813, C=0.31181629405
[CV] solver=sag, penalty=l2, multi_class=multinomial, max_iter=187.637633813, C=0.31181629405
[CV] solver=sag, penalty=l2, multi_class=multinomial, max_iter=187.637633813, C=0.31181629405, score=0.497312, total=11.0min
[CV] solver=sag, penalty=l2, multi_class=multinomial, max_iter=187.637633813, C=0.31181629405, score=0.499232, total=11.0min
[Parallel(n_jobs=-2)]: Done 2 out of 6 | elapsed: 11.0min remaining: 22.0min
[CV] solver=sag, penalty=l2, multi_class=multinomial, max_iter=187.637633813, C=0.31181629405, score=0.499762, total=11.1min
[Parallel(n_jobs=-2)]: Done 3 out of 6 | elapsed: 11.1min remaining: 11.1min
[CV] solver=newton-cg, penalty=l2, multi_class=ovr, max_iter=187.637633813, C=0.778324314482, score=0.449309, total=19.6min
[Parallel(n_jobs=-2)]: Done 4 out of 6 | elapsed: 19.6min remaining: 9.8min
[CV] solver=newton-cg, penalty=l2, multi_class=ovr, max_iter=187.637633813, C=0.778324314482, score=0.449831, total=19.7min
[CV] solver=newton-cg, penalty=l2, multi_class=ovr, max_iter=187.637633813, C=0.778324314482, score=0.451609, total=19.7min
[Parallel(n_jobs=-2)]: Done 6 out of 6 | elapsed: 19.7min remaining: 0.0s
[Parallel(n_jobs=-2)]: Done 6 out of 6 | elapsed: 19.7min finished
...
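I want to write a shell script that takes this file and reformats it, creating a new file in CSV format that records only the lines that have a "score" attribute. It should look like this: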
solver,penalty,multi_class,max_iter,C,score
sag,l2,multinomial,187.638,0.312,0.497
sag,l2,multinomial,187.638,0.312,0.499
sag,l2,multinomial,187.638,0.312,0.500
newton-cg,l2,ovr,187.638,0.778,0.449
newton-cg,l2,ovr,187.638,0.778,0.450
newton-cg,l2,ovr,187.638,0.778,0.452
If possible, all values should be rounded to the nearest thousandth.
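A minimal sketch of this first step, assuming every record of interest starts with "[CV]", holds comma-separated key=value pairs, and always carries the six keys shown in the header. The function name to_csv is hypothetical, and printf's "%.3f" is what does the rounding to three decimals.

```bash
#!/usr/bin/env bash
# to_csv (hypothetical name): reads the log on stdin, writes CSV on stdout.
# Usage: to_csv < exOut.txt > firstCSV.csv
to_csv() {
  awk '
    BEGIN { print "solver,penalty,multi_class,max_iter,C,score" }
    /\[CV\]/ && /score=/ {
      sub(/^ *\[CV\] */, "")          # drop the "[CV] " prefix
      n = split($0, pair, /, */)      # one key=value item per element
      split("", val)                  # clear values left from the previous line
      for (i = 1; i <= n; i++) {
        eq = index(pair[i], "=")
        if (eq) val[substr(pair[i], 1, eq - 1)] = substr(pair[i], eq + 1)
      }
      # %.3f rounds the numeric fields to the nearest thousandth
      printf "%s,%s,%s,%.3f,%.3f,%.3f\n",
             val["solver"], val["penalty"], val["multi_class"],
             val["max_iter"], val["C"], val["score"]
    }'
}
```

Looking values up by key name rather than by position means the sketch does not depend on the order of the pairs within a line, and extra pairs such as total= are simply ignored.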
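Ultimately I would like to take this CSV and make a condensed version: identify the records that match on every field except "score" and replace them with a single record giving the average score for those parameters. For example: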
solver,penalty,multi_class,max_iter,C,avg_score
sag,l2,multinomial,187.638,0.312,0.499
newton-cg,l2,ovr,187.638,0.778,0.450
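The condensing step could be sketched along these lines, assuming the input is the six-column CSV described above with a header on its first line. The function name condense_csv is hypothetical. Note that this averages the already-rounded scores, which can differ slightly from averaging the raw scores in the log.

```bash
#!/usr/bin/env bash
# condense_csv (hypothetical name): reads the six-column CSV on stdin and
# writes one row per distinct parameter combination with the mean score.
# Usage: condense_csv < firstCSV.csv > condensed.csv
condense_csv() {
  awk -F, '
    NR == 1 { print "solver,penalty,multi_class,max_iter,C,avg_score"; next }
    {
      key = $1 FS $2 FS $3 FS $4 FS $5         # every field except score
      if (!(key in count)) order[++n] = key    # remember first-seen order
      sum[key] += $6
      count[key]++
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s,%.3f\n", order[i], sum[order[i]] / count[order[i]]
    }'
}
```

Keeping a separate order array is a design choice: iterating an awk array with "for (k in a)" returns keys in an unspecified order, so the sketch records the order in which each combination first appears.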
Any help is appreciated! I am not an expert in regular expressions, which is mainly why I am asking.
Edit 1: Thanks for the feedback. Here is some more information:
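So far I have tried various scripts using grep, awk, and sed, including grep '=.*,' exOut.txt, which only matches one large occurrence of the pattern rather than the individual fields, and sed 's/^[^\=]*\=//g' exOutput.txt > firstCSV.csv, which only cleans up the first part of each line.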
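For context on why those two attempts fall short: in grep '=.*,' the ".*" is greedy, so it spans from the first "=" to the last "," as a single match (and grep prints the whole line regardless), while the sed expression is anchored with "^", so it can only ever strip the text up to the first "=" even with the g flag. A sketch of a per-field variant follows, assuming a sed that supports -E (GNU and BSD sed both do); the function name strip_keys is hypothetical, and this version leaves the numbers unrounded.

```bash
#!/usr/bin/env bash
# strip_keys (hypothetical name): keeps only scored lines and removes every
# "key=" label so that just the comma-separated values remain.
# Usage: strip_keys < exOut.txt
strip_keys() {
  grep 'score=' |
    sed -E 's/^\[CV\] *//; s/, *total=.*$//; s/[A-Za-z_]+=//g; s/, */,/g'
}
```

The third substitution is unanchored, which is what lets the g flag remove the label from every field rather than only the first one.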
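Welcome to Stack Overflow. This is not a code-writing service where you post your requirements and language of choice and someone writes the code for you. We are more than happy to help, but we expect you to make an effort to solve the problem yourself first and to include that effort in your question. Please [edit] your question to show the code you have tried before asking here. See [ask] if you need more information. –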
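This should be simple with 'awk'. Give it a try, and if you still cannot get it working, post your code and an example of the output it gives you. (And 'awk' is well worth learning.) – Jack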
If you want a bash script, why is this tagged python? The file is generated by a Python program that uses scikit-learn. –