I have a non-trivial task: extracting some relevant data from a large CSV log using bash text-processing utilities. The log looks like this:
Frame #,Residue,Internal,van der Waals,Electrostatic,Polar Solvation,Non-Polar Solv.,TOTAL
1,1,119.745,0.356,-132.009,-95.618,1.7886312,-105.7373688
1,2,106.093,-3.835,-182.473,40.582,0.7132608,-38.9197392
1,3,21.228,-1.744,-38.026,-7.707,1.1189664,-25.1300336
1,4,-5.717,-4.721,-30.38,-4.839,0.406512,-45.250488
1,5,70.846,-4.127,-53.317,-2.534,0.7808472,11.6488472
...
2,1,119.745,0.356,-132.009,-95.618,1.7886312,-105.7373688
2,2,106.093,-3.835,-182.473,40.582,0.7132608,-38.9197392
2,3,21.228,-1.744,-38.026,-7.707,1.1189664,-25.1300336
2,4,-5.717,-4.721,-30.38,-4.839,0.406512,-45.250488
2,5,70.846,-4.127,-53.317,-2.534,0.7808472,11.6488472
...
n,1,119.745,0.356,-132.009,-95.618,1.7886312,-105.7373688
n,2,106.093,-3.835,-182.473,40.582,0.7132608,-38.9197392
n,3,21.228,-1.744,-38.026,-7.707,1.1189664,-25.1300336
n,4,-5.717,-4.721,-30.38,-4.839,0.406512,-45.250488
n,5,70.846,-4.127,-53.317,-2.534,0.7808472,11.6488472
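Here I would like to pick a specified value in the 2nd column (#residue) and write out how its last column (#total energy) evolves as a function of the 1st column (#frame number, i.e. the snapshot number). In other words, I need to 1) filter all the data by the 2nd column, i.e. select every line whose 2nd column equals a specified value (e.g. 27):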
#Frame, #Residue
1,27, ... , # last-column value that I am interested in
2,27, ... , # last-column value that I am interested in
3,27, ... , # last-column value that I am interested in
3,27, ... , # last-column value that I am interested in
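2) for each of those lines, extract the last-column value that goes with the number in the second column, so the resulting log will have only 3 columns: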
#Frame, #Residue, #Total energy
1,27, # last-column value that I am interested in
2,27, # last-column value that I am interested in
3,27, # last-column value that I am interested in
3,27, # last-column value that I am interested in
I would appreciate any implementation using AWK and sed!
Thanks!
Gleb
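One possible way to do both steps in a single pass is an awk filter on the 2nd field that prints the first, second, and last fields. This is only a minimal sketch: the function name `extract_residue` and the temporary sample file are hypothetical, and the sample rows reuse values from the log above with frame and residue numbers changed purely for illustration.

```shell
# Hypothetical sample input; the real log would be the large CSV file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Frame #,Residue,Internal,van der Waals,Electrostatic,Polar Solvation,Non-Polar Solv.,TOTAL
1,27,119.745,0.356,-132.009,-95.618,1.7886312,-105.7373688
1,270,106.093,-3.835,-182.473,40.582,0.7132608,-38.9197392
2,27,21.228,-1.744,-38.026,-7.707,1.1189664,-25.1300336
2,5,70.846,-4.127,-53.317,-2.534,0.7808472,11.6488472
EOF

# extract_residue RESIDUE FILE   (hypothetical helper name)
# -F,        split fields on commas
# NR > 1     skip the header line
# $2 == res  compare the whole 2nd field, so 270 does not match 27
# $NF        the last field (TOTAL)
extract_residue() {
    awk -F, -v res="$1" 'NR > 1 && $2 == res { print $1 "," $2 "," $NF }' "$2"
}

result=$(extract_residue 27 "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Because awk compares the complete field rather than a substring, no extra anchoring is needed to keep residue 270 out of the result for residue 27.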
You could add a comma after "27", otherwise it can match larger numbers such as 270, 271, 271337 ...: `grep '^[^,]\+,27,' input.csv | cut -d, -f1,2,8` –
`\+` is undefined in POSIX basic regular expressions, so you are relying on a grep that treats `\+` as "1 or more". That said, it should be `*` instead. – geirha
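To illustrate the portability point, here is a small sketch that replaces `\+` with the interval expression `\{1,\}`, which POSIX basic regular expressions do define as "one or more". The three input lines are made-up placeholders, not real data from the log.

```shell
# Made-up sample lines: two for residue 27, one for residue 270.
out=$(printf '%s\n' \
        '1,27,a,b,c,d,e,-1.5' \
        '1,270,a,b,c,d,e,-2.5' \
        '2,27,a,b,c,d,e,-3.5' |
      # ^[^,]\{1,\}  one or more non-comma characters at line start (frame number)
      # ,27,         the residue field, delimited by commas on both sides
      grep '^[^,]\{1,\},27,' |
      # keep columns 1, 2 and 8 (frame, residue, total)
      cut -d, -f1,2,8)
printf '%s\n' "$out"
```

Using `[^,]*` as suggested in the comment would also work here; the only difference is that it would additionally accept a line whose first field is empty.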
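thx! One question: what should I add to the script so that it stops extracting lines after the i-th match in the initial data.csv? E.g. to extract only n lines with this command. – user3470313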
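One way this could be done is to count the matches inside awk and exit once the limit is reached, so the rest of the file is never read. This is a sketch under assumptions: the variable names `res`, `max`, and `count` are chosen here for illustration, and the input lines are made-up placeholders.

```shell
# res = residue to keep, max = number of matching lines to extract.
out=$(awk -F, -v res=27 -v max=2 '
    $2 == res {
        print $1 "," $2 "," $NF        # frame, residue, last column
        if (++count >= max) exit       # stop reading after max matches
    }
' <<'EOF'
1,27,a,b,c,d,e,-1.5
1,28,a,b,c,d,e,-9.9
2,27,a,b,c,d,e,-3.5
3,27,a,b,c,d,e,-4.5
EOF
)
printf '%s\n' "$out"
```

With the grep pipeline from the earlier comment, appending `| head -n 2` would give the same first two lines, although grep may keep scanning the input until it next tries to write.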