2015-01-20 80 views
0

Remove lines from a CSV file based on a column condition in bash

I have a large CSV file (5 GB). The header is:

run number,export,downerQ,coefUpQuality,chooseMode,demandF,nbPLots,standarDevPop,nbCitys,whatWord,priceMaxWineF,marketColor,[step],giniIndexReserve,giniIndexPatch,meanQualityTotal,meanQualityMountain,meanQualityPlain,DiffExtCentral,nbcentralPlots,meanPatchByNetwork,sum_q_viti_moutain,sum_q_viti_plaine 
"3","false","0.5","0.01","false","7000","10","2","10","0","70","false","0","0","0.07083333333333335","0","0","0","0","0","0","48","0" 
"4","false","0.5","0.01","false","7000","10","2","10","0","70","false","0","0","0.04285714285714286","0","0","0","0","0","0","42","0" 
"2","false","0.5","0.01","false","7000","10","2","10","0","70","false","0","0","0.05348837209302328","0","0","0","0","0","0","43","0" 

I want to keep only the rows whose [step] field (the thirteenth field) contains "500".

  • I tried importing this CSV into SQLite ... but the delete crashed...
  • R also crashes (even with fread from data.table)

Does anyone have a solution using a tool such as sed, awk, or another command?

+2

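Look at [csvfix](https://code.google.com/p/csvfix/). It can certainly do this. In the shell, a first step might be `grep -E '^run number|,"500",'` to select the header line and the lines that contain 500 somewhere; you can then use `awk` to narrow that down to 500 in column 13. Or you can do the whole job in awk: `awk -F, 'NR == 1 || $13 == "\"500\"" {print}'` (untested; you might need to set `OFS` to a comma, but probably not). – 2015-01-20 20:44:28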
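A rough sketch of the two-stage idea from the comment above: grep first discards every line that has no `"500"` field at all, then awk checks column 13 exactly. The file names (`prefilter_sample.csv`, `narrowed.csv`) and the shortened column names are made up for illustration; only `run number`, `demandF`, and `[step]` come from the question. It assumes no quoted field contains a comma.

```shell
# Hypothetical 14-column sample; [step] is field 13, as in the question.
cat > prefilter_sample.csv <<'EOF'
run number,c2,c3,c4,c5,demandF,c7,c8,c9,c10,c11,c12,[step],giniIndexReserve
"3","false","0.5","0.01","false","7000","10","2","10","0","70","false","0","0"
"4","false","0.5","0.01","false","500","10","2","10","0","70","false","0","0"
"2","false","0.5","0.01","false","7000","10","2","10","0","70","false","500","0"
EOF

# Stage 1: keep the header line plus any line with a "500" field somewhere.
# Stage 2: keep the header (first line of the stream) and rows where field 13 is "500".
grep -E '^run number|,"500",' prefilter_sample.csv \
  | awk -F, 'NR == 1 || $13 == "\"500\""' > narrowed.csv

cat narrowed.csv
```

The second row survives grep because its sixth field is `"500"`, but awk drops it; only the header and the last row should remain. Note that the grep pattern requires a comma after `"500"`, so it would miss a match in the very last column.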

Answers

4

awk seems to be the way to go:

awk -F, 'NR == 1 || $13 == "\"500\""' filename 

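Here `NR == 1` preserves the first line (the header); after that, only the lines whose 13th field is `"500"` (quotes included) are printed.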
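To try the command on something small before running it on the full file, here is a minimal sketch. The file names (`sample.csv`, `filtered.csv`) and the shortened column names are hypothetical. It assumes no quoted field contains a comma, since `-F,` splits on every comma.

```shell
# Hypothetical 14-column sample; [step] is field 13, as in the question.
cat > sample.csv <<'EOF'
run number,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,[step],giniIndexReserve
"3","false","0.5","0.01","false","7000","10","2","10","0","70","false","0","0"
"4","false","0.5","0.01","false","7000","10","2","10","0","70","false","500","0.2"
"2","false","0.5","0.01","false","7000","10","2","10","0","70","false","1500","0.3"
"5","false","0.5","0.01","false","7000","10","2","10","0","70","false","500","0.1"
EOF

# Keep the header, plus rows whose 13th field is exactly "500" (with its quotes).
# A pattern with no action prints the matching line, so {print} is not needed.
awk -F, 'NR == 1 || $13 == "\"500\""' sample.csv > filtered.csv

cat filtered.csv
```

Because the comparison is an exact string match, the row with `"1500"` is dropped. Writing to a new file rather than editing in place means awk streams the input line by line, which is what makes this workable on a multi-gigabyte file.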

+0

Thank you Wintermute and Jonathan... and don't forget that `-F,` is short for `--field-separator` – delaye 2015-01-20 21:25:49