2014-02-13

I use the script below to process all the CSV files with these conditions (extracting specific columns from the CSV files):

  1. Delete the first (header) line (I do this by running grep -v "words" over the combined files in csvFolder)
  2. Extract columns 18-21
  3. Output to folder/test.csv
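For comparison, here is a minimal Python sketch of the three steps above. Python is swapped in purely for illustration; it is not what the question or the answers use. It assumes, as the OP's grep -v "words" does, that the header lines are the only lines containing "words", and it reads each file line by line, so memory use stays flat regardless of file size. The function name extract_columns and its parameters are hypothetical.

```python
import csv


def extract_columns(in_paths, out_path, skip_word="words", first=18, last=21):
    """Append 1-based columns first..last of every data row to out_path.

    Lines containing skip_word (the header rows in the sample) are dropped,
    mirroring grep -v "words". Files are streamed line by line.
    """
    with open(out_path, "a", newline="") as out:
        writer = csv.writer(out)
        for path in in_paths:
            with open(path, newline="") as f:
                # Filter raw lines first, then let the csv module parse the quotes.
                data_lines = (line for line in f if skip_word not in line)
                for row in csv.reader(data_lines):
                    writer.writerow(row[first - 1:last])  # columns 18-21 -> indices 17..20
```

Usage would be something like extract_columns(sorted(glob.glob("csvFolder/*.csv")), "folder/test.csv") after import glob. Unlike the batch version, the csv module strips the surrounding quotes while parsing and does not add them back for plain numeric values.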

It takes a very long time to run when the CSV files are huge. Please tell me how to get better performance.

I'm fairly new to writing bat files, so please explain as much as you can. Thanks!!

This is the batch script I use:

for /f "tokens=18-21 delims=," %%a in ('cat csvFolder/*.csv') do echo %%a,%%b,%%c,%%d | grep -v "words" >> folder/test.csv 

Here is a sample CSV file:

"(PDH-CSV 4.0) (GMT words g)(0)","\\s-s\s(c)\% t g","\\s-s\s(I)\% t g","\\s-s\s(Iace)\% t g","\\s-s\s(Nr)\% t g","\\s-s\s(Rface)\% t g","\\s-s\s(c)\y u","\\s-s\s(I)\y u","\\s-s\s(Iace)\y u","\\s-s\s(Nr)\y u","\\s-s\s(Rface)\y u","\\s-s\s(c)\p Bytes","\\s-s\s(I)\p Bytes","\\s-s\s(Iace)\p Bytes","\\s-s\s(Nr)\p Bytes","\\s-s\s(Rface)\p Bytes","\\s-s\s(c)\q Set","\\s-s\s(I)\q Set","\\s-s\s(Iace)\q Set","\\s-s\s(Nr)\q Set","\\s-s\s(Rface)\q Set","\\s-s\Memory\% Committed Bytes In Use","\\s-s\Memory\Available MBytes","\\s-s\t(0)\% j g","\\s-s\t(1)\% j g","\\s-s\t(2)\% j g","\\s-s\t(3)\% j g","\\s-s\t(4)\% j g","\\s-s\t(5)\% j g","\\s-s\t(6)\% j g","\\s-s\t(7)\% j g","\\s-s\t(8)\% j g","\\s-s\t(9)\% j g","\\s-s\t(10)\% j g","\\s-s\t(11)\% j g","\\s-s\t(12)\% j g","\\s-s\t(13)\% j g","\\s-s\t(14)\% j g","\\s-s\t(15)\% j g","\\s-s\t(16)\% j g","\\s-s\t(17)\% j g","\\s-s\t(18)\% j g","\\s-s\t(19)\% j g","\\s-s\t(20)\% j g","\\s-s\t(21)\% j g","\\s-s\t(22)\% j g","\\s-s\t(23)\% j g","\\s-s\t(24)\% j g","\\s-s\t(25)\% j g","\\s-s\t(26)\% j g","\\s-s\t(27)\% j g","\\s-s\t(28)\% j g","\\s-s\t(29)\% j g","\\s-s\t(30)\% j g","\\s-s\t(31)\% j g","\\s-s\t(32)\% j g","\\s-s\t(33)\% j g","\\s-s\t(34)\% j g","\\s-s\t(35)\% j g","\\s-s\t(36)\% j g","\\s-s\t(37)\% j g","\\s-s\t(38)\% j g","\\s-s\t(39)\% j g","\\s-s\t(40)\% j g","\\s-s\t(41)\% j g","\\s-s\t(42)\% j g","\\s-s\t(43)\% j g","\\s-s\t(44)\% j g","\\s-s\t(45)\% j g","\\s-s\t(46)\% j g","\\s-s\t(47)\% j g","\\s-s\t(_Total)\% j g","\\s-s\t(0)\% t g","\\s-s\t(1)\% t g","\\s-s\t(2)\% t g","\\s-s\t(3)\% t g","\\s-s\t(4)\% t g","\\s-s\t(5)\% t g","\\s-s\t(6)\% t g","\\s-s\t(7)\% t g","\\s-s\t(8)\% t g","\\s-s\t(9)\% t g","\\s-s\t(10)\% t g","\\s-s\t(11)\% t g","\\s-s\t(12)\% t g","\\s-s\t(13)\% t g","\\s-s\t(14)\% t g","\\s-s\t(15)\% t g","\\s-s\t(16)\% t g","\\s-s\t(17)\% t g","\\s-s\t(18)\% t g","\\s-s\t(19)\% t g","\\s-s\t(20)\% t g","\\s-s\t(21)\% t g","\\s-s\t(22)\% t g","\\s-s\t(23)\% t g","\\s-s\t(24)\% t g","\\s-s\t(25)\% t g","\\s-s\t(26)\% t g","\\s-s\t(27)\% t g","\\s-s\t(28)\% t g","\\s-s\t(29)\% t g","\\s-s\t(30)\% t g","\\s-s\t(31)\% t g","\\s-s\t(32)\% t g","\\s-s\t(33)\% t g","\\s-s\t(34)\% t g","\\s-s\t(35)\% t g","\\s-s\t(36)\% t g","\\s-s\t(37)\% t g","\\s-s\t(38)\% t g","\\s-s\t(39)\% t g","\\s-s\t(40)\% t g","\\s-s\t(41)\% t g","\\s-s\t(42)\% t g","\\s-s\t(43)\% t g","\\s-s\t(44)\% t g","\\s-s\t(45)\% t g","\\s-s\t(46)\% t g","\\s-s\t(47)\% t g","\\s-s\t(_TL)\% t g"
"02/04/2014 02:25:19.850","0","0","173.29978448754693","0","0","122","3357","5634","3279","2933","51122176","1887068160","377069568","1403805696","141160448","7","1668734976","282546176","641404928","98045952","43.227033512749721","14578","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","100","100","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","100","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","100","100","100","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","49.454183237426584" 
"02/04/2014 02:25:20.839","0","0","115.12529196882903","0","6.3082351763741924","122","3357","5634","3279","2933","51122176","1887068160","377069568","1403805696","141160448","7","1668734976","282546176","641404928","98045952","43.226920632400869","14578","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1.5770587940935481","0","0","0","1.5770587940935481","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0.065710361866962802","0","2.2223547662000076","0","0","0","0","0","0","0","0","0","0","8.5305899425741956","0","0","0","0","0","0.64529597210645218","5.3764723543870963","5.3764723543870963","0","0","0","10.107648736667752","19.57000150122904","32.18647185397743","19.57000150122904","13.26176632485484","17.992942707135484","0","8.5305899425741956","2.2223547662000076","0.64529597210645218","3.7994135602935519","0.64529597210645218","0","2.2223547662000076","0.64529597210645218","3.7994135602935519","3.7994135602935519","2.2223547662000076","8.5305899425741956","6.9535311484806517","3.7994135602935519","6.9535311484806517","8.5305899425741956","8.5305899425741956","3.8979791030939959" 
"02/04/2014 02:25:21.845","0","1.550550710336521","103.8868975925469","0","21.707709944711297","122","3357","5634","3279","2933","51122176","1887068160","377069568","1403805696","141160448","7","1668734976","282546176","641404928","98045952","43.226983013646283","14581","0","1.550550710336521","1.550550710336521","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1.550550710336521","0","0","0","1.550550710336521","0","0","0","0","0","0","0","0","1.550550710336521","0","0","1.550550710336521","0","0","0","1.550550710336521","0","0","0.2261205291001715","0.76475453846265307","5.4164066694722184","3.8658559591357","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","14.719710931491347","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","3.8658559591357","30.22521803485655","3.8658559591357","0.76475453846265307","0.76475453846265307","0.76475453846265307","17.820812352164385","24.023015193510467","13.169160221154819","16.270261641827865","16.270261641827865","19.371363062500901","2.315305248799171","3.8658559591357","5.4164066694722184","5.4164066694722184","3.8658559591357","0.76475453846265307","5.4164066694722184","3.8658559591357","3.8658559591357","3.8658559591357","3.8658559591357","3.8658559591357","5.4164066694722184","3.8658559591357","2.315305248799171","5.4164066694722184","8.5175080901452649","11.618609510818301","5.5456184003865978" 
"02/04/2014 02:25:22.853","0","0","92.848453249913092","0","12.379793766655082","122","3357","5634","3279","2933","51122176","1887068160","377069568","1403805696","141160448","7","1668734976","282546176","641404928","98045952","43.227161245776045","14579","0","1.5474742208318852","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1.5474742208318852","0","0","0","1.5474742208318852","0","0","0","1.5474742208318852","0","0","0","0","0","0","0.12895535843241074","0","7.1515467500869008","0.96164986675935094","0","0","0","0.96164986675935094","0","0","0","0","0","7.1515467500869008","2.5091240875912413","0","0","0","0","0","11.79396941258255","0.96164986675935094","0","0","0","14.888917854246319","11.79396941258255","4.056598308423121","16.43639207507821","17.98386629591009","11.79396941258255","2.5091240875912413","2.5091240875912413","0.96164986675935094","0.96164986675935094","0","0","0","0.96164986675935094","2.5091240875912413","0","2.5091240875912413","0.96164986675935094","4.056598308423121","0.96164986675935094","2.5091240875912413","4.056598308423121","5.6040725292550109","8.6990209709187809","2.8315224033152231" 
"02/04/2014 02:25:23.848","0","1.5692552674079339","116.12488978818712","0","9.4155316044476045","122","3357","5634","3279","2933","51122176","1887068160","377069568","1403805696","141160448","7","1668734976","282546176","641404928","98045952","43.227161245776045","14580","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1.1369181533001704","0","0","0","0","0","0","0","0","0","0","15.260215559971568","0","1.1369181533001704","0","0","0","1.1369181533001704","8.9831944903398409","0","0","0","0","12.121705025155705","18.398726094787442","10.552449757747773","15.260215559971568","16.82947082737951","16.82947082737951","1.1369181533001704","1.1369181533001704","2.7061734207081023","7.4139392229318979","5.8446839555239656","7.4139392229318979","2.7061734207081023","2.7061734207081023","8.9831944903398409","4.2754286881160342","5.8446839555239656","1.1369181533001704","0","2.7061734207081023","7.4139392229318979","12.121705025155705","5.8446839555239656","7.4139392229318979","4.079262977833908" 
"02/04/2014 02:25:24.844","0","0","108.05231529511674","0","17.225731423859191","122","3357","5634","3279","2933","51122176","1887068160","377069568","1403805696","141160448","7","1668734976","282546176","641404928","98045952","43.22384909869794","14580","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1.5659755839871992","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0.032624282203052524","1.3435382088064496","6.0414649607680504","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","13.871342880704042","6.0414649607680504","4.4754893767808497","1.3435382088064496","1.3435382088064496","1.3435382088064496","6.0414649607680504","13.871342880704042","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","10.73939171272964","15.437318464691241","17.003294048678441","18.569269632665641","13.871342880704042","15.437318464691241","6.0414649607680504","4.4754893767808497","4.4754893767808497","4.4754893767808497","2.9095137927936499","2.9095137927936499","4.4754893767808497","2.9095137927936499","2.9095137927936499","9.1734161287424403","6.0414649607680504","2.9095137927936499","2.9095137927936499","7.6074405447552396","4.4754893767808497","4.4754893767808497","6.0414649607680504","15.437318464691241","5.4216035989100515" 

What does the data consist of? Numbers? Alpha? A mix of numbers and alpha? Any characters other than commas? Can you show a sample line? – foxidrive


I added a CSV sample. Thanks – user2702041

Answers


If I had to do this, the obvious solution would be awk:

awk -F , -v OFS=, "FNR>1{print $18,$19,$20,$21}" "csvfolder\*.csv" > "folder\output.csv" 
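In awk, NR counts records across all input files, while FNR restarts at 1 for each file. With several CSV files, an NR>1 test therefore drops only the first file's header, whereas an FNR>1 test drops every file's header. The hypothetical Python function below (Python swapped in for illustration, not awk itself) mimics the command's -F , and OFS=, behavior with a plain split on commas, so the surrounding quotes stay in each field, and lets you compare the two tests.

```python
def awk_like(files, use_fnr=True, first=18, last=21):
    """Mimic: awk -F , -v OFS=, "(F)NR>1{print $18,$19,$20,$21}" file1 file2 ...

    files is a list of files, each given as a list of lines (without newlines).
    """
    out = []
    nr = 0  # NR: record count across all files
    for lines in files:
        for fnr, line in enumerate(lines, start=1):  # FNR: restarts for each file
            nr += 1
            if (fnr if use_fnr else nr) > 1:
                fields = line.split(",")  # -F , : plain split, quotes are kept
                out.append(",".join(fields[first - 1:last]))  # joined with OFS=,
    return out
```

With two files that each start with a header, use_fnr=False lets the second file's header through, which is exactly the difference between the two tests.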

That said, let's do it with batch. But either way, it will be slow.

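When for /f is used to process the output of a command, it first retrieves all of the data and only then starts processing it. When large amounts of data are involved, this is very slow.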

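When the for /f command processes a file on disk, this behavior is less noticeable. The file is still loaded completely into memory before work starts, but loading is much faster.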
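A toy Python model of the two strategies contrasted above (Python and the function names are hypothetical stand-ins; this does not model cmd.exe's actual internals): the buffered version reads everything before it processes anything, while the streamed version processes each item right after reading it. The event log makes the ordering visible.

```python
def source(n, log):
    """Yield n items, logging each read."""
    for i in range(n):
        log.append(f"read {i}")
        yield i


def buffered(items, log):
    """Collect everything first, then process: the behavior described for for /f."""
    data = list(items)  # all input is read before any processing starts
    for x in data:
        log.append(f"process {x}")


def streamed(items, log):
    """Process each item as soon as it has been read."""
    for x in items:
        log.append(f"process {x}")
```

With two items, buffered logs both reads before either process step, while streamed alternates read and process. With a multi-gigabyte input, the buffered order means nothing happens until the whole input is held in memory.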

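It is counterintuitive, but when processing large files it is faster to generate an intermediate temporary file containing only the required lines, and then process that file with for. It is faster still if the intermediate file is at least on a local hard disk.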

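rem Temporary file in the user's TEMP folder on the local disk, reused for each input file
set "tempFile=%temp%\csv.tmp" 
for %%z in (csvFolder\*.csv) do (
    rem Show progress: the name of the file being processed
    echo %%z 
    rem Pass 1: copy every line that does not contain "words" to the temp file, which drops the header
    findstr /v "words" "%%~fz" > "%tempFile%" 
    rem Pass 2: split the filtered lines on commas and append columns 18-21 to the output
    (for /f "usebackq tokens=18-21 delims=," %%a in ("%tempFile%") do echo %%a,%%b,%%c,%%d) >> "folder\test.csv" 
)
rem Clean up the temporary file
del "%tempFile%" >nul 2>nul 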
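The same two-pass idea, sketched in Python (swapped in for illustration) under the same assumptions as the batch code: header lines contain "words", columns 18-21 are wanted, and output is appended. The name two_pass is hypothetical. One difference worth knowing: for /f with delims=, treats consecutive commas as a single delimiter, whereas str.split(",") keeps empty fields, so the two only agree when no field is empty (true for the quoted sample data). In Python the temp file is not needed for speed, since a file object already streams line by line; it is kept only to mirror the batch structure.

```python
import os
import tempfile


def two_pass(in_paths, out_path, skip_word="words", first=18, last=21):
    """Per input file: pass 1 writes the wanted lines to a temp file (like findstr /v),
    pass 2 reads the temp file and extracts the columns (like the for /f loop)."""
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp")  # local temp dir, like %temp%
    os.close(fd)
    try:
        with open(out_path, "a") as out:
            for path in in_paths:
                # Pass 1: drop header lines; the temp file is overwritten for each input.
                with open(path) as src, open(tmp_path, "w") as tmp:
                    for line in src:
                        if skip_word not in line:
                            tmp.write(line)
                # Pass 2: split on commas and keep fields first..last (quotes kept).
                with open(tmp_path) as tmp:
                    for line in tmp:
                        fields = line.rstrip("\n").split(",")
                        out.write(",".join(fields[first - 1:last]) + "\n")
    finally:
        os.remove(tmp_path)  # like the final del
```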

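I also found that FOR /F has a size limit of around 3GB (maybe the line length of the content is part of it; I didn't check). The OP is processing a 4GB file. – foxidrive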


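You are correct that using a temp file can improve performance, but FOR /F does **not** iterate lines immediately when reading from a file. FOR /F always buffers the entire content before iterating any lines, and that is true for files, command output, and strings alike. That is why FOR /F is slow with large files. It also means that FOR /F always iterates over the original content of a file, even if the DO code modifies it. – dbenham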
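A small Python illustration of the snapshot behavior described in this comment (a hypothetical function that models the described FOR /F behavior rather than calling cmd.exe): the whole file is read into memory before the loop starts, so lines that the loop body appends to the same file are never iterated.

```python
def iterate_snapshot(path):
    """Iterate a file the way the comment says FOR /F does: buffer it all first."""
    with open(path) as f:
        lines = f.read().splitlines()  # snapshot taken before the loop begins
    seen = []
    for line in lines:
        seen.append(line)
        # The loop body modifies the file, but the in-memory snapshot is unchanged.
        with open(path, "a") as f:
            f.write(line + "-copy\n")
    return seen
```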


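@foxidrive, the OP is processing CSV files of more than 4GB. That is why the for %%z in the code processes the files one by one, instead of concatenating them into a single file and then processing that. –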


awk or sed sounds like your best option.

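But I managed to find a pure native Windows scripting solution that performs fairly well. It uses my REPL.BAT hybrid JScript/batch utility, which performs a regex search and replace on each line of stdin and writes the result to stdout. Full documentation is embedded within the script.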

Assuming REPL.BAT is somewhere within your PATH, the following one-liner should do the trick:

findstr /v words *.csv | repl ".*?:(?:.*?,){17}((?:.*?,){3}.*?),.*" $1 >folder\output.csv 

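I tested the above on 20 CSV files totaling a bit more than 1GB, and it completed successfully within 80 seconds. The time should scale linearly with the total file size.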

Note that the initial .*?: expression in the regex matches the filename prefix that findstr inserts before each line.
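To see what the pattern captures, here is a sketch that runs the same regex through Python's re module on a line shaped like findstr's output (filename, colon, record). This is a stand-in for REPL.BAT, which uses JScript regular expressions; this particular pattern happens to be valid in both dialects. Because .*? is lazy, the prefix match stops at the first colon, the one right after the filename, so the colons inside the timestamp field do not interfere. The function name is hypothetical.

```python
import re

# The pattern from the repl one-liner. REPL.BAT uses JScript regex syntax;
# this particular pattern is also valid in Python's re module.
PATTERN = re.compile(r".*?:(?:.*?,){17}((?:.*?,){3}.*?),.*")


def extract_18_to_21(findstr_line):
    """Return fields 18-21 of a 'name.csv:<record>' line (findstr's multi-file format).

    Lines that do not match (e.g. fewer than 22 fields) come back unchanged,
    just as a search-and-replace would leave them.
    """
    return PATTERN.sub(r"\1", findstr_line)
```

For a line such as x.csv:"02/04/2014 02:25:19.850","2",...,"25" the result is "18","19","20","21", with the quotes kept, just like the repl output.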

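Also note that it is very important that each source CSV file has a newline at the end of its last line. If not, the last line of one file will be merged with the first line of the next file.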
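If you are not sure whether every file ends with a newline, you can check (and fix) this before running the pipeline. This is a minimal Python sketch with hypothetical helper names; ensure_final_newline appends CRLF, assuming Windows line endings, and it modifies the file in place, so try it on copies first.

```python
import os


def ends_with_newline(path):
    """True if the file is empty or its last byte is a line feed."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return True
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def ensure_final_newline(path):
    """Append CRLF if the file's last line is unterminated."""
    if not ends_with_newline(path):
        with open(path, "ab") as f:
            f.write(b"\r\n")
```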


'findstr /v words *.csv | repl ".*?:(.*?,){17}((.*?,){3}.*?),.*" $2 >output1.txt' <--- that's a little less obscure for fresh eyes to digest. Nice work, and I'll try to remember these tips. – foxidrive
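The comment's variant needs $2 instead of $1 because every plain (...) is a capturing group, numbered by the position of its opening parenthesis. Here is a short check of the numbering in Python (swapped in for illustration; Python's re numbers groups the same way as the JScript regex that REPL.BAT uses). The variable and function names are hypothetical.

```python
import re

# As in the answer: inner groups are non-capturing, so the fields are group 1 ($1).
ANSWER = re.compile(r".*?:(?:.*?,){17}((?:.*?,){3}.*?),.*")
# As in the comment: every (...) captures. Groups are numbered by opening
# parenthesis: (.*?,){17} is group 1, the outer field group is group 2 ($2),
# and the inner (.*?,) is group 3.
COMMENT = re.compile(r".*?:(.*?,){17}((.*?,){3}.*?),.*")


def group_of(pattern, number, line):
    """Return the text captured by group `number`, or None if there is no match."""
    m = pattern.match(line)
    return m.group(number) if m else None
```

A repeated group such as (.*?,){17} keeps only its last repetition, so $1 in the comment's version would return just field 17 with its trailing comma.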