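Slow for loop processing when using findstr

I have a somewhat odd situation: a for loop is very slow when I use findstr as the command string that the loop iterates over.

It is worth mentioning that the file I am processing (old-file.xml) contains about 200,000 lines.

This part is extremely fast, but it becomes slow if I remove | find /c ":"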

rem find total number of lines in xml-file 
findstr /n ^^ old-file.xml | find /c ":" > "temp-count.txt" 
set /p lines=< "temp-count.txt"

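The code that is slow looks like this, and here I cannot use the pipe trick above. The slow part seems to be the for itself, because I do not see any progress in the title bar until 10 minutes have passed.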

setlocal DisableDelayedExpansion 
rem start replacing wrong dates with correct date 
for /f "usebackq Tokens=1* Delims=:" %%i in (`"findstr /n ^^ old-file.xml"`) do (
    rem cache the value of each line in a variable 
    set read-line=%%j 
    set line=%%i 
    rem restore delayed expansion 
    setlocal EnableDelayedExpansion 
    rem write progress in title bar 
    title Processing line: !line!/%lines% 
    rem remove trailing line number 
    rem set read-line=!read-line:*:=! 
    for /f "usebackq" %%i in ("%tmpfile%") do (
     rem replace all wrong dates with correct dates 
     set read-line=!read-line:%%i=%correctdate%! 
    ) 
    rem write results to new file 
    echo(!read-line!>>"Updated-file.xml" 
    rem end local 
    endlocal 
)

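Edit:

Further investigation showed that even this one-liner, which should only display the current line number while looping, takes about 10 minutes for the 200,000 lines of my 8 MB file. That is just the time it needs before it starts displaying lines.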

for /f "usebackq Tokens=1* Delims=:" %%i in (`"findstr /n ^^ old-file.xml"`) do echo %%i

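So it seems that findstr writes its screen output hidden from the user but visible to the for loop. How can I prevent that from happening while still getting the same result?

Edit 2: Solution

The solution proposed by Aacini, as finally revised by me.

This is a snippet from a larger script. The wrong dates are retrieved in another loop, and the total number of lines is also retrieved from another loop.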

setlocal enabledelayedexpansion 
rem this part is for snippet only, dates are generated from another loop in final script 
echo 2069-04-29 > dates-tmp.txt 
echo 2069-04-30 >> dates-tmp.txt 

findstr /n ^^ Super-Large-File.xml > out.tmp 

set tmpfile=dates-tmp.txt 
set correctdate=2011-11-25 
set wrong-dates= 
rem hardcoded total number of lines 
set lines=186442 
for /F %%i in (%tmpfile%) do (
    set wrong-dates=!wrong-dates! %%i 
) 
rem process each line in out.tmp and loop them through :ProcessLines 
call :ProcessLines < out.tmp 
rem when finished with above call for each line in out.tmp, goto exit 
goto ProcessLinesEnd 
:ProcessLines 
for /L %%l in (1,1,%lines%) do (
    set /P read-line= 
    rem write progress in title bar 
    title Processing line: %%l/%lines% 
    for %%i in (%wrong-dates%) do (
     rem replace all wrong dates with correct dates 
     set read-line=!read-line:%%i=%correctdate%! 
    ) 
    rem write results to new file 
    echo(!read-line:*:=!>>"out2.tmp" 
) 
rem end here and continue below 
goto :eof 

:ProcessLinesEnd 
echo this should not be printed until call has ended 

:exit 
exit /b

Source

2011-11-28 NiklasJ

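What kind of file is your 'old-file.xml'? You could try adding an 'echo I am alive' inside the loop to see whether 'findstr' is the problem – jeb

'old-file.xml' contains 200 000 lines and is about 8 MB in size. I use 'title' in the for loop instead of 'echo I am alive' – NiklasJ

There are two points here:

1- The setlocal EnableDelayedExpansion command is executed for every line. This means that the complete environment must be copied to a new local storage area about 200,000 times. This may cause several problems.

2- I suggest you start with the most basic part. How long does findstr take to execute? Run findstr /n ^^ old-file.xml on its own and check that before trying to fix any other part. If that step is fast, add a single step to it and test again, until you find the cause of the slowdown. I suggest you neither use a pipe nor execute findstr from for /f, but instead read from a file generated by a previous redirection.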
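The step-by-step measurement suggested above could be sketched as follows. This is only an illustration: findstr.txt is an assumed scratch file name, and %time% gives coarse wall-clock timestamps rather than precise timings.

```batch
@echo off
setlocal
rem Step 1: findstr alone, redirected to a scratch file (the name is an assumption)
echo findstr start: %time%
findstr /n ^^ old-file.xml > findstr.txt
echo findstr end:   %time%

rem Step 2: an empty FOR /F over that file, to time the loop overhead by itself
for /f "usebackq delims=" %%a in ("findstr.txt") do (
    rem Nothing
)
echo loop end:      %time%
```

If both steps finish quickly, the next piece of the real loop body can be added and the script timed again.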

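Edit: a faster solution

There is another way to do this. You can feed the findstr output into a Batch subroutine, so that the lines can be read with the SET /P command. This method allows the lines to be processed entirely via delayed expansion instead of via the FOR /F replaceable parameter, so the pair of setlocal EnableDelayedExpansion and endlocal commands is no longer needed. However, if you still want to display the line number, you have to calculate it again.

Also, it is faster to load the wrong dates into a variable than to process %tmpfile% again for every line of the big file.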

setlocal EnableDelayedExpansion 
rem load wrong dates from tmpfile 
set wrong-dates= 
for /F %%i in (%tmpfile%) do (
    set wrong-dates=!wrong-dates! %%i 
) 
echo creating findstr output, please wait... 
findstr /n ^^ old-file.xml > findstr.txt 
echo :EOF>> findstr.txt 
rem start replacing wrong dates with correct date 
call :ProcessLines < findstr.txt 
goto :eof


:ProcessLines 
set line=0 
:read-next-line 
set /P read-line= 
rem check if the input file ends 
if !read-line! == :EOF goto :eof 
rem write progress in title bar 
set /A line+=1 
title Processing line: %line%/%lines% 
for %%i in (%wrong-dates%) do (
    rem replace all wrong dates with correct dates 
    set read-line=!read-line:%%i=%correctdate%! 
) 
rem write results to new file 
echo(!read-line:*:=!>>"Updated-file.xml" 
rem go back for next line 
goto read-next-line

SECOND EDIT: an even faster variant

The previous method can be sped up slightly if the loop is implemented with a for /L command instead of a goto.

:ProcessLines 
for /L %%l in (1,1,%lines%) do (
    set /P read-line= 
    rem write progress in title bar 
    title Processing line: %%l/%lines% 
    for %%i in (%wrong-dates%) do (
     rem replace all wrong dates with correct dates 
     set read-line=!read-line:%%i=%correctdate%! 
    ) 
    rem write results to new file 
    echo(!read-line:*:=!>>"Updated-file.xml" 
)

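This modification also omits the EOF comparison and the line number calculation, so the time gain may be significant after 200,000 repetitions. If you use this method, do not forget to remove the echo :EOF>> findstr.txt line from the first part.

Source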
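The for /L variant relies on %lines% being set before the loop starts. One way to obtain it is sketched below, assuming findstr.txt has already been written by the redirect step. This is a common idiom, not the method used in the original script.

```batch
@echo off
setlocal
rem Assumes findstr.txt was already produced by:
rem   findstr /n ^^ old-file.xml > findstr.txt
rem find /c /v "" counts every line, including empty ones.
rem The ^< escapes the redirection so it is applied inside the FOR /F command.
for /f %%n in ('find /c /v "" ^< findstr.txt') do set "lines=%%n"
echo lines=%lines%
```

Counting from the temporary file means the large XML file is read by findstr only once.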

2011-11-28 18:47:55 Aacini

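1. I was given that suggestion as a "trick" in a previous question, to get empty lines and special characters. http://stackoverflow.com/a/7886228/487650 2. 'findstr /n ^^ old-file.xml' takes about 3-4 minutes. However, if I redirect standard output with '1> nul' it is much faster, but then I cannot process the output. Could you provide any proof-of-concept code for your suggestion? – NiklasJ

@Niklas: There is another way to handle empty lines and special characters. See my edit for a faster solution – Aacini

'set/p' is a good, fast and simple solution if you have no spaces at the end of your lines; otherwise you will lose them. Also, 'set/p' only handles certain line endings and fails on others – jeb

The FOR /F expression is always executed/read/evaluated completely before the inner loop starts.

You can try it with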

(
    echo line1 
    echo line2 
) > myFile.txt 
FOR /F "delims=" %%a in (myFile.txt) DO (
    echo %%a 
    del myFile.txt 2> nul >nul 
)

It will display

line1 
line2

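In your case the complete ('"findstr /n ^^ old-file.xml"') will be executed and cached before the loop can start.

Edit: added solution

I measured with a file of about 20 MB and 370,000 lines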

type testFile.txt > nul 
findstr /n ^^ testFile.txt > nul 

for /F "delims=" %%a in (testFile.txt) do ( 
    rem Nothing 
) 

for /f "usebackq delims=" %%a in (`"findstr /n ^^ testFile.txt"`) do ... 

findstr /n ^^ testFile.txt > out.tmp 

type_nul     ~10000ms
findstr_nul  ~30000ms
for_file     ~ 1600ms
for_findstr  cancelled after 10 minutes
findstr_tmp  ~  500ms !!!

I would recommend using a temporary file; it is extremely fast.

findstr /n ^^ myFile.txt > out.tmp 
set lineNr=0 
(
    for /f "usebackq delims=" %%a in ("out.tmp") do (
        set /a lineNr+=1
        set "num_line=%%a"
        setlocal EnableDelayedExpansion
        set "line=!num_line:*:=!"
        echo(!line!
        endlocal
    )
) > out2.tmp

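By the way, your for /F splitting can fail if the original line begins with a colon:
for /f "usebackq Tokens=1* Delims=:"

Sample: :ThisIsALabel
:ThisIsALabel
findstr /n prepends the line number:
17::ThisIsALabel
delims=: splits off the first token and treats all consecutive colons as a single delimiter:
ThisIsALabel

Source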
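The difference between the two ways of stripping the line number can be seen with a small test. This is a sketch, assuming a hypothetical scratch file sample.txt, and it uses the quoted "^" form of the findstr pattern. The first loop uses the token split and should lose the leading colon; the second removes everything up to the first colon with !num_line:*:=! and should keep it.

```batch
@echo off
setlocal DisableDelayedExpansion
rem Hypothetical sample file: a single line that starts with a colon
> sample.txt echo :ThisIsALabel

rem Token split: consecutive colons count as one delimiter
for /f "tokens=1* delims=:" %%i in ('findstr /n "^" sample.txt') do echo tokens:    [%%j]

rem Substring removal: strips only up to and including the first colon
for /f "delims=" %%a in ('findstr /n "^" sample.txt') do (
    set "num_line=%%a"
    setlocal EnableDelayedExpansion
    echo substring: [!num_line:*:=!]
    endlocal
)
del sample.txt
```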

2011-11-28 15:54:08 jeb

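I did not know that FOR /F is evaluated before the loop starts. Good to know. However, I cannot use 'FOR /F "delims=" %%a in (myfile.txt)', because I would not get empty lines or special characters. As you recommended to me in an earlier post: http://stackoverflow.com/a/7886228/487650 – NiklasJ

The splitting has never failed so far; I have come across cases where colons occurred and they were not filtered out. 'for /f "Tokens=1* Delims=:" %i in (file.txt) do echo %j' %j will be the * token in this case. I will try some of your suggestions, since it does seem to be a faster option. But won't there be a problem parsing special characters? For example ü, ä, ä, ö, carets, & and |. I also cannot use 'set "line=!num_line::*=!"' – NiklasJ

@Niklas: jeb made a small mistake, it should be 'set "line=!num_line:*:=!"' :-) – Aacini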
