2014-01-17 22 views
0

Find and replace text in an unstructured CSV. I have a log file that uses ¬ as the delimiter:

073957.744 : Send:[8=FIX.4.4¬9=724¬35=AE¬49=FAUAT¬56=CALUAT¬34=82¬55=0000 AA BBC¬48=0000 AA BBC¬22=100¬38=17000.000000¬9998=Equity¬9999=CFD¬] 
080655.776 : Send:[8=FIX.4.4¬9=631¬35=AE¬49=FAUAT¬56=CALUAT¬34=136¬55=NOVN VX CFD¬48=NOVN VX CFD¬22=100¬38=7500.000000¬] 
081249.475 : Send:[8=FIX.4.4¬9=620¬35=AE¬49=FAUAT¬56=CALUAT¬34=148¬55=NOK1V FH CFD¬48=NOK1V FH CFD¬22=100¬38=50000.000000¬9896=False¬9893=1¬] 
081806.623 : Send:[8=FIX.4.4¬9=583¬35=AE¬49=FAUAT¬56=CALUAT¬34=159¬55=IX17186393-0¬48=IX17186393-0¬22=110¬38=10.000000¬60=20131216-08:09:02¬64=20131219¬552=1¬54=1¬] 

I use the code below to convert this file to CSV and strip the first six columns:

@echo off 

rem fetch only the required messages from log file 
findstr /r /i Send:\[.*35=AE.* %cd%\FixProvider_MsgLog_20131216_1.log > %cd%\FilteredFIXMessages.log 

rem ensure the older temp file is not present 
if exist %cd%\FIXTemp1.tmp del %cd%\FIXTemp1.tmp 

rem convert the FilteredFIXMessages.log into csv and store it in temp1 file and strip temp1 file for the first 6 columns as they are not required for data matching 
setlocal enabledelayedexpansion 
for /f "tokens=1-6* delims=¬" %%a in (%cd%\FilteredFIXMessages.log) do set data=%%g & echo !data:¬=,! >> %cd%\FIXTemp1.tmp 

exit /b 

This gives me the following CSV:

55=0000 AA BBC,48=0000 AA BBC,22=100,38=17000.000000,9998=Equity,9999=CFD,] 
55=NOVN VX CFD,48=NOVN VX CFD,22=100,38=7500.000000,] 
55=NOK1V FH CFD,48=NOK1V FH CFD,22=100,38=50000.000000,9896=False,9893=1,] 
55=IX17186393-0,48=IX17186393-0,22=110,38=10.000000,60=20131216-08:09:02,64=20131219,552=1,54=1,] 

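As you can see, this is not a structured CSV (there are no fixed columns, and the column order may also vary). I want to strip:

  1. Columns like 55=* or any other column(s) I choose (the data can be of variable length, but the column tags are static, such as 55=)
  2. The last column, ] (an empty column)

I could easily strip these with VBS, but since I am already using a batch script I would like to stick with it rather than install any other tool. Please help.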

+0

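I was looking at the documentation for 'for'. So token 7 (%h) holds everything after the sixth delimited segment, through to the end of the line? I looked at the help, and there is no command like 'printf' for formatting strings. You could write your own exe to emulate printf and return a padded string, but I don't know whether you could call it from the do body. Even if you could, you would have to make two passes over the file: one to find the maximum number of columns and their widths, and one to format the data. But why does this CSV need to have a fixed structure? – sln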
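As a small illustration of how the `*` token behaves, here is a sketch with a made-up sample line (the `sample` variable and its contents are hypothetical, and a comma stands in for ¬ to avoid code-page issues). With `tokens=1-6*` starting at `%%a`, the first six tokens go to `%%a` through `%%f`, and the untouched remainder of the line goes to the next variable, `%%g`.

```bat
@echo off
setlocal
rem Hypothetical sample: six throwaway fields, then the part worth keeping.
set "sample=t1,t2,t3,t4,t5,t6,55=X,48=Y,]"
rem Tokens 1-6 land in a..f; the asterisk puts the rest of the line in g.
for /f "tokens=1-6* delims=," %%a in ("%sample%") do (
    echo sixth token: %%f
    echo remainder:   %%g
)
```

The remainder keeps its delimiters, which is why it still has to be split or substituted afterwards.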

+0

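@sln - The CSV does not need a fixed structure, hence the problem of finding and replacing variable-length text. As I mentioned, I could do this easily in VBS (which I am more comfortable with), but I would like to do it in a single BAT file. –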

+0

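The biggest problem you face is getting rid of the '=' sign. You can't do that easily in batch. If you want to keep it all in one file, I would just use a hybrid VBS/batch script. –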
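To make the '=' problem concrete, here is a minimal sketch (the `field` variable is only for illustration; the value is taken from the question's sample data). In batch string substitution the search text ends at the first `=`, so an attempt to remove `55=` actually replaces `55` with `=`. Comparing a fixed-length prefix avoids substitution altogether.

```bat
@echo off
setlocal
set "field=55=NOVN VX CFD"
rem The search text stops at the first equals sign, so this swaps "55" for "="
rem and should print [==NOVN VX CFD] rather than [NOVN VX CFD].
echo [%field:55==%]
rem A substring comparison checks the prefix without any substitution.
if "%field:~0,3%"=="55=" (echo drop this field) else (echo keep this field)
```

The prefix length (3 here) has to match the tag being tested, so each tag needs its own comparison.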

Answers

1
@ECHO OFF 
SETLOCAL 
:: Parenthesise a statement-group with redirector sends all echoed text to file 
(
REM This is simply using your regex to feed the lines to FOR 
REM Tokenised - first 5 tokens are skipped, #6 to %%a, remainder of line to %%b 
FOR /f "tokens=6* delims=¬" %%a IN ('findstr /r /i "Send:\[.*35=AE.*" q21191380.txt') DO (
    REM set LINE to token7+(with delimiters) and clear NEWLINE 
    SET line=%%b 
    SET "newline=" 
    CALL :process 
) 
)>newfile.txt 
TYPE newfile.txt 

GOTO :EOF 

:process 
:: Grab the first token in LINE to %%s, part after delimiter to %%t 
:: Then set FIELD to "line=nexttoken" and LINE to remaining text 
FOR /f "tokens=1*delims=¬" %%s IN ('set line') DO SET "field=%%s"&SET "line=%%t" 
:: Remove the leading "line=" from FIELD (5 characters) 
SET "field=%field:~5%" 
:: Vanilla FOR for quoted strings (which bypasses the special status of "=") 
:: Set a work variable=FIELD and set string=(element from list - quotes) 
FOR %%e IN ("55=" "]") DO SET "work=%field%"&SET "string=%%~e"&CALL :elim 
:: ELIM will either clear FIELD or leave it untouched - build & separate 
IF DEFINED field SET "newline=%newline%,%field%" 
:: If there's any more left in LINE, repeat the process until LINE is empty 
IF DEFINED line GOTO process 
:: NEWLINE will start with a comma, so ECHO it minus the first character 
IF DEFINED newline ECHO %newline:~1% 
GOTO :eof 

:elim 
:: Does the first character of WORK = first of STRING? 
IF NOT "%string:~0,1%"=="%work:~0,1%" GOTO :EOF 
:: Yes - lop off the first character of both 
SET "string=%string:~1%" 
SET "work=%work:~1%" 
:: If both are still defined, repeat 
IF DEFINED string IF DEFINED work GOTO elim 
:: If there's anything left to match in STRING, WORK ran out first: no match, so leave FIELD untouched 
IF DEFINED string GOTO :EOF 
:: STRING has been completely matched, so clear FIELD to drop it from output 
SET "field=" 
GOTO :eof 

Now that was an interesting exercise!

I have changed the name of the file to suit my system, but other than that it should work for you.
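For comparison, a more compact sketch of the same field-by-field idea, working on one already-converted CSV line from the question. This is not the script above, just an alternative under stated assumptions: fields contain no `*`, `?`, `!` or double quotes, there are no empty fields, and the variable names (`line`, `field`, `keep`, `out`) are illustrative.

```bat
@echo off
setlocal enabledelayedexpansion
rem One line of the converted CSV, copied from the question.
set "line=55=NOVN VX CFD,48=NOVN VX CFD,22=100,38=7500.000000,]"
set "out="
rem Turn  a,b,c  into  "a" "b" "c"  so a plain FOR visits one field at a time.
for %%f in ("%line:,=" "%") do (
    set "field=%%~f"
    set "keep=1"
    rem Drop fields tagged 55= and the trailing bracket.
    if "!field:~0,3!"=="55=" set "keep="
    if "!field!"=="]" set "keep="
    if defined keep set "out=!out!,!field!"
)
rem OUT starts with a comma, so print it without the first character.
if defined out echo !out:~1!
```

For the sample line this should print `48=NOVN VX CFD,22=100,38=7500.000000`. Wrapping the loop in a `for /f` over the file would extend it to every line, at the cost of the assumptions listed above.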

+0

Many thanks. One request: could you comment your code so that it is easier for me to understand what is going on? –

+0

Thanks again. I have tried it with different column combinations and it works great. Cheers. –

1

Here is a hybrid script that will do it.

::Find and Replace 
::Matt Williamson 
::5/30/2013 

@echo off 
setlocal 

call :FindReplace "55=" "" in.txt 
call :FindReplace ",]" "" in.txt 

exit /b 

:FindReplace <findstr> <replstr> <file> 
set tmp="%temp%\tmp.txt" 
If not exist %temp%\_.vbs call :MakeReplace 
for /f "tokens=*" %%a in ('dir "%3" /s /b /a-d /on') do (
    for /f "usebackq" %%b in (`Findstr /mic:"%~1" "%%a"`) do (
    echo(&Echo Replacing "%~1" with "%~2" in file %%~nxa 
    <%%a cscript //nologo %temp%\_.vbs "%~1" "%~2">%tmp% 
    if exist %tmp% move /Y %tmp% "%%~dpnxa">nul 
) 
) 
del %temp%\_.vbs 
exit /b 

:MakeReplace 
>%temp%\_.vbs echo with Wscript 
>>%temp%\_.vbs echo set args=.arguments 
>>%temp%\_.vbs echo .StdOut.Write _ 
>>%temp%\_.vbs echo Replace(.StdIn.ReadAll,args(0),args(1),1,-1,1) 
>>%temp%\_.vbs echo end with 
+0

Although I wanted a batch-only solution, your solution is simple too. –