2014-02-25 52 views
2

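batch/perl/python: find strings in multiple files, then delete the line

I recently ran into this problem, but since I am not familiar with this kind of scripting I have not been able to solve it. I need a script that does the following: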

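Based on a list (list.txt), search for each of its lines in multiple text files and, if a line is found, delete it (from every other file). I tried saving list.txt into an array and then walking through it with a for loop, but I don't know how to search for the string and delete the line. Can you help me with this?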

So far, this is what I have put together from several sources:

REPL.bat, to search across multiple text files:

@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment 

::************ Documentation *********** 
::: 
:::REPL Search Replace [Options [SourceVar]] 
:::REPL /? 
:::REPL /V 
::: 
::: Performs a global search and replace operation on each line of input from 
::: stdin and prints the result to stdout. 
::: 
::: Each parameter may be optionally enclosed by double quotes. The double 
::: quotes are not considered part of the argument. The quotes are required 
::: if the parameter contains a batch token delimiter like space, tab, comma, 
::: semicolon. The quotes should also be used if the argument contains a 
::: batch special character like &, |, etc. so that the special character 
::: does not need to be escaped with ^. 
::: 
::: If called with a single argument of /?, then prints help documentation 
::: to stdout. 
::: 
::: If called with a single argument of /V, case insensitive, then prints 
::: the version of REPL.BAT. (Currently 3.1) 
::: 
::: Search - By default, this is a case sensitive JScript (ECMA) regular 
:::   expression expressed as a string. 
::: 
:::   JScript regex syntax documentation is available at 
:::   http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx 
::: 
::: Replace - By default, this is the string to be used as a replacement for 
:::   each found search expression. Full support is provided for 
:::   substitution patterns available to the JScript replace method. 
::: 
:::   For example, $& represents the portion of the source that matched 
:::   the entire search pattern, $1 represents the first captured 
:::   submatch, $2 the second captured submatch, etc. A $ literal 
:::   can be escaped as $$. 
::: 
:::   An empty replacement string must be represented as "". 
::: 
:::   Replace substitution pattern syntax is fully documented at 
:::   http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx 
::: 
::: Options - An optional string of characters used to alter the behavior 
:::   of REPL. The option characters are case insensitive, and may 
:::   appear in any order. 
::: 
:::   I - Makes the search case-insensitive. 
::: 
:::   L - The Search is treated as a string literal instead of a 
:::    regular expression. Also, all $ found in Replace are 
:::    treated as $ literals. 
::: 
:::   B - The Search must match the beginning of a line. 
:::    Mostly used with literal searches. 
::: 
:::   E - The Search must match the end of a line. 
:::    Mostly used with literal searches. 
::: 
:::   V - Search and Replace represent the name of environment 
:::    variables that contain the respective values. An undefined 
:::    variable is treated as an empty string. 
::: 
:::   A - Only print altered lines. Unaltered lines are discarded. 
:::    This option is incompatible with the M option. 
::: 
:::   M - Multi-line mode. The entire contents of stdin is read and 
:::    processed in one pass instead of line by line, thus enabling 
:::    search for \n. This option is incompatible with the A option. 
::: 
:::   X - Enables extended substitution pattern syntax with support 
:::    for the following escape sequences within the Replace string: 
::: 
:::    \\  - Backslash 
:::    \b  - Backspace 
:::    \f  - Formfeed 
:::    \n  - Newline 
:::    \q  - Quote 
:::    \r  - Carriage Return 
:::    \t  - Horizontal Tab 
:::    \v  - Vertical Tab 
:::    \xnn - Extended ASCII byte code expressed as 2 hex digits 
:::    \unnnn - Unicode character expressed as 4 hex digits 
::: 
:::    Also enables the \q escape sequence for the Search string. 
:::    The other escape sequences are already standard for a regular 
:::    expression Search string. 
::: 
:::    Also modifies the behavior of \xnn in the Search string to work 
:::    properly with extended ASCII byte codes. 
::: 
:::    Extended escape sequences are supported even when the L option 
:::    is used. Both Search and Replace support all of the extended 
:::    escape sequences if both the X and L options are combined. 
::: 
:::   S - The source is read from an environment variable instead of 
:::    from stdin. The name of the source environment variable is 
:::    specified in the next argument after the option string. Without 
:::    the M option, ^ anchors the beginning of the string, and $ the 
:::    end of the string. With the M option, ^ anchors the beginning 
:::    of a line, and $ the end of a line. 
::: 

::************ Batch portion *********** 
@echo off 
if .%2 equ . (
    if "%~1" equ "/?" (
        <"%~f0" cscript //E:JScript //nologo "%~f0" "^:::" "" a
        exit /b 0
    ) else if /i "%~1" equ "/V" (
        echo REPL.BAT version 3.1
        exit /b
    ) else (
        call :err "Insufficient arguments"
        exit /b 1
    )
)
echo(%~3|findstr /i "[^SMILEBVXA]" >nul && (
    call :err "Invalid option(s)" 
    exit /b 1 
) 
echo(%~3|findstr /i "M"|findstr /i "A" >nul && (
    call :err "Incompatible options" 
    exit /b 1 
) 
cscript //E:JScript //nologo "%~f0" %* 
exit /b 0 

:err 
>&2 echo ERROR: %~1. Use REPL /? to get help. 
exit /b 

************* JScript portion **********/ 
var env=WScript.CreateObject("WScript.Shell").Environment("Process"); 
var args=WScript.Arguments; 
var search=args.Item(0); 
var replace=args.Item(1); 
var options="g"; 
if (args.length>2) options+=args.Item(2).toLowerCase(); 
var multi=(options.indexOf("m")>=0); 
var alterations=(options.indexOf("a")>=0); 
if (alterations) options=options.replace(/a/g,""); 
var srcVar=(options.indexOf("s")>=0); 
if (srcVar) options=options.replace(/s/g,""); 
if (options.indexOf("v")>=0) { 
    options=options.replace(/v/g,""); 
    search=env(search); 
    replace=env(replace); 
} 
if (options.indexOf("x")>=0) { 
    options=options.replace(/x/g,""); 
    replace=replace.replace(/\\\\/g,"\\B"); 
    replace=replace.replace(/\\q/g,"\""); 
    replace=replace.replace(/\\x80/g,"\\u20AC"); 
    replace=replace.replace(/\\x82/g,"\\u201A"); 
    replace=replace.replace(/\\x83/g,"\\u0192"); 
    replace=replace.replace(/\\x84/g,"\\u201E"); 
    replace=replace.replace(/\\x85/g,"\\u2026"); 
    replace=replace.replace(/\\x86/g,"\\u2020"); 
    replace=replace.replace(/\\x87/g,"\\u2021"); 
    replace=replace.replace(/\\x88/g,"\\u02C6"); 
    replace=replace.replace(/\\x89/g,"\\u2030"); 
    replace=replace.replace(/\\x8[aA]/g,"\\u0160"); 
    replace=replace.replace(/\\x8[bB]/g,"\\u2039"); 
    replace=replace.replace(/\\x8[cC]/g,"\\u0152"); 
    replace=replace.replace(/\\x8[eE]/g,"\\u017D"); 
    replace=replace.replace(/\\x91/g,"\\u2018"); 
    replace=replace.replace(/\\x92/g,"\\u2019"); 
    replace=replace.replace(/\\x93/g,"\\u201C"); 
    replace=replace.replace(/\\x94/g,"\\u201D"); 
    replace=replace.replace(/\\x95/g,"\\u2022"); 
    replace=replace.replace(/\\x96/g,"\\u2013"); 
    replace=replace.replace(/\\x97/g,"\\u2014"); 
    replace=replace.replace(/\\x98/g,"\\u02DC"); 
    replace=replace.replace(/\\x99/g,"\\u2122"); 
    replace=replace.replace(/\\x9[aA]/g,"\\u0161"); 
    replace=replace.replace(/\\x9[bB]/g,"\\u203A"); 
    replace=replace.replace(/\\x9[cC]/g,"\\u0153"); 
    replace=replace.replace(/\\x9[dD]/g,"\\u009D"); 
    replace=replace.replace(/\\x9[eE]/g,"\\u017E"); 
    replace=replace.replace(/\\x9[fF]/g,"\\u0178"); 
    replace=replace.replace(/\\b/g,"\b"); 
    replace=replace.replace(/\\f/g,"\f"); 
    replace=replace.replace(/\\n/g,"\n"); 
    replace=replace.replace(/\\r/g,"\r"); 
    replace=replace.replace(/\\t/g,"\t"); 
    replace=replace.replace(/\\v/g,"\v"); 
    replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g, 
    function($0,$1,$2){ 
     return String.fromCharCode(parseInt("0x"+$0.substring(2))); 
    } 
); 
    replace=replace.replace(/\\B/g,"\\"); 
    search=search.replace(/\\\\/g,"\\B"); 
    search=search.replace(/\\q/g,"\""); 
    search=search.replace(/\\x80/g,"\\u20AC"); 
    search=search.replace(/\\x82/g,"\\u201A"); 
    search=search.replace(/\\x83/g,"\\u0192"); 
    search=search.replace(/\\x84/g,"\\u201E"); 
    search=search.replace(/\\x85/g,"\\u2026"); 
    search=search.replace(/\\x86/g,"\\u2020"); 
    search=search.replace(/\\x87/g,"\\u2021"); 
    search=search.replace(/\\x88/g,"\\u02C6"); 
    search=search.replace(/\\x89/g,"\\u2030"); 
    search=search.replace(/\\x8[aA]/g,"\\u0160"); 
    search=search.replace(/\\x8[bB]/g,"\\u2039"); 
    search=search.replace(/\\x8[cC]/g,"\\u0152"); 
    search=search.replace(/\\x8[eE]/g,"\\u017D"); 
    search=search.replace(/\\x91/g,"\\u2018"); 
    search=search.replace(/\\x92/g,"\\u2019"); 
    search=search.replace(/\\x93/g,"\\u201C"); 
    search=search.replace(/\\x94/g,"\\u201D"); 
    search=search.replace(/\\x95/g,"\\u2022"); 
    search=search.replace(/\\x96/g,"\\u2013"); 
    search=search.replace(/\\x97/g,"\\u2014"); 
    search=search.replace(/\\x98/g,"\\u02DC"); 
    search=search.replace(/\\x99/g,"\\u2122"); 
    search=search.replace(/\\x9[aA]/g,"\\u0161"); 
    search=search.replace(/\\x9[bB]/g,"\\u203A"); 
    search=search.replace(/\\x9[cC]/g,"\\u0153"); 
    search=search.replace(/\\x9[dD]/g,"\\u009D"); 
    search=search.replace(/\\x9[eE]/g,"\\u017E"); 
    search=search.replace(/\\x9[fF]/g,"\\u0178"); 
    if (options.indexOf("l")>=0) { 
    search=search.replace(/\\b/g,"\b"); 
    search=search.replace(/\\f/g,"\f"); 
    search=search.replace(/\\n/g,"\n"); 
    search=search.replace(/\\r/g,"\r"); 
    search=search.replace(/\\t/g,"\t"); 
    search=search.replace(/\\v/g,"\v"); 
    search=search.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g, 
     function($0,$1,$2){ 
     return String.fromCharCode(parseInt("0x"+$0.substring(2))); 
     } 
    ); 
    search=search.replace(/\\B/g,"\\"); 
    } else search=search.replace(/\\B/g,"\\\\"); 
} 
if (options.indexOf("l")>=0) { 
    options=options.replace(/l/g,""); 
    search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1"); 
    replace=replace.replace(/\$/g,"$$$$"); 
} 
if (options.indexOf("b")>=0) { 
    options=options.replace(/b/g,""); 
    search="^"+search 
} 
if (options.indexOf("e")>=0) { 
    options=options.replace(/e/g,""); 
    search=search+"$" 
} 
var search=new RegExp(search,options); 
var str1, str2; 

if (srcVar) { 
    str1=env(args.Item(3)); 
    str2=str1.replace(search,replace); 
    if (!alterations || str1!=str2) WScript.Stdout.WriteLine(str2); 
} else { 
    while (!WScript.StdIn.AtEndOfStream) { 
    if (multi) { 
     WScript.Stdout.Write(WScript.StdIn.ReadAll().replace(search,replace)); 
    } else { 
     str1=WScript.StdIn.ReadLine(); 
     str2=str1.replace(search,replace); 
     if (!alterations || str1!=str2) WScript.Stdout.WriteLine(str2); 
    } 
    } 
} 

I created this to loop over all the files (a user-defined function taking 2 parameters):

:myBatchFunc 
rem %1 = search string, %2 = replacement string 
for %%F in (*.txt) do (
    type "%%F"|repl "%~1" "%~2" >"%%F.new" 
    move /y "%%F.new" "%%F" 
) 
exit /b 

This is the main batch file, from which I call it and run everything:

@echo off 
set "file=C:\Users\ecatser\Desktop\RPS_cells\EXCEPTII.log" 
set /A i=0 

for /F "usebackq delims=" %%a in ("%file%") do (
set /A i+=1 
call set array[%%i%%]=%%a 
call set n=%%i%% 
) 

for /L %%i in (1,1,%n%) do call myBatchFunc %%array[%%i]%% x 
PAUSE 

I realize this is a lot of code for a very easy task. Can anyone give me a better answer in batch/Perl/Python? Thank you in advance.

PS: on top of that, the script I have now replaces the string with "x", so it does not delete the line.


EDIT: The situation is as follows. I have one directory containing list.log (which is basically a list of exceptions) and a bunch of other .txt files.

list.log example:

53737 
52505   // this value matches the cell in .txt 
13211 
21412 
21313 
23123 

.txt file example:

LOTS_OF_USELESS_TEXT,Cell=cell52505  // the cell with the same value 
LOTS_OF_USELESS_TEXT,Cell=cell20774 
LOTS_OF_USELESS_TEXT,Cell=cell22312 
LOTS_OF_USELESS_TEXT,Cell=cell20233 
LOTS_OF_USELESS_TEXT,Cell=cell12322 

Output .txt file:

LOTS_OF_USELESS_TEXT,Cell=cell20774  // 52505 was removed 
LOTS_OF_USELESS_TEXT,Cell=cell22312 
LOTS_OF_USELESS_TEXT,Cell=cell20233 
LOTS_OF_USELESS_TEXT,Cell=cell12322 

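So I want the script to read list.log line by line, take each value/string and look for it in every .txt file. If it is found, delete that line from the file and overwrite the file; if it is not found, move on to the next value/line from list.log. Basically the .txt files are lists of cells, list.log is a list of exceptions, and I want to remove the exceptions from the .txt files.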

I hope I explained it well this time.

+0

OK, I got it working in the end, but it is a lot of code for something so simple: – cserbanesc

Answers

2

How about:

perl -ani.back -e 'print unless /The text to be search/' list_of_files_to_process 

This deletes the lines containing The text to be search and saves the original files with the extension .back.
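
For readers without Perl at hand, roughly the same in-place edit with a backup copy can be sketched in Python using the standard `fileinput` module. The function name `delete_matching_lines` and its parameters are assumptions made for this illustration and are not part of the answer above. The search text is treated as a plain substring here, not as a regex.

```python
import fileinput


def delete_matching_lines(paths, text, backup_ext=".back"):
    """Remove every line containing `text` from the files in `paths`.

    Each original file is kept next to the edited one with `backup_ext`
    appended to its name, similar to perl's -i.back switch.
    """
    paths = list(paths)
    if not paths:
        # For an empty list fileinput would read standard input instead.
        return
    with fileinput.input(files=paths, inplace=True, backup=backup_ext) as stream:
        for line in stream:
            if text not in line:
                # With inplace=True, standard output is redirected into the file.
                print(line, end="")
```

A call such as `delete_matching_lines(["a.txt", "b.txt"], "52505")` would rewrite both files and leave `a.txt.back` and `b.txt.back` behind; the file names are placeholders.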

Edit

perl -ani.back -e 'BEGIN{open $fh,"f.log";@l=<$fh>;chomp@l;$r=join("|",@l)}print unless /\b$r\b/' *.txt 
+0

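Thanks for the quick reply. The problem is that the "text to be searched" has to be loaded from a file (list.txt, let's say). So, for example, it takes the first line from list.log, looks for that string in the .txt files and deletes the line where the string is found, and so on for the next line from the list – cserbanesc

+0

@cserbanesc: I'm not sure I understand your needs. Could you edit your question and add some sample input files and the expected result? – Toto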

+0

I edited the question as requested. I hope I explained it well this time! :) – cserbanesc

1

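Using Python, the following should work. It uses regular expressions. The list of patterns is read and the patterns are joined with "or" into one big regex. Then each file is read line by line; if the pattern does not match, the line is written to a new file, otherwise it is not written. The script expects the first command-line argument to be the pattern file and all following arguments to be the names of the files to process.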

import re 
import sys 

# The pattern file contains a list of patterns, one per line. 
# The lines are stripped (line breaks removed), blank lines are skipped, 
# and the rest are joined with the regex "or" operator. 
with open(sys.argv[1]) as patternfile: 
    pattern = re.compile('|'.join(p.strip() for p in patternfile if p.strip())) 

# loop over all files given 
for f in sys.argv[2:]: 
    with open(f, 'r') as infile: 
        fout = infile.name + '.new' 
        # open the output file under the new name 
        with open(fout, 'w') as outfile: 
            # loop over the lines of the input file 
            for line in infile: 
                # keep only the lines that the pattern does not match 
                if pattern.search(line) is None: 
                    outfile.write(line) 

If you want the original file to be removed, the script has to be modified accordingly.
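
One possible way to do that last step is to write the filtered copy first and then move it over the original. The sketch below assumes Python 3.3 or later for `os.replace`. The helper name `filter_in_place` is an illustrative choice and not part of the script above, and `pattern` is expected to be a compiled regex such as the one built from the pattern file.

```python
import os


def filter_in_place(path, pattern):
    """Rewrite `path` without the lines that `pattern` matches.

    The surviving lines go to a temporary "<path>.new" file, which then
    replaces the original. Returns the number of lines removed.
    """
    tmp_path = path + ".new"
    removed = 0
    with open(path, "r") as infile, open(tmp_path, "w") as outfile:
        for line in infile:
            if pattern.search(line) is None:
                outfile.write(line)
            else:
                removed += 1
    # os.replace overwrites an existing destination, also on Windows.
    os.replace(tmp_path, path)
    return removed
```

Writing to a separate file and renaming it afterwards means the original stays untouched until the filtered copy is complete.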

+0

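Hi, thanks for the reply! This Python script does what you said: "each file is read line by line; if the pattern does not match, the line is written to a new file, otherwise it is not". I need something more like: "each file is read line by line; if the pattern matches, delete that line from the file, otherwise read the next line/value from the pattern list" – cserbanesc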

+0

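The question has already been marked as answered, but I still want to give an update. By joining the patterns, you search each line for all patterns simultaneously. You read all the patterns only once and then search for them in every line of the files. Also, you keep mentioning deleting lines in the original file, but you can just as well create a new file without the matching lines. Afterwards, if you like, you can move the new file to the old file's location. – Denis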
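
The point about joining the patterns can be illustrated with the sample data from the question. This is only a sketch: the helper name `build_exception_regex` is made up for the example, and `re.escape` is added on the assumption that the list holds literal cell numbers, not regular expressions.

```python
import re


def build_exception_regex(values):
    """Join literal values into one alternation that matches any of them."""
    return re.compile("|".join(re.escape(v) for v in values if v))


exceptions = ["53737", "52505", "13211", "21412", "21313", "23123"]
lines = [
    "LOTS_OF_USELESS_TEXT,Cell=cell52505",
    "LOTS_OF_USELESS_TEXT,Cell=cell20774",
    "LOTS_OF_USELESS_TEXT,Cell=cell22312",
    "LOTS_OF_USELESS_TEXT,Cell=cell20233",
    "LOTS_OF_USELESS_TEXT,Cell=cell12322",
]

pattern = build_exception_regex(exceptions)
# A single search per line checks all exceptions at once.
kept = [line for line in lines if pattern.search(line) is None]
```

With this data, `kept` holds the four lines other than the `cell52505` one, which matches the expected output file in the question.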

+0

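Ah, you are right. That was my mistake (I'm sorry, I misread); the answer you provided is correct as well. Can I mark both as answers? Thank you very much, I found your answer very useful. – cserbanesc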
