Matching a regular expression across multiple lines in AWK. The && operator?

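I'm not sure whether the && operator works in regular expressions. What I'm trying to do is match a line that starts with a number and has the letter 'a', where the next line starts with a number and has the letter 'b', and the line after that ... the letter 'c'. This abc sequence will be used as a unique identifier marking where to start reading the file.

Here is what I would like to do in awk.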

/(^[0-9]+ .*a)&&\n(^[0-9]+ .*b)&&\n(^[0-9]+ .*c) { 
print $0 
}

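Each of these regular expressions works on its own, for example just (^[0-9]+ .*a), but I don't know how to string them together so that they mean "and the next line is this".

My file looks like this: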

JUNK UP HERE NOT STARTING WITH NUMBER 
1  a   0.110  0.069   
2  a   0.062  0.088   
3  a   0.062  0.121   
4  b   0.062  0.121   
5  c   0.032  0.100   
6  d   0.032  0.100   
7  e   0.032  0.100

And what I want is:

3  a   0.062  0.121   
4  b   0.062  0.121   
5  c   0.032  0.100   
6  d   0.032  0.100   
7  e   0.032  0.100

Source

2012-10-03 chimpsarehungry

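In your case, since your "clauses" (the three conditions you want together) don't overlap, you don't really need any operator at all; just "eat up" the rest of each line, as @m.buettner suggests. In a case where the conditions _do_ overlap, say if you wanted to check that a line contains both a symbol and a digit (but you don't know in which order), you would use so-called "lookahead assertions" to achieve the match. –

The only lookahead I know of is the next() function in Python. I tried that in my answer below. – chimpsarehungry

I'm not familiar with Python, but what I'm talking about are the lookahead and lookbehind constructs, which I know Python supports: http://www.regular-expressions.info/lookaround.html. –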
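To illustrate the "overlapping conditions" case from the comments above, here is a minimal Python sketch of lookahead assertions. The pattern and the function name `has_digit_and_symbol` are made up for this example, and "symbol" is assumed to mean any character that is not a letter, digit, or whitespace. Note that awk's POSIX extended regular expressions have no lookaround, so this technique applies to engines such as Python's `re`, not to awk itself.

```python
import re

# Hypothetical pattern: each lookahead scans the whole line from position 0,
# so the two conditions may be satisfied in either order.
HAS_DIGIT_AND_SYMBOL = re.compile(r'^(?=.*[0-9])(?=.*[^A-Za-z0-9\s])')

def has_digit_and_symbol(line):
    """Return True if the line contains at least one digit and one symbol."""
    return HAS_DIGIT_AND_SYMBOL.search(line) is not None

for sample in ["abc 1 #", "# then 7", "only 123", "only !!!"]:
    print(repr(sample), "->", has_digit_and_symbol(sample))
```

Because neither lookahead consumes any characters, both are tested from the same starting position, which is what lets the conditions overlap.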

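[Updated based on clarification]

One high-order point is that AWK is a line-oriented language, so you won't really be able to do a normal pattern match that spans lines. The usual approach is to match each line separately and have a later clause or statement work out whether all the right pieces have been matched.

What I'm doing here is looking for an a in the second field on one line, a b in the second field on another line, and a c in the second field on a third line. In the first two cases, I stash away the contents of the line along with the line number on which it occurred. When the third line matches and we haven't yet found the whole sequence, I go back and check whether the other two lines were seen and have acceptable line numbers. If everything checks out, I print the buffered previous lines and set a flag indicating that everything else should be printed.

Here's the script: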

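# Remember the most recent "a" and "b" lines and where they occurred.
$2 == "a" { a = $0; aLine = NR }
$2 == "b" { b = $0; bLine = NR }
# On a "c" line, start printing only if "a" and "b" were the two lines just before it.
$2 == "c" && !keepPrinting {
    if ((bLine == (NR - 1)) && (aLine == (NR - 2))) {
        print a
        print b
        keepPrinting = 1
    }
}
# Once the sequence has been seen, print every line (including this "c" line).
keepPrinting { print }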

And here's a file I tested it on:

JUNK UP HERE NOT STARTING WITH NUMBER 
1  a   0.110  0.069 
2  a   0.062  0.088 
3  a   0.062  0.121 
4  b   0.062  0.121 
5  c   0.032  0.100 
6  d   0.032  0.100 
7  e   0.032  0.100 
8  a   0.099  0.121 
9  b   0.098  0.121 
10 c   0.097  0.100 
11 x   0.000  0.200

And here's what I get when I run it:

$ awk -f blort.awk blort.txt 
3  a   0.062  0.121 
4  b   0.062  0.121 
5  c   0.032  0.100 
6  d   0.032  0.100 
7  e   0.032  0.100 
8  a   0.099  0.121 
9  b   0.098  0.121 
10 c   0.097  0.100 
11 x   0.000  0.200

Source

2012-10-04 00:44:06 danfuzz

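This is similar to what I want. I should have mentioned that in my file abc will be a unique sequence. I will use it as the starting point for reading. So the output I want from your test file is the lines with a, b, c, d, e, a, b, c, x. – chimpsarehungry

I updated my answer based on your comment. The state machine solution you posted is interesting from an academic standpoint, but maybe something like this is more practical? – danfuzz

Thanks danfuzz. Your script is easier to explain to my boss than my state machine. All I did was add {if ((keepPrinting > 0) && (++keepPrinting <= 50)) print $0} to get the number of lines I want after the match. – chimpsarehungry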

No, it doesn't work that way. You could try something like this:

/(^[0-9]+.*a[^\n]*)\n([0-9]+.*b[^\n]*)\n([0-9]+.*c[^\n]*)/

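And repeat the pattern for as many letters as you need.

[^\n]* will match as many non-newline characters as possible (that is, everything up to the line break).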
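A regex like this can only match if the engine sees several lines in one string. By default awk tests a pattern against one record (line) at a time, so a record never contains a newline, which would explain getting no output. As a hedged illustration of the same idea in Python, where the whole text is held in one string, the sketch below uses `re.MULTILINE` so that `^` anchors at each line start. The pattern is a variant that checks the second field explicitly, and the name `from_abc` is invented for this example.

```python
import re

# Assumed pattern: three consecutive numbered lines whose second field
# starts with a, b, c. With re.MULTILINE, ^ matches at every line start.
ABC = re.compile(
    r'^[0-9]+[ \t]+a[^\n]*\n'
    r'[0-9]+[ \t]+b[^\n]*\n'
    r'[0-9]+[ \t]+c[^\n]*',
    re.MULTILINE,
)

def from_abc(text):
    """Return the text from the first a/b/c run onward, or '' if none."""
    m = ABC.search(text)
    return text[m.start():] if m else ""

sample = (
    "JUNK UP HERE NOT STARTING WITH NUMBER\n"
    "1  a   0.110  0.069\n"
    "2  a   0.062  0.088\n"
    "3  a   0.062  0.121\n"
    "4  b   0.062  0.121\n"
    "5  c   0.032  0.100\n"
    "6  d   0.032  0.100\n"
)
print(from_abc(sample), end="")
```

Since `[^\n]*` cannot cross a line break, each part of the pattern stays on its own line, and the explicit `\n` is the only thing that joins them.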

Source

2012-10-03 23:46:57

Nope. Thanks for letting me know, though – chimpsarehungry

What do you get? –

Nothing. – chimpsarehungry

I'd like to do this in Python, by creating an iterator over the lines and trying to match the next few lines with next().

lines = iter([line for line in open("FILE").readlines() if re.match(r'^([0-9])',line)]) 

for line in lines:
    count = 50
    if line.find('a'):
        if next(lines).find('b'):
            if next(lines).find('c'):
                while count > 0:
                    print line
                    count -= 1

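But it just isn't right. Ideally, I would find the match and print the next 50 lines starting from the 'a' line. Maybe I need to implement some kind of state machine.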
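Several things likely go wrong in the attempt above. `str.find` returns an index, which is -1 (truthy) on a miss and 0 (falsy) on a hit at the start of the string. Each `next()` call consumes a line that can then no longer start a new match. The inner loop also prints the same line 50 times. One possible fix, sketched here for Python 3, keeps a sliding window of the most recent lines instead. The helper names `second_field` and `lines_from_sequence` are invented for this example, and the letter is assumed to be the second whitespace-separated field.

```python
import re
from collections import deque
from itertools import chain, islice

def second_field(line):
    """Return the second whitespace-separated field, or None if absent."""
    parts = line.split()
    return parts[1] if len(parts) > 1 else None

def lines_from_sequence(lines, letters=("a", "b", "c"), limit=50):
    """Yield up to `limit` lines, starting at the first run of consecutive
    numbered lines whose second fields spell out `letters`."""
    # As in the attempt above, lines not starting with a digit are dropped
    # first, so "consecutive" means consecutive among numbered lines.
    numbered = (ln for ln in lines if re.match(r"[0-9]", ln))
    window = deque(maxlen=len(letters))  # sliding window of recent lines
    for ln in numbered:
        window.append(ln)
        if [second_field(w) for w in window] == list(letters):
            # Emit the buffered lines, then whatever follows, capped at limit.
            yield from islice(chain(window, numbered), limit)
            return

sample = [
    "JUNK UP HERE NOT STARTING WITH NUMBER",
    "1  a   0.110  0.069",
    "2  a   0.062  0.088",
    "3  a   0.062  0.121",
    "4  b   0.062  0.121",
    "5  c   0.032  0.100",
    "6  d   0.032  0.100",
    "7  e   0.032  0.100",
]
for ln in lines_from_sequence(sample):
    print(ln)
```

The window never holds more than `len(letters)` lines, so the file can be streamed rather than read into memory; passing an open file object instead of a list should work, with the caveat that each line then keeps its trailing newline.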

Source

2012-10-04 16:55:52 chimpsarehungry

A friend wrote this awk program for me. It's a state machine. It works.

#!/usr/bin/awk -f 

BEGIN { 
    # We start out in the "idle" state. 
    state = "idle" 
} 

/^[0-9]+[[:space:]]+q/ {
    # Every time we encounter a "# q", we either print it (if already
    # printing) or go to the "q_found" state.
    if (state != "printing") {
        state = "q_found"
        line_q = $0
    }
}

/^[0-9]+[[:space:]]+r/ {
    # If we are in the q_found state and "# r" immediately follows,
    # advance to the r_found state. Else, return to "idle" and
    # wait for the "# q" to start us off.
    if (state == "q_found") {
        state = "r_found"
        line_r = $0
    } else if (state != "printing") {
        state = "idle"
    }
}

/^[0-9]+[[:space:]]+l/ {
    # If we are in the r_found state and "# l" immediately follows,
    # advance to the l_found state. Else, return to "idle" and
    # wait for the "# q" to start us off.
    if (state == "r_found") {
        state = "l_found"
        line_l = $0
    } else if (state != "printing") {
        state = "idle"
    }
}

/^[0-9]+[[:space:]]+i/ {
    # If we are in the l_found state and "# i" immediately follows,
    # we're ready to start printing. First, display the lines we
    # squirrelled away, then move to the "printing" state. Else,
    # go to "idle" and wait for the "# q" to start us off.
    if (state == "l_found") {
        state = "printing"
        print line_q
        print line_r
        print line_l
        line = 0
    } else if (state != "printing") {
        state = "idle"
    }
}

/^[0-9]+[[:space:]]+/ {
    # While in the "printing" state, print 47 more lines; with the three
    # buffered lines above, that makes 50 lines in total.
    if (state == "printing") {
        if (++line < 48) print
    }
}
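For comparison, the same state machine can be written in a table-driven way, where the "state" is simply how many letters of the sequence have been matched so far. The following Python sketch is an illustration, not a port: the name `run_state_machine` is invented, the letters are assumed to be distinct, and unlike the awk program (where a numbered line with some other letter leaves the state unchanged) it resets on any line that does not continue the sequence.

```python
import re

def run_state_machine(lines, letters=("q", "r", "l", "i"), limit=50):
    """Collect up to `limit` lines, starting at the first run of numbered
    lines whose letters appear in the order given by `letters`."""
    pending = []      # lines matched so far; len(pending) is the state
    output = []
    printing = False
    for line in lines:
        # A numbered line: digits, whitespace, then the letter of interest.
        m = re.match(r"[0-9]+\s+(\S)", line)
        letter = m.group(1) if m else None
        if printing:
            if m:
                output.append(line)
        elif letter == letters[len(pending)]:
            pending.append(line)          # advance to the next state
            if len(pending) == len(letters):
                output.extend(pending)    # flush the buffered lines
                printing = True
        elif letter == letters[0]:
            pending = [line]              # restart from the first letter
        else:
            pending = []                  # back to "idle"
        if len(output) >= limit:
            break
    return output[:limit]

sample = ["x", "1 q", "2 q", "3 r", "4 l", "5 i", "6 z", "junk", "7 y"]
for ln in run_state_machine(sample):
    print(ln)
```

Changing the sequence or the line limit then only means passing different `letters` or `limit` arguments, rather than adding another rule per letter.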

Source

2012-10-04 18:44:11 chimpsarehungry
