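Converting a regex to a sed or grep regex

I don't know why this doesn't work. Here is the regex: 'text\' => '.*?', and I want to use grep or sed to capture estrenos and cine from the following malformed text. This is what I tried with grep: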

echo "sadsa d{        'text' => 'cine',        'indices' => [           111,           116           ]       },       {        'text' => 'estrenos',        'indices' => [ sSADW" | grep -Eo "'text\' => '.*?',"


2017-09-10 user3639557

Just use awk:

$ awk -v RS='}' -F\' '{print $4}' file 
cine 
estrenos

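This will work using any awk in any shell on any UNIX box. It works regardless of white space, so it doesn't matter whether your input is on one line or spread across multiple lines, or how many blanks or tabs occur anywhere on each line.

Here's how it works:

awk treats all input as records separated into fields. Your input (with spaces compressed for readability):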

sadsa d{ 'text' => 'cine', 'indices' => [ 111, 116 ] }, { 'text' => 'estrenos', 'indices' => [ sSADW

clearly has { ... } records:

Record 1:

{ 'text' => 'cine', 'indices' => [ 111, 116 ] }

Record 2:

{ 'text' => 'estrenos', 'indices' => [ sSADW

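so we can set the record separator to } (with -v RS='}'). I'm assuming your last record will also end in }, but if it doesn't that's fine: awk treats end of file like the end of a record. We can ignore the text before each { (i.e. the "sadsa d" before the first record and the "," between the 2 records). That text becomes part of the first field, but we aren't using that field for anything, so it doesn't matter.

So, given the above 2 records, if we split them into fields at every ' (with -F\'), then we get: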

$ awk -v RS='}' -F\' '{for (i=1; i<=NF;i++) print "Record Nr", NR, "Field Nr", i, "Field Contents: <" $i ">"; print "----" 
}' file 
Record Nr 1 Field Nr 1 Field Contents: <sadsa d{ > 
Record Nr 1 Field Nr 2 Field Contents: <text> 
Record Nr 1 Field Nr 3 Field Contents: < => > 
Record Nr 1 Field Nr 4 Field Contents: <cine> 
Record Nr 1 Field Nr 5 Field Contents: <, > 
Record Nr 1 Field Nr 6 Field Contents: <indices> 
Record Nr 1 Field Nr 7 Field Contents: < => [ 111, 116 ] > 
---- 
Record Nr 2 Field Nr 1 Field Contents: <, { > 
Record Nr 2 Field Nr 2 Field Contents: <text> 
Record Nr 2 Field Nr 3 Field Contents: < => > 
Record Nr 2 Field Nr 4 Field Contents: <estrenos> 
Record Nr 2 Field Nr 5 Field Contents: <, > 
Record Nr 2 Field Nr 6 Field Contents: <indices> 
Record Nr 2 Field Nr 7 Field Contents: < => [ sSADW 
> 
----

So you can see that the value you want is always simply the 4th field.
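The walkthrough above can be tried end to end with the sketch below. It is only an illustration under a few assumptions: the function name `extract_text` and the variable `sample` are made up for the demo, the input is piped in rather than read from `file`, and the `NF >= 4` guard is an addition of mine (not part of the answer) that skips records with no quoted value instead of printing an empty line for them.

```shell
# Sample input from the question, stored in a variable for the demo.
sample="sadsa d{        'text' => 'cine',        'indices' => [           111,           116           ]       },       {        'text' => 'estrenos',        'indices' => [ sSADW"

# extract_text (hypothetical helper): split the input into records at every '}',
# split each record into fields at every single quote, and print field 4.
# The NF >= 4 guard (an assumption added here) ignores records that have
# fewer than 4 fields, e.g. trailing text after the last '}'.
extract_text() {
    printf '%s\n' "$1" | awk -v RS='}' -F\' 'NF >= 4 { print $4 }'
}

extract_text "$sample"
```

For the sample input this prints cine and estrenos on separate lines, matching the output shown in the answer.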


2017-09-10 15:51:28

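Can you break it down? – user3639557

I added an explanation, let me know if you have any questions. –

Hell of an explanation. Great. And Ed, don't you think using extended grep is also a good option, since grep is mainly meant for this purpose? For example: `egrep -o "'text' => '\w+'" file | cut -d"'" -f4`? If not, why? – batMan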

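Remove the escape character before the single quote. However, since extended regular expressions don't support non-greedy matching, you may want to use Perl-compatible regular expressions instead:

grep -Po "'text' => '.*?',"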


2017-09-10 15:05:06 Sjon

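Nice! But this returns `'text' => 'cine'`, and I only want `cine` – user3639557

@user3639557 you can modify it to `grep -Po "'text' => '\K[^']+"` or, for robustness, `grep -Po "'text' => '\K[^']+(?=',)"` – Sundeep

You should mention that this is GNU grep only, and that according to the GNU grep man page `-P` is "highly experimental", so YMMV using it. –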
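To make the two grep routes from this answer and its comments concrete, here is a minimal sketch. The function names and the `sample` variable are hypothetical. The first function avoids `-P` by replacing the lazy `.*?` with the negated bracket expression `[^']*`, which extended regular expressions do support, and then uses `cut` to keep the 4th quote-delimited field. The second is Sundeep's `\K` variant wrapped in a function; it needs a grep built with PCRE support. Note that `-o` is itself not specified by POSIX, although GNU and BSD grep both provide it.

```shell
# Sample input from the question, stored in a variable for the demo.
sample="sadsa d{        'text' => 'cine',        'indices' => [           111,           116           ]       },       {        'text' => 'estrenos',        'indices' => [ sSADW"

# ERE variant (hypothetical helper): [^']* cannot run past the closing quote,
# so it behaves like the non-greedy .*? that ERE lacks.
# cut then keeps field 4 of each match such as 'text' => 'cine'.
extract_text_ere() {
    printf '%s\n' "$1" | grep -Eo "'text' => '[^']*'" | cut -d"'" -f4
}

# PCRE variant (hypothetical helper, requires grep -P support):
# \K discards everything matched so far, so only the value is printed.
extract_text_pcre() {
    printf '%s\n' "$1" | grep -Po "'text' => '\K[^']+"
}

extract_text_ere "$sample"
```

Where `grep -P` is available, `extract_text_pcre "$sample"` should give the same two values without the extra `cut` step.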

tr + sed approach:

(assuming your input text is in the variable $s)

sed -n "s/.*'text' => '\([^']*\)'.*/\1/p" <(tr ',' '\n' <<< "$s")

Output:

cine 
estrenos
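The command above relies on process substitution and a here-string, which are bash features. As a sketch of the same idea with plain pipes (so it should also run under a POSIX sh), assuming the input is held in `$s` and using a made-up function name:

```shell
# Sample input from the question, stored in $s as the answer assumes.
s="sadsa d{        'text' => 'cine',        'indices' => [           111,           116           ]       },       {        'text' => 'estrenos',        'indices' => [ sSADW"

# extract_text_sed (hypothetical helper): put each comma-separated chunk on its
# own line, then print only the captured value from lines containing 'text' => '...'.
extract_text_sed() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/.*'text' => '\([^']*\)'.*/\1/p"
}

extract_text_sed "$s"
```

One caveat of splitting on commas: a value that itself contains a comma would be broken across two lines and would no longer match, so this variant assumes comma-free values.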


2017-09-10 16:05:21 RomanPerekhrest
