2013-10-12

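Replace a specific character within a pattern

I'm looking for a way to remove a specific character from a string that matches a regex pattern. I'm storing text with newlines in a tab-delimited file that is supposed to have one record per line, and I'm trying to replace all of those newlines with spaces. Newlines do not occur in the last column (a short column with alphanumeric keys).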

IMHO the way to solve this would be to replace every instance of \n inside the following pattern:

[^\t]*\t[^\t]* 

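My solution so far uses three steps:

  1. Replace each "good" \n using s/\([^\t]*\t{x}[^\t]*\)\n/\1#12398754987235649876234#/g, where x is one less than the expected number of columns in my file, with a special string that does not occur in the rest of the text (e.g. a long number)
  2. Replace all remaining ("bad") \n with a space
  3. Replace the long number with a newline

But my text file is quite a few gigabytes, and I'm looking for a way to do this in a single sed step.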

Example input:

foo \t Each multiplex has screens allocated \n 
to each studio. \t abc \n 
bar \t The screens need filling. \t bcd \n 
123 \t Studios have to create product to fill \n 
their screen, and the amount of good product is limited. \t cde \n 

Output:

foo \t Each multiplex has screens allocated to each studio. \t abc \n 
bar \t The screens need filling. \t bcd \n 
123 \t Studios have to create product to fill their screen, and the amount of good product is limited. \t cde \n 
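
Because newlines never occur in the last column, a record can be treated as complete once the joined text holds one tab fewer than the expected number of columns. The following is a minimal single-pass sketch of that idea, written in awk rather than sed. The function name `join_records` and its column-count argument are hypothetical, and the sketch assumes every record has exactly that many columns.

```shell
# Hypothetical helper: join physical lines until a record has ncols-1 tabs.
# Assumption: every record has exactly ncols tab-separated columns.
join_records() {
  awk -v ncols="$1" '
    {
      # Append the current line to the pending record, separated by a space.
      buf = have ? (buf " " $0) : $0
      have = 1
      # gsub replaces each tab with itself and returns how many it found.
      if (gsub(/\t/, "&", buf) >= ncols - 1) {
        print buf
        buf = ""
        have = 0
      }
    }
    # Flush an incomplete trailing record, if any.
    END { if (have) print buf }
  '
}
```

For the three-column sample above this could be called as `join_records 3 < infile`. Only one record is held in memory at a time, so the size of the file should not matter.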
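
Literally, does the string start with a number? – Bohemian

No, I modified my example to remove the ambiguity. There is no pattern in the columns (they can be text, numbers, punctuation...), except for '\t', which is only used to separate columns. – ATN

You are trying to remove any '\n' that does not follow a ".", is that right? – Beta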

Answers

0

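Joining a line with the previous ones is always tricky in sed because of its limitations (a small number of buffers, no non-greedy quantifiers, no look-ahead, and so on), but here is one approach. It is commented, but I know it is not easy to follow: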

sed -n ' 
    ## Label "a" 
    :a; 
    ## Enter this section after join all lines without a tab. 
    /\t.*\t/ { 
     ## Loop to remove all newlines but the last one, because it is 
     ## next line with a tab that I dont want to print now. 
     :b; 
     /\n[^\n]*\n/ { 
      s/\n/ /; 
      bb 
     }; 
     ## Print until newline (all joined lines) and delete them 
     P; 
     D; 
    }; 
    ## Append next line to buffer and repeat loop. 
    N; 
    $! ba; 
    ## Special case for last line, remove extra newlines and print. 
    s/\n/ /g; 
    p 
' infile 

Assuming infile has the following content:

foo  Each multiplex has screens allocated 
to each studio. 
bar  The screens need filling. 
123  Studios have to create product to fill 
their screen, and the amount of good product is limited. 

It yields:

foo  Each multiplex has screens allocated to each studio. 
bar  The screens need filling. 
123  Studios have to create product to fill their screen, and the amount of good product is limited. 
1

Using awk

cat file 
foo  Each multiplex has screens allocated 
to each studio. 
bar  The screens need filling. 
123  Studios have to create product to fill 
their screen, and the amount of good product is limited. 

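If a line does not contain a tab (\t), it is joined to the previous line with a space; a line that contains a tab starts a new output line.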

awk 'NR>1 {s=/\t/?"\n":" "}{printf s"%s",$0} END {print ""}' file 
foo  Each multiplex has screens allocated to each studio. 
bar  The screens need filling. 
123  Studios have to create product to fill their screen, and the amount of good product is limited. 
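
The one-liner above is dense, so here is a spelled-out sketch of the same rule wrapped in a shell function. The name `join_untabbed` is hypothetical. The explicit `printf` format and the guard for empty input are small additions of mine, not part of the answer.

```shell
# Hypothetical wrapper around the same awk rule, one step per line.
join_untabbed() {
  awk '
    # From the second line on, choose what goes in front of the line:
    # a newline if it contains a tab (new record), otherwise a space.
    NR > 1 { sep = ($0 ~ /\t/) ? "\n" : " " }
    # Print the separator and the line without a trailing newline.
    { printf "%s%s", sep, $0 }
    # Terminate the final record, but print nothing for empty input.
    END { if (NR) print "" }
  ' "$@"
}
```

It can be run as `join_untabbed file`. Like the original, it treats any line containing a tab as the start of a record, so it fits the two-column sample shown here rather than data where continuation lines also contain tabs.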
1

This might work for you (GNU sed):

sed -r ':a;$!N;s/\n([^\t]+)$/\1/;ta;P;D' file 

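Read two lines into the pattern space (PS). If the second line does not contain a tab, remove the newline in front of it, read in the next line, and repeat. If the line does contain a tab, print the first line, delete it, and repeat.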
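
The steps described above can be laid out one command per line. This is only a sketch of the same script with comments added. The function name `join_tabless` is hypothetical, and it assumes GNU sed, which accepts `-E` and treats `\t` inside a bracket expression as a tab.

```shell
# Hypothetical wrapper; assumes GNU sed (for -E and \t inside brackets).
join_tabless() {
  sed -E '
    :a
    # Append the next input line unless this is the last one.
    $!N
    # If the text after the newline has no tab, drop that newline.
    s/\n([^\t]+)$/\1/
    # A successful substitution means a line was merged, so try the next one.
    ta
    # Otherwise print the finished first line and restart with the remainder.
    P
    D
  ' "$@"
}
```

Note that the substitution removes the newline without inserting a space, so words stay separated only if the broken lines end with a space, as they do in the sample input.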