Replace text delimited by a specific character

I have data of this type (all capital letters stand for strings):

>A|B|C|D|E|F 
test test test 
test test 
>A|B|C|D|E|F 
test test test 
test

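and I want to delete the C, D, and E fields, together with the | that delimits them. I have tried it with sed, but I'm unable to replace the text that occurs after a |.
Thanks in advance.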

Source

2013-09-29 Atticus

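So in your real data you have multi-character fields, and not the single-character fields shown, right? It might be useful to post sample input and expected output that reflect that, rather than single-character fields separated by "|". Just throwing it out there... –

Yes, you are right. The generic sample input might be confusing. – Atticus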

Perl one-liner:

perl -F'\|' -lane 'print /\|/ ? join "|", @F[0,1,5] : $_' file

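It splits each line on the | character and stores the values in the @F array. If the line contains |, it takes elements 0, 1, and 5 from @F and joins them with |; otherwise it prints the line unchanged.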
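For readers who do not use Perl, the same split-and-join logic can be sketched in Python. This is only an illustration, not part of the original answer: the function name `keep_fields` and its parameters are hypothetical, and it assumes every header line has at least six fields (Perl would silently yield empty values for missing fields, whereas this sketch would raise an IndexError).

```python
def keep_fields(line, keep=(0, 1, 5), sep="|"):
    """Keep only the chosen sep-delimited fields of a line.

    Lines that do not contain the separator are returned unchanged,
    mirroring the `/\\|/ ? ... : $_` test in the Perl one-liner.
    Assumes each delimited line has enough fields for every index in keep.
    """
    if sep not in line:
        return line
    fields = line.split(sep)
    return sep.join(fields[i] for i in keep)


sample = [">A|B|C|D|E|F", "test test test", "test test"]
result = [keep_fields(line) for line in sample]
print("\n".join(result))
```

Because the fields are selected by position, the same call also handles multi-character fields such as those in the real data.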

One-liner deparsed:

perl -MO=Deparse -F'\|' -lane 'print /\|/ ? join "|", @F[0,1,5] : $_' file 
BEGIN { $/ = "\n"; $\ = "\n"; }   # -l switch makes print to add newline 
LINE: while (defined($_ = <ARGV>)) { # -n switch 
    chomp $_;       # -l switch chomps newlines 
    our(@F) = split(/\|/, $_, 0);  # -a switch splits on value of -F switch 
    print /\|/ ? join('|', @F[0, 1, 5]) : $_; 
}

Source

2013-09-29 19:32:27

Thanks, it works! Could you briefly explain the code? – Atticus

sed works fine:

$ cat 1 
>A|B|C|D|E|F 
test test test 
test test 
>A|B|C|D|E|F 
test test test 
test 
$ sed 's/C|D|E|//' 1 
>A|B|F 
test test test 
test test 
>A|B|F 
test test test 
test

UPDATE

$ sed 's/^\([^|]*|[^|]*|\).*|/\1/' 1 
>A|B|F 
test test test 
test test 
>A|B|F 
test test test 
test
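The idea behind the updated sed command (keep the first two fields, then greedily delete everything up to the last |) can be sketched in Python as follows. This is an illustrative equivalent rather than the answerer's code: the names `HEADER` and `drop_middle` are hypothetical, and the pattern is written with `[^|]*` on the assumption that fields may be longer than one character.

```python
import re

# Group 1 captures the first two fields together with their trailing pipes.
# The greedy ".*\|" then consumes everything up to and including the last pipe,
# so only the final field survives after group 1.
HEADER = re.compile(r"^([^|]*\|[^|]*\|).*\|")


def drop_middle(line):
    """Remove every field between the second and the last one."""
    return HEADER.sub(r"\1", line, count=1)


for line in [">A|B|C|D|E|F", "test test test"]:
    print(drop_middle(line))
```

Note that this variant keeps the last field regardless of how many fields sit in the middle; lines with fewer than three | characters do not match and pass through unchanged.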

Source

2013-09-29 19:30:54 falsetru

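Maybe my question was unclear. A, B, C, D, E, F are strings. An example is: '>gene_8|GeneMark.hmm|322_aa|+|3803|4771TS28_contig03869'. I want to remove the fields regardless of what the strings contain. – Atticus

@Atticus, I added another piece of code. Check it out. – falsetru

Maybe gawk is suited for this: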

awk --re-interval -F'|' \
     'NF > 4{$0=gensub(/^(([^|]*\|){2})([^|]*\|){3}(.*)$/, "\\1\\4", -1)};
     {print}' file

Source

2013-09-29 19:39:37 iruvar

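FYI, you only need '--re-interval' in old gawk versions; it is the default behavior in recent gawks (no, I don't know when that changed, but it's been a while). Also, you don't need to set OFS, since you aren't recompiling the record, and you can just do '$0 = gensub(...)' and lose the intermediate variable 'z'. –

@EdMorton, good points, incorporated. I'm leaving '--re-interval' in because my version of GNU 'awk' (3.1.8) seems to need it. – iruvar

This should do it. The -i option specifies that the file is to be edited in place.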

perl -i.bak -pe 's/\|[CDE]//g' file

Or using sed:

sed -i.bak -re 's/\|[CDE]//g' file
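As a rough illustration of what in-place editing with a backup suffix does, here is a Python sketch built on the standard `fileinput` module. The function name `strip_fields_in_place` and its parameters are hypothetical. The default pattern is copied from the commands above, so it only matches a | followed by the single character C, D, or E, as in the simplified sample; it is not adapted to multi-character fields.

```python
import fileinput
import re


def strip_fields_in_place(path, pattern=r"\|[CDE]", backup=".bak"):
    """Rewrite path in place, deleting every match of pattern.

    The original content is preserved in path + backup, similar to
    the -i.bak option of perl and GNU sed.
    """
    regex = re.compile(pattern)
    with fileinput.input(files=[path], inplace=True, backup=backup) as stream:
        for line in stream:
            # With inplace=True, standard output is redirected into the file
            # being rewritten, so print() writes the edited line back to it.
            print(regex.sub("", line), end="")
```

Calling `strip_fields_in_place("file")` would leave the edited text in `file` and the untouched original in `file.bak`.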

Source

2013-09-29 19:47:59 hwnd

$ cat file 
>A|B|C|D|E|F 
test test test 
test test 
>A|B|C|D|E|F 
test test test 
test 
>gene_8|GeneMark.hmm|322_aa|+|3803|4771TS28_contig03869 
test test test 
test test 
$ 
$ sed -r 's/(([^|]+\|){2})(([^|]+\|){3})/\1/' file 
>A|B|F 
test test test 
test test 
>A|B|F 
test test test 
test 
>gene_8|GeneMark.hmm|4771TS28_contig03869 
test test test 
test test
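The interval-based pattern used in the sed command above translates almost directly to Python's `re` module. The sketch below is an illustration under stated assumptions: the names `PATTERN` and `drop_fields_3_to_5` are hypothetical, and a leading `^` anchor has been added, which the sed command does not have.

```python
import re

# ((?:[^|]+\|){2})  captures the first two non-empty fields with their pipes.
# (?:[^|]+\|){3}    matches exactly the next three fields, which are discarded.
PATTERN = re.compile(r"^((?:[^|]+\|){2})(?:[^|]+\|){3}")


def drop_fields_3_to_5(line):
    """Delete the third, fourth, and fifth |-delimited fields of a line."""
    return PATTERN.sub(r"\1", line, count=1)


for line in [
    ">A|B|C|D|E|F",
    ">gene_8|GeneMark.hmm|322_aa|+|3803|4771TS28_contig03869",
    "test test test",
]:
    print(drop_fields_3_to_5(line))
```

Unlike a greedy `.*`, the `{3}` quantifier removes exactly three fields, so any fields after the sixth would be kept as well; lines with fewer than five | characters do not match and are left alone.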

Source

2013-09-29 20:17:48

awk works fine as well:

awk '{sub(/C\|D\|E\|/,"")}1' file 
>A|B|F 
test test test 
test test 
>A|B|F 
test test test 
test

Source

2015-11-13 18:16:41
