2015-07-10 40 views
2

For each line in my file, I want to print everything on that line that comes before the fourth dash. How can I split a string at the nth delimiter?

Input:

TCGA-HC-8216-10A-11D-A323-01 
TCGA-J4-8200-10A-11D-A323-01 
TCGA-EJ-A65E-10A-11D-A323-01 

I want to split each line at the fourth dash ("-").

Output:

TCGA-HC-8216-10A 
TCGA-J4-8200-10A 
TCGA-EJ-A65E-10A 

I know I can split on every dash like this:

#!/usr/bin/env bash 

IN="TCGA-HC-8216-01A-11D-A323-01 
TCGA-J4-8200-10A-11D-A323-01 
TCGA-EJ-A65E-10A-11D-A323-01" 

arr=$(echo $IN | tr "-" "\n") 

for x in $arr 
do 
echo "> [$x]" 
done 

But this splits on every dash and prints each piece of the string separately.

+0

Look at the 'cut' command and/or 'awk'.
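A minimal sketch of the awk route that the comment points to. The function name first_fields_awk is made up for illustration; it rebuilds each line from its first n dash-separated fields and leaves shorter lines unchanged.

```shell
#!/usr/bin/env bash

# Hypothetical helper: keep only the first N dash-separated fields of each
# line read from stdin (N defaults to 4).
first_fields_awk() {
    awk -F'-' -v n="${1:-4}" '{
        out = $1
        # Append fields 2..n, stopping early on lines with fewer fields.
        for (i = 2; i <= n && i <= NF; i++) out = out "-" $i
        print out
    }'
}

printf '%s\n' 'TCGA-HC-8216-10A-11D-A323-01' 'TCGA-J4-8200-10A-11D-A323-01' |
    first_fields_awk 4
# prints:
# TCGA-HC-8216-10A
# TCGA-J4-8200-10A
```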

Answers

4

Use cut:

cut -d- -f1-4 <<'EOF' 
TCGA-HC-8216-01A-11D-A323-01 
TCGA-J4-8200-10A-11D-A323-01 
TCGA-EJ-A65E-10A-11D-A323-01 
EOF 

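This cuts the input at the delimiter given with -d (here a dash) and returns the fields selected with -f: 1-4, that is, fields one through four.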
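The same cut call can be generalized to any delimiter and any n, and pointed at a file instead of a here-document. A minimal sketch, assuming a hypothetical helper name before_nth:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print everything before the N-th delimiter on each line.
# Usage: before_nth DELIM N [FILE...]   (reads stdin when no file is given)
before_nth() {
    local delim=$1 n=$2
    shift 2
    # -f1-N keeps fields 1..N; lines with fewer fields pass through unchanged.
    cut -d"$delim" -f"1-$n" -- "$@"
}

printf '%s\n' 'TCGA-HC-8216-10A-11D-A323-01' 'TCGA-J4-8200-10A-11D-A323-01' |
    before_nth - 4
# prints:
# TCGA-HC-8216-10A
# TCGA-J4-8200-10A
```

With the input stored in a file (the name samples.txt is only an example), the call would be: before_nth - 4 samples.txt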

1
#!/bin/bash 

IN="TCGA-HC-8216-01A-11D-A323-01 
TCGA-J4-8200-10A-11D-A323-01 
TCGA-EJ-A65E-10A-11D-A323-01" 

arr=$(echo "$IN" | cut -d '-' -f1-4) 

echo "$arr" 

Prints:

TCGA-HC-8216-01A 
TCGA-J4-8200-10A 
TCGA-EJ-A65E-10A 
0

Using grep with an ERE (extended regular expression):

arr=$(echo "$IN" | grep -oE "^([^-]*-){3}[^-]*") 

With a BRE (basic regular expression):

arr=$(echo "$IN" | grep -o "^\([^-]*-\)\{3\}[^-]*") 

Example:

#!/bin/bash 
IN="TCGA-HC-8216-01A-11D-A323-01 
TCGA-J4-8200-10A-11D-A323-01 
TCGA-EJ-A65E-10A-11D-A323-01" 

arr=$(echo "$IN" | grep -oE "^([^-]*-){3}[^-]*") 

for x in $arr 
do 
echo "> [$x]" 
done 

Output:

> [TCGA-HC-8216-01A] 
> [TCGA-J4-8200-10A] 
> [TCGA-EJ-A65E-10A] 
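The repetition count in the regex is n - 1 (three dash-terminated fields, then a fourth field), so the pattern can be built for any n. A minimal sketch with a hypothetical function name prefix_grep; note that, unlike cut, grep -o prints nothing for lines that have fewer than n fields.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the first N dash-separated fields of each stdin
# line using grep -o. Lines with fewer than N fields produce no output.
prefix_grep() {
    local n=$1
    # (n - 1) fields each followed by a dash, then one more field.
    grep -oE "^([^-]*-){$((n - 1))}[^-]*"
}

printf '%s\n' 'TCGA-HC-8216-01A-11D-A323-01' 'TCGA-J4-8200-10A-11D-A323-01' |
    prefix_grep 4
# prints:
# TCGA-HC-8216-01A
# TCGA-J4-8200-10A
```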
0

Using pure bash and pattern matching (the =~ regex operator):

#!/bin/bash  
IN="TCGA-HC-8216-01A-11D-A323-01 
TCGA-J4-8200-10A-11D-A323-01 
TCGA-EJ-A65E-10A-11D-A323-01" 

re='([^-]+-){3}[^-]+' 

for line in $IN 
do 
    if [[ $line =~ $re ]]; then 
        # Print only on a match, so a value left over from a previous line is never reused. 
        echo "${BASH_REMATCH[0]}" 
    fi 
done 

Output:

TCGA-HC-8216-01A 
TCGA-J4-8200-10A 
TCGA-EJ-A65E-10A 
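Another pure-bash option avoids the regex and lets read split each line on dashes. This is a minimal sketch that assumes every line has at least four fields (shorter lines would come out with trailing dashes); the function name first_four is made up for illustration.

```shell
#!/usr/bin/env bash

IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"

# Hypothetical helper: read lines from stdin, split each on "-", and print
# the first four fields joined by dashes again.
first_four() {
    local f1 f2 f3 f4 rest
    # IFS=- applies only to this read; everything after the fourth
    # dash lands in "rest" and is discarded.
    while IFS=- read -r f1 f2 f3 f4 rest; do
        printf '%s-%s-%s-%s\n' "$f1" "$f2" "$f3" "$f4"
    done
}

printf '%s\n' "$IN" | first_four
# prints:
# TCGA-HC-8216-01A
# TCGA-J4-8200-10A
# TCGA-EJ-A65E-10A
```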