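I have a question. I have a large data file with many columns and rows. The first columns are separated by tabs, and the last part is a list of fields separated by ";". I want to extract the first five columns, plus the AF= and EUR_AF= fields from the ";"-separated part, and write them to a new file.
Example rows from the file: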
13 19020013 rs181615907 C T 100 PASS AA=.;AC=83;AF=0.12;AFR_AF=0.05;AMR_AF=0.15;AN=758;ASN_AF=0.17;AVGPOST=0.8701;ERATE=0.0007;EUR_AF=0.11;LDAF=0.1423;RSQ=0.6009;SNPSOURCE=LOWCOV;THETA=0.0051;VT=SNP
13 19020047 rs186129910 A . 100 PASS AA=.;AC=0;AF=0.0005;AFR_AF=0.0020;AN=758;AVGPOST=0.9992;ERATE=0.0005;LDAF=0.0008;RSQ=0.4992;SNPSOURCE=LOWCOV;THETA=0.0112;VT=SNP
13 19020095 rs140871821 C T 100 PASS AA=.;AC=38;AF=0.05;AFR_AF=0.08;AMR_AF=0.05;AN=758;ASN_AF=0.03;AVGPOST=0.9904;ERATE=0.0005;EUR_AF=0.05;LDAF=0.0538;RSQ=0.9245;SNPSOURCE=LOWCOV;THETA=0.0069;VT=SNP
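For reference, a quick way to check how such a line splits when both tab and ";" are treated as field separators. This is only a sketch: the input is an abbreviated copy of the first sample row (the INFO part is shortened, which is an assumption for illustration), with tabs written as \t.

```shell
# Print every field with its index, splitting on tab or ";".
# The input line is a shortened copy of the first sample row.
fields=$(printf '13\t19020013\trs181615907\tC\tT\t100\tPASS\tAA=.;AC=83;AF=0.12;EUR_AF=0.11\n' |
    awk -F'[\t;]' '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$fields"
```

With this separator the AF= field lands in field 10 for the rows shown, while the position of EUR_AF= depends on which optional fields come before it, and it is absent in some rows.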
I tried this:
awk -F'[\t;]' ' NR > 30 {
    for (i = 1; i <= NF; i++) {
        if ($i ~ /EUR_AF/) {
            printf $1 " " $2 " " $3 " " $4 " " $5 " " $10 " " "%s ", $i
        }
    }
    print ""
}' head50.txt
Output:
13 19020013 rs181615907 C T AF=0.12 EUR_AF=0.11
13 19020095 rs140871821 C T AF=0.05 EUR_AF=0.05
13 19020145 rs57048904 G T AF=0.61 EUR_AF=0.73
13 19020341 rs184229798 C T AF=0.03 EUR_AF=0.09
13 19020627 rs12018140 A G AF=0.70 EUR_AF=0.71
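Problem: rows that have no EUR_AF field (such as the second row) are now missing from the output. I would like to see those rows as well, with only the AF field filled in, as shown below: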
13 19020013 rs181615907 C T AF=0.12 EUR_AF=0.11
13 19020047 rs186129910 A . AF=0.0005
13 19020095 rs140871821 C T AF=0.05 EUR_AF=0.05
13 19020145 rs57048904 G T AF=0.61 EUR_AF=0.73
13 19020341 rs184229798 C T AF=0.03 EUR_AF=0.09
13 19020627 rs12018140 A G AF=0.70 EUR_AF=0.71
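One way this output could be produced, as a minimal sketch rather than a definitive fix: remember the AF= and EUR_AF= fields while scanning each line, then print exactly once per line, so rows without EUR_AF are still emitted. The anchored patterns ^AF= and ^EUR_AF= keep fields such as AFR_AF= or LDAF= from matching. The two input lines are abbreviated copies of the sample rows (INFO part shortened), and the NR > 30 condition from the attempt is left out because this sample has no header lines. Both are assumptions for illustration.

```shell
# Two shortened sample rows: the first has EUR_AF=, the second does not.
out=$(printf '%s\n' \
    "$(printf '13\t19020013\trs181615907\tC\tT\t100\tPASS\tAA=.;AC=83;AF=0.12;AFR_AF=0.05;EUR_AF=0.11;VT=SNP')" \
    "$(printf '13\t19020047\trs186129910\tA\t.\t100\tPASS\tAA=.;AC=0;AF=0.0005;AFR_AF=0.0020;VT=SNP')" |
awk -F'[\t;]' '{
    af = ""; eur = ""                      # reset for every input line
    for (i = 8; i <= NF; i++) {            # fields 8 and up are the ";" part
        if ($i ~ /^AF=/)     af  = $i      # exact AF= field, not AFR_AF=
        if ($i ~ /^EUR_AF=/) eur = $i
    }
    line = $1 " " $2 " " $3 " " $4 " " $5  # first five columns
    if (af  != "") line = line " " af
    if (eur != "") line = line " " eur     # appended only when present
    print line                             # one output line per input line
}')
printf '%s\n' "$out"
```

On the real file the same awk program could be run with the NR > 30 condition put back in front of the opening brace, reading head50.txt and redirecting to an output file of your choice.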
I hope someone can help me.
Thanks in advance.
魯