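awk to update file if value within range

I have `file2` with ~1400 `$5` values in which the text before the `-` is "unknown". I am trying to update those "unknown" values in `file1` using the text in `$2` of `file2`. In `$1` of `file1` there is a range of numbers that can be used to update the "unknown" if the `$4` value falls within that range. I really don't know where to start, but the `awk` below is a start, or maybe there is a better way. Thank you :).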
file1
`$1` `$2`
chr6:3224495-3227968 TUBB2B
chr16:89988417-90002505 TUBB3
file2
chr16 89985657 89986630 chr16:89985657-89986630 MC1R-2270|gc=63.5
chr16 89989779 89989898 chr16:89989779-89989898 unknown-2271|gc=73.9
chr16 89998969 89999097 chr16:89998969-89999097 unknown-2272|gc=57
chr16 89999866 89999996 chr16:89999866-89999996 unknown-2273|gc=55.4
chr16 90001127 90002222 chr16:90001127-90002222 unknown-2274|gc=63.9
chr17 1173848 1174575 chr17:1173848-1174575 BHLHA9-3|gc=78.7
Desired output (`unknown` updated to `TUBB3` because the `$4` value is within the range of `$1`):
chr16 89985657 89986630 chr16:89985657-89986630 MC1R-2270|gc=63.5
chr16 89989779 89989898 chr16:89989779-89989898 TUBB3-2271|gc=73.9
chr16 89998969 89999097 chr16:89998969-89999097 TUBB3-2272|gc=57
chr16 89999866 89999996 chr16:89999866-89999996 TUBB3-2273|gc=55.4
chr16 90001127 90002222 chr16:90001127-90002222 TUBB3-2274|gc=63.9
chr17 1173848 1174575 chr17:1173848-1174575 BHLHA9-3|gc=78.7
AWK
awk '
NR == FNR {min[$1]=$4; next}
{
    for (id in min)
        if ([id] = $5 && [id]) {
            print $0, id
            break
        }
}
' file1 file2
Edit:
awk -v OFS='\t' 'NR==FNR{split($1,a,/[:-]/)
rstart[a[1]]=a[2]
rend[a[1]]=a[3]
value[a[1]]=$2
next}
$5~/unknown/ && $2>=rstart[$1] && $3<=rend[$1]
{sub(/unknown/,value[$1],$5)}1' file1 file2 |
column -t > output
chr16 89985657 89986630 chr16:89985657-89986630 MC1R-2270|gc=63.5
chr16 89989779 89989898 chr16:89989779-89989898 unknown-2271|gc=73.9
chr16 89989779 89989898 chr16:89989779-89989898 TUBB3-2271|gc=73.9
chr16 89998969 89999097 chr16:89998969-89999097 unknown-2272|gc=57
chr16 89998969 89999097 chr16:89998969-89999097 TUBB3-2272|gc=57
chr16 89999866 89999996 chr16:89999866-89999996 unknown-2273|gc=55.4
chr16 89999866 89999996 chr16:89999866-89999996 TUBB3-2273|gc=55.4
chr16 90001127 90002222 chr16:90001127-90002222 unknown-2274|gc=63.9
chr16 90001127 90002222 chr16:90001127-90002222 TUBB3-2274|gc=63.9
chr17 1173848 1174575 chr17:1173848-1174575 BHLHA9-3|gc=78.7
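The doubled rows in the output above are consistent with how awk parses the edited command. The condition `$5~/unknown/ && ...` ends its line without an action, so awk treats it as a complete rule and prints each matching line unchanged. The `{sub(...)}` block on the next line then becomes a separate rule that runs on every line, and the trailing `1` prints the line a second time. Below is a minimal sketch of one way to avoid this, not a definitive solution. It assumes the fields are whitespace-separated as in the samples and that a chromosome may have more than one region in `file1`; the `count` array and the heredoc copies of the sample data are choices made only for this illustration.

```shell
#!/bin/sh
# Sample data copied from the question so the sketch runs on its own.
cat > file1 <<'EOF'
chr6:3224495-3227968 TUBB2B
chr16:89988417-90002505 TUBB3
EOF

cat > file2 <<'EOF'
chr16 89985657 89986630 chr16:89985657-89986630 MC1R-2270|gc=63.5
chr16 89989779 89989898 chr16:89989779-89989898 unknown-2271|gc=73.9
chr16 89998969 89999097 chr16:89998969-89999097 unknown-2272|gc=57
chr16 89999866 89999996 chr16:89999866-89999996 unknown-2273|gc=55.4
chr16 90001127 90002222 chr16:90001127-90002222 unknown-2274|gc=63.9
chr17 1173848 1174575 chr17:1173848-1174575 BHLHA9-3|gc=78.7
EOF

awk '
NR == FNR {                       # first file: remember each region and its name
    split($1, a, /[:-]/)          # a[1]=chromosome, a[2]=start, a[3]=end
    n = ++count[a[1]]             # assumption: several regions per chromosome are possible
    rstart[a[1], n] = a[2]
    rend[a[1], n]   = a[3]
    value[a[1], n]  = $2
    next
}
$5 ~ /^unknown/ {                 # the action starts on the same line as its pattern
    for (i = 1; i <= count[$1]; i++)
        if ($2 + 0 >= rstart[$1, i] + 0 && $3 + 0 <= rend[$1, i] + 0) {
            sub(/^unknown/, value[$1, i], $5)
            break
        }
}
{ print }                         # print every line of the second file exactly once
' file1 file2 > output

cat output
```

With the sample data this gives six lines, with the four `unknown` entries on chr16 renamed to `TUBB3`, as in the desired output. Modifying `$5` makes awk rebuild that line using single spaces; passing `-v OFS='\t'` would join the modified lines with tabs instead.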
I think you mixed up 'file1' and 'file2' a few times in the text.