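One way using awk.

It's not a simple script. The process in short: the key point is the variable 'all_ranges'. While it is reset (0), the script reads from the ranges file and saves the ranges for the current id in an array. Once it is set (1), the script stops reading ranges and goes back to the 'id-position' file, checks each position against the data in the array, and prints the line if the position falls within a range. I've tried to avoid processing the ranges file many times and to do it by chunks instead, which makes it more complex.

EDIT to add that I assume the id field is sorted in both files. Otherwise this script will fail and you will need another approach.

Content of script.awk: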
BEGIN {
    ## Arguments:
    ##   ARGV[0] = awk
    ##   ARGV[1] = <first_input_argument>
    ##   ARGV[2] = <second_input_argument>
    ##   ARGC    = 3
    ## Take the ranges file off the argument list so awk does not read it
    ## as regular input; it is read with getline instead.
    f2 = ARGV[ --ARGC ];
    all_ranges = 0

    ## Read first line from file with ranges to get 'class' header.
    getline line <f2
    split(line, fields)
    class_header = fields[2];
}

## Special case for the header.
FNR == 1 {
    printf "%s\t%s\n", $0, class_header;
    next;
}

## Data.
FNR > 1 {
    while (1) {
        if (! all_ranges) {
            ## Read line from file with range positions.
            ret = getline line <f2

            ## Check error (ERRNO is a gawk extension).
            if (ret == -1) {
                printf "%s\n", "ERROR: " ERRNO
                close(f2);
                exit 1;
            }

            if (ret == 0) {
                ## End of the ranges file: stop reading and fall through to
                ## the lookup with the ranges already stored. Breaking here
                ## would print only the final line of the last id.
                all_ranges = 1
                range_id = ""
            }
            else {
                ## Split line in spaces.
                num = split(line, fields)
                if (num != 4) {
                    printf "%s\n", "ERROR: Bad format of file " f2;
                    exit 2;
                }

                range_id = fields[1];
                if ($1 == fields[1]) {
                    ## Same id as the current position line: save the range.
                    ranges[ fields[3], fields[4] ] = fields[2];
                    continue;
                }
                else {
                    ## First range of the next id: the chunk is complete.
                    all_ranges = 1
                }
            }
        }

        ## The position file reached the id of the range read ahead:
        ## start a new chunk with that range and keep reading.
        if (range_id != "" && range_id == $1) {
            delete ranges;
            ranges[ fields[3], fields[4] ] = fields[2];
            all_ranges = 0;
            continue;
        }

        ## Print the line with the class of the first range that contains it.
        for (range in ranges) {
            split(range, pos, SUBSEP)
            if ($2 >= pos[1] && $2 <= pos[2]) {
                printf "%s\t%s\n", $0, ranges[ range ];
                break;
            }
        }
        break;
    }
}
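
Since the script relies on both files being sorted by id, a quick check before running it may save some confusion. What follows is only a sketch: `check_sorted_by_id` is a hypothetical helper name, and it assumes each file has a single header line and whitespace-separated fields with the id in the first column.

```shell
# Hypothetical helper: succeed only if the ids (column 1) of a file are in
# sorted order, ignoring the header line.
check_sorted_by_id() {
    tail -n +2 "$1" | awk '{ print $1 }' | LC_ALL=C sort -c 2>/dev/null
}

# Usage sketch: report which input would need sorting first.
for f in file1 file2; do
    [ -f "$f" ] || continue
    check_sorted_by_id "$f" || echo "$f is not sorted by id" >&2
done
```

The check uses byte order (`LC_ALL=C`) so that the result does not depend on the locale.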
Run it like:
awk -f script.awk file1 file2 | column -t
With the following result:
id  position  class
a1  21        Xfact
a1  39        Xfact
a1  77        xbreak
b1  88        Xbreak
b1  122       Xbreak
c1  22        Xbreak
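
If the id fields are not sorted, a different approach is needed, as the edit note says. One possibility, offered only as a sketch and not as part of the script above, is to load the whole ranges file into memory first and then look up each position. It assumes the ranges file is non-empty and fits in memory, that its columns are id, class, start and end (the order in which the script reads them), and that both files have one header line. `annotate_positions` and the array names are hypothetical.

```shell
# Hypothetical helper: annotate_positions RANGES_FILE POSITIONS_FILE
annotate_positions() {
    awk '
        ## First file: the ranges file. Store every range under its id.
        NR == FNR {
            if (FNR == 1) { class_header = $2; next }
            n[$1]++
            cls[$1, n[$1]] = $2
            lo[$1, n[$1]]  = $3
            hi[$1, n[$1]]  = $4
            next
        }
        ## Second file: header line of the id-position file.
        FNR == 1 { printf "%s\t%s\n", $0, class_header; next }
        ## Second file: data. Print the first range of the same id that
        ## contains the position; lines without a match are skipped.
        {
            for (i = 1; i <= n[$1]; i++) {
                if ($2 + 0 >= lo[$1, i] + 0 && $2 + 0 <= hi[$1, i] + 0) {
                    printf "%s\t%s\n", $0, cls[$1, i]
                    break
                }
            }
        }
    ' "$1" "$2"
}

# Usage sketch (note the order: ranges file first):
# annotate_positions file2 file1 | column -t
```

This trades memory for simplicity: no ordering is required, but every range is kept in memory for the whole run, which is what the chunked script tries to avoid.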
Is this homework? It looks a lot like it. – Vatine 2012-07-20 09:58:04