2017-07-11 18 views

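How can I filter the values in an input file using the row names from the file ids.1 and the column names from the file ids.2? How do I filter values for selected columns and rows?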

Sample input

name s1 s2 s3 s4 
a1 7 8 7 8 
a2 7 54 7 8 
a3 8 8 8 8 
a4 7 7 7 0 

ids.1

name 
a1 
a4 

ids.2

name 
s3 
s4 

Sample output

name s3 s4 
a1 7 8 
a4 7 0 

I used the code below to filter the output down to specific rows. How can I extend it to columns?

awk 'ARGIND == 1 { a[$1] = 1; next } a[$1] { print $0 }' ids.1 sample.input 


name s1  s2  s3  s4 
a1  7  8  7  8 
a4  7  7  7  0 

Answers

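This one assumes that the header of the first column (name) is always listed in the column file (ids.2), so that the row-name column is kept: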

$ awk '
ARGIND==1 {                 # first file, rows (ARGIND is a gawk extension)
    r[$1]
}
ARGIND==2 {                 # second file, columns
    c[$1]
}
ARGIND==3 && FNR==1 {       # first record of third file, data
    n=split($0,a)           # split the first record to a, the column template
    for(i in a)             # delete the cols we do not want
        if((a[i] in c)==0)
            delete a[i]
}
ARGIND==3 && $1 in r {      # third file and the rows we want
    b=""                    # print buffer
    for(i=1;i<=NF;i++)      # for all cols
        if(i in a)          # get the ones we want
            b=b (b==""?"":OFS) $i
    print b                 # output
}' ids.1 ids.2 file
name s3 s4 
a1 7 8 
a4 7 0 
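
ARGIND only exists in gawk. If you have to run this under another awk, a rough sketch of the same idea is to count the files yourself. The function name filter_matrix, the counter nfile and the temporary sample files below are my own assumptions for illustration, and the counter only works if none of the input files is empty:

```shell
# Assumption: recreate the sample files from the question in a temp directory
dir=$(mktemp -d)
printf 'name\na1\na4\n' > "$dir/ids.1"
printf 'name\ns3\ns4\n' > "$dir/ids.2"
printf 'name s1 s2 s3 s4\na1 7 8 7 8\na2 7 54 7 8\na3 8 8 8 8\na4 7 7 7 0\n' > "$dir/file"

# filter_matrix ROWS COLS DATA   (hypothetical helper name)
filter_matrix() {
    awk '
    FNR == 1 { nfile++ }            # a new file starts: stand-in for ARGIND
    nfile == 1 { r[$1]; next }      # first file: wanted row names
    nfile == 2 { c[$1]; next }      # second file: wanted column names
    FNR == 1 {                      # header record of the data file
        for (i = 1; i <= NF; i++)
            if ($i in c) keep[i]    # remember positions of wanted columns
    }
    $1 in r {                       # wanted rows (header too, if listed in ROWS)
        b = ""                      # print buffer
        for (i = 1; i <= NF; i++)
            if (i in keep) b = b (b == "" ? "" : OFS) $i
        print b
    }' "$1" "$2" "$3"
}

filter_matrix "$dir/ids.1" "$dir/ids.2" "$dir/file"
```

On the sample data this should print the same three lines as above (name s3 s4, a1 7 8, a4 7 0).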

A simpler, faster version:

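awk '
ARGIND==1{row[$1]=1;next}          # first file: wanted rows
ARGIND==2{col[$1]=1;next}          # second file: wanted columns
row[$1]{                           # data file: only the wanted rows
    for(i=1;i<=NF;i++){
        if(col[$i] && FNR==1) v[i]=1                  # header: mark wanted columns
        if (v[i]) printf "%s%s", (i==1?"":FS), $i     # print marked columns
    }
    print ""
} ' ids.1 ids.2 data.file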

For your example, it gives:

name s3 s4  
a1 7 8 
a4 7 0 

Thanks, it is simpler. However, the script marked as the answer is 4 minutes faster on a data matrix with 1 million rows and 1k columns. :) – user1883491
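
With a matrix that wide, one thing that might be worth trying is to store the positions of the wanted columns once, when the header is read, and then loop only over those positions for each row instead of over all NF fields. This is an untested-for-speed sketch, not a benchmark result; the names filter_selected, nfile and idx are made up for illustration, and the file counter assumes no input file is empty:

```shell
# Assumption: recreate the sample files from the question in a temp directory
dir=$(mktemp -d)
printf 'name\na1\na4\n' > "$dir/ids.1"
printf 'name\ns3\ns4\n' > "$dir/ids.2"
printf 'name s1 s2 s3 s4\na1 7 8 7 8\na2 7 54 7 8\na3 8 8 8 8\na4 7 7 7 0\n' > "$dir/file"

# filter_selected ROWS COLS DATA   (hypothetical helper name)
filter_selected() {
    awk '
    FNR == 1 { nfile++ }               # count input files (portable, no ARGIND)
    nfile == 1 { row[$1]; next }       # wanted row names
    nfile == 2 { col[$1]; next }       # wanted column names
    FNR == 1 {                         # header of the data file
        n = 0
        for (i = 1; i <= NF; i++)
            if ($i in col) idx[++n] = i    # ordered list of wanted positions
    }
    $1 in row {                        # wanted rows only
        line = ""
        for (j = 1; j <= n; j++)       # visit only the selected columns
            line = line (j == 1 ? "" : OFS) $(idx[j])
        print line
    }' "$1" "$2" "$3"
}

filter_selected "$dir/ids.1" "$dir/ids.2" "$dir/file"
```

The output on the sample is the same as before; whether it helps on the 1 million by 1k matrix would have to be measured.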