2015-06-23

Joining two TSV files with an inner join

I have 2 TSV files:

TSV file 1:
    A      B
    hello  0.5
    bye    0.4

TSV file 2:
    C        D
    hello    1
    country  5
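Both join and awk split fields on tabs or blanks, so it helps to know exactly what separates the columns. Here is a minimal sketch that recreates the samples with real tab characters and dumps them with od -c, where each tab shows up as \t. It assumes bash and reuses the file names file1.tsv and file2.tsv from the question:

```shell
#!/usr/bin/env bash
# Recreate the question's sample data with real tabs between the columns.
# printf turns \t into a tab and \n into a newline in its format string.
printf 'A\tB\nhello\t0.5\nbye\t0.4\n' > file1.tsv
printf 'C\tD\nhello\t1\ncountry\t5\n' > file2.tsv

# od -c prints every byte; tabs appear as \t, so missing tabs are easy to spot.
od -c file1.tsv
od -c file2.tsv
```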

I want to join the two TSV files on file1.A = file2.C.

How can I do this with the join command in Linux?

I'd like to get something like this:

Text  B D 
hello 0.5 1 
bye  0.4 
country  5 

I don't get any output from this:

join -j 1 <(sort -k1 file1.tsv) <(sort -k1 file2.tsv) 

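Does your sample file1 really look like that? Where are the tabs? Why do you sort on '-k2' but join with '-j 1'? Also note that the '-e' option in 'man join' may help with finding unmatched items. Good luck. – shellter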


This worked for me: 'join -t $'\t' -1 1 -2 1 <(sort -k1 file1.tsv) <(sort -k1 file2.tsv) > join_test.tsv'. The main problem I had was defining the tab delimiter. – jxn


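Good catch, and sorry I missed that key point. I'm glad you have a solution. It never hurts to post a working solution as an answer; it gives people an incentive to share what they know. Good luck to you all. – shellter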
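The expected output in the question is really a full outer join, because it keeps 'bye' (only in file 1) and 'country' (only in file 2). With join, that means adding -a 1 -a 2 to print unpairable lines from both files, plus -o to keep the empty column in place. The -e option mentioned in the comments only supplies filler text for missing fields; the unpaired lines themselves come from -a. The sketch below assumes GNU coreutils join, bash process substitution, and exactly one header line per file. tsv_full_join is a hypothetical helper name:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: full outer join of two TSV files on column 1.
tsv_full_join() {
  local tab=$'\t' left=$1 right=$2
  # Header: column 2 of each file's header line.
  printf 'Text\t%s\t%s\n' "$(head -n 1 "$left" | cut -f2)" "$(head -n 1 "$right" | cut -f2)"
  # join needs both inputs sorted on the join field with the same collation,
  # so sort and join both run under the C locale.
  # -a 1 -a 2: also print unpairable lines from file 1 and file 2.
  # -o 0,1.2,2.2: key, file 1 column 2, file 2 column 2 (missing ones stay empty).
  LC_ALL=C join -t "$tab" -a 1 -a 2 -o 0,1.2,2.2 \
    <(tail -n +2 "$left" | LC_ALL=C sort -t "$tab" -k1,1) \
    <(tail -n +2 "$right" | LC_ALL=C sort -t "$tab" -k1,1)
}

# Demo on the question's sample data in a temporary directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'A\tB\nhello\t0.5\nbye\t0.4\n' > "$workdir/file1.tsv"
printf 'C\tD\nhello\t1\ncountry\t5\n' > "$workdir/file2.tsv"
tsv_full_join "$workdir/file1.tsv" "$workdir/file2.tsv"
```

Unlike the example in the question, join emits the rows in sorted key order (bye, country, hello).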

Answers


A bit hairy, but here is a solution using awk and associative arrays.

awk 'FNR == 1 {h[++n] = $2}
    FILENAME ~ /file1\.tsv$/ && FNR > 1 {t1[$1]=$2}
    FILENAME ~ /file2\.tsv$/ && FNR > 1 {t2[$1]=$2}
    END{print "Text\t"h[1]"\t"h[2];
     for(x in t1){print x"\t"t1[x]"\t"t2[x]}
     for(x in t2){print x"\t"t1[x]"\t"t2[x]}}' file1.tsv file2.tsv |
    { IFS= read -r header; printf '%s\n' "$header"; sort -u; }  # keep the header first; sort and de-duplicate the rest
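The FILENAME patterns tie the script to particular file names. Below is a sketch of a variant of the same associative-array idea, assuming a POSIX awk and bash. It recognizes the first file with FNR == NR instead, and it looks keys up with 'in', so nothing has to be de-duplicated afterwards. awk_full_join is a hypothetical name:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: full outer join of $1 and $2 on column 1.
awk_full_join() {
  awk -v OFS='\t' '
    FNR == 1  { hdr[++n] = $2; next }   # header of each file: keep column 2
    FNR == NR { t1[$1] = $2; next }     # first file: key -> value
              { t2[$1] = $2 }           # second file: key -> value
    END {
      print "Text", hdr[1], hdr[2]
      # Every key from the first file, plus the second file value if present.
      # (k in t2) tests membership without creating an empty t2[k].
      for (k in t1) print k, t1[k], ((k in t2) ? t2[k] : "")
      # Keys that appear only in the second file.
      for (k in t2) if (!(k in t1)) print k, "", t2[k]
    }' "$1" "$2" |
    { IFS= read -r header; printf '%s\n' "$header"; sort; }  # header first, rows sorted
}

# Demo on the question's sample data in a temporary directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'A\tB\nhello\t0.5\nbye\t0.4\n' > "$workdir/file1.tsv"
printf 'C\tD\nhello\t1\ncountry\t5\n' > "$workdir/file2.tsv"
awk_full_join "$workdir/file1.tsv" "$workdir/file2.tsv"
```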

file1:

$ cat file1 
A  B 
hello 0.5 
bye  0.4 

file2:

$ cat file2 
C  D 
hello  1 
country 5 

Output:

$ awk 'NR==1{print "Text","B","D"}FNR==1{next}FNR==NR{A[$1]=$2;next}{f=($1 in A);print $0,(f?A[$1]:"");if(f)delete A[$1]}END{for(i in A)print i,"",A[i]}' OFS='\t' file2 file1
Text B D 
hello 0.5 1 
bye  0.4 
country  5 

A more readable version:

awk '
    # Print the header once, when NR == 1 (the first line of the first file)
    NR==1{print "Text","B","D"}

    # NR counts records across all input files, while FNR restarts at 1
    # for each input file. FNR==1 is therefore the header line of every
    # input file: skip it and go to the next line.
    FNR==1{
      next
    }

    # FNR==NR is true only while reading the first file (file2)
    FNR==NR{
      # Build an associative array keyed on the first column,
      # whose elements are the second column
      A[$1]=$2

      # Skip all following rules and process the next line
      next
    }

    {
      # f is 1 (true) if column 1 of the second file (file1) exists
      # as an index in array A, otherwise 0 (false)
      f=($1 in A)

      # Print the current line followed by the matching element of A,
      # or an empty field if there is no match
      print $0,(f ? A[$1] : "")

      # Delete the matched element so it is not printed again in END
      if(f)delete A[$1]
    }

    # Finally, in the END block, print the remaining elements of A one by one:
    # these are the keys from file2 that do not exist in file1
    END{
      for(i in A)
        print i,"",A[i]
    }
' OFS='\t' file2 file1
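One detail of this script is that for (i in A) visits the leftover file2 keys in an unspecified order. That only matters when several keys exist only in file2. The sketch below shows one way to keep them in file2's line order, assuming a POSIX awk and bash: it records each new key in an indexed array. ordered_full_join is a hypothetical helper name, and file2 is again passed first:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: $1 is file2 (lookup table), $2 is file1 (driving file).
ordered_full_join() {
  awk -v OFS='\t' '
    NR == 1   { print "Text", "B", "D" }
    FNR == 1  { next }
    FNR == NR {
      if (!($1 in A)) order[++n] = $1   # remember first-seen order of file2 keys
      A[$1] = $2
      next
    }
    {
      f = ($1 in A)                     # 1 if the key exists in file2, else 0
      print $0, (f ? A[$1] : "")
      if (f) delete A[$1]               # matched keys are not printed again in END
    }
    END {
      for (i = 1; i <= n; i++)
        if (order[i] in A) print order[i], "", A[order[i]]
    }' "$1" "$2"
}

# Demo on the question's sample data in a temporary directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'A\tB\nhello\t0.5\nbye\t0.4\n' > "$workdir/file1.tsv"
printf 'C\tD\nhello\t1\ncountry\t5\n' > "$workdir/file2.tsv"
ordered_full_join "$workdir/file2.tsv" "$workdir/file1.tsv"
```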