Join two TSV files with an inner join

I have 2 TSV files:

TSV file 1:
    A  B
    hello 0.5
    bye  0.4

TSV file 2:
    C  D
    hello  1
    country 5

I want to join the 2 TSV files together on file1.A = file2.C.

How can I do this with the join command in Linux?

I'm hoping to get something like this:

Text  B D 
hello 0.5 1 
bye  0.4 
country  5

I'm not getting any output with this:

join -j 1 <(sort -k1 file1.tsv) <(sort -k1 file2.tsv)


2015-06-23 jxn

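Does your sample file 1 really look like this? Where are the tabs? Why are you sorting on '-k2' but joining with '-j 1'? Also note that the '-e' option in 'man join' may help with finding unmatched items. Good luck. – shellter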

This worked for me: join -t $'\t' -1 1 -2 1 <(sort -k1 file1.tsv) <(sort -k1 file2.tsv) > join_test.tsv. The main problem I had was specifying the tab delimiter. – jxn

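Good catch, and sorry I missed that key point. I'm glad you have a solution. It never hurts to post a working solution as an answer; it encourages people to share what they know. Good luck to you all. – shellter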
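A minimal sketch of the join-based approach, assuming GNU coreutils join (for '--header' and '-o auto') and bash (for $'\t' and process substitution). jxn's tab-delimited command performs an inner join, so keys found in only one file (bye, country) are dropped; the output shown in the question is really a full outer join, which '-a 1 -a 2' adds. The helper function 'sorted' is a hypothetical name: it keeps each header line on top and sorts only the data rows, and LC_ALL=C makes sort and join agree on ordering.

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU coreutils join (--header, -o auto) and bash
# ($'\t', process substitution). File names follow the question.
set -euo pipefail
tab=$'\t'
export LC_ALL=C   # sort and join must use the same collation order

# Sample inputs from the question, written with real tab separators.
printf 'A\tB\nhello\t0.5\nbye\t0.4\n' > file1.tsv
printf 'C\tD\nhello\t1\ncountry\t5\n' > file2.tsv

# Hypothetical helper: header line first, data rows sorted on column 1.
sorted() { head -n 1 "$1"; tail -n +2 "$1" | sort -t "$tab" -k1,1; }

# Inner join (matched keys only), like jxn's command plus header handling.
join -t "$tab" --header -1 1 -2 1 <(sorted file1.tsv) <(sorted file2.tsv) > inner.tsv

# Full outer join: -a 1 -a 2 also print unpaired lines from each file,
# -o auto keeps a fixed column layout, -e '' fills the missing fields.
join -t "$tab" --header -a 1 -a 2 -e '' -o auto \
    <(sorted file1.tsv) <(sorted file2.tsv) > outer.tsv

cat inner.tsv outer.tsv
```

With '--header', the first output column is named after file1's header (A) rather than the "Text" label used in the question, and the rows come out in sorted key order (bye, country, hello).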

A bit hairy, but here is a solution using awk and associative arrays.

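awk 'FNR == 1 {h[++n] = $2; next}              # header of each file: column 2
    FILENAME ~ /test1\.tsv/ {t1[$1] = $2}
    FILENAME ~ /test2\.tsv/ {t2[$1] = $2}
    END {print "Text\t" h[1] "\t" h[2]
         # keys from test1 (matched or not), then keys found only in test2;
         # no sort | uniq, so the header stays on top (for-in order is unspecified)
         for (x in t1) print x "\t" t1[x] "\t" t2[x]
         for (x in t2) if (!(x in t1)) print x "\t\t" t2[x]}' test1.tsv test2.tsv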


2015-06-23 23:26:08 cr1msonB1ade
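For comparison, a POSIX awk sketch of the same full outer join that sidesteps one issue with the awk answer above: 'for (x in array)' visits keys in an unspecified order, which is why that answer needs a sort step. This version records keys in the order they first appear (file1 rows first, then keys found only in file2), so no sort | uniq is needed. It assumes tab-separated input with exactly one header line per file; the array names (b, d, order) are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: full outer join on column 1 in POSIX awk with a deterministic row
# order (first-file keys in file order, then keys only in the second file).
# Assumes tab-separated files, each with one header line.
set -euo pipefail

# Sample inputs from the question, named as in the awk answer.
printf 'A\tB\nhello\t0.5\nbye\t0.4\n' > test1.tsv
printf 'C\tD\nhello\t1\ncountry\t5\n' > test2.tsv

awk -F'\t' -v OFS='\t' '
    FNR == 1 { hdr[++nh] = $2; next }        # column-2 header of each file
    NR == FNR {                               # data rows of the first file
        if (!($1 in b)) order[++n] = $1
        b[$1] = $2
        next
    }
    {                                         # data rows of the second file
        if (!($1 in b) && !($1 in d)) order[++n] = $1
        d[$1] = $2
    }
    END {
        print "Text", hdr[1], hdr[2]
        for (i = 1; i <= n; i++) { k = order[i]; print k, b[k], d[k] }
    }' test1.tsv test2.tsv > awk_outer.tsv

cat awk_outer.tsv
```

Keeping an explicit 'order' array costs one extra array but makes the output order reproducible across awk implementations; for these inputs it matches the row order shown in the question.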

File 1

$ cat file1 
A  B 
hello 0.5 
bye  0.4

File 2

$ cat file2 
C  D 
hello  1 
country 5

Output

$ awk 'NR==1{print "Text","B","D"} FNR==1{next} FNR==NR{A[$1]=$2; next} {f=($1 in A); print $0, (f ? A[$1] : ""); if(f) delete A[$1]} END{for(i in A) print i,"",A[i]}' OFS='\t' file2 file1
Text B D 
hello 0.5 1 
bye  0.4 
country  5

A more readable version

awk ' 
    # Print the header when NR == 1; this happens only once, on the first line of the first file
    NR==1{print "Text","B","D"} 

    # FNR is the number of records relative to the current input file.
    # When awk reads multiple input files,
    # the NR variable gives the total number of records across all input files,
    # while FNR gives the number of records within each input file.
    # So when awk reads the first line of a file, stop processing it and go to the next line;
    # this just skips the header of each input file
    FNR==1{ 
      next 
      } 

    # FNR==NR is only true while reading the first file (file2)
    FNR==NR{
       # Build an associative array keyed on the first column of the file,
       # where each element is the second column
       A[$1]=$2

       # Skip all following blocks and process the next line
       next
      } 
      {
       # Check whether the key ($1 = column 1) of the second file argument (file1) exists in array A;
       # f is 1 (true) if it does, otherwise 0 (false)
       f = ($1 in A)

       # Print the current line followed by the matching value from A (or an empty field)
       print $0, (f ? A[$1] : "")

       # Delete the matched element so the END block prints only unmatched keys from file2
       if(f) delete A[$1]
      }

     # Finally, in the END block print the remaining array elements one by one:
     # the keys from file2 that do not exist in file1
     END{ 
       for(i in A) 
        print i,"",A[i] 
      } 
    ' OFS='\t' file2 file1
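The comments above rely on the difference between NR and FNR. A small sketch that prints both counters for the same two files, read in the same order (file2, then file1), makes the FNR==NR test visible; the output file name counters.txt is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: show how NR (running total) and FNR (per-file count) differ when
# awk reads file2 and then file1, as in the answer above.
set -euo pipefail

printf 'C\tD\nhello\t1\ncountry\t5\n' > file2
printf 'A\tB\nhello\t0.5\nbye\t0.4\n' > file1

# FNR == NR holds only while the first file named on the command line is read.
awk '{ print FILENAME, NR, FNR, (FNR == NR ? "first" : "second") }' file2 file1 > counters.txt
cat counters.txt
```

One caveat of this idiom: if the first file is completely empty, FNR == NR also holds for every line of the second file, so that file would be treated as the lookup table instead.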


2015-06-24 04:33:50
