Matching entries between files

My FileA contains:

LetterA  LetterM 12 
LetterB  LetterC 45 
LetterB  LetterG 23

FileB contains:

LetterA 23 43 LetterZ 
LetterB 21 71 LetterC

I want to print the original FileA entry plus FileB's $2-$3 whenever
FileA $1 == FileB $1 && FileA $2 == FileB $4.
For this input, the output should be:

LetterB  LetterC 45 -50

I can do it with a bash loop:

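while read -r ENTRY
do
    # split the entry on whitespace (the files are space-separated, not tab-separated)
    read -r COLUMN1 COLUMN2 _ <<< "$ENTRY"
    # scan all of FileB for lines whose $1 and $4 match this FileA entry
    awk -v COLUMN1="$COLUMN1" -v COLUMN2="$COLUMN2" -v ENTRY="$ENTRY" \
        '($1==COLUMN1 && $4==COLUMN2) {print ENTRY, $2-$3}' FileB
done < FileA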

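But this loop is far too slow. Is there a way to do this in awk without a loop?
That is: read multiple input files -> match their contents -> print the desired output.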
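A rough Python model of the loop shows where the time goes. The function name `nested_scan` and the inline sample data are illustrative only, not part of the question. For every FileA line the whole of FileB is scanned again, so the work grows as |A| × |B|, and the bash version also starts a new process on every iteration.

```python
# Hypothetical model of the bash loop: one full pass over FileB per FileA line.
FILE_A = ["LetterA  LetterM 12", "LetterB  LetterC 45", "LetterB  LetterG 23"]
FILE_B = ["LetterA 23 43 LetterZ", "LetterB 21 71 LetterC"]


def nested_scan(lines_a, lines_b):
    """Return matching FileA entries with FileB's $2-$3 appended; cost O(len(A) * len(B))."""
    out = []
    for entry in lines_a:                 # outer loop: one iteration per FileA line
        col1, col2 = entry.split()[:2]
        for line in lines_b:              # inner loop: rescans all of FileB every time
            f = line.split()
            if f[0] == col1 and f[3] == col2:
                out.append(f"{entry} {int(f[1]) - int(f[2])}")
    return out


print(nested_scan(FILE_A, FILE_B))
```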


2013-06-20 PoGibas

It can be solved with a single awk command:

awk 'NR==FNR{a[$1":"$2]=$0; next} 
    NR>FNR && $1":"$4 in a{print a[$1":"$4], $2-$3}' fileA fileB
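The awk command is a two-pass hash join. `NR==FNR` holds only while awk reads the first file (`NR` counts records across all files, `FNR` restarts at each file), so FileA is loaded into array `a` keyed on `$1":"$2`. FileB is then streamed once with a constant-time lookup per line. Below is a rough Python equivalent for readers less used to awk (the name `hash_join` and the sample data are illustrative). The total work is roughly |A| + |B| rather than |A| × |B|.

```python
# Hypothetical Python equivalent of the two-pass awk idiom.
FILE_A = ["LetterA  LetterM 12", "LetterB  LetterC 45", "LetterB  LetterG 23"]
FILE_B = ["LetterA 23 43 LetterZ", "LetterB 21 71 LetterC"]


def hash_join(lines_a, lines_b):
    # Pass 1 (awk: NR==FNR{a[$1":"$2]=$0; next}): map each "$1:$2" key to its whole line.
    # A later duplicate key overwrites an earlier one, exactly as the awk array does.
    index = {}
    for line in lines_a:
        f = line.split()
        index[f"{f[0]}:{f[1]}"] = line
    # Pass 2 (awk: $1":"$4 in a{print a[$1":"$4], $2-$3}): one dict lookup per FileB line.
    out = []
    for line in lines_b:
        f = line.split()
        key = f"{f[0]}:{f[3]}"
        if key in index:
            out.append(f"{index[key]} {int(f[1]) - int(f[2])}")
    return out


print(hash_join(FILE_A, FILE_B))
```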

Or, even more concisely (thanks to @JS웃):

awk 'NR==FNR{a[$1$2]=$0;next}$1$4 in a{print a[$1$4],$2-$3}' file{A,B}
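The shorter version drops the `":"` separator, so two different column pairs can produce the same key: `"foo"` + `"bar"` and `"fo"` + `"obar"` both become `"foobar"` (the same caveat noted in the NumPy answer). This does not affect the sample data. If it matters, awk's `a[$1,$2]` form joins the subscripts with `SUBSEP`, which defaults to `"\034"`. The small Python demonstration below shows the effect, with a tuple key as Python's natural alternative.

```python
# Illustration of key collisions when composite keys are built by bare concatenation.
pairs = [("foo", "bar"), ("fo", "obar")]

bare = {a + b for a, b in pairs}            # like awk's a[$1$2]: both pairs give "foobar"
sep = {a + "\x1c" + b for a, b in pairs}    # like awk's a[$1,$2], joined with SUBSEP "\034"
tup = {(a, b) for a, b in pairs}            # tuple keys keep the columns separate

print(len(bare), len(sep), len(tup))  # → 1 2 2
```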


2013-06-20 12:32:12 anubhava

Thanks, it works like a charm! Just wondering: why does `$1":"$4 in a` work, but when I try just `$1 in a`, it doesn't? – PoGibas

I don't know if this counts as a one-liner, but it's a nice little demonstration of how awk is more powerful than commonly understood. –

@JohnZwinck: When I built this awk solution it was a one-liner, but when posting it I broke it into 2 lines for readability :P – anubhava
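A likely answer to PoGibas's question, sketched in Python with the sample data inlined for illustration: the first pass stores keys of the form `$1":"$2` (for example `"LetterB:LetterC"`), so a lookup with a bare `$1` value never finds a key. Keying the first pass on `$1` alone would not help either, because the two `LetterB` lines would share a key and the second would overwrite the first.

```python
# Hypothetical illustration: FileA indexed the same way as in the awk answer.
FILE_A = ["LetterA  LetterM 12", "LetterB  LetterC 45", "LetterB  LetterG 23"]

# Keys are "$1:$2", so a bare "$1" value is never one of them.
index = {}
for line in FILE_A:
    f = line.split()
    index[f"{f[0]}:{f[1]}"] = line

print("LetterB" in index)          # → False
print("LetterB:LetterC" in index)  # → True

# Keying on $1 alone loses data: both LetterB lines map to the same key,
# so the later assignment overwrites the earlier one.
by_first = {}
for line in FILE_A:
    by_first[line.split()[0]] = line

print(len(by_first))               # → 2 (three lines, only two distinct keys)
print(by_first["LetterB"])         # → LetterB  LetterG 23
```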

I decided to try a slightly unorthodox, but hopefully fast, solution using Python and NumPy:

import numpy as np

# load the files into structured arrays; dtype=None infers each column's type,
# encoding=None makes the text columns str rather than bytes on Python 3
a = np.genfromtxt("fileA", dtype=None, encoding=None)
b = np.genfromtxt("fileB", dtype=None, encoding=None)

# concatenate the string columns (n.b. assumes no "foo" "bar" and "fo" "obar")
aText = np.char.add(a['f0'], a['f1'])
bText = np.char.add(b['f0'], b['f3'])

# for every row of A whose key also appears in B, print the A row and B's $2-$3
for index in np.flatnonzero(np.isin(aText, bText)):
    aRow = a[index].tolist()
    bRow = b[bText == aText[index]][0]
    print('{1} {2} {3} {0}'.format(bRow[1] - bRow[2], *aRow))

Edit: it's fast once it's up and running, but unfortunately loading the files takes longer than @anubhava's excellent awk solution.


2013-06-20 12:55:01
