Appending columns by pattern matching

I have two files. One of them is simply a column vector (one entry per line), and the other is a matrix of the following form:

1x23 1x24 1y21 1y22 1y25 1z22 class 
2000 3000 4000 5000 6000 7000 Yes 
1500 1200 1100 1510 1410 1117 No

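First, I want to find which rows of the first file match entries in the first row (the header) of the second file. Second, I want to copy the columns of the second file that match the first file and append them to the second file. So, since 1x23 and 1y21 match, I want to copy those two columns in the second file and append them just before the class variable.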

I would like my result to be

1x23 1x24 1y21 1y22 1y25 1z22 1x23 1y21 class 
2000 3000 4000 5000 6000 7000 2000 4000 Yes 
1500 1200 1100 1510 1410 1117 1500 1100 No

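I wrote Perl code that uses three nested loops, but because the data is very large, it crashed. I think there should be a more efficient way to do this.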

2013-11-14 discipulus

Try this one-liner:

awk 'NR==FNR{a[$0]=1;next}FNR==1{for(i=1;i<=NF;i++)if(a[$i])k[i]}{for(x in k)$NF= sprintf("%s ",$x) $NF}7' f1 f2

A more readable version:

awk 'NR==FNR{a[$0]=1;next} 
    FNR==1{for(i=1;i<=NF;i++) if(a[$i])k[i]} 
    {for(x in k) 
      $NF= sprintf("%s ",$x) $NF}7' f1 f2

Output:

1x23 1x24 1y21 1y22 1y25 1z22 1y21 1x23 class 
2000 3000 4000 5000 6000 7000 4000 2000 Yes 
1500 1200 1100 1510 1410 1117 1100 1500 No
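
Note that the appended columns above come out as 1y21 1x23 rather than the requested 1x23 1y21. awk does not define the traversal order of "for (x in k)", so the order can differ between implementations. The following is a minimal sketch of one way to keep the header order, not the answerer's code. It assumes the vector file holds one name per line and that the last column (class) is not itself listed there; the function name append_in_order is made up for illustration.

```shell
# Hypothetical helper: append_in_order VECTOR_FILE MATRIX_FILE
append_in_order() {
    awk '
        # First file: remember each wanted name ($1 ignores stray whitespace).
        NR == FNR { a[$1] = 1; next }
        # Matrix header: store matching column indexes in left-to-right order.
        FNR == 1  { for (i = 1; i <= NF; i++) if ($i in a) k[++n] = i }
        # Every matrix line: prepend the matches to the last field,
        # walking the index list backwards so they end up in header order.
        { for (j = n; j >= 1; j--) $NF = $(k[j]) " " $NF }
        1
    ' "$1" "$2"
}
```

Usage, with the same file names as the one-liner: append_in_order f1 f2 > out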

2013-11-14 21:23:02 Kent

+1, interesting solution.

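I'm not sure why your Perl code would crash. I suggest the following algorithm, which runs in constant memory (and may be more readable implemented in Perl than in AWK):

1. Read the first file and build a list of column names.
2. Read the first line of the data file (the actual header).
3. Intersect the two lists to produce a list of column indexes.
4. Read one line of the data file and split it into columns.
5. Create a new array of the common columns by indexing the split line with the list of "required" column indexes built in step 3. Output it.
6. Repeat the last two steps.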
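
The steps above can be sketched roughly as follows. This is only an illustration in awk (the answer itself suggests Perl and gives no code), and it assumes whitespace-separated fields, a non-empty vector file, and that the last matrix column is the class label, which is kept at the end. The function name append_matching_columns is hypothetical.

```shell
# Hypothetical helper: append_matching_columns VECTOR_FILE MATRIX_FILE
append_matching_columns() {
    awk '
        # Step 1: first file, collect the wanted column names.
        NR == FNR { if ($1 != "") want[$1] = 1; next }
        # Steps 2-3: on the matrix header, record matching indexes in order.
        FNR == 1 {
            n = 0
            for (i = 1; i < NF; i++)        # the last column is left out
                if ($i in want) idx[++n] = i
        }
        # Steps 4-6: per line, emit the original fields, then the matching
        # fields, then the last field. Only the current line is held in memory.
        {
            line = ""
            for (i = 1; i < NF; i++) line = line $i " "
            for (j = 1; j <= n; j++) line = line $(idx[j]) " "
            print line $NF
        }
    ' "$1" "$2"
}
```

Because each matrix line is printed as soon as it is read, memory use depends on the number of columns, not on the number of rows.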

2013-11-14 19:52:38

You could try

awk -f app.awk file1.txt file2.txt

where file1.txt is your first file, file2.txt is your second file, and app.awk is

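# First file: remember every wanted column name.
NR==FNR {
    key[$0]++
    next
}
# Second file: keep every field in memory, indexed by (line, column).
{
    for (i=1; i<=NF; i++)
        C[FNR,i]=$i
}

END {
    # Column indexes whose header name appears in the first file.
    for (i=1; i<=NF; i++)
        if (C[1,i] in key)
            k[++j]=i
    nc=j
    # Print each line: original fields, matching fields, then the last field.
    for (j=1; j<=FNR; j++) {
        for (i=1; i<NF; i++)
            printf "%s%s",C[j,i],OFS
        for (i=1; i<=nc; i++)
            printf "%s%s",C[j,k[i]],OFS
        printf "%s%s",C[j,NF],ORS
    }
}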

2013-11-14 20:15:00

Here is a long-winded but, IMHO, clear way to do it.

use strict;
use warnings;

open(my $data, '<', 'data.txt') or die "Cannot open data.txt: $!";

# read first row from the data file
my $line = <$data>;
chomp $line;

# create a list of columns (split ' ' splits on any run of whitespace)
my @cols = split ' ', $line;

# create hash with column indexes
my %colindex;
my $i = 0;
foreach my $colname (@cols) {
    $colindex{$colname} = $i++;
}

# Save last column ('class')
my $lastcol = pop @cols;

# get input (column names)
open(my $input, '<', 'input.txt') or die "Cannot open input.txt: $!";
my @colnames = <$input>;
close $input;

# append column names to array if there is a match
foreach (@colnames) {
    chomp;
    if (exists $colindex{$_}) {
        push @cols, $_;
    }
}

# Restore the last column
push @cols, $lastcol;

# Now process your data
open(my $out, '>', 'output.txt') or die "Cannot open output.txt: $!";

# write the header row
print $out join(" ", @cols), "\n";

while ($line = <$data>) {
    chomp $line;
    my @l = split ' ', $line;
    foreach my $colname (@cols) {
        print $out $l[$colindex{$colname}], " ";
    }
    print $out "\n";
}

close $out;
close $data;

2013-11-14 20:23:35

Here's another option:

use strict; 
use warnings; 

my ($matrix, @cols) = pop; 
my %headings = map { chomp; $_ => 1 } <>; 

push @ARGV, $matrix; 
while (<>) { 
    my @array = split; 
    @cols = grep $headings{ $array[$_] }, 0 .. $#array if $. == 1; 
    splice @array, -1, 0, @array[@cols]; 
    print "@array\n"; 
}

Usage: perl script.pl vectorFile matrixFile [>outFile]

Output on your dataset:

1x23 1x24 1y21 1y22 1y25 1z22 1x23 1y21 class 
2000 3000 4000 5000 6000 7000 2000 4000 Yes 
1500 1200 1100 1510 1410 1117 1500 1100 No

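A hash is created from the entries in the vector file. The column positions of all entries found in the first line of the matrix file are saved in @cols. For each matrix line, the line is split and the matching column entries are inserted before its last element. Finally, the new line is printed.

Hope this helps!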

2013-11-14 22:27:14 Kenosis
