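Extracting data from a dictionary

I have two tab-delimited files. File 1 contains identifiers, and file 2 holds the values associated with those identifiers (in other words, it is a very large dictionary).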

File 1

 
Ronny 
Rubby 
Suzie 
Paul

File 1 has only one column.

File 2

 
Alistar Barm Cathy Paul Ronny Rubby Suzie Tom Uma Vai Zai 
12  13 14 12  11 11 12 23 30 0.34 0.65 
1  4  56 23  12 8.9 5.1 1 4 25 3

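File 2 has n rows.

What I want: if an identifier from file 1 is present in file 2, all of its associated values should be written to another tab-delimited file.

Something like this: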

 
Paul Ronny Rubby Suzie 
12  11 11 12 
23  12 8.9 5.1

Thanks in advance.

Source

2011-12-08 Angelo

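What code have you written so far, and where are you stuck? – Duncan

What do you mean by "a very large dictionary"? – Toto

@Duncan: I don't know how to match the column values against the row values and then extract the values in that column. @M42 Dictionaries are always big :) – Angelo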

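An example in Python that works as a stream (i.e. it does not need to load the complete file before starting the output):

# read keys (one identifier per line, surrounding whitespace stripped)
with open('file1', 'r') as fd:
    keys = [line.strip() for line in fd if line.strip()]

# read data file, with header line and content
with open('file2', 'r') as fd:
    headers = fd.readline().split()
    # column positions of the keys that occur in the header, in key order
    columns = [headers.index(x) for x in keys if x in headers]

    # output keys
    print('\t'.join(headers[i] for i in columns))

    # output the selected columns of each remaining line, one line at a time
    for line in fd:
        fields = line.split()
        if fields:
            print('\t'.join(fields[i] for i in columns))

Output: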

$ python test.py 
Ronny Ruby Suzie Paul 
11  11  12  12 
12  8.9  5.1  23
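
For input that is strictly tab-delimited, the same streaming idea could also be written with the standard csv module. This is only a sketch: the function name select_columns and the in-memory sample data are illustrative assumptions and are not part of the answer above.

```python
import csv
import io
import sys


def select_columns(keys, data_lines, out):
    """Stream tab-delimited rows, writing only the columns named in keys.

    select_columns is a hypothetical helper name used for illustration.
    """
    reader = csv.reader(data_lines, delimiter='\t')
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    headers = next(reader)
    # Column positions of the keys found in the header, in key order.
    columns = [headers.index(k) for k in keys if k in headers]
    writer.writerow([headers[i] for i in columns])
    for row in reader:  # one row at a time; the input is never fully loaded
        if row:
            writer.writerow([row[i] for i in columns])


if __name__ == '__main__':
    # Made-up sample standing in for file1 and file2.
    keys = ['Ronny', 'Suzie', 'Paul']
    data = io.StringIO('Paul\tRonny\tSuzie\tTom\n12\t11\t12\t23\n23\t12\t5.1\t1\n')
    select_columns(keys, data, sys.stdout)
```

Splitting on tabs only, rather than on any whitespace, keeps the columns aligned if a value is empty or contains spaces.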

Source

2011-12-08 14:13:56 tito

Note

Your example output is not correct: it contains "Ruby", but your file 1 example has "Rubby". Ruby =/= Rubby

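kent$ awk 'NR==FNR{t[$0]++;next}
FNR==1{
    # header line: remember matching column numbers in header order
    for(i=1;i<=NF;i++)
        if($i in t)
            v[++n]=i
}
{
    # every line of file2, header included: print the selected columns
    for(k=1;k<=n;k++)
        printf "%s\t", $(v[k])
    print ""
}' file1 file2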

Output

Paul Ronny Suzie 
12  11  12 
23  12  5.1
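
The logic of this two-pass approach, together with a check for identifiers that never appear in the header (a misspelled name, for instance), might be sketched in Python as follows. The function name pick_columns and the sample values are assumptions made for illustration.

```python
def pick_columns(keys, header):
    """Return (column indexes in header order, keys missing from the header).

    pick_columns is a hypothetical name; it mirrors the awk logic of first
    collecting the identifiers and then scanning the header once.
    """
    wanted = set(keys)
    indexes = [i for i, name in enumerate(header) if name in wanted]
    present = set(header)
    missing = [k for k in keys if k not in present]
    return indexes, missing


if __name__ == '__main__':
    # Illustrative data: "Rubby" is requested but the header spells it "Ruby".
    keys = ['Ronny', 'Rubby', 'Suzie', 'Paul']
    header = ['Alistar', 'Paul', 'Ronny', 'Ruby', 'Suzie']
    rows = [['12', '12', '11', '11', '12'], ['1', '23', '12', '8.9', '5.1']]

    indexes, missing = pick_columns(keys, header)
    print('\t'.join(header[i] for i in indexes))
    for row in rows:
        print('\t'.join(row[i] for i in indexes))
    if missing:
        print('not found in header:', ', '.join(missing))
```

Reporting the unmatched identifiers makes a spelling mismatch visible instead of silently dropping the column.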

Source

2011-12-08 13:57:13 Kent

Oh, yes, sorry for the mistake – Angelo

@Angelo No problem. But I won't change my answer because of it. – Kent

$ awk 'FILENAME~1{a[$0];next};FNR==1{for(i=1;i<=NF;i++)if($i in a)b[i]};{for(j in b)printf("%s\t",$j);print ""}' file{1,2}.txt 
Paul Ronny Suzie 
12  11  12 
23  12  5.1

Broken into multiple lines, with spaces added:

$ awk ' 
> FILENAME~1 { a[$0]; next } 
> FNR==1 { for(i=1; i<=NF; i++) if($i in a) b[i] } 
> { for(j in b) printf("%s\t",$j); print ""} 
> ' file{1,2}.txt 

Paul Ronny Suzie 
12  11  12 
23  12  5.1

Source

2011-12-08 14:11:20 kev

Nicely done! +1 –

This can be done using only bash:

FIELDS=$(head -1 f2.txt | tr "\t" "\n" | nl -ba | grep -wFf f1.txt | cut -f1 | tr -d " " | tr "\n" ","); FIELDS=${FIELDS/%,/} 
cut -f$FIELDS f2.txt 
Paul Ronny Ruby Suzie 
12 11 11 12 
23 12 8.9 5.1

Source

2011-12-08 14:12:13 Sorin

A Perl solution:

#!/usr/bin/perl 
use warnings; 
use strict; 

open my $KEYS, '<', 'file1' or die $!; 
my @keys = <$KEYS>; 
close $KEYS; 
chomp @keys; 
my %is_key; 
undef @is_key{@keys}; 

open my $TAB, '<', 'file2' or die $!; 
$_ = <$TAB>; 
my ($i, @columns) = (0);   # start the column counter at 0 to avoid an undef index
for (split) { 
    push @columns, $i if exists $is_key{$_}; 
    $i++; 
} 
do {{ 
    my @values = split; 
    print join("\t", @values[@columns]), "\n"; 
}} while <$TAB>;

Source

2011-12-08 14:16:08 choroba

Something like this might work, depending on what you want.

use strict; 
use warnings; 

# Assumption: the two paths are passed on the command line,
# names file first, then the data file.
my ($name_file_path, $data_file_path) = @ARGV; 

my %names; 
open (my $nh, '<', $name_file_path) or die "Could not open '$name_file_path'!"; 
while (<$nh>) { 
    m/^\s*(.*?\S)\s*$/ and $names{ $1 } = 1; 
} 
close $nh; 

open (my $dh, '<', $data_file_path) or die "Could not open '$data_file_path'!"; 

my (@name_list, @col_list); 
chomp( my $header = <$dh> ); 
my @names = split /\t/, $header; 
foreach my $coln (0..$#names) { 
    next unless exists $names{ $names[ $coln ] }; 
    push @name_list, $names[ $coln ];   # the column name itself
    push @col_list, $coln;              # and its position
} 
local $" = "\t"; 
print "@name_list\n"; 
while (<$dh>) { 
    chomp; 
    my @fields = split /\t/; 
    print "@fields[@col_list]\n"; 
} 
close $dh;

Source

2011-12-08 14:53:04 Axeman

This might work for you:

sed '1{s/\t/\n/gp};d' file2 | 
nl | 
grep -f file1 | 
cut -f1 | 
paste -sd, | 
sed 's/ //g;s,.*,cut -f& file2,' | 
sh

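Explanation:

Pivot the column names so that there is one per line.
Number the column names.
Match the column names against the input file.
Remove the column names, keeping the column numbers.
Pivot the column numbers into a single comma-separated line.
Build a cut command from the comma-separated list of column numbers.
Run the cut command against the data file.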
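
The same steps can also be followed without generating a shell command. The sketch below mirrors them in Python; the function names and the sample data are hypothetical, and column numbers are 1-based as they are for cut.

```python
def column_numbers(header_line, names):
    """First four steps: number the header fields and keep the numbers
    whose name is one of the wanted identifiers (1-based, like cut -f)."""
    wanted = set(names)
    fields = header_line.rstrip('\n').split('\t')
    return [n for n, name in enumerate(fields, start=1) if name in wanted]


def cut_fields(line, numbers):
    """Last step: keep only the selected 1-based fields of one line."""
    fields = line.rstrip('\n').split('\t')
    return '\t'.join(fields[n - 1] for n in numbers)


if __name__ == '__main__':
    # Made-up sample lines standing in for file2.
    lines = ['Cathy\tPaul\tRonny\tTom\n', '14\t12\t11\t23\n', '56\t23\t12\t1\n']
    numbers = column_numbers(lines[0], ['Ronny', 'Paul'])
    # The comma-separated list that would be handed to cut -f.
    print(','.join(str(n) for n in numbers))
    for line in lines:
        print(cut_fields(line, numbers))
```

Unlike a grep against the numbered header, the set lookup here matches whole names only, so an identifier such as Paul would not also select a column named Paula.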

Source

2011-12-08 20:50:59 potong
