2011-02-04 117 views
1

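Mapping two datasets in Perl

I have one dataset containing a list of user agents (UAs) and the devices that correspond to them. A second dataset contains user agents along with other data. I need a way to identify the devices in that second dataset.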

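So I have to match the UAs across the two files and then pull the corresponding device information from the file that holds the list. I have already built a hash from the first file and matched it against the UAs in the data file. How do I go back to the first file, the one with the device information, to get the corresponding details and write them to a file?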

#!/usr/bin/perl 

use warnings; 
use strict; 

our $inputfile = $ARGV[0]; 
our $outputfile = "$inputfile" . '.devidx'; 
our $devid_file = "devid_master"; # the file that has the UA and the corresponding device info 
our %ua_list_hash =(); 

# Create a list of mobile user agents in the devid_master file 
open DEVID, "$devid_file" or die "can't open $devid_file"; 

while(<DEVID>) { 
     chomp; 
     my @devidfile = split /\t/; 
     $ua_list_hash{$devidfile[1]} = 0; 
} 

open IN,"$inputfile" or die "can't open $inputfile";  
while(<IN>) {  
     chomp;  
     my @hhfile = split /\t/;  

     if(exists $ua_list_hash{$hhfile[24]}) {  
        # how do I get the rest of the columns from the devidfile, columns 2...10? 
     } 
} 

close IN; 

Or is there a better way to do this in Perl? Suggestions are always welcome :).

Answers

2

When you build the first lookup hash, could you store a reference to the other column data as the hash value instead of just 0?

#!/usr/bin/perl 

use warnings; 
use strict; 

our $inputfile = $ARGV[0]; 
our $outputfile = "$inputfile" . '.devidx'; 
our $devid_file = "devid_master"; # the file that has the UA and the corresponding device info 
our %ua_list_hash =(); 

# Create a list of mobile user agents in the devid_master file 
open DEVID, "$devid_file" or die "can't open $devid_file"; 
while(<DEVID>) { 
     chomp; 
     my @devidfile = split /\t/; 
     # save the columns you'll want to access later and 
     # store a reference to them as the hash value 
     my @values = @devidfile[2..$#devidfile]; 
     $ua_list_hash{$devidfile[1]} = \@values; 
} 

open IN,"$inputfile" or die "can't open $inputfile";  
while(<IN>) {  
     chomp;  
     my @hhfile = split /\t/;  

     if(exists $ua_list_hash{$hhfile[24]}) { 
      # dereference the stored array ref to get the device columns back
      my @rest_of_vals = @{ $ua_list_hash{$hhfile[24]} }; 
      # do something with @rest_of_vals 
     } 
} 

close IN; 

Note: I haven't tested this.
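
To make the idea concrete, here is a minimal self-contained sketch of the same hash-of-array-references lookup, using in-memory sample lines instead of files. The sub names `build_lookup` and `devices_for` and the sample data are assumptions made for this example and are not part of the original script; only the layout (UA in column 1, device info from column 2 onward) follows the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup hash from tab-separated master lines.
# Assumption: column 1 holds the UA, columns 2 onward hold device info.
sub build_lookup {
    my @lines = @_;
    my %lookup;
    for my $line (@lines) {
        chomp $line;
        my @cols = split /\t/, $line;
        next unless defined $cols[1];    # skip lines without a UA column
        # Store an anonymous array of the remaining columns as the value.
        $lookup{ $cols[1] } = [ @cols[ 2 .. $#cols ] ];
    }
    return \%lookup;
}

# Return the stored device columns for a UA, or an empty list if unknown.
sub devices_for {
    my ($lookup, $ua) = @_;
    return unless defined $ua && exists $lookup->{$ua};
    return @{ $lookup->{$ua} };
}

# Hypothetical sample data: id, UA, then device columns.
my @master = (
    "1\tExampleUA/1.0\tAcme\tPhone X\n",
    "2\tExampleUA/2.0\tAcme\tTablet Y\n",
);
my $lookup = build_lookup(@master);

for my $ua ('ExampleUA/2.0', 'UnknownUA/0.1') {
    my @device = devices_for($lookup, $ua);
    my $out = @device ? join("\t", $ua, @device) : "$ua\t(no match)";
    print "$out\n";
}
```

Keeping an array reference rather than a pre-joined string leaves the columns separate, so the output format can be decided at the point where the line is written.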

+0

yup that works :). That's the trouble with not being familiar with Perl. Thanks! – sfactor 2011-02-04 14:46:46

0

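What do you want your output to look like? A list of all the unique devices that occur in $inputfile? Or, for each line in $inputfile, an output line showing which device it is?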

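I'll answer the latter, since you can always do a unique sort afterwards if you need one. Also, it looks like each UA has several devices. As a general approach, you can store the UA name as the key in a hash, and the value can be either an array of device names or a character-delimited string of device names.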

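If you know the device names are elements 2..10, you can use a slice and the join operator to build (for example) a comma-separated string of device names. That string becomes the value assigned to the UA name key.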

#!/usr/bin/perl 

    use warnings; 
    use strict; 

    our $inputfile = $ARGV[0]; 
    our $outputfile = "$inputfile" . '.devidx'; 
    our $devid_file = "devid_master"; # the file that has the UA and the corresponding device info 
    our %ua_list_hash =(); 

    # Create a list of mobile user agents in the devid_master file 
    open DEVID, "$devid_file" or die "can't open $devid_file"; 

    while(<DEVID>) { 
      chomp; 
      my @devidfile = split /\t/; 
      my @slice = @devidfile[2..10]; 
      my $deviceString = join(",", @slice); 
      $ua_list_hash{$devidfile[1]} = $deviceString; 
    } 

    my $outputfilename = "output.txt"; 
    open IN,"$inputfile" or die "can't open $inputfile"; 
    # open the output file for writing (a bare filename would open it read-only)
    open OUT, '>', $outputfilename or die "can't open $outputfilename: $!";  
    while(<IN>) {  
      chomp;  
      my @hhfile = split /\t/;  

      if(exists $ua_list_hash{$hhfile[24]}) {  
       print OUT $ua_list_hash{$hhfile[24]}."\n"; 

      } 
    } 

close IN; 
close OUT;
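
If only the distinct devices are wanted, one option is to use each matching device string as a key in a second hash and print the sorted keys. The sketch below is a rough illustration under stated assumptions: the sub names `device_string` and `unique_devices` and the sample rows are made up for the example, and the UAs from the data file are passed in as a plain list rather than read from column 24. It also clamps the slice so that master rows with fewer than 11 columns do not trigger undef warnings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Join the device columns (2..10) of one master-file row into a string.
# The upper bound is clamped so short rows do not produce undef warnings.
sub device_string {
    my @cols = @_;
    my $last = $#cols < 10 ? $#cols : 10;
    return join(",", @cols[ 2 .. $last ]);
}

# Return the distinct device strings seen for the given UAs, sorted.
sub unique_devices {
    my ($lookup, @uas) = @_;
    my %seen;
    for my $ua (@uas) {
        next unless exists $lookup->{$ua};    # ignore UAs not in the master list
        $seen{ $lookup->{$ua} } = 1;
    }
    return sort keys %seen;
}

# Hypothetical master rows: id, UA, device fields.
my %ua_to_device;
for my $row ("1\tUA-A\tAcme\tPhone X", "2\tUA-B\tAcme\tTablet Y") {
    my @cols = split /\t/, $row;
    $ua_to_device{ $cols[1] } = device_string(@cols);
}

# Hypothetical UAs taken from the data file (column 24 in the script above).
my @seen_uas = ('UA-B', 'UA-A', 'UA-B', 'UA-Z');
print "$_\n" for unique_devices(\%ua_to_device, @seen_uas);
```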