哈希中的4/16是什麼？

if (%hash){ 
    print "That was a true value!\n"; 
} 
當（且僅當）的散列具有至少一個鍵值對那將是正確的。哈希中的4/16是什麼？

實際結果是一個內部調試字符串，用於維護Perl的人。 它看起來像「4/16」，，但值保證是真實的，當散列非空，當它是空的時爲假。 - 拉馬書

這是什麼4/16？任何人都可以看到一個小程序，我可以看到結果是4/16嗎？

來源

2011-07-17 Chankey Pathak

我認爲「4/16」只是一個分數的例子。該書*應該說「這是一個分數，例如4/16或1/8」。 – TLP

從perldoc perldata：

如果你評估在標量上下文的哈希，它如果哈希是空的返回false。如果有任何鍵/值對，則返回true;更確切地說，返回的值是一個字符串，包含用過的桶的數量和分配的桶的數量，用斜線分隔。只有找出Perl的內部哈希算法在您的數據集上表現不佳，這才非常有用。對於示例，您可以在散列中粘貼10,000個東西，但是評估％HASH的標量上下文會顯示「1/16」，這意味着只有十六個存儲桶中的一個已被觸摸，並且可能包含您的所有012000項目。

所以，4/16將使用/分配數量的水桶，並像下面會顯示該值：

%hash = (1, 2); 
print scalar(%hash); #prints 1/8 here

來源

2011-07-17 10:38:55 Hasturkun

要查找has中的條目數，請使用'標量（鍵％hash）'。 – shawnhcorey

這(%hash)評估在標量上下文中的哈希值。

下面是空的哈希：

command_line_prompt> perl -le '%hash=(); print scalar %hash;'

結果爲0。

這裏有一個非空的哈希：

command_line_prompt> perl -le '%hash=(foo=>'bar'); print scalar %hash;'

結果是字符串「1/8」。

來源

2011-07-17 10:39:56

這不是一個答案。你實際上只是重申這個問題。 – bart

這是我發送給Perl初學者郵件列表的電子郵件的稍微修改版本，回答了這個問題。

說

my $hash_info = %hash;

會得到你要麼0（如果哈希爲空）或用於總桶中的比例。這些信息幾乎，但不完全，對你沒用。要理解這意味着您必須首先了解hashing是如何工作的。

允許執行使用Perl 5，我們首先需要的是一個哈希函數的哈希值。哈希函數將字符串轉換爲希望的唯一編號。真正的強哈希函數的示例是 MD5或SHA1，但它們往往對於常用應用來說太慢，因此人傾向於使用較弱（即產生較少唯一輸出的函數）函數。 Perl 5使用鮑勃詹金斯[一次一個] 算法，其中有一個很好的速度唯一性折衷。對於我們的例如，我會用一個非常弱的散列函數：

#!/usr/bin/perl 

use strict; 
use warnings; 

sub weak_hash { 
     my $key = shift; 
     my $hash = 1; 
     #multiply every character in the string's ASCII/Unicode value together 
     for my $character (split //, $key) { 
       $hash *= ord $character; 
     } 
     return $hash; 
} 

for my $string (qw/cat dog hat/) { 
     print "$string hashes to ", weak_hash($string), "\n"; 
}

因爲哈希函數往往給回是從範圍大於我們希望，您通常使用modulo，以減少數字的數字範圍之給回：

#!/usr/bin/perl 

use strict; 
use warnings; 

sub weak_hash { 
     my $key = shift; 
     my $hash = 1; 
     #multiply every character in the string's ASCII/Unicode value together 
     for my $character (split //, $key) { 
       $hash *= ord $character; 
     } 
     return $hash; 
} 

for my $string (qw/cat dog hat/) { 
     # the % operator is constraining the number 
     # weak_hash returns to 0 - 10 
     print "$string hashes to ", weak_hash($string) % 11, "\n"; 
}

現在我們已經有了一個散列函數，我們需要的地方保存密鑰和價值。這被稱爲哈希表。散列表通常是一個陣列，其元素被稱爲桶（這些桶是比率所談論的）。水桶將持有的所有鍵/值的對散列爲同一個號碼：

#!/usr/bin/perl 

use strict; 
use warnings; 

sub weak_hash { 
     my $key = shift; 
     my $hash = 1; 
     for my $character (split //, $key) { 
       $hash *= ord $character; 
     } 
     return $hash; 
} 

sub create { 
     my ($size) = @_; 

     my @hash_table; 

     #set the size of the array 
     $#hash_table = $size - 1; 

     return \@hash_table; 
} 


sub store { 
     my ($hash_table, $key, $value) = @_; 

     #create an index into $hash_table 
     #constrain it to the size of the hash_table 
     my $hash_table_size = @$hash_table; 
     my $index   = weak_hash($key) % $hash_table_size; 

     #push the key/value pair onto the bucket at the index 
     push @{$hash_table->[$index]}, { 
       key => $key, 
       value => $value 
     }; 

     return $value; 
} 

sub retrieve { 
     my ($hash_table, $key) = @_; 

     #create an index into $hash_table 
     #constrain it to the size of the hash_table 
     my $hash_table_size = @$hash_table; 
     my $index   = weak_hash($key) % $hash_table_size; 

     #get the bucket for this key/value pair 
     my $bucket = $hash_table->[$index]; 

     #find the key/value pair in the bucket 
     for my $pair (@$bucket) { 
       return $pair->{value} if $pair->{key} eq $key; 
     } 

     #if key isn't in the bucket: 
     return undef; 
} 

sub list_keys { 
     my ($hash_table) = @_; 

     my @keys; 

     for my $bucket (@$hash_table) { 
       for my $pair (@$bucket) { 
         push @keys, $pair->{key}; 
       } 
     } 

     return @keys; 
} 

sub print_hash_table { 
     my ($hash_table) = @_; 

     for my $i (0 .. $#$hash_table) { 
       print "in bucket $i:\n"; 
       for my $pair (@{$hash_table->[$i]}) { 
         print "$pair->{key} => $pair->{value}\n"; 
       } 
     } 
} 

my $hash_table = create(3); 

my $i = 0; 
for my $key (qw/a b c d g j/) { 
     store($hash_table, $key, $i++); 
} 
print_hash_table($hash_table); 

print "the a key holds: ", retrieve($hash_table, "a"), "\n";

正如我們從這個例子可以看到，它是可能的一個水桶具有比其他多個鍵/值對。這是一個糟糕的情況是英寸。它導致哈希對於那個桶來說很慢。這是中用於總桶數的比率之一，表示散列在標量上下文中的返回值。如果散列表示只使用了幾個桶，但它們在散列中有很多密鑰，那麼你知道你有一個問題。

要了解有關散列的更多信息，請在此處詢問我所說的內容：或read about them。

來源

2011-07-17 12:16:00

添加另一個答案，因爲第一個答案已經太長。

看到"4/16"意味着什麼的另一種方法是使用Hash::Esoteric模塊（警告alpha質量代碼）。我寫了它給了我一個更好的哈希圖內部的圖片，所以我可以試着瞭解大哈希似乎有的performance problem。從Hash::Esoteric的keys_by_bucket函數將返回所有的鍵從哈希值，但不是返回他們像keys名單做，將它們作爲一個AoA其中頂層代表水桶和數組引用它裏面持有的鑰匙桶。

#!/user/bin/env perl 

use strict; 
use warnings; 

use Hash::Esoteric qw/keys_by_bucket/; 

my %hash = map { $_ => undef } "a" .. "g"; 
my $buckets = keys_by_bucket \%hash; 

my $used; 
for my $i (0 .. $#$buckets) { 
    if (@{$buckets->[$i]}) { 
     $used++; 
    } 
    print "bucket $i\n"; 
    for my $key (@{$buckets->[$i]}) { 
     print "\t$key\n"; 
    } 
} 

print "scalar %hash: ", scalar %hash, "\n", 
     "used/total buckets: $used/", scalar @$buckets, "\n";

上述代碼打印出類似的信息（實際的數據需要Perl的版本）這樣的：

bucket 0 
    e 
bucket 1 
    c 
bucket 2 
    a 
bucket 3 
    g 
    b 
bucket 4 
bucket 5 
    d 
bucket 6 
    f 
bucket 7 
scalar %hash: 6/8 
used/total buckets: 6/8

來源

2011-07-17 12:39:12

是的，我不明白什麼'HvMAX'返回（我認爲這是桶的數量，但它確實是最後一個桶的索引）。 GitHub上的版本現在應該是正確的。 –

散列是鏈表的陣列。散列函數將該鍵轉換爲一個數字，該數字用作存儲該值的數組元素（「存儲區」）的索引。多於一個鍵可以散列到相同的索引（「碰撞」），這是由鏈接列表處理的情況。

分數的分母是桶的總數。

分數的分子是具有一個或多個元素的桶的數量。

對於具有相同元素數的散列，數字越高越好。返回6/8的碰撞比返回4/8的碰撞更少。

來源

2011-07-17 16:05:16 ikegami

這比寫一個散列表的糟糕實現來演示發生了什麼更加簡潔。布拉沃。 –

@Chas。歐文斯，謝謝，我嘗試 – ikegami

請注意，碰撞句子越少錯誤。這個評論空間太小，所以我提供了一個更好更確切的答案。 – rurban

該部分是散列的填充率：使用的桶與分配的桶。有時也稱爲加載因子。

要真正獲得「4/16」，您需要一些技巧。 4把鑰匙將導致8個水桶。因此，你需要至少9個按鍵，然後刪除5.

$ perl -le'%h=(0..16); print scalar %h; delete $h{$_} for 0..8; print scalar %h' 
9/16 
4/16

請注意，您的號碼可能會有所不同，因爲種子是隨機的，你就不能準確地預測碰撞

填充速度是什麼時候重新哈希的關鍵哈希信息。 Perl 5以100％的填充率進行加密，請參閱hv.c中的DO_HSPLIT宏。因此它以只讀速度交易內存。正常填充率將在80％-95％之間。你總是留下漏洞來保存一些碰撞。較低的填充率會導致訪問速度加快（衝突較少），但會導致更多的重新加入。

您不會立即看到與分數衝突的次數。您還需要keys %hash，以與分數的分子比較使用的桶數量。

因此，碰撞質量的一個部分是鍵/桶使用：

my ($used, $max) = split '/',scalar(%hash); 
keys %hash/$used;

但在現實中，你需要知道的所有的桶鏈表的長度的總和。可以用Hash::Util::bucket_info

($keys, $buckets, $used, @length_count)= Hash::Util::bucket_info(\%hash)

訪問此質量雖然哈希訪問通常是O（1），具有較長的長度，只有爲O（n/2），尤其。對於過長的水桶。在https://github.com/rurban/perl-hash-stats我提供perl5核心測試套件數據的各種散列函數的碰撞質量統計信息。我還沒有測試不同填充率的折衷，因爲我完全重寫了當前的哈希表。

更新：對於perl5，比最近測試的更好的填充率超過100％的比例應該是90％。但這取決於使用的散列函數。我用了一個壞的和快速的：FNV1A。使用更好，更慢的哈希函數，您可以使用更高的填充率。當前的默認OOAT_HARD不好並且很慢，所以應該避免。

來源

2015-07-09 07:19:54 rurban

哈希中的4/16是什麼？

回答

相關問題