編輯:我爲錯誤道歉。我很可惜,我監督使用my
變量裏面countit(x, q{})
是大錯誤。該字符串在Benchmark模塊內部被評估,並且@str在那裏是空的。這個解決方案沒有我提出的那麼快。見下面的更正。我很抱歉。
Perl可以是快速的:
use strict;
use warnings;
package LCP;
sub LCP {
return '' unless @_;
return $_[0] if @_ == 1;
my $i = 0;
my $first = shift;
my $min_length = length($first);
foreach (@_) {
$min_length = length($_) if length($_) < $min_length;
}
INDEX: foreach my $ch (split //, $first) {
last INDEX unless $i < $min_length;
foreach my $string (@_) {
last INDEX if substr($string, $i, 1) ne $ch;
}
}
continue { $i++ }
return substr $first, 0, $i;
}
# Roy's implementation
sub LCP2 {
return '' unless @_;
my $prefix = shift;
for (@_) {
chop $prefix while (! /^\Q$prefix\E/);
}
return $prefix;
}
1;
測試套件:
#!/usr/bin/env perl
use strict;
use warnings;
Test::LCP->runtests;
package Test::LCP;
use base 'Test::Class';
use Test::More;
use Benchmark qw(:all :hireswallclock);
sub test_use : Test(startup => 1) {
use_ok('LCP');
}
sub test_lcp : Test(6) {
is(LCP::LCP(), '', 'Without parameters');
is(LCP::LCP('abc'), 'abc', 'One parameter');
is(LCP::LCP('abc', 'xyz'), '', 'None of common prefix');
is(LCP::LCP('abcdefgh', ('abcdefgh') x 15, 'abcdxyz'),
'abcd', 'Some common prefix');
my @str = map { chomp; $_ } <DATA>;
is(LCP::LCP(@str),
'file:///home/gms8994/Music/', 'Test data prefix');
is(LCP::LCP2(@str),
'file:///home/gms8994/Music/', 'Test data prefix by LCP2');
my $t = countit(1, sub{LCP::LCP(@str)});
diag("LCP: ${\($t->iters)} iterations took ${\(timestr($t))}");
$t = countit(1, sub{LCP::LCP2(@str)});
diag("LCP2: ${\($t->iters)} iterations took ${\(timestr($t))}");
}
__DATA__
file:///home/gms8994/Music/t.A.T.u./
file:///home/gms8994/Music/nina%20sky/
file:///home/gms8994/Music/A%20Perfect%20Circle/
測試套件結果:
1..7
ok 1 - use LCP;
ok 2 - Without parameters
ok 3 - One parameter
ok 4 - None of common prefix
ok 5 - Some common prefix
ok 6 - Test data prefix
ok 7 - Test data prefix by LCP2
# LCP: 22635 iterations took 1.09948 wallclock secs (1.09 usr + 0.00 sys = 1.09 CPU) @ 20766.06/s (n=22635)
# LCP2: 17919 iterations took 1.06787 wallclock secs (1.07 usr + 0.00 sys = 1.07 CPU) @ 16746.73/s (n=17919)
這意味着,使用substr
純Perl的溶液爲約20%的速度在您的測試案例中,要比Roy's solution多一個前綴查找需要大約50us。沒有必要使用XS,除非您的數據或性能預期更大。
不相似性必須從字符串的開頭開始?如果是這樣,很容易解決。如果不是,那就更復雜了。 – cletus 2009-02-01 01:18:05
同上該查詢 - 我會加上 - 「相似」你的意思是'確切'? – 2009-02-01 01:20:16
您提出的問題不明確。首先,確實是類似的意思。另外,例如,如果前10個字符共有10個字符串,那麼10個字符串中又多出5個字符串對於另外7個字符是常見的,您需要哪些預定義? – 2009-02-01 15:04:39