2012-11-21 54 views
2

I have many number pools (bases) in a database, for example phone numbers. I'm working in Perl.

448-48-00 #(from 00 to 99, 100 numbers) 
336-87-00 #(same as above) 
449-20-00 #(from 000 to 999, 1000 numbers) 

I need to get the base (the common prefix) of each pool. In this example, I need 44848, 33687 and 4492.

I have this code, but I don't know how to finish it :)

#!/usr/bin/perl 

use v5.10; 
use warnings; 

my @p = 4484900..4484999; 
push @p, $_ for 3368700..3368799; 

my $data; 

do { 
    my $z = 1; 
    while($z++ <= length $_) {
        $data->{substr $_, 0, $z}++;
    }
} for @p; 

foreach my $key (sort { $data->{$a} <=> $data->{$b} } (keys %$data)) { 
    say $key if $data->{$key} > 99; 
} 

I need to keep the longest elements and remove the shorter ones that are contained in a longer one.

Answers

1

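I tried to understand what your code is doing and improved it to do what you want. Disclaimer: it's not that simple. For example, the algorithm cannot know that you don't want 44848.. and 4492... grouped as 44....., yet do want the group 4492... rather than 44924.. and so on. But maybe this already helps you.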

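I think the important part is the "smart filter". It looks, for example, at 336 and 3368 and deletes the count for 336 if it is not higher than the other one (336 is then just a trivial superset of 3368). What matters here is the string sort together with the state variable $last.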

#!/usr/bin/env perl 

use strict; 
use warnings; 
use feature qw(say state); 
use List::Util 'shuffle'; 

# shuffled phone numbers (don't make it too easy) 
my @numbers = shuffle (
    4484800 .. 4484899, 
    3368700 .. 3368799, 
    4492000 .. 4492999 
); 

my %count =(); 

# import phone numbers 
foreach my $number (@numbers) { 

    # work on all substrings from the beginning
    for (my $pos = 1; $pos <= length $number; $pos++) {
        my $prefix = substr $number, 0, $pos;
        $count{$prefix}++; # increase the number of equal prefixes
    }
} 

# smart filter 
foreach my $prefix (sort {$a cmp $b} keys %count) { 
    state $last //= 'nothing'; 

    # delete trivial super sets
    if ($prefix =~ /^\Q$last/ and $count{$last} == $count{$prefix}) {
        delete $count{$last};
    }

    # delete trivial sets
    if ($count{$prefix} == 1) {
        delete $count{$prefix};
        next;
    }

    # remember the last prefix 
    $last = $prefix; 
} 

# output 
say "$_ ($count{$_})" for sort { 
    $count{$b} <=> $count{$a} or $a cmp $b 
} keys %count; 

The output is technically correct, but not yet what you want:

44 (1100) 
4492 (1000) 
33687 (100) 
44848 (100) 
44920 (100) 
44921 (100) 
44922 (100) 
44923 (100) 
44924 (100) 
44925 (100) 
44926 (100) 
44927 (100) 
44928 (100) 
44929 (100) 
336870 (10) 
(large list of 10-groups) 

So if you want to get rid of the 10-groups, you can change

# delete trivial sets
if ($count{$prefix} == 1) {
    delete $count{$prefix};
    next;
}

to

# delete trivial sets
if ($count{$prefix} <= 10) { 
    delete $count{$prefix}; 
    next; 
} 

Output:

44 (1100) 
4492 (1000) 
33687 (100) 
44848 (100) 
44920 (100) 
44921 (100) 
44922 (100) 
44923 (100) 
44924 (100) 
44925 (100) 
44926 (100) 
44927 (100) 
44928 (100) 
44929 (100) 

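That looks quite good. What remains depends on how you want to handle the 4492x groups of 100 and the 44 group of 1100. If you delete the groups of 100 based on their length, you would also delete the 4492 group in favor of the larger 44 group.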
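One possible way to settle that remaining question, sketched under an assumption that is not part of the answer above: every pool is a complete block, so a prefix of length L over N-digit numbers counts as a base only if exactly 10**(N-L) numbers share it. The sub name complete_bases is hypothetical, and all numbers are assumed to have the same length.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper, not from the answer above.
# Assumption: all numbers have the same number of digits, and a pool is a
# complete block (every number below its prefix is present).
sub complete_bases {
    my @numbers = @_;
    return unless @numbers;
    my $digits = length $numbers[0];

    # count every leading substring, as in the answer above
    my %count;
    for my $number (@numbers) {
        $count{ substr($number, 0, $_) }++ for 1 .. $digits;
    }

    # a prefix is "complete" if all 10**(remaining digits) numbers exist
    my %complete;
    for my $prefix (keys %count) {
        my $expected = 10 ** ($digits - length $prefix);
        $complete{$prefix} = 1 if $count{$prefix} == $expected;
    }

    # keep only complete prefixes that have no shorter complete prefix
    my @bases;
    PREFIX: for my $prefix (sort keys %complete) {
        for my $len (1 .. length($prefix) - 1) {
            next PREFIX if $complete{ substr($prefix, 0, $len) };
        }
        push @bases, $prefix;
    }
    return @bases;
}

say for complete_bases(
    4484800 .. 4484899,
    3368700 .. 3368799,
    4492000 .. 4492999,
);
```

For the three pools from the question this prints 33687, 44848 and 4492. The 44 group is dropped because its 1100 numbers fall short of the 100000 a complete two-digit prefix would need, and 44920 to 44929 are dropped because 4492 already covers them. A number that belongs to no complete pool would come out as its own base.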

+0

That's great! Thank you very much =) – VeroLom

+0

Glad you like it. If you find this answer satisfactory, you should "accept" it. – memowe

3
#!/usr/bin/perl -l

use strict; use warnings; 

my $prefix = "1234"; 

foreach (<DATA>) { 
    print $prefix . $1 . $2 if m/^(\d{3})-(\d{1,2})/; 
} 

__DATA__ 
448-48-## (00-99) 
336-87-## (-||-) 
449-2#-## (0-9, 00-99) 

Output:

123444848 
123433687 
12344492 

If you only want the highest value:

#!/usr/bin/perl -l

use strict; use warnings;

my $prefix = "1234";
my $highest = 0; # highest base seen so far, without the prefix

foreach (<DATA>) {
    # skip lines that do not look like a pool pattern
    next unless m/^(\d{3})-(\d{1,2})/;
    my $cur = $1 . $2;
    $highest = $cur if $cur > $highest;
}

print $prefix . $highest;

__DATA__ 
448-48-## (00-99) 
336-87-## (-||-) 
449-2#-## (0-9, 00-99) 

Output:

123444848

+0

Sorry, I meant ranges of numbers, like 1234500..1234599 and so on. – VeroLom

+1

You just want the highest integer in the end, is that it? –

+0

See my edited post –
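For ranges like the 1234500..1234599 mentioned in the comments, a minimal sketch that takes only the first and last number of a range and returns their common leading digits. The sub name range_base is hypothetical, and the sketch assumes both endpoints have the same number of digits and that the range is a complete block (it would also report 12345 for 1234500..1234549, which may not be what you want).

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: longest run of leading digits shared by the first
# and the last number of a range.
# Assumption: the range is a complete block such as ..00 to ..99.
sub range_base {
    my ($first, $last) = @_;
    my $len = 0;

    # advance while both strings still have a digit here and they agree
    $len++ while $len < length($first)
        && $len < length($last)
        && substr($first, $len, 1) eq substr($last, $len, 1);

    return substr $first, 0, $len;
}

say range_base(1234500, 1234599);
say range_base(4492000, 4492999);
```

This prints 12345 and 4492. Since it only looks at the two endpoints, it does not need to enumerate the whole range.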