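How to generate URL slugs in Perl?

Web frameworks such as Rails and Django have built-in support for "slugs", which are used to generate readable and SEO-friendly URLs.

A slug string typically contains only the characters a-z, 0-9 and -, so it can be written without URL escaping (think "foo%20bar").

I'm looking for a Perl slug function that, given any valid Unicode string, returns a slug representation (a-z, 0-9 and -).

A super trivial slug function would be something along the lines of: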

$input = lc($input);
$input =~ s/[^a-z0-9-]//g;

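However, this implementation does not handle internationalization or accents (I want ë to become e). One way around this would be to enumerate all the special cases, but that would not be very elegant. I'm looking for something more well thought out and general.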
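To make the limitation concrete, here is a minimal runnable sketch of the trivial version. The sub name `trivial_slug` and the sample strings are made up for illustration; only the two substitution steps come from the snippet above.

```perl
use strict;
use warnings;
use utf8;                            # the string literals below are UTF-8

# Hypothetical wrapper around the two-line version above.
sub trivial_slug {
    my ($input) = @_;
    $input = lc $input;
    $input =~ s/[^a-z0-9-]//g;       # drop everything outside a-z, 0-9 and -
    return $input;
}

binmode STDOUT, ':encoding(UTF-8)';  # print the accented originals correctly
printf "%s => %s\n", $_, trivial_slug($_) for 'Hello World', 'Zoë', 'liberté';
```

The three slugs come out as `helloworld`, `zo` and `libert`: the accented letters are dropped outright instead of being reduced to their base letters.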

My question:

What is the most general/practical way of generating Django/Rails-type slugs in Perl? This is how I solved the same problem in Java.

Source

2010-10-24 knorv

Do the same thing you did in Java. Is there a particular operation that you don't know how to translate? – 2010-10-24 23:06:39

brian: Yes, the operation I didn't know how to translate was "String normalized = Normalizer.normalize(nowhitespace, Form.NFD);". Unicode::Normalize solved it. See Cameron's answer. – knorv 2010-10-29 11:28:06

The slugify filter currently used in Django translates (roughly) to the following Perl code:

use Unicode::Normalize; 

sub slugify($) { 
    my ($input) = @_; 

    $input = NFKD($input);   # Normalize (decompose) the Unicode string 
    $input =~ tr/\000-\177//cd; # Strip non-ASCII characters (>127) 
    $input =~ s/[^\w\s-]//g;  # Remove all characters that are not word characters (includes _), spaces, or hyphens 
    $input =~ s/^\s+|\s+$//g;  # Trim whitespace from both ends 
    $input = lc($input); 
    $input =~ s/[-\s]+/-/g;  # Replace all occurrences of spaces and hyphens with a single hyphen 

    return $input; 
}
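One thing the function takes for granted is that `$input` is a decoded Perl character string rather than raw UTF-8 bytes; NFKD has no way of recognizing an "é" that is still two undecoded bytes. The following is a minimal sketch of calling it on byte input, assuming the title arrives as UTF-8 (the sample title and the variable names are made up for illustration). The sub body repeats the steps above so that the example runs on its own.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFKD);

# Same steps as slugify above, repeated so this sketch is self-contained.
sub slugify {
    my ($input) = @_;
    $input = NFKD($input);           # decompose: "é" becomes "e" + combining accent
    $input =~ tr/\000-\177//cd;      # delete everything above code point 127
    $input =~ s/[^\w\s-]//g;         # keep word characters, whitespace, hyphens
    $input =~ s/^\s+|\s+$//g;        # trim both ends
    $input = lc $input;
    $input =~ s/[-\s]+/-/g;          # collapse runs of spaces/hyphens into one hyphen
    return $input;
}

# Assumption: the title comes in as raw UTF-8 bytes (e.g. read from a file).
my $bytes = "Libert\xC3\xA9, \xC3\x89galit\xC3\xA9!";   # "Liberté, Égalité!"
my $title = decode('UTF-8', $bytes);                    # bytes -> characters

print slugify($title), "\n";         # liberte-egalite
```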

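Since you also want accented characters changed to their unaccented equivalents, throwing in a call to unidecode (defined in Text::Unidecode) before stripping the non-ASCII characters seems to be your best bet (as pointed out by phaylon).

In that case, the function could look like: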

use Unicode::Normalize; 
use Text::Unidecode; 

sub slugify_unidecode($) { 
    my ($input) = @_; 

    $input = NFC($input);   # Normalize (recompose) the Unicode string 
    $input = unidecode($input); # Convert non-ASCII characters to closest equivalents 
    $input =~ s/[^\w\s-]//g;  # Remove all characters that are not word characters (includes _), spaces, or hyphens 
    $input =~ s/^\s+|\s+$//g;  # Trim whitespace from both ends 
    $input = lc($input); 
    $input =~ s/[-\s]+/-/g;  # Replace all occurrences of spaces and hyphens with a single hyphen 

    return $input; 
}

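The former works well for strings that are primarily ASCII, but falls short when the entire string consists of non-ASCII characters: they all get stripped out, leaving you with an empty string.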

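Sample output:

string        | slugify       | slugify_unidecode
-------------------------------------------------
hello world   | hello-world   | hello-world
北亰          |               | bei-jing
liberté       | liberte       | liberte

Note how 北亰 gets slugified to nothing by the Django-inspired implementation. Note also the difference the normalization form makes: with NFKD, liberté becomes 'liberte' once the second part of the decomposed character (the combining accent) is stripped away, whereas with NFC it would become 'libert', because the recomposed 'é' would be stripped away as a whole.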
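The effect of the normalization form can be checked in isolation with a few lines. This is only a sketch: the helper name `strip_non_ascii` is made up here, and the input is assumed to be a decoded character string containing a precomposed é (U+00E9).

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFKD);

my $word = "libert\x{e9}";           # "liberté" with a precomposed é

# Hypothetical helper: delete every character above U+007F.
sub strip_non_ascii {
    my ($s) = @_;
    $s =~ tr/\x00-\x7F//cd;
    return $s;
}

# NFKD splits é into "e" + U+0301, so only the accent is deleted.
print strip_non_ascii(NFKD($word)), "\n";   # liberte

# NFC keeps é as a single code point, so the whole letter is deleted.
print strip_non_ascii(NFC($word)), "\n";    # libert
```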

Source

2010-10-24 17:44:09 Cameron