2012-03-09

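Why does grep still include . and .. in my file list, even though it doesn't match the regex I give it?

What I am trying to do is get a collection of all the UTF16 Unicode charts. I downloaded all the PDF files from http://unicode.org/charts/PDF/ and decided to use Perl to get rid of all the special or UTF32 charts with the following script: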

#!/usr/bin/perl 

opendir(my $dir, "."); 
my @files = grep {!/^U[0-9,A-F]{4}\.pdf/ && !/utf16only.pl/} readdir($dir); 
for $f (@files) 
{ 
    print "deleting $f...\n"; 
    #unlink $f; 
} 
closedir($dir); 

When I run the script, I get the following output:

C:\Users\Evan\Downloads\Unicode 6.1 Charts>utf16only.pl 
deleting .... 
deleting ..... 
deleting 10FF80.pdf... 
deleting ErrorLink.pdf... 
deleting U10000.pdf... 
deleting U100000.pdf... 
deleting U10080.pdf... 
deleting U10100.pdf... 
deleting U10140.pdf... 
deleting U10190.pdf... 
deleting U101D0.pdf... 
deleting U10280.pdf... 
deleting U102A0.pdf... 
deleting U10300.pdf... 
deleting U10330.pdf... 
deleting U10380.pdf... 
deleting U103A0.pdf... 
deleting U10400.pdf... 
deleting U10450.pdf... 
deleting U10480.pdf... 
deleting U10800.pdf... 
deleting U10840.pdf... 
deleting U10900.pdf... 
deleting U10920.pdf... 
deleting U10980.pdf... 
deleting U109A0.pdf... 
deleting U10A00.pdf... 
deleting U10A60.pdf... 
deleting U10B00.pdf... 
deleting U10B40.pdf... 
deleting U10B60.pdf... 
deleting U10C00.pdf... 
deleting U10E60.pdf... 
deleting U10FF80.pdf... 
deleting U11000.pdf... 
deleting U11080.pdf... 
deleting U110D0.pdf... 
deleting U11100.pdf... 
deleting U11180.pdf... 
deleting U11680.pdf... 
deleting U12000.pdf... 
deleting U12400.pdf... 
deleting U13000.pdf... 
deleting U16800.pdf... 
deleting U16F00.pdf... 
deleting U1B000.pdf... 
deleting U1D000.pdf... 
deleting U1D100.pdf... 
deleting U1D200.pdf... 
deleting U1D300.pdf... 
deleting U1D360.pdf... 
deleting U1D400.pdf... 
deleting U1EE00.pdf... 
deleting U1F000.pdf... 
deleting U1F030.pdf... 
deleting U1F0A0.pdf... 
deleting U1F100.pdf... 
deleting U1F200.pdf... 
deleting U1F300.pdf... 
deleting U1F600.pdf... 
deleting U1F680.pdf... 
deleting U1F700.pdf... 
deleting U1FF80.pdf... 
deleting U20000.pdf... 
deleting U2A700.pdf... 
deleting U2B740.pdf... 
deleting U2F800.pdf... 
deleting U2FF80.pdf... 
deleting U3FF80.pdf... 
deleting U4FF80.pdf... 
deleting U5FF80.pdf... 
deleting U6FF80.pdf... 
deleting U7FF80.pdf... 
deleting U8FF80.pdf... 
deleting U9FF80.pdf... 
deleting UAFF80.pdf... 
deleting UBFF80.pdf... 
deleting UBOOP.pdf... 
deleting UCFF80.pdf... 
deleting UDFF80.pdf... 
deleting UE0000.pdf... 
deleting UE0100.pdf... 
deleting UEFF80.pdf... 
deleting UF0000.pdf... 
deleting UFFF80.pdf... 

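The first two lines are still '.' and '..', and I think that because I am trying to unlink '.', it is eliminating a lot of files that I do not want deleted. I'm not sure whether the problem is in my regex, grep, readdir, or unlink, but it is eliminating more files than it should.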

Answers

6

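This line:

grep {!/^U[0-9,A-F]{4}\.pdf/ && !/utf16only.pl/} 

keeps only the names that match neither regex. That includes '.' and '..', because readdir returns them like any other directory entry. To exclude those two, you have to extend it: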

grep {!/^U[0-9,A-F]{4}\.pdf/ && !/utf16only.pl/ && !/^\.{1,2}$/} 
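
To see what the extended filter keeps without touching any real files, here is a minimal sketch that runs the same three conditions over a hard-coded list. The names in @names are hypothetical stand-ins for what readdir might return, not the asker's actual directory contents.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample of readdir output (an assumption for illustration only).
my @names = ('.', '..', 'U0000.pdf', 'U10000.pdf', 'ErrorLink.pdf', 'utf16only.pl');

# Same three conditions as the extended grep above:
# keep a name only if it is not a 4-digit chart, not the script, and not . or ..
my @to_delete = grep {
    !/^U[0-9,A-F]{4}\.pdf/ && !/utf16only.pl/ && !/^\.{1,2}$/
} @names;

print "would delete: $_\n" for @to_delete;
```

With this sample list, only U10000.pdf and ErrorLink.pdf remain in the deletion list; '.' and '..' are filtered out by the third condition.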

Of course! Thanks a lot! I also realized that I should be doing a file test in the 'grep' block, which would fix it as well. – 2012-03-09 06:37:32
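
The file test mentioned in this comment could look roughly like the sketch below. It assumes the -f test (true only for plain files) is what was meant, and it works in a scratch directory created with File::Temp so that no real charts are involved; the file names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with hypothetical file names, so nothing real is touched.
my $tmp = tempdir(CLEANUP => 1);
for my $name ('U0000.pdf', 'U10000.pdf', 'utf16only.pl') {
    open(my $fh, '>', "$tmp/$name") or die "cannot create $name: $!";
    close($fh);
}

opendir(my $dir, $tmp) or die "cannot open $tmp: $!";
# -f is true only for plain files, so the directory entries . and .. drop out.
# readdir returns bare names, so the test needs the directory prefixed.
my @files = sort grep {
    -f "$tmp/$_" && !/^U[0-9,A-F]{4}\.pdf/ && !/utf16only.pl/
} readdir($dir);
closedir($dir);

print "would delete: $_\n" for @files;
```

In the original script the directory is ".", so a bare -f $_ would also work there; the explicit prefix only matters when the directory being read is not the current one.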

5

Here is your regex:

grep {!/^U[0-9,A-F]{4}\.pdf/ && !/utf16only.pl/} readdir($dir); 

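This says "keep all files that match neither U[0-9,A-F]{4}.pdf (note: did you really want the comma in there?) nor utf16only.pl".

Since '.' and '..' match neither U[0-9A-F]{4}.pdf nor utf16only.pl, they end up in the deletion list too.

Add !/^\./ to your grep to also exclude those entries from the deletion list (note that this skips every name starting with a dot, not just '.' and '..'):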

grep {!/^U[0-9A-F]{4}\.pdf/ && !/^\./ && !/utf16only.pl/} readdir($dir); 

Note: I changed your [0-9,A-F] to [0-9A-F], because I don't think your file names will have commas in them.
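
A small sketch of why the comma matters: inside a character class a comma is a literal character, not a separator, so [0-9,A-F] also accepts commas. The names below are hypothetical, and the second one contains commas on purpose.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names for illustration; 'U,,,,.pdf' is deliberately odd.
my @names = ('U0F00.pdf', 'U,,,,.pdf', 'U10000.pdf');

# The class with a comma treats ',' as one more allowed character.
my @with_comma    = grep { /^U[0-9,A-F]{4}\.pdf/ } @names;
# The class without a comma accepts hex digits only.
my @without_comma = grep { /^U[0-9A-F]{4}\.pdf/ } @names;

print "with comma in class:    @with_comma\n";
print "without comma in class: @without_comma\n";
```

For the chart files in the question the two classes behave the same, since none of the names contain commas; the cleaner class simply states the intent more precisely.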


Ah yes, thanks for the comma thing. I'm new to Perl and regex and a bit confused, because sometimes regexes seem a little too clever for me. – 2012-03-09 06:42:40


@EvanC Regexes are never smarter than you, don't worry. =) – TLP 2012-03-09 11:30:21
