2011-02-13 41 views

Answers

4

Edit: a few more tweaks.

Depending on exactly what the input looks like, this works for ASCII:

(?<! [:\s]) \s* (["']) (?: (?! \1) .)+ \1 

For “Unicode ‘matching’ quotes”, you have to be a bit more <particular> in your pairing, perhaps along these lines:

(?xs) (?<!:) \s+ 
    (?: (["']) (?: (?! \1) .)+ \1 
    | “ .*? ” # English etc 
    | ‘ .*? ’ 
    | « .*? » # French, Spanish, Italian 
    | ‹ .*? › 
    | „ .*? “ # German, Icelandic, Romanian 
    | ‚ .*? ‘ 
    | „ .*? ” # Hungarian 
    | ” .*? ” # Swedish 
    | ’ .*? ’ 
    | » .*? « # Danish, Hungarian 
    | › .*? ‹ 
    | 「 .*? 」 # Japanese, Chinese 
    | 『 .*? 』 
    ) 

You can read more about the various paired quotation marks used by different languages here.
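As a rough illustration of the pairing idea in Ruby, the alternation can be generated from a table of opening and closing marks instead of being written out by hand. This is only a sketch under stated assumptions: `pairs` and `paired_quote_pattern` are hypothetical names, the table holds a subset of the pairs listed above, and Ruby's `Regexp::MULTILINE` stands in for Perl's `/s`.

```ruby
# Opening/closing pairs copied from the pattern above (a subset; extend as needed).
pairs = [
  ["“", "”"], ["‘", "’"],   # English etc.
  ["«", "»"], ["‹", "›"],   # French, Spanish, Italian
  ["„", "“"], ["‚", "‘"],   # German, Icelandic, Romanian
  ["「", "」"], ["『", "』"], # Japanese, Chinese
]

# Hypothetical helper: build one regex that accepts ASCII quotes closed by
# the same character, or any of the listed open/close pairs, when the
# leading whitespace is not directly preceded by a colon.
def paired_quote_pattern(pairs)
  alternatives = pairs.map do |open, close|
    "#{Regexp.escape(open)}.*?#{Regexp.escape(close)}"
  end
  ascii = %q{(["'])(?:(?!\1).)+\1}
  Regexp.new("(?<!:)\\s+(?:#{ascii}|#{alternatives.join('|')})", Regexp::MULTILINE)
end

pattern = paired_quote_pattern(pairs)
puts pattern.match("He said, «bonjour» twice")[0]
puts pattern.match("Name: «bonjour»").inspect
```

Alternatives are tried in table order, so a pair that shares its opening mark with another pair should be listed in the order you want it preferred.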

Here is a test program in Perl, but the principles should hold just as well for Ruby:

#!/usr/bin/perl 
use strict; 
use warnings; 
use utf8; 
use open qw[ :std IO :utf8 ]; 
while (<DATA>) { 
    print if /(?<! [:\s]) \s* (["']) (?: (?! \1) .)+ \1/sx; 
} 
__END__ 
"Take off, hoser!" 
Dorothy Parker:Brevity is the soul of lingerie. 
Dorothy Parker:"Brevity is the soul of lingerie." 
Dorothy Parker: "Brevity is the soul of lingerie." 
Dorothy Parker: "Brevity is the soul of lingerie." 
Larry Wall: I don't know if it's what you want, but it's what you get. :-) 
Larry Wall said, "I don't know if it's what you want, but it's what you get. :-)" 
Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)” 
Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)” 
Larry Wall said, “I don't know if it's what you want, but it's what you get. :-)” 
Boss: And what's that "goto" doing there?!? 
Hacker: Er, I guess my finger slipped when I was typing "getservbyport"... 
‘Nevermore!’ quoth the raven. 
Quoth the raven: ‘Nevermore!’ 
'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent. 
src/perl/mg.c: "I wish I had never come here, and I don't want to see no more magic," he said, and fell silent. 
src/perl/mg.c: 'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent. 
src/perl/mg.c => "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent." 
‘I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.’ 
“I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.” 

The output is:

"Take off, hoser!" 
Larry Wall: I don't know if it's what you want, but it's what you get. :-) 
Larry Wall said, "I don't know if it's what you want, but it's what you get. :-)" 
Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)” 
Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)” 
Larry Wall said, “I don't know if it's what you want, but it's what you get. :-)” 
Boss: And what's that "goto" doing there?!? 
Hacker: Er, I guess my finger slipped when I was typing "getservbyport"... 
'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent. 
src/perl/mg.c: 'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent. 
src/perl/mg.c => "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent." 
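For readers working in Ruby, here is a minimal sketch of the same ASCII-only filter. It assumes the pattern carries over unchanged apart from the flags (Perl's `/s` is spelled `/m` in Ruby); `samples` and `matching_lines` are hypothetical names, and the sample lines are a handful borrowed from the test data.

```ruby
# Ruby spelling of the ASCII-only pattern from the Perl program.
# Perl's /s ("." matches newline) is /m in Ruby; /x keeps the free spacing.
ascii_quoted = /(?<! [:\s]) \s* (["']) (?: (?! \1) .)+ \1/mx

# A few lines borrowed from the Perl test data.
samples = [
  %q{"Take off, hoser!"},
  %q{Dorothy Parker:"Brevity is the soul of lingerie."},
  %q{Dorothy Parker: "Brevity is the soul of lingerie."},
  %q{Boss: And what's that "goto" doing there?!?},
  %q{Larry Wall: I don't know if it's what you want, but it's what you get. :-)},
]

# Hypothetical helper: keep only the lines the pattern matches.
def matching_lines(lines, pattern)
  lines.select { |line| line.match?(pattern) }
end

puts matching_lines(samples, ascii_quoted)
```

The Larry Wall line is kept for the same reason as in the Perl run: the apostrophes in "don't" and "it's" are read as a pair of single quotes.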

This may look “wrong”, but that is because of the internal quotes. Here is a more complete version that illustrates the problem better:

#!/usr/bin/perl 
use strict; 
use warnings; 
use utf8; 
use open qw[ :std IO :utf8 ]; 
while (<DATA>) { 
    chomp;  
    my $bingo = m{ 
     (?<! [:\s]) \s* 
     (?: (?<=^) 
      | (?<= \s) 
     ) 
     (?: (["']) (?: (?! \1) .)+ \1 
      | “ .*? ” # English etc 
      | ‘ .*? ’ 
     ) 
    }sx; 

    if ($bingo) { 
     printf("Line %2d, quote 「%s」\n", $., $&); 
     printf(" " x 7 . "in line 『%s』\n", $_); 
    } else { 
     printf("Line %2d IGNORE 『%s』\n", $., $_); 
    }  
}  
__END__ 
"Take off, hoser!" 
Dorothy Parker:Brevity is the soul of lingerie. 
Dorothy Parker:"Brevity is the soul of lingerie." 
Dorothy Parker: "Brevity is the soul of lingerie." 
Dorothy Parker: "Brevity is the soul of lingerie." 
Larry Wall: I don't know if it's what you want, but it's what you get. :-) 
Larry Wall said, "I don't know if it's what you want, but it's what you get. :-)" 
Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)” 
Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)” 
Larry Wall said, “I don't know if it's what you want, but it's what you get. :-)” 
Boss: And what's that "goto" doing there?!? 
Hacker: Er, I guess my finger slipped when I was typing "getservbyport"... 
‘Nevermore!’ quoth the raven. 
Quoth the raven: ‘Nevermore!’ 
'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent. 
src/perl/mg.c: "I wish I had never come here, and I don't want to see no more magic," he said, and fell silent. 
src/perl/mg.c: 'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent. 
src/perl/mg.c => "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent." 
‘I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.’ 
“I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.” 

whose output is:

Line 1, quote 「"Take off, hoser!"」 
     in line 『"Take off, hoser!"』 
Line 2 IGNORE 『Dorothy Parker:Brevity is the soul of lingerie.』 
Line 3 IGNORE 『Dorothy Parker:"Brevity is the soul of lingerie."』 
Line 4 IGNORE 『Dorothy Parker: "Brevity is the soul of lingerie."』 
Line 5 IGNORE 『Dorothy Parker: "Brevity is the soul of lingerie."』 
Line 6 IGNORE 『Larry Wall: I don't know if it's what you want, but it's what you get. :-)』 
Line 7, quote 「 "I don't know if it's what you want, but it's what you get. :-)"」 
     in line 『Larry Wall said, "I don't know if it's what you want, but it's what you get. :-)"』 
Line 8 IGNORE 『Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)”』 
Line 9 IGNORE 『Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)”』 
Line 10, quote 「 “I don't know if it's what you want, but it's what you get. :-)”」 
     in line 『Larry Wall said, “I don't know if it's what you want, but it's what you get. :-)”』 
Line 11, quote 「 "goto"」 
     in line 『Boss: And what's that "goto" doing there?!?』 
Line 12, quote 「 "getservbyport"」 
     in line 『Hacker: Er, I guess my finger slipped when I was typing "getservbyport"...』 
Line 13, quote 「‘Nevermore!’」 
     in line 『‘Nevermore!’ quoth the raven.』 
Line 14 IGNORE 『Quoth the raven: ‘Nevermore!’』 
Line 15, quote 「'I wish I had never come here, and I don'」 
     in line 『'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent.』 
Line 16 IGNORE 『src/perl/mg.c: "I wish I had never come here, and I don't want to see no more magic," he said, and fell silent.』 
Line 17 IGNORE 『src/perl/mg.c: 'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent.』 
Line 18, quote 「 "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent."」 
     in line 『src/perl/mg.c => "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent."』 
Line 19, quote 「‘I wish I had never come here, and I don’」 
     in line 『‘I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.’』 
Line 20, quote 「“I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.”」 
     in line 『“I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.”』 

Also, there is a standard Unicode derived property called \p{Quotation_Mark}, or \p{QMark} for short, but Ruby does not support it. You can use the unichars script to list all of these characters:

$ unichars '\p{qmark}' 
" 34 0022 QUOTATION MARK 
' 39 0027 APOSTROPHE 
« 171 00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK 
» 187 00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK 
‘ 8216 2018 LEFT SINGLE QUOTATION MARK 
’ 8217 2019 RIGHT SINGLE QUOTATION MARK 
‚ 8218 201A SINGLE LOW-9 QUOTATION MARK 
‛ 8219 201B SINGLE HIGH-REVERSED-9 QUOTATION MARK 
“ 8220 201C LEFT DOUBLE QUOTATION MARK 
” 8221 201D RIGHT DOUBLE QUOTATION MARK 
„ 8222 201E DOUBLE LOW-9 QUOTATION MARK 
‟ 8223 201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK 
‹ 8249 2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK 
› 8250 203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK 
「 12300 300C LEFT CORNER BRACKET 
」 12301 300D RIGHT CORNER BRACKET 
『 12302 300E LEFT WHITE CORNER BRACKET 
』 12303 300F RIGHT WHITE CORNER BRACKET 
〝 12317 301D REVERSED DOUBLE PRIME QUOTATION MARK 
〞 12318 301E DOUBLE PRIME QUOTATION MARK 
〟 12319 301F LOW DOUBLE PRIME QUOTATION MARK 
﹁ 65089 FE41 PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET 
﹂ 65090 FE42 PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET 
﹃ 65091 FE43 PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET 
﹄ 65092 FE44 PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET 
＂ 65282 FF02 FULLWIDTH QUOTATION MARK 
＇ 65287 FF07 FULLWIDTH APOSTROPHE 
｢ 65378 FF62 HALFWIDTH LEFT CORNER BRACKET 
｣ 65379 FF63 HALFWIDTH RIGHT CORNER BRACKET 
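Where a regex engine lacks the property, one possible workaround is to spell the set out as an explicit character class. The sketch below assumes the hex column of the listing above is the complete set you care about; `qmark_codepoints` and `char_class_for` are hypothetical names, not part of any library.

```ruby
# Code points copied from the hex column of the unichars listing.
qmark_codepoints = [
  0x0022, 0x0027, 0x00AB, 0x00BB,
  0x2018, 0x2019, 0x201A, 0x201B, 0x201C, 0x201D, 0x201E, 0x201F,
  0x2039, 0x203A,
  0x300C, 0x300D, 0x300E, 0x300F, 0x301D, 0x301E, 0x301F,
  0xFE41, 0xFE42, 0xFE43, 0xFE44,
  0xFF02, 0xFF07, 0xFF62, 0xFF63,
]

# Hypothetical helper: turn a list of code points into a one-character class.
def char_class_for(codepoints)
  chars = codepoints.map { |cp| Regexp.escape(cp.chr(Encoding::UTF_8)) }
  Regexp.new("[#{chars.join}]")
end

any_qmark = char_class_for(qmark_codepoints)
puts "«Nevermore!» quoth the raven".scan(any_qmark).inspect
```

A class like this only says "some quotation mark"; it does not know which marks open and which close, so pairing still needs explicit alternatives.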

You can use the uniprops script to list all the properties of a code point:

$ uniprops -a 2018 
U+2018 ‹‘› \N{ LEFT SINGLE QUOTATION MARK }: 
    \pP \p{Pi} 
    All Any Assigned InGeneralPunctuation Case_Ignorable CI Common Zyyy Pi P General_Punctuation Gr_Base Grapheme_Base Graph GrBase Initial_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn Print Punctuation QMark Quotation_Mark X_POSIX_Graph X_POSIX_Print X_POSIX_Punct 
    Age=1.1 Bidi_Class=ON Bidi_Class=Other_Neutral BC=ON Block=General_Punctuation Canonical_Combining_Class=0 Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Script=Common Decomposition_Type=None DT=None East_Asian_Width=A East_Asian_Width=Ambiguous EA=A Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group JG=NoJoiningGroup Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=QU Line_Break=Quotation LB=QU Numeric_Type=None NT=None Numeric_Value=NaN NV=NaN Present_In=1.1 IN=1.1 Present_In=2.0 IN=2.0 Present_In=2.1 IN=2.1 Present_In=3.0 IN=3.0 Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0 Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 SC=Zyyy Script=Zyyy Sentence_Break=CL Sentence_Break=Close SB=CL Word_Break=MB Word_Break=MidNumLet WB=MB _Case_Ignorable _X_Begin 
+0

The OP wants quoted strings that do *not* have a colon before them. Also, that should be `(?:(?!\1).)+` `:)` – Kobi 2011-02-13 12:57:10

2

Here you go, I think: http://rubular.com/r/hFylsgU3OT

^[^:]*"(.*?)"$ 
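A small Ruby sketch of how this pattern behaves, assuming each input is a single line that ends with the closing quote; `quoted_text` is a hypothetical helper name.

```ruby
# The pattern from this answer: no colon anywhere before the opening quote,
# and the closing quote must end the line.
no_colon_before_quote = /^[^:]*"(.*?)"$/

# Hypothetical helper: return the captured text, or nil when nothing matches.
def quoted_text(line, pattern)
  match = pattern.match(line)
  match && match[1]
end

puts quoted_text(%q{He said "hello"}, no_colon_before_quote).inspect
puts quoted_text(%q{Name: "hello"}, no_colon_before_quote).inspect
```

Because `[^:]*` forbids a colon anywhere earlier on the line, this rejects more than just a colon placed directly before the quote.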

This, BTW, is the perfect way to ask a regex question: examples, a link, and clear instructions.

+0

Thanks, but I realized I forgot one test: the text should be captured only when a colon comes before the quotes (e.g. :[\s*]): http://rubular.com/r/NtbcgGqX4h – krn 2011-02-13 13:16:00