2011-11-04 30 views

Answers

4

The Unicode linebreak grapheme is \R, supported since v5.10.

You can change the first linebreak in the whole file to a space this way:

$ perl -Mv5.10 -CSD -i.orig -0777 -pe 's/\R/ /' some_utf8_file.txt 

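There are other ways that are less wasteful of memory, but they can be quite tricky to get right. In that case, you might leave off the -0777 and see whether that works well enough for you.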
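As a rough illustration of one less memory-hungry route, the sketch below reads one line at a time and rewrites only the first linebreak it meets. The helper name `replace_first_linebreak` and the in-memory filehandles are assumptions made for the demo, not part of the one-liner above. Note that readline still splits on "\n" only, so a file that uses bare CR or U+2028 as its line terminator would come back as one long line; that is part of what makes this approach tricky.

```perl
use v5.10;
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, turning only the first
# linebreak into a space. Reads line by line instead of slurping.
sub replace_first_linebreak {
    my ($in, $out) = @_;
    my $done = 0;
    while (my $line = <$in>) {
        # s/// returns true once it has replaced something
        $done = 1 if !$done && $line =~ s/\R/ /;
        print {$out} $line;
    }
    return;
}

# Demo on in-memory filehandles (ASCII only, to keep encodings out of it)
my $input  = "Colours:\r\nRed\nGreen\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";
replace_first_linebreak($in, $out);
close $out;
close $in;
```

After this runs, `$output` holds the first two lines joined by a space and the remaining lines untouched. For a real file you would pass ordinary filehandles opened with a suitable encoding layer.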


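EDIT: Regex escapes

Here are the regex escapes Perl supports, along with the release that first supported each one, with release numbers rounded to the even minor version from v5.6 onward.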

Release  Rx Escape   Meaning 
=======  ==========  =========================================================================================== 
v1.0     \0          Match character number zero (U+0000, NULL, NUL). 
v1.0     \0N,\0NN    Match octal character up to octal 077. 
v1.0     \N          Match Nth capture group (decimal) if not in charclass and that many seen, else (octal) character up to octal 377. 
v1.0     \NN         Match Nth capture group (decimal) if not in charclass and that many seen, else (octal) character up to octal 377. 
v1.0     \NNN        Match Nth capture group (decimal) if not in charclass and that many seen, else (octal) character up to octal 377. 
v4.0     \a          Match the alert character (ALERT, BEL). 
v5.0     \A          True at the beginning of a string only, not in charclass. 
v1.0     \b          Match the backspace char (BACKSPACE, BS) in charclass only. 
v1.0     \b          True at Unicode word boundary, outside of charclass only. 
v1.0     \B          True when not at Unicode word boundary, not in charclass. 
v4.0     \cX         Match ASCII control character Control-X (\cZ, \c[, \c?, etc). 
v5.6     \C          Match one byte (C char) even in UTF‑8 (dangerous!), not in charclass. 
v1.0     \d          Match any Unicode digit character. 
v1.0     \D          Match any Unicode nondigit character. 
v4.0     \e          Match the escape character (ESCAPE, ESC, not backslash). 
v4.0     \E          End case (\F, \L, \U) or quotemeta (\Q) translation, only if interpolated. 
v1.0     \f          Match the form feed character (FORM FEED, FF). 
v5.16    \F          Foldcase (not lowercase) till \E, only if interpolated. 
v5.10    \g{GROUP}   Match the named or numbered capture group, not in charclass. 
v5.0     \G          True at end-of-match position of prior m//g or pos() setting, not in charclass. 
v5.10    \h          Match any Unicode horizontal whitespace character. 
v5.10    \H          Match any Unicode character except horizontal whitespace. 
v5.10    \k<GROUP>   Match the named capture group; also \k'NAME', not in charclass. 
v5.10    \K          Keep text to the left of \K out of match, not in charclass. 
v4.0     \l          Lowercase (not foldcase) next character only, only if interpolated. 
v4.0     \L          Lowercase (not foldcase) till \E, only if interpolated. 
v1.0     \n          Match the newline character (usually LINE FEED, LF). 
v5.12    \N          Match any character except newline. 
v5.6     \N{NAME}    Match the named character or named alias, or if outside of charclass named sequence, but only if interpolated and charnames loaded. 
v5.14    \o{NNNNNN}  Match the character given in any number of octal digits. 
v5.6     \p{PROP}    Match any character with the named property. 
v5.6     \P{PROP}    Match any character without the named property. 
v4.0     \Q          Quote (de-meta) metacharacters till \E. 
v1.0     \r          Match the return character (usually CARRIAGE RETURN, CR). 
v5.10    \R          Match any Unicode linebreak grapheme, only outside of charclass. 
v1.0     \s          Match any Unicode whitespace character except \cK. 
v1.0     \S          Match any Unicode nonwhitespace character or \cK. 
v1.0     \t          Match the tab character (CHARACTER TABULATION, HT). 
v4.0     \u          Titlecase (not uppercase) next character only, only if interpolated. 
v4.0     \U          Uppercase (not titlecase) till \E, only if interpolated. 
v5.10    \v          Match any Unicode vertical whitespace character. 
v5.10    \V          Match any character except Unicode vertical whitespace. 
v1.0     \w          Match any Unicode "word" character (alphabetics, digits, combining marks, and connector punctuation). 
v1.0     \W          Match any Unicode nonword character. 
v4.0     \xH         Match the character given in one hex digit. 
v4.0     \xHH        Match the character given in two hex digits. 
v5.6     \x{HHHHHH}  Match the character given in any number of hex. 
v5.6     \X          Match Unicode extended grapheme cluster, only outside of charclass. 
v5.5     \z          True at end of string only. 
v5.0     \Z          True right before optional final newline. 
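To make the \R row concrete, here is a small sketch comparing \R with a plain \n on four linebreak styles. The sample strings and variable names are made up for illustration.

```perl
use v5.10;
use strict;
use warnings;

# One sample per linebreak style: LF, CRLF, CR, and U+2028 LINE SEPARATOR
my @samples = ("a\nb", "a\r\nb", "a\rb", "a\x{2028}b");

# Replace the first linebreak with a space using \R (CRLF counts as one unit)
my @with_R = map { (my $s = $_) =~ s/\R/ /; $s } @samples;

# Same substitution with \n only, for comparison
my @with_n = map { (my $s = $_) =~ s/\n/ /; $s } @samples;

# Count how many samples ended up as exactly "a b"
say scalar(grep { $_ eq 'a b' } @with_R), " of ", scalar(@samples), " joined by \\R";
say scalar(grep { $_ eq 'a b' } @with_n), " of ", scalar(@samples), " joined by \\n";
```

With \n, the CRLF sample keeps its carriage return and the CR and U+2028 samples are left unchanged, which is the reason to prefer \R when the input's line endings are not known in advance.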
+0

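Where can we find the documentation for `\R`? I can't find it in http://perldoc.perl.org/perlunicode.html – Toto

+0

@M42: It's a regex thing, so it's in the *perlre* manpage, which is installed with every Perl installation. That means you don't need an internet connection to read the documentation. That would be lame. I'll update my answer with a table. – tchrist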

+0

@M42: I've added a table of regex escapes, including release numbers, which may help you. – tchrist

0

How about:

$string =~ s/\n/ /; 
0

Simple question, simple answer:

#!/usr/bin/perl 

use strict; 
use warnings; 

my $str = 'Colours: 
Red 
Green 
Yellow 
Blue'; 

$str =~ s/\n/ /; 

print "$str\n"; 
1

The interesting case is when you are reading from a file line by line:

#!/usr/bin/perl 

use strict; use warnings; 

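if (defined(my $first = <DATA>)) { 
    chomp $first;                        # drop the first line's newline
    if (defined(my $second = <DATA>)) { 
        # glue the second line straight on; use " $second" if a space is wanted
        $first .= $second;
    } 
    print $first; 
} 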

print while <DATA>; 

__DATA__ 
Colours: 
Red 
Green 
Yellow 
Blue