Perl6: Capturing Windows newlines in a string with a regex

Disclaimer: I have cross-posted this on PerlMonks.

In Perl 5, I can quickly and easily print the hexadecimal representation of the \r\n Windows-style line ending:

perl -nE '/([\r\n]{1,2})/; print(unpack("H*",$1))' in.txt 
0d0a

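If you want to test this on Unix, create a file in.txt with a single line and a line ending, then convert it to Windows line endings: perl -ni -e 's/\n/\r\n/g;print' in.txt. (Or, in vi/vim, create the file and just run :set ff=dos.)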

I have tried many things in Perl 6 to do the same, but no matter what I do I can't get it to work. Here is my most recent test:

use v6; 
use experimental :pack; 

my $fn = 'in.txt'; 

my $fh = open $fn, chomp => False; # I've also tried :bin 
for $fh.lines -> $line { 
    if $line ~~ /(<[\r\n]>**1..2)/ { 
        $0.Str.encode('UTF-8').unpack("H*").say; 
    } 
}

This outputs 0a, as do these:

/(\n)/ 
/(\v)/

First, I don't even know whether I'm using unpack() or the regex correctly. Second, how do I capture both elements (\r\n) of the newline in P6?

2016-05-30 stevieb

In Perl 5, I prefer sprintf("%v02X", $s). It works for any string, not just strings of bytes. – ikegami

Perl 6 automatically chomps the line separator off for you, which means it isn't there when you try to do a substitution.

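Perl 6 also creates synthetic characters when there are combining characters. So if you want a base-16 representation of your input, either use the encoding 'latin1' or use the methods on $*IN that return a Buf.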

This example just appends CRLF to the end of every line.
(The last line will always end with 0D 0A, even if it did not have a line terminator.)

perl6 -ne 'BEGIN $*IN.encoding("latin1"); #`(basically ASCII) 
    $_ ~= "\r\n"; #`(append CRLF) 
    put .ords>>.fmt("%02X");'

You could also turn off the autochomp behaviour.

perl6 -ne 'BEGIN { 
     $*IN.encoding("latin1"); 
     $*IN.chomp = False; 
    }; 
    s/\n/\r\n/; 
    put .ords>>.fmt("%02X");'
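The point about synthetic characters can be seen without any file I/O. The following is a minimal sketch, not part of the original answer, showing that a CR LF pair counts as one character (grapheme) in a Perl 6 string while still consisting of two code points, and that \n in a regex matches it as a single logical newline. The variable name $crlf is only illustrative.

```perl6
my $crlf = "\r\n";

say $crlf.chars;                               # 1      (CR LF is one grapheme)
say $crlf.ords;                                # (13 10) (but still two code points)
put $crlf.ords>>.fmt("%02X");                  # 0D 0A
put $crlf.encode('utf8').list>>.fmt("%02X");   # 0D 0A  (two bytes once encoded)

# \n in a regex matches a logical newline, which includes the CR LF pair
say so $crlf ~~ /^ \n $/;                      # True
```

This is why inspecting code points or encoded bytes, rather than characters, is the more reliable way to see what a line ending really contains.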

2016-05-31 13:41:51

https://doc.perl6.org/type/IO::Handle#method_lines

Return a lazy list of the file's lines read via get, limited to $limit lines.
The new line separator (i.e., $*IN.nl-in) will be excluded.
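A small sketch of what that documentation passage means in practice, assuming a scratch file named lines-demo.txt (a hypothetical name) with plain LF line endings. With the default settings each line comes back without its separator; opening the handle with :!chomp keeps it.

```perl6
my $fn = 'lines-demo.txt';                                # assumed scratch file
spurt $fn, Blob.new(0x61, 0x62, 0x0A, 0x63, 0x64, 0x0A);  # bytes of "ab\ncd\n"

# Default: the separator is excluded, so each line has 2 characters
say .chars for $fn.IO.lines;        # 2, then 2

# With chomping disabled the separator stays on each line
my $fh = open $fn, :!chomp;
say .chars for $fh.lines;           # 3, then 3
$fh.close;

unlink $fn;
```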

2016-05-31 02:57:07 ugexe

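OK, so my actual goal (I'm sorry I didn't make this clear when I posted the question) is to read a file, capture its line endings, and write the file back out using the original line endings rather than the endings of the current platform.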

I now have a proof of concept working. I'm very new to Perl 6, so the code probably isn't very p6-ish, but it does what I need.

Code tested on FreeBSD:

    use v6; 
    use experimental :pack; 

    my $fn = 'in.txt'; 
    my $outfile = 'out.txt'; 

    # write something with a windows line ending to a new file 

    my $fh = open $fn, :w; 
    $fh.print("ab\r\ndef\r\n"); 
    $fh.close; 

    # re-open the file 

    $fh = open $fn, :bin; 

    my $eol_found = False; 
    my Str $recsep = ''; 

    # read one byte at a time, or else we'd have to slurp the whole 
    # file, as I can't find a way to differentiate EOL from EOF 

    while $fh.read(1) -> $buf { 
        my $hex = $buf.unpack("H*"); 
        if $hex ~~ /(0d|0a)/ { 
            $eol_found = True; 
            $recsep = $recsep ~ $hex; 
            next; 
        } 
        if $eol_found { 
            if $hex !~~ /(0d|0a)/ { 
                last; 
            } 
        } 
    } 

    $fh.close; 

    # map the hex string collected above back to the actual separator 
    my %recseps = (
        '0d0a' => "\r\n", 
        '0d' => "\r", 
        '0a' => "\n", 
    ); 

    my $nl = %recseps{$recsep}; 

    # write a new file with the saved record separator 

    $fh = open $outfile, :w; 
    $fh.print('a' ~ $nl); 
    $fh.close; 

    # re-read file to see if our newline stuck 

    $fh = open $outfile, :bin; 

    my $buf = $fh.read(1000); 
    say $buf;

Output:

Buf[uint8]:0x<61 0d 0a>
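As an alternative to the hex-string approach above, the same detection can be sketched by comparing raw byte values directly. This is only an illustration under stated assumptions, not the poster's method: detect-eol is a hypothetical helper (not a Perl 6 built-in), eol-demo.txt is an assumed scratch file name, and the 4096-byte chunk size is arbitrary. It stops at the first terminator, so a blank line following it is not mistaken for part of the separator.

```perl6
# Hypothetical helper: return the first line terminator found in a chunk
# of raw bytes, or the empty string if the chunk contains none.
sub detect-eol(Blob $bytes --> Str) {
    for ^$bytes.elems -> $i {
        if $bytes[$i] == 0x0A {
            return "\n";
        }
        if $bytes[$i] == 0x0D {
            # CR followed by LF is a Windows ending; a lone CR stands alone
            my $is-crlf = $i + 1 < $bytes.elems && $bytes[$i + 1] == 0x0A;
            return $is-crlf ?? "\r\n" !! "\r";
        }
    }
    return '';
}

my $fn = 'eol-demo.txt';                     # assumed scratch file
# bytes of "ab\r\ndef\r\n", written as raw bytes
spurt $fn, Blob.new(0x61, 0x62, 0x0D, 0x0A, 0x64, 0x65, 0x66, 0x0D, 0x0A);

my $fh = open $fn, :bin;
my $eol = detect-eol($fh.read(4096));        # inspect only the first chunk
$fh.close;

put $eol.ords>>.fmt("%02X");                 # 0D 0A
unlink $fn;
```

One limitation of this sketch: if a CR happens to be the very last byte of the chunk, the following LF is not seen and the ending is reported as a lone CR, so a real implementation would need to read one more byte in that case.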

2016-06-01 15:26:46 stevieb

I'll go back over the intro/documentation, and then test again with my new knowledge once I'm more proficient. – stevieb

Is [newline.t](https://github.com/perl6/roast/S16-io/newline.t) from [roast](https://github.com/perl6/roast/blob/master/README) helpful? – raiph

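Absolutely! Thank you very much. In fact, I hadn't even thought of looking at the test files for examples, but the whole suite will be a great learning tool. – stevieb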
