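Need a script to strip extra linefeed characters from a text file

I'm running Perl on Windows and have some text files whose lines end in CRLF (0d0a). The problem is that these files occasionally have stray 0a characters sprinkled through them, which split the lines in Windows Perl and break my processing. My idea was to preprocess the file, reading lines split on CRLF, but at least on Windows, Perl insists on splitting on LF.

I tried setting $/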

local $/ = 0x0d; 
open(my $fh, "<", $file) or die "Unable to open $file"; 
while (my $line = <$fh>) { 
    # do something to get rid of the 0x0a embedded in the line of text; 
}

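...but this reads multiple lines at once; it seems to miss the 0x0d completely. I also tried setting it to "\n", "\n\r", "\r" and "\r\n". There has to be a simple way to do this!

I need to get rid of these so that I can process the file properly. So I need a script that opens the file, splits it on CRLF, finds any 0a that is not preceded by 0d, removes it, and saves the result line by line to a new file.

Thanks for any help you can provide.

Source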

2017-02-19 Gregg Seipp

Would a regex like this: 'qr/([\n\x{0B}\f\r\x{85}]{1,2})/;' get rid of that stuff? Maybe [File::Edit::Portable](https://metacpan.org/release/STEVEB/File-Edit-Portable-1.24) – stevieb

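For starters, local $/ = 0x0d; should be local $/ = "\x0d";. The literal 0x0d is the number 13, which stringifies to "13", so the record separator was the two characters "13" rather than CR.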
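The difference between the two literals can be checked directly. This is a minimal sketch; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $as_number = 0x0d;      # numeric literal: the integer 13
my $as_string = "\x0d";    # string literal: a single CR character

# Used as a string (as $/ is), the number becomes the two characters "13".
printf "0x0d as a string: '%s' (length %d)\n", $as_number, length($as_number);

# The escaped form is one character whose code point is 13.
printf "\"\\x0d\": length %d, code point %d\n", length($as_string), ord($as_string);
```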

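Beyond that, the problem is that a :crlf layer is added to file handles by default on Windows. This causes CRLF to be converted to LF on read (and LF to CRLF on write). As a result, there is no CR in what you read, so the separator never matches and you end up reading the entire file.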
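The effect of the layer can be reproduced on any platform by requesting :crlf explicitly. The sketch below is only an illustration: the sample data, the temporary file, and the helper name slurp_with_layer are assumptions, not part of the original answer.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical sample: two CRLF-terminated records, the first holding a stray LF.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);    # write the bytes exactly as given
print {$tmp_fh} "ab\x0Acd\x0D\x0Aef\x0D\x0A";
close($tmp_fh) or die("Can't close \"$tmp_name\": $!\n");

# Read the whole file through the given PerlIO layer and return it as one string.
sub slurp_with_layer {
    my ($layer, $name) = @_;
    open(my $fh, "<$layer", $name)
        or die("Can't open \"$name\": $!\n");
    local $/;    # undef: read everything at once
    my $content = <$fh>;
    close($fh);
    return $content;
}

my $via_crlf = slurp_with_layer(':crlf', $tmp_name);
my $via_raw  = slurp_with_layer(':raw',  $tmp_name);

printf "via :crlf contains CR: %s\n", ($via_crlf =~ /\x0D/ ? 'yes' : 'no');
printf "via :raw  contains CR: %s\n", ($via_raw  =~ /\x0D/ ? 'yes' : 'no');
```

Through :crlf the CR characters are gone, so the stray LF can no longer be told apart from a real line ending; through :raw both remain visible.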

Simply removing/disabling :crlf will do the trick.

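use feature qw( say );

# Records end at CRLF, so a bare LF no longer ends a line.
local $/ = "\x0D\x0A";
# :raw disables the :crlf layer, so CR reaches the program.
open(my $fh, "<:raw", $file)
    or die("Can't open \"$file\": $!\n");

while (<$fh>) {
    chomp;        # remove the trailing CRLF
    s/\x0A//g;    # remove any stray LF left inside the record
    say;          # print the cleaned line followed by a newline
}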
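Since the question asks for the result to be saved to a new file, the same idea can be wrapped in a small helper. This is a sketch under assumptions: the name strip_stray_lf and its two file-name arguments are hypothetical, and the output is written with CRLF line endings to match the input.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_name to $out_name, dropping every LF that is
# not part of a CRLF pair. Both handles are :raw, so no newline translation occurs.
sub strip_stray_lf {
    my ($in_name, $out_name) = @_;

    open(my $in, "<:raw", $in_name)
        or die("Can't open \"$in_name\": $!\n");
    open(my $out, ">:raw", $out_name)
        or die("Can't create \"$out_name\": $!\n");

    local $/ = "\x0D\x0A";    # a record ends at CRLF, not at a bare LF
    while (my $line = <$in>) {
        my $had_crlf = chomp($line);    # characters removed: 2, or 0 at EOF without CRLF
        $line =~ s/\x0A//g;             # drop stray LFs inside the record
        print {$out} $line, ($had_crlf ? "\x0D\x0A" : ());
    }

    close($out) or die("Can't close \"$out_name\": $!\n");
    close($in);
    return;
}
```

It would be called as strip_stray_lf($file, "$file.clean"), where the output name is likewise only an example.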

Source

2017-02-20 18:56:18 ikegami

That's better. Shorter, and more to the point. Thanks. –

This solution works by reading the data in binary mode.

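open(my $INFILE, "<:raw", $infile)
    or die("Can't open \"$infile\": $!\n");
open(my $OUTFILE, ">:raw", $outfile)
    or die("Can't create \"$outfile\": $!\n");

my $buffer = '';
# Append each chunk after whatever was held back from the previous one.
while (sysread($INFILE, $buffer, 4*1024*1024, length($buffer))) {
    # Remove every LF that is not preceded by a CR.
    $buffer =~ s/(?<!\x0D)\x0A//g;

    # Hold back a trailing CR in case the read cut between a CR and a LF.
    my $keep = $buffer =~ /\x0D\z/ ? 1 : 0;
    print $OUTFILE substr($buffer, 0, length($buffer) - $keep, '');
}

# Flush a CR left over at end of file.
print $OUTFILE $buffer;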
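The substitution at the heart of that loop can be tried on an in-memory string. This is a minimal sketch; the sample data and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: stray LFs mixed in with two genuine CRLF line endings.
my $sample = "one\x0Atwo\x0D\x0Athree\x0A\x0Afour\x0D\x0A";

# (?<!\x0D) is a negative lookbehind: match an LF only when the character
# before it is not a CR, so CRLF pairs are left alone.
(my $cleaned = $sample) =~ s/(?<!\x0D)\x0A//g;

# Make the control characters visible before printing.
(my $visible = $cleaned) =~ s/\x0D/\\r/g;
$visible =~ s/\x0A/\\n/g;
print "$visible\n";
```

The lookbehind consumes no characters, so the CR in each CRLF pair is kept and only the LF is considered for removal.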

Source

2017-02-19 13:56:16

(Feel free to revert. I just thought you'd appreciate the cleanup.) – ikegami
