2013-06-12

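Parse only the date from one line of a text file

I have a text file that uses leading spaces at the start of each line as delimiters.

Lines with no leading spaces should go in the first column of a CSV file; lines with two leading spaces should go in the second column; lines with four leading spaces should go in the third column.

All of this already works as required.

For lines that start with two spaces, I want only the date to go in the second column, and the rest of the data on that line to be discarded. Everything else should stay as it is.

For clarity, I have marked the leading spaces on each line with #.

Text file: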

Component1 
##(111) Amar Sen <[email protected]> <No comment> 2013/04/01 
####/Com/src/folder1/folder2/newfile.txt 
##(1199) Prashant Singh <[email protected]> <No comment> 2013/04/24 
####/Com/src/folder1/folder2/testfile24 
####/Com/src/folder1/folder2/testfile25 
####/Com/src/folder1/folder2/testfile26 
##(1204) Anthony Li <[email protected]> <No comment> 2013/04/25 
####/Com/src2 
Component2(added) 
Component3 

Output format:

Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt 
      2013/04/24,/Com/src/folder1/folder2/testfile24 
        /Com/src/folder1/folder2/testfile25 
         /Com/src/folder1/folder2/testfile26 
      2013/04/25,/Com/src2 
Component2(added) 
Component3 

The code is below. It works fine apart from the change described above.

use strict; 
use warnings; 

my $previous_count   = "-1"; #beginning, we will think, that no spaces. 
my $current_count    = "0"; #current default value 
my $maximum_count    = 3; 
my $to_written    = ""; 
my $delimiter_between_columns = ","; 
my $newline_separator   = ";"; 

my $file = 'C:\\textfile.txt'; 
open (my $fh, '<:encoding(UTF-8)', $file) or die "Could not open file '$file' $!"; 

while (my $row = <$fh>) { 

    # ok, read. 
    chomp($row); 

    # print "row is : $row\n"; 
    if ($row =~ m/^(\s*)/) { 

    #print length($1); 
    $current_count = length($1)/2; #take number of spaces divided by 2 
    $row =~ s/^\s+//; 

    if ($previous_count >= $current_count || $previous_count == $maximum_count) { 

     #output here 
     print "$to_written" . $newline_separator . "\n"; 

     $previous_count = 0; 
     $to_written  = ""; 
    } 
    $previous_count = 0 if ($previous_count == -1); 
    $to_written .= $delimiter_between_columns x ($current_count - $previous_count) . "$row"; 

    $previous_count = $current_count; 

    #print"\n"; 
    } 
} 

print "$to_written" . $newline_separator . "\n"; 

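The "output format" you posted does not match what you describe wanting. Because CSV fields are separated by commas, any line that contains no comma puts everything in the first column. – doubleDown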
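As a small illustration of the point about commas in the comment above, here is a minimal Perl sketch. The field values are taken from the sample data in the question; the variable names are only illustrative. It shows that an empty leading field still needs its comma as a placeholder if the date is to stay in the second column.

```perl
use strict;
use warnings;

# An empty string in the first position still produces a comma,
# so the date stays in column two and the path in column three.
my @fields = ('', '2013/04/24', '/Com/src/folder1/folder2/testfile24');
my $line   = join(',', @fields);
print "$line\n";

# Splitting the line again gives three fields, the first one empty.
my @back = split /,/, $line, -1;
print scalar(@back), " fields\n";
```

Without the leading comma, a CSV reader would treat the date as the first column.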

Answer


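You seem to have tied yourself in knots with your own solution.

This program seems to do what you need. I have added some commas to your "output format", because your example has no placeholders for the initial empty fields.

I have kept the hash characters here. Changing them to spaces is simple: use s/^(\s*)// in place of s/^(#*)//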

use strict; 
use warnings; 

my @row; 

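while (<DATA>) {

    chomp;

    # Strip the leading markers; two markers make one column level
    s/^(#*)//;
    my $i = length($1)/2;

    # In the second column keep only the date, otherwise keep the whole line
    if ($i == 1 and m<(\d{4}/\d{2}/\d{2})>) {
        $row[$i] = $1;
    }
    else {
        $row[$i] = $_;
    }

    # A third-column value completes a row: print it and reset the fields
    if ($i == 2) {
        print join(',', @row), ";\n";
        @row = ('') x 3;
    }
}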


__DATA__ 
Component1 
##(111) Amar Sen <[email protected]> <No comment> 2013/04/01 
####/Com/src/folder1/folder2/newfile.txt 
##(1199) Prashant Singh <[email protected]> <No comment> 2013/04/24 
####/Com/src/folder1/folder2/testfile24 
####/Com/src/folder1/folder2/testfile25 
####/Com/src/folder1/folder2/testfile26 
##(1204) Anthony Li <[email protected]> <No comment> 2013/04/25 
####/Com/src2 

Output

Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt; 
,2013/04/24,/Com/src/folder1/folder2/testfile24; 
,,/Com/src/folder1/folder2/testfile25; 
,,/Com/src/folder1/folder2/testfile26; 
,2013/04/25,/Com/src2; 

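Update

It makes more sense to carry the values of columns one and two down to subsequent rows that do not supply them. If you remove the line @row = ('') x 3 from my program it will do that, giving this output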

Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt; 
Component1,2013/04/24,/Com/src/folder1/folder2/testfile24; 
Component1,2013/04/24,/Com/src/folder1/folder2/testfile25; 
Component1,2013/04/24,/Com/src/folder1/folder2/testfile26; 
Component1,2013/04/25,/Com/src2; 

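Thank you for the reply. It works the way I need, except for one case where it fails. That is my mistake; I should have given a fuller sample. If I have Component2 and Component3 (with no leading spaces), the code should output Component2 and Component3 whether or not they have data, but it only outputs rows that have data. I have updated the text file and the output format in my question, please take a look. –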
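One possible way to cover the case raised in the comment above is to emit a pending row whenever a new line arrives at the same or a shallower indentation level, and once more at end of input. The sketch below is a variant built on several assumptions, not the answerer's code: rows_from_lines is a hypothetical helper, it expects real spaces (two per level) rather than # marks, it treats any deeper indentation as the third column, and it drops trailing empty fields so that a bare component prints as Component2(added);

```perl
use strict;
use warnings;

# Hypothetical helper: turn space-indented lines into CSV-style rows.
sub rows_from_lines {
    my @lines = @_;
    my @out;
    my @row     = ('') x 3;
    my $pending = 0;     # true while @row holds fields not yet emitted
    my $last    = -1;    # deepest column filled in the pending row

    # Emit the pending row without trailing empty fields, then reset.
    my $flush = sub {
        my @fields = @row;
        pop @fields while @fields > 1 and $fields[-1] eq '';
        push @out, join(',', @fields) . ';';
        @row     = ('') x 3;
        $pending = 0;
        $last    = -1;
    };

    for my $line (@lines) {
        (my $text = $line) =~ s/\s+\z//;    # drop newline and trailing blanks
        next unless length $text;           # skip empty lines

        $text =~ s/^( *)//;                 # always matches, so $1 is set
        my $i = int(length($1) / 2);
        $i = 2 if $i > 2;                   # assumption: deeper means column three

        # A line at the same or a shallower level starts a new row.
        $flush->() if $pending and $i <= $last;

        # In the second column keep only the date.
        $text = $1 if $i == 1 and $text =~ m<(\d{4}/\d{2}/\d{2})>;

        $row[$i] = $text;
        $pending = 1;
        $last    = $i;

        $flush->() if $i == 2;              # a third-column value completes a row
    }

    $flush->() if $pending;                 # emit whatever is left at end of input
    return @out;
}

my @rows = rows_from_lines(
    "Component1 ",
    "  (111) Amar Sen <No comment> 2013/04/01 ",
    "    /Com/src/folder1/folder2/newfile.txt ",
    "    /Com/src/folder1/folder2/testfile24 ",
    "Component2(added) ",
    "Component3 ",
);
print "$_\n" for @rows;
```

To read from a file instead, the lines of an opened filehandle can be passed in, for example rows_from_lines(<$fh>). Restoring the cascading behaviour from the update would mean resetting only the columns to the right of the one just filled, rather than the whole row.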