2014-02-28 59 views
0

How do I convert my text file to CSV?

I have a text file that looks like this:

a1: sample1 
b1: sample2 
c1: sample3 
d1: sample4 
    sample5 
    sample0 

a1: sample_1 
b1: sample_2 
c1: sample_3 
d1: sample_4 
    sample_5 

a1: sample_11 
b1: sample_22 
c1: sample_33 
d1: sample_44 

I need to convert it to a CSV that I can open in Excel. The final output should look like this:

a1, b1, c1, d1 
sample1,sample2,sample3,"sample4 sample5" 
sample_1,sample_2,sample_3,"sample_4 sample_5" 
sample_11,sample_22,sample_33,"sample_44 sample_55" 

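sample4, sample5 and sample0 all belong to d1, i.e. they go in a single row. So basically d1 is one cell holding three values:

a1        b1        c1        d1         row0

sample1   sample2   sample3   sample4    row1
                              sample5    row1
                              sample0    row1

sample_1  sample_2  sample_3  sample_4   row2
                              sample_5   row2

In row2, d1 is a single cell with 2 values.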

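I am able to parse the text file and get the values I need, but I cannot get column d1 the way I need it. How can I do that?

Do I need a Perl script to do this? Any suggestions?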

open(my $file, '<', 'f1.txt') or die "Cannot open f1.txt: $!";
open(my $csv, '>', 'f2.csv') or die "Cannot open f2.csv: $!";
while (my $line = <$file>) {
    chomp $line;
    if ($line =~ /a1/)
    {
        my @arr1 = split(/:/, $line);
        print $csv "$arr1[1],";
    }

    if ($line =~ /b1/)
    {
        my @arr2 = split(/:/, $line);
        print $csv "$arr2[1],";
    }
}
close($file);
close($csv);

This is the code I have so far.

+2

Welcome to SO! Please show us what you have done so far, and add more details to your question! –

+0

For CSV format, you don't want the spaces between 'a1,', 'b1,', 'c1,' and 'd1'. You also don't need quotes around fields that contain spaces. – ThisSuitIsBlackNot

+0

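Editing "need a Perl script" into your question doesn't add anything (it is already tagged [tag:perl]). Uli Köhler asked what specific problem you ran into with the Perl code you have written. You did write *something*, didn't you? – ThisSuitIsBlackNot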

Answers

0

Suppose you have the file's contents in a scalar, like this:

my $input = "a1: sample1 
b1: sample2 
c1: sample3 
d1: sample4, sample5 

a1: sample_1 
b1: sample_2 
c1: sample_3 
d1: sample_4, sample_5 

a1: sample_11 
b1: sample_22 
c1: sample_33 
d1: sample_44, sample_55"; 

Then you can use a few regular expressions (as long as the input matches the description in your question):

## take four consecutive non-empty lines at a time, plus any blank lines after them
$input =~ s/([^\n]+)\n([^\n]+)\n([^\n]+)\n([^\n]+)\n*/"$1","$2","$3","$4"\n/g;

## remove the "a1: " style prefixes
$input =~ s/[a-z]\d+:\s*//ig;

## remove commas inside a field (those not next to a double quote)
$input =~ s/(?<!"),(?!")//g;

## finally output!
print '"a1","b1","c1","d1"' . "\n$input";
0

Perhaps the following will help:

use strict; 
use warnings; 

local ($/, $") = ('', ','); 
print "a1,b1,c1,d1\n"; 

while (<>) { 
    my @fields = map { /:\s+(.+)/; $1 } split /\n/; 
    print qq/@fields[ 0 .. 2 ],"$fields[3]"\n/; 
} 

Command-line usage: perl script.pl inFile > outFile

Output on your data set:

a1,b1,c1,d1 
sample1,sample2,sample3,"sample4, sample5" 
sample_1,sample_2,sample_3,"sample_4, sample_5" 
sample_11,sample_22,sample_33,"sample_44, sample_55" 

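The script sets $/ = '' (paragraph mode), so it reads your file one blank-line-separated chunk at a time. It splits each chunk on newlines, then uses a regex to capture the wanted field text. Double quotes are placed around the last field, and the interpolated array slice is printed with commas between its elements because of the earlier $" = ','.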
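
The script above expects every d1 value to sit on the d1 line itself. For the input exactly as posted in the question, where the extra d1 values are on indented lines of their own, the same record-at-a-time idea can be extended by folding each continuation line into the preceding field. This is only a minimal sketch: the sub name records_to_csv and the sample text are made up for illustration, and it assumes that every record has the same keys in the same order and that values contain no double quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: turns blank-line-separated "key: value" text into
# CSV text. Indented lines without a "key:" prefix are treated as
# continuations of the previous field (an assumption based on the question).
sub records_to_csv {
    my ($text) = @_;
    my (@header, @rows);

    # Split into records on blank lines (the same idea as $/ = '').
    for my $record (split /\n\s*\n/, $text) {
        my (@keys, @values);
        for my $line (split /\n/, $record) {
            if ($line =~ /^(\w+):\s*(.*?)\s*$/) {
                push @keys,   $1;
                push @values, $2;
            }
            elsif (@values and $line =~ /^\s+(\S.*?)\s*$/) {
                $values[-1] .= " $1";    # fold continuation into last field
            }
        }
        next unless @values;
        @header = @keys unless @header;  # header comes from the first record
        # Quote every field so embedded spaces stay inside one cell.
        push @rows, join ',', map qq{"$_"}, @values;
    }
    return join "\n", join(',', @header), @rows, '';
}

my $sample = <<'END';
a1: sample1
b1: sample2
c1: sample3
d1: sample4
    sample5
    sample0

a1: sample_1
b1: sample_2
c1: sample_3
d1: sample_4
    sample_5
END

print records_to_csv($sample);
```

On that sample it prints:

a1,b1,c1,d1
"sample1","sample2","sample3","sample4 sample5 sample0"
"sample_1","sample_2","sample_3","sample_4 sample_5"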

0

Here is how this could be done:

use strict;
use warnings;

open(my $TXT, "<", 'inabcd.txt') or die "Could not open inabcd.txt: $!";
open(my $CSV, ">", "outabcd.csv") or die "Could not open outabcd.csv: $!";

my $rowcount = 0;
my %h = ();
my @header = ();    # keys in order of first appearance (hash order is not stable)
my $last_key;

while (my $line = <$TXT>) {
    chomp($line);
    next if $line =~ /^\s*$/;    # skip blank lines

    if ($line =~ /^(\w+):\s*(.*?)\s*$/) {
        my ($key, $data) = ($1, $2);
        if (exists $h{$key}) {
            # a repeated key means this value belongs to the next row
            $rowcount = $h{$key}->{'rowcount'} + 1;
        }
        else {
            push(@header, $key);
        }
        $h{$key}->{$rowcount} = $data;
        $h{$key}->{'rowcount'} = $rowcount;
        $last_key = $key;
    }
    elsif (defined $last_key) {
        # a line without "key:" continues the previous key's value
        $line =~ s/^\s+|\s+$//g;
        $h{$last_key}->{$rowcount} .= " $line";
    }
}

print $CSV join(',', @header) . "\n";

for my $r (0 .. $rowcount) {
    # quote every value so that spaces or commas stay inside one cell
    my @cells = map { my $v = $h{$_}->{$r} // ''; qq{"$v"} } @header;
    print $CSV join(',', @cells) . "\n";
}

close($TXT);
close($CSV);
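
When the fields are written out, values that contain commas or double quotes need some care. Under the usual CSV convention (RFC 4180), a field has to be quoted only if it contains a comma, a double quote, or a line break, and embedded double quotes are doubled. The following is a minimal sketch of that rule; the helper names csv_field and csv_line are hypothetical and not part of any answer here. For anything beyond a quick script, a dedicated module such as Text::CSV from CPAN is likely the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: quote a field only when it contains a comma, a
# double quote, or a line break, doubling any embedded double quotes.
sub csv_field {
    my ($value) = @_;
    return $value unless $value =~ /[",\r\n]/;
    (my $escaped = $value) =~ s/"/""/g;
    return qq{"$escaped"};
}

# Hypothetical helper: join already-parsed values into one CSV line.
sub csv_line {
    my (@values) = @_;
    return join(',', map { csv_field($_) } @values) . "\n";
}

print csv_line('sample1', 'sample2', 'sample3', 'sample4, sample5');
```

This prints sample1,sample2,sample3,"sample4, sample5". A value such as "sample4 sample5", which contains only a space, is left unquoted, and Excel still reads it as one cell.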