You can try it with perl and its Text::CSV_XS module:
#!/usr/bin/env perl
use warnings;
use strict;
use Text::CSV_XS;

my (@columns);
open my $fh, '<', shift or die "Cannot open input file: $!\n";
my $csv = Text::CSV_XS->new or die;
while (my $row = $csv->getline($fh)) {
    undef @columns;
    # Rows with the expected 12 fields (or fewer) are printed unchanged.
    if (@$row <= 12) {
        @columns = @$row;
        next;
    }
    # Each stray comma shows up once in each of the two repeated text columns.
    my $extra_columns = (@$row - 12) / 2;
    # Index of the first field after both text columns.
    my $post_columns_index = 4 + 2 * ($extra_columns + 1);
    @columns = (
        @$row[0..3],
        (join('', @$row[4..(4+$extra_columns)])) x 2,
        @$row[$post_columns_index..$#$row]
    );
}
continue {
    # Runs for every row, including those that hit `next` above.
    $csv->print(\*STDOUT, \@columns);
    print "\n";
}
Assuming an input file (infile) with three lines, where the first has one extra comma in each of the two text columns, the second has two extra commas in each, and the third is correct:
2011,123456,1234567,12345678,Hey There,How are you,Hey There,How are you,882864309037,ABC ABCD,LABACD,1.00000000,80.2500000,One Two
2011,123456,1234567,12345678,Hey There,How are you,now,Hey There,How are you,now,882864309037,ABC ABCD,LABACD,1.00000000,80.2500000,One Two
2011,123456,1234567,12345678,Hey There:How are you,Hey There:How are you,882864309037,ABC ABCD,LABACD,1.00000000,80.2500000,One Two
Run the script like this:
perl script.pl infile
That yields:
2011,123456,1234567,12345678,"Hey ThereHow are you","Hey ThereHow are you",882864309037,"ABC ABCD",LABACD,1.00000000,80.2500000,"One Two"
2011,123456,1234567,12345678,"Hey ThereHow are younow","Hey ThereHow are younow",882864309037,"ABC ABCD",LABACD,1.00000000,80.2500000,"One Two"
2011,123456,1234567,12345678,"Hey There:How are you","Hey There:How are you",882864309037,"ABC ABCD",LABACD,1.00000000,80.2500000,"One Two"
Note that it adds some quotes, but the output is correct according to the CSV specification and is easier to handle than the previous state.
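The index arithmetic for rows with surplus fields can be checked in isolation, without Text::CSV_XS. The following is a minimal sketch, assuming 12 expected columns, two repeated text columns starting at index 4, and the same number of stray commas in each of them. `merge_extra_columns` is a hypothetical helper name, and the plain `split` is used only because the sample line contains no quoted fields.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: collapse the surplus fields of one already-parsed row.
# Assumption: 12 columns are expected and both text columns were split
# by the same number of stray commas.
sub merge_extra_columns {
    my @row = @_;
    return @row if @row <= 12;            # nothing to repair

    my $extra = (@row - 12) / 2;          # stray commas per text column
    my $post  = 4 + 2 * ($extra + 1);     # first index after both text columns

    return (
        @row[0 .. 3],
        (join '', @row[4 .. 4 + $extra]) x 2,   # rebuilt text column, twice
        @row[$post .. $#row],
    );
}

my $line = '2011,123456,1234567,12345678,Hey There,How are you,now,'
         . 'Hey There,How are you,now,882864309037,ABC ABCD,LABACD,'
         . '1.00000000,80.2500000,One Two';

my @fixed = merge_extra_columns(split /,/, $line);
print scalar(@fixed), " fields\n";
print join('|', @fixed), "\n";
```

For the 16-field sample line this gives `$extra = 2` and `$post = 10`, so the helper returns 12 fields and the field following the two rebuilt text columns is `882864309037`.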
Do the 4th and 7th columns always contain numbers? –
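If possible, it would be best to have the CSV file re-requested or regenerated properly, with encapsulation (quoting) on the columns that contain commas. For example '2011,123456,1234567,12345678,"Hey There,How are you","Hey There,How are you",882864309037,ABC ABCD,LABACD,1.00000000,80.2500000,One Two' – AeroX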
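To illustrate what such encapsulation looks like on the generating side, here is a minimal sketch in plain Perl. It assumes RFC 4180 style quoting (a field is wrapped in double quotes when it contains a comma, a double quote, or a line break, and embedded double quotes are doubled). `quote_field` and `csv_line` are hypothetical helper names; a real exporter would more likely rely on a CSV module than on hand-written quoting.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: quote a single field only when it needs it.
sub quote_field {
    my ($field) = @_;
    if ($field =~ /[",\r\n]/) {
        (my $escaped = $field) =~ s/"/""/g;   # double any embedded quotes
        return qq{"$escaped"};
    }
    return $field;
}

# Hypothetical helper: build one CSV line from a list of fields.
sub csv_line {
    return join ',', map { quote_field($_) } @_;
}

my @record = (
    '2011', '123456', '1234567', '12345678',
    'Hey There,How are you', 'Hey There,How are you',
    '882864309037', 'ABC ABCD', 'LABACD',
    '1.00000000', '80.2500000', 'One Two',
);

print csv_line(@record), "\n";
```

With the sample record, only the two text columns contain a comma, so only those two fields are wrapped in quotes and a CSV parser reads the line back as 12 fields.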