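I agree with Matt Jacob's answer: you should use Text::CSV to parse CSV unless you have a very good reason not to.
If you must handle it with regular expressions, I think you will do better with m// than with split. For example, the code below seems to cover most single-line CSV data variants, although it does not strip the quotes from quoted fields the way Text::CSV would; that needs a separate post-processing step.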
use strict;
use warnings;

# Split one CSV line into raw fields and print diagnostics.
# Quoted fields keep their surrounding and doubled quotes.
sub splitter
{
    my($row) = @_;
    my @fields;
    my $i = 0;
    # Each iteration captures one field in $1 and consumes its delimiter.
    while ($row =~ m/((?=,)|[^",][^,]*|"([^"]|"")*")(?:,|$)/g)
    {
        print "Found [$1]\n";
        $fields[$i++] = $1;
    }
    for (my $j = 0; $j < @fields; $j++)
    {
        print "$j = [$fields[$j]]\n";
    }
}

my $row;

$row = q'ACC000121,2290,"01009900,01009901,01009902,01009903,01009904",4,5,6';
print "Row 1: $row\n";
splitter($row);

$row = q'ACC000121,",",2290,"01009900,""aux data"",01009902,01009903,01009904",,5"abc",6,""';
print "Row 2: $row\n";
splitter($row);
Clearly, it contains a fair amount of diagnostic code. The output (from Perl 5.22.0 on Mac OS X 10.11.1) is:
Row 1: ACC000121,2290,"01009900,01009901,01009902,01009903,01009904",4,5,6
Found [ACC000121]
Found [2290]
Found ["01009900,01009901,01009902,01009903,01009904"]
Found [4]
Found [5]
Found [6]
0 = [ACC000121]
1 = [2290]
2 = ["01009900,01009901,01009902,01009903,01009904"]
3 = [4]
4 = [5]
5 = [6]
Row 2: ACC000121,",",2290,"01009900,""aux data"",01009902,01009903,01009904",,5"abc",6,""
Found [ACC000121]
Found [","]
Found [2290]
Found ["01009900,""aux data"",01009902,01009903,01009904"]
Found []
Found [5"abc"]
Found [6]
Found [""]
0 = [ACC000121]
1 = [","]
2 = [2290]
3 = ["01009900,""aux data"",01009902,01009903,01009904"]
4 = []
5 = [5"abc"]
6 = [6]
7 = [""]
In the Perl code, the match is:
m/((?=,)|[^",][^,]*|"([^"]|"")*")(?:,|$)/
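This looks for and captures (in $1) one of three things: an empty field followed by a comma; a character that is neither a double quote nor a comma, followed by zero or more non-commas; or a double quote, followed by zero or more occurrences of "a character that is not a double quote, or two consecutive double quotes", followed by another double quote. It then expects a comma or the end of the string.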
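To make the three alternatives easier to follow, here is a sketch of the same pattern written with the /x modifier so that each branch can carry a comment. The names $field_re and split_csv_line are made up for this illustration, and the inner group is made non-capturing, which does not change what ends up in $1.

```perl
use strict;
use warnings;

# Hypothetical name: the pattern from above, laid out with x-mode comments.
my $field_re = qr{
    (                          # capture group 1: the raw field text
        (?=,)                  #   empty field: nothing, with a comma next
      | [^",] [^,]*            #   unquoted: first char is neither " nor ,
      | " (?: [^"] | "" )* "   #   quoted: "" stands for an embedded quote
    )
    (?: , | $ )                # then a comma or the end of the string
}x;

# Hypothetical helper: return the raw fields of one CSV line.
sub split_csv_line {
    my ($row) = @_;
    my @fields;
    while ($row =~ /$field_re/g) {
        push @fields, $1;
    }
    return @fields;
}

my @fields = split_csv_line(q'ACC000121,",",2290,,5"abc",6,""');
print "[$_]\n" for @fields;
```

As with the original, quoted fields come back with their quotes still in place.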
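Handling multi-line fields takes a bit more work. Removing the quotes and un-escaping the doubled double quotes takes more work too.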
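The quote-removal step might look something like the following. This is only a sketch: unquote_field is a hypothetical helper, and it assumes it is given one raw field exactly as captured in $1 by the regex above.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one raw captured field to its logical value.
sub unquote_field {
    my ($field) = @_;
    # Only a field wrapped in double quotes needs any work.
    if ($field =~ /\A"(.*)"\z/s) {
        $field = $1;            # drop the surrounding quotes
        $field =~ s/""/"/g;     # a doubled quote stands for one literal quote
    }
    return $field;
}

print unquote_field(q{"01009900,""aux data"",01009902"}), "\n";
print unquote_field(q{5"abc"}), "\n";    # not wrapped in quotes, left alone
```

A field such as 5"abc" is left untouched because it does not start and end with a double quote.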
Using Text::CSV is simpler and less error-prone (and it can handle more variants than this).
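For comparison, a minimal Text::CSV sketch is shown here. It assumes the module is installed from CPAN (it is not part of core Perl), and the sample row is an illustrative shortened variant of Row 2 without the 5"abc" field.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 lets quoted fields contain arbitrary characters.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV->error_diag;

# Illustrative sample row, not taken verbatim from the question.
my $row = q'ACC000121,",",2290,"01009900,""aux data"",01009902",,6,""';

$csv->parse($row) or die "Parse failed: " . $csv->error_diag;
my @fields = $csv->fields;

# Quotes are already stripped and doubled quotes already collapsed.
printf "%d = [%s]\n", $_, $fields[$_] for 0 .. $#fields;
```

Here the module does the unquoting itself, so no post-processing step is needed.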
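It looks like 'split' is working as designed. Also, your first line isn't doing what you think it is doing. You have one empty string and 3 'undef's.
Is it possible to get the entire double-quoted string into one scalar variable? How can this be achieved? – user
I'm curious why you expect 6 fields in the output when the code explicitly handles only 4 fields.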