$ cat decsv.awk
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS="," }
{
# create strings that cannot exist in the input to map escaped quotes to
gsub(/a/,"aA")
gsub(/\\"/,"aB")
gsub(/""/,"aC")
# prepend previous incomplete record segment if any
$0 = prev $0
numq = gsub(/"/,"&")
if (numq % 2) {
# this is inside double quotes so incomplete record
prev = $0 RT
next
}
prev = ""
for (i=1;i<=NF;i++) {
# map the replacement strings back to their original values
gsub(/aC/,"\"\"",$i)
gsub(/aB/,"\\\"",$i)
gsub(/aA/,"a",$i)
}
printf "Record %d:\n", ++recNr
for (i=0;i<=NF;i++) {
printf "\t$%d=<%s>\n", i, $i
}
print "#######"
。
$ awk -f decsv.awk file
Record 1:
$0=<,"This is some text with a , in it.", #data with commas are enclosed in double quotes>
$1=<>
$2=<"This is some text with a , in it.">
$3=< #data with commas are enclosed in double quotes>
#######
Record 2:
$0=<,"line 1 of data
line 2 of data", #data with a couple of newlines>
$1=<>
$2=<"line 1 of data
line 2 of data">
$3=< #data with a couple of newlines>
#######
Record 3:
$0=<,"Data that may a have , in it and
also be on a newline as well.",>
$1=<>
$2=<"Data that may a have , in it and
also be on a newline as well.">
$3=<>
#######
Record 4:
$0=<,"Data that \"may\" a have ""quote"" in it and
also be on a newline as well.",>
$1=<>
$2=<"Data that \"may\" a have ""quote"" in it and
also be on a newline as well.">
$3=<>
#######
以上使用GNU awk FPAT和RT。我不知道有哪種CSV格式可以讓你在沒有用引號括起來的字段中間有一個換行符(如果是的話,你永遠不會知道任何記錄結束的地方),所以腳本不允許那。以上是在此輸入文件上運行的:
$ cat file
,"This is some text with a , in it.", #data with commas are enclosed in double quotes
,"line 1 of data
line 2 of data", #data with a couple of newlines
,"Data that may a have , in it and
also be on a newline as well.",
,"Data that \"may\" a have ""quote"" in it and
also be on a newline as well.",
您是否可以在雙引號分隔字段中使用雙引號,如果是這樣,它們是如何轉義的? '「foo \」bar「或'」foo「」bar「'或其他什麼? –