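Awk to get a .csv column containing commas and newlines

2016-08-02

I have data in a .csv column that sometimes contains commas and newlines. If my data contains a comma, I enclose the whole string in double quotes. How can I parse the output of that column into a .txt file, taking both the newlines and the commas into account?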

Sample data that does not work with my command:

,"This is some text with a , in it.", #data with commas are enclosed in double quotes 

,line 1 of data 
line 2 of data, #data with a couple of newlines 

,"Data that may a have , in it and 
also be on a newline as well.", 

This is what I have so far:

awk -F "\"*,\"*" '{print $4}' file.csv > column_output.txt 
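To see why this command cannot work, it helps to print every field it produces. The sketch below (the helper name `split_naively` is only for illustration) runs the same `-F` regex on one line of the sample data. The separator regex matches the comma inside the quotes just like any other comma, and since awk reads one line per record, a quoted field that spans two lines would also be split into two records.

```shell
#!/bin/sh
# Illustration only: split_naively is a hypothetical helper that prints
# every field the question's -F regex produces, one per line.
split_naively() {
  awk -F '"*,"*' '{ for (i = 1; i <= NF; i++) printf "$%d=<%s>\n", i, $i }'
}

printf '%s\n' ',"This is some text with a , in it.", #data with commas are enclosed in double quotes' | split_naively
# $1=<>
# $2=<This is some text with a >
# $3=< in it.>
# $4=< #data with commas are enclosed in double quotes>
```

The embedded comma ends $2 early, so no value of FS on its own can fix this: awk has to know where the quotes are, which is what GNU awk's FPAT in the answer provides.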

Can you have double quotes inside a double-quote-delimited field, and if so, how are they escaped: "foo \"bar" or "foo ""bar" or something else?

Answer

$ cat decsv.awk 
# a field is either a run of non-comma characters or a double-quoted string
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS="," } 
{ 
    # map every a to aA so that aB and aC cannot occur in the input,
    # then use aB and aC as placeholders for escaped quotes (\" and "")
    gsub(/a/,"aA") 
    gsub(/\\"/,"aB") 
    gsub(/""/,"aC") 

    # prepend previous incomplete record segment if any 
    $0 = prev $0 
    # gsub() returns the number of replacements, so this counts the unescaped quotes
    numq = gsub(/"/,"&") 
    if (numq % 2) { 
     # this is inside double quotes so incomplete record 
     prev = $0 RT 
     next 
    } 
    prev = "" 

    for (i=1;i<=NF;i++) { 
     # map the replacement strings back to their original values 
     gsub(/aC/,"\"\"",$i) 
     gsub(/aB/,"\\\"",$i) 
     gsub(/aA/,"a",$i) 
    } 

    printf "Record %d:\n", ++recNr 
    for (i=0;i<=NF;i++) { 
     printf "\t$%d=<%s>\n", i, $i 
    } 
    print "#######" 
}

$ awk -f decsv.awk file 
Record 1: 
     $0=<,"This is some text with a , in it.", #data with commas are enclosed in double quotes> 
     $1=<> 
     $2=<"This is some text with a , in it."> 
     $3=< #data with commas are enclosed in double quotes> 
####### 
Record 2: 
     $0=<,"line 1 of data 
line 2 of data", #data with a couple of newlines> 
     $1=<> 
     $2=<"line 1 of data 
line 2 of data"> 
     $3=< #data with a couple of newlines> 
####### 
Record 3: 
     $0=<,"Data that may a have , in it and 
also be on a newline as well.",> 
     $1=<> 
     $2=<"Data that may a have , in it and 
also be on a newline as well."> 
     $3=<> 
####### 
Record 4: 
     $0=<,"Data that \"may\" a have ""quote"" in it and 
also be on a newline as well.",> 
     $1=<> 
     $2=<"Data that \"may\" a have ""quote"" in it and 
also be on a newline as well."> 
     $3=<> 
####### 

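The above uses GNU awk for FPAT and RT. I don't know of any CSV format that allows a newline in the middle of a field that isn't enclosed in quotes (if there were one, you could never tell where any record ends), so the script doesn't allow for that. The above was run on this input file: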

$ cat file 
,"This is some text with a , in it.", #data with commas are enclosed in double quotes 
,"line 1 of data 
line 2 of data", #data with a couple of newlines 
,"Data that may a have , in it and 
also be on a newline as well.", 
,"Data that \"may\" a have ""quote"" in it and 
also be on a newline as well.", 