2016-03-08

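I have a comma-separated file with 6 fields. Some field values contain a comma, and those values are enclosed in double quotes. I need to replace each comma inside a quoted value with a hyphen.
How to parse a comma-separated file with multi-valued fields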

The input is:

03/03/2016,Customer Service,CHAT,"Responded, closed",True,59 
02/24/2016,Customer Service,CALL,Responded,True,55 
03/03/2016,Customer Service,CHAT,"Responded, awaiting reply",False,46 
02/24/2016,Customer Service,CALL,Responded,False,51 
02/24/2016,Customer Service,CHAT,Responded,False,31 

And the expected output is:

03/03/2016,Customer Service,CHAT,"Responded- closed",True,59 
02/24/2016,Customer Service,CALL,Responded,True,55 
03/03/2016,Customer Service,CHAT,"Responded- awaiting reply",False,46 
02/24/2016,Customer Service,CALL,Responded,False,51 
02/24/2016,Customer Service,CHAT,Responded,False,31 
+0

Use a language with a proper CSV parser. Most shell tools work with regular expressions, which are not sufficient for recognizing valid CSV. – chepner

Answers

2

With GNU awk, using FPAT, you can do this:

awk -v FPAT='"[^"]+"|[^,]+' -v OFS=, '{for(i=1; i<=NF; i++) gsub(/,/, "-", $i)} 1' file.csv 
03/03/2016,Customer Service,CHAT,"Responded- closed",True,59 
02/24/2016,Customer Service,CALL,Responded,True,55 
03/03/2016,Customer Service,CHAT,"Responded- awaiting reply",False,46 
02/24/2016,Customer Service,CALL,Responded,False,51 
02/24/2016,Customer Service,CHAT,Responded,False,31 

Using sed, you can do this:

sed -E ':a; s/("[^,"]+),([^"]*")/\1-\2/g; ta;' file.csv 
03/03/2016,Customer Service,CHAT,"Responded- closed",True,59 
02/24/2016,Customer Service,CALL,Responded,True,55 
03/03/2016,Customer Service,CHAT,"Responded- awaiting reply",False,46 
02/24/2016,Customer Service,CALL,Responded,False,51 
02/24/2016,Customer Service,CHAT,Responded,False,31 
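
The sed one-liner above relies on `;` terminating the `:a` label. GNU sed accepts that, but POSIX does not guarantee it, and some sed implementations may read the label up to the end of the line. The following is a minimal sketch of the same loop split into separate `-e` expressions, assuming a sed that supports `-E`. The function name `dash_quoted_commas` is made up for illustration.

```shell
# Hypothetical helper: replace commas inside double-quoted fields with hyphens.
# Reads CSV text on stdin and writes the result to stdout.
dash_quoted_commas() {
    # ':a' defines a label, the 's' command replaces one comma per quoted
    # field on each pass, and 'ta' jumps back to ':a' while a substitution
    # was made, so fields with several commas are handled too.
    sed -E -e ':a' -e 's/("[^,"]+),([^"]*")/\1-\2/g' -e 'ta'
}

printf '%s\n' '03/03/2016,Customer Service,CHAT,"Responded, closed",True,59' | dash_quoted_commas
# prints: 03/03/2016,Customer Service,CHAT,"Responded- closed",True,59
```

To process a file, redirect it into the function, for example `dash_quoted_commas < file.csv`.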
+1

++ for the nice awk solution – MaxU

+0

Since I am not using gnu-awk, the command returns the result below: $ awk -v FPAT='"[^"]+"|[^,]+' -v OFS=, '{for(i=1; i<=NF; i++) gsub(/,/, "-", $i)} 1' file.csv 03/03/2016-Customer,Service-CHAT-"Responded-,closed"-True-59 02/24/2016-Customer,Service-CALL-Responded-True-55 03/03/2016-Customer,Service-CHAT-"Responded-,awaiting,reply"-False-46 02/24/2016-Customer,Service-CALL-Responded-False-51 02/24/2016-Customer,Service-CHAT-Responded-False-31 –

+0

'FPAT' is only supported in gnu-awk, so it will not work with older awk versions. Try the 'sed' command I suggested. – anubhava
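
For awk implementations without `FPAT`, one common workaround is to split each line on the double-quote character, so that the even-numbered fields are exactly the quoted values. The following is a minimal sketch under the assumption that fields never contain escaped quotes (`""`) or embedded newlines. The function name `dash_quoted_commas_awk` is hypothetical.

```shell
# Hypothetical helper: replace commas inside double-quoted fields with hyphens
# using only standard awk features (no FPAT).
dash_quoted_commas_awk() {
    # With '"' as the field separator, fields 2, 4, 6, ... are the text
    # between pairs of quotes. Setting OFS to '"' puts the quotes back
    # when awk rebuilds the line after a field is modified.
    awk -F'"' -v OFS='"' '{
        for (i = 2; i <= NF; i += 2) gsub(/,/, "-", $i)
        print
    }'
}

dash_quoted_commas_awk <<'EOF'
03/03/2016,Customer Service,CHAT,"Responded, awaiting reply",False,46
02/24/2016,Customer Service,CALL,Responded,False,51
EOF
```

Lines without any quotes have a single field under this separator, so they pass through unchanged.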