2016-03-17 110 views
0

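Using awk to add double quotes to a comma-separated .csv file

Hi, I need to process a large CSV file (20M rows) and add double quotes around every comma-separated field. Each line of the CSV file has 8 comma-delimited fields, as follows: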

'2016-03-12','12393659','134',,'35533605',189348,9798,gmail.com;live_com.com 
'2016-03-12','12390103','138',,'35438006',5133,1897,google.com 
'2016-03-12','45616164','139',,'01318800',10945593,596633,facebook.com;tumblr.com;t.co 
'2016-03-12','45673436','38',,'86441702',4350985,150327,serving-sys.com;chartboost.com;admarvel.com;mydas.mobi;adap.tv;cloudfront.net 

As you can see, the first 3 fields are in single quotes, the 4th is empty, the 5th is in single quotes, and the 6th through 8th are separated by commas only. I would like to get the following result (the 4th field needs double quotes even though it is empty):

"2016-03-12","12393659","134","","35533605","189348","9798","gmail.com;live_com.com" 
"2016-03-12","12390103","138","","35438006","5133","1897","google.com" 
"2016-03-12","45616164","139","","01318800","10945593","596633","facebook.com;tumblr.com;t.co" 
"2016-03-12","45673436","38","","86441702","4350985","150327","serving-sys.com;chartboost.com;admarvel.com;mydas.mobi;adap.tv;cloudfront.net" 

I got partway there, with mixed results, using sed and awk:

sed -e s/\'//g inpu.csv > output.csv    # eliminate quotes
awk '{gsub(/[^,]+/,"\"&\"")}1' output.csv > output1.csv    # add double quotes

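But the 4th field does not get double-quoted, and I need to keep the processing time as low as possible. Any help with an awk solution that performs better, does everything in one pass, and also double-quotes the 4th field would be welcome. Many thanks for the help. M.Tave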

Answers

0

If your data really is that simple, with no embedded quotes, newlines, or anything else tricky, then all you need is:

$ awk -F"'?,'?" -v OFS='","' '{$1=$1; gsub(/^.|$/,"\"")} 1' file 
"2016-03-12","12393659","134","","35533605","189348","9798","gmail.com;live_com.com" 
"2016-03-12","12390103","138","","35438006","5133","1897","google.com" 
"2016-03-12","45616164","139","","01318800","10945593","596633","facebook.com;tumblr.com;t.co" 
"2016-03-12","45673436","38","","86441702","4350985","150327","serving-sys.com;chartboost.com;admarvel.com;mydas.mobi;adap.tv;cloudfront.net" 
+1

Thanks again, works great, as always. I will use this one, the code is more compact.
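To see why the field separator in the answer above also takes care of the empty 4th field, it can help to print the fields it produces. The following is a minimal sketch, not part of the original answer: `show_fields` is a hypothetical helper name, and the sample line is shortened from the question's data.

```shell
# Hypothetical helper: print each field produced by the answer's separator,
# a comma with an optional single quote on either side.
show_fields() {
  awk -F"'?,'?" '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }' "$@"
}

# Sample line shaped like the question's data (last field shortened).
printf '%s\n' "'2016-03-12','12393659','134',,'35533605',189348,9798,gmail.com" | show_fields
```

Only the first field keeps its leading single quote, and the 4th field comes out as an empty string. Rebuilding the record with `$1=$1` and `OFS='","'` then puts `","` between every pair of fields, including around the empty one, so the `gsub()` in the answer only has to replace that leftover first character and append the closing quote. This relies on the first field starting with a single quote and the last field having none, which holds for the data shown in the question.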

1

Give this awk one-liner a try:

awk -F, -v OFS="," -v re="^'?|'?$" -v q='"' \
        '{for(i=1;i<=NF;i++)if($i)gsub(re,q,$i);else $i=q$i q}7' file 

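The idea is to use gsub() to add double quotes to the non-empty fields. For the empty fields, simply add " at the head and tail. The substitution regex is defined as an awk variable outside the script, to avoid escaping.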
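The same per-field idea can be written out as a longer script. This is a sketch of a variation, not the one-liner itself: it strips the single quotes with two `sub()` calls instead of one `gsub()`, so empty and non-empty fields take the same path. The name `quote_fields` and the variable `sq` are hypothetical, introduced only for illustration.

```shell
# Hypothetical helper: wrap every comma-separated field in double quotes,
# dropping one leading and one trailing single quote if present.
quote_fields() {
  awk -F, -v OFS=',' -v q='"' -v sq="'" '
  {
    for (i = 1; i <= NF; i++) {
      f = $i
      sub("^" sq, "", f)   # drop a leading single quote, if any
      sub(sq "$", "", f)   # drop a trailing single quote, if any
      $i = q f q           # wrap the bare value (possibly empty) in double quotes
    }
    print
  }' "$@"
}

# Sample line shaped like the question's data (last field shortened).
printf '%s\n' "'2016-03-12','12393659','134',,'35533605',189348,9798,gmail.com" | quote_fields
```

As with the one-liner, the single quote is passed in through `-v` so that it never has to appear inside the single-quoted awk program. Whether this form is faster or slower than the one-liner on a 20M-row file has not been measured here.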

Here is the one-liner run against the input data:

kent$ awk -F, -v OFS="," -v re="^'?|'?$" -v q='"' '{for(i=1;i<=NF;i++)if($i)gsub(re,q,$i);else $i=q$i q}7' f 
"2016-03-12","12393659","134","","35533605","189348","9798","gmail.com;live_com.com" 
"2016-03-12","12390103","138","","35438006","5133","1897","google.com" 
"2016-03-12","45616164","139","","01318800","10945593","596633","facebook.com;tumblr.com;t.co" 
"2016-03-12","45673436","38","","86441702","4350985","150327","serving-sys.com;chartboost.com;admarvel.com;mydas.mobi;adap.tv;cloudfront.net" 
+0

Thanks. Tested on the big file, it works fine.