2013-07-24 147 views
3

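SED statement to change/modify CSV separators and delimiters

I have some CSV files that contain comma-separated values, and some of the column values can contain characters like ,.<>!/\;&

I am trying to convert them into comma-separated, quote-enclosed CSV.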

Sample data:

DateCreated,DateModified,SKU,Name,Category,Description,Url,OriginalUrl,Image,Image50,Image100,Image120,Image200,Image300,Image400,Price,Brand,ModelNumber 
2012-10-19 10:52:50,2013-06-11 02:07:16,34,Austral Foldaway 45 Rotary Clothesline,Home & Garden > Household Supplies > Laundry Supplies > Drying Racks & Hangers,"Watch the Product Video   Plenty of Space to Hang a Family Wash Austral's Foldaway 45 rotary clothesline is a folding head rotary clothes hoist beautifully finished in either Beige or Heritage Green. Even though the Foldaway 45 is compact, you still get a large 45 metres of line space, big enough for a full family wash. If you want the advantage of a rotary hoist, but dont want to lose your yard, then the Austral Foldaway 45 is the clothesline for you.&nbsp; Installation Note:&nbsp;A core hole is only required when installing into existing concrete, e.g. a pathway. Not required in the ground(grass/soil). To watch video on YouTube, click the following link:&nbsp;Austral Foldaway 45 Rotary Clothesline  &nbsp;   //   Customer Video Reviews &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;",https://track.commissionfactory.com.au/p/10604/1718695,http://www.lifestyleclotheslines.com.au/austral-foldaway-45-rotary-clothesline/,http://content.commissionfactory.com.au/Products/7228/1718695.jpg,http://content.commissionfactory.com.au/Products/7228/[email protected],http://content.commissionfactory.com.au/Products/7228/[email protected],http://content.commissionfactory.com.au/Products/7228/[email protected],http://content.commissionfactory.com.au/Products/7228/[email protected],http://content.commissionfactory.com.au/Products/7228/[email protected],http://content.commissionfactory.com.au/Products/7228/[email protected],309.9000 AUD,Austral,FA45GR 

And the output I am trying to achieve:

"DateCreated","DateModified","SKU","Name","Category","Description","Url","OriginalUrl","Image","Image50","Image100","Image120","Image200","Image300","Image400","Price","Brand","ModelNumber" 
"2012-10-19 10:52:50","2013-06-11 02:07:16","34","Austral Foldaway 45 Rotary Clothesline","Home & Garden > Household Supplies > Laundry Supplies > Drying Racks & Hangers","Watch the Product Video   Plenty of Space to Hang a Family Wash Austral's Foldaway 45 rotary clothesline is a folding head rotary clothes hoist beautifully finished in either Beige or Heritage Green. Even though the Foldaway 45 is compact, you still get a large 45 metres of line space, big enough for a full family wash. If you want the advantage of a rotary hoist, but dont want to lose your yard, then the Austral Foldaway 45 is the clothesline for you.&nbsp; Installation Note:&nbsp;A core hole is only required when installing into existing concrete, e.g. a pathway. Not required in the ground(grass/soil). To watch video on YouTube, click the following link:&nbsp;Austral Foldaway 45 Rotary Clothesline  &nbsp;   //   Customer Video Reviews &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;","https://track.commissionfactory.com.au/p/10604/1718695","http://www.lifestyleclotheslines.com.au/austral-foldaway-45-rotary-clothesline/","http://content.commissionfactory.com.au/Products/7228/1718695.jpg","http://content.commissionfactory.com.au/Products/7228/[email protected]","http://content.commissionfactory.com.au/Products/7228/[email protected]","http://content.commissionfactory.com.au/Products/7228/[email protected]","http://content.commissionfactory.com.au/Products/7228/[email protected]","http://content.commissionfactory.com.au/Products/7228/[email protected]","http://content.commissionfactory.com.au/Products/7228/[email protected]","309.9000 AUD","Austral","FA45GR" 

Any assistance is greatly appreciated.

Answers

3

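First, let's try a trivial (and "not good enough") solution that just adds double quotes around every field (including those that already have double quotes, which isn't what you want):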

sed -r 's/([^,]*)/"\1"/g' 

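Great: the first part looks for sequences with no commas in them, the second part adds double quotes around them, and the final 'g' means to do this more than once per line.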

This will turn

abc,345, some words ,"some text","text,with,commas" 

into

"abc","345"," some words ",""some text"",""text","with","commas""

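A few things to note:

  • It correctly surrounds "some words", which has spaces between the words, but it also keeps the leading and trailing spaces inside the quotes. I assume that's OK, but if not, it can be fixed.

  • If the field already has quotes, it is quoted again, which is bad. This needs fixing.

  • If the field already has quotes and the inner text has commas (which should not be treated as field separators), the pieces between those commas are quoted too. This also needs fixing.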
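For anyone who wants to see these three effects directly, here is a minimal sketch that runs the naive command on the example line. It assumes GNU sed; `quote_naive` is just a name made up for this illustration, and `-E` is used as the equivalent spelling of `-r`.

```shell
#!/bin/sh
# quote_naive: hypothetical helper wrapping the naive command from the answer.
# Every run of non-comma characters is treated as a field and wrapped in quotes,
# regardless of whether it is already quoted or sits inside a quoted field.
quote_naive() {
    sed -E 's/([^,]*)/"\1"/g'
}

line='abc,345, some words ,"some text","text,with,commas"'

# The spaces stay inside the quotes, "some text" is quoted twice, and
# "text,with,commas" is split into three separately quoted pieces.
printf '%s\n' "$line" | quote_naive
```

Feeding it a few lines from the real file should make it obvious which of the three problems actually occur in your data.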

So we want to match either of two different regular expressions: a quoted string, or a field without commas:

sed -r 's/([^,"]*|"[^"]*")/"\1"/g' 

The result is now

"abc","345"," some words ",""some text"",""text,with,commas"" 

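As you can see, the text that was originally quoted now has doubled double quotes. To deal with that, we add a second sed command that removes them: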

sed -r 's/([^,"]*|"[^"]*")/"\1"/g' | sed 's/""/"/g' 

resulting in

"abc","345"," some words ","some text","text,with,commas" 

Yay!
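One caveat with the two-step version: the second sed replaces every `""`, so an empty field (which the first step turns into `""`) ends up as a single `"`. A possible single-pass variant, which is not from the answer above and is only a sketch assuming GNU sed, captures the text inside any existing quotes instead of the quotes themselves, so there is nothing to clean up afterwards. `quote_fields` is a hypothetical helper name.

```shell
#!/bin/sh
# quote_fields: hypothetical helper, a single-pass variant of the two-step pipeline.
quote_fields() {
    # Alternative 1: an already quoted field; \1 captures the text between the quotes.
    # Alternative 2: an unquoted field (possibly empty); \2 captures it.
    # The group that did not take part in the match expands to nothing,
    # so each field is written with exactly one pair of quotes.
    sed -E 's/"([^"]*)"|([^,"]*)/"\1\2"/g'
}

# The example line from the answer.
printf '%s\n' 'abc,345, some words ,"some text","text,with,commas"' | quote_fields

# A line with an empty middle field, which should stay an empty quoted field.
printf '%s\n' 'a,,b' | quote_fields
```

Like the original, this still cannot cope with escaped quotes (`""`) inside a quoted field or with fields that span several lines; those cases need a real CSV parser.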

+0

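Thanks for the reply. I just tested your sed syntax, but it doesn't seem to give the desired result. I'm not sure if I'm missing something here; I'm still getting to grips with all this :) –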

+0

If you would like to look at one of the original files, use the following command to get the CSV file: wget --trust-server-names --header="Content-Type:text/csv" -O EXAMPLE_IN.csv --user-agent="Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.205 Safari/534.16" 'http://dashboard.commissionfactory.com.au/加盟/素材/數據傳送專線/ jPSA4dbg17SY7svvjeSX5Jf1iO @ b5JXshOfY @ ovjzeKj4PGivuyn5qqxrLDr86GysLTj $ bTyoKaj77Pltfmh9dvnkOCS4MHzjvTSlK6Dfg == /''' –

+0

"sed: -e expression #1, char 24: Unmatched ( or \(" is what I get from my Ubuntu CLI –

0

It sounds like you want every line in the file to start and end with a double quote. If so, this should work:

sed -i.bak 's/^\(.*\)$/"\1"/' filename 
+0

Apologies, not every line: every *column* in the CSV should be enclosed in " –

0

Try this solution. It is better than my previous one because it now uses a parser that correctly handles commas inside fields. It needs the Text::CSV_XS module to work:

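#!/usr/bin/env perl 

use strict; 
use warnings; 
use Text::CSV_XS; 

die qq|Usage: perl $0 <csv-file>\n| unless @ARGV == 1; 

open my $fh, '<', shift or die qq|ERROR: Could not open input file\n|; 

# binary => 1 allows non-ASCII bytes and embedded newlines inside fields; 
# always_quote => 1 wraps every output field in double quotes. 
my $csv = Text::CSV_XS->new({ 
     binary => 1, 
     always_quote => 1, 
}); 

# Parse one record at a time and print it back fully quoted. 
while (my $row = $csv->getline($fh)) { 
     $csv->print(*STDOUT, $row); 
     print "\n"; 
} 
# getline also returns false on a parse error, so report it if we stopped early. 
$csv->eof or $csv->error_diag; 
close $fh; 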
+0

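@TimDunkley: Please give more information. Any errors? No modification at all? Do some lines work but not all? What happens if you try it with a small part of the file? – Birei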

+0

@birie I have updated the original post with the output –

+0

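@TimDunkley: Hmm, OK. The lines fail because they have commas inside fields. That makes it too complicated: this 'awk' program doesn't handle that case. It will be necessary to switch to a language with a CSV parser. I'll try to fix it. – Birei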