2017-05-04 27 views
1

Using sed with a negated match against a CSV file

I have a CSV file in the following format:

$ tail X.csv | sed 's/[a-zA-Z0-9]/X/g' 
XXXXXXX/XXXXXXXX XXXXXXXXXXXX), XXXXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXX (X),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} 
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXX (XXXXXXX XXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXX XXXXXXXXX XXXXXX XXXX XXX XXXXXXXX XX XXXXXXX XXX XXXXXXXX XXXXXXX (XXXXXXXXX): XXXXXXXX X XXXXXXXXXX XXXX X XXXXXXXXXX.,XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} 
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXX (XXXXXXX XXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXXXX XXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXXXXXXXXXX),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} 
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXX,XXXXXXXXX (XXXXXX XXXXXXX XXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXX,XXXXXXXX XXXXXXXXX XXXXXXXX XXX XXXXXX XXXXXXX XXXXXXX (XXXXXXXXX).,XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} 
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXXXXXX (XXXXXX XXXXXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXXX XXXX XXXXXXX(X) XX XX/XX/XXXX XXX XXXXXXX XXXXXXXX (XXXXXXXXX).,XXXXX,,X,X,X,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} 
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXX,XXXXXXXXXXX (XXXXXXX XXXXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXX XXXXX (XXXXXXXXX) (XXXXXXXXXXX XX XXXXX XXX XXXXXXXX-XXXX XXXXXXXXXXX): XXXXXXXXXXXXXXXXXXX (XXXXX), XXXXXXXXXXXXXXXXXX (XXXXX), XXXXXXXXXXXXXX (XXXX), XXXXXXXXXXXXXXXX (XXXXX),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} 
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXX (XXXXXXX XXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXXXX XXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXXXXXXXXXX),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} 
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXXXXXX (XXXXXX XXXXXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXXX XXXX XXXXXXX(X) XX XX/XX/XXXX XXX XXXXX XXXXXXXX (XXXXXXXXX).,XXXXX,,X,X,X,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} 
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXXXXXX (XXXXXXXX XXXXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXXX XXXXX (XXXXXXXXX) (XXXXXXX XXXX): XXXXXXXXXXXXXX (XXXXX),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} 
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXX (XXXXXXXX XXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXX XXXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXX XX XXXXXXXXXXX XXX XXXXXXXXXX XXXXX): XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXXXXXXXX (XXXXXX XXXXX XXXXXX XXXXXXXXXXXXX XXXX XXX XXXXX. XXX XX XXXX XXXXXX.), XXXXXXXXXXXXXXXXXXXXXXXXXXXX (XXX XX XXXX XXXXX XXX XXX XXXX XXXXXXX.),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} 
$ 

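Besides the commas that act as delimiters, the generated CSV file also contains commas as part of the values, so I need sed(1) to replace the delimiter commas with a different delimiter such as |.

Unfortunately, the file cannot be regenerated with a different delimiter.

My unsuccessful attempt: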

$ tail X.csv | sed 's/[a-zA-Z0-9]/X/g' | sed --regexp-extended '/,/!s/,%s/|/g' | tail -1 
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXX (XXXXXXXX XXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXX XXXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXX XX XXXXXXXXXXX XXX XXXXXXXXXX XXXXX): XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXXXXXXXX (XXXXXX XXXXX XXXXXX XXXXXXXXXXXXX XXXX XXX XXXXX. XXX XX XXXX XXXXXX.), XXXXXXXXXXXXXXXXXXXXXXXXXXXX (XXX XX XXXX XXXXX XXX XXX XXXX XXXXXXX.),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} 
$ 

How can I solve this?

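1. You may want to generate the CSV file with a different field separator or with quoted values. 2. If that is not an option, please provide more information: is the **second** comma in **every** line inside a field value? If not, how can we figure out which lines need fixing? –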

1) Unfortunately, that is not an option. 2) The file is huge; I don't believe it happens in every line, but it is very common in this file. – alexus

@alexus, show more lines of your file; two lines are not enough – RomanPerekhrest

Answers

1

I'm not a sed fan, so here is a version using perl:

cat X.csv | perl -p -e "s/,(\S)/|\$1/g"

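which basically means: replace every comma that is immediately followed by a non-space character with '|' followed by that same non-space character. Commas followed by a space are left alone.

Or here is a version using sed (should be POSIX compatible):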

cat X.csv | sed -E 's/,([^[:space:]])/|\1/g'

0

Use:

sed -re 's/([^ ]),([^ ])/\1|\2/g' 
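
While this code snippet may solve the question, [including an explanation](//meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. Please also try not to crowd your code with explanatory comments, as this reduces the readability of both the code and the explanations! – kayess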

0

...with @nochkin's help, I came up with a sed solution:

$ tail -1 X.csv | sed 's/[a-zA-Z0-9]/X/g' | sed --regexp-extended 's/,(\S)/|\1/g' 
XXXXXXXXX|XXXX-XX-XX XX:XX:XX.XXXXXXXXX|XX|XXXXX|X|XXXXXX|X|XXXXXX|XXXXXXX (XXXXXXXX XXXXX)|XXXXX|XX.XXX.XXX.XX|XXXXX|XXXXXXX XXX XXXXXXXX XXX XXXXXX XXXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXX XX XXXXXXXXXXX XXX XXXXXXXXXX XXXXX): XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXXXXXXXX (XXXXXX XXXXX XXXXXX XXXXXXXXXXXXX XXXX XXX XXXXX. XXX XX XXXX XXXXXX.), XXXXXXXXXXXXXXXXXXXXXXXXXXXX (XXX XX XXXX XXXXX XXX XXX XXXX XXXXXXX.)|XXXXX|,X|XXX|XXXXXXX|,|{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} 
$ sed --version 
sed (GNU sed) 4.2.2 
Copyright (C) 2012 Free Software Foundation, Inc. 
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>. 
This is free software: you are free to change and redistribute it. 
There is NO WARRANTY, to the extent permitted by law. 

Written by Jay Fenlason, Tom Lord, Ken Pizzini, 
and Paolo Bonzini. 
GNU sed home page: <http://www.gnu.org/software/sed/>. 
General help using GNU software: <http://www.gnu.org/gethelp/>. 
E-mail bug reports to: <[email protected]>. 
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field. 
$ 
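
One thing worth noticing in the output above: runs of adjacent commas (empty fields) are only partly converted, for example `,,X` becomes `|,X`. With the `g` flag each match also consumes the character after the comma, so a comma that directly follows another comma is never tested. A minimal sketch of one possible workaround, assuming the rule stays "a comma not followed by whitespace is a delimiter": repeat a single substitution until nothing matches any more. The function name `commas_to_pipes` is made up for illustration, and treating a comma at the very end of a line as a delimiter is an assumption as well.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): convert delimiter
# commas, i.e. commas not followed by whitespace, to '|', including the
# commas that delimit empty fields.
commas_to_pipes() {
    # ':again' / 't again' loop: redo the single substitution as long as
    # it keeps matching, so ',,X' becomes '||X' instead of '|,X'.
    # The last expression handles a comma at the very end of the line.
    sed -E \
        -e ':again' \
        -e 's/,([^[:space:]])/|\1/' \
        -e 't again' \
        -e 's/,$/|/'
}

printf '%s\n' 'a,b (c), d,e,,f,,,{g}' | commas_to_pipes
# prints: a|b (c), d|e||f|||{g}
```

The loop rescans the line after every replacement, which costs more than a single `g` pass but keeps the expression within what sed offers, since sed has no lookahead.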
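
That's fine. But I would use '-r' instead of '--regexp-extended' so that it also works with the sed from BusyBox and *BSD. On Mac OS X and older Unix versions this would be '-E'. GNU still supports this option. –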
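
A small sketch of how a script could choose between the two short options mentioned in this comment, by probing which one the local sed accepts. The variable name `SED_ERE` is hypothetical, and the probe assumes that a sed which does not know an option exits with a non-zero status.

```shell
#!/bin/sh
# Probe which short option for extended regular expressions this sed accepts.
# SED_ERE is a made-up variable name used only for this illustration.
if printf 'a,b\n' | sed -E 's/,(b)/|\1/' >/dev/null 2>&1; then
    SED_ERE=-E
elif printf 'a,b\n' | sed -r 's/,(b)/|\1/' >/dev/null 2>&1; then
    SED_ERE=-r
else
    echo 'this sed does not seem to support extended regular expressions' >&2
    exit 1
fi

# Same substitution as in the answers, written with a POSIX character class
# instead of the GNU-specific \S.
printf 'a,b, c,d\n' | sed "$SED_ERE" 's/,([^[:space:]])/|\1/g'
# prints: a|b, c|d
```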