2013-07-19
0

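CSV regex with quotes and backslashes

I am trying to find a regular expression for CSV files (values enclosed in double quotes) whose values can contain any character. The expression I am currently using (in Java, hence the escaped backslashes):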

",(?=(([^\"\\\\]|\\\\.)*\"([^\"\\\\]|\\\\.)*\")*([^\"\\\\]|\\\\.)*$)" 

I run into problems with entries such as "random_value"" or "random_value\".

Additional information:

"000000000000000","","","","[email protected]","random_value"" 
"000000000000000","","","","[email protected]","random_value\" 
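One way to see why those two entries cause trouble: the lookahead in the expression only accepts a comma that is followed by an even number of unescaped quotes up to the end of the line. Both entries leave the last value without a closing quote, so no comma qualifies and nothing is split. The sketch below is only an illustration of that behaviour; the class name, the `split` helper and the placeholder address `user@example.com` are assumptions, not part of the question.

```java
import java.util.regex.Pattern;

class SplitRegexDemo {
    // The expression from the question, copied verbatim as a Java string literal.
    static final Pattern COMMA = Pattern.compile(
        ",(?=(([^\"\\\\]|\\\\.)*\"([^\"\\\\]|\\\\.)*\")*([^\"\\\\]|\\\\.)*$)");

    // Hypothetical helper: split one line on the commas the expression accepts.
    static String[] split(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        // Placeholder address; the original one is not recoverable from the post.
        String prefix = "\"000000000000000\",\"\",\"\",\"\",\"user@example.com\",";

        String closed    = prefix + "\"random_value\\\"\""; // "random_value\""  (properly closed)
        String backslash = prefix + "\"random_value\\\"";   // "random_value\"   (as in the question)
        String doubled   = prefix + "\"random_value\"\"";   // "random_value""   (as in the question)

        System.out.println(split(closed).length);    // 6: every comma is a split point
        System.out.println(split(backslash).length); // 1: odd quote count, no comma matches
        System.out.println(split(doubled).length);   // 1: odd quote count, no comma matches
    }
}
```

Note that the pieces returned by `split` still carry their surrounding quotes; the expression only locates the separators.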
+2

Could you give examples of possible input and the expected output? "An expression that works with CSV" is too vague (at least for me). – Pshemo

+0

Consider using a CSV parser instead of a complex regex. – anubhava

+0

"random_value"" as a value is not valid CSV, but "random_value\"" is legal. Also, CSV is not a well-defined standard :) –

Answers

0

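Description

Assuming we clean up your source text so that every value has its proper closing quote, this expression:

  • matches all quoted, comma-delimited text
  • captures the leading comma and quote, together with the text and the closing quote, as group 0
  • strips the leading and trailing quotes and places the value in capture group 1
  • allows values to contain escaped quote sequences such as \" or ""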

(?:^|,)"((?<=")(?:[^"]*|\\"|"")*?)"(?=[,\r\n]|\Z)


Live demo: http://www.rubular.com/r/NSSxdHWcDM

Sample text

"1000000000000000","","","","[email protected]","1random_value""" 
"2000000000000000","","","","[email protected]","2random_value\"" 

Capture groups

[0][0] = "1000000000000000" 
[0][1] = 1000000000000000 

[1][0] = ,"" 
[1][1] = 

[2][0] = ,"" 
[2][1] = 

[3][0] = ,"" 
[3][1] = 

[4][0] = ,"[email protected]" 
[4][1] = [email protected] 

[5][0] = ,"1random_value""" 
[5][1] = 1random_value"" 

[6][0] = "2000000000000000" 
[6][1] = 2000000000000000 

[7][0] = ,"" 
[7][1] = 

[8][0] = ,"" 
[8][1] = 

[9][0] = ,"" 
[9][1] = 

[10][0] = ,"[email protected]" 
[10][1] = [email protected] 

[11][0] = ,"2random_value\"" 
[11][1] = 2random_value\" 
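
The capture groups listed above can be reproduced in Java roughly as follows. This is a sketch rather than part of the original answer: the `Pattern.MULTILINE` flag is an assumption (Rubular treats `^` as start of line, while Java needs the flag for that), `user@example.com` is a placeholder address, and the class, constant and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedCsvRegexDemo {
    // The answer's expression, escaped for a Java string literal.
    // MULTILINE lets ^ match at the start of every line, not only at the start of the input.
    static final Pattern FIELD = Pattern.compile(
        "(?:^|,)\"((?<=\")(?:[^\"]*|\\\\\"|\"\")*?)\"(?=[,\\r\\n]|\\Z)",
        Pattern.MULTILINE);

    // Two sample lines with properly closed values (placeholder address).
    static final String SAMPLE =
        "\"1000000000000000\",\"\",\"\",\"\",\"user@example.com\",\"1random_value\"\"\"\n"
      + "\"2000000000000000\",\"\",\"\",\"\",\"user@example.com\",\"2random_value\\\"\"";

    // Collects capture group 1 (the value without its surrounding quotes) of every match.
    static List<String> fields(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = FIELD.matcher(text);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> values = fields(SAMPLE);
        for (int i = 0; i < values.size(); i++) {
            System.out.printf("[%d][1] = %s%n", i, values.get(i));
        }
    }
}
```

The escape sequences are left untouched in group 1, so a caller that wants the plain value still has to replace `""` or `\"` with a single quote afterwards.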
0

Using JavaCSV

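import com.csvreader.CsvReader; // from the JavaCSV library

// One CSV line whose last value ends in a backslash followed by quotes
String str = "\"000000000000000\",\"\",\"\",\"\",\"[email protected]\",\"random_value\\\"\""; 
CsvReader reader = CsvReader.parse(str); 
reader.readRecord(); // advance to the first (and only) record
for (int i = 0; i < reader.getColumnCount(); i++) 
    System.out.printf("Scol[%d]: [%s]%n", i, reader.get(i)); 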

OUTPUT:

Scol[0]: [000000000000000] 
Scol[1]: [] 
Scol[2]: [] 
Scol[3]: [] 
Scol[4]: [[email protected]] 
Scol[5]: [random_value\"] 
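
If pulling in a library is not an option, the same idea (walk the characters instead of matching a regex) can be sketched by hand. The code below is a minimal illustration under stated assumptions, not JavaCSV's implementation: it handles a single line, the class and method names are hypothetical, and the caller must say which escape convention the file uses (`\"` or `""`), since a quoted value can be read differently under each.

```java
import java.util.ArrayList;
import java.util.List;

class TinyCsvLine {
    // Splits one CSV line into values.
    // backslashEscapes = true  : \" inside quotes is a literal quote
    // backslashEscapes = false : "" inside quotes is a literal quote
    static List<String> parse(String line, boolean backslashEscapes) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (!inQuotes) {
                if (c == '"') {
                    inQuotes = true;              // opening quote
                } else if (c == ',') {
                    out.add(cur.toString());      // separator ends the current value
                    cur.setLength(0);
                } else {
                    cur.append(c);                // unquoted text is kept as is
                }
            } else if (backslashEscapes && c == '\\' && i + 1 < line.length()) {
                cur.append(line.charAt(++i));     // keep the escaped character, drop the backslash
            } else if (c == '"') {
                if (!backslashEscapes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"');              // doubled quote becomes one literal quote
                    i++;
                } else {
                    inQuotes = false;             // closing quote
                }
            } else {
                cur.append(c);
            }
        }
        out.add(cur.toString());                  // last value on the line
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"a\",\"\",\"x\\\"\"", true)); // input: "a","","x\""
        System.out.println(parse("\"a\",\"x\"\"\"", false));     // input: "a","x"""
    }
}
```

A value left unterminated at the end of the line is returned as read so far; a stricter parser would probably report it as an error instead.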