2017-01-31 107 views
0

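Parsing CSV with double quotes inside a quoted field using OpenCSV

I want to parse a CSV file using OpenCSV. One of the columns stores data in YAML serialized format; it is quoted because it can contain commas. It also contains quotes, which are escaped by doubling them. I can parse this file easily in Ruby, but with OpenCSV I cannot get it to parse completely. The file is UTF-8 encoded.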

Here is my Java code that attempts to read the file:

CSVReader reader = new CSVReader(new InputStreamReader(new FileInputStream(csvFilePath), "UTF-8"), ',', '\"', '\\'); 

Below are two lines from the file. The first line is not parsed correctly and gets split at ""[Fair Trade Certified]"", I guess because of the escaped double quotes.

1061658767,update,1196916,Product,28613099,Product::Source,"--- 
product_attributes: 
- 
- :name: Ornaments 
    :brand_id: 49120 
    :size: each 
    :alcoholic: false 
    :details: ""[Fair Trade Certified]"" 
    :gluten_free: false 
    :kosher: false 
    :low_fat: false 
    :organic: false 
    :sugar_free: false 
    :fat_free: false 
    :vegan: false 
    :vegetarian: false 
",,2015-11-01 00:06:19.796944,,,,,, 
1061658768,create,,,28613100,Product::Source,"--- 
product_id: 
retailer_id: 
store_id: 
source_id: 333790 
locale: en_us 
source_type: Product::PrehistoricProductDatum 
priority: 1 
is_definition: 
product_attributes: 
",,2015-11-01 00:06:19.927948,,,,,, 
+1

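The "standard" for CSV files is RFC4180, though it is not always followed. It covers quoting fields that contain commas and turning internal quotes into two quotes. Googling "RFC4180 Java parser" turns up a few possibilities. – Paul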

+0

With *OpenCSV* you couldn't parse it. Credit where credit is due. – EJP

+0

@EJP Not sure what you are implying :) but anyway, using an RFC4180-compatible parser fixed it. – invinc4u

Answers

2

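The solution was to use an RFC4180-compatible CSV parser, as suggested by https://stackoverflow.com/users/103081/paul. I had been using OpenCSV's CSVReader, and either it did not work properly or I could not get it to work properly.

I used https://github.com/osiegmar/FastCSV, an RFC4180 CSV parser, and it worked seamlessly.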

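import de.siegmar.fastcsv.reader.CsvContainer;
import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.CsvRow;
import java.io.File;
import java.nio.charset.StandardCharsets;
// Reads the whole file into memory (this matches the FastCSV 1.x reader API).
File file = new File(csvFilePath);
CsvReader csvReader = new CsvReader();
CsvContainer csv = csvReader.read(file, StandardCharsets.UTF_8);
for (CsvRow row : csv.getRows()) {
    System.out.println(row.getFieldCount()); // number of fields in each row
}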
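
To make the quoting rule concrete, here is a minimal sketch of RFC4180-style field splitting using only the JDK. It is not how FastCSV or OpenCSV are implemented; the class name Rfc4180Sketch and the method parseRecord are hypothetical names made up for this illustration. It assumes the caller already holds one complete record (embedded newlines included) and ignores details such as a trailing line break.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of any CSV library.
class Rfc4180Sketch {

    // Splits one complete CSV record into fields, RFC4180 style:
    // a quoted field may contain commas and newlines, and "" inside
    // a quoted field stands for one literal double quote.
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote -> literal quote
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas and newlines kept as data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Shortened version of the first record from the question.
        String record = "28613099,Product::Source,\"---\n"
                + ":details: \"\"[Fair Trade Certified]\"\"\n"
                + "\",,2015-11-01 00:06:19.796944";
        List<String> fields = parseRecord(record);
        System.out.println(fields.size());
        System.out.println(fields.get(2));
    }
}
```

The point of the sketch is that no separate escape character is involved: a quote inside a quoted field is only ever escaped by another quote, which is the behaviour the file in the question relies on.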
+0

OpenCSV has had an RFC4180 parser since at least version 4.1. Here is the javadoc: http://opencsv.sourceforge.net/apidocs/com/opencsv/RFC4180Parser.html – dazito

0

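First, I am glad FastCSV worked for you, but I took the suspect string and ran it through openCSV 3.9, and it worked with both the CSVParser and the RFC4180Parser. Could you elaborate on how it failed to parse with openCSV 3.9, and/or try it to see whether you hit the same problem, and then try the configuration below?

Here is what I used to test: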

CSVParser:

@Test 
public void parseBigStringFromStackOverflowWithMultipleQuotesInLine() throws IOException { 

    String bigline = "28613099,Product::Source,\"---\n" + 
      "product_attributes:\n" + 
      "-\n" + 
      "- :name: Ornaments\n" + 
      " :brand_id: 49120\n" + 
      " :size: each\n" + 
      " :alcoholic: false\n" + 
      " :details: \"\"[Fair Trade Certified]\"\"\n" + 
      " :gluten_free: false\n" + 
      " :kosher: false\n" + 
      " :low_fat: false\n" + 
      " :organic: false\n" + 
      " :sugar_free: false\n" + 
      " :fat_free: false\n" + 
      " :vegan: false\n" + 
      " :vegetarian: false\n" + 
      "\",,2015-11-01 00:06:19.796944"; 

    String suspectString = "---\n" + 
      "product_attributes:\n" + 
      "-\n" + 
      "- :name: Ornaments\n" + 
      " :brand_id: 49120\n" + 
      " :size: each\n" + 
      " :alcoholic: false\n" + 
      " :details: \"[Fair Trade Certified]\"\n" + 
      " :gluten_free: false\n" + 
      " :kosher: false\n" + 
      " :low_fat: false\n" + 
      " :organic: false\n" + 
      " :sugar_free: false\n" + 
      " :fat_free: false\n" + 
      " :vegan: false\n" + 
      " :vegetarian: false\n" ; 

    StringReader stringReader = new StringReader(bigline); 

    CSVReaderBuilder builder = new CSVReaderBuilder(stringReader); 
    CSVReader csvReader = builder.withFieldAsNull(CSVReaderNullFieldIndicator.BOTH).build(); 

    String item[] = csvReader.readNext(); 

    assertEquals(5, item.length); 
    assertEquals("28613099", item[0]); 
    assertEquals("Product::Source", item[1]); 
    assertEquals(suspectString, item[2]); 
} 

RFC4180Parser:

def 'parse big line from stackoverflow with complex string'() { 
    given: 
    RFC4180ParserBuilder builder = new RFC4180ParserBuilder() 
    RFC4180Parser parser = builder.build() 
    String bigline = "28613099,Product::Source,\"---\n" + 
      "product_attributes:\n" + 
      "-\n" + 
      "- :name: Ornaments\n" + 
      " :brand_id: 49120\n" + 
      " :size: each\n" + 
      " :alcoholic: false\n" + 
      " :details: \"\"[Fair Trade Certified]\"\"\n" + 
      " :gluten_free: false\n" + 
      " :kosher: false\n" + 
      " :low_fat: false\n" + 
      " :organic: false\n" + 
      " :sugar_free: false\n" + 
      " :fat_free: false\n" + 
      " :vegan: false\n" + 
      " :vegetarian: false\n" + 
      "\",,2015-11-01 00:06:19.796944" 

    String suspectString = "---\n" + 
      "product_attributes:\n" + 
      "-\n" + 
      "- :name: Ornaments\n" + 
      " :brand_id: 49120\n" + 
      " :size: each\n" + 
      " :alcoholic: false\n" + 
      " :details: \"[Fair Trade Certified]\"\n" + 
      " :gluten_free: false\n" + 
      " :kosher: false\n" + 
      " :low_fat: false\n" + 
      " :organic: false\n" + 
      " :sugar_free: false\n" + 
      " :fat_free: false\n" + 
      " :vegan: false\n" + 
      " :vegetarian: false\n" 

    when: 
    String[] values = parser.parseLine(bigline) 

    then: 
    values.length == 5 
    values[0] == "28613099" 
    values[1] == "Product::Source" 
    values[2] == suspectString 
}