Groovy CSV parser and export to a database

How can I parse my CSV file without parsing the first line?

This class works, but I don't want to parse my CSV header.

import groovy.sql.Sql

class CSVParserService {

    boolean transactional = false

    def sql = Sql.newInstance("jdbc:mysql://localhost/RProject", "xxx", "xxx", "com.mysql.jdbc.Driver")

    def CSVList = sql.dataSet("ModuleSet")

    def CSVParser(String filepath, boolean header) {

        def parse = new File(filepath)

        // split and populate GeneInfo
        parse.splitEachLine(',') { fields ->

            CSVList.add(
                Module: fields[0],
                Function: fields[1],
                Systematic_Name: fields[2],
                Common_Name: fields[3]
            )

            return CSVList
        }
    }
}

I changed my class, so now I have:

import groovy.sql.Sql

class CSVParserService {

    boolean transactional = false

    def sql = Sql.newInstance("jdbc:mysql://localhost/RProject", "xxx", "xxx", "com.mysql.jdbc.Driver")

    def CSVList = sql.dataSet("ModuleSet")

    def CSVParser(String filepath, boolean header) {

        def parse = new File(filepath).readLines()[1..-1]

        parse.each { line ->

            // split and populate GeneInfo
            line.splitEachLine(',') { fields ->

                CSVList.add(
                    Module: fields[0],
                    Function: fields[1],
                    Systematic_Name: fields[2],
                    Common_Name: fields[3]
                )

                return CSVList
            }
        }
    }
}

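This works well until it reaches this part of my CSV:
"Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA."

When my parser reaches this field, it cuts it into 3 pieces (it should be 1):
- Homo sapiens interleukin 4 receptor (IL4R)
- transcript variant 1
- mRNA.

How can I fix this? Thanks for your help.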

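- New comment - Here is a copy of my CSV line (line 2):
"M6.6",NA,"ILMN_1652185",NA,NA,"IL4RA; CD124",NA,"NM_000418.2","16","16p12.1a","Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA.",3566,...

As you can see, my problem is with the field "Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA."; I don't want to cut the text between the opening and closing quotes. My parser should split only on the commas that separate quoted elements, not on the commas inside the quotes. For example, given "part1","part2","part3", I only want to get part1, part2 and part3, and if part2 contains commas, I don't want to split on them. In short, I just want to ignore commas inside quoted elements.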

2009-12-10 Fabien Barbier

OK, I have my fix!

The code is below:

import groovy.sql.Sql

class CSVParserService {

    boolean transactional = false

    def sql = Sql.newInstance("jdbc:mysql://localhost/RProject", "xxx", "xxx", "com.mysql.jdbc.Driver")

    def CSVList = sql.dataSet("ModuleSet")

    def CSVParser(String filepath, boolean header) {

        // read every line except the first one (the header)
        def parse = new File(filepath).readLines()[1..-1]

        // split on a comma only when an even number of double quotes follows it,
        // i.e. only when the comma is outside a quoted field
        def token = ',(?=([^\"]*\"[^\"]*\")*[^\"]*$)'

        parse.each { line ->

            // split and populate GeneInfo
            line.splitEachLine(token) { fields ->

                CSVList.add(
                    Module: fields[0],
                    Function: fields[1],
                    Systematic_Name: fields[2],
                    Common_Name: fields[3]
                )

                return CSVList
            }
        }
    }
}

See this post for details: Java: splitting a comma-separated string but ignoring commas in quotes
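To see what that regex does without a database, here is a minimal sketch that applies it to a shortened version of the problem line. The `splitCsvLine` helper and the shortened sample line are made up for illustration; only the `token` pattern comes from the fix above.

```groovy
// Regex from the fix above: a comma is a separator only if an even number
// of double quotes follows it, i.e. the comma is outside a quoted field.
def token = ',(?=([^\"]*\"[^\"]*\")*[^\"]*$)'

// Hypothetical helper (not part of the original post): split one CSV line
// and strip the surrounding double quotes from each field.
def splitCsvLine = { String line ->
    line.split(token, -1).collect { it.trim().replaceAll('^"|"$', '') }
}

// Shortened sample line, assumed for illustration only
def sample = '"M6.6",NA,"ILMN_1652185","Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA.",3566'

def fields = splitCsvLine(sample)
fields.each { println it }
```

This approach works line by line, so it will not handle a quoted field that spans several lines; a dedicated CSV library is probably the safer choice for that kind of input.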

2009-12-11 23:45:23

Have you considered using a CSV parser that does all the work for you? Like Ostermiller's? http://ostermiller.org/utils/CSV.html – Philippe 2010-08-23 13:27:07

In the end, I chose this CSV parser: http://opencsv.sourceforge.net/. Thanks. – 2010-08-24 14:41:40

Here is another CSV parsing library for Groovy that I created: GroovyCSV (http://xlson.com/groovycsv/). It is based on opencsv. – xlson 2011-04-11 13:43:34

You can read every line of the file except the first into a List:

List<String> allLinesExceptHeader = new File(filepath).readLines()[1..-1]

Each line of the file (each element of allLinesExceptHeader) can then be parsed using code similar to what you showed above:

allLinesExceptHeader.each {line ->  
    // Code to parse each line goes here 
}
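As a self-contained illustration of the header skip, the sketch below writes a small temporary file and reads it back. The file contents and the variable names `csv`, `viaDrop` and `rows` are assumptions for the example, and the naive `split(',')` is only suitable for lines without quoted commas.

```groovy
// Build a small temporary CSV file so the example runs on its own
// (file name and contents are made up for illustration).
def csv = File.createTempFile('modules', '.csv')
csv.deleteOnExit()
csv.text = 'Module,Function\nM6.6,NA\nM1.1,NA\n'

// Same idea as above: every line except the header
List<String> allLinesExceptHeader = csv.readLines()[1..-1]

// drop(1) also skips the first line and yields an empty list
// when there is nothing left to return
List<String> viaDrop = csv.readLines().drop(1)

// Naive parsing of each remaining line (no quoted commas in this sample)
def rows = allLinesExceptHeader.collect { line -> line.split(',') as List }
rows.each { println it }
```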

2009-12-10 22:38:10

Wouldn't calling remove(0) on the list of lines be more efficient than using a range on a large file? – leebutts 2009-12-10 23:19:37
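If file size is the concern, another option is to avoid building the full list at all. The sketch below is not from the thread: it assumes the two-argument form of `File.eachLine`, which passes a line number starting at 1, and uses a made-up temporary file. Whether it is actually faster than the range or remove(0) was not measured here.

```groovy
// Temporary sample file, made up for illustration
def streamedCsv = File.createTempFile('modules-stream', '.csv')
streamedCsv.deleteOnExit()
streamedCsv.text = 'Module,Function\nM6.6,NA\nM1.1,NA\n'

def dataLines = []

// eachLine reads the file line by line; lineNumber starts at 1
streamedCsv.eachLine { line, lineNumber ->
    if (lineNumber == 1) return   // skip the header line
    dataLines << line             // replace with the real per-line parsing
}

dataLines.each { println it }
```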
