How to count CSV file rows efficiently in Java

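I wrote some code that opens a CSV file and counts rows with a for loop, but I feel this approach is inefficient and it causes a lot of delay. How can I count the rows of a CSV file efficiently in Java?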

TargetFile.mdb has 120 rows
report.csv has 11000 rows

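With this approach the code has to run 120*11000=1320000 times to compute the count for each resource. Here is my code:

Here is the new, working code, by Xavier Delamotte, that counts the rows efficiently: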

import java.io.File; 
import java.io.FileReader; 
import java.io.IOException; 
import java.sql.SQLException; 
import java.util.HashMap; 
import java.util.List; 
import java.util.Map; 

import au.com.bytecode.opencsv.CSVReader; 

import com.healthmarketscience.jackcess.Database; 
import com.healthmarketscience.jackcess.Table; 

public class newcount { 

    public static class ValueKey{ 
     String mdmId; 
     String pgName; 

     @Override 
     public int hashCode() { 
      final int prime = 31; 
      int result = 1; 
      result = prime * result + ((mdmId == null) ? 0 : mdmId.hashCode()); 
      result = prime * result 
       + ((pgName == null) ? 0 : pgName.hashCode()); 
      return result; 
     } 
     @Override 
     public boolean equals(Object obj) { 
      if (this == obj) 
       return true; 
      if (obj == null) 
       return false; 
      if (getClass() != obj.getClass()) 
       return false; 
      ValueKey other = (ValueKey) obj; 
      if (mdmId == null) { 
       if (other.mdmId != null) 
        return false; 
      } else if (!mdmId.equals(other.mdmId)) 
       return false; 
      if (pgName == null) { 
       if (other.pgName != null) 
        return false; 
      } else if (!pgName.equals(other.pgName)) 
       return false; 
      return true; 
     } 
     public ValueKey(String mdmId, String pgName) { 
      super(); 
      this.mdmId = mdmId; 
      this.pgName = pgName; 
     } 
    } 

    public static void main(String[] args) throws IOException, SQLException,Throwable{ 


     Integer count; 

     String MDMID,NAME,PGNAME,PGTARGET,TEAM; 

     Table RESOURCES = Database.open(new File("C:/STATS/TargetFile.mdb")).getTable("RESOURCES"); 
     int pcount = RESOURCES.getRowCount(); 


     String csvFilename = "C:\\MDMSTATS\\APEX\\report.csv"; 
     CSVReader csvReader = new CSVReader(new FileReader(csvFilename)); 
     List<String[]> content = csvReader.readAll(); 
     Map<ValueKey, Integer> csvValuesCount = new HashMap<ValueKey, Integer>(); 
     for (String[] rowcsv : content) { 
      ValueKey key = new ValueKey(rowcsv[6], rowcsv[1]); 
      count = csvValuesCount.get(key); 
      csvValuesCount.put(key,count == null ? 1: count + 1); 

     } 

     //int count = 0; 
     // Taking 1st resource data 
     for (int i = 0; i < pcount-25; i++) { 
      Map<String, Object> row = RESOURCES.getNextRow(); 
      TEAM = row.get("TEAM").toString(); 
      MDMID = row.get("MDM ID").toString(); 
      NAME = row.get("RESOURCE NAME").toString(); 
      PGNAME = row.get("PG NAME").toString(); 
      PGTARGET = row.get("PG TARGET").toString(); 
      int PGTARGETI = Integer.parseInt(PGTARGET); 
      Integer countInteger = csvValuesCount.get(new ValueKey(MDMID, PGNAME)); 
      count = countInteger == null ? 0: countInteger; 
      System.out.println(NAME+"\t"+PGNAME+"\t"+count); 

     } 
    } 
}

Source

2013-04-06 H4SN

All I want to do is calculate the resource counts by using the CSV file and SQL queries – H4SN 2013-04-06 11:38:50

I suggest reading the CSV file only once and counting the occurrences of each key made up of mdmId and pgName.
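To make the 120 × 11000 figure from the question concrete, here is a minimal sketch of the single-pass idea. It uses made-up in-memory rows rather than report.csv or the RESOURCES table, and the class name, the synthetic data, and the use of a `List<String>` as the composite key are all assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of the asker's program.
class SinglePassCountSketch {

    // Builds a composite key; List.equals/hashCode compare element by element.
    static List<String> key(String mdmId, String pgName) {
        return Arrays.asList(mdmId, pgName);
    }

    // One pass over the CSV rows: count how often each (mdmId, pgName) pair occurs.
    // Column 6 is taken as the MDM ID and column 1 as the PG name, as in the thread.
    static Map<List<String>, Integer> countPairs(List<String[]> csvRows) {
        Map<List<String>, Integer> counts = new HashMap<List<String>, Integer>();
        for (String[] row : csvRows) {
            List<String> k = key(row[6], row[1]);
            Integer c = counts.get(k);
            counts.put(k, c == null ? 1 : c + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        int resources = 120;   // rows in the database table (from the question)
        int csvLines = 11000;  // rows in the CSV file (from the question)

        // Synthetic CSV rows with 7 columns each (made-up values).
        List<String[]> rows = new ArrayList<String[]>();
        for (int i = 0; i < csvLines; i++) {
            String[] r = new String[7];
            Arrays.fill(r, "");
            r[1] = "PG" + (i % 3);
            r[6] = "ID" + (i % resources);
            rows.add(r);
        }

        Map<List<String>, Integer> counts = countPairs(rows);

        // Nested loops visit every CSV row once per resource.
        long nestedVisits = (long) resources * csvLines;
        // Single pass: every CSV row once, then one map lookup per resource.
        long singlePassSteps = (long) csvLines + resources;

        System.out.println("nested-loop row visits: " + nestedVisits);
        System.out.println("single-pass steps:      " + singlePassSteps);
        System.out.println("distinct keys counted:  " + counts.size());
    }
}
```

The map is built once, so each of the 120 resources then costs a single hash lookup instead of another scan of all 11000 CSV rows.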

If you have Guava, you can use a Multiset<ValueKey> http://guava-libraries.googlecode.com/svn-history/r8/trunk/javadoc/com/google/common/collect/Multiset.html instead of a Map<ValueKey,Integer>

Edit: to use it, you need to put the ValueKey class in its own file or declare it static.
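The reason for the static requirement can be seen in a small sketch. The class names below are hypothetical and only illustrate the language rule: a non-static inner class needs an instance of its enclosing class, which a static main method does not have.

```java
// Hypothetical demo class; the names are made up for illustration.
class NestedKeyDemo {

    // Static nested class: can be created without an enclosing instance.
    static class StaticKey {
        final String id;
        StaticKey(String id) { this.id = id; }
    }

    // Inner (non-static) class: every instance belongs to a NestedKeyDemo object.
    class InnerKey {
        final String id;
        InnerKey(String id) { this.id = id; }
    }

    public static void main(String[] args) {
        // Fine from a static method.
        StaticKey a = new StaticKey("A1");

        // InnerKey b = new InnerKey("B1");  // does not compile in a static context

        // Works, but only by creating an outer object first.
        InnerKey b = new NestedKeyDemo().new InnerKey("B1");

        System.out.println(a.id + " " + b.id);
    }
}
```

Since a map key has no use for the outer object, declaring the key class static (or moving it to its own file) is the simpler choice.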

The ValueKey class:

public static class ValueKey{ 
     String mdmId; 
     String pgName; 
     @Override 
     public int hashCode() { 
      final int prime = 31; 
      int result = 1; 
      result = prime * result + ((mdmId == null) ? 0 : mdmId.hashCode()); 
      result = prime * result 
        + ((pgName == null) ? 0 : pgName.hashCode()); 
      return result; 
     } 
     @Override 
     public boolean equals(Object obj) { 
      if (this == obj) 
       return true; 
      if (obj == null) 
       return false; 
      if (getClass() != obj.getClass()) 
       return false; 
      ValueKey other = (ValueKey) obj; 
      if (mdmId == null) { 
       if (other.mdmId != null) 
        return false; 
      } else if (!mdmId.equals(other.mdmId)) 
       return false; 
      if (pgName == null) { 
       if (other.pgName != null) 
        return false; 
      } else if (!pgName.equals(other.pgName)) 
       return false; 
      return true; 
     } 
     public ValueKey(String mdmId, String pgName) { 
      super(); 
      this.mdmId = mdmId; 
      this.pgName = pgName; 
     } 
    }

Your method:

Table RESOURCES = Database.open(new File("TargetFile.mdb")).getTable("RESOURCES"); 
    int pcount = RESOURCES.getRowCount(); 

    String csvFilename = "C:\\STATS\\APEX\\report.csv"; 
    CSVReader csvReader = new CSVReader(new FileReader(csvFilename)); 
    List<String[]> content = csvReader.readAll(); 
    Map<ValueKey, Integer> csvValuesCount = new HashMap<ValueKey, Integer>(); 
    for (String[] rowcsv : content) { 
     ValueKey key = new ValueKey(rowcsv[6], rowcsv[1]); 
     Integer count = csvValuesCount.get(key); 
     csvValuesCount.put(key,count == null ? 1: count + 1); 

    } 

    int count = 0; 
    // Taking 1st resource data 
    for (int i = 0; i < pcount; i++) { 
     Map<String, Object> row = RESOURCES.getNextRow(); 
     TEAM = row.get("TEAM").toString(); 
     MDMID = row.get("MDM ID").toString(); 
     NAME = row.get("RESOURCE NAME").toString(); 
     PGNAME = row.get("PG NAME").toString(); 
     PGTARGET = row.get("PG TARGET").toString(); 
     int PGTARGETI = Integer.parseInt(PGTARGET); 
     Integer countInteger = csvValuesCount.get(new ValueKey(MDMID, PGNAME)); 
     count = countInteger == null ? 0: countInteger; 
    }

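Source

2013-04-06 11:40:07

There is no total count, so this code will only hold the counts. It also has two loops, and [for (String[] rowcsv : content)] will also run 11000 times for each resource. Now, with the updated code, the CSV file is read only once – H4SN 2013-04-06 11:43:40

It looks like it will work. I will let you know after adding it to my full code :) – H4SN 2013-04-06 12:15:37

See the updated question, I get 1 error in the code – H4SN 2013-04-07 15:28:41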

Dear friend, I suggest you use OpenCSV

I think it can meet your requirements ;)

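Source

2013-04-06 11:32:35

I am using OpenCSV, take another look at the code. Yes, it can do the job, but I don't think running the code 1320000 times is a good idea; it takes a long time – H4SN 2013-04-06 11:36:42

Dear H4SN, it is not clear from the code that you use OpenCSV. Anyway, Xavier Delamotte's solution is good, try it ;) – 2013-04-06 11:57:17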

Read the CSV first, build a set of the field 6 values, then use it to update the count. This should be quite fast.

// Open the CSV and build a lookup set of the MDM IDs found in column 6.
// TEAM, MDMID, NAME, PGNAME, PGTARGET, pcount and count are assumed to be
// declared as in the question's code.
Set<String> mdmids = new HashSet<String>(); 
String csvFilename = "C:\\STATS\\APEX\\report.csv"; 
CSVReader csvReader = new CSVReader(new FileReader(csvFilename)); 
List<String[]> content = csvReader.readAll(); 
csvReader.close(); 

for (String[] rowcsv : content) { 
    mdmids.add(rowcsv[6]); 
} 

Table RESOURCES = Database.open(new File("TargetFile.mdb")).getTable("RESOURCES"); 
pcount = RESOURCES.getRowCount(); 
count = 0; 
// Taking 1st resource data 
for (int i = 0; i < pcount; i++) { 
    Map<String, Object> row = RESOURCES.getNextRow(); 
    TEAM = row.get("TEAM").toString(); 
    MDMID = row.get("MDM ID").toString(); 
    NAME = row.get("RESOURCE NAME").toString(); 
    PGNAME = row.get("PG NAME").toString(); 
    PGTARGET = row.get("PG TARGET").toString(); 
    int PGTARGETI = Integer.parseInt(PGTARGET); 

    // Use the lookup set: one hash lookup instead of a scan of the CSV. 
    if (mdmids.contains(MDMID)) { 
        count++; 
    } 
}

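Source

2013-04-06 11:37:18 rongenre

How would count return the number of rows for the first MDMID? – H4SN 2013-04-06 11:56:57

The csv is opened and a set of the mdmids in the csv (column 6) is built. After that the csv is finished with and can be garbage collected. While going through the database rows, the set is used to check whether a matching mdmid exists. That is a hash operation, and with only 11000 entries it is quite fast. – rongenre 2013-04-06 12:03:26

if (mdmids.contains(MDMID)) for a specific MDMID – H4SN 2013-04-07 03:14:08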
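The point the comments above circle around can be sketched with made-up IDs (the class name and data are assumptions, not the asker's files): a set only records whether an ID occurs in the CSV at all, while a map can hold how many CSV rows each ID has.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class contrasting membership with per-ID counts.
class SetVersusCountSketch {

    // Set of the distinct IDs: answers "does this ID appear?" only.
    static Set<String> distinctIds(List<String> ids) {
        return new HashSet<String>(ids);
    }

    // Map from ID to the number of times it appears.
    static Map<String, Integer> countIds(List<String> ids) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String id : ids) {
            Integer c = counts.get(id);
            counts.put(id, c == null ? 1 : c + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up values standing in for column 6 of the CSV.
        List<String> csvIds = Arrays.asList("M1", "M2", "M1", "M1");

        Set<String> present = distinctIds(csvIds);
        Map<String, Integer> perId = countIds(csvIds);

        System.out.println("M1 present? " + present.contains("M1"));
        System.out.println("M1 rows:    " + perId.get("M1"));
        System.out.println("M3 present? " + present.contains("M3"));
    }
}
```

With the set, `count++` tallies how many database rows have at least one matching CSV row; a count per MDM ID needs the map form, as in the other answer.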
