2013-04-06 117 views
0

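I wrote some code that opens a CSV file and counts rows with a for loop, but I feel this approach is inefficient and causes long delays. How can I count the rows of a CSV file efficiently in Java?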

  • TargetFile.mdb has 120 rows
  • report.csv has 11000 rows

With this approach the code has to run 120 * 11000 = 1,320,000 times to compute the count for each resource. Here is my code:

Here is the new, working code, by Xavier Delamotte, which computes the counts efficiently:

import java.io.File; 
import java.io.FileReader; 
import java.io.IOException; 
import java.sql.SQLException; 
import java.util.HashMap; 
import java.util.List; 
import java.util.Map; 

import au.com.bytecode.opencsv.CSVReader; 

import com.healthmarketscience.jackcess.Database; 
import com.healthmarketscience.jackcess.Table; 

public class newcount { 

    public static class ValueKey{ 
     String mdmId; 
     String pgName; 

     @Override 
     public int hashCode() { 
      final int prime = 31; 
      int result = 1; 
      result = prime * result + ((mdmId == null) ? 0 : mdmId.hashCode()); 
      result = prime * result 
       + ((pgName == null) ? 0 : pgName.hashCode()); 
      return result; 
     } 
     @Override 
     public boolean equals(Object obj) { 
      if (this == obj) 
       return true; 
      if (obj == null) 
       return false; 
      if (getClass() != obj.getClass()) 
       return false; 
      ValueKey other = (ValueKey) obj; 
      if (mdmId == null) { 
       if (other.mdmId != null) 
        return false; 
      } else if (!mdmId.equals(other.mdmId)) 
       return false; 
      if (pgName == null) { 
       if (other.pgName != null) 
        return false; 
      } else if (!pgName.equals(other.pgName)) 
       return false; 
      return true; 
     } 
     public ValueKey(String mdmId, String pgName) { 
      super(); 
      this.mdmId = mdmId; 
      this.pgName = pgName; 
     } 
    } 

    public static void main(String[] args) throws IOException, SQLException,Throwable{ 


     Integer count; 

     String MDMID,NAME,PGNAME,PGTARGET,TEAM; 

     Table RESOURCES = Database.open(new File("C:/STATS/TargetFile.mdb")).getTable("RESOURCES"); 
     int pcount = RESOURCES.getRowCount(); 


     String csvFilename = "C:\\MDMSTATS\\APEX\\report.csv"; 
     CSVReader csvReader = new CSVReader(new FileReader(csvFilename)); 
     List<String[]> content = csvReader.readAll(); 
     Map<ValueKey, Integer> csvValuesCount = new HashMap<ValueKey, Integer>(); 
     for (String[] rowcsv : content) { 
      ValueKey key = new ValueKey(rowcsv[6], rowcsv[1]); 
      count = csvValuesCount.get(key); 
      csvValuesCount.put(key,count == null ? 1: count + 1); 

     } 

     //int count = 0; 
     // Taking 1st resource data 
     for (int i = 0; i < pcount-25; i++) { 
      Map<String, Object> row = RESOURCES.getNextRow(); 
      TEAM = row.get("TEAM").toString(); 
      MDMID = row.get("MDM ID").toString(); 
      NAME = row.get("RESOURCE NAME").toString(); 
      PGNAME = row.get("PG NAME").toString(); 
      PGTARGET = row.get("PG TARGET").toString(); 
      int PGTARGETI = Integer.parseInt(PGTARGET); 
      Integer countInteger = csvValuesCount.get(new ValueKey(MDMID, PGNAME)); 
      count = countInteger == null ? 0: countInteger; 
      System.out.println(NAME+"\t"+PGNAME+"\t"+count); 

     } 
    } 
} 
+0

All I want to do is compute the count for each resource using the CSV file and an SQL query – H4SN 2013-04-06 11:38:50

Answers

3

I suggest reading the CSV file only once and counting the occurrences of each key made up of mdmId and pgName.

If you have Guava, you can use a Multiset<ValueKey> (http://guava-libraries.googlecode.com/svn-history/r8/trunk/javadoc/com/google/common/collect/Multiset.html) instead of a Map<ValueKey,Integer>.

Edit: to use it, you need to put the ValueKey class in a separate file or declare it static.
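A minimal sketch of why the `static` modifier matters: a non-static nested (inner) class needs an enclosing instance, so `new ValueKey(...)` inside `static main` does not compile unless the class is static or lives in its own file. The outer class name `NestedKeyDemo`, the helper `sameKey`, and the sample values are assumptions made for illustration; `equals` and `hashCode` are written with `java.util.Objects` here, which is a shorter variant of the hand-written versions in this thread.

```java
import java.util.Objects;

// Hypothetical outer class, used only to illustrate the static nested class rule.
class NestedKeyDemo {

    // Declared static, so it can be instantiated from a static context such as main.
    // Without "static", `new ValueKey(...)` inside main fails to compile because
    // an inner class instance requires an enclosing NestedKeyDemo instance.
    static class ValueKey {
        final String mdmId;
        final String pgName;

        ValueKey(String mdmId, String pgName) {
            this.mdmId = mdmId;
            this.pgName = pgName;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof ValueKey)) return false;
            ValueKey other = (ValueKey) obj;
            return Objects.equals(mdmId, other.mdmId)
                && Objects.equals(pgName, other.pgName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(mdmId, pgName);
        }
    }

    // Helper so the key equality can be checked without printing.
    static boolean sameKey(String id1, String pg1, String id2, String pg2) {
        return new ValueKey(id1, pg1).equals(new ValueKey(id2, pg2));
    }

    public static void main(String[] args) {
        System.out.println(sameKey("M1", "PG-A", "M1", "PG-A"));
        System.out.println(sameKey("M1", "PG-A", "M1", "PG-B"));
    }
}
```

Two keys compare equal only when both fields match, which is what lets a HashMap treat repeated (mdmId, pgName) pairs as the same entry.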

The ValueKey class:

public static class ValueKey{ 
     String mdmId; 
     String pgName; 
     @Override 
     public int hashCode() { 
      final int prime = 31; 
      int result = 1; 
      result = prime * result + ((mdmId == null) ? 0 : mdmId.hashCode()); 
      result = prime * result 
        + ((pgName == null) ? 0 : pgName.hashCode()); 
      return result; 
     } 
     @Override 
     public boolean equals(Object obj) { 
      if (this == obj) 
       return true; 
      if (obj == null) 
       return false; 
      if (getClass() != obj.getClass()) 
       return false; 
      ValueKey other = (ValueKey) obj; 
      if (mdmId == null) { 
       if (other.mdmId != null) 
        return false; 
      } else if (!mdmId.equals(other.mdmId)) 
       return false; 
      if (pgName == null) { 
       if (other.pgName != null) 
        return false; 
      } else if (!pgName.equals(other.pgName)) 
       return false; 
      return true; 
     } 
     public ValueKey(String mdmId, String pgName) { 
      super(); 
      this.mdmId = mdmId; 
      this.pgName = pgName; 
     } 
    } 

Your method:

Table RESOURCES = Database.open(new File("TargetFile.mdb")).getTable("RESOURCES"); 
    int pcount = RESOURCES.getRowCount(); 

    String csvFilename = "C:\\STATS\\APEX\\report.csv"; 
    CSVReader csvReader = new CSVReader(new FileReader(csvFilename)); 
    List<String[]> content = csvReader.readAll(); 
    Map<ValueKey, Integer> csvValuesCount = new HashMap<ValueKey, Integer>(); 
    for (String[] rowcsv : content) { 
     ValueKey key = new ValueKey(rowcsv[6], rowcsv[1]); 
     Integer count = csvValuesCount.get(key); 
     csvValuesCount.put(key,count == null ? 1: count + 1); 

    } 

    int count = 0; 
    // For each resource row, look up its precomputed count (no second pass over the CSV) 
    for (int i = 0; i < pcount; i++) { 
     Map<String, Object> row = RESOURCES.getNextRow(); 
     String TEAM = row.get("TEAM").toString(); 
     String MDMID = row.get("MDM ID").toString(); 
     String NAME = row.get("RESOURCE NAME").toString(); 
     String PGNAME = row.get("PG NAME").toString(); 
     String PGTARGET = row.get("PG TARGET").toString(); 
     int PGTARGETI = Integer.parseInt(PGTARGET); 
     Integer countInteger = csvValuesCount.get(new ValueKey(MDMID, PGNAME)); 
     // A missing key means the pair never occurs in the CSV 
     count = countInteger == null ? 0: countInteger; 
    } 
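The count-once, look-up-many idea above can also be tried without opencsv or Jackcess. The sketch below is only an illustration under stated assumptions: in-memory rows stand in for report.csv and for the RESOURCES table, the class name `CsvKeyCount` and its helpers are hypothetical, and a `List<String>` replaces `ValueKey` as the map key (Java 8 or later for `Map.merge` and `getOrDefault`).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone demo; in-memory rows replace report.csv and the RESOURCES table.
class CsvKeyCount {

    // One pass over the CSV rows: count occurrences of each (column 6, column 1) pair.
    // A List<String> works as the key because List defines equals/hashCode by content.
    static Map<List<String>, Integer> countByKey(List<String[]> csvRows) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (String[] row : csvRows) {
            counts.merge(Arrays.asList(row[6], row[1]), 1, Integer::sum);
        }
        return counts;
    }

    // Hash lookup per resource; 0 when the pair never appears in the CSV.
    static int lookup(Map<List<String>, Integer> counts, String mdmId, String pgName) {
        return counts.getOrDefault(Arrays.asList(mdmId, pgName), 0);
    }

    // Builds a 7-column row with only the two relevant columns filled in.
    static String[] row(String pgName, String mdmId) {
        String[] r = new String[7];
        Arrays.fill(r, "");
        r[1] = pgName;
        r[6] = mdmId;
        return r;
    }

    public static void main(String[] args) {
        List<String[]> csvRows = Arrays.asList(
            row("PG-A", "M1"), row("PG-A", "M1"), row("PG-B", "M1"), row("PG-A", "M2"));
        Map<List<String>, Integer> counts = countByKey(csvRows);
        System.out.println(lookup(counts, "M1", "PG-A"));
        System.out.println(lookup(counts, "M3", "PG-A"));
    }
}
```

With 11000 CSV rows and 120 resources, this does roughly 11000 + 120 steps instead of 120 * 11000.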
+0

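There is no total count, so this code would only contain the per-key counts. It also has two loops, and [for (String[] rowcsv : content)] would also run 11000 times for each resource. With the updated code, the CSV file is now read only once – H4SN 2013-04-06 11:43:40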

+0

It looks like it will work. I'll let you know after adding it to my full code :) – H4SN 2013-04-06 12:15:37

+0

See the updated question; I'm getting one error in the code – H4SN 2013-04-07 15:28:41

0

Dear friend, I suggest you use OpenCSV.

I think it will meet your requirements ;)

+1

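I am using OpenCSV; look at the code again. Yes, it does the job, but I don't think running the code 1,320,000 times is a good idea, as it takes a long time – H4SN 2013-04-06 11:36:42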

+0

Dear H4SN, it wasn't clear from the code that you are using OpenCSV. In any case, Xavier Delamotte's solution is good, try it ;) – 2013-04-06 11:57:17

0

Read the CSV first, build a set of the field 6 values, then use it to update the count. This should be quite fast.

// open the csv once and build a lookup set of the column-6 values 
Set<String> mdmids = new HashSet<String>(); 
String csvFilename = "C:\\STATS\\APEX\\report.csv"; 
CSVReader csvReader = new CSVReader(new FileReader(csvFilename)); 
List<String[]> content = csvReader.readAll(); 
csvReader.close(); 

for (String[] rowcsv : content) { 
    mdmids.add(rowcsv[6]); 
} 
Table RESOURCES = Database.open(new File("TargetFile.mdb")).getTable("RESOURCES"); 
int pcount = RESOURCES.getRowCount(); 
int count = 0; 
// walk the resource rows once 
for (int i = 0; i < pcount; i++) { 
    Map<String, Object> row = RESOURCES.getNextRow(); 
    String TEAM = row.get("TEAM").toString(); 
    String MDMID = row.get("MDM ID").toString(); 
    String NAME = row.get("RESOURCE NAME").toString(); 
    String PGNAME = row.get("PG NAME").toString(); 
    String PGTARGET = row.get("PG TARGET").toString(); 
    int PGTARGETI = Integer.parseInt(PGTARGET); 

    // use lookup set: counts resources whose MDM ID appears anywhere in the csv 
    if (mdmids.contains(MDMID)) { 
        count++; 
    } 
} 
+0

How would count return the number of rows for the first MDMID? – H4SN 2013-04-06 11:56:57

+0

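The csv is opened and a set of the mdmids in the csv (column 6) is built up. After that the csv is done with and can be garbage collected. While going through the database rows, the set is used to check whether a matching mdmid exists. That is a hash operation, which is quite fast with only 11000 entries. – rongenre 2013-04-06 12:03:26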
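As the question in the comment above points out, a set only records whether an MDM ID occurs in the CSV, not how many rows carry it. The sketch below contrasts the two, using plain collections and made-up IDs; the class name `SetVsCount` and its method names are assumptions for illustration, not part of the answer's code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo contrasting set membership with per-ID counting.
class SetVsCount {

    // Membership only: how many resource IDs occur at least once in the CSV column.
    static int resourcesPresent(List<String> csvIds, List<String> resourceIds) {
        Set<String> seen = new HashSet<>(csvIds);
        int count = 0;
        for (String id : resourceIds) {
            if (seen.contains(id)) {
                count++;
            }
        }
        return count;
    }

    // Per-ID totals: how many CSV rows carry each ID.
    static Map<String, Integer> rowsPerId(List<String> csvIds) {
        Map<String, Integer> counts = new HashMap<>();
        for (String id : csvIds) {
            counts.merge(id, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> csvIds = Arrays.asList("M1", "M1", "M2");
        List<String> resourceIds = Arrays.asList("M1", "M3");
        System.out.println(resourcesPresent(csvIds, resourceIds));
        System.out.println(rowsPerId(csvIds));
    }
}
```

If a row count per MDM ID is what is needed, the map variant is the one that provides it; the set variant yields a single total across all resources.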

+0

if (mdmids.contains(MDMID)) for a specific MDMID – H4SN 2013-04-07 03:14:08
