Changing Java code to write a double[] to CSV instead of a double

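My Java program (using the WEKA library) does the following:

Trains a classification algorithm
Runs predictions with the trained algorithm
Writes the results out to a .csv file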

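The problem is that, for predictions on the unlabeled data set, it currently writes out discrete classification results (i.e. which class the algorithm guesses a row belongs to). I want to write out the probability of a given class instead (for example, if I am classifying rows as "spam" or "not spam", I want the probability that a row is spam).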

My understanding is that to do this I need to use distributionForInstance instead of classifyInstance in my code. From the WEKA documentation:

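If you are interested in the distribution over all the classes, use the method distributionForInstance(Instance). This method returns a double array with the probability for each class.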

The trouble is that classifyInstance gives me a double, whereas distributionForInstance gives me a double[], and I am evidently not adapting my code to that correctly.
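To make the type difference concrete, here is a minimal sketch that does not call WEKA at all: a hard-coded array stands in for what distributionForInstance would return. The class name DistributionDemo, the helper argMax, the sample values, and the assumption that index 0 means "spam" are all made up for illustration. For a nominal class, the discrete label is typically the index of the largest probability, so the single double can be derived from the double[] but not the other way around.

```java
class DistributionDemo {
    // Index of the largest value; for a nominal class this plays the role
    // of the discrete label that a single-valued prediction would give.
    static int argMax(double[] dist) {
        int best = 0;
        for (int j = 1; j < dist.length; j++) {
            if (dist[j] > dist[best]) {
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical distribution for one row (assumption: index 0 = "spam",
        // index 1 = "not spam"); a real one would come from the classifier.
        double[] dist = {0.25, 0.75};

        int label = argMax(dist);   // discrete result: one class index
        double pSpam = dist[0];     // probability of one chosen class

        System.out.println("label index = " + label + ", P(spam) = " + pSpam);
    }
}
```

The point of the sketch is only that one row now yields several numbers, so anything that previously consumed a single double per row has to pick one element or loop over all of them.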

Here is the working code that writes out the discrete predictions:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;

import weka.classifiers.Classifier;
import weka.core.Instances;

public class runPredictions {
    public static void runPredictions(ArrayList al2) throws IOException, Exception {
        // Retrieve objects
        Instances newTest = (Instances) al2.get(0);
        Classifier clf = (Classifier) al2.get(1);

        // Print status
        System.out.println("Generating predictions...");

        // create copy
        Instances labeled = new Instances(newTest);

        // label instances
        for (int i = 0; i < newTest.numInstances(); i++) {
            double clsLabel = clf.classifyInstance(newTest.instance(i));
            labeled.instance(i).setClassValue(clsLabel);
        }
        System.out.println("Predictions complete! Writing output file to csv...");
        BufferedWriter outFile = new BufferedWriter(new FileWriter("C:/Users/hackr/Desktop/silverbullet_output.csv"));

        // write one labeled instance per line
        for (int i = 0; i < labeled.size(); i++) {
            outFile.write(labeled.get(i).toString());
            outFile.write("\n");
        }
        System.out.println("Output file written.");
        System.out.println("Completed successfully!");
        outFile.close();
    }
}

The code I am working on now throws an index out of bounds error.

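I also moved the declaration of clsLabel, because once the data type changed the compiler apparently could not find the symbol unless I moved it inside the for loop.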

Source

2017-01-30 Hack-R

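Based on a cursory glance, the indexes probably don't line up, so 'i' may be taking you out of bounds. The function returns an array of results, not a single result stored at index 'i'. You will need to iterate over the result set to get the output you expect: 'for (double d : clsLabel) { write(Double.toString(d)) }' – Brendan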

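@HackR (it truncates your name when the "-" is used). This may not be the whole story, but I believe it is a start. If it works, I'll rewrite my comment as an answer. – Brendan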

@Brendan Update: yes, that works perfectly! :) Thanks

Reposting my comment as an answer.

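The result of clf.distributionForInstance(newTest.instance(i)); is itself a double[]. That means you are not getting a single value back from the distribution function, but the whole distribution as a set of values.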

To output the whole distribution correctly, you need to loop over the result set and write out the values individually:

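for (int i = 0; i < labeled.size(); i++) {
    // distributionForInstance returns one probability per class
    double[] clsLabel = clf.distributionForInstance(newTest.instance(i));
    for (double d : clsLabel) {
        // values are written back to back; add a delimiter here if one is needed
        outFile.write(Double.toString(d));
    }
    outFile.write("\n");
}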

Assuming there are 2 classes (i.e. 2 categories being predicted, such as "spam" and "not spam"), the following works:

BufferedWriter outFile = new BufferedWriter(new FileWriter("silverbullet_rro_output.csv"));
StringBuilder builder = new StringBuilder();

for (int i = 0; i < labeled.size(); i++)
{
    double[] clsLabel = clf.distributionForInstance(newTest.instance(i));
    // clsLabel.length is 2 in the two-class case
    for (int j = 0; j < clsLabel.length; j++) {
        builder.append(clsLabel[j]);
        // comma between values, but not after the last one
        if (j < clsLabel.length - 1)
            builder.append(",");
    }
    builder.append("\n");
}
outFile.write(builder.toString()); // save the string representation
System.out.println("Output file written.");
System.out.println("Completed successfully!");
outFile.close();
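As a variation on the same idea, the sketch below separates the formatting of one row from the file handling and uses try-with-resources so the writer is closed even if a write fails. It is only an illustration under stated assumptions: the class DistributionCsvWriter and its methods toCsvLine and writeAll are hypothetical names, and the hard-coded arrays stand in for the values that distributionForInstance would return for each row.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class DistributionCsvWriter {
    // Joins one distribution into a comma-separated line with no trailing comma.
    static String toCsvLine(double[] dist) {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < dist.length; j++) {
            if (j > 0) {
                sb.append(',');
            }
            sb.append(dist[j]);
        }
        return sb.toString();
    }

    // Writes one line per distribution; the writer is closed on exit,
    // including when an IOException is thrown part-way through.
    static void writeAll(double[][] dists, Writer target) throws IOException {
        try (BufferedWriter out = new BufferedWriter(target)) {
            for (double[] dist : dists) {
                out.write(toCsvLine(dist));
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical per-row distributions (assumption: two classes per row).
        double[][] dists = {{0.25, 0.75}, {0.5, 0.5}};

        // A StringWriter keeps the sketch free of file-system side effects.
        StringWriter sink = new StringWriter();
        writeAll(dists, sink);
        System.out.print(sink);
    }
}
```

To write to disk instead, a FileWriter for the desired path could be passed as the target in place of the StringWriter.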

Source

2017-01-30 16:20:56 Brendan

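Thank you very much. The only thing I am doing differently now is that I put the newline part inside the inner loop. Since there are two classes per row, I get twice as many rows as I need, but I can easily fix that.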

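Assuming your output will resemble a pivot table, with the class labels as columns and the score your classifier returns for each class as values, you need to iterate over the array and either write each value into its column or simply list the values. I don't know how the values in the double[] array are associated with the class labels, but somehow you will have to create that association. Perhaps the classifier returns an empty array when it cannot classify, and that is why you get the IOOB exception.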
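On the association itself: in WEKA, position j of the returned array is generally understood to correspond to the j-th value of the class attribute, which can be read from the dataset with classAttribute().value(j), though this is worth checking against the documentation for the WEKA version in use. The sketch below shows one way to pair labels with probabilities and to fail clearly when the lengths do not match. It does not call WEKA; the class LabeledDistribution, the method describe, the labels, and the numbers are all hypothetical.

```java
class LabeledDistribution {
    // Pairs each label with the probability at the same index,
    // producing text such as "spam=0.25,not spam=0.75".
    static String describe(String[] labels, double[] dist) {
        // A length mismatch (for example an empty array) is reported explicitly
        // instead of surfacing later as an index out of bounds error.
        if (labels.length != dist.length) {
            throw new IllegalArgumentException(
                "expected " + labels.length + " probabilities but got " + dist.length);
        }
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < dist.length; j++) {
            if (j > 0) {
                sb.append(',');
            }
            sb.append(labels[j]).append('=').append(dist[j]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] labels = {"spam", "not spam"};  // hypothetical class labels
        double[] dist = {0.25, 0.75};            // hypothetical distribution
        System.out.println(describe(labels, dist));
    }
}
```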

Source

2017-01-30 16:06:23 markg

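Thanks. It sounds like you and Brendan are describing the same thing. I'll try it now. **Update**: This is correct. The code in Brendan's answer made it easier to test, so I will mark his as the accepted answer, but I will upvote yours as well. Thanks again.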
