2013-03-17

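Weka training data from MySQL database

This is my second post about using Weka (the first one is here). I have successfully used TextDirectoryLoader to give Weka both training data and sample test data, and it works great. Now I want to move this into production, so the data to be classified has to be retrieved from a MySQL table. This is how I am doing it: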

import java.io.File;

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.TextDirectoryLoader;
import weka.experiment.InstanceQuery;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

// Training data: one subdirectory per class label
TextDirectoryLoader loader = new TextDirectoryLoader();
loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/training-data"));
Instances dataRaw = loader.getDataSet();

// Build the word-vector dictionary from the training text
StringToWordVector filter = new StringToWordVector();
filter.setInputFormat(dataRaw);
Instances dataTraining = Filter.useFilter(dataRaw, filter);

// Create test data instances [this works, but the sample data now needs to come from the db instead, see below]
//loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/test-data"));
//dataRaw = loader.getDataSet();
//Instances dataTest = Filter.useFilter(dataRaw, filter);

// Test data from MySQL
InstanceQuery query = new InstanceQuery();
query.setUsername("myusername");
query.setPassword("mypassword");
String sql = "SELECT d.desc FROM deals d WHERE d.j48 = 1";
query.setQuery(sql);
Instances dataTest = Filter.useFilter(query.retrieveInstances(), filter);

// Classify
J48 model = new J48();
model.buildClassifier(dataTraining);

for (int i = 0; i < dataTest.numInstances(); i++) {
    dataTest.instance(i).setClassMissing();
    double cls = model.classifyInstance(dataTest.instance(i));
    dataTest.instance(i).setClassValue(cls);
    System.out.println(cls + " -> " + dataTest.instance(i).classAttribute().value((int) cls));
}

Unfortunately, this doesn't work: Weka stops unexpectedly at this line:

Instances dataTest = Filter.useFilter(query.retrieveInstances(), filter); 

So I guess my question is how to convert this part

// Create test data instances [this works, but the sample data now needs to come from the db instead, see below]
//loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/test-data")); 
//dataRaw = loader.getDataSet(); 
//Instances dataTest = Filter.useFilter(dataRaw, filter); 

into the SQL-based version:

InstanceQuery query = new InstanceQuery(); 
query.setUsername("myusername"); 
query.setPassword("mypassword"); 
String sql = "SELECT d.desc FROM deals d WHERE d.j48 = 1"; 
query.setQuery(sql); 
Instances dataTest = Filter.useFilter(query.retrieveInstances(), filter); 

Note that there is no problem with the database connection; I do get the correct number of instances.

I'd appreciate the help; I'm very close.

What is the stack trace when Weka stops "unexpectedly"? Have you inspected the output of 'query.retrieveInstances()'? – 2013-03-20 10:46:12

Are you sure about your SQL: 'SELECT d.desc FROM deals d WHERE d.j48 = 1'? I would expect something like 'SELECT d.desc FROM deal AS d WHERE d.j48 = 1'. – 2013-03-21 09:39:11

@JanEglinger I tried adding AS, but no luck. I checked query.retrieveInstances() for errors, and got o = (java.lang.ArrayIndexOutOfBoundsException) java.lang.ArrayIndexOutOfBoundsException: 1 – 2013-03-25 21:51:31

Answers

Your code uses the TextDirectoryLoader class, which is based on Arff Files from Text Collections. According to its help text:

"Loads all text files in a directory and 
uses the subdirectory names as class labels. 
The content of the text files will be stored in a String attribute, 
the filename can be stored as well." 
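To see this structure directly, a small helper can list the attributes of any Instances object. This is a minimal sketch assuming Weka 3.7 or later; HeaderCheck and describe are hypothetical names for illustration. Pointed at the training directory, it should list a string "text" attribute followed by the nominal "@@class@@" class attribute that recent Weka versions of TextDirectoryLoader create. Calling describe on the result of query.retrieveInstances() instead would list only the single desc column.

```java
import java.io.File;

import weka.core.Attribute;
import weka.core.Instances;
import weka.core.converters.TextDirectoryLoader;

public class HeaderCheck {

    /** Lists each attribute as "index: name (type)", marking the class attribute. */
    public static String describe(Instances data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.numAttributes(); i++) {
            Attribute att = data.attribute(i);
            sb.append(i).append(": ").append(att.name())
              .append(" (").append(Attribute.typeToString(att.type())).append(")");
            if (i == data.classIndex()) {
                sb.append(" [class]");
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: the training directory (one subdirectory per class label)
        TextDirectoryLoader loader = new TextDirectoryLoader();
        loader.setDirectory(new File(args[0]));
        // getStructure() returns only the header (attributes and class index)
        System.out.print(describe(loader.getStructure()));
    }
}
```

Comparing the two listings side by side shows whether the data going into the filter has the same attributes as the data the filter was set up with.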

See the following code:

double[] newInst = new double[2]; 
newInst[0] = (double)data.attribute(0).addStringValue(files[i]); 
.... 
newInst[1] = (double)data.attribute(1).addStringValue(txtStr.toString()); 
data.add(new Instance(1.0, newInst)); 

As you can see, this code expects to add two attribute values to the dataset. But your SQL only provides one attribute:

String sql = "SELECT d.desc FROM deals d WHERE d.j48 = 1"; 
That corresponds only to the newInst[1] part of the code.

This may be the cause of your problem (java.lang.ArrayIndexOutOfBoundsException): Weka cannot find the second attribute.
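If that is the cause, one way around it is to copy the database text into a dataset with the same header as the raw training data (the text attribute plus the class attribute, with the class left missing) before applying the already-trained filter. This is a minimal sketch, not a tested fix, assuming Weka 3.7 or later (for DenseInstance), that the text attribute is named "text" as TextDirectoryLoader names it, and that d.desc comes back as a string or nominal column; SqlTestSet, firstColumn and buildTestSet are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

public class SqlTestSet {

    /** Collects the non-missing values of the first column of a query result as strings. */
    public static List<String> firstColumn(Instances fromDb) {
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < fromDb.numInstances(); i++) {
            Instance row = fromDb.instance(i);
            if (!row.isMissing(0)) {
                texts.add(row.stringValue(0)); // valid for string and nominal columns
            }
        }
        return texts;
    }

    /**
     * Builds unlabeled test data with the same header as the raw training data,
     * so the StringToWordVector already set up on the training data can be applied.
     */
    public static Instances buildTestSet(Instances trainingRaw, List<String> texts) {
        // Header copy with fresh, empty string attributes; the class index is kept
        Instances test = trainingRaw.stringFreeStructure();
        Attribute textAtt = test.attribute("text"); // assumed attribute name
        if (textAtt == null) {
            throw new IllegalArgumentException("training data has no 'text' attribute");
        }
        for (String text : texts) {
            Instance inst = new DenseInstance(test.numAttributes()); // all values missing
            inst.setDataset(test);
            inst.setValue(textAtt, text); // adds the string to the test header
            inst.setClassMissing();       // label unknown; the classifier predicts it
            test.add(inst);
        }
        return test;
    }
}
```

In the question's code, the failing line would then become Instances dataTest = Filter.useFilter(SqlTestSet.buildTestSet(dataRaw, SqlTestSet.firstColumn(query.retrieveInstances())), filter); with dataRaw still holding the raw training data. stringFreeStructure() is used instead of new Instances(dataRaw, 0) so that the test strings go into a fresh copy of the string attribute rather than one shared with the training header.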

I'm very much a beginner myself, but just in case it's useful: did you know there is a DatabaseLoader class and a DatabaseConverter interface?

You should explain how these classes and interfaces solve the problem. – ChrisF 2013-07-18 12:57:05