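Same Instances header (ARFF) for all database queries

I am constructing my Instances with InstanceQuery and SQL queries. But my query results do not always come back the same way, which is normal for SQL. Because of this, Instances constructed from different SQL queries have different headers. A simple example can be seen below. I suspect my results change because of this behavior.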

Header 1

@attribute duration numeric 
@attribute protocol_type {tcp,udp} 
@attribute service {http,domain_u} 
@attribute flag {SF}

Header 2

@attribute duration numeric 
@attribute protocol_type {tcp} 
@attribute service {pm_dump,pop_2,pop_3} 
@attribute flag {SF,S0,SH}

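My question is: how can I give the correct header information to the Instances construction?

Is a workflow like the following possible?

Get pre-prepared header information from an ARFF file or somewhere else.
Give this header information to the Instances construction.
Call the SQL function and get the Instances (header + data).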
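To make the mismatch concrete: a nominal value is stored as an index into the label list declared in the header, so the same label can map to different indices under different headers. The following is a minimal sketch in plain Java, without Weka, under that assumption. The class `NominalHeader` and its methods are hypothetical helpers written for illustration only, and the parsing handles just the simple `@attribute name {a,b}` form shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: reads nominal label lists from ARFF "@attribute" lines. */
public class NominalHeader {

    /** Maps attribute name to its ordered label list; numeric attributes are skipped. */
    public static Map<String, List<String>> parse(List<String> headerLines) {
        Map<String, List<String>> labels = new LinkedHashMap<>();
        for (String line : headerLines) {
            String trimmed = line.trim();
            int open = trimmed.indexOf('{');
            int close = trimmed.lastIndexOf('}');
            boolean isAttribute = trimmed.toLowerCase().startsWith("@attribute");
            if (!isAttribute || open < 0 || close < open) {
                continue; // not a nominal attribute declaration
            }
            String name = trimmed.substring("@attribute".length(), open).trim();
            List<String> values = new ArrayList<>();
            for (String value : trimmed.substring(open + 1, close).split(",")) {
                values.add(value.trim());
            }
            labels.put(name, values);
        }
        return labels;
    }

    /** Index of a label within an attribute's label list, or -1 if it is not declared. */
    public static int indexOf(Map<String, List<String>> header, String attribute, String label) {
        List<String> values = header.get(attribute);
        return values == null ? -1 : values.indexOf(label);
    }

    public static void main(String[] args) {
        Map<String, List<String>> header1 = parse(Arrays.asList(
                "@attribute duration numeric",
                "@attribute service {http,domain_u}"));
        Map<String, List<String>> header2 = parse(Arrays.asList(
                "@attribute duration numeric",
                "@attribute service {pm_dump,pop_2,pop_3}"));
        // The same label resolves differently depending on which header is in effect.
        System.out.println(indexOf(header1, "service", "http"));
        System.out.println(indexOf(header2, "service", "http"));
    }
}
```

A fixed, pre-prepared header avoids this because every query result is then interpreted against the same label lists.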

I use the following function to get Instances from the database.

import weka.core.Instances;
import weka.experiment.InstanceQuery;

// username and password are assumed to be fields defined elsewhere in this class.
public static Instances getInstanceDataFromDatabase(String pSql,
        String pInstanceRelationName) {
    try {
        InstanceQuery query = new InstanceQuery();
        query.setUsername(username);
        query.setPassword(password);
        query.setQuery(pSql);

        // The header, including nominal value lists, comes from the query result.
        Instances data = query.retrieveInstances();
        data.setRelationName(pInstanceRelationName);

        // Default to the last attribute as the class attribute.
        if (data.classIndex() == -1) {
            data.setClassIndex(data.numAttributes() - 1);
        }
        return data;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Source

2012-08-03 Atilla Ozgur

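I tried various approaches to my problem, but the Weka internal API does not seem to allow a solution to this right now. I adapted the append command-line code from weka.core.Instances for my purposes. That code is also given in this answer.

Based on that, here is my solution. I created a SampleWithKnownHeader.arff file that contains the correct header values. I read this file with the following code.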

public static Instances getSampleInstances() { 
    Instances data = null; 
    try { 
     BufferedReader reader = new BufferedReader(new FileReader(
       "datas\\SampleWithKnownHeader.arff")); 
     data = new Instances(reader); 
     reader.close(); 
     // setting class attribute 
     data.setClassIndex(data.numAttributes() - 1); 
    } 
    catch (Exception e) { 
     throw new RuntimeException(e); 
    } 
    return data; 

}

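After that, I use the following code to create the Instances. I had to use a StringBuilder and the string values of the instances, and then save the resulting string to a file.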

public static void main(String[] args) { 

    Instances SampleInstance = MyUtilsForWeka.getSampleInstances(); 

    DataSource source1 = new DataSource(SampleInstance); 

    Instances data2 = InstancesFromDatabase 
      .getInstanceDataFromDatabase(DatabaseQueries.WEKALIST_QUESTION1); 

    MyUtilsForWeka.saveInstancesToFile(data2, "fromDatabase.arff"); 

    DataSource source2 = new DataSource(data2); 

    Instances structure1; 
    Instances structure2; 
    StringBuilder sb = new StringBuilder(); 
    try { 
     structure1 = source1.getStructure(); 
     sb.append(structure1); 
     structure2 = source2.getStructure(); 
     while (source2.hasMoreElements(structure2)) { 
      String elementAsString = source2.nextElement(structure2) 
        .toString(); 
      sb.append(elementAsString); 
      sb.append("\n"); 

     } 

    } catch (Exception ex) { 
     throw new RuntimeException(ex); 
    } 

    MyUtilsForWeka.saveInstancesToFile(sb.toString(), "combined.arff"); 

}

My code for saving the instances to a file is as follows.

public static void saveInstancesToFile(String contents,String filename) { 

    FileWriter fstream; 
    try { 
     fstream = new FileWriter(filename); 
     BufferedWriter out = new BufferedWriter(fstream); 
     out.write(contents); 
     out.close(); 
    } catch (Exception ex) { 
     throw new RuntimeException(ex); 
    }
}

This solved my problem, but I wonder whether a more elegant solution exists.
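The combining step above (header text from the known file, data rows from the database result) can also be sketched as plain text manipulation on ARFF lines, without Weka. This is only an illustration under the assumption that both inputs are well-formed ARFF text with a single `@data` line; the class `ArffCombiner` and its method are hypothetical names, not part of Weka.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: joins the header of one ARFF text with the data rows of another. */
public class ArffCombiner {

    /** Returns the known header (up to and including "@data") followed by the other file's rows. */
    public static String combine(List<String> knownHeaderArff, List<String> dataArff) {
        StringBuilder sb = new StringBuilder();

        // Copy header lines from the known file, stopping right after "@data".
        for (String line : knownHeaderArff) {
            sb.append(line).append('\n');
            if (line.trim().equalsIgnoreCase("@data")) {
                break;
            }
        }

        // Copy only the rows that follow "@data" in the second file.
        boolean inData = false;
        for (String line : dataArff) {
            if (inData) {
                if (!line.trim().isEmpty()) {
                    sb.append(line).append('\n');
                }
            } else if (line.trim().equalsIgnoreCase("@data")) {
                inData = true;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList(
                "@relation known",
                "@attribute flag {SF,S0,SH}",
                "@data");
        List<String> fromDatabase = Arrays.asList(
                "@relation fromDatabase",
                "@attribute flag {SF}",
                "@data",
                "SF");
        System.out.print(combine(known, fromDatabase));
    }
}
```

Note that this does not validate that every value in the data rows is declared in the known header; a row with an undeclared label would still fail when the combined file is loaded.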

Source

2012-08-08 20:34:31

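I solved a similar problem with the Add filter, which allows adding attributes to Instances. You need to add the correct Attribute, with the proper list of values, to both datasets (in my case, only to the test dataset):

Load the training and test data: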

/* "train" contains labels and data */ 
/* "test" contains data only */ 
CSVLoader csvLoader = new CSVLoader(); 
csvLoader.setFile(new File(trainFile)); 
Instances training = csvLoader.getDataSet(); 
csvLoader.reset(); 
csvLoader.setFile(new File(predictFile)); 
Instances test = csvLoader.getDataSet();

Set up a new attribute with the Add filter:

Add add = new Add(); 
/* the name of the attribute must be the same as in "train"*/ 
add.setAttributeName(training.attribute(0).name()); 
/* getValues returns a String with comma-separated values of the attribute */ 
add.setNominalLabels(getValues(training.attribute(0))); 
/* put the new attribute to the 1st position, the same as in "train"*/ 
add.setAttributeIndex("1"); 
add.setInputFormat(test); 
/* result: a dataset compatible with "train" */ 
test = Filter.useFilter(test, add);

As a result, the headers of both "train" and "test" are the same (compatible for machine learning in Weka).
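The getValues helper used above is not shown in the answer. A minimal sketch of one possible version follows, written against a plain java.util.Enumeration so that it runs without Weka. It assumes the labels would come from something like Weka's Attribute.enumerateValues(); the class name `NominalLabels` and the method `join` are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.StringJoiner;

/** Hypothetical stand-in for the answer's getValues helper. */
public class NominalLabels {

    /** Joins the enumerated labels into one comma-separated string. */
    public static String join(Enumeration<?> values) {
        StringJoiner joiner = new StringJoiner(",");
        while (values.hasMoreElements()) {
            joiner.add(String.valueOf(values.nextElement()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // With Weka, the argument would be the attribute's label enumeration (assumption).
        Enumeration<String> labels = Collections.enumeration(Arrays.asList("tcp", "udp"));
        System.out.println(join(labels));
    }
}
```

Labels that themselves contain commas or spaces would need quoting, which this sketch does not handle.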

Source

2014-11-11 22:20:28 Alexander
