Why do the training and test files have the same format in svmlight?

I downloaded SVM-Light for Linux and ran the command. It generated two executables, svm_learn and svm_classify. Using these, I tried to run an example (it contains the files train.dat and test.dat) with the following commands:

./svm_learn example1/train.dat example1/model.txt 
./svm_classify example1/test.dat example1/model.txt example1/predictions.txt

After that I got two text files, the model and the predictions. I am new to SVMs. Why do test.dat and train.dat in the example have the same format?

test.dat:  +1 6:0.0342598670723747 26:0.148286149621374 27:0.0570037235976456
train.dat: 1 6:0.0198403253586671 15:0.0339873732306071 29:0.0360280968798065

The output looks like this:

> Scanning examples...done 
Reading examples into memory...100..200..300..400..500..600..700..800..900..1000..1100..1200..1300..1400..1500..1600..1700..1800..1900..2000..OK. (2000 examples read)
Setting default regularization parameter C=1.0000 
Optimizing........................................................................................................................................................................................................................................................................................................................................................................................................................................done. (425 iterations) 
Optimization finished (5 misclassified, maxdiff=0.00085). 
Runtime in cpu-seconds: 0.07 
Number of SV: 878 (including 117 at upper bound) 
L1 loss: loss=35.67674 
Norm of weight vector: |w|=19.55576 
Norm of longest example vector: |x|=1.00000 
Estimated VCdim of classifier: VCdim<=383.42790 
Computing XiAlpha-estimates...done 
Runtime for XiAlpha-estimates in cpu-seconds: 0.00 
XiAlpha-estimate of the error: error<=5.85% (rho=1.00,depth=0) 
XiAlpha-estimate of the recall: recall=>95.40% (rho=1.00,depth=0) 
XiAlpha-estimate of the precision: precision=>93.07% (rho=1.00,depth=0) 
Number of kernel evaluations: 45954 
Writing model file...done

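train.dat is the training file, so it makes sense that it is labeled before the run. But why is test.dat also labeled before the run? Could someone also explain the output, especially the terms precision, recall, and error?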

2014-01-20 user39133

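The test data is labeled as well so that your classifier can be evaluated. If the test set has no ground-truth labels, you cannot measure the classifier's quality. The labels are not used during classification itself; they only serve to count how many examples were classified correctly. Error, precision, and recall are three of the many metrics used to evaluate a classifier.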

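error = number_of_times_your_model_was_wrong / all_test_cases
precision = TP / (TP + FP)
recall = TP / (TP + FN)

where

TP = number of times your model guessed +1 and it really was +1
FP = number of times your model guessed +1 but it really was -1
FN = number of times your model guessed -1 but it really was +1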
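
As a rough illustration of these definitions, here is a minimal Python sketch that computes all three metrics from the labels in a test file and the values in a predictions file. It assumes that each line written by svm_classify holds one decision value whose sign is the predicted class. The helper names (`read_true_labels`, `read_predictions`, `evaluate`) and the toy data are made up for this example and are not part of SVM-Light.

```python
def read_true_labels(lines):
    """Return +1/-1 labels taken from the first token of each example line."""
    labels = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        labels.append(1 if float(line.split()[0]) > 0 else -1)
    return labels


def read_predictions(lines):
    """Assumption: one decision value per line; its sign is the predicted class."""
    return [1 if float(line) > 0 else -1 for line in lines if line.strip()]


def evaluate(y_true, y_pred):
    """Compute error, precision and recall as defined above."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == 1 and t == 1)
    fp = sum(1 for t, p in pairs if p == 1 and t == -1)
    fn = sum(1 for t, p in pairs if p == -1 and t == 1)
    wrong = sum(1 for t, p in pairs if t != p)
    return {
        "error": wrong / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy data, not taken from example1/
test_lines = ["+1 6:0.03 26:0.14", "-1 3:0.5", "+1 2:0.1", "-1 7:0.2"]
prediction_lines = ["0.83", "0.12", "-0.40", "-1.05"]

metrics = evaluate(read_true_labels(test_lines), read_predictions(prediction_lines))
print(metrics)
```

With real files, the same functions should accept open file objects, for example `open("example1/test.dat")` and `open("example1/predictions.txt")`, since both iterate line by line.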

2014-01-20 07:07:31 lejlot

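The format is often called the LIBSVM format, after LIBSVM, another SVM implementation that uses it.

Why would you want different file formats for training and evaluation data?

Reusing the same format for both is much better than having to support a second file format.

Also, as @lejlot mentioned in his answer, the test file actually needs the labels, in the same format, for evaluation.

Only when you apply the SVM to completely new, unknown data will you have no labels.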
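
To make the point about reuse concrete, here is a small Python sketch of a parser for one line of this format, read as `<label> <index>:<value> ...`. The function name `parse_example` is hypothetical, and the sketch deliberately ignores extras such as `qid:` tokens. The two sample lines are the ones quoted in the question.

```python
def parse_example(line):
    """Split '<label> <index>:<value> ...' into (label, {index: value})."""
    line = line.split("#", 1)[0]  # ignore an optional trailing comment
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# Sample lines quoted in the question
train_line = "1 6:0.0198403253586671 15:0.0339873732306071 29:0.0360280968798065"
test_line = "+1 6:0.0342598670723747 26:0.148286149621374 27:0.0570037235976456"

# The same function handles both files because the format is identical
train_label, train_features = parse_example(train_line)
test_label, test_features = parse_example(test_line)
print(train_label, sorted(train_features))
print(test_label, sorted(test_features))
```

A single parser covers both files. The only difference is what the label is used for: fitting the model during training, and checking the predictions during testing.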

2014-01-20 12:14:56
