2017-02-26 25 views

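Using an RBF kernel with a chi-squared distance metric in an SVM

How can I do what the title describes? Does the RBF kernel have any parameter for setting the distance metric to the chi-squared distance? I can see that the scikit-learn library provides a chi2_kernel.

Here is the code I have written.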

import numpy as np 
from sklearn import datasets 
from sklearn import svm 
from sklearn.model_selection import train_test_split 
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix 

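# Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it.
from sklearn.impute import SimpleImputer
from numpy import genfromtxt
from sklearn.metrics.pairwise import chi2_kernel


file_csv = 'dermatology.data.csv'
dataset = genfromtxt(file_csv, delimiter=',')

# Note: SimpleImputer fills each column with its most frequent value
# (it has no axis argument, unlike the old Imputer with axis=1).
imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
dataset = imp.fit_transform(dataset)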

target = dataset[:, [34]].flatten() 
data = dataset[:, range(0,34)] 

X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.3) 

# TODO: I want to use a chi-squared distance metric here instead. How can I do that?
clf = svm.SVC(kernel='rbf', C=1) 
clf.fit(X_train, y_train) 
y_pred = clf.predict(X_test) 

print(f1_score(y_test, y_pred, average="macro")) 
print(precision_score(y_test, y_pred, average="macro")) 
print(recall_score(y_test, y_pred, average="macro")) 

Answer

Are you sure you want to compose RBF and χ2? χ2 on its own defines a valid kernel, and all you have to do is

clf = svm.SVC(kernel=chi2_kernel, C=1) 

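since sklearn accepts functions as kernels (although this will require O(N^2) memory and time). If you want to compose the two, it is a bit more complex, and you will have to implement your own kernel to do that. For a bit more control (and other kernels) you could also try pykernels, though it does not support composing kernels yet.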
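
As a rough sketch of the options above: scikit-learn documents chi2_kernel as exp(-gamma * sum((x - y)^2 / (x + y))), so it already has the RBF-like exponential form over the chi-squared distance. The snippet below passes it to SVC as a callable and as a precomputed Gram matrix, and shows one possible reading of "composing" the two kernels as an elementwise product. The digits dataset, the gamma values, and the helper rbf_times_chi2 are assumptions chosen for illustration, not part of the question; chi2_kernel expects non-negative features.

```python
from functools import partial

import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics.pairwise import additive_chi2_kernel, chi2_kernel, rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): digit images have non-negative pixel values.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

gamma = 0.01  # illustrative value, not tuned

# chi2_kernel is the exponentiated form of additive_chi2_kernel,
# which returns the negated chi-squared distance.
same_form = np.allclose(
    chi2_kernel(X[:5], gamma=gamma),
    np.exp(gamma * additive_chi2_kernel(X[:5])),
)

# Option 1: pass the kernel as a callable; partial fixes gamma.
clf_callable = SVC(kernel=partial(chi2_kernel, gamma=gamma), C=1)
clf_callable.fit(X_train, y_train)
pred_callable = clf_callable.predict(X_test)

# Option 2: precompute the Gram matrices yourself.
K_train = chi2_kernel(X_train, X_train, gamma=gamma)
K_test = chi2_kernel(X_test, X_train, gamma=gamma)  # rows: test, cols: train
clf_pre = SVC(kernel="precomputed", C=1)
clf_pre.fit(K_train, y_train)
pred_pre = clf_pre.predict(K_test)


def rbf_times_chi2(A, B, gamma_rbf=0.001, gamma_chi2=0.01):
    """Hypothetical combined kernel: elementwise product of RBF and chi2.

    The product of two positive semidefinite kernels is itself a valid kernel.
    """
    return rbf_kernel(A, B, gamma=gamma_rbf) * chi2_kernel(A, B, gamma=gamma_chi2)


# Option 3: a hand-written combined kernel, also passed as a callable.
clf_product = SVC(kernel=rbf_times_chi2, C=1)
clf_product.fit(X_train, y_train)
pred_product = clf_product.predict(X_test)
```

Either way the full N x N Gram matrix over the training set is built, which is where the O(N^2) cost mentioned above comes from. A sum of the two kernels would be another valid way to combine them.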
