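In the code provided at the following link, I need to add 10-fold cross-validation to the training loop, but I am new to TensorFlow and, despite a lot of effort, I still have not found a way to do it. How can I implement cross-validation in this TensorFlow multiclass SVM code?
I adapted the provided code to my dataset and it works well, but I need to use the same resampling technique as in my old R code so that I can compare TensorFlow's performance on the GPU against it.
I also need to know the parameters of the final model and the predictions on the validation data. Any help would be much appreciated.
Thanks
Edit: I tried to use KFold, but the problem lies in the way the code is written for the multiclass SVM. y_vals has a size equal to the number of classes (3), which is not the actual number of samples. If you look at the code above or reproduce it, you will see what I mean. Because of this, I now get this error: IndexError: index (number of samples divided by number of splits) is out of bounds for axis 0 with size (number of classes). Here is my code, modified to use KFold: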
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from sklearn import datasets
from tensorflow.python.framework import ops
from sklearn.model_selection import KFold
ops.reset_default_graph()
# Create graph
sess = tf.Session()
# Load the data
# iris.data = [(Sepal Length, Sepal Width, Petal Length, Petal Width)]
iris = datasets.load_iris()
x_vals = np.array([[x[0], x[3]] for x in iris.data])
y_vals1 = np.array([1 if y==0 else -1 for y in iris.target])
y_vals2 = np.array([1 if y==1 else -1 for y in iris.target])
y_vals3 = np.array([1 if y==2 else -1 for y in iris.target])
y_vals = np.array([y_vals1, y_vals2, y_vals3])
# Declare batch size
batch_size = 50
# Initialize placeholders
x_data = tf.placeholder(shape=[None, 2], dtype=tf.float32)
y_target = tf.placeholder(shape=[3, None], dtype=tf.float32)
prediction_grid = tf.placeholder(shape=[None, 2], dtype=tf.float32)
# Create variables for svm
b = tf.Variable(tf.random_normal(shape=[3,batch_size]))
# Gaussian (RBF) kernel
gamma = tf.constant(-10.0)
dist = tf.reduce_sum(tf.square(x_data), 1)
dist = tf.reshape(dist, [-1,1])
sq_dists = tf.multiply(2., tf.matmul(x_data, tf.transpose(x_data)))
my_kernel = tf.exp(tf.multiply(gamma, tf.abs(sq_dists)))
# Declare function to do reshape/batch multiplication
def reshape_matmul(mat):
    v1 = tf.expand_dims(mat, 1)
    v2 = tf.reshape(v1, [3, batch_size, 1])
    return(tf.matmul(v2, v1))
# Compute SVM Model
first_term = tf.reduce_sum(b)
b_vec_cross = tf.matmul(tf.transpose(b), b)
y_target_cross = reshape_matmul(y_target)
second_term = tf.reduce_sum(tf.multiply(my_kernel, tf.multiply(b_vec_cross,
                            y_target_cross)),[1,2])
loss = tf.reduce_sum(tf.negative(tf.subtract(first_term, second_term)))
# Gaussian (RBF) prediction kernel
rA = tf.reshape(tf.reduce_sum(tf.square(x_data), 1),[-1,1])
rB = tf.reshape(tf.reduce_sum(tf.square(prediction_grid), 1),[-1,1])
pred_sq_dist = tf.add(tf.subtract(rA, tf.multiply(2., tf.matmul(x_data,
                      tf.transpose(prediction_grid)))), tf.transpose(rB))
pred_kernel = tf.exp(tf.multiply(gamma, tf.abs(pred_sq_dist)))
prediction_output = tf.matmul(tf.multiply(y_target,b), pred_kernel)
prediction = tf.arg_max(prediction_output-
                        tf.expand_dims(tf.reduce_mean(prediction_output,1), 1), 0)
accuracy = tf.reduce_mean(tf.cast(tf.equal(prediction,
                          tf.argmax(y_target,0)), tf.float32))
# Declare optimizer
my_opt = tf.train.GradientDescentOptimizer(0.01)
train_step = my_opt.minimize(loss)
# Initialize variables
init = tf.global_variables_initializer()
sess.run(init)
# Training loop
kf = KFold(n_splits=3)
loss_vec = []
train_accuracy = []
valid_accuracy = []
x_trains = []
y_trains = []
x_tests = []
y_tests = []
for train_index, test_index in kf.split(x_vals):
    X_train, X_test = x_vals[train_index], x_vals[test_index]
    y_train, y_test = y_vals[train_index], y_vals[test_index]
    x_trains.append(X_train)
    y_trains.append(y_train)
    x_tests.append(X_test)
    y_tests.append(y_test)
x_trains = np.asarray(x_trains)
y_trains = np.asarray(y_trains)
x_tests = np.asarray(x_tests)
y_tests = np.asarray(y_tests)
for i in range(100):
    rand_index = np.random.choice(len(x_trains), size=batch_size)
    rand_x = x_trains[rand_index]
    rand_y = y_trains[:,rand_index]
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})
    temp_loss = sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y})
    loss_vec.append(temp_loss)
    train_acc_temp = sess.run(accuracy, feed_dict={x_data: x_trains,
                                                   y_target: y_trains,
                                                   prediction_grid:x_trains})
    train_accuracy.append(train_acc_temp)
    valid_acc_temp = sess.run(accuracy, feed_dict={x_data: x_tests,
                                                   y_target: y_tests,
                                                   prediction_grid: x_tests})
    valid_accuracy.append(valid_acc_temp)
    if (i+1)%25==0:
        print('Step #' + str(i+1))
        print('Loss = ' + str(temp_loss))
# Plot train/test accuracies
plt.plot(train_accuracy, 'k-', label='Training Accuracy')
plt.plot(valid_accuracy, 'r--', label='Validation Accuracy')
plt.title('Train and Validation Set Accuracies')
plt.xlabel('Generation')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.show()
# Plot loss over time
plt.plot(loss_vec, 'k-')
plt.title('Loss per Generation')
plt.xlabel('Generation')
plt.ylabel('Loss')
plt.show()
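
The IndexError described in the edit most likely comes from the label layout: `y_vals` has shape `(3, 150)` (classes by samples), so `y_vals[train_index]` indexes the class axis instead of the sample axis. Below is a minimal sketch of slicing the sample axis with `y_vals[:, train_index]`. Using `StratifiedKFold` with shuffling is a swapped-in suggestion, not part of the original code: `iris.target` is sorted by class, so an unshuffled `KFold(n_splits=3)` would place each class in its own test fold. The `random_state` value and the `fold_shapes` list are illustrative assumptions.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold

iris = datasets.load_iris()
x_vals = np.array([[x[0], x[3]] for x in iris.data])  # shape (150, 2)

# One-vs-rest labels, shape (3, 150): rows are classes, columns are samples.
y_vals = np.array([np.where(iris.target == c, 1, -1) for c in range(3)])

# Shuffled, stratified folds keep all three classes in every fold.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

fold_shapes = []
for train_index, test_index in skf.split(x_vals, iris.target):
    X_train, X_test = x_vals[train_index], x_vals[test_index]
    # Index the sample axis (axis 1), not the class axis (axis 0).
    y_train, y_test = y_vals[:, train_index], y_vals[:, test_index]
    fold_shapes.append((X_train.shape, y_train.shape,
                        X_test.shape, y_test.shape))

print(fold_shapes[0])  # ((135, 2), (3, 135), (15, 2), (3, 15))
```

Fixing `random_state` makes the split reproducible, which may help when trying to mirror the resampling used in the R code.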
Cross-validation is something you implement in Python, and then you feed the right data from each fold to your TF model. – gidim
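
The approach in this comment can be sketched as a plain Python loop that is independent of the model. This is only an illustration under stated assumptions: `cross_validate`, `fit_centroids` and `predict_centroids` are hypothetical names, and the nearest-centroid model is a stand-in for the TensorFlow SVM so that the example runs on its own. In the TF 1.x code from the question, the `fit` step would presumably re-run `sess.run(init)` before training each fold, so that variables from one fold do not carry over to the next.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def fit_centroids(x, labels):
    """Hypothetical stand-in model: one mean vector per class."""
    classes = np.unique(labels)
    centroids = np.array([x[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_centroids(model, x):
    """Assign each row of x to the class with the nearest centroid."""
    classes, centroids = model
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def cross_validate(x, labels, fit, predict, n_splits=10, seed=0):
    """Return per-fold accuracies, out-of-fold predictions, a final model."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof_pred = np.empty_like(labels)
    fold_acc = []
    for train_idx, test_idx in skf.split(x, labels):
        model = fit(x[train_idx], labels[train_idx])  # fresh model per fold
        oof_pred[test_idx] = predict(model, x[test_idx])
        fold_acc.append(np.mean(oof_pred[test_idx] == labels[test_idx]))
    final_model = fit(x, labels)  # refit on all data for final parameters
    return np.array(fold_acc), oof_pred, final_model


# Synthetic, well-separated three-class data, for demonstration only.
rng = np.random.RandomState(0)
labels = np.repeat([0, 1, 2], 50)
x = rng.normal(scale=0.1, size=(150, 2)) + labels[:, None] * 5.0

fold_acc, oof_pred, final_model = cross_validate(
    x, labels, fit_centroids, predict_centroids)
print(fold_acc.mean())
```

The out-of-fold array `oof_pred` holds one validation prediction per sample, and `final_model` holds the parameters from a refit on all of the data, which is one common way to report a "final model" after cross-validation.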
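Thanks for the guidance. I tried to use KFold, but the problem lies in the way the code is written for the multiclass SVM. y_vals has a size equal to the number of classes (3), which is not the actual number of samples. If you look at the code above, you will see what I mean. Because of this, I now get this error: IndexError: index (number of samples divided by number of splits) is out of bounds for axis 0 with size (number of classes). – illy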
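
Beyond the IndexError, the way the graph is written appears to impose a second constraint: `b` is created with shape `[3, batch_size]`, so `x_data` and `y_target` need exactly `batch_size` samples, and only `prediction_grid` can vary in length. Feeding a whole fold to `x_data`, as the accuracy calls in the question do, would therefore likely fail with a shape mismatch. The sketch below restates the question's prediction ops in NumPy (not TensorFlow) to show the shapes involved. `rbf_predict` is a hypothetical helper name, and the random `b` is a stand-in for trained values.

```python
import numpy as np


def rbf_predict(x_batch, y_batch, b, x_new, gamma=-10.0):
    """NumPy restatement of the question's prediction ops (hypothetical).

    x_batch: (batch, 2) points fed to x_data
    y_batch: (3, batch) one-vs-rest labels fed to y_target
    b:       (3, batch) trained variable
    x_new:   (m, 2) points fed to prediction_grid; m may differ from batch
    """
    rA = np.sum(x_batch ** 2, axis=1).reshape(-1, 1)
    rB = np.sum(x_new ** 2, axis=1).reshape(-1, 1)
    sq_dist = rA - 2.0 * (x_batch @ x_new.T) + rB.T   # (batch, m)
    kernel = np.exp(gamma * np.abs(sq_dist))
    out = (y_batch * b) @ kernel                      # (3, m)
    return np.argmax(out - out.mean(axis=1, keepdims=True), axis=0)


rng = np.random.RandomState(0)
batch_size, n_test = 50, 15
x_batch = rng.rand(batch_size, 2)
classes = rng.randint(0, 3, size=batch_size)
y_batch = np.where(np.arange(3)[:, None] == classes, 1.0, -1.0)
b = rng.rand(3, batch_size)   # random stand-in for trained values
x_test = rng.rand(n_test, 2)

pred = rbf_predict(x_batch, y_batch, b, x_test)
print(pred.shape)  # (15,)

# Feeding more label columns than b has fails, as it would in the graph.
try:
    rbf_predict(rng.rand(100, 2), np.ones((3, 100)), b, x_test)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```

Under this reading, validation predictions would come from running the `prediction` op with a training batch in `x_data` and `y_target` and the held-out fold in `prediction_grid`, with accuracy then computed in NumPy against the held-out labels. The graph's `accuracy` op compares against `y_target`, so it is only meaningful when `prediction_grid` contains the same points as `x_data`.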