In my opinion, Niki's answer is the simplest solution.
However, another simple solution is to use sklearn and its train_test_split() function:
from sklearn.model_selection import train_test_split
# load_raw_data is your own loader: data is a list of texts such as ['hello', '...'],
# target holds the matching labels, e.g. [1, 0, -1]
data, target = load_raw_data(data_size)
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.33, random_state=42)
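To check what the call above produces, here is a minimal sketch on toy data. The `data` and `target` values are made up for illustration and only stand in for whatever your own `load_raw_data` returns.

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for the output of load_raw_data (hypothetical values).
data = ["text %d" % i for i in range(10)]
target = [1, 0, -1, 1, 0, -1, 1, 0, -1, 1]

# Same call as above: hold out roughly a third of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.33, random_state=42
)

# Every sample lands in exactly one split, and labels stay aligned with texts.
print(len(X_train), len(X_test))
```

Fixing `random_state` makes the split reproducible between runs; leave it out if you want a different split each time.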
Or the numpy version:
import numpy as np
# load_raw_data is your own loader: texts is a list such as ['hello', '...'],
# target holds the matching labels, e.g. [1, 0, -1]
texts, target = load_raw_data(data_size)
n = len(target)
# pick 80% of the indices at random, without replacement, for training
train_indices = np.random.choice(n, int(round(0.8 * n)), replace=False)
# every index that was not picked goes to the test set
test_indices = np.setdiff1d(np.arange(n), train_indices)
# index directly instead of scanning the index array for every element
x_train = [texts[i] for i in train_indices]
x_test = [texts[i] for i in test_indices]
y_train = np.array([target[i] for i in train_indices])
y_test = np.array([target[i] for i in test_indices])
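If you need the numpy split in more than one place, it can be wrapped in a small helper. This is only a sketch: the function name `split_train_test` and its `train_fraction` and `seed` parameters are my own choices for illustration, and the toy `texts` and `target` stand in for your real data.

```python
import numpy as np


def split_train_test(texts, target, train_fraction=0.8, seed=None):
    """Randomly split parallel sequences into train and test parts."""
    target = np.asarray(target)
    n = len(target)
    # A seeded generator makes the split reproducible.
    rng = np.random.RandomState(seed)
    # Shuffle all indices once, then cut the shuffled order in two.
    order = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    train_idx, test_idx = order[:n_train], order[n_train:]
    x_train = [texts[i] for i in train_idx]
    x_test = [texts[i] for i in test_idx]
    return x_train, x_test, target[train_idx], target[test_idx]


# Toy data (hypothetical values) to show the call.
texts = ["text %d" % i for i in range(10)]
target = [1, 0, -1, 1, 0, -1, 1, 0, -1, 1]
x_train, x_test, y_train, y_test = split_train_test(texts, target, seed=0)
print(len(x_train), len(x_test))
```

Shuffling once and cutting the permutation avoids building the test indices as a set difference, and the two parts cannot overlap by construction.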
So the choice is yours. Happy coding :)
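If you can do it with numpy, I assume you are familiar with slicing. TensorFlow implements [slicing](https://www.tensorflow.org/versions/r0.12/api_docs/python/array_ops/slicing_and_joining) for tensors. – gionni
I didn't know about that feature. I would like to use TFLearn and work with random samples. Is that possible? – Marc
You can use TensorFlow together with tflearn. I think that is why tflearn does not implement slicing itself, but I might be wrong... – gionni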