I have a dataset of 17 million observations that I'm trying to use to train a DNNRegressor model. However, training isn't working at all. The loss is around 10^15, which is staggering. I've been trying different things for weeks, and no matter what I do I can't get the loss down. In short, TensorFlow training is not working: the model is not learning the data.
For example, after training I run a test prediction on one of the very same observations that was in the training data. The expected result is 140944.00, but the prediction comes out as -169532.5, which is absurd. There aren't even any negative values in the training data, so I don't understand how this is possible.
Here is some sample training data:
Amount Contribution ServiceType Percentile Time Result
214871.00 3501.00 SM23 high 50 17807828.00
214871.00 3501.00 SM23 high 51 19216520.00
214871.00 3501.00 SM23 high 52 19676064.00
214871.00 3501.00 SM23 high 53 21038840.00
214871.00 3501.00 SM23 high 54 22248295.00
214871.00 3501.00 SM23 high 55 22412713.00
28006.00 83.00 SM0 i_low 0 28006.00
28006.00 83.00 SM0 i_low 1 28804.00
28006.00 83.00 SM0 i_low 2 30140.00
28006.00 83.00 SM0 i_low 3 31598.00
28006.00 83.00 SM0 i_low 4 33130.00
28006.00 83.00 SM0 i_low 5 34663.00
Here is my code:
import os

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.contrib.learn import RunConfig  # RunConfig with the arguments used below
from tensorflow.python.framework import dtypes

feature_columns = [
    tf.feature_column.numeric_column('Amount', dtype=dtypes.float32),
    tf.feature_column.numeric_column('Contribution', dtype=dtypes.float32),
    tf.feature_column.embedding_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            'ServiceType',
            [
                'SM0', 'SM1', 'SM2', 'SM3',
                'SM4', 'SM5', 'SM6', 'SM7',
                'SM8', 'SM9', 'SM10', 'SM11',
                'SM12', 'SM13', 'SM14', 'SM15',
                'SM16', 'SM17', 'SM18', 'SM19',
                'SM20', 'SM21', 'SM22', 'SM23'
            ],
            dtype=dtypes.string
        ),
        dimension=16
    ),
    tf.feature_column.embedding_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            'Percentile',
            ['i_low', 'low', 'mid', 'high'],
            dtype=dtypes.string
        ),
        dimension=16
    ),
    tf.feature_column.numeric_column('Time', dtype=dtypes.int8)
]
model = tf.estimator.DNNRegressor(
    hidden_units=[64, 32],
    feature_columns=feature_columns,
    model_dir=os.path.join(os.getcwd(), 'job'),
    label_dimension=1,
    weight_column=None,
    optimizer='Adagrad',
    activation_fn=tf.nn.elu,
    dropout=None,
    input_layer_partitioner=None,
    config=RunConfig(
        master=None,
        num_cores=4,
        log_device_placement=False,
        gpu_memory_fraction=1,
        tf_random_seed=None,
        save_summary_steps=100,
        save_checkpoints_secs=0,
        save_checkpoints_steps=None,
        keep_checkpoint_max=5,
        keep_checkpoint_every_n_hours=10000,
        log_step_count_steps=100,
        evaluation_master='',
        model_dir=os.path.join(os.getcwd(), 'job'),
        session_config=None
    )
)
print('Training...')
model.train(input_fn=get_input_fn('train'), steps=100000)
print('Evaluating...')
model.evaluate(input_fn=get_input_fn('test'), steps=4000)
print('Predicting...')
prediction = model.predict(input_fn=get_input_fn('predict'))
print(list(prediction))
The input_fn is built as follows:
def split_input():
    data = pd.read_csv('C:\\all_data.txt', sep='\t')
    x = data.drop('Result', axis=1)
    y = data.Result
    return train_test_split(x, y, test_size=0.2, random_state=123)

def get_input_fn(input_fn_type):
    train_x, test_x, train_y, test_y = split_input()
    if input_fn_type == 'train':
        return tf.estimator.inputs.pandas_input_fn(
            x=train_x,
            y=train_y,
            num_epochs=None,
            shuffle=True
        )
    elif input_fn_type == 'test':
        return tf.estimator.inputs.pandas_input_fn(
            x=test_x,
            y=test_y,
            num_epochs=1,
            shuffle=False
        )
    elif input_fn_type == 'predict':
        return tf.estimator.inputs.pandas_input_fn(
            x=pd.DataFrame(
                {
                    'Amount': 52050.00,
                    'Contribution': 1394.00,
                    'ServiceType': 'SM0',
                    'Percentile': 'i_low',
                    'Time': 5
                },
                index=[0]
            ),
            num_epochs=1,
            shuffle=False
        )
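For context on the magnitude of the loss values below, a quick way to inspect the scale of the Result label is shown here (a minimal sketch, not part of the pipeline above; it reuses the same file path and separator as split_input, and the describe() calls are purely illustrative):

import pandas as pd

# Summarize the label and numeric feature columns of the training file.
# The sample rows above already show Result ranging from ~2.8e4 to ~2.2e7,
# so squared errors on the raw label scale are naturally enormous.
data = pd.read_csv('C:\\all_data.txt', sep='\t')
print(data['Result'].describe())
print(data.select_dtypes(include=['number']).describe())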
The output looks like this:
Training...
INFO:tensorflow:loss = 6.30944e+15, step = 1
INFO:tensorflow:global_step/sec: 457.091
INFO:tensorflow:loss = 3.28245e+15, step = 101 (0.219 sec)
INFO:tensorflow:global_step/sec: 533.271
INFO:tensorflow:loss = 2.65647e+15, step = 201 (0.188 sec)
INFO:tensorflow:global_step/sec: 533.274
...
INFO:tensorflow:loss = 1.06601e+15, step = 99701 (0.203 sec)
INFO:tensorflow:global_step/sec: 533.289
INFO:tensorflow:loss = 2.12652e+15, step = 99801 (0.188 sec)
INFO:tensorflow:global_step/sec: 533.273
INFO:tensorflow:loss = 1.31647e+15, step = 99901 (0.203 sec)
INFO:tensorflow:Saving checkpoints for 100000 into C:\projection_model\job\model.ckpt.
INFO:tensorflow:Loss for final step: 2.88956e+15.
Evaluating...
INFO:tensorflow:Evaluation [1/4000]
INFO:tensorflow:Evaluation [2/4000]
INFO:tensorflow:Evaluation [3/4000]
...
INFO:tensorflow:Evaluation [3998/4000]
INFO:tensorflow:Evaluation [3999/4000]
INFO:tensorflow:Evaluation [4000/4000]
INFO:tensorflow:Finished evaluation at 2017-08-30-19:04:03
INFO:tensorflow:Saving dict for global step 100000: average_loss = 1.37941e+13, global_step = 100000, loss = 1.76565e+15
Predicting...
[{'predictions': array([-169532.5], dtype=float32)}] # Should be somewhere around 140944.00
Why isn't the model learning the data? I've tried different regressors and normalizing the inputs, and nothing has worked.
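For illustration, one way input normalization can be attached to the numeric features is through the normalizer_fn argument of tf.feature_column.numeric_column (a minimal sketch only, not necessarily how it was attempted above; the mean/std constants are hypothetical placeholders that would be computed from the training split):

import tensorflow as tf

# Hypothetical per-feature statistics; in practice compute these from train_x,
# e.g. train_x['Amount'].mean() and train_x['Amount'].std().
AMOUNT_MEAN, AMOUNT_STD = 120000.0, 90000.0
CONTRIBUTION_MEAN, CONTRIBUTION_STD = 1800.0, 1500.0

normalized_columns = [
    # normalizer_fn is applied to the raw input tensor before it reaches the network.
    tf.feature_column.numeric_column(
        'Amount', normalizer_fn=lambda x: (x - AMOUNT_MEAN) / AMOUNT_STD),
    tf.feature_column.numeric_column(
        'Contribution', normalizer_fn=lambda x: (x - CONTRIBUTION_MEAN) / CONTRIBUTION_STD),
    # Time is rescaled by a constant; 100.0 is just a placeholder upper bound.
    tf.feature_column.numeric_column(
        'Time', normalizer_fn=lambda x: x / 100.0),
]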
One quick suggestion to try: as a test (and only as a test), use just every 10,000th data point so the dataset is much smaller and troubleshooting is correspondingly faster. –
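A minimal sketch of that suggestion, applied to the split_input function above (the ::10000 stride keeps every 10,000th row, cutting ~17M rows down to roughly 1,700 so debugging runs finish quickly; split_input_small is a hypothetical variant, not part of the original code):

import pandas as pd
from sklearn.model_selection import train_test_split

def split_input_small(stride=10000):
    # Same as split_input, but keep only every `stride`-th row for fast troubleshooting.
    data = pd.read_csv('C:\\all_data.txt', sep='\t')
    data = data.iloc[::stride]
    x = data.drop('Result', axis=1)
    y = data.Result
    return train_test_split(x, y, test_size=0.2, random_state=123)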