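graphlab linear regression terminated due to numerical overflow error
I'm trying to create a linear regression model with graphlab. I have 200 samples and 1 predictor, but I get a "numerical overflow error". Here is the output: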
model_all = graphlab.linear_regression.create(data2.tail(200), target='output', features=['input'], validation_set=None, l2_penalty=0.0002, solver='auto')
Linear regression:
--------------------------------------------------------
Number of examples : 200
Number of features : 1
Number of unpacked features : 1
Number of coefficients : 2
Starting Newton Method
--------------------------------------------------------
+-----------+----------+--------------+--------------------+---------------+
| Iteration | Passes | Elapsed Time | Training-max_error | Training-rmse |
+-----------+----------+--------------+--------------------+---------------+
+-----------+----------+--------------+--------------------+---------------+
TERMINATED: Terminated due to numerical overflow error.
This model may not be ideal. To improve it, consider doing one of the following:
(a) Increasing the regularization.
(b) Standardizing the input data.
(c) Removing highly correlated features.
(d) Removing `inf` and `NaN` values in the training data
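For reference, hint (b) can still be tried with a single feature: standardizing rescales the input so its raw magnitude no longer dominates the numbers the solver works with. A minimal pure-Python sketch, assuming the column has been pulled out as a list of floats (the helper names are hypothetical, not part of graphlab). It also maps the fitted coefficients back to the original scale:

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores (v - mean) / std, plus the mean and std that were used.

    Uses the population standard deviation; a constant column cannot be scaled.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values], mu, sigma


def unstandardize_coefficients(intercept_z, slope_z, mu, sigma):
    """Convert coefficients fitted on z-scored input back to the raw input scale.

    From y = a + b * (x - mu) / sigma:
    slope = b / sigma and intercept = a - b * mu / sigma.
    """
    slope = slope_z / sigma
    intercept = intercept_z - slope_z * mu / sigma
    return intercept, slope
```

One caveat: the same l2_penalty value penalizes the slope differently once the input is z-scored, so it may need to be re-tuned.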
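Hints (b), (c) and (d) don't apply in my case, since there is only 1 feature and the data contains no inf or NaN values. I have tried various l2_penalty values, but none of them helped. If I limit the number of samples to a smaller number, such as 180, it works: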
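Because the fit succeeds on data2.tail(180) but fails on data2.tail(200), the 20 rows that differ between the two are the natural place to start debugging. A minimal sketch, assuming the data has been exported to a plain Python list of dicts with keys 'input' and 'output`, in the same order as the SFrame, so that tail(n) corresponds to the slice rows[-n:]. The helper names and the max_abs threshold are hypothetical:

```python
import math


def rows_only_in_larger_tail(rows, n_large=200, n_small=180):
    """Rows that tail(n_large) includes but tail(n_small) does not."""
    if n_small >= n_large:
        raise ValueError("n_small must be smaller than n_large")
    # Slice end uses len(rows) - n_small so that n_small == 0 also works.
    return rows[-n_large:len(rows) - n_small]


def flag_suspicious(rows, keys=("input", "output"), max_abs=1e6):
    """Return (row index, key, value) for non-finite or very large values.

    max_abs is an arbitrary illustrative cutoff; adjust it to the data.
    """
    flagged = []
    for i, row in enumerate(rows):
        for key in keys:
            value = float(row[key])
            if not math.isfinite(value) or abs(value) > max_abs:
                flagged.append((i, key, value))
    return flagged
```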
model_all = graphlab.linear_regression.create(data2.tail(180), target='output', features=['input'], validation_set=None, l2_penalty=0.0002, solver='auto')
model_all.get("coefficients").print_rows(num_rows=100)
Linear regression:
--------------------------------------------------------
Number of examples : 180
Number of features : 1
Number of unpacked features : 1
Number of coefficients : 2
Starting Newton Method
--------------------------------------------------------
+-----------+----------+--------------+--------------------+---------------+
| Iteration | Passes | Elapsed Time | Training-max_error | Training-rmse |
+-----------+----------+--------------+--------------------+---------------+
| 1 | 2 | 0.000866 | 9.873043 | 4.272624 |
+-----------+----------+--------------+--------------------+---------------+
SUCCESS: Optimal solution found.
+----------------+-------+------------------+-------------------+
| name | index | value | stderr |
+----------------+-------+------------------+-------------------+
| (intercept) | None | 9.3412783539 | 3.80166353756 |
| DOEDDIST.Index | None | 0.00226165438702 | 0.000975084975224 |
+----------------+-------+------------------+-------------------+
[2 rows x 4 columns]
I don't understand what causes the numerical overflow error. Can anyone help explain?
Thanks.
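For background: squared loss plus an L2 penalty is a quadratic objective, so a single Newton step lands on the exact minimizer. That is consistent with the successful 180-row run finishing in 1 iteration. With one feature, that step solves a 2x2 linear system. The sketch below is not graphlab's implementation. It assumes the objective sum((y - b0 - b1*x)**2) + lam * b1**2 with an unpenalized intercept, and graphlab's own penalty scaling and intercept handling may differ. It shows the closed-form solution computed on centered data, and how a large, uncentered input makes the system a raw Newton step solves badly conditioned. Whether this is what triggers graphlab's overflow check is not confirmed:

```python
def ridge_fit_1d(xs, ys, lam=0.0):
    """Closed-form minimizer of sum((y - b0 - b1*x)**2) + lam * b1**2.

    The intercept is left unpenalized (an assumption). Centering x and y
    first avoids working with the large raw sums sum(x) and sum(x**2).
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and the same length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx + lam == 0:
        raise ValueError("slope not identifiable: constant input and lam == 0")
    b1 = sxy / (sxx + lam)
    b0 = y_bar - b1 * x_bar
    return b0, b1


def hessian_condition_number(xs, lam=0.0):
    """Condition number of [[n, sum x], [sum x, sum x**2 + lam]], the (halved)
    Hessian that a Newton step on the raw, uncentered input would use."""
    n = len(xs)
    a = float(n)
    b = float(sum(xs))
    c = float(sum(x * x for x in xs)) + lam
    x_bar = b / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    det = n * (sxx + lam)  # equals a*c - b*b, computed without cancellation
    if det == 0:
        return float("inf")
    lam_max = (a + c) / 2 + (((a - c) / 2) ** 2 + b * b) ** 0.5
    # lam_min = det / lam_max, so lam_max / lam_min = lam_max**2 / det
    return lam_max * lam_max / det
```

Shifting the same inputs by a constant leaves the fitted slope unchanged but inflates the condition number by many orders of magnitude. Centering or standardizing the input removes that effect.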
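If solving this task is all you need, you can always choose a different solver. For debugging, you should probably show the data, although your observation is indeed strange. – sascha
Thanks for your reply. – Pollyanna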