Keras + Tensorflow: optimization stalls

I have installed Theano (TH), Tensorflow (TF), and Keras. Basic tests seem to indicate that they work with the GPU (GTX 1070), Cuda 8.0, and cuDNN 5.1.
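If I run the cifar10_cnn.py Keras example with TH as the backend, it seems to work fine, taking about 18s/epoch. If I run it with TF, then almost every time (it occasionally works, but I cannot reproduce that) the optimization stalls at acc = 0.1 after every epoch. It is as if the weights were not being updated.
This is a shame, because the TF backend takes only about 10s/epoch (on the very few occasions it did work). I am using Conda and I am very new to Python. If it helps, "conda list" seems to show two versions for some packages.
If you have any clue, please let me know. Thanks. Output below: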
python cifar10_cnn.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
X_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
Using real-time data augmentation.
Epoch 1/200
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7845
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.60GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
50000/50000 [==============================] - 11s - loss: 2.3029 - acc: 0.0999 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 2/200
50000/50000 [==============================] - 10s - loss: 2.3028 - acc: 0.0980 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 3/200
50000/50000 [==============================] - 10s - loss: 2.3028 - acc: 0.0992 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 4/200
50000/50000 [==============================] - 10s - loss: 2.3028 - acc: 0.0980 - val_loss: 2.3026 - val_acc: 0.1000
Epoch 5/200
13184/50000 [======>.......................] - ETA: 7s - loss: 2.3026 - acc: 0.1044^CTraceback (most recent call last):
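A note on the numbers in the log above: the loss sits at about 2.3026 and the accuracy at about 0.1 in every epoch. For a 10-class problem such as CIFAR-10, that is what categorical cross-entropy gives when the network assigns the same probability to every class. The sketch below only checks that arithmetic; the function name `uniform_cross_entropy` is made up for illustration and is not part of Keras or TensorFlow.

```python
import math


def uniform_cross_entropy(num_classes):
    """Cross-entropy loss when every class is predicted with probability 1/num_classes."""
    return -math.log(1.0 / num_classes)


# CIFAR-10 has 10 classes, so a uniform prediction gives ln(10).
loss = uniform_cross_entropy(10)
print(round(loss, 4))  # 2.3026
```

So the stalled run appears to be predicting all classes with equal probability, which is consistent with the impression that the weights are not making useful progress.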
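Thanks, I lowered the learning rate to 0.001 and that seems to have done the trick. I assumed an example on GitHub would work "out of the box", but maybe it was only tested on TH. Thanks again. – ozne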
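To illustrate why a smaller learning rate can turn a run that never improves into one that converges, here is a toy sketch in plain Python. It is not the cifar10_cnn.py model and not Keras code: it runs plain gradient descent on f(x) = x^2, and the function name `gradient_descent` and the step sizes 1.1 and 0.1 are chosen only for illustration. The actual failure mode in the CNN may well be different; this only shows how sensitive gradient descent is to the step size.

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Minimize f(x) = x**2 with plain gradient descent and return the final x."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x   # derivative of x**2
        x -= lr * grad   # each step multiplies x by (1 - 2 * lr)
    return x


# Step too large: |1 - 2 * 1.1| = 1.2 > 1, so x moves away from the minimum.
too_large = abs(gradient_descent(lr=1.1))

# Smaller step: |1 - 2 * 0.1| = 0.8 < 1, so x shrinks toward the minimum at 0.
small_enough = abs(gradient_descent(lr=0.1))

print(too_large > 1.0, small_enough < 1e-3)  # True True
```

In the Keras example the learning rate is an argument of the optimizer that the script constructs, so lowering it as described in the comment above is a one-value change there.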