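Retraining Inception on Google Cloud ML stuck at global step 0
I am following the flowers tutorial for retraining Inception on Google Cloud ML. I can run the tutorial, train, and predict without problems.
I then replaced the flowers dataset with a test dataset of my own: optical character recognition of image digits.
My full code is here
Dictionary file: labels
Eval set
Training set
I am running from inside a recent Docker build provided by Google.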
`docker run -it -p "127.0.0.1:8080:8080" --entrypoint=/bin/bash gcr.io/cloud-datalab/datalab:local-20161227`
I can preprocess the files and submit the training job using
# Submit training job.
gcloud beta ml jobs submit training "$JOB_ID" \
--module-name trainer.task \
--package-path trainer \
--staging-bucket "$BUCKET" \
--region us-central1 \
-- \
--output_path "${GCS_PATH}/training" \
--eval_data_paths "${GCS_PATH}/preproc/eval*" \
--train_data_paths "${GCS_PATH}/preproc/train*"
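The training job starts, but it never gets past global step 0. The flowers tutorial took roughly 1 hour to train on the free tier. I have let this job run for 11 hours with no movement.
Looking in Stackdriver, nothing progresses.
I have also tried a tiny toy dataset of 20 training images and 10 eval images. Same problem.
Perhaps not surprisingly, I cannot visualize this run in TensorBoard; nothing is displayed.
Full training log: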
INFO 2017-01-10 17:22:00 +0000 unknown_task Validating job requirements...
INFO 2017-01-10 17:22:01 +0000 unknown_task Job creation request has been successfully validated.
INFO 2017-01-10 17:22:01 +0000 unknown_task Job MeerkatReader_MeerkatReader_20170110_170701 is queued.
INFO 2017-01-10 17:22:07 +0000 unknown_task Waiting for job to be provisioned.
INFO 2017-01-10 17:22:07 +0000 unknown_task Waiting for TensorFlow to start.
INFO 2017-01-10 17:22:10 +0000 master-replica-0 Running task with arguments: --cluster={"master": ["master-d4f6-0:2222"]} --task={"type": "master", "index": 0} --job={
INFO 2017-01-10 17:22:10 +0000 master-replica-0 "package_uris": ["gs://api-project-773889352370-ml/MeerkatReader_MeerkatReader_20170110_170701/f78d90a60f615a2d108d06557818eb4f82ffa94a/trainer-0.1.tar.gz"],
INFO 2017-01-10 17:22:10 +0000 master-replica-0 "python_module": "trainer.task",
INFO 2017-01-10 17:22:10 +0000 master-replica-0 "args": ["--output_path", "gs://api-project-773889352370-ml/MeerkatReader/MeerkatReader_MeerkatReader_20170110_170701/training", "--eval_data_paths", "gs://api-project-773889352370-ml/MeerkatReader/MeerkatReader_MeerkatReader_20170110_170701/preproc/eval*", "--train_data_paths", "gs://api-project-773889352370-ml/MeerkatReader/MeerkatReader_MeerkatReader_20170110_170701/preproc/train*"],
INFO 2017-01-10 17:22:10 +0000 master-replica-0 "region": "us-central1"
INFO 2017-01-10 17:22:10 +0000 master-replica-0 } --beta
INFO 2017-01-10 17:22:10 +0000 master-replica-0 Downloading the package: gs://api-project-773889352370-ml/MeerkatReader_MeerkatReader_20170110_170701/f78d90a60f615a2d108d06557818eb4f82ffa94a/trainer-0.1.tar.gz
INFO 2017-01-10 17:22:10 +0000 master-replica-0 Running command: gsutil -q cp gs://api-project-773889352370-ml/MeerkatReader_MeerkatReader_20170110_170701/f78d90a60f615a2d108d06557818eb4f82ffa94a/trainer-0.1.tar.gz trainer-0.1.tar.gz
INFO 2017-01-10 17:22:12 +0000 master-replica-0 Building wheels for collected packages: trainer
INFO 2017-01-10 17:22:12 +0000 master-replica-0 creating '/tmp/tmpSgdSzOpip-wheel-/trainer-0.1-cp27-none-any.whl' and adding '.' to it
INFO 2017-01-10 17:22:12 +0000 master-replica-0 adding 'trainer/model.py'
INFO 2017-01-10 17:22:12 +0000 master-replica-0 adding 'trainer/util.py'
INFO 2017-01-10 17:22:12 +0000 master-replica-0 adding 'trainer/preprocess.py'
INFO 2017-01-10 17:22:12 +0000 master-replica-0 adding 'trainer/task.py'
INFO 2017-01-10 17:22:12 +0000 master-replica-0 adding 'trainer-0.1.dist-info/metadata.json'
INFO 2017-01-10 17:22:12 +0000 master-replica-0 adding 'trainer-0.1.dist-info/WHEEL'
INFO 2017-01-10 17:22:12 +0000 master-replica-0 adding 'trainer-0.1.dist-info/METADATA'
INFO 2017-01-10 17:22:12 +0000 master-replica-0 Running setup.py bdist_wheel for trainer: finished with status 'done'
INFO 2017-01-10 17:22:12 +0000 master-replica-0 Stored in directory: /root/.cache/pip/wheels/e8/0c/c7/b77d64796dbbac82503870c4881d606fa27e63942e07c75f0e
INFO 2017-01-10 17:22:12 +0000 master-replica-0 Successfully built trainer
INFO 2017-01-10 17:22:13 +0000 master-replica-0 Running command: python -m trainer.task --output_path gs://api-project-773889352370-ml/MeerkatReader/MeerkatReader_MeerkatReader_20170110_170701/training --eval_data_paths gs://api-project-773889352370-ml/MeerkatReader/MeerkatReader_MeerkatReader_20170110_170701/preproc/eval* --train_data_paths gs://api-project-773889352370-ml/MeerkatReader/MeerkatReader_MeerkatReader_20170110_170701/preproc/train*
INFO 2017-01-10 17:22:14 +0000 master-replica-0 Starting master/0
INFO 2017-01-10 17:22:14 +0000 master-replica-0 Initialize GrpcChannelCache for job master -> {0 -> localhost:2222}
INFO 2017-01-10 17:22:14 +0000 master-replica-0 Started server with target: grpc://localhost:2222
ERROR 2017-01-10 17:22:16 +0000 master-replica-0 device_filters: "/job:ps"
INFO 2017-01-10 17:22:19 +0000 master-replica-0 global_step/sec: 0
The last line just repeats until I kill the job.
Is my mental model of this service incorrect? All suggestions welcome.
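The GCS files are "empty": they exist but are only 20 bytes each, whereas each .gz in the flowers tutorial is about 20-50 KB. It is not clear to me what causes preprocess.py to fail (maybe I should open a new question with the right tags). – bw4sz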
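A 20-byte .gz file is what a gzip stream with no payload typically looks like (10-byte header, 2-byte empty deflate block, 8-byte trailer), which would be consistent with preprocessing having written zero records. The following is a minimal sketch of that size check, not part of the tutorial code; `looks_like_empty_gzip` is a hypothetical helper name, and the assumption that the preprocessed files are plain gzip streams is inferred only from the .gz extension mentioned above.

```python
import gzip

# Size of a gzip stream that carries no payload at all.
EMPTY_GZIP_SIZE = len(gzip.compress(b""))


def looks_like_empty_gzip(size_in_bytes):
    """Return True if a .gz file of this size cannot hold any real data.

    Hypothetical helper: it only compares the reported object size with
    the size of an empty gzip stream, it does not read the file.
    """
    return size_in_bytes <= EMPTY_GZIP_SIZE


# Sizes as reported in the comment above: 20 bytes here, versus roughly
# 20-50 KB per file in the flowers tutorial.
print(looks_like_empty_gzip(20))
print(looks_like_empty_gzip(20 * 1024))
```

Checking the object sizes under the preproc path before submitting the training job would surface this condition early, instead of after hours of waiting at global step 0.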
Confirmed. For future reference, this is what happens if the paths in eval.csv are wrong. I had added an extra slash in the bucket name. – bw4sz
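A quick check of the CSV before preprocessing can catch this kind of mistake. The sketch below is only an illustration under stated assumptions: it assumes each row of eval.csv starts with a `gs://` image URI followed by a label, as in the flowers tutorial, and `find_suspect_gcs_uris` plus the sample bucket name are hypothetical. It flags any URI with an empty path segment, which is what a doubled slash produces.

```python
import csv
import io


def find_suspect_gcs_uris(csv_text):
    """Return (line_number, uri) pairs whose first column looks malformed.

    Hypothetical helper. A URI is flagged if it does not start with
    "gs://" or if it contains an empty path segment (a doubled, leading,
    or trailing slash after the scheme).
    """
    suspects = []
    reader = csv.reader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        uri = row[0].strip()
        if not uri.startswith("gs://"):
            suspects.append((line_number, uri))
            continue
        remainder = uri[len("gs://"):]
        # Splitting on "/" yields an empty string wherever two slashes meet.
        if "" in remainder.split("/"):
            suspects.append((line_number, uri))
    return suspects


# Made-up sample rows; the second has an extra slash after the bucket name.
sample = (
    "gs://my-bucket/images/0.jpg,0\n"
    "gs://my-bucket//images/1.jpg,1\n"
)
for line_number, uri in find_suspect_gcs_uris(sample):
    print(f"line {line_number}: suspicious path {uri}")
```

This only inspects the text of the paths; it does not confirm that the objects actually exist in the bucket.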