2017-10-28

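Compiling TensorFlow Serving with CUDA 9 for the new AWS p3 instances

I was able to recompile TensorFlow from Amazon's modified sources (provided in the new Deep Learning AMI).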

I now want to compile TensorFlow Serving against that TensorFlow "fork", but I get this error:

ERROR: /root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/contrib/nccl/BUILD:68:1: undeclared inclusion(s) in rule '@org_tensorflow//tensorflow/contrib/nccl:nccl_kernels': 
this rule is missing dependency declarations for the following files included by 'external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_rewrite.cc': 
    '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/common_runtime/optimization_registry.h' 
    '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/common_runtime/device_set.h' 
    '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/common_runtime/device.h' 
    '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/graph/types.h' 
    '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/graph/costmodel.h' 
    '/root/.cache/bazel/_bazel_root/98acb40d8921d865487eab808ed364b2/external/org_tensorflow/tensorflow/core/graph/node_builder.h' 
INFO: Elapsed time: 20.377s, Critical Path: 19.47s 
FAILED: Build did NOT complete successfully 

Some more information: I am using the master branch of TensorFlow Serving (commit 7a349752c2cbbe741edb91c6c6be1c571e91a5fb) and Bazel release 0.7.0.

I also made a small change to tools/bazel.rc to work around another compile error:

# git diff tools/bazel.rc 
diff --git a/tools/bazel.rc b/tools/bazel.rc 
index 9397f97..28476f3 100644 
--- a/tools/bazel.rc 
+++ b/tools/bazel.rc 
@@ -1,4 +1,4 @@ 
-build:cuda --crosstool_top=@org_tensorflow//third_party/gpus/crosstool
+build:cuda --crosstool_top=@local_config_cuda//crosstool:toolchain
build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true 

build --force_python=py2 

Any idea what is missing?

Answer


I usually disable NCCL, since it never seems to build properly:

https://github.com/PipelineAI/pipeline/blob/6261c4f31105e40ab8b24ccc7834f9181f4e5aaf/package/tensorflow/16d39e9-d690fdd/Dockerfile.full-gpu#L160

RUN \ 
    cd $TENSORFLOW_SERVING_HOME \ 
    # Remove NCCL since it isn't building properly 
    && sed -i.bak '/nccl/d' tensorflow/tensorflow/contrib/BUILD \ 
    && bazel build -c opt --config=cuda \ 
     --verbose_failures \ 
     --spawn_strategy=standalone --genrule_strategy=standalone \ 
     --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 \ 
     --crosstool_top=@local_config_cuda//crosstool:toolchain \
     tensorflow_serving/... \ 
    && chmod a+x bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \ 
    && cp bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server /usr/local/bin/ \ 
    && bazel clean --expunge
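
To see what the `sed -i.bak '/nccl/d'` step does before running it on the real tree, here is a small sketch that applies the same command to a throwaway file. The BUILD content is a made-up stand-in (the target names are only illustrative, not copied from the actual tensorflow/contrib/BUILD), and `workdir` and `build_file` are names chosen for this example.

```shell
#!/bin/sh
# Sketch: preview what `sed -i.bak '/nccl/d'` removes from a Bazel BUILD file.
# The BUILD content below is hypothetical, not the real tensorflow/contrib/BUILD.
set -eu

workdir=$(mktemp -d)
build_file="$workdir/BUILD"

cat > "$build_file" <<'EOF'
py_library(
    name = "contrib_py",
    deps = [
        "//tensorflow/contrib/layers:layers_py",
        "//tensorflow/contrib/nccl:nccl_py",
        "//tensorflow/contrib/slim",
    ],
)
EOF

# Same command as in the Dockerfile: delete every line containing "nccl",
# keeping the untouched original next to it as BUILD.bak.
sed -i.bak '/nccl/d' "$build_file"

# Show which lines were dropped (diff exits 1 when the files differ).
diff "$build_file.bak" "$build_file" || true
```

Note that the pattern is purely line-based: any line mentioning "nccl" is deleted, so it is worth checking the diff against the `.bak` copy to make sure only dependency entries were removed.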