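Illegal instruction error when running the C++ Inception-v3 example on TensorFlow
After building TensorFlow with Bazel, I tried to run the "Image Recognition with the C++ API" tutorial and got an Illegal instruction error when executing label_image.
These are the steps I followed: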
# After installing the bazel dependencies, I get the bazel installer
$ mkdir ~/bazel-download && cd ~/bazel-download
$ wget https://github.com/bazelbuild/bazel/releases/download/0.3.0/bazel-0.3.0-installer-linux-x86_64.sh -O bazel-0.3.0-installer-linux-x86_64.sh
$ chmod +x bazel-0.3.0-installer-linux-x86_64.sh
# Install bazel in ~/bin
$ ./bazel-0.3.0-installer-linux-x86_64.sh --user
# Add bazel to the path, if not done already
$ printf '\nexport PATH=$PATH:"~/bin/"\n' >> ~/.bashrc
# Before continuing, I open a new terminal to refresh the bash PATH
$ mkdir ~/inceptionV3 && cd ~/inceptionV3
# Get a stable version of TensorFlow
$ git clone https://github.com/tensorflow/tensorflow -b r0.9
$ cd tensorflow
# Add the InceptionV3 data/models for the C++ api
$ wget https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015.zip -O tensorflow/examples/label_image/data/inception_dec_2015.zip
$ unzip tensorflow/examples/label_image/data/inception_dec_2015.zip -d tensorflow/examples/label_image/data/
# Configure tensorflow: set python path, no Google Cloud Platform support, no GPU support
$ ./configure
# Run bazel build with the allocated resources
$ bazel build -c opt --copt=-mavx --verbose_failures --local_resources 2048,2.0,1.0 -j 1 tensorflow/examples/label_image/...
# -- Here's the last log output from bazel --
INFO: From Compiling tensorflow/core/common_runtime/function.cc:
tensorflow/core/common_runtime/function.cc: In lambda function:
tensorflow/core/common_runtime/function.cc:392:60: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
} else if (rets->size() != ctx->num_outputs()) {
^
INFO: Elapsed time: 6929.927s, Critical Path: 69.23s
# Looks like there was no error during compilation, but now, if I run the generated executable:
$ ./bazel-bin/tensorflow/examples/label_image/label_image
Illegal instruction
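Also, I'm running this in a Docker container based on Ubuntu 14.04.4 LTS x86_64 (the gcc/g++ version is 4.8.4).
I also tried other setups, for example installing bazel with apt-get, but I still get the Illegal instruction error when I run the freshly compiled executable.
That said, the Python part of the tutorial works fine (using Python 2.7.6). Any idea how to fix the problem with the C++ API?
Edit 1: (added more information about the CPU) This is the output I get from /proc/cpuinfo.
Edit 2: (trying to debug tensorflow) I compiled with this command: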
$ bazel build -c dbg --strip=always --copt=-mavx --verbose_failures --local_resources 2048,2.0,1.0 -j 1 tensorflow/examples/label_image/...
Then I tried debugging with GDB:
$ gdb -q bazel-bin/tensorflow/examples/label_image/label_image
Reading symbols from bazel-bin/tensorflow/examples/label_image/label_image...(no debugging symbols found)...done.
(gdb) set disable-randomization off
(gdb) run
Starting program: /root/.cache/bazel/_bazel_root/b54d699ba1afcab684f4628c78701dbe/execroot/tensorflow/bazel-out/local-dbg/bin/tensorflow/examples/label_image/label_image
During startup program terminated with signal SIGILL, Illegal instruction.
(gdb) backtrace
No stack.
(gdb) handle SIGILL nostop
Signal        Stop      Print   Pass to program Description
SIGILL        No        Yes     Yes             Illegal instruction
(gdb) run
Starting program: /root/.cache/bazel/_bazel_root/b54d699ba1afcab684f4628c78701dbe/execroot/tensorflow/bazel-out/local-dbg/bin/tensorflow/examples/label_image/label_image
During startup program terminated with signal SIGILL, Illegal instruction.
(gdb) backtrace
No stack.
(gdb) info files
Symbols from "/root/.cache/bazel/_bazel_root/b54d699ba1afcab684f4628c78701dbe/execroot/tensorflow/bazel-out/local-dbg/bin/tensorflow/examples/label_image/label_image".
Local exec file:
`/root/.cache/bazel/_bazel_root/b54d699ba1afcab684f4628c78701dbe/execroot/tensorflow/bazel-out/local-dbg/bin/tensorflow/examples/label_image/label_image', file type elf64-x86-64.
Entry point: 0x434b10
0x0000000000400270 - 0x000000000040028c is .interp
0x000000000040028c - 0x00000000004002ac is .note.ABI-tag
0x00000000004002ac - 0x00000000004002cc is .note.gnu.build-id
0x00000000004002d0 - 0x0000000000400380 is .gnu.hash
0x0000000000400380 - 0x00000000004027e0 is .dynsym
0x00000000004027e0 - 0x0000000000404667 is .dynstr
0x0000000000404668 - 0x0000000000404970 is .gnu.version
0x0000000000404970 - 0x0000000000404b70 is .gnu.version_r
0x0000000000404b70 - 0x0000000000431360 is .rela.dyn
0x0000000000431360 - 0x00000000004334a8 is .rela.plt
0x00000000004334a8 - 0x00000000004334c2 is .init
0x00000000004334d0 - 0x0000000000434b10 is .plt
0x0000000000434b10 - 0x00000000027cfe2f is .text
0x00000000027cfe30 - 0x00000000027cfe39 is .fini
0x00000000027cfe40 - 0x0000000003890ed0 is .rodata
0x0000000003890ed0 - 0x0000000003acc1ec is .eh_frame_hdr
0x0000000003acc1f0 - 0x000000000441fc2c is .eh_frame
0x000000000441fc2c - 0x000000000444474f is .gcc_except_table
0x0000000004644dd0 - 0x0000000004644de0 is .tdata
0x0000000004644de0 - 0x0000000004644df8 is .tbss
0x0000000004644de0 - 0x0000000004645a70 is .init_array
0x0000000004645a70 - 0x0000000004645a78 is .fini_array
0x0000000004645a78 - 0x0000000004645a80 is .jcr
0x0000000004645a80 - 0x00000000046a5d50 is .data.rel.ro
0x00000000046a5d50 - 0x00000000046a5f90 is .dynamic
0x00000000046a5f90 - 0x00000000046a6000 is .got
0x00000000046a6000 - 0x00000000046a6b30 is .got.plt
0x00000000046a6b40 - 0x00000000046a70d0 is .data
0x00000000046a70e0 - 0x00000000046aae18 is .bss
(gdb) break main
Breakpoint 1 at 0x436cc0
(gdb) run
Starting program: /root/.cache/bazel/_bazel_root/b54d699ba1afcab684f4628c78701dbe/execroot/tensorflow/bazel-out/local-dbg/bin/tensorflow/examples/label_image/label_image
During startup program terminated with signal SIGILL, Illegal instruction.
(gdb) backtrace
No stack.
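A note on the `(no debugging symbols found)` line in the session above: it is consistent with the `--strip=always` flag in the build command, which asks Bazel to strip debug information even in a `-c dbg` build (`--strip=never` keeps it). One way to confirm this is to look for a `.debug_info` section in the ELF file. The following is only a minimal sketch. `has_debug_info` is a hypothetical helper name, and it assumes `readelf` from binutils is installed in the container.

```shell
#!/bin/sh
# has_debug_info (hypothetical helper): reads `readelf -S` section-header
# output on stdin and succeeds if a .debug_info section is listed.
has_debug_info() {
    grep -q '\.debug_info'
}

# Path taken from the build above; the check is skipped if the file is absent.
bin=bazel-bin/tensorflow/examples/label_image/label_image
if [ -f "$bin" ]; then
    if readelf -S "$bin" | has_debug_info; then
        echo "debug info present"
    else
        echo "no .debug_info section: binary was stripped or built without -g"
    fi
fi
```

If the section is missing, rebuilding with `--strip=never` should let gdb resolve symbols, although that alone does not explain the SIGILL.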
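So far, since the Illegal instruction error is delivered as a SIGILL signal, my guess is that the generated machine code does not match my current architecture. However, I'm not sure how to deal with this particular problem.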
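One way to test that guess: `--copt=-mavx` allows GCC to emit AVX instructions, and a CPU that does not support them raises SIGILL when it reaches one. The sketch below checks whether the kernel reports the `avx` flag. `has_cpu_flag` is a hypothetical helper, and the optional file argument exists only so it can be tried against a saved copy of `/proc/cpuinfo`.

```shell
#!/bin/sh
# has_cpu_flag FLAG [CPUINFO_FILE]  (hypothetical helper)
# Succeeds if FLAG appears as a whole word on a "flags" line.
# CPUINFO_FILE defaults to /proc/cpuinfo.
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # -w matches whole words only, so "avx" does not match "avx2".
    grep -E '^flags[[:space:]]*:' "$file" | grep -qwF -- "$flag"
}

if has_cpu_flag avx; then
    echo "CPU reports avx: code built with --copt=-mavx should be able to run"
else
    echo "CPU does not report avx: try rebuilding without --copt=-mavx"
fi
```

If the flag is present, the mismatch has to come from somewhere else, so this check only rules one cause in or out.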
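An illegal instruction is related to your processor, so you should mention which one you have. –
OK, I added the output of /proc/cpuinfo, which is the same on my host system and in the docker container. – antogerva
Oh, that's strange, with an i7 you shouldn't get an error like this. Now run the command under gdb and generate a backtrace; it might tell you which instruction is illegal and point to the problem. –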
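Since gdb only reports `No stack.` in this case, a static look at the binary is another option. Note that this uses `objdump` instead of the gdb backtrace suggested in the comment. The sketch below counts disassembly lines that reference a `%ymm` register, which only AVX-capable code uses. `count_ymm_lines` is a hypothetical helper, and it assumes `objdump` from binutils with its default AT&T syntax.

```shell
#!/bin/sh
# count_ymm_lines (hypothetical helper): reads disassembly on stdin and
# prints the number of lines that mention a %ymm register.
count_ymm_lines() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps that
    # from being treated as a failure.
    grep -c '%ymm' || true
}

# Path taken from the build above; disassembling a binary this large is slow.
bin=bazel-bin/tensorflow/examples/label_image/label_image
if [ -f "$bin" ]; then
    objdump -d "$bin" | count_ymm_lines
fi
```

A non-zero count only shows that the binary contains AVX code, not that one of those instructions is the one that trapped.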