How a badly configured TensorFlow in Docker can be 10x slower than expected

TL;DR: TensorFlow reads the number of logical CPU cores to configure itself, and that number can be badly wrong when the container has a CPU restriction.

Let's do a simple benchmark comparing inference on the GPU, on the host CPU, on the CPU inside Docker, and on the CPU inside Docker with a CPU restriction.

Keras/TensorFlow seems to do some setup work on the GPU during the first call to .predict(), so we do not time the first call, only the second one.
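
The original benchmark script is not reproduced here. The sketch below shows only the timing method, and it assumes just one thing: that the model exposes a Keras-style predict callable. The first call is run as an untimed warm-up, and the following call(s) are timed. The helper name `time_inference` and its `runs` parameter are hypothetical.

```python
import time
from typing import Any, Callable


def time_inference(predict: Callable[[Any], Any], batch: Any, runs: int = 1) -> float:
    """Mean wall-clock seconds per call of predict(batch), excluding a warm-up call."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # The first call may include one-off setup work, so it is run but not timed.
    predict(batch)
    start = time.perf_counter()
    for _ in range(runs):
        predict(batch)
    return (time.perf_counter() - start) / runs
```

With Keras this would be called as `time_inference(model.predict, image_batch)`, where `model` and `image_batch` are a loaded model and a preprocessed image array. The default `runs=1` times exactly the second call, as described above.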

Running this benchmark on my Nvidia GTX 1080 gives an inference time of ~0.01s per image.

On my CPU, without a container, it takes ~0.12s. Being 12x slower is within the order of magnitude one would expect between CPU and GPU. Note that my TensorFlow build is not compiled with AVX or MKL support. The GPU was hidden by setting the environment variable CUDA_VISIBLE_DEVICES.
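
The text does not show how the benchmark was launched, so the following is only one possible way to hide the GPU. Setting CUDA_VISIBLE_DEVICES to an empty string makes CUDA, and therefore TensorFlow, see no GPUs. The variable has to be set before TensorFlow is imported, which is why this sketch starts a fresh child interpreter. The helper `run_without_gpu` is hypothetical.

```python
import os
import subprocess
import sys


def run_without_gpu(script: str) -> subprocess.CompletedProcess:
    """Run `script` in a new Python interpreter that cannot see any GPU.

    An empty CUDA_VISIBLE_DEVICES hides every CUDA device, so a TensorFlow
    import in the child process falls back to the CPU.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="")
    return subprocess.run(
        [sys.executable, "-c", script],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child script fails
    )
```

For example, `run_without_gpu("import benchmark")` would run a hypothetical `benchmark.py` on the CPU only. Exporting the variable in the shell before starting Python achieves the same thing.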

Let’s add a container.

Note: Pillow is an image handling library required by Keras to load an image.

Running this container gives an inference time of ~0.15s. The difference may come from some Docker overhead, or from a TF version that differs from the one on my host, but that is not the point of this article. The real point comes next.

The solution

I'm using an i7-7700K with 8 logical cores (4 physical). So if we restrict the container to only 2 logical cores (1 physical), inference should be about 4 times slower, so about 0.6s. Restrictions will be made by the . It actually takes 2.5s, 4 times slower than expected!
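
The expected figure is a proportional estimate. The unrestricted container takes ~0.15s with 8 logical cores, so with 2 logical cores it should take about 8 / 2 = 4 times longer. This assumes that inference time scales inversely with the number of logical cores. That is only a rough approximation, because hyper-threaded logical cores are not full cores. The function name is illustrative.

```python
def expected_time(base_time: float, base_cores: int, allowed_cores: int) -> float:
    """Estimate run time assuming perfect inverse scaling with the core count."""
    return base_time * base_cores / allowed_cores


# Figures from the text: ~0.15s in the unrestricted container, 2.5s measured with 2 cores.
expected = expected_time(0.15, base_cores=8, allowed_cores=2)
measured = 2.5
slowdown = measured / expected
print(f"expected ~{expected:.2f}s, measured {measured}s, {slowdown:.1f}x slower than expected")
```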

In fact, TensorFlow uses the number of logical cores to compute some internal performance parameters. Because the number of reported cores differs from the number actually available, this creates overhead. On a production server the effect can be even bigger: on our servers it was 10 times slower, since Xeon CPUs have more cores.
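
To see the mismatch from inside a container, compare the logical core count the OS reports with the CPU quota the container is actually allowed. The sketch below makes these assumptions:

- It runs in a Linux container under cgroup v2, where the quota is stored in /sys/fs/cgroup/cpu.max as "<quota> <period>", or as "max <period>" when there is no limit.
- cgroup v1, which keeps the same information in cpu.cfs_quota_us and cpu.cfs_period_us, is not handled.

`os.cpu_count()` reports every logical core of the machine regardless of any quota. The function names `parse_cpu_max` and `effective_cpus` are hypothetical.

```python
import math
import os
from pathlib import Path
from typing import Optional

# cgroup v2 location of the CPU quota (assumption: a cgroup v2 host).
CPU_MAX = Path("/sys/fs/cgroup/cpu.max")


def parse_cpu_max(content: str) -> Optional[float]:
    """Parse a cgroup v2 cpu.max line such as '200000 100000'.

    Returns the number of CPUs the quota allows (quota / period),
    or None when the quota is 'max', i.e. unlimited.
    """
    quota, period = content.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def effective_cpus() -> int:
    """CPUs this process may really use: the cgroup quota if set, else cpu_count()."""
    logical = os.cpu_count() or 1
    try:
        allowed = parse_cpu_max(CPU_MAX.read_text())
    except (OSError, ValueError):
        allowed = None  # not cgroup v2, file unreadable, or unexpected format
    if allowed is None:
        return logical
    # Round a fractional quota (e.g. 1.5 CPUs) up, and never exceed the host's count.
    return max(1, min(logical, math.ceil(allowed)))


if __name__ == "__main__":
    print("logical cores reported:", os.cpu_count())
    print("cores allowed by quota:", effective_cpus())
```

In a container limited to 2 CPUs on an 8-core host, the first line would show 8 and the second 2. That gap is what leads TensorFlow to oversize its work for the container.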

So, what can we do?

The TensorFlow performance guide has the answer!

Using the parameters it recommends, we can configure TensorFlow for the CPU resources actually available to the container.
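
The author's exact code is not shown here, so the following is an assumption about the parameters involved. The performance guide's CPU advice covers TensorFlow's intra-op and inter-op thread pool sizes, which can be set explicitly instead of being derived from the logical core count.

This sketch uses the TensorFlow 2.x `tf.config.threading` API. In TensorFlow 1.x, the equivalent settings are the `intra_op_parallelism_threads` and `inter_op_parallelism_threads` fields of `tf.ConfigProto`. `configure_tf_threads` is a hypothetical helper. Using one value for both pools is a simplification, not a recommendation from the guide.

```python
def configure_tf_threads(num_threads: int) -> None:
    """Size TensorFlow's CPU thread pools to the CPUs actually available.

    Must be called before TensorFlow runs any operation: once the runtime
    is initialized, these setters raise a RuntimeError.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    # Imported here so this module still loads where TensorFlow is absent.
    import tensorflow as tf

    # Threads used to parallelize the work inside a single op (e.g. a matmul).
    tf.config.threading.set_intra_op_parallelism_threads(num_threads)
    # Threads used to run independent ops concurrently.
    tf.config.threading.set_inter_op_parallelism_threads(num_threads)
```

For the container restricted to 2 logical cores, this would be `configure_tf_threads(2)`, called at the top of the script before the model is loaded.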

Now it takes only ~0.6s, exactly what was expected!

In conclusion, even if Docker seems to simplify the production environment, always be careful! And don't forget to read the performance guide in the TensorFlow documentation.